Interview Query

10 Data Engineer SQL Questions

Overview

SQL is a must-have skill for data engineers. They use the querying language to perform essential tasks like modeling data, extracting performance metrics, and developing reusable data structures.

Data engineer SQL questions tend to mirror the work that engineers do.

Therefore, data engineers need to be proficient not just in querying data and pulling metrics, but also in data structures, manipulation and security within SQL. Broadly, a data engineer may face SQL questions in these categories:

  • SQL queries - Using SQL data query language (DQL) statements to pull metrics and analyze data. Commands to know: SELECT
  • Data modeling - Using DDL commands to create database schema and define data structures. Commands to know: CREATE, ALTER, DROP, RENAME, TRUNCATE, COMMENT
  • Data manipulation - Using DML statements to retrieve and manipulate data. Commands to know: INSERT, UPDATE, DELETE, MERGE, CALL, EXPLAIN PLAN, LOCK TABLE
  • Data security - Using DCL (data control language) commands to manage database security. Commands to know: GRANT, REVOKE

This data engineering SQL questions guide provides an overview of the types of questions you might face, as well as example data engineer SQL questions to help you prepare for your interview.

Topics Covered in SQL Interviews for Data Engineers

SQL interview questions can vary from short-response explanations and definitions, to writing intermediate-to-advanced SQL queries. Here are the most frequently asked SQL topics:

  • Database design - These questions assess your understanding of data architecture and design. Typically, they ask you to design databases from scratch, and include SQL data manipulation and data definition statements.
  • ETL SQL questions - A frequently asked topic in data engineering interviews, these questions ask you to perform ETL tasks, e.g. extracting, transforming and loading data. These questions require proficiency in DDL and DML statements.
  • Definition-based SQL questions - Short basic SQL questions like “What’s the difference between WHERE and HAVING?”
  • Reporting and metrics - Writing queries to pull metrics. These questions will have a clearly defined metric to pull, and they can range from beginner to advanced.

Data Engineer Database Design SQL Questions

Database design questions are common in SQL interviews for data engineering roles. You should be prepared to design the database schema for an application. One step further, you might be asked to write a query to produce a metric, based on the schema you proposed.

Some database design concepts to study for data engineering interviews include:

  • Entity-relationship modeling
  • Normalization and denormalization trade-offs
  • Dimensional modeling

Q1. Design a database for a stand-alone fast food restaurant.

With this fast food restaurant database design question, expect to be asked to write queries for the data. In this question, you’re asked to write a SQL query for the top three items by revenue and the percentage of customers who order drinks with their meals.

Q2. Design a database for traffic on the Golden Gate Bridge.

This crossing bridges database design question asks you for some specifics, as well:

  • What would the table schema look like if we wanted to track how long each car took coming into San Francisco to enter and exit the bridge?
  • Write a query on the given tables to get the time of the fastest car on the current day.

Q3. How would you create a schema to represent client click data on the web?

This click data schema question is more architecture-based and assesses experience within developing databases, setting up architectures, and in this case, representing client-side tracking in the form of clicks. A simple but effective design schema would be to first represent each action with a specific label. In this case assigning each click event a name or label describing it’s specific action.

SQL ETL Questions for Data Engineers

ETL questions are one of the most important topics covered in data engineering interviews, and a broad range of concepts get covered. You might get more definition/theory questions, as well as code writing tasks.

In particular, you want to focus on:

  • DDL, DML and DCL keywords
  • JOINs
  • SQL constraints
  • Transactions
  • Views
  • Indexes
  • Triggers
  • Variables
  • Query optimization

Q1. Write a query to get the current salary data for each employee.

This ETL error SQL question provides you with a table representing company payroll schema. Due to an ETL error, the employees table isn’t properly updating salaries, but instead is doing an insert when performing compensation adjustments.

Q2. We want to create an aggregate table with the song count by date for each user. Write a SQL query that would make this ETL each day.

With this lifetime plays SQL question, we have a table called song_plays that tracks each time a user plays a song. For this problem, we use the INSERT INTO keywords to add rows into the lifetime_plays table. If we set this query to run daily, it becomes a daily extract, transform, and load (ETL) process.

Example SQL Definitions Questions for Data Engineers

You’ll likely face SQL definition-based questions early in the interview process, because they’re used to quickly assess your skill level. For this type of question, it’s best to study definitions and learn how to define what’s going on in layman’s terms.

Although a range of topics come up, be sure you can quickly define:

  • Common table expressions
  • Window functions
  • CASE and PIVOT
  • WITH, UNION and HAVING
  • Aggregate case statements
  • Transitive closure
  • Query plans
  • Dialect differences

Q1. What are aggregate functions in SQL?

Aggregate functions in SQL are functions where values from multiple rows are merged to form a single value. Some of the most common ones are COUNT, MIN, MAX, SUM and AV

Q2. What is an index in SQL? When would you use an index?

Indexes are lookup tables used by the database to perform data retrieval more efficiently. An index can be used to speed up SELECT or WHERE clauses, but they do slow down UPDATE and INSERT statements.

Q3. What’s the difference between IN and BETWEEN?

In SQL, BETWEEN is used to test if a particular expression lies between a range of value. It returns all the values that lie within the range. The IN operator, on the other hand, tests whether the values expressed match those in a specific list.

SQL Analytics Problems for Data Engineers

Practice intermediate to advanced SQL reporting problems to prepare. These questions are usually clearly defined, e.g. write a query to return X metric, and they assess your ability to quickly write clean code.

This type of SQL question is so common because data engineers are tasked with building tables, ETLs, and aggregation views for visualizations and dashboards. You might also get an SQL case study question.

To practice, you should focus on:

  • Complex joins
  • Sub-queries
  • Self-joins
  • Window functions
  • Aggregations

Q1. Write a query to compute a metric to measure the quality of the search results for each query.

With this search ratings SQL problem, you’re given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.

Q2. Write a query to return data to support or disprove the hypothesis that CTR is dependent on search rating.

This search click-through-rate SQL problem,

You’re given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating from 1 to 5 where 5 is high relevance and 1 is low relevance.

More Data Engineer Interview Resources

Getting ready for a data engineering interview? Check out these resources from Interview Query: