Business Analyst SQL Interview Questions

Business Analyst SQL Interview Questions

Overview

Business analytics is one of the fastest-growing job sectors in the United States. According to the Bureau of Labor Statistics, hiring for business analysts is expected to grow by 14% over the next decade.

Typically, a business analyst’s SQL interview questions are designed to quickly assess your ability to pull metrics and process data with SQL. In general, you might have to whiteboard SQL queries or produce SQL code in a code editor.

What is SQL and how is SQL used by a Business Analyst?

SQL, which stands for a structured query language, is a programming language used to manipulate data in databases. SQL is useful to professionals when used in databases with sizable amounts of data. Business analysts often work with databases, and they can use SQL to retrieve data for reports and analysis.

As such, SQL skills can be valuable if you’re searching for a business analyst job. Once you get an interview, you may need to know how to answer SQL interview questions.

What kind of SQL questions do you need to know as a Business Analyst?

The core responsibility of a business analyst is to gather, clean, and analyze business data. This can include data such as revenue, sales, or customer engagement metrics. Therefore, business analysts should be able to interpret data, share findings, and make recommendations.

This requires in-depth knowledge of how to use SQL and how to perform statistical analysis. Business analysts should be able to generate reports, answer business problems, and answer questions like what would you do if you noticed a decline in revenue?

Do you need to know SQL to be a Business Analyst?

The answer to this question is subjective and depends on the need. Although a strong knowledge of SQL is helpful to have and would give you an advantage over the competition, it is not always mandatory.

According to this business analyst job listing analysis on Kaggle based on Glassdoor data, only 27% of business analyst job listings include SQL as a requirement. However, the need for SQL is dependent on company size, career experience, and a company’s ability to provide on-job SQL training.

In a nutshell, basic knowledge of SQL is required and advanced knowledge is usually only required at some companies and isn’t deemed an absolute necessity.

Business analyst SQL interview questions usually fall into three categories:

  • Easy SQL questions - These are definition-based questions that ask about SQL features, simple use cases, and comparisons between commands like WHERE and HAVING. They may also include simple queries.
  • Medium SQL questions - These questions ask you to write complex queries using joins, sub-queries, self-joins, and window functions. They may also include analytics case studies.
  • Hard SQL questions - These questions ask you to write advanced queries, including using indices and hard SQL clauses. They may also include more advanced analytics case studies.

Easy Business Analyst SQL Interview Questions

Business analyst interviews and technical screens generally start with beginner SQL questions. There are two main types of easy SQL questions:

  • Basic SQL queries - Simple SQL questions that require you to write a query. You might be asked to get the COUNT of a table, make a simple join, or use the HAVING clause.
  • Definitions - Less frequently encountered, these questions ask you to explain technical concepts, compare two or more SQL functions, or define how a concept is used.

Here are some easy business analyst SQL interview questions:

1. What is the difference between DELETE TABLE and TRUNCATE TABLE in SQL?

Although they are both used to delete data, a key difference is that DELETE is a database manipulation language command, whereas TRUNCATE is a data definition language command.

Therefore, DELETE removes specific data from a table but TRUNCATE removes all the rows of a table without maintaining the table’s structure. Another difference between the two is that DELETE can be used with the WHERE clause but TRUNCATE cannot. In this case, the DELETE TABLE would remove all the data from within the table while maintaining the structure. In contrast, TRUNCATE TABLE would delete the table in its entirety.

2. Write an SQL query to select all records of employees with last names between “Bailey” and “Frederick”.

For this question, assume the table is called Employees and the last name column is LastName. The query should look like this:

SELECT * FROM Employees WHERE LastName BETWEEN 'Bailey' AND 'Frederick' 

3. Write an SQL query to find the year from a YYYY-MM-DD date.

EXTRACT allows you to pull temporal data types like date, time, timestamp, and interval from the date and time values. If you wanted to find the year from 2022-03-22, you would write EXTRACT( FROM), as shown below:

SELECT EXTRACT(YEAR FROM DATE '2022-03-22') AS year;

4. Write an SQL query to select the second highest salary in the engineering department.

To answer this question, you need the name of the department to be associated with each employee in the employees table in order to understand which department each employee is a part of.

The department_id field in the employees table is associated with the “id” field in the departments table. You can call the department_id a foreign key because it is a column that references the primary key of another table, which in this case is the “id” field in the departments table.

Based on this shared field, you can join both tables using INNER JOIN to associate the department name to their employees as exemplified below:

  • SELECT salary
  • FROM employees
  • INNER JOIN departments
  • ON employees.department_id = departments.id

With the department name in place, you can now look at the employees of the engineering team and sort by salary to find the second highest paid.

5. Explain the different types of joins in SQL.

A JOIN is a clause in SQL that’s used to join rows from two or more tables based on a common column between the tables. This clause is used for merging tables and retrieving data. The most common types of JOIN commands include:

  • INNER JOIN- An INNER JOIN is the most common type of JOIN command and is used to return all rows from two or more tables when the JOIN condition is met.
  • LEFT JOIN - This type of JOIN command returns rows from the left table when a matching row from the right meets the JOIN condition.
  • RIGHT JOIN - This JOIN command is similar to a LEFT JOIN, but rows are returned instead from the right table when the JOIN condition on the left is met.
  • FULL JOIN - FULL JOIN returns all rows from the left and the right when there is a match in any of the tables.

6. What is the PRIMARY KEY in SQL?

PRIMARY KEY is a constraint that uniquely identifies each record. Notably, the constraint cannot have NULL values, and all values must be UNIQUE. A table can have only one PRIMARY KEY, but the PRIMARY KEY can consist of single or multiple columns.

7. What are constraints?

Constraints in SQL are rules that can be applied to the type of data in a table. They are used to limit the type of data that can be stored in a particular column within a table. Some common constraints are:

  • NOT NULL - This constraint prevents null values from being stored in a column.
  • UNIQUE - This constraint says that values in a column must be unique. PRIMARY KEY uses the UNIQUE constraint.
  • PRIMARY KEY - This constraint is used to specify which field is the primary key.
  • FOREIGN KEY - This constraint uniquely identifies a row in another table.

8. We’re given two tables, a users table with demographic information and the neighborhood they live in, and a neighborhoods table. Write a query that returns all of the neighborhoods that have 0 users.

users table

Columns Type
id INTEGER
name VARCHAR
neighborhood_id INTEGER
created_at DATETIME

neighborhoods table

Columns Type
id INTEGER
name VARCHAR
city_id INTEGER

Output:

Columns Type
name VARCHAR

Hint: The predicament here is finding all the neighborhoods without users. In a sense, you need all the neighborhoods that do not have a singular user living in them. This means you have to introduce a concept of existence of a column in one table but not in the other.

Intermediate Business Analyst SQL Questions

For mid-level business analyst roles, expect intermediate SQL questions to challenge your knowledge and skill. Medium SQL questions fall into two categories:

  • Complex SQL queries - Intermediate SQL questions ask you to perform joins, sub-queries, self-joins, and window functions.
  • SQL/Analytics case studies - Many intermediate questions take the form of case studies or ask you to perform analysis on the data you pull from a query.

Here are some intermediate business analyst SQL interview questions:

1 . Write a query to obtain the top five most expensive projects by the following criteria: budget to employee count ratio.

For more context, you are given two tables. One is named projects and the other maps employees to the projects they are working on. Exclude projects with zero employees. Assume each employee works on only one project.

To start, think about how to calculate the combined budget for each project and then determine what the budget per employee attached to the project equals. After that, you can think about how to rank these project values from most to least expensive to determine which are the most expensive.

2. Given a transactions table with date timestamps, sample every fourth row ordered by date.

Here’s a hint for this question to get you started: if you are sampling from this table and you want to specifically sample every fourth value, you will probably have to use a window function.

A general rule of thumb to follow is when a question states or asks for some Nth value (like the third purchase of each customer or the tenth notification sent), then a window function is the best option. Window functions allow you to use the RANK() or ROW_NUMBER() function to provide a numerical index based on a certain order.

3. Write a query to get the number of customers that were upsold by purchasing additional products.

For this problem, you are given a table of product purchases. Each row in the table represents an individual product purchase.

Note: If the customer purchased two things on the same day, that does not count as an upsell because they were purchased within a similar time frame. You are looking for a customer returning on a different date to purchase a product.

This question is slightly tricky because you have to note the dates that each user purchased products. You can’t just group by the user_id to find where the number of products purchased is greater than one because of the upsell condition.

You have to group by both the date field and the user_id to obtain each transaction broken out by day and user:

SELECT 
 user_id
 , DATE(created_at) AS date
FROM transactions
GROUP BY 1,2

The query above will now provide a user_id and date field for each row. If there is a duplicate user_id, then you know that the user purchased on multiple days, which satisfies the upsell condition. What comes next?

4. Write a query to support or disprove the hypothesis: clickthrough rate is dependent on search rating.

This question provides a table that represents search results on Facebook, including a query, a position, and a human rating.

5. Select the largest three departments with ten or more employees and rank them according to the percentage of employees making over $100,000.

In this problem, you are given two tables: an employees table and a departments table.

Example:

Input:

employees table

Columns Type
id INTEGER
first_name VARCHAR
last_name VARCHAR
salary INTEGER
department_id INTEGER

departments table

Columns Type
id INTEGER
name VARCHAR

Output:

Column Type
percentage_over_100k FLOAT
department_name VARCHAR
number_of_employees INTEGER

First, break down the question to understand what it’s asking. Specifically, break the question down into three clauses of conditions:

  • Top three departments by employee count.
  • Percent of employees making over $100,000 a year.
  • Departments with at least ten employees.

Then, think about how you would associate employees with their department, calculate and display the percentage of employees making over $100,000 a year, and order those results to provide an answer to the original question.

Hard Business Analyst SQL Questions

Advanced business analyst SQL interview questions are common for mid and senior-level roles, and they require you to write advanced SQL queries or work through complex logic-based case studies. The two main types of hard SQL questions are:

  • Advanced SQL writing - Writing queries to debug code, using indices to tune SQL queries, and using advanced SQL clauses.
  • Logic-based questions - More challenging analytics case studies or queries that first require you to solve a logic-based problem.

1. Write an SQL query that creates a cumulative distribution of the number of comments per user. Assume bin buckets class intervals of one.

To solve this cumulative distribution practice problem, you are given two tables, a users table, and a comments table.

frequency cumulative
0 10
1 25
2 27

2. Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.

Input:

users table

Columns Type
id INTEGER
name VARCHAR
created_at DATETIME

Output:

DATE INTEGER
2020-01-01 5
2020-01-02 12
2020-02-01 8
2020-02-02 17
2020-02-03 23

Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date or that maybe it’s just a regular cumulative distribution function. But it is important to notice that you are actually grouping by a specific interval of month and date, and when the next month comes around, you want to reset the count of the number of users.

3. Given a table of product subscriptions with a subscription start date and end date for each user, write a query that returns true or false whether or not each user has a subscription date range that overlaps with any other user.

Input:

subscriptions table

Column Type
user_id INTEGER
start_date DATETIME
end_date DATETIME
user_id start_date end_date
1 2019-01-01 2019-01-31
2 2019-01-15 2019-01-17
3 2019-01-29 2019-02-04
4 2019-02-05 2019-02-10

Output:

user_id overlap
1 1
2 1
3 1
4 0

Hint: Take a look at each of the conditions first and see how they could be triggered. Given two date ranges, what determines if the subscriptions would overlap?

4. Let’s say you have two tables, transactions, and products. Hypothetically, the transactions table consists of over a billion rows of purchases bought by users.

You are trying to find paired products that are often purchased together by the same user, such as wine and bottle openers, chips and beer, etc.

Write a query to find the top five paired products and their names.

Note: For the purposes of satisfying the test case, P1 should be the item that comes first in the alphabet.

Input:

transactions table

Column Type
id INTEGER
user_id INTEGER
created_at DATETIME
product_id INTEGER
quantity INTEGER

products table

Column Type
id INTEGER
name VARCHAR
price FLOAT

Output:

Column Type
p1 VARCHAR
p2 VARCHAR
qty INTEGER

Hint: To solve this question, you need to break it into several steps. First, you should find a way to select all the instances in which a user purchased two or more products at the same time. How can you use user_id and created_at to accomplish this?

5. Calculate the first-touch attribution for each user_id that converted.

The schema below is for a retail online shopping company consisting of two tables, attribution and user_sessions. Here are some details of the two tables:

The attribution table logs a session visit for each row. If the conversion is true, then the user converted to a purchase on that session.

The channel column represents which advertising platform the user was attributed to for that specific session. Lastly, the user_sessions table maps session visits back to one user from a single visit all the way up to several on the same day.

First-touch attribution is defined as the channel to which the converted user was associated when they first discovered the website. It is helpful to sketch out the attribution model for converting users:

  • 1st session - User sees Facebook ad -> Clicks to order -> Leaves.
  • 2nd session - User sees Google ad -> Leaves.
  • 3rd session - User types in website -> Clicks to order -> Purchases.

How do you figure out the beginning path of the Facebook ad and connect it to the end purchasing user?

You need to take two actions:

  1. Subset all of the users that converted to customers.
  2. Figure out their first session visit to attribute the actual channel.

You can do that by creating a subquery that only provides the distinct users that have actually converted.

In Google Data Analyst interviews, SQL questions make up a sizable part of the interview. Your preparation should include a solid study of the basics and SQL definitions, but you should also be comfortable with more complex queries and sub-queries.

One tip: ask the recruiter the types of questions that will come up in the interview. This will give you an idea of where to focus your study.

SQL is a common topic in business analyst interviews regardless of the job level. It is best to be familiar with the topic from basic syntax to complex queries and sub-queries. Excelling in this portion of the interview could make the difference between getting the job or being asked to look elsewhere.

Learn SQL Fundamentals in this easy-to-follow SQL course

This course will help you brush up on your SQL skills and learn basic to advanced techniques.