SQL questions popup in Google data science interviews all the time. And if you’ve got an interview coming up, you might be wondering: How can I prepare for SQL interviews at Google?
First, you’ll want to be ready to answer any definition-based SQL questions, like “what is a join?” Basic questions test your competency level with SQL, and they're often asked early in the interview process. More importantly, though, in Google SQL interviews, you’ll likely be asked SQL business-focused questions, e.g. writing SQL queries around customer data.
Today, we’re looking specifically at the SQL portion of Google interviews. We cover: What you need to know, how SQL questions align to various Google roles, and SQL practice problems you can use to prep.
Interested in more SQL interview help? Check out our Ultimate Guide to SQL Interview Questions.
Google has an extremely rigorous interview process. Each stage is as important as the next. But you likely won’t be asked SQL questions until the technical screen and on-site interview. Here’s what it looks like:
- Initial Phone Screen - The initial call typically asks behavioral questions and questions about your experience. They're typically conducted with a recruiter, and they're designed to see if you’re a right fit for Google.
- Technical Screen - During the technical screen, you should expect basic SQL questions, including definitions of basics like should expect basic SQL questions on DATE, GROUP BY, and JOIN. You might be asked to perform basic queries as well.
- Onsite Interview - The final round of interviews includes intermediate to advanced SQL questions using CASE, JOIN, sub-queries and complex queries.
What Gets Asked in Google SQL Interviews?
Like all FAANG companies, Google relies heavily on data. And SQL is a go-to tool for processing and analyzing that data. Google SQL interviews don’t just ask basics. These interviews tend to ask case study SQL questions. In other words, you’ll be presented with more practical problems and real data, and be asked to write queries for that dataset. The most common questions to get asked include:
- Basic SQL interview questions - These are typically definition-based questions, and they come up on the technical screen. One tip: Develop your ability to explain these basic SQL concepts in layman’s terms. (Check out our Basic SQL Questions guide for examples.)
- SQL query questions - These types of questions test your knowledge in writing queries and statements in SQL. You’ll be presented with a dataset, and asked to write SQL code to return a specific value.
- Advanced SQL questions - Finally, you may be asked specific case questions that will require you to write advanced SQL queries. With this type of questions, you’ll be asked to write queries that address a specific case, and use a range of SQL clauses, from basics like SELECT and FROM, to advanced like HAVING.
Who Gets Asked SQL Questions at Google?
SQL is used in many different roles at Google. It will most commonly come up in interviews for:
Business Intelligence Analyst
Business analysts at Google use SQL to generate insights, maintain reports and run analyses. In business analyst interviews, you can expect the basics, like DATE, GROUP BY and JOIN, and in the final round of interviews, more complex queries.
Business Intelligence Engineer
BI engineers at Google are at the “intersection of product, data, and business strategy.” In other words, you’ll be tasked with using data to make business decisions that improve customer experience. SQL is one of the most important skills tested in BI engineer interviews at Google.
Google data analysts perform a range of key tasks. Google analysts tend to derive business insights from data, and provide that information to key stakeholders. These roles vary by the team you work with; for example, a data analyst on the Google Ads team will have a much different role than one working on Google Drive.
At Google, data scientists perform a variety of roles, and they’re tested on many different tools and skills. Although SQL questions get asked, these interviews tend to focus more on statistics, algorithms and machine learning topics. But be prepared for basic SQL syntax questions, as well as solvable queries.
Example Google SQL Interview Questions
These are examples of SQL questions you might expect in a Google interview:
Q1. Explain the different types of joins in SQL.
A JOIN is a clause in SQL that’s used to join rows from two or more tables, based on a common column between the tables. It is used for merging tables, as well as retrieving data. The most common types of joins include:
- Inner Join: An INNER JOIN is the most common, and it is used to return all rows from two or more tables, when the JOIN condition is met.
- Left Join: This type of join returns rows from the left table, when a matching row from the right meets the JOIN condition.
- Right Join: Similar to a left join, but rows are returned from the right table, when the JOIN condition on the left is met.
- Full Join: Full joins returns all rows from the left and right, when there is a match in any of the tables.
Q2. What is the PRIMARY KEY in SQL?
Primary keys are constraints that uniquely identify each record. One thing to note: Primary keys cannot have NULL values and all values must be UNIQUE. A table can have only one primary key, but the primary key can consist of a single or multiple columns.
Q3. What are constraints?
Constraints in SQL are rules that can be applied to the type of data in a table. They are used to limit the type of data that can be stored in a particular column within a table. Some common types used in SQL are:
- NOT NULL - This constraint prevents null values from being stored in a column.
- UNIQUE - This constraint says that values in a column must be unique. The primary key uses the UNIQUE constraint.
- PRIMARY KEY - This constraint is used to specify which field is the primary key.
- FOREIGN KEY - A foreign key is a constraint that can uniquely identify a row in another table.
Q4. What’s the difference between DELETE and TRUNCATE statements in SQL?
DELETE is used to remove specific data from a table. This statement is a DML command, and it’s slower than TRUNCATE. One key difference: You can rollback data after using DELETE. TRUNCATE, on the other hand, is a DDL command, and it is used to delete all the rows from a table.
Q5. What is query optimization?
Inefficient SQL queries can drain a database, and lead to slow performance and loss of service. Optimization is especially critical when working with production databases. As such, query optimization is the process of making SQL queries more efficient. More efficient queries provide outputs faster, and minimizes the impact on the database.
Q6. Given the tables below, select the top three departments with at least ten employees and rank them according to the percentage of their employees making over $100,ooo in salary.
| columns | types | | -------------- | ------- | | id | int | | first_name | varchar | | last_name | varchar | | salary | int | | department_id | int |
| columns | types | | ------- | --------------- | | id | int | | name | varchar |
| percentage_over_100K | department name | number of employees | | ---------------------- | --------------- | ------------------- | | .9 | engineering | 25 | | .5 | marketing | 50 | | .12 | sales | 12 |
Hint: What's the question really asking? Breaking it down, we can subset this into separate clauses of conditions: 1) Top 3 departments, 2) % of employees making over $100,000 in salary, and 3) Departments must have at least 10 employees.
Q7. Given a users table, write a query to get the cumulative number of new users added by day, with the total reset every month.
| columns | type | | ----------- | ---------------- | | id | int | | name | varchar | | created_at | datetime |
| Date | Monthly Cumulative | | ---------- | ------------------ | | 2020-01-01 | 5 | | 2020-01-02 | 12 | | ... | ... | | 2020-02-01 | 8 | | 2020-02-02 | 17 | | 2020-02-03 | 23 |
Hint: This question first seems like it could be solved by just running a COUNT(*) and grouping by date. Or maybe it's just a regular cumulative distribution function? But we have to notice that we are actually grouping by a specific interval of month and date. And when the next month comes around, we want to the reset the count of the number of users.
Q8. Given a table of product subscriptions with a subscription start date and end date for each user, write a query that returns true or false whether or not each user has a subscription date range that overlaps with any other user.
| column | type | | ----------- | ------------ | | user_id | int | | start_date | date | | end_date | date |
| user_id | start_date | end_date | | -------- | ----------- | ------------------ | | 1 | 2019-01-01 | 2019-01-31 | | 2 | 2019-01-15 | 2019-01-17 | | 3 | 2019-01-29 | 2019-02-04 | | 4 | 2019-02-05 | 2019-02-10 |
| user_id | overlap | | -------- | ------- | | 1 | 1 | | 2 | 1 | | 3 | 1 | | 4 | 0 |
Hint: Let's take a look at each of the conditions first and see how they could be triggered. Given two date ranges, what determines if the subscriptions would overlap?
Q9. Given a table of students and their SAT test scores, write a query to return the two students with the closest test scores with the score difference.
If there are multiple students with the same minimum score difference, select the student name combination that is higher in the alphabet.
| column | type | | ------- | ------- | | id | integer | | student | varchar | | score | integer |
| id | student | score | | -- | ------- | ------------ | | 1 | Jack | 1700 | | 2 | Alice | 2010 | | 3 | Miles | 2200 | | 4 | Scott | 2100 |
| one_student | other_student | score_diff | | ------------ | -------------- | ----------- | | Alice | Scott | 90 |
Hint: Given the problem statement is referencing one table with only two columns, we have to self-reference different creations of the same table. It's helpful to think about this problem in the form of two different tables with the same values.
Q10. We're given two tables, a users table with demographic information and the neighborhood they live in and a neighborhoods table. Write a query that returns all of the neighborhoods that have 0 users.
| columns | type | | ---------------- | ---------------- | | id | int | | name | varchar | | neighborhood_id | int | | created_at | datetime |
| columns | type | | -------- | ----------- | | id | int | | name | varchar | | city_id | int |
| columns | type | | ------------------ | --------------- | | neighborhood_name | varchar |
Hint: Our predicament is to find all the neighborhoods without users. In a sense we need all the neighborhoods that do not have a singular user living in them. This means we have to introduce a concept of existence of a column in one table, but not in the other.
Q11. Given a table of transactions and products, write a query to return the product id, product price, and average transaction price of all products with price greater than the average transaction price.
| column | type | | ----------- | -------- | | id | integer | | user_id | integer | | created_at | datetime | | product_id | integer | | quantity | integer |
| column | type | | ------ | ------- | | id | integer | | name | string | | price | float |
Hint: We first need to find the average price of all transactions. The total price of a transaction is price*quantity so we write a sub-query to find the average over all transactions.
Q12. Let's say we have two tables, `transactions` and `products`. Hypothetically the `transactions` table consists of over a billion rows of purchases bought by users.
We are trying to find paired products that are often purchased together by the same user, such as wine and bottle openers, chips and beer, etc..
Write a query to find the top five paired products and their names.
Note: for the purposes of satisfying the test case, P1 should be the item that comes first in the alphabet.
| column | type | | ----------- | --------------- | | id | integer | | user_id | integer | | created_at | datetime | | product_id | integer | | quantity | integer |
| column | type | | ------ | ------------- | | id | integer | | name | string | | price | float |
| column | type | | ------ | --------------- | | P1 | string | | P2 | string | | count | integer |
Hint: To solve this, we need to break this into several steps. First, we should find a way to select all the instances a user purchased 2 or more products at the same time. How can we use user_id and created_at to accomplish this?
In interviews for many Google positions, especially analyst roles, SQL questions make up a sizeable part of the interview. Your prep should include a solid study of the basics and SQL definitions, yet you should also be comfortable with more complex queries and sub-queries. One tip: Ask the recruiter the types of questions that will come up in the interview. This will give you an idea of where to focus your study.
If you need additional help, be sure to check out the SQL module in our Data Science Course. It offers a solid review of basic-to-advanced concepts. You might also want to see our Amazon SQL Questions guide for a look at how Amazon tests SQL skills in their interviews.