In many of our crowd-sourced interview guides, Facebook data scientists, analysts, and engineers are all asked SQL questions, moreso than the average technical interview. That's why it's extremely important to prepare for the SQL section of the Facebook technical screen, as it's sure to come up.
Facebook candidates can be expected to demonstrate a solid foundation of SQL knowledge – if not expertise – in several different categories. With such broad constraints for the technical screen, it may be hard to determine exactly what to focus on for the SQL portion. To help you, we've prepared some sample questions to prep for Facebook SQL interviews.
Need some more SQL help? Check out our Ultimate Guide to SQL Interview Questions, which includes 50+ practice problems.
What SQL Questions Get Asked at Facebook?
It's helpful to prep when you have an idea of the SQL questions that get asked at Facebook. Essentially, these questions fall into two categories:
1) Definition-based questions - Definition questions are used to quickly assess your SQL knowledge. These will be simple questions like, "What is a JOIN?" or more theoretical questions like "Why would you use an index in SQL?" You'll need a solid grasp of key terms and concepts to nail this portion of the Facebook interview.
2) Writing queries - Facebook interviews ask a lot of SQL questions. And the majority tend to fall into this category. In Facebook SQL interviews, you'll likely be asked to write a query using two or more datasets. These are generally based on Facebook (or Instagram) cases.
How Much SQL Is Asked in Facebook Interview?
To answer this question, it really depends on the role. In some positions, like product analysts - for example - you'll use SQL everyday, and will likely be asked a wide range of SQL query questions. Other roles, like data engineers, will use SQL for data processing and database management, which might ask more theoretical questions.
Here's a look at SQL by role:
- Product Strategist - Interviews for strategist roles require the ability to quickly isolate insights from product and user data. Focus your prep on writing beginner-to-advanced queries in SQL.
- Data Analyst - Facebook employees analysts in several capacities, including product, community relations and business intelligence. These roles will require the ability to write SQL using customer and product data.
- Data Engineer - Data engineer roles will get asked query questions. But also more theoretical SQL questions around query optimization, database management, and database design.
- Data Scientist - Many Facebook data scientist roles are aligned with analytics, e.g. using analytics to make data-driven product decisions. SQL (as well as a scripting language like Python) are important skills. These questions tend to ask query questions - from basics, to more complex queries.
SQL Concepts for Facebook Interview
Here are some sample Facebook SQL questions that are definitions, as well as theoretical. As you study, focus on those definitions (especially being able to explain them in layman's terms), as well as how they might apply theoretically.
Q1. What is a foreign key in SQL? What role does it play?
A foreign key is a field or multiple fields in one table that can refer back to the primary key in another. The table with the primary key is referred to as the parent or referenced table, while the one with the foreign key is the child table. Foreign keys are used to provide the referential integrity between tables.
Q2. Are blank spaces or zero values treated the same as NULL?
No, NULL is not the same as a zero or blank space. Instead, NULL is used in the absence of any value. In other words, the value is unavailable, unknown, unassigned or not appropriate. Zeroes and blank spaces are treated differently, and they can be compared to other zeroes or blank spaces, respectively. NULL cannot be compared to another NULL.
Q3. What is meant by SQL injection? What would be the first step in preventing or mitigating an attack?
SQL injection, also known as SQLI, is a type of vulnerability that uses malicious SQL code to give attacks access to the backend database. Attackers can then manipulate the database and gain access to private information like user lists and customer details.
One of the first steps would be input validation, to identify the illegitimate user inputs.
Q4. What's the difference between IN and BETWEEN operators?
Both allow you to find multiple values from the table. But a key difference is the type of data you are selecting with these operators. For example, BETWEEN selects a range of data between two values. Alternatively, IN allows you select multiple values.
Q5. What is a cursor?
In SQL, a cursor is a temporary memory or workstation. They store database tables, and in SQL there are two main types: 1) implicit cursors and 2) explicit cursors. Implicit cursors are allocated by the SQL server when users perform DML operations. Explicit cursors are based on user inputs.
Q6. Explain what a trigger is in SQL. When would you use a trigger?
A trigger is a stored procedure that automatically runs when a certain event occurs in the database server. For example, DML triggers run when a user tries to modify data. Triggers can be used to:
- Audit database activity
- Implement rules
- To enforce referential integrity
Facebook SQL Coding Questions
In the technical screen, you'll be asked a range of query-based SQL questions. These will ask you to write queries based on provided datasets. The best prep strategy is to practice and write lots of sample queries aligned to real case studies. Here are questions that will help you develop that skill:
Q1: Write a SQL query to create a histogram of number of comments per user in the month of January 2020. Assume bin buckets class intervals of one.
| columns | type | |-----------------|----------| | id | integer | | name | string | | created_at | datetime | | neighborhood_id | integer | | mail | string |
| columns | type | |------------|----------| | user_id | integer | | body | text | | created_at | datetime |
Hint: Since a histogram is just a display of frequencies of each user, all we really need to do is get the total count of user comments in the month of January 2020 for each user, and then group by that count.
Q2: In the table below, column `action` represents either ('post_enter', 'post_submit', 'post_canceled') for when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit). Write a query to get the post success rate for each day in the month of January 2020.
| columns | type | |------------|----------| | id | integer | | user_id | integer | | created_at | datetime | | action | string | | url | string | | platform | string |
Hint: Let's see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want post success rate for each day over the past week. How would we get that metric?
We can assume post success rate can be defined as: (total posts created) / (total posts entered)
Additionally, since the success rate must be broken down by day, we must make sure that a post that is entered must be completed on the same day.
Now that we have these requirements, it's time to calculate our metrics. We know we have to GROUP BY the date to get each day's posting success rate. We also have to break down how we can compute our two metrics of total posts entered and total posts actually created.
Q3: We want to build a naive recommender and we're given two tables, one table called `friends` with a user_id and friend_id columns representing each user's friends, and another table called `page_likes` with a user_id and a page_id representing the page each user liked. Write an SQL query to create a metric to recommend pages for each user based on recommendations from their friends liked pages.
| columns | type | |-----------|---------| | user_id | integer | | friend_id | integer |
page likes table:
| columns | type | |---------|---------| | user_id | integer | | page_id | integer |
Solution: We can start by visualizing what kind of output we want from the query. Given that we have to create a metric for each user to recommend pages, we know we want something with a user_id and a page_id along with some sort of recommendation score.
How can we easily represent the scores of each user_id and page_id combo? One naive method would be to create a score by summing up the total likes by friends on each page that the user hasn't currently liked. The max value on our metric would be the most recommendable page.
The first thing we have to do then is to write a query to associate users to their friends liked pages. We can do that easily with an initial join between the two tables.
Q4: Given the following three tables that represent customer transactions and customer attributes, write a query to get the average order value by gender.
| columns | type | |------------|----------| | id | integer | | user_id | integer | | created_at | datetime | | product_id | integer | | quantity | integer |
| columns | type | |---------|---------| | id | integer | | name | varchar | | sex | varchar |
| columns | type | |---------|---------| | id | integer | | name | string | | price | float |
Q5: Say our example output from the query in Question 4 is:
- AOV: $46.3
- Total purchases: 2500
- Unique purchasers: 1500
- AOV: $50.2
- Total purchases: 3500
- Unique purchasers: 500
Would the difference in AOV (average order value) be significant?
Note: remember to round your answer to two decimal places!
Q6 (Part 1): You're given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.
Write a query to compute a metric to measure the quality of the search results for each query.
This is an unusual SQL problem, given it asks to define a metric and then write a query to compute it. Generally, this should be pretty simple. Can we rank by the metric and figure out which query has the best overall results?
For example, if the search query for 'tiger' has 5s for each result, then that would be a perfect result.
The way to compute that metric would be to simply take the average of the rating for all of the results. In which the query can very easily be:
SELECT query, ROUND(AVG(rating), 2) AS avg_rating FROM search_results GROUP BY 1
Q6 (Part 2). You want to be able to compute a metric that measures the precision of the above ranking system based on position. For example, if the results for dog and cat are:
We would rank 'cat' as having a better search result ranking precision than 'dog' based on the correct sorting by rating.
Write a query to create a metric that can validate and rank the queries by their search result precision. Round the metric (avg_rating column) to 2 decimal places.
The precision metric is a little more difficult now that we have to account for a second factor, which is position. We now have to find a way to weight the position in accordance to the rating to normalize the metric score.
This type of problem set can get very complicated if we wanted to dive deeper into it. However, the question is clearly more marked towards being practical in figuring out the metric and developing an easy SQL query, rather than developing a search ranking precision scale that optimizes for something like CTR.
To solve the problem, it's helpful to look at the example to construct an approach towards a metric. For example, if the first result is rated at 5 and the last result is rated at a 1, that's good. Even better, however, is if the first result is rated 5 and the last result is also rated 5. Bad is if the first result is 1 and the last result is 5.
However, if we use the approach from Part 1, we'll get the same metric score no matter which way the values are ranked by position. So how do we factor position into the ranking?
Q7: Given the following tables, select the top 3 departments with at least ten employees and rank them according to the percentage of their employees making over 100K in salary.
| columns | types | |---------------|---------| | id | int | | first_name | varchar | | last_name | varchar | | salary | int | | department_id | int |
| columns | types | |---------|---------| | id | int | | name | varchar |
Let's approach this problem by looking at the output of what the response would look like.
We know that we need to calculate the total number employees that are making over $100K by each department. This means that we're going to have to run a ```GROUP BY``` on the department name since we want a new row for each department.
We also need a formula to represent how we can differentiate employees that make over $100K and those that make less. We can calculate that by formulating:
(Number of people making over $100K) / (Total number of people in that department)
Now in terms of implementation, if we first do a JOIN between the employees table and the departments table, then we can get all of the data points we need together. Then all that is left is:
- a function to get all of the employees
- a function to get all employees making over 100K
- dividing those values by each other
Q8: We're given two tables, a users table with demographic information and the neighborhood they live in and a neighborhoods table.
| columns | types | |-----------------|----------| | id | int | | name | varchar | | neighborhood_id | int | | created_at | datetime |
| columns | type | |---------|---------| | id | int | | name | varchar | | city_id | int |
Write a query that returns all of the neighborhoods that have 0 users.
Whenever the question asks about finding values with 0 something (users, employees, posts, etc..) immediately think of the concept of LEFT JOIN! An inner join finds any values that are in both tables, a left join keeps only the values in the left table.
Our predicament is to find all the neighborhoods without users. To do this, we must do a left join from the neighborhoods table to the users table, which will give us an output like this:
neighborhoods.name | users.id ____________________|__________ castro | 123 castro | 124 cole valley | null castro heights | 534 castro heights | 564
If we then add in a where condition of WHERE users.id IS NULL, then we will get every single neighborhood without a singular user.
In nearly all Facebook roles, including Facebook data science internships, SQL questions are an essential part of the interview. Prep for these questions with a solid study of the basics. You need a strong foundation of SQL expertise to ace this part of the technical screen. But you'll also want plenty of experience writing advanced queries as well, especially case-based SQL questions.
Need more help? Check out the SQL module in our Data Science Course, for a refresher of basic through advanced concepts. Plus, check out our company SQL guides for Amazon and Google: Amazon SQL Questions and Google SQL Questions.