Across many of our crowd-sourced interview guides, Meta data scientists, analysts, and engineers report being asked SQL interview questions more often than in the average technical interview. That’s why it’s extremely important to prepare for the SQL section of the Meta technical screen: it’s almost sure to come up.
Meta candidates are expected to demonstrate a solid foundation of SQL knowledge – if not expertise – in several different categories. With such broad constraints for the technical screen, it may be hard to determine exactly what to focus on for the SQL portion. To help you, we’ve prepared some sample Facebook SQL interview questions.
It’s helpful to prep when you have an idea of the SQL questions that get asked at Meta. Essentially, these questions fall into two categories:
Definition questions are used to quickly assess your SQL knowledge. These will be simple questions like, “What is a JOIN?” or more theoretical questions like “Why would you use an index in SQL?” You’ll need a solid grasp of key terms and concepts to nail this portion of the Meta interview.
Query questions make up the majority of Meta SQL interview questions. In Meta SQL interviews, you’ll likely be asked to write a query using two or more datasets. These are generally based on Meta (or Instagram) cases.
How much SQL you’ll face really depends on the role. In some positions, like product analyst, you’ll use SQL every day and will likely be asked a wide range of SQL query questions. Other roles, like data engineer, use SQL for data processing and database management, and tend to get more theoretical questions.
Here’s a look at SQL by role:
Interviews for strategist roles require the ability to quickly isolate insights from product and user data. Focus your prep on writing beginner-to-advanced queries in SQL.
Meta employs analysts in several capacities, including product, community relations, and business intelligence. These roles require the ability to write SQL using customer and product data.
Data engineer roles get asked query questions, but also more theoretical SQL questions around query optimization, database management, and database design.
Many Meta data scientist roles are aligned with analytics, e.g. using analytics to make data-driven product decisions. SQL (as well as a scripting language like Python) is an important skill here. These interviews tend to feature query questions, from the basics to more complex queries.
Here are some sample Facebook SQL interview questions covering definitions as well as theory. As you study, focus on the definitions (especially being able to explain them in layman’s terms), as well as how they might apply in practice.
A foreign key is a field (or set of fields) in one table that refers to the primary key of another table. The table with the primary key is referred to as the parent or referenced table, while the one with the foreign key is the child table. Foreign keys are used to enforce referential integrity between tables.
No, NULL is not the same as a zero or blank space. Instead, NULL is used in the absence of any value. In other words, the value is unavailable, unknown, unassigned or not appropriate. Zeroes and blank spaces are treated differently, and they can be compared to other zeroes or blank spaces, respectively. NULL cannot be compared to another NULL.
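A quick way to see this behavior is to run the comparisons directly; this sketch uses Python’s sqlite3 purely as a convenient SQL engine:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

# NULL = NULL evaluates to NULL (unknown), not TRUE, so the ELSE branch fires.
cur.execute("SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END")
print(cur.fetchone()[0])  # 'not equal'

# IS NULL is the correct way to test for NULL.
cur.execute("SELECT NULL IS NULL")
print(cur.fetchone()[0])  # 1 (true)
```

This is why filters on nullable columns use `IS NULL` / `IS NOT NULL` rather than `= NULL`.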
SQL injection, also known as SQLi, is a type of vulnerability that uses malicious SQL code to give attackers access to the backend database. Attackers can then manipulate the database and gain access to private information like user lists and customer details.
One of the first steps would be input validation, to identify and reject illegitimate user inputs. The more robust defense is to use parameterized queries (prepared statements), so user input is bound as data rather than executed as SQL.
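Here is a minimal sketch of a parameterized query using Python’s sqlite3 (the table and payload are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

malicious = "alice' OR '1'='1"  # classic injection payload

# The ? placeholder binds the input as data, never as SQL text,
# so the payload is matched literally and returns no rows.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()
print(rows)  # []
```

Had the query been built by string concatenation, the same payload would have matched every row.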
Both allow you to select multiple values from a table, but they differ in how those values are specified. BETWEEN selects a continuous range between two values (inclusive of both endpoints), while IN selects from a discrete list of values.
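The difference is easy to see side by side; a small sketch with a toy orders table (names and values are illustrative), run through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 10), (2, 25), (3, 40), (4, 55)])

# BETWEEN: a continuous, inclusive range.
between = conn.execute(
    "SELECT id FROM orders WHERE amount BETWEEN 20 AND 50").fetchall()

# IN: a discrete list of values.
in_list = conn.execute(
    "SELECT id FROM orders WHERE amount IN (10, 55)").fetchall()

print(between)  # [(2,), (3,)]
print(in_list)  # [(1,), (4,)]
```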
In SQL, a cursor is a temporary work area created in memory when a SQL statement is executed. It holds the rows returned by a query so they can be processed one at a time. There are two main types: implicit cursors, created automatically by the database, and explicit cursors, declared and managed by the programmer.
A trigger is a stored procedure that automatically runs when a certain event occurs in the database server. For example, DML triggers run when a user tries to modify data. Triggers can be used to enforce business rules, validate or audit data changes, and maintain data integrity.
In the technical screen, you’ll be asked a range of query-based SQL questions. These will ask you to write queries based on provided datasets. The best prep strategy is to practice and write lots of sample queries aligned to real case studies. Here are questions that will help you develop that skill:
Write a query to create a histogram of the number of comments per user in the month of January 2020, given the following tables:

users table:

| Columns | Type |
|---|---|
| id | INTEGER |
| name | STRING |
| created_at | DATETIME |
| neighborhood_id | INTEGER |
| mail | STRING |
comments table:

| Columns | Type |
|---|---|
| user_id | INTEGER |
| body | VARCHAR |
| created_at | DATETIME |
Hint: Since a histogram is just a display of frequencies, all we really need to do is get the total count of comments in the month of January 2020 for each user, and then group by that count.
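Following that hint, one possible solution looks like the sketch below, run against toy data through Python’s sqlite3 (the schema follows the comments table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (user_id INTEGER, body TEXT, created_at TEXT)")
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", [
    (1, 'a', '2020-01-05'), (1, 'b', '2020-01-09'),
    (2, 'c', '2020-01-12'),
    (3, 'd', '2020-02-01'),  # outside January, excluded
])

# Inner query: comments per user in January 2020.
# Outer query: frequency of each comment count (the histogram).
rows = conn.execute("""
    SELECT comment_count, COUNT(*) AS frequency
    FROM (
        SELECT user_id, COUNT(*) AS comment_count
        FROM comments
        WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
        GROUP BY user_id
    ) t
    GROUP BY comment_count
    ORDER BY comment_count
""").fetchall()
print(rows)  # [(1, 1), (2, 1)]: one user made 1 comment, one made 2
```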
The action column takes one of three values (‘post_enter’, ‘post_submit’, ‘post_canceled’), recorded when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit). Write a query to get the post success rate for each day in the month of January 2020.

events table:

| Columns | Type |
|---|---|
| id | INTEGER |
| user_id | INTEGER |
| created_at | DATETIME |
| action | STRING |
| url | STRING |
| platform | STRING |
Hint: Let’s see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want post success rate for each day in January 2020. How would we get that metric?
We can assume post success rate can be defined as: (total posts created) / (total posts entered)
Additionally, since the success rate must be broken down by day, we must make sure that a post that is entered must be completed on the same day.
Now that we have these requirements, it’s time to calculate our metrics. We know we have to GROUP BY the date to get each day’s posting success rate. We also have to break down how we can compute our two metrics of total posts entered and total posts actually created.
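Putting those pieces together, one possible query looks like the sketch below, run against toy data through Python’s sqlite3 (the schema follows the events table above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, user_id INTEGER, created_at TEXT, action TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", [
    (1, 1, '2020-01-01', 'post_enter'),
    (2, 1, '2020-01-01', 'post_submit'),
    (3, 2, '2020-01-01', 'post_enter'),
    (4, 2, '2020-01-01', 'post_canceled'),
    (5, 3, '2020-01-02', 'post_enter'),
    (6, 3, '2020-01-02', 'post_submit'),
])

# Success rate = submits / enters, grouped by day.
rows = conn.execute("""
    SELECT DATE(created_at) AS dt,
           1.0 * SUM(action = 'post_submit') / SUM(action = 'post_enter')
               AS success_rate
    FROM events
    WHERE created_at >= '2020-01-01' AND created_at < '2020-02-01'
    GROUP BY DATE(created_at)
    ORDER BY dt
""").fetchall()
print(rows)  # [('2020-01-01', 0.5), ('2020-01-02', 1.0)]
```

Note the `1.0 *` factor, which forces floating-point division so the rate isn’t truncated to an integer.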
More Context: You want to build a naïve recommender, and you are given two tables: one called friends (with user_id and friend_id columns) and another called page_likes (with user_id and page_id). Note: the recommender shouldn’t suggest pages that the user already likes.
Let’s solve this Meta SQL interview question by visualizing what kind of output we want from the query. Given that we have to create a metric by which to evaluate recommending a user to a page, we know we want something with a user_id and a page_id along with some sort of recommendation score.
Let’s try to think of an easy way to represent the scores of each user_id and page_id combo. One naïve method would be to create a score by summing up the total likes by friends on each page that the user has not currently liked. Then the max value on our metric will be the most recommendable page.
The first thing we have to do is then to write a query to associate users to their friends’ liked pages. We can do that easily with an initial join between the two tables.
WITH t1 AS (
SELECT
f.user_id
, f.friend_id
, pl.page_id
FROM friends AS f
INNER JOIN page_likes AS pl
ON f.friend_id = pl.user_id
)
What comes next to reach the desired output?
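One way to finish (a sketch, not the only valid answer): aggregate friend likes per (user, page) to form the score, then exclude pages the user already likes with a LEFT JOIN and an IS NULL check. Runnable against toy data through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (user_id INTEGER, friend_id INTEGER)")
conn.execute("CREATE TABLE page_likes (user_id INTEGER, page_id INTEGER)")
conn.executemany("INSERT INTO friends VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO page_likes VALUES (?, ?)",
                 [(2, 100), (3, 100), (3, 200), (1, 200)])  # user 1 already likes page 200

rows = conn.execute("""
    WITH t1 AS (
        SELECT f.user_id, f.friend_id, pl.page_id
        FROM friends AS f
        INNER JOIN page_likes AS pl ON f.friend_id = pl.user_id
    )
    SELECT t1.user_id, t1.page_id, COUNT(*) AS num_friend_likes
    FROM t1
    LEFT JOIN page_likes AS pl
        ON t1.user_id = pl.user_id AND t1.page_id = pl.page_id
    WHERE pl.page_id IS NULL          -- drop pages the user already likes
    GROUP BY t1.user_id, t1.page_id
    ORDER BY num_friend_likes DESC
""").fetchall()
print(rows)  # [(1, 100, 2)]: page 100 liked by 2 friends; page 200 excluded
```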
transactions table:

| Columns | Type |
|---|---|
| id | INTEGER |
| user_id | INTEGER |
| created_at | DATETIME |
| product_id | INTEGER |
| quantity | INTEGER |

users table:

| Columns | Type |
|---|---|
| id | INTEGER |
| name | VARCHAR |
| sex | VARCHAR |

products table:

| Columns | Type |
|---|---|
| id | INTEGER |
| name | STRING |
| price | FLOAT |
Write a query to compute the average order value (AOV) for Men and Women. Would the difference in AOV be significant?
Note: Remember to round your answer to two decimal places.
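A sketch of one solution, treating each transaction as an order whose value is quantity × price (an assumption, since the tables don’t define “order” explicitly), run against toy data through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, user_id INTEGER, created_at TEXT, product_id INTEGER, quantity INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, sex TEXT)")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT, price REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, 'a', 'M'), (2, 'b', 'F')])
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(10, 'x', 5.0), (11, 'y', 20.0)])
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", [
    (1, 1, '2020-01-01', 10, 2),   # order value 10.0
    (2, 1, '2020-01-02', 11, 1),   # order value 20.0
    (3, 2, '2020-01-03', 11, 2),   # order value 40.0
])

# Join out to users for sex and products for price, then average per group.
rows = conn.execute("""
    SELECT u.sex, ROUND(AVG(t.quantity * p.price), 2) AS aov
    FROM transactions AS t
    JOIN users AS u ON t.user_id = u.id
    JOIN products AS p ON t.product_id = p.id
    GROUP BY u.sex
    ORDER BY u.sex
""").fetchall()
print(rows)  # [('F', 40.0), ('M', 15.0)]
```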
Part 1:
Write a query to compute a metric to measure the quality of the search results for each query.
This is an unusual SQL problem, given that it asks you to define a metric and then write a query to compute it. Generally, though, it should be pretty simple: can we rank by the metric and figure out which query has the best overall results?
For example, if the search query for ‘tiger’ has 5s for each result, then that would be a perfect result.
The way to compute that metric is simply to take the average rating of all the results, so the query can be:
SELECT query, ROUND(AVG(rating), 2) AS avg_rating
FROM search_results
GROUP BY 1
Part 2: You want to be able to compute a metric that measures the precision of the above ranking system based on position. For example, if the results for dog and cat are:
| query | result_id | position | rating | notes |
|---|---|---|---|---|
| dog | 1000 | 1 | 2 | picture of hotdog |
| dog | 998 | 2 | 4 | dog walking |
| dog | 342 | 3 | 1 | zebra |
| cat | 123 | 1 | 4 | picture of cat |
| cat | 435 | 2 | 2 | cat memes |
| cat | 545 | 3 | 1 | pizza shops |
We would rank ‘cat’ as having a better search result ranking precision than ‘dog’ based on the correct sorting by rating.
Write a query to create a metric that can validate and rank the queries by their search result precision. Round the metric (avg_rating column) to 2 decimal places.
The precision metric is a little more difficult now that we have to account for a second factor: position. We now have to find a way to weight the rating by position to normalize the metric score.
This type of problem can get very complicated if we wanted to dive deeper into it. However, the question is clearly geared toward being practical: figuring out a metric and developing a simple SQL query, rather than building a search-ranking precision scale that optimizes for something like CTR.
To solve the problem, it’s helpful to look at the example to construct an approach towards a metric. For example, if the first result is rated at 5 and the last result is rated at a 1, that’s good. Even better, however, is if the first result is rated 5 and the last result is also rated 5. Bad is if the first result is 1 and the last result is 5.
However, if we use the approach from Part 1, we’ll get the same metric score no matter which way the values are ranked by position. So how do we factor position into the ranking?
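One reasonable choice (an assumption, not the only valid metric) is a position-weighted average: weight each rating by 1/position so early results count more, then normalize by the total weight. A sketch using the dog/cat example above, through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search_results (query TEXT, result_id INTEGER, position INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO search_results VALUES (?, ?, ?, ?)", [
    ('dog', 1000, 1, 2), ('dog', 998, 2, 4), ('dog', 342, 3, 1),
    ('cat', 123, 1, 4), ('cat', 435, 2, 2), ('cat', 545, 3, 1),
])

# Weighted average: SUM(rating/position) / SUM(1/position).
# High ratings at the top of the list pull the score up more.
rows = conn.execute("""
    SELECT query,
           ROUND(SUM(rating * 1.0 / position) / SUM(1.0 / position), 2)
               AS avg_rating
    FROM search_results
    GROUP BY query
    ORDER BY avg_rating DESC
""").fetchall()
print(rows)  # [('cat', 2.91), ('dog', 2.36)]: cat ranks higher, as expected
```

Note this matches the intuition in the example: ‘cat’, whose ratings are correctly sorted, scores higher than ‘dog’, even though both have the same plain average.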
employees table:

| Columns | Types |
|---|---|
| id | INTEGER |
| first_name | VARCHAR |
| last_name | VARCHAR |
| salary | INTEGER |
| department_id | INTEGER |

departments table:

| Columns | Types |
|---|---|
| id | INTEGER |
| name | VARCHAR |
Let’s approach this problem by looking at the output of what the response would look like.
We know that we need to calculate, for each department, the share of employees making over $100K. This means we’re going to have to run a GROUP BY
on the department name, since we want one row per department.
We also need a formula to represent how we can differentiate employees that make over $100K and those that make less. We can calculate that by formulating:
(Number of people making over $100K) / (Total number of people in that department)
Now in terms of implementation, if we first do a JOIN between the employees table and the departments table, we can get all of the data points we need together. Then all that is left is to compute the ratio per department.
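A sketch of one solution, exploiting the fact that averaging a 0/1 flag gives a proportion, run against toy data through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, first_name TEXT, last_name TEXT, salary INTEGER, department_id INTEGER)")
conn.execute("CREATE TABLE departments (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, 'engineering'), (2, 'sales')])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?)", [
    (1, 'a', 'a', 120000, 1), (2, 'b', 'b', 90000, 1),
    (3, 'c', 'c', 150000, 2), (4, 'd', 'd', 110000, 2),
])

# AVG over a 0/1 comparison gives the share of employees over $100K.
rows = conn.execute("""
    SELECT d.name,
           ROUND(AVG(e.salary > 100000), 2) AS pct_over_100k
    FROM employees AS e
    JOIN departments AS d ON e.department_id = d.id
    GROUP BY d.name
    ORDER BY d.name
""").fetchall()
print(rows)  # [('engineering', 0.5), ('sales', 1.0)]
```

The `AVG(flag)` trick is equivalent to the (over $100K) / (total) formula above, just in one expression.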
users table:

| Columns | Types |
|---|---|
| id | INTEGER |
| name | VARCHAR |
| neighborhood_id | INTEGER |
| created_at | DATETIME |

neighborhoods table:

| Columns | Type |
|---|---|
| id | INTEGER |
| name | VARCHAR |
| city_id | INTEGER |
Write a query that returns all of the neighborhoods that have 0 users.
Whenever a question asks about finding values with 0 of something (users, employees, posts, etc.), immediately think of the concept of LEFT JOIN! An inner join returns only rows that match in both tables, while a left join keeps every row from the left table, whether or not it has a match.
Our task is to find all the neighborhoods without users. To do this, we must do a left join from the neighborhoods table to the users table, which will give us an output like this:
| neighborhoods.name | users.id |
|---|---|
| castro | 123 |
| castro | 124 |
| cole valley | null |
| castro heights | 534 |
| castro heights | 564 |
If we then add a condition of WHERE users.id IS NULL, we will get every neighborhood without a single user.
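Put together, the full query looks like this sketch, run against toy data through Python’s sqlite3 (schemas follow the tables above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE neighborhoods (id INTEGER, name TEXT, city_id INTEGER)")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, neighborhood_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO neighborhoods VALUES (?, ?, ?)",
                 [(1, 'castro', 1), (2, 'cole valley', 1), (3, 'castro heights', 1)])
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (123, 'u1', 1, '2020-01-01'), (124, 'u2', 1, '2020-01-02'),
    (534, 'u3', 3, '2020-01-03'),
])

# LEFT JOIN keeps every neighborhood; unmatched ones carry NULL user ids.
rows = conn.execute("""
    SELECT n.name
    FROM neighborhoods AS n
    LEFT JOIN users AS u ON n.id = u.neighborhood_id
    WHERE u.id IS NULL
""").fetchall()
print(rows)  # [('cole valley',)]
```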
Let’s use SQL’s DATEDIFF function to calculate the difference between a user’s first and last session dates. We’ll use the MIN and MAX functions to identify these dates and restrict our query to sessions in 2020 with a WHERE clause using the YEAR function. Finally, we’ll GROUP BY user_id to get a summary row for each user.
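A sketch of that approach on a hypothetical sessions table; note SQLite (used here via Python’s sqlite3) has no DATEDIFF or YEAR, so `julianday()` and `strftime('%Y', ...)` stand in for them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sessions (user_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO sessions VALUES (?, ?)", [
    (1, '2020-01-01'), (1, '2020-03-01'),
    (2, '2020-02-15'),
    (1, '2019-12-31'),   # outside 2020, excluded
])

# MIN/MAX find each user's first and last 2020 session; the julianday
# difference plays the role of DATEDIFF.
rows = conn.execute("""
    SELECT user_id,
           CAST(julianday(MAX(created_at)) - julianday(MIN(created_at))
                AS INTEGER) AS days_between
    FROM sessions
    WHERE strftime('%Y', created_at) = '2020'
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)  # [(1, 60), (2, 0)]
```

In MySQL or SQL Server, the same logic would use `DATEDIFF(MAX(created_at), MIN(created_at))` and `YEAR(created_at) = 2020` directly.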
Note: Display only quarters where at least one transaction occurred. Quarter names should be Q1, Q2, Q3, and Q4; Q1 runs from January to March.
Categorize transaction dates into fiscal quarters, with Q1 covering January to March, Q2 from April to June, Q3 from July to September, and Q4 from October to December. Calculate the total spending for each department—IT, HR, and Marketing—within these quarters. Sum any transactions from other departments into a separate “Other” category. Group the results by quarter to provide a summary of departmental spending across the fiscal year.
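A sketch of one solution on a hypothetical transactions table (the column names here are assumptions, since the prompt doesn’t give a schema), run through Python’s sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (department TEXT, amount REAL, created_at TEXT)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", [
    ('IT', 100.0, '2021-02-10'),
    ('HR', 50.0, '2021-05-01'),
    ('Legal', 30.0, '2021-05-02'),   # folded into 'Other'
    ('Marketing', 70.0, '2021-11-20'),
])

# Month -> quarter via integer division; non-listed departments -> 'Other'.
rows = conn.execute("""
    SELECT 'Q' || ((CAST(strftime('%m', created_at) AS INTEGER) + 2) / 3)
               AS quarter,
           CASE WHEN department IN ('IT', 'HR', 'Marketing')
                THEN department ELSE 'Other' END AS dept,
           SUM(amount) AS total_spend
    FROM transactions
    GROUP BY quarter, dept
    ORDER BY quarter, dept
""").fetchall()
print(rows)
```

Grouping on the derived quarter and bucketed department means empty quarters simply produce no rows, which satisfies the "only quarters with transactions" requirement.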
In nearly all Meta roles, including Meta data science internships, SQL interview questions are an essential part of the interview. Prep for these questions with a solid study of the basics. You need a strong foundation of SQL expertise to ace this part of the technical screen. But you’ll also want plenty of experience writing advanced queries as well, especially case-based SQL questions.
Looking for more topics like this? Visit our blog! We talk about everything from PostgreSQL interview questions to machine learning case studies and articles on careers in data science.
Discover more with the resources at Interview Query!