Meta (Facebook) SQL Interview Questions

Meta (Facebook) SQL Interview Questions

Overview

In many of our crowd-sourced interview guides, Facebook data scientists, analysts, and engineers are all asked SQL interview questions, more than the average technical interview. That’s why it’s extremely important to prepare for the SQL section of the Facebook technical screen, as it’s sure to come up.

Facebook candidates can be expected to demonstrate a solid foundation of SQL knowledge – if not expertise – in several different categories. With such broad constraints for the technical screen, it may be hard to determine exactly what to focus on for the SQL portion. To help you, we’ve prepared some sample questions to prep for Facebook SQL interviews.

What SQL Questions Get Asked at Facebook?

It’s helpful to prep when you have an idea of the SQL questions that get asked at Facebook. Essentially, these questions fall into two categories:

  1. Definition-based questions - Definition questions are used to quickly assess your SQL knowledge. These will be simple questions like, “What is a JOIN?” or more theoretical questions like “Why would you use an index in SQL?” You’ll need a solid grasp of key terms and concepts to nail this portion of the Facebook interview.

  2. Writing queries - Facebook interviews ask a lot of SQL questions. And the majority tend to fall into this category. In Facebook SQL interviews, you’ll likely be asked to write a query using two or more datasets. These are generally based on Facebook (or Instagram) cases.

How Much SQL Is Asked in Facebook Interview?

To answer this question, it really depends on the role. In some positions, like product analysts - for example - you’ll use SQL everyday, and will likely be asked a wide range of SQL query questions. Other roles, like data engineers, will use SQL for data processing and database management, which might ask more theoretical questions.

Here’s a look at SQL by role:

  • Product Strategist - Interviews for strategist roles require the ability to quickly isolate insights from product and user data. Focus your prep on writing beginner-to-advanced queries in SQL.
  • Data Analyst - Facebook employees analysts in several capacities, including product, community relations and business intelligence. These roles will require the ability to write SQL using customer and product data.
  • Data Engineer - Data engineer roles will get asked query questions. But also more theoretical SQL questions around query optimization, database management, and database design.
  • Data Scientist - Many Facebook data scientist roles are aligned with analytics, e.g. using analytics to make data-driven product decisions. SQL (as well as a scripting language like Python) are important skills. These questions tend to ask query questions - from basics, to more complex queries.

SQL Concepts for Facebook Interview

Here are some sample Facebook SQL questions that are definitions, as well as theoretical. As you study, focus on those definitions (especially being able to explain them in layman’s terms), as well as how they might apply theoretically.

1. What is a foreign key in SQL? What role does it play?

A foreign key is a field or multiple fields in one table that can refer back to the primary key in another. The table with the primary key is referred to as the parent or referenced table, while the one with the foreign key is the child table. Foreign keys are used to provide the referential integrity between tables.

2. Are blank spaces or zero values treated the same as NULL?

No, NULL is not the same as a zero or blank space. Instead, NULL is used in the absence of any value. In other words, the value is unavailable, unknown, unassigned or not appropriate. Zeroes and blank spaces are treated differently, and they can be compared to other zeroes or blank spaces, respectively. NULL cannot be compared to another NULL.

3. What is meant by SQL injection? What would be the first step in preventing or mitigating an attack?

SQL injection, also known as SQLI, is a type of vulnerability that uses malicious SQL code to give attacks access to the backend database. Attackers can then manipulate the database and gain access to private information like user lists and customer details.

One of the first steps would be input validation, to identify the illegitimate user inputs.

4. What’s the difference between IN and BETWEEN operators?

Both allow you to find multiple values from the table. But a key difference is the type of data you are selecting with these operators. For example, BETWEEN selects a range of data between two values. Alternatively, IN allows you select multiple values.

5. What is a cursor?

In SQL, a cursor is a temporary memory or workstation. They store database tables, and in SQL there are two main types:

  1. Implicit cursors- Implicit cursors are allocated by the SQL server when users perform DML operations.
  2. Explicit cursors- Explicit cursors are based on user inputs.

6. Explain what a trigger is in SQL. When would you use a trigger?

A trigger is a stored procedure that automatically runs when a certain event occurs in the database server. For example, DML triggers run when a user tries to modify data. Triggers can be used to:

  • Audit database activity
  • Implement rules
  • To enforce referential integrity

Facebook SQL Coding Questions

In the technical screen, you’ll be asked a range of query-based SQL questions. These will ask you to write queries based on provided datasets. The best prep strategy is to practice and write lots of sample queries aligned to real case studies. Here are questions that will help you develop that skill:

1. Write a SQL query to create a histogram of the number of comments per user in January 2020. Assume bin buckets class intervals of one.

users table

Columns Type
id INTEGER
name STRING
created_at DATETIME
neighborhood_id INTEGER
mail STRING

comments table

Columns Type
user_id INTEGER
body VARCHAR
created_at DATETIME

Hint: Since a histogram is just a display of frequencies of each user, all we really need to do is get the total count of user comments in the month of January 2020 for each user, and then group by that count.

2. In the table below, column action represents either (‘post_enter’, ‘post_submit’, ‘post_canceled’) for when a user starts a post (enter), ends up canceling it (cancel), or ends up posting it (submit). Write a query to get the post success rate for each day in the month of January 2020.

events table

Columns Type
id INTEGER
user_id INTEGER
created_at DATETIME
action STRING
url STRING
platform STRING

Hint: Let’s see if we can clearly define the metrics we want to calculate before just jumping into the problem. We want post success rate for each day over the past week. How would we get that metric?

We can assume post success rate can be defined as: (total posts created) / (total posts entered)

Additionally, since the success rate must be broken down by day, we must make sure that a post that is entered must be completed on the same day.

Now that we have these requirements, it’s time to calculate our metrics. We know we have to GROUP BY the date to get each day’s posting success rate. We also have to break down how we can compute our two metrics of total posts entered and total posts actually created.

3. Write a SQL query to create a metric to recommend pages for each user based on recommendations from pages their friends have already liked.

More Context: You want to build a naïve recommender, and you are given two tables. One is called friends (user_id and friend_id) and another called page_likes (with user_id and page_id). Note: it shouldn’t recommend pages that the user already likes.

Let’s solve this Facebook SQL interview question by visualizing what kind of output we want from the query. Given that we have to create a metric by which to evaluate recommending a user to a page, we know we want something with a user_id and a page_id along with some sort of recommendation score.

Let’s try to think of an easy way to represent the scores of each user_id and page_id combo. One naïve method would be to create a score by summing up the total likes by friends on each page that the user has not currently liked. Then the max value on our metric will be the most recommendable page.

The first thing we have to do is then to write a query to associate users to their friends’ liked pages. We can do that easily with an initial join between the two tables.

WITH t1 AS (
    SELECT 
        f.user_id 
        , f.friend_id
        , pl.page_id 
    FROM friends AS f
    INNER JOIN page_likes AS pl
        ON f.friend_id = pl.user_id
)

What comes next to reaching the desired output?

4. Given three tables representing customer transactions and customer attributes, write a query to get the average order value by gender.

transactions table

Columns Type
id integer
user_id INTEGER
created_at DATETIME
product_id INTEGER
quantity INTEGER

users table

Columns Type
id INTEGER
name VARCHAR
sex VARCHAR

products table

Columns Type
id INTEGER
name STRING
price FLOAT

5. Say our example output from the query in Question 4 is:

Men

  • AOV: $46.3
  • Total purchases: 2500
  • Unique purchasers: 1500

Women

  • AOV: $50.2
  • Total purchases: 3500
  • Unique purchasers: 500

Would the difference in AOV (average order value) be significant?

Note: Remember to round your answer to two decimal places.

6. You’re given a table that represents search results from searches on Facebook. The query column is the search term, position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.

Part 1:

Write a query to compute a metric to measure the quality of the search results for each query.

This is an unusual SQL problem, given it asks to define a metric and then write a query to compute it. Generally, this should be pretty simple. Can we rank by the metric and figure out which query has the best overall results?

For example, if the search query for ‘tiger’ has 5s for each result, then that would be a perfect result.

The way to compute that metric would be to simply take the average of the rating for all of the results. In which the query can very easily be:

sql
SELECT query, ROUND(AVG(rating), 2) AS avg_rating 
FROM search_results 
GROUP BY 1

Part 2: You want to be able to compute a metric that measures the precision of the above ranking system based on position. For example, if the results for dog and cat are:

query result_id position rating notes
dog 1000 1 2 picture of hotdog
dog 998 2 4 dog walking
dog 342 3 1 zebra
cat 123 1 4 picture of cat
cat 435 2 2 cat memes
cat 545 3 1 pizza shops

We would rank ‘cat’ as having a better search result ranking precision than ‘dog’ based on the correct sorting by rating.

Write a query to create a metric that can validate and rank the queries by their search result precision. Round the metric (avg_rating column) to 2 decimal places.

The precision metric is a little more difficult now that we have to account for a second factor, which is position. We now have to find a way to weight the position in accordance to the rating to normalize the metric score.

This type of problem set can get very complicated if we wanted to dive deeper into it. However, the question is clearly more marked towards being practical in figuring out the metric and developing an easy SQL query, rather than developing a search ranking precision scale that optimizes for something like CTR.

To solve the problem, it’s helpful to look at the example to construct an approach towards a metric. For example, if the first result is rated at 5 and the last result is rated at a 1, that’s good. Even better, however, is if the first result is rated 5 and the last result is also rated 5. Bad is if the first result is 1 and the last result is 5.

However, if we use the approach from Part 1, we’ll get the same metric score no matter which way the values are ranked by position. So how do we factor position into the ranking?

7. Given the following tables, select the top 3 departments with at least ten employees and rank them according to the percentage of their employees making over 100K in salary.

employees table

Columns Types
id int
first_name VARCHAR
last_name VARCHAR
salary INTEGER
department_id INTEGER

departments table

Columns Types
id INTEGER
name VARCHAR

Let’s approach this problem by looking at the output of what the response would look like.

We know that we need to calculate the total number employees that are making over $100K by each department. This means that we’re going to have to run a GROUP BY on the department name since we want a new row for each department.

We also need a formula to represent how we can differentiate employees that make over $100K and those that make less. We can calculate that by formulating:

(Number of people making over $100K) / (Total number of people in that department)

Now in terms of implementation, if we first do a JOIN between the employees table and the departments table, then we can get all of the data points we need together. Then all that is left is:

  • a function to get all of the employees
  • a function to get all employees making over 100K
  • dividing those values by each other

8. We’re given two tables, a users table with demographic information and the neighborhood they live in and a neighborhoods table.

users table

Columns Types
id INTEGER
name VARCHAR
neighborhood_id INTEGER
created_at DATETIME

neighborhoods table

Columns Type
id INTEGER
name VARCHAR
city_id INTEGER

Write a query that returns all of the neighborhoods that have 0 users.

Whenever the question asks about finding values with 0 something (users, employees, posts, etc..) immediately think of the concept of LEFT JOIN! An inner join finds any values that are in both tables, a left join keeps only the values in the left table.

Our predicament is to find all the neighborhoods without users. To do this, we must do a left join from the neighborhoods table to the users table, which will give us an output like this:

neighborhoods.name users.id
castro 123
castro 124
cole valley null
castro heights 534
castro heights 564

If we then add in a where condition of WHERE users.id IS NULL, then we will get every single neighborhood without a singular user.

Conclusion

In nearly all Facebook roles, including Facebook data science internships, SQL interview questions are an essential part of the interview. Prep for these questions with a solid study of the basics. You need a strong foundation of SQL expertise to ace this part of the technical screen. But you’ll also want plenty of experience writing advanced queries as well, especially case-based SQL questions.