LinkedIn Data Scientist Interview Guide



The data scientist role at LinkedIn is generally focused on the business side rather than engineering; it functions more like a product analyst and analytics job than the data science roles at many other companies.

LinkedIn’s data science team leverages billions of data points to empower member engagement, business growth, and monetization efforts. With over 500 million members worldwide and a mix of B2B and B2C programs, the data science team has a huge impact in determining product direction.

LinkedIn Data Scientist Interview Process

The LinkedIn data science interview process is relatively straightforward. Recruiters at LinkedIn like to dogfood their own product, so they will likely send you a message or InMail through LinkedIn to schedule a 30-minute phone screen, during which they’ll ask behavioral interview questions and see if the role is a good fit.

Technical Screen

The initial technical screen consists of two separate phone interviews, each lasting 30 to 45 minutes.

One interview is more technical, testing concepts in SQL and data processing, while the other runs through a product and business case study. Depending on how your process is scheduled, either interview could come first. However, you are not guaranteed the second interview if you do poorly on the first. Both interviewers will be employees on the LinkedIn data science team, and each call leaves ample time at the end for your own questions.

We’ve gathered this data from parsing thousands of interview experiences sourced from members.

LinkedIn Data Scientist Interview Questions

Typically, interviews at LinkedIn vary by role and team, but they follow fairly standardized data science interview questions across these topics.

LinkedIn Product and Business Case Study Questions

1. We’re working on a new feature for a LinkedIn chat. We want to implement a green dot to show an “active user,” but given engineering constraints, we can’t A/B test it before release.

How would you analyze the effectiveness of this new feature?

Hint: While it may be tempting to interpret increased usage of LinkedIn chat as a sign that the green dot is “effective,” you may need a more solid metric that relates this increased usage to the profit-generating aspects of LinkedIn’s business model.

2. We’re given a dataset of page views where each row represents one page view. How would you differentiate between scrapers and real people?

There is no exact right answer for problems like this. Modeling-based theoretical questions are meant to assess whether you can make realistic assumptions and come up with a solution under those assumptions. The discussion will likely follow the path the interviewer explores as you make assumptions and draw conclusions.

We’re given a dataset of page views with likely scrapers and real users visiting the site. Because a scraper intends to extract data out of the LinkedIn network, a scraper will almost surely have a lot of page views, and the duration of these views will likely be rather short since a robotic scraper can process information much faster than a human (it just needs to download the fetched page and do some simple processing, say extract URLs that lead to other pages on LinkedIn).

Link traversal patterns would also differ: we’d expect real users to navigate mostly through links on the site, whereas a scraper would make requests to arbitrary URLs. A real user, on the other hand, tends to visit fewer pages and spend more time on each visit.
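These assumptions can be turned into a quick heuristic sketch. A minimal version, assuming we only have per-view durations; the thresholds (200+ views, under 2 seconds average) are illustrative assumptions, not values from the interview:

```python
from collections import defaultdict

def label_scrapers(page_views, min_views=200, max_avg_duration=2.0):
    """Heuristically flag user_ids as likely scrapers based on two assumed
    signals: an unusually high page-view count and a very short average
    view duration. page_views: iterable of (user_id, duration_seconds)."""
    totals = defaultdict(lambda: [0, 0.0])  # user_id -> [view_count, total_duration]
    for user_id, duration in page_views:
        totals[user_id][0] += 1
        totals[user_id][1] += duration
    labels = {}
    for user_id, (count, total) in totals.items():
        avg_duration = total / count
        labels[user_id] = count >= min_views and avg_duration <= max_avg_duration
    return labels

# A scraper-like user: 300 views averaging 0.5s; a human: 12 views averaging 45s.
views = [("bot", 0.5)] * 300 + [("alice", 45.0)] * 12
print(label_scrapers(views))  # {'bot': True, 'alice': False}
```

In a real interview you would discuss how to pick these thresholds from the data (e.g., looking at the distribution of views per user) rather than hard-coding them.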

LinkedIn SQL Interview Questions

1. We’re interested in analyzing the career paths of data scientists, given a table of user experiences representing each person’s past work experiences and timelines. The titles we care about are bucketed into data scientist, senior data scientist, and data science manager.

user_experiences table

Column Type
id integer
user_id integer
title string
company string
start_date datetime
end_date datetime
is_current_role boolean

Write a query to prove or disprove the following hypothesis: a data scientist who switches jobs more often gets promoted to a manager role faster than a data scientist who stays at one job longer.

The hypothesis is that data scientists who switch jobs more often get promoted faster.

Therefore, in analyzing this dataset, we can test this hypothesis by segmenting the data scientists by how often they have switched jobs in their careers.

For example, if we looked at data scientists who have been in the field for 5 years, the hypothesis would be supported if the share of data science managers increased with the number of job switches:

Never switched jobs: 10% are managers
Switched jobs once: 20% are managers
Switched jobs twice: 30% are managers
Switched jobs three times: 40% are managers

We could look at this over different buckets of time to see if the correlation stays consistent after 10 years and 15 years in a data science career.
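A hedged sketch of how that segmentation query could look, run with Python’s sqlite3 on made-up rows. The titles, dates, and the rows-per-user proxy for job switches are all illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_experiences (
    id INTEGER, user_id INTEGER, title TEXT, company TEXT,
    start_date TEXT, end_date TEXT, is_current_role INTEGER);
-- user 1: two job switches, reached manager; user 2: no switches, still a DS
INSERT INTO user_experiences VALUES
 (1, 1, 'data scientist',        'A', '2015-01-01', '2016-01-01', 0),
 (2, 1, 'senior data scientist', 'B', '2016-01-01', '2018-01-01', 0),
 (3, 1, 'data science manager',  'C', '2018-01-01', NULL,         1),
 (4, 2, 'data scientist',        'D', '2015-01-01', NULL,         1);
""")

# Bucket users by number of job switches (rows per user minus one, a
# simplifying assumption) and compute the share of each bucket that
# ever reached the manager title.
query = """
WITH per_user AS (
    SELECT user_id,
           COUNT(*) - 1 AS num_switches,
           MAX(title = 'data science manager') AS is_manager
    FROM user_experiences
    GROUP BY user_id
)
SELECT num_switches,
       ROUND(100.0 * AVG(is_manager), 1) AS pct_managers
FROM per_user
GROUP BY num_switches
ORDER BY num_switches
"""
print(conn.execute(query).fetchall())  # [(0, 0.0), (2, 100.0)]
```

A fuller answer would also control for total years in the field, as described above.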

2. Given a table of job postings, write a query to break down the number of users who have posted their jobs once versus the number of users who have posted at least one job multiple times.

job_postings table

Column Type
id integer
job_id integer
user_id integer
date_posted datetime

First, let’s visualize what the output would look like.

We want the value of two different metrics, the number of users who have posted their jobs once versus the number of users who have posted at least one job multiple times. What does that mean exactly?

If a user has 5 jobs but only posts each of them once, they fall into the first group. But if they have 5 jobs and posted 7 times, they must have posted at least one job multiple times.
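That observation reduces to comparing each user’s total postings against their distinct job_ids. A minimal sketch with sqlite3 on made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_postings (
    id INTEGER, job_id INTEGER, user_id INTEGER, date_posted TEXT);
-- user 1 posts two jobs once each; user 2 posts the same job twice
INSERT INTO job_postings VALUES
 (1, 101, 1, '2019-01-01'),
 (2, 102, 1, '2019-02-01'),
 (3, 201, 2, '2019-01-05'),
 (4, 201, 2, '2019-03-05');
""")

# If a user's total postings exceed their distinct job_ids, at least one
# job was posted more than once.
query = """
WITH per_user AS (
    SELECT user_id,
           COUNT(*) AS n_posts,
           COUNT(DISTINCT job_id) AS n_jobs
    FROM job_postings
    GROUP BY user_id
)
SELECT SUM(n_posts = n_jobs) AS posted_once,
       SUM(n_posts > n_jobs) AS posted_multiple_times
FROM per_user
"""
print(conn.execute(query).fetchall())  # [(1, 1)]
```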

3. Given a table of transactions and products, write a function to get the month-over-month change in revenue for the year 2019. Make sure to round month_over_month to 2 decimal places.

transactions table

Column Type
id integer
user_id integer
created_at datetime
product_id integer
quantity integer

products table

Column Type
id integer
name string
price float

Whenever there is a question on month-over-month, week-over-week, or year-over-year change, note that it can generally be done in two different ways.

One is using the LAG function that is available in certain SQL services. Another is to do a sneaky join.

For both, we’ll have to first sum the transactions and group by the month and the year. Grouping by the year is generally redundant in this case because we are only looking for the year 2019.

WITH monthly_transactions AS (
    SELECT
        MONTH(created_at) AS month,
        YEAR(created_at) AS year,
        SUM(price * quantity) AS revenue
    FROM transactions AS t
    INNER JOIN products AS p
        ON t.product_id = p.id
    WHERE YEAR(created_at) = 2019
    GROUP BY 1, 2
)

SELECT
    month,
    ROUND(
        (revenue - LAG(revenue) OVER (ORDER BY month)) * 100.0
            / LAG(revenue) OVER (ORDER BY month)
    , 2) AS month_over_month
FROM monthly_transactions
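The “sneaky join” alternative can be sketched in SQLite, which lacks MONTH()/YEAR(), so strftime is used instead; each month’s revenue is joined to the previous month’s by matching on month + 1. The data is made up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER, user_id INTEGER, created_at TEXT,
    product_id INTEGER, quantity INTEGER);
CREATE TABLE products (id INTEGER, name TEXT, price REAL);
INSERT INTO products VALUES (1, 'widget', 10.0);
INSERT INTO transactions VALUES
 (1, 1, '2019-01-15', 1, 10),
 (2, 1, '2019-02-20', 1, 15);
""")

# Self-join version of month-over-month change: no LAG needed.
query = """
WITH monthly AS (
    SELECT CAST(strftime('%m', created_at) AS INTEGER) AS month,
           SUM(price * quantity) AS revenue
    FROM transactions t
    JOIN products p ON t.product_id = p.id
    WHERE strftime('%Y', created_at) = '2019'
    GROUP BY 1
)
SELECT cur.month,
       ROUND((cur.revenue - prev.revenue) / prev.revenue * 100, 2)
           AS month_over_month
FROM monthly cur
JOIN monthly prev ON cur.month = prev.month + 1
ORDER BY cur.month
"""
print(conn.execute(query).fetchall())  # [(2, 50.0)]
```

January revenue is 100 and February is 150, so February’s month-over-month change is 50.0%.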

4. Write a query to return the percentage of users that satisfy either of the following conditions.

posted a job that is more than 180 days old
posted a job that has the same job_id as a previous job posting(s) that is more than 180 days old

job_postings table

Column Type
id integer
job_id integer
user_id integer
date_posted datetime
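Both conditions collapse to the same check: the user posted some job_id that has at least one posting older than 180 days. A sketch with sqlite3, assuming “old” means more than 180 days before today (the rows are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_postings (
    id INTEGER, job_id INTEGER, user_id INTEGER, date_posted TEXT);
INSERT INTO job_postings VALUES
 (1, 101, 1, date('now', '-200 days')),  -- old posting: user 1 qualifies
 (2, 102, 2, date('now', '-10 days')),   -- recent, unique job: user 2 does not
 (3, 101, 3, date('now', '-5 days'));    -- recent repost of old job: user 3 qualifies
""")

query = """
WITH old_jobs AS (
    SELECT DISTINCT job_id
    FROM job_postings
    WHERE date_posted < date('now', '-180 days')
)
SELECT ROUND(
    100.0 * COUNT(DISTINCT CASE
        WHEN job_id IN (SELECT job_id FROM old_jobs) THEN user_id END)
    / COUNT(DISTINCT user_id), 2) AS pct_users
FROM job_postings
"""
print(conn.execute(query).fetchall())  # [(66.67,)]
```

Two of the three users qualify, giving 66.67%.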

5. We’re given two tables, a table of notification deliveries, and a table of users with created and purchase conversion dates. The conversion_date column is NULL if the user hasn’t purchased.

notification_deliveries table

Column Type
notification varchar
user_id int
created_at datetime

users table

Column Type
id int
created_at datetime
conversion_date datetime

Write a query to get the distribution of total push notifications before a user converts.

If we’re looking for the distribution of total push notifications before a user converts, we can evaluate that we want our result to look something like this:

total_pushes frequency
0 100
1 250
2 300

To get there, the JOIN between users and notification_deliveries has to follow a few logical conditions:

1. Join on the user_id field in both tables.
2. Exclude all users that have not converted.
3. To count only the notifications sent before the purchase, require the conversion_date value to be greater than the created_at value in the deliveries table.
4. Use a LEFT JOIN so that users who converted off of zero push notifications are still included.

We can get the count per user and then group by that count to get the overall distribution.
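The four conditions above can be sketched in sqlite3 on made-up rows (notification names and dates are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER, created_at TEXT, conversion_date TEXT);
CREATE TABLE notification_deliveries (
    notification TEXT, user_id INTEGER, created_at TEXT);
INSERT INTO users VALUES
 (1, '2020-01-01', '2020-01-10'),   -- converts after 2 pushes
 (2, '2020-01-01', '2020-01-05'),   -- converts with zero pushes
 (3, '2020-01-01', NULL);           -- never converted: excluded
INSERT INTO notification_deliveries VALUES
 ('push', 1, '2020-01-02'),
 ('push', 1, '2020-01-03'),
 ('push', 2, '2020-01-20');         -- sent after conversion: ignored
""")

query = """
WITH pushes AS (
    SELECT u.id, COUNT(n.user_id) AS total_pushes
    FROM users u
    LEFT JOIN notification_deliveries n
        ON n.user_id = u.id
       AND n.created_at < u.conversion_date
    WHERE u.conversion_date IS NOT NULL
    GROUP BY u.id
)
SELECT total_pushes, COUNT(*) AS frequency
FROM pushes
GROUP BY total_pushes
ORDER BY total_pushes
"""
print(conn.execute(query).fetchall())  # [(0, 1), (2, 1)]
```

Note that the pre-conversion filter lives in the ON clause, not the WHERE clause, so the LEFT JOIN still keeps zero-notification converters.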

LinkedIn Statistics and Probability Interview Questions

1. A deck of 500 cards, numbered from 1 to 500, is shuffled randomly. If you are asked to pick three cards, one at a time, what’s the probability of each subsequent card being larger than the previously drawn card?

Hint: Suppose instead the deck had 100 cards and you select 3 without replacement. Does the answer change? Treat this as a sample space problem and ignore the distracting details. If you draw three different numbered cards without replacement, they are all unique, so there is effectively a lowest, a middle, and a highest card.

Let’s make it easy and assume we drew the numbers 1, 2, and 3. In our scenario, if we drew (1, 2, 3), that would be the winning scenario. But what’s the full range of outcomes we could draw?
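Any 3 distinct cards admit 3! = 6 equally likely orderings, and exactly one of them is increasing, so the probability is 1/6 regardless of deck size. A brute-force check on small decks (the deck sizes are chosen only to keep enumeration tractable):

```python
from itertools import permutations

def prob_increasing(deck_size, k=3):
    """Exact probability that k cards drawn one at a time without
    replacement come out in strictly increasing order, computed by
    enumerating every ordered draw of k cards."""
    draws = list(permutations(range(1, deck_size + 1), k))
    wins = sum(1 for d in draws if list(d) == sorted(d))
    return wins / len(draws)

# Independent of deck size: always 1/6 for three cards.
print(prob_increasing(10))  # 0.16666...
print(prob_increasing(20))  # 0.16666...
```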

LinkedIn Machine Learning Interview Questions

1. You’re working on a job recommendation engine. You have access to all user LinkedIn profiles, a list of jobs each user applied to, and answers to questions that the user filled in about their job search.

Using this information, how would you build a job recommendation feed?

For this problem, we have to understand what our dataset consists of before being able to build a model for recommendations. More importantly, we need to understand what a recommendation feed might look like for the user.

For example, we’re expecting that the user could go to a tab or open up a mobile app and then view a list of recommended jobs sorted by highest recommended at the top.

We can use either an unsupervised or a supervised model. For an unsupervised model, we could use nearest neighbors or a collaborative filtering algorithm on features from users and jobs. But if we want more accuracy, we would likely go with a supervised classification algorithm.

If we use a supervised model, we need to understand our training dataset in the form of features and a metric output label (whether the user applied or not).

The expected result is that for each user, we will have user feature data in the form of their profiles, plus user activity data extracted from the questions they have answered. Additionally, we’ll have all of the jobs that each user applied to. What we’re missing is data on the jobs that the user did not apply to.
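One common way to fill that gap is negative sampling: treat jobs the user never applied to as negative examples. A minimal sketch; the function name, the toy users and jobs, and the 1:1 sampling ratio are all illustrative assumptions:

```python
import random

def build_training_pairs(jobs, applications, neg_per_pos=1, seed=0):
    """Create labeled (user, job) pairs for a supervised model: applied
    jobs are positives (label 1); for each positive, sample a job the
    user never applied to as a negative (label 0). Assumes each user has
    at least one non-applied job, otherwise the loop would not finish."""
    rng = random.Random(seed)
    applied = set(applications)
    pairs = [(user, job, 1) for user, job in applications]
    for user, _ in applications:
        for _ in range(neg_per_pos):
            neg = rng.choice(jobs)
            while (user, neg) in applied:
                neg = rng.choice(jobs)
            pairs.append((user, neg, 0))
    return pairs

jobs = ["ds_mgr", "ml_eng", "analyst", "swe"]
applications = [("alice", "ds_mgr"), ("bob", "ml_eng")]
pairs = build_training_pairs(jobs, applications)
print(pairs)  # two positives plus two randomly sampled negatives
```

These labeled pairs, combined with the profile and activity features described above, form the training set for the classifier.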

More LinkedIn Data Scientist Interview Questions below:

View all LinkedIn Data Scientist questions

Check out Interview Query's SQL Course

This course is designed to help you learn everything you need to know about working with data, from basic concepts to more advanced techniques.

LinkedIn Data Scientist Salary


Base Salary: Min $106K, Median $126K, Mean $137K, Max $200K (251 data points)

Total Compensation: Min $133K, Median $250K, Mean $249K, Max $381K (68 data points)

View the full Data Scientist at LinkedIn salary guide

LinkedIn Data Scientist Jobs

Senior Data Scientist, Causal Inference
Senior Staff Data Scientist, Flagship Data Science
Staff Data Scientist, Inference Algorithms
Senior Software Engineer, Applications
Sr. Product Manager, Account Abuse
Staff Software Engineer, Storage Engineering
Senior Software Engineer, ML Systems (Foundational AI)
Software Engineer, Systems Infrastructure