LinkedIn Data Scientist Role
LinkedIn’s Data Science team leverages billions of data points to empower member engagement, business growth, and monetization efforts. With over 500 million members around the world and a mix of B2B and B2C programs, the data science team has a huge impact on product direction. The data science job at LinkedIn is generally focused on the business side rather than engineering, and the role functions more like a product analyst or analytics role at many other companies.
There are four levels within the data science team at LinkedIn:
- Data Scientist
- Senior Data Scientist
- Staff Data Scientist
- Principal Data Scientist
Each level has its own compensation band. Regular or associate data scientists at LinkedIn are generally candidates with less than two years of experience, while principal data scientists generally have around 10 or more years of experience.
Typical responsibilities for the role include:
- Work with a team of high-performing data science professionals and with cross-functional teams to identify business opportunities and optimize product performance or go-to-market strategy.
- Analyze large-scale structured and unstructured data; develop deep-dive analyses and machine learning models to drive member value and customer success.
- Design and develop core business metrics, and create insightful automated dashboards and data visualizations to track them and extract useful business insights.
- Design and analyze experiments to test new product ideas or go-to-market strategies, and convert the results into actionable recommendations.
The LinkedIn data scientist interview process is relatively straightforward. Recruiters at LinkedIn like to dogfood their own product, so they will likely contact you about the data science role through a LinkedIn message or InMail. The recruiter will then schedule a 30-minute phone screen to learn about your interest in LinkedIn and see whether the role is a good fit.
Because the LinkedIn data science role is so business-facing, the recruiter may transfer you after the call to a recruiter on the machine learning team. The data science team does have a more algorithmic, engineering-heavy track, but it’s only a small subset of LinkedIn’s overall data science team.
The Technical Screen
The initial technical screen consists of two separate phone interviews, each lasting 30 to 45 minutes.
One interview is technical and tests SQL and data processing concepts. The other is a business and product case study that tests your critical thinking skills. Depending on how your interviews are scheduled, either one may come first. However, you are not guaranteed the second interview if you do poorly on the first. Both interviewers are members of LinkedIn’s data science team, and each interview leaves ample time at the end for your questions.
Business and Product Case Study
In the product and business case study screen, the interviewer gauges how you approach business questions and problems, and how creatively and clearly you reason through them as you solve them. Expect questions on product intuition and analytics.
It helps to spend time using LinkedIn’s products from the perspective of a data scientist tasked with improving or developing them.
For example, understand what LinkedIn’s business objectives are. How would you design features and analyze data to reach actionable conclusions? At its core, this interview asks how you would design a feature for LinkedIn and then analyze its performance, so be ready to measure the feature’s success quantitatively.
To prepare for this part of the interview, put yourself in the shoes of the product team that built the product or feature and ask yourself questions like:
- What could be done to improve the product?
- What metrics would you consider when answering questions about the health, growth, or engagement of a product?
- How would you measure the success of different parts of the product?
- What metrics would you assess when trying to solve business problems related to LinkedIn’s products?
- How would you tell if a product is performing well or not?
- How would you set up an experiment to evaluate any new products or improvements?
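One concrete step in setting up an experiment is deciding how many users each arm needs. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name, the 5% significance level, the 80% power default, and the baseline and target rates in the usage line are all assumptions for illustration, not LinkedIn figures.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control: float, p_treatment: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per arm for a two-sided two-proportion test."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_treatment - p_control    # minimum lift we want to detect
    n = (z_alpha + z_beta) ** 2 * variance / effect ** 2
    return math.ceil(n)  # round up: a fractional user is not enough


# Hypothetical: a 10% baseline rate, hoping to detect a lift to 12%.
print(sample_size_per_group(0.10, 0.12))
```

With these made-up rates the answer is roughly 3,800 users per arm, and halving the detectable lift roughly quadruples it, which is a useful point to raise when discussing how long a test must run.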
It’s also important to lay out a structure for your answer. Gather your thoughts in one place and organize your answer so it clearly explains how you’re investigating each problem. Example questions for this round include:
- How would you grow LinkedIn messaging?
- LinkedIn wants to release a new auto-complete messaging feature in InMail. How would you measure the success of the feature?
SQL and Data Processing
In the SQL and data processing portion, you’ll be given a series of data processing questions of increasing complexity. Solving them in SQL is strongly recommended.
The first question might only require a simple SELECT statement with a WHERE condition (SELECT * FROM table WHERE column = value). Each following question builds on the same tables and adds complexity through subqueries, complex joins, and conditional statements. After the initial question, the interviewer will likely add about three modifications to make it progressively harder. For example:
Given a table representing users, write a query to get the count of users that signed up for LinkedIn in January 2020.
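A minimal sketch of one answer, run against an in-memory SQLite database so it executes end to end. The `users` table and its `created_at` column are assumptions, since the question leaves the schema unspecified; the half-open date range is the part worth remembering.

```python
import sqlite3

# Hypothetical schema: the question does not name the columns, so
# users(id, created_at) with ISO-8601 text timestamps is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO users (id, created_at) VALUES (?, ?)",
    [
        (1, "2019-12-31 23:59:59"),  # December 2019: excluded
        (2, "2020-01-01 00:00:00"),  # first second of January: included
        (3, "2020-01-15 12:30:00"),  # mid-January: included
        (4, "2020-01-31 23:59:59"),  # last second of January: included
        (5, "2020-02-01 00:00:00"),  # February: excluded
    ],
)

JANUARY_2020_SIGNUPS = """
SELECT COUNT(*) AS num_users
FROM users
WHERE created_at >= '2020-01-01'  -- inclusive lower bound
  AND created_at <  '2020-02-01'  -- exclusive upper bound keeps all of Jan 31
"""

print(conn.execute(JANUARY_2020_SIGNUPS).fetchone()[0])  # 3
```

Comparing against the first day of the next month, rather than writing `created_at <= '2020-01-31'`, avoids silently dropping sign-ups later on January 31 when the column stores full timestamps.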
Given two additional tables representing jobs and job applications, write a query to find the job position with the most applications this year.
It’s important to brush up on these SQL and product concepts before the interview! You can find many questions like these on Interview Query, particularly data-science-specific SQL and product analytics problems.
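A sketch of the follow-up question, again in SQLite. The `jobs(id, title)` and `applications(id, job_id, applied_at)` tables are hypothetical, and "this year" is pinned to 2020 through query parameters so the result is reproducible.

```python
import sqlite3

# Hypothetical schema: jobs(id, title) and applications(id, job_id, applied_at).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE applications (
    id INTEGER PRIMARY KEY,
    job_id INTEGER REFERENCES jobs(id),
    applied_at TEXT
);
""")
conn.executemany(
    "INSERT INTO jobs (id, title) VALUES (?, ?)",
    [(1, "Data Scientist"), (2, "Software Engineer"), (3, "Product Manager")],
)
conn.executemany(
    "INSERT INTO applications (job_id, applied_at) VALUES (?, ?)",
    [
        (1, "2020-03-01"), (1, "2020-05-10"),                     # 2 this year
        (2, "2020-02-02"),                                        # 1 this year
        (2, "2019-11-20"), (2, "2019-12-01"), (2, "2019-12-15"),  # 3 last year
        (3, "2020-07-07"),                                        # 1 this year
    ],
)

MOST_APPLIED_JOB = """
SELECT j.title, COUNT(*) AS num_applications
FROM applications AS a
JOIN jobs AS j ON j.id = a.job_id
WHERE a.applied_at >= :year_start AND a.applied_at < :next_year_start
GROUP BY j.id, j.title
ORDER BY num_applications DESC
LIMIT 1
"""

# "This year" is fixed to 2020 here so the example is reproducible.
params = {"year_start": "2020-01-01", "next_year_start": "2021-01-01"}
print(conn.execute(MOST_APPLIED_JOB, params).fetchone())  # ('Data Scientist', 2)
```

Without the date filter, Software Engineer would win on all-time applications, which is the trap in the "this year" wording. `LIMIT 1` also breaks ties arbitrarily; if ties matter, ranking with a window function such as `RANK()` is a natural follow-up.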
The Onsite Interview
The onsite interview at LinkedIn is a full-day, five-hour interview in which you meet with five or more employees across the company. Some interviews may include more than one interviewer. Each interview is around 45 minutes long and leaves ample time at the end for your questions. The rounds include:
- Probability and statistics interview
- Data science manager interview / behavioral
- Data manipulation
- Product analytics problem solving
- PM partnership and product sense
- You’ll also spend one-on-one time with a data scientist over lunch to learn more about LinkedIn. This hour-long lunch is a chance to take a break or to hear what they work on at LinkedIn.
Expect advanced probability and statistics topics in the first round. In the product case round, you’re given a case study with a general question and have to come up with different ways to answer it. The PM round is closer to an A/B testing question, focusing on metrics and how you would validate them. The product case round is broader, whereas the A/B testing round digs deeper into metrics, for example, how you would know whether a change in a metric is statistically significant.
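The "is this metric significant?" follow-up often comes down to comparing a rate between control and treatment. A minimal sketch of a pooled two-proportion z-test, one standard choice when the metric is a conversion-style proportion; the function name and the counts in the usage line are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value


# Hypothetical test: control converts 1,000 of 10,000 users, treatment 1,100 of 10,000.
z, p = two_proportion_z_test(1000, 10000, 1100, 10000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Here a one-point lift on 10,000 users per arm gives a p-value of about 0.02. In the interview, the more important follow-ups are whether the metric is a proportion at all, whether users are the unit of randomization, and how many metrics are being tested at once.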
- Learn as much as you can about LinkedIn’s culture. Given how broad the role is, understand how the data science team collaborates with other teams.
- Data scientists are generally involved in most of the products. To succeed at LinkedIn, it’s important to work on big projects and drive them to completion. One of LinkedIn’s biggest growth wins, the “People You May Know” feature, was built by data scientists.
- Your interview performance is one of the main factors in determining your level and compensation. That means in every interview it’s important to structure your answers clearly and succinctly.
Sample LinkedIn Data Scientist Interview Questions
- What's your favorite kernel function?
- What’s the difference between L1 and L2 regularization, and why would you use each?
- How do you generate a uniformly distributed random number using a function that produces non-uniformly distributed output?
- Let’s say we’re given a dataset of page views where each row represents one page view. How would you differentiate between scrapers and real people?
- Write a function to sample from a multinomial distribution.
- Due to engineering constraints, the company can’t A/B test a feature before launching it. How would you analyze how the feature is performing?
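For the L1 vs. L2 regularization question, the core contrast already shows up in one dimension (equivalently, least squares with an orthonormal design, where `b` is the unregularized coefficient). The helper names below are hypothetical. The L1 penalty soft-thresholds and can set weights exactly to zero, which is why it is used for feature selection; the L2 penalty only shrinks weights proportionally, which tends to help with correlated features and stability.

```python
def lasso_coordinate(b: float, lam: float) -> float:
    """argmin_w 0.5 * (w - b)**2 + lam * |w|, i.e. soft-thresholding."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0  # small coefficients are zeroed out entirely


def ridge_coordinate(b: float, lam: float) -> float:
    """argmin_w 0.5 * (w - b)**2 + 0.5 * lam * w**2, i.e. proportional shrinkage."""
    return b / (1 + lam)


unpenalized = [3.0, 0.4, -0.2, -2.5]
lam = 0.5
print([lasso_coordinate(b, lam) for b in unpenalized])  # [2.5, 0.0, 0.0, -2.0]
print([ridge_coordinate(b, lam) for b in unpenalized])  # all shrunk, none exactly zero
```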
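For the question on generating a uniform number from a non-uniform source, one classic answer for a biased 0/1 source is von Neumann’s trick, sketched below. The biased coin and its 0.8 probability are hypothetical stand-ins for "the nonuniform function". For a continuous distribution with a known CDF F, a different answer applies: F(X) is uniform on [0, 1] (the probability integral transform).

```python
import random
from typing import Callable

_rng = random.Random(42)


def biased_coin() -> int:
    """Hypothetical nonuniform source: returns 1 with probability 0.8."""
    return 1 if _rng.random() < 0.8 else 0


def fair_bit(biased: Callable[[], int]) -> int:
    """Von Neumann's trick: turn independent biased 0/1 draws into a fair bit.

    With P(1) = p, the pairs (0, 1) and (1, 0) each occur with probability
    p * (1 - p), so keeping only unequal pairs makes 0 and 1 equally likely.
    """
    while True:
        first, second = biased(), biased()
        if first != second:
            return first  # (1, 0) -> 1, (0, 1) -> 0


def uniform_int(n_bits: int, biased: Callable[[], int]) -> int:
    """Uniform integer in [0, 2**n_bits) assembled from fair bits."""
    value = 0
    for _ in range(n_bits):
        value = (value << 1) | fair_bit(biased)
    return value


bits = [fair_bit(biased_coin) for _ in range(10_000)]
print(sum(bits) / len(bits))        # close to 0.5, not 0.8
print(uniform_int(3, biased_coin))  # some integer from 0 to 7
```

Dividing `uniform_int(n_bits, ...)` by `2**n_bits` gives an approximately uniform number in [0, 1), with resolution set by `n_bits`.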
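For the multinomial sampling question, a minimal sketch using inverse-CDF sampling: build cumulative probabilities, draw a uniform number, and binary-search for the category it falls into. The function name and signature are assumptions; in practice `numpy.random.Generator.multinomial` does this directly, but interviewers usually want the mechanics.

```python
import bisect
import itertools
import random


def sample_multinomial(n: int, probs: list[float], rng: random.Random) -> list[int]:
    """Draw n independent trials over len(probs) categories; return counts."""
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    cumulative = list(itertools.accumulate(probs))  # e.g. [0.2, 0.5, 1.0]
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random() * cumulative[-1]  # scale absorbs float round-off in the sum
        # bisect_right maps [0, c0) -> 0, [c0, c1) -> 1, ... and skips zero-probability
        # categories, whose cumulative values repeat the previous one.
        idx = min(bisect.bisect_right(cumulative, u), len(probs) - 1)
        counts[idx] += 1
    return counts


print(sample_multinomial(1000, [0.2, 0.3, 0.5], random.Random(0)))
```

Each draw costs O(log k) for k categories; when k is small, a linear scan over the cumulative list is just as reasonable to present.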
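For the question about a feature that launched without an A/B test, one common approach (named here, not by the question) is difference-in-differences: compare the before/after change for users who got the feature with the change for a comparable group that did not, such as a region or platform where the launch was delayed. It relies on the parallel-trends assumption, i.e. that both groups would have moved together without the launch. The per-user metric values below are hypothetical.

```python
from statistics import mean


def diff_in_diff(treated_pre: list[float], treated_post: list[float],
                 control_pre: list[float], control_post: list[float]) -> float:
    """Difference-in-differences estimate of a launch's effect on a metric.

    The control group's before/after change stands in for what would have
    happened to the treated group without the feature (parallel trends).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical weekly messages per user, before and after the launch.
effect = diff_in_diff(
    treated_pre=[3, 4, 5], treated_post=[4, 5, 6],     # +1.0 after launch
    control_pre=[3, 4, 5], control_post=[4, 4, 5.5],   # +0.5 from seasonality etc.
)
print(effect)  # 0.5
```

A naive pre/post comparison would credit the feature with the full +1.0; subtracting the control group’s change removes the part explained by trends that affected everyone.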