Google data science interviews include both behavioral and technical problems, covering a range of topics. Most Google behavioral questions, for instance, are centered around how well the candidate fits in with Google's work culture. On the other hand, Google's technical data science questions span multiple areas, including statistics, machine learning, coding and product sense.
Prep for your Google interview with our Data Science course, featuring modules in all technical areas - including stats, SQL and machine learning - as well as helpful practice interview questions.
What Types of Questions Get Asked in Google Data Science Interviews?
Onsite interviews for Google data science positions are demanding. The panel interview typically consists of five 45-minute interviews with various teams, and you'll be assessed on your data science knowledge in a range of areas. The most common Google data science interview question topics include:
- Behavioral Questions - These questions are designed to assess your "Googlyness," e.g. how well you work with others, how you can navigate workplace ambiguity, and how well you can work under pressure.
- Statistics & Probability - A strong emphasis is placed on statistics and probability questions in Google interviews. These questions assess your ability to explain complex statistical terms and perform statistical coding.
- Product Sense - As a data scientist at Google, you'll be tasked with using your technical knowledge to solve product and business problems. These questions assess your ability to generate insights that can be used to improve Google products.
- Coding - Google data scientists use coding every day to mine datasets and generate insights. As such, you'll likely be asked SQL, data analysis and Python coding questions.
- Machine Learning - Google's data science teams use machine learning principles to build and improve algorithms. These questions will assess your machine learning knowledge as it applies to algorithms.
Google Behavioral Interview Questions
Behavioral interview questions usually occur during the recruiter screen and throughout the onsite Google interview. These types of questions are designed to assess your ability to think on your feet, whether you are the right culture fit for Google, and your ability to communicate ideas.
Q1. Why Google?
Provide concrete examples of what interests you in a job at Google. You might talk about your love for Google's data science culture or how the company encourages employees to continuously learn and expand their skills.
Q2. Describe a past data science project you worked on.
Hint: Tell the interviewer why the project was successful. Cite any metrics that show the positive change you were able to bring about.
Q3. How do you prioritize tasks when working on many different projects?
Q4. What career goals do you have? How do you plan to achieve them?
Q5. Do you have a favorite Google product? What do you love about it?
Researching Google's products before the interview is an absolute must. Be sure you can talk confidently about most of the company's product offerings, but also have 2-3 products that you know in depth.
Q6. Describe a time when a project you were working on wasn't successful. What did you learn?
Hint: Questions like these can be intimidating. Don't be afraid to be honest. But also explain how you apply what you have learned as you approach a new project.
Google Machine Learning Questions
During Google data science interviews, a variety of machine learning questions come up. They range from basic definition-based questions about regression models or feature selection to advanced algorithm questions.
Looking for machine learning resources? Check out Interview Query's Modeling & Machine Learning and Machine Learning Systems Design courses.
Q1. What is the difference between K-means and EM?
Q2. Let's say you have a categorical variable with thousands of distinct values. How would you encode it?
Hint: Does this depend on whether the problem is asking about a regression or a classification model?
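Say it's a regression model. One way to tackle this problem is to work backwards from the response variable: group (cluster) the categories according to their relationship with the response, for example their mean response value, so that thousands of levels collapse into a manageable number of groups.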
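One way to make the grouping idea concrete is target (mean) encoding: replace each category with a smoothed mean of the response, then optionally bucket categories with similar values. This is a minimal pandas sketch; the `smoothing` weight and the function names are illustrative assumptions, and in practice the encoding should be fit on training folds only so the target doesn't leak into the features.

```python
import pandas as pd


def target_encode(df, cat_col, target_col, smoothing=20.0):
    # Global mean of the response, used to shrink rare categories toward it
    global_mean = df[target_col].mean()
    stats = df.groupby(cat_col)[target_col].agg(["mean", "count"])
    # Categories with many rows keep their own mean; rare ones lean on the global mean
    weight = stats["count"] / (stats["count"] + smoothing)
    encoding = weight * stats["mean"] + (1 - weight) * global_mean
    return df[cat_col].map(encoding)


def bucket_categories(encoded, n_buckets=10):
    # Collapse categories with similar mean response into n_buckets quantile bins
    return pd.qcut(encoded, q=n_buckets, labels=False, duplicates="drop")
```

The smoothing term matters when there are thousands of levels, because many categories will have only a handful of rows and their raw means are noisy.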
Q3. What is the function of p-values in high dimensional linear regression?
Q4. How would you build the recommendation algorithm for type-ahead search for a media company like Netflix?
Hint: We can begin to think of the solution in the form of a prefix table. A prefix table maps each prefix, that is, the string the user has typed so far, to one or more suggested completions. For an MVP, we could take the input string and output a suggestion string, then layer on fuzzy matching and context matching.
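A minimal sketch of the MVP prefix table, assuming a hypothetical input of past search strings with popularity counts: every prefix of each string maps to its most popular completions. Fuzzy and context matching are left out, and the function names are illustrative.

```python
from collections import defaultdict


def build_prefix_table(queries, max_suggestions=3):
    # queries: hypothetical dict mapping a full search string to its popularity
    table = defaultdict(list)
    for query, count in queries.items():
        lowered = query.lower()
        for i in range(1, len(lowered) + 1):
            table[lowered[:i]].append((count, query))
    # Keep only the most popular completions per prefix (ties broken alphabetically)
    return {
        prefix: [q for _, q in sorted(items, key=lambda item: (-item[0], item[1]))[:max_suggestions]]
        for prefix, items in table.items()
    }


def suggest(table, typed):
    # Look up completions for whatever the user has typed so far
    return table.get(typed.lower(), [])
```

Storing every prefix explicitly keeps lookups to a single dictionary access; a trie would store the same information more compactly if memory became a concern.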
Q5. Given two strings A and B, return whether or not A can be shifted some number of times to get B.
A = 'abcde'
B = 'cdeab'
can_shift(A, B) == True

A = 'abc'
B = 'acb'
can_shift(A, B) == False
Hint: This problem is relatively simple if we figure out the underlying algorithm that allows us to easily check for string shifts between strings A and B.
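The underlying trick: every rotation of A appears as a substring of A + A, so B is a shift of A exactly when the two strings have the same length and B occurs inside A + A. A minimal sketch:

```python
def can_shift(a: str, b: str) -> bool:
    # Every rotation of a is a substring of a + a; equal lengths rule out partial matches
    return len(a) == len(b) and b in a + a
```

The length check matters: without it, a shorter B such as 'cd' would also be found inside A + A.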
Google Statistics & Probability Questions
These questions are featured prominently in Google onsite panel interviews. To best prepare, make sure you have a strong grasp of statistical concepts, and know how to perform statistical coding with a tool like Python.
Need some help? Check out our Python statistical coding guide or our statistics and A/B testing practice problems.
Q1. Let's say we have a sample size of N. The margin of error for our sample size is 3. How many more samples would we need to decrease the margin of error to 0.3?
Hint: In order to decrease our margin of error, we'll probably have to increase our sample size. But by how much?
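Assuming the margin of error takes the usual form E = z·σ/√N, with the same confidence level and standard deviation before and after, the margin shrinks with the square root of the sample size:

```latex
E = z \frac{\sigma}{\sqrt{N}}
\quad\Rightarrow\quad
\frac{E_{\text{new}}}{E_{\text{old}}} = \sqrt{\frac{N}{N_{\text{new}}}} = \frac{0.3}{3} = \frac{1}{10}
\quad\Rightarrow\quad
N_{\text{new}} = 100N, \qquad N_{\text{new}} - N = 99N
```

So cutting the margin of error by a factor of 10 requires 100 times the samples, that is, 99N additional samples.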
Q2. What is the assumption of error in linear regression?
Q3. Let's say we use people to rate ads. From our point of view, each rater is randomly and independently one of two types:
- 80% of raters are careful and they rate an ad as good (60% chance) or bad (40% chance).
- 20% of raters are lazy and they rate every ad as good (100% chance).
1. Suppose we have 100 raters each rating one ad independently. What's the expected number of good ads?
2. Now suppose we have 1 rater rating 100 ads. What's the expected number of good ads?
3. Suppose we have 1 ad, rated as bad. What's the probability the rater was lazy?
Hint: Keep in mind that in order for the rater to rate an ad, the rater must first be selected. So the event that the rater is selected happens first, then the rating happens. How would you represent this fact arithmetically using basic properties of probability?
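The two-step structure in the hint (select a rater type, then rate) is the law of total probability, and part 3 is Bayes' rule. A minimal sketch with exact fractions (the constant names are illustrative):

```python
from fractions import Fraction

P_CAREFUL = Fraction(8, 10)
P_LAZY = Fraction(2, 10)
P_GOOD_GIVEN_CAREFUL = Fraction(6, 10)
P_GOOD_GIVEN_LAZY = Fraction(1)

# Law of total probability: a rater type is selected first, then the rating happens
p_good = P_CAREFUL * P_GOOD_GIVEN_CAREFUL + P_LAZY * P_GOOD_GIVEN_LAZY  # 17/25

# 1. 100 independent raters, one ad each
expected_good_many_raters = 100 * p_good

# 2. One rater (type drawn once) rating 100 ads: condition on that single type
expected_good_one_rater = (
    P_CAREFUL * 100 * P_GOOD_GIVEN_CAREFUL + P_LAZY * 100 * P_GOOD_GIVEN_LAZY
)

# 3. Bayes' rule: P(lazy | bad) = P(bad | lazy) * P(lazy) / P(bad)
p_bad = 1 - p_good
p_lazy_given_bad = (1 - P_GOOD_GIVEN_LAZY) * P_LAZY / p_bad
```

Both setups give 68 expected good ads; they differ in variance, because in the second setup all 100 ratings share one rater type. Since lazy raters never rate an ad as bad, the answer to part 3 is 0.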
Q4. What are the assumptions of error in linear regression?
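There are several assumptions of linear regression, and they concern both the dataset and how the model is built. For the error term, the standard assumptions are that errors have zero mean, are independent of one another, have constant variance (homoscedasticity), and are normally distributed (the last matters mainly for inference such as confidence intervals and p-values). If these assumptions are violated, the model's estimates and inferences become unreliable: "garbage in, garbage out."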
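In practice these assumptions are checked on the residuals after fitting. A minimal NumPy sketch (the function name and the half-split variance ratio are illustrative choices, not a standard named test): it fits ordinary least squares with an intercept, so the residual mean is zero by construction, and compares residual variance in the lower and upper halves of the fitted values as a crude homoscedasticity check.

```python
import numpy as np


def residual_diagnostics(x, y):
    # Fit y = b0 + b1 * x by ordinary least squares
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    # Crude homoscedasticity check: residual spread for low vs. high fitted values
    order = np.argsort(fitted)
    half = len(order) // 2
    var_low = resid[order[:half]].var()
    var_high = resid[order[half:]].var()
    return {"mean_resid": resid.mean(), "var_ratio": var_high / var_low}
```

A variance ratio far from 1 hints that the error variance changes with the fitted values; normality is usually checked separately, for example with a Q-Q plot of the residuals.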
Q5. Explain how a probability distribution could be non-normal and give an example scenario.
Hint: Think about quantities that are generally normally distributed. Are there other things we might want to measure that behave differently? Normal distributions typically describe things like size, mass, or content, but what about measures like time, random-number generators, or likelihood?
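Waiting times are a classic example: they can't be negative and have a long right tail, which is often modeled with an exponential distribution. A short standard-library sketch (the mean wait of 5 and the function name are arbitrary assumptions) shows the skew: the sample mean lands well above the median, which wouldn't happen for a symmetric normal distribution.

```python
import random
import statistics


def simulate_wait_times(n=10_000, mean_wait=5.0, seed=42):
    rng = random.Random(seed)
    # expovariate takes the rate parameter, i.e. 1 / mean
    return [rng.expovariate(1 / mean_wait) for _ in range(n)]


waits = simulate_wait_times()
print(statistics.mean(waits), statistics.median(waits))
```

For an exponential distribution the theoretical median is the mean times ln 2, roughly 0.69 of the mean.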
Q6. You flip a fair coin 576 times. Without using a calculator, calculate the probability of flipping at least 312 heads.
Hint: What sort of probability distribution should we use to model experiments with only two outcomes?
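A binomial with this many trials is well approximated by a normal distribution, and 576 was chosen so the square roots come out clean. A sketch of the arithmetic, using a continuity correction:

```latex
X \sim \mathrm{Binomial}(576,\ 0.5), \qquad
\mu = 576 \cdot 0.5 = 288, \qquad
\sigma = \sqrt{576 \cdot 0.5 \cdot 0.5} = \sqrt{144} = 12

P(X \ge 312) \approx P\!\left(Z \ge \frac{311.5 - 288}{12}\right) \approx P(Z \ge 1.96) \approx 0.025
```

Without the continuity correction, 312 sits exactly 2 standard deviations above the mean, and the 68-95-99.7 rule gives roughly (1 - 0.95)/2 = 2.5%.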
Q7. What is the difference between parametric and non-parametric testing?
Q8. Let's say you have a function that outputs a random integer between a minimum value, N, and maximum value, M.
Now let's say we take the output from the random integer function and place it into another random function as the max value with the same min value N.
1. What would the distribution of the samples look like?
2. What would be the expected value?
Hint: This question asks you about two different random variables, one of which is conditional on the result of the other. How would you model this relationship?
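Assuming "between N and M" means inclusive integer bounds, the first draw X is uniform on {N, ..., M} and the second draw Y is uniform on {N, ..., X}. By the law of total expectation, E[Y] = E[(N + X)/2] = (N + (N + M)/2)/2 = (3N + M)/4. The sketch below computes the exact distribution with fractions (function names are hypothetical):

```python
from fractions import Fraction


def nested_uniform_pmf(n, m):
    # X ~ uniform integer on [n, m]; then Y ~ uniform integer on [n, X]
    pmf = {y: Fraction(0) for y in range(n, m + 1)}
    p_x = Fraction(1, m - n + 1)
    for x in range(n, m + 1):
        for y in range(n, x + 1):
            pmf[y] += p_x * Fraction(1, x - n + 1)
    return pmf


def expected_value(pmf):
    return sum(y * p for y, p in pmf.items())
```

The resulting probability mass function is non-increasing in y: every value of X can produce N, but only the largest X can produce M, so the samples pile up near N.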
Q9. You have a deck and you take one card at random and guess what the card is. What is the probability you guess right?
Google Product & Business Case Questions
Your interview will likely include business and product case study questions. You'll dive deep on product or business metrics, and be tasked with proposing solutions, analyzing the success of a feature, and measuring results.
Need some help with case interview questions? Check out our guides for data science case study interviews and product data science interviews.
Q1. How would you detect inappropriate content on YouTube?
Q2. You are a data scientist at YouTube focused on creators. A PM comes to you worried that amateur video creators could do well before but now it seems like only “superstars” do well.
What data points and metrics would you look at to decide if this is true or not?
Hint: With questions like these, try to rephrase it as a hypothesis. What hypothesis could you draw from the information provided?
Q3. How would you investigate a 10% drop in usage on Google Docs?
Hint: The first step in product case questions is to clarify the question. With this example, you would want some clarity on the type of drop (e.g. time on page, storage, etc.), as well as the timeframe for the usage drop.
Q4. Let's say we're given a dataset of page views where each row represents one page view.
How would you differentiate between scrapers and real people?
Hint: Modeling-based theoretical questions are meant to assess whether you can make realistic assumptions and come up with a solution under these assumptions.
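One simple baseline under stated assumptions is a rule on request timing. The sketch assumes a hypothetical schema of `(user_id, timestamp_seconds)` pairs and hypothetical thresholds (`max_median_gap`, `min_views`); a real solution would tune these or feed such features into a classifier.

```python
from collections import defaultdict
from statistics import median


def flag_scrapers(page_views, max_median_gap=2.0, min_views=20):
    # page_views: iterable of (user_id, timestamp_seconds) pairs (hypothetical schema)
    times = defaultdict(list)
    for user_id, ts in page_views:
        times[user_id].append(ts)
    flagged = set()
    for user_id, ts_list in times.items():
        ts_list.sort()
        # Too few views to judge reliably
        if len(ts_list) < min_views:
            continue
        gaps = [later - earlier for earlier, later in zip(ts_list, ts_list[1:])]
        # Assumption: real people rarely sustain very short gaps between page views
        if median(gaps) < max_median_gap:
            flagged.add(user_id)
    return flagged
```

Using the median gap rather than the mean keeps a single long pause from hiding an otherwise machine-like pattern.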
Q5. How do you test if a new feature has increased engagement in Google's ecosystem?
Q6. If the outcome of an experiment shows one group clicking 5% more than the other, is that a good result?
Hint: Always ask for clarity. With a question like this, we'd need more information to answer.
Q7. Let's say that your company is running a standard control and variant A/B test on a feature to increase conversion rates on the landing page. The PM checks the results and finds a p-value of 0.04.
How would you assess the validity of the result?
Hint: What is the interviewer leaving out, and how might we rephrase the question for clarity? We could likely rephrase it as: How do you set up and measure an A/B test correctly?
Q8. Given three independent and identically distributed random variables drawn from a uniform distribution on [0, 4], what is the probability that the median is greater than 3?
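Part of validating the result is recomputing the p-value from the raw counts. A minimal sketch of a two-sided two-proportion z-test using only the standard library (the function name and inputs are illustrative); it assumes the sample size was fixed in advance and users were properly randomized, since peeking at results or stopping early invalidates the p-value.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value
```

For example, 200 of 2,000 control users converting versus 250 of 2,000 variant users gives z of about 2.5 and a two-sided p-value of about 0.012.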
If we break down this question, we'll find that another way to phrase it is to ask for the probability that at least two of the variables are larger than 3. Looking at the combinations of outcomes that satisfy the condition, we can divide them into two mutually exclusive events:
- Event A: All three random variables are larger than 3.
- Event B: Exactly one random variable is smaller than 3 and the other two are larger than 3.
Since these two events are mutually exclusive and together cover every way the median can exceed 3, we add their probabilities: P(Median > 3) = P(A) + P(B).
Let's calculate the probability of event A. The probability that a single variable falls between 3 and 4 is 1/4, so the probability of event A is:
P(A) = (1/4) * (1/4) * (1/4) = 1/64
For event B, exactly two values must be greater than 3 and one must be smaller than 3. The probability of a value being greater than 3 is 1/4 and the probability of a value being less than 3 is 3/4. Any one of the three variables can be the one below 3, so we multiply by 3:
P(B) = 3 * ((3/4) * (1/4) * (1/4)) = 9/64
Therefore the total probability is P(A) + P(B) = 1/64 + 9/64 = 10/64 = 5/32 ≈ 0.156.
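The same count generalizes as a binomial tail: the median of three draws exceeds 3 exactly when at least two of them do. A short sketch that checks 5/32 both exactly and by Monte Carlo simulation (function names are illustrative):

```python
import random
from fractions import Fraction
from math import comb


def p_median_above_3():
    p = Fraction(1, 4)  # P(X > 3) for X ~ Uniform(0, 4)
    # At least 2 of the 3 draws must exceed 3
    return sum(comb(3, k) * p**k * (1 - p) ** (3 - k) for k in range(2, 4))


def simulate(trials=100_000, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        draws = sorted(rng.uniform(0, 4) for _ in range(3))
        hits += draws[1] > 3  # draws[1] is the median of three values
    return hits / trials
```

The k = 2 term reproduces P(B) = 9/64 and the k = 3 term reproduces P(A) = 1/64.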
Check out the Interview Query Statistics course for more practice with statistical concepts and coding.
Google Programming Interview Questions
At Google, data scientists work with vast datasets and use code to generate insights and solutions. Typically, statistical coding (with a tool like Python), SQL queries and algorithmic coding are all covered in Google interviews for data science positions.
Q1. Write a function to generate N samples from a normal distribution and plot the histogram.
Hint: This is a relatively simple problem: we set up our distribution, generate N samples from it, and then plot them. In this question, we make use of the SciPy library, which is made for scientific computing.
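A minimal sketch. The hint mentions SciPy, where `scipy.stats.norm(loc, scale).rvs(size=n)` would do the sampling; this version swaps in NumPy's random generator for sampling and Matplotlib for plotting. Matplotlib is imported inside the plotting function so the sampling part works without it; the function names are illustrative.

```python
import numpy as np


def normal_samples(n, mean=0.0, std=1.0, seed=None):
    # Draw n samples from N(mean, std^2)
    rng = np.random.default_rng(seed)
    return rng.normal(loc=mean, scale=std, size=n)


def plot_histogram(samples, bins=30):
    import matplotlib.pyplot as plt  # imported here so sampling works without matplotlib

    plt.hist(samples, bins=bins, edgecolor="black")
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()
```

Usage: `plot_histogram(normal_samples(1000, mean=5, std=2, seed=1))`.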
Q2. You’re given two dataframes. One contains information about addresses and the other contains relationships between various cities and states. Write a function to create a single dataframe with complete addresses in the format of street, city, state, zip code.
Tip: Follow the link to find the relevant data on Interview Query for this question.
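The linked data isn't reproduced here, so this pandas sketch assumes a hypothetical layout: `df_addresses` has a single `address` column formatted as "street, city, zipcode", and `df_cities` has `city` and `state` columns. The idea is to split each address, merge on city, and reassemble the parts.

```python
import pandas as pd


def complete_addresses(df_addresses, df_cities):
    # Split from the right so a street containing commas stays intact (assumed format)
    parts = df_addresses["address"].str.rsplit(", ", n=2, expand=True)
    parts.columns = ["street", "city", "zipcode"]
    # Left merge keeps every address and preserves the original row order
    merged = parts.merge(df_cities, on="city", how="left")
    merged["address"] = (
        merged["street"] + ", " + merged["city"] + ", " + merged["state"] + ", " + merged["zipcode"]
    )
    return merged[["address"]]
```

If the same city name appeared in several states, the merge would duplicate rows, and the zip code would be needed to pick the right state.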
Q3. You are given the layout of a rectangular building with rooms forming a grid. Each room has four doors to the room to the north, east, south, and west where exactly one door is unlocked and the other three doors are locked. In each time step, you can move to an adjacent room via an unlocked door.
Your task is to determine the minimum number of time steps required to get from the northwest corner to the southeast corner of the building.
The input is given as:
- a non-empty 2d-array of letters 'N', 'E', 'S', 'W' named 'building'
- building[0][0] represents the open door at the northwest corner.
- The rows of this array are associated with north-south direction.
- The columns are associated with east-west direction.
Expected Output: 6
Q4. Given a percentile threshold and N samples, write a function to simulate a truncated normal distribution.
percentile_threshold = 0.75
n = 6
truncated_dist(n, percentile_threshold)
# with mean of 2 and std deviation of 1
output = [2, 1.1, 2.2, 3, 1.5, 1.3]
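The question doesn't say which tail is cut, so this sketch assumes the distribution is truncated above at the given percentile of a normal with the example's mean of 2 and standard deviation of 1. It uses rejection sampling, with the standard library's `statistics.NormalDist` to find the cutoff; `mean`, `std`, and `seed` are added parameters.

```python
import random
from statistics import NormalDist


def truncated_dist(n, percentile_threshold, mean=2.0, std=1.0, seed=None):
    dist = NormalDist(mu=mean, sigma=std)
    # Value of the normal distribution at the requested percentile
    cutoff = dist.inv_cdf(percentile_threshold)
    rng = random.Random(seed)
    samples = []
    # Rejection sampling: draw from the full normal, keep only values at or below the cutoff
    while len(samples) < n:
        x = rng.gauss(mean, std)
        if x <= cutoff:
            samples.append(x)
    return samples
```

The acceptance rate equals `percentile_threshold`, so this is efficient for thresholds like 0.75; for very small thresholds, inverse-CDF sampling would avoid wasting draws.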
Q5. Let's say you're given a list of standardized test scores from high schoolers from grades 9 to 12.
Given the dataset, write code in Pandas to return the cumulative percentage of students that received scores within the buckets of <50, <75, <90, <100.
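A pandas sketch, assuming hypothetical columns `grade` and `score` and reporting results per grade. Because the buckets are cumulative, each one is simply the share of students scoring below its threshold; under this literal reading, a perfect score of 100 falls outside "<100".

```python
import pandas as pd


def cumulative_buckets(df, thresholds=(50, 75, 90, 100)):
    rows = []
    for grade, group in df.groupby("grade"):
        for t in thresholds:
            # Share of this grade's students scoring strictly below the threshold
            pct = (group["score"] < t).mean() * 100
            rows.append({"grade": grade, "bucket": f"<{t}", "percentage": round(pct)})
    return pd.DataFrame(rows)
```

Returning one row per grade and bucket keeps the output long-format; `pivot(index="grade", columns="bucket", values="percentage")` reshapes it into one row per grade if needed.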
Q6. The schema below is for a retail online shopping company consisting of two tables, attribution and user_sessions.
- The attribution table logs a session visit for each row.
- If conversion is true, then the user converted to buying on that session.
- The channel column represents which advertising platform the user was attributed to for that specific session.
- Lastly, the `user_sessions` table maps session visits back to users: many sessions can belong to one user.
First touch attribution is defined as the channel with which the converted user was associated when they first discovered the website.
Calculate the first touch attribution for each user_id that converted.
`attribution`

| column | type |
| ----------- | -------- |
| session_id | integer |
| channel | varchar |
| conversion | boolean |

`user_sessions`

| column | type |
| ----------- | -------- |
| session_id | integer |
| created_at | datetime |
| user_id | integer |
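The query can be sketched in SQL and exercised with Python's built-in `sqlite3` module. It assumes the columns described above, stores `conversion` as 0/1, and needs SQLite 3.25 or later for the `ROW_NUMBER()` window function; the function and constant names are illustrative.

```python
import sqlite3

FIRST_TOUCH_SQL = """
WITH converted_users AS (
    -- Users with at least one converting session
    SELECT DISTINCT s.user_id
    FROM user_sessions AS s
    JOIN attribution AS a ON a.session_id = s.session_id
    WHERE a.conversion = 1
),
ranked_sessions AS (
    -- Number each converted user's sessions from earliest to latest
    SELECT s.user_id,
           a.channel,
           ROW_NUMBER() OVER (PARTITION BY s.user_id ORDER BY s.created_at) AS rn
    FROM user_sessions AS s
    JOIN attribution AS a ON a.session_id = s.session_id
    WHERE s.user_id IN (SELECT user_id FROM converted_users)
)
SELECT user_id, channel
FROM ranked_sessions
WHERE rn = 1
ORDER BY user_id;
"""


def first_touch(conn: sqlite3.Connection):
    # Returns (user_id, channel) pairs for every user who converted
    return conn.execute(FIRST_TOUCH_SQL).fetchall()
```

The first CTE restricts the result to converted users; the window function then picks each user's earliest session regardless of which session converted.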
Looking for more SQL practice problems? Prep for your Google coding interview with our SQL Interview Questions guide.
How Google's Onsite Panel Interview Process Works
Typically, Google onsite interviews consist of five 45-minute panels, where you'll meet with various teams. These interviews are very product-focused, so as you prep, be sure you have strong knowledge of Google's product suite.
The data science interview panels at Google look like this:
- Business case study: Questions involve real-life Google problems. The interviewer may then ask you to write a SQL query to analyze the business case study.
- Applied statistics and ML interview: This interview covers statistical concepts and modeling questions as well as a related coding question.
- Product Metrics: This interview will be a deep dive into a product and how to analyze success of a feature or debug what might be happening in the data.
- Leadership and Product Sense Interview: This interview assesses your leadership skills. The aim here is to understand how you leverage your communication and decision-making to influence others.
- Googlyness Interview: This interview is about how well you work with others, help team members achieve team goals, navigate workplace ambiguity, and work under pressure.
How to Prepare for Your Google Interview
Plan to spend plenty of time preparing for the rigorous interview process. Brush up on your technical skills, and work through as many practice interview questions and mock interviews as possible. A few tips for acing your Google interview include:
- Know Your Google Products: It is worth noting that Google questions are standardized and rely heavily on situational scenarios involving their products. Study Google's broad range of products and understand how you would personally improve or test them.
- Be Data Driven: Google’s data science interviews assess how well you can provide business-driving insights with data science. Brush up on your knowledge of statistics and probability, since these questions can be some of the hardest to solve.
- Embody the Spirit: Google has an employee-focused culture that motivates employees to share information cross-functionally, supporting the innovation that keeps the company competitive. Employees are supported both through formal training and informally through personalized leadership and management support.
What Google Is Looking For In Data Science Interviews
There are four general attributes that Google looks for in candidates.
- First is general cognitive ability, which assesses how well candidates learn and adapt to new situations.
- The second is role-related knowledge, which is based on background, skill sets, and experience that are specific and relevant to the role.
- The third is the leadership attribute. Google’s core culture is about building a team of high-performing individuals who are great team players and can one day step into leadership roles.
- The fourth and last attribute is Googlyness, which helps ensure candidates succeed in their roles. Google assesses “comfort with ambiguity,” “bias to action,” and a “collaborative nature.”
Thanks for Reading!
We have plenty of great resources to help you prep for your Google interview. Check out Google interview experiences for data science from our members. And be sure to see our Google Product Analyst interview guide.