If you’re applying for a data scientist position at almost any company, A/B testing and statistics questions are bound to come up in the interview process. Statistics and A/B testing are among the most fundamental topics covered in data science and analytics interviews. Today we’ll go into how to approach, solve, and prepare for different statistics and A/B testing interview questions.
Need to brush up on your statistics and A/B testing skills? Check out the A/B Testing and Statistics course from Interview Query.
Types of A/B Testing & Statistics Interview Questions
In general, interview questions on A/B testing and statistics will test your practical knowledge of these concepts, as well as encourage you to solve practice problems. Here's a closer look at the types of A/B testing and stats questions you'll likely be asked:
The A/B Testing Case Study Question - This type of A/B testing interview question revolves around a hypothetical A/B testing scenario. The goal is to evaluate candidates based on their practical working knowledge of how to design a functional A/B test. You may be given a specific set of two (or more) features that a business wants to compare in an A/B test, then be asked how you would go about setting up the test, account for confounding variables, and measure the significance of your results.
Statistical Concept Interview Questions - This type of question tests two things:
- Your conceptual knowledge of statistics
- Your ability to communicate statistical information to a layperson
Probability and Statistics Questions - These questions are simply meant to assess your core capabilities as a statistician and tend to be pretty straightforward with a definite right or wrong answer. You might be asked to compute mean and variance in a non-normal distribution, or to compute the relationship between a sample size and its margin of error. There’s no limit to the number of questions or their variety.
Looking at what math skills you need for data science jobs? Take a look at our recent guide: Essential Math for Data Science.
A/B Testing: Sample Interview Questions
You'll want to approach A/B testing case study questions with a clear understanding of the process of A/B test experiment design. Therefore, you want to take into account all of the different factors that should be considered when designing an A/B test. The more that you can demonstrate a thorough understanding of the scope of the problem, the more attractive you'll seem as a candidate. Some sample split testing interview questions include:
Q1. What types of questions would you ask when designing an A/B test?
This type of question is assessing your foundational knowledge of A/B test design. You don't want to jump in too quickly without first getting an understanding of the problem the test aims to solve. Some questions you might ask include:
- What does our sample population look like?
- Have we taken steps to ensure that our control and test groups are truly randomized?
- Is this a multivariate A/B test? If so, how does that affect the significance of our results?
Q2. How would you approach designing an A/B test? What factors would you consider?
In general, you should start by understanding what you want to measure. From there, you can begin to design and implement a test. There are four key aspects to consider:
- Setting Metrics - A good metric is simple, directly related to the goal at hand, and quantifiable. Every experiment should have one key metric that determines whether the experiment was a success or not.
- Constructing Thresholds - Determine by what degree your key metric must change in order for the experiment to be considered successful.
- Sample Size and Experiment Length - How large a group are we going to test on, and for how long?
- Randomization and Assignment - Who gets which version of the test, and when? We need at least one control group and one variant group. As the number of variants increases, the number of groups that we need increases too, which is something to consider for multivariate testing.
Q3. What significance level would you target in an A/B test?
Typically, the significance level of an experiment is 0.05 and the power is 0.8, but these values may shift depending on how much change must be detected in order to implement the design change, which can be related to external factors, like the time needed to implement the change once the decision has been made.
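A p-value of less than 0.05 means that, if there were truly no difference between the control and the variant, a result at least as extreme as the one observed would be expected less than 5% of the time. It is evidence against the null hypothesis, not proof that your hypothesis is correct.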
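The conventional 0.05 significance level and 0.8 power translate directly into a required sample size. The sketch below uses the standard normal-approximation formula for comparing two proportions; the function name and the 10% and 12% conversion rates are hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.8):
    """Approximate users needed per group for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance_sum = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance_sum / effect ** 2)


# Hypothetical example: detect a lift from a 10% to a 12% conversion rate.
n_per_group = sample_size_per_group(0.10, 0.12)
print(n_per_group)
```

Tightening the significance level or asking for more power increases the required sample size, as does trying to detect a smaller change.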
Q4. Let's say that your company is running a standard control and variant A/B test on a feature to increase conversion rates on the landing page. The PM checks the results and finds a p-value of 0.04.
How would you assess the validity of the result?
Hint: Is the interviewer leaving out important details? Are there more assumptions that we can make about how the A/B test was set up and measured that might reveal the result to be invalid?
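When assessing a reported p-value, it can help to recompute it from the raw counts. Here is a minimal sketch of a pooled two-proportion z-test, assuming the metric is a simple conversion rate; the function name and the conversion counts are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates (pooled z-test)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up counts: 500/5000 conversions in control, 560/5000 in the variant.
p_value = two_proportion_p_value(500, 5000, 560, 5000)
print(round(p_value, 3))
```

A single p-value computed this way is only trustworthy if the sample size was fixed in advance and the result was checked once, which is exactly the kind of detail worth asking the interviewer about.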
Q5. Let's say you work at Uber. A PM comes to you considering a new feature where, instead of a direct ETA estimate like 5 minutes, the app would display a range like 3-7 minutes.
How would you conduct this experiment and how would you know if your results were significant?
Q6. What are some common reasons A/B tests fail?
There are many different scenarios in which bucket testing won't reach statistical significance or the results end up unclear. Here are some reasons an A/B test might fail:
- Not enough data - A sufficiently large sample size is key for an effective A/B test. If a landing page isn't receiving enough traffic, you likely won't reach the sample size needed to detect an effect.
- Your metrics aren't clearly defined - An A/B test is only as effective as its metrics. If you haven't clearly defined what you're measuring or your hypothesis can't be quantified, your A/B test will be a muddled mess.
- Testing too many variables - Trying to test too many variables in a single test can lead to unclear results.
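One reason that testing many things at once muddies results is the multiple comparisons problem: each extra comparison adds another chance of a false positive. The sketch below assumes the comparisons are independent and shows the Bonferroni correction as one common remedy. The correction is not mentioned in the list above, and the function names are hypothetical.

```python
def family_wise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_alpha(num_tests, alpha=0.05):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / num_tests


for k in (1, 5, 10):
    print(k, round(family_wise_error_rate(k), 3), bonferroni_alpha(k))
```

With ten comparisons at a 0.05 level, the chance of at least one spurious "win" is roughly 40%.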
Q6. How long should an A/B test run for?
Experiment length is a function of sample size, since you’ll need enough time to run your experiment on X users per day until you reach your total sample size. However, time introduces variance into an A/B test; there may be factors present one week that aren’t present in another, like holidays, or weekdays vs. weekends.
The rule of thumb is to run your experiment for about two weeks, provided you can reach your required sample size in that time. Most split tests run for 2-8 weeks. Ultimately, the length of the test depends on many factors like traffic volume and the variables that are being tested.
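The relationship between sample size, daily traffic, and duration can be sketched in a few lines. This is a simplified illustration: the function name, the traffic figures, and the choice to round up to whole weeks are assumptions, with the two-week minimum taken from the rule of thumb above.

```python
import math


def experiment_days(total_sample_size, daily_eligible_users, min_days=14):
    """Days needed to reach the sample size, rounded up to full weeks."""
    days_for_sample = math.ceil(total_sample_size / daily_eligible_users)
    days = max(days_for_sample, min_days)
    return math.ceil(days / 7) * 7  # whole weeks cover weekday and weekend behavior


print(experiment_days(total_sample_size=8000, daily_eligible_users=1000))   # 14
print(experiment_days(total_sample_size=60000, daily_eligible_users=2500))  # 28
```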
Q7. What are some alternatives to A/B testing? When is an alternative the better choice?
If you're looking for an alternative to A/B testing, there are two common tests that are used to make UI design decisions. They include:
- A/B/N Tests - This type of test compares several different versions at once. (The N stands for "number," i.e. the number of variations being tested.) This type of test is best for testing major UI design choices.
- Multivariate - This type of test compares multiple variables at once, i.e. all the possible combinations that can be used. Multivariate testing saves time, as you won't have to run dozens of A/B tests. This type of test is best when several UI design changes are being considered.
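To see how quickly the number of combinations in a multivariate test grows, here is a small sketch. The page elements and their options are hypothetical.

```python
from itertools import product

# Hypothetical page elements and the options being considered for each.
elements = {
    "headline": ["short", "long"],
    "button_color": ["green", "blue", "orange"],
    "hero_image": ["photo", "illustration"],
}

combinations = list(product(*elements.values()))
print(len(combinations))  # 2 * 3 * 2 = 12 variants to split traffic across
```

Every combination needs its own share of traffic, so the sample size required for the whole test grows with the number of combinations.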
Q8. What metrics might you consider in an A/B test?
In general, there are many different metrics you might consider in an A/B test. But some of the most common are:
- Impression count
- Conversion rate
- Click-through rate (CTR)
- Button hover time
- Time spent on page
- Bounce rate
Which one should you use? That depends on your hypothesis and what you're testing. If you're testing a button variation, button hover time or CTR are probably the best choices. But if you're testing messaging choices on a long-form landing page, time spent on page and bounce rate would likely be the best metrics to consider.
Q9. What are some of the common types of choices you can test with A/B testing?
In general, A/B testing works best at informing UI design changes, as well as with promotional and messaging choices. You might consider an A/B test for:
- UI design decisions
- Testing promotions, coupons or incentives
- Testing messaging variations (e.g. different headlines or calls-to-action)
- Funnel optimizations
Q10. What's the importance of randomization in split testing?
We need to make sure that users are assigned to the control and variant groups at random, so that both groups contain a comparable mix of user attributes and the results of our A/B test are valid; if we don’t randomize sufficiently, we may find ourselves faced with confounding variables further down the line.
It also matters exactly when we’re giving our users an A/B test. For instance, are we giving every new user an A/B test? How will that affect our assessment of existing users? Conversely, if we’re assigning an A/B test to all users, and some of those users signed up for the website this week, and others have been around for much longer, are we sure that the ratio of new users to existing users is representative of the larger population of the site?
Finally, we want to make sure that our control group and our variant group are of equal size so that they can be easily (and accurately) compared at the end of the test.
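One common way to get stable, roughly equal-sized random groups is to hash each user id together with the experiment name. This is a sketch of that general technique, not a description of any particular company's system; the function name, experiment name, and user ids are hypothetical.

```python
import hashlib


def assign_group(user_id, experiment_name, variants=("control", "variant")):
    """Deterministically map a user to a group using a hash of user and experiment."""
    key = f"{experiment_name}:{user_id}".encode("utf-8")
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[bucket]


# Hypothetical experiment name and user ids.
groups = [assign_group(user_id, "eta_range_test") for user_id in range(10_000)]
print(groups.count("control"), groups.count("variant"))
```

Because the assignment depends only on the user id and the experiment name, a returning user always sees the same version, and different experiments get different splits.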
Statistical Concepts Interview Questions
As a data scientist, you need to be able to perform complex analyses on large amounts of data and also communicate your findings to a number of external stakeholders. Statistical concept interview questions weed out candidates who don’t have a firm technical grasp of statistics, as well as candidates with statistics knowledge, but who struggle to communicate their findings to others.
You might be asked to describe the difference between Type I (false positive) and Type II (false negative) errors, as well as how to go about detecting them, or to describe what a result with a significance level of 0.05 actually means.
In general, if you’re familiar with the statistical concepts that are likely to be relevant to your job as a data scientist, this part of the interview should be pretty straightforward. Remember to keep your explanations simple and direct.
Q1. What is a null hypothesis?
The null hypothesis states that there is no significant difference between populations beyond what can be explained by chance or sampling error.
Q2. How would you explain p-value to someone who is unfamiliar with the term?
P-value and confidence interval are both concepts that come from statistics.
Hint: What an interviewer is looking for here is whether you can answer this question in a way that conveys both your understanding of statistics and the ability to explain this concept to a non-technical worker.
Say we're conducting an A/B test of an ad campaign. In this type of test, you have two hypotheses. The null hypothesis states that our ad campaign will not produce a measurable increase in daily active users. The test hypothesis states that our ad campaign will produce a measurable increase in daily active users.
We then use data to run a statistical test that weighs the evidence against the null hypothesis. The p-value helps here: it is the probability of observing data at least as extreme as ours if the null hypothesis were true. Note that this is just a statement about probability given an assumption. The p-value is not a measure of “how likely” the null hypothesis is to be right, nor does it measure “how likely” it is that the observations in our data are due to random chance, which are the most common misinterpretations of what the p-value is. The only thing the p-value can say is how likely we are to have gotten data like ours if the null hypothesis were true. The difference may seem abstract and impractical, but using incorrect explanations contributes to the cult-like worship of p-values in non-technical circles.
Thus, a low p-value indicates that our data would be unlikely to turn out this way if the null hypothesis were true.
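A coin-flip example makes the definition concrete. The sketch below computes an exact one-sided p-value from the binomial distribution; the function name and the data (60 heads in 100 flips, with a fair coin as the null hypothesis) are hypothetical.

```python
from math import comb


def one_sided_binomial_p_value(successes, trials, null_rate=0.5):
    """P(at least `successes` successes in `trials`) if the null rate were true."""
    return sum(
        comb(trials, k) * null_rate ** k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 60 heads in 100 flips of a coin assumed fair under the null.
p_value = one_sided_binomial_p_value(60, 100)
print(round(p_value, 4))
```

The result answers one question only: how often would a fair coin produce 60 or more heads in 100 flips? It says nothing directly about the probability that the coin is fair.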
Q3. What are the primacy and novelty effects?
The primacy effect involves users being resistant to change, while the novelty effect involves users becoming temporarily excited by new things.
Q4. What are Type I and Type II errors?
A Type I error is a ‘false positive,’ or the rejection of a true null hypothesis. A Type II error is a ‘false negative,’ or the failure to reject a false null hypothesis.
Q5. What is the difference between covariance and correlation? Provide an example.
Hint: What values can covariance take? What about correlation?
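One way to explore the hint is to compute both quantities on the same data in different units. The helper functions and the height and weight figures below are made up for illustration.

```python
import math


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


# Made-up data: heights in metres and weights in kilograms.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55, 65, 72, 78, 90]
heights_cm = [h * 100 for h in heights_m]

print(covariance(heights_m, weights_kg), covariance(heights_cm, weights_kg))
print(correlation(heights_m, weights_kg), correlation(heights_cm, weights_kg))
```

Switching from metres to centimetres multiplies the covariance by 100 but leaves the correlation unchanged, which always stays between -1 and 1.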
Q6. What is a holdback experiment?
A holdback experiment rolls out a feature to a high proportion of users, but holds back the remaining small percentage of users to monitor their behavior over a longer period of time. This allows analysts to quantify the ‘lift’ of a change over a longer amount of time.
Q7. What is an unbiased estimator and can you provide an example for a layman to understand?
Hint: What would a biased estimator look like? How does an unbiased estimator differ?
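A classic illustration, offered here as one possible example rather than the expected answer, is the sample variance. Dividing by n underestimates the true variance on average, while dividing by n - 1 does not. The simulation below assumes standard normal data with a true variance of 1.

```python
import random

random.seed(0)
SAMPLE_SIZE = 5
TRIALS = 20_000

biased_total = unbiased_total = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]  # true variance is 1
    sample_mean = sum(sample) / SAMPLE_SIZE
    squared_dev = sum((x - sample_mean) ** 2 for x in sample)
    biased_total += squared_dev / SAMPLE_SIZE          # divides by n
    unbiased_total += squared_dev / (SAMPLE_SIZE - 1)  # divides by n - 1

avg_biased = biased_total / TRIALS
avg_unbiased = unbiased_total / TRIALS
print(round(avg_biased, 2), round(avg_unbiased, 2))
```

Averaged over many samples, the n - 1 version lands near the true value of 1, while the n version settles near (n - 1) / n = 0.8.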
Q8. What is the Central Limit Theorem?
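The central limit theorem, or CLT, says that if we take sufficiently large samples of independent, identically distributed random variables with finite variance, the distribution of the sample mean approaches a normal distribution, regardless of the shape of the underlying distribution.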
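A short simulation can make the theorem tangible. The sketch below draws repeated samples from a strongly skewed exponential distribution and looks at how the sample means behave; the sample size of 50 and the number of repetitions are arbitrary choices.

```python
import math
import random

random.seed(1)
SAMPLE_SIZE = 50
NUM_SAMPLES = 5_000


def sample_mean():
    """Mean of SAMPLE_SIZE draws from a skewed exponential distribution (mean 1, std dev 1)."""
    return sum(random.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE


means = [sample_mean() for _ in range(NUM_SAMPLES)]
center = sum(means) / NUM_SAMPLES
spread = math.sqrt(sum((m - center) ** 2 for m in means) / (NUM_SAMPLES - 1))
within = sum(abs(m - center) <= 1.96 * spread for m in means) / NUM_SAMPLES

print(round(center, 2), round(spread, 3), round(within, 3))
```

The sample means cluster around 1 with a spread close to 1 / sqrt(50), and roughly 95% of them fall within 1.96 standard deviations of the center, as a normal distribution would predict.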
Q9. What is a normal distribution?
Most people probably recognize this as the ordinary “bell curve” distribution, in which values are distributed symmetrically around the mean, with most observations falling close to it and fewer appearing farther out in the tails.
Probability and Statistics Practice Problems
In statistics interviews for data science positions, you'll likely be asked computation questions. These will test your knowledge of statistics and probability, and most of these questions will have a right or wrong answer.
When answering this type of question, make sure that you present not only a solution to the problem at hand, but also walk the interviewer through the thought process you used when arriving at your answer. Cite any relevant statistical concepts that you’re employing in your solution and, in general, make your response as intelligible as possible for the layperson. Here are some sample statistics interview questions:
Q1. Given two uniform distributions X and Y, both with mean 0 and standard deviation 1, what’s the probability that 2X > Y?
Hint: Given that X and Y both have a mean of 0 and a standard deviation of 1, what does that indicate for the distributions of X and Y? What are some of the scenarios we can imagine when we randomly sample from each distribution? Write out each of the possibilities where X > Y and X < Y, as well as possible values of X and Y in each.
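A quick simulation is a useful sanity check on whatever answer you reach by reasoning. The sketch below assumes X and Y are independent, which the question does not state explicitly, and uses the fact that a uniform distribution on (-a, a) has variance a² / 3.

```python
import math
import random

random.seed(2)
HALF_WIDTH = math.sqrt(3)  # Uniform(-a, a) has variance a**2 / 3, so a = sqrt(3) gives std dev 1

trials = 200_000
hits = sum(
    2 * random.uniform(-HALF_WIDTH, HALF_WIDTH) > random.uniform(-HALF_WIDTH, HALF_WIDTH)
    for _ in range(trials)
)
estimate = hits / trials
print(round(estimate, 2))
```

Because both distributions are symmetric around 0, so is 2X - Y, and the estimate should land close to one half.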
Q2. Given X and Y are independent variables with normal distributions, what is the mean and variance of the distribution of 2X - Y when the corresponding distributions are X ~ N (3, 2²) and Y ~ N(1, 2²)?
The linear combination of two independent normal random variables is a normal random variable itself.
How does this change how we solve for the mean of 2X-Y?
Hint: The variance of aX-bY depends on the constants a and b, Var(X), Var(Y), and Cov(X,Y). If we realize that between independent random variables the covariance equals 0, how can we use this information for computing the variance?
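The rules in the hint can be written as a small helper. This is a sketch with a hypothetical function name; it treats 2X - Y as a·X + b·Y with a = 2 and b = -1, and takes the variances as 2² = 4 from the question.

```python
def linear_combination_moments(a, mean_x, var_x, b, mean_y, var_y, cov_xy=0):
    """Mean and variance of a*X + b*Y."""
    mean = a * mean_x + b * mean_y
    variance = a ** 2 * var_x + b ** 2 * var_y + 2 * a * b * cov_xy
    return mean, variance


# 2X - Y with X ~ N(3, 2²) and Y ~ N(1, 2²); independence means cov_xy = 0.
mean, variance = linear_combination_moments(2, 3, 4, -1, 1, 4)
print(mean, variance)  # 5 20
```

Note that the coefficients are squared in the variance, so subtracting Y still adds its variance.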
Q3. Let's say we have a sample size of N. The margin of error for our sample size is 3. How many more samples would we need to decrease the margin of error to 0.3?
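Recall the equation for the margin of error, where z is the critical value, σ is the standard deviation, and n is the sample size:
margin of error = z × σ / √n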
Hint: In order to decrease our margin of error, we'll probably have to increase our sample size. But by how much?
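Since the margin of error is proportional to 1 / √n when the confidence level and standard deviation stay fixed, the required growth in sample size can be worked out directly. The function name below is hypothetical, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction


def sample_size_multiplier(current_margin, target_margin):
    """Factor by which n must grow, since margin of error scales with 1 / sqrt(n)."""
    return (Fraction(current_margin) / Fraction(target_margin)) ** 2


multiplier = sample_size_multiplier("3", "0.3")
print(multiplier)      # 100, so the new sample size is 100 * N
print(multiplier - 1)  # 99, so 99 * N additional samples are needed
```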
Q4. You're given a fair coin. You flip the coin until either Heads Heads Tails (HHT) or Heads Tails Tails (HTT) appears. Is one more likely to appear first? If so, which one and with what probability?
Hint: What do our two sequences, HHT and HTT have in common? Since the question asks us to flip a fair coin until one of our two sequences comes up, it may be useful to consider the problem given a larger sample space. What happens to the probabilities of each sequence appearing as we flip the coin four times? Five times? N times?
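If you want to check your reasoning empirically, a simulation like the one below estimates how often HHT appears before HTT. The function name and trial count are arbitrary choices.

```python
import random

random.seed(3)


def first_pattern(patterns=("HHT", "HTT")):
    """Flip a fair coin until one of the patterns appears; return that pattern."""
    last_three = ""
    while True:
        last_three = (last_three + random.choice("HT"))[-3:]
        if last_three in patterns:
            return last_three


trials = 50_000
hht_share = sum(first_pattern() == "HHT" for _ in range(trials)) / trials
print(round(hht_share, 2))
```

The estimate should come out near 2/3 in favor of HHT. Once two heads in a row appear, HHT is guaranteed to finish first, whereas a tail after HT followed by another head sends HTT back to waiting.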
Q5. Three zebras are chilling in the desert when a lion suddenly attacks. Each zebra is sitting on a corner of an equilateral triangle, and randomly picks a direction to run along the outline of the triangle toward one of the other two corners. What is the probability that none of the zebras collide?
Hint: How many scenarios are there in which none of the zebras collide? There are two scenarios in which the zebras do not collide: if they all move clockwise or if they all move counterclockwise. How do we calculate the probability that an individual zebra chooses to move clockwise or counterclockwise? How can we use this individual probability to calculate the probability that all zebras choose to move in the same direction?
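The counting argument in the hint can be verified by enumerating every outcome, assuming each zebra picks its direction independently and with equal probability.

```python
from fractions import Fraction
from itertools import product

# Each zebra independently runs clockwise ("cw") or counterclockwise ("ccw").
outcomes = list(product(("cw", "ccw"), repeat=3))
no_collision = [o for o in outcomes if len(set(o)) == 1]  # all three share a direction

probability = Fraction(len(no_collision), len(outcomes))
print(probability)  # 1/4
```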
Q6. Let's say that you're drawing N cards from a deck of 52 cards. Compute the probability that you will get a pair from your hand of N cards.
Hint: What's the probability of never drawing a pair?
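Following the hint, the complement can be computed exactly. The sketch below interprets "a pair" as at least two cards of the same rank, which is an assumption about the question's intent; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_at_least_one_pair(n_cards):
    """P(at least two cards share a rank) when drawing n_cards from a 52-card deck."""
    # No pair means n distinct ranks (13 choose n), with one of 4 suits for each.
    no_pair_hands = comb(13, n_cards) * 4 ** n_cards
    return 1 - Fraction(no_pair_hands, comb(52, n_cards))


print(prob_at_least_one_pair(2))   # 1/17
print(prob_at_least_one_pair(14))  # 1
```

With more than 13 cards there are not enough distinct ranks to avoid a match, so the probability becomes 1.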
Q7. What do you think the distribution of time spent per day on Facebook looks like? What metrics would you use to describe that distribution?
Having the vocabulary to describe a distribution is an important skill as a data scientist when it comes to communicating statistical ideas to your peers. There are four important concepts, with supporting vocabulary, that you can use to structure your answer to a question like this. They are:
- Center (mean, median, mode)
- Spread (standard deviation, interquartile range, range)
- Shape (skewness, kurtosis, unimodal or bimodal)
- Outliers (Do they exist?)
In terms of the distribution of time spent per day on Facebook (FB), one can imagine there may be two groups of people on Facebook: a) People who scroll quickly through their feed and don’t spend too much time on FB, and b) People who spend a large amount of their social media time on FB.
Given these two groups, how could we describe the distribution using the four concepts listed above?
Q8. Let's say you're trying to calculate churn for a subscription product. You noticed that out of all customers that bought subscriptions in January 2020, about 10% of them canceled their membership before their next cycle on February 1st.
If you assume that your new customer acquisition is uniform throughout each month and that customer churn goes down by 20% month over month, what's the expected churn for March 1st out of all customers that bought in January?
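One possible reading of the question is that the monthly churn rate itself shrinks by 20% each month and applies to the customers who are still subscribed. Under that assumption, which is an interpretation rather than the only valid one, the arithmetic looks like this:

```python
first_month_churn = 0.10   # share of January buyers who cancel before February 1st
monthly_decline = 0.20     # churn rate falls by 20% month over month

second_month_churn = first_month_churn * (1 - monthly_decline)  # 8% of remaining customers
retained = (1 - first_month_churn) * (1 - second_month_churn)   # still subscribed on March 1st
cumulative_churn = 1 - retained

print(round(second_month_churn, 3), round(cumulative_churn, 3))  # 0.08 0.172
```

In an interview, it is worth stating which interpretation you are using before computing anything.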
Q9. Imagine a deck of randomly-shuffled 500 cards numbered from 1 to 500. If you're asked to pick three cards, one at a time, what's the probability of each subsequent card being larger than the previous drawn card?
Hint: Let's say the question is actually 100 cards and you select 3 cards without replacement. Does the answer change?
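The hint can be tested by brute force on small decks, where every ordered draw can be enumerated. The function name below is hypothetical, and the deck sizes are kept small so the enumeration stays fast.

```python
from fractions import Fraction
from itertools import permutations


def prob_strictly_increasing(deck_size, draws=3):
    """Exact probability that cards drawn without replacement come out in increasing order."""
    orderings = list(permutations(range(1, deck_size + 1), draws))
    increasing = sum(all(a < b for a, b in zip(o, o[1:])) for o in orderings)
    return Fraction(increasing, len(orderings))


print(prob_strictly_increasing(10))  # 1/6
print(prob_strictly_increasing(20))  # 1/6
```

The answer does not depend on the deck size: any three distinct cards can be arranged in 3! = 6 equally likely orders, and only one of them is increasing.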
Looking for more resources to help you practice statistics problems? Check out all of the real interview questions on Interview Query. Or see our guide 14 Probability Interview Questions for 2021 for more practice exercises.
Above, we’ve given you a bit of a primer for the challenges of the Statistics & A/B testing interview process. However, if you’re looking for even more resources to prepare yourself, we have a full Statistics & A/B Testing course available on Interview Query, along with a bank of real interview questions asked by real companies, ordered by date and difficulty.