Introduction

A product analyst is a hybrid of a data analyst and a business analyst. At their core, product analysts work cross-functionally with other teams to bring new products to market, using data to drive the team's decision-making.

The average product analyst interview goes over four to five main subjects. We analyzed over 10K interview experiences to find out what questions were most often asked in product analyst interviews.

Here's a breakdown of what you can expect:

  • Take-home exercises: These come up very often during the product analyst interview process.
  • Product focused case studies: These are questions about how you would evaluate new features and improve existing products.
  • SQL questions: You should be able to query and manipulate large databases.
  • A/B Testing and Experimentation: Running experiments and understanding tradeoffs.
  • Statistics & Probability: You should know how to frame and conduct hypothesis tests, and more specifically, A/B tests.
  • Python / R: For a role or company that relies more on code, you should know how to manipulate data using Python or R and write scripts or create analyses.
  • Behavioral Section: These questions focus on your projects, your resume, and how you interact with cross-functional teams.

Product Case Study Questions

Q1. How would you determine why the number of comments per user is decreasing at a social media company for the last three months? (Asked by Pinterest)

Let's model out an example scenario to help us see the data.

Jan: 10000 users, 30000 comments, 3 comments/user
Feb: 20000 users, 50000 comments, 2.5 comments/user
Mar: 30000 users, 60000 comments, 2 comments/user

In this scenario the total user count is increasing linearly, so the falling comments/user figure is not the result of a declining user base and the loss of network effects that would come with it. Total comments are still growing; they are simply growing more slowly than the number of users.

What else can we hypothesize then?

Read more on the question decreasing comments.

Q2. What’s the first change that you would make to product X? / What is the first new feature that you would add to product X?

If the interviewer asks this question, he or she is looking to see that you’ve done your research on the company's product.

  • Do you understand what the product is and the product features?
  • Do you know who the target audience of the product is?

You should understand the problems that the product solves and speak about how a particular change or additional feature will enhance the solution or address a related problem. Know the customers, know the product, and know the company’s overarching objective.

Q3. What metrics would you use to evaluate the performance of a particular feature?

There are a number of metrics that you can use to evaluate the performance of a product feature — the specific metric that you choose ultimately depends on the objective of the feature and the objective of the company overall.

For example, a mature company may be focused solely on reducing churn, in which case churn rate should be the metric used to evaluate a feature. Sometimes, however, a product feature is not aligned with the company’s main objective at the time.

For example, a company’s main objective may be to increase conversions, yet it may have a feature that is focused on retention. Coursera is an example of this: it implemented a feature that offers users a 50% discount as they are about to cancel a subscription.

Overall, there are a number of metrics that you can consider, including but not limited to:

  • Lifetime value (LTV)
  • Customer Acquisition Cost (CAC)
  • Conversions
  • Monthly Active Users
  • Session duration
  • Retention rate
  • Net Promoter Score (NPS)

For more reading, check out our course on analyzing product metrics and data.

SQL Interview Questions

Q4: What is the difference between the WHERE and HAVING clause? (Asked by Amazon)

Both WHERE and HAVING filter a query's results according to conditions that you set. The difference between the two shows up when they are used together with the GROUP BY clause. WHERE filters individual rows before grouping (before the GROUP BY clause is applied), so it cannot reference aggregate values. HAVING filters the groups after grouping, so it can use aggregates such as COUNT or SUM.
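
To make the ordering concrete, here is a minimal sketch that runs one query using both clauses against an in-memory SQLite database. The `orders` table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, status TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("ann", "paid", 50),
        ("ann", "paid", 70),
        ("ann", "refunded", 500),
        ("bob", "paid", 30),
        ("bob", "paid", 20),
    ],
)

# WHERE drops the refunded row before any grouping happens.
# HAVING then keeps only the groups whose remaining total exceeds 100.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY customer
    HAVING SUM(amount) > 100
    ORDER BY customer
    """
).fetchall()
print(rows)  # [('ann', 120)]
```

Because the refunded row is removed by WHERE first, it never contributes to the sum that HAVING checks.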

Q5: What are the different types of joins? Explain them each.

There are four join types that come up most often:

  1. Inner join: Returns only the records that have matching values in both tables
  2. Left join: Returns all records from the left table and the matched records from the right table, with NULLs where there is no match
  3. Right join: Returns all records from the right table and the matched records from the left table, with NULLs where there is no match
  4. Full join: Returns all records from both tables, matched where possible and filled with NULLs where there is no match on the other side
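
The differences are easiest to see on a tiny example. The sketch below uses two hypothetical tables, `users` and `orders`, in an in-memory SQLite database. Because some SQLite builds bundled with Python lack RIGHT JOIN and FULL JOIN support, the right join is written as a left join with the tables swapped, and the full join is emulated in Python as the union of the two one-sided results (which, unlike a true full join, collapses duplicate rows).

```python
import sqlite3

# Hypothetical tables: bob has no order, and one order has no matching user.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER, name TEXT);
    CREATE TABLE orders (user_id INTEGER, item TEXT);
    INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (1, 'book'), (3, 'pen');
    """
)

# Inner join: only rows that match on both sides.
inner = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "INNER JOIN orders o ON u.id = o.user_id"
).fetchall()

# Left join: every user, with None where no order matches.
left = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "LEFT JOIN orders o ON u.id = o.user_id ORDER BY u.id"
).fetchall()

# Right join, written as a left join with the tables swapped:
# every order, with None where no user matches.
right = conn.execute(
    "SELECT u.name, o.item FROM orders o "
    "LEFT JOIN users u ON u.id = o.user_id ORDER BY o.user_id"
).fetchall()

# Full join, emulated as the union of both one-sided results.
full = set(left) | set(right)

print(inner)  # [('ann', 'book')]
print(left)   # [('ann', 'book'), ('bob', None)]
print(right)  # [('ann', 'book'), (None, 'pen')]
```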

Q6: How do you create a histogram using SQL? (Asked by Facebook)

For example, let's say you wanted to create a histogram to model the number of comments per user in the month of January 2020.

A histogram with bin buckets of size one means that we can avoid the logical overhead of grouping frequencies into specific intervals.

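For example, if we wanted a histogram with bins of size five, we would have to run a SELECT statement like so:

SELECT
    -- BETWEEN is inclusive, so the bucket ranges must not overlap
    CASE WHEN frequency BETWEEN 0 AND 4 THEN 0
         WHEN frequency BETWEEN 5 AND 9 THEN 5
         -- ...and so on, one WHEN per bucket
    END AS bucket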

Here's the solution to comments histogram
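
As a rough sketch of the bin-size-one approach, the query below first counts comments per user and then counts users per frequency. The `users` and `comments` tables, their columns, and the sample rows are assumptions made for illustration, and dates are assumed to be stored as `YYYY-MM-DD` strings so that BETWEEN compares them correctly.

```python
import sqlite3

# Hypothetical schema and data; user 4 has no comments, and one of
# user 3's comments falls outside January 2020.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER);
    CREATE TABLE comments (user_id INTEGER, created_at TEXT);
    INSERT INTO users VALUES (1), (2), (3), (4);
    INSERT INTO comments VALUES
        (1, '2020-01-03'), (1, '2020-01-15'),
        (2, '2020-01-20'), (2, '2020-01-21'),
        (3, '2020-01-09'),
        (3, '2020-02-01');
    """
)

# Inner query: comments per user in January 2020 (LEFT JOIN keeps
# users with zero comments). Outer query: number of users per frequency.
histogram = conn.execute(
    """
    SELECT frequency, COUNT(*) AS num_users
    FROM (
        SELECT u.id, COUNT(c.user_id) AS frequency
        FROM users u
        LEFT JOIN comments c
            ON c.user_id = u.id
            AND c.created_at BETWEEN '2020-01-01' AND '2020-01-31'
        GROUP BY u.id
    ) AS per_user
    GROUP BY frequency
    ORDER BY frequency
    """
).fetchall()
print(histogram)  # [(0, 1), (1, 1), (2, 2)]
```

Putting the date filter in the ON clause rather than in WHERE is what keeps users with no January comments in the zero bucket.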

Q7: Write a SQL query to find all duplicate emails in a table

+----+-------------------+
| Id | Email             |
+----+-------------------+
| 1  | [email protected] |
| 2  | [email protected] |
| 3  | [email protected] |
+----+-------------------+

For this question, we have to detect that the rows with Id 1 and Id 3 share the same email. The easiest way to do so is to apply a GROUP BY to the Email column and then count the number of rows in each group.

If the count for an email is greater than one, we know that email appears more than once in the table.

SELECT Email
FROM Person
GROUP BY Email
HAVING count(Email) > 1
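
To check the query end to end, here is a small sketch that loads a `Person` table into an in-memory SQLite database and runs it. The email addresses are placeholder values invented for this example, chosen so that Id 1 and Id 3 match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER, Email TEXT)")
# Placeholder addresses (hypothetical): Id 1 and Id 3 share an email.
conn.executemany(
    "INSERT INTO Person VALUES (?, ?)",
    [(1, "a@example.com"), (2, "b@example.com"), (3, "a@example.com")],
)

duplicates = conn.execute(
    "SELECT Email FROM Person GROUP BY Email HAVING COUNT(Email) > 1"
).fetchall()
print(duplicates)  # [('a@example.com',)]
```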

A/B Testing and Experimentation

Q8: Let's say that your company is running a standard control and variant A/B test on a feature to increase conversion rates on the landing page. The PM checks the results and finds a p-value of 0.04. How would you assess the validity of the result?

Let's start out by asking some clarifying questions here:

  • What details is the interviewer leaving out of the question?
  • Are there assumptions we can make about how the A/B test was set up and measured that would reveal whether the result is invalid?
  • What rephrasing of the question would help us understand more about the problem at hand?

For example, what if the question was: How do you set up and measure an A/B test correctly?

How does this change the result?

Q9: How do you assess the statistical significance of an A/B test?

You would perform hypothesis testing to determine statistical significance.

  1. You would state the null hypothesis and the alternative hypothesis, and set the level of significance (alpha) before looking at the results.
  2. You would calculate the p-value: the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true.
  3. If the p-value is less than alpha, you would reject the null; in other words, the result is statistically significant.
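
For conversion rates, one common way to carry out these steps is a pooled two-proportion z-test. The sketch below is one possible implementation using only the standard library; the function name and the conversion counts in the usage example are made up for illustration, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test.

    Null hypothesis: both groups share the same conversion rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 200/2000 conversions in control, 250/2000 in variant.
alpha = 0.05
z, p_value = two_proportion_z_test(200, 2000, 250, 2000)
print(f"z = {z:.2f}, p = {p_value:.4f}, significant = {p_value < alpha}")
```

A p-value below alpha only says the difference is unlikely under the null; it does not by itself confirm that the experiment was set up or measured correctly.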

Statistics and Probability Questions

Q10: Let's say we have a sample size of N. The margin of error for our sample size is 3. How many more samples would we need to decrease the margin of error to 0.3? (Asked by Google)

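Here's a hint: the margin of error for a sample mean is $ME = z \cdot \sigma / \sqrt{N}$, where $z$ is the critical value for the chosen confidence level and $\sigma$ is the standard deviation.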

Read the rest of the solution here
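
As a quick sanity check on the arithmetic, the sketch below assumes the standard margin-of-error formula for a sample mean, ME = z * sigma / sqrt(N), with the confidence level and standard deviation held fixed. Under that assumption the margin of error scales as 1/sqrt(N), so the required sample size grows with the square of the reduction factor. The function name is hypothetical.

```python
from fractions import Fraction


def sample_size_multiplier(current_moe, target_moe):
    """Factor by which N must grow, given that ME scales as 1/sqrt(N)."""
    return (Fraction(current_moe) / Fraction(target_moe)) ** 2


# Exact arithmetic: pass the margins as strings to avoid float rounding.
multiplier = sample_size_multiplier("3", "0.3")
print(multiplier)      # 100 -> total samples needed, as a multiple of N
print(multiplier - 1)  # 99  -> additional samples needed, as a multiple of N
```

In other words, shrinking the margin of error by a factor of 10 takes 100N samples in total, or 99N more than we already have.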

Q11: What is an outlier? Explain how you might screen for outliers and what would you do if you found them in your dataset. Also, explain what an inlier is and how you might screen for them and what would you do if you found them in your dataset.

An outlier is a data point that differs significantly from other observations.

Depending on their cause, outliers can be harmful from a machine learning perspective because they can worsen the accuracy of a model. If an outlier is caused by a measurement error, it’s important to remove it from the dataset. There are a couple of ways to identify outliers:

Z-score/standard deviations: if the data are approximately normally distributed, about 99.7% of points lie within three standard deviations of the mean. We can therefore calculate the standard deviation, multiply it by 3, and flag the data points that fall outside that range around the mean. Equivalently, we can calculate the z-score of a given point, and if its absolute value is greater than 3, treat it as an outlier.
Note: there are a few caveats to consider when using this method: the data must be approximately normally distributed, it is not suitable for small data sets, and the presence of too many outliers can distort the z-scores.

Interquartile Range (IQR): IQR, the concept used to build boxplots, can also be used to identify outliers. The IQR is equal to the difference between the 3rd quartile and the 1st quartile. A point is an outlier if it is less than Q1 - 1.5*IQR or greater than Q3 + 1.5*IQR. For normally distributed data, these fences sit approximately 2.698 standard deviations from the mean.
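
Both screening rules can be sketched in a few lines with Python's `statistics` module. The function names and the sample data are hypothetical, and `statistics.quantiles` uses its default interpolation method here, so quartile values may differ slightly from other tools.

```python
import statistics


def zscore_outliers(data, threshold=3.0):
    """Flag points whose absolute z-score exceeds the threshold."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    if sd == 0:
        return []  # All values identical: nothing can be an outlier.
    return [x for x in data if abs(x - mean) / sd > threshold]


def iqr_outliers(data, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical sample: values cluster around 11, plus one extreme reading.
data = [10, 11, 12, 11, 10, 12, 11, 10, 12, 11,
        10, 11, 12, 11, 10, 12, 11, 10, 12, 50]
print(zscore_outliers(data))  # [50]
print(iqr_outliers(data))     # [50]
```

Note that the extreme value inflates the mean and standard deviation used by the z-score rule, which is one reason the IQR rule is often preferred on small or heavily contaminated samples.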

Q12: Amy and Brad take turns in rolling a fair six-sided die. Whoever rolls a "6" first wins the game. Amy starts by rolling first. What's the probability that Amy wins?

Let's set some definitions.

pA = Probability that Amy wins
pB = Probability that Brad wins.

Note that pA = P[win if go first].

If Amy fails to roll a 6 on her first turn, Brad is effectively in the position of the player who goes first. So Brad's probability of winning is the probability that Amy loses the first roll, multiplied by the probability of winning when going first: pB = P[Amy loses first roll] * P[win if go first].

We also know that the probabilities of either Amy or Brad winning should add up to 1. So mathematically we can create two equations: pB = 5/6 * pA and pA + pB = 1.

Here's the rest of the solution
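
One way to finish the algebra from the two equations above (pB = 5/6 * pA and pA + pB = 1) is to substitute the first into the second. The sketch below does this with exact fractions and cross-checks the answer against the geometric series for Amy winning on her first, second, third, ... turn; the variable names are illustrative.

```python
from fractions import Fraction

p_six = Fraction(1, 6)   # Chance of rolling a 6 on any single roll.
p_miss = 1 - p_six       # Chance of not rolling a 6.

# Substitute pB = p_miss * pA into pA + pB = 1:
#   pA * (1 + p_miss) = 1
p_amy = 1 / (1 + p_miss)
p_brad = p_miss * p_amy

# Cross-check: Amy wins on her k-th turn only if both players missed on
# all earlier turns, giving a geometric series with ratio p_miss ** 2.
series = p_six / (1 - p_miss ** 2)

print(p_amy, p_brad)  # 6/11 5/11
```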

Q13: Given two fair dice, what is the probability of getting scores that sum to 4? To 8?

There are 3 combinations that sum to 4 (1+3, 3+1, 2+2):
P(rolling a 4) = 3/36 = 1/12

There are 5 combinations that sum to 8 (2+6, 6+2, 3+5, 5+3, 4+4):
P(rolling an 8) = 5/36
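
These counts can be verified by brute force, since two dice have only 36 equally likely ordered outcomes. The helper name below is made up for this sketch.

```python
from fractions import Fraction
from itertools import product


def prob_of_sum(target):
    """Probability that two fair six-sided dice sum to `target`."""
    rolls = list(product(range(1, 7), repeat=2))  # All 36 ordered outcomes.
    hits = [roll for roll in rolls if sum(roll) == target]
    return Fraction(len(hits), len(rolls))


print(prob_of_sum(4))  # 1/12
print(prob_of_sum(8))  # 5/36
```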

Python / R Interview Questions

Q14: Write a function that can take a string and return a list of bigrams.
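
A minimal sketch of one possible answer follows. It assumes word-level bigrams, lowercases the input, and splits on whitespace only; an interviewer may want punctuation handling or character-level bigrams instead, so it is worth asking.

```python
def bigrams(sentence):
    """Return the list of adjacent word pairs in `sentence`."""
    words = sentence.lower().split()
    # Pair each word with the one that follows it.
    return list(zip(words, words[1:]))


print(bigrams("Have free hours and love children"))
# [('have', 'free'), ('free', 'hours'), ('hours', 'and'),
#  ('and', 'love'), ('love', 'children')]
```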

Q15: Write a function to generate N samples from a normal distribution and plot the histogram.
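
Here is a rough standard-library sketch. In an interview you would more likely reach for NumPy and matplotlib; this version swaps in `random.gauss` for sampling and prints a text histogram instead of drawing a chart, so it runs without extra packages. The function names, default parameters, and bin width are all illustrative choices.

```python
import math
import random
from collections import Counter


def sample_normal(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n samples from a normal distribution N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]


def text_histogram(samples, bin_width=0.5):
    """Print one row of '#' per bin and return the bin counts."""
    # Key each sample by the lower edge of the bin it falls into.
    counts = Counter(math.floor(x / bin_width) * bin_width for x in samples)
    for edge in sorted(counts):
        print(f"{edge:6.1f} | {'#' * counts[edge]}")
    return counts


samples = sample_normal(200, seed=0)
counts = text_histogram(samples)
```

With enough samples, the rows should form the familiar bell shape centered near `mu`.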

Q16: We're given a string of integers that represent page numbers.

Write a function to return the last page number in the string. If the string of integers is not in correct page order, return the last number in order.

input = '12345'
output = 5

input = '12345678910111213'
output = 13

input = '1235678'
output = 3

Read more on Python interview questions asked in analytics interviews
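
One possible approach to Q16 is to walk the string while it keeps matching the next expected page number. This sketch assumes pages always start at 1, as in all three examples; the function name is hypothetical.

```python
def last_page_number(pages):
    """Return the last page number reached while the pages stay in order."""
    position = 0
    page = 1
    # Keep consuming as long as the next expected page number appears
    # at the current position in the string.
    while pages.startswith(str(page), position):
        position += len(str(page))
        page += 1
    return page - 1


print(last_page_number("12345"))              # 5
print(last_page_number("12345678910111213"))  # 13
print(last_page_number("1235678"))            # 3
```

Matching against `str(page)` rather than reading one character at a time is what lets the function handle multi-digit pages such as 10 through 13.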


Behavioral Interview Questions

Q17: What are your favorite data visualization techniques?

Q18: What are some of your favorite products and why?

Q19: What would you do if the product manager on your team gave you some slightly unclear requirements for their next projects?

Q20: How would you determine data quality given a dataset provided to you by your data engineer?

If you're interested in more product analyst practice problems, check out Interview Query for our product analyst guides, interview questions, and data science community!