A product analyst is a hybrid role between a data analyst and a business analyst. At its core, product analysts work cross-functionally with other teams to bring new products to the market by using data to drive decision making.
Product analyst interviews assess the unique skill set needed to pull this balancing act off. You can expect plenty of data analytics questions - namely SQL writing and statistics - as well as business/product sense questions. These interviews are designed to test your ability to use data in making sound product decisions.
Product analyst interviews vary by company. However, after analyzing more than 15,000 product analyst interview experiences, we know the most commonly asked topics are product metrics case studies, SQL questions, and data analytics.
Product analyst interviews typically include 3-5 rounds that assess your technical skills, product intuition, and ability to communicate. Google Product Analyst interviews, for example, include three rounds:
Step 1: Recruiter Screen
Most tech companies use a recruiter screen as the first step. It gives the recruiter a chance to get to know the candidate, assess communication skills, and gauge whether the candidate is a good fit for the role and genuinely interested in it.
Step 2: Technical Screen
A technical round assesses your technical skills. Specifically, SQL and product intuition are tested during this round. You will be asked to write SQL queries and functions to solve problems.
You can also expect product case study and metrics questions. These questions ask you to analyze data and use that analysis to answer questions about a product.
Step 3: On-Site Round
During the on-site round, you can expect more technical and product rounds. Many companies also do a one-on-one with the product manager (PM). This interview assesses your knowledge of the product and whether your product intuition is a strong match.
Product case study questions assess your ability to use data to influence product decisions. Typically, these questions ask about feature changes, metrics anomalies, measuring product success and/or product improvements.
To add context on why the question is being posed, even though comments per user is decreasing, overall the company has been consistently growing users month-over-month for three months.
With a question like this, start by modeling the scenario. Your model might look like this:
Using this model, you might also model churn as Month 1 - 25%, Month 2 - 20% and Month 3 - 15%. Knowing that some users are churning off the platform each month, what can you infer about the decrease in comments per user?
Interviewers ask this question to see that you have done your research and have knowledge of the company’s products. In particular, they want to know:
Prior to the interview, create some answers for a question like this. In particular, you should propose changes or new features that will enhance the product, address a problem and align with the company’s overarching objectives.
This product question is more focused on growth and is actively asked in Facebook’s growth marketing analyst technical screen. With growth questions, we have to come up with solutions in the form of growth ideas and provide data points for how they might support our hypothesis.
One hypothesis we could propose is that notifying Facebook users when their friends join Instagram would help to promote Instagram. So if a user’s friend on Facebook decides to join Instagram, we could send the user a notification that their friend joined. We can test this hypothesis with an A/B test. We can randomly bucket users into a control group and a test group, where the test group gets a notification on Facebook each time a friend joins Instagram, while the control group does not. At the end of the test, we can compare the Instagram sign-up rate between the two groups.
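One way to compare the two groups at the end of such a test is a two-proportion z-test. The sketch below is a minimal illustration using only the Python standard library; the function name and the sign-up counts are hypothetical, and a real analysis would also settle the sample size and significance threshold before the test starts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(signups_control, n_control, signups_test, n_test):
    """Two-sided z-test for a difference between two sign-up rates."""
    rate_control = signups_control / n_control
    rate_test = signups_test / n_test
    # Pooled rate under the null hypothesis that both groups share one rate
    pooled = (signups_control + signups_test) / (n_control + n_test)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (rate_test - rate_control) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts, for illustration only
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value here would suggest the difference in sign-up rates is unlikely to be due to chance alone.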
One of the most effective ways is to conduct quantitative analysis. You can measure the opportunity size of each idea using historical data.
For example, if one of the ideas was to introduce cart upsells, you could analyze the number of multi-item orders historically. If only a small percentage of customers purchase multiple items, introducing upsells would be a sizable opportunity. You might then choose an A/B test related to cart upsells.
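As a rough sketch of that opportunity sizing, the share of historical orders with more than one item can be computed directly. The function name and the order history below are hypothetical and only illustrate the calculation.

```python
def multi_item_order_share(items_per_order):
    """Fraction of historical orders that contained more than one item."""
    if not items_per_order:
        raise ValueError("need at least one order")
    multi_item_orders = sum(1 for n in items_per_order if n > 1)
    return multi_item_orders / len(items_per_order)


# Hypothetical history: number of items in each past order
orders = [1, 1, 2, 1, 3, 1, 1, 1, 1, 1]
share = multi_item_order_share(orders)
print(f"{share:.0%} of orders contained multiple items")
```

A low share suggests there is room for upsells to move the metric, which helps rank this idea against the others.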
With metrics questions, start by listing broad variables that could affect ETA. In this case, that would include things like:
Once you’ve created a list of broad variables, you can then start to go deeper and choose which ones might have the greatest effect. Weather, how crowded a location is, and wrong turn rate could all help to improve the accuracy of the ETA model.
Let’s say that Dropbox wants to change the logic of the trash folder from never permanently deleting items to automatically deleting items after 30 days. How would you validate this idea?
More context. Let’s say at Netflix we offer a subscription where customers can enroll for a 30-day free trial. After 30 days, customers will be automatically charged based on the package selected, unless they opt out. What metrics would you look at?
First step, think about Netflix’s business model. They want to focus on:
How would this free-trial plan affect how Netflix might acquire new users or manage their customer churn?
The types of SQL questions in product analyst interviews range from definition-based discussions, e.g. “When would you use DELETE vs TRUNCATE?”, to writing queries based on provided data. Multi-step SQL case studies are also common. These questions ask you to propose metrics, and then write SQL to pull those metrics.
Both WHERE and HAVING are used to filter a table to meet the conditions that you set. The difference between the two is shown when they are used in conjunction with the GROUP BY clause. The WHERE clause is used to filter rows before grouping (before the GROUP BY clause) and HAVING is used to filter rows after grouping.
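To see the two filters side by side, here is a small sketch that runs a query against an in-memory SQLite database from Python. The orders table, its columns, and the thresholds are assumptions made up for this example.

```python
import sqlite3


def repeat_large_buyers(conn):
    """Users with at least two orders of 20 or more (hypothetical schema)."""
    query = """
        SELECT user_id, COUNT(*) AS large_orders
        FROM orders
        WHERE amount >= 20      -- filters rows before grouping
        GROUP BY user_id
        HAVING COUNT(*) >= 2    -- filters groups after aggregation
        ORDER BY user_id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 5.0), (1, 50.0), (1, 60.0), (2, 70.0), (3, 20.0), (3, 30.0)],
)
print(repeat_large_buyers(conn))
```

WHERE drops the 5.0 order before any counting happens, and HAVING then drops user 2, whose group holds only one remaining order.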
There are four different types of joins: INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL OUTER JOIN.
EXTRACT allows us to pull a single field, such as the year, month, day or hour, from temporal values like dates, times, timestamps and intervals.
If you wanted to find the year from 2022-03-22, you would write EXTRACT(field FROM source):
SELECT EXTRACT(YEAR FROM DATE '2022-03-22') AS year;
More context. Given a table of bank transactions with columns that include created_at (date and time for each transaction), write a query to get the last transaction for each day. The output should include the id of the transaction, datetime of the transaction, and the transaction amount. Order the transactions by datetime.
Since the created_at column is in DATETIME format, we can have multiple entries that were created at different times on the same date. For example, transaction 1 could happen on ‘2020-01-01 02:21:47’, and transaction 2 could happen on ‘2020-01-01 14:24:37’.
To make partitions, we should remove information about the time that the transaction was created. But we would still need that information to sort the transactions.
Is there a way you could do both these tasks at once?
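One possible approach is a window function that partitions by the date part of created_at while ordering by the full datetime. The sketch below runs it against an in-memory SQLite database from Python; the table name bank_transactions and the amount column are assumptions, and window functions require SQLite 3.25 or newer.

```python
import sqlite3


def last_transaction_per_day(conn):
    """Return (id, created_at, amount) for the latest transaction of each day."""
    query = """
        WITH ranked AS (
            SELECT id, created_at, amount,
                   ROW_NUMBER() OVER (
                       PARTITION BY DATE(created_at)  -- one partition per day
                       ORDER BY created_at DESC       -- latest transaction first
                   ) AS rn
            FROM bank_transactions
        )
        SELECT id, created_at, amount
        FROM ranked
        WHERE rn = 1
        ORDER BY created_at
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bank_transactions (id INTEGER, created_at TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO bank_transactions VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 02:21:47", 10.0),
        (2, "2020-01-01 14:24:37", 20.0),
        (3, "2020-01-02 09:00:00", 30.0),
    ],
)
print(last_transaction_per_day(conn))
```

The partition uses only the date, while the ordering inside each partition still uses the time of day, so both tasks happen in one pass.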
Let’s say you wanted to create a histogram to model the number of comments per user in the month of January 2020. Assume the bin buckets have intervals of one.
A histogram with bin buckets of size one means that we can avoid the logical overhead of grouping frequencies into specific intervals.
For example, if we wanted a histogram with bin buckets of size five, we would have to run a SELECT statement like so:
SELECT
    CASE
        WHEN frequency BETWEEN 0 AND 5 THEN 5
        WHEN frequency BETWEEN 6 AND 10 THEN 10
        -- etc., one WHEN per remaining bucket
    END AS bucket
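With bins of size one, no CASE bucketing is needed: count comments per user, then count users per comment count. The sketch below is one way to write that, run against in-memory SQLite from Python. The users and comments tables and their columns are assumptions for this example, and the LEFT JOIN keeps users with zero comments in the histogram.

```python
import sqlite3


def comments_histogram(conn):
    """Return (comment_count, number_of_users) rows for January 2020."""
    query = """
        SELECT comment_count, COUNT(*) AS frequency
        FROM (
            SELECT u.id, COUNT(c.user_id) AS comment_count
            FROM users AS u
            LEFT JOIN comments AS c
                ON c.user_id = u.id
                AND c.created_at >= '2020-01-01'
                AND c.created_at < '2020-02-01'
            GROUP BY u.id
        ) AS per_user
        GROUP BY comment_count
        ORDER BY comment_count
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER)")
conn.execute("CREATE TABLE comments (user_id INTEGER, created_at TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,), (4,)])
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [
        (1, "2020-01-05 10:00:00"),
        (1, "2020-01-20 11:00:00"),
        (2, "2020-01-07 09:30:00"),
        (2, "2020-02-02 09:30:00"),  # outside January, ignored
        (3, "2020-01-31 23:59:59"),
    ],
)
print(comments_histogram(conn))
```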
More context. A table contains information about the phases of writing a new social media post. The action column can have values post_enter, post_submit, or post_canceled for when a user starts to write a post (post_enter), successfully posts (post_submit), or ends up canceling their post (post_canceled).
Write a query to get the post success rate for each day in the month of January 2020. You can assume that a single user may only make one post per day.
Let’s see if we can clearly define the metric we want to calculate before just jumping into the problem. We want the post success rate for each day in January 2020. To get that metric, let’s assume post success rate can be defined as:
(total posts submitted) / (total posts entered)
Additionally, since the success rate must be broken down by day, we must assume that a post that is entered is also completed on the same day. What comes next?
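Under that definition, one possible query divides the count of submitted posts by the count of entered posts for each day. The sketch below runs it against in-memory SQLite from Python; the events table name and its columns are assumptions for this example.

```python
import sqlite3


def daily_post_success_rate(conn):
    """Return (day, success_rate) rows for each day in January 2020."""
    query = """
        SELECT DATE(created_at) AS dt,
               SUM(CASE WHEN action = 'post_submit' THEN 1 ELSE 0 END) * 1.0
                   / SUM(CASE WHEN action = 'post_enter' THEN 1 ELSE 0 END)
                   AS success_rate
        FROM events
        WHERE created_at >= '2020-01-01'
          AND created_at < '2020-02-01'
        GROUP BY DATE(created_at)
        ORDER BY dt
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, created_at TEXT, action TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "2020-01-01 10:00:00", "post_enter"),
        (1, "2020-01-01 10:05:00", "post_submit"),
        (2, "2020-01-01 12:00:00", "post_enter"),
        (2, "2020-01-01 12:01:00", "post_canceled"),
        (1, "2020-01-02 08:00:00", "post_enter"),
        (1, "2020-01-02 08:10:00", "post_submit"),
        (3, "2020-02-01 09:00:00", "post_enter"),  # outside January, ignored
    ],
)
print(daily_post_success_rate(conn))
```

Multiplying by 1.0 forces floating-point division, which matters in engines that would otherwise truncate an integer ratio to zero.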
In addition to thinking through possible insights, what do you think the distribution of the number of conversations created by each user per day looks like? Write a query to get the distribution of the number of conversations created by each user by day in the year 2020. This visualization can also help you hone the insights to be gleaned.
Analytics questions are a subset of SQL questions, and often require SQL code writing. These questions assess your ability to pull actionable insights from data. In these questions, you might be asked to pull metrics or perform a multi-step data analytics case study.
This is a classic data analytics case study type question, in that you are being asked to:
With this question, start by thinking about how we could prove or disprove the hypothesis. For example, if CTR is high when search ratings are high, and low when search ratings are low, then the hypothesis is supported. With that in mind, you can solve this problem by looking at results split into different search ratings buckets.
Quick solution. For this problem, note that we are going to assume that the question states the average order value for all users that have ordered at least once. Therefore, we can apply an INNER JOIN between users and transactions.
SELECT
    u.sex,
    ROUND(AVG(quantity * price), 2) AS aov
FROM users AS u
INNER JOIN transactions AS t
    ON u.id = t.user_id
INNER JOIN products AS p
    ON t.product_id = p.id
GROUP BY 1
More context. The products table includes category_id information for customers. Your output should include a boolean column with a value of 1 if the customer has previously purchased that product category, and 0 if it is their first time buying a product from that category.
Your output should look like this:
Product analysts are tested on their ability to design, conduct and evaluate A/B tests. These questions explore A/B testing and statistics, and include definitions-based questions and A/B testing case studies.
The biggest difference comes down to sample size. Z-tests are best performed when the experiment has a large sample size, while t-tests are best for small sample sizes.
Further, a z-test is a statistical test used to determine whether the means of two samples are different when the population variance is known and the sample size is large. A t-test answers the same question when the population variance is unknown and has to be estimated from the sample, and it assumes that the data you are using are approximately normally distributed.
A question like this assesses your foundational knowledge of A/B testing. A sample response might include that there are three components you need: the variable, the result and the rationale.
Then, provide an example. If you wanted to know what effect an upsell offer on a cart had on users (more personalized vs. best-sellers), you might say: If we personalize the upsell offer (variable), then customers will convert at higher rates (result), because the personalized upsells are more relevant for the audience (rationale).
This question assesses your ability to design an A/B test. First, ask about the problem the A/B test is trying to solve. This will help you tailor the questions you would ask. Some examples you might use are:
Let’s start out by asking some clarifying questions here:
Basically, this type of question is asking: Was the A/B test set up and measured correctly? If it was set up and measured correctly, what could we say about the p-value?
More context. The results of an A/B test show that the treatment group ($10 reward) has a 30% response rate, while the control group without rewards has a 50% response rate. Can you explain why that happened? How would you improve the experimental design?
Product analyst interviews may include basic Python questions, especially for coding-intensive roles. In particular, Python questions cover definitions or ask you to perform a basic-to-intermediate coding exercise.
The split() method is used to break a string into a list of substrings in Python. By default it splits on whitespace. For example, if the string was “basic python,” the split function would break that into ‘basic’, ‘python’. Here’s an example:
string = 'basic python'
print(string.split())
values = ['1', '4', '0', '6', '9']
values = [int(i) for i in values]  # convert each string to an integer
values.sort()  # sort in place, ascending
print(values)
[0, 1, 4, 6, 9]
At its core, bi-grams are two words that are placed next to each other. In feature engineering for an NLP model, using two-word features instead of single-word features captures an interaction effect between the words. To actually parse them out of a string, we need to first split the input string. We would use the Python function .split() to create a list with each individual word as an element. Create another empty list that will eventually be filled with tuples.
Then, once we have identified each individual word, we need to loop through the list k-1 times (if k is the number of words in the sentence) and append the current word and the subsequent word as a tuple. This tuple gets added to a list that we eventually return.
def find_bigrams(sentence):
    input_list = sentence.split()
    bigram_list = []
    # Now we have to loop through each word except the last one
    for i in range(len(input_list) - 1):
        # Strip the whitespace and lower the word to ensure consistency
        bigram_list.append((input_list[i].strip().lower(), input_list[i + 1].strip().lower()))
    return bigram_list
This is a relatively simple problem: we set up our distribution, generate n samples from it, and then plot them. In this question, we make use of scipy, a library made for scientific computing.
First, we will declare a standard normal distribution. A standard normal distribution, for those of you who may have forgotten, is the normal distribution with mean = 0 and standard deviation = 1. To declare a normal distribution, we use the scipy stats.norm(loc, scale) function, where loc is the mean and scale is the standard deviation (not the variance), and specify the parameters as mentioned above.
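The same idea can be sketched without scipy. The example below swaps in the standard library's statistics.NormalDist so it runs with no extra installs, and prints a rough text histogram instead of a plot. The function names and the bin width are choices made for this illustration, not part of the original solution.

```python
import math
from collections import Counter
from statistics import NormalDist


def sample_standard_normal(n, seed=None):
    """Draw n samples from a normal distribution with mean 0 and std dev 1."""
    return NormalDist(mu=0.0, sigma=1.0).samples(n, seed=seed)


def bin_counts(samples, bin_width=0.5):
    """Count samples per bin, keyed by the left edge of each bin."""
    counts = Counter(math.floor(x / bin_width) * bin_width for x in samples)
    return dict(sorted(counts.items()))


samples = sample_standard_normal(1000, seed=0)
for left_edge, count in bin_counts(samples).items():
    # One '#' per 10 samples gives a rough text plot of the bell shape
    print(f"{left_edge:5.1f} | {'#' * (count // 10)}")
```

Passing a seed makes the draw reproducible, which is handy when you need to talk through the output in an interview.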
Statistics and probability questions are asked to test your data sense, as well as your ability to analyze large datasets. These questions can include basic definitions alongside short statistical problems that require you to make a calculation.
In the simplest terms, the p-value is used to measure the statistical significance of a test. It is the probability of seeing a result at least as extreme as the one observed if the null hypothesis (typically that any apparent relationship between the variables is due to chance) were true. The higher the p-value, the less evidence you have against the null hypothesis, so you fail to reject it. A smaller p-value, typically below a threshold such as 0.05, indicates a statistically significant result and lets you reject the null, which is to say, something more than randomness likely explains how the variables interact.
More context. Capital approval rates have gone down. Let’s say last week the overall approval rate was 85%, but it fell this week to 82%, a statistically significant reduction.
The first analysis shows that all approval rates stayed flat or increased over time when looking at the individual products.
This would be an example of Simpson’s Paradox, which is a phenomenon in statistics and probability. Simpson’s Paradox occurs when a trend shows in several groups but either disappears or is reversed when the data is combined. This is often because the subgroups are offset from each other on the Y-axis, so the aggregate shows only the shift in mix between the groups, and not the trends within them. For the original example, there could have been quite a few more sales of Product 2. If Product 2 has a lower approval rate than the other products, that shift in mix would pull the overall approval rate down, even though no drop actually occurred for any individual product.
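A small worked example makes the effect concrete. The counts below are made up for illustration (they are not from the original question) and are chosen so that the overall rate moves from 85% to 82% while neither product's own rate falls.

```python
def approval_rate(approved, applications):
    return approved / applications


# Hypothetical (approved, applications) counts per product, for illustration only
last_week = {"Product 1": (900, 1000), "Product 2": (120, 200)}
this_week = {"Product 1": (910, 1000), "Product 2": (240, 400)}


def overall_rate(week):
    """Approval rate across all products combined."""
    approved = sum(a for a, _ in week.values())
    applications = sum(n for _, n in week.values())
    return approval_rate(approved, applications)


for product in last_week:
    before = approval_rate(*last_week[product])
    after = approval_rate(*this_week[product])
    print(f"{product}: {before:.0%} -> {after:.0%}")
print(f"Overall: {overall_rate(last_week):.0%} -> {overall_rate(this_week):.0%}")
```

Product 2 doubles its application volume at a 60% approval rate, so it carries more weight in the combined figure and drags the overall rate down.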
This is a simple calculation problem:
There are 3 combinations of rolling a 4 (1+3, 3+1, 2+2): P(rolling a 4) = 3⁄36 = 1⁄12
There are 5 combinations of rolling an 8 (2+6, 6+2, 3+5, 5+3, 4+4):
Solution: P(rolling an 8) = 5⁄36
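These counts can be double-checked by enumerating all 36 outcomes of two dice. This is a minimal sketch; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def probability_of_sum(target, sides=6):
    """Exact probability that two fair dice add up to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = sum(1 for a, b in outcomes if a + b == target)
    return Fraction(favorable, len(outcomes))


print(probability_of_sum(4))  # 1/12
print(probability_of_sum(8))  # 5/36
```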
Behavioral questions are discussion-based, and they are designed to understand if you are the right culture fit for a position. These questions also dig into your past experiences and soft skills. Your responses should reference your work and impact.
Before you jump into an answer, work backwards. The best techniques really depend on the data being conveyed. For example, you might choose a donut chart if you’re conveying percentages. Then, provide some examples of your favorite visualizations you have created, what was unique about them, and what techniques you used.
For any product-related role, expect a question like this. First, you want to provide an overview of the product: what it is, key features, etc. Then, explain the problems the product solves for the user (which is you in this case). Finally, explain why the product solves the problem better than competitors.
If you are given unclear directions, it may be because the PM is not sure how to proceed. First, you might ask some clarifying questions like:
Once you have more information, you can create a plan and run it by the PM, asking for feedback, suggestions or the final green light. A good PM might use your guiding questions to clarify with other stakeholders, so being comprehensive in your probing can create a better result for internal and external partners.
Interviewers ask this question to gauge your passion for analytics and product. To answer, you might talk about:
With this question you might talk about performing a data quality assessment and analyzing particular features of the dataset. Some metrics you might be interested in include:
Expect a variation of this question. Interviewers ask it to assess your drive and ambition, as well as how fast you can adapt to new situations.
Start with an overview of what you would need. What information would you gather in the first 30 days? How would you familiarize yourself with the product? Then, think about where you might be able to add the most value. If you have deep experience in churn analysis, you could describe jumping into details like analyzing churn, developing predictive models for churn, and identifying opportunities to reduce it.