Interview Query

Top 50 Machine Learning Interview Questions for 2022

Check out 50+ machine learning interview questions and learn more about machine learning interviews


Most machine learning interviews are tough to pass. Here are 50+ questions to help you prepare for your next data science or machine learning interview.

Most machine learning questions assess two things:

  1. Your past experience working with machine learning.

  2. Your ability to recall concepts and apply them toward a solution.

Therefore, many of these questions are designed to test your knowledge (e.g., definition-based and theoretical questions) as well as your ability to apply ML theory toward business goals.

In this guide, we’ll focus on the main types of machine learning questions, including algorithm and theory questions, applied modeling questions, and machine learning system design questions.

How to Prepare for the Machine Learning Interview

Machine learning interview questions follow a couple of patterns. We’ve broken them down into six different types of questions.

1. Different machine learning algorithms and theory

These questions assess your working knowledge of algorithm fundamentals. Often, they’re posed as comparison questions.

Sample question: What is the difference between a parametric learning algorithm vs a non-parametric learning algorithm?

2. Machine learning case studies

Machine learning case studies ask you to walk the interviewer through building a model and the tradeoffs you can make along the way.

Sample question: How would you build a model for Product X?

3. Applied modeling questions

Applied modeling questions take machine learning concepts and ask how they could be applied to solve a specific problem.

These problems differ slightly from case studies in that they focus more on machine learning theory than on a business case.

For example: You’re given a model with 90% accuracy, should you deploy it?

4. Machine learning system design

System design questions look at the design and architecture of recommendation systems, machine learning models, and the concepts behind scaling these systems.

Sample question: How would you build a Twitter-style social media feed to display relevant posts to users?

5. Recommendation and search engines

These questions are often posed like case studies, but they’re specific to recommendation and search engines. They’re very common in machine learning interviews.

Sample question: How would you build a recommendation engine to recommend news to users on Google?

6. Writing machine learning algorithms from scratch

These questions ask you to code machine learning algorithms from scratch without the use of helper packages. For example, you might be asked to re-create an algorithm from Scikit-learn or NumPy from the ground up.

Sample question: Given a list of tuples representing coordinates on a 2-D plane, write a function to compute the maximum gradient descent coefficient.

Now we’ll take a look at machine learning questions in each of these categories with hints and sample solutions.

Machine Learning Algorithms Interview Questions

Machine learning algorithms questions assess your conceptual knowledge of machine learning. Companies mostly ask these questions of machine learning and deep learning specialists who will focus on building and training models.

Algorithm questions can be asked in many different forms, but three of the most common are:

  • Comparing differences between algorithms
  • Identifying similarities between algorithms
  • Definitions of algorithm terms

Why do they get asked?

Algorithm interview questions test your foundational knowledge. For example, a common question like the bias/variance tradeoff helps the interviewer gauge how deep your knowledge of the concept truly is, as well as your ability to communicate complex ideas.

Q1. You’re asked to build a model to predict booking prices on Airbnb. Which model would perform better, linear regression or random forest regression?

With a question like this, you should define both, and then explain your reasoning for your solution.

Random forest regression is based on the ensemble machine learning technique of bagging. The two key concepts of random forests are:

  1. Random sampling of training observations when building trees.
  2. Random subsets of features for splitting nodes.

Compared to linear regression, random forest can handle missing values and high-cardinality features well, and it is less affected by outliers. Random forest will also tend to perform better with categorical predictors.

Linear regression, on the other hand, is the standard regression technique in which relationships are modeled using a linear predictor function, the most common example being y = Ax + b. Linear regression models are often fitted using the least-squares approach.

There are also four main assumptions in linear regression:

  • A normal distribution of error terms
  • Independence of the error terms
  • Mean residuals of zero with constant variance (homoscedasticity)
  • No high correlation (multicollinearity) between the features
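To make the comparison concrete, here is a minimal sketch using scikit-learn, with synthetic data standing in for real Airbnb listings (the feature names and data-generating process are made up for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Synthetic "booking price" data: price driven linearly by listing size,
# plus a non-linear neighborhood effect that a linear model can't capture.
rng = np.random.default_rng(42)
size = rng.uniform(20, 120, 500)
neighborhood = rng.integers(0, 5, 500)
price = 50 + 3 * size + 40 * (neighborhood == 2) + rng.normal(0, 10, 500)

X = np.column_stack([size, neighborhood])

linear = LinearRegression().fit(X, price)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, price)

print("linear R^2:", linear.score(X, price))
print("forest R^2:", forest.score(X, price))
```

On data like this, the random forest can pick up the categorical neighborhood effect via splits, while linear regression treats the neighborhood code as a single numeric slope.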

Q2. What is bias in a model?

Bias is the amount by which our predictions are systematically off from the target. It is a measure of how “inflexible” the model is.

Q3. What is variance in a model?

Variance is the measure of how much the prediction would vary if the model were trained on a different dataset drawn from the same population. It can also be thought of as the “flexibility” of the model.

Q4. What is regularization?


Regularization is the act of modifying our objective function by adding a penalty term, to reduce overfitting.
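A minimal sketch of L2 (ridge) regularization using its closed-form solution; the data here is synthetic, and the point is only that a larger penalty shrinks the coefficients:

```python
import numpy as np

# Ridge regression: add an L2 penalty term lambda * ||beta||^2 to least squares.
# Closed form: beta = (X^T X + lambda * I)^-1 X^T y
def ridge_coefficients(X, y, lam):
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, -1.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.1, 100)

# As the penalty grows, coefficients shrink toward zero, reducing overfitting.
small = ridge_coefficients(X, y, lam=0.01)
large = ridge_coefficients(X, y, lam=100.0)
print(np.linalg.norm(small), np.linalg.norm(large))
```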

Q5. What is gradient descent?

Gradient descent is a method of minimizing the cost function. The form of the cost function will depend on the type of supervised model.

When optimizing our cost function, we compute the gradient, which points in the direction of steepest ascent. To find the minimum, we repeatedly update our parameters (beta) by stepping in the opposite direction of the gradient, proportional to its magnitude.
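As a sketch, here is gradient descent minimizing the MSE cost of a simple linear regression; the learning rate and step count are illustrative choices:

```python
import numpy as np

# Minimize the MSE cost for y ≈ beta0 + beta1 * x by stepping against the gradient.
def gradient_descent(x, y, learning_rate=0.1, steps=2000):
    beta0, beta1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        residuals = (beta0 + beta1 * x) - y
        grad_b0 = (2 / n) * residuals.sum()        # d(cost)/d(beta0)
        grad_b1 = (2 / n) * (residuals * x).sum()  # d(cost)/d(beta1)
        beta0 -= learning_rate * grad_b0           # step opposite the gradient
        beta1 -= learning_rate * grad_b1
    return beta0, beta1

x = np.linspace(0, 1, 50)
y = 4.0 + 2.5 * x          # noiseless line: y = 4 + 2.5x
b0, b1 = gradient_descent(x, y)
print(b0, b1)              # approaches 4.0 and 2.5
```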

Q6. How do you interpret linear regression coefficients?

Interpreting linear regression coefficients is much simpler than interpreting logistic regression coefficients. The regression coefficient signifies how much the mean of the dependent variable changes, given a one-unit shift in the corresponding independent variable, holding all other variables constant.

Q7. What is maximum likelihood estimation?

Maximum likelihood estimation is where we find the distribution that is most likely to have generated the data. To do this, we estimate the parameter theta that maximizes the likelihood function evaluated at x.

Q8. What is linear discriminant analysis?

LDA is a predictive modeling algorithm for multi-class classification. LDA will compute the directions that will represent the axes that maximize the separation between classes.

Q9. What’s the difference between precision and recall?

Recall: What proportion of actual positives was identified correctly?

Precision: What proportion of positive identifications was actually correct?
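These two definitions can be computed directly from confusion-matrix counts; the labels below are made up for illustration:

```python
# Precision = TP / (TP + FP); Recall = TP / (TP + FN)
def precision_recall(actual, predicted):
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp / (tp + fp), tp / (tp + fn)

actual    = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]  # 2 TP, 1 FP, 2 FN
p, r = precision_recall(actual, predicted)
print(p, r)  # precision 2/3, recall 0.5
```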

Q10. What is the intuition behind F1 score?


The intuition is that we’re taking the harmonic mean of precision and recall. In a scenario where classes are imbalanced, we’re likely to have precision extremely high and recall extremely low, or vice versa. This will be reflected in our F1 score, since the lower of the two metrics drags the F1 score down.
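A quick sketch of why the harmonic mean behaves this way; the precision/recall values are illustrative:

```python
# The harmonic mean punishes imbalance between precision and recall:
# a high arithmetic mean can hide a terrible recall, but the F1 score cannot.
def f1_score(precision, recall):
    return 2 * precision * recall / (precision + recall)

# Extreme imbalance: great precision, terrible recall.
print(f1_score(0.95, 0.10))   # dragged toward the lower metric
print((0.95 + 0.10) / 2)      # the arithmetic mean hides the problem

# Balanced metrics produce a similar F1.
print(f1_score(0.80, 0.80))
```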

Q11. Explain what GloVe embeddings are.

Rather than using contextual words directly, we calculate a co-occurrence matrix of all words. GloVe takes local context into account by counting co-occurrences within a fixed window size. The model is then trained to predict the co-occurrence ratios between words.

GloVe will learn this matrix and train word vectors that predict co-occurrence ratios. Loss is weighted by word frequency.

Q12. How would you prevent overfitting in a deep learning model?

You can reduce overfitting by training the network on more examples, or by reducing the complexity of the network.

The benefit of very deep neural networks is that their performance continues to improve as they are fed larger and larger datasets. A model trained on a near-infinite number of examples will eventually plateau in terms of what the capacity of the network is capable of learning.

Q13. How to extract semantics from a body of text?

Use named entity recognition techniques or use specific packages to measure cosine similarity and overlap.

Q14. Describe a situation where you would use MSE as a measure of quality?

Mean Square Error (MSE) is defined as the mean, or average, of the squares of the differences between actual and estimated values.

We would use MSE when evaluating the predictive quality of a regression model.

Q15. Would an additional feature improve GBM or Logistic Regression more?

Adding an additional feature does not necessarily improve the performance of GBM or logistic regression, because adding new features without a multiplicative increase in the number of observations leads to a situation whereby we have a complex dataset (a dataset with many features) and a small number of observations.

Q16. How do you optimize model parameters during model building?

Model parameter optimization is the process of finding the best values for a model’s tunable parameters. Hyperparameters can be tuned using the grid search or random search algorithms.
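As a sketch of grid search using scikit-learn’s GridSearchCV on synthetic data (the alpha grid is an arbitrary example), the search exhaustively tries each candidate value with cross-validation:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, 200)

# Try every candidate alpha with 5-fold cross-validation and keep the best.
grid = GridSearchCV(Ridge(), param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Random search (scikit-learn’s RandomizedSearchCV) samples the parameter space instead, which often scales better when the grid is large.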

Q17. What is the relationship between PCA and LDA?

Both techniques are used for dimensionality reduction. PCA is unsupervised while LDA is supervised.

Q18. What is a difference between supervised and unsupervised learning?

In supervised learning, input data is provided to the model along with the output. In unsupervised learning, only input data is provided to the model.

The goal of supervised learning is to train the model so that it can predict the output when it is given new data.

Q19. How does the support vector machine algorithm work?

Support Vector Machine is a linear model for classification and regression problems. The idea is that the algorithm creates a line or a hyperplane that separates the data into different classes.
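A minimal sketch with scikit-learn’s SVC on two toy clusters, assuming a linear kernel (the points are made up for illustration):

```python
from sklearn import svm

# Two linearly separable clusters in 2-D.
X = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 5], [5, 4]]
y = [0, 0, 0, 1, 1, 1]

clf = svm.SVC(kernel="linear")  # find the maximum-margin hyperplane
clf.fit(X, y)

preds = clf.predict([[0.5, 0.5], [4.5, 4.5]])
print(preds)  # one point per cluster
```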

Q20. List three unsupervised algorithms and three supervised algorithms.

Q21. How would you explain how the bag-of-words model works to a three year old?

Q22. How would you split your dataset into testing and training data?

Q23. In what situations would you use the k-means algorithm? In what situations would you avoid using it?

Machine Learning Case Study Questions


The machine learning case study requires a candidate to evaluate and explain a particular part of the model building process. A common case study problem would be for a candidate to explain how they would build a model for a product that exists at the company.

For the machine learning lifecycle, there are six steps that we should touch on from beginning to end:

  1. Data Exploration & Pre-Processing
  2. Feature Selection & Engineering
  3. Model Selection
  4. Cross Validation
  5. Evaluation Metrics
  6. Testing and Roll Out

Need some help? Check out our machine learning case study in our interview course.

Q1. Describe how you would build a model to predict Uber ETAs after a rider requests a ride.

Many times, this can be scoped down into a specific portion of the model building process. For instance, taking the example above, we could instead reword the problem to:

  • How would you evaluate the predictions of an Uber ETA model?
  • What features would you use to predict the Uber ETA for ride requests?

The main point of these case questions is to determine your knowledge of the full modeling lifecycle and how you would apply it to a business scenario.

We want to approach the case study with an understanding of what the machine learning and modeling lifecycle should look like from beginning to end, as well as creating a structured format to make sure we’re delivering a solution that explains our thought process thoroughly.

Q2. You’re tasked with building a model to predict if a driver on Uber will accept a ride request or not.

What algorithm would you use to build this model? What are the tradeoffs between different classifiers?

You can see a full mock interview with a solution for this question on YouTube.

Q3. Let’s say that you work at a bank that wants to build a model to detect fraud on the platform.

The bank wants to implement a text messaging service in addition that will text customers when the model detects a fraudulent transaction in order for the customer to approve or deny the transaction with a text response.

How would we build this model?

Need a hint? We know that since we’re working with fraud, there has to be a case where there either is a fraudulent transaction or there isn’t.

We can frame this problem as building a binary classifier on an imbalanced dataset.

A few considerations we have to make are:

  • How accurate is our data? Is all of the data labeled carefully? How much fraud are we not detecting if customers don’t even know they’re being defrauded?
  • What model works well on an imbalanced dataset? Generally, tree-based models come to mind.
  • How much do we care about interpretability? Building a highly accurate model for our dataset may not be the best method if we don’t learn anything from it. In the case that our customers are being compromised without us even knowing, we run into the issue of building a model that we can’t learn from and feature engineer for in the future.
  • What are the costs of misclassification? If we look at precision versus recall, we can understand which metric we care about more, given the business problem at hand.

Q4. Let’s say you have a categorical variable with thousands of distinct values, how would you encode it?

This depends on whether the problem is a regression or a classification model.

If it’s a regression model, one way would be to cluster them based on the response by working backwards. You could sort them by the response variable, and then split the categorical variables into buckets based on the grouping of the response variable. This could be done by using a shallow decision tree to reduce the number of categories.

Another way given a regression model would be to target encode them. Replace each category in a variable with the mean response given that category. Now you have one continuous feature instead of a bunch of categories.

For a binary classification, you can target encode the column by finding the conditional probability of the response variable being a one, given that the categorical column takes a particular value. Then replace the categorical column with this numerical value. For example, if you have a categorical column of cities when predicting loan defaults, and the probability that a person who lives in San Francisco defaults is 0.4, you would replace “San Francisco” with 0.4.
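A minimal sketch of target encoding with pandas, using made-up loan-default data (the cities and rates are purely illustrative):

```python
import pandas as pd

# Hypothetical loan-default data: encode `city` with the mean default rate.
df = pd.DataFrame({
    "city": ["SF", "SF", "SF", "SF", "SF", "NY", "NY", "NY", "NY", "NY"],
    "default": [1, 1, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Conditional probability of default given the city.
encoding = df.groupby("city")["default"].mean()
df["city_encoded"] = df["city"].map(encoding)
print(encoding.to_dict())
```

In practice, target encoding is usually computed on the training folds only (or with smoothing) to avoid leaking the response into the features.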

Q5. How would you develop a machine learning model to handle acronyms, so that “SVM” and “Support Vector Machine” would convey the same meaning?

Applied Modeling Interview Questions


Applied modeling questions take machine learning concepts and ask how they could be applied to solve a specific problem. These questions are a little more nuanced and require more experience, but they are great litmus tests of modeling and machine learning knowledge.

These types of questions are similar to case studies in that they are mostly ambiguous, require more contextual knowledge and information gathering from the interviewer, and are used to really test your understanding in a certain area of machine learning.

Q1. Suppose we have a binary classification model that classifies whether or not an applicant should be qualified to get a loan. Because we are a financial company we have to provide each rejected applicant with a reason why.

Given we don’t have access to the feature weights, how would we give each rejected applicant a reason why they got rejected?

Let’s pretend that we have three people: Alice, Bob, and Candace that have all applied for a loan. Simplifying the financial lending loan model, let’s assume the only features are: 

  • Total number of credit cards
  • Dollar amount of current debt
  • Credit age

Let’s say Alice, Bob, and Candace all have the same number of credit cards and credit age but not the same dollar amount of current debt.

  • Alice: 10 credit cards, 5 years of credit age, $20K of debt
  • Bob: 10 credit cards, 5 years of credit age, $15K of debt
  • Candace: 10 credit cards, 5 years of credit age, $10K of debt

Alice and Bob get rejected for a loan, but Candace gets approved. Given this scenario, we can logically point to the fact that Candace’s $10K of debt has swung the model to approve her for a loan.

How did we reason this out? If the sample size analyzed was instead thousands of people who had the same number of credit cards and credit age with varying levels of debt, we could figure out the model’s average loan acceptance rate for each numerical amount of current debt.

Then we could plot these on a graph to model out the y-value, average loan acceptance, versus the x-value, dollar amount of current debt.

Q2. We want to build a model to predict housing prices in the city of Seattle.

We’ve scraped 100K sold listings over the past three years but found that around 20% of the listings are missing square footage data.

How do we deal with the missing data to construct our model?

This is a pretty classic modeling interview question. Data cleanliness is a well-known issue within most datasets when building models. Real-life data is messy, missing, and almost always needs to be wrangled with.

The key to answering this interview question is to probe and ask questions to learn more about the specific context. For example, we should clarify if there are any other features missing data in the listings.

If we’re only missing data within the square footage column, one method is to build models on different sizes of training data, for example by dropping the incomplete listings.

Now, what’s the second method?

Q3. Let’s say we have 1 million app rider journey trips in the city of Seattle. We want to build a model to predict ETA after a rider makes a ride request.

How would we know if we have enough data to create an accurate enough model?

Collecting data can be costly. This question assesses the candidate’s ability to practically approach the problem of evaluating a model.

Specifically, what other kinds of information should we look into when we’re given a dataset and asked to build a model with a “pretty good” accuracy rate?

If this is the first version of a model, how would we know whether we should put more effort into iterating on it? And how can we evaluate the cost of that extra effort?

There are a couple of factors to look into.

1. Look at the feature set size to training data size ratio. If we have an extremely high number of features compared to data points, then the model will be prone to overfitting and inaccuracy.

2. Train a model on a portion of the data (the training set) and measure its performance on a validation set, otherwise known as using a holdout set. We hold back a subset of the data from the training of the model, and then use this holdout set to check model performance and get a baseline level.

Q4. Let’s say you’re a data scientist at Facebook. How would you evaluate the effect on engagement of teenage users when their parents join Facebook?

See a solution for this machine learning interview question on YouTube.

Q5. What conclusions can be drawn if the area under the ROC-curve is 0.5?

When AUC = 0.5, the classifier is not able to distinguish between the positive and negative classes, meaning the classifier is predicting either a random class or a constant class for all the data points.

Q6. If two predictors are highly correlated, what is the effect on the coefficients in the logistic regression?

Multicollinearity is a statistical phenomenon in which two or more predictor variables in a multiple logistic regression model are highly correlated or associated. Multicollinearity does not reduce the predictive power or reliability of the model as a whole; however, it inflates the variance of the individual coefficient estimates, making them unstable and difficult to interpret.

Q7. Let’s say you want to predict the probability of a flight delay, but there are flights with delays of up to 12 hours that are really messing up your model. How would you fix this issue?

One way is to create groups for the output class.

  • Delays of less than 2 hours
  • Delays between 2 and 10 hours
  • Delays over 10 hours

That way, the outliers are absorbed into a specific class of a classification problem instead of skewing a regression.

Another way would be to just filter them out from the analysis.
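The bucketing approach above can be sketched as a simple mapping function; the bucket boundaries and labels are illustrative:

```python
# Bucket continuous delays (in hours) into classes so extreme delays
# become one category instead of outliers in a regression target.
def delay_class(delay_hours):
    if delay_hours < 2:
        return "under_2h"
    elif delay_hours <= 10:
        return "2h_to_10h"
    else:
        return "over_10h"

delays = [0.5, 1.9, 3.0, 12.0]
print([delay_class(d) for d in delays])
```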

Machine Learning System Design Interview Questions

Machine learning system design interview questions ask about the design and architecture of machine learning applications. Essentially, these questions test your ability to solve the problem of deploying a machine learning model that meets specific business requirements.

To answer machine learning system design questions, you should follow a framework:

  1. Set the problem statement.
  2. Architect the high-level infrastructure.
  3. Explain how data moves from one part to the next.
  4. Understand how to measure the performance of the machine learning models.
  5. Deal with common problems around scale, reliability, and deployment.

Q1. How would you build a machine learning system to generate Spotify’s discover weekly playlist?

See a solution for this question on Interview Query.

Q2. How would you build a restaurant recommender on Facebook? Start with how you would go about getting this data, then talk about how you would build it.

What are some downfalls about adding this feature to Facebook?

See a solution for this question on Interview Query.

Q3. Build a video recommendation system for YouTube users. We want to maximize user engagement and recommend new types of content to users.

You should start to answer this question by outlining metrics and design decisions.

Offline Metrics

  • Precision (the fraction of relevant instances among the retrieved instances)
  • Recall (the fraction of the total amount of relevant instances that were actually retrieved)
  • Ranking Loss
  • Logloss

Online Metrics

Use A/B testing to compare:

  • Click-through rate (CTR)
  • Watch time
  • Conversion rates


Other considerations:

  • User behavior is generally unpredictable, and videos can go viral during the day. Ideally, we want to retrain many times during the day to capture temporal changes.
  • For every user who visits the homepage, the system will have to recommend 100 videos. The latency needs to be under 200ms, ideally sub-100ms.
  • For online recommendations, it’s important to find the balance between exploration and exploitation. If the model over-exploits historical data, new videos might not get exposed to users. We want to balance relevance and fresh new content.

Q4. How would you build a Twitter-style social media feed to display relevant posts to users?

Q5. Build an advertising bidding system that presents personalized ads to users.

Q6. Design a machine learning system that can identify fraudulent transactions.

Recommendation and Search Engines Questions

Recommendation and search engine questions are technically a combination of case study and system design questions. But they’re asked so frequently that it’s important to treat them as their own category.

Q1. Let’s say that you’re working on a job recommendation engine.

With this question, let’s assume we have access to all user LinkedIn profiles, a list of jobs each user applied to, and answers to questions that the user filled in about their job search.

Using this information, how would you build a job recommendation feed? What would the job recommendation workflow look like?

Can we lay out the steps the user takes in the actual recommendation of jobs, so that we understand what a potential dataset would first look like?

For this problem, we have to understand what our dataset consists of before we can build a model for recommendations. Moreover, we need to understand what a recommendation feed might look like for the user.

For example, we might expect that the user could go to a tab or open up the mobile app and then view a list of recommended jobs, sorted with the highest recommendations at the top.

We can use either an unsupervised or a supervised model. For an unsupervised model, we could use a nearest-neighbors or collaborative filtering algorithm on features from users and jobs. But if we want more accuracy, we would likely go with a supervised classification algorithm.
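As a sketch of the nearest-neighbors idea, we can rank jobs by the cosine similarity between a user’s feature vector and each job’s feature vector; all of the names and weights below are made up for illustration:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical feature vectors, e.g., [python, ml, sales] skill weights.
user = np.array([1.0, 0.8, 0.0])
jobs = {
    "ml_engineer":  np.array([0.9, 1.0, 0.0]),
    "data_analyst": np.array([0.7, 0.3, 0.1]),
    "sales_rep":    np.array([0.0, 0.1, 1.0]),
}

# Rank jobs by similarity to the user profile, highest first.
ranked = sorted(jobs, key=lambda j: cosine_similarity(user, jobs[j]), reverse=True)
print(ranked)
```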

Q2. How would you build a recommendation engine to recommend news to users on Google?

Q3. How would you evaluate a new search engine that your co-worker built?

Q4. How would you build the recommendation algorithm for type-ahead search for Netflix?

With this question, let’s think about a simple use case to start out with. Let’s say that we type in the word “hello” for the beginning of a movie.

If we typed in h-e-l-l-o, then a suitable suggestion might be a movie like “Hello Sunshine” or a Spanish movie named “Hola”.

Machine Learning Algorithm Coding Questions

Coding machine learning algorithms from scratch is increasingly common in interviews, especially for specialized subject areas like computer vision. These questions are framed around re-deriving the machine learning algorithms encapsulated in scikit-learn or other packages.

The interviewer is mainly testing a raw understanding of coding optimizations, performance, and memory on existing machine learning algorithms. Additionally, these questions test whether the candidate really understands the underlying algorithm: could they build it using nothing but the NumPy package?

Generally, these types of machine learning interview questions are pretty controversial. They’re hard to complete within a specific timeframe and often vague in how they’re graded.

Practice with these Python machine learning questions, including sample questions and an overview of the Python machine learning interview process.

Q1. Given a dictionary with keys of letters and values of a list of letters, write a function closest_key to find the key with the input value closest to the beginning of the list.


dictionary = {
    'a': ['b', 'c', 'e'],
    'm': ['c', 'e'],
}
input = 'c'

closest_key(dictionary, input) -> 'm'

‘c’ is at distance 1 from the start of the list for ‘a’ and at distance 0 for ‘m’. Hence, the closest key for ‘c’ is ‘m’.

Hint: Is your computed distance always positive? Negative values for distance (for example between ‘c’ and ‘a’ instead of ‘a’ and ‘c’) will interfere with getting an accurate result.
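One possible solution sketch (not the only accepted approach): scan every key, look up the index of the input value in its list, and keep the key with the smallest index. Using `list.index` keeps the distance non-negative by construction:

```python
def closest_key(dictionary, target):
    best_key, best_distance = None, float("inf")
    for key, letters in dictionary.items():
        if target in letters:
            distance = letters.index(target)  # index is always >= 0
            if distance < best_distance:
                best_key, best_distance = key, distance
    return best_key

dictionary = {'a': ['b', 'c', 'e'], 'm': ['c', 'e']}
print(closest_key(dictionary, 'c'))  # 'm'
```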

Q2. You’re given two words begin_word and end_word which are elements of word_list.

Write a function shortest_transformation to find the length of shortest transformation sequence from begin_word to end_word through the elements of word_list.

Note that only one letter can be changed at a time and each transformed word in the list must exist.


begin_word = "same",
end_word = "cost",
word_list = ["same","came","case","cast","lost","last","cost"]

shortest_transformation(begin_word, end_word, word_list) -> 5

# since the transformation sequence is ['same','came','case','cast','cost'], which is five elements long
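One possible solution sketch is a breadth-first search over words that differ by exactly one letter (other approaches can also work):

```python
from collections import deque

def shortest_transformation(begin_word, end_word, word_list):
    words = set(word_list)
    queue = deque([(begin_word, 1)])  # (current word, sequence length so far)
    visited = {begin_word}
    while queue:
        word, length = queue.popleft()
        if word == end_word:
            return length
        for candidate in words - visited:
            # Words are adjacent if they differ by exactly one letter.
            if sum(a != b for a, b in zip(word, candidate)) == 1:
                visited.add(candidate)
                queue.append((candidate, length + 1))
    return 0  # no transformation sequence exists

print(shortest_transformation(
    "same", "cost",
    ["same", "came", "case", "cast", "lost", "last", "cost"],
))  # 5
```

Because BFS explores sequences in order of increasing length, the first time we dequeue `end_word` we have the shortest sequence.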

Q3. Write a function to build K-NN from scratch on a sample input of a list of lists of integers.

Q4. Given a list of tuples, write a function to compute the maximum gradient descent coefficient.

Q5. Find the shortest path out of a maze represented by a list of tuples.

More Machine Learning Interview Resources

Become an Interview Query premium member for access to 50+ real machine learning interview questions with solutions. Or take a look at our data science interview course, which features sections in machine learning and machine learning system design.

If you’re looking for company-specific resources, check out our guides to Amazon Machine Learning Questions, Google Machine Learning Questions, and Facebook Machine Learning Questions.