Table of Contents
Machine learning and modeling interview questions cover some of the core fundamentals of data science. Because machine learning is a rapidly evolving field, new techniques and tools are released constantly. Therefore, as a data scientist, it’s important to keep up with the latest trends and technologies.
Modeling interview questions and the machine learning interview are often a proxy for a candidate’s experience in the field, as well as a way of determining to what degree a data scientist or machine learning engineer can critically apply theory toward a business goal.
As we go through each framework, interview question, and machine learning concept, it’s worth remembering that machine learning and modeling interview questions are ultimately indicative of two things:
- A candidate’s past experience working with machine learning.
- The capacity to memorize concepts and apply them towards solutions the interviewer is looking for.
Not quite sure if machine learning is right for you? Check out our Software Engineering vs Machine Learning article today.
How much machine learning do I need to know?
This is the question I have been asked most often since starting Interview Query. Why? Because there is an effectively infinite amount of knowledge you can consume in machine learning.
The very definitions of machine learning and AI imply as much.
Machine learning is a technology that is breaking new ground every day. Technically, it should be improving faster and faster, given that machine learning and artificial intelligence are, by definition, supposed to be learning themselves.
However, machine learning tested in an interview is completely different from how it is generally framed in real practice. It is also different depending on the type of role that you’re interviewing for.
A data scientist is not expected to have the same depth of machine learning knowledge as a machine learning engineer or research scientist. This varying expectation, however, can be confounded by what the employer thinks a data scientist does versus a machine learning engineer. For example, a role may be titled data scientist but actually be designed around building machine learning infrastructure full-time.
Let’s look at how much each role and position needs to know about machine learning interview questions.
Data Scientists
The data scientist role is primarily responsible for solving business problems with data: pulling it, munging it, and generating insights from it. Data scientists will explore all aspects of the business and work cross-functionally with different teams to do everything from developing dashboards for reporting and exploring analytics for insights, to building models.
The last part of building models is tricky in determining how much machine learning a data scientist should know. Many data science roles that are focused on analytics don’t require any machine learning at all, while some roles are essentially machine learning engineers with a data scientist title. Generally, the main way to understand the difference is to ask everyone at the company about the day-to-day responsibilities of the role that you’re interviewing for.
For example, if we look at the Facebook Data Scientist role, we won't see much machine learning tested in the interview.
But if we compare it with the data science role at C3.ai, we see a huge emphasis on machine learning.
Machine Learning Engineers and Data Engineers
Engineers build and deploy models, develop infrastructure to scale, and work with data scientists to understand the best use cases. They leverage data tools, programming frameworks, and data pipelines to ensure that models scale appropriately for any technical specifications.
Machine learning engineers should also have a strong knowledge of machine learning and theory, given their responsibility for building tooling and automation over the model creation, training, and evaluation life cycle.
Regular software engineers aren't expected to know too much about machine learning. But data engineers will likely need to know how to scale up data infrastructure alongside the machine learning engineers so that the models can retrieve and output the correct data points.
Research Scientists and AI Researchers
Research scientist roles typically exist for teams to break new ground with machine learning in the research domain. The level of machine learning and statistics knowledge required is usually very high.
Given these three roles, the best way to estimate how much machine learning knowledge is needed for the interview would be to first understand how embedded in machine learning your job will be. This is done with individual research on the company, position, team, and background information of your interview panel.
Types of Machine Learning Interview Questions
Machine learning interview questions follow a few recurring patterns. While they can seem abstract and overwhelming, we can break them down into six types of situational problems and case studies.
- Modeling and Machine Learning Case Study
- Recommendation and Search Engines
- Machine Learning Algorithms
- Applied Modeling
- Machine Learning System Design
- Python Machine Learning from Scratch
Modeling and Machine Learning Case Study Interview
The modeling case study requires a candidate to evaluate and explain a particular part of the model building process. A common case study problem would be for a candidate to explain how they would build a model for a product that exists at the company.
Example Question: Describe how you would build a model to predict Uber ETAs after a rider requests a ride.
Many times, this can be scoped down into a specific portion of the model building process. For instance, taking the example above, we could instead reword the problem to:
- How would you evaluate the predictions of an Uber ETA model?
- What features would you use to predict the Uber ETA for ride requests?
The main point of these case questions is to determine your knowledge of the full modeling lifecycle and how you would apply it to a business scenario.
We want to approach the case study with an understanding of what the machine learning and modeling lifecycle looks like from beginning to end, as well as with a structured format that ensures we deliver a solution explaining our thought process thoroughly.
For the machine learning lifecycle, we have around six different steps that we should touch on from beginning to end:
- Data Exploration & Pre-Processing
- Feature Selection & Engineering
- Model Selection
- Cross Validation
- Evaluation Metrics
- Testing and Roll Out
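The lifecycle above can be sketched as a short scikit-learn workflow. This is a minimal illustration on synthetic data; the dataset, feature selector, and model choice are placeholder assumptions, not a prescription for any particular case study:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# 1. Data exploration & pre-processing: a synthetic stand-in for a real dataset
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 2-3. Feature selection & engineering and model selection, bundled in a pipeline
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=10)),  # keep the 10 most predictive features
    ("model", LogisticRegression()),
])

# 4. Cross-validation on the training set
cv_scores = cross_val_score(pipeline, X_train, y_train, cv=5, scoring="f1")

# 5. Evaluation metrics on the held-out test set
pipeline.fit(X_train, y_train)
test_f1 = f1_score(y_test, pipeline.predict(X_test))

# 6. Testing and roll-out (A/B tests, monitoring) would follow before deployment
print(f"CV F1: {cv_scores.mean():.2f}, test F1: {test_f1:.2f}")
```

In an interview, walking through each numbered step while noting what you would do differently with real data is usually more valuable than the code itself.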
Try a machine learning case question on Interview Query: Bank Fraud Model
Let's say that you work at a bank that wants to build a model to detect fraud on the platform.
The bank additionally wants to implement a text messaging service that will text customers when the model detects a fraudulent transaction, so that the customer can approve or deny the transaction with a text response.
How would we build this model?
Read more about how to frame a machine learning case study in our interview course.
Check out a mock machine learning case study interview asked by Uber.
Recommendation and Search Engines Interview Questions
Recommendation and search engine questions are technically case study questions, but they are asked so frequently that it’s worth treating them as their own category.
- How would you build a recommendation engine to recommend news to users on Google?
- How would you evaluate a new search engine that your co-worker built?
In the section that follows, we've laid out an ideal answer to a recommendation and search engine type question (or, really, any machine learning case study that you're likely to encounter).
YouTube Video Recommendation
1. Problem Statement
Build a video recommendation system for YouTube users. We want to maximize user engagement and recommend new types of content to users.
2. Metrics Design and Requirements
- Precision (the fraction of relevant instances among the retrieved instances)
- Recall (the fraction of the total amount of relevant instances that were actually retrieved)
- Ranking Loss
- Use A/B testing to compare:
  - Click-through rates (CTR)
  - Watch time
  - Conversion rates
- User behavior is generally unpredictable, and videos can go viral over the course of a day. Ideally, we want to train many times during the day to capture temporal changes.
- For every user who visits the homepage, the system will have to recommend 100 videos. The latency needs to be under 200ms, ideally sub-100ms.
- For online recommendations, it’s important to find the balance between exploration vs. exploitation. If the model over-exploits historical data, new videos might not get exposed to users. We want to balance between relevance and fresh new content.
3. Multi-Stage Models
There are two stages: candidate generation and ranking. The reason for two stages is to make the system scale. It’s a common pattern that you will see in many machine learning systems.
We will explore the two stages in the section below:
- The candidate model will find the relevant videos based on user watch history and the type of videos the user has watched.
- The ranking model will optimize for watch likelihood, i.e., videos with a high probability of being watched should be ranked high. This is a natural fit for the logistic regression algorithm.
Candidate Generation Model
- Each user has a list of video watches (videos, minutes_watched).
- For generating training data, we can make a user-video watch space. We can start by selecting a period of data like last month, last six months, etc. This should find a balance between training time and model accuracy.
- The candidate generation can be done by matrix factorization. The purpose of candidate generation is to generate “somewhat” relevant content to users based on their watch history. The candidate list needs to be big enough to capture potential matches for the model to perform well with desired latency.
- The ideal choice is to use collaborative algorithms because the inference time is fast and it can capture the similarity between user tastes in the user-video space.
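A minimal sketch of candidate generation by matrix factorization, in NumPy only. The toy watch matrix, factor dimension, learning rate, and the treatment of unobserved entries as weak negatives are all simplifying assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy user-video watch matrix: 1 = watched, 0 = unobserved (4 users x 5 videos).
# Treating unobserved entries as weak negatives is a simplification.
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=float)

n_users, n_videos, k = R.shape[0], R.shape[1], 2
U = rng.normal(scale=0.1, size=(n_users, k))   # user factors
V = rng.normal(scale=0.1, size=(n_videos, k))  # video factors

# Full-batch gradient descent on squared reconstruction error, with L2 regularization
lr, reg = 0.05, 0.01
mse_before = np.mean((R - U @ V.T) ** 2)
for _ in range(1000):
    err = R - U @ V.T
    U += lr * (err @ V - reg * U)
    V += lr * (err.T @ U - reg * V)
mse_after = np.mean((R - U @ V.T) ** 2)

# Candidate generation: score unwatched videos for user 0 and keep the top ones
scores = U[0] @ V.T
scores[R[0] == 1] = -np.inf            # exclude already-watched videos
candidates = np.argsort(scores)[::-1][:2]
```

At production scale the same idea runs over billions of entries, typically with alternating least squares or approximate nearest-neighbor lookups rather than this dense toy loop.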
During inference, the ranking model receives a list of video candidates given by the Candidate Generation Model. For each candidate, the ranking model estimates the probability of that video being watched. It then sorts the video candidates based on the probability and returns the list to the upstream process.
- We can use user watch history data. Normally, the ratio of watched to not-watched is about 2:98, so most of the time the user does not watch a recommended video.
At the beginning, it’s important to start with a simple model, since we can add complexity later.
- A fully connected neural network is simple yet powerful for representing non-linear relationships and it can handle big data.
- We start with a fully connected neural network with sigmoid activation at the last layer. The reason for this is that the sigmoid function returns values in the range [0,1]. Therefore, it’s a natural fit for estimating probability.
- For deep learning architecture, we can use relu (Rectified Linear Unit) as an activation function for hidden layers. It’s very effective in practice.
- The loss function can be cross-entropy loss.
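As a sketch, the forward pass described above (ReLU hidden layer, sigmoid output, cross-entropy loss) might look like the following in plain NumPy. The layer sizes, batch, and features are arbitrary placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Rectified Linear Unit activation for hidden layers."""
    return np.maximum(0.0, z)

def sigmoid(z):
    """Squashes logits into (0, 1), a natural fit for probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Fully connected net: one ReLU hidden layer, sigmoid output = watch probability."""
    h = relu(X @ W1 + b1)
    return sigmoid(h @ W2 + b2).ravel()

def cross_entropy(p, y, eps=1e-12):
    """Binary cross-entropy loss between predicted probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Toy batch: 8 candidate videos with 5 features each, binary watched labels
X = rng.normal(size=(8, 5))
y = rng.integers(0, 2, size=8)

W1 = rng.normal(scale=0.1, size=(5, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)

probs = forward(X, W1, b1, W2, b2)   # one watch probability per candidate
loss = cross_entropy(probs, y)
ranking = np.argsort(probs)[::-1]    # candidates sorted by estimated watch probability
```

A real system would train these weights with backpropagation in a deep learning framework; the point here is only how the sigmoid output and cross-entropy loss fit together.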
4. Calculation & estimation
For the sake of simplicity, we can make these assumptions:
- Video views per month are 150 billion.
- 10% of videos watched are from recommendations, a total of 15 billion videos.
- On the homepage, a user sees 100 video recommendations.
- On average, a user watches two videos out of 100 video recommendations.
- If a user does not click on or watch a recommended video within a given time frame, e.g., 10 minutes, then it counts as a missed recommendation.
- The total number of users is 1.3 billion.
- For one month, we collect 15 billion positive labels and, per the 2:98 watched ratio, roughly 735 billion negative labels (about 750 billion rows in total).
- Generally, we can assume that for every data point we collect, we also collect hundreds of features. For simplicity, assume each row takes 500 bytes to store. In one month, we need about 750 billion rows.
- Total size per month: 500 * 750 * 10**9 = 3.75 * 10**14 bytes, or roughly 375 terabytes; a full year is on the order of 4.5 petabytes. To save costs, we can keep the last six months or one year of data in the data lake and archive older data in cold storage.
- Assume that every second we have to generate a recommendation request for 10 million users. Each request will generate ranks for 1k-10k videos.
- Support 1.3 billion users
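The estimates above are easy to sanity-check in a few lines. Note that with a 2:98 watched ratio, 15 billion positives imply roughly 735 billion negatives, so around 750 billion rows and 375 TB per month under the 500-bytes-per-row assumption:

```python
# Monthly label volume, following the assumptions listed above
video_views = 150e9              # video views per month
positives = 0.10 * video_views   # 10% of views come from recommendations

# Users watch ~2 of every 100 recommendations, so watched:not-watched is about 2:98
negatives = positives * (98 / 2)
total_rows = positives + negatives

bytes_per_row = 500              # hundreds of features per row, simplified to 500 bytes
total_bytes = total_rows * bytes_per_row
print(f"rows/month: {total_rows:.3g}, storage/month: {total_bytes / 1e12:.0f} TB")
```

Being able to reproduce rough numbers like these on a whiteboard is exactly what the "calculation & estimation" step of a system design interview is testing.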
5. System design
- User Watch History stores which videos were watched by a particular user over time.
- Search Query DB stores historical queries that users have searched in the past. User/Video DB stores a list of users and their profiles, along with video metadata.
- User historical recommendations stores past recommendations for a particular user.
- Resampling data: It’s part of the pipeline to help scale the training process by down-sampling negative samples.
- Feature pipeline: A pipeline program to generate all required features for training a model. It’s important for feature pipelines to provide high throughput, as we require this to retrain models multiple times. We can use Spark or Elastic MapReduce or Google DataProc.
- Model Repos: Storage for all models; using AWS S3 is a popular option.
- In practice, during inference, it’s desirable to be able to get the latest model near real-time. One common pattern for the inference component is to frequently pull the latest models from Model Repos based on timestamp.
Huge data size
- Solution: Pick one month or six months of recent data.
Imbalanced data
- Solution: Perform random negative down-sampling.
High availability
- Solution 1: Use model-as-a-service, with each model running in a Docker container.
- Solution 2: Use Kubernetes to auto-scale the number of pods.
When a user requests a video recommendation, the Application Server requests Video candidates from the Candidate Generation Model. Once it receives the candidates, it then passes the candidate list to the ranking model to get the sorting order. The ranking model estimates the watch probability and returns the sorted list to the Application Server. The Application Server then returns the top videos that the user should watch.
6. Scale the design
- Scale out (horizontal) multiple Application Servers and use Load Balancers to balance loads.
- Scale out (horizontal) multiple Candidate Generation Services and Ranking Services.
- It’s common to deploy these services in a Kubernetes Pod and take advantage of the Kubernetes Pod Autoscaler to scale out these services automatically.
- In practice, we can also use Kube-proxy so the Candidate Generation Service can call Ranking Service directly, reducing latency even further.
You can learn more about the Machine Learning System Design interview with our Machine Learning System Design course on Interview Query.
Try out solving a recommendation feed interview question asked by LinkedIn: Job Recommendations
Let's say that you're working on a job recommendation engine. You have access to all user Linkedin profiles, a list of jobs each user applied to, and answers to questions that the user filled in about their job search.
Using this information, how would you build a job recommendation feed?
Machine Learning Algorithms Interview Questions
These types of questions exist to probe your conceptual knowledge of machine learning in depth. Companies ask them mostly of machine learning and deep learning specialists who would focus on building and training machine learning models.
These types of questions would be something akin to “How does random forest generate trees?” or “What’s the difference between SVM and Gradient Boosting Trees?”.
For example, a common question asked within the machine learning algorithms interview questions is on the bias/variance tradeoff.
What is bias in a model? Bias is the amount by which our predictions are systematically off from the target. It is a measure of how “inflexible” the model is.
What is variance in a model? Variance is a measure of how much the predictions would vary if the model were trained on a different dataset drawn from the same population. It can also be thought of as the “flexibility” of the model.
Generally, what happens to bias and variance as we increase the complexity of the model? Bias decreases and variance increases.
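One way to see this tradeoff concretely is to fit polynomials of increasing degree to noisy data. This NumPy sketch uses a made-up true function and noise level; training error falls with complexity (bias shrinks), while the high-degree fit chases the noise (variance grows):

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mse(x, y, degree):
    """Fit a polynomial of the given degree and return its training MSE."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

# Noisy samples from a sine curve (the "true" function is invented for the demo)
x = np.linspace(0, 1, 30)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

# As complexity (degree) grows, training error falls: bias shrinks. But the
# high-degree fit chases the noise, so its predictions would swing wildly on a
# fresh dataset drawn from the same population: variance grows.
errors = {degree: train_mse(x, y, degree) for degree in (1, 3, 9)}
```

Evaluating the same fits on a held-out sample would show the degree-9 model's error going back up, which is the variance side of the tradeoff.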
Try out solving a machine learning algorithms interview question asked by Airbnb: Booking Regression
It’s clear that these questions are meant to test whether candidates understand the situations in which they would apply different types of models. They’re also mostly definition-based questions, so if you memorize a wide range of machine learning definitions and applications, you will usually do okay in this part.
Applied Modeling Interview Questions
Applied modeling questions take machine learning concepts and ask how they could be applied to solve a particular problem. These questions are a little more nuanced and require more experience, but they are great litmus tests of modeling and machine learning knowledge.
An example question would be: you’re given a model with 90% accuracy; should you deploy it?
These types of questions are similar to case studies in that they are mostly ambiguous, require more contextual knowledge and information gathering from the interviewer, and are used to really test your understanding in a certain area of machine learning.
Try an example modeling interview question from Zillow: Missing Housing Data
We want to build a model to predict housing prices in the city of Seattle. We've scraped 100K sold listings over the past three years but found that around 20% of the listings are missing square footage data.
How do we deal with the missing data to construct our model?
Machine Learning System Design
Machine learning system design interview questions cover the higher-level design and architecture of recommendation systems, the deployment of machine learning models, and concepts around scaling these systems. At their core, machine learning system design problems are about understanding how to deploy machine learning models in a way that satisfies all aspects of the business requirements.
- How would you build a Twitter-style social media feed to display relevant posts to users?
- How would you build an advertising bidding system that presents personalized ads to users?
- How would you design a machine learning system that can identify fraudulent transactions?
Preparing for the machine learning system design interview requires understanding a multi-step process of:
- Setting the problem statement.
- Architecting the high-level infrastructure.
- Explaining how data moves from one part to the next.
- Understanding how to measure the performance of machine learning models.
- Dealing with common problems around scale, reliability, and deployment.
Try the machine learning system design problem asked by Netflix on Interview Query: Type-Ahead Search
How would you build the recommendation algorithm for type-ahead search for Netflix?
Python Machine Learning Interview Questions
Coding machine learning algorithms from scratch is increasingly common in interviews. These questions ask candidates to derive the machine learning algorithms encapsulated in scikit-learn or other packages from scratch.
The interviewer is mainly testing a raw understanding of coding optimizations, performance, and memory in existing machine learning algorithms. It also tests whether the candidate REALLY understands the underlying algorithm: could they build it using nothing but the NumPy package?
Generally, these types of machine learning interview questions are pretty controversial. They're hard to do within a specific timeframe and generally pretty vague in how they're graded.
Write a function to build K-NN from scratch on a sample input of a list of lists of integers.
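One possible sketch of an answer, using only the Python standard library. The distance metric and tie-breaking rule here are our own choices; an interviewer may specify different ones:

```python
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Classify `query` by majority vote among the k nearest training points.

    train_points -- list of lists of integers (feature vectors)
    train_labels -- class label for each training point
    query        -- a single feature vector
    """
    # Squared Euclidean distance; skipping the sqrt doesn't change the ranking
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, query))

    # Keep the labels of the k closest points, then take the majority vote
    nearest = sorted(zip(train_points, train_labels), key=lambda pl: sq_dist(pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]   # ties broken by first-seen order

points = [[0, 0], [1, 1], [9, 9], [10, 10], [0, 1]]
labels = ["a", "a", "b", "b", "a"]
label = knn_predict(points, labels, [2, 2], k=3)
print(label)  # → a
```

In an interview, mentioning the O(n log n) sort (or an O(n) selection alternative) and how you would handle ties usually earns extra credit.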
Machine Learning Interview Questions and Answers
Here are some quick machine learning concepts to review before your next interview.
If you'd like a full list of real machine learning and modeling interview questions from top tech companies, check out the full questions database on Interview Query with machine learning, product, SQL, and coding questions.
Practice Real Machine Learning Questions
Machine Learning Interview Questions and Concepts
What is regularization?
Regularization is the act of modifying our objective function by adding a penalty term, to reduce overfitting.
Which regularization method would you prefer to treat correlated variables? Why?
Typically, we should prefer a regularization method that can drive the coefficients of correlated features to zero, such as LASSO. However, if the data has many features relative to the number of observations, elastic net may work better.
Describe different regularization methods
L2 regularization (Ridge) minimizes the sum of the squared residuals plus lambda times the sum of the squared coefficients; this is called the Ridge Regression penalty. It increases the bias of the model, making the fit worse on the training data, but also decreases the variance. L1 regularization (LASSO) instead penalizes the sum of the absolute values of the coefficients, which can shrink some coefficients exactly to zero and thereby perform feature selection.
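To illustrate the shrinkage effect, here is closed-form ridge regression in NumPy on synthetic data. The true coefficients, penalty strength, and noise level are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: minimizes ||y - Xb||^2 + lam * ||b||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Synthetic data: y depends on two features plus noise (true coefficients 3 and -2)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)

b_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 recovers ordinary least squares
b_ridge = ridge_fit(X, y, lam=100.0)  # a heavy penalty shrinks coefficients toward 0
```

Comparing `b_ols` and `b_ridge` shows every coefficient pulled toward zero, which is exactly the increased bias and decreased variance described above.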
What is gradient descent?
Gradient descent is a method of minimizing the cost function, whose form depends on the type of supervised model. When optimizing the cost function, we compute the gradient, which points in the direction of steepest ascent; to find the minimum, we repeatedly update our parameters (Beta) by stepping in the opposite direction, proportionally to the gradient.
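As a concrete sketch, here is gradient descent for least-squares linear regression in NumPy. The learning rate and iteration count are arbitrary choices, and the data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 1 + 2x plus noise; first column of X is the intercept
X = np.column_stack([np.ones(100), rng.uniform(-1, 1, 100)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)

beta = np.zeros(2)   # parameters to learn
lr = 0.1             # step size (arbitrary choice)
for _ in range(2000):
    grad = 2 / len(y) * X.T @ (X @ beta - y)  # gradient of the mean squared error
    beta -= lr * grad                         # step opposite the gradient (descent)
```

After convergence, `beta` approaches the true intercept and slope, which is the behavior the definition above describes.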
What is the difference between a parametric learning algorithm vs non-parametric learning algorithm?
A parametric learning algorithm has a finite, fixed set of parameters that the learning algorithm estimates.
A non-parametric learning algorithm does not fix the number of parameters in advance: as the dataset grows, the learning algorithm can estimate more and more parameters from the dataset.
How do you interpret Linear Regression coefficients?
Interpreting linear regression coefficients is much simpler than interpreting logistic regression coefficients. A regression coefficient signifies how much the mean of the dependent variable changes given a one-unit shift in the corresponding independent variable, holding all other variables constant.
What is Maximum Likelihood Estimation?
Maximum likelihood estimation finds the parameters of the distribution that is most likely to have generated the data. To do this, we estimate the parameter theta that maximizes the likelihood function evaluated at the observed data, P(data | theta).
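A concrete example: for Bernoulli data (coin flips), the parameter that maximizes the likelihood is the sample mean, which a tiny grid search over candidate parameters confirms. The flip data here is made up:

```python
import math

flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # 7 heads out of 10 (made-up data)

def log_likelihood(p, data):
    """Log of P(data | p) for independent Bernoulli(p) observations."""
    return sum(math.log(p if x == 1 else 1 - p) for x in data)

# Grid search over candidate parameters; the maximizer is the MLE
candidates = [i / 100 for i in range(1, 100)]
p_hat = max(candidates, key=lambda p: log_likelihood(p, flips))
print(p_hat)  # → 0.7, the sample mean
```

Working with the log-likelihood rather than the likelihood itself avoids numerical underflow and turns the product over observations into a sum.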
What is Linear Discriminant Analysis?
LDA is a predictive modeling algorithm for multi-class classification. LDA will compute the directions that will represent the axes that maximize the separation between classes.
What's the difference between precision and recall?
Recall: What proportion of actual positives was identified correctly?
Precision: What proportion of positive identifications was actually correct?
What is the intuition behind F1 score?
The intuition is that we’re taking the harmonic mean of precision and recall. In a scenario where classes are imbalanced, we’re likely to have precision extremely high and recall extremely low, or vice versa. This will be reflected in our F1 score, since the lower of the two metrics drags the F1 score down.
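In code, precision, recall, and F1 all fall out of confusion-matrix counts. This is pure Python, with counts invented to show an imbalanced case:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and their harmonic mean (F1) from counts."""
    precision = tp / (tp + fp)  # of everything flagged positive, how much was right
    recall = tp / (tp + fn)     # of all actual positives, how much was found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Imbalanced example: high precision but poor recall drags F1 down
p, r, f1 = precision_recall_f1(tp=10, fp=1, fn=90)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.91 0.1 0.18
```

Even with precision above 0.9, the F1 score lands near the weak recall, which is the harmonic-mean behavior described above.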
Explain what Glove embeddings are.
Rather than relying only on local context words, GloVe calculates a co-occurrence matrix over all words, while still taking local context into account through a fixed window size. GloVe then learns word vectors trained to predict the co-occurrence ratios between words, with the loss weighted by word frequency.
If you have a machine learning interview coming up, check out our machine learning course on Interview Query!