Interview Query

Python Machine Learning Interview Questions

Overview

Python machine learning questions focus on model deployment and model building, and they’re common in interviews for data scientists, AI scientists, and machine learning engineers.

In general, there are two main types of Python machine learning interview questions, and both test your ability to implement algorithms in Python. They are:

  1. Algorithmic coding problems
  2. Machine learning algorithms from scratch

In this short guide, we’ll take a closer look at each type and provide some example Python problems to help you study.

Algorithmic Coding Questions

Algorithmic coding questions are similar to what you’d find on LeetCode. These are short problems that ask you to write Python code to solve a well-defined algorithmic task.

Specifically, there are three main areas that they test:

  • Algorithm functions - These questions test your ability to implement operations like sorting (insertion sort, bubble sort, etc.), searching, matching, and permutation. They tend to be clearly defined, and you’ll be tasked with writing Python code to perform a specific task.
  • Data structures - Algorithmic coding questions rely heavily on data structures, so strong knowledge of them is required to do well in machine learning interviews. Topics include structures like lists/arrays, sets, maps/dictionaries, and tuples, as well as basic types like strings, integers, and floats.
  • Algorithmic coding techniques - Algorithmic questions will also test your ability to apply techniques like recursion, divide and conquer, and backtracking, as well as practices like plotting, debugging, inheritance, and decorator patterns.
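As an illustration of the first category above, here is a minimal sketch of insertion sort. The function name insertion_sort and the choice to return a new list are assumptions made for this example, not a prescribed interview answer.

```python
def insertion_sort(values):
    """Return a new list with the items of `values` in ascending order."""
    result = list(values)  # copy so the caller's list is not mutated
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger items one slot to the right until current's slot is found.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result
```

Interviewers often follow up by asking for the time complexity, which for insertion sort is quadratic in the worst case.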

Looking for general algorithm questions? Check out our interview guide Machine Learning Algorithm Questions.

Writing Machine Learning Algorithms from Scratch

Problems that ask you to write an algorithm from scratch are increasingly common in machine learning and computer vision interviews. The algorithms you are asked to write are similar to those implemented in scikit-learn.

In general, this type of question tests your familiarity with an algorithm, as well as your ability to code a bug-free version as efficiently as possible.

Most importantly, they test your knowledge of ML concepts by asking you to build the algorithms from scratch. So no more writing: rfr = RandomForest(x,y)

However, you don’t have to study every algorithm. Only a few fit the format of an hour-long on-site interview, as many are too complicated to break down in such a short timeframe. These are the algorithms you should study for the machine learning interview:

  • K-nearest neighbors
  • Decision tree
  • Linear regression
  • Logistic regression
  • K-means clustering
  • Gradient descent
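Two of the items in this list combine naturally. The following is a minimal sketch, assuming a single feature and mean squared error as the loss, of fitting a linear regression with batch gradient descent. The name fit_line and the default learning_rate and epochs values are illustrative choices, not part of the original guide.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current fit.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

The learning rate has to be small enough for the updates to converge; on unscaled features with large values, the defaults used here would likely need adjusting.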

Example Python Machine Learning Questions

Here are some common Python algorithm coding questions that you might see in a machine learning interview:

Q1. Given two strings, string1 and string2, write a function str_map to determine if there exists a one-to-one correspondence between the characters of string1 and string2.

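The easiest way to solve this problem is to check for conditions that make a bijection between the strings’ characters impossible, such as the strings having different lengths or one character being paired with two different characters. We return ‘False’ if the strings meet any such condition, and ‘True’ otherwise.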
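One possible sketch of this approach is shown below. It assumes the correspondence is checked position by position, and it tracks the pairing in both directions with two dictionaries (forward and backward are names chosen for this example).

```python
def str_map(string1, string2):
    """Return True if the characters of the two strings map one-to-one."""
    # Strings of different lengths cannot be paired position by position.
    if len(string1) != len(string2):
        return False
    forward, backward = {}, {}
    for a, b in zip(string1, string2):
        # setdefault stores the first pairing seen and returns the stored one afterwards.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True
```

Checking both directions matters: a single forward dictionary would accept two different characters that map to the same target, which is not one-to-one.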

Q2. Write a function shortest_transformation to find the length of the shortest transformation sequence from begin_word to end_word through the elements of word_list.

Note that only one letter can be changed at a time, and each transformed word must exist in word_list.
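A common way to approach this kind of problem is breadth-first search, since the first time the search reaches end_word it has done so by a shortest route. The sketch below makes several assumptions that the prompt does not state: words are lowercase ASCII, the returned length counts the words in the sequence (including begin_word and end_word), and 0 is returned when no sequence exists.

```python
from collections import deque


def shortest_transformation(begin_word, end_word, word_list):
    """Breadth-first search over words that differ by exactly one letter."""
    words = set(word_list)
    if end_word not in words:
        return 0
    if begin_word == end_word:
        return 1
    queue = deque([(begin_word, 1)])  # (word, number of words used so far)
    seen = {begin_word}
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    while queue:
        word, length = queue.popleft()
        # Try every single-letter substitution of the current word.
        for i in range(len(word)):
            for letter in alphabet:
                candidate = word[:i] + letter + word[i + 1:]
                if candidate == end_word:
                    return length + 1
                if candidate in words and candidate not in seen:
                    seen.add(candidate)
                    queue.append((candidate, length + 1))
    return 0
```

If the interviewer defines the length as the number of letter changes instead, subtract one from a nonzero result.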


Q3. Given a formatted list of dictionaries describing each person, write a function that returns a list of matches along with scheduled times.

You’re given a list of people to match together in a pool of candidates.

We want to match people up using two criteria:

  1. A hard filter on scheduled availability
  2. A closest match based on similar interests

In this scenario, we would return a match of Bob and Joe while also matching Carolyn and Dan. Even though Carolyn and Dan don’t have any interest overlap, Carolyn is the only one with availability to meet Dan’s schedule.

The goal is to maximize the total number of matches first, and then to maximize interest similarity among those matches.

Hint: For this problem, we will use the Blossom algorithm. It suits the problem because, given a weighted graph, it returns a maximum-weight matching: a set of edges with the largest total weight in which no vertex is included more than once.
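To make the objective concrete, here is a brute-force sketch. It is not the Blossom algorithm; it simply tries every pairing, which is only practical for a small pool. The dictionary layout (name, availability, and interests keys) and the function name best_matches are hypothetical, since the formatted input list is not reproduced here.

```python
def best_matches(people):
    """Return (name_a, name_b, time) pairs: most pairs first, then most shared interests."""

    def search(remaining):
        # Returns (pair_count, interest_score, pairs) for the best pairing of `remaining`.
        if len(remaining) < 2:
            return (0, 0, [])
        first, rest = remaining[0], remaining[1:]
        best = search(rest)  # option 1: leave `first` unmatched
        for i, other in enumerate(rest):
            times = sorted(set(first["availability"]) & set(other["availability"]))
            if not times:
                continue  # hard filter: no common slot means no match
            shared = len(set(first["interests"]) & set(other["interests"]))
            count, score, pairs = search(rest[:i] + rest[i + 1:])
            candidate = (count + 1, score + shared,
                         [(first["name"], other["name"], times[0])] + pairs)
            # Tuples compare left to right: pair count first, then interest score.
            if candidate[:2] > best[:2]:
                best = candidate
        return best

    return search(people)[2]


# Hypothetical pool, invented for illustration only.
pool = [
    {"name": "Bob", "availability": ["2pm"], "interests": ["chess", "hiking"]},
    {"name": "Joe", "availability": ["2pm", "3pm"], "interests": ["chess", "hiking"]},
    {"name": "Carolyn", "availability": ["3pm", "4pm"], "interests": ["painting"]},
    {"name": "Dan", "availability": ["4pm"], "interests": ["soccer"]},
]
matches = best_matches(pool)
```

On this invented pool, the search pairs Bob with Joe and Carolyn with Dan, even though Carolyn and Dan share no interests, because that pairing yields the most matches. A matching algorithm such as Blossom reaches the same kind of result without enumerating every pairing.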

Q4. Given a dictionary whose keys are letters and whose values are lists of letters, write a function closest_key to find the key whose list contains the input value closest to the beginning of the list.

Hint: With this question, ask: Is your computed distance always positive? Negative values for distance (for example between ‘c’ and ‘a’ instead of ‘a’ and ‘c’) will interfere with getting an accurate result.
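The following sketch reflects one reading of the prompt: the distance for a key is the index of the input value within that key's list, which can never be negative. Returning None when no list contains the value is an assumption, and the dictionary used in the usage line is made up for illustration.

```python
def closest_key(dictionary, value):
    """Return the key whose list holds `value` nearest to index 0, or None."""
    best_key, best_distance = None, None
    for key, letters in dictionary.items():
        if value not in letters:
            continue
        distance = letters.index(value)  # list positions are never negative
        if best_distance is None or distance < best_distance:
            best_key, best_distance = key, distance
    return best_key


# Illustrative input: 'c' sits at index 1 under 'a' and at index 0 under 'm'.
result = closest_key({"a": ["b", "c", "e"], "m": ["c", "e"]}, "c")
```

When two keys tie on distance, this version keeps the first one encountered; it is worth asking the interviewer how ties should be resolved.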

Q5. Given two strings, string1 and string2, write a function max_substring to return the maximal substring shared by both strings.

Example:

Input:

string1 = 'mississippi'

string2 = 'mossyistheapple'
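One way to approach this is dynamic programming over pairs of positions. The sketch below assumes that the maximal shared substring means the longest contiguous run of characters found in both strings, and that any one of several equally long candidates is an acceptable answer.

```python
def max_substring(string1, string2):
    """Return one longest contiguous substring common to both strings."""
    best_len, best_end = 0, 0
    # previous[j] is the length of the common run ending at the prior
    # character of string1 and at string2[j - 1].
    previous = [0] * (len(string2) + 1)
    for i in range(1, len(string1) + 1):
        current = [0] * (len(string2) + 1)
        for j in range(1, len(string2) + 1):
            if string1[i - 1] == string2[j - 1]:
                # Extend the run that ended one character earlier in both strings.
                current[j] = previous[j - 1] + 1
                if current[j] > best_len:
                    best_len, best_end = current[j], i
        previous = current
    return string1[best_end - best_len:best_end]
```

Keeping only two rows of the table holds memory proportional to the length of string2, while the running time remains proportional to the product of the two lengths.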


Example Problem: Writing an Algorithm from Scratch

You might expect a Python algorithm writing question like this during a machine learning interview:

Q1. Build a K-nearest neighbors classification model from scratch with the following conditions:

  • Use Euclidean distance (also known as the “2-norm”) as your closeness metric
  • Your function should be able to handle data frames with arbitrarily many rows and columns
  • If there is a tie in the class of the k nearest neighbors, rerun the search using k-1 neighbors instead
  • You may use pandas and numpy but NOT scikit-learn

Example

Input:

k = 5
new_point = [0.5,-2,8]
print(data)
...
        Var1      Var2      Var3  Target
0  -3.279536  3.362223  2.847892       2
1  -0.791565  1.742475  2.151587       2
2  -0.785992 -0.938681 -0.459770       0
3  -1.068190  1.461051  0.127130       3
4  -0.367568 -0.870240 -0.225734       0
..       ...       ...       ...     ...
95 -1.327175  1.971085 -0.690689       2
96 -3.203714  1.847649  0.778901       2
97 -0.587640  0.647458  2.094385       2
98  0.363644 -0.509795  2.514191       1
99 -0.673498  2.955285  2.102122       4

[100 rows x 4 columns]

Output:

kNN(k, data, new_point) -> 2
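A minimal sketch of one possible solution follows. It assumes the target is the last column and every other column is a numeric feature, as in the example data. To stay self-contained it uses only the standard library and reads a DataFrame through its itertuples method; a NumPy-based distance computation would be an equally valid choice under the stated conditions.

```python
import math
from collections import Counter


def kNN(k, data, new_point):
    """Classify new_point by majority vote among its k nearest rows."""
    # Accept a pandas DataFrame (via itertuples) or any iterable of rows.
    source = data.itertuples(index=False) if hasattr(data, "itertuples") else data
    rows = [list(row) for row in source]
    # Assumption: the last column is the target, the rest are features.
    ranked = sorted(rows, key=lambda row: math.dist(row[:-1], new_point))
    k = min(k, len(ranked))
    while k > 0:
        votes = Counter(row[-1] for row in ranked[:k]).most_common()
        # A unique winner exists if only one class is present or the top count is strictly largest.
        if len(votes) == 1 or votes[0][1] > votes[1][1]:
            return votes[0][0]
        k -= 1  # tie: rerun the vote with k-1 neighbors, as the prompt requires
    return None  # only reached when data is empty or k < 1
```

Sorting all rows once makes the tie rule cheap to apply, because rerunning with k-1 neighbors only shortens the slice instead of recomputing distances.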


More Machine Learning Interview Resources