Truveta Data Engineer Interview Guide

1. Introduction

Getting ready for a Data Engineer interview at Truveta? The Truveta Data Engineer interview process typically spans multiple question topics and evaluates skills in areas like data pipeline design, ETL development, data modeling, and communicating data solutions to both technical and non-technical stakeholders. Interview preparation is especially important for this role at Truveta, as candidates are expected to demonstrate expertise in building scalable data infrastructure, ensuring data quality, and translating complex data requirements into actionable systems that support healthcare analytics and insights.

In preparing for the interview, you should:

  • Understand the core skills necessary for Data Engineer positions at Truveta.
  • Gain insights into Truveta’s Data Engineer interview structure and process.
  • Practice real Truveta Data Engineer interview questions to sharpen your performance.

At Interview Query, we regularly analyze interview experience data shared by candidates. This guide uses that data to provide an overview of the Truveta Data Engineer interview process, along with sample questions and preparation tips tailored to help you succeed.

1.1. What Truveta Does

Truveta is a health technology company focused on advancing medical research and improving patient care through data-driven insights. The company aggregates and analyzes de-identified health data from a consortium of leading healthcare providers, enabling researchers and clinicians to gain actionable knowledge about diseases, treatments, and patient outcomes. Truveta’s mission is to save lives with data, prioritizing privacy, security, and responsible innovation. As a Data Engineer, you will help build scalable data pipelines and infrastructure that empower Truveta’s analytics and research capabilities, directly supporting its mission to transform healthcare through trusted data.

1.2. What does a Truveta Data Engineer do?

As a Data Engineer at Truveta, you are responsible for designing, building, and maintaining data pipelines that enable the secure and efficient processing of healthcare data. You collaborate with data scientists, analysts, and software engineers to ensure high-quality, reliable data infrastructure that supports clinical research and insights. Core tasks include developing ETL processes, optimizing data storage solutions, and implementing data quality checks to uphold privacy and compliance standards. This role is essential in helping Truveta deliver actionable health insights by transforming raw data into usable formats, directly contributing to the company’s mission of improving patient outcomes through data-driven solutions.

2. Overview of the Truveta Interview Process

2.1 Stage 1: Application & Resume Review

The initial stage involves a thorough review of your resume and application by Truveta’s recruiting team, focusing on your experience with data engineering, large-scale data pipelines, ETL design, and cloud infrastructure. Candidates with a strong background in SQL, Python, distributed systems, and data warehouse architecture stand out. Ensure your resume clearly highlights projects where you built or optimized scalable data solutions, addressed data quality issues, and collaborated cross-functionally.

2.2 Stage 2: Recruiter Screen

Next, you’ll have a phone or video call with a recruiter. This conversation typically covers your motivation for joining Truveta, your understanding of the healthcare data landscape, and an overview of your technical and communication skills. Be prepared to discuss your experience with data pipeline design, your approach to data cleaning and organization, and your ability to translate technical concepts for non-technical audiences. Articulate your passion for Truveta’s mission and how your background aligns with their goals.

2.3 Stage 3: Technical/Case/Skills Round

This stage is conducted by a data engineering team member or hiring manager and may consist of one or more interviews focused on technical problem-solving. You can expect to tackle real-world data engineering scenarios such as designing scalable ETL pipelines, optimizing data warehouse schemas, and troubleshooting pipeline failures. Hands-on coding exercises in SQL and Python are common, as well as system design questions involving cloud-based data solutions, real-time streaming, and integrating heterogeneous data sources. Preparation should include reviewing your past experiences in handling large datasets, building robust data pipelines, and ensuring data quality and reliability.

2.4 Stage 4: Behavioral Interview

A behavioral interview, often led by the hiring manager or a senior team member, evaluates your collaboration, adaptability, and communication skills. Expect to discuss how you’ve navigated challenges in previous data projects, worked with cross-functional teams, and presented complex data insights to diverse stakeholders. Demonstrate your ability to demystify data for non-technical users, address project hurdles, and make data-driven recommendations in ambiguous environments.

2.5 Stage 5: Final/Onsite Round

The final round typically involves multiple back-to-back interviews with team members, technical leads, and sometimes cross-functional partners. You’ll be assessed on your technical depth, problem-solving approach, and cultural fit. This stage may include advanced system design exercises, case studies relevant to Truveta’s healthcare data mission, and collaborative whiteboarding sessions. You’ll also be evaluated on your ability to communicate technical solutions clearly and adapt your approach to different audiences.

2.6 Stage 6: Offer & Negotiation

Once you successfully navigate the interview rounds, the recruiter will reach out to discuss the offer details. This includes compensation, benefits, and start date. There may be room for negotiation based on your experience and the complexity of the role. The process is typically transparent, with the recruiter guiding you through next steps and onboarding expectations.

2.7 Average Timeline

The typical Truveta Data Engineer interview process takes about 3-4 weeks from initial application to offer. Candidates with highly relevant experience or referrals may progress faster, sometimes completing the process in just 2 weeks. The technical rounds and onsite interviews are usually scheduled within a week of each other, while recruiter screens and offer negotiations may vary depending on candidate availability and team schedules.

Below, you’ll find the types of interview questions that are commonly asked throughout this process.

3. Truveta Data Engineer Sample Interview Questions

3.1. Data Engineering System Design

Expect questions that assess your ability to design scalable, reliable, and maintainable data systems. Focus on communicating your thought process around architecture, trade-offs, and how you ensure data quality and performance in production environments.

3.1.1 Design a data warehouse for a new online retailer
Describe your approach to schema design, partitioning, and ETL processes for supporting analytics and reporting needs. Highlight how you’d address scalability and evolving business requirements.

3.1.2 System design for a digital classroom service.
Lay out the end-to-end architecture, including data ingestion, storage, and serving layers. Discuss considerations for real-time versus batch data, and how you’d ensure reliability and data integrity.

3.1.3 Design a scalable ETL pipeline for ingesting heterogeneous data from Skyscanner's partners.
Explain your approach to handling diverse data formats, error handling, and monitoring. Emphasize modularity and the ability to onboard new data sources efficiently.

3.1.4 Design an end-to-end data pipeline to process and serve data for predicting bicycle rental volumes.
Outline the flow from raw data ingestion to model serving, specifying tools and frameworks you’d use. Address how you’d monitor pipeline health and manage data drift.

3.1.5 Design a robust, scalable pipeline for uploading, parsing, storing, and reporting on customer CSV data.
Discuss your approach to schema validation, error handling, and ensuring data consistency for downstream analytics. Highlight automation and reproducibility.
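To make the schema-validation step concrete, here is a minimal sketch of the kind of row-level check an interviewer might probe. The column names, types, and rules are hypothetical; a production pipeline would typically route failing rows to a quarantine location rather than drop them.

```python
import csv
import io

# Hypothetical expected schema for an uploaded customer CSV.
EXPECTED_COLUMNS = ("customer_id", "email", "signup_date")

def validate_rows(text):
    """Yield (row, errors) pairs so valid and invalid rows can be split."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        errors = []
        try:
            int(row["customer_id"])  # must be an integer id
        except ValueError:
            errors.append("customer_id not an integer")
        if "@" not in row["email"]:  # crude placeholder email check
            errors.append("email missing '@'")
        yield row, errors

sample = (
    "customer_id,email,signup_date\n"
    "1,a@example.com,2024-01-01\n"
    "abc,not-an-email,2024-01-02\n"
)
results = list(validate_rows(sample))
good = [row for row, errs in results if not errs]
bad = [row for row, errs in results if errs]
```

Separating "reject the whole file" errors (missing columns) from per-row errors keeps one bad record from blocking an entire upload.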

3.2. Data Pipeline Operations & Troubleshooting

These questions focus on your experience maintaining, optimizing, and troubleshooting data pipelines. Be prepared to discuss real-world challenges, how you identified root causes, and the steps you took to resolve issues and prevent recurrence.

3.2.1 How would you systematically diagnose and resolve repeated failures in a nightly data transformation pipeline?
Describe your debugging process, use of monitoring tools, and how you’d implement automated alerts and recovery steps. Emphasize documentation and communication with stakeholders.
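One building block worth being able to sketch on the spot is retry-with-backoff around a failing step, with enough logging to support root-cause analysis. This is an illustrative sketch, not a prescribed Truveta pattern; the step function and delays are invented, and an alerting hook on final failure is assumed but not shown.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("nightly_etl")

def run_with_retries(step, max_attempts=3, base_delay=0.01):
    """Run a pipeline step, retrying transient failures with exponential backoff.

    On the final failure the exception is re-raised so an external alerting
    hook (assumed, not shown here) can page the on-call engineer.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            log.warning("step failed (attempt %d/%d): %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Simulated flaky step: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient upstream timeout")
    return "loaded 1000 rows"

result = run_with_retries(flaky_step)
```

The key interview point is distinguishing transient failures (worth retrying) from deterministic ones (bad data or code, where retries only delay the alert).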

3.2.2 How would you approach improving the quality of airline data?
Walk through your methodology for profiling data, identifying anomalies, and implementing quality checks. Mention how you’d measure the impact of your interventions.

3.2.3 Ensuring data quality within a complex ETL setup
Explain the controls and validation steps you’d put in place to monitor data consistency across multiple sources. Discuss strategies for handling discrepancies and reporting issues.
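A simple, concrete control to describe is a source-to-target reconciliation check. The sketch below compares key sets and row counts between an extract and what actually landed; the table shape and key name are hypothetical.

```python
def reconcile(source_rows, target_rows, key):
    """Compare a source extract against the loaded target.

    Reports row-count drift plus keys missing from (or unexpectedly present
    in) the target -- a typical cross-source consistency check in an ETL run.
    """
    src_keys = {row[key] for row in source_rows}
    tgt_keys = {row[key] for row in target_rows}
    return {
        "row_count_delta": len(target_rows) - len(source_rows),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
    }

# Hypothetical extract vs. loaded table: id 2 was dropped, id 4 appeared.
source = [{"id": 1}, {"id": 2}, {"id": 3}]
target = [{"id": 1}, {"id": 3}, {"id": 4}]
report = reconcile(source, target, "id")
```

Note that the row counts match here even though records differ, which is why key-level checks matter alongside count checks.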

3.2.4 Redesign batch ingestion to real-time streaming for financial transactions.
Lay out the architectural changes, technology choices, and new challenges introduced by streaming. Highlight your approach to ensuring data accuracy and low latency.

3.3. Data Modeling & Database Design

These questions assess your ability to design efficient schemas, optimize for query performance, and ensure data integrity. Be ready to discuss normalization, indexing, and trade-offs between different database technologies.

3.3.1 Design a database for a ride-sharing app.
Outline key tables, relationships, and indexing strategies. Address scalability for high read/write volumes and evolving feature requirements.

3.3.2 Design a solution to store and query raw data from Kafka on a daily basis.
Describe your approach to integrating streaming data, partitioning, and optimizing for analytical queries. Discuss data retention and archiving strategies.
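For the daily-partitioning piece, a common answer is a Hive-style date partition derived from each event's timestamp. The sketch below shows only that bucketing logic with invented event fields; the actual storage layer (object store, warehouse table) is out of scope.

```python
from collections import defaultdict
from datetime import datetime, timezone

def partition_key(event):
    """Derive a dt=YYYY-MM-DD partition from an epoch-seconds timestamp,
    mirroring a daily layout such as raw/<topic>/dt=2024-01-15/."""
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
    return f"dt={day}"

# Hypothetical raw Kafka events (epoch seconds, opaque payloads).
events = [
    {"ts": 1705276800, "payload": "a"},  # 2024-01-15 00:00:00 UTC
    {"ts": 1705363200, "payload": "b"},  # 2024-01-16 00:00:00 UTC
]
buckets = defaultdict(list)
for event in events:
    buckets[partition_key(event)].append(event)
```

Partitioning by event date keeps daily analytical scans bounded and makes retention (dropping old partitions) a metadata operation rather than a row-by-row delete.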

3.3.3 Write a SQL query to count transactions filtered by several criteria.
Demonstrate your ability to write efficient SQL, use appropriate filtering, and explain your approach to handling large tables.

3.3.4 Write a query to get the current salary for each employee after an ETL error.
Show how you’d reconcile and correct data inconsistencies, using window functions or subqueries as needed. Explain your validation steps.
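One common setup for this question is a table where an ETL bug left stale duplicate rows behind, with the convention that the highest id per employee is the current record. That convention and the schema below are assumptions for the sketch, using in-memory SQLite and a correlated subquery (a window function over id would work equally well).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (id INTEGER, first_name TEXT, salary INTEGER);
INSERT INTO employees VALUES
  (1, 'Ava', 80000),  -- stale row left behind by the ETL error
  (2, 'Ava', 90000),
  (3, 'Ben', 70000);
""")

# Keep only the latest row per employee: the one with the maximum id.
rows = conn.execute("""
    SELECT first_name, salary
    FROM employees AS e
    WHERE id = (SELECT MAX(id) FROM employees WHERE first_name = e.first_name)
    ORDER BY first_name
""").fetchall()
current = dict(rows)
```

A good validation step to mention: after deduplication, assert that each employee appears exactly once and that total row count matches the distinct-employee count.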

3.4. Data Integration & Analytics

Questions in this category test your ability to combine data from multiple sources, clean and transform datasets, and extract actionable insights. Focus on your data wrangling skills, as well as your ability to communicate findings.

3.4.1 You’re tasked with analyzing data from multiple sources, such as payment transactions, user behavior, and fraud detection logs. How would you approach solving a data analytics problem involving these diverse datasets? What steps would you take to clean, combine, and extract meaningful insights that could improve the system's performance?
Describe your workflow for joining disparate datasets, resolving schema mismatches, and ensuring data quality. Discuss your approach to feature engineering and insight generation.

3.4.2 Describing a real-world data cleaning and organization project
Share your process for identifying and correcting data issues, documenting cleaning steps, and ensuring reproducibility. Highlight any automation or tooling you used.

3.4.3 How to present complex data insights with clarity and adaptability tailored to a specific audience
Explain your approach to tailoring technical content to different stakeholders, using visualizations and clear narratives. Emphasize the importance of actionable recommendations.

3.4.4 Making data-driven insights actionable for those without technical expertise
Discuss strategies for simplifying technical concepts and focusing on business impact. Provide examples of bridging the gap between analytics and decision-making.

3.4.5 Demystifying data for non-technical users through visualization and clear communication
Describe your process for building dashboards or reports that empower self-service analytics. Highlight your use of user feedback to iterate on data products.

3.5 Behavioral Questions

3.5.1 Tell me about a time you used data to make a decision.
Describe a specific scenario where your analysis led to a business-impacting recommendation. Focus on the data you used, your thought process, and the outcome.

3.5.2 Describe a challenging data project and how you handled it.
Share a project where you faced technical or organizational obstacles, detailing the steps you took to overcome them and what you learned.

3.5.3 How do you handle unclear requirements or ambiguity?
Explain your approach to gathering missing details, collaborating with stakeholders, and iterating on solutions when faced with incomplete information.

3.5.4 Tell me about a time you delivered critical insights even though a significant portion of the dataset had nulls. What analytical trade-offs did you make?
Discuss your process for profiling missing data, choosing imputation or exclusion strategies, and communicating uncertainty in your results.

3.5.5 Give an example of automating recurrent data-quality checks so the same dirty-data crisis doesn’t happen again.
Describe the tools or scripts you built, how they improved reliability, and the impact on your team’s workflow.

3.5.6 Tell me about a situation where you had to influence stakeholders without formal authority to adopt a data-driven recommendation.
Share how you used evidence, communication, and relationship-building to drive consensus and action.

3.5.7 Describe a situation where two source systems reported different values for the same metric. How did you decide which one to trust?
Walk through your validation process, how you investigated discrepancies, and the steps you took to establish a single source of truth.

3.5.8 How do you prioritize and stay organized when juggling multiple deadlines?
Explain your prioritization framework, tools you use for tracking, and how you communicate status or negotiate timelines.

3.5.9 Share a story where you used data prototypes or wireframes to align stakeholders with very different visions of the final deliverable.
Discuss how you iterated on prototypes, gathered feedback, and built consensus before full-scale development.

3.5.10 Tell me about a time you proactively identified a business opportunity through data.
Describe how you spotted the opportunity, validated it with analysis, and presented your findings to drive business action.

4. Preparation Tips for Truveta Data Engineer Interviews

4.1 Company-specific tips:

Immerse yourself in Truveta’s mission to save lives with data and understand the company’s commitment to privacy, security, and responsible innovation in healthcare analytics. Be ready to articulate how your work as a Data Engineer can directly contribute to improving patient outcomes and advancing medical research. Familiarize yourself with the healthcare data landscape, including the challenges of aggregating, de-identifying, and analyzing large-scale clinical data from diverse sources. Research Truveta’s recent initiatives, partnerships, and technology stack—showing that you understand how your skills fit into their broader goals. Demonstrate genuine enthusiasm for working in a data-driven health technology environment and discuss how you would uphold Truveta’s standards for data quality and compliance.

4.2 Role-specific tips:

4.2.1 Build expertise in designing scalable, secure data pipelines for healthcare data.
Practice outlining end-to-end data pipelines that can handle high-volume, heterogeneous healthcare datasets while maintaining privacy and compliance. Be prepared to discuss your approach to schema design, data ingestion, and ETL processes, emphasizing scalability and adaptability to evolving requirements.

4.2.2 Demonstrate your ability to troubleshoot and optimize data pipeline failures.
Prepare examples of diagnosing and resolving issues in production pipelines, such as repeated transformation failures or data quality breakdowns. Highlight your use of monitoring, automated alerts, and recovery strategies, and explain how you document solutions to prevent recurrence.

4.2.3 Show proficiency in ETL development and modular pipeline architecture.
Discuss your experience building robust ETL systems that can ingest, clean, and transform data from multiple sources. Emphasize modularity, automation, and the ability to onboard new data sources efficiently, which is crucial for Truveta’s expanding healthcare data ecosystem.

4.2.4 Present your data modeling and database design skills.
Be ready to design schemas and storage solutions optimized for both transactional and analytical workloads. Address normalization, indexing, and trade-offs between different database technologies, especially those relevant for healthcare data warehousing and analytics.

4.2.5 Highlight your approach to ensuring and measuring data quality.
Talk through your methodology for profiling data, implementing validation checks, and handling discrepancies across multiple sources. Share how you automate recurrent data-quality checks and measure the impact of your interventions on downstream analytics.

4.2.6 Communicate technical solutions clearly to both technical and non-technical stakeholders.
Prepare to explain complex data engineering concepts—such as real-time streaming vs. batch processing, or schema evolution—in simple terms. Use visualizations and analogies to make your solutions accessible to clinicians, researchers, and business partners.

4.2.7 Illustrate your experience integrating and analyzing data from diverse sources.
Discuss real-world projects where you cleaned, joined, and harmonized data from disparate systems. Explain your workflow for resolving schema mismatches, ensuring data consistency, and extracting actionable insights that support healthcare analytics.

4.2.8 Share stories of influencing and collaborating with cross-functional teams.
Be ready with examples of how you built consensus and drove adoption of data-driven solutions, even when you lacked formal authority. Focus on your communication, relationship-building, and ability to align stakeholders with different priorities.

4.2.9 Prepare to discuss handling ambiguity and unclear requirements.
Demonstrate your adaptability by sharing how you gathered missing details, iterated on solutions, and collaborated with stakeholders in situations where requirements were incomplete or evolving.

4.2.10 Show your ability to make data actionable for non-technical users.
Explain how you’ve built dashboards, reports, or self-service analytics tools that empower clinicians and researchers to gain insights without deep technical expertise. Emphasize your use of feedback to iterate and improve data products for usability and impact.

5. FAQs

5.1 How hard is the Truveta Data Engineer interview?
The Truveta Data Engineer interview is challenging, especially for those new to healthcare data or large-scale pipeline design. It tests your ability to design scalable, secure data systems, troubleshoot real-world data pipeline failures, and communicate technical solutions to non-technical stakeholders. Candidates with strong experience in ETL development, cloud infrastructure, and data quality assurance will find the process rigorous but fair. Success comes from demonstrating both technical depth and an understanding of Truveta’s mission to transform healthcare with data.

5.2 How many interview rounds does Truveta have for Data Engineer?
Truveta’s Data Engineer interview typically consists of 5-6 rounds. These include an initial recruiter screen, one or more technical interviews focused on data engineering problems, a behavioral interview assessing collaboration and adaptability, and a final onsite or virtual round with team members and cross-functional partners. Each stage is designed to evaluate both your technical expertise and your fit with Truveta’s collaborative, mission-driven culture.

5.3 Does Truveta ask for take-home assignments for Data Engineer?
Truveta may include a take-home assignment or technical case study as part of the Data Engineer interview process. These assignments often focus on designing or troubleshooting data pipelines, implementing ETL solutions, or addressing data quality challenges relevant to healthcare analytics. The goal is to assess your practical problem-solving skills in a real-world context.

5.4 What skills are required for the Truveta Data Engineer?
Key skills for a Truveta Data Engineer include advanced SQL and Python programming, expertise in designing scalable ETL pipelines, data modeling, and cloud infrastructure (such as AWS or Azure). You should also be skilled in ensuring data quality, troubleshooting pipeline failures, and integrating heterogeneous data sources. Strong communication skills are essential for collaborating with cross-functional teams and presenting insights to non-technical stakeholders. Familiarity with healthcare data privacy, compliance, and security is highly valued.

5.5 How long does the Truveta Data Engineer hiring process take?
The typical timeline for the Truveta Data Engineer hiring process is 3-4 weeks from initial application to offer. Candidates with highly relevant experience or referrals may move faster, sometimes completing the process in as little as 2 weeks. Scheduling for technical rounds and onsite interviews is usually prompt, but timelines may vary based on candidate and team availability.

5.6 What types of questions are asked in the Truveta Data Engineer interview?
Expect a mix of technical and behavioral questions. Technical questions cover data pipeline design, ETL development, data modeling, troubleshooting pipeline failures, and integrating data from diverse sources. You may be asked to solve real-world system design scenarios, write SQL or Python code, and optimize data storage solutions. Behavioral questions focus on collaboration, adaptability, communication, and your ability to make data actionable for clinical and research stakeholders.

5.7 Does Truveta give feedback after the Data Engineer interview?
Truveta typically provides feedback through the recruiter, especially after onsite or final rounds. While detailed technical feedback may be limited, you can expect high-level insights into your performance and alignment with the role. If you advance through multiple rounds, feedback is often constructive and aimed at helping you understand your strengths and areas for improvement.

5.8 What is the acceptance rate for Truveta Data Engineer applicants?
The Truveta Data Engineer role is highly competitive, with an estimated acceptance rate of 3-5% for qualified applicants. Truveta seeks candidates who combine technical excellence with a passion for advancing healthcare through data-driven solutions, so standing out requires both strong skills and a clear alignment with the company’s mission.

5.9 Does Truveta hire remote Data Engineer positions?
Yes, Truveta offers remote positions for Data Engineers. Many roles are fully remote, with some requiring occasional visits to the office for team collaboration or key project milestones. Truveta values flexibility and supports distributed teams, especially for candidates who demonstrate strong communication and self-management skills.

Ready to Ace Your Truveta Data Engineer Interview?

Ready to ace your Truveta Data Engineer interview? It’s not just about knowing the technical skills—you need to think like a Truveta Data Engineer, solve problems under pressure, and connect your expertise to real business impact. That’s where Interview Query comes in with company-specific learning paths, mock interviews, and curated question banks tailored toward roles at Truveta and similar companies.

With resources like the Truveta Data Engineer Interview Guide and our latest case study practice sets, you’ll get access to real interview questions, detailed walkthroughs, and coaching support designed to boost both your technical skills and domain intuition.

Take the next step—explore more case study questions, try mock interviews, and browse targeted prep materials on Interview Query. Bookmark this guide or share it with peers prepping for similar roles. It could be the difference between just applying and receiving an offer. You’ve got this!