Prepare for and practice interview questions from Ginger.

Ginger Interview Questions

Ginger Interview Guides

Data Structures & Algorithms

Given a string, write a function to find its first recurring character.

Recurring Character

Given an integer N, write a function that returns all of the prime numbers up to N

Prime to N

Write code to generate a sample from a multinomial distribution with keys 

Drawing Balls From Bin

String Mapping

Write a function to retrieve the combination that allows you to spend all of your store credit while getting at least two books at the lowest weight.

Book Combinations

Machine Learning

Coefficients of Logistic Regression

Bagging vs Boosting

Random Forest Explanation

Choosing the k value during k-means clustering

Choosing k

Xgboost vs Random Forest

A/B Testing

Experiment Validity

Button AB Test

Network Experiment Design

Non-Normal AB Testing

The role of A/B testing in measuring the success rate of an analytics experiment

Success Measurement

Select the 2nd highest salary in the engineering department

2nd Highest Salary

Get the top 3 highest employee salaries by department

Top Three Salaries

Top 3 Users

Size of Joins

Statistics

P-value to a Layman

Correlation in Regression

Multicollinearity in Regression

What is the difference between type I and type II errors?

Type I and II Errors

When an interviewer asks a question along the lines of:

<ul>
<li>What would your current manager say about you? What constructive criticisms might he give?</li>
<li>What are your three biggest strengths and weaknesses you have identified in yourself?</li>
</ul>

How would you respond?

When asked about your strengths in an interview, what is an effective way to respond?

Your Strengths and Weaknesses I

Which of the following is an acceptable strategy when discussing weaknesses in an interview?

Your Strengths and Weaknesses II

What do you tell an interviewer when they ask you what your strengths and weaknesses are?

Your Strengths and Weaknesses

Brainteasers

When an interviewer asks you a question along the lines of:

<ul>
<li>Why did you apply to our company?</li>
<li>What are you looking for in your next job?</li>
<li>What makes you a good fit for our company?</li>
</ul>

How should you respond?

When asked 'What are you looking for in your next job?' in an interview, how can you tie the company's employee benefits into your response?

Why Do You Want to Work With Us I

How can company values be used effectively in an interview when asked 'What makes you a good fit for our company?'

Why Do You Want to Work With Us II

When responding to the question 'Why did you apply to our company?' during an interview, what aspect should you highlight?

Why Do You Want to Work With Us III

How would you answer when an Interviewer asks why you applied to their company?

Why Do You Want to Work With Us

Describe a data project you worked on. What were some of the challenges you faced?

Describing a data project and its challenges

Hurdles In Data Projects

Analytics

How would you explain what a p-value is to someone who is not technical?

What does a p-value in a statistical test represent?

P-value to a Layman I

In a statistical test, how does a low p-value (less than 0.05) influence our decision about the null hypothesis?

P-value to a Layman II

How would you handle the data preparation for building a machine learning model using imbalanced data?

Addressing imbalanced data in machine learning through carefully prepared techniques.

Data Preparation for Imbalanced Data

Let’s say we’re comparing two machine learning algorithms. In which case would you use a bagging algorithm versus a boosting algorithm? 

Give an example of the tradeoffs between the two.

In machine learning, when would you use a bagging algorithm over a boosting algorithm?

Bagging vs. Boosting

Talk about a time when you had trouble communicating with stakeholders. How were you able to overcome it?

Strategically resolving misaligned expectations with stakeholders for a successful project outcome

Stakeholder Communication

Given a string, write a function to determine whether it is a palindrome.

Note: A palindrome is a word/string that is read the same way forward as it is backward, e.g. <code>&#39;reviver&#39;</code>, <code>&#39;madam&#39;</code>, <code>&#39;deified&#39;</code> and <code>&#39;civic&#39;</code> are all palindromes, while <code>&#39;tree&#39;</code>, <code>&#39;music&#39;</code> and <code>&#39;person&#39;</code> are not palindromes.

Example:

Input:

<pre tabindex="0" class="chroma"><code>word1 = &#34;tree&#34;
word2 = &#34;radar&#34;
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>def is_palindrome(word1) -&gt; False
def is_palindrome(word2) -&gt; True
</code></pre>
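One possible approach, sketched under the assumption that the comparison is case-sensitive and that no characters (spaces, punctuation) are stripped first, is to compare the string with its reverse:

```python
def is_palindrome(word: str) -> bool:
    """Return True if `word` reads the same forward and backward."""
    # Slicing with a step of -1 produces the reversed string.
    return word == word[::-1]


print(is_palindrome("tree"))   # False
print(is_palindrome("radar"))  # True
```

If the interviewer wants mixed-case or punctuated input handled, the string would need to be normalized before the comparison.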

Given a string, write a function to determine whether it is a palindrome.

String Palindromes

Write a SQL query to select the 2nd highest salary in the engineering department.

Note: If more than one person shares the highest salary, the query should select the next highest salary.

Example:

Input:

<code>employees</code> table

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>first_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>last_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>salary</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>department_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<code>departments</code> table

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>name</code></td>
<td>VARCHAR</td>
</tr>
</tbody>
</table>
Output:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>salary</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
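A minimal sketch of one way to write this query, run here through Python's built-in `sqlite3` module so it can be tried end to end. The department name value `'engineering'` and the sample rows are assumptions made for illustration; only the table and column names come from the question.

```python
import sqlite3

# Take the highest salary that is strictly below the department's maximum,
# so ties at the top are skipped as the note requires.
QUERY = """
SELECT MAX(e.salary) AS salary
FROM employees AS e
JOIN departments AS d ON d.id = e.department_id
WHERE d.name = 'engineering'  -- assumed department name value
  AND e.salary < (
      SELECT MAX(e2.salary)
      FROM employees AS e2
      JOIN departments AS d2 ON d2.id = e2.department_id
      WHERE d2.name = 'engineering'
  );
"""


def second_highest_salary(conn):
    """Run the query and return the single salary value (or None)."""
    return conn.execute(QUERY).fetchone()[0]


# Hypothetical sample data: two engineers share the top salary.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER, name VARCHAR);
CREATE TABLE employees (
    id INTEGER, first_name VARCHAR, last_name VARCHAR,
    salary INTEGER, department_id INTEGER
);
INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
INSERT INTO employees VALUES
    (1, 'Ada', 'Lee', 150000, 1),
    (2, 'Bo',  'Kim', 150000, 1),
    (3, 'Cy',  'Roy', 120000, 1),
    (4, 'Di',  'Paz', 200000, 2);
""")
print(second_highest_salary(conn))  # 120000
```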

Explain the difference between the XGBoost and random forest algorithms and give an example where you would use one over the other.

Why might a data scientist choose to use XGBoost instead of Random Forest for a particular machine learning task?

XGBoost vs Random Forest: Choice

How does random forest generate the forest? Additionally, why would we use it over other algorithms such as logistic regression?

What happens when you average the output on multiple decision trees?

Average Trees

Tell me about a project in which you had to clean and organize a large dataset.

Describing a real-world data cleaning and organization project

Data Cleaning Experiences

Data Pipelines

A team wants to A/B test multiple different changes through a sign-up funnel.

For example, on a page, a button is currently red and at the top of the page. They want to see if changing a button from red to blue and/or from the top of the page to the bottom of the page will increase click-through.

How would you set up this test?
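One common framing is a full factorial design in which every combination of the two factors becomes its own cell. The sketch below only enumerates those cells; the names `colors`, `positions`, and `control` are illustrative, and traffic allocation and analysis are left out.

```python
from itertools import product

colors = ["red", "blue"]
positions = ["top", "bottom"]
control = ("red", "top")  # the current experience described in the question

# Every color/position combination is one cell of the 2 x 2 design.
variants = [
    {"color": color, "position": position, "is_control": (color, position) == control}
    for color, position in product(colors, positions)
]

for variant in variants:
    print(variant)
```

Running all four cells lets the team estimate each change on its own as well as the interaction between color and position.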

Your team wants to run an AB test on the color (red or blue) and position (top/bottom of the page) of a button that links to a promotion.

How many different variants of the test should you run to see the effect that a red button has on the bottom of the page, if currently the button is blue and on the top of the page?

A/B Test Variants

Given an array and a target integer, write a function <code>sum_pair_indices</code> that returns the indices of two integers in the array that add up to the target integer. If not found, just return an empty list.

Note: Can you do it in \(O(n)\) time?

Note: Even though there could be many solutions, only one needs to be returned.

Example 1:

Input:

<pre tabindex="0" class="chroma"><code>array = [1, 2, 3, 4]
target = 5
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>def sum_pair_indices(array, target) -&gt; [0, 3] or [1, 2]
</code></pre>

Example 2:

Input:

<pre tabindex="0" class="chroma"><code>array = [3]
target = 6 
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>Do NOT return [0, 0] as you can&#39;t use an index twice.
</code></pre>
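A sketch of one \(O(n)\) approach: make a single pass and keep a dictionary from each value to the index where it was first seen. The helper name `seen` is illustrative, and the choice of which valid pair to return is arbitrary.

```python
def sum_pair_indices(array, target):
    """Return indices of two distinct elements summing to `target`, else []."""
    seen = {}  # value -> index of its first occurrence
    for i, value in enumerate(array):
        complement = target - value
        if complement in seen:
            return [seen[complement], i]
        # Record the value only after the lookup, so an index is never
        # paired with itself.
        if value not in seen:
            seen[value] = i
    return []


print(sum_pair_indices([1, 2, 3, 4], 5))  # [1, 2]
print(sum_pair_indices([3], 6))           # []
```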

Given an array and a target integer, write a function that returns the indices of two integers in the array that add up to the target integer.

Target Indices

Let’s say that your company is running a standard control and variant AB test on a feature to increase conversion rates on the landing page. The PM checks the results and finds a p-value of 0.04.

How would you assess the validity of the result?

Let’s say you work at Allstate. Allstate is running <code>N</code> online ads right now. The table <code>ads</code> contains all those ads, ranked by popularity via the <code>id</code> column (e.g., the entry with <code>id = 1</code> is the most popular, etc.).

Create a subquery or common table expression named <code>top_ads</code> containing the top 3 ads (by popularity) and return the number of rows that would result from the following operations

<ol>
<li><code>ads INNER JOIN top_ads</code></li>
<li><code>ads LEFT JOIN top_ads</code></li>
<li><code>ads RIGHT JOIN top_ads</code></li>
<li><code>ads CROSS JOIN top_ads</code></li>
</ol>

Note: Please make the <code>join_type</code> column in your output have the values <code>inner_join</code>, <code>left_join</code>, etc. for each of their respective join types

Note: Please return only one query with each number in a different row

Example:

Input:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>name</code></td>
<td>VARCHAR</td>
</tr>
</tbody>
</table>
Output:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>join_type</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>number_of_rows</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>

Given an integer <code>N</code>, write a function that returns a list of all of the prime numbers up to <code>N</code>.

Note: Return an empty list if there are no prime numbers less than or equal to <code>N</code>.

Example:

Input:

<pre tabindex="0" class="chroma"><code>N = 3
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>def prime_numbers(N) -&gt; [2,3]
</code></pre>
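One well-known way to do this is the Sieve of Eratosthenes. The sketch below is one possible answer rather than the expected one; simple trial division would also work for small `N`.

```python
def prime_numbers(N):
    """Return all primes less than or equal to N using a sieve."""
    if N < 2:
        return []
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= N:
        if is_prime[p]:
            # Multiples below p * p were already crossed out by smaller primes.
            for multiple in range(p * p, N + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


print(prime_numbers(3))  # [2, 3]
```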

The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1. It is often used in algorithm examples, and is defined by the following formula: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1.

Your task is to implement the Fibonacci algorithm in three different methods:
1. Recursively
2. Iteratively
3. Using Memoization

Example 1:

Input:

<pre tabindex="0" class="chroma"><code>n = 5
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>fibonacci(n) -&gt; 5
</code></pre>

Example 2:

Input:

<pre tabindex="0" class="chroma"><code>n = 10
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>fibonacci(n) -&gt; 55
</code></pre>

The Fibonacci sequence starts as follows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55…
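The three approaches could be sketched as follows, assuming `n` is a non-negative integer. The function names are illustrative, since the question only shows a single `fibonacci(n)` call.

```python
def fib_recursive(n):
    """Direct translation of F(n) = F(n-1) + F(n-2); exponential time."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Walk the sequence from F(0), keeping only the last two values."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_memo(n, cache=None):
    """Recursive version that stores each result so it is computed once."""
    if cache is None:
        cache = {0: 0, 1: 1}
    if n not in cache:
        cache[n] = fib_memo(n - 1, cache) + fib_memo(n - 2, cache)
    return cache[n]


print(fib_recursive(5), fib_iterative(10), fib_memo(10))  # 5 55 55
```

The plain recursive version recomputes the same subproblems many times, which is the cost that the iterative and memoized versions avoid.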

This question requires the implementation of the Fibonacci sequence using three different methods: recursively, iteratively, and using memoization.

Implementing the Fibonacci Sequence in Three Different Methods

Given the <code>employees</code> and <code>departments</code> tables, write a query to get the top 3 highest employee salaries by department. If the department contains fewer than 3 employees, the top 2 or the top 1 highest salaries should be listed (assume that each department has at least 1 employee).

Note: The output should include the full name of the employee in one column, the department name, and the salary. The output should be sorted by department name in ascending order and salary in descending order. 

Example:

Input:

<code>employees</code> table

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>first_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>last_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>salary</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>department_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<code>departments</code> table

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>name</code></td>
<td>VARCHAR</td>
</tr>
</tbody>
</table>
Output:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>employee_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>department_name</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>salary</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
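A sketch of one possible query, run through Python's `sqlite3` module so it can be tried directly. It keeps an employee when fewer than three colleagues in the same department earn strictly more, which means tied salaries can produce more than three rows per department; the question does not say how ties should be handled, so that behavior is an assumption. The sample rows are hypothetical.

```python
import sqlite3

QUERY = """
SELECT e.first_name || ' ' || e.last_name AS employee_name,
       d.name AS department_name,
       e.salary
FROM employees AS e
JOIN departments AS d ON d.id = e.department_id
WHERE (
    SELECT COUNT(*)
    FROM employees AS e2
    WHERE e2.department_id = e.department_id
      AND e2.salary > e.salary
) < 3
ORDER BY d.name ASC, e.salary DESC;
"""


def top_three_salaries(conn):
    """Return (employee_name, department_name, salary) rows."""
    return conn.execute(QUERY).fetchall()


# Hypothetical sample data: four engineers, two marketers.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (id INTEGER, name VARCHAR);
CREATE TABLE employees (
    id INTEGER, first_name VARCHAR, last_name VARCHAR,
    salary INTEGER, department_id INTEGER
);
INSERT INTO departments VALUES (1, 'engineering'), (2, 'marketing');
INSERT INTO employees VALUES
    (1, 'Ada', 'Lee', 100, 1),
    (2, 'Bo',  'Kim', 90,  1),
    (3, 'Cy',  'Roy', 80,  1),
    (4, 'Di',  'Paz', 70,  1),
    (5, 'Ed',  'Fox', 60,  2),
    (6, 'Fay', 'Gil', 50,  2);
""")
for row in top_three_salaries(conn):
    print(row)
```

On databases with window functions, ranking salaries within each department (for example with `DENSE_RANK`) is a common alternative.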

You’re given a string that may contain the characters <code>{</code>, <code>}</code>, <code>[</code>, <code>]</code>, <code>(</code>, and <code>)</code>.

Task: Verify that the string is balanced. A balanced string is one where every opening character, <code>{</code>, <code>[</code>, or <code>(</code>, has a corresponding closing character, <code>}</code>, <code>]</code>, or <code>)</code>.

Write a function called <code>is_balanced(string: str) -&gt; bool</code> which verifies the balance of a string.

Example:

<pre tabindex="0" class="chroma"><code>is_balanced(&#39;(())[]{}&#39;) -&gt; True
</code></pre>

<pre tabindex="0" class="chroma"><code>is_balanced(&#39;{([(){}])()}&#39;) -&gt; True
</code></pre>

<pre tabindex="0" class="chroma"><code>is_balanced(&#39;{}[]())&#39;) -&gt; False
</code></pre>
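A stack-based sketch of one possible solution. It assumes that nesting order matters (so `'([)]'` counts as unbalanced) and that any characters other than brackets are ignored; the question does not state either point explicitly.

```python
def is_balanced(string: str) -> bool:
    """Return True if every bracket is closed by the matching type, in order."""
    pairs = {")": "(", "]": "[", "}": "{"}
    openers = set(pairs.values())
    stack = []
    for ch in string:
        if ch in openers:
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != pairs[ch]:
                return False
    # Anything left on the stack was opened but never closed.
    return not stack


print(is_balanced("(())[]{}"))      # True
print(is_balanced("{([(){}])()}"))  # True
print(is_balanced("{}[]())"))       # False
```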

<hr/>

Write a function that tests whether a string of brackets is balanced.

The Brackets Problem

How would you tackle multicollinearity in multiple linear regression?

In the context of hypothesis testing, what are type I errors (type one errors) and type II errors (type two errors)? What is the difference between the two?

Bonus: Describe the probability of making each type of error mathematically.
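For the bonus, the two error probabilities are conventionally written as follows, where \(H_0\) is the null hypothesis. The symbols \(\alpha\) and \(\beta\) are the standard notation and are not taken from the question itself.

\[
\alpha = P(\text{reject } H_0 \mid H_0 \text{ is true}), \qquad
\beta = P(\text{fail to reject } H_0 \mid H_0 \text{ is false}), \qquad
\text{power} = 1 - \beta
\]

A type I error is a false positive and occurs with probability \(\alpha\) (the significance level); a type II error is a false negative and occurs with probability \(\beta\).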

Let’s say you want to test the close friends feature on Instagram Stories.

How would you make a control group and test group to account for network effects?

What could be a potential risk when Facebook segments the metrics of the user interface change by market or demographic groups?

Network Experiment Design I

What is the primary metric that Facebook should monitor if it decides to make the user interface of its posting feature more like Instagram's?

Network Experiment Design II

If Facebook changes its user interface to mimic Instagram's, which adverse effect should they anticipate and monitor?

Network Experiment Design III

How do you detect and handle correlation between variables in linear regression? What will happen if you ignore the correlation in the regression model?

Given a list of integers, identify all the duplicate values in the list. Assume that the list can contain both positive and negative numbers, and the order of the list does not matter. A number is considered a duplicate if it appears more than once in the list. Return a list of the duplicate numbers.

Example 1:

Input:

<pre tabindex="0" class="chroma"><code>nums = [1, 2, 3, 1, 2, 3]
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>find_duplicates(nums) -&gt; [1, 2, 3]
</code></pre>

The numbers 1, 2, and 3 all appear more than once in the list, so they are considered duplicates.

Example 2:

Input:

<pre tabindex="0" class="chroma"><code>nums = [1, -1, 2, 3, 3, -1]
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>find_duplicates(nums) -&gt; [-1, 3]
</code></pre>

The numbers -1 and 3 both appear more than once in the list, so they are considered duplicates. Note that the order of the output does not matter.

Example 3:

Input:

<pre tabindex="0" class="chroma"><code>nums = [1, 2, 3, 4, 5]
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>find_duplicates(nums) -&gt; []
</code></pre>

None of the numbers in the list appear more than once, so there are no duplicates.
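One way to approach this, sketched with `collections.Counter` from the standard library, is to count every value and keep those seen more than once:

```python
from collections import Counter


def find_duplicates(nums):
    """Return each value that appears more than once in `nums`."""
    counts = Counter(nums)
    return [value for value, count in counts.items() if count > 1]


print(find_duplicates([1, 2, 3, 1, 2, 3]))    # [1, 2, 3]
print(find_duplicates([1, -1, 2, 3, 3, -1]))  # [-1, 3]
print(find_duplicates([1, 2, 3, 4, 5]))       # []
```

Each duplicate is reported once, in order of first appearance, which is acceptable because the question says the output order does not matter.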

This problem involves identifying duplicate numbers in a list of integers. The function should return a list of the duplicate numbers.

Find Duplicate Numbers in a List

Given two strings, write a function to return <code>True</code> if the strings are anagrams of each other and <code>False</code> if they are not.

Note: A word is not an anagram of itself.

Example 1:

Input:

<pre tabindex="0" class="chroma"><code>string_1 = &#34;listen&#34;
string_2 = &#34;silent&#34;
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>True
</code></pre>

Example 2:

Input:

<pre tabindex="0" class="chroma"><code>string_1 = &#34;banana&#34;
string_2 = &#34;bandana&#34;
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>False
</code></pre>
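A minimal sketch that compares character counts. The function name `is_anagram` is hypothetical, since the question does not name the function, and the comparison is assumed to be case-sensitive.

```python
from collections import Counter


def is_anagram(string_1: str, string_2: str) -> bool:
    """Return True if the strings use exactly the same letters but differ."""
    # Per the note, a word is not an anagram of itself.
    if string_1 == string_2:
        return False
    return Counter(string_1) == Counter(string_2)


print(is_anagram("listen", "silent"))   # True
print(is_anagram("banana", "bandana"))  # False
```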

Valid Anagram

Let’s say you’re tasked with pitching a new feature for Google Home. Your co-worker comes to you with an idea to build a game feature for Google Home.

How would you go about deciding whether Google should build it?

Game Feature Home

Business Case

How would you choose the k value when using k-means clustering?

Write a function <code>fib</code> which takes an integer <code>n</code> and returns the nth Fibonacci number. For this question, <code>fib</code> MUST be recursively defined.

Note: The <code>for</code> and <code>while</code> keywords are disabled for this question, as is <code>functools</code>. Any workaround for this restriction is prohibited.

Example:

Input:

<pre tabindex="0" class="chroma"><code>n = 4
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>def fib(n) -&gt; 3
# the sequence is [1, 1, 2, 3, 5, 8...]
</code></pre>
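One way to stay recursive while avoiding repeated work is to carry the last two values along as arguments. In this sketch the extra parameters `a` and `b` are an assumption added for illustration; callers still use `fib(n)`.

```python
def fib(n, a=0, b=1):
    """Return the nth Fibonacci number (1-indexed: 1, 1, 2, 3, 5, 8, ...)."""
    # `a` holds F(k) and `b` holds F(k + 1); every call moves one step
    # along the sequence until the requested number of steps is used up.
    if n == 0:
        return a
    return fib(n - 1, b, a + b)


print(fib(4))  # 3
```

This makes one recursive call per step, so the work is linear in `n`, although very large `n` would still hit Python's default recursion limit.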

Implement a dynamic recursive fibonacci function.

Impossibly Iterative Fibonacci

Implement the k-means clustering algorithm in Python from scratch, given the following:

<ul>
<li>A two-dimensional NumPy array <code>data_points</code> with an arbitrary number of data points (rows) <code>n</code> and an arbitrary number of columns <code>m</code>.</li>
<li>The number of clusters <code>k</code>.</li>
<li>The initial centroid of each cluster, <code>initial_centroids</code>.</li>
</ul>

Return a list containing the cluster index (as an integer) of each point, in the same order as the points in <code>data_points</code>.

Example

<img src="https://i.ibb.co/KKp2gPn/kemans-before.png" alt="before clustering"/>

After clustering the points with two clusters, the points will be clustered as follows.

<img src="https://i.ibb.co/bdwVWxR/kemans-example-after.png" alt="after clustering"/>

Note: There could be an infinite number of separating lines in this example.

Example

<pre tabindex="0" class="chroma"><code>
#Input
data_points = [(0,0),(3,4),(4,4),(1,0),(0,1),(4,3)]
k = 2
initial_centroids = [(1,1),(4,5)]


#Output 

k_means_clustering(data_points,k,initial_centroids) -&gt; [0,1,1,0,0,1]

</code></pre>
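A sketch of the standard assign-then-update loop (Lloyd's algorithm), written in pure Python so that it runs without dependencies; it also accepts a NumPy array because it only iterates over rows. The `max_iters` parameter and the rule of leaving an empty cluster's centroid unchanged are assumptions not stated in the question.

```python
import math


def k_means_clustering(data_points, k, initial_centroids, max_iters=100):
    """Return the cluster index of each point, in input order."""
    points = [tuple(float(x) for x in p) for p in data_points]
    centroids = [tuple(float(x) for x in c) for c in initial_centroids]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the result is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its centroid
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


data_points = [(0, 0), (3, 4), (4, 4), (1, 0), (0, 1), (4, 3)]
print(k_means_clustering(data_points, 2, [(1, 1), (4, 5)]))  # [0, 1, 1, 0, 0, 1]
```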

<img src="https://i.ibb.co/pRsRz12/example.png" alt=""/>

Implement the k-means clustering algorithm in Python from scratch

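<table>
<tbody>
<tr><td>Calculate Moving Average</td><td>SQL</td><td>Easy</td></tr>
<tr><td>Predict Customer Churn</td><td>Machine Learning</td><td>Medium</td></tr>
<tr><td>A/B Test Significance</td><td>Statistics</td><td>Medium</td></tr>
<tr><td>Optimize Query Performance</td><td>SQL</td><td>Hard</td></tr>
<tr><td>Feature Importance Analysis</td><td>Machine Learning</td><td>Medium</td></tr>
<tr><td>Clean Missing Data</td><td>Python</td><td>Easy</td></tr>
<tr><td>Neural Network Architecture</td><td>Deep Learning</td><td>Hard</td></tr>
<tr><td>Calculate Cohort Retention</td><td>SQL</td><td>Medium</td></tr>
<tr><td>Bayesian Probability</td><td>Statistics</td><td>Easy</td></tr>
<tr><td>Recommend Similar Products</td><td>Machine Learning</td><td>Hard</td></tr>
</tbody>
</table>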

Challenge

Discussion & Interview Experiences
