Prepare for and practice interview questions from Rand Corporation.

Rand Corporation Interview Questions

Rand Corporation Interview Guides

Machine Learning

Why would one algorithm generate different success rates with the same dataset?

Same Algorithm Different Success

What are the logistic and softmax functions? What is the difference between the two?

Softmax vs Logistic

How would you build a model to detect if a post on a marketplace is talking about selling a gun?

Detecting Firearm Sales

Classification and Regression

XGBoost vs Random Forest

Calculate the first touch attribution for each `user_id` that converted.

First Touch Attribution

Project Budget Error

Employee Project Budgets

Count the number of users that like each user

Liker's Likers

Write a query to create a pivot table that shows total sales for each branch by year

Branch Sales Pivot

Analytics

Write a SQL query to find the average number of right swipes for different ranking algorithms.

Swipe Precision

Describing a data project and its challenges

Hurdles In Data Projects

Strategically resolving misaligned expectations with stakeholders for a successful project outcome

Stakeholder Communication

Statistics

P-value to a Layman

Monotonic Function

Data Structures & Algorithms

String Mapping

This question requires the implementation of the Fibonacci sequence using three different methods: recursively, iteratively, and using memoization.

Implementing the Fibonacci Sequence in Three Different Methods

<p>When an interviewer asks a question along the lines of:</p>

<ul>
<li>What would your current manager say about you? What constructive criticism might they give?</li>
<li>What are the three biggest strengths and weaknesses you have identified in yourself?</li>
</ul>

<p>How would you respond?</p>


<p>When asked about your strengths in an interview, what is an effective way to respond?</p>

Your Strengths and Weaknesses I

<p>Which of the following is an acceptable strategy when discussing weaknesses in an interview?</p>

Your Strengths and Weaknesses II

What do you tell an interviewer when they ask you what your strengths and weaknesses are?

Your Strengths and Weaknesses

Brainteasers

<p>When an interviewer asks you a question along the lines of:</p>

<ul>
<li>Why did you apply to our company?</li>
<li>What are you looking for in your next job?</li>
<li>What makes you a good fit for our company?</li>
</ul>

<p>How should you respond?</p>


<p>When asked 'What are you looking for in your next job?' in an interview, how can you tie the company's employee benefits into your response?</p>

Why Do You Want to Work With Us I

<p>How can company values be used effectively in an interview when asked 'What makes you a good fit for our company?'</p>

Why Do You Want to Work With Us II

<p>When responding to the question 'Why did you apply to our company?' during an interview, what aspect should you highlight?</p>

Why Do You Want to Work With Us III

How would you answer when an interviewer asks why you applied to their company?

Why Do You Want to Work With Us

<p>Describe a data project you worked on. What were some of the challenges you faced?</p>


<p>How would you explain what a p-value is to someone who is not technical?</p>
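<p>One way to make the idea concrete is a coin example: if a coin were fair, how often would we see a result at least as extreme as the one we observed? The sketch below computes that probability exactly. The function name <code>one_sided_p_value</code> and the "9 heads in 10 flips" scenario are assumptions made for illustration and are not part of the original question.</p>

```python
from math import comb


def one_sided_p_value(heads: int, flips: int) -> float:
    """Probability of seeing at least `heads` heads in `flips` flips of a fair coin.

    The null hypothesis here is "the coin is fair"; the p-value is how often
    a fair coin would produce a result this extreme or more extreme.
    """
    favourable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favourable / 2 ** flips


# Hypothetical observation: 9 heads out of 10 flips.
p = one_sided_p_value(9, 10)
print(f"p-value = {p:.4f}")
```

<p>A small value says that the observed result would be rare if the coin were fair. It does not say how likely it is that the coin is fair.</p>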


<p>What does a p-value in a statistical test represent?</p>

P-value to a Layman I

<p>In a statistical test, how does a low p-value (less than 0.05) influence our decision about the null hypothesis?</p>

P-value to a Layman II

<p>Talk about a time when you had trouble communicating with stakeholders. How were you able to overcome it?</p>


<p>Imagine you are asked to build a machine learning model to decide new loan approvals for a
financial firm. You ask the data department in the company for a subset of data to get started
working on the problem. The data includes different features about applicants such as age,
occupation, zip code, height, number of children, favorite color, etc. You decide to build
multiple machine learning models to test out different ideas before settling on the best one.</p>

<p>How would you explain the bias-variance tradeoff with regard to
building and choosing a model to use?</p>


Bias vs. Variance Tradeoff

<p>Explain the difference between the XGBoost and random forest algorithms and give an example where you would use one over the other.</p>


<p>Why might a data scientist choose to use XGBoost instead of Random Forest for a particular machine learning task?</p>


XGBoost vs Random Forest: Choice

<p>What are the key differences between classification models and regression models?</p>


<p>What is the MOST important difference between regression and classification models?</p>


Classification vs Regression

<p>What’s the relationship between PCA and K-means clustering?</p>


<p>What does the variable “k” in k-means clustering refer to?</p>


Input of K-means

PCA and K-Means

<p>The Fibonacci sequence is a series of numbers in which each number is the sum of the two preceding ones, usually starting with 0 and 1. It is often used in algorithm examples, and is defined by the following formula: F(n) = F(n-1) + F(n-2), with F(0) = 0 and F(1) = 1.</p>

<p>Your task is to implement the Fibonacci algorithm in three different methods:</p>

<ol>
<li>Recursively</li>
<li>Iteratively</li>
<li>Using Memoization</li>
</ol>

<p><strong>Example 1:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">n</span> <span class="o">=</span> <span class="mi">5</span>
</span></span></code></pre>

<p><strong>Output:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="mi">5</span>
</span></span></code></pre>

<p><strong>Example 2:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">n</span> <span class="o">=</span> <span class="mi">10</span>
</span></span></code></pre>

<p><strong>Output:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">fibonacci</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="mi">55</span>
</span></span></code></pre>

<p>The Fibonacci sequence starts as follows: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55…</p>
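<p>A minimal sketch of the three approaches, using the definition F(0) = 0 and F(1) = 1 given above. The function names <code>fib_recursive</code>, <code>fib_iterative</code>, and <code>fib_memoized</code> are chosen here for illustration; the memoized version relies on <code>functools.lru_cache</code>, which is one of several reasonable ways to cache results.</p>

```python
from functools import lru_cache


def fib_recursive(n: int) -> int:
    """Direct translation of F(n) = F(n-1) + F(n-2); exponential time."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n: int) -> int:
    """Carry the last two values forward; linear time, constant space."""
    if n < 0:
        raise ValueError("n must be non-negative")
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


@lru_cache(maxsize=None)
def fib_memoized(n: int) -> int:
    """Same recursion as fib_recursive, but each F(k) is computed only once."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n < 2:
        return n
    return fib_memoized(n - 1) + fib_memoized(n - 2)


for fib in (fib_recursive, fib_iterative, fib_memoized):
    print(fib.__name__, fib(5), fib(10))
```

<p>The plain recursive version recomputes the same subproblems many times, so it becomes slow quickly as <code>n</code> grows. The iterative and memoized versions avoid that repeated work.</p>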


<p>Let’s say you’re a data engineer at Fidelity Investments, and you’re running a SQL query on a cloud-based data warehouse. All cluster resources and network health metrics look normal, but the query is still taking over 10 minutes to complete.</p>

<p>How would you go about diagnosing and improving the performance of this query?</p>


How would you diagnose and speed up a slow SQL query when system metrics look healthy?

Slow SQL Query

Query Optimization

<p>What does the backpropagation algorithm do in the context of neural networks? What is the informal intuition behind the algorithm? What are some drawbacks of the algorithm compared to other optimization methods?</p>

<p><em>Bonus: Formally derive the backpropagation algorithm and prove that it does what it claims to do.</em></p>


Backpropagation Explanation

<p>Let’s say that you work on the revenue forecasting team at a company like Facebook.</p>

<p>An executive comes to you asking about how much revenue Facebook will make in the coming year.</p>

<p>How would you forecast revenue for the next year?</p>


Forecasting New Year Revenue

Forecasting & Time Series

<p>What are the logistic and softmax functions? What is the difference between the two?</p>

<p>What makes them useful for use in logistic regression?</p>
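<p>As a rough illustration, the two functions can be written in a few lines of standard-library Python. The names <code>logistic</code> and <code>softmax</code> and the sample scores are assumptions made for this sketch. The final check shows the usual relationship between the two: softmax over the two scores <code>[z, 0]</code> gives the same probability for the first class as the logistic function applied to <code>z</code>.</p>

```python
import math
from typing import List


def logistic(z: float) -> float:
    """Map any real number to the interval (0, 1): 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)


def softmax(scores: List[float]) -> List[float]:
    """Map K real scores to K probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max leaves the result unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


z = 2.0  # hypothetical score
print(logistic(z))           # probability of the positive class
print(softmax([z, 0.0])[0])  # same value, via two-class softmax
print(softmax([1.0, 2.0, 3.0]))
```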


<p>In logistic regression, how is the predicted class determined based on the logistic function?</p>


Softmax vs Logistic II

<p>What is the key characteristic of the logistic function that makes it useful for mapping continuous values to a probability?</p>


Softmax vs Logistic I

<p>As a data scientist at a real estate analytics startup, you’re tasked with building a predictive model for home prices in a large metropolitan area. During your exploratory data analysis, you notice that the <code>home_prices</code> column in your dataset is heavily right-skewed (i.e., there are many lower-priced homes and a few extremely high-priced ones).</p>

<p>How might this skewness impact your modeling approach, and what steps would you consider to address it before training your model?</p>

<p><em>Bonus: If, instead, you discover that the target variable (<code>home_prices</code>) is heavily left-skewed (i.e., there are many high-priced homes and a few low-priced outliers), what would you do differently, if anything?</em></p>


You're building a model to predict home prices and you see that the prices in your training data are right-skewed. How would you handle this situation?

Skewed Pricing

<p>Let’s say you work at a food delivery company.</p>

<p>How would you measure the effectiveness of giving extra pay to delivery drivers during peak hours to meet the demand from consumers?</p>


Extra Delivery Pay

A/B Testing

<p>Why would the same machine learning algorithm generate different success rates using the same dataset?</p>

<p><strong>Note:</strong> <em>When an interviewer asks an ambiguous question like this one, gather context and restate the question in a form that is clear enough to answer.</em></p>


<p>Given two strings, <code>string1</code> and <code>string2</code>, write a function <code>string_map</code> to determine if there exists a one-to-one correspondence (bijection) between the characters of <code>string1</code> and <code>string2</code>.</p>

<p>For the two strings, our correspondence must be between characters in the same position/index.</p>

<p><strong>Example 1:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">string1</span> <span class="o">=</span> <span class="s1">&#39;qwe&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">string2</span> <span class="o">=</span> <span class="s1">&#39;asd&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">string_map</span><span class="p">(</span><span class="n">string1</span><span class="p">,</span> <span class="n">string2</span><span class="p">)</span> <span class="o">==</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="c1"># q = a, w = s, and e = d</span>
</span></span></code></pre>

<p><strong>Example 2:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">string1</span> <span class="o">=</span> <span class="s1">&#39;donut&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">string2</span> <span class="o">=</span> <span class="s1">&#39;fatty&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">string_map</span><span class="p">(</span><span class="n">string1</span><span class="p">,</span> <span class="n">string2</span><span class="p">)</span> <span class="o">==</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl"><span class="c1"># cannot map two distinct characters (n and u) to the same character (t)</span>
</span></span></code></pre>

<p><strong>Example 3:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">string1</span> <span class="o">=</span> <span class="s1">&#39;enemy&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">string2</span> <span class="o">=</span> <span class="s1">&#39;enemy&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">string_map</span><span class="p">(</span><span class="n">string1</span><span class="p">,</span> <span class="n">string2</span><span class="p">)</span> <span class="o">==</span> <span class="kc">True</span>
</span></span><span class="line"><span class="cl"><span class="c1"># there exists a one-to-one correspondence between equivalent strings</span>
</span></span></code></pre>

<p><strong>Example 4:</strong></p>

<p><strong>Input:</strong></p>

<pre tabindex="0" class="chroma"><code><span class="line"><span class="cl"><span class="n">string1</span> <span class="o">=</span> <span class="s1">&#39;enemy&#39;</span>
</span></span><span class="line"><span class="cl"><span class="n">string2</span> <span class="o">=</span> <span class="s1">&#39;ymene&#39;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">string_map</span><span class="p">(</span><span class="n">string1</span><span class="p">,</span> <span class="n">string2</span><span class="p">)</span> <span class="o">==</span> <span class="kc">False</span>
</span></span><span class="line"><span class="cl"><span class="c1"># since our correspondence must be between characters of the same index, this case returns &#39;False&#39; as we must map e = y AND e = e</span>
</span></span></code></pre>
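<p>One possible approach, sketched under the assumption that strings of different lengths can never be in correspondence: walk both strings in step and keep a mapping in each direction, failing as soon as either mapping is contradicted. The function is named <code>string_map</code> to match the calls in the examples above.</p>

```python
def string_map(string1: str, string2: str) -> bool:
    """Return True if the characters of the two strings correspond one-to-one
    when matched position by position."""
    if len(string1) != len(string2):
        return False

    forward = {}   # character in string1 -> character in string2
    backward = {}  # character in string2 -> character in string1
    for a, b in zip(string1, string2):
        # setdefault stores the pair on first sight and returns the stored
        # partner afterwards, so a mismatch means the mapping is inconsistent.
        if forward.setdefault(a, b) != b:
            return False
        if backward.setdefault(b, a) != a:
            return False
    return True


print(string_map("qwe", "asd"))      # Example 1
print(string_map("donut", "fatty"))  # Example 2
print(string_map("enemy", "enemy"))  # Example 3
print(string_map("enemy", "ymene"))  # Example 4
```

<p>Checking both directions is what makes the correspondence a bijection: the forward map catches one character mapping to two targets, and the backward map catches two characters sharing a target.</p>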


<p>We’re given two tables. One is named <code>projects</code> and the other maps employees to the projects they’re working on.</p>

<p>Write a query to get the top five most expensive projects by budget to employee count ratio.</p>

<p><em>Note: Exclude projects with 0 employees. Assume each employee works on only one project.</em></p>

<p><code>projects</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>title</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>start_date</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>end_date</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>budget</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><code>employee_projects</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>project_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>employee_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>title</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>budget_per_employee</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>


<p>Let’s say you are designing a marketplace for your website.</p>

<p>Selling firearms is prohibited by your website’s Terms of Service Agreement (not to mention the laws of your country). To this end, you want to create a system that can automatically detect if a listing on the marketplace is selling a gun.</p>

<p>How would you go about doing this?</p>


<p>While designing a system to detect firearms listings on a marketplace website, what metrics could be used to minimize both false positives and false negatives?</p>

<p>There are two tables. The <code>swipes</code> table holds a row for every Tinder swipe and contains a boolean column, <code>is_right_swipe</code>, that indicates whether the swipe was a right or left swipe. The <code>variants</code> table records which user has which variant of an A/B test.</p>

<p>Write a SQL query to output the average number of right swipes for two different variants of a feed ranking algorithm by comparing users who have made 10, 50, and 100 swipes in a <code>feed_change</code> experiment.</p>

<p><em>Note: Users have to have swiped at least 10 times to be included in the subset of users to analyze the mean number of right swipes.</em></p>

<p><strong>Example:</strong></p>

<p><strong>Input:</strong></p>

<p><code>variants</code> table</p>

<table>
<thead>
<tr>
<th>Columns</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>experiment</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>variant</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>user_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><code>swipes</code> table</p>

<table>
<thead>
<tr>
<th>Columns</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>user_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>swiped_user_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>created_at</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>is_right_swipe</code></td>
<td>BOOLEAN</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<table>
<thead>
<tr>
<th>Columns</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>variant</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>mean_right_swipes</code></td>
<td>FLOAT</td>
</tr>

<tr>
<td><code>swipe_threshold</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>num_users</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>


<p>Let’s say you work at a food delivery startup.</p>

<p>The customer success team is asking for a standardized refund policy for the company after historically processing each case differently.</p>

<p>How would you create a policy for refunds with regard to balancing customer sentiment and goodwill versus revenue tradeoffs?</p>


Food Delivery Refund Policy

Business Case

<p>The schema below is for a retail online shopping company consisting of two tables, <code>attribution</code> and <code>user_sessions</code>.</p>

<ul>
<li><p>The attribution table logs a session visit for each row.</p></li>

<li><p>If <code>conversion</code> is <code>true</code>, then the user converted to buying on that session.</p></li>

<li><p>The <code>channel</code> column represents which advertising platform the user was attributed to for that specific session.</p></li>

<li><p>Lastly, the <code>user_sessions</code> table maps session visits back to users, with many sessions belonging to one user.</p></li>
</ul>

<p>First touch attribution is defined as the channel with which the converted user was associated when they first discovered the website.</p>

<p>Calculate the first touch attribution for each <code>user_id</code> that converted. </p>

<p><strong>Example:</strong></p>

<p><strong>Input:</strong></p>

<p><code>attribution</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>session_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>channel</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>conversion</code></td>
<td>BOOLEAN</td>
</tr>
</tbody>
</table>
<p><code>user_sessions</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>session_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>created_at</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>user_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><strong>Example output:</strong></p>

<table>
<thead>
<tr>
<th><code>user_id</code></th>
<th><code>channel</code></th>
</tr>
</thead>

<tbody>
<tr>
<td>123</td>
<td>facebook</td>
</tr>

<tr>
<td>145</td>
<td>google</td>
</tr>

<tr>
<td>153</td>
<td>facebook</td>
</tr>

<tr>
<td>172</td>
<td>organic</td>
</tr>

<tr>
<td>173</td>
<td>email</td>
</tr>
</tbody>
</table>
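<p>A hedged sketch of one way to answer this, run against an in-memory SQLite database so it is self-contained. The sample rows are invented for illustration and are not the data behind the example output above. The query assumes that a user's earliest <code>created_at</code> identifies their first session, that <code>conversion</code> is stored as 0 or 1, and that no user has two sessions with the same earliest timestamp.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE attribution (session_id INTEGER, channel TEXT, conversion INTEGER);
CREATE TABLE user_sessions (session_id INTEGER, created_at TEXT, user_id INTEGER);
""")

# Hypothetical rows, invented for this sketch.
conn.executemany("INSERT INTO attribution VALUES (?, ?, ?)", [
    (1, "facebook", 0),
    (2, "google", 1),
    (3, "google", 1),
    (4, "email", 0),
])
conn.executemany("INSERT INTO user_sessions VALUES (?, ?, ?)", [
    (1, "2020-01-01 00:00:00", 123),
    (2, "2020-01-05 00:00:00", 123),
    (3, "2020-01-02 00:00:00", 145),
    (4, "2020-01-03 00:00:00", 200),
])

FIRST_TOUCH = """
SELECT us.user_id, a.channel
FROM user_sessions AS us
JOIN attribution AS a ON a.session_id = us.session_id
WHERE us.created_at = (              -- the user's earliest session
        SELECT MIN(created_at)
        FROM user_sessions
        WHERE user_id = us.user_id)
  AND us.user_id IN (                -- only users who converted at some point
        SELECT us2.user_id
        FROM user_sessions AS us2
        JOIN attribution AS a2 ON a2.session_id = us2.session_id
        WHERE a2.conversion = 1)
ORDER BY us.user_id
"""

rows = conn.execute(FIRST_TOUCH).fetchall()
print(rows)
```

<p>In this sample, user 123 converted on a later google session but first arrived through facebook, so facebook is the first touch. User 200 never converted and is left out.</p>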


<p>We’re given two tables. One is named <code>projects</code> and the other maps employees to the projects they’re working on. </p>

<p>We want to select the five most expensive projects by budget to employee count ratio. But let’s say that we’ve found a bug where there exist duplicate rows in the <code>employee_projects</code> table.</p>

<p>Write a query to account for the error and select the top five most expensive projects by budget to employee count ratio.</p>

<h3>Schema</h3>

<p><strong>Input:</strong></p>

<p><code>projects</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>title</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>start_date</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>end_date</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>budget</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><code>employee_projects</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>project_id</code></td>
<td>INTEGER</td>
</tr>

<tr>
<td><code>employee_id</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>title</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>budget_per_employee</code></td>
<td>FLOAT</td>
</tr>
</tbody>
</table>

<h3>Example</h3>

<p><strong>Input:</strong></p>

<p><strong><code>projects</code> table</strong></p>

<table>
<thead>
<tr>
<th>id</th>
<th>title</th>
<th>start_date</th>
<th>end_date</th>
<th>budget</th>
</tr>
</thead>

<tbody>
<tr>
<td>1</td>
<td>party</td>
<td>2006-04-17 00:00:00</td>
<td>2006-05-19 00:00:00</td>
<td>25875</td>
</tr>

<tr>
<td>2</td>
<td>diversity</td>
<td>2017-08-09 00:00:00</td>
<td>2017-09-18 00:00:00</td>
<td>77867</td>
</tr>

<tr>
<td>3</td>
<td>integration</td>
<td>2005-06-29 00:00:00</td>
<td>2005-07-19 00:00:00</td>
<td>75987</td>
</tr>

<tr>
<td>4</td>
<td>testing</td>
<td>2009-01-12 00:00:00</td>
<td>2009-05-03 00:00:00</td>
<td>35946</td>
</tr>

<tr>
<td>5</td>
<td>launch</td>
<td>2005-05-02 00:00:00</td>
<td>2005-10-16 00:00:00</td>
<td>66292</td>
</tr>

<tr>
<td>6</td>
<td>meet</td>
<td>2001-09-09 00:00:00</td>
<td>2002-04-04 00:00:00</td>
<td>71243</td>
</tr>

<tr>
<td>7</td>
<td>payroll</td>
<td>2019-09-25 00:00:00</td>
<td>2020-05-05 00:00:00</td>
<td>97071</td>
</tr>

<tr>
<td>8</td>
<td>admin</td>
<td>2000-07-08 00:00:00</td>
<td>2000-09-10 00:00:00</td>
<td>24000</td>
</tr>

<tr>
<td>9</td>
<td>petronas</td>
<td>2016-07-01 00:00:00</td>
<td>2016-11-28 00:00:00</td>
<td>37088</td>
</tr>

<tr>
<td>10</td>
<td>bel</td>
<td>2005-08-03 00:00:00</td>
<td>2006-04-29 00:00:00</td>
<td>61937</td>
</tr>
</tbody>
</table>
<p><strong><code>employee_projects</code> table</strong></p>

<table>
<thead>
<tr>
<th>project_id</th>
<th>employee_id</th>
</tr>
</thead>

<tbody>
<tr>
<td>1</td>
<td>7</td>
</tr>

<tr>
<td>1</td>
<td>9</td>
</tr>

<tr>
<td>2</td>
<td>6</td>
</tr>

<tr>
<td>3</td>
<td>1</td>
</tr>

<tr>
<td>3</td>
<td>2</td>
</tr>

<tr>
<td>4</td>
<td>5</td>
</tr>

<tr>
<td>5</td>
<td>3</td>
</tr>

<tr>
<td>5</td>
<td>4</td>
</tr>

<tr>
<td>6</td>
<td>8</td>
</tr>

<tr>
<td>8</td>
<td>10</td>
</tr>

<tr>
<td>9</td>
<td>11</td>
</tr>

<tr>
<td>10</td>
<td>12</td>
</tr>

<tr>
<td>5</td>
<td>5</td>
</tr>

<tr>
<td>9</td>
<td>11</td>
</tr>

<tr>
<td>6</td>
<td>8</td>
</tr>

<tr>
<td>4</td>
<td>5</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<table>
<thead>
<tr>
<th>title</th>
<th>budget_per_employee</th>
</tr>
</thead>

<tbody>
<tr>
<td>diversity</td>
<td>77867</td>
</tr>

<tr>
<td>meet</td>
<td>71243</td>
</tr>

<tr>
<td>bel</td>
<td>61937</td>
</tr>

<tr>
<td>integration</td>
<td>37993.5</td>
</tr>

<tr>
<td>petronas</td>
<td>37088</td>
</tr>
</tbody>
</table>
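<p>A possible solution sketch, loaded with the example rows above into an in-memory SQLite database. The date columns are omitted because the query does not need them, and <code>COUNT(DISTINCT ...)</code> is one of several ways to neutralize the duplicate rows; deduplicating in a subquery first would work as well.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER, title TEXT, budget INTEGER);
CREATE TABLE employee_projects (project_id INTEGER, employee_id INTEGER);
""")

# Example rows from the tables above (start_date and end_date left out).
conn.executemany("INSERT INTO projects VALUES (?, ?, ?)", [
    (1, "party", 25875), (2, "diversity", 77867), (3, "integration", 75987),
    (4, "testing", 35946), (5, "launch", 66292), (6, "meet", 71243),
    (7, "payroll", 97071), (8, "admin", 24000), (9, "petronas", 37088),
    (10, "bel", 61937),
])
conn.executemany("INSERT INTO employee_projects VALUES (?, ?)", [
    (1, 7), (1, 9), (2, 6), (3, 1), (3, 2), (4, 5), (5, 3), (5, 4),
    (6, 8), (8, 10), (9, 11), (10, 12), (5, 5), (9, 11), (6, 8), (4, 5),
])

QUERY = """
SELECT p.title,
       -- DISTINCT counts each employee once even if the row is duplicated
       CAST(p.budget AS REAL) / COUNT(DISTINCT ep.employee_id) AS budget_per_employee
FROM projects AS p
JOIN employee_projects AS ep ON ep.project_id = p.id
GROUP BY p.id, p.title, p.budget
ORDER BY budget_per_employee DESC
LIMIT 5
"""

rows = conn.execute(QUERY).fetchall()
for title, ratio in rows:
    print(title, ratio)
```

<p>The inner join drops projects with no employees, such as <code>payroll</code> in the example data, so no division by zero occurs.</p>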


<p>A dating website’s schema is represented by a table of people who like other people. The table has three columns: <code>user_id</code>, the user being liked; <code>liker_id</code>, the <code>user_id</code> of the user doing the liking; and <code>created_at</code>, the date and time at which the like occurred.</p>

<p>Write a query to count the number of liker’s likers (the users that like the likers) if the liker has one.</p>

<p><strong>Example:</strong></p>

<p><strong>Input:</strong></p>

<p><code>likes</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>user_id</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>created_at</code></td>
<td>DATETIME</td>
</tr>

<tr>
<td><code>liker_id</code></td>
<td>VARCHAR</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>user</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>count</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
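<p>One reading of the task, sketched with SQLite: for every user who appears as a liker, count the distinct users who like them, and leave out likers that nobody likes. The sample rows are invented for illustration, and the self-join below is an assumption about the intended solution rather than a confirmed reference answer.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE likes (user_id TEXT, created_at TEXT, liker_id TEXT)")

# Hypothetical rows: (user being liked, timestamp, user doing the liking).
conn.executemany("INSERT INTO likes VALUES (?, ?, ?)", [
    ("A", "2020-01-01 00:00:00", "B"),
    ("B", "2020-01-02 00:00:00", "C"),
    ("B", "2020-01-03 00:00:00", "D"),
    ("A", "2020-01-04 00:00:00", "C"),
    ("C", "2020-01-05 00:00:00", "A"),
])

QUERY = """
SELECT l1.liker_id AS "user",
       COUNT(DISTINCT l2.liker_id) AS "count"
FROM likes AS l1
JOIN likes AS l2 ON l2.user_id = l1.liker_id  -- l2 rows are likes received by the liker
GROUP BY l1.liker_id
ORDER BY l1.liker_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

<p>In this sample, D likes B but nobody likes D, so D does not appear in the result. <code>COUNT(DISTINCT ...)</code> keeps a liker who has liked several people from having their own likers counted more than once.</p>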


<p>Your company, a multinational retail corporation, has been storing sales data from various branches worldwide in separate tables according to the year the sales were made. The current data structure is proving inefficient for business analytics and the management has requested your expertise to streamline the data.</p>

<p>Write a query to create a pivot table that shows total sales for each branch by year.</p>

<p><em>Note: Assume that the sales are represented by the <code>total_sales</code> column and are in USD. Each branch is represented by its unique <code>branch_id</code></em>.</p>

<h3><strong>Example:</strong></h3>

<p><strong>Input:</strong></p>

<p>For simplicity, consider two years: 2021 and 2022.</p>

<p><code>sales_2021</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td>id</td>
<td>INTEGER</td>
</tr>

<tr>
<td>branch_id</td>
<td>INTEGER</td>
</tr>

<tr>
<td>total_sales</td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><code>sales_2022</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td>id</td>
<td>INTEGER</td>
</tr>

<tr>
<td>branch_id</td>
<td>INTEGER</td>
</tr>

<tr>
<td>total_sales</td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p><strong>Output:</strong></p>

<p><code>sales_pivot</code> table</p>

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td>branch_id</td>
<td>INTEGER</td>
</tr>

<tr>
<td>total_sales_2021</td>
<td>INTEGER</td>
</tr>

<tr>
<td>total_sales_2022</td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<p>This output pivot table shows the total sales for each branch, broken down by year.</p>
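<p>A minimal sketch of the pivot for the two-year case, using SQLite and invented sample rows. It stacks the yearly tables with <code>UNION ALL</code> and then uses conditional aggregation. Reporting 0 for a branch with no sales in a given year is a choice made here, not something the question specifies.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_2021 (id INTEGER, branch_id INTEGER, total_sales INTEGER);
CREATE TABLE sales_2022 (id INTEGER, branch_id INTEGER, total_sales INTEGER);
""")

# Hypothetical rows, invented for this sketch.
conn.executemany("INSERT INTO sales_2021 VALUES (?, ?, ?)",
                 [(1, 1, 100), (2, 1, 50), (3, 2, 200)])
conn.executemany("INSERT INTO sales_2022 VALUES (?, ?, ?)",
                 [(1, 1, 300), (2, 3, 400)])

PIVOT = """
WITH all_sales AS (
    SELECT branch_id, total_sales, 2021 AS year FROM sales_2021
    UNION ALL
    SELECT branch_id, total_sales, 2022 AS year FROM sales_2022
)
SELECT branch_id,
       SUM(CASE WHEN year = 2021 THEN total_sales ELSE 0 END) AS total_sales_2021,
       SUM(CASE WHEN year = 2022 THEN total_sales ELSE 0 END) AS total_sales_2022
FROM all_sales
GROUP BY branch_id
ORDER BY branch_id
"""

rows = conn.execute(PIVOT).fetchall()
for branch_id, sales_2021, sales_2022 in rows:
    print(branch_id, sales_2021, sales_2022)
```

<p>Adding another year under this approach means one more <code>UNION ALL</code> branch and one more <code>SUM(CASE ...)</code> column.</p>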


<p>What does it mean for a function to be monotonic?</p>

<p>Why is it important that a transformation applied to a metric is monotonic?</p>
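<p>A small illustration of why this matters: an increasing monotonic transformation keeps the ordering of the metric values, while a non-monotonic one can reorder them. The helper names, the sample values, and the two transformations below are assumptions chosen for this sketch. Note that the check only tests the sample points it is given, so it is a spot check and not a proof of monotonicity.</p>

```python
import math
from typing import Callable, List, Sequence


def is_monotonic_increasing(f: Callable[[float], float], xs: Sequence[float]) -> bool:
    """Check f(a) <= f(b) for every adjacent pair of the sorted sample points."""
    ys = [f(x) for x in sorted(xs)]
    return all(a <= b for a, b in zip(ys, ys[1:]))


def ranking(values: Sequence[float]) -> List[int]:
    """Indices of `values` ordered from smallest to largest."""
    return sorted(range(len(values)), key=lambda i: values[i])


metric = [120.0, 3.5, 48.0, 9.0]                      # hypothetical metric values
log_metric = [math.log(v) for v in metric]            # monotonic transformation
distance = [(v - 10.0) ** 2 for v in metric]          # not monotonic in v

print(ranking(metric) == ranking(log_metric))  # ordering preserved by log
print(ranking(metric) == ranking(distance))    # ordering changed
```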


<p>Let’s say you’ve just finished building a churn prediction model for a subscription service, and the team is considering deploying it to automatically trigger retention emails for users at risk of leaving.</p>

<p>How would you ensure this model is ready for production, and what steps would you take to monitor its performance after launch?</p>


How would you prepare and monitor a churn model before and after production deployment?

Calculate Moving Average	SQL	Easy
Predict Customer Churn	Machine Learning	Medium
A/B Test Significance	Statistics	Medium
Optimize Query Performance	SQL	Hard
Feature Importance Analysis	Machine Learning	Medium
Clean Missing Data	Python	Easy
Neural Network Architecture	Deep Learning	Hard
Calculate Cohort Retention	SQL	Medium
Bayesian Probability	Statistics	Easy
Recommend Similar Products	Machine Learning	Hard
