Los Alamos National Laboratory (LANL) is a multidisciplinary research institution dedicated to strategic science in support of national security.
The Data Scientist role at LANL is centered around leveraging high-performance computing (HPC) resources to analyze substantial datasets and implement machine learning solutions that can enhance system performance and anomaly detection. Key responsibilities include monitoring and analyzing system performance, developing automation scripts, and integrating large-scale data analysis with AI-driven workflows. Candidates are expected to have strong expertise in statistics, algorithms, and Python programming, along with practical knowledge of machine learning techniques and data science principles. Familiarity with Linux system administration, containerization technologies, and AI frameworks such as TensorFlow or PyTorch will also be highly beneficial. A successful data scientist at LANL will have a proactive approach to problem-solving, a collaborative spirit for working with cross-functional teams, and a commitment to continuous learning in the evolving fields of data analysis and AI.
This guide will help you prepare for a job interview by providing insights into the role's expectations, the skills needed for success, and the types of questions you may encounter based on previous candidates' experiences.
The interview process for a Data Scientist position at Los Alamos National Laboratory is structured and thorough, reflecting the laboratory's commitment to finding candidates with the right technical skills and cultural fit. The process typically unfolds in several stages:
The first step usually involves a preliminary phone or video interview with a recruiter. This conversation is designed to assess your general fit for the role and the organization. Expect to discuss your resume, relevant experiences, and motivations for applying. The recruiter may also inquire about your understanding of the specific skill sets required for the position, such as your experience with machine learning, data analysis, and programming languages like Python.
Following the initial screening, candidates often participate in a technical interview, which may be conducted via video conferencing tools. This round typically involves a panel of technical staff who will ask questions related to your technical expertise, particularly in areas such as statistics, algorithms, and machine learning techniques. You may be required to solve problems on the spot or discuss your previous projects in detail, showcasing your analytical skills and ability to work with large datasets.
Candidates who advance to this stage will engage in a behavioral interview, which focuses on assessing your interpersonal skills and cultural fit within the laboratory. Expect questions that explore your past experiences, teamwork, and problem-solving abilities. This round may involve multiple interviewers from different teams, reflecting the collaborative nature of the work at LANL.
In some cases, candidates may be asked to prepare a presentation on a relevant topic or a past project. This presentation allows you to demonstrate your communication skills and ability to convey complex information clearly. Following the presentation, interviewers will likely ask questions to gauge your depth of knowledge and critical thinking skills.
The final stage may involve a more in-depth discussion with senior management or team leaders. This interview often covers your long-term career goals, alignment with the laboratory's mission, and your potential contributions to ongoing projects. Candidates may also be asked about their familiarity with specific tools and technologies relevant to the role, such as containerization technologies or monitoring systems.
As you prepare for your interview, be ready to discuss your technical skills and experiences in detail, as well as your understanding of the laboratory's mission and how you can contribute to its goals.
Next, let's delve into the specific interview questions that candidates have encountered during the process.
Here are some tips to help you excel in your interview.
Before your interview, take the time to thoroughly review the job description and understand the specific skills and experiences required for the Data Scientist role at Los Alamos National Laboratory. Focus on the key areas such as statistics, probability, algorithms, and Python programming. Be prepared to discuss how your background aligns with these requirements and provide concrete examples from your past experiences that demonstrate your expertise in these areas.
Expect a mix of technical and behavioral questions during your interview. For technical questions, be ready to explain complex concepts in statistics and machine learning, as well as demonstrate your problem-solving skills. Practice articulating your thought process clearly and concisely. For behavioral questions, use the STAR (Situation, Task, Action, Result) method to structure your responses, showcasing your teamwork, adaptability, and communication skills.
Los Alamos National Laboratory values candidates who are not only technically proficient but also passionate about their work. Be prepared to discuss your interest in data science, any relevant projects you've undertaken, and how you stay updated with the latest trends and technologies in the field. This will help convey your enthusiasm and commitment to contributing to the laboratory's mission.
Given the collaborative nature of the work at LANL, it's essential to highlight your ability to work effectively in a team environment. Share examples of how you've successfully collaborated with colleagues from diverse backgrounds or disciplines. Additionally, demonstrate your communication skills by discussing how you have presented complex data findings to non-technical stakeholders in the past.
Many candidates have reported experiencing panel interviews with multiple interviewers from different teams. Prepare for this format by practicing how to engage with multiple interviewers simultaneously. Make eye contact, address each interviewer when responding, and be mindful of the dynamics in the room. This will help you appear confident and composed.
Understanding the culture at Los Alamos National Laboratory can give you an edge in your interview. Research the laboratory's values, mission, and recent projects. Be prepared to discuss how your personal values align with those of the organization and how you can contribute to its goals. This will demonstrate your genuine interest in becoming a part of their team.
At the end of your interview, you will likely have the opportunity to ask questions. Prepare thoughtful questions that reflect your interest in the role and the organization. Inquire about the team dynamics, ongoing projects, or opportunities for professional development. This not only shows your enthusiasm but also helps you assess if the laboratory is the right fit for you.
By following these tips and preparing thoroughly, you will position yourself as a strong candidate for the Data Scientist role at Los Alamos National Laboratory. Good luck!
In this section, we’ll review the various interview questions that might be asked during a Data Scientist interview at Los Alamos National Laboratory. The interview process will likely focus on your technical skills, problem-solving abilities, and your understanding of data science principles, particularly in the context of high-performance computing and machine learning.
Expect to be asked to explain the difference between supervised and unsupervised learning, since these fundamental machine learning concepts come up across the laboratory's projects. Define both types of learning, give examples of algorithms used in each, and highlight the scenarios where each is applicable.
“Supervised learning involves training a model on a labeled dataset, where the outcome is known, such as using regression or classification algorithms. In contrast, unsupervised learning deals with unlabeled data, where the model tries to identify patterns or groupings, like clustering algorithms.”
Anomaly detection is a key responsibility of this role, so expect a question along the lines of how you would detect anomalies in system performance data. Mention specific techniques, such as statistical methods, machine learning algorithms, or time series analysis, and explain how you would implement them in a practical scenario.
“I would use statistical methods like Z-scores for initial anomaly detection, followed by machine learning techniques such as Isolation Forest or Autoencoders to refine the detection process. This would help in identifying outliers in system performance metrics effectively.”
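As a sketch of the statistical first pass described above, a Z-score filter needs only the standard library (the sensor readings and the 3-sigma threshold here are illustrative assumptions, not LANL data):

```python
import statistics

def zscore_anomalies(readings, threshold=3.0):
    """Flag readings whose Z-score exceeds the threshold."""
    mean = statistics.mean(readings)
    stdev = statistics.stdev(readings)
    return [x for x in readings if abs(x - mean) / stdev > threshold]

# Synthetic performance metrics: stable values near 10 plus one spike.
readings = [10.0 + 0.1 * ((i % 5) - 2) for i in range(30)] + [25.0]
print(zscore_anomalies(readings))  # the spike at 25.0 is flagged
```

The refinement step the answer mentions would then hand the data to a library implementation such as an Isolation Forest or an autoencoder rather than hand-rolled code.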
Interviewers will also ask you to describe a machine learning model you have built and deployed, to gauge your practical experience in the field. Share specific projects, including the tools and frameworks you used, and highlight any challenges you faced and how you overcame them.
“I developed a predictive maintenance model using Python and TensorFlow, which analyzed sensor data to predict equipment failures. After training the model, I deployed it using Docker containers, ensuring it could scale with incoming data.”
You should also be ready to explain how you prevent overfitting, a common issue in machine learning that interviewers will probe. Discuss techniques such as cross-validation, regularization, and pruning, and give examples of how you have applied them in past projects.
“To prevent overfitting, I typically use cross-validation to ensure the model generalizes well to unseen data. Additionally, I apply L1 or L2 regularization to penalize overly complex models, which helps maintain a balance between bias and variance.”
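To make the regularization point concrete, here is a minimal sketch (with made-up data) of how an L2 penalty shrinks the coefficient of a one-parameter linear model toward zero, which is the mechanism behind the bias-variance trade-off in the answer above:

```python
def fit_slope(xs, ys, l2_penalty=0.0):
    """Least-squares slope through the origin, with optional L2 (ridge) penalty.

    Minimizes sum((y - w*x)^2) + l2_penalty * w^2, which has the
    closed-form solution w = sum(x*y) / (sum(x^2) + l2_penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.0]
w_ols = fit_slope(xs, ys)                      # unregularized fit
w_ridge = fit_slope(xs, ys, l2_penalty=10.0)   # penalized fit is pulled toward 0
```

In a real project the same effect comes from a library's `alpha` or `weight_decay` parameter; the closed form just makes the shrinkage visible.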
On the statistics side, expect to be asked about the Central Limit Theorem, since a solid grounding in statistics is crucial for data analysis in this role. Explain the theorem and its implications for statistical inference, particularly in the context of sampling distributions.
“The Central Limit Theorem states that the distribution of the sample means approaches a normal distribution as the sample size increases, regardless of the original distribution. This is important because it allows us to make inferences about population parameters even when the population distribution is unknown.”
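The theorem is easy to demonstrate by simulation. The sketch below (the sample size and trial count are arbitrary choices) draws sample means from a strongly skewed exponential distribution and checks that their spread matches the sigma over root-n prediction:

```python
import random
import statistics

random.seed(42)

n, trials = 50, 5000
# Exponential(rate=1) is strongly skewed: population mean 1, stdev 1.
sample_means = [
    statistics.mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(trials)
]

center = statistics.mean(sample_means)   # close to the population mean, 1.0
spread = statistics.stdev(sample_means)  # close to 1 / sqrt(50), about 0.141
```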
You may also be asked how you determine whether a result is statistically significant, which evaluates your ability to interpret data results critically. Discuss p-values, confidence intervals, and hypothesis testing, and explain how you would apply these concepts in a practical scenario.
“I would use hypothesis testing to determine if my findings are statistically significant, typically setting a significance level of 0.05. I would also calculate confidence intervals to provide a range of plausible values for the population parameter.”
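A permutation test is one library-free way to attach a p-value to an observed difference. This sketch (toy measurements and a 5,000-shuffle budget, both invented for illustration) estimates how often random label assignments produce a gap as large as the observed one:

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = group_a + group_b
    k = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:k]) - statistics.mean(pooled[k:]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations

baseline = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]
treated = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0]
p = permutation_p_value(baseline, treated)  # well below the 0.05 cutoff
```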
Be prepared to define a p-value, as interpreting one correctly is essential for statistical analysis. Explain its role in hypothesis testing and address common misconceptions.
“A p-value indicates the probability of observing the data, or something more extreme, assuming the null hypothesis is true. A low p-value suggests that we can reject the null hypothesis, but it’s important to remember that it does not measure the size or importance of an effect.”
Interviewers often ask candidates to distinguish Type I from Type II errors, which tests your understanding of statistical errors. Define both types and provide examples of their implications in research.
“A Type I error occurs when we incorrectly reject a true null hypothesis, while a Type II error happens when we fail to reject a false null hypothesis. For instance, in a medical trial, a Type I error could mean falsely concluding a drug is effective, while a Type II error could mean missing a truly effective drug.”
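The Type I rate can be checked directly by simulation. In the sketch below (an assumed two-sided alpha of 0.05 and known unit variance), a true null hypothesis is tested repeatedly and is falsely rejected about 5% of the time:

```python
import math
import random
import statistics

random.seed(7)

alpha_cutoff = 1.96   # two-sided z critical value for alpha = 0.05
n, trials = 30, 2000
false_rejections = 0
for _ in range(trials):
    # The null hypothesis is true: the data really has mean 0 and stdev 1.
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    z = statistics.mean(sample) / (1.0 / math.sqrt(n))
    if abs(z) > alpha_cutoff:
        false_rejections += 1  # a Type I error

type_i_rate = false_rejections / trials  # hovers around 0.05
```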
On the algorithms side, expect to describe a sorting algorithm of your choice, since algorithmic fluency is fundamental for data manipulation and analysis. Explain how it works and discuss its time complexity in the average and worst cases.
“I can describe the QuickSort algorithm, which uses a divide-and-conquer approach to sort elements. Its average time complexity is O(n log n), but in the worst case, it can degrade to O(n²) if the pivot selection is poor.”
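A short sketch of the algorithm described above; this simple version copies sublists for clarity, whereas in-place variants avoid the extra memory:

```python
def quicksort(items):
    """Sort a list by divide and conquer; average O(n log n)."""
    if len(items) <= 1:
        return items
    pivot = items[len(items) // 2]  # middle pivot sidesteps the sorted-input worst case
    left = [x for x in items if x < pivot]
    middle = [x for x in items if x == pivot]
    right = [x for x in items if x > pivot]
    return quicksort(left) + middle + quicksort(right)

print(quicksort([7, 2, 9, 4, 4, 1]))  # [1, 2, 4, 4, 7, 9]
```

The O(n²) worst case the answer mentions arises when the pivot is repeatedly the smallest or largest remaining element, leaving one partition nearly as large as the input.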
You may be asked to explain the difference between a stack and a queue, a standard check of your knowledge of data structures. Define both structures and explain their use cases.
“A stack is a Last In First Out (LIFO) structure, where the last element added is the first to be removed, commonly used in function calls. A queue is a First In First Out (FIFO) structure, where the first element added is the first to be removed, often used in scheduling tasks.”
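Both structures map directly onto Python's standard library; here is a minimal illustration:

```python
from collections import deque

# Stack: LIFO via a plain list; append and pop at the end are both O(1).
stack = []
stack.append("first")
stack.append("second")
top = stack.pop()        # "second" comes off first

# Queue: FIFO via deque, which also pops from the front in O(1).
queue = deque()
queue.append("first")
queue.append("second")
front = queue.popleft()  # "first" comes out first
```

A `deque` is preferred over a list for queues because `list.pop(0)` shifts every remaining element, making it O(n).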
A common problem-solving prompt is how you would optimize a slow algorithm. Discuss techniques such as reducing time complexity, choosing more efficient data structures, or applying caching.
“To optimize an algorithm, I would first analyze its time complexity and identify bottlenecks. For instance, if a nested loop is causing inefficiency, I might look for ways to flatten the loops or use a hash table to reduce lookup times.”
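The hash-table trick mentioned in the answer is easiest to see on the classic two-sum problem (a stand-in example, not from the source), where a dictionary of seen values replaces the inner loop and drops the cost from O(n²) to O(n):

```python
def two_sum_nested(nums, target):
    """O(n^2): check every pair with nested loops."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return (i, j)
    return None

def two_sum_hashed(nums, target):
    """O(n): a dict of previously seen values replaces the inner loop."""
    seen = {}  # value -> index of its first occurrence
    for j, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return (seen[complement], j)
        seen[value] = j
    return None

nums = [2, 7, 11, 15]
assert two_sum_nested(nums, 9) == two_sum_hashed(nums, 9) == (0, 1)
```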
Finally, be ready to explain dynamic programming, a powerful technique for solving complex problems. Define it and provide an example of a problem that can be solved using this technique.
“Dynamic programming is an optimization technique used to solve problems by breaking them down into simpler subproblems and storing the results to avoid redundant calculations. A classic example is the Fibonacci sequence, where I would store previously computed values to efficiently calculate larger numbers.”
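The Fibonacci example from the answer can be sketched in a few lines; the memoized version stores each subproblem's result so it is computed only once, turning exponential recursion into linear time:

```python
def fib(n, memo=None):
    """nth Fibonacci number via top-down dynamic programming (memoization)."""
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(30))  # 832040, with each subproblem evaluated once
```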