Prepare for and practice interview questions from Hy-Vee across topics like SQL, Python, Statistics and more.

Hy-Vee Interview Questions

Hy-Vee Interview Guides

Machine Learning

Bagging vs Boosting

Regularization and Validation

Replaced by QUESTION 791 - Should be deleted

Bias Variance Tradeoff

Youtube Recommendations

Xgboost vs Random Forest

Forecasting & Time Series

Significance Time Series

Why Do We Need Time Series Models?

Time Series Discrepancies

Brainteasers

How would you answer when an Interviewer asks why you applied to their company?

Why Do You Want to Work With Us

What do you tell an interviewer when they ask you what your strengths and weaknesses are?

Your Strengths and Weaknesses

Data Structures & Algorithms

Write a function to find the best days to buy and sell a stock and the profit you generate from the sale.

Buy or Sell

Given a string, write a function to find its first recurring character.

Recurring Character

Write a query to generate a shopping list that sums up the total mass of each grocery item required across three recipes.

Generate Shopping List from Recipes

When an interviewer asks a question along the lines of:

<ul>
<li>What would your current manager say about you? What constructive criticisms might he give?</li>
<li>What are your three biggest strengths and weaknesses you have identified in yourself?</li>
</ul>

How would you respond?

When asked about your strengths in an interview, what is an effective way to respond?

When asked about your strengths in an interview, what is an effective way to respond?

Your Strengths and Weaknesses I

Which of the following is an acceptable strategy when discussing weaknesses in an interview?

Which of the following is an acceptable strategy when discussing weaknesses in an interview?

Your Strengths and Weaknesses II

When an interviewer asks you a question along the lines of:

<ul>
<li>Why did you apply to our company?</li>
<li>What are you looking for in your next job?</li>
<li>What makes you a good fit for our company?</li>
</ul>

How should you respond?

When asked 'What are you looking for in your next job?' in an interview, how can you tie the company's employee benefits into your response?

When asked 'What are you looking for in your next job?' in an interview, how can you tie the company's employee benefits into your response?

Why Do You Want to Work With Us I

How can company values be used effectively in an interview when asked 'What makes you a good fit for our company?'

How can company values be used effectively in an interview when asked 'What makes you a good fit for our company?'

Why Do You Want to Work With Us II

When responding to the question 'Why did you apply to our company?' during an interview, what aspect should you highlight?

When responding to the question 'Why did you apply to our company?' during an interview, what aspect should you highlight?

Why Do You Want to Work With Us III

What are the assumptions of linear regression?

Which assumption of the residuals of the standard linear regression model can not be overcome by increasing the sample size?

Regression assumptions

What are the assumptions of linear regression?

Assumptions of Linear Regression

Statistics

Let’s say that you’re training a classification model.

How would you combat overfitting when building tree-based models?

Let's say that you're training a classification model.   How would you combat overfitting when building tree-based models?

Overfit Avoidance

Let’s say we’re comparing two machine learning algorithms. In which case would you use a bagging algorithm versus a boosting algorithm? 

Give an example of the tradeoffs between the two.

In machine learning, when would you use a bagging algorithm over a boosting algorithm?

In machine learning, when would you use a bagging algorithm over a boosting algorithm?

Bagging vs. Boosting

Imagine you are asked to build a machine learning model to decide new loan approvals for a
financial firm. You ask the data department in the company for a subset of data to get started
working on the problem. The data includes different features about applicants such as age,
occupation, zip code, height, number of children, favorite color, etc. You decide to build
multiple machine learning models to test out different ideas before settling on the best one.

How would you explain the bias-variance tradeoff with regards to
building and choosing a model to use?

Bias vs. Variance Tradeoff

Explain the difference between the XGBoost and random forest algorithms and give an example where you would use one over the other.

Why might a data scientist choose to use XGBoost instead of Random Forest for a particular machine learning task?

Why might a data scientist choose to use XGBoost instead of Random Forest for a particular machine learning task?

XGBoost vs Random Forest: Choice

Regularization and cross-validation are two common techniques used to improve the performance of machine learning algorithms.

When should you use one versus the other?

Given the choices below, which one is an example of a scenario where we should be using regularization?

Given the choices below, which one is an example of a scenario where we should be using regularization?

Regularization Example

What’s the relationship between PCA and K-means clustering?

What does the variable “k” in k-means clustering refer to?

What does the variable "k" in k-means clustering refer to?

Input of K-means

PCA and K-Means

How would you explain the bias variance tradeoff in machine learning to a high school student?

What are time series models? Why do we need them when we have less complicated regression models?

What can happen if a traditional regression model is used on data that has autocorrelation?

What can happen if a traditional regression model is used on data that has autocorrelation?

Why Do We Need Time Series Models? I

Why can't standard regression models be used effectively for data points where current values are influenced by past values?

Why can't standard regression models be used effectively for data points where current values are influenced by past values?

Why Do We Need Time Series Models? II

Let’s say that you’re working at a global trading company and you’re analyzing the price of a particular asset over time. This dataset is super noisy, volatile, and you’re not guaranteed completely accurate data.

How would you analyze this data to ensure there aren’t any discrepancies?

Let’s say you have a time series dataset grouped monthly for the past five years.

How would you find out if the difference between this month and the previous month was significant or not?

Let’s say you’re tasked with building the YouTube video recommendation algorithm.

How would you design the recommendation system?

What are important factors to keep in mind when building the recommendation algorithm?

Given a string, write a function <code>recurring_char</code> to find its first recurring character. Return <code>None</code> if there is no recurring character.

Treat upper and lower case letters as distinct characters.

You may assume the input string includes no spaces.

Example 1:

Input:

<pre tabindex="0" class="chroma"><code>input = &#34;interviewquery&#34;
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>output = &#34;i&#34;
</code></pre>

Example 2:

Input:

<pre tabindex="0" class="chroma"><code>input = &#34;interv&#34;
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>output = None
</code></pre>

Given a list of <code>stock_prices</code> in ascending order by <code>datetime</code>, and their respective dates in list <code>dts</code>, write a function <code>max_profit</code> that outputs the max profit by buying and selling at a specific interval and start and end dates to buy and sell for max profit.

Example:

Input:

<pre tabindex="0" class="chroma"><code>stock_prices = [10,5,20,32,25,12]
dts = [
 &#39;2019-01-01&#39;, 
 &#39;2019-01-02&#39;,
 &#39;2019-01-03&#39;,
 &#39;2019-01-04&#39;,
 &#39;2019-01-05&#39;,
 &#39;2019-01-06&#39;,
]

def max_profit(stock_prices,dts) -&gt; 27
</code></pre>

Output:

<pre tabindex="0" class="chroma"><code>def max_profit(stock_prices, dts) -&gt; 
(27, &#39;2019-01-02&#39;, &#39;2019-01-04&#39;)
</code></pre>

You are a chef preparing for an important event. You downloaded three recipes to make and imported each as a separate table: <code>recipe1</code>, <code>recipe2</code>, <code>recipe3</code> into your database. Each recipe requires some grocery items in various quantities.

Write a query that sums up the total mass of each grocery item required across all recipes.

Input:

<code>recipe1</code> table:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>grocery</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>mass</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<code>recipe2</code> table:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>grocery</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>mass</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
<code>recipe3</code> table:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>grocery</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>mass</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>
Output:

<table>
<thead>
<tr>
<th>Column</th>
<th>Type</th>
</tr>
</thead>

<tbody>
<tr>
<td><code>grocery</code></td>
<td>VARCHAR</td>
</tr>

<tr>
<td><code>total_mass</code></td>
<td>INTEGER</td>
</tr>
</tbody>
</table>

A manager in a packet filling line is complaining that their machine is not functioning correctly. The machine weighs a group of packets at once and attempts to get 25 packets into a box.

He has been getting complaints from customers that the boxes contain amounts other than 25. How would you help out?

Incorrect Packets

Business Case

You work as a data scientist for a grocery store chain that has a mobile app. At the end of the checkout process on the app, users are presented with an up-sell carousel, a sliding display of items that users can scroll/swipe to view and add to their cart. Currently, the carousel only presents items from the store’s personal brand.

How would you determine whether the carousel should replace store-brand items with national-brand products of the same type?

How would you determine whether the carousel should replace store-brand items with national-brand products of the same type?

Upsell Carousel

A/B Testing

Let’s say you run a small store that’s required to keep at least 6 bottles of water on display at all times. Right now, you have 12 bottles in stock, and you typically sell 2 bottles per day.

How many bottles would you need to order today to ensure you always meet the display requirement for the next 14 days, assuming you don’t get any additional deliveries in that period?

How many water bottles should you order to maintain a minimum display stock for two weeks?

Calculate Moving Average	SQL	Easy
Predict Customer Churn	Machine Learning	Medium
A/B Test Significance	Statistics	Medium
Optimize Query Performance	SQL	Hard
Feature Importance Analysis	Machine Learning	Medium
Clean Missing Data	Python	Easy
Neural Network Architecture	Deep Learning	Hard
Calculate Cohort Retention	SQL	Medium
Bayesian Probability	Statistics	Easy
Recommend Similar Products	Machine Learning	Hard

Hy-Vee Interview Questions

Hy-Vee Interview Guides

Hy-Vee Interview Questions

Challenge

Hy-Vee Salaries by Position

Discussion & Interview Experiences

Discussion & Interview Experiences