Getting ready for a Data Engineer interview at Hover Inc.? The Hover Inc. Data Engineer interview process typically covers several question topics and evaluates skills in areas like data modeling, ETL pipeline design, SQL and Python proficiency, and communicating technical insights to diverse audiences. Preparation matters especially for this role, as the company emphasizes building scalable data infrastructure, optimizing data workflows, and enabling business teams to make data-driven decisions in a rapidly evolving environment.
At Interview Query, we regularly analyze interview experience data shared by candidates. This guide uses that data to provide an overview of the Hover Inc. Data Engineer interview process, along with sample questions and preparation tips tailored to help you succeed.
Hover Inc. is a leading technology company specializing in creating 3D models of buildings using smartphone photos and advanced computer vision. Serving industries such as insurance, construction, and home improvement, Hover’s platform enables accurate measurements and visualizations for property assessments and project planning. The company’s mission is to empower professionals with reliable, scalable data to streamline workflows and improve decision-making. As a Data Engineer, you will help optimize data pipelines and infrastructure, supporting Hover’s commitment to delivering precise, actionable property insights.
As a Data Engineer at Hover Inc., you will be responsible for designing, building, and maintaining scalable data pipelines that support the company’s core products and analytics initiatives. You will work closely with data scientists, analysts, and software engineers to ensure reliable data flow, optimize database performance, and enable effective data-driven decision making. Typical tasks include integrating data from various sources, cleaning and transforming datasets, and implementing best practices for data storage and retrieval. This role is critical in supporting Hover’s mission to deliver accurate property data and innovative solutions for the construction and insurance industries.
The process begins with a thorough review of your application and resume by the Hover Inc. recruiting team. They focus on your experience with large-scale data infrastructure, designing robust ETL pipelines, data warehousing, and proficiency in SQL, Python, or other relevant programming languages. Demonstrated experience in transforming messy datasets, optimizing cross-platform data flows, and supporting real-time analytics will stand out. Make sure your resume highlights your impact on data quality, scalability, and accessibility for both technical and non-technical stakeholders.
The recruiter screen is a 30-minute call with a member of Hover Inc.'s talent acquisition team. Expect to discuss your background, motivation for joining Hover, and alignment with the company’s mission. The recruiter may probe into your communication skills, ability to make complex data insights accessible, and how you’ve contributed to data-driven decision making. Prepare by articulating how your experience matches Hover’s focus on user-centric data solutions and innovative analytics.
This stage typically involves one or two interviews with senior data engineers or analytics leads. You’ll be assessed on your ability to design scalable data pipelines, optimize schema for high-volume click or activity data, and solve SQL and Python challenges. You may be asked to discuss system design for digital services, handle messy or incomplete datasets, and demonstrate your approach to cross-platform optimization. Prepare by reviewing your experience with data warehousing, ETL pipeline design, and real-world data cleaning projects, as well as your ability to communicate technical solutions clearly.
Behavioral interviews are conducted by hiring managers or cross-functional team leads. These sessions explore how you collaborate with product, engineering, and analytics teams, handle challenges in data projects, and present insights to non-technical audiences. Expect to discuss your strengths and weaknesses, approaches to demystifying data, and how you adapt presentations for different stakeholders. Prepare by reflecting on past experiences where you navigated hurdles, improved data accessibility, and drove impact through effective communication.
The final round usually consists of multiple interviews (virtual or onsite) with senior leaders, engineering managers, and potential teammates. You may be asked to walk through a recent data project, design a solution for a hypothetical business scenario (such as a data warehouse for a new product or a scalable ETL pipeline), and demonstrate your approach to ensuring data quality and reliability. There may also be a case study or whiteboarding exercise focused on system design, data modeling, or dashboard creation for executive stakeholders. Preparing examples that showcase your technical depth and collaborative problem-solving will be key.
Once you successfully complete all interview rounds, Hover Inc.'s recruiting team will reach out with an offer. This stage involves discussions about compensation, benefits, start date, and team placement. Be ready to negotiate based on your experience and the value you bring to building scalable, user-focused data solutions.
The Hover Inc. Data Engineer interview process typically spans 3-4 weeks from initial application to offer. Fast-track candidates with highly relevant experience or internal referrals may complete the process in as little as 2 weeks, while standard pacing involves about a week between each stage. Scheduling for technical and onsite rounds depends on interviewer availability, and take-home assignments (if any) generally have a 2-3 day deadline.
Next, let’s dive into the specific interview questions that Hover Inc. Data Engineer candidates have encountered.
Expect questions that assess your ability to design, build, and optimize large-scale data pipelines and infrastructure. Focus on demonstrating your knowledge of ETL processes, data modeling, and scalable system architecture.
3.1.1 Design a scalable ETL pipeline for ingesting heterogeneous data from Skyscanner's partners.
Explain how you would architect an ETL solution that handles diverse data formats, ensures data quality, and scales efficiently. Discuss your approach to error handling, schema evolution, and monitoring.
Example answer: “I would use a modular ETL framework with schema validation at each stage and partitioned storage for scalability. Automated alerts would flag anomalies, and I’d maintain metadata tracking for auditability.”
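To make that concrete, here is a minimal sketch of one schema-validation stage expressed in SQL; the raw, clean, and quarantine table names are hypothetical, not part of any real pipeline:

```sql
-- Quarantine rows that fail basic schema checks so they can be reviewed
INSERT INTO partner_events_quarantine
SELECT *
FROM partner_events_raw
WHERE event_id IS NULL
   OR event_timestamp IS NULL;

-- Promote only the rows that pass validation to the clean table
INSERT INTO partner_events_clean (event_id, partner_id, event_timestamp, payload)
SELECT event_id, partner_id, event_timestamp, payload
FROM partner_events_raw
WHERE event_id IS NOT NULL
  AND event_timestamp IS NOT NULL;
```

Keeping rejected rows queryable rather than discarding them supports the auditability the answer mentions.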
3.1.2 Design a data warehouse for a new online retailer.
Outline the schema and architecture, emphasizing dimensional modeling, partitioning strategies, and indexing for performance. Highlight how you’d support analytics and reporting needs.
Example answer: “I’d implement a star schema with fact tables for transactions and dimension tables for products and customers, using columnar storage and periodic batch loads for efficiency.”
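A minimal DDL sketch of such a star schema might look like the following; the table and column names are illustrative, not a prescribed design:

```sql
-- Dimension tables hold descriptive attributes
CREATE TABLE dim_product (
    product_id    INT PRIMARY KEY,
    product_name  VARCHAR(255),
    category      VARCHAR(100)
);

CREATE TABLE dim_customer (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(255),
    region        VARCHAR(100)
);

-- Fact table records one row per transaction, keyed to the dimensions
CREATE TABLE fact_transactions (
    transaction_id BIGINT PRIMARY KEY,
    product_id     INT REFERENCES dim_product (product_id),
    customer_id    INT REFERENCES dim_customer (customer_id),
    order_date     DATE,
    quantity       INT,
    amount         DECIMAL(10, 2)
);
```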
3.1.3 Describing a real-world data cleaning and organization project.
Discuss the steps you took to clean and organize a complex dataset, including tools, techniques, and how you prioritized issues. Emphasize reproducibility and documentation.
Example answer: “I performed initial profiling to identify outliers and missing values, then used Python scripts to automate cleaning. I documented every transformation and created validation checks for future updates.”
3.1.4 Modifying a billion rows.
Describe how you would efficiently update or transform a massive dataset while minimizing downtime and resource usage. Talk about batching, indexing, and parallelization.
Example answer: “I’d leverage distributed processing with Spark, partition the data, and apply changes in batches to avoid locking. I’d monitor resource usage and validate results with sampling.”
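One way to express the batching idea in plain SQL is to update bounded key ranges so each statement commits quickly and never holds a long lock; this is a sketch against a hypothetical `events` table:

```sql
-- A driver script loops over key ranges (0-999999, 1000000-1999999, ...)
-- and commits after each batch so locks stay short-lived.
UPDATE events
SET status = 'archived'
WHERE event_id BETWEEN 1000000 AND 1999999
  AND status = 'active';
```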
These questions evaluate your ability to write optimized queries, perform aggregations, and handle real-world database scenarios. Focus on clarity, performance, and edge cases.
3.2.1 Write a SQL query to find the average number of right swipes for different ranking algorithms.
Show how you’d group and aggregate swipe data, ensuring accurate calculations and handling missing or outlier values.
Example answer: “I’d GROUP BY algorithm_id and calculate AVG(right_swipes), filtering out test users and using window functions for temporal analysis if needed.”
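A sketch of that query, assuming a hypothetical `swipes` table with `algorithm_id`, `right_swipes`, and an `is_test_user` flag:

```sql
SELECT algorithm_id,
       AVG(right_swipes) AS avg_right_swipes
FROM swipes
WHERE is_test_user = FALSE  -- exclude test accounts from the average
GROUP BY algorithm_id;
```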
3.2.2 Write a query to get the current salary for each employee after an ETL error.
Explain how you’d identify and correct inconsistencies in salary data, using subqueries or window functions as needed.
Example answer: “I’d use ROW_NUMBER() to pick the latest salary entry per employee, then JOIN with the employee table to ensure accuracy.”
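A sketch of that approach, assuming the ETL error left duplicate rows in a `salary_history` table with an `updated_at` column (both names hypothetical):

```sql
WITH ranked AS (
    SELECT employee_id,
           salary,
           ROW_NUMBER() OVER (
               PARTITION BY employee_id
               ORDER BY updated_at DESC  -- most recent entry first
           ) AS rn
    FROM salary_history
)
SELECT e.employee_id,
       e.name,
       r.salary AS current_salary
FROM ranked r
JOIN employees e ON e.employee_id = r.employee_id
WHERE r.rn = 1;  -- keep only the latest salary per employee
```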
3.2.3 Select the 2nd highest salary in the engineering department.
Demonstrate your approach to ranking and filtering data efficiently, considering ties and nulls.
Example answer: “I’d use DENSE_RANK() over the salary field where department = ‘engineering’ and select the record with rank = 2.”
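A sketch of that query against a hypothetical `employees` table:

```sql
WITH ranked AS (
    SELECT name,
           salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employees
    WHERE department = 'engineering'
      AND salary IS NOT NULL  -- ignore missing salaries
)
SELECT name, salary
FROM ranked
WHERE salary_rank = 2;  -- DENSE_RANK keeps ties at the same rank
```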
3.2.4 Write a query to select the top 3 departments with at least ten employees and rank them according to the percentage of their employees making over 100K in salary.
Show your ability to aggregate, filter, and rank results based on multiple conditions.
Example answer: “I’d GROUP BY department, filter on COUNT(employee_id) >= 10, calculate the percentage over 100K, and then ORDER BY this metric DESC, LIMIT 3.”
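A sketch of that logic, again assuming a simple `employees` table with `department` and `salary` columns:

```sql
SELECT department,
       COUNT(employee_id) AS headcount,
       AVG(CASE WHEN salary > 100000 THEN 1.0 ELSE 0 END) AS pct_over_100k
FROM employees
GROUP BY department
HAVING COUNT(employee_id) >= 10  -- at least ten employees
ORDER BY pct_over_100k DESC
LIMIT 3;
```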
These questions focus on ensuring data accuracy, consistency, and reliability. Emphasize your experience with profiling, cleaning, and validating large datasets.
3.3.1 How would you approach improving the quality of airline data?
Describe your process for profiling, identifying root causes of quality issues, and implementing remediation strategies.
Example answer: “I’d start with data profiling to quantify missing and inconsistent values, then collaborate with source teams to improve upstream quality and automate validation checks.”
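Profiling often begins with simple aggregate checks; here is a sketch against a hypothetical `flights` table:

```sql
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN arrival_time IS NULL THEN 1 ELSE 0 END) AS missing_arrivals,
       -- rows where the flight appears to land before it departs
       SUM(CASE WHEN departure_time >= arrival_time THEN 1 ELSE 0 END) AS suspect_times,
       COUNT(DISTINCT carrier_code) AS distinct_carriers
FROM flights;
```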
3.3.2 Challenges of specific student test score layouts, recommended formatting changes for enhanced analysis, and common issues found in "messy" datasets.
Discuss how you’d reformat and clean educational data for analysis, including normalization and error correction.
Example answer: “I’d standardize score formats, handle merged cells, and automate parsing with Python, then validate with summary statistics and visual checks.”
3.3.3 Ensuring data quality within a complex ETL setup.
Explain how you’d monitor and maintain data integrity across multiple sources and transformations.
Example answer: “I’d implement validation rules at each ETL stage, automate anomaly detection, and maintain audit logs to trace errors.”
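One concrete validation rule is a row-count reconciliation between pipeline stages; a sketch assuming hypothetical per-load audit tables `source_counts` and `target_counts`:

```sql
-- Surface any load where the target row count diverges from the source
SELECT s.load_date,
       s.row_count AS source_rows,
       t.row_count AS target_rows
FROM source_counts s
JOIN target_counts t ON t.load_date = s.load_date
WHERE s.row_count <> t.row_count;
```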
3.3.4 Describing a data project and its challenges.
Share a story about overcoming obstacles in a data pipeline or project, focusing on problem-solving and stakeholder management.
Example answer: “We faced schema drift from a vendor source, so I built automated schema checks and worked with the vendor to standardize outputs.”
These questions test your ability to architect robust, scalable data systems that support business needs. Focus on modular design, fault tolerance, and future-proofing.
3.4.1 System design for a digital classroom service.
Describe how you’d architect a data platform for a digital classroom, considering scalability, security, and analytics.
Example answer: “I’d use a microservices architecture with secure APIs, real-time event streaming for analytics, and cloud-based storage with access controls.”
3.4.2 Design the system supporting an application for a parking system.
Explain your approach to designing a reliable, scalable backend for a transactional application.
Example answer: “I’d use a normalized relational database for transactions, caching for high-read endpoints, and batch jobs for analytics.”
3.4.3 Design a feature store for credit risk ML models and integrate it with SageMaker.
Discuss how you’d build a centralized feature repository, ensure data freshness, and support model training.
Example answer: “I’d set up a versioned feature store with automated ETL jobs, monitor feature drift, and use SageMaker pipelines for retraining.”
These questions assess your ability to design, execute, and interpret data experiments and analyses. Highlight your experience with A/B testing, metric selection, and actionable insights.
3.5.1 You work as a data scientist for a ride-sharing company. An executive asks how you would evaluate whether a 50% rider discount promotion is a good or bad idea. How would you implement it? What metrics would you track?
Describe your experimental design, key metrics, and how you’d analyze impact.
Example answer: “I’d run a controlled A/B test, track conversion rates, retention, and revenue impact, and use statistical analysis to determine significance.”
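The core readout of such a test is a per-group comparison before any significance testing; a sketch with a hypothetical `promo_experiment` table holding one row per rider:

```sql
SELECT test_group,  -- 'control' or 'discount'
       COUNT(*) AS riders,
       AVG(rides_after_promo) AS avg_rides_per_rider,
       SUM(net_revenue) AS total_net_revenue
FROM promo_experiment
GROUP BY test_group;
```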
3.5.2 The role of A/B testing in measuring the success rate of an analytics experiment.
Explain how you’d structure and evaluate an experiment, including hypothesis formulation and results interpretation.
Example answer: “I’d define clear success metrics, randomize assignment, and use statistical tests to compare outcomes, ensuring sample size is sufficient.”
3.5.3 How would you design user segments for a SaaS trial nurture campaign and decide how many to create?
Discuss segmentation strategies and criteria for optimizing campaign effectiveness.
Example answer: “I’d segment users by engagement, industry, and size, then test conversion rates across segments to refine targeting.”
3.5.4 We're interested in how user activity affects user purchasing behavior.
Describe your approach to analyzing activity data and correlating it with purchase outcomes.
Example answer: “I’d aggregate user activity metrics, join with purchase data, and use regression analysis to identify key predictors.”
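A first pass usually joins activity aggregates to purchase outcomes before any modeling; a sketch with hypothetical `user_activity` and `user_purchases` tables:

```sql
SELECT a.user_id,
       a.sessions_last_30d,
       COALESCE(p.purchase_count, 0) AS purchase_count  -- no purchases counts as zero
FROM user_activity a
LEFT JOIN user_purchases p ON p.user_id = a.user_id;
```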
3.6.1 Tell me about a time you used data to make a decision that impacted business outcomes.
How to answer: Describe the context, the data you analyzed, the recommendation you made, and the measurable result. Focus on your analytical reasoning and communication with stakeholders.
Example answer: “I analyzed churn patterns and recommended a targeted retention campaign, resulting in a 15% reduction in churn over three months.”
3.6.2 Describe a challenging data project and how you handled it.
How to answer: Outline the main obstacles, your approach to overcoming them, and the final outcome. Emphasize collaboration and adaptability.
Example answer: “I managed a migration from legacy systems, resolved schema mismatches, and coordinated with engineering to ensure data consistency.”
3.6.3 How do you handle unclear requirements or ambiguity in a data engineering project?
How to answer: Share your process for clarifying goals, iterating on solutions, and communicating with stakeholders.
Example answer: “I schedule stakeholder interviews, propose prototypes, and document assumptions to ensure alignment before implementation.”
3.6.4 Tell me about a time when your colleagues didn’t agree with your approach. What did you do to bring them into the conversation and address their concerns?
How to answer: Highlight your communication skills, openness to feedback, and willingness to adapt or justify your methods.
Example answer: “I presented my rationale, invited feedback, and collaboratively adjusted the pipeline design to address team concerns.”
3.6.5 Give an example of when you resolved a conflict with someone on the job—especially someone you didn’t particularly get along with.
How to answer: Focus on professionalism, empathy, and finding common ground.
Example answer: “I listened to their perspective, clarified misunderstandings, and we agreed on a compromise for the ETL schedule.”
3.6.6 Talk about a time when you had trouble communicating with stakeholders. How were you able to overcome it?
How to answer: Discuss your efforts to simplify technical concepts and tailor your message to the audience.
Example answer: “I used visualizations and analogies to explain pipeline delays, which helped the product team understand the root cause.”
3.6.7 Describe a time you had to negotiate scope creep when two departments kept adding ‘just one more’ request. How did you keep the project on track?
How to answer: Explain your prioritization framework, communication loop, and how you maintained project boundaries.
Example answer: “I quantified the impact of new requests, used MoSCoW prioritization, and documented changes for leadership sign-off.”
3.6.8 When leadership demanded a quicker deadline than you felt was realistic, what steps did you take to reset expectations while still showing progress?
How to answer: Outline how you assessed the timeline, communicated risks, and provided interim deliverables.
Example answer: “I presented a phased delivery plan and highlighted trade-offs, which helped reset expectations and maintain trust.”
3.6.9 Tell me about a situation where you had to influence stakeholders without formal authority to adopt a data-driven recommendation.
How to answer: Share your strategy for building consensus and demonstrating the value of your proposal.
Example answer: “I used pilot results and clear metrics to show the benefit, which persuaded leadership to adopt my pipeline changes.”
3.6.10 Describe a situation where two source systems reported different values for the same metric. How did you decide which one to trust?
How to answer: Discuss your approach to data validation, cross-checking with upstream sources, and documenting your decision process.
Example answer: “I traced both data sources back to origin, validated with external benchmarks, and documented the choice for transparency.”
Get to know Hover Inc.’s core business by studying how their platform transforms smartphone photos into 3D models for industries like insurance and construction. Understand the data challenges in converting unstructured images into structured, actionable property insights. Familiarize yourself with the company’s emphasis on accuracy, scalability, and reliability in data-driven workflows, as these values drive both product and engineering decisions.
Research recent advancements in computer vision, 3D modeling, and property assessment technologies. Be prepared to discuss how these innovations impact data engineering, especially in terms of integrating diverse data sources and supporting real-time analytics for field professionals.
Demonstrate your understanding of cross-functional collaboration at Hover Inc. Highlight your experience working with product managers, data scientists, and business stakeholders to deliver data solutions that empower decision-making. Show that you can translate complex technical concepts into clear, actionable insights for non-technical audiences.
Showcase your expertise in designing and building scalable ETL pipelines. Prepare to discuss how you have architected solutions that ingest, clean, and transform large volumes of heterogeneous data—especially from sources like images, sensor data, or third-party APIs. Emphasize your approach to schema evolution, partitioning strategies, and modular pipeline design to ensure flexibility and maintainability.
Demonstrate your proficiency in SQL and Python by preparing to solve real-world data manipulation challenges. Practice writing queries that handle billions of rows efficiently, such as aggregating clickstream data, updating records in bulk, or ranking and filtering large datasets. Highlight your experience with performance optimization, indexing, and parallel processing to minimize downtime and resource usage.
Be ready to discuss your approach to data quality and validation. Share concrete examples of how you have profiled datasets, identified root causes of inconsistencies, and implemented robust validation checks within complex ETL workflows. Explain your strategies for monitoring data integrity and automating anomaly detection to ensure reliable analytics and reporting.
Highlight your experience with data modeling and warehousing, especially for analytics applications. Prepare to walk through your process for designing schemas—such as star or snowflake models—that support fast, flexible querying. Discuss how you balance normalization for storage efficiency with denormalization for query performance, and how you support both batch and real-time reporting needs.
Show your ability to communicate technical solutions clearly. Practice explaining your data engineering decisions, such as trade-offs in storage formats or pipeline architectures, to both technical peers and business stakeholders. Use examples from past projects to illustrate how you made data accessible and actionable, and how you adapted your communication style for different audiences.
Prepare for system design scenarios by thinking through how you would build robust, fault-tolerant data infrastructure to support Hover Inc.’s growth. Be ready to whiteboard solutions for integrating new data sources, scaling to support increased data volume, or building a data warehouse for a new product. Emphasize considerations like modularity, monitoring, and future scalability.
Finally, reflect on your ability to work through ambiguity and shifting requirements. Be prepared to share stories where you clarified project goals, iterated on solutions, and kept stakeholders aligned. Show that you are adaptable, proactive, and committed to delivering high-impact data solutions—even when faced with complex or evolving business needs.
5.1 How hard is the Hover Inc. Data Engineer interview?
The Hover Inc. Data Engineer interview is challenging and thorough, focusing on technical expertise in scalable data infrastructure, ETL pipeline design, and advanced SQL and Python skills. Candidates are expected to demonstrate practical experience in handling large, messy datasets and optimizing workflows for real-world business needs. The interview also evaluates your ability to communicate complex technical solutions to both technical and non-technical audiences.
5.2 How many interview rounds does Hover Inc. have for Data Engineer?
Typically, there are 5-6 rounds in the Hover Inc. Data Engineer interview process. These include an initial recruiter screen, one or two technical interviews, a behavioral interview, a final onsite or virtual round with multiple team members, and an offer/negotiation stage. Each round is designed to assess different aspects of your technical and collaborative abilities.
5.3 Does Hover Inc. ask for take-home assignments for Data Engineer?
Take-home assignments are occasionally part of the Hover Inc. Data Engineer process, usually focused on real-world data pipeline design, data cleaning, or SQL/Python challenges. If assigned, expect a 2-3 day window to complete a practical task that reflects the types of problems you’ll solve on the job.
5.4 What skills are required for the Hover Inc. Data Engineer?
Key skills include expertise in building scalable ETL pipelines, advanced SQL and Python proficiency, experience with data modeling and warehousing, and a solid grasp of data quality and validation techniques. Strong communication skills are essential, as you’ll work closely with cross-functional teams and present technical insights to diverse stakeholders. Familiarity with cloud data platforms and real-time analytics is a plus.
5.5 How long does the Hover Inc. Data Engineer hiring process take?
The typical timeline is 3-4 weeks from application to offer. Fast-track candidates may complete the process in as little as 2 weeks, while standard pacing involves about a week between each stage. Scheduling can vary based on interviewer availability and the complexity of assignments.
5.6 What types of questions are asked in the Hover Inc. Data Engineer interview?
Expect a mix of technical and behavioral questions, including designing scalable ETL pipelines, optimizing schema for large datasets, SQL and Python coding challenges, system design scenarios, and data quality case studies. Behavioral questions focus on collaboration, navigating ambiguity, and communicating technical solutions to non-technical audiences.
5.7 Does Hover Inc. give feedback after the Data Engineer interview?
Hover Inc. generally provides high-level feedback through recruiters. While detailed technical feedback may be limited, you’ll usually receive insights into your strengths and areas for improvement, especially if you progress to later rounds.
5.8 What is the acceptance rate for Hover Inc. Data Engineer applicants?
The Data Engineer role at Hover Inc. is competitive, with an estimated acceptance rate of 3-5% for qualified applicants. Candidates with strong experience in scalable data infrastructure and a proven track record in data engineering stand out.
5.9 Does Hover Inc. hire remote Data Engineer positions?
Yes, Hover Inc. offers remote positions for Data Engineers, with some roles requiring occasional office visits for team collaboration. The company values flexibility and supports distributed teams, especially for technical roles focused on data infrastructure and analytics.
Ready to ace your Hover Inc. Data Engineer interview? It’s not just about knowing the technical skills—you need to think like a Hover Inc. Data Engineer, solve problems under pressure, and connect your expertise to real business impact. That’s where Interview Query comes in with company-specific learning paths, mock interviews, and curated question banks tailored toward roles at Hover Inc. and similar companies.
With resources like the Hover Inc. Data Engineer Interview Guide, the Data Engineer Interview Guide, and our latest case study practice sets, you’ll get access to real interview questions, detailed walkthroughs, and coaching support designed to boost both your technical skills and domain intuition.
Take the next step—explore more case study questions, try mock interviews, and browse targeted prep materials on Interview Query. Bookmark this guide or share it with peers prepping for similar roles. It could be the difference between just applying and landing the offer. You’ve got this!