Harman International is a global leader in connected technologies for automotive, consumer, and enterprise markets, dedicated to delivering innovative solutions that elevate the user experience.
As a Data Scientist at Harman International, you will leverage statistical analysis, machine learning, and data modeling to extract insights from large volumes of data. Key responsibilities include developing predictive models, conducting statistical analyses, and collaborating with cross-functional teams to drive data-driven decision-making. Strong proficiency in statistics, algorithms, and Python programming is essential for building accurate models and performing the complex analyses relevant to Harman's products and services. Experience with SQL for data extraction and manipulation, along with a solid understanding of machine learning principles, will further strengthen your fit for the role. Exceptional problem-solving skills and the ability to communicate complex concepts clearly to non-technical stakeholders are also vital.
This guide aims to equip you with the insights necessary to excel in your interview process at Harman International, focusing on the skills and knowledge areas that are most relevant to the Data Scientist role.
The interview process for a Data Scientist role at Harman International is structured and typically consists of multiple rounds, focusing on both technical and interpersonal skills.
The process begins with an initial screening, which is often conducted via a phone call with a recruiter. This conversation is designed to assess your background, skills, and fit for the company culture. Expect to discuss your resume in detail, including your previous experiences and projects relevant to data science.
Following the initial screening, candidates usually undergo a technical assessment. This may include an online coding test that evaluates your proficiency in programming languages such as Python, along with your grasp of algorithms and data structures. The assessment often features questions on statistics, probability, and machine learning concepts, reflecting the skills the role requires.
Candidates who pass the technical assessment typically move on to one or two rounds of technical interviews. These interviews are conducted by experienced data scientists or technical managers and focus on your problem-solving abilities, coding skills, and understanding of data science principles. You may be asked to solve coding problems on the spot, discuss your previous projects in detail, and explain the statistical methods and algorithms you have used.
After the technical interviews, there is usually a managerial round. This round assesses your soft skills, teamwork, and how you handle real-world challenges. Expect questions about your approach to project management, collaboration with cross-functional teams, and how you prioritize tasks. This round may also include discussions about your career goals and how they align with the company's objectives.
The final step in the interview process is typically an HR interview. This round focuses on discussing salary expectations, benefits, and other logistical details. The HR representative will also gauge your overall fit within the company culture and may ask behavioral questions to understand how you handle various workplace situations.
As you prepare for your interview, be ready to tackle a variety of questions that reflect the skills and experiences relevant to the Data Scientist role at Harman International.
Here are some tips to help you excel in your interview.
The interview process at Harman typically consists of multiple rounds, including a technical assessment, a managerial round, and an HR discussion. Familiarize yourself with this structure so you can prepare accordingly. Expect a mix of coding challenges, technical questions related to your projects, and discussions about your soft skills and cultural fit. Knowing what to expect will help you manage your time and energy effectively during the interview.
Given the emphasis on statistics, algorithms, and Python, ensure you have a solid grasp of these areas. Brush up on statistical concepts relevant to machine learning, such as regression assumptions and metrics like recall and precision. Additionally, practice coding problems that involve data structures and algorithms, as these are commonly tested. Be ready to explain your thought process clearly, as interviewers appreciate candidates who can articulate their problem-solving approach.
During the interview, be prepared for questions that are specifically tailored to your experience and the job description. Interviewers often focus on how your past projects align with the role you are applying for. Highlight relevant experiences and be ready to discuss the challenges you faced and how you overcame them. This will demonstrate your ability to apply your skills in real-world scenarios.
Effective communication is key during the interview process. Be clear and concise in your responses, and don’t hesitate to ask for clarification if you don’t understand a question. The interviewers at Harman are described as calm and open to dialogue, so take advantage of this by engaging in a two-way conversation. This will not only help you convey your thoughts better but also show your interpersonal skills.
Expect questions that assess your problem-solving abilities, particularly in real-time scenarios. Be prepared to discuss how you approach complex problems and the methodologies you use to find solutions. This could involve discussing specific algorithms or statistical methods you have employed in your previous work. Demonstrating a structured approach to problem-solving will resonate well with the interviewers.
In addition to technical skills, be prepared for behavioral questions that assess your soft skills and cultural fit. Reflect on your past experiences and be ready to discuss how you handle challenges, work in teams, and manage conflicts. The HR round will likely focus on these aspects, so think of examples that showcase your adaptability and teamwork.
Finally, maintain a calm and confident demeanor throughout the interview. While some candidates have reported mixed experiences with interviewers, remember that you are also assessing whether Harman is the right fit for you. Approach the interview as a conversation rather than an interrogation, and let your passion for the role and the company shine through.
By following these tips, you will be well-prepared to navigate the interview process at Harman International and make a strong impression as a candidate for the Data Scientist role. Good luck!
In this section, we’ll review the various interview questions that might be asked during a Data Scientist interview at Harman International. The interview process will likely focus on a combination of statistical analysis, machine learning concepts, programming skills, and problem-solving abilities. Candidates should be prepared to discuss their past projects and experiences in detail, as well as demonstrate their technical knowledge through coding and analytical questions.
Understanding the assumptions behind linear regression is crucial for any data scientist, as it impacts the validity of the model's predictions.
Discuss the key assumptions such as linearity, independence, homoscedasticity, and normality of residuals. Be prepared to explain how violating these assumptions can affect the model's performance.
"The assumptions of linear regression include linearity, which means the relationship between the independent and dependent variables should be linear. Independence of errors is also crucial, as correlated errors can lead to biased estimates. Homoscedasticity ensures that the variance of errors is constant across all levels of the independent variable, and normality of residuals is important for hypothesis testing."
This question tests your understanding of hypothesis testing and the implications of making errors in statistical decisions.
Define both types of errors clearly and provide examples of each to illustrate your understanding.
"A Type I error occurs when we reject a true null hypothesis, essentially a false positive. For instance, concluding that a new drug is effective when it is not. A Type II error, on the other hand, happens when we fail to reject a false null hypothesis, which is a false negative, like concluding that a drug is ineffective when it actually is."
Handling missing data is a common challenge in data science, and interviewers want to know your strategies for dealing with it.
Discuss various techniques such as imputation, deletion, or using algorithms that support missing values, and explain when you would use each method.
"I typically handle missing data by first assessing the extent and pattern of the missingness. If the missing data is minimal, I might use mean or median imputation. For larger gaps, I may consider using predictive models to estimate missing values or even drop the rows if they are not critical to the analysis."
Understanding p-values is essential for interpreting statistical tests and making data-driven decisions.
Define p-value and explain its significance in hypothesis testing, including what it indicates about the strength of evidence against the null hypothesis.
"The p-value measures the probability of observing the data, or something more extreme, assuming the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis. Typically, a p-value less than 0.05 is considered statistically significant."
Overfitting is a common issue in machine learning models, and interviewers want to see your understanding of model performance.
Explain what overfitting is, why it occurs, and the techniques you can use to prevent it.
"Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, resulting in poor performance on unseen data. To prevent overfitting, I use techniques such as cross-validation, regularization, and pruning in decision trees."
This question assesses your foundational knowledge of machine learning paradigms.
Clearly define both types of learning and provide examples of algorithms used in each.
"Supervised learning involves training a model on labeled data, where the outcome is known, such as regression and classification tasks. In contrast, unsupervised learning deals with unlabeled data, aiming to find hidden patterns or groupings, like clustering and association algorithms."
Understanding model evaluation is critical for assessing performance and making improvements.
Discuss various metrics such as accuracy, precision, recall, F1 score, and ROC-AUC, and explain when to use each.
"I would evaluate a classification model using accuracy for a general overview, but I would also consider precision and recall, especially in imbalanced datasets. The F1 score provides a balance between precision and recall, while ROC-AUC gives insight into the model's performance across different thresholds."
Feature selection is vital for improving model performance and interpretability.
Discuss techniques such as correlation analysis, recursive feature elimination, and using algorithms that provide feature importance scores.
"I select important features by first conducting correlation analysis to identify relationships between features and the target variable. I also use recursive feature elimination and models like Random Forest that provide feature importance scores to refine my feature set."
This question tests your knowledge of Python data structures, which is essential for a data scientist.
Define both data structures and highlight their key differences, including mutability and performance.
"A list in Python is mutable, meaning it can be changed after creation, while a tuple is immutable and cannot be altered. This makes tuples generally faster and more memory-efficient than lists, which is beneficial when you need a constant set of values."
SQL joins are fundamental for data manipulation and retrieval, and understanding them is crucial for a data scientist.
Describe the different types of joins and their purposes in combining data from multiple tables.
"A join in SQL is used to combine rows from two or more tables based on a related column. The main types of joins are INNER JOIN, which returns only matching rows; LEFT JOIN, which returns all rows from the left table and matched rows from the right; and RIGHT JOIN, which does the opposite."
Optimizing SQL queries is essential for efficient data retrieval, and interviewers want to know your strategies.
Discuss techniques such as indexing, query restructuring, and analyzing execution plans.
"I would optimize a slow SQL query by first checking the execution plan to identify bottlenecks. Adding appropriate indexes can significantly speed up data retrieval. Additionally, restructuring the query to reduce complexity and avoid unnecessary calculations can also help improve performance."
Familiarity with data manipulation libraries is crucial for data analysis tasks.
Explain the functionalities of both libraries and how they facilitate data analysis.
"Pandas is used for data manipulation and analysis, providing data structures like DataFrames that make it easy to handle structured data. NumPy, on the other hand, is primarily used for numerical computations and provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays."