Booz Allen Hamilton is a leading management and technology consulting firm, known for its innovative solutions for government and commercial clients. Because Booz Allen Hamilton is a consulting firm, data scientists work with both client and company data, often working in with diverse datasets to create models or business insights. Typically, the data team (engineering, data science) works with creating dashboards and constructing architectures for handling data pipelines.
Given two strings,
string2, write a function
str_map to determine if there exists a one-to-one correspondence (bijection) between the characters of
string2. The correspondence must be between characters in the same position/index.
Write a function to simulate drawing balls from a jar. The colors of the balls are stored in a list named
jar, with corresponding counts of the balls stored in the same index in a list called
Given a list of stop words, write a function
stopwords_stripped that takes a string and returns a string stripped of the stop words with all lowercase characters.
A team wants to A/B test various changes in a sign-up funnel. For instance, on a page, a button is red and at the top. They want to see if changing the button’s color to blue and/or moving it to the bottom will increase click-through rates. How would you set up this test?
You work for an E-commerce store where the new-user-to-customer conversion rate increased from 40% to 43% after a new marketing manager redesigned the email journey. However, the conversion rate was 45% a few months before the new manager started and had dropped to 40%. How would you investigate if the redesigned email campaign actually led to the increase in the conversion rate and that the increase wasn’t instead the result of other factors?
You work at Uber, and a PM is considering a new feature where instead of a direct ETA estimate of 5 minutes, it would instead display a range of something like 3-7 minutes. How would you conduct this experiment, and how would you know if your results were significant?
Write SQL queries to answer the following questions:
You have two tables, ‘projects’ and ‘employee_projects,’ but there’s a bug causing duplicate rows in the ‘employee_projects’ table. Write a query to account for this error and select the top five most expensive projects by budget-to-employee count ratio.
You have two tables, ‘projects’ and ‘employee_projects’. Each employee works on only one project. Write a query to get the top five most expensive projects by budget to employee count ratio, excluding projects with 0 employees.
Assume you are designing a marketplace for a website. Selling firearms is prohibited by the website’s Terms of Service Agreement, and by the laws of your country. You need to develop a system that can automatically detect if a listing on the marketplace is selling a gun. Describe how you would accomplish this task.
In the field of machine learning and data science, Logistic and Linear Regression are two common methods used for prediction. Explain the differences between these two techniques and describe the specific circumstances under which each method would be most appropriate to use.
Principal Component Analysis (PCA) and K-means clustering are two widely used methods in data science. Please explain their relationship, considering their characteristics, methodologies, and common applications.
1 - What is the probability of a biased coin landing as heads exactly 5 times out of 6 tosses? Given a biased coin that comes up heads 30% of the time when tossed, calculate the probability of the coin landing as heads exactly 5 times out of 6 tosses.
2 - What is the probability that it’s actually raining in Seattle given the responses from three friends? You are about to travel to Seattle and want to know if you should bring an umbrella. You call 3 random friends who live there, each with a 2⁄3 chance of telling the truth and a 1⁄3 chance of lying. All 3 friends tell you it is raining. Calculate the probability that it’s actually raining in Seattle.
3 - What’s the probability of each subsequent card being larger than the previous drawn card from a shuffled deck of 500 cards? Imagine a deck of 500 cards numbered from 1 to 500. If all the cards are shuffled randomly, and you are asked to pick three cards, one at a time, calculate the probability of each subsequent card being larger than the previously drawn card.
For mastering Probability & Statistics, consider the statistics and A/B testing learning path and the probability learning path. These resources will help you understand and apply statistical concepts effectively.
Practice for the Booz Allen Hamilton interview with these recently asked interview questions.
Sign up to get your personalized learning path.
Access 600+ data science interview questions
1600+ top companies interview guide
Unlimited code runs and submissions