Amazon data engineers play an integral role in the company’s data science operations. They are responsible for wrangling massive datasets and building scalable data solutions that drive real impact at the company.
In other words, data engineers at Amazon are responsible for driving improvements in data systems for the benefit of the business, customers and data science teams.
The data engineering interview process is rigorous and technically demanding, with questions covering SQL, Python, algorithms and database design. In addition to the technical rounds, Amazon data engineers must pass a human resources screening and a behavioral interview, which focus on the Amazon Leadership Principles (discussed further below).
Ultimately, Amazon data engineer interviews include three phases: a recruiter screen, a technical screen and an on-site, which itself includes two to four technical or behavioral rounds.
Data engineers at Amazon are responsible for designing, developing and maintaining the core data structures, data models and data pipelines at Amazon. Amazon data engineers work across a variety of verticals, including:
Data engineers at Amazon tackle complex, large-scale data engineering challenges, with many different streams of data to incorporate. Although the responsibilities vary by vertical, data engineers at Amazon are responsible for:
Amazon data engineer interviews are typically broken into three stages: An initial recruiter screen, a technical screen, and an onsite round. In the technical and onsite rounds, candidates will be asked questions focusing on core data engineering skills like SQL, data modeling, database design, and data warehousing.
Here is a more in-depth look at the Amazon data engineer interview process:
Stage 1: Recruiter Screen
Amazon data engineering interviews start with a phone screen. A recruiter will call you to assess your technical skills, ask about your experience and determine if you are the right fit for the role. At this stage, the recruiter wants to understand how proficient you are in programming, typically in Python and SQL. They also want to understand how your experience aligns with Amazon’s culture.
Tip: Be prepared to talk about your work experience, and map your skills and experiences to Amazon’s core values.
Stage 2: Technical Screen
This interview is typically a 45-minute screen with an Amazon data engineer. This stage of the interview process assesses your technical skills and knowledge. Topics covered in the tech screen include:
Commonly, candidates will face a range of SQL questions, covering basic to intermediate SQL concepts like joins, subqueries, window functions and case statements. You may also face a simple data engineering case study question. In this stage, you will be assessed on how efficiently you can write code, as well as your comfort with programming languages, tech solutions and concepts used in the job.
Tip: Although speed is assessed, thoroughness and problem solving are more important to the interviewer. Ask clarifying questions before you jump into a coding solution. Also, think out loud to help the interviewer understand your process. In the real world you won’t be as constrained by time as in the interview setting.
Stage 3: Onsite Interview
The Amazon onsite includes a variety of challenging technical data engineer interview questions, including scenario-based case studies, coding tests and a “Bar Raiser” behavioral interview.
This stage is usually split into three areas:
The technical interview phase typically includes two to four rounds that seek to assess your problem-solving and coding ability. You may face a data engineering case study, as well as SQL, Python and database design rounds. In coding and case study interview rounds you will be asked to propose a solution to a case study question or to write out (or whiteboard) Python or SQL code.
Technical interviews may also ask you to investigate a pipeline model, design a data model or determine what is causing an ETL error.
Bar Raiser interviews are unique to Amazon. In this stage, an Amazon employee from a different department will ask you questions and determine if you are the right fit for Amazon. These interviews focus on Amazon’s 16 Leadership Principles (be sure to read through these and commit them to your vocabulary). Prepare answers to data engineer behavioral questions that map your skills and experiences to these 16 principles.
Finally, the HR round focuses on your previous work experiences and projects you have worked on in the past. Keep the focus on yourself and your data engineering skills in this stage.
Hiring requirements for Amazon data engineering jobs
The qualifications and required skills vary by department and job role. But in general, most data engineering jobs at Amazon require:
In data engineering interviews at Amazon, the most frequently tested subjects include SQL (asked in 99% of interviews), Python, behavioral questions and database design/data modeling. Use these practice data engineer interview questions to prepare for the Amazon interview.
Remember to incorporate Amazon Leadership Principles into your answers to behavioral questions. Behavioral questions are asked in the recruiter screen, the Bar Raiser Interview and HR Interviews.
Culture fit is assessed in Amazon behavioral interviews. This question helps the interviewer understand how you communicate, gather information and collaborate with a team. With a question like this, you might incorporate any of the three principles of Disagree and Commit, Earn Trust or Learn and Be Curious.
Using a framework can help you answer a question like this. Describe the situation and task you were faced with, describe the actions and finish with the results you achieved.
For example, you might say that:
“In my previous job, I was asked to build an ETL pipeline for streaming data that would gather customer transaction data to be used by the sales team. I started by gathering stakeholder input to learn about the specific needs of the sales team, and then researched options. A challenge arose during the testing phase, as there was significant pipeline lag. I had to review all of the code and optimize it. This turned out to be a great learning experience, as I learned to recognize common code inefficiencies that I had been introducing and can now avoid them.”
A question like this assesses your communication and collaboration skills. You might say:
“I was asked to design a marketing analytics database. However, the marketing department didn’t have extensive analytics or database knowledge. I created a short presentation, helping the team visualize the database schema and held a Q&A session after. The presentation made the schema easy to understand and helped the team better query the data.”
Expect a variation of this question. Some options for answering include:
Start with the initial stages like gathering stakeholder input, understanding data requirements and creating process or logical data models.
SQL questions for data engineers typically include basic definitions, scenario-based questions and SQL query writing tests.
Start with clarifying questions. Specifically, you should ask:
This should arm you with enough information to answer this question confidently. For example, you might suggest using DISTINCT or GROUP BY to de-duplicate the data in a query, or a UNIQUE constraint to stop duplicates from being inserted in the first place.
For this question we have a table representing a company payroll schema. Due to an ETL error, the job that should have updated each employee’s salary during the yearly compensation adjustments inserted a new row instead, so the employees table now holds several rows per employee. The head of HR still needs the current salary of each employee.
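As a rough illustration of the de-duplication options mentioned here, the sketch below runs both DISTINCT and GROUP BY against an in-memory SQLite database. The events table, its columns and the sample rows are hypothetical and only serve the example.

```python
import sqlite3


def dedupe_examples():
    """Show two ways to collapse duplicate rows in a hypothetical events table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table and sample data, used only for illustration.
    conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "click"), (1, "click"), (2, "view"), (2, "view"), (3, "click")],
    )

    # DISTINCT keeps one copy of each identical row.
    distinct_rows = conn.execute(
        "SELECT DISTINCT user_id, action FROM events ORDER BY user_id"
    ).fetchall()

    # GROUP BY does the same and also lets us count the copies of each row.
    grouped_rows = conn.execute(
        "SELECT user_id, action, COUNT(*) AS copies "
        "FROM events GROUP BY user_id, action ORDER BY user_id"
    ).fetchall()

    conn.close()
    return distinct_rows, grouped_rows
```

GROUP BY is often the more useful answer in an interview setting, because the count column shows how widespread the duplication is before anything is deleted.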
Hint. The first step we need to do would be to remove duplicates and retain the current salary for each user.
Given we know there are no duplicate first and last name combinations, we can remove duplicates from the employees table by running a GROUP BY on two fields, the first and last name. This gives us exactly one row for each unique first and last name combination.
Hint. Whenever the question asks about finding values with zero of something (users, employees, posts, etc.) immediately think of the concept of LEFT JOIN. An inner join finds any values that are in both tables, and a left join keeps all the values in the left table, filling the right table’s columns with NULL where there is no match.
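One possible way to finish the query is sketched below. It assumes, beyond what the question states, that the employees table has an auto-incrementing id column, so the row with the highest id for each name is the most recent insert. The column names and the helper current_salaries are hypothetical.

```python
import sqlite3


def current_salaries(rows):
    """Return the latest salary per employee from (id, first, last, salary) rows."""
    conn = sqlite3.connect(":memory:")
    # Assumed schema: id auto-increments, so a larger id means a later insert.
    conn.execute(
        "CREATE TABLE employees ("
        "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, salary INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employees (id, first_name, last_name, salary) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )

    query = """
        SELECT e.first_name, e.last_name, e.salary
        FROM employees AS e
        JOIN (
            -- One row per employee, carrying the id of the newest record.
            SELECT first_name, last_name, MAX(id) AS latest_id
            FROM employees
            GROUP BY first_name, last_name
        ) AS latest ON e.id = latest.latest_id
        ORDER BY e.last_name, e.first_name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The subquery performs the GROUP BY on first and last name described above, and the join back to the table retrieves the salary stored on each employee's newest row.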
Our predicament is to find all the neighborhoods without users. To do this, we must do a left join between the neighborhoods table and the users table.
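A minimal sketch of that left join, run through SQLite, might look like the following. The column names (id, name, neighborhood_id) are assumptions, since the question's schema is not shown here.

```python
import sqlite3


def empty_neighborhoods(neighborhoods, users):
    """Return the names of neighborhoods that have no users."""
    conn = sqlite3.connect(":memory:")
    # Assumed schemas; the real tables may use different column names.
    conn.execute("CREATE TABLE neighborhoods (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, name TEXT, neighborhood_id INTEGER)"
    )
    conn.executemany("INSERT INTO neighborhoods VALUES (?, ?)", neighborhoods)
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", users)

    query = """
        SELECT n.name
        FROM neighborhoods AS n
        LEFT JOIN users AS u ON u.neighborhood_id = n.id
        -- Unmatched neighborhoods have NULL in every users column.
        WHERE u.id IS NULL
        ORDER BY n.name
    """
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names
```

The WHERE u.id IS NULL filter is what turns the left join into a search for missing matches: every neighborhood survives the join, and only those with no matching user carry NULL in the users columns.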
The first step is to determine what the question is asking. We can break it down into separate conditions:
What comes next to fully solve the above question?
Data engineer Python questions can range from definitions of data structures to writing Python code.
We know we have to store a unique set of characters of the input string and loop through the string to check which ones occur twice.
Given that we have to return the first character that repeats, we should be able to go through the string in one loop, save each unique character and then just check if the character exists in that saved set. If it does, then return the character.
def recurring_char(input_string):
    # Characters seen so far; set membership checks are O(1) on average.
    seen = set()
    for char in input_string:
        # The first character already in the set is the first recurring one.
        if char in seen:
            return char
        seen.add(char)
    # No character repeats.
    return None
Python includes a variety of built-in data types, including numeric types (int, float and complex), strings, booleans, lists, tuples, sets and dictionaries.
There are also user-defined data types in Python. Examples include queues, trees and linked lists.
This is a scripting question that asks you to process unstructured data. Specifically, we are being asked to do a few different tasks:
Hint. There are two ways we can solve this problem. One way is through logical iteration and another way is through mathematical formulation. We can look at both methods, as both run in O(n) time.
The first method would be through general iteration through the array. We can pass in the array and create a set which will hold each value in the input array. Then we create a loop that will span the range from 0 to n, and look to see if each number is in the set we just created. If it is not, we return the missing number.
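Both approaches can be sketched as follows, assuming the input holds n distinct integers drawn from the range 0 to n with exactly one value missing. The function names are hypothetical.

```python
def missing_number_iterative(nums):
    """Find the missing value by checking each candidate against a set."""
    seen = set(nums)
    # With n values present, the full range is 0..n inclusive.
    for candidate in range(len(nums) + 1):
        if candidate not in seen:
            return candidate
    return None  # Only reached if the input breaks the stated assumption.


def missing_number_formula(nums):
    """Find the missing value from the closed-form sum of 0..n."""
    n = len(nums)
    expected_total = n * (n + 1) // 2  # Sum of 0 + 1 + ... + n.
    return expected_total - sum(nums)
```

The iterative version uses O(n) extra memory for the set, while the formula version needs only constant extra space, which is a trade-off worth mentioning to the interviewer.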
Database design questions asked in Amazon interviews typically provide you with a case and ask you to create a schema for that case.
Hint. First, we want to clarify what click data means. You could safely assume that it represents button clicks, scrolls, closing pop-ups, etc. One solution: You could assign each action a label that describes the action.
For example, here we can say that the product is Dropbox and that we want to track each folder click on the UI of an individual person’s Dropbox folder. We can label the clicking on a folder as an action name called folder_click. When the user clicks on the side panel to login or logout and we need to specify the action, we can call it login_click and logout_click.
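One way to turn this labeling idea into a schema is sketched below. The click_events table, its columns and the count_actions helper are all hypothetical; a real design would depend on the answers to your clarifying questions.

```python
import sqlite3

# Hypothetical schema: one row per click, labeled with an action name.
SCHEMA = """
CREATE TABLE click_events (
    event_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    action_name TEXT    NOT NULL,  -- e.g. folder_click, login_click
    target_id   INTEGER,           -- e.g. the folder clicked; NULL if none
    created_at  TEXT    NOT NULL   -- ISO 8601 timestamp
);
"""


def count_actions(events):
    """Load (user_id, action_name, target_id, created_at) rows and count by action."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO click_events (user_id, action_name, target_id, created_at) "
        "VALUES (?, ?, ?, ?)",
        events,
    )
    rows = conn.execute(
        "SELECT action_name, COUNT(*) FROM click_events "
        "GROUP BY action_name ORDER BY action_name"
    ).fetchall()
    conn.close()
    return dict(rows)
```

Keeping the action label as a plain column means new click types can be tracked without a schema change, at the cost of relying on consistent naming.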
This question is vague, and you would probably want to get some clarity before answering. You can see a full mock interview solution to this question on YouTube:
Algorithm questions are asked in Amazon data engineer interviews. However, the focus is primarily on basic algorithmic knowledge, data structures, and easy Python coding tests.
Hint. Start by thinking about what number you are trying to find. It is the sum of the entire list divided by 2. How could you create a function to add up all the values?
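The exact wording of the question is not shown here, so the sketch below rests on an assumption: that the task is to find the index at which the running total of the list reaches half of the full sum, so that the values up to and including that index add up to the same amount as the values after it. The function name equivalent_index and the -1 return value for no match are likewise hypothetical.

```python
def equivalent_index(nums):
    """Return the first index where the running sum equals half the total, else -1."""
    total = sum(nums)
    running = 0
    for index, value in enumerate(nums):
        running += value
        # Compare doubled running sum to avoid fractional halves.
        if running * 2 == total:
            return index
    return -1
```

This makes one pass to compute the total and one pass to scan, so it runs in O(n) time with constant extra space.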
Hint. This is most similar to the types of algorithm questions you might face. You might start with noting that there is a linear relationship between the features and the response variable, which is the value you want to predict.
Multiple linear regression uses more than one independent variable to predict the dependent variable. One assumption we can make with this technique is that the independent variables are also independent from one another, or that the values do not affect one another.