Data science is an attractive field. It’s lucrative, you get opportunities to work on interesting projects, and you’re always learning new things. Hence, breaking into the world of data science is extremely competitive. One of the best ways to start your data science career is through a data science internship.
In this article, we’ll look at the general level of knowledge that’s required, the components of a typical interview process, and some example interview questions. Note that the term ‘general’ is emphasized because the specifics differ company by company.
What's expected in a data science internship interview?
The biggest difference between a data science internship interview and a full-time data scientist interview is that you typically won’t be expected to know extremely specific details about machine learning or deep learning concepts.
However, you will be expected to have the fundamentals that everything else builds on: Python or R, SQL, statistics and probability basics, and basic machine learning concepts.
Learn more about the responsibilities of a data science intern and how their career path progresses here.
Below is a list of essential knowledge and skills that will make you an attractive candidate:
Python or R
You should have programming experience in a scripting language, ideally Python or R. If you’re a Python programmer, you should also have a basic understanding of popular libraries like Scikit-learn and Pandas.
What you should know: You should know how to write basic functions and understand the common data structures and their uses. You should also know Scikit-learn’s basic (yet essential) utilities, like train_test_split and StandardScaler. For Pandas, you should be comfortable manipulating DataFrames much as you would write a query in SQL.
For example, you may be required to build a simple machine learning model to predict the quantity sold for a product. In this case, if you’re a Python user, it would be extremely useful to understand the Scikit-Learn library, as it provides a number of prebuilt functions already, like the ones mentioned above.
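As a rough illustration of the workflow described above, the sketch below trains a linear model to predict quantity sold using Scikit-learn's `train_test_split` and `StandardScaler`. The data is synthetic, and the features (price, ad spend) and their coefficients are assumptions made up for this example rather than anything from a real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): quantity sold falls with
# price and rises with ad spend, plus some random noise.
rng = np.random.default_rng(seed=0)
price = rng.uniform(5.0, 50.0, size=200)
ad_spend = rng.uniform(0.0, 1000.0, size=200)
quantity = 500.0 - 6.0 * price + 0.2 * ad_spend + rng.normal(0.0, 10.0, size=200)

X = np.column_stack([price, ad_spend])
y = quantity

# Hold out 20% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# The pipeline fits the scaler on the training rows only, then fits the model.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)  # R^2 on the held-out rows
print(f"Held-out R^2: {r2:.3f}")
```

Wrapping the scaler and the model in one pipeline is a design choice that keeps test-set statistics from leaking into the scaling step.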
How to prepare: Try data science projects on Kaggle or take-home assignments on Interview Query to get an idea of what projects you might need to complete.
To get more familiar with Scikit-learn, build a simple machine learning model with it or walk through a few data science projects that other people have completed.
Lastly, try practicing Python problems on Interview Query to get a sense of what they might ask you.
SQL
You won’t be expected to have much experience with relational databases, but at a minimum you should know how SQL works. If you’re vying for a data scientist internship, then you’ll most likely be working for a company that has an immense amount of data. You’ll be expected to navigate through that data yourself to solve problems.
What you should know: You should be able to write basic queries and you should know how to manipulate data using SQL queries. It’s very common for companies to incorporate SQL in their take-home case studies, so it’s essential that you know SQL well.
Write a SQL query to get the second highest salary from the Employee table. For example, given the Employee table below, the query should return 200 as the second highest salary. If there is no second highest salary, then the query should return
| Id | Salary |
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
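Here is one possible answer to the question above, sketched with Python's built-in `sqlite3` module so it can be run without a database server. The in-memory table mirrors the example rows. The helper name `second_highest_salary` is hypothetical, and returning NULL (`None` in Python) when no second salary exists is an assumption of this sketch.

```python
import sqlite3

# MAX() over the salaries strictly below the top salary gives the second
# highest distinct value. With no such rows, MAX() yields NULL.
QUERY = """
SELECT MAX(Salary) AS SecondHighestSalary
FROM Employee
WHERE Salary < (SELECT MAX(Salary) FROM Employee);
"""

def second_highest_salary(rows):
    """Load (Id, Salary) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Employee (Id INTEGER PRIMARY KEY, Salary INTEGER)")
        conn.executemany("INSERT INTO Employee (Id, Salary) VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()

print(second_highest_salary([(1, 100), (2, 200), (3, 300)]))  # 200
print(second_highest_salary([(1, 100)]))                      # None
```

Sorting the distinct salaries in descending order and skipping the first row with LIMIT and OFFSET is another common approach, though on its own it returns no row at all, rather than NULL, when there is no second salary.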
How to prepare: Try plenty of SQL practice problems and case studies on Interview Query. Additionally, check out our ultimate guide to SQL interview questions.
Statistics & Probability
You should have an understanding of basic statistics and probability. These concepts serve as the base for most machine learning and data science concepts. In addition, many of the interview questions asked for data science positions are related to statistics.
What you should know: You should have a solid understanding of fundamental concepts including but not limited to probability basics, probability distributions, estimation, and hypothesis testing. A very common application of statistics is conditional probability — for example, what is the probability that a customer will purchase product B given that they purchased product C?
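A minimal sketch of the conditional probability in the example above, estimated from purchase records by counting. The customer baskets are made-up data, and `conditional_probability` is a hypothetical helper that applies P(B | C) = P(B and C) / P(C).

```python
from fractions import Fraction

# Made-up purchase history: each set holds the products one customer bought.
baskets = [
    {"A", "C"},
    {"B", "C"},
    {"B", "C"},
    {"B"},
    {"C"},
    {"A", "B"},
]

def conditional_probability(baskets, target, given):
    """Estimate P(target | given) = count(target and given) / count(given)."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        raise ValueError(f"no customer purchased {given!r}")
    with_both = [b for b in with_given if target in b]
    return Fraction(len(with_both), len(with_given))

p_b_given_c = conditional_probability(baskets, "B", "C")
print(p_b_given_c)  # 1/2
```

In this toy data, four of six customers bought product B overall (2/3), but only two of the four who bought product C also bought B (1/2), so knowing about the C purchase changes the estimate.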
How to prepare: If any of these concepts sound foreign to you, there are a number of free resources that you can leverage, like Khan Academy or Georgia Institute of Technology.
Machine Learning Concepts
While you’re not expected to be an expert, you should have a good understanding of fundamental machine learning models and concepts. This is especially the case if the job description says that you’ll be working on building models.
What you should know: This includes but is not limited to concepts like linear regression, support vector machines, and clustering. Ideally, you should have a fundamental understanding of these concepts and understand when it’s appropriate to use various machine learning methods.
For example, you may be required to implement linear regression on a product’s price point to determine the quantity sold. That being said, you won’t be required to productionize or deploy a machine learning model as an intern.
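To make the linear regression example concrete, here is a small sketch that fits quantity sold against price using the closed-form least-squares formulas for a single feature. The price and quantity figures are invented for illustration, and `fit_simple_linear_regression` is a hypothetical helper, not a library function.

```python
from statistics import mean

def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for y = slope*x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Invented data: quantity sold at five price points.
prices = [10, 12, 14, 16, 18]
quantities = [100, 92, 81, 68, 59]

slope, intercept = fit_simple_linear_regression(prices, quantities)
predicted = slope * 15 + intercept  # estimated quantity at a price of 15
print(f"slope={slope:.2f}, intercept={intercept:.2f}, predicted at 15: {predicted:.1f}")
```

The negative slope reads as "each one-unit price increase is associated with roughly that many fewer units sold" in this toy data, which is the kind of interpretation an interviewer is likely to ask for.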
Check out this article to learn more about machine learning questions asked by a big tech company such as Amazon!
Domain Knowledge
You should have domain knowledge of the field that you are applying to (and if you don’t have it, you should learn it).
For example, if you are applying for a data science position in the marketing department, it would be a good idea to learn about the different marketing channels (e.g. social media, affiliate, TV) as well as core metrics (e.g. LTV, CAC).
Data Science Internship Interview Process
Again, the interview process ultimately depends on the company that you are applying to. Still, most (if not all) companies include the same general steps, which I’ll explain below.
The worst thing you can do as an intern candidate is skip researching what the company does and its cultural mission and values.
Typically, there’s an initial screening (usually a phone screen) conducted by a recruiter or the hiring manager of the company. The purpose of this is so that the interviewee gets a better understanding of the role and the interviewer can get a better understanding of the interviewee.
You should expect them to ask about your interest in the role and company, why you think you’d be a good fit, and questions related to your past experiences. In rare cases, you may also be asked one or two simple technical questions.
The interviewer is simply making sure that you’re genuinely interested in the company, that you’re a good communicator, and that you raise no red flags.
For many data science internships now, companies will require you to complete a take-home challenge: a case study to finish within a set time period, typically reflective of the kind of problems you’d encounter in the actual role.
This is done to see how you would approach a problem (i.e. your thought process) and whether you have the basic knowledge that’s required to complete the problem. Examples of cases include cleaning a dataset and building a machine learning model to make a given prediction, or querying a dataset and analyzing the data, or a combination of the two.
Last is the on-site interview, which can consist of anywhere from one to six rounds. These interviews are a mixture of behavioral and technical questions. You may also be required to complete a case on the spot for one of the rounds.
While they are trying to make sure that you have a strong understanding of the fundamental knowledge that’s required to be successful in the role, they’re also assessing your behavior, your motives, and ultimately whether you’d be a good fit for the team or not. Make sure you’re on your best behavior but don’t forget to be yourself!
Once you finish your data science internship, review "The New Grad Guide on Landing a Data Science Job" to prepare for your upcoming interviews!
Below are some examples of interview questions that you should be prepared to answer:
- What is a p-value?
- What is regularization and what problem does it try to solve?
- How can you incorporate the relationship between, say, age and income into a linear model?
- What is the probability of getting a sum of 4 if you roll two fair dice?
- What are some of the steps that you take when wrangling and cleaning a dataset?
- What is cross-validation and why is it necessary?
- Give an example of when accuracy is not the best metric in determining the effectiveness of a machine learning model.
- What's the difference between an INNER and OUTER JOIN?
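Some of these questions can be checked by brute force. As one illustration, the sketch below enumerates every outcome of two dice to answer the sum-of-4 question above. Treating the dice as fair and six-sided is an assumption about the question, and `probability_of_sum` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def probability_of_sum(target, sides=6):
    """Probability that two fair dice with `sides` faces sum to `target`."""
    outcomes = list(product(range(1, sides + 1), repeat=2))
    favorable = [roll for roll in outcomes if sum(roll) == target]
    return Fraction(len(favorable), len(outcomes))

print(probability_of_sum(4))  # 1/12, from (1, 3), (2, 2), (3, 1)
```

In an interview you would usually reason this out by hand (3 favorable outcomes out of 36), but enumerating the sample space is a handy way to verify the answer while practicing.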