Data Science Technical Skills

Technical skills are the core of data science. You will need coding experience to put them to use, and they are the skills you will rely on most, both in interviews and in real business situations.

For many people, these are the hardest questions to answer accurately and explain clearly, because they go deepest into the mathematics, statistics, and technical concepts that underpin data science.

The four core technical topics for data science are:

  • Probability
  • Statistics
  • Machine Learning
  • Database design & engineering

Probability Questions

Probability questions assess how well you reason under uncertainty. As a data scientist you’ll be making predictions, and every prediction is probabilistic in nature (especially in business), so you need a strong background in probability.

The first type of probability question is conceptual: interviewers may ask you to define probability concepts in simple terms to make sure you have internalized them correctly.

For example:

  • What is an unbiased estimator? Give an example as if speaking to a data science layperson.
  • Give examples of machine learning models with high bias.

However, many probability questions also test your analytical reasoning and problem-solving. It is not enough to give the correct answer; you should also communicate your thought process clearly and in a structured way.

These kinds of questions may be logic-based, working with combinatorics, or simple probability questions asking you to calculate the probability of an event given a scenario.

For example, let’s say you shuffle a deck of 500 numbered cards. You pick 3 cards at random. What’s the probability that each subsequent card is larger than the last?
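One way to reason about this question: any 3 distinct cards can come out in 3! = 6 equally likely orders, and exactly one of those orders is increasing, so the probability is 1/6 regardless of deck size. The Python sketch below checks that reasoning by exhaustive enumeration on a small deck and by simulation on the 500-card deck. The function names are hypothetical, chosen only for illustration.

```python
import itertools
import random
from fractions import Fraction


def exact_increasing_probability(deck_size: int, draws: int = 3) -> Fraction:
    """Enumerate every ordered draw without replacement and count increasing ones."""
    total = 0
    increasing = 0
    for hand in itertools.permutations(range(1, deck_size + 1), draws):
        total += 1
        if all(a < b for a, b in zip(hand, hand[1:])):
            increasing += 1
    return Fraction(increasing, total)


def simulated_increasing_probability(deck_size: int, draws: int = 3,
                                     trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same probability by repeatedly drawing cards at random."""
    rng = random.Random(seed)
    deck = range(1, deck_size + 1)
    hits = 0
    for _ in range(trials):
        hand = rng.sample(deck, draws)  # ordered draw without replacement
        if all(a < b for a, b in zip(hand, hand[1:])):
            hits += 1
    return hits / trials
```

Calling `exact_increasing_probability(10)` enumerates a 10-card deck exactly, while `simulated_increasing_probability(500)` gives an estimate for the deck in the question. In an interview, the counting argument is the answer; the code is only a sanity check.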

Probability distribution questions are very common. They evaluate your understanding of probability distributions and how the distributions relate to each other in order to see how well you reason within complex probabilistic contexts.

Examples:

  • Given two standard normal random variables X and Y, what is the probability that 2X > Y?
  • How far apart do the means need to be for a 50-50 mixture of two normal distributions to be bimodal?
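Both examples can be sanity-checked numerically. The sketch below assumes X and Y are independent (the question does not say so); in that case 2X − Y is a zero-mean normal variable, so the probability is 1/2. For the mixture, it assumes both components share the same standard deviation sigma, in which case the known threshold is a gap between the means of more than 2·sigma. The function names and grid settings are hypothetical.

```python
import math
import random


def prob_2x_greater_than_y(trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(2X > Y) for independent standard normals."""
    rng = random.Random(seed)
    hits = sum(2 * rng.gauss(0, 1) > rng.gauss(0, 1) for _ in range(trials))
    return hits / trials


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def count_modes(mean_gap: float, sigma: float = 1.0, grid_points: int = 4001) -> int:
    """Count local maxima of a 50-50 mixture of N(-gap/2, sigma^2) and N(+gap/2, sigma^2)."""
    lo = -mean_gap / 2 - 5 * sigma
    hi = mean_gap / 2 + 5 * sigma
    xs = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    ys = [0.5 * normal_pdf(x, -mean_gap / 2, sigma)
          + 0.5 * normal_pdf(x, mean_gap / 2, sigma) for x in xs]
    # A grid point is a mode if it is strictly higher than both neighbours.
    return sum(1 for i in range(1, grid_points - 1) if ys[i - 1] < ys[i] > ys[i + 1])
```

With sigma = 1, `count_modes(1.5)` finds a single peak and `count_modes(3.0)` finds two, consistent with the 2·sigma threshold. If the components have different variances or weights, the threshold changes and this check no longer applies as stated.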

Statistics Questions

Statistics questions for data science test your statistics knowledge. They focus on your ability to design tests, interpret their results, and perform statistical computations on them.

Conceptual statistics questions are a first check on how well you understand statistical concepts. Interviewers may ask you to define a concept or ask simple or broad questions about how to apply it.

They also test your ability to communicate statistical information to a layperson.

For example:

  • What p-value should you target in an A/B test?
  • What are Type I and Type II errors?

Two common ways of testing your use of statistical concepts are case studies and experiment designs.

In statistics-based case study questions, the interviewer may ask you to make computations (e.g., mean and variance in a non-normal distribution). These questions can cover a wide range of statistics and probability concepts.

Example: Let’s say you have a sample size of N. The margin of error for the sample size is 3. How much larger a sample do you need to reach in order to decrease the margin of error to 0.3?
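A sketch of the reasoning, assuming the usual form margin = z · s / √n with the confidence level and the variability of the data held fixed: the margin of error shrinks with the square root of the sample size, so cutting it by a factor of 10 takes 10² = 100 times as many observations (100N in total, or 99N additional). The helper name below is hypothetical.

```python
def sample_size_multiplier(current_margin: float, target_margin: float) -> float:
    """Factor by which n must grow, assuming margin of error scales as 1/sqrt(n)."""
    if current_margin <= 0 or target_margin <= 0:
        raise ValueError("margins must be positive")
    return (current_margin / target_margin) ** 2
```

For the example above, `sample_size_multiplier(3, 0.3)` returns a factor of about 100.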

Experiment design questions test your ability to design and measure A/B tests. They provide you with a testing scenario and ask you to either design an A/B test, assess what’s going on with an existing test, or measure the test’s results. They draw on core statistical concepts like p-value, significance, and the power of a test.

Examples:

  • How would you determine if an A/B test with unbalanced sample sizes would result in bias?
  • You are asked to run a two-week A/B test to test an increase in pricing. How would you approach designing this test?
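Measuring the result of an A/B test on a conversion rate often comes down to a two-proportion z-test. The sketch below is one common approach, not the only valid one; it assumes independent users and samples large enough for the normal approximation. The function name and the input counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z statistic, two-sided p-value) for H0: both conversion rates are equal."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Because the standard error uses both `n_a` and `n_b`, the test itself handles unbalanced group sizes. Unequal groups mainly cost statistical power; bias is more likely to come from how users were assigned to the groups than from the size difference.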

Machine Learning

Machine learning is a crucial aspect of data science because it is at the core of making predictions. Machine learning marks one of the main differences between a data scientist role and a data analyst role.

Machine learning questions try to assess your depth of knowledge and your ability to use ML concepts to find solutions, as well as evaluate your experience using machine learning tools. They span the entire process of building a model:

  1. Data Exploration & Pre-Processing
  2. Feature Selection & Engineering
  3. Model Selection
  4. Cross Validation
  5. Evaluation Metrics
  6. Testing and Roll Out
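To make step 4 concrete, here is a minimal sketch of how k-fold cross validation partitions a dataset: each fold serves once as the validation set while the remaining folds are used for training. The function name is hypothetical, and the sketch assumes the rows have already been shuffled.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop
```

A model would be fit on each `train` split and scored on the matching `validation` split, with the scores averaged across folds.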

Machine learning questions are very broad and come in various types. We’ll go deeper into each of them along the data science learning path, but here are a few broad categories with examples:

Conceptual questions:

  • What is linear discriminant analysis?
  • What is variance in a model?

Case studies:

  • You have a categorical variable with thousands of distinct values. How would you encode it?
  • You want to build a way to estimate the month and day of people’s birthdays. What methods would you propose, and what data would you seek to use?
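For the high-cardinality encoding question, two options that often come up are frequency encoding and the hashing trick. The sketch below illustrates both in plain Python. It is not a complete answer (target encoding and learned embeddings are other candidates), and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def frequency_encode(values):
    """Map each category to its relative frequency in the given data."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def hash_encode(values, n_buckets: int = 32):
    """Map each category to one of n_buckets integer ids using a stable hash."""
    def bucket(value) -> int:
        digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
        return int(digest, 16) % n_buckets
    return [bucket(v) for v in values]
```

Frequencies should be computed on the training data only and then reused for validation and test data, to avoid leakage. Hashing keeps the feature space at a fixed size at the cost of collisions between unrelated categories.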

Design questions:

  • How would you prevent overfitting in a deep-learning model?
  • How do you optimize model parameters during model building?

ML System design:

Machine learning system design questions ask you about the design and architecture of machine learning applications. Essentially, these questions test your ability to solve the problem of deploying a machine-learning model that meets specific business requirements.

For example, how would you build a machine-learning system to generate Spotify’s Discover Weekly playlist?

Recommendation & search engine questions are quite popular because they are a mix of case study and system design questions.

For example:

  • You work at Facebook and are tasked with building a restaurant recommendation engine. What data would you use? How would you build it?
  • How would you build a video recommendation system for YouTube? The main goal is to maximize user engagement.

Database Design & Engineering

Database design & engineering questions test your knowledge of data architecture and your ability to design it. They come in various forms.

Sometimes they are framed as hypothetical case studies in which you are asked to build or design a database based on stakeholder inputs.

During interviews, however, you might also face basic database and SQL definition questions or scenario-driven problem-solving questions.

Basic definition and concept questions test fundamental database design concepts, such as entity-relationship modeling or normal forms. Keep your answers short and simple.

For example, what is the difference between a physical database model and a logical model?

Case study questions provide a problem statement like “How would you build a database for X feature on Y platform?” You must first gather information and then develop a schema and database architecture to fit the problem.

For example, how would you design the data model for the notification system of a Reddit-style app?
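As an illustration of what a first-pass answer might look like, the sketch below defines a small notification schema using SQLite through Python's standard `sqlite3` module. Every table, column, and index name here is a hypothetical choice made for the example; a real design would depend on the requirements gathered from the interviewer.

```python
import sqlite3

# Hypothetical schema: one row per notification delivered to a recipient.
SCHEMA = """
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    username  TEXT NOT NULL UNIQUE
);

CREATE TABLE notifications (
    notification_id INTEGER PRIMARY KEY,
    recipient_id    INTEGER NOT NULL REFERENCES users(user_id),
    actor_id        INTEGER REFERENCES users(user_id),
    event_type      TEXT NOT NULL,      -- e.g. 'comment_reply', 'upvote'
    target_id       INTEGER,            -- id of the post or comment involved
    is_read         INTEGER NOT NULL DEFAULT 0,
    created_at      TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Supports the most frequent read path: unread items for one user, by time.
CREATE INDEX idx_notifications_recipient
    ON notifications (recipient_id, is_read, created_at);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def unread_count(conn: sqlite3.Connection, user_id: int) -> int:
    """Number of unread notifications for one recipient."""
    row = conn.execute(
        "SELECT COUNT(*) FROM notifications WHERE recipient_id = ? AND is_read = 0",
        (user_id,),
    ).fetchone()
    return row[0]
```

The composite index reflects an assumption that the dominant query is "unread notifications for a given user". Stating such assumptions out loud, and asking about read and write volumes, is usually part of what the interviewer is looking for.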

Finally, there are database design & data engineering questions that could fall into the coding or applications question families:

  • SQL coding exercises evaluate the SQL skills you’ll use for database design, for example questions focused on Data Definition Language (DDL) statements.

  • Scenario-based questions propose a database issue and ask you how you would respond, e.g. what would you do if you suspect a data error? Your job is to walk the interviewer through your problem-solving process.
