Mathematics is one subject that scares a lot of people interested in data science and machine learning. In fact, math comes up in many of the questions we get from early-career data scientists. They all seem to ask some version of: Do I need to know math for data science?
The short answer is yes, it is important. If you’re interested in a data science career but have no math background, you’ll likely struggle. The good news is that the most fundamental mathematical concepts in data science can be learned, even if you don’t have much math experience.
We wanted to answer that question for our readers (Do I need to know math for data science? Is it a requirement for becoming a data scientist?) and provide an overview of some of the most useful mathematical concepts practitioners use.
The bottom line is this: Math - or more broadly, the quantitative reasoning skills a math background provides - is essential for many day-to-day job tasks.
How much math is needed in data science? A wide range of mathematical concepts come into play, but if you’re starting from scratch, you should focus your studies on three core areas, the so-called Big Three:
linear algebra, calculus, and, most importantly, statistics and probability.
Statistics is used nearly every day by data scientists. In fact, most data science interviews at FAANG companies include statistics questions.
In data science, statistics is used for trend-spotting and forecasting, predictive modeling, and hypothesis testing, to name a few applications. For example, if a product manager asked you to forecast sales, you might turn to regression analysis. Core concepts to know include:
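To make the sales-forecasting example concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The monthly sales figures are made up for illustration, and `fit_simple_regression` is a hypothetical helper name. The slope is the covariance of x and y divided by the variance of x, and the intercept places the line through the two means.

```python
# Hypothetical monthly sales figures (illustrative only, not real data).
months = [1, 2, 3, 4, 5, 6]
sales = [100.0, 108.0, 119.0, 125.0, 137.0, 144.0]

def fit_simple_regression(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    b = cov_xy / var_x
    # The fitted line passes through (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b

a, b = fit_simple_regression(months, sales)
forecast_month_7 = a + b * 7  # extrapolate one month past the data
print(f"intercept={a:.2f}, slope={b:.2f}, forecast for month 7={forecast_month_7:.1f}")
```

A real forecast would also check residuals and seasonality before trusting a straight-line extrapolation; this only shows the core calculation.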
Linear algebra, one of the fundamental branches of math, applies to many data science processes. For example, it is essential for understanding many algorithms and prediction models. With linear algebra, it’s important to have a strong grasp of the fundamentals (although, unlike statistics, basic knowledge might be all that’s necessary). Core concepts to know include:
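As one illustration of linear algebra inside a prediction model, the sketch below fits a linear model with several features by solving the least-squares normal equations (XᵀX)w = Xᵀy with Gaussian elimination, in plain Python. The dataset is hypothetical (generated from y = 2 + 3·x1 − x2), and the helper names are illustrative, not from any library. Production libraries typically avoid forming XᵀX and use more numerically stable decompositions such as QR or SVD.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Matrix product of A (n x m) and B (m x p), both stored as lists of rows."""
    B_cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in B_cols] for row in A]

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back-substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_linear_model(X, y):
    """Least-squares weights from the normal equations (X^T X) w = X^T y."""
    Xt = transpose(X)
    XtX = mat_mul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)

# Hypothetical data from y = 2 + 3*x1 - 1*x2; the leading 1 in each row is the intercept term.
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1], [1, 1, 3]]
y = [2, 5, 1, 4, 7, 2]
weights = fit_linear_model(X, y)
print(weights)  # close to [2.0, 3.0, -1.0], up to floating-point round-off
```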
Calculus underpins several key data science techniques. For example, backpropagation, the algorithm used to train neural networks, is based on the chain rule of calculus. Core concepts to know include:
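A minimal sketch of the chain rule at work, assuming a toy setup chosen for illustration (a single sigmoid neuron with squared-error loss, not any particular framework’s code): the gradient is built by multiplying local derivatives, the same way backpropagation proceeds layer by layer, and is then checked against a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron: L = (sigmoid(w*x + b) - y)**2."""
    return (sigmoid(w * x + b) - y) ** 2

def grad_chain_rule(w, b, x, y):
    """dL/dw and dL/db computed with the chain rule, as backpropagation does."""
    z = w * x + b            # forward pass: pre-activation
    a = sigmoid(z)           # forward pass: activation
    dL_da = 2.0 * (a - y)    # derivative of the squared error w.r.t. the activation
    da_dz = a * (1.0 - a)    # derivative of the sigmoid w.r.t. its input
    dL_dz = dL_da * da_dz    # chain rule: dL/dz = dL/da * da/dz
    return dL_dz * x, dL_dz  # dz/dw = x and dz/db = 1

def grad_numeric(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check the chain-rule result."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

analytic = grad_chain_rule(0.5, -0.2, 1.5, 1.0)
numeric = grad_numeric(0.5, -0.2, 1.5, 1.0)
print("chain rule:", analytic)
print("numeric:   ", numeric)
```

The two results should agree to several decimal places; in a deep network the same multiplication of local derivatives simply repeats through every layer.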
At the risk of being overly broad, a data scientist’s core job is to mine, analyze, and interpret data, and math plays a role at each stage.
A good way to see how math is used is to look at three core techniques data scientists use: clustering, regression, and classification. Math forms the basis of all of them:
Clustering is all about determining how data points should be grouped, and techniques such as the K-means algorithm and mean-shift clustering draw heavily on statistics and calculus.
Regression techniques are used by data scientists to make data-driven predictions. Concepts like simple linear regression and multivariate regression, which draw on both linear algebra and statistics, come in handy.
Classification techniques, which sort data into categories, are also built on math. For example, K-nearest neighbor (KNN) classification relies on distance metrics from linear algebra to find each point’s closest neighbors.
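Both clustering and classification can rest on measuring distances between points. Below is a minimal pure-Python sketch of K-means (Lloyd’s algorithm, with the simplifying assumption that the first k points serve as the starting centroids) and of K-nearest neighbor classification using Euclidean distance and a majority vote. The data points, labels, and function names are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm for K-means; the first k points are used as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean, the point that minimizes
        # the cluster's sum of squared distances. An empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D data: two well-separated groups (illustrative only).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["A", "A", "A", "B", "B", "B"]

centroids, clusters = kmeans(points, k=2)
print("K-means centroids:", centroids)
print("KNN label for (2.0, 2.0):", knn_predict(points, labels, (2.0, 2.0)))
```

K-means needs no labels and discovers the two groups on its own, while KNN uses the labels to classify a new point; in practice K-means results depend on initialization, so libraries usually try several starting points.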
In interviews and on the job, you should be able to identify which of these techniques applies to a problem, given the characteristics of the data.
Math applies to a wide variety of data science questions. These are just a few of the possibilities:
In nearly every data science interview, you’ll be asked math questions. Statistics questions are the most common, but calculus and linear algebra questions do come up. The key is preparation: practice as many sample interview questions as you can.
Let’s calculate it out.
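If we draw N cards from a deck of 52, the probability that the first card is not part of a pair is 100%, since you need at least two cards to make a pair.
The probability that the second card has a new rank is 48/51. How did we compute that? After the first card is drawn, 51 cards remain, and 3 of them share the first card’s rank, so 51 - 3 = 48 of them have a new rank.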
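The full question isn’t reproduced here; assuming it concerns whether N cards contain a pair (two cards of the same rank), the same reasoning extends card by card. If the first k cards all have distinct ranks, then of the 52 - k cards left, k × 3 match a rank already drawn, so the next card has a new rank with probability (52 - 4k)/(52 - k). Multiplying these factors gives the probability of no pair, and 1 minus that is the probability of at least one pair. The function name below is hypothetical.

```python
from fractions import Fraction

def prob_all_distinct_ranks(n_cards, ranks=13, suits=4):
    """Probability that n_cards drawn without replacement from a standard deck
    all have different ranks (i.e., no pair)."""
    if n_cards > ranks:
        return Fraction(0)  # pigeonhole: more cards than ranks forces a repeat
    deck = ranks * suits
    p = Fraction(1)
    for k in range(n_cards):
        # Of the deck - k cards left, k * (suits - 1) match a rank already drawn,
        # leaving deck - k * suits cards with a new rank.
        p *= Fraction(deck - k * suits, deck - k)
    return p

for n in (2, 5, 13, 14):
    no_pair = prob_all_distinct_ranks(n)
    print(n, "cards: P(no pair) =", float(no_pair), " P(at least one pair) =", float(1 - no_pair))
```

Exact fractions avoid rounding, which makes it easy to confirm that the two-card case is 48/51.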
Now let’s say we take the output of the random integer function and pass it to another random integer function as the maximum value, keeping the same minimum value, N.
What would the distribution of the samples look like?
What would be the expected value?
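The setup of the first random integer function isn’t shown here, so this sketch assumes it returns a uniformly distributed integer X between a minimum N and a maximum M, both inclusive, and that the second call returns Y uniformly between N and X. Under that assumption, E[Y | X] = (N + X)/2 and E[X] = (N + M)/2, so E[Y] = (3N + M)/4, and P(Y = y) decreases as y grows, so the samples pile up near N. The code computes the exact distribution with fractions and cross-checks the mean by simulation; the function names are hypothetical.

```python
import random
from fractions import Fraction

def nested_randint_distribution(n, m):
    """Exact P(Y = y), assuming X ~ uniform integer on [n, m] and Y ~ uniform integer on [n, X]."""
    p_x = Fraction(1, m - n + 1)  # each value of X is equally likely
    dist = {y: Fraction(0) for y in range(n, m + 1)}
    for x in range(n, m + 1):
        p_y_given_x = Fraction(1, x - n + 1)  # Y is uniform on the x - n + 1 values n..x
        for y in range(n, x + 1):
            dist[y] += p_x * p_y_given_x
    return dist

def expected_value(dist):
    return sum(y * p for y, p in dist.items())

def simulate_mean(n, m, trials=100_000, seed=0):
    """Monte Carlo cross-check; random.randint includes both endpoints."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = rng.randint(n, m)
        total += rng.randint(n, x)
    return total / trials

dist = nested_randint_distribution(0, 4)
print({y: str(p) for y, p in dist.items()})
print("exact mean:", expected_value(dist), " formula:", (3 * 0 + 4) / 4, " simulated:", simulate_mean(0, 4))
```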
Today, data scientists have many tools at their disposal: pre-packaged algorithms, libraries, and packages. But without strong math fundamentals, it’s hard to understand why these models work, and without understanding the why, it’s difficult, if not impossible, to improve upon them.
Improving upon existing methodologies and inventing your own is the fastest way to grow as a data scientist. With a strong math background, for example, you’d have a basis for dissecting new methodologies, quickly understanding why and how they work, and using those methods in your own solutions.
Without math, you can certainly find jobs in data science and perform basic tasks, like decision-tree classification, but the advanced methods will likely remain elusive.
Check out more content here on Interview Query if your dream is to land a data science job at tech companies.
We’ll guide you through your data science interview prep with features such as our interview questions, learning paths, coaching, and more!