In the next ten years, what will be the most important skill in data science? Hint: it’s not TensorFlow, it’s not GPT-3, and it’s not the next fancy deep learning algorithm. The most important skill you need is applying good data science judgment.

Almost no one does machine learning by hand

A Brief History of Machine Learning

Remember when machine learning was really hard? Before you could import train_test_split and XGBClassifier and build a model in five lines? Personally, I don’t. But this was the reality for the creators of scikit-learn, TensorFlow, and all the other machine learning libraries we’ve come to depend on in the 2020s. The creators of these libraries didn’t have libraries to build on, so they had to architect all of their classifiers from scratch.

Applying machine learning from scratch is really hard. It’s like the difference between programming in Python and programming in C. No one really wants to manage memory and track pointers, just like I don’t want to build trees by hand when running a random forest model.
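To get a feel for what "building trees by hand" involves, here is a minimal sketch of a decision stump, a one-split tree, in plain Python. A random forest grows many deeper trees on resampled data, so this is only the smallest building block. The toy data and the helper names (gini, best_split, fit_stump, predict) are made up for illustration and do not come from any library.

```python
# A decision stump: a tree with a single split on one numeric feature.
# Helper names and toy data are illustrative assumptions, not a library API.

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(xs, ys):
    """Try every midpoint between distinct feature values.

    Returns (threshold, weighted_impurity) for the best split.
    Assumes xs contains at least two distinct values.
    """
    best = (None, float("inf"))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (threshold, score)
    return best


def majority(labels):
    """Most common 0/1 label; ties go to 1."""
    return int(sum(labels) * 2 >= len(labels))


def fit_stump(xs, ys):
    """Fit the stump: pick the split, then label each side by majority vote."""
    threshold, _ = best_split(xs, ys)
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return threshold, majority(left), majority(right)


def predict(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Toy, made-up data: small values are class 0, large values are class 1.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]

stump = fit_stump(xs, ys)
print(stump)                 # (threshold, label left of it, label right of it)
print(predict(stump, 2.5))   # falls on the left side of the split
```

Even this toy version needs an impurity measure, a split search, and a voting rule, and it handles only one feature and two classes. That is the work a library does for you behind a single fit call.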

But as data science has evolved in the last twenty years, we’ve gradually broken more and more technological barriers in AI and machine learning. We’ve made it easier for data scientists to enter the field without being complete programming and academic experts. Now anyone can reuse packages developed by experienced (and often unpaid) professionals to fit a K Nearest Neighbors model almost without trying.

And more than anything else in machine learning, AutoML has come along to push the frontiers of data science even further. According to Wikipedia:

“Automated machine learning (AutoML) is the process of automating the process of applying machine learning to real-world problems. AutoML covers the complete pipeline from the raw dataset to the deployable machine learning model.”

What does this jargon actually mean? It means the future will rely on you being able to build machine learning models without even having to code. You’ll be able to look at an interface (Minority Report style), drag and drop the dataset you care about, select the target variable you’re trying to predict, press run, and get a trained model deployed in seconds.

Future-Facing Data Science

And so as this data science evolution is taking place, you have to wonder: where does it stop? Are we all going to be automated out of jobs in data science?

Well, let’s look at the timeline. Machine learning development is accelerating over time. As model creation, deployment, and implementation become easier, the bottleneck between idea and production will shrink drastically. This is why the most important part of data science will be understanding exactly where to apply AI. This means understanding the tradeoffs, costs, and benefits of using AI and machine learning on a case-by-case basis.

The problems of the future won’t be technical in nature. Rather, they will be about applying data science judgment efficiently and accurately across a vast array of scenarios. When you have the ability to build models in seconds, the most important skill will be understanding where and how to apply them.

For example, let’s say that you work for a startup that deals in HR benefits for companies. This domain consists of accounting, 401(k) plans, payroll, taxes, etc. The startup can probably go in tons of different directions. As a data scientist, you want to be able to point out exactly where this startup can apply AI to get the most value in product growth.

Let’s say your coworker wants to build a model that can predict how many users will sign up for a 401(k). You don’t see how that impacts the business.

But let’s say you see that a bunch of data entry people are hired to fill out tax forms in a long and arduous process. You want to know enough machine learning and data science to understand that if you can automate their processes more efficiently, they can spend ten minutes filling out tax forms instead of an hour.

Here’s where your data science judgment comes in. Maybe by applying a quick AutoML model to this problem, you estimate you can reduce their work time by half. But maybe when you apply a more detailed deep learning model, it reduces this time down to seconds.
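The tradeoff above can be put into rough numbers. The sketch below is a back-of-the-envelope comparison, not a real estimate. Only the one-hour baseline and the "cut in half" versus "down to seconds" outcomes come from the example. The form volume, the build effort for each option, and rounding "seconds" up to one minute are hypothetical assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

BASELINE_MINUTES = 60    # "an hour" per tax form, from the example above
FORMS_PER_MONTH = 1000   # hypothetical volume, assumed for illustration


@dataclass
class Option:
    name: str
    minutes_per_form_after: float  # estimated time per form with the model in place
    build_hours: float             # hypothetical effort needed to ship the model


def monthly_hours_saved(option: Option) -> float:
    """Hours of data entry work saved per month."""
    saved_minutes = BASELINE_MINUTES - option.minutes_per_form_after
    return saved_minutes * FORMS_PER_MONTH / 60


def payback_months(option: Option) -> float:
    """Months until the hours saved equal the hours spent building."""
    return option.build_hours / monthly_hours_saved(option)


options = [
    # AutoML halves the work time; the build effort is an assumed figure.
    Option("Quick AutoML model", minutes_per_form_after=30, build_hours=40),
    # "Seconds" is rounded up to one minute; the build effort is an assumed figure.
    Option("Detailed deep learning model", minutes_per_form_after=1, build_hours=2000),
]

for option in options:
    print(
        f"{option.name}: saves {monthly_hours_saved(option):.0f} h/month, "
        f"pays back in {payback_months(option):.2f} months"
    )
```

Under these made-up numbers the deep learning model saves more hours each month, but the AutoML model pays for itself far sooner. Change the assumptions and the ranking can flip, which is exactly the kind of call that takes judgment.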

Companies deal with these kinds of problems every day. Data science leaders and machine learning experts have to weigh the tradeoffs of applying AI to different kinds of projects. Most data science leaders’ job comes down to two parts: prioritizing where to apply AI and then sketching a high-level architecture for others to implement.

So even if, in the next ten years, AI gets so sophisticated that all we have to do is snap our fingers and a program builds us a model, we still have to use our judgment to decide where to place that model. And so as data science progresses, the most important skill to learn will be understanding where to apply data science to make the most significant improvements.

Figure: Thanos snaps his fingers and the model builds itself. Caption: Thanos envisions the future of machine learning.

Be the person who can point AI in the right direction for maximum output and you’ll never be outsourced or replaced.

Conclusion

Human judgment won’t stop playing a role in business decisions until the day of transcendence, when AI will make judgments about, and improvements to, AI itself. By then, you won’t have to worry about being out of a job because everyone will be out of a job.

So until that day, improve your data science judgment and learn where you can gain the most benefit by applying data science fundamentals to the problems you encounter in your day-to-day life.