How often does SQL show up in interviews?

A few weeks ago, I spoke with a hiring manager at a prominent tech company in Silicon Valley, and he mentioned that he always tests SQL first when interviewing a candidate. I asked him to elaborate, and he summed up the importance of SQL well.

“No matter what, the candidate needs to know SQL. I don’t care if they’re the best ML expert in the world. If you can’t pull your own data, you can’t work on my team. No one is going to pull your data for you.”

Even if you’re the best machine learning researcher in the world, how would it look if you had to go to an analyst and say, “Hey, can you get this dataset for me, clean it, and then also feature engineer it a bit? Thx.”

This doesn’t exactly make for great teamwork, and it also wouldn’t make sense to be the best ML expert in the world and still not have mastered something as simple as SQL.

If this isn’t enough, the data speaks for itself. At Interview Query, we analyzed a dataset of Glassdoor data science interview experiences and responses submitted by our users. The analysis showed how often SQL was asked:

image

Because SQL is reliable, approachable, and lets you pull data on your own, it is by far the most important skill right now for landing a data science position.

Given its importance in interviews, we designed this course specifically to help you ace SQL questions.

Why most interviews are in SQL rather than pandas

A friend of mine called me the other day out of the blue in a state of frustration.

Let’s call him Don.

Don was ranting about a data science interview he’d had thirty minutes earlier. Over the phone, his interviewer had given him a question with two example tables and asked him to write a query joining them to return a value.

When presented with a code editor, my friend didn’t understand why SQL was the only option.

“Um, I don’t know SQL but I can do anything you want in pandas,” Don replied nervously.

The interviewer went quiet on the phone. He finally spoke up and asked why Don didn’t know SQL. Don was a machine learning engineer at a mid-sized startup. Nonetheless, he got a rejection email the next day.
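For a sense of what that kind of question involves, here is a minimal sketch run through Python's built-in sqlite3 module. The users and orders tables, their columns, and the rows are all made up for illustration; they are not the tables from Don's interview.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
    INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 35.0), (3, 2, 15.0);
""")

# Join the two tables, total the spend per user, and keep the top spender.
query = """
    SELECT u.name, SUM(o.amount) AS total_spent
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY total_spent DESC
    LIMIT 1
"""
top_spender = conn.execute(query).fetchone()
print(top_spender)  # ('Ana', 55.0)
```

The whole answer is the one SELECT statement; the rest is setup so the query has something to run against.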

I was also perplexed about why he didn’t know SQL. And then I realized, after talking to many aspiring data scientists on Interview Query, that they too, like Don, had very little SQL experience. Many data science enthusiasts understood the intricacies of advanced data manipulation, aggregation, joins, and merges within pandas, only to flounder when given a SQL question involving a simple subquery. So why is SQL the default language for analytics interview questions in data science if so many interviewees don’t know it?

It’s pretty simple: you can’t query a database with pandas alone.

SQL is used because most interviews are at huge tech companies, and most huge tech companies have a lot of data stored in their databases. Like, a ridiculous amount of data. One day’s worth of data at a company like Facebook or Dropbox can be represented by a number with more commas than you have fingers on your hands!

But sure, if you know SQL basics, why can’t you just run select * from events_table where blah = 'doodoo' and pull the data directly into pandas? Well, if your analytics database is like most companies’, that query will take a couple of hours and a thousand clusters to finish. And even then, pandas holds its data in your computer’s memory, and companies keep their data in relational databases for a reason. Your dataset is bigger than five million rows? Have fun watching your laptop slowly break down.
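To make the trade-off concrete, here is a small sketch using Python's built-in sqlite3 module. The events table, its columns, and the 10,000 generated rows are assumptions standing in for a far larger production table; the point is only how many rows have to leave the database in each approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")

# 10,000 made-up rows; every fourth one is a purchase.
rows = [(i % 100, "click" if i % 4 else "purchase") for i in range(10_000)]
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# Option 1: pull every row to the client, then count in Python.
all_rows = conn.execute("SELECT * FROM events").fetchall()
purchases_client_side = sum(1 for _, action in all_rows if action == "purchase")

# Option 2: let the database filter and aggregate; a single row comes back.
purchases_in_db = conn.execute(
    "SELECT COUNT(*) FROM events WHERE action = 'purchase'"
).fetchone()[0]

print(len(all_rows), purchases_client_side, purchases_in_db)  # 10000 2500 2500
```

Both options give the same count, but the first moves 10,000 rows into memory and the second moves one. At the scale described above, that difference decides whether the work fits on a laptop at all.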

Here’s the ultimate secret though: SQL is not hard.

Most people who have used pandas will pick up SQL like an easy second language. Most people who understand Excel will also understand SQL very easily. Almost all of the operations that you can do in SQL you can also do in pandas, and vice versa. Getting good at SQL has clear advantages, and not just for interviews.
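As a rough illustration of how closely the two map onto each other, the sketch below runs the same filter, group, sum, and sort once in SQL and once in pandas. It assumes pandas is installed, and the orders table and its values are invented for the example.

```python
import sqlite3

import pandas as pd

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 20.0), (1, 35.0), (2, 15.0), (3, 5.0);
""")

# SQL: WHERE filters, GROUP BY groups, SUM aggregates, ORDER BY sorts.
sql_result = conn.execute("""
    SELECT user_id, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY user_id
    ORDER BY total DESC
""").fetchall()

# pandas: the same four steps on a DataFrame.
orders = pd.read_sql_query("SELECT * FROM orders", conn)
pandas_result = (
    orders[orders["amount"] >= 10]             # WHERE
    .groupby("user_id", as_index=False)["amount"]  # GROUP BY
    .sum()                                     # SUM
    .rename(columns={"amount": "total"})
    .sort_values("total", ascending=False)     # ORDER BY
)

print(sql_result)  # [(1, 55.0), (2, 15.0)]
print(pandas_result)
```

Each clause in the query has a direct counterpart in the method chain, which is why people who are comfortable with one tend to pick up the other quickly.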

Pandas, with how simple it is to get started, is like Data Science 101 for aspiring data scientists. It comes from a place where work in data science land is perceived exactly like a Kaggle competition, where datasets are pre-cleaned, the requirements to pass are easy to score, and all datasets are clearly labeled in a nice, quaint folder named data.

But the real world isn’t like this. The real world is messy, and requirements change faster than your CEO’s mood swings, which is why we have SQL: to query the databases that store our data efficiently and to make it easy to run analytics.

So learn SQL. Don certainly did, and with his new SQL knowledge he eventually got a job at Facebook.