If you’re applying for a job in the field of data science, data engineering, or business analytics, there’s a good chance that you’ll have to work with SQL at one point or another. Knowing how to manage and query your own data is incredibly important in these fields. If you’re new to SQL or need to brush off a rusty set of skills to address the current marketplace, it can be difficult to figure out what you need to know about SQL in order to be competitive in your field.
That being said, here are the basics of SQL:
Invented in 1974, SQL is a programming language designed to interface with Relational Database Management Systems (RDBMS). An RDBMS structures data into tables made up of rows (called “records”) and columns (called “fields”), which can then be queried, combined, and manipulated in various ways to achieve the desired outcome.
MySQL, Microsoft SQL Server, Oracle, Amazon Redshift, and IBM DB2 are all examples of RDBMS-based software. The chances that you will encounter one (or more) of these technologies in the course of your career or interview is pretty high. So, let’s make sure you’re prepared.
There are three different subsets of SQL:
The most commonly used subset of SQL is Data Manipulation Language. The most commonly used command in SQL is
SELECT. For the most part, unless you are a data engineer, you will use
SELECT to make your queries and perform basic manipulations on a database without having to worry about the actual creation of a database or the task of managing permissions.
Unfortunately, the ability to
SELECT all the data from a given table won’t get us very far in SQL. Luckily, SQL has lots of built-in tools to help us narrow and structure our queries so that we can get only the data that is relevant to our purposes. Such tools include the following four clauses:
WHERE(used to filter records based on specific criteria)
GROUP BY(used to group records with identical data)
ORDER BY(used to sort records according to a specific field in ASCending or DESCending order)
HAVING(used in combination with the GROUP BY clause in order to filter records).
It’s important to familiarize yourself with these beginner querying clauses and understand when it’s appropriate to use them in order to structure your queries to return useful results.
Q1. What is the difference between the
WHERE clause and the
HAVING filter a table to meet the conditions you set. The difference between the two is shown when they are used in conjunction with the GROUP BY clause. The WHERE clause is used to filter rows before grouping (before the GROUP BY clause), and HAVING is used to filter rows after grouping.
SQL also contains three types of basic operators that allow you to further narrow down your queries: arithmetic operators, comparison operators, and logical operators. Arithmetic operators should be familiar to you if you have basic knowledge of mathematics.
Comparison operators should also be familiar. They include:
<=(less than or equal to)
>=(greater than or equal to)
!<(not less than)
!>(not greater than)
Logical operators can be a bit trickier to get a handle on, but SQL is prized for its intuitive syntax, which is closer to ordinary speech compared to other programming languages. Logical operators include:
Since these operators are highly contextual, it’s difficult to explain them in abstract terms, but if you familiarize yourself with all of them, you’ll be on solid footing when it comes to the basics of SQL in your next interview.
Sometimes, it is useful to be able to compare the results of two separate SELECT queries, often across different tables. In these cases, SQL has built-in set operators that help us compare the results of our queries with as little effort as possible. These set operators include:
UNION(which combines the results of two separate queries without duplicates)
UNION ALL(which combines the results of two separate queries, including duplicates)
MINUS(which returns the result of one query with the common results of the second query “subtracted” from it)
INTERSECT(which returns only the results that appeared in both queries)
It’s important to familiarize yourself with these operators, but even more important to develop a sense of which operator is the appropriate solution for a given situation.
Q3. What is the difference between UNION and UNION ALL? When might you prefer using one over the other?
UNION ALL are SQL operators used to concatenate two or more result sets. This allows us to write multiple
SELECT statements, retrieve the desired results, then combine them together into a final, unified set.
The main difference between
UNION ALL is that:
UNION:only keeps unique records.
UNION ALL: keeps all records, including duplicates.
Often, rather than querying two separate tables with a set operator, we’ll want to actually merge the two tables into one for ease of comparison. In this case, we’ll find ourselves performing a JOIN. There are four commonly used JOINs in SQL:
INNER JOIN(which returns records with matching values from both tables)
LEFT JOIN(which returns all the records of the first table and any matching records from the second table)
RIGHT JOIN(which returns all the records from the second table and any matching records from the first table; the “opposite” of a LEFT JOIN)
FULL JOIN(which returns all records with matching values from either the first or the second table)
It’s important to be able to know, when looking at two different tables, what sort of data becomes available when you perform a given JOIN. Developing a sense of when each
JOIN is appropriate and what sort of analysis each makes available will help you when using your SQL toolkit in a real-world setting.
Here’s a hint:
First, we need the name of the department to be associated with each employee in the
employees table to understand which department each employee is a part of.
department_id field in the employees’ table is associated with the
id field in the department’s table. We call the
department_id a foreign key because it is a column that references the primary key of another table, which in this case is the
id field in the department’s table.
Based on this common field, we can join both tables, using
INNER JOIN to associate the name of the department name to the employees that are a part of those departments.
Sometimes, rather than retrieving an individual record or set of records, you’ll be asked to retrieve specific features of a table’s given field. In this case, you’ll want to apply one of SQL’s built-in aggregate functions. They’re called aggregate functions because they deal with data in “aggregate,” or as a group, rather than individually. Examples of aggregate functions include:
COUNT(or the total number of records in a specific table)
COUNT DISTINCT(or the total number of unique records in a specific table)
AVG(or the average of a set of values)
MIN(or the lowest of a set of values)
MAX(or the highest of a set of values)
SUM(or the sum of a set of values)
FIRST(or the first of a set of values)
LAST(or the last of a set of values)
Being able to recognize situations where aggregate functions can help you narrow down a query is a useful skill that comes with time and practice.
Q5. You’re given a table that represents search results from searches on Facebook. The query column is the search term, the position column represents each position the search result came in, and the rating column represents the human rating of the search result from 1 to 5 where 5 is high relevance and 1 is low relevance.
1. Write a query to compute a metric to measure the quality of the search results for each query.
2. You want to be able to compute a metric that measures the precision of the ranking system based on position. For example, if the results for dog and cat are…
The intention of this article is to outline in broad strokes some of the basic knowledge required to pass an entry-level SQL interview. But the reality is that abstract knowledge can only go so far. It doesn’t matter if you know how to perform an
INNER JOIN if you don’t know when you should, or the difference between a
WHERE and a
HAVING clause without knowing how to apply that information to a real-world situation.
Q6. Given the above table, what command would you use to add a new record to a database?
Q7. Given the above table, how would you structure a query to retrieve all of the entries from a single field?
Q8. Given the above table, what command would you use to eliminate the existing table?
Q9. Given the above table, how would you structure a query to return only users that joined on January 1st?
Q10. Given the above table, how would you structure a query to return the last date any user joined the website?
Q11. Given the above table, how would you structure a query to return all users who visited after a certain date AND spent more than 100 dollars?
Q12. Given the above table, how would you structure a query to return all users who spent exactly 15 dollars on the website?
Q13. Given the above table, how would you structure a query to determine the total amount spent by all visitors to your website?
Q14. Given the above table, how would you structure a query to determine the highest amount spent by any individual visitor to your website? The lowest amount?
Q15. What is the difference between
COUNT DISTINCT? When might you want to use one and not the other?
Q16. Given the above tables, how would you structure a query to return the list of names of people who are users but not subscribers?
Q17. Given the above tables, how would you structure a query to return the list of names of people who are both users and subscribers?
Q18. Given the above tables, how would you structure a query to return matching records from the users’ table while preserving ALL the records from the subscribers’ table?
Q19. Given the above tables, how would you structure a query to return records with matching values from either table?
Check out our Introduction to SQL Course that will help you get started with the foundation of SQL: