SQL Left Join To Combine Multiple Rows: A Full Guide

SQL Left Join To Combine Multiple Rows: A Full Guide

Introduction

SQL (Structured Query Language) is one of the most important querying languages today for accessing and manipulating relational databases.

One common task in SQL involves joining tables to combine related information. The LEFT JOIN operation is a widely known technique to retrieve relevant information from multiple tables.

In this article, we’ll explore a common use case in database management– how to use LEFT JOIN to merge multiple rows into one.

SQL LEFT JOIN Review

Before diving into a specific example, let’s go over how LEFT JOIN works.

In SQL, the LEFT JOIN (or LEFT OUTER JOIN) keyword allows you to retrieve records from two tables. It fetches all the rows from the left table and the matching rows from the right table.

If there’s no match, the result will be NULL for every column of the right table. This JOIN function should be used when you want to combine rows from two tables and show all the records from the “primary” table, even if there are no matching records in the “related” table.

Example

Let’s consider two tables, Students and Enrollments.

Students table

StudentID StudentName
1 Alice
2 Bob
3 Charlie

Enrollments table

EnrollmentID StudentID Course
1 1 Mathematics
2 3 English

To fetch a list of students and their enrolled courses (if any), we can use:

SELECT Students.StudentName, Enrollments.Course
FROM Students
LEFT JOIN Enrollments ON Students.StudentID = Enrollments.StudentID;

Our result should look something like this:

StudentName Course
Alice Mathematics
Bob NULL
Charlie English

Notice that “Bob” appears in the result set with a NULL value, indicating that he isn’t enrolled in any course.

Combining Multiple Rows with LEFT JOIN in SQL

Combining multiple rows of a table, especially using LEFT JOIN and/or aggregation functions, is required in various real-world scenarios. Common industry use cases include:

  • Customer purchase history: Say an e-commerce company wants a snapshot of each customer’s purchase history. Instead of having multiple rows representing each item a customer bought, you could have a single row with a combined list of purchases.
  • User activity logs: In web services or applications, user activities might be logged in a database. By combining rows, you can summarize each user’s activities in a single string (e.g., “logged in, uploaded photo, liked post, logged out”).
  • Patient history: In a healthcare database, you could use this technique to summarize treatments or medication given to a patient over a specific period.

An important caveat is that a combined string might become more challenging to parse, especially when dealing with extensive or lengthy data. Always evaluate the specific needs and constraints of your application or analysis before deciding on the best approach.

Using GROUP_CONCAT (MySQL)

GROUP_CONCAT is an aggregation function in MySQL that concatenates values from multiple rows into a single string. When used with LEFT JOIN, it combines two relational tables and summarizes the essential information in a list.

Let’s illustrate this function using an example.

Say that an e-commerce company is interested in customer purchasing patterns. They want to analyze order data to understand how consumers typically group products and better identify which items will likely be bought together in the future.

Orders table

OrderID OrderDate
1 2023-09-13
2 2023-09-11
3 2023-09-15

OrderItems table

ItemID OrderID ProductName Quantity
1 1 Laptop 2
2 1 Mouse 1
3 2 Keyboard 1
4 3 Monitor 1
5 3 Laptop 1

For the view they require, they would need to join Orders and OrderIDs and concatenate the items for each OrderID.

The solution should look like this:

SELECT 
    Orders.OrderID,
    Orders.OrderDate,
    GROUP_CONCAT(CONCAT(OrderItems.ProductName, ' (x', OrderItems.Quantity, ')') ORDER BY OrderItems.ProductName ASC) AS ItemsOrdered
FROM Orders
LEFT JOIN OrderItems ON Orders.OrderID = OrderItems.OrderID
GROUP BY Orders.OrderID, Orders.OrderDate;
SELECT 
    OrderID,
    GROUP_CONCAT(CONCAT(ProductName, ' (x', Quantity, ')') ORDER BY ProductName ASC) AS OrderedProducts
FROM OrderItems
GROUP BY OrderID;

The resulting table should contain a summarised view of the orders. Note that the CONCAT function combines the ProductName, Quantity, and the string literals ‘(x’ and ‘)’. The outer function, GROUP_CONCAT, aggregates these concatenated strings for all of the products related to an order, producing a list.

Expected output:

OrderID OrderDate ItemsOrdered
1 2023-09-13 Laptop (x2), Mouse (x1)
2 2023-09-11 Keyboard (x1)
3 2023-09-15 Laptop (x1), Monitor (x1)

Using STRING_AGG (SQL Server)

STRING_AGG is a function used in SQL Server to concatenate the values of multiple rows into a single string. Essentially, it’s SQL Server’s answer to MySQL’s GROUP_CONCAT function.

The STRING_AGG function syntax consists of:

STRING_AGG ( expression, separator ) [ <order_clause> ]

where the following convention is followed:

  • expression: The column or expression whose values you want to concatenate.
  • separator: The string value that will be used to separate concatenated results.
  • <order_clause> (Optional): Defines the order in which to concatenate the values. It’s typically used in the format WITHIN GROUP (ORDER BY <order_expression>).

Let’s try this out using an example.

A web services company wants to analyze user activity logs to analyze which activities are performed together most frequently at a user level. You’re given two tables with user information and their activities.

Users table

UserID UserName Email
1 Alice mailto:alice@email.com
2 Bob mailto:bob@email.com
3 Charlie mailto:charlie@email.com

UserLogs table

LogID UserID LogEntry LogDate
1 1 Logged in 2023-09-12 08:10:00
2 1 Changed password 2023-09-12 08:20:00
3 2 Logged in 2023-09-12 09:15:00
4 3 Logged in 2023-09-12 10:00:00
5 3 Downloaded report 2023-09-12 10:05:00

To do this, we need to join Users with UserLogs based on UserID and concatenate the logs against each UserName.

As such, our solution should look something like this:

SELECT 
    u.UserName,
    u.Email,
    STRING_AGG(CONCAT(FORMAT(l.LogDate, 'yyyy-MM-dd HH:mm:ss'), ': ', l.LogEntry), '; ') 
        WITHIN GROUP (ORDER BY l.LogDate) AS UserActivities
FROM Users u
LEFT JOIN UserLogs l ON u.UserID = l.UserID
GROUP BY u.UserName, u.Email;

Expected output table:

UserName Email UserActivities
Alice alice@email.com 2023-09-12 08:10:00: Logged in; 2023-09-12 08:20:00: Changed password
Bob bob@email.com 2023-09-12 09:15:00: Logged in
Charlie charlie@email.com 2023-09-12 10:00:00: Logged in; 2023-09-12 10:05:00: Downloaded report

Combining Multiple Rows in Other DBMS

In other DBMSs, this same technique can be employed using slightly different syntax.

PostgreSQL uses the STRING_AGG function, the same as SQL Server. However, the syntax varies slightly across these systems. While MySQL uses an ORDER BY clause within GROUP_CONCAT, SQL Server’s STRING_AGG uses WITHIN GROUP for ordering.

In Oracle, LISTAGG is the aggregation function used for concatenating values from multiple rows into a single string.

More Tips for Combining Multiple Rows

In different DBMSs, each string aggregation function has its own set of nuances and caveats. We’ll look at some of the main functions in-depth:

  1. MySQL’s GROUP_CONCAT
    • String Length Limit: By default, the result length is capped by the group_concat_max_len system variable, which has a default value of 1024 characters. This can lead to truncated results if not set properly.
    • Ordering: The order in which the rows are concatenated isn’t guaranteed unless the ORDER BY clause is explicitly provided.
    • Distinct Values: To avoid duplicates in the concatenated string, you have to use the DISTINCT keyword.
    • NULL Values: If any value in the group is NULL, it’s automatically skipped.
  2. PostgreSQL’s STRING_AGG
    • NULL Values: If the input string has NULL, it’s automatically skipped.
    • Ordering: Similar to GROUP_CONCAT, the order of concatenation isn’t guaranteed unless specified.
  3. SQL Server’s STRING_AGG
    • Version Limitation: STRING_AGG is available only from SQL Server 2017 onwards.
    • No Implicit Ordering: If you need the concatenated values in a specific order, you must use the WITHIN GROUP clause.
    • NULL Values: STRING_AGG will skip NULL values.
  4. Oracle’s LISTAGG
    • String Length Limit: The concatenated string can’t exceed the maximum size allowed for a VARCHAR2, NVARCHAR2, or RAW column. This can lead to an error message.
    • Duplicates: By default, LISTAGG doesn’t handle duplicates. You might need to use the DISTINCT keyword to exclude them.
    • NULL Values: LISTAGG will exclude NULL values during concatenation.
    • Overflow Handling: In Oracle 19c, the ON OVERFLOW clause was introduced to handle situations when the resulting string exceeds the length limit. You can use ON OVERFLOW TRUNCATE to manage these instances.

When using any of these functions, understanding their nuances is crucial. Proper precautions, like setting the right configurations or being aware of version limitations, can prevent unexpected errors in your SQL operations.

The FOR XML Subquery Technique

The FOR XML technique in SQL Server is a method to format query results as XML. This technique is often used when a traditional rowset structure doesn’t suffice.

Let’s try this out using our user activity example from above.

Users table

UserID UserName Email
1 Alice alice@email.com
2 Bob bob@email.com
3 Charlie charlie@email.com

UserLogs table

LogID UserID LogEntry LogDate
1 1 Logged in 2023-09-12 08:10:00
2 1 Changed password 2023-09-12 08:20:00
3 2 Logged in 2023-09-12 09:15:00
4 3 Logged in 2023-09-12 10:00:00
5 3 Downloaded report 2023-09-12 10:05:00

The objective in this example remains the same– to join Users with UserLogs based on UserID, and concatenate the logs against each UserName.

Using the FOR XML subquery method, our code should look like this:

SELECT 
    u.UserID, 
    u.UserName,
    STUFF(
        (
            SELECT ';' + LogEntry
            FROM UserLogs ul
            WHERE ul.UserID = u.UserID
            FOR XML PATH('')
        ), 1, 1, ''
    ) AS Activities
FROM Users u;

Note the use of the STUFF function, which removes the trailing delimiter ‘;’ from the end of the aggregated string.

Our expected output table now looks like:

UserID UserName Activities
1 Alice Logged In;Opened App;Logged Out
2 Bob Logged In;Closed App
3 Charlie Logged In;DownloadedReport

FOR XML Caveats:

  1. XML Escape: The values returned using the FOR XML method are XML-escaped. For example, special characters like <, >, and & would be encoded as &lt;, &gt;, and &amp; respectively. If your data contains these characters and you don’t want them XML-escaped, additional string manipulation functions will be needed.
  2. Trailing Delimiter: This method often leaves a trailing delimiter (like a comma). One way to handle this is by using the STUFF function to remove the first character of the concatenated string.
  3. Performance: While FOR XML PATH provides a way to aggregate strings, it’s not always the most efficient method, especially for large datasets. It’s essential to evaluate performance, especially if more efficient built-in string aggregation functions are available in newer versions of the DBMS.

Conclusion

This article has explored varying techniques to combine multiple rows across different database systems, including MySQL’s GROUP_CONCAT, SQL Server’s FOR XML PATH(''), STRING_AGG, and Oracle’s LISTAGG. Each method serves the same fundamental purpose but carries its own nuances and caveats.

Mastering these techniques will increase the efficiency of your data analysis and enhance your data presentation skills. Whether you’re a seasoned SQL expert or a budding enthusiast, combining rows is a game-changer, underscoring the vast possibilities that SQL offers for data manipulation and presentation.