UNION
and UNION ALL
UNION
and UNION ALL
operators.UNION
?The UNION
operator is used to combine data result sets from more than one SELECT
statements into a single unified output.
Whereas SQL JOIN
combines data vertically by merging based on key columns, UNION
combines data horizontally through column positions.
A simple example
Let’s pretend we have two small tables.
The male_employees
table, which contains male employees’ first and last names:
emp_no | first_name | last_name
-------+------------+-----------
10001 | Georgi | Facello
The female_employees
table, which contains female employees’ first and last names:
emp_no | first_name | last_name
-------+------------+-----------
10002 | Christine | Koblick
A UNION
between the two tables is written as thus:
SELECT *
FROM male_employees
UNION
SELECT *
FROM female_employees
;
which horizontally concatenates the data from both tables and displays as a single output:
emp_no | first_name | last_name
-------+------------+-----------
10001 | Georgi | Facello
10002 | Christine | Koblick
UNION
operator constraints
The UNION
operator will only work under two conditions.
Condition 1. All SELECT
list of queries under the same UNION
query must have the same number of columns, and the columns must be presented in the same order.
The following is correct:
SELECT
emp_no,
first_name,
last_name
FROM male_employees
UNION
SELECT
emp_no,
first_name,
last_name
FROM female_employees
;
The following will not work because the order of columns is switched:
SELECT
emp_no,
first_name,
last_name
FROM male_employees
UNION
SELECT
first_name,
last_name,
emp_no
FROM female_employees
;
The following also will not work because the first SELECT
contains 2 columns while the second SELECT
contains 3 columns:
SELECT
first_name,
last_name
FROM male_employees
UNION
SELECT
emp_no,
first_name,
last_name
FROM female_employees
;
Condition 2. The data types of all corresponding columns being horizontally stacked together must be compatible.
Using the employees
database as example, the following is syntactically correct, not so much because both columns are called emp_no
, but rather that both columns are of the same data type (integer).
SELECT emp_no
FROM employees
UNION
SELECT emp_no
FROM salaries
;
The following is also syntactically correct, since both columns are of the same data type (date), although it might not make much sense logically.
SELECT to_date
FROM salaries
UNION
SELECT from_date
FROM salaries
;
Sourcing
UNION
does not always preserve the row order of the source tables. In order to keep track of which row came from which SELECT
query, it is possible to do so by creating a source
or description
column for record keeping.
For example:
SELECT
to_date,
'to_date' AS source
FROM salaries
UNION
SELECT
from_date,
'from_date' AS source
FROM salaries
;
Column alias-ing
To alias column names in a query that contains UNION
clauses, apply the alias-ing to the first of the SELECT
queries.
For example:
SELECT
emp_no AS employee_number -- apply column aliasing here
FROM employees
UNION
SELECT
emp_no -- aliasing here will be ignored
FROM salaries
UNION
SELECT
emp_no -- aliasing here will also be ignored
FROM titles
;
UNION
vs UNION ALL
The UNION
operator removes row-level duplications from the combined data set. To retain the duplicate rows, use the the UNION ALL
operator instead.
Let’s pretend we have two small tables.
The all_employees
table, which contains all employees’ first and last names:
emp_no | first_name | last_name
-------+------------+-----------
10001 | Georgi | Facello
10002 | Christine | Koblick
The eng_employees
table, which contains only the engineering employees’ first and last names:
emp_no | first_name | last_name
-------+------------+-----------
10002 | Christine | Koblick
A UNION
between the two tables the outcome:
SELECT *
FROM eng_employees
UNION
SELECT *
FROM all_employees
;
automatically deduplicates the outcome so that we end with:
emp_no | first_name | last_name
-------+------------+-----------
10001 | Georgi | Facello
10002 | Christine | Koblick
Whereas, a UNION ALL
between the two tables:
SELECT *
FROM eng_employees
UNION ALL
SELECT *
FROM all_employees
;
retains the duplication:
emp_no | first_name | last_name
-------+------------+-----------
10001 | Georgi | Facello
10002 | Christine | Koblick
10002 | Christine | Koblick
To note:
All conditions that apply to UNION
operators also apply to UNION ALL
operators.
Because UNION ALL
does not do deduplication, it is computationally more efficient than UNION
.
Exercise
How many rows are in the output for each of the following queries? Why do the output numbers differ so much?
SELECT emp_no
FROM employees
UNION
SELECT emp_no
FROM salaries
;
SELECT emp_no
FROM employees
UNION ALL
SELECT emp_no
FROM salaries
;
ORDER BY
and LIMIT
with UNION
s
ORDER BY
and LIMIT
are the exception to the rule inside a SQL query that contains UNION
s or UNION ALL
s.
There can only be 1 ORDER BY
and 1 LIMIT
clause in such a SQL query. It is written at the end of the entire SQL query and is evaluated after the concatenation (and possible deduplication) of data.
See below for more details.
Order of syntax
The written syntax order is as follows:
SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...]
UNION [ALL]
SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...]
UNION [ALL]
...
[ORDER BY ...]
[LIMIT ...]
;
Order of execution
Based on our existing vocabulary in SQL:
FROM
clause.WHERE
filter conditions.GROUP BY
columns.HAVING
filter conditions.SELECT
.SELECT
queries.UNION ALL
, then concatenate without deduplication of results. If UNION
then deduplicate and remove rows that are perfect duplicates.ORDER BY
.LIMIT
.FROM
-> [WHERE
] -> [GROUP BY
] -> [HAVING
] -> SELECT
-> [repeat for all queries] -> [UNION
or UNION ALL
] -> [ORDER BY
] -> [LIMIT
]
Style
Please see the SQL style guide for more details.
Breakout exercises
Ex 1. In layman’s terms, what is the following SQL query accomplishing?
SELECT
emp_no,
'min salary' AS description,
MIN(salary) AS salary
FROM salaries
WHERE emp_no BETWEEN 10001 AND 10005
GROUP BY 1, 2
UNION ALL
SELECT
emp_no,
'max salary',
MAX(salary)
FROM salaries
WHERE emp_no BETWEEN 10001 AND 10005
GROUP BY 1, 2
ORDER BY 1, 2
;
Ex 2. Label the order of execution for the following query:
SELECT
emp_no,
first_name,
last_name
FROM employees
WHERE first_name = 'Aamer'
UNION
SELECT
emp_no,
first_name,
last_name
FROM employees
WHERE emp_no BETWEEN 10001 AND 10009
ORDER BY
first_name ASC,
last_name ASC
;
Ex 3. Is there a much simpler way of writing the SQL in exercise 1 without having to use UNION
and still achieving the same output? If so, please attempt it.
Ex 4. Using just the salaries
table, could you create the following output?
Desired output:
description | salary
------------+--------
avg salary | 63811
max salary | 158220
min salary | 38623
UNION
and UNION ALL
.