Home

Topic: UNION and UNION ALL

Learning objective


What is UNION?


The UNION operator is used to combine data result sets from more than one SELECT statements into a single unified output.

Whereas SQL JOIN combines data vertically by merging based on key columns, UNION combines data horizontally through column positions.



A simple example

Let’s pretend we have two small tables.

The male_employees table, which contains male employees’ first and last names:

emp_no | first_name | last_name
-------+------------+-----------
 10001 | Georgi     | Facello   

The female_employees table, which contains female employees’ first and last names:

emp_no | first_name | last_name
-------+------------+-----------
 10002 | Christine  | Koblick   

A UNION between the two tables is written as thus:

SELECT *
FROM male_employees

UNION

SELECT *
FROM female_employees
;

which horizontally concatenates the data from both tables and displays as a single output:

emp_no | first_name | last_name
-------+------------+-----------
 10001 | Georgi     | Facello   
 10002 | Christine  | Koblick    




UNION operator constraints

The UNION operator will only work under two conditions.


Condition 1. All SELECT list of queries under the same UNION query must have the same number of columns, and the columns must be presented in the same order.

The following is correct:

SELECT
  emp_no,
  first_name,
  last_name
FROM male_employees

UNION

SELECT
  emp_no,
  first_name,
  last_name
FROM female_employees
;

The following will not work because the order of columns is switched:

SELECT
  emp_no,
  first_name,
  last_name
FROM male_employees

UNION

SELECT
  first_name,
  last_name,
  emp_no
FROM female_employees
;

The following also will not work because the first SELECT contains 2 columns while the second SELECT contains 3 columns:

SELECT
  first_name,
  last_name
FROM male_employees

UNION

SELECT
  emp_no,
  first_name,
  last_name
FROM female_employees
;



Condition 2. The data types of all corresponding columns being horizontally stacked together must be compatible.

Using the employees database as example, the following is syntactically correct, not so much because both columns are called emp_no, but rather that both columns are of the same data type (integer).

SELECT emp_no
FROM employees

UNION

SELECT emp_no
FROM salaries
;


The following is also syntactically correct, since both columns are of the same data type (date), although it might not make much sense logically.

SELECT to_date
FROM salaries

UNION

SELECT from_date
FROM salaries
;



Sourcing

UNION does not always preserve the row order of the source tables. In order to keep track of which row came from which SELECT query, it is possible to do so by creating a source or description column for record keeping.

For example:

SELECT
  to_date,
  'to_date' AS source
FROM salaries

UNION

SELECT
  from_date,
  'from_date' AS source
FROM salaries
;



Column alias-ing

To alias column names in a query that contains UNION clauses, apply the alias-ing to the first of the SELECT queries.

For example:

SELECT
  emp_no AS employee_number -- apply column aliasing here
FROM employees

UNION

SELECT
  emp_no -- aliasing here will be ignored
FROM salaries

UNION

SELECT
  emp_no -- aliasing here will also be ignored
FROM titles
;    


UNION vs UNION ALL


The UNION operator removes row-level duplications from the combined data set. To retain the duplicate rows, use the the UNION ALL operator instead.

Let’s pretend we have two small tables.

The all_employees table, which contains all employees’ first and last names:

emp_no | first_name | last_name
-------+------------+-----------
 10001 | Georgi     | Facello   
 10002 | Christine  | Koblick    

The eng_employees table, which contains only the engineering employees’ first and last names:

emp_no | first_name | last_name
-------+------------+-----------
 10002 | Christine  | Koblick   

A UNION between the two tables the outcome:

SELECT *
FROM eng_employees

UNION

SELECT *
FROM all_employees
;

automatically deduplicates the outcome so that we end with:

emp_no | first_name | last_name
-------+------------+-----------
 10001 | Georgi     | Facello   
 10002 | Christine  | Koblick    


Whereas, a UNION ALL between the two tables:

SELECT *
FROM eng_employees

UNION ALL

SELECT *
FROM all_employees
;

retains the duplication:

emp_no | first_name | last_name
-------+------------+-----------
 10001 | Georgi     | Facello   
 10002 | Christine  | Koblick    
 10002 | Christine  | Koblick     



To note:

  1. All conditions that apply to UNION operators also apply to UNION ALL operators.

  2. Because UNION ALL does not do deduplication, it is computationally more efficient than UNION.




Exercise

How many rows are in the output for each of the following queries? Why do the output numbers differ so much?

SELECT emp_no
FROM employees

UNION

SELECT emp_no
FROM salaries
;
SELECT emp_no
FROM employees

UNION ALL

SELECT emp_no
FROM salaries
;

Order of syntax and execution


ORDER BY and LIMIT with UNIONs

ORDER BY and LIMIT are the exception to the rule inside a SQL query that contains UNIONs or UNION ALLs.

There can only be 1 ORDER BY and 1 LIMIT clause in such a SQL query. It is written at the end of the entire SQL query and is evaluated after the concatenation (and possible deduplication) of data.

See below for more details.




Order of syntax

The written syntax order is as follows:

SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...]

UNION [ALL]

SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...]
[HAVING ...]

UNION [ALL]
...

[ORDER BY ...]
[LIMIT ...]
;




Order of execution

Based on our existing vocabulary in SQL:

  1. PostgreSQL identifies the source table using the FROM clause.
  2. Filters rows based on WHERE filter conditions.
  3. Aggregates and groups based on GROUP BY columns.
  4. Additionally filters rows based on aggregated values using HAVING filter conditions.
  5. Returns the columns specified (or temporarily created) using SELECT.
  6. Repeats steps #1 - #5 for all subsequent SELECT queries.
  7. If UNION ALL, then concatenate without deduplication of results. If UNION then deduplicate and remove rows that are perfect duplicates.
  8. Sorts the order of rows using columns specified in ORDER BY.
  9. Restricts number of rows outputted with LIMIT.

FROM -> [WHERE] -> [GROUP BY] -> [HAVING] -> SELECT -> [repeat for all queries] -> [UNION or UNION ALL] -> [ORDER BY] -> [LIMIT]




Style

Please see the SQL style guide for more details.




Breakout exercises

Ex 1. In layman’s terms, what is the following SQL query accomplishing?

SELECT
  emp_no,
  'min salary' AS description,
  MIN(salary) AS salary
FROM salaries
WHERE emp_no BETWEEN 10001 AND 10005
GROUP BY 1, 2

UNION ALL

SELECT
  emp_no,
  'max salary',
  MAX(salary)
FROM salaries
WHERE emp_no BETWEEN 10001 AND 10005
GROUP BY 1, 2

ORDER BY 1, 2
;


Ex 2. Label the order of execution for the following query:

SELECT
  emp_no,
  first_name,
  last_name
FROM employees
WHERE first_name = 'Aamer'

UNION

SELECT
  emp_no,
  first_name,
  last_name
FROM employees
WHERE emp_no BETWEEN 10001 AND 10009

ORDER BY
  first_name ASC,
  last_name ASC
;


Ex 3. Is there a much simpler way of writing the SQL in exercise 1 without having to use UNION and still achieving the same output? If so, please attempt it.


Ex 4. Using just the salaries table, could you create the following output?

Desired output:

description | salary
------------+--------
avg salary  |  63811
max salary  | 158220
min salary  |  38623

Resources


Questions?