HAVING
HAVING
filters in SQLHAVING
and when to use WHERE
Recall from previously, this SQL:
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
;
which outputs the following:
title | cnt_row
-------------------+---------
Assistant Engineer | 15128
Engineer | 115003
Manager | 24
Senior Engineer | 97750
Senior Staff | 92853
Staff | 107391
Technique Leader | 15159
(7 rows)
Using the WHERE
clause, how do you modify the SQL query above so that the output looks like the following?
title | cnt_row
-------------------+---------
Assistant Engineer | 15128
(1 row)
HAVING
clause?The HAVING
clause specifies a search condition to filter groups or aggregates returned by the GROUP BY
clause.
A simple example
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING COUNT(*) = 15128
;
returns:
title | cnt_row
-------------------+---------
Assistant Engineer | 15128
(1 row)
How does it work?
The HAVING
condition must evaluate to True
or False
. It can be a Boolean expression or a combination of Boolean expressions.
The SQL query will only return results that satisfy the condition in the HAVING
clause.
Exercise
Fill the HAVING
clause to create the desired output:
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING ???
;
Desired output:
title | cnt_row
-------------------+---------
Engineer | 115003
(1 row)
Order of syntax
The written syntax order is as follows:
SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
LIMIT ...
;
Order of execution
Based on our existing vocabulary in SQL:
FROM
clause.WHERE
filter conditions.GROUP BY
columns.HAVING
filter conditions.SELECT
.LIMIT
.FROM
-> [WHERE
] -> [GROUP BY
] -> [HAVING
] -> SELECT
-> [LIMIT
]
Style
Please see the SQL style guide for more details.
A simple example, step by step
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING
COUNT(*) = 15128
;
Step 1. SQL identifies that this query is based off of the titles
table.
Step 2. Because there is no WHERE
condition, SQL skips that step of filtering.
Step 3. SQL aggregates the table so that the temporary output is 1 row per unique title value.
title | cnt_row
-------------------+---------
Assistant Engineer | 15128
Engineer | 115003
Manager | 24
Senior Engineer | 97750
Senior Staff | 92853
Staff | 107391
Technique Leader | 15159
(7 rows)
Step 4. SQL evaluates the aggregated condition COUNT(*) = 15128
for every row in the currently aggregated output and returns True or False for each.
title | cnt_row | ?column?
-------------------+---------+----------
Assistant Engineer | 15128 | t
Engineer | 115003 | f
Manager | 24 | f
Senior Engineer | 97750 | f
Senior Staff | 92853 | f
Staff | 107391 | f
Technique Leader | 15159 | f
(7 rows)
Step 5. The temporary column is used to remove rows where the condition result is not True, leaving only 1 row.
title | cnt_row | ?column?
-------------------+---------+----------
Assistant Engineer | 15128 | t
Step 6. SQL honors what is specified in the SELECT
clause, and only returns 2 columns: the original column title
and the aggregated column COUNT(*)
renamed as cnt_row
.
title | cnt_row
-------------------+---------
Assistant Engineer | 15128
(1 row)
Consider the following SQL query. Why does it return no rows?
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
WHERE title IN ('Engineer', 'Manager', 'Senior Engineer')
GROUP BY
title
HAVING
COUNT(*) = 15128
;
HAVING
versus WHERE
filtersKey difference #1:
WHERE
clause is applied to rows.HAVING
clause is applied to groups of rows.Key difference #2:
WHERE
clause can operate with or without GROUP BY
.HAVING
clause requires prior aggregation with GROUP BY
to operate.
Aside from these key differences, HAVING
clause filters are remarkably similar to WHERE
clause filters.
The same list of operators used in WHERE
clause can also be used in HAVING
clause, although text operators (LIKE
, with wildcards %
and _
) are rarely used.
A complete list of operators learned in the WHERE
lecture:
Operator | Description
----------+---------------
= | Equal
<>, != | Not equal
> | Greater than
< | Less than
>= | Greater than or equal
<= | Less than or equal
IN | Return true if a value matches any value in a list
BETWEEN | Return true if a value is between a range of values
LIKE | Return true if a value matches a pattern (using _ or %)
IS NULL | Return true if a value is NULL
NOT | Negate the result of other operators
AND | Logical operator AND
OR | Logical operator OR
A few simple examples of HAVING
in action:
Return title and row count for job titles that are not held by 15,128 employees.
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING
COUNT(*) != 15128
;
Return title and row count for job titles that are held by more than or equal to 10,000 employees.
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING
COUNT(*) >= 10000
;
Return title and row count for job titles that are held by between 10,000 and 20,000 employees.
SELECT
title,
COUNT(*) AS cnt_row
FROM titles
GROUP BY
title
HAVING
COUNT(*) BETWEEN 10000 AND 20000
;
The logical operators AND
and OR
can be used to chain multiple HAVING
conditions together. Keep in mind that order of operation still applies, prioritizing conditions inside parentheses.
In this example, we are only retaining employees whose minimum salary is greater than $40k and whose maximum salary is less than $45k.
SELECT
emp_no,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM salaries
GROUP BY
emp_no
HAVING MIN(salary) >= 40000
AND MAX(salary) <= 45000
;
-- 13001 rows
The above is fundamentally and conceptually very different from the query below. The query below does not filter on the employee (emp_no
) level, rather applying the filter errorneously on the raw table first, prior to aggregating on the employee level.
SELECT
emp_no,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM salaries
WHERE salary >= 40000
AND salary <= 45000
GROUP BY
emp_no
;
-- 126462 rows
Finally, note that the aggregated filter condition in HAVING
does not always need to be present in the SELECT
clause for the filtering to take effect.
This query will return 13,001 rows, 3 columns:
SELECT
emp_no,
MIN(salary) AS min_salary,
MAX(salary) AS max_salary
FROM salaries
GROUP BY
emp_no
HAVING MIN(salary) >= 40000
AND MAX(salary) <= 45000
;
This query will also return the same 13,001 rows, albeit with only 1 column (emp_no
):
SELECT
emp_no
FROM salaries
GROUP BY
emp_no
HAVING MIN(salary) >= 40000
AND MAX(salary) <= 45000
;
WHERE
or HAVING
.
SELECT
gender,
SUM(1) As num_rows
FROM employees
??? gender = 'F'
GROUP BY 1
;
WHERE
or HAVING
.
SELECT
title,
COUNT(*) AS cnt_rows,
COUNT(DISTINCT emp_no) AS cnt_emps
FROM titles
GROUP BY
title
??? COUNT(*) BETWEEN 10000 AND 20000
;
WHERE
or HAVING
.
SELECT
EXTRACT(year FROM birth_date) AS birth_year,
COUNT(DISTINCT emp_no) AS num_employees
FROM employees
??? EXTRACT(year FROM birth_date) = 1961
GROUP BY
EXTRACT(year FROM birth_date)
??? SUM(1) > 10000
;
employees
table, output a list of first names that are shared among the employees across more than 200 people. (e.g. the name 'Aamer'
is shared by 228 people in the company and should be included in this list.)HAVING
filters.