Home

Topic: HAVING

Learning objective


Warm up

Recall from previously, this SQL:

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
;

which outputs the following:

title              | cnt_row
-------------------+---------
Assistant Engineer |   15128
Engineer           |  115003
Manager            |      24
Senior Engineer    |   97750
Senior Staff       |   92853
Staff              |  107391
Technique Leader   |   15159
(7 rows)

Using the WHERE clause, how do you modify the SQL query above so that the output looks like the following?

title              | cnt_row
-------------------+---------
Assistant Engineer |   15128
(1 row)

What is a HAVING clause?


The HAVING clause specifies a search condition to filter groups or aggregates returned by the GROUP BY clause.


A simple example

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING COUNT(*) = 15128    
;

returns:

title              | cnt_row
-------------------+---------
Assistant Engineer |   15128
(1 row)



How does it work?



Exercise

Fill the HAVING clause to create the desired output:

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING ???
;

Desired output:

title              | cnt_row
-------------------+---------
Engineer           |  115003
(1 row)

Order of syntax and execution


Order of syntax

The written syntax order is as follows:

SELECT ...
FROM ...
WHERE ...
GROUP BY ...
HAVING ...
LIMIT ...
;



Order of execution

Based on our existing vocabulary in SQL:

  1. PostgreSQL identifies the source table using the FROM clause.
  2. Filters rows based on WHERE filter conditions.
  3. Aggregates and groups based on GROUP BY columns.
  4. Additionally filters rows based on aggregated values using HAVING filter conditions.
  5. Returns the columns specified (or temporarily created) using SELECT.
  6. Restricts number of rows outputted with LIMIT.

FROM -> [WHERE] -> [GROUP BY] -> [HAVING] -> SELECT -> [LIMIT]



Style

Please see the SQL style guide for more details.



A simple example, step by step

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING
    COUNT(*) = 15128    
;

Step 1. SQL identifies that this query is based off of the titles table.

Step 2. Because there is no WHERE condition, SQL skips that step of filtering.

Step 3. SQL aggregates the table so that the temporary output is 1 row per unique title value.

title              | cnt_row
-------------------+---------
Assistant Engineer |   15128
Engineer           |  115003
Manager            |      24
Senior Engineer    |   97750
Senior Staff       |   92853
Staff              |  107391
Technique Leader   |   15159
(7 rows)

Step 4. SQL evaluates the aggregated condition COUNT(*) = 15128 for every row in the currently aggregated output and returns True or False for each.

title              | cnt_row | ?column?
-------------------+---------+----------
Assistant Engineer |   15128 | t
Engineer           |  115003 | f
Manager            |      24 | f
Senior Engineer    |   97750 | f
Senior Staff       |   92853 | f
Staff              |  107391 | f
Technique Leader   |   15159 | f
(7 rows)

Step 5. The temporary column is used to remove rows where the condition result is not True, leaving only 1 row.

title              | cnt_row | ?column?
-------------------+---------+----------
Assistant Engineer |   15128 | t

Step 6. SQL honors what is specified in the SELECT clause, and only returns 2 columns: the original column title and the aggregated column COUNT(*) renamed as cnt_row.

title              | cnt_row
-------------------+---------
Assistant Engineer |   15128
(1 row)

Exercise

Consider the following SQL query. Why does it return no rows?

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
WHERE title IN ('Engineer', 'Manager', 'Senior Engineer')
GROUP BY
    title
HAVING
    COUNT(*) = 15128    
;

HAVING versus WHERE filters


Key difference #1:

Key difference #2:



Aside from these key differences, HAVING clause filters are remarkably similar to WHERE clause filters.

The same list of operators used in WHERE clause can also be used in HAVING clause, although text operators (LIKE, with wildcards % and _) are rarely used.

A complete list of operators learned in the WHERE lecture:

 Operator | Description
----------+---------------
  =       | Equal
  <>, !=  | Not equal
  >       | Greater than
  <       | Less than
  >=      | Greater than or equal
  <=      | Less than or equal
  IN      | Return true if a value matches any value in a list
  BETWEEN | Return true if a value is between a range of values
  LIKE    | Return true if a value matches a pattern (using _ or %)
  IS NULL | Return true if a value is NULL
  NOT     | Negate the result of other operators  
  AND     | Logical operator AND
  OR      | Logical operator OR



A few simple examples of HAVING in action:


Return title and row count for job titles that are not held by 15,128 employees.

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING
    COUNT(*) != 15128
;


Return title and row count for job titles that are held by more than or equal to 10,000 employees.

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING
    COUNT(*) >= 10000
;


Return title and row count for job titles that are held by between 10,000 and 20,000 employees.

SELECT
    title,
    COUNT(*) AS cnt_row
FROM titles
GROUP BY
    title
HAVING
    COUNT(*) BETWEEN 10000 AND 20000
;



The logical operators AND and OR can be used to chain multiple HAVING conditions together. Keep in mind that order of operation still applies, prioritizing conditions inside parentheses.


In this example, we are only retaining employees whose minimum salary is greater than $40k and whose maximum salary is less than $45k.

SELECT
    emp_no,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM salaries
GROUP BY
    emp_no
HAVING MIN(salary) >= 40000
   AND MAX(salary) <= 45000
;
-- 13001 rows

The above is fundamentally and conceptually very different from the query below. The query below does not filter on the employee (emp_no) level, rather applying the filter errorneously on the raw table first, prior to aggregating on the employee level.

SELECT
    emp_no,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM salaries
WHERE salary >= 40000
  AND salary <= 45000
GROUP BY
    emp_no  
;  
-- 126462 rows



Finally, note that the aggregated filter condition in HAVING does not always need to be present in the SELECT clause for the filtering to take effect.


This query will return 13,001 rows, 3 columns:

SELECT
    emp_no,
    MIN(salary) AS min_salary,
    MAX(salary) AS max_salary
FROM salaries
GROUP BY
    emp_no
HAVING MIN(salary) >= 40000
   AND MAX(salary) <= 45000
;

This query will also return the same 13,001 rows, albeit with only 1 column (emp_no):

SELECT
    emp_no
FROM salaries
GROUP BY
    emp_no
HAVING MIN(salary) >= 40000
   AND MAX(salary) <= 45000
;

Exercise

  1. Fill in the ??? with WHERE or HAVING.
      SELECT
       gender,
       SUM(1) As num_rows
      FROM employees
      ??? gender = 'F'
      GROUP BY 1
      ;
    
  2. Fill in the ??? with WHERE or HAVING.
      SELECT
       title,
       COUNT(*) AS cnt_rows,
       COUNT(DISTINCT emp_no) AS cnt_emps
      FROM titles
      GROUP BY
       title
      ??? COUNT(*) BETWEEN 10000 AND 20000
      ;
    
  3. Fill in the ??? with WHERE or HAVING.
      SELECT
       EXTRACT(year FROM birth_date) AS birth_year,
       COUNT(DISTINCT emp_no) AS num_employees
      FROM employees
      ??? EXTRACT(year FROM birth_date) = 1961
      GROUP BY
       EXTRACT(year FROM birth_date)
      ??? SUM(1) > 10000    
      ;
    
  4. Using the employees table, output a list of first names that are shared among the employees across more than 200 people. (e.g. the name 'Aamer' is shared by 228 people in the company and should be included in this list.)

Resources


Questions?