Home

Lecture 6: Conditional expressions

Learning objective


Warm up

Calculate the employee tenure time by differencing between from_date and to_date in the dept_emp table using the AGE() function.

Why are some of the calculated values so large?

SELECT
    from_date,
    to_date,
    ?? AS employee_tenure
FROM dept_emp
LIMIT 30   
;

Conditional expressions

In this lecture, we will go over conditional functions and expressions available in PostgreSQL. These are the equivalent of writing IF .. THEN .. in other languages.

The three we will cover are:

PostgreSQL documentation can be found at: https://www.postgresql.org/docs/current/functions-conditional.html.


NULLIF()

NULLIF(value1, value2)

The NULLIF() function returns a NULL value if value1 equals value2; otherwise it returns value1.

This is the equivalent of pseudo code:

if value1 == value2:
  return NULL

else:
  return value1

An example of NULLIF():

SELECT
    dept_name,
    NULLIF(dept_name, 'Customer Service') AS dept_name_wo_customer_service
FROM departments
;

Outputs the following:

dept_name          | dept_name_wo_customer_service
-------------------+-------------------------------
Customer Service   |
Development        | Development
Finance            | Finance
Human Resources    | Human Resources
Marketing          | Marketing
Production         | Production
Quality Management | Quality Management
Research           | Research
Sales              | Sales

Exercise:

Use NULLIF() to create a new column in which if the row value in to_date is equal to '9999-01-01' then let the value in the new column be null, otherwise default back to the original value in to_date.

SELECT
    to_date,
    ???
FROM dept_emp
LIMIT 30
;

COALESCE()

The COALESCE() function accepts an unlimited number of arguments. It returns the first argument that is not null. If all arguments are null, the COALESCE() function will return null.

In many ways, COALESCE() is the inverse of IFNULL().

The syntax for COALESCE() is as follows:

COALESCE (argument_1, argument_2, …)

This is the equivalent of pseudo code:

if argument_1 != NULL:
  return argument_1

elif argument_2 != NULL:
  return argument_2
...

else:
  return NULL

Consider the following SQL:

SELECT
    dept_name,
    NULLIF(dept_name, 'Customer Service') AS dept_name_wo_customer_service,
    COALESCE(NULLIF(dept_name, 'Customer Service'), 'CS') AS dept_name_w_cs
FROM departments
;

Outputs the following:

dept_name          | dept_name_wo_customer_service |   dept_name_w_cs
-------------------+-------------------------------+--------------------
Customer Service   |                               | CS
Development        | Development                   | Development
Finance            | Finance                       | Finance
Human Resources    | Human Resources               | Human Resources
Marketing          | Marketing                     | Marketing
Production         | Production                    | Production
Quality Management | Quality Management            | Quality Management
Research           | Research                      | Research
Sales              | Sales                         | Sales

Breakout Exercise:

  1. In English, what is the SQL logic for to_date_revised doing?
      SELECT
       from_date,
       to_date,
       COALESCE(NULLIF(to_date, '9999-01-01'), CURRENT_DATE) AS to_date_revised
      FROM dept_emp
      LIMIT 30
      ;
    
  2. If you take the datetime difference between from_date and the newly created to_date_revised column in question 1, does this new and improved employee tenure column make sense now? Write the SQL for it.

CASE .. WHEN .. statements

The PostgreSQL CASE .. WHEN .. expression is the same as IF/THEN/ELSE statements in other programming languages.

The CASE .. WHEN .. expression has two writing forms:



1. The general form

The general form of the CASE .. WHEN .. statement is as follows:

CASE
  WHEN condition_1 THEN result_1
  WHEN condition_2 THEN result_2
  [WHEN ...]
  [ELSE else_result]
END

In this syntax, each condition (condition_1, condition_2…) is a boolean expression that returns either true or false.

When a condition evaluates to false, the CASE expression evaluates the next condition from the top to bottom until it finds a condition that evaluates to true.

If a condition evaluates to true, the CASE expression returns the corresponding result that follows the condition. For example, if the condition_2 evaluates to true, the CASE expression returns the result_2. Also, it immediately stops evaluating the next expression.

In case all conditions evaluate to false, the CASE expression returns the result (else_result) that follows the ELSE keyword. If you omit the ELSE clause, the CASE expression returns NULL.



A beginner’s example:

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
    END AS dept_name_short
FROM departments
;    



An example with ELSE:

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
      ELSE 'not CS'
    END AS dept_name_short
FROM departments
;    

Note that the ELSE does not have to be a string literal. Here, we can default the ELSE to the original values in the dept_name column.

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
      ELSE dept_name
    END AS dept_name_short
FROM departments
;    



Exercise:

Convert the following pseudo code logic to SQL to create the new column called dept_name_short:

if dept_name == 'Customer Service':
  return 'CS'

else:
  return ''

Fill in the SQL below:

SELECT
    dept_name,
    ??? AS dept_name_short
FROM departments
;    



An example with multiple WHEN’s:

Now let’s look at an example with multiple conditions to evaluate.

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
      WHEN dept_name = 'Development' THEN 'Dev'
      ELSE 'N/A'
    END AS dept_name_short
FROM departments
;   

In pseudo code, this is the same as:

if dept_name == 'Customer Service':
  return 'CS'

elif dept_name == 'Development':
  return 'Dev'

else:
  return 'N/A'



Exercise:

Write the SQL query that generates the output below based on the column dept_name in the departments table:

dept_name          | dept_name_short
-------------------+-----------------
Customer Service   | CS
Development        | Dev
Finance            | Fnce
Human Resources    | HR
Marketing          | Mktg
Production         | Prod
Quality Management | QA
Research           | R&D
Sales              | Sales

Fill in the SQL below:

SELECT
    dept_name,
    ??? AS dept_name_short
FROM departments
;    



2. The simple form (an alternative way to write simple CASE WHEN statements)

PostgreSQL provides another form of the CASE expression called the simple form, which is as follows:

CASE expression
  WHEN value_1 THEN result_1
  WHEN value_2 THEN result_2
  [WHEN ...]
  [ELSE else_result]
END

The CASE first evaluates the expression and compares the result with each value( value_1, value_2, …) in the WHEN clauses sequentially until it finds the match.

Once the result of the expression equals a value (value1, value2, etc.) in a WHEN clause, the CASE returns the corresponding result in the THEN clause.

If CASE does not find any matches, it returns the else_result which follows the ELSE, or NULL value if the ELSE is not available.



A beginner’s example:

The general form:

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
    END as dept_name_short
FROM departments
;  

The simple form

SELECT
    dept_name,
    CASE dept_name
      WHEN 'Customer Service' THEN 'CS'
    END as dept_name_short
FROM departments
;    



A more complex example with multiple conditions and an ELSE:

With more conditions, the simple form is more concise to type:

The general form:

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service' THEN 'CS'
      WHEN dept_name = 'Development' THEN 'Dev'
      ELSE 'N/A'
    END as dept_name_short
FROM departments
;   

The simple form:

SELECT
    dept_name,
    CASE dept_name
      WHEN 'Customer Service' THEN 'CS'
      WHEN 'Development' THEN 'Dev'
      ELSE 'N/A'
    END as dept_name_short
FROM departments
;    

Regardless of the form, the psuedo code itself is still the same, and so is the output:

    dept_name      | dept_name_short
-------------------+-----------------
Customer Service   | CS
Development        | Dev
Finance            | N/A
Human Resources    | N/A
Marketing          | N/A
Production         | N/A
Quality Management | N/A
Research           | N/A
Sales              | N/A
(9 rows)



Exercise:

Re-write the following SQL, which uses the general form of CASE .. WHEN into the simple form.

SELECT
    dept_name,
    CASE
      WHEN dept_name = 'Customer Service'   THEN 'CS'
      WHEN dept_name = 'Development'        THEN 'Dev'
      WHEN dept_name = 'Finance'            THEN 'Fnce'
      WHEN dept_name = 'Human Resources'    THEN 'HR'
      WHEN dept_name = 'Marketing'          THEN 'Mktg'
      WHEN dept_name = 'Production'         THEN 'Prod'
      WHEN dept_name = 'Quality Management' THEN 'QA'
      WHEN dept_name = 'Research'           THEN 'R&D'
      ELSE dept_name
    END AS dept_name_short
FROM departments
;   



Breakout Exercise:

Please write the SQL that will create the following output using the departments table:

dept_no |     dept_name      | dept_name_bundled
--------+--------------------+-------------------
d009    | Customer Service   | Business
d005    | Development        | Engineering
d002    | Finance            | Business
d003    | Human Resources    | Business
d001    | Marketing          | Business
d004    | Production         | Engineering
d006    | Quality Management | Engineering
d008    | Research           | Engineering
d007    | Sales              | Business

CASE .. WHEN .. with aggregation functions

On many occasions, it’s helpful to combine the conditional logic from CASE .. WHEN .. with aggregation functions such as SUM(), MIN(), MAX(), etc.



Consider the following:

This SQL query returns the number of rows in the table salaries by creating a temporary column of 1’s for each row, then summing down that row.

SELECT
  SUM(1) AS cnt_rows
FROM salaries
;

However, if we want to apply conditional logic to the temporary column so that the value is only equal to 1 if the condition is met (e.g. emp_no = 90869), we can first do the following:

SELECT
  emp_no,
  CASE
    WHEN emp_no = 90689 THEN 1
    ELSE 0
  END AS flg_emp_90689
FROM salaries
LIMIT 10 -- this is just to limit the printout
;

Which gives an output like this:

emp_no | flg_emp_90689
--------+---------------
 90689 |             1
 90689 |             1
 90689 |             1
 90690 |             0
 90691 |             0
 90691 |             0
 90691 |             0
 90691 |             0
 90691 |             0
 90691 |             0
(10 rows)

Then, we can apply the aggregation function on top of this column created by conditional logic to calculate how many rows in the salaries table have the emp_no equal to 90689.

SELECT
  SUM(
    CASE
      WHEN emp_no = 90689 THEN 1
      ELSE 0
    END
    ) AS cnt_emp_90689_occurences
FROM salaries
;



Exercise:

Run the following SQL. In English, what does it do?

SELECT
  SUM(
    CASE
      WHEN gender = 'M' THEN 1
      ELSE 0
    END
    ) AS cnt_emp_male,
  SUM(
    CASE
      WHEN gender = 'F' THEN 1
      ELSE 0
    END
    ) AS cnt_emp_female    
FROM employees
;



More aggregation functions beyond SUM()

The combination of conditional logic with aggregation functions does not just stop with SUM() and zeroes and ones. All aggregation functions are fair game.



Consider the following:

We can create a new column where if the emp_no = 90589 then retain the original value from the same row salary column, otherwise let the value be 0.

SELECT
  emp_no,
  salary,
  CASE
    WHEN emp_no = 90689 THEN salary
    ELSE 0
  END AS flg_emp_90689_salary
FROM salaries
LIMIT 10 -- this is just to limit the printout
;

Which gives an output like this:

emp_no | salary | flg_emp_90689_salary
--------+--------+----------------------
 90689 |  98100 |                98100
 90689 | 101016 |               101016
 90689 | 102549 |               102549
 90690 |  40000 |                    0
 90691 |  65744 |                    0
 90691 |  66904 |                    0
 90691 |  71074 |                    0
 90691 |  71198 |                    0
 90691 |  74916 |                    0
 90691 |  74444 |                    0
(10 rows)

Which then can be aggregated, like so:

SELECT
  SUM(
    CASE
      WHEN emp_no = 90689 THEN salary
      ELSE 0
    END
    ) AS emp_90689_salary
FROM salaries
;

Note however, that SUM(), like all aggregation functions, simply ignore NULL values. That is why, the ELSE clause is not actually necessary, and the logic will return the same value.

SELECT
  SUM(
    CASE
      WHEN emp_no = 90689 THEN salary
    END
    ) AS emp_90689_salary
FROM salaries
;



Breakout Exercises:

  1. In English, what does the following SQL do?
      SELECT
       MIN(
         CASE
           WHEN emp_no = 90689 THEN salary
           ELSE 0
         END
       ) AS emp_90689_min_salary,
       MAX(
         CASE
           WHEN emp_no = 90689 THEN salary
           ELSE 0
         END
       ) AS emp_90689_max_salary
      FROM salaries
      ;
    
  2. In English, what does the following SQL do?
      SELECT
       SUM(
         CASE
           WHEN title like '%Staff%'
           THEN 1
           ELSE 0
         END
       ) AS cnt_staff
      FROM titles
      ;
    
  3. Write a SQL query that returns the number of rows in the employees table where the birth_date is equal to '1961-01-01'.

  4. Using the salaries table, write a SQL query that returns the highest salary for employees employed in 2000. You can use the from_date to represent employment date and to extract the employment year.

Resources


Wrapping Up