Home

Lecture 8: WHERE

Learning objective


Warm up

Run this SQL from last lecture:

SELECT
  SUM(
    CASE
      WHEN emp_no = 90689 THEN 1
      ELSE 0
    END
    ) AS cnt_emp_90689_occurences
FROM salaries
;

Now run this:

SELECT *
FROM salaries
WHERE emp_no = 90689
;

By inspecting the output, how are these two queries related?


How does a WHERE filter work?

The SELECT statement, by default, returns all rows from one or more columns in a table (unless it is an aggregation function). To select and filter rows that satisfy a specified condition, you use a WHERE clause.

A simple example

SELECT *
FROM salaries
WHERE emp_no = 90689
;

Order of syntax

The syntax order is as follows:

SELECT ...
FROM table_name
WHERE condition
[LIMIT n_rows]
;

Order of evaluation

Based on our existing vocabulary in SQL:

PostgreSQL evaluates the WHERE clause after the FROM clause and before the SELECT clause and, if present, the LIMIT clause:

FROM -> [WHERE] -> SELECT -> [LIMIT]

Style

Please see the SQL style guide for more details.

How does it work

A simple example, step by step

SELECT *
FROM salaries
WHERE emp_no = 90689
;

Step 1. SQL identify that this query is based off of the salaries table.

Step 2. SQL evaluates the condition emp_no = 90689 for every row in the table salaries and returns True or False for each row.

emp_no | salary | from_date  |  to_date   | ?column?
-------+--------+------------+------------+----------
 90689 |  98100 | 2000-02-16 | 2001-02-15 | t
 90689 | 101016 | 2001-02-15 | 2002-02-15 | t
 90689 | 102549 | 2002-02-15 | 9999-01-01 | t
 90690 |  40000 | 1997-12-02 | 1998-11-06 | f
 90691 |  65744 | 1997-01-22 | 1998-01-22 | f
 90691 |  66904 | 1998-01-22 | 1999-01-22 | f
 90691 |  71074 | 1999-01-22 | 2000-01-22 | f
 90691 |  71198 | 2000-01-22 | 2001-01-21 | f
 90691 |  74916 | 2001-01-21 | 2002-01-21 | f
 90691 |  74444 | 2002-01-21 | 9999-01-01 | f

 <output truncated to 10 rows for viewability>

Step 3. The temporary column is used to remove rows where the condition result is not True, leaving only 7 rows.

emp_no | salary | from_date  |  to_date   | ?column?
--------+--------+------------+------------+----------
 90689 |  98100 | 2000-02-16 | 2001-02-15 | t
 90689 | 101016 | 2001-02-15 | 2002-02-15 | t
 90689 | 102549 | 2002-02-15 | 9999-01-01 | t
 90689 |  90196 | 1996-02-17 | 1997-02-16 | t
 90689 |  90403 | 1997-02-16 | 1998-02-16 | t
 90689 |  94238 | 1998-02-16 | 1999-02-16 | t
 90689 |  94931 | 1999-02-16 | 2000-02-16 | t
(7 rows)

Step 4. SQL honors what is specified in the SELECT clause, and only returns SELECT *, which is all columns in the order of how they are stored. The temporary column is not returned.

emp_no | salary | from_date  |  to_date
-------+--------+------------+------------
 90689 |  98100 | 2000-02-16 | 2001-02-15
 90689 | 101016 | 2001-02-15 | 2002-02-15
 90689 | 102549 | 2002-02-15 | 9999-01-01
 90689 |  90196 | 1996-02-17 | 1997-02-16
 90689 |  90403 | 1997-02-16 | 1998-02-16
 90689 |  94238 | 1998-02-16 | 1999-02-16
 90689 |  94931 | 1999-02-16 | 2000-02-16
(7 rows)

Using operators in WHERE filters

WHERE filters depend on operators to evaluate conditions to True or False.

A complete list of operators:

 Operator | Description
----------+---------------
  =       | Equal
  <>, !=  | Not equal
  >       | Greater than
  <       | Less than
  >=      | Greater than or equal
  <=      | Less than or equal
  IN      | Return true if a value matches any value in a list
  BETWEEN | Return true if a value is between a range of values
  LIKE    | Return true if a value matches a pattern (using _ or %)
  IS NULL | Return true if a value is NULL
  NOT     | Negate the result of other operators  
  AND     | Logical operator AND
  OR      | Logical operator OR




The = equal operator

Text comparisons are case sensitive:

SELECT *
FROM departments
WHERE dept_name = 'Customer Service'
;

Numeric comparisons do not need quotes:

SELECT *
FROM employees
WHERE emp_no = 10001
;

Date comparisons do need quotes:

SELECT *
FROM employees
WHERE birth_date = '1958-02-19'
;



The != or <> not equal operator

Both != and <> work the same to indicate not equal to.

SELECT *
FROM departments
WHERE dept_name != 'Customer Service'
;
SELECT *
FROM departments
WHERE dept_name <> 'Customer Service'
;



The inequality operators: >, <, >=, <=

Inequality operators are typically only used on numeric or datetime columns.

A numeric inequality comparison:

SELECT *
FROM employees
WHERE emp_no <= 10007
;

A datetime inequality comparison:

SELECT *
FROM employees
WHERE birth_date > '1965-01-01'
;

If applied to text columns, SQL will sort values alphabetically.

However, this is not best practice and is discouraged.



Find missing data using IS NULL, IS NOT NULL

NULL means missing information or not applicable. NULL is not a value, therefore, you cannot compare it with any other values like numbers or strings or use the = equal operator.

To check whether a value is NULL or not, you use the IS NULL operator instead. The syntax is as follows. The expression returns true if the value is NULL or false if it is not.

value IS NULL

This query returns all in the employees table where the first name is missing. This returns no rows because all rows in the fierst_name column is populated.

SELECT *
FROM employees
WHERE first_name IS NULL
;

IS NOT NULL is the inverse of IS NULL:

value IS NOT NULL

Since the query above returned zero rows, the inverse will return all rows in the employees tables.

SELECT *
FROM employees
WHERE first_name IS NOT NULL
;



Exercises

  1. Return all rows and columns in the employees table where the birth date is 1958-02-19.
  2. Return all rows and columns in the employees table where the first name is Duangkaew.
  3. How many rows in the employees table have the value 'M' in the gender column?
  4. Can you write the SQL to question 3 in another way and still get the same answer?
  5. How many employees (in the employees table) are born after January 1, 1965?
  6. How many employees (in the employees table) have an employee ID number less than or equal to 10007?
  7. How many employees (in the employees table) do not have an employee ID number?

Using text operator LIKE with WHERE

The LIKE operator is useful for partial pattern matching with text columns.



A simple example:

SELECT *
FROM employees
WHERE first_name LIKE 'A%'
;

The syntax of LIKE operator is as follows. The expression returns True if the value matches the pattern.

value LIKE pattern

PostgreSQL provides you with two wildcards:



This query returns all in the employees table where the first name starts with the capitalized letter 'A'.

SELECT *
FROM employees
WHERE first_name LIKE 'A%'
;

This query returns all in the employees table where the first name starts with the capitalized letter 'A' but then has only 1 other letter following the 'A' in their first name.

SELECT *
FROM employees
WHERE first_name LIKE 'A_'
;

This query returns all in the employees table where the first name starts with the capitalized letter 'A' but then has 3 other letters following the 'A' in their first name.

SELECT *
FROM employees
WHERE first_name LIKE 'A___'
;



Without any wildcards, the LIKE operator gets treated like an = equal operator.

SELECT *
FROM employees
WHERE first_name LIKE 'A'
;

is the same as

SELECT *
FROM employees
WHERE first_name = 'A'
;



Wildcards can be used in any part of the string.

SELECT *
FROM employees
WHERE first_name LIKE '%er%'
;

Wildcards % and _ can be used in conjunction with each other.

SELECT *
FROM employees
WHERE first_name LIKE '_er%'
;



To negate the LIKE operator, you use the NOT operator as follows. The NOT LIKE operator returns true when the value does not match the pattern.

value NOT LIKE pattern

This query will return all in the employees table where the first name does NOT start with 'A'.

SELECT *
FROM employees
WHERE first_name NOT LIKE 'A%'
;



For more information on PostgreSQL specific text comparison operators, check out Postgresqltutorial.



Note: The wildcard % used inside the string literal is not to be confused the arithmetic modulo division operator %, which is used to calculate the remainder of an integer division.

This returns all rows where the salary number is even.

SELECT *
FROM salaries
WHERE salary % 2 = 0
;



Exercises

  1. Return all rows and columns in the employees table where the first name starts with 'G'.
  2. Return all rows and columns in the employees table where the first name starts with 'G' and is five letters long.
  3. Return all rows and columns in the employees table where the first name starts with 'G' and ends in an 'i'.

Combining multiple WHERE conditions

The logical operators AND and OR can be used to chain multiple WHERE conditions together.

This query returns all in the employees table only if the first name is 'Georgi' AND the last name is 'Facello'. Both conditions must be met for the row to be returned.

SELECT *
FROM employees
WHERE first_name = 'Georgi'
  AND last_name = 'Facello'
;

This query returns all in the employees table where the first name can be either 'Georgi' or 'Parto'. Either of the conditions can be met for the row to be returned.

SELECT *
FROM employees
WHERE first_name = 'Georgi'
   OR first_name = 'Parto'
;

Note: The styling for multiple WHERE conditions here follows the river style. This is up to your preference. An indent of 4 spaces or 1 tab also works well.




When multiple conditions all refer to the same column or expression, they can be shortened and combined with the IN operator.

The following query:

SELECT *
FROM employees
WHERE first_name = 'Georgi'
   OR first_name = 'Parto'
;

can be re-written as:

SELECT *
FROM employees
WHERE first_name IN ('Georgi', 'Parto')
;




Similarly, numeric comparison conditions could also be combined using IN operators.

The following query:

SELECT *
FROM employees
WHERE emp_no = 10001
   OR emp_no = 10002
   OR emp_no = 10003
;   

can be re-written as:

SELECT *
FROM employees
WHERE emp_no IN (10001, 10002, 10003)
;




The operator BETWEEN is used to match a value against a range of values.

The syntax:

WHERE value BETWEEN low AND high

Is equivalent to:

WHERE value >= low
  AND value <= high

The following query:

SELECT *
FROM employees
WHERE hire_date >= '1986-01-01'
  AND hire_date <= '1986-12-31'
;

can be re-written as:

SELECT *
FROM employees
WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31'
;




Note: The NOT operator can be used to denote the inverse of the condition.

The inverse of:

SELECT *
FROM employees
WHERE emp_no IN (10001, 10002, 10003)
;

is:

SELECT *
FROM employees
WHERE emp_no NOT IN (10001, 10002, 10003)
;



The inverse of:

SELECT *
FROM employees
WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31'
;

is:

SELECT *
FROM employees
WHERE hire_date NOT BETWEEN '1986-01-01' AND '1986-12-31'
;

Order of operations

Warning! When using different logical operators to combine multiple WHERE conditions, be aware of the order of operations! When in doubt, use (parentheses) to create groupings or priorities.

For example:

SELECT *
FROM departments
WHERE (dept_no = 'd001'
  AND dept_name = 'Customer Service')
   OR dept_name = 'Development'
;   

means very different things and will return different results than:

SELECT *
FROM departments
WHERE dept_no = 'd001'
  AND (dept_name = 'Customer Service'
   OR dept_name = 'Development')
;

Breakout Exercises

  1. How many people in the employees table have the first name 'Georgi', or 'Bezlel' or 'Parto'?

  2. How many people in the employees table have an employee number between 10000 and 20000, inclusive?

  3. How many people in employees table are older than or equal to 65 years old as of today’s date (run date)?

  4. Could you re-write the SQL logic below, but use substring functions instead of LIKE?
      SELECT *
      FROM employees
      WHERE last_name LIKE 'G%'
      ;
    
  5. How many employees in the employees table whose first name initial matches their last name inital? (e.g. 'Anastasia Antwerp'’s initials are 'A. A.')

Resources


Wrapping up