WHERE
WHERE
filters in SQLRun this SQL:
SELECT
SUM(
CASE
WHEN emp_no = 90689 THEN 1
ELSE 0
END
) AS cnt_emp_90689_occurences
FROM salaries
;
Now run this:
SELECT *
FROM salaries
WHERE emp_no = 90689
;
By inspecting the output, how are these two queries related?
WHERE
filter work?The SELECT
statement, by default, returns all rows from one or more columns in a table (unless it is an aggregation function). To select and filter rows that satisfy a specified condition, you use a WHERE
clause.
A simple example
SELECT *
FROM salaries
WHERE emp_no = 90689
;
Order of syntax
The syntax order is as follows:
SELECT ...
FROM table_name
WHERE condition
[LIMIT n_rows]
;
Order of evaluation
Based on our existing vocabulary in SQL:
PostgreSQL evaluates the WHERE
clause after the FROM
clause and before the SELECT
clause and, if present, the LIMIT
clause:
FROM
-> [WHERE
] -> SELECT
-> [LIMIT
]
Style
Please see the SQL style guide for more details.
How does it work
The WHERE
condition must evaluate to True or False. It can be a Boolean expression or a combination of Boolean expressions.
The SQL query will returns only rows that satisfy the condition in the WHERE
clause. In other words, only rows that cause the condition evaluates True will be included in the result set.
A simple example, step by step
SELECT *
FROM salaries
WHERE emp_no = 90689
;
Step 1. SQL identify that this query is based off of the salaries
table.
Step 2. SQL evaluates the condition emp_no = 90689
for every row in the table salaries
and returns True or False for each row.
emp_no | salary | from_date | to_date | ?column?
-------+--------+------------+------------+----------
90689 | 98100 | 2000-02-16 | 2001-02-15 | t
90689 | 101016 | 2001-02-15 | 2002-02-15 | t
90689 | 102549 | 2002-02-15 | 9999-01-01 | t
90690 | 40000 | 1997-12-02 | 1998-11-06 | f
90691 | 65744 | 1997-01-22 | 1998-01-22 | f
90691 | 66904 | 1998-01-22 | 1999-01-22 | f
90691 | 71074 | 1999-01-22 | 2000-01-22 | f
90691 | 71198 | 2000-01-22 | 2001-01-21 | f
90691 | 74916 | 2001-01-21 | 2002-01-21 | f
90691 | 74444 | 2002-01-21 | 9999-01-01 | f
<output truncated to 10 rows for viewability>
Step 3. The temporary column is used to remove rows where the condition result is not True, leaving only 7 rows.
emp_no | salary | from_date | to_date | ?column?
--------+--------+------------+------------+----------
90689 | 98100 | 2000-02-16 | 2001-02-15 | t
90689 | 101016 | 2001-02-15 | 2002-02-15 | t
90689 | 102549 | 2002-02-15 | 9999-01-01 | t
90689 | 90196 | 1996-02-17 | 1997-02-16 | t
90689 | 90403 | 1997-02-16 | 1998-02-16 | t
90689 | 94238 | 1998-02-16 | 1999-02-16 | t
90689 | 94931 | 1999-02-16 | 2000-02-16 | t
(7 rows)
Step 4. SQL honors what is specified in the SELECT
clause, and only returns SELECT *
, which is all columns in the order of how they are stored. The temporary column is not returned.
emp_no | salary | from_date | to_date
-------+--------+------------+------------
90689 | 98100 | 2000-02-16 | 2001-02-15
90689 | 101016 | 2001-02-15 | 2002-02-15
90689 | 102549 | 2002-02-15 | 9999-01-01
90689 | 90196 | 1996-02-17 | 1997-02-16
90689 | 90403 | 1997-02-16 | 1998-02-16
90689 | 94238 | 1998-02-16 | 1999-02-16
90689 | 94931 | 1999-02-16 | 2000-02-16
(7 rows)
WHERE
filtersWHERE
filters depend on operators to evaluate conditions to True or False.
A complete list of operators:
Operator | Description
----------+---------------
= | Equal
<>, != | Not equal
> | Greater than
< | Less than
>= | Greater than or equal
<= | Less than or equal
IN | Return true if a value matches any value in a list
BETWEEN | Return true if a value is between a range of values
LIKE | Return true if a value matches a pattern (using _ or %)
IS NULL | Return true if a value is NULL
NOT | Negate the result of other operators
AND | Logical operator AND
OR | Logical operator OR
The =
equal operator
Text comparisons are case sensitive:
SELECT *
FROM departments
WHERE dept_name = 'Customer Service'
;
Numeric comparisons do not need quotes:
SELECT *
FROM employees
WHERE emp_no = 10001
;
Date comparisons do need quotes:
SELECT *
FROM employees
WHERE birth_date = '1958-02-19'
;
The !=
or <>
not equal operator
Both !=
and <>
work the same to indicate not equal to.
SELECT *
FROM departments
WHERE dept_name != 'Customer Service'
;
SELECT *
FROM departments
WHERE dept_name <> 'Customer Service'
;
The inequality operators: >
, <
, >=
, <=
Inequality operators are typically only used on numeric or datetime columns.
A numeric inequality comparison:
SELECT *
FROM employees
WHERE emp_no <= 10007
;
A datetime inequality comparison:
SELECT *
FROM employees
WHERE birth_date > '1965-01-01'
;
If applied to text columns, SQL will sort values alphabetically.
'B'
> 'A'
'AA'
> 'A'
However, this is not best practice and is discouraged.
Find missing data using IS NULL
, IS NOT NULL
NULL
means missing information or not applicable. NULL
is not a value, therefore, you cannot compare it with any other values like numbers or strings or use the =
equal operator.
To check whether a value is NULL or not, you use the IS NULL
operator instead. The syntax is as follows. The expression returns true if the value is NULL or false if it is not.
value IS NULL
This query returns all in the employees
table where the first name is missing. This returns no rows because all rows in the fierst_name
column is populated.
SELECT *
FROM employees
WHERE first_name IS NULL
;
IS NOT NULL
is the inverse of IS NULL
:
value IS NOT NULL
Since the query above returned zero rows, the inverse will return all rows in the employees
tables.
SELECT *
FROM employees
WHERE first_name IS NOT NULL
;
employees
table where the birth date is 1958-02-19
.employees
table where the first name is Duangkaew
.employees
table have the value 'M'
in the gender column?employees
table) are born after January 1, 1965
?employees
table) have an employee ID number less than or equal to 10007
?employees
table) do not have an employee ID number?LIKE
with WHERE
The LIKE
operator is useful for partial pattern matching with text columns.
A simple example:
SELECT *
FROM employees
WHERE first_name LIKE 'A%'
;
The syntax of LIKE
operator is as follows. The expression returns True if the value matches the pattern.
value LIKE pattern
PostgreSQL provides you with two wildcards:
%
) matches any sequence of zero or more characters._
) matches any single character.
This query returns all in the employees
table where the first name starts with the capitalized letter 'A'
.
SELECT *
FROM employees
WHERE first_name LIKE 'A%'
;
This query returns all in the employees
table where the first name starts with the capitalized letter 'A'
but then has only 1 other letter following the 'A'
in their first name.
SELECT *
FROM employees
WHERE first_name LIKE 'A_'
;
This query returns all in the employees
table where the first name starts with the capitalized letter 'A'
but then has 3 other letters following the 'A'
in their first name.
SELECT *
FROM employees
WHERE first_name LIKE 'A___'
;
Without any wildcards, the LIKE
operator gets treated like an =
equal operator.
SELECT *
FROM employees
WHERE first_name LIKE 'A'
;
is the same as
SELECT *
FROM employees
WHERE first_name = 'A'
;
Wildcards can be used in any part of the string.
SELECT *
FROM employees
WHERE first_name LIKE '%er%'
;
Wildcards %
and _
can be used in conjunction with each other.
SELECT *
FROM employees
WHERE first_name LIKE '_er%'
;
To negate the LIKE
operator, you use the NOT
operator as follows. The NOT LIKE
operator returns true when the value does not match the pattern.
value NOT LIKE pattern
This query will return all in the employees
table where the first name does NOT start with 'A'
.
SELECT *
FROM employees
WHERE first_name NOT LIKE 'A%'
;
For more information on PostgreSQL specific text comparison operators, check out Postgresqltutorial.
Note: The wildcard %
used inside the string literal is not to be confused the arithmetic modulo division operator %
, which is used to calculate the remainder of an integer division.
This returns all rows where the salary number is even.
SELECT *
FROM salaries
WHERE salary % 2 = 0
;
employees
table where the first name starts with 'G'
.employees
table where the first name starts with 'G'
and is five letters long.employees
table where the first name starts with 'G'
and ends in an 'i'
.WHERE
conditionsThe logical operators AND
and OR
can be used to chain multiple WHERE
conditions together.
This query returns all in the employees
table only if the first name is 'Georgi'
AND the last name is 'Facello'
. Both conditions must be met for the row to be returned.
SELECT *
FROM employees
WHERE first_name = 'Georgi'
AND last_name = 'Facello'
;
This query returns all in the employees
table where the first name can be either 'Georgi'
or 'Parto'
. Either of the conditions can be met for the row to be returned.
SELECT *
FROM employees
WHERE first_name = 'Georgi'
OR first_name = 'Parto'
;
Note: The styling for multiple WHERE
conditions here follows the river style. This is up to your preference. An indent of 4 spaces or 1 tab also works well.
When multiple conditions all refer to the same column or expression, they can be shortened and combined with the IN
operator.
The following query:
SELECT *
FROM employees
WHERE first_name = 'Georgi'
OR first_name = 'Parto'
;
can be re-written as:
SELECT *
FROM employees
WHERE first_name IN ('Georgi', 'Parto')
;
Similarly, numeric comparison conditions could also be combined using IN
operators.
The following query:
SELECT *
FROM employees
WHERE emp_no = 10001
OR emp_no = 10002
OR emp_no = 10003
;
can be re-written as:
SELECT *
FROM employees
WHERE emp_no IN (10001, 10002, 10003)
;
The operator BETWEEN
is used to match a value against a range of values.
The syntax:
WHERE value BETWEEN low AND high
Is equivalent to:
WHERE value >= low
AND value <= high
The following query:
SELECT *
FROM employees
WHERE hire_date >= '1986-01-01'
AND hire_date <= '1986-12-31'
;
can be re-written as:
SELECT *
FROM employees
WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31'
;
Note: The NOT
operator can be used to denote the inverse of the condition.
The inverse of:
SELECT *
FROM employees
WHERE emp_no IN (10001, 10002, 10003)
;
is:
SELECT *
FROM employees
WHERE emp_no NOT IN (10001, 10002, 10003)
;
The inverse of:
SELECT *
FROM employees
WHERE hire_date BETWEEN '1986-01-01' AND '1986-12-31'
;
is:
SELECT *
FROM employees
WHERE hire_date NOT BETWEEN '1986-01-01' AND '1986-12-31'
;
Warning! When using different logical operators to combine multiple WHERE
conditions, be aware of the order of operations! When in doubt, use (parentheses) to create groupings or priorities.
For example:
SELECT *
FROM departments
WHERE (dept_no = 'd001'
AND dept_name = 'Customer Service')
OR dept_name = 'Development'
;
means very different things and will return different results than:
SELECT *
FROM departments
WHERE dept_no = 'd001'
AND (dept_name = 'Customer Service'
OR dept_name = 'Development')
;
How many people in the employees
table have their first name as either 'Georgi'
, or 'Bezlel'
or 'Parto'
?
How many people in the employees
table have an employee number between 10000
and 20000
, inclusive?
How many people in employees
table are older than or equal to 65 years old as of today’s date (run date)?
LEFT(), RIGHT(), SUBSTRING(), etc
) instead of LIKE
?
SELECT *
FROM employees
WHERE last_name LIKE 'G%'
;
employees
table whose first name initial matches their last name inital? (e.g. 'Anastasia Antwerp'
’s initials are 'A. A.'
)WHERE
filters.