employees
database (e.g.\c employees
)SELECT *
FROM employees
LIMIT 50
;
Based on the above, answer the questions:
dept_name
instead of all columns in this table, how should we modify the SELECT
statement of the SQL query above, while keeping everything else the same?dept_name
to department_name
, how should we modify the query from question #2?A SQL function is a program that manipulates existing data items and returns a result.
An example: The SQL function UPPER()
which converts the text within the column dept_name
in the table employees
to all upper case.
SELECT UPPER(dept_name)
FROM departments
;
Note:
Since SQL queries are read-only, the original column dept_name
remains unchanged and the manipulation (up-casing) is temporary and only exists in the returned result in the form of a new (and un-named) column.
UPPER()
, like all SQL functions must be initialized with parenthese ().
Since UPPER()
is a built-in SQL funtion in PostgreSQL, that makes it a SQL reserve keyword and is capitalized. However, since dept_name
is a data entity, it is lowercase. See the updated SQL style guide for more details.
Exercise:
Run the follow query. Describe in English what the query does.
SELECT
dept_name,
UPPER(dept_name) AS dept_name_all_caps
FROM departments
;
Documentation
All SQL database languages have a large variety of built-in SQL functions. There are easily over 100+ pre-built functions, and most of them are the same across different SQL variants (e.g. Oracle, PostgreSQL, SQLite).
PostgreSQL’s documentation for its functions can be found at: https://www.postgresql.org/docs/current/functions.html
The documentation tells us what functions are available and how to use them. For example, the documentation for UPPER()
reads:
upper ( text ) → text
Converts the string to all upper case,
according to the rules of the database's locale.
upper('tom') → TOM
The docs tell us several things:
UPPER()
only takes in a column that contain text data.'tom'
, it will return in its place and in the same row, the text 'TOM'
.Exercise:
In the documentation page for text manipulation here, find the function that is the opposite of UPPER()
and converts text data to all lower case. Fill in the section below to create a column of lower case dept_name
and run the SQL to verify.
SELECT
dept_name,
??? AS dept_name_all_lower
FROM departments
;
Types of SQL functions
Because there are so many SQL functions, it’s easier to categorize functions by what they do. Five major key types of SQL functions are as follows:
We will cover each type one by one.
It is also possible for users to create their own SQL functions, called user-defined functions. But that is for another day.
In this section, we will cover a few SQL functions that applies mathematical manipulation to numeric columns. For the full list of mathematical SQL functions available in PostgreSQL, please see the mathematical SQL functions documentation page.
There are usually many ways of achieving the same thing in SQL. For example, to divide the salary
column in the salaries
table by 2, we can do this two ways:
/
(e.g. half_salary_v1)DIV()
(e.g. half_salary_v2)SELECT
salary,
salary / 2 AS half_salary_v1,
DIV(salary, 2) AS half_salary_v2
FROM salaries
LIMIT 10
;
Operators versus SQL functions
Operators (operands) are reserve SQL keywords also used to manipulate data. Some common examples:
+
, -
, *
, /
=
, !=
AND
, OR
||
for concatenationThey are used differently than SQL functions since on their own, they must be used in conjunction with other columns and/or functions to return results. Also, they do not need to be called with parentheses, like SQL functions must.
While using the operator for division is more familiar to us, we learn how to use the DIV()
function by looking it up in the PostgreSQL documentation page, which tells us:
div ( y numeric, x numeric ) → numeric
Integer quotient of y/x (truncates towards zero)
div(9,4) → 2
Exercise:
Complete the SQL query below so that the second column returns a new column calculated by taking the square root of the salary. Let’s name it sqrt_salary
.
SELECT
salary,
?????
FROM salaries
LIMIT 10
;
Nested functions
It is entirely possible to nest your sql functions and apply more than one at the same time.
Consider the following:
SELECT
salary,
SQRT(salary) AS sqrt_salary,
ROUND(SQRT(salary)) AS sqrt_salary_rounded
FROM salaries
LIMIT 10
;
Note the order of evaluation is to apply SQRT()
first since it is on the inside, then followed by the rounding function ROUND()
. The order of evaluation in arithmetic applies here as well.
RANDOM()
Not all SQL functions require a column as input. Take the RANDOM()
function, which returns a new column where each row value is a random number between 0 and 1. This function requires no column as input.
SELECT
*,
RANDOM()
FROM salaries
LIMIT 10
;
Read this StackOverflow page if you are interested in how RANDOM()
can be used for sampling purposes.
Breakout exercise
SELECT
salary,
SQRT(ROUND(salary))
FROM salaries
LIMIT 10
;
Read up on the documentation for the SQL function FLOOR()
. What does this function do?
salaries
table, return two columns: the employee number and the new salary post raise. Rename the new salary column to something that makes sense.In this section, we will cover a few SQL functions that applies string manipulation to text columns. For the full list of string SQL functions available in PostgreSQL, please see the string SQL functions documentation page.
Two SQL functions from earlier UPPER()
and LOWER()
are both string manipulation functions in that they take in text columns and returns manipulated data as output.
An example of their usage:
SELECT
first_name,
UPPER(first_name) AS first_name_upper,
LOWER(first_name) AS first_name_lower
FROM employees
LIMIT 10
;
LENGTH()
Note however, that what qualifies a function as a string manipulation function is the data type of the input column, not the data type of the output value. Here, LENGTH()
takes in a text column, and returns the number of characters in the text for each row.
PostgresQL documentation:
length ( text ) → integer
Returns the number of characters in the string.
length('jose') → 4
Exercise:
Using the table employees
, return 2 columns:
You can limit the number of rows returned to 10 rows.
String concatenation with ||
operator
As with mathematical functions, there are many ways of doing the same thing with string functions and operators. One such case is string concatenation.
String concatenation is the operation of joining character strings end-to-end. For example, the concatenation of “snow” and “ball” is “snowball”. (source: Wikipedia)
To do so using the string operator ||
:
SELECT
first_name,
last_name,
first_name || last_name AS full_name
FROM employees
LIMIT 10
;
Concatenating end to end looks a bit messy (e.g. 'Georgi'
'Facello'
became 'GeorgiFacello'
). To add a space in between first name and last name, we can chain our concatenation commands and add in a space by adding in a string literal of 1 space ' '
:
SELECT
first_name,
last_name,
first_name || ' ' || last_name AS full_name
FROM employees
LIMIT 10
;
Exercise:
Fill in the query below so that the first name and last name are concatenated using the ||
operator where the last name appears first, followed by a comma and a space, then the first name.
(e.g. 'Georgi'
, 'Facello'
becomes 'Facello, Georgi'
)
SELECT
first_name,
last_name,
??? AS full_name
FROM employees
LIMIT 10
;
String concatenation with CONCAT()
String concatenation can also be done with the function CONCAT()
. Note that although the example in the documentation shows that this function allows the concatenation of non text data types (e.g. 2, NULL), this is not best practice and should be avoided.
concat ( val1 "any" [, val2 "any" [, ...] ] ) → text
Concatenates the text representations of all the arguments.
NULL arguments are ignored.
concat('abcde', 2, NULL, 22) → abcde222
To replicate the concatenation using CONCAT()
SELECT
first_name,
last_name,
CONCAT(first_name, ' ', last_name) AS full_name
FROM employees
LIMIT 10
;
Exercise:
Fill in the query below so that the first name and last name are concatenated using the function CONCAT()
where the last name appears first, followed by a comma and a space, then the first name.
(e.g. 'Georgi'
, 'Facello'
becomes 'Facello, Georgi'
)
SELECT
first_name,
last_name,
??? AS full_name
FROM employees
LIMIT 10
;
Substring
A substring is a contiguous sequence of characters within a string. For instance, “the best of” is a substring of “It was the best of times”. For example, “Itwastimes” is a subsequence of “It was the best of times”, but not a substring. (Source: Wikipedia)
There are times where we only need a portion of the text data stored in the database. This is when we call on substring functions. Below are a few common ones:
LEFT(column_name, n)
: returns first n characters in a stringRIGHT(column_name, n)
: returns last n characters in a stringSUBSTR(column_name, n, m)
: extracts the substring starting at a specified n-th character and ending (optional) after m-th characters (m is an optional input).All substring functions are 1 based, not 0 based in positioning.
An example:
SELECT
first_name,
LEFT(first_name, 3) AS first_name_first3,
RIGHT(first_name, 3) AS first_name_last3,
SUBSTR(first_name, 2, 5) AS first_name_substr
FROM employees
LIMIT 10
;
Exercise
Fill in the SQL query below to create the first_name_initial
and last_name_initial
.
SELECT
first_name,
?? AS first_name_initial,
last_name,
?? AS last_name_initial
FROM employees
LIMIT 10
;
Split string by delimiters
The opposite of string concatenation is split string. For example, our data might store the full name in one column 'Georgi Facello'
and we might want to split it up into 2 columns, first name and last name.
The PostgreSQL function for this is SPLIT_PART()
.
The structure is as follows:
SPLIT_PART(column_name, delimiter, position)
This function requires three input arguments:
To put this into context with an example, let’s pretend we have a table employees
with a column called full_name
that contains the value 'Georgi Facello'
.
full_name
.full_name
is passed through to SPLIT_PART()
, then 'Georgi Facello'
gets split into ['Georgi', 'Facello']
. To return the first name, the string position we need to enter is 1. To return the last name, we need the string position to be 2.The pseudo code looks like this:
SELECT
full_name,
SPLIT_PART(fullname, ' ', 1) AS first_name,
SPLIT_PART(fullname, ' ', 2) AS last_name
FROM employees
;
Exercise:
Run the following SQL query. Why are some rows for dept_name_2
missing data?
SELECT
dept_name,
SPLIT_PART(dept_name, ' ', 1) AS dept_name_1,
SPLIT_PART(dept_name, ' ', 2) AS dept_name_2
FROM departments
;
RegEx
Regular expression (short: RegEx) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory. (Source: wikipedia)
If you find yourself needing to do regex pattern matching, the two most common PostgreSQL functions are:
REGEX_REPLACE()
for find and replace operations on strings.REGEX_MATCH
for input validation match on strings.regexr is also helpful to learn, test, and build regex syntax.
Breakout exercise:
Using the first_name
and last_name
columns in the employees
table, write the SQL query that creates a new column called initials
that concatenates the lower cased first letter of each name together with a period .
as delimiter. (e.g. 'Georgi'
'Facello'
becomes 'g.f.'
)
Feel free to only return only 10 rows.