Home

Lecture 4: SQL functions - part 1

Learning objective

What is a SQL function
SQL functions: mathematical and string manipulations

Warm up

ssh into eniac, a lab machine, and the PostgreSQL server
choose the employees database (e.g.\c employees)
run the following:

SELECT *
FROM employees
LIMIT 50
;

Based on the above, answer the questions:

In English, what is this query doing?
If we only want to return the column dept_name instead of all columns in this table, how should we modify the SELECT statement of the SQL query above, while keeping everything else the same?
What if we want to rename (alias) the column dept_name to department_name, how should we modify the query from question #2?

What is a SQL function?

A SQL function is a program that manipulates existing data items and returns a result.

An example: The SQL function UPPER() which converts the text within the column dept_name in the table employees to all upper case.

SELECT UPPER(dept_name)
FROM departments
;

Note:

Since SQL queries are read-only, the original column dept_name remains unchanged and the manipulation (up-casing) is temporary and only exists in the returned result in the form of a new (and un-named) column.
UPPER(), like all SQL functions must be initialized with parenthese ().
Since UPPER() is a built-in SQL funtion in PostgreSQL, that makes it a SQL reserve keyword and is capitalized. However, since dept_name is a data entity, it is lowercase. See the updated SQL style guide for more details.

Exercise:

Run the follow query. Describe in English what the query does.

SELECT
  dept_name,
  UPPER(dept_name) AS dept_name_all_caps
FROM departments
;

Documentation

All SQL database languages have a large variety of built-in SQL functions. There are easily over 100+ pre-built functions, and most of them are the same across different SQL variants (e.g. Oracle, PostgreSQL, SQLite).

PostgreSQL’s documentation for its functions can be found at: https://www.postgresql.org/docs/current/functions.html

The documentation tells us what functions are available and how to use them. For example, the documentation for UPPER() reads:

upper ( text ) → text

Converts the string to all upper case,
according to the rules of the database's locale.

upper('tom') → TOM

The docs tell us several things:

The function UPPER() only takes in a column that contain text data.
The function then converts the text (string) data in the column to all upper case.
The function uses a trivial example that if we pass in a column that contains the text 'tom', it will return in its place and in the same row, the text 'TOM'.

Exercise:

In the documentation page for text manipulation here, find the function that is the opposite of UPPER() and converts text data to all lower case. Fill in the section below to create a column of lower case dept_name and run the SQL to verify.

SELECT
  dept_name,
  ??? AS dept_name_all_lower
FROM departments
;

Types of SQL functions

Because there are so many SQL functions, it’s easier to categorize functions by what they do. Five major key types of SQL functions are as follows:

mathematical manipulation of numeric columns
string manipulations of text columns
type conversions of date types for any columns
date time manipulations of date and/or timestamp columns
aggregation functions to calculate summary statistics

We will cover each type one by one.

It is also possible for users to create their own SQL functions, called user-defined functions. But that is for another day.

SQL functions: mathematical manipulations

In this section, we will cover a few SQL functions that applies mathematical manipulation to numeric columns. For the full list of mathematical SQL functions available in PostgreSQL, please see the mathematical SQL functions documentation page.

There are usually many ways of achieving the same thing in SQL. For example, to divide the salary column in the salaries table by 2, we can do this two ways:

Using the division operator / (e.g. half_salary_v1)
Using the pre-built SQL function DIV() (e.g. half_salary_v2)

SELECT
  salary,
  salary / 2 AS half_salary_v1,
  DIV(salary, 2) AS half_salary_v2
FROM salaries
LIMIT 10
;

Operators versus SQL functions

Operators (operands) are reserve SQL keywords also used to manipulate data. Some common examples:

Arithmetic operators, such as +, -, *, /
Comparison operators, such as =, !=
Boolean operators: AND, OR
String operators: || for concatenation

They are used differently than SQL functions since on their own, they must be used in conjunction with other columns and/or functions to return results. Also, they do not need to be called with parentheses, like SQL functions must.

While using the operator for division is more familiar to us, we learn how to use the DIV() function by looking it up in the PostgreSQL documentation page, which tells us:

div ( y numeric, x numeric ) → numeric

Integer quotient of y/x (truncates towards zero)

div(9,4) → 2

Exercise:

Complete the SQL query below so that the second column returns a new column calculated by taking the square root of the salary. Let’s name it sqrt_salary.

SELECT
  salary,
  ?????
FROM salaries
LIMIT 10
;

Nested functions

It is entirely possible to nest your sql functions and apply more than one at the same time.

Consider the following:

SELECT
  salary,
  SQRT(salary) AS sqrt_salary,
  ROUND(SQRT(salary)) AS sqrt_salary_rounded
FROM salaries
LIMIT 10
;

Note the order of evaluation is to apply SQRT() first since it is on the inside, then followed by the rounding function ROUND(). The order of evaluation in arithmetic applies here as well.

RANDOM()

Not all SQL functions require a column as input. Take the RANDOM() function, which returns a new column where each row value is a random number between 0 and 1. This function requires no column as input.

SELECT
  *,
  RANDOM()
FROM salaries
LIMIT 10
;

Read this StackOverflow page if you are interested in how RANDOM() can be used for sampling purposes.

Breakout exercise

Why is the outcome of the second column not rounded?

SELECT
  salary,
  SQRT(ROUND(salary))
FROM salaries
LIMIT 10
;

Read up on the documentation for the SQL function FLOOR(). What does this function do?
Suppose everyone received a 5% raise in their salary. Using the data inside the salaries table, return two columns: the employee number and the new salary post raise. Rename the new salary column to something that makes sense.

SQL functions: string manipulations

In this section, we will cover a few SQL functions that applies string manipulation to text columns. For the full list of string SQL functions available in PostgreSQL, please see the string SQL functions documentation page.

Two SQL functions from earlier UPPER() and LOWER() are both string manipulation functions in that they take in text columns and returns manipulated data as output.

An example of their usage:

SELECT
  first_name,
  UPPER(first_name) AS first_name_upper,
  LOWER(first_name) AS first_name_lower
FROM employees
LIMIT 10
;

LENGTH()

Note however, that what qualifies a function as a string manipulation function is the data type of the input column, not the data type of the output value. Here, LENGTH() takes in a text column, and returns the number of characters in the text for each row.

PostgresQL documentation:

length ( text ) → integer

Returns the number of characters in the string.

length('jose') → 4

Exercise:

Using the table employees, return 2 columns:

the original column that stores the first name
the length of characters in the first name (remember to give this column a name)

You can limit the number of rows returned to 10 rows.

String concatenation with || operator

As with mathematical functions, there are many ways of doing the same thing with string functions and operators. One such case is string concatenation.

String concatenation is the operation of joining character strings end-to-end. For example, the concatenation of “snow” and “ball” is “snowball”. (source: Wikipedia)

To do so using the string operator ||:

SELECT
  first_name,
  last_name,
  first_name || last_name AS full_name
FROM employees
LIMIT 10
;

Concatenating end to end looks a bit messy (e.g. 'Georgi' 'Facello' became 'GeorgiFacello'). To add a space in between first name and last name, we can chain our concatenation commands and add in a space by adding in a string literal of 1 space ' ':

SELECT
  first_name,
  last_name,
  first_name || ' ' || last_name AS full_name
FROM employees
LIMIT 10
;

Exercise:

Fill in the query below so that the first name and last name are concatenated using the || operator where the last name appears first, followed by a comma and a space, then the first name.

(e.g. 'Georgi', 'Facello' becomes 'Facello, Georgi')

SELECT
  first_name,
  last_name,
  ??? AS full_name
FROM employees
LIMIT 10
;

String concatenation with CONCAT()

String concatenation can also be done with the function CONCAT(). Note that although the example in the documentation shows that this function allows the concatenation of non text data types (e.g. 2, NULL), this is not best practice and should be avoided.

concat ( val1 "any" [, val2 "any" [, ...] ] ) → text

Concatenates the text representations of all the arguments.
NULL arguments are ignored.

concat('abcde', 2, NULL, 22) → abcde222

To replicate the concatenation using CONCAT()

SELECT
  first_name,
  last_name,
  CONCAT(first_name, ' ', last_name) AS full_name
FROM employees
LIMIT 10
;

Exercise:

Fill in the query below so that the first name and last name are concatenated using the function CONCAT() where the last name appears first, followed by a comma and a space, then the first name.

(e.g. 'Georgi', 'Facello' becomes 'Facello, Georgi')

SELECT
  first_name,
  last_name,
  ??? AS full_name
FROM employees
LIMIT 10
;

Substring

A substring is a contiguous sequence of characters within a string. For instance, “the best of” is a substring of “It was the best of times”. For example, “Itwastimes” is a subsequence of “It was the best of times”, but not a substring. (Source: Wikipedia)

There are times where we only need a portion of the text data stored in the database. This is when we call on substring functions. Below are a few common ones:

LEFT(column_name, n): returns first n characters in a string
RIGHT(column_name, n): returns last n characters in a string
SUBSTR(column_name, n, m): extracts the substring starting at a specified n-th character and ending (optional) after m-th characters (m is an optional input).

All substring functions are 1 based, not 0 based in positioning.

An example:

SELECT
  first_name,
  LEFT(first_name, 3) AS first_name_first3,
  RIGHT(first_name, 3) AS first_name_last3,
  SUBSTR(first_name, 2, 5) AS first_name_substr
FROM employees
LIMIT 10
;

Exercise

Fill in the SQL query below to create the first_name_initial and last_name_initial.

SELECT
  first_name,
  ?? AS first_name_initial,
  last_name,
  ?? AS last_name_initial
FROM employees
LIMIT 10
;

Split string by delimiters

The opposite of string concatenation is split string. For example, our data might store the full name in one column 'Georgi Facello' and we might want to split it up into 2 columns, first name and last name.

The PostgreSQL function for this is SPLIT_PART().

The structure is as follows:

SPLIT_PART(column_name, delimiter, position)

This function requires three input arguments:

the column name that contains the string to be split
the delimiter which is used to separate the string into its parts
the position (base 1) of the split part to return

To put this into context with an example, let’s pretend we have a table employees with a column called full_name that contains the value 'Georgi Facello'.

the column name is: full_name.
the delimiter separating the first name from the last name is a single space ` `.
once full_name is passed through to SPLIT_PART(), then 'Georgi Facello' gets split into ['Georgi', 'Facello']. To return the first name, the string position we need to enter is 1. To return the last name, we need the string position to be 2.

The pseudo code looks like this:

SELECT
  full_name,
  SPLIT_PART(fullname, ' ', 1) AS first_name,
  SPLIT_PART(fullname, ' ', 2) AS last_name
FROM employees
;

Exercise:

Run the following SQL query. Why are some rows for dept_name_2 missing data?

SELECT
  dept_name,
  SPLIT_PART(dept_name, ' ', 1) AS dept_name_1,
  SPLIT_PART(dept_name, ' ', 2) AS dept_name_2
FROM departments
;

RegEx

Regular expression (short: RegEx) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory. (Source: wikipedia)

If you find yourself needing to do regex pattern matching, the two most common PostgreSQL functions are:

REGEX_REPLACE() for find and replace operations on strings.
REGEX_MATCH for input validation match on strings.

regexr is also helpful to learn, test, and build regex syntax.

Breakout exercise:

Using the first_name and last_name columns in the employees table, write the SQL query that creates a new column called initials that concatenates the lower cased first letter of each name together with a period . as delimiter. (e.g. 'Georgi' 'Facello' becomes 'g.f.')

Feel free to only return only 10 rows.

Wrapping Up

We’ve covered 2 of the 5 types of SQL functions: mathematical manipulation and string manipulation functions.
No homework this week.
No class this Wednesday (Yom Kippur).