Oftentimes, data in columns of tables is not in the exact format we need to complete our desired analysis. For example we may need to extract a date from a full timestamp, or combine first and last name columns to create a full name. In this lesson, we'll be learning about some of SQL's built in functions for transforming dates, numbers and strings.
In this lesson, we’ll be examining data from
Instructions:
-
Select 10 rows from the bakeries table.
-
Select 10 rows from the baked_goods table.
===
We'll begin with dates. Dates are often written in the following format
i. Date: YYYY-MM-DD ii. Timestamp or Datetime: YYYY-MM-DD hh:mm:ss We can use SQL's date functions can get data into a desired format. Since date functions can be database specific, verify the functions that exist on your relational database management system.
Let's assume that we have a column in our baked_goods table named manufacture_time
in the format YYYY-MM-DD hh:mm:ss.
We'd like to know the number of baked_good manufactured by day, and not by second. We can use the DATE() function to easily convert timestamps to dates and complete the following query:
SELECT DATE(manufacture_time) as manufacture_date, count(*) as count_baked_goods
FROM baked_goods
GROUP BY DATE(manufacture_time);
instructions:
- Find the number of baked goods by date of delivery
SELECT DATE(delivery_time) as delivery_date, count(*) as count_baked_good_deliveries
FROM baked_goods
GROUP BY DATE(delivery_time);
We can also use the DATE_TRUNC function to truncate a timestamp based on the date portion that you specify, such as hour, week, or month.
DATE_TRUNC will return the first day of the specified year, the first day of the specified month, or the Monday of the specified week depending on the date part specified. Imagine we want to know the number of items manufactured each week from the baked_goods table, based on manufacture time. Using the DATE_TRUNC() function with the date part week will accomplish this:
SELECT DATE_TRUNC('week', delivery_time) as delivery_week, count(*) as count_deliveries
FROM baked_goods
GROUP BY DATE_TRUNC('week', delivery_time);
instructions:
- Find the number of items manufactured each month SELECT DATE_TRUNC('month', delivery_time) as delivery_month, count(*) as count_deliveries FROM baked_goods GROUP BY DATE_TRUNC('hour', delivery_month); ``
===
Given a datepart and a column of date or timestamp data type, the DATEADD function increments date or timestamp values by a specified interval.
DATEADD ( datepart, interval, expression )
Imagine that each dessert in our baked goods table is inspected 1 day after the manufacture time. To derive the inspection date for each flight, we can use the following query
SELECT dateadd(day, 1, manufacture_time) as inspection_time
FROM baked_goods
instructions:
- If each of the baked goods is packaged by Baker's Market exactly one day after the delivery, find the expected package date.
SELECT dateadd(day, 1, delivery_time) as package_time
FROM baked_goods
===
Great work! Numeric functions can be used to transform numbers. Some common mathematical functions are included below that take columns of numeric data types as inputs:
ROUND(number, precision)
: Returns the numeric value rounded off to the next value specified.CEIL(number)
: Returns the smallest integer greater than or equal to the specified numeric expression.FLOOR(number)
: Returns the largest integer less than or equal to the specified numeric expression.
In our baked_goods table, we have a column named costs that contains the cost of the item in dollars. We'd like to look at the distribution of number baked goods by 1 dollars increments, rounded down to the nearest 1 dollar. We can accomplish this using the following query:
SELECT FLOOR(cost/1) as one_dollar_increment, count(*) as count_items
FROM baked_goods
GROUP BY FLOOR(cost/1)
instructions:
- Find the bakery's distance from the Market, rounded up to the next 10 mile increment
SELECT CIEL(distance/10) as ten_mile_increment, count(*) as count_bakery_distance
FROM bakeries
GROUP BY CIEL(distance/10)
===
A couple more useful numeric SQL functions are included below: GREATEST
and LEAST
.
GREATEST(n1,n2,n3,...)
: returns the greatest value in the set of the input numeric expressions
LEAST(n1,n2,n3,...)
: returns the least value in the set of the input numeric expressions
In our baked_goods table, we have information about cost located in numeric ingredients_cost
and packaging_cost
columns. We can use the GREATEST
function to determine the overall greatest value of cost for each item using the following query:
SELECT id, GREATEST(ingredients_cost, packaging_cost)
FROM baked_goods
instructions:
- We also have information about cook time and cool down time in the baked goods table. Find the greatest time value for each item in the table.
SELECT id, GREATEST(cook_time, cool_down_time)
FROM baked_goods
- Find the least time value for each item in the table.
SELECT id, LEAST(cook_time, cool_down_time)
FROM baked_goods
===
String manipulation can be useful to derive information from columns. We'll cover a couple of the common string functions here.
A common use case for string manipulation in SQL is concatenation of strings. We can use the CONCAT()
to accomplish this:
CONCAT(string1,string2,...)
The bakeries table contains both an address_number and street name columns. In order to create a route for these columns, we use the CONCAT()
function to concatenate them using the following query:
SELECT CONCAT('address_number', ' ', 'street') as address
FROM bakeries
instructions:
- Combine first and last name columns from the bakeries table to create an owner that is the full name.
SELECT CONCAT('first_name', '', 'last_name') as owner
FROM bakeries
===
Another useful string function in SQL is REPLACE()
:
REPLACE(string,from_string,to_string)
The function returns the string string
with all occurrences of the string from_string
replaced by the string to_string
.
For example in baked_goods, there is a column named ingredients
. The ingredients strings are formatted with underscores, such as baking_soda
and vanilla_extract
. To make these values more readable, we might like to replace the underscores with spaces. We can do so by using the following query:
SELECT id, REPLACE(ingredients,'_',' ') as item_ingredients
from baked_goods
instructions:
- Any time enriched_flour appears in the ingredients list, we’d like to replace it with just “flour”. Apply this transformation and also remove all underscores.
educational text
-
Date Functions
- Date Formatting
- Date Truncating
- Date Add
-
Numeric Functions
- Ceil
- Floor
- Greatest
- Least
-
String Functions
- Concat
- Replace