Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
Notes from DataCamp's "Intro to SQL for Data Science" course

Get a list of distinct values for a column in a table

SELECT DISTINCT column FROM table;

Get the count of rows in a table

SELECT COUNT(*) FROM table;

Get the count of non-null records for a column in a table

SELECT COUNT(column) FROM table;

Get the count of distinct non-null records for a column in a table

SELECT COUNT(DISTINCT column) FROM table;

When using the WHERE clause on strings in PostgreSQL, you must use single-quotes around the string.

Get values within a range (example, the 1990s)

Method 1

SELECT *
FROM table
WHERE column >= 1990
AND column <= 1999;

Method 2

SELECT *
FROM table
WHERE column
BETWEEN 1990 AND 1999;

Aggregate functions are useful for returning...aggregated values of a column

SELECT AVG(column) FROM table;

SELECT SUM(column) FROM table;

SELECT MIN(column) FROM table;

SELECT MAX(column) FROM table;

You can group results based on values in a particular column. That column needs to be included in the SELECT statement. You can also group by multiple columns

SELECT column1, column2, SUM(column3)
FROM table
GROUP BY column1, column2

You can't use aggregate functions in a WHERE clause, but you can mimic the effect using a HAVING clause.

INNER JOIN syntax

SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

When dealing with multiple tables, if tables have identical column names,you have to specify the table and the column (table1.column) in your SELECT statement

When joining more than two tables you may need to use AND in the ON clause to ensure you're joining the third/nth table to the preceding join

If join fields in tables are identical, you can use USING (key_field) instead of, for example ON table1.id = table2.id

Self-join allows you to join within a single table. This is particularly helpful for comparing records within the same field

The following example allows us to compare records for a country for 2010 and 2015 and create a new field with values expressing the percentage growth rate:

SELECT p1.country_code, 
       p1.size AS size2010,
       p2.size AS size2015,
       (p2.size - p1.size) / p1.size * 100.0 AS growth_perc
FROM populations AS p1
INNER JOIN populations AS p2
ON  p1.country_code = p2.country_code AND p1.year = p2.year - 5;

CASE WHEN...THEN...ELSE...END allows you to set column values with something akin to IF...THEN syntax

CASE WHEN condition1
  THEN 'some_value1'
WHEN condition2
  THEN 'some_value2'
ELSE 'some_value3' END
AS new_column
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment