Navigation Menu

Skip to content

Instantly share code, notes, and snippets.

@kiyoto
Last active August 29, 2015 14:08
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save kiyoto/377c297c15cf041aa2e9 to your computer and use it in GitHub Desktop.
Save kiyoto/377c297c15cf041aa2e9 to your computer and use it in GitHub Desktop.
Learn SQL Part 1

Learn SQL by Calculating Customer Life Value Part 1: Setup, Counting and Filtering

Motivation

As far as technical skills go, SQL is a really nice skill to have for product managers and product marketers. Instead of constantly running into performance issues in Excel or begging "technical" people to look stuff up for you, you can get answers to your questions directly from data.

Unfortunately, good, non-encyclopedic resources are few and far between on the Internet. Many of them just tell you the syntax without context. Many assume you are already a programmer and/or omit the technical details of setting up the environment.

The approach here is different. In a series of blog posts, I will walk through a simple but common data analysis scenario. Each step introduces a new concept in SQL that's tied to a natural business question. By the end of this series, you should feel comfortable getting read-only access to your company's databases.

Helpful Mnemonic for Excel Veterans

It is helpful to keep the following mental model throughout this tutorial

Excel SQL
Table Table
Sort ORDER BY
Filter WHERE
Pivot Table GROUP BY
VLOOKUP JOIN
Array Functions Aggregation Functions

Scenario

Imagine that you are a product manager at an e-commerce or SaaS startup. Then, the critical metric to keep pulse on your product and business is CLTV: Customer Lifetime Value. If you know how much revenue each customer brings to your business over their lifetime, you have a much better idea of how to meet revenue goals and evolve your product.

Also, if you know each customer's LTV, then you can make more efficient business decisions. Which customer acquisition channel has high LTVs? Which one is losing money? Are their regional differences?

The goal of this tutorial is to compute the average customer lifetime value from two data sources: users data and payments data.

Setup

For this series, I will be using SQLite. SQLite is simple, free, and shipped with recent versions of Mac OSX, which I assume many of you are on. Windows is also supported (See here, for example, on how to download SQLite on Windows).

Also, this might be the first time you are working with the command line interface. I promise that you won't have to do much besides entering SQL commands.

First, let's open up Terminal. If you are on a Mac, it's as simple as searching for it in the search bar like this:

GIF IMAGE HERE

The first step is to start sqlite3 with pre-populated tables. Download bootstrap.sql and place it in the directory of your choice. Then, in your terminal, run the following command in the same directory:

sqlite3 -init bootstrap.sql

This should start up a SQLite interactive shell like this:

sqlite>

Let's see what tables (again, they are like the tables in Excel) are there. For sqlite3, typing in ".tables" does the job:

sqlite> .tables
payments  users

Great. There are two tables here, "payments" and "users". Unlike Excel, you need to actually write a query to see what the data looks like.

To do so, we run the simplest SQL query, which just grabs all the columns and rows in a table. DO NOT DO THIS for large tables. But here, the table is small (with just 10 rows), so go ahead and type in "SELECT * FROM users;" and hit enter. Do not forget the trailing semicolon.

sqlite> SELECT * FROM users;
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
3           direct      2014-10-02
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
8           direct      2014-10-05
9           twitter     2014-10-05
10          organic     2014-10-05

As you can see, this table has three fields/columns:

  1. "id" is the user's ID. This will be referenced in the "payments" table
  2. "campaign" is the campaign used to acquire that user.
  3. "signed_up_on" is the date when the user signed up for the website.

Sorting with ORDER BY

Just like Excel's tables, SQL lets you sort data by one or more columns. To do so, we add ORDER BY <column_names> to your SQL statement.

sqlite> SELECT * FROM users ORDER BY campaign;
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
8           direct      2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
2           twitter     2014-10-02
9           twitter     2014-10-05

Here is how you can sort by campaign AND signed_up_on.

sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on;
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
8           direct      2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
2           twitter     2014-10-02
9           twitter     2014-10-05

You can also sort in reverse order by adding "DESC".

sqlite> SELECT * FROM users ORDER BY campaign DESC;
id          campaign    signed_up_on
----------  ----------  ------------
2           twitter     2014-10-02
9           twitter     2014-10-05
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05
1           facebook    2014-10-01
4           facebook    2014-10-03
3           direct      2014-10-02
8           direct      2014-10-05

And of course, "DESC" can be used when sorted by multiple columns.

sqlite> SELECT * FROM users ORDER BY campaign, signed_up_on DESC;
id          campaign    signed_up_on
----------  ----------  ------------
8           direct      2014-10-05
3           direct      2014-10-02
4           facebook    2014-10-03
1           facebook    2014-10-01
10          organic     2014-10-05
7           organic     2014-10-04
5           organic     2014-10-03
6           organic     2014-10-03
9           twitter     2014-10-05
2           twitter     2014-10-02

Filtering with WHERE

One of Excel's most used features is filters. Of course, SQL, too, has filters. They are called WHERE clauses and can express an even broader range of filter conditions than Excel's filters.

Here is the query that fetches all the users that signed up organically:

sqlite> SELECT * FROM users WHERE campaign = 'organic';
id          campaign    signed_up_on
----------  ----------  ------------
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05

As you can see, we have "WHERE campaign = 'organic'" in the above query. An analog in Excel is going to the "campaign" column and selecting just 'organic'.

What if you want to select multiple values? No problem, SQL can handle that with "IN" like this.

sqlite> SELECT * FROM users WHERE campaign IN ('facebook', 'twitter');
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
4           facebook    2014-10-03
9           twitter     2014-10-05

The above query fetches all the users that signed up through Facebook or Twitter. As you can see, you can list up multiple fields separated by commas inside the parenthesis.

What if you wanted to fetch all the users EXCEPT the ones that came Facebook or Twitter? This is supported with "NOT IN" like this:

sqlite> SELECT * FROM users WHERE campaign NOT IN ('facebook', 'twitter');
id          campaign    signed_up_on
----------  ----------  ------------
3           direct      2014-10-02
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
8           direct      2014-10-05
10          organic     2014-10-05

Okay, but all the filtering thus far involved a single column. Can SQL filter by multiple columns? They answer is yes, and it uses AND to concatenate filter conditions. The following query fetches all the users that came from Facebook or Twitter campaigns that signed up on Oct. 1, 2014.

sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') AND signed_up_on = '2014-10-01';
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01

Now, it's time to show that SQL's WHERE is more powerful than Excel's filters. In addition to AND, You can actually use OR to say something like "Get me all the users that signed up before 2014-10-04 OR came in organically". Here is the query:

sqlite> SELECT * FROM users WHERE campaign = 'organic' OR signed_up_on < '2014-10-04';
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
2           twitter     2014-10-02
3           direct      2014-10-02
4           facebook    2014-10-03
5           organic     2014-10-03
6           organic     2014-10-03
7           organic     2014-10-04
10          organic     2014-10-05

Filtering AND Sorting

As you might have guessed by now, SQL allows you to filter and sort in one go. The syntax is simple: Have both WHERE and ORDER BY in your query, but make sure WHERE comes before ORDER BY. Here is a query that fetches all the Facebook-/Twitter-sourced users, sorted by campaign.

sqlite> SELECT * FROM users WHERE campaign in ('facebook', 'twitter') ORDER BY campaign;
id          campaign    signed_up_on
----------  ----------  ------------
1           facebook    2014-10-01
4           facebook    2014-10-03
2           twitter     2014-10-02
9           twitter     2014-10-05

What's Next?

Congratulations! You now know how to do the SQL equivalent of Excel's sorting and filtering. In the next entry, I will show how to do "pivot tables" in SQL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment