Skip to content

Instantly share code, notes, and snippets.

@jpotts18
Last active July 7, 2016 04:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jpotts18/4edac9dfc86c9f0334fd1e8a174458b1 to your computer and use it in GitHub Desktop.
Save jpotts18/4edac9dfc86c9f0334fd1e8a174458b1 to your computer and use it in GitHub Desktop.

I have a dataset but I would like to append more information onto it to make it more useful. In data warehousing this is similar to the concept of creating dimensions. With dimensional aka dim you can do this from the command line or in an iPython notebook.

Here's how it works.

pip install dimensional - this installs a cli called dim

SELECT * FROM temp_orders_table;

+-------+-------+----------+
| zip   | units | revenue  |
+-------+-------+----------+
| 84003 | 3000  | 1303.04  |
| 60652 | 10992 | 64028.91 |
| ...   | ...   | ...      |
+-------+-------+----------+

This information is interesting but for some those pesky engineers didn't include the state information next in the orders. That's okay though because you have dimensional!

dim configure - Walks you through a simple command line that configures a relational database connection wizard. During this you configure an alias for your database called dev For now create your own postgres database createdb dim_testing and use a .env file in the root for this project for other configuration.

dim search zip - You know the key that you want to join on is a zip code, so you search for it using dim or you visit dimensional.io/search to find the package you need. You see one called zip_state which looks like it does exactly what you want.

dim install zip_state -d=dev - Dim downloads the package to your computer, stores it as a csv, creates the table in your dev database and copies the zip_state.csv into that new table.

SELECT * from zip_state

+-------+-------+
| zip   | state |
+-------+-------+
| 10451 | NY    |
| 10452 | NY    |
| 10453 | NY    |
| ...   | ...   |
+-------+-------+

Now you can write a simple join to add that this new dimension to your data set.

SELECT *
FROM orders_table o, zip_state z
WHERE o.zip = z.zip
+-------+-------+----------+-------+
| zip   | units | revenue  | state |
+-------+-------+----------+-------+
| 84003 | 3000  | 1303.04  | UT    |
| 60652 | 10992 | 64028.91 | IL    |
| ...   | ...   | ...      | ...   |
+-------+-------+----------+-------+

But then because you are insanely smart you realize that you should probably normalize your data based on per capita. Normally this is a moment of dread, but you have dimensional now so you can just run
dim search zip_population`

dim install zip_population -d=dev

Now that your life is so much better than it used to be you realize you should help the poor developers that are making dimensional so you look through your database and you realize there are some pretty standard dimensions that you could share. One of them is the zip code with crime statistics for 2015. Since you are awesome you prepare the data set and you type dim contribute or go to dimensional.io/contribute and learn how you can contribute your data to help other people.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment