I have a dataset but I would like to append more information onto it to make it more useful. In data warehousing this is similar to the concept of creating dimensions. With dimensional aka dim you can do this from the command line or in an iPython notebook.
Here's how it works.
pip install dimensional
- this installs a cli called dim
SELECT * FROM temp_orders_table;
+-------+-------+----------+
| zip | units | revenue |
+-------+-------+----------+
| 84003 | 3000 | 1303.04 |
| 60652 | 10992 | 64028.91 |
| ... | ... | ... |
+-------+-------+----------+
This information is interesting but for some those pesky engineers didn't include
the state information next in the orders. That's okay though because you have dimensional
!
For now create your own postgres database dim configure
- Walks you through a simple command line that configures a relational
database connection wizard. During this you configure an alias for your database called dev
createdb dim_testing
and use a .env
file in the root for this project for other configuration.
dim search zip
- You know the key that you want to join on is a zip code, so you search for it using dim or you visit dimensional.io/search to find the package you need. You see one called zip_state
which looks
like it does exactly what you want.
dim install zip_state -d=dev
- Dim downloads the package to your computer, stores it as a csv, creates the table in your dev
database and copies the zip_state.csv
into that new table.
SELECT * from zip_state
+-------+-------+
| zip | state |
+-------+-------+
| 10451 | NY |
| 10452 | NY |
| 10453 | NY |
| ... | ... |
+-------+-------+
Now you can write a simple join to add that this new dimension to your data set.
SELECT *
FROM orders_table o, zip_state z
WHERE o.zip = z.zip
+-------+-------+----------+-------+
| zip | units | revenue | state |
+-------+-------+----------+-------+
| 84003 | 3000 | 1303.04 | UT |
| 60652 | 10992 | 64028.91 | IL |
| ... | ... | ... | ... |
+-------+-------+----------+-------+
But then because you are insanely smart you realize that you should probably
normalize your data based on per capita. Normally this is a moment of dread, but you have
dimensional
now so you can just run
dim search zip_population`
dim install zip_population -d=dev
Now that your life is so much better than it used to be you realize you should
help the poor developers that are making dimensional
so you look through
your database and you realize there are some pretty standard dimensions that you could share.
One of them is the zip code with crime statistics for 2015. Since you are awesome
you prepare the data set and you type dim contribute
or go to dimensional.io/contribute
and learn how you can contribute your data to help other people.