Matt Jordan revbucket

  • Austin, TX
revbucket / psycopg_load.md
Created October 7, 2016 21:38
psycopg2 bulk loading

Bulk Loading from Python to Postgres

Here at MaestroIQ we deal with large amounts of user data, because we need impression-level information to build an accurate picture of our customers’ users’ paths to purchase. We store all of this data in a Postgres 9.3 instance. I’m hesitant to call it “big data,” but our largest tables have over 500M rows and occupy over 200GB of disk space (if you include the indices), so performance is a very real concern for us. We are also a Python shop on the backend: all of our production code for data analytics and machine learning is Python, as is the code that controls our ETL workflow. To interact with our Postgres instances, we use the psycopg2 package. My hope with this post is to share some tips, tricks, and best practices we’ve discovered through heavy use of psycopg2.

Bulk loading

In dealing with large Postgres tables, one first has to make them large. Bulk loading is an extremely common task, and as we’ll see, there are good ways and
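
To make the task concrete, here is a minimal sketch of one common way to bulk load from Python: serialize the rows into an in-memory buffer and send them in a single `COPY` through psycopg2's `cursor.copy_from`. This is an illustration only, not necessarily the approach this post goes on to recommend. The helper names (`rows_to_copy_buffer`, `bulk_load`) and the table and column names in the usage note are hypothetical.

```python
import io


def _escape(value):
    """Render one value in PostgreSQL's COPY text format."""
    if value is None:
        return r"\N"  # default NULL marker in COPY text format
    text = str(value)
    # Backslash must be escaped first, then the delimiter and line breaks.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_copy_buffer(rows):
    """Serialize an iterable of row tuples into a tab-separated buffer."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(_escape(v) for v in row) + "\n")
    buf.seek(0)  # rewind so COPY reads from the start
    return buf


def bulk_load(conn, table, columns, rows):
    """Load rows with one COPY instead of one INSERT per row.

    `conn` is assumed to be an open psycopg2 connection.
    """
    with conn.cursor() as cur:
        # copy_from defaults to tab as separator and \N as the NULL marker.
        cur.copy_from(rows_to_copy_buffer(rows), table, columns=columns)
    conn.commit()
```

With a connection from `psycopg2.connect(...)`, a call might look like `bulk_load(conn, "impressions", ("user_id", "url"), [(1, "/home"), (2, None)])`. The buffer lives entirely in memory, so for very large loads the rows would need to be sent in batches.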

revbucket / blog_cron.sh
Created March 31, 2017 14:06
Cron script for blog deployments
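#!/bin/sh
# Rebuild the Jekyll blog from the Dropbox-synced source and push the result.

# Make sure Dropbox is running so the blog source is synced.
dropbox start
# Date stamp (DD-MM-YYYY) used in the commit message.
now="$(date +'%d-%m-%Y')"
# Relative path: resolved against the directory the job starts in.
cd Dropbox/revbucket.github.io || exit 1
jekyll build || exit 1
# Commit and push only the generated site.
cd _site || exit 1
git add .
git commit -m "Blog update: $now"
git push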