PyCon 2017 ETL open space
= Notes from ETL open space =
"There should be a word for people trying to restrict data entry to make it better, but then people stop using it."
ETL frameworks:
- Luigi: http://luigi.readthedocs.io/en/stable/
- Airflow: https://readthedocs.org/projects/airflow/
- mETL
- pygrametl
- petl
ETL Tools:
- OpenRefine: http://openrefine.org/
- VisTrails: https://www.vistrails.org/index.php/Main_Page
"It's good to put unit tests on your ETL code"
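A minimal sketch of what that could look like, using only the standard library's unittest. The clean_record transform and its field names are hypothetical, made up for illustration, and were not discussed in the session:

```python
import unittest


def clean_record(record):
    """Hypothetical transform step: tidy the name, parse the amount as a float."""
    return {
        "name": record["name"].strip().title(),
        "amount": float(record["amount"].replace(",", "")),
    }


class CleanRecordTest(unittest.TestCase):
    def test_strips_and_parses(self):
        raw = {"name": "  ada lovelace ", "amount": "1,234.50"}
        expected = {"name": "Ada Lovelace", "amount": 1234.5}
        self.assertEqual(clean_record(raw), expected)

    def test_bad_amount_raises(self):
        # Unparseable amounts should fail loudly instead of loading bad data.
        with self.assertRaises(ValueError):
            clean_record({"name": "x", "amount": "n/a"})
```

Saved as a module, this can be run with python -m unittest. Keeping each transform a plain function of one record is what makes it easy to test without a database or a scheduler.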
"Our dreams of the schema died pretty quickly"
18F Fed Spending transparency data broker backend: https://agile-labor-categories.18f.gov/positions/back-end-web-developer/
"Yahoo Pipes lost the love, so I made this:"
- Riko, using the idea of pipes: https://github.com/nerevu/riko
- Meza, an alternative to pandas: https://github.com/reubano/meza
"You can use a bit mask to represent the processes and errors a row went through"
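One possible reading of the bit mask idea, sketched with the standard library's enum.IntFlag. The flag names, the toy checks, and the process function are all assumptions made for illustration, not something shown in the session:

```python
from datetime import datetime
from enum import IntFlag


class RowStatus(IntFlag):
    """Hypothetical flags: one bit per step passed or error hit."""
    EXTRACTED = 1
    CLEANED = 2
    LOADED = 4        # unused here; would be set by a later load step
    MISSING_NAME = 8
    BAD_DATE = 16


# All error bits combined, so one AND tells us whether anything went wrong.
ERRORS = RowStatus.MISSING_NAME | RowStatus.BAD_DATE


def process(row):
    """Run a row through toy checks and return (cleaned row, status mask)."""
    status = RowStatus.EXTRACTED
    name = (row.get("name") or "").strip()
    if not name:
        status |= RowStatus.MISSING_NAME
    try:
        datetime.strptime(row.get("date") or "", "%Y-%m-%d")
    except ValueError:
        status |= RowStatus.BAD_DATE
    if not status & ERRORS:
        status |= RowStatus.CLEANED
    return dict(row, name=name), status


def has_error(status):
    """True if any error bit is set in the mask."""
    return bool(status & ERRORS)
```

Because the mask is a plain integer, it can be stored in a single column next to the row and filtered later, for example by selecting every row whose status has the BAD_DATE bit set.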
"You've got to optimize for the people"
There's a subreddit for ETL: https://www.reddit.com/r/ETL/