Last active September 27, 2015 00:12
Quick intro to petl for VIVO Apps & Tools working group.


Python ETL

Many in the VIVO community use Python for extract, transform, load (ETL) tasks: data manipulation and transformation. petl is a Python package for reading data from various sources, transforming it, and writing it to another destination.

This package is designed primarily for convenience and ease of use, especially when working interactively with data that are unfamiliar, heterogeneous and/or of mixed quality.


  • Read and export data from many different source types - CSV, Excel, JSON, XML, databases.
  • Operations for common transformations - filtering, sorting, joining, merging, validation, etc.
  • Decent-sized user community - ~200 stars on GitHub. MIT licensed.
  • Good documentation and many examples of common tasks.
  • Reduces the need for boilerplate and provides a common pattern for approaching tasks.
  • Use interactively with IPython to explore new/unknown data and to test manipulations.


Read a file from Excel and merge duplicate rows.

import petl as etl

# Read the worksheet named "sheet1" (.xls support requires the xlrd package)
table = etl.fromxls("sample.xls", "sheet1")
# Output headers
print(table.header())
# Output first 5 rows
print(table.head())
# Merge rows with the same id
table2 = table.mergeduplicates("id")