petl
Python ETL
- Documentation: https://petl.readthedocs.org/en/latest/index.html
- Source: https://github.com/alimanfoo/petl
- Mailing list: http://groups.google.com/group/python-etl
Many in the VIVO community use Python for data manipulation and transformation tasks, commonly called extract, transform, load (ETL). petl is a framework for reading data from various sources, transforming it, and passing it along to another destination.
This package is designed primarily for convenience and ease of use, especially when working interactively with data that are unfamiliar, heterogeneous and/or of mixed quality.
Why
- Read and export data in many different formats - CSV, Excel, JSON, XML, databases.
- Operations for common transformations - filtering, sorting, joining, merging, validation, etc.
- Decent-sized user community - ~200 stars on GitHub. MIT licensed.
- Good documentation and many examples of common tasks.
- Reduces the need for boilerplate and provides a common pattern for approaching tasks.
- Use interactively with IPython to explore new/unknown data and to test manipulations.
Example
Read a file from Excel and merge duplicate rows.
import petl as etl
# Reading .xls files requires the xlrd package
table = etl.fromxls("sample.xls", "sheet1")
# Output headers
print(table.header())
# Output first 5 rows
print(table.head())
# Merge rows with same id
table2 = table.mergeduplicates('id')