- Documentation: https://petl.readthedocs.org/en/latest/index.html
- Source: https://github.com/alimanfoo/petl
- Mailing list: http://groups.google.com/group/python-etl
Many in the VIVO community use Python for data manipulation and transformation tasks (ETL).
petl is a framework for reading data from various sources, transforming it, and writing it out to another destination.
This package is designed primarily for convenience and ease of use, especially when working interactively with data that are unfamiliar, heterogeneous and/or of mixed quality.
- Read data from and export data to many different formats: CSV, Excel, JSON, XML, databases.
- Operations for common transformations - filtering, sorting, joining, merging, validation, etc.
- Decent-sized user community: ~200 stars on GitHub. MIT licensed.
- Good documentation and many examples of common tasks.
- Reduces the need for boilerplate and provides a common pattern for approaching tasks.
- Use interactively with IPython to explore new or unknown data and to test manipulations.
Read a file from Excel and merge duplicate rows.

    import petl as etl

    # Load the worksheet "sheet1" from the Excel file
    table = etl.fromxls("sample.xls", "sheet1")
    # Output headers
    print(table.header())
    # Output first 5 rows
    print(table.head())
    # Merge rows with same id
    table2 = table.mergeduplicates('id')