@lawlesst

lawlesst/petl.md

Last active Sep 27, 2015
Quick intro to petl for VIVO Apps & Tools working group.

petl

Python ETL

Many in the VIVO community use Python for data manipulation and transformation tasks (ETL). petl is a library for reading data from various sources, transforming it, and writing it out to another destination.

This package is designed primarily for convenience and ease of use, especially when working interactively with data that are unfamiliar, heterogeneous and/or of mixed quality.

Why

  • Read and write data in many different formats - CSV, Excel, JSON, XML, databases.
  • Operations for common transformations - filtering, sorting, joining, merging, validation, etc.
  • Decent-sized user community - ~200 stars on GitHub. MIT licensed.
  • Good documentation and many examples of common tasks.
  • Reduces need for boilerplate, provides common pattern for approaching tasks.
  • Use interactively with IPython to explore new/unknown data and to test manipulations.

Example

Read a sheet from an Excel file, inspect it, and merge rows that share an id.

import petl as etl

# Read the sheet named "sheet1" (reading .xls files requires the xlrd package)
table = etl.fromxls("sample.xls", "sheet1")
# Output headers
print(table.header())
# Output first 5 rows
print(table.head())
# Merge rows with same id
table2 = table.mergeduplicates('id')