@mpjdem
Created August 7, 2019 10:47
Basic operations in Python datatable
import datatable as dt
from datatable import f, by, mean

# Read a CSV straight from a URL into a datatable Frame
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
tbl = dt.fread(url)

# Filter rows: keep every species except setosa
tbl = tbl[f.species != "setosa", :]

# Select columns
tbl = tbl[:, (f.species, f.sepal_length)]

# Add a computed column by reference: cbind() modifies tbl in place
tbl.cbind(tbl[:, {"sepal_length_sq": f.sepal_length ** 2}])

# Aggregate: mean of the squared sepal length per species
agg_tbl = tbl[:, {"avg_sq_length": mean(f.sepal_length_sq)}, by(f.species)]

# Output the result, converted to a pandas DataFrame (requires pandas)
agg_tbl.to_pandas()
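
For readers without datatable installed, the same four steps (filter, select, compute, aggregate) can be sketched in plain Python. This is only an illustration of what the pipeline computes, not how datatable implements it. The rows below and the `aggregate` helper are hypothetical, invented for this example rather than taken from iris.csv.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, invented for illustration (not taken from iris.csv)
rows = [
    {"species": "setosa", "sepal_length": 5.0, "sepal_width": 3.5},
    {"species": "versicolor", "sepal_length": 6.0, "sepal_width": 3.0},
    {"species": "versicolor", "sepal_length": 7.0, "sepal_width": 3.0},
    {"species": "virginica", "sepal_length": 6.0, "sepal_width": 3.0},
    {"species": "virginica", "sepal_length": 8.0, "sepal_width": 3.0},
]


def aggregate(rows):
    """Mirror the filter / select / compute / aggregate steps on a list of dicts."""
    # Filter rows: keep every species except setosa
    kept = [r for r in rows if r["species"] != "setosa"]

    # Select columns: species and sepal_length only
    selected = [
        {"species": r["species"], "sepal_length": r["sepal_length"]} for r in kept
    ]

    # Add a computed column: the squared sepal length
    for r in selected:
        r["sepal_length_sq"] = r["sepal_length"] ** 2

    # Aggregate: collect the squared lengths per species, then average them
    groups = defaultdict(list)
    for r in selected:
        groups[r["species"]].append(r["sepal_length_sq"])
    return {species: mean(values) for species, values in groups.items()}


result = aggregate(rows)
print(result)  # {'versicolor': 42.5, 'virginica': 50.0}
```

Each list comprehension or loop corresponds to one indexing expression in the datatable version, where the first position inside `tbl[...]` picks rows, the second picks or computes columns, and `by(...)` defines the groups.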
mpjdem commented Aug 7, 2019

As discussed here
