@timothyrenner
Created December 1, 2014 21:54
Comparison Between Python Pandas and Julia DataFrames GroupBy Operations
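using DataFrames
# 500,000 rows with integer keys drawn from 1:100000, so each key appears about 5 times.
keys = rand(1:100000, 500000);
values = randn(length(keys));
df = DataFrame();
# 2014-era DataFrames.jl syntax; current releases write df.KEY = keys instead.
df[:KEY] = keys;
df[:VALUE] = values;
# `by` was removed in DataFrames.jl 1.0; the current equivalent is
# combine(groupby(df, :KEY), :VALUE => sum).
# Note that the first @time call also includes JIT compilation time.
@time by(df, :KEY, x -> sum(x[:VALUE]));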
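import pandas as pd
import numpy as np
import timeit

# 500,000 rows with integer keys drawn from [0, 100000), so each key appears about 5 times.
keys = np.random.randint(0, 100000, 500000)
values = np.random.normal(size=len(keys))
df = pd.DataFrame()
df["KEY"] = keys
df["VALUE"] = values


def group_func():
    # Sum VALUE within each KEY group.
    return df.groupby("KEY").sum()


# Time a single run of the group-by, in seconds.
print(timeit.timeit(group_func, number=1))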
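
A single timed run is noisy, so one possible refinement of the pandas side is to repeat the measurement and keep the fastest run. The sketch below is not part of the original gist: the helper names `make_frame` and `best_groupby_time`, the fixed `seed`, and the choice of five repeats are all assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd


def make_frame(n_rows=500_000, n_keys=100_000, seed=0):
    """Build a frame shaped like the one above (the seed is an added assumption)."""
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "KEY": rng.integers(0, n_keys, size=n_rows),
        "VALUE": rng.normal(size=n_rows),
    })


def best_groupby_time(df, repeat=5):
    """Return the fastest of `repeat` single-run group-by timings, in seconds."""
    timings = timeit.repeat(
        lambda: df.groupby("KEY")["VALUE"].sum(),
        number=1,
        repeat=repeat,
    )
    return min(timings)


if __name__ == "__main__":
    df = make_frame()
    print("best of 5: %.4f s" % best_groupby_time(df))
```

Taking the minimum rather than the mean follows the advice in the `timeit` documentation, since slower runs mostly reflect interference from other processes. For a like-for-like comparison, the Julia call would need a warm-up run as well, because its first timed call includes compilation.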