Last active
October 9, 2015 15:40
-
-
Save toast38coza/7ffbf26e2c51a57a8d75 to your computer and use it in GitHub Desktop.
Some useful snippets for analyzing data with pandas
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
""" | |
DataFrame: | |
project_id hours overtime day | |
0 3 4 False 2015-10-07 | |
1 3 6 False 2015-10-07 | |
2 3 7 False 2015-10-07 | |
3 2 5 False 2015-10-07 | |
4 2 4 False 2015-10-07 | |
5 1 4 False 2015-10-07 | |
6 1 3 False 2015-10-07 | |
7 1 2 False 2015-10-07 | |
8 1 1 False 2015-10-07 | |
""" | |
# group by projects and get the sum of their respective hours | |
df.groupby(by=['project_id'])['hours'].sum() | |
""" | |
project_id | |
1 10 | |
2 9 | |
3 17 | |
Name: hours, dtype: object | |
""" | |
# You can also aggregate with multiple functions: | |
df.groupby('project_id')['hours'].agg(['sum', 'count', 'prod']) | |
""" | |
sum count prod | |
project_id | |
1 10 4 24 | |
2 9 2 20 | |
3 17 3 168 | |
""" | |
# create a new column by mapping ids from a lookup: | |
projects = {1: "P1", 2: "P2", 3: "P3"} | |
df['project_title'] = df['project_id'].map(projects) | |
""" | |
project_id hours overtime day project_title | |
0 3 4 False 2015-10-07 P3 | |
1 3 6 False 2015-10-07 P3 | |
2 3 7 False 2015-10-07 P3 | |
3 2 5 False 2015-10-07 P2 | |
4 2 4 False 2015-10-07 P2 | |
5 1 4 False 2015-10-07 P1 | |
6 1 3 False 2015-10-07 P1 | |
7 1 2 False 2015-10-07 P1 | |
8 1 1 False 2015-10-07 P1 | |
""" | |
# calculate overtime hours by project: | |
... | |
# Show only data from project_id=1 | |
``` | |
df[df['project_id'] == 1] | |
``` | |
# Show daily hours for this project: | |
import datetime | |
d1=datetime.date(2015,9,10) | |
d2=datetime.date(2015,9,20) | |
ts = df[['day', 'hours']].groupby('day').sum()['hours'] | |
# we than can spit out a slice like so: | |
ts[d1:d2] | |
# Which gives us: | |
""" | |
In [207]: ts[d1:d2] | |
Out[207]: | |
day | |
2015-09-10 182.5 | |
2015-09-11 164.0 | |
2015-09-12 3 | |
2015-09-13 5 | |
2015-09-14 177.0 | |
2015-09-15 173.0 | |
2015-09-16 169 | |
2015-09-17 158.5 | |
2015-09-18 154.5 | |
2015-09-20 9.5 | |
Name: hours, dtype: object | |
""" | |
# Convert this to json: | |
ts[d1:d2].to_json(date_format='iso') | |
""" | |
Gives us: | |
'{"2015-09-10T00:00:00.000Z":182.5,"2015-09-11T00:00:00.000Z":164.0,"2015-09-12T00:00:00.000Z":3.0,"2015-09-13T00:00:00.000Z":5.0,"2015-09-14T00:00:00.000Z":177.0,"2015-09-15T00:00:00.000Z":173.0,"2015-09-16T00:00:00.000Z":169.0,"2015-09-17T00:00:00.000Z":158.5,"2015-09-18T00:00:00.000Z":154.5,"2015-09-20T00:00:00.000Z":9.5}' | |
""" | |
# we can fill in the blank days by simply reindexing: | |
ts.reindex(date_range(min(ts.index), max(ts.index))) | |
""" | |
2015-02-12 9 | |
2015-02-13 NaN | |
2015-02-14 NaN | |
2015-02-15 NaN | |
2015-02-16 NaN | |
2015-02-17 9 | |
2015-02-18 16 | |
2015-02-19 NaN | |
2015-02-20 NaN | |
2015-02-21 NaN | |
2015-02-22 NaN | |
2015-02-23 10 | |
""" | |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment