- dataframe from records:
pd.DataFrame.from_records([['A', 4], ['B', 2]], columns=['g', 'legs'], index=['g'])
- (un)pickle:
orders.to_pickle('saved.pkl'); pd.read_pickle('saved.pkl')
- filter data frame per (part of) index:
frame.loc[580]
- timestamp:
pd.Timestamp('2017-05-01')
- random sample:
frame.sample(n=20, random_state=42)
- columns list:
list(frame.columns)
- rename columns:
frame.rename({'from': 'to'})
- drop columns:
frame.drop(columns=['colname'])
- frame of bools:
frame['col'] > 5
- filter:
frame[frame_of_bools & more_bools]
- remove rows with null/NaN values:
frame.dropna()
(one null is enough to trigger the filter) - aggregation:
frame.groupby('colname').agg(['count', 'min', 'max'])
- merge on index:
frame.merge(another_frame, left_index=True, right_index=True)
- merge on index via join:
frame.join(another_frame, how='inner')
- index changes:
frame.set_index(['col1', 'col2'])
,frame.reset_index()
- only rows where index is in given set:
frame.filter(items=[458, 580], axis='index')
- regex replace in string series
series.str.replace(r'foo(.*)', r'bar\1', case=False)
- per-row mapping:
frame.apply(lambda row: pd.Series([row[0] + 1], index=[row.index[0]]), axis='columns')
frame.apply(lambda row: [row[0] + row[1]], axis='columns', result_type='expand')
- mapping in a series:
series.map(lambda x: x + 5)
- don't truncate table display:
pd.set_option('display.max_colwidth', -1)
- display table with HTML:
IPython.core.display.HTML(frame.to_html(escape=False))
- sort by column:
frame.sort_values(['col1', 'col2'], ascending=True)
- item labels:
mi.labels
->[[1,1,2], ['blue', 'red', 'blue']]
- unique labels for each level:
mi.levels
->[[1,2,3], ['blue', 'red']]
- column names:
mi.names
->['id', 'color']