
@dimytr
dimytr / gist:298b99004afcd30fdffc27797d2037ec
Created November 4, 2018 11:33 — forked from conormm/r-to-python-data-wrangling-basics.md
R to Python: Data wrangling with dplyr and pandas
R to Python: useful data wrangling snippets
The dplyr package in R makes data wrangling significantly easier.
The beauty of dplyr is that, by design, the options available are limited.
Specifically, a small set of key verbs forms the core of the package.
Using these verbs, you can solve a wide range of data problems effectively and in less time.
Whilst transitioning to Python, I have greatly missed the ease with which I can think through and solve problems using dplyr in R.
The purpose of this document is to demonstrate how to execute the key dplyr verbs when manipulating data using Python (with the pandas package).
dplyr is organised around six key verbs
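A minimal sketch of how the core verbs map onto pandas, assuming the six verbs are filter, select, arrange, mutate, summarise and group_by (the standard core set in dplyr). The toy DataFrame and its columns (`species`, `mass`, `height`) are hypothetical and only there to make the mapping runnable.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the verb mapping
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "c"],
    "mass": [10.0, 12.0, 30.0, 28.0, 5.0],
    "height": [1.0, 1.2, 2.0, 1.9, 0.5],
})

# filter(df, mass > 8)  ->  boolean mask (df.query("mass > 8") also works)
filtered = df[df["mass"] > 8]

# select(df, species, mass)  ->  list of column names
selected = df[["species", "mass"]]

# arrange(df, desc(mass))  ->  sort_values
arranged = df.sort_values("mass", ascending=False)

# mutate(df, ratio = mass / height)  ->  assign (returns a new DataFrame)
mutated = df.assign(ratio=df["mass"] / df["height"])

# df %>% group_by(species) %>% summarise(mean_mass = mean(mass), n = n())
#   ->  groupby + named aggregation
summary = (
    df.groupby("species", as_index=False)
      .agg(mean_mass=("mass", "mean"), n=("mass", "size"))
)

print(summary)
```

As in dplyr, each pandas call returns a new object and leaves `df` unchanged, so the steps can be chained in the same pipeline style.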
dimytr / gist:a4d145bd1f863ce037e96d675060811a
Created November 4, 2018 11:33 — forked from conormm/r-to-python-data-wrangling-basics.md
dimytr / prep.py
Created April 17, 2018 20:30
Data preparation for Alice
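import pandas as pd


def prepare_train_set(PATH_TO_DATA, session_length=10):
    # note: session_length is not used in this function
    df = pd.read_csv(PATH_TO_DATA)  # expects at least 'ID' and 'site' columns
    # integer code per distinct site, in order of first appearance
    df['site_ID'] = pd.factorize(df['site'])[0]
    # total number of visits to each site, repeated on every row
    df['freq'] = df.groupby('site_ID')['site'].transform('count')
    # keep the first row for each site, then map site -> [site_ID, freq]
    dictionary = df[['site', 'site_ID', 'freq']].drop_duplicates(subset='site')
    dic = dictionary.set_index('site').T.to_dict('list')
    # ID x site table; each cell sums 'freq' over that ID's visits to the site
    df_r = pd.pivot_table(df, values='freq', index='ID', columns='site', aggfunc='sum', fill_value=0)
    return df_r, dic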
'''
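Selecting one row per site for the lookup dictionary is easy to get subtly wrong. `.loc` selects by index label, and `pd.unique(df['site_ID'])` is just `0, 1, ..., k-1`, so `.loc[pd.unique(df['site_ID'])]` returns the first k rows of the log rather than the first row for each site. `drop_duplicates(subset='site')` keeps exactly one row per site. The small sketch below shows the difference on a made-up visit log (the `ID` values and site names are hypothetical).

```python
import pandas as pd

# Hypothetical visit log: one row per visit
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "site": ["vk.com", "google.com", "vk.com", "google.com", "mail.ru"],
})
df["site_ID"] = pd.factorize(df["site"])[0]  # vk.com=0, google.com=1, mail.ru=2
df["freq"] = df.groupby("site_ID")["site"].transform("count")

cols = ["site", "site_ID", "freq"]

# Label-based: picks rows 0, 1, 2 of the log, so vk.com repeats and mail.ru is lost
by_label = df[cols].loc[pd.unique(df["site_ID"])]

# One row per site, keeping its first occurrence
per_site = df[cols].drop_duplicates(subset="site")

# site -> [site_ID, freq]
dic = per_site.set_index("site").T.to_dict("list")

print(by_label["site"].tolist())  # ['vk.com', 'google.com', 'vk.com']
print(dic)
```

With the label-based version, the later `set_index('site')` would also end up with a duplicated `vk.com` column, so the site-to-ID mapping could not be built cleanly.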