@hussainsultan
Last active May 23, 2019 15:50
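import pandas as pd
import dask.dataframe as dd
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; needs an older release

df = dd.from_pandas(pd.DataFrame(load_boston().data), npartitions=10)

def operation(df):
    df['new'] = df[0]  # copy column 0 into a new column
    return df[['new']]  # keep only the new column

# '*' is replaced by the partition number, giving one CSV per partition.
df.pipe(operation).to_csv('boston*.csv')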
Out:
['boston0.csv',
'boston1.csv',
'boston2.csv',
'boston3.csv',
'boston4.csv',
'boston5.csv',
'boston6.csv',
'boston7.csv',
'boston8.csv',
'boston9.csv']
In [5]: ls
boston0.csv boston1.csv boston2.csv boston3.csv boston4.csv boston5.csv boston6.csv boston7.csv boston8.csv boston9.csv