@geniusnhu
Created September 1, 2021 03:45
Loading and training on data in chunks
>>> from sklearn.linear_model import SGDRegressor
>>> from sklearn.datasets import make_regression
>>> import numpy as np
>>> import pandas as pd
>>> ### Load original data
>>> original_data = pd.read_csv('sample.csv')
>>> print(f'Shape of original data {original_data.shape}')
Shape of original data (100000, 21)
>>> ### Load in chunk
>>> chunksize = 1000
>>> reg = SGDRegressor()
>>> features_columns = [str(i) for i in range(20)]
>>> ### Fit each chunk
>>> for train_df in pd.read_csv("sample.csv", chunksize=chunksize, iterator=True):
...     X = train_df[features_columns]
...     Y = train_df["target"]
...     reg.partial_fit(X, Y)
### reg.partial_fit() fits one chunk at a time and updates the model weights after each chunk is loaded
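
The snippet above depends on a local sample.csv that is not included here. The following is a minimal, self-contained sketch of the same chunked-training loop. It assumes a synthetic stand-in file built with make_regression; the 5000-row size, the noise level, and the temporary file path are illustrative choices, not part of the original data. Only the column layout ("0" to "19" plus "target") follows the snippet.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Assumption: a synthetic stand-in for sample.csv with 20 feature
# columns named "0".."19" and a "target" column.
features_columns = [str(i) for i in range(20)]
X_all, y_all = make_regression(
    n_samples=5000, n_features=20, noise=1.0, random_state=0
)
data = pd.DataFrame(X_all, columns=features_columns)
data["target"] = y_all

# Write the stand-in file to a temporary directory (hypothetical path).
csv_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
data.to_csv(csv_path, index=False)

# Stream the file 1000 rows at a time; only one chunk is held in memory.
reg = SGDRegressor(random_state=0)
n_chunks = 0
for train_df in pd.read_csv(csv_path, chunksize=1000):
    # partial_fit updates the existing weights instead of refitting from scratch.
    reg.partial_fit(train_df[features_columns], train_df["target"])
    n_chunks += 1

print(f"Fitted on {n_chunks} chunks; learned {reg.coef_.shape[0]} coefficients")
```

The synthetic features here are already on a similar scale. SGD-based estimators are sensitive to feature scaling, so real data would likely need standardizing before each partial_fit call.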