@GuidoTournois
Last active November 30, 2023 03:32
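import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

datafile = "data.csv"
chunksize = 100000
models = []

# `features` (list of feature column names) and
# pre_process_and_feature_engineer() are assumed to be defined elsewhere.
for chunk in pd.read_csv(datafile, chunksize=chunksize):
    # A function to clean my data and create my features
    chunk = pre_process_and_feature_engineer(chunk)
    # Fit one model per chunk so the full file never has to be loaded at once
    model = LogisticRegression()
    model.fit(chunk[features], chunk['label'])
    models.append(model)

df = pd.read_csv("data_to_score.csv")
df = pre_process_and_feature_engineer(df)
# Average the per-model predictions; with 0/1 labels this is the share of models voting 1
predictions = np.mean([model.predict(df[features]) for model in models], axis=0)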
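
The snippet above depends on a CSV file, a `features` list, and a preprocessing function that are not shown. The following is a minimal, self-contained sketch of the same per-chunk ensemble idea on synthetic data. The column names `x1`, `x2`, and `label`, the chunk size of 250, and the 0.5 threshold are assumptions made for illustration. It also averages `predict_proba` outputs rather than hard labels, which is a swapped-in variant and not what the original code does.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for data.csv (hypothetical columns x1, x2, label).
rng = np.random.default_rng(0)
n_rows = 1000
x = rng.normal(size=(n_rows, 2))
label = (x[:, 0] + x[:, 1] > 0).astype(int)
frame = pd.DataFrame({"x1": x[:, 0], "x2": x[:, 1], "label": label})
csv_text = frame.to_csv(index=False)

features = ["x1", "x2"]

# Fit one model per chunk, mirroring the loop in the snippet above.
models = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    model = LogisticRegression()
    model.fit(chunk[features], chunk["label"])
    models.append(model)

# Average the positive-class probability across models, then threshold.
probabilities = np.mean(
    [m.predict_proba(frame[features])[:, 1] for m in models], axis=0
)
predictions = (probabilities >= 0.5).astype(int)
accuracy = float((predictions == frame["label"]).mean())
```

Averaging probabilities keeps each model's confidence, whereas averaging hard labels only counts votes. Either way, every chunk needs to contain both classes, or the corresponding `fit` call will fail.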