@frenzy2106
Last active April 6, 2019 08:28
Simple Baseline solution for ML Hikeathon
'''
Baseline script by Supreet Manyam (Ziron).

Scores each test pair as the average of the training chat rates of its two nodes.
'''
import pandas as pd
import numpy as np
import gc
# Load only the columns the baseline needs, with explicit dtypes to limit memory use.
train = pd.read_csv("data/train.csv",
                    usecols=["node1_id", "node2_id", "is_chat"],
                    dtype={"node1_id": np.int64, "node2_id": np.int64, "is_chat": np.int8})
test = pd.read_csv("data/test.csv",
                   usecols=["id", "node1_id", "node2_id"],
                   dtype={"id": np.int64, "node1_id": np.int64, "node2_id": np.int64})
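# Chat rate of each node: mean of is_chat over the training pairs where the node
# appears in the first position (node1) or in the second position (node2).
node1 = train.groupby("node1_id", as_index=False)["is_chat"].mean()
node2 = train.groupby("node2_id", as_index=False)["is_chat"].mean()
# The raw training frame is no longer needed, so release its memory before merging.
del train
gc.collect()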
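# Attach both rates to each test pair. Both lookups carry an "is_chat" column, so the
# second merge renames them to "is_chat_1" (node1 rate) and "is_chat_2" (node2 rate).
test = (test.merge(node1, on="node1_id", how="left")
            .merge(node2, on="node2_id", how="left", suffixes=('_1', '_2')))
# Score = equal-weight average of the two rates; nodes unseen in train count as 0.
test["is_chat"] = test["is_chat_1"].fillna(0)*0.5 + test["is_chat_2"].fillna(0)*0.5
test[["id", "is_chat"]].to_csv("submit.csv", index=False)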
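
The script above can be checked on a tiny in-memory example without the competition files. The sketch below is not part of the original baseline: the toy frames, the helper name `baseline_scores`, and the `fill_value` parameter are all hypothetical and exist only for illustration. It repeats the same groupby, merge, and blend steps, and shows how the score for a node that never appears in train changes if the global chat rate is used as the fallback in place of 0. Whether that fallback helps on the real data is untested here.

```python
import pandas as pd

# Toy stand-ins for data/train.csv and data/test.csv (made-up values).
toy_train = pd.DataFrame({
    "node1_id": [1, 1, 2, 3],
    "node2_id": [10, 20, 10, 20],
    "is_chat":  [1, 0, 0, 0],
})
toy_test = pd.DataFrame({
    "id": [100, 101],
    "node1_id": [1, 4],   # node 4 never appears in toy_train
    "node2_id": [10, 20],
})


def baseline_scores(train_df, test_df, fill_value=0.0):
    """Hypothetical helper: average the per-node chat rates for each test pair."""
    # Per-node chat rate, renamed up front so the merges need no suffixes.
    rate1 = (train_df.groupby("node1_id", as_index=False)["is_chat"].mean()
             .rename(columns={"is_chat": "is_chat_1"}))
    rate2 = (train_df.groupby("node2_id", as_index=False)["is_chat"].mean()
             .rename(columns={"is_chat": "is_chat_2"}))
    out = (test_df.merge(rate1, on="node1_id", how="left")
                  .merge(rate2, on="node2_id", how="left"))
    # Equal-weight blend; unseen nodes fall back to fill_value.
    out["is_chat"] = (out["is_chat_1"].fillna(fill_value) * 0.5
                      + out["is_chat_2"].fillna(fill_value) * 0.5)
    return out[["id", "is_chat"]]


# Same behaviour as the baseline: unseen nodes contribute 0.
zero_fill = baseline_scores(toy_train, toy_test)

# Assumed alternative: unseen nodes contribute the global chat rate.
mean_fill = baseline_scores(toy_train, toy_test,
                            fill_value=toy_train["is_chat"].mean())

print(zero_fill)
print(mean_fill)
```

In this toy data, pair 101 contains the unseen node 4, so only its score differs between the two calls. Pair 100 is scored identically either way.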