@ImadDabbura
Created August 3, 2018 20:30
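# Assumes `df` is an already-loaded pandas DataFrame with a binary "left" column.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Convert the dataframe into NumPy arrays and split them into
# stratified train and test sets: 80/20
X = df.loc[:, df.columns != "left"].values
y = df.loc[:, df.columns == "left"].values.flatten()
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1)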
# Upsample minority class
X_train_u, y_train_u = resample(X_train[y_train == 1],
                                y_train[y_train == 1],
                                replace=True,
                                n_samples=X_train[y_train == 0].shape[0],
                                random_state=1)
X_train_u = np.concatenate((X_train[y_train == 0], X_train_u))
y_train_u = np.concatenate((y_train[y_train == 0], y_train_u))
# Downsample majority class (sample without replacement so that no
# majority example is duplicated in the reduced set)
X_train_d, y_train_d = resample(X_train[y_train == 0],
                                y_train[y_train == 0],
                                replace=False,
                                n_samples=X_train[y_train == 1].shape[0],
                                random_state=1)
X_train_d = np.concatenate((X_train[y_train == 1], X_train_d))
y_train_d = np.concatenate((y_train[y_train == 1], y_train_d))
print("Original shape:", X_train.shape, y_train.shape)
print("Upsampled shape:", X_train_u.shape, y_train_u.shape)
print("Downsampled shape:", X_train_d.shape, y_train_d.shape)
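
The shapes printed above show only the total number of rows, not whether the two classes ended up balanced. A minimal sketch of a per-class count check follows. The helper `class_counts` and the toy label arrays are hypothetical and not part of the original gist. The toy arrays stand in for `y_train`, `y_train_u` and `y_train_d`.

```python
import numpy as np


def class_counts(y):
    """Return {label: count} for a 1-D integer label array (hypothetical helper)."""
    labels, counts = np.unique(y, return_counts=True)
    return {int(label): int(count) for label, count in zip(labels, counts)}


# Made-up labels: six majority (0) and two minority (1) examples.
y_toy = np.array([0, 0, 0, 0, 0, 0, 1, 1])
# Stand-in for an upsampled label vector: minority repeated to match the majority.
y_toy_up = np.concatenate((y_toy[y_toy == 0], np.repeat(1, 6)))
# Stand-in for a downsampled label vector: majority cut to match the minority.
y_toy_down = np.concatenate((y_toy[y_toy == 1], y_toy[y_toy == 0][:2]))

for name, labels in [("original", y_toy),
                     ("upsampled", y_toy_up),
                     ("downsampled", y_toy_down)]:
    print(name, class_counts(labels))
```

Calling `class_counts(y_train_u)` and `class_counts(y_train_d)` on the real arrays should report equal counts for both classes if the resampling worked as intended.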