@carlos-aguayo
Created July 19, 2017 22:07
Comparison for XGBoost, LightGBM and a Neural Network
import pandas as pd
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import LabelEncoder

# Load the voice-gender dataset; the last column holds the "male"/"female" label.
dataset = pd.read_csv('https://github.com/primaryobjects/voice-gender/raw/master/voice.csv', header=0).values
x = dataset[:, :-1]
y = dataset[:, -1]

# Encode the string labels as 0/1 integers.
label_encoder = LabelEncoder().fit(y)
label_encoded_y = label_encoder.transform(y)

# Hold out a third of the data for evaluation.
test_size = 0.33
seed = 7
x_training, x_test, y_training, y_test = train_test_split(x,
                                                           label_encoded_y,
                                                           test_size=test_size,
                                                           random_state=seed)

# XGBoost: an out-of-the-box XGBClassifier with default hyperparameters.
model = XGBClassifier()
model.fit(x_training, y_training)
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy XGBoost: %.2f%%" % (accuracy * 100.0))

# LightGBM: use the native Dataset/train API with a binary objective.
import lightgbm as lgb

lgb_train = lgb.Dataset(x_training, y_training)
lgb_eval = lgb.Dataset(x_test, y_test, reference=lgb_train)

params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'verbose': -1
}

gbm = lgb.train(params,
                lgb_train,
                verbose_eval=False,
                valid_sets=lgb_eval)

# predict() returns probabilities for the binary objective, so round to get class labels.
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration)
accuracy = accuracy_score(y_test, y_pred.round())
print("Accuracy LightGBM: %.2f%%" % (accuracy * 100.0))

# Neural Network: a small fully connected Keras model
# (two ReLU hidden layers and a sigmoid output).
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(48*3, input_dim=x_training.shape[1], activation='relu'))
model.add(Dense(8*1, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_training, y_training, epochs=5, verbose=0)

# The sigmoid output is a probability, so round to get class labels.
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred.round())
print("Accuracy Neural Network: %.2f%%" % (accuracy * 100.0))
@carlos-aguayo (Author)

Expected output

$ python voicegender.py
Accuracy XGBoost: 97.51%
Accuracy LightGBM: 97.71%
Using TensorFlow backend.
Accuracy Neural Network: 75.43%

@istavnit

How do you know that your NN topology is optimal for the dataset?
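
One way to probe this is a small sweep over candidate topologies on the same held-out split, keeping whichever width validates best. The sketch below is illustrative only: the candidate widths, the single hidden layer, and the 20-epoch budget are arbitrary assumptions, not choices taken from the gist.

# Hypothetical topology sweep (not from the gist): try a few hidden-layer
# widths and report held-out accuracy for each.
from keras.models import Sequential
from keras.layers import Dense

for width in [8, 32, 64, 144]:
    m = Sequential()
    m.add(Dense(width, input_dim=x_training.shape[1], activation='relu'))
    m.add(Dense(1, activation='sigmoid'))
    m.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    m.fit(x_training, y_training, epochs=20, verbose=0)
    acc = accuracy_score(y_test, m.predict(x_test).round())
    print("width=%d  accuracy=%.2f%%" % (width, acc * 100.0))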
