xgboost sending sparse zeroes to missing node
@DexGroves, created 1 March 2016
library("data.table")
library("xgboost")
library("Matrix")
generate_data <- function(N) {
  data.table(
    response = as.numeric(runif(N) > 0.8),
    int1     = round(rnorm(N, 3, 3))
  )
}
N <- 1000
# Need a seed for which the tree's missing branch points to the node that a
# zero value would not otherwise reach via the split
set.seed(1236)
train <- generate_data(N)
smm_train <- sparse.model.matrix(response ~ int1, train)
dtrain <- xgb.DMatrix(data = smm_train, label = train[, response])
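# Optional sanity check (not part of the original gist): in the sparse model
# matrix, rows with int1 == 0 carry no stored entry for the int1 column, and
# these structural zeroes are what xgboost ends up treating as missing.
sum(train$int1 == 0)                                                 # rows where int1 is zero
nrow(smm_train) - Matrix::nnzero(smm_train[, "int1", drop = FALSE])  # same count: implicit zeroes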
model <- xgb.train(params = list(eta = 1,
                                 max_depth = 1,
                                 min_child_weight = 10,
                                 subsample = 1.0,
                                 objective = "binary:logistic",
                                 eval_metric = "logloss"),
                   data = dtrain,
                   nrounds = 1)
xgb.dump(model = model)
# booster[0]
# 0:[f1<6.5] yes=1,no=2,missing=2
# 1:leaf=-1.11714
# 2:leaf=-1.37056
# -> Splits int1 at 6.5; rows with a missing int1 are routed to node 2
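# The leaf values above are log-odds; applying the logistic link recovers the
# two predicted probabilities seen in the table below (a quick check, not part
# of the original gist):
plogis(-1.11714)  # ~0.2465 -- node 1: int1 < 6.5
plogis(-1.37056)  # ~0.2025 -- node 2: int1 >= 6.5, or int1 missing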
train[, pred := predict(model, dtrain)]
train[, mean(pred), by = int1][order(int1)]
#     int1        V1
#  1:   -6 0.2465423
#  2:   -5 0.2465423
#  3:   -4 0.2465423
#  4:   -3 0.2465423
#  5:   -2 0.2465423
#  6:   -1 0.2465423
#  7:    0 0.2025297   <- zero gets the missing-node prediction!
#  8:    1 0.2465423
#  9:    2 0.2465423
# 10:    3 0.2465423
# 11:    4 0.2465423
# 12:    5 0.2465423
# 13:    6 0.2465423
# 14:    7 0.2025297
# 15:    8 0.2025297
# 16:    9 0.2025297
# 17:   10 0.2025297
# 18:   11 0.2025297
# 19:   12 0.2025297
# 20:   13 0.2025297
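# What happened: sparse.model.matrix drops the zero entries, so xgboost never
# sees int1 == 0 as a value. Those rows follow the missing=2 branch and get the
# node-2 prediction (0.2025) even though 0 < 6.5.
#
# One possible workaround (a sketch, not part of the original gist): build the
# DMatrix from a dense model matrix so zeroes stay explicit and only NA is
# treated as missing.
dense_train <- model.matrix(response ~ int1, train)
ddense <- xgb.DMatrix(data = dense_train, label = train[, response])
model_dense <- xgb.train(params = list(eta = 1,
                                       max_depth = 1,
                                       min_child_weight = 10,
                                       subsample = 1.0,
                                       objective = "binary:logistic",
                                       eval_metric = "logloss"),
                         data = ddense,
                         nrounds = 1)
train[, pred_dense := predict(model_dense, ddense)]
train[, mean(pred_dense), by = int1][order(int1)]  # int1 == 0 now follows the split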