@MarkEdmondson1234
Last active September 23, 2015 17:35
## want: 30049 x 187
## userId, product1_view, product2_view, ...., productN_view, boughtSku
pv <- reshape2::recast(product_views,
dimension1 ~ productSku + variable,
fun.aggregate=sum)
library(dplyr)
## if a user buys more than once, the row will be duplicated
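## rename productSku to boughtSku so the response column used below exists after the join
pt <- product_trans %>% select(boughtSku = productSku, dimension1)
model_data <- left_join(pv, pt, by = "dimension1")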
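## Illustrative helper (hypothetical name, not part of the original gist):
## a minimal sketch that counts the extra rows the join creates for users
## with more than one purchase, assuming the user id sits in `dimension1`.
count_repeat_rows <- function(df, id_col = "dimension1") {
  ## duplicated() flags every occurrence of an id after its first one
  sum(duplicated(df[[id_col]]))
}
## e.g. count_repeat_rows(model_data)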
## NAs are no sale
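## convert to character first: assigning a new value to a factor would yield NA
model_data$boughtSku <- as.character(model_data$boughtSku)
model_data$boughtSku[is.na(model_data$boughtSku)] <- "NoSale"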
## splitting into training and test:
## 75% of the sample size
smp_size <- floor(0.75 * nrow(model_data))
## set the seed to make the partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(model_data)), size = smp_size)
## split the data
train <- model_data[train_ind, ]
test <- model_data[-train_ind, ]
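## Illustrative sanity check (hypothetical helper, not part of the original gist):
## a minimal sketch confirming that the two partitions together account for
## every row of the full data set.
check_split <- function(full, train, test) {
  nrow(train) + nrow(test) == nrow(full)
}
## e.g. stopifnot(check_split(model_data, train, test))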
## what to use in the model
predictors <- train[,which(!names(train) %in% c("dimension1","boughtSku"))]
response <- as.factor(train[,"boughtSku"])