@klauszhang
Created May 12, 2016 06:21
expedia hotel prediction
# forked from https://www.kaggle.com/signochastic/expedia-hotel-recommendations/r-version-of-most-popular-local-hotel
## R version of most popular local hotels
library(data.table)
expedia_train <- fread('../input/train.csv', header=TRUE)
expedia_test <- fread('../input/test.csv', header=TRUE)
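# Popularity score for one (destination, cluster) group.
# Assuming is_booking is a 0/1 flag, every row contributes 0.165 and
# every booking contributes a further 0.835, so a booking counts 1.0
# and a non-booking row counts 0.165.
sum_and_count <- function(x){
  sum(x)*0.835 + length(x)*0.165
}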
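# Illustrative sanity check (not part of the original script): a minimal
# worked example of the weighting, assuming is_booking holds 0/1 values.
# One booking plus two non-booking rows: 1*0.835 + 3*0.165 = 1.33.
stopifnot(isTRUE(all.equal(sum_and_count(c(1, 0, 0)), 1.33)))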
# Score every (srch_destination_id, hotel_cluster) pair; data.table names the result column V1.
dest_id_hotel_cluster_count <- expedia_train[,sum_and_count(is_booking),by=list(srch_destination_id, hotel_cluster)]
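# Return up to five cluster ids, ordered by descending score,
# as a single space-separated string.
top_five <- function(hc,v1){
  hc_sorted <- hc[order(v1,decreasing=TRUE)]
  n <- min(5,length(hc_sorted))
  paste(hc_sorted[1:n],collapse=" ")
}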
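# Illustrative sanity check (not part of the original script), using made-up
# cluster ids and scores: clusters 10, 20, 30 scored 1, 3, 2 should come back
# ordered by descending score.
stopifnot(top_five(c(10, 20, 30), c(1, 3, 2)) == "20 30 10")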
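# Top five clusters per destination.
dest_top_five <- dest_id_hotel_cluster_count[,top_five(hotel_cluster,V1),by=srch_destination_id]
# Left join onto the test set, then restore test order by id.
# Destinations that never appear in train get NA as their prediction.
dd <- merge(expedia_test,dest_top_five, by="srch_destination_id",all.x=TRUE)[order(id),list(id,V1)]
setnames(dd,c("id","hotel_cluster"))
# Write the submission file with columns id and hotel_cluster.
write.csv(dd, file='submission_sum_and_count.csv', row.names=FALSE)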