@szilard
Last active August 29, 2015 14:20
Generate integer-encoded categoricals
## generate integer-encoded categoricals
for SIZE in 1; do
time R --vanilla --quiet << EOF
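library(data.table)
# read train and test, then stack them so both share one category-to-integer mapping
d1 <- as.data.frame(fread("train-${SIZE}m.csv"))
d2 <- as.data.frame(fread("test.csv"))
d <- rbind(d1,d2)
# replace each categorical column with 0-based integer codes (in factor level order)
for (k in c("Month","DayofMonth","DayOfWeek","UniqueCarrier","Origin","Dest")) {
d[,k] <- as.numeric(as.factor(d[,k]))-1
}
# encode the target as 1 for "Y" and 0 otherwise
d[["dep_delayed_15min"]] <- ifelse(d[["dep_delayed_15min"]]=="Y",1,0)
# split back into train and test by row position
dd1 <- d[1:nrow(d1),]
dd2 <- d[(nrow(d1)+1):(nrow(d1)+nrow(d2)),]
# write both sets without header or row names
write.table(dd1, "train-intcateg-${SIZE}m.csv", row.names=FALSE, col.names=FALSE, sep=",")
write.table(dd2, "test-intcateg-${SIZE}m.csv", row.names=FALSE, col.names=FALSE, sep=",")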
EOF
done
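
The script stacks train and test before encoding, presumably so that a given category receives the same integer in both files. A minimal sketch of that step on toy data follows; the data frames `train` and `test` and their airport codes are hypothetical stand-ins, not taken from the real CSV files.

```r
# Toy stand-ins for the train and test sets (hypothetical data).
train <- data.frame(Origin = c("SFO", "JFK", "SFO"), stringsAsFactors = FALSE)
test  <- data.frame(Origin = c("LAX", "JFK"), stringsAsFactors = FALSE)

# Encode on the combined data so both sets share one level-to-code mapping.
# as.factor sorts the levels (JFK, LAX, SFO), and subtracting 1 makes the codes 0-based.
both <- rbind(train, test)
both[, "Origin"] <- as.numeric(as.factor(both[, "Origin"])) - 1

# Split back by row position; drop = FALSE keeps one-column data frames.
train_enc <- both[1:nrow(train), , drop = FALSE]
test_enc  <- both[(nrow(train) + 1):nrow(both), , drop = FALSE]

print(train_enc$Origin)  # 2 0 2
print(test_enc$Origin)   # 1 0
```

Here "JFK" maps to 0 in both sets even though "LAX" appears only in the test data. Encoding the two sets separately would give "JFK" the code 0 in train but also 0 in test only by coincidence of sort order, and "LAX" would collide with "SFO" at code 1.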