@bwv988
Created July 24, 2017 23:58
Question on treating "NA" values in H2O
# Question on treating "NA" values.
# RS25072017
# PROBLEM STATEMENT: Replace the "?" values in the data set with "n". ---------------------
library(h2o)
# Modify below, as needed.
h2o.init(startH2O = FALSE)
# 1. The R way. ---------------------
# Using the 1984 Congressional Voting Records from UCI.
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/voting-records/house-votes-84.data"
# Load the data.
party.data = read.table(url, sep = ",")
colnames(party.data) = c("party", paste0("vote", 1:16))
# I want to do this to treat the "NA" values:
party.data[party.data == "?"] = "n"
# No more "?" in the df:
head(party.data)
# 2. The H2O way. ---------------------
voting.data = h2o.importFile(path = url,
                             col.names = c("party", paste0("vote", 1:16)),
                             col.types = rep("string", 17))
# This doesn't work:
voting.data[voting.data == "?", ] = "n"
# This still has NA's!
head(voting.data)
# Why? Look at the difference in semantics:
## Pure R
party.data == "?"
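# Illustration on a toy data frame (hypothetical values, not the UCI data):
# in base R the comparison yields a logical matrix with the same shape as
# the data frame, which is why it can be used directly as a cell-wise index.
toy = data.frame(party = c("republican", "democrat"),
                 vote1 = c("?", "y"),
                 stringsAsFactors = FALSE)
mask = toy == "?"
is.matrix(mask) && is.logical(mask)
# Assigning through the logical matrix replaces only the matching cells.
toy[mask] = "n"
toy$vote1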
## H2O
voting.data == "?"
## The H2O result is an H2OFrame, not a logical matrix as in base R.
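# A possible workaround (a sketch, not a confirmed answer to the question):
# assuming the data is small enough to clean in R first, push the already
# cleaned party.data from step 1 into the cluster with as.h2o().
# The name voting.clean is hypothetical.
voting.clean = as.h2o(party.data)
head(voting.clean)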