Created
July 24, 2017 23:58
Question on treating "NA" values in H2O
# Question on treating "NA" values.
# RS25072017
# PROBLEM STATEMENT: Remove the "?" values in the data set, and turn them into "n". ---------------------
require(h2o)
# Modify below, as needed.
h2o.init(startH2O = FALSE)
# 1. The R way. ---------------------
# Using the 1984 Congressional Voting Records from UCI.
url = "http://archive.ics.uci.edu/ml/machine-learning-databases/voting-records/house-votes-84.data"
# Load the data.
party.data = read.table(url, sep = ",")
colnames(party.data) = c("party", paste("vote", 1:16, sep = ""))
# This is how I want to treat the "NA" values:
party.data[party.data == "?"] = "n"
# No more "?" in the df:
head(party.data)
# 2. The H2O way. ---------------------
voting.data = h2o.importFile(path = url,
                             col.names = c("party", paste0("vote", 1:16)),
                             col.types = rep("string", 17))
# This doesn't work:
voting.data[voting.data == "?", ] = "n"
# This still has NA's!
head(voting.data)
# Why? Look at the difference in semantics:
## Pure R
party.data == "?"
## H2O
voting.data == "?"
## It's not a boolean.
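# Sketch (not part of the original question): a minimal, self-contained
# illustration of the pure R semantics compared above. The data frame `toy`
# and the variable `mask` are made-up names for illustration only; no H2O
# cluster or download is needed. It shows that `==` on a data.frame yields a
# logical matrix, which is what lets the assignment in section 1 work.
toy = data.frame(party = c("republican", "democrat"),
                 vote1 = c("?", "y"),
                 vote2 = c("n", "?"),
                 stringsAsFactors = FALSE)
# Element-wise comparison: a logical matrix with the same shape as `toy`.
mask = toy == "?"
is.logical(mask)
dim(mask)
# A logical matrix is accepted as an index, so every "?" cell is replaced.
toy[mask] = "n"
# Check whether any "?" is left.
any(toy == "?")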