@infotroph
Last active March 27, 2016 00:45
Mapping arbitrary strings to arbitrary numbers
# I have a vector of strings that map to known numeric values.
# What's the cleanest/most reader-friendly R idiom for this conversion?
# sample data
df = expand.grid(
  x_str = c("string1", "secondstring", "blah", "garbagestring"),
  replicate = 1:3,
  stringsAsFactors = FALSE)
# Approach 1: Encode the look-up table as its own dataframe
# (strip.white is spelled out in full rather than relying on partial matching of "strip")
string_map = read.csv(strip.white=TRUE, stringsAsFactors=FALSE, text="
x_str, x_num
string1, 10
secondstring, 256
blah, 3.9
")
# Note: merge() does not preserve the original row order of df.
df = merge(df, string_map, all.x=TRUE)
# Approach 2: Enum-like
mapit = function(s){
  switch(
    s,
    "string1"=10,
    "secondstring"=256,
    "blah"=3.9,
    NA)  # unnamed last argument is the default for unmatched strings
}
df$x_num2 = sapply(df$x_str, mapit)
# Approach 3: As a factor
df$x_num3 = factor(
  df$x_str,
  levels=c("string1", "secondstring", "blah"),
  labels=c(10, 256, 3.9))
df$x_num3 = as.numeric(levels(df$x_num3))[df$x_num3]
# Approach 4 (current winner): A simple named vector
# Credit to mrflick
mapper<-c(string1=10, secondstring=256, blah=3.9)
df$x_num4<-mapper[df$x_str]
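# A minimal sketch, not part of the original comparison: indexing a named
# vector by name keeps the names on the result and yields NA for unknown keys;
# unname() drops the names when a plain numeric vector is wanted.
# demo_map, demo_keys, demo_named and demo_plain are hypothetical names used
# only for illustration.
demo_map = c(string1=10, secondstring=256, blah=3.9)
demo_keys = c("blah", "garbagestring", "string1")
demo_named = demo_map[demo_keys]  # named numeric vector; the unknown key gives NA
demo_plain = unname(demo_named)   # same values with the names attribute removed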
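# Are the results equivalent? Yes, after dropping any names attached to the results.
# identical() compares only two objects per call, so check each column against x_num.
all(sapply(
  list(df$x_num2, df$x_num3, df$x_num4),
  function(v) isTRUE(all.equal(df$x_num, unname(v)))))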