@hadley
Created July 10, 2012 17:48
library(tm)
x <- paste(sample(c(LETTERS, " "), 1e4, replace = TRUE), collapse = "")
words <- stopwords("english")
match <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
system.time(gsub(match, "", x))
# Takes ~0.08 s
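# Home-made solution. Not quite equivalent: grepl(fixed = TRUE) matches
# substrings, so a token is dropped if it contains a stopword anywhere.
system.time({
  xs <- strsplit(x, " ", fixed = TRUE)[[1]]
  d <- vapply(words, grepl, x = xs, fixed = TRUE,
    FUN.VALUE = logical(length(xs)))
  # Keep only the tokens that matched no stopword
  paste(xs[rowSums(d) == 0], collapse = " ")
})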
# Takes ~0.04 s
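# A possible whole-word variant (a sketch only, not timed in the original).
# Matching whole tokens with %in% is closer to the \\b regex than the
# substring test above. `drop_stopwords` is a hypothetical helper name, and
# the lookup is case-sensitive, like gsub() without ignore.case.
drop_stopwords <- function(text, stops) {
  # Split on single spaces, then keep the tokens that are not stopwords
  tokens <- strsplit(text, " ", fixed = TRUE)[[1]]
  paste(tokens[!tokens %in% stops], collapse = " ")
}
# To time it on the objects above: system.time(drop_stopwords(x, words))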
# But the perl regexp engine is way faster than either
system.time(gsub(match, "", x, perl = TRUE))
# Takes ~0.01 s