@espetro
Created June 4, 2017 12:30
Function composition in R - text mining application
# Using Book I of the Odyssey, from The Internet Classics Archive
# http://classics.mit.edu/Homer/odyssey.html
needed <- c("tm", "functional")
# install.packages(needed)
sapply(needed, require, character.only = TRUE)
text <- readLines("http://classics.mit.edu/Homer/odyssey.mb.txt")
str(text)
text[198]
# Haskell notation: (f . g) x = f(g(x))
# Instead of nesting all the cleaning functions by hand, as in
# tolower(stripWhitespace(...)), the 'Compose' function from the 'functional'
# package builds a single function out of them:
# Compose(tolower, stripWhitespace, ...)('input')
# Compose applies its arguments left to right, so Compose(f, g)(x) is g(f(x)).
#
# As of June 2017, only one-argument functions are allowed for composition
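#
# A minimal sketch of the composition order, using two toy functions so it
# does not depend on the downloaded text. 'add_one', 'double' and
# 'add_then_double' are hypothetical names introduced only for illustration.
# It assumes the 'functional' package is installed; its Compose applies the
# functions left to right, so the result below is double(add_one(3)).
add_one <- function(x) x + 1
double <- function(x) x * 2
add_then_double <- functional::Compose(add_one, double)
add_then_double(3)  # 8, i.e. (3 + 1) * 2, not (3 * 2) + 1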
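# Clean every line in 'docs'; 'pattern' is a regular expression whose matches
# are replaced by a single space.
clean <- function(docs, pattern) {
  lapply(docs, function(txt) {
    repl <- function(t) gsub(pattern, " ", t)
    # Compose runs left to right. 'repl' comes before removePunctuation so
    # that patterns containing punctuation can still match, and
    # stripWhitespace comes last to collapse the spaces left behind.
    Compose(tolower, repl, removePunctuation, stripWhitespace)(txt)
  })
}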
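# Assumed intent: a set of alternatives (newline, "d5>", "<d1>", form feed, "?").
# A backslash followed by a space is not a valid escape in an R string, so the
# literal question mark is written as "\\?".
patterns <- "\n|d5>|<d1>|\f|\\?"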
text <- clean(text, patterns)
text[198]