@corynissen
Created June 11, 2014 12:53
Get data from PDF files
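library(tm)  # older tm releases: readPDF() shells out to the pdftotext command-line tool
file <- "example.pdf"  # hypothetical file name; set this to a PDF inside data/
# Build a reader that passes "-layout" to pdftotext so the column layout is kept.
# This is the older tm signature; newer tm releases use
# readPDF(control = list(text = "-layout")) instead.
pdf <- readPDF(PdftotextOptions = "-layout")
dat <- pdf(elem = list(uri = paste0("data/", file)), language = "en", id = "id1")
# Turn each run of spaces into a comma so every line parses as a CSV row.
dat <- gsub(" +", ",", dat)
out <- read.csv(textConnection(dat), header = FALSE)
# Re-join the fields of each row with single spaces, giving one string per line.
out <- apply(out, 1, function(x) paste(x, collapse = " "))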
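
# Illustrative sketch, not part of the original snippet: the same
# gsub / read.csv / apply steps run on two made-up lines of text, so the
# text handling can be tried without a PDF or pdftotext installed.
# `sample_lines`, `csv_lines`, `parsed`, and `rows` are hypothetical names.
sample_lines <- c("Name    Qty   Price", "Widget  3     4.50")
# Each run of spaces becomes one comma: "Name,Qty,Price" "Widget,3,4.50"
csv_lines <- gsub(" +", ",", sample_lines)
parsed <- read.csv(textConnection(csv_lines), header = FALSE)
# Collapse each parsed row back into a single space-separated string.
rows <- apply(parsed, 1, function(x) paste(x, collapse = " "))
print(rows)
# Note that any single space inside a field also becomes a separator, so
# multi-word values are split into separate columns by this approach.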