@trinker
Last active August 29, 2015 14:09
Reading docx files
A function developed by Bryan Goodrich for reading in .docx files:
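```{r}
read_docx <- function(file, skip = 0) {
  # A .docx file is a zip archive; extract it into a fresh temporary directory.
  tmp <- tempfile()
  if (!dir.create(tmp))
    stop("Temporary directory could not be established.")
  # Clean up the temporary directory on exit, even if a later step fails.
  on.exit(unlink(tmp, recursive = TRUE), add = TRUE)
  unzip(file, exdir = tmp)

  # The main body text is stored in word/document.xml.
  xmlfile <- file.path(tmp, "word", "document.xml")
  doc <- XML::xmlTreeParse(xmlfile, useInternalNodes = TRUE)

  # Each <w:p> node is one paragraph; xmlValue() returns its text content.
  nodeSet <- XML::getNodeSet(doc, "//w:p")
  pvalues <- sapply(nodeSet, XML::xmlValue)

  # Drop empty paragraphs, then optionally skip the first `skip` of them.
  pvalues <- pvalues[pvalues != ""]
  if (skip > 0) pvalues <- pvalues[-seq_len(skip)]
  pvalues
}
```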
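The paragraph-extraction step can be tried without a real .docx file. The following is a minimal sketch, assuming the XML package is installed; the `xml_text` string is a hypothetical, heavily simplified stand-in for `word/document.xml`, and the namespace is passed explicitly here rather than picked up from the document root as the function does.

```{r}
library(XML)

# Hypothetical, simplified stand-in for the contents of word/document.xml.
xml_text <- paste0(
  '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">',
  '<w:body>',
  '<w:p><w:r><w:t>First </w:t></w:r><w:r><w:t>paragraph.</w:t></w:r></w:p>',
  '<w:p/>',
  '<w:p><w:r><w:t>Second paragraph.</w:t></w:r></w:p>',
  '</w:body>',
  '</w:document>'
)

# Parse the string into an internal document, as read_docx does for the file.
doc <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# Select every paragraph node; the namespace prefix is spelled out explicitly.
ns <- c(w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main")
nodes <- getNodeSet(doc, "//w:p", namespaces = ns)

# Concatenate the text runs of each paragraph and drop the empty ones.
paragraphs <- unlist(lapply(nodes, xmlValue))
paragraphs <- paragraphs[paragraphs != ""]
print(paragraphs)
```

Because `xmlValue()` joins all text beneath a node, a paragraph split across several runs (`<w:r>`) comes back as a single string, which is why the function can work at the paragraph level without handling runs itself.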