@prcleary
Created January 13, 2021 20:42
Script for batch extraction of data from Microsoft Word forms. It uses the get_ffData function from the deef Shiny app (https://github.com/prcleary/deef), so check that your forms are suitable for that tool.
# Save this script in a directory containing "input" and "output" directories.
# Open script in RStudio and make sure working directory is where script is.
# Packages required - install if needed
library(data.table)
library(stringr)
library(XML)
library(xml2)
# Directory containing forms
inputdir <- 'input'
# Directory for data extracted
outputdir <- 'output'
# Obtain key function from deef tool
source('https://github.com/prcleary/deef/raw/master/get_ffData.r')
# Get list of file paths
filepaths <- list.files(inputdir, full.names=TRUE, pattern='\\.docx$')
if (length(filepaths)==0) stop('No docx files found')
# Iterate through files
datalist <- lapply(filepaths, get_ffData)
output <- rbindlist(datalist, use.names=TRUE, fill=TRUE)
row.names(output) <- NULL
# Save output
# File name is the current date plus "output.csv", e.g. 2021-01-13_output.csv
write.csv(output,
          file=file.path(outputdir, paste(Sys.Date(), 'output.csv', sep='_')),
          row.names=FALSE)
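
As written, the script stops at the first form that get_ffData cannot read, and the combined output does not record which file each row came from. The following is a minimal sketch of one way to handle both, not part of the original gist. The helper name safe_extract and the source_file column are hypothetical, and the sketch assumes that whatever the extractor returns can be converted with as.data.table().

```r
library(data.table)

# Hypothetical helper (not part of deef): run `extractor` on one file.
# Returns a data.table with an added source_file column, or NULL if the
# extractor raises an error, so one bad form does not abort the batch.
safe_extract <- function(path, extractor) {
  tryCatch({
    # Assumption: the extractor's return value converts cleanly to a data.table
    result <- as.data.table(extractor(path))
    result[, source_file := basename(path)]  # record which form the row came from
    result
  }, error = function(e) {
    warning('Skipping ', basename(path), ': ', conditionMessage(e), call. = FALSE)
    NULL
  })
}

# Usage, assuming get_ffData has been sourced as in the script above:
# datalist <- lapply(filepaths, safe_extract, extractor = get_ffData)
# output <- rbindlist(datalist, use.names = TRUE, fill = TRUE)
```

rbindlist skips NULL list elements, so files that failed are left out of the combined table and reported as warnings instead.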