Created January 13, 2021 20:42
Gist prcleary/c7f4dcbd9226c491ee53161ad7f88cef
Script for batch extraction of data from Microsoft Word forms. It uses the get_ffData function from the deef Shiny app (https://github.com/prcleary/deef), so first check that your forms are suitable for that tool.
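Before running the batch, it may help to confirm that each form actually contains form fields. This is a minimal sketch. Based only on its name, it assumes that get_ffData reads legacy Word form fields, which are stored as w:ffData elements in word/document.xml inside the .docx zip archive. The helpers count_ffdata and count_ffdata_xml are hypothetical and not part of deef. They use base R only and rely on Word's conventional "w:" namespace prefix.

```r
# Hypothetical helpers (not part of deef): count legacy form-field
# markers (<w:ffData>) in a .docx. Assumes the conventional "w:" prefix.
count_ffdata_xml <- function(xml_text) {
  xml_text <- paste(xml_text, collapse = "\n")
  # Match opening or self-closing <w:ffData> tags; "</w:ffData>" is not counted
  hits <- gregexpr("<w:ffData[[:space:]>/]", xml_text)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

count_ffdata <- function(docx_path) {
  # A .docx is a zip archive; the main document body is word/document.xml
  con <- unz(docx_path, "word/document.xml", open = "r")
  on.exit(close(con))
  count_ffdata_xml(readLines(con, warn = FALSE, encoding = "UTF-8"))
}
```

For example, sapply(list.files('input', pattern = '\\.docx$', full.names = TRUE), count_ffdata) returns one count per file. A count of zero suggests the form may use content controls or have no fields at all, so it is probably not suitable.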
# Save this script in a directory containing "input" and "output" directories.
# Open the script in RStudio and make sure the working directory is the script's directory.

# Packages required - install if needed, e.g.
# install.packages(c('data.table', 'stringr', 'XML', 'xml2'))
library(data.table)
library(stringr)
library(XML)
library(xml2)

# Directory containing forms
inputdir <- 'input'
# Directory for extracted data (created if it does not already exist)
outputdir <- 'output'
dir.create(outputdir, showWarnings = FALSE)

# Obtain key function from deef tool
source('https://github.com/prcleary/deef/raw/master/get_ffData.r')

# Get list of file paths
filepaths <- list.files(inputdir, full.names = TRUE, pattern = '\\.docx$')
# Skip the temporary "~$..." owner files Word creates while a document is open
filepaths <- filepaths[!startsWith(basename(filepaths), '~$')]
if (length(filepaths) == 0) stop('No docx files found in ', inputdir)

# Iterate through files, extracting the form data from each one
datalist <- lapply(filepaths, get_ffData)
# Stack the results: columns are matched by name, missing ones filled with NA
output <- rbindlist(datalist, use.names = TRUE, fill = TRUE)

# Save output as <date>output.csv, e.g. output/2021-01-13output.csv
write.csv(output,
          file = file.path(outputdir, paste0(Sys.Date(), 'output.csv')),
          row.names = FALSE)
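If get_ffData fails on a single corrupt or unsuitable file, the lapply() call stops the whole batch. This is a minimal sketch of a more forgiving loop. It assumes get_ffData returns something rbindlist() can stack, which the script already requires. extract_forms is a hypothetical helper, not part of deef. It skips any file that raises an error, issuing a warning instead, and uses rbindlist's idcol argument to add a source_file column recording which form each row came from.

```r
library(data.table)

# Hypothetical wrapper (not part of the original script or deef):
# apply an extractor such as get_ffData to each file, skip files that
# fail, and record which file each row came from.
extract_forms <- function(filepaths, extractor) {
  results <- lapply(filepaths, function(path) {
    tryCatch(extractor(path), error = function(e) {
      warning(sprintf("Skipping %s: %s", basename(path), conditionMessage(e)),
              call. = FALSE)
      NULL
    })
  })
  names(results) <- basename(filepaths)
  # Drop failed files (NULL entries); Filter keeps the list names
  results <- Filter(Negate(is.null), results)
  # idcol turns the list names into a first column called source_file
  rbindlist(results, use.names = TRUE, fill = TRUE, idcol = "source_file")
}
```

In the script, the datalist and output lines could then become output <- extract_forms(filepaths, get_ffData). The saved CSV would then gain source_file as its first column.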