@matt-dray
Last active October 3, 2018 13:22
Read XML files to dataframes in a list and then save each as a CSV
# FIREBREAK Q2 2018
# Government Art Collection
# Matt Dray
# 2 October 2018
# Purpose: Wrangle XML files output from GAC database to CSV
# Call packages -----------------------------------------------------------
library(XML) # XML handling
library(dplyr) # tidy data manipulation
library(purrr) # functional programming
library(stringr) # string handling
library(tibble) # nice tables
# Read files to list -----------------------------------------------------
# Vector of filepaths
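file_list <- list.files( # create vector of filepath strings to each file
  "data", # folder containing the XML files output from the GAC database
  pattern = "\\.xml$", # XML files only, so CSVs written here later are skipped on a re-run
  full.names = TRUE # return the full filepath, not just the filename
)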
# Read each XML file to dataframe element
gac_list <- purrr::map(
  file_list,
  XML::xmlToDataFrame # read the content from each filepath
) %>%
  set_names(str_replace_all(file_list, "data/|\\.xml", "")) # name elements by filename
# Save to CSV -------------------------------------------------------------
# Vector of filepaths for CSVs to be written to
file_list_csv <- str_replace_all(file_list, "\\.xml$", ".csv") # escape the dot, match extension only
for (i in seq_along(gac_list)) { # seq_along() is safe if the list is empty
  write.csv(
    gac_list[[i]],
    file = file_list_csv[i],
    row.names = FALSE
  )
}
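# Check CSVs (optional) ---------------------------------------------------
# A minimal sketch, not part of the original script: read each CSV back in
# and compare its row count with the source dataframe. check_csv_rows() is a
# hypothetical helper name introduced here for illustration; base R only.
check_csv_rows <- function(df_list, csv_paths) {
  stopifnot(length(df_list) == length(csv_paths))
  rows_written <- vapply(
    csv_paths,
    function(path) {
      if (!file.exists(path)) return(NA_integer_) # NA flags a missing file
      nrow(read.csv(path, stringsAsFactors = FALSE))
    },
    integer(1)
  )
  data.frame(
    file = csv_paths,
    rows_expected = unname(vapply(df_list, nrow, integer(1))),
    rows_written = unname(rows_written),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}
# Run the check only if the objects created above exist in this session
if (exists("gac_list") && exists("file_list_csv")) {
  print(check_csv_rows(gac_list, file_list_csv))
}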