Parse Estonian alcoholic products registry into data frame (and CSV)
library(xml2)
# XML file is available at https://alkoreg.agri.ee/avaandmed
x <- read_xml("alkoreg_avaandmed.xml")
# Each child element of the root is treated as one product entry
products <- xml_children(x)
# Extract features from all products into vectors.
# xml_find_first() yields exactly one result per product (NA when a field is
# missing), so all vectors have the same length and stay aligned.
get_field <- function(name) {
  xml_text(xml_find_first(products, paste0(".//", name)))
}
regEntryDate <- as.Date(get_field("regEntryDate"))
productClass <- as.factor(get_field("productClass"))
productName <- get_field("productName")
producerName <- get_field("producerName")
producerCountry <- as.factor(get_field("producerCountry"))
applicantName <- as.factor(get_field("applicantName"))
capacity <- get_field("capacity")  # kept as text
ethanolRate <- as.numeric(get_field("ethanolRate"))
# Combine vectors into data frame
df <- data.frame(regEntryDate, productClass, productName, producerName,
                 producerCountry, applicantName, capacity, ethanolRate)
# Save into CSV (write.csv2 uses ";" as the separator and "," as the decimal mark)
write.csv2(df, file = "alkoreg_processed.csv", row.names = FALSE)
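
The sketch below is not part of the original script. It uses a made-up two-product document (the XML is hypothetical; only the element names come from the script above) to illustrate why extracting one value per product matters when some entries might lack a field. Whether the real registry file has such gaps is an assumption.

```r
library(xml2)

# Hypothetical toy input: the second product has no <ethanolRate>.
toy <- read_xml("<products>
  <product><productName>A</productName><ethanolRate>4.5</ethanolRate></product>
  <product><productName>B</productName></product>
</products>")
toy_products <- xml_children(toy)

# xml_find_all() returns only the nodes that exist, so the result is shorter
# than the number of products and would not line up in a data frame.
rate_all <- as.numeric(xml_text(xml_find_all(toy_products, ".//ethanolRate")))

# xml_find_first() returns one result per product, with NA where the field
# is missing, so every column keeps the same length.
rate_first <- as.numeric(xml_text(xml_find_first(toy_products, ".//ethanolRate")))

length(rate_all)    # 1
length(rate_first)  # 2
```

With vectors of unequal length, `data.frame()` would either fail or silently misalign values across products, which is why the per-product lookup is the safer choice here.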