@briatte
Created April 14, 2016 13:25
batch rename PDF files based on their metadata (uid_doc_title-doc_author.pdf), using R + PDFtk
# load XML package
library(XML)
# decode HTML entities by parsing the string as HTML and keeping its text
# http://stackoverflow.com/a/5060203
html2txt <- function(str) {
  xpathApply(htmlParse(str, asText = TRUE),
             "//body//text()",
             xmlValue)[[1]]
}
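# list the PDF files in the working directory (shuffled, as in the original)
f = sample(list.files(pattern = "\\.pdf$"))
# f[ !grepl("_", f) ]
for (i in rev(f)) {
  cat("[", which(f == i), "]", i, "\n")
  # remove any stale dump, so that a failed pdftk run is not read as this file's metadata
  unlink("nfo.txt")
  # dump the PDF metadata to a text file; shQuote() protects file names with spaces
  system(paste("pdftk", shQuote(i), "data_dump output nfo.txt"))
  t = try(readLines("nfo.txt"), silent = TRUE)
  if (inherits(t, "try-error")) {
    cat(" -- ERROR!\n")
    next
  }
  # each "InfoKey: ..." line is followed by its "InfoValue: ..." line
  a = gsub("InfoValue: ", "", t[ which(t == "InfoKey: Author") + 1 ])
  t = gsub("InfoValue: ", "", t[ which(t == "InfoKey: Title") + 1 ])
  # insert "_title-author" before the file extension, with spaces as underscores
  n = sub("\\.pdf$", gsub(" ", "_", paste0("_", t, "-", a, ".pdf")), i)
  # decode HTML entities found in the metadata
  n = html2txt(n)
  n = gsub("&#40;", "(", n)
  n = gsub("&#41;", ")", n)
  n = gsub("&ndash;", "-", n)
  # n = gsub("B9780080970868", "", n)
  cat(" -->", n, "\n")
  # stopifnot(!grepl("&", n))
  # slashes and colons are not safe in file names: replace them with commas
  file.rename(i, gsub("/|:", ",", n))
}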
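
# A minimal sketch, not part of the original script: the metadata lookup used
# in the loop above, written as a standalone function so that it can be tried
# without PDFtk. `info_value` is a hypothetical helper name, and the sample
# lines below are made up to imitate the "InfoKey"/"InfoValue" pairs that the
# script reads from nfo.txt.
info_value <- function(lines, key) {
  # position of the "InfoKey: <key>" line; its value sits on the next line
  k <- which(lines == paste("InfoKey:", key))
  # return NA when the key is absent instead of an empty vector
  if (!length(k)) return(NA_character_)
  sub("^InfoValue: ", "", lines[ k[1] + 1 ])
}

# made-up sample dump (assumption: same layout as the lines parsed above)
dump <- c("InfoBegin", "InfoKey: Author", "InfoValue: Jane Doe",
          "InfoBegin", "InfoKey: Title", "InfoValue: A Short Title")
info_value(dump, "Title")   # "A Short Title"
info_value(dump, "Subject") # NA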