@danielvartan
Last active June 13, 2024 13:04
Find orphan files in a Zotero database.
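# Install the packages listed below if you don't already have them. The
# functions call them with `::`, so attaching them with `library()` is
# optional. The native pipe (`|>`) used below requires R >= 4.1.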
# library(checkmate)
# library(magrittr)
# library(purrr)
# library(readr)
# library(stringr)
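#
# A minimal sketch (not part of the original gist) for checking which of the
# packages above are missing. `required_pkgs`, `is_installed`, and
# `missing_pkgs` are names made up for this example.
required_pkgs <- c("checkmate", "magrittr", "purrr", "readr", "stringr")
is_installed <- vapply(
  required_pkgs, requireNamespace, logical(1), quietly = TRUE
)
missing_pkgs <- required_pkgs[!is_installed]
# Uncomment the next line to install whatever is missing:
# install.packages(missing_pkgs)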
#' List all files linked to the references in a Zotero library
#'
#' @description
#'
#' This function reads a CSV file exported from Zotero and extracts the
#' information about the files linked to the references in the library.
#'
#' @details
#'
#' To export your library from Zotero, go to the menu `File > Export Library...`
#' and choose the CSV format.
#'
#' @param lib_file A string with the path to the Zotero library exported as
#' a CSV file (important!).
#' @param basename A [`logical`][base::logical()] flag indicating whether the
#' function should return only the file names (`TRUE`) or the full paths to
#' the files (`FALSE`) (default: `TRUE`).
#'
#' @return A [`character`][base::as.character] vector with the names of the
#' files linked to the references in the Zotero library.
#'
#' @noRd
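list_linked_files <- function(lib_file = file.choose(),
                              basename = TRUE) {
  checkmate::assert_file_exists(lib_file, access = "r")
  checkmate::assert_flag(basename)

  out <-
    lib_file |>
    readr::read_csv(col_types = readr::cols(.default = "c")) |>
    magrittr::extract2("File Attachments") |>
    # Split multi-attachment cells at "; " followed by a drive letter
    # (e.g., "C:"). This pattern assumes Windows-style paths.
    stringr::str_split("; (?=[A-Z]:)") |>
    unlist() |>
    stringr::str_squish() |>
    # Drop a single trailing non-alphanumeric character (e.g., a stray ";").
    stringr::str_remove("[^A-Za-z0-9]$") |>
    purrr::discard(is.na)

  # `base::` makes it explicit that the function is meant, not the flag.
  if (isTRUE(basename)) {
    base::basename(out)
  } else {
    out
  }
}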
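#
# Illustrative sketch (not part of the original gist) of the splitting step
# used in `list_linked_files()`. The lookahead `(?=[A-Z]:)` splits a cell
# only where "; " is followed by a drive letter, so a "; " inside a file
# name is preserved. The paths and the `demo_` names are made up for this
# example. The second pattern, which also accepts paths starting with "/",
# is an assumption for macOS/Linux exports, not something the gist does.
demo_windows <- "C:\\Zotero\\a.pdf; C:\\Zotero\\b; draft.pdf"
demo_split_windows <- stringr::str_split(demo_windows, "; (?=[A-Z]:)")[[1]]
# The "; " inside "b; draft.pdf" is not treated as a separator.
demo_split_windows

demo_unix <- "/home/user/a.pdf; /home/user/b.pdf"
# Hypothetical pattern: split before a drive letter or before a leading "/".
demo_split_unix <- stringr::str_split(demo_unix, "; (?=[A-Z]:|/)")[[1]]
demo_split_unix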
#' Find orphan files in a Zotero library
#'
#' @description
#'
#' This function compares the files in a folder with the files linked to the
#' references in a Zotero library and returns the names of the orphan files.
#'
#' @param lib_file A string with the path to the Zotero library exported as
#' a CSV file (important!).
#' @param file_folder A string with the path to the folder containing the files
#' linked to the references in the Zotero library.
#'
#' @return A [`character`][base::as.character] vector with the names of the
#' orphan files.
#'
#' @noRd
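find_orphan_files <- function(lib_file = file.choose(),
                              file_folder = utils::choose.dir()) {
  # Note: `utils::choose.dir()` is available only on Windows. On other
  # systems, pass `file_folder` explicitly.
  checkmate::assert_file_exists(lib_file, access = "r")
  checkmate::assert_directory_exists(file_folder, access = "rw")

  linked_files <- list_linked_files(lib_file, basename = TRUE)
  # Lists the top level of `file_folder` only: subfolder names are included,
  # but their contents are not.
  real_files <- list.files(file_folder) |> basename()

  real_files[!real_files %in% linked_files]
}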
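#
# Illustrative sketch (not part of the original gist) of the comparison step
# for a folder that contains subfolders. `list.files()` is not recursive by
# default, so `recursive = TRUE` is used here as an assumption about how the
# files are stored. Comparing base names across subfolders cannot tell apart
# two files with the same name in different folders. All `demo_` names and
# file names are made up for this example.
demo_dir <- tempfile("zotero-demo-")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE)
invisible(file.create(file.path(demo_dir, c("kept.pdf", "sub/orphan.pdf"))))
demo_linked <- "kept.pdf" # Stand-in for the output of `list_linked_files()`.
demo_real <- basename(list.files(demo_dir, recursive = TRUE))
demo_orphans <- demo_real[!demo_real %in% demo_linked]
demo_orphans
unlink(demo_dir, recursive = TRUE) # Clean up the temporary folder.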
@wlperry

wlperry commented Jun 13, 2024

Super cool - thanks - this is the best!!!
