@agoldst
Last active August 6, 2017 10:09
Quick and dirty conversion of exported evernote notes to tagspaces-type files

Convert evernote notes to a folder for tagspaces

The Tagspaces website suggests a complex manual process for exporting evernote notes into its format. But evernote XML is easy enough to scrape for basic metadata that Programmer's Laziness takes over. Follow the steps below to produce a folder of files named according to the Tagspaces convention (tags are in the filename between brackets).
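To make the naming convention concrete, here is a minimal sketch in base R of how a title and a set of tags combine into one filename. The helper name `tagspaces_name` is made up for illustration and is not part of enex.R.

```r
# Hypothetical helper (not part of enex.R): build a Tagspaces-style
# filename of the form "Title[tag1 tag2].ext".
tagspaces_name <- function(title, tags, ext = ".html") {
    # Tags are separated by spaces inside the brackets, so strip
    # whitespace from each tag first.
    tags <- gsub("\\s", "", tags)
    tag_part <- if (length(tags) > 0) {
        paste0("[", paste(tags, collapse = " "), "]")
    } else {
        ""
    }
    paste0(title, tag_part, ext)
}

tagspaces_name("Reading list", c("to read", "books"))
# "Reading list[toread books].html"
tagspaces_name("Untagged note", character(0))
# "Untagged note.html"
```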

I wrote this and used it once with apparent success on my own (Mac) system, but did no careful testing. This function can move a large number of files, so back up before using it.

The export preserves tags and note titles as well as a good deal of attached data (via the .resources directories evernote exports, which are linked in the exported HTML). Spaces and all other non-word characters are removed from tag names. Creation/modification metadata and notebook structure are lost. The dates are easy to extract from the evernote XML but impossible to carry over, because Tagspaces relies on file system metadata for them. Notebook structure I did not try to keep, since I wasn't making use of the multiple-notebook feature.
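As a rough illustration of how the dates could be pulled out of the XML, here is a sketch using xml2. It assumes each note has `created` and `updated` child elements holding UTC timestamps like `20170806T100900Z`; that is an assumption to check against your own export. The inline XML and the helper `stamp` are invented for the example and are not part of enex.R.

```r
library(xml2)

# A tiny stand-in for an exported .enex file; real exports hold many notes.
enex_text <- '<en-export>
  <note>
    <title>First note</title>
    <created>20170801T120000Z</created>
    <updated>20170806T100900Z</updated>
  </note>
</en-export>'

notes <- xml_find_all(read_xml(enex_text), ".//note")

# Hypothetical helper: parse one timestamp field per note.
# xml_find_first() returns one result per note, so vectors stay aligned.
stamp <- function(nodes, field) {
    txt <- xml_text(xml_find_first(nodes, paste0("./", field)))
    # Assumed timestamp layout: yyyymmddThhmmssZ, in UTC.
    as.POSIXct(txt, format = "%Y%m%dT%H%M%SZ", tz = "UTC")
}

meta <- data.frame(
    title = xml_text(xml_find_first(notes, "./title")),
    created = stamp(notes, "created"),
    updated = stamp(notes, "updated")
)
meta
```

With a real export, `read_xml("notes.enex")` would replace the inline string.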

The process

  1. Export notes as HTML files to a directory, say notes.

  2. Export notes as an evernote XML file, say notes.enex.

  3. Notes with : in their title lead to files with / in their names. Seriously. There is no easy way to get at these files programmatically. Manually replace the / with - (hyphen).

  4. In R:

     source("enex.R")
     enex_tagspace("notes.enex", "notes", dry=T)
    

    This will not rename files but will tell you what will happen and alert you to any notes whose corresponding files cannot be found. If there are any of these, return to step 3.

  5. To actually make the change:

     enex_tagspace("notes.enex", "notes", dry=F)
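Step 3 is less tedious if you know in advance which notes are affected. A minimal sketch, assuming the same xml2 package the script uses: list the titles in the export that contain a colon. The inline XML stands in for a real file, and the variable names are only illustrative.

```r
library(xml2)

# Stand-in for a real export; with a real file, use read_xml("notes.enex").
enex_text <- '<en-export>
  <note><title>Plain title</title></note>
  <note><title>Draft: chapter 2</title></note>
</en-export>'

titles <- xml_text(xml_find_all(read_xml(enex_text), ".//note/title"))

# Titles containing ":" are the ones whose exported files need the
# manual rename described in step 3.
needs_fix <- titles[grepl(":", titles, fixed = TRUE)]
needs_fix
```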
    
# Please see the accompanying enex.md file for usage notes.
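library(xml2)
library(stringr)  # also re-exports the %>% pipe used below

enex_tagspace <- function (enex, d, dry=TRUE) {
    # One title per note: xml_find_first() keeps titles aligned with notes.
    node_title <- . %>% xml_find_first(".//title") %>% xml_text()

    # Tagspaces delimits tags by spaces, so we have to eliminate spaces
    # (and, with \\W, every other non-word character) from tag names.
    node_tags <- function (n) {
        tags <- n %>% xml_find_all(".//tag") %>% xml_text()
        if (length(tags) > 0) {
            tags %>%
                str_replace_all("\\W", "") %>%
                str_c(collapse=" ") %>%
                str_c("[", ., "]")
        } else
            ""
    }

    # Evernote file export follows an intriguing rule:
    # / is replaced with _ and then
    # colons are replaced with /
    # The usage notes have you rename those / to - by hand, so here
    # colons map to - and slashes map to _ to match the files on disk.
    sanitize <- . %>% str_replace_all(":", "-") %>%
        str_replace_all("/", "_")

    stopifnot(file.exists(enex) && dir.exists(d))

    # HUGE lifts the parser's hardcoded size limits; without it, large
    # exports can fail with "internal error: Huge input lookup".
    en <- read_xml(enex, options=c("NOBLANKS", "HUGE"))
    nn <- en %>% xml_find_all(".//note")

    ttls <- node_title(nn) %>% sanitize()

    # Index into the nodeset so node_tags() receives an xml_node.
    # as_list() would convert the nodes to plain R lists, on which
    # xml_find_all() has no method.
    tags <- vapply(seq_along(nn), function (i) node_tags(nn[[i]]),
                   character(1))

    fs <- ttls %>%
        str_c(".html") %>%
        file.path(d, .)
    mask <- file.exists(fs)
    fs_e <- fs[mask]
    fs_out <- ttls %>%
        str_c(tags, ".html") %>%
        `[`(mask) %>%
        file.path(d, .)

    for (f in fs[!mask]) message("Couldn't find ", f)

    message("Renaming ", length(fs_e), " of ", length(fs), " files ")
    if (dry) {
        for (i in seq_along(fs_e))
            message(fs_e[i], " -> ", fs_out[i])
    }
    else
        file.rename(fs_e, fs_out)
}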
@catalyst1987

For some reason I get this error. Hope you can advise.

"Error in doc_parse_file(con, encoding = encoding, as_html = as_html, options = options) :
internal error: Huge input lookup [1]"

@catalyst1987

tried with smaller sample and now I get " no applicable method for 'xml_find_all' applied to an object of class "list" "
