@psobczyk
Created September 19, 2020 12:57
Creating a URL mapping when migrating from WordPress to Hugo
library(xml2)
library(rvest)
doc <- read_html(x = "path_to_xml_file")
threads <- xml_find_all(doc, ".//thread")
old_links <- gsub(".* (http.*)", "\\1", xml_text(xml_find_all(threads, xpath = ".//id")))
old_titles <- xml_text(xml_find_all(threads, xpath = ".//title"))
old_titles <- iconv(enc2utf8(old_titles), "utf-8", "ascii//translit")
old_titles <- tolower(old_titles)
old_titles <- gsub("'", "", old_titles)
old_titles <- gsub("[[:punct:]]", "", old_titles)
old_titles <- gsub(" +", " ", old_titles)  # collapse runs of spaces into one
old_titles <- gsub(" ", "-", old_titles)
# change this to your own domain
new_links <- paste0("http://szychtawdanych.pl/post/", old_titles, "/")
mapping <- unique(data.frame(old_links, new_links))
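# write.csv() ignores col.names (with a warning), so use write.table() to omit the header row
write.table(mapping, file = "szychta_url_mapping.csv", sep = ",", qmethod = "double", row.names = FALSE, col.names = FALSE)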
# more on migrating to Hugo at https://szychtawdanych.pl/post/przenoszenie-bloga-do-hugo/
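
The title-cleaning steps above can be wrapped in one function, which makes it easier to check the slugs against a few known titles before building the full mapping. This is only a sketch: `slugify` is a hypothetical helper name that is not part of the original script, and it assumes Hugo builds the post URL from the title in the same way. The example uses ASCII-only titles because the result of `ASCII//TRANSLIT` for accented characters can differ between platforms.

```r
# Hypothetical helper (not in the original script): applies the same
# cleaning steps as above to a character vector of post titles.
slugify <- function(titles) {
  slugs <- iconv(enc2utf8(titles), "UTF-8", "ASCII//TRANSLIT")  # strip diacritics
  slugs <- tolower(slugs)
  slugs <- gsub("[[:punct:]]", "", slugs)  # drop punctuation, including apostrophes
  slugs <- gsub(" +", " ", slugs)          # collapse runs of spaces
  gsub(" ", "-", slugs)                    # spaces become hyphens
}

slugify(c("Hello, World!", "R  and   Hugo"))
# [1] "hello-world" "r-and-hugo"
```

Comparing a handful of these slugs with the URLs Hugo actually generates is a quick way to catch mismatches before the redirects go live.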