@tts
Created August 21, 2015 13:09
library(rvest)
# List of Financial Times Research Rank journals
url <- "http://web.lib.aalto.fi/en/journals/?cmd=lists&listid=1"
# <a href="?cmd=show&o=journal&journalid=15">Academy of management journal</a></li>
#
# Journal titles are the text values of the a element
linknodes <- url %>%
  read_html() %>%
  html_nodes(xpath = "//a[starts-with(@href,'?cmd=show&o=journal&journalid=')]")
jtitles <- linknodes %>%
  html_text()
# The URL of the page that gives the ISSN is built from the base URL
# and the value of the href attribute of the a element
issnurls <- linknodes %>%
  html_attr("href") %>%
  paste0("http://web.lib.aalto.fi/en/journals/", .)
# <div id="journaldata">
# <table cellspacing="1" cellpadding="2" border="0">
# <tr valign="top"><td class="label" align="right">ISSN:&nbsp;</td><td>0001-4273</td>
#
# ISSN is the text value of the second td element of the table's first row
jissns <- lapply(issnurls, function(x) {
  x %>%
    read_html() %>%
    html_nodes(xpath = "//div[@id='journaldata']/table[1]/tr[1]/td[2]") %>%
    html_text()
})
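# A minimal sketch, not part of the original script: scraped table cells can
# carry stray whitespace or hold something other than an ISSN, so it may help
# to check each value before building the import file. is_valid_issn() is a
# hypothetical helper name. It assumes the usual NNNN-NNNC form and the
# standard ISSN check digit (weights 8..2 over the first seven digits,
# modulo 11, with 10 written as "X").
is_valid_issn <- function(x) {
  x <- toupper(trimws(x))
  well_formed <- grepl("^[0-9]{4}-[0-9]{3}[0-9X]$", x)
  vapply(seq_along(x), function(i) {
    if (!well_formed[i]) return(FALSE)
    chars <- strsplit(sub("-", "", x[i]), "")[[1]]
    total <- sum(as.integer(chars[1:7]) * 8:2)
    check <- (11 - total %% 11) %% 11
    expected <- if (check == 10) "X" else as.character(check)
    chars[8] == expected
  }, logical(1))
}
# Possible use: list the positions of any scraped values that fail the check
# which(!is_valid_issn(unlist(jissns)))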
# Build the data frame with the columns Atira specifies for Pure import:
# ISSN, TITLE, UUID, RATING
ft45.df <- data.frame(
  ISSN = unlist(jissns),
  TITLE = jtitles,
  UUID = "",
  RATING = "FT45",
  stringsAsFactors = FALSE
)
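# A minimal sketch, not part of the original script: a quick structural check
# before the file is written. check_import_frame() is a hypothetical helper;
# the only assumptions are the four column names listed above and that every
# row should carry an ISSN.
check_import_frame <- function(df) {
  expected <- c("ISSN", "TITLE", "UUID", "RATING")
  stopifnot(
    identical(names(df), expected),
    nrow(df) > 0,
    !anyNA(df$ISSN)
  )
  invisible(df)
}
# Possible use: check_import_frame(ft45.df)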
# Write out as an Excel file
library(XLConnect)
wb <- loadWorkbook("ft45.xlsx", create = TRUE)
createSheet(wb, name = "journals")
writeWorksheet(wb, ft45.df, sheet = "journals", startRow = 1, startCol = 1)
saveWorkbook(wb)