@fdabl
Last active January 4, 2016 10:16
library('rvest')
library('stringr')

# Wiki page listing the psychology student councils (Fachschaften)
fachschaft = 'http://wiki.psyfako.org/index.php?title=Liste_der_Psychologie_Fachschaften'

emails = read_html(fachschaft) %>%
  html_nodes('p, a, span') %>% html_text() %>%
  # Undo the common e-mail obfuscations: [at], (at), [ät] and [dot]
  str_replace_all('\\[at\\]|\\(at\\)|\\[ät\\]', '@') %>%
  str_replace_all('\\[dot\\]', '.') %>%
  str_subset('@') %>% str_trim() %>%
  # Keep only the first line of each text node. The earlier version used
  # regexpr('\n', text), which returns -1 when there is no newline, so
  # str_sub(text, 1, -2) cut off the last character and the code then
  # re-appended 'e' or 't'. That repaired '.de' and '.at' but still
  # truncated every other address (e.g. '.com' became '.co').
  str_extract('^[^\n]*') %>%
  str_subset('@') %>% unique()

# One address per line
write(emails, 'emails_fachschaften.txt')