@tbobin
Created August 4, 2017 12:48
library(tidyverse)
library(rvest)
url_base <- "http://www.presseportal.de/blaulicht/d/polizei/"
content <- read_html(url_base)
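# CSS selector matching each news box on the overview page
css <- ".pad-b-l .news"
# scrape all news boxes into a node set
container <- content %>% html_nodes(css)
# print the fields of the first box
# (date lookup left disabled, as in the original)
# container[1] %>% html_node(".news-date") %>% html_text()
# issuing police office
container[1] %>% html_node(".sans a") %>% html_text()
# headline
container[1] %>% html_node(".news-headline-clamp span") %>% html_text()
# teaser text and the link to the full press release
container[1] %>% html_node(".news-bodycopy a") %>% html_text()
container[1] %>% html_node(".news-bodycopy a") %>% html_attr("href")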
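The calls above inspect only the first box. A minimal sketch of collecting the same fields from every box into one data frame could look like the following. It assumes the selectors from the script still match the page; the inline `sample_html` markup, the `boxes` and `news` names, and the column names are hypothetical stand-ins so the example runs without network access.

```r
library(rvest)

# Hypothetical sample markup that mimics the selectors used in the script;
# the real page structure may differ.
sample_html <- '
<div class="pad-b-l">
  <article class="news">
    <div class="sans"><a href="/blaulicht/nr/1">Polizei A</a></div>
    <h3 class="news-headline-clamp"><span>Headline one</span></h3>
    <p class="news-bodycopy"><a href="/blaulicht/pm/1/100">Teaser one</a></p>
  </article>
  <article class="news">
    <div class="sans"><a href="/blaulicht/nr/2">Polizei B</a></div>
    <h3 class="news-headline-clamp"><span>Headline two</span></h3>
  </article>
</div>'

page  <- read_html(sample_html)
boxes <- page %>% html_nodes(".pad-b-l .news")

# html_node() applied to a node set returns one result per box,
# so a box that lacks a field yields NA instead of shifting the rows.
news <- data.frame(
  source   = boxes %>% html_node(".sans a") %>% html_text(),
  headline = boxes %>% html_node(".news-headline-clamp span") %>% html_text(),
  body     = boxes %>% html_node(".news-bodycopy a") %>% html_text(),
  link     = boxes %>% html_node(".news-bodycopy a") %>% html_attr("href"),
  stringsAsFactors = FALSE
)

print(news)
```

To run this against the live page, `page` would be replaced by the `content` object read from `url_base`. Using `html_node()` rather than `html_nodes()` inside each box is what keeps the columns aligned when a box is missing one of the fields.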