@patperu
Created October 3, 2019 10:03
Written questions (Schriftliche Anfragen), Berlin pardok (lpd-abodienst@parlament-berlin.de)
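# see also https://www.brodrigues.co/blog/2018-06-10-scraping_pdfs/
library(pdftools)
library(tidyverse)
# pdf_text() returns one string per page; join the pages and split them into lines
# (the '\r?\n' pattern accepts both Unix and Windows line endings)
pages <- pdf_text(pdf = 'http://pardok.parlament-berlin.de/starweb/adis/citat/VT/18/SchrAnfr/S18-20899.pdf')
( txt <- strsplit(paste(pages, collapse = '\n'), '\r?\n')[[1]] )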
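# lines 133-145 hold the rows of interest in this particular PDF; other documents need other indices
fin <- txt[133:145]
fin <- fin %>%
  str_remove_all(fixed('%')) %>%        # drop percent signs
  str_remove_all(fixed('.')) %>%        # drop German thousands separators ("1.234")
  str_replace_all(fixed(','), '.') %>%  # German decimal comma -> decimal point
  str_squish()                          # trim and collapse runs of whitespace to one space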
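# I() marks the newline-joined string as literal data rather than a file path
( fin <- readr::read_delim(I(paste(fin, collapse = '\n')), delim = ' ', col_names = FALSE) )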
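The cleanup above edits whole lines, so any period inside a text label is stripped as well. A minimal base-R sketch of an alternative applies the same three substitutions only to fields that are meant to be numeric, after the lines have been split into columns. `parse_de_number()` is a hypothetical helper, not part of pdftools or the tidyverse, and it assumes "." is only ever a thousands separator and "," the decimal mark.

```r
# Hypothetical helper: convert German-formatted number strings such as
# "1.234,5 %" to numeric. Assumes "." is only a thousands separator and
# "," is the decimal mark.
parse_de_number <- function(x) {
  x <- gsub("%", "", x, fixed = TRUE)   # drop percent signs
  x <- gsub(".", "", x, fixed = TRUE)   # drop thousands separators
  x <- gsub(",", ".", x, fixed = TRUE)  # decimal comma -> decimal point
  as.numeric(trimws(x))                 # strip leftover spaces, then convert
}

parse_de_number(c("1.234,5 %", "12,0", "987"))
```

readr's `parse_number()` with `locale = readr::locale(decimal_mark = ",", grouping_mark = ".")` performs a similar conversion and also skips the trailing `%`.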
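The hard-coded range `133:145` only fits this one PDF. A hedged sketch of a more portable approach selects the lines between two marker lines instead. The helper `extract_between()` and the marker patterns `"^Table 1"` and `"^Source:"` are hypothetical and would have to be replaced with text that actually brackets the table in the document at hand. The sample lines are made up, and the parsing uses base `read.table()` instead of `readr::read_delim()` so the example runs without extra packages.

```r
# Hypothetical helper: return the lines strictly between the first line
# matching `start` and the next line after it matching `end` (both regexes).
extract_between <- function(lines, start, end) {
  from <- grep(start, lines)[1]
  if (is.na(from)) stop("start marker not found")
  to_rel <- grep(end, lines[-seq_len(from)])[1]  # position counted after `from`
  if (is.na(to_rel)) stop("end marker not found")
  lines[from + seq_len(to_rel - 1)]
}

# Made-up, already-cleaned lines standing in for the PDF text
sample_lines <- c("Intro text", "Table 1", "A 12.5", "B 7",
                  "Source: hypothetical", "Footer")

rows <- extract_between(sample_lines, "^Table 1", "^Source:")
tab <- utils::read.table(text = rows, sep = " ", header = FALSE,
                         stringsAsFactors = FALSE)
print(tab)
```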