
@jtrecenti
Created March 8, 2017 15:55
Download government public works data
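library(tabulizer)  # extract_tables(): PDF table extraction via Tabula (requires Java/rJava)
library(tidyverse)  # dplyr, purrr, tibble, readr
library(stringr)    # also attached by tidyverse
library(abjutils)   # rm_accent()

# Source PDF (Brazilian Senate document server) containing the public works table
u <- "http://legis.senado.leg.br/sdleg-getter/documento/download/a5e8c92c-86c1-45bb-99bd-3ad098905e81"

# List with one character matrix per table detected in the PDF
tab <- tabulizer::extract_tables(u)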
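
# arrumar ("tidy up"): turn header text into ASCII snake_case column names
arrumar <- function(x) {
  x %>%
    tolower() %>%
    str_trim() %>%
    str_replace_all('[[:space:]]+', '_') %>%  # any whitespace run -> "_"
    str_replace_all('%', 'p') %>%             # "%" -> "p" (percent)
    str_replace_all('r\\$', 'reais') %>%      # "r$" (Brazilian real) -> "reais"
    rm_accent()                               # strip accents
}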
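
# Stack the per-page tables, promote the first row to column names,
# then collapse internal whitespace and trim every cell.
# as.data.frame() names matrix columns V1..Vn, so pages line up in bind_rows();
# str_squish() replaces the deprecated funs() + str_replace_all() + str_trim() steps.
tab_tidy <- tab %>%
  map(as.data.frame, stringsAsFactors = FALSE) %>%
  bind_rows() %>%
  set_names(arrumar(unlist(.[1, ]))) %>%
  slice(-1) %>%
  mutate(across(everything(), str_squish))
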
write_csv(tab_tidy, 'obras.csv')
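
A minimal base-R sketch of the same cleaning steps, runnable without Java or the PDF. It assumes a few things not in the script above: `normalize_header()` is a hypothetical stand-in for `arrumar()` that strips only a small, assumed set of Portuguese accents with `chartr()` instead of `abjutils::rm_accent()`, and `toy_tables` is made-up data shaped like the list of character matrices that `extract_tables()` returns.

```r
# Hypothetical stand-in for arrumar(): same steps, but accents are removed
# with chartr() over an assumed set of Portuguese characters
normalize_header <- function(x) {
  x <- tolower(x)
  x <- trimws(x)
  x <- gsub("[[:space:]]+", "_", x)          # whitespace runs -> "_"
  x <- gsub("%", "p", x, fixed = TRUE)        # "%" -> "p"
  x <- gsub("r$", "reais", x, fixed = TRUE)   # literal "r$" -> "reais"
  chartr("áàâãéêíóôõúüç", "aaaaeeiooouuc", x) # drop common accents
}

# Toy input (invented): two "pages", only the first carries the header row
toy_tables <- list(
  matrix(c("Obra", " Valor Total (R$) ", "Execução  %",
           "Ponte\nnorte", "1.000", "10"),
         nrow = 2, byrow = TRUE),
  matrix(c("Escola", "2.500", " 55 "), nrow = 1)
)

stacked <- do.call(rbind, toy_tables)             # stack pages row-wise
df <- as.data.frame(stacked[-1, , drop = FALSE],  # drop the header row...
                    stringsAsFactors = FALSE)
names(df) <- normalize_header(stacked[1, ])       # ...and use it as column names

# Collapse internal whitespace (including line breaks) and trim each cell
df[] <- lapply(df, function(col) trimws(gsub("[[:space:]]+", " ", col)))

print(df)
```

One design difference: base `rbind()` needs every page to have the same number of columns, whereas `bind_rows()` in the script fills missing columns with `NA`.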