@wmay
Last active November 18, 2019 03:23
Scrape crime data from São Paulo State's Institute
library(rvest)
ssp_url = 'http://www.ssp.sp.gov.br/estatistica/pesquisa.aspx'
## get the region/municipality form
sess = html_session(ssp_url)
form = html_form(sess)[[2]]
## see form options
head(form$fields$`ctl00$conteudo$ddlRegioes`$options)
head(form$fields$`ctl00$conteudo$ddlMunicipios`$options)
## submit the form to get a new page
## (region value 3 is São José dos Campos in the options listed above)
form = form %>% set_values('ctl00$conteudo$ddlRegioes' = 3)
sess = submit_form(sess, form)
sess %>%
  html_node("#conteudo_repAnos_gridDados_0") %>%
  html_table() %>%
  head()
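The region is selected above by its numeric option value. As a minimal sketch, the value can be looked up by region name instead of typed by hand. The `regioes` vector below is a stand-in that copies only the first six options shown by `head()`; in a live session the full vector would come from the form's `ctl00$conteudo$ddlRegioes` field. `region_code` is a hypothetical helper, not part of rvest.

```r
## Stand-in for the region options (assumption: only the first six entries,
## copied from the head() output; the real vector comes from the form).
regioes = c("Todos" = "0",
            "Capital" = "1",
            "Grande São Paulo (exclui a Capital)" = "2",
            "São José dos Campos" = "3",
            "Campinas" = "4",
            "Ribeirão Preto" = "5")

## Hypothetical helper: return the option value for a region name,
## stopping with an error if the name is not among the options.
region_code = function(name, options = regioes) {
  if (!name %in% names(options)) stop("unknown region: ", name)
  unname(options[name])
}

region_code("São José dos Campos")
```

The result could then be passed to `set_values()` in place of the literal `3`.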
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(rvest)
Loading required package: xml2
> ssp_url = 'http://www.ssp.sp.gov.br/estatistica/pesquisa.aspx'
>
> ## get the region/municipality form
> sess = html_session(ssp_url)
> form = html_form(sess)[[2]]
> ## see form options
> head(form$fields$`ctl00$conteudo$ddlRegioes`$options)
                              Todos                             Capital
                                "0"                                 "1"
Grande São Paulo (exclui a Capital)                 São José dos Campos
                                "2"                                 "3"
                           Campinas                      Ribeirão Preto
                                "4"                                 "5"
> head(form$fields$`ctl00$conteudo$ddlMunicipios`$options)
           Todos       Adamantina           Adolfo            Aguaí
             "0"              "1"              "2"              "3"
  Águas da Prata Águas de Lindóia
             "4"              "5"
> ## submit the form to get a new page
> form = form %>% set_values('ctl00$conteudo$ddlRegioes' = 3)
> sess = submit_form(sess, form)
Submitting with 'ctl00$conteudo$btnExcel'
> sess %>%
+ html_node("#conteudo_repAnos_gridDados_0") %>%
+ html_table() %>%
+ head()
                                      Natureza Jan Fev Mar Abr Mai Jun Jul Ago
1        OCORRÊNCIAS DE PORTE DE ENTORPECENTES 110  89  69  76  87  60  91  89
2      OCORRÊNCIAS DE TRÁFICO DE ENTORPECENTES 247 249 228 211 242 240 258 309
3 OCORRÊNCIAS DE APREENSÃO DE ENTORPECENTES(1)  11  11  18  10  13  12  16  22
4          OCORRÊNCIAS DE PORTE ILEGAL DE ARMA  43  43  44  59  55  35  40  53
5              Nº DE ARMAS DE FOGO APREENDIDAS 101  90 100  90 102  89  79 109
6                    Nº DE FLAGRANTES LAVRADOS 542 461 476 465 497 442 503 482
  Set Out Nov Dez   Total
1  94 ... ... ... 765.000
2 244 ... ... ...   2.228
3  18 ... ... ... 131.000
4  51 ... ... ... 423.000
5  93 ... ... ... 853.000
6 474 ... ... ...   4.342
>
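In the Total column above, `html_table()` has read the site's "." thousands separator as a decimal point, so 2228 prints as 2.228 and 765 as 765.000 (the Jan to Set counts in row 1 sum to 765, and those in row 2 to 2228). A minimal sketch of a converter for the raw cell text follows. `parse_br_count` is a hypothetical helper, and it assumes the cells are available as character strings (for example, read with `html_nodes("td")` and `html_text()`) before any numeric conversion has happened.

```r
## Hypothetical helper: convert Brazilian-formatted counts to integers.
## "." is treated as a thousands separator; "..." (month not yet
## reported) and empty strings become NA.
parse_br_count = function(x) {
  x = trimws(as.character(x))
  x[x %in% c("...", "")] = NA
  as.integer(gsub(".", "", x, fixed = TRUE))
}

parse_br_count(c("765", "2.228", "4.342", "..."))
```

The helper only works on text that has not yet been coerced: once a cell has been converted to the number 2.228, the trailing zeros of values such as 1.000 or 2.500 are already lost.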