Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save arthurwelle/ee3c21c1265b4ff47cb21d03c281cde9 to your computer and use it in GitHub Desktop.
Save arthurwelle/ee3c21c1265b4ff47cb21d03c281cde9 to your computer and use it in GitHub Desktop.
RAIS PostgreSQL
# RAIS
```{r}
arquivos <- readLines("ftps://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/") %>%
str_extract("RAIS.+") %>%
str_subset("ESTAB",negate = TRUE) %>%
str_c("ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/",.)
arquivos <- c("ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_CENTRO_OESTE.7z",
"ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_MG_ES_RJ.7z",
"ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_NORDESTE.7z",
"ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_NORTE.7z",
"ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_SP.7z",
"ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_SUL.7z"
)
arquivos <- c("ftp://ftp.mtps.gov.br/pdet/microdados/RAIS/2018/RAIS_VINC_PUB_CENTRO_OESTE.7z")
purrr::walk(.x = arquivos, .f = download.file, destfile = "C:/Users/arthur/Desktop/t/RAIS_VINC_PUB_CENTRO_OESTE.7z")
```
Para dezipar
```{r}
# install.packages("archive") # Para dezipar
arq <- list.files("C:/Users/arthur/Desktop/t/",
full.names = T,
pattern = ".txt")
# lê arquivos
df <- data.table::fread(arq[1],
nrows = 1000)
df <- df %>%
janitor::clean_names() # limpa nomes das colunas
names(df)
df <- df %>%
mutate(across(starts_with("vl"), ~{str_remove_all(.x, "\\.") %>%
str_replace(",", ".") %>%
as.numeric()}
))
# criar tabela
dbCreateTable(conn, "rais_2016", df)
# inserir valores em batches
df2 <- readr::read_delim(arq[1],
n_max = 1000,
delim = ";",
locale = locale(encoding = "latin1"))
# função callback
f_call <- function(x, pos){
x %>%
janitor::clean_names() %>%
mutate(across(starts_with("vl"), ~{str_remove_all(.x, "\\.") %>%
str_replace(",", ".") %>%
as.numeric()}
)) %>%
dbx::dbxInsert(conn, "rais_2016", .)
}
# iterar sobre arquivos
walk(arq, ~readr::read_delim_chunked(.x,
delim = ";",
locale = locale(encoding = "latin1"),
callback = readr::DataFrameCallback$new(f_call),
chunk_size = 50000)
)
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment