João Carabetta JoaoCarabetta

@JoaoCarabetta
JoaoCarabetta / headersTSE.csv
Last active March 28, 2017 19:11
TSE Electoral Data - Headers for the CSV files, from LEIAME.pdf.
PERFIL_ELEITORADO,CONSULTA_CAND_2010,CONSULTA_CAND_2012,CONSULTA_CAND_2014,BEM_CANDIDATO,CONSULTA_LEGENDAS,CONSULTA_VAGAS,VOTACAO_CANDIDATO_MUN_ZONA_2012,VOTACAO_CANDIDATO_MUN_ZONA_2014,VOTACAO_PARTIDO_MUN_ZONA_2012,VOTACAO_PARTIDO_MUN_ZONA_2014,VOTO_SECAO,DETALHE_VOTACAO_MUN_ZONA_2012,DETALHE_VOTACAO_MUN_ZONA_2014
PERIODO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO,DATA_GERACAO
UF,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO,HORA_GERACAO
MUNICIPIO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO,ANO_ELEICAO
COD_MUNICIPIO_TSE,NUM_TURNO,NUM_TURNO,NUM_TURNO,DESCRICAO_ELEICAO,NUM_TURNO,DESCRICAO_ELEICAO,NUM_TURNO,NUM_TURNO,NUM_TURNO,NUM_TURNO,NUM_TURNO,NUM_TURNO,NUM_TURNO
NR_ZONA,DESCRICAO_ELEI
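The headers file is column-oriented: the first row names a TSE file type, and each column lists that file's fields from top to bottom, presumably because the TSE files themselves ship without a header row. A minimal sketch, assuming the headers CSV is available as a string, that turns it into a mapping from file type to column names (`load_header_map` is a hypothetical helper, not part of the gist). Stray whitespace is stripped and the blank cells of shorter columns are skipped.

```python
import csv
import io


def load_header_map(text):
    """Map each TSE file type (first row) to its list of column names.

    Column i of the headers file lists, top to bottom, the fields of file
    type i. Names are stripped of stray whitespace and blank cells (from
    shorter columns) are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    file_types = [name.strip() for name in rows[0]]
    header_map = {name: [] for name in file_types}
    for row in rows[1:]:
        # zip stops at the shorter sequence, so ragged rows are tolerated
        for file_type, field in zip(file_types, row):
            field = field.strip()
            if field:
                header_map[file_type].append(field)
    return header_map
```

Each list can then be passed as `names=` to `pandas.read_csv` with `header=None`, e.g. `pd.read_csv(path, sep=';', header=None, names=header_map['CONSULTA_CAND_2014'], encoding='latin-1')`, assuming the data files use a semicolon separator and Latin-1 encoding (verify against LEIAME.pdf).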
JoaoCarabetta / EstadosBrasil.txt
Last active March 3, 2017 18:31
List of Brazilian states
"AC","AL","AP","AM","BA","CE","DF","ES","GO","MA","MT","MS","MG","PA","PB","PR","PE","PI","RJ","RN","RS","RO","RR","SC","SP","SE","TO"
JoaoCarabetta / xml_to_csv.py
Last active May 10, 2024 08:41
Three lines to convert XML to CSV
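import pandas as pd
import requests
import xmltodict

# 'url' is a placeholder for the address of the XML document
xml = requests.get('url').text
# xmltodict.parse returns nested dicts; select the repeated element that holds
# one record per row ('root' and 'row' are placeholder tag names)
records = xmltodict.parse(xml)['root']['row']
df = pd.DataFrame(records)
# xmltodict prefixes attribute keys with '@'; strip it from the column names
df.rename(columns=lambda x: x.replace('@', ''), inplace=True)
df.to_csv('data.csv', index=False)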
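If the third-party dependencies are unwanted, the same flattening can be done with the standard library. This swaps in a different technique from the gist's: a minimal sketch using `xml.etree.ElementTree` and `csv.DictWriter`, assuming each record is a repeated element whose attributes and direct child elements become columns (`xml_records_to_csv` and its `record_tag` parameter are hypothetical names).

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag):
    """Return CSV text with one row per <record_tag> element.

    Each element's attributes and the text of its direct children become
    columns; columns are ordered by first appearance, missing cells are blank.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(record_tag):
        row = dict(elem.attrib)  # ElementTree attribute names carry no '@' prefix
        for child in elem:
            row[child.tag] = child.text
        rows.append(row)

    # Union of all keys, keeping the order in which they first appear
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Because ElementTree keeps attributes separate from child elements, no `'@'` renaming step is needed; the result can be written to `data.csv` directly.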
JoaoCarabetta / split_list_to_row.py
Last active April 11, 2018 16:09
Split list values into rows in pandas, enforcing the output type
def split_data_frame_list(df,
                          target_column,
                          output_type=float):
    '''
    Accepts a column with multiple types and splits list variables into several rows.
    df: dataframe to split
    target_column: the column containing the values to split
    output_type: type of all outputs
def suffix(alist):
    if not len(alist):
        return [[]]
    else:
        return [alist] + suffix(alist[1:])
def preffix(alist):
    if not len(alist):
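Recent pandas versions (0.25 and later) have a built-in for the core of this operation: `DataFrame.explode` gives each list element its own row, repeats the other columns, and leaves non-list values as a single row. A minimal sketch that swaps it in for the gist's recursive helpers and then enforces the output type with `astype` (`split_list_column` is a hypothetical name, not the gist's function):

```python
import pandas as pd


def split_list_column(df, target_column, output_type=float):
    """Return a copy of df with one row per element of the lists in target_column.

    Non-list values stay as a single row; the other columns are repeated for
    each new row. Every value in target_column is then cast to output_type.
    """
    out = df.explode(target_column).reset_index(drop=True)
    out[target_column] = out[target_column].astype(output_type)
    return out


df = pd.DataFrame({"id": [1, 2], "vals": [[1, "2"], 3]})
print(split_list_column(df, "vals"))
```

Note that `explode` turns an empty list into `NaN`, so casting to `int` raises in that case; `float` (the default here, as in the gist) tolerates it.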
JoaoCarabetta / create_waze_partitioned_table_athena.sql
Created January 15, 2019 16:41
Creates an Athena partitioned table for Waze data
DROP TABLE IF EXISTS main;
CREATE EXTERNAL TABLE main (
endTimeMillis BIGINT,
startTimeMillis BIGINT,
endTime STRING,
startTime STRING,
jams array<struct<
uuid: STRING,
pubMillis: BIGINT,
CREATE TABLE waze.polygons_geo
WITH (
external_location = 's3://...',
format = 'Parquet') AS
WITH dataset AS (
SELECT
polygons
FROM waze.polygons)
SELECT
pol.polygon,
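In this schema each row holds one feed snapshot, and `jams` is an array of structs, so per-jam queries usually flatten it with `CROSS JOIN UNNEST(jams)`. A minimal Python sketch of that flattening, for illustration only: `unnest_jams` is a hypothetical helper, and the sample snapshot is made-up data containing only the fields visible in the DDL.

```python
def unnest_jams(snapshot):
    """Flatten one feed snapshot into one row (dict) per jam.

    The snapshot's time window is copied onto every jam row, much like
    CROSS JOIN UNNEST(jams) does in Athena. Snapshots without jams yield [].
    """
    rows = []
    for jam in snapshot.get("jams") or []:
        rows.append({
            "startTimeMillis": snapshot["startTimeMillis"],
            "endTimeMillis": snapshot["endTimeMillis"],
            "uuid": jam["uuid"],
            "pubMillis": jam["pubMillis"],
        })
    return rows


# Hypothetical snapshot with only the fields shown in the table definition
snapshot = {
    "startTimeMillis": 1547570460000,
    "endTimeMillis": 1547570520000,
    "jams": [{"uuid": "a1", "pubMillis": 1547570400000},
             {"uuid": "b2", "pubMillis": 1547570410000}],
}
print(unnest_jams(snapshot))
```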
JoaoCarabetta / README.md
Last active February 3, 2022 16:05
Add Time-Based Glue Partitions with AWS Lambda

Creates time-based Glue partitions for a given time range.

Keep in mind that partitions can be added before any data exists, so you can create partitions for a whole year and upload the data to S3 later.
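Creating partitions for a time range boils down to enumerating every time bucket and its S3 prefix. A minimal sketch, assuming hourly granularity and a hypothetical `year=/month=/day=/hour=` layout under a placeholder bucket; the README does not specify either, so adjust both to the real table.

```python
from datetime import datetime, timedelta


def hourly_partitions(start, end, base_location):
    """Yield (values, location) for every hour in [start, end).

    Assumes a hypothetical year/month/day/hour partition layout under
    base_location; adjust the keys and granularity to the real table.
    """
    current = start.replace(minute=0, second=0, microsecond=0)
    while current < end:
        values = [current.strftime(fmt) for fmt in ("%Y", "%m", "%d", "%H")]
        location = "{}/year={}/month={}/day={}/hour={}/".format(
            base_location.rstrip("/"), *values)
        yield values, location
        current += timedelta(hours=1)


# A whole year of (still empty) hourly partitions under a placeholder bucket
year_2019 = list(hourly_partitions(datetime(2019, 1, 1), datetime(2020, 1, 1),
                                   "s3://my-bucket/waze"))
```

Each pair maps onto a Glue `PartitionInput` (`Values` plus `StorageDescriptor.Location`, usually copying the rest of the table's storage descriptor) for boto3's `glue.batch_create_partition`, sent in chunks since the API limits how many partitions one call accepts. Alternatively, each pair can become an Athena `ALTER TABLE ... ADD IF NOT EXISTS PARTITION (...) LOCATION '...'` statement.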

JoaoCarabetta / linestring_to_geojson.sql
Created February 4, 2019 18:48
Convert a Waze LineString to GeoJSON in Athena
SELECT
'{"type":"LineString", "coordinates":' ||
'[' || array_join(transform(line, loc -> '[' || CAST(loc.x AS VARCHAR) || ',' || CAST(loc.y AS VARCHAR) || ']'), ',') || ']}'
FROM test.test
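The query builds the GeoJSON text by string concatenation, writing each point as `[x, y]`. GeoJSON (RFC 7946) orders positions as longitude then latitude, so this relies on Waze's `x` being longitude and `y` latitude. A minimal Python sketch of the same output, assuming points arrive as dicts with `x`/`y` keys (`linestring_to_geojson` is a hypothetical helper); `json.dumps` takes care of quoting and number formatting instead of manual casts.

```python
import json


def linestring_to_geojson(line):
    """Serialize a list of {'x': lon, 'y': lat} points as a GeoJSON LineString."""
    coordinates = [[point["x"], point["y"]] for point in line]
    return json.dumps({"type": "LineString", "coordinates": coordinates})


print(linestring_to_geojson([{"x": -46.63, "y": -23.55}, {"x": -46.64, "y": -23.56}]))
# → {"type": "LineString", "coordinates": [[-46.63, -23.55], [-46.64, -23.56]]}
```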
JoaoCarabetta / create_lambda_layer.sh
Last active May 14, 2019 14:28
Create a Lambda Layer for Any Python Package
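# Start from a clean build directory and archive (-f: no error if they don't exist yet)
rm -rf python
rm -f lambda_layer.zip
mkdir python
# Temporarily blank pip's install prefix so `pip install -t` works on Homebrew Python
# (note: this overwrites any existing ~/.pydistutils.cfg)
printf "[install]\nprefix=" > ~/.pydistutils.cfg
pip3.7 install "$1" -t python/ # $1: any package available on pip; repeat for extra packages
printf "" > ~/.pydistutils.cfg
# Python layers must keep their packages under a top-level python/ directory
zip -r lambda_layer.zip ./python
aws s3 cp lambda_layer.zip "s3://config-lambda/layers/$1/"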
aws lambda publish-layer-version \
--layer-name $1 \
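The key detail in the zip step is that every file sits under `python/` inside the archive, which is the directory Lambda adds to the import path for Python layers. A minimal sketch of that packaging step with the standard library's `zipfile`, assuming the packages were already installed into a local directory (`build_layer_zip` is a hypothetical helper, not part of the gist):

```python
import os
import zipfile


def build_layer_zip(package_dir, zip_path):
    """Zip everything in package_dir so each entry lives under python/ in the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                rel_path = os.path.relpath(full_path, package_dir)
                # Prefix with python/ so the layer's contents land on sys.path
                archive.write(full_path, arcname=os.path.join("python", rel_path))
    return zip_path
```

Opening the archive with `mode="w"` always starts a fresh file, which avoids the stale-archive problem that `zip -r` has when it updates an existing zip.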