🏊
data swimming

João Carabetta JoaoCarabetta

@JoaoCarabetta
JoaoCarabetta / utils.py
Created September 19, 2019 18:16
Load config from yaml
def open_yaml(path):
    """
    Load a YAML file.

    Parameters
    ----------
    path: pathlib.PosixPath
        Path to the YAML file

    Returns
    -------
    dict
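The preview stops inside the docstring. A minimal completed version might look like this, assuming PyYAML's `yaml.safe_load` does the parsing (the gist's actual body isn't shown). It accepts either a string or a `pathlib` path.

```python
from pathlib import Path

import yaml  # PyYAML; assumed here, the preview doesn't show the parser


def open_yaml(path):
    """
    Load a YAML file.

    Parameters
    ----------
    path: str or pathlib.Path
        Path to the YAML file

    Returns
    -------
    dict
        Parsed contents (a dict when the top level of the file is a mapping)
    """
    with Path(path).open(encoding="utf-8") as f:
        # safe_load only builds plain Python objects (dicts, lists, str, int...),
        # it does not construct arbitrary classes from YAML tags
        return yaml.safe_load(f)
```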
JoaoCarabetta / README.md
Last active February 3, 2022 16:05
Add Time-Based Glue Partitions with AWS Lambda

Creates time-based Glue partitions for a given time range.

Keep in mind that adding a partition doesn't require any data, so you can create the partitions for a whole year up front and add the data to S3 later.
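The README doesn't include the Lambda code itself. As a sketch of the idea, assuming a daily `year=/month=/day=` S3 layout and a hypothetical `base_location`, the partitions for a range can be generated up front and sent through boto3's Glue `batch_create_partition`, which accepts at most 100 partitions per call. In practice `storage_descriptor` would be copied from the table's own descriptor (for example from `get_table`).

```python
from datetime import date, timedelta


def daily_partition_inputs(start, end, base_location, storage_descriptor):
    """Build one Glue PartitionInput per day from start to end (inclusive).

    Assumes a hypothetical year=/month=/day= layout under base_location.
    """
    inputs = []
    day = start
    while day <= end:
        values = [f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"]
        location = (
            f"{base_location.rstrip('/')}/year={values[0]}"
            f"/month={values[1]}/day={values[2]}/"
        )
        # Shallow copy of the table's descriptor with only Location overridden
        sd = dict(storage_descriptor, Location=location)
        inputs.append({"Values": values, "StorageDescriptor": sd})
        day += timedelta(days=1)
    return inputs


def create_partitions(glue_client, database, table, partition_inputs, batch_size=100):
    """Send the partitions in chunks; BatchCreatePartition takes at most 100 per call."""
    for i in range(0, len(partition_inputs), batch_size):
        glue_client.batch_create_partition(
            DatabaseName=database,
            TableName=table,
            PartitionInputList=partition_inputs[i:i + batch_size],
        )


# Example: a whole year of partitions, before any data exists (hypothetical bucket)
year_2021 = daily_partition_inputs(
    date(2021, 1, 1), date(2021, 12, 31), "s3://my-bucket/events", {"Columns": []}
)
```

Inside a Lambda handler the client would be `boto3.client("glue")`. The sketch ignores the `Errors` list in each response, which reports partitions that could not be created (for example ones that already exist).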

JoaoCarabetta / intersect_point_to_hexagon
Created April 19, 2021 21:48
Intersect point to hexagon - SQL Base dos Dados
select
  id_grid_h3,
  hora,
  ST_ASTEXT(ANY_VALUE(geometria)) wkt,
  count(*) n_registros,
  ANY_VALUE(quantidade_pessoas) populacao
from `rj-smtr.br_rj_riodejaneiro_onibus_gps.registros_tratada` t1
join `basedosdados.br_ipea_acesso_oportunidades.estatisticas_2019` t2
  on st_intersects(geometria, st_geogpoint(longitude, latitude))
where id_municipio in (
  select id_municipio
  from `basedosdados.br_bd_diretorios_brasil.municipio`
  where municipio = 'Rio de Janeiro')
group by id_grid_h3, hora
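The query counts GPS records per H3 hexagon and hour by joining each point to the hexagon geometry it intersects. The same join-then-group logic can be sketched in plain Python with a ray-casting point-in-polygon test. This is a planar approximation (BigQuery's `ST_INTERSECTS` on `GEOGRAPHY` works on the sphere), and the input shapes below are hypothetical.

```python
from collections import Counter


def point_in_polygon(x, y, ring):
    """Ray casting: toggle on each polygon edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_points_per_cell(records, cells):
    """records: iterable of (longitude, latitude, hour); cells: {cell_id: [(lon, lat), ...]}.

    Returns a Counter keyed by (cell_id, hour), like GROUP BY id_grid_h3, hora.
    """
    counts = Counter()
    for lon, lat, hour in records:
        for cell_id, ring in cells.items():
            if point_in_polygon(lon, lat, ring):
                counts[(cell_id, hour)] += 1
    return counts
```

Testing every point against every cell is quadratic; with H3 data a real implementation would more likely compute each point's cell id directly, or at least prefilter cells by bounding box.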
JoaoCarabetta / line_polygon_intersection.py
Created September 2, 2020 20:15
Line and Polygon Intersection for GeoPandas
def line_polygon_intersection(line_df, poly_df):
    """
    It cuts the line if it sits between polygons.
    """
    column_geom_poly = poly_df._geometry_column_name
    column_geom_line = line_df._geometry_column_name
    spatial_index = line_df.sindex
    bbox = poly_df.geometry.apply(lambda x: x.bounds)
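The preview only shows the set-up: geometry column names, a spatial index on the lines, and polygon bounding boxes. A minimal sketch of the underlying idea follows, using plain Shapely geometries instead of GeoDataFrames and a manual bounding-box check in place of `sindex`: reject pairs whose boxes don't overlap, then keep the non-empty `intersection` pieces. It is not the gist's implementation.

```python
from shapely.geometry import LineString, Polygon


def bboxes_overlap(a, b):
    """a, b are (minx, miny, maxx, maxy) tuples as returned by .bounds."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def clip_lines_by_polygons(lines, polygons):
    """Return (line_idx, poly_idx, piece) for every non-empty line/polygon overlap.

    A line that only touches a polygon yields a Point piece.
    """
    pieces = []
    for i, line in enumerate(lines):
        for j, poly in enumerate(polygons):
            if not bboxes_overlap(line.bounds, poly.bounds):
                continue  # cheap rejection, the role a spatial index plays
            piece = line.intersection(poly)
            if not piece.is_empty:
                pieces.append((i, j, piece))
    return pieces


# Example: one road crossing two adjacent unit squares
road = LineString([(0, 0.5), (2, 0.5)])
squares = [
    Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
    Polygon([(1, 0), (2, 0), (2, 1), (1, 1)]),
]
for i, j, piece in clip_lines_by_polygons([road], squares):
    print(j, piece.length)
```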
JoaoCarabetta / github_repos_with_string
Created September 20, 2021 13:42
All unique GitHub repos that contain a string
from github import Github
from time import sleep

g = Github(token)  # token: a GitHub personal access token, defined elsewhere
search_str = 'basedosdados'
repo = []
for i in g.search_code(search_str):
    sleep(0.2)
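The loop is cut off before the de-duplication step. One way to finish it, assuming each search hit exposes `.repository.full_name` as PyGithub's code-search results do, is to collect the names in a set. The function takes any iterable, so it can be tried without a token; the fake hits below are hypothetical.

```python
import time
from types import SimpleNamespace


def unique_repos(results, delay=0.2):
    """Collect the unique 'owner/name' strings from code-search hits, sorted."""
    seen = set()
    for hit in results:
        seen.add(hit.repository.full_name)
        time.sleep(delay)  # crude throttle between hits, as in the original loop
    return sorted(seen)


# Demo with stand-in objects shaped like search hits
fake_hits = [
    SimpleNamespace(repository=SimpleNamespace(full_name=name))
    for name in ["b/y", "a/x", "b/y"]
]
names = unique_repos(fake_hits, delay=0)
```

With a real client the call would be `unique_repos(Github(token).search_code("basedosdados"))`.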
JoaoCarabetta / katana.py
Last active August 3, 2021 18:25
Katana Algorithm Minimal Working Example
from shapely.geometry import box, Polygon, MultiPolygon, GeometryCollection
from shapely.wkt import loads

def threshold_func(geometry, threshold_value):
    """Returns True if the polygon area is below the threshold value"""
    return geometry.area < threshold_value

def katana(geometry, threshold_func, threshold_value, number_tiles=0, max_number_tiles=250):
    """Splits a geometry into tiles forming a grid, given a threshold function and
    a maximum number of tiles.
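The docstring describes recursive splitting, but the body isn't in the preview. A minimal sketch of that idea: bisect the geometry's bounding box across its longer side, clip the geometry to each half, and recurse until `threshold_func` passes. Here recursion depth stands in for the tile limit; `depth` and `max_depth` are this sketch's own names, since how the gist counts `number_tiles` isn't visible.

```python
from shapely.geometry import box, Polygon, MultiPolygon, GeometryCollection


def threshold_func(geometry, threshold_value):
    """Return True when the polygon area is below threshold_value."""
    return geometry.area < threshold_value


def katana_sketch(geometry, threshold_func, threshold_value, depth=0, max_depth=250):
    """Recursively bisect `geometry` until each piece passes threshold_func."""
    if threshold_func(geometry, threshold_value) or depth >= max_depth:
        return [geometry]
    minx, miny, maxx, maxy = geometry.bounds
    width, height = maxx - minx, maxy - miny
    # Cut across the longer side so the tiles stay roughly square
    if width >= height:
        mid = minx + width / 2
        halves = (box(minx, miny, mid, maxy), box(mid, miny, maxx, maxy))
    else:
        mid = miny + height / 2
        halves = (box(minx, miny, maxx, mid), box(minx, mid, maxx, maxy))
    tiles = []
    for half in halves:
        piece = geometry.intersection(half)
        # Clipping can return a multi-part or mixed result; keep polygon parts only
        if isinstance(piece, (MultiPolygon, GeometryCollection)):
            parts = piece.geoms
        else:
            parts = [piece]
        for part in parts:
            if isinstance(part, Polygon) and not part.is_empty:
                tiles.extend(
                    katana_sketch(part, threshold_func, threshold_value, depth + 1, max_depth)
                )
    return tiles
```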
JoaoCarabetta / xml_to_csv.py
Last active July 10, 2021 10:47
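Three lines to convert XML to CSV
import xmltodict
import pandas as pd
import requests

# 'url' is a placeholder for the address of the XML document
xml = requests.get('url').text
# xmltodict.parse returns nested dicts; for a flat table you may need to
# index down to the list of repeated elements first
df = pd.DataFrame(xmltodict.parse(xml))
# xmltodict prefixes XML attributes with '@'; strip it from the column names
df.rename(columns=lambda x: x.replace('@', ''), inplace=True)
df.to_csv('data.csv')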
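The three-line version only yields a clean table when the parsed structure is already a flat list of records. A dependency-free alternative, swapping in the standard library's `xml.etree.ElementTree` and `csv` for xmltodict and pandas, makes the record element explicit; `record_tag` is a parameter you choose for your document.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_records_to_csv(xml_text, record_tag):
    """Flatten every <record_tag> element into one CSV row.

    The element's attributes and the text of its direct children become columns.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter(record_tag):
        row = dict(elem.attrib)
        for child in elem:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    # Union of all keys in first-seen order, so records with extra fields still fit
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)  # missing keys become ""
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```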
# Download all AWS Lambda functions in parallel
# Assumes you have run `aws configure` and set the output format to "text"
# Works with "aws-cli/1.16.72 Python/3.6.7 Linux/4.15.0-42-generic botocore/1.12.62"
download_code () {
    local OUTPUT=$1
    # Field 2 of the first text line is the package's download URL, passed on to wget
    aws lambda get-function --function-name "$OUTPUT" | head -n 1 | cut -f 2 | xargs wget -O "./lambda_functions/$OUTPUT.zip"
}
mkdir -p lambda_functions
for run in $(aws lambda list-functions | cut -f 6 | xargs);
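The loop body is cut off in the preview. For comparison, here is a sketch of the same task in Python with boto3, assuming a client from `boto3.client("lambda")`: it pages through `list_functions`, reads the pre-signed package URL from `get_function(...)["Code"]["Location"]`, and downloads the packages on a thread pool. The client and the `fetch` function are parameters so the logic can be exercised without AWS.

```python
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def list_function_names(lambda_client):
    """Yield every function name, following list_functions pagination."""
    paginator = lambda_client.get_paginator("list_functions")
    for page in paginator.paginate():
        for fn in page["Functions"]:
            yield fn["FunctionName"]


def download_function(lambda_client, name, out_dir, fetch=urllib.request.urlretrieve):
    """Download one function's deployment package to out_dir/<name>.zip."""
    url = lambda_client.get_function(FunctionName=name)["Code"]["Location"]
    path = os.path.join(out_dir, f"{name}.zip")
    fetch(url, path)
    return path


def download_all(lambda_client, out_dir="lambda_functions", workers=8,
                 fetch=urllib.request.urlretrieve):
    """Download every package in parallel; returns paths in listing order."""
    os.makedirs(out_dir, exist_ok=True)  # same effect as `mkdir -p`
    names = list(list_function_names(lambda_client))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda name: download_function(lambda_client, name, out_dir, fetch), names
        ))
```

Sharing one client across threads relies on boto3 low-level clients being thread-safe, which the boto3 documentation states; `download_all(boto3.client("lambda"))` would mirror the shell script.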
JoaoCarabetta / chess_highlights.js
Created November 11, 2020 20:07
chess.com highlights snippet
// coord: two-digit square id such as 45; each digit must be between 1 and 8
function coord_boundaries(coord) {
    coord = coord.toString()
    return coord[0] >= 1 && coord[0] <= 8 && coord[1] >= 1 && coord[1] <= 8
}
function highlight_square(coord, color) {
    const board = document.getElementsByClassName('layout-board')[0]
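`coord_boundaries` checks a two-digit square id by indexing its string form. That relies on JavaScript coercing characters like `'4'` to numbers, and it only looks at the first two characters, so `123` would pass. A sketch of the same check using integer arithmetic, written in Python for illustration and assuming the same two-digit convention:

```python
def coord_boundaries(coord):
    """True if coord (int or numeric string) is a two-digit id whose digits are both 1-8."""
    first, second = divmod(int(coord), 10)
    # Any value of 100 or more makes `first` exceed 8, so it is rejected
    return 1 <= first <= 8 and 1 <= second <= 8
```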
JoaoCarabetta / Makefile
Created July 14, 2020 17:04
Makefile to set up a Python env. for Data Science projects
.PHONY: create-env update-env
# The env. is named after the current directory
REPO=$(shell basename $(CURDIR))
# Make runs recipes with /bin/sh, which may not have "source"; "." is the POSIX equivalent
create-env:
	python3 -m venv .$(REPO);
	. .$(REPO)/bin/activate; \
	pip3 install --upgrade -r requirements.txt; \
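Each recipe line runs in its own shell, which is why activation and `pip3 install` are chained with `; \`. A Python sketch of the same `create-env` step, using the standard-library `venv` module plus `subprocess` (not part of the gist), avoids activation altogether by calling the environment's own interpreter. The hidden `.<directory name>` naming mirrors the Makefile.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_dir_for(project_dir):
    """Hidden env directory named after the project folder, like .$(REPO)."""
    project_dir = Path(project_dir).resolve()
    return project_dir / f".{project_dir.name}"


def create_env(project_dir=".", requirements="requirements.txt"):
    """Create the env with pip, then install requirements using the env's own Python."""
    env_dir = env_dir_for(project_dir)
    venv.EnvBuilder(with_pip=True).create(env_dir)
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"
    req_file = Path(project_dir) / requirements
    # No activation needed: the env's interpreter already uses its own site-packages
    subprocess.run(
        [str(python), "-m", "pip", "install", "--upgrade", "-r", str(req_file)],
        check=True,
    )
    return env_dir
```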