Skip to content

Instantly share code, notes, and snippets.

View JoaoCarabetta's full-sized avatar
🏊
data swimming

João Carabetta JoaoCarabetta

🏊
data swimming
View GitHub Profile
@JoaoCarabetta
JoaoCarabetta / update-function-code-aws-lambda.sh
Last active February 25, 2019 14:00
Update python function to AWS lambda
# update-function-code-aws-lambda.sh <function path> <lambda-function-name>
# make sure that the aws-cli is configured in the same region and with a user with permission
rm temp.zip
echo ------ Zipping --------
cp $1 lambda_function.py
zip temp.zip lambda_function.py
rm lambda_function.py
echo ------ Updating Lambda ------
aws lambda update-function-code --function-name $2 --zip-file "fileb://temp.zip"
@JoaoCarabetta
JoaoCarabetta / create_table_athena_dump.sql
Created March 4, 2019 15:02
Create table from Athena CSV dump
-- Delete *.csv.metadata
-- aws s3 rm s3://... --recursive --exclude '*.csv'
CREATE EXTERNAL TABLE `osm_pems_ids`(
`osm_id` bigint COMMENT '',
`sensor_id` string COMMENT '',
`neigh_level` bigint COMMENT '')
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
@JoaoCarabetta
JoaoCarabetta / bulk_delete_dynamo_tables.py
Created March 21, 2019 16:53
Bulk delete AWS dynamodb tables
import boto3
table_filter = 'temp-cap'
dynamo = boto3.client('dynamodb')
tables = dynamo.list_tables()['TableNames']
tables = [t for t in tables if table_filter in t]
for t in tables:
try:
dynamo.delete_table(TableName=t)
except:
continue
@JoaoCarabetta
JoaoCarabetta / Accent_to_ascii_bigquery.sql
Created April 30, 2019 22:09
Take out all accents a string in bigquery
CREATE TEMP FUNCTION accent2latin(word STRING) AS
((
WITH lookups AS (
SELECT
'ã,ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'a,c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
@JoaoCarabetta
JoaoCarabetta / parlamentares_na_tramitacao.sql
Last active May 6, 2019 14:07
Identifica os parlamentares nas tramitações
CREATE TEMP FUNCTION accent2latin(word STRING) AS
((
WITH lookups AS (
SELECT
'ã,ç,æ,œ,á,é,í,ó,ú,à,è,ì,ò,ù,ä,ë,ï,ö,ü,ÿ,â,ê,î,ô,û,å,ø,Ø,Å,Á,À,Â,Ä,È,É,Ê,Ë,Í,Î,Ï,Ì,Ò,Ó,Ô,Ö,Ú,Ù,Û,Ü,Ÿ,Ç,Æ,Œ,ñ' AS accents,
'a,c,ae,oe,a,e,i,o,u,a,e,i,o,u,a,e,i,o,u,y,a,e,i,o,u,a,o,O,A,A,A,A,A,E,E,E,E,I,I,I,I,O,O,O,O,U,U,U,U,Y,C,AE,OE,n' AS latins
),
pairs AS (
SELECT accent, latin FROM lookups,
UNNEST(SPLIT(accents)) AS accent WITH OFFSET AS p1,
# Parallelly download all aws-lambda functions
# Assumes you have ran `aws configure` and have output-mode as "text"
# Works with "aws-cli/1.16.72 Python/3.6.7 Linux/4.15.0-42-generic botocore/1.12.62"
download_code () {
local OUTPUT=$1
aws lambda get-function --function-name $OUTPUT | head -n 1 | cut -f 2 | xargs wget -O ./lambda_functions/$OUTPUT.zip
}
mkdir lambda_functions
for run in $(aws lambda list-functions | cut -f 6 | xargs);
@JoaoCarabetta
JoaoCarabetta / utils.py
Created September 19, 2019 18:16
Load config from yaml
def open_yaml(path):
"""
Load yaml file.
Parameters
----------
path: pathlib.PosixPath
Path to yaml file
Return
------
Dictionary
@JoaoCarabetta
JoaoCarabetta / data_persistance.py
Created September 20, 2019 22:37
persist data locally
def persist_local(data, args, folder,
id_keys=['experiment_id'],
as_type='.parquet.gz',
save_path='persistance'):
save_path = build_path(args, folder, id_keys, as_type, save_path)
# Check if path exists, add path if not
if not save_path.parent.exists():
save_path.parent.mkdir(mode=777, parents=True, exist_ok=True)
@JoaoCarabetta
JoaoCarabetta / execution_timer_with_context_manager.py
Last active October 11, 2019 22:46
Get execution time of any part of your code with this context manager for python
import logging
from contextlib import contextmanager
log = logging.getLogger(__name__)
@contextmanager
def timed_log(name, time_chunk='seconds'):
"""Context manager to get execution time of parts of codes.
To use, simply declares the context manager:
```with timed_log(name='useful', time_chunck='minutes'):
@JoaoCarabetta
JoaoCarabetta / mode_calculation_athena.sql
Last active October 21, 2019 19:02
Calculate mode in Athena SQL / Presto
/* It calculates the mode of a records-like `maintable`
*/
with counter as (
select
service,
array[
cast(row('Bob', count_if(name = 'Bob')) AS row(name varchar, age interger)),
cast(row('Alice', count_if(name = 'Alice')) AS row(name varchar, age interger)),
cast(row('Jane', count_if(name = 'Jane')) AS row(name varchar, age interger))
] as users