João Carabetta JoaoCarabetta

@JoaoCarabetta
JoaoCarabetta / cufflinks_template.py
Last active January 10, 2020 21:03
Cufflinks Templates

How to use

This class is supposed to be used inside a Jupyter notebook with Keplergl activated.

It just makes it easier to organize maps and save configuration files.

The advantage is that once the config is saved, any map created with the same identifier_string loads with that saved config.

Initialize Map

@JoaoCarabetta
JoaoCarabetta / strict_polygon_overlay_over_linestring.py
Last active November 25, 2019 15:19
Linestring Polygon Overlay with Geopandas
def line_polygon_intersection(line_df, poly_df):
    column_geom_poly = poly_df._geometry_column_name
    column_geom_line = line_df._geometry_column_name
    spatial_index = line_df.sindex
    bbox = poly_df.geometry.apply(lambda x: x.bounds)
    sidx = bbox.apply(lambda x: list(spatial_index.intersection(x)))
    nei = []
@JoaoCarabetta
JoaoCarabetta / plotly_cufflinks_subplot_uniquelegend.py
Created October 30, 2019 20:04
Subplot with cufflinks with unique legend
@JoaoCarabetta
JoaoCarabetta / multiprocessing_multivariable.py
Last active June 1, 2020 23:26
Ready to go parallel process implementation for functions with more than one argument
from multiprocessing.pool import Pool
from functools import partial
import time

n_processes = 3

def func(extra, t):
    time.sleep(t + extra)
    print('t', t, 'extra', extra)
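The preview stops before the pool is created, so the author's exact call is not shown. As a minimal sketch of one common way to combine the two imports above, `functools.partial` can fix the first argument so that `Pool.map` only has to supply the remaining one. The names `add_offset` and `run_parallel` are hypothetical and exist only for this illustration.

```python
from functools import partial
from multiprocessing.pool import Pool


def add_offset(extra, t):
    """Toy two-argument worker (hypothetical stand-in for `func`)."""
    return t + extra


def run_parallel(func, extra, values, n_processes=3):
    """Map `func(extra, value)` over `values` using a process pool.

    `partial` binds `extra`, leaving a one-argument callable that
    `Pool.map` can apply to each item. `func` must be defined at module
    level so that it can be pickled and sent to the worker processes.
    """
    worker = partial(func, extra)
    with Pool(processes=n_processes) as pool:
        return pool.map(worker, values)
```

When calling `run_parallel` from a script, it is safest to do so under an `if __name__ == "__main__":` guard, since some platforms start worker processes by re-importing the main module.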
@JoaoCarabetta
JoaoCarabetta / stratified_train_test_split.sql
Last active October 23, 2019 20:36
Splits stratified dataset into train and test using SQL
/* Splits a dataset into stratified training and test sets.
   It guarantees that each group has the minimum size to be split.
*/
with ssize as (
    select
        group
    from to_split_table
    group by group
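The SQL preview is truncated, so the author's query logic is not visible. Purely as an illustration of the idea of a stratified split with a minimum group size, here is a small Python sketch. The function name, its parameters, and the choice to skip groups that are too small are all assumptions, not a translation of the query.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, test_fraction=0.2, min_group_size=2, seed=0):
    """Split `rows` (a list of dicts) into train and test, group by group.

    Assumption: groups smaller than `min_group_size` are left out of both
    sets, because they cannot contribute to train and test at once.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for members in groups.values():
        if len(members) < min_group_size:
            continue
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Keep at least one row on each side of the split.
        n_test = round(len(shuffled) * test_fraction)
        n_test = min(len(shuffled) - 1, max(1, n_test))
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test
```

Because the split is done per group, each group that is large enough appears in both sets in roughly the requested proportion.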
@JoaoCarabetta
JoaoCarabetta / mode_calculation_athena.sql
Last active October 21, 2019 19:02
Calculate mode in Athena SQL / Presto
/* Calculates the mode of a records-like `maintable`
*/
with counter as (
    select
        service,
        array[
            cast(row('Bob', count_if(name = 'Bob')) AS row(name varchar, age integer)),
            cast(row('Alice', count_if(name = 'Alice')) AS row(name varchar, age integer)),
            cast(row('Jane', count_if(name = 'Jane')) AS row(name varchar, age integer))
        ] as users
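The rest of the query is cut off in this preview. To make the goal concrete, the following Python sketch computes the same kind of result, the most frequent `name` per `service`, on an in-memory list of records. It is only an illustration of what a per-group mode is, not a port of the Athena query, and `mode_by_group` is a hypothetical name.

```python
from collections import Counter, defaultdict


def mode_by_group(records, group_key, value_key):
    """Return {group: most frequent value} for a list of dict records."""
    counters = defaultdict(Counter)
    for record in records:
        counters[record[group_key]][record[value_key]] += 1
    # most_common(1) yields a one-element list of (value, count) pairs.
    return {group: counts.most_common(1)[0][0]
            for group, counts in counters.items()}


records = [
    {"service": "s1", "name": "Bob"},
    {"service": "s1", "name": "Bob"},
    {"service": "s1", "name": "Alice"},
    {"service": "s2", "name": "Jane"},
]
print(mode_by_group(records, "service", "name"))
```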
@JoaoCarabetta
JoaoCarabetta / execution_timer_with_context_manager.py
Last active October 11, 2019 22:46
Get execution time of any part of your code with this context manager for python
import logging
from contextlib import contextmanager

log = logging.getLogger(__name__)

@contextmanager
def timed_log(name, time_chunk='seconds'):
    """Context manager to get execution time of parts of code.
    To use, simply declare the context manager:
    ```with timed_log(name='useful', time_chunk='minutes'):
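The preview ends before the body of `timed_log`, so the author's implementation is not shown. A minimal sketch of a timing context manager in the same spirit might look like the following. The name `timed_block`, the `_DIVISORS` table, and the log message format are assumptions made for this example.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger(__name__)

# Assumed set of supported units; the original may accept others.
_DIVISORS = {"seconds": 1, "minutes": 60, "hours": 3600}


@contextmanager
def timed_block(name, time_chunk="seconds"):
    """Log how long the body of the `with` statement took."""
    divisor = _DIVISORS[time_chunk]  # fail early on an unknown unit
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the body raises, so the timing is always logged.
        elapsed = (time.perf_counter() - start) / divisor
        log.info("%s took %.3f %s", name, elapsed, time_chunk)
```

Usage follows the pattern in the docstring above: wrap the code to be measured in `with timed_block(name='useful', time_chunk='minutes'):` and configure logging at `INFO` level to see the message.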
@JoaoCarabetta
JoaoCarabetta / data_persistance.py
Created September 20, 2019 22:37
persist data locally
def persist_local(data, args, folder,
                  id_keys=['experiment_id'],
                  as_type='.parquet.gz',
                  save_path='persistance'):
    save_path = build_path(args, folder, id_keys, as_type, save_path)
    # Check if path exists, add path if not
    if not save_path.parent.exists():
        # Permissions are octal: 0o777, not the decimal literal 777
        save_path.parent.mkdir(mode=0o777, parents=True, exist_ok=True)
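The `mode` argument of `Path.mkdir` is a permission bit mask, normally written in octal, so the decimal literal `777` does not mean `rwxrwxrwx`. The short sketch below shows the difference and a directory-creating helper; `ensure_parent` is a hypothetical name, and the temporary directory stands in for a real save location.

```python
import tempfile
from pathlib import Path


def ensure_parent(path, mode=0o777):
    """Create the parent directory of `path` if missing and return it."""
    parent = Path(path).parent
    # exist_ok=True makes a separate exists() check unnecessary.
    parent.mkdir(mode=mode, parents=True, exist_ok=True)
    return parent


# Decimal 777 is a different bit pattern from octal 0o777.
print(oct(777))  # 0o1411

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "persistance" / "run_1" / "data.parquet.gz"
    print(ensure_parent(target).is_dir())  # True
```

Note that the effective permissions are still filtered by the process umask, and intermediate directories created by `parents=True` use default permissions rather than `mode`.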
@JoaoCarabetta
JoaoCarabetta / utils.py
Created September 19, 2019 18:16
Load config from yaml
def open_yaml(path):
    """
    Load yaml file.

    Parameters
    ----------
    path: pathlib.PosixPath
        Path to yaml file

    Return
    ------
    Dictionary