layout: default
title: SQL Style Guide
description: A guide to writing clean, clear, and consistent SQL.
tags: data, process

Purpose

@nsorros
nsorros / keybase.md
Created December 4, 2020 08:17
keybase proof

Keybase proof

I hereby claim:

  • I am nsorros on github.
  • I am nsorros (https://keybase.io/nsorros) on keybase.
  • I have a public key ASAEEg7pTTGnMmH5072YFHU-nsM9ZR1hjAIjTjTg2lC7Ego

To claim this, I am signing this object:

nsorros / update_requirements.sh
Last active April 22, 2021 17:47
Update requirements txt file
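# PYTHON and PIP must be defined elsewhere in the Makefile. For the fresh
# venv to be used, PIP has to resolve to $(VENV)/bin/pip.
# The header line is written as a "#" comment so pip can still parse requirements.txt.
update-requirements: VENV := /tmp/update-requirements-venv
update-requirements: ## updates requirements
	@if [ -d $(VENV) ]; then rm -rf $(VENV); fi
	@mkdir -p $(VENV)
	$(PYTHON) -m venv $(VENV)
	$(PIP) install --upgrade pip
	$(PIP) install -r unpinned_requirements.txt
	echo "# Created by update-requirements. Do not edit." > requirements.txt
	$(PIP) freeze | grep -v pkg-resources==0.0.0 >> requirements.txt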
nsorros / venv.sh
Last active April 22, 2021 17:48
Create virtualenv
PYTHON := python3.8
VENV := venv
PIP := $(VENV)/bin/pip

.PHONY: venv
venv: ## creates virtualenv
	@if [ -d $(VENV) ]; then rm -rf $(VENV); fi
	@mkdir -p $(VENV)
	$(PYTHON) -m venv $(VENV)
	$(PIP) install --upgrade pip
nsorros / sync.sh
Last active April 22, 2021 17:49
Sync data and models to s3
PROJECT_NAME := classifier
PROJECT_BUCKET := datascience/$(PROJECT_NAME)

.PHONY: sync_data
sync_data: ## syncs data to s3
	aws s3 sync data/ s3://$(PROJECT_BUCKET)/data
	aws s3 sync s3://$(PROJECT_BUCKET)/data data/

.PHONY: sync_models
sync_models: ## syncs models to s3
nsorros / train_argparse.py
Last active April 22, 2021 17:46
Train with argparse
import argparse


def train(data_path, model_path, learning_rate, batch_size):
    ...


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--data_path", type=str, help="path to train data")
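The preview stops after the first argument. The following is a minimal sketch of how the remaining parameters of `train` might be exposed on the command line. The flag names simply mirror the function's parameters, and the defaults (`0.001`, `32`), the `build_parser` and `main` helpers, and the placeholder body of `train` are assumptions for illustration, not taken from the gist.

```python
import argparse


def train(data_path, model_path, learning_rate, batch_size):
    # Placeholder body: the gist elides the training logic with "...".
    return {
        "data_path": data_path,
        "model_path": model_path,
        "learning_rate": learning_rate,
        "batch_size": batch_size,
    }


def build_parser():
    # One flag per train() parameter; the defaults are illustrative only.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, help="path to train data")
    parser.add_argument("--model_path", type=str, help="path to save the model")
    parser.add_argument("--learning_rate", type=float, default=0.001, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=32, help="batch size")
    return parser


def main(argv=None):
    # argv=None makes argparse fall back to sys.argv[1:].
    args = build_parser().parse_args(argv)
    return train(args.data_path, args.model_path, args.learning_rate, args.batch_size)


# Example call with an explicit argument list instead of the real command line.
result = main(["--data_path", "data/processed/data.jsonl",
               "--model_path", "models/cnn/",
               "--batch_size", "64"])
print(result)
```

Passing an explicit list to `main` keeps the parser easy to exercise from tests or a notebook without touching `sys.argv`.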
nsorros / config.ini
Last active April 22, 2021 17:46
Config file that describes params for train
[DEFAULT]
version = 2021.03.0
[preprocess]
raw_data_path = data/raw/data.xlsx
processed_data_path = data/processed/data.jsonl
[train]
data_path = data/processed/data.jsonl
model_path = models/cnn-2021.03.0/
nsorros / train_config.py
Last active April 22, 2021 16:15
Train with config
import configparser
import argparse


def train(data_path, model_path, learning_rate, batch_size):
    ...


if __name__ == "__main__":
    argparser = argparse.ArgumentParser()
    argparser.add_argument("--config", type=str, help="path to config file")
    args = argparser.parse_args()
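The preview ends before the config file is read. The following is a rough sketch of how the `[train]` section could be turned into arguments for `train` using the standard-library `configparser`. The config text is inlined so the example is self-contained. The `learning_rate` and `batch_size` keys and the `load_train_params` helper are assumptions for illustration, since the `config.ini` preview only shows `data_path` and `model_path`.

```python
import configparser

# Inlined stand-in for config.ini so the sketch runs without a file on disk.
EXAMPLE_CONFIG = """
[DEFAULT]
version = 2021.03.0

[train]
data_path = data/processed/data.jsonl
model_path = models/cnn-2021.03.0/
# The two keys below are assumed for illustration.
learning_rate = 0.001
batch_size = 32
"""


def load_train_params(config):
    # Values are stored as strings; getfloat/getint convert them.
    section = config["train"]
    return {
        "data_path": section["data_path"],
        "model_path": section["model_path"],
        "learning_rate": section.getfloat("learning_rate"),
        "batch_size": section.getint("batch_size"),
    }


config = configparser.ConfigParser()
config.read_string(EXAMPLE_CONFIG)  # with a real file: config.read(args.config)
params = load_train_params(config)
print(params)

# Keys in [DEFAULT] are visible from every other section.
print(config["train"]["version"])
```

With a dictionary like `params`, the call becomes `train(**params)`, so the command line only needs the single `--config` flag.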
nsorros / optimise_threshold_naive.py
Last active February 14, 2022 12:53
Naive implementation of optimising threshold for multilabel classifiers described in "Threshold optimisation for multi label classifier"
from functools import partial
import time
from sklearn.metrics import f1_score
from scipy.sparse import load_npz
import numpy as np
import typer
def f(Y_pred_proba, Y_test, thresholds):
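The preview stops at the signature of `f`, so the author's implementation is not visible here. As a rough illustration only, a naive per-label threshold search could look like the sketch below: for each label, try every candidate threshold and keep the one that gives the best micro-averaged F1, recomputing the full score on every trial. The function names, the dense NumPy arrays, and the hand-rolled `micro_f1` (used instead of `sklearn.metrics.f1_score` to keep the example dependency-light) are all assumptions.

```python
import numpy as np


def micro_f1(Y_true, Y_pred):
    # Micro-averaged F1 from counts pooled over all labels.
    tp = np.sum((Y_true == 1) & (Y_pred == 1))
    fp = np.sum((Y_true == 0) & (Y_pred == 1))
    fn = np.sum((Y_true == 1) & (Y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def optimise_thresholds_naive(Y_pred_proba, Y_true, candidates):
    # Start every label at 0.5, then tune one label at a time.
    n_labels = Y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    for k in range(n_labels):
        best_t, best_score = thresholds[k], -1.0
        for t in candidates:
            trial = thresholds.copy()
            trial[k] = t
            # Naive part: the whole prediction matrix is rebuilt and rescored.
            score = micro_f1(Y_true, (Y_pred_proba > trial).astype(int))
            if score > best_score:
                best_t, best_score = t, score
        thresholds[k] = best_t
    return thresholds


# Tiny hand-made example: 4 samples, 2 labels.
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_pred_proba = np.array([[0.9, 0.2], [0.4, 0.35], [0.6, 0.3], [0.45, 0.1]])

thresholds = optimise_thresholds_naive(Y_pred_proba, Y_true, [0.25, 0.5, 0.75])
score = micro_f1(Y_true, (Y_pred_proba > thresholds).astype(int))
print(thresholds, score)
```

The cost grows with labels × candidates × the size of the prediction matrix, which is what makes this version slow on large label sets.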
nsorros / optimize_threshold_custom_f1.py
Created February 14, 2022 10:33
Add custom f1 score to the implementation of optimising threshold for multilabel classifiers described in "Threshold optimisation for multi label classifier"
from functools import partial
import time
from sklearn.metrics import multilabel_confusion_matrix
from scipy.sparse import load_npz
import numpy as np
import typer
if "line_profiler" not in dir() and "profile" not in dir():
    # no-op profile decorator
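The rest of this gist is not shown, so the following is only a sketch of why a custom F1 can be cheaper than calling a library scorer on every trial, assuming the target metric is micro-averaged F1. Micro F1 depends only on the true positive, false positive, and false negative counts summed over labels, so changing one label's threshold only changes that label's counts. All names below are hypothetical and the code is not the author's.

```python
import numpy as np


def label_counts(y_true, y_pred):
    # (tp, fp, fn) for a single label's 0/1 vectors.
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return np.array([tp, fp, fn])


def micro_f1_from_counts(counts):
    # counts has one (tp, fp, fn) row per label.
    tp, fp, fn = counts.sum(axis=0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0


def optimise_thresholds_incremental(Y_pred_proba, Y_true, candidates):
    n_labels = Y_true.shape[1]
    thresholds = np.full(n_labels, 0.5)
    counts = np.array([
        label_counts(Y_true[:, k], (Y_pred_proba[:, k] > 0.5).astype(int))
        for k in range(n_labels)
    ])
    for k in range(n_labels):
        best_t = thresholds[k]
        best_score = micro_f1_from_counts(counts)
        best_counts = counts[k].copy()
        for t in candidates:
            # Only label k's column is re-thresholded and recounted.
            counts[k] = label_counts(Y_true[:, k], (Y_pred_proba[:, k] > t).astype(int))
            score = micro_f1_from_counts(counts)
            if score > best_score:
                best_t, best_score, best_counts = t, score, counts[k].copy()
        thresholds[k] = best_t
        counts[k] = best_counts
    return thresholds, micro_f1_from_counts(counts)


# Tiny hand-made example: 4 samples, 2 labels.
Y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
Y_pred_proba = np.array([[0.9, 0.2], [0.4, 0.35], [0.6, 0.3], [0.45, 0.1]])

thresholds, score = optimise_thresholds_incremental(Y_pred_proba, Y_true, [0.25, 0.5, 0.75])
print(thresholds, score)
```

Each trial here touches a single column of the data rather than the whole matrix, which is the main saving over rescoring everything.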