@nbertagnolli
nbertagnolli / sklearn_lambda_handler.py
Last active June 13, 2021 23:22
A simple lambda handler to serve a pretrained sklearn model
import boto3
import json
import os
import pickle
s3 = boto3.resource("s3")
BUCKET_NAME = "nic-sklearn-models"
nbertagnolli / binance_price_scraper.js
Created March 20, 2021 23:18
Simple Apify Script for scraping 24hr kline data from Binance
const Apify = require('apify');
const axios = require('axios');
/**
 * Converts a list of lists of kline data from Binance to a list of
 * dictionaries.
 * @param {Array<Array<string>>} data The raw data returned from Binance
 * @param {string} exchange The name of the exchange to scrape
 * @return {Array<Object<string, string>>}
 */
nbertagnolli / feature_importance.ipynb
Created October 12, 2020 05:11
Gist for medium article...
nbertagnolli / extract_feature_names.py
Created October 11, 2020 23:42
Extracts feature names from an sklearn base model, transformer, etc.
from typing import List


def extract_feature_names(model, name) -> List[str]:
    """Extracts the feature names from arbitrary sklearn models.

    Args:
        model: The sklearn model, transformer, clustering algorithm, etc. which we want to get named features for.
        name: The name of the current step in the pipeline we are at.

    Returns:
        The list of feature names. If the model does not have named features, it constructs feature names
        by appending an index to the provided name.
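The fallback described in the docstring above can be sketched without importing sklearn. This is a minimal illustration, not the gist's actual body, which is truncated in this preview. The helper name `extract_feature_names_sketch`, the `n_features` hint, and the order in which attributes are probed are assumptions made for the example. `get_feature_names_out` and `get_feature_names` are method names used by newer and older sklearn releases respectively, but which estimators expose them (and with which arguments) varies.

```python
from typing import List, Optional


def extract_feature_names_sketch(
    model, name: str, n_features: Optional[int] = None
) -> List[str]:
    """Return the model's own feature names, or indexed names built from `name`."""
    # Prefer named features when the object exposes a getter (duck typing).
    for attr in ("get_feature_names_out", "get_feature_names"):
        getter = getattr(model, attr, None)
        if callable(getter):
            return [str(feature) for feature in getter()]

    # Fallback: construct names by appending an index to the step name.
    # Assumption: the output width comes from the caller's hint or from an
    # integer `n_components` attribute (as on many decomposition models).
    width = n_features if n_features is not None else getattr(model, "n_components", None)
    if not isinstance(width, int):
        # Width unknown: fall back to the bare step name.
        return [name]
    return [f"{name}_{i}" for i in range(width)]
```

A step without named features, such as a three-component decomposition registered as `"pca"`, would come back as `pca_0`, `pca_1`, `pca_2` under these assumptions.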
nbertagnolli / get_feature_names.py
Last active October 11, 2020 23:52
Gets the feature names in order from an arbitrary sklearn pipeline
from typing import List

from sklearn.pipeline import FeatureUnion, Pipeline


def get_feature_names(model, names: List[str], name: str) -> List[str]:
    """This method extracts the feature names in order from an sklearn Pipeline.

    This method only works with composed Pipelines and FeatureUnions. It will
    pull out all names using DFS from a model.

    Args:
        model: The model we are interested in
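The depth-first traversal mentioned in the docstring above might look roughly like the following. It is a sketch under stated assumptions, not the gist's truncated body: the function name `collect_feature_names` is hypothetical, containers are recognised by duck typing on the `steps` attribute (Pipeline) and the `transformer_list` attribute (FeatureUnion) so the example runs without sklearn installed, and a leaf that exposes no names simply contributes its step name as a placeholder, whereas the original presumably delegates leaves to `extract_feature_names`.

```python
from typing import List


def collect_feature_names(model, names: List[str], name: str) -> List[str]:
    """Depth-first walk over nested Pipeline / FeatureUnion-like objects."""
    # Pipeline stores (name, estimator) pairs in `steps`;
    # FeatureUnion stores them in `transformer_list`.
    children = getattr(model, "steps", None) or getattr(model, "transformer_list", None)
    if children:
        for child_name, child in children:
            # Recurse in declaration order so names come out in column order.
            collect_feature_names(child, names, child_name)
        return names

    # Leaf node: use its own names if it has any.
    getter = getattr(model, "get_feature_names_out", None)
    if callable(getter):
        names.extend(str(feature) for feature in getter())
    else:
        # Assumption: an unnamed leaf contributes its step name as a placeholder.
        names.append(name)
    return names
```

Passing the accumulator `names` down the recursion mirrors the signature shown in the gist; the caller starts with an empty list and the name of the root step.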
nbertagnolli / language_transformer_fast.py
Last active April 29, 2021 21:04
A faster language transformer
from typing import List, Optional, Set
from sklearn.base import BaseEstimator, TransformerMixin
import fasttext
from transformers import MarianTokenizer, MarianMTModel
import os
import requests
class LanguageTransformerFast(BaseEstimator, TransformerMixin):
    def __init__(
        self,
nbertagnolli / english_transformer.py
Last active April 29, 2021 21:02
Simple and slow sklearn transformer to translate any input language to English.
from typing import List, Optional
from sklearn.base import BaseEstimator, TransformerMixin
import fasttext
from transformers import MarianTokenizer, MarianMTModel
import os
class EnglishTransformer(BaseEstimator, TransformerMixin):
    def __init__(self,
                 fasttext_model_path: str="/tmp/lid.176.bin",
nbertagnolli / predict_language.py
Last active August 8, 2020 22:18
Python snippet to predict the language of each string in a list.
from typing import List
import os
import requests
import fasttext
def get_language(texts: List[str]) -> List[str]:
    """Predicts the language code for each text in a list.

    Args:
import pandas as pd


def split_on_date(data: pd.DataFrame, train_percent: float = 0.9, seed: int = 1234):
    """Splits a DataFrame into train and validation sets based on the date.

    Args:
        data: The data we want to split. It must contain a date column.
        train_percent: The fraction of the data to use for training (0.9 means 90%).
        seed: The random seed to use for selecting the sets.

    Returns:
        data: A DataFrame with a new split column with values 'train' and 'val'.
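Since the function body is not shown, here is one way a date-based split could work, as a hedged sketch. It assumes the seed is used to shuffle the unique dates and that every row sharing a date receives the same label; neither detail is confirmed by the preview. To stay dependency-free, a list of dicts with a `"date"` key stands in for the DataFrame, and the name `split_on_date_sketch` is hypothetical.

```python
import random
from typing import Dict, List


def split_on_date_sketch(
    rows: List[Dict], train_percent: float = 0.9, seed: int = 1234
) -> List[Dict]:
    """Label each row 'train' or 'val' so rows sharing a date share a label."""
    # Work on unique dates so one date never straddles both sets.
    dates = sorted({row["date"] for row in rows})

    # Assumption: the seed drives a reproducible shuffle of the dates.
    rng = random.Random(seed)
    rng.shuffle(dates)

    # The first `train_percent` share of shuffled dates goes to training.
    n_train = int(round(train_percent * len(dates)))
    train_dates = set(dates[:n_train])

    # Return copies with a new 'split' key, leaving the input untouched.
    return [
        {**row, "split": "train" if row["date"] in train_dates else "val"}
        for row in rows
    ]
```

Splitting by date rather than by row keeps all observations from a given day on one side of the split, which avoids leaking same-day information from training into validation.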