import boto3
import json
import os
import pickle

s3 = boto3.resource("s3")
BUCKET_NAME = "nic-sklearn-models"
const Apify = require('apify');
const axios = require('axios');

/**
 * Converts a list of lists of kline data from Binance to a list of
 * dictionaries.
 * @param {Array<Array<string>>} data The raw data returned from Binance
 * @param {string} exchange The name of the exchange to scrape
 * @return {Array<Object<string, string>>}
 */
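The conversion described in the comment above can be sketched as follows, written in Python to match the other snippets on this page. This is a minimal illustration, not the author's implementation: the function name `klines_to_dicts`, the key names in `KLINE_FIELDS`, and the added `exchange` key are all assumptions. Only the positional order of the fields follows Binance's documented kline response.

```python
from typing import Dict, List

# Positional order of one Binance kline row. The key names are hypothetical;
# only the order follows Binance's documented response format.
KLINE_FIELDS = [
    "open_time", "open", "high", "low", "close", "volume",
    "close_time", "quote_asset_volume", "number_of_trades",
    "taker_buy_base_volume", "taker_buy_quote_volume", "ignore",
]


def klines_to_dicts(data: List[list], exchange: str) -> List[Dict[str, object]]:
    """Turn each positional kline row into a dict keyed by field name."""
    records = []
    for row in data:
        record = dict(zip(KLINE_FIELDS, row))
        # Assumption: each record is tagged with the exchange it came from.
        record["exchange"] = exchange
        records.append(record)
    return records
```

Keeping the field names in a single list means the mapping can be adjusted in one place if another exchange returns its columns in a different order.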
from typing import List


def extract_feature_names(model, name) -> List[str]:
    """Extracts the feature names from arbitrary sklearn models.

    Args:
        model: The sklearn model, transformer, clustering algorithm, etc. for which we want named features.
        name: The name of the current step in the pipeline.

    Returns:
        The list of feature names. If the model does not have named features, it constructs feature names
        by appending an index to the provided name.
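The preview above is cut off before the function body, so the following is only a sketch of the behaviour the docstring describes, not the original code. It assumes that an estimator with named features exposes `get_feature_names_out` (or the older `get_feature_names`), and that an estimator without them exposes its output width as `n_components` or `n_clusters`. The function name `extract_feature_names_sketch` is hypothetical.

```python
from typing import Any, List


def extract_feature_names_sketch(model: Any, name: str) -> List[str]:
    """Return the model's own feature names, or indexed names built from `name`."""
    # Prefer names the estimator reports itself (duck-typed, so no sklearn import).
    for method in ("get_feature_names_out", "get_feature_names"):
        getter = getattr(model, method, None)
        if callable(getter):
            return [str(feature) for feature in getter()]

    # Assumption: unnamed outputs expose their width through one of these attributes.
    for attr in ("n_components", "n_clusters"):
        width = getattr(model, attr, None)
        if isinstance(width, int):
            return [f"{name}_{i}" for i in range(width)]

    # Nothing to infer from, so treat the step as producing a single feature.
    return [name]
```

Some sklearn transformers need to be fitted, or need input feature names passed in, before `get_feature_names_out` succeeds, so a production version would likely need error handling around that call.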
from typing import List

from sklearn.pipeline import FeatureUnion, Pipeline


def get_feature_names(model, names: List[str], name: str) -> List[str]:
    """This method extracts the feature names in order from a sklearn Pipeline.

    This method only works with composed Pipelines and FeatureUnions. It will
    pull out all names from a model using depth-first search (DFS).

    Args:
        model: The model we are interested in
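The depth-first traversal mentioned in the docstring above can be illustrated with a small sketch. It is a simplification under stated assumptions: it collects the names of leaf steps rather than per-feature names, and it detects containers by attribute, relying on the fact that sklearn's `Pipeline` stores its children in `steps` and `FeatureUnion` stores them in `transformer_list`, both as lists of `(name, estimator)` pairs. The function name `collect_leaf_names` is hypothetical.

```python
from typing import Any, List


def collect_leaf_names(model: Any, name: str) -> List[str]:
    """Return leaf step names in depth-first, left-to-right order."""
    # Pipeline keeps children in `steps`, FeatureUnion in `transformer_list`.
    children = getattr(model, "steps", None) or getattr(model, "transformer_list", None)
    if not children:
        # A leaf: no nested steps, so report the name it was registered under.
        return [name]

    names: List[str] = []
    for child_name, child in children:
        # Recurse before moving to the next sibling, which gives DFS order.
        names.extend(collect_leaf_names(child, child_name))
    return names
```

Because the check is duck-typed, the same function works on real sklearn objects and on lightweight stand-ins in tests.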
from typing import List, Optional, Set
from sklearn.base import BaseEstimator, TransformerMixin
import fasttext
from transformers import MarianTokenizer, MarianMTModel
import os
import requests


class LanguageTransformerFast(BaseEstimator, TransformerMixin):
    def __init__(
        self,
from typing import List, Optional
from sklearn.base import BaseEstimator, TransformerMixin
import fasttext
from transformers import MarianTokenizer, MarianMTModel
import os


class EnglishTransformer(BaseEstimator, TransformerMixin):
    def __init__(self,
                 fasttext_model_path: str = "/tmp/lid.176.bin",
from typing import List
import os
import requests
import fasttext


def get_language(texts: List[str]) -> List[str]:
    """Predicts the language code for each text in a list.

    Args:
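The preview above ends before the body, so the prediction step itself is not shown. One part that can be sketched safely is the post-processing: fastText's `predict` returns labels of the form `__label__en`, one sequence of labels per input text, and a language code is obtained by stripping that prefix. The helper name `labels_to_codes` is hypothetical, and this is an assumption about how the codes are derived, not the original code.

```python
from typing import List, Sequence

LABEL_PREFIX = "__label__"  # prefix fastText attaches to every predicted label


def labels_to_codes(labels: Sequence[Sequence[str]]) -> List[str]:
    """Keep the top label for each text and strip fastText's label prefix."""
    codes = []
    for per_text in labels:
        top = per_text[0]  # labels are ordered by probability, best first
        if top.startswith(LABEL_PREFIX):
            top = top[len(LABEL_PREFIX):]
        codes.append(top)
    return codes
```

With a loaded language-identification model this might be used as `labels, _ = model.predict(texts)` followed by `labels_to_codes(labels)`. Note that fastText's `predict` rejects strings containing newline characters, so inputs usually need them replaced first.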
import pandas as pd


def split_on_date(data: pd.DataFrame, train_percent: float = 0.9, seed: int = 1234):
    """Splits a DataFrame into train and validation sets based on the date.

    Args:
        data: The data we want to split. It must contain a date column.
        train_percent: The fraction of the data to use for training (e.g. 0.9 for 90%).
        seed: The random seed to use for selecting the sets.

    Returns:
        data: A DataFrame with a new split column with values 'train' and 'val'.
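The function body is not visible in the preview above, so the sketch below shows only one plausible reading of the docstring. It assumes that the split is made over unique dates, so that every row sharing a date lands in the same set, and that `train_percent` is applied to the number of distinct dates rather than the number of rows. It uses only the standard library and returns the labels as a list; the name `assign_split` is hypothetical.

```python
import random
from typing import Hashable, Iterable, List


def assign_split(
    dates: Iterable[Hashable], train_percent: float = 0.9, seed: int = 1234
) -> List[str]:
    """Label each row 'train' or 'val' so that rows sharing a date stay together."""
    dates = list(dates)
    # Sort before shuffling so the result depends only on the seed, not on set order.
    unique_dates = sorted(set(dates))
    random.Random(seed).shuffle(unique_dates)

    # Assumption: the fraction applies to distinct dates, not to individual rows.
    n_train = int(round(len(unique_dates) * train_percent))
    train_dates = set(unique_dates[:n_train])
    return ["train" if d in train_dates else "val" for d in dates]
```

With pandas, the result could be attached as `data["split"] = assign_split(data["date"])`, where the column name `date` is itself an assumption. Grouping by date avoids leaking same-day rows between the training and validation sets.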