@jiahao87
jiahao87 / forecast_reconciliation.py
Last active March 13, 2022 08:14
Hierarchical Forecasting Reconciliation using OLS Method
import numpy as np
import pandas as pd
import hts # To install: pip install scikit-hts
import collections
from scipy.optimize import lsq_linear
hts_df = pd.DataFrame([{'total': 14,
                        'CA': 5.4, 'TX': 1.8, 'WI': 5.9,
                        'CA_1': 0.8, 'CA_2': 0.6, 'CA_3': 0.9, 'CA_4': 0.3,
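The preview above is cut off before the reconciliation step. As a rough illustration of what OLS reconciliation does, here is a minimal sketch on a hypothetical two-level hierarchy (total = A + B). The summing matrix `S`, the forecasts in `y_hat`, and the series names are assumptions made for illustration and are not the gist's data. It uses `np.linalg.lstsq` rather than the `lsq_linear` call the gist imports.

```python
import numpy as np

# Hypothetical hierarchy (not the gist's data): total = A + B.
# Rows of S: total, A, B. Columns: the bottom-level series A, B.
S = np.array([[1, 1],
              [1, 0],
              [0, 1]], dtype=float)

# Illustrative base forecasts that are incoherent: 10 != 4 + 5.
y_hat = np.array([10.0, 4.0, 5.0])

# OLS reconciliation: find bottom-level values b minimizing ||S b - y_hat||^2,
# then rebuild every level from b so the hierarchy adds up.
b, *_ = np.linalg.lstsq(S, y_hat, rcond=None)
y_tilde = S @ b

print(y_tilde)
```

After reconciliation the first entry of `y_tilde` equals the sum of the other two, which is the coherence property the method is meant to restore.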
jiahao87 / pegasus_fine_tune.py
Last active April 11, 2024 03:01
Pytorch script for fine-tuning Pegasus Large model
"""Script for fine-tuning Pegasus
Example usage:
# use XSum dataset as example, with first 1000 docs as training data
from datasets import load_dataset
dataset = load_dataset("xsum")
train_texts, train_labels = dataset['train']['document'][:1000], dataset['train']['summary'][:1000]
# use Pegasus Large model as base for fine-tuning
model_name = 'google/pegasus-large'
train_dataset, _, _, tokenizer = prepare_data(model_name, train_texts, train_labels)
jiahao87 / sample_reviews.txt
Last active December 20, 2020 07:09
Sample reviews of top topics
######################
### Sample Reviews ###
######################
###### Topic 1 ######
"From the start our experience was bad There was only one person on check in so we had to queue Having been allcated our rooms we had to change them as we had specified adjacent or interconnecting rooms which they failed to do We then had to queue up again for the one person still on reception and 45 minutes later were allocated 2 adjacent rooms But one of the rooms had a smell of drains which I reported and which the very discourteous duty manager Thalia refused to deal with In fact she told me several times that I was wrong The rooms were small the beds very soft and the shower and toilet were part of the bedroom The smell of drains was coming from the shower For such an expensive hotel this was unacceptable especially the way the duty manager treated her customers I don t think I have ever encountered a more unpleasant manner in my many years of travelling"
"On arrival we only had 30 minutes to get ready We were to
jiahao87 / vaex_iris_sample.py
Created September 6, 2020 11:06
Vaex sample code for Iris data
import vaex
import vaex.ml
# load iris data
df = vaex.ml.datasets.load_iris()
# perform train test split
df_train, df_test = df.ml.train_test_split(test_size=0.2)
# apply standardization transformation
jiahao87 / vaex_titanic_sample.py
Created September 6, 2020 09:30
Vaex sample code to titanic dataset
import vaex
import vaex.ml
# load titanic data
df_vaex = vaex.ml.datasets.load_titanic()
# perform train test split
df_train, df_test = df_vaex.ml.train_test_split(test_size=0.2)
# One-hot encode some features
jiahao87 / mlflow_full_sample.py
Last active July 25, 2022 20:51
Full sample code for MLflow example
import os
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_validate
from sklearn import metrics
from sklearn.model_selection import ParameterSampler
from sklearn.ensemble import RandomForestClassifier
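The imports above suggest a random hyperparameter search over a random forest, scored by cross-validation. The rest of the script is not shown, so the following is only a sketch under assumptions: the search space, `n_iter`, `cv`, and the variable names are hypothetical, and the MLflow logging is left out.

```python
import numpy as np
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import ParameterSampler, cross_validate

X, y = load_iris(return_X_y=True)

# Hypothetical search space; the gist's actual ranges are not shown.
param_distributions = {
    "n_estimators": [10, 20, 50],
    "max_features": uniform(0.1, 0.9),  # fraction of features, drawn from [0.1, 1.0]
}

results = []
# Draw three random parameter settings and score each with 3-fold CV.
for params in ParameterSampler(param_distributions, n_iter=3, random_state=0):
    clf = RandomForestClassifier(random_state=0, **params)
    scores = cross_validate(clf, X, y, cv=3, scoring="accuracy")
    results.append((params, float(np.mean(scores["test_score"]))))

# Keep the setting with the highest mean CV accuracy.
best_params, best_score = max(results, key=lambda r: r[1])
print(best_params, best_score)
```

In a tracking setup, each loop iteration would typically correspond to one logged run, with `params` as the run's parameters and the mean score as its metric.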
jiahao87 / mlflow_sample.py
Last active August 26, 2020 11:49
Sample code for MLflow
X_train, X_test, y_train, y_test = data_processing()
#################### 1. Setup Experiment ###########################
# set path to log data first, e.g., mlruns local folder, so the experiment is created in that store
mlflow.set_tracking_uri('./mlruns')
# set experiment name to organize runs
mlflow.set_experiment('New Experiment Name')
experiment = mlflow.get_experiment_by_name('New Experiment Name')
jiahao87 / values.yaml
Created August 12, 2020 09:45
Configuration file template to update Dask Helm deployment
# values.yaml to overwrite default values
scheduler:
  image:
    tag: 2.21.0  # Container image tag
  serviceType: "LoadBalancer"
  resources:
    limits:
      cpu: 1
      memory: 6G
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import missingno
import warnings
warnings.filterwarnings("ignore")
%matplotlib inline
jiahao87 / text_preprocessing.py
Last active August 17, 2023 01:31
Full code for preprocessing text
from bs4 import BeautifulSoup
import spacy
import unidecode
from word2number import w2n
import contractions
nlp = spacy.load('en_core_web_md')
# exclude words from spacy stopwords list
deselect_stop_words = ['no', 'not']
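The preview stops right after `deselect_stop_words` is defined. The idea is to keep negations such as "no" and "not" during stopword removal, since dropping them can flip the meaning of a sentence. Here is a minimal sketch of that idea in plain Python. The small `stop_words` set and the `remove_stop_words` helper are hypothetical stand-ins for spaCy's stopword list and pipeline. With spaCy itself, one common way to get the same effect is to set `nlp.vocab[word].is_stop = False` for each deselected word.

```python
# Hypothetical miniature stopword set standing in for spaCy's list.
stop_words = {"the", "is", "a", "was", "no", "not"}

# Negations to keep, as in the snippet above.
deselect_stop_words = ["no", "not"]

# Remove the negations from the set of words that get filtered out.
active_stop_words = stop_words - set(deselect_stop_words)


def remove_stop_words(text):
    """Drop stopwords from whitespace-tokenized text, keeping negations."""
    return " ".join(
        token for token in text.split() if token.lower() not in active_stop_words
    )


cleaned = remove_stop_words("The room was not clean")
print(cleaned)
```

Without the deselection step, "not" would be filtered out and the sentence would reduce to "room clean", the opposite of what the reviewer wrote.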