
@yuhanz
yuhanz / sfttraining.py
Last active April 17, 2024 06:09
Example of PEFT fine-tuning with SFT training
!pip install transformers==4.30
!pip install accelerate
!pip install trl peft
!pip install bitsandbytes
!pip install xformers==0.0.22
!pip install autoawq
from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
@yuhanz
yuhanz / chat-bot.py
Last active April 17, 2024 05:36
Terminal Chat Bot.
### Download files from https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0/tree/main
import time
from transformers import AutoTokenizer
import transformers
import torch
# model = "PY007/TinyLlama-1.1B-step-50K-105b"
# model = "yuhanzgithub/tinyllama"
model = "./"
tokenizer = AutoTokenizer.from_pretrained(model)
@yuhanz
yuhanz / speech-synthesis-on-browser.js
Created February 7, 2024 19:21
Speech synthesis in the browser - run this in your browser console to let the bot speak
speechSynthesis.speak(new SpeechSynthesisUtterance("This is very interesting"));
@yuhanz
yuhanz / select-yesterday-as-string.sql
Created November 28, 2023 22:32
Hive SQL - select the date of yesterday as a string
select from_unixtime(unix_timestamp(DATE_SUB(CURRENT_DATE, 1), 'yyyyMMdd'),'yyyy-MM-dd')
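To cross-check the Hive result outside the cluster, the same "yesterday as a string" value can be computed in plain Python. This is a minimal sketch and not part of the original gist; the function name `yesterday_as_string` and its `today` parameter are assumptions introduced for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def yesterday_as_string(today: Optional[date] = None) -> str:
    """Return the day before `today`, formatted as yyyy-MM-dd."""
    if today is None:
        today = date.today()
    # Subtracting a timedelta handles month and year boundaries.
    return (today - timedelta(days=1)).strftime("%Y-%m-%d")


print(yesterday_as_string(date(2023, 11, 28)))  # prints 2023-11-27
```

Passing the reference date explicitly keeps the helper deterministic, which makes it easier to compare against a query that ran on a known day.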
@yuhanz
yuhanz / python-write-delta-table.py
Last active November 14, 2023 19:30
Python - write to delta lake without pandas (use pyarrow)
import pyarrow as pa
from deltalake.writer import write_deltalake
n_legs = pa.array([2, 4, 5, 100])
animals = pa.array(["Flamingo", "Horse", "Brittle stars", "Centipede"])
names = ["n_legs", "animals"]
table = pa.Table.from_arrays([n_legs, animals], names=names)
write_deltalake('/tmp/delta_table', table, mode = 'append')
@yuhanz
yuhanz / chromadb-example-persistence-save-embedding.py
Last active August 15, 2023 22:08
ChromaDB: Create a DB with persistence, save embedding, querying with cosine similarity
# Based on https://docs.trychroma.com/usage-guide
import chromadb
persist_directory = '/tmp/vector_db'
client = chromadb.PersistentClient(path=persist_directory)
collection2 = client.create_collection( \
name="save_embeddings", \
metadata={"hnsw:space": "cosine"} # l2 is the default \
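To make the `"hnsw:space": "cosine"` setting above more concrete: with this metric, neighbors are ranked by cosine distance, which the Chroma documentation describes as one minus the cosine similarity of the two vectors. The sketch below computes that quantity in plain Python. It is illustrative only and does not call chromadb; the helper name `cosine_distance` is an assumption.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


# Same direction gives a distance near 0, orthogonal vectors near 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))
```

Because the metric ignores vector length, embeddings that point in the same direction rank as nearest regardless of their magnitude, unlike the default `l2` space.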
@yuhanz
yuhanz / dummy-pytorch-training.py
Last active July 19, 2023 06:01
Dummy Example of Training with Apple M1 GPU
# pip3 install torch hanz
### Verify that your torch build supports the Apple M1 GPU:
### >>> torch.backends.mps.is_available()
### True
## Initialize the device
import torch
DEVICE = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
@yuhanz
yuhanz / gist:aaa2af46c02dcc50ffd916fb1363bc83
Last active January 31, 2023 20:10
gRPC client - Ruby gruf getting started
# gem install grpc grpc-tools
# grpc_tools_ruby_protoc --ruby_out=/tmp/hello --grpc_out=/tmp/grpc ./src/main/proto/helloworld.proto
# ... Then, include the generated files
require 'gruf'
Gruf.configure do |c|
  c.default_client_host = 'localhost:9001'
end
@yuhanz
yuhanz / check-max-memory-usage.py
Created December 5, 2022 02:34
python - check max memory usage
import resource
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
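One caveat when reading the number printed above: the unit of `ru_maxrss` depends on the platform. Linux reports kilobytes, while macOS reports bytes. The following sketch normalizes the value to MiB; the helper name `peak_rss_mib` is an assumption, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is in bytes on macOS and in kilobytes on Linux.
    bytes_per_unit = 1 if sys.platform == "darwin" else 1024
    return peak * bytes_per_unit / (1024 * 1024)


print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```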
@yuhanz
yuhanz / take-frames-of-video.sh
Created October 19, 2022 10:56
ffmpeg - take frames out of a video
ffmpeg -i ../puppy.mp4 -ss 00:02:08 -t 00:00:03 '%04d.png'
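In the command above, `-ss 00:02:08` seeks to 2 minutes 8 seconds, `-t 00:00:03` limits the output to 3 seconds, and `'%04d.png'` names each frame with a zero-padded, four-digit number. If the extraction needs to be scripted, the same invocation could be wrapped in Python roughly as follows. This is a sketch: the function names `build_frame_command` and `extract_frames` are hypothetical, and actually running the extraction assumes `ffmpeg` is on the `PATH`.

```python
import subprocess
from typing import List


def build_frame_command(video: str, start: str, duration: str,
                        pattern: str = "%04d.png") -> List[str]:
    """Build the ffmpeg argument list used to dump frames as images."""
    return ["ffmpeg", "-i", video, "-ss", start, "-t", duration, pattern]


def extract_frames(video: str, start: str, duration: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_frame_command(video, start, duration), check=True)


cmd = build_frame_command("../puppy.mp4", "00:02:08", "00:00:03")
print(" ".join(cmd))
# The output pattern is printf-style: frame number 1 is written as 0001.png.
print("%04d.png" % 1)
```

Passing the arguments as a list avoids shell quoting issues, so the pattern does not need the single quotes used on the command line.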