Peter pszemraj
@pszemraj
pszemraj / embed.py
Last active February 2, 2024 16:18
setting up nomic-embed-text-v1 in sbert and ONNX
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util, models
model_name = "nomic-ai/nomic-embed-text-v1"
pooling_mode = "mean"
word_embedding_model = models.Transformer(
    model_name,
    max_seq_length=8192,
    model_args={"trust_remote_code": True, "rotary_scaling_factor": 2},
    tokenizer_args={"trust_remote_code": True},
@pszemraj
pszemraj / distract.py
Created February 1, 2024 20:56
download and run this periodically during setup so Colab doesn't whine about you not using the GPU
# pip install sentence-transformers -q
# source: https://www.sbert.net/docs/usage/semantic_textual_similarity.html
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer("all-MiniLM-L6-v2")
# Two lists of sentences
sentences1 = [
    "The cat sits outside",
@pszemraj
pszemraj / load_and_ensure_tokens.py
Last active January 17, 2024 02:36
loads a Hugging Face Transformers tokenizer, checks for essential special tokens, adds them if necessary
from transformers import AutoTokenizer
def load_and_ensure_tokens(model_name):
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Essential special tokens with their default values
    essential_tokens = {
        "pad_token": "<pad>",
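The preview above is cut off after the first dictionary entry. A minimal sketch of the check it describes could look like the following. Only `"pad_token": "<pad>"` comes from the preview; the other default values and the helper name `ensure_special_tokens` are assumptions for illustration. The helper relies only on attribute access and `add_special_tokens`, both of which Hugging Face tokenizers provide.

```python
# Hypothetical defaults: only "pad_token": "<pad>" appears in the original preview.
ESSENTIAL_TOKENS = {
    "pad_token": "<pad>",
    "eos_token": "</s>",   # assumed default
    "bos_token": "<s>",    # assumed default
    "unk_token": "<unk>",  # assumed default
}


def ensure_special_tokens(tokenizer, essential_tokens=None):
    """Add any essential special token the tokenizer lacks; return what was added."""
    essential_tokens = essential_tokens or ESSENTIAL_TOKENS
    # A token counts as missing when the tokenizer attribute is unset (None).
    missing = {
        name: value
        for name, value in essential_tokens.items()
        if getattr(tokenizer, name, None) is None
    }
    if missing:
        tokenizer.add_special_tokens(missing)
    return missing
```

If tokens are added for a model that is going to be trained or run, the embedding matrix usually has to grow to match, for example with `model.resize_token_embeddings(len(tokenizer))`.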
@pszemraj
pszemraj / hf_repofolder_watchdog.py
Created January 16, 2024 02:53
upload a folder to Hugging Face Hub and other utils
import argparse
import logging
import time
from datetime import datetime
from pathlib import Path
from typing import Optional
from huggingface_hub import upload_folder
from watchdog.events import PatternMatchingEventHandler
from watchdog.observers import Observer
@pszemraj
pszemraj / textgen_inference_code.py
Created January 6, 2024 23:38
example inference script for beecoder-220M-python
import logging
import random
import time
from pathlib import Path
import fire
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
logging.basicConfig(format="%(levelname)s - %(message)s", level=logging.INFO)
@pszemraj
pszemraj / hf_repofolder_watchdog.py
Created December 12, 2023 01:43
The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.
"""
The script is designed to monitor a specified directory for any file system changes (like additions, deletions, or modifications of files and subdirectories) and automatically upload the changes to a specified repository on the Hugging Face Hub.
pip install huggingface-hub watchdog
"""
import argparse
import logging
import time
from pathlib import Path
@pszemraj
pszemraj / format2alpaca.py
Created December 8, 2023 23:18
quick formatting function given instruction/input/response cols -> make 'text' col
import os
import random
from datasets import load_dataset
def format_dataset(example):
    """Formats the dataset example into a single 'text' field."""
    # Add input only if it is longer than 2 characters
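The rest of the function is not shown in the preview. As a rough sketch of the idea, assuming columns named `instruction`, `input`, and `response` (taken from the gist description) and an Alpaca-like section layout that is an assumption here rather than the author's exact template:

```python
def format_example(example):
    """Build a single 'text' field from instruction/input/response columns."""
    parts = [f"### Instruction:\n{example['instruction']}"]
    user_input = example.get("input") or ""
    # Add the input section only if it is longer than 2 characters
    if len(user_input) > 2:
        parts.append(f"### Input:\n{user_input}")
    parts.append(f"### Response:\n{example['response']}")
    return {"text": "\n\n".join(parts)}
```

With the `datasets` library, a function shaped like this can be applied row by row via `dataset.map(format_example)`, which adds the returned `text` column to each example.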
@pszemraj
pszemraj / tf32_activate.py
Created December 6, 2023 04:47
sort of manual - Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does.
import logging
import subprocess
import torch
def check_ampere_gpu():
    """Check if the GPU supports NVIDIA Ampere or later and enable TF32 in PyTorch if it does."""
    # Check if CUDA is available
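The preview stops at the CUDA availability check. The original imports `subprocess`, so it may query the GPU differently; the sketch below instead uses `torch.cuda.get_device_capability()`, treating compute capability 8.0 or higher as Ampere or later. The function names `is_ampere_or_newer` and `enable_tf32_if_supported` are hypothetical.

```python
def is_ampere_or_newer(capability):
    """Return True for CUDA compute capability 8.0+ (Ampere and later)."""
    major, _minor = capability
    return major >= 8


def enable_tf32_if_supported():
    """Enable TF32 for matmul and cuDNN when the current GPU supports it."""
    import torch  # imported lazily so the helper above works without PyTorch

    # Check if CUDA is available
    if not torch.cuda.is_available():
        return False
    if not is_ampere_or_newer(torch.cuda.get_device_capability()):
        return False
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
    return True
```

Calling `enable_tf32_if_supported()` once at startup returns `True` only when the flags were actually set, which makes it easy to log whether TF32 is in effect.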
@pszemraj
pszemraj / test_synthsumm.py
Created December 6, 2023 03:16
test out synthsumm summarization models via the free inference api
import os
import time
import requests
class Timer:
    """Basic timer utility."""

    def __enter__(self):
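The body of the timer is not visible in the preview. A minimal sketch of a context-manager timer of this kind is shown below; the attribute names `start` and `elapsed` are assumptions, not taken from the gist.

```python
import time


class Timer:
    """Basic timer utility usable as a context manager."""

    def __enter__(self):
        # perf_counter is monotonic and suited to measuring short intervals
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.elapsed = time.perf_counter() - self.start
        return False  # do not suppress exceptions raised inside the block
```

Wrapping a request in `with Timer() as t:` and reading `t.elapsed` afterwards gives the wall-clock duration in seconds, which is handy when comparing response times from an inference API.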
@pszemraj
pszemraj / ubuntu_util_pkgs.md
Created November 29, 2023 22:53
some ubuntu packages helpful for CPU things related to ML

Useful misc installs

Kernel and Low-Level Tools

  1. Microcode Update: Keeping your CPU microcode up to date can improve performance and security. On AMD systems, install the microcode package with:

sudo apt install amd64-microcode