@dcbark01
dcbark01 / fix_tokenizer.py
Last active June 17, 2024 14:03
Fix tokenizer
"""
# Source: https://gist.github.com/jneuff/682d47b786329f19291d166957b3274a
/// Fix a huggingface tokenizer to which tokens have been added after training.
///
/// Adding tokens after training via `add_special_tokens` leads to them being added to the
/// `added_tokens` section but not to the `model.vocab` section. This yields warnings like:
/// ```
/// [2023-10-17T07:54:05Z WARN tokenizers::tokenizer::serialization] Warning: Token '<|empty_usable_token_space_1023|>' was expected to have ID '129023' but was given ID 'None'
/// ```
@dcbark01
dcbark01 / tgi_api.sh
Last active May 9, 2024 12:13
Huggingface Text Generation Inference SLURM
#!/bin/bash
#SBATCH --job-name=llm-swarm
#SBATCH --partition hopper-prod
#SBATCH --gpus={{gpus}}
#SBATCH --cpus-per-task=12
#SBATCH --mem-per-cpu=11G
#SBATCH -o slurm/logs/%x_%j.out
# See original source here:
# https://github.com/huggingface/llm-swarm/blob/main/templates/tgi_h100.template.slurm
@dcbark01
dcbark01 / file_size.sh
Created March 28, 2024 18:08
File/dir size
# For directory
du -h ./ | sort -hr | head -n 10
# For the largest files within a given directory and its subdirectories
find ./ -type f -exec du -h {} + | sort -hr | head -n 10
@dcbark01
dcbark01 / langchain_save_jsonl.py
Created March 20, 2024 13:19
Save Langchain Documents to JSONL
import typing as t
import jsonlines
from langchain.schema import Document
def save_docs_to_jsonl(documents: t.Iterable[Document], file_path: str) -> None:
    with jsonlines.open(file_path, mode="w") as writer:
        for doc in documents:
            writer.write(doc.dict())
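A minimal sketch of the reverse direction, reading the saved records back. It is not part of the gist: the name `iter_jsonl_records` is hypothetical, and it uses the standard-library `json` module rather than `jsonlines` so that it runs without extra dependencies. It yields plain dicts; rebuilding `Document` objects is left out, since the exact constructor arguments depend on the LangChain version.

```python
import json
from typing import Any, Dict, Iterator


def iter_jsonl_records(file_path: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per non-empty line of a JSONL file (hypothetical helper)."""
    with open(file_path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline at end of file
                yield json.loads(line)
```

Reading lazily, one line at a time, keeps memory use flat for large files, which is the usual reason for choosing JSONL over a single JSON array.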
@dcbark01
dcbark01 / app.py
Last active February 13, 2024 14:54
Embed Mistral FastAPI
"""
# See https://huggingface.co/intfloat/e5-mistral-7b-instruct for model inference code
## Quickstart
Install requirements
```bash
pip install fastapi uvicorn torch transformers
```
@dcbark01
dcbark01 / auto_huggingface_textgen.py
Last active October 28, 2023 01:41
AutoHuggingFaceTextGenInference for Langchain PR
# TODO: Add this to PR for Langchain so that it will be easy to use across all our different LLM projects
import re
import time
import warnings
from pathlib import Path
from typing import List, Union, Optional
import requests
from tqdm import tqdm
from pydantic import BaseModel, Field, field_validator, computed_field
@dcbark01
dcbark01 / listdirs_new.py
Created October 29, 2022 16:17
New listdirs
import os
from typing import List, Union
def listdirs(path, extensions: Union[List[str], str] = None):
    """ List all files in directory (including walking all subdirectories).
    Can filter by file extension by providing either, for example:
        extensions='png'
        extensions=['png', 'jpeg']
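The preview above is cut off before the function body. As a rough sketch of the behavior the docstring describes, and not the gist's actual implementation, a recursive listing with an optional extension filter could be written with `os.walk`. The name `list_files_sketch`, the case-insensitive matching, and the sorted return value are all assumptions.

```python
import os
from typing import List, Optional, Union


def list_files_sketch(
    path: str, extensions: Optional[Union[List[str], str]] = None
) -> List[str]:
    """Return all file paths under `path`, optionally filtered by extension."""
    if isinstance(extensions, str):
        extensions = [extensions]  # accept a single extension such as 'png'

    suffixes = None
    if extensions is not None:
        # Normalize 'png' and '.png' to the same suffix, ignoring case.
        suffixes = tuple("." + ext.lower().lstrip(".") for ext in extensions)

    matches = []
    for root, _dirs, files in os.walk(path):  # walks every subdirectory
        for name in files:
            if suffixes is None or name.lower().endswith(suffixes):
                matches.append(os.path.join(root, name))
    return sorted(matches)
```

Passing `extensions='png'` or `extensions=['png', 'jpeg']` mirrors the two call styles named in the docstring.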
@dcbark01
dcbark01 / fractional_knapsack.ipynb
Created August 15, 2022 16:44
Fractional Knapsack
@dcbark01
dcbark01 / pisano_fibonacci_numbers.py
Created August 9, 2022 15:43
Pisano Period for Fibonacci Numbers
# Uses python3
import sys
from typing import List
def calc_fib(n):
    if n == 1 or n == 2:
        return 1
    elif n == 0:
        return 0
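The preview stops before the Pisano-period part of the file. As a hedged sketch of the technique the title refers to, and not the gist's own code: the Fibonacci sequence taken modulo `m` is periodic, the period starts with the pair `0, 1`, and its length is at most `6 * m`. Reducing `n` modulo that period makes `F(n) mod m` cheap even for very large `n`. The function names `pisano_period` and `fib_mod` are assumptions.

```python
def pisano_period(m: int) -> int:
    """Length of the period of the Fibonacci sequence modulo m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    if m == 1:
        return 1  # every Fibonacci number is 0 modulo 1
    prev, curr = 0, 1
    for length in range(1, 6 * m + 1):  # the period never exceeds 6 * m
        prev, curr = curr, (prev + curr) % m
        if prev == 0 and curr == 1:  # the sequence has restarted
            return length
    raise RuntimeError("no period found within the 6 * m bound")


def fib_mod(n: int, m: int) -> int:
    """Compute F(n) mod m by first reducing n modulo the Pisano period."""
    n %= pisano_period(m)
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, (prev + curr) % m
    return prev % m


print(pisano_period(10))  # 60
print(fib_mod(2015, 3))   # 1, since 2015 % 8 == 7 and F(7) = 13
```

The iterative loop avoids the exponential cost of a naive recursive `calc_fib`, and keeping every intermediate value reduced modulo `m` keeps the integers small.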

A simple Docker and Docker Compose install script for Ubuntu

Usage

  1. sh install-docker.sh
  2. log out
  3. log back in

Links