@isgallagher
isgallagher / WUT.md
Created May 31, 2025 15:15 — forked from the-crypt-keeper/WUT.md
LiteLLM adapters for generating images with local Stable Diffusion APIs

This is a raw dump of LiteLLM image generation adapters for kobold (which is automatic1111-compatible), sd-server, and llamabox.

Only the /v1/images/generations endpoint is supported by the adapters.

To use:

  1. Register the custom handlers in custom_provider_map
  2. Define endpoints for kobold/automatic1111, sd-server and llamabox as needed
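The two steps above can be sketched roughly as follows. This is a minimal illustration, not the adapters from this gist: `LocalImageHandler`, the provider keys, the model names, and the `api_base` URLs are all hypothetical placeholders. The sketch avoids importing LiteLLM so it runs on its own; in a real setup the handler would presumably subclass `litellm.CustomLLM` and the list would be assigned to `litellm.custom_provider_map`.

```python
# Sketch only: builds the data structures for the two setup steps.
# All names below (LocalImageHandler, provider keys, ports) are assumptions.


class LocalImageHandler:
    """Stand-in for an image generation adapter.

    In a real setup this would likely subclass litellm.CustomLLM and
    implement its image generation hooks; here it only carries a label.
    """

    def __init__(self, backend: str) -> None:
        self.backend = backend


def build_provider_map(handlers: dict) -> list:
    """Step 1: one {"provider", "custom_handler"} entry per backend."""
    return [
        {"provider": name, "custom_handler": handler}
        for name, handler in handlers.items()
    ]


def build_model_list(endpoints: dict) -> list:
    """Step 2: one model entry per backend, in model_list style.

    The "<provider>/<model>" naming and the api_base values are
    placeholders; adjust them to wherever your servers listen.
    """
    return [
        {
            "model_name": f"{provider}-image",
            "litellm_params": {
                "model": f"{provider}/default",
                "api_base": api_base,
            },
        }
        for provider, api_base in endpoints.items()
    ]


handlers = {
    "kobold": LocalImageHandler("kobold"),
    "sd-server": LocalImageHandler("sd-server"),
    "llamabox": LocalImageHandler("llamabox"),
}
provider_map = build_provider_map(handlers)
# With LiteLLM installed: litellm.custom_provider_map = provider_map

model_list = build_model_list(
    {
        "kobold": "http://localhost:5001",     # hypothetical port
        "sd-server": "http://localhost:8080",  # hypothetical port
        "llamabox": "http://localhost:8081",   # hypothetical port
    }
)
```

Keeping the provider key identical in both structures is what lets a model entry such as `kobold/default` route to the handler registered under `kobold`.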

Things I hate about this:

isgallagher / backup.py
Created May 3, 2025 04:00 — forked from RupertAvery/backup.py
Script to download Checkpoint (model/LORA) metadata from CivitAI based on the models in your local machine
import os
import hashlib
import requests
import json
import argparse
# Config
BASE_URL = "https://civitai.com/api/v1"
EXTENSION = ".safetensors"
isgallagher / DeepSeek-R1-Quantized-GGUF-Gaming-Rig-Inferencing-Fast-NVMe-SSD.md
Aggregate throughput just over 2 tok/sec on R1 671B with 8 concurrent generations.

tl;dr;

You can run the real deal big boi R1 671B locally off a fast NVMe SSD even without enough RAM+VRAM to hold the 212GB dynamically quantized weights. No, it is not swap, and it won't kill your SSD's read/write cycle lifetime. No, this is not a distill model. It works fairly well despite quantization (check the unsloth blog for details on how they did that).

The basic idea is that most of the model itself is not loaded into RAM on startup, but mmap'd. The KV cache then takes up some RAM. Most of your system RAM is left available to serve as disk cache for whatever experts/weights are currently most used. I can see the model slow down while the cache dumps and refills when, for example, the model switches over to counting words.
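To illustrate the mmap idea in isolation, here is a minimal sketch using Python's standard `mmap` module. It is not how the inference engine loads weights; the helper names `make_demo_file` and `read_slice_via_mmap` and the small demo file are assumptions made up for this example. The point is only that mapping a file does not read it up front: the OS pages in the regions you touch and keeps them in the page cache while memory allows.

```python
import mmap
import os
import tempfile

MIB = 1024 * 1024


def make_demo_file(size_mib: int = 8) -> str:
    """Write a throwaway file where every byte of MiB number i equals i."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    with os.fdopen(fd, "wb") as f:
        for i in range(size_mib):
            f.write(bytes([i % 256]) * MIB)
    return path


def read_slice_via_mmap(path: str, offset: int, length: int) -> bytes:
    """Map the whole file read-only and copy out a single slice.

    Only the pages backing the requested slice have to come from disk;
    the rest of the file stays untouched until something reads it.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[offset:offset + length]


if __name__ == "__main__":
    demo_path = make_demo_file()
    try:
        # Touch 16 bytes in the middle of the file, nothing else.
        print(read_slice_via_mmap(demo_path, 5 * MIB, 16))
    finally:
        os.remove(demo_path)
```

Because the mapping is read-only, pages that fall out of the cache are simply dropped and re-read later rather than written back, which is why this access pattern differs from swap.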

It is faster on my system using the GPU, but not by much. In theory, it may be faster overall to dedicate the GPU's PCIe lanes to more NVMe storage. Curious if anyone has a read IOPS array fast enough to try?

Notes and example generations below.

Model Reference

isgallagher / hls.sh
Created December 18, 2024 22:27 — forked from stenuto/hls.sh
HLS ffmpeg script
#!/bin/bash
# Function to display usage information
usage() {
  echo "Usage: $0 /path/to/input.mp4 [ /path/to/output_directory ]"
  exit 1
}
# Check if at least one argument (input file) is provided
if [ $# -lt 1 ]; then
isgallagher / rpi4_zfs.md
Created July 26, 2024 00:05 — forked from skystar-p/rpi4_zfs.md
Install Arch Linux aarch64 + dm-crypt + ZFS root on Raspberry Pi 4

Installing Arch Linux ARM (aarch64) with ZFS root on RPi4

Prerequisite

Boot up your aarch64 Arch Linux. Plug in the SD card adapter and ensure that the disk shows up on your system.

lsblk