Skip to content

Instantly share code, notes, and snippets.

View pszemraj's full-sized avatar

Peter pszemraj

View GitHub Profile
@pszemraj
pszemraj / inference_openai.py
Last active July 8, 2025 21:32
basic openai chat completion example
"""
inference_openai.py - text generation with OpenAI API
See https://platform.openai.com/docs/quickstart for more details.
Usage:
python inference_openai.py --prompt "The quick brown fox jumps over the lazy dog." --model "gpt-3.5-turbo" --temperature 0.5 --max_tokens 256 --n 1 --stop "."
Detailed usage:
python inference_openai.py --help
@pszemraj
pszemraj / llm-foundry-config-reference.md
Last active July 3, 2025 21:02
config reference for mosaicml/llm-foundry by opus-4
@pszemraj
pszemraj / push_reddit_articshift.py
Last active June 30, 2025 03:29
util script for loading, basic processing, converting reddit posts -> hf dataset
"""
util script for loading, basic processing, converting reddit posts -> hf dataset
https://arctic-shift.photon-reddit.com/download-tool
"""
import pandas as pd
from datasets import Dataset, load_dataset
src = "./r_LocalLLaMA_posts.jsonl" # update with relevant path
df = pd.read_json(src, lines=True).convert_dtypes()
@pszemraj
pszemraj / test_gemma3n.py
Created June 29, 2025 21:29
test inference with gemma-3n-e2b-it
# -*- coding: utf-8 -*-
"""gemma-3n-test
pip install -U -q git+https://github.com/huggingface/transformers.git
pip install -U -q git+https://github.com/huggingface/pytorch-image-models.git
"""
from transformers import pipeline
import torch
@pszemraj
pszemraj / slice_image.py
Created June 28, 2025 19:53
Slice a tall image into chunks.
#!/usr/bin/env python3
"""
Slice a (possibly very tall) image into fixed-height chunks.
Creates a sibling directory called <image stem>_slices/
and writes slice_000.png, slice_001.png, … inside it.
"""
import argparse
from pathlib import Path
@pszemraj
pszemraj / push_dataset_from_text.py
Last active June 27, 2025 02:56
aggregate and push an hf dataset from text files
"""
Create & save an hf dataset with train/test/val splits from dir w/ text files
Ideal structure:
root / section_name_1 / file 1
root / section_name_1 / file 2
root / section_name_1 / file YYY
root / section_name_2 / file 1
root / section_name_2 / file ZZZ
@pszemraj
pszemraj / model_summary.py
Last active June 19, 2025 18:22
Prints an accurate summary of a pytorch model
from dataclasses import dataclass
from typing import List, Optional, Tuple
import torch
import torch.nn as nn
@dataclass
class _LayerSummary:
"""A dataclass to hold summary information for a single layer."""
@pszemraj
pszemraj / run_ocr_nanonets.py
Last active June 18, 2025 01:52
Standalone Asynchronous Nanonets-OCR-s Inference Script using vLLM and PyMuPDF.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Standalone Asynchronous Nanonets-OCR-s Inference Script using vLLM and PyMuPDF.
This script processes PDF files from an input directory using the
nanonets/Nanonets-OCR-s model served locally by vLLM via its OpenAI-compatible API.
It renders each page, sends API requests concurrently for OCR, extracts the
structured markdown/HTML text, and saves the combined text for each PDF into a
corresponding .txt file in the specified output directory.
@pszemraj
pszemraj / async_pipeline.py
Last active May 22, 2025 23:33
Standalone Asynchronous RolmOCR Inference Script using vLLM and PyMuPDF.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Standalone Asynchronous RolmOCR Inference Script using vLLM and PyMuPDF.
This script processes PDF files from an input directory using the
reducto/RolmOCR model served locally by vLLM via its OpenAI-compatible API.
It renders each page, sends API requests concurrently for OCR, extracts plain
text, and saves the combined text for each PDF into a corresponding .txt file
in the specified output directory.
@pszemraj
pszemraj / modeling_wavenetwork.py
Last active May 8, 2025 04:01
pytorch impl for pretraining-free (directly finetune) wavenet, tiny transformer for classification
"""
WaveNet: An Ultra-Small Language Model (PyTorch Implementation)
Based on the paper: https://arxiv.org/abs/2411.02674
Hugging Face Transformers compatible implementation.
"""
import math
from typing import Dict, Optional, Tuple, Union
import torch