Harry Horsperg FlyingFathead

@FlyingFathead
FlyingFathead / tarball_verify.sh
Last active July 2, 2024 10:00
Verify tarballs locally against an MD5 checksum (if available) and check tarball integrity with `tar`
#!/bin/bash
# Function to print usage
usage() {
    echo "Usage: $0 [-c <checksum file>] <tarball>"
    echo " -c <checksum file> : Optional, specify the checksum file to verify the tarball against"
    exit 1
}
# Check if no arguments were provided
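The preview above is cut off before the verification logic. As a rough illustration of the same two checks (MD5 comparison, then archive readability), here is a minimal Python sketch. It is not the gist's bash implementation; the function names `md5_of_file` and `tarball_is_readable` are hypothetical, and the readability check only walks the archive's member headers.

```python
import hashlib
import tarfile
import zlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tarball_is_readable(path):
    """Walk every member header; return False if the archive cannot be read."""
    try:
        with tarfile.open(path, "r:*") as archive:
            for _member in archive:
                pass
        return True
    except (tarfile.TarError, OSError, EOFError, zlib.error):
        return False
```

A caller would compare `md5_of_file(path)` against the digest from the checksum file (when one is given) and then call `tarball_is_readable(path)`.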
@FlyingFathead
FlyingFathead / word_letter_counter_for_llm_datasets.py
Last active June 21, 2024 00:15
Word letter counting dataset generator for LLM training
#!/usr/bin/env python3
# word_letter_counter_for_llm_datasets.py
"""
A training set generator to assist an LLM in actually being able to count
the letters in a given word. A model that can't count letters in a word isn't
usable for critical tasks; incorrect letter counts lead to compounding mistakes.
Outputs in JSON, can also use the NLTK corpus for a word dictionary, offering
a quick way to create a massive letter counting dataset for different words.
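The docstring preview above ends before any code is shown. To make the idea concrete, here is a minimal sketch of generating letter-count records as JSON. The record layout (`word`, `total_letters`, `letter_counts`) and the function names are assumptions for illustration, not the gist's actual schema, and the NLTK word source is left out.

```python
import json
from collections import Counter


def letter_count_record(word):
    """Build one record: the word, its letter total, and per-letter counts."""
    counts = Counter(ch for ch in word.lower() if ch.isalpha())
    return {
        "word": word,
        "total_letters": sum(counts.values()),
        "letter_counts": dict(sorted(counts.items())),
    }


def build_dataset(words):
    """Turn an iterable of words into a list of JSON-serializable records."""
    return [letter_count_record(word) for word in words]


if __name__ == "__main__":
    print(json.dumps(build_dataset(["strawberry", "banana"]), indent=2))
```

Any word list can be passed to `build_dataset`, so a larger dictionary could be swapped in without changing the record code.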
@FlyingFathead
FlyingFathead / list_usb_cameras.sh
Last active June 3, 2024 09:07
usb camera identifier and lister for debian/ubuntu linux
#!/bin/bash
# This script is designed to identify and list all USB cameras and audio devices connected to a system.
# It provides detailed information about each device, including vendor ID, product ID, serial number,
# device description, and supported video modes for cameras. It also identifies audio devices and
# lists their card and device numbers, names, and descriptions.
# Required Packages:
# - udevadm: Provides information about device attributes.
# - v4l2-ctl: A utility to control video4linux devices and list video formats.
@FlyingFathead
FlyingFathead / async_tts.py
Last active February 2, 2024 22:11
async chunk queue for Python's `tts`
# async_tts.py, for `TTS` pypi package (also requires `pydub`)
# $ pip install -U tts pydub
#
# v0.02
# changes:
# - now uses pydub, normalizes audio
#
# performs a staggered execution / playback of the audio,
# where the next chunk is being processed while the previous is still playing
#
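The staggered execution described in the comments above can be sketched with a bounded queue and one worker thread. This is a generic producer/consumer outline, not the gist's code: `synthesize` and `play` are hypothetical callables standing in for the TTS and playback steps, and `max_ahead` is an assumed parameter.

```python
import queue
import threading


def run_staggered(chunks, synthesize, play, max_ahead=2):
    """Synthesize upcoming chunks in a worker thread while earlier ones play."""
    ready = queue.Queue(maxsize=max_ahead)
    sentinel = object()

    def producer():
        try:
            for chunk in chunks:
                ready.put(synthesize(chunk))  # blocks while the queue is full
        finally:
            ready.put(sentinel)  # always signal the end, even after an error

    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    while True:
        audio = ready.get()
        if audio is sentinel:
            break
        play(audio)  # the worker keeps synthesizing during playback
    worker.join()
```

The bounded queue keeps the worker from running far ahead of playback, which limits memory use for long texts.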
@FlyingFathead
FlyingFathead / scanlan.sh
Last active February 2, 2024 18:47
scanlan / a quick LAN scanning script for diagnostics
#!/bin/bash
#
# >>> SCANLAN <<<
#
# this script can be used for e.g. checking out what's on your LAN
# requires `nmap`, `xmlstarlet` and `lolcat` (just because)
# adjust to your own ip range as needed.
# NO WARRANTIES, use only for your own LAN diagnostics and at your own risk
#
#
@FlyingFathead
FlyingFathead / splitter.py
Last active December 24, 2023 14:21
split an input text into x-character segments at empty lines; for ML/LLM purposes
# v0.03 // added preformat sanitizer for hyphenation and other text formatting
import shutil
import sys
import os
import re
# print term width horizontal line
def hz_line(character='-'):
    terminal_width = shutil.get_terminal_size().columns
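The preview stops before the splitting logic. One way to split text into segments of at most x characters, breaking only at empty lines, is sketched below. The function name `split_on_blank_lines` and the choice to keep an oversized paragraph whole are assumptions, not taken from the gist.

```python
import re


def split_on_blank_lines(text, max_chars):
    """Group paragraphs into segments of at most max_chars characters.

    Segments break only at empty lines; a single paragraph longer than
    max_chars is kept whole rather than cut mid-paragraph.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    segments, current = [], ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            segments.append(current)
            current = paragraph
    if current:
        segments.append(current)
    return segments
```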
@FlyingFathead
FlyingFathead / tokencounter.py
Created December 24, 2023 12:58
estimate the token count of a text file (e.g. for the OpenAI API and other LLM use)
# requires the `transformers` package; please install with `pip install -U transformers`
import sys
from transformers import GPT2Tokenizer
def count_tokens(file_path):
    print(f"Estimating the token count for: {file_path} ...", flush=True)
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    try:
        with open(file_path, 'r', encoding='utf-8') as file:
@FlyingFathead
FlyingFathead / pdfmine.py
Last active December 24, 2023 12:55
quickly grab a pdf's text contents with `pdfminer.six`
# requires `pdfminer.six`; install with `pip install -U pdfminer.six`
import sys
import os
from pdfminer.high_level import extract_text
def extract_pdf_text(pdf_file, output_file):
    if os.path.exists(output_file):
        print(f"File {output_file} already exists. Aborting to prevent overwriting.")
        return False
@FlyingFathead
FlyingFathead / talker.py
Created December 22, 2023 16:50
`talker.py` -- a quick `tkinter`-based GUI text-to-speech utility for pasted text, using `pyttsx3`
# requires `pyttsx3` -- install with:
# pip install -U pyttsx3
import tkinter as tk
from tkinter.scrolledtext import ScrolledText
import pyttsx3
import threading
class SpeechManager:
    def __init__(self):
@FlyingFathead
FlyingFathead / cuda-fibonacci.py
Last active December 13, 2023 19:56
CUDA-optimized code for generating a PyTorch Dataset of Fibonacci primes
# file under: laziness check, December 2023...
import torch
from torch.utils.data import Dataset
from numba import cuda
import numpy as np
# CPU function to generate Fibonacci numbers within uint64 range
def generate_fibonacci_numbers(max_length):
    fib_numbers = np.zeros(max_length, dtype=np.uint64)
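The preview ends before the prime filtering and the CUDA and Dataset parts. As a plain-Python reference for the underlying computation only, the sketch below lists the Fibonacci numbers that fit in an unsigned 64-bit integer and keeps the primes. It swaps in a Miller-Rabin test with a fixed base set (deterministic for 64-bit inputs) instead of whatever the gist uses on the GPU; all function names are hypothetical.

```python
UINT64_MAX = 2**64 - 1
# These twelve bases make Miller-Rabin deterministic for all 64-bit integers.
_BASES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def fibonacci_within_uint64():
    """Return every Fibonacci number that fits in an unsigned 64-bit integer."""
    values, a, b = [], 0, 1
    while a <= UINT64_MAX:
        values.append(a)
        a, b = b, a + b
    return values


def is_prime(n):
    """Miller-Rabin primality test with a fixed set of bases."""
    if n < 2:
        return False
    for p in _BASES:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for a in _BASES:
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True


def fibonacci_primes_within_uint64():
    """Filter the uint64-range Fibonacci numbers down to the primes."""
    return [f for f in fibonacci_within_uint64() if is_prime(f)]
```

A CPU reference like this is handy for checking the output of a GPU kernel on the same range.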