Douglas Hanley iamlemec
@iamlemec
iamlemec / check_tokenizer.py
Created February 22, 2024 18:38
Compare tokenization results between `llama-cpp-python` and Hugging Face `tokenizers`.
def check_tokenizer(mod_ll, mod_hf, data, max_rows=None):
    from llama_cpp import Llama
    from transformers import AutoTokenizer
    from Levenshtein import editops
    from termcolor import cprint

    # load models (strings are treated as paths/names; loaded objects pass through)
    if isinstance(mod_ll, str):
        mod_ll = Llama(mod_ll, verbose=False)
    if isinstance(mod_hf, str):
        mod_hf = AutoTokenizer.from_pretrained(mod_hf)
    ...
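A comparison like this needs to locate where two token-id sequences diverge. The gist imports `Levenshtein.editops` for that step; the sketch below swaps in the standard library's `difflib.SequenceMatcher` so it runs without extra packages. The helper name `token_mismatches` and the token ids are hypothetical, and the sketch assumes both tokenizers have already encoded the same input string to lists of integers.

```python
from difflib import SequenceMatcher


def token_mismatches(toks_a, toks_b):
    """Return (tag, a_slice, b_slice) for every region where the sequences differ.

    Stand-in for Levenshtein.editops; tags are 'replace', 'delete', or 'insert'.
    """
    matcher = SequenceMatcher(a=toks_a, b=toks_b, autojunk=False)
    return [
        (tag, toks_a[i1:i2], toks_b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical token ids produced by two tokenizers for the same string.
ids_ll = [1, 15043, 3186, 29991]
ids_hf = [1, 15043, 29871, 3186, 29991]
for tag, a, b in token_mismatches(ids_ll, ids_hf):
    print(tag, a, b)  # insert [] [29871]
```

Unlike a Levenshtein edit script, `SequenceMatcher` does not guarantee a minimal set of edits, but it is usually enough to show where two tokenizers split the same text differently.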
iamlemec / cwdiff
Last active October 18, 2017 20:47
Diff between two PDFs. Crude, but useful for revisions. Requires wdiff and pdftotext. See diff.sh for usage.
#!/bin/sh
# Use this instead of diff[1] to get colored[2] word-based diffs.
# Useful for text documents that have reflowed paragraphs.
# Requires that wdiff is installed in your $PATH.
#
# [1] All diff options are ignored. Only replaces simplest usage.
# [2] Colors are always emitted. If piping into less, use "-R" or set LESS=-R
# Iain Murray, February 2009, Tweaked in June 2011
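The comments above describe a word-based diff rather than a line-based one, which is why reflowed paragraphs do not show up as changes. Here is a minimal Python sketch of the same idea, assuming the PDFs have already been converted to text (for example with `pdftotext old.pdf old.txt`). It splits the text on whitespace, diffs the word sequences with `difflib.SequenceMatcher`, and marks removals and insertions with wdiff's default `[-...-]` and `{+...+}` markers. The function name `word_diff` is hypothetical, and the script itself relies on `wdiff` rather than on anything like this.

```python
from difflib import SequenceMatcher

# ANSI escape codes used when color=True
RED, GREEN, RESET = "\033[31m", "\033[32m", "\033[0m"


def word_diff(old_text, new_text, color=True):
    """Word-level diff marking removals as [-...-] and insertions as {+...+}."""
    # Splitting on any whitespace makes reflowed lines compare as equal.
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if tag in ("replace", "delete"):
            removed = "[-" + " ".join(old_words[i1:i2]) + "-]"
            out.append(f"{RED}{removed}{RESET}" if color else removed)
        if tag in ("replace", "insert"):
            added = "{+" + " ".join(new_words[j1:j2]) + "+}"
            out.append(f"{GREEN}{added}{RESET}" if color else added)
    return " ".join(out)


print(word_diff("the quick brown fox", "the slow brown fox", color=False))
# the [-quick-] {+slow+} brown fox
```

Because `str.split()` discards line breaks, `word_diff("a b\nc", "a\nb c", color=False)` returns `"a b c"` with no markers. With color left on, view the output through `less -R`.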
iamlemec / vector.css
Created August 3, 2016 20:09
Wikipedia CSS (for the Vector skin) that gives pages a clean, modern look.
@import url(//fonts.googleapis.com/css?family=Open+Sans:400,700,400italic,700italic);
body {
  background-color: white;
  font-family: 'Open Sans', sans-serif;
}
#content {
  width: 700px;
  margin-top: 50px;
  /* ... */
}
iamlemec / nb2md
Created September 27, 2015 17:52
Markdown diffs for Jupyter notebooks. Requires the nbconvert package.
#!/usr/bin/env bash
# Step 1: put this file in your path and make executable
# Step 2: add the following to your .gitattributes file
# *.ipynb diff=nb2md
# Step 3: add the following to your .git/config
# [diff "nb2md"]
# textconv = nb2md
# or add it globally with
# git config --global diff.nb2md.textconv nb2md
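Git's `textconv` setting runs the configured program with a single argument, the path of a file holding the notebook contents, and diffs whatever the program prints to stdout. The gist hands that conversion to nbconvert. The sketch below is a rough, dependency-free illustration of what such a converter emits. It assumes the nbformat v4 layout, with a top-level `cells` list whose `source` is a string or a list of strings. It prints markdown cells verbatim and code cells as fenced blocks, and it drops outputs. The names `cell_source` and `ipynb_to_markdown` are hypothetical.

```python
import json
import sys

FENCE = "`" * 3  # built this way so the sketch can sit inside a Markdown fence


def cell_source(cell):
    """nbformat v4 stores a cell's source as a string or a list of line strings."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def ipynb_to_markdown(nb):
    """Render markdown/raw cells verbatim and code cells as fenced blocks.

    Cell outputs are ignored so that diffs show only source changes.
    """
    lang = nb.get("metadata", {}).get("language_info", {}).get("name", "")
    parts = []
    for cell in nb.get("cells", []):
        src = cell_source(cell).rstrip("\n")
        if cell.get("cell_type") == "code":
            parts.append(f"{FENCE}{lang}\n{src}\n{FENCE}")
        else:
            parts.append(src)
    return "\n\n".join(parts) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # git invokes a textconv program with one argument: a path to the file contents
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(ipynb_to_markdown(json.load(f)))
```

Saved as an executable `nb2md` on your `PATH` (with a `#!/usr/bin/env python3` line added), this would fit the same `.gitattributes` and `diff.nb2md.textconv` setup from the steps above. Dropping outputs keeps diffs focused on source edits, at the cost of hiding changed results.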