yasark

@danielgross
danielgross / mathpix2gpt.py
Last active March 18, 2025 02:18
mathpix2gpt.py
import requests
import time
import os
import sys
import openai
import tiktoken
from termcolor import colored
openai.api_key = open(os.path.expanduser('~/.openai')).read().strip()
@jiahao87
jiahao87 / pegasus_fine_tune.py
Last active May 29, 2024 18:00
PyTorch script for fine-tuning the Pegasus Large model
"""Script for fine-tuning Pegasus
Example usage:
# use XSum dataset as example, with first 1000 docs as training data
from datasets import load_dataset
dataset = load_dataset("xsum")
train_texts, train_labels = dataset['train']['document'][:1000], dataset['train']['summary'][:1000]
# use Pegasus Large model as base for fine-tuning
model_name = 'google/pegasus-large'
train_dataset, _, _, tokenizer = prepare_data(model_name, train_texts, train_labels)
@sujitpal
sujitpal / 07b-viterbi-gist.py
Created August 6, 2020 20:25
Entity disambiguation for entities identified in a sentence by the SciSpacy + UMLS integration, using the Viterbi algorithm
import argparse
import itertools
import numpy as np
import operator
import os
import pickle
import spacy
import scispacy
import time
import pandas as pd
import re
import spacy
import neuralcoref
nlp = spacy.load('en_core_web_lg')
neuralcoref.add_to_pipe(nlp)
def get_entity_pairs(text, coref=True):
@karpathy
karpathy / min-char-rnn.py
Last active October 13, 2025 15:32
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)
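The preview stops after the vocabulary is built. As a minimal sketch of what a character-level model typically needs next, the block below maps characters to integer indices and builds one-hot vectors. The names `char_to_ix`, `ix_to_char`, and `one_hot` are illustrative assumptions, an in-memory string stands in for `input.txt`, and `sorted(set(data))` replaces `list(set(data))` only so the mapping is reproducible between runs.

```python
# Hypothetical in-memory text standing in for input.txt
data = "hello world"
chars = sorted(set(data))  # sorted so the index assignment is deterministic
data_size, vocab_size = len(data), len(chars)

# Assumed helper names: character -> index and index -> character lookups
char_to_ix = {ch: i for i, ch in enumerate(chars)}
ix_to_char = {i: ch for i, ch in enumerate(chars)}


def one_hot(ch):
    """Return a list of length vocab_size with 1.0 at the character's index."""
    idx = char_to_ix[ch]
    return [1.0 if i == idx else 0.0 for i in range(vocab_size)]


# Round trip: text -> indices -> text
encoded = [char_to_ix[ch] for ch in data]
decoded = "".join(ix_to_char[i] for i in encoded)

print(f"data has {data_size} characters, {vocab_size} unique.")
print(encoded)
```

Plain lists are used here to keep the sketch dependency-free; with numpy, the one-hot vector would more naturally be a `(vocab_size, 1)` array.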
@jboner
jboner / latency.txt
Last active October 14, 2025 17:13
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                        0.5 ns
Branch mispredict                           5 ns
L2 cache reference                          7 ns              14x L1 cache
Mutex lock/unlock                          25 ns
Main memory reference                     100 ns              20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy            3,000 ns      3 us
Send 1K bytes over 1 Gbps network      10,000 ns     10 us
Read 4K randomly from SSD*            150,000 ns    150 us    ~1GB/sec SSD
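
The unit conversions and the "x L1 cache" comparisons in the table are simple arithmetic. The sketch below reproduces them from the nanosecond column; the dictionary and the helper names `ns_to_us` and `ratio` are assumptions introduced for illustration, and the figures are copied from the rows above rather than measured.

```python
# Latency figures in nanoseconds, copied from the rows shown above
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def ns_to_us(ns):
    """Convert nanoseconds to microseconds."""
    return ns / 1_000


def ratio(slower, faster):
    """How many times longer the `slower` operation takes than `faster`."""
    return LATENCY_NS[slower] / LATENCY_NS[faster]


for name, ns in LATENCY_NS.items():
    print(f"{name:<36}{ns:>10,} ns {ns_to_us(ns):>10.4f} us")

print(ratio("L2 cache reference", "L1 cache reference"))
print(ratio("Main memory reference", "L1 cache reference"))
```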
@laszlomiklosik
laszlomiklosik / Using JSTATD deamon with VisualVM for remote profiling
Created May 4, 2012 12:02
Using the jstatd daemon with VisualVM for remote profiling
1. create tools.policy file:
grant {
permission java.security.AllPermission;
};
2. run to start jstatd:
jstatd -J-Djava.security.policy=tools.policy
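
The two steps above can be scripted. The following is a minimal sketch that writes the policy file and assembles the `jstatd` command line without launching it. The helper names `write_policy` and `jstatd_command` and the use of a temporary directory are assumptions for illustration. Note that `java.security.AllPermission` grants unrestricted access, so a policy like this is best limited to trusted hosts.

```python
import tempfile
from pathlib import Path

# Policy text from step 1
POLICY_TEXT = "grant {\n    permission java.security.AllPermission;\n};\n"


def write_policy(directory):
    """Write tools.policy into `directory` and return its path."""
    policy_path = Path(directory) / "tools.policy"
    policy_path.write_text(POLICY_TEXT)
    return policy_path


def jstatd_command(policy_path):
    """Build the argument list from step 2, pointing at the policy file."""
    return ["jstatd", f"-J-Djava.security.policy={policy_path}"]


workdir = tempfile.mkdtemp()  # hypothetical working directory
policy = write_policy(workdir)
command = jstatd_command(policy)

# Only printed here; the command is not executed
print(" ".join(command))
```

To actually start the daemon, the list could be passed to `subprocess.run(command)` on a machine where `jstatd` from a JDK is on the `PATH`.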