Elijah Rippeth erip

  • Fairfax, VA
@erip
erip / en-de.tsv
Created July 31, 2023 18:25
Sample data
Michael Jackson wore tape on his nose to get front pages, former bodyguard claims	Ehemaliger Bodyguard berichtet: Michael Jackson trug Klebeband auf seiner Nase, um auf die Titelseiten zu kommen
Michael Jackson's former bodyguard has claimed the late singer cultivated some of his eccentricities with the deliberate intention of riling up the media.	Der ehemalige Bodyguard von Michael Jackson behauptet, dass der verstorbene Sänger einige seiner exzentrischen Verhaltensweise extra aneignete, um die Medien anzustacheln.
Matt Fiddes, now a property developer and owner of a martial arts/dance chain, told Metro that Jackson believed the fascination around his persona would stop if he ceased to be a "mystery" in the public eye.	Matt Fiddes, jetzt ein Bauträger und Inhaber einer Kampfsport/Tanzverein-Kette, erzählte Metro, dass Jackson daran glaubte, dass die Faszination mit seinem Image schwinden würde, wenn er im öffentlichen Leben nicht mehr ein „Mysterium“ sein würde.
To get front pages, he would reportedly don su
erip / forced_decoding.py
Last active July 31, 2023 15:06
Scoring translations with HF
#!/usr/bin/env python3
import itertools
from argparse import ArgumentParser, FileType
import torch
import numpy as np
from tqdm import tqdm
from transformers import PrefixConstrainedLogitsProcessor, AutoTokenizer, AutoModelForSeq2SeqLM
erip / check_pronouns.py
Created July 10, 2023 15:25
Checks agreement between pronouns of a reference and MT system output.
#!/usr/bin/env python3
import spacy
from statistics import mean
from argparse import ArgumentParser, FileType
def setup_argparse():
    parser = ArgumentParser()
    parser.add_argument("-m", "--model", type=str, default="en_core_web_lg", help="The spaCy model to use for evaluation")
erip / ring.py
Created November 14, 2022 22:25
A silly circular buffer-style map-style iterator
#!/usr/bin/env python3
from typing import Iterator
from itertools import cycle, islice, chain
class GetItemIterator(Iterator):
    def __init__(self, *it):
        self._len, it_copy = self._get_len(it)
        self._it = cycle(it_copy)
        self._curr_idx = 0
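
The preview above stops before the indexing logic, so the following is only a minimal sketch of what a "circular buffer-style map-style iterator" could look like, not the gist's implementation. The class name `RingView` and its behavior are assumptions: iteration cycles forever over the stored items, and integer indexing wraps around the buffer length.

```python
from collections.abc import Iterator
from itertools import cycle


class RingView(Iterator):
    """Hypothetical stand-in (not the gist's class): endless iteration plus wrapped indexing."""

    def __init__(self, *items):
        # Materialize the items so the length is known and indexing is possible.
        self._items = list(items)
        if not self._items:
            raise ValueError("need at least one item")
        self._cycle = cycle(self._items)

    def __next__(self):
        # cycle() never raises StopIteration for a non-empty input.
        return next(self._cycle)

    def __getitem__(self, idx: int):
        # Map-style access: any integer index wraps around the buffer length.
        return self._items[idx % len(self._items)]
```

With `r = RingView("a", "b", "c")`, repeated `next(r)` calls walk the items in order and start over after the last one, while `r[4]` resolves to the same slot as `r[1]`.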
erip / tying_test.py
Last active August 5, 2022 12:29
Testing whether embedding bag's weights can be tied with embedding layer
#!/usr/bin/env python3
import torch
import torch.nn as nn
if __name__ == "__main__":
    V, max_seq, padding_idx, emb_dim, B = 10, 100, 1, 512, 32
    emb_layer = nn.Embedding(V, emb_dim, padding_idx=padding_idx)
    emb_bag = nn.EmbeddingBag.from_pretrained(emb_layer.weight, freeze=False, padding_idx=padding_idx)
    initial_weights = emb_layer.weight.detach()
erip / forced_decoding.py
Last active March 4, 2022 22:09
Forced decoding with Huggingface Transformers
from transformers import PrefixConstrainedLogitsProcessor
def create_processor_fn(ref_tokens_by_segment):
    def inner(batch_id, _):
        return ref_tokens_by_segment[batch_id]
    return inner
# ...
with tokenizer.as_target_tokenizer():
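
The visible `inner` callback returns the whole reference for a segment at every step, which limits the vocabulary to the reference tokens without fixing their order. As a hedged illustration only, a step-wise variant could return just the next reference token. Everything here is an assumption and not part of the gist: the name `create_stepwise_fn`, the `eos_token_id` argument, and the `num_prefix_tokens` offset, which assumes the decoder input starts with one start token. The sketch uses plain Python sequences and does not import `transformers`; it only relies on `len(input_ids)`, which also works for a 1-D tensor.

```python
from typing import Callable, List, Sequence


def create_stepwise_fn(
    ref_tokens_by_segment: Sequence[Sequence[int]],
    eos_token_id: int,
    num_prefix_tokens: int = 1,  # assumption: one decoder start token precedes the output
) -> Callable[[int, Sequence[int]], List[int]]:
    """Hypothetical helper: allow only the next reference token at each decoding step."""

    def inner(batch_id: int, input_ids: Sequence[int]) -> List[int]:
        ref = ref_tokens_by_segment[batch_id]
        # Number of tokens generated so far for this hypothesis.
        step = len(input_ids) - num_prefix_tokens
        if 0 <= step < len(ref):
            return [ref[step]]
        # Past the end of the reference: only end-of-sequence is allowed.
        return [eos_token_id]

    return inner
```

A callback with this `(batch_id, input_ids)` shape is what `PrefixConstrainedLogitsProcessor` expects as its `prefix_allowed_tokens_fn`; whether the one-token offset is right depends on the model's decoder start tokens.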
erip / max_token_per_batch_sampler.py
Last active January 2, 2022 22:27
A PyTorch Sampler which samples batches containing no more than max_tokens post-pad tokens.
#!/usr/bin/env python3
import random
from typing import List, Optional
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import Sampler, DataLoader, Dataset
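
The preview shows only the imports, so here is a small, torch-free sketch of the batching rule the description names: a batch's post-pad size is the number of examples times the longest example, and that product must not exceed `max_tokens`. The function name `batches_by_max_tokens` and the sort-by-length strategy are assumptions for illustration, not the gist's code.

```python
from typing import Iterator, List, Sequence


def batches_by_max_tokens(lengths: Sequence[int], max_tokens: int) -> Iterator[List[int]]:
    """Yield lists of example indices whose padded size stays within max_tokens."""
    # Sort indices by length so each batch pads to a similar length.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batch: List[int] = []
    longest = 0
    for idx in order:
        if lengths[idx] > max_tokens:
            raise ValueError(f"example {idx} alone exceeds max_tokens")
        new_longest = max(longest, lengths[idx])
        # Post-pad cost is (number of examples) * (longest example in the batch).
        if batch and (len(batch) + 1) * new_longest > max_tokens:
            yield batch
            batch, new_longest = [], lengths[idx]
        batch.append(idx)
        longest = new_longest
    if batch:
        yield batch
```

For example, `list(batches_by_max_tokens([5, 3, 8, 2, 7], max_tokens=16))` groups the three shortest examples together and the two longest together. A real `Sampler` would typically also shuffle the resulting batches between epochs.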
erip / app.py
Created April 19, 2021 11:44
Streamlit image dataset clustering viewer
#!/usr/bin/env python3
import base64
from io import BytesIO
from typing import List
from pathlib import Path
from argparse import ArgumentParser
import torch
erip / cmake1.log
Created September 19, 2020 22:23
cmake logs
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.18363.
-- The CXX compiler identification is MSVC 19.27.29111.0
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
from heapq import heapify, heappush, heappushpop
class MaxHeap:
    def __init__(self, top_n: int):
        self.h = []
        self.length = top_n
        heapify(self.h)
    def add(self, element):
        if len(self.h) < self.length:
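
The preview is cut off inside `add`, and the rest of the class is not shown. As a separate, hedged sketch of the general technique the imports suggest (keeping the `top_n` largest elements with a bounded min-heap), one possible shape is below. The class name `TopN` and the `items` method are hypothetical and not taken from the gist.

```python
from heapq import heappush, heappushpop


class TopN:
    """Hypothetical sketch: keep the top_n largest elements seen so far."""

    def __init__(self, top_n: int):
        self.h = []  # min-heap; h[0] is the smallest element currently kept
        self.length = top_n

    def add(self, element):
        if len(self.h) < self.length:
            # Still room: just push.
            heappush(self.h, element)
        else:
            # Full: push the new element and drop whichever is smallest.
            heappushpop(self.h, element)

    def items(self):
        # Largest first.
        return sorted(self.h, reverse=True)
```

Because the heap root is always the smallest kept element, each `add` costs O(log top_n), and an element smaller than everything already kept is discarded immediately.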