Jonathan Chang cccntu

@cccntu
cccntu / M1_GPU.md
Last active January 24, 2022 17:12
Run huggingface transformers on M1 Macs, on GPU!


  • Requirement: macOS 12 Monterey

  • First, install a conda environment with native M1 (arm64) support, using Miniforge:

# download the arm64 Miniforge installer into ~/Downloads, then run it
wget -P ~/Downloads https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
bash ~/Downloads/Miniforge3-MacOSX-arm64.sh
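To confirm that the Miniforge Python runs natively on Apple Silicon (not under Rosetta) and that the macOS 12 requirement holds, a small check like the following may help. It is a sketch, not part of the original gist; it uses only the standard `platform` module, and the function names are hypothetical.

```python
import platform
from typing import Tuple


def version_tuple(release: str) -> Tuple[int, ...]:
    """'12.1' -> (12, 1); non-numeric parts are ignored."""
    return tuple(int(part) for part in release.split(".") if part.isdigit())


def meets_m1_requirements(machine: str, mac_release: str) -> bool:
    """True if Python runs natively on arm64 under macOS 12 (Monterey) or later."""
    if machine != "arm64":  # "x86_64" here usually means Rosetta or an Intel build
        return False
    version = version_tuple(mac_release)
    return bool(version) and version[0] >= 12


if __name__ == "__main__":
    # platform.mac_ver() returns an empty release string on non-macOS systems
    release, _, _ = platform.mac_ver()
    print(meets_m1_requirements(platform.machine(), release))
```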
cccntu / number token.py
Created October 1, 2021 15:38
tokens that are numbers in gpt2 tokenizer
import re

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
# model = AutoModelForCausalLM.from_pretrained("gpt2")  # AutoModelWithLMHead is deprecated
nums = []  # vocabulary tokens that are numbers
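The gist's loop body is not shown. One way to collect number tokens is sketched below, under the assumption that a "number token" is made only of ASCII digits, optionally carrying the `Ġ` (U+0120) marker that GPT-2's byte-level BPE uses for a leading space. `number_tokens` is a hypothetical helper; with the real tokenizer it would be called as `number_tokens(tokenizer.get_vocab())`.

```python
import re
from typing import Dict, List, Tuple

# GPT-2's byte-level BPE marks a leading space with "Ġ" (U+0120).
# Assumption: a number token is ASCII digits only, with or without that marker.
NUMBER_TOKEN = re.compile("Ġ?[0-9]+")


def number_tokens(vocab: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (token, id) pairs for number tokens, sorted by token id."""
    nums = [(tok, idx) for tok, idx in vocab.items() if NUMBER_TOKEN.fullmatch(tok)]
    return sorted(nums, key=lambda pair: pair[1])


if __name__ == "__main__":
    # toy vocabulary standing in for tokenizer.get_vocab()
    toy_vocab = {"Ġ1999": 7, "42": 3, "Ġthe": 1, "4th": 5}
    print(number_tokens(toy_vocab))
```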
cccntu / parse-c4-date-from-url.ipynb
Created September 21, 2021 07:35
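The notebook itself did not render, so its approach is unknown. As a hedged illustration of the idea in its title, the sketch below assumes dates appear in C4 URLs as `/YYYY/MM/DD/` or `/YYYY/MM/` path segments (a common blog/news convention, not a documented C4 property). `date_from_url` is a hypothetical name, and a missing day defaults to the 1st.

```python
import re
from datetime import date
from typing import Optional
from urllib.parse import urlparse

# Assumed layout: /YYYY/MM/DD/ or /YYYY/MM/ somewhere in the URL path.
DATE_IN_PATH = re.compile(
    r"/((?:19|20)\d{2})/(0[1-9]|1[0-2])(?:/(0[1-9]|[12]\d|3[01]))?(?=/|$)"
)


def date_from_url(url: str) -> Optional[date]:
    """Parse the first /YYYY/MM[/DD] segment of the URL path; None if absent or invalid."""
    match = DATE_IN_PATH.search(urlparse(url).path)
    if match is None:
        return None
    year, month = int(match.group(1)), int(match.group(2))
    day = int(match.group(3)) if match.group(3) else 1  # missing day -> 1st
    try:
        return date(year, month, day)
    except ValueError:  # e.g. /2021/02/30/
        return None
```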
cccntu / mc4-timestamp-analysis.ipynb
Created September 17, 2021 15:22
from tqdm.auto import tqdm
import time


class LengthedGeneratorWrapper:
    """wraps an infinite generator with length for tqdm"""

    def __init__(self, infinite_generator, len):
        self.generator = infinite_generator
        self.len = len

    def __len__(self):
        return self.len
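The wrapper above defines `__len__`, but the preview ends before any `__iter__`. A self-contained sketch of the complete pattern follows, assuming iteration should stop after the declared length; `SizedIterable` is a hypothetical name, not the gist's class.

```python
import itertools
from typing import Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class SizedIterable(Generic[T]):
    """Pairs an (often infinite) iterable with a declared length.

    tqdm calls len() on its iterable when `total` is not given, so a
    __len__ lets the progress bar show a total. Assumption: iteration
    stops after `length` items via itertools.islice.
    """

    def __init__(self, iterable: Iterable[T], length: int) -> None:
        self.iterable = iterable
        self.length = length

    def __len__(self) -> int:
        return self.length

    def __iter__(self) -> Iterator[T]:
        return itertools.islice(self.iterable, self.length)
```

With tqdm installed, `tqdm(SizedIterable(itertools.count(), 1000))` should display a total of 1000, since tqdm falls back to `len(iterable)` when no `total` is passed. Stopping at `length` keeps the loop and the bar's total in agreement; without it, an infinite generator would keep yielding past the displayed total.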
import sys
# pulls in the project's utilities (and the modules they import) from src
from src import *
from dataclasses import dataclass
from typing import Optional
from omegaconf import OmegaConf
cccntu / csv.py
Created February 8, 2021 11:58
python mmap to concatenate csv files
❯ rm out.csv
❯ cat 1.py
from glob import glob
import mmap
files = glob("data/*")
files.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))
write_f = open("out.csv", "w+b")
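The gist's copy loop is cut off here. A minimal sketch of the same idea (memory-map each input and write its bytes into one output, ordered by the integer in the filename) is below, assuming files are named like `data/<n>.csv` and that any header rows are simply copied as-is. `concat_files` and `numeric_stem` are hypothetical names.

```python
import mmap
import os
from typing import Iterable


def numeric_stem(path: str) -> int:
    """'data/12.csv' -> 12; assumes files are named <integer>.<ext>."""
    return int(os.path.basename(path).split(".")[0])


def concat_files(paths: Iterable[str], out_path: str) -> None:
    """Concatenate files byte-for-byte, in numeric filename order, via mmap."""
    with open(out_path, "wb") as out:
        for path in sorted(paths, key=numeric_stem):
            with open(path, "rb") as f:
                if os.fstat(f.fileno()).st_size == 0:
                    continue  # mmap cannot map an empty file
                with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
                    out.write(mm)  # mmap objects support the buffer protocol
```

For the layout in the gist, usage would look like `concat_files(glob("data/*"), "out.csv")`.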
cccntu / shelve-test.py
Last active January 27, 2021 06:46
simple benchmark/test for python shelve
import shelve
import time

# initialize
with shelve.open('shelvedb') as db:
    for i in range(10000):
        db[f'{i}'] = f'{i*2}'  # f-string, so the value is e.g. '4', not the literal '{i*2}'

# no caching
with shelve.open('shelvedb') as db:
    tic = time.time()
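The benchmark is truncated after `tic`. The "no caching" comment suggests a comparison with `shelve`'s `writeback=True` mode, which keeps every entry read in an in-memory cache (and writes all cached entries back on close). A sketch of such a comparison follows, with hypothetical helper names; timing stops before the shelf is closed, so the write-back cost is excluded, and no particular speedup is claimed.

```python
import shelve
import time
from typing import Tuple


def populate(path: str, n: int) -> None:
    """Create a fresh shelf mapping str(i) -> str(i * 2)."""
    with shelve.open(path, flag="n") as db:
        for i in range(n):
            db[str(i)] = str(i * 2)


def timed_reads(path: str, n: int, writeback: bool, passes: int = 2) -> Tuple[float, int]:
    """Read every key `passes` times; return (seconds elapsed, sum of values read)."""
    total = 0
    with shelve.open(path, writeback=writeback) as db:
        tic = time.perf_counter()
        for _ in range(passes):
            for i in range(n):
                total += int(db[str(i)])
        elapsed = time.perf_counter() - tic  # measured before close/write-back
    return elapsed, total
```

For example, call `populate(path, 10_000)` once, then compare `timed_reads(path, 10_000, writeback=False)` with `timed_reads(path, 10_000, writeback=True)`; with `passes=2`, the second pass in writeback mode is served from the cache.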
cccntu / bug_report_model.py
Last active December 7, 2020 02:32
pytorch-lightning ddp BatchEncoding
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
cccntu / install julia.md
Last active November 9, 2020 15:00
Install Julia and use it in jupyter notebook
conda create -n julia -c conda-forge julia  # create an env named "julia" containing Julia
conda activate julia
conda install notebook
julia  # start the Julia REPL to install IJulia (required for the Jupyter kernel)
julia> using Pkg
julia> Pkg.add("IJulia")
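After `Pkg.add("IJulia")`, IJulia should register a Jupyter kernel. One hedged way to check from Python is to parse the JSON printed by `jupyter kernelspec list --json` and keep the specs whose `language` is `julia`. The helper names are hypothetical, and the sample JSON in the checks is illustrative, not captured output.

```python
import json
import subprocess
from typing import List


def julia_kernels(kernelspec_json: str) -> List[str]:
    """Names of kernelspecs whose declared language is Julia."""
    specs = json.loads(kernelspec_json).get("kernelspecs", {})
    return sorted(
        name
        for name, info in specs.items()
        if info.get("spec", {}).get("language", "").lower() == "julia"
    )


def installed_julia_kernels() -> List[str]:
    """Run `jupyter kernelspec list --json` and filter it (requires jupyter on PATH)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return julia_kernels(result.stdout)
```

An empty list from `installed_julia_kernels()` would suggest the IJulia install step did not register a kernel for this Jupyter.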