@cccntu
cccntu / M1_GPU.md
Last active January 24, 2022 17:12
Run huggingface transformers on M1 Macs, on GPU!

Run huggingface transformers on M1 Macs, on GPU

  • Requirement: macOS 12 Monterey

  • First, install a conda env with M1 support

wget -P ~/Downloads https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh
chmod +x ~/Downloads/Miniforge3-MacOSX-arm64.sh
sh ~/Downloads/Miniforge3-MacOSX-arm64.sh
@cccntu
cccntu / number token.py
Created October 1, 2021 15:38
tokens that are numbers in gpt2 tokenizer
from transformers import AutoTokenizer, AutoModelWithLMHead
tokenizer = AutoTokenizer.from_pretrained("gpt2")
# model = AutoModelWithLMHead.from_pretrained("gpt2")
import re
nums = []
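The preview above is cut off before the search itself. As a rough sketch of one way such a search could go (not necessarily the author's approach), the snippet below filters a token-to-id mapping with a regular expression. `find_number_tokens`, `NUMBER_TOKEN`, and `toy_vocab` are hypothetical names introduced here; with transformers installed, the real mapping would presumably come from `tokenizer.get_vocab()`.

```python
import re

# Optional leading "Ġ" (GPT-2's byte-level marker for a preceding space),
# followed by ASCII digits only.
NUMBER_TOKEN = re.compile(r"Ġ?[0-9]+")


def find_number_tokens(vocab):
    """Return (token, id) pairs whose text is purely digits, sorted by id."""
    hits = [(tok, idx) for tok, idx in vocab.items() if NUMBER_TOKEN.fullmatch(tok)]
    return sorted(hits, key=lambda pair: pair[1])


# Hypothetical toy vocabulary, used so the sketch runs without downloading
# the real gpt2 tokenizer. The ids here are made up.
toy_vocab = {"hello": 0, "42": 1, "Ġ2021": 2, "3rd": 3, "Ġthe": 4, "7": 5}
nums = find_number_tokens(toy_vocab)
```

Using `fullmatch` rather than `search` keeps mixed tokens such as "3rd" out of the result.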
@cccntu
cccntu / parse-c4-date-from-url.ipynb
Created September 21, 2021 07:35
parse-c4-date-from-url.ipynb
@cccntu
cccntu / mc4-timestamp-analysis.ipynb
Created September 17, 2021 15:22
mc4-timestamp-analysis.ipynb
from tqdm.auto import tqdm
import time
class LengthedGeneratorWrapper:
    """wraps an infinite generator with length for tqdm"""
    def __init__(self, infinite_generator, len):
        self.generator = infinite_generator
        self.len = len
    def __len__(self):
        return self.len
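The preview stops before any iteration logic. A minimal sketch of how the wrapper might be completed follows, assuming each pass should consume the next `length` items from the shared generator. The `__iter__` method and the `length` parameter name are assumptions, not taken from the gist; tqdm is left out so the sketch runs with the standard library alone.

```python
from itertools import count, islice


class LengthedGeneratorWrapper:
    """Wraps an infinite generator with a length, e.g. for a progress bar."""

    def __init__(self, infinite_generator, length):
        self.generator = infinite_generator
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        # Assumption: every pass takes the next `length` items from the
        # same underlying generator instead of restarting it.
        return islice(self.generator, self.length)


wrapped = LengthedGeneratorWrapper(count(), 3)
first_pass = list(wrapped)
second_pass = list(wrapped)
```

Because the object defines `__len__`, passing it to `tqdm(wrapped)` should let tqdm infer a total and show a bounded progress bar per pass.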
import sys
# imports utils and imports
from src import *
from dataclasses import dataclass
from typing import Optional
from omegaconf import OmegaConf
@cccntu
cccntu / csv.py
Created February 8, 2021 11:58
python mmap to concatenate csv files
❯ rm out.csv
❯ cat 1.py
from glob import glob
import mmap
files = glob("data/*")
files.sort(key=lambda x: int(x.split("/")[-1].split(".")[0]))
write_f = open("out.csv", "w+b")
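The rest of the script is not shown in the preview. One possible way to finish the idea is sketched below: each input is memory-mapped read-only and written to the output in numeric filename order. `numeric_stem` and `concat_files` are hypothetical helpers, and the temporary files exist only to make the sketch runnable. It copies bytes verbatim, so repeated CSV header rows are not stripped.

```python
import mmap
import os
import tempfile


def numeric_stem(path):
    """Sort key: '.../10.csv' -> 10, mirroring the sort in the snippet above."""
    return int(os.path.basename(path).split(".")[0])


def concat_files(paths, out_path):
    """Append each file's bytes to out_path, reading through a read-only mmap."""
    with open(out_path, "wb") as out:
        for path in sorted(paths, key=numeric_stem):
            if os.path.getsize(path) == 0:
                continue  # mmap cannot map an empty file
            with open(path, "rb") as f:
                with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                    out.write(m)  # mmap objects are bytes-like


# Demo on throwaway files (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, body in [("10.csv", b"c\n"), ("1.csv", b"a\n"), ("2.csv", b"b\n")]:
        p = os.path.join(tmp, name)
        with open(p, "wb") as f:
            f.write(body)
        paths.append(p)
    out_path = os.path.join(tmp, "out.csv")
    concat_files(paths, out_path)
    with open(out_path, "rb") as f:
        result = f.read()
```

Sorting on the integer stem matters here: a plain string sort would place 10.csv before 2.csv.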
@cccntu
cccntu / shelve-test.py
Last active January 27, 2021 06:46
simple benchmark/test for python shelve
import shelve
import time
# initialize
with shelve.open('shelvedb') as db:
    for i in range(10000):
        db[f'{i}'] = f'{i*2}'
# no caching
with shelve.open('shelvedb') as db:
    tic = time.time()
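The timing loop is cut off in the preview. A small sketch of what a read benchmark along these lines could look like is given below; the entry count, the temporary directory, and the use of `time.perf_counter` are choices made here, not details from the gist.

```python
import os
import shelve
import tempfile
import time

N = 1000  # assumed size, kept small so the sketch runs quickly

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shelvedb")

    # Populate the shelf with string keys and string values.
    with shelve.open(path) as db:
        for i in range(N):
            db[f"{i}"] = f"{i * 2}"

    # Reopen it and time N key lookups.
    with shelve.open(path) as db:
        tic = time.perf_counter()
        values = [db[f"{i}"] for i in range(N)]
        elapsed = time.perf_counter() - tic

print(f"{N} reads took {elapsed:.4f} s")
```

Reopening the shelf before timing keeps the measurement separate from the write phase. Absolute numbers will depend on which dbm backend Python selects on the machine.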
@cccntu
cccntu / bug_report_model.py
Last active December 7, 2020 02:32
pytorch-lightning ddp BatchEncoding
# Copyright The PyTorch Lightning team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
@cccntu
cccntu / install julia.md
Last active November 9, 2020 15:00
Install Julia and use it in jupyter notebook
conda create -n julia -c conda-forge julia
conda activate julia
conda install notebook
julia # use julia REPL to install IJulia (necessary)
> using Pkg
> Pkg.add("IJulia")