Steven van de Graaf sgraaf

⭐ Total Stars: 57
➕ Total Commits: 680
🔀 Total PRs: 11
🚩 Total Issues: 7
📦 Contributed to: 10
@sgraaf
sgraaf / I'm an early 🐤
Last active April 25, 2023 00:52
I'm an early 🐤
🌞 Morning 131 commits ███████▏░░░░░░░░░░░░░ 34.3%
🌆 Daytime 179 commits █████████▊░░░░░░░░░░░ 46.9%
🌃 Evening 69 commits ███▊░░░░░░░░░░░░░░░░░ 18.1%
🌙 Night 3 commits ▏░░░░░░░░░░░░░░░░░░░░ 0.8%
sgraaf / ddp_example.py
Last active April 23, 2024 11:13
PyTorch Distributed Data Parallel (DDP) example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler
from transformers import BertForMaskedLM
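The imports above pair `DistributedSampler` with `DDP`. As a rough illustration of what the sampler contributes, the following pure-Python sketch (no PyTorch required) mimics how dataset indices are split across processes in the non-shuffled, `drop_last=False` case: the index list is padded by wrapping around to its start until it divides evenly, and each rank then takes every `world_size`-th index. The helper name `shard_indices` is hypothetical, and this is a simplification for intuition, not the library's code.

```python
import math
from typing import List


def shard_indices(dataset_len: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices one process would see (hypothetical helper)."""
    indices = list(range(dataset_len))
    # Every rank must receive the same number of samples.
    num_samples = math.ceil(dataset_len / world_size)
    total_size = num_samples * world_size
    # Pad by wrapping around to the start of the index list.
    padding = total_size - len(indices)
    indices += (indices * math.ceil(padding / max(len(indices), 1)))[:padding]
    # Strided slice: rank r takes indices r, r + world_size, r + 2 * world_size, ...
    return indices[rank:total_size:world_size]


if __name__ == "__main__":
    for rank in range(3):
        print(rank, shard_indices(10, world_size=3, rank=rank))
```

With 10 samples and 3 processes, two indices are repeated so that every rank gets 4 samples; this is why per-rank evaluation metrics can count a few samples twice.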
sgraaf / ReplacePositionedImage.gs
Last active July 6, 2020 09:25
Google Apps Script to replace a PositionedImage
/**
 * Replace a PositionedImage by its ID.
 *
 * @param {Paragraph|ListItem} anchor The element (Paragraph or ListItem) to which the PositionedImage is anchored.
 * @param {Number} positionedImageId The ID of the PositionedImage.
 * @param {Blob} image The image used to replace the "old" PositionedImage with.
 */
function ReplacePositionedImage( anchor, positionedImageId, image ) {
    // get the positioned image by its ID
    var positionedImage = anchor.getPositionedImage(positionedImageId);
sgraaf / gpu_transformers_benchmark.py
Last active April 15, 2021 13:34
Simple GPU benchmarking script using the Transformers library w/ timeit. Make sure that you have PyTorch, Transformers and tqdm installed!
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import timeit
import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset
from transformers import DistilBertForMaskedLM
from tqdm import tqdm
NUM_EXAMPLES = 128
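The description mentions timing with `timeit`. A minimal sketch of that pattern, using only the standard library, might look like the following. The workload `fake_forward_pass` and the helper `benchmark` are hypothetical stand-ins (the gist's own model call is not shown in this preview); only `NUM_EXAMPLES` is taken from the preview above.

```python
import statistics
import timeit

NUM_EXAMPLES = 128  # same constant as in the gist preview


def fake_forward_pass(num_examples: int) -> int:
    """Hypothetical CPU-only stand-in for a model forward pass."""
    return sum(i * i for i in range(num_examples))


def benchmark(func, *args, repeat: int = 5, number: int = 100) -> float:
    """Return the mean time in seconds of a single call to func(*args)."""
    # timeit.repeat returns one total time per round of `number` calls.
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return statistics.mean(totals) / number


if __name__ == "__main__":
    mean_seconds = benchmark(fake_forward_pass, NUM_EXAMPLES)
    print(f"mean time per call: {mean_seconds * 1e6:.1f} µs")
```

When the timed function launches real GPU work, CUDA kernels run asynchronously, so the workload should call `torch.cuda.synchronize()` before returning; otherwise the timer may only measure kernel launch overhead.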
sgraaf / tokenize_data.py
Created February 3, 2020 19:27
Simple Python script to tokenize English-language text data (sentences)
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from pathlib import Path
from tokenizers import BertWordPieceTokenizer
def main():
    in_file = Path(sys.argv[1])
sgraaf / Tokenizers_timing_experiment.ipynb
Last active January 20, 2020 17:01
Tokenizers timing experiments: Jupyter Notebook
sgraaf / multithreaded.csv
Last active January 20, 2020 16:58
Tokenizers timing experiments: Multithreaded performance
implementation,mean execution time
submit,1min 8s
map,1min 9s
encode_batch,10.6s
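The row names suggest three ways of dispatching the same tokenization work: one task per sentence via `Executor.submit`, `Executor.map` over the sentences, and a single batched call (`encode_batch` is the batch method of the Tokenizers library). That reading is an assumption, since the notebook itself is not shown here. The sketch below illustrates the three call patterns with the standard library only; `tokenize` is a hypothetical whitespace tokenizer standing in for the real one, so it says nothing about the timings in the table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def tokenize(sentence: str) -> List[str]:
    """Hypothetical stand-in tokenizer: lowercase and split on whitespace."""
    return sentence.lower().split()


def tokenize_with_submit(sentences: List[str], workers: int = 4) -> List[List[str]]:
    # One future per sentence; results are collected in submission order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(tokenize, s) for s in sentences]
        return [f.result() for f in futures]


def tokenize_with_map(sentences: List[str], workers: int = 4) -> List[List[str]]:
    # Executor.map yields results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(tokenize, sentences))


def tokenize_batch(sentences: List[str]) -> List[List[str]]:
    # A single call over the whole batch, loosely analogous to encode_batch.
    return [tokenize(s) for s in sentences]


if __name__ == "__main__":
    data = ["Hello world", "Tokenizers are fast", "One more sentence"]
    print(tokenize_with_submit(data))
    print(tokenize_with_map(data))
    print(tokenize_batch(data))
```

All three functions return the same token lists; they differ only in how the work is dispatched, which is the variable the table compares.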
sgraaf / transformers_vs_tokenizers.csv
Last active January 19, 2020 19:05
Tokenizers timing experiments: Transformers vs Tokenizers
implementation,mean execution time
transformers,6min 42s
tokenizers,45.6s
sgraaf / preprocess_wiki_dump.py
Last active October 23, 2021 21:28
Simple Python script to pre-process a Wikipedia dump
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
from pathlib import Path
from blingfire import text_to_sentences
def main():
    wiki_dump_file_in = Path(sys.argv[1])