Steven van de Graaf sgraaf

@sgraaf
sgraaf / ISO 4217-1 with currency symbols.ipynb
Created August 9, 2021 19:36
ISO 4217 w/ currency symbols
sgraaf / ddp_example.py
Last active November 7, 2024 05:39
PyTorch Distributed Data Parallel (DDP) example
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from argparse import ArgumentParser
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.distributed import DistributedSampler
from transformers import BertForMaskedLM
sgraaf / I'm an early 🐤
Last active April 25, 2023 00:52
I'm an early 🐤
🌞 Morning   131 commits  ███████▏░░░░░░░░░░░░░  34.3%
🌆 Daytime   179 commits  █████████▊░░░░░░░░░░░  46.9%
🌃 Evening    69 commits  ███▊░░░░░░░░░░░░░░░░░  18.1%
🌙 Night       3 commits  ▏░░░░░░░░░░░░░░░░░░░░   0.8%
sgraaf / rotatingproxysession.py
Created October 14, 2020 16:46
A requests Session that rotates (free) proxies for GET requests.
from itertools import cycle
from lxml import html
from requests import Response, Session
class RotatingProxySession(Session):
    def __init__(self) -> None:
        super().__init__()
sgraaf / rotatinguasession.py
Last active December 26, 2022 00:26
A requests Session that rotates its user-agent for GET requests.
# coding: utf-8
from itertools import cycle
from pathlib import Path
from typing import Optional
from lxml import html
from requests import Response, Session
class RotatingUASession(Session):
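The preview above stops at the class header, so the gist's own implementation is not shown here. As a rough, standard-library-only sketch of the rotation idea that the `cycle` import suggests, the helper below hands out a different `User-Agent` header on each call. The class name `UserAgentRotator`, the method `next_headers`, and the sample user-agent strings are all hypothetical and are not taken from the gist.

```python
from itertools import cycle
from typing import Dict, Iterable


class UserAgentRotator:
    """Hypothetical helper: yields one User-Agent header per call, round-robin."""

    def __init__(self, user_agents: Iterable[str]) -> None:
        agents = list(user_agents)
        if not agents:
            raise ValueError("at least one user-agent string is required")
        # cycle() repeats the list endlessly, so the rotation never runs out.
        self._agents = cycle(agents)

    def next_headers(self) -> Dict[str, str]:
        """Return the headers to merge into the next GET request."""
        return {"User-Agent": next(self._agents)}


# Placeholder strings, not real browser user-agents.
rotator = UserAgentRotator(["ua-1", "ua-2"])
first = rotator.next_headers()
second = rotator.next_headers()
third = rotator.next_headers()
print(first, second, third)
```

In a `Session` subclass, one plausible design is to override `get` and pass these headers along to the parent call; whether the gist does exactly that cannot be confirmed from the preview.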
sgraaf / concurrentsession.py
Created October 14, 2020 15:06
A requests Session that can make concurrent GET and POST requests.
import os
from concurrent.futures import Future, ThreadPoolExecutor
from functools import partial
from itertools import repeat
from typing import Any, Dict, List, Optional, Union
from requests import Response, Session
class ConcurrentSession(Session):
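The preview above is cut off at the class header. As a minimal sketch of the general pattern its imports point to (a `ThreadPoolExecutor` fanning one callable out over several URLs), and not the gist's actual code, the example below uses a stand-in callable instead of a real HTTP request so that it runs without network access. The names `fetch_all` and `fake_fetch`, and the default worker count, are assumptions made for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def fetch_all(
    fetch: Callable[[str], T],
    urls: Sequence[str],
    max_workers: Optional[int] = None,
) -> List[T]:
    """Call ``fetch`` on every URL concurrently; results keep the input order."""
    if not urls:
        return []
    if max_workers is None:
        # Assumed default: a few threads per CPU, capped by the number of URLs.
        max_workers = min(len(urls), (os.cpu_count() or 1) * 4)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the order of ``urls``,
        # not in the order in which the calls finish.
        return list(executor.map(fetch, urls))


def fake_fetch(url: str) -> str:
    """Hypothetical stand-in for a real HTTP call such as ``Session.get``."""
    return f"response for {url}"


results = fetch_all(fake_fetch, ["https://example.com/a", "https://example.com/b"])
print(results)
```

Threads suit this workload because HTTP requests spend most of their time waiting on the network. In real use, `fetch` could be a bound `Session.get`, possibly wrapped with `functools.partial` to fix keyword arguments.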
sgraaf / download_wiki_dump.sh
Last active October 26, 2022 04:02
Simple bash script to download the latest Wikipedia dump in the chosen language. Adapted from: https://github.com/facebookresearch/XLM/blob/master/get-data-wiki.sh
#!/bin/sh
set -e
LG=$1
WIKI_DUMP_NAME=${LG}wiki-latest-pages-articles.xml.bz2
WIKI_DUMP_DOWNLOAD_URL=https://dumps.wikimedia.org/${LG}wiki/latest/$WIKI_DUMP_NAME
# download latest Wikipedia dump in chosen language
echo "Downloading the latest $LG-language Wikipedia dump from $WIKI_DUMP_DOWNLOAD_URL..."
wget -c "$WIKI_DUMP_DOWNLOAD_URL"
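For readers who want to check which URL the script will request before running it, the same string construction can be sketched in Python. The function name `wiki_dump_url` is hypothetical; the URL pattern is copied from the `WIKI_DUMP_NAME` and `WIKI_DUMP_DOWNLOAD_URL` variables above.

```python
def wiki_dump_url(lang: str) -> str:
    """Build the download URL for the latest dump of a language code such as "en"."""
    # Mirrors WIKI_DUMP_NAME in the shell script.
    name = f"{lang}wiki-latest-pages-articles.xml.bz2"
    # Mirrors WIKI_DUMP_DOWNLOAD_URL in the shell script.
    return f"https://dumps.wikimedia.org/{lang}wiki/latest/{name}"


print(wiki_dump_url("nl"))
```

The sketch only builds the string. It does not validate the language code or check that the dump exists.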
sgraaf / Chrome_versions.ipynb
Created September 15, 2022 12:54
Scrape recent Chrome and Firefox version numbers
sgraaf / pdutils.py
Last active July 2, 2022 13:46
Pandas utility functions
import csv
from pathlib import Path
from typing import Optional, Union, Sequence, List, Literal
import pandas as pd
from tqdm import tqdm
def read_csv(
    file: Path,
sgraaf / extract_and_clean_wiki_dump.sh
Last active October 24, 2021 09:49
Simple bash script to extract and clean a Wikipedia dump. Adapted from: https://github.com/facebookresearch/XLM/blob/master/get-data-wiki.sh
#!/bin/sh
set -e
WIKI_DUMP_FILE_IN=$1
WIKI_DUMP_FILE_OUT=${WIKI_DUMP_FILE_IN%%.*}.txt
# clone the WikiExtractor repository
git clone https://github.com/attardi/wikiextractor.git
# extract and clean the chosen Wikipedia dump
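The preview ends here, before the extraction command itself. One detail in the visible part is worth spelling out: the `${WIKI_DUMP_FILE_IN%%.*}` expansion on the `WIKI_DUMP_FILE_OUT` line removes the longest suffix that starts with a dot, meaning everything from the first `.` in the string onward. A small Python sketch of that behavior follows; the function name `output_name` is hypothetical.

```python
def output_name(dump_file: str) -> str:
    """Mimic the shell expression ${WIKI_DUMP_FILE_IN%%.*}.txt."""
    # partition() splits at the FIRST dot, which matches the "longest
    # suffix" semantics of %%.* (a single % would cut at the last dot).
    stem, _, _ = dump_file.partition(".")
    return stem + ".txt"


print(output_name("enwiki-latest-pages-articles.xml.bz2"))
print(output_name("./dumps/enwiki-latest-pages-articles.xml.bz2"))
```

The second call shows a pitfall: because the cut happens at the first dot anywhere in the string, a path such as `./dumps/...` collapses to just `.txt`. Passing a bare filename, or a path that contains no dots before the filename's extension, avoids this.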