Quentin Lhoest lhoestq

lhoestq / checkout-pr.sh
Created March 30, 2023 15:47
GitHub: Clone, Checkout and open VSCode to PR from its URL
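# Read JSON on stdin and print the value at the given Python subscript path.
function getJsonVal () {
    python -c "import json,sys;sys.stdout.write(str(json.load(sys.stdin)$1))";
}
prUrl=$1
# Rewrite the PR page URL into the matching GitHub REST API URL.
apiUrl=$(echo "$prUrl" | sed -e 's#/pull/#/pulls/#' -e 's#github\.com#api.github.com/repos#')
# Strip the /pull/<number> part to get the upstream clone URL.
upstreamUrl=$(echo "$prUrl" | sed -E 's/\/pull\/[0-9]+/.git/g')
# -s hides curl's progress meter.
prData=$(curl -s "$apiUrl")
# Quote "$prData" so the JSON is not word-split or glob-expanded.
userName=$(printf '%s\n' "$prData" | getJsonVal "['head']['repo']['owner']['login']")
repoName=$(printf '%s\n' "$prData" | getJsonVal "['head']['repo']['name']")
repoFullName=$(printf '%s\n' "$prData" | getJsonVal "['head']['repo']['full_name']")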
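The URL rewriting done with sed in this preview could also be sketched in Python. This is only an illustration, not part of the gist: the function names and the example owner and repository are hypothetical, and it assumes the input is a standard `https://github.com/<owner>/<repo>/pull/<number>` URL.

```python
import re

# Matches the owner, repo and PR number of a github.com pull request URL.
# Anything after the number (e.g. "/files") is ignored.
_PR_URL = re.compile(r"https://github\.com/([^/]+)/([^/]+)/pull/(\d+)")


def _parse(pr_url: str):
    """Return (owner, repo, number) or raise ValueError (hypothetical helper)."""
    match = _PR_URL.match(pr_url)
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {pr_url}")
    return match.groups()


def pr_url_to_api_url(pr_url: str) -> str:
    """Build the REST API URL for the same PR (/repos/<owner>/<repo>/pulls/<n>)."""
    owner, repo, number = _parse(pr_url)
    return f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"


def pr_url_to_upstream_url(pr_url: str) -> str:
    """Build the clone URL of the repository the PR targets."""
    owner, repo, _ = _parse(pr_url)
    return f"https://github.com/{owner}/{repo}.git"


# Hypothetical example URL, used only for illustration.
example = "https://github.com/example-owner/example-repo/pull/42"
print(pr_url_to_api_url(example))
print(pr_url_to_upstream_url(example))
```

Matching the whole URL shape, rather than substituting substrings, avoids accidentally rewriting a repository name that happens to contain "pull".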
lhoestq / en_wiki_length.py
Created June 15, 2020 19:08
English Wikipedia length
from nlp import load_dataset  # the `nlp` library was later renamed to `datasets`
from tqdm.auto import tqdm

wiki = load_dataset('wikipedia', '20200501.en', split="train")
batch_size = 1000
total_length = 0
for i in tqdm(range(0, len(wiki), batch_size)):  # loop takes ~1min to run
    batch = wiki[i:i + batch_size]  # slicing returns a dict mapping column names to lists
    total_length += sum(len(sample_text) for sample_text in batch["text"])
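The batching pattern used here can be tried without downloading the dataset. The sketch below is an assumption-laden stand-in: it uses a plain list of strings in place of the `text` column, and the function name `total_text_length` is hypothetical.

```python
def total_text_length(texts, batch_size=1000):
    """Sum the character lengths of all strings, reading them in fixed-size batches."""
    total = 0
    for start in range(0, len(texts), batch_size):
        # The last batch may be shorter than batch_size; slicing handles that.
        batch = texts[start:start + batch_size]
        total += sum(len(text) for text in batch)
    return total


# Tiny made-up sample standing in for the dataset's "text" column.
sample_texts = ["alpha", "be", "gamma!"]
print(total_text_length(sample_texts, batch_size=2))  # prints 13
```

Reading rows in batches rather than one at a time keeps the number of dataset accesses low, which is presumably why the loop steps by `batch_size`.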