@lewtun
lewtun / format_spaces_urls.ipynb
Created November 22, 2022 12:30
[HF Course] Format Gradio URLs
@lewtun
lewtun / update-label-mappings.py
Created July 15, 2022 18:21
Update label mappings in config.json
import json
import datasets
import transformers
from datasets import ClassLabel, load_dataset
from huggingface_hub import (
    HfFolder,
    ModelFilter,
    hf_hub_download,
    list_models,
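The preview above stops partway through the import list, so the script itself is not visible here. As a rough illustration of what updating label mappings in a config.json can involve, here is a minimal sketch that rewrites the `id2label` and `label2id` entries of a config dictionary from a list of label names. The helper name `add_label_mappings` and the sample config are assumptions made for this example; they are not taken from the gist, and no Hub calls are made.

```python
import json


def add_label_mappings(config, label_names):
    """Return a copy of config with id2label/label2id built from label_names.

    Hypothetical helper for illustration only.
    """
    updated = dict(config)
    # JSON object keys are always strings, so the ids are stored as strings.
    updated["id2label"] = {str(i): name for i, name in enumerate(label_names)}
    updated["label2id"] = {name: i for i, name in enumerate(label_names)}
    return updated


# Made-up config with the generic placeholder labels.
config = {"model_type": "bert", "id2label": {"0": "LABEL_0", "1": "LABEL_1"}}
new_config = add_label_mappings(config, ["negative", "positive"])
print(json.dumps(new_config, indent=2))
```

The original dictionary is left untouched, which makes it easy to compare the old and new mappings before writing anything back to disk.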
@lewtun
lewtun / page_334.py
Created January 21, 2022 09:27
Correction to page 334
def get_grouped_params(model, no_decay=["bias", "LayerNorm.weight"]):
    params_with_wd, params_without_wd = [], []
    for n, p in model.named_parameters():
        if any(nd in n for nd in no_decay):
            params_without_wd.append(p)
        else:
            params_with_wd.append(p)
    return [{'params': params_with_wd, 'weight_decay': args.weight_decay},
            {'params': params_without_wd, 'weight_decay': 0.0}]
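To see how the substring test sorts parameters into the two groups, here is a small sketch that runs the same logic without PyTorch. `ToyModel`, the parameter names, and the `args` namespace are hypothetical stand-ins invented for this example; in real use, `model` would be a `torch.nn.Module` and `args` would hold the training arguments. The default for `no_decay` is written as a tuple here, which behaves the same way.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the training arguments the function reads globally.
args = SimpleNamespace(weight_decay=0.1)


class ToyModel:
    """Hypothetical stand-in exposing named_parameters() like a torch.nn.Module."""

    def named_parameters(self):
        # Placeholder strings take the place of real parameter tensors.
        names = [
            "encoder.layer.0.attention.weight",
            "encoder.layer.0.attention.bias",
            "encoder.layer.0.LayerNorm.weight",
            "classifier.weight",
        ]
        return [(name, f"<param {name}>") for name in names]


def get_grouped_params(model, no_decay=("bias", "LayerNorm.weight")):
    params_with_wd, params_without_wd = [], []
    for n, p in model.named_parameters():
        # A parameter skips weight decay if any no_decay pattern occurs in its name.
        if any(nd in n for nd in no_decay):
            params_without_wd.append(p)
        else:
            params_with_wd.append(p)
    return [
        {"params": params_with_wd, "weight_decay": args.weight_decay},
        {"params": params_without_wd, "weight_decay": 0.0},
    ]


groups = get_grouped_params(ToyModel())
for group in groups:
    print(group["weight_decay"], len(group["params"]))
```

With these made-up names, the two weight matrices land in the decayed group, while the bias and the LayerNorm weight land in the group with `weight_decay` set to 0.0.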
@lewtun
lewtun / chapter06_codeblock01.py
Created January 20, 2022 08:31
Chapter 6 - Improve codeblock for summaries
from tqdm import tqdm
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"


def chunks(list_of_elements, batch_size):
    """Yield successive batch-sized chunks from list_of_elements."""
    for i in range(0, len(list_of_elements), batch_size):
        yield list_of_elements[i : i + batch_size]
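A quick way to check the batching behaviour is to run the generator on a short list whose length is not a multiple of the batch size. The sketch below repeats the function so it runs on its own; the sample list is made up for illustration.

```python
def chunks(list_of_elements, batch_size):
    """Yield successive batch-sized chunks from list_of_elements."""
    for i in range(0, len(list_of_elements), batch_size):
        yield list_of_elements[i : i + batch_size]


# Seven items with a batch size of three leave a final, shorter batch.
batches = list(chunks(list(range(7)), batch_size=3))
print(batches)  # [[0, 1, 2], [3, 4, 5], [6]]
```

The last batch simply holds whatever is left over, so downstream code should not assume every batch has exactly `batch_size` elements.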
@lewtun
lewtun / codeblock.py
Last active January 9, 2022 15:35
Chapter 7 - page 175 - fix code block
for question_type in ["How", "What", "Is"]:
    for question in (
        dfs["train"][dfs["train"].question.str.startswith(question_type)]
        .sample(n=3, random_state=42)['question']):
        print(question)

from datasets import load_dataset


def validate_datasets(reference_dataset, new_dataset):
    """Validate the column names and rows of the new dataset"""
    splits = list(reference_dataset.keys())
    for split in splits:
        ref_dset = reference_dataset[split]
        new_dset = new_dataset[split]
        # Check column names agree
        ref_cols = set(ref_dset.column_names)
@lewtun
lewtun / subjqa-electronics-test.json
Created March 30, 2021 19:30
SubjQA test set for Electronics domain in SQuADv2 format
{"data": [{"title": "B00001WRSJ", "paragraphs": [{"qas": [{"question": "What is the tonal balance of these headphones?", "id": "d0781d13200014aa25860e44da9d5ea7", "answers": [{"text": "I have been a headphone fanatic for thirty years", "answer_start": 0}], "is_impossible": false}], "context": "I have been a headphone fanatic for thirty years and have owned and used a variety of headphones over those years, to include Stax SR-5, Sennheiser HD-424 and HD-580. The Sony MDRV6 excells as the best value of any headphone that I've ever owned. They are especially good at producing natural-sounding deep bass, and the overall octave-to-octave balance is excellent. The sound quality is all in all comparable to other headphones that cost considerably more.The MDRV6 is especially well-suited for travel due to the collapsible design, and for noisy environments or for quiet environments such as a library where the sound emitted by open-back headphones would distract others.The MDRV6 is not quite as comfortable as some ot