Skip to content

Instantly share code, notes, and snippets.

View ymoslem's full-sized avatar
👩‍🎓

Yasmin Moslem ymoslem

👩‍🎓
View GitHub Profile
@younesbelkada
younesbelkada / bench-fa-2.py
Last active March 19, 2024 15:46
Benchmark FA2 + transformers integration
import torch
import os
import argparse
import matplotlib.pyplot as plt
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer
import seaborn as sns
def get_parser():
@younesbelkada
younesbelkada / finetune_llama_v2.py
Last active May 3, 2024 19:01
Fine tune Llama v2 models on Guanaco Dataset
# coding=utf-8
# Copyright 2023 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
@thomwolf
thomwolf / top-k-top-p.py
Last active January 2, 2024 07:43
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
""" Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
Args:
logits: logits distribution shape (vocabulary size)
top_k >0: keep only top k tokens with highest probability (top-k filtering).
top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
"""
assert logits.dim() == 1 # batch size 1 for now - could be updated for more but the code would be less clear
top_k = min(top_k, logits.size(-1)) # Safety check
@athiyadeviyani
athiyadeviyani / tkinterlist.py
Created July 19, 2018 08:27
Python GUI cheatsheet
# BASIC TKINTER CHEATSHEET
# Build basic GUIs with Python
from tkinter import *
from tkinter import scrolledtext
from tkinter import messagebox
from tkinter.ttk import Progressbar
from tkinter import filedialog
from tkinter import Menu
#coding: utf-8
#demo of beam search for seq2seq model
import numpy as np
import random
vocab = {
0: 'a',
1: 'b',
2: 'c',
3: 'd',
@ingo-m
ingo-m / debian_install_cuda.md
Last active December 24, 2022 17:29
How to install CUDA on Debian

How to install CUDA on Debian 8 (Jessie)

This document describes how to install nvidia drivers & CUDA in one go on a fresh debian install.

Work in progress

Preparations

  • Start with a fresh Debian install.
@udibr
udibr / beamsearch.py
Last active October 4, 2021 11:50
beam search for Keras RNN
# variation to https://github.com/ryankiros/skip-thoughts/blob/master/decoding/search.py
def keras_rnn_predict(samples, empty=empty, rnn_model=model, maxlen=maxlen):
"""for every sample, calculate probability for every possible label
you need to supply your RNN model and maxlen - the length of sequences it can handle
"""
data = sequence.pad_sequences(samples, maxlen=maxlen, value=empty)
return rnn_model.predict(data, verbose=0)
def beamsearch(predict=keras_rnn_predict,
@joyrexus
joyrexus / README.md
Last active April 8, 2024 09:15
Perl one-liners

Hi:

perl -e 'print "hello world!\n"'

A simple filter:

perl -ne 'print if /REGEX/'

Filter out blank lines (in place):

@raisch
raisch / regex_tokenizer.js
Created June 10, 2011 13:29
Regular Expression Sentence Tokenizer (English)
// tokenize(str)
// extracts semantically useful tokens from a string containing English-language sentences
// @param {String} the string to tokenize
// @returns {Array} contains extracted tokens
function tokenize(str) {
var punct='\\['+ '\\!'+ '\\"'+ '\\#'+ '\\$'+ // since javascript does not
'\\%'+ '\\&'+ '\\\''+ '\\('+ '\\)'+ // support POSIX character
'\\*'+ '\\+'+ '\\,'+ '\\\\'+ '\\-'+ // classes, we'll need our