@joaopgrassi
joaopgrassi / wsl2-network.ps1
Created August 24, 2021 08:56
WSL2 expose ports to windows host
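# Ask WSL for the eth0 address line, then pull the IPv4 address out of it
$remoteport = bash.exe -c "ifconfig eth0 | grep 'inet '"
$found = $remoteport -match '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
if ( $found ) {
    # $matches[0] holds the first IPv4-looking substring (the WSL 2 address)
    $remoteport = $matches[0];
    echo $remoteport
} else {
    echo "The script exited: the IP address of WSL 2 cannot be found";
    exit;
}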
@tamuhey
tamuhey / tokenizations_post.md
Last active March 30, 2024 19:00
How to calculate the alignment between BERT and spaCy tokens effectively and robustly

site: https://tamuhey.github.io/tokenizations/

Natural Language Processing (NLP) has made great progress in recent years thanks to neural networks, which allow us to solve a variety of tasks with end-to-end architectures. However, many NLP systems still require language-specific pre- and post-processing, especially around tokenization. In this article, I describe an algorithm that simplifies one such step: calculating the correspondence between tokens produced by different tokenizers (e.g. BERT vs. spaCy). I also introduce Python and Rust libraries that implement this algorithm. Here are the library and the demo site links:

@thomwolf
thomwolf / top-k-top-p.py
Last active May 14, 2024 00:20
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k >0: keep only top k tokens with highest probability (top-k filtering).
            top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
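The preview above stops after the safety check. As a rough illustration of the two filters the docstring describes, here is a minimal pure-Python sketch. It is not the gist author's code: the name `top_k_top_p_filter`, the use of plain lists instead of PyTorch tensors, and the exact cut-off rule (keep the smallest set of top tokens whose cumulative probability reaches `top_p`) are assumptions made for this example.

```python
import math


def top_k_top_p_filter(logits, top_k=0, top_p=0.0, filter_value=-math.inf):
    """Hypothetical helper: return a copy of `logits` (a list of floats)
    with every filtered-out entry replaced by `filter_value`."""
    filtered = list(logits)
    n = len(filtered)

    if top_k > 0:
        k = min(top_k, n)  # same safety check as in the preview
        # Indices of the k largest logits
        order = sorted(range(n), key=lambda i: filtered[i], reverse=True)
        keep = set(order[:k])
        filtered = [x if i in keep else filter_value for i, x in enumerate(filtered)]

    if top_p > 0.0:
        # Softmax over the (possibly already top-k filtered) logits
        peak = max(filtered)
        exps = [math.exp(x - peak) for x in filtered]
        total = sum(exps)
        probs = [e / total for e in exps]

        # Walk tokens from most to least likely until the mass reaches top_p
        order = sorted(range(n), key=lambda i: probs[i], reverse=True)
        keep, cumulative = set(), 0.0
        for i in order:
            keep.add(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        filtered = [x if i in keep else filter_value for i, x in enumerate(filtered)]

    return filtered


if __name__ == "__main__":
    example = [1.0, 2.0, 3.0, 4.0]
    print(top_k_top_p_filter(example, top_k=2))  # [-inf, -inf, 3.0, 4.0]
    print(top_k_top_p_filter(example, top_p=0.7))
```

With both arguments left at their defaults the logits come back unchanged, which matches the `top_k >0` / `top_p >0.0` conditions in the docstring. Sampling would then apply a softmax to the filtered logits and draw from the result.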
@L0SG
L0SG / freeze_example.py
Last active October 12, 2023 05:02
PyTorch example: freezing a part of the net (including fine-tuning)
import torch
from torch import nn
from torch.autograd import Variable
import torch.nn.functional as F
import torch.optim as optim
# toy feed-forward net
class Net(nn.Module):
    def __init__(self):
@andreh7
andreh7 / pytorch-variable-number-of-inputs.ipynb
Last active March 10, 2021 15:15
Learning a function with a variable number of inputs with PyTorch
@andreasvc
andreasvc / metainfo.py
Last active May 23, 2020 16:39
Extract metadata from Project Gutenberg RDF catalog into a Python dict.
"""Extract metadata from Project Gutenberg RDF catalog into a Python dict.

Based on https://bitbucket.org/c-w/gutenberg/

>>> md = readmetadata()
>>> md[123]
{'LCC': {'PS'},
 'author': u'Burroughs, Edgar Rice',
 'authoryearofbirth': 1875,
 'authoryearofdeath': 1950,