Last active Aug 5, 2020
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
View top-k-top-p.py
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k >0: keep only top k tokens with highest probability (top-k filtering).
            top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
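The preview above stops after the safety check, so the filtering itself is not shown. As a rough illustration of what the docstring describes, here is a minimal sketch in plain Python. It is not the gist's code: the gist appears to operate on a tensor (`logits.dim()`, `logits.size(-1)`), while this version works on a list of floats, and the names `filter_logits` and `sample_token` are hypothetical.

```python
import math
import random


def filter_logits(logits, top_k=0, top_p=0.0, filter_value=-math.inf):
    """Return a copy of `logits` with filtered-out entries set to `filter_value`."""
    filtered = list(logits)
    top_k = min(top_k, len(filtered))  # same safety check as in the preview
    # Token indices ordered from highest to lowest logit.
    order = sorted(range(len(filtered)), key=lambda i: filtered[i], reverse=True)

    if top_k > 0:
        # Top-k: drop everything after the k highest-scoring tokens.
        for i in order[top_k:]:
            filtered[i] = filter_value

    if top_p > 0.0:
        # Softmax over the (possibly already top-k filtered) logits.
        peak = max(filtered)
        exps = [math.exp(x - peak) for x in filtered]
        total = sum(exps)
        cumulative = 0.0
        for i in order:
            if cumulative >= top_p:
                # The tokens kept so far already cover top_p: drop the rest.
                filtered[i] = filter_value
            cumulative += exps[i] / total
    return filtered


def sample_token(logits, rng=random, **filter_kwargs):
    """Sample one token index from the filtered distribution (hypothetical helper)."""
    filtered = filter_logits(logits, **filter_kwargs)
    peak = max(filtered)
    weights = [math.exp(x - peak) for x in filtered]  # filtered entries get weight 0
    return rng.choices(range(len(filtered)), weights=weights, k=1)[0]
```

With this reading of nucleus filtering, the most likely token is always kept, and further tokens are added until their cumulative probability reaches `top_p`. Calling `sample_token(logits, top_k=1)` reduces to greedy decoding.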
Last active Jul 29, 2020
Understanding & Visualizing Self-Normalizing Neural Networks
View notebook.ipynb
Last active Aug 3, 2017
Jupyter shortbold markdown cell: paste it into a markdown cell to use the shortbold MathJax macros for the rest of the notebook.
View shortbold.md

\$
\newcommand{\aB}{\mathbf{a}}
\newcommand{\bB}{\mathbf{b}}
\newcommand{\cB}{\mathbf{c}}
\newcommand{\dB}{\mathbf{d}}
\newcommand{\eB}{\mathbf{e}}
\newcommand{\fB}{\mathbf{f}}
\newcommand{\gB}{\mathbf{g}}
\newcommand{\hB}{\mathbf{h}}
\newcommand{\iB}{\mathbf{i}}
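The visible macros follow one pattern: `\xB` expands to `\mathbf{x}`. A small sketch that generates such definitions is shown below. The function name `shortbold_macros` is hypothetical, and covering the whole lowercase alphabet is an assumption, since the preview only shows `a` through `i`.

```python
import string


def shortbold_macros(letters=string.ascii_lowercase):
    """Build one \\newcommand line per letter, following the \\xB -> \\mathbf{x} pattern."""
    return ["\\newcommand{\\" + c + "B}{\\mathbf{" + c + "}}" for c in letters]


if __name__ == "__main__":
    # Print the definitions so they can be pasted between $ ... $ in a markdown cell.
    print("\n".join(shortbold_macros()))
```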

Created Jan 12, 2017
View Wikipedia_fastText_gender_names.csv
Name t
Jovan 0.143522377788
Wilford 0.171813290491
Newton 0.192343843426
Maurice 0.193607112432
Emmanuel 0.20571087052
Joseph 0.210762071958
Milton 0.21296788724
Ahmad 0.214983745995
Julius 0.218052193228
Created Nov 11, 2016
Hyperband for hyperparameter optimization
View hyperband.py
# https://people.eecs.berkeley.edu/~kjamieson/hyperband.html

from math import log

# you need to write the following hooks for your custom problem
from problem import get_random_hyperparameter_configuration, run_then_return_val_loss

max_iter = 81  # maximum iterations/epochs per configuration
eta = 3  # defines downsampling rate (default=3)
logeta = lambda x: log(x) / log(eta)
s_max = int(logeta(max_iter))  # number of unique executions of Successive Halving (minus one)
B = (s_max + 1) * max_iter  # total number of iterations (without reuse) per execution of Successive Halving (n,r)
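The preview ends before the bracket loop. The sketch below shows one way the `(n, r)` schedule and the Successive Halving rounds could be laid out. The bracket sizing is an assumption modeled on the usual Hyperband description, not the gist's own code. The names `hyperband_schedule` and `run_hyperband` are hypothetical, and the two hooks are passed in as plain callables instead of being imported from `problem`.

```python
import random


def hyperband_schedule(max_iter=81, eta=3):
    """List the (n_i, r_i) rounds of each Successive Halving bracket."""
    # Largest s with eta**s <= max_iter, found with integers to avoid
    # floating-point rounding in log(max_iter) / log(eta).
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Assumed sizing: B / max_iter equals s_max + 1, split across s + 1 rounds.
        n = ((s_max + 1) // (s + 1)) * eta ** s
        # Round i runs n // eta**i configurations for max_iter / eta**(s - i) iterations.
        rounds = [(n // eta ** i, max_iter / eta ** (s - i)) for i in range(s + 1)]
        brackets.append(rounds)
    return brackets


def run_hyperband(sample_config, val_loss, max_iter=81, eta=3):
    """Run every bracket and return the best (loss, config) pair seen."""
    best = (float("inf"), None)
    for rounds in hyperband_schedule(max_iter, eta):
        configs = [sample_config() for _ in range(rounds[0][0])]
        for n_i, r_i in rounds:
            losses = [val_loss(r_i, c) for c in configs]
            ranked = sorted(zip(losses, configs), key=lambda pair: pair[0])
            if ranked[0][0] < best[0]:
                best = ranked[0]
            # Keep the best 1/eta fraction for the next, longer round.
            configs = [c for _, c in ranked[: max(1, n_i // eta)]]
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy problem: a config is a float, and the loss shrinks with more iterations.
    toy_loss = lambda num_iters, x: (x - 0.3) ** 2 + 1.0 / num_iters
    print(hyperband_schedule()[0])
    print(run_hyperband(rng.random, toy_loss))
```

With `max_iter = 81` and `eta = 3`, the first bracket under these assumptions is `[(81, 1.0), (27, 3.0), (9, 9.0), (3, 27.0), (1, 81.0)]`: many configurations are tried briefly, and only the best third survives each round.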
Last active Jan 28, 2020
View animate-matplotlib-python.ipynb 