Last active Aug 5, 2020
Sample the next token from a probability distribution using top-k and/or nucleus (top-p) sampling
```python
def top_k_top_p_filtering(logits, top_k=0, top_p=0.0, filter_value=-float('Inf')):
    """ Filter a distribution of logits using top-k and/or nucleus (top-p) filtering
        Args:
            logits: logits distribution shape (vocabulary size)
            top_k >0: keep only top k tokens with highest probability (top-k filtering).
            top_p >0.0: keep the top tokens with cumulative probability >= top_p (nucleus filtering).
                Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751)
    """
    assert logits.dim() == 1  # batch size 1 for now - could be updated for more but the code would be less clear
    top_k = min(top_k, logits.size(-1))  # Safety check
```
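The preview stops after the safety check. As a rough, dependency-free sketch of what the two filters do (this is not the gist's code, which works on a 1-D PyTorch tensor), the functions below operate on a plain Python list of logits. `top_k_top_p_filter` and `sample_next_token` are hypothetical names; the nucleus rule keeps the smallest highest-probability prefix whose cumulative probability reaches `top_p`, following the docstring above.

```python
import math
import random


def top_k_top_p_filter(logits, top_k=0, top_p=0.0, filter_value=-math.inf):
    """Return a copy of `logits` with tokens outside the top-k / nucleus set masked."""
    logits = list(logits)
    # Token indices sorted by logit, highest first.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top_k = min(top_k, len(logits))  # safety check, as in the gist
    if top_k > 0:
        # Top-k: mask every token ranked below the k-th.
        for i in order[top_k:]:
            logits[i] = filter_value
    if top_p > 0.0:
        # Softmax over the (possibly top-k filtered) logits, in descending order.
        m = logits[order[0]]
        exps = [math.exp(logits[i] - m) for i in order]
        total = sum(exps)
        cumulative = 0.0
        for rank, i in enumerate(order):
            # Once the higher-ranked tokens already reach top_p, mask the rest.
            if cumulative >= top_p:
                logits[i] = filter_value
            cumulative += exps[rank] / total
    return logits


def sample_next_token(logits, top_k=0, top_p=0.0, rng=random):
    """Sample a token index from softmax(filtered logits)."""
    filtered = top_k_top_p_filter(logits, top_k=top_k, top_p=top_p)
    m = max(filtered)
    weights = [math.exp(x - m) for x in filtered]  # masked tokens get weight 0
    return rng.choices(range(len(filtered)), weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
print(top_k_top_p_filter(logits, top_k=2))  # [2.0, 1.0, -inf, -inf]
print(sample_next_token(logits, top_p=0.7, rng=random.Random(0)))
```

Masking with `-inf` rather than deleting entries keeps positions aligned with the vocabulary, so the sampled index can be used directly as a token id.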
Last active Jul 29, 2020
Understanding & Visualizing Self-Normalizing Neural Networks
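Only the title of this gist is visible here. As a small, hedged illustration of the "self-normalizing" property, the sketch below uses the SELU activation with the constants published by Klambauer et al. (2017) and numerically checks that a standard-normal pre-activation yields an output with mean about 0 and variance about 1. That (0, 1) fixed point is what SELU networks are designed around, given zero-mean weights with variance 1/fan-in. The function names are illustrative, not taken from the gist.

```python
import math

# SELU constants from Klambauer et al. (2017), "Self-Normalizing Neural Networks".
ALPHA = 1.6732632423543772
SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled exponential linear unit."""
    return SCALE * (x if x > 0 else ALPHA * (math.exp(x) - 1.0))


def selu_moments(lo: float = -10.0, hi: float = 10.0, n: int = 20000):
    """Mean and variance of selu(z) for z ~ N(0, 1), via the midpoint rule."""
    h = (hi - lo) / n
    m1 = m2 = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * h
        w = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi) * h  # N(0, 1) density * dz
        y = selu(z)
        m1 += w * y
        m2 += w * y * y
    return m1, m2 - m1 * m1


mean, var = selu_moments()
print(f"mean={mean:.6f} var={var:.6f}")
```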
Last active Aug 3, 2017
Jupyter shortbold markdown cell, just paste into a markdown cell to enjoy shortbold MathJax for the rest of the notebook!
```latex
$
\newcommand{\aB}{\mathbf{a}}
\newcommand{\bB}{\mathbf{b}}
\newcommand{\cB}{\mathbf{c}}
\newcommand{\dB}{\mathbf{d}}
\newcommand{\eB}{\mathbf{e}}
\newcommand{\fB}{\mathbf{f}}
\newcommand{\gB}{\mathbf{g}}
\newcommand{\hB}{\mathbf{h}}
\newcommand{\iB}{\mathbf{i}}
```
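The preview shows the macro pattern for a through i. If you want the same one-macro-per-letter block for other letters, a small Python helper can generate it. `shortbold_macros` is a hypothetical name, and whether the full gist covers the whole alphabet isn't visible in this listing.

```python
import string


def shortbold_macros(letters: str = string.ascii_lowercase) -> str:
    """Build a MathJax block defining \\<x>B as \\mathbf{<x>} for each letter."""
    defs = " ".join("\\newcommand{\\" + c + "B}{\\mathbf{" + c + "}}" for c in letters)
    # Wrap in $...$ so a Jupyter markdown cell hands it to MathJax.
    return "$ " + defs + " $"


print(shortbold_macros("abc"))
```

Paste the printed string into a markdown cell; once it has rendered, `$\xB$` works in later cells of the notebook.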

Created Jan 12, 2017
| Name | t |
|---|---|
| Jovan | 0.143522377788 |
| Wilford | 0.171813290491 |
| Newton | 0.192343843426 |
| Maurice | 0.193607112432 |
| Emmanuel | 0.20571087052 |
| Joseph | 0.210762071958 |
| Milton | 0.21296788724 |
| Ahmad | 0.214983745995 |
| Julius | 0.218052193228 |
Created Nov 11, 2016
Hyperband for hyperparameter optimization
```python
# https://people.eecs.berkeley.edu/~kjamieson/hyperband.html
# you need to write the following hooks for your custom problem
from problem import get_random_hyperparameter_configuration, run_then_return_val_loss
from math import log

max_iter = 81  # maximum iterations/epochs per configuration
eta = 3  # defines downsampling rate (default=3)
logeta = lambda x: log(x) / log(eta)
s_max = int(logeta(max_iter))  # number of unique executions of Successive Halving (minus one)
B = (s_max + 1) * max_iter  # total number of iterations (without reuse) per execution of Successive Halving (n,r)
```
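The preview ends before the main loop. Below is a minimal, self-contained sketch of the full Hyperband loop under stated assumptions: the two hooks are replaced by hypothetical toy stand-ins (a random learning rate and a synthetic loss), and bracket sizes use n = ceil((s_max + 1) * eta^s / (s + 1)) from Li et al.'s Hyperband paper. The gist's own loop is not shown here and may round differently.

```python
import math
import random

# Hypothetical stand-ins for the two hooks the snippet imports from `problem`;
# replace them with real training code for your own model.
_rng = random.Random(0)


def get_random_hyperparameter_configuration():
    return {"lr": 10 ** _rng.uniform(-4, -1)}


def run_then_return_val_loss(num_iters, hyperparameters):
    # Toy loss: minimised at lr = 1e-2, and it shrinks as the budget grows.
    return abs(math.log10(hyperparameters["lr"]) + 2) + 1.0 / num_iters


def hyperband(get_config, run_loss, max_iter=81, eta=3):
    # s_max = floor(log_eta(max_iter)), computed with integers to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1
    best_loss, best_config = float("inf"), None
    schedule = []  # (bracket s, round i, configs evaluated, iterations each)
    for s in reversed(range(s_max + 1)):
        # ceil((s_max + 1) * eta**s / (s + 1)) configs, each starting at max_iter / eta**s iterations.
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        r = max_iter / eta ** s
        configs = [get_config() for _ in range(n)]
        for i in range(s + 1):  # one run of Successive Halving on (n, r)
            r_i = r * eta ** i
            losses = [run_loss(r_i, c) for c in configs]
            schedule.append((s, i, len(configs), r_i))
            for loss, c in zip(losses, configs):
                if loss < best_loss:
                    best_loss, best_config = loss, c
            # Keep the best floor(n / eta**(i + 1)) configurations for the next round.
            order = sorted(range(len(configs)), key=lambda j: losses[j])
            configs = [configs[j] for j in order[: n // eta ** (i + 1)]]
    return best_loss, best_config, schedule


best_loss, best_config, schedule = hyperband(
    get_random_hyperparameter_configuration, run_then_return_val_loss)
for s, i, n_i, r_i in schedule:
    print(f"bracket s={s} round {i}: {n_i} configs x {r_i:g} iterations")
```

With max_iter=81 and eta=3 the printed schedule starts with 81 configurations at 1 iteration each and ends with a bracket of 5 configurations at 81 iterations each; each bracket trades many cheap evaluations against a few expensive ones.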
Last active Jan 28, 2020
