
# awni/ctc_decoder.py

Last active Jul 12, 2021
Example CTC Decoder in Python
 """ Author: Awni Hannun This is an example CTC decoder written in Python. The code is intended to be a simple example and is not designed to be especially efficient. The algorithm is a prefix beam search for a model trained with the CTC loss function. For more details checkout either of these references: https://distill.pub/2017/ctc/#inference https://arxiv.org/abs/1408.2873 """ import numpy as np import math import collections NEG_INF = -float("inf") def make_new_beam(): fn = lambda : (NEG_INF, NEG_INF) return collections.defaultdict(fn) def logsumexp(*args): """ Stable log sum exp. """ if all(a == NEG_INF for a in args): return NEG_INF a_max = max(args) lsp = math.log(sum(math.exp(a - a_max) for a in args)) return a_max + lsp def decode(probs, beam_size=100, blank=0): """ Performs inference for the given output probabilities. Arguments: probs: The output probabilities (e.g. post-softmax) for each time step. Should be an array of shape (time x output dim). beam_size (int): Size of the beam to use during inference. blank (int): Index of the CTC blank label. Returns the output label sequence and the corresponding negative log-likelihood estimated by the decoder. """ T, S = probs.shape probs = np.log(probs) # Elements in the beam are (prefix, (p_blank, p_no_blank)) # Initialize the beam with the empty sequence, a probability of # 1 for ending in blank and zero for ending in non-blank # (in log space). beam = [(tuple(), (0.0, NEG_INF))] for t in range(T): # Loop over time # A default dictionary to store the next step candidates. next_beam = make_new_beam() for s in range(S): # Loop over vocab p = probs[t, s] # The variables p_b and p_nb are respectively the # probabilities for the prefix given that it ends in a # blank and does not end in a blank at this time step. for prefix, (p_b, p_nb) in beam: # Loop over beam # If we propose a blank the prefix doesn't change. # Only the probability of ending in blank gets updated. if s == blank: n_p_b, n_p_nb = next_beam[prefix] n_p_b = logsumexp(n_p_b, p_b + p, p_nb + p) next_beam[prefix] = (n_p_b, n_p_nb) continue # Extend the prefix by the new character s and add it to # the beam. Only the probability of not ending in blank # gets updated. end_t = prefix[-1] if prefix else None n_prefix = prefix + (s,) n_p_b, n_p_nb = next_beam[n_prefix] if s != end_t: n_p_nb = logsumexp(n_p_nb, p_b + p, p_nb + p) else: # We don't include the previous probability of not ending # in blank (p_nb) if s is repeated at the end. The CTC # algorithm merges characters not separated by a blank. n_p_nb = logsumexp(n_p_nb, p_b + p) # *NB* this would be a good place to include an LM score. next_beam[n_prefix] = (n_p_b, n_p_nb) # If s is repeated at the end we also update the unchanged # prefix. This is the merging case. if s == end_t: n_p_b, n_p_nb = next_beam[prefix] n_p_nb = logsumexp(n_p_nb, p_nb + p) next_beam[prefix] = (n_p_b, n_p_nb) # Sort and trim the beam before moving on to the # next time-step. beam = sorted(next_beam.items(), key=lambda x : logsumexp(*x), reverse=True) beam = beam[:beam_size] best = beam return best, -logsumexp(*best) if __name__ == "__main__": np.random.seed(3) time = 50 output_dim = 20 probs = np.random.rand(time, output_dim) probs = probs / np.sum(probs, axis=1, keepdims=True) labels, score = decode(probs) print("Score {:.3f}".format(score))

### marcoleewow commented Jan 8, 2018

Would you add a character-level or word-level bi-gram LM at line 96 (the `*NB*` comment)? For the word-level case, I imagine I need to check whether the current `(s,)` is a space label and then multiply by the probability P(current word | last word). But wouldn't this approach make the probability of the beam smaller than that of beams that don't add the bi-gram score?

### awni commented Jan 16, 2018

You could add either. Like you said, for a word-level LM you'd only add in the score when the proposed character is a space. And it's a great observation: every time you add in an LM score the probability goes down, so the search will inherently favor prefixes with fewer words; hence you need to include a word bonus term. I discuss this in more detail in the distill article: https://distill.pub/2017/ctc/#inference.
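
A minimal sketch of what that hook might look like, to be called at the `*NB*` comment in `decode` just before `next_beam[n_prefix] = (n_p_b, n_p_nb)`. The `lm_log_prob` bigram scorer, the `labels_to_words` helper, the `alphabet` mapping, and the `alpha`/`beta` weights are all hypothetical stand-ins, not part of the gist:

```python
import math


def lm_log_prob(prev_word, word):
    # Placeholder bigram LM; swap in a real model that
    # returns log P(word | prev_word).
    return math.log(0.1)


def labels_to_words(prefix, alphabet):
    # Map label indices to characters, then split the string into words.
    return "".join(alphabet[i] for i in prefix).split()


def add_word_lm_score(n_p_nb, prefix, s, alphabet, alpha=0.8, beta=1.0):
    """Add alpha * LM log-prob plus a word bonus beta when `s` completes a word.

    Everything is in log space, so scores are added rather than multiplied.
    beta is the word insertion bonus that offsets the LM always lowering
    the probability, as discussed above.
    """
    if alphabet[s] != " ":  # only score when the proposed label is a space
        return n_p_nb
    words = labels_to_words(prefix, alphabet)
    if not words:           # leading space: no completed word to score yet
        return n_p_nb
    prev_word = words[-2] if len(words) > 1 else "<s>"
    return n_p_nb + alpha * lm_log_prob(prev_word, words[-1]) + beta
```

Inside `decode` this would be called as `n_p_nb = add_word_lm_score(n_p_nb, prefix, s, alphabet)` right at the `*NB*` comment, so the LM-adjusted score flows into `next_beam[n_prefix]`.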

### sigpro commented Jan 30, 2018

Thanks for your work. I read the paper (1408.2873), and I think you made a mistake in Algorithm 1 on page 5. In the `if c = ℓ_end` branch, the paper gives the two updates `p_nb(ℓ+; x_1:t) ← p(c; x_t) p_b(ℓ; x_1:t−1)` and `p_nb(ℓ; x_1:t) ← p(c; x_t) p_b(ℓ; x_1:t−1)`. The second one, `p_nb(ℓ; x_1:t) ← p(c; x_t) p_b(ℓ; x_1:t−1)`, should be `p_nb(ℓ; x_1:t) ← p(c; x_t) p_nb(ℓ; x_1:t−1)`. Or maybe I misunderstand your algorithm. Thanks!

### mrfox321 commented Mar 2, 2018

 I had a similar issue with the algo. I agree with your suggested correction, @sigpro

### geniki commented Mar 3, 2018

I also agree, @sigpro. You can also see it reflected at line 102 of this implementation of the algorithm (`p_nb` rather than `p_b`): `n_p_nb = logsumexp(n_p_nb, p_nb + p)`
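
For reference, the full merging case in the gist reads:

```python
# If s is repeated at the end we also update the unchanged
# prefix. This is the merging case.
if s == end_t:
    n_p_b, n_p_nb = next_beam[prefix]
    n_p_nb = logsumexp(n_p_nb, p_nb + p)  # p_nb here, matching the correction
    next_beam[prefix] = (n_p_b, n_p_nb)
```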

### edchengg commented Aug 26, 2018

 agree @sigpro

### YuanEric88 commented Aug 6, 2020

Thank you for sharing. But after checking the code carefully, I couldn't find the place where the scores are combined. Can anyone point that out? @sigpro, @edchengg, @mrfox321

### viswanathgs commented Oct 31, 2020

@YuanEric88, that is handled at the implementation level: `beam` and `next_beam` are dictionaries keyed by prefix tuples, so when two hypotheses produce the same prefix, reading the existing entry and writing it back through `logsumexp` combines their scores.
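
To make that concrete, here is a small standalone illustration (not from the gist itself) of how the `defaultdict` merges two hypotheses that collapse to the same prefix key:

```python
import math
import collections

NEG_INF = -float("inf")

def logsumexp(*args):
    if all(a == NEG_INF for a in args):
        return NEG_INF
    a_max = max(args)
    return a_max + math.log(sum(math.exp(a - a_max) for a in args))

next_beam = collections.defaultdict(lambda: (NEG_INF, NEG_INF))

# Two different parent hypotheses both propose the prefix (1, 2),
# e.g. extending (1,) with label 2, and the repeat-merge of an existing (1, 2).
for contribution in (math.log(0.3), math.log(0.2)):
    n_p_b, n_p_nb = next_beam[(1, 2)]  # read whatever score is there already
    next_beam[(1, 2)] = (n_p_b, logsumexp(n_p_nb, contribution))

print(math.exp(next_beam[(1, 2)][1]))  # 0.5: the two paths were summed
```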

### hurrynarainProg commented Dec 15, 2020

Thanks for your work. I recently read your paper (1408.2873) and had two questions about it. The first is the same as what @sigpro mentioned above. The second: is the counterpart to the section below from the paper handled in the logic?

`if ℓ+ not in A_prev then ... end if`

By counterpart I mean:

`if ℓ− not in A_prev then p_nb(ℓ; x_1:t) = p(ℓ_lastElement; x_t) (p_b(ℓ−; x_1:t−1) + p_nb(ℓ−; x_1:t−1)) end if`

where `ℓ_lastElement` refers to the last character of `ℓ`, and `ℓ−` refers to `ℓ` with one character removed. For example: `ℓ` = "BAG", `ℓ−` = "BA", `ℓ_lastElement` = "G".