Skip to content

Instantly share code, notes, and snippets.

What would you like to do?
Example CTC Decoder in Python
Author: Awni Hannun
This is an example CTC decoder written in Python. The code is
intended to be a simple example and is not designed to be
especially efficient.
The algorithm is a prefix beam search for a model trained
with the CTC loss function.
For more details checkout either of these references:
import numpy as np
import math
import collections
NEG_INF = -float("inf")
def make_new_beam():
fn = lambda : (NEG_INF, NEG_INF)
return collections.defaultdict(fn)
def logsumexp(*args):
Stable log sum exp.
if all(a == NEG_INF for a in args):
return NEG_INF
a_max = max(args)
lsp = math.log(sum(math.exp(a - a_max)
for a in args))
return a_max + lsp
def decode(probs, beam_size=100, blank=0):
Performs inference for the given output probabilities.
probs: The output probabilities (e.g. post-softmax) for each
time step. Should be an array of shape (time x output dim).
beam_size (int): Size of the beam to use during inference.
blank (int): Index of the CTC blank label.
Returns the output label sequence and the corresponding negative
log-likelihood estimated by the decoder.
T, S = probs.shape
probs = np.log(probs)
# Elements in the beam are (prefix, (p_blank, p_no_blank))
# Initialize the beam with the empty sequence, a probability of
# 1 for ending in blank and zero for ending in non-blank
# (in log space).
beam = [(tuple(), (0.0, NEG_INF))]
for t in range(T): # Loop over time
# A default dictionary to store the next step candidates.
next_beam = make_new_beam()
for s in range(S): # Loop over vocab
p = probs[t, s]
# The variables p_b and p_nb are respectively the
# probabilities for the prefix given that it ends in a
# blank and does not end in a blank at this time step.
for prefix, (p_b, p_nb) in beam: # Loop over beam
# If we propose a blank the prefix doesn't change.
# Only the probability of ending in blank gets updated.
if s == blank:
n_p_b, n_p_nb = next_beam[prefix]
n_p_b = logsumexp(n_p_b, p_b + p, p_nb + p)
next_beam[prefix] = (n_p_b, n_p_nb)
# Extend the prefix by the new character s and add it to
# the beam. Only the probability of not ending in blank
# gets updated.
end_t = prefix[-1] if prefix else None
n_prefix = prefix + (s,)
n_p_b, n_p_nb = next_beam[n_prefix]
if s != end_t:
n_p_nb = logsumexp(n_p_nb, p_b + p, p_nb + p)
# We don't include the previous probability of not ending
# in blank (p_nb) if s is repeated at the end. The CTC
# algorithm merges characters not separated by a blank.
n_p_nb = logsumexp(n_p_nb, p_b + p)
# *NB* this would be a good place to include an LM score.
next_beam[n_prefix] = (n_p_b, n_p_nb)
# If s is repeated at the end we also update the unchanged
# prefix. This is the merging case.
if s == end_t:
n_p_b, n_p_nb = next_beam[prefix]
n_p_nb = logsumexp(n_p_nb, p_nb + p)
next_beam[prefix] = (n_p_b, n_p_nb)
# Sort and trim the beam before moving on to the
# next time-step.
beam = sorted(next_beam.items(),
key=lambda x : logsumexp(*x[1]),
beam = beam[:beam_size]
best = beam[0]
return best[0], -logsumexp(*best[1])
if __name__ == "__main__":
time = 50
output_dim = 20
probs = np.random.rand(time, output_dim)
probs = probs / np.sum(probs, axis=1, keepdims=True)
labels, score = decode(probs)
print("Score {:.3f}".format(score))
Copy link

marcoleewow commented Jan 8, 2018

Would you add character level or word level bi-gram LM in line 96?

For adding word level, I can imagine that I need to check if current (s,) is a space label and then times the prob P(current word | last word), but wouldn't this approach makes the prob of the beam go smaller than the beams that does not add bi-gram?

Copy link

awni commented Jan 16, 2018

You could add either. Like you said for a word level LM you'd only add in the score when the proposed character is a space.

And it's a great observation that every time you add in an LM score the probability goes down so the search will inherently favor prefixes with fewer words hence you need to include a word bonus term. I discuss this in more detail in the distill article

Copy link

sigpro commented Jan 30, 2018

Thanks for your work,I read the paper(1408.2873),I think you made a mistake on algorithm 1 at page 5.

if c = ℓ end then
p nb (ℓ + ;x 1:t ) ← p(c;xt )pb (ℓ;x 1:t−1 )
p nb (ℓ ;x 1:t ) ← p(c;xt )pb (ℓ;x 1:t−1 )

is the

p nb (ℓ ;x 1:t ) ← p(c;xt )pb (ℓ;x 1:t−1 )

should be

p nb (ℓ ;x 1:t ) ← p(c;xt )pnb (ℓ;x 1:t−1 )

or maybe I misunderstand your algorithm. Thanks

Copy link

mrfox321 commented Mar 2, 2018

I had a similar issue with the algo. I agree with your suggested correction, @sigpro

Copy link

geniki commented Mar 3, 2018

I also agree @sigpro. You can also see it reflected in line 102 of this implementation of the algorithm (p_nb rather than p_b):

n_p_nb = logsumexp(n_p_nb, p_nb + p)

Copy link

edchengg commented Aug 26, 2018

agree @sigpro

Copy link

YuanEric88 commented Aug 6, 2020

Thank you for your sharing. But I didn't find any place where the score are combined after I check out the code carefully. Anyone can point that out? @sigpro, @edchengg, @mrfox321

Copy link

viswanathgs commented Oct 31, 2020

@YuanEric88, that is handled at the implementation level - beam and next_beam are dictionaries with keys as prefix strings.

Copy link

hurrynarainProg commented Dec 15, 2020

Thanks for your work. I recently read your paper (1408.2873) and had two Qs in it.

  1. Same as what @sigpro mentioned above

Wondering if the counter part to the below section from paper is considered in the logic?
if ℓ + not in Aprev then
end if

by counterpart I mean :
if ℓ - not in Aprev then
pnb(ℓ ; x1:t) = p( ℓ _lastElement ; xt) * (pb( ℓ - ; x1:t−1) + pnb( ℓ - ; x1:t−1))
end if
l_lastElement refer to the last alphabet of l
ℓ - refer to one alphabet less in " ℓ "
for ex:
ℓ : "BAG"
ℓ - : "BA"
ℓ _lastElement : "G"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment