Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
# *** Before ******************************************************************
# The program operated on a large document tree (DOM-like, but not exactly DOM).
# It had several functions that needed to walk the entire tree,
# which was done with recursion, like this:
def strip_grammar(doc):
...
def visit(parent):
for i, child in enumerate(parent.content):
if not isinstance(child, str):
if is_grammar_block(child, False):
strip_grammar_block(parent, i)
elif is_nonterminal(child):
strip_grammar_inline(parent, i)
visit(child)
visit(doc)
# *** After ******************************************************************
# The code is split into two parts: strip_grammar and a new generator-function
# called all_descendants.
def all_descendants(parent):
"""Find all elements that are descendants of 'parent'.
For each one, yield the triple (parent, index, child)
such that parent.content[index] === child."""
for index, child in enumerate(parent.content):
if not isinstance(child, str):
yield parent, index, child
for t in all_parent_index_child_triples(child):
yield t
def strip_grammar(doc):
...
for parent, i, child in all_descendants(doc):
if is_grammar_block(child, False):
strip_grammar_block(parent, i)
elif is_nonterminal(child):
strip_grammar_inline(parent, i)
# all_parent_index_child_triples is the generator. (In Python, there's no *
# marking generators; instead, any function containing a 'yield' is a
# generator.)
#
# I think this refactoring is good because:
# (a) what the code is trying to do to each element in the document is nontrivial
# (b) finding all the elements in the document is nontrivial
# (c) before refactoring, these things were kind of mashed together
# and it wasn't clear which was which.
#
# Also, in the real program, it turns out the same generator could be used in
# several places.
#
# Without generators, the best way accomplish a similar refactoring is to make
# all_descendants a higher-order function that takes a callback. There's really
# nothing wrong with that unless any caller needs to break or early-return out
# of the loop; or combine the sequence of elements with some other sequence,
# in the style of Python's zip(); or treat the elements as a data set for further
# querying and operations, in the style of jQuery element sets.
#
# Because generators are iterators, and iterators are very flexible, the
# caller gets to decide how to consume the data.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment