@pepasflo
Last active July 21, 2018 00:05
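Python script that parses HTML, looks for a specific iframe, and replaces it with a div. Oops, this only works properly on XHTML: an HTML void element such as <br> is lexed as an opening tag, so the next closing tag is taken as its close tag and everything after it ends up nested one level too deep.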
```python
#!/usr/bin/env python
# Parses HTML from stdin, finds the Flo video-embed iframe, and replaces it
# with a div. Runs under Python 2.7 and Python 3.
from __future__ import print_function

import pprint
import re
import sys

# read in the HTML via stdin:
html = sys.stdin.read()

# define the set of tokens which our lexer knows about.
# the first pattern that matches wins, so "singletag" is tried before
# "opentag" (whose pattern would also match "<br/>"). [^>]* keeps a tag
# match from running past the tag's own closing ">".
symbol_table = (
    ("singletag", re.compile(r'<[a-zA-Z][^>]*/>')),
    ("opentag", re.compile(r'<[a-zA-Z][^>]*>')),
    ("closetag", re.compile(r'</[a-zA-Z][^>]*>')),
    ("other", re.compile(r'[^<>]+')),
)

# a representation of a token:
class Token(object):
    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __repr__(self):
        return "%s(%s)" % (self.name, self.text)

# consumes enough chars from the input to create the next token.
# returns (token, number of chars consumed), or (None, None) if no match.
def consume(symbol_pair, text):
    (token_name, pattern) = symbol_pair
    m = pattern.match(text)
    if m is None:
        return (None, None)
    return (Token(token_name, m.group()), m.end())

# the lexer: turns a string into a list of recognized tokens:
def tokenize(symbol_table, text):
    tokens = []
    while len(text) > 0:
        for symbol_pair in symbol_table:
            (token, consumed_count) = consume(symbol_pair, text)
            if token is not None:
                tokens.append(token)
                text = text[consumed_count:]
                break
        else:
            # no pattern matched, e.g. a stray "<" or ">" in the text
            raise Exception("bad input: '%s'" % text)
    return tokens

# lex our input into tokens:
tokens = tokenize(symbol_table, html)

# print out the tokens as a sanity check:
print("tokens:")
pprint.pprint(tokens)

# a representation of a node in a parse tree:
class Node(object):
    def __init__(self, token):
        self.token = token
        self.closetoken = None
        self.subnodes = []

    def __repr__(self):
        return "%s(%s)" % (self.token, self.subnodes)

# the parser: turns a linear stream of tokens into a parse tree.
# it assumes every opentag has a matching closetag, which holds for XHTML
# but not for HTML void elements such as <br>.
def parse(tokens, node):
    while len(tokens) > 0:
        token = tokens.pop(0)
        if token.name == "opentag":
            subnode = Node(token)
            parse(tokens, subnode)
            node.subnodes.append(subnode)
        elif token.name == "closetag":
            node.closetoken = token
            break
        elif token.name in ["singletag", "other"]:
            node.subnodes.append(Node(token))

# parse the tokens into a tree. start by creating a root node (None).
parse_tree = Node(None)
parse(tokens, parse_tree)

# prints out the parse tree, with indentation to indicate tree structure.
def print_parsetree(node, indent=0):
    print("%s%s" % (" " * indent, node.token if node.token else "(root)"))
    for subnode in node.subnodes:
        print_parsetree(subnode, indent + 1)

# print out the parse tree as a sanity check:
print()
print("parse tree:")
print_parsetree(parse_tree)

# finds any video embed iframes and replaces them with "ios-video" divs:
def replace_iframe(node):
    for i, subnode in enumerate(node.subnodes):
        text = subnode.token.text
        if (subnode.token.name == "opentag"
                and text.startswith("<iframe")
                and 'src="/embed/' in text
                and "flo-video-embed" in text):
            replacement = Node(Token("opentag", '<div class="ios-video">'))
            replacement.closetoken = Token("closetag", "</div>")
            node.subnodes[i] = replacement
        else:
            # recurse into this child itself so that every depth is searched
            replace_iframe(subnode)

# search-and-replace the video embed iframes:
replace_iframe(parse_tree)

# print out the modified parse tree as a sanity check:
print()
print("modified parse tree:")
print_parsetree(parse_tree)

# serializes a parse tree back into HTML, writing each token's text verbatim
# so that no whitespace is added between tokens:
def dump_html(node):
    if node.token:
        sys.stdout.write(node.token.text)
    for subnode in node.subnodes:
        dump_html(subnode)
    if node.closetoken:
        sys.stdout.write(node.closetoken.text)

# print out the HTML of our modified parse tree:
print()
print("modified HTML:")
dump_html(parse_tree)
print()
```
<p>The <a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">Queen of Tumbling</a> has earned her spot on the <a href="https://usagym.org/pages/post.html?PostID=22228&prog=" rel="noopener noreferrer" target="_blank">2018-2019 trampoline and tumbling senior national team</a>!</p><p>For the past decade, <a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">Angel Rice</a> has made her name as a legend in the cheerleading world for her powerful tumbling skills and iconic kick-double corner pass. After the Cheerleading Worlds 2018, Rice decided to hang up her shoes and bow and peruse goals in the next chapter of her life: power tumbling.</p><p><span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"><iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk6m3c-B7Vu/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="650"></iframe></span></p><p><br></p><p>Rice had been working to earn her spot on the U.S. national team for the past two years. 
After her final performance on the World&rsquo;s stage with Stingray Allstars Steel, Rice put her full focus into power tumbling and perfecting her skills to make the national team.</p><p>Rice represented <a href="http://www.flipcitysouth.com/" rel="noopener noreferrer" target="_blank">FlipCity South</a> at the <a href="https://usagym.org/pages/tt/pages/index.html" rel="noopener noreferrer" target="_blank">2018 USA Gymnastics Championships</a> this past week in Greensboro, North Carolina, where her dream finally became true.</p><p><span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"><iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk-NBwOhkNA/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="700"></iframe></span></p><p><br></p><p>Rice and 27 other talented men and women from around the country were announced as members of the <a href="https://usagym.org/pages/post.html?PostID=22228" rel="noopener noreferrer" target="_blank">2018-2019 trampoline and tumbling senior national team</a>.</p><p>Although Rice is not the first black female on the Trampoline and Tumbling Senior National Team, she is making history as she joins the list of other talented black female athletes, Kaylah Whaley and Lajeana Davis.</p><p>Whether she&rsquo;s tumbling on the Worlds mat or a rod floor, we can&rsquo;t wait to see what this young talent will accomplish next. Congratulations Angel!</p><h2>The Story Behind The Queen Of Tumbling</h2><div class="froala-video fr-draggable"><span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"><iframe width="711" height="400" class="flo-video-embed embed embed-responsive-item fr-draggable" src="/embed/yPgX66qNk"></iframe></span></div><p><br></p><p><br></p>
$ cat nodearticle-6219156-content.txt | ./parse.py
tokens:
[opentag(<p>),
other(The ),
opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">),
other(Queen of Tumbling),
closetag(</a>),
other( has earned her spot on the ),
opentag(<a href="https://usagym.org/pages/post.html?PostID=22228&prog=" rel="noopener noreferrer" target="_blank">),
other(2018-2019 trampoline and tumbling senior national team),
closetag(</a>),
other(!),
closetag(</p>),
opentag(<p>),
other(For the past decade, ),
opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">),
other(Angel Rice),
closetag(</a>),
other( has made her name as a legend in the cheerleading world for her powerful tumbling skills and iconic kick-double corner pass. After the Cheerleading Worlds 2018, Rice decided to hang up her shoes and bow and peruse goals in the next chapter of her life: power tumbling.),
closetag(</p>),
opentag(<p>),
opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">),
opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk6m3c-B7Vu/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="650">),
closetag(</iframe>),
closetag(</span>),
closetag(</p>),
opentag(<p>),
opentag(<br>),
closetag(</p>),
opentag(<p>),
other(Rice had been working to earn her spot on the U.S. national team for the past two years. After her final performance on the World&rsquo;s stage with Stingray Allstars Steel, Rice put her full focus into power tumbling and perfecting her skills to make the national team.),
closetag(</p>),
opentag(<p>),
other(Rice represented ),
opentag(<a href="http://www.flipcitysouth.com/" rel="noopener noreferrer" target="_blank">),
other(FlipCity South),
closetag(</a>),
other( at the ),
opentag(<a href="https://usagym.org/pages/tt/pages/index.html" rel="noopener noreferrer" target="_blank">),
other(2018 USA Gymnastics Championships),
closetag(</a>),
other( this past week in Greensboro, North Carolina, where her dream finally became true.),
closetag(</p>),
opentag(<p>),
opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">),
opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk-NBwOhkNA/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="700">),
closetag(</iframe>),
closetag(</span>),
closetag(</p>),
opentag(<p>),
opentag(<br>),
closetag(</p>),
opentag(<p>),
other(Rice and 27 other talented men and women from around the country were announced as members of the ),
opentag(<a href="https://usagym.org/pages/post.html?PostID=22228" rel="noopener noreferrer" target="_blank">),
other(2018-2019 trampoline and tumbling senior national team),
closetag(</a>),
other(.),
closetag(</p>),
opentag(<p>),
other(Although Rice is not the first black female on the Trampoline and Tumbling Senior National Team, she is making history as she joins the list of other talented black female athletes, Kaylah Whaley and Lajeana Davis.),
closetag(</p>),
opentag(<p>),
other(Whether she&rsquo;s tumbling on the Worlds mat or a rod floor, we can&rsquo;t wait to see what this young talent will accomplish next. Congratulations Angel!),
closetag(</p>),
opentag(<h2>),
other(The Story Behind The Queen Of Tumbling),
closetag(</h2>),
opentag(<div class="froala-video fr-draggable">),
opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">),
opentag(<iframe width="711" height="400" class="flo-video-embed embed embed-responsive-item fr-draggable" src="/embed/yPgX66qNk">),
closetag(</iframe>),
closetag(</span>),
closetag(</div>),
opentag(<p>),
opentag(<br>),
closetag(</p>),
opentag(<p>),
opentag(<br>),
closetag(</p>)]
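The token list above contains no self-closing tags, so it never exercises the `singletag` rule. Two details of a first-match-wins lexer matter there: an opening-tag pattern also matches `<br/>`, so `singletag` only fires if it is tried before `opentag`; and `.` matches `>`, so a pattern written with `.*?` can run across several tags until it finds what it is looking for, whereas `[^>]*` stays inside one tag. A minimal standalone sketch comparing the two orderings; `lex` is a hypothetical helper, not a function from the script, and the tag patterns use `[^>]*` by assumption:

```python
import re

# Tag patterns limited to a single tag: [^>]* cannot run past the first ">".
SINGLE = ("singletag", re.compile(r'<[a-zA-Z][^>]*/>'))
OPEN = ("opentag", re.compile(r'<[a-zA-Z][^>]*>'))
CLOSE = ("closetag", re.compile(r'</[a-zA-Z][^>]*>'))
OTHER = ("other", re.compile(r'[^<>]+'))


def lex(symbol_table, text):
    """Hypothetical helper: first-match-wins lexer returning (name, text) pairs."""
    tokens = []
    while text:
        for name, pattern in symbol_table:
            m = pattern.match(text)
            if m:
                tokens.append((name, m.group()))
                text = text[m.end():]
                break
        else:
            raise ValueError("bad input: %r" % text)
    return tokens


sample = '<p>a<br/>b</p>'
print(lex((OPEN, CLOSE, SINGLE, OTHER), sample))  # <br/> comes out as opentag
print(lex((SINGLE, OPEN, CLOSE, OTHER), sample))  # <br/> comes out as singletag

# With ".*?" instead of "[^>]*", a self-closing pattern swallows earlier tags:
print(re.match(r'<[a-zA-Z].*?/>', '<p>a<br/>').group())  # <p>a<br/>
```

With `opentag` first, `<br/>` becomes an opening tag and would swallow the following content exactly like a bare `<br>` does.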
parse tree:
```text
(root)
 opentag(<p>)
  other(The )
  opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">)
   other(Queen of Tumbling)
  other( has earned her spot on the )
  opentag(<a href="https://usagym.org/pages/post.html?PostID=22228&prog=" rel="noopener noreferrer" target="_blank">)
   other(2018-2019 trampoline and tumbling senior national team)
  other(!)
 opentag(<p>)
  other(For the past decade, )
  opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">)
   other(Angel Rice)
  other( has made her name as a legend in the cheerleading world for her powerful tumbling skills and iconic kick-double corner pass. After the Cheerleading Worlds 2018, Rice decided to hang up her shoes and bow and peruse goals in the next chapter of her life: power tumbling.)
 opentag(<p>)
  opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
   opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk6m3c-B7Vu/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="650">)
 opentag(<p>)
  opentag(<br>)
  opentag(<p>)
   other(Rice had been working to earn her spot on the U.S. national team for the past two years. After her final performance on the World&rsquo;s stage with Stingray Allstars Steel, Rice put her full focus into power tumbling and perfecting her skills to make the national team.)
  opentag(<p>)
   other(Rice represented )
   opentag(<a href="http://www.flipcitysouth.com/" rel="noopener noreferrer" target="_blank">)
    other(FlipCity South)
   other( at the )
   opentag(<a href="https://usagym.org/pages/tt/pages/index.html" rel="noopener noreferrer" target="_blank">)
    other(2018 USA Gymnastics Championships)
   other( this past week in Greensboro, North Carolina, where her dream finally became true.)
  opentag(<p>)
   opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
    opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk-NBwOhkNA/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="700">)
  opentag(<p>)
   opentag(<br>)
   opentag(<p>)
    other(Rice and 27 other talented men and women from around the country were announced as members of the )
    opentag(<a href="https://usagym.org/pages/post.html?PostID=22228" rel="noopener noreferrer" target="_blank">)
     other(2018-2019 trampoline and tumbling senior national team)
    other(.)
   opentag(<p>)
    other(Although Rice is not the first black female on the Trampoline and Tumbling Senior National Team, she is making history as she joins the list of other talented black female athletes, Kaylah Whaley and Lajeana Davis.)
   opentag(<p>)
    other(Whether she&rsquo;s tumbling on the Worlds mat or a rod floor, we can&rsquo;t wait to see what this young talent will accomplish next. Congratulations Angel!)
   opentag(<h2>)
    other(The Story Behind The Queen Of Tumbling)
   opentag(<div class="froala-video fr-draggable">)
    opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
     opentag(<iframe width="711" height="400" class="flo-video-embed embed embed-responsive-item fr-draggable" src="/embed/yPgX66qNk">)
   opentag(<p>)
    opentag(<br>)
    opentag(<p>)
     opentag(<br>)
```
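The parse tree above is where the XHTML-only limitation bites. The lexer emits `<br>` as an `opentag`, so `parse` recurses into it and the following `</p>` becomes the `<br>` node's close tag; the enclosing `<p>` stays open, and every later paragraph, heading and div is nested inside it, one level deeper for each `<p><br></p>`. One way to accept plain HTML, sketched below under the assumption that void elements never carry a closing tag, is to make any opening tag whose element name is an HTML void element a leaf node. `VOID_ELEMENTS`, `tag_name` and `parse_html` are hypothetical names; `Token` and `Node` mirror the script's classes:

```python
import re

# Void elements from the HTML standard: they never have a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Token(object):
    def __init__(self, name, text):
        self.name = name
        self.text = text


class Node(object):
    def __init__(self, token):
        self.token = token
        self.closetoken = None
        self.subnodes = []


def tag_name(text):
    """Hypothetical helper: lowercased element name of a tag like '<br class=x>'."""
    return re.match(r'</?([a-zA-Z][a-zA-Z0-9]*)', text).group(1).lower()


def parse_html(tokens, node):
    """Like the script's parse(), but void elements become leaf nodes."""
    while tokens:
        token = tokens.pop(0)
        if token.name == "opentag" and tag_name(token.text) in VOID_ELEMENTS:
            node.subnodes.append(Node(token))  # no children, no close tag
        elif token.name == "opentag":
            subnode = Node(token)
            parse_html(tokens, subnode)
            node.subnodes.append(subnode)
        elif token.name == "closetag":
            node.closetoken = token
            break
        else:  # "singletag" or "other"
            node.subnodes.append(Node(token))


# <p><br></p><p>text</p>: both paragraphs should be children of the root.
tokens = [Token("opentag", "<p>"), Token("opentag", "<br>"),
          Token("closetag", "</p>"), Token("opentag", "<p>"),
          Token("other", "text"), Token("closetag", "</p>")]
root = Node(None)
parse_html(tokens, root)
print([child.token.text for child in root.subnodes])  # ['<p>', '<p>']
```

This still trusts every non-void closing tag, so HTML that omits optional end tags (for example `<li>` without `</li>`) would nest incorrectly; a full HTML parser handles those cases.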
modified parse tree:
```text
(root)
 opentag(<p>)
  other(The )
  opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">)
   other(Queen of Tumbling)
  other( has earned her spot on the )
  opentag(<a href="https://usagym.org/pages/post.html?PostID=22228&prog=" rel="noopener noreferrer" target="_blank">)
   other(2018-2019 trampoline and tumbling senior national team)
  other(!)
 opentag(<p>)
  other(For the past decade, )
  opentag(<a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank">)
   other(Angel Rice)
  other( has made her name as a legend in the cheerleading world for her powerful tumbling skills and iconic kick-double corner pass. After the Cheerleading Worlds 2018, Rice decided to hang up her shoes and bow and peruse goals in the next chapter of her life: power tumbling.)
 opentag(<p>)
  opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
   opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk6m3c-B7Vu/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="650">)
 opentag(<p>)
  opentag(<br>)
  opentag(<p>)
   other(Rice had been working to earn her spot on the U.S. national team for the past two years. After her final performance on the World&rsquo;s stage with Stingray Allstars Steel, Rice put her full focus into power tumbling and perfecting her skills to make the national team.)
  opentag(<p>)
   other(Rice represented )
   opentag(<a href="http://www.flipcitysouth.com/" rel="noopener noreferrer" target="_blank">)
    other(FlipCity South)
   other( at the )
   opentag(<a href="https://usagym.org/pages/tt/pages/index.html" rel="noopener noreferrer" target="_blank">)
    other(2018 USA Gymnastics Championships)
   other( this past week in Greensboro, North Carolina, where her dream finally became true.)
  opentag(<p>)
   opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
    opentag(<iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk-NBwOhkNA/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="700">)
  opentag(<p>)
   opentag(<br>)
   opentag(<p>)
    other(Rice and 27 other talented men and women from around the country were announced as members of the )
    opentag(<a href="https://usagym.org/pages/post.html?PostID=22228" rel="noopener noreferrer" target="_blank">)
     other(2018-2019 trampoline and tumbling senior national team)
    other(.)
   opentag(<p>)
    other(Although Rice is not the first black female on the Trampoline and Tumbling Senior National Team, she is making history as she joins the list of other talented black female athletes, Kaylah Whaley and Lajeana Davis.)
   opentag(<p>)
    other(Whether she&rsquo;s tumbling on the Worlds mat or a rod floor, we can&rsquo;t wait to see what this young talent will accomplish next. Congratulations Angel!)
   opentag(<h2>)
    other(The Story Behind The Queen Of Tumbling)
   opentag(<div class="froala-video fr-draggable">)
    opentag(<span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false">)
     opentag(<div class="ios-video">)
   opentag(<p>)
    opentag(<br>)
    opentag(<p>)
     opentag(<br>)
```
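In the modified tree the only change is the `flo-video-embed` iframe, which is now an `ios-video` div; the two Instagram iframes are left alone because their `src` is an absolute Instagram URL rather than `/embed/...`. For the search to reach iframes at any depth, the recursion has to be applied to each non-matching child itself; recursing only into that child's children skips every other level of the tree. A sketch of the replacement step with the match test pulled out into a predicate and a count of replacements returned, so a run can report how many embeds it found. `is_flo_embed`, `make_ios_div` and `replace_matching` are hypothetical names; the substring checks copy the script's:

```python
class Token(object):
    def __init__(self, name, text):
        self.name = name
        self.text = text


class Node(object):
    def __init__(self, token, subnodes=None):
        self.token = token
        self.closetoken = None
        self.subnodes = subnodes or []


def is_flo_embed(node):
    """Same test as the script: an <iframe> opening tag pointing at /embed/."""
    text = node.token.text
    return (node.token.name == "opentag" and text.startswith("<iframe")
            and 'src="/embed/' in text and "flo-video-embed" in text)


def make_ios_div():
    div = Node(Token("opentag", '<div class="ios-video">'))
    div.closetoken = Token("closetag", "</div>")
    return div


def replace_matching(node, predicate, make_replacement):
    """Replace every descendant matching predicate; return how many were replaced."""
    count = 0
    for i, child in enumerate(node.subnodes):
        if predicate(child):
            node.subnodes[i] = make_replacement()
            count += 1
        else:
            count += replace_matching(child, predicate, make_replacement)
    return count


# An iframe nested two levels down (div > span > iframe) is still found.
iframe = Node(Token("opentag",
                    '<iframe class="flo-video-embed" src="/embed/abc">'))
span = Node(Token("opentag", "<span>"), [iframe])
root = Node(None, [Node(Token("opentag", "<div>"), [span])])
print(replace_matching(root, is_flo_embed, make_ios_div))  # 1
print(root.subnodes[0].subnodes[0].subnodes[0].token.text)  # <div class="ios-video">
```

Passing the predicate and the replacement factory as arguments keeps the tree walk separate from the Flo-specific matching rule.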
modified HTML:
<p> The <a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank"> Queen of Tumbling </a> has earned her spot on the <a href="https://usagym.org/pages/post.html?PostID=22228&prog=" rel="noopener noreferrer" target="_blank"> 2018-2019 trampoline and tumbling senior national team </a> ! </p> <p> For the past decade, <a href="https://www.flocheer.com/video/6044838-the-story-behind-the-queen-of-tumbling" rel="noopener noreferrer" target="_blank"> Angel Rice </a> has made her name as a legend in the cheerleading world for her powerful tumbling skills and iconic kick-double corner pass. After the Cheerleading Worlds 2018, Rice decided to hang up her shoes and bow and peruse goals in the next chapter of her life: power tumbling. </p> <p> <span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"> <iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk6m3c-B7Vu/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="650"> </iframe> </span> </p> <p> <br> </p> <p> Rice had been working to earn her spot on the U.S. national team for the past two years. After her final performance on the World&rsquo;s stage with Stingray Allstars Steel, Rice put her full focus into power tumbling and perfecting her skills to make the national team. </p> <p> Rice represented <a href="http://www.flipcitysouth.com/" rel="noopener noreferrer" target="_blank"> FlipCity South </a> at the <a href="https://usagym.org/pages/tt/pages/index.html" rel="noopener noreferrer" target="_blank"> 2018 USA Gymnastics Championships </a> this past week in Greensboro, North Carolina, where her dream finally became true. 
</p> <p> <span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"> <iframe class="fr-instagram embed fr-draggable" scrolling="no" src="https://www.instagram.com/p/Bk-NBwOhkNA/embed/captioned/?v=7" style="border: 0; margin: 0; max-width: 658px; width: 100%; display: block; padding: 0; background: rgb(255, 255, 255);" height="700"> </iframe> </span> </p> <p> <br> </p> <p> Rice and 27 other talented men and women from around the country were announced as members of the <a href="https://usagym.org/pages/post.html?PostID=22228" rel="noopener noreferrer" target="_blank"> 2018-2019 trampoline and tumbling senior national team </a> . </p> <p> Although Rice is not the first black female on the Trampoline and Tumbling Senior National Team, she is making history as she joins the list of other talented black female athletes, Kaylah Whaley and Lajeana Davis. </p> <p> Whether she&rsquo;s tumbling on the Worlds mat or a rod floor, we can&rsquo;t wait to see what this young talent will accomplish next. Congratulations Angel! </p> <h2> The Story Behind The Queen Of Tumbling </h2> <div class="froala-video fr-draggable"> <span class="fr-video fr-fvc fr-dvi fr-draggable" contenteditable="false"> <div class="ios-video"> </div> </span> </div> <p> <br> </p> <p> <br> </p> None
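The modified HTML above differs from the input in two ways that come from the output code rather than from the parse: a space appears between every pair of tokens (for example `national team </a> ! </p>`), because Python 2's `print text,` separates items with a space, and the trailing `None` is the return value of `dump_html`, which the final `print` statement prints. A sketch of a serializer that returns a string and adds nothing between tokens, so an unmodified tree round-trips to the exact input; `to_html` is a hypothetical name and `Token`/`Node` mirror the script's classes:

```python
class Token(object):
    def __init__(self, name, text):
        self.name = name
        self.text = text


class Node(object):
    def __init__(self, token, subnodes=None, closetoken=None):
        self.token = token
        self.subnodes = subnodes or []
        self.closetoken = closetoken


def to_html(node):
    """Serialize a parse tree by concatenating token texts, adding no whitespace."""
    parts = []

    def walk(n):
        if n.token:
            parts.append(n.token.text)
        for child in n.subnodes:
            walk(child)
        if n.closetoken:
            parts.append(n.closetoken.text)

    walk(node)
    return "".join(parts)


# <p>team <a>x</a>!</p> built by hand; the root node has no token of its own.
link = Node(Token("opentag", "<a>"), [Node(Token("other", "x"))],
            Token("closetag", "</a>"))
para = Node(Token("opentag", "<p>"),
            [Node(Token("other", "team ")), link, Node(Token("other", "!"))],
            Token("closetag", "</p>"))
root = Node(None, [para])
print(to_html(root))  # <p>team <a>x</a>!</p>
```

Writing the result with `sys.stdout.write(to_html(parse_tree))` rather than `print` also avoids emitting a stray `None`.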