@kylebgorman
Last active May 13, 2021 04:27
convert a PTB-style parse tree (essentially an s-expression) to the format used by LaTeX's `qtree`/`tikz-qtree` packages
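The conversion described above is essentially a token-level rewrite: each `(LABEL` becomes `[.LABEL`, each `)` becomes ` ]`, and LaTeX reserved characters are escaped exactly once. As a rough, dependency-free illustration (not the gist's method, which relies on NLTK), here is a sketch under stated assumptions: `to_qtree`, `escape_latex`, and `tokenize_sexp` are hypothetical names, and the input is assumed to be one well-formed tree whose leaves contain no literal parentheses (the PTB writes those as `-LRB-`/`-RRB-`).

```python
# Dependency-free sketch of a PTB -> qtree rewrite (hypothetical helpers,
# not NLTK's API). Assumes one well-formed tree with balanced parentheses
# and no literal "(" or ")" inside leaves (PTB writes these as -LRB-/-RRB-).

# Single-pass escape table; the text commands end in "{}" so that a
# following letter is not read as part of the command name.
LATEX_ESCAPES = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "\\": r"\textbackslash{}",
}


def escape_latex(text):
    """Escape LaTeX reserved characters, looking at each character once."""
    return "".join(LATEX_ESCAPES.get(char, char) for char in text)


def tokenize_sexp(sexp):
    """Split an s-expression into "(", ")" and atom tokens."""
    return sexp.replace("(", " ( ").replace(")", " ) ").split()


def to_qtree(sexp):
    """Rewrite "(LABEL child ...)" as qtree's "[.LABEL child ... ]"."""
    out = []
    expect_label = False  # True right after "(": the next atom is a label
    for token in tokenize_sexp(sexp):
        if token == "(":
            if expect_label:
                # The previous "(" had no label, as in PTB's "( (S ...) )".
                out.append("[.")
            expect_label = True
        elif token == ")":
            out.append("]")
        elif expect_label:
            out.append("[." + escape_latex(token))
            expect_label = False
        else:
            out.append(escape_latex(token))
    return r"\Tree " + " ".join(out)


if __name__ == "__main__":
    print(to_qtree("(S (NP (DT the) (NN dog)) (VP (VBD barked)))"))
```

Run directly, this prints `\Tree [.S [.NP [.DT the ] [.NN dog ] ] [.VP [.VBD barked ] ] ]`. Escaping each character of the original string in a single pass is what keeps a character like `&` from being escaped twice.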
#!/usr/bin/env python
# treeify.py: convert a PTB-style parse to LaTeX's `qtree`/`tikz-qtree` format
#
# NB: this only works for input containing a single tree, because
# `nltk.Tree.fromstring` parses exactly one tree.
import fileinput
from nltk import Tree
# list of LaTeX reserved chars, from:
# http://tex.stackexchange.com/questions/34580/escape-character-in-latex
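translations = {"&": r"\&",
                "%": r"\%",
                "$": r"\$",
                "#": r"\#",
                "_": r"\_",
                "{": r"\{",
                "}": r"\}",
                # Text commands end in "{}" so they don't swallow the
                # letters that follow (e.g. "a~b" -> "a\textasciitilde{}b").
                "~": r"\textasciitilde{}",
                "^": r"\textasciicircum{}",
                "\\": r"\textbackslash{}"}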
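if __name__ == "__main__":
    tstring = "".join(fileinput.input()).strip()
    tree = Tree.fromstring(tstring)
    # Build the `[.LABEL ... ]` bracketing with `Tree.pformat` rather than
    # `Tree.pformat_latex_qtree()`, which already backslash-escapes some
    # reserved characters; escaping its output again would double-escape them.
    bracketed = tree.pformat(nodesep="", parens=("[.", " ]"))
    # Escape each character exactly once, then add the `\Tree` command.
    escaped = "".join(translations.get(char, char) for char in bracketed)
    print(r"\Tree " + escaped)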
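The script's NB comment says it only handles a single tree; `Tree.fromstring` accepts exactly one tree per string. One possible workaround, sketched here under stated assumptions, is to split the input into top-level parenthesized expressions before parsing. `split_trees` is a hypothetical helper (not part of NLTK), and it assumes balanced parentheses with no literal `(` or `)` inside leaves.

```python
# Hypothetical workaround for the single-tree limitation: split the input
# into top-level s-expressions by tracking parenthesis depth, so each
# piece can be handed to `Tree.fromstring` (or any single-tree converter).
# Assumes balanced parentheses and no literal "(" or ")" inside leaves.


def split_trees(text):
    """Return each top-level parenthesized expression in `text`, in order."""
    trees = []
    depth = 0
    start = None
    for i, char in enumerate(text):
        if char == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                raise ValueError(f"unbalanced ')' at offset {i}")
            if depth == 0:
                trees.append(text[start:i + 1])
    if depth != 0:
        raise ValueError("unbalanced '(' at end of input")
    return trees


if __name__ == "__main__":
    print(split_trees("(S (NP a))\n\n( (S (VP b)) )"))
```

Inside the script's `__main__` block, one could then loop over `split_trees("".join(fileinput.input()))`, call `Tree.fromstring` on each piece, and print one `\Tree` per input tree.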