Eric Kingery ekingery

View GitHub Profile
@ekingery
ekingery / parse-rcv1-topics.py
Last active January 5, 2017 17:07
Parse RCV1 topics into a tree structure
# This script parses the RCV1 topics into a tree structure.
# The tree can then be exported to JSON or DOT (Graphviz) format.
# For more info on RCV1, see
# http://jmlr.csail.mit.edu/papers/volume5/lewis04a/lewis04a.pdf
import re
from treelib import Tree
from treelib.plugins import export_to_dot
# read topics from flat file into a list of lists