@djsun
Created May 13, 2010 15:55
# The information is somewhat jumbled together. This method takes
# advantage of the fact that headings are wrapped in <b> tags, so each
# heading and each value ends up in its own text node. The result is
# an array of strings, which is easier to parse than one big string
# (which is what we would have gotten if we had used `start.content`).
#
# Sample result:
# [
# "Description:",
# "Operating and Maintenance Petitions. When the annual allowable increase does not completely cover the landlord's yearly increase in operating and maintenance expenses for a property, a landlord may petition for an additional base rent increase of up to 7%. This is known as an Operating and Maintenance or O&M increase",
# "Agency Name:",
# "San Francisco Rent Board",
# "Time Period:",
# "January 1, 2004 thru December 15, 2009",
# "Location of dataset:",
# "http://apps.sfgov.org/datafiles/view.php?file=rentboard/LLPet_OpMaintExp.csv",
# "Location of Data Dictionary:",
# "http://apps.sfgov.org/datafiles/view.php?file=rentboard/LLPet_OpMaintExp_dictionary.rtf",
# ]
def walk_nodes(start)
  result = []
  start.children.each do |x|
    if x.text?
      cleaned = U.single_line_clean(x.content)
      result << cleaned unless cleaned == ""
    else
      result.concat(walk_nodes(x))
    end
  end
  result
end
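
The comment above says the flat array is easier to parse than one big string. As a rough sketch of what that parsing might look like, the snippet below pairs each heading with the string that follows it. `pairs_to_hash` is a hypothetical helper, not part of the original gist, and it assumes the array strictly alternates heading, value, heading, value, as in the sample result. If a value were split across several text nodes, the pairing would go wrong.

```ruby
# Hypothetical helper (not in the original gist): turns the flat array
# returned by walk_nodes into a hash keyed by heading.
# Assumes strict heading/value alternation, as in the sample result.
def pairs_to_hash(strings)
  strings.each_slice(2).each_with_object({}) do |(heading, value), hash|
    # Drop the trailing colon so "Agency Name:" becomes "Agency Name".
    hash[heading.chomp(":")] = value
  end
end

# A shortened copy of the sample result from the comment above.
sample = [
  "Agency Name:",
  "San Francisco Rent Board",
  "Time Period:",
  "January 1, 2004 thru December 15, 2009",
]

metadata = pairs_to_hash(sample)
puts metadata["Agency Name"]
```

Keeping `walk_nodes` itself free of any pairing logic means it stays reusable for pages whose layout does not alternate so neatly.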