@caged
Last active August 29, 2015 13:56

Hi,

I'm looking through the City Council agenda items and noticed there is a great deal of information in each one. However, all of the documents are in the proprietary PDF format which makes them hard for machines (computers) to parse. If I were looking to extract some meaningful data (say expense, revenue, and affected city areas), I would need to comb through each one of these files manually.

It would be great if the city were to release these documents as HTML or XML documents. Even plain text is better than PDF for parsing!

With the above in mind, I have two questions:

  • Can the city provide additional open formats for agenda item documents going forward?
  • Do you have this data summarized in any other formats (Excel, CSV, etc.)? It seems like some electronic documents should exist that record revenue and expenses for each agenda item.

Thanks for your time,
-Justin Palmer

Hello Justin,

I understand your enthusiasm for open source documents. At this time, however, PDF/A is the archival format used by the City. (After the items are voted on by Council and processed, they are converted to PDF/A.)

The Council Agenda page, one breadcrumb back from the link you gave to the items, gives the title of each item. It's not what you are looking for, but it does allow readers to select the item they are interested in without opening each agenda file: https://www.portlandonline.com/auditor/index.cfm?c=26997
