@mluedke2
Last active December 19, 2015 23:19
Work in progress: scraping the site for the Administrative Code of San Francisco at the request of City Hall.
@mluedke2
Author

Unfortunately, this is not working completely correctly yet. It seems to start at the right places and to create chapters, articles, and sections appropriately. A thorough analysis is almost impossible because there is so much text, but there is at least one case where the "section" text contains the section you want and then keeps running past the point where it should end.

So titles are fine; the problem looks limited to the section text. It must have something to do with the histories and footers...

Even so, this is a relatively minor bug, and development of the front-end can continue regardless. It just needs to be fixed before launch.
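
One way to hunt for the runaway section text described above is to cut each section where the next heading begins and then flag any stored body that still contains a second heading. The sketch below is only an illustration, not the scraper's actual code: the heading pattern (`SEC. <number>. `), the history-note line in the sample, and the function names `split_sections` and `runaway_sections` are all assumptions that would need adjusting to the real page markup.

```python
import re

# Assumed heading shape, e.g. "SEC. 2A.1. TITLE." (hypothetical; adjust to the real markup).
SECTION_HEADING = re.compile(r"^SEC\. (?P<number>[0-9A-Z.\-]+)\. ", re.MULTILINE)


def split_sections(text):
    """Return {section number: body}, ending each body where the next heading starts."""
    matches = list(SECTION_HEADING.finditer(text))
    sections = {}
    for i, match in enumerate(matches):
        # The body runs up to the next heading, or to the end of the text for the last one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections[match.group("number")] = text[match.start():end].strip()
    return sections


def runaway_sections(sections):
    """List section numbers whose body still contains another section heading."""
    flagged = []
    for number, body in sections.items():
        # More than one heading in a body means the text ran past its own end.
        if len(SECTION_HEADING.findall(body)) > 1:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    # Made-up sample text, including an assumed history-note line.
    sample = (
        "SEC. 1.1. PURPOSE.\nText one.\n(Added by Ord. 1)\n"
        "SEC. 1.2. SCOPE.\nText two.\n"
    )
    print(runaway_sections(split_sections(sample)))
```

Running `runaway_sections` over the bodies already scraped would narrow "so much text" down to the handful of sections worth inspecting by hand. It only catches overruns that swallow a following heading, so trailing footers with no heading in them would need a separate check.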

@mluedke2
Author

I am now moving this over to a version that does not write to MySQL but instead parses everything into a format called "The State Decoded," for which some nice open-source browsers are being built in the sf-brigade GitHub account.
