@mikedewar
Last active August 29, 2015 14:04
streamtools pattern to parse the FCC "Comments on Protecting and Promoting the Open Internet" XML files.
{
"Connections": [
{
"ToRoute": "in",
"ToId": "1",
"FromId": "4",
"Id": "5"
},
{
"ToRoute": "in",
"ToId": "6",
"FromId": "1",
"Id": "7"
},
{
"ToRoute": "in",
"ToId": "11",
"FromId": "9",
"Id": "18"
},
{
"ToRoute": "pop",
"ToId": "9",
"FromId": "10",
"Id": "16"
},
{
"ToRoute": "in",
"ToId": "2",
"FromId": "6",
"Id": "8"
},
{
"ToRoute": "in",
"ToId": "3",
"FromId": "6",
"Id": "13"
},
{
"ToRoute": "in",
"ToId": "12",
"FromId": "11",
"Id": "15"
},
{
"ToRoute": "push",
"ToId": "9",
"FromId": "3",
"Id": "17"
}
],
"Blocks": [
{
"Position": {
"Y": 512.991455078125,
"X": 409.991455078125
},
"Rule": {
"Filename": "doc.json"
},
"Type": "tofile",
"Id": "2"
},
{
"Position": {
"Y": 335,
"X": 512.9971466064453
},
"Rule": {
"Path": ".body"
},
"Type": "parsexml",
"Id": "6"
},
{
"Position": {
"Y": 665.991455078125,
"X": 829
},
"Rule": {
"Map": {
"zip": ".arr[17].str",
"state": ".arr[13].str",
"date": ".arr[4].date",
"comment": ".arr[15].str",
"city": ".arr[3].str",
"applicant": ".arr[0].str"
},
"Additive": false
},
"Type": "map",
"Id": "11"
},
{
"Position": {
"Y": 222.9943084716797,
"X": 408.991455078125
},
"Rule": {
"UrlPath": "",
"Url": "http://www.fcc.gov/files/ecfs/14-28/14-28-RAW-Solr-1.xml",
"Method": "GET",
"Headers": {},
"BodyPath": "."
},
"Type": "webRequest",
"Id": "1"
},
{
"Position": {
"Y": 750.9885864257812,
"X": 927.9971313476562
},
"Rule": {
"Filename": "comments.json"
},
"Type": "tofile",
"Id": "12"
},
{
"Position": {
"Y": 582.991455078125,
"X": 725.9971313476562
},
"Rule": null,
"Type": "queue",
"Id": "9"
},
{
"Position": {
"Y": 108,
"X": 370.99713134765625
},
"Rule": null,
"Type": "bang",
"Id": "4"
},
{
"Position": {
"Y": 417.99713134765625,
"X": 601.9971313476562
},
"Rule": {
"Path": ".response.result.doc"
},
"Type": "unpack",
"Id": "3"
},
{
"Position": {
"Y": 493,
"X": 536.9942626953125
},
"Rule": {
"Interval": "500us"
},
"Type": "ticker",
"Id": "10"
}
]
}
@mikedewar
Author

To use this, run streamtools (find out more at http://nytlabs.github.io/streamtools/). To import this pattern, download it to your computer and run

curl localhost:7070/import -d@thisjsonfile.json

This will create a pattern in your running streamtools. Then, in a browser, visit http://localhost:7070 and you should see a bunch of connected blocks. In the top left of the pattern is the bang block. If you click on the red square on the bang block it will set the whole thing going.
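If you would rather check the wiring before importing, the pattern file can be read directly. The following is a minimal sketch, not part of streamtools: `describe_wiring` is a hypothetical helper name, and it assumes only the `Connections` and `Blocks` keys visible in the JSON above. The excerpt is inlined so the sketch runs on its own.

```python
def describe_wiring(pattern):
    """Return one 'from -> to [route]' line per connection in a pattern dict."""
    # Map each block Id to its Type so connections can be shown by name.
    types = {block["Id"]: block["Type"] for block in pattern["Blocks"]}
    lines = []
    for conn in pattern["Connections"]:
        src, dst = conn["FromId"], conn["ToId"]
        lines.append(f"{types[src]}({src}) -> {types[dst]}({dst}) [{conn['ToRoute']}]")
    return lines


# A three-block excerpt of the pattern above (Position fields omitted).
excerpt = {
    "Connections": [
        {"ToRoute": "in", "ToId": "1", "FromId": "4", "Id": "5"},
        {"ToRoute": "in", "ToId": "6", "FromId": "1", "Id": "7"},
    ],
    "Blocks": [
        {"Type": "bang", "Id": "4", "Rule": None},
        {"Type": "webRequest", "Id": "1", "Rule": {"Method": "GET"}},
        {"Type": "parsexml", "Id": "6", "Rule": {"Path": ".body"}},
    ],
}

for line in describe_wiring(excerpt):
    print(line)
```

To run it on the whole pattern, load the downloaded file with `json.load` and pass the resulting dict to `describe_wiring`.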

Beware! It takes a few seconds to download the file, a few more to parse the XML, and a few more to write the JSON to a file. At the end of that (be patient!) you should have a file called comments.json in the directory you ran streamtools from. The pattern also writes the whole parsed document to doc.json.

Each line of that file is a single comment. Pretty-printed, a comment looks like:

{
  "zip": "90210",
  "state": "MA",
  "date": "2014-06-04T04:00:00Z",
  "comment": "7521204213.txt\nI'm concerned. \nI'm concerned that the age of an open, fair, accessible internet is over in this \ncountry. I'm concerned that ISPs, most of which are embarrassingly behind the global\ncurve in terms of infrastructure and affordability, will be the ones calling the \nshots and controlling who can access what data, at what speeds, for what cost.\nFCC, please: yield to reason. Do not allow greed and corporate interests to tarnish \nand potentially ruin what is potentially the single greatest asset we have: an open \nand neutral internet.\nClassify ISPs as common carriers and force them to invest in their own \ninfrastructure and services. I urge you to listen to public on this matter. If the \n\"fast and slow lanes\" plan comes to fruition, it may end up being the single \ngreatest mistake your organization ever makes. A true hindrance toward growth and \nprogress.\nAll information deserves to be treated fairly. Access to all data needs to be \npossible, regardless of the source. Make ISPs common carriers. It's that simple. \nAnd, it's the only chance we have at making these companies compete for business, care about their customers, and invest in their own product. Compared to other firstworld countries, the quality and price of internet access in the US is staggeringly,disgustingly, hilariously embarrassing. But it doesn't have to be. Do the right thing.\nThank you for your time.\nPage 1",
  "city": "Some City",
  "applicant": "A Nice Fellow"
}

(I have edited some fields to spare the innocent).
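Since each line of comments.json is one comment, the file can be read a line at a time. Here is a rough sketch of doing that in Python. It assumes every line is a valid JSON object with the field names from the map block (`state`, `zip`, and so on); `load_comments` and `count_by_state` are names made up for this example.

```python
import json
from collections import Counter


def load_comments(path):
    """Yield one dict per non-empty line of a line-delimited JSON file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines rather than failing on them
                yield json.loads(line)


def count_by_state(comments):
    """Count comments per 'state' value; comments without one count under None."""
    return Counter(comment.get("state") for comment in comments)
```

For example, `count_by_state(load_comments("comments.json")).most_common(5)` would list the five states with the most comments. Reading lazily keeps memory use low, which matters because the comment bodies can be long.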
