@sandsfish
sandsfish / python-simple-scrape.py
Created December 18, 2013 20:09
Very simple example of using Python and Requests to scrape the results from a search interface.
# For non-trivial scraping, better to use Scrapy or Beautiful Soup...
# - http://doc.scrapy.org/en/latest/intro/tutorial.html
# - http://www.crummy.com/software/BeautifulSoup/
import requests
r = requests.get("http://sacbee.com/search_results?aff=1100&q=robot")
r
# <Response [200]>
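A possible next step, sketched with only the standard library: pull the link targets out of the returned HTML. The markup of the results page is not shown here, so `LinkCollector`, `extract_links`, and `sample_html` are hypothetical names and data; on a live response you would pass in `r.text` instead.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; anchors without href are skipped
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up stand-in for r.text; the real page structure is an assumption
sample_html = (
    '<ul><li><a href="/story/1">Robot one</a></li>'
    '<li><a href="/story/2">Robot two</a></li>'
    "<li><a>no href</a></li></ul>"
)
links = extract_links(sample_html)
```

For anything beyond collecting links, the Scrapy or Beautiful Soup route mentioned above is likely the better fit.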
float time = 0;
float radius = 0;
void setup() {
size(displayWidth, displayHeight, OPENGL); // size() must be the first call in setup()
background(5);
noStroke();
}
void draw() {
@sandsfish
sandsfish / ReadHTTPData.pde
Created March 23, 2013 22:44
Minimal setup to use Apache's HTTPClient in a Processing sketch.
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.HttpResponse;
import org.apache.http.HttpEntity;
// http://hc.apache.org/httpcomponents-client-ga/tutorial/html/fundamentals.html - Apache HttpClient
// https://forum.processing.org/topic/http-post-processing - Integration with Processing (different from above)
// https://code.google.com/p/processing/source/browse/trunk/processing/java/libraries/net/examples/HTTPClient/HTTPClient.pde?r=7950 - Messy.
// https://github.com/francisli/processing-http - Unnecessary?
@sandsfish
sandsfish / getMARCField Example.r
Created February 12, 2013 18:54
R function to parse out list of specific field/sub-field from MARC/XML
library(XML)
getMARCField = function(marc_doc, tag, code) {
xpath = paste("/m:collection/m:record/m:datafield[@tag='", tag, "']/m:subfield[@code='", code, "']", sep="")
return(xpathApply(marc_doc, xpath, namespaces=c("m"), xmlValue))
}
vs = xmlRoot(xmlParse('vail-first-3500.xml'))
field100a = getMARCField(vs, '100', 'a')
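For comparison, a rough Python sketch of the same lookup using `xml.etree.ElementTree`. It assumes the file uses the standard MARC21 slim namespace (`http://www.loc.gov/MARC21/slim`) and that the root element is the `collection`; `get_marc_field` and the sample record are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the MARC21 slim namespace
MARC_NS = {"m": "http://www.loc.gov/MARC21/slim"}


def get_marc_field(collection, tag, code):
    """Return the text of every subfield `code` under datafield `tag`."""
    path = f"m:record/m:datafield[@tag='{tag}']/m:subfield[@code='{code}']"
    return [el.text for el in collection.findall(path, MARC_NS)]


# Made-up sample record, not taken from vail-first-3500.xml
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="a">Doe, Jane</subfield>
      <subfield code="d">1900-</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">A title</subfield>
    </datafield>
  </record>
</collection>"""

root = ET.fromstring(SAMPLE)
field100a = get_marc_field(root, "100", "a")
```

With a file on disk, `ET.parse(path).getroot()` would take the place of `ET.fromstring(SAMPLE)`.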
@sandsfish
sandsfish / try-tx.rb
Created December 18, 2012 21:34
RDF.rb transaction delete failing with gems/rdf-0.3.11/lib/rdf/mixin/mutable.rb:124:in `delete': undefined method `query' for #<RDF::Transaction:0x1e2bde(graph: nil, deletes: 0, inserts: 0)> (NoMethodError)
#!/Users/sands/.rvm/rubies/ruby-1.8.7-p371-i386/bin/ruby
require 'rubygems'
require 'rdf'
require 'rdf/ntriples'
include RDF
repository = RDF::Repository.load("http://rdf.rubyforge.org/doap.nt")
# reports correct predicate URI...
# DOAP.name: http://usefulinc.com/ns/doap#name
# all_summary is the data here: in this case, a lot of text records collapsed into one corpus string
# Prep data for NLTK Analysis
import nltk.collocations
tokens = nltk.word_tokenize(all_summary)
text = nltk.Text(tokens)
# Remove stop-words, convert to lower-case, remove all non-alpha characters
from nltk.corpus import stopwords
# Use a separate name so the imported stopwords module is not shadowed
stop_words = stopwords.words('english')
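The snippet is cut off before the filtering step. A minimal sketch of what the comment above describes (lower-casing, stripping non-alpha characters, dropping stop-words) might look like the following. `clean_tokens` and the tiny `STOP_WORDS` set are hypothetical stand-ins so the example runs without the NLTK corpus; in practice the NLTK English stop-word list would be passed in instead.

```python
import re

# Hypothetical stand-in for the NLTK English stop-word list
STOP_WORDS = {"the", "a", "of", "and", "is", "in"}


def clean_tokens(tokens, stop_words=STOP_WORDS):
    """Lower-case tokens, strip non a-z characters, drop empties and stop-words."""
    cleaned = []
    for token in tokens:
        # Keeps ASCII letters only; accented characters are removed as well
        word = re.sub(r"[^a-z]", "", token.lower())
        if word and word not in stop_words:
            cleaned.append(word)
    return cleaned


words = clean_tokens(["The", "Robot", "is", "in", "Lab-42", ",", "robots!"])
```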
@sandsfish
sandsfish / doc-info-snippet.txt
Created June 22, 2012 20:55
multiple committee members HTML fields
<div id="committee-members"><table id="att4" style="display">
<tbody><tr>
<td><input type="text" class="input-medium" id="committeeFirstName4" name="committeeFirstName4"></td>
<td><input type="text" class="span1" id="committeeMiddleInitial4" name="committeeMiddleInitial4"></td>
<td><input type="text" class="input-medium" id="committeeLastName4" name="committeeLastName4"></td>
@sandsfish
sandsfish / Submit.java
Created June 4, 2012 21:35
Submit Controller Authentication Wiring
package controllers;
import play.*;
import play.mvc.*;
import play.mvc.Http.Header;
import java.util.*;
import java.util.Map.Entry;
import org.tdl.vireo.model.RoleType;
@sandsfish
sandsfish / Submit.java
Created June 4, 2012 21:34
Submit Controller Authentication
package controllers;
import play.*;
import play.mvc.*;
import play.mvc.Http.Header;
import java.util.*;
import java.util.Map.Entry;
import org.tdl.vireo.model.RoleType;
# Processing Required to fulfill one embed request (672ms) + rendering time
#
# User load should not be necessary when feed is embedded from a page that is not on the site.
#
# Caching: should perhaps be done at the CompositeFeed level (which only needs to be refreshed if the config has changed)
# Caching: should also be done at the feeds/show (JavaScript) point to quickly serve processed text versions, which will
# be updated at the Feed Refresh Interval.
Processing FeedsController#show to js (for -- at 2010-07-27 07:30:43) [GET]
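The caching idea in the notes above (serve a processed version, rebuild it only once the refresh interval has passed) can be sketched as a small time-based cache. This is an illustration in Python, not the Rails code from the application; `TTLCache`, `get_or_compute`, the key format, and the 60-second interval are all assumptions.

```python
import time


class TTLCache:
    """Keeps computed values and recomputes them once they are older than ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be exercised without sleeping
        self._store = {}    # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]   # still fresh: skip the expensive work
        value = compute()   # missing or stale: rebuild and remember when
        self._store[key] = (now, value)
        return value


# Usage with a fake clock; the "render" step stands in for the 672ms of processing
fake_now = [0.0]
render_calls = []


def render_feed():
    render_calls.append(1)
    return "feed-v%d" % len(render_calls)


cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get_or_compute("feed:1", render_feed)
fake_now[0] = 30.0
second = cache.get_or_compute("feed:1", render_feed)  # within the interval, served from cache
fake_now[0] = 61.0
third = cache.get_or_compute("feed:1", render_feed)   # interval elapsed, rebuilt
```

Keying the cache on the feed (or on the composite feed's config) would keep a config change from serving stale output until the interval expires.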