
Bob Lannon boblannon

all_records_that_are_new_or_changed_since_your_last_import = get_all_records_that_are_new_or_changed_since_your_last_import()
for record in all_records_that_are_new_or_changed_since_your_last_import:
    found = find_the_record_in_your_old_data(record)
    if not found:
        add_the_record(record)
    if found:
        change_whatever_bits_of_the_record_are_different(record)
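The pseudocode above can be made concrete with a small runnable sketch. Everything here is an assumption for illustration: the old data is a plain dict keyed by a hypothetical `id` field, and `apply_incremental_update` is a made-up name standing in for the find/add/change helpers, not anything from the original gist.

```python
# Hypothetical in-memory stand-in for "your old data", keyed by record id.
old_data = {
    1: {"id": 1, "name": "Alice", "amount": 100},
    2: {"id": 2, "name": "Bob", "amount": 250},
}


def apply_incremental_update(old_data, changed_records):
    """Add records that are new; update only the fields that differ."""
    added, updated = [], []
    for record in changed_records:
        found = old_data.get(record["id"])
        if found is None:
            # Not in the old data: add the whole record.
            old_data[record["id"]] = dict(record)
            added.append(record["id"])
        else:
            # Already present: change only the bits that are different.
            diff = {k: v for k, v in record.items() if found.get(k) != v}
            if diff:
                found.update(diff)
                updated.append(record["id"])
    return added, updated


# Assumed sample input: one changed record and one new record.
changes = [
    {"id": 2, "name": "Bob", "amount": 300},
    {"id": 3, "name": "Carol", "amount": 75},
]
added, updated = apply_incremental_update(old_data, changes)
print("added:", added, "updated:", updated)
```

Keying the old data by id keeps each lookup constant-time, so the work scales with the number of changed records instead of the size of the existing data set.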
all_your_existing_data = get_all_your_existing_data()
for chunk in record_chunks:
    pool.apply_async(scoring_function, (chunk,), callback=score_queue.put)
pool.close()
pool.join()
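A self-contained sketch of the same fan-out pattern follows. The preview does not show how `pool`, `score_queue`, `record_chunks`, or `scoring_function` are defined, so all of them are assumptions here: the scorer just sums a chunk, and `multiprocessing.pool.ThreadPool` is swapped in because it exposes the same `apply_async`/`close`/`join` interface as `multiprocessing.Pool` while running without a `__main__` guard.

```python
import queue
from multiprocessing.pool import ThreadPool


def scoring_function(chunk):
    # Hypothetical scorer: the "score" of a chunk is simply its sum.
    return sum(chunk)


def score_chunks(record_chunks, workers=4):
    """Score every chunk in a worker pool and collect results via a queue."""
    score_queue = queue.Queue()
    pool = ThreadPool(workers)
    for chunk in record_chunks:
        # The callback runs in the parent once a result is ready,
        # so a plain queue.Queue is enough to collect the scores.
        pool.apply_async(scoring_function, (chunk,), callback=score_queue.put)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all tasks and their callbacks to finish

    scores = []
    while not score_queue.empty():
        scores.append(score_queue.get())
    return scores


# Assumed sample input.
record_chunks = [[1, 2, 3], [4, 5], [6]]
scores = score_chunks(record_chunks)
print(sorted(scores))
```

Results arrive in completion order, not submission order, so anything that needs to match a score back to its chunk should have the scorer return an identifier along with the value.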
import json
import csv
from io import StringIO
import requests
from lxml import etree
resp = requests.get('http://www.barackobama.com/contribution-disclosure/')
parsed = etree.parse(StringIO(resp.text), parser=etree.HTMLParser())
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-1265484-62', 'influenceexplorer.com');
ga('send', 'pageview');
ga('create', 'UA-48789618-1', 'influenceexplorer.com', {'name': 'global'});
ga('global.send', 'pageview');
boblannon / data_commons.md
Last active August 29, 2015 14:11
outline of presentation on why and how to build a shared open data commons. also some pitfalls to avoid.
boblannon / searching_nodes.py
Created December 11, 2014 19:06
searching nodes on ES index
from elasticsearch import Elasticsearch
es = Elasticsearch(['localhost:9201',])
def node_phrase_query(node_id, key_phrase):
    qbody = {"query":
                {
                    "bool": {
                        "must": [
                            {"match_phrase": {"text": key_phrase}},
                            {"match_phrase": {"clusters": node_id}}
6018211044-774
6018210841-12253
6018211039-10619
6018211039-11153
6018210827-1520
02-055-000996
6018261174-1333
6018210841-9305
6018210827-2681
6018211170-8238
boblannon / keybase.md
Last active November 18, 2015 00:26
My keybase.io proof

Keybase proof

I hereby claim:

  • I am boblannon on github.
  • I am boblannon (https://keybase.io/boblannon) on keybase.
  • I have a public key whose fingerprint is DA32 3774 0A70 5FE1 234B E4F3 6543 90FE 7D1B 247B

To claim this, I am signing this object:

boblannon / sfm_compare_all.py
Created May 8, 2013 20:47
script to demonstrate basic python-superfastmatch pairwise comparison of all documents in a collection
from superfastmatch import client
import uuid
sfm_client = client.Client(url='http://127.0.0.1:9000/')
class Document():
    def __init__(self,title_string,content_string):
        self.doc_id = uuid.uuid4()
boblannon / residual_IDF.py
Last active December 17, 2015 05:28
quick and dirty method of using residual IDF to find keywords in a corpus. implementation of Church and Gale 1991
from collections import defaultdict
from math import log
from math import exp
import pandas as pd
# this is based on data in the form released here: http://corpora.uni-leipzig.de/
# inv_w.txt is a table of (word_id, sentence_id, offset), which lets us create an inverted
# index with offset information
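The rest of the script is not shown in this preview, so the following is only a minimal sketch of residual IDF as it is commonly defined: the observed IDF of a term minus the IDF a Poisson model would predict from its collection frequency. The function name, the token-list input format, and the sample sentences are assumptions for illustration; the gist itself reads the Leipzig corpus tables with pandas.

```python
from collections import defaultdict
from math import exp, log


def residual_idf(documents):
    """Return {term: residual IDF} for a list of tokenized documents.

    observed IDF  = -log2(df / N)
    predicted IDF = -log2(1 - exp(-cf / N))   (Poisson assumption)
    residual IDF  = observed - predicted
    where df is document frequency, cf is collection frequency,
    and N is the number of documents.
    """
    n_docs = len(documents)
    doc_freq = defaultdict(int)   # number of documents containing the term
    coll_freq = defaultdict(int)  # total occurrences across all documents
    for tokens in documents:
        for tok in tokens:
            coll_freq[tok] += 1
        for tok in set(tokens):
            doc_freq[tok] += 1

    scores = {}
    for tok, df in doc_freq.items():
        observed = -log(df / n_docs, 2)
        predicted = -log(1 - exp(-coll_freq[tok] / n_docs), 2)
        scores[tok] = observed - predicted
    return scores


# Assumed toy corpus: "budget" is bursty, "the" is spread evenly.
docs = [
    ["the", "cat"],
    ["the", "dog"],
    ["the", "bird"],
    ["budget", "budget", "budget", "the"],
]
scores = residual_idf(docs)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

Terms that cluster in a few documents occur in fewer documents than a Poisson model predicts, so they get a high residual IDF, which is what makes the measure useful for keyword finding.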