Bob Lannon boblannon

@boblannon
boblannon / residual_IDF.py
Last active Dec 17, 2015
Quick and dirty method of using residual IDF to find keywords in a corpus. Implementation of Church and Gale 1991.
View residual_IDF.py
from collections import defaultdict
from math import log
from math import exp
import pandas as pd
# this is based on data in the form released here: http://corpora.uni-leipzig.de/
# inv_w.txt is a table of (word_id, sentence_id, offset), which lets us create an inverted
# index with offset information
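The preview above stops before any scoring code, so here is a minimal sketch of the residual IDF calculation as it is commonly defined: the observed IDF minus the IDF that a Poisson model would predict from the term's total count. The function name `residual_idf` and its parameters are illustrative assumptions, not taken from the gist.

```python
from math import exp, log


def residual_idf(doc_freq, coll_freq, n_docs):
    """Residual IDF of one term (hypothetical helper, not from the gist).

    doc_freq  -- number of documents (or sentences) containing the term
    coll_freq -- total occurrences of the term in the whole corpus
    n_docs    -- total number of documents (or sentences)
    """
    # Observed IDF: -log2 of the fraction of documents containing the term.
    observed_idf = -log(doc_freq / n_docs, 2)
    # Under a Poisson model with rate coll_freq / n_docs, the chance that a
    # document contains the term at least once is 1 - exp(-rate).
    rate = coll_freq / n_docs
    expected_idf = -log(1.0 - exp(-rate), 2)
    # Terms that cluster in few documents score well above zero.
    return observed_idf - expected_idf


# A bursty term (100 occurrences packed into 10 documents) versus an
# evenly spread term (100 occurrences across 95 documents).
bursty = residual_idf(doc_freq=10, coll_freq=100, n_docs=1000)
spread = residual_idf(doc_freq=95, coll_freq=100, n_docs=1000)
```

In this sketch the bursty term gets a much higher score than the evenly spread one, which is the property that makes residual IDF useful for keyword finding.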
@boblannon
boblannon / sfm_compare_all.py
Created May 8, 2013
script to demonstrate basic python-superfastmatch pairwise comparison of all documents in a collection
View sfm_compare_all.py
from superfastmatch import client
import uuid
sfm_client = client.Client(url='http://127.0.0.1:9000/')
class Document():
    def __init__(self,title_string,content_string):
        self.doc_id = uuid.uuid4()
@boblannon
boblannon / keybase.md
Last active Nov 18, 2015
My keybase.io proof
View keybase.md

Keybase proof

I hereby claim:

  • I am boblannon on github.
  • I am boblannon (https://keybase.io/boblannon) on keybase.
  • I have a public key whose fingerprint is DA32 3774 0A70 5FE1 234B E4F3 6543 90FE 7D1B 247B

To claim this, I am signing this object:

View sampling_non-form.txt
6018211044-774
6018210841-12253
6018211039-10619
6018211039-11153
6018210827-1520
02-055-000996
6018261174-1333
6018210841-9305
6018210827-2681
6018211170-8238
@boblannon
boblannon / searching_nodes.py
Created Dec 11, 2014
searching nodes on ES index
View searching_nodes.py
from elasticsearch import Elasticsearch
es = Elasticsearch(['localhost:9201',])
def node_phrase_query(node_id, key_phrase):
    qbody = {"query":
                {
                    "bool": {
                        "must": [
                            {"match_phrase": {"text": key_phrase}},
                            {"match_phrase": {"clusters": node_id}}
View gist:a16a1e22e00b94904283
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-1265484-62', 'influenceexplorer.com');
ga('send', 'pageview');
ga('create', 'UA-48789618-1', 'influenceexplorer.com', {'name': 'global'});
ga('global.send', 'pageview');
View ofa_scrape_donors.py
import json
import csv
from io import StringIO
import requests
from lxml import etree
resp = requests.get('http://www.barackobama.com/contribution-disclosure/')
parsed = etree.parse(StringIO(resp.text), parser=etree.HTMLParser())
View pool_loop
for chunk in record_chunks:
    pool.apply_async(scoring_function, (chunk,), callback=score_queue.put)
pool.close()
pool.join()
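The loop above depends on `pool`, `score_queue`, `scoring_function` and `record_chunks`, none of which are defined in the snippet. The following self-contained sketch fills them in under stated assumptions: it uses `multiprocessing.pool.ThreadPool` and a standard `queue.Queue` so it runs without a `__main__` guard, and the summing `scoring_function` is a placeholder. The original may well have used a process pool and a different queue type.

```python
from multiprocessing.pool import ThreadPool
from queue import Queue


def scoring_function(chunk):
    # Placeholder scoring step (assumption): sum the values in the chunk.
    return sum(chunk)


def score_chunks(record_chunks, workers=2):
    score_queue = Queue()
    # ThreadPool is swapped in here for simplicity; it exposes the same
    # apply_async / close / join interface as multiprocessing.Pool.
    pool = ThreadPool(workers)
    for chunk in record_chunks:
        # The callback receives each result as soon as its task finishes.
        pool.apply_async(scoring_function, (chunk,), callback=score_queue.put)
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for all tasks and their callbacks to complete
    scores = []
    while not score_queue.empty():
        scores.append(score_queue.get())
    return scores


scores = score_chunks([[1, 2], [3, 4], [5]])
```

Results arrive in completion order rather than submission order, so sort them or attach an identifier to each chunk if order matters.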
View pseudo.py
all_records_that_are_new_or_changed_since_your_last_import = get_all_records_that_are_new_or_changed_since_your_last_import()
for record in all_records_that_are_new_or_changed_since_your_last_import:
    found = find_the_record_in_your_old_data(record)
    if not found:
        add_the_record(record)
    if found:
        change_whatever_bits_of_the_record_are_different(record)
all_your_existing_data = get_all_your_existing_data()
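The pseudocode above describes an add-or-update import. One way it could look as runnable Python is sketched below, assuming records are dicts with an `"id"` key and that the existing data is held in a dict keyed by that id. Both the `sync_records` name and the record layout are hypothetical.

```python
def sync_records(existing, incoming):
    """Apply new or changed records to `existing` in place.

    existing -- dict mapping record id to record dict (assumed layout)
    incoming -- iterable of record dicts, each with an "id" key
    Returns the ids that were added and the ids that were changed.
    """
    added, changed = [], []
    for record in incoming:
        key = record["id"]
        old = existing.get(key)
        if old is None:
            # Not found in the old data: add the record.
            existing[key] = dict(record)
            added.append(key)
        else:
            # Found: update only the fields that differ.
            diff = {k: v for k, v in record.items() if old.get(k) != v}
            if diff:
                old.update(diff)
                changed.append(key)
    return added, changed


existing = {1: {"id": 1, "name": "Ann"}, 2: {"id": 2, "name": "Bo"}}
incoming = [{"id": 2, "name": "Bob"}, {"id": 3, "name": "Cy"}, {"id": 1, "name": "Ann"}]
added, changed = sync_records(existing, incoming)
```

Record 1 is unchanged and so appears in neither list, which keeps repeated imports idempotent.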