Charles OuGuo shaldengeki
@shaldengeki
shaldengeki / gist:1f7e1304977d66ab8a03
Created January 8, 2015 17:52
feinstein-ssci-torture-report-response
Dear Charles:
Thank you for contacting me regarding the Senate Select Committee on Intelligence's report on the use of so-called enhanced interrogation techniques by the Central Intelligence Agency (CIA). I appreciate your engagement in this important issue, and I welcome the opportunity to provide you with additional information.
On December 9, 2014, the Senate Intelligence Committee, which I chaired from January 2009 to December 2014, released the executive summary and findings of its report on the CIA's former Detention and Interrogation Program. The Committee adopted the report by a bipartisan vote of 9-6 on December 13, 2012, and agreed to submit portions of the report to the Executive Branch for declassification on April 3, 2014 by a vote of 11-3. A copy of the report's declassified executive summary and findings is available on the Committee's website or at http://tinyurl.com/intelligence-report.
shaldengeki / sabretooth-llamaguy-tagd.md
Created December 25, 2013 18:29
Sabretooth and LlamaGuy's descriptions of how tagd works on ETI

Sabretooth

Anyway, we do use a custom service written in C++ that holds all of the tag data in-memory. Each tag holds an ordered set of topic IDs. The order is by last post time and then by topic ID for tie breaking; this ordering is consistent between different tags, which becomes important later. If you iterate through one of these sets you'll get all the topics tagged with that tag, from most recently updated to least recently updated.
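To make the ordering concrete, here is a minimal Python sketch of one tag's ordered set. It is not the actual C++ service: the `Tag` class, its method names, and the ascending topic-ID tie-break are all assumptions made for illustration (the description only says that topic ID breaks ties, not in which direction).

```python
import bisect


class Tag:
    """Hypothetical in-memory tag holding topics in a fixed sort order."""

    def __init__(self):
        self._keys = []   # sorted list of (-last_post_time, topic_id)
        self._times = {}  # topic_id -> last_post_time, used to find old keys

    @staticmethod
    def sort_key(last_post_time, topic_id):
        # Negating the time puts the most recently updated topic first.
        # Ties fall back to topic_id, ascending (an assumption).
        return (-last_post_time, topic_id)

    def upsert(self, topic_id, last_post_time):
        """Add a topic, or move it when its last post time changes."""
        old_time = self._times.get(topic_id)
        if old_time is not None:
            old_key = self.sort_key(old_time, topic_id)
            del self._keys[bisect.bisect_left(self._keys, old_key)]
        bisect.insort(self._keys, self.sort_key(last_post_time, topic_id))
        self._times[topic_id] = last_post_time

    def topics(self):
        """Topic IDs from most recently updated to least recently updated."""
        return [topic_id for _, topic_id in self._keys]


tag = Tag()
tag.upsert(101, 1000)
tag.upsert(102, 3000)
tag.upsert(103, 2000)
tag.upsert(104, 2000)  # same time as 103, so topic ID breaks the tie
tag.upsert(101, 4000)  # a new post bumps topic 101 to the front
print(tag.topics())    # [101, 102, 103, 104]
```

Because every tag would use the same `sort_key`, any two tags list their shared topics in the same relative order, which is the consistency property the paragraph above mentions.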

Tags also define and implement a very simple cursor interface. This interface is basically three functions: "advance the cursor by one topic", "advance the cursor until it's pointing to or past this topic (passing in a reference topic)", and "give me the currently pointed-to topic". When you are talking to the service, you can say "skip 50 topics and then give me 50 topics", and it will advance the cursor 50 times and throw away the result, and then advance the cursor 50 more times while returning each result. Pretty straightforward.
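The three cursor operations and the skip-then-take request can be sketched in Python as follows. This is an illustration only, not the service's real code: `TagCursor`, `advance`, `seek`, `current`, and `page` are hypothetical names, and topics are assumed to be stored as `(-last_post_time, topic_id)` tuples in a sorted list so that newer topics come first.

```python
import bisect


class TagCursor:
    """Hypothetical cursor over one tag's sorted list of topic keys."""

    def __init__(self, keys):
        self._keys = keys  # sorted list of (-last_post_time, topic_id)
        self._pos = 0

    def current(self):
        """Return the currently pointed-to topic key, or None when exhausted."""
        if self._pos < len(self._keys):
            return self._keys[self._pos]
        return None

    def advance(self):
        """Advance the cursor by one topic."""
        if self._pos < len(self._keys):
            self._pos += 1

    def seek(self, reference):
        """Advance until the cursor points to or past the reference topic."""
        # Binary search from the current position; the cursor never moves back.
        self._pos = bisect.bisect_left(self._keys, reference, self._pos)


def page(cursor, skip, limit):
    """Skip `skip` topics, discarding them, then collect up to `limit` topics."""
    for _ in range(skip):
        if cursor.current() is None:
            break
        cursor.advance()
    results = []
    while len(results) < limit and cursor.current() is not None:
        results.append(cursor.current())
        cursor.advance()
    return results


# Ten made-up topics; topic N was last posted in at time N * 100.
keys = sorted((-topic_id * 100, topic_id) for topic_id in range(1, 11))

first = page(TagCursor(keys), skip=3, limit=4)
print([topic_id for _, topic_id in first])  # [7, 6, 5, 4]

cursor = TagCursor(keys)
cursor.seek((-550, 0))   # no such topic exists, so land on the next one after it
print(cursor.current())  # (-500, 5)
```

A linear scan would also satisfy the "advance until at or past" contract; the binary search in `seek` is a design choice of this sketch, made possible by the list being sorted.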

The cursor interface is also implemen

shaldengeki / gist:5855100
Created June 25, 2013 01:04
markov post model. horribly unoptimized but you know whatever
import bs4
import re
import random
APOSTROPHE_REGEX = re.compile("'")
NON_ALPHANUMERIC_REGEX = re.compile('[^a-zA-Z0-9]+')
def strip_tags(text, valid_tags):
    text = bs4.BeautifulSoup(text)
    while len(text.contents) == 1:
shaldengeki / userSimilarities-v0.3.py
Created September 18, 2012 14:29
user similarities map reduce job. implementing smaller cartesian product optimization.
from mrjob.job import MRJob
from math import sqrt
try:
    from itertools import combinations
except ImportError:
    def combinations(iterable, r):
        """
        Implementation of itertools combinations method.
        Re-implemented here because of import issues
shaldengeki / userSimilarities-v0.2.py
Created September 17, 2012 16:27
user similarities map reduce job. takes into account topics where only one user out of a pairing has posted.
from mrjob.job import MRJob
from math import sqrt
try:
    from itertools import combinations
except ImportError:
    def combinations(iterable, r):
        """
        Implementation of itertools combinations method.
        Re-implemented here because of import issues
shaldengeki / userSimilarities-v0.1.py
Created September 14, 2012 22:27
user similarities map reduce job. this works, but disregards topics where only one user has posted.
from mrjob.job import MRJob
'''
Some constants.
'''
PRIOR_CORRELATION = 0
PRIOR_COUNT = 10
MIN_INTERSECTION = 20
MIN_TOPICS = 20
shaldengeki / userSimilarities-faulty.py
Created September 14, 2012 20:37
user similarities map reduce job. this doesn't work!
from mrjob.job import MRJob
# Put metrics here.
# add metrics to the below function.
def calculate_metrics(v1, v2):
    '''
    Calculates similarity metrics between two unfilled vectors.
    Returns [metric1, metric2, ...]
    '''