Gavin Hackeling gavinmh

@gavinmh
gavinmh / latent-dirichlet-allocation.md
Created October 6, 2013 15:55
Latent Dirichlet Allocation

a

@gavinmh
gavinmh / node-based-semantic-similarity.md
Created October 5, 2013 00:16
Node-based Semantic Similarity Measures

Node-Based Semantic Similarity Measures

The following node-based semantic similarity measures are based on the information content of the lowest common subsumer of two words.

  • Information content measures how specific a term is. It is the negative log of the probability of finding the term in a given corpus, IC(c) = -log P(c), so rare terms carry more information than common ones.
  • The lowest common subsumer is the most specific common ancestor of two words in a lexical taxonomy such as WordNet. For example, the words "cat" and "dog" might have the common ancestors "animal" and "mammal". "Mammal" is their lowest common subsumer, because it is closer to both "cat" and "dog" than "animal" is. Note that if you picture WordNet as a tree with "entity" as the root node, "lowest common subsumer" can read as a misnomer: it is the common ancestor that is farthest from the root.

These measures depend on the specific corpus used to estimate the information content.
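
As a rough illustration of the two definitions above, the sketch below finds the lowest common subsumer in a tiny hand-made taxonomy and computes its information content as -log P(c). The taxonomy, the word counts, and the function names are all hypothetical stand-ins chosen for this example; a real setup would use WordNet and counts from an actual corpus.

```python
import math

# Hypothetical toy taxonomy: child -> parent ("entity" is the root).
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "bird": "animal",
    "cat": "mammal",
    "dog": "mammal",
    "sparrow": "bird",
}

# Hypothetical corpus counts for the leaf words only.
COUNTS = {"cat": 30, "dog": 50, "sparrow": 20}


def ancestors(word):
    """Return the path from word up to the root, starting with word itself."""
    path = [word]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_subsumer(a, b):
    """Return the most specific node that is an ancestor of both a and b."""
    b_ancestors = set(ancestors(b))
    for node in ancestors(a):  # walks from most specific to least specific
        if node in b_ancestors:
            return node
    return None


def concept_count(concept):
    """Count occurrences of a concept and of every word beneath it."""
    return sum(n for word, n in COUNTS.items() if concept in ancestors(word))


def information_content(concept):
    """IC(c) = -log P(c), with P(c) estimated from the toy counts."""
    total = sum(COUNTS.values())
    return -math.log(concept_count(concept) / total)


lcs = lowest_common_subsumer("cat", "dog")
print(lcs, information_content(lcs))
```

With these made-up counts, "mammal" covers 80 of 100 occurrences, so it carries more information than "animal", which covers all of them. Changing the counts changes the scores, which is why the measures depend on the corpus.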

Resnik Similarity

@gavinmh
gavinmh / gist:6834934
Created October 5, 2013 00:16
Bayes' Theorem By/For Idiots
# Bayes' Theorem
Let's pretend that you wish to find the probability that two events, A and B, both occur.
If A and B are independent events, then the probability that A and B both occur is
P(A)P(B).
However, A and B might be related events. If they are not independent, the probability that A and B both occur is
P(A)P(B|A)
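
A small worked example may help here. The sketch below applies the product P(A)P(B|A) to two cases picked purely for illustration (they are not from the original note): two fair coin flips, where the events are independent so P(B|A) = P(B), and drawing two aces from a 52-card deck without replacement, where they are not. The helper name `joint_probability` is hypothetical.

```python
from fractions import Fraction


def joint_probability(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B|A)."""
    return p_a * p_b_given_a


# Independent events: two fair coin flips both land heads.
# Because the flips are independent, P(B|A) is simply P(B) = 1/2.
both_heads = joint_probability(Fraction(1, 2), Fraction(1, 2))

# Dependent events: draw two aces from a 52-card deck without replacement.
# After one ace is drawn, 3 aces remain among 51 cards.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
both_aces = joint_probability(p_first_ace, p_second_ace_given_first)

print(both_heads)  # 1/4
print(both_aces)   # 1/221
```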
@gavinmh
gavinmh / mrlda-process.md
Last active December 24, 2015 04:19
mrlda-hadoop

Setup

bin/hadoop dfs -mkdir /home/hduser/raw_text
bin/hadoop dfs -mkdir /home/hduser/index
bin/hadoop dfs -mkdir /home/hduser/output
bin/hadoop dfs -copyFromLocal /home/gavin/dev/Mr.LDA/data/corpus1.txt /home/hduser/raw_text

Tokenize and Index

bin/hadoop jar /home/gavin/dev/Mr.LDA/bin/Mr.LDA-0.0.1.jar cc.mrlda.ParseCorpus -input /home/hduser/raw_text -output /home/hduser/indexed -mapper 2 -reducer 1
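
To give a feel for what a tokenize-and-index pass produces, here is a minimal single-machine sketch in Python. It is not Mr.LDA's implementation and does not reproduce its output format; the function names, the regular-expression tokenizer, and the in-memory dictionaries are all assumptions made for illustration. It assumes one document per list element.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to an integer id and each document to {term_id: count}."""
    vocabulary = {}
    indexed = []
    for text in documents:
        counts = Counter(tokenize(text))
        doc = {}
        for term, n in counts.items():
            # Assign the next free id the first time a term is seen.
            term_id = vocabulary.setdefault(term, len(vocabulary))
            doc[term_id] = n
        indexed.append(doc)
    return vocabulary, indexed


vocabulary, indexed = build_index(["The Big Bang Theory", "the big show"])
print(vocabulary)
print(indexed)
```

A topic model then works on the integer term counts rather than on the raw text.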

'Big Bang Theory' Brings Stephen Hawking on as Guest Star 'The Big Bang Theory' is getting a visit from Stephen Hawking. The renowned theoretical physicist will guest-star on the April 5 episode of the CBS comedy, the network said Monday. In the cameo, Hawking visits uber-geek Sheldon Cooper (Jim Parsons) at work 'to share his beautiful mind with his most ardent admirer,' according to CBS. Executive producer Bill Prady said that having Hawking on the show had long been a goal, though it seemed unattainable. When people would ask us who a dream guest star' for the show would be, we would always joke and say Stephen Hawking knowing that it was a long shot of astronomical proportions, Prady said. In fact, we're not exactly sure how we got him. It's the kind of mystery that could only be understood by, say, a Stephen Hawking. Hawking, known for his book A Brief History of Time, has appeared on television comedies before, albeit in voice work. Hawking has done a guest spot on 'Futurama' and appeared as himself on
U.N. Security Council unanimously passes Syria chemical weapons resolution UNITED NATIONS — The U.N. Security Council voted unanimously late Friday to approve an ambitious plan requiring Syria to surrender its chemical weapons for destruction, the first major diplomatic milestone reached more than two years after the start of the Syrian conflict. The resolution, adopted by a vote of 15 to 0, does not spell out what penalties the government in Damascus might face if it doesn’t comply. U.S. and European diplomats conceded that some of their toughest wording aimed at compelling Syria to obey the council’s demands and holding perpetrators to account for using chemical weapons was removed from the final resolution at Russia’s insistence. Still, the measure constituted the first legally binding action on Syria from the Security Council since the government of Syrian President Bashar al-Assad launched a brutal crackdown on peaceful protesters in early 2011.
Apple iPhone 5s I pull my iPhone out of my pocket, tap the
@gavinmh
gavinmh / rcv1-topics.txt
Created August 16, 2013 21:45
RCV1 Topics
CCAT: CORPORATE/INDUSTRIAL
C11: STRATEGY/PLANS
C12: LEGAL/JUDICIAL
C13: REGULATION/POLICY
C14: SHARE LISTINGS
C15: PERFORMANCE
C151: ACCOUNTS/EARNINGS
C1511: ANNUAL RESULTS
C152: COMMENT/FORECASTS
C16: INSOLVENCY/LIQUIDITY
{
"IAB1": "Arts & Entertainment",
"IAB1-1": "Arts & Entertainment::Books & Literature",
"IAB1-2": "Arts & Entertainment::Celebrity Fan/Gossip",
"IAB1-4": "Arts & Entertainment::Humor",
"IAB1-5": "Arts & Entertainment::Movies",
"IAB1-6": "Arts & Entertainment::Music",
"IAB1-7": "Arts & Entertainment::Television",
"IAB1-3": "Arts & Entertainment::Fine Art",
"IAB2": "Automotive",
@gavinmh
gavinmh / bing_search.py
Created June 15, 2013 23:05
Bing Search API Python example
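import urllib
import json
# Replace <YOURKEY> with your own Bing Search API account key.
bing_account_key = '<YOURKEY>'
# The account key is embedded in the credentials part of the base URL.
search_base_url = \
'https://user:%s@api.datamarket.azure.com/Bing/SearchWeb/Web?' \
% bing_account_key
# Search terms and the number of results to request.
query = 'les paul'
num_results = 40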

Approximate Algorithm Completion Times, N=100

10^-6 seconds
O(log(N)) 10^-7 seconds
O(N)