
Brendan O'Connor brendano

@brendano
brendano / NOTES.md
Created Jun 12, 2012
Patches to compile ocropus on Mac OSX 10.6 -- see explanation at NOTES.md at bottom https://gist.github.com/2919800#file_notes.md

by Brendan O'Connor (http://brenocon.com)

I got all of ocropus to compile on Mac OSX 10.6, though I haven't tested it much yet. This is the current version inside the ocropus hg repository, so approximately version 0.5, with iulib perhaps 0.4ish.

See ocroinst.osx -- the first file in "everything_besides_iulib.diff" -- for line-by-line instructions; the script may even just run. We're assuming Homebrew and pip (see the comments).

brendano / sim.py
Created Mar 24, 2012
Matrix-tree theorem for CRF dependency-arc marginals via matrix inversion, from Koo et al. 2007
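For reference, here is the single-root construction as I read Koo et al. (2007), written for words 1..n with nonnegative root weights r_m and arc weights w_{hm} (e.g. exponentiated scores). The root probabilities in the output sum to 1, which fits this single-root variant; that is an inference, and sim.py may set things up differently.

```latex
L_{hm} = \begin{cases} \sum_{h' \neq m} w_{h'm} & h = m \\ -w_{hm} & h \neq m \end{cases}
\qquad
\hat{L}_{hm} = \begin{cases} r_m & h = 1 \\ L_{hm} & h > 1 \end{cases}
\qquad
Z = \det \hat{L}

P(\mathrm{root} \to m) = r_m \, [\hat{L}^{-1}]_{m1}

P(h \to m) = w_{hm} \left( \mathbb{1}[m \neq 1] \, [\hat{L}^{-1}]_{mm} - \mathbb{1}[h \neq 1] \, [\hat{L}^{-1}]_{mh} \right)
```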
In [9]: run -i sim
for each word: prob connect to root
[ 0.28026994 0.16394082 0.10616135 0.17767563 0.12675216 0.1452001 ]
for (head,child) entries: P(head <- child)
[[ 0. 0.12563458 0.27335659 0.17451717 0.24229165 0.24617475]
[ 0.25410789 0. 0.21883649 0.09361327 0.17785164 0.2048422 ]
[ 0.12280784 0.12047786 0. 0.13921346 0.12119944 0.11342211]
[ 0.11823039 0.27609723 0.15487263 0. 0.22541093 0.15197973]
[ 0.11249058 0.1968143 0.09766069 0.21988013 0. 0.13838112]
[ 0.11209336 0.11703521 0.14911225 0.19510033 0.10649417 0. ]]
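A self-contained sketch of the same kind of computation in pure Python (no NumPy, so it does not reproduce sim.py's code or its random scores). `tree_marginals` builds the Laplacian of the arc weights, overwrites its first row with the root weights (the single-root variant, assumed here because the printed root probabilities sum to 1), inverts it, and reads off the marginals; `brute_force` enumerates every tree for small n as a check. All function names are hypothetical.

```python
import itertools
import math
import random


def invert(a):
    """Invert a small square matrix (list of lists) by Gauss-Jordan elimination."""
    n = len(a)
    # Augment [A | I]; row-reduce the left half to I, leaving A^{-1} on the right.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: swap in the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def tree_marginals(root, w):
    """Single-root arc marginals via the matrix-tree theorem (sketch).

    root[m]: nonnegative weight of ROOT -> word m
    w[h][m]: nonnegative weight of arc head h -> child m (diagonal ignored)
    Returns (p_root, p_arc): p_root[m] = P(ROOT -> m), p_arc[h][m] = P(h -> m).
    """
    n = len(root)
    # Laplacian: diagonal = total incoming weight of m, off-diagonal = -w[h][m].
    lap = [[sum(w[k][m] for k in range(n) if k != m) if h == m else -w[h][m]
            for m in range(n)] for h in range(n)]
    # Single-root variant: replace row 0 with the root weights; det(lap) = Z.
    lap[0] = list(root)
    inv = invert(lap)
    p_root = [root[m] * inv[m][0] for m in range(n)]
    p_arc = [[0.0] * n for _ in range(n)]
    for h in range(n):
        for m in range(n):
            if h != m:
                # w[h][m] appears in lap[m][m] (unless row 0) and lap[h][m] (unless row 0).
                d = (inv[m][m] if m != 0 else 0.0) - (inv[m][h] if h != 0 else 0.0)
                p_arc[h][m] = w[h][m] * d
    return p_root, p_arc


def brute_force(root, w):
    """Enumerate every single-root tree (head -1 means ROOT) to check the formula."""
    n = len(root)
    z = 0.0
    p_root = [0.0] * n
    p_arc = [[0.0] * n for _ in range(n)]
    for heads in itertools.product(range(-1, n), repeat=n):
        if heads.count(-1) != 1 or any(heads[m] == m for m in range(n)):
            continue
        # Reject cycles: following heads from every word must reach ROOT.
        acyclic = True
        for m in range(n):
            seen, cur = set(), m
            while cur != -1 and cur not in seen:
                seen.add(cur)
                cur = heads[cur]
            acyclic = acyclic and cur == -1
        if not acyclic:
            continue
        score = math.prod(root[m] if heads[m] == -1 else w[heads[m]][m] for m in range(n))
        z += score
        for m in range(n):
            if heads[m] == -1:
                p_root[m] += score
            else:
                p_arc[heads[m]][m] += score
    return [x / z for x in p_root], [[x / z for x in row] for row in p_arc]
```

With n = 4 and weights drawn as `math.exp(random.gauss(0, 1))`, the two functions should agree to within floating-point error. Since every word has exactly one head, P(ROOT → m) plus the sum over h of P(h → m) is 1 for each child m, which matches the column sums of the matrix printed above.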
brendano / analysis.txt
Created Jun 14, 2011
How much text versus metadata is in a tweet?
Brendan O'Connor (brenocon.com), 2011-06-13
http://twitter.com/brendan642/status/80473880111742976
What does it mean to compare the amount of text versus metadata?
Let's start with raw size of the data that comes over the wire from Twitter.
## Get tweets out of a sample stream archive.
## (e.g. curl http://stream.twitter.com/1/statuses/sample.json)
% cat tweets.2011-05-19 | grep -P '"text":' | head -100000 > 100k_tweets
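The analysis is cut off here. One way to carry out the raw-size comparison, sketched under the assumption that each line of `100k_tweets` is one JSON object, is to sum the UTF-8 bytes of each tweet's top-level `"text"` field against the bytes of the whole line. This is not necessarily the author's method, and `text_vs_total_bytes` is a hypothetical name. Note that the wire JSON may escape non-ASCII characters as `\uXXXX`, so text is measured after decoding.

```python
import json


def text_vs_total_bytes(lines):
    """Return (text_bytes, total_bytes) summed over tweets that have a "text" field."""
    text_bytes = total_bytes = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines from the stream
        if not isinstance(tweet, dict) or "text" not in tweet:
            continue  # e.g. delete notices carry no top-level text
        # Raw size on the wire versus the decoded tweet text, both as UTF-8 bytes.
        total_bytes += len(line.encode("utf-8"))
        text_bytes += len(tweet["text"].encode("utf-8"))
    return text_bytes, total_bytes
```

For example, `text_vs_total_bytes(open("100k_tweets", encoding="utf-8"))` returns a pair whose ratio is the fraction of raw bytes that are tweet text.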