Dougal J. Sutherland dougalsutherland

View cosine basis linear classifier with normal noise.ipynb
View Cosine, sine of normal distribution.ipynb
View sigmoid normal approximation.ipynb
dougalsutherland / update_imdb.py
Last active Jan 1, 2016
Updates IMDB dumps without making on-disk uncompressed copies.
View update_imdb.py
#!/usr/bin/env python
'''
Updates an IMDB data dump (http://www.imdb.com/interfaces) fully automatically,
without making any uncompressed copies on disk, because those are big.
Note that this runs slower than it could because, well, it's Python. It works
with PyPy, though. I haven't done any profiling to see where the slowdowns
are.
'''
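The docstring above describes working on compressed dumps without ever writing an uncompressed copy to disk. The rest of the script is not shown here, so the following is only a minimal sketch of that general idea, assuming Python 3 and assuming the update can be expressed as a line-by-line transformation. The helper name `rewrite_gzip_stream` and its `transform` callback are hypothetical and do not come from `update_imdb.py`.

```python
import gzip
import os


def rewrite_gzip_stream(src_path, dst_path, transform):
    """Stream a gzip file through `transform`, one line at a time.

    Hypothetical helper, not taken from update_imdb.py. `transform`
    receives each line as bytes and returns the bytes to write, or
    None to drop the line. Only compressed data ever touches the disk.
    """
    tmp_path = dst_path + '.tmp'
    with gzip.open(src_path, 'rb') as src, gzip.open(tmp_path, 'wb') as dst:
        for line in src:  # decompressed lazily, line by line
            out = transform(line)
            if out is not None:
                dst.write(out)
    # Swap the finished file into place only after it is fully written.
    os.replace(tmp_path, dst_path)
```

Writing to a temporary file and renaming it at the end means an interrupted run leaves the previous output untouched, at the cost of briefly holding two compressed copies.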
View kl estimate known from samples.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>[]</title>
<style type="text/css">
.clearfix{*zoom:1;}.clearfix:before,.clearfix:after{display:table;content:"";line-height:0;}
.clearfix:after{clear:both;}
.hide-text{font:0/0 a;color:transparent;text-shadow:none;background-color:transparent;border:0;}
.input-block-level{display:block;width:100%;min-height:30px;-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;}
View gmail_hammer.py
#!/usr/bin/env python
from email.utils import parseaddr
import re
from imapclient import IMAPClient, SEEN
DEFAULT_SEARCH = "(from:*@*.gov OR from:*@*.edu OR from:*@*.mil OR from:*@tiaa-cref.mkl-et.com) AND NOT label:!!!unspam"
DEFAULT_RE = r'(.*\.(gov|edu|mil)$)|(.*@tiaa-cref\.mkl-et\.com$)'
DEFAULT_TARGET_LABELS = '!!!unspam'
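The preview imports `parseaddr` and `re` and defines `DEFAULT_RE`, which suggests that sender addresses are extracted from a From header and checked against the pattern. The rest of the script is not shown, so this is a sketch under that assumption. The function name `sender_matches` is hypothetical, and the pattern is written as a raw string so that `\.` reaches the regex engine unchanged.

```python
from email.utils import parseaddr
import re

# Same pattern as in the preview, written as a raw string.
DEFAULT_RE = r'(.*\.(gov|edu|mil)$)|(.*@tiaa-cref\.mkl-et\.com$)'


def sender_matches(from_header, pattern=DEFAULT_RE):
    """Return True if the address in a From header matches `pattern`.

    Hypothetical helper, not taken from gmail_hammer.py.
    """
    # parseaddr splits 'Jane Doe <jane@example.edu>' into (name, address).
    _name, address = parseaddr(from_header)
    if not address:
        return False
    return re.match(pattern, address.lower()) is not None
```

For example, `sender_matches('Jane Doe <jane@cs.example.edu>')` is true because the address ends in `.edu`, while `sender_matches('spam@example.com')` is false.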
View comment_counts.tsv
post_date	post_title	comment_count	group_concat(terms.name)
2007-03-02 14:24:27	Economics hires two tenure-track professors	0	dtraber
2007-05-03 08:41:50	Pericles Project grants	0	lstokes
2007-05-03 08:43:43	ITS Director Judy Downing to Retire	0	mskorpe1
2007-05-02 08:44:58	Student Council report and election results	0	lstokes
2007-05-02 08:45:55	Worth Director Linda Echols reflects on her time at Swarthmore and prepares for retirement	0	mskorpe1
2007-05-02 08:47:10	"Blackness" only one of many competing identities in the Black diaspora, says historian Neptune	0	lstokes
2007-05-07 10:20:08	Two juniors win Rockefeller Teaching Fellowships	0	lstokes
2007-05-07 10:21:58	Gazette Schedule	0	dailygazette
2007-05-08 08:35:18	What's the structure behind Papazian?	0	lstokes
View discrete divergences.ipynb
View GMM divergences.ipynb
View dump_rdata.py
#!/usr/bin/env python
import numpy as np
import sys
from six import iteritems
from six.moves import zip as izip
from six.moves import xrange
from itertools import chain, repeat, islice