chris wiggins chrishwiggins

there's probably more consensus building in academia than in
the real world, particularly since some of the participants are
tenured: you have to get along with them for years, whereas in
the real world you can just fire them or they'll just find a new
job. tenure makes the market for faculty extremely illiquid.
background
outline
data science
practices (managerial)
reframing questions as ML
better wrong than "nice"
better science:
scribd URL: http://www.scribd.com/doc/224608514/The-Full-New-York-Times-Innovation-Report
0 (cover)
1-2 executive summary
- (general)
3-5 introduction
- NYT "is winning at journalism"
- falling behind in "...the art and science of getting our journalism to readers"
4 {graphic} vast print & digital audience
frequently asked question:
Q: I would like to ask your advice about preparing for a role in data science
A:
my advice would be to put together a portfolio of projects on GitHub
evidencing that you know how to
- get data (e.g., via wget/curl)
# count words on the course page, most frequent first, dropping words that appear only once
$ lynx -dump -nolist -nobold -nocolor -noreverse https://github.com/ledeprogram/courses/tree/master/algorithms \
    | perl -pe 's/[^[:ascii:]]/+/g' \
    | tr ',:; /.()?"#[0-9]-' '\n' | tr '[:upper:]' '[:lower:]' \
    | grep '[a-z]' | sort | uniq -c | sort -nr | grep -v '^ *1 '
25 of
20 literacy
17 to
17 in
16 o
16 data
16 a
15 algorithms
14 the
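For anyone who would rather do this in Python, the counting part of the pipeline above can be sketched with the standard library's `re` and `collections.Counter`. This is a minimal sketch, not the original author's code: the fetch step (`lynx -dump`) is left out, and `word_counts` and its `min_count` parameter are hypothetical names chosen for illustration.

```python
import re
from collections import Counter


def word_counts(text, min_count=2):
    """Return (count, word) pairs, most frequent first.

    Mirrors the shell pipeline step by step (names here are hypothetical):
    non-ASCII -> '+', split on the same punctuation/digits/whitespace set
    passed to tr, lowercase, keep tokens containing a letter, count, and
    drop words seen fewer than min_count times (the `grep -v '^ *1 '` step).
    """
    text = re.sub(r"[^\x00-\x7f]", "+", text)
    tokens = re.split(r'[,:;\s/.()?"#\[\]0-9-]+', text.lower())
    counts = Counter(t for t in tokens if re.search(r"[a-z]", t))
    return [(n, w) for w, n in counts.most_common() if n >= min_count]
```

On a made-up sample, `word_counts("Data literacy and data algorithms. Algorithms, data.")` returns `[(3, 'data'), (2, 'algorithms')]`; words seen once ("literacy", "and") are dropped, as in the shell version.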
BuzzFeed has technology at its core.
Its 100+ person tech team has created world-class systems for
analytics,
advertising, and
content management.
Engineers are 1st class citizens.
Everything is built for mobile devices from the outset.
Internet native formats like
lists,
tweets,
The Bayesian approach to model selection is a subject you'll
like. The basic idea is to compute the "Bayes Factor":
http://en.wikipedia.org/wiki/Bayes_factor .
As the page says "Bayesian inference has been put forward as a
theoretical justification for and generalization of Occam's
razor".
( http://en.wikipedia.org/wiki/Occam%27s_razor )
The Bayes factor can be approximated under some assumptions,
leading to a simple penalized maximum likelihood called the
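One well-known approximation along these lines is the Bayesian information criterion, BIC = k ln n - 2 ln L_hat, which rests on ln p(D | M) being approximately -BIC / 2. The sketch below compares it with an exact Bayes factor on a toy problem. Everything in it is a hypothetical choice for illustration: the two models (a fair coin with no free parameters versus a coin of unknown bias with a uniform prior) and all function names. The exact marginal likelihood of the biased-coin model is a Beta function, computed with `math.lgamma`.

```python
import math


def log_marginal_fair(heads, tails):
    """log p(data | M0): fair coin, no free parameters (k = 0)."""
    return (heads + tails) * math.log(0.5)


def log_marginal_biased(heads, tails):
    """log p(data | M1): bias p ~ Uniform(0, 1), integrated out exactly.

    The integral of p^h (1 - p)^t over [0, 1] is the Beta function
    B(h + 1, t + 1) = h! t! / (h + t + 1)!.
    """
    return (math.lgamma(heads + 1) + math.lgamma(tails + 1)
            - math.lgamma(heads + tails + 2))


def log_lik(heads, tails, p):
    """Log-likelihood of one observed sequence; treats 0 * log(0) as 0."""
    ll = 0.0
    if heads:
        ll += heads * math.log(p)
    if tails:
        ll += tails * math.log(1.0 - p)
    return ll


def bic(max_log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L_hat."""
    return k * math.log(n) - 2.0 * max_log_lik


def log_bayes_factor(heads, tails):
    """Return (exact, BIC-approximate) log Bayes factor of M0 over M1.

    Positive values favor the fair coin.
    """
    n = heads + tails
    exact = log_marginal_fair(heads, tails) - log_marginal_biased(heads, tails)
    bic_fair = bic(log_lik(heads, tails, 0.5), k=0, n=n)
    bic_biased = bic(log_lik(heads, tails, heads / n), k=1, n=n)  # MLE p = h/n
    approx = -(bic_fair - bic_biased) / 2.0
    return exact, approx
```

With 50 heads and 50 tails both numbers come out positive: the extra parameter buys no better fit, so the simpler model wins, which is the Occam's-razor behavior described above. With 80 heads and 20 tails both are strongly negative and nearly equal. The BIC drops terms that do not grow with n, so on small samples the two can differ noticeably.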
learning mixtures of ranking models
consistency of spectral partitioning of uniform hypergraphs under
optimal rates for $k$-nn density and mode estimation
bayesian inference for structured spike and slab priors
grouping-based low-rank video completion and 3d reconstruction
tightening after relax: minimax-optimal sparse pca in polynomial
belief propagation recursive neural networks
communication efficient distributed machine learning with the
on the statistical consistency of plug-in classifiers for
distributed context-aware bayesian posterior sampling via
- tukey's 1962 paper on the tension between
mathematical statistics and applied computational statistics
http://web.stanford.edu/~gavish/documents/Tukey_the_future_of_data_analysis.pdf
- william cleveland's 2001 "data science" paper
http://www.datascienceassn.org/sites/default/files/Data%20Science%20An%20Action%20Plan%20for%20Expanding%20the%20Technical%20Areas%20of%20the%20Field%20of%20Statistics.pdf
- interview w/leo breiman, heretical statistician
http://projecteuclid.org/euclid.ss/1009213290
Q: I want to sign up for 3900 (supervised research). How many
credits will you give me?
A: If you want to take 3900 with me, we need to agree on a
contract, and this contract needs to be finalized before the start
of the semester. The contract will stipulate:
- Who is the scientific advisor (if not me)
- What is the deliverable (e.g., technical report, oral report)