Spencer justmytwospence

View GitHub Profile
@justmytwospence
justmytwospence / TextMining.ipynb
Created March 13, 2014 05:16
Various applications of text mining. Includes multinomial naive Bayes, TFIDF vectorizer, cross-validation over pipelines, confusion matrix of classifications, cosine distances between documents, collocation matrices, graph visualization with NetworkX and Gephi, various uses of textutils.py, and connecting to the Twitter API.
justmytwospence / Vectorization.ipynb
Created February 26, 2014 05:23
Vectorization for text mining. Includes the Porter Stemmer, a custom regular expression tokenizer, and the sklearn term frequency - inverse document frequency vectorizer.
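Since the notebook itself is not shown here, the following is only a minimal sketch of the term frequency - inverse document frequency idea in plain Python. It is not the notebook's code and does not use sklearn's vectorizer, whose default weighting also applies smoothing and normalization; the `tokenize` and `tfidf` helpers and the toy documents are hypothetical, and no stemming is applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of ASCII letters (a simple regex tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


docs = ["the cat sat", "the dog sat", "the cat ran"]
vectors = tfidf(docs)
```

A term that appears in every document (here "the") gets weight zero, while a term unique to one document gets the largest weight.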
<!DOCTYPE html>
<html>
<head>
<title>#!title#</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="http://netdna.bootstrapcdn.com/bootstrap/3.0.0/css/bootstrap.min.css">
justmytwospence / Logit.ipynb
Created February 17, 2014 06:47
Logistic regression with scikit-learn. Second homework for MSAN 630. Learning curves, shuffle splits, L1 regularization, and more.
justmytwospence / Rmagic.ipynb
Last active August 29, 2015 13:55
IPython is not just for Python! Master your interdisciplinary data-science-fu with IPython magic! View here: http://nbviewer.ipython.org/gist/justmytwospence/8750427
justmytwospence / SVM.ipynb
Created February 1, 2014 03:55
First homework assignment for MSAN 630. All about training SVMs, including parameter grid search, leave-one-out cross-validation, k-fold cross-validation, two-dimensional decision boundary plotting, RBF kernels, and more. View it here: http://nbviewer.ipython.org/gist/justmytwospence/8747722.
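As a rough illustration of how k-fold and leave-one-out cross-validation partition a dataset, here is a minimal sketch in plain Python. It is not taken from the notebook; the `k_fold_indices` helper is hypothetical, and it uses contiguous, unshuffled folds for simplicity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # The first (n_samples % k) folds get one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Two folds over five samples.
folds = list(k_fold_indices(5, 2))

# Leave-one-out is the special case where k equals the number of samples.
loo = list(k_fold_indices(4, 4))
```

Each sample lands in exactly one test fold, so every observation is used for evaluation once and for training k - 1 times.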
% FontAwesome (http://fortawesome.github.com/Font-Awesome/) bindings for (Xe)LaTeX
% Author: Honza Ustohal <honza@egoistic.biz>
% A few icons added by: Spencer Boucher <spencer.g.boucher@gmail.com>
%
% Translation of FontAwesome's private range characters into XeTeX symbols. All icons are camel-cased and prefixed with 'fa', i.e. what was .icon-align-center in the CSS version of FontAwesome becomes \faAlignCenter
% This might be reworked into a full blown package in the near future
%
% Prerequisite:
% XeLaTeX, FontAwesome installed as a system font accessible by XeLaTeX
%
justmytwospence / direct-download.php
Last active January 3, 2016 17:29
This PHP script initiates a direct download of a file (PDF, etc.). If the file is not a PDF, change the MIME type in the Content-Type header accordingly (e.g. "application/xml"). To use it, create a hyperlink that points directly to this PHP script.
<?php
// Path of the file on the server, and the name the browser should save it under.
$actual_file_name = "path/to/foo.pdf";
$saved_file_name = "foo-bar.pdf";
header("Content-Type: application/pdf");
header("Content-Disposition: attachment; filename=\"$saved_file_name\"");
header("Content-Length: " . filesize($actual_file_name));
readfile($actual_file_name);
exit;
justmytwospence / stratified-sample.R
Last active December 31, 2015 04:59
I was surprised to find that R doesn't have a base function for stratified random sampling. There's not even a well-known package I could find that does this in a straightforward way. So here's my own. It is essentially a wrapper for a ddply call that samples each subset and then combines them. If the size argument is less than 1, it will be int…
stratified_sample <- function(df, size = .5, .by, seed = 37L) {
  require(plyr)
  set.seed(seed)
  df.sample <- ddply(df, .by,
                     function(x) {
                       # A size below 1 is converted to that fraction of the stratum's rows.
                       if (size < 1) { size <- size * nrow(x) }
                       return(x[sample(nrow(x), size = size), ])
                     },
                     .progress = 'text')
  return(df.sample)
}
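For comparison, the same split-sample-combine idea can be sketched in plain Python. This is an illustrative analogue rather than a port of the R function: the `rows` and `key` arguments are hypothetical, and unlike the R version it caps the sample at the stratum size instead of raising an error.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, size=0.5, seed=37):
    """Sample each stratum separately, then combine the per-stratum samples.

    A size below 1 is treated as a fraction of each stratum; otherwise it is
    the number of rows to draw from each stratum.
    """
    rng = random.Random(seed)
    # Split: group rows by the value of key(row).
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    # Sample each group without replacement, then combine.
    sample = []
    for members in groups.values():
        n = int(size * len(members)) if size < 1 else int(size)
        sample.extend(rng.sample(members, min(n, len(members))))
    return sample


rows = [{"id": i, "group": "a" if i < 4 else "b"} for i in range(6)]
half = stratified_sample(rows, key=lambda r: r["group"], size=0.5)
```

With four rows in group "a" and two in group "b", a size of 0.5 draws two rows from "a" and one from "b", so the group proportions are preserved in the sample.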
justmytwospence / ignore-big
Last active December 27, 2015 21:29
Shell: Add all large files to .gitignore
find . -type f -size +1G | sed 's|^\./||' >> .gitignore
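The same idea can be sketched in Python for cases where the list of large files should be inspected before anything is written to .gitignore. This is a rough analogue of the shell one-liner, not an exact equivalent; the `large_files` helper and its byte threshold are hypothetical, and skipping the `.git` directory is an added assumption.

```python
import os


def large_files(root, min_bytes):
    """Return sorted paths, relative to root, of regular files larger than min_bytes."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into the repository's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) > min_bytes:
                # Relative, '/'-separated paths with no leading "./".
                found.append(os.path.relpath(path, root).replace(os.sep, "/"))
    return sorted(found)
```

Calling `large_files(".", 1024 ** 3)` lists files over 1 GiB under the current directory; the returned paths can then be reviewed and appended to .gitignore, one per line.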