Skip to content

Instantly share code, notes, and snippets.

View matiskay's full-sized avatar

Edgar Marca matiskay

View GitHub Profile
# To install the Python client library:
# pip install -U selenium
# Import the Selenium 2 namespace (aka "webdriver")
from selenium import webdriver
# iPhone
driver = webdriver.Remote(browser_name="iphone", command_executor='http://172.24.101.36:3001/hub')
# Android
@import compass
$icons: sprite-map("icons/*.png")
$icons-hd: sprite-map("icons-hd/*.png")
i
background: $icons
display: inline-block
@media (-webkit-min-device-pixel-ratio: 1.5), (min-resolution: 144dpi)
background: $icons-hd

Text Classification

To demonstrate text classification with Scikit Learn, we'll build a simple spam filter. While the filters in production for services like Gmail will obviously be vastly more sophisticated, the model we'll have by the end of this chapter is effective and surprisingly accurate.

Spam filtering is the "hello world" of document classification, but something to be aware of is that we aren't limited to two classes. The classifier we will be using supports multi-class classification, which opens up vast opportunities like author identification, support email routing, etc… However, in this example we'll just stick to two classes: SPAM and HAM.

For this exercise, we'll be using a combination of the Enron-Spam data sets and the SpamAssassin public corpus. Both are publicly available for download and are retreived from the internet during the setup phase of the example code that goes with this chapter.

Loading Examples

@matiskay
matiskay / peepcode.rb
Last active January 1, 2016 11:28 — forked from gertig/peepcode.rb
require 'mechanize'
@username = 'USERNAME'
@password = 'PASSWORD'
@download_path = File.expand_path '~/videos'
@wget_cookie = File.expand_path(File.dirname(__FILE__)) + '/wget-cookies.txt'
unless File.directory? @download_path
puts "@{download_path} doesn't exist!"
exit

OS X Screencast to animated GIF

This gist shows how to create a GIF screencast using only free OS X tools: QuickTime, ffmpeg, and gifsicle.

Screencapture GIF

Instructions

To capture the video (filesize: 19MB), using the free "QuickTime Player" application:

javascript:( function() {
console.group( 'Performance Information for all entries of ' + window.location.href );
console.log( '\n-> Duration is displayed in ms\n ' )
var entries = window.performance.getEntries();
entries = entries.sort( function( a, b ) {
return b.duration - a.duration;
} );
@matiskay
matiskay / svm.py
Created August 28, 2014 03:12 — forked from mblondel/svm.py
# Mathieu Blondel, September 2010
# License: BSD 3 clause
import numpy as np
from numpy import linalg
import cvxopt
import cvxopt.solvers
def linear_kernel(x1, x2):
return np.dot(x1, x2)
@matiskay
matiskay / svmflag.py
Created August 28, 2014 03:15 — forked from glamp/svmflag.py
import numpy as np
import pylab as pl
import pandas as pd
from sklearn import svm
from sklearn import linear_model
from sklearn import tree
from sklearn.metrics import confusion_matrix
import json
import geojson
from shapely.geometry import shape
o = {
"coordinates": [[[23.314208, 37.768469], [24.039306, 37.768469], [24.039306, 38.214372], [23.314208, 38.214372], [23.314208, 37.768469]]],
"type": "Polygon"
}
s = json.dumps(o)
@matiskay
matiskay / README.md
Last active August 29, 2015 14:07 — forked from tyrasd/README.md

Plotting Node Density

  1. download the source data

     wget http://fred.dev.openstreetmap.org/density/tiles.13
     wget http://fred.dev.openstreetmap.org/density/tiles.16
    
  2. convert to simple, gnuplot-readable text format

sed 's/([0-9]) z=([0-9]) x=([0-9]) y=([0-9])/\3 \4 \1/' < tiles.13 > tiles.13.txt