Chee Sing Lee cheesinglee
@cheesinglee
cheesinglee / output.txt
Last active August 29, 2015 14:13
Map-reduce Spearman's rho?
Number of map-reduce blocks on columns, population size on rows
  size          1          3          5         10         20         30         50
================================================================================
uncorrelated data
   200   0.020074   0.067028   0.150469   0.112782   0.260606   0.485714   0.400000
   500  -0.018003  -0.095463  -0.102538   0.050660   0.031538   0.358824   0.333333
  1000  -0.011529  -0.000860  -0.019766  -0.164176  -0.203649  -0.167781  -0.183459
  5000  -0.004745  -0.021326  -0.014286  -0.075185  -0.091779  -0.063402  -0.051401
 10000   0.005333   0.000626   0.007783   0.024034  -0.040481  -0.017537  -0.115995
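The gist preview shows only the output, not how the per-block results were combined. As a rough sketch of one possible scheme (an assumption, not the author's code), each block can compute its own Spearman's rho on its slice of the data (the map step) and the results can be averaged, weighted by block size (the reduce step). The helper names `average_ranks`, `pearson`, `spearman`, and `blocked_spearman` are hypothetical, and the plain-Python rank correlation stands in for a library routine such as `scipy.stats.spearmanr`.

```python
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def blocked_spearman(x, y, n_blocks):
    """Assumed scheme: rho per contiguous block, then a size-weighted mean.

    Every block needs at least two distinct values in x and in y.
    """
    n = len(x)
    bounds = [n * b // n_blocks for b in range(n_blocks + 1)]
    total = 0.0
    for lo, hi in zip(bounds, bounds[1:]):
        total += (hi - lo) * spearman(x[lo:hi], y[lo:hi])  # map step
    return total / n  # reduce step


if __name__ == "__main__":
    random.seed(0)
    x = [random.random() for _ in range(1000)]
    y = [random.random() for _ in range(1000)]  # independent of x
    for blocks in (1, 3, 5, 10, 20, 30, 50):
        print(blocks, round(blocked_spearman(x, y, blocks), 6))
```

Ranks are recomputed inside each block, so the blocked estimate is generally not equal to the rho computed on the full data set; with one block the two coincide.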
cheesinglee / anscombe.py
Created October 5, 2015 19:10
Scripts for correlations blog post
#!/usr/bin/env python
from __future__ import print_function
from scipy.stats import pearsonr, spearmanr
"""
Edward Tufte uses this example from Anscombe to show 4 datasets of x
and y that have the same mean, standard deviation, and regression
line, but which are qualitatively different.

matplotlib fun for a rainy day
"""
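
The preview of `anscombe.py` stops before any data or computation appears. As a small sketch of the point the docstring makes (not the gist's actual code), the four Anscombe datasets can be checked for matching means and near-identical Pearson correlation. The values below are the standard published quartet; `QUARTET`, `mean`, and `pearson` are names chosen here for illustration, and the hand-written `pearson` stands in for the `scipy.stats.pearsonr` call the gist imports.

```python
from math import sqrt

# Datasets I-III share the same x values; dataset IV has its own.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
X4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]

QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (X4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Plain-Python Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        print(name, round(mean(x), 2), round(mean(y), 2), round(pearson(x, y), 3))
```

The summary statistics agree across the four sets even though scatter plots of them look nothing alike, which is why the quartet is a common argument for plotting data before trusting a correlation coefficient.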
cheesinglee / README.md
Created June 13, 2016 18:04 — forked from ashenfad/README.md
Dynamic Scatterplot - Iris

Dynamic scatterplot of the iris dataset.

Controls:

  • Left click to choose X-axis.
  • Right click to choose Y-axis.
  • Alt + right click to choose color axis.
  • Repeat click (left, right, or alt) for log scale.
  • Hover over a point to see all field values.
  • Click a multi-point (larger circle) to cycle through values.
cheesinglee / logistic-iris.pmml
Created May 30, 2018 06:14
BigML logistic regression model for iris, PMML export
<?xml version="1.0" encoding="UTF-8"?><PMML xmlns="http://www.dmg.org/PMML-4_3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="4.3">
<Header description="Model kind: logistic">
<Application name="BigML"/>
</Header>
<DataDictionary>
<DataField name="sepal length" displayName="" optype="continuous" dataType="double">
<Extension name="BigML-Field_ID" value="000000"/>
</DataField>
<DataField name="sepal width" displayName="" optype="continuous" dataType="double">
<Extension name="BigML-Field_ID" value="000001"/>
cheesinglee / reddit_scraper.py
Created June 20, 2014 03:17
Python subreddit scraper
# -*- coding: utf-8 -*-
"""
Created on Thu Oct 3 12:24:41 2013
@author: cheesinglee
"""
import praw
from csv import DictWriter
#!/usr/bin/python
import functools
import numpy as np
from matplotlib import pyplot as pp
from matplotlib.collections import PolyCollection
from matplotlib.cm import get_cmap
from bigml.api import BigML