jakevdp / Jupyter_vs_Mathematica.ipynb
Created Apr 8, 2018
Jupyter vs Mathematica Google Trends
porterjamesj /
Last active Mar 6, 2018
the tiniest mesos scheduler
import logging
import uuid
import time
from mesos.interface import Scheduler
from mesos.native import MesosSchedulerDriver
from mesos.interface import mesos_pb2
vrilleup / spark-svd.scala
Last active Nov 14, 2019
Spark/mllib SVD example
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change:
// Input tsv with 3 fields: rowIndex(Long), columnIndex(Long), weight(Double), indices start with 0
// Assume the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue
tlockney / Vagrantfile
Last active Aug 29, 2015
This setup allows for quick hacking with an sbt console on an EC2 instance, which is very useful for trying out the AWS APIs. As an example, I wanted to make sure I understood how to get the various bits of metadata that are visible only on EC2. Create the following files and run to run everything.
Vagrant.configure("2") do |config|
config.vm.box = "dummy"
config.vm.provider :aws do |aws, override|
aws.access_key_id = "..."
aws.secret_access_key = "..."
# you'll need to create the EC2 keypair used here -- I called it vagrant for easy tracking
aws.keypair_name = "vagrant"
# you'll want to use a group that has at least SSH open
fperez / ProgrammaticNotebook.ipynb
Last active Sep 2, 2021
Creating an IPython Notebook programmatically
johnynek / gist:8961994
Last active Aug 29, 2015
Some Questions with Sketch Monoids

Unifying Sketch Monoids

As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, followed by two different commutative monoids used to read and write respectively. The read monoids always have the property that (a + b) <= a, b, and the write monoids have the property that (a + b) >= a, b.
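
To make the description concrete, here is a minimal Python sketch of that shape: hash an item to k cells, combine on write with one monoid, and combine on read with another. Everything named here (`Sketch`, `positions`, `bloom`, `count_min`, the `shared_space` flag, salted SHA-256 as the hash family) is an illustrative assumption, not the API of any library. Bloom uses OR to write and AND to read over one shared space; Count-min uses + to write and min to read over k separate rows.

```python
import hashlib
import operator
from functools import reduce


def positions(item, k, width):
    """Hash `item` to k cell indices, one per salted SHA-256 hash (an assumption)."""
    out = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
        out.append(int.from_bytes(digest[:8], "big") % width)
    return out


class Sketch:
    """Project items onto cells, then combine with a write and a read monoid."""

    def __init__(self, k, width, write, read, zero, shared_space):
        self.k, self.width = k, width
        self.write, self.read = write, read
        self.shared_space = shared_space
        # Shared space: all k hashes hit one row. Otherwise: one row per hash.
        rows = 1 if shared_space else k
        self.cells = [[zero] * width for _ in range(rows)]

    def _cells_for(self, item):
        for row, col in enumerate(positions(item, self.k, self.width)):
            yield (0 if self.shared_space else row), col

    def add(self, item, value):
        # Write monoid: the result never decreases (OR on bools, + on counts >= 0).
        for r, c in self._cells_for(item):
            self.cells[r][c] = self.write(self.cells[r][c], value)

    def query(self, item):
        # Read monoid: the result never exceeds any single cell (AND, min).
        return reduce(self.read, (self.cells[r][c] for r, c in self._cells_for(item)))

    def merge(self, other):
        """Cell-wise combine with the write monoid (same k, width, layout assumed)."""
        for mine, theirs in zip(self.cells, other.cells):
            for c, v in enumerate(theirs):
                mine[c] = self.write(mine[c], v)


def bloom(k, width):
    return Sketch(k, width, operator.or_, operator.and_, False, shared_space=True)


def count_min(k, width):
    return Sketch(k, width, operator.add, min, 0, shared_space=False)
```

For example, `b = bloom(3, 128); b.add("a", True); b.query("a")` answers a membership query, and `c = count_min(4, 64); c.add("a", 1); c.query("a")` gives an upper-biased count estimate. HyperLogLog is not covered by this sketch.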

## Some questions:

  1. Note how similar CMS and Bloom filters are. The difference: Bloom hashes k times onto the same space, while CMS hashes k times onto k orthogonal subspaces. Why the difference? Imagine a fixed-space Bloom filter that hashes onto k orthogonal spaces, or an overlapping CMS that hashes onto a space of length k * m. How do the error asymptotics change?
  2. CMS has many query modes (dot product, etc.). Can those generalize to other sketches (HLL, Bloom)?
  3. What other sketch or non-sketch algorithms can be expressed in this dual mo
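
One way to start poking at the first question is a small experiment: build a Bloom filter with a fixed bit budget, either with all k hashes sharing the whole space or with each hash owning its own block, and measure how often absent keys are reported present. The function below is a rough sketch under assumptions; `false_positive_rate`, `cell`, the salted SHA-256 hashes, and the synthetic `in-N` / `out-N` keys are all hypothetical, and an empirical rate says nothing definitive about asymptotics.

```python
import hashlib


def cell(item, i, width):
    """Index in [0, width) for the i-th salted SHA-256 hash of `item`."""
    digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % width


def false_positive_rate(k, total_bits, n_items, n_probes, partitioned):
    """Insert n_items keys, probe n_probes absent keys, return the hit fraction."""
    width = total_bits // k if partitioned else total_bits
    bits = [False] * total_bits

    def slots(item):
        # Partitioned: hash i owns bits [i * width, (i + 1) * width).
        # Shared: all k hashes land anywhere in the full space.
        base = width if partitioned else 0
        return [i * base + cell(item, i, width) for i in range(k)]

    for n in range(n_items):
        for s in slots(f"in-{n}"):
            bits[s] = True

    # Probe keys are disjoint from inserted keys, so every hit is a false positive.
    hits = sum(all(bits[s] for s in slots(f"out-{n}")) for n in range(n_probes))
    return hits / n_probes
```

Comparing `false_positive_rate(4, 4096, 300, 2000, partitioned=False)` with the same call using `partitioned=True` keeps memory constant and changes only the layout. The analogous experiment for an overlapping CMS would replace the bits with counters and the `all` with `min`.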
drewlanenga / lm.pmml.xml
Created Jan 7, 2014
Exploring support for transformations in PMML with Pattern. (Environment notes: Running Vagrant with Cascading SDK 2.2 --
<?xml version="1.0"?>
<PMML version="4.1" xmlns="" xmlns:xsi="" xsi:schemaLocation="">
<Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
<Extension name="user" value="lanenga" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2014-01-07 15:33:34</Timestamp>
<DataDictionary numberOfFields="4">
<DataField name="sepal_width" optype="continuous" dataType="double"/>
<DataField name="sepal_length" optype="continuous" dataType="double"/>
ceteri / cascalog_build.log
Last active Dec 14, 2015
Cascalog testing with Cascading 2.2-wip
bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
Could not find artifact stencil:stencil:pom:0.3.0 in central (
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)
ceteri / Cascalog.log
Last active Dec 11, 2015
City of Palo Alto Open Data app in Cascalog
bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean
ceteri / Pattern test.log
Last active Dec 11, 2015
Pattern machine learning library for Cascading
bash-3.2$ pwd
bash-3.2$ java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01-447-11M4203)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01-447, mixed mode)
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3