Goals: add links that are reasonable, solid explanations of how stuff works. No hype and no vendor content, if possible. Practical first-hand accounts of models in prod are eagerly sought.
# Imports and logging setup for a Mesos framework scheduler (mesos.interface / mesos.native Python bindings).
import logging
import uuid
import time

from mesos.interface import Scheduler
from mesos.interface import mesos_pb2
from mesos.native import MesosSchedulerDriver

logging.basicConfig(level=logging.INFO)
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}

// To use the latest sparse SVD implementation, build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378
// Input tsv with 3 fields: rowIndex (Long), columnIndex (Long), weight (Double); indices start at 0.
// Assumes the number of rows is larger than the number of columns, and that the number of columns
// is smaller than Int.MaxValue.
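To make the intended usage concrete, here is a minimal, hypothetical sketch of the pipeline those comments describe: parse the (rowIndex, columnIndex, weight) tsv, assemble sparse rows, wrap them in a RowMatrix, and compute a truncated SVD. The object name, `inputPath`, and `k` are illustrative assumptions, not values from the original gist.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.{SparkConf, SparkContext}

object SparseSvdSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sparse-svd-sketch"))
    val inputPath = args(0) // tsv of rowIndex, columnIndex, weight
    val k = args(1).toInt   // number of singular values to keep

    // Parse each line into (rowIndex, (columnIndex, weight)).
    val entries = sc.textFile(inputPath).map { line =>
      val Array(r, c, w) = line.split("\t")
      (r.toLong, (c.toInt, w.toDouble))
    }
    val numCols = entries.map(_._2._1).max() + 1

    // One sparse vector per populated row; Vectors.sparse sorts the (index, value) pairs.
    // Note a RowMatrix keeps no row indices, so empty rows are simply absent.
    val rows = entries.groupBy(_._1).map { case (_, es) =>
      Vectors.sparse(numCols, es.map(_._2).toSeq)
    }

    val svd = new RowMatrix(rows).computeSVD(k, computeU = true)
    println(s"Top $k singular values: ${svd.s}")
    sc.stop()
  }
}
```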
Vagrant.configure("2") do |config|
  config.vm.box = "dummy"

  config.vm.provider :aws do |aws, override|
    aws.access_key_id = "..."
    aws.secret_access_key = "..."

    # You'll need to create the EC2 keypair used here -- I called it "vagrant" for easy tracking.
    aws.keypair_name = "vagrant"

    # You'll want to use a security group that has at least SSH open.
As I discussed in Algebra for Analytics, many sketch monoids, such as Bloom filters, HyperLogLog, and Count-min sketch, can be described as a hashing (projection) of items into a sparse space, then using two different commutative monoids to read and write, respectively. The read monoids always have the property that (a + b) <= a, b, and the write monoids have the property that (a + b) >= a, b.
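For concreteness, here is a small, self-contained toy sketch of that duality for a count-min-style structure (mine, not Algebird's implementation): the write side combines counter tables with element-wise +, which can only grow each cell, while the point query reads with min across the k hashed cells, which can only tighten the estimate. The hash family and sizes are illustrative, not tuned.

```scala
// Write monoid: element-wise sum, so (a + b) >= a, b per cell.
// Read monoid: min over the k cells an item hashes to, so (a + b) <= a, b.
object SketchMonoidDuality {
  val depth = 4   // k hash rows, i.e. k orthogonal subspaces
  val width = 64  // m cells per row

  type CMS = Vector[Vector[Long]]
  val zero: CMS = Vector.fill(depth)(Vector.fill(width)(0L))

  // Toy hash family; a real sketch would use pairwise-independent hashes.
  def cell(row: Int, item: String): Int = {
    val h = item.hashCode * (2 * row + 1) + row
    ((h % width) + width) % width
  }

  // Write monoid: merge two sketches by element-wise addition.
  def plus(a: CMS, b: CMS): CMS =
    a.zip(b).map { case (ra, rb) => ra.zip(rb).map { case (x, y) => x + y } }

  // Inserting one item is just `plus` with a sketch holding a single 1 per row.
  def insert(s: CMS, item: String): CMS =
    plus(s, zero.zipWithIndex.map { case (row, r) => row.updated(cell(r, item), 1L) })

  // Read monoid: min across the k cells; hash collisions can only overestimate.
  def estimate(s: CMS, item: String): Long =
    (0 until depth).map(r => s(r)(cell(r, item))).min

  def main(args: Array[String]): Unit = {
    val s = List("a", "b", "a", "c", "a").foldLeft(zero)(insert)
    println(estimate(s, "a")) // always >= the true count 3; usually exactly 3 at this tiny scale
  }
}
```

In the same vein, a Bloom filter's write is a bitwise OR over the whole bit array (>=), and its point read is an AND over the k probed bits (<=), matching the two inequalities above.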
## Some questions:
- Note how similar CMS and Bloom filters are. The difference: Bloom hashes k times onto the same space, while CMS hashes k times onto k orthogonal subspaces. Why the difference? Imagine a fixed-space Bloom filter that hashes onto k orthogonal spaces, or an overlapping CMS that hashes onto a single space of length k * m. How do the error asymptotics change?
- CMS has many query modes (dot product, etc.); can those generalize to other sketches (HLL, Bloom)?
- What other sketch or non-sketch algorithms can be expressed in this dual monoid form?
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
  <Header copyright="Copyright (c) 2014 lanenga" description="Linear Regression Model">
    <Extension name="user" value="lanenga" extender="Rattle/PMML"/>
    <Application name="Rattle/PMML" version="1.4"/>
    <Timestamp>2014-01-07 15:33:34</Timestamp>
  </Header>
  <DataDictionary numberOfFields="4">
    <DataField name="sepal_width" optype="continuous" dataType="double"/>
    <DataField name="sepal_length" optype="continuous" dataType="double"/>
bash-3.2$ lein do sub install, deps, compile, repl
Could not find artifact lein-newnew:lein-newnew:pom:0.3.5 in central (http://repo1.maven.org/maven2)
Retrieving lein-newnew/lein-newnew/0.3.5/lein-newnew-0.3.5.pom (3k)
    from https://clojars.org/repo/
Could not find artifact stencil:stencil:pom:0.3.0 in central (http://repo1.maven.org/maven2)
Retrieving stencil/stencil/0.3.0/stencil-0.3.0.pom (3k)
    from https://clojars.org/repo/
Retrieving org/clojure/clojure/1.3.0/clojure-1.3.0.pom (5k)
    from http://repo1.maven.org/maven2/
Retrieving org/sonatype/oss/oss-parent/5/oss-parent-5.pom (4k)
bash-3.2$ lein version
Leiningen 2.0.0-preview10 on Java 1.6.0_43 Java HotSpot(TM) 64-Bit Server VM
bash-3.2$ hadoop version
Warning: $HADOOP_HOME is deprecated.
Hadoop 1.0.3
Subversion https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192
Compiled by hortonfo on Tue May 8 20:31:25 UTC 2012
From source with checksum e6b0c1e23dcf76907c5fecb4b832f3be
bash-3.2$ lein clean