from sklearn import linear_model
from scipy import stats
import numpy as np


class LinearRegression(linear_model.LinearRegression):
    """
    LinearRegression class after sklearn's, but calculate t-statistics
    and p-values for model coefficients (betas).
    Additional attributes available after .fit()
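The preview stops inside the docstring, so the author's actual computation is not shown. As a rough illustration of what "t-statistics and p-values for model coefficients" usually means, here is a minimal sketch using the textbook ordinary-least-squares formulas. The function name `ols_t_and_p` and the choice to fit with `np.linalg.lstsq` instead of subclassing sklearn are assumptions made for this example, not the gist's implementation.

```python
import numpy as np
from scipy import stats


def ols_t_and_p(X, y):
    """Fit y ~ intercept + X by least squares; return (coef, t, p) for the slopes.

    Hypothetical helper for illustration; uses the standard OLS formulas.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Design matrix with an explicit intercept column.
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta
    dof = n - k - 1                        # residual degrees of freedom
    sigma2 = residuals @ residuals / dof   # unbiased residual variance
    # Standard errors come from the diagonal of sigma^2 * (A^T A)^-1.
    cov = sigma2 * np.linalg.inv(A.T @ A)
    se = np.sqrt(np.diag(cov))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), dof)     # two-sided p-values
    # Drop the intercept so the outputs line up with the columns of X.
    return beta[1:], t[1:], p[1:]
```

A small p-value for a coefficient suggests that the corresponding column of `X` has a non-zero linear effect, assuming the usual OLS conditions (independent, homoscedastic, roughly normal errors) hold.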
/*
 * Object in scala for calculating cosine similarity
 * Reuben Sutton - 2012
 * More information: http://en.wikipedia.org/wiki/Cosine_similarity
 */
object CosineSimilarity {
  /*
   * This method takes 2 equal length arrays of integers
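The Scala preview is cut off before the method body. For reference, cosine similarity of two equal-length vectors is their dot product divided by the product of their Euclidean norms. The sketch below shows that definition in Python rather than Scala; the function name and the decision to raise on zero vectors are assumptions for this example, not taken from the gist.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)
```

Orthogonal vectors such as `[1, 0]` and `[0, 1]` give 0, and vectors that point in the same direction give a value of approximately 1.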
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}

// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378

// Input tsv with 3 fields: rowIndex(Long), columnIndex(Long), weight(Double), indices start with 0
// Assume the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue
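The comments above describe the input as tab-separated (rowIndex, columnIndex, weight) triples with 0-based indices. The preview ends before any parsing code, so the following is only a plain-Python sketch of how such triples could be grouped into per-row sparse vectors before being handed to a matrix type. The function name `parse_sparse_rows` and the returned layout are assumptions for illustration; this does not use or reproduce the Spark API.

```python
from collections import defaultdict


def parse_sparse_rows(lines):
    """Group 'row<TAB>col<TAB>weight' lines into per-row sparse vectors.

    Returns (num_cols, {row: (sorted_col_indices, values)}).
    """
    rows = defaultdict(dict)
    num_cols = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row_str, col_str, weight_str = line.split("\t")
        row, col, weight = int(row_str), int(col_str), float(weight_str)
        rows[row][col] = weight
        num_cols = max(num_cols, col + 1)  # indices start at 0
    sparse = {}
    for row, entries in rows.items():
        cols = sorted(entries)
        sparse[row] = (cols, [entries[c] for c in cols])
    return num_cols, sparse
```

Keeping the column indices sorted within each row matches the layout that sparse-vector types commonly expect.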
Often we use Scalding to run a distributed algorithm that generates tons of data.
For example, imagine a simple Scalding job that does the following:
- Comb through 100 million user requests.
- Find the (lat,lng) where each request originated.
- Convert each (lat,lng) to a zipcode via reverse geocoding.
- Visualize the result as a histogram over a bunch of zipcodes.
So, say you pick 10 zipcodes in some county; I show you how many people hit your website from each zipcode.
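To make the steps above concrete, here is a small single-machine sketch in Python of the same pipeline: map each (lat,lng) to a zipcode, then count hits for the zipcodes someone picked. It is not Scalding code, and everything named here is hypothetical: `zipcode_histogram`, the `toy_reverse_geocode` stand-in, and the made-up grid-to-zipcode table exist only for illustration.

```python
from collections import Counter


def zipcode_histogram(requests, reverse_geocode, zipcodes_of_interest):
    """Count requests per zipcode, restricted to the zipcodes the viewer picked."""
    wanted = set(zipcodes_of_interest)
    counts = Counter()
    for lat, lng in requests:
        zipcode = reverse_geocode(lat, lng)
        if zipcode in wanted:
            counts[zipcode] += 1
    # Report every requested zipcode, including those with zero hits.
    return {z: counts[z] for z in zipcodes_of_interest}


# Made-up lookup table keyed by a coarse grid cell (tenths of a degree).
# A real job would call an actual reverse-geocoding service or dataset.
TOY_GRID = {(378, -1224): "94105", (374, -1221): "94043"}


def toy_reverse_geocode(lat, lng):
    """Hypothetical stand-in: snap to a grid cell and look up its zipcode."""
    return TOY_GRID.get((int(round(lat * 10)), int(round(lng * 10))))
```

Calling `zipcode_histogram(requests, toy_reverse_geocode, ["94105", "94043"])` returns a dictionary of counts that can be passed directly to a plotting library to draw the histogram.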
The hard problem here isn't the Scalding job -