Corey J. Nolet cjnolet

@cjnolet
cjnolet / spark-svd.scala
Created August 8, 2018 20:44 — forked from vrilleup/spark-svd.scala
Spark/mllib SVD example
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378
// Input: a TSV file with 3 fields per line: rowIndex (Long), columnIndex (Long), weight (Double).
// Indices start at 0.
// Assumes the number of rows is larger than the number of columns, and that the number of
// columns is smaller than Int.MaxValue.
@cjnolet
cjnolet / vi.py
Created April 18, 2018 02:07 — forked from jwcarr/vi.py
Variation of information (VI)
# Variation of information (VI)
#
# Meila, M. (2007). Comparing clusterings-an information
# based distance. Journal of Multivariate Analysis, 98,
# 873-895. doi:10.1016/j.jmva.2006.11.013
#
# https://en.wikipedia.org/wiki/Variation_of_information
from math import log
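
The preview above stops at the import, so the gist's own function body is not shown here. As a minimal sketch of how the quantity defined in the cited paper could be computed, assuming each clustering is given as a list of lists that partition the same set of items (the function name `variation_of_information` and that input format are assumptions for illustration, not taken from the preview):

```python
from math import log


def variation_of_information(X, Y):
    """Variation of information between two partitions X and Y.

    X and Y are lists of clusters (each cluster a list of hashable items)
    covering the same set of items. The result is in bits (log base 2).
    """
    n = float(sum(len(x) for x in X))  # total number of items
    sigma = 0.0
    for x in X:
        p = len(x) / n  # probability mass of cluster x
        for y in Y:
            q = len(y) / n  # probability mass of cluster y
            r = len(set(x) & set(y)) / n  # joint mass of the overlap
            if r > 0.0:
                # Each term is r * [log(r/p) + log(r/q)], which is <= 0.
                sigma += r * (log(r / p, 2) + log(r / q, 2))
    return abs(sigma)
```

With this sketch, two identical partitions give a distance of 0, and splitting a single four-item cluster into two equal halves gives 1 bit.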