@waleking
waleking / SparkGibbsLDA.scala
Last active January 31, 2020 11:15
We implement Gibbs sampling for LDA on Spark. This version performs much better than the alpha version and can now handle 3196204 words, 100 topics, and 1000 sampling iterations on a server in 161.7 minutes. To avoid the time-consuming collect() step of the alpha version, we use the cache() method at line 261 and line 262. We also solve a pile o…
package topic
import spark.broadcast._
import spark.SparkContext
import spark.SparkContext._
import spark.RDD
import spark.storage.StorageLevel
import scala.util.Random
import scala.math.{ sqrt, log, pow, abs, exp, min, max }
import scala.collection.mutable.HashMap
@nickman
nickman / install.sh
Created May 20, 2011 13:16
Graphite Server Install Script for Ubuntu
#!/bin/bash
#######################################
# Graphite Install
# Run with sudo for best results
#
#######################################
if [[ "$(/usr/bin/whoami)" != "root" ]]; then
    echo "This script must be run as root or using sudo. Script aborted."
    exit 1