Skip to content

Instantly share code, notes, and snippets.

import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000
@jfrazee
jfrazee / tuple_to_args.scala
Created August 19, 2013 16:37
Convert tuples to scala arguments
// Just an ordinary function
def sum(x: Int, y: Int, z: Int) = x + y + z
// A tuple of arguments
val args = (1, 2, 3)
// Convert the function to a (partial) Function, which has a tupled method
// that takes tuples up to arity 5
(sum _).tupled(args)
@mattb
mattb / gist:3888345
Created October 14, 2012 11:53
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...

Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model

Often it uses transforms such as TFIDF to normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf

Collocations is a technique to detect when two or more words occur more commonly together than separately (e.g. "wishy-washy" in English) - I use this to group words into n-gram tokens because many NLP techniques consider each word as if it's independent of all the others in a document, ignoring order: http://matpalm.com/blog/2011/10/22/collocations_1/

@Fitzsimmons
Fitzsimmons / spark.md
Created April 25, 2012 15:00 — forked from jesperfj/spark.md
Spark on Heroku

This guide will get you started using Spark on Heroku/Cedar. Spark is basically a clone of Sinatra for Java. 'Nuff said.

Create your app

Create a single Java main class in src/main/java/HelloWorld.java:

import static spark.Spark.*;
import spark.*;
@fozziethebeat
fozziethebeat / Schisel.scala
Created February 8, 2012 00:05
Schisel, An example of how to run Latent Dirichelte Allocation (via Mallet) from Scala
/*
* Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at
* the Lawrence Livermore National Laboratory. Written by Keith Stevens,
* kstevens@cs.ucla.edu OCEC-10-073 All rights reserved.
*
* This file is part of the S-Space package and is covered under the terms and
* conditions therein.
*
* The S-Space package is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as published
@mreid
mreid / INSTALL-VW-OSX.md
Created January 29, 2012 21:59
Install Vowpal Wabbit on Mac OS X Lion

The INSTALL instructions that come with Vowpal Wabbit appear not to work on Mac OS X Lion. Here's what I did to get it to compile. You will need the developer tools that come with the XCode installation.

The only dependency VW has is the boost C++ library. So first, download and install Boost

To install Boost, do the following:

$ cp ~/Downloads/boost_1_48_0.tar.bz2 ./
@lucasfais
lucasfais / gist:1207002
Created September 9, 2011 18:46
Sublime Text 2 - Useful Shortcuts

Sublime Text 2 – Useful Shortcuts (Mac OS X)

General

⌘T go to file
⌘⌃P go to project
⌘R go to methods
⌃G go to line
⌘KB toggle side bar
⌘⇧P command prompt
@retronym
retronym / type-bounds.scala
Created December 16, 2009 11:17
Tour of Scala Type Bounds
class A
class A2 extends A
class B
trait M[X]
//
// Upper Type Bound
//
def upperTypeBound[AA <: A](x: AA): A = x