Elias Ponvert eponvert

## Spark_OnlineLDA_wikipedia_example.scala
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector

import sqlContext.implicits._

val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000

## tuple_to_args.scala
// Just an ordinary function
def sum(x: Int, y: Int, z: Int) = x + y + z

// A tuple of arguments
val args = (1, 2, 3)

// Eta-expand the method into a Function3, whose tupled method
// accepts a single 3-tuple (tupled is defined on Function2 through Function22)
(sum _).tupled(args)

## gist:3888345

      
mattb / gist:3888345
Created October 14, 2012 11:53
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...
Nearly all text processing starts by transforming text into vectors:
http://en.wikipedia.org/wiki/Vector_space_model
The vectors are often reweighted with transforms such as TF-IDF, which normalise the data and control for outliers (words that are too frequent or too rare confuse the algorithms):
http://en.wikipedia.org/wiki/Tf%E2%80%93idf
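To make the vector and TF-IDF steps concrete, here is a minimal sketch in plain Scala. The names `TfIdfSketch`, `tokenize` and `tfIdf` are made up for illustration, the tokenizer is deliberately crude, and the weighting is one common TF-IDF variant (relative term frequency times `log(N / df)`), not necessarily the one any particular package uses.

```scala
object TfIdfSketch {
  // Crude tokenizer (an assumption): lower-case, split on anything that is not a-z.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Returns one sparse vector (term -> weight) per input document.
  def tfIdf(docs: Seq[String]): Seq[Map[String, Double]] = {
    val tokenized = docs.map(tokenize)
    val n = docs.size.toDouble

    // Document frequency: the number of documents each term appears in.
    val df: Map[String, Int] =
      tokenized.flatMap(_.distinct).groupBy(identity).map { case (t, ts) => t -> ts.size }

    tokenized.map { tokens =>
      val counts = tokens.groupBy(identity).map { case (t, ts) => t -> ts.size }
      counts.map { case (t, c) =>
        val tf = c.toDouble / tokens.size // relative frequency within this document
        val idf = math.log(n / df(t))     // zero for terms that occur in every document
        t -> tf * idf
      }
    }
  }
}
```

With the documents "the cat sat", "the dog sat" and "the cat ran", the word "the" gets weight zero because it occurs everywhere, while "dog" outweighs "sat" in the second document because it is rarer across the collection.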
Collocation detection finds two or more words that occur together more often than they do separately (e.g. "wishy-washy" in English). I use it to group words into n-gram tokens, because many NLP techniques treat each word as independent of all the others in a document and ignore order:
http://matpalm.com/blog/2011/10/22/collocations_1/
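One common way to score collocations is pointwise mutual information (PMI) over adjacent word pairs. The sketch below assumes that scoring; it is not necessarily the method used in the linked post, and `CollocationSketch`, `pmiBigrams`, `mergeBigrams` and the `minCount` threshold are hypothetical names chosen for illustration.

```scala
object CollocationSketch {
  // PMI for every adjacent word pair seen at least minCount times, best first.
  def pmiBigrams(tokens: IndexedSeq[String], minCount: Int = 2): Seq[((String, String), Double)] = {
    val unigramCounts = tokens.groupBy(identity).map { case (w, ws) => w -> ws.size }
    val bigrams = tokens.zip(tokens.drop(1))
    val bigramCounts = bigrams.groupBy(identity).map { case (b, bs) => b -> bs.size }
    val nUni = tokens.size.toDouble
    val nBi = bigrams.size.toDouble

    bigramCounts.toSeq
      .filter { case (_, c) => c >= minCount }
      .map { case ((a, b), c) =>
        val pAB = c / nBi
        val pA = unigramCounts(a) / nUni
        val pB = unigramCounts(b) / nUni
        // Positive when the pair occurs together more often than independence predicts.
        ((a, b), math.log(pAB / (pA * pB)))
      }
      .sortBy { case (_, score) => -score }
  }

  // Rewrite the token stream, joining the chosen pairs into single n-gram tokens.
  def mergeBigrams(tokens: IndexedSeq[String], pairs: Set[(String, String)]): Vector[String] = {
    val out = Vector.newBuilder[String]
    var i = 0
    while (i < tokens.size) {
      if (i + 1 < tokens.size && pairs.contains((tokens(i), tokens(i + 1)))) {
        out += tokens(i) + "_" + tokens(i + 1)
        i += 2
      } else {
        out += tokens(i)
        i += 1
      }
    }
    out.result()
  }
}
```

The merged tokens can then be fed to any bag-of-words step, so a pair such as `new_york` is counted as one term instead of two unrelated words.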

  
## spark.md

      
Fitzsimmons / spark.md
Created April 25, 2012 15:00, forked from jesperfj/spark.md
Spark on Heroku

This guide will get you started using Spark on Heroku/Cedar. Spark is basically a clone of Sinatra for Java. 'Nuff said.
Create your app

Create a single Java main class in src/main/java/HelloWorld.java:
import static spark.Spark.*;
import spark.*;

  
## Schisel.scala
/*
 * Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at
 * the Lawrence Livermore National Laboratory. Written by Keith Stevens,
 * kstevens@cs.ucla.edu OCEC-10-073 All rights reserved.
 *
 * This file is part of the S-Space package and is covered under the terms and
 * conditions therein.
 *
 * The S-Space package is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as published

## INSTALL-VW-OSX.md

      
mreid / INSTALL-VW-OSX.md
Created January 29, 2012 21:59
Install Vowpal Wabbit on Mac OS X Lion

The INSTALL instructions that come with Vowpal Wabbit appear not to work on Mac OS X Lion. Here's what I did to get it to compile. You will need the developer tools that come with the Xcode installation.
The only dependency VW has is the Boost C++ library, so first download and install Boost.
To install Boost, do the following:
$ cp ~/Downloads/boost_1_48_0.tar.bz2 ./


## gist:1207002

      
lucasfais / gist:1207002
Created September 9, 2011 18:46
Sublime Text 2 - Useful Shortcuts

Sublime Text 2 – Useful Shortcuts (Mac OS X)

General


| Shortcut | Action |
|----------|--------|
| ⌘T | go to file |
| ⌘⌃P | go to project |
| ⌘R | go to methods |
| ⌃G | go to line |
| ⌘KB | toggle side bar |
| ⌘⇧P | command prompt |

## type-bounds.scala
class A
class A2 extends A
class B

trait M[X]

//
// Upper Type Bound
//
def upperTypeBound[AA <: A](x: AA): A = x
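
As a small illustration of what the upper bound buys you, the sketch below repeats the classes above inside a wrapper object and adds two things that are not in the original snippet: a hypothetical `name` member on `A` and a hypothetical `describe` helper. It assumes nothing beyond the standard library.

```scala
object UpperBoundSketch {
  // `name` is an assumption added for this example; the original classes are empty.
  class A { def name: String = "A" }
  class A2 extends A { override def name: String = "A2" }
  class B

  // AA must be A or a subtype of A, so x can be returned where an A is expected.
  def upperTypeBound[AA <: A](x: AA): A = x

  // The bound guarantees that members of A are available on x, while the
  // result still carries the precise argument type AA.
  def describe[AA <: A](x: AA): (AA, String) = (x, x.name)

  // upperTypeBound(new B) // does not compile: B does not conform to the bound AA <: A
}
```

Calling `describe(new A2)` returns a pair whose first element is statically typed as `A2`, which a plain parameter of type `A` would not preserve.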