Elias Ponvert eponvert
View Spark_OnlineLDA_wikipedia_example.scala
import org.apache.spark.ml.feature.{CountVectorizer, RegexTokenizer, StopWordsRemover}
import org.apache.spark.mllib.clustering.{LDA, OnlineLDAOptimizer}
import org.apache.spark.mllib.linalg.Vector
import sqlContext.implicits._
val numTopics: Int = 100
val maxIterations: Int = 100
val vocabSize: Int = 10000
@jfrazee
jfrazee / tuple_to_args.scala
Created Aug 19, 2013
Convert tuples to scala arguments
// Just an ordinary function
def sum(x: Int, y: Int, z: Int) = x + y + z
// A tuple of arguments
val args = (1, 2, 3)
// Eta-expand the method into a Function3, whose tupled method takes a
// single Tuple3 (FunctionN.tupled is defined for arities 2 through 22)
(sum _).tupled(args)
@mattb
mattb / gist:3888345
Created Oct 14, 2012
Some pointers for Natural Language Processing / Machine Learning

Here are the areas I've been researching, some things I've read and some open source packages...

Nearly all text processing starts by transforming text into vectors: http://en.wikipedia.org/wiki/Vector_space_model

Text processing often applies transforms such as TF-IDF to normalise the vectors and control for outliers (words that are too frequent or too rare confuse the algorithms): http://en.wikipedia.org/wiki/Tf%E2%80%93idf
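
As a rough illustration of the two ideas above, here is a minimal Scala sketch that turns a few documents into TF-IDF-weighted vectors. It is not taken from any of the packages mentioned here: the object name `TfIdfSketch`, the naive tokenizer, and the plain `log(N / df)` weighting are all assumptions chosen for brevity (real libraries usually add smoothing and normalisation).

```scala
// Illustrative only: names and weighting scheme are assumptions, not a library API.
object TfIdfSketch {
  type Doc = Seq[String]

  // Naive tokenizer: lowercase, then split on anything that is not a-z.
  def tokenize(text: String): Doc =
    text.toLowerCase.split("[^a-z]+").filter(_.nonEmpty).toSeq

  // Term frequency: raw count of each term within one document.
  def termFrequency(doc: Doc): Map[String, Double] =
    doc.groupBy(identity).map { case (term, occ) => term -> occ.size.toDouble }

  // Inverse document frequency: log(N / df), where df is the number of
  // documents that contain the term at least once.
  def inverseDocFrequency(docs: Seq[Doc]): Map[String, Double] = {
    val n = docs.size.toDouble
    docs.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (term, occ) => term -> math.log(n / occ.size) }
  }

  // Vector space model: fix a vocabulary ordering, then emit one dense
  // vector per document whose i-th entry is tf * idf for the i-th term.
  def tfIdfVectors(docs: Seq[Doc]): (Seq[String], Seq[Vector[Double]]) = {
    val idf = inverseDocFrequency(docs)
    val vocab = idf.keys.toSeq.sorted
    val vectors = docs.map { doc =>
      val tf = termFrequency(doc)
      vocab.map(term => tf.getOrElse(term, 0.0) * idf(term)).toVector
    }
    (vocab, vectors)
  }
}
```

With this weighting, a word that appears in every document gets weight zero, which is the "too frequent" case the transform is meant to suppress.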

Collocation detection finds two or more words that occur together more commonly than separately (e.g. "wishy-washy" in English). I use this to group words into n-gram tokens, because many NLP techniques treat each word as if it's independent of all the others in a document and ignore word order: http://matpalm.com/blog/2011/10/22/collocations_1/
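
One common way to score collocations is pointwise mutual information (PMI) over adjacent word pairs. The Scala sketch below is an assumption about how the grouping step could look, not the method from the linked post: `CollocationSketch`, the `minCount` cutoff, the PMI threshold, and the underscore-joined token format are all hypothetical choices.

```scala
// Illustrative only: PMI scoring and greedy merging are assumed here,
// not taken from the linked article.
object CollocationSketch {
  // PMI(a, b) = log( P(a, b) / (P(a) * P(b)) ) for each adjacent pair
  // that occurs at least minCount times.
  def bigramPmi(tokens: Seq[String], minCount: Int = 2): Map[(String, String), Double] = {
    val unigramCounts = tokens.groupBy(identity).map { case (w, occ) => w -> occ.size }
    val bigrams = tokens.zip(tokens.drop(1))
    val bigramCounts = bigrams.groupBy(identity).map { case (bg, occ) => bg -> occ.size }
    val nUni = tokens.size.toDouble
    val nBi = bigrams.size.toDouble

    def pmiOf(a: String, b: String, count: Int): Double =
      math.log((count / nBi) / ((unigramCounts(a) / nUni) * (unigramCounts(b) / nUni)))

    bigramCounts.collect {
      case (pair @ (a, b), count) if count >= minCount => pair -> pmiOf(a, b, count)
    }
  }

  // Scan left to right and merge an adjacent pair into one token whenever
  // its PMI is at or above the threshold.
  def mergeCollocations(tokens: Seq[String], threshold: Double, minCount: Int = 2): Seq[String] = {
    val toks = tokens.toIndexedSeq
    val pmi = bigramPmi(toks, minCount)
    val out = Seq.newBuilder[String]
    var i = 0
    while (i < toks.size) {
      val isCollocation =
        i + 1 < toks.size && pmi.get((toks(i), toks(i + 1))).exists(_ >= threshold)
      if (isCollocation) {
        out += s"${toks(i)}_${toks(i + 1)}"
        i += 2
      } else {
        out += toks(i)
        i += 1
      }
    }
    out.result()
  }
}
```

The merged tokens can then be fed into a bag-of-words model, so a pair like "wishy washy" is counted as one feature rather than two independent words. The threshold needs tuning per corpus, and PMI is known to favour rare pairs, which is why the sketch also requires a minimum count.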

@Fitzsimmons
Fitzsimmons / spark.md
Created Apr 25, 2012 — forked from jesperfj/spark.md
Spark on Heroku

This guide will get you started using Spark on Heroku/Cedar. Spark is basically a clone of Sinatra for Java. 'Nuff said.

Create your app

Create a single Java main class in src/main/java/HelloWorld.java:

import static spark.Spark.*;
import spark.*;
@fozziethebeat
fozziethebeat / Schisel.scala
Created Feb 8, 2012
Schisel, an example of how to run Latent Dirichlet Allocation (via Mallet) from Scala
/*
* Copyright (c) 2012, Lawrence Livermore National Security, LLC. Produced at
* the Lawrence Livermore National Laboratory. Written by Keith Stevens,
* kstevens@cs.ucla.edu OCEC-10-073 All rights reserved.
*
* This file is part of the S-Space package and is covered under the terms and
* conditions therein.
*
* The S-Space package is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as published
@mreid
mreid / INSTALL-VW-OSX.md
Created Jan 29, 2012
Install Vowpal Wabbit on Mac OS X Lion

The INSTALL instructions that come with Vowpal Wabbit appear not to work on Mac OS X Lion. Here's what I did to get it to compile. You will need the developer tools that come with the Xcode installation.

The only dependency VW has is the Boost C++ library, so first download and install Boost.

To install Boost, do the following:

$ cp ~/Downloads/boost_1_48_0.tar.bz2 ./
@lucasfais
lucasfais / gist:1207002
Created Sep 9, 2011
Sublime Text 2 - Useful Shortcuts

Sublime Text 2 – Useful Shortcuts (Mac OS X)

General

⌘T go to file
⌘⌃P go to project
⌘R go to methods
⌃G go to line
⌘KB toggle side bar
⌘⇧P command prompt
@retronym
retronym / type-bounds.scala
Created Dec 16, 2009
Tour of Scala Type Bounds
class A
class A2 extends A
class B
trait M[X]
//
// Upper Type Bound
//
def upperTypeBound[AA <: A](x: AA): A = x