@rahulsom
Last active November 29, 2016 13:58
Mahout with Groovy - the faster way

Mahout with Groovy

I started looking at ML libraries and read somewhere that Apache Mahout is pretty good. Then I started looking for a hello world, and ran into this page.

It sucks that the tutorial is a YouTube video. That's right: you need to watch this guy do a bunch of stuff in a YouTube video to learn how to use Mahout. Worse still, he manages the libs in his project by hand.

So I decided to reimplement everything from his video in Groovy, with @Grab pulling in Mahout so there are no libs to manage by hand. As a bonus, I print movie names instead of IDs.

You will have to download the data file (the MovieLens ml-100k data set) from here, and point the variable mlDir at the extracted directory.
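For orientation, here is a minimal sketch of the two file layouts the script that follows relies on, assuming the usual ml-100k format: u.data is tab-separated (user id, item id, rating, timestamp) and u.item is pipe-separated with the movie id and title first. The sample lines are abbreviated, illustrative stand-ins for the real files, and `toCsv` and `toIdAndTitle` are hypothetical helper names that do not appear in the script.

```groovy
// Illustrative, abbreviated sample lines in the assumed ml-100k layout (not read from disk)
def ratingLine = '196\t242\t3\t881250949'
def itemLine = '1|Toy Story (1995)|01-Jan-1995'

// Keep user id, item id and rating, drop the timestamp, and join with commas
def toCsv = { String line -> line.split('\t').take(3).join(',') }

// Keep the first two pipe-separated fields: movie id and title
def toIdAndTitle = { String line -> line.split('\\|').take(2).toList() }

def csv = toCsv(ratingLine)

// collectEntries turns each [id, title] pair into a map entry keyed by the id string
def sampleItems = [itemLine].collectEntries { toIdAndTitle(it) }

println csv
println sampleItems
```

Note that the map keys are strings, which is why the script calls `toString()` on Mahout's numeric item ids before looking up a title.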

@Grab(group = 'org.apache.mahout', module = 'mahout-core', version = '0.9')
import org.apache.mahout.cf.taste.impl.common.FastByIDMap
import org.apache.mahout.cf.taste.impl.common.FastIDSet
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender
import org.apache.mahout.cf.taste.impl.similarity.TanimotoCoefficientSimilarity
def mlDir = '/Users/rahulsomasunderam/Downloads/ml-100k'
def f = new File("$mlDir/u.data")
assert f.exists()
def m = new FileDataModel(f, ',') {
    // u.data is tab-separated, so both overrides keep the first three columns and pass them on as CSV.
    @Override
    protected void processLine(
            String line, FastByIDMap<?> data, FastByIDMap<FastByIDMap<Long>> timestamps, boolean fromPriorData
    ) {
        def newLine = line.split('\t').take(3).join(',')
        super.processLine(newLine, data, timestamps, fromPriorData)
    }

    @Override
    protected void processLineWithoutID(
            String line, FastByIDMap<FastIDSet> data, FastByIDMap<FastByIDMap<Long>> timestamps) {
        def newLine = line.split('\t').take(3).join(',')
        try {
            super.processLineWithoutID(newLine, data, timestamps)
        } catch (Exception ignore) {
            // I have no idea why this happens. On my machine, it started reading from the u.user file.
            // Likely cause: FileDataModel also picks up "update" files in the same directory whose names
            // match the data file up to the first period, and u.user matches u.data on that rule.
        }
    }
}
def similarity = new TanimotoCoefficientSimilarity(m)
def recommender = new GenericItemBasedRecommender(m, similarity)
def items = new File("$mlDir/u.item").readLines().collectEntries { it.split('\\|').take(2).toList() }
m.itemIDs.each { itemId ->
    def recommendedItems = recommender.mostSimilarItems(itemId, 5)
    println "People who liked '${items[itemId.toString()]}' also liked"
    recommendedItems.each { recommendedItem ->
        println " (${(recommendedItem.value * 100).intValue()}) ${items[recommendedItem.itemID.toString()]}"
    }
}
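
The number printed in parentheses is the similarity score scaled to a percentage. As a rough sketch of what a Tanimoto coefficient measures (this is not Mahout's actual implementation), it is the number of users who rated both movies divided by the number of users who rated either one. The `tanimoto` closure and the sets of user ids below are made up for illustration.

```groovy
// Hypothetical helper: Tanimoto (Jaccard) coefficient of two sets of user ids
def tanimoto = { Set a, Set b ->
    def union = a + b
    if (union.isEmpty()) {
        return 0.0d   // convention chosen for this sketch when both sets are empty
    }
    def shared = a.intersect(b).size()
    return shared / (double) union.size()
}

// Made-up sets of user ids who rated each movie
def ratedA = [1, 2, 3, 4] as Set
def ratedB = [3, 4, 5] as Set

// 2 users rated both movies, 5 distinct users rated at least one of them
def score = tanimoto(ratedA, ratedB)
println "(${(score * 100).intValue()}) similarity between movie A and movie B"
```

Because only set membership matters here, the rating values themselves play no part in the score, only whether a user rated the movie at all.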