@dszeto
Created July 28, 2013 05:01
Import MovieLens 10M data set from http://www.grouplens.org/node/73 to PredictionIO 0.5.0
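require "predictionio"

# Both the app key and the path to the ratings file are required.
if ARGV.length < 2
  abort("Usage: import_ml.rb <app key> <movie lens data file>")
end

# Arguments: app key, number of client threads for asynchronous requests, API server URL.
client = PredictionIO::Client.new(ARGV[0], 50, "http://localhost:5586")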
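
# Hashes used as sets of the distinct user and movie IDs seen in the file.
users = Hash.new
items = Hash.new
lines = 0

File.open(ARGV[1]) do |f|
  f.each_line do |line|
    # Back off while the client's asynchronous request queue is large.
    while client.pending_requests > 10000
      puts "More than 10000 requests in queue. Throttling..."
      sleep(5)
    end

    # Fields are separated by "::"; the first three are user ID, movie ID, rating.
    fields = line.chomp.split("::")

    # Queue a "rate" action by this user on this movie.
    # The rating is rounded to the nearest integer before it is sent as pio_rate.
    client.identify(fields[0])
    client.arecord_action_on_item("rate", fields[1], "pio_rate" => fields[2].to_f.round)

    # Remember the IDs so the users and items can be created afterwards.
    users[fields[0]] = 1
    items[fields[1]] = 1

    lines += 1
    puts "Processed #{lines} lines" if lines % 10000 == 0
  end
end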
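
# Register every user and movie seen in the ratings file.
users.each_key { |k| client.acreate_user(k) }
items.each_key { |k| client.acreate_item(k, "movies") }

# Wait for the asynchronous request queue to drain before exiting.
while client.pending_requests > 0
  puts "Remaining: #{client.pending_requests}"
  sleep(5)
end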
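
The script reads `fields[0]` through `fields[2]` directly, so a blank or truncated line would send `nil` IDs to the server. Below is a minimal sketch of a stricter parser, assuming the MovieLens 10M `ratings.dat` layout `UserID::MovieID::Rating::Timestamp`. `parse_rating_line` is a hypothetical helper name, not part of the gist or of the `predictionio` gem.

```ruby
# Hypothetical helper (not in the original script or the predictionio gem).
# Assumes each line looks like "UserID::MovieID::Rating::Timestamp".
# Returns [user_id, movie_id, rounded_rating], or nil for a blank/short line.
def parse_rating_line(line)
  user_id, movie_id, rating, _timestamp = line.chomp.split("::")
  return nil if user_id.nil? || movie_id.nil? || rating.nil?

  # Same rounding as the script: half-star ratings become whole numbers.
  [user_id, movie_id, rating.to_f.round]
end

p parse_rating_line("1::122::3.5::838985046\n")  # prints ["1", "122", 4]
p parse_rating_line("\n")                        # prints nil
```

In the import loop this could replace the `split` call, with `next` used to skip any line for which the helper returns `nil`.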