@jiahao
Created May 15, 2016 05:24
Julia script for downloading the MovieLens 20M dataset from http://grouplens.org/datasets/movielens/ and saving its ratings as a compressed sparse matrix in JLD format.
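using CSV
using Downloads
using JLD
using Nettle
using SparseArrays
using ZipFile
# Download the archive; returns the path of a temporary file
zfilename = Downloads.download("http://files.grouplens.org/datasets/movielens/ml-20m.zip")
#TODO check hashes
#md5 = readchomp(Downloads.download("http://files.grouplens.org/datasets/movielens/ml-20m.zip.md5"))
#md5dl = open(zfilename) do f hexdigest("md5", read(f)) end
# Select ratings.csv by name instead of relying on its position in the archive
z = ZipFile.Reader(zfilename)
ratings = z.files[findfirst(f -> endswith(f.name, "ratings.csv"), z.files)]
# ratings.csv columns: userId, movieId, rating, timestamp
data = CSV.File(read(ratings))
close(z)
# Sparse users-by-movies matrix: row index = userId, column index = movieId
A = sparse(Vector{Int}(data.userId), Vector{Int}(data.movieId), Vector{Float32}(data.rating))
JLD.save("movielens-20m.jld", "data", A, compress=true)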
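The `sparse(I, J, V)` call turns the rating rows into a matrix stored in coordinate (triplet) form: each (userId, movieId, rating) row becomes one stored entry, and by default the matrix dimensions are the largest user ID and the largest movie ID. The sketch below uses a handful of made-up toy ratings (hypothetical values, not MovieLens data) to show this, along with the fact that `sparse` sums duplicate (i, j) pairs unless a different `combine` function is passed. Summing is harmless for the script above only under the assumption that each user rates a given movie at most once.

```julia
using SparseArrays

# Hypothetical toy ratings in the same layout as ratings.csv (userId, movieId, rating)
user_ids  = [1, 1, 2, 4]
movie_ids = [2, 5, 2, 3]
ratings   = Float32[3.5, 4.0, 5.0, 1.5]

# Build the users-by-movies matrix from triplets
A = sparse(user_ids, movie_ids, ratings)

# Dimensions default to (maximum(user_ids), maximum(movie_ids))
println(size(A))   # (4, 5)
println(nnz(A))    # 4
println(A[1, 5])   # 4.0
println(A[3, 1])   # 0.0: user 3 has no ratings, so row 3 is empty

# Duplicate (i, j) pairs are summed by default...
B = sparse([1, 1], [2, 2], Float32[3.0, 4.0])
println(B[1, 2])   # 7.0

# ...unless an explicit size and combine function are given
C = sparse([1, 1], [2, 2], Float32[3.0, 4.0], 1, 2, max)
println(C[1, 2])   # 4.0
```

Because unrated (user, movie) pairs are simply not stored, the 20M-rating matrix stays compact even though most of its cells are empty.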