@abhijithch
Created January 22, 2018 07:28
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Building a Recommendation engine from scratch using Julia\n",
"\n",
"\n",
"### What is Julia : \n",
"\n",
"Julia is a **high-level, high-performance dynamic programming language for technical computing**, with syntax that is familiar to users of other technical computing environments. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library.\n",
"\n",
"To summarize the movtivation behind creating yet another language here is a quote from the creators of the Julia language, \n",
"\n",
"> We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at gluing programs together as the shell. Something that is dirt simple to learn, yet keeps the most serious hackers happy. We want it interactive and we want it compiled.\n",
"\n",
"> (Did we mention it should be as fast as C?)\n",
"\n",
"> While we’re being demanding, we want something that provides the distributed power of Hadoop — without the kilobytes of boilerplate Java and XML; without being forced to sift through gigabytes of log files on hundreds of machines to find our bugs. We want the power without the layers of impenetrable complexity. We want to write simple scalar loops that compile down to tight machine code using just the registers on a single CPU. We want to write A*B and launch a thousand computations on a thousand machines, calculating a vast matrix product together.\n",
"\n",
"\n",
"\n",
"### What is a Recommendation System : \n",
"\n",
"* A system which understands the users personal taste for various products.\n",
"* The products can range from items on e-commerce sites to movies to connections on social sites.\n",
"* It uses the users past behaviour when available, like previous ratings on movies watched previously etc.\n",
"* Mathematically models the user-item interaction using complex algorithms.\n",
"* Works on large sparse datasets, which consists of the past behaviour.\n",
"* Make high quality predictions to individual users.\n",
"\n",
"P.S : Certain limitations do exist, like lack of sufficient information of the users and movies. We assume that each user must rate a certain minimum number of movies in order to get accurate predictions.\n",
"\n",
"\n",
"### What it takes to build one :\n",
"\n",
" 1. ** A versatile programming language **.\n",
" 2. ** Powerful mathematical capabilities **.\n",
" 3. High-performance, parallelism, ability to scale.\n",
" 4. Web, javascript capabilities.\n",
" 5. Fast processing of real-time data.\n",
" \n",
"### Why Julia :\n",
"\n",
"1. Its syntax which is similar Python and Matlab. \n",
"2. Julia runs inherently fast, written well.\n",
"3. Easy to call C/Fortran.\n",
"4. Ready for handling Big Data.\n",
"5. Add more.. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Toy Model of the movie recommender system :\n",
"\n",
"Consider 4 users *Alice*, *Bob*, *John*, *Dow* and 4 movies *Titanic*, *Braveheart*, *Lion King*, *Troll 2*.\n",
"\n",
"| |Titanic |Braveheart |Lion King | Troll 2 |\n",
"|---|:---:|:---:|:---:|:---:|\n",
"|Alice| 4 | 5 | 1 | ? |\n",
"| Bob | 5 | ? | 2 | 1 |\n",
"| John | 2 | ? | 4 | 4 |\n",
"| Dow | ? | 2 | 2 | 4 |\n",
"\n",
"The table above shows the ratings given by the 4 users to the 4 movies. The `?` are the movies unseen and hence we need to predict the ratings. By setting a threshold, say 3.5, we can recommend movies whose predictions are greater than 3.5."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×4 Array{Int64,2}:\n",
" 4 5 1 0\n",
" 5 0 2 1\n",
" 2 0 4 4\n",
" 0 2 2 4"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Rating martix, R\n",
"\n",
"R = [4 5 1 0;\n",
" 5 0 2 1;\n",
" 2 0 4 4;\n",
" 0 2 2 4;]"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"5×4 Array{Int64,2}:\n",
" 4 5 3 4\n",
" 5 0 3 5\n",
" 2 0 3 0\n",
" 0 2 5 5\n",
" 1 1 0 5"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"typeof(R)\n",
"# To add another user, Gary, \n",
"# Initialise an empty array\n",
"Gary = Int[]\n",
"push!(Gary, 1)\n",
"push!(Gary, 1)\n",
"push!(Gary, 0)\n",
"push!(Gary, 5)\n",
"\n",
"# Now include this into R\n",
"R = [R; Gary']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Low rank matrix approximations to predict the movie ratings :"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×4 sparse matrix with 12 Int64 nonzero entries:\n",
"\t[1, 1] = 4\n",
"\t[2, 1] = 5\n",
"\t[3, 1] = 2\n",
"\t[1, 2] = 5\n",
"\t[4, 2] = 2\n",
"\t[1, 3] = 1\n",
"\t[2, 3] = 2\n",
"\t[3, 3] = 4\n",
"\t[4, 3] = 2\n",
"\t[2, 4] = 1\n",
"\t[3, 4] = 4\n",
"\t[4, 4] = 4"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# First let us filter out the unwatched movie ratings.\n",
"R_s = sparse(R)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}([0.531804 -0.705837 0.397553; 0.50049 -0.183275 -0.697523; 0.557504 0.542563 -0.159789; 0.394821 0.416929 0.574353],[9.1757,5.60357,3.98918],[0.626074 -0.473733 -0.555749; 0.375847 -0.481001 0.786244; 0.496141 0.344731 -0.122317; 0.469696 0.652208 0.240835]),6,1,8,[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let `k=3` be the reduced rank\n",
"k=3\n",
"LR = svds(R_s, nsv=k)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(Base.LinAlg.SVD{Float64,Float64,Array{Float64,2}}([-0.531804 -0.705837 -0.397553; -0.50049 -0.183275 0.697523; -0.557504 0.542563 0.159789; -0.394821 0.416929 -0.574353],[9.1757,5.60357,3.98918],[-0.626074 -0.473733 0.555749; -0.375847 -0.481001 -0.786244; -0.496141 0.344731 0.122317; -0.469696 0.652208 -0.240835]),6,1,8,[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": []
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×3 Array{Float64,2}:\n",
" 0.626074 -0.473733 -0.555749\n",
" 0.375847 -0.481001 0.786244\n",
" 0.496141 0.344731 -0.122317\n",
" 0.469696 0.652208 0.240835"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Let us reconstruct back the matrix\n",
"U = LR[1][:U]\n",
"S = LR[1][:S]\n",
"V = LR[1][:Vt]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3×3 Array{Float64,2}:\n",
" 9.1757 0.0 0.0 \n",
" 0.0 5.60357 0.0 \n",
" 0.0 0.0 3.98918"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"diagm(S)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×4 Array{Int64,2}:\n",
" 5 5 1 1\n",
" 5 1 3 1\n",
" 3 0 4 5\n",
" 0 3 3 4"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"R_new = convert(Array{Int64}, ceil(U*(diagm(S)*V')))"
]
},
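{
"cell_type": "markdown",
"metadata": {},
"source": [
"For readers on a current Julia release, the same truncated reconstruction can be sketched with the `LinearAlgebra` standard library. This is a minimal sketch under assumptions, not the `svds` call used in the cells above: it assumes Julia 1.x, where `svd` returns a factorization with fields `U`, `S` and `V`, and the helper name `lowrank` is made up for illustration.\n",
"\n",
"```julia\n",
"using LinearAlgebra\n",
"\n",
"# Toy ratings matrix from above; 0 marks an unseen movie.\n",
"R = [4 5 1 0;\n",
"     5 0 2 1;\n",
"     2 0 4 4;\n",
"     0 2 2 4]\n",
"\n",
"# Illustrative helper: rank-k approximation of A built from the\n",
"# k largest singular values and their singular vectors.\n",
"function lowrank(A, k)\n",
"    F = svd(float.(A))      # dense SVD, fine for a toy matrix\n",
"    return F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.V[:, 1:k]'\n",
"end\n",
"\n",
"R3 = lowrank(R, 3)          # rank-3 approximation\n",
"R_pred = ceil.(Int, R3)     # round up, as in the cell above\n",
"```\n",
"\n",
"With `k` equal to the full rank of the matrix, `lowrank` returns the original matrix up to rounding error."
]
},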
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4×4 Array{Int64,2}:\n",
" 4 5 1 0\n",
" 5 0 2 1\n",
" 2 0 4 4\n",
" 0 2 2 4"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"R"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"3×4 Array{Float64,2}:\n",
" -0.626074 -0.375847 -0.496141 -0.469696\n",
" -0.473733 -0.481001 0.344731 0.652208\n",
" 0.555749 -0.786244 0.122317 -0.240835"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"V'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Make predictions :\n",
"\n",
"From the above matrix, the recommendations can be made on the predicted unknown ratings."
]
},
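{
"cell_type": "markdown",
"metadata": {},
"source": [
"The thresholding step can be sketched as follows. The matrices are copied from the `R` and `R_new` outputs above; the helper name `recommend` and its keyword `threshold` are hypothetical and only serve this illustration. The sketch assumes Julia 1.x.\n",
"\n",
"```julia\n",
"# Observed toy ratings; 0 marks an unseen movie.\n",
"R = [4 5 1 0;\n",
"     5 0 2 1;\n",
"     2 0 4 4;\n",
"     0 2 2 4]\n",
"\n",
"# Rank-3 reconstruction, copied from the `R_new` output above.\n",
"R_new = [5 5 1 1;\n",
"         5 1 3 1;\n",
"         3 0 4 5;\n",
"         0 3 3 4]\n",
"\n",
"users = [\"Alice\", \"Bob\", \"John\", \"Dow\"]\n",
"movies = [\"Titanic\", \"Braveheart\", \"Lion King\", \"Troll 2\"]\n",
"\n",
"# Hypothetical helper: for every user, list the unseen movies whose\n",
"# predicted rating exceeds the threshold.\n",
"function recommend(R, R_new, users, movies; threshold = 3.5)\n",
"    recs = Dict{String,Vector{String}}()\n",
"    for i in axes(R, 1)\n",
"        picks = String[]\n",
"        for j in axes(R, 2)\n",
"            if R[i, j] == 0 && R_new[i, j] > threshold\n",
"                push!(picks, movies[j])\n",
"            end\n",
"        end\n",
"        recs[users[i]] = picks\n",
"    end\n",
"    return recs\n",
"end\n",
"\n",
"recs = recommend(R, R_new, users, movies)\n",
"```\n",
"\n",
"For this toy data every list in `recs` is empty: the predicted values at the four unseen positions are 1, 1, 0 and 0, so none of them clears the 3.5 threshold. The plain SVD treats the `0` placeholders as actual ratings, which pulls these predictions down."
]
},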
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exploratory analysis and plots of the data :"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Parallel Movie recommender system \n",
"\n",
"This notebook demos collaborative filtering based movie recommender systems in Julia. The package [RecSys.jl](https://github.com/abhijithch/RecSys.jl/) is a package for recommender systems in Julia, it can currently work with explicit ratings data. This demos a parallel implementation of the ALS factorization based collaborative filtering for movie recommendations based on [this](http://dl.acm.org/citation.cfm?id=1424269) research article. The detailed report of the system is [here](http://juliacomputing.com/blog/2016/04/22/a-parallel-recommendation-engine-in-julia.html).\n",
"\n",
"### Collaborative Filtering using weighted ALS factorization :\n",
"\n",
"<img src=\"./images/als.png\" width=\"550\">\n",
"\n",
"Let $U={u_i}$ be the user feature matrix where ${u_i} \\subseteq\\mathbb{R}^{n_f}$ and $i=1,2,...,n_u$, and let $M={m_j}$ be the item or movie feature matrix, where ${m_j} \\subseteq \\mathbb{R}^{n_f}$ and $j=1,2,...,n_m$. Here $n_f$ is the number of factors, i.e., the reduced dimension or the lower rank, which is determined by cross validation. The predictions can be calculated for any user-movie combination,\n",
"$(i,j)$, as $r_{ij}={u_i} \\cdotp {m_j}, \\forall i,j$.\n",
"\n",
"** Credits ** :\n",
"\n",
"[Large-Scale Parallel Collaborative Filtering for the Netflix Prize](http://dl.acm.org/citation.cfm?id=1424269)\n",
"\n",
"[Movielens dataset](http://grouplens.org/datasets/movielens/)"
]
},
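{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the factorization concrete, here is a serial, dense sketch of alternating least squares with weighted-λ regularization (ALS-WR), the scheme described in the cited paper, run on the toy ratings matrix from earlier. It is not the RecSys.jl implementation. It assumes Julia 1.x, and the function names `update!` and `als`, the value `λ = 0.05`, the seed and the uniform random initialization are all choices made for this illustration.\n",
"\n",
"```julia\n",
"using LinearAlgebra, Random\n",
"\n",
"# One half-step: with Y fixed, solve a small regularized least squares\n",
"# problem for every column of X. Row i of R holds the ratings that\n",
"# belong to column i of X; 0 marks a missing rating.\n",
"function update!(X, Y, R, λ)\n",
"    for i in axes(R, 1)\n",
"        rated = findall(!iszero, R[i, :])\n",
"        isempty(rated) && continue\n",
"        Yi = Y[:, rated]                       # n_f x (number of ratings)\n",
"        A = Yi * Yi' + λ * length(rated) * I   # weighted-λ regularization\n",
"        X[:, i] = A \\ (Yi * R[i, rated])\n",
"    end\n",
"    return X\n",
"end\n",
"\n",
"function als(R, nf; λ = 0.05, iters = 10, rng = MersenneTwister(1))\n",
"    U = rand(rng, nf, size(R, 1))   # user features, one column per user\n",
"    M = rand(rng, nf, size(R, 2))   # movie features, one column per movie\n",
"    for _ in 1:iters\n",
"        update!(U, M, R, λ)         # fix M, solve for U\n",
"        update!(M, U, R', λ)        # fix U, solve for M\n",
"    end\n",
"    return U, M\n",
"end\n",
"\n",
"R = Float64[4 5 1 0; 5 0 2 1; 2 0 4 4; 0 2 2 4]\n",
"U, M = als(R, 2)\n",
"P = U' * M   # predicted ratings, P[i, j] = u_i ⋅ m_j\n",
"```\n",
"\n",
"Unlike the plain SVD used earlier, each least squares problem here only involves the ratings that were actually given, so the `0` placeholders do not enter the fit."
]
},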
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
],
"source": [
"# Installation of the packages. (To be done the first time, hence comment after installing.)\n",
"#Pkg.clone(\"https://github.com/tanmaykm/Blobs.jl.git\")\n",
"#Pkg.clone(\"https://github.com/abhijithch/RecSys.jl\")"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"test_chunks (generic function with 1 method)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"workspace()\n",
"using RecSys\n",
"include(joinpath(Pkg.dir(\"RecSys\"), \"examples\", \"movielens\", \"movielens.jl\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dataset : \n",
"\n",
"GroupLens Research has collected and made available rating data sets from the [MovieLens](http://movielens.org) web site. The data sets were collected over various periods of time, depending on the size of the set. \n",
"\n",
"#### MovieLens 20M Dataset\n",
"\n",
"Stable benchmark dataset. 20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.\n",
"\n",
"We use the ratings data to form a sparse matrix of size `138,000 X 27,000` with 20 million ratings ranging from 1 to 5."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"/Users/abhijith/work/ML/notebooks/data/recommender\""
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Please specify path to the data folder which includes the 20 million ratings data folder \"ml-20m\"\n",
"dataset_path = \"/Users/abhijith/work/ML/notebooks/data/recommender\""
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"ml-20m\""
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data_folder = \"ml-20m\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Creating file handles to the movie ratings and the movies list files."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RecSys.DlmFile(\"/Users/abhijith/work/ML/notebooks/data/recommender/ml-20m/movies.csv\",',',true,true)"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ratings_file = DlmFile(joinpath(dataset_path,data_folder, \"ratings.csv\"); dlm=',', header=true)\n",
"movies_file = DlmFile(joinpath(dataset_path,data_folder, \"movies.csv\"); dlm=',', header=true)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parallel implementations :\n",
"\n",
"This package offers 3 modes of parallelism, \n",
"\n",
"1. Multi-threading - Julia native threading infrastructure provides an easy way to make use threads.\n",
"2. Shared memory - This is a multiprocessing using shared data.\n",
"3. Distributed memory - This is distributed memory based multiprocessing, this would require that the data be split into chunks. There is code to do this, refer ...\n",
"\n",
"Multiple Dispatch is a nice feature in Julia, which would dispatch to the correct implementation based on the type of the objects passed as arguments. \n",
"\n",
"For e.x., if we need to train the model using shared memory multiprocessing, the type of `MovieRec` is `MovieRec(trainingset::FileSpec, movie_names::FileSpec)` and if we need distributed memory model the type of `MovieRec` is `MovieRec(user_item_ratings::FileSpec, item_user_ratings::FileSpec, movie_names::FileSpec)`.\n",
"\n",
"Let us see how Shared memory Parallel implementation trains the MovieLens data set"
]
},
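{
"cell_type": "markdown",
"metadata": {},
"source": [
"The way one name selects different implementations can be illustrated with a small stand-alone sketch. The type `FileSpec` and the function `parallel_mode` below are hypothetical stand-ins written for this example; they are not the RecSys.jl definitions.\n",
"\n",
"```julia\n",
"# Hypothetical stand-in for the package's file specification type.\n",
"struct FileSpec\n",
"    name::String\n",
"end\n",
"\n",
"# Two methods of one function. Julia picks the method from the number\n",
"# and types of the arguments at the call site.\n",
"parallel_mode(trainingset::FileSpec, movie_names::FileSpec) = :shared_memory\n",
"parallel_mode(user_item::FileSpec, item_user::FileSpec, movie_names::FileSpec) = :distributed_memory\n",
"\n",
"ratings = FileSpec(\"ratings.csv\")\n",
"movies = FileSpec(\"movies.csv\")\n",
"\n",
"parallel_mode(ratings, movies)            # calls the two-argument method\n",
"parallel_mode(ratings, ratings, movies)   # calls the three-argument method\n",
"```\n",
"\n",
"The caller never names the mode explicitly; the arguments that are passed decide which method runs."
]
},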
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"MovieRec(RecSys.DlmFile(\"/Users/abhijith/work/ML/notebooks/data/recommender/ml-20m/movies.csv\",',',true,true),RecSys.ALSWR{RecSys.ParShmem,RecSys.SharedMemoryInputs,RecSys.SharedMemoryModel}(RecSys.SharedMemoryInputs(RecSys.DlmFile(\"/Users/abhijith/work/ML/notebooks/data/recommender/ml-20m/ratings.csv\",',',true,true),0,0,Nullable{Union{ParallelSparseMatMul.SharedSparseMatrixCSC{Float64,Int64},RecSys.MatrixBlobs.SparseMatBlobs{Tv,Ti},SparseMatrixCSC{Float64,Int64}}}(),Nullable{Union{ParallelSparseMatMul.SharedSparseMatrixCSC{Float64,Int64},RecSys.MatrixBlobs.SparseMatBlobs{Tv,Ti},SparseMatrixCSC{Float64,Int64}}}(),Nullable{Union{Array{Int64,1},SharedArray{Int64,1}}}(),Nullable{Union{Array{Int64,1},SharedArray{Int64,1}}}()),Nullable{RecSys.SharedMemoryModel}(),RecSys.ParShmem()),Nullable{SparseVector{AbstractString,Int64}}())"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"rec = MovieRec(ratings_file, movies_file)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"\"/Users/abhijith/work/ML/notebooks/data/recommender/ml-20m/movies.csv\""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"movies_file.name"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To run in parallel mode, add processes like below:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let us train the model with `10` factors and `10` iterations."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 96.880785 seconds (50.02 M allocations: 64.357 GB, 8.59% gc time)\n"
]
}
],
"source": [
"@time train(rec, 10, 10)"
]
},
{
"cell_type": "code",
"execution_count": 150,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 150,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nprocs()"
]
},
{
"cell_type": "code",
"execution_count": 162,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7534671378202421"
]
},
"execution_count": 162,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"err = rmse(rec)"
]
},
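{
"cell_type": "markdown",
"metadata": {},
"source": [
"How `rmse(rec)` is computed inside RecSys.jl is not shown in this notebook. As an assumption, the sketch below uses the usual definition of the root mean squared error, restricted to the ratings that were actually observed, and applies it to the toy matrices from the first part. The helper name `observed_rmse` is made up for illustration, and the sketch assumes Julia 1.x.\n",
"\n",
"```julia\n",
"# Observed toy ratings; 0 marks an unseen movie.\n",
"R = [4 5 1 0;\n",
"     5 0 2 1;\n",
"     2 0 4 4;\n",
"     0 2 2 4]\n",
"\n",
"# Rank-3 reconstruction, copied from the `R_new` output earlier.\n",
"R_new = [5 5 1 1;\n",
"         5 1 3 1;\n",
"         3 0 4 5;\n",
"         0 3 3 4]\n",
"\n",
"# Root mean squared error over the observed (non-zero) ratings only.\n",
"function observed_rmse(R, P)\n",
"    idx = findall(!iszero, R)\n",
"    return sqrt(sum(abs2, R[idx] .- P[idx]) / length(idx))\n",
"end\n",
"\n",
"observed_rmse(R, R_new)   # sqrt(6 / 12), about 0.7071\n",
"```\n",
"\n",
"Six of the twelve observed ratings differ from the reconstruction by exactly one star, which gives the value in the comment."
]
},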
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Parallel Run:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
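"# Check the current number of processes (1 means no workers have been added)\n",
"nprocs()\n",
"# Uncomment to add worker processes; pick a count that suits your machine\n",
"#addprocs(2)\n",
"# Load RecSys and the MovieLens example on every process\n",
"@everywhere using RecSys\n",
"@everywhere include(joinpath(Pkg.dir(\"RecSys\"), \"examples\", \"movielens\", \"movielens.jl\"))"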
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"@time train(rec, 10, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Select a user and show the movies they have watched along with their recommendations"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Already watched:\n",
" [1 ] = \"Nixon (1995) - Drama\"\n",
" [2 ] = \"Leaving Las Vegas (1995) - Drama|Romance\"\n",
" [3 ] = \"Twelve Monkeys (a.k.a. 12 Monkeys) (1995) - Mystery|Sci-Fi|Thriller\"\n",
" [4 ] = \"Clueless (1995) - Comedy|Romance\"\n",
" [5 ] = \"Usual Suspects, The (1995) - Crime|Mystery|Thriller\"\n",
" [6 ] = \"From Dusk Till Dawn (1996) - Action|Comedy|Horror|Thriller\"\n",
" [7 ] = \"Crimson Tide (1995) - Drama|Thriller|War\"\n",
" [8 ] = \"Crumb (1994) - Documentary\"\n",
" [9 ] = \"Net, The (1995) - Action|Crime|Thriller\"\n",
" [10] = \"Smoke (1995) - Comedy|Drama\"\n",
" [11] = \"Clerks (1994) - Comedy\"\n",
" [12] = \"Ed Wood (1994) - Comedy|Drama\"\n",
" [13] = \"Star Wars: Episode IV - A New Hope (1977) - Action|Adventure|Sci-Fi\"\n",
" [14] = \"Like Water for Chocolate (Como agua para chocolate) (1992) - Drama|Fantasy|Romance\"\n",
" [15] = \"Natural Born Killers (1994) - Action|Crime|Thriller\"\n",
" [16] = \"Léon: The Professional (a.k.a. The Professional) (Léon) (1994) - Action|Crime|Drama|Thriller\"\n",
" [17] = \"Pulp Fiction (1994) - Comedy|Crime|Drama|Thriller\"\n",
" [18] = \"Shawshank Redemption, The (1994) - Crime|Drama\"\n",
" [19] = \"Star Trek: Generations (1994) - Adventure|Drama|Sci-Fi\"\n",
" [20] = \"What's Eating Gilbert Grape (1993) - Drama\"\n",
" [21] = \"While You Were Sleeping (1995) - Comedy|Romance\"\n",
" [22] = \"Muriel's Wedding (1994) - Comedy\"\n",
" [23] = \"Ace Ventura: Pet Detective (1994) - Comedy\"\n",
" [24] = \"Forrest Gump (1994) - Comedy|Drama|Romance|War\"\n",
" [25] = \"Boxing Helena (1993) - Drama|Mystery|Romance|Thriller\"\n",
" [26] = \"Carlito's Way (1993) - Crime|Drama\"\n",
" [27] = \"Cliffhanger (1993) - Action|Adventure|Thriller\"\n",
" [28] = \"Coneheads (1993) - Comedy|Sci-Fi\"\n",
" [29] = \"Hudsucker Proxy, The (1994) - Comedy\"\n",
" [30] = \"Kalifornia (1993) - Drama|Thriller\"\n",
" [31] = \"Mrs. Doubtfire (1993) - Comedy|Drama\"\n",
" [32] = \"Philadelphia (1993) - Drama\"\n",
" [33] = \"Schindler's List (1993) - Drama|War\"\n",
" [34] = \"Short Cuts (1993) - Drama\"\n",
" [35] = \"Six Degrees of Separation (1993) - Drama\"\n",
" [36] = \"Welcome to the Dollhouse (1995) - Comedy|Drama\"\n",
" [37] = \"Home Alone (1990) - Children|Comedy\"\n",
" [38] = \"Ghost (1990) - Comedy|Drama|Fantasy|Romance|Thriller\"\n",
" [39] = \"Terminator 2: Judgment Day (1991) - Action|Sci-Fi\"\n",
" [40] = \"Silence of the Lambs, The (1991) - Crime|Horror|Thriller\"\n",
" [41] = \"Fargo (1996) - Comedy|Crime|Drama|Thriller\"\n",
" [42] = \"Heavy Metal (1981) - Action|Adventure|Animation|Horror|Sci-Fi\"\n",
" [43] = \"Space Jam (1996) - Adventure|Animation|Children|Comedy|Fantasy|Sci-Fi\"\n",
" [44] = \"Alphaville (Alphaville, une étrange aventure de Lemmy Caution) (1965) - Drama|Mystery|Romance|Sci-Fi|Thriller\"\n",
" [45] = \"Truth About Cats & Dogs, The (1996) - Comedy|Romance\"\n",
" [46] = \"Cold Comfort Farm (1995) - Comedy\"\n",
" [47] = \"Trainspotting (1996) - Comedy|Crime|Drama\"\n",
" [48] = \"Independence Day (a.k.a. ID4) (1996) - Action|Adventure|Sci-Fi|Thriller\"\n",
" [49] = \"Palookaville (1996) - Action|Comedy|Drama\"\n",
" [50] = \"Star Wars: Episode VI - Return of the Jedi (1983) - Action|Adventure|Sci-Fi\"\n",
" [51] = \"Waiting for Guffman (1996) - Comedy\"\n",
" [52] = \"Fifth Element, The (1997) - Action|Adventure|Comedy|Sci-Fi\"\n",
"\n",
"Recommended:\n",
" [1 ] = \"Sand Sharks (2011) - Comedy|Horror|Sci-Fi|Thriller\"\n",
" [2 ] = \"Desperate Search (1952) - Adventure|Drama\"\n",
" [3 ] = \"Soo (Art of Revenge) (2007) - Action|Crime|Drama|Thriller\"\n",
" [4 ] = \"Caltiki the Undying Monster (1959) - Adventure|Horror|Sci-Fi|Thriller\"\n",
" [5 ] = \"Slim Carter (1957) - Comedy|Western\"\n",
" [6 ] = \"Diverted (2009) - Drama\"\n",
" [7 ] = \"First Texan, The (1956) - Western\"\n",
" [8 ] = \"Shepherd (1999) - Sci-Fi\"\n",
" [9 ] = \"Andrew Dice Clay: Indestructible (2012) - Comedy\"\n",
" [10] = \"Alone for Christmas (2013) - Action|Children|Comedy|Fantasy\"\n",
"\n"
]
}
],
"source": [
"user = 100\n",
"print_recommendations(rec, recommend(rec, user)...)"
]
},
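{
"cell_type": "markdown",
"metadata": {},
"source": [
"A common way to produce such a list from a factorization model is to score every movie for the user as the dot product of the user's factor vector and the movie's factor vector, then keep the highest-scoring movies the user has not yet watched. The cell below is a minimal sketch of that idea with tiny made-up matrices. The names `U_demo`, `M_demo`, `demo_user` and `watched_demo`, and all the numbers, are hypothetical, and this is not necessarily how `recommend` is implemented in RecSys."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Illustrative sketch with made-up factors; not the RecSys implementation.\n",
"U_demo = [1.0 0.0 0.5; 0.0 1.0 0.5]          # 2 factors x 3 users (hypothetical)\n",
"M_demo = [5.0 1.0 3.0 4.0; 1.0 5.0 3.0 2.0]  # 2 factors x 4 movies (hypothetical)\n",
"demo_user = 1\n",
"watched_demo = [1]                           # movies this user has already rated\n",
"\n",
"# Predicted score for each movie: dot product of the user and movie factor vectors\n",
"scores = [sum(U_demo[:, demo_user] .* M_demo[:, j]) for j in 1:size(M_demo, 2)]\n",
"\n",
"# Keep only unwatched movies and order them by descending predicted score\n",
"candidates = filter(j -> !(j in watched_demo), collect(1:size(M_demo, 2)))\n",
"top = candidates[sortperm(scores[candidates], rev=true)]"
]
},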
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Julia 0.5.2",
"language": "julia",
"name": "julia-0.5"
},
"language_info": {
"file_extension": ".jl",
"mimetype": "application/julia",
"name": "julia",
"version": "0.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
}