
@tanmaykm
Last active January 3, 2016 02:13
K-means & ALS parallel mode comparisons

K-Means

Packages:

Dataset: kddcup network intrusion data

Size: 5,000,000 observations, each with 40 features

Clusters: 150

$ ~/julia/julia -p 20
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+1784 (2015-12-15 05:34 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 62f1481 (2 days old master)
|__/                   |  x86_64-linux-gnu

julia> include("kdd.jl")
INFO: reading csv...
INFO: transposing...
K-means converged with 63 iterations (objv = 5.2919472734757043e11)
INFO: distributing...
K-means converged with 64 iterations (objv = 5.2919472734757043e11)
K-means converged with 63 iterations (objv = 5.2919472734757043e11)
K-means converged with 256 iterations (objv = 1.271281932891472e12)
kmpar: 248.89936900138855, kmpp: 81.91902709007263, distributed: 88.15856599807739, singlenode: 646.2298829555511
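The last line reports four wall-clock timings but no ratios. One way to read them is as speedups over the `singlenode` run. The Python sketch below does only that post-hoc calculation on the printed numbers, because the gist does not include the source of `kdd.jl`. It assumes the figures are seconds and takes the `kmpar`/`kmpp` labels as printed. `timings` and `speedups` are hypothetical names.

```python
# Wall-clock timings printed by kdd.jl under `julia -p 20` (assumed to be seconds).
timings = {
    "kmpar": 248.89936900138855,
    "kmpp": 81.91902709007263,
    "distributed": 88.15856599807739,
    "singlenode": 646.2298829555511,
}

def speedups(times, baseline):
    """Return baseline_time / mode_time for every mode except the baseline."""
    base = times[baseline]
    return {mode: base / t for mode, t in times.items() if mode != baseline}

# Print modes from fastest to slowest relative to the single-node run.
for mode, s in sorted(speedups(timings, "singlenode").items(), key=lambda kv: -kv[1]):
    print(f"{mode:12s} {s:5.2f}x faster than singlenode")
```

On these numbers `kmpp` is roughly 7.9x faster than `singlenode`, `distributed` roughly 7.3x, and `kmpar` roughly 2.6x. These are end-to-end ratios. The log shows runs converging after anywhere from 63 to 256 iterations, so the ratios do not isolate per-iteration cost.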

ALS

Packages:

Dataset: MovieLens 20M (ml-20m)

Size: 20,000,000 ratings of 27,000 movies by 138,000 users

# single processor
julia> test("/home/tan/Work/datasets/movielens/ml-20m")
17-Dec 20:20:31:DEBUG:root:loading inputs...
17-Dec 20:20:54:DEBUG:root:time to load inputs: 22.946762084960938 secs
17-Dec 20:20:54:DEBUG:root:preparing inputs...
17-Dec 20:20:55:DEBUG:root:prep time: 0.8154430389404297
17-Dec 20:21:35:DEBUG:root:fact time 40.201488971710205
17-Dec 20:22:03:DEBUG:root:rmse time 28.2427339553833

# shared memory mode
# julia -p 8
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0-dev+1824 (2015-12-16 08:25 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 79b2418* (1 day old master)
|__/                   |  x86_64-linux-gnu

julia> include("movielens.jl")
test (generic function with 1 method)

julia> test("/home/tan/Work/datasets/movielens/ml-20m")
17-Dec 20:12:52:DEBUG:root:loading inputs...
17-Dec 20:13:14:DEBUG:root:time to load inputs: 21.93354296684265 secs
17-Dec 20:13:14:DEBUG:root:preparing inputs...
17-Dec 20:13:15:DEBUG:root:prep time: 0.9523890018463135
17-Dec 20:13:35:DEBUG:root:fact time 19.933722019195557
17-Dec 20:13:47:DEBUG:root:rmse time 11.64983582496643
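A stage-by-stage comparison of the two runs shows where the worker processes help. Loading and preparation take about the same time in both runs (roughly 22 s and under 1 s). The `fact` and `rmse` stages are where the time drops. The Python sketch below computes speedup and parallel efficiency for those two stages. It assumes efficiency is measured against the 8 workers added by `-p 8`, not counting the master process. `compare` is a hypothetical helper.

```python
# Stage timings (seconds) from the two movielens.jl runs above.
single = {"fact": 40.201488971710205, "rmse": 28.2427339553833}
shared = {"fact": 19.933722019195557, "rmse": 11.64983582496643}  # julia -p 8
WORKERS = 8  # assumption: efficiency is relative to the 8 added workers

def compare(serial, parallel, workers):
    """Return {stage: (speedup, parallel efficiency)} for stages present in both runs."""
    out = {}
    for stage in serial.keys() & parallel.keys():
        s = serial[stage] / parallel[stage]
        out[stage] = (s, s / workers)
    return out

for stage, (s, eff) in sorted(compare(single, shared, WORKERS).items()):
    print(f"{stage}: {s:.2f}x speedup, {eff:.0%} efficiency on {WORKERS} workers")
```

This gives about 2.0x for `fact` and 2.4x for `rmse`, which is roughly 25% and 30% efficiency on 8 workers.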
# with threads

# 1 thread
julia> test("/home/tan/Work/datasets/movielens/ml-20m")
03-Jan 07:24:41:DEBUG:root:loading inputs...
03-Jan 07:25:05:DEBUG:root:time to load inputs: 23.175585985183716 secs
03-Jan 07:25:05:DEBUG:root:preparing inputs...
03-Jan 07:25:06:DEBUG:root:prep time: 1.1804168224334717
03-Jan 07:25:28:DEBUG:root:fact time 21.833703994750977
03-Jan 07:25:57:DEBUG:root:rmse time 29.767030000686646
rmse of the model: 0.7929127604591637

# 2 threads
julia> test("/home/tan/Work/datasets/movielens/ml-20m")
03-Jan 07:31:10:DEBUG:root:loading inputs...
03-Jan 07:31:31:DEBUG:root:time to load inputs: 21.776299953460693 secs
03-Jan 07:31:31:DEBUG:root:preparing inputs...
03-Jan 07:31:32:DEBUG:root:prep time: 0.8184521198272705
03-Jan 07:31:50:DEBUG:root:fact time 17.791420936584473
03-Jan 07:32:08:DEBUG:root:rmse time 17.79410696029663
rmse of the model: 0.7934035822594543

# 8 threads (4 physical cores + 4 hyperthreads)
julia> test("/home/tan/Work/datasets/movielens/ml-20m")
03-Jan 07:21:05:DEBUG:root:loading inputs...
03-Jan 07:21:34:DEBUG:root:time to load inputs: 28.406423807144165 secs
03-Jan 07:21:34:DEBUG:root:preparing inputs...
03-Jan 07:21:34:DEBUG:root:prep time: 0.8342618942260742
03-Jan 07:21:49:DEBUG:root:fact time 14.779378890991211
03-Jan 07:22:02:DEBUG:root:rmse time 11.711627006530762
rmse of the model: 0.7940479653508236
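The threaded runs can be read the same way, as speedups over the 1-thread run. The Python sketch below also reports how much the final model RMSE varies between runs. `runs`, `model_rmse` and `scaling` are hypothetical names holding numbers copied from the log. The sketch does not compare against the earlier single-processor run, because that run was recorded on a different date (17-Dec vs 03-Jan).

```python
# Stage timings in seconds from the threaded movielens.jl runs above, keyed by thread count.
runs = {
    1: {"fact": 21.833703994750977, "rmse": 29.767030000686646},
    2: {"fact": 17.791420936584473, "rmse": 17.79410696029663},
    8: {"fact": 14.779378890991211, "rmse": 11.711627006530762},  # 4 cores + 4 hyperthreads
}
# Model RMSE printed at the end of each run.
model_rmse = {1: 0.7929127604591637, 2: 0.7934035822594543, 8: 0.7940479653508236}

def scaling(runs, baseline=1):
    """Speedup of each stage relative to the baseline thread count."""
    base = runs[baseline]
    return {
        n: {stage: base[stage] / t for stage, t in stages.items()}
        for n, stages in runs.items()
    }

for n, stages in sorted(scaling(runs).items()):
    cells = ", ".join(f"{k} {v:.2f}x" for k, v in sorted(stages.items()))
    print(f"{n} thread(s): {cells}")

spread = max(model_rmse.values()) - min(model_rmse.values())
print(f"model RMSE spread across runs: {spread:.4f}")
```

Relative to 1 thread, `fact` speeds up by about 1.23x on 2 threads and 1.48x on 8 threads. `rmse` speeds up by about 1.67x and 2.54x. The model RMSE differs by only about 0.0011 across the three runs.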