- Spark 1.4 + 1da3c7f
- Databricks Cloud
- 8 Workers, EC2 Spot instances
- Workers: 240 GB Memory, 32 Cores
- Driver: 30 GB Memory, 4 Cores
- 10,000 points, 100 each from N(x,1) where x in [0, 3, 6, ... 297]
- n-dimensional points were generated by n iid draws from N(x,1)
e.g. for 40 features (as described above: 100 points from each of the 100 means x = 0, 3, ..., 297, so 10,000 points total):

```scala
import org.apache.spark.mllib.linalg.Vectors
import scala.util.Random

val rng = new Random()

// `sc` is the notebook's SparkContext.
// `0 until 300 by 3` gives the 100 means 0, 3, ..., 297.
val data40D = sc.parallelize(
  (0 until 300 by 3).flatMap { mean =>
    // 100 points per mean; each feature is an iid draw from N(mean, 1)
    Seq.fill(100)(Vectors.dense((0 until 40).map(_ => rng.nextGaussian() + mean).toArray))
  }
)
```
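The runtimes in the tables below were presumably collected by timing `KMeans.train` over such an RDD. A minimal sketch of that kind of harness follows; the `timeKMeans` helper, the `maxIterations` value, and the trials count are assumptions for illustration, not taken from the original:

```scala
import org.apache.spark.mllib.clustering.KMeans

// Hypothetical harness: runs KMeans.train `trials` times with k centers
// and returns wall-clock seconds per run. maxIterations = 20 is assumed;
// the original does not state it.
def timeKMeans(k: Int, trials: Int = 3): Seq[Double] = {
  data40D.cache().count()  // materialize the cached RDD so the first trial isn't penalized
  (1 to trials).map { _ =>
    val start = System.nanoTime()
    KMeans.train(data40D, k, 20)
    (System.nanoTime() - start) / 1e9
  }
}
```

Each table row would then correspond to one `timeKMeans(k)` call for the given feature count and number of centers.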
Num Features | Num Centers | Runtimes (s) |
---|---|---|
1 | 10 | 10.68, 9.14, 8.45 |
1 | 50 | 11.36, 10.08, 9.91 |
1 | 100 | 16.92, 13.53, 16.05 |
30 | 10 | 2.21, 2.23, 2.47 |
30 | 50 | 9.14, 9.16, 8.67 |
30 | 100 | 17.12, 17.13, 16.92 |
100 | 10 | 2.41, 2.59, 2.41, 2.40, 2.36 |
100 | 100 | 14.54, 17.62, 17.11 |
300 | 10 | 16.38, 16.23, 16.21, 16.20, 16.34 |
300 | 100 | 155.86, 147.93, 153.36 |
40 | 10000 | 383.29, 373.50, 372.90 |
Num Features | Num Centers | Runtimes (s) |
---|---|---|
1 | 10 | 5.57, 5.31, 5.57 |
1 | 50 | 8.96, 8.48, 7.18 |
1 | 100 | 10.56, 9.48, 12.88 |
30 | 10 | 2.39, 2.55, 2.33 |
30 | 50 | 9.91, 9.64, 9.55 |
30 | 100 | 18.87, 18.73, 18.64 |
100 | 10 | 3.16, 2.37, 2.31, 2.30, 2.37 |
100 | 100 | 17.86, 21.49, 17.96 |
300 | 10 | 18.44, 17.97, 20.49, 19.00 |
300 | 100 | 183.12, 172.17, 175.38 |
40 | 10000 | 397.97, 396.32, 396.21 |