@samklr
Created August 30, 2013 21:21
DotProduct matrix in Scala and on Spark
def dotProduct(vector: Array[Int], matrix: Array[Array[Int]]): Array[Int] = {
  // ignore dimensionality checks for simplicity of example
  (0 to (matrix(0).size - 1)).toArray.map( colIdx => {
    val colVec: Array[Int] = matrix.map( rowVec => rowVec(colIdx) )
    val elemWiseProd: Array[Int] = (vector zip colVec).map( entryTuple => entryTuple._1 * entryTuple._2 )
    elemWiseProd.sum
  } )
}
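// A (6 x 3) is distributed as an RDD with one matrix row per element
val A = sc.parallelize(Array(Array(7, 5, 4), Array(0, 3, 2), Array(8, 0, 5), Array(-11, 7, -4), Array(-8, 2, -13), Array(5, 0, -2)))
// B (3 x 6) is broadcast so that every worker holds a read-only copy
val B = sc.broadcast(Array(Array(100, -80, 75, -105, 30, -50), Array(60, -60, 60, -60, 60, -60), Array(-50, 30, -105, 75, -80, 100)))
// Multiply each row of A by B; collect gathers the 6 x 6 product on the driver
A.map( row => dotProduct(row, B.value) ).collect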
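
The per-row computation can be checked locally without Spark. The following is a minimal sketch, assuming a plain Scala REPL; the names `firstRow`, `localB`, and `firstRowTimesB` are illustrative and not part of the gist. It repeats `dotProduct` so that the snippet stands alone, then multiplies the first row of A by the same B matrix.

```scala
// Copy of the gist's function, repeated here so the snippet is self-contained
def dotProduct(vector: Array[Int], matrix: Array[Array[Int]]): Array[Int] = {
  (0 to (matrix(0).size - 1)).toArray.map( colIdx => {
    val colVec: Array[Int] = matrix.map( rowVec => rowVec(colIdx) )
    (vector zip colVec).map( entryTuple => entryTuple._1 * entryTuple._2 ).sum
  } )
}

// First row of A and the full B matrix, held as ordinary local arrays (hypothetical names)
val firstRow: Array[Int] = Array(7, 5, 4)
val localB: Array[Array[Int]] = Array(
  Array(100, -80, 75, -105, 30, -50),
  Array(60, -60, 60, -60, 60, -60),
  Array(-50, 30, -105, 75, -80, 100))

// One entry per column of B, e.g. column 0: 7*100 + 5*60 + 4*(-50) = 800
val firstRowTimesB: Array[Int] = dotProduct(firstRow, localB)
// firstRowTimesB contains: 800, -740, 405, -735, 190, -250
```

On Spark, `A.map(...)` applies this same function to each of the six rows, with `B.value` taking the place of `localB`.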

eliasah commented Aug 13, 2015

Hello Sam! Why did you broadcast your B value?


Fokko commented Dec 15, 2015

You can also write: (0 until matrix(0).size), which is more readable.
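
A minimal sketch of that suggestion, assuming the same signature as the gist's `dotProduct`; the name `dotProductUntil` and the small test matrix are made up for illustration. `0 until n` excludes `n`, so it covers the same indices as `0 to (n - 1)`.

```scala
// Hypothetical variant of dotProduct that iterates columns with `until`
def dotProductUntil(vector: Array[Int], matrix: Array[Array[Int]]): Array[Int] =
  (0 until matrix(0).size).toArray.map { colIdx =>
    // Pair each vector entry with the matching entry of column colIdx, multiply, then sum
    (vector zip matrix.map(_(colIdx))).map { case (a, b) => a * b }.sum
  }

// Both ranges enumerate the indices 0, 1, 2
val sameIndices: Boolean = (0 until 3).toList == (0 to (3 - 1)).toList

// Small illustrative input: column 0 gives 1*3 + 2*5 = 13, column 1 gives 1*4 + 2*6 = 16
val small: Array[Int] = dotProductUntil(Array(1, 2), Array(Array(3, 4), Array(5, 6)))
```

`matrix(0).indices` would be another way to express the same range.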

@emhacker

If B is not ginormous, it's actually a good idea to broadcast it; performance should then scale linearly with the number of workers.
