Assume you have an indexed Spark RDD of N images and you would like to select pairs sparsely, i.e. the number of selected pairs is much smaller than the number of all possible pairs, N^2. One way to do this is to take the Cartesian product of the RDD with itself and then filter the pairs by some criterion, e.g. by limiting the difference between the pairwise indices:
```
selection = images
    .cartesian(images)
    .filter(/* some criterion */);
```
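To make the cost of this pattern concrete, here is a minimal plain-Java sketch with no Spark dependency. It imitates `cartesian` followed by `filter` on an in-memory list of indices 0..N-1, using |i - j| <= maxDiff as the hypothetical criterion. It counts how many candidate pairs are materialized and how many survive the filter. The class, record, and method names are illustrative assumptions. Nothing here is the Spark API.

```java
import java.util.ArrayList;
import java.util.List;

public class CartesianFilterSketch {
    // A pair of indices (i, j), standing in for a pair of indexed images.
    record Pair(int i, int j) {}

    // Number of materialized candidate pairs plus the pairs kept by the filter.
    record Result(long candidates, List<Pair> selected) {}

    // Hypothetical stand-in for images.cartesian(images).filter(|i - j| <= maxDiff):
    // first builds all n * n pairs, then keeps only those close in index.
    static Result cartesianThenFilter(int n, int maxDiff) {
        List<Pair> cartesian = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                cartesian.add(new Pair(i, j)); // the full N^2 intermediate collection
            }
        }
        List<Pair> selected = new ArrayList<>();
        for (Pair p : cartesian) {
            if (Math.abs(p.i() - p.j()) <= maxDiff) {
                selected.add(p);
            }
        }
        return new Result(cartesian.size(), selected);
    }

    public static void main(String[] args) {
        Result r = cartesianThenFilter(1000, 2);
        System.out.println("candidates = " + r.candidates() + ", selected = " + r.selected().size());
    }
}
```

With N = 1000 and maxDiff = 2 this prints `candidates = 1000000, selected = 4994`. All N^2 pairs are built even though fewer than 0.5% of them are kept.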
However, this creates an intermediate RDD with N^2 elements, and building it requires a lot of cross-talk between