
@igponce
Created June 23, 2015 13:32
Stupid way to subtract a Spark RDD from another using a left outer join
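# Assumes `sc` is an existing SparkContext (e.g. the one the pyspark shell provides).
aaa = sc.parallelize([1, 2, 3, 4, 5, 6, 7, 8, 9])
bbb = sc.parallelize([1, 3])

# Show the raw join: keys missing from bbb get None as the right-hand value.
print(aaa.map(lambda k: (k, k))
         .leftOuterJoin(bbb.map(lambda k: (k, k)))
         .collect())

# Keep only the keys with no match in bbb, i.e. aaa minus bbb.
print(aaa.map(lambda k: (k, k))
         .leftOuterJoin(bbb.map(lambda k: (k, k)))
         .filter(lambda kv: kv[1][1] is None)
         .map(lambda kv: kv[0])
         .collect())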
@igponce
Author

igponce commented Jun 23, 2015

The first print() is really important for realizing how ugly what we're doing here is.
=sob=
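
To make the intermediate result of that first print() concrete without a cluster, here is a minimal sketch in plain Python that mimics what a left outer join produces for these two lists. The helper `left_outer_join` is hypothetical (it is not a PySpark function) and works on ordinary lists; on a real RDD, collect() after a join gives no ordering guarantee.

```python
# Plain-Python model of the join-then-filter trick; no Spark needed.
# left_outer_join is a hypothetical helper, not part of PySpark.

def left_outer_join(left, right):
    """Pair each (key, value) in left with every matching value in right,
    or with None when the key has no match."""
    right_by_key = {}
    for key, value in right:
        right_by_key.setdefault(key, []).append(value)
    joined = []
    for key, value in left:
        for match in right_by_key.get(key, [None]):
            joined.append((key, (value, match)))
    return joined

aaa = [1, 2, 3, 4, 5, 6, 7, 8, 9]
bbb = [1, 3]

# Every element becomes a (k, k) pair, exactly as in the gist.
joined = left_outer_join([(k, k) for k in aaa], [(k, k) for k in bbb])
print(joined)  # keys 1 and 3 carry a match, all others carry None

# The subtraction is just "keep the keys whose right-hand side is None".
difference = [key for key, (_, match) in joined if match is None]
print(difference)
```

PySpark also ships `RDD.subtract`, so `aaa.subtract(bbb).collect()` should return the same elements without the manual join, though not necessarily in this order.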
