@felipecruz
Last active November 3, 2015 22:50
r1 = (u'1', 'a1')
r2 = (u'1', 'a2')
r3 = (u'2', 'a3')
r4 = (u'2', 'a4')
r5 = (u'2', 'a5')
r6 = (u'2', 'a6')
r7 = (u'4', 'a7')
r8 = (u'4', 'a8')
r9 = (u'3', 'a9')
# `sc` is assumed to be an existing SparkContext (e.g. the one the pyspark shell provides)
data = sc.parallelize([r1, r2, r3, r4, r5, r6, r7, r8, r9])
nome = '1'
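# RDD.filter is lazy: nothing is evaluated until an action such as count() runs
ddata_with_nome = data.filter(lambda x: x[0] == nome)
c11 = ddata_with_nome.count()
nome = '2'
# The lambda reads the global `nome` when the action runs, so this count now matches '2'
c12 = ddata_with_nome.count()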
nome = '1'
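# Materialize with list() so len() below also works on Python 3, where filter() returns an iterator
ddata_with_nome = list(filter(lambda x: x[0] == nome, [r1, r2, r3, r4, r5, r6, r7, r8, r9]))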
c21 = len(ddata_with_nome)
nome = '2'
c22 = len(ddata_with_nome)
print((c11, c12, c21, c22))
# results in (2, 4, 2, 2)
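The same effect can be reproduced without Spark. The sketch below is only an illustration, not part of the original gist: `rows`, `late`, `bound`, and `count` are hypothetical names, and `count` merely stands in for a deferred action like `RDD.count()`. It contrasts a predicate that looks up `nome` each time it runs with one that captures the value through a default argument, which is one common way to pin the value at definition time.

```python
rows = [("1", "a1"), ("1", "a2"), ("2", "a3"), ("2", "a4"), ("2", "a5"), ("2", "a6")]

nome = "1"
# Late binding: `nome` is looked up every time the predicate runs.
late = lambda row: row[0] == nome
# Early binding: the default argument captures the current value, "1".
bound = lambda row, nome=nome: row[0] == nome

def count(predicate):
    # Hypothetical stand-in for a lazy action: the work happens only when called.
    return sum(1 for row in rows if predicate(row))

before = (count(late), count(bound))
nome = "2"
after = (count(late), count(bound))
print(before, after)  # (2, 2) (4, 2)
```

Applied to the Spark snippet, writing the filter as `lambda x, nome=nome: x[0] == nome` would presumably keep `c12` at 2, though that variant was not run as part of this gist.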