Stefano Parmesan armisael
armisael / generate.py
Last active August 29, 2015 14:18
pyspark.sql nondeterministic issue with row fields
import json
import random
N = 50000
with open('data/sample_a.json', 'w') as f_sample_a, open('data/sample_b.json', 'w') as f_sample_b:
    for i in xrange(N):
        r = random.random()
        if r >= 0.1:
            f_sample_a.write(json.dumps(dict(
                key=str(i),