@leechanwoo
Created October 4, 2018 14:37
# Python for loop: preprocess and write one record at a time
for d in Dataset:
    d = preprocessing(d)
    write_tfrecord(d)
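
# Python map function
# In Python 3, map() returns a lazy iterator, so each record is
# preprocessed only when the loop below pulls it.
itr = map(preprocessing, Dataset)
for i in itr:
    write_tfrecord(i)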
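
# PySpark (sc is an existing SparkContext)
# The chained calls are wrapped in parentheses so they parse as one expression.
# Note: toLocalIterator() returns a plain iterator on the driver, not an RDD.
rdd = (
    sc.parallelize(Dataset)
    .map(preprocessing)
    .toLocalIterator()
)
for r in rdd:
    write_tfrecord(r)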
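
# Runnable sketch of the first two variants
# The snippets above leave Dataset, preprocessing, and write_tfrecord undefined.
# The block below is a minimal sketch with hypothetical stand-ins for all three
# (a small list of strings, a trim-and-lowercase transform, and a writer that
# appends to a list instead of producing a real TFRecord file). The extra `sink`
# argument is an assumption added for the demo. It only illustrates that the
# for-loop and map() variants write the same records in the same order.
# The PySpark variant is left out because it needs a running SparkContext.

```python
# Hypothetical stand-in for Dataset: the gist does not define it.
dataset = ["  Cat ", "DOG", " bird"]


def preprocessing(record):
    """Assumed transform for the demo: trim whitespace and lowercase."""
    return record.strip().lower()


def write_tfrecord(record, sink):
    """Stand-in writer: appends to a list rather than writing a TFRecord."""
    sink.append(record)


# Variant 1: explicit for loop.
loop_out = []
for d in dataset:
    write_tfrecord(preprocessing(d), loop_out)

# Variant 2: map() yields preprocessed records lazily as the loop consumes them.
map_out = []
for item in map(preprocessing, dataset):
    write_tfrecord(item, map_out)

# Both variants should have written identical records.
print(loop_out == map_out)
```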