@gangliao
Last active March 6, 2020 09:06
Using TensorFlow's tf.data to load data from HDFS
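import tensorflow as tf

# HDFS URI of the form hdfs://<namenode host>:<port>/<path>
filenames = ["hdfs://10.152.104.73:8020/sogou/train_data/1_final.feature_transform"]

# Each dataset element is one line of the file (TF 1.x graph-mode API).
dataset = tf.data.TextLineDataset(filenames)
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

with tf.Session() as sess:
    while True:
        try:
            # sess.run returns the line as bytes; decode it before printing.
            print(sess.run(next_batch).decode())
        except tf.errors.OutOfRangeError:
            # Raised once every line has been read.
            break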
@gangliao
Author

output:

<Nnet>
<Splice> 280 40
[ -3 -2 -1 0 1 2 3 ]
<!EndOfComponent>
</Nnet>

@fabioprev

Hey gangliao, thanks for the code. I am having trouble reading data from HDFS with TensorFlow 1.11.0 on Windows 10.
When I run your code, I get the following error:
*** File system scheme 'hdfs' not implemented ***

Which version of TensorFlow, and which OS, are you using to make it work?

Thanks,
Fabio

@amithbk12man

You need to set the HDFS-related paths, or install a Hadoop distribution that provides libhdfs.so. Please check https://www.tensorflow.org/deploy/hadoop
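
As a rough illustration of that advice, the sketch below checks whether the environment a TensorFlow process would inherit looks ready for HDFS access. It assumes the four variables named in TensorFlow's Hadoop guide for 1.x (`JAVA_HOME`, `HADOOP_HDFS_HOME`, `LD_LIBRARY_PATH`, `CLASSPATH`) and a Linux-style layout with `libhdfs.so` under `$HADOOP_HDFS_HOME/lib/native`. The helper names `missing_hadoop_env` and `find_libhdfs` are hypothetical and not part of TensorFlow; treat this as a diagnostic aid, not a definitive setup check.

```python
import os

# Variables named in TensorFlow's Hadoop guide (assumed here; check the
# guide for your TensorFlow version).
REQUIRED_VARS = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")


def missing_hadoop_env(env=None):
    """Return the required variables that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def find_libhdfs(env=None):
    """Return the first libhdfs.so found in the assumed locations, or None."""
    env = os.environ if env is None else env
    search_dirs = []
    if env.get("HADOOP_HDFS_HOME"):
        # Assumed default install location inside the Hadoop distribution.
        search_dirs.append(os.path.join(env["HADOOP_HDFS_HOME"], "lib", "native"))
    # Also look in every non-empty LD_LIBRARY_PATH entry.
    search_dirs.extend(
        d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d
    )
    for directory in search_dirs:
        candidate = os.path.join(directory, "libhdfs.so")
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    print("missing variables:", missing_hadoop_env())
    print("libhdfs.so found at:", find_libhdfs())
```

Run it in the same shell that will launch the TensorFlow script. A non-empty list of missing variables, or no `libhdfs.so` found, is one plausible reason for the "File system scheme 'hdfs' not implemented" error, though a TensorFlow build without HDFS support would produce the same message.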
