Created October 4, 2020 05:37

ssh -i emr.pem


  1. atta@atta-vmware:~$ ssh -i emr.pem -N -L
  2. folder "data" contains single folder "texts"



  1. hdfs dfs -ls /data/texts
  2. hdfs dfs -ls -h /data/texts
  3. actual

-mkdir & -touchz

  1. hdfs dfs -mkdir /godyaev
  2. hdfs dfs -mkdir /godyaev/inner
  3. Every HDFS user has their own .Trash folder on HDFS within hdfs:///user/. The folder's existence is checked, and the folder is created if missing, whenever that user runs hadoop fs -rm without the -skipTrash option.

The trash is purged on a schedule set by the fs.trash.interval and fs.trash.checkpoint.interval values in core-site.xml.

In stock Apache Hadoop both default to zero, which disables the trash feature, so -rm deletes files immediately instead of moving them to .Trash. When the interval is set to a positive number of minutes, deleted files stay recoverable until their trash checkpoint expires.
  4. hdfs dfs -touchz /godyaev/inner/1.txt
  5. hdfs dfs -rm -skipTrash /godyaev/inner/1.txt
  6. hdfs dfs -rm -r -skipTrash /godyaev
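
Whether -rm actually uses .Trash depends on the trash interval the cluster runs with, so it is worth checking before relying on recovery. A minimal sketch, assuming the stock Apache Hadoop property name fs.trash.interval and its documented meaning (zero disables trash); the helper name trash_status is hypothetical:

```shell
# Hypothetical helper: explain what a trash interval (in minutes) means.
# Assumption: 0 disables the trash feature, as in stock Apache Hadoop.
trash_status() {
  case "$1" in
    ''|*[!0-9]*)
      # Empty or non-numeric input: refuse to guess.
      echo "unknown trash interval: '$1'"
      return 1
      ;;
  esac
  if [ "$1" -gt 0 ]; then
    echo "trash enabled: deleted files are kept for about $1 minutes"
  else
    echo "trash disabled: -rm deletes files immediately"
  fi
}

# On a cluster node (not run here):
#   trash_status "$(hdfs getconf -confKey fs.trash.interval)"
```

A managed distribution may ship a different value than the stock default, which is why the sketch reads the live configuration instead of assuming one.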

-put & -cat & -tail & -distcp

  1. hadoop distcp s3:///texts-bucket/henry.txt hdfs://
  2. hdfs dfs -cat /godyaev/henry.txt
  3. hdfs dfs -tail /godyaev/henry.txt
  4. hdfs dfs -head /godyaev/henry.txt is available only since Hadoop 3.1.0, so hdfs dfs -cat /godyaev/henry.txt | head was used instead
  5. hdfs dfs -mv /godyaev/henry.txt /godyaev/inner/henry.txt
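
The cat-and-pipe workaround from step 4 can be wrapped in small functions so it is easy to reuse. This is only a sketch; the function names hdfs_first_lines and hdfs_first_bytes are hypothetical, and it assumes the hdfs client is on PATH:

```shell
# Hypothetical helpers for peeking at the start of an HDFS file on
# releases where `hdfs dfs -head` is not available.

# Print the first N lines (default 10) of the file at HDFS path $1.
hdfs_first_lines() {
  hdfs dfs -cat "$1" | head -n "${2:-10}"
}

# Print the first N bytes (default 1024) of the file at HDFS path $1.
hdfs_first_bytes() {
  hdfs dfs -cat "$1" | head -c "${2:-1024}"
}

# On a cluster node (not run here):
#   hdfs_first_lines /godyaev/henry.txt 5
```

Since -tail and -head work on a kilobyte of the file rather than on lines, the byte-based variant is the closer match to -head, while the line-based one matches the plain | head used above.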


  1. hdfs dfs -setrep -w 2 /godyaev/inner/henry.txt. With -w the command waits for replication to finish, so it takes a long time, at least as long as copying the file.
  2. hdfs fsck /godyaev/inner/henry.txt -files -blocks -locations
  3. hdfs fsck -blockId blk_1073753298
    GS = 96852
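
Steps 2 and 3 can be chained: the block IDs reported for a file are fed back into fsck -blockId. A rough sketch, assuming fsck prints each block as blk_<id>_<generation stamp> (the GS value above); the helper name extract_block_ids is hypothetical:

```shell
# Hypothetical helper: read `hdfs fsck <path> -files -blocks` output on
# stdin and print each block ID once, without the generation stamp,
# in the form passed to `hdfs fsck -blockId` in step 3.
# Assumption: blocks appear as blk_<id>_<generation stamp>.
extract_block_ids() {
  grep -oE 'blk_-?[0-9]+' | sort -u
}

# On a cluster node (not run here):
#   hdfs fsck /godyaev/inner/henry.txt -files -blocks \
#     | extract_block_ids \
#     | xargs -n 1 hdfs fsck -blockId
```

The pattern stops at the second underscore, so the generation stamp is dropped and only the blk_<id> part remains.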