@attatrol
Created October 4, 2020 05:37

ssh -i emr.pem hadoop@ec2-3-249-21-2.eu-west-1.compute.amazonaws.com

Beginner

  1. atta@atta-vmware:~$ ssh -i emr.pem -N -L 9999:ec2-3-249-21-2.eu-west-1.compute.amazonaws.com:50070 hadoop@ec2-3-249-21-2.eu-west-1.compute.amazonaws.com
  2. The folder "data" contains a single folder, "texts".
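With the tunnel from step 1 open, the NameNode web port is reachable on localhost:9999, so the listing in step 2 could also be fetched over WebHDFS. This is only a sketch: it assumes WebHDFS is enabled on the cluster, and the helper name webhdfs_list_url is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a WebHDFS LISTSTATUS URL for a local tunnel port
# and an absolute HDFS path.
webhdfs_list_url() {
  local port="$1" path="$2"
  echo "http://localhost:${port}/webhdfs/v1${path}?op=LISTSTATUS"
}

# Print the URL for the /data folder behind the tunnel on port 9999.
webhdfs_list_url 9999 /data

# With the tunnel running (assumption: WebHDFS is enabled), the listing
# would come back as JSON:
# curl -s "$(webhdfs_list_url 9999 /data)"
```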

Intermediate

-ls

  1. hdfs dfs -ls /data/texts
  2. hdfs dfs -ls -h /data/texts
  3. actual

-mkdir & -touchz

  1. hdfs dfs -mkdir /godyaev
  2. hdfs dfs -mkdir /godyaev/inner
  3. Every HDFS user has their own .Trash folder on HDFS within hdfs:///user/. The folder's existence is checked, and the folder is created if missing, whenever that user runs hadoop fs -rm without the -skipTrash option.

The trash is purged on a schedule set by two values in core-site.xml:

fs.trash.interval
fs.trash.checkpoint.interval

By default, fs.trash.interval is zero, which disables the trash feature: deleted files are removed immediately and cannot be restored from .Trash. When the interval is set to a positive number of minutes, deleted files stay recoverable until their trash checkpoint expires.
  4. hdfs dfs -touchz /godyaev/inner/1.txt
  5. hdfs dfs -rm -skipTrash /godyaev/inner/1.txt
  6. hdfs dfs -rm -r -skipTrash /godyaev
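The trash setting mentioned in item 3 can be read on a cluster node with hdfs getconf. The sketch below is one possible way to interpret the value. The function name describe_trash_interval is hypothetical, and the fallback to 0 when hdfs is not installed is an assumption made only so the script runs anywhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: explain what an fs.trash.interval value (minutes) means.
describe_trash_interval() {
  local minutes="$1"
  if [ "$minutes" -eq 0 ]; then
    echo "trash disabled: deletes are immediate"
  else
    echo "trash enabled: checkpoints kept for ${minutes} minutes"
  fi
}

# On a cluster node, read the effective value; elsewhere fall back to 0
# (assumption: 0 stands in for the stock default when hdfs is unavailable).
if command -v hdfs >/dev/null 2>&1; then
  interval="$(hdfs getconf -confKey fs.trash.interval 2>/dev/null || echo 0)"
else
  interval=0
fi

describe_trash_interval "$interval"
```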

-put & -cat & -tail & -distcp

  1. hadoop distcp s3:///texts-bucket/henry.txt hdfs://ip-172-31-13-51.eu-west-1.compute.internal:8020/godyaev/henry.txt
  2. hdfs dfs -cat /godyaev/henry.txt
  3. hdfs dfs -tail /godyaev/henry.txt
  4. hdfs dfs -head /godyaev/henry.txt is available since Hadoop 3.1.0; here hdfs dfs -cat /godyaev/henry.txt | head was used instead.
  5. hdfs dfs -mv /godyaev/henry.txt /godyaev/inner/henry.txt
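The -cat | head workaround in step 4 prints the first ten lines, whereas -tail (and -head, where available) works on a kilobyte of the file. A closer stand-in, sketched below, cuts the stream at 1024 bytes. The name first_kb is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only the first 1024 bytes of stdin,
# approximating the first kilobyte that -head would show.
first_kb() {
  head -c 1024
}

# Local demonstration that runs without a cluster.
printf 'hello\n' | first_kb

# On a cluster node (path taken from the steps above):
# hdfs dfs -cat /godyaev/henry.txt | first_kb
```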

Advanced

  1. hdfs dfs -setrep -w 2 /godyaev/inner/henry.txt. This takes a long time, at least as long as copying the file.
  2. hdfs fsck /godyaev/inner/henry.txt -files -blocks -locations
  3. hdfs fsck -blockId blk_1073753298
    GS = 96852
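In fsck output, a block usually appears as a single name with the generation stamp appended after the block ID. Assuming the block in step 3 was printed as blk_1073753298_96852 (that combined form is an assumption, not copied from the original output), the two parts can be separated as in this sketch. The function name split_block_name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a block name of the form blk_<id>_<gs>
# into the block ID (usable with fsck -blockId) and the generation stamp.
split_block_name() {
  local name="$1"
  local gs="${name##*_}"   # text after the last underscore
  local id="${name%_*}"    # text before the last underscore
  echo "blockId=${id} GS=${gs}"
}

# Assumed combined form of the block ID and GS from the steps above.
split_block_name "blk_1073753298_96852"
```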