ssh -i emr.pem hadoop@ec2-3-249-21-2.eu-west-1.compute.amazonaws.com
Beginner
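- ssh -i emr.pem -N -L 9999:ec2-3-249-21-2.eu-west-1.compute.amazonaws.com:50070 hadoop@ec2-3-249-21-2.eu-west-1.compute.amazonaws.com (run on the local machine): forwards local port 9999 to the NameNode web UI on port 50070, which is then reachable at http://localhost:9999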
- the folder "data" contains a single folder, "texts"
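The -L option of the tunnel command maps local_port:target_host:target_port, which is easy to mix up. The helper below is a hypothetical sketch (the name emr_tunnel_cmd and its argument order are my own) that only prints the tunnel command instead of running ssh; it assumes the default hadoop login user on EMR, as in the commands above.

```shell
# Hypothetical helper: print the SSH command that forwards a local port
# to a web UI on the EMR master node. It only prints; it does not run ssh.
#   $1 - private key file (e.g. emr.pem)
#   $2 - master node public DNS name
#   $3 - remote port on the master (50070 = NameNode web UI in Hadoop 2.x)
#   $4 - local port to listen on (defaults to the remote port)
emr_tunnel_cmd() {
  local key="$1" host="$2" remote_port="$3" local_port="${4:-$3}"
  # -N: run no remote command, only forward; -L local_port:host:remote_port
  printf 'ssh -i %s -N -L %s:%s:%s hadoop@%s\n' \
    "$key" "$local_port" "$host" "$remote_port" "$host"
}
```

For example, emr_tunnel_cmd emr.pem ec2-3-249-21-2.eu-west-1.compute.amazonaws.com 50070 9999 prints the tunnel command used above; after running it, the NameNode web UI is at http://localhost:9999.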
Intermediate
-ls
- hdfs dfs -ls /data/texts
- hdfs dfs -ls -h /data/texts
- the sizes shown are the actual file sizes (not multiplied by the replication factor)
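Plain -ls (without -h) prints the replication factor in column 2 and the file length in bytes in column 5, so a short awk filter can contrast the actual (logical) size with the raw space taken by all replicas. This is a sketch: ls_sizes is a hypothetical name, it assumes the standard -ls column layout and plain replication (no erasure coding), and it counts regular files only.

```shell
# Sketch: total the sizes reported by `hdfs dfs -ls` (plain, without -h).
# Column 2 = replication factor ("-" for directories),
# column 5 = file length in bytes (the actual, logical size).
# Prints: <logical bytes> <raw bytes across all replicas>
ls_sizes() {
  awk '
    /^-/ { logical += $5; raw += $2 * $5 }   # regular files only
    END  { printf "%d %d\n", logical, raw }
  '
}
```

Usage: hdfs dfs -ls /data/texts | ls_sizes. The "Found N items" header and directory lines (starting with "d") are skipped by the /^-/ pattern.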
-mkdir & -touchz
- hdfs dfs -mkdir /godyaev
- hdfs dfs -mkdir /godyaev/inner
- Every HDFS user has their own .Trash folder in HDFS under hdfs:///user/<username>/. When the trash is enabled, the folder's existence is checked, and it is created if missing, whenever that user runs hadoop fs -rm without the -skipTrash option.
The trash is purged on a schedule set by these core-site.xml properties:
fs.trash.interval
fs.trash.checkpoint.interval
By default both are 0. A zero fs.trash.interval disables the trash, so -rm deletes files immediately and they cannot be recovered; a zero fs.trash.checkpoint.interval simply falls back to the value of fs.trash.interval.
- hdfs dfs -touchz /godyaev/inner/1.txt
- hdfs dfs -rm -skipTrash /godyaev/inner/1.txt
- hdfs dfs -rm -r -skipTrash /godyaev
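When the trash is enabled (fs.trash.interval set to a positive number of minutes), -rm without -skipTrash moves a file under the user's .Trash/Current directory with its original absolute path appended, and it can be moved back with -mv until the next checkpoint renames Current. The helpers below are a hypothetical sketch of that recovery; they assume the default trash policy and a path outside any encryption zone, and the function names are my own.

```shell
# Sketch: where `hdfs dfs -rm` (trash enabled, no -skipTrash) moves a file.
# Assumes the default trash policy:
#   /user/<user>/.Trash/Current/<original absolute path>
trash_path() {
  local user="$1" path="$2"
  # strip the leading "/" so the original path nests under Current/
  printf '/user/%s/.Trash/Current/%s\n' "$user" "${path#/}"
}

# Move a trashed file back to where it was (runs a real hdfs command).
restore_from_trash() {
  local user="$1" path="$2"
  hdfs dfs -mv "$(trash_path "$user" "$path")" "$path"
}
```

For example, restore_from_trash hadoop /godyaev/inner/1.txt would move /user/hadoop/.Trash/Current/godyaev/inner/1.txt back, provided the file was removed without -skipTrash. The -mv will likely fail if the parent directory no longer exists, so recreate it first with hdfs dfs -mkdir -p.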
-put & -cat & -tail & -distcp
- hadoop distcp s3://texts-bucket/henry.txt hdfs://ip-172-31-13-51.eu-west-1.compute.internal:8020/godyaev/henry.txt
- hdfs dfs -cat /godyaev/henry.txt
- hdfs dfs -tail /godyaev/henry.txt
- hdfs dfs -head /godyaev/henry.txt is only available since Hadoop 3.1.0, so here I used hdfs dfs -cat /godyaev/henry.txt | head instead
- hdfs dfs -mv /godyaev/henry.txt /godyaev/inner/henry.txt
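Note that -head prints the first kilobyte of a file, while piping -cat into head prints the first 10 lines, so the two are not identical. A small sketch (hdfs_head is a hypothetical name) that reproduces either behaviour with -cat on clusters whose hdfs CLI lacks -head:

```shell
# Hypothetical helper: emulate `hdfs dfs -head` using -cat.
#   hdfs_head PATH      -> first 1024 bytes, like -head
#   hdfs_head PATH N    -> first N lines, like `-cat PATH | head -n N`
hdfs_head() {
  local path="$1" lines="${2:-}"
  if [ -n "$lines" ]; then
    hdfs dfs -cat "$path" | head -n "$lines"
  else
    hdfs dfs -cat "$path" | head -c 1024
  fi
}
```

Usage: hdfs_head /godyaev/inner/henry.txt mirrors -head, and hdfs_head /godyaev/inner/henry.txt 5 prints the first five lines.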
Advanced
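- hdfs dfs -setrep -w 2 /godyaev/inner/henry.txt. With -w the command waits until the file actually reaches the new replication factor, so it takes a long time: when replicas have to be added, at least as long as copying the file.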
- hdfs fsck /godyaev/inner/henry.txt -files -blocks -locations
- hdfs fsck -blockId blk_1073753298
- GS (generation stamp) = 96852
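fsck -blocks prints each block name as blk_<blockId>_<generationStamp>; the generation stamp is a version number the NameNode bumps when a block is modified (for example by an append or pipeline recovery). Assuming the GS above was read from such a name (e.g. blk_1073753298_96852), the hypothetical helpers below split it so the ID can be passed to fsck -blockId. The sample fsck line in the test is illustrative, not real output from this cluster.

```shell
# Sketch: split an HDFS block name as printed by `hdfs fsck ... -blocks`
# (blk_<blockId>_<generationStamp>). The first block name found in $1 is used.
block_id() {
  # keep "blk_<blockId>" (fields 1-2 when split on "_")
  printf '%s\n' "$1" | grep -o 'blk_[0-9][0-9]*_[0-9][0-9]*' | head -n 1 | cut -d_ -f1,2
}
block_gs() {
  # keep only the generation stamp (field 3)
  printf '%s\n' "$1" | grep -o 'blk_[0-9][0-9]*_[0-9][0-9]*' | head -n 1 | cut -d_ -f3
}
```

Usage: with a block line from fsck stored in $line, hdfs fsck -blockId "$(block_id "$line")" queries that block, and block_gs "$line" prints its generation stamp.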