- namenode
- zookeeper (failover chain: application master -> YARN -> zookeeper takes responsibility)
- ambari
- shuffle and sort
- yarn (resource manager, comes with hadoop)
Note: log in to the sandbox as maria_dev (password is maria_dev)
ssh -p 2222 maria_dev@127.0.0.1
wget http://media.sundog-soft.com/hadoop/ml-100k/u.data
# To print hadoop version
[maria_dev@sandbox ~]$ hadoop version
Hadoop 2.7.3.2.6.1.0-129
Subversion git@github.com:hortonworks/hadoop.git -r 45e64533cdee3edf67c7b88a0267c64c194f93e5
Compiled by jenkins on 2017-05-31T03:06Z
Compiled with protoc 2.5.0
From source with checksum deba7ab784606611731cd7c37443e1c
This command was run using /usr/hdp/2.6.1.0-129/hadoop/hadoop-common-2.7.3.2.6.1.0-129.jar
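When a script needs only the version string, the first line of the output above can be parsed. A minimal sketch, assuming the first line always has the form `Hadoop <version>` as in this transcript; the function name `hadoop_version_string` is made up for illustration.

```shell
# Hypothetical helper (not part of Hadoop): read `hadoop version` output
# on stdin and print only the version string from the first line.
# Assumes the first line looks like "Hadoop <version>".
hadoop_version_string() {
    awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Usage on the sandbox (not executed here):
#   hadoop version | hadoop_version_string
```

If the first line has a different shape, the helper prints nothing, which a calling script can treat as "version unknown".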
# To create multiple hdfs directories and copy a local file into one of them
[maria_dev@sandbox ~]$ hadoop fs -mkdir ml-100k2 ml-500k
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal u.data ml-100k2/u.data
# To print directory count, file count, and content size in bytes
[maria_dev@sandbox ~]$ hadoop fs -count ml-100k2
1 5 9134343 ml-100k2
[maria_dev@sandbox ~]$ hadoop fs -count ml-100k/u.data
0 1 2079229 ml-100k/u.data
[maria_dev@sandbox ~]$ ls -ltr testdir/*
-rw-rw-r-- 1 maria_dev maria_dev 2079229 Nov 9 02:58 testdir/u.data
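The bare numbers printed by `hadoop fs -count` are easy to misread. A small sketch that labels them, assuming the default column order (directory count, file count, content size in bytes, path) and a path without spaces; `label_count` is a made-up name.

```shell
# Hypothetical helper: label the columns of default `hadoop fs -count`
# output read from stdin.
# Assumed column order: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME.
# Paths containing spaces are not handled.
label_count() {
    awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage on the sandbox (not executed here):
#   hadoop fs -count ml-100k2 | label_count
```

The helper does not apply to `-count -q`, which prints additional quota columns ahead of these four.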
# To copy from local file to hdfs
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal u.data ml-200k/u.data
[maria_dev@sandbox ~]$ hadoop fs -ls ml-200k/u.data
-rw-r--r-- 1 maria_dev hdfs 2079229 2017-11-09 03:32 ml-200k/u.data
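These notes never show `ml-200k` being created, and `-copyFromLocal` normally refuses to overwrite an existing destination. A hedged sketch of a wrapper that handles both cases, using only the `-mkdir -p` and `-copyFromLocal -f` flags that appear in the usage listing at the end of these notes; the name `hdfs_upload` is hypothetical.

```shell
# Hypothetical helper: upload a local file to an HDFS path.
#   $1 = local source file, $2 = HDFS destination file
# Creates the destination's parent directory first (-mkdir -p) and
# overwrites an existing destination file (-copyFromLocal -f).
hdfs_upload() {
    hadoop fs -mkdir -p "$(dirname "$2")" &&
        hadoop fs -copyFromLocal -f "$1" "$2"
}

# Usage on the sandbox (not executed here):
#   hdfs_upload u.data ml-200k/u.data
```

Drop `-f` if silently replacing an existing file is not wanted.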
# To copy from local directory to hdfs
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal testdir/* ml-500k/
[maria_dev@sandbox ~]$ hadoop fs -ls ml-500k
Found 1 items
-rw-r--r-- 1 maria_dev hdfs 2079229 2017-11-09 03:00 ml-500k/u.data
# To copy from hdfs to local file
[maria_dev@sandbox ~]$ hadoop fs -copyToLocal ml-100k/u.data testdir/u1.data
[maria_dev@sandbox ~]$ ls -ltr testdir/u1.data
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov 9 03:15 testdir/u1.data
# To copy from hdfs directory to local directory
[maria_dev@sandbox ~]$ hadoop fs -copyToLocal ml-200k/* testdir4
[maria_dev@sandbox ~]$ ls -ltr testdir4
total 4068
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov 9 03:42 u.data
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov 9 03:42 u.data2
# To print free space in human-readable format
[maria_dev@sandbox ~]$ hadoop fs -df -h hdfs:/
Filesystem Size Used Available Use%
hdfs://sandbox.hortonworks.com:8020 41.6 G 1.8 G 24.1 G 4%
# To print first 5 lines of hdfs file
[maria_dev@sandbox ~]$ hadoop fs -cat ml-100k2/u.data | head -5
0 50 5 881250949
0 172 5 881250949
0 133 1 881250949
196 242 3 881250949
186 302 3 891717742
cat: Unable to write to output stream.
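The final `cat: Unable to write to output stream.` line is not a data problem: `head` exits after five lines and closes the pipe while `hadoop fs -cat` is still writing. A minimal sketch of a wrapper that hides the message; `hdfs_head` is a made-up name, and discarding stderr is a trade-off, not a recommendation.

```shell
# Hypothetical helper: print the first N lines (default 5) of an HDFS file.
#   $1 = HDFS file, $2 = optional line count
# stderr from `hadoop fs -cat` is discarded, which hides the broken-pipe
# message but also hides genuine errors such as a missing file.
hdfs_head() {
    hadoop fs -cat "$1" 2>/dev/null | head -n "${2:-5}"
}

# Usage on the sandbox (not executed here):
#   hdfs_head ml-100k2/u.data
#   hdfs_head ml-100k2/u.data 20
```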
# To print the tail (last kilobyte) of hdfs file
[maria_dev@sandbox ~]$ hadoop fs -tail ml-100k/u.data.3
30 121 5 876250746
537 778 3 886031106
655 913 4 891817521
889 2 3 880182460
865 1009 5 880144368
851 979 3 875730244
# To copy one hdfs file to another (distcp runs a MapReduce job; hadoop fs -cp also works for a single file)
[maria_dev@sandbox ~]$ hadoop distcp ml-100k/u.data ml-100k/u.data.3
# To print space consumed by hdfs file
[maria_dev@sandbox ~]$ hadoop fs -du -s -h ml-100k/u.data
2.0 M ml-100k/u.data
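The `2.0 M` figure matches the 2079229 bytes reported by `-count` and `-ls` elsewhere in these notes if `M` is read as 1024 * 1024 bytes, which is an assumption here: 2079229 / 1048576 is about 1.98. A small sketch of that arithmetic, with `bytes_to_m` as a made-up name.

```shell
# Hypothetical helper: convert a byte count to "M" with one decimal place.
# Assumes 1 M = 1024 * 1024 bytes, which reproduces the figure shown
# by `hadoop fs -du -s -h` in this transcript.
bytes_to_m() {
    awk -v b="$1" 'BEGIN { printf "%.1f M\n", b / (1024 * 1024) }'
}

# bytes_to_m 2079229   prints "2.0 M"
```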
# To remove hdfs file
[maria_dev@sandbox ~]$ hadoop fs -rm ml-100k2/u.data
17/11/08 03:49:33 INFO fs.TrashPolicyDefault: Moved: 'hdfs://sandbox.hortonworks.com:8020/user/maria_dev/ml-100k2/u.data' to trash at: hdfs://sandbox.hortonworks.com:8020/user/maria_dev/.Trash/Current/user/maria_dev/ml-100k2/u.data
# To remove an empty hdfs directory (-rmdir fails if the directory still contains files)
[maria_dev@sandbox ~]$ hadoop fs -rmdir ml-100k2
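For a directory that still has contents, `-rm -r` from the usage listing at the end of these notes removes the directory and everything under it in one step. A cautious sketch, with `hdfs_rmtree` as a hypothetical name; on this sandbox the deleted data would presumably go to `.Trash`, as in the `-rm` output above.

```shell
# Hypothetical helper: remove an HDFS directory and everything under it.
#   $1 = HDFS directory
# Refuses an empty argument so that an unset variable in a calling
# script cannot turn into a delete with no target.
hdfs_rmtree() {
    if [ -z "$1" ]; then
        echo "usage: hdfs_rmtree <hdfs-dir>" >&2
        return 1
    fi
    hadoop fs -rm -r "$1"
}

# Usage on the sandbox (not executed here):
#   hdfs_rmtree ml-100k2
```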
# To list hdfs commands (run hadoop fs -help <cmd> for details on one command)
[maria_dev@sandbox ~]$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ...]
[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-find <path> ... <expression> ...]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getfattr [-R] {-n name | -d} [-e en] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] [-l] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setfattr {-n name [-v value] | -x name} <path>]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-truncate [-w] <length> <path> ...]
[-usage [cmd ...]]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|resourcemanager:port> specify a ResourceManager
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]