@spinupol
Last active July 19, 2020 20:29
Hadoop_HDFS

Keywords

  • namenode
  • zookeeper (application master -> YARN -> zookeeper takes responsibility)
  • ambari
  • shuffle and sort
  • yarn (resource manager, comes with hadoop)

Commands

Note: password is maria_dev

# To log in to the sandbox and download the sample data set
ssh -p 2222 maria_dev@127.0.0.1
wget http://media.sundog-soft.com/hadoop/ml-100k/u.data

# To print hadoop version
[maria_dev@sandbox ~]$ hadoop version
Hadoop 2.7.3.2.6.1.0-129
Subversion git@github.com:hortonworks/hadoop.git -r 45e64533cdee3edf67c7b88a0267c64c194f93e5
Compiled by jenkins on 2017-05-31T03:06Z
Compiled with protoc 2.5.0
From source with checksum deba7ab784606611731cd7c37443e1c
This command was run using /usr/hdp/2.6.1.0-129/hadoop/hadoop-common-2.7.3.2.6.1.0-129.jar

# To create multiple hdfs directories and copy a local file into one of them
[maria_dev@sandbox ~]$ hadoop fs -mkdir ml-100k2 ml-500k
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal u.data ml-100k2/u.data

# To print directory count, file count, and content size in bytes
[maria_dev@sandbox ~]$ hadoop fs -count ml-100k2
           1            5            9134343 ml-100k2
[maria_dev@sandbox ~]$ hadoop fs -count ml-100k/u.data
           0            1            2079229 ml-100k/u.data
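
The three unlabeled numbers in the -count output are easy to misread, so here is a minimal sketch that labels them. The function name hdfs_count_summary is hypothetical (not part of Hadoop), and it assumes the path contains no spaces, since awk splits on whitespace.

```shell
# Hypothetical helper: label the columns printed by `hadoop fs -count`.
# Columns are: directory count, file count, content size in bytes, path.
hdfs_count_summary() {
  # $1 = HDFS path to summarize
  hadoop fs -count "$1" |
    awk '{ printf "dirs=%s files=%s bytes=%s path=%s\n", $1, $2, $3, $4 }'
}

# Usage (needs a working hadoop client on PATH):
#   hdfs_count_summary ml-100k2
```

For the ml-100k2 line shown above, this would print dirs=1 files=5 bytes=9134343 path=ml-100k2.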
           
# To list the local copy for comparison (same size in bytes)
[maria_dev@sandbox ~]$ ls -ltr testdir/*
-rw-rw-r-- 1 maria_dev maria_dev 2079229 Nov  9 02:58 testdir/u.data

# To copy from local file to hdfs
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal u.data  ml-200k/u.data
[maria_dev@sandbox ~]$ hadoop fs -ls ml-200k/u.data
-rw-r--r--   1 maria_dev hdfs    2079229 2017-11-09 03:32 ml-200k/u.data

# To copy from local directory to hdfs
[maria_dev@sandbox ~]$ hadoop fs -copyFromLocal testdir/*  ml-500k/
[maria_dev@sandbox ~]$ hadoop fs -ls ml-500k
Found 1 items
-rw-r--r--   1 maria_dev hdfs    2079229 2017-11-09 03:00 ml-500k/u.data
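
The upload steps above can be combined into one function. This is only a sketch: hdfs_upload is a hypothetical name, and it assumes you want the target directory created if it is missing (-mkdir -p) and an existing file overwritten (-copyFromLocal -f). Both flags appear in the hadoop fs usage listing.

```shell
# Hypothetical helper: create an HDFS directory if needed, then upload into it.
hdfs_upload() {
  # $1 = local file or directory, $2 = HDFS destination directory
  if [ ! -e "$1" ]; then
    echo "local path not found: $1" >&2
    return 1
  fi
  # -p: no error if the directory already exists
  # -f: overwrite the destination file if it already exists
  hadoop fs -mkdir -p "$2" &&
    hadoop fs -copyFromLocal -f "$1" "$2/"
}

# Usage (needs a working hadoop client on PATH):
#   hdfs_upload u.data ml-500k
```

Drop the -f flag if you would rather have the upload fail than overwrite data.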

# To copy from hdfs to local file
[maria_dev@sandbox ~]$ hadoop fs -copyToLocal ml-100k/u.data testdir/u1.data
[maria_dev@sandbox ~]$ ls -ltr testdir/u1.data
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov  9 03:15 testdir/u1.data

# To copy from hdfs directory to local directory
[maria_dev@sandbox ~]$ hadoop fs -copyToLocal ml-200k/*  testdir4
[maria_dev@sandbox ~]$ ls -ltr testdir4
total 4068
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov  9 03:42 u.data
-rw-r--r-- 1 maria_dev maria_dev 2079229 Nov  9 03:42 u.data2

# To print free space in human-readable format
[maria_dev@sandbox ~]$ hadoop fs -df -h hdfs:/
Filesystem                             Size   Used  Available  Use%
hdfs://sandbox.hortonworks.com:8020  41.6 G  1.8 G     24.1 G    4%

# To print first 5 lines of hdfs file
# (the trailing "Unable to write" message is harmless: head closes the pipe after 5 lines)
[maria_dev@sandbox ~]$ hadoop fs -cat  ml-100k2/u.data | head -5
0	50	5	881250949
0	172	5	881250949
0	133	1	881250949
196	242	3	881250949
186	302	3	891717742
cat: Unable to write to output stream.
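
If the "Unable to write to output stream" message is unwanted, one option is to discard stderr from -cat. A minimal sketch follows; hdfs_head is a hypothetical name, and note the trade-off that real errors (for example a missing file) are hidden too.

```shell
# Hypothetical helper: print the first N lines of an HDFS file quietly.
hdfs_head() {
  # $1 = HDFS path, $2 = number of lines (defaults to 5)
  # 2>/dev/null drops the message -cat prints when head closes the pipe early;
  # it also hides genuine errors, so use it only on paths known to exist.
  hadoop fs -cat "$1" 2>/dev/null | head -n "${2:-5}"
}

# Usage (needs a working hadoop client on PATH):
#   hdfs_head ml-100k2/u.data
#   hdfs_head ml-100k2/u.data 20
```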

# To print the 'tail' (last kilobyte) of hdfs file
[maria_dev@sandbox ~]$ hadoop fs -tail ml-100k/u.data.3
30	121	5	876250746
537	778	3	886031106
655	913	4	891817521
889	2	3	880182460
865	1009	5	880144368
851	979	3	875730244

# To copy one hdfs file to another
# (distcp runs a MapReduce job; for a single file, hadoop fs -cp also works)
[maria_dev@sandbox ~]$ hadoop distcp ml-100k/u.data ml-100k/u.data.3

# To print space consumed by hdfs file
[maria_dev@sandbox ~]$ hadoop fs -du -s -h ml-100k/u.data
2.0 M  ml-100k/u.data

# To remove hdfs file
[maria_dev@sandbox ~]$ hadoop fs -rm ml-100k2/u.data
17/11/08 03:49:33 INFO fs.TrashPolicyDefault: Moved: 'hdfs://sandbox.hortonworks.com:8020/user/maria_dev/ml-100k2/u.data' to trash at: hdfs://sandbox.hortonworks.com:8020/user/maria_dev/.Trash/Current/user/maria_dev/ml-100k2/u.data

# To remove an empty hdfs directory (for a non-empty one, use hadoop fs -rm -r)
[maria_dev@sandbox ~]$ hadoop fs -rmdir ml-100k2
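
Because -rm and -rmdir behave differently for files, empty directories, and non-empty directories, a small wrapper can pick the right form. This is a sketch under assumptions: hdfs_remove is a hypothetical name, and it relies on -test -d and -test -e returning exit status 0 when the check succeeds.

```shell
# Hypothetical helper: delete an HDFS file or a directory with its contents.
hdfs_remove() {
  # $1 = HDFS path to delete
  if hadoop fs -test -d "$1"; then
    hadoop fs -rm -r "$1"   # directory: remove it recursively
  elif hadoop fs -test -e "$1"; then
    hadoop fs -rm "$1"      # plain file
  else
    echo "no such HDFS path: $1" >&2
    return 1
  fi
}

# Usage (needs a working hadoop client on PATH):
#   hdfs_remove ml-100k2
```

With trash enabled, as in the log line above, deleted paths are moved to .Trash rather than removed immediately; the -skipTrash flag from the usage listing bypasses that.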

# To list all hdfs commands (run with no arguments); use hadoop fs -help <cmd> for details on one command
[maria_dev@sandbox ~]$ hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] [-v] [-t [<storage type>]] [-u] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	[-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	[-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
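
As an illustration of the syntax above, a generic option such as -D goes before the command-specific options. The sketch below assumes the standard HDFS property dfs.replication; the function name hdfs_put_with_replication is hypothetical.

```shell
# Hypothetical helper: upload a file with a per-command replication factor.
hdfs_put_with_replication() {
  # $1 = replication factor, $2 = local file, $3 = HDFS destination
  # -D is a generic option, so it precedes the -copyFromLocal command options.
  hadoop fs -D dfs.replication="$1" -copyFromLocal "$2" "$3"
}

# Usage (needs a working hadoop client on PATH):
#   hdfs_put_with_replication 1 u.data ml-200k/u.data
```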