@wavescholar
Created September 7, 2014 20:18
Running the Hadoop grep example
hadoop fs -put /etc/hadoop/conf/*.xml input
[bcampbell@localhost ~]$ hadoop fs -ls input
Found 7 items
-rw-r--r-- 1 bcampbell supergroup 507105 2014-09-07 15:55 input/Milton_ParadiseLost.txt
-rw-r--r-- 1 bcampbell supergroup 246679 2014-09-07 15:55 input/WilliamYeats.txt
-rw-r--r-- 1 bcampbell supergroup 2133 2014-09-07 15:58 input/core-site.xml
-rw-r--r-- 1 bcampbell supergroup 2324 2014-09-07 15:58 input/hdfs-site.xml
-rw-r--r-- 1 bcampbell supergroup 246679 2014-09-07 15:56 input/inputWC
-rw-r--r-- 1 bcampbell supergroup 1549 2014-09-07 15:58 input/mapred-site.xml
-rw-r--r-- 1 bcampbell supergroup 2375 2014-09-07 15:58 input/yarn-site.xml
[bcampbell@localhost ~]$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar grep input output23 'dfs[a-z.]+'
14/09/07 16:00:07 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/09/07 16:00:07 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/09/07 16:00:07 INFO input.FileInputFormat: Total input paths to process : 7
14/09/07 16:00:08 INFO mapreduce.JobSubmitter: number of splits:7
14/09/07 16:00:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410054700839_0002
14/09/07 16:00:09 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/07 16:00:09 INFO impl.YarnClientImpl: Submitted application application_1410054700839_0002
14/09/07 16:00:09 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1410054700839_0002/
14/09/07 16:00:09 INFO mapreduce.Job: Running job: job_1410054700839_0002
14/09/07 16:00:18 INFO mapreduce.Job: Job job_1410054700839_0002 running in uber mode : false
14/09/07 16:00:18 INFO mapreduce.Job: map 0% reduce 0%
14/09/07 16:00:23 INFO mapreduce.Job: map 29% reduce 0%
14/09/07 16:00:24 INFO mapreduce.Job: map 43% reduce 0%
14/09/07 16:00:25 INFO mapreduce.Job: map 57% reduce 0%
14/09/07 16:00:26 INFO mapreduce.Job: map 100% reduce 0%
14/09/07 16:00:30 INFO mapreduce.Job: map 100% reduce 100%
14/09/07 16:00:30 INFO mapreduce.Job: Job job_1410054700839_0002 completed successfully
14/09/07 16:00:30 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=330
        FILE: Number of bytes written=740425
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1009700
        HDFS: Number of bytes written=470
        HDFS: Number of read operations=24
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=7
        Launched reduce tasks=1
        Data-local map tasks=7
        Total time spent by all maps in occupied slots (ms)=20069
        Total time spent by all reduces in occupied slots (ms)=3482
        Total time spent by all map tasks (ms)=20069
        Total time spent by all reduce tasks (ms)=3482
        Total vcore-seconds taken by all map tasks=20069
        Total vcore-seconds taken by all reduce tasks=3482
        Total megabyte-seconds taken by all map tasks=20550656
        Total megabyte-seconds taken by all reduce tasks=3565568
    Map-Reduce Framework
        Map input records=27113
        Map output records=10
        Map output bytes=304
        Map output materialized bytes=366
        Input split bytes=856
        Combine input records=10
        Combine output records=10
        Reduce input groups=10
        Reduce shuffle bytes=366
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =7
        Failed Shuffles=0
        Merged Map outputs=7
        GC time elapsed (ms)=323
        CPU time spent (ms)=6260
        Physical memory (bytes) snapshot=2039488512
        Virtual memory (bytes) snapshot=5680246784
        Total committed heap usage (bytes)=1610612736
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=1008844
    File Output Format Counters
        Bytes Written=470
14/09/07 16:00:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/09/07 16:00:30 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
14/09/07 16:00:30 INFO input.FileInputFormat: Total input paths to process : 1
14/09/07 16:00:30 INFO mapreduce.JobSubmitter: number of splits:1
14/09/07 16:00:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1410054700839_0003
14/09/07 16:00:30 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/09/07 16:00:30 INFO impl.YarnClientImpl: Submitted application application_1410054700839_0003
14/09/07 16:00:30 INFO mapreduce.Job: The url to track the job: http://localhost.localdomain:8088/proxy/application_1410054700839_0003/
14/09/07 16:00:30 INFO mapreduce.Job: Running job: job_1410054700839_0003
14/09/07 16:00:37 INFO mapreduce.Job: Job job_1410054700839_0003 running in uber mode : false
14/09/07 16:00:37 INFO mapreduce.Job: map 0% reduce 0%
14/09/07 16:00:43 INFO mapreduce.Job: map 100% reduce 0%
14/09/07 16:00:49 INFO mapreduce.Job: map 100% reduce 100%
14/09/07 16:00:50 INFO mapreduce.Job: Job job_1410054700839_0003 completed successfully
14/09/07 16:00:50 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=330
        FILE: Number of bytes written=184533
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=605
        HDFS: Number of bytes written=244
        HDFS: Number of read operations=7
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=3171
        Total time spent by all reduces in occupied slots (ms)=3435
        Total time spent by all map tasks (ms)=3171
        Total time spent by all reduce tasks (ms)=3435
        Total vcore-seconds taken by all map tasks=3171
        Total vcore-seconds taken by all reduce tasks=3435
        Total megabyte-seconds taken by all map tasks=3247104
        Total megabyte-seconds taken by all reduce tasks=3517440
    Map-Reduce Framework
        Map input records=10
        Map output records=10
        Map output bytes=304
        Map output materialized bytes=330
        Input split bytes=135
        Combine input records=0
        Combine output records=0
        Reduce input groups=1
        Reduce shuffle bytes=330
        Reduce input records=10
        Reduce output records=10
        Spilled Records=20
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=58
        CPU time spent (ms)=2140
        Physical memory (bytes) snapshot=431476736
        Virtual memory (bytes) snapshot=1437347840
        Total committed heap usage (bytes)=402653184
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=470
    File Output Format Counters
        Bytes Written=244
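The single `grep` command ran two MapReduce jobs back to back. The first (job_1410054700839_0002) scanned all 7 input files (27113 records) and found 10 matches; its counters show the combiner ran (`Combine input records=10`) and 10 distinct matches reached the reducer (`Reduce input groups=10`). The second (job_1410054700839_0003) read that intermediate result as its single input path and sorted it; it saw only one group (`Reduce input groups=1`), which fits every match having the same count of 1. As I understand the example's source, the first job pairs a regex mapper with a summing reducer, and the second swaps each (match, count) pair so the framework can sort by count, highest first. Below is a rough local equivalent built from standard Unix tools, useful as a cross-check. It is a sketch, not the example's implementation, and the function name `local_grep_count` is made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: local_grep_count REGEX FILE...
# Prints "count<TAB>match" lines, highest count first, ties broken alphabetically.
local_grep_count() {
  local regex=$1
  shift
  # Step 1 (like the first job): print every match on its own line, then count
  # identical matches. "|| true" keeps a run with no matches from failing.
  # LC_ALL=C makes [a-z] the plain ASCII range, as in a Java regex.
  # Step 2 (like the sort job): the final sort orders by count, descending.
  { LC_ALL=C grep -ohE -- "$regex" "$@" || true; } \
    | LC_ALL=C sort \
    | LC_ALL=C uniq -c \
    | awk '{ print $1 "\t" $2 }' \
    | LC_ALL=C sort -t $'\t' -k1,1nr -k2,2
}
```

On the node above, `local_grep_count 'dfs[a-z.]+' /etc/hadoop/conf/*.xml` should list the same ten keys, assuming those config files are unchanged since the upload. Tied keys may come out in a different order: this sketch breaks ties alphabetically, while the Hadoop run happened to list them in reverse alphabetical order.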
[bcampbell@localhost ~]$ hadoop fs -ls output23
Found 2 items
-rw-r--r-- 1 bcampbell supergroup 0 2014-09-07 16:00 output23/_SUCCESS
-rw-r--r-- 1 bcampbell supergroup 244 2014-09-07 16:00 output23/part-r-00000
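`output23` now holds a `_SUCCESS` marker and a single `part-r-00000` file, matching the one reduce task the sort job launched. Hadoop refuses to write into an output directory that already exists, so running the same `grep` command a second time fails until `output23` is removed or a new output name is chosen. A minimal re-run helper follows, assuming the Hadoop 2 `hadoop fs -rm` options `-r` (recursive) and `-f` (no error if the path is missing); `rerun_grep` is a hypothetical name and the jar path is the one from the transcript. As far as I know the example also accepts an optional fourth argument selecting a regex group to count; this sketch leaves it out.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Jar path taken from the transcript above; other installs may differ.
EXAMPLES_JAR=/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar

# Hypothetical helper: rerun_grep INPUT_DIR OUTPUT_DIR REGEX
rerun_grep() {
  local in=$1 out=$2 regex=$3
  # The job will not overwrite an existing output directory, so delete it
  # first; -f keeps -rm from failing when the directory does not exist yet.
  hadoop fs -rm -r -f "$out"
  hadoop jar "$EXAMPLES_JAR" grep "$in" "$out" "$regex"
}

# Example: rerun_grep input output23 'dfs[a-z.]+'
```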
[bcampbell@localhost ~]$ hadoop fs -cat output23/part-r-00000 | head
1 dfs.safemode.min.datanodes
1 dfs.safemode.extension
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.namenode.checkpoint.dir
1 dfs.domain.socket.path
1 dfs.datanode.hdfs
1 dfs.datanode.data.dir
1 dfs.client.read.shortcircuit
1 dfs.client.file
[bcampbell@localhost ~]$
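Every key in the result occurs once, and the ten lines printed by `head` are the whole result, since the sort job reported `Reduce output records=10`. A few entries, such as `dfs.datanode.hdfs` and `dfs.client.file`, look like prefixes of longer property names. That is probably the regex rather than the data: the class `[a-z.]` contains only lowercase ASCII letters and dots, so a match ends at the first hyphen, digit, or uppercase letter. The sketch below shows the effect on a made-up key (`dfs.example-key`) and one possible wider class, `dfs[a-z0-9.-]+`. Passing the wider pattern to the Hadoop job should keep hyphenated names whole, but that is untested here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Made-up property line for illustration; not taken from the cluster's config.
sample='<name>dfs.example-key</name>'

# Same class as the job above: stops at the hyphen.
narrow=$(printf '%s\n' "$sample" | LC_ALL=C grep -oE 'dfs[a-z.]+')
# Wider class that also accepts digits and hyphens ("-" last, so it is literal).
wide=$(printf '%s\n' "$sample" | LC_ALL=C grep -oE 'dfs[a-z0-9.-]+')

echo "narrow: $narrow"   # narrow: dfs.example
echo "wide:   $wide"     # wide:   dfs.example-key
```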