@ogibayashi
Created July 11, 2014 06:45
#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
## Parse a Hadoop job history log and print each successful task attempt as
## <attempt ID>,<start>,<end> (times in epoch seconds).
## Usage
##   parse_joblog.rb history/done/....
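task_attempts = {}
jobinfo = {}
while line = ARGF.gets
  # Timestamps in the log are epoch milliseconds; [0..-4] drops the last
  # three digits to get seconds.
  if /Job JOBID="(\w+)" LAUNCH_TIME="(\d+)"/ =~ line
    jobinfo["jobid"] = $1
    jobinfo["launch_time"] = $2[0..-4].to_i
  end
  if /Job JOBID="(\w+)" FINISH_TIME="(\d+)"/ =~ line
    jobinfo["finish_time"] = $2[0..-4].to_i
  end
  if /TASK_ATTEMPT_ID="(.*?)".*START_TIME="(\d+?)"/ =~ line
    task_attempts[$1] = $2[0..-4]
  elsif /TASK_ATTEMPT_ID="(.*?)".*TASK_STATUS="SUCCESS".*FINISH_TIME="(\d+?)"/ =~ line
    # Skip attempts whose start record was never seen instead of failing on nil.
    task_attempts[$1] += ",#{$2[0..-4]}" if task_attempts.key?($1)
  end
end
# The job summary goes to STDERR so that STDOUT carries only the per-attempt CSV.
if jobinfo["launch_time"] && jobinfo["finish_time"]
  STDERR.puts "#{ARGF.filename},#{jobinfo['jobid']},#{jobinfo['finish_time'] - jobinfo['launch_time']},#{Time.at(jobinfo['launch_time'])},#{Time.at(jobinfo['finish_time'])}"
else
  STDERR.puts "#{ARGF.filename}: job launch or finish record not found"
end
# Print only attempts that have both a start and a finish time.
task_attempts.each do |k, v|
  puts "#{k},#{v}" if v.include?(",")
end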
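
The attempt-matching step of the script can be tried on a few lines without a full history file. The sketch below reuses the script's two task-attempt regexes inside a standalone method. The names `parse_attempts` and `sample_lines` are hypothetical, and the sample records are made up for illustration; real history records carry more fields, and their layout may differ between Hadoop versions.

```ruby
# Hypothetical helper: applies the script's task-attempt regexes to an array
# of lines and returns { attempt_id => [start_sec, finish_sec] }.
def parse_attempts(lines)
  attempts = {}
  lines.each do |line|
    if /TASK_ATTEMPT_ID="(.*?)".*START_TIME="(\d+?)"/ =~ line
      # Drop the last three digits: epoch milliseconds -> seconds.
      attempts[$1] = [$2[0..-4].to_i]
    elsif /TASK_ATTEMPT_ID="(.*?)".*TASK_STATUS="SUCCESS".*FINISH_TIME="(\d+?)"/ =~ line
      # Ignore finish records that have no matching start record.
      attempts[$1] << $2[0..-4].to_i if attempts.key?($1)
    end
  end
  # Keep only attempts with both a start and a finish time.
  attempts.select { |_id, times| times.size == 2 }
end

# Made-up sample records (assumed shape, not copied from a real log).
def sample_lines
  [
    'MapAttempt TASK_ATTEMPT_ID="attempt_0001_m_000000_0" START_TIME="1405061100123" .',
    'MapAttempt TASK_ATTEMPT_ID="attempt_0001_m_000000_0" TASK_STATUS="SUCCESS" FINISH_TIME="1405061105456" .',
    'MapAttempt TASK_ATTEMPT_ID="attempt_0001_m_000001_0" START_TIME="1405061101000" .',
    'MapAttempt TASK_ATTEMPT_ID="attempt_0001_m_000002_0" TASK_STATUS="SUCCESS" FINISH_TIME="1405061109000" .'
  ]
end

parse_attempts(sample_lines).each do |id, (start_sec, finish_sec)|
  puts "#{id},#{start_sec},#{finish_sec}"
end
# prints: attempt_0001_m_000000_0,1405061100,1405061105
```

The second attempt never finishes successfully and the third has no start record, so both are left out, which is how the script treats incomplete attempts.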