@miguno
Created January 23, 2013 12:48
Re: Hadoop counter visualization

The only open source visualization project that comes to my mind right now is Twitter Ambrose. You might want to have a look at Ambrose first. It supports the following features in its web UI:

[Ambrose web UI features]

  • A table view of all the associated jobs, along with their current state
  • Chord and graph diagrams to visualize job dependencies and current state
  • An overall script progress bar

Apart from that, my personal experience has been with offerings from commercial vendors. To name but two of them: Cloudera Manager and MapR's Dashboard.

Both products come with an API that allows you to extend them and integrate them with your own Ops tool set. Cloudera Manager requires an evaluation license whereas MapR's Dashboard is available in the free M3 distribution if you want to give it a spin. As usual there are pros and cons for each of them.

That said, you can also configure standard Hadoop to send its metrics to a monitoring tool such as Ganglia (see live demo at UC Berkeley Grid). Basically, you just dump metrics into Ganglia, and the latter will take care of the visualization/plotting of the various metrics. There are several online guides available that describe how to configure Ganglia for a small Hadoop cluster. If you are running Hadoop 2.x, have a look at What is Hadoop Metrics2 for how the metrics system in next-gen Hadoop works in general.
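To give a rough idea of what the Ganglia setup looks like, here is a minimal sketch of a `hadoop-metrics2.properties` fragment. It assumes a Hadoop version that uses the Metrics2 system and a Ganglia 3.1+ installation; the host name `gmond.example.com` is a placeholder, and the period is just an example value. Please check the template file shipped with your distribution before relying on it.

```properties
# Use the Ganglia 3.1+ sink for all daemons (assumption: Ganglia >= 3.1).
*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31

# Push metrics every 10 seconds (example value).
*.sink.ganglia.period=10

# Where each daemon sends its metrics. Host name is a placeholder;
# 8649 is Ganglia's default port.
namenode.sink.ganglia.servers=gmond.example.com:8649
datanode.sink.ganglia.servers=gmond.example.com:8649
```

The daemons typically need a restart to pick up changes to this file.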

Finally, albeit a bit unrelated to your direct question, you can also write custom monitors by calling Hadoop's Java API. It is usually straightforward to write these custom monitors in a way that is compatible with other Ops infrastructure tools such as Nagios. For instance, one of our custom monitors connects to the JobTracker in order to detect any MapReduce jobs that run for longer than 24 hours (which in 99% of cases is a tell-tale sign that a job is broken one way or another). Depending on the tool you dump the metrics into, you will get visualizations/graphs for free (cf. Ganglia example above).
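To illustrate the idea, here is a minimal sketch of the decision logic of such a monitor. It is not our actual code. It deliberately leaves out the JobTracker connection: the `RunningJob` class is a hypothetical stand-in for whatever job ID and start time you fetch via Hadoop's Java API, and the job IDs in `main` are made up. The only outside convention it relies on is the Nagios plugin contract, i.e. one status line on stdout plus exit code 0 for OK and 2 for CRITICAL.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Sketch of a Nagios-style check for long-running MapReduce jobs. */
class LongRunningJobCheck {

    // Exit codes defined by the Nagios plugin convention.
    static final int OK = 0;
    static final int CRITICAL = 2;

    /** Hypothetical, minimal view of a running job: an ID and a start time. */
    static final class RunningJob {
        final String id;
        final Instant startTime;

        RunningJob(String id, Instant startTime) {
            this.id = id;
            this.startTime = startTime;
        }
    }

    /** Returns the IDs of all jobs that have been running for longer than maxRuntime. */
    static List<String> findOverdue(List<RunningJob> jobs, Instant now, Duration maxRuntime) {
        List<String> overdue = new ArrayList<>();
        for (RunningJob job : jobs) {
            Duration runtime = Duration.between(job.startTime, now);
            if (runtime.compareTo(maxRuntime) > 0) {
                overdue.add(job.id);
            }
        }
        return overdue;
    }

    /** Formats the single status line that Nagios shows for this check. */
    static String statusLine(List<String> overdue, Duration maxRuntime) {
        long hours = maxRuntime.toHours();
        if (overdue.isEmpty()) {
            return "OK - no jobs running longer than " + hours + "h";
        }
        return "CRITICAL - " + overdue.size() + " job(s) running longer than "
                + hours + "h: " + String.join(", ", overdue);
    }

    /** Maps the result to a Nagios exit code. */
    static int exitCode(List<String> overdue) {
        return overdue.isEmpty() ? OK : CRITICAL;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Made-up sample data; a real monitor would query the JobTracker here.
        List<RunningJob> jobs = Arrays.asList(
                new RunningJob("job_example_0001", now.minus(Duration.ofHours(30))),
                new RunningJob("job_example_0002", now.minus(Duration.ofHours(2))));

        Duration maxRuntime = Duration.ofHours(24);
        List<String> overdue = findOverdue(jobs, now, maxRuntime);

        System.out.println(statusLine(overdue, maxRuntime));
        // A real plugin would finish with System.exit(exitCode(overdue)).
        System.out.println("exit code would be " + exitCode(overdue));
    }
}
```

Keeping the threshold check separate from the code that talks to the cluster makes it easy to test the monitor without a running JobTracker, and to reuse the same logic for other Ops tools that expect a different output format.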

Hope this helps, Michael
