We need 3 things from our monitoring systems:
1. Log aggregation and analysis tools (for deep-dive information)
2. Data visualization tools (at-a-glance information, data correlation/causation, pattern identification, easier anomaly detection)
3. Non-trivial error reporting (to tell us when things are actually going wrong, e.g. rollups, multi-variable alerts, and alerts that include more data than 'I passed a threshold')
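As a rough illustration of the error-reporting requirement, here is a minimal Python sketch of a multi-variable alert with a rollup. Everything in it is hypothetical (the `Sample` and `Alert` types, the `evaluate` function, the thresholds); it is not taken from any of the tools named here. It only shows the idea: fire when two signals are bad together, send one alert for the whole group, and attach the offending values.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One reading per host (hypothetical shape, for illustration only)."""
    host: str
    error_rate: float   # fraction of failing requests, 0.0 to 1.0
    latency_ms: float   # request latency in milliseconds


@dataclass
class Alert:
    summary: str
    hosts: List[str]
    context: Dict[str, Dict[str, float]]  # the data behind the alert


def evaluate(samples: List[Sample],
             max_error_rate: float = 0.05,
             max_latency_ms: float = 500.0) -> Optional[Alert]:
    """Return a single rolled-up alert, or None if nothing is wrong.

    A host counts as degraded only when BOTH variables exceed their
    thresholds (multi-variable), and all degraded hosts are reported
    in one alert instead of one page per host (rollup).
    """
    bad = [s for s in samples
           if s.error_rate > max_error_rate and s.latency_ms > max_latency_ms]
    if not bad:
        return None
    return Alert(
        summary=f"{len(bad)} of {len(samples)} hosts degraded",
        hosts=[s.host for s in bad],
        context={s.host: {"error_rate": s.error_rate,
                          "latency_ms": s.latency_ms} for s in bad},
    )
```

A host with slow responses but no errors, or errors but normal latency, does not page anyone under this rule. Whether that is the right trade-off depends on the service.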
If I were starting from scratch, this is the architecture I'd build for monitoring.
Logstash -> Riemann and/or Flapjack -> (dataviz) StatsD -> Graphite -> Tasseo & Descartes
                                    |
                                    |--> (alerting) Sensu -> PagerDuty
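
To make the (dataviz) branch a little more concrete, here is a minimal Python sketch of how an application might hand a metric to StatsD. It assumes StatsD's plain-text UDP line format (`name:value|type`, where the type is `c` for counters, `g` for gauges, and `ms` for timers) and its conventional port, 8125. The function names, metric names, and host are made up for illustration.

```python
import socket

# Metric types in the basic StatsD line format:
# "c" = counter, "g" = gauge, "ms" = timer
VALID_TYPES = ("c", "g", "ms")


def statsd_line(name: str, value, metric_type: str) -> str:
    """Format one metric as a StatsD line, e.g. 'app.errors:1|c'."""
    if metric_type not in VALID_TYPES:
        raise ValueError(f"unsupported metric type: {metric_type!r}")
    return f"{name}:{value}|{metric_type}"


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Send one line over UDP (fire-and-forget).

    The host and port are assumptions: 8125 is the usual StatsD
    default, but a real deployment may listen elsewhere.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    send_metric(statsd_line("app.errors", 1, "c"))
    send_metric(statsd_line("app.request_time", 320, "ms"))
```

Because the transport is UDP, the sender does not wait on the monitoring pipeline. The price is that metrics are silently dropped if nothing is listening.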