Front-end web developer
As a front-end webdev
I would like visibility of which browsers are in use
So that I can convince my product owner to stop supporting crap browsers or devices
As a front-end webdev
I would like to see exceptions that are logged from my code
So that I can fix bugs
As a front-end webdev
I would like to know the performance of our site from remote locations
So that we can figure out how to optimise it.
As a front-end webdev
I want to know what devices (screen sizes) are viewing the site
So that we can support them correctly
As a front-end webdev
I want to know the percentage of users that have JavaScript enabled
So that we can establish what to do when they don't.
Back-end web developer
As a back-end webdev
I would like to be able to instrument my code and visualise the data
So that I can see which parts need work
As a back-end webdev
I would like to know that all backend systems are operational.
So that I can be happy my application is functional
As a back-end webdev
I would like to be able to see all the logging related to a given web session
So that I can trace the path through the backend systems.
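One way to make this story achievable is to stamp every log line with a session identifier, so that a single search returns the whole trail. The following is a minimal sketch using Python's standard logging and contextvars modules. The names session_id_var, SessionFilter and build_logger are hypothetical, and how the identifier gets set for each request depends on the web framework in use.

```python
import contextvars
import logging

# Holds the current web session id; "-" is used when no session is active.
session_id_var = contextvars.ContextVar("session_id", default="-")


class SessionFilter(logging.Filter):
    """Copy the current session id onto every log record."""

    def filter(self, record):
        record.session_id = session_id_var.get()
        return True  # never drop the record, only annotate it


def build_logger(name, handler):
    """Attach a handler whose output starts with the session id."""
    handler.addFilter(SessionFilter())
    handler.setFormatter(
        logging.Formatter("%(session_id)s %(name)s %(message)s")
    )
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Request-handling code would call session_id_var.set(...) once per request, after which every message logged during that request carries the same identifier.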
As a back-end webdev
I want to know the scale at which my app is working via key facts like:
* number of docs
* queries against db
* number of items in queue
* number of users
* key page views
So that I know how it is growing and can plan accordingly.
As a back-end webdev
I want to know which of my DB queries are taking longer than n milliseconds
So I can optimise them if possible.
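A story like this can often be met with a thin timing wrapper around each database call. The sketch below assumes a hypothetical query_timer context manager. The clock and report parameters are there only so the behaviour can be tested without a real database, and the threshold corresponds to the "n milliseconds" in the story.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("slow_db")


@contextmanager
def query_timer(label, threshold_ms, clock=time.perf_counter, report=log.warning):
    """Time the wrapped block and report it if it exceeds threshold_ms."""
    start = clock()
    try:
        yield
    finally:
        elapsed_ms = (clock() - start) * 1000.0
        if elapsed_ms > threshold_ms:
            report(
                "slow DB call %s took %.1f ms (threshold %s ms)",
                label, elapsed_ms, threshold_ms,
            )
```

Usage would look like `with query_timer("select_orders", 200): cursor.execute(...)`, where cursor stands for whatever database handle the application already has.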
As a back-end webdev
I want a series of smoke tests to be run after all deployments
So that I know that I haven’t broken anything when deploying my application
As a back-end webdev
I want a simple way of instrumenting my application to feed metrics to the metrics system
So that I can gain visibility of how my application is running in production
And so we can find and fix problems with it quickly
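As an illustration of what "a simple way of instrumenting" could look like, here is a sketch of a timing decorator that emits metrics in the StatsD plain-text line format (name:value|type) over UDP. The address, the metric names and the helper names (format_metric, send_metric, timed) are assumptions for the example. The stories do not say which metrics system is in use.

```python
import socket
import time
from functools import wraps

# Assumption: a StatsD-compatible daemon listening on its usual UDP port.
STATSD_ADDR = ("127.0.0.1", 8125)


def format_metric(name, value, kind):
    """Build one StatsD line, e.g. 'app.render:12|ms'."""
    return f"{name}:{value}|{kind}"


def send_metric(name, value, kind, addr=STATSD_ADDR):
    """Fire-and-forget a single metric over UDP."""
    payload = format_metric(name, value, kind).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)


def timed(name, send=send_metric):
    """Decorator that reports how long each call took, in milliseconds."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = int((time.perf_counter() - start) * 1000)
                send(name, elapsed_ms, "ms")
        return wrapper
    return decorator
```

Decorating a function with `@timed("checkout.total_time")` is then the only change a developer needs to make. UDP is chosen here so that a metrics outage cannot block the application.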
Product Owner
As a product owner
I would like visibility of the dashboards that my team use to ascertain site health
So that I can also be happy that the site is performant
As a product owner
I would like to know that end-users are happy
So that I can dance for joy
As a product owner
I would like to see that key functionality of my site is used
So that I can be aware of trends in usability.
As a product owner
I would like the ability to see how releases affect user progress
So I can learn from what we release and simplify the user journey.
Tech Arch
As a technical architect
I would like the ability to graph releases alongside system errors
So that I can see how a release affects the site's stability.
As a technical architect
I would like to see how site errors affect user progress
So I can understand and communicate how release quality impacts user satisfaction.
Security analyst
As a security analyst
I would like the data in the logging system to be encrypted at rest and in transit
So that we do not leak information
As a security analyst
I would like access to the data in the logging system to be authenticated and recorded.
So that we have awareness of who can view it and when they do.
As a security analyst
I would like to be able to view activity on all systems
So that I can be aware of intrusions and unauthorised access.
As a security analyst
I would like the time on all log messages to be accurate to the millisecond and in UTC.
So that I can plot the timeline of events accurately.
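For applications that log through Python's standard logging module, the timestamp requirement could be met with a small formatter subclass. This is a sketch only. UtcMillisFormatter is a hypothetical name, and the accuracy of the value still depends on the host clock being synchronised (for example via NTP).

```python
import logging
import time


class UtcMillisFormatter(logging.Formatter):
    """Render %(asctime)s as ISO 8601 in UTC with millisecond precision."""

    # Use UTC instead of the default local time conversion.
    converter = time.gmtime

    def formatTime(self, record, datefmt=None):
        base = time.strftime("%Y-%m-%dT%H:%M:%S", self.converter(record.created))
        # record.msecs holds the millisecond part of the creation time.
        return f"{base}.{int(record.msecs):03d}Z"
```

A handler configured with `UtcMillisFormatter("%(asctime)s %(levelname)s %(message)s")` then produces lines that sort and correlate across hosts regardless of their local time zones.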
As a security analyst
I would like to monitor inappropriate access
So that I can understand if attacks against our systems are increasing with time.
Web operations
As a webop
I would like to NOT receive a message storm when my metrics collection system is down.
So that I don't miss important alerts.
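One possible approach is to check the health of the collector first and, when it is down, raise a single alert instead of one per dependent check. The sketch below uses a hypothetical alerts_to_send function with plain dictionaries standing in for real check definitions.

```python
def alerts_to_send(collector_up, readings, thresholds):
    """Return the alert messages to emit for one evaluation cycle.

    readings maps check name -> latest value (missing when no data arrived).
    thresholds maps check name -> upper limit.
    """
    if not collector_up:
        # One summary alert replaces a storm of per-check "no data" alerts.
        return [
            f"metrics collection is down: {len(thresholds)} checks cannot be evaluated"
        ]

    alerts = []
    for name, limit in sorted(thresholds.items()):
        value = readings.get(name)
        if value is None:
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts
```

With the collector down, fifty configured checks yield one message rather than fifty, which keeps unrelated alerts visible.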
As a webop
I would like to be able to upgrade my metrics aggregation system without losing data.
So that dependent monitoring systems don't fail.
As a webop
I would like to know when a system is on a trajectory for failure based on the history of its metrics (e.g. the root filesystem is filling up and, at the current rate, will be full in 3 days)
So that I can fix it before it fails.
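The filesystem example can be worked through with a simple linear fit. The sketch below assumes usage samples taken at known times and a hypothetical days_until_full helper. Real usage is rarely perfectly linear, so the result is an estimate rather than a prediction.

```python
def days_until_full(samples, capacity):
    """Estimate days until `capacity` is reached.

    samples is a list of (day, used) pairs, oldest first, with `used` in the
    same unit as `capacity`. Returns None when usage is flat or shrinking,
    or when there is not enough data to fit a trend.
    """
    n = len(samples)
    if n < 2:
        return None

    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None

    # Least-squares slope: growth in usage per day.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / spread
    if slope <= 0:
        return None

    _, last_used = samples[-1]
    return (capacity - last_used) / slope
```

For a filesystem that grew from 70% to 85% over three days at a steady 5 points per day, `days_until_full([(0, 70), (1, 75), (2, 80), (3, 85)], 100)` returns 3.0, matching the "will fill in 3 days" case in the story.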
As a webop
I would like my metrics retrieval system to be reliable
So that I can base my health check alerts on its data.
As a webop
I would like to be able to easily inspect my graph data
So that I can easily review the overall performance of a cluster.
As a webop
I would like to be able to construct and share my own queries against metrics data
So that I can investigate the data available
As a webop
I would like to be able to create new alerts and checks quickly and easily
So I can quickly react to new issues discovered
As a webop
I would like to know the cache-hit ratio of my kernel fscache
So I can tune memory and backup timing for my applications.
As a webop
I would like visibility of key performance related kernel parameters
So I can see if changes in them affect system performance
As a webop
I would like visibility of key events in my system, such as reboots and deployments
So that I can see these markers on all graphs.
As a webop
I would like to be alerted when a metric has crossed a threshold
So I can examine the data and work out if it is a problem
As a webop
I would like to be provided with a link to the problem metric graph in an alert
So that I can inspect it myself.
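A threshold alert that carries its own graph link might be assembled as follows. This is a sketch under assumptions. The base URL and the target/from query parameters are placeholders for whatever graphing front end is deployed, and graph_link and threshold_alert are hypothetical names.

```python
from urllib.parse import urlencode

# Hypothetical graph endpoint; replace with the real graphing front end.
GRAPH_BASE_URL = "https://graphs.example.com/render"


def graph_link(metric, window="-1h", base_url=GRAPH_BASE_URL):
    """Build a URL showing `metric` over the given time window."""
    return f"{base_url}?{urlencode({'target': metric, 'from': window})}"


def threshold_alert(metric, value, threshold):
    """Return an alert message with a graph link, or None if within limits."""
    if value <= threshold:
        return None
    return (
        f"{metric} is {value}, above threshold {threshold}. "
        f"Graph: {graph_link(metric)}"
    )
```

Building the link with urlencode keeps metric names that contain special characters from producing a broken URL in the alert.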
As a webop
I would like to be alerted when a metric has started to behave abnormally
So I can see if it is a problem.
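"Abnormal" needs a working definition before it can be alerted on. One common and simple choice, used here purely as an illustration, is to flag a value that lies more than a few standard deviations from the recent mean. The function name is_abnormal and the default limit of 3 are assumptions, and seasonal metrics would need something more sophisticated.

```python
from statistics import mean, pstdev


def is_abnormal(history, value, z_limit=3.0):
    """Flag `value` if it deviates strongly from the recent `history`."""
    if len(history) < 2:
        return False  # not enough data to judge

    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # A perfectly flat history: any change at all is unusual.
        return value != mu

    return abs(value - mu) / sigma > z_limit
```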
As a webop
I would like to be able to acknowledge or silence alerts that I know are not a problem
So that I don't get hassled.
As a webop
I would like to be able to configure custom per-node/service thresholds for alerts
So that I don't get hassled when the world changes.
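Per-node thresholds can be modelled as overrides layered on top of global defaults. The sketch below is illustrative only. The metric names, node names and the threshold_for helper are hypothetical.

```python
# Global defaults that apply to every node unless overridden.
DEFAULT_THRESHOLDS = {"cpu_pct": 90, "disk_pct": 85}

# Overrides keyed by (node, metric); e.g. batch hosts are expected to run hot.
NODE_OVERRIDES = {
    ("batch01", "cpu_pct"): 100,
    ("db01", "disk_pct"): 70,
}


def threshold_for(node, metric, overrides=NODE_OVERRIDES, defaults=DEFAULT_THRESHOLDS):
    """Return the node-specific threshold, falling back to the default."""
    return overrides.get((node, metric), defaults[metric])
```

Keeping the overrides in data rather than in the check logic means a changed expectation for one node is a one-line configuration edit.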
As a webop or developer
I want to set up suitable notifications from the monitoring system, with a customisable endpoint, such as
* Full escalation path alerting (e.g. PagerDuty)
* Simple 'working hours' level serious alerts
* Notification into a chatops room.
So that I know about any issues as they happen, with a forewarning of severity.
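The three notification styles listed above could be expressed as a small routing function. This is a sketch under stated assumptions. The severity labels, the endpoint names ("pager", "email", "chat") and the 09:00 to 17:30 weekday working hours are all placeholders, and what happens to a serious alert outside working hours is a policy choice the story leaves open.

```python
from datetime import datetime, time

WORK_START = time(9, 0)   # assumed start of working hours
WORK_END = time(17, 30)   # assumed end of working hours


def in_working_hours(now):
    """True on Monday to Friday between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def route(severity, now):
    """Return the list of endpoints that should receive the alert."""
    if severity == "critical":
        # Full escalation path, regardless of the time of day.
        return ["pager", "chat"]
    if severity == "serious":
        # Direct notification only during working hours; otherwise
        # the alert is just posted to the chat room.
        return ["email", "chat"] if in_working_hours(now) else ["chat"]
    return ["chat"]
```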
As an on-call webop or dev
I would like to have a link to supporting documentation when I receive an alert
So that I get a head start on how to fix or diagnose the problem.