@mikepea
Last active August 29, 2015 14:10
Monitoring Pack User Stories v1

Monitoring Pack User Stories

Front-end web developer

As a front-end webdev
I would like visibility of which browsers are in use
So that I can convince my product owner to stop supporting crap browsers or devices

As a front-end webdev
I would like to see exceptions that are logged from my code
So that I can fix bugs

As a front-end webdev
I would like to know the performance of our site from remote locations.
So that we can figure out how to optimise it.

As a front-end webdev 
I want to know what devices (screen sizes) are viewing the site
So that we can support them correctly

As a front-end webdev 
I want to know the percentage of users that have JavaScript enabled
So that we can establish what to do when they don't.


Back-end web developer

As a back-end webdev
I would like to be able to instrument my code and visualise the data
So that I can see which parts need work

As a back-end webdev
I would like to know that all backend systems are operational.
So that I can be happy my application is functional

As a back-end webdev
I would like to be able to see all the logging related to a given web session
So that I can trace the path through the backend systems.

As a back-end webdev 
I want to know the scale at which my app is working via key facts like:
  * number of docs
  * queries against db
  * number of items in queue
  * number of users
  * key page views
So that I know how it is growing and can plan accordingly.

As a back-end webdev 
I want to know which of my DB queries are taking longer than n milliseconds
So that I can optimise them if possible.

As a back-end webdev
I want a series of smoke tests to be run after all deployments
So that I know that I haven’t broken anything when deploying my application

As a back-end webdev
I want a simple way of instrumenting my application to feed metrics to the metrics system
So that I can gain visibility of how my application is running in production
And so we can find and fix problems with it quickly
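
A minimal sketch of what "simple" instrumentation could look like. The stories do not name a metrics system, so this assumes a StatsD-style collector that accepts plain-text `name:value|type` lines over UDP. The metric names, host and port are hypothetical.

```python
import socket


def format_metric(name, value, kind):
    """Build one StatsD-style line, e.g. 'checkout.requests:1|c'."""
    return "%s:%s|%s" % (name, value, kind)


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send; host and port are assumed, not from the source."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# A counter increment and a timing in milliseconds (hypothetical metric names).
counter_line = format_metric("checkout.requests", 1, "c")
timer_line = format_metric("checkout.db_query", 320, "ms")
```

Calling `send_metric(counter_line)` from a request handler is one line of application code. UDP also means the application does not block if the collector is unavailable.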

Product Owner

As a product owner
I would like visibility of the dashboards that my team use to ascertain site health
So that I can also be happy that the site is performant

As a product owner
I would like to know that end-users are happy
So that I can dance for joy

As a product owner
I would like to see that key functionality of my site is used
So that I can be aware of trends in usability.

As a product owner 
I would like the ability to see how releases affect user progress 
So that I can learn from what we release and simplify the user journey.


Technical architect

As a technical architect
I would like the ability to graph releases alongside system errors
So that I can see how a release affects the site's stability.

As a technical architect
I would like to see how site errors affect user progress
So that I can understand and communicate how release quality impacts user satisfaction.

Security analyst

As a security analyst
I would like the data in the logging system to be encrypted at rest and in transit
So that we do not leak information

As a security analyst
I would like access to the data in the logging system to be authenticated and recorded.
So that we have awareness of who can view it and when they do.

As a security analyst
I would like to be able to view activity on all systems
So that I can be aware of intrusions and unauthorised access.

As a security analyst
I would like the time on all log messages to be accurate to the millisecond and in UTC.
So that I can plot the timeline of events accurately.
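
As an illustration of the timestamp format this story asks for, here is a small sketch that normalises a timezone-aware time to UTC with millisecond precision. The function name `utc_millis` and the ISO 8601 output with a `Z` suffix are assumptions, since the story does not specify a format.

```python
from datetime import datetime, timedelta, timezone


def utc_millis(dt):
    """Return dt as an ISO 8601 UTC string truncated to milliseconds."""
    if dt.tzinfo is None:
        # A naive timestamp cannot be placed on a shared timeline.
        raise ValueError("timestamp must be timezone-aware")
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# An event recorded in a UTC+1 local zone (hypothetical example data).
local = datetime(2015, 8, 29, 15, 10, 0, 123456,
                 tzinfo=timezone(timedelta(hours=1)))
stamp = utc_millis(local)  # "2015-08-29T14:10:00.123Z"
```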

As a security analyst 
I would like to monitor inappropriate access 
So that I can understand whether attacks against our systems are increasing over time.



Web operations

As a webop
I would like to NOT receive a message storm when my metrics collection system is down.
So that I don't miss important alerts.
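
One way to avoid the storm is to treat the metrics collector as a parent dependency of every check that relies on its data. The sketch below shows that idea only; `select_alerts` and the message wording are hypothetical and not taken from any particular monitoring tool.

```python
def select_alerts(failing_checks, collector_healthy):
    """Return the alert messages to send for this evaluation round.

    When the collector is down, every dependent check looks like a failure,
    so a single alert about the collector replaces all of them.
    """
    if not collector_healthy:
        return ["metrics-collector is down; %d dependent checks suppressed"
                % len(failing_checks)]
    return ["%s is failing" % name for name in failing_checks]


storm = select_alerts(["disk", "cpu", "queue-depth"], collector_healthy=False)
normal = select_alerts(["disk"], collector_healthy=True)
```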

As a webop
I would like to be able to upgrade my metrics aggregation system without losing data.
So that dependent monitoring systems don't fail.

As a webop
I would like to know when a system is on a trajectory for failure based on the history of its metrics (e.g. the root filesystem is filling up and, at the current rate, will be full in 3 days)
So that I can fix it before it fails.
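
The filesystem example can be sketched as a straight-line fit over recent usage samples, extrapolated to 100%. This assumes roughly linear growth, which is a simplification. The function name and the sample data are hypothetical.

```python
def time_until_full(samples, capacity=100.0):
    """Estimate time from the last sample until usage reaches capacity.

    samples: list of (time, used_percent) pairs, in any consistent time unit.
    Returns None when usage is flat or falling, or when there is too little data.
    """
    if len(samples) < 2:
        return None
    mean_t = sum(t for t, _ in samples) / len(samples)
    mean_u = sum(u for _, u in samples) / len(samples)
    spread = sum((t - mean_t) ** 2 for t, _ in samples)
    if spread == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / spread
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    full_at = (capacity - intercept) / slope
    return full_at - samples[-1][0]


# Daily samples growing 10 points per day, currently at 70% used.
days_left = time_until_full([(0, 40.0), (1, 50.0), (2, 60.0), (3, 70.0)])
```

With these samples `days_left` comes out at 3 days, matching the story's example. An alert could fire whenever the estimate drops below a chosen lead time.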

As a webop
I would like my metrics retrieval system to be reliable
So that I can base my health check alerts on its data.

As a webop
I would like to be able to easily inspect my graph data
So that I can easily review the overall performance of a cluster.

As a webop
I would like to be able to construct and share my own queries against metrics data
So that I can investigate the data available

As a webop
I would like to be able to create new alerts and checks quickly and easily
So I can quickly react to new issues discovered

As a webop
I would like to know the cache-hit ratio of my kernel fscache
So I can tune memory and backup timing for my applications.

As a webop
I would like visibility of key performance related kernel parameters
So I can see if changes in them affect system performance

As a webop
I would like visibility of key events in my system, such as reboots and deployments
So that I can see these markers on all graphs.

As a webop
I would like to be alerted when a metric has crossed a threshold
So I can examine the data and work out if it is a problem

As a webop
I would like to be provided with a link to the problem metric graph in an alert
So that I can inspect it myself.

As a webop
I would like to be alerted when a metric has started to behave abnormally
So I can see if it is a problem.
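
"Behaving abnormally" is not defined in the story. One simple interpretation, shown here as a sketch, is to flag a value that lies more than a few standard deviations from its recent history. The function name and the threshold of 3 are assumptions.

```python
from statistics import mean, pstdev


def is_abnormal(history, value, threshold=3.0):
    """Flag value if it is more than `threshold` standard deviations
    from the mean of the recent history."""
    if not history:
        return False
    centre = mean(history)
    spread = pstdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unusual.
        return value != centre
    return abs(value - centre) / spread > threshold


recent = [10, 11, 9, 10, 10, 11, 9, 10]  # hypothetical requests per second
spike = is_abnormal(recent, 30)
steady = is_abnormal(recent, 10)
```

Unlike a fixed threshold, this adapts to each metric's normal range, although it will miss slow drifts that pull the mean along with them.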

As a webop
I would like to be able to acknowledge or silence alerts that I know are not a problem
So that I don't get hassled.

As a webop
I would like to be able to configure custom per-node/service thresholds for alerts
So that I don't get hassled when the world changes.
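
Per-node thresholds can be modelled as a default per metric plus explicit overrides, as in this minimal sketch. The node names, metric names and values are all hypothetical.

```python
# Default threshold for each metric (assumed values).
DEFAULTS = {"disk_used_percent": 90.0, "load_average": 8.0}

# Overrides keyed by (node, metric); anything not listed uses the default.
OVERRIDES = {("db-01", "disk_used_percent"): 95.0}


def threshold_for(node, metric):
    """Look up the node-specific threshold, falling back to the default."""
    return OVERRIDES.get((node, metric), DEFAULTS[metric])


def breaches(node, metric, value):
    """True when the value has crossed the applicable threshold."""
    return value > threshold_for(node, metric)


db_alert = breaches("db-01", "disk_used_percent", 93.0)    # override of 95 applies
web_alert = breaches("web-01", "disk_used_percent", 93.0)  # default of 90 applies
```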

As a webop or developer
I want to set up suitable notifications from the monitoring system, with a customisable endpoint, such as
  * Full escalation path alerting (e.g. PagerDuty)
  * Simple 'working hours' level serious alerts
  * Notification into a chatops room.
So that I know about any issues as they happen, with a forewarning of severity.

As an on-call webop or dev
I would like to have a link to supporting documentation when I receive an alert
So that I get a head start on how to fix or diagnose the problem.