@mhammond
Last active May 10, 2016 23:56
Sync Health Metrics

A summary of the initial data that will be recorded in the Sync telemetry ping.

Data analysis

The collected data will primarily be used to answer the following questions. The images are illustrative only: they are not built from actual data, and they show only a very short time range.

Key performance indicators

Number of Syncs completed and overall error rate

How healthy is Sync?

This reports the total number of Syncs done per day, split by success and failure. This is the high-level overview of the general health of the Sync system.
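As a rough illustration of this metric, the sketch below counts Syncs per day, split by success and failure. The record layout (the `day` and `ok` fields) and the sample values are assumptions made up for this example; they are not the actual ping schema.

```python
from collections import defaultdict

# Hypothetical ping records: the field names ("day", "ok") are assumptions
# for illustration, not the actual Sync telemetry ping schema.
pings = [
    {"day": "2016-05-09", "ok": True},
    {"day": "2016-05-09", "ok": False},
    {"day": "2016-05-09", "ok": True},
    {"day": "2016-05-10", "ok": True},
]


def daily_totals(pings):
    """Return {day: {"success": n, "failure": m}} for the given pings."""
    totals = defaultdict(lambda: {"success": 0, "failure": 0})
    for ping in pings:
        key = "success" if ping["ok"] else "failure"
        totals[ping["day"]][key] += 1
    return dict(totals)


totals = daily_totals(pings)
```

Each day's two counts map directly onto the two series of a stacked chart.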

Total Syncs

Error rate by release

Have improvements we made actually improved the overall health?

This reports the rate of Sync errors by release version. It will help tell us whether improvements made in specific versions have had the impact we hoped for.

For example, in the above chart we can see that the error rates for "forms" and "bookmarks" improved in 48, but "tabs" got worse in 49.
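One possible way to derive such a rate is sketched here: group Syncs by release version and engine, then divide failures by the total. The `version`, `engine`, and `failed` fields and the sample values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records, one per engine per Sync. Field names and values
# are assumptions for illustration only.
syncs = [
    {"version": 47, "engine": "bookmarks", "failed": True},
    {"version": 47, "engine": "bookmarks", "failed": False},
    {"version": 48, "engine": "bookmarks", "failed": False},
    {"version": 48, "engine": "bookmarks", "failed": False},
    {"version": 48, "engine": "tabs", "failed": False},
    {"version": 49, "engine": "tabs", "failed": True},
]


def error_rate_by_release(syncs):
    """Return {(version, engine): fraction of Syncs that failed}."""
    counts = defaultdict(lambda: [0, 0])  # [failures, total]
    for sync in syncs:
        entry = counts[(sync["version"], sync["engine"])]
        if sync["failed"]:
            entry[0] += 1
        entry[1] += 1
    return {key: failures / total for key, (failures, total) in counts.items()}


rates = error_rate_by_release(syncs)
```

Reporting a rate rather than a raw count keeps releases with very different numbers of users comparable.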

Error rate by engine

How healthy are the individual Sync engines?

This reports the number of times Sync failed for each individual engine, so we can determine whether a particular data type (such as bookmarks or passwords) is recording a higher than expected number of errors.

For example, in the above chart we can see fairly stable error rates per engine, although the error rate for bookmarks has been declining.

This reports the number of records successfully applied and the number that failed to apply. In this scenario Sync itself did not fail, but certain synced data (e.g., a bookmark) was not applied.

The following chart is for a single engine, but it should be duplicated for each engine (or better, all engines should be overlaid in a single chart).
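The record-level numbers could be aggregated per engine along these lines. This is only a sketch: the `applied` and `failed` counters and the engine names in the sample data are assumed for the example and may not match what the ping actually records.

```python
from collections import defaultdict

# Hypothetical per-engine results from individual Syncs. The "applied" and
# "failed" counters are assumed field names, not the real ping schema.
results = [
    {"engine": "bookmarks", "applied": 95, "failed": 5},
    {"engine": "bookmarks", "applied": 100, "failed": 0},
    {"engine": "passwords", "applied": 10, "failed": 0},
]


def record_failure_rates(results):
    """Return {engine: fraction of incoming records that failed to apply}."""
    applied = defaultdict(int)
    failed = defaultdict(int)
    for result in results:
        applied[result["engine"]] += result["applied"]
        failed[result["engine"]] += result["failed"]
    rates = {}
    for engine in applied:
        total = applied[engine] + failed[engine]
        # Guard against engines that saw no records at all.
        rates[engine] = failed[engine] / total if total else 0.0
    return rates


record_rates = record_failure_rates(results)
```

Because the result is keyed by engine, the same output can feed either one chart per engine or a single overlaid chart.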

Error rates per user

Are Sync errors evenly distributed across users, or do a few users see the bulk of the errors?

This will give us insights into whether we should focus on the reasons why just a few users have extreme error rates, or whether the errors are evenly distributed.

For example, in the above chart we can see that while the average error rate is high, the vast majority of users have a low error rate.
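To make the skew concrete, here is a small sketch that computes a per-user error rate and compares the mean with the median. The user identifiers and the shape of the data are invented for this example.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical (user, failed) pairs: four users whose Syncs always succeed
# and one user whose Syncs always fail. Invented data for illustration.
syncs = (
    [(user, False) for user in "abcd" for _ in range(10)]
    + [("e", True) for _ in range(10)]
)


def per_user_error_rates(syncs):
    """Return {user: fraction of that user's Syncs that failed}."""
    counts = defaultdict(lambda: [0, 0])  # [failures, total]
    for user, failed in syncs:
        if failed:
            counts[user][0] += 1
        counts[user][1] += 1
    return {user: failures / total for user, (failures, total) in counts.items()}


user_rates = per_user_error_rates(syncs)
mean_rate = mean(user_rates.values())
median_rate = median(user_rates.values())
```

With this made-up data the mean error rate is 0.2 while the median is 0.0: a single user accounts for every error, which is the situation this chart is meant to expose.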

ckarlof commented May 10, 2016

Great start Mark.

  • In "Error rate by release" and "Error rate by engine", what is the label of the y-axis?
  • In "Error rates per user", instead of mode, mean, and median, we might consider 50%, 75%, 90%, 95%, 99%, which tell us a little more about the extremes. This chart also needs a label on the y-axis.
