@rostrovsky
Created July 7, 2023 12:40
Prometheus storage analysis

Prometheus doesn't give a direct per-metric breakdown of disk usage. However, there are several ways to get an idea of which metrics are taking up the most space:

  1. Cardinality: A single Prometheus metric with high cardinality (a large number of unique time series) can take up a lot of storage. You can use PromQL to find such metrics. For example, count({__name__=~".+"}) by (__name__) counts the series currently present for each metric name, and wrapping it as topk(10, count({__name__=~".+"}) by (__name__)) keeps only the ten largest. Both queries touch every series, so they can be slow on a large server.

  2. Metric volume: You can also check the rate at which samples are ingested into Prometheus, using a query like rate(prometheus_tsdb_head_samples_appended_total[5m]). This returns the number of samples appended per second across the whole server, so it tells you how quickly you're accumulating data overall. It won't tell you which metrics are responsible for that volume.

  3. Prometheus Tools: There are a number of tools you can use to analyze Prometheus storage:

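    • Prometheus tsdb tool: The tsdb tool, which is included in the Prometheus repository, provides various utilities for working with TSDB blocks, one of which is the analyze command. For a given block it reports figures such as the highest-cardinality labels and the metric names with the most series, which gives a rough idea of which metrics occupy the most disk space. You can run it using the command: ./tsdb analyze data_dir block_id. In current Prometheus releases the standalone tsdb binary has been folded into promtool, so the equivalent command is promtool tsdb analyze data_dir block_id.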

    • PromLens: PromLens is a tool for exploring and diagnosing PromQL queries. It can help visualize the data and the queries, but it's not specifically designed for storage analysis.

    • Thanos: Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, and it can be added seamlessly on top of existing Prometheus deployments. Its "compact" component compacts and downsamples blocks in object storage, and the accompanying bucket tooling (for example thanos tools bucket inspect) can show per-block statistics such as series and sample counts. It does not break storage usage down per label, so treat it as a block-level view rather than a per-metric one.
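
The cardinality query from item 1 can also be run through the Prometheus HTTP API and ranked in a script. The following is a minimal Python sketch, not a definitive implementation: it assumes the standard instant-query endpoint (/api/v1/query) and its usual JSON response shape, and the function names fetch_instant_query and top_metrics_by_series are hypothetical helpers introduced only for illustration.

```python
import json
import urllib.parse
import urllib.request

# The cardinality query from the text: one result per metric name.
CARDINALITY_QUERY = 'count({__name__=~".+"}) by (__name__)'


def fetch_instant_query(base_url: str, query: str) -> dict:
    """Run an instant query against the Prometheus HTTP API and return the decoded JSON."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def top_metrics_by_series(response: dict, n: int = 10) -> list:
    """Return up to n (metric name, series count) pairs, largest count first."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    counts = []
    for sample in response["data"]["result"]:
        name = sample["metric"].get("__name__", "<unnamed>")
        # Each instant-vector value is [timestamp, "value-as-string"].
        counts.append((name, int(float(sample["value"][1]))))
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts[:n]
```

Calling top_metrics_by_series(fetch_instant_query("http://localhost:9090", CARDINALITY_QUERY)) would list the ten metric names with the most series. The address is an assumption; point it at your own server. Keeping the ranking separate from the HTTP call means the ranking logic can be checked against a saved response without touching a live Prometheus.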

Please note that these methods point to candidates for high storage usage, but they don't guarantee that these metrics are actually the ones using the most storage. Disk usage depends on both the number of series and the number of samples each series receives. For example, a metric might have high cardinality, but if it's rarely updated, it won't use much storage. Similarly, a metric might be updated very frequently, but if it's always the same few time series being updated, it also won't use much storage.
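
To make the interplay between cardinality and update frequency concrete, here is a rough Python sketch. It borrows the capacity-planning rule of thumb from the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample on average). Applying that rule to a single metric is an assumption made here for illustration, and it ignores index overhead and series churn. The function name estimate_disk_bytes and the example numbers are hypothetical.

```python
def estimate_disk_bytes(series: int, scrape_interval_s: float,
                        retention_days: float,
                        bytes_per_sample: float = 2.0) -> float:
    """Rough disk estimate for one metric: retention * samples/sec * bytes/sample.

    bytes_per_sample defaults to 2.0, the upper end of the commonly quoted
    1-2 byte average; treat it as an assumption, not a measurement.
    """
    if scrape_interval_s <= 0:
        raise ValueError("scrape_interval_s must be positive")
    samples_per_second = series / scrape_interval_s
    retention_seconds = retention_days * 86400
    return samples_per_second * retention_seconds * bytes_per_sample


# Hypothetical metric A: many series, sampled rarely.
wide_but_slow = estimate_disk_bytes(series=10_000, scrape_interval_s=300, retention_days=15)
# Hypothetical metric B: far fewer series, sampled often.
narrow_but_fast = estimate_disk_bytes(series=500, scrape_interval_s=5, retention_days=15)

print(f"A: {wide_but_slow / 1e6:.1f} MB, B: {narrow_but_fast / 1e6:.1f} MB")
```

Under these assumed numbers, metric B ends up larger than metric A despite having twenty times fewer series, which is why a series count alone is only a starting point.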
