Prometheus itself doesn't provide a built-in tool that directly reports which metrics use the most storage. However, there are several ways to get an idea of which metrics are taking up a lot of space:
- **Cardinality:** A single metric with high cardinality (a large number of unique time series) can take up a lot of storage. You can use PromQL to find such metrics, for example `count({__name__=~".+"}) by (__name__)`, which counts the number of unique time series per metric name.
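For instance, once you've run that query against the HTTP API (`/api/v1/query`), a short Python sketch can rank the results. The helper name and sample payload below are illustrative, though the payload follows the shape Prometheus's query API actually returns for an instant vector:

```python
def top_cardinality(api_response, n=10):
    """Return the n metric names with the most time series, descending.

    `api_response` is the decoded JSON body from /api/v1/query for the
    query count({__name__=~".+"}) by (__name__).
    """
    results = api_response["data"]["result"]
    counts = [(r["metric"]["__name__"], int(r["value"][1])) for r in results]
    return sorted(counts, key=lambda pair: pair[1], reverse=True)[:n]

# Made-up payload in the shape /api/v1/query returns for an instant query.
sample = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "http_requests_total"}, "value": [1700000000, "5400"]},
            {"metric": {"__name__": "up"}, "value": [1700000000, "120"]},
            {"metric": {"__name__": "node_cpu_seconds_total"}, "value": [1700000000, "960"]},
        ],
    },
}

print(top_cardinality(sample, n=2))
# [('http_requests_total', 5400), ('node_cpu_seconds_total', 960)]
```

Remember that a high series count flags a candidate, not a culprit; how often those series receive samples matters just as much.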
- **Ingestion volume:** You can also check the rate at which samples are ingested, using a query like `rate(prometheus_tsdb_head_samples_appended_total[5m])`. This gives you an idea of how quickly you're accumulating data overall, which is a useful proxy for storage growth, but it won't directly tell you which metrics are responsible for the volume.
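To make concrete what `rate()` computes here, the sketch below approximates a counter's per-second rate from raw (timestamp, value) samples, the way you'd get them from a range query; the sample numbers are made up:

```python
def approx_rate(samples):
    """Approximate per-second rate of a monotonically increasing counter.

    `samples` is a list of (unix_timestamp, value) pairs, oldest first.
    Made-up values below stand in for samples of
    prometheus_tsdb_head_samples_appended_total over a 5-minute window.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Five minutes of hypothetical counter samples, 60 seconds apart.
samples = [
    (0, 0.0), (60, 90_000.0), (120, 180_000.0),
    (180, 270_000.0), (240, 360_000.0), (300, 450_000.0),
]
print(approx_rate(samples))  # 1500.0 samples ingested per second
```

(The real `rate()` also handles counter resets and extrapolation at the window edges, which this sketch ignores.)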
- **Prometheus tools:** There are a number of external tools you can use to analyze Prometheus storage:
  - **`tsdb` tool:** The `tsdb` tool, included in the Prometheus repository, provides various utilities for working with TSDB blocks, one of which is the `analyze` command. It can give you a rough idea of which metrics and labels occupy the most space. You can run it with `./tsdb analyze data_dir block_id`; in recent Prometheus releases this functionality has been folded into `promtool`, as `promtool tsdb analyze data_dir block_id`.
  - **PromLens:** PromLens is a tool for exploring and debugging PromQL queries. It can help you visualize queries and their results, but it isn't specifically designed for storage analysis.
  - **Thanos:** Thanos is a set of components that can be composed into a highly available metric system with long-term storage, layered seamlessly on top of existing Prometheus deployments. Its `compact` component can, among other things, report statistics about your data, including which labels take up the most space.
Please note that these methods identify likely candidates for high storage usage; they don't guarantee that those metrics are actually the ones using the most space. For example, a metric might have high cardinality, but if its series are rarely updated, it won't use much storage. Conversely, a metric might be updated very frequently, but if it's always the same few time series being appended to, it also won't use much storage.
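As a back-of-envelope illustration of that last point: disk usage is roughly retention × ingested samples per second × bytes per sample (Prometheus's storage documentation cites roughly 1-2 bytes per compressed sample). The scenario numbers below are hypothetical:

```python
def estimated_disk_bytes(series, samples_per_series_per_sec, retention_days,
                         bytes_per_sample=1.5):
    """Rough disk usage: retention * ingestion rate * bytes per sample.

    The ~1-2 bytes/sample figure for the compressed on-disk format comes
    from the Prometheus storage docs; all other inputs are hypothetical.
    """
    ingested_per_sec = series * samples_per_series_per_sec
    return retention_days * 86_400 * ingested_per_sec * bytes_per_sample

# High cardinality, rarely updated: 100k series, one sample per hour.
rare = estimated_disk_bytes(100_000, 1 / 3600, retention_days=15)

# Low cardinality, updated every second: 1k series, one sample per second.
busy = estimated_disk_bytes(1_000, 1.0, retention_days=15)

print(f"{rare / 1e9:.2f} GB vs {busy / 1e9:.2f} GB")
```

Here the 1,000 busy series out-consume the 100,000 sleepy ones by a wide margin, which is why cardinality alone isn't a reliable guide to storage usage.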