Prometheus Memory Usage Investigation

To set proper memory limits for Prometheus, we first need to observe the memory usage behavior of the deployment. After a spike in memory, the reported usage clearly does not drop back down but instead oscillates around the new, higher level. This is unexpected, and it needs to be investigated why the RSS memory is not released after the increase. Keep in mind that this pattern was initially observed in only one region.

Current configuration:

  • Prometheus version: 2.18.1
  • Go version: go1.14.2
  • Prometheus Helm chart: v11.5.0
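
The versions above can be confirmed against the running instance through the Prometheus build info API. A minimal sketch, assuming the chart's default service name (prometheus-server) in a monitoring namespace exposed on port 80:

# Port-forward the server service and read its build info (service/namespace names are assumptions).
kubectl -n monitoring port-forward svc/prometheus-server 9090:80 &
curl -s http://localhost:9090/api/v1/status/buildinfo | jq '.data.version, .data.goVersion'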

[Screenshots from 2021-10-29 10-51-10 and 10-51-05: Prometheus memory usage before and after the increase]

Understanding container memory metrics

From [4]

| Metric name | Metric type | Description | Units | Option parameter |
|---|---|---|---|---|
| container_memory_cache | Gauge | Total page cache memory | bytes | memory |
| container_memory_rss | Gauge | Size of RSS | bytes | memory |
| container_memory_swap | Gauge | Container swap usage | bytes | memory |
| container_memory_usage_bytes | Gauge | Current memory usage, including all memory regardless of when it was accessed | bytes | memory |
| container_memory_working_set_bytes | Gauge | Current working set | bytes | memory |
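
To see how these four views relate for the Prometheus pod itself, the instant values can be pulled side by side through the query API. A rough sketch against the port-forward opened above; the namespace and pod label values are assumptions:

# Compare the cache, RSS, usage, and working set series for the prometheus-server pod.
# Label values are assumptions; container!="POD" drops the pause container.
for m in container_memory_cache container_memory_rss \
         container_memory_usage_bytes container_memory_working_set_bytes; do
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=sum(${m}{namespace=\"monitoring\",pod=~\"prometheus-server.*\",container!=\"POD\"})" \
    | jq -r --arg m "$m" '"\($m): \(.data.result[0].value[1]) bytes"'
done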

From the graphs it can be seen that, with an ever-increasing container_memory_usage_bytes, it is not easy to determine a memory limit for this deployment. One potential problem arising from an ill-defined memory limit is that the deployment gets OOMKilled unnecessarily.

It is therefore important to understand how the aforementioned container metrics are involved in the OOMKill decision. The container_memory_working_set_bytes metric is the one monitored for OOMKill decisions [2], so it is this metric that should be used to set the memory limit and the related alerting.
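
Since the working set is the number compared against the container limit for OOMKill purposes, the remaining headroom can be watched directly as a ratio; alerting on this ratio rather than on container_memory_usage_bytes avoids paging on cache growth. A sketch of such a query, again with assumed label values:

# Fraction of the configured limit used by the working set.
# Label values are assumptions; container_spec_memory_limit_bytes is 0 when no limit is set.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query=max(
      container_memory_working_set_bytes{namespace="monitoring",pod=~"prometheus-server.*",container!="POD"}
    /
      container_spec_memory_limit_bytes{namespace="monitoring",pod=~"prometheus-server.*",container!="POD"}
  )' \
  | jq -r '.data.result[0].value[1]'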

Understanding current memory usage pattern of Prometheus

From the graph it can clearly be seen that working set bytes is much smaller than the usage bytes metric. The usage bytes and RSS patterns look very similar; in fact, total usage appears not to go down because of the remaining RSS. [3] explains that memory usage reporting changed after Go 1.12, and that the behavior seen in the memory report is not a memory leak but a consequence of that change.
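
Prometheus' own Go runtime metrics make this visible: memory handed back by the runtime shows up in go_memstats_heap_released_bytes while still being counted in process_resident_memory_bytes until the kernel actually reclaims it. A quick way to eyeball that gap, assuming a self-scrape job labelled job="prometheus":

# Runtime-level view of Prometheus' own heap; with Go 1.12+ released memory can linger in RSS.
# The job label value is an assumption for the self-scrape job.
for m in process_resident_memory_bytes go_memstats_heap_inuse_bytes \
         go_memstats_heap_idle_bytes go_memstats_heap_released_bytes; do
  curl -sG http://localhost:9090/api/v1/query \
    --data-urlencode "query=${m}{job=\"prometheus\"}" \
    | jq -r --arg m "$m" '"\($m): \(.data.result[0].value[1]) bytes"'
done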

Moving Forward

[3] suggests the following:

  • If you depend on container_memory_usage_bytes, switch to the container_memory_working_set_bytes metric for the closest possible approximation of actual usage. It is not perfect, though.
  • Use go_memstats_alloc_bytes and friends (e.g. go_memstats_.*_inuse_bytes) to see actual allocations; see the sketch after this list. This is useful when profiling and optimizing the application's memory usage, since it filters out memory that is merely "cached" and is the most accurate view from the application's perspective.
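
A sketch of pulling those runtime allocation series in one go, with the self-scrape job label as an assumption:

# Actual allocations as seen by the Go runtime, excluding memory that is only cached or released.
# The job label value is an assumption for the self-scrape job.
curl -sG http://localhost:9090/api/v1/query \
  --data-urlencode 'query={__name__=~"go_memstats_alloc_bytes|go_memstats_.*_inuse_bytes", job="prometheus"}' \
  | jq -r '.data.result[] | "\(.metric["__name__"]): \(.value[1]) bytes"'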

References

  1. Deep dive into K8s metrics - P3 - Container resource metrics
  2. What is the relation between container memory working set bytes and OOM
  3. Golang Memory Monitoring by Plotka
  4. List of Prometheus container metrics reported by cAdvisor

Reproducing the memory usage behavior

To understand the root cause behind this memory usage pattern, we can stress test our Prometheus instances. Inspired by [1], a stress test was executed against the staging Prometheus deployment by running the following commands inside the prometheus-server pod.

# Test 1: allocate ~4 GB inside grep's buffer (the piped input contains no newlines,
# so grep has to buffer it all as one line), then free it when the pipeline exits.
yes | tr \\n x | head -c 4000000000 | grep n

# Test 2: same pattern with a ~4 TB target; exited manually once cache eviction
# was observed, to avoid the pod itself being evicted.
yes | tr \\n x | head -c 4000000000000 | grep n
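
The commands above were run from a shell inside the container; one non-interactive way to do that is via kubectl exec. The deployment, namespace, and container names below are assumptions:

# Run test 1 inside the prometheus-server container (resource names are assumptions).
kubectl -n monitoring exec deploy/prometheus-server -c prometheus-server -- \
  sh -c 'yes | tr \\n x | head -c 4000000000 | grep n'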

Test 1 and test 2 results can be seen in sequence in the following image. [Screenshot from 2021-10-29 16-28-28: memory metrics during and after both stress tests]

From these tests it is clear that all memory components decrease once the load on the system is removed. Contrary to the findings in [1], the total memory reported decreased and returned to pre-test levels. Given this and the initial memory usage pattern, it is clear that the root cause of the initial issue is different from the one reported in [1].

References

  1. Golang Memory Monitoring by Plotka
  2. Dropping metrics at scrape time with Prometheus