We thought it was a memory leak, but it was a bad configuration.
Configuration that didn't work:
- puma (RAILS_MAX_THREADS: 8, WEB_CONCURRENCY: 8)
- db pool - 10
- memory cache
Configuration that worked:
- puma (RAILS_MAX_THREADS: 2, WEB_CONCURRENCY: 8)
- db pool - 100
- Redis + ElastiCache
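The two setups above can be compared with plain arithmetic. This is only a sketch, not the conclusion. It assumes WEB_CONCURRENCY is the number of Puma worker processes, RAILS_MAX_THREADS is the thread count per worker (as in the default Rails config/puma.rb), and the DB pool size applies per process. The Setup struct and its method names are hypothetical.

```ruby
# Hypothetical struct for comparing the two setups; values come from the notes.
Setup = Struct.new(:name, :workers, :threads, :db_pool) do
  # Threads across all Puma workers.
  def total_threads
    workers * threads
  end

  # True when each worker's pool has a connection for every one of its threads.
  def pool_covers_threads?
    db_pool >= threads
  end
end

before = Setup.new("didn't work", 8, 8, 10)
after  = Setup.new("worked", 8, 2, 100)

[before, after].each do |s|
  puts "#{s.name}: #{s.total_threads} threads total, " \
       "pool covers threads: #{s.pool_covers_threads?}"
end
```

Under these assumptions the first setup runs four times as many threads as the second, while the pool covers every thread of a worker in both cases.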
Conclusion: ???
Someone created a publication with an inlined image <img src="data:base64....">
and the server crashed,
because the publication content was parsed for [shortcodes].
We thought it was a memory leak, but it was not.
We uploaded such images instead of inlining them, and the problem disappeared.
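One way to catch such content before it reaches the shortcode parser is to look for large inline data: URIs in img tags. The sketch below is an illustration only: the regex, the method name oversized_inline_images and the 10,000-byte limit are assumptions, not part of the original setup.

```ruby
# Captures the value of src when it is an inline data: URI.
INLINE_IMAGE_SRC = /<img\b[^>]*?\bsrc\s*=\s*["'](data:[^"']*)["']/i

# Hypothetical limit; pick a value that suits the application.
MAX_INLINE_BYTES = 10_000

# Returns the inline data: URIs in +html+ that are larger than +limit+ bytes.
def oversized_inline_images(html, limit: MAX_INLINE_BYTES)
  html.scan(INLINE_IMAGE_SRC).flatten.select { |uri| uri.bytesize > limit }
end

# Synthetic content for illustration.
small   = %(<img src="data:image/png;base64,AAAA">)
big     = %(<img alt="x" src="data:image/png;base64,#{'A' * 20_000}">)
regular = %(<img src="/uploads/picture.png">)

content = "#{small} [gallery] #{big} #{regular}"
puts "oversized inline images: #{oversized_inline_images(content).size}"
```

A check like this could run in a model validation or before parsing, so that oversized inline images are rejected or converted to uploads first.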
Lessons learned:
bundle exec derailed exec perf:mem_over_time
doesn't show the right stats without GC.start on each request.
The memory graph should look like a logarithm: if it is increasing linearly, that is bad; if it is still increasing a little after 200k requests, that is fine.
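The rule of thumb about the shape of the memory graph can be sketched as a check over samples taken at equal request intervals, each after GC.start. Everything here is an assumption for illustration: the helper names, the 0.5 threshold, and the use of GC.stat(:heap_live_slots), which counts live object slots and is not the same number derailed reports.

```ruby
# Takes one sample after forcing a full GC, so garbage does not skew the reading.
def live_slots_after_gc
  GC.start
  GC.stat(:heap_live_slots)
end

# Compares growth in the second half of the samples with growth in the first.
# Returns :leveling_off (log-like curve) or :keeps_growing (roughly linear).
def growth_shape(samples, ratio_threshold: 0.5)
  raise ArgumentError, "need at least 3 samples" if samples.size < 3

  mid = samples.size / 2
  first_half_growth  = samples[mid] - samples.first
  second_half_growth = samples.last - samples[mid]

  return :leveling_off if second_half_growth <= 0
  return :keeps_growing if first_half_growth <= 0

  ratio = second_half_growth.to_f / first_half_growth
  ratio < ratio_threshold ? :leveling_off : :keeps_growing
end

# Synthetic series for illustration only (not measurements).
log_like = (1..20).map { |i| 100 + 30 * Math.log(i) }
linear   = (1..20).map { |i| 100 + 5 * i }

puts "log-like: #{growth_shape(log_like)}"
puts "linear:   #{growth_shape(linear)}"
```

A curve that slows down in its second half is treated as fine; one that grows about as fast as in the first half is treated as suspicious.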
How it could be prevented:
- check whether only some requests cause the crash
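To check whether only some requests are responsible, memory can be recorded before and after each request and then ranked by path. A minimal sketch, assuming such measurements are already being collected somewhere (for example in a middleware); the Measurement struct, the paths and the numbers are hypothetical.

```ruby
# One record per request: path plus memory (MB) before and after handling it.
Measurement = Struct.new(:path, :mb_before, :mb_after)

# Returns [path, average growth in MB] pairs, largest growth first.
def rank_by_memory_growth(measurements)
  measurements
    .group_by(&:path)
    .map do |path, rows|
      total = rows.sum { |r| r.mb_after - r.mb_before }
      [path, total / rows.size.to_f]
    end
    .sort_by { |_, avg_growth| -avg_growth }
end

# Hypothetical data for illustration, not taken from the incident.
measurements = [
  Measurement.new("/health", 400, 400),
  Measurement.new("/publications/42", 400, 700),
  Measurement.new("/publications/42", 700, 980),
  Measurement.new("/health", 980, 981)
]

rank_by_memory_growth(measurements).each do |path, avg|
  puts format("%-20s %+.1f MB per request", path, avg)
end
```

A path that stands out at the top of this ranking is a candidate for a closer look at its payload.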