
@joewiz
Last active September 3, 2023 11:57
Recovery from nginx "Too many open files" error on Amazon AWS Linux

On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:

2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)

2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...

An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed; the steps that diverged from the article's directions are marked with an *. A command-level sketch of the full procedure follows the list.

  1. * Instead of using su to run ulimit on the nginx account, use ps aux | grep nginx to locate nginx's process IDs, then check each process's file handle limits with cat /proc/pid/limits (where pid is a process ID retrieved from ps). Note that sudo may be necessary for the cat command, depending on your system.
  2. Added fs.file-max = 70000 to /etc/sysctl.conf
  3. Added nginx soft nofile 10000 and nginx hard nofile 30000 to /etc/security/limits.conf
  4. Ran sysctl -p
  5. Added worker_rlimit_nofile 30000; to /etc/nginx/nginx.conf.
  6. * While the directions suggested that nginx -s reload was enough to get nginx to recognize the new settings, not all of nginx's processes received the new setting. Upon closer inspection of /proc/pid/limits (see step 1 above), the first worker process still had the original S1024/H4096 limit on file handles. Even nginx -s quit didn't shut nginx down; the solution was to kill nginx with kill pid. After restarting nginx, all of the nginx-user-owned processes had the new file limit of S10000/H30000 handles.
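For reference, here is a rough command-level sketch of the steps above, using the same example values (70000 / 10000 / 30000) and standard Amazon Linux paths; adjust the numbers to your own workload.

    # 1. Locate nginx's processes and inspect their current file handle limits
    ps aux | grep '[n]ginx'
    for pid in $(pgrep nginx); do
        echo "== PID $pid =="
        sudo grep 'open files' /proc/$pid/limits
    done

    # 2. Raise the system-wide limit on open file handles
    echo 'fs.file-max = 70000' | sudo tee -a /etc/sysctl.conf

    # 3. Raise the per-user nofile limits for the nginx account
    printf 'nginx soft nofile 10000\nnginx hard nofile 30000\n' | sudo tee -a /etc/security/limits.conf

    # 4. Apply the sysctl change
    sudo sysctl -p

    # 5. In the main context of /etc/nginx/nginx.conf, add:
    #      worker_rlimit_nofile 30000;

    # 6. Reload nginx; if any worker keeps the old limit, stop nginx completely
    #    (kill the old PIDs if nginx -s quit hangs) and start it again, then
    #    re-run the check from step 1.
    sudo nginx -s reload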

mef commented Aug 6, 2016

Very handy, thanks

ivan24 commented Oct 14, 2016

+1

vpxavier commented Nov 8, 2016

Solved my issue, thanks!

@ranjeetranjan

Solved my issue, thanks!

@nickjwebb

Nice. One quick update, though: on my 2016.09 ALAMI (m3.medium), fs.file-max is set to 382547 out of the box, so I skipped step 2.
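You can check what your instance ships with before deciding whether step 2 is needed:

    sysctl fs.file-max    # current system-wide limit on open file handles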

@marcoceccarellispotsoftware

+1, thanks so much

jlapier commented Mar 30, 2017

After making these changes, I was getting "1024 worker_connections are not enough" errors, so I also increased worker_connections (which is limited by worker_rlimit_nofile). See: http://nginx.org/en/docs/ngx_core_module.html#worker_connections
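For anyone else hitting this, a minimal sketch of where worker_connections sits (the values here are just the examples used in this thread; tune them to your traffic):

    # /etc/nginx/nginx.conf (sketch; example values)
    #   worker_rlimit_nofile 30000;
    #   events {
    #       worker_connections 10000;   # keep at or below worker_rlimit_nofile
    #   }
    sudo nginx -t && sudo nginx -s reload   # validate the config, then apply it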

@christrotter

Also saved my bottom. Much thanks.

jsmrcaga commented Nov 1, 2017

Awesome! Thanks 💥

@srimaln91

Thanks a lot.

maderluc commented Jan 9, 2018

thanks man

@ramezanpour

Thank you. My problem seems to be solved.

@omarelsayed1992

Great! Thanks 👍

nixorn commented Jun 26, 2018

Thanks!

ray-moncada commented Jul 6, 2018

Thank you, I will implement this. This adds load to the server, though; has anyone experienced the CPU or memory working harder?

@ray-moncada

I followed the instructions, but I am getting 5 nginx processes now; prior to the changes I only had 4. Why is there a 5th process labeled "master", which was not there before?

The other 4 processes are worker processes, and they have the right soft/hard limits configured, but the master does not.

atmosx commented Jul 10, 2018

I followed the instructions but I am getting 5 nginx processes now; prior to the changes I only had 4. Why is there a 5th process labeled "master", which was not there before?

I'm guessing you have 4 CPUs and worker_processes is set to auto, which means one worker per CPU. NGINX uses a master/worker model to achieve high throughput (an async model). What you're seeing is perfectly normal.

From their website:

NGINX uses a predictable process model that is tuned to the available hardware resources:

  • The master process performs the privileged operations such as reading configuration and binding to ports, and then creates a small number of child processes (the next three types).
  • The cache loader process runs at startup to load the disk‑based cache into memory, and then exits. It is scheduled conservatively, so its resource demands are low.
  • The cache manager process runs periodically and prunes entries from the disk caches to keep them within the configured sizes.
  • The worker processes do all of the work! They handle network connections, read and write content to disk, and communicate with upstream servers.

read more at https://www.nginx.com/blog/inside-nginx-how-we-designed-for-performance-scale/
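If you want to see this on your own box, something like the following (standard Linux tools; just a sketch) lists the master and its workers:

    # The master is typically owned by root and the workers by the nginx user;
    # each worker's PPID points back at the master's PID.
    ps -eo pid,ppid,user,cmd | grep '[n]ginx'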

eojones commented Jul 14, 2018

Worked for me, thank you so very much!!!!! You're awesome <3

@ryancheung

tks!

mskian commented Dec 19, 2018

Thanks a lot, it works 💯
Awesome...!

@berenddeboer

If you use systemd, the limits.conf entries from step 3 won't apply to the nginx service; instead, create an override and put LimitNOFILE=10000 in the [Service] section.
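A minimal sketch of that override, assuming the service unit is named nginx (adjust the value to your needs):

    sudo systemctl edit nginx        # opens an override file for the nginx unit
    # add these lines in the editor:
    #   [Service]
    #   LimitNOFILE=10000
    sudo systemctl daemon-reload
    sudo systemctl restart nginx
    systemctl show nginx | grep LimitNOFILE   # verify the limit the unit now carries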

ldrrp commented Jul 23, 2019

This solved my issue. In the process I noticed it only hit this limit because of a default 300s timeout, which was backing up connections. I lowered the timeout to 5s and also fixed an issue on the upstream server it was sending to that was causing the overall backup.

jmg commented Sep 9, 2020

Thanks for making this guide! Just one thing:

In point 5:
Added worker_rlimit_nofile 30000; to /etc/nginx/nginx.conf.

Please note that the worker_rlimit_nofile 30000 directive goes in the main context of the config file, outside server {...} and events {...}. Reference: http://nginx.org/en/docs/ngx_core_module.html#worker_rlimit_nofile

I was struggling with this because I tried to add it in the server {...} context; it didn't fail, but it wasn't increasing the max open files.
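A quick way to double-check the placement (a sketch; nginx -T dumps the configuration nginx actually loads):

    # worker_rlimit_nofile must sit at the top level of /etc/nginx/nginx.conf,
    # alongside user/worker_processes, not inside events {} or server {}.
    sudo nginx -t                                  # syntax check
    sudo nginx -T | grep -n worker_rlimit_nofile   # confirm where the directive is picked up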

@jianhaiqing

awesome

@claytonrothschild

Worked for us. Thanks a lot. Also, @jmg has an important insight.

@marensas

service nginx restart succeeded; no need to manually kill processes by PID.
