

@AGWA
Last active October 31, 2016 20:33
OCSP Stapling Robustness in Apache and nginx

Date: Mon, 5 Oct 2015 16:34:03 -0700

Apache caches an OCSP response for one hour by default. Unfortunately, once the hour is up, the response is purged from the cache, and Apache doesn't attempt to retrieve a new one until the next TLS handshake takes place. That means that if there's a problem contacting the OCSP responder at that moment, Apache is left without an OCSP response to staple. Worse, it caches this failed lookup for 10 minutes (by default), so for the next 10 minutes no OCSP response will be stapled to any handshake. This isn't a theoretical concern; as Ivan Ristic says in Bulletproof SSL and TLS, "there is a lot of anecdotal evidence that OCSP responders can be flaky."
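To make the failure mode concrete, here is a minimal Python model of this purge-then-refetch policy. It assumes the default timeouts quoted above: one hour for good responses and 10 minutes for failures. In mod_ssl these correspond to `SSLStaplingStandardCacheTimeout` and `SSLStaplingErrorCacheTimeout`. The class, the `fetch` callable, and the integer timestamps are hypothetical stand-ins. This sketches the behavior and is not Apache's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Defaults from the text: good responses are kept for an hour,
# failed lookups are remembered for 10 minutes.
STANDARD_TTL = 3600
ERROR_TTL = 600


@dataclass
class CacheEntry:
    response: Optional[bytes]  # None records a failed lookup
    stored_at: float
    ttl: float


class ApacheStyleStapleCache:
    """Hypothetical model of Apache's purge-then-refetch policy."""

    def __init__(self, fetch: Callable[[], Optional[bytes]]):
        self.fetch = fetch  # returns DER bytes, or None if the responder fails
        self.entry: Optional[CacheEntry] = None

    def staple_for_handshake(self, now: float) -> Optional[bytes]:
        # An expired entry is simply purged; nothing replaces it ahead of time.
        if self.entry is not None and now - self.entry.stored_at >= self.entry.ttl:
            self.entry = None
        if self.entry is None:
            # The lookup happens here, during the handshake itself.
            response = self.fetch()
            ttl = STANDARD_TTL if response is not None else ERROR_TTL
            self.entry = CacheEntry(response, now, ttl)
        return self.entry.response


class FakeResponder:
    """Stand-in OCSP responder that can be switched off."""

    def __init__(self) -> None:
        self.up = True

    def __call__(self) -> Optional[bytes]:
        return b"good-response" if self.up else None


responder = FakeResponder()
cache = ApacheStyleStapleCache(responder)
print(cache.staple_for_handshake(0))     # b'good-response'
responder.up = False
print(cache.staple_for_handshake(3600))  # None: purged, and the refetch failed
responder.up = True
print(cache.staple_for_handshake(3900))  # None: the failure is still cached
print(cache.staple_for_handshake(4200))  # b'good-response'
```

A single responder hiccup at the moment the hour expires costs a full 10 minutes without stapling, even though the responder recovered 5 minutes later.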

nginx's logic is a lot more robust than Apache's in this regard. Good OCSP responses are cached for an hour, but are not replaced until a successful new response has been received, meaning nginx can weather temporary OCSP responder outages. Unfortunately, nginx's logic is drastically worse in a different way: nginx kicks off OCSP queries on-demand, during the TLS handshake, but continues the handshake without waiting for the OCSP response to return. And since the OCSP response caches are unique per worker process, the first TLS connection handled by any given worker process never has a response stapled! (By the way, this makes testing whether you've properly enabled OCSP stapling rather annoying and confusing if you don't know about this.) This behavior also means that if a worker process sits idle for a long time, it doesn't refresh its OCSP responses and could staple an expired OCSP response on the next handshake it handles. [Update: the expired response issue is fixed in nginx 1.9.2. Now, if the cached OCSP response is expired, no response at all is stapled. A query to the OCSP responder is still initiated in the background, so subsequent handshakes should have a fresh stapled response.]
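Here is a similar sketch of nginx's per-worker behavior. Again, it is a hypothetical model and not nginx's code. Each worker owns its own cache object, and a handshake only *starts* a query. The query's result arrives later, modeled here by an explicit `complete_pending_query` call. The `drop_expired` flag toggles between two behaviors:

* pre-1.9.2: staple whatever is cached, even if it has expired;
* 1.9.2 and later: staple nothing once the cached response has expired.

The one-hour refresh interval and the `(body, expiry)` tuple shape are assumptions for illustration.

```python
from typing import Callable, List, Optional, Tuple

# A parsed response: (DER bytes, expiry timestamp). Hypothetical shape.
Response = Tuple[bytes, float]


class NginxStyleWorkerCache:
    """Hypothetical model of one worker process's stapling state."""

    REFRESH_AFTER = 3600  # assumed: refresh a good response after an hour

    def __init__(self, fetch: Callable[[], Optional[Response]], drop_expired: bool = True):
        self.fetch = fetch                # returns a Response, or None on failure
        self.drop_expired = drop_expired  # True models nginx >= 1.9.2
        self.current: Optional[Response] = None
        self.fetched_at = 0.0
        self.pending = False

    def staple_for_handshake(self, now: float) -> Optional[bytes]:
        # Kick off a background query if needed, but never wait for it.
        if self.current is None or now - self.fetched_at >= self.REFRESH_AFTER:
            self.pending = True
        if self.current is None:
            return None  # e.g. the first handshake this worker ever handles
        body, expires = self.current
        if self.drop_expired and now >= expires:
            return None
        return body

    def complete_pending_query(self, now: float) -> None:
        # Models the asynchronous OCSP query finishing some time later.
        if not self.pending:
            return
        self.pending = False
        result = self.fetch()
        if result is not None:  # a failure leaves the last good response in place
            self.current = result
            self.fetched_at = now


replies: List[Optional[Response]] = [(b"resp-1", 7200.0), None]
worker = NginxStyleWorkerCache(lambda: replies.pop(0))
print(worker.staple_for_handshake(0))     # None: nothing cached yet
worker.complete_pending_query(1)
print(worker.staple_for_handshake(3700))  # b'resp-1' (a refresh is started)
worker.complete_pending_query(3701)       # responder is down; resp-1 is kept
print(worker.staple_for_handshake(8000))  # None: resp-1 expired at 7200
```

Running one `NginxStyleWorkerCache` per worker reproduces the "first connection per worker gets no staple" effect: every new instance returns `None` on its first handshake.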

Also, this is a minor point, but neither server persists OCSP responses to disk between restarts, which seems to me like a sensible way to improve reliability.
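Persisting responses needn't be complicated. Here is a minimal sketch, assuming the server keeps the raw DER-encoded response as bytes. It writes the response atomically after every successful fetch and reads it back at startup. The helper names and file layout are hypothetical. A real server would still have to check the loaded response's validity period (and signature) before stapling it.

```python
import os
import tempfile
from typing import Optional


def save_ocsp_response(path: str, der: bytes) -> None:
    """Atomically persist a DER-encoded OCSP response (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ocsp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(der)
            f.flush()
            os.fsync(f.fileno())
        # Same-directory rename (atomic on POSIX): readers see the old
        # file or the new one, never a partially written file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise


def load_ocsp_response(path: str) -> Optional[bytes]:
    """Read a previously saved response at startup, or None if there is none."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None
```

With something like this in place, a restart during a responder outage would not leave the server with nothing to staple, as long as the saved response is still within its validity period.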

One design choice that strikes me as sub-optimal is that both servers only initiate an OCSP query during the TLS handshake, from an OpenSSL callback function. As discussed above, nginx doesn't wait for the response to complete before continuing the handshake, so the first handshake has no OCSP stapling. Apache does wait, and [holds a rather coarse lock when doing so](https://bz.apache.org/bugzilla/show_bug.cgi?id=57131), impacting performance. This is particularly bad if the OCSP responder is not responding: Apache will block the TLS handshake for 10 seconds (by default) before timing out. Instead of initiating OCSP queries during the TLS handshake, I think both servers would be better served by periodically making OCSP queries on a consistent schedule from either a dedicated thread or from a timeout in the main event loop.
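Here is a sketch of that alternative design using a dedicated thread. It assumes a `fetch` callable that performs the OCSP query and returns the DER bytes, or `None` on failure (a hypothetical interface). Handshakes only read the cached bytes under a short-lived lock. The slow network I/O happens off the handshake path with no lock held. A failed refresh leaves the last good response in place, which keeps nginx's keep-until-replaced robustness while making refreshes independent of traffic.

```python
import threading
from typing import Callable, Optional


class PeriodicStapleRefresher:
    """Hypothetical sketch: refresh on a fixed schedule, off the handshake path."""

    def __init__(self, fetch: Callable[[], Optional[bytes]], interval: float):
        self.fetch = fetch          # returns DER bytes, or None on failure
        self.interval = interval    # seconds between refresh attempts
        self._lock = threading.Lock()
        self._current: Optional[bytes] = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh_once(self) -> None:
        # Network I/O happens with no lock held; errors count as failures.
        try:
            response = self.fetch()
        except Exception:
            response = None
        if response is not None:    # on failure, keep serving the last good response
            with self._lock:
                self._current = response

    def current(self) -> Optional[bytes]:
        # Called from the handshake: a cheap read, never a network round trip.
        with self._lock:
            return self._current

    def _run(self) -> None:
        # Refresh immediately at startup, then once per interval until stopped.
        while True:
            self.refresh_once()
            if self._stop.wait(self.interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Because the first refresh runs as soon as the thread starts, even early handshakes can have a staple, provided that initial query finishes before they arrive. An event-loop timer would follow the same shape, with `refresh_once` scheduled as a non-blocking callback instead of run on a thread.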


AGWA commented Aug 7, 2016

If you liked reading about how Apache and nginx are broken, you'll love Ryan Sleevi's explanation of how to fix everything: https://gist.github.com/sleevi/5efe9ef98961ecfb4da8
