sleevi/ocsp-stapling.md

## ocsp-stapling.md

      
On Twitter the other day,
I was lamenting the state of OCSP stapling support on Linux servers, and got
asked by several people to write up what I think the requirements are for OCSP
stapling support.


Support for keeping a long-lived (disk) cache of OCSP responses.
This should be fairly simple. Any restarting of the service shouldn't
blow away previous responses that were obtained. This doesn't need to be
disk, just stable - and disk is an easy stable storage for most server
operators.
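
As a minimal sketch of what "stable" can mean here, assuming the response is treated as opaque DER bytes: write to a temporary file and rename it into place, so a crash mid-write never leaves a truncated response behind. The file layout and the names `store_response` and `load_response` are hypothetical, not taken from any particular server.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def store_response(cache_dir: Path, cert_id: str, der: bytes) -> Path:
    """Atomically write one OCSP response (opaque DER bytes) to disk."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    final_path = cache_dir / f"{cert_id}.ocsp"
    # The temporary file lives in the same directory, so the rename below
    # stays on one filesystem and replaces the old file in a single step.
    fd, tmp_name = tempfile.mkstemp(dir=cache_dir, prefix=f".{cert_id}.")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(der)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, final_path)
    except BaseException:
        # Do not leave partial temporary files behind on failure.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final_path


def load_response(cache_dir: Path, cert_id: str) -> Optional[bytes]:
    """Return the response cached before a restart, or None if there is none."""
    try:
        return (cache_dir / f"{cert_id}.ocsp").read_bytes()
    except FileNotFoundError:
        return None
```

A response loaded this way after a restart still needs the same validity checks as a freshly fetched one, since it may have expired while the service was down.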


Validate the server responses to make sure they are something the client
will accept.
There are a number of ways to botch this on the server, and sadly, a number
of ways in which CAs can botch their response generators. The most immediate
and obvious issues are situations where you have a 'revoked' response, or when
you receive an OCSP 'tryLater' or 'internalError' response. However, there are
also more subtle issues, like making sure the OCSP response is actually
well-formed (sometimes uploads to CDNs are botched), is time valid for the
current time (sometimes the CDNs serve stale files), is for the certificate
requested (yes, sadly, really), and is free of any sort of PKI-related errors
(for example, the delegated OCSP signer's certificate being expired).
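
The checks above can be sketched roughly as follows. This assumes some real OCSP parser has already decoded the response (which covers the well-formedness check) and verified its signature; `ParsedResponse` is a hypothetical stand-in for that parser's output, and its field names are my own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class ParsedResponse:
    """Hypothetical stand-in for the output of a real OCSP parser."""
    response_status: str                  # "successful", "tryLater", "internalError", ...
    cert_status: Optional[str]            # "good", "revoked", "unknown"; None on error
    serial_number: Optional[int]          # serial of the certificate the response covers
    this_update: Optional[datetime]
    next_update: Optional[datetime]
    signer_not_after: Optional[datetime]  # expiry of the (delegated) signer certificate


def staple_problems(resp: ParsedResponse, expected_serial: int,
                    now: datetime) -> List[str]:
    """Return the reasons not to staple this response; empty means it looks OK."""
    if resp.response_status != "successful":
        # tryLater, internalError and friends carry no usable status at all.
        return [f"responder returned {resp.response_status}"]
    problems = []
    if resp.cert_status != "good":
        problems.append(f"certificate status is {resp.cert_status}")
    if resp.serial_number != expected_serial:
        problems.append("response is for a different certificate")
    if resp.this_update is None or resp.next_update is None:
        problems.append("response has no usable validity window")
    elif not (resp.this_update <= now < resp.next_update):
        problems.append("response is not valid at the current time")
    if resp.signer_not_after is not None and resp.signer_not_after <= now:
        problems.append("signer certificate has expired")
    return problems
```

Returning a list of reasons rather than a bare boolean is a design choice: it gives the operator something concrete to alert on when the CA or a CDN serves junk.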


Refresh the response, in the background, with sufficient time before
expiration.
A rule of thumb would be to fetch at notBefore + (notAfter - notBefore) / 2,
which is saying "start fetching halfway through the validity period" (for an
OCSP response, that period runs from thisUpdate to nextUpdate). You want
to be able to handle situations like the OCSP responder giving you junk, but
also have sufficient time to raise an alert if something has gone really wrong.
What you do NOT want to do is start OCSP fetching the first time you need
it, or wait until the response is fully expired - that creates really
terrible experiences all around, and makes your CA an even bigger point of
failure.
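
The halfway rule of thumb is a one-liner; here is a small sketch, where `valid_from` and `valid_until` are my own names for the start and end of the response's validity period.

```python
from datetime import datetime, timedelta, timezone


def refresh_at(valid_from: datetime, valid_until: datetime) -> datetime:
    """Return the midpoint of the validity period: the time to start refetching."""
    if valid_until <= valid_from:
        raise ValueError("validity period is empty or inverted")
    return valid_from + (valid_until - valid_from) / 2
```

For a response valid for 7 days starting 2024-01-01 00:00 UTC, this starts fetching at 2024-01-04 12:00 UTC, leaving three and a half days to retry and to raise an alert.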


That said, even with background refreshing, such a system should observe
the Lightweight OCSP Profile of RFC 5019.
This more or less boils down to "Use GET requests whenever possible, and
observe HTTP cache semantics." Given how complicated the cache semantics can
be to get right in a client, this can be surprisingly hard to implement
correctly.
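
For the "use GET" half, the URL is the responder URL, a slash, and the URL-encoded base64 of the DER-encoded request. A sketch, assuming the request bytes come from elsewhere; the function name is hypothetical, and the 255-byte limit is my reading of RFC 5019 as applying to the whole encoded URL.

```python
import base64
from typing import Optional
from urllib.parse import quote

MAX_GET_URL_BYTES = 255  # beyond this, fall back to POST


def ocsp_get_url(responder_url: str, der_request: bytes) -> Optional[str]:
    """Build the GET URL for an OCSP request, or None if POST should be used."""
    b64 = base64.b64encode(der_request).decode("ascii")
    # safe="" makes quote() percent-encode "/" and "+" (and "=") from the
    # base64 text, so the encoded request forms a single path segment.
    encoded = quote(b64, safe="")
    # Assumption: avoid a doubled slash if the configured URL already ends in one.
    separator = "" if responder_url.endswith("/") else "/"
    url = responder_url + separator + encoded
    if len(url.encode("ascii")) > MAX_GET_URL_BYTES:
        return None
    return url
```

The "observe HTTP cache semantics" half (honouring headers such as Cache-Control and Expires on the response) is the harder part and is not covered by this sketch.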


As with any system doing background requests on a remote server, don't be
a jerk and hammer the server when things are bad.
The Internet is a strange and wonderful place, and sometimes servers and
networks have issues. When a server supporting OCSP stapling has trouble
getting a response, hopefully it does something smarter than just retry in a
busy loop, hammering the OCSP server into further oblivion. This may seem
implied by the previous two remarks, but it's worth spelling out.
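
One common way to avoid the busy loop is capped exponential backoff with jitter. A sketch follows; the base delay, the cap and the function name are illustrative assumptions, not recommendations from any standard.

```python
import random
from typing import Callable


def retry_delay(failures: int, base: float = 60.0, cap: float = 3600.0,
                rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before the next attempt, after `failures` failed attempts.

    The ceiling doubles with each failure up to `cap`; the delay is drawn from
    the upper half of that range so that many servers do not retry in lockstep.
    """
    exponent = min(max(failures, 0), 32)   # clamp to keep the arithmetic bounded
    ceiling = min(cap, base * (2 ** exponent))
    return ceiling / 2 + rng() * (ceiling / 2)
```

The `rng` parameter exists only so the jitter can be made deterministic in tests.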


Distributed or proxiable fetching
From talking with server operators, a variety of situations are brought up
as challenges for OCSP stapling. One common bucket is the problem of front-end
and back-end splits - there may be thousands of FE servers, all with the same
certificate, all needing to staple an OCSP response. You don't want to have
all of them hammering the OCSP server - ideally, you'd have one request, made
in the backend, whose response is pushed out to all of them.
A variation of this problem is FEs that aren't actually allowed to
initiate outbound connections. Sometimes it's required that the FE talk to a
proxy server, sometimes it's just outright blocked - so a system should be
robust in handling that distribution.
This may not be a problem for the OCSP daemon to solve - it could be that
the matter is just treated as a general configuration management/distribution
problem - but at least it should be clear to those deploying the config what
the tradeoffs are. For example, is it possible for the config distribution
system to mangle responses? Should FEs still check the validity of incoming
responses?


The ability to serve old responses while fetching new responses.
That is, it shouldn't be mutually exclusive - it's not that there is the
'ONE TRUE RESPONSE' - some flexibility for multiple responses is needed.
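
A sketch of that idea: keep serving the current response, and only swap in a new one once it has passed validation. The class and method names are hypothetical, and `is_acceptable` stands in for whatever validation the server actually performs.

```python
import threading
from typing import Callable, Optional


class StapleSlot:
    """Holds the response currently being stapled for one certificate."""

    def __init__(self, initial: Optional[bytes] = None) -> None:
        self._lock = threading.Lock()
        self._current = initial

    def current(self) -> Optional[bytes]:
        """Called on the handshake path; never blocks on a fetch."""
        with self._lock:
            return self._current

    def offer(self, candidate: bytes,
              is_acceptable: Callable[[bytes], bool]) -> bool:
        """Install `candidate` if it validates; otherwise keep the old response."""
        if not is_acceptable(candidate):
            return False
        with self._lock:
            self._current = candidate
        return True
```

Validation happens outside the lock, so a slow check on a new response never delays handshakes that are still using the old one.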


Some idea of what to do when "things go bad".
What happens when it's been 7 days, no new OCSP response can be obtained,
and the current response is about to expire? Do you:

Stop the (web/email/ftp/xmpp) service?
Stop serving stapled OCSP responses?

Especially in a world where Must-Staple becomes more prevalent, what action
should be taken when things go badly wrong? If it's a Must-Staple cert,
it might be more beneficial to fully stop the service (thus causing monitoring
to really flip out) rather than serve bad responses or no response, both of
which may result in even worse user experiences.


Configurable OCSP responder per-certificate-being-checked.
The CA/Browser Forum's Baseline Requirements allow CAs to omit the
authorityInfoAccess extension for situations where the subscriber has agreed
to staple. This agreement can be done via contractual means or technical means,
which is to say that it's not predicated on the Must-Staple extension in the
certificate. The reason for this omission is to allow for smaller certificates,
which offsets (by a very small amount) the size increase of the OCSP response.
For these certificates, the server operator will need to configure what
the OCSP responder URL is for that certificate.
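
In code, this is little more than a lookup with a fallback. A sketch, assuming certificates are keyed by some fingerprint string and that the operator's configuration takes precedence over the URL in the certificate; both choices, and the names used, are hypothetical.

```python
from typing import Mapping, Optional


def responder_url_for(cert_fingerprint: str,
                      aia_ocsp_url: Optional[str],
                      configured: Mapping[str, str]) -> str:
    """Pick the OCSP responder URL to use for one certificate.

    `aia_ocsp_url` is whatever was found in the certificate's
    authorityInfoAccess extension, or None if the CA omitted it.
    """
    url = configured.get(cert_fingerprint) or aia_ocsp_url
    if not url:
        # Neither the operator nor the certificate names a responder.
        raise LookupError(f"no OCSP responder known for {cert_fingerprint}")
    return url
```

Failing loudly when no URL is known seems preferable to silently serving without a staple, since the operator has presumably agreed to staple for such certificates.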


Staple by default.
If you can get all the above worked out, with sane behaviours, there is
very little reason that OCSP stapling shouldn't be on by default. Make it
happen!


If this seems like an unfairly long list, the reality is that virtually all
of this is supported by Microsoft IIS services today. The Microsoft
documentation is a bit spread out, but this
is good for starters, and this is good
for further reading.
Given this long list of things, which do seem somewhat 'basic', it seems a shame
to require every TLS server to reimplement this. It seems ideal to have this as
a common, stand-alone daemon/service, which can then interface with a variety
of TLS servers (IMAP, SMTP, HTTP, FTP, XMPP, etc).
Perhaps the most basic interface for this is simply dropping the OCSP response
to a well-known path pre-agreed with the server. The server can monitor for
changes to this file. When changes are noticed, it can start serving the
new response. While some logic (such as shutting down the service) may be more
complicated, this at least provides a starting point with some basic functionality.
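
On the server side, the file-based interface could be as simple as polling the agreed path. A sketch, with a hypothetical class name; it uses plain `stat()` polling rather than any platform-specific notification API, and assumes the daemon replaces the file atomically so a partially written response is never read.

```python
from pathlib import Path
from typing import Optional, Tuple


class StapleFileWatcher:
    """Detects changes to the OCSP response file at a pre-agreed path."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._last: Optional[Tuple[int, int, int]] = None  # signature of the last load

    def poll(self) -> Optional[bytes]:
        """Return the file's contents if it changed since the last poll, else None."""
        try:
            st = self.path.stat()
        except FileNotFoundError:
            return None
        # Modification time, size and inode together catch both in-place
        # rewrites and replacement of the file by rename.
        signature = (st.st_mtime_ns, st.st_size, st.st_ino)
        if signature == self._last:
            return None
        data = self.path.read_bytes()
        self._last = signature
        return data
```

The server would call `poll()` on a timer and, when it returns bytes, validate them before starting to staple the new response.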