Let's take a look at the vulnerable code:

/* excerpt from ssl_parse_clienthello_tlsext(); other extension cases elided */
if (s->servername_done == 0) {
    switch (servname_type) {
        case TLSEXT_NAMETYPE_host_name:
            if (s->session->tlsext_hostname == NULL) {
                if (len > TLSEXT_MAXLEN_host_name ||
                    ((s->session->tlsext_hostname = OPENSSL_malloc(len + 1)) == NULL)) {
                    *al = TLS1_AD_UNRECOGNIZED_NAME;
                    return 0;
                }
                memcpy(s->session->tlsext_hostname, sdata, len);
                s->session->tlsext_hostname[len] = '\0';
            }
            break;
    }
}

Here, if tlsext_hostname is NULL, it will allocate len + 1 bytes, then memcpy into that buffer with a size of len. Nothing out of the ordinary, but there were some oversights here that take a little digging to see.

OpenSSL is not thread-safe by default; it wasn't designed that way. You can force OpenSSL to be thread-safe, but it's obvious that the code to do this was wedged in at the last minute.

Here is how we initialize OpenSSL to be thread-safe:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <openssl/crypto.h>

pthread_mutex_t *thread_locks;

unsigned long
openssl_thread_id(void) {
    return (unsigned long) pthread_self();
}

void
openssl_thread_lock(int mode, int lock_id, const char *file, int line) {
    if (mode & CRYPTO_LOCK)
        pthread_mutex_lock(&thread_locks[lock_id]);
    else
        pthread_mutex_unlock(&thread_locks[lock_id]);
}

void
init_ssl_lock(void) {
    int num_thread_locks = CRYPTO_num_locks();
    int i;

    thread_locks = calloc(num_thread_locks, sizeof(pthread_mutex_t));
    if (thread_locks == NULL) {
        fprintf(stderr, "Unable to allocate mutexes\n");
        exit(80);
    }

    for (i = 0; i < num_thread_locks; i++) {
        if (pthread_mutex_init(&thread_locks[i], NULL)) {
            fprintf(stderr, "Unable to create mutex\n");
            exit(80);
        }
    }

    CRYPTO_set_locking_callback(&openssl_thread_lock);
    CRYPTO_set_id_callback(&openssl_thread_id);
}

We could usually assume this covers any potential race condition, but in this case, one was overlooked.

There are several functions that add, fetch, and remove entries from the session cache: ssl_get_new_session(), ssl_get_prev_session(), SSL_CTX_add_session(), SSL_CTX_remove_session(). These all call CRYPTO_w_lock()/CRYPTO_w_unlock() for any type of data modifications. Yay!
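
For illustration, the write-side locking pattern those functions use internally looks roughly like this (a sketch, not the verbatim OpenSSL source):

CRYPTO_w_lock(CRYPTO_LOCK_SSL_CTX);   /* ends up in openssl_thread_lock() above */
/* ... add or remove the SSL_SESSION in the shared cache ... */
CRYPTO_w_unlock(CRYPTO_LOCK_SSL_CTX);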

Here is the catch: s->session is shared across multiple threads and modified by the functions above. But in ssl_parse_clienthello_tlsext(), the value of s->session->tlsext_hostname is checked for NULL without any locking.

There is a possibility (if the value of tlsext_hostname is NULL) that two threads enter the same block of code at the same time, one overwriting the other's pointer to tlsext_hostname. The flow would look something like this:


[ Thread-A ]                                  [ Thread-B ]
  if (tlsext_hostname == NULL)         |           |
     |                                 |        if (tlsext_hostname == NULL)
    tlsext_hostname = malloc(256);     |           |
     |                                 |         tlsext_hostname = malloc(1);        
    memcpy(tlsext_hostname, buf, 255); |           |
                                       |         memcpy(tlsext_hostname, buf, 1); 

What happens here is Thread-A sees that tlsext_hostname is NULL. Thread-A then allocates a buffer of user-controlled length (note: max of 256), and finally does a memcpy of the user-controlled data with a size of 255 (leaving room for the trailing '\0'). But in this case, between the time Thread-A allocated the buffer and the memcpy, Thread-B allocated a much smaller buffer into the tlsext_hostname variable.

If executed properly, Thread-A would overflow that 1-byte heap buffer by 254 bytes (255 - 1).

So the fix was basically to check the value of s->hit before doing anything stupid. Why?

int ssl3_get_client_hello(SSL *s) {
    /* bunch of crap here */

    /* remember, ssl_get_prev_session properly locks s->session */
    i = ssl_get_prev_session(s, p, j, d + n);
    if (i == 1) { /* previous session */
        s->hit = 1;
    }
}

Each SSL *s belongs to a single connection, so unlike s->session it is never shared between threads. If ssl_get_prev_session() (which does proper locking) returns 1, s->hit is set, and s->hit is then used later inside ssl_parse_clienthello_tlsext() to determine whether tlsext_hostname is already allocated.
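
A sketch of what the fixed check looks like (paraphrased from the description above, not the verbatim OpenSSL patch):

/* Paraphrased sketch of the fix, not the verbatim patch: on an initial
 * handshake the session is brand new and still private to this thread,
 * so it is safe to fill in. On a resumed session (s->hit != 0) the
 * object came from the shared cache and must not be written to here. */
if (s->servername_done == 0 && !s->hit) {
    /* ... allocate and copy s->session->tlsext_hostname as before ... */
}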

During one of our load tests we found that in some cases, randomly but under high load, SSL sockets would just stop receiving data which would result in hanging connections. This condition was basically impossible to reproduce in a consistent manner as we had no idea how such a thing could be triggered.

It HAD to be OpenSSL, because, well, OpenSSL is to blame for all of the problems anyone has ever had. Take the TV show "The Jersey Shore" as an example; while you could argue that the success of the show was the result of the diminishing intelligence of American viewers, I place the full blame on OpenSSL. This I am sure of.

Because of this issue, I now know OpenSSL far too well; that is, as much as is humanly possible without wanting to jam nails in my eyes. Let me sum up OpenSSL in two words: callback hell. Oh, and also: a developer's nightmare. It is impossible to maneuver using only static analysis. All debugging had to be done at run-time. I remember hours of 'step, step, step, step, "ah man, I'm in memcpy", finish, step, step, step'.

I was at my wits' end. I felt broken, and perhaps I was. It seemed as if all of my work was for naught, as if this one single issue could destroy any notion of my self-worth. I was seriously considering quitting my job to work for 711, where my skills might be better applied.

It was at this point that I decided I must explore the higher-level APIs for any issues that might be the cause. So I started with the abstraction around the SSL IO: Libevent. I had already been an active contributor to the project and knew the internals fairly well. The OpenSSL IO was wrapped in libevent's bufferevent API, which is itself an abstraction around an event API.

A bufferevent is a simple concept: all IO is abstracted into one structure where you deal only with input and output buffers.
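
A minimal sketch of the model, assuming libevent 2.x (echo_readcb and setup_echo are hypothetical names):

#include <event2/event.h>
#include <event2/buffer.h>
#include <event2/bufferevent.h>

/* hypothetical echo callback: by the time this runs, libevent has
 * already read the bytes off the socket into the input buffer */
static void
echo_readcb(struct bufferevent *bev, void *ctx) {
    struct evbuffer *in  = bufferevent_get_input(bev);
    struct evbuffer *out = bufferevent_get_output(bev);

    /* move everything from input to output; libevent flushes it */
    evbuffer_add_buffer(out, in);
}

static void
setup_echo(struct event_base *base, evutil_socket_t fd) {
    struct bufferevent *bev;

    bev = bufferevent_socket_new(base, fd, BEV_OPT_CLOSE_ON_FREE);
    bufferevent_setcb(bev, echo_readcb, NULL, NULL, NULL);
    bufferevent_enable(bev, EV_READ | EV_WRITE);
}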

On the other hand, OpenSSL is incredibly complex, especially when it comes to non-blocking IO. A read or a write may return an error condition that refers to the opposite of the function you called. For example, SSL_read() may return a status saying that the socket must become writable before the read can continue. It's this type of logic that makes life hard.
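
A sketch of the resulting dance (ssl and buf are assumed to exist; a real event loop also needs the bookkeeping described in the comments):

int n = SSL_read(ssl, buf, sizeof(buf));

if (n <= 0) {
    switch (SSL_get_error(ssl, n)) {
    case SSL_ERROR_WANT_READ:
        /* retry SSL_read() once the fd becomes readable */
        break;
    case SSL_ERROR_WANT_WRITE:
        /* e.g. a renegotiation is in flight: SSL_read() can only
         * make progress once the fd becomes *writable* */
        break;
    default:
        /* clean shutdown or a real error */
        break;
    }
}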

One thing I remembered, which I wish I had remembered a week beforehand, was the fact that OpenSSL bufferevents would always segfault unless they were put in a "deferred" state. With a deferred bufferevent, IO is not executed immediately. Instead, the IO is queued and run in the next iteration of the event loop.
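
In libevent terms, that state is requested with a flag when the bufferevent is created:

/* BEV_OPT_DEFER_CALLBACKS queues callbacks to run on the next
 * iteration of the event loop instead of invoking them inline */
bev = bufferevent_openssl_socket_new(base, fd, ssl,
                                     BUFFEREVENT_SSL_ACCEPTING,
                                     BEV_OPT_CLOSE_ON_FREE |
                                     BEV_OPT_DEFER_CALLBACKS);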

Looking back, I think we all took this for granted, a fact of life: OpenSSL bufferevents did not work unless they were deferred, end of story. I don't think anyone ever dug in to figure out exactly why.

The only reason I started looking was because it was easier to debug events that were executed immediately. The path of execution was much better consumed by my brain in this state.

From here, things started to fall into place, and I was able to fully visualize the flow. I ran the application under high load and waited for a client to hang. A client hung. I attached my debugger and started stepping my way around the event loop. From what I could see, the OS never flagged the underlying socket as readable again. Knowing this, I was confident that all the data had been read from the socket. Yet the application thought otherwise.

So where did that data go? I came across an OpenSSL function I had never seen before: SSL_pending(). The documentation for this function is cryptic: a single-line description, "obtain number of readable bytes buffered in an SSL object".

So wait a minute: does this mean that in some cases, the data returned by SSL_read() was not everything? Was there a magical buffer where data hides until the next call to SSL_read()?

As it turns out, yes. There was a magical buffer containing data that SSL_read() did not immediately give to you. The idea was that the next SSL_read() would return both the data in this magical buffer and the data on the socket. But in a non-blocking, event-handling world, a socket is only read when the file descriptor has been flagged as readable. If by chance the last 10 bytes of a request were sitting in this magical buffer, the socket would never become readable again, because technically all the data had been read. It just so happened to be somewhere it wasn't expected to be.

The solution was simple: SSL_read() as normal, then immediately call SSL_pending(), and if there is pending data, call SSL_read() again.
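
In code, the loop looks something like this (handle_data() is a hypothetical consumer):

int n;
char buf[4096];

do {
    n = SSL_read(ssl, buf, sizeof(buf));
    if (n > 0)
        handle_data(buf, n);
    /* keep reading while OpenSSL still holds decrypted bytes in its
     * internal buffer; the socket will never poll readable for those */
} while (n > 0 && SSL_pending(ssl) > 0);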

Problem solved. I relaxed. And life was good.

A few days after the release with this patch (considered a critical bugfix), emails started to trickle in about TOR instabilities. For those who do not know, TOR's entire networking backend is Libevent. Users started seeing high-cpu utilization and eventual crashes of their TOR processes. The only change made in that time-frame was mine.

I broke TOR. Not many can say this, but I can. With two lines of code, I was able to do a lot of damage.

While I was not able to reproduce the problem, I quickly figured out what would cause it. If the network was under high load and SSL_pending() always returned a non-zero value, your program would be stuck in a constant state of SSL_read(): in other words, an infinite loop processing only one connection. So in came the bad bug, and the worse bug.

The bad bug: we could potentially cause resource unfairness by reading too much data from the underlying bufferevent; it can potentially cause read looping if the underlying bufferevent is not deferred. The worse bug: If we didn't do this, then we would potentially not read any more until more data arrived, which could lead to an infinite wait.
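
One way to split the difference is to bound how much gets drained per event-loop turn; a sketch only, not the actual libevent fix:

/* Sketch, not the actual libevent fix: cap the drain so one chatty
 * connection can't monopolize the loop. Anything left in SSL_pending()
 * is picked up on the next turn, which the caller must arrange
 * (e.g. by re-activating the read event). */
#define MAX_READS_PER_TURN 16

int i, n;
char buf[4096];

for (i = 0; i < MAX_READS_PER_TURN; i++) {
    n = SSL_read(ssl, buf, sizeof(buf));
    if (n <= 0)
        break;
    handle_data(buf, n);   /* hypothetical consumer */
    if (SSL_pending(ssl) == 0)
        break;
}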

Another release was made, and life was once again good.

In the end, it wasn't a direct problem with OpenSSL; instead, it was an issue of OpenSSL not being used correctly, which, in turn, was a direct result of OpenSSL not documenting what WAS correct.
