Finding 'Heartbleed'-class bugs with taint analysis
Background reading: https://heartbleed.com/
While Coverity is now able to detect this bug, we wanted to evaluate the state of open-source security tooling in 2024. Have we been able to reduce the cost of finding such bugs after all these years?
Can we find an execution path from the tainted data produced by n2s to sensitive functions such as memcpy? Since n2s typically operates on network-received bytes, it can serve as a taint source.
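To make the "taint source" idea concrete, here is a minimal C sketch. It assumes only that n2s decodes two bytes as a big-endian 16-bit value and advances the pointer past them, which is what the macro quoted later in this post does. The helper names n2s_read and parse_claimed_payload are hypothetical. A three-byte record {0x01, 0xff, 0xff} yields a claimed payload length of 65535, however little data actually follows it.

```c
/* Hypothetical function form of the n2s macro: decode a 16-bit big-endian
 * value at *c and advance *c past the two bytes consumed. */
static unsigned int n2s_read(const unsigned char **c)
{
    unsigned int v = ((unsigned int)(*c)[0] << 8) | (unsigned int)(*c)[1];
    *c += 2;
    return v;
}

/* Hypothetical parser for the first three bytes of a heartbeat record.
 * Nothing here looks at how many bytes were actually received: the
 * returned length is whatever the peer put on the wire. */
static unsigned int parse_claimed_payload(const unsigned char *rec,
                                          unsigned int *hbtype)
{
    const unsigned char *p = rec;

    *hbtype = *p++;        /* 1-byte heartbeat type */
    return n2s_read(&p);   /* 2-byte attacker-controlled length */
}
```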
int
tls1_process_heartbeat(SSL *s)
{
    unsigned char *p = &s->s3->rrec.data[0], *pl;
    unsigned short hbtype;
    unsigned int payload;
    /* ... */
    hbtype = *p++;
    n2s(p, payload);
    pl = p;
    /* ... */
    if (hbtype == TLS1_HB_REQUEST)
    {
        /* ... */
        memcpy(bp, pl, payload); // BAD: overflow here
        /* ... */
    }
    /* ... */
}
Source: https://codeql.github.com/codeql-query-help/cpp/cpp-openssl-heartbleed/
The payload variable holds the number of bytes that should be copied from the request back into the response, and the call to memcpy performs that copy. The problem is that payload is supplied as part of the remote request, and no code checks it against the length of the record that was actually received. If the caller supplies a very large value, the memcpy call copies memory that lies outside the request packet.
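The missing check can be sketched as follows. This is a simplified, hypothetical helper (checked_payload_len is not an OpenSSL function), modelled on the length checks that the OpenSSL 1.0.1g fix added to tls1_process_heartbeat: reject the record unless the 1-byte type, the 2-byte length, the claimed payload and the at least 16 bytes of padding required by RFC 6520 all fit inside what was actually received.

```c
#include <stddef.h>

#define HB_HEADER_LEN  3   /* 1-byte type + 2-byte payload length */
#define HB_MIN_PADDING 16  /* RFC 6520: padding must be at least 16 bytes */

/* Hypothetical helper: returns the payload length that is safe to echo,
 * or -1 if the record is too short or the claimed length does not fit
 * inside the rec_len bytes actually received. */
static long checked_payload_len(const unsigned char *rec, size_t rec_len)
{
    unsigned int payload;

    if (rec_len < HB_HEADER_LEN + HB_MIN_PADDING)
        return -1;  /* cannot even hold a header plus padding */

    payload = ((unsigned int)rec[1] << 8) | (unsigned int)rec[2];

    if ((size_t)HB_HEADER_LEN + payload + HB_MIN_PADDING > rec_len)
        return -1;  /* claimed length exceeds the received record */

    return (long)payload;
}
```

With this check in place, a request that claims 65535 bytes but carries only padding is discarded instead of being answered with adjacent heap memory.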
Install LLVM and Clang 20 (including the clang-tools-20 package, which provides scan-build-20) from https://apt.llvm.org/. For a change, I am running these under WSL 2 on a Windows 11 laptop.
Fetch and extract an affected OpenSSL release (1.0.1f, the last release vulnerable to Heartbleed).
wget http://www.openssl.org/source/openssl-1.0.1f.tar.gz
tar -xf openssl-1.0.1f.tar.gz
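Next, write a taint configuration. A Propagations entry without SrcArgs taints its DstArgs unconditionally, so it acts as a source; here that is n2s. The Sinks entries list the argument indexes that must not receive tainted data: the size argument of CRYPTO_malloc and the length argument of memcpy_.
$ cat ~/taint_config.yml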
Filters:
Propagations:
  - Name: n2s
    DstArgs: [0, 1]
Sinks:
  - Name: CRYPTO_malloc
    Args: [0]
  - Name: memcpy_
    Args: [2]
See https://clang.llvm.org/docs/analyzer/user-docs/TaintAnalysisConfiguration.html for the full configuration format.
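The taint configuration matches calls by function name, but n2s is a macro and leaves no call behind after preprocessing. So let's patch the OpenSSL source code a bit to make it amenable to taint analysis.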
Replace the following macro with a function declaration:
#define n2s(c,l) (l =((IDEA_INT)(*((c)++)))<< 8L, \
                  l|=((IDEA_INT)(*((c)++))) )
void n2s(unsigned char *data, unsigned int *b);
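The prototype has no body, which is all the analyzer needs. If you also want the patched tree to build and link, a body along the lines of this sketch would work, placed in a separate source file (for example a hypothetical taint_shims.c) so that ssl/d1_both.c still sees only the prototype. Note that, unlike the macro, it receives the pointer by value and therefore cannot advance the caller's p; after the patch, pl = p in tls1_process_heartbeat points at the length bytes rather than the payload. Treat the patched tree as analysis-only.

```c
/* Hypothetical body for the patched n2s() prototype: decode a 16-bit
 * big-endian value from data[0..1] into *b. The caller's pointer is
 * not advanced, unlike the original macro. */
void n2s(unsigned char *data, unsigned int *b)
{
    *b = ((unsigned int)data[0] << 8) | (unsigned int)data[1];
}
```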
Rename memcpy to memcpy_ in the ssl/d1_both.c file.
Declare the following helper function:
void memcpy_(void *a, void *b, size_t len);
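As with n2s, the memcpy_ prototype needs no body for the analysis. If the patched tree should still link, a forwarding definition such as this sketch, kept in a separate source file (for example a hypothetical taint_shims.c), restores the original copying behaviour:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical forwarding body for the memcpy_ helper: behaves exactly
 * like memcpy, but gives the analyzer a distinct name to use as a sink. */
void memcpy_(void *a, void *b, size_t len)
{
    memcpy(a, b, len);
}
```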
Patch the n2s calls in the ssl/d1_both.c file so that they pass the destination variable by address, matching the new prototype:
Before:
n2s(p, payload);
After:
n2s(p, &payload);
user@newie:~/openssl-1.0.1f$ ./config
...
Configured for linux-x86_64.
user@newie:~/openssl-1.0.1f$ scan-build-20 -enable-checker optin.taint.GenericTaint -analyzer-config optin.taint.TaintPropagation:Config=/home/user/taint_config.yml clang -I. -Iinclude -c ssl/d1_both.c
scan-build: Using '/usr/lib/llvm-20/bin/clang' for static analysis
ssl/d1_both.c:1490:12: warning: Untrusted data is passed to a user-defined sink [optin.taint.GenericTaint]
1490 | buffer = OPENSSL_malloc(1 + 2 + payload + padding);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
include/openssl/crypto.h:368:29: note: expanded from macro 'OPENSSL_malloc'
368 | #define OPENSSL_malloc(num) CRYPTO_malloc((int)num,__FILE__,__LINE__)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ssl/d1_both.c:1496:3: warning: Untrusted data is passed to a user-defined sink [optin.taint.GenericTaint]
1496 | memcpy_(bp, pl, payload);
Done: the analyzer reports network-derived data flowing directly into both sinks, the OPENSSL_malloc size and the memcpy_ length!
Can we find these bugs with less source-code patching? Or could we use Coccinelle to automate the patching?
- https://www.blackduck.com/blog/detecting-heartbleed-with-static-analysis.html
- https://www.giac.org/paper/gsec/36189/role-static-analysis-heartbleed/143117
- https://blog.trailofbits.com/2014/04/27/using-static-analysis-and-clang-to-find-heartbleed/
- https://clang.llvm.org/docs/analyzer/user-docs/TaintAnalysisConfiguration.html