ircmaxell/its_all_about_time.md Secret

## its_all_about_time.md

      
    Raw
  

              its_all_about_time.md
            
          
    It's All About Time

An interesting pull request has been opened against PHP to make bin2hex() constant time. This has lead to some interesting discussion on the mailing list (which even got me to reply :-X). There has been pretty good coverage over remote timing attacks in PHP, but they talk about string comparison. I'd like to talk about other types of timing attacks.

What Is A Remote Timing Attack?

Ok, so let's say you have the following code:
function containsTheLetterC($string) {
    for ($i = 0; $i < strlen($string); $i++) {
        if ($string[$i] == "c") {
            return true;
        }
        sleep(1);
    }
    return false;
}

var_dump(containsTheLetterC($_GET['query']));

It should be pretty easy to tell what it does. It accepts the query parameter from the URL, then goes letter by letter through it, checking to see if it's a lowercase c. If it is, it returns. If not, it sleeps for one second.
So let's imagine we passed in the string ?query=abcdef. We'd expect that the check would take 2 seconds.
Now, let's image that we didn't know what letter was being looked for. Let's imagine that instead, the "c" was a different value that we don't know. Can you figure out how to figure out what that letter is?
It's simple. We construct a string "abcdefghijklmnopqrstuvwxyzABCDEFGHIJ....", and pass it in. Then we can time how long it takes to return. Then we know which character is different!!!
That's the basis of a timing attack.
But we don't have code that looks like that in the real world. So let's look at a real example:
$secret = "thisismykey";
if ($_GET['secret'] !== $secret) {
    die("Not Allowed!");
}

To understand what's going on, we need to look at is_identical_function from PHP's source code. If you look at that function, you'll see that the result is defined by the following case:
case IS_STRING:
    if (Z_STR_P(op1) == Z_STR_P(op2)) {
        ZVAL_BOOL(result, 1);
    } else {
        ZVAL_BOOL(result, (Z_STRLEN_P(op1) == Z_STRLEN_P(op2))
        && (!memcmp(Z_STRVAL_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op1))));
    }
    break;

The if basically is asking if both variables are the same variable (like $secret === $secret). In our case, that's not possible, so we only need to look at the else block.
Z_STRLEN_P(op1) == Z_STRLEN_P(op2)

So we immediately return if the string lengths don't match.
That means that more work is done if the strings are the same length!
If we time how long it takes to execute different lengths, we would see something like this:


length
time run 1
time run 2
time run 3
average time


7
0.01241
0.01152
0.01191
0.01194


8
0.01151
0.01212
0.01189
0.01184


9
0.01114
0.01251
0.01175
0.01180


10
0.01212
0.01171
0.01120
0.01197


11
0.01210
0.01231
0.01216
0.01219


12
0.01121
0.01211
0.01194
0.01175


13
0.01142
0.01174
0.01251
0.01189


14
0.01251
0.01121
0.01141
0.01171


If you ignore the average column, you'll notice that there doesn't appear to be much of a pattern. The numbers are all within reason of each other.
But if you average a number of runs, you start to notice a pattern. You'll notice that length 11 takes longer (slightly) then the other lengths.
This example is greatly exaggerated. But it illustrates the point. It's been shown that you can remotely detect differences in time down to about 15 nanoseconds using a sample size of about 49,000 (so 49,000 tries instead of 3 in the above example).
But so what, we found the lenght. That doesn't buy us much... But what about the second part? What about memcmp(...)?
If we look at the implementation of memcmp()::
int memcmp(const void *s1, const void *s2, size_t n)
{
    unsigned char u1, u2;

    for ( ; n-- ; s1++, s2++) {
        u1 = * (unsigned char *) s1;
        u2 = * (unsigned char *) s2;
        if ( u1 != u2) {
            return (u1-u2);
        }
    }
    return 0;
}

Wait a tick! That returns at the first difference between two strings!
So once we've identified the length of the string, we can try different strings to start detecting a difference:
axxxxxxxxxx
bxxxxxxxxxx
cxxxxxxxxxx
dxxxxxxxxxx
...
yxxxxxxxxxx
zxxxxxxxxxx

And through the same technique, wind up noticing a difference with "txxxxxxxxxx" taking ever so slightly longer than the rest.
Why?
Let's look at what happens step by step in memcmp.


First, it looks at the first character of each string.
If the first character is different, return immediately.


Next, look at the second character of each string.
If they are different, return immediately.


And so on.


So with "axxxxxxxxxx", it only executes the first step (since the string we're comparing is "thisismykey"). But with "txxxxxxxxxx", the first and second steps match. So it does more work, hence takes longer.
So once you see that, you know t is the first character.
Then it's just a matter of repeating the process:
taxxxxxxxxx
tbxxxxxxxxx
tcxxxxxxxxx
tdxxxxxxxxx
...
tyxxxxxxxxx
tzxxxxxxxxx

Do that for each character, and you're done. You've successfully deduced a secret!
Protecting Against Comparison Attacks

So that's a basic comparison attack. == and === are both vulnerable to that in PHP.
There are two basic ways of defending against it.
The first is to manually compare the two strings, and always compare every character (this is my function from an earlier blog post:
/**
 * A timing safe equals comparison
 *
 * To prevent leaking length information, it is important
 * that user input is always used as the second parameter.
 *
 * @param string $safe The internal (safe) value to be checked
 * @param string $user The user submitted (unsafe) value
 *
 * @return boolean True if the two strings are identical.
 */
function timingSafeEquals($safe, $user) {
    // Prevent issues if string length is 0
    $safe .= chr(0);
    $user .= chr(0);

    $safeLen = strlen($safe);
    $userLen = strlen($user);

    if ($userLen != $safeLen) {
        return false;
    }

    $result = 0;

    for ($i = 0; $i < $userLen; $i++) {
        // Using % here is a trick to prevent notices
        // It's safe, since if the lengths are different
        // $result is already non-0
        $result |= (ord($safe[$i]) ^ ord($user[$i]));
    }

    // They are only identical strings if $result is exactly 0...
    return $result === 0;
}

The second is to use PHP's built in hash_equals() function. This was added in 5.6 to do the same thing as our above code.
NOTE In general, it's not possible to prevent length leaks. So it's OK to leak the length. The important part is that it doesn't leak information about the difference of the two strings.
Other Types Of Timing Attacks - Index Lookup

So that's comparison. And that's quite well covered. But let's talk about index lookup:
If you have an array (or string), and you use secret information as the index (key), you can potentially leak information about that key.
To understand why, we need to get into a bit on how CPUs work with memory.
A CPU, in general, has fixed width registers. Think of these as small variables. On a modern processor, these registers are likely to be 64 bits (8 bytes) wide. That means that the largest variable that a CPU can ever deal with at a single time is 8 bytes. (Note: this is not true, as most processors have vector-based operations such as SIMD that allow it interact with more data. For the purposes of this discussion though that's not important).
So what happens when you want to read a string that's 16 bytes long?
Well, the CPU needs to load it in chunks. Depending on the operation, it may load the string 8 bytes at a time, and operate on it 8 bytes at a time. Or, more often, it operates on it one byte at a time.
So that means that it needs to fetch the rest of the string from somewhere. This "somewhere" is main memory (RAM). But memory is extremely slow. like REALLY slow. On the order of 100ns. Which is WAY over our 15 nanosecond threshold.
And since main memory is so slow, CPUs have little bits of memory on the CPU itself to act as a cache. In fact, they typically have 2 types of cache. They have L1 cache, which is specific to each core (each core gets its own L1 cache), a L2 cache, which is also specific to a core, and a L3 cache which is often shared across all cores on a single chip. Why 3 tiers? Because of speed:
Memory Type | Size   | Latency
------------+--------+---
L1 Cache    | 32kb   | 0.5 ns
L2 Cache    | 256kb  | 2.5 ns
L3 Cache    | 4-16MB | 10-20 ns
RAM         | LOTS   | 60 - 100 ns
So let's look at what happens when you do string[index] on a C string (char *, a character array). Imagine you have this code:
char character_at_offset(const char *string, size_t offset) 
{
    return string[offset]
}

The compiler will compile that as:
character_at_offset:
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movq    %rsi, -16(%rbp)
    movq    -16(%rbp), %rax
    movq    -8(%rbp), %rdx
    addq    %rdx, %rax
    movzbl  (%rax), %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc

There's a lot of noise in there though. Let's trim it down to a non-functional, but more appropriate size:
character_at_offset:
    addq    %rdx, %rax
    movzbl  (%rax), %eax
    popq    %rbp
    ret

The function takes two arguments, one of which is a pointer (the first element of the string), the second is an integer offset. It adds the two together to get the memory address of the character we want. Then, movzbl moves a single byte from that address and store it in %eax (it also zero's out the rest).
So, how does the CPU now where to find that memory address?
Well, it traverses up the chain of cache until it finds it.
So if it's in L1 cache, the overall operation will take approximately 0.5 ns. If it's in L2, 2.5ns. And so on. So by carefully timing the information, we can deduce where that item is cached (or if it's cached at all).
It's also worth noting that CPUs don't cache individual bytes. They cache blocks of memory called lines. A modern processor will typically have 64 byte wide cache lines. That means that each entry in cache is a continuous 64 byte block of memory.
So, when you do the memory fetch, the CPU will fetch a 64 byte block of memory into the cache line. So if your movzbl call needs to hit main memory, an entire block is copied into the lower cache lines. (note, this is a gross simplification, but it's for demonstration with what happens next).
Now, here's where things get really interesting.
Let's imagine that we're dealing with a large string. One that doesn't fit into L2 cache. So 1MB.
Now, let's imagine that we're fetching bytes from that string based off a secret sequence of numbers.
By watching how long it takes to fetch the bytes, we can actually determine information about the secret!!!
Let's imagine we fetch the following offsets:

offset 10
offset 1

The first fetch will cause a cache miss, which will load from main memory into the caches.
But the second fetch (offset 1) will fetch from L1 cache, since it's likely to be on the same cache line (memory block) as offset 10 was. So it's likely to be a cache hit.
If we then fetched offset 2048, it's likely to not be in cache.
So by carefully watching the pattern of delays, you can determine some information about the relationships of sequences of the offsets. And by doing this with the right information a lot of times, you can deduce the secret.
This is called a cache-timing attack.
Now that seems really far fetched, right? I mean, how often do you fetch information that perfectly? How can that possibly be practical. Well, it is 100% practical, and happening in the real world.
Defense Against Cache-Timing Attacks:

There is only one practical way of defending against this style attack:

Don't index arrays (or strings) by secrets.

It's really that simple.
Branching Based Timing Attacks

How many times have you seen something like the following code?:
$query = "SELECT * FROM users WHERE id = ?";
$stmt = $pdo->prepare($query);
$stmt->execute([$_POST['id']]);
$user = $stmt->fetchObject();

if ($user && password_verify($_POST['password'], $user->password) {
    return true;
}
return false;

Surely that's secure?
Well, there is information leak there.
If you try different user names, it will take a different amount of time depending on if the username is there or not. If password_verify takes 0.1 seconds, you can simply measure that difference to determine if the username is valid or not. On average, requests for taken usernames will take longer than those for available ones.
Now, is this an issue? I don't know, it depends on your requirements. Many websites want to keep usernames secret, and try not to expose information about them (for example: not saying whether the username or password is invalid on a login form).
If you're trying to keep the username secret, you're not.
Defense against branching based timing attacks

The only way to do this is to not branch. But there's a problem there. How do you get functionality like above if you don't branch?
Well, one idea would be to do the following:
$query = "SELECT * FROM users WHERE id = ?";
$stmt = $pdo->prepare($query);
$stmt->execute([$_POST['id']]);
$user = $stmt->fetchObject();

if ($user) {
    return password_verify($_POST['password'], $user->password);
} else {
    password_verify("", DUMMY_HASH);
}
return false;

Which means that you run password_verify on both cases. This cuts out the 0.1 second difference.
But the core timing attack still exists. The reason is that the database will take a slightly different amount of time to return the query for one where it found the user, and one where it didn't. This is because internally, it's doing a lot of branching and conditional logic, and it eventually needs to transmit the data over the wire back to the program.
So the only way to defend against this style attack, is to not treat your username as a secret!!!
A Note On "Random Delays"

Many people, when they hear about timing attacks, think "Well, I'll just add a random delay! That'll work!". And it doesn't.
To understand why, let's talk about what actually happens when you add a random delay:
The overall execution time is work + sleep(rand(1, 10)). If rand is well behaved (it's decently random), then over time we can average it out.
Let's say it's rand(1, 10). Well, that means when we average out runs, we'll get an average delay of about 5. The same average added to all cases. So all we need to do is run it a few more times per run to average out the noise. The more times we run it, the more that random value will tend to average out. So our signal is still there, it just needs slightly more data to combat the noise.
So if we needed to run 49,000 tests to get an accuracy of 15ns, then we would need perhaps 100,000 or 1,000,000 tests for the same accuracy with a random delay. Or perhaps 100,000,000. But the data is still there.
Fix the vulnerability, don't just add noise around it.
Back To The Point

There's currently a thread on PHP internals about whether to make certain core functions timing safe or not. The specific functions being discussed are:

bin2hex
hex2bin
base64_encode
base64_decode
mcrypt_encrypt
mcrypt_decrypt

Now, why those functions? Well, bin2hex and base64_encode are quite often used when encoding output to browsers (encoding session parameters for example). The more important ones however are the hex2bin and base64_decode, as they can be used for decoding secret information (like a key prior to using it for encryption).
The concensus among most of the respondents so far has been that it's not worth making them all slower just to get more safety. And I agree with that.
However, what I don't agree with is that it'll make them "slower". Changing comparison (from == to hash_equals) is slower because it changes the complexity (best, average, worst) of the function from O(1, n/2, n) to O(n, n, n). That means that it will have a significant impact on peformance for the average case.
But changing encoding functions doesn't affect the complexity. They will remain O(n). So the question is, what's the speed difference? Well, I took a stab at benchmarking bin2hex and hex2bin with the PHP algo and a timing safe one, and the difference wasn't overly significant. Encoding (bin2hex) was about the same (margin of error), and the difference for decoding (hex2bin) as about 0.5µs. That's 5e-10 seconds more for a string of about 40 characters.
To me, that's a small enough difference to not worry about at all. How many times does the average application call one of those effected functions? Perhaps once per execution maybe? But what's the potential upside? That perhaps a vulnerability is prevented?
Perhaps. I don't think there's a strong reason to do it, in general these vulnerabilities are going to be extremely hard to pull off in the types of applications people write in PHP. But with that said, if the implementation was sane and fast enough (to me, 0.5µs is fast enough), then I don't think there's a significant reason not to make the change. Even if it helps prevent one single attack in all the millions of users of PHP, is that worth it? Yes. Will it prevent a single attack? I have no idea (likely not).
However, there are a few functions that I believe must be continuously audited for timing safety:

mcrypt_*
hash_*
password_*
openssl_*
md5()
sha1()
strlen()
substr()

Basically, anything we know will be used with sensitive information, or will be used as a primitive in sensitive operations.
As far as the rest of the string functions, either there's no need for them to be made timing safe (like lcfirst or strpos) or it's impossible (like trim) or it's already done (like strlen) or it has no business being in PHP (like hebrev)...
length	time run 1	time run 2	time run 3	average time
7	0.01241	0.01152	0.01191	0.01194
8	0.01151	0.01212	0.01189	0.01184
9	0.01114	0.01251	0.01175	0.01180
10	0.01212	0.01171	0.01120	0.01197
11	0.01210	0.01231	0.01216	0.01219
12	0.01121	0.01211	0.01194	0.01175
13	0.01142	0.01174	0.01251	0.01189
14	0.01251	0.01121	0.01141	0.01171