Nothing I discuss below is new, but I thought it was an interesting idea and realized I already had most of the pieces in place to play with it, so I want to share what I learned. If you are interested in exploring this topic further, a good paper on it is here. A few years ago IOActive also published a blog post on the technique, which is a good read. Finally, the last two paragraphs in section 6 of RFC 5246 document the problem more clearly than anything else I've been able to find:
Any protocol designed for use over TLS must be carefully designed to deal with all possible attacks against it. As a practical matter, this means that the protocol designer must be aware of what security properties TLS does and does not provide and cannot safely rely on the latter. Note in particular that type and length of a record are not protected by encryption. If this information is itself sensitive, application designers may wish to take steps (padding, cover traffic) to minimize information leakage.
It is precisely the length of records that we will take advantage of. This is easiest to understand with a stream cipher, so we'll start there. But first, a quick primer on SSL.
If you are unfamiliar with SSL it is actually a simple protocol to understand on the wire (the complexity is in the state machine). I'll gloss over some of the details but this will serve as a basic primer to SSL to make sure everyone is up to speed.
An SSL session starts with a handshake. The basic gist is this:
- Client to server: Hi! I'd like to talk SSL to you so here are the cipher suites and compression methods I know.
- Server to client: Awesome! I'll speak to you using cipher suite X and compression Y. Oh, and here's my public RSA key.
- Client to server: Brilliant! Here's something encrypted with your public key, now let's get on with our secret stuff!
At this point the client and server have each sent their hello messages, negotiated a cipher suite to use, and exchanged enough information (some in the clear and some encrypted with asymmetric cryptography) to generate the symmetric keys to use during the session. The only thing left to do is to start using the negotiated symmetric cipher.
So now that they have done the handshake and are exchanging data what can we, a third-party observer, tell about the traffic? We know everything that was sent in the handshakes, which includes the negotiated cipher and compression methods. This is precisely what we will use to infer the contents of an application.
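To make the "what can an observer see" point concrete, here is a minimal sketch that walks the record headers in a reassembled TCP stream. Every SSL/TLS record starts with a 5-byte cleartext header: one byte of content type, two bytes of protocol version, and two bytes of length. The function name `iter_records` and the assumption that you already have one direction of the stream as a single `bytes` object are mine, not part of any tool mentioned here.

```python
import struct

# Content type values used by the SSL/TLS record layer.
CONTENT_TYPES = {
    20: "change_cipher_spec",
    21: "alert",
    22: "handshake",
    23: "application_data",
}

def iter_records(stream: bytes):
    """Yield (type_name, (major, minor), length) for each record header found.

    `stream` is assumed to be one direction of a reassembled TCP stream
    that begins on a record boundary (hypothetical input for illustration).
    """
    offset = 0
    while offset + 5 <= len(stream):
        # 1 byte type, 2 bytes version, 2 bytes big-endian length.
        ctype, major, minor, length = struct.unpack_from("!BBBH", stream, offset)
        yield CONTENT_TYPES.get(ctype, "unknown"), (major, minor), length
        # Skip the header and the (encrypted) record body.
        offset += 5 + length
```

Nothing here needs a key: the type and length are readable even after the handshake has switched the session over to the symmetric cipher.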
The basic premise is that applications have patterns to their communication which, unless otherwise protected, can be detected and used to infer the contents of the encrypted data with some degree of certainty. In my examples I'll be talking about a reverse shell running over SSL, but the technique can be applied to any application. Because stream ciphers are better suited for this demonstration I will start with a reverse shell over SSL where the cipher negotiated is TLS_RSA_WITH_RC4_128_SHA. If you want to follow along the PCAP file is here.
Reverse Shells Over SSL (Stream Ciphers)
For my example the reverse shell I'll be demonstrating is just cmd.exe piped over socat. Other things may look different but the basic technique applies.
When cmd.exe starts up the first thing printed is:
0000000: 4d 69 63 72 6f 73 6f 66 74 20 57 69 6e 64 6f 77 |Microsoft Window|
0000010: 73 20 5b 56 65 72 73 69 6f 6e 20 36 2e 31 2e 37 |s [Version 6.1.7|
0000020: 36 30 31 5d 0d 0d 0a 43 6f 70 79 72 69 67 68 74 |601]...Copyright|
0000030: 20 28 63 29 20 32 30 30 39 20 4d 69 63 72 6f 73 | (c) 2009 Micros|
0000040: 6f 66 74 20 43 6f 72 70 6f 72 61 74 69 6f 6e 2e |oft Corporation.|
0000050: 20 20 41 6c 6c 20 72 69 67 68 74 73 20 72 65 73 |  All rights res|
0000060: 65 72 76 65 64 2e 0d 0d 0a 0d 0d 0a 63 3a 5c 57 |erved.......c:\W|
0000070: 69 6e 64 6f 77 73 5c 53 79 73 74 65 6d 33 32 3e |indows\System32>|
From our perspective as a third-party observer this looks like random data, but what we can tell is the size of the message. A typical banner is 128 bytes long. We know the negotiated cipher (TLS_RSA_WITH_RC4_128_SHA) is a stream cipher, so the ciphertext is the same length as the plaintext, and the size of the MAC (SHA) is 20 bytes. That means we can reasonably expect to see a 148 byte application record from client to server. Looking at the PCAP file we can see exactly that at frame #11.
We also know the first thing typically run on a reverse shell is ipconfig /all, which is 14 bytes (including the 0x0a). Adding 20 bytes for the MAC, we can reasonably expect to see 34 bytes from server to client. This is precisely what we see in frame #13.
At this point it should become clear what is going on. The application is a reverse shell which has predictable application record sizes. By knowing the application you can begin to make educated guesses about the contents of the traffic.
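The arithmetic for the stream cipher case is simple enough to sketch in a few lines. This is only an illustration of the reasoning above: the helper name `stream_record_len` and the small MAC table are my own, and it assumes no compression was negotiated.

```python
# MAC output sizes in bytes for the hash named at the end of the cipher suite.
MAC_SIZES = {"MD5": 16, "SHA": 20}

def stream_record_len(plaintext_len: int, mac: str = "SHA") -> int:
    """Expected record length for a stream cipher: plaintext plus MAC."""
    return plaintext_len + MAC_SIZES[mac]

banner_len = 128                 # cmd.exe banner shown in the hexdump above
command = b"ipconfig /all\n"     # 13 characters plus the trailing 0x0a

print(stream_record_len(banner_len))    # 148, client to server
print(stream_record_len(len(command)))  # 34, server to client
```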
Reverse Shells Over SSL (Block Ciphers)
For this example you can play along using this PCAP file. In this example the negotiated cipher is TLS_RSA_WITH_AES_256_CBC_SHA. We know that this is a block cipher using 16 byte blocks and the MAC size is 20 bytes.
We already know the typical banner is 128 bytes long, which is exactly 8 blocks. The 20 byte MAC is appended to it and the result is padded to a multiple of the block size, so 128 + 20 = 148 bytes becomes 160 bytes: the 8 blocks of banner plus 2 blocks holding the MAC and padding. This means in the case of this cipher suite we should expect to see 10 blocks from client to server.
Looking at the PCAP file we can clearly see two application data records in frame #11, 32 bytes (2 blocks) long and 160 bytes (10 blocks) long, respectively. We already know the cipher is running in CBC mode and section 6.2.3.2 of RFC 4346 says that the first block must be discarded. This means we can discard the first application record, leaving us with just the 10 blocks.
The ipconfig /all command is one block of cleartext plus the two blocks for the MAC and padding (14 + 20 = 34 bytes, padded to 48). Looking at the PCAP file we see frame #13 has two application records, 32 bytes (2 blocks) long and 48 bytes (3 blocks) long, respectively. The same discarding applies as before and we can begin to make educated guesses about the contents of SSL traffic by knowing what the application inside it should look like.
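The block cipher sizes can be worked out the same way. The sketch below is my own helper, not anything from ChopShop, and it rests on a few assumptions: the sender uses the minimum amount of CBC padding (TLS always adds at least the one padding-length byte, and allows more), no compression is in use, and no explicit per-record IV is counted in the record length.

```python
def cbc_record_len(plaintext_len: int, mac_len: int = 20, block_size: int = 16) -> int:
    """Expected CBC record length in bytes with minimal padding.

    The plaintext and MAC are padded together; the +1 accounts for the
    padding-length byte that is always present.
    """
    unpadded = plaintext_len + mac_len + 1
    blocks = -(-unpadded // block_size)   # ceiling division
    return blocks * block_size

for label, size in (("banner", 128), ("ipconfig /all", 14)):
    length = cbc_record_len(size)
    print(label, length, "bytes,", length // 16, "blocks")
```

With the defaults above this gives 160 bytes (10 blocks) for the banner and 48 bytes (3 blocks) for the command, matching the second record in frames #11 and #13.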
In my examples I've been using a reverse shell because it is the easiest way to demonstrate the concept. However, a reverse shell is a human-driven protocol, and it can quickly become difficult to reasonably infer what is going on simply because the operator may execute a command that results in odd output, or worse yet, output which has length characteristics similar to those of other commands.
In many cases this technique is not terribly interesting, but in the case where all you have is a PCAP consisting of network traffic from an intrusion it can be a useful technique to apply.
Collecting this information with ChopShop
In the interest of making this programmatically useful I extended my SSL decryption capability (chop_ssl and sslim) in ChopShop to output as much metadata about SSL sessions as possible. I then wrote a module called sslam which will allow you to perform this kind of analysis by dumping blocks out to a file. For example:
./chopshop -f tests/ssl/reverseshell-rc4.pcap -s . "chop_ssl | sslam"
This will create a file in the current working directory that contains a series of C's and S's. By default, when running on a stream cipher it will output one C or S per encrypted byte seen.
When run on a block cipher it will output one C or S per block.
This allows you to then write regular expressions that can be used to look for specific patterns as described above.
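As a rough illustration of the kind of regular expression this enables, the sketch below searches for a run of client units followed immediately by a run of server units. I am assuming the dump can be read into one flat string containing only the characters C and S; the exact file name and layout sslam produces are not shown here, so the function `find_exchange` and its input are hypothetical.

```python
import re

def find_exchange(dump: str, client_units: int, server_units: int):
    """Return the offsets where exactly `client_units` C's are followed
    by exactly `server_units` S's.

    Units are bytes for a stream cipher dump and blocks for a block
    cipher dump, per the description above.
    """
    # The lookbehind and lookahead stop a longer run from matching.
    pattern = re.compile(
        r"(?<!C)C{%d}S{%d}(?!S)" % (client_units, server_units)
    )
    return [m.start() for m in pattern.finditer(dump)]

# Stream cipher example: 148-byte banner record, then 34-byte command record.
sample = "S" * 5 + "C" * 148 + "S" * 34 + "C" * 10
print(find_exchange(sample, 148, 34))
```

The same function could be pointed at a block cipher dump by passing block counts instead of byte counts, keeping in mind that any records you intend to discard would need to be accounted for in the counts.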
In the case of web applications running over SSL, the more complicated your application is the harder it is to make reasonable statements about what is happening. This is why my example uses a reverse shell; it is simply the easiest way to demonstrate the concept. With that said, the technique can be applied to more complicated applications running over SSL, as was demonstrated by IOActive.
This is obviously error prone but I found it to be an interesting way to look at encrypted data and make some educated guesses about the contents based upon the protocol inside it. Simple protocols like a reverse shell or a binary protocol should have patterns that stick out when doing this kind of analysis, while things like HTTP can become too deep in the grey area to reasonably conclude anything.