Oh hey, this rant of mine is making the rounds.
After I wrote this, one of the Matrix leads commented on it, which prompted me to look at their code. I have since found, uh, 3 (I initially said 4) different cryptographic issues in Matrix's Olm and Megolm code.
Expect a blog post on Dhole Moments at some point in August.
I originally wrote that one of them was extremely bad and would put a lot of burden on Matrix users to mitigate effectively. False alarm: I was mistaken about that one. I'll still include it in the write-up, though.
Ever since I wrote It's Time For Furries to Stop Using Telegram, I've had a few folks ask me about my opinion on Matrix.
(I've also had a few people evangelize Matrix in my mentions. That's annoying.)
My stance on Matrix has been the same for years: I don't trust the Matrix developers to produce a secure protocol, and until they abandon Olm / Megolm in favor of something like MLS, I'm adamant about refusing to trust their team's designs.
To understand why I feel so strongly about this, you need to understand that practically exploitable vulnerabilities were found in Matrix in 2022.
The mere fact that vulnerabilities were found isn't what's alarming. Vulnerabilities happen. You aren't writing software if you don't occasionally fuck up.
The problem is the nature of these vulnerabilities. They're fucking clown-shoes. From the Q&A section of the disclosure:
a. Simple confidentiality break: The root cause of this attack is the fact that room management messages are not authenticated, which is a design flaw in the protocol itself, as no mechanism was specified for authentication of such messages.
Let me translate this for you: It apparently didn't occur to the developers who created Olm and Megolm to authenticate the messages that manage who has access to encrypted groups.
b. Attack against out-of-band verification: This attack exploits an insecure implementation choice enabled by a design flaw in the specification as there is no domain separation enforced there.
(Italic text is my emphasis.)
Not using domain separation was inexcusable in 2009, when Nate Lawson was angry about people continuing to write APIs vulnerable to length-extension attacks.
The Olm spec was written six years later, in 2015.
A common pattern with the Matrix disclosure is wanting to ask, "Who was the cryptography adult in the room when this was written?" followed by the chirping of crickets.
d. Trusted impersonation: This is an implementation error as no check is performed to check whether Olm is used for encryption or not.
e. Impersonation to confidentiality break: This is an implementation error as no check is performed to check whether Olm is used for encryption or not.
These are stupid bugs to have. Maybe we can forgive these two.
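The missing check in (d) and (e) boils down to "only accept room keys that actually arrived over Olm." Here is a minimal Python sketch of that check. The `DecryptedToDevice` class and `accept_key_share` function are hypothetical, not any Matrix SDK's real types. The event type names and the Olm algorithm identifier are the ones Matrix uses, but the check itself is a simplified illustration, not the actual patch.

```python
from dataclasses import dataclass

# Matrix's identifier for Olm-encrypted messages.
OLM = "m.olm.v1.curve25519-aes-sha2"
# Event types that carry Megolm session keys.
KEY_SHARING_TYPES = {"m.room_key", "m.forwarded_room_key"}


@dataclass
class DecryptedToDevice:
    """Hypothetical shape of a to-device event after decryption."""
    event_type: str   # e.g. "m.room_key"
    algorithm: str    # algorithm the envelope was actually decrypted with
    content: dict


def accept_key_share(event: DecryptedToDevice) -> bool:
    """Accept key material only if it was delivered inside an Olm envelope."""
    if event.event_type not in KEY_SHARING_TYPES:
        return False  # not a key share at all
    # The check the disclosure says was missing: was Olm actually used?
    return event.algorithm == OLM
```

The point is that the transport a key arrived over is itself security-relevant metadata, so it has to be checked explicitly rather than assumed.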
f. IND-CCA break: This theoretical attack exploits a protocol design flaw.
The protocol design flaw, for the record, was using AES-CTR to encrypt information, then HMAC to authenticate the ciphertext, but not including the IV in the HMAC calculation.
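To see why leaving the IV out of the MAC matters, here is a minimal Python sketch of encrypt-then-MAC with and without the IV under the HMAC. Two assumptions, flagged loudly: Python's standard library has no AES, so the keystream here is HMAC-SHA256 in counter mode standing in for AES-CTR, and the function names are hypothetical, not Olm's or Matrix's API. The only difference between the two modes is what gets fed into the MAC.

```python
import hashlib
import hmac
import os

IV_LEN = 16


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for AES-CTR: concatenate HMAC-SHA256(key, iv || counter) blocks.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key, mac_key, plaintext, mac_covers_iv):
    iv = os.urandom(IV_LEN)
    ct = xor(plaintext, keystream(enc_key, iv, len(plaintext)))
    # The IV has a fixed length, so iv + ct is an unambiguous MAC input.
    mac_input = iv + ct if mac_covers_iv else ct  # the flaw: leaving iv out
    tag = hmac.new(mac_key, mac_input, hashlib.sha256).digest()
    return iv, ct, tag


def unseal(enc_key, mac_key, iv, ct, tag, mac_covers_iv):
    mac_input = iv + ct if mac_covers_iv else ct
    expected = hmac.new(mac_key, mac_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # reject before decrypting anything
    return xor(ct, keystream(enc_key, iv, len(ct)))


enc_key, mac_key = os.urandom(32), os.urandom(32)
for covers_iv in (False, True):
    iv, ct, tag = seal(enc_key, mac_key, b"attack at dawn", covers_iv)
    tampered_iv = bytes([iv[0] ^ 0x01]) + iv[1:]
    result = unseal(enc_key, mac_key, tampered_iv, ct, tag, covers_iv)
    print("IV in MAC" if covers_iv else "IV not in MAC", "->",
          "rejected" if result is None else "accepted (garbled plaintext)")
```

Without the IV in the MAC, a tampered IV sails through verification and the recipient decrypts garbage, which is exactly the IND-CCA break the disclosure describes.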
If these vulnerabilities were found from an earlier draft, then corrected by the team, I wouldn't mind so much. Iterative design is a thing in software development.
But if you want me to trust the protocol designs of people who felt qualified to design cryptography in 2015 yet didn't use domain separation, or completely forgot to authenticate their IV (and didn't realize how badly they fucked up until Martin Albrecht et al. pointed it out to them), you're in for some disappointment.
Let's set aside the Cryptographic Doom Principle (of which failing to authenticate the room management messages is a pretty glaring example).
There are three things you have to test for in order to meet the lowest bar for a secure encrypted protocol design:
- Encrypt a message. Then flip every single bit in the ciphertext, one at a time, including any unencrypted headers. Your protocol should reject every ciphertext altered by a single bit flip, and the time it takes to reject it should not depend on any secret values.
- Take some component of your system that gets hashed. Doesn't matter how it's hashed! Feed it into any other part of your system that uses hash functions. It should be rejected. You can try this with everything that hashes data or make it a rule to ALWAYS domain-separate your hashes.
- Bonus round: If you're feeding a multi-part message into a hash function, you need to take care to avoid canonicalization issues.
- When performing any kind of Diffie-Hellman key exchange, test against someone setting their public key to zero. Or to the prime of the field. Or, if your group doesn't have prime order (e.g. Montgomery curves, which always have a cofactor), to a low-order point. Matrix gets a point for this one, as far as I know.
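The bit-flip test from the list above is easy to automate. Here is a minimal Python sketch of a harness plus a toy encrypt-then-MAC format to run it against. Assumptions: the wire format (4-byte header length, header, IV, ciphertext, tag) and all function names are hypothetical, and HMAC-SHA256 in counter mode stands in for AES-CTR because the standard library has no AES. `hmac.compare_digest` avoids an early-exit tag comparison, but Python as a whole is a poor vehicle for real constant-time guarantees.

```python
import hashlib
import hmac
import os

TAG_LEN = 32
IV_LEN = 16


def _keystream(key, iv, length):
    # Stand-in for AES-CTR: HMAC-SHA256 over iv || counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key, mac_key, header, plaintext):
    iv = os.urandom(IV_LEN)
    ks = _keystream(enc_key, iv, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    # The MAC covers everything on the wire: header length, header, IV, ciphertext.
    body = len(header).to_bytes(4, "big") + header + iv + ct
    return body + hmac.new(mac_key, body, hashlib.sha256).digest()


def unseal(enc_key, mac_key, blob):
    if len(blob) < 4 + IV_LEN + TAG_LEN:
        return None
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return None  # reject before parsing or decrypting anything
    hlen = int.from_bytes(body[:4], "big")
    header = body[4:4 + hlen]
    iv = body[4 + hlen:4 + hlen + IV_LEN]
    ct = body[4 + hlen + IV_LEN:]
    ks = _keystream(enc_key, iv, len(ct))
    return header, bytes(c ^ k for c, k in zip(ct, ks))


def count_accepted_bitflips(blob, try_open):
    """Flip each bit of blob once; return how many mutants were accepted (should be 0)."""
    accepted = 0
    for bit in range(len(blob) * 8):
        mutant = bytearray(blob)
        mutant[bit // 8] ^= 1 << (bit % 8)
        if try_open(bytes(mutant)) is not None:
            accepted += 1
    return accepted


enc_key, mac_key = os.urandom(32), os.urandom(32)
blob = seal(enc_key, mac_key, b"room-v1", b"hello")
assert unseal(enc_key, mac_key, blob) == (b"room-v1", b"hello")
print(count_accepted_bitflips(blob, lambda b: unseal(enc_key, mac_key, b)))  # 0
```

The harness takes the decryption routine as a callable, so the same loop can be pointed at a real protocol implementation instead of this toy.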
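The "ALWAYS domain-separate your hashes" rule from the list above can be followed mechanically. Here is a minimal Python sketch using BLAKE2b's built-in personalization parameter as the domain tag. The labels and function names are hypothetical. Note that BLAKE2b zero-pads `person` to 16 bytes, so no label should be another label with trailing zero bytes added.

```python
import hashlib
import hmac

# Hypothetical domain labels: one per purpose, at most 16 bytes each.
DOMAIN_DEVICE_ID = b"devid-v1"
DOMAIN_PUBLIC_KEY = b"pubkey-v1"


def domain_hash(domain: bytes, data: bytes) -> bytes:
    if len(domain) > hashlib.blake2b.PERSON_SIZE:
        raise ValueError("domain label longer than BLAKE2b allows")
    return hashlib.blake2b(data, digest_size=32, person=domain).digest()


def verify_key_fingerprint(public_key: bytes, fingerprint: bytes) -> bool:
    # Only fingerprints computed in the public-key domain can ever match.
    expected = domain_hash(DOMAIN_PUBLIC_KEY, public_key)
    return hmac.compare_digest(expected, fingerprint)


blob = b"ABCDEFGHIJ"  # bytes that could be read as a device ID or as a key
assert verify_key_fingerprint(blob, domain_hash(DOMAIN_PUBLIC_KEY, blob))
assert not verify_key_fingerprint(blob, domain_hash(DOMAIN_DEVICE_ID, blob))
```

With a hash that lacks a personalization parameter (e.g. SHA-256), the same effect comes from prefixing an unambiguous, length-prefixed domain tag to the input.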
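For the bonus round about multi-part hash inputs, the classic canonicalization bug is plain concatenation, where ("ab", "c") and ("a", "bc") hash identically. Here is a minimal Python sketch of a length-prefixed encoding that rules this out, similar in spirit to PASETO's pre-authentication encoding but not byte-compatible with it. The function names are hypothetical.

```python
import hashlib


def naive_hash(*parts: bytes) -> bytes:
    # Plain concatenation: part boundaries are lost.
    return hashlib.sha256(b"".join(parts)).digest()


def canonical_hash(*parts: bytes) -> bytes:
    # Prefix the number of parts, then each part's length (8-byte big-endian),
    # so two different tuples can never produce the same hash input.
    h = hashlib.sha256()
    h.update(len(parts).to_bytes(8, "big"))
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


assert naive_hash(b"ab", b"c") == naive_hash(b"a", b"bc")          # collision
assert canonical_hash(b"ab", b"c") != canonical_hash(b"a", b"bc")  # fixed
```

Because every length is written before its data, the encoding can be parsed back into exactly one tuple, which is what makes it canonical.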
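The Diffie-Hellman test from the list above translates into a few explicit checks. Here is a minimal Python sketch for finite-field DH over a safe prime p = 2q + 1, in the style of the full public-key validation in NIST SP 800-56A, plus the all-zero shared-secret check that RFC 7748 suggests for X25519. The function names are hypothetical, and the tiny prime in the usage lines is a toy chosen only so the arithmetic is easy to follow.

```python
import hmac


def validate_ffdh_public_key(y: int, p: int) -> bool:
    """Reject degenerate public keys for DH over a safe prime p = 2q + 1."""
    q = (p - 1) // 2
    if not (1 < y < p - 1):
        return False  # rejects 0, 1, p - 1, p, and anything out of range
    # Subgroup check: y must lie in the prime-order-q subgroup.
    return pow(y, q, p) == 1


def x25519_shared_secret_ok(shared: bytes) -> bool:
    # A low-order peer public key forces an all-zero X25519 output.
    return not hmac.compare_digest(shared, bytes(32))


# Toy safe prime p = 23 = 2 * 11 + 1 (illustration only).
assert validate_ffdh_public_key(2, 23)       # 2 has order 11
assert not validate_ffdh_public_key(0, 23)   # zero
assert not validate_ffdh_public_key(23, 23)  # the prime itself
assert not validate_ffdh_public_key(22, 23)  # p - 1 has order 2
assert not validate_ffdh_public_key(5, 23)   # outside the order-11 subgroup
```

In a real deployment p would be a large standardized group (e.g. one of the RFC 7919 primes), but the checks are the same.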
I do not believe there are any backdoors, nor am I aware of any specific unpatched vulnerabilities, in Matrix today. (EDIT: Spoke too soon! See the update at the top.)
However, as a cryptography professional, I personally do not trust Matrix. I do not trust it because the people developing Matrix have not demonstrated the competence necessary to perform the task they believe they are doing.
Most people are not capable of meeting this high bar. Do not take it as a personal attack. But please abandon Olm and Megolm and work with the cryptography community to build something actually secure. MLS is a good start.
Hi - thanks for writing this up. Speaking as project lead for Matrix, the RHUL vulnerabilities were certainly not our finest hour. I think some of your points may be worth clarifying though, in terms of their actual impact:
For context: Olm is a clone of Signal’s Double Ratchet, and Megolm is the protocol used to exchange message keys for group communication over Olm. Room management is done via the Matrix layer (not Olm or Megolm), which doesn’t yet authenticate events by the user.
Practically speaking, this means that the server can change who is in the conversation - but the users will be able to see it if this happens. The actual attack here is that a malicious server could fake an invite from @admin:example.com to @eve:evil.com to join a room… at which point all the users at risk of encrypting messages to Eve will see “Eve has joined the room”. Similarly, a malicious server could add a malicious device to Alice’s account… at which point everyone at risk of encrypting messages to the malicious device will see a massive “Alice is no longer trusted” warning, assuming they verified Alice’s identity.

Now, obviously this isn’t great, and it’d be much better to block malicious membership changes entirely - and we’ve been working on that both in Matrix (MSC3917) and in MIMI/MLS. Meanwhile we’re in the process of mandating that Matrix clients must never encrypt to unverified devices, but given this is a significant breaking change for the network, we need to give time to implementations to adjust.
The domain separation problem here is not a generally exploitable flaw, nor is it a bug in Olm (which is literally a clone of the Double Ratchet). Instead it’s effectively a type safety/confusion bug: the Matrix layer (https://spec.matrix.org/v1.10/client-server-api/#key-and-signature-security) stores either public keys or device identifiers for cross-signing in the same field. As the spec says:
In practice, this caused a vulnerability only on a single platform (the 1st gen JS sdk), so while it is obviously bad practice and a potential source of bugs, it’s not clear that this is a total disaster - it’s not like the design itself is intrinsically exploitable.
We’ve come a long way since starting E2EE in 2015, and https://matrix.org/blog/2022/09/28/upgrade-now-to-address-encryption-vulns-in-matrix-sdks-and-clients/ gives an example of how the current team operates in terms of addressing these issues.
Hang on. While it is most definitely a screwup that the IV was not MACed, the reason it was not prioritized is because it is not exploitable. An attacker who manipulates the IV to corrupt the plaintext has no mechanism to see what the corrupted plaintext actually is - so can’t sculpt an attack on the recipient (or infer the private key). Also, this issue only applied to file transfers and secret storage in Matrix - not Olm or Megolm traffic.
So yes: overall, it’s true that we’ve accumulated some baggage while implementing E2EE in Matrix (and the main complaint from Albrecht et al. was that we’d added encryption incrementally and the layering leaves room for bugs), but we’ve learned from this. First we consolidated the encryption implementations on Rust so we could improve things in one code base, then we got it audited, and now we are busy simplifying the whole thing: shifting to Trust On First Use, refusing to encrypt for untrusted devices, and, once the user-facing behaviour has improved, eventually putting cryptographically controlled group membership in place too.
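For readers unfamiliar with Trust On First Use, here is a minimal Python sketch of the idea combined with "refuse to encrypt for untrusted devices". Everything here is hypothetical and simplified: the `TofuStore` class, the `recipients` function, and the device tuples are illustrations, not the Matrix Rust SDK's API. The sketch also assumes each device is already tied to its owner's identity key, a step Matrix handles via cross-signing and that is skipped here.

```python
from dataclasses import dataclass, field


@dataclass
class TofuStore:
    """Hypothetical in-memory TOFU store: user ID -> identity key seen first."""
    pinned: dict = field(default_factory=dict)

    def check(self, user_id: str, identity_key: bytes) -> str:
        known = self.pinned.get(user_id)
        if known is None:
            self.pinned[user_id] = identity_key  # first use: pin it
            return "pinned"
        if known == identity_key:
            return "trusted"
        return "changed"  # never silently re-pin; warn the user instead


def recipients(store: TofuStore, devices: list) -> list:
    """Keep only devices whose owner's identity key matches the pinned one.

    `devices` is a hypothetical list of (user_id, identity_key, device_id) tuples.
    """
    return [d for d in devices if store.check(d[0], d[1]) != "changed"]
```

The key design choice is that a changed identity key is never re-pinned automatically: the client stops encrypting to it and surfaces the change to the user.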
We’re definitely not perfect here; but I personally feel the level of response here is overstated when you look at the specific issues and their impact. And none of the issues here were actually in the Olm or Megolm protocol, but instead how those protocols interacted with the Matrix layer. Agreed that MIMI/MLS is an interesting approach though - which is why we co-authored the MIMI draft: https://datatracker.ietf.org/doc/draft-ietf-mimi-protocol/ and why we’re also working on Matrix+MLS: https://arewemlsyet.com/ (although progress is slow there due to prioritizing refining the current E2EE first).