
@soatok
Last active May 27, 2024 17:08
Why I Don't Trust Matrix Developers to Produce a Secure Protocol

Update (2024-05-17)

Oh hey, this rant of mine is making the rounds.

After I wrote this, one of the Matrix leads commented on it, which prompted me to look at their code. I have since found, uh, 3 (I originally said 4) different cryptographic issues in Matrix's Olm and Megolm code.

Expect a blog post on Dhole Moments at some point in August.

I originally wrote that one of them was extremely bad and would put a lot of burden on Matrix users to mitigate effectively. False alarm: I was mistaken about this one. I'll include it in the write-up, though.

Original Gist Below

Ever since I wrote It's Time For Furries to Stop Using Telegram, I've had a few folks ask me about my opinion on Matrix.

(I've also had a few people evangelize Matrix in my mentions. That's annoying.)

My stance on Matrix has been the same for years: I don't trust the Matrix developers to produce a secure protocol, and until they abandon Olm / Megolm in favor of something like MLS, I'm adamant about refusing to trust their team's designs.

To understand why I feel so strongly about this, you need to understand that practically exploitable vulnerabilities were found in Matrix in 2022.

The mere existence of vulnerabilities isn't what's alarming. Vulnerabilities happen. You aren't writing software if you don't occasionally fuck up.

The problem is the nature of these vulnerabilities. They're fucking clown-shoes. From the Q&A section of the disclosure:

a. Simple confidentiality break: The root cause of this attack is the fact that room management messages are not authenticated, which is a design flaw in the protocol itself, as no mechanism was specified for authentication of such messages.

Let me translate this for you: It apparently didn't occur to the developers who created Olm and Megolm to authenticate the messages that manage who has access to encrypted groups.

b. Attack against out-of-band verification: This attack exploits an insecure implementation choice enabled by a design flaw in the specification as there is no domain separation enforced there.

(Italic text is my emphasis.)

Not using domain separation was inexcusable in 2009, when Nate Lawson was angry about people continuing to write APIs vulnerable to length-extension attacks.

The Olm spec was written six years later in 2015.
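A minimal sketch of what domain separation buys you, in Python with only the standard library. The `ds_hash` helper and the label strings are hypothetical illustrations, not anything from the Olm or Matrix specs: every hash input starts with a context label, and every field is length-prefixed, so the same bytes hashed in two different roles (or split at a different boundary) can't produce the same digest.

```python
import hashlib


def naive_hash(*fields: bytes) -> bytes:
    # No context label and no length prefixes: field boundaries are lost.
    return hashlib.sha256(b"".join(fields)).digest()


def ds_hash(domain: str, *fields: bytes) -> bytes:
    # Hypothetical helper: hash a domain label first, then each field.
    # Every part is length-prefixed (8-byte big-endian) so that both the
    # context and the field boundaries are unambiguous.
    h = hashlib.sha256()
    for part in (domain.encode("utf-8"), *fields):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


if __name__ == "__main__":
    # Naive concatenation: different inputs, same digest.
    print(naive_hash(b"ab", b"c") == naive_hash(b"a", b"bc"))  # True
    # Same bytes hashed in two different roles: different digests.
    blob = b"ed25519:EXAMPLE"
    print(ds_hash("device-key-id", blob) == ds_hash("cross-signing-key-id", blob))  # False
```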

A common pattern with the Matrix disclosure is wanting to ask, "Who was the cryptography adult in the room when this was written?" followed by the chirping of crickets.

d. Trusted impersonation: This is an implementation error as no check is performed to check whether Olm is used for encryption or not.

e. Impersonation to confidentiality break: This is an implementation error as no check is performed to check whether Olm is used for encryption or not.

This is a stupid bug to have. Maybe we can forgive these two.

f. IND-CCA break: This theoretical attack exploits a protocol design flaw.

The protocol design flaw, for the record, was using AES-CTR to encrypt information, then HMAC to authenticate the ciphertext, but not including the IV in the HMAC calculation.

This does not meet the bar for end-to-end encryption.
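To make the IV problem concrete, here is a minimal Python sketch using only the standard library. AES isn't in the standard library, so it swaps in a toy CTR-style keystream built from HMAC-SHA256 as an assumed stand-in for AES-CTR (illustration only, not a cipher to ship). It compares encrypt-then-MAC over the ciphertext alone against encrypt-then-MAC over `iv || ciphertext`, and runs the flip-every-bit test from the checklist further down against both. The `seal`, `unseal`, and `accepted_bitflips` names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

IV_LEN, TAG_LEN = 16, 32


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy CTR mode (stand-in for AES-CTR): block i = HMAC-SHA256(key, iv || i).
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes, mac_covers_iv: bool) -> bytes:
    iv = os.urandom(IV_LEN)
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(enc_key, iv, len(plaintext))))
    authed = iv + ct if mac_covers_iv else ct  # the flaw: IV left out of the MAC
    tag = hmac.new(mac_key, authed, hashlib.sha256).digest()
    return iv + ct + tag


def unseal(enc_key: bytes, mac_key: bytes, blob: bytes, mac_covers_iv: bool) -> Optional[bytes]:
    iv, ct, tag = blob[:IV_LEN], blob[IV_LEN:-TAG_LEN], blob[-TAG_LEN:]
    authed = iv + ct if mac_covers_iv else ct
    expected = hmac.new(mac_key, authed, hashlib.sha256).digest()
    # compare_digest avoids an early-exit comparison on the tag bytes.
    if not hmac.compare_digest(expected, tag):
        return None  # rejected
    return bytes(c ^ k for c, k in zip(ct, keystream(enc_key, iv, len(ct))))


def accepted_bitflips(mac_covers_iv: bool) -> int:
    # Flip every bit of iv || ct || tag, one at a time, and count how many
    # altered messages are still accepted. The right answer is zero.
    enc_key, mac_key = os.urandom(32), os.urandom(32)
    blob = seal(enc_key, mac_key, b"attack at dawn", mac_covers_iv)
    accepted = 0
    for i in range(len(blob) * 8):
        altered = bytearray(blob)
        altered[i // 8] ^= 1 << (i % 8)
        if unseal(enc_key, mac_key, bytes(altered), mac_covers_iv) is not None:
            accepted += 1
    return accepted


if __name__ == "__main__":
    print(accepted_bitflips(mac_covers_iv=False))  # 128
    print(accepted_bitflips(mac_covers_iv=True))   # 0
```

Every one of the 128 accepted forgeries in the first case is an IV bit: the tag still verifies, and the recipient decrypts under a different keystream without noticing. Feeding the IV into the MAC closes that gap.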

If these vulnerabilities were found from an earlier draft, then corrected by the team, I wouldn't mind so much. Iterative design is a thing in software development.

But if you want me to trust the protocol designs of people who felt qualified to design cryptography in 2015 when they didn't use domain separation, or completely forgot to authenticate their IV-- and didn't realize how badly they fucked up until Martin Albrecht, et al. pointed it out to them-- you're in for some disappointment.

Why is Matrix's design inadequate to meet the lowest bar?

Let's set aside the Cryptographic Doom Principle (which failing to authenticate the room management messages is a pretty glaring example of).

There are three things you have to test for in order to meet the lowest bar for a secure encrypted protocol design:

  1. Encrypt a message. Then flip every single bit in the ciphertext, one at a time, including any unencrypted headers. Your protocol should reject every one of these single-bit-flipped ciphertexts, and the time it takes for this rejection to happen should not be dependent on any secret numbers.
  2. Take some component of your system that gets hashed. Doesn't matter how it's hashed! Feed it into any other part of your system that uses hash functions. It should be rejected. You can try this with everything that hashes data or make it a rule to ALWAYS domain-separate your hashes.
  3. When performing any kind of Diffie-Hellman key exchange, test against someone setting their public key to zero. Or to the prime of the field. Or to a low-order point if you're in a group whose order isn't prime (e.g., Montgomery curves like Curve25519, which has a cofactor of 8). Matrix gets a point for this one, as far as I know.
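Check 3 can be sketched for X25519 in Python. This is a minimal pure-Python port of the Montgomery-ladder pseudocode in RFC 7748 (slow and not constant-time, so for illustration only), plus a hypothetical `safe_shared_secret` wrapper that aborts on an all-zero shared secret, a check RFC 7748 explicitly permits. Public keys of 0, 1, and the field prime p all decode to points whose order divides 8, and the clamped scalar is always a multiple of 8, so each of them lands on the identity and an all-zero output.

```python
import hmac

P = 2**255 - 19
A24 = 121665  # (486662 - 2) / 4 for Curve25519, per RFC 7748


def decode_scalar(k: bytes) -> int:
    # RFC 7748 clamping: clear the 3 low bits (so the scalar is a multiple
    # of the cofactor 8), clear bit 255, and set bit 254.
    b = bytearray(k)
    b[0] &= 248
    b[31] &= 127
    b[31] |= 64
    return int.from_bytes(b, "little")


def decode_u(u: bytes) -> int:
    # Mask the top bit; values >= p are processed as if reduced mod p.
    return (int.from_bytes(u, "little") & ((1 << 255) - 1)) % P


def x25519(k: bytes, u: bytes) -> bytes:
    # Montgomery ladder from RFC 7748. Variable-time: illustration only.
    scalar, x1 = decode_scalar(k), decode_u(u)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (scalar >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def safe_shared_secret(private_key: bytes, peer_public: bytes) -> bytes:
    # Hypothetical wrapper: a zero, p, or low-order peer key drives the
    # result to the identity, which encodes as 32 zero bytes. Abort on it.
    shared = x25519(private_key, peer_public)
    if hmac.compare_digest(shared, bytes(32)):
        raise ValueError("peer public key is low-order or degenerate")
    return shared


if __name__ == "__main__":
    base = (9).to_bytes(32, "little")  # Curve25519 base point u = 9
    alice, bob = bytes(range(32)), bytes(range(32, 64))  # fixed demo keys
    a_pub, b_pub = x25519(alice, base), x25519(bob, base)
    assert safe_shared_secret(alice, b_pub) == safe_shared_secret(bob, a_pub)
    for label, bad in (("0", 0), ("1", 1), ("p", P)):
        try:
            safe_shared_secret(alice, bad.to_bytes(32, "little"))
        except ValueError as err:
            print(f"{label}: rejected ({err})")
```

The all-zero test goes through `hmac.compare_digest` to avoid an early-exit comparison on the secret; the pure-Python ladder itself makes no constant-time promise.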

To be clear

I do not believe there are any backdoors, nor am I aware of any specific unpatched vulnerabilities, in Matrix today.

However, as a cryptography professional, I personally do not trust Matrix. I do not trust it because the people developing Matrix have not demonstrated the competence necessary to perform the task they believe they are doing.

Most people are not capable of meeting this high bar. Do not take it as a personal attack. But please abandon Olm and Megolm and work with the cryptography community to build something actually secure. MLS is a good start.

@TruncatedDinoSour

meow

@ColonelThirtyTwo

ColonelThirtyTwo commented May 15, 2024

Most of my concern of Matrix vs Signal is that Signal is yet another corporate un-interoperable IM system that's a setup for another entrap-extract business model (aka "enshittification"). It's just a change of the guard - while Signal currently seems better than Telegram and Discord so far, it would be easy for them in the future to turn around, close source their app, lock down their platform, and start adding "premium" features and ads.

I don't want to sound like a Matrix fanboy - your post is certainly valid criticism, and not the first I've heard about Matrix. And I don't want to disparage Signal too much - I've heard good things about it, and it's good that it's open source and non-profit. I want to explore other paradigms that help solve these systemic problems rather than jump ship yet again to another product that is at risk of sinking in the future.

@ara4n

ara4n commented May 15, 2024

Hi - thanks for writing this up. Speaking as project lead for Matrix, the RHUL vulnerabilities were certainly not our finest hour. I think some of your points may be worth clarifying though, in terms of their actual impact:

a. Simple confidentiality break: The root cause of this attack is the fact that room management messages are not authenticated, which is a design flaw in the protocol itself, as no mechanism was specified for authentication of such messages.

Let me translate this for you: It apparently didn't occur to the developers who created Olm and Megolm to authenticate the messages that manage who has access to encrypted groups.

For context: Olm is a clone of Signal’s Double Ratchet, and Megolm is the protocol used to exchange message keys for group communication over Olm. Room management is done via the Matrix layer (not Olm or Megolm), which doesn’t yet authenticate events by the user.

Practically speaking, this means that the server can change who is in the conversation - but the users will be able to see it if this happens. The actual attack here is that a malicious server could fake an invite from @admin:example.com to @eve:evil.com to join a room… at which point all the users at risk of encrypting messages to Eve will see “Eve has joined the room”. Similarly, a malicious server could add a malicious device to Alice’s account… at which point everyone at risk of encrypting messages to the malicious device will see a massive “Alice is no longer trusted” warning, assuming they verified Alice’s identity.

Now, obviously this isn’t great, and it’d be much better to block malicious membership changes entirely - and we’ve been working on that both in Matrix (MSC3917) and in MIMI/MLS. Meanwhile we’re in the process of mandating that Matrix clients must never encrypt to unverified devices, but given this is a significant breaking change for the network, we need to give time to implementations to adjust.

b. Attack against out-of-band verification: This attack exploits an insecure implementation choice enabled by a design flaw in the specification as there is no domain separation enforced there.

(Italic text is my emphasis.)

Not using domain separation was inexcusable in 2009, when Nate Lawson was angry about people continuing to write APIs vulnerable to length-extension attacks.

The Olm spec was written six years later in 2015.

The domain separation problem here is not a generally exploitable flaw, nor is it a bug in Olm (which is literally a clone of the Double Ratchet). Instead it’s effectively a type safety/confusion bug: the Matrix layer (https://spec.matrix.org/v1.10/client-server-api/#key-and-signature-security) stores either public keys or device identifiers for cross-signing in the same field. As the spec says:

Since device key IDs (ed25519:DEVICE_ID) and cross-signing key IDs (ed25519:PUBLIC_KEY) occupy the same namespace, clients must ensure that they use the correct keys when verifying.

In practice, this caused a vulnerability only on a single platform (the 1st gen JS sdk), so while it is obviously bad practice and a potential source of bugs, it’s not clear that this is a total disaster - it’s not like the design itself is intrinsically exploitable.
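The namespace collision the spec text warns about is a classic type-confusion shape, and typing the IDs when they are parsed is one way to rule it out. A minimal Python sketch, where `DeviceKeyId`, `CrossSigningKeyId`, and `demo` are hypothetical names rather than any Matrix SDK's API: both IDs serialize to the same `ed25519:...` string, so a string-keyed store lets one clobber the other, while a store keyed by the typed objects keeps them apart.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class DeviceKeyId:
    """Hypothetical typed ID for a device key ("ed25519:DEVICE_ID")."""
    device_id: str

    def wire(self) -> str:
        return "ed25519:" + self.device_id


@dataclass(frozen=True)
class CrossSigningKeyId:
    """Hypothetical typed ID for a cross-signing key ("ed25519:PUBLIC_KEY")."""
    public_key: str

    def wire(self) -> str:
        return "ed25519:" + self.public_key


KeyId = Union[DeviceKeyId, CrossSigningKeyId]


def demo() -> Tuple[int, int]:
    same = "AAAA"  # a device ID that happens to equal a public-key string
    dev, xsign = DeviceKeyId(same), CrossSigningKeyId(same)

    # Untyped: both IDs serialize to the same string, so one entry clobbers the other.
    untyped: Dict[str, str] = {}
    untyped[dev.wire()] = "device key"
    untyped[xsign.wire()] = "cross-signing key"

    # Typed: the class is part of the identity, so the entries stay distinct.
    typed: Dict[KeyId, str] = {}
    typed[dev] = "device key"
    typed[xsign] = "cross-signing key"
    return len(untyped), len(typed)


if __name__ == "__main__":
    print(demo())  # (1, 2)
```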

A common pattern with the Matrix disclosure is wanting to ask, "Who was the cryptography adult in the room when this was written?" followed by the chirping of crickets.

We’ve come a long way since starting E2EE in 2015, and https://matrix.org/blog/2022/09/28/upgrade-now-to-address-encryption-vulns-in-matrix-sdks-and-clients/ gives an example of how the current team operates in terms of addressing these issues.

f. IND-CCA break: This theoretical attack exploits a protocol design flaw.

The protocol design flaw, for the record, was using AES-CTR to encrypt information, then HMAC to authenticate the ciphertext, but not including the IV in the HMAC calculation.

This does not meet the bar for end-to-end encryption.

If these vulnerabilities were found from an earlier draft, then corrected by the team, I wouldn't mind so much. Iterative design is a thing in software development.

But if you want me to trust the protocol designs of people who felt qualified to design cryptography in 2015 when they didn't use domain separation, or completely forgot to authenticate their IV-- and didn't realize how badly they fucked up until Martin Albrecht, et al. pointed it out to them-- you're in for some disappointment.

Hang on. While it is most definitely a screwup that the IV was not MACed, the reason it was not prioritized is because it is not exploitable. An attacker who manipulates the IV to corrupt the plaintext has no mechanism to see what the corrupted plaintext actually is - so can’t sculpt an attack on the recipient (or infer the private key). Also, this issue only applied to file transfers and secret storage in Matrix - not Olm or Megolm traffic.

So yes: overall, it’s true that we’ve accumulated some baggage while implementing E2EE in Matrix (and the main complaint from Albrecht et al was that we’d added encryption incrementally and the layering leaves room for bugs), but we’ve learned from this: first we consolidated the encryption implementations on Rust so we could improve things in one code base, we got it audited, and are now busy simplifying the whole thing: shifting to Trust On First Use, refusing to encrypt for untrusted devices, and having improved the user-facing behaviour we will eventually get cryptographic-controlled group membership in place too.

We’re definitely not perfect here; but I personally feel the level of response here is overstated when you look at the specific issues and their impact. And none of the issues here were actually in the Olm or Megolm protocol, but instead how those protocols interacted with the Matrix layer. Agreed that MIMI/MLS is an interesting approach though - which is why we co-authored the MIMI draft: https://datatracker.ietf.org/doc/draft-ietf-mimi-protocol/ and why we’re also working on Matrix+MLS: https://arewemlsyet.com/ (although progress is slow there due to prioritizing refining the current E2EE first).

@soatok
Author

soatok commented May 16, 2024

@ara4n

Hi - thanks for writing this up.

This is not a technical write-up. It's a core dump of opinions from someone who's tired of being asked for said opinion by strangers because I'm disgusted with an unrelated project's leadership (i.e., Telegram's). It is not an invitation for commentary from anyone (but especially said strangers who keep asking me about my opinion on Matrix, which you are admittedly and thankfully not a part of).

Speaking as project lead for Matrix, the RHUL vulnerabilities were certainly not our finest hour. I think some of your points may be worth clarifying though,

My opinions do not need clarification, because they are just that: Opinions.

Other people are free to disregard my opinions if they prefer. I honestly don't care.

I only wrote this because I got tired of being asked, and wanted a convenient link to send them instead of having to explain what my opinions are over and over and over again.

But! Since we're talking technicals:

Hang on. While it is most definitely a screwup that the IV was not MACed, the reason it was not prioritized is because it is not exploitable.

The exploit is that invalid data was accepted. Just because you can't leverage it to trigger a higher impact secondary exploit to recover plaintext doesn't mean it's not exploitable in the sense of defeating the security properties of the HMAC.

If your goal in replying here was to start a message board debate with me, I'm not interested. This Gist doesn't exist for that purpose. I'm just tired of having to answer the same needling fucking question over and over again, and Mastodon replies aren't really large enough for nuance.

If, instead, your goal was to convince me of the perceived error of my ways for holding a negative opinion of your project, this sort of "not exploitable" rhetoric is having the opposite effect.

Do whatever you want with this information.

@soatok
Author

soatok commented May 16, 2024

@ara4n I decided to take the opportunity to look at your project's source code and found some vulnerabilities. Expect an email soon.

EDIT: Email sent to security@

@ara4n

ara4n commented May 16, 2024

hey - huge thanks for taking a look and following the SDP. would be super interested to know what you think of vodozemac, which is the Rust implementation we've switched to. (and yes, no desire for a message board debate here either)

@lucasmz-dev

@ColonelThirtyTwo Signal is a non profit; I doubt they'd turn into something like Telegram or Discord. They have a very clear mission, and they're well opinionated in what they do without hurting what other people are doing in the process.

@SkyfallWasTaken

SkyfallWasTaken commented May 18, 2024

@lucasmz-dev OpenAI used to be a nonprofit though (and in fact, still is!) - where are they now?

@RGBCube

RGBCube commented May 18, 2024

@ColonelThirtyTwo Signal is a non profit; I doubt they'd turn into something like Telegram or Discord. They have a very clear mission, and they're well opinionated in what they do without hurting what other people are doing in the process.

Wasn't OpenAI also a non-profit? It's definitely not impossible for Signal to follow the route of enshittification.

@ColonelThirtyTwo

@ColonelThirtyTwo Signal is a non profit; I doubt they'd turn into something like Telegram or Discord. They have a very clear mission, and they're well opinionated in what they do without hurting what other people are doing in the process.

That's certainly a point for Signal, don't get me wrong. And, for our sake, I certainly hope I'm wrong about Signal and they remain an excellent app.

But non-profit doesn't necessarily mean that it acts in the interest of its users, nor does it bar them from transitioning to a for-profit later. Between my experiences as a software dev at Datto and my observations of companies like Twitter and Boeing, it's not unusual for a company to be bought and the culture to do a 180 very quickly.

Which is why I'm interested in tech that shakes up the megaplatform paradigm.

@soatok
Author

soatok commented May 20, 2024

@ColonelThirtyTwo @RGBCube @lucasmz-dev @SkyfallWasTaken Please, take this conversation elsewhere.

@DraconicNEO

Most of my concern of Matrix vs Signal is that Signal is yet another corporate un-interoperable IM system that's a setup for another entrap-extract business model (aka "enshittification"). It's just a change of the guard - while Signal currently seems better than Telegram and Discord so far, it would be easy for them in the future to turn around, close source their app, lock down their platform, and start adding "premium" features and ads.

I don't want to sound like a Matrix fanboy - your post is certainly valid criticism, and not the first I've heard about Matrix. And I don't want to disparage Signal too much - I've heard good things about it, and it's good that it's open source and non-profit. I want to explore other paradigms that help solve these systemic problems rather than jump ship yet again to another product that is at risk of sinking in the future.

Pretty sure WhatsApp used to be a lot like Signal in the past, and now they are not anymore (considering mods and the operators can view people's messages, basically the opposite of what Signal is now). So yeah, it can happen. Could it also happen with something like Matrix? It's possible, but also slightly more difficult, because you'd need the compliance of all the servers to get that.
