@quinthar
Last active February 7, 2021 16:59
Expensify.cash end-to-end encryption proposal

Hi! I'm one of the developers behind Expensify.cash (and the CEO/founder of the company -- chat with me on Twitter @dbarrett) and these are some quick notes on how we might add end-to-end encryption. I'm generally familiar with the basics of encryption, and have read quite a bit on the Signal protocol, but am no expert, so I'm eager to get advice from those who are. In particular, I would like to build a system optimized for simplicity and security against very plausible real-world attacks people care about, without overengineering against exotic attacks that are unlikely to happen in the real world.

tl;dr

In short, I think this design can make Expensify.cash provide very strong protection against Your Friends, Your Boss, and Lawyers. And I also think it will protect you from The Cops, Hackers, and The Feds for all but the most severe concerns. But no amount of encryption will protect you from The Feds if sufficiently motivated, and anyone claiming otherwise is lying.

Attack Surface

To start, here are the major layers that an adversary might attack:

  1. The Network. This could mean eavesdropping on your wifi, intercepting your home internet connection to your provider, mass capture on some internet backbone, or targeted capture inside our datacenters right before the packets go to our cabinets.

  2. The Code. This could mean inserting an intentional vulnerability into our own code, or the code of some library we depend upon, or even into the operating system of our servers or your device.

  3. The Servers. This could mean someone establishing a persistent foothold in our servers, or even replacing our servers entirely with alternatives.

  4. Your Device. This means the phone in your hand or laptop in your possession (or the device of any friend you are talking with).

  5. The Math. This means the mathematical complexity of the encryption algorithms themselves.

Adversaries, Capabilities, and Countermeasures

Clearly there is no shortage of potential people who might want to intercept your communications. But here are probably the major categories you might think of, and a quick summary of their likely capabilities:

The Feds

This might be the CIA, NSA, FBI, or some other shadowy three-letter organization we don't even know about. Additionally, in light of Five Eyes, the GWOT, and a bunch of multilateral information sharing agreements, it's probably safest to assume nearly all democratic first-world nations have combined their efforts, with all of them being collectively known as "The Feds".

Though there's no way to know their full capabilities, I think it's safest to assume that they have at least full control over The Network (ie, they can realistically record broad swathes of encrypted internet traffic for long periods of time). In practice this is likely paranoid -- it's unlikely they are truly recording every single byte forever. But it seems safest to design on the assumption they are.

Furthermore, despite being a rather disturbing prospect, we should likely acknowledge that if sufficiently motivated, The Feds also have control over The Code: they can get a FISA warrant to secretly force any US citizen to ship modified code (ie, to insert a back door), and then use a gag order to prevent them from admitting it. Any US-hosted code (including all code in Github), as well as US-hosted binaries (including every app in the iPhone App Store and Google Play), also fall under their potential purview. Again, it's very unlikely that they are doing this in all but the most extreme situations. But we shouldn't ignore the possibility that they are, and that we'd have no way of knowing.

Next, we should also assume they have control over The Servers: they can get a warrant to access the physical hardware of any US-hosted server. This means if necessary they can force any US citizen to reveal any information stored on the servers, either at a point in time or on an ongoing basis. They might also force anyone to install special hardware, or really do anything you can imagine can be done with physical access to the actual hardware.

Finally, there's Your Device. And this might be hard to hear, but I think a good security model should assume that The Feds can access your device pretty much whenever they want, so long as they are sufficiently incentivized. It doesn't need to be some Hollywood heist plot: they would just stop you, physically take your phone from you, and put you in jail until you unlock it (assuming they don't already have some trick to get it themselves). Hopefully that's not true, but let's build a system that anticipates this possibility, and then any surprises are good surprises.

In fact, the only thing I think is reasonable to assume they don't have control over is the basic math of encryption itself. It's pretty widely inspected stuff, and even if The Feds have a monopoly on violence, they don't have a monopoly on mathematicians. Granted, every algorithm is always one breakthrough from being obsolete -- some new technique (or the ever-lurking threat of quantum computing) is always right around the corner. And it's likely that The Feds will have a leg up on any new techniques merely because they are more motivated than most to find them. But to the degree encryption can be trusted at all, I think it's a reasonable position to assume that it in fact works as advertised. If nothing else, the alternative is pretty demoralizing.

Regardless, I can assure you that as of right now, they aren't doing anything of the sort with Expensify's code, data, or servers, and we would fight every request to the fullest extent of the law. And I'm sure the CEOs of Signal, Google, and Apple would say the same. They might even claim that they would personally go to jail or wipe all their servers to avoid data falling into the wrong hands -- and they might even mean it!

Unfortunately, you can't actually trust any of us: it's entirely possible we are being forced to lie to the public, conceivably at risk of physical harm to us and our families. There are a wide variety of opinions on whether that's a good or a bad thing, but a full analysis of the moral foundation of a nation's monopoly on violence in a particular geographic region is out of the scope of this document.

The main thing is: if you are planning on becoming a "person of interest" of The Feds, I'd urge you to reconsider your life choices, and be damn sure whatever cause you are fighting for is worth it. Because "the grid" goes far and wide, and no amount of encryption will protect you forever.

Hackers

If The Feds have the most power due to just being able to get a warrant for this information, the next most threatening entity would be a nebulous group of "Hackers". These would be people who have technical sophistication and presumably some motive to attack, as well as a willingness to work outside the law. Hackers fall into a few major buckets:

  • Nation States - These would be major state-sponsored hacking groups who have enormous skill and patience, and no real profit motive. They are willing to spend virtually unlimited time and money, so long as doing so aligns with national security objectives. Russia, China, and Iran are the most commonly mentioned, but it's best to assume that every nation with a military has some form of offensive "cyber" (eyeroll) capability. Nation States will take the time to create bespoke attacks to specific services, or specific targets. Just like it's a pretty dangerous move to piss off The Feds, it's maybe not the best idea to piss off other Nation States with active hacking groups.

  • Failed Nation States - The only difference between a Nation State and a Failed Nation State is a profit motive. In general, there's only one Failed Nation State hacker, and that's North Korea's Lazarus Group. Unlike other nation states who are hacking for national security, North Korea is hacking for cold hard cash: much of their work is to generate hard currency to fund state operations and bypass sanctions. This is also the group that hacked Sony as punishment for releasing a movie about assassinating the North Korean dictator Kim Jong Un. Once again, if you can avoid it, probably best to avoid pissing off the North Koreans.

  • Industrial Espionage - The most common accusation for this is against China, which (supposedly) works in close coordination with major Chinese companies to steal military and civilian technology to give their industry a leg up. I don't think anyone really has a good idea what's going on here, but it seems safest to assume that it is in fact happening, and that if there is some reason for Chinese industry to steal your IP, you should assume they will try.

  • Profiteers - Hackers with an overt profit motive definitely exist. However, they generally target banks -- especially those who run off-the-shelf banking software that runs on Windows servers. These are typically very well organized groups that have a whole network of specialized parties that work in coordination to not just attack the servers, but also steal the money, transfer it abroad, and then eventually withdraw it from thousands of ATMs around the world. That last step especially is very dangerous, but in general every group distrusts all the others, so it's much safer and more cost effective to repeat attacks that are already known to work rather than investing in creating a new one.

  • Bitcoin Blackmailers - The next most dangerous group are those who are trying to steal and threaten to release embarrassing information, extorting you for Bitcoin. Another variation would be encrypting all your data to hold it hostage. Either way, these are likely opportunistic groups that will use general tools to hack a wide variety of systems -- probably starting with phishing attacks to install malware. This is a lot safer because there is no physical money to deal with, though Bitcoin is (despite its reputation for anonymity) quite literally the most carefully tracked currency in the history of the world, so it's not without risk to actually convert their ill-gotten gains into real money.

  • Script Kiddies - These are amateur hackers who are just learning the ropes, typically by running well-known off-the-shelf tools designed to test servers for known vulnerabilities. Script Kiddies are just in it for the lolz.

Regardless, there are as many kinds of hackers as there are Hollywood plots. But just like you don't make a unique defense to protect against every possible infection, the general defense against all these is good hygiene: keep all your systems patched and don't click on sketchy links.

The Cops

Granted, the line between The Cops and The Feds is pretty murky -- and the closer you model your behavior on Snowden or Bin Laden, the closer you get to The Feds territory (which as discussed, is a really dangerous place to be). But 99% of people who care about encryption have no interest in genuinely overthrowing the government, and instead are looking for protection against government overreach.

One scenario of concern could be excessive force or other abuse of authority by an individual officer, such as at a border crossing, traffic stop, or protest. Thankfully, the powers of any individual officer to compel you to reveal your information on the spot are very limited, and a good lock screen is your best defense.

That said, if you are arrested and charged with a crime (even a trumped up crime that will never be prosecuted, just to intimidate you), it's possible that you'll be asked to unlock your phone and present your chat history to officials. In this situation, end-to-end encryption and self-destructing chats are important tools to ensure you remain in control of the information you reveal, by preventing The Cops from subpoenaing your chat history out of our servers.

Accordingly, it's best to assume The Cops can gain control of The Servers via a valid subpoena, and maybe Your Device if they are particularly aggressive. But it's unlikely they could get access to The Network or The Code.

Lawyers

The next threat -- and we're finally getting to something a bit more realistic that might actually concern you -- is some Lawyer coming after you via civil lawsuit: them claiming you did something, and you claiming you didn't. This would be resolved in a court, and would generally result in a court ordering you to turn over any written documents (including chats) related to the case. To be clear, you are legally required to do this; encryption doesn't absolve you of this legal requirement. But it does ensure that the court can only make this request to you directly, and can't bypass you by sending the request straight to someone else (eg, us). Because if they did, we would honestly explain that we cannot help them due to you using legal encryption technology to protect your own privacy, and we don't have the key.

This means Lawyers have largely the same capabilities as The Cops: access to The Servers via subpoena, but probably not Your Device.

Your Boss

Expensify already has very strong privacy protections in place to ensure employers do not arbitrarily inspect the private data of employees. And similar to above, if your boss is served a legal subpoena to produce documents related to some civil lawsuit, any chats you have related to your employer are legally "discoverable" -- even if you use an end-to-end encrypted chat tool, or SMS, or one-time-pad encrypted paper notes, you are required to honor the court's request. But this proposed design creates very strong protections to ensure that no private communications with non-employees get accidentally revealed as well, by ensuring all chats with non-employees are fully encrypted and that neither Your Boss nor Expensify has the key.

Similar to The Cops and Lawyers, Your Boss can realistically have access to The Servers via subpoena in relation to a legitimate lawsuit, but probably not Your Device.

Your Friends

The final group you might want to protect your chats from being revealed to are the people immediately around you -- your friends, family, and so on. Again, a good lockscreen PIN along with biometric unlock is probably your best defense. But unless your grip is really good, it's not infeasible that your unlocked phone could be snatched from your hand. Toward that end, self-destructing chats are likely the strongest protection.

Strangely, Your Friends probably have an easier time getting access to Your Device than literally anyone else -- even though they have no access to anything else.

Common Attacks & Protections

Ok, with that quick review of attack surfaces and common adversaries, let's talk about the most likely attacks to happen in the real world that you might want protection against:

  1. Dragnet surveillance. Probably the most common fear is just of some undirected, omnipresent observer eavesdropping on all communications and taking notes for some unknown future purpose. I wager 90% of the desire for (and value of) end-to-end encryption is just to eliminate the anxiety that you might accidentally say something that could be misinterpreted and used against you at some future point by some unknown party. Even the most social person needs the ability to retreat to a private space every once in a while, just to recharge out of public eye. Accordingly, I think by far the most important value of end-to-end encryption is just so online conversations can maintain the same natural privacy expectations we have enjoyed for millennia offline, by eliminating any central unencrypted chokepoint through which dragnet surveillance could be realistically installed.

  2. Targeted surveillance. Next, even though very few of us are interesting enough to be the subject of any kind of targeted surveillance, nobody likes the lurking suspicion that someone is secretly listening in. The next value of end-to-end encryption is to ensure that if someone does have a legal reason to review your historical communications, they need to come directly to you to get it -- they can't do it secretly, without your explicit consent. This can be done by putting each user in full control of access to their communications history, such that any legal requests to access it must go through them. Similar to how protection against dragnet surveillance creates an instant anxiety relief even if no such surveillance is in place, the knowledge that nobody can access your history without going through you is a continuous relief even if nobody ever asks to see it.

  3. Device confiscation. The next most worrisome attack is simply having your device confiscated. Whether or not you are pressured to unlock it, the possibility that they have found some method to unlock your phone without you can be extremely alarming. To be clear, this has nothing to do with the presumption of guilt: as more and more conversations go online, the range of personal information contained on your device -- including not just chats but also personal photos -- creates an enormous risk of an invasion of privacy. Toward that end, the best protection against privacy invasion upon device confiscation is to use expiring messages, thereby limiting the potential for exposure.

  4. Device compromise. Though it's clearly alarming for someone to grab an unlocked phone out of your hand, it at least has the benefit of you knowing it happened. An even worse attack involves your device somehow being compromised without you realizing it. In Hollywood that seems to involve swapping out the SIM card when you aren't looking; in practice it's probably a more mundane issue of someone clicking on a malware link in some kind of phishing email. Either way, this is probably the most damaging attack because it conceivably allows your adversary to extract the decrypted communications directly from the device in an ongoing fashion. Unfortunately, there's really nothing encryption can do to prevent this because the attack happens after the decryption is finished.

  5. Key extraction. Though you're very unlikely to be dealing with an attacker this sophisticated, and this is nearly impossible to do without first confiscating or otherwise compromising the device itself, an attacker might somehow obtain a copy of the encryption key, enabling them to eavesdrop on the encrypted network communication and decrypt in an ongoing manner without needing ongoing access to the device. The best protection against this is to use a protocol that has two properties:

    • Forward Secrecy - This means that the information on the device can't be used to reveal all future conversations. The most common way to solve this is to frequently reset the encryption key, rendering any compromised key useless for future messages.
    • Backward Secrecy - As the name implies, if Forward Secrecy ensures the keys on the device can't be used to decrypt future messages, Backward Secrecy ensures that past messages (other than those currently stored on the device, which are revealed) cannot be decrypted using any key on the device. So even if you were recording all the network traffic to a device for weeks or years ahead of time, if you finally get ahold of the device and extract its key, the key will be useless for all of those previous years of messages. This is most commonly done by combining the frequent key resetting of Forward Secrecy with simply deleting the decryption key after it's used, rendering any outstanding copies of the historical encrypted messages undecipherable.
  6. Password extraction. Given the difficulty of extracting and using your encryption keys (not to mention the limited value of doing so as a result of Forward/Backward Secrecy), another attack is to simply extract the username/password you use to sign into the service (along with perhaps any underlying key used to power a two-factor authentication app) to simply sign in as you at a future time. The best protection against this is to make it really obvious when a new device signs into your account (by notifying the other devices), as well as using a "pairing" feature (where an existing device needs to actively authorize the new device) before revealing any sensitive data to the new device.

Summary of Encryption Primitives

Though there are a zillion important details being left out, here is a quick summary of the major tools used to build encrypted systems:

Symmetric (aka "Secret Key") Encryption

The simplest (though still insanely complex) form of encryption is called "symmetric" encryption, because the same key is used to both encrypt and decrypt. In general, all encryption works on "blocks" of data (eg, AES works on 16-byte blocks), meaning long messages need to be split up into many blocks, and small messages need to be "padded" to fill out the block. Symmetric encryption is extremely fast, in part because most modern chips have instructions specifically designed for it. It's also extremely secure: a block encrypted with a symmetric key is nearly impossible for an attacker to decrypt within any reasonable timeframe (ie, longer than your lifetime).
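To make the block-splitting and padding idea concrete, here's a quick Python sketch of PKCS#7-style padding (one common padding scheme) -- purely illustrative, not a full cipher:

```python
def pad(data: bytes, block_size: int = 16) -> bytes:
    """PKCS#7: append N copies of the byte N so the length becomes a block multiple."""
    n = block_size - (len(data) % block_size)
    return data + bytes([n]) * n

def unpad(padded: bytes) -> bytes:
    """Strip PKCS#7 padding: the last byte says how many bytes to remove."""
    return padded[:-padded[-1]]

def split_blocks(data: bytes, block_size: int = 16):
    """Split a padded message into fixed-size blocks for the cipher."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

msg = b"attack at dawn"            # 14 bytes -> padded up to one 16-byte block
padded = pad(msg)
assert len(padded) == 16
assert unpad(padded) == msg
assert len(split_blocks(pad(b"x" * 40))) == 3   # 40 bytes -> 48 padded -> 3 blocks
```

Note that a message that's already a block multiple still gets a full block of padding appended -- otherwise the receiver couldn't tell padding apart from real data.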

That said, encrypting each block with the same key isn't enough to fully protect the message, because if there are two blocks with the same "plaintext" content, the encrypted "ciphertext" will be the same for those blocks. Even if the attacker won't know precisely what that block is, they'll know that there were multiple instances of the same block. Said another way, even if it's unclear what the message is, it might reveal that the same message was sent several times in a row. Accordingly, symmetric encryption is paired with an encryption "mode" (seeded with an "initialization vector"), which generally combines the output of previously encrypted blocks with each new block, such that even the same block encrypted many times in a row will look random to an attacker.
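Here's a toy Python demonstration of why a "mode" matters. Note the keystream here is built from SHA-256 purely for illustration -- real systems use a vetted cipher like AES in an authenticated mode, not this construction:

```python
import hashlib

def prf(key: bytes, msg: bytes) -> bytes:
    # Toy pseudorandom function built from SHA-256 (illustrative only).
    return hashlib.sha256(key + msg).digest()[:16]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"sixteen byte key"
blocks = [b"same plaintext!!", b"same plaintext!!"]  # two identical 16-byte blocks

# No mode: each block is enciphered independently with the same keystream,
# so identical plaintext blocks produce identical ciphertext -- the repeat leaks.
no_mode = [xor(block, prf(key, b"")) for block in blocks]
assert no_mode[0] == no_mode[1]

# CTR-style mode: the keystream mixes in an IV and a per-block counter,
# so even identical blocks encrypt to different ciphertext.
iv = b"random-iv"
ctr = [xor(block, prf(key, iv + i.to_bytes(4, "big")))
       for i, block in enumerate(blocks)]
assert ctr[0] != ctr[1]
```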

Though not strictly a feature of symmetric encryption per se, it's becoming increasingly popular to "ratchet" your symmetric key over the course of communications using a "hash function", such that every message you send is encrypted with a different key. To break this down further, the purpose of a "one way hash function" (like SHA-256) is to generate a "fingerprint" of some input that perfectly identifies it, without being able to "reverse" it and reproduce the input. So two different people who have the same content can each independently hash it, and produce the same exact fingerprint. Then if you hash that fingerprint, you get another one -- and so on. This is a neat trick, because so long as two parties agree on the initial content (in this case, the content is a symmetric key), they can agree on an infinite sequence of keys -- each generated from the one before. So if both parties agree to "ratchet" the key (ie, replace the symmetric key with its own hash) after each message, then they will both change the key every single message -- in a way that appears completely random to an outside party. Even better, if someone intercepts one of the keys somehow, because the hash function is "non-reversible", the attacker wouldn't be able to figure out any of the previous keys that came before it (though they would be able to "ratchet along" for future messages).
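The symmetric ratchet described above is simple enough to sketch in a few lines of Python using the standard library's SHA-256:

```python
import hashlib

def ratchet(key: bytes) -> bytes:
    """Advance the key one step: the new key is simply the hash of the old one."""
    return hashlib.sha256(key).digest()

# Both parties start from the same shared secret...
alice_key = bob_key = hashlib.sha256(b"initial shared secret").digest()

# ...and ratchet independently after each message, staying in sync.
for _ in range(3):
    alice_key = ratchet(alice_key)
for _ in range(3):
    bob_key = ratchet(bob_key)
assert alice_key == bob_key

# One-wayness: an attacker who steals the current key cannot recover any
# earlier key (reversing SHA-256 is infeasible), though -- as noted above --
# they CAN ratchet forward to future keys until the chain is re-seeded.
```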

Symmetric encryption is also called "secret key" encryption because anyone who has the key can decrypt the content, so it's important to keep that key "secret". That is hard to do when two parties are communicating over a public channel that others can see, which is what brings us to...

Asymmetric (aka "Public Key") Encryption

To solve the "bootstrapping" problem of enabling two parties to agree upon a common symmetric key, a second kind of "asymmetric" encryption exists that uses a key split into two parts: the "public" and "private" half. As the name implies, the "public" half of the key can be safely shared in the open, and in fact is the most common way of cryptographically identifying yourself to the rest of the world. The "private" half, again as the name implies, must be kept secret, and the fact that you are the only one who knows it is what enables you to prove you are in fact the person who is identifying yourself with the corresponding public key. Though like anything there are a million different algorithms, the two main asymmetric encryption algorithms are:

  • RSA - A workhorse from the 70's, RSA is neat in that a message encrypted with one half of the key can only be decrypted with the other. This means anybody can securely send a message to someone over a public network by encrypting the message with the recipient's public key, confident that only the recipient has the private key that can decrypt it. It also works in reverse: someone can demonstrate that they do in fact have the private key by "signing" a message using the private key. This is done by sharing a message like "My name is David", and then encrypting that message with the private key, enabling anybody to decrypt it with the public key and confirm that the message was in fact sent by the holder of the private key. RSA (with a 2048-bit key) encrypts in blocks of 256 bytes, but in practice the maximum message size is 190 bytes due to the need to "pad" the message. RSA is much, much slower than symmetric encryption, and much more computationally expensive when creating a key. Accordingly, in nearly all cases, RSA is not used to encrypt actual content (ie, by splitting it up into a bunch of 190-byte blocks and encrypting each separately). Rather, the sender would typically pick some other symmetric key (aka a "session key", as it's generally unique to each communication session), encrypt that session key with RSA, then send the encrypted session key to the recipient. After the session has been "initiated" in this way, subsequent communications would just use the much, much faster symmetric encryption key.

  • Diffie-Hellman (DH) - DH isn't just an alternative to RSA, it's actually a totally different thing. Though both RSA and DH have the same concept of public/private keys, DH doesn't let you encrypt with one key and decrypt with the other, like RSA. Rather, DH is used exclusively to enable two parties to independently derive the same session key. Basically, given two parties Alice and Bob, each of whom has publicly revealed their public key halves (Alice.pub and Bob.pub), and secretly stored their private key halves (Alice.priv and Bob.priv), DH allows Alice and Bob to each combine their private key with the other's public key, to independently calculate the same shared session key. So it accomplishes the same real-world effect as RSA in that the end result is both parties have securely agreed upon a symmetric key to encrypt/decrypt the session content -- it just goes about it a different way. And while DH can't encrypt arbitrary messages directly with the public key like RSA -- nor can it sign messages with the private key -- DH has a very significant advantage over RSA in that generating a new key is vastly less expensive, so it's much easier to "rotate" (ie, regenerate) a DH key than an RSA key. Indeed, DH with Curve25519 can use any 32-byte block of data as a DH key, meaning generating a new key is as simple as picking a random 32 bytes. This is particularly useful when doing a "DH ratchet", which is just like a symmetric key ratchet (explained above), but changing your DH key with every message. Ratcheting with RSA would likely be prohibitively expensive.
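To make both primitives concrete, here's a textbook sketch in Python using the classic toy parameters. The numbers are comically small and this is in no way secure -- but the modular arithmetic is the same shape as the real thing:

```python
# --- Textbook RSA with toy primes (p=61, q=53): illustrative only, never secure ---
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)   # n = 3233 is the public modulus
e = 17                              # public exponent
d = pow(e, -1, phi)                 # private exponent = modular inverse of e

m = 65                              # the "message" (a number < n)
c = pow(m, e, n)                    # encrypt with the public key...
assert pow(c, d, n) == m            # ...decrypt with the private key

sig = pow(m, d, n)                  # "sign" with the private key...
assert pow(sig, e, n) == m          # ...anyone verifies with the public key

# --- Finite-field Diffie-Hellman with toy parameters (p=23, g=5): same caveat ---
P, g = 23, 5
alice_priv, bob_priv = 6, 15        # private halves, kept secret
alice_pub = pow(g, alice_priv, P)   # public halves, shared in the open
bob_pub = pow(g, bob_priv, P)

# Each side combines its own private key with the other's public key...
alice_shared = pow(bob_pub, alice_priv, P)
bob_shared = pow(alice_pub, bob_priv, P)
assert alice_shared == bob_shared   # ...and derives the same session key
```

In a real system the DH shared secret would then be run through a hash/KDF to produce the actual symmetric session key, rather than used directly.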

Problem

Alright, with all that background out of the way, what is the actual problem we are trying to solve? Security is never absolute, it is always a tradeoff, so as we think about Expensify's end-to-end encryption design, what specifically are we trying to protect against? I would suggest that the bulk of our users would prioritize being protected from the following, in order:

  1. Mass surveillance. Most people have nothing to hide and no particular "need" for secrecy. But when continuously observed we get anxious and behave differently. People just want to know that there is nobody casually and continuously looking over their shoulder, just for general peace of mind.

  2. Your Boss. Expensify is a tool used both for personal and professional use -- and because we're not robots, we often have personal relationships in the workplace that need protecting. Expensify needs to guarantee that nobody can see your private chats except those who participated in them.

  3. Lawyers. As law-abiding citizens we have an obligation to participate in "legal holds" and "discovery", and to enable our employers to do the same. But we should put our users in control of their information by ensuring that the requests come directly to them, rather than people they don't know revealing their private information to other people they don't know.

  4. The Cops. We expect that many of our users will be engaged in righting society's many wrongs, which might include protest -- and even civil disobedience. This could involve tangling with law enforcement -- possibly in the streets, possibly at border crossings -- who have an imperfect track record in honoring the full rights of citizens. Ideally the information they could obtain would be limited.

  5. Hackers. Expensify users would expect that every reasonable precaution will be taken to ensure that hackers of all kinds will not be able to access private information, whether business or personal.

That said, at risk of being controversial, I'd like to call out one threat that is "out of scope" for a v1 of our end-to-end encryption:

  • The Feds. A full protection against the united global efforts of the world's richest and most technologically advanced nations is... difficult. Indeed, I'd say it's basically impossible for any private company to realistically claim a serious defense against the multi-trillion dollar military/intelligence complex, and anyone claiming otherwise is simply delusional. More to the point, given that these nations have all the legal and physical authority to force anyone involved in this process to bypass their own security and lie to everyone about it, as a customer, nobody should trust anyone who claims they are providing this kind of impossible protection.

Solution Summary

The above sections outline the various kinds of attackers and attacks that we should protect against, and hinted at how those protections might work in practice. Let's dig into each of those protections now:

  • Device keys. The first thing the client device does is generate an asymmetric key, which will be used to communicate with this specific device (of which a user might have multiple). Whenever the device's mailbox on the server is empty (which is most of the time), this key can be safely regenerated, with new public keys broadcast out to the other devices.

  • User authentication. Expensify uses a standard email/sms + password + optional (but highly recommended) two factor authentication scheme. The device's public key is uploaded with this authentication request, and on successful authentication, is used to open a store-and-forward "mailbox" for other devices owned by the same user to send it messages.

  • New device pairing. In particular, the new device broadcasts to all the other devices "I have successfully authenticated, and my random pairing PIN is XXXX!" by encrypting this message to each other device's public key, and sending it to the other devices' mailboxes. Every device gets this message (either immediately, if online, or later when it comes online), and shows a "Did you sign in to the device showing PIN XXXX?" modal. Whether yes or no, that device broadcasts a message to all other devices saying "This device's pairing request was approved/denied", and they all hide the pairing modal.

  • Prekey initialization. The client takes a page from Signal's book and generates 100 "prekeys" and uploads their public keys to the server. The server will give each of these keys out as necessary, one at a time, whenever initiating a new session. The server notifies all devices whenever a prekey is used, and each device uploads a new one to replace it, such that there are always at least 100 prekeys for each user.

  • Prekey sharing. Because Expensify is built for multi-device operation, the private half of each prekey needs to be synchronized between all devices. Accordingly, when uploading the public prekey to the server, each device also encrypts the private prekey half with each other device's public key, and inserts it into that device's mailbox. This mailbox will deliver instantly if the client is online, otherwise it'll be held until the client comes online again. In this way, all of a user's devices will have every prekey's private half, and thus be equally ready to process the new incoming session.

  • Session initialization. Every chat in Expensify is a room with 2 or more users -- there is no technical differentiation between "1:1" and "group" DMs. Furthermore, unlike Signal, all messages are recorded on the server forever, and delivered to clients "in order". In essence, each chat is a sequential list of messages, each of which can be optionally encrypted. (Some rooms have encryption disabled, to enable serverside searching and legal hold discovery for business purposes.) Accordingly, a "new chat" just means a new room has been created with two or more people in it -- and once created, it never goes away. So if Alice is creating a room with Bob and Carol:

    1. Alice's device 0 (Alice0) notifies the server it is creating a chat with Bob and Carol
    2. The server picks the next prekey for Alice, Bob, and Carol, and adds a message to the chat saying Alice, Bob, and Carol are using prekeys A, B, and C respectively -- all devices currently online receive this message (and all that are not online will receive it when they sign in)
    3. Alice0 sees it has no symmetric "room key" R for this room
    4. Alice0 picks a random symmetric "room key" R and creates a message M along the lines of "Alice will be encrypting future messages to this room with room key R, until further notice."
    5. Alice0 encrypts message M once each for Bob's and Carol's public prekeys, as well as once again to her own prekey (such that her other devices will get it), combines it all into a single post signed by her private key A, and sends to the room
    6. Bob and Carol, along with all of Alice's other devices receive this message on all their devices, each:
      1. verify the signature using Alice's public key A
      2. decrypt using their own prekey's private half
      3. record Alice's symmetric room key for future use
  • Session communications. Now Alice has an active room key known to Bob and Carol; all of Alice's future messages are just encrypted once using the symmetric room key, and all devices for all users can decrypt it.

  • Disappearing messages. If Alice would like disappearing messages (eg, after 1 week) then at some cadence (eg, daily):

    1. Alice0 picks a new room key and broadcasts it out just like before, but this time includes, along with the key itself, a request to delete all messages encrypted with this key after 7 days
    2. All devices keep track of all historical room keys, and delete them at their configured expiration dates

Conclusion

This proposal to add end-to-end encryption is heavily influenced by Signal's design, but adapted for a different set of requirements, and hopefully simplified where possible as a result.

@quinthar

quinthar commented Jan 31, 2021

Note to self:

  • Maybe we can replace the "device mailbox" with just a private chat room? It's basically the same thing.
  • Rather than having all devices share prekeys, perhaps they all just generate their own separate set of prekeys -- and then each device independently joins every room, basically like a totally separate user. This way the client doesn't broadcast out its new room key in a way that someone who had cloned it would be able to decrypt. (Currently it says it encrypts with its own prekey in order to distribute to the other clients, but this would break forward secrecy.)
  • Actually if we eliminate prekey sharing, we don't really need the mailbox...
  • How does a user/device join an existing room and get access to past keys/messages?

@olabiniV2

olabiniV2 commented Jan 31, 2021

To answer your two questions.

  1. The main reason is that Signal doesn't assume that the other people in a chat are friendly. In general, when creating keys, you want contributions from everyone involved, in order to minimize the risk that an adversary can control the key and cause bad results. This also protects against other potential issues, such as a bad RNG on one of the devices.
  2. It's important to remember that the Signal ratchet is not just using an incremental hash. It is also ratcheting using new DH exchanges. And since those DH exchanges use new, fresh randomness, even if an attacker has a snapshot of the device at one single point in time, the new randomness contributed from both sides means that they won't be able to actually decipher messages after a ratchet has happened. That's the reason why this property these days is usually known as "post-compromise security".

@olabiniV2

olabiniV2 commented Jan 31, 2021

Expanding a little bit on question 2. Signal uses what is called a Double Ratchet. What does that mean? It means that it ratchets at two levels. One level is the DH-ratchet, which is what I described in the previous post. This ratchet provides for new randomness, and gives post-compromise security. The hash-ratchet, which rolls a hash forward for each message, and generates keys from this, is the main mechanism for providing forward secrecy in the same session. The basic idea is that each time you send or receive a message, you have the current hash, and can forward that, and then delete the old hash. In that way, an attacker that gets access to the device will not be able to get the previous keys - since the hash can only move forward.

@olabiniV2

Also, expanding on question 1. There's another reason to use DH, instead of just sending a key - and that is also for the reason of forward secrecy. If someone collects all traffic, and then gets access to the private keys for a person, they can get the symmetric key, and decrypt everything they've collected. If you use DH instead, this threat doesn't exist. In Signal, the long-term key is only used for verification of identity, nothing else.

@olabiniV2

And of course, once the session has been initialized, that first session key, or root key, will be ratcheted away. In this way, there doesn't exist anything on the device that can decrypt earlier conversations.

@quinthar

Thank you @olabiniV2 for your comments!

In general, when creating keys, you want contributions from everyone involved, in order to minimize the risk that an adversary can control the key and cause bad results.

Can you elaborate on this? How does me incorporating data from you make my key more secure than just randomly picking it from scratch? I almost understand this: it seems that by deriving new keys from a combination of past keys + new randomness, it prevents some attacker who has compromised a past key from inserting themselves into the conversation... somehow. But I'm not quite adding up in my mind how it does that.

This also protects against other potential issues, such as a bad RNG on one of the devices.

Oh wow, very interesting!

One level is the DH-ratchet, which is what I described in the previous post. This ratchet provides for new randomness, and gives post-compromise security.

I agree the DH-ratchet does this: both sides of the conversation are constantly generating and sharing a new public key with the other party, and each time I receive your updated public key, I use DH to combine your public key and my private key to generate the new session key you will be using to encrypt new messages to me.

However, why use DH? Why not just have you generate a new symmetric key, encrypt it with my public key, and then just send it along with your next message such that I can decrypt it with my private key? Why send me a new public key and force me to use DH to generate your new symmetric key, rather than simply encrypting the symmetric key and giving it to me?

The best reason I can come up with is that it might be slightly more efficient to just send over your new public key and have me derive the new symmetric key, rather than sending over your new public key and ALSO an encrypted package containing the new symmetric key: if I have your public key, I can generate the symmetric key so the encrypted package is redundant.

But is that it? Or is there some fundamental cryptographic benefit that I'm not seeing?

There's another reason to use DH, instead of just sending a key - and that is also for the reason of forward secrecy. If someone collects all traffic, and then get access to the private keys for a person, they can get the symmetric key, and decrypt everything they've collected. If you use DH instead, this threat doesn't exist.

I must be misunderstanding (or have miscommunicated) because I don't think that's true. I think both of these are equally secure from a mathematical perspective:

  1. You randomly pick a new asymmetric key A, you send me your public key A.pub, I use DH to combine A.pub with my private key B.priv to generate a new symmetric key X
  2. You randomly pick new symmetric key X, you encrypt it with my public key B.pub, I decrypt it with my private key B.priv

Either way, you will have communicated a fresh new X to me in a secure fashion. I don't think DH offers any cryptographic or post-compromise security advantage when you already have a secure session in place. I think the genius of DH is enabling a new secure session to be created from scratch.

The hash-ratchet, which rolls a hash forward for each message, and generates keys from this, is the main mechanism for providing forward secrecy in the same session. The basic idea is that each time you send or receive a message, you have the current hash, and can forward that, and then delete the old hash. In that way, an attacker that gets access to the device will not be able to get the previous keys - since the hash can only move forward.

Ahh, that's interesting. I've been focusing on the hash ratchet being reproducible by an attacker who clones the device, who is able to "ratchet forward" with the hash, thereby getting all future keys (until the next DH ratchet). However you make a great point that if an attacker clones my device halfway between DH ratchet updates, the hash ratchet will ensure they don't have keys to the previous several messages.

That said... how long in real world terms between DH-ratchet increments? I've been thinking this would be a matter of seconds... but now I'm thinking about how asynchronicity is a big goal of the Signal design. So it's possible I actually send you a hundred messages while you are offline, over the span of many days, without getting any response (and thus, without the DH ratchet advancing). So the hash ratchet might actually protect hours or days of previous messages from being decrypted. Interesting!

Though this presumes that a user is using "disappearing" messages configured on a very short timeframe, otherwise those messages will just be hanging out on the device and would be compromised anyway, without all the hassle of decryption. But if someone is using (for example) 5-second disappearing messages, they would:

  1. Write the message
  2. Encrypt it with the current sending symmetric key
  3. Advance the sending symmetric key using the hash ratchet
  4. Queue it to the recipient's mailbox
  5. Wait 5 seconds
  6. Delete the key and message locally

The message could securely wait in the recipient's queue indefinitely, and only the recipient will be able to get the key (because only the recipient is able to advance the hash ratchet to generate the decryption key). Iiinnntteerrreesssstttinnggg..... Am I thinking about this the correct way?

Thanks again for your detailed response and for helping me understand these nuanced points!

@olabiniV2

Hi,

Let me see if I can clarify some of the confusion here.

Before I jump in to respond to your specific points, let me mention that you're missing one adversary in your analysis - this is the peer you are conversing with. (I know you only specify group chats here, but let's simplify this to a 1-to-1 case). The peer themself can be hostile in various ways. Their device can be compromised, which gives the same effect. And finally, the peer can be compromised in some way - by blackmail, by a judicial order, etc. For all these reasons, the peer should be considered as a potential adversary as well.

Can you elaborate on this? How does me incorporating data from you make my key more secure than just randomly picking it from scratch? I almost understand this: it seems that by deriving new keys from a combination of past keys + new randomness, it prevents some attacker who has compromised a past key from inserting themselves into the conversation... somehow. But I'm not quite adding up in my mind how it does that.

Not exactly. This simply relates to the point I made above. You need to consider your conversation partner as a potential adversary. So, if Sita and Rama are talking, Sita should consider the possibility that Rama is hostile, and Rama should consider the possibility that Sita is hostile. This can include many different things, including the possibility that the other will send malware hidden inside some kind of regular document format, or try to use some kind of exploit against your machine. When it comes to the question of creating shared keys, it's considered recommended practice to always get contributions from both sides, so that neither side can control the key. If Sita could control the shared key completely, it might be possible for them to use that to attack the algorithm or implementation in some way. Because of this, and many other reasons, it's better to simply get contributions from both sides, something that DH does efficiently, cleanly and in a well understood way.

However, why use DH? Why not just have you generate a new symmetric key, encrypt it with my public key, and then just send it along with your next message such that I can decrypt it with my private key? Why send me a new public key and force me to use DH to generate your new symmetric key, rather than simply encrypting the symmetric key and giving it to me?

When you say "my public key" here, I assume you are referring to the long term public key for each participant? Or maybe you refer to the public key generated at the beginning of the session? If it's the latter, that key shouldn't exist anymore. We want both sides to delete these "session public/private key pairs", in order to get forward secrecy. Basically, the idea is that we immediately delete keys as soon as we don't need them. If you're talking about the long term keys, we have another problem, which is that this would break post-compromise security, at least from one side. Remember, the idea of post-compromise security is that new randomness should be injected into the session, in such a way that a one time compromise on one side or both sides can't continue being a threat. But if a one-time compromise has happened, that means that the long term private key has been stolen, and the current session keys, which means that the attacker could use this private key to decrypt the new symmetric key. Thus, there's no post-compromise security anymore. This is why it's extremely important that you use new asymmetric key-pairs every time you do the DH ratchet.

There's another reason to use DH, instead of just sending a key - and that is also for the reason of forward secrecy. If someone collects all traffic, and then get access to the private keys for a person, they can get the symmetric key, and decrypt everything they've collected. If you use DH instead, this threat doesn't exist.

I must be misunderstanding (or have miscommunicated) because I don't think that's true. I think both of these are equally secure from a mathematical perspective:

1. You randomly pick a new asymmetric key `A`, you send me your public key `A.pub`, I use DH to combine `A.pub` with my private key `B.priv` to generate a new symmetric key `X`

2. You randomly pick  new symmetric key `X`, you encrypt it with my public key `B.pub`, I decrypt it with my private key `B.priv`

Depending on what you mean with B.pub in this case, these are not necessarily equivalent, no - especially if you refer to the long term key for B. But yes, it is possible to have a scheme where you do something like this:

  1. A generates a new asymmetric key pair.
  2. A sends A.pub to B inside the secure channel.
  3. B generates a new symmetric key
  4. B encrypts the symmetric key with A.pub and sends it to A.

In this case, you do end up with a situation where the result should be as secure as using DH - if you ignore the point above about contributions from both sides. But compare this to the steps for DH:

  1. A generates a new asymmetric key pair.
  2. A sends A.pub to B inside the secure channel.
  3. B generates a new asymmetric key pair.
  4. B sends B.pub to A inside the secure channel.

You don't really get any benefit from B generating the symmetric key directly. There's the same amount of data sent. And by using DH, you do get the benefit of contributions from both sides. Another reason is from a perspective of aesthetics. For these kinds of protocols, it's a good idea to have as much symmetry as possible between the two sides. It makes it significantly easier to analyze and understand what is going on. It also reduces code complexity, which is always an enemy for security. Of course, you sometimes have to break symmetry, but you should have a good reason to do it.

Either way, you will have communicated a fresh new X to me in a secure fashion. I don't think DH offers any cryptographic or post-compromise security advantage when you already have a secure session in place. I think the genius of DH is enabling a new secure session to be created from scratch.

Remember, the secure session doesn't actually help with confidentiality in the post-compromise case. It only helps with authentication and integrity.

Ahh, that's interesting. I've been focusing on the hash ratchet being reproducible by an attacker who clones the device, who is able to "ratchet forward" with the hash, thereby getting all future keys (until the next DH ratchet). However you make a great point that if an attacker clones my device halfway between DH ratchet updates, the hash ratchet will ensure they don't have keys to the previous several messages.

Exactly.

That said... how long in real world terms between DH-ratchet increments? I've been thinking this would be a matter of seconds... but now I'm thinking about how asynchronicity is a big goal of the Signal design. So it's possible I actually send you a hundred messages while you are offline, over the span of many days, without getting any response (and thus, without the DH ratchet advancing). So the hash ratchet might actually protect hours or days of previous messages from being decrypted. Interesting!

Yes, you're getting it. This kind of ratcheting model started with the OTR protocol, and the idea is that since ratchets only happen when messages are sent, there can be a long time between them. OTR implements heartbeat messages to make sure that ratchets move forward sometimes even without user-level messages, but yeah, the session can live for a long time, especially with Signal.

One thing to keep in mind with all these features, like deniability, forward secrecy and post-compromise security, is that they might not seem so important right now. But we are designing protocols that will be used for the next 5-10 years, or maybe even more. And it's likely that the attack scenarios will change moving forward. Security doesn't work retroactively, so we need to put these protections in place, before they become a serious problem. As one example, notice that the Lavabit case could have been a disaster, primarily because the server was not configured with TLS with forward secrecy, so someone getting hold of the RSA key would have allowed them to decrypt ALL traffic. TLS 1.3 has now made forward secrecy mandatory, for these kinds of reasons.

Another thing to keep in mind - the integrity and verification of the long term keys is very important. Without that, end-to-end encryption doesn't really do very much. The ultimate goal of end-to-end encryption is that you don't have to trust the server. But unless verification is possible, that doesn't really work. In your proposal you also talk about the possibility of just replacing the keys every week or so. I would strongly advise against that, for verification reasons.

Cheers

@quinthar

quinthar commented Feb 3, 2021

When it comes to the question of creating shared keys, it's considered recommended practice to always get contributions from both sides, so that neither side can control the key. If Sita could control the shared key completely, it might be possible for them to use that to attack the algorithm or implementation in some way.

If Sita were compromised, why does it matter if Sita "controls the key" -- Sita has it and can just decrypt everything, whether or not it's randomly picked or generated via DH.

Because of this, and many other reasons, it's better to simply get contributions from both sides, something that DH does efficiently, cleanly and in a well understood way.

Other than someone unwittingly having a compromised RNG, what are some of the "many other reasons" why it's better to generate a key using DH than just pick one randomly? Furthermore, if my RNG is compromised by an attacker (meaning, they know what random numbers it's going to generate), then the attacker would be just as able to guess any random asymmetric key I generate as any symmetric key.

I understand the value of picking new keys and discarding the old ones, that makes sense. But the narrow question I'm trying to ask is, if we assume the same exact starting conditions:

  • A.pub/priv is a fresh new secure asymmetric key
  • B.pub/priv is a fresh new secure asymmetric key

What is the security difference between:

  1. Randomly picking a symmetric key:

    1. A generates a random symmetric key X
    2. A encrypts X with B.pub and sends B.pub(X) to B over a public, unencrypted channel
    3. B decrypts X = B.priv( B.pub(X) )
  2. Generating the symmetric key with DH:

    1. A sends A.pub to B over a public, unencrypted channel
    2. B sends B.pub to A over a public, unencrypted channel
    3. A calculates X = DH(A.priv, B.pub)
    4. B calculates X = DH(B.priv, A.pub)

I'm trying to keep this example very narrowly constrained, because I think X is equally secure in either case, and I'm struggling to understand why it wouldn't be. Thanks for your time educating me on this!

@olabiniV2

When it comes to the question of creating shared keys, it's considered recommended practice to always get contributions from both sides, so that neither side can control the key. If Sita could control the shared key completely, it might be possible for them to use that to attack the algorithm or implementation in some way.

If Sita were compromised, why does it matter if Sita "controls the key" -- Sita has it and can just decrypt everything, whether or not it's randomly picked or generated via DH.

Well, first of all, the content is not the only thing to protect. If Sita has been compromised, that opens the door for Sita to mount other kinds of attacks on Rama, which aren't necessarily focused on just the content. For one thing, it could be attacks aimed at breaking the crypto-system for Rama, getting access to Rama's long-term private key. It could be other kinds of attacks, aimed at exploiting the implementation of the crypto-system, in order to compromise the device, and so on. It might be other types of attacks on the crypto-system, which would invalidate the security properties of the system, by invalidating deniability, or other properties.

Because of this, and many other reasons, it's better to simply get contributions from both sides, something that DH does efficiently, cleanly and in a well understood way.

Other than someone unwittingly having a compromised RNG, what are some of the "many other reasons" why it's better to generate a key using DH than just pick one randomly? Furthermore, if my RNG is compromised by an attacker (meaning, they know what random numbers its going to generate), then the attacker would be just as able to guess any random asymmetric key I generate as any symmetric key.

Yes, that is true, assuming that the RNG is completely broken. But that's not usually the case - usually the RNG is degraded in some way, which means that mixing in randomness from two sides might still make the situation good enough to survive for longer than if all the randomness came from one side.

Another reason why DH might be better is that it is interactive. This is related to the factor of key contribution, but slightly different. By using DH you are guaranteeing that the other party will not be able to pre-compute any kind of complicated answer that might be used for attacks.

Fundamentally, using DH in this setting is simply the conservative choice. It protects against many possible problems, and also gives us some comfort against problems we haven't thought about yet. In this kind of situation, the real question is really why you would not choose it.

I understand the value of picking new keys and discarding the old ones, that makes sense. But the narrow question I'm trying to ask is, if we assume the same exact starting conditions:

* `A.pub/priv` is a fresh new secure asymmetric key

* `B.pub/priv` is a fresh new secure asymmetric key

What is the security difference between:

1. Randomly picking a symmetric key:
   
   1. `A` generates a random symmetric key `X`
   2. `A` encrypts `X` with `B.pub` and sends `B.pub(X)` to `B` _over a public, unencrypted channel_
   3. `B` decrypts `X = B.priv( B.pub(X) )`

2. Generating the symmetric key with DH:
   
   1. `A` sends `A.pub` to `B` _over a public, unencrypted channel_
   2. `B` sends `B.pub` to `A` _over a public, unencrypted channel_
   3. `A` calculates `X = DH(A.priv, B.pub)`
   4. `B` calculates `X = DH(B.priv, A.pub)`

I'm trying to keep this example very narrowly constrained, because I think X is equally secure in either case, and I'm struggling to understand why it wouldn't be. Thanks for your time educating me on this!

Well, if the public, unencrypted channel is not authenticated, it doesn't really matter. Both are completely insecure. But assuming you have an authenticated channel, then 2 is still more secure than 1, for the reasons I've outlined in this and previous posts. Also, remember that in 1, you are missing a step - B needs to send B.pub to A before the process starts.

@quinthar

quinthar commented Feb 4, 2021

Thanks @olabiniV2, I really appreciate your answers! However, I'm not sure I've really communicated my question effectively. I've tried writing it up in a lot more detail here: https://crypto.stackexchange.com/questions/87981/is-it-more-secure-to-use-diffie-hellman-to-generate-a-symmetric-aes-key-for-use Can you please answer there, or here, if you know? Again, thank you so much for your time!

@olabiniV2

Hi, reading that post, you are bringing up several things that you haven't mentioned here, the most prominent one being the use of combining RSA with CBC directly. First of all, that's not a great idea at all, so let's look at that.

In general, you don't want to encrypt content with RSA if you can avoid it. There are several reasons for that, including problems with padding, and problems with encoding the data in a proper way (for RSA to function, you need to encode the content as a number, sometimes with constraints; this is easy to do with random data, but hard to do with content). There are other problems as well, including that encrypting non-random content can lead to problems with RSA that weaken the security.

And then we have CBC on top of that. First of all, you should avoid using CBC if you can. It's not considered a particularly safe mode these days. If you need to manage your own cipher modes, you should use GCM, CCM or another authenticated encryption mode - or, you need to implement integrity checking yourself, something which is not recommended, unless you know exactly what you're doing and what the effects are on your overall protocol. In general, CBC and other cipher modes are analyzed and proven assuming the specific cryptographic properties of symmetric algorithms. There's absolutely no guarantee that they will work the same with asymmetric ciphers. In particular, if you use the same key to encrypt two or more blocks of data that have some mathematical relationship (which obviously something using CBC has), you are setting yourself up for a very bad surprise.

So, in summary, IF you absolutely HAVE to ignore the DH advice, you should still generate a symmetric key and use that with an existing symmetric algorithm and cipher mode - for example AES with GCM, or ChaCha20 with Poly1305, and so on.

Just to be clear here. No existing protocol I have ever heard of, EVER uses RSA for anything else than encrypting a symmetric key. These things are exceedingly complex, and there are good reasons for using the conservative choices. Protocol design is not easy. It takes a lot of experience, and a lot of knowledge of the kinds of areas where problems can happen. RSA is very error prone. There have been many, many problems with TLS that were caused by this, and TLS had a huge number of experienced people involved in its design.

When it comes to the original question of DH for deciding on a key, and using your RSA-encryption scheme, I feel I have sufficiently answered it already in the above comments. I'm starting to get the feeling that you don't really want to accept the reasons I've given, so I don't think there's any point in me continuing arguing that point here.

@quinthar

quinthar commented Feb 5, 2021

I'm starting to get the feeling that you don't really want to accept the reasons I've given, so I don't think there's any point in me continuing arguing that point here.

Ah! I'm sorry, I don't want to sound unappreciative or seem that I'm ignoring your advice. I'm just trying to understand why you are giving it. I'm not claiming you are wrong, I'm just not quite understanding in detail why you are right -- but I'm trying to.

No existing protocol I have ever heard of, EVER uses RSA for anything else than encrypting a symmetric key.

I agree, I haven't either, and I'm thankful for you helping me understand why. If RSA with OAEP is an extremely tried and true method of encrypting messages -- proven secure after decades of research against every conceivable attack -- why does the content of the message matter? If it's secure for delivering N back to back messages, why is it only secure when those messages contain symmetric keys, but not secure when they contain actual content?

Again, I'm not trying to ignore your advice -- I'm trying to laser focus on one extremely narrow question, but my lack of precision is creating a lot of noise in the conversation. I'm trying to understand the security property of the low level primitives themselves, but we keep getting hung up on the extraneous details.

Anyway, I think I've generally gotten what I was looking for out of this conversation, and again, I really appreciate your time. I've learned a lot from the conversation. Thank you!

@olabiniV2

> Ah! I'm sorry, I don't want to sound unappreciative or seem that I'm ignoring your advice. I'm just trying to understand why you are giving it. I'm not claiming you are wrong, I'm just not quite understanding in detail why you are right -- but I'm trying to.

Well, once again, I have given a substantial amount of the "why" for my reasons in the above answers. That's why it seems like you're not actually reading, or understanding it. You have to realize that the issue of creating cryptographic protocols is NOT a simple thing. It requires years and years of study and very detailed work. I can't write a full book on these subjects here - there are many references out there which you would have to read yourself. And the truth is in this kind of work, I have never worked by myself - I have always been supported by very experienced actual cryptographers that have made sure that I don't make any silly mistakes. I would never dare approach this kind of work without that kind of support environment. That would be extremely arrogant.

>> No existing protocol I have ever heard of, EVER uses RSA for anything else than encrypting a symmetric key.

> I agree, I haven't either, and I'm thankful for you helping me understand why. If RSA with OAEP is an extremely tried and true method of encrypting messages -- proven secure after decades of research against every conceivable attack -- why does the content of the message matter? If it's secure for delivering N back to back messages, why is it only secure when those messages contain symmetric keys, but not secure when they contain actual content?

But it's NOT. RSA-OAEP in its different variants is not used for encrypting messages. It's used for encrypting randomly generated keys or the output of hash functions -- not messages. And the thing is, there are many types of attacks that no one has bothered trying, so you can't claim that RSA-OAEP has been subjected to research against every conceivable attack. Instead, the attacks that researchers actually pursue are of two types. The first is finding ways of breaking the proofs or stated security properties. The second is finding breaks against actually deployed software. But the problem is that what you are describing is SO FAR away from both of these alternatives -- I'm pretty sure the security proofs wouldn't cover it, AND no one would deploy it in production -- that there hasn't been much research in that direction. That's another way of saying "here be dragons", and there's no real reason for anyone to even go in that direction.

Incidentally, you have talked a lot about RSA here. Do keep in mind that RSA is not the only asymmetric crypto system out there.

> Again, I'm not trying to ignore your advice -- I'm trying to laser focus on one extremely narrow question, but my lack of precision is creating a lot of noise in the conversation. I'm trying to understand the security property of the low level primitives themselves, but we keep getting hung up on the extraneous details.

Those "extraneous" details are not extraneous. They are necessary, because the answers depend on them. It's like asking "Is RSA secure?" -- a question that simply doesn't make any sense, because too many details are left out. These are, once again, the kinds of details that you CAN'T ignore for the purposes of what you want to achieve.

@quinthar commented Feb 6, 2021

Again, with total respect, I don't think you are giving tangible reasons why RSA cannot be used to encrypt each block of a stream (so long as each block uses OAEP, and performance isn't a consideration). You are saying that nobody does it, and because encryption is complicated, you shouldn't do anything that nobody else does. But that's just a general warning against the unknown, it's not an actual explanation of a problem. RSA isn't "for" encrypting symmetric keys (even if it's generally used for that), it's just math. It's for anything the math allows, and you haven't shown any reason why it's secure to encrypt a 190-byte payload with RSA so long as that payload contains a symmetric key, but not secure if that 190-byte payload contains something else.

Truly, I appreciate your time and feedback, and I don't want to sound ungrateful. But I'm looking for a specific explanation, not a generic warning.

@olabiniV2

I'm not sure if you are actually noticing it, but from my perspective you seem to be engaging in the fallacy of "moving the goalposts" - you started with one question, but with every argument, you have changed the parameters for the question, until you have now reduced it to "why RSA cannot be used to encrypt each block of a stream (so long as each block uses OAEP, and performance isn't a consideration)".

It seems to me as if you have made a decision already, and are now trying to find confirmation that this decision is correct, instead of being open to changing your mind.

For that reason, I'm not going to even try to respond to your final question, especially since this thread already contains abundant reasons for why you shouldn't do what you are proposing. Instead, I'll tell you how you can go about answering this yourself. This will also be my final message to this thread.

Let's look at this from two perspectives. Are you asking the question from a practical perspective or from a theoretical perspective? If it's practical, meaning that you are planning on putting this in production unless I give you arguments to not do it, then the answer would go something like this:

When it comes to implementing cryptography and choosing how to put together protocols, the burden of proof HAS to be on the side of deviation from conservative choices. Meaning, you can't say something random and then expect others to prove why what you are proposing is dangerous. In cryptography we know from experience that deviating from the conservative choices is risky. Extremely risky. Which means that you have the responsibility of proving that what you are proposing is safe. It's like if you were an astrophysicist and said that "stars are made of cheese", and then expected other astrophysicists to give you evidence that you are incorrect -- and if they don't, took that as proof that "stars are made of cheese". It has to be the other way around: your hypothesis is something you should back up with proof that it's safe. Or to put it another way, in cryptography, our base assumption is always that something is unsafe. If you think that something is safe, it's your responsibility to show that.

Now, using RSA in the way you are describing might be safe, for some definition of safe. But the history of cryptography shows us that there are several warning signs in what you are proposing, including things like non-contribution of randomness from all parties, input with structure, and mathematically related structure between blocks. Even one of these things by itself should be enough to disqualify the design, unless you can show that it is safe.

Implementing cryptography, and creating cryptographic protocols, is a very hard discipline. Even if you make all the conservative choices, and create an extremely simple protocol, it's still hideously complicated and hard to get right, such that end users are actually protected. And adding more unknowns by making non-standard choices for no good reason, is simply a way of making the situation even worse.

OK, so, if you are not asking from a practical perspective, but from a theoretical standpoint - meaning, you want to know the answer, but you have no real plan to design a protocol in this way, the steps you should take are these:

  • Define the question in a well-defined way, with all the necessary details. This includes specifying exactly what security properties you expect your solution to have, and so on. This needs to be done in a semi-mathematical way, not in natural language -- not something like "it should be safe against network observers", but something like "it should be IND-CPA secure against any attacker with less than 120 bits of storage, 160 bits of computation, and 70 bits of time advantage".
  • Read all the papers on asymmetric cryptography, starting with the Diffie-Hellman papers, Merkle, RSA, and forward. Look at all the cryptanalytic papers (the attacks) against them. Finally, read all the papers about possible countermeasures against those attacks -- things like OAEP, and so on.
  • Create an actual "instantiation" of your protocol proposal, once again including all relevant security properties.
  • Create a proof that shows that this protocol delivers those security properties.

The thing is, no one else is likely to do this work for you, because there's no advantage for anyone to do it. As a community, we have better ways of achieving what you want to do already, at lower cost. RSA can be notoriously brittle, and it's used less and less these days. So, why would any cryptographer spend valuable time analyzing this question? We don't necessarily know that non-contribution is a problem for RSA or specific RSA implementations. But we know that it has been a problem in other circumstances, so why take the chance? And why even analyze it in a setting which no-one would use anyway?

One final point. You are simplifying or misrepresenting my opinion when you say "...you haven't shown any reason why it's secure to encrypt a 190-byte payload with RSA so long as that payload contains a symmetric key, but not secure if that 190-byte payload contains something else." My actual opinion is that it is less likely to be secure to encrypt input with structure using RSA than it is to encrypt data without structure. Further, my actual opinion, going back to my first answers, is that neither of those two alternatives is likely to be secure enough for any kind of production deployment, especially not compared to the possibility of using a DH-style scheme instead.
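For contrast with the RSA discussion above, the "DH style scheme" mentioned throughout this thread has the shape sketched below: both parties contribute randomness, and the symmetric key is derived from an unstructured shared secret rather than chosen by one side. This is a toy finite-field Diffie-Hellman over the RFC 3526 2048-bit MODP group using only the Python standard library -- real protocols use vetted constructions like X25519 with a proper KDF, and this sketch omits authentication entirely:

```python
import hashlib
import secrets

# RFC 3526 MODP group 14 (2048-bit safe prime), generator 2.
P = int(
    "FFFFFFFFFFFFFFFFC90FDAA22168C234C4C6628B80DC1CD1"
    "29024E088A67CC74020BBEA63B139B22514A08798E3404DD"
    "EF9519B3CD3A431B302B0A6DF25F14374FE1356D6D51C245"
    "E485B576625E7EC6F44C42E9A637ED6B0BFF5CB6F406B7ED"
    "EE386BFB5A899FA5AE9F24117C4B1FE649286651ECE45B3D"
    "C2007CB8A163BF0598DA48361C55D39A69163FA8FD24CF5F"
    "83655D23DCA3AD961C62F356208552BB9ED529077096966D"
    "670C354E4ABC9804F1746C08CA18217C32905E462E36CE3B"
    "E39E772C180E86039B2783A2EC07A28FB5C55DF06F4C52C9"
    "DE2BCBF6955817183995497CEA956AE515D2261898FA0510"
    "15728E5A8AACAA68FFFFFFFFFFFFFFFF",
    16,
)
G = 2

# Each party contributes its own secret randomness...
a = secrets.randbelow(P - 2) + 2  # Alice's secret exponent
b = secrets.randbelow(P - 2) + 2  # Bob's secret exponent
A = pow(G, a, P)                  # Alice's public value, sent to Bob
B = pow(G, b, P)                  # Bob's public value, sent to Alice

# ...and both sides derive the same shared secret independently.
shared_alice = pow(B, a, P)
shared_bob = pow(A, b, P)

# Run the unstructured shared secret through a KDF to get the symmetric key.
key = hashlib.sha256(shared_alice.to_bytes(256, "big")).digest()
```

The point of contrast with the RSA scheme debated above: neither party unilaterally picks the key, and the value fed to the key-derivation step has no attacker-controllable structure.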
