
@phyro
Last active September 14, 2022 16:31

Nostr private DMs

Motivation

Suppose Alice and Bob want to communicate using their public keys A and B. With NIP-04, Alice publishes a message encrypted for B, but in doing so tells everyone that she (A) is communicating with B at time T. An ongoing NIP proposal improves on this by requiring relay authentication when requesting DMs. That helps, but it still requires trusting the relays that hold the events not to share them with anyone else. It might even incentivize bad actors to run well-behaved relays in order to collect communication metadata, which in this case is "A communicated with B at time T". This document explores how the privacy of DMs could be improved further without relying on the honesty of relays.

Privacy properties

We try to push the limits of what information is shared with parties outside of the two that are communicating. This is an attempt at a communication protocol where we define a new event type:

  • dm event - Direct message event holds an encrypted message, but also contains blinded pubkeys of both the sender and the receiver

This seems to leave only websocket connections and timing attacks as ways to correlate messages, and both could likely be mitigated over time if really needed. The exception is the start of a conversation, where the receiver's pubkey is not blinded.

Events

first dm event

Alice wants to start a new conversation with Bob. She needs to create a dm event for Bob. To do this, she first generates an ephemeral keypair and computes the shared secret with Bob's key

k, K = gen_keypair()  # Alice's ephemeral key
shared_secret = k*B   # shared secret of K and B

Alice constructs a dm event where she encrypts her real pubkey A in the content field

kind: 1010,  # dm event
tags: [B],   # Bob's public key is included for the purpose of scanning
pubkey: K,   # Alice's ephemeral key, only used publicly once, in this first dm event
content: enc(shared_secret, A),  # Alice encrypts her pubkey with AES
sig: sign(event, K)

Bob obtains the dm event by fetching all events that contain his pubkey in their tags, and decrypts the content after computing the shared secret. This gives him Alice's real pubkey A, which she encrypted in the content field

shared_secret = b*K
A = dec(shared_secret, content)

Now both Alice and Bob have exchanged public keys and the only pubkey that was not blinded was Bob's.
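
The exchange above relies on k*B and b*K being the same point. A minimal pure-Python sketch of that step follows, assuming secp256k1 (the curve Nostr keys use). The helper names (add, mul, shared_key) and the choice to hash the x-coordinate with SHA-256 to get a 32-byte symmetric key are assumptions made for illustration and are not specified by this document. The AES step itself is left out because it needs a third-party library, and the arithmetic is not constant-time, so this is not suitable for real keys.

```python
import hashlib
import secrets

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def gen_keypair():
    secret = secrets.randbelow(N - 1) + 1  # uniform in [1, N-1]
    return secret, mul(secret, G)

def shared_key(my_secret, their_pub):
    # Assumption: symmetric key = SHA-256 of the shared point's x-coordinate.
    point = mul(my_secret, their_pub)
    return hashlib.sha256(point[0].to_bytes(32, "big")).digest()

b, B = gen_keypair()          # Bob's long-term key
k, K = gen_keypair()          # Alice's ephemeral key
alice_key = shared_key(k, B)  # Alice: k*B
bob_key = shared_key(b, K)    # Bob:   b*K
```

Both sides end up with the same 32 bytes, which could then key the enc/dec calls used in the events.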

key derivation

To communicate in private, the two can generate a new key for every message based on their shared secret. The general derivation of Alice's next pubkey is

# b*A - we use shared secret to make it possible only for Alice or Bob to compute the new key
# K - used as a conversation identifier (Alice and Bob could start many separate conversations)
# A - receiver's pubkey - in this case Alice's
# n - receiver's message counter - in this case Alice's

# Bob computes a random scalar by calling a hash function with these params as an input
# and offsets Alice's key by this value
pubkey_an = H(b*A | K | A | n) * A

For instance the pubkeys for the first two messages from Bob to Alice would be

pubkey_a1 = H(b*A | K | A | 0) * A  # Alice's offset pubkey for the first message
pubkey_a2 = H(b*A | K | A | 1) * A  # Alice's offset pubkey for the second message
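
A sketch of this derivation, under stated assumptions: H is taken to be SHA-256 reduced modulo the curve order, points are serialized as their 32-byte x-coordinate, and the counter as 4 big-endian bytes. None of these encodings are fixed by this document, and the helper names are hypothetical. The sketch checks that Bob (using b*A) and Alice (using a*B) arrive at the same offset pubkey, and that a different counter yields a different key.

```python
import hashlib
import secrets

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def gen_keypair():
    secret = secrets.randbelow(N - 1) + 1
    return secret, mul(secret, G)

def ser(pt):
    # Assumed encoding: x-coordinate only, 32 bytes big-endian.
    return pt[0].to_bytes(32, "big")

def offset_pubkey(shared_point, conv_id, recv_pub, n):
    """H(shared | K | receiver pubkey | n) * receiver pubkey."""
    data = ser(shared_point) + ser(conv_id) + ser(recv_pub) + n.to_bytes(4, "big")
    h = int.from_bytes(hashlib.sha256(data).digest(), "big") % N
    return mul(h, recv_pub)

a, A = gen_keypair()  # Alice
b, B = gen_keypair()  # Bob
k, K = gen_keypair()  # conversation identifier from the first dm event

pubkey_a1_bob = offset_pubkey(mul(b, A), K, A, 0)    # Bob's view
pubkey_a1_alice = offset_pubkey(mul(a, B), K, A, 0)  # Alice's view
pubkey_a2_alice = offset_pubkey(mul(a, B), K, A, 1)  # next counter value
```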

To send a "hello" message to Alice, Bob constructs the event

kind: 1010,         # dm event
tags: [pubkey_a1],  # Alice's computed offset key for the next (1st) msg
pubkey: K1,         # an ephemeral pubkey from Bob
content: enc(shared_secret, "hello")  # Bob encrypts the message with a key derived from the shared secret
sig: sign(event, K1)

Both keys present in the dm event are ephemeral keys (used only once) and the content is encrypted with a shared secret of these two keys. This way only Alice and Bob can read the content and only they can know the keys belong to them.

Fetching dm events

Both Alice and Bob need to keep the message counter for their messages so they know:

  1. what pubkey to query for to see if there was a new message
  2. what pubkey to generate for the other party when sending them a new message

If Alice is expecting the first message from Bob, she can check if Bob sent her a message by querying events where the following pubkey was used in a tag

# Alice computes her expected pubkey for the first message from Bob
pubkey_a1 = H(a*B | K | A | 0) * A

and asks relays if there was a message with a tag pubkey_a1. She can send a message to Bob who, like Alice, queries if there was a new message from Alice by computing the pubkey he expects

pubkey_b1 = H(b*A | K | B | 0) * B

They can now blindly exchange messages by incrementing the counter. Only the first message leaks the recipient's pubkey; every later message uses a fresh key, so none of the real keys are leaked. If either party loses track of the counter, the other party can fill them in, or they can simply start a new conversation - it's possible to have multiple simultaneous conversations, each with its own history.
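
The query step could look roughly like the sketch below, which builds a NIP-01 REQ message for one or more expected offset pubkeys. It assumes the pubkey in tags is carried as a standard "p" tag, so that relays index it and the "#p" filter applies; this document only writes tags: [B] and does not name the tag. The function name, subscription id, and hex value are placeholders.

```python
import json

DM_KIND = 1010  # the dm event kind used in this document

def build_dm_request(sub_id, expected_pubkeys_hex):
    """Build a REQ message asking for dm events tagged with any of the given keys."""
    # Assumption: the offset pubkey is stored in a "p" tag, so "#p" matches it.
    event_filter = {"kinds": [DM_KIND], "#p": list(expected_pubkeys_hex)}
    return json.dumps(["REQ", sub_id, event_filter])

# Placeholder for the hex encoding of pubkey_a1; not a real key.
pubkey_a1_hex = "ab" * 32
request = build_dm_request("dm-sub-1", [pubkey_a1_hex])
```

Because a filter accepts a list of tag values, a client could also ask for the next few counter values in a single request.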

Ensuring an event was sent by X

To prove that Bob sent an event and not Alice (both can encrypt the content), the event's pubkey can be derived as Kn = H(b*A | K | A | n | n)*B, where the second counter is only there to avoid producing the same key twice. Alice can recompute Kn, but only Bob knows the matching private key (the hash output multiplied by b), so only he could have signed the event. This way, Alice could prove it was Bob who signed and published the event, avoiding any disputes.
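
The asymmetry can be sketched as follows. The encodings are assumptions, as is the name m for the second counter (the formula writes n twice; the sketch gives it a separate name only for readability). Bob derives a signing secret h*b, while Alice, who knows h and B but not b, can only recompute the public key and compare.

```python
import hashlib
import secrets

# secp256k1 domain parameters
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)

def add(p1, p2):
    """Add two curve points; None stands for the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def mul(k, pt):
    """Scalar multiplication k*pt by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

def gen_keypair():
    secret = secrets.randbelow(N - 1) + 1
    return secret, mul(secret, G)

def sender_scalar(shared_point, conv_id, recv_pub, n, m):
    """H(shared | K | A | n | m) as a scalar; x-only, big-endian encodings assumed."""
    data = b"".join(pt[0].to_bytes(32, "big") for pt in (shared_point, conv_id, recv_pub))
    data += n.to_bytes(4, "big") + m.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N

a, A = gen_keypair()  # Alice
b, B = gen_keypair()  # Bob
k, K = gen_keypair()  # conversation identifier

# Bob: he knows b, so he can compute the private key behind Kn and sign with it.
h_bob = sender_scalar(mul(b, A), K, A, 0, 0)
bob_signing_secret = h_bob * b % N
Kn = mul(bob_signing_secret, G)

# Alice: she can recompute Kn = h*B, but never learns h*b.
h_alice = sender_scalar(mul(a, B), K, A, 0, 0)
Kn_expected = mul(h_alice, B)
```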

Privacy improvements

First 'dm' event

Anyone seeing the first dm event

kind: 1010,
tags: [B],
pubkey: K,
content: enc(shared_secret, A),
sig: sign(event, K)

can tell that this is an attempt to start a conversation with Bob, but they can't tell with whom. We could push this even further by making users start random (fake) conversations. A fake dm event could encode the content such that it would encrypt "dm::fake". Bob could discard every dm event that encrypted such a message. Relays and other observers would not be able to tell fake from real dms.
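
On Bob's side, the discard rule could be as small as the sketch below. It works on the already-decrypted content, and the function name is hypothetical; only the "dm::fake" marker comes from the text above.

```python
FAKE_MARKER = b"dm::fake"

def sender_from_first_dm(plaintext):
    """Return the sender pubkey bytes from a decrypted first dm, or None for a decoy."""
    if plaintext == FAKE_MARKER:
        return None  # fake conversation opener: silently discard
    return plaintext

decoy = sender_from_first_dm(b"dm::fake")
real = sender_from_first_dm(bytes.fromhex("ab" * 32))  # placeholder pubkey bytes
```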

Timing attacks

Given a dm event that Alice sends to the relay

kind: 1010,
tags: [pubkey_b1],
pubkey: K1,         # Alice's ephemeral pubkey
content: enc(shared_secret, msg)
sig: sign(event, K1)

the relay could group messages by websocket connection, because clients keep the connection open. This could be mitigated in several ways:

  1. the client could select a random relay to send each message to
  2. the client could connect to a relay through a VPN to avoid leaking its IP
  3. if there was some sort of client->client communication, the clients could send events through Dandelion++, where the stem phase would be client->client bouncing and the fluff phase would send the event to a relay (we'd have to agree on the relays though)
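
The first mitigation could be sketched as below, assuming both parties have agreed on a relay set in advance. The relay URLs are placeholders and the function name is hypothetical. The receiver would then need to query every relay in the set, since they cannot know which one was picked.

```python
import secrets

# Hypothetical relay set that both parties agreed on beforehand.
AGREED_RELAYS = [
    "wss://relay-a.example",
    "wss://relay-b.example",
    "wss://relay-c.example",
]

def pick_relay(relays):
    """Pick a relay uniformly at random for publishing a single dm event."""
    return secrets.choice(relays)

target = pick_relay(AGREED_RELAYS)
```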

Open questions

  1. This seems to improve privacy, but how does this impact the UX when we can have multiple distinct conversations with the same person?
  2. Would it make sense to update the counter every 5 messages rather than every message? This way we could query multiple messages in a batch rather than one by one