Skip to content

Instantly share code, notes, and snippets.

@AdamISZ
Created December 27, 2018 18:48
Show Gist options
  • Star 12 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save AdamISZ/4551b947789d3216bacfcb7af25e029e to your computer and use it in GitHub Desktop.
Save AdamISZ/4551b947789d3216bacfcb7af25e029e to your computer and use it in GitHub Desktop.
Basic payjoin/p2ep protocol for Joinmarket wallets

Described here is a variant of what has previously been published under the name "P2EP" or Pay-to-endpoint, in which A pays B but B contributes utxos, i.e. it's a coinjoin-payment.

I'm using the term "payjoin" here to refer to using that idea, but not including a URI/endpoint specific to B, and not allowing (as a merchant would) arbitrary payments, which opens up certain problems around snooping attackers (more on this below). So payjoin just means "A pays B but B actively participates and passes across utxos as extra inputs".

I'll defer a more features-focused and non-tech friendly description of what this means to a later blogpost.

Here is just the step by step protocol interactions, both at the user level and under-the-hood.

Alice will be Sender, Bob will be Receiver. Note that the Sender is a Taker class in JM code, and the Receiver a Maker class. This is because they inherit the main functional difference that one waits to complete the tx, and the other initiates.

Bob starts by running

python receive-payjoin.py -m mixdepth <walletname> <amount-in-sats>.

After normal Joinmarket start up (wallet sync, connect to message channels, also: Bob's bot puts a fake offer into the joinmarket pit; he effectively runs a spoofed yieldgenerator, but doesn't respond (for now) to requests in private-message)), Bob gets a printout like:

2018-12-27 17:57:17+0100 [-]  Your receiving address is:  2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5
2018-12-27 17:57:17+0100 [-]  You will receive amount:  50000000  satoshis.
2018-12-27 17:57:17+0100 [-]  The sender also needs to know your ephemeral nickname:  J57Pwz4XHyyGu9QB
2018-12-27 17:57:17+0100 [-]  This information has been stored in a file payjoin.txt; send it  to your counterparty when you are ready.

This information is passed out of band to Alice (i.e. address, amount, nick). Bob's bot now waits as long as needed until Alice turns up and completes the protocol.

Alice runs as follows:

python sendpayment.py -m 1  <alicewalletname> 50000000 2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5 -T J57Pwz4XHyyGu9QB

... using, of course, the data passed out of band. What happens next will usually not require further intervention, and after some seconds Alice will broadcast a transaction paying net 0.5 btc to Bob, in which at least one of his utxos is included.

  • Protocol

Alice sends in plaintext !pubkey <Alice-pubkey>. Bob receives and in plaintext also sends !pubkey <Bob-pubkey> back. These are of course ephemeral ECDH pubkeys used just for this session of communication.

Both sides now set up the end-to-end encryption (the existing codebase handles this automatically; both sides have crypto-box libnacl objects; and !tx messages are enforced to only be transferred under encryption).

Notice that any Alice-attacker can do this step, but it's harmless because the next step, which is encrypted, authenticates before Bob has sent any identifying info

Now Alice constructs a kind of dummy-tx: It takes form:

Inputs: <Alice utxos selected to pay .5> Outputs: <2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5: 50000000>, <Alice-change: remainder>

Alice doesn't bother to calculate a fee here (so it's zero); this was not really a transaction, just a proposal of inputs and change. We send it as a transaction, because, as noted in the code comments:

however for the purposes of code reuse and signalling
that we are the right counterparty, sending a transaction,
containing the outputs, works fine.

The meaning of "signalling" here should be fairly obvious: by passing a transaction with the correct destination and amount, we ~ prove that we are the intended sender. Note that this line of reasoning is a bit too weak for the general merchant case, because anonymous senders operating over automated protocols may use this to make repeated spend request and collect your utxos, then not pay, damaging your privacy. The previously linked blog post I believe covers this issue in some detail (it was the subject of considerable debate in the London Coinjoin meetup).

The intention here is that for an ad-hoc payment this is not really a problem. However, TODO: I will alter the code so that Alice actually sends a signed version, she loses nothing this way and Bob can broadcast it if something goes wrong Alice-side.

So after Bob receives this !tx encrypted transaction message, he decrypts, and deserializes the transaction, checks that it conforms in destination, amount, and bitcoin balance, checks that the inputs are valid, then appends his own inputs. He selects coins to fulfil the same amount as the destination (this is a heuristic that could easily change if someone has a better idea; use twice the amount? half? etc.). If he fails to select that much he tries half as much, and repeats the halving three times before giving up, i.e. at the very least he adds 1/8 the amount of coins as he is receiving. Then he of course alters the output amount at the destination address to account for the coins that he's added. Finally, he also alters the change output amount proposed by Alice to account for the bitcoin network fee that he calculates. If it's dusty, he drops it. Finally if all checks pass he signs each input he himself has added (usually 1), and reserializes the now partially signed transaction. He then sends another !tx message in return to Alice, containing this partially signed transaction.

Alice receives, decrypts and deserializes, and does the expected checks: Checks the inputs that are signed, verify correctly. Checks that her utxos are included. Checks that the destination receives net the originally intended amount. Calculates the bitcoin fee and checks that it's between 0.3 and 3x the fee that her wallet estimates for current network conditions. If not, the user is prompted to decide. If all checks pass, signs all her inputs, and broadcasts the payment.

In short: Alice is a P2EPTaker and runs sendpayment. Bob is a P2EPMaker and runs receive-payjoin. They both use standard Joinmarket wallets. Alice connects and spoofs Taker behaviour, Bob connects (first) and spoofs Maker behaviour. Bob sends destn, amt and nick out of band. Alice starts with !pubkey, Bob responds the same, both sides set up encryption, Alice calculates and sends !tx encrypted proposal, Bob adds his input(s), signs and sends back !tx, Alice checks and if OK co-signs and broadcasts.

@AdamISZ
Copy link
Author

AdamISZ commented Jan 5, 2019

I think what I have in the code now is probably OK; I try to be !UIH2 on maker side once he sees the set-up, but I still go ahead and do a coinjoin even if that's not possible. In other words, in line with what I said in my previous two comments in this thread.

This data is very helpful, thanks, and I believe it supports that approach; there are an appreciable fraction (1/3) but still minority of 2-output txs that violate UIH2, so if we violate it 20-50% of the time, that should be fine.

I guess there's not much point in looking at UIH1 at all, and currently I don't (I wasn't entirely clear @LaurentMT from the above, but is that also what you're saying?).

Of course we can always be more sophisticated in future.

I'll also note that I've tried to mimic Core locktime and rbf approach (I have not yet put in a 5-10% opt-in rbf but that could easily be done).

@LaurentMT
Copy link

LaurentMT commented Jan 13, 2019

@AdamISZ Good catch! Fixed it in my comment.

WRT UIH1, the cause of the 50/50 is still not clear to me. Among the potential causes:

  • txs spending all utxos associated to a same address => identifiable fingerprint
  • rbf'd txs (FSS RBF) => identifiable "fingerprint" (for network eavesdroppers)

But I doubt that these 2 causes alone explain the 50/50 ratio.

@NicolasDorier
Copy link

There is 3 parameters to separately copy and paste (50000000,2NCiuh9iJ3s8vEQGUWAfeZHTwUVqE5gEdB5, J57Pwz4XHyyGu9QB)

It would be cool if you could somehow combine this in one string so people do not screw up (like you screwed up yourself in your video during copy pasting) :)

@AdamISZ
Copy link
Author

AdamISZ commented Jan 15, 2019

@NicolasDorier

Agreed; in fact, @chris-belcher made the same comment at the start of this thread. Only question would be encoding; perhaps bech32 similar to BOLT11 is the way to go, else something like existing stuff (BIP21 etc?)

On a more general point: there is obviously no attempt here to develop a standardized protocol. This was more a rough-and-ready "what we can easily do in Joinmarket given its existing infrastructure for inter-client communication". AFAIK the existing standardisation proposal is BIP79 by R Havar. I'd like to comment on that proposal in some detail, but only after I've finished implementing this implementation. It obviously wouldn't be a simple matter for all, or even several, wallets to implement a common standard, not to mention, there are a number of tricky details to consider in the more general merchant use case, which as I said at the start here, I'm deliberately not trying to address in this version. BIP79 discusses some of them, as of course did the blockstream blog post which is linked at the start of the gist.

@NicolasDorier
Copy link

NicolasDorier commented Jan 17, 2019

@AdamISZ I think as long as it is easy to copy/paste the protocol itself does not matter too much. (no separator like - for example, because double clicking on the string would not select everything)
Better standard can emerge once it starts to be actually used.

@AdamISZ
Copy link
Author

AdamISZ commented Jan 21, 2019

I realise this discussion is kinda "done" now, but I would really appreciate if anyone could actually review the algo/thinking in this comment above. Especially since it's ... kinda running in "production" :)

@shesek
Copy link

shesek commented Mar 2, 2019

"UIH1" : one output is smaller than any input. This heuristically implies that that output is not a payment, and must therefore be a change output.

"UIH2": one input is larger than any output. This heuristically implies that no output is a payment, or, to say it better, it implies that this is not a normal wallet-created payment, it's something strange/exotic.

I'm a bit confused by these definitions. Aren't they only true for transactions of exactly 2 inputs?

For example, this transaction doesn't seem covered by any of the definitions, but it is a UIH2.

This transaction is covered by UIH1's definition, but is actually a UIH2.

I think a more accurate definition for UIH (1+2) could perhaps be something like "one output is smaller-or-eq the sum of inputs excluding the smallest input", or in other words "we can do without the smallest input and still have enough to fund the smaller output", or sum(in)-min(in) >= min(out).

The definition for UIH2 could be "the sum of inputs excluding the smallest input is larger-or-eq than any output", or in other words "we can do without the smallest input and still have enough to fund the larger output", or sum(in)-min(in) >= max(out).

The definition for UIH1 would then be anything that matches UIH but not UIH2, i.e. min(out) <= sum(in)-min(in) < max(out)

If we also take fees into consideration, this becomes sum(in)-min(in) >= min(out)+fee for UIH1+2 and sum(in)-min(in) >= max(out)+fee for UIH2. (of course, this is just a rough estimation, because adding inputs/outputs effects the size which effects the fee, which isn't taken into consideration here. but I think it could perhaps still provide more accurate results than ignoring the fee entirely?)

Note that these definition only work with transactions of two outputs. I think it could be extended to transactions with more outputs, by changing the UIH1+2 definition from min(out) to sum(out)-max(out) and the UIH2 definition from max(out) to sum(out)-min(out). But someone should double check that :)

@DanGould
Copy link

DanGould commented Feb 5, 2023

The Unnecessary Input Heuristics & PayJoin Transactions paper extends the discussion we're having here.

This paper investigates the resemblance of PayJoin transactions to ordinary payment transactions by examining the significance of the unnecessary input heuristic in transactions with more than one input and exactly two outputs which is the common template of recent PayJoin transactions.

It lists specific definitions of UIH1 & UIH2 where they're used in real block explorers and transaction statistics relevant to the practical privacy benefits of payjoin.

@AdamISZ
Copy link
Author

AdamISZ commented Feb 5, 2023

The Unnecessary Input Heuristics & PayJoin Transactions paper extends the discussion we're having here.

This paper investigates the resemblance of PayJoin transactions to ordinary payment transactions by examining the significance of the unnecessary input heuristic in transactions with more than one input and exactly two outputs which is the common template of recent PayJoin transactions.

It lists specific definitions of UIH1 & UIH2 where they're used in real block explorers and transaction statistics relevant to the practical privacy benefits of payjoin.

Thanks! This whole area definitely needed some deeper thinking so, nice to see that someone's doing that!

@DanGould
Copy link

DanGould commented Feb 5, 2023

The biggest takeaway I get from the paper is that @AdamISZ's definition of UIH2 would not miss UIH2 identification if it were defined as "A subset sum of inputs is larger than any output" rather than "One input is larger than any output", as put to practice in btcpay. I think this is a summary of @shesek's most recent comment.

@NicolasDorier
Copy link

NicolasDorier commented Feb 6, 2023

Reading the paper, but just dropping here: The coin selection of BTCPay, and actually anything which rely on NBitcoin should add unnecessary outputs without Payjoin.

The coin selection of NBitcoin is the same as old coin selection of Bitcoin core, where if you pay amount X, it tries to cover X + 0.1 BTC instead for consolidation purposes. I am unsure about other implementations.

Regardless, it is interesting to improve the coin selection algo of payjoin.

@NicolasDorier
Copy link

It seems quite a low hanging fruit, so I'll do it.

btcpayserver/btcpayserver#4599

@NicolasDorier
Copy link

@NicolasDorier
Copy link

NicolasDorier commented Feb 6, 2023

Actually it seems a bit different.

EDIT: I will now comment in the discussion on btcpayserver, I believe the heuristic avoidance we had in BTCPay Server wasn't working properly.

@NicolasDorier
Copy link

Ok will roll it out for next btcpay release: btcpayserver/btcpayserver#4600

@DanGould
Copy link

DanGould commented Feb 16, 2023

I think I found a way to split outputs to eliminate even the subset-sum UIH1 by creating multiple subsets of the same value. Here's an example:

Payjoin A, a real Payjoin

before output splitting:

0.02292119 btc ---> 0.01172121 btc
0.01395977 btc      0.025 btc

Payjoin A', avoiding UIH

The larger of the two outputs is split to produce another of the difference between them. (That's knapsack mixing)

0.02292119 btc ---> 0.01172121 btc
0.01395977 btc      0.01172121 btc
                    0.01327879 btc

The resulting transaction no longer conforms to unnecessary input heuristic. Obviously two outputs of identical amounts is itself a different heuristic. I imagine that could be mitigated somewhat during fee subtraction. Since bip78 payjoin is done in one interaction only the receiver may choose to split in the case that payment amount + receiver contribution > sender change amount. The sender seems to be unable to know whether they could split or not after receiving the signed payjoin proposal which they can't alter.

@shesek
Copy link

shesek commented Feb 18, 2023

@DanGould A transaction having >2 outputs is itself an indication that it was not a standard payment, possibly a stronger one compared to UIH

@DanGould
Copy link

DanGould commented Mar 1, 2023

@shesek Yes, >2 outputs removes some of the covert nature of payjoin. The question that follows is which is a privacy property should be prioritized: Concealing the fact that a sophisticated piece of software constructed the tx? Or the identity of which output received payment?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment