Skip to content

Instantly share code, notes, and snippets.

@AdamISZ
Last active April 3, 2023 20:00
Show Gist options
  • Star 7 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save AdamISZ/51349418be08be22aa2b4b469e3be92f to your computer and use it in GitHub Desktop.
Save AdamISZ/51349418be08be22aa2b4b469e3be92f to your computer and use it in GitHub Desktop.
Lightweight anti-Sybil with anonymity in Bitcoin

RIDDLE

Due to unexpected failures of github's LaTeX parsing (which were not evident until I published this, but have persisted afterwards), and since the mathematical parts are important in this, I have migrated this proposal to a blog post with identical content, but correctly formatted equations.

Please continue to put any comments here.

@chris-belcher
Copy link

This is a cool idea.

I wonder if the UTXOs in the ring can be chosen with a fixed pseudorandom seed, so that multiple rings will always involve the same UTXOs, making set intersection attacks less viable.

@phyro
Copy link

phyro commented Jun 14, 2022

Very interesting. It's really cool seeing ideas develop in this direction. I was trying to solve a particular problem not that long ago and came up with what I believe to be a similar construction that uses the UTXO set for spam protection. It was with a slightly different blockchain design in mind (still utxo based), so I'll describe the TLDR here as it can be confusing reading the gist there. The idea is that a service S shares a public key with everyone and exposes 2 endpoints /claim and /consume. A user calls /claim with a triplet <UTXO, UTXO_sig, blinded_hash_to_curve> where UTXO_sig is a proof of ownership of UTXO and blinded_hash_to_curve is the step1 of blind DHKE as described here. The service verifies UTXO is in the utxo set and that the signature is valid and issues a blind signature that is returned in the response. Blind signatures serve as credits the user obtained which can be used by calling /consume with the unblinded signature - to prevent knowing which utxo consumed the credits. In my case, the service would issue storage tokens allowing to store 10kb of data for 3 days or so per credit consumed. It's essentially a Chaumian ecash service which issues tokens based on UTXO proofs. In my case, amounts were not involved in the scheme as an output itself is a proof of onchain fees being paid and was enough for the spam I was trying to solve - a spam attacker would need to pay onchain fees and would thus bring security to the network. This design obviously means each service would need to keep track of spent credits (similar to key images in your scheme). It however introduces potential timing attacks, but at least in theory, if you don't consume immediately after you claim, the anonymity set can be the set of all claimed tokens (zcash like). If amounts need to be taken into account, one could issue more blind signatures or even have pubkeys for different denominations. I had a brief discussion with @RubenSomsen on this and he mentioned a combination of ring-sigs and Chaumian ecash ideas could be possible which I think is also a potentially interesting direction to explore.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 14, 2022

@chris-belcher

I wonder if the UTXOs in the ring can be chosen with a fixed pseudorandom seed, so that multiple rings will always involve the same UTXOs, making set intersection attacks less viable.

Interesting; I had a similar though very vague thought float through my head (some determinism in "decoy" choice) but didn't pursue it. Now I think about it again, I agree the logic for this is quite persuasive, probably the only disadvantage is that it's bringing complexity into a core protocol description (decoy choice) that would be nice to keep outside.

So I'd envisage it like: first filter on the existing utxos that satisfy the given criteria. Then seed the pseudorandom function that comes out of the setup of the protocol (hash(our utxo, service name) is a crude example of the top of my head), which somehow maps to an ordering and we choose the first $M$ from that.

(Of course it will change over time as they get spent, but that's fine. It still has the deterministic quality and we still get a big overlap, presumably).

This brings up a point that I hadn't considered much, which is a 'selection' or a 'filter' on the utxo set could be a computationally intensive thing, even with real-time access like we have on a node.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 14, 2022

@phyro

Very interesting. It's really cool seeing ideas develop in this direction. I was trying to solve a particular problem not that long ago and came up with what I believe to be a similar construction that uses the UTXO set for spam protection. It was with a slightly different blockchain design in mind (still utxo based), so I'll describe the TLDR here as it can be confusing reading the gist there. The idea is that a service S shares a public key with everyone and exposes 2 endpoints /claim and /consume. A user calls /claim with a triplet <UTXO, UTXO_sig, blinded_hash_to_curve> where UTXO_sig is a proof of ownership of UTXO and blinded_hash_to_curve is the step1 of blind DHKE as described here. The service verifies UTXO is in the utxo set and that the signature is valid and issues a blind signature that is returned in the response. Blind signatures serve as credits the user obtained which can be used by calling /consume with the unblinded signature - to prevent knowing which utxo consumed the credits. In my case, the service would issue storage tokens allowing to store 10kb of data for 3 days or so per credit consumed. It's essentially a Chaumian ecash service which issues tokens based on UTXO proofs. In my case, amounts were not involved in the scheme as an output itself is a proof of onchain fees being paid and was enough for the spam I was trying to solve - a spam attacker would need to pay onchain fees and would thus bring security to the network. This design obviously means each service would need to keep track of spent credits (similar to key images in your scheme). It however introduces potential timing attacks, but at least in theory, if you don't consume immediately after you claim, the anonymity set can be the set of all claimed tokens (zcash like). If amounts need to be taken into account, one could issue more blind signatures or even have pubkeys for different denominations.

Thanks, that is a very valid point to raise here, i.e. why not use chaumian cash tokens instead? It's certainly something I've thought about in the past (indeed, it seems like such a good idea I'm rather confused it hasn't been done yet; a server/service could implement this themselves without buy-in from the outside world .. well, assuming they had client side software to do the blind signing. I guess you could argue that, technically, Wasabi has done this)

A few points of comparison spring to mind without really delving into it yet. To save time I will write CT for chaumian token (i.e. blind signing).

  1. The CT approach does expose the utxo being used, even if the initial connection is network anonymized (e.g. Tor) and even if there is a 'clean disconnect' between the registration and spending phase. This seems to me the obvious distinguishing property of the RIDDLE approach. So that would mean the CT approach would not be useful for the activist who doesn't want the government to know that they used service S, even though the government knows their utxos.
  2. The CT approach is probably simpler for the user (remembering that both variants require the client software to carry out at least some non-standard cryptographic operation) - but that's partly because we're not trying to obfuscate the utxo. They don't need utxo set access. It gives the user more convenience too, in that they can get tokens way in advance, but this also may run counter to the intention of this design, which is to prevent large scale Sybil (see comments about 'already spent utxos', in the final section .. whether that even matters might depend on whether service S needs to defend against very fast bursts, or just wants to keep usage reasonable over long timescales).
  3. Another example related to flexibility: Chaumian tokens could be sold, e.g. for a Lightning payment. I don't think something similar is available for the RIDDLE approach. So this should be an advantage of the CT approach.

I had a brief discussion with @RubenSomsen on this and he mentioned a combination of ring-sigs and Chaumian ecash ideas could be possible which I think is also a potentially interesting direction to explore.

Yes, this brings in point (1) above; it is perhaps possible to combine the ideas together, though I fear it's pretty complicated. The whole fields of "Chaumian tokens" and "anonymized credentials" (see the history behind recent Wabisabi ideas, e.g. Chase, Perrin, Zaverrucha stuff for Signal, I blogged about it here), well not that I really have thorough knowledge of them, I don't, but just generally it seems they're mostly focused on "identified user is able to do fancy stuff with the tickets/tokens we issue them with" but not "how to allow anything for unidentified user without opening massive Sybil hole". Maybe we can get that and keep some of the token advantages.

I agree with your idea that it's possible to consider such systems not caring about utxo amount, i.e. just using the transaction fee. It is a little weak like that, though.

@phyro
Copy link

phyro commented Jun 15, 2022

@AdamISZ, thanks for the comprehensive answer, you make some good points.

The CT approach does expose the utxo being used

Indeed, this seems to be the main difference. I'd say the main benefit of CT is that it has an ever growing anonymity set (assuming clean disconnect) while a ring is of a fixed size. There's something to say about ring-signatures and repeated usage. They work cleanly when you do it once, but repeated use leaves public patterns e.g. if you have a ring S = {O1, O2, ... O10} which signs with one in S, then if O2 is spent into a new output O7 which is again used in a new ring {O5, O3, O7, ...}, then if O7 is again spent into O9 and a ring appears {O1, O3, O9, ...}, it might make the probability of linking this chain of outputs O2 -> O7 -> O9 more likely in some cases. The anonymity sets that are actually achieved are rather non-obvious to me and seem to, in some cases at least, depend on how good the users are at not leaving obvious traces. But in general, the point you're making is a valid one, the case I was solving for had in mind that every output would claim CT which doesn't make any UTXO stand out. This is probably not the case in most other situations. On top of that, the pattern tracing example I gave may be reduced with a sufficiently big ring size.

The CT approach is probably simpler for the user... but this also may run counter to the intention of this design, which is to prevent large scale Sybil

Indeed, it's easier to collect many tokens this way. I think this makes a point to avoid using amounts as tokens in a CT design because otherwise, a rich output could claim tokens, create a new change output, claim again and repeat. On-chain storage is far more resilient in this case because it doesn't care about the amounts, you pay for onchain bytes which are scarce.

Another example related to flexibility: Chaumian tokens could be sold

Yes, there might even develop a market around these which could allow activists to purchase these tokens privately. In theory, a user could issue just-in-time ring signature as a token to some other user, but this seems less obvious. One issue with having a market of CT could be that it seems impossible to tell whether they've been spent because the nullifier set is available to the service.

I agree with your idea that it's possible to consider such systems not caring about utxo amount, i.e. just using the transaction fee. It is a little weak like that, though.

You're right, a burst could happen, but I'm not sure I'd call it weak because the scarcity is as protected as a cumulative sum of block storage if you give credit per output and ignore the amounts altogether. You're essentially getting credits per chain storage which is very limited and expensive. One way to prevent very high bursts would be to swap the public key with which the service does blind signatures frequently e.g. every month. This way, those that are still in the UTXO set could re-claim their UTXO credit for the new pubkey and they wouldn't lose their credits if they have claimed them before but not used yet. The difference in the amount vs non-amount credits essentially boils down to what should be the citizen in this model. Is a citizen a coin on the chain or an output? I slightly prefer the model where an output is a citizen due to direct correlation to on-chain bytes rather than how rich the output is, both are valid though. In either, a citizen (however defined) can then register themselves at the service's front desk to get a citizen token (or many) from the service which is valid for a month and needs to be re-claimed if it expires. I think there are a lot of use-cases with such systems, a simple and a fun one would be to get a free drink at a bitcoin conference based on the bitcoin citizen credits or similar as a thank you for using the service and providing a bit to the chain security.

P.S. the math isn't rendering properly for me on many parts of the document, probably some syntax errors to fix.

@sethforprivacy
Copy link

Very excited to see this progress, this seems like an excellent way to improve Bitcoin-based auth protocols with Sybil-resistance. Would love to see something like this implemented in Auth47 or lnurl-auth long-term!

Just FYI, these issues may be relevant if you decide to go the route of decoy binning or deterministic decoy selection, as this is a recent topic of discussion within the Monero community:

monero-project/research-lab#84
monero-project/research-lab#86
monero-project/research-lab#87

Enforcing a decoy selection algorithm would be ideal if at all possible, especially as you don't deal with re-org issues etc. in this proposal as it's not publishing transactions. That way authenticators would be bound to best-practices for both privacy and Sybil-resistance.

@chris-belcher
Copy link

chris-belcher commented Jun 15, 2022

@AdamISZ

probably the only disadvantage is that it's bringing complexity into a core protocol description (decoy choice) that would be nice to keep outside.

Unless I've misunderstood, I don't know why it has to be in the core protocol? The verifier can accept any ring of UTXOs, it's just the signer that is strongly recommended to pick the ring in a deterministic way. The protocol doesn't have to enforce anything.

(hash(our utxo, service name) is a crude example of the top of my head)

Make sure the seed has some kind of non-public information. With the example you came up with it, the verifier could try inserting every UTXO in the ring into the "our_utxo" value and see if the pseudorandom function gives exactly the whole ring. I'd suggest using the privkey as in hash(our_utxo_privkey, service name)

@AdamISZ
Copy link
Author

AdamISZ commented Jun 15, 2022

especially as you don't deal with re-org issues etc.

right, the whole fork issue (which always amused me because they're very reminiscent of the mechanics of the security reduction arguments used to guarantee soundness in crypto proofs .. rewind time and play back the same algorithm with the same starting state but inject fresh randomness, extract the secret information!) seems not to apply here, but thinking about it firms up the content of what @chris-belcher was saying and makes me rethink my response there. Look at it from two perspectives: if you are only allowed to use your utxo once (so counter value, as discussed in doc, is kept to always be 1), then double spend is your own fault as a user. But if we allow counters > 1 for more flexibility, as discussed, we need to ensure that there is no linkage between the rings chosen for counters 1,2,3.. etc. It's publically verifiable what counter value is being used (the verifier plugs in j=2 for example, for every key in the ring). So anyway the long and short of it is that I think the right approach is deterministic random generation of decoy/ring set based on public info, i.e. verifier policy: btc amount, utxo age and counter value.

Another way to say it, that the set of decoys is disjoint between different runs is only a problem if there is a linkage between two runs of the protocol. With counters we don't get that, only a "real" "double spend" gets that, which means same counter value, which we mustn't do.

Enforcing a decoy selection algorithm would be ideal

Enforcement is clearly a bit trickier here, but I agree. Not being on a consensus ledger makes it different. But the concept still applies since there's interaction. Verifiers individually can enforce a rule, and a protocol spec can clarify what the rule MUST be, which gets us most of the way to enforcement (if both sides, signer and verifier, choose not to apply it, then they're just doing their own thing, and we can't help).

monero-project/research-lab#84

Yeah this one looks like a very interesting read to consider some details. Some overlap, but some differences. I see that there is a deterministic random generated from key images of inputs (and other stuff). That makes sense there but not here; we are not chaining these ring sigs so we have no concept of 'input'.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 15, 2022

@chris-belcher

Make sure the seed has some kind of non-public information. With the example you came up with it, the verifier could try inserting every UTXO in the ring into the "our_utxo" value and see if the pseudorandom function gives exactly the whole ring. I'd suggest using the privkey as in hash(our_utxo_privkey, service name)

Right! That one doesn't fly, oops :)

But it's interesting to contrast with the discussion above with @sethforprivacy . You might not want to use private information, because it can't be enforced (albeit, that is viable). If we based it on public information but not specific to the specific signer, then you get that enforcement potential: so example is : current blockheight and verifier policy (age, amount minimums) and signer chosen minimum size (which has to be bigger than verifier policy amount minimum) - that last one sort of implies the counter value.

The argument for trying to do the public-info way is to help prevent users footgunning. But it's a pretty good argument to do that, if it's possible.
If that seeds the choice from the set defined by the policy then it still has the properties we want, doesn't it?

@AdamISZ
Copy link
Author

AdamISZ commented Jun 19, 2022

If we based it on public information but not specific to the specific signer, then you get that enforcement potential: so example is : current blockheight and verifier policy (age, amount minimums) and signer chosen minimum size (which has to be bigger than verifier policy amount minimum)

@chris-belcher

Sorry, heh, managed to contradict myself in one sentence there :) Can't be (realistically) signer-chosen minimum size. I can't help wondering if the "current blockchain state" is OK then, as I vaguely suggested 'block height', it's a little inelegant for various reasons, not least being that of course there can easily be more than one signer at the same time, and time (even block height) is not globally synchronous.

Hmm, on second thoughts, maybe I take back what I said earlier about how the seeding works in the above Monero document. I think we could do that: we have precisely one key image being consumed in each RIDDLE. So seed the ring/decoy selection with exactly that, which is public information.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 19, 2022

@phyro

Yes, there might even develop a market around these which could allow activists to purchase these tokens privately. In theory, a user could issue just-in-time ring signature as a token to some other user, but this seems less obvious. One issue with having a market of CT could be that it seems impossible to tell whether they've been spent because the nullifier set is available to the service.

I agree; I think this line of research makes a lot of sense. The only downside is, to have a market that really works for those kind of users, it needs to not be centralized and/or have really good privacy properties. It's tough to build markets like that, but it's definitely desirable.

I slightly prefer the model where an output is a citizen due to direct correlation to on-chain bytes rather than how rich the output is, both are valid though.

I certainly take that point of view on board. My sense is that you can't get around the fact that this is basically measuring wealth, in the sense that even if you correlate specifically to chain space, that still is specifically something that costs, and so the richer person can do more of it. The main idea of this RIDDLE system to me, is that you can make a large nonlinear cost to attack, versus a minimal to zero cost to use honestly. The closer we get to that model the happier I'd be.

P.S. the math isn't rendering properly for me on many parts of the document, probably some syntax errors to fix.

Yeah, usually that's the reason, but here the new github markdown latex parser was actually genuinely kind of losing it. It was perfect a day or two before then went crazy s.t. I had to write $\\ x $ instead of $x$ in a large number of places - no joke. Anyway I "fixed" it as soon as I saw it, it seems almost perfectly readable now.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 22, 2022

Having spent some time reading it, I now think that Groth's 2014 construction as described in How to leak a secret and spend a coin would do the job of creating a logarithmic sized ring signature using only the ECDLP. Specifically, you can create a ring sig on $N$ keys of size roughly $32(7 log_{2}(N))$ bytes using that exact construction. Albeit, that is a specific construction using Pedersen commitments to a value of zero where the secret key is the randomness; I think that to translate it to our setting is likely trivial (for example, if you prove that 1 of $N$ of a set of commitments to zero, $C_i = 0H + r_{i}G$, then those commitments are exactly public keys as we use), but I haven't checked that.
That could mean that figures of 1k-20k+ keys could be more practical, albeit one has to consider the computational cost of signing and verifying too.

Afaik Monero research here into sublinear ring signatures was based on similar ideas, though I haven't read and understood that paper yet.

@AdamISZ
Copy link
Author

AdamISZ commented Jun 24, 2022

Following up on the previous, here's my attempt at a simple proof of concept in Python: https://gist.github.com/AdamISZ/77651979025d16b778494047c86c3a7c

Basically you have about 1.9kB for a ring size of 256 secp256k1 keys, and 2.3kB for 1024 keys. The latter takes about 10-13 seconds (and the former, maybe 3 seconds), but my algorithms are laughably bad and it's in Python, I'd expect it to be about 2 orders of magnitude faster if done properly, and in C :) (also this is for sign + verify together, and of course, ignores any potential batching benefit verify side).
The paper, obviously, discusses the performance scaling in some detail.

(*edited because had got byte sizes wrong)

Editing to note: to go from ring signatures to linkable ring signatures, I should note, there is a simple transformation as explained in the paper: commit, instead of to 0 with randomness r, to S with randomness r, and then form a ring signature over each key in the ring added to C(-S, 0).

The idea here is that S are effectively 'serial numbers' of the 'coins' or tokens represented by the user's key, and therefore are one time and function like key images.

@sambacha
Copy link

sambacha commented Jan 8, 2023

This is somewhat similar to Rate Limiting Nullifiers a zkSNARKs construct

@AdamISZ
Copy link
Author

AdamISZ commented Feb 11, 2023

Rate Limiting Nullifiers

Thanks @sambacha for that breadcrumb! Some pretty interesting ideas there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment