@naumenkogs
Last active November 1, 2024 13:28
A discussion on Erlay issue: fanout child (so it arrives earlier than a parent being-reconciled)
This is taken from a conversation on the #minisketch channel (log [here](https://gist.github.com/sipa/3a5aff7674eb4d89325d5dc9054972a7)).
I keep only the comments relevant to the topic.
15:44:53<@gmaxwell> So I think erlay as it exists in PR30116 is kinda broken, the right fix is not entirely obvious to me. In the PR around the code that mentions "parent+child" it's attempting to deal with the fact that the fanout is selected randomly, which in the presence of transaction dependencies could mean that a child could be fanned out before the parent and then be an orphan (which really should be avoided).
15:44:59<@gmaxwell> The code attempts to deal with this by always fanning out parents. Now, to begin with this is a gratuitous loss of efficiency, but that isn't my concern: I say instead that this code just doesn't work right. The problem is that it works by looking at a transaction it's processing and asking if it has children in the mempool. If it does, it fans it out. Now, this will work sometimes: when the
15:45:05<@gmaxwell> relay batching/delay logic delays consideration of the txn long enough that its children were accepted in the interim. But often-- usually I think, you'll have the parent long before you have its children. So this code will fail to fanout the parent, and may then later fanout the children.
15:45:32<@gmaxwell> I think handling dependencies was just something we totally didn't consider in the erlay paper, as it wasn't necessary to describe the concept and its benefits.
15:47:14<@gmaxwell> In any case, the obvious (?) fix to me is, instead of making the fanout decision independently, to track some earliest ancestor of each transaction and fanout to the same peer every transaction descended from that ancestor.
15:48:56<@gmaxwell> but it's clear to me that this isn't quite sufficient, because a child may merge multiple clusters.
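The "earliest ancestor" idea and the cluster-merge caveat above can be sketched roughly as follows. This is an illustrative toy (all class, function, and peer names are invented, not Bitcoin Core code): each cluster is tracked with union-find pointers and assigned one fanout peer; a child spending outputs of two clusters merges them, at which point two different peers may already have been chosen.

```python
# Toy sketch of per-cluster fanout-peer assignment via union-find.
# Demonstrates the caveat: merging two clusters that already picked
# *different* fanout peers leaves no single consistent choice.

class ClusterFanout:
    def __init__(self):
        self.parent = {}   # union-find parent pointers (tx -> tx)
        self.peer = {}     # cluster representative -> chosen fanout peer

    def find(self, tx):
        while self.parent[tx] != tx:
            tx = self.parent[tx]
        return tx

    def add_tx(self, tx, parents, pick_peer):
        self.parent[tx] = tx
        reps = {self.find(p) for p in parents if p in self.parent}
        if not reps:
            self.peer[tx] = pick_peer()   # new cluster: pick a peer once
            return self.peer[tx]
        reps = list(reps)
        root = reps[0]
        for r in reps[1:]:
            self.parent[r] = root         # merge clusters under one root
        self.parent[tx] = root
        # Conflict: the merged clusters may already have different peers;
        # this sketch just keeps whichever root survived the merge.
        return self.peer[root]

cf = ClusterFanout()
cf.add_tx("a", [], lambda: "peer1")
cf.add_tx("b", [], lambda: "peer2")
# child "c" spends outputs of both "a" and "b": the clusters merge, but
# "a" was assigned peer1 and "b" peer2 -- no single consistent choice.
print(cf.add_tx("c", ["a", "b"], lambda: "peer3"))  # one of peer1/peer2
```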
15:50:19<@gmaxwell> I think at one point in erlay's design we had an idea that whole peers would be fanout only, instead of spreading it around. This totally solves the issues, but I believe the random per transaction fanout was introduced to avoid hotspotting, where just a few peers end up carrying a lot of the fanout traffic?
15:51:49<@sipa> hmm, or was it dandelion-inspired?
15:52:31<@sipa> having time-dependent single fanout peer may solve it too?
15:52:47<@gmaxwell> dandelion-like would actually have you fanout to just specific peers I think? as it's preferable for attackers to see 1 or 0 of the traffic instead of 1/n?
15:52:48<@sipa> i guess that does result in the same funnel effect as dandelion
15:53:22<@gmaxwell> Now, since as written this PR keeps a bunch of state per transaction in the reconciliation set, it might be possible when you choose to fanout a transaction you check if any of its ancestors are still in the reconciliation set, and then go back and fan those out first. though I suspect that adds a lot of complexity to the fanout process! :P
15:54:44<@sipa> yeah, and it's hard to bound... you could have 8 different transactions tx_{1..8} which are fanned out to 8 distinct peers, and then one tx_9 that depends on all of them
15:56:47<@gmaxwell> Right, I'm trying to remember if there are reasons other than hotspotting to not just have two peers or whatever that you don't use erlay with... and then do no fanout at all on the rest? as that would neatly solve it.
15:58:27<@gmaxwell> (one could imagine adapting the topology, you want the peers you initially learn the most txn from to be peers that fanout to you)
16:01:46<@gmaxwell> (though it might imply some extra protocol messages, as you'd want to be able to tell a peer if you want them fanning out to you or supporting your reconciliation).
[.....]
18:00:36<@sipa> one possibility (i'm brainstorming, this may be broken) would be to not do anything special in the decision logic on where to fanout to, but if you happen to pick a peer for which one or more parents of the to-be-fanned-out tx are in the to-reconcile set, just fanout those dependencies too there (removing them from the recon set)?
18:01:11<@sipa> at worst this means the benefit of reconciliation is reduced to leaf transactions
18:01:32<@sipa> is there a privacy concern there?
18:04:01<@gmaxwell> sipa: yeah I think I suggested that above, assuming I understand you. when you pick a txn to fanout, scan the recon set for all its ancestors and fan those out too.
18:04:17<@sipa> ah right, i missed that
18:04:35<@sipa> i guess this can be combined with an O(n) loop over peers to preferentially pick fanout peers that have more dependencies fanned out as well already, but if that happens to be less than perfect, maybe that's not too bad
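The "fan out the ancestors too" rule that sipa and gmaxwell converge on above can be sketched like this (a minimal illustration with invented names, not Bitcoin Core code; `ancestors_of` is assumed to yield ancestors in topological order, parents first):

```python
# When a transaction is picked for fanout to a peer, also fan out any
# of its ancestors still sitting in that peer's reconciliation set,
# removing them from the set so they are flooded rather than reconciled.

def fanout_with_ancestors(tx, ancestors_of, recon_set, send):
    """Fan out `tx` plus its in-recon-set ancestors, parents first."""
    pending = [a for a in ancestors_of(tx) if a in recon_set]
    for anc in pending:
        recon_set.discard(anc)   # no longer reconciled: it is being flooded
        send(anc)
    send(tx)
    return pending

# toy topology: "child" has ancestors p1 and p2; only p2 is queued for recon
ancestors = {"child": ["p1", "p2"], "p1": [], "p2": []}
recon = {"p2", "other"}
sent = []
fanout_with_ancestors("child", lambda t: ancestors[t], recon, sent.append)
print(sent)    # ['p2', 'child']
print(recon)   # {'other'}
```

At worst (sipa's point above) every non-leaf transaction ends up flooded this way, so the reconciliation benefit is reduced to leaf transactions.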
18:05:23<@sr_gi[m]1> gmaxwell: So the case you are describing is: A transaction is accepted to mempool, it has no children, so it's added to the recon set. Later on, before reconciling, children are accepted to mempool and shared using fanout instead of erlay. Those would potentially be orphans to the receiving end, so the logic that is trying to prevent this by checking whether a transaction has children in the first place is mostly ineffective
18:06:09<@gmaxwell> sr_gi[m]1: exactly. I think it's a rare/exceptional case that the child is already in the mempool when you consider the parent for relay.
18:08:32<@sr_gi[m]1> So a solution could be to extend the current logic with: if a transaction is added to the mempool and its parent is already in a recon set, add it to the recon set too, instead of fanning it out?
18:08:59<@gmaxwell> It also loses the erlay gain for those transactions, though that is a secondary concern to it not working right. :P It's not of no concern, the idea behind erlay is really to allow nodes to change bitcoin bandwidth usage to be more like O(transactions*constant) instead of O(transactions*peers), so as to allow nodes to have many more peers. If some constant percentage of transactions are part of
18:09:05<@gmaxwell> multitransaction clusters and can't be erlayed then it's back to O(txn*peers) again. :)
18:09:27<@sr_gi[m]1> There is also the complementary, remove the parent from the set and send it (potentially alongside the children), but I bet there are going to be weird cases where that won't work
18:09:31<@gmaxwell> sr_gi[m]1: yes but then some transactions would not get any fanout at all, which isn't ideal because it increases the size of the recon sets and is bad for latency.
18:09:39<@sipa> sr_gi[m]1: i think that risks having no fanout peer at all; an alternative is if you were going to fanout to a peer, but a dependency is already in the recon set, also fanout the dependency
[.....]
18:11:33<@gmaxwell> sr_gi[m]1: regarding no fanout at all: part of the idea of erlay is that with fanout ideally a peer should get 1 non-erlay announcement of almost every transaction before the recon happens. If they do, the recon set will be nearly empty. ... which makes everything very fast. :P
18:12:27<@gmaxwell> sipa: Why not the alternative of simply making some peers all fanout and the rest no fanout? (I don't mean to advocate for it, but it's the most efficient and simplest option)
18:14:39<@sipa> ah, i discarded that option for having the dandelion funnel problem, but it doesn't matter, because you don't need to hide this (in dandelion, the stem relay needs its own "stempool" per peer really for privacy + dos reasons, but they don't apply to erlay)
18:15:30<@gmaxwell> AH! yeah that isn't a problem.
18:16:25<@sipa> how frequently would you change the fanout peers?
18:16:43<@sr_gi[m]1> Funny enough, Martin already commented this out on the original PR
18:16:46<@sr_gi[m]1> https://github.com/bitcoin/bitcoin/pull/28765#discussion_r1401014445
18:18:09<@gmaxwell> I think what might be a problem is balancing the parties. Ideally you want really only one peer to fanout to you (maybe two) and you want to fanout to only two peers. If many parties fanout to you then you'd needlessly get multiple announcements.
18:18:56<@sipa> but even if everything is fanned out to two peers, you get the erlay scaling benefit
18:19:30<@sipa> (as in: the inv traffic does not go up linearly with the #connections per peer)
18:19:47<@gmaxwell> sipa: yeah it's the other direction, what happens when 50 peers decide to make you their fanout peer?
18:20:07<@sipa> right
18:20:32<@sr_gi[m]1> gmaxwell: oh right, I get you. The goal is to minimize the amount of fanout to optimally one, given the rest of the txflood traffic is basically redundant
18:20:37<@sipa> but that problem exists even with just one fanout peer
18:20:59<@gmaxwell> sr_gi[m]1: yes, exactly, and not less than 1 because that hurts the erlay computational complexity/success rate.
18:21:55<@sipa> due to random network failures, spies/sinks/..., even a bit more than 1 may not be enough to get full fanout propagation
18:22:01<@gmaxwell> like you could do bitcoin relay with no fanout at all, but it would have poor latency and would burn a lot of cpu in the O(N^2) minisketch decoding. And it would be no more bandwidth efficient than having an in/out fanout of exactly 1.
18:23:05<@sipa> gmaxwell: having exactly one fanout peer has another problem: what do you do with new transactions received from that peer?
18:23:34<@gmaxwell> It's okay (and as sipa notes necessary) for it to be >1 in practice, so long as it doesn't go up with the number of peers we keep the O(txn*constant) communications complexity.
18:24:15<@gmaxwell> sipa: I think above I said exactly two, in part for that reason!
18:25:39<@sipa> also, you'll want to occasionally change the set of fanout peers, and when you do, the dependency issue reappears (but that could be rare, so just fanning out in-recon dependencies when that happens is probably fine)
18:27:51<@sipa> in fact, i think you can measure how well a peer functions as fanout peer... by observing how often they actually getdata the announced tx from you
18:28:08<@gmaxwell> sr_gi[m]1: re that set of comments on the older PR, good find. I think though the parties there are missing a point that I would have made: At least historically I was of the view (and I think I'd convinced others) that it's not good for txn relay to intentionally use the orphan pool. It's just that with some transaction race conditions it wasn't possible to remove the thing entirely. :P (now
18:28:14<@gmaxwell> people want to use it to approximate package relay, and okay it's fine for that too-- but it ought not be used where the sending peer has all the means to avoid it).
18:28:37<@gmaxwell> sipa: ah that's better than what I observed above which was backwards: you could pick fanout peers that you most often learned txn from.
18:29:09<@gmaxwell> sipa: your metric has some bad behaviors though, a peer that is ONLY connected to you will always getdata but is useless to the network as a fanout peer. :P
18:29:27<@sipa> gmaxwell: yes, and it's gameable
18:30:20<@gmaxwell> another alternative solution? only ever fanout depth 1 transactions, any children will be relayed by erlay only.
18:33:05<@sipa> gmaxwell: only recon for dependent transactions would probably significantly slow down propagation time for those, i don't think that's desirable
18:34:37<@gmaxwell> sipa: I don't recall what the envisioned recon schedule was, I seem to think it was something like you'd round robin peers and begin a new recon once every couple seconds. So I don't recall how much the fanout is for erlay computational complexity vs latency.
18:35:58<@gmaxwell> but you're right that it would have a latency cost. though that could be reduced by doing recons more frequently. The opposite approach of fanning out all clusters >1 would have a bandwidth cost. If you reconed fast enough to reach almost the same bandwidth what would the latency look like?
18:36:08<@sipa> yeah, i think that's it, with a constant (average?) recon frequency across all peers, but the per-peer recon frequency would scale with 1/#peers
18:36:45<@sipa> (mental note: that's a weak connectivity inference signal: it roughly reveals how many erlay-capable peers a peer of ours has)
18:37:00<@gmaxwell> sipa: my hesitance on the fanout the entire cluster is that it messes up the O(txn*constant) ... if all txn are in clusters you're back to O(txn*peers) again.
18:37:53<@sipa> gmaxwell: well if combined with fixed (within some period) fanout peers, the dependency degradation would only happen whenever you rotate fanout peers
18:38:32<@gmaxwell> hah one could hybridize: depth 1 transactions get current random fanout behavior. depth>1 get fanned out always to two specific peers and reconned to everyone else.
18:39:48<@gmaxwell> (the two specific peers getting fanout for all, to be clear)
18:40:19<@gmaxwell> or even just the 'fanout all the ancestors'.
18:41:29<@gmaxwell> So you would still have the potential problem of some unlucky peer getting 50 peers fanning out to them, but at least it would only be txn which are part of a non-singleton cluster.
18:46:25<@gmaxwell> a general reworking of fanout could do this: for every transaction you accept to your mempool you will select N (presumably 2) peers to fan it out to. When you do fan it out, you'll also fanout all its ancestors which are in that peer's recon set. You can prioritize peers for fanout based on which have the fewest of those ancestors in their recon sets.
18:49:33<@gmaxwell> so in the case where there were 4 parents already fanned to 4 different peers and a child joins them, two of those 4 peers will get 4 transactions fanned to them, but at least it won't be some other peers that have to get 5.
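The peer-prioritisation rule gmaxwell sketches here can be illustrated with a small toy (function and peer names invented; ties broken deterministically for the demo): pick the N fanout peers whose reconciliation sets contain the fewest of the transaction's ancestors, so the "fan out the ancestors too" cost is minimised.

```python
# Choose fanout peers that drag the fewest ancestors out of reconciliation.

def pick_fanout_peers(tx_ancestors, recon_sets, n=2):
    """recon_sets: peer -> set of txids queued for reconciliation."""
    def cost(peer):
        return sum(1 for a in tx_ancestors if a in recon_sets[peer])
    # sort by (ancestor count, peer id); peer id only to make the demo stable
    return sorted(recon_sets, key=lambda p: (cost(p), p))[:n]

recon_sets = {
    "peerA": {"p1", "p2", "p3", "p4"},   # all four parents queued here
    "peerB": {"p1"},
    "peerC": set(),
    "peerD": {"p2", "p3"},
}
# a child spends p1..p4; fanning to peerC and peerB costs the fewest
# extra ancestor floods (0 and 1 respectively)
print(pick_fanout_peers(["p1", "p2", "p3", "p4"], recon_sets))
# ['peerC', 'peerB']
```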
18:53:36<@sipa> gmaxwell: is this with or without consistent (preferred) fanout peers?
19:00:54<@gmaxwell> without, in that suggestion.
19:01:44<@gmaxwell> in the case where there is just one depth1 parent, it would amount to a per cluster selection of fanout peers. If there are multiple, then there would be some bandwidth waste.
19:04:36<@sipa> (i feel like an echo device) why not use fixed fanout peers (you can still have the fanout dependencies from recon set rule, to deal with transitions)
20:12:19<@gmaxwell> I like fixed fanout peers. The concern I have with that is what happens when you're an unlucky node that 50 people have selected as their fanout peer?
20:13:10<@gmaxwell> One thought I had is that fanoutness could be negotiated and peers could have a maximum number, though that makes it harder to have it rotate.
--- Log opened Wed May 29 00:00:13 2024
15:52:15<@sr_gi[m]1> I discussed this offline with sipa and I like the approach of random (per transaction) fanout selection + removing all dependencies from the recon set of the selected peers in case a transaction is picked to be fanned out to them
15:53:35<@sr_gi[m]1> It feels better to avoid the unlucky burst case gmaxwell was referring to
15:53:41<@sr_gi[m]1> Will update the PR with that
16:12:10<@sipa> sr_gi[m]1: this approach will require a map from txids (not wtxids!) to entries in the reconciliation map, to quickly look up parents of a transaction being considered for fanout
16:13:12<@sipa> i suspect you'll perhaps want to use a boost multi_index, so you can have one data structure that stores everything at once
16:14:05<@sr_gi[m]1> Right, because you want to search based on a prevout, which is txid not wtxid
16:14:49<@sipa> with a multi_index, you could also have it contain an index based on shortid simultaneously, so you don't need to build that up at sketch send time
16:14:57<@sipa> *recon time
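sipa's suggestion here is a C++ `boost::multi_index_container`; a rough Python analogue (all names invented, purely illustrative) keeps one record per transaction plus secondary indexes by txid, wtxid, and short id, so parent lookups (which go via txid, since prevouts name the txid) and sketch building (via short id) are both O(1):

```python
# One logical store with three lookup keys, mimicking a multi_index.

class ReconIndex:
    def __init__(self):
        self.by_wtxid, self.by_txid, self.by_shortid = {}, {}, {}

    def add(self, wtxid, txid, shortid):
        entry = {"wtxid": wtxid, "txid": txid, "shortid": shortid}
        self.by_wtxid[wtxid] = entry
        self.by_txid[txid] = entry
        self.by_shortid[shortid] = entry

    def remove_by_txid(self, txid):
        # removing via any key must also drop the other two index entries
        e = self.by_txid.pop(txid)
        del self.by_wtxid[e["wtxid"]]
        del self.by_shortid[e["shortid"]]
        return e

idx = ReconIndex()
idx.add(wtxid="w1", txid="t1", shortid=101)
idx.add(wtxid="w2", txid="t2", shortid=102)
# a prevout references its parent by *txid*:
print(idx.by_txid["t1"]["wtxid"])   # w1
idx.remove_by_txid("t1")            # parent got fanned out instead
print(sorted(idx.by_shortid))       # [102]
```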
17:32:27<@gmaxwell> sipa: can the mempool just give you a list of ancestors?
17:44:50<@sr_gi[m]1> gmaxwell: what sipa is mentioning actually reminds me a lot of how the orphanage works these days, not sure if some of that can be re-used/copied
17:45:23<@gmaxwell> the orphanage though deals with txn that aren't in the mempool.
17:45:45<@gmaxwell> My recollection is fuzzy but I thought the mempool already tracked the ancestor set of each transaction.
17:46:23<@gmaxwell> if that's correct it would just require iterating the ancestor set of any flooded transaction to check each one to see if it's on track to get reconciled.
--- Log opened Thu May 30 00:00:14 2024
09:36:53<@sr_gi[m]1> Oh I meant having to keep track of parents by txid but also transactions by wtxid. The multi-map approach was brought up in a conversation recently too
09:38:17<@sipa> mempool entries store ancestors (and post-cluster-mempool, it'll still be easy to compute the ancestor set of an entry)
09:39:16<@sipa> but what if a block has been found in between the relay of the parent and the child for example, and the parent is no longer in your own mempool, but as far as you know, the parent doesn't know about the parent yet?
09:39:34<@sipa> *the peer doesn't know about the parent yet
09:42:53<@gmaxwell> Existing relay logic checks that transactions are still in the mempool before relaying them. I think the issue there would go away if instead of keeping around some erlay datastructure you just keep growing the queue of the peer's transactions to relay until it's time to reconcile, enh? then the existing logic to check if things are still in the mempool is sufficient.
09:43:13<@gmaxwell> (the parent won't be in the mempool anymore so it won't get reconed)
09:53:57<@sipa> gmaxwell: well which transactions are observable to which peers is time-dependent (poisson timers) for privacy
09:58:18<@gmaxwell> yeah but you can still build up that queue on the timers but just don't send it. when you get to recon do the still in mempool sweep (you don't want things that got confirmed to get reconed anyways)
09:58:36<@sipa> hmm, that means we can't build up the sketch on the fly
09:58:56<@sipa> as you want to delay until recon time, at which point you need to check if a tx is still in the mempool?
09:59:37<@gmaxwell> I was imagining you wouldn't (as it currently doesn't) though you could remove anything from the sketch that gets excluded at that point.
09:59:45<@sipa> right, that works too
10:00:39<@gmaxwell> so it's orthogonal. I do think you want to just-in-time remove anything that has been confirmed or replaced.
10:02:51<@gmaxwell> (so as to not waste bandwidth transferring useless stuff)
[...]
12:40:32<@sr_gi[m]1> I'm revisiting the ancestors/descendants issue with Erlay. My understanding of this is as follows:...
We want to fanout transactions to at least one (realistically 2) peers to make Erlay as performant as possible. Under regular conditions, the fanout target is randomly picked in a per-transaction manner (and the amount of targets is decided based on our connection count). There are a few exceptions to this rule:
- If there is a short id collision, we fanout both. This is already in place with my changes up to Friday
- If the transaction has children in the mempool, we fanout. This is Gleb's approach and it deals with out-of-order parent/child reception. However, this sends to all Erlay peers*
- If the transaction has a parent in the mempool, we should fanout. This is what was discussed last week and it's the complementary (and most likely to happen) version of the previous point. The approach would be to find the peer* that has the least amount of ancestors and fanout all of them, plus the target transaction
* The amount of peers to target for this could be the same for the two cases I think (something close to 2? Something also based on the connection count?)
13:24:59<@sipa> can parent after child even occur?
13:25:11<@sipa> they can be relayed out of order, but they can't be inserted into the mempool out of order
13:25:37<@sipa> and relay happens when inserting into the mempool, i thought
13:34:55<@instagibbs> if we ever supported same-txid-different-wtxid replacement it might, but we don't
13:36:59<@sipa> instagibbs: but even then, there would be no need to re-fan-out the children if the parent were replaced with a different witness version of itself
17:21:36<@gmaxwell> oh I didn't think the solution we discussed was to always fanout. I thought the solution we discussed was that if we were to fan out a child we go identify the ancestor set and fan that out too.
17:22:57<@gmaxwell> sipa: I thought the code as written only makes these decisions at the delayed times at which relay is considered for the peer, which does allow children to make it into the mempool by that point. Otherwise it couldn't happen.
17:43:56<@sipa> indeed, it seems you're right
17:44:02<@sipa> hmm, wouldn't it be better to make the decision where to fanout to at the time the transaction is accepted to the mempool (so before the poisson timer goes off)?
17:50:46<@sipa> that means the erlay timing decision (when to request reconciliation, and how frequently to permit responding) does become very privacy critical
17:54:03<@sipa> but it feels more natural, (1) because it'll never deal with parents after their children and (2) the dissemination of a transaction to all peers at accept time is inherently an all-peers operation, while the sendmessage poisson timer is already per-peer
17:57:33<@sipa> hmm, that does seem hard to be private with, as a node with many inbound connections must tolerate rather frequent reconciliation requests
18:19:33<@gmaxwell> I felt the same way but refrained from saying it because of the privacy thing.
18:19:53<@gmaxwell> I also think it's outside the normative elements of the protocol, so it could be changed later.
18:21:21<@gmaxwell> In any case, the logic "any time you would fan out a child, fan out its ancestor set to the same peer(s)" subsumes the step of fanning out the parent if there are children at the time you consider it...
18:28:41<@sipa> i guess it's possible to make the decision to fanout at acceptance time, but only put the elements actually in the set when the poisson timer goes off?
18:29:06<@gmaxwell> oh true
18:36:46<@sipa> i guess you'd want two sets (to-fanout, and to-put-in-recon) per peer, and keep the invariant that if a transaction is in the fanout set, its ancestors cannot be in the recon set
18:37:07<@sipa> because these two sets will be wiped/iterated over at different points in time
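The two per-peer sets sipa describes, and the invariant between them, can be sketched like this (invented names; a toy illustration of the invariant only): if a transaction is queued for fanout to a peer, none of its ancestors may remain in that peer's recon set, so queued ancestors get promoted from recon to fanout.

```python
# Per-peer relay state: a to-fanout set and a to-recon set, wiped at
# different times, with the ancestor invariant maintained on insert.

class PeerRelayState:
    def __init__(self):
        self.to_fanout = set()
        self.to_recon = set()

    def queue_recon(self, tx):
        self.to_recon.add(tx)

    def queue_fanout(self, tx, ancestors):
        # promote any queued ancestors from recon to fanout first
        for a in ancestors:
            if a in self.to_recon:
                self.to_recon.discard(a)
                self.to_fanout.add(a)
        self.to_fanout.add(tx)
        # invariant: no ancestor of a fanout tx is left in the recon set
        assert not (set(ancestors) & self.to_recon)

ps = PeerRelayState()
ps.queue_recon("parent")
ps.queue_fanout("child", ancestors=["parent"])
print(sorted(ps.to_fanout))   # ['child', 'parent']
print(ps.to_recon)            # set()
```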
--- Log opened Fri Jun 07 00:00:26 2024
14:15:52<@sr_gi[m]1> I just pushed a patch to Github with the following:
14:17:02<@sr_gi[m]1> Deal with in-mempool ancestors by fanning out to a subset of the reconciling peers (the ones that have the least pending ancestors to reconcile)
14:18:10<@sr_gi[m]1> Move the decision logic from the INV creation in SendMessages to RelayTransactions, and keep two different sets per peer, one of things to fanout and one of things to reconcile
14:18:59<@sr_gi[m]1> I'm going to move the PR to draft given there are some things that may need cleaning, but I'd appreciate some approach review if you have time to give it a look
[...]
12:41:50<@sipa> if we uniformly randomly pick 2 peers to inv out to (among those which haven't already announced it to us, and if there are dependencies just among those with minimum number of to-be-reconned ancestors)... does that actually result in a good coverage over all nodes, given the non-randomness of the connectivity graph?
12:43:58<@sipa> (we obviously don't know the actual graph, but it seems reasonable to assume that e.g. reachable nodes are far more connected than unreacable ones)
13:13:24< gmaxwell> it may be the case that it doesn't cover all unreachable nodes well, but they'll pick up the txn via recon.
13:18:53<@sipa> it depends on what we think of as "well", i think
13:32:10< gmaxwell> yeah, I mean two peers means an exponential fanout. Gleb should have had simulations of this.
13:33:03<@sipa> it's fine if recon is used to resolve the occasional failed relay of course, that's the point; but using it at scale sounds like it could be inefficient (if two peers recon with you simultaneously, you'll consume recon capacity with both if they both have a transaction that you don't)
13:35:35< gmaxwell> from a bandwidth perspective it's not much more than the relay itself.
13:36:07<@sipa> fair point
[.......]
--- Log opened Sun Jun 30 00:00:58 2024
16:22:31<@sipa> one possible downside of deciding at tx accept time rather than at tx inv time which peers to inv to, is that by being earlier, peers may be picked that in between relay the transaction to us
16:23:09<@sipa> which would result in occasionally reducing the number of peers a node invs to
16:24:09<@sipa> but this may actually be an advantage, if compensated by increasing the total of invs a bit... in the sense that it'll result in peers early in the relay process inving a bit more, and peers later in the relay process a bit less
naumenkogs commented Nov 1, 2024

15:44:59<@gmaxwell> The code attempts to deal with this by always fanning out parents. Now, to begin with this is a gratuitous loss of efficiency, but that isn't my concern: I say instead that this code just doesn't work right. The problem is that it works by looking at a transaction it's processing and asking if it has children in the mempool. If it does, it fans it out. Now, this will work sometimes: when the
15:45:05<@gmaxwell> relay batching/delay logic delays consideration of the txn long enough that its children were accepted in the interim. But often-- usually I think, you'll have the parent long before you have its children. So this code will fail to fanout the parent, and may then later fanout the children.
  1. If we have the parent minutes before the children, by the time we consider a child the parent is already announced (via either method). Meaning the parent arrives before the child.

  2. Now, if the parent is seconds before the children (we make a decision before we see a child), then yeah, my idea won't work in this specific window.

  3. And if it arrives sub-second before the child (we consider them at the same time), only then does the idea work.

(2) could be solved by looking for the in-set parents. Then either reconcile both (add the child to the sets), or flood both (remove the parent from the set). I think the current PR version does the latter, and consequently achieves the former for the rest.

Then the initial idea of looking for children is not necessary at all.

The thread above then elaborates on the idea.


Well, another solution for (2) (and the problem in general) is to handle this upon GETDATA (ordering dependency delivery after sets are reconciled etc.), but that's probably super complicated (although it is perhaps along the lines of the general package handling idea).

naumenkogs commented Nov 1, 2024

15:50:19<@gmaxwell> I think at one point in erlay's design we had an idea that whole peers would be fanout only

This was more prone to blackholing/first-spy, and less performant (at least with the round-robin reconciliation strategy we had). We saw that in experiments too, though a while ago, and we can probably find some intuition behind it as well. Something about evening out set differences?

18:38:32<@gmaxwell> hah one could hybridize: depth 1 transactions get current random fanout behavior. depth>1 get fanned out always to two specific peers and reconned to everyone else.

I might agree only because we don't have many 1p1c in general.

@naumenkogs

18:35:58<@gmaxwell> but you're right that it would have a latency cost. though that could be reduced by doing recons more frequently. The opposite approach of fanning out all clusters >1 would have a bandwidth cost. If you reconed fast enough to reach almost the same bandwidth what would the latency look like?

I'm skeptical about more frequent reconciliations, as what we currently have seems like a very sweet spot. Simply reducing the timer between peers makes sets smaller and hardly worth the data exchange (just consider 7 transactions per second and 8 peers reconciling every second).
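As a back-of-envelope check on the numbers in this comment (the 7 tx/s and 8-peer figures come from the text; the rest is an illustrative assumption, not a simulation): the set difference per round is bounded by the transactions that arrived since the last round with that peer, so at a 1-second per-peer interval each reconciliation covers at most a handful of transactions and the fixed per-round overhead dominates.

```python
# Rough arithmetic only: upper bound on set difference per recon round.
tx_rate = 7.0   # network-wide transactions per second (from the comment)
peers = 8       # reconciling peers (from the comment)

for interval in (1.0, 8.0):    # seconds between recons with a single peer
    max_diff = tx_rate * interval
    print(f"per-peer interval {interval}s -> set diff per recon <= {max_diff:.0f} txs")
```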

The reconciliation strategy could be changed completely to get a speedup, but I can't think of a way.

naumenkogs commented Nov 1, 2024

16:22:31<@sipa> one possible downside of deciding at tx accept time rather than at tx inv time which peers to inv to, is that by being earlier, peers may be picked that in between relay the transaction to us
16:23:09<@sipa> which would result in occasionally reducing the number of peers a node invs to
16:24:09<@sipa> but this may actually be an advantage, if compensated by increasing the total of invs a bit... in the sense that it'll result in peers early in the relay process inving a bit more, and peers later in the relay process a bit less

This part I don't really understand; perhaps @sipa can answer better if that's something important.


[Answer from @sipa]

The big question is around the timing of when we decide which peers become the fanout peers: (a) at the time the incoming transaction is processed (once per tx) or (b) at the time the inv would be sent to the peer (once per tx*peer). I believe your old PR used (b), but correct me if I'm wrong.

In that conversation, I think we came to the conclusion that there are advantages to doing (a) instead. It makes reasoning about dependencies easier, and is just logistically simpler because you don't need caching logic (as the decision is made once, jointly for all peers at once, per transaction).

What I pointed out in the quote you're asking about is that making the fanout decision earlier means that sometimes you'll pick a peer to fanout to, but before you actually send the INV out to it, it INVs (or erlays) that same transaction to you. Since the decision for fanout was already made for every peer, you can't go back and pick another peer to fanout to instead (because presumably the goal was having some specific number of fanouts, and so now you'll have fewer).

My belief now is that this is actually a good thing, that if you receive INVs from peers that you had already decided to fanout to, the number of fanouts decreases. The reasoning is that the "later" you are in the relay process (i.e., further away from the source in a topology sense), the fewer peers you want to fanout to. So maybe this compensates exactly right.
