amitiuttarwar/rebroadcast filters

## rebroadcast filters
Reducing noise in the rebroadcast set:

I think there are two ways these filters reduce the rebroadcast set:

1. Caching the min fee rate by itself. Scenario: the mempool is emptying out, blocks are being mined & txn fees are decreasing. When it is time to rebroadcast, the set is calculated & the cache is applied.

Time 1: caching job runs, top block computed to include a, b, c
Mempool looks like (lowest fee rate on the left, `|` marks the cutoff for the top block): j i h g f e d | c b a

Time 2-4: txns come in
Mempool looks like: j i h g f Z e d | c Y b a X

Time 5: block comes in
Mempool looks like: j i h g f Z e d | c

Time 8: time to rebroadcast. The set is calculated to be Z e d c. The cached min fee rate (that of c, from Time 1) is applied, so only txn c is rebroadcast.
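
The timeline above can be replayed as a toy model. This is a minimal sketch, not the actual implementation: the fee rates are made-up numbers chosen only to reproduce the ordering in the diagrams, every txn is assumed to be the same size (so a "block" is just the top N txns), and `top_block` and `apply_cached_min_fee_rate` are hypothetical helper names.

```python
# Toy replay of scenario 1. Fee rates are invented; only their ordering matters.
# Assumption: all txns have equal size, so a "block" is the top N by fee rate.

def top_block(mempool, n):
    """Return the n highest-fee-rate txn names, highest first."""
    return sorted(mempool, key=mempool.get, reverse=True)[:n]

def apply_cached_min_fee_rate(candidates, mempool, cached_min):
    """Keep only candidates whose fee rate is at or above the cached minimum."""
    return [tx for tx in candidates if mempool[tx] >= cached_min]

# Time 1: mempool is j i h g f e d | c b a (lowest fee rate on the left).
# Resulting rates: j=1, i=2, ..., c=8, b=9, a=10.
mempool = {name: rate for rate, name in enumerate("jihgfedcba", start=1)}
cached_min = min(mempool[tx] for tx in top_block(mempool, 3))  # fee rate of c

# Time 2-4: Z lands between f and e, Y between c and b, X above a.
mempool.update({"Z": 5.5, "Y": 8.5, "X": 11})

# Time 5: a block mines the top four txns (X a b Y), leaving c at the top.
for tx in top_block(mempool, 4):
    del mempool[tx]

# Time 8: the rebroadcast candidates are the next block's worth of txns,
# then the cached min fee rate from Time 1 is applied.
candidates = top_block(mempool, 4)
rebroadcast = apply_cached_min_fee_rate(candidates, mempool, cached_min)
print(candidates, rebroadcast)
```

Only c survives because it is the one remaining txn that was already in the top block when the cache was computed.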

Some observations about this mechanism & potential scenarios:
- if a block doesn’t come in between job & rebroadcast, the cache doesn’t filter any txns out.

- In this order of events (cache, block, rebroadcast)… the cache would filter out all txns if there were no new transactions > cached fee rate between the job & time of rebroadcast.

- if multiple blocks come in between the job & rebroadcast, the cache would filter out more txns. However, I think the txns that were truly missing & should be rebroadcast would still be picked up into the set. And more blocks is a stronger guarantee that the txns being rebroadcast should have been mined by now.

- between the caching job & time of rebroadcast, the size of txns rebroadcast will be *at most* (size of new txns above cached-min-fee-rate) - (4M * num blocks).

- interplay with the recency filter: the new incoming txns would have to displace the existing txns from being picked up into the block, which is why I said *at most* in the previous point.
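
The first three observations can be checked with the same kind of toy model. This is a sketch under stated assumptions: fee rates are invented, all txns are the same size, a block holds three txns, and the recency filter is ignored entirely. `simulate` is a hypothetical helper, not part of any real codebase.

```python
# Toy check of the observations: how many candidates survive the cached
# min fee rate, depending on new txns and blocks mined since the caching job.
# Assumptions: equal-size txns, 3 txns per block, no recency filter.

def simulate(new_txns, blocks_mined, block_size=3):
    # Mempool at the time of the caching job: j=1 (lowest) ... a=10 (highest).
    mempool = {name: rate for rate, name in enumerate("jihgfedcba", start=1)}
    # Caching job: min fee rate of the current top block (a, b, c).
    cached_min = min(sorted(mempool.values(), reverse=True)[:block_size])

    # New txns arrive, then each block removes the current top of the mempool.
    mempool.update(new_txns)
    for _ in range(blocks_mined):
        for tx in sorted(mempool, key=mempool.get, reverse=True)[:block_size]:
            del mempool[tx]

    # Rebroadcast: take the top block's worth, then apply the cached minimum.
    candidates = sorted(mempool, key=mempool.get, reverse=True)[:block_size]
    kept = [tx for tx in candidates if mempool[tx] >= cached_min]
    return candidates, kept

new = {"X": 11, "Y": 8.5}  # both above the cached min fee rate of 8

no_block = simulate(new, blocks_mined=0)           # cache filters nothing
block_no_new_txns = simulate({}, blocks_mined=1)   # cache filters everything
one_block = simulate(new, blocks_mined=1)
two_blocks = simulate(new, blocks_mined=2)         # more blocks, fewer survivors

for label, (candidates, kept) in [
    ("no block", no_block),
    ("block, no new txns", block_no_new_txns),
    ("one block", one_block),
    ("two blocks", two_blocks),
]:
    print(f"{label}: candidates={candidates} kept={kept}")
```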


2. Recency + min fee rate work together. Scenario: txn fees are increasing.

Time 1: caching job runs, top block computed to include a, b, c
Mempool looks like: j i h g f e d | c b a

Time 2-4: new txns come in
Mempool looks like: j i h g f e d | c X b Y Z a

Time 5: caching job runs, top block computed to include a, Z, Y
Mempool looks like: j i h g f e d c X b | Y Z a

Time 6: time to rebroadcast. Even though the top 3M of txns would be Y Z a, the recency filter skips X Y Z, so the initial set comes out to be a, b, c. Applying the cached min fee rate (now that of Y, from Time 5) would then eliminate b and c, leaving a to be rebroadcast.

Something to note is in this second example, the cache is actually reducing the set of txns even though a new block wasn’t mined. But this is only true if the time between Time 1 and Time 6 <= 30 min (so X Y Z considered recent & filtered out in final step).

Something else to note is that if the caching job did not run at time 5, then at time 6 the cached min fee rate would still be that of c, and b & c would not have been filtered out. So, in this scenario, more frequent runs are better filters. More specifically, the cache should be updated < 30 minutes before rebroadcast occurs.
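
The second scenario can be replayed the same way, this time with arrival times so the recency filter has something to act on. Again a sketch under assumptions: the fee rates and minute timestamps are invented, txns are equal-sized with three per block, and `rebroadcast_set` is a hypothetical helper that skips recent txns first and then applies the cached min fee rate.

```python
# Toy replay of scenario 2: recency filter + cached min fee rate.
# Each mempool entry is (fee_rate, arrival_minute); all values are invented.

RECENCY_WINDOW_MIN = 30  # the 30 min threshold mentioned in the notes

def top_block(mempool, n, exclude=()):
    """Top n txn names by fee rate, skipping any names in `exclude`."""
    names = [tx for tx in mempool if tx not in exclude]
    return sorted(names, key=lambda tx: mempool[tx][0], reverse=True)[:n]

def rebroadcast_set(mempool, cached_min, now, n=3):
    """Skip recent txns, take the top block, then apply the cached minimum."""
    recent = {tx for tx, (_, arrived) in mempool.items()
              if now - arrived < RECENCY_WINDOW_MIN}
    candidates = top_block(mempool, n, exclude=recent)
    return [tx for tx in candidates if mempool[tx][0] >= cached_min]

# Old txns j..a (rates 1..10) arrived long ago, at minute -60.
mempool = {name: (rate, -60) for rate, name in enumerate("jihgfedcba", start=1)}
cache_t1 = min(mempool[tx][0] for tx in top_block(mempool, 3))  # rate of c

# Time 2-4 (minute 10 here): ordering becomes c X b Y Z a.
mempool.update({"X": (8.5, 10), "Y": (9.3, 10), "Z": (9.6, 10)})
cache_t5 = min(mempool[tx][0] for tx in top_block(mempool, 3))  # rate of Y

# Time 6 (minute 25 here): X Y Z are still recent, so candidates are a, b, c.
with_fresh_cache = rebroadcast_set(mempool, cache_t5, now=25)
with_stale_cache = rebroadcast_set(mempool, cache_t1, now=25)

# If rebroadcast happens after X Y Z stop being recent, the cache removes nothing.
after_window = rebroadcast_set(mempool, cache_t5, now=45)

print(with_fresh_cache, with_stale_cache, after_window)
```

With the Time 5 cache only a is rebroadcast; with the stale Time 1 cache all of a, b, c go out, which is the argument for running the caching job close to the rebroadcast.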


So, putting together #1 & #2, my thoughts are…
1. Don’t rebroadcast unless there have been 1 or more blocks since the cache was updated.
2. Try to keep time between last cache run & rebroadcast < 30 min.

So I believe the optimal frequency logic would be a balance between these constraints.
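
As a rough illustration of the two constraints, here is a minimal sketch of a gate that could sit in front of a rebroadcast attempt. Everything in it is hypothetical: `CacheState`, `should_rebroadcast`, and `MAX_CACHE_AGE_MIN` are names made up for this example, and treating the 30 min target as a hard cutoff is an assumption (the notes only say to *try* to stay under it).

```python
from dataclasses import dataclass

MAX_CACHE_AGE_MIN = 30  # assumed hard limit; the notes describe it as a target

@dataclass
class CacheState:
    updated_at_min: float       # when the caching job last ran, in minutes
    blocks_since_update: int    # blocks seen since that run

def should_rebroadcast(cache: CacheState, now_min: float) -> bool:
    """Apply both constraints: at least one block, and a fresh enough cache."""
    has_block = cache.blocks_since_update >= 1
    is_fresh = (now_min - cache.updated_at_min) < MAX_CACHE_AGE_MIN
    return has_block and is_fresh

# A block arrived 10 minutes after the cache ran: OK to rebroadcast.
print(should_rebroadcast(CacheState(updated_at_min=0, blocks_since_update=1), now_min=10))
# No block since the cache ran: the cache would filter nothing, so wait.
print(should_rebroadcast(CacheState(updated_at_min=0, blocks_since_update=0), now_min=10))
# Cache is 40 minutes old: recent txns may no longer be covered, so refresh first.
print(should_rebroadcast(CacheState(updated_at_min=0, blocks_since_update=2), now_min=40))
```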