@ZenGround0
Created July 31, 2023 22:49
Summary of findings related to AMT churn and the nv18-nv19 market cron disruption

AMT Churn Explanation

Background

Market Churn

Churn is defined as the size in bytes of the data in a data structure that is overwritten over some period of time. Because garbage collection in lotus is difficult, heavy churn in Filecoin state carries a high system cost.

We learned after nv19 that AMT churn is more than 100x higher than optimal.

The AMT's total size was 500 MB.

Churn was 50 GB over 2000 epochs, so a full pass of overwriting the AMT's values churns through more than 100x the space of the values themselves.
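
As rough arithmetic (mine, not from the bench output): 50 GB over 2000 epochs is about 25 MB written per epoch, while the values actually overwritten each epoch (1 out of every 2880) amount to only ~0.2 MB of the 500 MB total, which is where the >100x amplification comes from.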

AMT

The AMT is a simple hash-linked tree data structure used to implement an array.

It has structure:

import (
	"github.com/ipfs/go-cid"
	cbg "github.com/whyrusleeping/cbor-gen"
)

type AMTRoot struct {
	BitWidth uint64  // log2 of the branching factor; bitwidth 6 => 64 children per node
	Height   uint64  // number of levels beneath the root node
	Count    uint64  // number of values stored
	AMTNode  AMTNode // root node, inlined
}

type AMTNode struct {
	Bmap   []byte          // bitmap of populated child slots
	Links  []cid.Cid       // child node CIDs (link / internal nodes)
	Values []*cbg.Deferred // serialized values (leaf nodes)
}

Either Links or Values is populated (link nodes vs. leaf nodes), never both. The bitwidth defines the maximum length of Links and Values, setting both the leaf node size and the internal branching factor. All leaf nodes sit at the same height.
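
To illustrate how an index resolves against this structure, here is a hypothetical helper (my sketch, not the go-amt-ipld API), assuming the convention that a tree of height h holds up to arity^(h+1) values:

package main

import "fmt"

// pathForIndex is a hypothetical helper showing how an index maps to one
// child slot per level: the root picks the widest span, the leaf picks
// index % arity.
func pathForIndex(index, bitWidth, height uint64) []uint64 {
	arity := uint64(1) << bitWidth // bitwidth 6 => 64 children per node
	path := make([]uint64, 0, height+1)
	for level := height; ; level-- {
		span := pow(arity, level) // values covered by one child subtree at this level
		path = append(path, (index/span)%arity)
		if level == 0 {
			break
		}
	}
	return path
}

func pow(base, exp uint64) uint64 {
	out := uint64(1)
	for i := uint64(0); i < exp; i++ {
		out *= base
	}
	return out
}

func main() {
	// With bitwidth 6, a tree of height 4 covers 64^5 ≈ 1.07B indices,
	// enough for the ~40M deal states discussed below.
	fmt.Println(pathForIndex(12_345_678, 6, 4)) // [0 47 6 5 14]
}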

Explaining States AMT Churn

Empirical Investigation

Running lotus-bench amt (filecoin-project/lotus#11074) confirms that this churn is expected behavior for the states AMT. The market cron workload is close to the worst case for an AMT: it overwrites evenly spaced deal states, one slice per epoch, cycling through the cron interval period. Measuring this overwrite pattern on an AMT shaped like market states (40M entries, bitwidth 6, i.e. node size 64), each epoch of market cron churns the entire set of AMT link nodes (27.4 MB) plus a large fraction of leaf nodes (11.7 MB). Growth is linear in the number of churn rounds, because the same nodes simply get replaced again: 20 rounds produce 549 MB of link node data and 234 MB of leaf node data, a perfect linear fit. Extrapolating to 2000 rounds (the snapshot epoch span) gives the expected ~80 GB of churn.

Rerunning this with an update interval of 30 days, matching post-nv19 behavior, gives about 2 MB of total churn per epoch, for a predicted ~4 GB over the same window, matching what we see on the network today.
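
To cross-check these figures, here is a back-of-the-envelope model (my sketch, not lotus-bench itself) of how many value and link nodes an evenly spaced overwrite rewrites per epoch; with the pre-nv19 parameters it reproduces the node counts in the raw output below, and with the 30-day interval it gives node counts consistent with the ~2 MB per epoch observed:

package main

import "fmt"

// churnedNodes models how many AMT value and link nodes an evenly spaced
// overwrite workload rewrites in one epoch.
func churnedNodes(count, arity, interval int) (valueNodes, linkNodes int) {
	touched := ceilDiv(count, interval) // values overwritten in one epoch

	// Leaf (value) nodes: once the interval exceeds the arity, each
	// overwritten value lands in a distinct leaf node.
	valueNodes = min(ceilDiv(count, arity), touched)

	// Link nodes: a link node is rewritten iff its subtree contains at
	// least one overwritten value; span is the number of values covered
	// by one node at the current level.
	nodes, span := ceilDiv(count, arity), arity
	for nodes > 1 {
		nodes = ceilDiv(nodes, arity)
		span *= arity
		if interval <= span {
			linkNodes += nodes // every node at this level is rewritten
		} else {
			linkNodes += min(nodes, touched)
		}
	}
	return valueNodes, linkNodes
}

func ceilDiv(a, b int) int { return (a + b - 1) / b }

func main() {
	// Pre-nv19: 40M deal states, bitwidth 6 (arity 64), updates every 2880 epochs.
	v, l := churnedNodes(40_000_000, 64, 2880)
	fmt.Println("interval 2880:", v, "value nodes,", l, "link nodes") // 13889 and 9923

	// Post-nv19: 30-day interval (2880 * 30 = 86400 epochs).
	v, l = churnedNodes(40_000_000, 64, 86400)
	fmt.Println("interval 86400:", v, "value nodes,", l, "link nodes")
}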

Raw output:

./lotus-bench amt
2023-07-12T23:24:40.871-0600	INFO	lotus-bench	lotus-bench/main.go:110	Starting lotus-bench
Populating AMT
Measuring AMT

------------
Link Count: 634922
Value Count: 40000000
9923 link nodes 27430652 bytes
625000 value nodes 527993648 bytes
Total bytes: 555424300
------------

Overwrite 1 out of 2880 values for 20 rounds
round: 0
round: 10
Measuring 20 rounds of churn

------------
Link Count: 634922
Value Count: 888896
9923 link nodes 27430652 bytes
13889 value nodes 11733173 bytes
Total bytes: 39163825
------------

Full rewrite and Bitwidth

An AMT with bitwidth 6 has 64 entries per node. This high bitwidth explains why most of the states AMT is rewritten when only a fraction of the values are overwritten. For an evenly spaced overwrite workload to rewrite every link node, every bottom-level link node must change, and that happens whenever at least 1 out of every 64 * 64 = 4096 values changes: each bottom-level link node addresses 64 value nodes, each holding 64 values. Since the pre-nv19 market cron overwrote 1 out of every 2880 values and 2880 < 4096, it rewrote all link nodes every epoch.
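
By the same arithmetic (mine, not from the bench output): the post-nv19 cadence of 30 days is 2880 * 30 = 86400 epochs, which exceeds the 4096-value span of a bottom-level link node, so only about 40,000,000 / 86400 ≈ 463 bottom-level link nodes, plus the much smaller upper levels, are rewritten per epoch. That is why link node churn collapsed.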

Additionally, high bitwidth means that much more leaf data is churned through than the data actually being updated. About 12 MB of value data is churned through in one update at interval 2880. Over the course of a full overwrite that puts us at ~34 GB, a churn amplification on the order of 100x.
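
Spelling that out (my arithmetic): a full overwrite at interval 2880 takes 2880 epochs, so 12 MB per epoch * 2880 epochs ≈ 34 GB of leaf bytes written, against the roughly 0.5 GB of value data actually updated over that pass.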

Data structure mitigations for churn

A smaller bitwidth can reduce churn, but empirically not by much: one round of churn with the same parameters and an AMT bitwidth of 2 drops only from 39 MB to 12 MB, roughly a 3x reduction. There is also the downside that other AMT operations unrelated to cron churn incur more overhead at lower bitwidth, which makes this approach less promising.

A more specialized data structure with IPLD node locality for deal states grouped by update interval would probably work well here. An array of AMTs, one per interval slot, would work for smaller interval sizes. For the current 86400-epoch interval that array is too big for top-level state, so an AMT of low bitwidth pointing to AMTs of high bitwidth may do the trick, as sketched below.
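
A rough Go sketch of that last layout (hypothetical, not an implemented actors change):

package market

import "github.com/ipfs/go-cid"

// GroupedDealStates sketches the "AMT of AMTs" idea above: deal states are
// grouped by cron slot, so one epoch of market cron rewrites nodes inside a
// single inner AMT instead of spraying writes across the whole states AMT.
type GroupedDealStates struct {
	// Root of an outer AMT with a low bitwidth, indexed by cron slot
	// (processing epoch modulo the update interval). Each entry is the root
	// CID of an inner AMT with a high bitwidth, indexed by deal ID, holding
	// only the deal states scheduled for that slot.
	SlotsRoot cid.Cid
}

Per epoch, cron would then load one inner AMT, rewrite its nodes, and update a single entry in the small outer AMT.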

In summary, this can't be seriously addressed by just tweaking a parameter. More sophisticated approaches will need to keep in mind the additional constraints on the states AMT from PSD / sector activation alongside cron churn, so it will be non-trivial to get right. All of this supports the proposed approach ("Long-term fix" section of filecoin-project/FIPs#638) of sidestepping cron churn altogether by pushing market payments into user-triggered messages.
