tstuefe/vmatree.md

## vmatree.md

      
    Raw
  

              vmatree.md
            
          
    Proposal for an alternative memory tracker for virtual memory that is faster and simpler.
Preface

Regions as sequence of addresses

Stop treating and storing regions as regions. That causes a lot of redundancy (end pointer of one region is the start of another) and ambiguity.
Instead, lets treat regions as sequence of annotated pointers:
Let a sequence of regions be a sequence of disjunct addresses. Let each address be associated with a state change consisting of an incoming and an outgoing state. The incoming state of address X describes the state of the memory region before X, the outgoing state of X describes the state from address X (inclusive) on.
A state is just a set of properties. The prototype uses { MEMFLAGS, commit state } tupels as state. The concept can be extended to e.g. contain "call stack" as another part of the state.
A special state is NONE_STATE: that state defines an address hole - no reserved region.
Example 1

Address 0x8_0000_0000 designates the start of CDS. So, we have
                0x8_0000_0000
                     |
-(address hole)------[... committed, mtShared ...

And the CDS start address would be represented by:
Address: 0x8_0000_0000, state_in: NONE, state_out: { committed, mtShared }

Example 2

The following operation sequence

reserve (A, D, mtNMT)
commit (A, B)
commit (C, D)
reserve (E, G, mtClass)
commit (F, G)

results in this layout:
     A              B             C              D     E               F               G
-----[ comm. mtNMT ][ res. mtNMT ][  com. mtNMT ]------[ res. mtClass ][ com. mtClass ]------

and these annotated pointers:
| Address | State In    | State out   |
|---------|-------------|-------------|
| A       | NONE        | COM mtNMT   |
| B       | COM mtNMT   | RES mtNMT   |
| C       | RES mtNMT   | COM mtNMT   |
| D       | COM mtNMT   | NONE        |
| E       | NONE        | RES mtClass |
| F       | RES mtClass | COM mtClass |
| G       | COM mtClass | NONE        |

Example 3 - Noop addresses

The following operation sequence

reserve (A, C, mtNMT)
commit (A, B)
commit (B, C)

would results in these pointers:
| Address | State In    | State out   |
|---------|-------------|-------------|
| A       | NONE        | COM mtNMT   |
| B       | COM mtNMT   | COM mtNMT   |  << noop
| C       | COM mtNMT   | NONE        |

However, this would result in a "noop" pointer - an address where state_in == state_out. It has no function. We can therefore remove it from the sequence and get this:
| Address | State In    | State out   |
|---------|-------------|-------------|
| A       | NONE        | COM mtNMT   |
| C       | COM mtNMT   | NONE        |

which means the sequence now merged both regions together seamlessly.
Example 4: "Unmapping" means just adding another region :)

Unmapping is just adding a region with NONE_STATE.

reserve (A, D, mtNMT)
unmap(B, C)

results in:
| Address | State In    | State out   |
|---------|-------------|-------------|
| A       | NONE        | RES mtNMT   |
| B       | RES mtNMT   | NONE        |
| C       | NONE        | RES mtNMT   |
| D       | RES mtNMT   | NONE        |

which correctly represents the memory hole between B and C.
Keep these addresses in a BST

We put these addresses into the BST. Node key is the address, the State change (the tupel of { state_in, state_out } ) is the datum.
When adding regions to the tracker, we add the first and the last address, and remove any nodes for addresses that are inbetween.
Example: unmapping multiple mappings

// reserve, commit CDS and class space together. Commit CDS fully, class space in "spots" (that is realistic!)

reserve(A, B, mtShared)
commit(A, B)
reserve(B, Z, mtClass)
commit(B, C)
commit(H, J)

results in:
| Address | State In     | State out     |
|---------|--------------|---------------|
| A       | NONE         | COM mtShared  |
| B       | COM mtShared | COM mtClass   |
| C       | COM mtClass  | RES mtClass   |
| H       | RES mtClass  | COM mtClass   |
| J       | COM mtClass  | RES mtClass   |
| Z       | RES mtClass  | NONE          |


Now we unmap CDS+class space:

unmap (A, Z)

which means we add a new "region": Region from A to Z, state "NONE"
And we do:

add A (NONE (before-state at A), NONE)
add Z (NONE, NONE (after-state at Z) )
remove all points inbetween
since both A and Z are noop pointers (NONE in NONE out), we remove them too (or rather, don't add them)

And now we have unmapped everything.
For details, please see: [2]
Advantages to the current solution:

Speed

It is a lot faster. No surprise. A BST-based dictionary beats a list- or array-based approach, since most operations are O(log n) vs O(n).
I added a gtest to compare times [3]. Lets hope I made no errors, but the results show:
That gtest

A) creates one million committed regions, spread over 100 reserved regions
B) does one million commit/uncommit operatons on randomly chosen regions

On my machine test times:


Tracker                    
Part A      
Part B        
Reporting                    


VirtualMemoryTracker (old)  
~19 seconds
~46 secs      
0.003 seconds                


VMATree (new)              
0.6 seconds
~1.7 seconds  
0.02 seconds                


which would mean my new implementation is about 27-30x faster during normal operations than the existing implementation.
Reporting, while a tad slower, is still very acceptable with 20 ms.
Note that this is just the very first prototype. There is a lot of untouched optimization potential, some very low hanging fruits actually.
Simplicity, Fidelity and Robustness

I find the new approach to be a lot simpler than the old approach.
One reason is that we don't need to distinguish between reserved and committed regions in the data structure. These are just mapping properties. In fact, if we like, we could just add other properties we want to track, e.g. executable, page size etc.
Call stacks would just be another property to add to the mix.
The new solution solves the problem of merging neighboring regions with the same properties rather elegantly. If you do commit A-B, commit B-C, commit C-D, we now end up with a single committed region. If we the re-commit B-C with different properties (e.g. different flag), we now have three regions.
I think it is a lot more robust than the old approach, since it does not assume a certain order of operations.
As for code size, just look at the new vmaTree.cpp, which is basically everything but the underlying BST implementation. That's the whole logic. It is a lot smaller than VMT.
Remarks

For this prototype, I salvaged a BST implementation from FreeBSD, a red-black tree. Obviously I cannot commit that; should we go forward with this idea, adding a RB tree to the code base would be part of the work. Not terribly complex, its just work.
The new Tracker is not yet wired up to os::reserve_memory and friends, nor to NMT reporting. I just wanted to get the principle right and some performance numbers.
Cheers, Thomas

[1] https://github.com/tstuefe/jdk/tree/nmt-vmatree
[2] https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/src/hotspot/share/nmt/vmaTree.cpp#L141
[3] https://github.com/tstuefe/jdk/blob/6be830cd2e90a009effb016fbda2e92e1fca8247/test/hotspot/gtest/nmt/test_nmtvmadict.cpp#L203
Tracker	Part A	Part B	Reporting
VirtualMemoryTracker (old)	~19 seconds	~46 secs	0.003 seconds
VMATree (new)	0.6 seconds	~1.7 seconds	0.02 seconds