`Take(n)` returns a new vector containing the first `n` elements. It performs as well as or better than a single call to `Update`.
It is not possible, using the current implementation, to have a `Skip` or `Split` method.
The implementation is separate for leaf nodes and parent nodes.
This method is implemented in my `Solid.Vector` class. See the implementation here.
When `Take(n)` is called on a leaf node, `n ≤ node.Length`. We allocate a new array of length `n` and array-copy the first `n` items of the current array into the new array.
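A minimal sketch of the leaf case (in Python for brevity; the names are illustrative, not the actual `Solid.Vector` code):

```python
def leaf_take(items, n):
    """Take(n) on a leaf: allocate a new array of length n and copy
    the first n items into it. Precondition: n <= len(items)."""
    new_items = [None] * n        # new array the length of n
    new_items[:] = items[:n]      # the equivalent of an array copy
    return new_items
```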
When `Take(n)` is called on a parent node, `n ≤ total count`. In this case, we can divide the range specified by `n` into the section of the node array that can be copied wholesale, and the final child node (the one containing the element at index `n-1`), of which we only want a part.
For example, assume our parent node has a block offset of 2, that is, its bitmask is `11111 00000 00000`, and there is a call to `Take(01111 00100 10000)`, which translates to the last retained index `01111 00100 01111`. This means that all the indexes ranging from `00000 00000 00000` to `01110 11111 11111` are definitely part of the new vector. This corresponds to all the child nodes at indices ranging from `00000` to `01110`.
The last group of indexes, ranging from `01111 00000 00000` to `01111 00100 01111`, belongs to the final child node, but may cover only a subset of its indexes. As such, we need to recursively invoke `Take(n)` on the node at the last index, `01111`. `n` can remain as it is, since each level extracts only the bits relevant to it.
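The arithmetic of this example can be checked directly (Python, with underscores separating the 5-bit groups):

```python
n = 0b01111_00100_10000        # the argument to Take(n)
last = n - 1                   # 0b01111_00100_01111, the last index kept
# A block offset of 2 means this parent selects children using the
# top 5-bit group, i.e. the bits under mask 0b11111_00000_00000.
last_child = (last >> 10) & 0b11111
assert last_child == 0b01111           # the final, partially kept child
assert (last >> 10) - 1 == 0b01110     # children up to 01110 are kept whole
```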
We then replace the last node of the newly created, partial array with the one returned by the recursive call. The recursion terminates at a call to a leaf node's `Take(n)` method, which does not recurse. As such, the number of calls to `Take(n)` is at most the height of the tree, and each call performs one array copy.
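Putting the two cases together, here is a hedged sketch of the recursion (plain Python lists stand in for the node arrays; `height` is the node's level, with 0 meaning leaf; this is an illustration, not the `Solid.Vector` code):

```python
BITS = 5
MASK = (1 << BITS) - 1   # 0b11111

def take(node, n, height):
    """Take(n) on a node. n is passed down unchanged; each level
    extracts its own 5-bit group from n - 1 by shifting and masking."""
    if height == 0:
        return node[:((n - 1) & MASK) + 1]           # leaf: one array copy
    last = ((n - 1) >> (BITS * height)) & MASK       # last, partial child
    new_children = node[:last + 1]                   # whole children copied
    new_children[last] = take(node[last], n, height - 1)
    return new_children
```

One call per level, each performing a single array copy, matching the complexity claim above.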
Bulk loading is more complicated. It allows significantly more efficient loading of many elements to the end of the vector because it modifies arrays in place.
A call to `BulkLoad` requires:
- `data`: the data to load, in the form of an array
- `data_start`: the index of the data array from which loading is to begin
- `data_length`: the number of elements to load
When the externally visible `Vector` class wants to `BulkLoad`, it calls the implementation method with `data_start = 0` and `data_length = data.Length`. This ensures all the data is added.
For leaf nodes,
Allocate a new array of values of size `min(32, CurrentCount + data_length)`. Copy the current value array into it, then fill it with the values contained in `data`, from index `data_start` to `data_start + data_length` or up to the remaining capacity of the node, whichever comes first. This can be achieved with an efficient array copy.
If the current node cannot hold all the items specified by `data_length`, it needs to create a parent node that contains itself and call `BulkLoad` on that parent, with `data_start` advanced past the items already consumed and `data_length` reduced to the number of items that were not added in this call.
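A sketch of the leaf case, simplified so that it returns how many elements it consumed rather than creating a parent itself (an assumption of this sketch, not the original design):

```python
LEAF_CAPACITY = 32

def leaf_bulk_load(items, data, data_start, data_length):
    """Fill the leaf up to capacity from data[data_start:]. Returns the
    new leaf and the number of elements consumed; the caller handles
    any leftover elements by growing the tree."""
    can_take = min(LEAF_CAPACITY - len(items), data_length)
    new_items = items + data[data_start : data_start + can_take]
    return new_items, can_take
```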
For parent nodes,
We first find the size of the array we need to allocate to fit all the values specified by `data_length`. We can do this, for example, by:
let needed_size = (data_length + MyTotalCount) >>> 5·H
Where `H` is the level of the parent node (the size of its 5-bit offset). Note that the shift alone rounds down, so the result must be rounded up whenever the sum is not an exact multiple of `2^(5·H)`. `needed_size` may be bigger than `32`.
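A worked instance of the formula, with the rounding made explicit (the numbers are chosen purely for illustration):

```python
H = 1                          # level: each child holds 2**(5*H) = 32 elements
child_capacity = 1 << (5 * H)
my_total_count = 40            # elements already under this parent
data_length = 100              # elements to bulk load

# Add child_capacity - 1 before shifting so the division rounds up.
needed_size = (data_length + my_total_count + child_capacity - 1) >> (5 * H)
assert needed_size == 5        # 140 elements need 5 children of 32
assert needed_size * child_capacity >= data_length + my_total_count
```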
We create a new array the size of `needed_size` or `32`, whichever is smaller. This array is initialized with empty nodes.
We copy the current array of nodes into the new array, overwriting the first empty nodes but possibly retaining those at the end.
In a parent, all the initial nodes are going to be filled. Only the last node can possibly accept more elements.
We iterate over the nodes in the new array, starting from what was the last node of the original vector, which may have some additional capacity.
On every iteration, we call `BulkLoad` on the current child node, with `data_start` equal to the index at which the previous iteration stopped, and `data_length` equal to the number of additional values the child node can hold, that is, its maximum capacity minus its current total count. Every time, we modify the new array in place so that it contains the node returned by the call to `BulkLoad`.
If we screw up the `data_length`, there will be a structural issue: child nodes end up with an offset equal to the offset of the parent node, or there are 'gaps' in the indexing structure.
After this is done, we create a new parent around the array we just filled.
Now, if the `needed_size` was bigger than `32`, there are still elements remaining.
We need to create a new, higher-level parent (its array will contain the parent we just bulk loaded) and call `BulkLoad` on it with the remaining elements.
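The whole procedure can be sketched end to end (plain Python lists again, and again with a consumed count returned so the caller can grow the tree when `needed_size` exceeds 32; a simplification of the scheme above, not the `Solid.Vector` code):

```python
WIDTH = 32

def bulk_load(node, height, data, data_start, data_length):
    """Returns (new_node, consumed). height 0 is a leaf."""
    if height == 0:
        can = min(WIDTH - len(node), data_length)
        return node + data[data_start : data_start + can], can
    new_children = list(node)              # copy the current child array
    consumed = 0
    i = max(len(new_children) - 1, 0)      # last existing child may have room
    while consumed < data_length and i < WIDTH:
        if i == len(new_children):
            new_children.append([])        # initialize an empty child
        child, used = bulk_load(new_children[i], height - 1, data,
                                data_start + consumed, data_length - consumed)
        new_children[i] = child            # fill the new array in place
        consumed += used
        i += 1
    return new_children, consumed
```

If `consumed < data_length` after the call, the node is full; the caller wraps it in a higher-level parent and calls `bulk_load` again with the remainder, exactly as described above.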