Skip to content

Instantly share code, notes, and snippets.

@yaojingguo
Last active June 30, 2018 07:37
Show Gist options
  • Save yaojingguo/467ddc93af3cd09f5f1d3773715c9a23 to your computer and use it in GitHub Desktop.
Save yaojingguo/467ddc93af3cd09f5f1d3773715c9a23 to your computer and use it in GitHub Desktop.
math

Cache-Oblivious Streaming B-trees

SHUTTLE TREE

Shuttle-tree structure

Shuttle tree algorithm complexity:

  • Search(optimal): $O(\log_{B+1}N)$
  • Range query of $L$ successive elements (Optimal): $\log_{B+1}N + L/B$
  • Insertion: $\frac{\log_{B+1}N}{B^{\Theta(\frac{1}{(\log\log B)^2})}+\frac{(\log^2 N)}{B}}$

A shuttle tree is a strongly weight-balanced search tree (SWBST). Each non-leaf node points to a a linked list of buffers. These buffers are recursively defined to be shuttle trees.

The weight of a node $u$ in a tree, denoted $w(u)$, is the total number of descendents of $u$ plus one; that is, $w(u)=\sum_{v\in children(u)}w(v)$, with $w(u)=1$ if $u$ is a leaf. The depth of a node $u$ is $u$’s distance from the root. SWBSTs have leaves all at the same depth. For such trees, we define the height of node $u$, denoted $h(u)$, to be the distance to a leaf plus one, i.e., $h(u) = h(v)+1$ for every child $v$ of $u$, and $h(u) = 1$ if $u$ is a leaf.

A SWBST is a balanced search tree that maintains the following invariant: For fanout parameter $c > 1$ and for any node $v$, $w(v) = \Theta(c^{h(v)})$.

A height 4 tree with fanout $c=4$has $1 + 4 + 4^2 + 4^3=\frac{4^4-1}{4-1}=85$ nodes. A height $h$ tree has $N=\frac{c^h-1}{c-1}$ nodes.

LEMMA 1. Consider an N-node weight-balanced tree with constant balance parameter $c$. (1) The degree of any node is $\Theta(c)$. (2) For any node u and constant $d<h(u)$, the number of descendants of $u$ that have height at least $h(u)-d$ is $\Theta(c^d)$. (3) Suppose that a node split has cost 1. Then the amortized cost to insert into the tree is the search cost plus O(1). (4) Suppose that splitting a node u costs Q(ch(u)). Then the amortized cost to insert into the tree is the search cost plus O(logN).

Proof. (1) This claim is intuitive.

For a leaf node, its height is $1$. By the invariant, we have $N=\Theta(c)$ and the degreee is $\Theta(c)$. We will prove $N=\Theta(c) c^{h-1}$ and the degree is $\Theta(c)$ for any node $u$. Assume that this claim is true for any node $v$ with height $h-1$. $w(u) = \Theta(c) c^{h-1} = c_{u} c^{h-1}$ $w(u) = \Theta(c) c^{h-1} + ... + \Theta(c) c^{h-1} = c_{v_1} c^{h-1} + ... + c_{v_{degree}}c^{h-1}$ $c_{u} c^{h-1} = c_{v_1} c^{h-1} + ... + c_{v_{degree}}c^{h-1}$ $c_{u} c = c_{v_1} + ... + c_{v_{degree}}$ $c = \frac{c_{v_1}}{c_u} + ... + \frac{c_{v_{degree}}}{c_u}$, Each term in the right side $\approx 1$, so the term count should be $\Theta(c)$. The term count is the degree.

(2) See the picture in Omnigraffle.

(3) $T = \frac{1}{c} + \frac{1}{c^2} + ... + \frac{1}{c^h}=O(1)$, $h$ is the tree height. (4) $T = \frac{c}{c} + \frac{c^2}{c^2} + ... + \frac{c^h}{c^h}=h=\log_{c}{N}=O(\log N)$, $h$ is the tree height.

LEMMA 2. When an element is inserted into a leaf , all nodes on the path from the root to can be fetched without increasing the asymptotic complexity, as long as $M = W(B\log N)$.

Define Fibonacci factor for a positive integer $h$:

$\xi(h) = \begin{cases} \displaystyle h & \text{if } h \text{ is a Fibonacci number,} \ \displaystyle \xi(h-f) & \text{otherise, } f \text{ is the largest Fibonacci number less than } h \end{cases}$

The buffer sizes of a node at height $h+1$ depend upon $\xi(h)$. In particular, consider a node $u$ at height $h+1$ in the tree, and let $k$ be such that $F_k = \xi(h)$. We define the buffer-height-index function $H(j)=j- \lceil 2\log_{\phi} j \rceil$, where $\phi\approx 1.618$ is the golden ratio. Then $u$ has buffers with heights $F_{H( j)}$, for each integer $j$, $j = \Theta(1)$, . . . , $k-1$, $k$. In other words, there are roughly $k$ buffers increasing geometrically in their heights, and the largest buffer has height $F_H(k) = F_{k-2\lceil \log_{\phi} k\rceil}$. These settings mean that the parent node of a subtree containing roughly $K$ nodes has the largest buffer of size roughly $K^{\frac{1}{\Theta((\log \log k)^2)}}$.

Use $m$ to denote the weigth of the parent:

$m \approx c^{F_{k-2\lceil \log_{\phi} k\rceil}}$ $F_{k-2\lceil \log_{\phi} k\rceil} \approx \phi^{k-2\lceil \log_{\phi} k\rceil}$ $\approx \frac{\phi^{k}}{\phi^{2\lceil \log_{\phi} k\rceil}}$ $\approx \frac{\phi^{k}}{k^2}$ $m \approx c^{\frac{\phi^k}{k^2}} = (c^{\phi ^k})^{\frac{1}{k^2}}$ $m \approx c^{\frac{\phi^k}{k^2}} = (c^{F_{k}})^{\frac{1}{k^2}}$ $m \approx c^{\frac{\phi^k}{k^2}} = (K/c)^{\frac{1}{k^2}} \approx K^{\frac{1}{k^2}}$ Since $k = \Theta(\log_{\phi} height) =\Theta(\log_{\phi} \log_c K) = \Theta(\log \log K)$ $K^{\frac{1}{\Theta((\log \log K)^2)}}$.

LEMMA 3. Consider a node at height $h+1>1$ in a shuttle tree (i.e,. a non-leaf node). This node is the leaf of some height-$F_{k-1}$ recursive subtree if and only if $\xi(h) F_k$.

Proof.

$h+1=85$, $\xi(84) = 8$, $F_6=8$, $F_5=5$

Base case: $\xi(h) = F_x$, $F_x \geq F_k$

Maintenance:

  • If $F_x = F_k$, split tree with $F_k = F_{k-2} + F_{k-1}$, $F_{k-1}$ is the height of the bottom subtree. Termination.
  • If $F_x > F_k$, split tree with $F_x = F_{x-1} + F_{x-2}$, We have $F_{x-1} \geq F_k$ since $F_{x-1} \geq F_{x-2}$. We have $F_{x-1} \geq F_k$. Recursion.

LEMMA 4. For a shuttle tree of height $F_k$ and fanout $c$, which contains $N = \Theta(c^{F_k})$ elements, the worst-case search cost is $O(F_k/\log B) = O(log_B N)$.

Proof.

$H(j) = j - \lceil2\log_{\phi} j\rceil$ Recurrence: $T(F_{k+1}) = T(F_{k-1}) + T(F_k) + T(F_{H(k)}) + T(F_{H(k+1)})$

$T(F_k) \leq (c_1F_k-\frac{c_2F_k}{\log_{\phi}F_k})/\log B$

$T(F_{k+1}) \leq (c_1F_{k+1}-\frac{c_2F_{k+1}}{\log_{\phi}F_{k+1}})/\log B$

$F(F_{k+1}) \leq (c_1F_{k-1}-\frac{c_2F_{k-1}}{\log_{\phi}F_{k-1}} + c_1F_k-\frac{c_2F_k}{\log_{\phi}F_k} + c_1F_{H(k)}-\frac{c_2F_{H(k)}}{\log_{\phi}F_{H(k)}}+ c_1F_{H(k+1)}-\frac{c_2F_{H(k+1)}}{\log_{\phi}F_{H(k+1)}})/\log B$

$F(F_{k+1}) \leq (c_1F_{k+1}-\frac{c_2F_{k-1}}{\log_{\phi}F_{k-1}} -\frac{c_2F_k}{\log_{\phi}F_k} + c_1F_{H(k)}-\frac{c_2F_{H(k)}}{\log_{\phi}F_{H(k)}}+ c_1F_{H(k+1)}-\frac{c_2F_{H(k+1)}}{\log_{\phi}F_{H(k+1)}})/\log B$

$F(F_{k+1}) \leq (c_1F_{k+1}-(\frac{c_2F_{k-1}}{\log_{\phi}F_{k-1}} +\frac{c_2F_k}{\log_{\phi}F_k} - c_1F_{H(k)}+\frac{c_2F_{H(k)}}{\log_{\phi}F_{H(k)}} - c_1F_{H(k+1)}+\frac{c_2F_{H(k+1)}}{\log_{\phi}F_{H(k+1)}}))/\log B$

We need to prove:

$\frac{c_2F_{k-1}}{\log_{\phi}F_{k-1}} +\frac{c_2F_k}{\log_{\phi}F_k} - c_1F_{H(k)}+\frac{c_2F_{H(k)}}{\log_{\phi}F_{H(k)}} - c_1F_{H(k+1)}+\frac{c_2F_{H(k+1)}}{\log_{\phi}F_{H(k+1)}}\geq \frac{c_2F_{k+1}}{\log_{\phi}F_{k+1}}$

$rightside\approx \frac{c_2F_{k+1}}{k+1}$

$leftside\geq \frac{c_2F_{k-1}}{k-1} +\frac{c_2F_k}{k} - c_1\frac{F_k}{k^2}+\frac{c_2F_k}{k^3} - c_1\frac{F_{k+1}}{(k+1)^2}+\frac{c_2F_{k+1}}{(k+1)^3}$

Make $c_1 \leq c_2$

$leftside \geq \frac{c_2F_{k-1}}{k-1} +\frac{c_2F_k}{k} - c_2\frac{F_k}{k^2}+\frac{c_2F_k}{k^3} - c_2\frac{F_{k+1}}{(k+1)^2}+\frac{c_2F_{k+1}}{(k+1)^3}$

We need to prove $\frac{c_2F_{k-1}}{k-1} +\frac{c_2F_k}{k} - c_2\frac{F_k}{k^2}+\frac{c_2F_k}{k^3} - c_2\frac{F_{k+1}}{(k+1)^2}+\frac{c_2F_{k+1}}{(k+1)^3} \geq \frac{c_2F_{k+1}}{k+1}$. $\frac{F_{k-1}}{k-1} +\frac{F_k}{k} - \frac{F_k}{k^2}+\frac{F_k}{k^3} - \frac{F_{k+1}}{(k+1)^2}+\frac{F_{k+1}}{(k+1)^3} \geq \frac{F_{k+1}}{k+1}$. $\frac{F_{k-1}}{k-1} +\frac{F_k}{k} - \frac{F_k}{k^2}+\frac{F_k}{k^3} - \frac{F_{k+1}}{(k+1)^2}+\frac{F_{k+1}}{(k+1)^3} \geq \frac{F_{k}}{k+1}+\frac{F_{k-1}}{k+1}$ $\frac{2F_{k-1}}{(k-1)(k+1)} +\frac{F_k}{k(k+1)} - \frac{F_k}{k^2}+\frac{F_k}{k^3} - \frac{F_{k+1}}{(k+1)^2}+\frac{F_{k+1}}{(k+1)^3} \geq 0$

$\frac{2F_{k-1}}{(k-1)(k+1)} +\frac{\phi F_{k-1}}{k(k+1)} - \frac{\phi F_{k-1}}{k^2}+\frac{\phi F_{k-1}}{k^3} - \frac{\phi^2F_{k-1}}{(k+1)^2}+\frac{\phi^2F_{k-1}}{(k+1)^3} \geq 0$

$\frac{2}{(k-1)(k+1)} +\frac{\phi }{k(k+1)} - \frac{\phi }{k^2}+\frac{\phi }{k^3} - \frac{\phi^2}{(k+1)^2}+\frac{\phi^2}{(k+1)^3} \geq 0$

  • $F_{H(k)} \approx \phi^{H(k)} = \phi^{k-\lceil2\log_{\phi}k\rceil}=\frac{\phi^k}{\phi^{\lceil2\log_{\phi}k\rceil}} \leq \frac{\phi^k}{\phi^{2\log_{\phi}k}} = \frac{\phi^k}{\phi^{\log_{\phi}k^2}}=\frac{\phi^{k}}{k^2}\approx \frac{F_k}{k^2}$

  • $\log_{\phi}F_{H(k)} \approx \log_{\phi}\phi^{H(k)} = H(k)$

  • $F(F_{k+1}) \leq (c_1F_{k+1}-\frac{c_2F_{k-1}}{\log_{\phi}F_{k-1}} -\frac{c_2F_k}{\log_{\phi}F_k} + c_1\frac{F_k}{k^2}-\frac{c_2F_{H(k)}}{\log_{\phi}F_{H(k)}}+ c_1\frac{F_{k+1}}{(k+1)^2}-\frac{c_2F_{H(k+1)}}{\log_{\phi}F_{H(k+1)}})/\log B$

  • $\frac{F_{H(k)}}{\log_{\phi}F_{H(k)}}\approx\frac{F_k}{k^2H(k)}=\frac{F_k}{k^2(k-\lceil2\log_{\phi}k\rceil)} \geq \frac{F_k}{k^3}$

LEMMA 5. An n-node shuttle tree uses $O(n)$ space.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment