B-Trees
- unbalanced trees
- balanced trees
- B-Trees
- invariants
- the root node
- search
- add
- remove
- depth analysis
unbalanced trees
- we like search trees because we can find things
"quickly" (O(log n) on average)
- unfortunately, the worst case is still O(n):
- insert a sorted list of items into a tree
- the first (least) item goes at the root
- the next item is larger, so goes to the right
- the next item is larger still, so goes to the right of the right
- ... and so on
- the resulting tree very much resembles a linear list
- this happens very rarely if the items are inserted in truly random order
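The sorted-insertion worst case can be checked with a small experiment. The sketch below is illustrative only: the names Node, insert, depth, and build are made up for this example, and the tree is a plain binary search tree with no balancing.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key into a plain (unbalanced) binary search tree; return the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key)
                return root
            cur = cur.right

def depth(root):
    """Depth in edges, computed iteratively so a degenerate tree cannot overflow the call stack."""
    best, stack = 0, [(root, 0)]
    while stack:
        node, d = stack.pop()
        if node is None:
            continue
        best = max(best, d)
        stack.append((node.left, d + 1))
        stack.append((node.right, d + 1))
    return best

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

keys = list(range(500))
sorted_depth = depth(build(keys))      # every item goes to the right: depth 499
random.seed(0)
random.shuffle(keys)
shuffled_depth = depth(build(keys))    # far shallower than the sorted case
```

With sorted input the tree is a chain of 500 nodes; with shuffled input the depth stays a small multiple of log n.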
balanced trees
- a balanced tree of depth d has b^d < n <= b^(d+1) - 1 nodes,
so d = O(log n)
- for an unbalanced tree, we know that in the worst case d = O(n)
- search, addition, and removal time are proportional to d, so
balanced trees are nice
- we can take an unbalanced tree and rebalance it: take elements
in a random order and insert them into a new tree
- we can also add new restrictions to our trees so they stay balanced:
- incremental re-balancing (AVL, red-black, splay trees)
- all leaves have the same depth (B-trees)
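The step from the node-count bound to d = O(log n) can be spelled out as follows. This assumes b > 1 is a constant and reads the bound as b^d < n <= b^(d+1) - 1.

```latex
b^{d} < n \le b^{d+1} - 1 < b^{d+1}
\;\Longrightarrow\;
d < \log_b n < d + 1
\;\Longrightarrow\;
d = \lfloor \log_b n \rfloor = O(\log n),
\qquad\text{since } \log_b n = \frac{\log n}{\log b}.
```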
B-Tree Properties
- each node has a number n of elements
somewhere between a fixed minimum and maximum: min > 0 and
min <= n <= 2min
- all leaf nodes have the same depth
- each non-leaf node with n elements has n+1 subtrees
- search proceeds as for (n+1)-ary search trees
- the root may have fewer than min elements
Addition and Removal
- when adding a new element, we might have to split a node
that has exceeded the maximum number of elements:
- split creates two nodes instead of one
- these nodes, and the element that separates them, are stored in the node above
- this may cause the node above to split
- when removing an element, a node may be left with fewer than
the minimum number of elements:
- first we try to move a value over from a sibling into this node
- otherwise we merge two nodes into one
- this removes an element from the node above, and may cause that
node to merge, and so on
B-Tree invariants
- all leaf nodes have the same depth
- a node has at most max elements
- nodes (except the root) have max/2 <= n <= max elements
- each node has an array el of size max to store elements
in locations 0... n-1
- if i < j, el[i] <= el[j]
- each non-leaf node has an array children
of size max+1 to store subtrees in locations 0... n
- for any non-leaf node, el[i] is greater than
all the elements in subtree i and less than all the elements in subtree
i+1
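The invariants above can be written as a checker. This is a minimal sketch under stated assumptions: MAXIMUM = 4 is an arbitrary choice, the names BTreeNode and check are invented for the example, and Python lists stand in for the fixed-size arrays el and children.

```python
MAXIMUM = 4               # assumed value for illustration
MINIMUM = MAXIMUM // 2

class BTreeNode:
    def __init__(self, el=None, children=None):
        self.el = el or []               # sorted elements, el[0..n-1]
        self.children = children or []   # empty for a leaf, else n+1 subtrees

def check(node, lo=None, hi=None, is_root=True):
    """Return the depth of the leaves under node; raise AssertionError if an invariant fails."""
    n = len(node.el)
    assert n <= MAXIMUM                          # at most max elements
    assert is_root or n >= MINIMUM               # the root is the only exception
    assert all(node.el[i] <= node.el[i + 1] for i in range(n - 1))
    if lo is not None:
        assert all(lo < e for e in node.el)      # greater than the separator on the left
    if hi is not None:
        assert all(e < hi for e in node.el)      # less than the separator on the right
    if not node.children:
        return 0
    assert len(node.children) == n + 1
    bounds = [lo] + node.el + [hi]               # subtree i lies between bounds[i] and bounds[i+1]
    depths = {check(c, bounds[i], bounds[i + 1], False)
              for i, c in enumerate(node.children)}
    assert len(depths) == 1                      # all leaves have the same depth
    return 1 + depths.pop()

good = BTreeNode([10], [BTreeNode([2, 5]), BTreeNode([12, 20, 30])])
bad = BTreeNode([10], [BTreeNode([2, 5]), BTreeNode([12])])   # second leaf is too small
```

Calling check(good) returns the leaf depth, while check(bad) raises because a non-root node has fewer than MINIMUM elements.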
The root of a B-Tree
- suppose MINIMUM is 3
- how can we have a tree with only 1 element?
- how can we have a tree with 7 elements?
- the B-tree rules are designed to ensure each non-leaf node has
at least MINIMUM+1 subtrees, to prevent the degenerate
case of an unbalanced tree
- to accommodate special cases, we need to make exceptions somewhere
- the root is the place to make an exception
- other than this exception, each subtree in a B-Tree is also a B-Tree
(almost recursive definition)
Search in a B-Tree
- looking for element e
- search through the array of elements (linear or binary search)
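until we find the least i such that data[i] >= e
- if data[i] == e, we have found e and are done
- otherwise, if we are a leaf node, e is not in the tree and we are done
- if data[i] > e, recursively search subtree[i]
- if there is no such i (every element is less than e), recursively
search the last subtree, subtree[n]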
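The search can be sketched in a few lines. The names Node and contains are assumptions for this example; the field names data and subtree follow the slide. bisect_left returns the least i with data[i] >= e, or len(data) when every element is smaller, so in both non-matching cases the subtree to search is subtree[i].

```python
from bisect import bisect_left

class Node:
    def __init__(self, data=None, subtree=None):
        self.data = data or []         # sorted elements
        self.subtree = subtree or []   # empty for a leaf, else len(data) + 1 subtrees

def contains(node, e):
    """Return True if e is stored in the B-tree rooted at node."""
    i = bisect_left(node.data, e)      # binary search within the node
    if i < len(node.data) and node.data[i] == e:
        return True                    # found in this node
    if not node.subtree:
        return False                   # leaf reached: e is absent
    return contains(node.subtree[i], e)

root = Node([10, 20], [Node([2, 5]), Node([12, 15]), Node([25, 30])])
```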
Adding in a B-Tree
- search to find the place where the element would be
(if implementing a set, we are done if the element is present)
- if the element is not present, add it -- the node may now have
MAXIMUM+1 elements
- split the node into:
- one node with the first MINIMUM=MAXIMUM/2 elements
- an element that will be inserted in the node above us
- one node with the last MINIMUM=MAXIMUM/2 elements
- add the two nodes and the middle element to the node above us
Add: temporarily violating the invariants
- recursive implementation -- hard to modify your parents (like
real life?)
- loose add:
- add the element to the subtree below us
- when the call returns, the root of the subtree may have
MAXIMUM+1 elements
- fix that by splitting the root of the subtree and taking in the
middle element
- before returning, we may have MAXIMUM+1 elements, which
is fine -- our parent will fix that
- if we are the root, we can loosely add the element, then if necessary
split ourselves and create a new root
Removing from a B-Tree
- loose remove may leave nodes with too few elements
(n < MINIMUM): we then must fix this shortage
- if the element to be removed is in a non-leaf node,
we must:
- replace it with the biggest element in the subtree to its left
- remove the biggest element in the subtree to its left
- to fix a shortage we can:
- if available, move an element from a sibling subtree
- otherwise, merge with a sibling subtree --
which may leave us with a shortage
Depth Analysis
- every non-leaf node has at least m = MINIMUM subtrees
- therefore, every layer has at least m times more nodes
than the previous layer
- so a tree of depth d has at least n = m^d leaf nodes
- therefore, d = O(log_m n)
- a B-tree has all leaves at the same depth:
- nodes split, but when the root splits we can add a level
- nodes merge, but when the root merges we can drop a level
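To get a feel for the bound, the sketch below computes the largest depth d that still satisfies m^d <= n for a given number of leaf nodes n. The function name max_depth is an assumption for this example, and integer arithmetic is used to avoid floating-point logarithms.

```python
def max_depth(n_leaves, m):
    """Largest d with m**d <= n_leaves, i.e. floor(log_m n_leaves); requires m >= 2."""
    d = 0
    while m ** (d + 1) <= n_leaves:
        d += 1
    return d

wide = max_depth(1_000_000, 100)    # 100**3 == 1_000_000, so at most 3 levels
narrow = max_depth(1_000_000, 2)    # 2**19 <= 1_000_000 < 2**20, so at most 19 levels
```

A large MINIMUM keeps the tree very shallow, which is the main practical appeal of B-trees.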