B-Trees
- unbalanced trees
- balanced trees
- B-Trees
- invariants
- the root node
- search
- add
- remove
- depth analysis
unbalanced trees
- we like search trees because we can find things
"quickly" (O(log n) on average)
- unfortunately, the worst case is still O(n):
- insert a sorted list of items into a tree
- the first (least) item goes at the root
- the next item is larger, so goes to the right
- the next item is larger still, so goes to the right of the right
- ... and so on
- the resulting tree very much resembles a linear list
- this happens very rarely if the items are inserted in truly random order
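A minimal Python sketch of the effect described above, assuming a plain binary search tree with hypothetical helpers `insert`, `depth` and `build` (not from the slides). Insertion and depth measurement are iterative so the degenerate case does not hit Python's recursion limit:

```python
import random

class Node:
    """A plain (unbalanced) binary search tree node."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Insert key without rebalancing and return the root (equal keys go right)."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def depth(root):
    """Depth of the deepest node: -1 for an empty tree, 0 for a single node."""
    if root is None:
        return -1
    best = 0
    stack = [(root, 0)]
    while stack:
        node, d = stack.pop()
        best = max(best, d)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, d + 1))
    return best

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

keys = list(range(1000))
print(depth(build(keys)))           # 999: each key hangs off the right of the previous one
shuffled = keys[:]
random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable
print(depth(build(shuffled)))       # much smaller than 999
```

The same `build(shuffled)` call is also the "rebalance by reinserting in random order" idea from the next slide.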
balanced trees
- a perfectly balanced binary tree of depth d has n = 2^(d+1) - 1 nodes
- therefore, d = O(log n) in a balanced tree
- we know that in the worst case (an unbalanced tree), d = O(n)
- search, addition, and removal time are proportional to d, so
balanced trees are nice
- we can take an unbalanced tree and rebalance it: take elements
in a random order and insert them into a new tree
- we can also add new restrictions to our trees so they stay balanced:
- incremental re-balancing (AVL, red-black, splay trees)
- all leaves have the same depth (B-trees)
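The step from the node count to the depth bound on this slide, written out for a perfectly balanced binary tree whose layer k holds 2^k nodes:

```latex
n = \sum_{k=0}^{d} 2^k = 2^{d+1} - 1
\quad\Longrightarrow\quad
d = \log_2(n + 1) - 1 = O(\log n)
```

For example, d = 2 gives n = 7, and log2(8) - 1 = 2.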
B-Trees
- all leaf nodes have the same depth
- each node has a number of elements somewhere between
a fixed minimum and maximum
- each non-leaf node with e elements has e+1 subtrees
- search proceeds as for binary and ternary search trees
Addition and Removal
- when adding a new element, we might have to split a node
that has exceeded the maximum number of elements:
- split creates two nodes instead of one
- the middle element moves up into the node above, and the two new nodes become its subtrees on either side of it
- this may cause the node above to split
- when removing an element, a node may drop below the minimum number
of elements, and we might have to merge it with a neighbor:
- first we try to move an element from a neighboring node into this
node
- otherwise we merge the two nodes into one
- this removes an element from the node above, which may then be short
as well, and so on
B-Tree invariants
- a node has e elements
- every node has e <= n elements (n = MAXIMUM)
- every node except the root has e >= n/2 elements (n/2 = MINIMUM)
- the elements are stored in sorted order in
locations 0... e-1 of an array
- each non-leaf node has e+1 subtrees, stored in
locations 0... e of an array
- for any non-leaf node, the element at index i is greater than
all the elements in subtree i and less than all the elements in subtree
i+1
- all leaf nodes have the same depth
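The invariants above can be checked mechanically. Below is a minimal Python sketch, assuming integer elements, MINIMUM = 2, and a hypothetical `BTreeNode` class whose `data` and `subtree` lists mirror the arrays named on the slides (a leaf has an empty `subtree` list). `is_valid` returns True only if every listed invariant holds, with the root exempt from the minimum:

```python
from dataclasses import dataclass, field
from typing import List, Optional

MINIMUM = 2                  # assumed for illustration
MAXIMUM = 2 * MINIMUM        # so MINIMUM = MAXIMUM / 2

@dataclass
class BTreeNode:
    data: List[int] = field(default_factory=list)              # elements in data[0..e-1]
    subtree: List["BTreeNode"] = field(default_factory=list)   # e+1 subtrees, or [] for a leaf

def _leaf_depth(node: BTreeNode, is_root: bool, lo, hi) -> Optional[int]:
    """Return the common depth of the leaves below node, or None if an invariant fails."""
    e = len(node.data)
    if e > MAXIMUM or (not is_root and e < MINIMUM):
        return None                      # element count out of bounds
    if any(node.data[j] >= node.data[j + 1] for j in range(e - 1)):
        return None                      # elements not in sorted order
    if any((lo is not None and x <= lo) or (hi is not None and x >= hi) for x in node.data):
        return None                      # not between the parent's neighboring elements
    if not node.subtree:
        return 0                         # a leaf
    if len(node.subtree) != e + 1:
        return None                      # non-leaf must have e+1 subtrees
    depths = set()
    for i, child in enumerate(node.subtree):
        # everything in subtree i lies between data[i-1] and data[i]
        child_lo = node.data[i - 1] if i > 0 else lo
        child_hi = node.data[i] if i < e else hi
        depths.add(_leaf_depth(child, False, child_lo, child_hi))
    if None in depths or len(depths) != 1:
        return None                      # a broken subtree, or leaves at different depths
    return depths.pop() + 1

def is_valid(root: BTreeNode) -> bool:
    return _leaf_depth(root, True, None, None) is not None
```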
The root of a B-Tree
- suppose MINIMUM is 3
- how can we have a tree with only 1 element?
- the B-tree rules are designed to ensure each non-leaf node has
at least MINIMUM+1 subtrees, so we cannot have the degenerate
case of an unbalanced tree
- to accommodate small trees like this one, we need to make an exception somewhere
- the root is the obvious place to make an exception: it may hold fewer
than MINIMUM elements (as few as 1, or 0 in an empty tree)
- other than this exception, each subtree in a B-Tree is also a B-Tree
(almost recursive definition)
Search in a B-Tree
- looking for a target value t
- search through the array of elements (linear or binary search)
until we find the least i such that data[i] >= t (if there is none, i = e)
- if i < e and data[i] == t, we are done: t is in the tree
- otherwise, if we are a leaf node, we are done: t is not in the tree
- otherwise, recursively search subtree[i]
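A minimal Python sketch of this search, assuming integer elements and a hypothetical `BTreeNode` class with `data` and `subtree` lists as on the slides. It uses binary search via the standard `bisect.bisect_left`, which returns exactly "the least i such that data[i] >= t" (or e if there is none), and replaces the recursion with a loop:

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List

@dataclass
class BTreeNode:
    data: List[int] = field(default_factory=list)              # sorted elements
    subtree: List["BTreeNode"] = field(default_factory=list)   # [] for a leaf

def contains(node: BTreeNode, t: int) -> bool:
    while True:
        i = bisect_left(node.data, t)        # least i with data[i] >= t, or len(data)
        if i < len(node.data) and node.data[i] == t:
            return True                      # found in this node
        if not node.subtree:
            return False                     # leaf: nowhere else to look
        node = node.subtree[i]               # t can only be in subtree i
```

Usage: with `tree = BTreeNode([10, 20], [BTreeNode([1, 5]), BTreeNode([12, 15]), BTreeNode([25, 30])])`, `contains(tree, 15)` is True and `contains(tree, 21)` is False.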
Adding in a B-Tree
- search to find the place where the element would be
- set: if the element is already present, we are done (no duplicates)
- if the element is not present, add it to the leaf where the search ended -- the leaf may now have
MAXIMUM+1 elements
- split the node into:
- one node with the first MINIMUM=MAXIMUM/2 elements
- an element that will be inserted in the node above us
- one node with the last MINIMUM=MAXIMUM/2 elements
- add the two nodes and the middle element to the node above us
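The split step can be sketched in Python as below, assuming MINIMUM = 2, integer elements, and a hypothetical `BTreeNode` class with `data` and `subtree` lists. `split_child` (a hypothetical name) splits the overfull child at position i of `parent`:

```python
from dataclasses import dataclass, field
from typing import List

MINIMUM = 2                  # assumed for illustration
MAXIMUM = 2 * MINIMUM

@dataclass
class BTreeNode:
    data: List[int] = field(default_factory=list)
    subtree: List["BTreeNode"] = field(default_factory=list)   # [] for a leaf

def split_child(parent: BTreeNode, i: int) -> None:
    """Split parent.subtree[i], which holds MAXIMUM+1 elements, around its middle element."""
    child = parent.subtree[i]
    if len(child.data) != MAXIMUM + 1:
        raise ValueError("only an overfull node is split")
    # first MINIMUM elements (and their MINIMUM+1 subtrees, if any) form the left node
    left = BTreeNode(child.data[:MINIMUM], child.subtree[:MINIMUM + 1])
    middle = child.data[MINIMUM]
    # last MINIMUM elements (and the remaining subtrees) form the right node
    right = BTreeNode(child.data[MINIMUM + 1:], child.subtree[MINIMUM + 1:])
    # the middle element goes up into the parent; the halves become subtrees i and i+1
    parent.data.insert(i, middle)
    parent.subtree[i:i + 1] = [left, right]
```

For a leaf the `subtree` slices are simply empty, so the same code handles leaves and internal nodes.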
Add: temporarily violating the invariants
- recursive implementation -- hard to modify your parents (like
real life?)
- loose add:
- add the element to the subtree below us
- when the call returns, the root of the subtree may have
MAXIMUM+1 elements
- fix that by splitting the root of the subtree and taking in the
middle element
- before returning, we may have MAXIMUM+1 elements, which
is fine -- our parent will fix that
- if we are the root, we can loosely add the element, then if necessary
split ourselves and create a new root
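The loose-add strategy can be sketched in Python as follows. This is a minimal sketch, assuming integer elements, MINIMUM = 1 (so every node holds 1 or 2 elements), and hypothetical names `BTreeNode`, `fix_excess`, `loose_add` and `add`. `loose_add` lets a node overflow by one and leaves the repair to its caller; `add` handles an overflowing root by creating a new root:

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List

MINIMUM = 1                  # assumed: small nodes keep the example readable
MAXIMUM = 2 * MINIMUM

@dataclass
class BTreeNode:
    data: List[int] = field(default_factory=list)
    subtree: List["BTreeNode"] = field(default_factory=list)   # [] for a leaf

def fix_excess(node: BTreeNode, i: int) -> None:
    """subtree[i] has MAXIMUM+1 elements: split it and take in its middle element."""
    child = node.subtree[i]
    left = BTreeNode(child.data[:MINIMUM], child.subtree[:MINIMUM + 1])
    right = BTreeNode(child.data[MINIMUM + 1:], child.subtree[MINIMUM + 1:])
    node.data.insert(i, child.data[MINIMUM])
    node.subtree[i:i + 1] = [left, right]

def loose_add(node: BTreeNode, x: int) -> None:
    """Add x below node; afterwards node itself may hold MAXIMUM+1 elements."""
    i = bisect_left(node.data, x)
    if i < len(node.data) and node.data[i] == x:
        return                            # set: already present
    if not node.subtree:
        node.data.insert(i, x)            # leaf: insert in sorted position
        return
    loose_add(node.subtree[i], x)
    if len(node.subtree[i].data) > MAXIMUM:
        fix_excess(node, i)               # repair the child; our own excess is the caller's job

def add(root: BTreeNode, x: int) -> BTreeNode:
    """Add x and return the (possibly new) root."""
    loose_add(root, x)
    if len(root.data) > MAXIMUM:
        root = BTreeNode([], [root])      # old root becomes the only child of an empty root
        fix_excess(root, 0)               # splitting it adds one level above every leaf
    return root
```

Usage: inserting 1 through 7 in order (the input that turns a plain binary search tree into a chain) yields a root holding 4, with all four leaves at depth 2.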
Removing from a B-Tree
- loose remove may leave nodes with too few elements
(e < MINIMUM): we then must fix this shortage
- if the element to be removed is in a non-leaf node,
we must:
- replace it with the biggest element in the subtree to its left
- remove the biggest element in the subtree to its left
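- to fix a shortage we can:
- if a neighboring subtree has more than MINIMUM elements, rotate one element
through the parent: the neighbor's nearest element moves up into the parent, and
the parent's separating element moves down into the short node
- otherwise, merge with a neighboring subtree, pulling the separating element
down from the parent -- which may leave us (the parent) with a shortage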
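A minimal Python sketch of the shortage repair, assuming integer elements, MINIMUM = 2, and a hypothetical `BTreeNode` class with `data` and `subtree` lists. `fix_shortage` (a hypothetical name) is called on a parent whose subtree i has MINIMUM-1 elements. Only the repair step is shown, not locating and removing the element itself:

```python
from dataclasses import dataclass, field
from typing import List

MINIMUM = 2                  # assumed for illustration
MAXIMUM = 2 * MINIMUM

@dataclass
class BTreeNode:
    data: List[int] = field(default_factory=list)
    subtree: List["BTreeNode"] = field(default_factory=list)   # [] for a leaf

def fix_shortage(node: BTreeNode, i: int) -> None:
    """Repair node.subtree[i], which is one element short. Afterwards node itself
    may be one element short, which is its own parent's problem."""
    short = node.subtree[i]
    if i > 0 and len(node.subtree[i - 1].data) > MINIMUM:
        # rotate through the parent from the left neighbor
        left = node.subtree[i - 1]
        short.data.insert(0, node.data[i - 1])
        node.data[i - 1] = left.data.pop()
        if left.subtree:
            short.subtree.insert(0, left.subtree.pop())
    elif i + 1 < len(node.subtree) and len(node.subtree[i + 1].data) > MINIMUM:
        # rotate through the parent from the right neighbor
        right = node.subtree[i + 1]
        short.data.append(node.data[i])
        node.data[i] = right.data.pop(0)
        if right.subtree:
            short.subtree.append(right.subtree.pop(0))
    elif i > 0:
        # merge into the left neighbor, pulling the separating element down
        left = node.subtree[i - 1]
        left.data += [node.data.pop(i - 1)] + short.data
        left.subtree += short.subtree
        del node.subtree[i]
    else:
        # merge the right neighbor into this node, pulling the separating element down
        right = node.subtree[i + 1]
        short.data += [node.data.pop(i)] + right.data
        short.subtree += right.subtree
        del node.subtree[i + 1]
```

A merged node holds MINIMUM + 1 + (MINIMUM - 1) = MAXIMUM elements, so a merge never overfills.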
Depth Analysis
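- every non-leaf node except the root has at least m = MINIMUM+1 subtrees
(a non-leaf root has at least 2)
- therefore, after the root's children, every layer has at least m times as many nodes
as the previous layer
- so a tree of depth d >= 1 has at least n = 2m^(d-1) nodes on its bottom layer alone
- therefore, d = O(log_m n)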
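- a B-tree keeps all leaves at the same depth:
- nodes split, but the tree only gets deeper when the root splits, which adds a level above every leaf at once
- nodes merge, but the tree only gets shallower when the root loses its last element, which drops a level for every leaf at once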
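The depth bound can be made concrete by counting elements rather than nodes. A minimal Python sketch, assuming the thinnest possible B-tree of a given depth: a 1-element root with 2 subtrees, and every other node holding exactly `minimum` elements with minimum+1 subtrees. The function names are hypothetical:

```python
def min_elements(depth: int, minimum: int) -> int:
    """Fewest elements a B-tree of the given depth can hold."""
    m = minimum + 1              # subtrees per non-root, non-leaf node
    total = 1                    # the root's single element
    nodes = 2                    # nodes on the layer just below the root
    for _ in range(depth):
        total += nodes * minimum # each node on this layer holds `minimum` elements
        nodes *= m               # the next layer is m times larger
    return total                 # works out to 2 * m**depth - 1

def max_depth(n: int, minimum: int) -> int:
    """Largest depth a B-tree holding n >= 1 elements can have."""
    d = 0
    while min_elements(d + 1, minimum) <= n:
        d += 1
    return d
```

Usage: with MINIMUM = 100, a tree holding a million elements has depth at most `max_depth(1_000_000, 100)`, which is 2, since depth 3 would already need 2 * 101^3 - 1 = 2,060,601 elements.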