Binary Search
- suppose we have an array of n values in ascending order, or any
other sorted order
- to find element x,
- we can search through the entire array (linear search), or
- we might look in the middle of the array, at index n / 2, and see
- whether x < a[n/2] (condition L)
- if condition L is true, then
- the value must be, if anywhere, at one of the indices 0..(n/2 - 1)
- otherwise, we see whether x > a[n/2] (condition G)
- if condition G holds, then
- the value must be, if anywhere, at one of the indices (n/2 + 1)..(n - 1)
- if neither L nor G holds, x == a[n/2], and our search
is complete
- if either L or G holds, we repeat the binary search
- either recursively or iteratively
Recursive binary search
/* Returns null if not found, or the matching array element if found.
   Assumes Type implements Comparable<Type> and array is in ascending order. */
public Type binarySearch (Type value, Type array [], int first, int last) {
  int count = last - first + 1;
  if (count < 1) { // empty range
    return null;
  } else if (count == 1) {
    return (value.compareTo (array [first]) == 0) ? array [first] : null;
  } else { // assert (count > 1)
    int middle = first + count / 2;
    int comparison = value.compareTo (array [middle]);
    if (comparison == 0) { // found at the middle index
      return array [middle];
    } else if (comparison < 0) { // in the first half of the array
      return binarySearch (value, array, first, middle - 1);
    } else { // in the second half of the array
      return binarySearch (value, array, middle + 1, last);
    }
  }
}
- in-class exercise: what is the runtime of this method?
- remember iterative (looping) binary search
- in-class exercise: what is the runtime of the iterative method?
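The iterative (looping) variant can be sketched as follows. This is a minimal sketch, assuming an int array in ascending order; the class name IterativeSearch and the choice to return an index (or -1) are illustrative assumptions, not the code used in class.

```java
class IterativeSearch {
    /* Returns the index of value in the ascending array, or -1 if not found. */
    static int binarySearch(int[] array, int value) {
        int first = 0;
        int last = array.length - 1;
        while (first <= last) {
            // written this way so that first + last cannot overflow
            int middle = first + (last - first) / 2;
            if (value < array[middle]) {
                last = middle - 1;       // condition L: keep the lower part
            } else if (value > array[middle]) {
                first = middle + 1;      // condition G: keep the upper part
            } else {
                return middle;           // neither L nor G: found
            }
        }
        return -1;                       // the range is empty: not found
    }

    public static void main(String[] args) {
        int[] a = {2, 3, 5, 7, 11, 13};
        System.out.println(binarySearch(a, 7));   // prints 3
        System.out.println(binarySearch(a, 8));   // prints -1
    }
}
```

The loop replaces each recursive call by an update of first or last, so no call stack is needed.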
Maintaining a sorted array
- suppose the sorted array (e.g. "cassette", "cd", "dvd", "vinyl") needs
a new element, e.g. "iPod"
- to insert this new element in the correct position,
- find the location where it would belong, e.g. by modifying above binarySearch()
- shift all elements after this location up by one
- store the element in this location
- in-class exercise: what is the runtime of this algorithm?
- suppose the sorted values were kept in a linked list
- in-class exercise: what is the runtime of this algorithm?
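The three array insertion steps (find, shift, store) can be sketched as follows. This is a sketch under stated assumptions: the array has at least one unused slot at the end, size counts the slots in use, and the names SortedInsert, findLocation, and insert are hypothetical.

```java
import java.util.Arrays;

class SortedInsert {
    /* Returns the index of the first element that is >= value (size if none). */
    static int findLocation(String[] array, int size, String value) {
        int first = 0;
        int last = size;                 // search the half-open range [first, last)
        while (first < last) {
            int middle = first + (last - first) / 2;
            if (array[middle].compareTo(value) < 0) {
                first = middle + 1;
            } else {
                last = middle;
            }
        }
        return first;
    }

    /* Inserts value among the first size sorted elements of array. */
    static void insert(String[] array, int size, String value) {
        int location = findLocation(array, size, value);
        for (int i = size; i > location; i--) {
            array[i] = array[i - 1];     // shift later elements up by one
        }
        array[location] = value;         // store the new element
    }

    public static void main(String[] args) {
        String[] media = {"cassette", "cd", "dvd", "vinyl", null};
        insert(media, 4, "iPod");
        System.out.println(Arrays.toString(media));
        // prints [cassette, cd, dvd, iPod, vinyl]
    }
}
```

String.compareTo is case-sensitive, so "iPod" sorts by its lowercase "i", between "dvd" and "vinyl".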
Trees
- imagine having to store, in an organized fashion,
all the descendants of a given person
- as in this family tree:
image by Derrick Coetzee, in entry "family tree" in Wikipedia
- trees can hold any kind of data, even numbers:
image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
- a node in a tree is similar to a node in a linked list, except:
- a linked list node has a reference to zero or one other linked list nodes,
whereas
- a tree node has a reference to zero or more other tree nodes
- later we will consider how trees are implemented
Tree definitions
- a tree has one root node
- all other nodes can be reached by following links from the root
- each node in a tree, except the root, has exactly one parent
- each node in a tree can have zero or more children
- the other children of a node's parent are the node's siblings
- nodes are in a hierarchical relationship:
- node X is an ancestor of another node Y, or
- node X is a descendant of another node Y, or
- node X is on a different branch than node Y
- nodes without children are leaf nodes
- nodes with children and a parent are interior nodes
Tree properties exercise
image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
- in-class exercise: identify the
- root node
- interior nodes
- leaf nodes
- parent of node 6
- children of node 6
- ancestors of node 5
- descendants of node 7
More tree definitions
- a subtree is a node with all its descendants
- the depth of a node is the number of its ancestors,
- e.g. the root always has depth 0:
- some people prefer to say the root has depth 1, but in this class
the root always has depth 0
- the children of the root are at depth 1
- their children are at depth 2
- the children of a node at depth n are at depth n+1
- the height of a tree is the maximum depth of any node in the tree
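The depth and height definitions can be made concrete with a small sketch. It assumes a general tree node that keeps a parent reference and a list of children; the class GeneralTreeNode and its methods are hypothetical, not the implementation considered later.

```java
import java.util.ArrayList;
import java.util.List;

class GeneralTreeNode {
    int value;
    GeneralTreeNode parent;                          // null for the root
    List<GeneralTreeNode> children = new ArrayList<>();

    GeneralTreeNode(int value) {
        this.value = value;
    }

    /* Adds a new child holding value and returns that child. */
    GeneralTreeNode addChild(int value) {
        GeneralTreeNode child = new GeneralTreeNode(value);
        child.parent = this;
        children.add(child);
        return child;
    }

    /* Depth is the number of ancestors, so the root has depth 0. */
    int depth() {
        int count = 0;
        for (GeneralTreeNode node = parent; node != null; node = node.parent) {
            count++;
        }
        return count;
    }

    /* Height of the subtree rooted here: the maximum depth below this node. */
    int height() {
        int max = 0;
        for (GeneralTreeNode child : children) {
            max = Math.max(max, 1 + child.height());
        }
        return max;
    }
}
```

Calling height() on the root gives the height of the whole tree; a leaf has height 0.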
Special trees
- in a binary tree
- each node has at most two children
- similarly, in a ternary tree
- each node has at most three children
- in a balanced binary tree
- the heights of the two subtrees of every node differ by at most one
- there are also other definitions of balanced binary trees
- in a binary search tree
- each node has a value
- greater than every value in its left subtree, and
- less than every value in its right subtree
Binary search trees
image by Derrick Coetzee and Booyabazooka, in entry "Binary search tree" in Wikipedia
- in-class exercise: how can we check whether this is a binary search tree?
- binary search (for value x) is now much easier:
- if the value in the root is x, we are done
- otherwise, if x < the value in the root, search (recursively)
in the left subtree
- otherwise, x > the value in the root, so search
in the right subtree
- if the subtree we need to search is empty, x is not in the tree
- in-class exercise: what is the runtime of this algorithm?
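The search steps above can be sketched as follows, assuming int values and a bare-bones node type. The names BstSearch, Node, and contains are hypothetical.

```java
class BstSearch {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /* Returns true if x is stored in the search tree rooted at root. */
    static boolean contains(Node root, int x) {
        if (root == null) {
            return false;                     // empty subtree: x is not in the tree
        } else if (x == root.value) {
            return true;                      // the value in the root is x
        } else if (x < root.value) {
            return contains(root.left, x);    // search the left subtree
        } else {
            return contains(root.right, x);   // search the right subtree
        }
    }

    public static void main(String[] args) {
        Node root = new Node(8,
                new Node(3, new Node(1, null, null), new Node(6, null, null)),
                new Node(10, null, new Node(14, null, null)));
        System.out.println(contains(root, 6));   // prints true
        System.out.println(contains(root, 7));   // prints false
    }
}
```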
Binary tree traversals
- if we want to visit each node in a binary tree, we have a choice of
how to do it:
- preorder traversal: visit the root node, then recursively
visit the left subtree, then the right subtree
- inorder traversal: recursively visit the left subtree, then
visit the root node, then recursively visit the right subtree
- postorder traversal: recursively
visit the left subtree, then the right subtree, then the root node
- in-class exercises: do a pre-order, in-order, and post-order traversal
of this tree, writing down node values in the appropriate sequence
image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
- in-class exercises (individually): do a pre-order,
in-order, and post-order traversal of this tree
image by Derrick Coetzee and Booyabazooka, in entry "Binary search tree" in Wikipedia
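The three traversal orders differ only in where the root is visited relative to the two recursive calls. The sketch below shows this on a small made-up tree (not one of the pictured trees); the names TraversalSketch and Node are hypothetical.

```java
class TraversalSketch {
    static class Node {
        String value;
        Node left, right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /* Root first, then left subtree, then right subtree. */
    static String preorder(Node node) {
        if (node == null) return "";
        return node.value + " " + preorder(node.left) + preorder(node.right);
    }

    /* Left subtree, then root, then right subtree. */
    static String inorder(Node node) {
        if (node == null) return "";
        return inorder(node.left) + node.value + " " + inorder(node.right);
    }

    /* Left subtree, then right subtree, then root. */
    static String postorder(Node node) {
        if (node == null) return "";
        return postorder(node.left) + postorder(node.right) + node.value + " ";
    }

    public static void main(String[] args) {
        // D is the root, B and E are its children, A and C are children of B
        Node b = new Node("B", new Node("A", null, null), new Node("C", null, null));
        Node root = new Node("D", b, new Node("E", null, null));
        System.out.println(preorder(root).trim());    // prints D B A C E
        System.out.println(inorder(root).trim());     // prints A B C D E
        System.out.println(postorder(root).trim());   // prints A C B E D
    }
}
```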
Expression trees
- an expression tree is a tree where:
- every node with children is an operator
- every leaf node is an operand
- each operator operates on the values of its children
- in-class exercise: do a pre-order,
in-order, and post-order traversal of this tree
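Evaluating an expression tree is a postorder computation: both children are evaluated before the operator is applied. The sketch below assumes binary operators (+, -, *, /) and numeric operands stored as strings; ExpressionTree, Node, and evaluate are hypothetical names.

```java
class ExpressionTree {
    static class Node {
        String token;          // an operator in interior nodes, an operand in leaves
        Node left, right;

        Node(String token) {   // leaf: operand
            this(token, null, null);
        }

        Node(String token, Node left, Node right) {
            this.token = token;
            this.left = left;
            this.right = right;
        }
    }

    /* Returns the value of the expression rooted at node. */
    static double evaluate(Node node) {
        if (node.left == null && node.right == null) {
            return Double.parseDouble(node.token);   // leaf: the operand itself
        }
        double leftValue = evaluate(node.left);      // children first (postorder)
        double rightValue = evaluate(node.right);
        switch (node.token) {                        // then the operator
            case "+": return leftValue + rightValue;
            case "-": return leftValue - rightValue;
            case "*": return leftValue * rightValue;
            case "/": return leftValue / rightValue;
            default:
                throw new IllegalArgumentException("unknown operator " + node.token);
        }
    }

    public static void main(String[] args) {
        // the tree for (2 + 3) * 4
        Node sum = new Node("+", new Node("2"), new Node("3"));
        Node product = new Node("*", sum, new Node("4"));
        System.out.println(evaluate(product));   // prints 20.0
    }
}
```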
Binary search tree properties
- adding, finding (get), or removing a node all take O(height of the tree) time
- if the tree is balanced, this is O(log n), with n the number
of nodes
- compare to a sorted array, where binary search means finding is always
O(log n), but inserting is O(n)
- binary search trees are an efficient way to search data and to sort data,
as long as the trees remain balanced
- an unbalanced binary tree might look like a linked list
- in-class exercise: what is the runtime of the get method when
the tree is not balanced?
- data can be identified with a unique key
- in-class exercises:
- add everyone's name to a search tree (in groups of five to ten people)
- is the resulting tree balanced?
Binary search tree add operation
- if key is less than the key of the current node, recursively add to the
left subtree
- if key is greater than the key of the current node, recursively add to
the right subtree
- otherwise, add at this node:
- if there is no current node, create one and return it
- if there is a current node, replace its contents with the new contents
- this assumes:
- the method returns a new root for the new (sub)tree with the desired
value inserted
- the caller of this method knows what to do with the new root
Binary search tree remove operation
- find node with the given key
- if the node is a leaf node, just delete it, return null
- if the node only has a left subtree, return the left subtree
- if the node only has a right subtree, return the right subtree
- if the node has both subtrees, must replace it with another node that fits
in the slot
Removing a node that has both subtrees
- the rightmost node in the left subtree can be put in place of the
current node without altering the sorted property
- likewise, the leftmost node in the right subtree can be put in place
of the current node
- that node might have a left (right) subtree, which can be used in its old
position
- the rightmost node in the left subtree is the inorder predecessor of the
current node
- the leftmost node in the right subtree is the inorder successor of the
current node
- so either can be used
- in-class exercise: delete node 17 from this tree:
- in-class exercise (individually): delete node 14 from the original tree
- in-class exercise (individually): delete node 31 from the original tree
Binary node class
- a singly-linked list node has (at most) one reference to another node
- a binary tree node has (at most) two references to other nodes
- for example, see here
- both types of nodes also need a reference to the locally stored value
- data fields are value, left, right
- methods include constructors, accessor methods, mutator methods, toString
- test program builds small tree, tests the methods
- in-class exercise: build the following tree using the methods from class
BinaryNode
- in-class exercise: write a recursive static method to print the tree in
postorder
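A node class with the data fields and kinds of methods listed above might look like the following. This is only a sketch: the name BinaryNodeSketch, the accessor and mutator names, and the parenthesized toString format are assumptions, and the class BinaryNode used in this course may differ.

```java
class BinaryNodeSketch<Type> {
    private Type value;
    private BinaryNodeSketch<Type> left;
    private BinaryNodeSketch<Type> right;

    /* Creates a leaf node. */
    BinaryNodeSketch(Type value) {
        this(value, null, null);
    }

    /* Creates a node with the given subtrees (either may be null). */
    BinaryNodeSketch(Type value, BinaryNodeSketch<Type> left, BinaryNodeSketch<Type> right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // accessor methods
    Type getValue() { return value; }
    BinaryNodeSketch<Type> getLeft() { return left; }
    BinaryNodeSketch<Type> getRight() { return right; }

    // mutator methods
    void setValue(Type value) { this.value = value; }
    void setLeft(BinaryNodeSketch<Type> left) { this.left = left; }
    void setRight(BinaryNodeSketch<Type> right) { this.right = right; }

    /* Parenthesized form: (left value right), omitting missing subtrees. */
    @Override
    public String toString() {
        return "(" + (left == null ? "" : left + " ")
                + value
                + (right == null ? "" : " " + right) + ")";
    }
}
```

For a root 2 with leaf children 1 and 3, toString() gives ((1) 2 (3)).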
Binary search tree class
- for linked lists, we have:
- a node class to define the node of a linked list and the operations
on a single node (for example, to print the value of a node)
- a linked list class to define the operations on an entire linked list
(for example, to print the entire list)
- likewise for trees, we need a class to define the operations on an entire
tree
- different classes can use the same BinaryNode class to provide different
operations on:
- binary trees in general
- binary search trees
- here, the second class might be a subclass of the first class
Binary search tree implementation
- root is the only data field
- methods include
- the constructors,
- get (to get a specified item from the tree),
- add,
- remove,
- toString (inorder traversal of the tree) and pre-order and
post-order conversions to strings, and
- main, which exercises the code, ensures basic functionality
- see BinarySearchTree.java, which uses TreeIterator.java
Binary search tree get operation
- if searching for a key in an empty subtree, report that the key was not
found (base case 1)
- if the key matches the key in the current node, return that value (base
case 2)
- if the key is less than the key of the current node, recursively get
from the left subtree
- if key is greater than the key of the current node, recursively get
from the right subtree
- when getting a value, the key has to have type Type
- however, only the key part of the key object, the part which is checked
by compareTo, needs to be initialized
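The point about partially initialized keys can be illustrated with a sketch. It assumes a hypothetical Entry type whose compareTo looks only at the name; KeyedGet, Entry, Node, and get are made-up names, and the phone numbers are placeholder data.

```java
class KeyedGet {
    /* An item whose compareTo checks only the key (name). */
    static class Entry implements Comparable<Entry> {
        String name;    // the key part
        String phone;   // the rest of the item

        Entry(String name, String phone) {
            this.name = name;
            this.phone = phone;
        }

        @Override
        public int compareTo(Entry other) {
            return name.compareTo(other.name);
        }
    }

    static class Node {
        Entry value;
        Node left, right;

        Node(Entry value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /* Returns the stored entry that matches key, or null if there is none. */
    static Entry get(Node node, Entry key) {
        if (node == null) {
            return null;                      // base case 1: empty subtree
        }
        int comparison = key.compareTo(node.value);
        if (comparison == 0) {
            return node.value;                // base case 2: key matches
        } else if (comparison < 0) {
            return get(node.left, key);
        } else {
            return get(node.right, key);
        }
    }

    public static void main(String[] args) {
        Node root = new Node(new Entry("kim", "555-0102"),
                new Node(new Entry("bo", "555-0101"), null, null),
                new Node(new Entry("lee", "555-0103"), null, null));
        Entry probe = new Entry("bo", null);  // only the key part is initialized
        System.out.println(get(root, probe).phone);   // prints 555-0101
    }
}
```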
Binary search tree add operation
- if key is less than the key of the current node, recursively add to the
left subtree
- if key is greater than the key of the current node, recursively add to
the right subtree
- otherwise, add at this node:
- if there is no current node, create one and return it
- if there is a current node, replace its contents with the new contents
- this assumes:
- the method returns a new root for the new (sub)tree with the desired
value inserted
- the caller of this method knows what to do with the new root
- in-class exercise: write the code for this method
Adding a node to a binary search tree
- non-recursive public method calls recursive private method
(with same name but different signature) with as argument the root of the
tree
- the result of the recursive call is the new root
- so if the root was null, the new root is a leaf node
- recursive method base cases:
- item not in tree: return new node
- item found in tree: replace
- recursive method recursion:
- item less than current node: set left subtree to result of adding new
item to left subtree
- item greater than current node: set right subtree to result of adding
new item to right subtree
- see BinarySearchTree.java
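The public and private pair described above can be sketched as follows. This is not the code in BinarySearchTree.java; it is a minimal sketch assuming int keys and String contents, with hypothetical names AddSketch, Node, and keysInOrder.

```java
class AddSketch {
    static class Node {
        int key;
        String contents;
        Node left, right;

        Node(int key, String contents) {
            this.key = key;
            this.contents = contents;
        }
    }

    private Node root;   // the only data field

    /* Public, non-recursive: starts the recursion at the root. */
    public void add(int key, String contents) {
        root = add(root, key, contents);   // the result is the new root
    }

    /* Private, recursive: returns the root of the subtree after adding. */
    private Node add(Node node, int key, String contents) {
        if (node == null) {
            return new Node(key, contents);              // item not in tree: new leaf
        }
        if (key < node.key) {
            node.left = add(node.left, key, contents);
        } else if (key > node.key) {
            node.right = add(node.right, key, contents);
        } else {
            node.contents = contents;                    // item found: replace
        }
        return node;
    }

    /* In-order listing of the keys, used here only to inspect the result. */
    public String keysInOrder() {
        return keysInOrder(root).trim();
    }

    private String keysInOrder(Node node) {
        if (node == null) return "";
        return keysInOrder(node.left) + node.key + " " + keysInOrder(node.right);
    }
}
```

Because every recursive call returns the root of its subtree, the caller only has to store the result, which also covers the case where the tree was empty.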
Binary search tree remove operation
- find node with the given key
- if the node is a leaf node (or null), return null (base case 1)
- if the node only has a left subtree, return the left subtree (base case
2)
- if the node only has a right subtree, return the right subtree (base case
3)
- if the node has both subtrees, must replace it with another node that fits
in the slot
- that node N is either the rightmost node in the left subtree,
or the leftmost node in the right subtree
- claim: this maintains the essential characteristic of a binary search tree
- in-class exercise: is the claim true, and why or why not?
- the rightmost node in the left subtree is the inorder predecessor
of the current node
- the leftmost node in the right subtree is the inorder successor
of the current node
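The remove cases above can be sketched as follows, using the inorder predecessor for the case with two subtrees. Note one simplification that is an assumption of this sketch: instead of moving node N itself, it copies N's key into the slot and then removes N from the left subtree. The names RemoveSketch, Node, and inorder are hypothetical.

```java
class RemoveSketch {
    static class Node {
        int key;
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    /* Returns the root of the subtree after removing key (unchanged if absent). */
    static Node remove(Node node, int key) {
        if (node == null) {
            return null;                              // key not found
        }
        if (key < node.key) {
            node.left = remove(node.left, key);
            return node;
        } else if (key > node.key) {
            node.right = remove(node.right, key);
            return node;
        }
        // node holds the key to be removed
        if (node.left == null) {
            return node.right;                        // leaf (returns null) or right subtree only
        }
        if (node.right == null) {
            return node.left;                         // left subtree only
        }
        // both subtrees: find the rightmost node in the left subtree
        Node predecessor = node.left;
        while (predecessor.right != null) {
            predecessor = predecessor.right;
        }
        node.key = predecessor.key;                   // put its key in this slot
        node.left = remove(node.left, predecessor.key);   // and remove it below
        return node;
    }

    /* In-order listing of the keys, used here only to inspect the result. */
    static String inorder(Node node) {
        if (node == null) return "";
        return inorder(node.left) + node.key + " " + inorder(node.right);
    }

    public static void main(String[] args) {
        Node forty = new Node(40, new Node(35, null, null), null);
        Node thirty = new Node(30, new Node(20, null, null), forty);
        Node root = new Node(50, thirty, new Node(70, null, null));
        root = remove(root, 50);
        System.out.println(root.key);                 // prints 40
        System.out.println(inorder(root).trim());     // prints 20 30 35 40 70
    }
}
```

In the example, the predecessor 40 has a left subtree (35), which takes over 40's old position.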
Removing a node with both subtrees
- node N, the node before (after) the node being removed, might itself
have a left (right) subtree, which can replace N in the tree when N
is moved to the slot of the node being removed
- in-class exercise: delete node 17 from this tree, replacing the deleted
node with the node right before it:
Traversing the search tree
- toString() uses a parentheses notation that is quite messy but
can be used to figure out the structure of the tree
- three iterators: pre-order, in-order, post-order
- all using TreeIterator.java, which is for all binary trees (that use
BinaryNode), not just binary search trees
- to create a new iterator:
- the user program calls one of the iterator methods in the class BinarySearchTree
- the BinarySearchTree iterator method calls the appropriate
constructor of the TreeIterator class, with, as argument, the
root of the tree
- the resulting TreeIterator object can be used to traverse
the binary search tree
- tree iterator uses a stack (or two) to keep track of where it is in the
tree, and which node is next
- by comparison, the recursive (pre-order) traversal done by toString()
is much simpler
- the constructors for the iterators all take the root of the tree as a parameter,
so they do not need explicit access to the private (protected) class data
- in other words, a tree iterator can iterate over any binary tree, but must
be given the binary tree over which to iterate
- a binary search tree class knows over which tree to iterate, and can provide
the parameter to the TreeIterator constructor
- this means the iterator() methods must be defined within the BinarySearchTree
class or one of its subclasses
- they are the only classes that have access to the root of the tree
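How a stack can track the position in the tree is sketched below for the in-order case only. This is not TreeIterator.java; the class InorderIteratorSketch, its Node type, and the helper pushLeftSpine are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

class InorderIteratorSketch implements Iterator<Integer> {
    static class Node {
        int value;
        Node left, right;

        Node(int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // nodes whose left subtree has been entered but that are not yet visited
    private final Deque<Node> stack = new ArrayDeque<>();

    /* Takes the root of the tree to iterate over as its parameter. */
    InorderIteratorSketch(Node root) {
        pushLeftSpine(root);
    }

    /* Pushes node and all its left descendants; the top becomes the next node. */
    private void pushLeftSpine(Node node) {
        for (; node != null; node = node.left) {
            stack.push(node);
        }
    }

    @Override
    public boolean hasNext() {
        return !stack.isEmpty();
    }

    @Override
    public Integer next() {
        if (stack.isEmpty()) {
            throw new NoSuchElementException();
        }
        Node node = stack.pop();
        pushLeftSpine(node.right);   // the right subtree comes after this node
        return node.value;
    }
}
```

Since the constructor receives the root, the iterator never needs access to the private data of the tree class.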
Efficiency of binary search tree operations
- most operations have time O(height of the tree)
- in a balanced tree, the height is O(log N), with N the number of nodes
- in-class exercise: why is this true?
- in an unbalanced tree, the height can be as large as N - 1, which is O(N)
- adding and deleting random data may or may not give the desired result
- adding data in sorted order is guaranteed to give the worst possible result
- traversing an entire tree, of course, is O(N)
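The effect of insertion order on the height can be checked with a small sketch. It assumes int keys, no rebalancing, and the convention that a single node has height 0; the names HeightDemo, Node, and heightAfterAdding are hypothetical.

```java
class HeightDemo {
    static class Node {
        int key;
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    /* Plain binary search tree add; duplicates are ignored. */
    static Node add(Node node, int key) {
        if (node == null) {
            return new Node(key);
        }
        if (key < node.key) {
            node.left = add(node.left, key);
        } else if (key > node.key) {
            node.right = add(node.right, key);
        }
        return node;
    }

    /* Height of a subtree: -1 if empty, so a single node has height 0. */
    static int height(Node node) {
        if (node == null) {
            return -1;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    /* Builds a tree by adding the keys in the given order; returns its height. */
    static int heightAfterAdding(int[] keys) {
        Node root = null;
        for (int key : keys) {
            root = add(root, key);
        }
        return height(root);
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4, 5, 6, 7};
        int[] mixed = {4, 2, 6, 1, 3, 5, 7};
        System.out.println(heightAfterAdding(sorted));   // prints 6
        System.out.println(heightAfterAdding(mixed));    // prints 2
    }
}
```

With sorted input every new key goes to the right of the previous one, so the tree degenerates into a chain of N nodes.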
Databases and search keys
- databases (e.g. shopping lists, address lists, payroll information) often
have one or more identifying items, called keys
- depending on the database, each key may be unique, or each set of keys
may be unique
- binary search trees can be used to store objects that have keys
- the compareTo() method should compare just the keys
- the search key uniquely identifies the item (e.g. Social Security Number
in IRS records)
- the value of the item can be changed without changing the location of the
item
- if the search key is modified in place, the binary search tree is no longer
a search tree
- database may refer to the collection of items, to the collection
of items stored on disk, or to the program (database program) that accesses
the collection