Binary Search

• suppose we have an array of n values in ascending order, or any other sorted order
• to find element x,
• we can search through the entire array (linear search), or
• we might look in the middle of the array, at index n / 2, and see
• whether x < a[n/2] (condition L)
• if condition L is true, then
• the value must be, if anywhere, at one of the indices 0..(n/2 - 1)
• otherwise, we see whether x > a[n/2] (condition G)
• if condition G holds, then
• the value must be, if anywhere, at one of the indices (n/2 + 1)..(n - 1)
• if neither L nor G holds, x == a[n/2], and our search is complete
• if either L or G holds, we repeat the binary search on the remaining range of indices
• either recursively or iteratively

Recursive binary search

```
/* Returns null if not found, or the matching element if found.
   Assumes Type implements Comparable<Type> and array [first..last] is sorted. */
public Type binarySearch (Type value, Type array [], int first, int last) {
    int count = last - first + 1;
    if (count < 1) {
        return null;
    } else if (count == 1) {
        return (value.compareTo (array [first]) == 0) ? array [first] : null;
    } else { // assert (count > 1)
        int middle = first + count / 2;
        int comparison = value.compareTo (array [middle]);
        if (comparison < 0) {        // in the first half of the array
            return binarySearch (value, array, first, middle - 1);
        } else if (comparison > 0) { // in the second half of the array
            return binarySearch (value, array, middle + 1, last);
        } else {                     // found at the middle index
            return array [middle];
        }
    }
}
```
• in-class exercise: what is the runtime of this method?
• remember the iterative (looping) version of binary search
• in-class exercise: what is the runtime of the iterative method?
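
The iterative version can be sketched as follows. This is only a minimal sketch, not the course's code: the class name IterativeSearch and the choice to return an index (or -1 if not found) are assumptions made for illustration.

```java
class IterativeSearch {
    /* Returns the index of value in array, or -1 if not found.
       Assumption: array is sorted in ascending order according to compareTo. */
    static <T extends Comparable<T>> int binarySearch (T value, T [] array) {
        int first = 0;
        int last = array.length - 1;
        while (first <= last) {                       // the range first..last is not empty
            int middle = first + (last - first) / 2;
            int comparison = value.compareTo (array [middle]);
            if (comparison < 0) {                     // condition L: keep the lower part
                last = middle - 1;
            } else if (comparison > 0) {              // condition G: keep the upper part
                first = middle + 1;
            } else {                                  // neither L nor G: found
                return middle;
            }
        }
        return -1;                                    // the range became empty
    }
}
```

With the array { "cassette", "cd", "dvd", "vinyl" }, searching for "dvd" returns index 2 and searching for "iPod" returns -1.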

Maintaining a sorted array

• suppose the sorted array (e.g. "cassette", "cd", "dvd", "vinyl") needs a new element, e.g. "iPod"
• to insert this new element in the correct position,
1. find the location where it belongs, e.g. by modifying the binarySearch() method above
2. shift all elements after this location up by one
3. store the element in this location
• in-class exercise: what is the runtime of this algorithm?
• suppose the sorted values were kept in a linked list
• in-class exercise: what is the runtime of the insertion algorithm in that case?
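
The three insertion steps for a sorted array might look like the sketch below. The class and method names (SortedArrayInsert, findPosition, insert) are hypothetical, and the sketch assumes the array has spare capacity beyond the count elements currently in use.

```java
class SortedArrayInsert {
    /* Step 1: returns the first index in array [0..count-1] whose element
       is >= value, or count if every element is smaller. */
    static int findPosition (String value, String [] array, int count) {
        int first = 0;
        int last = count - 1;
        while (first <= last) {
            int middle = first + (last - first) / 2;
            if (array [middle].compareTo (value) < 0) {
                first = middle + 1;
            } else {
                last = middle - 1;
            }
        }
        return first;
    }

    /* Inserts value into the sorted prefix array [0..count-1]; returns the new count.
       Assumption: array.length > count, so there is room for one more element. */
    static int insert (String value, String [] array, int count) {
        int position = findPosition (value, array, count);
        for (int i = count; i > position; i--) {   // step 2: shift later elements up by one
            array [i] = array [i - 1];
        }
        array [position] = value;                  // step 3: store the new element
        return count + 1;
    }
}
```

Inserting "iPod" into { "cassette", "cd", "dvd", "vinyl" } (held in an array of length 5) places it at index 3, between "dvd" and "vinyl", because Java compares strings by character code.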

Trees

• imagine having to store, in an organized fashion, all the descendants of a given person
• as in this family tree:

image by Derrick Coetzee, in entry "family tree" in Wikipedia
• trees can hold any kind of data, even numbers:

image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
• a node in a tree is similar to a node in a linked list, except:
• a linked list node has a reference to zero or one other list nodes, whereas
• a tree node has a reference to zero or more other tree nodes
• later we will consider how trees are implemented

Tree definitions

• a tree has one root node
• all other nodes can be reached by following links from the root
• each node in a tree has exactly one parent
• except for the root node
• each node in a tree can have zero or more children
• the other children of a node's parent are the node's siblings
• nodes are in a hierarchical relationship:
• node X is an ancestor of another node Y, or
• node X is a descendant of another node Y, or
• node X is on a different branch than node Y
• nodes without children are leaf nodes
• nodes with children and a parent are interior nodes

Tree properties exercise

image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
• in-class exercise: identify the
• root node
• interior nodes
• leaf nodes
• parent of node 6
• children of node 6
• ancestors of node 5
• descendants of node 7

More tree definitions

• a subtree is a node with all its descendants
• the depth of a node is the number of its ancestors
• e.g. the root always has depth 0
• some people prefer to say the root has depth 1, but in this class the root always has depth 0
• the children of the root are at depth 1
• their children are at depth 2
• the children of a node at depth n are at depth n+1
• the height of a tree is the maximum depth of any node in the tree
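
These definitions can be checked with a small sketch. The class GeneralTreeNode and its fields are hypothetical (the notes discuss tree implementation later); the sketch simply counts ancestors for depth and takes the maximum over the children for height.

```java
import java.util.ArrayList;
import java.util.List;

class GeneralTreeNode {
    int value;
    GeneralTreeNode parent;                                // null for the root
    List<GeneralTreeNode> children = new ArrayList<>();    // zero or more children

    GeneralTreeNode (int value) {
        this.value = value;
    }

    /* Creates a child holding childValue, links it to this node, and returns it. */
    GeneralTreeNode addChild (int childValue) {
        GeneralTreeNode child = new GeneralTreeNode (childValue);
        child.parent = this;
        children.add (child);
        return child;
    }

    /* depth = number of ancestors, so the root has depth 0 */
    int depth () {
        int ancestors = 0;
        for (GeneralTreeNode node = parent; node != null; node = node.parent) {
            ancestors++;
        }
        return ancestors;
    }

    /* height of the subtree rooted at this node: the maximum depth of any
       node in the subtree, measured from this node (a leaf has height 0) */
    int height () {
        int tallest = 0;
        for (GeneralTreeNode child : children) {
            tallest = Math.max (tallest, 1 + child.height ());
        }
        return tallest;
    }
}
```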

Special trees

• in a binary tree
• each node has at most two children
• similarly, in a ternary tree
• each node has at most three children
• in a balanced binary tree
• the heights of the two subtrees of every node differ by at most one
• there are also other definitions of balanced binary trees
• in a binary search tree
• each node has a value
• greater than every value in its left subtree, and
• less than every value in its right subtree

Binary search trees

image by Derrick Coetzee and Booyabazooka, in entry "Binary search tree" in Wikipedia
• in-class exercise: how can we check whether this is a binary search tree?
• binary search (for value x) is now much easier:
• if the value in the root is x, we are done
• otherwise, if x < the value in the root, search (recursively) in the left subtree
• otherwise, x > the value in the root, so search in the right subtree
• if the subtree we need to search is empty, x is not in the tree
• in-class exercise: what is the runtime of this algorithm?
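
The search steps above translate almost line for line into a recursive method. The sketch below assumes int values and a hypothetical SearchTreeNode class with value, left, and right fields.

```java
class SearchTreeNode {
    int value;
    SearchTreeNode left, right;

    SearchTreeNode (int value, SearchTreeNode left, SearchTreeNode right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    /* Returns true if x is stored in the (sub)tree rooted at node.
       Assumption: the tree is a binary search tree. */
    static boolean contains (SearchTreeNode node, int x) {
        if (node == null) {
            return false;                      // empty subtree: x is not in the tree
        } else if (x == node.value) {
            return true;                       // the value in the root is x
        } else if (x < node.value) {
            return contains (node.left, x);    // search the left subtree
        } else {
            return contains (node.right, x);   // x > value: search the right subtree
        }
    }
}
```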

Binary tree traversals

• if we want to visit each node in a binary tree, we have a choice of how to do it:
• preorder traversal: visit the root node, then recursively visit the left subtree, then the right subtree
• inorder traversal: recursively visit the left subtree, then visit the root node, then recursively visit the right subtree
• postorder traversal: recursively visit the left subtree, then the right subtree, then the root node
• in-class exercises: do a pre-order, in-order, and post-order traversal of this tree, writing down node values in the appropriate sequence

image by Derrick Coetzee, in entry "Tree (data structure)" in Wikipedia
• in-class exercises (individually): do a pre-order, in-order, and post-order traversal of this tree

image by Derrick Coetzee and Booyabazooka, in entry "Binary search tree" in Wikipedia
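
The three traversals differ only in where the root is visited relative to the two recursive calls. A minimal sketch, with a hypothetical TraversalSketch class that collects the visited values in a list:

```java
import java.util.List;

class TraversalSketch {
    static class Node {
        int value;
        Node left, right;

        Node (int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    /* root, then left subtree, then right subtree */
    static void preorder (Node node, List<Integer> out) {
        if (node == null) return;
        out.add (node.value);
        preorder (node.left, out);
        preorder (node.right, out);
    }

    /* left subtree, then root, then right subtree */
    static void inorder (Node node, List<Integer> out) {
        if (node == null) return;
        inorder (node.left, out);
        out.add (node.value);
        inorder (node.right, out);
    }

    /* left subtree, then right subtree, then root */
    static void postorder (Node node, List<Integer> out) {
        if (node == null) return;
        postorder (node.left, out);
        postorder (node.right, out);
        out.add (node.value);
    }
}
```

For a made-up tree with root 8, left child 3 (with children 1 and 6), and right child 10 (with right child 14), the preorder sequence is 8, 3, 1, 6, 10, 14, the inorder sequence is 1, 3, 6, 8, 10, 14, and the postorder sequence is 1, 6, 3, 14, 10, 8.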

Expression trees

• an expression tree is a tree where:
• every node with children is an operator
• every leaf node is an operand
• each operator operates on the values of its children
• in-class exercise: do a pre-order, in-order, and post-order traversal of this tree
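
An expression tree can be evaluated with a postorder-style recursion: evaluate both children, then apply the operator. The sketch below is an assumption-laden illustration (hypothetical ExpressionNode class, int operands, binary operators + - * / only); it also shows that a parenthesized inorder traversal gives infix notation and a postorder traversal gives postfix notation.

```java
class ExpressionNode {
    char operator;                 // meaningful only when the node has children
    int operand;                   // meaningful only in a leaf node
    ExpressionNode left, right;

    ExpressionNode (int operand) {
        this.operand = operand;
    }

    ExpressionNode (char operator, ExpressionNode left, ExpressionNode right) {
        this.operator = operator;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf () {
        return left == null && right == null;
    }

    /* Each operator operates on the values of its children. */
    int evaluate () {
        if (isLeaf ()) return operand;
        int a = left.evaluate ();
        int b = right.evaluate ();
        switch (operator) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b;    // integer division in this sketch
            default: throw new IllegalStateException ("unknown operator " + operator);
        }
    }

    /* inorder traversal, with parentheses to preserve the grouping */
    String infix () {
        if (isLeaf ()) return Integer.toString (operand);
        return "(" + left.infix () + " " + operator + " " + right.infix () + ")";
    }

    /* postorder traversal */
    String postfix () {
        if (isLeaf ()) return Integer.toString (operand);
        return left.postfix () + " " + right.postfix () + " " + operator;
    }
}
```

For the tree with root *, whose left child is + (with leaves 3 and 4) and whose right child is the leaf 5, evaluate() gives 35, infix() gives "((3 + 4) * 5)", and postfix() gives "3 4 + 5 *".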

Binary search tree properties

• adding, finding (get), and removing a node are all O(height of the tree)
• if the tree is balanced, this is O(log n), with n the number of nodes
• compare to a sorted array, where binary search means finding is always O(log n), but inserting is O(n)
• binary search trees are an efficient way to search data and to sort data, as long as the trees remain balanced
• an unbalanced binary tree might look like a linked list
• in-class exercise: what is the runtime of the get method when the tree is not balanced?
• data can be identified with a unique key
• in-class exercises:
• add everyone's name to a search tree (in groups of five to ten people)
• is the resulting tree balanced?

Binary search tree add operation

• if key is less than the key of the current node, recursively add to the left subtree
• if key is greater than the key of the current node, recursively add to the right subtree
• otherwise, add at this node:
• if there is no current node, create one and return it
• if there is a current node, replace its contents with the new contents
• this assumes:
• the method returns a new root for the new (sub)tree with the desired value inserted
• the caller of this method knows what to do with the new root

Binary search tree remove operation

• find node with the given key
• if the node is a leaf node, just delete it, return null
• if the node only has a left subtree, return the left subtree
• if the node only has a right subtree, return the right subtree
• if the node has both subtrees, we must replace it with another node that fits in the slot

Removing a node that has both subtrees

• the rightmost node in the left subtree can be put in place of the current node without altering the sorted property
• likewise, the leftmost node in the right subtree can be put in place of the current node
• that node might have a left (right) subtree, which can be used in its old position
• the rightmost node in the left subtree is the inorder predecessor of the current node
• the leftmost node in the right subtree is the inorder successor of the current node
• so either can be used
• in-class exercise: delete node 17 from this tree:
• in-class exercise (individually): delete node 14 from the original tree
• in-class exercise (individually): delete node 31 from the original tree

Binary node class

• a singly-linked list node has (at most) one reference to another node
• a binary tree node has (at most) two references to other nodes
• for example, see here
• both types of nodes also need a reference to the locally stored value
• data fields are value, left, right
• methods include constructors, accessor methods, mutator methods, toString
• test program builds small tree, tests the methods
• in-class exercise: build the following tree using the methods from class BinaryNode
• in-class exercise: write a recursive static method to print the tree in postorder
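
A node class along the lines described above might look like the following sketch. It is not the course's BinaryNode class: the name SketchBinaryNode, the method names, and the toString format (with "-" standing for a missing child) are all assumptions made for illustration.

```java
class SketchBinaryNode<T> {
    private T value;                       // the locally stored value
    private SketchBinaryNode<T> left;      // null if there is no left child
    private SketchBinaryNode<T> right;     // null if there is no right child

    SketchBinaryNode (T value) {
        this (value, null, null);
    }

    SketchBinaryNode (T value, SketchBinaryNode<T> left, SketchBinaryNode<T> right) {
        this.value = value;
        this.left = left;
        this.right = right;
    }

    // accessor methods
    T getValue () { return value; }
    SketchBinaryNode<T> getLeft () { return left; }
    SketchBinaryNode<T> getRight () { return right; }

    // mutator methods
    void setValue (T value) { this.value = value; }
    void setLeft (SketchBinaryNode<T> left) { this.left = left; }
    void setRight (SketchBinaryNode<T> right) { this.right = right; }

    private static String show (SketchBinaryNode<?> node) {
        return (node == null) ? "-" : node.toString ();
    }

    /* parenthesized form: (left value right) */
    @Override
    public String toString () {
        return "(" + show (left) + " " + value + " " + show (right) + ")";
    }
}
```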

Binary search tree class

• for linked lists, we have:
• a node class to define the node of a linked list and the operations on a single node (for example, to print the value of a node)
• a linked list class to define the operations on an entire linked list (for example, to print the entire list)
• likewise for trees, we need a class to define the operations on an entire tree
• different classes can use the same BinaryNode class to provide different operations on:
• binary trees in general
• binary search trees
• here, the second class might be a subclass of the first class

Binary search tree implementation

• root is the only data field
• methods include
• the constructors,
• get (to get a specified item from the tree),
• remove,
• toString (inorder traversal of the tree) and pre-order and post-order conversions to strings, and
• main, which exercises the code, ensures basic functionality
• see BinarySearchTree.java, which uses TreeIterator.java

Binary search tree get operation

• if searching for a key in an empty subtree, report that the key was not found (base case 1)
• if the key matches the key in the current node, return that value (base case 2)
• if the key is less than the key of the current node, recursively get from the left subtree
• if the key is greater than the key of the current node, recursively get from the right subtree
• when getting a value, the key has to have type Type
• however, only the key part of the key object, the part which is checked by compareTo, needs to be initialized
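
The get steps, and the point about partially initialized key objects, can be illustrated with the sketch below. PhoneEntry and GetSketch are hypothetical classes, not part of the course code; compareTo looks only at the name field, so a probe object with just the name filled in is enough to retrieve the full entry.

```java
class PhoneEntry implements Comparable<PhoneEntry> {
    String name;      // the key part, checked by compareTo
    String number;    // the rest of the stored value

    PhoneEntry (String name, String number) {
        this.name = name;
        this.number = number;
    }

    @Override
    public int compareTo (PhoneEntry other) {
        return name.compareTo (other.name);    // compares just the keys
    }
}

class GetSketch {
    static class Node<T> {
        T value;
        Node<T> left, right;

        Node (T value) {
            this.value = value;
        }
    }

    /* Returns the stored item whose key matches key, or null if there is none. */
    static <T extends Comparable<T>> T get (Node<T> node, T key) {
        if (node == null) {
            return null;                       // base case 1: empty subtree, not found
        }
        int comparison = key.compareTo (node.value);
        if (comparison == 0) {
            return node.value;                 // base case 2: keys match
        } else if (comparison < 0) {
            return get (node.left, key);
        } else {
            return get (node.right, key);
        }
    }
}
```

A call such as GetSketch.get (root, new PhoneEntry ("zoe", null)) returns the stored entry for "zoe", including its number, even though the probe object has no number.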

Binary search tree add operation

• if key is less than the key of the current node, recursively add to the left subtree
• if key is greater than the key of the current node, recursively add to the right subtree
• otherwise, add at this node:
• if there is no current node, create one and return it
• if there is a current node, replace its contents with the new contents
• this assumes:
• the method returns a new root for the new (sub)tree with the desired value inserted
• the caller of this method knows what to do with the new root
• in-class exercise: write the code for this method

Adding a node to a binary search tree

• the non-recursive public method calls the recursive private method (same name, different signature), passing the root of the tree as the argument
• the result of the recursive call is the new root
• so if the root was null, the new root is a leaf node
• recursive method base cases:
• item not in tree: return new node
• item found in tree: replace
• recursive method recursion:
• item less than current node: set left subtree to result of adding new item to left subtree
• item greater than current node: set right subtree to result of adding new item to right subtree
• see BinarySearchTree.java
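
The course's BinarySearchTree.java is not reproduced here. The following is a hypothetical sketch of the public/private pairing described above, under the assumption that the item type implements Comparable; the class name AddSketchTree and the inorder() helper are invented for illustration.

```java
class AddSketchTree<T extends Comparable<T>> {
    private static class Node<T> {
        T value;
        Node<T> left, right;

        Node (T value) {
            this.value = value;
        }
    }

    private Node<T> root;    // root is the only data field

    /* non-recursive public method: the result of the recursive call is the new root */
    public void add (T item) {
        root = add (root, item);
    }

    /* recursive private method: returns the root of the (sub)tree with item inserted */
    private Node<T> add (Node<T> node, T item) {
        if (node == null) {
            return new Node<T> (item);                 // item not in tree: new leaf node
        }
        int comparison = item.compareTo (node.value);
        if (comparison < 0) {
            node.left = add (node.left, item);
        } else if (comparison > 0) {
            node.right = add (node.right, item);
        } else {
            node.value = item;                         // item found in tree: replace
        }
        return node;
    }

    /* inorder listing of the values, used here only to inspect the result */
    public String inorder () {
        StringBuilder out = new StringBuilder ();
        inorder (root, out);
        return out.toString ().trim ();
    }

    private void inorder (Node<T> node, StringBuilder out) {
        if (node == null) return;
        inorder (node.left, out);
        out.append (node.value).append (' ');
        inorder (node.right, out);
    }
}
```

Because every recursive call returns the root of the updated subtree, the caller only ever needs to store the result; an empty tree needs no special case.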

Binary search tree remove operation

• find node with the given key
• if the node is a leaf node (or null), return null (base case 1)
• if the node only has a left subtree, return the left subtree (base case 2)
• if the node only has a right subtree, return the right subtree (base case 3)
• if the node has both subtrees, we must replace it with another node that fits in the slot
• that node N is either the rightmost node in the left subtree, or the leftmost node in the right subtree
• claim: this maintains the essential characteristic of a binary search tree
• in-class exercise: is the claim true, and why or why not?
• the rightmost node in the left subtree is the inorder predecessor of the current node
• the leftmost node in the right subtree is the inorder successor of the current node

Removing a node with both subtrees

• node N, the node before (after) the node being removed, might itself have a left (right) subtree, which can replace N in the tree when N is moved to the slot of the node being removed
• in-class exercise: delete node 17 from this tree, replacing the deleted node with the node right before it:
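
One way to sketch the remove operation is shown below. It is an illustration under assumptions, not the course's implementation: keys are ints, the class RemoveSketch is hypothetical, and instead of physically moving node N the sketch copies the predecessor's key into the slot and then removes the predecessor from the left subtree, which has the same effect on the stored keys.

```java
class RemoveSketch {
    static class Node {
        int key;
        Node left, right;

        Node (int key) {
            this.key = key;
        }
    }

    /* helper for building test trees: plain binary search tree insertion */
    static Node add (Node node, int key) {
        if (node == null) return new Node (key);
        if (key < node.key) node.left = add (node.left, key);
        else if (key > node.key) node.right = add (node.right, key);
        return node;
    }

    /* Returns the root of the (sub)tree with key removed. */
    static Node remove (Node node, int key) {
        if (node == null) return null;                   // key is not in the tree
        if (key < node.key) {
            node.left = remove (node.left, key);
            return node;
        }
        if (key > node.key) {
            node.right = remove (node.right, key);
            return node;
        }
        // node holds the key to remove
        if (node.left == null) return node.right;        // leaf, or only a right subtree
        if (node.right == null) return node.left;        // only a left subtree
        // both subtrees: find the rightmost node in the left subtree (inorder predecessor)
        Node predecessor = node.left;
        while (predecessor.right != null) {
            predecessor = predecessor.right;
        }
        node.key = predecessor.key;                       // put its key in this slot
        node.left = remove (node.left, predecessor.key);  // its left subtree takes its old position
        return node;
    }

    /* inorder listing of the keys, each followed by a space */
    static String inorder (Node node) {
        if (node == null) return "";
        return inorder (node.left) + node.key + " " + inorder (node.right);
    }
}
```

For a made-up tree built by adding 50, 30, 70, 20, 40, 35, 60, 80 in that order, removing 50 puts 40 at the root, and 35 (the left child of 40) takes the old position of 40 as the right child of 30.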

Traversing the search tree

• toString() uses a parentheses notation that is quite messy but can be used to figure out the structure of the tree
• to create a new iterator:
1. the user program calls one of the iterator methods in the class BinarySearchTree
2. the BinarySearchTree iterator method calls the appropriate constructor of the TreeIterator class, passing the root of the tree as the argument
3. the resulting TreeIterator object can be used to traverse the binary search tree
• tree iterator uses a stack (or two) to keep track of where it is in the tree, and which node is next
• by comparison, the recursive (pre-order) traversal done by toString() is much simpler
• the constructors for the iterators all take the root of the tree as a parameter, so they do not need explicit access to the private (protected) class data
• in other words, a tree iterator can iterate over any binary tree, but must be given the binary tree over which to iterate
• a binary search tree class knows over which tree to iterate, and can provide the parameter to the TreeIterator constructor
• this means the iterator() methods must be defined within the BinarySearchTree class or one of its subclasses
• they are the only classes that have access to the root of the tree
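
To show how a stack can keep track of the position in the tree, here is a minimal inorder iterator. It is not the course's TreeIterator.java: the class InorderIteratorSketch, its nested Node class, and the int values are assumptions made for illustration. As described above, the constructor takes the root of the tree as its parameter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

class InorderIteratorSketch implements Iterator<Integer> {
    static class Node {
        int value;
        Node left, right;

        Node (int value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }
    }

    // nodes whose left subtree is being (or has been) visited but which have
    // not been returned yet; the top of the stack is the next node to return
    private final Deque<Node> stack = new ArrayDeque<>();

    InorderIteratorSketch (Node root) {
        pushLeftSpine (root);
    }

    /* pushes node, its left child, that child's left child, and so on */
    private void pushLeftSpine (Node node) {
        while (node != null) {
            stack.push (node);
            node = node.left;
        }
    }

    @Override
    public boolean hasNext () {
        return !stack.isEmpty ();
    }

    @Override
    public Integer next () {
        if (stack.isEmpty ()) {
            throw new NoSuchElementException ();
        }
        Node node = stack.pop ();
        pushLeftSpine (node.right);    // the right subtree comes after this node
        return node.value;
    }
}
```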

Efficiency of binary search tree operations

• most operations have time O(height of the tree)
• in a balanced tree, the height is O(log N), with N the number of nodes
• in-class exercise: why is this true?
• in an unbalanced tree, the height is O(N)
• adding and deleting random data may or may not leave the tree balanced
• adding data in sorted order is guaranteed to give the worst possible result: every node has only one child, so the tree looks like a linked list
• traversing an entire tree, of course, is O(N)
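
The effect of insertion order on height can be observed with a small experiment. The sketch below uses a hypothetical HeightSketch class with plain (non-rebalancing) insertion, and defines the height of an empty tree as -1 so that a single node has height 0, matching the convention that the root has depth 0.

```java
class HeightSketch {
    static class Node {
        int key;
        Node left, right;

        Node (int key) {
            this.key = key;
        }
    }

    /* plain binary search tree insertion, with no rebalancing */
    static Node add (Node node, int key) {
        if (node == null) return new Node (key);
        if (key < node.key) node.left = add (node.left, key);
        else if (key > node.key) node.right = add (node.right, key);
        return node;
    }

    /* adds the keys in the given order to an initially empty tree */
    static Node build (int [] keys) {
        Node root = null;
        for (int key : keys) {
            root = add (root, key);
        }
        return root;
    }

    /* maximum depth of any node; -1 for an empty tree */
    static int height (Node node) {
        if (node == null) return -1;
        return 1 + Math.max (height (node.left), height (node.right));
    }
}
```

Adding 1, 2, 3, 4, 5, 6, 7 in sorted order gives a tree of height 6 (N - 1), while adding the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives a tree of height 2.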

Databases and search keys

• databases (e.g. shopping lists, address lists, payroll information) often have one or more identifying items, called keys
• depending on the database, each key may be unique, or each set of keys may be unique
• binary search trees can be used to store objects that have keys
• the compareTo() method should compare just the keys
• the search key uniquely identifies the item (e.g. Social Security Number in IRS records)
• the value of the item can be changed without changing the location of the item
• if the search key is modified in place, the binary search tree is no longer a search tree
• the word database may refer to the collection of items, to the collection of items stored on disk, or to the program (database program) that accesses the collection