In this assignment, you write a class similar to BinarySearchTree.java but which tracks keys and values separately. Since this class tracks the number of occurrences of words in a document, it is called BinaryStringTree. You then use this class to analyze actual text files.
In addition to studying binary search trees in depth, this assignment requires you to review some of the material we have studied earlier in the semester, including reading files and looking at individual substrings in strings.
The instructor's BinarySearchTree.java is parametrized on a comparable type, conventionally named T. Your BinaryStringTree (template here) is not parametrized, and instead, each BinaryNode in your implementation must have a key of type String and a value of type long.
To complete the implementation of BinaryStringTree you must provide at least:
This assignment requires that occurrences and keys be implemented recursively. You do this by creating and using a recursive helper method that does the work.
In the case of occurrences, the helper method must do a search in the binary search tree and return the corresponding value, or 0 if the key is not in the tree.
For keys, the helper method must take the set as parameter and add to the set all the keys found when visiting every node of the tree.
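The two recursive helpers described above might be sketched as follows. This is only an illustration under assumptions: the class name RecursiveHelpersSketch, the helper name collectKeys, the node field names, and the choice of TreeSet for the returned set are all hypothetical and need not match the template.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; names are assumptions, not the course template.
class RecursiveHelpersSketch {
    // Minimal node: a String key and a long value, as the assignment requires.
    static class BinaryNode {
        String key;
        long value;
        BinaryNode left, right;

        BinaryNode(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    BinaryNode root;

    // The public method only delegates to the recursive helper.
    long occurrences(String key) {
        return occurrences(root, key);
    }

    // Recursive search: returns the stored value, or 0 if the key is absent.
    private long occurrences(BinaryNode node, String key) {
        if (node == null) {
            return 0; // fell off the tree: the key is not present
        }
        int cmp = key.compareTo(node.key);
        if (cmp == 0) {
            return node.value;
        }
        return occurrences(cmp < 0 ? node.left : node.right, key);
    }

    // The public method creates the set and passes it to the helper.
    Set<String> keys() {
        Set<String> result = new TreeSet<>();
        collectKeys(node(), result);
        return result;
    }

    private BinaryNode node() {
        return root;
    }

    // Visits every node and adds each key to the set passed as a parameter.
    private void collectKeys(BinaryNode node, Set<String> result) {
        if (node == null) {
            return;
        }
        collectKeys(node.left, result);
        result.add(node.key);
        collectKeys(node.right, result);
    }
}
```

Note that occurrences follows only one branch at each node, whereas the keys helper must visit both subtrees.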
For the other methods, you may use recursion or not, as you prefer.
A key is considered the same if .equals declares it to be the same. This means comparisons are case-sensitive, and "hello" is different from "Hello".
add increments the value if the key is already in the tree; otherwise, it adds a new node for the key with a value of 1. Since this is a binary search tree, the new node will be to the left of an existing node if the key being added is less than the existing key, and to the right if it is greater than the existing key. There is a special case when adding to an empty tree.
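One possible shape for add is sketched below, here written recursively (an iterative version is equally acceptable). The class name AddSketch and the field names are hypothetical. Returning the subtree root from the helper is one way to handle the empty-tree special case without separate code.

```java
// Hypothetical sketch; names are assumptions, not the course template.
class AddSketch {
    static class Node {
        String key;
        long value = 1; // a newly created node starts with a value of 1
        Node left, right;

        Node(String key) {
            this.key = key;
        }
    }

    Node root;

    void add(String key) {
        // If the tree is empty, the helper returns the new node,
        // which becomes the root.
        root = add(root, key);
    }

    // Returns the root of the subtree after the key has been added.
    private Node add(Node node, String key) {
        if (node == null) {
            return new Node(key); // empty tree or empty subtree
        }
        int cmp = key.compareTo(node.key);
        if (cmp < 0) {
            node.left = add(node.left, key);   // smaller keys go left
        } else if (cmp > 0) {
            node.right = add(node.right, key); // larger keys go right
        } else {
            node.value++;                      // key already present
        }
        return node;
    }
}
```

Because String.compareTo is case-sensitive, adding "Hello" and then "hello" creates two separate nodes.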
removeOne:
To read the text file and separate it into words (in the constructor), I suggest using a Scanner constructed on a File object that opens a named file. Each word can then be read from the Scanner by calling the Scanner's next() method. Since such words might contain non-alphabetic characters, test each character in the word with Character.isJavaIdentifierStart, and append it to a StringBuilder object only if it is alphabetic. At the end, get the String from the StringBuilder and, if it is not the empty string, add it to the tree. So for example "x,y.z" is added as "xyz", "item1" is added as "item", and "2345" is not added at all.
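The word-cleaning step described above could look roughly like this. The class name WordCleanerSketch and the method names clean and words are hypothetical. So that it runs without a file, the example builds the Scanner on a String; in your constructor the Scanner would instead be built on a File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical sketch; names are assumptions, not the course template.
class WordCleanerSketch {
    // Keeps only the characters accepted by Character.isJavaIdentifierStart.
    static String clean(String word) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            if (Character.isJavaIdentifierStart(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reads whitespace-separated words and keeps the non-empty cleaned ones.
    // In the assignment, each cleaned word would be added to the tree
    // instead of to a list.
    static List<String> words(Scanner in) {
        List<String> result = new ArrayList<>();
        while (in.hasNext()) {
            String cleaned = clean(in.next());
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    }
}
```

With the input "x,y.z item1 2345", words returns a list containing "xyz" and "item". Be aware that Character.isJavaIdentifierStart also accepts a few non-letters, such as '_' and '$', so those characters are kept as well.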
Create a class TestBST with a main method to test the methods from your BinaryStringTree class.
Your main method must read every file it is given on the command line, and for each print the 10 most frequent words. You must test your method on at least two text files: a copy of the U.S. constitution, and a file of your choice. This copy of the constitution has 1265 unique words (counting upper and lowercase as distinct), and the 10 most frequent words are:
To print the most frequent words, call keys to obtain the keys, save them in an array or any kind of list, then sort them with java.util.Arrays.sort. To call this sort method, you will have to provide a compare method that calls occurrences to find out which string has more or fewer (or the same) occurrences.
In order for compare to call occurrences, your Comparator class must either be declared internally to the BinaryStringTree class, or its constructor must take as parameter the tree of strings. This is somewhat similar to an external iterator.
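Here is a sketch of the second option, a Comparator whose constructor receives what it needs to look up counts. To keep the example self-contained, a ToLongFunction<String> stands in for the tree's occurrences method; in your code the constructor would take the BinaryStringTree itself. The class name FrequencyComparatorSketch, the method mostFrequent, and the alphabetical tie-break are all assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Set;
import java.util.function.ToLongFunction;

// Hypothetical sketch; names are assumptions, not the course template.
class FrequencyComparatorSketch implements Comparator<String> {
    // Stand-in for tree.occurrences(key).
    private final ToLongFunction<String> occurrences;

    FrequencyComparatorSketch(ToLongFunction<String> occurrences) {
        this.occurrences = occurrences;
    }

    @Override
    public int compare(String a, String b) {
        // Arguments are swapped so that larger counts sort first.
        int byCount = Long.compare(occurrences.applyAsLong(b),
                                   occurrences.applyAsLong(a));
        // Assumption: ties are broken alphabetically.
        return byCount != 0 ? byCount : a.compareTo(b);
    }

    // Copies the keys into an array, sorts it, and keeps at most limit entries.
    static String[] mostFrequent(Set<String> keys,
                                 ToLongFunction<String> occurrences,
                                 int limit) {
        String[] sorted = keys.toArray(new String[0]);
        Arrays.sort(sorted, new FrequencyComparatorSketch(occurrences));
        return Arrays.copyOf(sorted, Math.min(limit, sorted.length));
    }
}
```

With a tree, a call might look like mostFrequent(tree.keys(), tree::occurrences, 10), assuming keys returns a Set<String>.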
Report the height of your tree for the constitution file, and compare that height to the minimum possible height of a perfectly balanced tree, which would have a height of ceiling(log2(n)). For the constitution's 1265 unique words, ceiling(log2(n)) = 11 (since 2^10 = 1024 and 2^11 = 2048).
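If you would like your program to print the minimum height next to the actual height, the ceiling of the base-2 logarithm can be computed with integer arithmetic, as in this small sketch. The class name MinHeightSketch and the method name ceilLog2 are hypothetical.

```java
// Hypothetical sketch; names are assumptions, not the course template.
class MinHeightSketch {
    // Returns the smallest h such that 2^h >= n, for n >= 1.
    static int ceilLog2(long n) {
        int h = 0;
        long capacity = 1; // always equal to 2^h
        while (capacity < n) {
            capacity *= 2;
            h++;
        }
        return h;
    }
}
```

For n = 1265 this returns 11, because 1024 is less than 1265 and 2048 is not.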
Since we have done nothing to balance the tree, it is unlikely that your tree will be balanced, and you should expect significantly greater heights than the minimum. If, on the other hand, you get a height less than 11, it is an indication of bugs in the code.
In addition, convincingly argue that your code maintains the invariant that all values in the tree are 1 or more, and that your constructor establishes this invariant. You may do this by writing your own checkInvariants method and calling it at the end of each constructor and at the beginning and end of each public method, or you may do this by providing a written explanation.
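If you choose the checkInvariants route, a recursive check could be sketched as follows. The class name InvariantSketch, the helper allValuesPositive, and the choice of IllegalStateException are assumptions. In your class the method would be an instance method that starts from the tree's own root.

```java
// Hypothetical sketch; names are assumptions, not the course template.
class InvariantSketch {
    static class Node {
        String key;
        long value;
        Node left, right;

        Node(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // True if every node in the subtree has a value of 1 or more.
    // An empty subtree trivially satisfies the invariant.
    static boolean allValuesPositive(Node node) {
        if (node == null) {
            return true;
        }
        return node.value >= 1
                && allValuesPositive(node.left)
                && allValuesPositive(node.right);
    }

    // Throws if the invariant is violated anywhere in the tree.
    static void checkInvariants(Node root) {
        if (!allValuesPositive(root)) {
            throw new IllegalStateException("found a value less than 1");
        }
    }
}
```

A check like this does not replace the argument, but a failure tells you which method broke the invariant when it is called at the beginning and end of each public method.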
Use Laulima to turn in all your source files. Once you log into Laulima and select the ICS 211 site, on the left-hand side will be an assignments tab. Be sure you are providing all the files needed to compile your code.