Radix Sort Algorithm

Importance of Sorting Algorithms

Searching for a particular item in a collection of data is a very common occurrence for computers. One looks for a name in a phone listing, a specific word in a dictionary and comes across many similar situations when identification of a particular item in a collection is required. If one is faced with a large collection of data, the use of sequential search, which examines items one by one, is going to take a great amount of time and effort. In order to utilize a more efficient search algorithm, such as a binary search, the collection of data that is to be checked for an occurrence of a particular item, would have to be sorted.

Sorting is defined as a process of arranging a list of entries into either alphabetical or numerical order. In addition, the items can be sorted in either ascending or descending order. There are many sorting algorithms in existence nowadays and this report examines in detail one of the most popular and commonly used among them the radix sort algorithm.

Origins of Radix Sort Algorithm

Unlike some other sorting algorithms like Shell and Quicksort, there is no clear indication of a person responsible for the invention of the radix sort algorithm and the specific date of its creation remains unknown. However, the emergence of the radix sort is closely associated with the mechanical sorting of data containing punch cards that were distributed into buckets during a single pass through the machine card sorter. http://www.cs.rutgers.edu/~chvatal/notes/rsort.html

The first tabular machine card sorter was invented by Herman Hollerith (1860-1929), and was initially used in U.S. census in 1890. The project was a complete success: The population count was completed in mere six months, the whole census finished in two years and the cost for the whole project was five million dollars below the estimated. This event is considered to be the first implementation of the radix sort. Herman Hollerith continued his career as the professor of mechanical engineering at MIT until 1896, when he founded a Tabulating Machine Company, predecessor of Computer Tabulating Recording Company (CTR). He was a consulting engineer with CTR until retiring in 1921. In 1924 CTR changed its name to IBM- the International Business Machines Corporation.http://museum.nist.gov/panels/conveyor/hollerithbio.htm

The Basic Principle of Radix Sorting

Unlike most of the other sorting methods that treat numbers to be sorted as whole, the radix sorting algorithm works with individual digits of the numbers, or in the case of strings, with individual alphabetical characters. There are two essentially inverse types of Radix sorting:

1)MSD the most significant digit radix sort, and

2)LSD the least significant digit radix sort.

The MSD examines the most significant digit first, traversing the items in a left to right order, while the LSD works in the exact opposite manner, examining the least significant digit first and working from right to left. The following is a simple demonstration of both the MSD and the LSD radix sorting:

LSD -- least significant digit sorting:

Take the least significant digit (byte)of each key;

Sort the collection of items according to that digit, but keep the order of items with the same digit;

Repeat the sort with each more significant digit;

Example: 250, 041, 025, 080, 002, 033, 701, 077;

Sorting by least significant digit (1s place) gives:

250,080,041,701,002,033,025,077

Sorting by next digit (10s place) gives:

701,002,025,033,041,250,077,080

Sorting by most significant digit(100s place) gives:

002,025,033,041,077,080,250,701;

MSD most significant digit sorting:

MSD is usually written recursively and it goes as follows:

Take the most significant digit (byte)of each key;

Sort the collection of items based on that digit, grouping items with the same digit into one bucket;

Recursively sort each bucket, starting with the next most significant digit;

Arrange the buckets together in order;

The MSD sort is most commonly illustrated by the example of sorting the letters into piles in the post office according to the first digit of a five-digit zip code.

In addition, due to attempts to make the radix sorting as efficient as possible new kinds of radix sorting algorithms were developed: Binary Quicksort and Three-Way radix Quicksort are just two among many variations that are in use today.

Binary Quicksort is also known as the radix exchange sort. It sorts data by making two groups, one whose keys start with 1s and the other in which the keys start with 0s.it proceeds by sorting the files independently. It scans the collection from left to right until it encounters a 1, and from right to left until it finds a 0,which results in exchange of those two keys. The process is completed when the two pointers cross.(Sedgewick p.423)

The three-way radix Quicksort is a combination of an ordinary Quicksort and the MSD radix sort. In essence, doing three way radix Quicksort amounts to sorting the file on the leading characters of the keys (using Quicksort), then applying the method recursively on the remainder of the keys.(Sedgewick p.435)

Radix Sort Algorithm Java Source Code

/**

* This code is from the book:

*

* Winder, R and Roberts, G (1998)Developing Java

*

Software John Wiley & Sons.

*It is copyright (c) 1997 Russel Winder and Graham Roberts.

*/

package ADS ;

import java.util.Vector ;

/**

* Sort an array of int using Radix Sort (also has been

* referred to as Digit Sort). This is an O(n) sort.

*

* @version 1.0 19.5.97

* @author Russel Winder

*/

public final class RadixSort

{

/**

* The per object sort operation.

*

* @param v the array of int to be sorted.

*

* @param n the number of digits in each sort key.

*/

public final void sort(final int[] v, final int n)

{

execute(v, n) ;

}

/**

* The statically accessible sort operation.

*

* @param v the array of int to be sorted.

*

* @param n the number of digits in each sort key.

*/

public static void execute(final int[] v, final int n)

{

//

// Set up the numberOfPossibleDigits (10) pigeonholes into

// which we will put numbers. Have numberOfPossibleDigits

// array of Vectors.

//

Vector[] holes = new Vector[numberOfPossibleDigits] ;

for (int i = 0 ; i < numberOfPossibleDigits ; i++)

{

holes[i] = new Vector() ;

}

//

// For each digit in the sort key do a pigeonhole stuffing

// exercise.

//

for (int i = 0 ; i < n ; i++)

{

int divisor = (int)Math.pow(numberOfPossibleDigits, i) ;

//

// For each number to be sorted.

//

for (int j = 0 ; j < v.length ; j++)

{

//

// Put the number in a pigeonhole determined by the

// value of the digit in the slot we are currently

// sorting on.

//

//holes[(v[j] / divisor) % numberOfPossibleDigits].

// addElement(new Integer(v[j])) ;

}

//

// Merge all the entries in the pigeonholes in order int

// a single list ready for the next pass. Reset the

// pigeonholes ready for the next pass as well.

//

int index = 0 ;

for (int j = 0 ; j < numberOfPossibleDigits ; j++)

{

for (int k = 0 ; k < holes[j].size() ; k++)

{

v[index++] = ((Integer)(holes[j].elementAt(k))).intValue();

}

holes[j].removeAllElements() ;

}

}

}

/**

* A partially gratuitous constant which is here to move

* "magic" constants in the code. Hopefully this makes things

* more comprehensible, certainly it will make it easier to

* change. Of course, the number will only change if someone

* changes the Arabic number system!

*/

//

// NB This has to be static in order for the static method to be

// able to access it.

//

public final static int numberOfPossibleDigits = 10 ;

}

Big (O) Analysis and Performance of a Radix Sort Algorithm

As previously stated, a radix sort algorithm manipulates only the constituents of an item, and never deals with an item as a whole. In the commonly accepted computing phraseology the items are called keys which can be decomposed in a sequence of fixed sized pieces or bytes(R.Sedgewick. p.445)

The radix sort algorithm is of O(2(n)(d)) order of magnitude whereas n is a number of keys to be sorted and d represents the length of a bytes that constitute the key. The algorithm needs to make n moves in order to form groups of keys using either the MSD or the LSD method and then another pass requiring n moves in order to combine the groups into one resulting sequence again. It needs to be pointed out that in case that all the keys are consisted of an equal number of bytes, there is no need for comparison. In short, the radix sort algorithms are of a linear order of magnitude and can be optimized to work as O(n);

The worst case scenario for the radie sorting is when it needs to examine all the bytes in all the keys. (Sedgewick p.444) This observation follows from the fact that time taken is at most proportional to the number of digits in the input. Thus, the worst case is achieved when all the keys are equal.

Comparison and Contrast of the Radix Sort Algorithm and Other available Algorithms

Advantages

The radix sort, as efficiency is concerned, is generally superior to other sorting algorithms such as the Selection Sort, Insertion Sort, bubble Sort, Quicksort, Treesort, to mention just a few. Reasoning behind this bold statement can be explained by the mere comparison of the orders of magnitude. Whereas the radix sort is linear, all the above mentioned sorting algorithms have a O(nư) order of magnitude.

Another advantage is that not exchange items, except in the case of the radix exchange sort and that the radix sort efficiency does not depend on the size of the input. With the later statement one needs to be careful because it applies only in the case when the number are not extremely long and when the radix is not larger than the file. Some other method should be used for smaller files.(Sedgewick 445). In short it does not matter how large the number of keys to be sorted, as long as the number of bytes that consist the keys is not too large.

Another good point that needs to be mentioned about the radix sort is that is very straightforward and easily explained in the classroom. Moreover it is quite different from all other sorting algorithms and therefore easily remembered. For me personally although the selection, bubble and insertion sort are called simple sorting algorithms it was strenuous to distinguish between the three, whereas there was no confusion about the working principle of the radix sort.

Disadvantages

Despite its being known as on of the fastest performing algorithms, the performance can be slowed down significantly if the bytes are very long or not equal in length. In the latter case, additional test statements are necessary to be added to the program, which in return increase the number of the operations needed and make the program less efficient. Another argument that arises in disfavor of the radix sort is that it was created to deal with digits and characters and is unpractical for dealing with any other type of data. Therefore it is hard to create a general-purpose radix sort algorithm, like the omni-purposeful Quicksort algorithm.

I hope that in this report I was able to satisfactorily present the advantages, disadvantages and the importance of the radix-sorting algorithm.

Bibliography:

Sedgewick, Robert. Algorithms and Data Structure in C++. Third edition. Princeton Press: 2000.

Carrano, F. & Prichard Janet J. Data abstraction and Problem Solving with JAVA;Walls and Mirrors. First Edition. USA: 2001.

http://www.umiacs.umd.edu/research/EXPAR/papers/3548/node9.html

http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Radix.html

http://www.aihorizon.com/essays/basiccs/lists/sorting/radix.htm

http://www.wikipedia.com/wiki/Radix_sort&action=edit

http://www.codercorner.com/RadixSortRevisited.htm

http://museum.nist.gov/panels/conveyor/hollerithbio.htm

http://arianna.dei.unipd.it/corsi/FondInf/source/ADS/

.