To search is nothing in painting. To find is everything. -- Picasso
The metaphor of the Internet is a World Wide Web (WWW) inhabited by spiders. The growth of the Web is a living, real-time example of Darwinian evolution. The spiders stored away in alcohol-filled half-liter bottles in museums were born before man; the spiders of the Internet were born in our lifetime. The Net is evolving so rapidly that the time between generations is almost as short as that of mosquitoes. What a beautiful specimen the Internet is!
We are all familiar with the famous taxonomy from biology (1996 World Almanac, p. 192):
Kingdom, Phylum, Class, Order, Family, Genus, Species
("Kinetic Purple Cats Over Fence Gates Soar")
"Kingdom" originally consisted of plants and animals. Bacteria complicated the picture, and by now there are anywhere from 8 to 30 kingdoms. This classification scheme has itself evolved into cladistics and cladograms -- words which were born near the time of the birth of the Internet (both date back to 1966). The origin of the science of cladistics dates back to a World War II prisoner of war held by the British in Italy -- Willi Hennig (Sue Hubbell, "How taxonomy helps us make sense out of the natural world"; Smithsonian, May 1996, pp. 140-151).
Cladistics is a biological classification scheme based on phylogenetic (evolutionary) relationships. A cladogram is a branching tree which diagrammatically displays similarities, differences and patterns of change over time. It is rather similar to the time-series cross-sectional analysis of economics. Phylogeny is the evolutionary history of a group, whereas ontogeny is the developmental history of the individual. The two are related in the saying "ontogeny recapitulates phylogeny" (much criticized, however; see, for example, Stephen Jay Gould, "Freud's Phylogenetic Fantasy: Only great thinkers are allowed to fail greatly"; Natural History, December 1987, pp. 10, 14, 16, 18, 19).
Jonathan Coddington, chair of the entomology department and curator of spiders at the Smithsonian's National Museum of Natural History, believes that "Good taxonomy...has predictive value." (Sue Hubbell, op.cit., p. 143).
He has what he calls "The matrix" -- an arachnoid data table which displays 354 characteristics along one axis and 139 genera of spiders along the other axis. That is 354 x 139 = 49,206 cells, roughly 49,000, which can be analyzed digitally to form family trees ("cladograms").
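To make the idea of a digitally analyzable matrix concrete, here is a minimal sketch in Python. The three genera and their four characteristics are invented for illustration and are not taken from Coddington's data. The comparison shown is a simple count of differing characteristics, which only hints at relatedness; real cladistic analysis uses more careful methods to build its trees.

```python
# Hypothetical toy character matrix: rows are genera, columns are
# characteristics (1 = present, 0 = absent). All names and values are invented.
MATRIX = {
    "GenusA": [1, 1, 0, 1],
    "GenusB": [1, 1, 0, 0],
    "GenusC": [0, 0, 1, 0],
}


def cell_count(n_characteristics, n_genera):
    """Size of a matrix with one cell per (characteristic, genus) pair."""
    return n_characteristics * n_genera


def differences(a, b):
    """Number of characteristics on which two genera disagree."""
    return sum(x != y for x, y in zip(a, b))


def closest_pair(matrix):
    """Return (difference count, genus, genus) for the most similar pair."""
    names = sorted(matrix)
    pairs = [
        (differences(matrix[p], matrix[q]), p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return min(pairs)


print(cell_count(354, 139))   # 49206
print(closest_pair(MATRIX))   # (1, 'GenusA', 'GenusB')
```

In this toy table GenusA and GenusB disagree on only one characteristic, so they would be the first candidates to share a branch.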
Which brings us to the family tree of an important segment (kingdom?) of the Web -- the Internet Search Engines. The spiders of the World Wide Web are computer programs that travel the Internet, visiting sites, retrieving documents, and retrieving all documents mentioned in those documents. The process continues until every document that is cited by another document has been discovered. These documents are gathered into an indexed database. A search engine is a program that searches such a database for the keywords or phrases that the documents contain (Richard Peterson, "Harvesting Information from the Internet Using Search Engines," April 1996. On-line. Available at http://www2.hawaii.edu/~rpeterso/harvest9.htm).
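The spider-and-index process described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the design of any actual search engine: the "web" here is a small invented in-memory dictionary (the page names, texts and links are all hypothetical), and the index is a plain word-to-pages lookup.

```python
from collections import deque

# Hypothetical in-memory "web": page name -> (text, pages it mentions).
PAGES = {
    "a.example/home": ("spiders crawl the web",
                       ["a.example/about", "b.example/index"]),
    "a.example/about": ("about web spiders", ["a.example/home"]),
    "b.example/index": ("search engines index documents",
                        ["b.example/missing"]),
    "c.example/unlinked": ("nobody cites this page", []),
}


def crawl(start, fetch):
    """Visit a page, then every page it mentions, until none are left."""
    seen = {start}
    queue = deque([start])
    documents = {}
    while queue:
        url = queue.popleft()
        page = fetch(url)
        if page is None:          # dead link: nothing to retrieve
            continue
        text, links = page
        documents[url] = text
        for link in links:
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                queue.append(link)
    return documents


def build_index(documents):
    """Map each word to the set of pages whose text contains it."""
    index = {}
    for url, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(url)
    return index


def search(index, keyword):
    """Look a keyword up in the index; returns page names in sorted order."""
    return sorted(index.get(keyword.lower(), set()))


documents = crawl("a.example/home", PAGES.get)
index = build_index(documents)
print(search(index, "spiders"))  # ['a.example/about', 'a.example/home']
print(search(index, "nobody"))   # []
```

Note that the page no other page cites is never found: the spider can discover only what some already-visited document mentions, which is exactly the limit stated in the description.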
In my earlier article ("Internet Search Engines," March 1996. On-line. Available at http://www2.hawaii.edu/~rpeterso/engine_.htm), I divided search engines into four basic categories:
Within the four-fold classification scheme itself, however, there are time-dimensioned cladograms available for examining the cross-section of search engines, however defined. As a simple example, consider just the six primary search engines as comprising the rows of the matrix. There could be over a hundred columns delineating their various characteristics such as
The title of the table itself is up for grabs. In what I have just presented, it was "Primary Search Engines." But that is just a subset of Search Engines -- both primary and niche. There is, then, a family of search engines.
The Internet, however, consists of many more things than just search engines. The Internet has many kingdoms. It has many spiders, ants and worms. It has many human beings in many countries. It has directories, which are like the tables of contents of millions of documents. It has search engines, which are like indexes to all the documents. It is a wild and woolly world which needs a map. It needs, of course, a family of maps. It needs the clarity of a classification scheme. It does not need to wrestle for centuries for a model to use, as biology did. Biology's current standard is the science of cladistics. Hence, the "biology of the Internet."