ELECTRONIC TERM LIST (ETL)
A PROPOSED TOOL FOR CONCEPTUAL AND TERMINOLOGICAL ANALYSIS
By Fred W. Riggs
This is a confidential draft -- readers are invited to send comments and suggestions to the author. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the author. Copyright by Fred W. Riggs. Patent Pending.
ABSTRACT: The systematic understanding and use of concepts and terms in any field of discourse can be enhanced by the use of a new kind of computerized tool called an Electronic Term List (ETL). This tool (there can be many lists) will enable writers to recognize relevant terms by having them highlighted, and then clicking on them, as desired, to find definitions and texts in which they are used. An augmented version of the ETL will enable users to open ordinary dictionary definitions of the highlighted term (if there is one) in order to compare lexiconized senses of a word with the glossed text of a concept definition. An interactive version of the ETL will enable users to input new concepts and terms. Shared use of the tool by members of a user community will support evaluation, revision, consensus building and utilization of conceptual and terminological innovations. The philosophical rationale for developing and using an ETL is provided at: http://www2.hawaii.edu/~fredr/nomen.htm. However, the methods proposed here are quite different from those originally outlined in paragraphs 3 and 4 of this text.
CONTENTS:
1. Terms and Words: the basic distinction
2. Term Recognition: the marking of terms included in an ETL
3. List Generation: the production of entries in an ETL
4. Types of Lists: product differentiation and different ETL functions
5. User Communities: different levels and groups to use an ETL
6. Management of Lists: the production and distribution of ETLs
ANNEX: A LEXICAL THESAURUS
PRE-TEXT: When Alice reversed a text by viewing it through her mirror, she found this puzzling piece:
Readers can find the full text on screen by going to: http://encyclopediaoftheself.com/classic_books_online/lglass18.htm and then using the FIND command to look for "JABBER". Here is how Alice reacted: `It seems very pretty,' she said when she had finished it, `but it's RATHER hard to understand!' (You see she didn't like to confess, even to herself, that she couldn't make it out at all.)
The underlined words in this text are hyperlinks to definitions in the Encyclopedia of the Self. Readers can find the definition of understand in this encyclopedia by clicking on: http://www.selfknowledge.com/102768.htm. Background suggestions about how to understand definitions and relate them to one's self can be found at: http://www.selfknowledge.com/insdic.htm In case this seems too esoteric, think about how it relates to our ability to understand the meanings of words as they are used by scholars and specialists in computer programming.
You will find these examples and many more at: http://users.erols.com/amato1/AC/Reg.acr.html. How can we understand and learn to use such expressions more expeditiously? This paper outlines a methodology and computer program designed to help us find answers.
#1. Terms and Words
The logic of listing terms hinges on a basic distinction between terms and words. Since this distinction is not well understood and our vocabulary blocks understanding, some preliminary comments about the subject are necessary. Words as linguistic units are entered in dictionaries as entry words. Such words normally have a number of meanings - they can represent more than one concept. By contrast, a term is a word that represents only one concept. A word, therefore, may be viewed as containing several terms.
#1A. Terms
A big stumbling block stems from the fact that we typically equate terms with words. Here I propose a fundamental distinction: whereas words can and usually do represent more than one concept, a term, by definition, represents only one concept. Because this point is not well understood, it often happens that one word (term-form) contains two or more terms, a point to be explained below (at #2C) in more detail.
To clarify the word/term distinction, it is useful to introduce the notion of a form. Both words and terms have lexical forms - they are the visual patterns used to represent them. A spell checker on one's word processor (Microsoft WORD, or WordPerfect, for example) focuses on lexical forms, helping us spell them correctly. A spell checker is based on a word list - such a list contains all the forms we use to write a word. Interestingly, some words are homographs - two different words spelled the same way. Lexicographers write them as two different dictionary entries - see, for example,
The first link defines fast(1) meaning rapid movement and the second lists fast(2) as referring to the act of not eating. The dictionary recognizes one form (fast) as two words, and accordingly gives it two entries. Each word, of course, can represent several concepts. If you go again to fast(1), you will see that this word can represent 14 different concepts when used as an adjective, and 7 more when used as an adverb - fast(2) also has several meanings, but not many.
Extending this information to the identification of terms, we can say that the first word (fast(1)) contains 21 terms - each of its senses constitutes a term. The first sense, for example, is defined as acting or moving rapidly: acting, functioning, or moving quickly, or capable of doing this: a fast car.
If we were creating a term list, we would want to have 21 term entries for fast(1). Unfortunately, this point is not understood by those who compile glossaries, the special language dictionaries that focus on terms. They normally provide separate entries for each term-form regardless of the number of terms it houses. This contributes to confused thinking because it tends to fuzz up important conceptual distinctions. To illustrate this point, please go to the Computing Dictionary http://www.instantweb.com/D/dictionary and type mirror in the query window. You will then find a single entry with the following two definitions:
Of course, mirror is a word with several senses - six are reported in the Encarta Dictionary, not including the two identified above in computing language. (NOTE: All lexical definitions cited in this paper will be from this dictionary - other electronic dictionaries are available, but it is convenient to rely here on just one of them.) I believe the status of these terms would be more clearly recognized if they were presented in two separate term-entries, one for the storage sense and a different entry for the networking sense. Each would then constitute a single term.
The logic of this proposal will become more apparent if we introduce term-form to refer to the form used to represent a concept. In the example of mirror, one term-form is used to represent two terms. Each term consists of a term-form followed by a concept description. I use description in preference to definition because, unlike a dictionary entry, this is not an explanation of what a word means. Rather, it is an independently described concept represented by a deliberately selected term-form. In this case, two concepts were identified, but the same term-form was appropriated for both of them. Because the lexicographic format was borrowed in the design of the Computing Dictionary, the misleading format of a dictionary entry was adopted.
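The structure proposed above, one term-form followed by one concept description per term, can be sketched as a small data structure. This is only an illustrative sketch, not a specification of the ETL: the names Term, TermList, add and terms_for are hypothetical, and the two descriptions of mirror are placeholder paraphrases of the storage and networking senses, not the wording of the Computing Dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """One term: a term-form paired with exactly one concept description."""
    term_form: str
    description: str


class TermList:
    """A term list holding one entry per concept, not one per term-form."""

    def __init__(self):
        # Maps a lower-cased term-form to every term that shares it.
        self._by_form = defaultdict(list)

    def add(self, term_form, description):
        """Add a separate term-entry, even if the term-form is already used."""
        term = Term(term_form, description)
        self._by_form[term_form.lower()].append(term)
        return term

    def terms_for(self, term_form):
        """Return all terms housed by one term-form (empty if unlisted)."""
        return list(self._by_form.get(term_form.lower(), []))


# Placeholder descriptions (assumptions), one entry per concept.
computing = TermList()
computing.add("mirror", "storage sense: keeping a duplicate copy of stored data")
computing.add("mirror", "networking sense: a duplicate set of files kept on a network")

for term in computing.terms_for("mirror"):
    print(term.term_form, "->", term.description)
```

Keeping a list of terms under each term-form means that the two senses of mirror stay distinct entries, while a lookup by form still shows that the form is shared.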
#1B. Words
The basic difference between terms and words may be easier to understand if we make a sharp distinction between two types of dictionaries. The definition of dictionary found in the Encarta Electronic Dictionary - see: http://dictionary.msn.com/find/entry.asp?search=dictionary - contains six different meanings for this word. Among them two are relevant here:
We may use two different words to represent these two meanings of dictionary: I shall use lexicon for the first and glossary for the second. Dictionary will be used equivocally for both concepts when it is not important to make this distinction. However, for present purposes the lexicon/glossary distinction is basic. Moreover, we can say that information about words presented in a lexicon is called an entry, and the data about terms found in a glossary is referred to as a gloss. One enters words in a lexicon, after which they have been lexiconized. One glosses terms in a glossary, after which they become glosses. This distinction is basic to the distinction between word lists and term lists. To stress this distinction, please remember:
Notice that words in this usage refer to a concept that is both broader and narrower than what we usually mean by this word.
Narrower: All the words in a word list are content words, which means that they represent concepts. A word list must therefore exclude words that do not represent concepts, such as the names of individuals (persons, places, organizations, etc.) and function words, i.e. prepositions, conjunctions, pronouns, and auxiliary verbs. This narrows the definition of the words included in a word list.
Broader: Words in a word list may have a form that requires only one space (as in affixes), two spaces (our usual notion of a word) or more than two spaces (as in phrases). Lexicographers recognize this usage when they speak of an entry word or a head-word. You will find its definition at: http://dictionary.msn.com/find/entry.asp?search=head-word : word or phrase that forms a heading at the start of a text and is usually printed in distinctive type, especially a main entry word in a dictionary. As this definition clearly shows, word is used in the Encarta Dictionary to include phrases. Indeed, it gives the first sense of word as: a meaningful sound or combination of sounds that is a unit of language or its representation in a text. Interestingly, as one may see by going to: http://dictionary.msn.com/find/entry.asp?search=word, Encarta does not identify a sense of word that refers only to a unit of speech separated, in writing, by spaces. Since main entries in a dictionary include many affixes, we must class them as words, even though the Encarta definition fails to mention them.
This broadens our usual understanding of word. Word lists composed of words in this sense are found in the spell checkers and thesauruses used regularly by authors writing with a word processor like Microsoft WORD or WordPerfect. A word list may be viewed as a companion to these familiar tools. It offers a third set of options: instead of correcting the form of a word, or listing synonyms, a word list provides definitions for the listed words.
To summarize:
There are different kinds of terms, however, and we may now identify them in the context of a discussion of the process of recognizing terms by electronic means.
#2. Term Recognition
The main heuristic feature of an ETL is its ability to highlight terms in order to remind writers to look up a definition. Writers normally consult a dictionary only when they are puzzled about the meanings of a word. They do not look for the definition of words that they understand quite well, forgetting that they might have different meanings for their readers. This is especially true when terms take the form of familiar words. The more clearly writers understand the status of the terms they use, the more effective they will be in communicating to the audiences they want to reach. In order to understand how an ETL can help writers recognize the status of terms and take steps to clarify their meanings we need to consider three status differences applicable to all terms.
We can discuss them under these headings:
#2A. Neologisms vs. Lexiconizations
Many terms are lexiconized and may be found in ordinary lexicons, although the word-oriented format of a dictionary entry tends to obscure them. For example, consider gerontology, a term glossed by Jary and Jary (The Harper Collins Dictionary of Sociology) as meaning: the study of aging and of elderly people. Compare this gloss with the definition of gerontology found at http://dictionary.msn.com/find/entry.asp?search=gerontology in the Encarta Electronic Dictionary: the scientific study of aging and its effects. In this instance, we may say that the meaning of this term in Sociology is virtually identical with its lexiconized sense in ordinary English. My guess is that quite a few terms used by sociologists are lexiconizations whose meanings can also be found as senses of words found in lexicons. Having a word list conveniently available for double-checking terms found in a term list will enable writers to determine whether or not they are lexiconizations. If the word list is linked to the term list, the comparison can be made very easily, but even if one has to consult a published or electronic dictionary separately, it is not difficult to make this comparison.
By contrast with lexiconizations, many terms are neologisms, which simply means that their meanings are not given in a dictionary: they are unlexiconized. Consider, for example, the word hermeneutics. First, go to the following dictionary definition in the Encarta Electronic Dictionary at: http://dictionary.msn.com/find/entry.asp?search=hermeneutics. If you click on this link, you will find that hermeneutics is defined only in the sense of interpreting texts, especially in the Bible, or that of explaining religious concepts. However, when one turns to the gloss for hermeneutics in Jary and Jary, one finds the following description: a theory and method of interpreting human action and artifacts.
This is a new concept not included in the Encarta Dictionary. The term, hermeneutics, as used in Sociology is, therefore, a neologism. Since the word neologism is often misunderstood, please turn to its Encarta definition at: http://dictionary.msn.com/find/entry.asp?search=neologism : new word or meaning: a recently coined word or phrase, or a recently extended meaning of an existing word or phrase. Careless readers may think neologisms are always new words (coinages or neoterisms) but, as in this example, they are actually, and no doubt more often, new meanings assigned to old words. To establish the fact that they are neologisms, it is important to be able to compare glosses with relevant dictionary definitions, which is why we need to link a word list with the term list.
The most important function of a term list is, precisely, to identify unlexiconized terms - i.e. terms that are neologisms. Writers can assume that most readers with easy access to a lexicon can determine the intended meanings of words, but they may well not understand the meaning of neologisms. When the term-form of a neologism is a familiar lexiconized word, readers may be truly perplexed - e.g. thinking that a sociologist talking about hermeneutics is referring to the interpretation of biblical texts. An ETL may contain lexiconized terms but they should be marked as such. However, the main emphasis in an ETL will be on neologisms, i.e. unlexiconized terms. Such terms may be metaphors or coinages.
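The highlighting and marking described in this section can be illustrated with a short sketch. It assumes a hypothetical Sociology term list that records, for each term-form, whether the term is lexiconized or a neologism; the bracket and asterisk notation, the function name highlight_terms and the table SOCIOLOGY_TERMS are all invented for illustration, and the two statuses simply follow the gerontology and hermeneutics examples above.

```python
import re

# Hypothetical term list: lower-cased term-form -> status (an assumption).
SOCIOLOGY_TERMS = {
    "gerontology": "lexiconized",
    "hermeneutics": "neologism",
}


def highlight_terms(text, term_status):
    """Bracket every listed term; add an asterisk to unlexiconized ones."""
    if not term_status:
        return text
    # Longest forms first, so a listed phrase wins over a word inside it.
    forms = sorted(term_status, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(form) for form in forms) + r")\b",
        re.IGNORECASE,
    )

    def mark(match):
        word = match.group(0)
        status = term_status[word.lower()]
        return f"[{word}*]" if status == "neologism" else f"[{word}]"

    return pattern.sub(mark, text)


sample = "Hermeneutics and gerontology are both taught here."
print(highlight_terms(sample, SOCIOLOGY_TERMS))
# prints: [Hermeneutics*] and [gerontology] are both taught here.
```

In a real ETL the brackets would presumably be replaced by on-screen highlighting and hyperlinks; the point of the sketch is only that lexiconized terms and neologisms receive visibly different marks.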
#2B. Metaphors vs Coinages (Neoterisms)
The gloss for hermeneutics in Jary and Jary explains that the new usage derives from the term for interpreting biblical texts. All such neologisms derived from existing senses of a word may be thought of as metaphors. Turning to the dictionary definition of this word in Encarta, at http://dictionary.msn.com/find/entry.asp?search=metaphor one finds that the word is defined as: implicit comparison: the application of a word or phrase to somebody or something that is not meant literally but to make a comparison. It may be metaphoric to extend this definition to include the extension of the established senses of a word to support a new meaning - I believe cognitive psychologists have done this. For present purposes, I feel justified in using metaphor to characterize all neologisms that assign new meanings to lexiconized words. Many if not most of the technical terms glossed in dictionaries for Sociology are probably metaphors - they involve the use of familiar words in new senses. Hence metaphors express unlexiconized senses of lexiconized words.
Sometimes the new (unlexiconized) meanings are so similar to established (lexiconized) senses of a word that the metaphoric shift is quite obvious. We readily understand that role as defined in theatrical performances can be applied metaphorically to expected patterns of behavior in social interaction. Actually, this new meaning has already been lexiconized as one can see by consulting a dictionary. However, in many cases the metaphoric usage is not quite so obvious. Consider, for example, the following definition of aggregate found in Gordon Marshall's Dictionary of Sociology: An audience or crowd may be said to be an aggregate in so far as its members lack any organization or persisting pattern of social relationships.
By contrast, if one consults Encarta at: http://dictionary.msn.com/find/entry.asp?search=aggregate one will find that in addition to mineral mixtures or ingredients of concrete, aggregate refers to: sum total: a total or whole made up of different parts from often disparate sources: The political party was an aggregate of many diverse groups. Perhaps the semantic difference here is slight, but one may argue that social relationships persist in a political party, but not in an audience or crowd, and that this difference is theoretically significant. A sociologist talking about aggregates may be thinking about temporary groupings, excluding amorphous but stable aggregations such as those found in political parties. An ETL will prod writers to compare the lexiconized definition with the metaphor before deciding whether, in context, the intended audience will understand that this word is being used in a way that might exclude political parties.
It often happens, no doubt, that the lexical understanding of a word differs only slightly from the metaphoric meaning intended in a technical context. However, metaphors frequently involve so drastic a semantic shift that readers will be truly baffled to understand a word if no explanatory comments are offered. Consider, for example, the word ontology, taken from computing language, as glossed at: http://www.instantweb.com/D/dictionary/foldoc.cgi?query=ontology&action=Search Here one finds that this word has three meanings:
The second and third senses are clearly radical metaphors that require explication if they are to be made intelligible to most audiences. Such radical metaphors merge into the realm of coinages, neologisms that take the form of newly coined words. Sometimes these words masquerade as metaphors. For example, in computing language bit is used, according to the Online Computing Dictionary, http://www.instantweb.com/D/dictionary/foldoc.cgi?query=bit, to refer to the amount of information obtained by asking a yes-or-no question. One might assume that the word is a metaphor of bit as used to mean a small piece of something, but as this Dictionary tells us, it's an acronym for binary digit that evolved over a lunch table [in 1949] as a handier alternative to "bigit" or "binit".
A clearly modern coinage is byte, although the word has already become lexiconized. One may find a definition and comment on its origins at: http://www.instantweb.com/D/dictionary/foldoc.cgi?query=byte. The term is defined as: A component in the machine data hierarchy usually larger than a bit and smaller than a word; now most often eight bits and the smallest addressable unit of storage. Among the various hypotheses about how this term was coined, the easiest to remember is that it is an acronym for Binary Yoked Transfer Element. More importantly, this example illustrates the ease with which newly coined terms can become lexiconized and accepted as words that would appear in our word list.
Whether a term is a coinage or a metaphor is not really important. What is important is to recognize that readers need to understand the meaning of a term whenever it is not familiar or readily determined by consulting an ordinary dictionary. Having the word list as a complement to the term list will enable us to recognize neologisms, whether they be newly coined words, or metaphors whose relevant meanings remain unlexiconized. Lexiconized or not, neologisms may not be understood unless explanatory comments are added to a text. The utility of using a term list will become apparent when it helps writers take pains to make sure that their intended audience clearly understands the relevant meanings of the specialized terms they use.
#2C. Equivocal vs. Unequivocal Terms
To conclude our discussion of problems involved in the recognition of terms, a comment is needed to explain the difference between equivocal and unequivocal terms and its significance for an ETL. As noted above, all terms are in principle unequivocal - they represent one and only one concept. However, in practice it often happens that the same term-form (word) is used to represent more than one concept - we may refer to them as equivocal terms. When an equivocal term is included in a term list, it should be marked so that users will recognize when they respond to a prompt that there is a problem - perhaps a question mark (?) could follow the term.
To illustrate the problem and its solution, consider the definition of mirror given above. In that example, we saw that one concept is identified with hardware and the storage of data, while the other involves the networking of files. In a term list for computing terminology, the entry for mirror might read, do you mean hardware mirror or network mirror? The user would then be given the option of hyperlinking either entry to get further details.
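A minimal sketch of this marking and prompting might look as follows. Everything named here is hypothetical: the table COMPUTING_TERMS, the functions display_form and prompt_for, and the "(?)" notation are illustrative choices, and the descriptions are paraphrases of the senses discussed in this paper rather than quoted glosses.

```python
# Hypothetical computing term list: term-form -> list of (label, description).
COMPUTING_TERMS = {
    "mirror": [
        ("hardware mirror", "concept identified with hardware and the storage of data"),
        ("network mirror", "concept involving the networking of files"),
    ],
    "byte": [
        ("byte", "component in the machine data hierarchy, usually larger than a bit"),
    ],
}


def display_form(term_form, terms=COMPUTING_TERMS):
    """Show the term-form, followed by a question mark if it is equivocal."""
    senses = terms.get(term_form.lower(), [])
    return f"{term_form} (?)" if len(senses) > 1 else term_form


def prompt_for(term_form, terms=COMPUTING_TERMS):
    """Return the text a user would see after clicking on a term-form."""
    senses = terms.get(term_form.lower(), [])
    if not senses:
        return f"'{term_form}' is not in this term list."
    if len(senses) == 1:
        label, description = senses[0]
        return f"{label}: {description}"
    labels = [label for label, _ in senses]
    return "Do you mean " + " or ".join(labels) + "?"


print(display_form("mirror"))  # prints: mirror (?)
print(prompt_for("mirror"))    # prints: Do you mean hardware mirror or network mirror?
```

An unequivocal term such as byte is shown without the question mark and leads straight to its single description, so the extra prompt appears only where a choice is really needed.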
A sociological example illustrates how complex the analysis of equivocal terms can be. Three different sociological glossaries handle class in quite different ways. Jary and Jary provide a clear distinction between six concepts: (1) social stratification; (2) a particular position in a class system; (3) occupational categories; (4) open systems based on social mobility; (5) Marxian concept based on class conflict; and (6) Weberian analysis oriented to life chances. Accepting this scheme, the term list could provide six terms with hyperlinks to relevant definitions and texts.
Other sociology dictionaries adopt different modes of analysis and definition of class. For example, the Penguin Dictionary of Sociology (Abercrombie, Hill and Turner) offers an encyclopedic article, 7 pages in length, that discusses various interpretations and controversies found in the sociological literature on class, including an extended list of related entries. However, they do not underline the 6 concepts identified in the Jary and Jary entry. By contrast, the Oxford University Press Dictionary of Sociology (Gordon Marshall) has no general entry for class, but lists 13 related entries, including class awareness, class consciousness, class imagery, class interest, class position, middle class, underclass, and working class.
The Iverson electronic Sociology Dictionary - see: http://www.iversonsoftware.com/sociology/class.htm - offers this definition: A large category or group of people within a system of social stratification who have a similar socioeconomic status in relation to other socioeconomic segments of the society or community. It does not try to identify the range of concepts represented by class, but it does offer a short discussion of the Marxian mode of analysis of class relations and conflicts.
Let us now view the treatment of class in a general dictionary. Go to Encarta at: http://dictionary.msn.com/find/entry.asp?search=class. The entry for class offered here identifies 14 concepts that can be represented by this word, including three that seem to fit clearly among those defined in the sociological dictionaries. On the basis of this sample, there are significant differences in the way different sociological glossaries handle class, and the concepts they identify seem to be at least partially lexiconized. Perhaps the entry for class in a term list ought to provide some broad (fuzzy?) links for the definitions that fail to distinguish clearly between the different concepts this equivocal term can represent. It could also include more precise links for each of the specific concepts identified in the Jary and Jary dictionary. Writers consulting this array of overlapping information should be able to indicate what they have in mind when they use this word. Clearly there is little agreement among sociologists on exactly what they have in mind when they speak of class, but this very terminological fuzziness may prod users of a term list to explain what they have in mind and then use a comment or two to make sure their readers will understand their intentions.
To summarize, a term list needs, first, to identify lexiconized terms whose meanings a reader can discover by consulting an ordinary dictionary. However, much more importantly, the term list will focus on neologisms, i.e. terms used to represent concepts (word senses) that are not included in an ordinary dictionary.
Among them, it might be interesting to distinguish between metaphors and coinages, but, in the long run, this distinction will not turn out to be very significant. Since newly coined words will, in due course, become lexiconized, the distinction itself is quite ephemeral. Most of the neologisms used in special languages (like Sociology or Computing) use lexiconized words but assign them new meanings - they are metaphors. Coinages, like bit and byte, will soon become lexiconized.
A special problem that arises in the interpretation of terms involves those that are equivocal, i.e. where a term-form can represent more than one concept within the same context of discourse. Suggestions about how to deal with them are offered above. Let us now turn to a discussion of the procedures required to generate term lists.
#3. List Generation: the Production of Entries in an ETL
A fundamental distinction needs to be made between the way word lists and term lists are generated. Let us take up each in turn.
#3A. Word Lists
Well-established electronic word lists already exist in the form of spell checkers and thesauruses as posted on such word processors as WORD and WORDPERFECT. A general word list for world English could be prepared by posting on it all the words found in a good lexicon. A list of more than a quarter million English-language words can be found and accessed from WordIt - see: http://www.hotdiary.com/wordlist.html. This is a commercial service available for a modest fee. The Wordlist Project also has a lengthy list of words in many languages - see: http://wordlists.security-on.net/download.html. The Dictionary and Glossary List at: http://www.onelook.com/browse.shtml claims to have a list of 3,919,357 words based on data from 740 online dictionaries. Of course, that is in many languages, but queries can be answered for any selected language, including English. The CalendarHome project has a long list of English words designed for the use of anyone doing crossword puzzles: http://www.calendarhome.com/wordlist.html. One can actually see this list right away; all the others require preliminary steps.
Indeed, many word lists already exist and it would be possible to generate the kind of list proposed here, although it might be a costly and perhaps redundant process. On further reflection, I believe a different approach is more suitable for our needs than an actual word list - it should also cost a lot less to implement! I'm thinking about a new kind of "thesaurus" that could be linked with a metacrawler designed to search dictionaries for entries headed by the search word. My tentative suggestion is to call it a lexisaurus, formed by melding lexical and thesaurus. Coining a new term seems to be justified because this is, indeed, a new concept. Moreover, since word list is already used extensively for a variety of products - some of them support computer hackers! - it seems wise to avoid using this phrase in a new sense here. As I visualize the process, instead of looking for lists of words, we should just search the Internet for electronic dictionaries. They would contain not only a search word but also its definitions. I have compiled a selective list of some of them at: http://www2.hawaii.edu/~fredr/sites.htm#gen.
Of course, other more complete lists are also available. The Encyberpedia list of dictionaries, encyclopedias and thesauri on the Internet is quite comprehensive and provides a basis for selecting works that might be searched by a lexisaurus. See: http://www.encyberpedia.com/glossary.htm . Another outstanding resource is yourDictionary.com, available at: http://www.yourdictionary.com/languages/germanic.html#english. It lists many English-language dictionaries on the Net, plus a large number of specialized glossaries and multi-lingual and mono-lingual dictionaries in various languages.
Among the electronic dictionaries, the most familiar name may be The Merriam Webster Collegiate Dictionary, which offers a quick reference tool, coupled with a thesaurus for synonyms. Go to: http://www.m-w.com/netdict.htm. A much more comprehensive lexicon for English is the classic Oxford English Dictionary at: http://dictionary.oed.com/entrance.dtl. This work, however, is only available on a subscription basis. The Encarta Electronic Dictionary, posted at: http://dictionary.msn.com, claims to be a dictionary of World English, based on data from around the globe. By contrast, Webster focuses on American usage and Oxford on English as written in England.
Some innovative electronic dictionaries offer features not found in traditional lexicons. Among them, a particularly useful product is WordNet, described as a lexical database for the English language. It was produced by the Cognitive Science Laboratory at Princeton University. Open it at: http://www.cogsci.princeton.edu/cgi-bin/webwn.
Another very innovative electronic dictionary for English is the Wordsmyth Educational Dictionary-Thesaurus. It can be found at: http://www.wordsmyth.net. A special feature of this work that extends its utility is its ability to search for dictionary entries by clicking on words found in a text, thereby expanding the lexisaurus scope when it hits on Wordsmyth.
Because all the lexiconized words of a language are contained in its lexicons, we need look no further to find information about any lexiconized word. The search would be launched by clicking on any word in a text. A search engine that combines the lexical data contained in dictionaries with the hyperlinking technology of a thesaurus will produce all the results I have proposed for a term list. The electronic dictionaries hyperlinked suggest the possibility of creating not just one word list, but rather a word list system. Such a system would enable users to consult any or all of the available electronic dictionaries.
Instead of posting all the words on one list, we would get the same benefits at a fraction of the cost by using a hyperlink tool, a lexisaurus. This tool would provide hyperlinks to the available electronic dictionaries (lexicons) using a format similar to what one finds on Google, Lycos, Metacrawler, Northern Light, or IxQuick. Anyone can find a list of them at: http://www2.hawaii.edu/~fredr/sites.htm#search. This method will give users a choice of different dictionaries to consult on-line, and they could compare the entries in each of them if they so desire.
To recapitulate, let us imagine that all content words written in a text could be automatically linked, on command, to a simple instrument that would first copy the word and then hyperlink it to each of several electronic dictionaries. From this text one could jump to any one of the dictionaries and promptly find the relevant word entry. As noted above, these dictionaries may be freely available to anyone, as are the Encarta, WordNet, and Wordsmyth lexicons - or they may be available only to subscribers, as is the Oxford English Dictionary. Users could decide for themselves whether they wanted to consult the OED and whether to pay for its use. This knowledge will enable authors to make informed decisions about how to improve the likelihood that their readers (ranging from specialists to the informed public) will correctly understand what they have in mind.
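The instrument described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary names come from this paper, but the URL templates (all on placeholder ".example" hosts) and the "subscription" flag are hypothetical and do not reproduce the actual query formats of those sites.

```python
from urllib.parse import quote

# Hypothetical lookup templates: the hosts are placeholders, NOT the real
# query formats of the dictionaries named in this paper.
DICTIONARIES = [
    {"name": "Encarta", "template": "http://encarta.example/lookup?word={word}", "subscription": False},
    {"name": "WordNet", "template": "http://wordnet.example/lookup?word={word}", "subscription": False},
    {"name": "Wordsmyth", "template": "http://wordsmyth.example/lookup?word={word}", "subscription": False},
    {"name": "Oxford English Dictionary", "template": "http://oed.example/lookup?word={word}", "subscription": True},
]


def lexisaurus_links(word, include_subscription=True):
    """Copy the word, then build one hyperlink per electronic dictionary."""
    encoded = quote(word.strip().lower())
    links = []
    for dictionary in DICTIONARIES:
        # Users decide for themselves whether to consult subscription-only works.
        if dictionary["subscription"] and not include_subscription:
            continue
        links.append((dictionary["name"], dictionary["template"].format(word=encoded)))
    return links


if __name__ == "__main__":
    for name, url in lexisaurus_links("aggregate"):
        print(name, url)
```

Keeping the dictionaries in a plain list means that adding or removing a lexicon changes the data only, not the lookup routine.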
Moreover, this tool will not only permit users to find the dictionary entries and texts in which the words they write have been defined and used, but it will also provide the reverse data: in other words, it will inform writers that a word they have written has not been lexiconized. Although this will not prove that the intended concept cannot be found in a dictionary, it will support the proposal to add a new word. For example, I have searched for lexisaurus and cannot find it in any dictionary. Until someone offers contradictory evidence, this provides support for my suggestion to add it to our lexicon. This point also paves the way for us to turn, now, to a discussion of the basic and complementary project recommended here: namely, the preparation of term lists. For further information about the lexisaurus concept, and some supporting data, please go to the Annex to this paper.
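The "reverse data" mentioned above, a report of words that have not been lexiconized, might be produced along the following lines. The sketch assumes that the combined vocabulary of the consulted dictionaries is available as a simple set of lower-case words; the tiny sample lexicon is invented for the illustration.

```python
import re


def unlexiconized_words(text, lexicon):
    """Return, in order of first appearance, the words not found in the lexicon."""
    # A word is a run of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text)
    missing = []
    for word in words:
        key = word.lower()
        if key not in lexicon and key not in missing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Invented sample lexicon, standing in for the dictionaries consulted.
    sample_lexicon = {"a", "is", "new", "tool", "kind", "of"}
    print(unlexiconized_words("A lexisaurus is a new kind of tool", sample_lexicon))
```

As the text notes, an empty result for a word proves nothing about the concept; a hit in this report only shows that the word-form is absent from the lexicons that were checked.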
#3B. Term Lists
A term list, by contrast with a word list, will contain only terms used in a specified field of discourse. There can be as many term lists as there are domains in which special languages are used and glossaries are needed. Terms for a term list can be posted from any glossary, such as those identified in the Encyberpedia list. Many glossaries have been published as books, including those for Sociology mentioned above, but they are usually unavailable on-line. An extensive list of bi-lingual and specialist dictionaries published by the Oxford University Press can be found at: http://www.oup.co.uk/dictionaries. I have posted a list of sites for many social sciences glossaries at: http://www2.hawaii.edu/~fredr/sites.htm#spe. In addition, of course, a term list can be created from scratch by posting new terms in any field of knowledge - and, as explained at #5E, publishers of ordinary dictionaries, textbooks and other books will also find good uses for their own term lists.
An ETL can be produced quickly by posting on its software program a list of terms glossed in one or more of these glossaries - some of those now available for Sociology are noted above in #2b. Of course, it would be necessary to work with the editors and publishers of these works not only to get their permission but also to secure their active cooperation. I believe an ETL in their field of interest can be quite advantageous for them so it should be easy to establish good working relationships.
Three different levels of posting data from a glossary should be distinguished.
This option would promptly give users minimal information about each concept, but it would also motivate them to hunt for more details which could only be found in the complete glossary - ideally, a copy of it will sit on the desk of all sociologists. The intermediate level seems to me to be quite practical and also highly advantageous for both publishers and the users of an ETL. Of course, it will not be possible to decide how to proceed until after consultations with the owners and creators of each target glossary have been held.
#4. Types of Lists
Three types of term lists may be generated in response to different needs: they are basic, augmented, and interactive. There are also different kinds of term lists - they are explained below in the context of comments about the augmented term list.
#4A. Basic Term Lists
The basic ETL may only contain the terms for a selected field of interest. If possible, it should also contain brief definitions. However, it will include hyperlinks to the full definition and relevant documents illustrating the use of the concept. Term entries will also contain additional terms as explained below. To open the term entry, users will click on any word in their text that is highlighted.
The ETL will highlight any term used by an author that is included in the list. If the list is on while an author is writing, highlighting will occur as soon as the author writes the term. However, it will also be possible to activate the list after a draft has been prepared. It will then highlight all the listed terms simultaneously, permitting the writer to scroll through the draft to view the marked terms and check their definitions as desired. This option will permit an author to check listed terms as soon as they are written, but some writers may find that the highlighting of terms is distracting. They will be able to delay checking on the meanings of these terms until after they have finished writing.
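The highlighting step can be illustrated with a short Python sketch. It assumes the term list is a plain collection of strings and uses double square brackets as a stand-in for visual highlighting; both choices are assumptions made for the illustration, not features of any existing ETL program. The same function serves both modes: it can be called on each sentence as it is written, or once on a finished draft.

```python
import re


def highlight_terms(text, terms, mark="[[{}]]"):
    """Wrap every listed term found in the text with a visible marker."""
    if not terms:
        return text
    # Try longer terms first so that a multi-word term wins over its parts.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: mark.format(match.group(0)), text)


if __name__ == "__main__":
    draft = "An aggregate is not a social group."
    print(highlight_terms(draft, ["aggregate", "social group", "group"]))
```

Writers who find live highlighting distracting would simply not call the function until the draft is complete.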
Each marked term will be a hyperlink to the term-entry where the listed term is posted, together with relevant information of various kinds. The most immediately relevant data will be definitions of the listed terms, plus links to documents illustrating their use. If possible, these links will promptly open the definitions whenever they are available on the Net. However, when definitions are not available electronically, the term entry will hyperlink bibliographic information about the glossaries where the definitions can be found. Similarly, each term entry should hyperlink texts where listed terms have been used. Again, for documents not on the Net, bibliographic references should be hyperlinked. The goal will be to help writers make their intentions more understandable to an anticipated audience. In some cases, they may feel that there is no need to do anything, but they will also have the opportunity to insert explanatory words or comments, or substitute other terms, as they choose.
In addition to the main term identified in each term entry, there should be relevant linked terms, each of which will have its own hyperlinks. Several types of linked terms may be posted: equivalent, entailed, and related.
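One possible shape for a term entry is sketched below in Python. The field names are assumptions chosen to mirror the description above (definitions, usage texts, and the three kinds of linked terms), and the sample entry contains placeholders only, not data from any real glossary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TermEntry:
    term: str
    brief_definition: str = ""
    # Each item is either a URL or, for works not on the Net, a citation.
    definitions: List[str] = field(default_factory=list)
    usages: List[str] = field(default_factory=list)
    # Linked terms; each would have a term entry and hyperlinks of its own.
    equivalent: List[str] = field(default_factory=list)
    entailed: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)

    def linked_terms(self) -> Dict[str, List[str]]:
        """Group the linked terms by the three types named in the text."""
        return {
            "equivalent": self.equivalent,
            "entailed": self.entailed,
            "related": self.related,
        }


def open_entry(term_list: Dict[str, TermEntry], clicked: str) -> Optional[TermEntry]:
    """Return the entry for a clicked (highlighted) word, if it is listed."""
    return term_list.get(clicked.strip().lower())


# Placeholder entry for illustration only.
entry = TermEntry(
    term="aggregate",
    definitions=["(citation of a printed glossary)"],
    usages=["http://glossary.example/usage/aggregate"],
    related=["(related term)"],
)
term_list = {entry.term: entry}
```

Storing citations and URLs in the same list reflects the point made above: an entry should lead somewhere useful whether or not the definition is available electronically.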
#4B. Augmented Term Lists
An augmented ETL will provide links to a word list containing a comprehensive list of words used in English (or any other available language). The relevant entries in the word list will be automatically opened whenever the term list highlights a listed term. In addition, every word used in an author's text that is included in the word list may be accessed by a mouse click. It will open a window that reveals the dictionary definition of that word. The technology to do this is already available to anyone, without charge, through the Wordsmyth Search Button, available at: http://www.wordsmyth.net/ptbookmark.html. This new technology enhances current practice which normally only permits users who have opened an electronic dictionary to plug in a word and find its definition. The word list, as illustrated by the Wordsmyth resource, will cut out the intermediate step that requires users to open a dictionary site before they can look for a word's definition. Reference to a dictionary definition of any word will enable users of the ETL to compare the concept descriptions found in their special language glossary with the definitions available to everyone as a dictionary user. The example using aggregate, as explained above, shows the utility of this process.
Many other uses can be found for a word list in addition to the augmentation of term lists as described here. These applications will be especially valuable for dictionary, thesaurus, handbook, and textbook publishers. A few suggestions are offered below at #5E. They will be explained in more detail in a supplementary document. Let us now turn to the interactive form of term list.
#4C. Interactive Term Lists
An interactive ETL will contain input forms that enable users to enter new concepts and terms, and it will support the development of discourses among users designed to expand the contents and utility of the term list. Information provided by users on the input forms will be made available to umpires and to all members of a discourse community by means of listservs and web sites. Feed-back and archived comments will help members of that community utilize the innovations they find useful - and the relevant term list can be revised to incorporate new information. Procedures will also be included in the interactive instrument to support recommendations based on the list to be forwarded to the editors of glossaries and dictionaries for addition to their corpus of reference material. We may expect that some at least of these recommendations will be accepted, and will lead to the addition of new entries in revised versions of their publications, both on paper and through the Internet.
Merriam-Webster maintains a page for users to submit entries for inclusion in a personal "dictionary" compiled by the user. See: http://www.wordcentral.com. It may be possible to work with Merriam-Webster to borrow and adapt their program for the purpose of developing an interactive ETL. Perhaps there are other such resources that might also be relevant and applicable. We should investigate all such possibilities before deciding to do the programming needed to create such a utility de novo.
Before anyone proposes a new concept and term, the would-be innovator needs to canvass the field to determine, if possible, that the proposed innovation is, indeed, new. Anyone who offers a new term and concept that is not actually new is likely to be ridiculed and ignored. Unfortunately, it is not always easy to determine the status of ostensible innovations, especially because the main method for checking involves hunting for established words in alphabetical lists, as in existing dictionaries and glossaries.
There are many universal classification schemes such as Dewey, Library of Congress, Bliss, and the Universal Decimal Classification. A list of links to classification schemes can be found at: http://www.fbi.fh-koeln.de/labor/bir/thesauri_new/classif.htm.
In many fields there are special classification schemes and some of them can be viewed on-line. A good example is the Mathematics Subject Classification, sponsored by the American Mathematical Society. It is available at: http://www.ams.org/msc.
A classification scheme for Information Science, developed by Dagobert Soergel, can be found at: http://www.iud.fh-darmstadt.de/iud/wwwmeth/publ/example/werkz/risclass/menu1.htm
These schemes usually have two fundamental limitations for our purposes. First, because they were designed to locate books on library shelves, they presuppose that every concept needs to be located at only one place in the scheme. This limitation is unnecessary for concepts, which often can be located at different places. Second, they also presuppose the need for a single framework for classifying entities, but it seems much more useful to accept a variety of different classification schemes to accommodate the needs of different perspectives, schools of thought, or disciplines.
A classification scheme designed to arrange concepts rather than books can be found in Roget's Thesaurus. It contains synonymies and is usually viewed as just a word-finder. However, at least in principle, any concept can be located at some point in this scheme. It may be found at: http://www.thesaurus.com/Roget-Outline-Top.html. In practice, I doubt that many people rely on this classification scheme, which seems to be rather antiquated. Instead, they tend to hunt for concepts (actually they hunt for words) by tracking them through synonymies. To see how this works, go directly to: http://www.thesaurus.com./Roget-Alpha-Index.html.
An imaginative approach to finding concepts has been developed by Henry Burger in The Wordtree: http://www.wordtree.com/. According to a blurb on this site: "you simply look up any part, any fragment of the term you are looking for. It will offer you many leads in different directions. Follow the direction that sounds closest to your idea, and, depending on how precisely you want the expression, you'll run into it." One needs to be aware of the fact that this work focuses on transitive verbs (process words), and in order to use it one must buy the book.
The most specific and relevant classified context for concepts can often be found in an indexing language thesaurus. A good example is the Thesaurus of Sociological Indexing Terms at: http://www.csa.com/helpV3/sociothes.html. This text contains an alphabetical list of descriptors used to characterize sociological abstracts and texts. Although it does not present a formal classification scheme, its listing of broader terms (BT), narrower terms (NT), and related terms (RT) would enable anyone to construct a scheme placing these terms in a systematic array.
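How a systematic array might be constructed from BT and NT listings can be shown with a small Python sketch. The record format (a dictionary mapping each descriptor to its BT, NT and RT lists) and the sample descriptors are assumptions invented for the illustration; they are not taken from the Thesaurus of Sociological Indexing Terms.

```python
def build_hierarchy(records):
    """Derive top-level terms and a parent-to-children map from BT/NT listings."""
    children = {}
    for term, relations in records.items():
        children.setdefault(term, set())
        for broader in relations.get("BT", []):
            children.setdefault(broader, set()).add(term)
        for narrower in relations.get("NT", []):
            children[term].add(narrower)
            children.setdefault(narrower, set())
    has_parent = {child for kids in children.values() for child in kids}
    roots = sorted(term for term in children if term not in has_parent)
    return roots, {term: sorted(kids) for term, kids in children.items()}


def outline(term, children, depth=0, path=()):
    """Return indented lines for a term and everything narrower than it."""
    if term in path:  # guard against circular BT/NT references
        return []
    lines = ["  " * depth + term]
    for child in children.get(term, []):
        lines.extend(outline(child, children, depth + 1, path + (term,)))
    return lines


if __name__ == "__main__":
    # Invented sample records, for illustration only.
    sample = {
        "Small Groups": {"BT": ["Groups"]},
        "Groups": {"NT": ["Small Groups", "Peer Groups"]},
        "Peer Groups": {"BT": ["Groups", "Youth"], "RT": ["Friendship"]},
    }
    roots, children = build_hierarchy(sample)
    for root in roots:
        print("\n".join(outline(root, children)))
```

In this sketch a descriptor with two broader terms simply appears under both, so a concept is not forced into a single place in the array. Related terms (RT) are carried in the records but are not used to build the hierarchy.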
For our purposes a better scheme to look at is the ACM Computing Classification System at: http://www.acm.org/class/1998/homepage.html. An overview of the two top levels of this scheme is presented at: http://www.acm.org/class/1998/overview.html. Clicking on any topic in this list will open lists of relevant sub-topics. The arrangement strikes me as ideal for any subject field, each of which may need to have its own scheme.
Help for the development of an ETL may well be secured from some members of the International Society for Knowledge Organization (ISKO) which has a home page at: http://www.ceit.sk/wwwisis/iskoabo.htm. The basic principles required for the organization of knowledge rest on the premise that it is possible to classify concepts and fields of research in a systematic way, and then use this knowledge to find and use the kinds of knowledge that anyone may need. The ISKO site, unfortunately, has but recently been established and, as yet, lacks links to the kinds of information that we might want to use. Until better schemes for classifying and retrieving concepts become available, the best way for any innovator to determine whether or not proposals for the recognition of new concepts are truly significant will be a "seat of the pants" approach that can be founded on the basis of discourse among members of a user community.
To explain this option, we turn next to a discussion of these communities.
#5. User Communities
An ETL can serve a wide range of communities that we can visualize under several headings: disciplinary, sub-disciplinary, inter-disciplinary, personal and proprietary. Since proprietary organizations seek to profit from their enterprise, it seems reasonable to expect them to pay a fee or royalty for the right to use the patented soft-ware prepared for the creation of term lists. However, since our aims are essentially scholarly and academic, I hope it will be possible to let non-profit groups and individuals at all levels receive and use the soft-ware without charge. A brief statement about each of these levels follows, but more details will be provided in supplementary documents.
#5A. Disciplinary Communities
This paper originated in the context of a panel planned for the world congress of the International Sociological Association in Brisbane, July 2002. In the hope and expectation that the ISA will officially sponsor an ETL for sociologists, we may take this discipline as a prototype. However, the principles mentioned here apply equally to Political Science, Economics, Anthropology, Psychology, Geography, Communications, Journalism or any other social science discipline - actually, they apply also with equal force to the natural sciences, humanities, engineering, health and medicine, librarianship, business, architecture, computer science, and many other fields.
In most disciplines one will find glossaries (often called "dictionaries") that list and define a large number of terms used by its members. The Dictionary of Sociology (Oxford University Press) edited by Gordon Marshall, now contains, in its second edition, 2,500 entries. The Harper Collins Dictionary of Sociology (edited by David and Julia Jary) contains more than 1,800 entries. Many of these entries are encyclopedic in scope, presenting not only definitions but scholarly discussions of the origin and use of terms. Understandably, because of the costs involved in compiling and publishing such works, their sponsors are reluctant to give their contents away free of charge which is why they have not been posted on the Internet. Nevertheless, there are at least two sociological dictionaries on the Net: Iverson's at: http://www.iversonsoftware.com/sociology and the Athabasca University Dictionary of the Social Sciences, at: http://datadump.icaap.org/cgi-bin/glossary/SocialDict. The former contains about 130 terms and the latter some 1000 entries for several social science disciplines. The point is that although it will be possible to hyperlink some definitions electronically, definitions for most sociological terms can only be found on paper.
The proposal to create an ETL for Sociology under the sponsorship of the ISA needs to take these facts into consideration. However, I believe it is possible to design an electronic list of terms used in the sociological dictionaries that will actually increase rather than diminish the income of their publishers. It will provide bibliographic citations for the books and hyperlinks for the available electronic definitions. Users who want to find the definitions will, therefore, be inclined to buy the glossaries - relying on copies found in a library will simply be too time consuming to be acceptable.
If we imagine a list of terms used by sociologists and managed by or on behalf of the International Sociological Association, we can then suggest several aspects of its operations. These correspond to the distinction made in #4 above between basic, augmented, and interactive lists. At a minimum, the Sociological ETL will contain a basic list of terms used by sociologists in their professional work - the entry terms will be supplemented by equivalent, entailed, and related terms as also explained above. Every term found in each entry will be followed by hyperlinks that take a user to definitions (in published books or on the internet) and to linked texts that illustrate the significance and use of that term. Valuable as this information will be, it will be incomplete without two kinds of additional data that community members can obtain through the augmented and interactive versions of the ETL. I shall use member here to mean any member of the community using an ETL who seeks and uses this additional information.
The augmented ETL will contain a word list that enables members to find definitions for all the words used in the English language that are contained in an electronic word list. English is specified here because we are working in English and we need a foundation, but once established, the same principles can be applied to parallel utilities in any other language users may want to use. The word list will enable members to compare the term definitions found in glossaries to the senses of any word as entered in a dictionary. This will help them decide how to write clearly enough to let their readers know whether they are using words for a meaning that anyone can discover by looking in a dictionary, or whether they have special meanings that their readers may not be familiar with. The ETL can help members determine the semantic status of a term only by making such comparisons.
The interactive ETL will further supplement and refine the data available to members by enabling them to enter new information on an input form. Such information can include neologisms, whether as new senses of familiar words (metaphors) or as new term-forms (coinages) used to represent new concepts. However, it may include any other information or comments deemed useful to fellow sociologists. For example, it could include evaluations of any established term or definitions with proposals for revisions, qualifications, or amendments.
Data entered on the input form will be processed at several levels. At the first level, it will remain the property of the author and will simply be added to that individual's records - an analogy can be seen in the way a spell checker adds new words to a writer's list without affecting the contents of the general list used by everyone else. However, at any time the member can decide to forward the information to the whole community of sociologists - e.g. through an e-mail message to the ISA secretariat for forwarding to all members. Before doing that, however, the member might decide to consult with other members by seeking their views - preferably a list of umpires who agree to facilitate the process. The manager of the ETL for Sociology could then decide, on the basis of recommendations received, whether or not to add the new term to the List.
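The levels of processing just described can be sketched as a small Python class. The class and method names are hypothetical, and the decision rule is an assumption: here a simple majority of umpire recommendations stands in for the manager's judgment, and for simplicity umpires are consulted after the term is forwarded rather than before.

```python
class TermListWorkflow:
    """Tracks a proposed term through the levels described above."""

    def __init__(self, community_terms=()):
        self.community = {term.lower() for term in community_terms}
        self.personal = {}  # member -> set of that member's own terms
        self.pending = {}   # term -> list of (umpire, accept) recommendations

    def add_personal(self, member, term):
        # First level: the entry stays with its author, much as a spell
        # checker adds a word to one writer's own list.
        self.personal.setdefault(member, set()).add(term.lower())

    def forward(self, member, term):
        # The author chooses to submit the term to the whole community.
        key = term.lower()
        if key not in self.personal.get(member, set()):
            raise ValueError("members can only forward terms from their own list")
        if key not in self.community:
            self.pending.setdefault(key, [])

    def recommend(self, umpire, term, accept):
        # An umpire records a recommendation on a forwarded term.
        self.pending[term.lower()].append((umpire, bool(accept)))

    def decide(self, term):
        # The manager decides. ASSUMPTION: simple majority of recommendations.
        key = term.lower()
        votes = self.pending.pop(key)
        accepted = sum(1 for _, ok in votes if ok) > len(votes) / 2
        if accepted:
            self.community.add(key)
        return accepted
```

A term that is rejected, or never forwarded, remains in its author's personal list, so nothing a member records is lost.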
#5B. Sub-Disciplinary Communities
Organized disciplinary associations typically maintain a set of organized research committees or special interest groups. For the ISA, there are currently 53 such sub-disciplinary communities, as listed at: http://www.ucm.es/info/isa/rc.htm. Among them, RC35 is the Committee on Conceptual and Terminological Analysis (COCTA). This paper is being prepared for presentation at a session sponsored by this committee, and we may expect that COCTA will play a leading role in the establishment and testing of the ETL concept. However, as soon as possible, all ISA research committees should be invited to participate in the project as sub-disciplinary communities. Each such community would be entitled to have its own ETL, and to create a list of terms that have a special meaning for its members.
They would not need to duplicate information contained in the general ETL for Sociology, however. Although they would be using many of these terms, they would rely on the disciplinary list for general support. This would enable them to concentrate on additional terms (not yet included in the general list) that they use in their own field of specialization. The procedures to be followed in managing and using a sub-disciplinary ETL would be the same as those used in the pan-disciplinary project. However, the standard against which sub-disciplinary terms would be compared should include the total sociological list - in addition, of course, to the word list for all English speakers. Moreover, using the interactive form, any member of a sub-disciplinary group could recommend terms for addition to the whole body of terms used by sociologists, following procedures mentioned in the previous section.
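The layered arrangement described above, in which a sub-disciplinary list relies on the disciplinary list and both are compared against the general word list, can be sketched as follows. Each list is assumed to be a simple mapping from lower-case terms to definitions; the function names and sample entries are invented for the illustration.

```python
def layered_lookup(term, sub_list, discipline_list, word_list):
    """Return every definition of a term, from the most specific list outward."""
    layers = [
        ("sub-disciplinary", sub_list),
        ("disciplinary", discipline_list),
        ("word list", word_list),
    ]
    key = term.strip().lower()
    return [(name, layer[key]) for name, layer in layers if key in layer]


def terms_to_add(candidates, discipline_list):
    """Keep only candidate terms not already covered by the disciplinary list."""
    return [term for term in candidates if term.strip().lower() not in discipline_list]


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sub = {"special term": "(sub-disciplinary definition)"}
    discipline = {"aggregate": "(glossary definition)"}
    words = {"aggregate": "(dictionary definition)"}
    print(layered_lookup("Aggregate", sub, discipline, words))
    print(terms_to_add(["aggregate", "special term"], discipline))
```

Returning all matches, rather than only the first, keeps the comparison between glossary senses and dictionary senses available at every level.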
#5C. Personal Lists and Inter-Disciplinary Cooperation
Many creative scholars work in cross-disciplinary contexts that draw on information and concepts used in the established disciplines yet often transcend them. They have been described as hybrids, and they are often seen as the most innovative minds. Many of the intellectual giants whose work paved the way for development of our established disciplines can be so characterized. The ETL project needs to take into account the potential for useful conceptual and terminological work by individuals not working in any institutionalized context. Accordingly, in addition to making the ETL program available to disciplinary and sub-disciplinary groups, we should support the use by creative individuals of a personal ETL.
Moreover, there are organizations like the International Social Science Council that promote research that crosses disciplinary boundaries. Creative scholars often play a leading role in such efforts - they find the inter-disciplinary milieu more hospitable to their work. Moreover, the current explosion of global networks and the inclusion of many scholars from third world countries means that the basic concepts of our academic disciplines which trace primarily to Western (or European) experience need to be augmented by new ways of thinking that are better adapted to contemporary global realities. Evidence for this evolving trans-disciplinary and evolutionary perspective can be found in the proliferation of new international associations for a wide range of subjects or fields of interest. See the list of international social science associations found at: http://www2.hawaii.edu/~fredr/sites.htm#assoc. The associations on this list that are marked with an asterisk (*) are members of the International Social Science Council and may be viewed as representative of the established disciplines. However, many other names on this list belong to non-members of the ISSC and, in many cases, represent newly-formed hybrid fields. All of them, as international associations, are products of globalization as it has grown since the end of World War II some half century ago.
Of course, there are many other associations outside the social sciences that might also be interested in using an ETL. An easy way to form an impression of the scope of these fields is to go to http://www.unesco.org/general/eng/programmes/index.shtml This site lists and links a wide range of disciplines and fields that are the subject of attention at UNESCO. Without further comment, let me just note that even as the ETL program should be made available to creative individuals, so it should also be used by all communities of scholars as they are organized globally and as they exist within and cutting across all the established disciplines.
#5D. Information Science, Librarianship and Terminology
There is a range of official and non-profit organizations whose activities and interests hover between non-profit academic concerns and profitability. Their products are often marketed in competition with those of commercial firms. We may class them under two sub-headings: first, the specialists who use thesauri and subject heading lists for information handling, especially in libraries; and second, the terminologists and knowledge engineers who emphasize harmonization and standardization of terms as tools for the efficient organization of knowledge.
Subject Headings and Information Systems Thesauri. Librarians and information specialists often prepare lists of preferred terms to be used in the management of information, library cataloging, and data retrieval. Although these lists are usually computerized, they are not easily available on the Web. They may or may not have programs to help users find the preferred terms when different semantic equivalents first come to mind. The proposed ETL technology would be capable of doing this for them. Here are a few examples with their Web Sites.
The classic Subject Heading list is maintained by the U.S. Library of Congress and it is very widely used. Information about it can be found at: http://lcweb.loc.gov/cds/lcsh.html However, almost all of this information is only available for a fairly stiff fee. The list of Subject Headings, in various forms, is announced at: http://lcweb.loc.gov/cds/lcsh.html#lcsh20 The text at this site reads: The big red books. Library of Congress Subject Headings, 24th edition (LCSH 24) is the standard subject heading list for thousands of libraries as well as for a multitude of printed indexes. LCSH is the most comprehensive list of subject headings in print in the world -- the one tool no librarian should be without. Provides an alphabetical list of all subject headings, cross-references and subdivisions in verified status in the LC subject authority file. It is available in 5 volumes for $275 in the U.S., or $325 elsewhere. Exceptionally, the LC makes openly public its alphabetical list of form subdivisions for use with its subject headings. This list can be found at: http://www.lib.usm.edu/~techserv/cat/formsubv.htm However, this list only includes the standard terms -- it offers no "USE" or "USE FOR" help for those who may not know what to call a category or what other terms or synonyms might be relevant.
The University of Michigan Documents Center, under the rubric Library of Congress Subject Headings, offers a list of LC headings used for Political Science -- but it comments, It does not recognize the term "Political Theory" to "Political Science" -- what the list shows is that anyone writing under the heading "political theory" -- a very large literature -- would not find a place for this topic in the LC system. For details, go to: http://www.lib.umich.edu/govdocs/pstheory.html. I have been unable to discover if the LC or any other authority or center for librarianship has a tool that would enable authors to determine promptly and easily what term to use as a subject heading. An ETL for catalogers would surely be useful. If they have an equivalent service already, I've not been able to find out about it.
As for thesauri -- as indexing languages, used for information retrieval, not as sets of synonyms -- they are normally designed to provide lists of preferred terms for more narrowly defined fields of interest, which means that they usually provide cross-references from non-preferred terms. However, I find no evidence that they have automated term lists (ETLs) that would help writers find the preferred terms listed in a thesaurus. An example of an elaborate IS Thesaurus can be found at NASA (U.S. National Aeronautics and Space Administration) -- see: http://www.sti.nasa.gov/thesfrm1.htm In addition to their own thesaurus, this NASA site provides links to several related non-NASA thesauruses.
For a good list of indexing thesauri, organized by subject fields, see: http://www.fbi.fh-koeln.de/labor/bir/thesauri_new/thesen.htm#lenoch. This social science information center in Germany does a lot of innovative work and may be a good location for help in developing the ETL project. The OECD maintains a Macro-Thesaurus, which may be accessed through another site in Germany: http://www.iud.fh-darmstadt.de/iud/wwwmeth/LV/WS97/im3/GrpCordis/thes.htm
In the UK, the Humanities And Social Science Electronic Thesaurus (HASSET) can be found at: http://155.245.254.46/services/zhasset.html It supports extensive and important information retrieval activities. If these organizations have not developed an equivalent of the ETL themselves, they should surely be interested in using its technology once it has become established.
Terminology and Knowledge Engineering. An important field of work that overlaps information science focuses on the development and standardization of specialized vocabulary in many fields of knowledge. A global network of specialists in this domain has been established with its focal Web site at: http://linux.infoterm.org/index.html
The International Information Centre for Terminology provides a bibliography of works relating to Terminology, including a few glossaries and thesauri in selected fields. Some data based on their work can be found at: http://linux.infoterm.org/infoterm-e/i-infoterm.htm. The Association for Terminology and Knowledge Transfer has become a leader in organizing terminologists and facilitating their work through a variety of specialized research groups. Details can be found at: http://gtw-org.uibk.ac.at
The International Organization for Standardization (ISO), through TC37 on Terminology, has sponsored a variety of projects that link computerization to the development of terminological standards -- see: http://linux.infoterm.org/iso-e/i-iso.htm Although they emphasize computer applications in the production of standards, including standardized terminology, they do not appear to provide links for writers and users of specialized terminology -- a core function for the proposed electronic term list (ETL).
The Congress on Terminology and Knowledge Engineering, Innsbruck, August 1999, brought together a truly learned assembly of specialists who delivered papers on many dimensions of the theme, including some that seem to verge on the processes anticipated for an ETL. A list of the papers can be found at: http://gtw-org.uibk.ac.at/tkeproceed.html. So far as I can see, however, there are no projects like the ETL that develop methods for helping writers use terms more effectively, as they have been elaborated in this paper.
#5E. Proprietary Organizations
In addition to all the non-profit scholars and scholarly communities mentioned above, there are a large number of organizations motivated by market forces, and some of them will have a genuine interest in the use of ETLs. An outstanding example can be found in the publishers of dictionaries. The Cognitive Science Laboratory at Princeton University has posted WordNet, http://www.cogsci.princeton.edu/~wn/online, a "lexical database for the English language." Most traditional dictionaries, however, rely exclusively on paper-based publications, though there seems to be a growing trend to post some of them on the Internet. Publishers, however, are reluctant to abandon their existing markets, which may now, at least to some degree, be captured by publishing on CD-ROMs.
Nevertheless, a few general language dictionaries, in English, are now available on-line. A very interesting and useful example is the Encarta Electronic Dictionary at: http://dictionary.msn.com This product is part of a large package of services offered by MSN and Microsoft. Another fascinating and important electronic dictionary is Wordsmyth: the Educational Dictionary-Thesaurus, which can be found at: http://www.wordsmyth.net/home.html The Merriam-Webster Collegiate Dictionary (and also the Thesaurus) are available without charge at: http://www.m-w.com/dictionary. Also visit the American Heritage Dictionary of the English Language at: http://www.bartleby.com/61. Other electronic dictionaries are offered on a subscription basis. An outstanding example can be seen in the venerable, comprehensive and authoritative Oxford English Dictionary, at: http://dictionary.oed.com/entrance.dtl An extensive list of dictionaries, in many languages, including specialized glossaries, can be found at: http://www.yourdictionary.com
On the basis of a little exploratory work, my impression is that at least some of these dictionaries would welcome the additional capabilities provided by an ETL and would be willing to subscribe or pay for the service. In each instance, the ETL would call the attention of writers to lexical information they would be unlikely to find or even look for in a published dictionary. Several possibilities come to mind.
International English. At international meetings sponsored by the UN, UNESCO, and many other global organizations one hears English spoken as the lingua franca. However, many words that are familiar to native speakers of English in America, England, Australia, or South Africa would not be easily understood in these international settings. A list of the English words most often used at international meetings would surely have a market - it would not only serve native speakers of English seeking to address an international audience, but it would also become a powerful tool for non-native speakers who use English as a second language. It could easily supplement or even replace the Cobuild English Dictionary for Learners of English: http://titania.cobuild.collins.co.uk/catalogue/gem.html This work offers "key vocabulary," described as all the most important English words defined in clear and simple language. In addition, there is the Longman Dictionary of American English: A Dictionary for Learners of English; the Cambridge Learner's Dictionary; and the American Heritage Children's Dictionary.
Basic English was designed to be an international language using fewer than a thousand words. Information about the system and its word list can be found at: http://www.diac.com/~entente/basicpg.html. Grady Ward's Moby identifies several different wordlists, some very long. See: http://www.dcs.shef.ac.uk/research/ilash/Moby/mwords.html. Consider also the Mnemonic Encoding Wordlist at: http://www.tothink.com/mnemonic/wordlist.html, and there are many wordlists for translators that identify words used in various languages. The U.S. Information Agency has published a "special language" list of about 1400 words that are widely used internationally -- it was designed for use in Voice of America broadcasts -- see: http://www.rick.harrison.net/annex/specialeng.txt. Unlike most of the other lists, this one is immediately viewable.
These works list basic or much-used words but, clearly, many technical and difficult terms are often used at international meetings, while some of the simplest words may never be heard. The vocabulary of international English, therefore, differs significantly from the list of words for children or learners of English, or even for international broadcasting. Although these dictionaries and lists might provide a good starting point for a dictionary of international English, my guess is that speakers at international meetings will use words not found in Cobuild, or other learners' dictionaries, and many of the vocabulary items found in these works are not often used by internationalists. At least, this is an open question that should be answered by analysis of documents posted by participants in international meetings. An ETL will provide the ideal vehicle for helping writers select words that can be well understood by sophisticated international audiences - they do not need definitions, and writers will scarcely take the time to look for these words in an ordinary dictionary -- but if words not on the international list are highlighted as they write, they are more likely to replace or comment on them.
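The highlighting step described here can be illustrated with a minimal sketch. It assumes a tiny, hypothetical word list standing in for a real international-English list (which would have to be compiled from meeting documents), and the function name and the asterisk marker are likewise only illustrative choices, not part of any existing ETL.

```python
import re

# Hypothetical stand-in for an international-English word list; a real list
# would have to be compiled from documents of international meetings.
INTERNATIONAL_WORDS = {
    "the", "delegates", "will", "discuss", "a", "new",
    "proposal", "at", "meeting",
}

def highlight_unlisted(text, wordlist, mark="*"):
    """Wrap every word that is NOT on the list in the given marker."""
    def check(match):
        word = match.group(0)
        # Compare case-insensitively, but keep the writer's own spelling.
        return word if word.lower() in wordlist else f"{mark}{word}{mark}"
    return re.sub(r"[A-Za-z']+", check, text)

draft = "The delegates will hash out a new proposal at the meeting"
print(highlight_unlisted(draft, INTERNATIONAL_WORDS))
# The delegates will *hash* *out* a new proposal at the meeting
```

The marked words are the ones a writer might replace or comment on; in an actual word processor the marker would be visual highlighting rather than asterisks.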
Problem Words. The usage of many familiar words is problematical, even for native speakers of English. The Encarta Dictionary makes a point of discussing them in usage essays inserted in the text following entries for them. In the introduction to this work, the distinction between house and home is discussed as an example. A few more examples, picked at random from the text, include dinner, lunch and supper; avoid, evade, and elude; guy; hardly; do; and done. Other dictionaries, of course, have similar usage aids. However, users are unlikely to look up words like this when they feel, rightly or not, that they know quite well how to use them. An ETL for problem words that highlighted them on the page where one is writing would ask authors to think about the correctness of their usage. Since usage notes are scattered throughout a long dictionary, it is not easy to find them. They could easily be cumulated in an ETL and made available, perhaps for a fee, to writers eager to avoid blatant usage mistakes.
Homonyms. A Dictionary of Homonyms and Homographs is described at: http://rogersreference.com, but one must pay for the book to see its contents. A list of homographs that differ in pronunciation can be found at: http://www.marlodge.supanet.com/wordlist/homogrph.html. Such words, however, are not really problematical: because they are not homophones, users can easily distinguish them from each other. The really troubling problem involves true homonyms - words that are both written and pronounced the same way. Strangely, I have been unable to find any good list of these homonyms. Therefore, an ETL that identifies all the homonyms used in English would be quite useful. Writers may even be unaware of the fact that different words can be identical in spelling and pronunciation. An example is fast, mentioned above in #1. The Encarta dictionary mentions bow: it explains that bow(1) means bending; and bow(2) refers to the front of a ship. It adds a third word, bow(3), which does have a different pronunciation and refers to a looped knot. No doubt there are not many homonyms but writers would be well served by an ETL that listed them and called attention to their differences. Since they would be highlighted on the screen, writers would notice them, whereas when writers rely only on printed dictionaries the homonyms are very likely to be overlooked, especially because most of them are very ordinary words. An ETL for homonyms would perhaps look like a plaything for children - but it could really help all English learners. Moreover, having an ETL for children would tap into a large market.
Glossaries. Much more seriously, note that many published glossaries would be more useful if they were linked to an ETL. No doubt many technical terms that are defined in glossaries do not generate ambiguity because they are either coinages or lexiconizations. Newly coined words will not be misunderstood by readers who either know what they mean or cannot understand them. Terms that are lexiconized can be found in dictionaries where the appropriate sense is defined. However, metaphors that involve the use of familiar words for unlexiconized concepts are troublesome and often generate ambiguity. Yet users of a published glossary are unlikely to be aware of them - they may scarcely recognize their existence. We may call them unlexiconized metaphors. An ETL could highlight them as one writes, calling attention to possible ambiguity if readers assume they mean what they ordinarily mean, yet the author had a different concept in mind. If a list of unlexiconized metaphors contained in a glossary could be offered to those who purchase the book, they would surely be grateful. My guess, however, is that they will not remember to consult the list regularly. If, instead, these terms were highlighted on the screen whenever authors were using that special language, they could easily make certain that their intentions were well understood. One way to provide an ETL for users of a glossary would be to mount the information on a CD-ROM and include it as a bonus in the relevant glossary. Indeed, one often finds such an insert in books designed to help computer users. Why not add them to glossaries for a wide variety of subject fields? The publisher might raise the purchase price to cover the extra cost, but perhaps the increased sales such a supplement would generate could actually make it cost effective to include the CD-ROM as a no-cost benefit.
Textbooks and Handbooks. There are, I believe, various kinds of non-lexical publications that would benefit from the use of an ETL. For example, many handbooks and textbooks provide systematic information about a subject field and use printed indexes to help users find the page or section where a particular topic is discussed. However, to find the relevant passages, a user must consciously think about the index term and look for it in the index. An ETL could easily be designed that would simply list the index terms found in a book and highlight an author's text whenever one of these terms was used, including a hyperlink to identify the page number where the subject is discussed. Again, it would surely be cost effective for a publisher to include a CD-ROM with an ETL as part of the package delivered to a buyer of the relevant book. Since the cost of indexing these books has been covered in the production process, it would cost very little to post the index in an ETL offered to purchasers of the book as a free bonus - and the value of the book for both buyer and publisher would thereby be significantly enhanced.
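As a rough illustration of the index-based ETL just described, the following sketch scans a writer's text for index terms and returns the page numbers attached to each. The index entries, page numbers, and function name are all hypothetical; a real ETL would load a book's actual index and render hyperlinks rather than tuples.

```python
import re

# Hypothetical back-of-book index: term -> pages where it is discussed.
BOOK_INDEX = {
    "social class": [112, 240],
    "bureaucracy": [57],
    "class": [110],
}

def link_index_terms(text, index):
    """Return (term, start offset, pages) for each index term found in text.

    Longer terms are tried first so that "social class" wins over "class".
    """
    terms = sorted(index, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0).lower(), m.start(), index[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

for term, offset, pages in link_index_terms(
    "Bureaucracy often reflects social class.", BOOK_INDEX
):
    print(term, offset, pages)
```

Trying longer terms first is one simple way to keep a multi-word index entry from being split into its component words; other matching policies are equally possible.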
With a bit more reflection, I believe various other kinds of products could be identified where the addition of an ETL would be a profitable supplement. For example, bi-lingual dictionaries help translators find equivalents for all the words they want to use. However, an ETL could highlight those words whose translation is likely to be especially problematical and little noticed. This is especially true of words that seem to be equivalents yet contain traps for the unwary. A bi-lingual dictionary accompanied by an ETL for these troublesome terms would surely be much appreciated and no doubt purchasers of bi-lingual dictionaries would be grateful and willing to pay more for the product. Perhaps enough has been said to support the design and marketing of an instrument capable of supporting the development of an ETL. I expect that the income generated by sales to commercial publishers would be enough to cover the cost of making the same instrument available free of charge to non-profit scholarly groups and communities.
#6. Management of Lists
The considerations mentioned above seem to justify the expectation that this idea should be patented and made available through the University of Hawaii Research Corporation. On the basis of preliminary conversations, including some with Bloomsbury, publishers of the Encarta Dictionary, and with the Oxford University Press, I believe the concept is viable and will generate income for the University and, perhaps also, for those who participate in developing the project.
There are also scholars at the University of Hawaii in the departments of Information and Computer Sciences, Communications, Library and Information Studies, and Information Technology Management. They support a joint Ph.D. program in Communication and Information Science, directed by Professor Rebecca J. Knuth. Through this program it should be possible to find qualified persons to develop the program for an ETL, to manage the product, and also to promote scholarly research on the application and use of this instrument as a tool for improved communication and information management.
As the project evolves, we can expect non-profit associations and research committees, like those identified above in the International Sociological Association to become actively involved and perform much of the necessary developmental work on a voluntary basis. Moreover, it could well be feasible to engage scholars who work as consultants for published dictionaries to lend a hand. For example, the contributors to the Encarta and other dictionaries, and even more, those who help develop scholarly glossaries, are themselves scholars motivated as much by academic concerns as by financial rewards. In addition to helping develop conceptual and terminological data for ETL projects, they might have a special interest in providing links for publications, especially texts available on the Internet, that would illustrate the use and intellectual value of specific concepts.
It may be premature to do more than mention these possibilities here. The main purpose of this paper has been to explain the rationale and functional value of developing electronic term lists at various levels for diverse communities. As soon as concrete progress has been made in launching this project, we should be able to visualize and actualize the steps required to manage it efficiently and make it worth while for everyone concerned.
ANNEX: A LEXICAL THESAURUS
Proposal for a New Kind of Electronic Tool to Help Writers
This annex supplements the main argument concerning the design and functioning of term lists by providing further details about its companion instrument, a lexisaurus -- see Word List
A New Type of Thesaurus. The existing thesaurus design can be modified by adding a metacrawler that would search dictionaries for definitions of any lexiconized word written by an author. Just as the Microsoft Thesaurus permits writers to highlight any word as they write and click on the menu (or Alt F1 key) to find a list of synonyms for each of its senses, so in the proposed new tool, a kind of lexical thesaurus, writers will open a list of hyperlinks to the entries in different dictionaries where that word has been defined. The tool could also identify terms for the different senses of the word, as they are now presented in the Microsoft Thesaurus, and by clicking on any one of them, find its definitions in different dictionaries, or open its synonyms and find their definitions in different dictionaries also. For present purposes let me propose a coinage for this tool because, I believe, it is really a new concept. I shall call it a lexisaurus, a blend formed by melding lexical and thesaurus.
The goal of the lexisaurus will be to enable authors to take advantage of all the available electronic dictionaries to find definitions not only of the word that the author has written, but also of its synonyms, including synonyms for all its senses. Of course, the process would have to be selective - the user could only click on one word at a time, but the tool would open lists of words, each of which will be a hyperlink to definitions in different dictionaries, plus synonyms and hyperlinks to relevant texts to illustrate usage.
The Established Thesaurus. A benchmark for evaluating the lexisaurus project can be established by looking at how an established thesaurus handles information about a word. I shall use an example, class, to see what happens when it is viewed in a thesaurus and what would happen if it could be examined through the eyes of a lexisaurus.
Let us first view the WORD thesaurus view of class. This instrument opens three windows. The first, called looked up, repeats the search word. The second window, called meanings, opens a list of six terms representing different senses of class. They are:
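The lookup a lexisaurus would perform can be sketched as a small data structure: for a given word, a set of dictionary links, and for each of its senses, links for every synonym. Everything named here is an assumption made for illustration: the two dictionary URL templates use placeholder example.org addresses rather than any real dictionary's query format, and the sense labels and thesaurus data are invented stand-ins.

```python
from urllib.parse import quote

# Hypothetical URL templates; real dictionaries each have their own query
# formats, which a working metacrawler would need to know.
DICTIONARY_TEMPLATES = {
    "Dictionary A": "https://dictionary-a.example.org/define?word={word}",
    "Dictionary B": "https://dictionary-b.example.org/entry/{word}",
}

# Hypothetical thesaurus data: word -> sense label -> synonyms.
THESAURUS = {
    "class": {
        "category": ["type", "kind", "sort"],
        "social rank": ["caste", "estate"],
    },
}

def dictionary_links(word, templates=DICTIONARY_TEMPLATES):
    """Map each dictionary name to a lookup URL for the word."""
    return {name: tpl.format(word=quote(word)) for name, tpl in templates.items()}

def lexisaurus_entry(word, thesaurus=THESAURUS):
    """Collect links for the word and for the synonyms of each of its senses."""
    senses = thesaurus.get(word, {})
    return {
        "word": word,
        "links": dictionary_links(word),
        "senses": {
            sense: {syn: dictionary_links(syn) for syn in synonyms}
            for sense, synonyms in senses.items()
        },
    }

entry = lexisaurus_entry("class")
print(entry["links"]["Dictionary B"])
print(sorted(entry["senses"]["category"]))
```

Because the user clicks on one word at a time, such a structure would only need to be built on demand for the word currently selected.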
A third window opens for synonyms of each of these terms. If one clicks on any of the synonyms, this word will shift to the looked up window, and the process is repeated, indefinitely. No definitions are provided but the system provides many words whose meanings overlap each other, and one can view the results simultaneously in three windows, a design that helps one visualize conceptual linkages. Essentially, the thesaurus focuses on words and provides some information about how they are defined and what other words are closely related to them.
The WordPerfect thesaurus is essentially similar in design but somewhat more complex. It has four windows permitting the simultaneous viewing of three hierarchic levels. Moreover, it usually offers phrases rather than single words to characterize each sense of the entry word. Again, clicking on each of the senses opens a list of synonyms and clicking on any one of them puts it into the first window where the process continues. The entry for class lists six senses of the word, 4 nouns and 2 verbs.
If we click sense #1, we will get the following synonyms:
If we click again, say on the first of these words, order, we will open 15 senses of this word with brief phrases to characterize each. The first of them is, a class defined by attributes possessed by all its members. Opening this word, we get 18 words, starting with cut, kind, type, nature...
Again, opening cut we will find quite a few senses, each briefly defined by a phrase. The process continues indefinitely and a host of linked concepts and terms can be identified by this tool. Comparing the two instruments, the WORD tool is simpler and the WORDPERFECT instrument provides more information.
Dictionaries vs. Thesauruses. Readers may well think that this is a wonderful resource and it could scarcely be improved upon. However, browsing almost any dictionary entry will reveal much more information about the meanings of words, and their relationships to linked words.
All electronic lexicons (general dictionaries) provide defining texts for the different senses of their entry words, and they often list synonyms as well, with different degrees of linkage information. They may also give examples to illustrate usage, and provide encyclopedic information about the history and development of concepts.
The Encarta Dictionary. To illustrate, let us first consider the text to be found in the Encarta Electronic Dictionary - http://dictionary.msn.com - which is published by Microsoft, owner of WORD. Consider first its definition of class: it offers 14 senses of the word - more than twice what we found in the thesaurus - and each has a full definition. Consider, for example, #6 which reads:
6. structure of society: the structure of divisions in a society determined by the social or economic grouping of its members
Note that the boldface phrase resembles the sense-defining terms in the WordPerfect thesaurus, but the text that follows provides more complete defining characteristics. Some of the definitions are preceded by field names like Biology, Ethnology, and Mathematics to specify the domains within which these concepts of class are used. At the end of the entry is a hyperlink to a set of synonyms entered under type: The summary reads: type, kind, sort, category, class, species, genre: CORE MEANING: a group having a common quality or qualities.
Of course, this synonymy pertains to only one of the senses of class, #11: group of similar items: a group of things with at least one common characteristic. The bold face synonyms are all hyperlinked to the entries where these words are defined. The synonymy includes the following text:
type: a group of people or things with strongly marked and readily defined similarities; kind: a general word for a group of people or things loosely defined according to their similarities; sort: a general word used in the same way as kind; category: a deliberately defined group, usually used to help sort or classify a larger group; class: used in the same way as category; species: used in the classification of living things to describe a specific group of animals, plants, insects, or other organisms; genre: a formal word for a particular type of painting, writing, dance, or other art form.
Clearly, a lexisaurus that hyperlinked users to the Encarta Dictionary entry for class would provide much more information about the word's meanings and related concepts than could be found in either of the two thesaurus tools mentioned above.
Moreover, the Encarta Electronic Encyclopedia is linked to the dictionary and provides many further leads for anyone using class. The starting point for an entry in the EEE is a list of sites, including electronic texts and articles in the Encyclopedia in which the word class appears as part of the entry. The Encarta materials are all part of a gigantic MSN complex that includes NBC news, and many advertised products. One may well speculate that this elaborate apparatus might well be interested in supporting and adding a lexisaurus repertoire of tools.
Other Dictionaries. However, there are many other electronic dictionaries and glossaries that are not part of the MSN complex. They could well be searched by a metacrawler that would be able to find entries for any word, like class, in all (or at least most) of them. A comprehensive list of sites for dictionaries and glossaries, in alphabetical order, can be found at: http://www.opengroup.com/books/index/bkbra800.shtml.
A selective list of these works in English can be found at: http://www.yourdictionary.com/languages/germanic.html#english. This list is sponsored by yourDictionary.com. According to their Web Site, yourDictionary.com provides the most comprehensive and authoritative portal for language and language-related products and services on the web with more than 1800 dictionaries with more than 250 languages. More than 1,500,000 people a month visit YDC. The organization is guided by linguists. Its statement of goals reads:
The intention of yourDictionary.com is to become the most authoritative and comprehensive Web portal specializing in information about language -- any language in the world. Come here to look up a general or specialized word in English or foreign-language dictionaries. We will do our best to provide a home, or link to all the creditable dictionaries, glossaries and word lists available on the Web. But yourDictionary.com will do more than that: it will provide ways of building vocabularies, studying grammar, practicing spoken and written languages. It will provide scientific information about language as well as various forms of language play designed to build language skills. If you don't find the linguistic information you want here, let us know, for yourDictionary.com is also a language-interest community designed to find and share information about speaking, reading, writing and comprehending your own language and all the other languages in the world.
This information is quoted from: http://www.yourdictionary.com/about.html
Since the proposed lexisaurus will significantly increase the number of dictionary users, it will surely contribute to the primary goals of this organization. I believe that they will want to embrace this new instrument and support its use in conjunction with their very extensive list of electronic dictionaries in many different languages. To get a feel for how this will work, let us take a look at the definitions of class found in some of the English-language electronic lexicons listed by yourDictionary.com.
We may start with The Merriam-Webster Collegiate http://www.m-w.com/netdict.htm which identifies only five senses of class when the word is used as a noun, but within each of these categories several sub-classes are defined - this work therefore recognizes more concepts of class than does the thesaurus, and it also provides more complete defining properties for each.
The American Heritage Dictionary, 4th edition, 2000, at: http://www.xrefer.com/ offers an entry for class that identifies 8 senses of the word, and lists 10 hits (out of 585) in which the word, class, appears in a definition. For each of these 10, there is a hyperlink to the text in the dictionary where the word appears.
The Oxford English Dictionary, with a site at: http://www.oed.com is only available by subscription, but it is apparent that subscribers would get a wealth of information about any word entered in this major dictionary.
It seems unnecessary to add further citations from different electronic dictionaries, but a good many can be found - they would not significantly change the impressions created above. To check on this impression, one may use the list of electronic dictionaries that are hyperlinked on Xrefer, the search engine for Oxford University Press dictionaries. It searches all of them for their entries on a selected word. To see how this works, go to: http://www.xrefer.com If one types class in the window, one will get 15 hits from a list of more than 200. Each entry contains a definition and a link to the fuller text in the dictionary from which it was taken. Since the Bloomsbury Thesaurus is included, one can also get very substantial sets of synonyms. Some of these definitions are encyclopedic in scope, and many of the sources are specialized glossaries.
Not surprisingly, the Oxford site is paralleled by The Cambridge Dictionaries Online at: http://dictionary.cambridge.org/cmd_search.asp Its entry for class gives readers hyperlinks to a half dozen entries for different senses of the word, plus a lengthy list of phrases in which the word appears, each supporting a hyperlink to the relevant entry - in many different dictionaries, again including specialized glossaries.
Innovative Lexicons. All these dictionaries generally follow well-established lexicographic traditions. However, some new types of dictionaries have been created in response to the opportunities offered by the Internet and they deserve our attention here. A good example is The WordNet English Dictionary and Thesaurus, published by Princeton Cognitive Science Laboratory. It can be found at: http://www.cogsci.princeton.edu/~wn. As they describe the project, WordNet® is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets. These relations include not only synonyms, but also hypernyms, hyponyms, holonyms, and coordinate terms, both in short and longer sets. Users are given choices to open the kinds of sets they want to view.
To illustrate how this system works, here is the set of hyponyms (more specific concepts) generated by WordNet for class in its third sense, defined as social class, socio-economic class -- (people having the same social or economic status). In the following list, each hyponym follows an arrow, beginning with its terms and followed, in parentheses, by a definition and, in some cases, examples.
Class (Sense 3) class, social class, socio-economic class -- (people having the same social or economic status; "the working class"; "an emerging professional class")
=> world, domain -- (people in general; especially a distinctive group of people with some shared interest; "the Western world")
=> age class -- (people in the same age range)
=> agriculture -- (the class of people engaged in growing food)
=> brotherhood, fraternity, sodality -- (people engaged in a particular occupation; "the medical fraternity")
=> estate, estate of the realm -- (a major social class or order of persons regarded collectively as part of the body politic of the country and formerly possessing distinct political rights)
=> labor, labour, working class, proletariat -- (a social class comprising those who do manual labor or work for wages; "there is a shortage of skilled labor in this field")
=> lower class -- (the social class lowest in the social hierarchy)
=> middle class, bourgeoisie -- (the social class between the lower and upper classes)
=> commonalty, commonality, commons -- (class composed of persons lacking noble or knightly or gentle rank)
=> peasantry -- (the class of peasants)
=> demimonde -- (a class of woman not considered respectable because of indiscreet or promiscuous behavior)
=> underworld -- (the criminal class)
=> yeomanry -- (class of small freeholders who cultivated their own land)
=> caste -- (a social class separated from others by distinctions of hereditary rank or profession or wealth)
=> caste -- ((Hindu) a hereditary social class stratified according to ritual purity)
=> upper class, upper crust -- (the class occupying the highest position in the social hierarchy)
=> firing line -- (the most advanced and responsible group in an activity; "the firing line is where the action is")
=> immigrant class -- (recent immigrants who are lumped together as a class by their low socioeconomic status in spite of different cultural backgrounds)
=> center -- (politically moderate persons; centrists)
=> old school -- (a class of people favoring traditional ideas)
=> market -- (the customers for a particular product or service; "before they publish any book they try to determine the size of the market for it")
=> craft, trade -- (people who perform a particular kind of skilled work; "he represented the craft of brewers"; "as they say in the trade")
=> womanhood, woman -- (women as a class; "it's an insult to American womanhood"; "woman is the glory of creation")
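The structure behind a listing of this kind can be modelled, very roughly, as a tree of synonym sets in which each set points to its more specific concepts. The sketch below is only an illustration of that idea: the Synset class and the two helper functions are hypothetical names, not WordNet's own software, and the data is a small hand-copied subset of the listing above.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """One concept: its synonymous terms, a gloss, and more specific concepts."""
    terms: list
    gloss: str
    hyponyms: list = field(default_factory=list)

def render(synset, depth=0):
    """Return the lines of an indented listing, one '=>' per hyponym."""
    prefix = "    " * depth + ("=> " if depth else "")
    lines = [f"{prefix}{', '.join(synset.terms)} -- ({synset.gloss})"]
    for child in synset.hyponyms:
        lines.extend(render(child, depth + 1))
    return lines

def find_term(synset, term):
    """Return the first synset in the tree whose terms include the given term."""
    if term in synset.terms:
        return synset
    for child in synset.hyponyms:
        hit = find_term(child, term)
        if hit is not None:
            return hit
    return None

# A small subset copied by hand from the listing above.
social_class = Synset(
    ["class", "social class", "socio-economic class"],
    "people having the same social or economic status",
    [
        Synset(["middle class", "bourgeoisie"],
               "the social class between the lower and upper classes"),
        Synset(["peasantry"], "the class of peasants"),
        Synset(["underworld"], "the criminal class"),
    ],
)

print("\n".join(render(social_class)))
print(find_term(social_class, "bourgeoisie").gloss)
```

Because every synonym in a set leads to the same node, a writer who types bourgeoisie and one who types middle class would be shown the same definition.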
Another innovative Web-based product is the Wordsmyth Educational Dictionary-Thesaurus, at: http://www.wordsmyth.net. Its entry for class defines 7 senses of the word, and then lists a set of synonyms, each posted as a hyperlink to its own entry. In addition, one will find in this work a list of about 15 phrases in which class appears - e.g., class action, c. consciousness, c. struggle, first c., 4th c., working c., upper c., etc. Each of these phrases is defined and also hyperlinked to an entry that provides one or more sense definitions and further synonyms for each sense of the phrase. The result is a wealth of data that does not duplicate the material found in the works mentioned above.
Computational Lexicography. In addition to work based on established electronic dictionaries, the development of a lexisaurus will require cooperation with lexicographers and terminologists, especially those interested in the utilization of computing technology and the Internet to support their work. In this context, an important resource is provided by the DICT Development Group. Their Web site offers a search engine for some dictionaries at: http://www.dict.org/bin/Dict
A search of this site for class opens definitions from seven sources. These include four lexicons and three computing glossaries. The lexicons include three citations to Webster and one to WordNet. As for the glossaries, the first is the FOLDOC Free Online Dictionary of Computing, http://foldoc.doc.ic.ac.uk/foldoc/index.html. It provides term entries for several specialized meanings of class as used in computing. Interestingly, the other two citations are from VERA, Virtual Entity of Relevant Acronyms, at http://userpage.fu-berlin.de/~oheiabbd/veramain-e.cgi Here, it seems, CLASS is an acronym standing for either Centralized Local Area Selective Signaling or Custom Local Area Signaling Service.
The proliferation of acronyms suggests a need to formalize the distinction between them and words. Perhaps capitalizat