Discovering Better Database Alternatives

Savvy searchers realize when a once-appropriate database should be abandoned, even if the majority of its users still tout it as THE information source for a given discipline. When it was proposed that the PubSCIENCE database of the Office of Scientific and Technical Information (OSTI) of the Department of Energy (DOE) be discontinued, I publicly welcomed the news, to the horror of many of my fellow information professionals.

I was not welcoming the potential loss of appropriations for a free information service about energy-related scholarly articles. I was supporting the proposal as a wake-up call to information professionals: by clinging to the PubSCIENCE database, they are depriving themselves and their patrons of far better sources. I explained in the October 2002 issue of Information Today why information professionals should shed no tears over the likely demise of that database, and why they should steer away from it even if it survives thanks to the intensive but misguided lobbying to save it. Here I focus on the comparable alternative sources that run circles around what PubSCIENCE could offer.

Energy Citations Database (http://www.osti.gov/energycitations)

This database is also created from the Energy Science & Technology Database of OSTI, just like the bulk of PubSCIENCE, but it is a much larger subset than PubSCIENCE was. True, it doesn’t have the most touted (and for me the most disappointing) feature of PubSCIENCE, the links to publishers’ archives for the free abstracts and, for qualifying subscribers, for the documents themselves, but it has twice as many unique records. Most of them have excellent metadata, which makes searching for relevant articles (and conference papers and technical reports) much more rewarding. Most of the records in the Energy Citations Database (ECD) have subject codes, descriptors, broader subject terms, abstracts, and a variety of other value-added information, such as document type, language code, and the name or code of the sponsoring and research organizations, which are lump indexed in PubSCIENCE.

ECD has some duplicates, but nothing comparable to the volume of duplicates and triplicates that was dumped into PubSCIENCE. That padding certainly made PubSCIENCE look larger, though not to the tune of the 1.8 million records claimed lately in its promotional materials. As of September 1, 2002, it had exactly 1,327,447 records, as opposed to more than 2 million records in ECD. ECD also has a far better interface, a more capable search engine, and a better output module than PubSCIENCE, which did not allow sorting the results by date or title, only by the “relevance” of the items, a ranking that did not make any sense to me. How does it qualify as relevance ranking when exact duplicates and triplicates often appear scattered across the result list, when a good relevance ranking algorithm would have queued them up next to each other like passengers at a London bus stop?

Figure 1. Triplicates and duplicates in PubSCIENCE
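
The point about duplicates queuing up next to each other can be illustrated with a minimal Python sketch. It is not PubSCIENCE's or ECD's actual ranking algorithm. The record fields (`title`, `year`, `score`), the helper names, and the sample records are all hypothetical, and the sketch assumes that two records are copies of each other when their normalized title and year match.

```python
import re
from collections import defaultdict

def dedup_key(record):
    """Normalize title and year so trivially different copies compare equal."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record["year"])

def rank_with_adjacent_duplicates(records):
    """Order records by descending score while keeping copies side by side."""
    groups = defaultdict(list)
    for record in records:
        groups[dedup_key(record)].append(record)
    # Rank each group by its best-scoring member; break ties by key.
    ordered = sorted(
        groups.items(),
        key=lambda kv: (-max(r["score"] for r in kv[1]), kv[0]),
    )
    return [record for _, members in ordered for record in members]

def count_unique(records):
    """Number of distinct records once copies are collapsed."""
    return len({dedup_key(r) for r in records})

# Made-up sample data: three copies of one record and one other record.
results = [
    {"title": "Sample Article A", "year": 1995, "score": 0.9},
    {"title": "Sample Article B", "year": 1993, "score": 0.7},
    {"title": "Sample article A.", "year": 1995, "score": 0.4},
    {"title": "SAMPLE ARTICLE A", "year": 1995, "score": 0.2},
]
ranked = rank_with_adjacent_duplicates(results)
```

In this sketch the three copies of the first record end up adjacent at the top of `ranked` instead of being scattered by their individual scores, and `count_unique(results)` reports how many distinct records the list really holds.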

Even better is a larger version of the Energy Citations Database (about 3.5 million records) at https://www.osti.gov/doeecd, which is meant for DOE staff and contractors but is available to the public even without registering for a user ID and password. The software features and the superior record content are the same in the two versions; only the number of records differs. It is a pleasure that you don’t have to run the query twice, as in PubSCIENCE, to cover both the pre-1990 and the post-1989 subsets of the database.

There are two deficiencies in the software. One is that you cannot unambiguously limit your search to a specific journal. The other is that, to display the next record, you have to return to the result list and click on it instead of just clicking a Next button while a full record is displayed. Those who make the move from PubSCIENCE must bear in mind that a space between words means a phrase there, whereas in ECD a space means an AND relationship.
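
The difference between the two readings of a space can be sketched in a few lines of Python. This is only an illustration of phrase versus AND semantics under a naive whitespace tokenizer, not the actual query parser of either system. The function names and the sample titles are made up.

```python
def tokenize(text):
    """Naive tokenizer (an assumption): lowercase and split on whitespace."""
    return text.lower().split()

def matches_phrase(query, text):
    """True if the query words occur consecutively and in order."""
    q, t = tokenize(query), tokenize(text)
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def matches_and(query, text):
    """True if every query word occurs somewhere in the text."""
    return set(tokenize(query)) <= set(tokenize(text))

# Made-up titles for illustration only.
titles = [
    "geothermal energy in hawaii",
    "energy policy and geothermal resources",
]
phrase_hits = [t for t in titles if matches_phrase("geothermal energy", t)]
and_hits = [t for t in titles if matches_and("geothermal energy", t)]
```

Here the phrase reading matches only the first title, while the AND reading matches both, so the same two-word query typically returns a larger and less precise result set under AND semantics.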

Figure 2. Better content and layout in DOE-ECD

ASCE database (http://www.pubs.asce.org/cedbsrch.html)

The American Society of Civil Engineers was my pick a few years ago, when it made a decision unprecedented for a commercially successful database: it withdrew its database (CEDB), which had cost about $80 per hour plus $1.50 per record, from the professional online services and offered the whole database free to the public, not only to ASCE members.

It is a top-notch abstracting/indexing database with nearly 100,000 records. Yes, it is small compared with the other databases discussed here, but it can bring up journal articles that you would not find in PubSCIENCE, or even in the ECD databases, like the four records about geothermal energy in Hawaii that CEDB delivered. About 4,500 of its records cover articles and conference papers related to some energy science and technology topic.

It is very useful that you can consult the ASCE thesaurus online to find the preferred terms and their narrower and broader terms when formulating a search. Beyond the inconvenience of flip-flopping between the result list and the individual records, the lack of exact phrase searching may also limit the precision of your search: “business information” and “information business” would bring up the same records. There is no option to limit the search to a specific journal.

Figure 3. A small database but valuable records

Scirus (http://www.scirus.com)

I panned this database at its debut for its baseless tag line (“For Scientific Information Only”) when it had tens of thousands of results pointing you to sites (mostly put up by students) with obscenities and vulgarities. You don’t need to be a church lady to be annoyed by such junk content, which is still in Scirus, but you can bypass it.

By the summer of 2002, Elsevier had added the full abstracts and bibliographic citations of nearly 2 million articles published by Elsevier, Academic Press, and Beilstein from 1973 onward. The content is very rich and scientific (if you limit your search to journal articles). The software is by far the most powerful in this group; the only feature I missed was an alert about misspelled words, or the automatic replacement of misspelled words in the query that Google does so magnificently.

Of course, there are links to the source documents from this database, and if you have a qualifying subscription to the journal(s) you get access to the documents in HTML and PDF formats. This is a big deal, as Elsevier dominates the Energy and Fuel section of the Journal Citation Reports of ISI, with 20 of the 66 journals monitored by ISI in this category alone. Then there are the interdisciplinary Elsevier journals, not to mention the economics journals, which offer information about economic issues that are hardly covered by the DOE databases. For example, PubSCIENCE has nine records with the term “fuel taxes” anywhere in the abstracting records within the section covering the period from 1990 onward. There is a triplicate and a duplicate, so the number of unique records is 6, plus 8 records (7 of them unique) in the pre-1990 section of the database. For the same query, Scirus brings up 214 records for journal articles from its stable of serials.

Figure 4. Sweeping list of results
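
The arithmetic behind the “fuel taxes” comparison can be checked with a short Python sketch. The helper `unique_count` is hypothetical. It assumes that each group of identical records counts as one unique record, and that the pre-1990 section contains a single duplicated pair, which is one reading of “8 records (7 of them unique)”.

```python
def unique_count(total_records, duplicate_group_sizes):
    """Subtract the surplus copies in each duplicate group from the raw total."""
    return total_records - sum(size - 1 for size in duplicate_group_sizes)

# Section from 1990 onward: nine hits, with one triplicate and one duplicate.
post_1989 = unique_count(9, [3, 2])

# Pre-1990 section: eight hits; a single duplicated pair is an assumption.
pre_1990 = unique_count(8, [2])

pubscience_unique = post_1989 + pre_1990
scirus_hits = 214  # count reported in the text, not checked for duplicates

print(post_1989, pre_1990, pubscience_unique, scirus_hits)  # 6 7 13 214
```

Under these assumptions PubSCIENCE yields 13 unique records across both sections, against the 214 reported for Scirus.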

There are many free resources that offer not only abstracts but often the entire document free of charge. Most of these specialize in grey literature, such as government reports. Here I only wanted to illustrate why you should not get stuck in the same old, same old routine of using an abstracting/indexing source that is mediocre in the company of other free alternatives, which will spare you a lot of energy.

Jacsó, P. Should PubSCIENCE Go the Way of Caesar? Information Today 19(9), October 2002, pp. 32-33.