The classic database richly enhanced by the inclusion of nearly 10 million cited references in 212,000+ records and of tens of thousands of links to publisher and journal sites could have been a contender for cheers had it not been  marred, maimed and mutilated by

a) sloppy and hasty implementation,

b) a long delay in developing an appropriate distribution format, which vitiates several hundred thousand cited references,

c) the inclusion of an incredible volume of syntactically erroneous links, which should have been recognized even by absolute beginners,

d) ever-changing record structures and field tagging, which distressed many PsycINFO hosts, some of which in turn further debilitated the database by adding their own errors to the chaos.

In spite of all these problems (most of which can be fixed by switching from the MARC format to the XML format, applying global changes to syntactically wrong URLs, testing the created hyperlinks by program, and consolidating the record structure), PsycINFO has great potential, but as of the end of December the above flaws turned cheers into jeers.

THE CONTEXT

In fairness, there are many assets in PsycINFO, which I praised earlier. Suffice it to say here that PsycINFO has about 2 million records (75% of which are journal articles), offers a very good mix of source documents, and covers about 1,900 journals (including all the most prestigious ones in psychology, psychiatry, mental health and, to a lesser extent, in behavioral sciences).

It also includes records for  over 75,000 books, about 133,000 book chapters, 232,000 dissertations, and 48,000 conference papers,  which are particularly important as these are not covered by the traditional citation databases of ISI. Neither are they covered by e-psyche (with the exception of dissertations) but e-psyche should be no one’s benchmark. Nearly 90% of the PsycINFO records are in English and 90% of  the records have abstracts.

On the negative side, 43,000 records have no descriptors (although only records from before the late 1970s), and the publisher fields show massive inconsistencies in punctuation and abbreviations, as well as typos, in addition to a great number of patently incorrect links (to be discussed later). Theoretically, the database goes back to 1872, but don’t take this at face value.

For 1872 there are four records for Darwin’s classic book. Three of them (created in 2002) carry practically the same information and are much inferior to the one created in 1998, which has a decent abstract and table of contents for the reprint edition (go figure the logic of APA). The 1885 entries also take you to three almost identical records. These triplicates and quadruplicates are created by APA, and are different from the duplicates generously but mistakenly contributed by Ovid, as discussed at the end of the review. As for the following two decades, there are only 55 records, so the time-span claim is as cheap as most of the ones made by PsycINFO’s competitor, e-psyche. PsycINFO should not engage in such tactics.

Neither should CSA-IDS do so by claiming that its version of PsycINFO goes back to 1840. Technically and legally it may be true because indeed,  there is one record for a book published in 1840, and another one with an imprint of 1867 – then nothing in between 1840 and 1872. That’s not a good enough reason to label the database with a time span of 1840-current. 

THE ENHANCED CONTENT

The biggest improvement in PsycINFO is the inclusion of cited references. This very important enhancement has  two dimensions: a) for which sources  are cited references included (the width of enhancement), and  b) which of the cited references are included for the source documents (the depth of enhancement).

APA provides a credible, quarterly updated running tab on its site about the number of records and the number of cited references added to the database on  a year by year  basis. It clearly states that cited references have been added comprehensively since 2001, and selectively since 1988. Selectivity in PsycINFO means that only some records have been enhanced with citations, not that capriciously selected citations have been added for some records, some of the time (as is the case in e-psyche).

Width of enhancement

While most of the claims of PsycINFO can be taken at face value, the one about the  comprehensiveness from 2001 onward at the record level is not entirely accurate. Out of the 196,798 records with publication year 2001 or higher, 158,154 records (80%) have cited references.

For comparison, there are cited references  in about  85% of the nearly 900,000 records in the psychology, psychiatry and behavioral subset of the ISI Social Science database (Social SciSearch 1972:2003 on Dialog). The figure decreases to 80% for the past two decades because (laudably) ISI started to include professional journals where cited references are far fewer than in academic journals.   

In itself it is of no concern that only 80% of the records have cited references in the 2001-2003 subset of PsycINFO. There are items which seldom cite references (such as editorials), and others which never do and for which no records should have been created in PsycINFO in the first place: announcements related to journals, manuscript submission guidelines, calls for papers, calls for nominations, news about upcoming conferences and various events, and other ephemeral items.

There are also regular articles which do not have cited references, although it is not typical in the refereed journals which make up more than 95% of the journal article sources. Actually, out of the 159,427 journal article records, 142,061 (89%) have cited references.

What raises the percentage of records with no cited references to 20% in PsycINFO for the period from 2001 onward is the set of 13,733 dissertation records for that period, which never include cited references in PsycINFO (even though all of the dissertations cite publications profusely). This fact about dissertations is not made obvious to the users in the database help files. On the positive side, from 2001 onward, 80% of the conference proceedings at the analytical (paper) level are enhanced with cited references (which is not made obvious, either).
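The shares quoted in this section can be re-derived from the raw counts. The short sketch below uses only the figures given in the text; the variable names and the rounding to one decimal place are my own choices, and the last figure (how much of the gap the dissertations account for) is simple arithmetic on those counts, not a number taken from APA.

```python
# Counts quoted in the text for records with publication year 2001 or later.
total_records = 196_798
with_refs = 158_154
journal_records = 159_427
journal_with_refs = 142_061
dissertations = 13_733  # these never carry cited references in PsycINFO


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Records from 2001 onward that have no cited references at all.
without_refs = total_records - with_refs
journal_without_refs = journal_records - journal_with_refs

share_with_refs = pct(with_refs, total_records)
journal_share_with_refs = pct(journal_with_refs, journal_records)
# Portion of the reference-less records that are dissertations.
dissertation_share_of_gap = pct(dissertations, without_refs)

print(share_with_refs, journal_share_with_refs, dissertation_share_of_gap)
```

The same subtraction gives the number of journal article records without cited references, which is the pool in which the genuinely problematic omissions have to be looked for.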

Conference papers published in journals are also covered well. These documents are poorly covered, if at all, in many other social science databases, and even in specialized databases, like Conference Papers Index, in which the records have very modest traditional content.

So where is the problem with comprehensiveness? Even a casual scanning of the list of items without cited references (excluding the ephemeral materials and dissertations) shows journal articles which are very likely to cite references even if their PsycINFO records show none.

The omissions are obvious for scholarly books and scholarly articles. All the articles published in 2003 which appear in the screenshot below include cited references, ranging from 23 to 59, but their PsycINFO records show none. In light of the above considerations, records that lack cited references without good reason are likely to make up about 2% of the records from 2001 onward. Priority should be given to enhancing these records.

Depth of enhancement

The enhancement of records by cited references has another dimension, too: the depth of enhancement, i.e. to what extent the references in a given source document are added. PsycINFO is good at it (although with some glaring omissions), but for reasons to be discussed below this does not come through in any of its online versions except OCLC’s. You appreciate PsycINFO’s depth of enhancement when you compare it with e-psyche’s pathetic depth. Take as a very typical example the record for the same article from e-psyche and PsycINFO. The former includes 10 cited references.

PsycINFO has all the 109 references  cited by the article. This jaw-dropping difference is not justified by the excuse of e-psyche that it omits references to all books, as well as all references  to any cited documents published before 1970.

You can see in the screenshot above that none of the last 10 citations belonged to the exclusion category of e-psyche, still, not a single cited reference of them is included in e-psyche. The same is true for the first 10 cited references in PsycINFO as shown below. I have no explanation other than irresponsibility and desperation  for e-psyche to exclude more than 92% of the 109 cited references, when 107 of them meet even e-psyche’s  questionable inclusion policy requirements.

It is also indicative of the depth of inclusion that in PsycINFO the average number of cited references is about 44, in Social SciSearch 33, and in e-psyche 18, except for the first set of records added at its launch. One reason for PsycINFO’s high average is that it covers scholarly books, in which the average number of cited references is much higher than in journals.

Of course, there is a very wide range in the number of items cited. Of the citation-enhanced PsycINFO records, about 10% have between 1 and 9 citations, and 14% between 10 and 19. On the other hand, there are records which have several thousand citations. The record with the largest number of citations is the one for the book Fundamentals of Abnormal Psychology (3rd edition), with 4,751 cited references, and there are many with well over 3,000 cited references, such as the book on Abnormal Psychology shown here from the OCLC implementation. However, the hasty implementation by APA prevents the display of hundreds of thousands of cited references (at least temporarily) in all but OCLC's implementation, and that’s a big problem.

 

Restrictions for displayable citations

The number of records with more than 250 cited references represents a relatively small portion of the database, but several hundred thousand cited references become invisible for all users (except for OCLC users). Why? Because APA has been distributing the records to online services in MARC Communications format (also known as ISO 2709 format), and released the XML format only 18 months after the launch of enhanced PsycINFO.

Mind you, the MARC Communications format was a brilliant brainchild of Henriette Avram in the 1960s. It was the key component for shared cataloging and other library automation functions (like creating item files for circulation systems), in which the Library of Congress was a good decade ahead of Western European national libraries. The MARC Communications format has also been the vehicle for distributing abstracting/indexing databases to online information services.

It has been enhanced continuously, but there is one area where it cannot be enhanced, and APA should have realized this before it launched its enhancement project. The MARC format limits the length of the record to 99,999 characters simply because the record length must be defined in a 5-digit segment of the fixed-length MARC Leader field. A record may reach the record limit at the 230th citation or even earlier.

 

In my tests PsycINFO records reached the record limit with 200 or 400 or 600 citations, depending (among other things) on the types of cited references (multiple authorship with a large number of authors), the type of documents (conference proceedings can have very long names, dates, locations), and even such bibliographic data elements as excessively long titles (common in scientific papers). Here a record conks out at the 228th citation precisely because of the many long citations.

On the other hand, all the 683 citations in a book could be accommodated within the MARC record size limit because the majority of its citations are  short legal references. I know that short legal anything is an oxymoron, but there are some exceptions.
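Why the cut-off point varies so much can be seen with a rough model of the record layout. The sketch below assumes that each cited reference travels as its own variable field (the text does not say how APA encodes them) and uses the standard ISO 2709 overheads: a 24-character leader, a 12-character directory entry per field, and one-character field and record terminators. The function name, the `other_fields_bytes` parameter and the sample citation lengths are hypothetical.

```python
MAX_RECORD_LENGTH = 99_999   # largest value a 5-digit length segment can hold
LEADER_LENGTH = 24           # fixed-length leader at the start of the record
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + start position (5)
TERMINATOR_LENGTH = 1        # field, directory and record terminators


def citations_that_fit(citation_lengths, other_fields_bytes=0):
    """Count how many citations fit, in order, before the record is full.

    citation_lengths: length of each citation field's data, in characters.
    other_fields_bytes: assumed space already used by the non-citation
    fields, including their directory entries and terminators.
    """
    used = (LEADER_LENGTH
            + TERMINATOR_LENGTH      # end of directory
            + TERMINATOR_LENGTH      # end of record
            + other_fields_bytes)
    count = 0
    for length in citation_lengths:
        cost = DIRECTORY_ENTRY_LENGTH + length + TERMINATOR_LENGTH
        if used + cost > MAX_RECORD_LENGTH:
            break  # this citation and all later ones are lost
        used += cost
        count += 1
    return count


# Long citations (assumed 400 characters each) exhaust the record early,
# while short ones (assumed 120 characters each) leave room for all 683.
long_case = citations_that_fit([400] * 1000)
short_case = citations_that_fit([120] * 683)
```

Under these assumed lengths the long citations run out of room well before the 250th entry, while all 683 short ones fit, which is the pattern described above.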

Although APA warns about the limitation on its Web site, very few, if any, end users of PsycINFO would know about it. Users would be befuddled as to why dozens, hundreds, or thousands of citations are missing from the records of many articles and books which do refer to published materials in the Roman alphabet.

At least CSA-IDS clearly warns users how many cited references are present in the source documents and how many could be displayed (due to the record size limitations). These two values are in the MARC record as two subfields. Still, they are not used by all online services, or are used incorrectly or inconsistently.

In Dialog only the first subfield is displayed, leaving the users in the dark until they realize that the alphabetic citations stop a tad prematurely at Bremner.

Ovid displays both  values correctly in some records, but not in all of them. For example, for this article by Pinquart, the Ovid record claims that there are 355 citations present and 355 displayed, but as you scroll down in the record you realize that the list of cited references ends at No. 69.

Again, CSA-IDS tells it as it is, and OCLC has the right numbers, but it mixed up the labeling. In this case there are 69 cited references not because of the record limit, but because of citations excluded for no reason. This is one of those cases when PsycINFO  inexplicably omits a great number of cited references – as e-psyche routinely  does.

 

Depressing  Links

It is also a sign of hasty implementation that PsycINFO did not heed the advice of the APA Publication Manual about the importance of minding the syntax of URLs, let alone the advice about checking their validity.

APA might as well have added that the credibility of your database will suffer if it has tens of thousands of not merely wrong, but syntactically wrong URLs. It may not have been easy to type the forward double slashes in the URL as backward double slashes (my word processor automatically corrects them, for example), but PsycINFO managed to do it and did it relentlessly in 2002 and 2003. To its credit, not all the URLs were messed up this way all the time, just 11,852 in the publisher name field, counting only those cases where the URL has wrong syntax.

These links are cold, very cold in all implementations except in CSA-IDS (which corrected them). OCLC must have been so frustrated by the volume of incorrect URLs that it decided not to display the publisher field, although you may search and browse that field.

APA has not discriminated against certain publishers. You can find this URL problem in many records for articles published by a variety of publishers throughout 2002 and 2003 as shown in my collage.

In a truly democratic way, APA used the syntactically incorrect URL for its own Web address 746 times.
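A global change of this kind takes only a few lines. The sketch below is one possible way to do it, assuming the faulty URLs look like `http:\\www.apa.org` (a colon followed by two backslashes) and that the URL sits at the end of the publisher field; the sample field content and the function name are illustrative, not taken from an actual PsycINFO record.

```python
import re

# "http:" or "https:" followed by two literal backslashes.
BACKSLASHED_PREFIX = re.compile(r"\b(https?):\\\\", re.IGNORECASE)


def repair_backslashes(publisher_field):
    """Replace backslashed URL prefixes and report how many were changed."""
    fixed, changes = BACKSLASHED_PREFIX.subn(
        lambda match: match.group(1) + "://", publisher_field
    )
    return fixed, changes


# Hypothetical publisher field with a backslashed URL.
sample = r"American Psychological Assn, US, http:\\www.apa.org"
fixed_sample, changed = repair_backslashes(sample)
print(fixed_sample, changed)
```

Returning the number of substitutions makes it easy to tally how many fields a batch run actually touched.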

To break the monotony of the backslashed URLs, there are syntactically wrong URLs in great numbers with other flavors. PsycINFO graced the Haworth Press with both a secure and a non-secure prefix; however, the pair does not act like belt and suspenders. Then, as if to compensate for the excess, you will also find URLs without the protocol prefix, the slashes and the colon (which is a better alternative, as many browser versions can handle such minimalist URLs).

However, Ovid’s programmers in charge of PsycINFO may have looked only at such prefix-less records in studying the record structure  and decided to add the http:// prefix to many URLs, making the bad URL worse.

PsycINFO indexers sometimes may have had second thoughts about the publisher URLs, as they stopped mid-stream in typing the URL, leaving it as a puzzle.  Ovid dutifully added the http:// prefix but the URL  still takes you nowhere when it has nothing but http://www.

There are plenty of wrong URLs which may look good and blue to indicate an actionable link, but you will get red in the face when you click on them, receive the error message and then find out that academicpress is misspelled as acedemicpress in 356 records. In such simple publisher URLs it is easy to recognize and correct the errors on the fly, but it is frustrating, puts the burden on the users, and makes them wonder about how many records have wrong URLs, as well as about the credibility of the database.
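Both kinds of error can be screened by program before the records are shipped. The sketch below is a minimal illustration: a purely syntactic check catches truncated or prefix-less URLs, while a comparison against a list of hosts known to be correct can flag near misses such as the misspelled publisher name. The host list is hypothetical, and the similarity cutoff of 0.9 is an arbitrary choice; a real check would also request each URL over the network.

```python
from difflib import get_close_matches
from urllib.parse import urlsplit

# Hypothetical list of correct hosts; a real one would be curated by hand.
KNOWN_HOSTS = ["www.academicpress.com", "www.apa.org"]


def _scheme_and_host(url):
    """Return (scheme, host) of a URL, or (None, None) if it cannot be parsed."""
    try:
        parts = urlsplit(url.strip())
        return parts.scheme, parts.hostname
    except ValueError:
        return None, None


def is_plausible_url(url):
    """Syntax check only: http(s) scheme and a dotted host with no empty label."""
    scheme, host = _scheme_and_host(url)
    if scheme not in ("http", "https") or not host:
        return False
    labels = host.split(".")
    return len(labels) >= 2 and all(labels)


def suggest_host(url):
    """Suggest a known host when the URL's host is a near miss, else None."""
    _, host = _scheme_and_host(url)
    if not host or host in KNOWN_HOSTS:
        return None
    matches = get_close_matches(host, KNOWN_HOSTS, n=1, cutoff=0.9)
    return matches[0] if matches else None
```

Note that the misspelled URL passes the syntax check; only the comparison with the known hosts (or an actual request) reveals it.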

Publisher URLs were not the only data elements in PsycINFO which suffered from URL syntax disorder. Journal URLs also shared this pain, and quite often none of the URLs were correct in the original record. Some online services tried hard but could not help, as is the case where Ovid added the correct http:// prefix but did not remove the wrong one. As you will see below, such a policy also pops up in the unrelated matter of handling correction records, which are added while the erroneous ones are not replaced (removed), so it is compulsive behavior, as psychologists would say.
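Adding a prefix blindly is what makes a bad URL worse; the prefix should be added only after any existing prefixes, right or wrong, have been stripped. The sketch below shows one way to do this, assuming the doubled forms look like `http://http:\\www.apa.org` or `http://https://...` (the exact forms in the Ovid records are not spelled out here). The function name and sample hosts are illustrative, and keeping the last scheme found is a design choice of mine.

```python
import re

# One leading prefix: http or https, a colon, and up to two slashes
# or backslashes.
_PREFIX = re.compile(r"(https?):[/\\]{0,2}", re.IGNORECASE)


def normalize_prefix(url, default_scheme="http"):
    """Strip every leading prefix, then put back exactly one correct prefix."""
    rest = url.strip()
    scheme = default_scheme
    while True:
        match = _PREFIX.match(rest)
        if match is None:
            break
        scheme = match.group(1).lower()  # keep the last scheme seen
        rest = rest[match.end():]
    return f"{scheme}://{rest}"


doubled = normalize_prefix(r"http://http:\\www.apa.org")
bare = normalize_prefix("www.apa.org")
```

A truncated URL is still truncated after this step, so a separate validity check remains necessary.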

Quite tellingly, among the many implementations I looked at, only CSA corrected the wrong URLs systematically and comprehensively as illustrated by its implementation of the record shown before from Ovid  as well as in hundreds of my other test records. CSA-IDS  did and still does  a big favor to APA and to the users who can happily click on actionable URLs in CSA-IDS which are dead in many other implementations of PsycINFO.

Distressing partnership

It was a hasty decision to use MARC for distributing PsycINFO. Perhaps the enhancement of those records where the MARC limit was reached should have been postponed until the XML exchange format for PsycINFO was developed (which happened in July, 2003), and the online services had enough time to prepare for receiving and processing XML records (which has not happened as of the end of 2003 – with one exception).

Some online services apparently were more distressed than others by the hasty implementation of the enhancements by PsycINFO. Dialog may have been the most stressed out, as witnessed by the large number of wrong descriptors which were erroneously parsed and extracted from the often changing format and tagging of PsycINFO records. Look at some of those entries in the basic index  starting with double d.

All of them are descriptors, the sacred cows in every decent database. Practically all of them are from records added after the new tagging and distribution mess of the enhanced records started. I speculate that the ^d subfield code was erroneously added to the descriptors themselves. It still is, I am afraid, as apparently no one complained.

These descriptors  are relatively easy to spot anywhere in the index, although looking at them is not the best psychotherapeutic treatment for those who got psyched out when missing thousands of records while searching by the root word psychotherap  as a descriptor in the current 2002-2003 segment of the database. Now they will know that they have to enhance their strategy by using such statements in every descriptor search as: select psychotherap?/de or dpsychotherap? It’s another question that even minimal level quality control should have caught this problem at DIALOG.
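If my speculation about the stray subfield code is right, the damaged index entries could be repaired mechanically by checking them against the controlled vocabulary. The sketch below is only an illustration of that idea: the vocabulary is a tiny hypothetical sample, not the actual thesaurus, and the function name is mine. The leading d is dropped only when the term is not itself a valid descriptor and the remainder is.

```python
# Tiny hypothetical sample of valid descriptors (lower-cased).
VOCABULARY = {"psychotherapy", "dreaming", "drug therapy"}


def repair_descriptor(term, vocabulary=VOCABULARY):
    """Drop a stray leading 'd' only when that yields a known descriptor."""
    lowered = term.strip().lower()
    if lowered in vocabulary:
        return lowered          # already valid, even if it starts with d
    if lowered.startswith("d") and lowered[1:] in vocabulary:
        return lowered[1:]      # stray subfield code letter removed
    return lowered              # unknown term: leave it for manual review


repaired = [repair_descriptor(t) for t in
            ["dpsychotherapy", "ddreaming", "dreaming", "drug therapy"]]
```

Checking the valid form first is what keeps legitimate descriptors beginning with d from being clipped.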

Dealing with PsycINFO must have taken its toll also on Ovid, which usually has smart solutions in processing databases. But Ovid has one of the few implementations of PsycINFO which omit the Digital Object Identifiers (DOIs) from the records, the key for licensing libraries to link simply to their full-text digital versions of journals from PsycINFO records. There are a whopping 208,359 records in PsycINFO on CSA-IDS which include DOIs, but I could not find any in Ovid, and there is no option for searching or browsing by DOIs in the otherwise rich assortment of indexes in Ovid.

I also found many duplicate records in Ovid’s version of PsycINFO, and alerted the company showing a small gallery of these duplicates http://projects.ics.hawaii.edu/~jacso/extra/cj-03/dups/. They  are the original, erroneous records and their correction records, appearing as  item #1-#5, #2-#6, and #3-#4.

I was told that these are not true duplicates (which is true), but correction records (which is correct). But I was also told that “we believe that it is in the user best interest to get the correct information even if it creates the impression of a duplicate record”, and this is not correct in my book, or in the book of any of the other online information services which replace the erroneous records. Of course, after reading the PR hype about the e-psyche database (shown below as an appetizer for my other jeer), I may just have turned skeptical about sentences which include the word “believe” too often and mean the opposite. Ovid can't really believe this.

APA may not have coined yet a term for this syndrome ending in disorder (such as update apathy disorder), but  to me this syndrome of not replacing the erroneous records  is just rationalization. It is similar to the Stockholm syndrome, when hostages, even when it does not do any good to them, start to bond with their captors and sympathize with their cause and actions.

This policy will not do any good to Ovid (or to its customers). Correction records in an update are meant to replace erroneous records, not just to be added alongside them, in order to make sure that the users get the correct record, and only the correct record. I believe (ahem) that replacement best serves the interest of users. This is one of the many advantages of being digital.
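Replacement on load is not a demanding task. The sketch below illustrates it under a loud assumption: that each correction record names the record it supersedes. The `id` and `corrects` keys and the sample records are hypothetical; how the update files actually link a correction to its original is not described here.

```python
def load_update(database, update):
    """Add new records; let correction records replace the ones they correct.

    database: dict mapping record id to record.
    update: iterable of records; a correction carries a "corrects" key
    (an assumed convention) naming the id of the erroneous record.
    """
    for record in update:
        superseded = record.get("corrects")
        if superseded is not None:
            database.pop(superseded, None)  # drop the erroneous record
        database[record["id"]] = record
    return database


# Hypothetical data: record 1 is erroneous, record 5 is its correction.
db = {1: {"id": 1, "title": "Depresion in late life"}}
update_file = [
    {"id": 5, "title": "Depression in late life", "corrects": 1},
    {"id": 6, "title": "A new, unrelated record"},
]
load_update(db, update_file)
```

After the load, a search retrieves the corrected record once, and the erroneous one not at all.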

When looking at the duplicates the users have no idea which one is correct and which one is not, and the records do not carry such distinctive labels or warnings.  If the erroneous one shows up first,  the user will not look at the correction record. Beyond the annoyance of handling these duplicates, the number of duplicates may also distort the result of bibliometric studies which use Ovid’s version of PsycINFO.  

As my test search at the end of 2003 showed, the policy has not changed, and the duplicates are still there. I can only hope that when Ovid reloads the database from the XML tapes and the distress is over, it will revise its policy, reverse its preference, and eliminate the erroneous records of the duplicate pairs.

Conclusion

The best way to show how much you get for your money is to compare the pay-as-you-go prices for PsycINFO and Mental Health Abstracts (MHA) on Dialog. The latter, which used to be the only competitor until 1983 (when it was taken over by IFI/Plenum), has become one of the worst commercial databases (competing now for that title with e-psyche). It was last updated in 2000, and it has been deteriorating at a fast pace for the past 20 years. As you can see from the publication year index, it drastically reduced the number of records, and, what you cannot see, its editors systematically decimated its journal base, emaciating it to a skeleton of the one before the takeover.

Still, MHA has an hourly rate of $70 and a per record rate of $1.50 for its sorry records, while PsycINFO has an hourly rate of $30 (like the government databases) and a per record charge of $0.80, even though more than 210,000 records have been enhanced by the valuable cited references. Dialog’s pricing shows how untrue the adage is: you get what you pay for.

As for the bang for the buck offered by PsycINFO versus e-psyche, the latter claims that it charges about half or less of what APA charges (based on the number of users). This may be so, but the value you get from PsycINFO is far more than twice what you get from e-psyche. PsycINFO is far from perfect, but its errors of commission and omission are due to sloppiness and harried implementation. Its claims are realistic (if not always accurate) and do not seem to be knowingly misleading.

APA should focus on enhancing the current segment of the database (say from 1990 onward) with cited references, rather than keep adding skeletal records for articles published since Gutenberg invented the printing press. APA should also get rid of the Mental Health Abstracts database (which it acquired without releasing any PR statement, understandably), and should stop using it to add sorry records for articles published in the mid-1900s. It is a very poor database, and as the proverb has it, “he that lies down with dogs, shall get up with fleas”. APA can’t afford to let PsycINFO stink in the morning or thereafter. On the contrary, it must stay squeaky clean, no matter how dirty the tricks its competitors have been playing.
