Relevance in the eye of the search software

The original article appears in Online Information Review 29(6):676-682.

It is enhanced here by additional color charts to facilitate the visualization of the results.

 

Most search engines promise to rank results by the relevance of the documents to the submitted query. This ranking is supposed to order the displayed documents by how closely they match the topic of the query. Large-scale user studies have shown that most users look only at the first result screen, which typically displays the top 10 ranked results. This makes relevance ranking particularly important for bringing the most relevant records to the top. When different services host identical editions of the same datafile, the rank positions should be identical or very close.

They are not. Several tests of the identical (or almost identical) edition of the PsycINFO and Medline files in 8-10 implementations by various online services showed no consensus, or only a minimal level of consensus, among the implementations. For example, the same 31 records were retrieved for the simple query scientometrics OR scientometric from the Web of Knowledge (WoK), Scirus, OCLC, Ebsco and other implementations of the Medline database.

Ebsco retrieved 32 records; the extra one (oddly ranked as the most relevant) was an article picked up because of the word scientometric in the author affiliation field. Although it was excluded from the comparison chart, the rank order numbers of the other 31 items were not changed, which is why the rank order numbers range from 2 to 32 in Ebsco.

Only for a few records (such as #2, #7, #9, #16 and #17) are the rank positions close in all four implementations. OCLC is most often the outlier, and Ebsco least often. WoK and Ebsco most often have a full or almost full consensus, as shown by Records #1, #4, #6, #8, #11, #12, #14, #15, #21, #22, #25, and #27. But this is not a consistent pattern, and for some records the rank order numbers of WoK and Ebsco are the furthest apart, as for Records #5, #23, #26, and #28. (For Record #30 the rank position in Ebsco is not marked because it is ranked #32 there, Ebsco's rank range being 2 to 32 as explained earlier.)
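One way to make the notion of an "outlier" implementation concrete is to compare each implementation's rank for a record against the median rank across all implementations. The sketch below is only an illustration under stated assumptions: the rank values are invented (they are not the data from the tests described here), and the function name outlier and the min_gap threshold are hypothetical choices, not the method used to draw the charts.

```python
from collections import Counter
from statistics import median

# Invented rank positions of three records in four implementations.
# These are NOT the study's data; they only illustrate the calculation.
ranks = {
    "rec_a": {"WoK": 3, "Scirus": 4, "OCLC": 19, "Ebsco": 3},
    "rec_b": {"WoK": 10, "Scirus": 11, "OCLC": 9, "Ebsco": 10},
    "rec_c": {"WoK": 5, "Scirus": 22, "OCLC": 7, "Ebsco": 6},
}


def outlier(record_ranks, min_gap=5):
    """Return the implementation whose rank is furthest from the median rank.

    Returns None when no rank differs from the median by at least min_gap
    positions (min_gap is an assumed threshold, chosen arbitrarily here).
    """
    mid = median(record_ranks.values())
    name, rank = max(record_ranks.items(), key=lambda kv: abs(kv[1] - mid))
    return name if abs(rank - mid) >= min_gap else None


# Outlier per record, then a tally of how often each implementation is one.
outliers = {rec: outlier(r) for rec, r in ranks.items()}
tally = Counter(name for name in outliers.values() if name is not None)

print(outliers)
print(tally)
```

Run over real rank lists, a tally of this kind would show which implementation departs from the others most often, which is the sort of observation made above about OCLC and Ebsco.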

Pair-wise comparison of rank orders showed a similar pattern, i.e. strong consensus for some records and weak or very weak consensus for others. The records in the pair-wise comparisons were linked to the PubMed version of Medline to facilitate looking up the full records and exploring the reasons for the strong and weak consensus, but this did not provide further insights. The chart below shows the closest and the furthest distance between the ranking of the same records by WoK and Scirus. Green lines represent strong consensus, while yellow and red lines indicate weak and very weak consensus.

 

Analysis of rank orders for the same 31 records from other implementations of Medline, such as Dialog (in the Target mode), HighWire Press, and KnowledgeFinder, showed a very similar, inconsistent pattern of consensus.
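The pair-wise comparisons described above can be approximated with two simple measures: the absolute distance between the rank positions of each record in two implementations, and Spearman's rank correlation coefficient for the pair as a whole. The following is a minimal sketch, not the procedure behind the original charts. The rank values are invented, the colour thresholds (strong and weak) are assumptions, and the Spearman formula used here is the textbook one that assumes no tied ranks.

```python
# Invented rank positions of six records in two implementations.
# These are NOT the study's data.
wok = {"r1": 1, "r2": 2, "r3": 3, "r4": 4, "r5": 5, "r6": 6}
scirus = {"r1": 2, "r2": 1, "r3": 6, "r4": 4, "r5": 3, "r6": 5}


def classify(distance, strong=1, weak=2):
    """Map a rank distance to a consensus colour (thresholds are assumed)."""
    if distance <= strong:
        return "green"   # strong consensus
    if distance <= weak:
        return "yellow"  # weak consensus
    return "red"         # very weak consensus


def spearman_rho(ranks_a, ranks_b):
    """Spearman's rho for two rankings of the same records, assuming no ties."""
    n = len(ranks_a)
    d_squared = sum((ranks_a[rec] - ranks_b[rec]) ** 2 for rec in ranks_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Per-record distance and colour, then one summary figure for the pair.
distances = {rec: abs(wok[rec] - scirus[rec]) for rec in wok}
colours = {rec: classify(d) for rec, d in distances.items()}
rho = spearman_rho(wok, scirus)

print(colours)
print(round(rho, 3))
```

A coefficient near 1 would indicate that two implementations rank the records in nearly the same order, while a value near 0 would indicate little agreement.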

As the record content is exactly the same in all four implementations, there seems to be neither rhyme nor reason for the lack of consensus, or the merely minimal consensus, among the various online information services. Users of the same file in different database implementations will see very different records when looking only at the top 10 or top 20 purportedly most relevant items for the same search.
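The practical effect on a user who looks only at the first screen can be expressed as the share of records that two implementations have in common among their top 10 or top 20 results. The sketch below is a hypothetical illustration: the two rankings are artificial (one is simply the reverse of the other, an extreme case), and the function name top_k_overlap is an assumption introduced here.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    """Fraction of the top-k records that two ranked lists have in common.

    Both arguments are lists of record identifiers ordered from the most to
    the least relevant; both are assumed to contain at least k records.
    """
    if k <= 0 or k > min(len(ranking_a), len(ranking_b)):
        raise ValueError("k must be between 1 and the length of the shorter list")
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


# Artificial example: 31 records ranked in exactly opposite order.
service_a = list(range(1, 32))
service_b = list(reversed(service_a))

print(top_k_overlap(service_a, service_b, k=10))
print(top_k_overlap(service_a, service_b, k=20))
```

A low overlap at k=10 means that users of the two services would see largely different records on the first result screen, even though both services retrieved the same set.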

This syndrome is not limited to Medline and PsycINFO, which have the largest number of implementations, but applies to all databases that have 2-5 implementations by various online information services. This unpredictable pattern calls into question the adequacy of the various algorithms used for ranking documents or document surrogates in search results.
