Relevance in the eye of the search software

The original article appeared in Online Information Review 29(6):676-682. It is enhanced here by additional color charts to facilitate the visualization of the results.
Most search engines promise to rank results by the relevance of the documents to the submitted query. This is supposed to order the displayed documents according to how closely they match the topic of the query. Large-scale user studies have shown that most users look only at the first results screen, which typically displays the top 10 ranked results. This makes relevance ranking particularly important for bringing the most relevant records to the top. For identical editions of the same datafile, the rank positions of the records should be identical or very close. They are not. Several tests of identical (or almost identical) editions of the PsycINFO and Medline files, as implemented by 8-10 online services, showed no consensus, or only a minimal level of consensus, among the implementations. For example, the same 31 records were retrieved for the simple query scientometrics OR scientometric from the Web of Knowledge, Scirus, OCLC, Ebsco and other implementations of the Medline database.
Ebsco retrieved 32 records, the extra one (oddly ranked as the most relevant) being an article that was picked up because of the word scientometric in the author affiliation field. Although it was excluded from the comparison chart, the rank order numbers of the other 31 items were not changed; that is why the rank order numbers in Ebsco range from 2 to 32.
Only for a few records (such as #2, #7, #9, #16 and #17) are the rank positions reasonably close in all four implementations. OCLC is most often the outlier, and Ebsco is least often the outlier.
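The per-record comparison across implementations can be sketched in code. This is a minimal illustration, not the method used in the tests: it assumes each record's rank position is known for every implementation, treats a record as showing consensus when its ranks span no more than a hypothetical `close_enough` threshold of 3 positions, and counts as the outlier the implementation whose rank lies furthest from the record's median rank. The record IDs (`rec_a` and so on) and all rank figures are invented for illustration.

```python
from collections import Counter
from statistics import median

# Hypothetical rank positions (1 = most relevant) of the same records in four
# implementations. Illustrative only; these are NOT the figures from the tests.
RANKS = {
    "rec_a": {"WoK": 1, "Scirus": 2, "OCLC": 15, "Ebsco": 3},
    "rec_b": {"WoK": 4, "Scirus": 5, "OCLC": 4, "Ebsco": 6},
    "rec_c": {"WoK": 20, "Scirus": 3, "OCLC": 9, "Ebsco": 18},
}

def spread(ranks):
    """Range of rank positions one record received across implementations."""
    return max(ranks.values()) - min(ranks.values())

def outlier(ranks):
    """Implementation whose rank lies furthest from the record's median rank."""
    mid = median(ranks.values())
    return max(ranks, key=lambda impl: abs(ranks[impl] - mid))

def summarize(table, close_enough=3):
    """Records with consensus, plus how often each service is the outlier.

    The close_enough threshold is a hypothetical choice, not from the study.
    """
    consensus = [rec for rec, r in table.items() if spread(r) <= close_enough]
    outliers = Counter(
        outlier(r) for r in table.values() if spread(r) > close_enough
    )
    return consensus, outliers

consensus, outliers = summarize(RANKS)
print(consensus, outliers)
```

With the sample data only `rec_b` stays within the threshold, while OCLC and Scirus are each the outlier once. Using the median rather than the mean keeps a single wildly divergent service from dragging the reference point toward itself.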
Pair-wise comparison of rank orders showed a similar pattern: strong consensus for some records, and weak or very weak consensus for others. The records in the pair-wise comparisons were linked to the PubMed version of Medline to facilitate looking up the full records and exploring the reasons for strong and weak consensus, but this did not provide further insights. The chart below shows the closest and the furthest distances between the rankings of the same records by WoK and Scirus. Green lines represent strong consensus, while yellow and red lines indicate weak and very weak consensus, respectively.
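A pair-wise comparison such as WoK versus Scirus can be approximated by computing, for each shared record, the absolute difference between its two rank positions. In this sketch the cut-offs separating strong, weak and very weak consensus (2 and 10 positions) are assumptions, since the thresholds behind the green, yellow and red lines are not stated. The overall total uses Spearman's footrule, a standard distance between two rankings, as one possible summary. The record IDs and rank figures are hypothetical.

```python
# Hypothetical rank positions of the same records in two implementations.
# Illustrative only; these are NOT the figures from the tests.
WOK = {"rec_a": 1, "rec_b": 2, "rec_c": 3, "rec_d": 4, "rec_e": 5}
SCIRUS = {"rec_a": 1, "rec_b": 9, "rec_c": 25, "rec_d": 6, "rec_e": 30}

def classify(distance, strong=2, weak=10):
    """Map a rank distance to a consensus label (cut-offs are assumptions)."""
    if distance <= strong:
        return "strong"
    if distance <= weak:
        return "weak"
    return "very weak"

def compare(left, right):
    """Per-record absolute rank distance and label, closest pairs first."""
    shared = left.keys() & right.keys()
    rows = [(rec, abs(left[rec] - right[rec])) for rec in shared]
    rows.sort(key=lambda row: (row[1], row[0]))  # by distance, then record ID
    return [(rec, dist, classify(dist)) for rec, dist in rows]

def footrule(left, right):
    """Spearman's footrule: total rank displacement over the shared records."""
    return sum(abs(left[r] - right[r]) for r in left.keys() & right.keys())

for rec, dist, label in compare(WOK, SCIRUS):
    print(rec, dist, label)
print("footrule:", footrule(WOK, SCIRUS))
```

The first and last rows give the closest and furthest pairs. On the sample data these are `rec_a` (distance 0) and `rec_e` (distance 25), and the footrule total is 56.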
Analysis of the rank orders for the same 31 records from other implementations of Medline, such as Dialog (in Target mode), HighWire Press and KnowledgeFinder, showed a very similar, inconsistent pattern of consensus. As the record content is exactly the same in all four implementations, there seemed to be neither rhyme nor reason for the absence of consensus, or its merely minimal level, among the various online information services. Users of the same file in different database implementations will see very different records when looking only at the top 10 or top 20 purportedly most relevant items for the same search. This syndrome is not limited to Medline and PsycINFO, which have the largest number of implementations; it applies to all databases that have 2-5 implementations by various online information services. This unpredictable pattern calls into question the adequacy of the various algorithms used for ranking documents or document surrogates in search results.
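The practical effect on users who scan only the first screen can be expressed as a simple top-k overlap: the share of the top 10 (or top 20) records that two services have in common. The sketch below assumes each ranking is an ordered list of record identifiers; the 31 identifiers and the five-place shift in the second ranking are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def top_k_overlap(ranking_a, ranking_b, k=10):
    """Share of the top-k records that appear in both rankings (0.0 to 1.0)."""
    k = min(k, len(ranking_a), len(ranking_b))  # guard against short lists
    if k == 0:
        return 0.0
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k

# Hypothetical record IDs r01..r31 in one service's relevance order.
service_a = [f"r{i:02d}" for i in range(1, 32)]
# Hypothetical second service: the same 31 records, rotated by five places.
service_b = service_a[5:] + service_a[:5]

print(top_k_overlap(service_a, service_b, 10))
print(top_k_overlap(service_a, service_b, 20))
```

Here the two services share half of their top 10 and three quarters of their top 20, even though both retrieved exactly the same 31 records.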