Internet Insights - Thoughts about Federated Searching

An abridged, text-only version was published in Information Today, 21(9), October 2004, p. 17.

The words federal and federated do not always conjure up positive images. Still, federated is the most expressive adjective for the consolidated retrieval of results in response to a query sent to several databases hosted by different online information systems. Federated searching is more than multiple database searching, metasearching, polysearching, or broadcast searching, all of which put the emphasis on searching. There are other steps in federated searching, and they make the process as difficult as herding cats. Federated searching consists of transforming a query and broadcasting it to a group of disparate databases with the appropriate syntax, merging the results collected from the databases, presenting them in a succinct and unified format with minimal duplication, and allowing the library patron to sort the merged result set by various criteria.
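The steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any of the products reviewed here works: the target names (`db_a`, `db_b`), their query syntaxes, and the canned records are all hypothetical, and the network call is replaced by a lookup in a dictionary.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical targets: each one wants a title/author query in its own syntax.
TARGETS = {
    "db_a": lambda title, author: f'TI "{title}" AND AU {author}',
    "db_b": lambda title, author: f"title:({title}) author:({author})",
}

# Stand-in for real network calls: canned, made-up records per target.
CANNED = {
    "db_a": [{"title": "Sample article one", "author": "Author, A", "year": 1999}],
    "db_b": [{"title": "Sample article two", "author": "Author, B", "year": 2004}],
}


def search_target(name, native_query):
    """Pretend to send native_query to one database and return its records."""
    return [dict(rec, source=name, query=native_query) for rec in CANNED[name]]


def federated_search(title, author, sort_key="year"):
    # 1. Transform the query into the syntax of each target.
    translated = {name: fmt(title, author) for name, fmt in TARGETS.items()}
    # 2. Broadcast the translated queries to all targets concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_target, n, q) for n, q in translated.items()]
        batches = [f.result() for f in futures]
    # 3. Merge the per-database result sets into one list.
    merged = [rec for batch in batches for rec in batch]
    # 4. Let the patron sort the merged set by a chosen criterion.
    return sorted(merged, key=lambda rec: rec[sort_key])


for record in federated_search("impact factor", "Smith"):
    print(record["source"], record["year"], record["query"])
```

Presenting the records in a unified format and removing duplicates are left out of this sketch; they are where most of the difficulty lies.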
|
Large
libraries spend more than one million dollars a year for the digital
archives of journal publishers, for abstracting/indexing and for full-text
aggregator databases. Still, few patrons discover the richness of these
scholarly and professional digital sources, and even fewer use them
happily and regularly because they’re not exposed well on many
libraries’ home pages. The different interfaces and search languages are
further deterrents, and often the database names don’t provide enough clues to choose them when looking for information about a topic.
ABI/INFORM · PAIS · INSPEC · ISTA · CINAHL
|
Many
of the host names (Athena, Dynix, Sirsi, EOS) and database acronyms (ABI/INFORM,
PAIS, INSPEC, ISTA, CINAHL, Scirus, Scopus) are as Greek to library
patrons as the names of the food on the menu in a, well, Greek restaurant.
Do you know what’s common in all the databases whose acronyms I
just mentioned? They all contain relevant materials for library and
information science and technology.
|
Learning about the availability of these databases is one thing. Getting to them by clicking through the labyrinth on many library Web sites is another. Making patrons use them, while applying the strict semantic and syntax rules of Boolean and proximity operators to terms looked up from the thesauri, is yet another thing. No surprise that patrons are happy if they make it through one database with some catch on their hook. They don’t go on to see whether another database may have more or better results. Most give up, storm out of the library, and throw at Google the query "library anxiety information overload help", which will find a few good-enough full-text reports, case studies, and articles among the first few hits of the more than 10,000 open access Web pages that come up for that search. They may never come back to the (digital) library again. And there goes your ROI.
|
|
|
Greek
restaurateurs in Paris’s Quartier Latin display their foods on the
street to lure in the tourists. Food marts in Asia offer tasters and
display pictures of their culinary masterpieces to encourage wary American
and European visitors to try the exotic cuisine, in the hope that they
will order a course or two and return again. Similarly, digital libraries
have to offer samples of their varieties of intellectual foods. This would
encourage patrons to swiftly browse, pick, taste, and consume highly
nutritive information, and come back again.
|
This summer and fall, I tested the three most popular federated search engines: Ex Libris’ MetaLib, MuseGlobal's MuseSearch (through Innovative Interfaces’ MetaFind), and WebFeat’s Prism. Query submission and broadcasting seem to be quite similar in all three products, but behind the scenes there are different translation procedures to accommodate the query syntax of the target systems. These daunting tasks are invisible to the end user, and one can only guess the differences by comparing the results of the federated search engines with those of the native search engines, a topic for my upcoming in-depth article. I found one major, befuddling problem in query translation and broadcasting in my tests. It is in WebFeat, and it matters because WebFeat provides the cross-searching feature that ISI Web of Knowledge (WoK) incorporates for many open access archives; the problem may give users a bad impression of the otherwise impressive high-end WoK service.
|
WebFeat can take the query entered on the native WoK query template and, after WoK presents its results, run it on the selected open access database(s) when the user presses the External Collections Result button.
|
|
Great idea, but it does not work reliably for many of the very typical title-author query combinations in WebFeat. Time and again it returned no results even though one or more records in the targeted databases matched the syntactically valid query when I ran it independently in their native software.
|
I tried many variants, with and without truncation, with the first name spelled out and with the initial only, and there were no results from WebFeat. When I searched by title or name alone, the records matching the original title-author test queries did show up in WebFeat's result list, such as this record, hit number 7 of the 8 records returned for the query journal impact factor in the title field.
|
|
There were other idiosyncrasies in running WoK queries in the open access databases through WebFeat, which does not seem to handle intra-cell Boolean operators correctly either. The other federated search engines did not have such a major problem, one that affects several widely popular and very large databases.
|
MetaFind also has flexible grouping options, and a database may appear under several categories, as EBSCO Academic Search Elite does on this library site. The user can specify how many hits should be returned from the databases (from 5 to 100) and how many should be displayed on a page.
|
|
|
WebFeat also allows a variety of query layouts, but it does not offer an option to control the hits per source. It displays whatever the default is from each database, and this can yield widely differing numbers, from 10 to 100. The uncomfortably long result lists may make users feel like they are wading through a marsh, sinking deeper and deeper, especially if an erroneously selected database like Thomas returns 100 "hits", all of them false hits for the query (more about this later).
|
|
Scoreboards
|
All three products provide a scoreboard of the results from the databases searched. MetaLib shows the progress of the search database by database, but when it starts displaying the actual hits, the scoreboard disappears. You may ask for its re-display, but it then overlays the result list even though there is enough space to juxtapose the two. It cannot be repositioned as a movable pane could be.
|
MetaFind starts the display of the results with a good scoreboard showing the progress, and displays the results beneath the scoreboard, echoing back the query.
|
|
|
WebFeat's scoreboard appears almost immediately and is updated as the search progresses across databases. The progress is usually lightning fast, but it has a price. WebFeat does not consolidate/federate the results returned; it just presents them on an "as is" basis, which sometimes yields unwieldy, ill-structured, and disorganized result lists.
|
|
|
|
|
MetaLib also offers the result list in a well-laid-out brief format or full format, all three formats drawing on the files already fetched from the target databases. The full format links users to the full-text source either directly or through Ex Libris' excellent SFX option. The SFX option is available in all three formats, and is a perfect example of the synergy of federated searching and linking.
|
|
MetaFind also has a formatted, brief result list, but it is not as compact as MetaLib’s, and not as consistent across the different resources. It is, however, a nice touch that the user can control how many hits should be listed from the resources. The minimum is 5. I would prefer an even lower number, as 2-3 hits per database give enough information for the first glance-over and still keep a relatively tight, easy-to-scan list when showing results from 10-15 databases.
|
|
|
The more detailed format of MetaFind tries to ram too much information into the description field, including data elements that the user does not need at this stage (such as the clue to check the catalog) or ever (such as the accession number within the database).
|
|
|
Then again, it is better than what WebFeat offers. WebFeat has only a single format. For some of the databases the single format has good structure and content, giving the user a clue as to whether it is worth clicking and being catapulted to the native system for the complete abstracting/indexing record or the full record. More often, however, there are data elements that are irrelevant at this stage (such as my otherwise beloved DOI) as well as pieces of information that are never relevant for end users (such as intra-database accession numbers). To make the records even more protracted, almost every field starts on a new line, which makes the result list look like a cake served in crumbles.
|
|
In other records on the result list, the title of the article and/or the author name are often missing, and sometimes the title, the author, and the journal name are all absent. Fetching the information about the reading level and the size of the articles in Kbytes does not compensate for the lack of title, author, and journal name fields.
|
True, you find these when you are taken to the source to view details, but WebFeat could have extracted the author and periodical title from the native system's result list and shown them while you are still in the result list for further browsing.
|
|
|
MetaFind can sort the result list by the same criteria plus by order of retrieval. It shows the same problem in sorting by author as MetaLib: it does not distinguish between last-name-first and first-name-first formats, so KF Kaltenborn comes ahead of Kenneth A Borokhovich. Sorting by other criteria seemed to be correct.
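A sort key that normalizes both name formats would avoid the problem with the two names above. The sketch below is a simple heuristic of my own, not what either product does: it assumes that a comma marks a last-name-first entry and that otherwise the last word is the surname, which will fail for multi-word surnames and name suffixes.

```python
def author_sort_key(name):
    """Return (surname, given names) in lowercase for 'Last, First' or 'First Last'."""
    name = name.strip()
    if "," in name:
        # 'Kaltenborn, KF' style: the surname comes before the comma.
        surname, given = (part.strip() for part in name.split(",", 1))
    else:
        # 'KF Kaltenborn' style: assume the last word is the surname.
        parts = name.split()
        if not parts:
            return ("", "")
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given.lower())


authors = ["KF Kaltenborn", "Kenneth A Borokhovich"]
print(sorted(authors))                       # plain string sort, as in the result lists
print(sorted(authors, key=author_sort_key))  # sorted by surname instead
```

With the key applied, Borokhovich sorts ahead of Kaltenborn regardless of how each source formats the name.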
|
|
|
|
|
MetaFind's deduping algorithm may not be as sophisticated, but it lets users choose their preferred criteria, such as title, ISBN, ISSN, and link URL. Records that have duplicates show the number of duplicates, which are kept hidden but can be displayed to check the supposed duplicates. The higher the number of duplicates, the more important the item may be, provided they come from databases of different content providers rather than being re-purposed records from various databases of the same family.
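Deduplication on a user-chosen criterion, with the duplicates hidden but countable, might look like the following sketch. It is an assumption-laden illustration, not MetaFind's algorithm: the field names, the `hidden_duplicates` key, and the normalization rule (lowercase, punctuation ignored) are all hypothetical.

```python
import re


def normalize(value):
    """Lowercase and collapse punctuation/whitespace so trivial differences still match."""
    return re.sub(r"[^a-z0-9]+", " ", str(value).lower()).strip()


def dedupe(records, criterion="title"):
    """Keep the first record per normalized criterion value; hide later matches behind it."""
    first_seen = {}
    result = []
    for rec in records:
        key = normalize(rec.get(criterion, ""))
        if key and key in first_seen:
            first_seen[key]["hidden_duplicates"].append(rec)
            continue
        entry = dict(rec, hidden_duplicates=[])
        if key:  # records lacking the chosen field are never merged
            first_seen[key] = entry
        result.append(entry)
    return result


# Made-up records from three hypothetical sources.
records = [
    {"title": "Sample Article Title.", "source": "db_a"},
    {"title": "sample article  title", "source": "db_b"},
    {"title": "Another title", "source": "db_c"},
]
for entry in dedupe(records, criterion="title"):
    sources = {entry["source"]} | {d["source"] for d in entry["hidden_duplicates"]}
    print(entry["title"], "| duplicates:", len(entry["hidden_duplicates"]), "| sources:", len(sources))
```

Counting distinct sources alongside the duplicates gives a rough way to tell independent coverage from re-purposed records, although telling apart databases of the same family would need provider information that this sketch does not model.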
|
|
|
MetaFind must be using rather liberal conditions to detect duplicates, which is advantageous when one record misspells the author's name, as PsycINFO does.
|
|
|
Deduplication worked well in 90% of my tests, making only a few mistakes, such as one that falsely identified two records as duplicates just because part of their title information overlaps. This is where the liberal approach may backfire.
|
Of course, the license fees for the entry-level and high-end federated search systems vary widely. But this fee must be put in perspective by comparing it with the license fees that one pays for the digital resources that will get discovered, be selected more often, and be used more effectively by patrons if a powerful federated search engine is in place. MetaLib and MetaFind are very powerful federated search engines providing comprehensive and high-quality federated search services. There are other powerful alternatives, including in-house adaptation of commercial metasearch software, such as MetaStar. Such in-house developments put the responsibility on the systems librarians' shoulders, but the library of North Carolina State University is an impressive example of how superbly it can be done. More about it later in the digital pre-print version of my Cheers and Jeers for 2004, coming in December.