roytennant.com :: Digital Libraries Columns

 

Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.

Cross-Database Search: One-Stop Shopping


10/15/2001

   You know you want it. Or you know someone who does. One search box and
   a button to search a variety of sources, with results collated for easy
   review. Go ahead, give in--after all, isn't it true that only
   librarians like to search? Everyone else likes to find.

   Why should we make our users hunt down the best resource for a given
   information need and learn how to use its particular options for
   searching? Why not provide them with a simple way to get started? In
   the past, we might argue that such a wide-ranging search service was
   too difficult or impossible to build. It remains difficult, certainly,
   but such a service can no longer be called impossible, as these
   examples show.

   Cross-database search services
   Some early adopters of this type of technology use commercial
   applications, while others have built their systems from scratch.
   Unfortunately, because most of them search commercial databases, the
   curious are often locked out. However, in a message to the Web4Lib
   electronic discussion ("Cross-Database Search Tools Summary"), I listed
   staff contacts for some of these services. Also, some are publicly
   searchable.

   Searchlight. The California Digital Library (CDL) has offered its
   Searchlight tool since January 2000. Based on the Database Advisor
   service of the University of California (UC)-San Diego, Searchlight
   offers one-stop searching of abstracting and indexing databases,
   library catalogs, and web sites, as well as other types of resources.
   After selecting which "flavor" of Searchlight they wish to search
   (either "Sciences and Engineering" or "Social Sciences and
   Humanities"), users type in the search. An intermediary screen
   describes what is happening and also counts down the minute that it
   will take by default before results are returned (this can be adjusted
   by the user).

   The user's search words are sent to a wide variety of databases (well
   over 100), with the results organized by resource type (books, journal
   indexes, electronic journals, e-texts and documents, reference
   resources, and web directories). The number of hits is noted beside
   each resource, and clicking on that number will automatically take
   users to the results in that particular database when possible.
   Alternatively, they can click on a link to go to the resource and
   search it directly. Anyone can try it out, but those not part of the UC
   community won't see results for licensed databases.
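
   The broadcast-and-wait pattern described above can be sketched in a
   few lines of Python. This is only an illustration, not Searchlight's
   actual code: the target names, the three stand-in search functions,
   and the broadcast_search helper are all hypothetical, and the short
   timeout stands in for the one-minute default.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical connectors: each stands in for a real database search
# and simply returns a hit count.
def search_catalog(query):
    return 12

def search_journal_index(query):
    return 340

def search_slow_database(query):
    time.sleep(1.0)  # simulates a target that answers too slowly
    return 99

# (name, resource type, search function) -- all invented for this sketch.
TARGETS = [
    ("Library catalog", "books", search_catalog),
    ("Journal index", "journal indexes", search_journal_index),
    ("Slow database", "reference resources", search_slow_database),
]

def broadcast_search(query, targets, timeout_seconds):
    """Send the query to every target and group hit counts by resource type.

    Targets that have not answered within timeout_seconds (or that
    raised an error) are reported with a hit count of None.
    """
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(fn, query): (name, rtype)
               for name, rtype, fn in targets}
    done, _not_done = wait(futures, timeout=timeout_seconds)
    pool.shutdown(wait=False)  # do not block on the slow targets

    grouped = {}
    for future, (name, rtype) in futures.items():
        hits = None
        if future in done and future.exception() is None:
            hits = future.result()
        grouped.setdefault(rtype, []).append((name, hits))
    return grouped

results = broadcast_search("digital libraries", TARGETS, timeout_seconds=0.2)
for rtype, entries in results.items():
    for name, hits in entries:
        print(rtype, "|", name, "|", "no response yet" if hits is None else hits)
```

   The design choice worth noting is the deadline: the service reports
   whatever has come back when time runs out rather than waiting for the
   slowest database.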

   The CDL is planning to gather user feedback this fall on how well
   Searchlight meets users' needs and then consider where to take the
   service. Possible
   future directions may include subject-focused cross-database search
   tools (for example, one-stop searching of all the best resources in a
   particular discipline), or a tool optimized for the needs of
   undergraduates to find "a few good things" on a topic.

   NLM Gateway. This cross-database search operates in a variety of
   databases from the National Library of Medicine (NLM). A web page
   describing the service states, "One target audience for the Gateway is
   the Internet user who is new to NLM's online resources and does not
   know what information is available there or how best to search for it."

   Flashpoint. The Research Library of the Los Alamos National Laboratory
   wrote an in-house Perl program to search a set of databases
   simultaneously. In the article "Flashpoint @ LANL.gov: A Simple Smart
   Search Interface," the authors describe how their system underwent
   several design iterations in response to user feedback, testing, and
   analysis of failed searches. It presently searches nine bibliographic
   databases and one full-text database.

   King County Library Search. The King County Library System in
   Washington State uses the commercial product WebFeat to offer one-stop
   searching of its library catalog, web site, and ProQuest databases. The
   system was released in November 2000 for user testing, so not all the
   databases planned to be included are yet covered. Users are limited to
   library cardholders.

   Multi-SEARCH. The University of Arizona Library uses OCLC's SiteSearch
   software to search multiple databases. SiteSearch uses the Z39.50
   protocol to search databases that are compliant with that standard--in
   this case, three state catalogs and the OCLC FirstSearch databases.

   Software for cross-database searching
   Several sites use the WebFeat product to search multiple databases.
   Other products that offer similar capabilities include
   Fretwell-Downing's Zportal, MetaLib from Ex Libris, Copernic
   Aggregator, Endeavor ENCompass, and OCLC's SiteSearch. Several other
   site developers have written their own software but then must maintain
   it as resources (search targets) change.

   More libraries than those noted above are developing their own
   cross-database search services, including OhioLink and the National
   University of Mexico (UNAM). It's clear that there is a widely
   perceived need for one-stop searching of bibliographic databases,
   though it is too early to have much data on which features are
   essential.

   One key challenge for software of this type is how to package up the
   search and process the results. Unless the database supports the Z39.50
   search protocol, it can be daunting to deal with the particular needs
   of a proprietary database. Even if sending the search is
   straightforward, the results may have to be extracted with a somewhat
   primitive technique called "screen scraping." Screen scraping is the
   process of collecting needed information by clues such as its location
   on the screen. The problem is that the slightest change in screen
   displays can break your process. Some applications are limited to
   Z39.50 databases, while others (such as Searchlight) encompass other
   databases as well. In general, the more databases a search service
   covers, the more challenges it will face.
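
   To make the fragility of screen scraping concrete, here is a minimal
   Python sketch. The result page, its wording, and the scrape_hit_count
   helper are invented for illustration and do not describe any real
   database's output.

```python
import re

# Hypothetical result page from a database with no search protocol.
RESULT_PAGE = """
<html><body>
<h1>Search Results</h1>
<p>Your search found <b>1,284</b> records.</p>
</body></html>
"""

# The pattern depends on the exact wording and markup of the page.
HIT_COUNT_PATTERN = re.compile(r"Your search found <b>([\d,]+)</b> records")

def scrape_hit_count(page):
    """Return the hit count as an int, or None if the layout is not recognized."""
    match = HIT_COUNT_PATTERN.search(page)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

print(scrape_hit_count(RESULT_PAGE))  # 1284

# A small redesign (bold tag replaced by a span) breaks the scraper.
REDESIGNED_PAGE = (RESULT_PAGE
                   .replace("<b>", '<span class="n">')
                   .replace("</b>", "</span>"))
print(scrape_hit_count(REDESIGNED_PAGE))  # None
```

   The information on the second page is unchanged, yet the scraper
   finds nothing, which is why each scraped target adds to the
   maintenance burden.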

   Some early experience indicates that simply broadcasting the search and
   getting back results from separate databases is a start but not what
   most users really want or expect. Most users likely want such features
   as deduping (dropping duplicate records from different databases),
   merging and ranking (instead of keeping the results separated by the
   source), and methods for trimming down or sorting the results set.
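
   A rough Python sketch of deduping, merging, and ranking follows. The
   records, the relevance scores, and the matching rule (normalized
   title plus year) are all assumptions made for illustration; real
   records are far messier, and scores from different databases are
   seldom directly comparable.

```python
import re

# Hypothetical records as they might come back from two sources.
RESULTS = {
    "Catalog A": [
        {"title": "Digital Libraries", "year": 1999, "score": 0.9},
        {"title": "Metadata Basics", "year": 2000, "score": 0.4},
    ],
    "Index B": [
        {"title": "Digital libraries.", "year": 1999, "score": 0.7},
        {"title": "Searching the Web", "year": 2001, "score": 0.8},
    ],
}

def match_key(record):
    """Crude duplicate key: lowercased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record["year"])

def merge_results(results_by_source):
    """Merge all sources, drop duplicates, and rank by best score."""
    merged = {}
    for source, records in results_by_source.items():
        for record in records:
            key = match_key(record)
            if key not in merged:
                # First time we see this item: keep it and note its source.
                merged[key] = {**record, "sources": [source]}
            else:
                # Duplicate: remember the extra source, keep the best score.
                kept = merged[key]
                kept["sources"].append(source)
                kept["score"] = max(kept["score"], record["score"])
    return sorted(merged.values(), key=lambda r: r["score"], reverse=True)

merged = merge_results(RESULTS)
for record in merged:
    print(record["title"], record["year"], record["score"], record["sources"])
```

   Here the two "Digital Libraries" records collapse into one entry
   credited to both sources. A stricter or looser matching rule would
   trade missed duplicates against false merges.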

   Unfortunately, most of these features are likely to be difficult to
   build and probably extremely difficult to build with much accuracy.
   But increasingly, librarians serving user groups from the general
   public to academic researchers are realizing that it is a goal well
   worth pursuing.
     __________________________________________________________________

LINK LIST

   Copernic Aggregator
   www.copernic.com/products/aggregator

   Cross-Database Search Tools Summary
   sunsite.berkeley.edu/Web4Lib/archive/0109/0021.html

   Endeavor ENCompass
   www.endinfosys.com/encompass.htm

   Ex Libris's MetaLib
   www.exlibris-usa.com/Metalib/overview.html

   Flashpoint
   lib-www.lanl.gov/lww/flashpoint.htm

   "Flashpoint @ LANL.gov: A Simple Smart Search Interface"
   www.library.ucsb.edu/istl/01-summer/article2.html

   Fretwell-Downing's Zportal
   www.fdgroup.com/fdi/zportal/about.html

   King County Library Search
   www.kcls.org/wf/webfeatfaq.html

   Multi-SEARCH
   www.library.arizona.edu/indexes/links/multisearch.shtml

   NLM Gateway
   gateway.nlm.nih.gov/gw/Cmd?Overview.x

   OCLC SiteSearch
   www.oclc.org/oclc/menu/site.htm

   Searchlight
   searchlight.cdlib.org/cgi-bin/searchlight

   WebFeat
   www.webfeat.org