Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Cross-Database Search: One-Stop Shopping
10/15/2001
You know you want it. Or you know someone who does. One search box and
a button to search a variety of sources, with results collated for easy
review. Go ahead, give in--after all, isn't it true that only
librarians like to search? Everyone else likes to find.
Why should we make our users hunt down the best resource for a given
information need and learn how to use its particular options for
searching? Why not provide them with a simple way to get started? In
the past, we might argue that such a wide-ranging search service was
too difficult or impossible to build. It remains difficult, certainly,
but such a service can no longer be called impossible, as these
examples show.
Cross-database search services
Some early adopters of this type of technology use commercial
applications, while others have built their systems from scratch.
Unfortunately, because most of these services search commercial
databases, the curious are often locked out. However, in a message to
the Web4Lib electronic discussion ("Cross-Database Search Tools
Summary"), I listed staff contacts for some of these services. Also,
some are publicly searchable.
Searchlight. The California Digital Library (CDL) has offered its
Searchlight tool since January 2000. Based on the Database Advisor
service of the University of California (UC)-San Diego, Searchlight
offers one-stop searching of abstracting and indexing databases,
library catalogs, and web sites, as well as other types of resources.
After selecting which "flavor" of Searchlight they wish to search
(either "Sciences and Engineering" or "Social Sciences and
Humanities"), users type in their search terms. An intermediary screen
describes what is happening and also counts down the minute that it
will take by default before results are returned (this can be adjusted
by the user).
The user's search words are sent to a wide variety of databases (well
over 100), with the results organized by resource type (books, journal
indexes, electronic journals, e-texts and documents, reference
resources, and web directories). The number of hits is noted beside
each resource, and clicking on that number will automatically take
users to the results in that particular database when possible.
Alternatively, they can click on a link to go to the resource and
search it directly. Anyone can try it out, but those not part of the UC
community won't see results for licensed databases.
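For the technically curious, the broadcast-and-wait pattern described
above can be sketched in a few lines of Python. This is only an
illustration, not how Searchlight itself is built: the target names,
delays, and hit counts are made up, and search_target is a hypothetical
stand-in for a real remote search. The query goes to every target at
once, slow targets are dropped when the time limit expires, and hit
counts are grouped by resource type.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical targets: (name, resource type, simulated delay in seconds, hit count)
TARGETS = [
    ("Catalog A", "books", 0.01, 12),
    ("Index B", "journal indexes", 0.02, 40),
    ("Slow Index C", "journal indexes", 1.0, 7),
]

def search_target(name, delay, hits, query):
    """Stand-in for a real remote search: sleeps, then returns a hit count."""
    time.sleep(delay)
    return hits

def broadcast_search(query, targets, timeout):
    """Send the query to every target in parallel and group hit counts by
    resource type. A target that has not answered within `timeout` seconds
    is reported with a count of None."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {
        pool.submit(search_target, name, delay, hits, query): (name, rtype)
        for name, rtype, delay, hits in targets
    }
    done, _pending = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on targets that are still running
    grouped = {}
    for future, (name, rtype) in futures.items():
        count = future.result() if future in done else None
        grouped.setdefault(rtype, []).append((name, count))
    return grouped

if __name__ == "__main__":
    for rtype, hits in broadcast_search("digital libraries", TARGETS, 0.3).items():
        print(rtype, hits)
```

A user-adjustable time limit, like the one Searchlight offers, would
simply be a different value for the timeout argument.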
The CDL is planning to gather user feedback this fall on how it works
for its own needs and then consider where to take the service. Possible
future directions may include subject-focused cross-database search
tools (for example, one-stop searching of all the best resources in a
particular discipline), or a tool optimized for the needs of
undergraduates to find "a few good things" on a topic.
NLM Gateway. This cross-database search service covers a variety of
databases from the National Library of Medicine (NLM). A web page
describing the service states, "One target audience for the Gateway is
the Internet user who is new to NLM's online resources and does not
know what information is available there or how best to search for it."
Flashpoint. The Research Library of the Los Alamos National Laboratory
wrote an in-house Perl program to search a set of databases
simultaneously. In the article "Flashpoint @ LANL.gov: A Simple Smart
Search Interface," the authors describe how their system underwent
several design iterations in response to user feedback, testing, and
analysis of failed searches. It presently searches nine bibliographic
databases and one full-text database.
King County Library Search. The King County Library System in
Washington State uses the commercial product WebFeat to offer one-stop
searching of its library catalog, web site, and ProQuest databases. The
system was released in November 2000 for user testing, so not all the
databases planned to be included are yet covered. Users are limited to
library cardholders.
Multi-SEARCH. The University of Arizona Library uses OCLC's SiteSearch
software to search multiple databases. SiteSearch uses the Z39.50
protocol to search databases that are compliant with that standard--in
this case, three state catalogs and the OCLC FirstSearch databases.
Software for cross-database searching
Several sites use the WebFeat product to search multiple databases.
Other products that offer similar capabilities include
Fretwell-Downing's Zportal, MetaLib from Ex Libris, Copernic
Aggregator, Endeavor ENCompass, and OCLC's SiteSearch. Several other
site developers have written their own software but then must maintain
it as resources (search targets) change.
More libraries than those noted above are developing their own
cross-database search services, including OhioLink and the National
University of Mexico (UNAM). It's clear that there is a widely
perceived need for one-stop searching of bibliographic databases,
though it is still too early to have much data on which features are
essential.
One key challenge for software of this type is how to package up the
search and process the results. Unless the database supports the Z39.50
search protocol, it can be daunting to deal with the particular needs
of a proprietary database. Even if sending the search is
straightforward, the results may have to be extracted via a somewhat
primitive technique called "screen scraping." Screen scraping is the
process of collecting needed information by clues such as the location
of the information on the screen. The problem is that the slightest
change in screen displays can break your process. Some applications are
limited to Z39.50 databases, while others (such as Searchlight)
encompass other databases as well. In general, the more databases a
search service covers, the more challenges it will face.
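To see why screen scraping is so fragile, consider this minimal Python
sketch. The page fragments and the pattern are invented for
illustration and do not come from any real vendor's interface. The
scraper finds the hit count by looking for the markup that surrounds
it, so a purely cosmetic redesign is enough to make it fail.

```python
import re

# Hypothetical pattern tied to one (imaginary) vendor's results page layout.
HIT_COUNT = re.compile(r"<b>Results:</b>\s*(\d+)\s+records found")

def scrape_hit_count(html):
    """Return the hit count found in the page, or None if the layout clue is missing."""
    match = HIT_COUNT.search(html)
    return int(match.group(1)) if match else None

old_page = "<p><b>Results:</b> 42 records found</p>"
new_page = "<p><strong>Results:</strong> 42 records found</p>"  # cosmetic redesign only

print(scrape_hit_count(old_page))  # 42
print(scrape_hit_count(new_page))  # None
```

The information on the second page is identical to a human reader, but
the scraper no longer finds it. Someone has to notice and update the
pattern for every target that changes.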
Some early experience indicates that simply broadcasting the search and
getting back results from separate databases is a start but not what
most users really want or expect. Most users likely want such features
as deduping (dropping duplicate records from different databases),
merging and ranking (instead of keeping the results separated by the
source), and methods for trimming down or sorting the results set.
Unfortunately, most of these features are likely to be somewhat
difficult to achieve at all and extremely difficult to achieve with
much accuracy. But increasingly, librarians serving user groups from
the general public to academic researchers are realizing that it is a
goal well worth pursuing.
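As a rough illustration of deduping and merging, here is a minimal
Python sketch. Everything in it is an assumption made for the example:
the record fields (title and year), the match key, and the ranking rule
(records found in more databases come first) are not taken from any of
the products mentioned here.

```python
import re

def match_key(record):
    """Build a crude match key: lowercased title with punctuation collapsed
    to single spaces, plus the publication year."""
    title = re.sub(r"[^a-z0-9]+", " ", record["title"].lower()).strip()
    return (title, record.get("year"))

def merge_results(result_sets):
    """Merge per-database result lists into one list, dropping duplicates
    and noting which databases each record came from."""
    merged = {}
    for source, records in result_sets.items():
        for record in records:
            entry = merged.setdefault(
                match_key(record),
                {"title": record["title"], "year": record.get("year"), "sources": []},
            )
            if source not in entry["sources"]:
                entry["sources"].append(source)
    # Records found in more databases rank first; sorted() is stable, so
    # ties keep their first-seen order.
    return sorted(merged.values(), key=lambda entry: -len(entry["sources"]))

# Made-up sample data from two imaginary databases.
results = {
    "Index A": [
        {"title": "Cross-Database Search: One-Stop Shopping", "year": 2001},
        {"title": "Z39.50 Basics", "year": 1999},
    ],
    "Index B": [
        {"title": "Cross-database search: one-stop shopping.", "year": 2001},
    ],
}

for entry in merge_results(results):
    print(entry["title"], entry["year"], entry["sources"])
```

A key this simple also shows where the accuracy problem comes from. Two
records for the same article that differ in subtitle, year, or
transliteration will not match, while two distinct items with the same
short title and year will be wrongly merged.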
__________________________________________________________________
LINK LIST
Copernic Aggregator
www.copernic.com/products/aggregator
Cross-Database Search Tools Summary
sunsite.berkeley.edu/Web4Lib/archive/0109/0021.html
Endeavor ENCompass
www.endinfosys.com/encompass.htm
Ex Libris's MetaLib
www.exlibris-usa.com/Metalib/overview.html
Flashpoint
lib-www.lanl.gov/lww/flashpoint.htm
"Flashpoint @ LANL.gov: A Simple Smart Search Interface"
www.library.ucsb.edu/istl/01-summer/article2.html
Fretwell-Downing's Zportal
www.fdgroup.com/fdi/zportal/about.html
King County Library Search
www.kcls.org/wf/webfeatfaq.html
Multi-SEARCH
www.library.arizona.edu/indexes/links/multisearch.shtml
NLM Gateway
gateway.nlm.nih.gov/gw/Cmd?Overview.x
OCLC SiteSearch
www.oclc.org/oclc/menu/site.htm
Searchlight
searchlight.cdlib.org/cgi-bin/searchlight
WebFeat
www.webfeat.org