Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Getting to "the Right Stuff"

09/15/2005
   In a world awash with information, finding what you really want can be
   difficult. Any database or web index can deliver a set of results. But
   it's particularly difficult to highlight the most relevant "stuff." Web
   search engines such as Google and Yahoo try their best to recommend
   some items over others, and now libraries are trying to do this for
   their holdings.

   The classic Google-style ranking algorithm takes into account not only
   the popularity of a particular web page (measured by how many sites
   link to it) but also the popularity of the sites that point to it. A
   link from a really popular "pointer" page counts for more, so the
   pages it points to rank higher.
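   The idea that a link from a popular page is worth more can be shown
   with a simplified, textbook version of PageRank computed by power
   iteration. This is only a sketch: Google's production ranking combines
   many other signals, and the damping factor, iteration count, and the
   small link graph below are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its own score among the pages it links to,
                # so a link from a high-scoring page passes on more.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "x" and "y" each have exactly one inbound link,
# but x's link comes from "hub", which three other pages point to.
links = {
    "p1": ["hub"], "p2": ["hub"], "p3": ["hub"],
    "hub": ["x"],
    "z": ["y"],
}
ranks = pagerank(links)
print(ranks["x"] > ranks["y"])  # True
```

   Both "x" and "y" have one inbound link, yet "x" ranks higher because
   its single link comes from the more popular page.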

   It's harder to apply the same techniques to searching for books or
   music, which don't link to one another the way web pages do, so other
   strategies for highlighting popular items are required. For example,
   both Amazon and iTunes offer some version of "those who bought this
   also bought these" recommendations.
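   Amazon and iTunes do not publish how their recommendations work, but
   the basic "also bought" idea can be sketched as counting how often two
   items turn up in the same purchase. The function names and sample
   baskets below are hypothetical; this is a generic co-occurrence count,
   not either company's method.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    pair_counts = Counter()
    for basket in baskets:
        # Dedupe and sort so each unordered pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def also_bought(item, pair_counts, top_n=3):
    """Return the items most often found alongside `item`."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase baskets.
baskets = [
    ["Dune", "Foundation"],
    ["Dune", "Foundation", "Hyperion"],
    ["Dune", "Hyperion"],
    ["Dune", "Foundation"],
    ["Emma", "Persuasion"],
]
pairs = co_occurrence(baskets)
print(also_bought("Dune", pairs))  # ['Foundation', 'Hyperion']
```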
   Using circulation data

   Libraries can do something similar if they make use of circulation
   data, and OCLC is experimenting with just that. "We would like to match
   circulation data to WorldCat titles to see if it is possible to group
   or cluster circulations by titles and if these titles are related by
   subject, author, publisher, etc.," says Lynn Connaway of the OCLC
   Office of Research. "If we can develop a useful system for clustering
   circulation data, we may be able to develop a system similar to
   Amazon's."

   Connaway quickly adds that OCLC does not want circulation data linked
   to an individual's identity; it would record only that one borrower
   checked out a group of titles. This data could then be used to create
   clusters for recommending books deemed interesting or related. OCLC's
   efforts are still at an early stage and may never show up in a
   production service if the research does not demonstrate a useful
   effect.
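   OCLC has not described how it would process the data, but the two
   steps Connaway mentions, keeping only anonymous groups of titles and
   then checking whether co-borrowed titles share a subject, can be
   sketched as follows. Everything here is hypothetical: the function
   names, the (patron_id, title) record format, and the sample subject
   headings are assumptions made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def anonymous_baskets(loans):
    """Turn (patron_id, title) loan records into anonymous title groups."""
    by_patron = defaultdict(set)
    for patron_id, title in loans:
        by_patron[patron_id].add(title)
    # Keep only the groups of titles; the patron IDs are discarded here.
    # Single-title groups say nothing about which titles go together.
    baskets = [frozenset(t) for t in by_patron.values() if len(t) > 1]
    # Shuffle so list order does not mirror the order patrons appeared.
    random.shuffle(baskets)
    return baskets


def shared_subject_rate(baskets, subjects):
    """Fraction of co-borrowed title pairs that share a subject heading."""
    pairs = shared = 0
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            pairs += 1
            if subjects.get(a, set()) & subjects.get(b, set()):
                shared += 1
    return shared / pairs if pairs else 0.0


# Hypothetical loan log and subject headings.
loans = [("p1", "Dune"), ("p1", "Foundation"),
         ("p2", "Dune"), ("p2", "Hyperion"),
         ("p3", "Emma")]
subjects = {"Dune": {"Science fiction"},
            "Foundation": {"Science fiction"},
            "Hyperion": {"Space opera"}}
baskets = anonymous_baskets(loans)
print(shared_subject_rate(baskets, subjects))  # 0.5
```

   Dropping the patron key before anything else is analyzed means later
   steps never see who borrowed what, only that the titles were borrowed
   together.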

   There are still other ways to recommend books, and another OCLC
   Research project provides an example. In a pilot debuting this month,
   anyone viewing an Open WorldCat record can write a review, just like
   on Amazon. OCLC Research has developed a wiki-based application called
   "WikiD" ("Wiki for data") that gathers the reviews in a system entirely
   separate from, but linked to, WorldCat.
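   WikiD's actual data model is not described here, but the "separate
   from, but linked to" design can be sketched as a review store that
   holds nothing but reviews keyed by a catalog record identifier, with
   the two joined only when a page is displayed. The class names, record
   ID, and sample review are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    author: str
    text: str


@dataclass
class ReviewStore:
    """Reviews kept outside the catalog, linked only by a record ID."""
    by_record: dict[str, list[Review]] = field(default_factory=dict)

    def add(self, record_id: str, review: Review) -> None:
        self.by_record.setdefault(record_id, []).append(review)

    def for_record(self, record_id: str) -> list[Review]:
        return list(self.by_record.get(record_id, []))


def render(record: dict, store: ReviewStore) -> str:
    """Join catalog data and reviews only at display time."""
    lines = [record["title"]]
    for review in store.for_record(record["id"]):
        lines.append(f"  {review.author}: {review.text}")
    return "\n".join(lines)


# Hypothetical catalog record; the catalog itself is never modified.
record = {"id": "rec-0001", "title": "Moby-Dick"}
store = ReviewStore()
store.add("rec-0001", Review("reader1", "Long, but worth it."))
print(render(record, store))
```

   Because the catalog record carries no review data, the review system
   can be changed or withdrawn without touching the catalog.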
   RedLightGreen

   The Research Libraries Group (RLG) has for over a year offered
   RedLightGreen, an interface to the RLIN catalog designed for
   undergraduates. It uses commercial software from Recommind and a finely
   tuned algorithm to rank search results.

   "The Recommind relevance ranking can be tuned to give more or less
   weight to different elements of the document in computing the score,"
   says Joe Zeeman of RLG. "We spent considerable time adjusting the
   relative weights of names and titles versus subjects and notes to get
   relevance scores we were happy with. The relevance score, however, is
   only a piece of the information we use to organize the result display
   in RedLightGreen."
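   Recommind's scoring is proprietary and is not described here, so the
   following is only a toy stand-in showing what "giving more or less
   weight to different elements" means: each query-term match is scaled
   by a per-field weight. The field names, the weights, and the sample
   records are invented for illustration.

```python
import re

# Invented weights: names and titles count more than subjects and notes.
FIELD_WEIGHTS = {"name": 3.0, "title": 3.0, "subject": 1.5, "note": 0.5}


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def weighted_score(query, record, weights=FIELD_WEIGHTS):
    """Sum query-term matches per field, scaled by that field's weight."""
    terms = tokenize(query)
    score = 0.0
    for field_name, text in record.items():
        weight = weights.get(field_name, 0.0)
        words = tokenize(text)
        score += weight * sum(words.count(term) for term in terms)
    return score


by_darwin = {"name": "Darwin, Charles", "title": "On the origin of species"}
about_darwin = {"title": "Victorian essays", "note": "Includes a chapter on Darwin"}
print(weighted_score("darwin", by_darwin))     # 3.0
print(weighted_score("darwin", about_darwin))  # 0.5
```

   Tuning, in this simplified picture, means adjusting the values in the
   weight table until results like the work by Darwin outrank a mere
   passing mention in a note.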

   "We combine the numeric relevance scores...in a search result by the
   Recommind search engine with holdings information for all
   manifestations of the work taken from the RLIN Union Catalog to arrive
   at an aggregate score that combines relevance to the query with
   scholarly importance as evidenced by the purchasing decisions of our
   reporting libraries."
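   RLG has not published its formula, so the blend below is an assumption:
   a weighted sum of normalized relevance and log-damped holdings, with an
   invented default weight. It is meant only to show how holdings can
   reorder results that relevance alone would rank differently. The
   function name and sample data are hypothetical.

```python
import math


def rank_works(results, holdings_weight=0.3):
    """Order works by a blend of query relevance and holdings.

    results: list of (work_id, relevance, holdings_count), where
    holdings_count is the number of libraries holding any manifestation.
    """
    if not results:
        return []
    # Normalize both signals to 0..1 within this result set.
    max_rel = max(rel for _, rel, _ in results) or 1.0
    # log1p damps the gap between, say, 400 and 4,000 holding libraries.
    max_hold = max(math.log1p(h) for _, _, h in results) or 1.0
    scored = []
    for work_id, rel, holdings in results:
        score = ((1 - holdings_weight) * rel / max_rel
                 + holdings_weight * math.log1p(holdings) / max_hold)
        scored.append((work_id, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [work_id for work_id, _ in scored]


results = [("A", 0.9, 2), ("B", 0.8, 400), ("C", 0.2, 10)]
print(rank_works(results, holdings_weight=0.0))  # ['A', 'B', 'C']
print(rank_works(results))                       # ['B', 'A', 'C']
```

   With holdings ignored, work A wins on relevance alone; once holdings
   count, the widely held work B moves to the top.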
   Ranking techniques

   Meanwhile, we at the California Digital Library (CDL) are experimenting
   with different techniques for ranking and recommending within library
   catalogs, thanks to a Mellon Foundation grant. The project has just
   gotten underway, but we, too, will be examining circulation and
   holdings information.

   "People choose to use systems that are fun, easy to use, responsive,
   and allow individual quirks and preferences," states Peter Brantley,
   CDL director of technologies. "We are exploring personalization
   strategies because we want to build systems that people will use, and
   use happily, to find the resources they need."

   All of these projects aim to achieve for libraries what Google and
   others have done for web indexes. Typing in a few search words and
   retrieving thousands of undifferentiated hits doesn't do anyone any
   good. The better we can highlight the right hits for a particular need,
   the more satisfied our clients will be. It isn't enough to get users to
   "stuff"; we need to get them to "the right stuff."
     __________________________________________________________________

   Link List
   CDL Melvyl Recommender Project
   Recommind
   RedLightGreen
   Wiki WorldCat Pilot
   WikiD
   WikiD PowerPoint