Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: this is an archive of my Library Journal column from 1997-2007. The columns have not been altered in content, so please keep in mind that some of this content will be out of date.
Getting to "the Right Stuff"
In a world awash with information, finding what you really want can be difficult. Any database or web index can deliver a set of results. But it's particularly difficult to highlight the most relevant "stuff." Web search engines such as Google and Yahoo try their best to recommend some items over others, and now libraries are trying to do this for their holdings.
The classic Google-style ranking algorithm takes into account not only the popularity of a particular web page (measured by how many sites link to it) but also the popularity of the web sites that point to it. A really popular "pointer" page produces higher-ranked results. It's harder to apply the same techniques to searching for books or music, so other strategies for highlighting popular items are required. For example, both Amazon and iTunes offer some version of "those who bought this also bought these" recommendations.
Using circulation data
Libraries can do something similar if they make use of circulation data, and OCLC is experimenting with just that. "We would like to match circulation data to WorldCat titles to see if it is possible to group or cluster circulations by titles and if these titles are related by subject, author, publisher, etc.," says Lynn Connaway of the OCLC Office of Research. "If we can develop a useful system for clustering circulation data, we may be able to develop a system similar to Amazon's."
Connaway quickly adds that OCLC does not wish to link circulation data to an individual's identity; it needs to know only that one borrower checked out a group of titles. This data could then be used to create clusters for recommending books deemed interesting or related. OCLC's efforts are still at an early stage and may never show up in a production service if the research does not demonstrate a useful effect.
There are still other ways to recommend books, and another OCLC Research project provides an example.
Starting this month, anyone viewing an Open WorldCat record can write a review, just like on Amazon. OCLC Research has developed a Wiki-based application called "WikiD" ("Wiki for data") that gathers the reviews in a system entirely separate from, but linked to, WorldCat.
RedLightGreen
For over a year, the Research Libraries Group (RLG) has offered RedLightGreen, an interface to the RLIN catalog designed for undergraduates. It uses commercial software from Recommind and a finely tuned algorithm to rank search results. "The Recommind relevance ranking can be tuned to give more or less weight to different elements of the document in computing the score," says Joe Zeeman of RLG. "We spent considerable time adjusting the relative weights of names and titles versus subjects and notes to get relevance scores we were happy with. The relevance score, however, is only a piece of the information we use to organize the result display in RedLightGreen."
"We combine the numeric relevance scores...in a search result by the Recommind search engine with holdings information for all manifestations of the work taken from the RLIN Union Catalog to arrive at an aggregate score that combines relevance to the query with scholarly importance as evidenced by the purchasing decisions of our reporting libraries."
Ranking techniques
Meanwhile, we at the California Digital Library (CDL) are experimenting with different techniques for ranking and recommending within library catalogs, thanks to a Mellon Foundation grant. The project has just gotten underway, but we, too, will be examining circulation and holdings information. "People choose to use systems that are fun, easy to use, responsive, and allow individual quirks and preferences," states Peter Brantley, CDL director of technologies. "We are exploring personalization strategies because we want to build systems that people will use, and use happily, to find the resources they need."
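For readers who think in code, two of the ideas discussed above can be sketched in a few lines of Python: recommending titles that turn up in the same anonymous checkout groups, and blending a relevance score with holdings counts. This is only an illustration, not OCLC's, RLG's, or CDL's actual method. The function names, the 0.7 weight, the way holdings are normalized, and the sample titles are all assumptions made up for the example.

```python
from collections import Counter


def co_borrowed(checkouts, title, top_n=3):
    """Rank titles by how often they share an anonymous checkout group with `title`.

    `checkouts` is a list of title groups; no borrower identity is stored.
    """
    counts = Counter()
    for group in checkouts:
        titles = set(group)  # ignore repeat checkouts within one group
        if title in titles:
            counts.update(titles - {title})
    # Sort by count (descending), then by title so ties come out in a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:top_n]]


def aggregate_score(relevance, holdings, max_holdings, weight=0.7):
    """Blend a 0-1 relevance score with holdings scaled to 0-1.

    The weight and the linear scaling are assumptions for illustration only.
    """
    popularity = holdings / max_holdings if max_holdings else 0.0
    return weight * relevance + (1 - weight) * popularity


# Hypothetical sample data: each inner list is one anonymous group of checkouts.
checkouts = [
    ["Dune", "Neuromancer", "Foundation"],
    ["Dune", "Foundation"],
    ["Dune", "Emma"],
    ["Emma", "Persuasion"],
]

print(co_borrowed(checkouts, "Dune"))
print(aggregate_score(relevance=0.9, holdings=10, max_holdings=200))
print(aggregate_score(relevance=0.8, holdings=200, max_holdings=200))
```

With this sample data, "Foundation" is the title most often borrowed alongside "Dune," and the widely held item outscores the slightly more relevant but rarely held one. A real system would need far more data and careful tuning of the weight, which is exactly the kind of adjustment Zeeman describes.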
All of these projects aim to achieve for libraries what Google and others have done for web indexes. Typing in a few search words and retrieving thousands of undifferentiated hits doesn't do anyone any good. The better we can highlight the right hits for a particular need, the more satisfied our clients will be. It isn't enough to get users to "stuff"; we need to get them to "the right stuff."
Link List
CDL Melvyl Recommender Project
Recommind
RedLightGreen
Wiki WorldCat Pilot
WikiD
WikiD PowerPoint