Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The Benefits of Grid Networks--Digital Libraries
Some recent events have made me think about grid networks. By grid networks I mean both networks of computers and networks of people connected together in a grid topology. One event was a post by Lorcan Dempsey, OCLC director of research, titled "WorldCat in Your Pocket," on his blog (see the Link List). In it he describes a computer cluster (or "grid") that OCLC recently acquired to speed up processing of a test version of WorldCat, the 56 million-record bibliographic database that is OCLC's playground.
Computer grids
A computer grid comprises a set of interconnected "nodes." Each node consists of one or more CPUs (the brains of the computer) and very fast but volatile memory (RAM). Software then parcels out a computing task across the grid. Because all the nodes work on the problem simultaneously, a grid can reach amazing heights of processing power. A Beowulf cluster, the type of grid system OCLC has, uses off-the-shelf PCs with dual processors and expanded RAM. Since commodity PCs are relatively inexpensive, a cluster of this type often costs much less than a mainframe computer while delivering similar or even greater computing power.
Computer grids are not just for research scientists. Google relies on this kind of technology to deliver search results from billions of web pages in seconds. Its grids have enough RAM to avoid reading data from disk, a notoriously slow operation (relatively speaking).
On his blog, Dempsey wrote that using the Beowulf cluster for processing "means that what might have taken a minute now takes two seconds, what might have taken an hour takes two minutes, what might have taken a month takes a day. For jobs that will fit entirely in memory (e.g., a 'grep' of WorldCat), avoiding disk input/output gives another factor of about 20, reducing one-hour jobs down to six seconds." Grep is a UNIX utility that searches text for a string or pattern; run against WorldCat, it finds specific text anywhere in a record.
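To make "parceling out a computing task" concrete, here is a minimal Python sketch of the scatter/gather pattern behind an in-memory grep: the records are split into chunks, each worker searches its own chunk, and the partial results are merged. This is an illustration under stated assumptions, not OCLC's software. The sample records and the function names `split`, `grep_chunk`, and `grid_grep` are hypothetical, and threads on a single machine stand in for cluster nodes. A real Beowulf cluster would send each chunk to a separate machine, typically through message-passing software such as MPI. In CPython, threads show the division of labor rather than deliver a true speedup.

```python
from concurrent.futures import ThreadPoolExecutor
import re


def split(records, n_nodes):
    """Parcel the records out into n_nodes roughly equal chunks (round-robin).

    Hypothetical helper: each chunk plays the role of one grid node's share.
    """
    return [records[i::n_nodes] for i in range(n_nodes)]


def grep_chunk(pattern, chunk):
    """Search one node's chunk in memory; return the (record_id, text) pairs that match."""
    regex = re.compile(pattern)
    return [(rid, text) for rid, text in chunk if regex.search(text)]


def grid_grep(pattern, records, n_nodes=4):
    """Scatter the search across workers, then gather and merge their hits.

    Threads stand in for cluster nodes here (an assumption for illustration).
    """
    chunks = split(records, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        # Each worker gets the same pattern and a different chunk.
        partial_results = pool.map(grep_chunk, [pattern] * n_nodes, chunks)
        hits = [hit for part in partial_results for hit in part]
    # Round-robin splitting scrambles order, so sort by record ID after gathering.
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sample records held entirely in memory (no disk I/O during the search).
    records = [
        (1, "Moby Dick / Herman Melville"),
        (2, "Digital libraries / William Y. Arms"),
        (3, "The library in the twenty-first century"),
        (4, "Beowulf: a new verse translation"),
        (5, "XML for libraries"),
    ]
    for rid, text in grid_grep(r"(?i)librar", records):
        print(rid, text)
```

Because each chunk is searched independently, adding nodes divides the work without changing the answer. The only coordination needed is the final merge.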
In short, with inexpensive, off-the-shelf hardware components, libraries can do what was once difficult, overly expensive, or impossible. When OCLC purchased its Beowulf cluster, it cost around $100,000-$120,000; the same system would now cost less. Storage is already ubiquitous and massive (see "Bigger, Cheaper, Everywhere," LJ 10/15/04, p. 26). Through grid networking technologies, processing power is heading the same way.
Social grids
Meanwhile, I've been working with new forms of professional communication and discussing their impact with colleagues. I call this phenomenon "social grid networking." By distributing a problem among a group of people, you are likely to get it solved faster, and probably better, just as with computer grids.
The channels of communication are many and varied. There are many blogs by librarians, and, as with any publication, you can quickly discover whether a given one is useful to you. Chat can be either one-on-one or in groups. Lately I've begun hanging out in the code4lib chat room, and it's remarkable how much I learn while also building stronger connections with colleagues.
Link sharing is another form of communication. Seeing what others with similar interests bookmark can be a useful form of current awareness. The unalog link-sharing community keeps me current with information and technologies useful to digital library developers.
Podcasting is a new communication method, although it is a one-to-many broadcasting technology rather than a many-to-many one. A podcast is a recorded message in MP3 format, suitable for downloading to an iPod (thus the term) or other MP3 players. Podcasters typically record on a regular schedule, much like a radio broadcast or newspaper column, and users can then download the MP3 file to their player and listen whenever it is convenient.
Grid benefits
What both computer and social grid networks offer librarians are faster, more effective ways to solve problems and to exploit opportunities. These same technologies mean that our users are increasingly able to use whatever methods of communication they wish. They also mean that libraries are being challenged to deliver information in whatever form(s) our users choose, whether new book lists via RSS or a podcast on how to research a topic. As the premier information profession, we should at least be familiar with the many ways information can be communicated, and ideally be equipped to use whichever form best suits a given purpose or audience.
Link List
Beowulf Clusters: www.beowulf.org
unalog: unalog.com
WorldCat in Your Pocket: orweblog.oclc.org/archives/000544.html