Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
The Open Content Alliance
About a year ago, Google announced a project to digitize large numbers of books from five research libraries. Dubbed "the Google Five," the University of Michigan, Harvard, Stanford, Oxford, and the New York Public Library signed an agreement with Google to provide portions (or, in the case of Michigan, all) of their collections to Google to be digitized. A year later we still don't know much more about their procedures, but now Google is being sued for digitizing material under copyright while out-of-copyright books are beginning to appear on the Google Print web site.
By contrast, a similar initiative was recently announced about which we already know much more. Maybe that's why it's called the Open Content Alliance (OCA), put forward by the Internet Archive, Yahoo!, and a number of large libraries, including my employer, the California Digital Library. Microsoft shortly thereafter announced support as well, and additional libraries likely will join. Yahoo!, Microsoft, and the libraries themselves are paying the Internet Archive to digitize materials at 10 cents a page--an excellent price for nondestructive scanning. The resulting files will be made available at the Internet Archive web site and likely at other locations.
Open and accessible
Since the OCA is focusing on out-of-copyright material, it is dodging the legal fight that Google is taking head-on. This means that all OCA content will be viewable in its entirety online. But the project goes further. The digitized files and their associated metadata will be available for complete downloading, thereby allowing anyone to create singular presentations of this material. Some books are already available for downloading and printing.
The importance of this becomes clearer by visiting the Open Library site, where the Internet Archive has mounted a few dozen of the books already digitized. The method closely resembles paging through a physical book.
Although this presentation may seem compelling, some potential drawbacks soon become apparent. It's difficult to jump to a particular chapter, for example, and other features such as searching and the all-important ability to magnify the page don't work yet. Still, if you do not like this orientation, you can create your own. Clicking on "Details" while viewing an Open Library book pulls up a small window giving some core metadata about the title and a link to the Internet Archive site that allows anyone to download a PDF or DjVu format of the book, or even the entire package of digital files from which these presentations were created. These books, in other words, are as open and accessible as possible.
Beyond the books themselves, the process itself is open. Only days after the initiative was announced, the University of California partnership agreement with the Internet Archive was made available to the library press. By contrast, months after the Google Print initiative was made known, the University of Michigan, after some pressure, released its agreement with Google. No other library of the Google Five has so far released its agreement.
Principles and collaboration
The OCA effort, unlike that of Google, is based on respect for collections and the principles behind mass digitization of library materials. Research libraries, writes Dan Greenstein of the California Digital Library in a draft principles document, must "clearly and unambiguously begin articulating what public goods are served by massive digitization of their holdings," plus "articulate and agree to adhere to a set of principles" to ensure that the resulting products "support and promote these public goods."
It's unclear whether the OCA project will rival the Google Library project in size. Since it is easier for organizations to participate, the OCA will easily have more participants, but the Google project may lead in the number of digitized volumes if it fulfills its promise.
Only time will tell. In any case, more digitized content is likely a better thing overall.
The agreement between the University of California and the Internet Archive emphasizes that the initiative is collaborative, as both parties must agree to a protocol that will set up procedures for, among other things, moving the books to and from the Internet Archive digitization shop, identifying and attaching appropriate metadata to the scanned files, and assessing the scanned files against appropriate standards. Collaborations among participating libraries are also likely, if for no other reason than to minimize duplication. There are other opportunities for collaboration, not just among OCA libraries but also with the "Google Five" and many other institutions involved with digitizing content. Open digitized content, after all, is a growing boon to all of our libraries and the users we serve.
For more on the wired library, see the netConnect supplement mailed with the January, April 15, July, and October 15 issues of LJ.
__________________________________________________________________
Link List
California Digital Library www.cdlib.org
Google Print print.google.com
Internet Archive www.archive.org
Open Content Alliance www.opencontentalliance.org
Open Library www.openlibrary.org
The Open Library Vision www.openlibrary.org/details/openlibrary