Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant
Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Faculty and researchers at universities worldwide gather and interpret data, advocate new ideas, and extend human knowledge. This work is sometimes shared with other scholars and researchers as working papers, technical reports, and other forms of prepublication work. Some of this scholarship eventually appears in a peer-reviewed journal or book, but some does not. This preprint culture is strongest in the scientific and technical disciplines, but social scientists share similar works. This 'grey literature' is often difficult to find and even more difficult for librarians to collect systematically, manage, and preserve (see 'What Is Grey Literature?' in the link list).

But the web and other digital technologies are changing all that. A variety of web-based systems are becoming available for accepting deposits of papers. These systems make the research output of institutions easier to discover as well as manage and preserve. They also make it possible to share information globally through compliance with a standard metadata harvesting protocol.

For an institution wishing to implement a repository, there are now implementation models to consider and software decisions to make. Although you will need to know more to set up a repository, here is a beginning road map. For further information, your next move should be to read the recently released 'The Case for Institutional Repositories: A SPARC Position Paper.'

Software

Some systems are open source, while others are commercial. Foremost among the free software is ePrints, an open-source project from the UK's University of Southampton. The ePrints solution is squarely focused on the faculty working paper (also called preprint or e-print). The ePrints model assumes that faculty will directly upload their own prepublication scholarship for open access via an institutional or subject-based repository.
A number of institutions are now using this software, including Caltech and the Digital Library of the Commons at Indiana University.

Another package slated to become open source is DSpace, developed through a partnership between the MIT Libraries and Hewlett-Packard. DSpace is designed to be a more flexible solution than ePrints. It makes fewer assumptions regarding what type of object is being uploaded. Since the programmer who developed ePrints is now a key developer with the DSpace project, DSpace has roots in ePrints but has no doubt surpassed it. MIT is currently the only user, but once the software is released as open source, other institutions may choose to implement it.

The Berkeley Electronic Press (bepress) offers a commercial solution. Bepress already provided a sophisticated solution for peer-reviewed journals when the University of California entered into a codevelopment agreement with the press to add key features for institutional repository support. Now the bepress software is both compliant with key standards and simpler to use for those who do not need the peer review capabilities.

Implementation models

Choosing a software platform is but one essential step in creating an institutional repository. Perhaps more important is identifying an appropriate implementation model. There are nearly as many models as there are institutional repositories, but focusing on a few examples may highlight some important differences.

MIT uses a distributed model, championed by Southampton's Stevan Harnad and others as 'self-archiving,' whereby individual faculty upload and manage their own scholarly output. DSpace has the widest focus of any repository described here; it explicitly welcomes any scholarly object.
'Educational material in digital formats (e.g., online lecture notes, visualizations, simulations, original graphics) are some of the most valuable assets produced by colleges and universities today and are extremely important to the faculty that create them,' says Mackenzie Smith, DSpace's project director. 'Much of this material is really like a new kind of publication and clearly needs to be captured, managed, and often preserved... what better place to take responsibility for this than the library?'

The University of California's eScholarship uses a semidistributed model that assigns management responsibility to organizational units (research units, departments) that then assist faculty with uploading their papers.

Caltech uses a semicentralized model, wherein repository sites can be set up for any university unit, but the library uploads the papers on the faculty's behalf. Its digital collections range from computer science technical reports to theses and dissertations.

It is too early to tell what benefits will accrue to each model, but it is highly unlikely that any single model will work for all institutions. Each institution should consider alternative models in light of its particular circumstances.

Federation for free

Any institution implementing a repository using one of the software solutions described above will automatically expose its metadata to harvesting through the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). This protocol establishes a standard way for metadata about digital objects to be crawled (retrieved by software) from any repository that complies with the protocol. This harvested metadata can then be indexed along with other harvested metadata to provide one-stop searching for papers on a particular topic. For example, just days after the eScholarship Repository opened in April 2002, records for papers in that repository were showing up in locations such as the EconPapers site.
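To make the harvesting step concrete, here is a minimal sketch in Python of what a harvester does: it builds a ListRecords request and pulls identifiers, titles, and creators out of the XML that comes back. It assumes version 2.0 of the protocol and unqualified Dublin Core (the oai_dc metadata format). The base URL, the function names, and the sample response are all hypothetical, written by hand for illustration; a real harvester would fetch the URL over HTTP and follow resumption tokens for large result sets.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses carrying Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response (hypothetical repository and paper).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:paper-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A Hypothetical Working Paper</dc:title>
          <dc:creator>Example, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL for a ListRecords request against a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Extract identifier, title, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": rec.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in rec.findall(".//dc:creator", NS)],
        })
    return records


if __name__ == "__main__":
    # The base URL is a placeholder, not a real repository endpoint.
    print(list_records_url("https://repository.example.edu/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record["identifier"], "|", record["title"])
```

Because every compliant repository answers the same request in the same format, a service that harvests from many repositories can run this one routine against each of them and merge the results into a single index.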
Meanwhile, a project of the University of Michigan called OAIster (say 'oyster') has harvested over half a million records for digital resources using the Open Archives protocol. A significant number of these records come from institutional repositories.

Economic models

All of the repositories highlighted here began with support from their libraries. How each of these institutional repositories will be sustained over time may vary as much as the implementation models, but in all cases the long-term economic model is unclear. Will each academic institution decide to fund the repository as part of the basic infrastructure? Or will it require the library to charge participating departments for their use of the infrastructure? Although many active in the field expect institutions to fund these services as part of the underlying support for the academic enterprise, it is not yet clear that university administrations will agree.

Subject terminologies

One of the thorniest issues is the lack of a single controlled vocabulary for fields of scholarly pursuit. For example, 'medicine' may be a perfectly legitimate subject heading for one university, while it would be ridiculously broad for a medical school. When searching or browsing a specific repository, this may not be much of a problem. But as access to institutional repositories becomes federated in central portals, it becomes more problematic. How can a user profitably browse papers from a variety of repositories that use very different subject terminologies?

Publication and removal

Since at least some of what is deposited in institutional repositories is 'prepublication' work, at least a few papers will later be published in a journal. In some cases, faculty may request that their papers be removed from the institutional repository. eScholarship allows removal of papers, although a citation must always remain. Caltech is more conservative in that it disallows removal.
If the journal publisher does not require the removal of the prepublication version, it may still be useful for a reader to discover that a preprint was subsequently published by a journal. Putting such information into the record in the institutional repository is typically the responsibility of whoever deposited the paper originally.

From grey to black and white

Although the software and implementation model that an institution will choose are still anyone's guess, the likelihood that universities and research institutions will implement something is increasing. Institutional repositories fill an important void and are likely to remain a part of our information landscape. They provide much better access to this literature than has ever previously been possible and should be a no-brainer for most academic institutions.

__________________________________________________________________

Link List

bepress
bepress.com

Caltech Digital Collections
library.caltech.edu/digital

'The Case for Institutional Repositories'
www.arl.org/sparc/IR/ir.html

Digital Library of the Commons
dlc.dlib.indiana.edu

DSpace
web.mit.edu/dspace/live

EconPapers
econpapers.hhs.se

ePrints
www.eprints.org

eScholarship
escholarship.cdlib.org

eScholarship Repository
repositories.cdlib.org/escholarship

OAIster
oaister.umdl.umich.edu

Open Archives Initiative
www.openarchives.org

What Is Grey Literature?
www.nyam.org/library/greylit/whatis.shtml