roytennant.com :: Digital Libraries Columns

 

Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.


The New Cataloger


04/15/2006

   I've often said librarians should like any metadata they see. This is
   because we are entering an age where MARC no longer rules, since the
   21st-century library will be handling increasing amounts of
   born-digital material. Even now, librarians are using formats such as
   Dublin Core (DC), Metadata Object Description Schema (MODS), and
   Metadata Encoding and Transmission Standard (METS), among others, to
   capture and manipulate important data about various information
   resources. No single metadata standard is adequate for the job.

   Our job requires much more than facility with new formats. We will need
   new kinds of tools that are only now beginning to be imagined and
   created for a growing amount of born-digital material as well as books.
   Publishers are increasingly supplying machine-readable metadata about
   the publications they put out--largely to enable their books to be sold
   by Amazon and other online booksellers. These records could provide
   much enriching information to our existing MARC data if the
   infrastructure were in place to normalize the records. Publishers often
   provide cover art, pull quotes from reviews, descriptive text, author
   biographies, and other useful material that MARC records typically
   lack. Today, vendors like Syndetic Solutions supply this kind of
   material to libraries for on-the-fly display.

   The inside scoop

   How do I know this? I walk around with over 10,000 ONIX metadata
   records on my laptop that I downloaded from willing publishers. If we
   had a service to collect these records from publishers and make them
   available to catalogers, we could have access to many valuable facts
   about library materials. The real news is the completely new kinds
   of tasks catalogers will be expected to perform.

   In an online world, where there are many amazing free resources,
   librarians must get better at selecting and providing access to the
   right slice of this material. Part of this will entail harvesting
   (automated gathering) of metadata that describes freely available
   resources. OAIster.org, the mega-harvester site at the University of
   Michigan, has gathered records for over seven million freely available
   resources.

   Gathering is just a start

   As work at the California Digital Library, Cornell University,
   University of Illinois at Urbana-Champaign, and other places
   demonstrates, the 21st-century librarian must be good at normalizing
   and enriching selected piles of metadata. Metadata created for one
   purpose or system may not be optimized for another purpose or system.
   Also, when you aggregate a wide variety of metadata, you find a
   surprising number of variances in encoding practices as well as simple
   errors (see "Bitter Harvest").

   In response, we are investigating ways to normalize and enrich metadata
   for greater versatility. Our first success is a utility for normalizing
   and enriching dates. For example, when given a date as "1880s," the
   function will create four new date fields, from a normalized
   "1880-1889" to a set of date tokens for enabling searching (e.g., 1880,
   1881).
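
   As a rough illustration of the decade case described above, here is a
   minimal Python sketch. It is not the CDL utility itself: the function
   name and the four output field names are assumptions made for this
   example, since the column does not list the fields the real utility
   creates.

```python
import re


def normalize_decade(raw):
    """Expand a decade string such as "1880s" into derived date fields.

    The four field names are hypothetical; they stand in for whatever
    fields a real date-normalization utility would produce.
    """
    match = re.fullmatch(r"\s*(\d{3})0s\s*", raw)
    if match is None:
        # Not a decade expression; leave it for other rules or a cataloger.
        return None
    start = int(match.group(1)) * 10
    end = start + 9
    return {
        "normalized": f"{start}-{end}",   # range form of the decade
        "start_year": start,
        "end_year": end,
        # One token per year, so a search on any single year can match.
        "tokens": [str(year) for year in range(start, end + 1)],
    }


fields = normalize_decade("1880s")
print(fields["normalized"])
print(fields["tokens"])
```

   Returning None for anything that is not a plain decade keeps the
   sketch honest about its scope: other date forms would need their own
   rules or human review.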

   This type of operation can be executed as a record is captured and
   placed into a database, but other types of metadata transformation
   cannot be performed simply by software, e.g., assigning subjects.
   Experiments with topical clustering software have been encouraging but
   not flawless. The optimum solution may be to enable a cataloger to view
   the subjects the software assigns automatically and then remove or add
   topic assignments.

   A new toolbox

   We also see a need for tools that enable a group of records to be
   selected based on virtually any criteria and then transformed in a
   particular way (e.g., change all occurrences of X to Y). As such, the
   modern cataloger will one day be a software-enabled specialist who can
   gather, subset, normalize, and enrich piles of records for a specific
   audience or purpose.
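
   The select-then-transform idea above can be sketched in a few lines of
   Python. Everything here is illustrative: records are modeled as plain
   dictionaries, and the helper names and the field names ("type",
   "language") are assumptions for the example, not part of any existing
   cataloging tool.

```python
def transform_matching(records, matches, transform):
    """Return a new list, transforming only records that satisfy matches().

    Selected records are copied before being changed, so the input list
    is left untouched and the change can be reviewed before it is saved.
    """
    return [transform(dict(r)) if matches(r) else r for r in records]


def replace_value(field, old, new):
    """Build a transform that changes every occurrence of old to new in field."""
    def apply(record):
        if record.get(field) == old:
            record[field] = new
        return record
    return apply


# Hypothetical sample records with inconsistent language encoding.
records = [
    {"title": "Report A", "type": "text", "language": "eng"},
    {"title": "Map B", "type": "image", "language": "English"},
    {"title": "Report C", "type": "text", "language": "English"},
]


def is_text(record):
    """Selection criterion: only textual resources."""
    return record.get("type") == "text"


cleaned = transform_matching(
    records, is_text, replace_value("language", "English", "eng")
)
for record in cleaned:
    print(record["title"], record["language"])
```

   Keeping the selection rule and the transformation as separate
   functions means either one can be swapped out, which is the kind of
   flexibility "virtually any criteria" implies.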

   The real challenge is the retooling and reeducation of those already in
   the field. A number of LIS programs have adjusted their curricula. A
   good place to start is Karen Coyle's "Metadata: Data with a Purpose."
   The need for catalogers will not go away soon, but what they will be
   asked to do will be very, very different.

   For more on the wired library, see the netConnect supplement
   mailed with this issue and with the January, July, and October 15
   issues of LJ.
     __________________________________________________________________

                                             Link List
   Bitter Harvest
   www.cdlib.org/inside/projects/harvesting/bitter_harvest.html
   Date Normalization Utility
   www.cdlib.org/inside/diglib/datenorm
   Metadata: Data with a Purpose
   www.kcoyle.net/meta_purpose.html
   METS & MODS
   www.loc.gov/standards
   OAIster
   oaister.org
   ONIX for Libraries
   roytennant.com/proto/onix