Library Journal "Digital Libraries" Columns 1997-2007, Roy Tennant

Please note: these columns are an archive of my Library Journal column from 1997-2007. They have not been altered in content, so please keep in mind some of this content will be out of date.
Coping with Disasters

11/15/2001
   Now that we've all witnessed a disaster that beggars the imagination,
   preventing disaster seems not only an appropriate topic but an
   imperative one. As Mike Handy, acting director of information
   technology services for the Library of Congress, said after September
   11, "Until recently, our planning efforts have assumed the most
   significant threats to be from accidental disruptions such as natural
   calamity, fire, power failure, etc. Obviously, now our assumptions
   include previously unimaginable possibilities."

   Although clearly the most unthinkable disaster would involve the loss
   of life or injury to library staff, that type of disaster planning is
   outside the scope of this column. Rather, I will consider what can be
   done to protect your digital library services and collections from the
   many disasters--whether they be outrageous or minor--that may befall
   those who do not prepare. Through planning and preparation, we can help
   prevent disasters from happening or minimize the damage once they do.

   Prevention

   "An ounce of prevention is worth a pound of cure" remains a valuable
   aphorism for disaster prevention. Everything that you can reasonably do
   to avoid or lessen the impact of disasters by planning ahead of time
   will be well worth your time, effort, and resources.

   For digital systems, the classic prevention technique is an effective
   protection system. Effective computer protection systems are
   constructed in layers. The first layer is the disk itself--or, more
   accurately, the way in which data are stored on the disk.

   A common way to protect data stored on hard disks is RAID technology.
   RAID, an acronym for Redundant Array of Inexpensive (or Independent)
   Disks, specifies various methods of storing data that trade off cost,
   performance, and protection in different ways. For example, RAID level
   1 is simply mirroring: every piece of data is written to two disks, so
   a complete copy survives if either disk fails, at the cost of doubling
   the disk space required. RAID level 5 instead distributes both the data
   and the parity information needed to reconstruct it across several
   physical disks. This uses disk space more efficiently than mirroring
   and lets the array keep working through the failure of any one drive,
   though not of two drives at once; writes are also slower, because the
   parity must be updated each time.
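   The parity idea behind RAID level 5 can be illustrated with a short
   Python sketch. It is a simplification, not how any particular RAID
   controller works: real arrays operate on fixed-size disk blocks and
   rotate the parity block among the drives, and the stripe contents here
   are made up. The key property is that the parity block is the XOR of
   the data blocks, so XOR-ing all the surviving blocks of a stripe
   rebuilds the one that was lost.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: List[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all of its data blocks."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: List[bytes]) -> bytes:
    """Recover the single lost block of a stripe (data or parity).

    Because parity = d0 ^ d1 ^ d2, XOR-ing every surviving block
    (parity included) cancels the known blocks and leaves the lost one.
    """
    return xor_blocks(surviving_blocks)


# Hypothetical stripe spread over four disks: three data blocks + parity.
stripe = [b"LIBR", b"ARY ", b"DATA"]
parity = make_parity(stripe)

# Simulate losing the disk that held the second data block.
survivors = [stripe[0], stripe[2], parity]
recovered = rebuild_missing(survivors)
print(recovered)  # b'ARY '
```

   Losing a second block from the same stripe leaves two unknowns and only
   one parity equation, which is why RAID level 5 survives only a single
   drive failure.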

   Layers of protection

   The second layer of protection includes such strategies as
   uninterruptible power supplies (which can prevent disk drive damage in
   power failures), fire suppression systems, alarm systems, and other
   methods for securing the computer disks or the room where they are
   kept.

   The third layer entails making copies of the data--backing it up. The
   typical computer backup system copies the data that you wish to retain
   to another disk, tape, or other digital medium. This backup can be
   incremental (only changed files are backed up) or complete. For better
   protection, the second copy should be stored at a location distant from
   the first (and I mean really distant--the farther the better,
   considering the impact of disasters such as earthquakes and
   hurricanes).
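   The difference between a complete and an incremental backup comes down
   to which files get copied. The sketch below assumes that a file's
   modification time is a reliable sign that it changed: it copies
   everything when no previous backup time is given, and otherwise only
   the files modified since then. Function names and file names are
   hypothetical; real backup tools typically also track deletions and keep
   a catalog of what was copied when.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional


def files_to_back_up(source: Path, since: Optional[float]) -> List[Path]:
    """List files under `source`; if `since` is given, only those modified after it.

    `since` is the POSIX timestamp of the previous backup. None means a
    complete backup of every file.
    """
    selected = []
    for path in source.rglob("*"):
        if path.is_file() and (since is None or path.stat().st_mtime > since):
            selected.append(path)
    return sorted(selected)


def run_backup(source: Path, destination: Path, since: Optional[float] = None) -> List[Path]:
    """Copy the selected files into `destination`, preserving relative paths."""
    copied = []
    for path in files_to_back_up(source, since):
        target = destination / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied


# Demo with throwaway directories and two hypothetical catalog files.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "data"), Path(tmp, "backup")
    src.mkdir()
    (src / "old.xml").write_text("unchanged record")
    (src / "new.xml").write_text("edited record")
    os.utime(src / "old.xml", (1_000, 1_000))  # pretend: modified long ago
    os.utime(src / "new.xml", (3_000, 3_000))  # pretend: modified recently
    full = run_backup(src, dst / "full")                      # complete backup
    incremental = run_backup(src, dst / "incr", since=2_000)  # changed files only
    full_names = sorted(p.name for p in full)
    incremental_names = [p.name for p in incremental]

print(full_names, incremental_names)  # ['new.xml', 'old.xml'] ['new.xml']
```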

   A common technique for locating data close to where it is needed can
   also serve as a de facto backup system. Called "mirroring," this
   technique was developed primarily in response to slow or costly
   Internet connections. For example, those in Australia must pay a
   per-byte charge for overseas Internet traffic. Therefore, it's helpful
   for them to copy, or mirror, popular sites locally. Not only can this
   serve as a de facto backup, but it can also be essential in emergencies
   when users can be shunted from the main site to the mirror location.

   What might be considered the logical endpoint of this technique is
   represented by a preservation scheme advanced by Stanford University.
   Called Lots of Copies Keep Stuff Safe (LOCKSS), the strategy employs a
   large pool of interconnected and physically distant computers that
   constantly share copies of each computer's data. If any single computer
   crashes, the data it contained could be recovered from other computers
   still online.
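   The intuition behind "lots of copies" can be shown with a toy example.
   This is not the LOCKSS protocol, which relies on repeated polls among
   peers and is designed to cope with far more than this sketch does; it
   only shows how several independent copies let a damaged or missing
   copy be detected (by comparing SHA-256 hashes) and replaced with the
   version a majority of peers agree on. Peer names and content are
   hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional


def digest(content: bytes) -> str:
    """Fingerprint a copy so peers can compare without exchanging full content."""
    return hashlib.sha256(content).hexdigest()


def repair_from_peers(copies: Dict[str, Optional[bytes]]) -> Dict[str, bytes]:
    """Given each peer's copy of one item (None if lost), find the version held
    by a strict majority of peers and restore it on every peer."""
    votes = Counter(digest(c) for c in copies.values() if c is not None)
    if not votes:
        raise ValueError("no surviving copy to recover from")
    winning_hash, count = votes.most_common(1)[0]
    if count <= len(copies) // 2:
        raise ValueError("no majority agreement; manual review needed")
    good = next(c for c in copies.values() if c is not None and digest(c) == winning_hash)
    return {peer: good for peer in copies}


peers = {
    "peer-1": b"vol. 12, no. 3",
    "peer-2": b"vol. 12, no. 3",
    "peer-3": b"vol. 12, no. 3",
    "peer-4": b"vol. 12, no. ?",  # damaged copy
    "peer-5": None,               # crashed machine
}
repaired = repair_from_peers(peers)
print(repaired["peer-5"])  # b'vol. 12, no. 3'
```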

   The system is designed to use standard-issue PCs, even those that would
   be too underpowered to run standard office applications (a typical
   LOCKSS installation would require only something like a PC with a
   100MHz Pentium chip, 32MB of RAM, and one or two large disks).

   Whether or not LOCKSS is used, hard disk storage is so cheap (you can
   find disk drives for as little as $3/GB now, and prices continue to
   drop) that there is no good reason not to create multiple redundant
   copies of critical data. This can (and should) be as simple as setting
   up a script to copy all of your data to additional hard drives each
   night. If those drives are physically distant (which the Internet makes
   easy), it is even better.
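   A nightly copy script of the kind described above might look like the
   following Python sketch. The destination directories stand in for
   additional hard drives (local, or mounted over the network from a
   distant site); every path here is a hypothetical placeholder. Each copy
   is checked against the original with a SHA-256 checksum, since a copy
   that was silently corrupted is not much of a backup. Running it each
   night would be left to a scheduler such as cron.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def replicate(source: Path, destinations: List[Path]) -> List[Path]:
    """Copy every file under `source` to each destination and verify it.

    Returns the copies whose checksum did not match (empty list on success).
    """
    bad = []
    for dest_root in destinations:
        for path in source.rglob("*"):
            if not path.is_file():
                continue
            target = dest_root / path.relative_to(source)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            if sha256_of(target) != sha256_of(path):
                bad.append(target)
    return bad


# Demo: one hypothetical image file copied to two stand-in "drives".
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "collection")
    (src / "images").mkdir(parents=True)
    (src / "images" / "map.tif").write_bytes(b"\x00\x01" * 1000)
    drives = [Path(tmp, "drive_a"), Path(tmp, "drive_b")]
    problems = replicate(src, drives)
    copies_ok = all((d / "images" / "map.tif").exists() for d in drives)

print(problems, copies_ok)  # [] True
```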

   Emergency response & recovery

   Good preparation includes knowing what you will do in the middle of an
   emergency. One of the quickest and easiest ways to solve an emergency
   situation is to route users to a mirror (see above). If, for example,
   www.xxx.org goes down, that domain name can quickly be assigned to the
   host computer that has a mirror. Once this change propagates through
   the Internet's Domain Name System (which can take from a few hours to
   a few days, depending on how long other name servers cache the old
   address), users will be none the wiser that they are going to a
   different physical location, since the domain name remains unchanged.

   If you're not lucky enough to have a mirror, you will need to do
   something else. What you do will depend on how essential your operation
   is to those who matter. If your data are important but not essential,
   then hang tight until the emergency passes and you can move on to the
   "recovery" stage. If your systems must be constantly responsive, then,
   one hopes, you will have determined ahead of time how you will cope.
   Again, planning is everything.

   Once the emergency has passed, you should know what steps must be taken
   to get everything back up and functioning. Specifically, you should
   know in advance how to install new hardware and software, retrieve data
   from a backup system, and get everything back online. Here you will
   discover just how well (or poorly) you have prepared. Those who have
   planned well will find this process to be quick and smooth, while those
   who haven't will find it time-consuming and difficult.

   Run with the big boys

   If you have data you can't afford to lose, you can't afford to be
   without a disaster plan. The plan should include aspects dealing with
   prevention, emergencies, and recovery. Luckily, there is little to
   prevent small libraries from having a disaster plan similar to that of
   the Library of Congress, which uses many of the techniques outlined
   here.

   If you don't know where to begin, start with the Federal Emergency
   Management Agency's Emergency Management Guide for Business &
   Industry, which will guide you through the process of making a plan
   that will get you through just about anything--except perhaps the
   unimaginable.
     __________________________________________________________________

LINK LIST

   Disaster Recovery Journal's Glossary
   www.drj.com/glossary/glossary.htm

   Emergency Management Guide for Business & Industry
   www.fema.gov/library/bizindex.htm

   Keeping Memory Alive: Practices for Preserving Content at the National
   Digital Library Program of the Library of Congress
   www.rlg.org/preserv/diginews/diginews4-3.html

   LOCKSS
   lockss.stanford.edu

   Public Library Association Tech Note: Digital Disaster Planning
   www.pla.org/technotes/disaster.html