Why Engage in Preservation of Data for the Long Term?
Lisa R. Johnston
- Digital materials degrade gradually; software and hardware become obsolete
- Multiple independent copies of data stored on different devices
- Digital repositories can
- Store data in formats that will remain accessible in the future
- Recreate data in persistent format with metadata describing the original object
- Create virtual computing environment simulating original system or software
- The Digital Preservation Handbook provides a practical guide, including getting started
The data files that you create today will not stay intact forever. They can degrade at the bit level, and their usability declines as the operating systems, software programs, and hardware needed to interpret them become unavailable. In fact, all digital files, not just research data, are at risk of becoming obsolete. Luckily, several strategies can help mitigate known risks and protect your digital files for years to come.
Storing the data in a safe place is the first defense against deterioration. Storage media (e.g., magnetic tape or disk) have optimal environmental conditions and a life expectancy that help determine when a media refresh is needed. Keep several identical (copied at the bit level) but separate copies of the files on geographically separated and technologically diverse devices. Note that keeping independent replicas is not the same as backup. In a typical storage backup system, a copy of the data is replicated at multiple locations, and when a change is made to one copy, that change is mirrored to the other locations according to the backup schedule (e.g., nightly). Therefore, if one copy is compromised, perhaps through bit rot, all copies can be overwritten with the corrupt version. Additionally, the preservation of digital content demands disaster planning and risk mitigation. Determine the risks if some or all of the backups fail. Could the data be recovered (e.g., by applying digital forensic approaches), and at what cost?
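One way to keep a corrupted copy from silently replacing good ones is to record a checksum when the data are first stored and to check every replica against it before any synchronization. The following is a minimal sketch of such a fixity check, not a description of any particular backup product. The function names (`file_digest`, `damaged_replicas`) and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def damaged_replicas(replicas: List[Path], expected_digest: str) -> List[Path]:
    """Return the replicas whose current digest differs from the digest
    recorded when the file was first stored (hypothetical workflow)."""
    return [p for p in replicas if file_digest(p) != expected_digest]
```

A replica reported by `damaged_replicas` would then be restored from a copy that still matches the recorded digest, rather than being allowed to propagate to the other locations.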
Another long-term preservation solution is to deposit your data in an established and trusted digital repository. Disciplinary data repositories, such as GenBank or Dryad, or institutional archives, such as the Data Repository for the University of Minnesota (DRUM), will ensure that your data are protected, monitored, and safely shepherded into the future. The techniques repositories use include:
Format migration: Move the contents of a data file from one format to another so that the information remains intact and accessible using whatever software is current.
Format standardization (or normalization): Convert the data files into a consistent (often nonproprietary and prevalent) format that will continue to be read by software for a long time (e.g., ASCII-based text files) or by a wider audience.
Encapsulation: Recreate data in a persistent format, such as XML, with ample metadata (using standards such as PREMIS, Preservation Metadata Implementation Strategies) to describe the original object.
Emulation: Create a virtual computing environment that simulates the original system or software required to run an obsolete data format (e.g., digital art objects).
Preserve the old technology: Maintain the obsolete technology (software and hardware) in order to access the data in their native environment. (Note that Harvey (p. 132) warns that this approach is “…‘ultimately a dead end’ because obsolete technology cannot be maintained in a functional condition indefinitely.”)
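To make the encapsulation technique in the list above more concrete, here is a minimal sketch that wraps a file's bytes together with descriptive metadata in a single XML document. The element names (`preservedObject`, `originalFormat`, and so on) are hypothetical and are not taken from the PREMIS schema; a real repository would follow the metadata standard it has adopted.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def encapsulate(name: str, payload: bytes, original_format: str) -> bytes:
    """Wrap payload bytes and basic metadata in one XML document."""
    root = ET.Element("preservedObject")  # hypothetical element names throughout
    meta = ET.SubElement(root, "metadata")
    ET.SubElement(meta, "originalName").text = name
    ET.SubElement(meta, "originalFormat").text = original_format
    ET.SubElement(meta, "sizeInBytes").text = str(len(payload))
    ET.SubElement(meta, "sha256").text = hashlib.sha256(payload).hexdigest()
    # Base64 keeps arbitrary binary content safe inside a text-based format.
    content = ET.SubElement(root, "content", encoding="base64")
    content.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def extract(document: bytes) -> bytes:
    """Recover the payload and check it against the recorded digest."""
    root = ET.fromstring(document)
    payload = base64.b64decode(root.findtext("content"))
    if hashlib.sha256(payload).hexdigest() != root.findtext("metadata/sha256"):
        raise ValueError("payload does not match the recorded digest")
    return payload
```

Because the metadata travels with the content, a future reader who can parse XML has what is needed to identify the original format even if the original software is gone.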
Finally, a number of programs are emerging to help preserve digital objects across multiple institutions. Notable examples include the Lots of Copies Keep Stuff Safe (LOCKSS) initiative and the Digital Preservation Network (DPN). These “dark” archives protect digital content in the event of administrative or physical failures. In the case of the DPN, users access the network through nodes (such as the DuraCloud Vault, a service from Duraspace) to deposit their digital content safely.
Local Services for Data Preservation
Data Repository for the University of Minnesota (DRUM) — an open access (public) repository for data that also provides digital preservation of the files for at least 10 years after deposit.
University Archives Electronic Records Unit — archives and preserves university content that is born-digital and/or digital (selection criteria apply).
Digital Preservation Management
(Resource, MIT, Dr. Nancy McGovern, Digital Preservation Workshop)
The Digital Preservation Handbook
(Resource, Digital Preservation Coalition)
Recommended Formats Statement
(Resource, Library of Congress)
Preserving Digital Objects with Restricted Resources
(Resource, Northern Illinois University)
Recommended Tools for Data Preservation
Learn more about the following tools that can facilitate data preservation.
The University of Minnesota Libraries keeps a list of the tools it uses to preserve its wealth of digital materials.
US Library of Congress Digital Preservation — offers a directory of tools and publishes the latest research in file format standards and approaches for preservation.
Used for file format identification, JHOVE identifies, validates (based on the binary header information), and gathers representation metadata for the file such as file pathname or URI, last modification date, byte size, format, format version, MIME type, format profiles, and more.
Software tool that transfers files from one location to another using MD5 checksums that work like fingerprints to verify that the migration was perfect.
Used for generating checksums (MD5) and for returning file format identification reports, DROID links to the PRONOM registry to validate technical information.
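The idea of identifying a format from a file's binary header, mentioned in the tool descriptions above, can be illustrated with a short sketch. This is not how JHOVE or DROID are implemented, and it does not consult the PRONOM registry; it only compares the first bytes of a file against a small, hand-picked table of well-known signatures. The names `SIGNATURES`, `identify`, and `identify_file` are assumptions made for this example.

```python
from typing import Optional

# A few widely documented leading byte sequences ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    # ZIP is also the container for formats such as DOCX and ODT,
    # so this match alone cannot tell those apart.
    (b"PK\x03\x04", "application/zip"),
]


def identify(header: bytes) -> Optional[str]:
    """Return a MIME type if the header starts with a known signature."""
    for magic, mime_type in SIGNATURES:
        if header.startswith(magic):
            return mime_type
    return None


def identify_file(path: str) -> Optional[str]:
    """Read only the first bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Identification by signature is more reliable than trusting a file extension, which can be missing or wrong, but full validation of a format requires parsing the whole file, which is the kind of work dedicated tools perform.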
Harvey, Ross. “Environmental Storage Conditions for Magnetic Tape, CD-ROM, and DVD.” In Preserving Digital Materials, 125-6. Berlin: Walter de Gruyter, 2005.