The terms "backup" and "archiving" are often used interchangeably because both involve saving a specific version of a file, but they are very different processes. "Backup" refers to making copies of files with the knowledge that the files may change. Backups are kept for a set period and can be discarded once that period has passed. Archiving is used when a file is to be preserved as-is, often at the end of a project, and the archived file acts as a static (and usually final) record. [source - DataONE education module]
In addition to planning for local archive storage options (local server, network or SBU’s digital repository), we recommend that you investigate public data repositories within your subject area or discipline. A searchable list of repositories can be found here, and a list of repositories by discipline is here. See Data Repositories for more information on that option.
In many cases, SBU’s Open Access repository Academic Commons Data can be a suitable archive and sharing mechanism for your data. All items deposited into Academic Commons receive a persistent identifier (DOI or ARK), are freely available to anyone, and are full-text searchable, making them discoverable through Google, Google Scholar and other large search engines. If you are interested in depositing data into Academic Commons, or have further questions, please contact me.
Digital data are fragile, regardless of which storage medium you choose (DVD, hard disk, tape, etc.). They are susceptible to bit rot, the gradual corruption of stored bits, and are likely to degrade over time. The recommended methods for combating bit rot are refreshment and replication.
Refreshment: Periodically copy your data onto a new drive or disk (every 2-5 years).
Replication: Maintain your original copy, an external copy, and an external remote copy. Use at least two forms of storage in two different locations.
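The replication step above can be sketched in Python. This is a minimal illustration, not a prescribed workflow: the function names sha256_of and replicate and the destination folders are hypothetical, and checksum comparison is an assumption we add here as one common way to confirm that each copy matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(original: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `original` into each destination folder and verify every copy.

    Returns a mapping from each copy's path to True if its checksum
    matches the original, False otherwise.
    """
    expected = sha256_of(original)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the file system allows it
        copy_path = Path(shutil.copy2(original, directory))
        results[copy_path] = sha256_of(copy_path) == expected
    return results
```

Re-running sha256_of on each copy at a later date and comparing the result with the recorded value is one way to notice silent corruption before it is time to refresh the media.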
For long-term archiving of finalized data, personal computers and external storage devices are NOT recommended.
Does anyone remember Quattro Pro or Lotus 1-2-3? Exactly. When you archive the final version of your datasets, consider using an open, non-proprietary format to ensure that you will be able to fully access them in the future. Common open formats include plain text (ASCII) for text-based data, and HDF and NetCDF for structured scientific data. Multimedia formats include JPEG 2000, MNG and PNG. For a list of many other open formats, see here.
If you prefer to keep your data in a proprietary format, there are a couple of ways to ensure continued access to older datasets. When new software versions are released and become established, migrate your older datasets to the newer version or package. If the software becomes obsolete, you may be able to emulate it using a virtual machine. The recommended best practice, however, is to convert your data to an open format, which facilitates both preservation and sharing.
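As a rough illustration of converting to an open format, the sketch below writes tabular records to a UTF-8, comma-separated plain-text file using Python's standard csv module. It assumes the records have already been read out of the original software (for example through that program's own export feature); the function name export_to_csv and the sample column names are hypothetical.

```python
import csv
from pathlib import Path


def export_to_csv(rows: list[dict], columns: list[str], target: Path) -> None:
    """Write rows to a UTF-8, comma-separated plain-text file with a header."""
    # newline="" lets the csv module control line endings itself
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample records standing in for a real dataset
    sample = [
        {"site": "A", "temperature_c": 12.5},
        {"site": "B", "temperature_c": 14.1},
    ]
    export_to_csv(sample, ["site", "temperature_c"], Path("dataset_final.csv"))
```

A file written this way can be opened in any text editor or spreadsheet program, which is the property that makes plain text attractive for long-term access.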