Your data should be securely stored and backed up regularly.
There are many options for storage and backup. Below are some terms to be familiar with, and things to consider when planning for storage and backup.
- Storage Recommended Practices
- Backup Recommended Practices
- Security and Sensitive Data
- Archiving and Preservation
- Related Information
Data Storage: (noun) The use of recording media to retain digital information. This is typically done in an easily accessible location, secondary to the location of collection (though not exclusively so). Examples include local or external hard drives and portable media, networked shared drives, cloud storage and more.
Cloud Storage: (noun) Data storage using large computer networks that connect communications, data, applications and computing to devices such as laptops, desktops, phones and tablets. Cornell has enterprise level contracts with many cloud storage providers, including Box, Google Drive, Amazon AWS, Microsoft Azure and others.
Backup: (noun) a copy of all or a portion of files on a system, in a separate location from the original data, to be used for short-term recovery in the case of corruption or loss; (verb) the act of creating a backup (when used as a verb, typically written as two words, "back up"). A backup is a snapshot in time of your files; how long a backup is kept and how many versions of the backups exist will vary by tool and service.
Archive: (verb) The transfer of material to a facility that appraises, preserves, and provides access to that material on a long-term or permanent basis; (noun) an organization that intends to preserve information for access and use by a specific community, or a site where machine-readable materials are stored, preserved, and possibly redistributed to individuals interested in using the materials. Visit our sharing and archiving data page for more information.
Storage Recommended Practices
A recommended practice is to keep at least three copies of your data: 1) "here" - a local copy on your laptop or desktop, where the files were created or collected, 2) "near" - an external copy on a different media type than the original and 3) "far" - an external copy in a geographically different location, such as a cloud storage service.
This is also called the Rule of Three: THREE copies, on at least TWO different media types, with ONE copy in an entirely different location.
(I.e., not in the same building, or, depending on your situation and needs, the same part of the country. This third copy would be invaluable in case of environmental risks, such as damage due to fire or water.)
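As a rough illustration, the Rule of Three can be written as a small check over an inventory of copies. This is only a sketch: the `DataCopy` class, the `meets_rule_of_three` function, and the example media and location labels are hypothetical names chosen for this example, not part of any Cornell tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset (hypothetical structure for illustration)."""
    label: str     # e.g. "here", "near", "far"
    media: str     # e.g. "laptop SSD", "external HDD", "cloud storage"
    location: str  # e.g. a building, city, or region


def meets_rule_of_three(copies, primary_location):
    """Check: at least 3 copies, at least 2 media types, at least 1 copy elsewhere."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


copies = [
    DataCopy("here", "laptop SSD", "office"),
    DataCopy("near", "external HDD", "office"),
    DataCopy("far", "cloud storage", "remote data center"),
]
print(meets_rule_of_three(copies, primary_location="office"))  # prints True
```

Dropping the "far" copy from the list makes the check fail, because only two copies remain and both sit in the same location.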
Remember that not all media are appropriate for long-term storage. Mechanical hard disk drives (HDD) have an average lifespan of just 4-6 years. Memory sticks are convenient, but are easily lost or stolen. Solid-state drives (SSD) can lose charge, and with it data, if left for long periods of time without power.
Read and understand your cloud storage terms of service. In what situations can they close your account? For how long can you restore deleted content? How many versions of your data can you restore? Is it based on number of versions, or how long since you last accessed them?
When selecting your storage tool(s), consider things like how much data you have (size and number of files), who you need to share with, your budget, how long you will need that type of storage, if you have special replication, performance or security needs, or if you need to hold data that requires restricted access or is under HIPAA or export control regulations.
Learn more about storage options supported by Cornell University at the Data Storage Finder tool.
Backup Recommended Practices
We recommend using an automated service to create regular backups. Cornell offers CrashPlan and EZBackup as options for departments and staff. Personal computers often come equipped with backup software, such as Backup for Windows and Time Machine for Macs. Don't store your backup on the same local hard drive as the original data (or on a partition of that drive)!
Make sure you know how to recover data from your backups before you need to do it in an emergency. Regularly check that your backup system is functioning properly.
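One way to rehearse a recovery is to restore a backup into a scratch folder and compare checksums against the original files. The sketch below assumes both folders are reachable as ordinary directories. The function names `sha256_of` and `find_mismatches` are hypothetical, and services such as CrashPlan or EZBackup have their own restore and verification procedures that this does not replace.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(original_dir, restored_dir):
    """List relative paths that are missing from, or differ in, the restored copy."""
    original_dir = Path(original_dir)
    restored_dir = Path(restored_dir)
    problems = []
    for src in sorted(original_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(original_dir)
        restored = restored_dir / rel
        if not restored.is_file() or sha256_of(src) != sha256_of(restored):
            problems.append(rel.as_posix())
    return problems
```

An empty list from `find_mismatches` means every original file was found in the restored folder with identical contents. It does not detect extra files that exist only in the restored copy.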
Synchronization with a cloud storage service is not the same as creating a backup. For example, if your computer is stolen and hacked, your cloud files may be just as vulnerable as those stored on the "local" space. If you have a corrupted file on your local machine, that corrupted file "syncs" to your cloud space.
Some cloud storage services, like AWS, Box and others, do provide both storage and backup services. Learn more about your options at the Data Storage Finder tool, or by talking to an RDMSG consultant.
Consider creating backups of just a portion, instead of all, of your data. For example, back up just the most valuable, important or vulnerable subset of your data, or just files that have changed since the last "snapshot" was taken. This can help minimize backup storage costs.
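Selecting only the files that changed since the last snapshot can be sketched using file modification times. This is a simplified illustration: the function name `changed_since` is hypothetical, and real backup tools typically track changes more robustly than by modification time alone.

```python
from pathlib import Path


def changed_since(root, last_snapshot_time):
    """Return relative paths of files under root modified after last_snapshot_time.

    last_snapshot_time is a POSIX timestamp (seconds since the epoch),
    for example a value saved from time.time() when the last backup ran.
    """
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.stat().st_mtime > last_snapshot_time
    )
```

For example, `changed_since("my_project", time.time() - 24 * 3600)` (with `import time`, and `my_project` standing in for your own data folder) would list the files modified in the last day, which could then be handed to whatever backup tool you use.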
Security and Sensitive Data
If you work with restricted, HIPAA, export controlled or other sensitive data types, be sure to address those specific needs in your storage methods. Refer to Cornell's Information Technology Security Office's information on data types, the Regulated Data Chart, and the Data Storage Finder tool for help, or consult with the RDMSG for guidance.
Consider encryption, especially if traveling out of the country with a computer or physical drives that hold data.
Be thoughtful about how you share with collaborators. For example, if your data contains sensitive information, restrict access to specific individuals. Sharing with "Anyone with the link" can expose the data to search engines and indexers.
Archiving and Preservation
When you are ready to create an archived version of your data for long term preservation or for sharing with others, consult with an RDMSG expert to find the solution that best matches your needs.
Related Information
- Digital Curation Glossary (Digital Curation Centre)
- Glossary of Social Science Terms (Inter-university Consortium for Political and Social Research (ICPSR))
- Research Data Retention Policy (Cornell University)
- Ten Simple Rules For Digital Data Storage (Hart et al., 2016)
Page last updated Dec. 2022.