“Working” file formats (i.e., those used when collecting and working with project data) may not be ideal for re-use or long-term preservation.
In the absence of specific directives from funders or repositories, we offer the following general guidelines for selecting file formats for preservation and re-use. eCommons@Cornell, a repository service based at Cornell University Library, provides more detailed information in its support document, Recommended File Formats for eCommons.
Principles for selecting file formats
Select open, non-proprietary formats
Open, non-proprietary formats are better for re-use and long-term preservation because they do not depend on costly software and can be created and opened with open or free software. As a general rule, plain text formats, such as comma- or tab-delimited files, are open formats and are typically better for re-use and long-term preservation.
- Example of a proprietary format: Photoshop .psd file
- Example of an open format: .tiff image file
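
As a minimal sketch of saving working data in an open plain text format, the following uses Python's standard csv module to write comma-delimited text. The column names, the sample records, and the helper name rows_to_csv are hypothetical and serve only as illustration.

```python
import csv
import io


def rows_to_csv(header, rows):
    """Serialize a header and data rows as comma-delimited text."""
    buffer = io.StringIO()
    # lineterminator="\n" keeps line endings the same on every platform.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical project data, assumed to be in memory already.
header = ["site", "date", "temperature_c"]
rows = [
    ["A", "2020-01-01", 3.5],
    ["B", "2020-01-01", 4.1],
]

csv_text = rows_to_csv(header, rows)

# Reading the text back needs nothing beyond a plain text parser.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Note that values read back from delimited text are strings, so documenting column types and units alongside the file helps later users interpret it.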
Select “lossless” formats
Formats that compress the information in a file produce smaller files, but some compression methods permanently remove data. Formats that discard data in this way are “lossy,” while formats from which the original information can be recovered exactly are “lossless.”
- Examples of lossy formats: .mp3 audio file, .jpeg image file
- Examples of lossless formats: .wav audio file, .tiff image file
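
The difference can be illustrated with a short sketch, assuming Python and its standard zlib module. The lossless half compresses and restores a byte string exactly. The lossy half uses rounding only as a stand-in for discarding detail; it is not how .mp3 or .jpeg encoding actually works, and the sample values are hypothetical.

```python
import zlib

# Hypothetical data: a repetitive byte string that compresses well.
original = b"temperature,3.5\n" * 100

# Lossless: decompressing returns exactly the bytes that were compressed.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)
is_lossless = restored == original

# Lossy (illustration only): rounding discards detail that
# cannot be recovered from the rounded values afterwards.
samples = [0.12, 0.47, 0.51, 0.98]
quantized = [round(s, 1) for s in samples]
detail_lost = quantized != samples
```

After rounding, 0.47 and 0.51 both become 0.5, so the original values can no longer be told apart. That kind of irreversible change is why lossy formats are a poor choice for preservation.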
Select unencrypted and uncompiled formats
If the encryption key, passphrase, or password for a file is lost, there may be no way to retrieve the data later, rendering the file unusable to others. Uncompiled source code is more readily re-usable because others can recompile it for different architectures and platforms.