Writing READMEs for Research Data

This guide provides a structure for creating a readme file for research data. A data readme file helps ensure that the data can be correctly interpreted, whether by yourself at a later date or by others when the data are shared or published. Standards-based metadata is generally preferable, but where no appropriate standard exists, or for internal use, writing “readme” style metadata is an appropriate strategy.

Download a template and adapt* it for your own data!

New! A guide to Writing READMEs for Research Code & Software is also available!

Best practices

Create readme files for logical “clusters” of related files or data. In many cases it will be appropriate to create one document for a dataset that has multiple related, similarly formatted files, or for files that are logically grouped together for use (e.g. a collection of MATLAB scripts). Sometimes it may make sense to create a readme for a single data file.

Name the readme so that it is easily associated with the data file(s) it describes.
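
One way to keep a readme visibly tied to its data file is to derive its name from the data file's name. The sketch below is only an illustration: the function name readme_name_for, the "<stem>_README.txt" pattern, and the example filename are assumptions, not a convention prescribed by this guide.

```python
from pathlib import Path


def readme_name_for(data_file: str) -> str:
    """Return a readme filename that sorts next to the data file it describes.

    The "<stem>_README.txt" pattern is an assumed convention for illustration.
    """
    path = Path(data_file)
    # Keep the same folder and base name; swap the extension for a plain-text readme.
    return str(path.with_name(f"{path.stem}_README.txt"))


# Hypothetical data file name, used only to show the result.
example = readme_name_for("soil_cores_2019.csv")
print(example)  # soil_cores_2019_README.txt
```

Because the readme shares the data file's base name, the two appear side by side in an alphabetical directory listing.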

Write your readme document as a plain text file, avoiding proprietary formats such as MS Word whenever possible. Format the readme document so it is easy to understand (e.g. separate important pieces of information with blank lines, rather than having all the information in one long paragraph).

Format multiple readme files identically. Present the information in the same order, using the same terminology.
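
If you maintain several readme files, generating them from one shared template is a simple way to keep the order and terminology identical. The following is a minimal sketch under stated assumptions: the section headings in SECTIONS, the render_readme function, and the placeholder text are all hypothetical and should be replaced with whatever your project agrees on.

```python
# Hypothetical section headings; use whatever set your project agrees on.
SECTIONS = [
    "GENERAL INFORMATION",
    "DATA AND FILE OVERVIEW",
    "METHODOLOGICAL INFORMATION",
    "DATA-SPECIFIC INFORMATION",
]


def render_readme(title: str, content: dict) -> str:
    """Render a plain-text readme with every section in the same fixed order."""
    lines = [title, ""]
    for heading in SECTIONS:
        lines.append(heading)
        # Missing sections become explicit placeholders so the layout never shifts.
        lines.append(content.get(heading, "(not provided)"))
        lines.append("")  # a blank line separates sections
    return "\n".join(lines)


# Illustrative input only.
readme_text = render_readme(
    "README for soil_cores_2019.csv",
    {"GENERAL INFORMATION": "Date of data collection: 2019-03-07"},
)
print(readme_text)
```

Every readme produced this way lists the same headings in the same order, and sections that have not been filled in yet remain visible as placeholders rather than silently disappearing.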

Use standardized date formats. Suggested format: W3C/ISO 8601 date standard, which specifies the international standard notation of YYYY-MM-DD or YYYY-MM-DDThh:mm:ss.
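
As a short illustration of both notations, the sketch below uses Python's standard datetime module, whose isoformat() method writes dates and timestamps in this form. The variable names and the specific date are made up for the example.

```python
from datetime import date, datetime

# Illustrative values only; substitute your own collection dates.
collected_on = date(2019, 3, 7)
measured_at = datetime(2019, 3, 7, 14, 5, 9)

# isoformat() emits YYYY-MM-DD for dates and YYYY-MM-DDThh:mm:ss for datetimes.
date_text = collected_on.isoformat()
timestamp_text = measured_at.isoformat()

print(date_text)       # 2019-03-07
print(timestamp_text)  # 2019-03-07T14:05:09

# Parsing the same notation back recovers the original value.
assert date.fromisoformat(date_text) == collected_on
```

Dates written this way are unambiguous across locales and sort chronologically when sorted as plain text.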

Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies, a few of which are listed below.

Source	Content	URL
Getty Research Institute Vocabularies	geographic names, art & architecture, cultural objects, artist names	http://www.getty.edu/research/tools/vocabularies/
Integrated Taxonomic Information System	taxonomic information on plants, animals, fungi, microbes	http://www.itis.gov/
NASA Thesauri	engineering, physics, astronomy, astrophysics, planetary science, Earth sciences, biological sciences	https://www.sti.nasa.gov/nasa-thesaurus/
GCMD Keywords	Earth & climate sciences, instruments, sensors, services, data centers, etc.	https://earthdata.nasa.gov/earth-observation-data/find-data/gcmd/gcmd-keywords
The Gene Ontology Vocabulary	gene product characteristics, gene product annotation	http://amigo.geneontology.org/amigo/dd_browse
USGS Thesauri	agriculture, forest, fisheries, Earth sciences, life sciences, engineering, planetary sciences, social sciences etc.	https://www1.usgs.gov/csas/biocomplexity_thesaurus/index.html
IUPAC Gold Book	compendium of chemical terminology from the International Union of Pure and Applied Chemistry (IUPAC)	https://goldbook.iupac.org

References

The preceding guidelines have been adapted from several sources, including:

Best practices for creating reusable data publications (2019). Dryad.

Introduction to Ecological Metadata Language (EML) (2012). The Knowledge Network for Biocomplexity.

Related information

Document and Store Data Using Stable File Formats (DataONE): Provides useful information about file formats.
File formats (CDS)
File management (CDS)
Introduction to Intellectual Property Rights in Data Management (CDS)
Metadata and Describing Data (CDS)

* Our Readme metadata template is shared under a Creative Commons CC0 1.0 Universal Public Domain Dedication (CC0 1.0). Please adapt, use and share as you see fit; attribution is appreciated when re-sharing, but not required!
