This guide provides a structure for a readme file that can accompany research code or software. It provides headings similar to those used on platforms where code is actively maintained and includes additional prompts for contextual information that are useful for documenting research code and supporting data validation and reuse.
Download a template and adapt* it for your needs!
Best practices
Platforms that support active development and maintenance of code (e.g. GitHub) typically have features that capture some of the metadata elements described below more effectively. Similarly, data repositories may have dedicated fields for some of these elements. Whenever possible, it is preferable to use these; however, an archived copy of the project in a preservation repository may benefit from a more self-contained readme.
Recommended content
Recommended minimum content for supporting re-use is in bold below.
Project title
- Provide a title for the project
- Note: This title is distinct from the titles of any related publications or data.
- Provide a version
- Consider using semantic versioning, so that it is clear how earlier versions relate to the most current version of the code
- Consider maintaining a changelog of important changes for each version of the project
- Note: In platforms where code is actively maintained, it is preferable to take advantage of tagged releases and release notes to document changes between versions
- Provide a short (one-sentence) description (or abstract) of the project
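As a rough illustration of how semantic versioning makes the relationship between versions explicit, the sketch below parses `MAJOR.MINOR.PATCH` strings and compares them. The function names `parse_version` and `is_breaking_change` are hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
def parse_version(text):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    # Tolerate a leading "v", as in tags such as "v1.4.2".
    parts = text.strip().lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return (major, minor, patch)


def is_breaking_change(old, new):
    """Under semantic versioning, a higher MAJOR number signals incompatible changes."""
    return parse_version(new)[0] > parse_version(old)[0]
```

Comparing the parsed tuples rather than the raw strings matters: as text, "1.10.0" sorts before "1.9.3", while as tuples the ordering is correct.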
Description
- Provide a detailed description of the project, including notable features and its purpose
- Provide the date of creation of the project or of the latest version release (can be a single date, or a range)
- Provide the format(s) of the file or files that comprise the project
- Include the programming language and version when appropriate
- Include format(s) of ancillary files when appropriate
- Provide a list of relevant files (or folders)
- Include a short description of the content of each folder when appropriate
- Include the relationship between files, if important
- Provide the size of the file or set of files that make up the project
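The file list and sizes asked for above can be generated rather than typed by hand. The following is a minimal sketch using only the Python standard library; the function names `list_files_with_sizes` and `total_size` are hypothetical, and the descriptions of each file or folder still need to be written by the author.

```python
from pathlib import Path


def list_files_with_sizes(root):
    """Return (relative path, size in bytes) pairs for every file under root."""
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            entries.append((relative, path.stat().st_size))
    return entries


def total_size(entries):
    """Sum the sizes of all listed files, in bytes."""
    return sum(size for _, size in entries)
```

Calling `list_files_with_sizes(".")` from the project root yields one pair per file, which can be pasted into the readme alongside a short description of each entry.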
Installation
- Provide step-by-step instructions for installing the project on a user’s machine
- Include system requirements*
- Include necessary libraries or packages for the code to run.
- Note: There are language-specific tools for dependency management (e.g., “requirements.txt” for Python, renv for R) and code authors should take advantage of automated tooling for documenting and managing dependencies wherever possible.
- Include any caveats or potential issues in getting the code installed
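For Python projects, one way to document the packages present in the environment where the code ran is to record each installed package with its exact version. The sketch below uses the standard library's `importlib.metadata`; the function name `pinned_requirements` is hypothetical, and the output lists everything installed in the current environment, not only the packages the project actually imports.

```python
from importlib import metadata


def pinned_requirements():
    """Return 'name==version' lines for every package installed in this environment."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with missing or broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)
```

The returned lines follow the `name==version` form used in a “requirements.txt” file, so they can be reviewed, trimmed to the real dependencies, and saved alongside the readme.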
Running the software or application
Consider how others will run your software or application when writing out system requirements:
- Is this an application that others can use with their own inputs? In that case, you may only need to identify which operating systems the application will run on.
- Are you sharing code or software that replicates research results? In that case, you may want to provide more detail on the hardware, operating system, and system libraries so that others can match your system as closely as possible or test whether the results hold in different environments.
System requirements
System requirements refer broadly to the configuration a system must have for a software application to run. This might be specified in multiple ways (minimum, recommended, required). Some elements you may want to include are:
- Operating system: Software that supports a computer’s basic functions, such as scheduling tasks, executing applications, and controlling peripherals. (Examples: macOS 13.5, Ubuntu 24.04 LTS, Windows 11.)
- Runtime environment: The environment in which a program or application is executed. (Examples: Java virtual machine, Node.js, Microsoft C++ runtime library)
- System libraries: Programming code and/or packages that have a well-defined interface from which behaviors can be invoked. (Examples: OpenGL, ActiveX)
- Other dependencies: Anything else that doesn’t fit into the above. (Examples: Python packages your code is dependent on.)
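Some of these elements can be captured automatically on the machine where the code was run. The sketch below, assuming a Python project, uses the standard library's `platform` module to record the operating system and runtime; the function names `describe_system` and `format_for_readme` are hypothetical, and system libraries would still need to be documented by hand.

```python
import platform


def describe_system():
    """Collect basic facts about the machine and interpreter running the code."""
    return {
        "operating_system": platform.system(),  # e.g. "Linux", "Darwin", "Windows"
        "os_release": platform.release(),
        "machine": platform.machine(),  # e.g. "x86_64", "arm64"
        "python_implementation": platform.python_implementation(),
        "python_version": platform.python_version(),
    }


def format_for_readme(info):
    """Render the collected facts as bullet lines ready to paste into a readme."""
    return "\n".join(f"- {key}: {value}" for key, value in info.items())
```

Running `print(format_for_readme(describe_system()))` on the machine that produced the results gives a starting point for the system requirements section.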
Usage
- Provide instructions on how to use the code
- Include screenshots of functionality
- Include code examples, for projects that are command line utilities or are intended to be used in conjunction with other code
- Include instructions on how to run tests (if applicable)
- Include any caveats to running the code
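A code example in the readme can double as a test when it is also written as a docstring example. The sketch below is purely illustrative: the function `normalize` is hypothetical and stands in for whatever the project provides.

```python
def normalize(values):
    """Scale a list of numbers so that they sum to 1.

    >>> normalize([1, 1, 2])
    [0.25, 0.25, 0.5]
    """
    total = sum(values)
    if total == 0:
        raise ValueError("values must not sum to zero")
    return [v / total for v in values]
```

If this function were saved in a file such as `example.py` (a hypothetical name), the instructions for running tests could be as short as `python -m doctest -v example.py`, which executes the docstring example and reports whether the output matches.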
License
- Provide the licenses and/or explain any restrictions on use
- Note: This should also be included in the source code
- Provide a recommended citation for others to use
Contact information
- Provide name/ORCID/institution/email information for the following roles, as appropriate
- Principal investigator
- Maintainer (the individual or entity responsible for the regular maintenance of the software)
- Programmer/Developer (the individual or entity responsible for writing the original code)
- Copyright Owner (the individual or entity who owns select intellectual rights, if distinct from the above roles)
Acknowledgements
- Provide information about funding sources that supported the creation of the project; include funder name and grant number(s).
- List any publications that cite or use the project
- List any other publicly accessible locations of the project, as well as the URL where the most up-to-date version can be found
- List relationships to ancillary scripts, applications, or data sets
- Be sure to list all contributors.
References
This guidance has been adapted from the “Guide to writing ‘readme’ style metadata” from Cornell Data Services and the Software Metadata Recommended Format Guide (SMRF) published by the Metadata Working Group of the Software Preservation Network. Thank you to Seth Erickson, Mikala Narlock, and Wanda Marsolek of the Data Curation Network for reviewing an initial draft of this guide.
Related Information
- Data and Code Licensing Primer (DCN): An overview of the licenses that are commonly applied to datasets and code.
- Reusable Code (Turing Way): Explains different levels of reproducibility and offers recommendations.
- Code Documentation (Turing Way): Provides guidance to projects with different types of code.
- Reproducible Environments (Turing Way): Describes a variety of methods for capturing computational environments.
- How to Document Your Research Software (CodeRefinery): Discusses different solutions for implementing and deploying code documentation.