This guide outlines a recommended structure for a README file to accompany research code or software. It includes headings commonly found on platforms where code is actively maintained, along with bulleted prompts to help contextualize information. The goal is to make it easier to document research code in support of data validation and reuse.
Download a template and adapt it for your needs!
Need a general readme for data? See our Guide for Writing a Readme for Data.
Best practices
Platforms that support active code development and maintenance (e.g., GitHub) often include built-in tools for capturing key metadata (e.g., releases, versioning, licensing). Data repositories (e.g., Zenodo) may also offer dedicated fields for describing code. Whenever possible, use these platform features to reduce redundancy and improve discoverability. However, when archiving code in repositories focused on long-term preservation, a more self-contained README can help ensure that essential metadata and information are captured and remain connected to the project.
Note: The README described here focuses on what the software does, how it can be reused, and how to understand and cite it. It is not intended as a guide to structuring a software project or enforcing software development best practices.
Recommended Content
Recommended minimum content for supporting reuse is in bold.
General Information
- Provide a project name
  - The title should be distinct from those of related publications or datasets
  - Whenever possible, use the same name as the software program or package to maintain consistency
- Provide a version number to clearly differentiate between versions of the project
  - Consider using semantic versioning (e.g., v1.2.0), so it is clear how earlier versions relate to the current one
  - Consider maintaining a change log or release notes that record important changes for each version of the project
  - Consider using tagged releases if the platform where the code is actively maintained offers them
- Provide a short description
  - This is a one-sentence summary of the software project’s purpose or functionality. It should serve as a quick overview, distinct from the full project description below (e.g., Python tool for visualizing climate model outputs)
  - Note: It can be helpful here to state whether the software project includes source code
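Semantic versioning encodes how a release relates to earlier ones in its MAJOR.MINOR.PATCH number. As a minimal sketch of that convention (not a tool the guide prescribes), the helper below parses tags such as v1.2.0, compares them numerically, and computes the next version for a given kind of change. The names parse_version and bump are illustrative, and the parser deliberately ignores the pre-release and build suffixes that the full specification allows.

```python
import re
from typing import NamedTuple

# Accepts "1.2.0" or "v1.2.0". The leading "v" is a common tag convention,
# not part of the SemVer format itself. Pre-release and build suffixes
# (e.g. "1.2.0-rc.1") are intentionally out of scope for this sketch.
_VERSION_RE = re.compile(r"v?([0-9]+)\.([0-9]+)\.([0-9]+)")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"v{self.major}.{self.minor}.{self.patch}"


def parse_version(tag: str) -> Version:
    """Parse a MAJOR.MINOR.PATCH tag into a tuple that compares numerically."""
    match = _VERSION_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return Version(major, minor, patch)


def bump(version: Version, change: str) -> Version:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    if change == "major":  # incompatible change to the public interface
        return Version(version.major + 1, 0, 0)
    if change == "minor":  # new, backward-compatible functionality
        return Version(version.major, version.minor + 1, 0)
    if change == "patch":  # backward-compatible bug fix
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change!r}")


# Numeric comparison sorts v1.10.0 after v1.9.3; a plain string sort would not.
releases = ["v1.10.0", "v1.2.0", "v1.9.3"]
print(sorted(releases, key=parse_version))  # ['v1.2.0', 'v1.9.3', 'v1.10.0']
```

Using a tuple for the version keeps comparisons trivial: Python compares tuples element by element, which matches how semantic versions are ordered.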
Project Overview
- Provide a full description of the software project
  - This is a more detailed explanation of the software project’s purpose, including its functionality and use cases
- Provide the date the project was created or the date of the latest version release (a single date, range, or approximate date; suggested format YYYY-MM-DD)
- Describe the organization of the software project so that users know what to expect. Document the location of, and briefly describe the contents of, the following components, as applicable to your project:
  - Source code
  - Pre-compiled binaries
  - Tests
  - Configuration files
  - Build scripts
  - Dependencies
  - Documentation
  - Static resources
  - Note: When describing formats, it can be helpful to include the associated file extensions for clarity. For example, “Contents include Python files (.py) and image files (.jpg, .png)”
- Provide the total size of the project or its components (uncompressed)
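To gather the file formats and the total uncompressed size, a short script can tally file extensions and add up file sizes. The sketch below assumes the project sits in a local directory and, as an illustrative choice, skips hidden entries such as .git; summarize_project and human_readable are hypothetical helper names, not part of any existing tool.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def summarize_project(root: str = ".") -> tuple[Counter[str], int]:
    """Count files per extension and total their sizes in bytes under `root`.

    Hidden files and folders (names starting with ".", such as .git) are
    skipped; this is an illustrative choice, so adjust it to your project.
    """
    root_path = Path(root)
    extensions: Counter[str] = Counter()
    total_bytes = 0
    for path in root_path.rglob("*"):
        if any(part.startswith(".") for part in path.relative_to(root_path).parts):
            continue
        if path.is_file():
            extensions[path.suffix.lower() or "(no extension)"] += 1
            # st_size is the file's own length in bytes (uncompressed),
            # not the size it would take up inside a .zip archive.
            total_bytes += path.stat().st_size
    return extensions, total_bytes


def human_readable(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024:
            return f"{num_bytes} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TiB"
```

Calling summarize_project(".") from the project root returns the per-extension counts (useful for the format note above) and the byte total, which human_readable turns into a figure such as "1.5 KiB" for the README.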
Installation
Include the following information so others can install and run your code successfully:
- Provide step-by-step instructions to install and set up your software project on a user’s system
- Provide system requirements, such as the operating system, system-level dependencies, or relevant hardware dependencies
  - Depending on the project’s purpose, match the level of detail to how others are expected to run your software or code
- Provide a list of any additional dependencies (e.g., libraries, packages, modules) your code requires
  - If any of your dependencies require a specific version, make sure that version is specified
  - Tip: Language-specific tools can manage dependencies for your project (e.g., the Python module venv and the R package renv can be used to create “virtual environments”, which can make your project more portable and reusable by others)
  - Tip: A package management tool can generate a list of dependencies for a project (e.g., Python’s pip freeze outputs a list of installed packages in a format that can be saved as a “requirements.txt” file)
- Describe any setup requirements (e.g., environment variables, configuration files) that users will need to configure manually
- List any known issues or caveats during installation (e.g., compatibility issues or known bugs)
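Running pip freeze > requirements.txt inside the project's virtual environment is the usual route. To show what such a file contains, the sketch below builds a similar list of pinned name==version lines using only the standard library's importlib.metadata. It approximates pip freeze rather than replacing it: it lists every distribution visible to the running interpreter (so it should also be run inside the project's environment), and it does not reproduce pip's handling of editable or VCS installs. The function names are illustrative.

```python
from __future__ import annotations

from importlib.metadata import distributions


def pinned_requirements() -> list[str]:
    """Return 'name==version' lines for every distribution the interpreter can see."""
    pins = set()
    for dist in distributions():
        metadata = dist.metadata
        name = metadata.get("Name") if metadata is not None else None
        if name:  # skip entries with missing or broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path: str = "requirements.txt") -> None:
    """Write the pinned list to `path`, one requirement per line."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.writelines(f"{line}\n" for line in pinned_requirements())
```

Exact pins (==) record the versions the code was actually tested with, which is what a reader needs when trying to reproduce results later.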
Usage
Include the following information so others can use your code after it’s been installed:
- Provide instructions on how to run the software or execute the code, and include a brief description of the expected output or behavior
- Provide usage examples (e.g., if the project requires use of multiple scripts, describe the order they should be run; describe any expected input and output files)
- Include screenshots where appropriate to describe functionality
- Document how to run any tests
- Note any known caveats or limitations
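When a project has several scripts that must run in a fixed order, a small driver script can encode that order, so the README only needs to document one command. The sketch below uses hypothetical script names (01_download_data.py and so on) and a hypothetical run_pipeline helper. It runs each script with the current Python interpreter and stops at the first failure.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path

# Hypothetical script names: list your own scripts in the order the README
# tells users to run them.
PIPELINE = [
    "01_download_data.py",
    "02_clean_data.py",
    "03_make_figures.py",
]


def run_pipeline(scripts: list[str], workdir: str = ".") -> None:
    """Run each script with the current Python interpreter, in order."""
    base = Path(workdir).resolve()
    for script in scripts:
        script_path = base / script
        if not script_path.is_file():
            raise FileNotFoundError(f"expected script is missing: {script_path}")
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit status, so
        # later steps never run on the output of a failed step.
        subprocess.run([sys.executable, str(script_path)], cwd=base, check=True)
```

Saved as, for example, run_pipeline.py with a final line calling run_pipeline(PIPELINE), this gives the README a single command to document (python run_pipeline.py), while the PIPELINE list doubles as a record of the expected run order.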
License
- Name the license, include a LICENSE file, and/or explain any restrictions on use
  - Note: The license should also be stated in the source code
  - Visit https://choosealicense.com for short, useful summaries of common licenses
- Provide a citation that users can reference in publications
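One lightweight way to state the license inside the source code is a short SPDX license identifier comment near the top of each file (for example, # SPDX-License-Identifier: MIT), a convention from the SPDX specification. The sketch below lists Python files that lack such a line. The function name, the *.py pattern, the five-line window, and the decision to skip hidden folders such as .git or .venv are illustrative assumptions; note that a virtual environment folder without a leading dot (e.g. venv) would still be scanned.

```python
from __future__ import annotations

from pathlib import Path

# SPDX short-form license identifier, e.g. "# SPDX-License-Identifier: MIT".
SPDX_TAG = "SPDX-License-Identifier:"


def files_missing_license_header(root: str = ".", pattern: str = "*.py",
                                 max_lines: int = 5) -> list[Path]:
    """Return files matching `pattern` with no SPDX tag in their first lines."""
    root_path = Path(root)
    missing = []
    for path in sorted(root_path.rglob(pattern)):
        # Skip hidden folders such as .git or .venv, and anything not a file.
        if any(part.startswith(".") for part in path.relative_to(root_path).parts):
            continue
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            head = [next(handle, "") for _ in range(max_lines)]
        if not any(SPDX_TAG in line for line in head):
            missing.append(path)
    return missing
```

An empty result means every matching file names its license, which keeps the license attached to the code even when individual files are copied out of the project.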
Contact information
- Include names and contact details (email, ORCID, institution) for the following roles, as relevant:
  - Principal investigator
  - Maintainer (the individual or entity responsible for the regular maintenance of the software)
  - Programmer/Developer (the individual or entity responsible for writing the original code)
  - Copyright Owner (the individual or entity who owns certain intellectual property rights, if distinct from the above roles)
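ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a single mistyped digit in the contact details can be caught automatically. A minimal validator is sketched below; the function names are illustrative, and it only checks the format and checksum, not whether the iD is actually registered.

```python
import re

# Four groups of four characters; only the final character may be "X".
_ORCID_RE = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")
_ORCID_URL = "https://orcid.org/"


def orcid_check_digit(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits of an iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the format and checksum of an ORCID iD (bare or as an orcid.org URL)."""
    orcid = orcid.strip()
    if orcid.startswith(_ORCID_URL):
        orcid = orcid[len(_ORCID_URL):]
    if not _ORCID_RE.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False (wrong check digit)
```

The iD 0000-0002-1825-0097 used here is the sample record from ORCID's own documentation, so it is safe to use in examples.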
Acknowledgements
- Provide a list of funding sources that supported the creation of the software project; include funder name and grant number(s)
- Cite any publications using this software project
- Link to other locations where the software project is available (e.g., Zenodo, GitHub, institutional repository)
- List relationships to ancillary scripts, applications, or datasets
- List all contributors and their roles
References
This guidance has been adapted from the “Guide to writing ‘readme’ style metadata” from Cornell Data Services and the Software Metadata Recommended Format Guide (SMRF) published by the Metadata Working Group of the Software Preservation Network. Thank you to Peter Cerda, Talya Cooper, Seth Erickson, Wanda Marsolek, and Mikala Narlock of the Data Curation Network for providing feedback on versions of this guide.
Related Information
- Data and Code Licensing Primer (DCN): An overview of the licenses that are commonly applied to datasets and code.
- Reusable Code (Turing Way): Explains different levels of reproducibility and offers recommendations.
- Code Documentation (Turing Way): Provides guidance to projects with different types of code.
- Reproducible Environments (Turing Way): Describes a variety of methods for capturing computational environments.
- How to Document Your Research Software (CodeRefinery): Discusses different solutions for implementing and deploying code documentation.