Checklist for Reusable Code – Cornell Data Services

Reusable code is well-documented, human-readable, portable, organized, and version-controlled. Creating reusable code helps you, your collaborators, and the broader open science community by making your work easier to understand, adapt, reproduce, and cite. If you would like support, connect with a member of Cornell Data Services at data-help@cornell.edu for a consultation!

Use the quick checklist below for a fast self-check when sharing code for your project. If you want more guidance and recommended practices for each item, please see the expanded checklist section that follows.

Download this checklist as a PDF!

Section 1: Quick Checklist

Well-documented

Include a clear README describing the author contact, project purpose, structure, dependencies, and environment

Provide installation and usage instructions, including a list of required packages and libraries, with version numbers

Add a software license and recommended citation

Document functions, workflows, and any other useful details using comments in your code

Document code changes clearly in changelogs or commit messages

Human Readable

Use whitespace for readability. Consult a programming language style guide for more guidance

Choose descriptive variable, object, and/or function names

Organize code into logical sections with comments describing each part

Portable

Use relative paths instead of absolute paths

Ensure your code can run independently without relying on objects and functions created elsewhere

Capture dependencies in environment files (e.g., requirements.txt)

Organized

Use a consistent directory structure: clear top-level directories inside a project folder named with the project title, a unique identifier, and the year

Give files logical and descriptive names, and avoid special characters and spaces

Keep code, data, results, and documentation in clearly labeled folders

Keep a safe copy of raw data stored separately; work on a separate copy

Version Controlled

Use version control software (e.g., Git) to track changes, collaborate, and recover earlier versions if needed

Host code on a collaborative platform (e.g., GitHub)

Archive a stable version of your code and/or software in a long-term public repository (e.g., eCommons, Zenodo) to ensure reuse and citability

Section 2: Expanded Checklist

Well-documented

Clear documentation in your data and code helps others – and your future self – understand how to use the code, why certain decisions were made, and where you or a collaborator last left off. Ask yourself: will a collaborator, a fellow researcher, or even you, one year from now, be able to quickly understand what’s going on?

Include a clear README

A README is a great place to describe your project, its collaborators, the project structure and workflows, and how the files relate to each other. Add a descriptive README (you can use our software README template) that includes author contact information, environment info, and file contents.

Provide installation and usage instructions

List required packages and libraries, and where and how to install them. Include version numbers, or point to an environment file (e.g., requirements.txt), in your README, and list the required packages at the top of your script so users can quickly see what’s needed.
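As one way to capture version numbers, the sketch below looks up installed versions with Python's standard importlib.metadata module (Python 3.8 or later). The function name pinned_requirements and the example package list are hypothetical, chosen only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return 'name==version' lines for the installed packages listed.

    Packages that are not installed are reported with a comment line
    instead of raising, so the gap is visible in the output.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


if __name__ == "__main__":
    # Hypothetical dependency list; replace with the packages your script imports.
    for line in pinned_requirements(["numpy", "pandas"]):
        print(line)
```

The printed lines can be pasted into requirements.txt or the README. Running pip freeze gives a similar listing for every package in the environment.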

Add a software license and recommended citation

Add a software license and suggested citation to clearly state how others can use your code and how to give you credit. Refer to the Open Source Initiative website for software license options. For ethical and safe sharing, if your code/software project has associated data, ensure no sensitive information is stored in a public repository and include any relevant ethical/data restrictions.

Document functions, workflows, and any other useful details using comments in your code

Add comments to your scripts to provide context, briefly explain steps, and document your workflow.
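A minimal illustration of documenting a function is shown below. The function itself is a made-up example; the docstring states what goes in and what comes out, and the inline comment explains the one step that might not be obvious.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    temp_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.
    """
    # Standard conversion: scale by 9/5, then add the 32-degree offset.
    return temp_c * 9 / 5 + 32
```

The docstring layout here follows one common convention; any consistent format that names inputs, outputs, and units serves the same purpose.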

Document code changes clearly

Use changelogs or commit messages in Git to clearly describe what changed in the code and why the change was made. This makes it easier to maintain the project and understand the revisions made over time.

Human Readable

Readable code makes it so much easier for others to understand your workflow and pick it up later. As Vince Buffalo says, “write code for humans, write data for computers.” Try to keep things concise, well-formatted, and easy to navigate so users can jump right in.

Use whitespace to enhance readability

Add spacing between major code sections and maintain proper indentation within loops, conditionals, and functions.

Use clear and descriptive variable or object names

Choose names that explain their purpose without requiring additional interpretation. Avoid cryptic abbreviations. When needed, clarify a variable’s meaning in the documentation.

Organize code into logical sections

Group related tasks together and use section headers or comments to clearly indicate the purpose of each part of your script.

Use comments and documentation to provide context

Add comments when they help clarify intent, explain assumptions, or document non-obvious decisions.

For R users: Clean and reshape your data into tidy format to support consistent, reproducible analysis. See the Tidyverse style guide for more detail.

For Python users: Follow PEP 8, Python’s style guide, for indentation, naming conventions, and line length.
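To make the readability practices above concrete, the sketch below writes the same small filter twice: once in a cramped style, and once with section comments, whitespace, and descriptive names. The data (seedling heights) and the cutoff value are hypothetical.

```python
# ---- Hard to read: cryptic names, no spacing, unexplained number ----
def f(x):
    return [i for i in x if i>0 and i<150]


# ---- Easier to read: named constant, descriptive names, docstring ----

# Hypothetical cutoff used only for this example.
MAX_PLAUSIBLE_HEIGHT_CM = 150


def keep_plausible_heights(seedling_heights_cm):
    """Return only heights that are positive and below the plausibility cutoff."""
    return [
        height
        for height in seedling_heights_cm
        if 0 < height < MAX_PLAUSIBLE_HEIGHT_CM
    ]
```

Both functions return the same result; the second tells the reader what the numbers mean and why values are dropped.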

Portable

Portable code runs reliably across different machines and environments without requiring hidden objects or system-specific settings. Portability helps collaborators use your code seamlessly and reduces errors.

Use relative paths instead of absolute paths

Relative paths reference files based on your project’s structure, while absolute paths point to locations on your specific computer. Absolute paths often break when shared, so use relative paths (e.g., ../code/scripts/analysis.ipynb) to ensure your project works everywhere.
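In Python, one way to avoid hard-coded locations is to build paths from the script's own folder with the standard pathlib module. The sketch below assumes a hypothetical layout in which the script sits in code/ next to a sibling data/ folder; the file name survey.csv is also made up.

```python
from pathlib import Path

# Folder containing this script; fall back to the working directory when
# __file__ is undefined (e.g., in a notebook or interactive session).
try:
    SCRIPT_DIR = Path(__file__).resolve().parent
except NameError:
    SCRIPT_DIR = Path.cwd()

# Hypothetical layout: this script lives in code/, next to a sibling data/ folder.
PROJECT_ROOT = SCRIPT_DIR.parent
raw_data_file = PROJECT_ROOT / "data" / "raw" / "survey.csv"

print(raw_data_file)
```

Because the path is derived from where the project sits rather than typed out in full, it keeps working when the project folder is moved or cloned onto another machine.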

Ensure all steps needed to run the code are included

Scripts should not depend on objects created in previous sessions or elsewhere. Initialize all necessary objects and include all processing steps so the script can run independently from start to finish. If necessary, bundle dependencies with your project.

Test your code in a clean environment

Occasionally run your script in a clean session to check that it still works on its own. This is a simple way to catch portability issues.
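For Python scripts, a clean-session check can be approximated by launching the script in a brand-new interpreter process. The helper below is a sketch; the name runs_in_clean_session is hypothetical. It does not isolate installed packages, only leftover variables from earlier work.

```python
import subprocess
import sys


def runs_in_clean_session(script_path):
    """Run a script in a fresh Python process and report whether it succeeded.

    A new process starts with no variables left over from earlier work, so
    a script that depends on objects defined elsewhere will fail here.
    """
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show the error so the missing object or import is easy to spot.
        print(result.stderr)
    return result.returncode == 0
```

A script that only works because of something defined earlier in an interactive session will typically fail here with a NameError.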

For R users: Restart R frequently and disable automatic workspace restoration so you always begin with a clean environment. This helps confirm that your script runs independently of prior sessions.

For Python users: Use virtual environments (e.g., conda, venv) to isolate dependencies.

Organized

A well-organized project structure makes your work easier to navigate and share – it saves time, reduces errors, and supports long-term reuse. Follow our guidance for file and data organization.

Use a consistent file and folder structure

Include clear top-level directories (e.g., code/, data/, results/, docs/) and name your project folder with the project title, a unique identifier, and the year.
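The folder layout described above can be set up with a few lines of Python. In this sketch, the function name create_project_skeleton and the underscore-separated naming pattern (title_identifier_year) are assumptions made for illustration, not a prescribed convention.

```python
from pathlib import Path

# Top-level folders named in the checklist.
TOP_LEVEL_DIRS = ("code", "data", "results", "docs")


def create_project_skeleton(parent_dir, title, identifier, year):
    """Create '<title>_<identifier>_<year>/' with the standard subfolders."""
    project_dir = Path(parent_dir) / f"{title}_{identifier}_{year}"
    for name in TOP_LEVEL_DIRS:
        # exist_ok=True makes the function safe to re-run on an existing project.
        (project_dir / name).mkdir(parents=True, exist_ok=True)
    return project_dir
```

For example, create_project_skeleton(".", "soil-moisture", "P001", 2026) would create a soil-moisture_P001_2026 folder containing code/, data/, results/, and docs/.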

Use descriptive file names

Give individual files descriptive names that are meaningful even outside the folder structure. Avoid special characters and spaces.
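A small helper can enforce the rule about special characters and spaces. The sketch below is one possible approach: the name safe_file_name is hypothetical, and it keeps only unaccented letters, digits, periods, hyphens, and underscores, which may be stricter than your project needs.

```python
import re


def safe_file_name(name):
    """Replace spaces and special characters in a file name with underscores.

    Letters, digits, hyphens, underscores, and periods are kept; every other
    run of characters becomes a single underscore.
    """
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", "_", name.strip())
    return cleaned.strip("_")


print(safe_file_name("site#3 temp&humidity.csv"))  # site_3_temp_humidity.csv
```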

Keep a safe copy of your raw data

Always keep an untouched copy of your original data. Do all your cleaning and processing on a separate version. Your code and notes will show exactly what you did, making it easy for someone else to follow your steps if they ever need to.
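One way to protect raw data in a Python workflow is to copy it into a working folder and then mark the original read-only. The helper below is a sketch under that assumption; make_working_copy is a hypothetical name, and file permissions behave somewhat differently on Windows than on Linux or macOS.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(raw_file, working_dir):
    """Copy a raw data file into working_dir and mark the original read-only."""
    raw_file = Path(raw_file)
    working_dir = Path(working_dir)
    working_dir.mkdir(parents=True, exist_ok=True)

    # copyfile copies contents only, so the working copy stays writable.
    working_file = working_dir / raw_file.name
    shutil.copyfile(raw_file, working_file)

    # Remove write permission from the original to guard against accidental edits.
    raw_file.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return working_file
```

All cleaning and processing then happens on the returned working file, while the original stays as it was received.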

For R users: Use RStudio Projects to manage work for each analysis. Projects simplify path management, help keep files organized, and allow you to track the code that generates outputs like plots and reports. This approach is especially helpful when juggling multiple projects.

Version Controlled

Version control helps track changes to your code, collaborate effectively, and revert to earlier versions when needed—without losing progress or overwriting work.

Use version control software such as Git

Commit changes regularly using version control software (e.g., Git). This creates a transparent, traceable record of your project’s evolution, makes it easier to recover from mistakes, and supports collaborative development.

Store your code on a collaborative platform

Platforms like GitHub enable sharing, collaboration, issue tracking, and documentation. They also help you manage code across multiple devices or team members.

Archive a stable version of your code for long-term sharing and reuse

GitHub is great while you’re actively working on a project, but for long-term storage, it’s better to use repositories like eCommons or Zenodo. They’ll save your code, data, and metadata and give your project a persistent identifier so others can cite and reuse it.

For Python users: Add a .gitignore file so temporary files like __pycache__ or .ipynb_checkpoints don’t clutter your GitHub repository. Consider using Git tags (e.g., v1.0.0) for clear version tracking.


References & Acknowledgements

References

Kellam, Lynda; Koziar, Katherine; Pejša, Stanislav. 2019. R Data Curation Primer. Data Curation Network GitHub Repository.

What They Forgot to Teach You About R by Jennifer Bryan, Jim Hester, Shannon Pileggi, E. David Aja.

Collaborative and Reproducible Data Science in R, Associate Professor Nina Overgaard Therkildsen, Cornell University.

Write Accessible Code, Princeton University

“Writing READMEs for Research Code & Software” (CDS)

Open Source Initiative Approved Licenses (OSI)

Data Citation (CDS)

Portable paths for sharing projects (CRAN)

File management (CDS)

Sharing and archiving data (CDS)

Commit messages (Git)

Tidyverse style guide (R)

Style guide for Python code (PEP)

Workflow: projects (R for Data Science)

Installing Anaconda (Anaconda)

venv: Creation of virtual environments (Python)

CCSS Replication Service (Cornell Center for Social Sciences)

Acknowledgements:

A special thank you to Sandi Caldrone, Molly Hirst, Megan O’Donnell, and Vicky Rampin of the Data Curation Network as well as Dianne Dietrich, Wendy Kozlowski, and Sarah Wright of Cornell Data Services for providing feedback on versions of this guide.

Please cite as: Evergreen, G., & McKee, L. (2026). Checklist for Reusable Code. Cornell University Library eCommons Repository. https://doi.org/10.7298/C4FB-3V18