Sharing data makes it possible for researchers to validate research results and to reuse data, and it can increase the impact of that research (Piwowar 2007). Sharing is also required by an increasing number of funders and publishers who seek to maximize the impact of research, ensure that results are reproducible, and ensure that sufficient information is included for the scholarly record.
Archiving is one way to share data that specifically focuses on preservation. A sharing platform may be called an archive, a repository, a database, a data center, or another name.
Strategies for archiving and sharing
A data repository is a place to archive and share datasets. Repositories provide varying levels of access and support (see below for more information on choosing a repository). Some have data experts who provide curation services and long-term management of your data, which allows the data to be preserved into the future; others might provide different levels of access or time frames of preservation. Consider depositing to a repository that provides curation services:
- a discipline-specific data center or repository such as:
  - CCSS Data Archive (primarily social sciences)
  - CUGIR (Cornell University Geospatial Information Repository)
  - NCBI databases (National Center for Biotechnology Information)
- a discipline-agnostic repository, e.g. Dryad
- Cornell’s digital repository (eCommons)
Other options for sharing that may be preferred or required by a publisher may not be curated or may not guarantee long-term preservation, e.g.:
- submission to a journal publisher in conjunction with a related publication
- publication in a data journal
- submission to non-curated repositories such as Cornell’s Open Science Framework instance, figshare, and Zenodo
While personal or lab websites, Electronic Lab Notebooks (ELNs), wikis, and similar tools may be sufficient for short-term sharing, they are usually poor choices for the long term. Cornell Data Services (CDS) can help researchers select an appropriate repository, data journal, or other strategy for sharing data that will ensure the data is discoverable, accessible, and preserved as part of the scholarly record.
Choosing a repository
Repository policies vary; confer with potential repositories or publishers to determine:
- what data they accept, e.g. limits on file and submission sizes, format requirements
- requirements for submission
- long-term preservation policy
- whether there are any fees associated with deposit or curation services
- whether they satisfy the “Desirable Characteristics of Data Repositories for Federally Funded Research” (report issued by the National Science and Technology Council (NSTC) in 2022)
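The questions above can be tracked as a simple checklist while comparing candidates. The following is a minimal sketch, not an official evaluation tool: the class `RepositoryProfile`, its fields, the function `unmet_criteria`, and the "Example Repository" values are all hypothetical and would need to be filled in from each repository's actual policies.

```python
from dataclasses import dataclass


@dataclass
class RepositoryProfile:
    """Answers gathered from a repository's policies (all fields hypothetical)."""
    name: str
    max_file_size_gb: float           # largest single file the repository accepts
    accepted_formats: set             # lowercase file extensions, without the dot
    has_preservation_policy: bool     # a stated long-term preservation policy exists
    deposit_fee_usd: float            # fee for deposit or curation, 0 if none
    meets_nstc_characteristics: bool  # satisfies the NSTC "Desirable Characteristics"


def unmet_criteria(repo, file_size_gb, file_format, budget_usd):
    """Return the reasons this repository may not fit the dataset."""
    problems = []
    if file_size_gb > repo.max_file_size_gb:
        problems.append("file exceeds size limit")
    if file_format.lower() not in repo.accepted_formats:
        problems.append("format not accepted")
    if not repo.has_preservation_policy:
        problems.append("no long-term preservation policy")
    if repo.deposit_fee_usd > budget_usd:
        problems.append("fees exceed budget")
    if not repo.meets_nstc_characteristics:
        problems.append("does not meet NSTC desirable characteristics")
    return problems


# Made-up values for illustration; they do not describe any real repository.
example = RepositoryProfile(
    name="Example Repository",
    max_file_size_gb=50.0,
    accepted_formats={"csv", "txt", "tif"},
    has_preservation_policy=True,
    deposit_fee_usd=0.0,
    meets_nstc_characteristics=False,
)

for problem in unmet_criteria(example, file_size_gb=80.0, file_format="CSV", budget_usd=100.0):
    print(problem)
```

An empty list means none of the recorded criteria rule the repository out; it does not replace conferring with the repository or publisher directly.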
To identify potential places to publish or share data, researchers may consult:
- the list of data publication services at Cornell (Services: Data sharing)
- a Cornell Data Services consultant for help finding and evaluating appropriate curation services, data centers, and repositories
- re3data registry of research data repositories
- CoreTrustSeal Certified repositories
- Generalist repository comparison chart
Issues and exceptions
Researchers need to be aware of several complex issues associated with making data broadly accessible.
Intellectual property
Intellectual property issues related to research data are complex. Ownership of data may rest with the researcher, the institution, or the funder, depending on the nature of the researcher’s appointment, grant contract conditions, and whether there are patent implications. For help explaining, in a data management plan, circumstances that prevent data sharing, consult the Intellectual Property section of the Data Management Planning guide, under “Section 5. Policies for public access, data sharing, and reuse”. See also our list of Cornell services related to intellectual property and copyright.
Conditions for reuse
When sharing data, it is important to document conditions for reuse. Documentation should include a description of standard licenses applied to the data, and any additional terms of use. We recommend the use of CC0, which is intended to reduce legal and technical impediments to the reuse of data.
Why CC0? Attribution can become increasingly complex as multiple datasets are combined and reused, because a derivative work must be licensed under the most restrictive license of all the contributing datasets. This can lead to a difficult-to-navigate situation called “license stacking” or “attribution stacking,” where each reuse of a dataset leads to more restrictive conditions. To prevent this situation, we encourage you to consider CC0, CC-BY, or similar. The use of CC0 does not prevent anyone from following community norms; data citation is always recommended. For a deeper investigation of issues associated with managing intellectual property rights in data projects, see the Introduction to Intellectual Property Rights in Data Management and Cornell University Library’s Copyright Information Center.
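The stacking effect described above can be illustrated with a small sketch. It rests on simplifying assumptions: the `RESTRICTIVENESS` ranking is hypothetical (real license compatibility is more nuanced than a single ordering), and the `Dataset` class, the `combined_terms` function, and the sample datasets are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical ranking for illustration only: a higher number means more
# conditions on reuse. Real license compatibility is not a single ordering.
RESTRICTIVENESS = {"CC0": 0, "CC-BY": 1, "CC-BY-SA": 2}


@dataclass
class Dataset:
    title: str
    creator: str
    license: str


def combined_terms(sources):
    """Return (most restrictive license, creators who must be attributed)."""
    strictest = max(sources, key=lambda d: RESTRICTIVENESS[d.license]).license
    # In this model, every source not released under CC0 adds an attribution obligation.
    attributions = [d.creator for d in sources if d.license != "CC0"]
    return strictest, attributions


# Made-up datasets; they do not refer to real data.
sources = [
    Dataset("Soil samples", "Lab A", "CC0"),
    Dataset("Rainfall records", "Lab B", "CC-BY"),
    Dataset("Land use survey", "Lab C", "CC-BY-SA"),
]
license_name, credits = combined_terms(sources)
print(license_name)  # CC-BY-SA
print(credits)       # ['Lab B', 'Lab C']
```

Under these assumptions, a combination built only from CC0 sources carries no stacked obligations, which is the situation the recommendation aims for; citing the data remains good practice either way.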
Private and confidential data, or data with commercial implications
Researchers may have ethical or legal obligations to maintain confidentiality and to protect the privacy of research subjects, or may have other circumstances requiring secure data storage or restricted access to data, such as licensing restrictions that prohibit data sharing. Data may also be part of a research project with commercialization potential. Funders and publishers recognize that there are legitimate circumstances under which an investigator cannot share their data, and a data management plan should explain those circumstances.
References
Piwowar, Heather A., Roger S. Day, and Douglas D. Fridsma. 2007. Sharing detailed research data is associated with increased citation rate. PLoS ONE 2(3): e308. https://dx.doi.org/doi:10.1371/journal.pone.0000308.
Related information
- Data citation (Cornell Data Services)
- Data curation services (Cornell Data Services)
- Data sharing FAQs (Cornell Data Services)
- Desirable Characteristics of Data Repositories for Federally Funded Research (The National Science and Technology Council)
- eCommons Meets Federal Data Sharing Requirements (Cornell Data Services)
- License stacking (Mozilla Science Lab)
- Metadata and describing data (Cornell Data Services)
- Preparing FAIR data for reuse and reproducibility (Cornell Data Services)
- Preparing tabular data for description and archiving (Cornell Data Services)
- Research Data Retention Policy (Cornell University Policy Office)