Sharing data makes it possible for researchers to validate research results and to reuse data for teaching and further research, and it can increase the impact of that research (Piwowar 2007). Sharing is also required by an increasing number of funders and publishers. Funders seek to maximize the impact of the research they fund by encouraging or requiring data sharing. Publishers seek to ensure the research they publish is reproducible, and that sufficient information is included for the scholarly record. While sharing data may pose challenges of "ethical, cultural, legal, financial, or technical nature," it can also pave the way for "more open, ethical, and sustainable science" (Figueiredo 2017).
- Strategies for archiving and sharing
- Choosing a repository
- Issues and exceptions
- Related information
Data sharing encompasses all strategies by which an investigator might make their data available to a broader audience, but not all sharing strategies allow for long-term preservation. Archives and data repositories have data experts who can provide curation services and long-term management of your data. Archiving your data in a trusted repository will allow the data to be preserved into the future. We encourage researchers to first contact a trusted repository. Options include:
- deposit to a discipline-specific data center or repository like CISER Data Archive (primarily social sciences), CUGIR (Cornell University Geospatial Information Repository), or the NCBI databases (National Center for Biotechnology Information)
- deposit to a curated, discipline-agnostic repository like Dryad
- deposit to Cornell's digital repository (eCommons)
Other options for sharing may be preferred or required by a publisher, although they are not curated and do not guarantee long-term preservation. These include:
- submission to a journal publisher in conjunction with a related publication
- publication in a data journal
- submission to non-curated repositories such as Cornell's Open Science Framework instance, figshare, and Harvard Dataverse
While personal or lab websites, Electronic Lab Notebooks (ELNs), wikis, and similar tools may be sufficient for short-term sharing, they are usually poor choices for long-term sharing and preservation. The best solution will ensure that data is discoverable, accessible, and preserved over the long term. The RDMSG can help researchers select an appropriate repository, data journal, or other strategy for sharing data.
Repository policies will vary; confer with potential repositories or publishers to determine:
- that they will accept the data
- requirements for submission
- long-term preservation policy
- whether there are any fees associated with deposit
In order to identify potential places to publish or share data, or for curation assistance preparing data for deposit into repositories, researchers may:
- consult the list of data publication services at Cornell
- contact an RDMSG consultant for help finding and evaluating appropriate curation services, data centers, and repositories
- locate an external service by searching a catalog of data repositories
Intellectual property issues related to research data are complex. Ownership of data may rest with the researcher, the institution, or the funder, depending on the nature of the researcher's appointment, grant contract conditions, and whether there are patent implications. Consult the Intellectual Property section of the Data Management Planning guide, under “Section 5. Policies for public access, data sharing, and reuse,” for help explaining, in a data management plan, the circumstances that prevent data sharing. You can also consult the list of Cornell services related to intellectual property and copyright, which covers copyright, technology transfer, university policies, and more.
Conditions for reuse
Why CC0? Attribution can become increasingly complex as multiple datasets are combined and reused because derivative work must be licensed under the most restrictive license of all the contributing datasets. This can lead to a difficult-to-navigate situation called “license stacking” or “attribution stacking,” where each reuse of a dataset leads to more restrictive conditions. To prevent this situation, we encourage you to consider CC0, CC-BY, or a similar license. The use of CC0 does not prevent anyone from following community norms; data citation is always recommended. For a deeper investigation of issues associated with managing intellectual property rights in data projects, see the Introduction to Intellectual Property Rights in Data Management and Cornell University Library's Copyright Information Center.
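As a rough illustration of license stacking, the following Python sketch combines several datasets and reports the license and the credits a derivative work would carry. It is a simplified model, not legal guidance: the numeric ranking in RESTRICTIVENESS, the Dataset class, the combine function, and the example dataset and creator names are all hypothetical and introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical ranking for illustration only: a higher number means
# a more restrictive license. Real license compatibility is more nuanced.
RESTRICTIVENESS = {"CC0": 0, "CC-BY": 1, "CC-BY-SA": 2}


@dataclass(frozen=True)
class Dataset:
    name: str
    license: str
    creators: Tuple[str, ...]


def combine(datasets: List[Dataset]) -> Tuple[str, List[str]]:
    """Return (license, required credits) for a derivative of `datasets`."""
    # The derivative takes the most restrictive license among its inputs.
    license_ = max((d.license for d in datasets), key=RESTRICTIVENESS.__getitem__)
    # Every input that is not CC0 adds its creators to the required credits.
    credits = [c for d in datasets if d.license != "CC0" for c in d.creators]
    return license_, credits


# Made-up example datasets.
survey = Dataset("survey", "CC-BY", ("Lab A",))
climate = Dataset("climate", "CC0", ("Agency B",))
maps = Dataset("maps", "CC-BY-SA", ("Lab C", "Lab D"))

print(combine([survey, climate, maps]))
print(combine([climate]))
```

In this model, each additional input that is not CC0 lengthens the list of required credits and can raise the license of the result, while a derivative built only from CC0 inputs carries no legally required credits. Citing the source data remains the recommended practice in either case.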
Private and confidential data, or data with commercial implications
Researchers may have ethical or legal obligations to maintain confidentiality and to protect the privacy of research subjects, or may have other circumstances requiring secure data storage or restricted access to data, such as licensing restrictions that prohibit data sharing. Data may also be part of a research project with commercialization potential. Funders and publishers recognize that there are legitimate circumstances under which an investigator cannot share their data, and a data management plan should explain those circumstances.
Sharing detailed research data is associated with increased citation rate. Heather A. Piwowar, Roger S. Day, Douglas D. Fridsma. PLoS ONE 2(3): e308. 2007. https://doi.org/10.1371/journal.pone.0000308
Data Sharing: Convert Challenges into Opportunities. Ana Sofia Figueiredo. Frontiers in Public Health 5(327). 2017. https://doi.org/10.3389/fpubh.2017.00327
Data citation. Cornell Research Data Management Service Group. http://data.research.library.cornell.edu/content/data-citation. Information and guidance about citing data sets.
Frequently asked questions. Cornell Research Data Management Service Group. http://data.research.library.cornell.edu/content/frequently-asked-questions#question1. Addresses more questions about data sharing.
License stacking. Mozilla Science Lab. https://mozillascience.github.io/open-data-primers/5.3-license-stacking.html. More information about attribution stacking.
Metadata and describing data. Cornell Research Data Management Service Group. http://data.research.library.cornell.edu/content/writing-metadata. Information about documenting data for sharing.
Preparing tabular data for description and archiving. Cornell Research Data Management Service Group. http://data.research.library.cornell.edu/content/tabular-data. An outline of best practices for preparing spreadsheets and other tabular data for publication.