CURATED Steps from the Data Curation Network
Data curation ensures that datasets are complete, well-described, and in a format and structure that best facilitates long-term access, discovery, and reuse.
Funders increasingly emphasize curation and quality assurance as criteria for choosing a repository. The 2020 Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research lists "expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata" among those characteristics.
Data curators collaborate with researchers to make data more Findable, Accessible, Interoperable and Reusable by aligning with the FAIR Principles.
The curation process involves a review of a researcher’s data and documentation to ensure the data are as complete, understandable, and accessible as possible. Cornell University Library is also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across multiple institutions.
Our curation workflow follows the DCN’s established CURATED Steps:
- Check files/code and read documentation (risk mitigation, file inventory, appraisal/selection)
- Understand the data (or try to)
- Request missing information or changes (tracking provenance of any changes and why)
- Augment metadata for findability (DOIs, metadata standards, discoverability)
- Transform file formats for reuse (recommend file formats for longer term reuse and preservation)
- Evaluate for FAIRness (usage licenses, links to related research, accessibility)
- Document all curation activities throughout the process
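As an illustration of the file inventory mentioned under the Check step, the following is a minimal sketch in Python. It is not part of the DCN workflow or any Cornell tooling. The function names `build_inventory` and `missing_documentation`, and the rule that a dataset should have a top-level README, are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def build_inventory(root):
    """Return one record per file under root: relative path, size, SHA-256."""
    root = Path(root)
    records = []
    # Sort so the inventory is stable from run to run.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        records.append({
            "path": path.relative_to(root).as_posix(),
            "bytes": path.stat().st_size,
            "sha256": digest,
        })
    return records


def missing_documentation(records):
    """Hypothetical check: True when no top-level README file is present."""
    names = [r["path"].lower() for r in records]
    return not any(n.startswith("readme") and "/" not in n for n in names)
```

Running `build_inventory("my_dataset")` on a local folder (the folder name is a placeholder) yields a list that can be saved alongside the curation notes, so that later changes to files can be detected by comparing checksums.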
Preparing your data
Consult our submission checklist for datasets in eCommons, which includes guidance that is applicable to readying your dataset for submission to any repository.
eCommons: Cornell's Digital Repository
Cornell University Library's eCommons provides long-term access to a broad range of Cornell-related digital content of enduring value. It is a free, fully open access general repository, making scholarly research results as widely available as possible. We offer data-level curation services to help researchers publish their data and meet FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable). We can help you:
- Share your data with the world with a digital object identifier (DOI)
- Increase your reach and impact through data citation
- Protect your work's future through long-term archiving and preservation
- Prepare and curate your data (see Data curation services below)
- Develop appropriate standardized metadata
- Make your data FAIR
Need help with repository information for a data management plan? We can help review a plan and provide you with boilerplate language for eCommons.
Other data repositories
Some researchers may prefer to deposit their data in a disciplinary data archive or other external data repository. To identify a suitable repository, researchers can search the Registry of Research Data Repositories (re3data). We can help researchers choose a data repository and ensure that the data conform to the repository’s requirements. Curation services are available to all Cornell researchers regardless of the repository chosen.
Cornell also has several other domain-based repositories:
- CCSS Data Archive (primarily social sciences)
- CUGIR (Cornell University Geospatial Information Repository)
Data curation services
We are happy to answer any questions related to data curation. Contact us and a curator will respond within two business days. The curator will work with the researcher(s) to make the data as complete, understandable, and accessible as possible. The process is highly dependent on the complexity of the dataset, the extent of curation needs, and the researcher's timeline. You can generally expect that:
- We can generate a draft DOI within two business days.
- We will create and share a Box folder to allow us to review your data and documentation before submission.
- Data can generally be published within 1-2 weeks.
Computational Reproducibility: A Practical Framework for Data Curators (and creators of code!)
United States: Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research, 01/17/2020. Science and Technology Policy Office.