RESEARCH DATA MANAGEMENT SERVICE GROUP
Comprehensive Data Management Planning & Services

Data Curation Services

Data curation ensures that datasets are complete, well-described, and in a format and structure that best facilitates long-term access, discovery, and reuse.

Funders increasingly emphasize curation and quality assurance as criteria for choosing a repository. The 2020 Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research lists "expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata" among those characteristics.

Data curators collaborate with researchers to make data more Findable, Accessible, Interoperable and Reusable by aligning with the FAIR Principles.

CURATE(D) services

The curation process involves a review of a researcher’s data and documentation to ensure the data are as complete, understandable, and accessible as possible. Cornell University Library is also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across multiple institutions. 

Our curation workflow follows the DCN’s established CURATED Steps:

Check files/code and read documentation (risk mitigation, file inventory, appraisal/selection)

Understand the data (or try to), if not… (run files/code, QA/QC issues, review readme or other metadata)

Request missing information or changes (tracking provenance of any changes and why)

Augment metadata for findability (DOIs, metadata standards, discoverability)

Transform file formats for reuse (recommend file formats for longer term reuse and preservation)

Evaluate for FAIRness (usage licenses, links to related research, accessibility)

Document all curation activities throughout the process
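As an illustration of the file inventory mentioned in the Check step, the following is a minimal Python sketch. It is not part of the DCN workflow or any Cornell tooling. The function names (sha256_of, build_inventory, has_readme) and the choice of SHA-256 checksums are assumptions made for this example. It walks a dataset folder, records each file's relative path, extension, size, and checksum, and reports whether a readme-style file is present.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """List every file under root with its relative path, extension, size, and checksum."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def has_readme(inventory: list[dict]) -> bool:
    """Report whether any file name in the inventory starts with 'readme' (case-insensitive)."""
    return any(Path(row["path"]).name.lower().startswith("readme") for row in inventory)
```

Calling build_inventory(Path("my_dataset")) on a hypothetical folder returns one record per file. The records can be saved alongside the curation notes so that later changes to the files can be detected by comparing checksums.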

Preparing your data 

Consult our submission checklist for datasets in eCommons, which includes guidance that is applicable to readying your dataset for submission to any repository.

eCommons: Cornell's Digital Repository

Cornell University Library's eCommons provides long-term access to a broad range of Cornell-related digital content of enduring value. It is a free, fully open access general repository, making scholarly research results as widely available as possible. We offer data-level curation services to help researchers publish their data and meet the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable). We can help you:

Visit Cornell's eCommons to learn more, and consult the data deposit policy to decide whether eCommons will work for your data.

Need help with repository information for a data management plan? We can help review a plan and provide you with boilerplate language for eCommons.

Other data repositories

Some researchers may prefer to deposit their data in a disciplinary data archive or other external data repository. To identify a suitable repository, researchers can search the Registry of Research Data Repositories. We can help researchers choose a data repository and ensure that the data conform to the repository’s requirements. Curation services are available to all Cornell researchers regardless of the repository chosen.

Cornell also has several other domain-based repositories:

  • CCSS Data Archive (primarily social sciences)
  • CUGIR (Cornell University Geospatial Information Repository)

Get help

We are happy to answer any questions related to data curation. Contact us and a curator will respond within two business days. The curator will work with the researcher(s) to make the data as complete, understandable, and accessible as possible. The process depends heavily on the complexity of the dataset, the extent of curation needs, and the researcher's timeline. You can generally expect that:

  • We can generate a draft DOI within 2 business days.
  • We will create and share a Box folder to allow us to review your data and documentation before submission.
  • Data can generally be published within 1-2 weeks.

Related resources

Data citation best practice

DCN Curation Workflow

eCommons repository

File formats best practice

Guide to writing "readme" style metadata

Introduction to intellectual property rights in data management 

Metadata and describing data 

Preparing FAIR data for reuse and reproducibility 

Preparing tabular data for description and archiving 

Sharing and archiving data

Computational Reproducibility: A Practical Framework for Data Curators (and creators of code!)

United States: Request for Public Comment on Draft Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research, 01/17/2020. Science and Technology Policy Office.