Data Curation Services

Data curation ensures that datasets are complete, well-described, and in a format and structure that best facilitates long-term access, discovery, and reuse.

Funders increasingly emphasize curation and quality assurance as criteria for choosing a repository. The 2022 report Desirable Characteristics of Repositories for Managing and Sharing Data Resulting From Federally Funded Research lists “expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata” among those characteristics.

Data curators collaborate with researchers to make data more Findable, Accessible, Interoperable and Reusable by aligning with the FAIR Principles.

CURATE(D) services

The curation process involves a review of a researcher’s data and documentation to ensure the data are as complete, understandable, and accessible as possible. The extent of the dataset review will depend on the size of the dataset deposited, how well documented it is, and general staff availability. These reviews are not peer review and do not judge the core scientific analysis, methodologies, or conclusions behind the data. Instead, the review aims to ensure that the metadata are complete and that the data are usable and discoverable.

Cornell University Library is also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a large number of disciplines and data types by accessing a network of curators across multiple institutions. 

Our curation workflow follows the DCN’s established CURATED Steps:

Check files/code and read documentation (risk mitigation, file inventory, appraisal/selection)

Understand the data (or try to), if not… (run files/code, QA/QC issues, review readme or other metadata)

Request missing information or changes (tracking provenance of any changes and why)

Augment metadata for findability (DOIs, metadata standards, discoverability)

Transform file formats for reuse (recommend file formats for longer term reuse and preservation)

Evaluate for FAIRness (usage licenses, links to related research, accessibility)

Document all curation activities throughout the process
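
To illustrate the first step (Check), a file inventory can be generated with a short script before deposit. The following is a minimal Python sketch, not the tooling used by Cornell or the DCN. The function names (build_inventory, has_readme, write_inventory), the choice of SHA-256 checksums, and the column layout are all assumptions made for this example.

```python
import csv
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """List every file under root with relative path, extension, size, and checksum."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rows.append({
            "path": path.relative_to(root).as_posix(),
            "extension": path.suffix.lower(),
            "bytes": path.stat().st_size,
            "sha256": sha256_of(path),
        })
    return rows


def has_readme(rows: list[dict]) -> bool:
    """True if a top-level file name starts with 'readme' (case-insensitive)."""
    return any(
        "/" not in row["path"] and row["path"].lower().startswith("readme")
        for row in rows
    )


def write_inventory(rows: list[dict], out_path: Path) -> None:
    """Save the inventory as a CSV file (hypothetical layout for this sketch)."""
    fields = ["path", "extension", "bytes", "sha256"]
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
```

For example, build_inventory(Path("my_dataset")) returns one row per file, and has_readme(rows) flags a missing top-level readme. Write the CSV outside the dataset folder if you want to rerun the script without the inventory listing itself. The recorded checksums let a depositor or curator confirm later that files were not altered in transfer.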

Preparing your data 

Consult our submission checklist for datasets in eCommons, which includes guidance that is applicable to readying your dataset for submission to any repository.

eCommons: Cornell’s Digital Repository

Cornell University Library’s eCommons provides long-term access to a broad range of Cornell-related digital content of enduring value. It is a free, fully open access general repository, making scholarly research results as widely available as possible. We offer data-level curation services to help researchers publish their data and meet the FAIR Guiding Principles (Findable, Accessible, Interoperable and Reusable) and federal data sharing requirements.

Visit Cornell’s eCommons to learn more, and consult the data deposit policy to decide whether eCommons will work for your data.

Need help with repository information for a data management plan? We can help review a plan and provide you with boilerplate language for eCommons.

Other data repositories

Some researchers may prefer to deposit their data in a disciplinary data archive or other external data repository. Researchers can search for a suitable repository using the Registry of Research Data Repositories (re3data). We can help researchers choose a data repository and ensure that the data conform to the repository’s requirements. Curation services are available to all Cornell researchers regardless of the repository chosen.

Cornell also has several other domain-based repositories:

  • CCSS Data Archive (primarily social sciences)
  • CUGIR (Cornell University Geospatial Information Repository)

Get help

We are happy to answer any questions related to data curation. Contact us and a curator will respond within 1-5 business days. The curator will work with the researcher(s) to make the data as complete, well described, and accessible as possible. The process is highly dependent on the complexity of the dataset, the extent of curation needs, and the researcher’s timeline. You can generally expect that:

  • We can generate a DOI within 1-5 business days.
  • The full process from curatorial review through data publication in eCommons generally takes 1-2 weeks.

Options for review include:

  • Review and then publish (recommended): Curators will review your submission before it is published. A draft DOI can be requested before data is reviewed. The dataset is not published until curation is complete and the submitter is ready for the files to be public.   
  • Publish then review: Data/code is self-submitted to the repository and is immediately public. A working DOI can be requested after you complete your submission. Note that because curatorial review occurs after publication, any changes or updates may result in a versioned DOI.

Testimonials

“I have used the service several times and always recommend it to my department. The curation has been excellent and really improves the data flow for my work.” – eCommons depositor 

“I greatly appreciate the support that you’ve provided for data curation/sharing. I think it’s very important, but I might not have done it if it weren’t for your support.” – eCommons depositor 

Browse datasets in eCommons 

Related resources

Data citation best practice (Cornell Data Services)

DCN Curation Workflow (Data Curation Network)

eCommons Meets Federal Data Sharing Requirements (Cornell Research Data Management Service Group)

eCommons repository (Cornell University Library)

File formats best practice (Cornell Data Services)

Guide to writing “readme” style metadata (Cornell Data Services)

Introduction to intellectual property rights in data management (Cornell Data Services)

Metadata and describing data (Cornell Data Services)

Preparing FAIR data for reuse and reproducibility (Cornell Data Services)

Preparing tabular data for description and archiving (Cornell Data Services)

Sharing and archiving data (Cornell Data Services)

Computational Reproducibility: A Practical Framework for Data Curators (and creators of code!) (Sandra Sawchuk and Shahira Khair, 2021)