eCommons Meets Federal Data Sharing Requirements

In 2022, the National Science and Technology Council (NSTC) released the report “Desirable Characteristics of Data Repositories for Federally Funded Research” (White House Office of Science and Technology Policy (OSTP), 2022), which provides guidance on selecting appropriate data repositories. The characteristics cover a range of features organized into three themes (organizational infrastructure, digital object management, and technology) and were selected to help ensure that data resulting from federally funded research are broadly accessible, robustly curated, and preserved over the long term.

Cornell’s institutional repository, eCommons, is a service of Cornell University Library that archives Cornell-related digital content of enduring value. eCommons is an appropriate and well-established institutional repository in which researchers can permanently store the datasets, code, and other outputs of federally funded research. eCommons provides free and persistent public access to scholarly research results and can be documented as the storage, publishing, and preservation location in the data management and sharing plans that accompany grant proposals.

We are pleased to share the ways in which eCommons’ infrastructure and services meet the NSTC’s recommendations and satisfy funder public access policies [1], publisher requirements [2], and federal guidance [3]. We have not addressed the NSTC’s additional considerations for repositories storing human data because eCommons does not accept confidential or restricted data.

Organizational Infrastructure

Free and easy access

The repository provides broad, equitable, and maximally open access to datasets and their metadata free of charge in a timely manner after submission, consistent with legal and policy requirements related to maintaining privacy and confidentiality, Tribal and national data sovereignty, and protection of sensitive data.*

By default, material deposited in eCommons is openly accessible and can be downloaded by anyone without logging in. We also offer data curation services to ensure that data are appropriate for sharing openly, sensitive information has been removed, files are accessible and understandable to other users, and descriptive metadata are provided to facilitate downstream discovery and reuse.

Clear use guidance

The repository ensures datasets are accompanied by documentation describing terms of dataset access and use (e.g., reuse licenses and need for approval by a data use committee).

We strongly recommend using our readme metadata template to describe terms of use and to facilitate the assignment of a Creative Commons (CC) license during the submission process. We suggest using a CC0 waiver to encourage data reuse, and we expect downstream users of the data to follow academic norms of proper attribution and data citation. eCommons automatically offers CC licenses but can accommodate other licenses as appropriate. Learn more about IP and data.

Risk management

The repository has documented capabilities for ensuring that administrative, technical, and physical safeguards are employed to comply with applicable confidentiality, risk management, and continuous monitoring requirements for sensitive data.

Researchers are directed not to share confidential or sensitive data on eCommons per the deposit policy and license.

Retention policy

The repository provides documentation on policies for data retention.

Our policies to permanently preserve and archive deposited data are described in the Preservation Support Policy. eCommons is committed to preserving the binary form of the digital object. 

Long-term organizational sustainability

The repository has a plan for long-term management of data, including maintaining integrity, authenticity, and availability of datasets; has contingency plans to ensure data are available and maintained during and after unforeseen events. 

As a core service of a well-established institution, eCommons benefits from secure, permanent funding, which provides a reasonable expectation of its long-term sustainability.

Digital Object Management

Unique persistent identifiers

The repository assigns a dataset a citable, unique persistent identifier (PID), such as a digital object identifier (DOI), to support data discovery, reporting (e.g., of research progress), and research assessment (e.g., identifying the outputs of Federally funded research). The unique PID points to a persistent location that remains accessible even if the dataset is de-accessioned or no longer available.

When your item becomes part of the eCommons repository, it is assigned a persistent URL. eCommons is committed to maintaining the integrity of this identifier so that it continues to substantiate citations in publications and other communications. Our persistent URLs are registered with the Handle System, a comprehensive system for assigning, managing, and resolving persistent identifiers, known as “handles,” for digital objects and other resources on the Internet.

Upon request, eCommons will also assign a Digital Object Identifier (DOI) to a submission. The DOI appears alongside the handle as a “Permanent Link” for the item. eCommons DOIs are registered through DataCite, which provides an additional layer of discovery through its metadata registry and search tool. After publication, datasets can be versioned: all versions of a dataset remain accessible, but the dataset DOI always resolves to the newest version.
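To illustrate how a handle or a DOI becomes a working link, the Python sketch below maps an identifier string to its public resolver (https://hdl.handle.net/ for handles, https://doi.org/ for DOIs). It is a minimal sketch, not eCommons code, and it assumes that any identifier beginning with the "10." directory indicator is a DOI. The identifiers in the usage lines (1813/12345 and 10.1234/example) are hypothetical placeholders.

```python
from urllib.parse import quote

HANDLE_RESOLVER = "https://hdl.handle.net/"
DOI_RESOLVER = "https://doi.org/"


def resolver_url(identifier: str) -> str:
    """Turn a handle or DOI string into a resolvable HTTPS link (illustrative sketch)."""
    ident = identifier.strip()
    # Accept optional "doi:" / "hdl:" scheme prefixes in any letter case.
    lowered = ident.lower()
    for scheme in ("doi:", "hdl:"):
        if lowered.startswith(scheme):
            ident = ident[len(scheme):]
            break
    # Assumption: DOIs start with "10."; everything else is treated as a handle.
    base = DOI_RESOLVER if ident.startswith("10.") else HANDLE_RESOLVER
    # Percent-encode unsafe characters but keep the prefix/suffix slash intact.
    return base + quote(ident, safe="/")


# Hypothetical identifiers, for illustration only.
print(resolver_url("hdl:1813/12345"))       # https://hdl.handle.net/1813/12345
print(resolver_url("doi:10.1234/example"))  # https://doi.org/10.1234/example
```

Because both resolvers redirect to the item's current landing page, a citation built this way keeps working even if the underlying repository URL changes.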

Metadata

The repository ensures datasets are accompanied by metadata to enable discovery, reuse, and citation of datasets, using schema that are appropriate to, and ideally widely used across, the communities that the repository serves.

eCommons is an open institutional repository that invites submission of any Cornell research output. As such, our metadata schema and curation process are designed to be broad and inclusive. Discovery metadata, which are indexed by Google and other search engines, use qualified Dublin Core, among other schemas. For datasets that receive a DataCite DOI, the DataCite metadata schema is used to ensure that broadly applicable infrastructure PIDs such as ORCID, FundRef, and ROR are tied to every publication. All submitters to eCommons are encouraged to provide supplementary documentation, such as readme files, to facilitate data reuse.

Curation and quality assurance

The repository provides or facilitates expert curation and quality assurance to improve the accuracy and integrity of datasets and metadata.

Curation services are available to all Cornell researchers. The curation process involves a review of a researcher’s data and documentation to ensure that the data are as complete, understandable, and accessible as possible. Cornell University Library is also a member institution of the Data Curation Network (DCN), which expands our capacity to curate data from a wide range of disciplines and data types by drawing on a network of curators across multiple institutions. Our curation workflow follows the DCN’s established CURATED steps.

Broad and measured reuse

The repository ensures datasets are accompanied by metadata that describe terms of reuse and provides the ability to measure attribution, citation, and reuse of data (e.g., through assignment of adequate and openly accessible metadata and unique PIDs).

In addition to offering DOIs for datasets and code submissions, eCommons publishes usage statistics on item landing pages.

Common format

The repository allows datasets and metadata to be accessed, downloaded, or exported from the repository in widely used, preferably non-proprietary, formats consistent with standards used in the disciplines the repository serves.

eCommons strongly suggests the use of open, common file formats to increase the likelihood of long-term preservation. Our curators check that files can be opened with widely available software. eCommons uses the Dublin Core metadata schema, and content can be accessed via OAI-PMH, the eCommons API, an RSS feed, or a direct link.
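For harvesters, OAI-PMH access is a sequence of HTTP GET requests. The Python sketch below builds a ListRecords request for unqualified Dublin Core (oai_dc) and extracts titles, identifiers, and the resumption token from a response. The endpoint path /server/oai/request is an assumption based on common DSpace 7 defaults, and the sample response is invented for illustration; check the eCommons documentation for the actual base URL before harvesting.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

# ASSUMPTION: typical DSpace 7 OAI-PMH path; confirm the real eCommons endpoint first.
OAI_BASE = "https://ecommons.cornell.edu/server/oai/request"

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base: str, resumption_token: Optional[str] = None) -> str:
    """First page names a metadataPrefix; later pages send only the resumptionToken."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base}?{urlencode(params)}"


def parse_records(xml_text: str):
    """Return (records, next_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        dc = rec.find(".//oai_dc:dc", NS)
        if dc is None:  # deleted records have a header but no metadata
            continue
        records.append({
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "identifiers": [e.text for e in dc.findall("dc:identifier", NS)],
        })
    # An empty <resumptionToken/> signals the last page.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS) or None
    return records, token


# Hypothetical response page, for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example:1813/12345</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
          <dc:identifier>https://hdl.handle.net/1813/12345</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(list_records_url(OAI_BASE))
records, token = parse_records(SAMPLE)
print(records, token)
```

A real harvester would fetch each URL (for example with urllib.request), pass the response body to parse_records, and repeat with the returned token until it is None.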

Provenance

The repository has mechanisms in place to record the origin, chain of custody, version control, and any other modifications to submitted datasets and metadata.

Curated datasets in eCommons have a provenance record that includes the origin, chain of custody, and any modifications made to the submitted dataset and associated metadata. Substantial edits made to a dataset after publication will create a new version of your submission and may result in a versioned DOI. Prior versions, organized by date of publication, remain accessible and downloadable.

Technology

Authentication

The repository supports authentication of data submitters. The repository has technical capabilities that facilitate associating submitter PIDs with those assigned to their deposited digital objects, such as datasets.

Depositing authors are required to authenticate via Shibboleth.

Long-term technical sustainability

The repository has a plan for long-term management of data, building on a stable technical infrastructure and funding plans.

eCommons is committed to preserving the binary form of the digital object. Further practical measures to preserve as much functionality (“look and feel”) of the original content as possible will be taken as resources permit. 

Digital preservation is an evolving field. The long-term preservation strategies and technologies currently employed by eCommons are shaped by the Open Archival Information System (OAIS) reference model (ISO 14721:2012) and informed by relevant international standards and emerging best practices. eCommons preservation activities and policies will be reviewed regularly to ensure that they remain current as technology and institutional practices evolve. eCommons uses the native DSpace 7 data integrity tool for checksum verification.
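Checksum-based fixity checking compares a stored digest with one recomputed from the file. The Python sketch below shows the same idea from a downloader's side: recompute a file's digest and compare it with a published value. It assumes MD5, which to our understanding is the default DSpace bitstream checksum algorithm; the file name and contents are hypothetical, and this is not the DSpace tool itself.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large datasets need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="md5"):
    """True if the recomputed hex digest matches the expected value (case-insensitive)."""
    return file_checksum(path, algorithm) == expected.strip().lower()


# Usage with a hypothetical downloaded file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "example_dataset.csv"
    data_file.write_bytes(b"hello\n")
    print(file_checksum(data_file))  # b1946ac92492d2347c6235b4d2611184
    print(verify_checksum(data_file, "B1946AC92492D2347C6235B4D2611184"))  # True
```

Reading in chunks keeps memory use constant regardless of file size, and the algorithm parameter allows a stronger hash (such as "sha256") where a repository publishes one.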

Security and integrity

The repository has documented measures in place to meet well established cybersecurity criteria for preventing unauthorized access to, modification of, or release of data, with levels of security that are appropriate to the sensitivity of data (e.g., the NIST Cybersecurity Framework).

eCommons is “GDPR compliant and follows best practices for privacy and security. Users who submit content to eCommons are not asked to provide any personally identifying information. eCommons implements and follows commercially reasonable electronic security measures to secure the systems through which information is collected or stored. Security protections, and all other elements of this policy extend to data copies and backups implemented for business continuity (Lippincott, 2022).” eCommons resides within a container on cloud-based servers hosted by Amazon Web Services (AWS). Access to the containers and the eCommons database is restricted to Cornell University Library IT (CUL-IT) staff with AWS production permissions.

We do not currently use software that tracks usage of, or access to, these systems. We rely on an agreement that CUL-IT staff will not modify or release data in an unauthorized manner.

* Text in italics at the beginning of each section is quoted from the requirements in Desirable Characteristics of Data Repositories for Federally Funded Research.

References

  1. NSF Public Access Plan; NIH Data Sharing Policy and Implementation Guidance
  2. Scientific Data: Data Repository Guidance
  3. Desirable Characteristics of Data Repositories for Federally Funded Research (White House Office of Science and Technology Policy, 2022)

Page last updated March 2023.