These are the questions asked most frequently at the RDMSG's information sessions on the NSF's data management plan requirement. Our answers are based on the collective research, computational, and information management experience of the RDMSG consultants, as well as the stated requirements of research funders. Researchers may also wish to consult with the cognizant program officer. If you have a question that's not addressed below, contact email@example.com for more information.
- What data do I have to share?
- How soon do I have to share my data?
- For how long do I have to provide access to my data?
- Where can I put my data? Does the NSF have a data repository?
- I already publish my data in journals. Isn’t that good enough?
- How do I budget for storing or providing access to data beyond the end of the grant?
- I don’t think I can share my data because of issues related to privacy and confidentiality (or license agreements, commercial applications, etc.)
- May I send my data management plan to the RDMSG for review?
- Do you have some sample data management plans I can look at?
1. What data do I have to share?
Some directorates offer more specific guidelines. Engineering (ENG), for example, includes “analyzed” data in its policy, meaning data that are published in articles, dissertations, or supplementary materials. Note that figures within a publication aren’t sufficient; the tables of numbers used to create those figures should also be made available. In the guidance we’ve seen so far, sharing raw data is not typically required. Where no specific guidance is available, we recommend that researchers keep two questions in mind when deciding which data to share:
- What data are necessary to reproduce or validate your results? Note that this may include code.
- What data have the potential for reuse by others?
2. How soon do I have to share my data?
Some directorates offer more specific guidelines. Those we’ve reviewed typically require researchers to share data within 2-3 years of collection, the end of the award, or publication. Treat these as benchmarks if neither your directorate nor the solicitation offers more specific guidance.
3. For how long do I have to provide access to my data?
Some directorates offer more specific guidelines (ENG specifies that data should be kept for at least three years). If you are depositing your data in a data center or archive with a long-term commitment to providing access, simply state this in your plan. If you plan to host the data yourself or pay a service provider to host it for you, specify a time period that is reasonable and that your budget can sustain, and explain that choice in your data management plan.
4. Where can I put my data? Does the NSF have a data repository?
The NSF does not maintain a general-purpose data repository, although some directorates and programs recommend specific repositories (see, for example, the data policy of the Division of Ocean Sciences). If no recommendations are provided, your options include disciplinary repositories, Cornell’s institutional repository (eCommons, for smaller data sets), the CCSS data archive, publishers (for data related to publications), or a custom solution for your project. Please contact firstname.lastname@example.org for specific recommendations.
5. I already publish my data in journals. Isn’t that good enough?
That depends on what you mean. Data that appear only in the tables and figures of a paper are generally not considered adequately shared. If, on the other hand, you already publish in journals that require the data sets related to a paper to be deposited in a data center, and require manuscripts to include citations or accession numbers for those data sets, that would be a reasonable data sharing plan for those data sets. More and more journals are beginning to accept data sets as supplementary materials, to require that authors make their data available, or both.
6. How do I budget for storing or providing access to data beyond the end of the grant?
NSF allows costs associated with data management to be included in the budget (typically on line G2, with an explanation in the budget justification). If you are depositing your data in a data center or archive, then your data will probably be available for the long term. Most data centers or repositories either accept data free of charge (if the data fall within their collection scope) or charge a one-time fee at the time of deposit, which makes budgeting fairly straightforward. Currently, Cornell doesn’t offer any services that allow up-front payment for longer-term storage, although the RDMSG is aware of the need for such a service and is considering different options.
7. I don’t think I can share my data because of issues related to privacy and confidentiality (or license agreements, commercial applications, etc.).
NSF recognizes that there are legitimate circumstances under which an investigator simply cannot share their data. Your data management plan should explain those circumstances.
8. May I send my data management plan to the RDMSG for review?
Yes. The RDMSG provides support for developing data management plans, including suggestions for services and data centers within and beyond Cornell. Contact email@example.com for more information.
9. Do you have some sample data management plans I can look at?
Some other agencies and institutions provide sample plans (listed below), and some data repositories or centers provide language for data management plans that specify the use of that repository for data access and preservation. We’re less inclined to provide sample plans of our own until we see what kinds of plans are reviewed favorably. If your data management plan receives favorable reviews and you are willing to share it with colleagues, please let us know.
Sample data management plans:
- Sample NSF data management plans from DataONE (scroll to bottom of page)
- Sample data management plan from ICPSR
- Sample data management plans from NEH (follow link to Data Management Plans in "Grant Programs, Initiatives, and other Related Information" section toward bottom of page)
- Sample data management plans from University of California San Diego
- Public data management plans (good and bad!) from the DMPTool