Primer on Data Management from DataONE
Data Management Guidelines from the California Digital Library
Online tool for creating data management plans, with templates for many funding agencies.
Many research funders require a data management plan with a grant proposal. The guide below describes the major areas that researchers should consider in preparing a data management plan.
This guide is not specific to any particular funder, discipline, or type of data, and prospective PIs should always review the specific proposal request documents and requirements of the funder.
What types of data, samples, physical collections, code, software, curriculum materials and other materials will be produced in the course of the project?
- including a brief description of each type of data to be generated (e.g. experimental, qualitative, raw, processed).
- how much data you anticipate will be generated over the course of the project.
- which data you will share and at what stage (raw, processed, reduced, or analyzed).
- why the data you will share will be of interest to a broader community and how your plan will maximize potential for reuse.
- if you are using data from other sources. If so, provide a brief description, including content, source, and any conditions required for obtaining and using that data. If you will combine existing data with your own, describe the relationship between the data sets.
- formats of data files created over the course of the project, and approximate volume of data.
- Select non-proprietary file formats for sharing and archiving to maximize the potential for reuse and longevity, and describe the plans for conversion to those formats, if necessary.
- the metadata that will be created or captured, when it will be created, and who will create it.
- Identify community metadata standards used. Indicate if no applicable standards exist and describe what additional documentation you will provide to make the data understandable and usable by others (e.g. a README file).
- data organization, such as how data will be distributed among files, file naming conventions, directory organization, and version management.
- who will have primary responsibility for implementing the data management plan.
- If multiple institutions are involved, funding agencies typically task the lead PI with executing the DMP.
- plans for transfer of responsibility if key personnel depart from the project.
- how the data sets will be stored (if secure storage and/or restricted access are required) and backed up during the course of the project.
- Describe hardware, storage environment, and local or external services to be used.
- Include the costs for these services in the proposal budget, if applicable.
- who will have access to working data and how access will be managed before and after the grant period.
- how the data will be transferred and shared between collaborators.
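The points above on file naming conventions, backups, and transfers between collaborators can be illustrated with a minimal Python sketch. It is only one possible approach, not a funder requirement: the naming pattern (project, site, date, version) and the function names `build_filename` and `build_manifest` are hypothetical choices made for this example. The checksum manifest gives collaborators a way to confirm that a backup or transferred copy matches the original files.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def build_filename(project, site, collected, version, ext):
    """Compose a name such as 'lakes_site03_2024-05-17_v02.csv'.

    The pattern itself is a hypothetical convention chosen for illustration.
    """
    for part in (project, site):
        # Restrict name parts to lowercase letters and digits so the
        # underscore separators stay unambiguous.
        if not re.fullmatch(r"[a-z0-9]+", part):
            raise ValueError(f"use lowercase letters and digits only: {part!r}")
    return f"{project}_{site}_{collected.isoformat()}_v{version:02d}.{ext}"


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Running `build_manifest` on the working directory and again on the backup copy, then comparing the two dictionaries, shows which files are missing or have changed.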
How will you meet funder requirements to provide public access to your data while protecting privacy, confidentiality, security and intellectual property rights?
- conditions for reuse of the data by others including any licenses that will be applied.
- whether data acquired from another source will be shared, and under what conditions.
- how the data will be managed to protect privacy (e.g. measures taken to anonymize data, disposition of data including personally identifiable information).
- legal and ethical requirements that may preclude sharing of any of your data. If so, explain the circumstances that prevent you from sharing data.
- if your research is subject to oversight by an Institutional Review Board (IRB). Refer to applicable requirements and describe how your data management practices will ensure compliance.
- copyright protection and whether it extends to your data. Some standard licensing options (Creative Commons, Open Data Commons) exist. Many metadata standards accommodate rights or usage statements where conditions for reuse may be expressed.
- that funding agencies (including the NSF) often recognize that commercialization potential may delay or preclude data sharing, and exempt trade secrets and commercial information from the data sharing requirement.
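As an illustration of the privacy measures mentioned above, the following Python sketch replaces a participant identifier with a keyed hash and drops fields that directly identify a person before records are shared. It is a sketch under stated assumptions: the field names and the functions `pseudonymize` and `prepare_for_sharing` are hypothetical. This technique is pseudonymization, not full anonymization. Remaining fields may still allow re-identification, so it does not replace review by your IRB or compliance office.

```python
import hashlib
import hmac


def pseudonymize(identifier, secret_key):
    """Return a stable pseudonym for an identifier using HMAC-SHA256.

    The secret key must be stored separately from the shared data;
    anyone holding it can recompute the mapping.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def prepare_for_sharing(records, id_field, drop_fields, secret_key):
    """Copy records, dropping direct identifiers and pseudonymizing the ID.

    `records` is a list of dicts; `drop_fields` names columns such as
    names or email addresses (hypothetical examples).
    """
    shared = []
    for record in records:
        cleaned = {k: v for k, v in record.items() if k not in drop_fields}
        cleaned[id_field] = pseudonymize(str(record[id_field]), secret_key)
        shared.append(cleaned)
    return shared
```

Because the same identifier always maps to the same pseudonym under a given key, records for one participant can still be linked across files without exposing the original ID.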
Note: some of these issues may already be addressed in the section on public access, sharing and re-use.
- any departmental, institutional, or programmatic policies on data retention, how they influence your plan, and how you will adhere to the policies.
- how long data will be retained or preserved and why.
- Some data may only be retained for the lifetime of the project, some may be retained for the project plus a specified number of years, and some may be worth the effort of long-term preservation (several years to decades).
- Consider what data are needed to validate the research, what data directly support publications based on the research, and what data have the greatest potential for reuse.
- hardware or campus or commercial services to be used to assure data preservation.
- costs for any of these activities or services. These may be included in your proposal budget.
- The Cornell University Cyberinfrastructure Plan, which describes key information technology elements and deliverables supporting research at Cornell.
These may include:
- Monitoring and reporting
- Specific assurance of having resources to carry out the plans
- Data processing workflow (e.g. how you plan to get data from point of collection to point of access)
- Data quality assurance or quality control measures
- For sensitive data, the security measures and any formal standards that will be used (e.g. biological agent permitting)
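Quality assurance and quality control measures like those listed above are often easier to describe in a plan when they are automated. The Python sketch below shows one simple form such a check could take: it flags missing required values, non-numeric entries, and out-of-range measurements in tabular data. The function name `check_rows`, the field names, and the range limits are hypothetical and would need to be replaced with rules appropriate to your own data.

```python
import csv
import io


def check_rows(rows, required, ranges):
    """Return a list of (line_number, field, problem) tuples.

    `rows` is an iterable of dicts (for example from csv.DictReader),
    `required` lists fields that must not be empty, and `ranges` maps
    a numeric field name to an inclusive (low, high) pair.
    """
    problems = []
    # Line numbering starts at 2 because line 1 of the file is the header.
    for line_no, row in enumerate(rows, start=2):
        for field in required:
            if not (row.get(field) or "").strip():
                problems.append((line_no, field, "missing value"))
        for field, (low, high) in ranges.items():
            raw = (row.get(field) or "").strip()
            if not raw:
                continue  # empty values are handled by the required check
            try:
                value = float(raw)
            except ValueError:
                problems.append((line_no, field, "not a number"))
                continue
            if not low <= value <= high:
                problems.append((line_no, field, "out of range"))
    return problems


def check_csv_text(text, required, ranges):
    """Convenience wrapper that parses CSV text and checks every row."""
    return check_rows(csv.DictReader(io.StringIO(text)), required, ranges)
```

A check of this kind can run each time new data arrive from the point of collection, with the resulting problem list kept alongside the data as part of the project documentation.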