Metadata and describing data

Metadata: the who, what, when, where, why, how of your research

Properly describing and documenting data allows users (yourself included) to understand and track important details of the work. The information needed to discover, use, and understand data is referred to as metadata. Metadata describes the who, what, when, where, why, and how of your data in the context of your research, and should provide enough information for users to know what can and cannot be done with your data. Metadata also facilitates search and retrieval once the data are deposited in a data repository, and creating it is an important step toward FAIR (Findable, Accessible, Interoperable, Reusable) data.

Describing your data

It is important to describe your data with sufficient detail so that users can:

  • reconstruct the context
  • evaluate whether the data are fit for purpose
  • further analyze and reuse the data appropriately

You may need to describe several facets of your data, including:

  • overall bibliographic information about the dataset (e.g. title, author, related publications)
  • types of files used (e.g. csv, txt, png)
  • key descriptive information about the experiment (e.g. sampling or measurement methods, software used for analysis, any processing or transformations performed)
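
As one illustration of documenting the types of files used, the following minimal sketch counts the files in a dataset folder by extension. The function name summarize_file_types is hypothetical, and the sketch assumes the dataset lives in a single directory tree on disk.

```python
from collections import Counter
from pathlib import Path


def summarize_file_types(root):
    """Count files under root by lowercase extension (hypothetical helper)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return dict(counts)
```

The resulting counts could be copied into the file-type portion of a dataset description. Extensions are only a rough proxy for format, so the summary is worth checking by hand.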

Your field may have commonly used data formats that help capture and structure relevant metadata. When possible, structure your metadata using an appropriate, agreed-upon metadata standard (see below for examples and guidelines). When no appropriate metadata standard exists, consider composing a “readme” style metadata document.
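
When no standard applies, a readme can still follow a consistent layout. The sketch below renders a plain-text readme from a list of labelled fields. The function name render_readme, the field labels, and all example values are hypothetical and chosen only for illustration; they are not a required template.

```python
def render_readme(fields):
    """Render a plain-text readme from (label, value) pairs."""
    lines = []
    for label, value in fields:
        # Lists of values (e.g. several authors) are joined on one line.
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


# Hypothetical example values covering the facets described above.
example_fields = [
    ("Title", "Example stream temperature survey"),
    ("Authors", ["A. Researcher", "B. Collaborator"]),
    ("File types", ["csv", "txt"]),
    ("Methods", "Hourly readings from hypothetical loggers"),
    ("Processing", "Outliers removed; values converted to Celsius"),
]

readme_text = render_readme(example_fields)
```

Writing readme_text to a file stored alongside the data keeps the description and the dataset together.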

Metadata formats and standards

Specific disciplines, repositories or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Some standards describe general information such as bibliographic metadata, others describe specific data types or are designed for specific disciplines. Some examples of metadata standards are listed below.

To find an appropriate metadata standard for your discipline, try one of these guides:

Tools can also be very helpful for generating structured metadata and are available for a variety of standards and use cases (e.g. Morpho allows for easy creation of EML).

Examples of different metadata standards:

  • Dublin Core – domain agnostic, basic and widely used metadata standard
  • DDI (Data Documentation Initiative) – common standard for social, behavioral and economic sciences, including survey data
  • EML (Ecological Metadata Language) – specific to ecological disciplines
  • ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee’s Content Standard for Digital Geospatial Metadata) – for describing geospatial information
  • MINSEQE (Minimum Information about a high-throughput SEQuencing Experiment) – genomics standard
  • FITS (Flexible Image Transport System) – Astronomy digital file standard that includes structured, embedded metadata
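
To give a sense of what structured metadata looks like in practice, here is a minimal sketch of a simple Dublin Core record built with Python's standard library. The element names (title, creator, date, format) and the namespace URI come from the Dublin Core element set. The wrapper element name "metadata", the function name dublin_core_record, and all values are assumptions made for this example; a repository will usually prescribe its own wrapper and required fields.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding simple Dublin Core elements."""
    # "metadata" is an arbitrary wrapper element chosen for this sketch.
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Hypothetical example values.
record = dublin_core_record([
    ("title", "Example stream temperature survey"),
    ("creator", "A. Researcher"),
    ("date", "2020-01-01"),
    ("format", "text/csv"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Each field becomes a namespaced element such as dc:title, which is what lets repositories and harvesters interpret the record consistently.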
