What is metadata?
Metadata is a type of data that describes and provides information about other data. Essentially, metadata is data about data.
TYPES OF METADATA
There are many types of metadata and many ways to organize them. Numerous international standards have been created to structure and guide the collection of metadata. We've reviewed these standards and drawn on our experience working with data to create an organized list of common metadata types. This list is meant to be a practical starting point for structuring and collecting metadata. We encourage you to look further into metadata standards and think critically about what would be most useful for your specific project.
IDENTIFICATION INFORMATION
Title
Description
Source
Links
Industry description/NAICS code(s)
Date coverage
Update frequency/latest update date
Date collected
Date modified
Access/usage/restriction notes
DATA QUALITY INFORMATION
Accuracy information/description (caveats, confidence intervals, gaps)
Lineage (raw administrative data or estimates/calculations; if calculated, include formula, algorithm, process, etc.)
SPATIAL REFERENCE INFORMATION
Geographic coverage
Indirect spatial reference (municipalities, census subdivision, fish plant locations, etc.)
Geographic format (points, polygons, lines)
VARIABLE INFORMATION
Label (i.e. column and/or row headings)
Description
Data type (text, numeric, date, currency, etc.)
Date coverage
Accuracy information
This is not an exhaustive list of metadata types and categories, but it is a good starting point for documenting metadata and shows the kind of information that is useful to capture. Not all metadata will be available for every dataset or variable, and that’s okay. Documenting the gaps in your metadata can be just as useful and informative as documenting the available metadata.
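As a rough illustration, the dataset-level categories above could be kept as a small template in Python, with a helper that reports which fields are still undocumented. The names `METADATA_TEMPLATE`, `new_record`, and `find_gaps` are hypothetical, the filled-in values are made up, and variable information is left out because it is recorded once per variable rather than once per dataset.

```python
# Hypothetical template mirroring the dataset-level categories listed above.
METADATA_TEMPLATE = {
    "Identification information": [
        "Title", "Description", "Source", "Links",
        "Industry description/NAICS code(s)", "Date coverage",
        "Update frequency/latest update date", "Date collected",
        "Date modified", "Access/usage/restriction notes",
    ],
    "Data quality information": [
        "Accuracy information/description", "Lineage",
    ],
    "Spatial reference information": [
        "Geographic coverage", "Indirect spatial reference",
        "Geographic format",
    ],
}


def new_record():
    """Return an empty metadata record with every field set to None."""
    return {
        category: {field: None for field in fields}
        for category, fields in METADATA_TEMPLATE.items()
    }


def find_gaps(record):
    """List (category, field) pairs that have no value yet."""
    return [
        (category, field)
        for category, fields in record.items()
        for field, value in fields.items()
        if value in (None, "")
    ]


# Made-up example values, for illustration only.
record = new_record()
record["Identification information"]["Title"] = "Example dataset"
record["Spatial reference information"]["Geographic format"] = "points"

gaps = find_gaps(record)
total = sum(len(fields) for fields in METADATA_TEMPLATE.values())
print(f"{len(gaps)} of {total} fields still undocumented")
```

Keeping the gaps as an explicit list makes it easy to carry them into the metadata file itself, so that missing information is recorded rather than silently absent.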
Example dataset with metadata
The benefits of metadata
Metadata helps establish trust and transparency in data and analyses. When all information about a dataset is available, community members are empowered to discuss and debate the data and analysis.
Documenting the data discovery and collection process through a standardized metadata process will help both current and future users reference, find, and collect the same data.
Metadata plays an important role in data validation, study replication, and quality control.
Documenting data attributes as metadata creates a simplified, comprehensive reference guide for your data.
A metadata file is a great place to document any changes that you have made to the dataset after it was downloaded from the source. Metadata can also include notes on any errors found in the data.
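One possible way to keep such notes consistent is a small change log stored alongside the metadata. The sketch below is only an illustration: the `ChangeLog` class, its `record` method, and the two entry kinds ("change" and "error") are assumptions made for this example, and the logged notes are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ChangeLog:
    """Hypothetical log of edits made to a dataset and errors found in it."""
    entries: list = field(default_factory=list)

    def record(self, kind, note, when=None):
        # "change" = something you modified; "error" = a problem found in the data.
        if kind not in ("change", "error"):
            raise ValueError("kind must be 'change' or 'error'")
        self.entries.append({
            "date": (when or date.today()).isoformat(),
            "kind": kind,
            "note": note,
        })


# Made-up entries, for illustration only.
log = ChangeLog()
log.record("change", "Renamed column 'Geo' to 'region'", when=date(2024, 1, 15))
log.record("error", "Two rows have a missing population value", when=date(2024, 1, 15))

for entry in log.entries:
    print(entry["date"], entry["kind"], entry["note"])
```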
Getting metadata
When collecting publicly available data, it is important that you also collect and save the corresponding metadata. For example, when downloading a dataset from Statistics Canada, the download file often includes both the dataset file and the metadata file.
When requesting custom data, you should also request the related metadata.
If a metadata file is unavailable for a dataset you're working with, you can create your own by documenting identification, data quality, variable, and spatial reference information. Documenting metadata doesn't have to be an onerous process: gather the metadata you can find (it may be helpful to refer to the list above) and enter it into a spreadsheet. Once you have created a template, you can reuse it and adjust it for future datasets.
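If you would rather generate the spreadsheet than type it by hand, a minimal sketch using Python's standard `csv` module might look like the following. The column layout (`Category`, `Field`, `Value`, `Notes`), the `write_metadata` helper, and the example rows are all assumptions for illustration, not a prescribed format.

```python
import csv
import io

# Assumed column layout for the metadata spreadsheet.
COLUMNS = ["Category", "Field", "Value", "Notes"]


def write_metadata(rows, handle):
    """Write metadata rows as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        # Missing keys become empty cells so that gaps stay visible.
        writer.writerow({column: row.get(column, "") for column in COLUMNS})


# Made-up example rows, for illustration only.
rows = [
    {"Category": "Identification", "Field": "Title", "Value": "Example dataset"},
    {"Category": "Identification", "Field": "Source", "Value": "",
     "Notes": "Not provided by source"},
    {"Category": "Variable", "Field": "Label", "Value": "region"},
]

# An in-memory buffer is used here; to save a real file, pass
# open("metadata.csv", "w", newline="") instead.
buffer = io.StringIO()
write_metadata(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The resulting file opens in any spreadsheet program, and the same `rows` structure can be reused as a template for the next dataset.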
This article is sponsored in part by the Future Skills Centre.