Chapter 4 FAIR principles
There is a need to extract the maximum benefit from our research investments and outputs, and good data management is essential to achieve this (e.g., Roche et al., 2015). However, what constitutes ‘good data management’ has remained largely undefined and has generally been left to the discretion of the data owner. Clarifying the goals of good data management and defining simple guidelines for those who publish and/or preserve scientific data is therefore of great utility. For this purpose, Wilkinson et al. (2016) described four foundational principles, Findability, Accessibility, Interoperability, and Reusability (the FAIR principles), which guide data producers and publishers in maximizing the added value of contemporary digital publishing and in meeting the expectations and requirements of funding agencies. The FAIR principles apply not only to ‘data’ but also to the algorithms, processing tools, and workflows that produced those data. All digital research objects benefit from the application of these principles, since every component of the research process should be clearly documented in the datasets’ metadata to ensure transparency, reproducibility, and reusability (Wilkinson et al., 2016).
Findable: Each dataset should be identified by a unique persistent identifier and described by rich, standardized metadata that clearly include the persistent identifier. The metadata record should be indexed in a catalogue and carried with the data.
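As an illustration of the Findable principle, the following Python sketch shows a metadata record that carries its own persistent identifier and a toy catalogue that indexes such records by keyword. The DOI `10.5555/example.dataset`, the field names, and the helpers `has_persistent_identifier`, `register`, and `search` are hypothetical. A real repository would rely on a registration agency and a proper search index rather than an in-memory dictionary, and the DOI pattern used here is deliberately simplified.

```python
import re

# Simplified DOI pattern: "10." + numeric registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# A minimal metadata record. Field names are illustrative (loosely modelled on
# DataCite) and the DOI is invented for this example.
record = {
    "identifier": {"type": "DOI", "value": "10.5555/example.dataset"},
    "title": "Example sea surface temperature dataset",
    "creators": ["Doe, Jane"],
    "keywords": ["oceanography", "temperature"],
    "publicationYear": 2016,
}

# A toy in-memory catalogue mapping identifier -> metadata record.
catalogue: dict[str, dict] = {}


def has_persistent_identifier(rec: dict) -> bool:
    """True if the metadata record itself carries a DOI-shaped identifier."""
    ident = rec.get("identifier", {})
    return ident.get("type") == "DOI" and bool(DOI_PATTERN.match(ident.get("value", "")))


def register(rec: dict) -> None:
    """Index a record in the catalogue, refusing records without an identifier."""
    if not has_persistent_identifier(rec):
        raise ValueError("metadata record lacks a persistent identifier")
    catalogue[rec["identifier"]["value"]] = rec


def search(keyword: str) -> list[str]:
    """Return the identifiers of all catalogued records tagged with keyword."""
    return [pid for pid, rec in catalogue.items() if keyword in rec.get("keywords", [])]


register(record)
print(search("oceanography"))  # ['10.5555/example.dataset']
```

Refusing to register a record without an identifier mirrors the requirement that the metadata explicitly include the persistent identifier of the data they describe.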
Accessible: The dataset and its metadata record should be retrievable by their persistent identifier using a standardized communications protocol. That protocol should, in turn, allow for authentication and authorization where necessary. Metadata records should remain accessible even when the datasets they describe are restricted or no longer available. Accessibility should not be confused with Open Data: the FAIR Accessible principle leaves it to the data owner to decide the degree to which the data are made available, or only advertised through their metadata (Mons et al., 2017).
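The retrieval step can be sketched with Python's standard `urllib.request` module. A DOI is resolved over HTTPS through the `https://doi.org/` resolver, and the `Accept` header uses HTTP content negotiation to request machine-readable metadata instead of a human-oriented landing page. The helper names `build_request` and `fetch`, the example DOI, and the bearer-token credential are assumptions for illustration. Only request construction is exercised here, since `fetch` needs network access.

```python
import urllib.request
from typing import Optional

DOI_RESOLVER = "https://doi.org/"  # HTTPS: an open, standardized protocol


def build_request(doi: str, accept: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTPS request that resolves a DOI.

    `accept` selects the representation via HTTP content negotiation.
    `token` is a hypothetical credential for repositories that restrict access;
    the bearer scheme is an assumption, not part of the DOI system.
    """
    headers = {"Accept": accept}
    if token is not None:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(DOI_RESOLVER + doi, headers=headers)


def fetch(req: urllib.request.Request) -> bytes:
    """Send the request and return the response body (requires network access)."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()


# Ask for machine-readable citation metadata (CSL JSON) rather than the landing page.
meta_req = build_request("10.5555/example.dataset", "application/vnd.citationstyles.csl+json")
print(meta_req.full_url)  # https://doi.org/10.5555/example.dataset
```

Keeping the protocol generic (plain HTTPS) while making the credential optional reflects the idea that one open protocol can serve both open and access-restricted data.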
Interoperable: Both metadata and datasets should use formal, accessible, shared, and broadly applicable vocabularies and/or ontologies to describe themselves. These vocabularies should themselves follow the FAIR principles, and the data and metadata should include qualified references to other relevant metadata and data. Importantly, the data and metadata should be machine-accessible and machine-analyzable.
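One common way to describe metadata with a shared vocabulary is JSON-LD with schema.org terms. The sketch below assumes schema.org as the vocabulary (a domain ontology may be more appropriate in practice) and uses invented identifiers and URLs. A qualified reference is expressed through the `isBasedOn` property, which states how the dataset relates to another resource rather than merely linking to it. The helper `referenced_ids` is hypothetical.

```python
import json

# JSON-LD metadata using schema.org terms; identifiers and URLs are invented.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.5555/example.dataset",
    "name": "Example sea surface temperature dataset",
    "variableMeasured": "sea surface temperature",
    # Qualified reference: this dataset is derived from another identified resource.
    "isBasedOn": {"@id": "https://doi.org/10.5555/example.raw-observations"},
    "distribution": {
        "@type": "DataDownload",
        "encodingFormat": "text/csv",
        "contentUrl": "https://repository.example.org/files/sst.csv",
    },
}

# Serialize and parse again: the record is plain JSON that any machine can read.
serialized = json.dumps(dataset, indent=2)
parsed = json.loads(serialized)


def referenced_ids(node, root=True):
    """Collect the @id values of nested nodes, i.e. references to other resources."""
    ids = []
    if isinstance(node, dict):
        if not root and "@id" in node:
            ids.append(node["@id"])
        for value in node.values():
            ids.extend(referenced_ids(value, root=False))
    elif isinstance(node, list):
        for item in node:
            ids.extend(referenced_ids(item, root=False))
    return ids


print(referenced_ids(parsed))  # ['https://doi.org/10.5555/example.raw-observations']
```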
Reusable: To meet this principle, data must already be findable, accessible, and interoperable. In addition, the data and metadata should be described richly enough that they can be readily integrated with other data sources. Published data objects should carry a clear usage license and enough information on their provenance to enable them to be properly cited, and they should meet domain-relevant community standards.
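Whether a record is ready for reuse can be approximated with a simple checklist. In the sketch below, the required fields (`license`, `creator`, `datePublished`, `isBasedOn`) are an assumption standing in for whatever a given community standard would demand, the field names are borrowed from schema.org in simplified form, and the helpers `reuse_gaps` and `citation` are hypothetical. The citation format is likewise only illustrative.

```python
# Hypothetical checklist standing in for a domain-relevant community standard.
REQUIRED_FOR_REUSE = ("license", "creator", "datePublished", "isBasedOn")

meta = {
    "@id": "https://doi.org/10.5555/example.dataset",
    "name": "Example sea surface temperature dataset",
    "creator": ["Doe, Jane"],
    "datePublished": "2016-03-15",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    # Provenance: the (invented) resource this dataset was derived from.
    "isBasedOn": "https://doi.org/10.5555/example.raw-observations",
}


def reuse_gaps(record: dict) -> list[str]:
    """Return the required fields that are missing or empty, in checklist order."""
    return [field for field in REQUIRED_FOR_REUSE if not record.get(field)]


def citation(record: dict) -> str:
    """Build a simple citation string from the metadata (illustrative format)."""
    creators = "; ".join(record["creator"])
    year = record["datePublished"][:4]
    return f"{creators} ({year}). {record['name']}. {record['@id']}"


print(reuse_gaps(meta))  # []
print(citation(meta))
# Doe, Jane (2016). Example sea surface temperature dataset. https://doi.org/10.5555/example.dataset
```

Deriving the citation directly from the metadata illustrates why rich, provenance-aware records make proper citation possible without consulting the data producer.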