FAIR Guidance – EJP RD – European Joint Programme on Rare Diseases

FAIR stands for Findability, Accessibility, Interoperability and Reusability, as first defined in 2016 by Wilkinson et al. in “The FAIR Guiding Principles for scientific data management and stewardship”. To facilitate the implementation of the FAIR principles, the GO FAIR Initiative was created; it acts as a global reference hub for FAIR expertise and developments.

FAIRified data offer several advantages: they are simpler to share and reuse; thanks to standardization and ontologies, they are easier for others, including computers, to understand; and they improve privacy by design by encouraging users to set the rules for data reuse from the start. However, the FAIR data concept is still often misunderstood. The most common misconception is that FAIR equals open, but in fact FAIR data should be as open as possible and as closed as necessary.

FAIR guidance for project consortia

The EJP RD recommends involving FAIR expertise as early as possible in consortia that aim to generate or collect data, or that require resources to be Findable, Accessible, Interoperable, Reusable for humans and machines (FAIR), for instance to enable efficient analysis over distributed resources, AI and machine learning.

We recommend that consortia that require FAIR data management minimally include the following roles.

Essential roles for interdisciplinary ‘three-party collaboration’:

FAIRification steward (embedded in a group that has experience in FAIRifying resources and ontological data modelling)
Local steward (expert of local data management)
Software engineer responsible for the software that manages the data (we recommend avoiding vendor lock-in).

Recommended additional roles:

Project manager (highly recommended organiser role)
Part-time advisors for expertise on (i) the domain (e.g. a medical doctor), (ii) international FAIR standards (e.g. a senior FAIR expert), and (iii) the implementation of FAIR software.
Data scientist dedicated to exploiting the added value of FAIR, machine-readable data.

We recommend sending requests for FAIR expertise through the EJP RD helpdesk (helpdesk@ejprarediseases.org). The EJP RD coordination team offers to facilitate involvement of EJP RD-associated experts in consortia that wish to build on and thereby contribute to the EJP RD Virtual Platform.

We estimate that making a moderately complex resource FAIR and compliant with the EJP RD Virtual Platform costs in the range of EUR 20,000 to 80,000 for each of the three essential roles (some roles may be fulfilled ‘in kind’).
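As a rough worked example of this estimate, assuming every essential role is funded in cash at the same per-role range, the total for the three roles comes to EUR 60,000 to 240,000, and each role provided in kind lowers both bounds. The helper below is a hypothetical planning sketch, not an EJP RD tool.

```python
# Per-role cost range in euros, taken from the estimate above.
PER_ROLE_MIN_EUR = 20_000
PER_ROLE_MAX_EUR = 80_000

# The three essential roles of the 'three-party collaboration'.
ESSENTIAL_ROLES = ["FAIRification steward", "Local steward", "Software engineer"]


def budget_range(in_kind_roles=()):
    """Return the (min, max) cash budget in euros for the essential roles.

    Roles listed in `in_kind_roles` are assumed to be covered in kind
    and are therefore excluded from the cash budget.
    """
    funded = [role for role in ESSENTIAL_ROLES if role not in in_kind_roles]
    return len(funded) * PER_ROLE_MIN_EUR, len(funded) * PER_ROLE_MAX_EUR


print(budget_range())                    # (60000, 240000)
print(budget_range({"Local steward"}))   # (40000, 160000)
```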

FAIRification Process & Implementation - Reference publications

General Process

This paper provides interpretations and implementation considerations (choices and challenges) for each FAIR principle:

FAIR Principles: Interpretations and Implementation Considerations

This paper describes a generic step-by-step FAIRification workflow to be performed in a multidisciplinary team guided by FAIR data stewards:

A Generic Workflow for the Data FAIRification Process

The papers above are from a special issue dedicated to experiences in implementing FAIR principles:

Special Issue: Emerging FAIR Practices.

Rare Disease registries

VASCA FAIRification workflow

This paper describes the set-up of the VASCA FAIR registry (VASCA is part of the VASCERN ERN). It includes a more general description of the de novo FAIRification process, which makes all data FAIR automatically and in real time upon collection, explains how each of these steps contributes to FAIR, and reports the lessons learned:
- The de novo FAIRification process of a registry for vascular anomalies
  “Existing methods to make clinical trial and registry data (more) machine-readable and FAIR are usually carried out after a project is conducted and data are collected (post hoc). This is typically more expensive than building in FAIRification into the data collection process.
  In practice, this means that all the hands-on work for the FAIRification is conducted before data collection. Subsequently, data is made FAIR through entering them into the Electronic Data Capture (EDC) system. This has the advantage that clinical data is made FAIR without any intervention from data management and data entry personnel. Due to the generic approach and developed tooling, the VASCA working group believes that the method can be used in other registries and clinical trials as well.”

This paper focuses more on the technical implementation of the de novo FAIRification in an electronic data capture system:
- De-novo FAIRification via an Electronic Data Capture system by automated transformation of filled electronic Case Report Forms into machine-readable data
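The principle behind de novo FAIRification can be sketched in a few lines: the semantic annotation of each eCRF field is fixed before data collection, so every saved form can be turned into machine-readable RDF mechanically, with no manual step. The sketch below assumes a hypothetical field-to-property mapping with placeholder `example.org` IRIs and emits N-Triples; it illustrates the idea only and is not the tooling described in these papers.

```python
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical annotations, fixed *before* data collection starts.
# Placeholder IRIs; a real set-up would take them from a semantic model.
FIELD_TO_PROPERTY = {
    "diagnosis": "https://example.org/vocab/hasDiagnosis",
    "birth_year": "https://example.org/vocab/birthYear",
}


def _escape(text):
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def ecrf_to_ntriples(record_id, form):
    """Convert one filled eCRF (field -> value) into N-Triples lines.

    Values are assumed to be either int or str in this sketch.
    """
    subject = f"<https://example.org/patient/{record_id}>"
    lines = []
    for field, value in form.items():
        prop = FIELD_TO_PROPERTY.get(field)
        if prop is None:
            continue  # unannotated fields are skipped rather than guessed
        if isinstance(value, int):
            obj = f'"{value}"^^<{XSD_INTEGER}>'   # typed integer literal
        elif value.startswith(("http://", "https://")):
            obj = f"<{value}>"                     # coded answer -> IRI
        else:
            obj = f'"{_escape(value)}"'            # free text -> plain literal
        lines.append(f"{subject} <{prop}> {obj} .")
    return "\n".join(lines)


form = {
    "diagnosis": "http://www.orpha.net/ORDO/Orphanet_558",
    "birth_year": 1990,
    "free_text_note": "not annotated",
}
print(ecrf_to_ntriples("P001", form))
```

Because the transformation is driven entirely by the mapping, adding a new annotated field to the form requires no change to the conversion code.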

FAIRification experience – the (ongoing) example of ERKReg, a centralised registry

ERKReg FAIRification steps
- The European Rare Kidney Disease registry (ERKReg) was established in 2018 for all patients with a rare kidney disease. It collects disease- and treatment-related information on all patients with a rare kidney disease who are followed in an expert centre. ERKReg is a centralised registry, meaning that all data entered in the registry are transferred to and stored on one secure server. The registry was implemented before the EJP RD project began and did not consider the FAIR principles from the start. By design, though, the registry recorded the rare disease diagnosis as an Orphacode, and the genetic diagnosis was encoded using HGVS nomenclature. Using such standard codes improves findability, interoperability and reusability and is a simple first step towards making resources more FAIR. While most of the Common Data Elements (CDEs) defined by the JRC were already collected by ERKReg, work was done to add the missing data elements and align fully. To make the registry findable and its metadata searchable, ERKReg was then registered in ERDRI.dor and ERDRI.mdr. The training material developed by EJP RD, together with help from EJP RD experts, was then needed to deploy the CDE in a box and the metadata semantic model. The training material was particularly useful for understanding what needed to be done, why, and how. As of November 2021, a local deployment of the CDE in a box has been achieved, and a global deployment is hoped for soon. It is planned to FAIRify all or part of the Domain Common Data Elements (DCDEs) collected in ERKReg at a later stage (a decision is still pending on the extent of the data ERKReg wishes to share). The aim is to push regular, automatic updates from the database to our triple store via the CDE in a box.

Take home message & advice from the ERKReg Data Steward
- ERKReg is in the process of making its data and metadata FAIR retrospectively. While it is possible to achieve FAIR after a registry has been built, we would strongly advise building any new registry with the FAIR principles in mind from the start. This includes, but is not limited to: using the training resources developed by EJP RD to learn about the FAIRification process; getting help from your FAIRification steward whenever you feel stuck or anything is unclear (this is completely normal, as FAIR is NOT obvious); using ontologies in your registry wherever possible; and deploying the CDE in a box and the EJP RD metadata model. One very important aspect that is often overlooked (and that we did overlook) is the identification of the FAIR objectives. Without clear objectives, and without real knowledge of which data the registry owner would like to share in the future and therefore make FAIR, time and resources are lost. Finally, bear in mind that FAIRification is a flexible process: you always stay in control of which data you share, in which format, and with which stakeholder. It “just” makes the data easy to find, access, interoperate with other resources and reuse.

Patient organizations

Patient organizations are key drivers of the FAIRification process in practice and dialogue with stakeholders is critical to success:
- How Patient Organizations Can Drive FAIR Data Efforts to Facilitate Research and Health Care: A Report of the Virtual Second International Meeting on Duchenne Data Sharing, March 3, 2021
- Patient organisations are creating their own network within the ‘Rare Disease Global Open FAIR implementation network’: https://www.go-fair.org/implementation-networks/overview/rare-diseases/

List of Tools, Standards & Resources

For technical implementation, the EJP RD recommends implementation of FAIR principles and compliance with the emerging Virtual Platform. FAIR means ‘Findable, Accessible, Interoperable, Reusable for humans and machines’ (FAIR). By ‘FAIR at source’, the EJP RD aims to create a platform that automatically adapts its functionality to its contributing resources and that is ready for discovery (resource and record level) and analysis over distributed resources (statistics, data integration, AI, Machine Learning). Resources (e.g. patient registries) that build on and contribute to the EJP RD virtual platform have the following:

for non-technical humans: a good description of the resource on a public user interface, usually a web page
for technical humans (engineers and programmers): an ‘Application Programming Interface’ (API) and an associated JSON schema.

The EJP RD develops a common query API that builds on standards developed by the Global Alliance for Genomics and Health.

for machines: ontological models that describe the data container (metadata, including access conditions) and the data elements. These descriptions allow data and metadata to be represented in terms of the ‘Resource Description Framework’, a world wide web consortium recommendation for global interoperability across all domains of discourse.
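For the programmer-facing interface, a client sends a structured JSON request to the query API. The sketch below builds such a request body; its layout (`meta`, `query`, `filters`, `requestedGranularity`) loosely follows the GA4GH Beacon v2 request structure, and both the `build_query` helper and the `Orphanet_558` filter identifier format are assumptions for illustration, not the EJP RD API specification.

```python
import json

# Subset of Beacon v2 response granularities, assumed here for illustration.
ALLOWED_GRANULARITY = {"boolean", "count", "record"}


def build_query(term_id, granularity="count"):
    """Build a hypothetical discovery request that filters on one ontology term."""
    if granularity not in ALLOWED_GRANULARITY:
        raise ValueError(f"unsupported granularity: {granularity!r}")
    return {
        "meta": {"apiVersion": "2.0"},
        "query": {
            "filters": [{"id": term_id}],
            "requestedGranularity": granularity,
        },
    }


body = build_query("Orphanet_558")
print(json.dumps(body, indent=2))
```

Requesting only `count` granularity lets a resource answer discovery questions without releasing record-level data, in line with “as open as possible, as closed as necessary”.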

The EJP RD virtual platform uses the EJP RD metadata model as the ontological model to describe the metadata of rare disease resources; this model is based on the Data Catalog Vocabulary (DCAT). The virtual platform uses an ontological process-measurement pattern derived from the Semanticscience Integrated Ontology (SIO) as the core model for all data elements. To implement the EJP RD metadata, we use the FAIR Data Point software. Besides providing descriptions of resources according to the EJP RD metadata model, the FAIR Data Point also provides access to descriptions that conform to the FAIR Data Point specification.
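To illustrate the DCAT foundation of the metadata model, the sketch below serialises a minimal dataset description as Turtle using only core DCAT and Dublin Core terms. The EJP RD metadata model adds its own classes and properties on top of DCAT, which are not shown; the function name and all `example.org` IRIs are hypothetical.

```python
def _lit(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def dcat_turtle(dataset_iri, title, publisher_iri, landing_page, keywords=()):
    """Return a minimal DCAT description of one dataset as Turtle text."""
    statements = [
        f"<{dataset_iri}> a dcat:Dataset",
        f"    dct:title {_lit(title)}@en",
        f"    dct:publisher <{publisher_iri}>",
        f"    dcat:landingPage <{landing_page}>",
    ]
    if keywords:  # dcat:keyword takes plain literals; omitted when empty
        statements.append("    dcat:keyword " + ", ".join(_lit(k) for k in keywords))
    prefixes = (
        "@prefix dcat: <http://www.w3.org/ns/dcat#> .\n"
        "@prefix dct: <http://purl.org/dc/terms/> .\n\n"
    )
    # Predicate-object pairs share one subject, so join them with " ;".
    return prefixes + " ;\n".join(statements) + " .\n"


print(dcat_turtle(
    "https://example.org/registry/1",
    "Example rare disease registry",
    "https://example.org/org/1",
    "https://example.org/registry",
    keywords=["rare disease", "registry"],
))
```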

Registry FAIRification

The European Commission's Joint Research Centre (JRC) recommends that rare disease registries collect a standard set of common data elements (CDEs). In addition, the EJP RD endorses the use of ontologies that follow a core data model, adding a FAIR layer to the registries. Several tools and standards should be considered when adding this FAIR component to registries.

When different data collection software packages do not apply FAIR standards, combining forms and data across studies or institutes becomes complicated. A straightforward FAIR implementation step is to require the EDC software providers in a consortium (e.g. SMEs, REDCap, Castor EDC, MOLGENIS) to supply a FAIR and Virtual Platform (VP) compliant tool. FAIRification does not depend on the chosen software, but the right software can facilitate the process.

As an entry point into interoperability, the rare disease registry codebook enables data exchange between institutions that use different electronic data capture (EDC) software. The reusable codebook contains the mappings of the CDEs to ontologies, compiling the definition of each data element (for humans) and the identifiers/codes used in the EJP RD CDEs Semantic Model to annotate it (for computers). The codebook can be used via the iCRF Generator Tool, which takes the content of the codebook and creates interoperable eCRFs (iCRFs) that can be imported into several major EDC systems (e.g. Castor and REDCap).
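The idea behind the codebook, one shared set of codes behind differently named local variables, can be sketched as follows. Both sites, their variable names, their local value codes and the `example:` target codes are hypothetical placeholders; a real codebook maps CDEs to ontology identifiers and is applied through the iCRF Generator, not through code like this.

```python
# Hypothetical codebook: for each CDE, the local variable name used at each
# site and how that site's local values map onto one shared code.
CODEBOOK = {
    "sex": {
        "site_variables": {"site_a": "gender", "site_b": "sex_at_birth"},
        "value_map": {
            "site_a": {"M": "example:Male", "F": "example:Female"},
            "site_b": {"1": "example:Male", "2": "example:Female"},
        },
    },
}


def harmonise(site, record):
    """Translate one site's record into shared CDE codes.

    Values with no mapping become None so they can be flagged for review
    instead of silently passing through.
    """
    shared = {}
    for cde, entry in CODEBOOK.items():
        local_var = entry["site_variables"].get(site)
        if local_var is None or local_var not in record:
            continue
        shared[cde] = entry["value_map"][site].get(record[local_var])
    return shared


print(harmonise("site_a", {"gender": "F"}))
print(harmonise("site_b", {"sex_at_birth": "2"}))
```

Both calls yield the same shared record, which is what makes data from the two sites combinable.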

To implement the EJP RD CDEs Semantic Model in full, EJP RD created an all-in-one tool, the CDE in a box. This collection of software applications enables creating, storing and publishing the “Common Data Elements” according to the CDE semantic model (based on the CDEs). In addition, the CDE in a box includes a triple store and the FAIR Data Point, and requires only CSV files as input from the user. The step-by-step user guide and templates for the CSV are available here.
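Because the CDE in a box takes only CSV as input, preparing data mostly means producing well-formed CSV files. The sketch below uses Python's standard `csv` module; the column names are placeholders invented for this example, so take the real ones from the official CSV templates.

```python
import csv
import io

# Placeholder column names; the official CDE in a box templates define their own.
COLUMNS = ["record_id", "cde_name", "value", "date"]


def to_cde_csv(rows):
    """Serialise a list of dicts to CSV text, rejecting rows with missing columns.

    csv.DictWriter already raises ValueError for unexpected extra keys.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        missing = set(COLUMNS) - set(row)
        if missing:
            raise ValueError(f"row {row!r} lacks columns {sorted(missing)}")
        writer.writerow(row)
    return buf.getvalue()


print(to_cde_csv([
    {"record_id": "P001", "cde_name": "sex", "value": "female", "date": "2021-11-01"},
]))
```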

Data Management Plans

Data management plans (DMPs) describe researchers’ plans for good data and metadata curation practices during and after a project. Different institutions provide their own DMP templates and tools to help researchers complete and maintain their DMPs. Some of the most widely used tools are:

Starter Kit for research data management (RDM)
Data Stewardship Wizard

Both can be used to create, plan, collaborate on and bring your data management plans to life, and both let you browse ready-to-use DMP templates.

MindMap of Resources for RD Research

The Resources for RD research is a search tool for currently available resources related to rare diseases. It offers a simple interface to easily discover IRDiRC-recognised and EJP RD-funded resources, including FAIR standards, guidelines and FAIR-enabling resources (for example, for data deposition and/or analysis). These resources are grouped around the following nodes:

Patient Registries & Cohorts
Biobanks
Animal Models & Cell Lines
Archive/Share Experimental & Phenotypic Human Data
Knowledge Bases
Tools & Services for Translational and Clinical Research & Development
Standards
IRDiRC Recognized Guidelines
Tools for Research
Analyse Experimental & Phenotypic Human Data
RD Clinical Experts

FAIR Implementation Profile

The FAIR Implementation Profile (FIP) is a collection of FAIR implementation choices made by a community of practice for each of the FAIR Principles. Community specific FAIR Implementation Profiles are themselves captured as FAIR datasets and are made openly available to other communities for reuse.

Build the FAIR Implementation Profile
Extra material on FIP
What is your FIP? Check out the FIP mini questionnaire, which will lead you through the creation of your own FAIR Implementation Profile: build your FIP or download the questionnaire in PDF.
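A FIP is essentially a set of declarations keyed by FAIR principle. The sketch below models one as a Python dictionary and lists the principles for which no choice has been declared yet. The example choices are illustrative only, loosely drawn from resources named on this page; real FIPs are published as FAIR, machine-readable records, which this sketch does not attempt.

```python
# The 15 FAIR (sub-)principles of Wilkinson et al. (2016).
FAIR_PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]

# Illustrative, incomplete profile: principle -> chosen FAIR-enabling resources.
example_fip = {
    "F2": ["EJP RD metadata model (DCAT-based)"],
    "F4": ["FAIR Data Point"],
    "I1": ["RDF"],
    "I2": ["ORDO (Orphacodes)"],
}


def undeclared(fip):
    """Return the principles for which the profile declares no choice."""
    return [p for p in FAIR_PRINCIPLES if not fip.get(p)]


print(undeclared(example_fip))
```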

Report on core set of unified FAIR data standards

Standards contribute fundamentally to making rare disease resources increasingly Findable, Accessible, Interoperable and Reusable (FAIR). EJP RD has concentrated on the standards challenges relevant to its first goals. To this end, the Report on core set of unified FAIR data standards (EJP RD deliverable) identified:

existing standards that can be used/piloted directly;

standards where there is a need to aggregate and/or map between and/or extend existing standards before EJP RD can begin using them.

In this document EJP RD did not identify any situations where completely new standards will have to be developed from scratch.

Current version of the list of tools and Standards information

EJP_RD_Tools_and_Standards_v0.1.0

Educational Videos, Training and External resources

Why Does FAIR matter (2’22)

Ontologies in a FAIR registry: solving patients’ data linkage or clinicians (2’45)

Setting up a FAIR registry: the difference between data sharing and data visiting (12’18)

Hackathons on Implementation of CDE Semantic Model in EDC systems

Health RI FAIR Initiatives overview

Health RI Workshops on delivering FAIR metadata for COVID-19 data portal

ZonMW: guidance on FAIR data and FAIR data management for research projects

FAIR Assessment Tools

There is growing interest in the degree to which digital resources adhere to the goals of FAIR, that is, to be Findable, Accessible, Interoperable and Reusable by both humans and, more importantly, machines acting on behalf of their human operators. Unfortunately, the path to FAIRness was left undefined by the original FAIR Principles paper, which chose to remain agnostic about which technologies or approaches were appropriate. As such, until recently it was impossible to make objectively valid statements about the degree to which a data object exhibits “FAIRness”.

With the encouragement of journal editors and other stakeholders who have a need to evaluate author/researcher claims regarding the FAIRness of their outputs, a group consisting of FAIR experts, journal editors, data repository hosts, internet researchers, and software developers assembled to jointly define a set of formal metrics that could be applied to test the FAIRness of a resource. The first edition of these metrics was aimed at self-assessment, in the form of a questionnaire; however, upon review of the validity of several completed self-assessments by data owners, we determined that the questions were often answered inconsistently, or incorrectly (knowingly or unknowingly), and often the data provider did not know enough about the data publishing environment to answer the questions at all. As such, a smaller group of FAIR experts created a second generation of FAIR Metrics that aimed to be fully automatable. The result was a set of 22 Metrics spanning most FAIR principles and sub-principles, which explicitly describe what is being tested, which FAIR Principle it applies to, why it is important to test this (meta)data feature, exactly how the test will be conducted, and what will be considered a successful result.

As a task under the objectives of the EJP RD, we created a set of software packages, The FAIR Evaluator, that codes each Metric as an automatable software-based test, together with an engine that automatically applies these tests to any dataset, generating an objective, quantitative score for the ‘FAIRness’ of that dataset along with advice on what caused any failures (https://www.nature.com/articles/s41597-019-0184-5). With this information, a data owner can create a strategy to improve their FAIRness by focusing on “priority failures”. The public version of The FAIR Evaluator (https://w3id.org/AmIFAIR) has been used to assess more than 5,500 datasets. Within the domain of rare disease registries, a recent publication about the VASCA registry shows how the Evaluator was used to track the registry's progress towards FAIRness (https://www.medrxiv.org/content/10.1101/2021.03.04.21250752v1.full.pdf). To date, no resource, public or private, has passed all 22 tests, showing that FAIR assessment can provide guidance even to highly FAIR resources.
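The evaluation pattern described above, independent tests that each report pass or fail plus advice, rolled up into a score, can be sketched in a few lines. The two checks here are hypothetical and only inspect a metadata dictionary, whereas the real FAIR Evaluator Metrics retrieve and examine (meta)data over the web; nothing below reflects the Evaluator's actual code or API.

```python
from dataclasses import dataclass


@dataclass
class MetricResult:
    metric: str
    passed: bool
    advice: str = ""


# Hypothetical checks: each takes a metadata dict and returns a MetricResult.
def check_identifier(metadata):
    ok = bool(metadata.get("identifier"))
    return MetricResult("identifier present", ok,
                        "" if ok else "Assign a globally unique, persistent identifier.")


def check_license(metadata):
    ok = bool(metadata.get("license"))
    return MetricResult("license present", ok,
                        "" if ok else "Declare a machine-readable licence.")


def evaluate(metadata, checks):
    """Run every check; return (fraction passed, list of failed results)."""
    results = [check(metadata) for check in checks]
    score = sum(r.passed for r in results) / len(results)
    failures = [r for r in results if not r.passed]
    return score, failures


score, failures = evaluate({"identifier": "https://doi.org/10.1234/example"},
                           [check_identifier, check_license])
print(score)  # 0.5
for f in failures:
    print(f.metric, "->", f.advice)
```

Returning the failed results alongside the score is what lets a data owner see their “priority failures” rather than just a number.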

The “R” Principles for FAIRness frequently refer to community and domain standards. To date, very few domain-specific FAIRness tests have been created, despite this being a critical “last-step” towards full data reusability. The EJP RD is able to provide advice, guidance and code assistance for communities who wish to begin designing and deploying domain-specific FAIRness tests, to increase their compliance with the “R” Principles of FAIR.
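As an illustration of what a domain-specific test might check, the hypothetical function below verifies that every diagnosis value in a rare disease dataset is expressed as an ORDO (Orphanet Rare Disease Ontology) IRI rather than free text. The IRI pattern `http://www.orpha.net/ORDO/Orphanet_<number>` is the form ORDO uses for its disease classes; the function name and the pass rule are assumptions.

```python
import re

# ORDO disease class IRIs have the form http://www.orpha.net/ORDO/Orphanet_<digits>.
ORDO_IRI = re.compile(r"http://www\.orpha\.net/ORDO/Orphanet_\d+")


def diagnoses_use_ordo(values):
    """Pass only if there is at least one value and all are ORDO IRIs.

    Returns (passed, offending_values) so a failure comes with material
    for advice on what to fix.
    """
    offending = [v for v in values if not ORDO_IRI.fullmatch(v)]
    return bool(values) and not offending, offending


print(diagnoses_use_ordo(["http://www.orpha.net/ORDO/Orphanet_558"]))
print(diagnoses_use_ordo(["Marfan syndrome"]))
```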

FAIR Evaluation Services

Helpdesk

EJP RD helpdesk (helpdesk@ejprarediseases.org)

- Bio Semantics (Mark Wilkinson, Marco Roos, Rajaram Kaliyaperumal, Peter-Bram ‘t Hoen)

- FAIRification Stewards (Bruna dos Santos Vieira, Clemence Le Cornec, Shuxin Zang, Alberto Camara, Cesar Bernabe, Joeri van der Velde)

Patient Organizations – for joining the network:

- Nawel van Lin (nawel.vanlin@worldduchenne.org)

- Michela Onali (michela.onali@gmail.com)

Health RI Service desk – for tools and services, and how Health RI can support research projects see https://www.health-ri.nl/health-ri-service-desk
ZonMW – Dutch and International collaboration on improving quality and innovation in health research https://www.zonmw.nl/en/contact/
GO FAIR – news, events, materials & training, and the latest FAIR developments https://www.go-fair.org/go-fair-initiative/

Web page Contributors: Bruna Dos Santos Vieira, Clémence Le Cornec, Marco Roos, Mark Wilkinson, Nirupama Benis, Rajaram Kaliyaperumal, Peter-Bram ‘t Hoen, Ronald Cornet, Tanguy Onakoy, Yanis Mimouni
Version 1.0.0