Towards an Interoperability Landscape for a National Research Data Infrastructure for Personal Health Data

Standardization efforts in health research

The International Organization for Standardization (ISO) is an independent, non-governmental organization that develops and publishes international standards. To date, 171 national standards bodies are members; they share expert knowledge to tackle global challenges and foster innovation by developing relevant consensus-based, voluntary standards6. The Research Data Alliance (RDA) collects, develops and refines standards and guidance to enable interoperability between research data repositories7. One example is the RDA COVID-19 Recommendations and Guidelines on Data Sharing8, which can also serve as a model for data sharing guidelines in other health research studies. In the US and Canada, the Accredited Standards Committee (ASC) is the prevailing standards development organization (SDO). At the European level, three SDOs are responsible for defining and developing voluntary standards: the Comité Européen de Normalisation (CEN; for various kinds of services, processes, products and materials), the Comité Européen de Normalisation Electrotechnique (CENELEC; for electrotechnical standardization)9 and the European Telecommunications Standards Institute (ETSI; for information and communication technologies)10.

In the domain of healthcare, nine global initiatives have worked together since 2007 within the Joint Initiative Council (JIC) on solving real-world problems: Clinical Data Interchange Standards Consortium (CDISC), Digital Imaging and Communications in Medicine (DICOM), CEN/TC 251, GS1 Healthcare, Health Level 7 (HL7) International, Integrating the Healthcare Enterprise (IHE) International, ISO/Technical Committee 215, Logical Observation Identifiers Names and Codes (LOINC) and Systematized Nomenclature of Medicine (SNOMED) International. They enable real-time information exchange in healthcare by using standards based on full interoperability of information and processes11. The Global Alliance for Genomics & Health (GA4GH)12 brings together a growing number of public and private institutions from healthcare delivery and (health) research, companies, societies, funders, agencies and NGOs with the overarching goal of enabling responsible sharing of genomic data while respecting human rights. GA4GH frames policies and develops and/or refines technical standards13. The Global Digital Health Partnership (GDHP), an international collaboration on digital health, was established in 2018 by several governments, government agencies, territories, multinational organizations and the World Health Organization (WHO). The partnership currently comprises 36 members and advocates for the best evidence-based use of digital technologies to improve well-being and health14. GDHP regularly publishes white papers on interoperability, clinical and consumer engagement, cybersecurity, policy environments, and evidence and evaluation15,16. Further collaborations include the Personal Connected Health Alliance (PCHA)17 and the collaborations between the US Office of the National Coordinator for Health Information Technology (ONC)18 and the European Union19 or the United Kingdom20. ONC also serves as the lead US representative to the GDHP21.

The ISO committee for standards in biotechnology (ISO/TC 276)22 and its working group ISO/TC 276/WG 5 "Data Processing and Integration" are working on standards for data in the life sciences that can and should be considered for health data (Table 3). Initial releases include guideline standards for data publication (ISO/TR 3985)23 and requirements for data formatting and description in the life sciences (ISO 20691)24. Additionally, a series of standards for provenance information models for biological material and data (ISO 23494) is currently under development in ISO/TC 276/WG 5 and will be published progressively in the coming years. Moreover, several draft standards for data and metadata in personalized medicine are currently being developed in both ISO/TC 215 and ISO/TC 276/WG 5.

Identified standards

We identified 7 syntactic, 32 semantic and 9 combined syntactic and semantic standards that are potentially relevant to NFDI4Health (Fig. 1). In addition, we identified a further 101 ISO standards (Table 3) from ISO/TC 215 Health Informatics and ISO/TC 276 Biotechnology, which are presented in additional file 2. Features of the syntactic and semantic standards are presented in Table 1 and Table 2, respectively.

Fig. 1

Identified syntactic and semantic standards in health research. We categorized health research and interoperability standards into three types: semantic, syntactic, or both. Semantic standards focus on the meaning and interpretation of data, including terminologies, vocabularies, and ontologies (e.g., SNOMED CT, LOINC, ICD). Syntactic standards focus on the structure and format of data exchange, defining how data is formatted and transmitted (e.g., HL7 CDA). Combined standards include elements of both, defining data structure and format while also ensuring consistent meaning with value sets or terminologies (e.g., HL7 FHIR).

Table 1 Identified syntactic standards.
Table 2 Identified semantic standards.

Current standards in NFDI4Health

Within NFDI4Health, a tailored metadata schema (MDS) was created to collect information on German clinical, epidemiological and public health studies and the study resources they comprise (e.g., study documents, instruments and data collections)25,26. To ensure the syntactic and semantic interoperability of the register based on the MDS, the MDS elements were mapped to FHIR and the feasibility of this mapping was analyzed27. In addition, the metadata included in the re3data28 schema and clinicaltrials.gov, as well as the metadata from ECRIN29 and DDI30, were compared to the NFDI4Health MDS. SNOMED CT, HL7 Terminology, NCIt, MeSH, ISO and ICD were used for value sets in the NFDI4Health MDS. The suitability of SNOMED CT for annotating variables from questionnaires used in clinical, epidemiological and public health studies was evaluated by mapping these variables to SNOMED CT. The annotation results were implemented on a test basis in OPAL/MICA31,32, open-source software solutions built for managing and harmonizing epidemiological data33. Through these mapping activities, we evaluated the suitability of different standards for the NFDI4Health use cases.
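
To make the idea of mapping metadata elements to FHIR more concrete, the following minimal Python sketch converts a simplified study record into a dictionary shaped like a FHIR R4 ResearchStudy resource and attaches SNOMED CT codings. It is an illustration under stated assumptions, not the mapping used in NFDI4Health: the input field names (`identifier`, `title`, `description`, `status`, `conditions`), the function name `mds_to_research_study` and the identifier namespace `https://example.org/study-ids` are hypothetical and do not reproduce the actual MDS.

```python
import json

# Canonical FHIR system URI for SNOMED CT.
SNOMED_SYSTEM = "http://snomed.info/sct"

# Hypothetical namespace for study identifiers (assumption, not from NFDI4Health).
STUDY_ID_SYSTEM = "https://example.org/study-ids"


def mds_to_research_study(record: dict) -> dict:
    """Map a simplified, hypothetical MDS-like record to a
    FHIR R4 ResearchStudy-shaped dictionary."""
    resource = {
        "resourceType": "ResearchStudy",
        # 'status' is required by FHIR; default to "active" if absent.
        "status": record.get("status", "active"),
        "title": record["title"],
    }
    if "description" in record:
        resource["description"] = record["description"]
    if "identifier" in record:
        resource["identifier"] = [
            {"system": STUDY_ID_SYSTEM, "value": record["identifier"]}
        ]
    # Each condition becomes a CodeableConcept carrying one SNOMED CT coding.
    conditions = [
        {
            "coding": [
                {
                    "system": SNOMED_SYSTEM,
                    "code": c["code"],
                    "display": c["display"],
                }
            ]
        }
        for c in record.get("conditions", [])
    ]
    if conditions:
        resource["condition"] = conditions
    return resource


# Illustrative input; all values are made up for this example.
example_record = {
    "identifier": "study-0001",
    "title": "Example cohort study",
    "description": "A fictitious study used to illustrate the mapping.",
    "conditions": [{"code": "73211009", "display": "Diabetes mellitus"}],
}

if __name__ == "__main__":
    print(json.dumps(mds_to_research_study(example_record), indent=2))
```

In a sketch like this, the structure of the output (resource type, element names, cardinalities) reflects the syntactic side of interoperability, while the coded values drawn from a terminology such as SNOMED CT carry the semantic side. A real mapping would additionally need validation against the FHIR specification and any applicable profiles.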
