The ASCAPE analytical models for the quality of life (QoL) of breast and prostate cancer patients are built on data provided by four different clinical sites. The heterogeneous data formats of the project's four pilots are harmonized according to a common reference data model based on the HL7 FHIR format. As a result, the data that feed the AI engines are standardized in a single, widely accepted format. This single input format also allows new datasets from different sources to be integrated without changing the AI engine code, so any additional cohort that provides FHIR-based data can easily be included in the federated learning approach. The methodology will also facilitate the adaptation of ASCAPE to use cases that study other types of pathologies, as well as interoperability among different clinical sites.

ASCAPE needs a robust and flexible methodology to harmonize the collected health data and transform it into the HL7 FHIR format, finding common links or similarities between the different health data entities and HL7 FHIR resources. The HL7 FHIR standard offers multiple resources[1], each with many attributes, for designing the model. The process to follow is: (1) understand the clinical concept and meaning of a variable; (2) identify the resource, and the attributes of that resource, that best represent the variable's concept; and (3) complete those attributes with the most accurate information possible. A data model with complete and appropriate relationships between resources is more useful for extracting knowledge and for comparing information from different databases. Figure 1 below shows an example of the potential relationships between different resources.
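The three-step mapping process described above can be sketched in a few lines of Python. This is only an illustration, not the ASCAPE implementation: the function name `variable_to_condition`, the patient identifier and the date are hypothetical, and the SNOMED CT code used for the diagnosis is an example that should be checked against a terminology server before any real use.

```python
import json

# Canonical URI that identifies SNOMED CT as a code system in FHIR.
SNOMED_SYSTEM = "http://snomed.info/sct"


def variable_to_condition(patient_id, snomed_code, display, onset_date):
    """Map one local diagnosis variable to a FHIR Condition resource.

    Step 1 (understanding the variable) happens before this call.
    Step 2 is the choice of the Condition resource and its attributes.
    Step 3 is filling those attributes as completely as possible.
    """
    return {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [
                {"system": SNOMED_SYSTEM, "code": snomed_code, "display": display}
            ]
        },
        "onsetDateTime": onset_date,
    }


# Hypothetical input values, for illustration only.
condition = variable_to_condition(
    patient_id="example-001",
    snomed_code="254837009",  # assumed example code for breast cancer; verify before use
    display="Malignant neoplasm of breast",
    onset_date="2019-05-14",
)
print(json.dumps(condition, indent=2))
```

The `subject` reference is what links the Condition to a Patient resource, which is the kind of relationship between resources that the data model relies on.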


The first step in the harmonization process was therefore to identify the variables of interest for the ASCAPE objectives. Once the initial ASCAPE reference data model was consolidated, the next step was to map the retrospective datasets from the different pilot sites to it, in order to identify which of the collected variables the model covered. Once all the datasets were mapped to the data model, the next step was to provide a common structure using HL7 FHIR and the SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms)[1] vocabulary. The last step is to develop and implement the code that reads the local datasets in the Hospital Information Systems (HIS), transforms them and stores them in the HAPI FHIR servers. In the work done so far, a collection of nearly 120 data variable definitions has been identified and prepared for potential exchange under the health data model defined for ASCAPE. The model is subject to change as the project evolves.
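As a rough sketch of that last step, the Python snippet below reads a small local dataset, transforms each row into a FHIR Observation and wraps the results in a transaction Bundle that a FHIR server such as HAPI FHIR could accept. Everything specific here is an assumption made for illustration: the CSV columns, the `VARIABLE_MAP` table, the SNOMED CT code for body weight and the server URL are hypothetical and do not describe the actual ASCAPE code. The HTTP request is only built, not sent.

```python
import csv
import io
import json
import urllib.request

# Hypothetical local export; real pilot datasets will look different.
SAMPLE_CSV = (
    "patient_id,variable,value,unit,date\n"
    "p001,body_weight,68.5,kg,2020-01-15\n"
    "p002,body_weight,80.2,kg,2020-02-03\n"
)

# Assumed mapping from local variable names to SNOMED CT codes (verify before use).
VARIABLE_MAP = {"body_weight": ("27113001", "Body weight")}


def row_to_observation(row):
    """Transform one CSV row into a FHIR Observation resource (as a dict)."""
    code, display = VARIABLE_MAP[row["variable"]]
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": "http://snomed.info/sct",
                             "code": code, "display": display}]},
        "subject": {"reference": f"Patient/{row['patient_id']}"},
        "effectiveDateTime": row["date"],
        "valueQuantity": {"value": float(row["value"]), "unit": row["unit"],
                          "system": "http://unitsofmeasure.org", "code": row["unit"]},
    }


def build_transaction_bundle(resources):
    """Wrap resources in a FHIR transaction Bundle, one POST entry per resource."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [{"resource": r,
                   "request": {"method": "POST", "url": r["resourceType"]}}
                  for r in resources],
    }


def build_request(base_url, bundle):
    """Prepare (but do not send) the HTTP POST that submits the Bundle."""
    return urllib.request.Request(
        base_url,
        data=json.dumps(bundle).encode("utf-8"),
        headers={"Content-Type": "application/fhir+json"},
        method="POST",
    )


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
bundle = build_transaction_bundle([row_to_observation(r) for r in rows])
request = build_request("https://fhir.example.org/baseR4", bundle)  # hypothetical URL
print(json.dumps(bundle, indent=2))
```

Keeping the variable-to-code mapping in a table rather than in the transformation logic means that a new variable definition only requires a new table entry.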


The objective of using a clinical terminology is to represent medical concepts in a common, widely accepted vocabulary. Several terminologies and classifications provide such a common nomenclature, including ICD-9, ICD-10, LOINC and SNOMED CT. The project decided to use SNOMED CT as the main clinical terminology for ASCAPE. SNOMED CT is distributed by the International Health Terminology Standards Development Organisation (IHTSDO), which operates as SNOMED International.


HL7 FHIR and SNOMED CT complement each other well for the purposes of ASCAPE. In particular, their combination facilitates: 1) integration with web standards (XML, JSON, HTTP, OAuth, etc.), 2) the exchange of information using messages or documents, and 3) implementation in a service-based architecture. The selected standards comply with the priorities proposed by the Communication on enabling the digital transformation of health and care in the Digital Single Market[2].
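To illustrate how the two standards meet over plain HTTP and JSON, the sketch below builds a FHIR search URL that filters Condition resources by a SNOMED CT code, and then reads patient references out of a JSON search result. The server URL, the helper names and the sample response are hypothetical, and the SNOMED CT code is an example to be verified; no network call is made.

```python
import json
from urllib.parse import urlencode


def build_search_url(base_url, resource_type, system, code):
    """Build a FHIR search URL using a token parameter of the form system|code."""
    query = urlencode({"code": f"{system}|{code}"})
    return f"{base_url.rstrip('/')}/{resource_type}?{query}"


def subjects_in_bundle(bundle_json):
    """Return the subject references found in a FHIR search-result Bundle."""
    bundle = json.loads(bundle_json)
    return [entry["resource"]["subject"]["reference"]
            for entry in bundle.get("entry", [])]


# Hypothetical server response, shortened to the fields used here.
SAMPLE_RESPONSE = json.dumps({
    "resourceType": "Bundle",
    "type": "searchset",
    "entry": [
        {"resource": {"resourceType": "Condition",
                      "subject": {"reference": "Patient/example-001"}}}
    ],
})

url = build_search_url(
    "https://fhir.example.org/baseR4",  # hypothetical FHIR endpoint
    "Condition",
    "http://snomed.info/sct",
    "254837009",  # assumed example code; verify before use
)
subjects = subjects_in_bundle(SAMPLE_RESPONSE)
print(url)
print(subjects)
```

Because the terminology system is part of the search parameter, the same query works against any server that stores its codes in SNOMED CT, whichever site hosts it.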


The clinical information in ASCAPE is one of its main assets, so its quality and harmonization must be ensured. The quality of the datasets depends both on the data curation carried out by the data providers and on the harmonization process based on HL7 FHIR resources and the SNOMED CT vocabulary.



Blog post produced by Atos.


[1] SNOMED International homepage: