In ClimAIr, data protection is not something added at the end; it is the starting point. The entire system is designed to enable advanced AI training while ensuring that sensitive medical information remains under the full control of healthcare providers.
Data privacy matters more than ever. Personal information moves quickly between systems, and AI systems often know a great deal about us, which makes protecting sensitive data a major challenge. In ClimAIr, where climate data, air quality measurements, and medical records come together, keeping patient information safe is essential. At the same time, understanding how the environment affects health requires combining large and varied datasets: climate data, pollution levels, and hospital records all need to be analyzed together to find patterns and predict risks. The challenge is to do this while keeping sensitive information protected at every step.
In ClimAIr, the protection of sensitive information is not addressed at the end of the process. It is a structural requirement that shapes the entire system architecture from the beginning. The technical design must allow advanced AI training while ensuring that medical data remains fully under the control of healthcare providers. As Gökay GÖK, Deputy General Manager of KEYDATA, explains, “the foundation of this project is very clear: medical data must never leave the healthcare providers.” This principle defines where data lives, how models are trained, and how collaboration is made possible.
Building such a system requires more than advanced algorithms. It requires designing trust directly into the technical structure.
A Secure Infrastructure for Collaboration
Within the project, KEYDATA is responsible for building the secure AI infrastructure that enables collaboration across partners. “In ClimAIr, our main role is to build the secure AI infrastructure that makes collaboration possible,” says Gökay.
The project brings together nine medical partners, which in practical terms means nine different legacy systems, nine data models, and nine ways of working. According to Gökay, “the biggest technical challenge is bringing all of this into one unified data model without forcing partners to change their internal systems.” This requirement is critical: hospitals cannot be asked to redesign their operational systems to participate in a research project.
At the same time, medical data is extremely sensitive. Accessing or transferring it typically requires explicit patient consent and ethics committee approval. For this reason, ClimAIr is built around a strict principle: medical data stays inside the hospital, inside the medical center. “Our mission is to create a common, interoperable layer on top of these systems while keeping all sensitive data exactly where it belongs,” Gökay explains.
To implement this, KEYDATA designed a privacy-preserving architecture deployed directly at each medical partner. Dedicated NVIDIA Jetson Orin–based mini AI computers were installed on site. These devices act as secure local AI nodes and form the backbone of the project’s distributed infrastructure. “These devices are not just computer units,” Gökay notes. “They are the core of our privacy-preserving architecture.”
Each Jetson node performs two essential functions. First, it manages unified data collection and harmonization within the institution, mapping local legacy clinical systems into the common data model. Second, it runs local AI training processes within a federated learning setup. All computations are performed locally under the medical partner's control.
Federated Learning Without Moving Data
Federated learning is central to the ClimAIr architecture. It allows AI models to be trained collaboratively without moving data to a central location. “The main idea behind federated learning is simple,” says Gökay. “You train the model locally, keep all sensitive and demographic data inside the hospital, and only share model parameters.”
In practice, each hospital trains its own local AI model using its own data. Once training is completed, only the model weights are transmitted. These updates are encrypted and aggregated centrally to produce a stronger shared model. “You can think of it as nine local models trained in nine hospitals, combined into a better model without sharing a single patient record,” Gökay explains.
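The aggregation step described above can be illustrated with a minimal sketch of federated averaging. It is not the project's implementation: the function name, the example values, and the choice to weight each site by its number of training records are assumptions made for illustration, and the encryption of updates mentioned above is left out.

```python
from typing import List, Sequence


def federated_average(
    site_weights: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site parameter vectors into one shared vector.

    Each site is weighted by its number of local training records
    (an assumption; the article does not say how sites are weighted).
    """
    if not site_weights or len(site_weights) != len(site_sizes):
        raise ValueError("need one size per site and at least one site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total number of records must be positive")
    n_params = len(site_weights[0])
    if any(len(w) != n_params for w in site_weights):
        raise ValueError("all sites must send the same number of parameters")
    # Only parameter vectors are visible here; no patient record is involved.
    return [
        sum(w[i] * n for w, n in zip(site_weights, site_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical updates from three hospitals (two parameters each).
updates = [[0.0, 1.0], [1.0, 1.0], [4.0, 1.0]]
sizes = [100, 100, 200]
global_weights = federated_average(updates, sizes)
```

The central server in this sketch only ever receives lists of numbers and record counts, which is the property the quote above points to.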
To coordinate this process, the project uses Flower, an open-source federated learning framework. Flower manages training rounds, orchestrates communication between partners, and supports secure aggregation. Crucially, only model parameters pass through it, so raw clinical data never leaves the local environment.
This approach is particularly well-suited to healthcare projects. “Health data is extremely sensitive,” Gökay emphasizes. “Federated learning solves many of the security, consent, and regulatory challenges that come with sharing medical data.”
Harmonizing Clinical, Environmental, and Climate Data
Privacy alone is not enough. For AI models to be meaningful, data must also be interoperable. In ClimAIr, data harmonization means enabling datasets with very different structures to speak the same language.
“Data harmonization is about aligning coding systems, standardizing variables, resolving structural differences, handling missing data and making sure time and location data are comparable,” Gökay explains. But the challenge goes further than harmonizing nine medical systems.
Clinical data must also be aligned with environmental and climate datasets, which differ significantly in structure, scale, frequency, and spatial resolution. “Clinical data is patient-based, while environmental data is regional and time-based,” Gökay notes. “Bringing all of this into a unified data model that supports AI training is probably one of the most demanding parts of the project.”
To manage this complexity, the team defined a common data model and built specific transformation pipelines for each partner. Medical experts, environmental specialists, and technical teams work closely together to ensure alignment at both the technical and semantic levels. “We are not changing the partners’ systems,” Gökay stresses. “We are building a layer that allows them to work together.”
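One way to picture a per-partner transformation pipeline is a small mapping function for each source system, all producing the same target record. The sketch below is illustrative only: the field names, local codes, date formats, and the `ALLERGIC_RHINITIS` label are hypothetical and are not taken from the project's actual common data model.

```python
from datetime import date, datetime
from typing import Callable, Dict

CommonRecord = Dict[str, object]


def map_partner_a(row: dict) -> CommonRecord:
    # Hypothetical partner A: local diagnosis codes and day-first dates.
    codes = {"AR1": "ALLERGIC_RHINITIS"}
    return {
        "diagnosis": codes.get(row["diag"]),  # None marks an unmapped code
        "visit_date": datetime.strptime(row["date"], "%d.%m.%Y").date(),
        "region": row["city"].strip().upper(),
    }


def map_partner_b(row: dict) -> CommonRecord:
    # Hypothetical partner B: text labels and ISO-formatted dates.
    codes = {"rhinitis_allergic": "ALLERGIC_RHINITIS"}
    return {
        "diagnosis": codes.get(row["dx"]),
        "visit_date": date.fromisoformat(row["visit"]),
        "region": row["region_name"].strip().upper(),
    }


# One pipeline per partner; the partners' own systems stay unchanged.
PIPELINES: Dict[str, Callable[[dict], CommonRecord]] = {
    "partner_a": map_partner_a,
    "partner_b": map_partner_b,
}


def harmonize(partner: str, row: dict) -> CommonRecord:
    """Map one local row into the shared record layout."""
    return PIPELINES[partner](row)


rec_a = harmonize("partner_a", {"diag": "AR1", "date": "03.05.2024", "city": " North "})
rec_b = harmonize("partner_b", {"dx": "rhinitis_allergic", "visit": "2024-05-03", "region_name": "north"})
```

In this sketch both rows end up as the same common record even though the source systems disagree on codes, date formats, and region spelling.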
The local Jetson nodes are also designed to ingest and process environmental and regional climate data relevant to each site’s geography. For this integration, KEYDATA worked closely with the partners responsible for environmental datasets. As a result, local AI models are trained not only on health data but also on climate and environmental indicators from the same region.
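Linking patient-based records to regional, time-based indicators can be sketched as a lookup keyed on region and date. The example below is a simplified assumption about how such a join might work; the indicator names (`pm25`, `pollen`), the daily resolution, and the values are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Hypothetical regional indicators, keyed by (region, day).
environment: Dict[Tuple[str, date], Dict[str, float]] = {
    ("NORTH", date(2024, 5, 3)): {"pm25": 18.0, "pollen": 120.0},
}

# Hypothetical harmonized clinical visits held on the local node.
visits: List[Dict[str, object]] = [
    {"region": "NORTH", "visit_date": date(2024, 5, 3), "diagnosis": "ALLERGIC_RHINITIS"},
    {"region": "NORTH", "visit_date": date(2024, 5, 4), "diagnosis": "ALLERGIC_RHINITIS"},
]


def attach_environment(
    visits: List[Dict[str, object]],
    environment: Dict[Tuple[str, date], Dict[str, float]],
) -> List[Dict[str, object]]:
    """Add same-region, same-day indicators to each visit.

    Visits with no matching measurement keep None values, so missing
    data stays visible instead of being silently filled in.
    """
    empty: Dict[str, Optional[float]] = {"pm25": None, "pollen": None}
    enriched = []
    for visit in visits:
        key = (visit["region"], visit["visit_date"])
        row = dict(visit)
        row.update(environment.get(key, empty))
        enriched.append(row)
    return enriched


enriched = attach_environment(visits, environment)
```

Because the join runs on the local node, the enriched rows stay inside the institution, consistent with the principle stated earlier.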
Technical and Ethical Challenges
From a technical perspective, one of the main challenges is training reliable AI models across heterogeneous data types within a federated setup. Differences in data distribution between partners and regional variability increase the risk of bias and inconsistent performance. “Making sure models perform consistently across regions is not simple,” Gökay admits.
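A simple way to make cross-site consistency visible is to evaluate the shared model separately at each site and compare the results. The sketch below assumes a binary prediction task and uses accuracy as the metric; the site names, predictions, and labels are invented for illustration and say nothing about the project's actual evaluation.

```python
from typing import Dict, List, Tuple


def accuracy(predictions: List[int], labels: List[int]) -> float:
    """Share of predictions that match the labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and aligned")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical (predictions, labels) pairs, each evaluated locally at its site.
site_results: Dict[str, Tuple[List[int], List[int]]] = {
    "site_1": ([1, 0, 1, 1], [1, 0, 1, 1]),
    "site_2": ([1, 0, 0, 0], [1, 1, 1, 0]),
}

per_site = {site: accuracy(p, y) for site, (p, y) in site_results.items()}

# A large gap between the best and worst site signals uneven performance.
gap = max(per_site.values()) - min(per_site.values())
```

Only the per-site scores would need to be shared in such a check, not the underlying records.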
From an ethical perspective, privacy remains the most critical concern. “Our entire architecture is built on the principle that raw medical data never leaves the healthcare provider,” Gökay reiterates. At the same time, AI models must remain transparent, fair, and trustworthy, especially if their outputs inform public health decisions.
From Infrastructure to Impact
KEYDATA’s contribution to ClimAIr reflects its experience in system architecture, interoperability, and AI engineering. “Climate and health is not just a scientific problem,” says Gökay. “It’s a data and systems problem.” In projects that combine clinical, environmental, and climate data, integration capability becomes critical.
The project will deliver a system with a dashboard and user interface designed for researchers, clinicians, and policymakers. The insights it generates can support climate-health strategies, help healthcare providers anticipate climate-related risks, and assist environmental agencies in understanding the health impact of environmental change.
While the current focus is on allergic rhinitis, the architectural approach developed in ClimAIr is adaptable to other diseases and even other sectors. “This model proves that you can keep sensitive data local and still build powerful, collaborative AI systems,” Gökay concludes.
In ClimAIr, trust is not added at the end of the process; it is part of the infrastructure from the start.