Unlocking the value of distributed health data for machine learning

A federated architecture enables a decentralized, more secure way to support analytics and medical research.

April 14th 23rd · 6 minutes read

A decentralized approach can help unlock the potential of AI in healthcare.

The digitization of medical data and the application of machine learning and analytics are giving researchers, clinicians and administrators new tools to improve patient outcomes, reduce healthcare delivery costs and accelerate the drug development pipeline.

However, difficulty accessing healthcare data limits how far healthcare professionals can exploit these AI opportunities.

Unrealistic efforts in centralizing data

Healthcare data is generated across borders, in thousands of facilities and clinics, and even within a single facility by a wide variety of devices, staff and departments. This fragmentation creates problems when trying to apply machine learning.

Outside of healthcare, the main approach to applying machine learning and analytics to distributed data is to first centralize the data into a data lake or data warehouse. However, due to three inherent characteristics of medical data (sensitivity, volume and interoperability), centralization is often impractical or impossible.

Sensitivity. Most countries have regulations restricting the use of personal data, and many complement those regulations with further guidance on protecting personal health information. GDPR in the EU and HIPAA in the US severely restrict the sharing of medical data between institutions or across borders without explicit consent. Medical data custodians not only have their own privacy and security protocols, but are also concerned about sharing intellectual property that gives them a competitive advantage.

Healthcare organizations and product developers have traditionally used a combination of technical anonymization and legal instruments to manage these trust barriers, each with significant limitations. Due to the complexity and cost of sharing medical data, many potentially high-value initiatives are slow or impossible to get off the ground, at great cost to researchers and patients.

Volume. The explosion of health data has created new opportunities for researchers to improve existing models with new capabilities and build new predictive models for diagnostics, precision medicine, and real-world evidence. However, the promise of endless medical innovation driven by vast amounts of digital medical data must be tempered by the practical implications of moving and storing copies of these massive datasets.

The computational time and cost required to centralize data for machine learning and analytics severely limit medical AI innovation.

Interoperability. The historical lack of data standards in healthcare also poses challenges for data aggregation across sites. Hospital electronic health record (EHR) systems are designed to optimize hospital operations and comply with local rules and regulations, not to facilitate data sharing. Transforming existing data into a standard format for aggregation across systems is time-consuming and costly.

Efforts such as the Fast Healthcare Interoperability Resources (FHIR) open source framework are underway to establish and enforce better health data standards in the United States. Deployment challenges still exist outside the United States, and interoperability does not solve the challenges of sharing sensitive data at scale.

Impact of decentralized health data

History has shown that barriers to sharing health data for machine learning and analytics hinder overall progress in AI in healthcare.

The cost and complexity of data integration make centralization impractical. Distributed systems, on the other hand, severely limit the ability to extract insights from remotely stored data. Simply put, existing approaches to applying machine learning and analytics to health data no longer work, and it is time for new ones.

With the explosion of digital health data, medical AI requires data scientists to explore new approaches beyond centralizing data. In a federated future, medical data will not be moved, and teams will be able to derive insights from medical data around the world while protecting patient privacy.

Unlike traditional machine learning, federated learning and analytics allow data scientists and researchers to train models and perform analytics without pooling the data in one place.
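To make "analytics without assembling the data" concrete, here is a minimal sketch of a federated mean in Python. It assumes each site is willing to release only a record count and a sum; the SiteSummary type, the function names and the length-of-stay figures are all hypothetical and are not taken from any particular platform.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SiteSummary:
    """Aggregate statistics a site releases (hypothetical schema)."""
    count: int
    total: float


def summarize_locally(values: List[float]) -> SiteSummary:
    # Runs inside each hospital: only the count and the sum are returned.
    return SiteSummary(count=len(values), total=sum(values))


def federated_mean(summaries: List[SiteSummary]) -> float:
    # The coordinator sees counts and sums, never patient-level records.
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records reported by any site")
    return sum(s.total for s in summaries) / n


# Made-up lengths of stay (days) at two hypothetical sites.
site_a = [2.0, 4.0, 6.0]
site_b = [10.0]
global_mean = federated_mean(
    [summarize_locally(site_a), summarize_locally(site_b)]
)
```

The result matches the mean that would have been computed on the pooled records, although the records themselves stay at each site. A real deployment would likely add safeguards such as minimum cohort sizes, which this sketch omits.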

How it works

A central federated learning server hosted by a trusted party sends training instructions to each hospital’s data server, where local models are trained. Local model parameters are sent back to the federated learning server, where they are aggregated into a single global model. The nature of federated learning makes it an ideal solution for health AI, as data never moves and privacy is preserved.
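The round trip described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any specific product: the article does not say how parameters are aggregated, so the sketch assumes a sample-weighted average of parameters (in the style of federated averaging), a toy linear model, and synthetic data. All function names are hypothetical.

```python
from typing import List, Tuple

Params = Tuple[float, float]          # (weight, bias) of y = w * x + b
Dataset = List[Tuple[float, float]]   # (x, y) pairs held by one site


def train_locally(params: Params, data: Dataset,
                  lr: float = 0.05, epochs: int = 20) -> Params:
    """Runs at a hospital: gradient-descent steps on local data only."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def aggregate(updates: List[Tuple[Params, int]]) -> Params:
    """Runs at the central server: average weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


def federated_training(sites: List[Dataset], rounds: int = 200) -> Params:
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        # Only parameters and sample counts cross the site boundary.
        updates = [(train_locally(global_params, d), len(d)) for d in sites]
        global_params = aggregate(updates)
    return global_params


# Synthetic data following y = 2x + 1, split across two hypothetical sites.
hospital_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
hospital_b = [(3.0, 7.0), (4.0, 9.0)]
w, b = federated_training([hospital_a, hospital_b])
```

On this toy data the global model approaches the underlying line (w close to 2, b close to 1) even though neither site ever sees the other's records. The server only handles parameter pairs and sample counts.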

Recall the three characteristics of healthcare data that make centralization impractical. Federated learning can help address each of them:

Sensitivity. Healthcare providers will continue to require patient consent or a legal basis to share data for certain purposes. However, because the data doesn’t move, federated learning helps preserve privacy, making collaboration much easier and supporting compliance with regulations like GDPR and HIPAA.

Volume. The cost and compute time involved in moving and storing large amounts of data are largely irrelevant in a federated architecture because the raw data does not move. Model parameters still travel between servers, but their size is negligible compared to the raw data set.

Interoperability. Federated learning does not solve the challenges of data standardization and interoperability. A federated architecture still requires data to be standardized across sites for model training to perform well. Existing efforts to drive the adoption of FHIR standards will continue to benefit everyone as teams move to a federated architecture.
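Returning to the volume point, a back-of-the-envelope calculation shows why parameter traffic tends to be small next to a raw data copy. Every figure below is a hypothetical assumption chosen only to make the arithmetic concrete: the model size, the number of training rounds and the size of the site's dataset do not come from the article.

```python
# All figures are hypothetical, chosen only to illustrate the arithmetic.
BYTES_PER_PARAM = 4                 # float32 parameters
model_params = 25_000_000           # assumed model size
rounds = 100                        # assumed number of training rounds
raw_dataset_bytes = 5 * 10**12      # assumed 5 TB of raw data at one site

# Each round, a site downloads the global model and uploads its update.
per_round_bytes = 2 * model_params * BYTES_PER_PARAM
federated_bytes = per_round_bytes * rounds

ratio = raw_dataset_bytes / federated_bytes
print(f"federated traffic: {federated_bytes / 1e9:.0f} GB")  # 20 GB
print(f"raw copy is {ratio:.0f}x larger")                    # 250x
```

Under these assumptions the site exchanges about 20 GB over the whole training run, compared with 5 TB for a single raw copy. The advantage shrinks as models grow or rounds multiply, so the comparison is worth redoing for any concrete project.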

While the medical AI ecosystem is still in the early stages of experimenting with federated learning and analytics, there is growing interest in the opportunities it could unlock. Already, this technology has been applied to many use cases such as predictive diagnostics, precision medicine and drug discovery.

In this future of improved access, researchers and data scientists will be able to leverage data from connected medical devices without moving the data to centralized servers. Medical application developers will be able to realize new revenue opportunities by enabling machine learning and analytics across their data networks, while giving their partners complete control over their data. And all stakeholders across the healthcare ecosystem will benefit from new and better insights from AI to efficiently deliver better patient outcomes.

There are many barriers to adopting privacy-preserving tools that increase data access, but failing to adopt them undermines critical elements of healthcare delivery, including diagnostic accuracy, patient outcomes, pipeline development speed and drug approval time, all at the expense of patients and an overburdened, understaffed healthcare ecosystem.

Dr. Bryce Pickard is Partnerships Director at integrate.ai.
