Unlocking the value of distributed health data for machine learning

A federated architecture offers a decentralized, more secure way to support analytics and medical research.

April 14, 2023

A decentralized approach can help unlock the potential of AI in healthcare.

As machine learning and analytics for digitized health data are developed and adopted, they will give researchers, clinicians, and administrators new tools to improve patient outcomes, reduce healthcare delivery costs, and accelerate the drug development pipeline.

However, the challenge of accessing medical data limits the ability of practitioners to unlock opportunities for AI in healthcare.

The impracticality of data centralization

Health data is generated by thousands of facilities and clinics across borders, and within each facility by a wide variety of devices, staff, and departments. This fragmentation makes it hard to apply machine learning to the data.

Outside of healthcare, the primary approach to applying machine learning and analytics to distributed data is to first centralize the data in a data lake or data warehouse. However, three inherent characteristics of health data often make centralization impractical or impossible: sensitivity, volume, and interoperability.

Sensitivity. Most countries have regulations restricting the use of personal data, and many supplement them with further guidance on protecting personal health information. The GDPR in the EU and HIPAA in the US tightly restrict sharing health data between institutions or across borders without patient consent or another legal basis. Health data custodians not only maintain their own privacy and security protocols but are also wary of sharing intellectual property that gives them a competitive advantage.

Healthcare organizations and product developers have traditionally used a combination of technical anonymization and legal instruments to manage these barriers of trust, but each has significant limitations. Due to the complexity and cost of sharing health data, many potentially valuable initiatives are slow or impossible to start, a huge loss for researchers and patients.

Volume. The explosion of health data gives researchers new opportunities to improve existing models and build new predictive models for diagnostics, precision medicine, and real-world evidence. But the promise of boundless health innovation driven by vast amounts of digital health data must be tempered by the practical implications of moving and storing copies of these massive datasets.

The computational time and cost required to centralize data for machine learning and analytics severely limit innovation in health AI.

Interoperability. The historical lack of data standards in healthcare also creates challenges with data aggregation across sites. Hospital electronic health record (EHR) systems are designed to optimize hospital operations and comply with local rules and regulations, not to facilitate data sharing. Transforming existing data into a standard format so that it can be aggregated across systems is time-consuming and costly.

Efforts such as the Fast Healthcare Interoperability Resources (FHIR) open standard, which is being adopted in the United States, are underway to establish and enforce better health data standards. Adoption challenges remain outside the United States, and interoperability alone does not solve the challenge of sharing sensitive data at scale.

Effects of decentralized health data

History has proven that barriers to sharing health data for machine learning and analytics have hampered the overall progress of AI in healthcare.

The cost and complexity of integrating data make centralization impractical, yet leaving data distributed severely limits the ability to extract insights from it. Simply put, existing approaches to applying machine learning and analytics to health data are no longer working, and it is time for new ones.

As digital health data continues to explode, health AI requires data scientists to explore new approaches beyond centralizing data. In a federated future, medical data won’t move and teams will be able to derive insights from medical data around the world while protecting patient privacy.

Unlike traditional machine learning, federated learning and analytics allow data scientists and researchers to train models and perform analytics without pooling the data in one place.
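To make the analytics side concrete, here is a minimal Python sketch of federated analytics, assuming each site agrees to release only a record count and a sum for one numeric field. The site names, values, and the `SiteSummary`, `summarize_locally`, and `federated_mean` helpers are hypothetical; a real deployment would also need access controls and safeguards against disclosure from small counts.

```python
# Minimal federated-analytics sketch: each site shares only aggregate
# statistics (a count and a sum), never patient-level rows.
# Site names, values, and helper names are hypothetical.
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate statistics a site is willing to release."""
    count: int
    total: float


def summarize_locally(values: list[float]) -> SiteSummary:
    """Runs inside the site; the raw values never leave this function."""
    return SiteSummary(count=len(values), total=sum(values))


def federated_mean(summaries: list[SiteSummary]) -> float:
    """Combine per-site summaries into a global mean on the coordinator."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no records across sites")
    return sum(s.total for s in summaries) / n


if __name__ == "__main__":
    # Hypothetical per-site measurements (e.g., patient ages), kept local.
    site_data = {
        "hospital_a": [54.0, 61.0, 47.0],
        "hospital_b": [38.0, 72.0],
        "clinic_c": [66.0, 59.0, 50.0, 45.0],
    }
    summaries = [summarize_locally(v) for v in site_data.values()]
    print(federated_mean(summaries))
```

The coordinator's result equals the mean of the pooled data, yet no site ever sends an individual value.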

How it works

A central federated learning server, hosted by a trusted party, sends training instructions to each hospital's data server, where a local model is trained. The local model parameters are sent back to the federated learning server, where they are aggregated into one global model. Because raw data never leaves the hospital, federated learning is well suited to health AI: the data stays in place, which helps preserve patient privacy.
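The round trip described above resembles federated averaging (FedAvg), a widely used aggregation scheme; the article does not say which aggregation method is used in practice. The sketch below assumes a one-feature linear model, plain gradient descent at each site, and synthetic data, and it weights each site's parameters by its number of samples. The function names are hypothetical.

```python
# Minimal federated-averaging (FedAvg-style) sketch in pure Python.
# Assumptions (not from the article): a one-feature linear model,
# plain gradient descent at each site, and synthetic data.
import random

Params = tuple[float, float]  # (weight, bias)


def local_train(params: Params, data: list[tuple[float, float]],
                lr: float = 0.05, epochs: int = 20) -> Params:
    """Train on one hospital's data; only updated parameters are returned."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        # Mean-squared-error gradients over the local dataset.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(updates: list[Params], sizes: list[int]) -> Params:
    """Aggregate local parameters, weighting each site by its sample count."""
    total = sum(sizes)
    w = sum(p[0] * s for p, s in zip(updates, sizes)) / total
    b = sum(p[1] * s for p, s in zip(updates, sizes)) / total
    return w, b


def run_rounds(sites: list[list[tuple[float, float]]], rounds: int = 30) -> Params:
    """Server loop: send global parameters out, average what comes back."""
    global_params: Params = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_train(global_params, data) for data in sites]
        global_params = federated_average(updates, [len(d) for d in sites])
    return global_params


if __name__ == "__main__":
    rng = random.Random(0)

    # Synthetic data from y = 3x + 1 plus noise, split across three sites.
    def make_site(n: int) -> list[tuple[float, float]]:
        return [(x, 3 * x + 1 + rng.gauss(0, 0.1))
                for x in (rng.uniform(-1, 1) for _ in range(n))]

    sites = [make_site(n) for n in (40, 25, 60)]
    w, b = run_rounds(sites)
    print(f"global model: w={w:.2f}, b={b:.2f}")
```

Only the `(w, b)` pairs cross the network; each site's `(x, y)` records stay inside `local_train`.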

Returning to the three characteristics of health data that make centralization impractical, federated learning helps address each of them:

Sensitivity. Healthcare providers still need patient consent or another legal basis to use data for certain purposes. However, because the raw data does not move, federated learning makes collaboration much easier and can simplify compliance with regulations such as GDPR and HIPAA.

Volume. Because no raw data is moved in a federated architecture, the cost and compute time of moving and storing large copies of data are no longer an issue. Only model updates, such as parameters, travel between servers, and their size is typically negligible compared to the raw dataset.
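A rough calculation shows why this traffic is small. The sketch below compares the bytes moved by a federated setup (parameters sent to every site and returned once per round) with a one-time copy of every site's raw data. All sizes in the example are hypothetical assumptions chosen for illustration; the real ratio depends on model size, number of rounds, and the data involved.

```python
# Rough traffic comparison between federated training and centralization.
# Every number used below is a hypothetical assumption, not a measurement.

def federated_traffic_bytes(num_params: int, num_sites: int, rounds: int,
                            bytes_per_param: int = 4) -> int:
    """Global parameters go down to each site and local ones come back, every round."""
    return 2 * num_params * bytes_per_param * num_sites * rounds


def centralization_bytes(records_per_site: int, bytes_per_record: int,
                         num_sites: int) -> int:
    """Each site copies its full raw dataset to a central store once."""
    return records_per_site * bytes_per_record * num_sites


if __name__ == "__main__":
    # Hypothetical: a 1-million-parameter float32 model, 10 hospitals, 50 rounds,
    # versus 50,000 imaging studies of about 50 MB each per hospital.
    fed = federated_traffic_bytes(num_params=1_000_000, num_sites=10, rounds=50)
    central = centralization_bytes(records_per_site=50_000,
                                   bytes_per_record=50_000_000, num_sites=10)
    print(f"federated: {fed / 1e9:.1f} GB, centralized: {central / 1e9:.1f} GB")
    print(f"ratio: {central / fed:.0f}x")
```

Under these assumptions, federated training moves about 4 GB in total versus roughly 25 TB for centralization; a much larger model or many more rounds would narrow the gap.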

Interoperability. Federated learning does not solve the challenges of data standardization and interoperability; in a federated architecture, data at every site must still be standardized for model training to perform well. Existing efforts to accelerate adoption of the FHIR standard will therefore continue to benefit everyone as teams move to a federated architecture.
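The standardization step can be pictured as a per-site mapping into one shared schema, applied locally before training. In this minimal sketch, the site names, local column names, common field names, and the approximate glucose conversion (1 mmol/L ≈ 18 mg/dL) are all hypothetical assumptions; a real network would more likely map onto an established standard such as FHIR.

```python
# Harmonize site-specific records into a shared schema before local training.
# Site names, column names, and the common schema are hypothetical examples.
from typing import Callable

Converter = Callable[[object], float]

MG_DL_PER_MMOL_L_GLUCOSE = 18.0  # approximate conversion factor for glucose

# Per-site mapping: local column name -> (common field name, converter).
SITE_MAPPINGS: dict[str, dict[str, tuple[str, Converter]]] = {
    "hospital_a": {
        "patient_age": ("age_years", float),
        "glu": ("glucose_mg_dl", float),  # already reported in mg/dL
    },
    "hospital_b": {
        "AGE_YRS": ("age_years", float),
        # Reported in mmol/L; convert to mg/dL.
        "GLUCOSE": ("glucose_mg_dl",
                    lambda v: float(v) * MG_DL_PER_MMOL_L_GLUCOSE),
    },
}

COMMON_FIELDS = ("age_years", "glucose_mg_dl")


def to_common_schema(site: str, record: dict) -> dict[str, float]:
    """Rename and convert one local record; fail loudly if a field is unmapped."""
    out = {common: convert(record[local])
           for local, (common, convert) in SITE_MAPPINGS[site].items()}
    missing = [f for f in COMMON_FIELDS if f not in out]
    if missing:
        raise ValueError(f"{site}: no mapping for {missing}")
    return out


if __name__ == "__main__":
    print(to_common_schema("hospital_a", {"patient_age": 54, "glu": 110}))
    print(to_common_schema("hospital_b", {"AGE_YRS": "61", "GLUCOSE": 5.5}))
```

Once every site emits the same fields in the same units, the local training code can be identical everywhere, which is what the federated round trip assumes.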

The health AI ecosystem is in its early stages of experimentation with federated learning and analytics, but there is growing interest in the opportunities it could unlock. The technology is already being applied to many use cases such as predictive diagnostics, precision medicine and drug discovery.

In this future of improved access, researchers and data scientists will be able to leverage data from connected medical devices without moving the data to a central server. Health application developers can realize new revenue opportunities by enabling machine learning and analytics across data networks, while their partners retain complete control over their data. And all stakeholders across the healthcare ecosystem will benefit from new and better AI-driven insights that help deliver better patient outcomes efficiently.

While there are many barriers to adopting privacy-preserving tools that increase data access, critical factors in healthcare delivery, such as diagnostic accuracy, patient outcomes, pipeline development speed, and drug approval times, all matter to patients, who bear the burdens of an overstretched and understaffed healthcare ecosystem.

Dr. Bryce Pickard is Partnerships Director at integrate.ai.
