Accelerating AI innovation in healthcare: Real-world clinical research applications on the Mayo Clinic platform


Platform architecture overview

The Mayo Clinic Platform (MCP) is a secure, cloud-based data science environment designed to accelerate research and innovation through access to large-scale, de-identified, standardized clinical data and integrated analytical tools. The platform architecture is built to ensure scalability, privacy, and accessibility for researchers across a variety of fields.

Extensive de-identified and standardized data resources: MCP applies a de-identification and standardization process to data from more than 15.1 million patients. To protect patient privacy, the platform uses a multi-layered anonymization strategy that combines rule-based heuristics and deep learning models to identify and replace personally identifiable information [30]. These measures ensure full compliance with HIPAA and institutional governance policies. Additionally, the platform provides extensive data standardization, including mapping EHR data to standard medical terminologies and common data models. This rich multimodal dataset enables a wide range of research applications, including training AI models, generating real-world evidence, and discovering clinical insights.
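The rule-based layer of such a strategy can be pictured with a minimal sketch. The patterns, placeholder tags, and sample note below are hypothetical and chosen only for illustration; they are not MCP's actual rules, and the deep learning layer mentioned above (needed for names and other free-text identifiers) is not shown.

```python
import re

# Hypothetical patterns for illustration only. A production system would
# need far broader coverage, plus a learned model for names and free text.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "MRN": re.compile(r"\bMRN[:\s]*\d{6,10}\b", re.IGNORECASE),
}


def redact(text: str) -> str:
    """Replace every pattern match with a bracketed placeholder tag."""
    for tag, pattern in PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text


# Made-up note text, not real patient data.
note = "Seen on 03/14/2021, MRN: 12345678. Call 507-555-0100 to follow up."
redacted = redact(note)
print(redacted)
```

Replacing identifiers with typed placeholders, rather than deleting them, keeps the sentence structure of the note intact for downstream text analysis.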

Integrated research tools: MCP offers a comprehensive suite of research tools that streamline the entire data-to-discovery workflow. These tools enable secure data access, exploration, and analysis within an integrated platform. Designed for scalability and ease of use, the MCP tools ecosystem supports both technical and non-technical users and facilitates efficient and reproducible collaboration across a variety of data types while maintaining rigorous privacy, governance, and compliance standards.

Dedicated data science environment: Researchers access MCP through a secure, cloud-hosted data science environment tailored for their use. This environment integrates MCP research tools and provides preconfigured support for open-source languages and frameworks such as Python, R, and TensorFlow. It provides controlled, compliant access to anonymized data and high-performance computing resources, enabling seamless model training and evaluation within a managed, privacy-preserving infrastructure.

This architecture establishes MCP as a scalable, privacy-preserving, AI-enabled research environment that enables researchers to generate actionable insights from anonymized, real-world data while maintaining the highest standards of security and compliance.

Real-world observation data in MCP

MCP provides access to a wide range of high-quality clinical data, including standardized structured data (diagnoses, test results, medications, etc.) and unstructured data (clinical notes, images, etc.). This anonymized data captures patient trajectories over time and across different demographics. MCP’s dataset currently includes more than 15.1 million patient records, 12 billion radiology images, 3.2 billion test results, and 1.65 billion clinical notes, all accessible through a secure data science environment. In addition to the Mayo-specific standardized EHR data, MCP also provides EHR data in OMOP CDM format. This enhances interoperability and allows users to leverage analytical pipelines and tools developed within the OHDSI ecosystem.
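To illustrate what the OMOP CDM format offers, the sketch below runs a cohort-style query against two standard OMOP tables, `person` and `condition_occurrence`. It uses an in-memory SQLite database with a few made-up rows as a stand-in for the platform's database, which is an assumption made for self-containment. The concept IDs are illustrative values and should be checked against the OMOP vocabulary before any real use.

```python
import sqlite3

# In-memory stand-in for an OMOP CDM database (toy rows, not real data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT);
INSERT INTO person VALUES (1, 8507, 1950), (2, 8532, 1962), (3, 8507, 1948);
INSERT INTO condition_occurrence VALUES
    (10, 1, 316139, '2019-05-01'),
    (11, 1, 316139, '2020-02-10'),
    (12, 2, 201826, '2018-07-22'),
    (13, 3, 316139, '2021-11-03');
""")

HF_CONCEPT_ID = 316139  # illustrative concept ID used here for heart failure

# One row per patient with the condition, keeping the earliest diagnosis date.
query = """
SELECT p.person_id, p.year_of_birth, MIN(c.condition_start_date) AS first_dx
FROM person p
JOIN condition_occurrence c ON c.person_id = p.person_id
WHERE c.condition_concept_id = ?
GROUP BY p.person_id, p.year_of_birth
ORDER BY p.person_id
"""
cohort = conn.execute(query, (HF_CONCEPT_ID,)).fetchall()
print(cohort)
```

Because the table and column names follow the common data model, the same query text should transfer to any OMOP-formatted source with little more than a change of connection.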

MCP tools used in this study

In partnership with nference, Inc. [31], MCP makes available different tools for different needs. Because this study only used structured EHR data within MCP, we utilized the following tools:

Cohort Visualizer facilitates the rapid creation, characterization, and comparison of patient cohorts for hypothesis testing and analysis using EHR data. It supports both structured and unstructured data and provides code-free analysis and intuitive visualization tools. Users can load existing cohorts or create new ones with the Cohort Builder and analyze them in graphical or tabular form. User-friendly navigation allows users, regardless of technical expertise, to explore vast clinical datasets using standard clinical codes and keywords, helping accelerate clinical research and address unmet needs in translational medicine. Additionally, SQL code is provided to facilitate data retrieval from EHR databases for more detailed downstream analysis. Figure 2A shows the MCP Cohort Builder user interface, in which users can define and filter patient cohorts using structured and unstructured EHR data. Figure 2B shows the cohort comparison interface, which allows users to visualize and compare cohort characteristics through a graphical overview.

Figure 2: MCP tool interfaces. A, B: Cohort Visualizer interface showing patient cohort creation (A) and the comparison view (B). C: Schema Visualizer interface for exploring the data schema and the relationships between tables. D, E: MCP workspace interface showing the coding environment (D; e.g., JupyterLab and RStudio) and the integrated computational tools for data analysis and AI model development (E).

Schema Visualizer provides an interactive interface for exploring data dictionaries and schemas in MCP. Detailed information about the tables, columns, and their relationships is provided, along with example query code for downstream data collection (Figure 2C). It also features advanced search tools that allow users to efficiently locate specific tables, columns, or values within a data schema.

MCP workspaces provide a comprehensive environment for accessing data and computing resources to support advanced analytics and data science workflows. The platform provides scalable computational resources for a variety of research needs. For individual researchers, the maximum configuration available includes 208 CPU cores, 1872 GB of RAM, and eight NVIDIA H100 80 GB GPUs, ensuring capacity for complex and data-intensive machine learning workflows. The workspaces also offer up-to-date open-source tools, packages, and libraries for cloud-based computing, with integrated support for JupyterLab, VSCode, and RStudio to meet diverse coding needs. This all-in-one platform streamlines data collection, processing, and analysis. In addition, the workspace includes high-performance computing capabilities for resource-intensive tasks such as data mining, machine learning, and deep learning. It also provides code-level guidance for a variety of applications such as data extraction, large language model (LLM) execution, and medical image processing. Additionally, users can use Git within their workspaces to efficiently manage and collaborate on repositories in GitHub. Figure 2D, E shows the MCP workspace interface.

Research projects carried out at MCP

We designed four different projects to comprehensively showcase the capabilities of MCP across different clinical research scenarios. Figure 3 illustrates the objectives of these projects in their respective clinical research contexts. A detailed description of each project is provided below.

Figure 3: Objectives of the four clinical research projects.

Project 1. Simulating randomized controlled trials (RCTs) of drug efficacy in heart failure (HF) patients using real-world observational clinical data. This project leverages the rich retrospective data available in MCP to simulate the conditions of traditional RCTs. Doing so allows for high-quality research that avoids the usual costs and ethical concerns associated with traditional RCTs. More specifically, we developed a methodology to simulate RCTs that evaluate drug efficacy in heart failure patients using real-world observational data. Key objectives include identifying RCTs suitable for simulation and leveraging EHR data to replicate efficacy trials of heart failure drugs, enabling robust comparative efficacy studies in the absence of traditional RCTs. Additionally, this project explores the use of Cohort Visualizer, a code-free analysis tool designed for researchers without a data science background, to facilitate accessible and efficient cohort analysis.

Project 2. Effect of antihypertensive medications (AHMs) on the risk of Alzheimer’s disease and related dementias (ADRD) in hypertensive patients with mild cognitive impairment (MCI). This study aims to validate the results of previous studies [32] suggesting that AHM use may be associated with a reduced risk of ADRD in hypertensive patients with MCI. Our main objective is to use real-world observational data to perform survival analyses that assess the relationship between AHM use and ADRD progression. Additionally, this study investigates potential drug-drug interactions between AHMs, statins, and metformin within the targeted patient cohort, providing further insight into pharmacological influences on dementia risk. This project serves as a simulation of traditional clinical research, using statistical analysis to evaluate real-world evidence.
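The text does not say which survival estimator was used. As one common starting point, the following is a minimal sketch of the Kaplan-Meier estimator in plain Python, run on made-up follow-up times; the function name and the data are assumptions for illustration, not the study's code or results.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each patient
    events:    1 if the outcome was observed, 0 if the patient was censored
    """
    event_counts = Counter(t for t, e in zip(durations, events) if e)
    exit_counts = Counter(durations)  # everyone leaves the risk set at their time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Toy follow-up data in years (not from the study).
durations = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

In practice, curves for AHM users and non-users would be estimated separately and compared, typically alongside a regression model that adjusts for confounders.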

Project 3. Building a prediction model for progression from mild cognitive impairment (MCI) to Alzheimer’s disease (AD) using EHR data and deep learning techniques. This project focuses on training and validating deep learning models [33] that use longitudinal EHR data to predict progression from MCI, which is considered a precursor to dementia [34], to AD. Specifically, we employ a Bidirectional Gated Recurrent Unit (BiGRU) deep learning model to predict MCI progression at various time intervals up to 5 years after diagnosis. Furthermore, this study aims to verify the generalizability of the model across diverse datasets and healthcare systems, ensuring its applicability in real-world clinical settings.
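To make the BiGRU idea concrete, here is a minimal NumPy sketch of the forward pass only: one GRU reads a patient's visit sequence in chronological order, a second reads it in reverse, and the two final hidden states are concatenated and mapped to one probability per prediction horizon. The layer sizes, the random untrained weights, the five yearly horizons, and all function names are assumptions for illustration; the study's actual architecture and training procedure are not described in this section.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(input_dim, hidden_dim, rng):
    """Random, untrained weights for one GRU direction (gates z, r, and candidate h)."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = rng.normal(0.0, 0.1, (input_dim, hidden_dim))
        p["U" + gate] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p["b" + gate] = np.zeros(hidden_dim)
    return p


def gru_forward(x_seq, p):
    """Run one GRU direction over x_seq of shape (T, input_dim); return the final state."""
    h = np.zeros(p["bz"].shape[0])
    for x in x_seq:
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])  # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])  # reset gate
        h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
        h = (1.0 - z) * h + z * h_cand
    return h


def bigru_predict(x_seq, fwd, bwd, w_out, b_out):
    """Concatenate forward and backward final states, then apply a sigmoid output layer."""
    h_f = gru_forward(x_seq, fwd)
    h_b = gru_forward(x_seq[::-1], bwd)  # same visits, reversed in time
    return sigmoid(np.concatenate([h_f, h_b]) @ w_out + b_out)


rng = np.random.default_rng(0)
input_dim, hidden_dim, n_horizons = 6, 8, 5  # assumed sizes
fwd = init_gru(input_dim, hidden_dim, rng)
bwd = init_gru(input_dim, hidden_dim, rng)
w_out = rng.normal(0.0, 0.1, (2 * hidden_dim, n_horizons))
b_out = np.zeros(n_horizons)

visits = rng.normal(size=(4, input_dim))  # 4 visits, 6 synthetic features each
probs = bigru_predict(visits, fwd, bwd, w_out, b_out)
print(probs.shape)
```

A real model would be trained with a deep learning framework such as PyTorch or TensorFlow; the hand-written loop is only meant to show how the two directions are combined.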

Project 4. Development of a deep learning model to predict major adverse cardiovascular events (MACE) after liver transplantation (LT). This project focuses on leveraging longitudinal EHR data to develop an advanced deep learning model on MCP to predict MACE after LT and to compare its performance with that of previously developed models based on medical claims data [35]. The model assists clinicians in risk stratification by identifying high-risk candidates and informs management strategies to improve transplant outcomes. Additionally, the model highlights important predictive features, allowing physicians to take targeted preventive measures to reduce the likelihood of adverse cardiovascular events. This study demonstrates the ability of MCP to facilitate the development of deep learning models for clinical research.
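Comparing two risk models requires a shared metric. The text does not name one, so the sketch below assumes the area under the ROC curve (AUROC), computed directly from its pairwise definition. The labels and scores are toy values invented for illustration and do not represent results from either model.

```python
def auroc(labels, scores):
    """Probability that a random positive case outscores a random negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 1 = event occurred, 0 = no event. Scores are made up.
labels = [1, 0, 1, 0, 0, 1]
model_a_scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6]
model_b_scores = [0.8, 0.5, 0.4, 0.6, 0.2, 0.7]

print(f"model A AUROC: {auroc(labels, model_a_scores):.3f}")
print(f"model B AUROC: {auroc(labels, model_b_scores):.3f}")
```

The pairwise form is quadratic in the number of patients, which is fine for a sketch; rank-based implementations in standard libraries scale better for large cohorts.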

Data collection and analysis approach

MCP tools played a key role in facilitating these projects by providing an integrated platform for cohort development, data extraction, and analysis. Specifically, Project 1 used the Cohort Visualizer to identify RCT candidates. All projects then used Jupyter notebooks to run SparkSQL API queries that extract EHR data from the MCP database. Finally, data analysis, including statistical evaluation and deep learning modeling, was performed within the workspace using R or Python.

Platform accessibility and reusability

MCP is a subscription-based, cloud-hosted research environment that is accessible to external users after registration and approval. Researchers, healthcare institutions, and industry partners can register to access MCP’s de-identified datasets and integrated tools by completing the required onboarding process. Once registered, users have access to the same standardized data, analytical tools, and secure computing environment described in this paper. The platform supports both open-source and proprietary components. Users can leverage open-source tools (Python, R, TensorFlow, PyTorch, etc.) within the MCP workspace, ensuring flexibility and reproducibility. This hybrid model fosters collaboration, scalability, and replicable research while maintaining robust privacy and security protections.


