Hershey, Pennsylvania - One of the challenges researchers face is accessing and analyzing big data and applying the modern artificial intelligence (AI) and machine learning (ML) methodologies needed to generate the critical data required to answer clinical research questions at scale. Until recently, large amounts of data from biomedical and health research, such as electronic health records (EHRs), were out of reach because there was no infrastructure that allowed researchers to interact with the data in a secure and seamless manner.
Vasant Honavar, co-leader of the Informatics Core at the Penn State Clinical and Translational Science Institute (CTSI), and his team have launched the Digital Collaboratory for Precision Health Research (DCPHR). DCPHR combines the efforts of CTSI’s Informatics Core and the Center for Artificial Intelligence Foundations and Scientific Applications. The Digital Collaboratory, along with the Institute for Computational and Data Sciences (ICDS), the Social Science Research Institute, and the Health Spoke of the National Science Foundation’s Northeast Big Data Hub, provides access to these large data sets through several discovery tools and gives researchers the AI and ML stacks they need to use these datasets properly.
A primary goal of DCPHR was to make the Penn State Health electronic health records (EHRs) available for data-intensive research. The data had to be standardized to conform to a common data model and to accommodate multicenter EHR-based studies. Additionally, the team needed to implement a basic infrastructure for AI/ML-enabled research, one that ensures that researchers’ access to and use of the data is policy-compliant, repeatable, scalable and shareable.
Wenke Hwang, co-leader of the Penn State CTSI Informatics Core, focuses on the development and curation of the Penn State Health EHR data, which adheres to the data standards of PCORnet’s Common Data Model. Since 2015, as part of a team funded by PCORI, he has worked extensively with research investigators and Penn State information technology teams to create a clinical research data repository called Path to Health data. The repository uses pseudo-identifiers in a HIPAA-compliant manner and splits all patient-level and visit-level data into multiple tables. Path to Health data meets national data standards and is harmonized with data from over 70 PCORnet clinical sites. The repository is updated regularly (currently bi-weekly), checked quarterly to ensure data quality, and used as the data infrastructure behind multiple successful multi-site grant applications.
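As an illustration only, the idea of pseudo-identifiers and separate patient-level and visit-level tables can be sketched in a few lines of Python. This is not the repository’s actual implementation: the keyed-hash approach, the key, the field names (`mrn`, `birth_year`, `visit_date`, `diagnosis`) and the sample rows are all assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical secret key (an assumption for this sketch); a real system
# would store it separately from the research data.
SECRET_KEY = b"example-key-not-for-production"


def pseudo_id(real_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable pseudo-identifier from a real identifier (HMAC-SHA256)."""
    digest = hmac.new(key, real_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def split_tables(raw_rows):
    """Split flat rows into a patient-level and a visit-level table.

    Both tables carry only the pseudo-identifier, never the real one.
    """
    patients = {}
    visits = []
    for row in raw_rows:
        pid = pseudo_id(row["mrn"])
        patients[pid] = {"patient_id": pid, "birth_year": row["birth_year"]}
        visits.append({
            "patient_id": pid,
            "visit_date": row["visit_date"],
            "diagnosis": row["diagnosis"],
        })
    return list(patients.values()), visits


# Made-up sample rows, for illustration only.
raw = [
    {"mrn": "A001", "birth_year": 1980, "visit_date": "2021-03-01", "diagnosis": "I10"},
    {"mrn": "A001", "birth_year": 1980, "visit_date": "2021-06-15", "diagnosis": "E11"},
    {"mrn": "B002", "birth_year": 1975, "visit_date": "2021-04-20", "diagnosis": "I10"},
]
patient_table, visit_table = split_tables(raw)
```

Because the same real identifier always maps to the same pseudo-identifier, visits can still be joined back to the right patient without the real identifier appearing in either table.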
Additionally, Hwang is committed to getting data into the hands of researchers in a timely and user-friendly manner. He expanded the research repository to include several important domains not routinely covered by common data models, such as pediatric data elements, mother-child pairs, and neighborhood traits. He has worked with researchers to use the repository for machine learning and predictive modeling in clinical and translational research. He has also developed processes for linking the Path to Health data repository with national mortality records, with future clinical data used for clinical decision support, and with non-AI/ML clinical research protocols such as medical chart reviews and patient recruitment.
Enabling access to large datasets
DCPHR currently offers access to data from two complementary sources. The first is an anonymized version of the Penn State Health EHR data, standardized using the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) developed by the NIH-funded Observational Health Data Sciences and Informatics (OHDSI) consortium. The OHDSI consortium links 3,266 collaborators from approximately 75 countries to OMOP-based EHR data repositories that collectively contain 928 million unique patient records (representing about 12% of the world’s population). Institutions that are members of the OHDSI consortium can propose multi-site studies using a defined research protocol and invite collaborators across the consortium to participate. Identical analyses are performed on each participating site’s data, and the results of those analyses are pooled to answer the research question the study addresses.
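The multi-site pattern described above, where each site runs the same analysis locally and only the results are pooled, can be sketched as follows. This is a minimal illustration under stated assumptions, not OHDSI tooling: the per-site analysis (a simple mean with its standard error), the fixed-effect inverse-variance pooling rule, the function names and the outcome values are all chosen for the example.

```python
import math
import statistics


def analyze_site(outcomes):
    """Run the shared analysis on one site's data.

    Returns (estimate, standard_error); only these summaries leave the site.
    """
    n = len(outcomes)
    estimate = statistics.fmean(outcomes)
    std_error = statistics.stdev(outcomes) / math.sqrt(n)
    return estimate, std_error


def pool_fixed_effect(site_results):
    """Pool site estimates with inverse-variance (fixed-effect) weights."""
    weights = [1.0 / se ** 2 for _, se in site_results]
    total_weight = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, site_results)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Made-up outcome values for two hypothetical sites.
site_a = [1.0, 2.0, 3.0, 4.0]
site_b = [2.0, 3.0, 4.0, 5.0]

results = [analyze_site(site_a), analyze_site(site_b)]
pooled_estimate, pooled_se = pool_fixed_effect(results)
```

Patient-level rows never leave a site in this sketch; the pooling step sees only one estimate and one standard error per site. A real study might instead use a random-effects model or another pooling method specified in its protocol.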
A second way to access EHR data is through TriNetX. With TriNetX, Penn State researchers can access Penn State Health EHR data, anonymized data from the TriNetX Research Network (from over 71 medical institutions), anonymized claims data from the Diamond Network (92 organizations), and, on request, anonymized data from the COVID-19 Research Network (from 78 additional organizations). Researchers can define study cohorts of interest by querying the TriNetX network based on medication, diagnosis, demographics, laboratory results, genomics, mortality, oncology, procedures, and more.
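Conceptually, defining a cohort means filtering patients on criteria such as those listed above. The sketch below shows that idea on a small in-memory list; it does not use or represent the TriNetX interface, and the `Patient` fields, the `define_cohort` function and the sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    age: int
    diagnoses: set = field(default_factory=set)
    medications: set = field(default_factory=set)


def define_cohort(patients, *, min_age=None, diagnosis=None, medication=None):
    """Return the patients who satisfy every criterion that was supplied."""
    cohort = []
    for p in patients:
        if min_age is not None and p.age < min_age:
            continue
        if diagnosis is not None and diagnosis not in p.diagnoses:
            continue
        if medication is not None and medication not in p.medications:
            continue
        cohort.append(p)
    return cohort


# Made-up records, for illustration only.
patients = [
    Patient("p1", 67, {"COVID-19", "PAD"}, {"aspirin"}),
    Patient("p2", 45, {"COVID-19"}, set()),
    Patient("p3", 72, {"PAD"}, {"aspirin"}),
]

covid_with_pad = define_cohort(
    [p for p in patients if "PAD" in p.diagnoses], diagnosis="COVID-19"
)
covid_over_60 = define_cohort(patients, min_age=60, diagnosis="COVID-19")
```

Criteria left as `None` are ignored, so the same function covers a query on a single attribute or on several attributes combined.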
“We needed a way to leverage the full expertise of Penn State to answer clinical research questions,” said Avnish Katoch, research informatics project manager at Penn State CTSI.
“DCPHR aims to significantly lower the barriers to collaboration between clinical and translational scientists at Penn State College of Medicine and data scientists and AI/ML professionals at University Park and other campuses in data-intensive, AI/ML-powered biomedicine,” said Honavar, Dorothy Foehr Huck and J. Lloyd Huck Chair in Biomedical Data Sciences and Artificial Intelligence and director of the Penn State Center for Artificial Intelligence Foundations and Scientific Applications.
CTSI Informatics Core Supports Data-Intensive Health Research Powered by AI
The CTSI Informatics Core empowers researchers in several ways. Researchers not only have access to Penn State’s OMOP instance for pilot studies and to the TriNetX system, but also receive:
- Assistance with study design and feasibility analysis
- Assistance with cohort definition and data extraction
- Data preparation support
- Support for analysis of large datasets (characterization, prediction, effect estimation)
- Support for model interpretation, investigation, deployment, and inference
- AI/ML support for research proposal writing
The CTSI Informatics Core partners with researchers to assist in all aspects of AI/ML-based analysis of EHR, billing, or other large clinical data sets.
“If you are a clinical researcher with a well-formed clinical question that you would like to answer using one of the datasets mentioned, we [the computational consulting team] are happy to collaborate. We have the high-performance computing infrastructure and the necessary software stacks in place. You can take data, ingest it, clean it, and run it through an AI/ML analytics pipeline,” said Justin Petucci, an R&D engineer on the ICDS Research Innovations with Scientists and Engineers team, which serves the CTSI Informatics Core.
Petucci has worked closely with Honavar, Katoch, and clinical researchers on several projects including:
- A multisite study of health disparities between different races using EHR data from 8 million US patients.
- Predicting mortality from cancer and noncancer causes using a cohort of over 1.4 million cancer patients from the US National Cancer Database.
- Predicting the 30-day clinical outcomes of COVID-19 patients with and without peripheral arterial disease (PAD) using a large cohort of patients from the TriNetX Research Network.
- Improving the accuracy of heart disease risk prediction using EHR data.
Advances in AI, along with the increased availability of large data sets, offer unprecedented opportunities to revolutionize biomedical and health research. “Realizing the potential of AI to improve individual and population health outcomes, inform health policy, and reduce health disparities will require cutting-edge data and computational infrastructure, advanced AI/ML expertise and tools, interdisciplinary collaboration between AI/ML experts and biomedical, clinical and health researchers and, ultimately, the training of a new generation of AI/ML-savvy clinicians and clinical researchers,” he added.
Looking ahead
The CTSI Informatics Core at Penn State is excited to move DCPHR out of pilot mode and into production to support Penn State researchers interested in:
- Data-intensive biomedical, clinical, and translational research powered by AI/ML.
- Integrating other data (e.g., sociodemographic data, environmental data, and ultimately genomic data) with EHR and claims data.
- Fostering an interdisciplinary biomedical AI community at Penn State through workshops and idea labs (e.g., in collaboration with CTSI, the Center for Artificial Intelligence Foundations and Scientific Applications, the Department of Clinical Informatics and the AI Initiative at the College of Medicine, and ICDS).
Jennifer Kraschnewski, director of Penn State CTSI, said: “Their significant work has leveraged our university-wide expertise to bring the power of AI and ML to our EHR opportunities, and take clinical and translational science into the future.”
Additional Information
For more information and to submit a request for research services through the Pennsylvania State University CTSI, please visit our website.
Researchers working with large datasets need artificial intelligence and machine learning. However, figuring out how best to access and interface with these huge databases can be difficult. Many research groups at Penn State work through the CTSI Informatics Core, leveraging data science methodologies to advance their research. To learn more about how the CTSI Informatics Core works, watch the replay of “Harnessing the Power of EHR Data and AI to Advance Biomedical Research.” It covers how and why current research groups have applied artificial intelligence to their research, gives examples, and explains how the computational consulting team can support data science projects.
