FIBRO: a machine learning risk score for advanced liver fibrosis in the general population using Israeli electronic health records

Electronic health record data

This work is based on EHR data from Clalit Health Services, Israel's largest healthcare provider¹⁸. The data include routine laboratory test results, diagnoses, and demographic information recorded from January 1, 2004 to December 31, 2020. The anonymized medical records constitute a complete clinical registry of members, including laboratory test results and diagnoses recorded as International Classification of Diseases, 9th Revision (ICD-9) codes¹⁹. The research protocol was approved by the Clalit Helsinki Committee (0195-17-COM2). Because this study is based on retrospective data, it was exempt from the requirement for written informed consent.

Retrospective study design

EHR data were split temporally using the rolling-origin-update methodology¹⁸, which defined three consecutive, non-overlapping follow-up periods: 2005–2010, 2010–2015, and 2015–2020. Each period began at an index date (T0) and spanned 5 years from it (Figure 1). Because members can be recorded in multiple periods, we define an observation as a pair of a member and an index date. For each period, a cohort was constructed, input data were extracted, and outcomes were followed up (Table S7).
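
The split can be illustrated with a minimal sketch. It assumes each index date falls on January 1 of its period's first year; the names `Observation`, `build_observations`, and `rolling_origin_split` are hypothetical. The sketch treats each (member, index date) pair as one observation and reserves the most recent period for validation.

```python
from dataclasses import dataclass
from datetime import date

# Assumed index dates (T0): January 1 of each period's first year.
INDEX_DATES = [date(2005, 1, 1), date(2010, 1, 1), date(2015, 1, 1)]


@dataclass(frozen=True)
class Observation:
    """One unit of analysis: a member evaluated at one index date."""
    member_id: str
    index_date: date


def build_observations(eligible_by_period):
    """Map {index date: set of eligible member IDs} to a flat list of observations.

    A member eligible at several index dates contributes several observations.
    """
    return [
        Observation(member_id, t0)
        for t0 in INDEX_DATES
        for member_id in sorted(eligible_by_period.get(t0, ()))
    ]


def rolling_origin_split(observations):
    """Train on the two earlier periods and validate on the most recent one."""
    validation_t0 = INDEX_DATES[-1]
    train = [o for o in observations if o.index_date < validation_t0]
    valid = [o for o in observations if o.index_date == validation_t0]
    return train, valid
```

Consider member A, eligible in all three periods, with B eligible only in the first and C only in the last. This gives five observations from three individuals: three in training and two in validation. The sample size therefore counts observations rather than people.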

Eligibility was determined at each index date (Table S4), and only members aged 40–75 years on the index date were included. Specifically, valid hemoglobin (HB), platelet (PLT), and white blood cell (WBC) results were required within one year of T0. These three laboratory tests served as minimal criteria, indicating that the individual had undergone a complete blood count. We excluded individuals with previously known liver cirrhosis or any of the predefined exclusion diagnoses (Table S8). A population-based sample was obtained, as all eligible individuals in the Clalit membership database were included.
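
The eligibility rule can be expressed as a simple predicate. This is a minimal sketch under several assumptions: "within one year of T0" is read as the 365 days up to and including T0, the 40–75 age range is inclusive, and `Member` and `is_eligible` are hypothetical names rather than part of the study's pipeline.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Assumption: "within one year of T0" means the 365 days up to and including T0.
LOOKBACK = timedelta(days=365)
REQUIRED_TESTS = ("HB", "PLT", "WBC")  # minimal complete-blood-count criterion
MIN_AGE, MAX_AGE = 40, 75               # inclusive age range at the index date


@dataclass
class Member:
    birth_date: date
    # test name -> dates on which a valid result was recorded
    lab_dates: dict = field(default_factory=dict)
    # dates of liver cirrhosis or any predefined exclusion diagnosis (Table S8)
    exclusion_dx_dates: list = field(default_factory=list)


def age_at(birth_date, t0):
    """Age in completed years on date t0."""
    had_birthday = (t0.month, t0.day) >= (birth_date.month, birth_date.day)
    return t0.year - birth_date.year - (0 if had_birthday else 1)


def is_eligible(member, t0):
    if not MIN_AGE <= age_at(member.birth_date, t0) <= MAX_AGE:
        return False
    for test in REQUIRED_TESTS:
        # At least one valid result of each CBC component in the lookback window.
        if not any(t0 - LOOKBACK <= d <= t0 for d in member.lab_dates.get(test, [])):
            return False
    # Exclude previously known cirrhosis or exclusion diagnoses.
    return not any(d <= t0 for d in member.exclusion_dx_dates)
```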

For each period, the input data were defined as gender and age on the index date, together with the latest results of the following laboratory tests: HB, PLT, WBC, aspartate aminotransferase (AST), alanine aminotransferase (ALT), albumin, bilirubin, prothrombin time international normalized ratio (PT-INR), vitamin B12, glucose, hemoglobin A1c, cholesterol, HDL cholesterol, LDL cholesterol, triglycerides, and total protein. Only the most recent result of each test was considered, and only if it was taken within one year before the index date and fell within predefined thresholds (Table S9). The thresholds excluded unrealistic values (e.g., physiologically impossible albumin levels) while retaining extreme values that may indicate liver dysfunction. No subgroup-specific data quality assessments were performed. However, all preprocessing procedures and thresholds were applied uniformly to all individuals, regardless of age, gender, or other demographic characteristics. A known limitation of EHR-based research is potential data incompleteness. However, Clalit integrates both care delivery and insurance and is the primary dispenser of medications under national subsidies. Additionally, health plan membership is stable throughout life, and transitions between providers are rare. These factors contribute to a relatively high level of data integrity in this population (Figure 2).
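
The feature extraction step can be sketched as a function that keeps the most recent in-range result in the one-year window before T0. The ranges in `THRESHOLDS` are hypothetical placeholders, not the values in Table S9. When no result qualifies, the function returns `None`, so the value stays missing rather than being imputed. `days_since` shows one way the "days from latest blood test result" feature could be derived from such a result.

```python
from datetime import date, timedelta

LOOKBACK = timedelta(days=365)

# Hypothetical plausibility ranges for illustration only; the study's actual
# thresholds are listed in Table S9 and are not reproduced here.
THRESHOLDS = {
    "ALBUMIN": (0.5, 7.0),   # g/dL, assumed
    "PLT": (1.0, 2000.0),    # 10^9/L, assumed
}


def latest_valid_result(results, test, t0, lookback=LOOKBACK):
    """Return the most recent (date, value) for `test` that lies inside the
    plausibility range and was taken in the lookback window before t0.

    Returns None when no such result exists; the value is then left missing
    rather than imputed.
    """
    low, high = THRESHOLDS[test]
    window = [
        (d, v) for d, v in results
        if t0 - lookback <= d <= t0 and low <= v <= high
    ]
    return max(window, key=lambda dv: dv[0]) if window else None


def days_since(result, t0):
    """Days between a (date, value) result and the index date, or None."""
    return None if result is None else (t0 - result[0]).days
```

For example, an out-of-range albumin value recorded after a valid one is skipped, and the earlier valid value is used instead.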

In this study, the sample size was determined by summing the number of individuals who met the inclusion and exclusion criteria at each index date. Note that, in this context, some individuals may contribute to the sample size at multiple index dates. The final sample size based on these eligibility criteria was 2,255,580 observations. No formal power calculations were performed; however, the large sample size of over 2 million observations provides robust statistical power for model training and validation.

Follow-up periods were kept non-overlapping to avoid information leakage, since overlapping training data could contain information that is not available at the time of validation. Individual follow-up ended with either an observed outcome event (a liver cirrhosis diagnosis, Table S10) or a right-censoring event. Right censoring means that a liver cirrhosis diagnosis has not occurred, but the individual can no longer be observed. Right-censoring events were an exclusion diagnosis (Table S8), death, disenrollment from Clalit, or the end of the five-year follow-up period.
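
The follow-up rule can be sketched as a function that turns event dates into the (duration, event indicator) pair that survival models consume. It is a minimal sketch with a hypothetical name. It assumes that a diagnosis falling on the same day as a censoring event counts as an observed event, and that T0 is never February 29.

```python
from datetime import date


def follow_up(t0, cirrhosis_dx=None, censor_dates=()):
    """Return (duration_days, event) for one observation starting at index date t0.

    cirrhosis_dx: date of the first liver cirrhosis diagnosis (Table S10), or None.
    censor_dates: dates of right-censoring events such as an exclusion diagnosis,
    death, or leaving the provider. The end of the 5-year period is always added.
    Assumes t0 is not February 29 and that no event precedes t0 (excluded upstream).
    """
    end_of_period = t0.replace(year=t0.year + 5)
    censor = min([end_of_period, *censor_dates])
    if cirrhosis_dx is not None and cirrhosis_dx <= censor:
        # Observed outcome: follow-up ends at the diagnosis (ties count as events).
        return (cirrhosis_dx - t0).days, 1
    # Right-censored: no diagnosis observed before follow-up ended.
    return (censor - t0).days, 0
```

Death here is simply one more censoring date, consistent with treating it as right censoring rather than as a competing risk.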

Although censoring at an exclusion diagnosis cannot be assumed to be non-informative, the goal of this work is to help identify individuals who are currently undiagnosed and at risk of liver cirrhosis, rather than to support a follow-up program. Therefore, patients who are already under ongoing clinical monitoring, such as hepatitis C carriers (Table S8), leave the target population and are censored at the time of their diagnosis.

Treatment data were neither used as model inputs nor explicitly tracked during model development or evaluation. Individuals with a previous diagnosis of advanced liver disease or of conditions that may require treatment (chronic liver disease and cirrhosis) were excluded using ICD-9-based exclusion criteria (Tables S4 and S5), thereby indirectly accounting for treatment exposure.

Model training and evaluation

We constructed a machine learning model to predict the hazard of a liver cirrhosis diagnosis based on the following features: HB, PLT, WBC, AST, ALT, albumin, bilirubin, PT-INR, vitamin B12, glucose, hemoglobin A1c, cholesterol, HDL cholesterol, LDL cholesterol, triglycerides, total protein, days from the latest blood test result to the index date, age, and gender. All laboratory values were included as continuous measures, with biologically implausible values excluded based on predefined thresholds (Table S9). Age (years) was treated as a continuous variable and gender as a binary indicator. We trained gradient-boosted survival regression models on the two earlier periods and evaluated their predictions in the most recent period, using XGBoost²¹ with objective="survival:cox", n_estimators=100, and base_score=1. Survival analysis models predict the time to first diagnosis rather than only whether a diagnosis occurs within the follow-up period. This makes them more appropriate than binary classifiers for prioritizing individuals in the general population. The model was trained using time-fixed covariates derived from baseline (index date) measurements, and the proportional hazards assumption was not formally evaluated. However, gradient-boosted Cox models allow flexible modeling of nonlinear effects without assuming strict proportional hazards. Survival models also use information from individuals who did not complete the follow-up period and had no observed diagnosis. Furthermore, unlike linear regression models, gradient-boosted decision trees with sparsity-aware split-finding algorithms²⁰ handle missing data without imputation and can capture higher-order nonlinear interactions between variables. Therefore, no imputation was applied, and missing values were handled internally by the model. In addition, because tree-based models were used, no transformation or rescaling was applied to the data.
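
The model specification above can be sketched with XGBoost's scikit-learn wrapper. This is a minimal sketch, not the authors' code. `cox_labels` and `train_cox_model` are hypothetical helper names. The label encoding follows XGBoost's documented convention for "survival:cox", where positive values are event times and negative values are censoring times. xgboost is imported inside the training function so that the label encoding can be used on its own.

```python
def cox_labels(durations, events):
    """Encode survival targets for XGBoost's "survival:cox" objective:
    a positive label is the time of an observed event and a negative label
    is the time of right censoring."""
    labels = []
    for t, e in zip(durations, events):
        if t <= 0:
            # A zero duration cannot carry the censoring sign.
            raise ValueError("durations must be positive")
        labels.append(float(t) if e else -float(t))
    return labels


def train_cox_model(X, durations, events):
    """Fit a gradient-boosted Cox model with the settings reported in the text.

    X may contain NaN for missing laboratory values; XGBoost learns a default
    direction for them at each split, so no imputation or rescaling is done.
    """
    import xgboost as xgb  # imported here so cox_labels runs without xgboost

    model = xgb.XGBRegressor(
        objective="survival:cox", n_estimators=100, base_score=1
    )
    model.fit(X, cox_labels(durations, events))
    # For survival:cox, predict() returns hazard ratios, i.e. exp(margin).
    return model
```

Because predictions are hazard ratios, they are suited to ranking individuals by relative risk. They are not calibrated probabilities of diagnosis within a fixed time.
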
Predictive performance was assessed using cumulative/dynamic (C/D) time-dependent ROC curves at annual time points, with the corresponding AUCs summarizing performance in each year. All metrics were compared between the training and validation periods, and performance on unseen data was compared with that of the alternative FIB-4 score.
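
A cumulative/dynamic AUC at horizon t compares cases (event observed by t) with controls (event-free beyond t). The sketch below is a simplified, unweighted version with hypothetical function names. Estimators such as scikit-survival's `cumulative_dynamic_auc` additionally apply inverse-probability-of-censoring weights (IPCW). The source does not state which estimator was used, and this quadratic-time loop is meant only for illustration, not for millions of observations.

```python
def cumulative_dynamic_auc(risk, durations, events, t):
    """Unweighted cumulative/dynamic AUC at horizon t.

    Cases: event observed at or before t. Controls: still under follow-up after t.
    Individuals censored before t are neither; an IPCW estimator would instead
    reweight cases by the inverse probability of remaining uncensored.
    """
    cases = [r for r, d, e in zip(risk, durations, events) if d <= t and e]
    controls = [r for r, d in zip(risk, durations) if d > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at horizon t")
    score = 0.0
    for rc in cases:
        for rn in controls:
            # A case ranked above a control is concordant; ties count half.
            score += 1.0 if rc > rn else (0.5 if rc == rn else 0.0)
    return score / (len(cases) * len(controls))


def annual_aucs(risk, durations_days, events, years=range(1, 6)):
    """C/D AUC at the end of each follow-up year (365-day years assumed)."""
    return {y: cumulative_dynamic_auc(risk, durations_days, events, 365 * y) for y in years}
```

The same function can score both the model's hazard ratios and FIB-4 values on the validation period, which gives a like-for-like comparison at each annual horizon.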

We did not examine or adjust for heterogeneity in model performance or parameter estimates across clusters such as geographical areas and healthcare sites. All training data were obtained from a single national healthcare provider (Clalit) using uniform coding and data standards. Therefore, no clustering structure was imposed during model development, and the models were trained and validated on a single pooled cohort. The outcome of interest (liver cirrhosis diagnosis) is relatively rare in the data set, but no class imbalance correction methods (e.g., reweighting or resampling) were applied. Because a Cox survival objective was used, the algorithm directly incorporated time-to-event and censoring information, reducing the need for binary classification thresholds or label balancing. Death before a cirrhosis diagnosis was treated as a right-censoring event rather than as a competing risk, consistent with the assumptions of standard Cox models. Therefore, competing risks were not explicitly modeled.

Following the retrospective evaluation, new models with the same parameters were trained on all three EHR periods to generate predictions for the prospective cohort. This was based on the assumption that the most recent five years of data, previously used only for validation, could contain underlying information affecting model performance (e.g., COVID-19 or regulatory changes).

Prospective clinical cohort

An external prospective validation was performed in a clinical cohort in the Afula district (Figure 3). The latest diagnoses, laboratory test results, and demographic data were extracted on December 8, 2021, and comprised 90,136 individuals who met all eligibility criteria. Unlike in the retrospective cohort, the most recent laboratory test results were used in the prospective cohort regardless of their date. Predicted risk was calculated for all individuals using the EHR-based model, and FIB-4 scores were calculated where AST and ALT levels were available. The highest-risk individuals were selected for invitation at a 3:1 ratio (FIBRO prediction:FIB-4), with no overlap between the groups. This ensured that each invited individual was attributable to a single risk score. Risk scores were concealed throughout the clinical study, ensuring a double-blind design. All selected individuals were contacted and invited for a hepatology consultation and non-invasive fibrosis testing, and morbidity was recorded.

Recruitment began on April 19, 2022 and concluded on August 24, 2022. Participants who attended the clinic underwent transient elastography (TE), had their height and weight measured, and completed the AUDIT questionnaire. Liver stiffness was measured in kPa, steatosis grade was assessed via the controlled attenuation parameter (CAP) score (results in Table S6), and advanced liver fibrosis was diagnosed at a liver stiffness >12 kPa. All authors had access to the study data and reviewed and approved the final manuscript.
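
The prospective scoring and selection steps can be sketched as follows. `fib4` implements the standard FIB-4 formula, and `is_advanced_fibrosis` applies the >12 kPa cut-off from the text. The 3:1 allocation in `select_invitees` is a hypothetical procedure, since the exact method is not described. It alternates rounds of three picks from the FIBRO ranking and one from the FIB-4 ranking, skipping anyone already chosen, so that the groups do not overlap.

```python
import math


def fib4(age, ast, alt, plt):
    """FIB-4 = age [years] x AST [U/L] / (platelets [10^9/L] x sqrt(ALT [U/L])).

    Returns None when an input is missing, so FIB-4 is scored only where available.
    """
    if None in (age, ast, alt, plt) or alt <= 0 or plt <= 0:
        return None
    return age * ast / (plt * math.sqrt(alt))


def is_advanced_fibrosis(liver_stiffness_kpa):
    """Advanced fibrosis as defined in the text: liver stiffness > 12 kPa."""
    return liver_stiffness_kpa > 12


def _next_unselected(ranking, chosen):
    """Advance a ranking iterator to the next person not yet selected."""
    for pid in ranking:
        if pid not in chosen:
            return pid
    return None


def select_invitees(fibro_risk, fib4_scores, n_total, ratio=(3, 1)):
    """Hypothetical allocation: alternate rounds of ratio[0] picks from the FIBRO
    ranking and ratio[1] picks from the FIB-4 ranking, skipping anyone already
    chosen, so each invitee is attributed to exactly one score."""
    fibro_rank = sorted(fibro_risk, key=fibro_risk.get, reverse=True)
    fib4_rank = sorted(
        (pid for pid, s in fib4_scores.items() if s is not None),
        key=fib4_scores.get, reverse=True,
    )
    sources = [("FIBRO", iter(fibro_rank), ratio[0]), ("FIB-4", iter(fib4_rank), ratio[1])]
    chosen = {}
    while len(chosen) < n_total:
        added = 0
        for label, ranking, k in sources:
            for _ in range(k):
                if len(chosen) >= n_total:
                    break
                pid = _next_unselected(ranking, chosen)
                if pid is None:
                    break
                chosen[pid] = label
                added += 1
        if added == 0:  # both rankings exhausted
            break
    return chosen
```

Because the FIBRO ranking picks first in each round, an individual ranked highly by both scores is attributed to FIBRO. The FIB-4 group then continues with the next highest FIB-4 score not yet selected.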