A powerful new AI predicts how more than 1,000 diseases will unfold across a human lifetime, opening the door to precise prevention, policy planning, and bias-aware health care innovation.

Study: Learning the natural history of human disease with generative transformers. Image credit: song_about_summer/Shutterstock
In a recent study published in the journal Nature, researchers developed a machine learning model that uses large-scale health data to predict the progression of 1,256 distinct ICD-10 level 3 diseases from a patient's past medical history.
The model demonstrated predictive accuracy comparable to existing tools that analyze individual diseases. It also showed the potential to simulate future health trajectories up to 20 years ahead, providing insights into personalized health risks and comorbidities.
The need for complex disease models
Human disease progression includes periods of health, acute disease, and chronic disease, often manifesting as clusters of comorbidities affected by genetics, lifestyle, and socioeconomic factors.
Understanding these patterns is important for delivering personalized healthcare, offering lifestyle guidance, and implementing effective early screening programs. However, traditional algorithms are designed primarily for single diseases and cannot capture the complexity of more than 1,000 recognized health conditions.
This limitation is particularly important in aging populations, where the burden of diseases such as cancer, diabetes, cardiovascular disease, and dementia is expected to rise significantly over the coming decades. Accurate modeling of disease trajectories is therefore essential for both healthcare planning and economic policy.
Artificial intelligence, especially large language models (LLMs), offers a promising solution. These models excel at learning dependencies across sequences of data, a task analogous to predicting disease from previous health events.
Inspired by this analogy, researchers have developed transformer-based models to predict specific conditions, with encouraging early results. Despite these advances, truly comprehensive generative models that can simulate the full spectrum of multimorbidity over time had not been systematically evaluated.
Development of large-scale data models
The researchers created a transformer-based model, Delphi-2M, to predict lifelong disease trajectories. Unlike a language model, which processes words, Delphi-2M processes diagnostic codes from the 10th revision of the International Classification of Diseases (ICD-10), along with death, sex, BMI, and lifestyle factors such as smoking and alcohol use.
To fill gaps in the medical record, the team inserted artificial "no event" tokens. The resulting vocabulary comprised disease codes, lifestyle levels, sex, no-event, and padding tokens (around 1,270 in total).
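The gap-filling step can be illustrated with a minimal Python sketch. The token name `<no_event>`, the function name, and the fixed five-year spacing are assumptions made for illustration; the article does not say how often the artificial tokens were inserted.

```python
from typing import List, Tuple

NO_EVENT = "<no_event>"  # hypothetical token name, not taken from the study

Event = Tuple[float, str]  # (age in years, token)


def insert_no_event_tokens(events: List[Event], max_gap_years: float = 5.0) -> List[Event]:
    """Pad long gaps so consecutive tokens are at most max_gap_years apart.

    The fixed spacing is an illustrative assumption.
    """
    if not events:
        return []
    ordered = sorted(events, key=lambda e: e[0])
    padded = [ordered[0]]
    for age, token in ordered[1:]:
        prev_age = padded[-1][0]
        # Keep adding filler tokens until the remaining gap is short enough.
        while age - prev_age > max_gap_years:
            prev_age += max_gap_years
            padded.append((prev_age, NO_EVENT))
        padded.append((age, token))
    return padded


# Example: hypertension (I10) at 40, type 2 diabetes (E11) at 52.
trajectory = insert_no_event_tokens([(40.0, "I10"), (52.0, "E11")])
```

With these inputs, the 12-year gap receives filler tokens at ages 45 and 50, so the model sees explicit evidence that nothing was recorded in between.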
The model was trained on large-scale health records from the UK Biobank, with 402,799 participants used for training, 100,639 for validation, and 471,057 for longitudinal testing. To test generalizability, the model was also validated on data from 1.93 million individuals in Denmark.
Several modifications adapted the base GPT-2 model to health data: the positional encoding was replaced with a continuous encoding of age, an additional output head was added to predict the time to the next event, and the attention mask was changed so that tokens recorded at the same time cannot influence each other.
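Two of these modifications can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the published implementation: the sinusoidal form, the embedding size `dim`, the `max_period` constant, and both function names are hypothetical choices.

```python
import math
from typing import List


def age_encoding(age_years: float, dim: int = 8, max_period: float = 100.0) -> List[float]:
    """Encode a continuous age as sine/cosine pairs (dim must be even).

    Stands in for the integer position index used by a standard GPT-2 model.
    """
    half = dim // 2
    encoding = []
    for i in range(half):
        frequency = 1.0 / (max_period ** (i / half))
        encoding.append(math.sin(age_years * frequency))
        encoding.append(math.cos(age_years * frequency))
    return encoding


def attention_mask(ages: List[float]) -> List[List[bool]]:
    """mask[i][j] is True when token i may attend to token j.

    A token sees itself and strictly earlier tokens; tokens recorded at
    the same age are hidden from each other.
    """
    n = len(ages)
    return [[i == j or ages[j] < ages[i] for j in range(n)] for i in range(n)]


# Two diagnoses recorded at age 40 and one at age 55.
mask = attention_mask([40.0, 40.0, 55.0])
```

In this example the two tokens at age 40 cannot attend to each other, while the token at age 55 can attend to both.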
Delphi-2M can estimate the risk of more than 1,000 diseases, predict the timing of diagnoses, and simulate complete health trajectories. Hyperparameter tuning yielded a 2.2-million-parameter model that balances predictive accuracy with generative capacity, providing a new approach to modeling multi-disease, long-term health progression.
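One plausible way to generate a trajectory event by event is to treat the candidate events as competing exponential waiting times: each event gets a rate, a waiting time is drawn for each, and the earliest one happens next. The article does not describe the sampling mechanism, so this scheme, the function names, and the toy rates below are all assumptions for illustration.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

History = List[Tuple[float, str]]  # (age in years, token)


def sample_next_event(rates: Dict[str, float], rng: random.Random) -> Tuple[Optional[str], float]:
    """Draw one exponential waiting time per event; the shortest wins."""
    best_event, best_wait = None, math.inf
    for event, rate in rates.items():
        if rate <= 0:
            continue
        wait = rng.expovariate(rate)
        if wait < best_wait:
            best_event, best_wait = event, wait
    return best_event, best_wait


def simulate_trajectory(start_age: float,
                        rate_fn: Callable[[float, History], Dict[str, float]],
                        max_age: float,
                        rng: random.Random) -> History:
    """Sample events until death, max_age, or no events remain."""
    age, history = start_age, []
    while True:
        event, wait = sample_next_event(rate_fn(age, history), rng)
        if event is None or age + wait > max_age:
            break
        age += wait
        history.append((age, event))
        if event == "death":
            break
    return history


def toy_rates(age: float, history: History) -> Dict[str, float]:
    """Made-up yearly rates; a trained model would supply these instead."""
    seen = {token for _, token in history}
    rates = {"I10": 0.05, "E11": 0.02, "death": 0.001 * math.exp(0.08 * (age - 60))}
    return {token: rate for token, rate in rates.items() if token not in seen}


sampled = simulate_trajectory(60.0, toy_rates, 80.0, random.Random(0))
```

Repeating the simulation with different seeds gives a distribution of futures for the same starting history, which is the kind of output a population-level burden estimate would aggregate.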

a, Schematic of health trajectories based on ICD-10 diagnoses, lifestyle, and healthy padding tokens, each recorded at a distinct age. b, Training, validation, and test data derived from the UK Biobank (left) and the Danish disease registry (right). c, The Delphi model architecture. Red elements indicate changes relative to the underlying GPT-2 model; 'n×' denotes that the transformer block is applied n times in succession. d, Example model input (prompt) and output (samples), consisting of (age: token) pairs. e, Scaling laws of Delphi, showing the optimal validation loss as a function of model parameters for different training data sizes. f, Ablation results, measured as the difference in cross-entropy relative to an age- and sex-based baseline (y-axis) at different ages (x-axis). g, Accuracy of predicted time to event. Observed (y-axis) and expected (x-axis) times to event are shown for each next-token prediction (gray dots); the blue line shows the average across consecutive bins of the x-axis.
Evaluating model performance
Delphi-2M's performance was assessed using health data up to age 60 from 63,622 UK Biobank participants. The model generated simulated health trajectories, which were then compared with observed outcomes.
Predicted disease rates at ages 70 and 75 closely matched observed patterns, confirming the model's ability to capture population-level incidence trends. Prediction accuracy declined over longer time horizons, with the average AUC falling from approximately 0.76 to approximately 0.70 over ten years, but Delphi-2M still outperformed models based solely on age and sex.
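For readers unfamiliar with the metric, the AUC is the probability that a randomly chosen person who develops the disease received a higher predicted risk than a randomly chosen person who did not. A minimal sketch of that definition follows; the function name and the four example scores are illustrative and are not data from the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Predicted risks for four people; label 1 means the disease occurred.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a decline from about 0.76 to about 0.70 means the model still ranks risk better than chance a decade ahead.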
The model effectively distinguished risks between subgroups defined by lifestyle or previous illnesses, supporting the value of personalized risk profiling.
Importantly, Delphi-2M can also generate synthetic health trajectories that reflect real disease patterns without replicating individual records. A model trained solely on this synthetic data retained much of the original performance, showing only a three-point drop in AUC. This highlights potential applications in privacy-preserving research.
To interpret the predictions, the researchers examined the embedding space, which revealed disease clusters that matched ICD-10 chapters, and showed how specific diagnoses shaped outcomes, including the strong effect of pancreatic cancer on mortality.
External validation on Danish data confirmed generalizability, with only a slight drop in performance and an average AUC of about 0.67. Finally, the study acknowledged its limitations, including bias in the UK Biobank recruitment process and patterns of missing data.
Conclusion
This study introduced Delphi-2M, a GPT-based model that can predict and simulate the progression of multiple diseases over time. Across more than 1,000 conditions, Delphi-2M showed accuracy comparable to single-disease or biomarker-based models.
However, for diabetes risk it performed worse than the single marker HbA1c. When tested on Danish data, the approach showed only a slight drop in performance.
The ability to sample synthetic future trajectories enables long-term estimates of disease burden and the creation of privacy-preserving datasets. The model also highlighted patterns of comorbidity and the temporal effects of disease, including the persistently elevated risk of death from cancer, and achieved an AUC of approximately 0.97 for predicting death.
However, several limitations were noted. The predictions reflect biases in UK Biobank data, including healthy volunteer effects, recruitment bias, and patterns of missing data. Performance also differed across ancestry and socioeconomic groups. Importantly, the model captures statistical associations rather than causal relationships, which limits its direct clinical use.
Overall, Delphi-2M demonstrates the promise of transformer-based models for personalized risk prediction, healthcare planning, and biomedical research. Future versions could integrate multimodal data, support clinical decision-making, and inform policy development for aging populations.
Journal Reference:
- Shmatko, A., Jung, A., Gaurav, K., Brunak, S., Mortensen, L. H., Birney, E., Fitzgerald, T., Gerstung, M. Learning the natural history of human disease with generative transformers. Nature (2025). doi:10.1038/s41586-025-09529-3, https://www.nature.com/articles/S41586-025-09529-3
