Machine learning predicts lifespan and suggests underlying causes of death in aging C. elegans



Life-extending treatments differentially affect pathologies of aging

For ML analysis of the relationship between age-related pathology and lifespan, we first compiled a large dataset spanning a wide range of interventions. This included both previously published data and data newly generated for this study. For a listing of all data and sources, see Supplementary Data 1 and Supplementary Table 2. The data comprised pathology severity measurements and lifespan estimates across 47 different genetic or environmental conditions. Also included were data from several species in the Caenorhabditis and Pristionchus genera; in these genera, hermaphrodites show higher levels of senescent pathology and are shorter lived than females of gonochoristic sibling species7.

For data gathering, Nomarski microscopy was used to track the development of naturally occurring senescent pathologies. These can be readily assayed in vivo thanks to the optically transparent C. elegans cuticle. This approach builds upon earlier work9,10,12,15,16,17,18,19,20 characterizing, in particular, five prominent age-related pathologies that occur in aging wild-type C. elegans. These are deterioration of the pharynx, atrophy of the intestine, accumulation of yolk-rich pseudocoelomic lipoprotein pools (PLPs), atrophy and fragmentation of the distal gonad, and development of teratoma-like uterine tumors (Fig. 1b, c, Supplementary Fig. 1a, b, Supplementary Table 1)7,12,15,20,21.

Treatments for which new data was gathered included the following (Fig. 1d): culture on axenic medium plates, a putative dietary restriction treatment (Supplementary Fig. 2); diverse manipulations of insulin/IGF-1 signaling (IIS), affecting the daf-2 insulin/IGF-1 receptor and the daf-16 FOXO transcription factor (Supplementary Figs. 3–5); and manipulations of hsf-1 signaling (Supplementary Fig. 4f–j), protein synthesis, mitochondrial function (Supplementary Fig. 6) and the mTOR pathway (Supplementary Fig. 7). Pathology measurements were performed by sampling 10 animals from each population at multiple ages. Also included was a dataset from a cohort of wild-type animals in which individuals were tracked throughout life, with both pathology and lifespan measured for each animal. This allowed ML models to be tested in the absence of possible bias introduced by population sampling.

As expected, interventions varied in the severity of their effects on age-related pathology. Notably, not only did the degree of suppression differ between pathologies, but the relative degree of suppression also varied between treatments (Fig. 1d, Supplementary Figs. 3–7). Greater effects were most often seen on the intestine and the pharynx, with weaker effects on uterine tumors. Such variation provides a good basis for applying ML to investigate how age-related pathologies affect lifespan in C. elegans. For further discussion of observed effects on pathology, see Supplementary Discussion.

Differential correlations between pathologies and metrics of declining health

Senescence involves diverse changes that are pathological insofar as they disrupt biological function and, in some cases, contribute to late-life mortality. The age-related changes in anatomy documented here are clearly degenerative (organ atrophy, hypertrophy, and structural deterioration), which strongly suggests that they are pathological in nature15,22. Further possible criteria for identifying them as pathological are that they contribute to age-related decline in health and to late-life mortality. We first examined the former criterion.

As C. elegans hermaphrodites grow older, they exhibit various behavioral impairments affecting, among other things, defecation, locomotion, and pharyngeal pumping. Age changes in these three health-related parameters were measured, and the patterns of decline seen were comparable to those previously described23,24,25. Fig. 2a–c displays comparisons of age changes in three pairs of health metrics and pathologies in wild-type animals under standard culture conditions, while Fig. 2d shows a correlation matrix between health and pathology metrics.

Fig. 2: Variation in age-related pathology with health metrics.

Health metric declines in early-mid adulthood. Selected comparisons of fits (extra sum-of-squares F test). a Close correspondence of decline in defecation with intestinal atrophy, and (b) of decline in movement (body bends) with uterine tumor development. c Loss of pharyngeal pumping occurs later than the main period of pharyngeal pathology development. a–c: 2 trials (n = 10/trial); error bars show S.E.M. d Correlation matrix of the different pathologies measured and decline in health metric (analysis of raw data). Pearson method. e Broad correlation between pathology and lifespan. Blue symbols, population data; red triangles, individual data. Pathology progression through time is converted into a gradient and then transformed into Z-scores (which describe a value’s relationship to the mean of a group of values) for standardization. This allows comparison between different pathologies by normalizing levels of a given pathology to the average level of that pathology in the group (represented by a Z-score of 0, with a Z-score of +2 representing the maximum pathology severity in the group, and −2 the healthiest animals). Intestine scores were normalized to the respective mean intestine score on day 1 prior to being converted to a gradient, such that the normalized intestinal pathology scores represent the change in percentage of intestinal volume. This takes into account differences in terms of the ratio of intestinal width to whole body width between different treatments and species. The average of all 5 pathology Z-scores is shown (x-axis) plotted against the real observed lifespan (y-axis).

Consider first the decline in defecation rate. This proved to be correlated most strongly with intestinal atrophy, as well as with pharyngeal deterioration and gonad atrophy (Fig. 2a, d). One plausible explanation is that intestinal atrophy impairs intestinal function, including defecation. Decline in locomotion proved to be most strongly correlated with uterine tumor development (Fig. 2b, d). Tumors frequently attain a great size, such that they fill much of the body cavity in the mid-body region; one possibility is that this impedes body bends, either by increasing mid-body rigidity or because the tumor presses against body wall muscles. These two examples are at least consistent with the view that intestinal atrophy and uterine tumor formation are pathological processes.

By contrast, pharyngeal pumping declined some time after the appearance of pharyngeal pathology (Fig. 2c). This could imply that age changes in the neuronal control of pumping, rather than organ degeneration, underlie the age-related decline in feeding rate. Nonetheless, the co-occurrence of mid-life anatomical changes with declines in movement and defecation provides further support for designating these changes as pathological.
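Fig. 2d is a Pearson correlation matrix computed on raw health and pathology data. The sketch below shows, under stated assumptions, how such a matrix can be computed with NumPy; the variables, effect sizes and sample counts are hypothetical synthetic values chosen for illustration, not the study's measurements.

```python
import numpy as np

# Hypothetical, synthetic measurements (illustration only, not study data):
# one health metric and two pathology scores recorded at matched ages.
rng = np.random.default_rng(0)
age = np.repeat(np.arange(1, 12), 10)  # days 1-11, 10 animals per day
intestinal_atrophy = 2.0 * age + rng.normal(0, 2, age.size)
uterine_tumor = 1.5 * age + rng.normal(0, 4, age.size)
defecation_rate = 60 - 0.8 * intestinal_atrophy + rng.normal(0, 3, age.size)

labels = ["defecation", "intestine", "tumor"]
data = np.vstack([defecation_rate, intestinal_atrophy, uterine_tumor])

# np.corrcoef treats each row as one variable and returns the matrix of
# pairwise Pearson correlation coefficients.
r = np.corrcoef(data)
for name, row in zip(labels, r):
    print(f"{name:11s}", np.round(row, 2))
```

Pearson's r captures only linear pairwise association, which is why the study complements it with the multivariate models described below.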

Negative correlation between age-related pathology development rate and lifespan

Next, we investigated the relationship between age-related pathology and lifespan. To begin with, we analyzed the correlation between lifespan and the average severity of all five pathologies across our datasets (Supplementary Table 2). As a metric of pathology severity, we used the rate of pathology progression. To obtain this rate, all raw data were first transformed into Z-scores (which describe a value’s relationship to the mean of a group of values). This allowed pathologies to be compared with one another: the standard scoring systems used in the literature (and thus employed in this work) differ from one another, so without this correction the scores are not directly comparable. It should be stressed that such comparisons are only an approximation. Given the complexity of biological systems, it is currently not feasible to determine the full systemic effects of a specific pathology at a given level.
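The Fig. 2e caption gives the order of operations: pathology progression is converted to a gradient, each pathology's gradients are Z-scored across conditions, and the five Z-scores are averaged. A minimal sketch of that pipeline, assuming the gradient is the least-squares slope of score against age (the exact fitting method is not specified here) and using hypothetical scores for two pathologies in three conditions:

```python
import numpy as np

def progression_rates(ages, scores):
    """Least-squares slope (score units per day) for each condition.

    scores has shape (n_conditions, n_ages). Using a linear fit as the
    'gradient' is an assumption of this sketch.
    """
    # np.polyfit accepts a 2-D y with one column per dataset, so transpose.
    slopes, _intercepts = np.polyfit(ages, scores.T, deg=1)
    return slopes

def zscore(x):
    """Standardize to mean 0 and (population) standard deviation 1."""
    return (x - x.mean()) / x.std()

ages = np.array([1, 4, 7, 11], dtype=float)
# Hypothetical scores for 3 conditions x 4 ages, for two pathologies scored
# on different scales (e.g. a 1-5 grade vs a % change in volume).
pharynx = np.array([[1, 2, 3, 4], [1, 1.5, 2, 2.5], [1, 3, 4, 5]], dtype=float)
intestine = np.array([[0, 10, 20, 30], [0, 5, 10, 15], [0, 15, 30, 45]], dtype=float)

# Z-score each pathology's rates across conditions, then average per condition.
z = np.vstack([zscore(progression_rates(ages, p)) for p in (pharynx, intestine)])
composite = z.mean(axis=0)
print(np.round(composite, 2))
```

Because each pathology is standardized separately, the differing score scales no longer dominate the average; the caveat in the text, that this is only an approximation, still applies.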

Following Z-score transformation, a negative correlation between mean pathology development rate and lifespan was observed, as one might expect (R² = 0.5; Fig. 2e). This relationship was observed across all treatments, both in C. elegans and in the other nematode species used in this work. Z-score transformation is appropriate in this case, given the synchronous nature of pathology development in C. elegans (Fig. 1c). The negative correlation is consistent with the possibility that age-related pathology contributes to late-life mortality. However, this analysis is only an approximation, and it cannot determine whether all pathologies contribute equally to survival or whether a subset is the predominant determinant of lifespan. Given this, and the complexity of biological interactions, ML was applied to explore the question further.
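The R² reported for Fig. 2e can be illustrated with an ordinary least-squares fit of lifespan on the composite Z-score. The values below are hypothetical; for a straight-line fit with an intercept, R² equals the square of the Pearson correlation coefficient, which provides a useful sanity check.

```python
import numpy as np

def r_squared(x, y):
    """Slope and coefficient of determination of an OLS line y ~ x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation
    return slope, 1.0 - ss_res / ss_tot

# Hypothetical composite pathology Z-scores and lifespans (days), one per condition.
composite_z = np.array([-1.2, -0.6, -0.1, 0.3, 0.8, 1.1])
lifespan = np.array([30.0, 22.0, 21.0, 16.0, 15.0, 11.0])

slope, r2 = r_squared(composite_z, lifespan)
print(f"slope = {slope:.2f} days per Z unit, R^2 = {r2:.2f}")
```

A negative slope corresponds to faster average pathology progression being associated with shorter lifespan.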

Use of machine learning to identify potential life-limiting pathologies

In line with our predictions, these results underscore how different life-extending treatments differentially suppress age-related pathologies (Fig. 1d, Supplementary Figs. 3–7). This raises the possibility that lifespan is affected by a combination of pathologies, acting either independently or interdependently. The latter would mean that the effect of one pathology on lifespan may be influenced by one or more other pathologies; such interactions could, in principle, take any form, from linear to non-linear.

To investigate the relationship between pathology severity and lifespan, we employed an ML-based approach. To obtain a dataset large enough for ML analysis, we combined data generated in this study with data from previous studies, from our lab and others, as described above (Supplementary Table 2). The dataset included a total of 434 observations across 23 conditions and 8 different species, up to day 11 of adulthood, by which age pathologies have usually reached maximum severity15.

First, we randomly split the data into two non-overlapping groups: (i) 80% of the data, used to build (train) the models; and (ii) the remaining 20%, used to validate them. We trained several standard models (linear regression, ElasticNet regression, Support Vector Machine [SVM], random forest [RF], and multilayer perceptron [MLP]) on the 80% training set (raw pathology scores and corresponding lifespan data). We then evaluated the models on the unseen 20% test set to identify which best predicts lifespan from pathology at all ages assayed. The RF model outperformed the others (R² = 0.57; mean absolute error [MAE] of predicted lifespan: 4.0 days) (Supplementary Fig. 8a).
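A sketch of this 80/20 model comparison with scikit-learn, using synthetic data in place of the real pathology and lifespan measurements. The model settings (number of trees, ElasticNet penalty, MLP size, scaling pipelines) are assumptions for illustration, not the hyperparameters used in the study, so the ranking obtained here need not match Fig. 3a.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data: 5 pathology scores per sample and a
# lifespan (days) that depends on them non-linearly, plus noise.
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(400, 5))
y = 30 - 12 * X[:, 0] ** 2 - 6 * X[:, 1] - 3 * X[:, 0] * X[:, 3] + rng.normal(0, 1.5, 400)

# 80% for training; the remaining, non-overlapping 20% is held out.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Linreg": LinearRegression(),
    "ElasticNet": make_pipeline(StandardScaler(), ElasticNet(alpha=0.1)),
    "SVM": make_pipeline(StandardScaler(), SVR()),
    "RF": RandomForestRegressor(n_estimators=300, random_state=0),
    "MLP": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)  # evaluate only on unseen samples
    scores[name] = (r2_score(y_test, pred), mean_absolute_error(y_test, pred))
    print(f"{name:10s} R^2 = {scores[name][0]:.2f}  MAE = {scores[name][1]:.2f} days")
```

Scale-sensitive models (ElasticNet, SVM, MLP) are wrapped with a scaler here, whereas tree ensembles such as RF do not need one; this is a common convention rather than a detail reported in the text.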

Next, we needed to account for the fact that pathologies progress with time, so that pathology values at different time points are not independent of one another. In other words, we needed a means to capture the incremental progression of pathology from one time point to the next. To this end, differences in scores through time were fed into the models (i.e., the scores at the previous time point were subtracted from those at each time point), and the models were trained and evaluated as described above. Again, the RF model outperformed the others, yielding an R² of 0.79 and an MAE of 2.77 days for scores measured up to day 11 (Fig. 3a). In other words, the five age-related pathologies measured here accounted for 79% of the variation in lifespan among held-out samples drawn from diverse contexts. This was despite the high diversity in lifespan of the populations studied, which ranged from 7.5 to 40 days (mean: 18.3 days; first quartile: 13 days; Fig. 3b).
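The differencing step can be sketched as follows, assuming each sample's scores are stored as a row ordered by age. How the first time point was handled is not stated in the text; here it is kept as a difference from zero, which is an assumption of this sketch. For several pathologies, the increments of each would be concatenated into one feature vector per sample.

```python
import numpy as np

def to_increments(scores):
    """Replace each score with its change since the previous time point.

    scores: array of shape (n_samples, n_timepoints), ordered by age.
    The first time point has no predecessor; it is differenced against 0
    (i.e. kept unchanged), an assumption of this sketch.
    """
    scores = np.asarray(scores, dtype=float)
    return np.diff(scores, axis=1, prepend=0.0)

# One hypothetical sample: pharynx scores on days 1, 4, 7, 9 and 11.
pharynx = np.array([[1.0, 1.5, 2.5, 3.5, 4.0]])
increments = to_increments(pharynx)
print(increments)
```

The transform is lossless: a cumulative sum along the time axis recovers the original scores, so the models receive the same information re-expressed as per-interval progression.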

Fig. 3: Machine learning can predict lifespan from age-related pathology.

a Selection of a random forest (RF) machine learning (ML) algorithm as the best predictor of lifespan from pathology, and use of the RF model to predict lifespan with R² = 0.79. Left: The RF model was identified as the best predictor of lifespan. R² values and mean absolute error (in days) are shown for different ML models, including linear regression (Linreg), Ridge regression (Ridge), ElasticNet regression, Support Vector Machine (SVM), RF, and multilayer perceptron (MLP). All pathology measurements conducted up to day 14 of C. elegans lifespan were used. Day 14 was chosen because it is the time point at which maximum pathology severity was observed across all animals (see Fig. 1c). Right: Lifespan prediction using the RF model achieved R² = 0.79. The analysis was repeated with the RF model, but instead of all data up to day 14, data up to various earlier time points (i.e., up to day 1, 4, 7, 9, 11, and 14 of adulthood) were used separately. The R² for each is shown in the inset. Day 11 outperformed all other time points, as expected, since it represents the time point at which pathology severity is generally highest before plateauing at its maximum around day 14 (see Fig. 1c). The scatter dot plot shows data only up to day 11 of C. elegans lifespan. Each dot represents a held-out test sample and shows observed lifespan versus lifespan predicted by the RF model from pathology data. Eighty percent of the samples were used to train the model, and the remaining 20% were used to evaluate it. Each sample corresponds either to a single animal (n = 55 wild-type individuals) or to the mean value of a population subjected to a specific treatment or genotype (n = 45 population means) observed at different intervals (see Methods). For a comparison of different models using only data up to day 11 of adulthood, and for the coefficients of the linear regression model, see Supplementary Fig. 8b.
p-values compare the RF model to other models. b All wild-type single-animal lifespans, and average population lifespans of the various treatments and species used to build the model, plotted as a box plot. The average lifespan across all single wild-type worms, treatments, and species is 18.3 days. Note that some treatments extend lifespan while others shorten it. Day 11 (shown in red) is the time point up to which pathology progression was scored and used to predict lifespan in (a). c Correlation matrix of the different pathologies measured (raw data) and the real observed lifespan. Pearson method. d Feature importance: RF model Mean Decrease in Impurity; long-lived refers to worms that survived for more than 18 days. t-test: long-lived vs short-lived animals. Note that the Pearson correlations shown in (c) quantify only linear pairwise associations, whereas the random forest feature-importance scores in (d) (mean decrease in impurity) can also reflect non-linear effects and higher-order interactions; consequently, a variable with a low linear correlation can still rank as highly important in the multivariate RF model. ***p < 0.0001, ****p < 0.00001. For the raw pathology and lifespan data used, see Supplementary Datasets 1 and 2.

Next, we asked which pathologies contribute most to lifespan according to our model. Among the individual pathology scores, those for the pharynx and intestine correlate most strongly with observed lifespan (Fig. 3c). Furthermore, in terms of model feature importance, the greatest determinant of model prediction (based on Gini importance, i.e., Mean Decrease in Impurity [MDI]) is pharyngeal pathology, by a wide margin (Fig. 3d, Supplementary Fig. 8c). This is consistent with previous findings pointing to links between lifespan and both pharyngeal and intestinal status7,15,26,27.
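In scikit-learn, the `feature_importances_` attribute of a fitted `RandomForestRegressor` is the impurity-based (MDI) importance; for regression trees the impurity is the variance of the target, so MDI measures the mean reduction in variance attributable to each feature. A minimal sketch on synthetic data in which, by construction, the hypothetical "pharynx" feature drives most of the variation in lifespan:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

pathologies = ["pharynx", "intestine", "PLP", "gonad", "tumor"]

# Synthetic data (illustration only): "pharynx" is, by construction, the
# main driver of lifespan, "intestine" a minor one, the rest irrelevant.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(300, 5))
y = 35 - 20 * X[:, 0] - 5 * X[:, 1] + rng.normal(0, 1.0, 300)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

# feature_importances_ holds impurity-based (MDI) importances: non-negative
# values that sum to 1 across features.
ranked = sorted(zip(pathologies, rf.feature_importances_), key=lambda t: -t[1])
for name, imp in ranked:
    print(f"{name:10s} {imp:.3f}")
```

MDI is computed on the training data and can favour features with many distinct values; permutation importance on held-out data is a common cross-check, though the text does not state whether it was used.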

We also looked at differences between longer- and shorter-lived single wild-type worms, as well as between longer- and shorter-lived treatment populations and species used in the model. Here, longer lived was defined as >18 days and shorter lived as <18 days. This threshold was selected because it is the approximate mean C. elegans lifespan at 20 °C, as well as the mean lifespan of all single wild-type worms, treatments (culture under standard vs non-standard conditions, e.g. axenic medium) and species used in this study (Fig. 3b). This analysis revealed that pharyngeal pathology is more predictive of lifespan in shorter-lived than in longer-lived animals (Fig. 3d). One possibility is that this reflects increased early deaths linked to pharyngeal infection26. By contrast, intestinal atrophy and PLPs are similarly predictive of lifespan in shorter- and longer-lived animals. Notably, uterine tumors were more predictive of lifespan under treatments that extend life. This suggests that in animals subjected to life-extending treatments, the contribution of uterine tumors to late-life mortality increases.
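One way to compare feature importance between groups, sketched under assumptions: split samples at the 18-day threshold and fit a separate random forest to each group. The data are synthetic, and the function name `importances_by_group` is hypothetical; this is not presented as the study's procedure. Because the split is made on the outcome itself, each group covers a narrower lifespan range, which should be kept in mind when comparing importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def importances_by_group(X, lifespan, threshold=18.0, seed=0):
    """Hypothetical helper: fit one random forest per lifespan group and
    return the MDI importances of each.

    Following the text's definitions, long-lived means > threshold and
    short-lived means < threshold; samples exactly at the threshold are
    left out of both groups.
    """
    groups = {"long": lifespan > threshold, "short": lifespan < threshold}
    out = {}
    for name, mask in groups.items():
        rf = RandomForestRegressor(n_estimators=200, random_state=seed)
        rf.fit(X[mask], lifespan[mask])
        out[name] = rf.feature_importances_
    return out

# Synthetic data: 5 pathology scores per sample, lifespan in days.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(400, 5))
lifespan = 32 - 18 * X[:, 0] - 6 * X[:, 4] + rng.normal(0, 1.0, 400)

imp = importances_by_group(X, lifespan)
print({k: np.round(v, 2) for k, v in imp.items()})
```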

Finally, to assess the predictive potential of this model for future datasets, we collected lifespan and pathology data from two new conditions not previously used in either model training or validation: wild-type animals cultivated with either Stenotrophomonas MYb57 or Achromobacter MYb9, two bacterial isolates identified in wild C. elegans and shown to colonize the C. elegans gut28,29. Notably, predicted and actual lifespans showed no significant difference for either condition, and predicted lifespans were within 1 day of empirically observed lifespans (Supplementary Fig. 9), supporting the predictive power of the ML model.

Sexual dimorphism in development of age-related pathologies

While conducting the above analysis, we made further observations relating to senescent pathology. In C. elegans, most aging research is directed at hermaphrodites, while relatively little is known about aging in males. Several sex differences have been noted previously. Males cultivated in isolation live longer than hermaphrodites30. Aging males do not accumulate yolk pools in the body cavity15, consistent with the absence of vitellogenin (yolk protein) synthesis in males31. Moreover, males do not exhibit distal gonad atrophy32 or marked intestinal atrophy15.

To give a fuller picture of sex differences in age-related pathology, we directly compared all five pathologies in unmated animals of both sexes (Fig. 4a–e). This confirmed the absence of yolk pools, intestinal atrophy, and gonad atrophy in males. Moreover, we detected no form of germline tumor in males equivalent to the teratoma-like uterine tumors seen in aged hermaphrodites20,33. By contrast, similar levels of pharyngeal pathology were seen in the two sexes (Fig. 4a). Thus, major pathologies develop rapidly in organs linked to reproduction (i.e., the gonad and intestine) in hermaphrodites but not in males. This could imply that trade-offs linked to reproduction lead to pathology in hermaphrodites but not in males. These results demonstrate marked sexual dimorphism in age-related pathophysiology in C. elegans, and therefore potentially in the causes of late-life mortality; they also hint at a greater role of pharyngeal pathology in late-life mortality in males than in hermaphrodites.

Fig. 4: Sex difference in age-related pathology.

Most of the age-related pathologies seen in hermaphrodites are largely absent from males (two-way ANOVA, Bonferroni correction); stars show statistically significant differences in pathology progression (ANCOVA, Tukey correction). a Pharyngeal deterioration is similar in both sexes. b Gonad atrophy is absent from males. c Germline tumors (teratoma-like uterine tumors in hermaphrodites) and (d) PLPs (yolk pools) are absent from males. e Gut atrophy is largely absent from males. 2 trials (n = 10/trial). ***p < 0.0001; ****p < 0.00001.


