A multi-source data-driven framework for probabilistic flood risk assessment using cascade machine learning models: case study in the Sichuan Basin

Model validation

Figure 2 presents the validation of monthly meteorological variables (average temperature and precipitation) downscaled from MPI-ESM1.2 by the Transformer-SVM with Bayesian optimization and the extreme data generator. Figure 2a–c shows the observed and simulated monthly average temperature at three stations during 1990–2014. The simulation results for the validation period (2007–2014) show that the model captures temperature variations well: it reproduces the temperature trends accurately and encloses the observed values within a narrow uncertainty range. Regarding extreme values, the maximum temperatures at the three stations during the validation period occurred in July 2013, reaching 31.3 °C, 29.5 °C, and 28.7 °C, respectively. The minimum temperatures at these stations were recorded in January 2011, at 4.3 °C, 4.1 °C, and 4.1 °C, respectively. The simulated ensemble encloses these observed extremes. For the ensemble mean, the RMSE values at the three stations are 1.53 °C, 1.46 °C, and 1.65 °C, respectively, and the correlation coefficients are 0.98, 0.98, and 0.97, respectively. This indicates that the model reproduces the temperature metrics accurately.
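The skill scores quoted above (RMSE and correlation of the ensemble mean, plus whether the ensemble encloses the observations) can be computed in a few lines of NumPy. This is a minimal sketch, assuming the 20 ensemble members are stacked as rows of a 2-D array; the helper names `ensemble_mean_skill` and `envelope_coverage` are hypothetical, and treating the ensemble min-max envelope as the "uncertainty range" is an assumption.

```python
import numpy as np


def ensemble_mean_skill(observed, ensemble):
    """RMSE and Pearson correlation of the ensemble mean against observations.

    observed: shape (n_months,); ensemble: shape (n_members, n_months).
    """
    obs = np.asarray(observed, dtype=float)
    sims = np.asarray(ensemble, dtype=float)
    mean_sim = sims.mean(axis=0)  # collapse the members into one series
    rmse = float(np.sqrt(np.mean((mean_sim - obs) ** 2)))
    r = float(np.corrcoef(obs, mean_sim)[0, 1])
    return rmse, r


def envelope_coverage(observed, ensemble):
    """Fraction of observed months lying inside the ensemble min-max envelope."""
    obs = np.asarray(observed, dtype=float)
    sims = np.asarray(ensemble, dtype=float)
    inside = (obs >= sims.min(axis=0)) & (obs <= sims.max(axis=0))
    return float(inside.mean())


# Usage with synthetic monthly temperatures (illustrative data only).
rng = np.random.default_rng(0)
months = np.arange(96)  # eight years of monthly values, as in 2007-2014
observed = 18.0 + 9.0 * np.sin(2 * np.pi * months / 12)
ensemble = observed + rng.normal(0.0, 1.5, size=(20, months.size))
rmse, r = ensemble_mean_skill(observed, ensemble)
coverage = envelope_coverage(observed, ensemble)
```

Averaging the members before scoring, as the text does, rewards the ensemble for centring on the observations; the coverage check separately asks whether the spread is wide enough to contain them.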

Figure 2d–f compares observed and simulated monthly precipitation at the three stations during 1990–2014. The simulations at all three stations capture the temporal characteristics of monthly rainfall well, especially during the latter half of the validation period. Comparing the observed values with the simulated ensemble means gives RMSE values of 76.5 mm, 59.6 mm, and 64.9 mm, respectively. The PBIAS values at all three monitoring sites (Site A: −10.65%, Site B: −10.25%, Site C: −11.5%) are consistently negative. This indicates a relatively small cumulative bias but a systematic underestimation across the study area, consistent with the Type II error characteristics defined in hydrological model validation frameworks⁵⁹. The relatively large RMSE values can be attributed mainly to two factors: (1) simulation biases arising from the inherently strong randomness of rainfall; and (2) the model's difficulty in reproducing several extreme rainfall events during the validation period, such as July 2007, when monthly rainfall at all three stations approached or exceeded 500 mm. Because of the extreme data random regenerator and the presence of a few values near or above 400 mm in the training data from the Chongqing and Gaoping observatories, a small portion of the simulated data also exceeded 400 mm during extreme value reproduction. However, these simulated extremes lag behind the timing of the observed events. Increasing the number of random ensemble members could partly address this issue, but it would also widen the uncertainty range of the simulated dataset.
Although the systematic underestimation of monthly precipitation calls for cautious interpretation of its hydrological impact on runoff and flood forecasting, the PBIAS values (absolute value < 25%) meet the “Satisfactory” criterion established by Moriasi et al.⁵⁹ for watershed-scale hydrological modeling.
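For reference, PBIAS and the 25% check can be sketched as below. The sign convention (simulated minus observed) is chosen so that negative values indicate underestimation, matching how the values above are interpreted; Moriasi et al. write the numerator as observed minus simulated, which flips the sign. The function names are hypothetical.

```python
import numpy as np


def pbias(observed, simulated):
    """Percent bias; negative values mean the simulation underestimates.

    Computed as 100 * sum(sim - obs) / sum(obs). Moriasi et al. use
    sum(obs - sim) in the numerator, which reverses the sign.
    """
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))


def within_pbias_limit(value, limit=25.0):
    """True when |PBIAS| is below the 25% limit quoted in the text."""
    return abs(value) < limit


# Usage: a simulation that is uniformly 10% too low.
bias = pbias([100.0, 200.0, 300.0], [90.0, 180.0, 270.0])
ok = within_pbias_limit(bias)
```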

Because precipitation is the key driver of runoff, which in turn leads to flood disasters, and to describe model accuracy more precisely at a finer temporal scale, the monthly rainfall data were disaggregated to a daily timescale using the KNN method. Figure 3 illustrates four statistical attributes of daily precipitation at the three stations: the probability of wet days, the mean daily precipitation, the standard deviation of daily precipitation, and the maximum daily precipitation. Comparing the mean values of each indicator (derived from 20 ensemble members of monthly data), the simulated data generally reproduce the observed values well. For the probability of wet days (Fig. 3a1, b1, and c1), the RMSE values at the three stations are 0.12, 0.06, and 0.14, respectively. However, the fit is poor for June and July, and this issue appears at all three stations. The main reason is that monthly rainfall in June and July is relatively high while the number of wet days is lower than in adjacent months, which makes these months somewhat atypical. In addition, to increase the training volume, the KNN training dataset did not distinguish between months, which led to biases in the June and July indicators. Future work could address this by training a separate model for these atypical months. The simulated daily precipitation statistics (mean and standard deviation) agree closely with the observations at all three stations, as shown in Fig. 3a2–c2 (mean values) and Fig. 3a3–c3 (standard deviations).
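The text does not detail the KNN disaggregation, so the following is only a sketch of one common variant, a KNN method-of-fragments scheme: the observed months whose totals are closest to the simulated monthly total are ranked, one is sampled with weight proportional to 1/rank, and its daily pattern is rescaled to the simulated total. The function names, the default k ≈ √n, the 1/rank kernel, the 0.1 mm wet-day threshold, and computing the mean over all days are assumptions. Passing a month-specific candidate pool (for example, only historical Julys) is one way to implement the separate treatment of June and July suggested above.

```python
import numpy as np


def knn_disaggregate(target_total, candidate_days, k=None, rng=None):
    """Split one simulated monthly total into daily values (KNN fragments).

    target_total: simulated monthly precipitation (mm).
    candidate_days: list of 1-D arrays, each the observed daily rainfall
        of one historical month (optionally only the same calendar month).
    """
    rng = np.random.default_rng() if rng is None else rng
    patterns = [np.asarray(p, dtype=float) for p in candidate_days]
    totals = np.array([p.sum() for p in patterns])
    wet = np.flatnonzero(totals > 0)  # only months with rain can be rescaled
    if wet.size == 0:
        raise ValueError("need at least one wet candidate month")
    if k is None:
        k = max(1, int(round(np.sqrt(wet.size))))  # common k ~ sqrt(n) heuristic
    k = min(k, wet.size)
    # rank wet candidate months by distance between monthly totals
    nearest = wet[np.argsort(np.abs(totals[wet] - target_total))[:k]]
    weights = 1.0 / np.arange(1, k + 1)  # 1/rank kernel favours closer months
    chosen = nearest[rng.choice(k, p=weights / weights.sum())]
    fragments = patterns[chosen] / totals[chosen]  # daily shares summing to 1
    return target_total * fragments


def daily_stats(daily, wet_threshold=0.1):
    """Wet-day probability, mean, standard deviation, and maximum."""
    x = np.asarray(daily, dtype=float)
    return {
        "p_wet": float(np.mean(x >= wet_threshold)),
        "mean": float(x.mean()),
        "std": float(x.std(ddof=1)),
        "max": float(x.max()),
    }


# Usage: disaggregate a 120 mm month using three toy 5-day "months".
rng = np.random.default_rng(42)
history = [np.array([0.0, 12.0, 30.0, 0.0, 58.0]),
           np.array([5.0, 0.0, 0.0, 40.0, 10.0]),
           np.array([0.0, 0.0, 90.0, 60.0, 0.0])]
daily = knn_disaggregate(120.0, history, k=2, rng=rng)
stats = daily_stats(daily)
```

Rescaling an observed pattern preserves the monthly total exactly while borrowing a realistic wet/dry sequence, which is why the wet-day probability depends heavily on which months are in the candidate pool.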
Quantitative evaluation revealed strong agreement in both central tendency and variability: correlation coefficients (\(R\)) for mean daily precipitation reached 0.96, 0.97, and 0.97 at stations CQ, GP, and LZ, respectively (\(RMSE\) values of 1.64, 1.04, and 0.65 mm), while the corresponding \(R\) values for the standard deviation were comparable (0.95, 0.97, and 0.97). Figure 3a4, b4, and c4 present the simulated monthly maximum daily precipitation. Compared with the observations, the simulated values are slightly underestimated, especially in July, which is directly linked to the underestimation of monthly extremes described above. To quantify this, the Gumbel distribution was applied to the simulated daily precipitation to estimate extreme rainfall for different return periods. Table 1 presents the results for the three stations, which show that the simulated values underestimate the observed data. This systematic deviation is consistent with the underestimation of mean monthly precipitation relative to observations, as quantified by the persistently negative PBIAS values (Fig. 4). Compared with the maximum values in the simulated ensembles, the underestimation rates for the 100-year return-period rainfall at the three stations are 5.7%, 2.6%, and 12.1%, respectively.
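The return-level calculation can be sketched with a Gumbel (EV1) fit, \(x_T = u + \alpha y_T\) with reduced variate \(y_T = -\ln(-\ln(1 - 1/T))\). The text does not state the fitting method or how the underestimation rate is defined, so the method-of-moments estimators (\(\alpha = \sqrt{6}\,s/\pi\), \(u = \bar{x} - 0.5772\,\alpha\)), fitting to annual maxima, and the relative-difference formula below are assumptions; the helper names are hypothetical.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329


def gumbel_return_levels(annual_maxima, return_periods=(2, 5, 10, 20, 50, 100)):
    """Method-of-moments Gumbel (EV1) return levels for T-year return periods."""
    x = np.asarray(annual_maxima, dtype=float)
    alpha = np.sqrt(6.0) * x.std(ddof=1) / np.pi  # scale from sample std
    u = x.mean() - EULER_GAMMA * alpha  # location from sample mean
    levels = {}
    for T in return_periods:
        y_T = -np.log(-np.log(1.0 - 1.0 / T))  # Gumbel reduced variate
        levels[T] = float(u + alpha * y_T)
    return levels


def underestimation_rate(observed_level, simulated_level):
    """Relative shortfall (%) of a simulated return level vs. the observed one."""
    return 100.0 * (observed_level - simulated_level) / observed_level


# Usage with made-up annual maximum daily rainfall (mm).
annual_max = np.array([98.0, 132.5, 110.2, 87.4, 160.8, 121.3, 143.9, 101.6])
sim_levels = gumbel_return_levels(annual_max)
```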

Table 1 Extreme precipitation quantile estimates for multiple return periods (T = 2–100 years) derived from stationary extreme value analysis at the three stations.

To verify the ability of the PCA-SVM coupled model to simulate surface runoff, this section first uses observed meteorological data (precipitation and temperature) as inputs. The results, presented in Supporting Information Fig. S1(c), show that the model captures both the local variations and the overall trend of surface runoff, although it underestimates some peak values. Using the downscaled temperature and precipitation data (20 ensemble members with extreme-value adjustment) as inputs yields a modest improvement in simulating runoff extremes. The PCA-SVM coupled model can therefore simulate surface runoff effectively. Next, the simulated monthly meteorological data from the validation phase (2007–2014) of the framework are used as inputs to verify whether the entire framework can reproduce surface runoff.
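A PCA-SVM runoff regressor of this kind can be sketched with scikit-learn as a pipeline of standardization, PCA, and support vector regression. The input features (current and previous-month precipitation plus temperature), the 95% explained-variance cutoff, the RBF kernel, and the hyperparameters are assumptions for illustration, not the configuration used in the study, and the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def build_pca_svr(variance_kept=0.95, C=10.0, epsilon=0.1):
    """Standardize inputs, project onto principal components, regress with SVR."""
    return make_pipeline(
        StandardScaler(),  # PCA is scale-sensitive, so standardize first
        PCA(n_components=variance_kept, svd_solver="full"),  # keep 95% variance
        SVR(kernel="rbf", C=C, epsilon=epsilon),
    )


# Synthetic monthly drivers and runoff (illustrative only).
rng = np.random.default_rng(7)
n = 301
precip = rng.gamma(2.0, 50.0, n)
temp = 17.0 + 8.0 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 1, n)
X = np.column_stack([precip[1:], precip[:-1], temp[1:]])  # current P, lagged P, T
runoff = 0.5 * precip[1:] + 0.1 * precip[:-1] - 1.5 * temp[1:] + rng.normal(0, 5, n - 1)

model = build_pca_svr().fit(X[:200], runoff[:200])  # calibration months
predicted = model.predict(X[200:])  # validation months
```

Putting the scaler and PCA inside the pipeline means they are fitted on the calibration months only, so the validation months do not leak into the component loadings.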

Figure 4 compares observed and simulated monthly surface runoff at the three stations during the validation period, based on simulated meteorological data. The simulated values capture the inter-annual variability of surface runoff effectively. However, because a few extreme values are underestimated in the simulated rainfall, the simulated runoff is also underestimated in the corresponding extreme cases. For instance, at Chongqing station (Fig. 4a), two extreme events in July 2007 and August 2009 exceeded 220 mm, yet the maximum simulated values were only 120 mm and 144 mm, respectively. Although the simulations indicate the possibility of extreme runoff in these months, the magnitudes are substantially underestimated. Similarly, at Gaoping station (Fig. 4b), an extreme monthly runoff event in July 2007 exceeded 350 mm, but the maximum simulated value was only 153 mm. At Luzhou station (Fig. 4c), although mean runoff is lower than at the other two stations, the extreme value of 218 mm in September 2012 is also underestimated. Nevertheless, the observed monthly mean runoff values at the three stations are 64 mm, 52 mm, and 53 mm, respectively, and the corresponding simulated values are 65 mm, 45 mm, and 50 mm, giving deviation rates of 1.6%, 13.4%, and 5.7%, respectively. Apart from the somewhat larger deviation at Gaoping, the simulations at the other two stations accurately reflect monthly surface runoff and its variations.

In summary, the validation results show that the overall GCM-meteorological data-surface runoff framework can effectively simulate monthly mean surface runoff. However, a few extreme values are difficult to reproduce fully, and the model can only indicate the presence of extreme conditions: the simulated values are relatively high but remain below the observed maxima. Because these data will be used for qualitative assessment of whether significant flood disasters occur, they can be regarded as a useful reference.

Model prediction

After validation, the framework is used to simulate and project future meteorological variables and surface runoff. The training data for this part of the model cover 1990–2014, and the projections extend from 2015 to 2100. Four Shared Socioeconomic Pathways (SSPs) are used as future scenarios: SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5. Figure 5 presents the annual rainfall trends at the three stations under each scenario (the model simulates monthly data, which have been aggregated to annual totals to better illustrate inter-annual trends). The baseline values for the three stations, averaged over 1990–2014, are 1192 mm, 1076 mm, and 1141 mm, respectively. Under SSP1-2.6, annual precipitation at the three stations is projected to remain relatively stable (Fig. 5a1, b1, and c1) and to peak in the mid-century, when mean annual precipitation for 2041–2070 reaches 1232 mm, 1171 mm, and 1231 mm, respectively. These values represent increases of 3.4%, 8.8%, and 7.9% relative to the baseline. Under SSP2-4.5 (Fig. 5a2, b2, and c2), the changes in annual precipitation resemble those under SSP1-2.6, both being relatively gradual. However, the peaks occur at the end of the century (2071–2100), when mean annual precipitation at the three stations reaches 1319 mm, 1203 mm, and 1250 mm, respectively. Under both SSP3-7.0 and SSP5-8.5, annual precipitation shows a marked upward trend, with a particularly pronounced increase at Gaoping. Under the extreme SSP5-8.5 scenario, the average increases in annual precipitation at the three stations by the end of the century reach 16.9%, 31.9%, and 17.8%, respectively.
Under this scenario, the increases at Chongqing and Luzhou are relatively gradual, whereas Gaoping experiences more drastic changes. In particular, during the mid-century period from 2055 to 2065, its simulated average annual precipitation varies between 1117 mm and 1440 mm, indicating an unstable rainfall regime.
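The aggregation and baseline comparison described above amount to summing each year's 12 monthly values, averaging over a 30-year window, and expressing the result relative to the 1990–2014 mean. A minimal sketch follows; the helper names are hypothetical, and the series is assumed to start in January.

```python
import numpy as np


def annual_totals(monthly, start_year):
    """Aggregate a monthly series (January of start_year onward) to annual totals."""
    m = np.asarray(monthly, dtype=float)
    n_years = m.size // 12  # drop any trailing partial year
    years = start_year + np.arange(n_years)
    return years, m[: n_years * 12].reshape(n_years, 12).sum(axis=1)


def period_mean(years, values, first, last):
    """Mean of the annual values for first..last inclusive (e.g. 2041-2070)."""
    mask = (years >= first) & (years <= last)
    return float(np.mean(values[mask]))


def percent_change(value, baseline):
    """Change relative to the 1990-2014 baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# Check against the SSP1-2.6 mid-century figures quoted in the text.
changes = [round(percent_change(v, b), 1)
           for v, b in [(1232, 1192), (1171, 1076), (1231, 1141)]]
```

Applied to the quoted SSP1-2.6 values, this reproduces the reported increases of 3.4%, 8.8%, and 7.9%.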

Figure 6 presents the projected annual mean temperature at the three stations under the SSP5-8.5 scenario. Under this extreme pathway, the annual mean temperatures at Chongqing and Gaoping show a pronounced and consistent upward trend, whereas the projections for Luzhou show a more moderate trend. By mid-century (2041–2070), the mean temperatures at the three stations are projected to reach 19.3 °C, 19.1 °C, and 18.6 °C, respectively, and by the end of the century (2071–2100) to rise further to 20.0 °C, 20.2 °C, and 19.2 °C, respectively. Gaoping is projected to experience the largest increase, 14.2% above its baseline temperature by the end of the century, mirroring the pattern in annual precipitation. Given an annual temperature deviation of 0.7 °C among the stations during the baseline period (1990–2014), the model projects a maximum inter-station temperature difference of approximately 1.2 °C by the end of the century, indicating growing temperature disparities between stations. The results also reveal a limitation: because the three stations are far apart and the framework does not account for spatial correlation, the projections may contain some spatial bias. Future refinements of the model could address this.

This study also analyzed the statistical characteristics of daily rainfall, focusing particularly on changes in monthly rainfall amount and wet-day probability under climate change. The results are presented in Fig. S3 (Supplementary Materials).

Based on the projected meteorological variables (temperature and precipitation), this study projects surface runoff under future climate change scenarios. Figure 7 illustrates annual surface runoff under two SSPs (SSP1-2.6 and SSP5-8.5). Under SSP1-2.6, surface runoff at the three stations changes relatively little, with peaks generally occurring in the mid-twenty-first century (2041–2070) at 796 mm, 591 mm, and 627 mm, respectively, corresponding to increases of 2.5%, 8.6%, and 4.1% over the baseline values. Under SSP5-8.5, runoff increases continuously at all three stations, reaching 879 mm, 765 mm, and 692 mm by the end of the century (2071–2100), increases of 13.2%, 40.5%, and 15.0%, respectively. This pattern closely follows the changes in precipitation and suggests that Gaoping may face significant flood risk in the future.

Flood risk identification

Table 2 compares the performance of PCA-RF models trained with and without SMOTE preprocessing. The models were calibrated with data from 1990–2006 and validated against the 2007–2014 dataset. Observational records indicate that severe flood events occurred in only 5% of the study months, revealing a significant class imbalance in the original dataset. To address this imbalance, SMOTE was applied to the training dataset to balance the representation of flood and non-flood months. The SMOTE-enhanced PCA-RF model outperformed its non-SMOTE counterpart in both accuracy and reliability. Specifically:

Predictive accuracy: Across 20 simulation trials, the SMOTE-processed model showed a narrower interquartile range (IQR) of accuracy (90.63%–93.75%) than the non-SMOTE model, indicating enhanced robustness.
Event probability estimation: The SMOTE-integrated model produced a mean estimated probability of severe flood occurrence of 4.56%, close to the observed frequency of 4.2%. Its IQR (4.12%–4.89%) also showed less variability than that of the non-SMOTE approach.
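SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The NumPy sketch below implements that idea directly, rather than calling a library such as imbalanced-learn, and oversamples the flood class to parity; as in the study, it should be applied to the training split only. Labelling flood months as 1, k = 5, and the function names are assumptions.

```python
import numpy as np


def smote(X_minority, n_new, k=5, rng=None):
    """Create n_new synthetic samples by SMOTE interpolation within one class."""
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X_minority, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, n - 1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest minority samples
    base = rng.integers(0, n, size=n_new)  # random minority seeds
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # random position along each segment
    return X[base] + gap * (X[partner] - X[base])


def balance_with_smote(X, y, k=5, rng=None):
    """Oversample class 1 (flood months) until it matches class 0 in size."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    n_new = int(np.sum(y == 0) - np.sum(y == 1))
    if n_new <= 0:
        return X, y
    synthetic = smote(X[y == 1], n_new, k=k, rng=rng)
    return np.vstack([X, synthetic]), np.concatenate([y, np.ones(n_new, dtype=int)])


# Usage: 17 years x 12 months (1990-2006) with about 5% flood months.
rng = np.random.default_rng(3)
X_train = rng.normal(size=(204, 4))
y_train = np.zeros(204, dtype=int)
y_train[::20] = 1  # 11 flood months
X_bal, y_bal = balance_with_smote(X_train, y_train, rng=rng)
```

Because each synthetic point lies on a segment between two real flood months, SMOTE fills in the minority region rather than duplicating records, which is what lets the classifier see a balanced training set without simple repetition.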

Table 2 Comparison of PCA-RF models with and without SMOTE in estimating monthly probabilities of severe flood events using hydrological variables.

These findings confirm that the SMOTE-optimized PCA-RF framework mitigates biases induced by class imbalance while improving simulation stability. This integrated method was therefore adopted to project flood probabilities under the different SSPs for the future period; the results are presented in Fig. 8.
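The statistics reported for each projection period (mean, IQR, peak) can be summarized from repeated model runs as sketched below. This assumes each of the 20 trials yields a 0/1 severe-flood flag per month and that a trial's probability is the share of flagged months; the study may instead average classifier probabilities, and the function name is hypothetical.

```python
import numpy as np


def flood_probability_summary(trial_flags):
    """Summarize monthly flood probability across repeated model trials.

    trial_flags: (n_trials, n_months) array of 0/1 severe-flood flags,
    one row per trial (e.g. 20 PCA-RF runs with different random seeds).
    """
    flags = np.asarray(trial_flags, dtype=float)
    per_trial = 100.0 * flags.mean(axis=1)  # % of months flagged in each trial
    q25, q75 = np.percentile(per_trial, [25, 75])  # interquartile range
    return {
        "mean": float(per_trial.mean()),
        "q25": float(q25),
        "q75": float(q75),
        "peak": float(per_trial.max()),
    }


# Usage: 20 synthetic trials over 312 months (2015-2040).
rng = np.random.default_rng(11)
flags = (rng.random((20, 312)) < 0.10).astype(int)
summary = flood_probability_summary(flags)
```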

(a) Near-term projections (Fig. 8a, 2015–2040)

Simulations for this period indicate monthly flood probabilities ranging from 9.12% (SSP3-7.0) to 12.31% (SSP5-8.5). The SSP3-7.0 ensemble shows the lowest risk (mean = 9.12%, IQR: 7.78–9.29%), while SSP5-8.5 yields the highest probability (mean = 12.31%, peak = 14.34%), more than twice the current observed baseline.

(b) Mid-century projections (Fig. 8b, 2041–2070)

SSP1-2.6 and SSP2-4.5 show comparable mean probabilities (10.15% vs. 10.22%), although SSP2-4.5 has a narrower IQR (9.87–10.61%), suggesting higher confidence in elevated risk. SSP3-7.0 retains the lowest probability (9.89%, +8.4% vs. 2015–2040), while SSP5-8.5 shows intensified risk (mean = 12.75%, IQR: 11.39–13.75%) with a left-skewed distribution compared with earlier projections.

(c) Late-century projections (2071–2100)

SSP1-2.6 and SSP2-4.5 show marginal probability increases (less than 1.2%). SSP3-7.0 displays expanded uncertainty (IQR width +37%) despite a slight decrease in the mean to 9.68% (−2.11% vs. mid-century). SSP5-8.5 projections stabilize at 12.71–12.83%, maintaining peak risk levels.

Collectively, these findings show that flood probabilities increase over time under all SSPs, with SSP5-8.5 (incorporating extreme weather intensification and modified hydrological regimes) showing probabilities 2–3 times higher than the historical baseline. SSP3-7.0 shows paradoxically moderated growth (+8.4% at mid-century vs. +15.2% under SSP2-4.5), potentially reflecting its socioeconomic assumptions about land-use regulation. This divergence between pathways stems from the structural design of the SSP framework: SSP5-8.5 embodies a fossil-intensive development trajectory that generates strong radiative forcing, whereas SSP3-7.0 reflects constrained climate adaptation spending (less than 0.5% of GDP) under regionally fragmented governance. Their distinct land–atmosphere coupling processes drive fundamentally different hydrological responses.
