Skillful seasonal prediction of Afro-Asian summer monsoon precipitation with a merged machine learning and large ensemble approach



Structure of forecast model

The structure of the machine learning model for seasonal climate prediction (Y-model) is shown in Fig. 1. First, monsoon-related big climate data are constructed to search for potential predictors. Second, the predictors are cleaned through a physical mechanism test and a best-predictors-set selection, and are further processed into a large ensemble. Third, independent hindcasts are performed to find the effective predictors. The details of the model are given below.

Fig. 1: The structure of machine learning model for seasonal climate prediction (Y-model).

Monsoon-related big climate data are constructed to search for potential predictors. The predictors that pass the physical mechanism test are processed into a 27-member large ensemble according to three criteria derived from traditional statistical forecast experience. Each member undergoes a cleaning process to retain the best-predictors-set. The extreme real-time predictors are then used to expand the large ensemble to a total of 351 members. Independent hindcasts are performed with Facebook Prophet. The real-time prediction can be derived from the ensemble mean of the members that are skillful during the independent hindcast period.

I) The monsoon-related big climate data are constructed (Supplementary Table 1). Many previous studies have documented that both atmospheric circulations and external forcings significantly influence the AfroASMP. The anomalies in the sea-land pressure difference (sea level pressure, SLP) and horizontal winds at 850 hPa (UV850) directly determine the intensity of the AfroASMP. The subtropical high [21,22,23] (geopotential height at 500 hPa, H500) and the South Asian high [24,25,26,27] (geopotential height at 200 hPa, H200) have been identified as important components of the East Asian and South Asian summer monsoons. Air temperature at 50 hPa (T50) and geopotential height at 50 hPa (H50) are used to represent the possible influence of the stratosphere, although the associated physical mechanism on the seasonal scale requires further exploration. Key external forcings with strong seasonal persistence physically contribute to the anomalous AfroASMP by exciting anomalous in-situ circulation or remote atmospheric teleconnection wave patterns. For instance, the important roles of El Niño [28,29,30], sea surface temperature (SST) anomalies over the Indian [31,32] and Atlantic [33,34] Oceans, snow depth [35,36] (SNOWD), Arctic sea ice cover [37,38] (SIC) and soil moisture/temperature (SOILM/SOILT) over Eurasia [39,40] have been widely discussed. Because the underlying physical mechanisms are not fully understood, the above variables over the Northern Hemisphere or the globe are included in the big climate data. To find both preceding and simultaneous predictors, two sources are used: datasets leading by 0–5 months from ERA5 [41], the fifth major global reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), and predictions for the following 1–5 months from the Climate Forecast System, version 2 (CFSv2) [42].

II) The predictors are cleaned and processed into a large ensemble. Each potential predictor is the regional average over a significant region on the correlation coefficient (CC) map between the predictand and a variable in the big climate data. An image processing strategy from mathematical morphology [43,44] is employed to identify these significant regions automatically. Only the predictors associated with significant anomalies in summer horizontal winds at 850 hPa over the monsoon domain are retained. Since one of the most fundamental characteristics of the monsoon is the seasonal reversal of prevailing winds, the strength of the physical mechanism associated with a predictor is quantified by the accumulated number of grid cells covered by the significant regions on the correlation map between this predictor and summer horizontal winds at 850 hPa over the monsoon domain (0°–30°N, 0°–360°E). If this count is less than 2000 grid cells (18.5% of the total), the predictor is regarded as lacking a significant physical mechanism for the predictand and is discarded.
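The grid-cell criterion described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the boolean-mask input and the 30 × 360 one-degree grid (inferred only from 2000 cells being 18.5% of the total) are assumptions.

```python
# Hypothetical sketch of the physical mechanism test; names and grid shape are assumptions.
MIN_SIGNIFICANT_CELLS = 2000   # threshold quoted in the text
N_LAT, N_LON = 30, 360         # assumed 1-degree grid over 0-30N, 0-360E


def count_significant_cells(mask):
    """Count grid cells flagged as significant in a 2-D boolean mask (list of rows)."""
    return sum(1 for row in mask for cell in row if cell)


def passes_mechanism_test(mask, min_cells=MIN_SIGNIFICANT_CELLS):
    """Keep a predictor only if its significant wind-correlation area is large enough."""
    return count_significant_cells(mask) >= min_cells


# Toy masks: 10 significant latitude rows (3600 cells) versus 5 rows (1800 cells).
wide_mask = [[lat < 10 for _ in range(N_LON)] for lat in range(N_LAT)]
narrow_mask = [[lat < 5 for _ in range(N_LON)] for lat in range(N_LAT)]

# Share of the assumed domain represented by the 2000-cell threshold, in percent.
threshold_share = 100 * MIN_SIGNIFICANT_CELLS / (N_LAT * N_LON)
```

In this toy setup `wide_mask` passes and `narrow_mask` is discarded. How the significance mask itself is built (the morphological region detection) is outside the scope of this sketch.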

The remaining predictors with compelling physical mechanisms are recombined into a large ensemble. According to traditional statistical forecast experience, predictive skill is governed by three criteria: the historical sample size in focus (here, the last 15/20/30 years), the independence among predictors (CCs below 0.05/0.15/0.2) and the relationship between each predictor and the predictand (CCs above 0.5/0.4/0.3). Choosing one threshold for each criterion creates a new member, so the three thresholds of the three criteria yield 3 × 3 × 3 = 27 large-ensemble members (cleaning and processing step in Data and methods section). Our forecasting model relies on Facebook Prophet [45], which cannot effectively handle the seasonal prediction of AfroASMP without predictors (Supplementary Figure 2) and is sensitive to the input predictors (Supplementary Figures 3 and 4). Therefore, each member of the large ensemble is further cleaned to retain only the best predictors set, which has the lowest root-mean-square error (RMSE) during the test period (cleaning and processing step in Data and methods section). Then, the large ensemble is expanded to 351 members by considering extreme real-time predictors, which are important for the magnitude of the predictand (cleaning and processing step in Data and methods section).
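The construction of the 27 base members is a Cartesian product of the three threshold sets, which the following sketch enumerates. The dictionary keys, the helper `is_candidate` and the use of absolute CC values are illustrative assumptions rather than details given in the text.

```python
from itertools import product

# Threshold sets quoted in the text, one per criterion.
SAMPLE_YEARS = (15, 20, 30)                 # length of the recent historical sample
MAX_INTER_PREDICTOR_CC = (0.05, 0.15, 0.2)  # independence among predictors
MIN_PREDICTAND_CC = (0.5, 0.4, 0.3)         # predictor-predictand relationship

# One member per combination of thresholds: 3 x 3 x 3 = 27.
members = [
    {"sample_years": years, "max_inter_predictor_cc": inter, "min_predictand_cc": link}
    for years, inter, link in product(SAMPLE_YEARS, MAX_INTER_PREDICTOR_CC, MIN_PREDICTAND_CC)
]


def is_candidate(cc_with_predictand, ccs_with_selected, member):
    """Hypothetical filter: a strong link to the predictand and weak links to chosen predictors.

    Absolute values are used here as an assumption; the text only gives the thresholds.
    """
    strong = abs(cc_with_predictand) > member["min_predictand_cc"]
    independent = all(abs(c) < member["max_inter_predictor_cc"] for c in ccs_with_selected)
    return strong and independent
```

Each member then applies its own combination of thresholds when screening predictors, which is why the same pool of predictors can lead to quite different members.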

III) Independent hindcasts during 2011–2022 are performed using the 351 members of the large ensemble. The training period starts in 1982. For each independent hindcast, the training period ends in the year before the hindcast year; for example, the hindcast for 2020 is trained on 1982–2019. The predictive skill of each member during the hindcast period is measured by CC and RMSE (details in Data and methods section). The ensemble mean of the skillful members during the hindcast period can be used for real-time prediction. A member is deemed skillful if its CC is significant at the 95% confidence level and its RMSE is lower than the 5% threshold of all members' RMSEs ranked in decreasing order. If no member passes this test, the ensemble mean is calculated as the average of the two members with the highest CC skill and the two members with the lowest RMSE skill.
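The expanding training window and the member selection rule can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the forecasting step itself is left out, and the "5% threshold" is read here as "within the best 5% of members by RMSE", which is an assumption (for 351 members it allows at most 18 members).

```python
import math
from statistics import mean

FIRST_TRAINING_YEAR = 1982


def training_years(hindcast_year, first_year=FIRST_TRAINING_YEAR):
    """Expanding window: every year from first_year up to the year before the hindcast."""
    return list(range(first_year, hindcast_year))


def select_members(skills, cc_critical, rmse_fraction=0.05):
    """Return the member ids used for the ensemble mean.

    skills maps member id -> (cc, rmse) measured over the hindcast period.
    """
    ranked_rmse = sorted(rmse for _, rmse in skills.values())
    n_best = max(1, math.ceil(len(ranked_rmse) * rmse_fraction))
    rmse_cutoff = ranked_rmse[n_best - 1]  # assumed reading of the 5% RMSE threshold
    skillful = [m for m, (cc, rmse) in skills.items()
                if cc > cc_critical and rmse <= rmse_cutoff]
    if skillful:
        return skillful
    # Fallback from the text: two highest-CC members plus two lowest-RMSE members.
    top_cc = sorted(skills, key=lambda m: skills[m][0], reverse=True)[:2]
    low_rmse = sorted(skills, key=lambda m: skills[m][1])[:2]
    return list(dict.fromkeys(top_cc + low_rmse))  # drop duplicates (assumption)


def ensemble_mean(predictions, members):
    """Average the chosen members' predictions, year by year."""
    return [mean(values) for values in zip(*(predictions[m] for m in members))]
```

For example, `training_years(2020)` covers 1982 through 2019, matching the example in the text. Whether a member that appears in both fallback lists should be counted once or twice is not stated in the source; the sketch counts it once.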

Predictive skills of AfroASMP at lead 4–12 months

The leading empirical orthogonal function (EOF1) modes of summer (May–September) terrestrial precipitation over three monsoon sub-regions covering Africa, South Asia and East Asia are displayed in Fig. 2. The spatial patterns are generally uniform over Africa and South Asia, whereas East Asia shows a dipole pattern. Therefore, four monsoon precipitation indices of AfroASMP are defined as the regionally averaged summer terrestrial precipitation over the African (AF, 3°–16°N, 17°W–40°E), South Asian (SA, 10°–35°N, 70°–85°E), South China Sea (SCS, 10°–25°N, 93°–122°E) and East Asian (EA, 25°–35°N, 93°–140°E) regions. The large interannual variabilities of the four monsoon indices of AfroASMP have been widely discussed [2,21,26,28,30]. During the recent decade, both AF and SA have increased, and record-breaking EA anomalies have occurred, such as the super Meiyu in 2020 [46,47] and the extreme drought in 2022 [48,49,50].
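As an illustration of how such box-averaged indices might be computed, the sketch below averages precipitation over each of the four boxes. The region boundaries come from the text; the function name, the `(lat, lon, precipitation)` input format and the cosine-latitude weighting are assumptions, and the input is assumed to contain only land cells inside the monsoon domain.

```python
import math

# Box boundaries from the text: (lat_min, lat_max, lon_min, lon_max), longitudes in degrees east.
REGIONS = {
    "AF": (3, 16, -17, 40),
    "SA": (10, 35, 70, 85),
    "SCS": (10, 25, 93, 122),
    "EA": (25, 35, 93, 140),
}


def regional_index(cells, region):
    """Area-weighted mean precipitation over one box.

    cells is an iterable of (lat, lon, precipitation) tuples for land monsoon grid cells.
    """
    lat_min, lat_max, lon_min, lon_max = REGIONS[region]
    total = 0.0
    weight_sum = 0.0
    for lat, lon, precip in cells:
        lon = lon - 360 if lon > 180 else lon  # map 0-360 longitudes onto -180..180
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            weight = math.cos(math.radians(lat))  # cos-latitude weight (assumption)
            total += weight * precip
            weight_sum += weight
    if weight_sum == 0:
        raise ValueError(f"no grid cells fall inside region {region!r}")
    return total / weight_sum
```

The longitude remapping is needed only because the AF box crosses the prime meridian (17°W–40°E) while reanalysis grids are commonly stored on a 0°–360° axis.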

Fig. 2: The climatological summer precipitation during 1982–2022 over the AfroASM monsoon region (unit: mm).

The monsoon region is defined as the area in which the local summer-minus-winter (May–September minus November–March) precipitation exceeds 300 mm and the local summer precipitation exceeds 55% of the annual total. The three pairs of sub-panels show the spatial pattern and corresponding principal component (PC1) of the first EOF mode (EOF1) for summer precipitation over the three monsoon sub-regions (African, AF; East Asian and South China Sea, EA + SCS; and South Asian, SA). The percentage of variance explained by EOF1 is given in the upper-right corner of each spatial pattern panel. The correlation coefficient (CC) between PC1 and the monsoon index is given in the PC panels.
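The two-part definition of the monsoon region translates directly into a per-grid-cell test. The sketch below is illustrative; the function and argument names are hypothetical, and the inputs are assumed to be climatological precipitation totals in mm.

```python
def is_monsoon_cell(summer_mm, winter_mm, annual_mm,
                    min_range_mm=300.0, min_summer_share=0.55):
    """Apply the two criteria from the caption to one grid cell.

    summer_mm: May-September total, winter_mm: November-March total,
    annual_mm: annual total (all climatological means, in mm).
    """
    if annual_mm <= 0:
        return False
    large_range = (summer_mm - winter_mm) > min_range_mm
    summer_dominated = summer_mm > min_summer_share * annual_mm
    return large_range and summer_dominated
```

A cell with 900 mm in summer, 100 mm in winter and 1200 mm annually satisfies both criteria, whereas a cell with 500 mm in summer out of 1000 mm annually fails the 55% condition even though its summer-minus-winter range is large.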

The predictions and predictive skills for the four monsoon indices of AfroASMP during 2011–2022 are shown in Fig. 3. Specifically, Y-model predictions are derived from the ensemble mean of the members that are skillful during 2011–2022. Lead times of 12, 10, 8, 6 and 4 months correspond to initialization in May, July, September and November of the year preceding the forecast target year, and in January of the forecast target year, respectively. For comparison, the skills of five state-of-the-art coupled dynamic models at a lead of 1 month are also shown: ECMWF [51], DWD [52], Meteo_France [53], JMA [54,55] and CFSv2 (details in Data and methods section). CFSv2 shows significant skill for AF, with a CC of 0.64 and an RMSE of 7.22. ECMWF shows skill for the interannual variability of SA, with a CC of 0.64, but not for its magnitude, as evidenced by an RMSE of 21.97. The remaining three models do not predict AF and SA skillfully. Seasonal predictions of EA and SCS remain challenging for current dynamic models: all CC skills fall below 0.3, the majority even below 0.1, and all RMSEs exceed 14.6.
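The correspondence between lead time and initialization month can be reproduced by counting back from May, the first month of the May–September season. That reference month is an assumption inferred from the listed months rather than something the text states, and the function name is hypothetical.

```python
SEASON_START_MONTH = 5  # May, first month of the May-September season (assumed reference)


def initialization_month(target_year, lead_months, season_start=SEASON_START_MONTH):
    """Return (year, month) of initialization for a forecast of the target-year summer."""
    month_index = target_year * 12 + (season_start - 1) - lead_months
    year, month_zero_based = divmod(month_index, 12)
    return year, month_zero_based + 1


# Leads used by Y-model, mapped for the 2022 target summer.
schedule = {lead: initialization_month(2022, lead) for lead in (12, 10, 8, 6, 4)}
```

With this convention a 12-month lead for summer 2022 is initialized in May 2021 and a 4-month lead in January 2022, in agreement with the months listed in the text.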

Fig. 3: Time series of the anomalous AfroASMP indices during 2011–2022.

The black line is the observation; colored lines are the predictions from the ensemble mean of skillful members in Y-model at lead 4–12 months. (a) EA index; (b) SCS index; (c) AF index; (d) SA index. (e) and (f) are the corresponding CC skills and RMSE skills of Y-model and the five state-of-the-art models. The numbers of skillful members used for the ensemble mean at each lead month are shown at the bottom of the panel for each monsoon index. For the predicted SCS at lead 10 months, the ensemble mean is calculated using the two members with the highest CC skill and the two members with the lowest RMSE skill, since no member passed the skillful test. The boldface numbers are significant at the 95% confidence level using Student's t-test.

Skillful predictions by Y-model persist at lead times of up to 12 months, not only for SA and AF but also for EA and SCS. The interannual variabilities and magnitudes of the predicted AfroASMP are generally consistent with the observations. The increasing trends of the AF and SA indices during the recent decade are accurately predicted, including the floods of AF and SA in 2022. The predicted EA and SCS indices exhibit large variabilities in recent years, consistent with observations. Extreme events are skillfully predicted, such as the EA flood in 2021 and drought in 2022, as well as the SCS flood in 2011 and drought in 2019. All CC skills of Y-model at lead 4–12 months are significant (the threshold CC is 0.58 at the 95% confidence level) and all RMSEs of Y-model are lower than those of CFSv2. Most CC skills are higher than 0.6 or even 0.7, and most RMSE reductions exceed 20% or even 30%. Y-model exhibits its lowest skill in predicting SA at lead 4 months, with a CC of 0.58 and an RMSE reduction of 11%. Surprisingly, its highest skill is in predicting AF at lead 12 months, with a CC of 0.90 and an RMSE reduction of 53%. There are also some biases, especially in magnitude, such as the overestimation of the SCS drought in 2020 at lead 12 months and the underestimation of AF in 2016 and 2017 at lead 4 months. It is worth noting that the skill of Y-model does not vary systematically with lead time. Because different physical mechanisms are associated with the four monsoon indices, the highest skill for each index appears at a different lead time.
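The quoted CC threshold of 0.58 can be checked against a Student's t-test on 12 hindcast years. The sketch below is a minimal illustration under stated assumptions: a two-tailed test with n − 2 degrees of freedom, the tabulated critical value 2.228 for 10 degrees of freedom, and an RMSE reduction expressed relative to a reference model (presumably CFSv2, which the text does not state explicitly). All function names are hypothetical.

```python
import math


def pearson_cc(obs, pred):
    """Pearson correlation coefficient between observed and predicted series."""
    n = len(obs)
    mean_obs, mean_pred = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)


def rmse(obs, pred):
    """Root-mean-square error between observed and predicted series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def critical_cc(n_years, t_critical):
    """Smallest CC that is significant for n_years samples, from r = t / sqrt(t^2 + df)."""
    df = n_years - 2
    return t_critical / math.sqrt(t_critical ** 2 + df)


def rmse_reduction(rmse_model, rmse_reference):
    """Percentage by which the model RMSE is lower than a reference RMSE (assumed definition)."""
    return 100.0 * (1.0 - rmse_model / rmse_reference)


T_CRIT_DF10 = 2.228  # two-tailed 95% Student's t value for 10 degrees of freedom
threshold = critical_cc(12, T_CRIT_DF10)
```

Under these assumptions `threshold` evaluates to roughly 0.576, which rounds to the 0.58 quoted for the 2011–2022 hindcast period.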

The number of skillful members varies among the monsoon indices, since the skill spread of the 351 members differs according to their distinct features. Take the skill spread at lead 12 months as an example (Supplementary Figs. 3, 4). There are 10 skillful members for EA, 18 for SCS, 18 for AF and 15 for SA. Comparison with the no-skill members shows that the extreme real-time predictors are necessary for EA and SA predictions, since no member without extreme real-time predictors has CC or RMSE skill for EA, or RMSE skill for SA. For SCS and AF predictions, the members without extreme real-time predictors show advantages in RMSE skill, but the extreme real-time predictors produce better CC skill. Extreme values of 0.8 and 1.0 seem to predict the interannual variabilities of EA and SA better, whereas values of 1.5 and 2.0 work better for SCS and AF, implying that multiple factors are responsible for EA and SA variations while highly extreme factors contribute to AF and SCS variations. It should be noted that the AfroASMP cannot be skillfully predicted if the potential predictors are input into Prophet all at once, since most members have no skill (Supplementary Figs. 3, 4). The large ensemble is the crucial step in Y-model, which makes it possible to find effective predictors and produce skillful predictions.

Predictor information indicated by Y-model

Y-model uncovers some important predictors for AfroASMP from the big climate data. For instance, the important predictors for the interannual variabilities of AfroASMP may be reflected by the first seven predictors selected by the highest-CC-skill member during the independent hindcast period (Fig. 4). Some variables involved in the sources of predictability for AfroASMP have been noted in previous studies, such as the tropospheric biennial oscillation feature (UV850) and key external forcings (SST, Model-SST, SOILT, SIC). Tropospheric circulation signals are used very cautiously in traditional seasonal prediction, because the troposphere adjusts rapidly and therefore stores anomalous signals poorly. Y-model indicates some key tropospheric circulations (SLP, H500, H200, Model-T2m, Model-H500) for the AfroASMP; their signals may be stored in the external forcings through in-situ anomalies of temperature and precipitation. SNOWD and SOILM, despite their known physical mechanisms, are hardly used by Y-model; their influences may be reflected by other predictors. It is worth noting that Y-model specifically highlights the important role of the stratosphere (T50/H50) in the seasonal prediction of AfroASMP. T50/H50 are important predictors for each monsoon index at all lead months, especially for EA at lead 12–8 months and SCS at lead 8–6 months, although the associated physical mechanism needs further investigation.

Fig. 4: First seven predictors and numbers of extreme real-time predictors used by the highest-CC-skill-member in Y-model.

Upper bars represent the accumulated numbers of the first seven variables during the independent hindcast period, and the lines below depict the total numbers of extreme real-time predictors in each hindcast year. (a) EA index; (b) SCS index; (c) AF index; (d) SA index.

The extreme predictors associated with extreme AfroASMP are of considerable value, since predicting extreme AfroASMP is a crucial objective of seasonal prediction. Y-model provides some useful insights into the associated extreme predictors. The number of extreme real-time predictors generally increases with the magnitude of AfroASMP, although there are some inconsistencies (SCS and AF in 2017 at lead 4 months) (Fig. 4). For the extreme EA, AF and SA in 2022, the numbers of extreme real-time predictors are the highest at all lead months. This result implies that extreme AfroASMP may be attributed to preceding extreme factors, in addition to extreme simultaneous factors [48,49,50]. Interestingly, one of the important factors for the extreme AfroASMP indices in 2022 suggested by Y-model is the extreme March–April (MA) stratospheric anomalies in 2021 (Supplementary Fig. 5). There are significant relationships between the MA stratosphere over the middle latitudes and the AfroASMP indices during the historical period. The extreme negative T50 anomaly in March 2021 (Supplementary Fig. 5b) may have contributed to the extreme EA drought in 2022. The extreme positive H50 anomaly in April 2021 (Supplementary Fig. 5d) may have contributed to the extreme AF and SA floods in 2022. Stratospheric final warming (SFW) occurs in spring, coinciding with the annual breakdown of the stratospheric polar vortex. Spring SFW events may influence the tropospheric circulation in the following summer through the anomalous Northern Annular Mode and Arctic Oscillation [56,57]. Nevertheless, a more in-depth investigation is required to understand how the anomalous mid-latitude stratosphere in the previous spring continues to influence AfroASMP.


