Madden-Julian Oscillation Prediction Skill Calculated Inaccurately

The Madden-Julian Oscillation (MJO) is a major tropical weather system and one of the largest sources of subseasonal-to-seasonal weather predictability. Successful prediction of the MJO has been a very active research area because of its large socio-economic impacts. Silini et al. (hereafter S21) developed a machine learning model to forecast the MJO and claimed MJO prediction skill of 26-27 days across all seasons and 45 days in December-February (DJF) winter. If this were true, the skill of their model would be competitive with that of state-of-the-art dynamical MJO forecast systems, which is 20-35 days. However, here we show that MJO forecast skill was miscalculated in S21, which unfairly inflated the model's apparent performance. The correctly calculated skill of their model is significantly lower than that reported in S21: skill across all seasons drops to 11-12 days, and skill for forecasts initialized during DJF drops to 15 days. Our findings reveal that the S21 machine learning model cannot compete with state-of-the-art numerical weather prediction models in predicting the MJO.

Prediction of the Madden-Julian Oscillation (MJO) has been an active area of research since the 1990s.^1,2 Because the MJO is recognized as an important source of subseasonal-to-seasonal weather predictability, efforts have been made to improve forecasts of the MJO and to standardize their evaluation,^3 including by participants of the WGNE MJO Task Force.^4,5 These coordinating bodies and studies have developed frameworks and metrics to fairly compare how well different models simulate the MJO. Recent studies using these comparison frameworks have demonstrated impressive improvements in MJO forecasts.^1,3 This is mainly due to improvements in dynamical models optimized for predicting the MJO.^5 The best dynamical models skillfully predict the MJO beyond 30 days,^1 while most show prediction skill of 20-25 days.^1

Silini et al. (2021, hereafter S21)^6 introduced a machine learning model to forecast the Real-time Multivariate MJO (RMM) index,^7 a commonly used index for identifying the MJO. RMM represents the MJO state with two components, RMM1 and RMM2, which correspond to the leading pair of empirical orthogonal functions of outgoing longwave radiation and 850 hPa and 200 hPa zonal winds. S21 trained different kinds of machine learning models to predict the RMM indices, using historical RMM values as both inputs and targets. Details are described in S21.

An important metric for evaluating MJO prediction skill, widely used in the meteorological and climate communities, is the bivariate correlation coefficient (COR).^1,4 COR is calculated using equation (1) (the same as equation (1) in S21):

$${\rm{COR}}\left(\tau \right)=\frac{\mathop{\sum}\nolimits_{t=1}^{t=N}[{a}_{1}\left(t\right){b}_{1}\left(t,\tau \right)+{a}_{2}\left(t\right){b}_{2}\left(t,\tau \right)]}{\sqrt{\mathop{\sum}\nolimits_{t=1}^{t=N}\left[{a}_{1}^{2}\left(t\right)+{a}_{2}^{2}\left(t\right)\right]}\sqrt{\mathop{\sum}\nolimits_{t=1}^{t=N}\left[{b}_{1}^{2}\left(t,\tau \right)+{b}_{2}^{2}\left(t,\tau \right)\right]}}$$

(1)

Here, $N$ is the total number of forecast cases and $\tau$ is the forecast lead time. $a_{1,2}(t)$ are the observed values of RMM1 and RMM2 at time $t$, and $b_{1,2}(t,\tau)$ are the corresponding model predictions of RMM1 and RMM2 at a lead time of $\tau$.
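As a minimal sketch of equation (1) for a single lead time, the function below computes the bivariate correlation with NumPy. The function name `bivariate_cor` and the `(N, 2)` array layout (one row per forecast case, columns RMM1 and RMM2) are assumptions made for illustration; this is not the code used in S21 or in this comment.

```python
import numpy as np

def bivariate_cor(obs, fcst):
    """Bivariate correlation of equation (1) for one lead time.

    obs, fcst: arrays of shape (N, 2) holding (RMM1, RMM2) for N forecast
    cases. This layout is an assumption of the sketch.
    """
    obs = np.asarray(obs, dtype=float)
    fcst = np.asarray(fcst, dtype=float)
    # Numerator: sum over cases of a1*b1 + a2*b2.
    num = np.sum(obs * fcst)
    # Denominator: product of the observed and forecast RMM amplitude norms.
    den = np.sqrt(np.sum(obs**2)) * np.sqrt(np.sum(fcst**2))
    return num / den
```

A forecast identical to the observations scores 1, and a forecast rotated by 90 degrees in the (RMM1, RMM2) plane scores 0, which makes the metric sensitive to MJO phase errors as well as amplitude errors.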

S21 states that equation (1) is used to calculate the predictive ability of the model, but when we tried to reproduce their results we found that this was not the case. Instead, we found that S21 had mistakenly used the following incorrect equation to calculate predictive ability:

$$\begin{array}{l}{\rm{incorrect}}\; {\rm{COR}}\left(\tau \right)\\=\frac{{\sum }_{i=0}^{i=\tau }{\sum }_{{t}_{0}=0}^{{t}_{0}=N}\left[{a}_{1}\left({t}_{0}+i\right){b}_{1}\left({t}_{0},i\right)+{a}_{2}\left({t}_{0}+i\right){b}_{2}\left({t}_{0},i\right)\right]}{\sqrt{{\sum }_{i=0}^{i=\tau }{\sum }_{{t}_{0}=0}^{{t}_{0}=N}\left[{a}_{1}^{2}\left({t}_{0}+i\right)+{a}_{2}^{2}\left({t}_{0}+i\right)\right]}\sqrt{{\sum }_{i=0}^{i=\tau }{\sum }_{{t}_{0}=0}^{{t}_{0}=N}\left[{b}_{1}^{2}\left({t}_{0},i\right)+{b}_{2}^{2}\left({t}_{0},i\right)\right]}}\end{array}$$

(2)

Here, $t_0$ denotes the start date of the forecast. Compared with equation (1), equation (2) miscalculates the model's MJO forecast skill by pooling the covariances across all lead times from zero to $\tau$, instead of calculating the model's skill at lead time $\tau$ alone. For example, what S21 reports as model skill at a lead time of 7 days incorporates information on forecast skill at all lead times from 0 to 7 days. In the correct formulation given in equation (1), ${b}_{\mathrm{1,2}}(t,\,\tau )$ represent only the model values of RMM1 and RMM2 at lead time $\tau$.
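The difference between the two calculations can be sketched side by side. The function names, the `(N, L, 2)` forecast layout (`fcst[t0, i]` is the forecast started on day `t0` and valid on day `t0 + i`), and the choice to take the pooled denominator as a product of square roots, as in equation (1), are all assumptions of this illustration rather than a reproduction of the S21 code.

```python
import numpy as np

def cor_by_lead(obs, fcst, tau):
    """Equation (1): skill at lead time tau only.

    obs:  (T, 2) observed (RMM1, RMM2) per day, with T >= N + L - 1.
    fcst: (N, L, 2); fcst[t0, i] is the forecast started on day t0,
          valid on day t0 + i (hypothetical layout).
    """
    n = fcst.shape[0]
    a = obs[tau:tau + n]   # observations valid at t0 + tau
    b = fcst[:, tau]       # forecasts at lead tau only
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

def cor_pooled(obs, fcst, tau):
    """Equation (2): every sum is pooled over leads 0..tau (erroneous form)."""
    n = fcst.shape[0]
    num = sum_a2 = sum_b2 = 0.0
    for i in range(tau + 1):
        a = obs[i:i + n]
        b = fcst[:, i]
        num += np.sum(a * b)
        sum_a2 += np.sum(a**2)
        sum_b2 += np.sum(b**2)
    return num / np.sqrt(sum_a2 * sum_b2)
```

At `tau = 0` the two functions agree by construction. They diverge at longer leads because `cor_pooled` keeps counting the short-lead terms, where forecasts are usually most accurate.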

Personal communication with the authors of S21 confirmed that this error occurred unintentionally during the calculation of COR. We emphasize that equation (2) is not an alternative metric used in prior studies of MJO forecasting. Equation (2) is the equation actually used in S21 to evaluate MJO forecast skill, although S21 erroneously states that equation (1) was used. Therefore, the statements about forecast skill in S21 are incorrect and cannot be compared with previous evaluations of MJO forecast skill. Because COR tends to decrease with lead time, the additional summation over lead times from zero to $\tau$ in equation (2) tends to make the incorrect COR value larger than the correct COR value.
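One way to see why pooling inflates the score is to write the sums lead by lead, taking the denominator of equation (2) as a product of square roots as in equation (1). The symbols $C_i$, $A_i$ and $B_i$ are introduced here only for illustration, and the equal-amplitude step is an idealization rather than a property established for the S21 model.

$$C_i=\sum_{t_0}\left[a_1(t_0+i)\,b_1(t_0,i)+a_2(t_0+i)\,b_2(t_0,i)\right],\quad A_i=\sum_{t_0}\left[a_1^2(t_0+i)+a_2^2(t_0+i)\right],\quad B_i=\sum_{t_0}\left[b_1^2(t_0,i)+b_2^2(t_0,i)\right]$$

$${\rm{COR}}(i)=\frac{C_i}{\sqrt{A_i B_i}},\qquad {\rm{incorrect}}\;{\rm{COR}}(\tau)=\frac{\sum_{i=0}^{\tau}C_i}{\sqrt{\sum_{i=0}^{\tau}A_i}\,\sqrt{\sum_{i=0}^{\tau}B_i}}$$

If the observed and forecast amplitudes were roughly independent of lead time, so that $A_i\approx A$ and $B_i\approx B$, this would reduce to

$${\rm{incorrect}}\;{\rm{COR}}(\tau)\approx\frac{1}{\tau+1}\sum_{i=0}^{\tau}{\rm{COR}}(i)$$

Under that idealization the pooled value is a running mean of the per-lead correlations. The running mean of a non-increasing sequence is never smaller than its last term, so the pooled value is at least ${\rm{COR}}(\tau)$ whenever COR decays with lead time.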

To quantify the impact of this error on the S21 results, we compare the prediction skill of the S21 model evaluated with the correct COR from equation (1) to the skill evaluated with the incorrect COR from equation (2) (Figure 1). Here, we show results only for the feedforward neural network model of S21 (not for the autoregressive recurrent neural network); however, given the nature of the error, we are confident that our findings apply to both types of models presented in S21.

**Figure 1: Comparison of correctly and incorrectly calculated COR.**

In Figure 1, the erroneous skill calculated with equation (2) closely matches the skill published in S21 (see Figs. 1 and 3 in Silini et al.^6). However, the corrected skill calculated with equation (1) shows that the actual forecast skill of the S21 model is much lower than reported. The all-season skill drops from the reported 30 days to 11–12 days, and the skill of forecasts initialized in December–February (DJF) drops from the reported 45 days to 15 days. Similar drops in forecast skill are observed for forecasts initialized in March–May (MAM), June–August (JJA), and September–November (SON). Reproductions of other figures (e.g., S21 Fig. 5) show similarly large drops in model forecast skill (not shown). Considering that the MJO forecast skill of the best-performing dynamical models is currently 4–5 weeks, a drop in forecast skill to 1–2 weeks is substantial.
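The sensitivity of a skill estimate in days to this pooling can be illustrated with a toy calculation. The sketch below rests on several assumptions that are not taken from this comment or from S21: skill in days is read off as the first lead at which COR falls below 0.5 (a common convention in the MJO literature), the per-lead COR curve is an arbitrary exponential decay, and the pooled curve is approximated as a running mean over leads, which holds only if amplitudes do not change with lead time. The numbers it produces are illustrative and are not S21 results.

```python
import numpy as np

def skill_days(cor, threshold=0.5):
    """First lead (in days) at which COR drops below the threshold.

    Returns len(cor) if the curve never drops below it. The 0.5 threshold
    is an assumed convention, not a value stated in the text above.
    """
    below = np.flatnonzero(np.asarray(cor) < threshold)
    return int(below[0]) if below.size else len(cor)

leads = np.arange(61)                        # lead times 0..60 days
cor_per_lead = np.exp(-leads / 15.0)         # toy decay curve, NOT S21 data
# Running mean over leads 0..tau: an idealized stand-in for pooling the
# sums over all leads up to tau when amplitudes are lead-independent.
cor_pooled = np.cumsum(cor_per_lead) / (leads + 1)

print("per-lead skill (days):", skill_days(cor_per_lead))
print("pooled skill (days):  ", skill_days(cor_pooled))
```

Because the running mean decays more slowly than the curve it averages, the pooled curve crosses the threshold later, so the same forecasts appear skillful for longer.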

The discussion in S21 of the relative merits of their models is invalid because the skill was miscalculated. Their machine learning framework for MJO forecasting turns out to be unable to compete with state-of-the-art dynamical models. Their models show skill comparable to that of classical linear models of the MJO,^1,8 which forecast the RMM using simple linear regression techniques and are skillful out to approximately 1–2 weeks. This means that current state-of-the-art dynamical models remain the best method for MJO forecasting.

We also note that other machine learning models outperform S21.^9,10,11 This suggests that the S21 machine learning framework, which uses only RMM as input, may not have an advantage over existing machine learning approaches that use additional input variables. A more detailed comparison would be required to fully evaluate this claim. Evaluation of the S21 model using the correct equation highlights that the model does not outperform other existing machine learning approaches in predicting the MJO.

We do not wish to discourage continued research into MJO prediction using machine learning. However, careful adherence to best practices established by the MJO community and proper use of standardized metrics to evaluate model performance are necessary to ensure fair comparisons across different models. This is especially true given that MJO prediction has important implications for subseasonal-to-seasonal forecasts in both scientific and applied domains.

Previous studies suggest that the MJO has a maximum predictability of six to seven weeks.^12,13 S21's original claim of MJO prediction skill exceeding 60 days for certain phases or seasons was particularly difficult to reconcile with this predictability estimate. Correcting the equation used to evaluate the models brings model performance below these predictability estimates. Nevertheless, both dynamical and machine learning approaches to MJO forecasting should continue to strive to reach these predictability limits. Future research in this area will reveal whether modeling frameworks can achieve this goal.
