Short-term forecasts of streamflow in the UK based on a novel hybrid artificial intelligence algorithm

Streamflow rate predictions on reference rivers

This section focuses on flow forecasting for three reference rivers, chosen to evaluate the performance of the different forecasting models in areas of the UK characterized by different rainfall regimes. The evaluation metrics for the training and testing stages, calculated for all rivers, forecasting models and forecast horizons, are shown in Tables 3, 4 and 5. In addition, Figs. 6 to 10 compare measured and predicted flow rates during the testing stage for the different prediction models and forecast horizons.

Table 3 Evaluation metrics for NARX modeling.

Table 4 Evaluation metrics for MLP-RF modeling.

Table 5 Evaluation metrics for NARX-MLP-RF modeling.

The first river considered was the Tay at Ballathie, Scotland, which has the second-highest average annual precipitation over its catchment and the highest average annual flow rate among the 18 rivers analyzed (see Section “Case studies and dataset”). The NARX-MLP-RF hybrid model outperformed both the NARX and MLP-RF models. The best performance was observed for the shortest forecast horizon, t = 1 day, for which the NARX model outperformed the MLP-RF model in both the training and testing stages. As can be seen in Fig. 6, NARX predicted peak flow rates more accurately; however, it overestimated flow rates more frequently than MLP-RF. The NARX-MLP-RF hybrid model combined the advantages of both models, leading to more robust predictions than either NARX or MLP-RF alone. As the forecast horizon increased, accuracy decreased for all models. Specifically, for t = 3 days (Fig. 7), the difference in prediction accuracy between NARX and MLP-RF was more marked: MLP-RF still followed flow rate trends well but underestimated peaks more strongly than at t = 1 day. Again, the NARX-MLP-RF hybrid model produced the best forecasts, although its metrics were only slightly better than those of NARX alone. The worst predictions were obtained for t = 7 days (Fig. 8), with NARX showing significant over- and underestimation of flow rates compared with shorter forecast horizons. MLP-RF also lost performance, but with lower dispersion than NARX, particularly for medium-to-low flow rates (Fig. 8b and d). Consequently, the best prediction was obtained with the NARX-MLP-RF hybrid model, whose accuracy decreased only slightly from the 3-day to the 7-day forecast horizon.
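The forecast horizons discussed above (t = 1, 3 and 7 days) can be made concrete with a minimal sketch of how training pairs for an h-day-ahead NARX-type forecast might be assembled, using lagged flows plus lagged exogenous inputs. This section does not report the lag orders, the exogenous inputs, or whether a separate model is trained for each horizon, so the choices below (daily catchment rainfall as the only exogenous input, a hypothetical `n_lags` parameter, and a direct strategy with one dataset per horizon) are assumptions for illustration, and `make_direct_dataset` is a hypothetical helper.

```python
def make_direct_dataset(flow, rain, n_lags, horizon):
    """Build (X, y) pairs for a direct h-step-ahead forecast (illustrative sketch).

    Assumption: each input row holds the last `n_lags` daily flows and rainfall
    values up to day t; the target is the flow on day t + horizon.
    """
    X, y = [], []
    # t is the index of the most recent observation available at forecast time
    for t in range(n_lags - 1, len(flow) - horizon):
        past_flow = flow[t - n_lags + 1 : t + 1]
        past_rain = rain[t - n_lags + 1 : t + 1]  # exogenous input (assumed rainfall)
        X.append(list(past_flow) + list(past_rain))
        y.append(flow[t + horizon])
    return X, y


flow = [10, 11, 12, 13, 14, 15, 16, 17]
rain = [0, 1, 2, 3, 4, 5, 6, 7]
X, y = make_direct_dataset(flow, rain, n_lags=2, horizon=3)
```

With `n_lags = 2` and `horizon = 3`, the first row uses flows and rainfall on days 0 and 1 to target the flow on day 4; longer horizons leave fewer usable rows and a longer gap between the inputs and the target.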

The second river analyzed in detail is the Ribble at Samlesbury, England, which showed a marked decreasing trend in both catchment precipitation and streamflow during spring. Figure 9 compares measured and predicted flow rates for the NARX-MLP-RF hybrid model at forecast horizons of 1 day and 7 days, while the results for the individual models are given in Tables 3 and 4. In the testing stage, the best predictions were obtained for a forecast horizon of 1 day with the NARX-MLP-RF hybrid model (R² = 0.91). The NARX model (R² = 0.90) performed slightly worse than the hybrid model but still better than the MLP-RF model (R² = 0.85). Again, prediction accuracy decreased for all three models as the forecast horizon increased. For t = 7 days, however, MLP-RF (R² = 0.81) outperformed NARX (R² = 0.77) in terms of R², while NARX achieved a higher MDA, indicating a better ability to follow the flow rate trend (MDA = 74.53% for NARX versus 62.84% for MLP-RF). The NARX-MLP-RF hybrid model combined the strengths of the individual models, matching the R² of MLP-RF (R² = 0.81) while achieving the highest MDA (76.31%).

The third reference river was the Thames at Kingston, in the south of England, which has the largest catchment area among the 18 rivers. In this case study, predictions were very accurate overall for all three forecast models and horizons. For t = 1 day and the testing stage, R² values of up to 0.98 were calculated for MLP-RF and up to 0.99 for both NARX and the NARX-MLP-RF hybrid. Predictions became less accurate as the forecast horizon increased, but remained more accurate than in the two previous cases under the same conditions, with R² values of up to 0.95 for MLP-RF and 0.98 for both NARX and NARX-MLP-RF at t = 3 days. A marked decrease was observed only for MLP-RF at t = 7 days (R² = 0.88), whereas NARX and NARX-MLP-RF both retained R² = 0.98, with only a limited deterioration in the other metrics (Fig. 10).

Overall, the high performance of the forecast models for the Thames at Kingston can be attributed to the particularly gradual variations in flow rate, which make peaks along the time series easier to predict. These gradual variations are linked to the large catchment area and to rainfall that is lower than in the rest of England and homogeneously distributed throughout the year. These factors make hybridizing NARX and MLP-RF less relevant in terms of forecast improvement. Conversely, forecast models for rivers with smaller catchments and higher but less homogeneous rainfall throughout the year, as in the case of the Ribble at Samlesbury, benefited more from hybridization, with better forecasts and a smaller loss of performance as the forecast horizon increased.

Particular attention was paid to the highest flow rates, which represent critical scenarios because they can lead to flooding. To this end, relative errors were calculated for the first decile of flow rates, i.e., the highest flows, for the three models and for the different forecast horizons. Each relative error was computed as the difference between the predicted and measured values divided by the measured value. Histograms of relative error frequencies for the three reference rivers are shown in Figs. 11, 12 and 13, respectively. For the Tay at Ballathie (Fig. 11) and t = 1 day, the relative errors ranged from −0.5 to 0.4, with an almost symmetrical distribution for all three models. In particular, the NARX-MLP-RF ensemble model showed the highest frequency of low relative errors, equal to 24% and 29% for relative errors between −0.1 and 0 and between 0 and 0.1, respectively. MLP-RF, on the other hand, showed lower frequencies in these two bins, amounting to 19% and 23%, respectively. The NARX model showed a frequency distribution similar to that of the NARX-MLP-RF ensemble model, but with slightly lower frequencies for low relative errors. As the forecast horizon increased, the accuracy of all three models decreased: the frequency of low relative errors fell, and that of higher relative errors rose accordingly. For t = 7 days, the NARX-MLP-RF ensemble showed the highest frequency, 25%, for relative errors between −0.1 and 0, maintaining a rather symmetrical distribution. In contrast, the NARX model showed a less symmetrical distribution, with a frequency of around 20% for relative errors between −0.3 and −0.2. Frequencies of the order of 20% were also observed for the MLP-RF model, both for relative errors between −0.3 and −0.2 (as for NARX) and between −0.2 and −0.1. This result indicates a tendency of the NARX and MLP-RF models to underestimate peak flow rates.
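A minimal sketch of the high-flow error analysis: relative errors (predicted minus measured, divided by measured) restricted to the first decile of flows, and their percentage frequencies in 0.1-wide bins like those in Figs. 11 to 13. We assume here that the first decile means the highest 10% of measured flows and use a simple rank-based threshold; the exact thresholding and binning used by the authors are not given in this section, and the function names are hypothetical.

```python
import math


def top_decile_relative_errors(measured, predicted):
    """Relative errors (pred - meas) / meas for the highest 10% of measured flows.

    Assumption: the threshold is the measured value at rank floor(0.9 * n)
    of the sorted series (a simple rank-based 90th percentile).
    """
    ranked = sorted(measured)
    threshold = ranked[int(0.9 * len(ranked))]
    return [(p - m) / m for m, p in zip(measured, predicted) if m >= threshold]


def bin_frequencies(errors, width=0.1):
    """Percentage of errors per bin [k*width, (k+1)*width), keyed by the lower edge."""
    counts = {}
    for e in errors:
        lower = round(math.floor(e / width) * width, 10)  # lower edge of the bin
        counts[lower] = counts.get(lower, 0) + 1
    return {k: 100.0 * v / len(errors) for k, v in sorted(counts.items())}


measured = [1] * 18 + [100, 200]
predicted = [1] * 18 + [95, 230]
errors = top_decile_relative_errors(measured, predicted)  # errors for flows 100 and 200
freq = bin_frequencies(errors)  # {-0.1: 50.0, 0.1: 50.0}
```

Bin keys are lower edges, so the key `-0.1` collects errors in [−0.1, 0). An error lying exactly on a bin edge may fall into the neighbouring bin because of floating-point division.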

For the Ribble at Samlesbury (Fig. 12) and t = 1 day, the relative errors ranged from −0.6 to 0.6. The NARX-MLP-RF ensemble showed the highest frequency of low relative errors, 21% both between −0.1 and 0 and between 0 and 0.1, with an almost symmetrical distribution. In contrast, MLP-RF showed lower frequencies in these two bins and a peak frequency of 17% for relative errors between −0.2 and −0.1, indicating a more skewed distribution than that of the NARX-MLP-RF ensemble model. Compared with NARX-MLP-RF, the NARX model showed lower frequencies for relative errors between −0.1 and 0 and between 0 and 0.1, amounting to 20% and 16%, respectively. As the forecast horizon increased, the variance of the relative error distributions increased and the frequencies of the lowest relative errors decreased. In particular, the NARX model also produced relative errors between −0.9 and −0.8, although with a very low frequency of 2%. All three models showed a higher frequency of negative relative errors, indicating that underestimates of extreme flows were more frequent than overestimates. Nevertheless, the NARX-MLP-RF ensemble still showed a peak frequency of 18% for both low-error bins, between −0.1 and 0 and between 0 and 0.1.

The Thames first-decile flow forecasts at Kingston (Fig. 13) showed a lower variance in relative errors than those for the other two reference rivers. Specifically, for t = 1 day, the NARX-MLP-RF ensemble model showed frequencies of 57% and 35% for the lowest relative errors, between −0.1 and 0 and between 0 and 0.1, respectively. Furthermore, the relative errors generally fell within a narrow range, between −0.2 and 0.2. MLP-RF performed slightly worse, with higher frequencies of negative relative errors: 8% between −0.2 and −0.1 and 4% between −0.3 and −0.2. As the forecast horizon increased, the NARX-MLP-RF model still showed an almost symmetrical distribution, while both NARX and MLP-RF showed an increase in the frequency of negative relative errors. The resulting asymmetric distributions confirm that these two models underestimate peak flows more than the NARX-MLP-RF ensemble model does.
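The asymmetry discussed above is judged from the histograms. As an illustrative addition (these summaries are not reported in this section), two simple numbers could quantify it: the share of negative relative errors, i.e. underestimates, and the moment-based sample skewness, which is negative when the left tail (larger underestimates) is longer. The function name is hypothetical.

```python
def underestimation_summary(errors):
    """Share of negative relative errors and moment-based skewness (illustrative)."""
    n = len(errors)
    share_negative = sum(e < 0 for e in errors) / n  # fraction of underestimates
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n  # second central moment
    m3 = sum((e - mean) ** 3 for e in errors) / n  # third central moment
    # Population skewness; negative values indicate a longer tail of underestimates
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    return share_negative, skewness


share, skew = underestimation_summary([-0.8, -0.1, 0.0, 0.1, 0.1])
```

For this made-up sample, 40% of errors are negative but the single large underestimate (−0.8) still pulls the skewness below zero, which is the kind of asymmetry the histograms show for NARX and MLP-RF.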

Overall, the results of the streamflow predictions performed on the whole time series agreed with those observed for high flows. Indeed, for rivers such as the Ribble, with smaller catchments and higher but less homogeneous rainfall throughout the year, the relative error ranges were quite wide, whereas for rivers with large catchments and more homogeneous rainfall, such as the Thames, the ranges were narrower, indicating greater accuracy in the prediction of high flows. In all cases, the hybrid NARX-MLP-RF model performed best, while the NARX and MLP-RF models produced more asymmetrical distributions even for larger basins.

Streamflow rate predictions for the whole of UK

This section discusses the streamflow forecasts obtained with the hybrid NARX-MLP-RF model in the testing stage for all investigated rivers. Figure 14 maps the evaluation metrics, as R²–MAPE and RMSE–MDA pairs, for increasing forecast horizons. The metrics are also reported in Table 5.
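The four metrics mapped in Fig. 14 can be sketched as below. This section does not give their formulas, so the standard forms are assumed: R² as the coefficient of determination (some studies instead use the squared Pearson correlation), RMSE in the units of flow (m³/s), MAPE in percent, and MDA as the percentage of time steps in which the forecast change relative to the previous observation has the same sign as the observed change. The helper `evaluation_metrics` is hypothetical.

```python
import math


def _sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def evaluation_metrics(obs, pred):
    """R², RMSE, MAPE (%) and MDA (%) for paired observed/predicted flows (sketch)."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination (assumed form)
    rmse = math.sqrt(ss_res / n)  # same units as flow, e.g. m³/s
    # MAPE assumes no zero observed flows
    mape = 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / n
    # MDA: forecast change vs. previous observation has the observed change's sign
    hits = sum(
        _sign(pred[t] - obs[t - 1]) == _sign(obs[t] - obs[t - 1]) for t in range(1, n)
    )
    mda = 100.0 * hits / (n - 1)
    return {"R2": r2, "RMSE": rmse, "MAPE": mape, "MDA": mda}


metrics = evaluation_metrics([10, 12, 11, 15], [11, 11, 12, 14])
```

In this made-up example every error is 1 m³/s, so RMSE is 1.0, and two of the three directional changes are matched, so MDA is about 66.7%. A model can therefore score well on R² or RMSE while missing the direction of change, which is why MDA complements the other metrics.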

The R² coefficient ranged from 0.77 to over 0.99 for the 1-day forecasts. R² decreased as the forecast horizon increased, in some cases dropping to about 0.7 for the 7-day forecasts. However, there is a marked regional difference. For rivers in the south of the UK, R² exceeded 0.8, reaching up to 0.95 even for 7-day forecasts, while for rivers in Scotland, particularly in the north-east, lower values of 0.77 and 0.7 were obtained for the 1-day and 7-day-ahead predictions, respectively. MAPE followed a trend consistent with the R² values, ranging from 1 to 26% and increasing with the forecast horizon.

The RMSE values were consistent with the R² maps, with lower values for the rivers of England and Wales, from about 4 m³/s to 18 m³/s, and higher values for Scotland. The increase in RMSE with forecast horizon was most pronounced in the northern UK, with RMSE up to about 40 m³/s for 7-day-ahead predictions. Nevertheless, many rivers in England and Wales retained RMSE values between 4 m³/s and 18 m³/s even for 7-day-ahead predictions. In addition, MDA values between 64 and 88% were calculated, showing a good ability of the forecasting model to capture the correct direction of change along the streamflow time series. MDA decreased slightly as the forecast horizon increased; however, values between 64 and 70% were observed only for rivers in central and north-east Scotland, where the lowest R² values were also obtained.

Overall, the hybrid NARX-MLP-RF model produced good predictions for all rivers and forecast horizons. However, its performance was highest for rivers with large basins and rainfall homogeneously distributed throughout the year, as observed for several English rivers, and lowest for rivers with smaller basins and less homogeneous rainfall, where sudden variations in streamflow make peak prediction more challenging.

To provide an overview of how model performance changes with the forecast horizon, the percentage increase in MAPE from the 1-day to the 7-day forecast horizon was analysed; the results are reported in Fig. 15.
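This section does not state whether the "percentage increase in MAPE" is relative to the 1-day value or a difference in percentage points. The sketch below assumes the relative form, 100 × (MAPE₇ − MAPE₁) / MAPE₁, computed per station; the function name, station labels and MAPE values are hypothetical.

```python
def mape_increase(mape_by_horizon):
    """Relative increase (%) in MAPE from the 1-day to the 7-day horizon per station.

    `mape_by_horizon` maps a station name to {horizon_in_days: MAPE in %}.
    Assumption: the increase is relative to the 1-day MAPE.
    """
    return {
        station: 100.0 * (m[7] - m[1]) / m[1]
        for station, m in mape_by_horizon.items()
    }


increases = mape_increase({"station_A": {1: 10.0, 3: 10.5, 7: 11.0},
                           "station_B": {1: 4.0, 7: 5.0}})
```

Under this assumption a station whose MAPE grows from 10% to 11% shows a 10% increase, whereas the percentage-point reading would give only 1 point; the choice matters when comparing stations with very different baseline MAPE.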

The NARX-MLP-RF ensemble model showed the lowest MAPE variations for most stations, followed by the NARX model; both showed MAPE variations of less than 10%. In contrast, MLP-RF showed more marked MAPE variations, with a maximum of 56% for the Tamar at Gunnislake. For some stations, however, MLP-RF also showed MAPE variations of less than 10%. For example, for the Test at Broadlands, its MAPE variation was 4.57%, although NARX and NARX-MLP-RF showed even lower variations at the same station, of 3.60% and 2.90%, respectively. An appreciable correlation was found between this increase in MAPE and the coefficient of variation (CV) of the flow time series (Fig. 15). The correlation is high for the NARX model (r = 0.82) and fairly high for the NARX-MLP-RF ensemble model (r = 0.72), but considerably lower for the MLP-RF hybrid model (r = 0.58).
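The correlation reported above can be sketched in two steps: compute the coefficient of variation (standard deviation divided by mean) of each station's flow series, then the Pearson correlation coefficient r between these CVs and the per-station MAPE increases. Whether the population or sample standard deviation was used is not stated in this section; the population form is assumed here, and the function names are hypothetical.

```python
import math


def coefficient_of_variation(series):
    """CV = standard deviation / mean (population standard deviation assumed)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return std / mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Usage, with hypothetical per-station inputs: `cvs = [coefficient_of_variation(f) for f in flows_by_station]` followed by `r = pearson_r(cvs, mape_increases)`. A high r indicates that stations with more variable flow tend to lose more accuracy at longer horizons; it does not by itself establish proportionality or causation.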

This result suggests that the loss of accuracy of the forecast models as the forecast horizon increases grows with the variability of streamflow over the time series, and that this loss is much less pronounced for the NARX model than for the MLP-RF hybrid. However, this aspect requires further investigation through dedicated studies.
