FuXi Weather operates in a cycling analysis and forecasting mode, utilizing the full range of available satellite data. Because DA is inherently an ill-posed problem38,48 requiring background forecasts to improve analysis accuracy, we developed a variant of FuXi-DA without these forecasts to evaluate their contribution to the DA process. This variant, which relies exclusively on observations, represents a direct-from-observation prediction approach. Performance was assessed by comparing the accuracy of analysis fields and forecasts globally and in specific regions such as central Africa and northern South America, using ERA5 as the reference. The performance of FuXi Weather was compared with that of ECMWF HRES, which was evaluated using the time series of its 0-h lead time analysis, HRES-fc0 (see Section “Evaluation method”). This comparison inherently favors HRES at early lead times, since by definition it starts with a low root mean square error (RMSE) and a high anomaly correlation coefficient (ACC). Consistent with the common practices in the NWP community, FuXi Weather was also evaluated against its analyses. Statistical significance testing was conducted following the methodology outlined by Geer49. Single observation tests validated DA responses against theoretical expectations while data denial experiments (see Supplementary Information Section 5) evaluated the impact of excluding certain observations.
Global analysis fields
This subsection evaluates the performance of FuXi Weather analyses and 42-h FuXi forecasts (initialized with ERA5), using ERA5 as the reference. Figure 1 presents the globally-averaged and latitude-weighted RMSE for two FuXi Weather configurations: one incorporating background forecasts and one without. Performance varied markedly across variables and pressure levels. The RMSE of analysis fields relative to forecasts is higher at 850 hPa than at 300 and 500 hPa, likely owing to the lower information content of satellite observations at lower altitudes.
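The latitude weighting behind these globally averaged scores can be sketched as follows. This is a minimal illustration, assuming a regular latitude-longitude grid and cosine-of-latitude weights normalized to a mean of one; the function name `lat_weighted_rmse` and the toy fields are hypothetical and are not taken from the FuXi Weather code.

```python
import math

def lat_weighted_rmse(forecast, reference, lats_deg):
    """RMSE over a lat-lon grid with cos(latitude) weights.

    forecast, reference: lists of rows, one row per latitude.
    lats_deg: latitude of each row in degrees.
    Assumption: weights are cos(lat) normalized to a mean of one.
    """
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_lat) / len(cos_lat)
    weights = [c / mean_cos for c in cos_lat]

    total, count = 0.0, 0
    for w, f_row, r_row in zip(weights, forecast, reference):
        for f, r in zip(f_row, r_row):
            total += w * (f - r) ** 2
            count += 1
    return math.sqrt(total / count)

# Toy 3 x 2 grid: the same 2-unit error placed at 60N or at the equator.
lats = [60.0, 0.0, -60.0]
ref = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
err_high_lat = [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
err_equator = [[0.0, 0.0], [2.0, 2.0], [0.0, 0.0]]

rmse_high_lat = lat_weighted_rmse(err_high_lat, ref, lats)
rmse_equator = lat_weighted_rmse(err_equator, ref, lats)
```

In this toy case the same error contributes less to the score at 60° N than at the equator, because high-latitude grid cells cover less area.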

The time series shows the globally-averaged and latitude-weighted root mean square error (RMSE) relative to ERA5 for: the analysis fields of FuXi Weather with (solid red lines) and without (solid black lines) background (bg) forecasts, along with 42-h FuXi forecasts initialized using ERA5 (dashed blue lines). The comparison includes five variables: relative humidity (R), temperature (T), geopotential (Z), u component of wind (U), and v component of wind (V), at three pressure levels (300, 500, and 850 hPa). The five rows and three columns correspond to five variables and three pressure levels, respectively. To improve clarity, the original data are shown with reduced opacity, while solid lines represent smoothed values using a 12-point moving average. Both FuXi Weather analyses (black and red) and 42-h FuXi forecasts (blue) are evaluated against ERA5.
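The smoothing described in the caption can be illustrated with a short sketch. The caption specifies only a 12-point moving average; the choice below of returning values for full windows only (no padding at the edges) and the function name `moving_average` are assumptions made for illustration.

```python
def moving_average(values, window=12):
    """Moving average over full windows only.

    Assumption: no edge padding, so the output has
    len(values) - window + 1 points.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(values) < window:
        return []
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the new point, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed

# Toy series 1, 2, ..., 13 gives two full 12-point windows.
series = [float(i) for i in range(1, 14)]
smoothed_series = moving_average(series)
```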
For relative humidity (R), the analyses of FuXi Weather outperform forecasts at 300 and 500 hPa, but have slightly higher RMSE values at 850 hPa. For temperature (T), geopotential (Z), and wind components (U and V), the RMSE values are comparable to those of forecasts at higher altitudes but are consistently higher at 850 hPa. Although satellite data primarily capture temperature and moisture information, their assimilation also improves wind fields through the dynamic relationship between wind, temperature, and moisture. Wind can be inferred from temperature gradients (geostrophic balance) and the movement of atmospheric constituents, such as humidity, known as the “generalized tracer effect”23.
Incorporating background forecasts yields statistically significant improvements in the accuracy of FuXi Weather analysis fields, as demonstrated by systematically lower RMSE values. This highlights the crucial role of background forecasts in DA, which is ill-posed without prior information (as detailed in Supplementary Information Section 9). Both configurations of FuXi Weather show similar trends over time, but the analyses without background forecasts exhibit more pronounced error peaks, especially when some satellite data were missing (see Supplementary Figs. 1 and 2), underscoring the stabilizing effect of background forecasts.
The shaded area in Fig. 1 represents variations across initialization times; this is more pronounced in forecasts. Forecasts initialized at 00/12 UTC consistently outperform those at 06/18 UTC, likely because the 12-h observation windows of ERA5 (09-21 UTC and 21-09 UTC)43 provide 9 h of look-ahead time for 00/12 UTC but only 3 h for 06/18 UTC13. In contrast, the analysis fields of FuXi Weather demonstrate more consistent accuracy across initialization times, likely due to its fixed 8-h assimilation window and its use of cycled background fields initialized from previous analyses. Additional evaluations, including the analysis activity and mean bias error (MBE), are provided in Supplementary Information Section 6.
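The look-ahead arithmetic quoted above can be checked with a few lines of code. The sketch takes the two ERA5 window boundaries from the text (09-21 UTC and 21-09 UTC); the function name `look_ahead_hours` is hypothetical.

```python
def look_ahead_hours(init_hour, windows=((9, 21), (21, 9))):
    """Hours of observations available after init_hour within its window.

    windows: (start, end) hours in UTC; a window may wrap past midnight.
    """
    for start, end in windows:
        length = (end - start) % 24          # window length in hours
        offset = (init_hour - start) % 24    # hours since the window opened
        if offset < length:
            return length - offset
    raise ValueError("init_hour is not covered by any window")

look_ahead = {hour: look_ahead_hours(hour) for hour in (0, 6, 12, 18)}
```

For the four synoptic hours this returns 9 h for 00 and 12 UTC and 3 h for 06 and 18 UTC, matching the values stated above.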
Global weather forecasts
The primary criterion for evaluating an end-to-end weather forecasting system is its ability to provide reliable and accurate forecasts in a cycling analysis and forecasting mode. This subsection evaluates the performance of 6-h cycle forecasts generated by FuXi Weather, initialized using two types of FuXi-DA analysis fields: one incorporating background forecasts and one without. The forecasts are compared with those from ECMWF HRES.
Figure 2 shows the globally-averaged and latitude-weighted RMSE as a function of forecast lead time over 10 days. FuXi Weather forecasts are initialized using FuXi-DA analysis fields either with (red solid and green dashed lines) or without (black lines) background forecasts. Forecasts depicted by red and black lines are evaluated against ERA5, while the green dashed lines represent forecasts assessed against the FuXi-DA analyses. Statistically significant improvements in FuXi Weather forecasts (red lines) over ECMWF HRES are indicated by red dots, based on a t-test at the 95% confidence level. When validated against ERA5, FuXi Weather forecasts initialized with background-inclusive analyses (red lines) consistently demonstrate lower RMSE values than those without, aligning with results in Fig. 1. Regardless of the evaluation reference (ERA5 or FuXi-DA analyses), the performance gap between forecasts (red and green dashed lines) diminishes over lead time and becomes negligible by day 10.
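A paired significance test of the kind used for the red dots can be sketched as below. This is a simplified illustration only: it uses a normal approximation to the t distribution (reasonable only for large samples) and ignores the additional corrections, such as those for serially correlated scores, discussed by Geer49. The function names and the toy RMSE values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_t_statistic(scores_a, scores_b):
    """t statistic of the paired differences a - b."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two paired samples")
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("paired differences have zero variance")
    return mean(diffs) / (sd / math.sqrt(n))

def significantly_lower(scores_a, scores_b, confidence=0.95):
    """True if a is significantly lower than b in a two-sided test.

    Assumption: normal approximation to the critical value.
    """
    critical = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return paired_t_statistic(scores_a, scores_b) < -critical

# Toy RMSE samples at one lead time (one value per initialization time).
rmse_a = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 0.9]
rmse_b = [1.5, 1.4, 1.6, 1.5, 1.7, 1.3, 1.6, 1.4]
a_beats_b = significantly_lower(rmse_a, rmse_b)
b_beats_a = significantly_lower(rmse_b, rmse_a)
```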

The figure presents the globally-averaged and latitude-weighted root mean square error (RMSE) for forecasts generated by the FuXi model and ECMWF HRES (blue) in 10-day forecasts. FuXi forecasts are initialized using analysis fields produced by FuXi-DA with (red solid and green dashed lines) and without (black) background forecasts. The evaluation includes five variables: relative humidity (R), temperature (T), geopotential (Z), u component of wind (U), and v component of wind (V), at three pressure levels (300, 500, and 850 hPa). The five rows and three columns correspond to five variables and three pressure levels, respectively. FuXi forecasts (red and black lines) are verified against ERA5, and also against FuXi-DA analyses (green dashed lines). When FuXi (green dashed lines) and ECMWF HRES (blue) forecasts are evaluated against their respective initialization time series, they inherently exhibit lower RMSE at early lead times. Red dots indicate time steps where FuXi Weather significantly outperforms ECMWF HRES, based on the t-test at the 95% confidence level. The performance change on day 4 arises from the model transition from FuXi-Short to FuXi-Medium.
When evaluated against their respective analyses, both FuXi Weather and ECMWF HRES show small initial errors. Against ERA5, FuXi Weather initially shows higher RMSE values than ECMWF HRES, but outperforms ECMWF HRES after a lead time of 2–8 days, depending on the variable and pressure level. For R, FuXi Weather outperforms ECMWF HRES beyond lead times of 2.00, 3.25, and 2.25 days at 300, 500, and 850 hPa, respectively. For T, Z, U, and V, the critical lead times are later owing to the lower accuracy of their corresponding analysis fields. For Z, these times are 8.00, 7.75, and 7.50 days at 300, 500, and 850 hPa, respectively. The performance discontinuity on day 4 reflects the transition between FuXi-Short and FuXi-Medium forecast components.
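One way to extract such a critical lead time from two RMSE curves is sketched below. The definition used here (the first lead time from which one system has lower RMSE at every later lead time) is an assumption for illustration, as are the function name `crossover_lead_time` and the toy curves.

```python
def crossover_lead_time(lead_days, rmse_a, rmse_b):
    """First lead time from which a stays below b for all later leads.

    Returns None if a never ends up below b.
    """
    crossover = None
    for lead, a, b in zip(lead_days, rmse_a, rmse_b):
        if a < b:
            if crossover is None:
                crossover = lead
        else:
            crossover = None  # a lost its advantage; start over
    return crossover

# Toy curves: a starts with larger error but grows more slowly.
leads = [1.0, 2.0, 3.0, 4.0, 5.0]
curve_a = [1.2, 1.4, 1.6, 1.8, 2.0]
curve_b = [1.0, 1.3, 1.7, 2.0, 2.4]
critical_lead = crossover_lead_time(leads, curve_a, curve_b)
no_crossover = crossover_lead_time(leads, curve_b, curve_a)
```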
Figure 3 shows similar trends for the globally-averaged and latitude-weighted ACC. FuXi Weather forecasts initialized without background forecasts perform worse, as expected. However, FuXi Weather forecasts initialized with analyses incorporating background forecasts, though initially less accurate than ECMWF HRES, improve over time and eventually achieve higher ACC values across all examined variables. Using an ACC threshold of 0.6 to define a skillful forecast, Fig. 4 compares skillful lead times. FuXi Weather extends skillful lead times for 7 out of 15 variables, matching ECMWF HRES for 6 others. For example, for Z500, FuXi Weather extends the skillful lead time from the ECMWF HRES value of 9.25 days to 9.50 days for forecasts initialized with background forecasts (forecasts initialized without background forecasts show a skillful lead time of only 8.25 days). Additional forecast comparisons, including spatial RMSE distributions, are provided in the Supplementary Information Section 7.
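For reference, a weighted ACC can be computed as in the following sketch. It assumes the uncentered form of the ACC (anomalies are taken relative to climatology without removing their spatial mean) and flattened grids with one area weight per grid point; the function name and the toy values are hypothetical.

```python
import math

def anomaly_correlation(forecast, reference, climatology, weights):
    """Weighted anomaly correlation coefficient over flattened grid points.

    Assumption: uncentered ACC, with anomalies taken relative to climatology.
    """
    f_anom = [f - c for f, c in zip(forecast, climatology)]
    r_anom = [r - c for r, c in zip(reference, climatology)]
    cov = sum(w * f * r for w, f, r in zip(weights, f_anom, r_anom))
    var_f = sum(w * f * f for w, f in zip(weights, f_anom))
    var_r = sum(w * r * r for w, r in zip(weights, r_anom))
    return cov / math.sqrt(var_f * var_r)

# Toy example with three grid points.
clim = [10.0, 10.0, 10.0]
truth = [12.0, 9.0, 11.0]
area_weights = [0.5, 1.0, 0.5]
acc_perfect = anomaly_correlation(truth, truth, clim, area_weights)
acc_opposite = anomaly_correlation([8.0, 11.0, 9.0], truth, clim, area_weights)
```

A forecast identical to the reference gives an ACC of 1, and a forecast with anomalies of opposite sign gives -1.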

The figure presents the globally-averaged and latitude-weighted anomaly correlation coefficient (ACC) for forecasts generated by the FuXi model and ECMWF HRES in 10-day forecasts. FuXi forecasts are initialized using analysis fields produced by FuXi-DA with (red solid and green dashed lines) and without (black) background (bg) forecasts. The analysis includes five variables: relative humidity (R), temperature (T), geopotential (Z), u component of wind (U), and v component of wind (V), at three pressure levels (300, 500, and 850 hPa). FuXi forecasts (red and black lines) are verified against ERA5, and also against FuXi-DA analyses. The five rows and three columns correspond to five variables and three pressure levels, respectively. When FuXi (green dashed lines) and ECMWF HRES (blue) forecasts are evaluated against their respective initialization time series, they inherently exhibit higher ACC at early lead times. Red dots indicate time steps where FuXi Weather significantly outperforms ECMWF HRES, based on the t-test at the 95% confidence level. The performance change on day 4 arises from the model transition from FuXi-Short to FuXi-Medium.

Skillful forecast lead times of ECMWF HRES and FuXi Weather for five variables: relative humidity (R), temperature (T), geopotential (Z), u component of wind (U), and v component of wind (V), at three pressure levels (300, 500, and 850 hPa), using all testing data over a 1-year testing period, spanning July 03, 2023–June 30, 2024. The five rows and three columns correspond to five variables and three pressure levels, respectively.
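The skillful lead time can be read off an ACC curve with a simple threshold rule, sketched below. The 0.6 threshold is the one stated in the text; the rule that the skillful lead time is the last lead time before the ACC first drops below the threshold is an assumption, and the function name and ACC values are illustrative only.

```python
def skillful_lead_time(lead_days, acc_values, threshold=0.6):
    """Last lead time before the ACC first drops below the threshold.

    Returns None if the forecast is not skillful even at the first lead time.
    """
    last_skillful = None
    for lead, acc in zip(lead_days, acc_values):
        if acc < threshold:
            break
        last_skillful = lead
    return last_skillful

# Synthetic ACC curve that decays with lead time.
lead_days = [1.0, 2.0, 3.0, 4.0, 5.0]
acc_curve = [0.95, 0.85, 0.72, 0.61, 0.48]
skill_limit = skillful_lead_time(lead_days, acc_curve)
```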
Forecast performance in central Africa
Operational evaluations of NWP systems routinely assess both global and regional performance metrics16, covering geographical areas such as Europe, North America, East Asia, and Australia. However, forecast accuracy tends to be lower in low-income countries, largely due to limited investment in weather observation infrastructure. This issue is especially concerning for many low-income countries, where agriculture is a major economic sector that relies heavily on accurate weather forecasts. Climate change further exacerbates weather-related risks, disproportionately affecting vulnerable populations with low adaptive capacities in these countries. Therefore, improving forecast accuracy in underserved regions, especially Africa, is crucial for enhancing climate resilience50,51.
This subsection compares the performance of FuXi Weather and ECMWF HRES in underserved regions, with a particular focus on central Africa. Similar to Fig. 2, FuXi Weather forecasts are evaluated against both ERA5 (red lines) and its analyses (green dashed lines). Figure 5 illustrates that, when verified against their respective analyses, FuXi Weather (green dashed lines) consistently outperforms ECMWF HRES (blue lines) in forecasting the 850 hPa u wind component (U850), 2-meter temperature (T2M), and mean sea level pressure (MSLP) throughout the 10-day forecast period. When evaluated against ERA5, FuXi Weather (red lines) has a nontrivial initial error, but the magnitude and growth of this error are sufficiently modest that ECMWF HRES exhibits a larger error after two days, even though it is verified against its own analyses and therefore starts with inherently zero initial error. In particular, FuXi Weather (red lines) achieves lower RMSE and higher ACC, with ACC values for T2M consistently exceeding 0.6 across the 10-day forecasts, indicating meaningful predictive skill. In contrast, ECMWF HRES maintains skillful T2M forecasts for approximately two days.

Central Africa is defined as the region spanning 15° E to 35° E in longitude and 10° N to 10° S in latitude. Rows 1 and 2 show the root mean square error (RMSE) and anomaly correlation coefficient (ACC), respectively, for forecasts generated by FuXi Weather (red solid and green dashed lines) and ECMWF HRES (blue). FuXi Weather is initialized using analysis fields produced by FuXi-DA incorporating background forecasts. This figure includes three variables: 850 hPa u wind component (U850), 2-meter temperature (T2M), and mean sea level pressure (MSLP). FuXi forecasts (red) are verified against ERA5, and also against FuXi-DA analyses (green dashed lines). When FuXi (green dashed lines) and ECMWF HRES (blue) forecasts are evaluated against their respective initialization time series, they inherently exhibit lower RMSE and higher ACC at early lead times. Red dots indicate time steps where FuXi Weather significantly outperforms ECMWF HRES, with the paired differences passing the t-test at the 95% confidence level.
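The regional box defined in the caption can be applied to a grid as in the sketch below. The bounds are those given above; treating the boundaries as inclusive, the 5° toy grid, and the names `CENTRAL_AFRICA`, `in_region`, and `region_points` are assumptions made for illustration.

```python
# Bounds from the caption: 15E-35E and 10S-10N.
CENTRAL_AFRICA = {"lat_min": -10.0, "lat_max": 10.0,
                  "lon_min": 15.0, "lon_max": 35.0}

def in_region(lat, lon, region=CENTRAL_AFRICA):
    """True if (lat, lon) lies inside the box (boundaries inclusive).

    Longitudes are wrapped to [0, 360); this works here because the box
    does not cross the prime meridian.
    """
    lon = lon % 360.0
    return (region["lat_min"] <= lat <= region["lat_max"]
            and region["lon_min"] <= lon <= region["lon_max"])

def region_points(lats, lons, region=CENTRAL_AFRICA):
    """All (lat, lon) grid points that fall inside the region."""
    return [(lat, lon) for lat in lats for lon in lons
            if in_region(lat, lon, region)]

# Toy 5-degree global grid, for illustration only.
grid_lats = [float(v) for v in range(-90, 91, 5)]
grid_lons = [float(v) for v in range(0, 360, 5)]
selected = region_points(grid_lats, grid_lons)
```

Regional scores are then computed over the selected points only, with the same metrics as for the global evaluation.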
Forecast errors are further decomposed into systematic and random components by calculating the MBE and the standard deviation (std) of errors (STDERROR). Supplementary Fig. 22 reveals that FuXi Weather (red lines) exhibits both lower MBE and smaller STDERROR across all five evaluated variables: U850, 850 hPa temperature (T850), T2M, MSLP, and total precipitation (TP). These results suggest that FuXi Weather more effectively reduces both systematic bias and random errors compared to ECMWF HRES, contributing to its overall superior forecast performance. Improvements relative to HRES in TP forecasts are of note due to precipitation’s socioeconomic importance in central Africa, although with the caveat that HRES performance is relatively poor for TP in this region. Forecast behavior is further characterized using forecast activity40, defined as the std of forecast anomalies relative to climatological means and normalized by ECMWF HRES forecast activity. As shown in Supplementary Fig. 22, FuXi Weather normalized forecast activity values indeed drop below 1, suggesting smoother predictions relative to ECMWF HRES. This reduction in forecast activity may partially account for FuXi Weather’s improved performance. However, FuXi Weather’s superior forecast skill (red lines) over ECMWF HRES becomes evident as early as day 1, prior to any considerable reduction in forecast activity. The forecast activity of FuXi Weather decreases gradually until around day 2 and then stabilizes, indicating that FuXi Weather’s enhanced accuracy arises earlier than the substantial reduction in forecast activity and cannot be fully attributed to it.
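The decomposition and the normalized forecast activity can be illustrated with a short sketch. With the population standard deviation, the squared RMSE equals the squared MBE plus the squared STDERROR. The code below assumes unweighted samples and computes activity as the standard deviation of anomalies about their own mean; the function names and the toy values are hypothetical.

```python
import math
from statistics import fmean, pstdev

def decompose_errors(forecasts, references):
    """Return (MBE, STDERROR, RMSE) of the forecast errors."""
    errors = [f - r for f, r in zip(forecasts, references)]
    mbe = fmean(errors)              # systematic component
    std_error = pstdev(errors)       # random component (population std)
    rmse = math.sqrt(fmean([e * e for e in errors]))
    return mbe, std_error, rmse

def normalized_activity(forecasts, baseline_forecasts, climatology):
    """Std of forecast anomalies divided by that of a baseline system."""
    def activity(values):
        return pstdev([v - c for v, c in zip(values, climatology)])
    return activity(forecasts) / activity(baseline_forecasts)

# Toy samples: system_a is smoother than system_b.
climatology = [10.0, 10.0, 10.0, 10.0]
reference = [13.0, 7.0, 12.0, 9.0]
system_a = [12.0, 8.0, 11.0, 9.0]
system_b = [14.0, 6.0, 12.0, 8.0]

mbe, std_error, rmse = decompose_errors(system_a, reference)
activity_ratio = normalized_activity(system_a, system_b, climatology)
```

A ratio below 1, as in this toy case, indicates smoother forecasts than the baseline, which is why lower activity can reduce RMSE without reflecting a real gain in skill.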
Notably, FuXi Weather achieves superior forecasts for surface variables without assimilating surface-based observations, pointing to its strength in utilizing satellite data in regions with limited in-situ observational infrastructure. Further analysis (see Supplementary Information Section 7) reveals that FuXi Weather also outperforms ECMWF HRES in other data-sparse regions, such as tropical oceans and South America, although it is less competitive in areas with dense surface observations. In central Africa, where observational networks are sparse, the efficient use of satellite data by FuXi Weather closes the performance gap with ECMWF HRES, resulting in superior forecasts.
Supplementary Fig. 23 illustrates two 10-day forecast time series for two randomly selected initialization times, while Supplementary Fig. 24 presents forecasts at a fixed 3-day lead time. Both figures confirm that FuXi Weather more closely aligns with its benchmark than ECMWF HRES, reinforcing the results in Fig. 5. Additionally, Supplementary Fig. 25 shows FuXi Weather’s superior performance, particularly for T2M, MSLP, and TP over northern South America, where observational coverage is also sparse relative to Europe or North America. However, the reduction in forecast activity may partially contribute to these improvements. A detailed discussion on the trade-offs between forecast accuracy and activity is provided in Supplementary Information Section 12. While incorporating generative models or differentiable solvers for atmospheric dynamics could potentially enhance forecast activity without compromising accuracy18,52,53, an in-depth investigation of these approaches is beyond the scope of this study.
Due to substantial biases in TP data from ERA554, the Integrated Multi-satellite Retrievals for the Global Precipitation Measurement (GPM) (IMERG)55,56 is used to evaluate TP forecasts over central Africa and northern South America. As shown in Supplementary Fig. 26, FuXi Weather achieves lower RMSE than ECMWF HRES, relative to IMERG. However, both FuXi Weather and ECMWF HRES exhibit undesirably low ACC and substantial MBE when evaluated against IMERG. In FuXi Weather, this deficiency is likely inherited from its training with ERA5, underscoring the potential advantages of training with more accurate observational datasets, such as IMERG, to further improve FuXi Weather’s precipitation forecasts.
Overall, these preliminary results suggest that FuXi Weather can produce forecasts of comparable or potentially improved accuracy relative to traditional NWP systems, despite relying on substantially fewer observations. The superior performance of FuXi Weather relative to ECMWF HRES may be attributed to two primary factors: (1) enhanced ability to mitigate both systematic biases and random errors, and (2) reduced forecast activity. While further advancements, such as improving forecast activity, are necessary, FuXi Weather represents a promising and cost-effective alternative for regions with limited observational infrastructure. Future work will include further validation against independent observational datasets to better evaluate its performance advantages.
Physical consistency of analysis changes
FuXi Weather, as a data-driven machine learning system, does not inherently encode prior physical knowledge of atmospheric processes. This subsection examines the impact of assimilating a single observation on background fields and assesses whether the resulting changes align with theoretical expectations.
Two FuXi-DA runs were conducted: the first using a 6-h forecast with original observations, and the second with a perturbation introduced to raw satellite data from individual channels at a specific observation location. The differences between these two runs reflected the changes in analysis fields caused by the perturbation (details in Supplementary Information Section 4.1). The first run, initialized at 06 UTC on July 24, 2023, assimilated all available data to generate the analysis. In the second run, a +5 K perturbation was introduced into the NOAA-20 ATMS raw observation at 19.9° N, 125.5° E (marked as a purple dot in Supplementary Fig. 9), near Typhoon Doksuri over the ocean. The impact of this perturbation was evaluated by comparing outputs from both runs. The satellite observations were independently perturbed for each channel.
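The logic of this two-run comparison can be sketched generically. In the code below, `run_da` stands for any assimilation routine; the `toy_da` function is a deliberately trivial stand-in (a nudging scheme with a fixed gain) and is not FuXi-DA. All names and values are hypothetical, apart from the +5 K perturbation size taken from the text.

```python
def analysis_change(run_da, background, observations, obs_index, delta):
    """Analysis difference caused by perturbing one observation by delta."""
    control = run_da(background, observations)
    perturbed_obs = list(observations)
    perturbed_obs[obs_index] += delta        # perturb a single observation
    perturbed = run_da(background, perturbed_obs)
    return [p - c for p, c in zip(perturbed, control)]

def toy_da(background, observations, gain=0.5):
    """Stand-in DA (NOT FuXi-DA): nudge each point toward its observation."""
    return [b + gain * (o - b) for b, o in zip(background, observations)]

# Three co-located background values and observations (K).
background = [280.0, 281.0, 282.0]
observations = [280.5, 281.5, 282.5]
change = analysis_change(toy_da, background, observations,
                         obs_index=1, delta=5.0)
```

In this stand-in the response is confined to the perturbed point. In a real DA system, the spatial spread of the response is what reveals the structure functions and flow dependence examined in this subsection.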
Figure 6 shows the horizontal and vertical distributions of changes in the analysis fields resulting from three separate perturbations, each applied to a different humidity channel. The spatial patterns of these changes in analysis fields aligned with radiative transfer theory: an increase in brightness temperature corresponds to a decrease in humidity, resulting in less radiation absorption57. The vertical distribution showed progressive increases in the peak heights of the Jacobian functions for channels 18, 19, and 20, matched by corresponding increases in the peak heights of the humidity increments. This pattern suggests that the DA system effectively captures the varying detection altitudes of these channels. Additionally, flow-dependent characteristics were observed in the humidity field. The perturbation introduced at 05 UTC, 1 h before the analysis, generated changes in analysis fields mainly localized near the perturbation location, with a moderate eastward extension along the prevailing flow, consistent with downwind propagation. Supplementary Fig. 10 illustrates the changes in wind vector analysis fields, overlaid with relative humidity analysis fields. The perturbation results in increased northerly flow near the perturbed location. This change enhances the advection of drier air, characterized by lower relative humidity, into a more humid region. Consequently, the perturbation leads to a localized reduction in relative humidity, consistent with the results shown in Fig. 6.

The perturbation, located over the ocean near Typhoon Doksuri at 19.9° N, 125.5° E (red dot), is introduced at 05 UTC, 1 h before the analysis time. The two rows show, in the left panel, the horizontal spatial distribution of the analysis changes for channels 18–20 at 600, 500, and 400 hPa, with wind fields overlaid, as well as the corresponding vertical distribution along the same west-east cross-section. The dashed lines in the second row indicate the pressure levels for the horizontal spatial distribution. The right panel shows the Jacobian functions for three humidity channels derived from ATMS aboard NOAA-20. The atmospheric profile is based on the US Standard Atmosphere, and radiative transfer calculations are performed using RTTOV version 13.2. In the wind vector plots, a long barb represents 4 m/s, a short barb 2 m/s, and a pennant indicates 20 m/s.
In summary, FuXi Weather effectively captures the horizontal and vertical dependencies of analysis changes on satellite observations without explicitly incorporating prior knowledge. Data denial experiments (Supplementary Information Section 5) further confirm FuXi Weather’s physical consistency with satellite observations, while additional tests demonstrate the robustness of its performance.
