Forecasting onshore wind generation in European bidding zones using deep learning

Two data sources are used to develop our models. The first is onshore wind generation data per BZ, retrieved from the ENTSO-E Transparency Platform API. The second is weather data, used as input to the model, sourced from Open-Meteo. The data covers five calendar years, from 2020-01-01 to 2024-12-31, at an hourly resolution. BZs have a maximum of 43,844 observations, with the minimum in this dataset being 42,004 in NL; combined, the dataset contains 36 BZs and 1.577 million observations. We then clean the data by removing NaNs, dropping 6,131 values, and by removing periods of 6 hours or more in which generation was registered as zero or static, dropping a further 2,238 values. We also remove extreme low-generation outliers where wind speed is recorded as above the mean BZ wind speed and BZ WEP output is less than 1% of installed capacity, which removed 1,805 observations. In total, less than 1% of the data was removed, leaving a final dataset of 1.567 million observations used for training, validation, and testing.
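The NaN and static-period filters described above can be sketched in pandas as follows. This is a minimal illustration, not the released code: the function name `clean_generation` is hypothetical, and detecting static periods as runs of identical consecutive values (which also covers runs of zeros) is an assumption about how "zero or static" was defined.

```python
import numpy as np
import pandas as pd


def clean_generation(gen: pd.Series, min_run: int = 6) -> pd.Series:
    """Drop NaNs and any run of >= min_run consecutive identical values.

    Assumes `gen` is an hourly series for one BZ, in chronological order.
    """
    # A new run starts whenever the value differs from the previous hour.
    # NaN never equals NaN, so each NaN forms its own run of length 1.
    run_id = gen.ne(gen.shift()).cumsum()
    # Length of the run that each observation belongs to.
    run_len = run_id.map(run_id.value_counts())
    keep = gen.notna() & (run_len < min_run)
    return gen[keep]
```

Runs are identified before the NaNs are dropped, so a gap in the data does not join two short constant stretches into one long run.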

As noted in previous research^13,14, changes in installed capacity over time can cause incorrect forecasts. We account for these changes using a ‘load factor’, normalising MW generation by installed capacity. This maps generation from all BZs to the same scale, since raw BZ WEP varies with installed capacity, which ranges from 200 MW in LU to 30,000 MW in ES. It also serves to de-trend generation data. Periods in which normalised generation was greater than 1 were set to the mean of the day, treating 3,708 values.
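A minimal sketch of the load-factor step, assuming hourly pandas series with a `DatetimeIndex`. The function name `load_factor` is hypothetical, and computing the daily mean only from values at or below 1 is an assumption, since the text does not say whether the invalid hours contribute to that mean.

```python
import pandas as pd


def load_factor(gen_mw: pd.Series, capacity_mw: pd.Series) -> pd.Series:
    """Normalise MW generation by installed capacity for one BZ."""
    lf = gen_mw / capacity_mw
    # Daily mean computed from physically plausible values only (assumption).
    valid = lf.where(lf <= 1.0)
    daily_mean = valid.groupby(valid.index.floor("D")).transform("mean")
    # Replace load factors above 1 with the mean of the same day.
    return lf.mask(lf > 1.0, daily_mean)
```

For example, a BZ with 100 MW of capacity that reports 150 MW in one hour of a day otherwise at 50 MW would have that hour set to a load factor of 0.5.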

Finally, the set of BZs is heterogeneous in geographic size and windfarm layout, so features that help control for these differences were created. First, however, the details of the neural network architecture used for the 36-BZ RNN and the individual RNNs are provided. The model is provided as open source, and we hope that other researchers will build on the methodology developed here.

Model Architecture

As our data is time-series data, RNN architectures as in^4,26 are used here. When making predictions the model outputs a full day of 24 timesteps in one sequence. We use the same architecture in our individual BZ RNNs in Experiment 2. Full model details are provided in Table 1. The 36-BZ model is significantly larger than the individual BZ models because more training data allows for a larger network to be trained without overfitting. The model is also released open source and can be accessed at our GitHub repository²⁷.

Table 1 Comparison of individual BZ RNNs and 36-BZ RNN architectures

ENTSO-E Bidding Zone Wind Generation Data

ENTSO-E’s Transparency Platform has been referred to as ‘possibly the most ambitious platform for power system data globally’²⁸. The EU requires that market actors, such as generators, retailers and traders, publish data on the platform to reduce insider trading and increase transparency. It contains generation data for all fuel types across ENTSO-E members, from hydro to nuclear, although this study uses only onshore WEP data. ENTSO-E has 36 member countries and 40 TSOs, which are further divided into 49 BZs. Although it is one BZ, Germany has 4 ‘control zones’, each managed by a separate TSO: Amprion, TenneT, TransnetBW and 50Hertz. As Germany is the largest onshore wind producer in Europe, it was divided into its control zones, which are treated as separate BZs in our dataset, similar to other multi-BZ countries.

BZ-level hourly wind generation data, in MW, was first retrieved from the API of ENTSO-E’s open Transparency Platform. BZs with greater than 5% missing values were excluded. The capacity data for the remaining BZs was then retrieved from ENTSO-E. BZs with less than 150 MW of capacity or too few windfarms were also removed, as these BZs exhibited highly dissimilar production patterns and more frequent data errors than higher-capacity BZs. The following BZs were removed: CH, NO4, NO5, LV, SI, IS, UA, SK, RS, BK, AL, MK, IT SUD, IT NORD, and IT CNOR. A complete list of the BZs, their codes, full names and capacities can be seen in Fig. 5.

Official capacity data were not available from ENTSO-E for all BZs; the Swedish and Italian BZs needed their capacity estimated. Capacity for these BZs was estimated using capacity-utilisation statistics from BZs with known values. Specifically, the mean and 95th-percentile utilisation levels derived from BZs with reported capacity were applied to estimate the missing capacities.

Previous research notes that changes in installed capacity over time need to be accounted for by processing generation data as a load factor^13,14. Figure 5 shows changes in installed capacity for each BZ over the study period, during which total capacity grew by approximately 28.1%, from 157,391 MW to 201,598 MW. All BZs except DK2 and IT SD experience positive or flat capacity growth. The largest growth occurs in FI, where capacity increases by 226%, while 4 BZs (HU, CZ, BG, and DE BW) experience no or negligible change. The difference in capacity in 2024 between the smallest BZ, LU, and the largest, ES, is approximately 150-fold, showing large variations in system size across BZs. Official capacity data for BZs marked [ES] was not available from ENTSO-E and needed to be estimated. We did this by computing, for each year, the 95th quantile of production as a fraction of installed capacity (i.e. a load factor) for the 28 BZs where capacity data was available, then using these values to estimate the capacity of the 8 BZs where capacity data was not provided.
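One way to read the 95th-quantile estimation is sketched below. It is an illustration under stated assumptions, not the published procedure: the function names are hypothetical, the reference utilisation is taken as the simple mean of the per-BZ ratios, and the role of the mean utilisation level mentioned earlier is left out because the text does not specify how the two statistics were combined.

```python
import numpy as np


def reference_utilisation(gen_by_bz, cap_by_bz, q=0.95):
    """Mean across known BZs of (q-quantile of generation) / capacity.

    gen_by_bz: dict of BZ code -> array of hourly MW generation for one year.
    cap_by_bz: dict of BZ code -> installed capacity in MW for that year.
    """
    ratios = [np.quantile(gen_by_bz[bz], q) / cap_by_bz[bz] for bz in cap_by_bz]
    return float(np.mean(ratios))


def estimate_capacity(gen_mw, ref_util, q=0.95):
    """Capacity (MW) implied by a BZ's q-quantile generation and ref_util."""
    return float(np.quantile(gen_mw, q) / ref_util)
```

If BZs with reported capacity typically reach 76% of capacity at their 95th quantile of production, a BZ whose 95th-quantile output is 47.5 MW would be assigned 47.5 / 0.76 = 62.5 MW.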

To ensure consistency in scale between BZs and de-trend generation data, MW generation data was normalised by installed capacity. To limit abrupt changes in normalised generation stemming from our normalisation process, within and between years, the changes in capacity were interpolated through each year. Year-to-year changes are distributed equally across steps applied on the first of each month if yearly capacity growth is less than 20%, and across twice-monthly steps if it is greater than 20%. We then adjust capacity values in each year to stabilise the normalised generation distributions between the training, validation, and testing datasets. This process is shown in Fig. 6 for FI, which saw the largest capacity growth.
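The stepwise interpolation of capacity can be illustrated with a short sketch. Several details are assumptions, because the text does not fix them: the function name `capacity_steps`, the use of the 15th as the second monthly step date, applying the first increment on 1 January, and treating the threshold on the absolute change so that shrinking BZs are handled the same way.

```python
from datetime import date


def capacity_steps(year, cap_start, cap_end, threshold=0.20):
    """Spread a yearly capacity change over equal steps within the year.

    Returns a list of (step date, capacity in MW from that date onwards).
    """
    growth = (cap_end - cap_start) / cap_start
    # Monthly steps for moderate change, twice-monthly steps for large change.
    days = (1, 15) if abs(growth) > threshold else (1,)
    dates = [date(year, month, day) for month in range(1, 13) for day in days]
    increment = (cap_end - cap_start) / len(dates)
    return [(d, cap_start + increment * (i + 1)) for i, d in enumerate(dates)]
```

A BZ growing from 1,000 MW to 1,120 MW (12%) would receive twelve steps of 10 MW, while one growing to 1,480 MW (48%) would receive twenty-four steps of 20 MW.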

**Fig. 6: Load Factor Normalisation Process.**

Weather Data and Weather Data Processing – Open-Meteo

ENTSO-E WEP data characterises a BZ’s total generation from all windfarms. Some BZs cover substantial geographic areas, and several comprise entire countries, e.g. ES, GR, and FR. Based on queried infrastructure data, BZs have as many as 1,429 or as few as 18 windfarms. To accurately represent weather conditions within each BZ, without introducing noise as seen in NWP approaches^4,14,21, windfarm locations across Europe were identified. Finding windfarm coordinates at this scale required first extracting turbine positions from OpenStreetMap²⁹, then clustering turbines into farms, defined as groups of at least 3 turbines within a 2 km linkage distance of one another. Once farm coordinates had been identified, the Open-Meteo API was called with the cluster centroid coordinates of each windfarm to retrieve the relevant weather features. Open-Meteo datasets provide weather data grids down to a 9 km spatial and hourly temporal resolution, based on interpolated weather station and satellite data.
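The turbine-to-farm clustering can be sketched with a simple single-linkage grouping. This is an illustrative reading of "2 km linked distance" using only the standard library: turbines are joined whenever any pair is within 2 km, so chains of turbines form one farm. The function names, the great-circle distance, and the unweighted centroid are assumptions, and the pairwise loop is only practical for small inputs.

```python
import math
from collections import defaultdict


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cluster_farms(turbines, link_km=2.0, min_turbines=3):
    """Group turbines into farms by single linkage; return farm centroids."""
    parent = list(range(len(turbines)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of turbines closer than the linkage distance.
    for i in range(len(turbines)):
        for j in range(i + 1, len(turbines)):
            if haversine_km(turbines[i], turbines[j]) <= link_km:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, turbine in enumerate(turbines):
        groups[find(i)].append(turbine)

    farms = []
    for members in groups.values():
        if len(members) >= min_turbines:  # smaller groups are not farms
            lat = sum(m[0] for m in members) / len(members)
            lon = sum(m[1] for m in members) / len(members)
            farms.append({"centroid": (lat, lon), "n_turbines": len(members)})
    return farms
```

The centroid of each returned farm is the coordinate that would then be passed to the weather query.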

The weather variables collected are summarised in Table 2. In addition, the variable wind power density (WPD) was created, which characterises the retrievable power from atmospheric conditions and is derived from wind speed v, surface pressure p, and temperature T (in K), together with the specific gas constant for dry air R_air¹⁴. The WPD formulation and the expression for air density are given by equations (1) and (2) respectively:

$${\rm{WPD}}=\frac{1}{2}\,\rho \,{v}^{3}$$

(1)

$$\rho =\frac{p}{{R}_{{\rm{air}}}\,T}$$

(2)
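Equations (1) and (2) translate directly into code. The sketch below assumes SI inputs (pressure in Pa, temperature in K, wind speed in m/s), so raw data in other units such as hPa, °C or km/h would need converting first. The value of `R_AIR` is the commonly used figure for dry air and is not taken from the source.

```python
R_AIR = 287.05  # specific gas constant for dry air, J/(kg*K) (assumed value)


def air_density(pressure_pa, temperature_k):
    """Equation (2): rho = p / (R_air * T), in kg/m^3."""
    return pressure_pa / (R_AIR * temperature_k)


def wind_power_density(wind_speed_ms, pressure_pa, temperature_k):
    """Equation (1): WPD = 0.5 * rho * v^3, in W/m^2."""
    rho = air_density(pressure_pa, temperature_k)
    return 0.5 * rho * wind_speed_ms ** 3
```

At standard sea-level conditions (101,325 Pa and 288.15 K) the density is about 1.225 kg/m^3, so a 10 m/s wind gives a WPD of roughly 612.5 W/m^2. The cubic dependence on v means that doubling wind speed raises WPD eightfold.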

Table 2 Included Forecasting Features

As each BZ contains a different number of windfarms, our weather data query returns a varying number of time series per BZ. Training an RNN on the combined data of 36 BZs requires consistent feature representations between BZs, so the weather data of all windfarms was aggregated by taking the mean, median, and standard deviation of each variable over a BZ’s farms for each hour. This creates a uniform set of features that are informationally consistent across BZs, rather than a number of features that depends on BZ system topology and capacity, resulting in 20 weather features for each BZ at each hour, summarised in Table 2. Additionally, the largest and second-largest farms by turbine count in each BZ were identified, and the wind speed and wind gust speed at these farms were included. 1,805 instances were dropped where load-factor normalised generation was less than 0.05 while mean BZ wind speed was equal to or greater than 30 km/h.
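The per-hour aggregation over farms can be sketched with a pandas group-by. The long-format layout (one row per farm per hour, with a `time` column), the function name, and the output column naming are assumptions made for illustration. Note that pandas computes the sample standard deviation by default, which may differ from the convention used in the study.

```python
import pandas as pd


def aggregate_bz_weather(farm_weather: pd.DataFrame, variables: list) -> pd.DataFrame:
    """Collapse per-farm weather rows into one feature row per hour.

    farm_weather: one row per (time, farm) for a single BZ.
    variables: names of the weather columns to aggregate.
    """
    agg = farm_weather.groupby("time")[variables].agg(["mean", "median", "std"])
    # Flatten the (variable, statistic) column MultiIndex, e.g. wind_speed_mean.
    agg.columns = [f"{var}_{stat}" for var, stat in agg.columns]
    return agg
```

Whatever the number of farms in a BZ, the result has three columns per weather variable, which is what allows a single network to be trained across all BZs.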

Figure 7 shows the mean, minimum, and maximum wind speed for each BZ’s mean hourly wind speed time series. The highest mean wind speeds are located in BZs situated close to the coast, showing they benefit from coastal wind patterns. These are DK1, DK2, NL, and IE. A similar pattern is seen for the maximum wind speed, whereas minimum wind speeds are seen in larger BZs such as FR and ES.

**Fig. 7: BZ Wind Speed, Mean, Minimum, and Maximum Map.**

Historical weather conditions are used as a proxy for forecasts in this study: historical values are shifted to simulate forecasts. Modern weather forecasts are highly accurate, typically within 3 m/s for wind speed and 1 °C for temperature over a 24-hour horizon, depending on terrain^30,31. The shifted historical values therefore provide similarly informative features for the prediction horizon. Historical weather conditions alone are insufficient to forecast WEP for the day ahead, given insufficiently strong autocorrelation⁷, and would not allow for representative testing of day-ahead forecasting for our DNNs.

System Descriptor Features and Total Features Summary

Geographic and system variations affect regional WEP and forecasting accuracy³², motivating the inclusion of engineered spatial and system features to contextualise weather conditions within and between BZs. These features include min-max normalised capacity over time and the average nearest neighbour distance, i.e. the mean distance (in km) between farms in a BZ. As shown in Fig. 8, turbines can be heavily concentrated in a few farms as capacity decreases, leading to mischaracterisation when spatially aggregated features fail to capture conditions at high-output locations. To address this, wind and gust speeds at the two largest farms are included as additional input features, capturing deviations from the regional average during periods when dominant farms diverge from prevailing conditions. Finally, the turbine Gini coefficient was computed, measuring how evenly turbines are distributed across farms, as a gauge of how capacity is spread across a BZ’s windfarms.
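Two of these descriptors can be sketched as follows. Both are illustrative: the text does not give formulas, so the standard sorted-rank form of the Gini coefficient is assumed, and the average nearest neighbour distance is assumed to be the mean great-circle distance from each farm to its closest other farm. Function names are hypothetical.

```python
import math


def gini(turbine_counts):
    """Gini coefficient of turbines per farm (0 means perfectly even)."""
    x = sorted(turbine_counts)
    n, total = len(x), sum(x)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted counts, with ranks starting at 1.
    weighted = sum((2 * rank - n - 1) * v for rank, v in enumerate(x, start=1))
    return weighted / (n * total)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def mean_nearest_neighbour_km(farms):
    """Mean distance from each farm centroid to its nearest other farm."""
    if len(farms) < 2:
        return 0.0
    nearest = [
        min(haversine_km(a, b) for j, b in enumerate(farms) if j != i)
        for i, a in enumerate(farms)
    ]
    return sum(nearest) / len(nearest)
```

A BZ with four farms of five turbines each has a Gini coefficient of 0, whereas two farms with 1 and 3 turbines give 0.25. Higher values indicate that a few farms hold most of the turbines.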

Table 2 shows the full set of 31 features employed for all RNNs, the ARIMAX model, and the Random Forest model used in this study, comprising 20 weather variables and 11 system-related features. Combined, these features give the RNNs a comprehensive overview of weather conditions and help the models handle the highly heterogeneous aspects of BZ WEP forecasting.
