Global 30-m annual median vegetation height maps (2000–2022) based on ICESat-2 data and Machine Learning

ICESat-2 data preparation

The ICESat-2 satellite Lidar, in a near polar orbit since September 2018, aims to measure the height of forests and other ecosystems worldwide among other aims. Onboard, the Advanced Topographic Laser Altimeter System (ATLAS) splits a green laser pulse (532 nm) into six beams, divided into three pairs with an energy ratio between strong and weak lasers of approximately 4:1¹⁴. The small footprint size (11 m) and close along-track sampling distance (0.7 m) provided by the photon-counting sensor provide multiple discrete measurements within each 20 m segment that could not be attained through unmixing of larger footprint Lidar waveforms¹⁶. The near-polar orbit also covers the full expanse of temperate and polar climatic zones, which are often dominated by short vegetation.

The ICESat-2 ATL08 product produces summary statistics for 100 m segments, limited statistics for 20 m segments, and heights of individual photons together with classification into noise, terrain, canopy and top of canopy (Fig. 1). We downloaded the version 6 of the product for the entire world²³, and based on reported normalized height, we calculated additional metrics for each 20 m segment, including mean height, median height, 95th percentile height, the total number and fraction of photons reflected from vegetation. Information regarding the total number of photons per 20 m segment and the total number of signal photons were also included. Each segment was further defined by: time of ICESat-2 data acquisition, date of acquisition, and whether the selected segment was scanned using the strong or weak beam of the laser. Time of day was divided into night/day depending on the solar altitude angle. Lastly, all 20 m segments were converted to Parquet format resulting in 24.8 billion individual data points (Fig. 2) which are publicly accessible at https://zenodo.org/records/15198654.

Field calibration

To test the application of ICESat-2 satellite Lidar to short vegetation ecosystems we designed an initial field trial to compare ICESat-2 height metrics to field measurements within cultivated grassland (e.g. pasture) systems. The study was designed to include a variety of measured vegetation heights within areas consistently defined as cultivated grassland. Field measurements were conducted within the Rio Vermelho watershed in Goiás state in Brazil in October of 2023. Forty (40) sample points were selected from the 2947 ATL08 100m segments collected within the year prior to field work spanning September 2022 through September 2023 (Fig. 2).

A secondary goal was to assess filters including season, time of day, and laser strength. The selection of sample points was based on equal divisions by season of ICESat-2 overpass (wet versus dry season) and vegetation structure. The laser strength and time of day were also noted for each location. For each location sampled, pasture and shrub heights were measured and tree heights were estimated along four 90 m north-south parallel transects with 20 m separation. Field measurements were aligned with ATL08 20 m segments to identify which field measurements were closest to the ICESat-2 field of view. Field vegetation measurements greater than 10 cm in height within each segment were included to calculate the mean, median, 95th percentile and maximum height of vegetation. All metrics were compared using the RMSE and the concordance correlation coefficient²⁴. This sample showed the lowest RMSE and highest CCC for median height (3.3 m and 0.19) compared to other metrics tested. The field data are publicly available with additional details on sample design and alignment with ICESat-2 segments in a Zenodo repository²⁵.

ICESat-2 filtering and sampling design

After our data preparation, about 24.8 billion 20 m ICESat-2 segments collected between January 2019 and October 2022 were available. To improve the overall quality of our training process, these segments were filtered according to all following criteria, established according to our field calibration and existing literature:

Total number of photons between 3 and 60,
Number of signal photons less than or equal to 35²⁶,
Number of vegetation photons greater than or equal to 3, and
photons acquired only at night time and by the strong beam.

Aiming to derive a comprehensive set of training data in both space and time, we randomly selected five (5) 20-segments per month and per 1 × 1 degree tile. This procedure was implemented worldwide and repeated ten (10) times, producing ten independent sample sets with a total of 32 million training points (Fig. 2)²⁷.

Further filtering was conducted using the GLAD global land cover²⁸ product. This product was overlaid with ICESat-2 segments and height quantiles of ICEsat-2 segments were calculated for each land cover class. Segments with height outside the 1st – 99th percentile height were removed, a total of 655,336 samples. Segments identified within “Terra Firme, True Desert”, approximately 13% of samples, were assigned a median height of 0.001 m, the minimum height detectable in byte format.

Landsat cloud-free aggregates

The primary Earth Observation data input for our median vegetation height model was global-scale historical Landsat cloud-free aggregates at 30 m spatial resolution²⁹. Spanning from 1997 to 2022, this dataset further processes the GLAD Landsat Analysis Ready Data (ARD) version 2³⁰ for (i) removing cloud and cloud shadow pixels, (ii) aggregating all 16-day images into bi-monthly temporal composites, and (iii) filling the missing values/data gaps using a time-series reconstruction approach (i.e. Seasonally Weighted Average). Landsat cloud-free aggregates provide complete, consistent and cross-calibrated images from Landsat 5 through Landsat 9 per year for all Landsat spectral bands (i.e. blue, green, red, Near-infrared — NIR, Short-wave infrared 1 — SWIR1, Short-wave infrared 2 — SWIR2, and thermal). Additionally, we derived several key Landsat-based vegetation and water indices, including Bare Soil Index (BSI)³¹, Enhanced Vegetation Index (EVI)³², the Modified Normalized Burn Ratio (NBR2), also called Normalized Difference Tillage Index (NDTI)³³, the Normalized Difference Vegetation Index (NDVI)³⁴, the Normalized Difference Water Index (NDWI)³⁵ and the near-infrared reflectance of vegetation (NIRv)³⁶. Each index was calculated using distinct linear combinations of reflectance bands, providing information about vegetation health, moisture levels, burn severity, and overall ecological conditions. In addition to spectral indices, the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) was derived considering the correlation with NDVI³⁷. All Landsat-derived index formulas utilized in our modeling are summarized in Table 1. The inclusion of Landsat spectral bands and indices in our modeling framework was motivated by the strong correlation of this data with vegetation structure²⁸, photosynthesis³⁶ and soil characteristics³⁸.

Table 1 List of Landsat-derived indices used by our model.

Long-term MODIS aggregates

In addition to Landsat, we used coarser spatial resolution layers that have previously shown strong correlation with vegetation height³⁹, specifically MODIS Land Surface Temperature and Emissivity (LST&E) (MOD11A2⁴⁰) and Land Aerosol Optical Depth products (MCD19A2 –⁴¹). These products, available at 1 km spatial resolution, were processed to produce monthly long-term aggregates for daytime and nighttime surface temperatures⁴² and column water vapor above the ground⁴³. The aggregates were computed by estimating the median value of all monthly composite from 2000 to 2022, and resampled to 30-m by cubic spline⁴⁴, resulting in 36 input MODIS features (i.e. 12 layers per input variable). The inclusion of long-term MODIS aggregates in our modeling framework was motivated by the strong correlation of temperature and water fluxes with global distribution of biomes⁴⁵.

Static raster datasets

The elevation data and terrain derivatives used in the modeling were obtained from the Ensemble Digital Terrain Model (EDTM) of the world at 30 m resolution⁴⁶, which integrated multiple sources, including ALOS AW3D⁴⁷, GLO-30⁴⁸, MERIT DEM⁴⁹, and various national DTMs. EDTM was used to compute slope, hillshade, positive and negative openness, and geomorphons at 30 and 60 m spatial resolution, using Equi7 spatial reference system⁵⁰ and the software GrassGIS⁵¹. In total nine (9) input features were used to provide terrain characteristics to our model framework, variables that show strong correlation with the distribution of plant functional traits⁵².

Finally, we incorporated geometric temperature transformations, estimated as functions of latitude, day of the year, and elevation, following the methodology of Kilibarda⁵³. These transformations generate monthly estimates of minimum and maximum temperatures, adding 24 new input features. By incorporating Earth’s geometric and temporal dynamics, these variables enable the model to differentiate between locations with similar temperature patterns but distinct latitudinal positions or seasonal aspects. This distinction is especially useful for capturing regional temperature variations.

Sample splitting and spacetime overlay

To prepare the samples for our modeling approach, we (i) ran a spacetime overlay with all harmonized EO data and (ii) split them into training, calibration and testing sets. During the spacetime overlay, the pixel values of the Landsat aggregates were associated with each sample by matching the location (i.e. geographical coordinates) and the ICESat-2 acquisition year with 84 Landsat features in a specific year (i.e. seven reflectance bands and seven spectral indices for six bi-monthly periods). For long-term MODIS aggregates (i.e. 36 spatial layers) and static layers (i.e. 24 geometric temperature and nine DTM layers), the overlay considered only the sample locations, resulting in a total of 153 input features for the modeling.

Overlaid samples were split first into 90% for training/calibration, and 10% for testing, using as group strata a regular grid of 1 × 1 degree tiles. Globally distributed, the regular grid was used for ensuring that all samples inside a specific tile were assigned to either the training/calibration or testing set. Within the tiles with model training/calibration samples, we then selected 10% of samples (e.g. 2.7 million samples) for model optimization (i.e. Feature selection and hyper-parameter tuning). We kept the testing set completely isolated from model training and optimization, which allowed us to conduct an independent and unbiased technical validation of our models on a global scale (Fig. 3).

Spatiotemporal model training and optimization

Our modeling approach considered an ensemble of ten Gradient Boosted Tree (GBT) models, optimized and trained on about 29 million ICESat-2 samples (i.e. training and calibration set) using LightGBM⁵⁴. To increase the diversity among the models, we implemented a bootstrap strategy⁵⁵ and randomly selected a different set of initial hyper-parameters for each model. The final predicted value was estimated by averaging all individual model predictions. The ten predicted values were also used to derive a prediction interval of 90% probability (i.e. 5th and 95th percentiles) of median vegetation height.

Optimization ran for each model separately using the calibration set (10% of bootstrapped samples), first by selecting the most important features by Recursive Feature Elimination — RFE⁵⁶, and later searching/tuning the model hyper-parameters by Successive Halving — SH⁵⁷ to maximize model performance. RFE considered a standard Random Forest model with 100 trees, measurement of quality of split by Poisson criterion and default values for other hyper-parameters (fitted using scikit-learn⁵⁸). As most median vegetation heights are concentrated between 1 to 5 meters at global scale, the ML models need to address the very skewed distribution of the target variable, which requires the usage of Poisson criterion in the training⁵⁹. For each RFE iteration, the 7 least important features were removed according to gini importance, resulting in 40 features as the final selection (i.e. about 26% of the total number of features).

The selected features were then used to run SH using scikit-learn⁵⁸, for iteratively assessing different combinations of hyper-parameter candidates bounded by a customized search space. This assessment used an early stopping strategy (on half of the calibration set) and five-fold spatial blocking cross-validation (based on 1 × 1 degree tile) to minimize the risk of over-fitting⁶⁰. The implemented SH started with about 113,000 samples, selecting the best candidates (i.e. dropping half of the less accurate candidates) and doubling the number of samples per iteration until reaching the full size of the calibration set (i.e. about 2.7 million samples). We used the D² regression score metric (see equation (1)—⁶¹) to select the most accurate candidates, which is suitable for evaluating skewed distributions (e.g. Poisson distribution). Once the last iteration was done, the hyper-parameters with high D² were selected for each ML model. The optimization found ten sets of hyper-parameters, which were used to train the ten final models with 90% of the total number of samples (i.e. train and calibration set combined).

Model performance

For estimating the model performance of median vegetation height modeling, the testing set (i.e. 10% of samples, completely isolated tiles/group strata) was used to calculate the D² regression score (see equation (1)⁶¹), Root Mean Square Error (RMSE), R² and Concordance Correlation Coefficient (CCC) in transformed space (see equation (3)⁶²).

$${D}^{2}=1\,-\,\frac{{D}_{model}}{{D}_{null}}$$

(1)

$$D({y}_{i},{\widehat{y}}_{i})=\frac{1}{n}\mathop{\sum \,}\limits_{i=1\,}^{n}[{y}_{i}\log \frac{{y}_{i}}{{\widehat{y}}_{i}}\,-\,({y}_{i}\,-\,{\widehat{y}}_{i})]$$

(2)

$$CCC=\frac{2\cdot Cov\,({y}_{ln1},{\widehat{\widehat{y}}}_{ln1})}{Var({y}_{ln1})+Var({\widehat{\widehat{y}}}_{ln1})+{(\overline{{y}_{ln1}}-\overline{{\widehat{y}}_{ln1}})}^{2}}$$

(3)

where:

D_model = Mean Poisson deviance of the fitted model

D_null = Mean Poisson deviance of the null model, which predicts the mean $\bar{y}$ of the observed values.

$Cov(y,\widehat{y})$ = Covariance between the observed and predicted values

Var(y) = Variance of the observed values

$Var(\widehat{y})$ = Variance of the predicted values

y_i = Observed value

${\widehat{y}}_{i}$ = Predicted value

y_ln1 = Natural logarithm of 1 + observed value

${\widehat{\widehat{y}}}_{ln1}$ = Natural logarithm of 1 + predicted value

n = Number of samples

Global predictions and mosaicking

Global predictions were produced per 1 × 1 degree tile and on a yearly basis from 2000 to 2022, resulting in annual values of median vegetation height at 30 m spatial resolution. Although the ML modeling focused on open ecosystems, we extend predictions to all vegetated global land areas, excluding only pixels mapped as stable water according to UMD GLAD GLCLUC²⁸, an effort that enables a broader analysis of spatiotemporal predictions. The ten GBT models were compiled to a native C binary using lleaves⁶³, reducing the prediction time by a factor of 5. Per pixel prediction intervals (5th and 95th percentiles) were estimated through the individual predicted values retrieved from the final ensemble GBT models.

The entire processing workflow was executed on a High-Performance Computing (HPC) system, with tasks distributed across processing nodes using SLURM⁶⁴ and Docker containers⁶⁵. Generating the final predictions required approximately 90,300 CPU hours and 2.9 terabytes of RAM. The predicted tiles were then mosaicked into Cloud-Optimized GeoTIFF (COG) files and made publicly accessible via Google Earth Engine and SpatioTemporal Asset Catalog (STAC).

Comparison with existing products

The comparison with existing vegetation height products considered a sampling design previously used to acquire land cover reference samples for mapping grasslands worldwide, based on feature space coverage sampling and composed of 7,005 regular tiles (i.e. 1 × 1 km Global Pasture Watch (GPW) tiles)^4,66. We retrieved all ICESat-2 ATL08 data for these tiles, resulting in 3,576 globally distributed unique locations with reference median vegetation height values (including 103,549 ICESat-2 20 m segments), which were excluded from our ML modeling. These tiles were then aggregated by grasslands (60,078 segments in 1675 tiles), forested ecosystems (18,004 segments in 1304 tiles), and all land cover types to further explore the representation of our model at global and continental scales. Additionally, the independent satellite Lidar data collected by the GEDI instrument and aggregated to 1 km resolution⁹ was compared with our predicted height values. From all GEDI height metrics (i.e. based on the full energy waveform captured by the sensor), we selected the median and top of canopy (98th percentile height) for direct comparison with our modeling results. Finally, we compared the predicted height values with two products modeling top of tree canopy height, the ETH Global Sentinel-2 10 m Canopy Height (Sentinel Canopy Height)⁶⁷ and High Resolution Canopy Height Map by WRI and Meta (WRI/Meta Height Model)⁸, which have 10 m and 1 m spatial resolution, respectively. Additional details of the product version and Google Earth Engine (GEE) asset IDs used in this analysis are available in Table 2.

Table 2 Vegetation and Canopy Height Datasets.

Trend analysis

Aiming to identify hotspots of vegetation dynamics, related for example to deforestation, restoration and shrub encroachment, we ran a per-pixel trend analysis on all 23 years (2000–2022) of median vegetation height predictions. For each pixel, a linear regression was trained (between time and vegetation height) and a Mann-Kendall test was performed to detect monotonic trends on median vegetation height⁶⁸. The trend represents the average median height change in meters per year. All significant trend values (i.e.P < 0.05) were retained and mosaicked into Cloud-Optimized GeoTIFF (COG) files (Fig. 4). The final trend map is publicly accessible via Google Earth Engine and SpatioTemporal Asset Catalog (STAC).