NVIDIA has announced a major milestone in scalable machine learning: XGBoost 3.0 can now train gradient-boosted decision tree (GBDT) models on datasets ranging from gigabytes up to 1 terabyte (TB) on a single GH200 Grace Hopper Superchip. This breakthrough lets businesses process immense datasets for applications such as fraud detection, credit risk modeling, and algorithmic trading, while simplifying the complex process of scaling machine learning (ML) pipelines.
Breaking the terabyte barrier
At the heart of this progress is the new External-Memory Quantile DMatrix (ExtMemQuantileDMatrix) in XGBoost 3.0. Traditionally, GPU training was capped by available GPU memory, limiting achievable dataset sizes or forcing teams onto complex multi-node frameworks. The new release leverages the Grace Hopper Superchip's coherent memory architecture and its ultra-fast 900 GB/s NVLink-C2C bandwidth, allowing compressed data to be streamed directly from host RAM to the GPU. This overcomes the bottlenecks and memory constraints that previously demanded servers with enormous RAM or large GPU clusters.
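As a concrete illustration, the sketch below shows XGBoost's external-memory interface: a DataIter subclass yields preprocessed batches from host RAM, and an ExtMemQuantileDMatrix consumes them. The Parquet file paths and the "label" column are hypothetical placeholders, and pandas is used for brevity; in a RAPIDS pipeline the batches would typically be cuDF or CuPy objects.

```python
import pandas as pd
import xgboost as xgb


class BatchIterator(xgb.DataIter):
    """Yields preprocessed batches from host RAM, one at a time."""

    def __init__(self, batch_paths):
        self._paths = batch_paths
        self._it = 0
        # cache_prefix tells XGBoost where to keep its cache files
        super().__init__(cache_prefix="cache")

    def next(self, input_data):
        if self._it == len(self._paths):
            return False  # no more batches in this pass
        df = pd.read_parquet(self._paths[self._it])  # hypothetical file layout
        input_data(data=df.drop(columns=["label"]), label=df["label"])
        self._it += 1
        return True

    def reset(self):
        # XGBoost calls this before each fresh pass over the data
        self._it = 0


it = BatchIterator([f"part-{i:03d}.parquet" for i in range(64)])  # illustrative
# Quantile-sketches the features once, then keeps compressed batches in
# host RAM and streams them to the GPU (over NVLink-C2C on GH200).
Xy = xgb.ExtMemQuantileDMatrix(it, max_bin=256)
```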
Real-World Benefits: Speed, Simplicity, and Cost Reduction
Institutions like the Royal Bank of Canada (RBC) have reported 16x faster model training and a 94% reduction in total cost of ownership (TCO) after moving their predictive analytics pipelines to GPU-powered XGBoost. This leap in efficiency is crucial for workflows with constant model tuning and rapidly changing data volumes, letting banks and other businesses scale their capabilities as data grows.
How it works: external memory meets XGBoost
The new external memory approach introduces several innovations.
- External-Memory Quantile DMatrix: pre-bins all features into quantile buckets, compresses the data in host RAM, and streams batches to the GPU as needed, preserving accuracy while reducing GPU memory load.
- Single-chip scalability: one GH200 Superchip with 80 GB of HBM3 GPU memory and 480 GB of LPDDR5X system memory can now process a complete terabyte-scale dataset.
- Simpler integration: for data science teams using RAPIDS, activating the new method is a simple drop-in that requires minimal code changes, as the sketch below illustrates.
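A minimal sketch of that drop-in quality: the training call is identical to ordinary in-memory XGBoost, with "device": "cuda" routing histogram building to the GPU while batches stream from host RAM. It reuses the Xy matrix built in the earlier sketch; the objective is an illustrative choice.

```python
# Identical to an in-memory training call; only the DMatrix construction
# differs. External memory requires the "hist" tree method.
params = {
    "tree_method": "hist",
    "device": "cuda",
    "objective": "binary:logistic",  # e.g., a fraud-detection label
}
booster = xgb.train(params, Xy, num_boost_round=500)
```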
Technical Best Practices
- Use grow_policy='depthwise' for tree construction to get the best performance with external memory; see the parameter sketch after this list.
- Run with CUDA 12.8+ and an HMM-enabled driver for full Grace Hopper support.
- Mind the data shape: the number of rows (labels) is the main limiter of scaling.
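Putting those practices together, a parameter set for external-memory training might look like the following sketch; everything other than grow_policy is an illustrative default to tune per workload.

```python
# A starting point reflecting the best practices above.
params = {
    "tree_method": "hist",       # required for the quantile-based path
    "device": "cuda",
    "grow_policy": "depthwise",  # preferred over "lossguide" with external memory
    "max_depth": 8,              # illustrative; tune for your data
    "max_bin": 256,              # fewer bins shrink the compressed cache
}
booster = xgb.train(params, Xy, num_boost_round=500)
```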
Additional Upgrades
Other highlights of XGBoost 3.0 include:
- Experimental support for distributed external memory across entire GPU clusters.
- Reduced memory requirements and initialization times, especially for mostly-dense data.
- Support for categorical features, quantile regression, and SHAP explainability in external-memory mode, as sketched below.
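As a sketch of those external-memory additions, categorical features and quantile regression reuse the same flags as the in-memory API. This assumes the streamed batches mark categorical columns with the pandas "category" dtype; the quantile levels are illustrative.

```python
# Categorical features: opt in when constructing the matrix, assuming the
# streamed batches carry pandas "category" dtypes for those columns.
Xy_cat = xgb.ExtMemQuantileDMatrix(it, max_bin=256, enable_categorical=True)

# Quantile regression: fit the 10th, 50th, and 90th percentiles in one model.
params = {
    "tree_method": "hist",
    "device": "cuda",
    "objective": "reg:quantileerror",
    "quantile_alpha": [0.1, 0.5, 0.9],
}
booster = xgb.train(params, Xy_cat, num_boost_round=200)
```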
Industry impact
By bringing terabyte-scale GBDT training to a single chip, NVIDIA democratizes access to large-scale machine learning for financial and enterprise users alike, paving the way for faster iteration, lower costs, and reduced IT complexity.
Together, XGBoost 3.0 and the Grace Hopper Superchip mark a major leap forward in scalable, accelerated machine learning.
Check out the technical details and the project's GitHub page for tutorials, code, and notebooks.

Michal Sutter is a data science professional with a Master's degree in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
