Automatic differentiation, a method for computing the gradients of functions, has surged in importance, driven by advances in both machine learning and scientific computing. Afif Boudaoud, Alexandru Calotoiu, Marcin Copik, and Torsten Hoefler, all from ETH Zurich, present a new system, DACE AD, that addresses key limitations of existing approaches to this critical computation. Current automatic differentiation frameworks often require code changes, struggle to meet the performance requirements of scientific computing, need excessive memory, and force scientists to derive the gradients of complex problems by hand. DACE AD aims to overcome these challenges with a new optimization algorithm, delivering an average speedup of more than 92x over JAX on standard benchmarks without code changes, and could accelerate progress across a wide range of computing fields.
Automatic differentiation of large-scale scientific models
This paper explores methods for enabling and optimizing automatic differentiation (AD) in large-scale scientific computing applications, making it practical and efficient for the complex models and simulations used in areas such as weather forecasting, climate modeling, and machine learning. The study investigates techniques such as checkpointing and rematerialization to shrink the memory footprint, along with memory-efficient algorithms that reduce overall memory requirements. It also addresses how to parallelize AD computations on modern hardware such as GPUs and multi-core CPUs, and how compiler techniques, including source-to-source transformation and expression optimization, can automate and optimize AD. Unlike current systems that require substantial code changes or are restricted to specific programming languages, DACE AD works without rewriting code and supports programs written in Python, PyTorch, ONNX, and Fortran. The core innovation is a new way to balance the trade-off between storing data and recomputing it during gradient calculation. The researchers implement this balance with a checkpointing technique based on Integer Linear Programming (ILP), which automatically determines which intermediate values to store and which to recompute so as to maximize performance within a given memory limit.
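The store-versus-recompute decision described above can be illustrated with a toy model. The sketch below is not the paper's ILP formulation: it assumes each intermediate value has a fixed size and an independent recomputation cost, and it uses exhaustive search in place of an ILP solver. The names `INTERMEDIATES` and `choose_stored` and all the numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical intermediates: name -> (size_in_bytes, cost_to_recompute).
# The numbers are illustrative only and do not come from the paper.
INTERMEDIATES = {
    "a": (400, 5.0),
    "b": (300, 4.0),
    "c": (200, 1.0),
    "d": (100, 2.5),
}

def choose_stored(intermediates, memory_limit):
    """Pick the intermediates to store so that total recomputation cost
    is minimal while the stored sizes fit within memory_limit.

    This is a 0/1 selection problem. Exhaustive search is used here for
    clarity; an ILP solver would be needed at realistic problem sizes.
    """
    names = sorted(intermediates)
    best_cost = float("inf")
    best_set = frozenset()
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            used = sum(intermediates[n][0] for n in subset)
            if used > memory_limit:
                continue  # this choice does not fit in memory
            # Everything that is not stored must be recomputed later.
            cost = sum(intermediates[n][1] for n in names if n not in subset)
            if cost < best_cost:
                best_cost = cost
                best_set = frozenset(subset)
    return best_set, best_cost

stored, recompute_cost = choose_stored(INTERMEDIATES, memory_limit=500)
print(sorted(stored), recompute_cost)
```

Raising `memory_limit` lets more values be stored and lowers the recomputation cost, while a limit of zero forces every value to be recomputed. A real formulation would also have to account for dependencies between recomputations, which this sketch ignores.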
This approach leverages DACE's Stateful Dataflow Multigraph (SDFG) intermediate representation, which makes data movement easy to track and supports the data flow analysis that efficient gradient computation depends on. The researchers integrated critical computation subgraphs into the SDFG, enabling the construction of backward passes and the application of automatic differentiation to both sequential and parallel loops. Experiments show that DACE AD significantly outperforms the state-of-the-art JAX framework on the NPBench suite of high-performance computing benchmarks, with an average speedup of 92x and a geometric-mean speedup of 4.1x. The system overcomes limitations of existing tools, which typically support only a narrow range of programming languages and often require considerable code changes. DACE AD requires zero code rewrites, which it achieves through a new approach that streamlines the integration of diverse computational models and balances memory usage against recomputation.
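To make the idea of a backward pass through a sequential loop concrete, here is a minimal hand-written sketch in plain Python. It does not use DACE or SDFGs, and the recurrence `x <- w * sin(x)` and the function names `forward` and `backward` are assumptions chosen for illustration. The forward loop saves the values the backward loop needs, and the backward loop visits the iterations in reverse order.

```python
import math

def forward(x0, w, steps):
    """Run x <- w * sin(x) for `steps` iterations, saving each input x."""
    saved = []
    x = x0
    for _ in range(steps):
        saved.append(x)  # needed later by the backward pass
        x = w * math.sin(x)
    return x, saved

def backward(w, saved):
    """Return d(result)/d(w) by walking the loop iterations in reverse."""
    grad_x = 1.0  # derivative of the result with respect to itself
    grad_w = 0.0
    for x in reversed(saved):
        grad_w += grad_x * math.sin(x)     # direct dependence of this step on w
        grad_x = grad_x * w * math.cos(x)  # propagate to the previous x
    return grad_w

result, saved = forward(x0=0.5, w=1.3, steps=4)
grad_w = backward(1.3, saved)
print(result, grad_w)
```

The list `saved` is exactly the kind of intermediate data a checkpointing strategy has to decide about: each entry could instead be recomputed from `x0` at the cost of rerunning part of the forward loop.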
This result is enabled by an Integer Linear Programming (ILP)-based technique that automatically optimizes checkpointing, deciding which intermediate values to store and which to recompute. The DACE AD architecture builds on a data-centric intermediate representation, the Stateful Dataflow Multigraph (SDFG), which makes data movement easy to track and enables efficient data flow analysis. This allows the system to handle overwritten data correctly, propagate gradients through loops, and optimize the storage and recomputation of intermediate values. The geometric-mean speedup across the NPBench suite is 4.1x, which points to consistent performance improvements across the benchmarks.
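A geometric-mean speedup of 4.1x and an average speedup of 92x can both describe the same suite, because the arithmetic mean is pulled up by a few very large values while the geometric mean is not. The short sketch below shows the effect with made-up numbers; the list `hypothetical_speedups` is an assumption for illustration and does not contain NPBench measurements.

```python
from statistics import geometric_mean, mean

# Made-up per-benchmark speedups, NOT measurements from the paper.
# One large outlier dominates the arithmetic mean.
hypothetical_speedups = [1.0, 2.0, 4.0, 512.0]

arithmetic = mean(hypothetical_speedups)
geometric = geometric_mean(hypothetical_speedups)

print(f"arithmetic mean: {arithmetic:.2f}x")
print(f"geometric mean:  {geometric:.2f}x")
```

In this toy list the arithmetic mean is many times larger than the geometric mean, which is why benchmark reports often quote both figures.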
DACE AD surpasses JAX for scientific computing
DACE AD presents a new approach to automatic differentiation, a critical technique for efficiently computing gradients in machine learning and scientific computing applications. The work addresses limitations of existing frameworks, such as limited programming language support and poor performance on complex scientific code, with a system that requires no code changes. The core innovation is an algorithm that optimizes the balance between storing data and recomputing it, yielding significant performance improvements. Demonstrating its effectiveness, DACE AD outperforms the state-of-the-art JAX framework by a geometric-mean factor of 4.1 across a set of high-performance computing problems, and by more than 92 times on average. This improvement comes from a new automatic checkpointing strategy that manages data storage according to a user-defined memory limit, together with the ability to apply automatic differentiation to more complex programs. Future work could explore hybrid AI4Science applications and build on these advances to open up new computational possibilities.
👉Details
🗞 DACE AD: Unifying high-performance auto-differentiation for machine learning and scientific computing
🧠arxiv: https://arxiv.org/abs/2509.02197
