The quest for universal foundation models in scientific machine learning (SciML) faces a critical bottleneck: negative transfer. Training a dense neural operator across different physical domains, such as fluid dynamics and porous media flow, induces gradient conflicts and optimization instability, which limits the operator's plasticity. These regimes place incompatible spectral and geometric demands on the model, and a single dense parameter path struggles to satisfy them all at once.
Visual TL;DR. Negative transfer, the SciML bottleneck, causes dense operators to fail and motivates the Shodh-MoE architecture. Shodh-MoE operates on a compressed latent, parameterizes velocity inside the tokenizer so that decoded states are physically valid, and breaks multiphysics interference, which opens a path to universal SciML.
SciML bottleneck: negative transfer from training across different physical regimes
Dense operator failure: gradient conflicts caused by incompatible spectral and geometric demands
Shodh-MoE architecture: a new sparsely activated latent transformer for multiphysics transfer
Compressed latent: a 16^3 physical latent generated by a physics-informed autoencoder
In-tokenizer velocity: a Helmholtz-style parameterization constrains decoded states
Physically valid: a divergence-free velocity manifold for the decoded states
Breaking interference: sparse activation resolves multiphysics interference
Universal SciML: foundation models with guaranteed physical properties
Eliminating multiphysics interference with sparse activation
Ellwil and Arastu Sharma introduce the Shodh-MoE architecture, a new sparsely activated latent transformer designed for multiphysics transfer. It operates on a compressed 16^3 physical latent generated by a physics-informed autoencoder. A key innovation is a Helmholtz-style velocity parameterization inside the tokenizer, which restricts decoded states to a physically valid, divergence-free velocity manifold. Mass conservation therefore holds by construction: the velocity divergence, verified post hoc in FP64 on a 128^3 grid, is approximately 2.8 x 10^-10.
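To illustrate why a Helmholtz-style parameterization yields a divergence-free field, here is a minimal NumPy sketch. It is not the authors' tokenizer. It assumes the decoder emits a vector potential A on a periodic grid and that the velocity is taken as its discrete curl, using central differences. The function names, the grid size, and the random stand-in for the decoder output are all hypothetical.

```python
import numpy as np

def ddx(f, axis, h):
    """Central difference along one axis of a periodic grid with spacing h."""
    return (np.roll(f, -1, axis=axis) - np.roll(f, 1, axis=axis)) / (2.0 * h)

def curl(A, h):
    """Discrete curl of a vector potential A with shape (3, n, n, n)."""
    Ax, Ay, Az = A
    return np.stack([
        ddx(Az, 1, h) - ddx(Ay, 2, h),  # x-component: dAz/dy - dAy/dz
        ddx(Ax, 2, h) - ddx(Az, 0, h),  # y-component: dAx/dz - dAz/dx
        ddx(Ay, 0, h) - ddx(Ax, 1, h),  # z-component: dAy/dx - dAx/dy
    ])

def divergence(u, h):
    """Discrete divergence of a velocity field u with shape (3, n, n, n)."""
    return ddx(u[0], 0, h) + ddx(u[1], 1, h) + ddx(u[2], 2, h)

rng = np.random.default_rng(0)
n = 16                                  # illustrative grid size (assumption)
h = 1.0 / n
A = rng.standard_normal((3, n, n, n))   # float64 stand-in for a decoder output
u = curl(A, h)                          # velocity defined as curl of the potential
max_div = np.abs(divergence(u, h)).max()
print(f"max |div u| = {max_div:.2e}")
```

Because the periodic difference operators commute, the discrete divergence of the discrete curl cancels term by term, so the printed value is at the level of floating-point rounding error whatever A contains. The network is therefore free to output any potential without leaving the divergence-free manifold.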
Autonomous domain branching with expert routing
The core of Shodh-MoE's effectiveness lies in its Top-1 soft semantic router, which dynamically assigns localized latent patches to specialized expert subnetworks. This routing gives each domain a separate parameter path tailored to its physical mechanisms, while retaining shared expertise for universal physical symmetries. Telemetry from a large-scale distributed pre-training run revealed an autonomous bifurcation: tokens from the open-channel hydrodynamics domain were routed exclusively to expert 0, and porous media flow tokens exclusively to expert 1. This separation allowed both regimes to converge simultaneously, reaching latent validation MSEs of 2.46 x 10^-5 and 9.76 x 10^-6 and decoded physical MSEs of 2.48 x 10^-6 and 1.76 x 10^-6.
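The routing step described above can be sketched in a few lines of NumPy. This is a generic top-1 mixture-of-experts layer, not the Shodh-MoE implementation. The linear experts, the hand-set gate weights, and the two synthetic "domains" are assumptions chosen so that the split is easy to see. In the run described above the split emerged during training rather than being set by hand.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def top1_moe(tokens, W_gate, experts):
    """Route each token to its single highest-scoring expert.

    tokens:  (T, d) latent patch embeddings
    W_gate:  (d, E) router weights
    experts: list of E matrices of shape (d, d), one linear expert each
    """
    probs = softmax(tokens @ W_gate)   # (T, E) soft gate scores
    choice = probs.argmax(axis=-1)     # (T,) hard top-1 assignment
    out = np.zeros_like(tokens)
    for e, W in enumerate(experts):
        mask = choice == e
        # Scale by the gate probability so that, in a trainable version,
        # the router would still receive a gradient signal.
        out[mask] = (tokens[mask] @ W) * probs[mask, e][:, None]
    return out, choice

rng = np.random.default_rng(0)
d, E = 8, 2
# Two synthetic token populations with well-separated means (hypothetical data).
domain_a = 1.0 + 0.1 * rng.standard_normal((5, d))
domain_b = -1.0 + 0.1 * rng.standard_normal((5, d))
tokens = np.concatenate([domain_a, domain_b])
# Hand-set gate for illustration: expert 0 scores sum(x), expert 1 scores -sum(x).
W_gate = np.stack([np.ones(d), -np.ones(d)], axis=1)
experts = [rng.standard_normal((d, d)) for _ in range(E)]

out, choice = top1_moe(tokens, W_gate, experts)
print(choice)
```

With these inputs every token from the first population goes to expert 0 and every token from the second goes to expert 1, so each expert's weights are only ever applied to, and in training would only be updated by, one domain. That separation is what removes the gradient conflict a single dense path would face.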