Drop-in perceptual optimization for 3D Gaussian splatting

Even though their output is ultimately consumed by human viewers, 3D Gaussian splatting (3DGS) methods often rely on ad hoc combinations of pixel-level losses, which results in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by comparing a diverse set of distortion losses. We conduct a first-of-its-kind large-scale human subjective study of 3DGS, comprising 39,320 pairwise ratings across multiple datasets and 3DGS frameworks. A normalized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at recovering fine textures without increasing the number of splats. Raters prefer WD-R more than 2.3 times as often as the original 3DGS loss and more than 1.5 times as often as the current best method, Perceptual-GS. WD-R also consistently achieves state-of-the-art LPIPS, DISTS, and FID scores across a variety of datasets, and it generalizes to recent frameworks such as Mip-Splatting and Scaffold-GS. Replacing the original loss with WD-R consistently improves perceived quality within a similar resource budget (number of splats for Mip-Splatting, model size for Scaffold-GS), and human raters prefer the resulting reconstructions 1.8x and 3.6x as often, respectively. We also find that these gains carry over to 3DGS scene compression, yielding approximately 50% bitrate savings at comparable perceptual metric performance.

† New York University (Tandon School of Engineering)
‡ Equal contribution

Figure 1: 3DGS representation and compression framework, optimized using 2D distortion and rate-distortion objectives. Perceptual losses are included as part of the training framework.

Figure 2: Bayesian Elo scores of 3DGS representation methods for indoor scenes (Deep Blending, indoor Mip-NeRF 360), outdoor scenes (Tanks and Temples, outdoor Mip-NeRF 360, BungeeNeRF), and all scenes combined. WD-R and WD achieve the highest scores in all settings (within 95% confidence intervals).
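
To relate the Elo scale of Figure 2 to the preference ratios quoted in the abstract, the following is a minimal sketch under the standard logistic Elo model, in which a rating gap of 400 points corresponds to 10:1 odds. It assumes that "preferred N times as often" can be read as preference odds of N:1. The Bayesian Elo fit used in the study may differ in detail, and the function names here are hypothetical, chosen only for illustration.

```python
import math


def odds_to_elo_gap(odds: float) -> float:
    """Elo-point gap implied by preference odds (standard logistic Elo model).

    Assumption: 'preferred N times as often' is interpreted as odds of N:1.
    """
    if odds <= 0:
        raise ValueError("odds must be positive")
    return 400.0 * math.log10(odds)


def elo_gap_to_win_prob(gap: float) -> float:
    """Probability that the higher-rated method wins one pairwise comparison."""
    return 1.0 / (1.0 + 10.0 ** (-gap / 400.0))


if __name__ == "__main__":
    # Preference ratios quoted in the abstract (treated here as odds).
    quoted = {
        "WD-R vs. original 3DGS loss": 2.3,
        "WD-R vs. Perceptual-GS": 1.5,
        "WD-R in Mip-Splatting": 1.8,
        "WD-R in Scaffold-GS": 3.6,
    }
    for label, odds in quoted.items():
        gap = odds_to_elo_gap(odds)
        prob = elo_gap_to_win_prob(gap)
        print(f"{label}: odds {odds}:1, Elo gap {gap:.1f}, win probability {prob:.3f}")
```

Under these assumptions, odds of 2.3:1 correspond to a gap of roughly 145 Elo points and to winning about 70% of pairwise comparisons.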
