Information geometry and iterative optimization in model compression: operator factorization

Machine Learning


The ever-growing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of density-induced metrics on parameter spaces, to analyze existing methods within the space of model compression, focusing primarily on operator factorization. Adopting this perspective highlights the core challenge: defining an optimal low-compute submanifold (or subset) and projecting onto it. We argue that many successful model compression approaches can be understood as implicitly approximating information divergences for this projection. We find that when compressing pre-trained models, using information divergences is paramount for achieving improved zero-shot accuracy, yet this no longer holds once the model is fine-tuned. In that setting, the trainability of the bottlenecked model proves far more important for achieving high compression rates with minimal loss of performance, necessitating iterative methods. In this context, we prove convergence of iterative singular value thresholding for training neural networks subject to soft rank constraints. To further illustrate the usefulness of this perspective, we propose a simple modification of existing methods, gradual soft-rank reduction, that improves performance at a fixed compression rate.
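The iterative method named in the abstract, singular value thresholding under a soft rank constraint, can be read as proximal gradient descent where each gradient step is followed by shrinking the weight matrix's singular values. Below is a minimal PyTorch sketch of that idea on a toy low-rank regression problem; the threshold `tau`, learning rate, step count, and data are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: iterative singular value thresholding (ISVT) for training
# a single linear map under a soft rank constraint. Hyperparameters and the
# toy problem below are assumed for illustration only.
import torch

def soft_threshold_singular_values(W: torch.Tensor, tau: float) -> torch.Tensor:
    """Proximal operator of the nuclear norm: shrink singular values by tau."""
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    S_shrunk = torch.clamp(S - tau, min=0.0)  # soft-threshold the spectrum
    return U @ torch.diag(S_shrunk) @ Vh

# Toy regression problem with a rank-8 ground-truth operator (placeholder data).
torch.manual_seed(0)
X = torch.randn(256, 64)
W_true = torch.randn(64, 8) @ torch.randn(8, 32)
Y = X @ W_true

W = torch.zeros(64, 32, requires_grad=True)
opt = torch.optim.SGD([W], lr=1e-2)
tau = 0.05  # soft-rank penalty strength (assumed value)

for step in range(500):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(X @ W, Y)
    loss.backward()
    opt.step()
    with torch.no_grad():
        # Proximal step: interleave spectral shrinkage with gradient updates.
        W.copy_(soft_threshold_singular_values(W, tau))

effective_rank = int((torch.linalg.svdvals(W) > 1e-3).sum())
print(f"final loss {loss.item():.4f}, effective rank {effective_rank}")
```

Because the shrinkage step is the proximal operator of the nuclear norm, small singular values are driven exactly to zero over the course of training, so the effective rank falls gradually rather than being truncated in one shot. This is one way to make the low-rank bottleneck trainable, in line with the abstract's emphasis on trainability when models are fine-tuned after compression.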



