Benign Overfitting Use Case 6 (Machine Learning) | By Monodeep Mukherjee | Jul 2023

1. Benign overfitting of time-series linear models with overparameterization (arXiv)

Authors: Shogo Nakakita, Masaaki Imaizumi

Abstract: The success of large-scale models in recent years has increased the importance of statistical models with large numbers of parameters. Several studies have analyzed overparameterized linear models using potentially non-sparse high-dimensional data. However, existing results rely on the assumption that the samples are independent. In this study, we analyze linear regression models with dependent time-series data under overparameterization. Considering the interpolation estimator, we develop a theory of the excess risk of the estimator. We then derive risk bounds for the cases in which the temporal correlation of each coordinate of the dependent data is homogeneous and heterogeneous, respectively. The derived bounds show that the temporal covariance of the data plays a key role: its strength affects the bias of the risk, and its non-degeneracy affects the variance of the risk. Moreover, in the case of heterogeneous correlation, the convergence rate of the risk for short-memory processes is the same as for independent data, and the risk can converge to zero even for long-memory processes. Our theory can be extended to infinite-dimensional data in a unified way. We also provide several examples of specific dependent processes to which our setting applies.
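To make the setting concrete, here is a small simulation of an overparameterized linear regression on dependent time-series inputs, fitted with an interpolating estimator (taken here to be the minimum ℓ2-norm interpolator). This is a minimal sketch under our own assumptions, not the authors' code or exact model: each of the p coordinates is an independent stationary AR(1) process sharing one coefficient rho (a simple case of homogeneous temporal correlation), coordinate variances decay as 1/j, and the excess risk is evaluated for a fresh input drawn independently of the training series. The helper names ar1_features, min_norm_interpolator and excess_risk are hypothetical.

```python
import numpy as np

def ar1_features(n, p, rho, scales, rng):
    """Simulate n time steps of p independent AR(1) coordinates (hypothetical setup).

    Coordinate j follows x_t[j] = rho * x_{t-1}[j] + sqrt(1 - rho**2) * scales[j] * e_t[j],
    started from its stationary distribution, so Var(x_t[j]) = scales[j]**2 for every t.
    """
    x = np.empty((n, p))
    x[0] = scales * rng.standard_normal(p)
    innov_sd = np.sqrt(1.0 - rho**2) * scales
    for t in range(1, n):
        x[t] = rho * x[t - 1] + innov_sd * rng.standard_normal(p)
    return x

def min_norm_interpolator(X, y):
    # Minimum l2-norm solution of X @ beta = y; with p > n and full row rank it fits y exactly.
    return np.linalg.pinv(X) @ y

def excess_risk(beta_hat, beta_star, scales):
    # E[(x^T (beta_hat - beta_star))^2] for a fresh x with covariance diag(scales**2).
    diff = beta_hat - beta_star
    return float(np.sum(scales**2 * diff**2))

rng = np.random.default_rng(0)
n, p, rho = 100, 2000, 0.5                       # p > n: overparameterized
scales = 1.0 / np.sqrt(np.arange(1, p + 1))     # assumed decaying spectrum
beta_star = np.zeros(p)
beta_star[:5] = 1.0                             # assumed true coefficients
X = ar1_features(n, p, rho, scales, rng)
y = X @ beta_star + 0.5 * rng.standard_normal(n)

beta_hat = min_norm_interpolator(X, y)
# Training residual should be at floating-point precision, since the fit is exact.
print("max training residual:", np.max(np.abs(X @ beta_hat - y)))
print("excess risk:", excess_risk(beta_hat, beta_star, scales))
```

Changing rho (for example to 0.9) and rerunning is one way to see how stronger temporal dependence changes the excess risk in this toy setup; it is an illustration, not a check of the paper's bounds.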

2. A Geometric Perspective on the Benign Overfitting Property of Minimum ℓ2-Norm Interpolant Estimators (arXiv)

Authors: Guillaume Lecué, Zong Shang

Abstract: Practitioners have observed that some deep learning models generalize well even though they fit noisy training data perfectly [5,45,44]. Since then, many theoretical studies have clarified several aspects of this phenomenon, known as benign overfitting [4,2,1,8]. In particular, for linear regression models, the minimum $\ell_2$-norm interpolant estimator $\hat\beta$ has received much attention [1,39], because it has been proven to be consistent even though it fits noisy data perfectly, under some condition on the covariance matrix $\Sigma$ of the input vectors. Motivated by this phenomenon, we study the generalization properties of this estimator from a geometrical point of view. Our main results extend and improve the convergence rates and the deviation probability from [39]. Our proof differs from the classical bias/variance analysis and is based on the self-induced regularization property introduced in [2]: $\hat\beta$ can be written as the sum of a ridge estimator $\hat\beta_{1:k}$ and an overfitting component $\hat\beta_{k+1:p}$, following the orthogonal decomposition of the feature space $\mathbb{R}^p = V_{1:k} \oplus^{\perp} V_{k+1:p}$, where $V_{1:k}$ is spanned by the top $k$ eigenvectors of $\Sigma$ and $V_{k+1:p}$ by the last $p-k$ eigenvectors. We also prove a matching lower bound on the expected prediction risk. The two geometric properties of random Gaussian matrices that are central to our analysis are the Dvoretzky-Milman theorem and the isomorphic and restricted isomorphic properties. In particular, the Dvoretzky dimension, which appears naturally from the geometric point of view, coincides with the effective rank from [1,39] and is the key tool for handling the behavior of the design matrix restricted to the subspace $V_{k+1:p}$, where overfitting occurs.
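The split of the minimum ℓ2-norm interpolant into a head part on the top-k eigenspace and an overfitting tail part can be checked numerically. The sketch below is our own illustration under stated assumptions, not the authors' method: inputs are Gaussian with a diagonal covariance Σ = diag(λ) (so the top-k eigenvectors are the first k coordinate axes), the spectrum is a hypothetical choice, and the ridge penalty tr(Σ_{k+1:p}) comes from the heuristic of replacing X_{k+1:p} X_{k+1:p}^T by its expectation tr(Σ_{k+1:p}) I_n; the paper's exact ridge formulation may differ. The effective rank uses the definition r_k(Σ) = (Σ_{i>k} λ_i) / λ_{k+1} from Bartlett, Long, Lugosi and Tsigler, which we assume is the notion cited as [1,39]. The function effective_rank is a hypothetical helper.

```python
import numpy as np

def effective_rank(lam_sorted, k):
    # r_k(Sigma) = (sum_{i>k} lambda_i) / lambda_{k+1}; lam_sorted is in decreasing order
    # and 0-indexed, so lam_sorted[k] is lambda_{k+1}.
    return lam_sorted[k:].sum() / lam_sorted[k]

rng = np.random.default_rng(1)
n, p, k = 50, 5000, 10
# Hypothetical spectrum: k large eigenvalues followed by a long flat tail.
lam = np.concatenate([np.full(k, 10.0), np.full(p - k, 0.01)])
# Assumed diagonal covariance Sigma = diag(lam): columns are scaled by sqrt(lam).
X = rng.standard_normal((n, p)) * np.sqrt(lam)
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + rng.standard_normal(n)

# Minimum l2-norm interpolant: beta_hat = X^T (X X^T)^{-1} y (X has full row rank since p > n).
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)

# Orthogonal split of R^p into V_{1:k} (first k axes) and V_{k+1:p} (remaining axes).
X_head, X_tail = X[:, :k], X[:, k:]
beta_head, beta_tail = beta_hat[:k], beta_hat[k:]

# Exact identity: beta_head = X_head^T (X_head X_head^T + X_tail X_tail^T)^{-1} y.
# Replacing X_tail X_tail^T by its expectation tr(Sigma_{k+1:p}) I_n turns this into
# a ridge estimator on the top-k features with penalty tr(Sigma_{k+1:p}).
penalty = lam[k:].sum()
beta_ridge = np.linalg.solve(X_head.T @ X_head + penalty * np.eye(k), X_head.T @ y)

print("effective rank r_k:", effective_rank(lam, k))
print("relative gap head vs ridge:",
      np.linalg.norm(beta_head - beta_ridge) / np.linalg.norm(beta_ridge))
print("norm of overfitting component:", np.linalg.norm(beta_tail))
print("max training residual:", np.max(np.abs(X @ beta_hat - y)))
```

Raising k or shrinking the tail so that r_k is no longer large compared with n should make the ridge approximation of the head component visibly worse in this toy setup.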
