Understanding Non-Smooth Non-Convex Optimization Part 3 (Machine Learning Optimization 2024) | Written by Monodeep Mukherjee | April 2024

1. Opportunistic multiplication of proximal doglegs for non-convex and non-smooth optimization (arXiv)

Authors: Zhou Yiming, Wei Dai

Summary: Consider minimizing a function consisting of a quadratic term and a proximable term that may be nonconvex and nonsmooth. This problem is also known as the scaled proximal operator. Despite its simple form, existing methods suffer from slow convergence, high implementation complexity, or both. To overcome these limitations, we develop a fast and user-friendly quadratic approximation algorithm. The key innovation is constructing and solving a series of opportunistically majorized problems along a hybrid Newton direction. This approach directly uses the exact Hessian of the quadratic term and computes its inverse only once, eliminating the iterative numerical approximation of the Hessian that is common in quasi-Newton methods. Convergence of the algorithm to a critical point is established, and the local convergence rate is derived based on the Kurdyka-Łojasiewicz property of the objective function. Numerical comparisons are conducted on well-known optimization problems. The results show that the proposed algorithm not only converges faster but also tends to converge to a better local optimum than the benchmark algorithms.
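To make the problem class concrete, here is a minimal sketch of a plain proximal-gradient baseline (iterative hard thresholding) for a quadratic term plus a nonconvex, nonsmooth ℓ0 penalty. It is not the authors' opportunistic-majorization, hybrid Newton algorithm; the ℓ0 penalty, the fixed step 0.9/L, and the helper names prox_l0, objective, and proximal_gradient are illustrative assumptions. A scaled proximal operator, min_x ½(x − v)ᵀH(x − v) + g(x) with symmetric H, fits this template with Q = H and b = Hv, because the two objectives differ only by a constant.

```python
import numpy as np


def prox_l0(z, t, lam):
    """Proximal operator of t * lam * ||x||_0, i.e. hard thresholding.

    Coordinate i is kept iff 0.5 * z_i**2 > t * lam, i.e. |z_i| > sqrt(2 * t * lam).
    """
    keep = np.abs(z) > np.sqrt(2.0 * t * lam)
    return np.where(keep, z, 0.0)


def objective(x, Q, b, lam):
    """F(x) = 0.5 * x^T Q x - b^T x + lam * ||x||_0 (quadratic + nonconvex, nonsmooth term)."""
    return 0.5 * x @ Q @ x - b @ x + lam * np.count_nonzero(x)


def proximal_gradient(Q, b, lam, x0, iters=200):
    """Plain proximal-gradient baseline (hypothetical helper, not the paper's method).

    Uses the step t = 0.9 / L, where L is the largest eigenvalue of the symmetric PSD
    matrix Q, so the objective is non-increasing along the iterates.
    """
    L = np.linalg.eigvalsh(Q).max()
    t = 0.9 / L
    x = np.asarray(x0, dtype=float).copy()
    history = [objective(x, Q, b, lam)]
    for _ in range(iters):
        grad = Q @ x - b                    # gradient of the quadratic term
        x = prox_l0(x - t * grad, t, lam)   # forward step, then prox of the l0 term
        history.append(objective(x, Q, b, lam))
    return x, history


if __name__ == "__main__":
    Q = np.diag([2.0, 1.0])     # hypothetical symmetric PSD quadratic
    b = np.array([2.0, 0.1])
    x, hist = proximal_gradient(Q, b, lam=0.1, x0=np.zeros(2))
    print(x, hist[-1])          # x close to [1, 0], objective close to -0.9
```

Because this baseline uses only a scalar step size, it ignores the curvature encoded in Q; the summarized paper instead works with the exact Hessian of the quadratic term.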

2. Zero-order gradient and quasi-Newton methods for non-smooth non-convex stochastic optimization (arXiv)

Authors: Luke Marrinan, Uday V. Shanbhag, Farzad Yousefian

Summary: Consider minimizing a Lipschitz continuous, expectation-valued function f(x) ≜ E[f̃(x, ξ)] over a closed convex set. Our focus is on obtaining both asymptotic guarantees and rate and complexity guarantees for computing an approximate stationary point (in the Clarke sense) via a zero-order scheme. We adopt a smoothing-based approach that relies on minimizing f_η, where f_η(x) = E_u[f(x + ηu)], u is a random variable defined on the unit sphere, and η > 0. It has been observed that a stationary point of the η-smoothed problem is a 2η-stationary point of the original problem in the Clarke sense. In this setting, we develop two sets of schemes with promising empirical behavior. (I) We develop a smoothing-enabled variance-reduced zero-order gradient framework (VRG-ZO) and make two sets of contributions for the sequence generated by the proposed zero-order gradient scheme. (a) The residual function of the smoothed problem tends to zero almost surely along the generated sequence, which yields guarantees of η-Clarke stationary solutions of the original problem. (b) To compute an x for which the expected norm of the residual of the η-smoothed problem is within ε, the scheme requires no more than O(η⁻¹ε⁻²) projection steps and O(η⁻²ε⁻⁴) function evaluations. (II) The second scheme is a zero-order stochastic quasi-Newton scheme (VRSQN-ZO) that relies on a combination of randomized and Moreau smoothing. The corresponding iteration and sample complexities for this scheme are O(η⁻⁵ε⁻²) and O(η⁻⁷ε⁻⁴), respectively.
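The smoothing idea can be illustrated with a minimal sketch of projected zero-order gradient descent. It assumes the common ball-smoothing convention (u uniform in the unit ball), under which the two-point estimate (n / (2η)) (f̃(x + ηv, ξ) − f̃(x − ηv, ξ)) v, with v drawn uniformly from the unit sphere, is unbiased for ∇f_η(x). The integrand f̃ (a capped-ℓ1 penalty, which is Lipschitz, nonsmooth, and nonconvex, plus zero-mean linear noise), the box constraint, the step and batch parameters, and all function names are hypothetical. The sketch uses a fixed mini-batch and omits the variance-reduction and quasi-Newton components of VRG-ZO and VRSQN-ZO.

```python
import numpy as np


def f_tilde(x, xi):
    """Hypothetical stochastic integrand: capped-l1 penalty plus zero-mean linear noise.

    Its expectation f(x) = sum_i min(|x_i|, 1) is Lipschitz, nonsmooth, and nonconvex.
    """
    return np.minimum(np.abs(x), 1.0).sum() + xi @ x


def sphere_sample(rng, n):
    """Draw v uniformly from the unit sphere in R^n by normalizing a Gaussian vector."""
    v = rng.standard_normal(n)
    return v / np.linalg.norm(v)


def zo_gradient(x, eta, batch, rng, noise=0.1):
    """Mini-batch two-point estimate of grad f_eta(x), assuming ball smoothing.

    Each sample is (n / (2*eta)) * (f_tilde(x + eta*v, xi) - f_tilde(x - eta*v, xi)) * v,
    using the same xi for both function evaluations.
    """
    n = x.size
    g = np.zeros(n)
    for _ in range(batch):
        v = sphere_sample(rng, n)
        xi = noise * rng.standard_normal(n)
        diff = f_tilde(x + eta * v, xi) - f_tilde(x - eta * v, xi)
        g += (n / (2.0 * eta)) * diff * v
    return g / batch


def project_box(x, lo=-1.0, hi=1.0):
    """Euclidean projection onto the closed convex set [lo, hi]^n."""
    return np.clip(x, lo, hi)


def zo_projected_descent(x0, eta=0.05, step=0.02, batch=10, iters=500, seed=0):
    """Projected zero-order gradient descent on the eta-smoothed problem (fixed step)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x = project_box(x - step * zo_gradient(x, eta, batch, rng))
    return x


if __name__ == "__main__":
    x = zo_projected_descent(np.array([0.8, -0.6]))
    print(x, np.minimum(np.abs(x), 1.0).sum())  # x should end up near the origin
```

Only function values of f̃ are queried; no (sub)gradients are used. The projection reduces to a clip here only because the feasible set in this toy example is a box.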


