Modern machine learning relies heavily on optimization to provide effective answers to difficult problems in fields as diverse as computer vision, natural language processing, and reinforcement learning. Whether an optimizer converges quickly to a high-quality solution depends strongly on the chosen learning rate, and tuning it becomes even harder in applications with many agents, each running its own optimizer. Manually tuned optimizers can perform well, but tuning typically requires expert skill and painstaking work. “Parameter-free” adaptive methods, such as the D-Adaptation approach, have therefore gained popularity in recent years for learning-rate-free optimization.
To improve the worst-case non-asymptotic convergence rate of the D-Adaptation method, a research team from Samsung AI Center and Meta AI introduced two modifications, called Prodigy and Resetting. The result is faster convergence and better optimization output.
Both modifications target the worst-case non-asymptotic convergence rate of D-Adaptation, improving convergence speed and solution quality by refining the adaptive learning-rate schedule. To validate the proposed modifications, the authors establish a lower bound for any approach that estimates the distance-to-solution constant D, and they show that, among methods with exponentially bounded iterate growth, the enhanced approach is worst-case optimal up to constant factors. Extensive testing then demonstrates that the augmented D-Adaptation methods adjust the learning rate rapidly, yielding strong convergence rates and optimization results.
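In rate terms, the headline claim can be summarized as follows (a paraphrase, up to constant and lower-order factors, with D the initial distance to a solution, d_0 its initial estimate, G a bound on gradient norms, and T the iteration count):

```latex
% Paraphrased, up to constants: the modifications shave a square root off the
% \log(D/d_0) factor that D-Adaptation pays for not knowing D in advance.
\underbrace{O\!\Big(\tfrac{G D \,\log(D/d_0)}{\sqrt{T}}\Big)}_{\text{D-Adaptation}}
\;\longrightarrow\;
\underbrace{O\!\Big(\tfrac{G D \,\sqrt{\log(D/d_0)}}{\sqrt{T}}\Big)}_{\text{Prodigy / Resetting}}
```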
The team’s key idea is to modify the error term in D-Adaptation with Adagrad-like step sizes. The method can then confidently take larger steps while keeping the main error term intact, allowing the improved algorithm to converge more quickly. However, if the step-size denominator grows too large, the algorithm can slow down, so an additional weight is placed next to the gradients as a safeguard, as the sketch below illustrates.
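To make the mechanics concrete, here is a minimal, illustrative Prodigy-style loop. It is not the paper’s exact algorithm; the function name, the precise weighting, and the distance estimate are simplifications, but it shows the three ingredients described above: a distance estimate d that can only grow, d-weighted gradients inside an Adagrad-like denominator, and larger steps as d grows.

```python
import numpy as np

def prodigy_like_sgd(grad, x0, steps=2000, d0=1e-6, eps=1e-12):
    """Illustrative, simplified Prodigy-style parameter-free SGD sketch.

    Maintains a running lower-bound estimate d of the distance
    D = ||x0 - x*|| and uses a d-weighted, Adagrad-like accumulator
    in the step-size denominator.
    """
    x = np.asarray(x0, dtype=float).copy()
    d = d0                   # running lower-bound estimate of D
    g2_acc = 0.0             # Adagrad-like accumulator of d^2 * ||g||^2
    numerator = 0.0          # accumulated step-weighted <g, x0 - x> terms
    s = np.zeros_like(x)     # the same weighting applied to the gradients
    for _ in range(steps):
        g = grad(x)
        g2_acc += d * d * float(g @ g)
        # Adagrad-like step size; the extra d weighting lets steps grow with d.
        eta = d * d / (np.sqrt(g2_acc) + eps)
        numerator += eta * float(g @ (x0 - x))
        s += eta * g
        # The distance estimate only ever grows, so steps only get bolder.
        d = max(d, numerator / (np.linalg.norm(s) + eps))
        x -= eta * g
    return x

# Usage on a toy quadratic: minimizes ||x - 3||^2 with no hand-tuned learning rate.
sol = prodigy_like_sgd(lambda x: 2.0 * (x - 3.0), np.zeros(5))
```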
In an empirical study, the researchers applied the proposed methods to convex logistic regression and deep learning problems. Across multiple experiments, Prodigy adapted faster than other known approaches, while D-Adaptation with resetting attained the same theoretical rate as Prodigy with a considerably simpler analysis than either Prodigy or the original D-Adaptation. Moreover, the proposed methods often outperform the D-Adaptation algorithm and achieve test accuracy comparable to hand-tuned Adam.
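In practice, the appeal is that the method can stand in for a tuned optimizer. Below is a hedged sketch of such a swap, assuming the authors’ released prodigyopt package (pip install prodigyopt) and its PyTorch-style Prodigy class; the toy model and data here are placeholders.

```python
import torch
from prodigyopt import Prodigy  # assumes the authors' released package

model = torch.nn.Linear(20, 2)            # toy logistic-regression model
criterion = torch.nn.CrossEntropyLoss()
# lr acts as a multiplier on the internally estimated step size,
# so the usual learning-rate sweep is replaced by the default of 1.0.
optimizer = Prodigy(model.parameters(), lr=1.0)

X = torch.randn(256, 20)
y = torch.randint(0, 2, (256,))
for _ in range(100):
    optimizer.zero_grad()
    criterion(model(X), y).backward()
    optimizer.step()
```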
In summary, the two newly proposed methods improve on the state-of-the-art D-Adaptation approach to learning-rate adaptation. Extensive experimental evidence indicates that Prodigy, a weighted variant of D-Adaptation, adapts faster than existing approaches, while the second method, D-Adaptation with resetting, matches Prodigy’s theoretical rate with a far simpler analysis.
Check out the paper.
Dhanshree Shenwai is a computer science engineer with extensive experience in FinTech companies covering the fields of finance, cards and payments, and banking, with a strong interest in AI applications. She is passionate about exploring new technologies and advancements in today’s evolving world to make life easier for everyone.
