Introducing LOw-Memory Optimization (LOMO): A new AI optimizer that fuses gradient computation and parameter update in one step to reduce memory usage

https://arxiv.org/abs/2306.09782

Large language models (LLMs) have revolutionized natural language processing, showcasing remarkable abilities such as emergence and grokking while continuously growing in size. Training these models with billions of parameters, from 30 billion to 175 billion, raises the bar for NLP research. Tuning LLMs often requires expensive GPU resources, such as a machine with 8×80GB GPUs, making it difficult for small labs and companies to participate in this area of research. Recently, parameter-efficient fine-tuning techniques such as LoRA and prefix tuning have made it possible to tune LLMs under resource constraints.

Full-parameter fine-tuning is considered a more effective strategy than parameter-efficient fine-tuning, though both techniques offer viable solutions. The authors investigate how to accomplish full-parameter fine-tuning in resource-limited situations. They examine the four components of an LLM's memory usage, namely activations, optimizer states, gradient tensors, and parameters, and optimize the training process in three ways:

1) They rethink the algorithmic role of the optimizer and find that SGD is a good replacement for fine-tuning the full parameters of an LLM. Because SGD maintains no intermediate state, the entire optimizer state can be discarded.

2) Their proposed optimizer, LOMO, reduces the memory usage of gradient tensors to O(1), i.e., to the memory consumption of the largest single gradient tensor, as shown in Figure 1.

3) They stabilize mixed-precision training with LOMO by incorporating gradient normalization and loss scaling, and by switching certain computations to full precision during training.

As a result, their method's total memory consumption equals that of the parameters, the activations, and the largest gradient tensor combined.
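To make the fused update in step 2 concrete, here is a minimal PyTorch sketch of the idea. It is illustrative only, not the authors' implementation (their code is linked below), and the helper names attach_fused_sgd and finish_step are invented for this sketch. A hook fires as each parameter's gradient is produced during backward(), applies the SGD step in place, and frees the gradient, so at most one full gradient tensor is alive at a time:

```python
import torch

def attach_fused_sgd(model: torch.nn.Module, lr: float = 1e-3,
                     loss_scale: float = 1.0):
    """Fuse the SGD update into the backward pass (hypothetical sketch of
    the LOMO idea): whenever a parameter's gradient has been accumulated,
    update that parameter in place and free the gradient immediately."""
    params = [p for p in model.parameters() if p.requires_grad]

    def hook(grad):
        with torch.no_grad():
            for p in params:
                if p.grad is not None:  # this parameter's gradient is ready
                    p.data.add_(p.grad, alpha=-lr / loss_scale)
                    p.grad = None       # release the gradient's memory now
        return grad

    for p in params:
        p.register_hook(hook)  # fires as each parameter's grad is computed
    return params

def finish_step(params, lr: float = 1e-3, loss_scale: float = 1.0):
    """Flush gradients remaining after backward() returns (the last
    parameter reached has no later hook to consume its gradient)."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p.data.add_(p.grad, alpha=-lr / loss_scale)
                p.grad = None
```

A conventional optimizer, by contrast, materializes the gradients of all parameters before optimizer.step() runs, so every gradient tensor is alive simultaneously.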

These techniques greatly reduce the memory consumption of full-parameter fine-tuning, bringing it down to the level of inference; since a forward-plus-backward pass can never require less memory than the forward pass alone, this is close to the minimum possible. Notably, the parameter update is equivalent to that of SGD, so fine-tuning quality is not compromised when using LOMO to save memory. By empirically evaluating LOMO's memory footprint and throughput, researchers at Fudan University demonstrated that LOMO can successfully train a 65B-parameter model on just eight RTX 3090 GPUs. Furthermore, they use LOMO to tune the full parameters of LLMs on the SuperGLUE dataset collection and validate the downstream performance of the proposed approach. The empirical results show how well LOMO performs when optimizing LLMs with billions of parameters.
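A back-of-the-envelope calculation illustrates why discarding the optimizer state and gradient tensors matters at this scale. The byte counts below are standard mixed-precision bookkeeping, not figures reported in the paper:

```python
# Rough memory budget for a 65B-parameter model (illustrative numbers).
n = 65e9  # parameters

# Adam-style mixed precision: fp16 weights (2 B) + fp16 gradients (2 B)
# + fp32 master weights, momentum, and variance (4 B each = 12 B).
adam_gib = n * (2 + 2 + 12) / 2**30
print(f"Adam-style states: ~{adam_gib:,.0f} GiB")  # ~969 GiB

# LOMO: fp16 weights plus (at most) one gradient tensor at a time.
lomo_gib = n * 2 / 2**30
print(f"LOMO weights:      ~{lomo_gib:,.0f} GiB")  # ~121 GiB

print(f"8 x RTX 3090:       {8 * 24} GB total")    # 192 GB
```

Roughly 130 GB of fp16 weights, sharded across the eight GPUs together with activations, can fit within the 192 GB total, whereas the Adam-style budget of nearly 1 TB cannot come close.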

Here are their overall contributions:

• They provide a theoretical analysis suggesting that SGD can successfully fine-tune the full parameters of LLMs. The obstacles that once hindered SGD's adoption may be less severe when optimizing LLMs.

• They propose LOMO (LOw-Memory Optimization), which significantly reduces GPU memory usage while preserving the fine-tuning process.

• Through a careful analysis of memory usage and throughput, they empirically demonstrate the effectiveness of LOMO for optimizing LLMs in resource-constrained situations. Performance evaluations on downstream tasks further support this.

A code implementation is available on GitHub.
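As a toy, end-to-end illustration of how the pieces above could fit together, here is a single training step using the invented helpers from the sketch earlier, with static loss scaling for fp16 (a CUDA device is required for fp16 layers). The paper's full recipe also includes gradient normalization and selective full-precision computation, which are omitted here:

```python
import torch

# Stand-in model; a real LLM would be sharded across the GPUs.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.ReLU(), torch.nn.Linear(512, 512)
).half().cuda()

scale = 1024.0  # static loss scale to keep fp16 gradients from underflowing
params = attach_fused_sgd(model, lr=1e-3, loss_scale=scale)

x = torch.randn(8, 512, device="cuda", dtype=torch.float16)
loss = model(x).float().pow(2).mean()  # toy objective, computed in fp32

(loss * scale).backward()              # parameters update *during* backward
finish_step(params, lr=1e-3, loss_scale=scale)  # flush the last gradient
```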


Check out the paper and the GitHub repository for more details.



Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his Bachelor of Science in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and he is passionate about building solutions around it. He loves connecting with people and collaborating on interesting projects.
