NVIDIA Introduces DoRA: A Better Way to Fine-Tune AI Models

Machine Learning




NVIDIA Introduces DoRA: A Better Way to Fine-Tune AI Models


NVIDIA has announced the development of a new fine-tuning technique called Weight-Decomposed Low-Rank Adaptation (DoRA), which is a high-performance alternative to the widely used Low-Rank Adaptation (LoRA). According to an NVIDIA technical blog, DoRA enhances both the learning ability and stability of LoRA without incurring additional inference overhead.

Benefits of DoRA

DoRA demonstrated significant performance gains on a range of large-scale language models (LLMs) and visual language models (VLMs). For example, in common sense reasoning tasks, DoRA outperformed LoRA with improvements such as +3.7 points on Llama 7B and +4.4 points on Llama 3 8B. Additionally, DoRA showed superior results on multi-turn benchmarks, image/video text understanding, and visual instruction coordination tasks.

This innovative method was accepted for an oral presentation at ICML 2024, demonstrating its reliability and potential impact in the field of machine learning.

How DoRA works

DoRA works by decomposing the pre-trained weights into their magnitude and direction components and fine-tuning both. This method leverages LoRA for direction adaptation and ensures efficient fine-tuning. After the training process, DoRA merges the fine-tuned components back into the pre-trained weights, ensuring no additional latency during inference.

By visualizing the magnitude and directional differences between DoRA and the pre-trained weights, we can see that DoRA makes significant directional adjustments with minimal changes in magnitude, closely resembling the full fine-tuning (FT) learning pattern.

Performance across models

DoRA consistently outperforms LoRA across a range of performance benchmarks. For example, in large-scale language models, DoRA significantly improves commonsense reasoning and conversation/instruction following. In visual language models, DoRA shows superior results in image-text and video-text understanding, and visual instruction alignment tasks.

Large-scale language models

Comparative studies highlight that DoRA outperforms LoRA in common sense reasoning and multi-turn benchmarks. In tests, DoRA achieved higher average scores across various datasets, demonstrating its robust performance.

Visual Language Model

DoRA also outperforms LoRA in visual language modeling, outperforming it in tasks such as image-text understanding, video-text understanding, and visual instruction alignment. The effectiveness of our method is evidenced by its higher average scores across multiple benchmarks.

Compressed LLM

DoRA can be integrated into the QLoRA framework to improve the accuracy of low-bit pre-trained models. Our collaboration with Answer.AI on the QDoRA project has shown that QDoRA outperforms both FT and QLoRA on Llama 2 and Llama 3 models.

Text to Image Generation

DoRA’s applications extend to text-to-image personalization using DreamBooth, achieving much better results than LoRA on challenging datasets such as 3D icons and Lego sets.

Implications and Future Applications

Compatible with LoRA and its derivatives, DoRA is quickly becoming the default choice for fine-tuning AI models. Its efficiency and effectiveness make it a valuable tool for adapting underlying models to various applications, including NVIDIA Metropolis, NVIDIA NeMo, NVIDIA NIM, and NVIDIA TensorRT.

For more information, see the NVIDIA technical blog.

Image credit: Shutterstock





Source link

Leave a Reply

Your email address will not be published. Required fields are marked *