Introducing LETI: A New Language Model (LM) Fine-Tuning Paradigm That Explores the Potential of LMs Learning from Textual Interactions

Machine Learning


https://arxiv.org/pdf/2305.10314.pdf

As large language models (LLMs) grow in popularity, new research and advancements appear almost daily. Powered by deep learning and artificial intelligence, LLMs are continuously evolving and becoming pervasive across domains. LLMs are pre-trained on large amounts of raw text and then fine-tuned on specific tasks, such as classification, question answering, and document summarization, using direct training signals that measure performance (for example, classification accuracy).

Recently, a new fine-tuning paradigm called LETI (Learning from Textual Interactions) was introduced. It explores the potential of large language models to learn from textual interactions and feedback: LETI helps a model understand not only whether its output is wrong, but also why it is wrong. This approach allows LLMs to go beyond learning from labels and scalar rewards alone.

The researchers behind LETI describe how the approach provides textual feedback to the language model: binary labels check the correctness of model outputs, while the textual feedback identifies and explains errors in the generated code. The LETI paradigm mirrors the iterative process of software development, in which developers write programs, test them, and improve them based on feedback. Similarly, LETI fine-tunes the LLM with textual feedback that pinpoints bugs and errors.


During the fine-tuning process, the model is given a natural-language description of a problem, after which it generates a set of candidate solutions. A solution evaluator then checks these solutions against a set of test cases. The researchers used a Python interpreter as the solution evaluator, taking the error messages and stack traces produced by the generated code as the source of textual feedback.
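The evaluator step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `evaluate_solution` and its signature are hypothetical, and it simply runs a generated program and an assert-based test case, returning the interpreter's stack trace as textual feedback when execution fails.

```python
import traceback

def evaluate_solution(program: str, test_case: str) -> tuple[bool, str]:
    """Run an LM-generated program against one test case.

    Returns (passed, textual_feedback). Hypothetical helper for
    illustration; the paper's actual evaluator may differ.
    """
    namespace: dict = {}
    try:
        exec(program, namespace)    # define the generated solution
        exec(test_case, namespace)  # run the assert-based test case
        return True, ""             # success: no textual feedback needed
    except Exception:
        # The full stack trace becomes the textual feedback signal.
        return False, traceback.format_exc()

# A buggy generated program: subtracts instead of adding.
buggy = "def add(a, b):\n    return a - b\n"
passed, feedback = evaluate_solution(buggy, "assert add(2, 3) == 5")
```

Here `passed` is `False` and `feedback` contains the `AssertionError` traceback, which is exactly the kind of error message the model can learn from.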

The training data used to fine-tune the model consists of three components: natural-language instructions, LM-generated programs, and textual feedback. If a generated program fails, the textual feedback is provided to the LLM; otherwise, reward tokens are provided to the model as binary feedback to encourage accurate solutions. The generated textual feedback is used in a fine-tuning process known as feedback-conditioned fine-tuning.

For evaluation, the researchers used the MBPP (Mostly Basic Programming Problems) code-generation dataset. The results showed that LETI significantly improved the performance of two base LMs of different scales on MBPP without requiring ground-truth outputs for training. On the HumanEval dataset, LETI achieved performance equal to or better than the base LMs on unseen problems. Furthermore, the researchers found that textual feedback allowed the model to reach the same performance with fewer gradient steps than binary feedback alone.

In conclusion, LETI is a promising fine-tuning approach that uses detailed textual feedback to enhance language models, allowing them to learn from their mistakes and improve on tasks such as code generation.


Check out the paper and the GitHub repository for more details.


Tanya Malhotra is a final-year student at the University of Petroleum and Energy Research, Dehradun, pursuing a Bachelor of Science in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
A data science enthusiast with strong analytical and critical-thinking skills, she has a keen interest in learning new skills, leading groups, and managing work in an organized manner.




