This AI paper introduces SELF-REFINE: a framework for improving initial output from LLMs through iterative feedback and refinement

Iterative improvement is an important aspect of human problem solving. Iterative refinement is the process of creating an initial draft and then improving it through self-feedback. For example, when writing an e-mail to a colleague to request a document, a person might first write a blunt request such as “Data now, please.” After some thought, the author realizes that this phrasing might come across as unfriendly and changes it to “Could you please provide your data?” In this study, the researchers show that large language models (LLMs) can successfully mimic this human cognitive process through iterative feedback and refinement.

LLMs can produce coherent output on a first attempt, but they often fall short on more complex requirements, especially tasks with multiple objectives (for example, dialogue response generation, where a response should be relevant, engaging, and safe) or tasks whose goal is hard to define precisely (e.g., improving program readability). Modern LLMs may produce intelligible output in such cases. Still, iterative refinement is needed to ensure that all task requirements are met and an appropriate level of quality is achieved.

Advanced methods that rely on external reward and supervision models require either huge amounts of training data or expensive human annotation. These shortcomings highlight the need for a more adaptable and efficient text generation method that can be used for many tasks with little supervision. In this study, researchers from CMU, the Allen Institute for AI, the University of Washington, NVIDIA, UCSD, and Google Research propose SELF-REFINE, which they argue overcomes these constraints and better reproduces the human creative generation process without a costly human feedback loop (Figure 1).

Figure 1: SELF-REFINE takes the initially generated output (0) and passes it back to the same model M (1) to obtain feedback (2). The feedback on the initial output is then passed back to the model (3), which iteratively refines the previously generated output (0). SELF-REFINE is instantiated with a powerful language model such as GPT-3.5 and involves no human assistance.

SELF-REFINE consists of two components, FEEDBACK and REFINE, which work together in an iterative cycle to produce high-quality output. An initial draft produced by model M (0) is passed back to the same model M (1) to obtain feedback (2). The feedback on the draft is then given to the same model (3), which iteratively refines the previously generated output (0). This cycle repeats until the model determines that no further improvement is needed, at which point the process ends. The central thesis of this work is that, in a few-shot setting, the same underlying language model handles both feedback and refinement.
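The FEEDBACK and REFINE cycle described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prompt wording, the `STOP_MARKER` string, the `max_iters` cap, and the scripted `toy_model` (which stands in for a real LLM call and replays the e-mail example) are all assumptions made for the sketch.

```python
from typing import Callable

# Hypothetical stop marker; the actual stopping rule depends on prompt design.
STOP_MARKER = "NO FURTHER IMPROVEMENTS"


def self_refine(model: Callable[[str], str], task: str, max_iters: int = 4) -> str:
    """Generate a draft, then alternate FEEDBACK and REFINE with one model."""
    output = model(f"TASK: {task}\nWrite a first draft.")
    for _ in range(max_iters):
        # FEEDBACK step: the same model critiques its own draft.
        feedback = model(
            f"TASK: {task}\nDRAFT: {output}\n"
            f"Give feedback, or reply '{STOP_MARKER}' if the draft is fine."
        )
        if STOP_MARKER in feedback:
            break  # the model judges that no further improvement is needed
        # REFINE step: the same model rewrites the draft using the feedback.
        output = model(
            f"TASK: {task}\nDRAFT: {output}\nFEEDBACK: {feedback}\n"
            "Rewrite the draft to address the feedback."
        )
    return output


def toy_model(prompt: str) -> str:
    """Scripted stand-in for an LLM call (assumption, for illustration only)."""
    if "Rewrite the draft" in prompt:
        return "Could you please provide your data?"
    if "Give feedback" in prompt:
        if "DRAFT: Data now, please." in prompt:
            return "The tone is unfriendly; phrase it as a polite request."
        return STOP_MARKER
    return "Data now, please."


if __name__ == "__main__":
    print(self_refine(toy_model, "Ask a colleague for their data by e-mail."))
```

With the scripted model, the loop runs one refinement and then stops on the second feedback call, returning “Could you please provide your data?” In practice `model` would wrap a call to a real LLM, and the cap on iterations guards against a model that never signals completion.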

SELF-REFINE provides the first iterative strategy that effectively uses natural-language (NL) feedback to improve generation.

Figure 1 shows an example of the procedure. The researchers applied SELF-REFINE to a variety of tasks across many domains, including review rewriting, acronym generation, constrained generation, story generation, code rewriting, response generation, and toxicity removal, all of which call for feedback and revision. Its core components are instantiated using a few-shot prompting strategy, so a handful of examples is enough to get the model started. Their iterative approach, which covers experiments, component analysis, a variety of tasks, the generation of useful feedback, and stopping criteria, is intended to guide future research in this area.
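To make the few-shot prompting idea concrete, the sketch below assembles a feedback prompt from a handful of worked examples followed by the new output to critique. The `OUTPUT:`/`FEEDBACK:` layout, the function name `build_feedback_prompt`, and the sample pair are hypothetical choices for illustration; the article does not specify the prompts the researchers used.

```python
from typing import List, Tuple


def build_feedback_prompt(examples: List[Tuple[str, str]], new_output: str) -> str:
    """Join a few worked (output, feedback) pairs, then append the new output.

    The prompt ends with an open "FEEDBACK:" slot for the model to complete.
    """
    blocks = [f"OUTPUT: {out}\nFEEDBACK: {fb}" for out, fb in examples]
    blocks.append(f"OUTPUT: {new_output}\nFEEDBACK:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Made-up example pair, based on the e-mail illustration in this article.
    examples = [("Data now, please.", "Too blunt; phrase it as a polite request.")]
    print(build_feedback_prompt(examples, "Send the report."))
```

A refinement prompt could be built the same way from (output, feedback, improved output) triples, which is how a single model can plausibly serve both roles without any additional training.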

In a nutshell, their contributions are:

  1. To help LLMs perform better on a variety of tasks, they propose SELF-REFINE, a novel technique that uses iterative feedback to improve results. Unlike previous efforts, their method requires only a single LLM, without reinforcement learning or supervised training data.
  2. They conducted large-scale experiments on seven different tasks (review rewriting, acronym generation, story generation, code rewriting, response generation, constrained generation, and toxicity removal) and found that SELF-REFINE improves performance by at least 5%, and sometimes by more than 40%, over direct generation from powerful generators such as GPT-3.5 and GPT-4.

Check out the paper, code, and project. All credit for this research goes to the researchers of this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing a Bachelor’s Degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time on projects aimed at harnessing the power of machine learning. His research interest is image processing and his passion is building solutions around it. He loves connecting with people and collaborating on interesting projects.
