GPT-4 Leads Instruction Tuning for Large-Scale Language Models: Facilitates Generalization Capabilities for Real-World Tasks

https://instruction-tuning-with-gpt-4.github.io/

Large language models (LLMs) have demonstrated impressive generalization skills, such as in-context learning and chain-of-thought reasoning. To help LLMs follow plain-language instructions and complete real-world tasks, researchers have focused on instruction tuning techniques. This can be done by supervised fine-tuning on publicly available benchmarks and datasets that have been enriched with manually written or automatically generated instructions, or by fine-tuning with human-annotated prompts and feedback. Either way, the model is fine-tuned on examples of tasks paired with instructions.

Within instruction tuning research, efficient methods have been developed to improve the zero-shot and few-shot generalization of LLMs. One of these techniques, Self-Instruct tuning, aligns LLMs with human intent by learning from instruction-following data generated by state-of-the-art, instruction-tuned teacher LLMs. The recent success of ChatGPT and GPT-4 offers many opportunities to improve open-source LLMs through instruction tuning. LLaMA, a family of open-source LLMs, performs on par with commercial LLMs such as GPT-3.

Because it performs well and costs little, Self-Instruct tuning has been readily adopted to train LLaMA to follow instructions. For example, Stanford Alpaca uses 52K instruction-following samples generated by GPT-3.5, while Vicuna uses about 700K instruction-following samples shared by ChatGPT users. The researchers are the first to propose using GPT-4 as the teacher for Self-Instruct tuning, with the aim of advancing the state of the art in instruction tuning for LLMs.
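To make "instruction-following sample" concrete, here is a minimal Python sketch of how one such sample might be turned into text for supervised fine-tuning. The field names (`instruction`, `input`, `output`) and the prompt template are illustrative assumptions, not details given in this article.

```python
from dataclasses import dataclass


@dataclass
class InstructionSample:
    # Hypothetical record layout; the field names are an assumption.
    instruction: str
    input: str
    output: str


def to_training_text(sample: InstructionSample) -> tuple[str, str]:
    """Return (prompt, full_text) for one sample.

    The template below is a made-up example. The prompt is what the model
    sees; full_text adds the teacher's answer that the model learns to emit.
    """
    prompt = f"### Instruction:\n{sample.instruction}\n\n"
    if sample.input:
        # Optional context is only included when it is non-empty.
        prompt += f"### Input:\n{sample.input}\n\n"
    prompt += "### Response:\n"
    return prompt, prompt + sample.output


sample = InstructionSample(
    instruction="Translate the sentence into French.",
    input="Good morning.",
    output="Bonjour.",
)
prompt, full_text = to_training_text(sample)
```

In a setup like this, the loss is typically computed only on the tokens after the prompt, so the model learns to produce the response rather than to repeat the instruction.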

The Microsoft researchers' contributions to the study are:

GPT-4 data: They release data generated by GPT-4, including a 52K instruction-following dataset in both English and Chinese, and feedback data in which GPT-4 rates the outputs of three instruction-tuned models.

Models and evaluation: Using the GPT-4-generated data, they built instruction-tuned LLaMA models and a reward model. They measure the quality of the instruction-tuned LLMs with three metrics evaluated on test samples (that is, unseen instructions): human evaluation on three alignment criteria, automatic evaluation using GPT-4 feedback, and ROUGE-L on unnatural instructions.
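Of the three metrics, ROUGE-L is the one that can be computed without a human or a judge model: it scores a response by the longest common subsequence (LCS) of words it shares with a reference answer. The sketch below is a minimal Python version, assuming whitespace tokenization, lowercasing, and a balanced F1 score; the exact scoring configuration used in the study is not stated in this article.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference answer and a model response.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of the response that matches
    recall = lcs / len(ref)      # share of the reference that is covered
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
```

Because the LCS keeps word order but allows gaps, a response that paraphrases lightly still scores well, while a response that reuses the same words in a scrambled order does not.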

This study demonstrates the effectiveness of instruction tuning with GPT-4. The empirical results confirm the value of GPT-4-generated data for LLM instruction tuning and offer practical guidance for building general-purpose instruction-following agents on top of LLMs. The researchers release the 52K English and Chinese instruction-following instances generated with GPT-4, along with model checkpoints fine-tuned from LLaMA. They hope their empirical findings and resources will support the development of open-source, general-purpose LLMs that better align with human values when completing tasks.

This is still a work in progress, and several directions remain to be explored. (i) Data and model scale: the GPT-4 dataset contains 52K samples and the base LLaMA model has 7B parameters, whereas Vicuna uses the 13B LLaMA model and collects around 700K conversation turns (estimated from the multi-turn ShareGPT data). The researchers encourage collecting more GPT-4 instruction-following data, combining it with the ShareGPT data, and training larger LLaMA models to improve performance. (ii) RLHF: the reward model is currently used in the decoding stage, which suggests that comparison data can provide useful feedback for LLM training. A natural next step is to keep training LLMs with reward models, for example through reinforcement learning with machine-generated feedback. Both the GPT-4-generated data and the codebase have been released.
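One common way to use a reward model at decoding time is best-of-n selection: sample several candidate responses and return the one the reward model scores highest. The Python sketch below assumes that approach for illustration; the article does not spell out the exact decoding procedure, and `toy_reward` is a hypothetical stand-in for a trained reward model.

```python
from typing import Callable, Sequence


def rerank_with_reward(
    instruction: str,
    candidates: Sequence[str],
    reward_fn: Callable[[str, str], float],
) -> str:
    """Return the candidate with the highest reward (first one wins ties)."""
    if not candidates:
        raise ValueError("need at least one candidate response")
    return max(candidates, key=lambda c: reward_fn(instruction, c))


def toy_reward(instruction: str, response: str) -> float:
    # Hypothetical placeholder: counts instruction words echoed in the
    # response. A real reward model would be a trained neural network.
    wanted = set(instruction.lower().split())
    return float(len(wanted & set(response.lower().split())))


best = rerank_with_reward(
    "name a primary color",
    ["I do not know", "red is a primary color"],
    toy_reward,
)
```

Selection of this kind leaves the model's weights unchanged, which is why training the LLM against the reward model is listed as a separate future direction.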


Check out the paper, GitHub repository, and project page. All credit for this research goes to the researchers on this project.

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing a Bachelor’s Degree in Data Science and Artificial Intelligence from the Indian Institute of Technology (IIT), Bhilai. He spends most of his time on projects aimed at harnessing the power of machine learning. His research interest is image processing and his passion is building solutions around it. He loves connecting with people and collaborating on interesting projects.
