
Large language models (LLMs) such as GPT-3 and ChatGPT far outperform standard supervised machine learning techniques on complex reasoning tasks such as mathematical problem solving and code generation. The key to unlocking these advanced reasoning abilities is Chain of Thought (CoT): the model's ability to generate intermediate reasoning steps before arriving at the final answer, much as humans mentally break a complex problem into smaller pieces. CoT can be elicited by training the model on examples enriched with intermediate reasoning steps, or by using few-shot prompts that instruct the model to generate CoTs.
Now, you might think that the content of these intermediate steps is what lets the model reason better. Interestingly, though, the researchers found that even when the intermediate steps were wrong or completely random, merely generating them could help the model substantially. It's as if the model says, "Okay, let's think about this step by step," and that alone greatly improves its ability to reason.
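To make the idea concrete, here is a minimal sketch of what CoT prompting looks like in practice. The prompt text, the demonstration problem, and the helper names are illustrative assumptions, not the exact prompts used in the paper or by any specific API.

```python
# Illustrative sketch of Chain-of-Thought prompting (hypothetical prompts,
# not the paper's exact setup).

def build_direct_prompt(question: str) -> str:
    """Ask for the answer with no intermediate steps."""
    return f"Q: {question}\nA:"

def build_cot_prompt(question: str) -> str:
    """Prepend a worked example whose answer is written out step by step,
    nudging the model to generate intermediate reasoning before the
    final answer."""
    demo = (
        "Q: Roger has 5 balls. He buys 2 cans with 3 balls each. "
        "How many balls does he have now?\n"
        "A: Let's think step by step. Roger starts with 5 balls. "
        "2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.\n\n"
    )
    return demo + f"Q: {question}\nA: Let's think step by step."

prompt = build_cot_prompt("A train travels 60 km per hour. How far in 3 hours?")
print(prompt)
```

The only difference between the two prompts is the step-by-step demonstration; the study's surprising finding is that generating such steps helps even when their content is unreliable.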
So the researchers wanted to understand why this "chain of thought" approach is so powerful for transformers, the architecture behind GPT-3 and similar models. They drew on circuit complexity theory, using the language of computational complexity classes such as NC, AC, and TC to analyze the problem.
Essentially, they discovered that without chain of thought, a transformer can efficiently perform only parallel computations: it can solve problems that decompose into independent subtasks computable simultaneously.
However, many complex reasoning tasks are inherently serial: each step depends on the result of the previous one. This is where chain of thought becomes very useful for transformers. By generating step-by-step inferences, the model can perform more serial computation than it could without CoT.
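An inherently serial task can be sketched as iterated composition of permutations, where each step needs the output of the step before it, so the work cannot obviously be split into independent parallel pieces. Writing out every intermediate state mirrors what CoT lets a transformer do. This task and the function names are illustrative, not the paper's exact benchmark.

```python
# Sketch of an inherently serial computation (illustrative task, not the
# paper's exact benchmark): apply a sequence of permutations one by one,
# where each step depends on the previous step's output.

def apply_permutation(state, perm):
    """Apply one permutation to the current state."""
    return [state[i] for i in perm]

def compose_with_trace(state, perms):
    """Apply permutations sequentially, recording every intermediate
    state -- the analogue of a chain of thought."""
    trace = [state]
    for perm in perms:
        state = apply_permutation(state, perm)
        trace.append(state)
    return state, trace

perms = [[1, 2, 0], [2, 0, 1], [0, 2, 1]]
final, trace = compose_with_trace([0, 1, 2], perms)
print(final)
```

Without the trace, a shallow parallel model would have to collapse the whole chain in a fixed number of layers; emitting the intermediate states lets each step be computed from the previous one.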
The researchers prove theoretically that while a basic transformer without CoT can only solve problems up to a certain level of complexity, allowing a polynomial number of CoT steps makes the transformer, at least in principle, powerful enough to solve a far broader class of computationally hard problems.
To support their theory, they also conducted experiments on a variety of arithmetic tasks, including some that can be parallelized and some that require inherently sequential computation. Sure enough, they found that without CoT the transformer struggled on the sequential tasks, while enabling CoT significantly improved performance, especially when the transformer model was relatively small and shallow.
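A sequential arithmetic task in this spirit can be sketched as evaluating a chain of operations strictly left to right modulo a small prime, where each partial result feeds the next step. The task format and function names here are assumptions for illustration, not the paper's exact datasets.

```python
# Sketch of a sequential arithmetic task (illustrative, not the paper's
# exact dataset): evaluate a token chain strictly left to right mod p,
# so each partial result depends on the previous one.
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate_with_steps(tokens, p):
    """Evaluate e.g. ['3', '+', '4', '*', '2'] left to right mod p,
    returning the final value and all partial results (the CoT)."""
    acc = int(tokens[0]) % p
    steps = [acc]
    for op, num in zip(tokens[1::2], tokens[2::2]):
        acc = OPS[op](acc, int(num)) % p
        steps.append(acc)
    return acc, steps

value, steps = evaluate_with_steps(["3", "+", "4", "*", "2"], 7)
print(value, steps)
```

A model trained to emit the partial results in `steps` before the final value is doing exactly the kind of step-by-step serial computation that CoT enables.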
In essence, chain of thought is a simple but powerful trick that greatly improves the reasoning capabilities of transformer models like GPT-3, letting them tackle complex tasks requiring sequential logic where a purely parallel model would fail.
Check out the paper. All credit for this study goes to the researchers of this project.

Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his bachelor's degree at the Indian Institute of Technology (IIT), Kanpur. He is a machine learning enthusiast, deeply passionate about research and the latest advances in deep learning, computer vision, and related fields.
