ChatGPT is trending and used by millions of people every day. It makes everyday work easier with human-like capabilities such as answering questions, generating unique and creative content, summarizing large volumes of text, completing code, and powering genuinely useful virtual assistants. Developed by OpenAI, ChatGPT is built on the GPT-3.5 and GPT-4 (Generative Pre-trained Transformer) architectures. GPT-4, the latest language model released by OpenAI, is natively multimodal: unlike previous versions, it accepts input as both text and images. Other Large Language Models (LLMs), such as PaLM, LLaMA, and BERT, are also used across application areas including healthcare, e-commerce, finance, and education.
In a recently published research paper, a team of researchers highlighted the gap between LLMs' superior performance on complex tasks and their struggles on seemingly simple ones. To explore the capabilities and limitations of Transformer-based LLMs, the team ran experiments on three representative compositional tasks: multi-digit multiplication, logic grid puzzles, and a classic dynamic programming problem. These tasks require breaking a problem into smaller steps and combining those steps to produce an exact solution.
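To make the notion of a compositional task concrete, here is a minimal Python sketch (our illustration, not code from the paper) that decomposes multi-digit multiplication into the chain of single-digit steps a model would have to carry out and combine:

```python
# Minimal sketch (not from the paper): multi-digit multiplication
# decomposed into the single-digit sub-steps a model must chain together.

def long_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers via explicit partial products."""
    total = 0
    for i, da in enumerate(reversed(str(a))):      # digits of a, least-significant first
        for j, db in enumerate(reversed(str(b))):  # digits of b
            # Each one-digit product is one step in the reasoning chain;
            # the steps must then be shifted and summed correctly.
            total += int(da) * int(db) * 10 ** (i + j)
    return total

assert long_multiply(123, 456) == 123 * 456
```

Each intermediate product is simple on its own; the difficulty the paper probes is whether a Transformer can reliably compose many such steps.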
To study the limitations of Transformers on compositional tasks requiring multi-step reasoning, the authors proposed two hypotheses. The first is that Transformers tackle these tasks by reducing multi-step reasoning to linearized subgraph matching: they rely on pattern matching and shortcut learning rather than genuinely understanding and applying the underlying computational rules needed to construct correct solutions. This approach yields fast, accurate predictions on patterns similar to those seen during training, but it does not generalize to rare, more complex examples. The second hypothesis is that Transformers may have inherent limitations when solving complex compositional tasks with unusual patterns: early computational errors can propagate and compound into severe errors in later steps, preventing the model from reaching a correct solution.
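A back-of-the-envelope calculation (our illustration, not a result from the paper) shows why the second hypothesis is so damaging: if each reasoning step is correct independently with probability p, a fully correct n-step solution occurs with probability p^n, which collapses quickly as tasks grow deeper:

```python
# Assumption for illustration: steps succeed independently with probability p.
def chain_accuracy(p: float, n_steps: int) -> float:
    """Probability that every one of n_steps sequential steps is correct."""
    return p ** n_steps

for n in (1, 5, 10, 25, 50):
    print(f"{n:3d} steps -> {chain_accuracy(0.95, n):.3f}")
# Even 95% per-step accuracy yields only ~7.7% end-to-end accuracy at 50 steps.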
To investigate these two hypotheses, the authors formulated compositional tasks as computation graphs. These graphs decompose the process of solving a problem into smaller, more manageable sub-function steps, providing a structured measure of problem complexity and allowing the sequence of computational steps to be verbalized as an input sequence to a language model. Additionally, they use information gain to predict, without running the full computation in the graph, which patterns the model is likely to learn from the underlying task distribution.
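The sketch below is a hedged illustration of the computation-graph idea: nodes are sub-steps, edges are data dependencies, and graph size and depth serve as structured complexity measures. The node names model a 2-digit by 1-digit multiplication and are our own, not the paper's:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Toy computation graph for 27 * 4; each node maps to its dependencies.
graph = {
    "p_ones": set(),                    # 7 * 4 = 28
    "p_tens": set(),                    # 2 * 4 = 8 (shifted one place)
    "carry":  {"p_ones"},               # carry out of the ones column
    "result": {"p_ones", "p_tens", "carry"},
}

# Linearizing the graph gives the step sequence a model must reproduce.
order = list(TopologicalSorter(graph).static_order())
print("linearized steps:", order)

def depth(node: str) -> int:
    """Longest dependency chain ending at `node` (reasoning depth)."""
    preds = graph[node]
    return 0 if not preds else 1 + max(depth(p) for p in preds)

print("graph size:", len(graph), "| reasoning depth:", depth("result"))
```

On this framing, harder task instances correspond to larger and deeper graphs, which gives the authors a knob for scaling complexity systematically.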
Based on their empirical findings, the authors propose that Transformers address compositional challenges by reducing multi-step reasoning to linearized subgraph matching. They also provide a theoretical argument, based on an abstract multi-step reasoning problem, showing that Transformer performance degrades rapidly as task complexity increases. This indicates that the models' ability to handle highly complex compositional problems may be inherently limited.
In conclusion, the empirical and theoretical results suggest that Transformers' performance is driven primarily by pattern matching and subgraph matching rather than a full understanding of the underlying reasoning process, which also supports the idea that Transformers will struggle as tasks become increasingly difficult.
Check out the paper. Don't forget to join our 22k+ ML SubReddit, Discord channel, and email newsletter, where we share the latest AI research news, cool AI projects, and more. If you have any questions regarding the above article or if we missed anything, feel free to email us at Asif@marktechpost.com.
Tanya Malhotra is a final year student at the University of Petroleum and Energy Research, Dehradun, graduating with a Bachelor of Science in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
A data science enthusiast with strong analytical and critical-thinking skills, she has a keen interest in learning new skills, leading groups, and managing work in an organized manner.