
Transformer-based models have dominated natural language processing (NLP) since their introduction in 2017. The input text is first split into tokens such as words, morphemes, and punctuation marks before being fed to the transformer. Because the transformer must attend to every token in the input, handling long-form tasks such as book summarization, where the input can easily exceed 100,000 tokens, requires enlarging the context window. To handle arbitrarily long inputs, a group of researchers at Carnegie Mellon University proposes a strategy that enhances model performance by augmenting pretrained encoder-decoder transformers with an external datastore.
Unlimiformer is a new retrieval-based approach that extends the input length pretrained language models can accept at test time. Existing encoder-decoder transformers can be wrapped with Unlimiformer to accept inputs of unbounded length. Given a long input sequence, Unlimiformer builds a datastore over the hidden states of every input token. The decoder then uses its standard cross-attention to query the datastore and attend to the top-k input tokens. The datastore supports sublinear search and can be held in either GPU or CPU memory. Checkpoints of an already trained model can be enhanced with Unlimiformer without any additional training, and its effectiveness can be improved further through fine-tuning.
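To make the mechanism concrete, here is a minimal sketch of how a hidden-state datastore could be built and queried with a kNN index; the use of FAISS, the function names, and the dimensions are illustrative assumptions for this example, not the authors' implementation.

```python
# Minimal sketch: index encoder hidden states and retrieve the top-k
# most relevant ones for a decoder-side query. Illustrative only; not
# Unlimiformer's actual code.
import faiss
import numpy as np

def build_datastore(encoder_hidden_states: np.ndarray) -> faiss.Index:
    """encoder_hidden_states: (num_input_tokens, hidden_dim) float32 array."""
    hidden_dim = encoder_hidden_states.shape[1]
    index = faiss.IndexFlatIP(hidden_dim)   # inner-product (dot-product) search
    index.add(encoder_hidden_states)        # one entry per input token
    return index

def retrieve_top_k(index: faiss.Index, query: np.ndarray, k: int = 16):
    """query: (hidden_dim,) decoder query vector; returns token ids and scores."""
    scores, token_ids = index.search(query.reshape(1, -1), k)
    return token_ids[0], scores[0]

# Example: a 100k-token input with 1024-dimensional hidden states
hidden = np.random.rand(100_000, 1024).astype("float32")
store = build_datastore(hidden)
q = np.random.rand(1024).astype("float32")
ids, scores = retrieve_top_k(store, q, k=16)
print(ids)  # indices of the input tokens the decoder would attend to
```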
The maximum input length of a transformer is limited by the size of the encoder's context window. However, different information may be relevant at different decoding steps, and different attention heads may focus on different aspects of the data. As a result, a fixed context window can be wasteful, since it forces attention onto tokens that a given head may not need to prioritize. Unlimiformer instead lets each head choose its own context window from the entire input at every decoding step. To formalize this, an Unlimiformer lookup is inserted into the decoder before cross-attention is applied: the model performs a k-nearest-neighbor (kNN) search over the external datastore to select the set of tokens to attend to for each decoder layer and attention head.
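A rough sketch of what cross-attention restricted to the retrieved tokens might look like, computed independently per head, is shown below; the shapes and function are assumptions for illustration and not Unlimiformer's actual code, which reuses the model's existing cross-attention weights rather than adding new parameters.

```python
# Sketch of cross-attention over only the k retrieved input tokens,
# independently for each attention head. Illustrative only.
import torch
import torch.nn.functional as F

def knn_cross_attention(query, keys, values, retrieved_ids):
    """
    query:         (num_heads, head_dim)       one decoding step
    keys, values:  (num_heads, num_input_tokens, head_dim)
    retrieved_ids: (num_heads, k) indices of the top-k tokens per head
    """
    num_heads, k = retrieved_ids.shape
    outputs = []
    for h in range(num_heads):
        k_sel = keys[h, retrieved_ids[h]]     # (k, head_dim) selected keys
        v_sel = values[h, retrieved_ids[h]]   # (k, head_dim) selected values
        scores = k_sel @ query[h] / k_sel.shape[-1] ** 0.5
        attn = F.softmax(scores, dim=-1)      # attention over k tokens only
        outputs.append(attn @ v_sel)          # (head_dim,)
    return torch.stack(outputs)               # (num_heads, head_dim)
```

In the real system, the retrieved indices would come from a datastore search like the one sketched earlier, so each head can pick a different set of k tokens at every decoding step.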
To further enhance Unlimiformer's efficacy, the researchers also investigate training approaches. As a preliminary step, they consider alternative training methods that require less compute than standard fine-tuning. They also explore the computationally more expensive option of training Unlimiformer directly.
The code and models from the study can be downloaded from GitHub.
Empirically, the team has tested Unlimiformer on long-document and multi-document summarization tasks, showing that it can summarize documents of up to 350k tokens without truncating the input. Existing pretrained models have also been fine-tuned with Unlimiformer so that they can handle unbounded inputs without newly learned weights or changes to the source code. The researchers believe that future work can further improve the performance of retrieval-augmented LLMs on difficult downstream sequence-to-sequence tasks, for example by adding structure to the datastore or by retrieving embeddings in chunks, and the information retrieval community has developed a variety of approaches that could improve retrieval even further. To make adoption easy, the researchers have released a script that injects Unlimiformer into any model from the HuggingFace Transformers library.
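As an illustration of how an input far longer than the encoder's window could be turned into a single datastore, here is a hedged sketch that encodes a long document in overlap-free chunks and indexes all of its hidden states; the chunk size, the choice of facebook/bart-base, and the use of FAISS are assumptions made for the example and do not reproduce the released script.

```python
# Sketch: encode a document longer than the model's context window by
# splitting it into chunks, then put all chunk hidden states into one
# index. Chunk size and model choice are illustrative assumptions.
import faiss
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
encoder = AutoModel.from_pretrained("facebook/bart-base").encoder

def encode_long_document(text: str, chunk_tokens: int = 512) -> faiss.Index:
    ids = tokenizer(text, return_tensors="pt", truncation=False).input_ids[0]
    index = faiss.IndexFlatIP(encoder.config.d_model)
    for start in range(0, ids.shape[0], chunk_tokens):
        chunk = ids[start:start + chunk_tokens].unsqueeze(0)  # (1, chunk_len)
        with torch.no_grad():
            hidden = encoder(input_ids=chunk).last_hidden_state[0]
        index.add(hidden.numpy().astype("float32"))  # one entry per token
    return index
```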
Dhanshree Shenwai is a Computer Science Engineer with a keen interest in AI applications and strong experience in FinTech companies covering the domains of Finance, Cards & Payments and Banking. She is passionate about exploring new technologies and advancements in today’s evolving world to make life easier for everyone.