
Transformer-based models have dominated natural language processing (NLP) since their introduction in 2017. The input text is first split into tokens such as words, morphemes, and punctuation marks before being fed to the transformer. Because the transformer must attend to every token in the input, handling long-form tasks such as book summarization, where the input can easily exceed 100,000 tokens, requires enlarging the context window. To handle arbitrarily long inputs, a group of researchers at Carnegie Mellon University proposes a strategy that enhances model performance by augmenting pretrained encoder-decoder transformers with an external datastore.
Unlimiformer is a new retrieval-based approach that extends the input length pretrained language models can accept at test time. Existing encoder-decoder transformers can be wrapped with Unlimiformer to accept inputs of unbounded length. Given a long input sequence, Unlimiformer builds a datastore over the hidden states of every input token. The decoder then uses its standard cross-attention to query the datastore and attend to the top-k input tokens. The datastore supports sublinear search and can be held in either GPU or CPU memory. Checkpoints of an already trained model can be enhanced with Unlimiformer without any additional training, and its effectiveness can be improved further through fine-tuning.
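To make the mechanism concrete, here is a minimal sketch of how a hidden-state datastore could be built and queried with a kNN index; the use of FAISS, the function names, and the dimensions are illustrative assumptions for this example, not the authors' implementation.

```python
# Minimal sketch: index encoder hidden states and retrieve the top-k
# most relevant ones for a decoder-side query. Illustrative only; not
# Unlimiformer's actual code.
import faiss
import numpy as np

def build_datastore(encoder_hidden_states: np.ndarray) -> faiss.Index:
    """encoder_hidden_states: (num_input_tokens, hidden_dim) float32 array."""
    hidden_dim = encoder_hidden_states.shape[1]
    index = faiss.IndexFlatIP(hidden_dim)   # inner-product (dot-product) search
    index.add(encoder_hidden_states)        # one entry per input token
    return index

def retrieve_top_k(index: faiss.Index, query: np.ndarray, k: int = 16):
    """query: (hidden_dim,) decoder query vector; returns token ids and scores."""
    scores, token_ids = index.search(query.reshape(1, -1), k)
    return token_ids[0], scores[0]

# Example: a 100k-token input with 1024-dimensional hidden states
hidden = np.random.rand(100_000, 1024).astype("float32")
store = build_datastore(hidden)
q = np.random.rand(1024).astype("float32")
ids, scores = retrieve_top_k(store, q, k=16)
print(ids)  # indices of the input tokens the decoder would attend to
```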
The maximum input length of a transformer is limited by the size of the encoder's context window. However, different information may be relevant at different decoding steps, and different attention heads may focus on different aspects of the data. As a result, a fixed context window can be wasteful, since it forces attention onto tokens that a given head may not need to prioritize. Unlimiformer instead lets each head choose its own context window from the entire input at every decoding step. To formalize this, an Unlimiformer lookup is inserted into the decoder before cross-attention is applied: the model performs a k-nearest-neighbor (kNN) search over the external datastore to select the set of tokens to attend to for each decoder layer and attention head.
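A rough sketch of what cross-attention restricted to the retrieved tokens might look like, computed independently per head, is shown below; the shapes and function are assumptions for illustration and not Unlimiformer's actual code, which reuses the model's existing cross-attention weights rather than adding new parameters.

```python
# Sketch of cross-attention over only the k retrieved input tokens,
# independently for each attention head. Illustrative only.
import torch
import torch.nn.functional as F

def knn_cross_attention(query, keys, values, retrieved_ids):
    """
    query:         (num_heads, head_dim)       one decoding step
    keys, values:  (num_heads, num_input_tokens, head_dim)
    retrieved_ids: (num_heads, k) indices of the top-k tokens per head
    """
    num_heads, k = retrieved_ids.shape
    outputs = []
    for h in range(num_heads):
        k_sel = keys[h, retrieved_ids[h]]     # (k, head_dim) selected keys
        v_sel = values[h, retrieved_ids[h]]   # (k, head_dim) selected values
        scores = k_sel @ query[h] / k_sel.shape[-1] ** 0.5
        attn = F.softmax(scores, dim=-1)      # attention over k tokens only
        outputs.append(attn @ v_sel)          # (head_dim,)
    return torch.stack(outputs)               # (num_heads, head_dim)
```

In the real system, the retrieved indices would come from a datastore search like the one sketched earlier, so each head can pick a different set of k tokens at every decoding step.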
To further enhance Unlimiformer's efficacy, the researchers also investigate training approaches. As a preliminary step, they consider alternative training methods that require less compute than standard fine-tuning. They also explore the computationally more expensive option of training Unlimiformer directly.
The code and models from the study can be downloaded from GitHub.
Empirically, the team has tested Unlimiformer on long-document and multi-document summarization tasks, showing that it can summarize documents of up to 350k tokens without truncating the input. Existing pretrained models have also been fine-tuned with Unlimiformer so that they can handle unbounded inputs without newly learned weights or changes to the source code. The researchers believe that future work can further improve the performance of retrieval-augmented LLMs on difficult downstream sequence-to-sequence tasks, for example by adding structure to the datastore or by retrieving embeddings in chunks, and the information retrieval community has developed a variety of approaches that could improve retrieval even further. To make adoption easy, the researchers have released a script that injects Unlimiformer into any model from the HuggingFace Transformers library.
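As an illustration of how an input far longer than the encoder's window could be turned into a single datastore, here is a hedged sketch that encodes a long document in overlap-free chunks and indexes all of its hidden states; the chunk size, the choice of facebook/bart-base, and the use of FAISS are assumptions made for the example and do not reproduce the released script.

```python
# Sketch: encode a document longer than the model's context window by
# splitting it into chunks, then put all chunk hidden states into one
# index. Chunk size and model choice are illustrative assumptions.
import faiss
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
encoder = AutoModel.from_pretrained("facebook/bart-base").encoder

def encode_long_document(text: str, chunk_tokens: int = 512) -> faiss.Index:
    ids = tokenizer(text, return_tensors="pt", truncation=False).input_ids[0]
    index = faiss.IndexFlatIP(encoder.config.d_model)
    for start in range(0, ids.shape[0], chunk_tokens):
        chunk = ids[start:start + chunk_tokens].unsqueeze(0)  # (1, chunk_len)
        with torch.no_grad():
            hidden = encoder(input_ids=chunk).last_hidden_state[0]
        index.add(hidden.numpy().astype("float32"))  # one entry per token
    return index
```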
Dhanshree Shenwai is a Computer Science Engineer with a keen interest in AI applications and strong experience in FinTech companies covering the domains of Finance, Cards & Payments and Banking. She is passionate about exploring new technologies and advancements in today’s evolving world to make life easier for everyone.