Imagine you are a lawyer using an AI chatbot. You upload the relevant legal documents to your favorite bot and start issuing prompts. Soon, the bot is answering with ease. What you don't see, however, is the incredible amount of memory, computing power, and energy being consumed in the background.
“The AI's internal representation of a 70,000-word legal document can consume more than 100 gigabytes of valuable GPU memory. To put in context how large this representation is, the raw text of the same document is only 400 kilobytes,” says Eyuboglu. “All of this memory consumption makes it extremely expensive and slow to produce a chatbot response.”
Eyuboglu is the first author of a new preprint paper with an interesting solution to this problem. He calls them “cartridges.” A cartridge is a compact memory module that is trained offline to represent a document or other text, allowing AI bots to answer queries quickly without having to reprocess the complete document. Eyuboglu says cartridges work on all sorts of textual information, including legal documents, computer code, personal files, textbooks, and patient medical records.
“Today's AI systems do a good job of adapting their responses to small amounts of context. Think of a few pages of text. But unfortunately, today's systems decline in performance and efficiency as contexts grow,” says co-author Simran Arora. “With cartridges, we are looking for ways to more efficiently and effectively expand the amount of context we can provide to our models.”
By storing the context in these compact cartridges, Arora and Eyuboglu, together with their co-authors and their advisor, Professor Chris Ré, found that they could reduce memory requirements by an order of magnitude. Cartridges use almost 40 times less memory, and the researchers say they increase the bot's word output about 25-fold compared with traditional in-context learning (ICL) methods. This study was funded in part by the Stanford Institute for Human-Centered AI.
New Horizon
The innovation came from a relatively modest concept. “The same document is often referenced by many queries, so we invest a lot of compute upfront to prepare the cartridge,” says Eyuboglu. “And then, when you get more queries down the line, you can respond very quickly.”
This is not the first time researchers have tried to reduce the memory load of AI, but previous efforts invested relatively little computation in the compression process. The memory footprint was small, yes, but those gains came at a high cost: poor answers. In contrast, cartridges consume less memory while still generating high-quality answers. This is possible because each cartridge is produced through a highly compute-intensive process. “This trade-off is desirable when the context is shared across many queries and the cost of producing the cartridge can be spread across them,” says Eyuboglu.
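The trade-off Eyuboglu describes is a matter of simple arithmetic, and a short sketch can make it concrete. The Python below is only an illustration: the function names and every cost figure in it are hypothetical, chosen for round numbers rather than taken from the paper. It shows how a one-time upfront cost shrinks per query as the number of queries grows, and how many queries it takes before the investment pays off.

```python
import math


def cost_per_query(upfront_cost, per_query_cost, num_queries):
    """Average cost per query when a one-time upfront cost is spread over all queries."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return upfront_cost / num_queries + per_query_cost


def break_even_queries(upfront_cost, cheap_per_query, baseline_per_query):
    """Smallest number of queries at which paying the upfront cost is no more
    expensive overall than the baseline. Returns None if it never pays off."""
    saving = baseline_per_query - cheap_per_query
    if saving <= 0:
        return None
    return math.ceil(upfront_cost / saving)


# Hypothetical cost units, for illustration only (not figures from the study):
# preparing a cartridge costs 1000 units, each query then costs 1 unit,
# while answering without a cartridge costs 25 units per query.
print(cost_per_query(1000, 1, 100))     # 11.0 units per query after 100 queries
print(break_even_queries(1000, 1, 25))  # 42 queries to break even
```

With these made-up numbers, a document queried only a handful of times would not justify a cartridge, while one queried hundreds of times clearly would.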
In fact, cartridges are trained through a key innovation that the team calls “self-study.” In self-study, rather than simply memorizing the text, the model carries on a conversation with itself about it, essentially simulating queries that a real user might ask. These conversations are then trained into the cartridge using standard training algorithms. In this way, the cartridge can handle a wide variety of prompts. This saves time, effort, and memory.
“If you just train on the context with a simple objective like predicting the next token, the model can memorize the document, but it can only regurgitate it,” says Eyuboglu. “That is why the synthetic conversations, the self-study, are important: the model can actually answer general questions and tasks quickly and accurately at a later point.”
Next Steps
Cartridges are not free. Self-study requires the use of a powerful multi-GPU system. Cartridges actually need more energy up front, for production and training, but then less energy in regular use. The key point is that training cartridges can happen offline, when computing power is cheap or demand is low. Cartridges can also be reused across countless queries over large amounts of text.
Future directions Eyuboglu suggests include more efficient training of cartridges, real-world deployment in specific domains such as medicine and law, and perhaps even standard libraries of cartridges for public use. Eyuboglu also points out that cartridges trained on different texts can be combined. This is an interesting finding that could drive future research into cartridges.
What's most exciting for the team is that cartridges can be personalized, presenting a scalable, sustainable path to AI systems that continuously learn from the user's context.
“The recent history of AI has been about building the same huge monolithic model for everyone,” concludes Eyuboglu. “I think we're beginning to see the limits of that approach. This work provides evidence that self-study techniques can offer a scalable path forward.”
