Anastasios Kyrillidis, Noah Harding Assistant Professor of Computer Science at Rice University, is one of 79 recipients of the recently announced Amazon Research Awards (ARA). His proposed research, Efficient and Affordable Transformers for Distributed Platforms, builds on previous breakthroughs in large-scale system optimization.
Anyone who has used voice commands to communicate with applications like Alexa and Siri has interacted with artificial intelligence (AI) through deep learning solutions. As omniscient as they may seem, these applications have limitations that computer scientists continue to challenge. In fields such as natural language processing and computer vision, deep learning models called transformers have gained prominence for their success in making sense of very large datasets. But transformers are costly in both time and memory, and these costs limit their utility in commercial applications and consumer electronics.
Kyrillidis was intrigued by the transformer’s steep resource requirements, which are directly related to how the model attaches importance to each part of its input. He said, “Transformers are everywhere, and no, I’m not talking about Optimus Prime or the Autobots. The transformer is the neural network architecture that underpins most of the recent advances: think of AlphaFold and protein structure prediction, DALL-E 2 and text-to-image synthesis, or ChatGPT and language applications. So far, so good; but the computational and monetary budgets required to train such models are prohibitive for most of us. And by ‘us,’ I mean everyone except a few large technology companies.”
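The cost Kyrillidis describes comes from the attention mechanism itself: every token scores its relevance against every other token, so compute and memory both grow quadratically with sequence length. Below is a minimal NumPy sketch of scaled dot-product self-attention (our illustration, not code from the proposal; the matrix shapes and random inputs are assumptions chosen only for demonstration):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of n tokens.

    X: (n, d) token embeddings; Wq/Wk/Wv: (d, d) projections.
    The score matrix is (n, n), so time and memory grow quadratically
    with sequence length -- the cost Kyrillidis refers to.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # (n, d) output

rng = np.random.default_rng(0)
d = 64
for n in (128, 256):                                   # double the sequence...
    X = rng.normal(size=(n, d))
    W = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]
    out = self_attention(X, *W)
    print(n, "tokens ->", out.shape, "output,", n * n, "attention scores")
```

Doubling the number of tokens quadruples the attention score matrix, which is why long inputs quickly exhaust both time and memory budgets.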
It is not yet clear whether such large computational and monetary budgets are always needed. “Yes, the more data and the bigger the model, the better; and no, there are already scaling laws indicating that finding the sweet spot between dataset size, model size, and computational power is an interesting and important open research question,” Kyrillidis said.
In his transformer proposal, Kyrillidis expressed concern that cost and other limitations restrict access to existing high-budget models, preventing contributions from traditional research stakeholders such as academic institutions. He said: “We see distributed computing as a means to train such large-scale models. Of course, we are not alone; various protocols for distributed training already exist. We depart slightly from these approaches by exploring trade-offs between computation and performance, allowing approximate training dynamics. With this grant, we will focus on the unresolved open question of whether there is redundancy in transformer models that would enable the training of sparse models, leading to smaller and faster, yet more accurate, models.”
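One way to picture the redundancy question is weight pruning: if a trained layer’s parameters are redundant, most of them can be zeroed out with little effect on accuracy. The sketch below is our illustration rather than a method from the proposal (the layer size and sparsity level are arbitrary assumptions); it applies simple magnitude pruning to a dense weight matrix:

```python
import numpy as np

def magnitude_prune(W, sparsity=0.9):
    """Zero out the smallest-magnitude entries of W, keeping a
    (1 - sparsity) fraction. If the weights are redundant, the sparse
    matrix behaves much like the dense one at a fraction of the cost."""
    k = int(W.size * sparsity)
    threshold = np.partition(np.abs(W).ravel(), k)[k]
    mask = np.abs(W) >= threshold
    return W * mask, mask

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))              # stand-in for one dense layer
W_sparse, mask = magnitude_prune(W, sparsity=0.9)
print(f"kept {mask.mean():.0%} of the weights")   # roughly 10% remain nonzero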
Several graduate students in Kyrillidis’ OptimaLab have already expressed interest in working on the ARA research. Chen Dun and Jasper Liao are two OptimaLab members working closely on these problems from practical and theoretical perspectives: Chen Dun has led the group’s work on efficient large-scale neural network training (the IST project), and Jasper Liao has contributed substantially to the theoretical understanding of these efficient techniques.
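IST here refers to independent subnetwork training, in which a network’s neurons are partitioned across workers, each worker trains only its own small subnetwork for a few local steps, and the slices are then stitched back together. The toy sketch below is our simplification under stated assumptions (a one-hidden-layer regression model, synthetic data, and a sequential loop standing in for parallel workers), not the lab’s implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy one-hidden-layer regression model: y = W2 @ relu(W1 @ x)
d_in, d_hidden, d_out, n_workers = 8, 16, 1, 4
W1 = rng.normal(size=(d_hidden, d_in)) * 0.1
W2 = rng.normal(size=(d_out, d_hidden)) * 0.1

def local_sgd(W1_part, W2_part, X, y, lr=1e-2, steps=10):
    """Train one subnetwork slice independently on local data."""
    for _ in range(steps):
        H = np.maximum(W1_part @ X.T, 0.0)    # (h_part, batch) hidden acts
        pred = W2_part @ H                     # (d_out, batch) predictions
        err = pred - y.T                       # residual for squared loss
        gW2 = err @ H.T / X.shape[0]
        gH = (W2_part.T @ err) * (H > 0)       # backprop through ReLU
        gW1 = gH @ X / X.shape[0]
        W1_part -= lr * gW1
        W2_part -= lr * gW2
    return W1_part, W2_part

X = rng.normal(size=(256, d_in))
y = X @ rng.normal(size=(d_in, d_out))         # synthetic targets

# One IST-style round: partition hidden neurons across workers, train
# each subnetwork independently, then reassemble the slices.
parts = np.array_split(np.arange(d_hidden), n_workers)
for idx in parts:                              # sequential stand-in for
    W1[idx], W2[:, idx] = local_sgd(           # parallel workers
        W1[idx].copy(), W2[:, idx].copy(), X, y)
```

Because each worker touches only its own slice, communication is limited to the occasional reassembly step; the price is that the training dynamics are approximate, which is exactly the computation-versus-performance trade-off the proposal explores.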
Amazon funded the recently announced awards in four areas: Prime Video, Automated Reasoning, Amazon Sustainability, and Amazon Web Services (AWS) AI. Kyrillidis’ proposal falls into the largest category, AWS AI. In addition to unrestricted funds, recipients gain access to more than 300 Amazon public datasets and promotional credits for AWS AI/ML services and tools, and have the opportunity to participate in informational sessions with Amazon scientists and engineers.
“Besides funding, access to Amazon’s public datasets and AI/ML tools is important to our work,” said Kyrillidis. “It often happens that a solution looks favorable, but not all the aspects (negative or positive) that characterize its performance are apparent. For example, with only one dataset to work with, a solution may (over time) overfit to perform well on that particular dataset, making it impossible to generalize to other scenarios. The more datasets you can work with, the more likely your algorithms are to reveal interesting behavior; this raises further open questions and advances research. Finally, as datasets become more accessible across different applications, such as computer vision and language tasks, the need for algorithms that adapt across task modalities may become apparent.”
The grant is a one-year, unrestricted award whose purpose is to fund the development of open-source tools and research that benefit the machine learning community at large, or impactful research that uses machine learning tools from AWS. Kyrillidis believes more such efforts should be encouraged, so that traditional players, such as academic research institutions, are willing to contribute to open-source communities and resources and can re-establish themselves in modern large-scale research.
