Google says its new AI training technique is significantly faster.

Researchers at Google DeepMind have unveiled a new way to accelerate AI training, drastically reducing the computational resources and time required for the task. According to a recent research paper, this new approach to a normally energy-intensive process could make AI development faster and cheaper, which could also be good news for the environment.

“Our approach, multimodal contrastive learning with joint example selection (JEST), surpasses state-of-the-art models with up to 13x fewer iterations and 10x less computation,” the study states.

The AI industry is known for its high energy consumption. Large-scale AI systems like ChatGPT demand significant processing power, which in turn requires large amounts of electricity, as well as water to cool the hardware. Microsoft's water consumption, for example, reportedly jumped 34% from 2021 to 2022, largely due to increased demand for AI computing, and ChatGPT has been estimated to consume roughly half a liter of water for every 5 to 50 prompts.

The International Energy Agency (IEA) compared AI’s electricity demands to the energy profile of the oft-maligned cryptocurrency mining industry, predicting that data center electricity consumption will double between 2022 and 2026.

But approaches like JEST could offer a solution: by optimizing the data selected for AI training, Google says JEST can significantly reduce the number of iterations and the computing power required, lowering overall energy consumption. That aligns with broader efforts to improve the efficiency of AI technology and reduce its environmental impact.

If the technique proves effective at scale, training a model would require only a fraction of the power it takes today, meaning developers could build more powerful AI tools with the same resources, or build new models with far fewer.

How JEST works

JEST works by selecting complementary batches of data to maximize what an AI model can learn from them. Unlike traditional methods that select individual examples, the algorithm considers the composition of the set as a whole.

For example, say you're learning multiple languages. Instead of studying English, German, and Norwegian separately, in order of difficulty, it might be more effective to study all three together, so that your knowledge of one supports your learning of the others.

Google has adopted a similar approach with success.

“We demonstrate that jointly selecting batches of data is more effective for learning than selecting examples independently,” the researchers wrote in their paper.

To do so, Google's researchers used “multimodal contrastive learning” to identify the dependencies between data points within a batch, a process that significantly reduces the computational power required while improving the speed and efficiency of AI training.
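To make the idea concrete, here is a minimal sketch of a multimodal contrastive loss of the kind used in models like CLIP, written in plain NumPy. This is illustrative, not DeepMind's code; the function name and array shapes are assumptions. The detail that matters for JEST is that every example is contrasted against everything else in the batch, so an example's loss, and hence its usefulness, depends on which other examples it is batched with.

```python
import numpy as np

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """CLIP-style contrastive (InfoNCE) loss over a batch of image/text pairs.

    img_emb, txt_emb: (batch, dim) arrays of L2-normalized embeddings,
    where row i of each array comes from the same image/caption pair.
    Each example's loss depends on every other example in the batch
    (the in-batch negatives), so the loss is a property of the batch
    as a whole, not of examples in isolation.
    """
    logits = img_emb @ txt_emb.T / temperature      # (batch, batch) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)        # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()  # true pairs sit on the diagonal

    # Symmetrize over the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

That batch-level coupling is exactly what makes joint selection meaningful: swapping a single example changes the scores of all the others.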

According to Google, the key to this approach is starting with a pre-trained reference model to guide the data selection process. This technique lets training focus on data resembling a smaller, high-quality curated dataset, further optimizing training efficiency.
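The paper formalizes this scoring as “learnability”: a batch is worth training on when the current learner still finds it hard but the stronger reference model does not, which filters out both trivial and noisy data. Below is a minimal sketch reusing the contrastive_loss function above; the learner and reference embedding functions are hypothetical stand-ins, not DeepMind's API.

```python
def learnability(batch, learner, reference):
    """Score a candidate batch as learner loss minus reference loss.

    `learner` and `reference` are assumed to map a batch of raw
    image/text pairs to (img_emb, txt_emb) arrays. A high score means
    the learner has not mastered the batch yet (high learner loss),
    but the data is learnable in principle (low reference loss).
    """
    learner_loss = contrastive_loss(*learner(batch))
    reference_loss = contrastive_loss(*reference(batch))
    return learner_loss - reference_loss
```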

“The quality of a batch is also a function of its composition, in addition to the summed quality of its data points considered independently,” the paper explains.

Experiments described in the paper showed robust performance gains across a range of benchmarks, including significant improvements in learning speed and resource efficiency when training with JEST on the popular WebLI dataset.

The researchers also found that the algorithm accelerated training by quickly identifying highly learnable sub-batches, homing in on data points that “match” each other. This technique, dubbed “data quality bootstrapping,” prioritizes quality over quantity and proved well-suited to AI training.
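Joint selection can then be sketched as an iterative filter over a large “super-batch”: repeatedly add small chunks of examples whose learnability is highest in combination with what has already been picked. The paper describes a more efficient blockwise sampling scheme; the greedy stand-in below, with hypothetical names throughout, is only meant to show where composition enters the process.

```python
import numpy as np

def select_subbatch(super_batch, learner, reference,
                    batch_size=64, chunk_size=16, n_candidates=8):
    """Greedily assemble a training sub-batch from a larger super-batch.

    Each candidate chunk is scored *together with* the examples already
    selected, so a chunk is kept for how well it complements the batch
    so far, not for its standalone score. Assumes len(super_batch) is
    comfortably larger than batch_size. Simplified illustration, not
    the paper's exact sampler.
    """
    remaining = list(range(len(super_batch)))
    selected = []
    while len(selected) < batch_size:
        k = min(chunk_size, batch_size - len(selected))
        best_chunk, best_score = None, -np.inf
        # Try a handful of random candidate chunks from the remainder.
        for _ in range(n_candidates):
            chunk = list(np.random.choice(remaining, size=k, replace=False))
            candidate = [super_batch[i] for i in selected + chunk]
            score = learnability(candidate, learner, reference)
            if score > best_score:
                best_chunk, best_score = chunk, score
        selected += best_chunk
        remaining = [i for i in remaining if i not in best_chunk]
    return [super_batch[i] for i in selected]
```

Because the already-selected examples appear in every candidate's score, the procedure naturally favors chunks that complement the rest of the batch, which is the compositional effect the quote above describes.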

“Reference models trained on small, curated datasets can effectively guide the curation of much larger datasets, making it possible to train models that significantly exceed the quality of the reference models on many downstream tasks,” the paper states.

Edited by Ryan Ozawa.



