A new approach to scaling laws could change the way AI models are trained

Big tech companies are tight-lipped about how much it costs to train large language models like ChatGPT, Claude, and Gemini, but estimates range from hundreds of millions of dollars to $1 billion per training run. This huge cost means that AI developers want to train a new model only once.

To contain costs and increase the reliability of these large single training runs, developers have begun to rely on what are known as scaling laws, which use the behavior of many smaller models to predict how language models will perform as they scale up during training. Scaling laws have now become essential AI infrastructure, but even these scaling methods require expensive computing.

Now, academics have developed a new approach to scaling that significantly reduces the computation required, cutting both the time and the cost of scaling.

“Before scaling laws were proven, the most famous developers bet the farm on them, and it happened to work out. They made big strategic decisions about how to tune and design their models, and they used scaling laws to estimate performance, and they were right. But scaling was still expensive; it was just cheaper than the alternatives,” said Sanmi Koyejo, assistant professor of computer science and senior author of the new research. Accepted at the International Conference on Machine Learning, the work shows a smart way to improve scaling while reducing computational demands by as much as 99%.

“The core question we’re studying is very simple,” says San Truong, a graduate student in Koyejo’s lab and lead author of the paper. “Can we use algorithms to improve scaling?”

Required architecture

In the new paper, Koyejo, Truong, and colleagues show how scaling algorithms can be tuned to significantly reduce computational demands. They call their framework Item Response Scaling Laws (IRSL). The name refers to item response theory, the same concept used in standardized academic assessments such as the SAT.

IRSL borrows principles from measurement science (psychometrics) and education to model the relationship between test takers and the questions they are asked, increasing the difficulty of the questions in successive rounds as the model answers them correctly. This significantly reduces the number of queries needed to accurately estimate a model's capability, Koyejo says. The researchers have shown that IRSL can achieve comparable or better prediction accuracy with far fewer queries, saving time and cost while improving performance.
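The adaptive questioning described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the simplest item response model (the one-parameter Rasch model), a hypothetical question bank with randomly drawn difficulties, and a simulated test taker whose true ability is known only to the simulation. All function names are invented for illustration.

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) model: chance that ability `theta` answers an item of difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, grid):
    """Pick the grid point with the highest log-posterior under a standard normal prior."""
    def log_post(theta):
        lp = -0.5 * theta * theta  # log of the N(0, 1) prior, up to a constant
        for b, correct in responses:
            p = p_correct(theta, b)
            lp += math.log(p if correct else 1.0 - p)
        return lp
    return max(grid, key=log_post)

def adaptive_test(true_theta, difficulties, n_items, rng):
    """Ask `n_items` questions, each chosen to match the current ability estimate."""
    grid = [i / 10.0 for i in range(-40, 41)]  # candidate abilities from -4.0 to 4.0
    remaining = list(difficulties)
    responses = []
    theta_hat = 0.0  # start from the prior mean
    for _ in range(n_items):
        # Under the Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta_hat))
        remaining.remove(b)
        # Simulated answer (assumption: the test taker follows the Rasch model).
        correct = rng.random() < p_correct(true_theta, b)
        responses.append((b, correct))
        theta_hat = estimate_ability(responses, grid)
    return theta_hat, responses

rng = random.Random(0)
bank = [rng.uniform(-3.0, 3.0) for _ in range(10_000)]  # hypothetical question bank
theta_hat, responses = adaptive_test(true_theta=1.0, difficulties=bank, n_items=50, rng=rng)
print(f"estimated ability after {len(responses)} questions: {theta_hat:.1f}")
```

Because each question is picked near the current estimate, a correct answer pushes the estimate up and the next question gets harder, which is why a few dozen well-chosen questions can stand in for a much larger bank in this toy setting.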

This is a kind of statistical shortcut. Rather than asking every question multiple times for every model, Koyejo and Truong use the information from each answer more efficiently. The number of potential questions in traditional scaling can be 10,000 or more. Multiplying that by the number of models and by the number of times each answer needs to be sampled means that a scaling run could involve 10 trillion queries. IRSL, on the other hand, achieves comparable accuracy with just 50 questions, a reduction of over 99%.
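The arithmetic behind that reduction can be checked in a few lines. The question counts (10,000 and 50) come from the article; the number of models and the number of samples per answer are hypothetical placeholders, since the article does not give them, so the totals here are not meant to reproduce the 10 trillion figure. The point is that the relative saving depends only on the question counts.

```python
def total_queries(n_models, n_questions, n_samples):
    """Queries in a scaling run: every model answers every question n_samples times."""
    return n_models * n_questions * n_samples

N_MODELS = 1_000   # hypothetical, not from the article
N_SAMPLES = 100    # hypothetical, not from the article

full = total_queries(N_MODELS, 10_000, N_SAMPLES)   # traditional question bank
small = total_queries(N_MODELS, 50, N_SAMPLES)      # 50 selected questions

saving = 1.0 - small / full
print(f"{saving:.1%}")  # prints 99.5%
```

Changing `N_MODELS` or `N_SAMPLES` scales both totals equally, so the 99.5% saving stays the same.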

Beyond Big Tech

“Existing frameworks often required running thousands of small models across tens of thousands of benchmark questions to predict outcomes,” Truong explains. “Our approach significantly increases the efficiency and reliability of this process. In some cases, reducing computational effort improves prediction results.”

Koyejo predicts that IRSL’s impact will be greatest in academia, where training costs can be prohibitive, though private developers with deep pockets may also benefit. The goal is to give researchers new tools to help them reason about scaling in a scientifically and statistically rigorous way, Truong said.

“We believe item response scaling is an important step forward,” Koyejo concludes. “This shows that scaling and training in general can be improved. You get a better signal with less work, a counterintuitive combination.”

Contributors include graduate students Rylan Schaefer of Stanford University and Yuheng Tu of the University of California, Los Angeles.

This research was made possible with funding from the National Science Foundation, ARPA-H, the MacArthur Foundation, Schmidt Science, the Stanford Institute for Human-Centered Artificial Intelligence (HAI), OpenAI, Microsoft, and Google.


