Using geometry and physics to explain feature learning in deep neural networks

The folding-ruler analogy discovered by the team can be used to model DNN training in various regimes. Credit: Shi, Pan & Dokmanić.

Machine learning algorithms underpin the capabilities of deep neural networks (DNNs), large language models (LLMs), and other artificial intelligence (AI) systems, which learn to make accurate predictions by analyzing large amounts of data. These networks are made up of layers, each converting input data into “features” that guide the analysis performed by the next layer.

The process by which DNNs learn features is the topic of numerous research studies and is ultimately key to these models' strong performance across a variety of tasks. Recently, some computer scientists have begun exploring the possibility of modeling feature learning in DNNs using physics-based frameworks and approaches.

Researchers at the University of Basel and the University of Science and Technology of China have now derived phase diagrams, the graphs used in thermodynamics to map phases of matter such as solids, liquids and gases, for feature learning in DNNs. Their paper, published in Physical Review Letters, models a DNN as a spring-block chain: a simple mechanical system often used to study the interplay between linear (spring) and nonlinear (friction) forces.

“Cheng and I were at a workshop where there was an inspiring talk on the ‘law of data separation,’” Ivan Dokmanić, the researcher who led the study, told Phys.org. “The layers of deep neural networks (but also biological neural networks such as the human visual cortex) process their input by gradually distilling and simplifying it.

“The deeper you get into the network, the more regular and geometric these representations become. This means that the representations of different classes of objects (such as cats and dogs) become better separated and easier to tell apart. There are ways to measure this separation.

“In a well-trained neural network, these data separation ‘summary statistics’ often behave in a remarkably simple way.”

The team found that this “law of data separation” holds for networks trained with commonly used “hyperparameters,” such as the learning rate and noise, but not for other hyperparameter choices. They realized that understanding why this happens could shed light on how DNNs learn good features across a wide range of models.
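To make those “summary statistics” concrete, the sketch below shows one common way to quantify data separation at each layer: the ratio of within-class to between-class variability of a layer's representations (smaller means better separated). This is a minimal illustration in the spirit of the law of data separation, not the authors' own code; the `layer_outputs` and `labels` arrays are hypothetical placeholders standing in for representations collected from a trained network.

```python
import numpy as np

def separation_fuzziness(features, labels):
    """Within-class scatter divided by between-class scatter (smaller = better separated)."""
    global_mean = features.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        cls = features[labels == c]
        mu = cls.mean(axis=0)
        within += ((cls - mu) ** 2).sum()
        between += len(cls) * ((mu - global_mean) ** 2).sum()
    return within / between

# Hypothetical placeholders: in practice these would be the representations of a
# held-out batch collected at the output of each layer of a trained network.
rng = np.random.default_rng(0)
layer_outputs = [rng.normal(size=(200, 64)) for _ in range(6)]
labels = rng.integers(0, 10, size=200)

log_separation = [np.log(separation_fuzziness(h, labels)) for h in layer_outputs]
print(log_separation)  # for a well-trained net, these values would fall roughly on a line
```

The “simple behavior” referred to above is that, in well-trained networks, each layer tends to improve such a separation measure by roughly a constant factor, so its logarithm falls roughly linearly with depth.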

“At the same time, we were involved in several geophysics projects in which people use the spring-block model as a phenomenological model of fault and earthquake dynamics,” Dokmanić said. “The phenomenology of data separation reminded me of that. We thought of many other analogies as well; for example, Cheng thought that equal data separation across layers was like an expandable coat hanger, or like a folding ruler.

“Over our winter holidays we exchanged photos and videos of various ‘layered’ household items and tools, including these coat hangers, folding rulers and more.”

After identifying various candidate theoretical models of layered physical systems that could be used to study how DNNs learn features, the researchers ultimately decided to focus on the spring-block model. Such models have already proved valuable for studying a wide range of real-world phenomena, such as earthquakes and the deformation of materials.


A diagram illustrating the team's spring-block theory of feature learning in deep neural networks. Credit: Shi, Pan & Dokmanić.

“This data separation behavior turned out to be eerily similar to that of blocks connected by springs sliding on a rough surface (and also to that of other mechanical systems, such as folding rulers).

“How much a layer simplifies the data corresponds to how much a spring stretches. Nonlinearity in the network corresponds to the amount of friction between the blocks and the surface. Noise can be added to both systems.”

Looking at the two systems through the lens of the law of data separation, Dokmanić and his colleagues found that the behavior of DNNs closely resembles that of a spring-block chain. A DNN responds to the training loss (i.e., the demand that it account for the observed data) by separating the data layer by layer. Similarly, a spring-block chain responds to a pulling force by separating the blocks one by one.

“The more nonlinearity there is, the greater the imbalance between the outer (deep) and inner (shallow) layers: the deep layers learn and separate more.

“However, if you add noise to training, or start jiggling and shaking the spring-block system, the blocks spend some time ‘in the air’ without experiencing friction. This allows the springs to even out the separation across layers.”
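The quoted mechanism can be illustrated with a toy quasi-static simulation of a frictional spring-block chain. This is my own sketch of the analogy as described above, not the authors' model or code, and all parameter values (friction, shaking probability, chain length) are arbitrary choices for illustration: the last block plays the role of the output layer pulled by the training loss, friction stands in for nonlinearity, and random “airborne” updates stand in for training noise.

```python
import numpy as np

def relax_chain(n_layers=10, pull=10.0, friction=0.8, p_airborne=0.0,
                sweeps=4000, k=1.0, seed=0):
    """Quasi-static relaxation of a frictional spring-block chain (toy illustration)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_layers + 1)      # block positions; block 0 stays clamped at the origin
    x[-1] = pull                    # the last block ("output layer") is pulled by the loss
    for _ in range(sweeps):
        for i in range(1, n_layers):                    # interior blocks only
            force = k * (x[i + 1] + x[i - 1] - 2 * x[i])
            mu = 0.0 if rng.random() < p_airborne else friction
            if abs(force) > mu:     # the block slips until friction balances the springs
                x[i] = (x[i + 1] + x[i - 1] - np.sign(force) * mu / k) / 2
    return np.diff(x)               # spring stretches ~ per-layer data separation

print("high friction, no noise:", relax_chain().round(2))
print("high friction, shaking :", relax_chain(p_airborne=0.3).round(2))
```

Run as-is, the high-friction chain concentrates the stretch in the springs nearest the pulled end (the “deep layers separate more” regime), while the shaken chain spreads the stretch much more evenly across the springs, mirroring the equalizing effect of noise described in the quote.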

This recent study introduces a new theoretical approach for studying DNNs and how they learn features over time. In the future, it could help deepen the current understanding of deep learning algorithms and of how they learn to tackle specific tasks.

“Most existing results deal with simplified networks that lack important aspects of the deep nets actually used in practice: depth, nonlinearity, etc.,” explained Dokmanić.

“These works study a single influencing factor in stylized models, but the success of deep nets rests on the accumulation of many factors (depth, nonlinearity, noise, learning rate, normalization, etc.). By contrast, we obtain a general theory, albeit a phenomenological one rather than one derived from first principles.”

The spring-block theory adopted by the researchers has so far proved to be a simple and effective way to understand the ability of DNNs to generalize across a variety of scenarios. In their paper, Dokmanić and his colleagues used it to compute data separation curves for DNNs during training, and found that the shape of these curves is indicative of how well the trained networks perform on unseen data.
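As a rough illustration of what “the shape of these curves” could mean in practice, the sketch below reduces a layerwise separation profile to a single number: the signed deviation of its cumulative curve from the straight “equal separation per layer” line. This is a hypothetical summary statistic of my own, not the quantity used in the paper, and it makes no claim about which shape generalizes better; it only shows how a curve's shape could be summarized and then correlated with test performance.

```python
import numpy as np

def curve_shape(per_layer_separation):
    """Signed deviation of the cumulative separation curve from the equal-per-layer diagonal."""
    s = np.asarray(per_layer_separation, dtype=float)
    cumulative = np.concatenate(([0.0], np.cumsum(s))) / s.sum()   # normalized to [0, 1]
    diagonal = np.linspace(0.0, 1.0, len(cumulative))
    return float(np.mean(cumulative - diagonal))

print(curve_shape([3, 2, 1, 1]))   # early layers do most of the separating -> positive
print(curve_shape([1, 1, 2, 3]))   # late layers do most of the separating  -> negative
```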







A fun video of folding-ruler experiments and DNN training in different regimes. Credit: Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

“We understand in which direction different kinds of noise and nonlinearity will change the shape of the data separation curve, which (potentially) gives us a powerful tool to speed up training on very large nets,” said Dokmanić.

“Most people have strong intuitions about springs and blocks, but not about deep neural nets. Our theory shows that by leveraging intuition about simple mechanical systems, we can make interesting, useful and true statements about deep nets.”

The theoretical models adopted by this team of researchers could soon be used by other theorists and computer scientists to further explore the foundations of deep learning algorithms. In their next studies, Dokmanić and his colleagues hope to use their theoretical approach to explore feature learning from a microscopic perspective.

“We are getting close to having a first explanation of the spring-block phenomenology (or perhaps the folding-ruler phenomenology) in deep nonlinear networks.

“The other direction we are pursuing is to really double down on operationalizing this to improve deep net training, especially for very large transformer-based networks like large language models. Having a cheap proxy for generalization during training, and understanding how to manipulate training to improve generalization, is an alternative route to the scaling laws that are so popular now.”

Beyond carefully designing DNN training to improve models' ability to generalize across tasks, the researchers could also devise diagnostic tools for large neural networks. Such tools could, for instance, help identify the parts of a model that need to be adjusted to boost performance, much as stress maps are used in structural mechanics to pinpoint regions of concentrated stress that could compromise a structure's safety.

“By analyzing the internal load distribution of a neural network, we can find overloaded layers or regions that may indicate problems with generalization.”

Written for you by Ingrid Fadelli, edited by Gaby Clark, and fact-checked and reviewed by Robert Egan.

More information: Cheng Shi et al, Spring-block theory of feature learning in deep neural networks, Physical Review Letters (2025). DOI: 10.1103/ys4n-2tj3

©2025 Science X Network

Citation: Using geometry and physics to explain feature learning in deep neural networks (2025, August 10). Retrieved August 10, 2025 from https://phys.org/news/2025-08-Geometry-physics-feature-deep-neural.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without written permission. The content is provided for information purposes only.




