Research confirms that machine learning struggles to learn certain complex phases of matter



Scientists are increasingly studying the intersection of quantum physics and machine learning. New research by Tarun Advais Kumar, Zou Yijiang, and Amir Reza Negari of the Perimeter Institute for Theoretical Physics and the University of Waterloo, together with Roger G. Melko and Timothy H. Hsieh of the same institutions, reveals fundamental limitations of machine learning algorithms. The researchers demonstrated that certain complex phases of matter, especially those characterized by locally indistinguishable states, pose a computational barrier to unsupervised learning methods such as autoregressive neural networks. Using conditional mutual information and a restricted statistical query model as diagnostic tools, the study links the difficulty of learning a distribution to the presence of nonlocal correlations, and may offer new ways to identify exotic phases of matter and error-correction thresholds in quantum systems.

The study focuses on unsupervised learning, in which algorithms identify patterns in unlabeled data, and shows that autoregressive neural networks struggle to capture the global properties of distributions characterized by locally indistinguishable (LI) states. LI states look identical in every local region but differ globally, which makes them a significant obstacle to accurate data representation. The researchers linked conditional mutual information (CMI), a measure of the statistical dependence between two regions given a third, to the presence of LI states, showing that long-range CMI in a classical distribution implies the existence of a spatially LI partner distribution. By introducing a restricted statistical query model, the work proves that nontrivial phases with long-range CMI, such as those arising from strong-to-weak spontaneous symmetry breaking, are inherently hard to learn. This theoretical finding was verified through extensive simulations in which recurrent, convolutional, and transformer neural networks were trained on syndrome and physical distributions of toric/surface codes subject to bit-flip noise. The results suggest that learning difficulty could serve as a diagnostic for mixed-state phases, phase transitions, and even error-correction thresholds in physical systems. The study further proposes CMI, and more broadly "nonlocal Gibbsness", as a metric for how inherently hard a given distribution is to learn. By bridging machine learning and condensed matter physics, the work offers new perspectives on data analysis and could help in developing more robust and reliable artificial intelligence. Identifying LI and CMI as key indicators of learning difficulty may also have significant implications for AI safety and for the design of algorithms that handle complex real-world data.
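To make long-range CMI concrete, the following minimal Python sketch uses a toy distribution chosen here for illustration (it is not one of the systems in the study): the uniform distribution over n-bit strings with even parity. It computes I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC), with A the first bit, C the last bit, and B every bit in between. The helper names (`even_parity_distribution`, `marginal`, `entropy`, `cmi`) are hypothetical.

```python
import itertools
import math
from collections import Counter


def even_parity_distribution(n):
    """Uniform distribution over n-bit strings whose bits sum to an even number."""
    strings = [s for s in itertools.product((0, 1), repeat=n) if sum(s) % 2 == 0]
    return {s: 1.0 / len(strings) for s in strings}


def marginal(dist, idx):
    """Marginal distribution over the bit positions listed in idx."""
    out = Counter()
    for s, p in dist.items():
        out[tuple(s[i] for i in idx)] += p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def cmi(dist, a, b, c):
    """Conditional mutual information I(A:C|B) in bits, from joint entropies."""
    def h(idx):
        return entropy(marginal(dist, idx))
    return h(a + b) + h(b + c) - h(b) - h(a + b + c)


n = 8
dist = even_parity_distribution(n)
# A = first bit, B = all middle bits, C = last bit
print(round(cmi(dist, [0], list(range(1, n - 1)), [n - 1]), 6))  # → 1.0

# Contrast: independent uniform bits carry no conditional dependence at all
uniform = {s: 1 / 2**n for s in itertools.product((0, 1), repeat=n)}
print(round(cmi(uniform, [0], list(range(1, n - 1)), [n - 1]), 6))  # → 0.0
```

The parity constraint leaves exactly one bit of information shared between the two ends, no matter how far apart they are, which is the kind of long-range CMI the study associates with hard-to-learn distributions.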
The concept of (un)learnability also offers a new computational way to probe mixed-state phases and transitions, with potential applications in both numerical simulations and quantum experiments. The work demonstrates this computational difficulty through the lens of mixed-state phases of matter and their learnability by autoregressive neural networks. To study the challenge of learning distributions with locally indistinguishable (LI) states, the team used a restricted statistical query model. Two quantum states are locally indistinguishable when no local measurement can tell them apart. This framework allows a rigorous theoretical analysis of learning complexity that goes beyond empirical observations of neural network performance. To generate training data, the team used Monte Carlo sampling, creating datasets of 100,000 samples for most systems. Importantly, the exact probability of each sample was computed with tensor network techniques, providing an accurate benchmark for evaluating the neural networks. A notable exception is the 1D transverse-field Ising model, for which a matrix product state (MPS) optimized with the density matrix renormalization group (DMRG) was used both to draw samples and to compute their probabilities. For the surface code, the Monte Carlo samples were modified: spins were grouped into 4-dimensional local degrees of freedom, and artificial spins were added at the boundaries to keep the grouping consistent. The networks were optimized with the AdaBelief optimizer, using a batch size of 128 and a learning rate of 5×10⁻⁵. To ensure robust results, up to 30 copies of each network were trained from different random initializations at each parameter point, each for 400,000 steps.
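A classical analogue of locally indistinguishable states can be checked with a similar toy example (again an illustrative assumption, not a system from the study): the even-parity and odd-parity distributions over n bits agree on every marginal over fewer than n bits, yet their supports are completely disjoint. The sketch below, with hypothetical helper names, compares all marginals up to a chosen size k.

```python
import itertools
from collections import Counter


def parity_distribution(n, parity):
    """Uniform distribution over n-bit strings whose bit sum has the given parity (0 or 1)."""
    strings = [s for s in itertools.product((0, 1), repeat=n) if sum(s) % 2 == parity]
    return {s: 1.0 / len(strings) for s in strings}


def marginal(dist, idx):
    """Marginal distribution over the bit positions listed in idx."""
    out = Counter()
    for s, p in dist.items():
        out[tuple(s[i] for i in idx)] += p
    return dict(out)


def locally_indistinguishable(p, q, n, k):
    """True if p and q agree on every marginal over at most k of the n bits."""
    for size in range(1, k + 1):
        for idx in itertools.combinations(range(n), size):
            mp, mq = marginal(p, idx), marginal(q, idx)
            keys = set(mp) | set(mq)
            if any(abs(mp.get(x, 0.0) - mq.get(x, 0.0)) > 1e-12 for x in keys):
                return False
    return True


n = 6
even, odd = parity_distribution(n, 0), parity_distribution(n, 1)
print(locally_indistinguishable(even, odd, n, k=n - 1))  # → True: all marginals on < n bits match
print(locally_indistinguishable(even, odd, n, k=n))      # → False: the full distributions differ
print(set(even).isdisjoint(odd))                         # → True: no string has both parities
```

Any procedure that only inspects patches of fewer than n bits sees the same statistics for both distributions, even though they never produce a common sample.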
Performance was quantified with the KL divergence, a measure of how one probability distribution differs from another, evaluated on 100,000 samples distinct from the training set. All numerical experiments were implemented with JAX, a high-performance numerical library, and NetKet, a library of tools for quantum many-body physics. To facilitate autoregressive modeling, the Hilbert space (the state space of the system) was defined on a one-dimensional hypercube graph, that is, a chain, even for 2D systems. The analyses show, through both theoretical proofs and extensive numerical experiments, that autoregressive neural networks struggle to learn distributions exhibiting locally indistinguishable (LI) states. Specifically, the work proves a lower bound showing that, within the restricted statistical query model, such distributions cannot be learned in polynomial time, which establishes a fundamental difficulty for this class of distributions. The study shows that conditional mutual information (CMI) serves as a diagnostic for LI: long-range CMI in a classical distribution implies the existence of a spatially LI partner distribution. The restricted statistical query model is designed to reflect gradient-based training of autoregressive neural networks, and within this framework learning LI states proves computationally intractable. Conversely, one-dimensional distributions with short-range CMI can be learned efficiently, consistent with existing results on learning Gibbs states. A toy model illustrates how LI states lead to vanishing gradients during training, preventing effective learning.
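The evaluation protocol described above (exact target log-probabilities, a separate evaluation set, KL divergence) can be mimicked on a parity toy example. This is a minimal sketch under stated assumptions: instead of an RNN, CNN, or transformer built with JAX and NetKet, it uses a hypothetical count-based autoregressive model whose conditionals see only the previous `window` bits, and all function names are illustrative rather than taken from the study.

```python
import math
import random
from collections import defaultdict


def sample_even_parity(n, rng):
    """Draw an n-bit string uniformly from the even-parity set."""
    bits = [rng.randint(0, 1) for _ in range(n - 1)]
    bits.append(sum(bits) % 2)  # last bit fixes the global parity
    return tuple(bits)


def log_p_target(x):
    """Exact log-probability of an even-parity string (stand-in for tensor-network values)."""
    return -(len(x) - 1) * math.log(2)


def fit_windowed_model(samples, window):
    """Hypothetical autoregressive model q(x_i | previous `window` bits), fit by counting.

    Uses add-one smoothing; contexts never seen in training default to 1/2.
    """
    counts = defaultdict(lambda: [1, 1])
    for x in samples:
        for i, b in enumerate(x):
            counts[(i, x[max(0, i - window):i])][b] += 1

    def log_q(x):
        total = 0.0
        for i, b in enumerate(x):
            c = counts[(i, x[max(0, i - window):i])]
            total += math.log(c[b] / (c[0] + c[1]))
        return total

    return log_q


def kl_estimate(samples, log_p, log_q):
    """Monte Carlo estimate of KL(p || q) = E_{x~p}[log p(x) - log q(x)], in nats."""
    return sum(log_p(x) - log_q(x) for x in samples) / len(samples)


rng = random.Random(0)
n = 10
train = [sample_even_parity(n, rng) for _ in range(20_000)]
held_out = [sample_even_parity(n, rng) for _ in range(20_000)]  # independently drawn evaluation set

results = {}
for window in (3, n - 1):
    model = fit_windowed_model(train, window)
    results[window] = kl_estimate(held_out, log_p_target, model)
    print(f"window={window}: estimated KL = {results[window]:.3f} nats")
```

Because every marginal of the parity distribution over fewer than n bits is uniform, a model whose last conditional cannot see all preceding bits can do no better than 1/2 for that bit, so its estimated KL should stay near ln 2 ≈ 0.69 nats however much data it sees; giving the last conditional the full n - 1 bits of context drives the estimate close to zero. This is only an analogy for the locality obstruction, not a reproduction of the paper's vanishing-gradient argument.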
Extensive numerical evidence supports these theoretical predictions across a range of architectures, including recurrent neural networks (RNNs), convolutional neural networks (CNNs), and transformers, and across target systems, including syndrome and physical distributions of noisy quantum and classical error-correcting codes. The relentless pursuit of ever more powerful machine learning models can make it easy to forget that fundamental limits exist. This study is a reminder that not everything can be learned, regardless of dataset size or architectural innovation. The researchers identified a deep relationship between the difficulty of learning certain probability distributions and the existence of "locally indistinguishable" states within them. In essence, a model that builds its picture from local information cannot reliably tell these states apart. This is not just a technical hurdle; it is a statement about the inherent limits of learning from data when the data itself contains irreducible ambiguities. Broadly, the more "non-local" the information in a dataset, the harder it is for a machine to learn its underlying structure. Pinning down these limits precisely is a challenge in itself, and while the diagnostic tools presented here, particularly conditional mutual information, are promising, they need further refinement. Future research could investigate whether such "hard-to-learn" distributions are systematically underestimated in training datasets, or whether new learning paradigms are needed to overcome these fundamental barriers. Ultimately, understanding these limitations is not about abandoning machine learning, but about focusing its power on problems where it can genuinely succeed.


