Professor Yi Ma, a world-renowned expert on deep learning and artificial intelligence, presented a compelling challenge to the prevailing paradigms of AI in an interview on Machine Learning Street Talk. Speaking with host Tim Scarfe, Professor Ma systematically dismantled common assumptions about large language models (LLMs) and 3D vision systems, arguing that current successes often mask a fundamental lack of real understanding. In their place, he proposed a unified mathematical theory of intelligence built on two principles, parsimony and self-consistency, and suggested that it offers a path toward white-box AI, in which every component is derived from first principles rather than empirical guesswork.
“What is the difference between compression and abstraction? The difference between memorization and understanding,” Professor Ma asked at the beginning of the discussion, framing its central theme. He argued that current AI models, especially LLMs, operate primarily by memorization: they process text, which is already a compressed form of human knowledge, using the same kind of mechanisms used to learn from raw data. The result is an illusion of understanding, in which a model can produce coherent text but lacks the conceptual grasp needed for true abstraction and causal inference. The same holds in vision, he argued: impressive capabilities such as reconstructing complex 3D scenes from limited data, as seen in systems such as Sora and NeRF, still fall short on basic spatial reasoning tasks.
The core of Professor Ma's theory rests on the principles of parsimony and self-consistency. He described parsimony as the drive to “learn what is predictable” by identifying low-dimensional structure within high-dimensional data. This means discovering the patterns and regularities inherent in the world and making things “as simple as possible, but not simpler,” a maxim attributed to Albert Einstein that Professor Ma frequently cites. The goal is to distill information into its most essential form, removing redundancy to reveal the underlying structure.
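As a rough illustration of what "low-dimensional structure within high-dimensional data" can mean, the sketch below builds toy 3-D points that lie near a single line and measures how much of their variance one direction explains. It is not taken from the interview or from Professor Ma's work: the data generator, the function names, and the use of power iteration on the covariance matrix (a basic form of principal component analysis) are all assumptions chosen for brevity.

```python
import math
import random


def make_line_data(n=200, noise=0.01, seed=0):
    """Hypothetical toy data: 3-D points scattered closely around one line."""
    rng = random.Random(seed)
    unit = [1.0 / 3.0, 2.0 / 3.0, 2.0 / 3.0]  # unit vector along the line
    points = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)  # the single underlying degree of freedom
        points.append([t * u + rng.gauss(0.0, noise) for u in unit])
    return points


def covariance(points):
    """Sample covariance matrix of the points, as a list of lists."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def top_direction(cov, iters=200):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


def explained_fraction(points):
    """Share of total variance captured by the single best direction."""
    cov = covariance(points)
    eigenvalue, v = top_direction(cov)
    total = sum(cov[i][i] for i in range(len(cov)))
    return eigenvalue / total, v


if __name__ == "__main__":
    fraction, direction = explained_fraction(make_line_data())
    print("fraction of variance explained by one direction:", fraction)
    print("recovered direction:", direction)
```

Under these assumptions, one coordinate along the recovered direction describes nearly all of the variation in the three raw coordinates, which is the sense in which the data is redundant and can be represented more simply.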
The second pillar, self-consistency, refers to the system's ability to verify the representations it has learned. Learning becomes a closed-loop process that checks whether the internal model accurately reflects the data and can reliably reproduce the original data distribution. This continuous feedback allows errors to be detected and corrected, driving the system toward more robust and generalizable knowledge. Ma argued that such a framework leads naturally to iterative optimization and compression, and ultimately makes deep neural networks more transparent and explainable.
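The closed loop described above can be sketched in a few lines: encode a sample, decode it back, compare the reconstruction with the original, and use the mismatch to correct the model. The example below is only an assumed toy version of that idea, not the method discussed in the interview. It uses a one-dimensional linear encoder and decoder that share a weight vector and updates it with Oja's rule, a classical online learning rule in which the reconstruction error drives each update. All names and parameter values are hypothetical.

```python
import random


def make_data(n=500, noise=0.05, seed=1):
    """Hypothetical toy data: 2-D points scattered around the line (0.6, 0.8)."""
    rng = random.Random(seed)
    unit = (0.6, 0.8)
    data = []
    for _ in range(n):
        t = rng.uniform(-1.0, 1.0)
        data.append([t * u + rng.gauss(0.0, noise) for u in unit])
    return data


def encode(w, x):
    """Encoder: project the sample onto w to get a scalar code."""
    return sum(wi * xi for wi, xi in zip(w, x))


def decode(w, z):
    """Decoder: map the scalar code back into data space."""
    return [z * wi for wi in w]


def closed_loop_fit(data, lr=0.05, epochs=50):
    """Oja's rule: the gap between x and its reconstruction corrects w."""
    w = [1.0, 0.0]  # deliberately poor starting guess
    for _ in range(epochs):
        for x in data:
            z = encode(w, x)
            x_hat = decode(w, z)
            error = [xi - xh for xi, xh in zip(x, x_hat)]  # feedback signal
            w = [wi + lr * z * ei for wi, ei in zip(w, error)]
    return w


def mean_reconstruction_error(w, data):
    """Average squared distance between each sample and its reconstruction."""
    total = 0.0
    for x in data:
        x_hat = decode(w, encode(w, x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total / len(data)


def mean_code_gap(w, data):
    """Average |z - z_hat|, where z_hat is the code of the reconstruction."""
    total = 0.0
    for x in data:
        z = encode(w, x)
        z_hat = encode(w, decode(w, z))
        total += abs(z - z_hat)
    return total / len(data)


if __name__ == "__main__":
    data = make_data()
    print("error before:", mean_reconstruction_error([1.0, 0.0], data))
    w = closed_loop_fit(data)
    print("error after: ", mean_reconstruction_error(w, data))
    print("code gap:    ", mean_code_gap(w, data))
```

The second check, `mean_code_gap`, compares codes rather than raw samples: a representation is consistent in this toy sense when re-encoding its own reconstruction returns approximately the same code.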
Today's popular “black box” AI systems are designed empirically and face significant challenges in explainability, reliability, and control. Professor Ma advocated a move to “white box” AI, in which each mechanism is mathematically derived and fully interpretable. This principled approach has already produced architectures such as CRATE, which offers a transparent alternative to empirically designed models such as the vision transformer (ViT) by deriving its components from the principles of parsimony and self-consistency.
“The world is not yet completely random and is still largely predictable,” Professor Ma said, pointing to the fundamental reason why intelligence exists and evolves. This predictability is what both natural and artificial intelligence seek to exploit, and the continual acquisition of knowledge in order to predict the world more accurately is a central driver of both. He went on to explain that introducing noise is not just a nuisance but a necessary element for discovering structure in data, an idea he summed up as “all roads lead to Rome.” This “blessing of dimensionality” suggests that the optimization landscapes arising from natural data are surprisingly smooth, which helps explain why gradient descent is such an effective tool for learning.
Professor Ma drew parallels between the evolution of life and the evolution of intelligence, distinguishing phylogenetic intelligence (slow, DNA-based inheritance and natural selection) from ontogenetic intelligence (individual learning, memory, and error correction). He emphasized that human intelligence, especially the capacity for abstraction and mathematical reasoning, represents a phase transition from empirical observation to scientific deduction. Modern AI, he argued, operates primarily at the level of empirical memorization, much like early life forms. The next frontier is to develop systems that can genuinely theorize, generate new scientific hypotheses, and reason from first principles. This, he concluded, is the path to building intelligent systems that move beyond mere memorization toward real understanding and scientific discovery.
