Benchmarks show current world models are fragile

Yann LeCun’s $1.03 billion bet on world models as the future of AI has produced its clearest theoretical and empirical mapping yet. Two arXiv preprints from his research group, posted within days of each other in late May, define exactly when the Joint Embedding Predictive Architecture (JEPA) can learn a faithful model of the world, and how far current implementations still fall short of that standard.

Both papers appeared in the same week that news reports about the identifiability results began circulating, making them the most important research results from LeCun’s group since he founded AMI Labs in early 2026. They are also the most substantive response yet to a field that has watched his thesis with skepticism even as investors committed $1.03 billion in capital to it.

Although the two papers differ in scope, they cannot be read independently. One is a theorem, the other a stress test, and their conclusions rhyme in a way that defines both the destination and the current distance from it.

Formal proof: When JEPA actually learns the world model

The first paper, on LeJEPA, was submitted to arXiv on May 25 by Cold Spring Harbor Laboratory’s David Klindt, LeCun, and Brown University’s Randall Balestriero. It attacks a question at the heart of the world model program: when a machine learns a compact representation of raw observations, does that representation correspond to the actual hidden causes behind those observations, or simply to the statistical patterns that are cheapest to find?

LeJEPA, an architecture introduced by Balestriero and LeCun in November 2025, combines a predictive alignment objective with a Gaussian regularizer called SIGReg. The new paper proves that this combination achieves what mathematicians call linear identifiability: given messy, nonlinear observations, such as raw pixels, sensor feeds, or arbitrary high-dimensional inputs, LeJEPA recovers the true underlying variables, such as object position, velocity, and orientation, up to a linear transformation. Under the conditions described in the paper, the architecture does more than learn useful shortcuts. It learns the actual structure of the world that generated the data.
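
For readers who want the property stated precisely, one common way to write linear identifiability is sketched below. The notation is an assumption made for illustration and is not taken from the preprint: z denotes the true latent variables, g the unknown nonlinear map that renders them into observations x, and f the learned encoder.

```latex
x = g(z), \qquad f(x) = A z + b
\quad \text{for some invertible matrix } A \text{ and offset vector } b .
```

In words, the encoder may mix, rescale, and shift the true variables, but it may not warp them nonlinearly, so a simple linear readout is enough to recover quantities such as position or velocity.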

The paper’s signature result takes an “if and only if” form. Within the class of worlds where the latent variables evolve under stationary additive noise dynamics, the Gaussian distribution is the unique distribution for which LeJEPA’s identifiability guarantee applies. The forward direction rests on a spectral argument using Hermite polynomials, under which nonlinearities of any order are strictly penalized. Conversely, all non-Gaussian options are excluded. The fourth theorem extends the result to planning. Under the same conditions, a plan in the learned latent space produces the same actions and the same values as a plan in the true latent space. The proof is also formalized in the Lean 4 proof assistant, which goes beyond the conventions of a standard mathematics paper.

What does data exploration have to do with it?

The identifiability guarantee comes with practical conditions that researchers and engineers building on JEPA should read carefully. The latent variables must be Gaussian, and the data should be collected in a manner that approximates an isotropic (almost uniform) exploration of the state space. Violating either condition weakens or eliminates the guarantee.

The authors test this directly using a simulated two-joint robotic arm rendered to raw pixels. When arm configurations were sampled isotropically, i.e., when the joint angle space was explored uniformly, recovery of the true angles was nearly perfect, with an R² of approximately 0.95, as reported in the preprint. When the training data instead came from a goal-directed reinforcement learning policy whose trajectories were concentrated in a narrow, non-Gaussian region of the space, the recovery never exceeded 0.5. The lesson for everyone building world models, including AMI Labs itself, is that how data is collected can determine whether faithful learning is possible. The kind of goal-seeking behavior that most robot training pipelines rely on can quietly move data into regions where identifiability guarantees no longer apply.
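
A recovery score of this kind is typically obtained with a linear probe: fit the best linear map from the learned representation to the true variables and report R². The sketch below is a minimal illustration on synthetic data, not the preprint's evaluation protocol. The latents, the mixing matrix, and the warped representation are all made up for the example, and the score is computed in-sample for brevity.

```python
import numpy as np

def linear_r2(features, targets):
    """R^2 of the best affine least-squares map from features to targets."""
    # Append a column of ones so the fit includes an offset term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    residual = targets - X @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((targets - targets.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)

# Hypothetical ground-truth latents (stand-ins for two joint angles).
latents = rng.normal(size=(2000, 2))

# Case 1: the representation is an invertible linear mix of the latents
# plus a little noise, i.e. linearly identifiable.
mix = np.array([[1.0, 0.5], [-0.3, 2.0]])
linear_repr = latents @ mix + 0.01 * rng.normal(size=latents.shape)

# Case 2: the representation warps the latents nonlinearly, so no
# linear readout can undo it.
warped_repr = np.sin(3.0 * latents)

r2_linear = linear_r2(linear_repr, latents)
r2_warped = linear_r2(warped_repr, latents)
print(f"linear mix: R^2 = {r2_linear:.3f}")
print(f"nonlinear warp: R^2 = {r2_warped:.3f}")
```

A representation that differs from the truth only by a linear map scores close to 1, while a nonlinearly warped one scores far lower even though it contains the same information. That gap is what a linear R² is designed to expose.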

AI World Model Benchmark: How do current models perform in practice?

If the theoretical paper maps a destination, the second preprint measures the current distance from it. “stable-worldmodel: A platform for reproducible world modeling research and evaluation,” led by Lucas Maes of Mila and the University of Montreal and featuring 12 authors including LeCun and Balestriero, introduces an open-source benchmarking platform released on May 20.

The platform was built in part because the field had become fragmented to the point of unreliability. As the paper points out, one commonly used planning algorithm had been independently reimplemented in at least five recent papers, a recipe for undetected bugs and incomparable results that undermines confidence in published benchmarks. The stable-worldmodel system, abbreviated as swm, provides a shared set of environments, a standardized data layer, and a suite of controlled perturbation tests that allow researchers to monitor what breaks when visual, geometric, or physical conditions change.

The verdict on the currently benchmarked world models is straightforward: they remain fragile. The paper reports results across several major architectures, including models from the LeWorldModel family alongside DINO-WM and PLDM baselines. In a standard Push-T manipulation task (in which a simulated agent pushes an object into a target location), one tested model reported a success rate of approximately 50.8% under clean conditions. When the agent changed color, the success rate dropped to about 12%. When the background color changed, it dropped to about 6%. Adding a distractor square to the scene caused success rates to collapse for all tested baselines. All figures are from the preprint and have not yet been independently reproduced.
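
To put the reported Push-T figures on a common scale, the short calculation below converts them into relative drops from the clean baseline. It uses only the approximate percentages quoted above. The function and variable names are illustrative.

```python
def relative_drop(clean, perturbed):
    """Fraction of the clean success rate lost under a perturbation."""
    return (clean - perturbed) / clean

clean_rate = 50.8  # approximate success rate (%) under clean conditions
perturbed_rates = {
    "agent color changed": 12.0,
    "background color changed": 6.0,
}

drops = {
    name: relative_drop(clean_rate, rate)
    for name, rate in perturbed_rates.items()
}
for name, drop in drops.items():
    print(f"{name}: {drop:.0%} relative drop")
# agent color changed: 76% relative drop
# background color changed: 88% relative drop
```

On these numbers, a color change alone removes roughly three quarters or more of the model's clean-condition success.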

The deeper findings go beyond the headline numbers. The swm experiments demonstrated that prediction error alone is an insufficient measure of planning success under distribution shift. The error distributions of successful and unsuccessful plans overlapped significantly even under strong perturbations. This means that a model can accurately predict the next frame even if it fundamentally misunderstands the task geometry. Standard benchmarks may give a model a high score even if it relies on the background color rather than stable characteristics of the task.
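
One way to quantify how much two error distributions overlap is the AUROC: the probability that a randomly chosen failed plan has a higher prediction error than a randomly chosen successful one. A value near 0.5 means prediction error says almost nothing about success. The sketch below is an illustration only, and AUROC is not claimed to be the metric the preprint uses. All error values are synthetic numbers generated for this example.

```python
import numpy as np

def auroc(scores_pos, scores_neg):
    """P(random positive score > random negative score); ties count half."""
    pos = np.asarray(scores_pos)[:, None]
    neg = np.asarray(scores_neg)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

rng = np.random.default_rng(0)

# SYNTHETIC prediction errors, invented for illustration (not paper data).
# Overlapping case: failed and successful plans have nearly the same error.
err_success = rng.normal(loc=1.00, scale=0.3, size=500)
err_failed_overlap = rng.normal(loc=1.05, scale=0.3, size=500)

# Separated case: failed plans have clearly larger error.
err_failed_separated = rng.normal(loc=3.00, scale=0.3, size=500)

auc_overlap = auroc(err_failed_overlap, err_success)
auc_separated = auroc(err_failed_separated, err_success)
print(f"overlapping errors: AUROC = {auc_overlap:.2f}")
print(f"separated errors: AUROC = {auc_separated:.2f}")
```

In the overlapping case the score stays close to chance, which is the situation the benchmark describes: low prediction error does not certify that a plan will succeed.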

The fragility of world models: Why data regimes link both papers

When read together, the two papers accomplish something that neither could do alone. The identifiability results suggest a possible mechanism for what the benchmark observed: goal-directed training data drifts precisely into the non-Gaussian regime where the identifiability guarantee weakens. A model trained on reinforcement learning trajectories clustered around a narrow target region may learn accurate-looking representations during training, but may fail under the visual distribution shifts that the swm suite introduces. The link between theory and experiment is not stated explicitly in either paper, and both are preprints that still require independent replication, but the implications for research design are clear. Exploration strategy during training is not a secondary concern. It can be a prerequisite for meaningful world model learning.

The swm team notes that closing the robustness gap will likely require both architectural advances and systematic scaling. The companion theory paper also suggests that much more attention is needed to how machines get to observe the world in the first place.

What do LeCun’s preprints mean for AMI Labs?

LeCun left Meta in November 2025 after 12 years as chief AI scientist, citing disagreements over the direction of the architecture. AMI Labs’ $1.03 billion seed round was raised in March 2026 at a pre-money valuation of $3.5 billion, the largest seed round ever for a European startup. It put institutional capital behind the JEPA thesis, with backers including NVIDIA, Samsung, and Bezos Expeditions. CEO Alexandre LeBrun told TechCrunch at the time that the company expected it to take about a year to produce something suitable for real products, and would initially target healthcare, robotics, and industrial automation.

Neither paper proves that AMI Labs can create a deployable world model on that schedule. The identifiability result is formal but conditional. The benchmark results are empirical, but limited to simulated environments and a set of existing baselines. What the two papers do together is clarify the object of study. The theorem specifies the data collection conditions under which faithful learning is mathematically achievable. The benchmark specifies the distributional robustness obstacles that must be overcome before mathematical achievability becomes practical reliability. This is a much more rigorous map of the problem than the field had a week earlier.

LeCun has spent years arguing that the AI industry is climbing the wrong mountain. These two preprints are among the most accurate surveys yet of how tall the correct one is.

FAQ

What is Yann LeCun’s theory of the world model? Why is it important?

LeCun argues that large language models (which predict the next word in a sequence) are architecturally inadequate for real-world intelligence because they do not learn a model of how physical events cause each other. His alternative, the world model, is built on the Joint Embedding Predictive Architecture (JEPA), which trains AI systems to predict abstract representations of future states from observations, with the goal of enabling causal inference and reliable planning. His Paris-based startup AMI Labs raised $1.03 billion in March 2026 to pursue this approach for robotics, healthcare, and industrial automation.

What does LeJEPA’s identifiability proof show?

A formal proof submitted to arXiv on May 25, 2026, shows that LeJEPA can recover the true hidden variables behind raw observations (a property called linear identifiability) if those variables follow a Gaussian distribution and evolve under stationary additive noise dynamics. The result also extends to planning. Under the same conditions, a policy optimized in the learned latent space will produce the same decisions as one optimized in the true latent space. The proof is formalized in the Lean 4 proof assistant, giving it machine-checkable rigor beyond standard published papers.

Why do current world models fail so badly when visual details are changed?

The stable-worldmodel benchmark, posted on arXiv on May 20, 2026, found that all world model architectures tested degrade rapidly under mild perturbations. Changing the agent or background color significantly reduced the success rate, and adding a small visual distractor caused success rates to collapse across all baselines. Even a model that accurately predicts the next frame can still plan poorly because it has learned to rely on extraneous visual features rather than task geometry. The companion theory paper hints at the cause: goal-directed training data does not explore the state space extensively enough to keep the representation within the regime where identifiability guarantees apply.

What is the relationship between AMI Labs and these research papers?

AMI Labs is a Paris-based startup co-founded by LeCun, who serves as chairman, while CEO Alexandre LeBrun runs day-to-day operations. The May 2026 preprints are scholarly work by LeCun and his co-authors at Brown University, Cold Spring Harbor Laboratory, and Mila, and are not product announcements from AMI Labs. Both papers are preprints and have not yet been peer-reviewed. Although they advance the basic science on which AMI Labs’ eventual commercial activities depend, they represent basic research and are not engineering milestones toward a product.
