Advancing Technology

The Barrier LLMs Cannot Cross

People have built models that can write legal briefs, debug code, and explain quantum mechanics in the voice of a patient schoolteacher. Ask the same model to predict what happens when a glass slides off the edge of a table, and things get interesting. It knows the word “gravity.” It has read thousands of descriptions of objects falling. But it has never felt the pull.

This is not a minor gap. It is the central tension in artificial intelligence right now, and it is reshaping where the smartest money and the boldest researchers are placing their bets.

What language models actually do

Large language models, such as GPT, Claude, Gemini, and their relatives, are trained to predict what comes next in a sequence of text. They do this extraordinarily well. They have absorbed more written knowledge than any human could read in a thousand lifetimes, and they can recombine it with a fluency that still startles even the people who built them. For tasks that live inside the world of words, such as writing, translating, explaining, or coding, they have already changed how many of us work.¹

That said, their power comes from pattern-matching across text. They capture the world as described by humans, not the world as it actually behaves. They have read about gravity, friction, momentum, and decay. They have never experienced any of it. The distinction sounds philosophical. It is, in fact, an engineering problem. And it is becoming urgent.

The man who is betting everything on it

Yann LeCun spent a decade at Meta building some of the most advanced AI research programs on the planet. Then, in November 2025, he left to start his own company. Advanced Machine Intelligence Labs, based in Paris, raised more than one billion dollars before shipping a single product, a record initial funding round for a European AI company.²

The bet is specific. LeCun believes that a system trained only on language will never achieve robust, general intelligence, no matter how much data or compute we pour into it. His alternative is what researchers call a world model: an AI that learns not by reading descriptions of the world but by building an internal simulation of how it works. Given a situation and a possible action, a world model predicts what happens next. Not in words. In the dynamics of cause and effect.³
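Stripped to its core, that definition describes a function from (situation, action) to a predicted next situation, learned from observed examples rather than written down. A minimal sketch, assuming a toy one-dimensional cart whose hidden dynamics, friction coefficient and time step are all invented for illustration; a linear least-squares fit stands in for the neural networks that real world models use:

```python
import numpy as np

DT = 0.1  # simulation time step in seconds (hypothetical)

def true_step(state: np.ndarray, action: float) -> np.ndarray:
    """Hidden 'physics' of the toy world: a push accelerates the cart and
    friction slows it. The learner never reads this function."""
    pos, vel = state
    vel = vel + DT * (action - 0.5 * vel)
    pos = pos + DT * vel
    return np.array([pos, vel])

# Gather experience: random situations, random actions, observed outcomes.
rng = np.random.default_rng(0)
states = rng.normal(size=(200, 2))
actions = rng.uniform(-1.0, 1.0, size=200)
next_states = np.array([true_step(s, a) for s, a in zip(states, actions)])

# Fit next_state ≈ [state, action] @ W by ordinary least squares.
X = np.column_stack([states, actions])  # shape (200, 3)
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def model_step(state: np.ndarray, action: float) -> np.ndarray:
    """Learned world model: predict the next state from the fitted weights alone."""
    return np.concatenate([state, [action]]) @ W

print(model_step(np.array([0.0, 1.0]), 0.5))
```

The printed prediction is approximately `[0.1, 1.0]`, matching what the hidden dynamics would produce even though the model only ever saw input/output examples. Real environments are rarely this linear, which is why practical world models use deep networks and learned state representations.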

In 2022, LeCun laid out the full blueprint in a widely discussed position paper.⁴ In 2023, Meta released I-JEPA, the first working prototype built on his architecture.⁵ Then in March 2026, his team published LeWorldModel, which trains stably from raw video using a remarkably simple setup: about 15 million parameters, runnable on a single GPU, planning actions up to 48 times faster than foundation-model-based alternatives.⁶
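The "joint-embedding predictive" idea behind I-JEPA and LeWorldModel is that the model predicts the representation of what comes next rather than every pixel of it. A minimal sketch of an action-conditioned prediction loss in that spirit, assuming random linear maps in place of trained networks and made-up frame and embedding sizes; it is not the published architecture of either model:

```python
import numpy as np

rng = np.random.default_rng(0)

FRAME_SHAPE = (64, 64)  # hypothetical grayscale video frame
OBS_DIM = FRAME_SHAPE[0] * FRAME_SHAPE[1]
LATENT_DIM = 32         # hypothetical embedding size
ACTION_DIM = 2

# Random linear maps stand in for a trained encoder and predictor.
encoder_w = rng.normal(scale=OBS_DIM ** -0.5, size=(OBS_DIM, LATENT_DIM))
predictor_w = rng.normal(scale=0.1, size=(LATENT_DIM + ACTION_DIM, LATENT_DIM))

def encode(frame: np.ndarray) -> np.ndarray:
    """Compress a raw frame into a small embedding vector."""
    return np.tanh(frame.reshape(-1) @ encoder_w)

def predict_next(z: np.ndarray, action: np.ndarray) -> np.ndarray:
    """Predict the embedding of the next frame from the current embedding and action."""
    return np.concatenate([z, action]) @ predictor_w

def jepa_loss(frame_t: np.ndarray, action: np.ndarray, frame_next: np.ndarray) -> float:
    """Mean squared prediction error, measured in embedding space, not pixel space."""
    z_pred = predict_next(encode(frame_t), action)
    z_target = encode(frame_next)
    return float(np.mean((z_pred - z_target) ** 2))

frame_t = rng.random(FRAME_SHAPE)
frame_next = rng.random(FRAME_SHAPE)
loss = jepa_loss(frame_t, np.array([0.1, -0.2]), frame_next)
print(f"latent loss {loss:.4f} over {LATENT_DIM} numbers instead of {OBS_DIM} pixels")
```

Predicting 32 numbers per step instead of 4,096 pixel values is one reason latent-space models can stay small and cheap to roll forward. Minimising this loss alone has a trivial solution (an encoder that maps every frame to the same embedding), so real JEPA systems add a safeguard against that collapse; I-JEPA, for example, updates its target encoder as a moving average of its context encoder.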

From manifesto to working system in four years. That is not a research curiosity. That is a trajectory.

What world models actually learn

The difference is easier to grasp with an example. Ask a language model what happens when a ball rolls off a table. It will produce a competent answer: gravity, acceleration, impact. It assembles that answer from text it has read. A world model would instead simulate the trajectory internally, testing what happens under different conditions, much as a child learns physics: not by reading about it, but by watching objects move, bounce, and break, over and over, until the pattern becomes intuitive.
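A hand-written stand-in makes the contrast concrete. Here the step rule is coded directly from Newtonian physics, whereas a learned world model would have to infer it from watching objects fall; what the sketch illustrates is the mode of reasoning, stepping a state forward in time and comparing outcomes across conditions before anything happens. The speeds and table heights are arbitrary:

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def simulate_fall(speed: float, table_height: float, dt: float = 1e-4) -> float:
    """Step a ball forward in time from the moment it leaves the table edge.

    Returns how far from the edge (in metres) it lands. Air resistance,
    spin and bouncing are ignored.
    """
    x, y = 0.0, table_height
    vx, vy = speed, 0.0
    while y > 0.0:
        vy -= G * dt  # gravity changes only the vertical velocity
        x += vx * dt  # horizontal velocity stays constant
        y += vy * dt
    return x

# "Imagine" several conditions before anything actually falls,
# and compare each simulated landing point with the closed-form answer.
for speed in (0.5, 1.0, 2.0):
    for height in (0.75, 1.0):
        simulated = simulate_fall(speed, height)
        exact = speed * math.sqrt(2 * height / G)
        print(f"{speed} m/s off a {height} m table: "
              f"simulated {simulated:.3f} m, closed form {exact:.3f} m")
```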

This matters most when the AI needs to act, not just talk. A robot navigating a warehouse. A self-driving car anticipating what the cyclist ahead will do next. A planning system simulating what happens to a supply chain if a port closes for two weeks. In all of these cases, the system needs to reason about consequences in physical or dynamic environments, territory where language models remain, at best, eloquent guessers.
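What "reasoning about consequences" can mean in practice is easiest to see in a planner that consults a model before acting. The sketch below uses random shooting, one of the simplest model-based planning methods: sample many candidate action sequences, roll each forward through the model, and execute only the first action of the best one. The point-mass dynamics, cost weights, horizon and goal are all hypothetical, and known dynamics stand in for a learned model:

```python
import numpy as np

DT = 0.1  # seconds per simulated step (hypothetical)

def model_step(pos, vel, action):
    """Stand-in world model: a 1-D point mass pushed by `action`.
    Works on scalars or on NumPy arrays holding many imagined futures at once."""
    vel = vel + DT * action
    pos = pos + DT * vel
    return pos, vel

def plan(pos, vel, goal, rng, horizon=15, n_candidates=1000):
    """Random-shooting planner: imagine many random action sequences with the
    model, score each imagined future, and return the first action of the best."""
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    p = np.full(n_candidates, pos, dtype=float)
    v = np.full(n_candidates, vel, dtype=float)
    cost = np.zeros(n_candidates)
    for t in range(horizon):
        p, v = model_step(p, v, actions[:, t])
        cost += (p - goal) ** 2 + 0.01 * actions[:, t] ** 2  # miss distance + effort
    return float(actions[np.argmin(cost), 0])

# Closed loop: plan, commit to one action, observe the result, re-plan.
rng = np.random.default_rng(0)
pos, vel = 0.0, 0.0
for _ in range(100):
    action = plan(pos, vel, goal=1.0, rng=rng)
    pos, vel = model_step(pos, vel, action)  # toy setup: the "real world" is the model
print(f"final position {pos:.2f}, velocity {vel:.2f}")
```

Re-planning after every step means an error in any single imagined rollout is corrected by the next observation. Here the world and the model are the same function, which is the idealised case; with a learned model, gaps between prediction and reality become a major source of planning errors.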

Fei-Fei Li, who helped launch an earlier revolution in computer vision, has been making a version of the same argument. We need to move from large language models to large world models, she argues, systems that can reason about space, objects, and three-dimensional movement.⁷ The kind of intelligence that lets a system interact with the physical world rather than describe it.

The evidence building up

The case is no longer theoretical. DreamerV3, published in Nature in April 2025, showed that a world-model-based agent using a single, fixed set of hyperparameters can outperform specialized algorithms, each tuned for its own domain, across more than 150 diverse tasks, from simulated robot control to Minecraft.⁸ It learns largely by imagining future scenarios rather than relying only on real-world trial and error, which makes it both safer and dramatically more data-efficient.
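DreamerV3 learns a neural world model and trains its behaviour on imagined trajectories inside it, which is far beyond a short example. The much older Dyna-Q algorithm, described in Sutton and Barto's reinforcement-learning textbook, shows the same underlying idea at toy scale: each real step updates the agent's value estimates and a learned model, and the model is then replayed for extra "imagined" updates that cost no further real-world interaction. The corridor task, learning rate, discount and planning budget below are all hypothetical choices:

```python
import random

N_STATES = 10          # corridor cells 0..9 (hypothetical task)
GOAL = N_STATES - 1    # reaching cell 9 ends the episode with reward 1
ACTIONS = (-1, +1)     # step left or step right
ALPHA, GAMMA, EPS = 0.5, 0.95, 0.1

def env_step(s: int, a: int) -> tuple[float, int]:
    """The 'real' environment: reward 1.0 only on reaching the goal cell."""
    s2 = min(max(s + a, 0), GOAL)
    return (1.0 if s2 == GOAL else 0.0), s2

def dyna_q(episodes: int, planning_steps: int, seed: int = 0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned (here simply memorised) model: (s, a) -> (reward, next state)
    real_steps = 0

    def update(s, a, r, s2):
        # One-step Q-learning update; the goal is terminal, so nothing is bootstrapped there.
        target = r if s2 == GOAL else r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])

    for _ in range(episodes):
        s = 0
        while s != GOAL:
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:  # greedy action, ties broken at random
                best = max(Q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if Q[(s, b)] == best])
            r, s2 = env_step(s, a)
            real_steps += 1
            update(s, a, r, s2)              # learn from the real transition
            model[(s, a)] = (r, s2)          # and remember what the world did
            for _ in range(planning_steps):  # then learn from imagined transitions
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
    return Q, real_steps

def greedy_steps_to_goal(Q, limit=100):
    """Follow the learned greedy policy from cell 0; returning `limit` means it never arrived."""
    s, steps = 0, 0
    while s != GOAL and steps < limit:
        s = env_step(s, max(ACTIONS, key=lambda b: Q[(s, b)]))[1]
        steps += 1
    return steps

for planning in (0, 20):
    Q, used = dyna_q(episodes=10, planning_steps=planning)
    print(f"planning_steps={planning}: {used} real steps, "
          f"greedy path length {greedy_steps_to_goal(Q)}")
```

Comparing the two printed lines gives a feel for the trade-off: imagined updates let value information spread back from the goal without any extra real steps. The exact numbers depend on the random seed.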

Autonomous driving tells a similar story. In May 2026, XPeng released X-Cache, a training-free, plug-and-play accelerator for its X-World simulation system that boosts inference speed by 2.7 times.⁹ The company now runs daily simulations equivalent to 30 million kilometers of real-world driving, testing its models against scenarios that would be dangerous or impossible to reproduce on actual roads: a pedestrian darting into traffic, a driver hesitating mid-lane-change, an obstacle appearing where no obstacle should be.

These are no longer just conference demos. In XPeng's case, the world model is an engineering system in daily production use. That distinction matters more than any benchmark.

Pliny’s problem

There is a historical parallel worth holding in mind. Pliny the Elder compiled the Naturalis Historia, the most comprehensive encyclopedia of the ancient world: thirty-seven books covering everything from astronomy to zoology, drawn, by his own account, from some two thousand volumes. He knew more about the physical world, as recorded in text, than anyone else alive in the first century. When Vesuvius erupted in 79 AD, Pliny sailed toward the volcano, partly to rescue people and partly out of scholarly curiosity. He died on the shore, overcome by the very gases he could have described in meticulous Latin prose.

The most well-read man in Rome had never needed to outrun a volcano before.

The parallel is not perfect; parallels never are. But the structure rhymes. Language models have read more about the physical world than any system ever built. They remain, by design, unable to reason about it the way a two-year-old can. And this is not a gap that more reading will close.

What comes next

The likely future is not world models replacing language models but the two becoming layers of the same architecture. Language models stay as the conversational front end, the part that explains, translates, and communicates with us. World models become the core that understands physics, plans actions, and simulates consequences before committing to them. LeCun’s own blueprint explicitly includes a language module as an interface, not as the engine of intelligence.⁴

In 2026, world models remain far less mature than LLMs in ecosystem, tooling, and standardization. They are advancing fast in robotics and autonomous driving, but generalizing them across the breadth of domains where language models already operate is still an open research problem. The first foundation-scale world models trained on rich multimodal data are only now appearing.

One detail is worth pausing on: the largest single bet on this future is being made from Paris. LeCun is French. AMI Labs is European. For a continent that has spent the better part of a decade writing regulations about AI it did not build, this is, to put it mildly, a welcome change of pace.

Whether world models deliver on their promise or reveal limits we have not yet anticipated, the structural critique they represent is not going away. Reading the world, however brilliantly, is not the same as understanding it. Every new deployment in robotics and autonomous driving makes this harder to argue with. And harder to ignore for those still betting everything on the next trillion tokens.

Pliny could describe volcanic gases in exquisite detail. It did not save him from breathing them.


References

  1. Nature (2026). ‘World models’ are AI’s latest sensation. https://www.nature.com/articles/d41586-026-00820-5
  3. LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. https://openreview.net/forum?id=BZ5a1r-kVsf
  4. LeCun, Y. (2022). A Path Towards Autonomous Machine Intelligence. https://openreview.net/forum?id=BZ5a1r-kVsf
  5. Meta AI (2023). I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI. https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/
  6. Maes, L. et al. (2026). LeWorldModel: Stable End-to-End Joint-Embedding Predictive Architecture from Pixels. arXiv:2603.19312. https://arxiv.org/abs/2603.19312
  7. Li, F.-F. (2025). From Words to Worlds: Spatial Intelligence is AI’s Next Frontier. https://drfeifei.substack.com/p/from-words-to-worlds-spatial-intelligence
  8. Hafner, D. et al. (2025). Mastering diverse control tasks through world models. Nature, 640, 647–653. https://www.nature.com/articles/s41586-025-08744-2
  9. CleanTechnica (2026). XPeng Unveils the “World Model Accelerator” X-Cache. https://cleantechnica.com/2026/05/07/xpeng-unveils-the-world-model-accelerator-x-cache-which-requires-no-training-is-plug-and-play-and-boosts-inference-speed-by-2-7-times/
