AI systems that build internal representations of how the world works.
A world model is an AI system's internal representation of how the world works, enabling it to predict outcomes, plan actions, and reason about causality. Yann LeCun argues that current LLMs lack true world models and that building them is the key to achieving human-level AI. Without a world model, AI can only pattern-match on training data rather than truly "understand."
World models allow an agent to simulate "what would happen if..." before taking action, the foundation of planning and common sense. Humans do this constantly: you can predict that a glass will break if dropped, even if you have never seen that specific glass. Building this capability into AI is one of the grand challenges of the field.
What Is a World Model?
An internal simulation of reality that enables prediction, planning, and causal reasoning. Humans have rich world models: we understand physics, social dynamics, and cause and effect intuitively. AI world models aim to replicate this.
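As a minimal sketch (with invented state variables and dynamics), a world model can be framed as a function from the current state to a predicted next state, which an agent can roll forward before acting:

```python
from dataclasses import dataclass

@dataclass
class State:
    """Toy world state: a falling object's height and vertical velocity."""
    height: float
    velocity: float

def predict_next(state: State, dt: float = 0.1, g: float = 9.8) -> State:
    """A hand-written 'world model' for free fall: simulate one step
    forward without acting in the real world."""
    new_velocity = state.velocity - g * dt
    new_height = max(0.0, state.height + new_velocity * dt)
    return State(new_height, new_velocity)

# Mentally "dropping the glass": roll the model forward before acting.
s = State(height=1.5, velocity=0.0)
for _ in range(20):
    s = predict_next(s)
print(f"predicted height after 2s: {s.height:.2f} m")  # ~0.0 => it hits the floor
```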
LeCun's JEPA Architecture
Yann LeCun proposes the Joint Embedding Predictive Architecture (JEPA) as the path to world models. Instead of predicting raw pixels, JEPA predicts abstract representations of future states, avoiding the cost of modeling low-level detail that is largely unpredictable and irrelevant to planning.
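A heavily simplified sketch of the idea, not LeCun's actual implementation: a context encoder and predictor are trained to match the output of a slowly updated target encoder, so the loss lives in embedding space rather than pixel space. All dimensions, networks, and the EMA coefficient below are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64  # toy embedding size (illustrative)

context_encoder = nn.Sequential(nn.Linear(128, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
target_encoder  = nn.Sequential(nn.Linear(128, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
predictor       = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
target_encoder.load_state_dict(context_encoder.state_dict())  # start in sync

opt = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

def jepa_step(context_view, target_view, ema=0.996):
    """One JEPA-style update: predict the *embedding* of the target view
    from the context view; the loss is computed on abstract states."""
    with torch.no_grad():                      # target branch gets no gradients
        target_z = target_encoder(target_view)
    pred_z = predictor(context_encoder(context_view))
    loss = F.mse_loss(pred_z, target_z)        # compare representations, not pixels
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                      # EMA update of the target encoder
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(ema).add_(p_c, alpha=1 - ema)
    return loss.item()

# Two "views" of the same underlying observation (random stand-ins here).
x = torch.randn(32, 128)
print(jepa_step(x + 0.1 * torch.randn_like(x), x))
```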
Do LLMs Have World Models?
The debate is heated: some research suggests LLMs develop internal spatial and temporal representations (Othello-GPT, where linear probes recover the board state from hidden activations). Critics argue these are statistical patterns, not true understanding. The truth likely lies somewhere in between.
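A linear probe is just a simple classifier fit on a model's hidden activations; if it succeeds, the property is linearly decodable from those activations. The sketch below fabricates synthetic activations (no real LLM involved) purely to show the method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for hidden states: if a board property were linearly encoded,
# activations would separate along some direction. We fabricate that here.
n, d = 2000, 256
labels = rng.integers(0, 2, size=n)               # e.g., "is this square occupied?"
direction = rng.normal(size=d)                    # the hypothetical encoding direction
acts = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction)

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")  # near 1.0 => linearly decodable
```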
Video Prediction as World Modeling
Predicting future video frames requires understanding physics, object permanence, and causality. Sora (OpenAI), Runway Gen-3, and similar models exhibit a degree of implicit physics understanding through video generation, though their physical consistency remains imperfect.
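Production systems like Sora are diffusion-based and far more complex, but the basic objective of frame prediction can be sketched with a toy convolutional net trained to map frame t to frame t+1; all data and shapes below are synthetic stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny next-frame predictor; real models predict in latent space at scale.
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

frames = torch.randn(8, 10, 1, 32, 32)  # (batch, time, channel, H, W) fake video
for t in range(9):
    pred = net(frames[:, t])                   # predict the next frame...
    loss = F.mse_loss(pred, frames[:, t + 1])  # ...and score it against reality
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.3f}")
```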
Physics Engines vs Learned Models
Traditional approach: hand-coded physics rules (Unity, Unreal). New approach: learned physics from data (neural physics engines). Hybrid: combining traditional physics with neural networks for robustness.
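One common hybrid pattern, sketched below with invented toy dynamics: an analytic physics step supplies the baseline prediction, and a small network learns the residual (here, air drag) that the hand-coded equations omit:

```python
import torch
import torch.nn as nn

residual_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def analytic_step(state, dt=0.05, g=9.8):
    """Hand-coded physics: ballistic motion, state = (height, velocity)."""
    h, v = state[..., 0], state[..., 1]
    return torch.stack([h + v * dt, v - g * dt], dim=-1)

def hybrid_step(state):
    """Physics baseline plus a learned correction for unmodeled effects."""
    return analytic_step(state) + residual_net(state)

def true_step(state, dt=0.05, g=9.8, drag=0.3):
    """'Reality' in this toy: the same physics plus air drag on velocity."""
    h, v = state[..., 0], state[..., 1]
    return torch.stack([h + v * dt, v - g * dt - drag * v * dt], dim=-1)

opt = torch.optim.Adam(residual_net.parameters(), lr=1e-2)
for _ in range(500):
    s = torch.randn(64, 2)
    loss = nn.functional.mse_loss(hybrid_step(s), true_step(s))
    opt.zero_grad(); loss.backward(); opt.step()
print(f"residual fit loss: {loss.item():.5f}")
```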
Planning with World Models
If you can simulate consequences, you can plan: try actions in simulation, observe the predicted outcomes, and choose the best. Model-based RL systems (MuZero, Dreamer) use learned world models for efficient planning.
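MuZero and Dreamer use far more sophisticated machinery, but the core loop can be illustrated with random-shooting model-predictive control: sample candidate action sequences, imagine each rollout inside the learned model, and execute the best first action. The dynamics model and reward below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(state, action):
    """Stand-in for a *learned* dynamics model: drift toward the action target."""
    return state + 0.1 * (action - state)

def reward(state):
    """Toy objective: stay close to the origin."""
    return -float(np.abs(state))

def plan(state, horizon=5, n_candidates=64):
    """Random-shooting planning: simulate consequences, pick the best plan."""
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        s, total = state, 0.0
        for a in actions:                 # imagine the rollout inside the model
            s = model(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action              # execute only the first action (MPC)

state = 2.0
for _ in range(20):
    state = model(state, plan(state))     # act, then replan from the new state
print(f"final state: {state:.3f}")        # should drift toward 0
```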
Common Sense Reasoning
Understanding that objects fall down, water is wet, and people have feelings: the "easy" things that are hardest for AI. World models are considered essential for common sense, which remains a major weakness of current AI.
Implicit vs Explicit World Models
Implicit: knowledge encoded in network weights (LLMs may have this). Explicit: a separate, queryable model of the world. LeCun and others argue explicit world models are needed for robust reasoning and planning.
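To make the distinction concrete, the toy contrast below (illustrative only) frames an explicit model as a transition table you can inspect, query, and edit directly, whereas an implicit model stores the same knowledge only as tendencies in network weights:

```python
# Explicit world model: a queryable transition structure, open to inspection.
explicit_model = {
    ("glass_on_table", "drop"): "glass_broken",
    ("glass_on_table", "leave"): "glass_on_table",
}

def query(state, action):
    """Directly look up a predicted consequence, no forward pass needed."""
    return explicit_model.get((state, action), "unknown")

print(query("glass_on_table", "drop"))  # -> glass_broken

# Implicit world model: the same fact may exist only as a statistical
# tendency spread across the weights of a trained network, recoverable
# only by running the model and probing its outputs, never as a rule.
```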
Multimodal World Models
True world models must integrate vision, language, sound, and physical interaction. A world model that only processes text cannot understand physics. Multimodal approaches (Gemini, GPT-4V) move toward integrated understanding.
The Path Forward
Combining LLM reasoning with learned physics, embodied experience, and abstract representation learning. World models may not arrive as a single breakthrough but as a gradual integration of capabilities across AI systems.
World Model: AI's internal representation of reality enabling prediction, planning, and causal reasoning.
JEPA: Joint Embedding Predictive Architecture, LeCun's proposed approach for building world models that predict abstract states.
Common Sense: Intuitive understanding of everyday physics, social dynamics, and cause-effect relationships, still a major AI challenge.
Model-Based RL: Reinforcement learning using a learned world model to simulate and plan actions before taking them.