AI systems that build internal representations of how the world works.
A world model is an AI system's internal representation of how the world works, enabling it to predict outcomes, plan actions, and reason about causality. Yann LeCun argues that current LLMs lack true world models and that building them is the key to achieving human-level AI. Without a world model, AI can only pattern-match on training data rather than truly "understand."
World models allow an agent to simulate "what would happen if..." before taking action, the foundation of planning and common sense. Humans do this constantly: you can predict that a glass will break if dropped, even if you have never seen that specific glass. Building this capability into AI is one of the grand challenges of the field.
What Is a World Model?
An internal simulation of reality that enables prediction, planning, and causal reasoning. Humans have rich world models: we understand physics, social dynamics, and cause and effect intuitively. AI world models aim to replicate this.
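As a minimal sketch (with invented state variables and dynamics), a world model can be framed as a function from the current state to a predicted next state, which an agent can roll forward before acting:

```python
from dataclasses import dataclass

@dataclass
class State:
    """Toy world state: a falling object's height and vertical velocity."""
    height: float
    velocity: float

def predict_next(state: State, dt: float = 0.1, g: float = 9.8) -> State:
    """A hand-written 'world model' for free fall: simulate one step
    forward without acting in the real world."""
    new_velocity = state.velocity - g * dt
    new_height = max(0.0, state.height + new_velocity * dt)
    return State(new_height, new_velocity)

# Mentally "dropping the glass": roll the model forward before acting.
s = State(height=1.5, velocity=0.0)
for _ in range(20):
    s = predict_next(s)
print(f"predicted height after 2s: {s.height:.2f} m")  # ~0.0 => it hits the floor
```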
LeCun's JEPA Architecture
Yann LeCun proposes the Joint Embedding Predictive Architecture (JEPA) as the path to world models. Instead of predicting raw pixels, JEPA predicts abstract representations of future states, avoiding the cost of modeling low-level detail that is largely unpredictable and irrelevant to planning.
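A heavily simplified sketch of the idea, not LeCun's actual implementation: a context encoder and predictor are trained to match the output of a slowly updated target encoder, so the loss lives in embedding space rather than pixel space. All dimensions, networks, and the EMA coefficient below are invented for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64  # toy embedding size (illustrative)

context_encoder = nn.Sequential(nn.Linear(128, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
target_encoder  = nn.Sequential(nn.Linear(128, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
predictor       = nn.Sequential(nn.Linear(DIM, DIM), nn.ReLU(), nn.Linear(DIM, DIM))
target_encoder.load_state_dict(context_encoder.state_dict())  # start in sync

opt = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

def jepa_step(context_view, target_view, ema=0.996):
    """One JEPA-style update: predict the *embedding* of the target view
    from the context view; the loss is computed on abstract states."""
    with torch.no_grad():                      # target branch gets no gradients
        target_z = target_encoder(target_view)
    pred_z = predictor(context_encoder(context_view))
    loss = F.mse_loss(pred_z, target_z)        # compare representations, not pixels
    opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                      # EMA update of the target encoder
        for p_t, p_c in zip(target_encoder.parameters(), context_encoder.parameters()):
            p_t.mul_(ema).add_(p_c, alpha=1 - ema)
    return loss.item()

# Two "views" of the same underlying observation (random stand-ins here).
x = torch.randn(32, 128)
print(jepa_step(x + 0.1 * torch.randn_like(x), x))
```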
Do LLMs Have World Models?
The debate is heated: some research suggests LLMs develop internal spatial and temporal representations (Othello-GPT, where linear probes recover the board state from hidden activations). Critics argue these are statistical patterns, not true understanding. The truth likely lies somewhere in between.
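A linear probe is just a simple classifier fit on a model's hidden activations; if it succeeds, the property is linearly decodable from those activations. The sketch below fabricates synthetic activations (no real LLM involved) purely to show the method:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in for hidden states: if a board property were linearly encoded,
# activations would separate along some direction. We fabricate that here.
n, d = 2000, 256
labels = rng.integers(0, 2, size=n)               # e.g., "is this square occupied?"
direction = rng.normal(size=d)                    # the hypothetical encoding direction
acts = rng.normal(size=(n, d)) + np.outer(labels - 0.5, direction)

X_tr, X_te, y_tr, y_te = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {probe.score(X_te, y_te):.2f}")  # near 1.0 => linearly decodable
```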
Video Prediction as World Modeling
Predicting future video frames requires understanding physics, object permanence, and causality. Sora (OpenAI), Runway Gen-3, and similar models exhibit a degree of implicit physics understanding through video generation, though their physical consistency remains imperfect.
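Production systems like Sora are diffusion-based and far more complex, but the basic objective of frame prediction can be sketched with a toy convolutional net trained to map frame t to frame t+1; all data and shapes below are synthetic stand-ins:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny next-frame predictor; real models predict in latent space at scale.
net = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

frames = torch.randn(8, 10, 1, 32, 32)  # (batch, time, channel, H, W) fake video
for t in range(9):
    pred = net(frames[:, t])                   # predict the next frame...
    loss = F.mse_loss(pred, frames[:, t + 1])  # ...and score it against reality
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final loss: {loss.item():.3f}")
```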
Physics Engines vs Learned Models
Traditional approach: hand-coded physics rules (Unity, Unreal). New approach: learned physics from data (neural physics engines). Hybrid: combining traditional physics with neural networks for robustness.
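One common hybrid pattern, sketched below with invented toy dynamics: an analytic physics step supplies the baseline prediction, and a small network learns the residual (here, air drag) that the hand-coded equations omit:

```python
import torch
import torch.nn as nn

residual_net = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))

def analytic_step(state, dt=0.05, g=9.8):
    """Hand-coded physics: ballistic motion, state = (height, velocity)."""
    h, v = state[..., 0], state[..., 1]
    return torch.stack([h + v * dt, v - g * dt], dim=-1)

def hybrid_step(state):
    """Physics baseline plus a learned correction for unmodeled effects."""
    return analytic_step(state) + residual_net(state)

def true_step(state, dt=0.05, g=9.8, drag=0.3):
    """'Reality' in this toy: the same physics plus air drag on velocity."""
    h, v = state[..., 0], state[..., 1]
    return torch.stack([h + v * dt, v - g * dt - drag * v * dt], dim=-1)

opt = torch.optim.Adam(residual_net.parameters(), lr=1e-2)
for _ in range(500):
    s = torch.randn(64, 2)
    loss = nn.functional.mse_loss(hybrid_step(s), true_step(s))
    opt.zero_grad(); loss.backward(); opt.step()
print(f"residual fit loss: {loss.item():.5f}")
```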
Planning with World Models
If you can simulate consequences, you can plan: try actions in simulation, observe the predicted outcomes, and choose the best. Model-based RL systems (MuZero, Dreamer) use learned world models for efficient planning.
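MuZero and Dreamer use far more sophisticated machinery, but the core loop can be illustrated with random-shooting model-predictive control: sample candidate action sequences, imagine each rollout inside the learned model, and execute the best first action. The dynamics model and reward below are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def model(state, action):
    """Stand-in for a *learned* dynamics model: drift toward the action target."""
    return state + 0.1 * (action - state)

def reward(state):
    """Toy objective: stay close to the origin."""
    return -float(np.abs(state))

def plan(state, horizon=5, n_candidates=64):
    """Random-shooting planning: simulate consequences, pick the best plan."""
    best_return, best_first_action = -np.inf, 0.0
    for _ in range(n_candidates):
        actions = rng.uniform(-1, 1, size=horizon)
        s, total = state, 0.0
        for a in actions:                 # imagine the rollout inside the model
            s = model(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action              # execute only the first action (MPC)

state = 2.0
for _ in range(20):
    state = model(state, plan(state))     # act, then replan from the new state
print(f"final state: {state:.3f}")        # should drift toward 0
```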
Common Sense Reasoning
Understanding that objects fall down, water is wet, and people have feelings: the "easy" things that are hardest for AI. World models are considered essential for common sense, which remains a major weakness of current AI.
Implicit vs Explicit World Models
Implicit: knowledge encoded in network weights (LLMs may have this). Explicit: a separate, queryable model of the world. LeCun and others argue explicit world models are needed for robust reasoning and planning.
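To make the distinction concrete, the toy contrast below (illustrative only) frames an explicit model as a transition table you can inspect, query, and edit directly, whereas an implicit model stores the same knowledge only as tendencies in network weights:

```python
# Explicit world model: a queryable transition structure, open to inspection.
explicit_model = {
    ("glass_on_table", "drop"): "glass_broken",
    ("glass_on_table", "leave"): "glass_on_table",
}

def query(state, action):
    """Directly look up a predicted consequence, no forward pass needed."""
    return explicit_model.get((state, action), "unknown")

print(query("glass_on_table", "drop"))  # -> glass_broken

# Implicit world model: the same fact may exist only as a statistical
# tendency spread across the weights of a trained network, recoverable
# only by running the model and probing its outputs, never as a rule.
```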
Multimodal World Models
True world models must integrate vision, language, sound, and physical interaction. A world model that only processes text cannot understand physics. Multimodal approaches (Gemini, GPT-4V) move toward integrated understanding.
The Path Forward
Combining LLM reasoning with learned physics, embodied experience, and abstract representation learning. World models may not arrive as a single breakthrough but as a gradual integration of capabilities across AI systems.
World Model: AI's internal representation of reality enabling prediction, planning, and causal reasoning.
JEPA: Joint Embedding Predictive Architecture, LeCun's proposed approach for building world models that predict abstract states.
Common Sense: Intuitive understanding of everyday physics, social dynamics, and cause-effect relationships, still a major AI challenge.
Model-Based RL: Reinforcement learning using a learned world model to simulate and plan actions before taking them.