AI understanding of 3D space, physics, and the physical world.
Spatial intelligence is AI's ability to understand and reason about the three-dimensional physical world: perceiving space, predicting physical interactions, navigating environments, and manipulating objects. While LLMs excel at language, spatial intelligence addresses the gap between digital and physical understanding.
This field is critical for robotics, autonomous vehicles, AR/VR, and any application where AI must interact with the physical world. Recent advances in 3D generation, physics simulation, and embodied AI are rapidly closing the gap between AI's language abilities and its understanding of physical space.
What Is Spatial Intelligence
The ability to understand 3D structure, spatial relationships, physics, and physical causality. Humans do this naturally; AI must learn it from data. Fei-Fei Li calls it "the next frontier of AI."
Computer Vision to 3D Understanding
Evolution from 2D image recognition (ImageNet era) to 3D scene understanding. NeRF and Gaussian Splatting reconstruct 3D scenes from photos. Depth estimation, object pose detection, and scene graphs map spatial relationships.
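A core step in moving from 2D recognition to 3D understanding is back-projection: lifting a pixel with an estimated depth into 3D camera coordinates via the pinhole camera model. A minimal sketch (the pixel, depth, and intrinsics values here are made up for illustration):

```python
# Back-project a 2D detection into 3D camera coordinates using
# estimated depth and pinhole camera intrinsics (fx, fy, cx, cy).

def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with depth Z to 3D camera coordinates (X, Y, Z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical detection at pixel (400, 300) with 2.0 m estimated depth,
# for a camera with 500 px focal length and principal point (320, 240).
point = backproject(400, 300, 2.0, 500.0, 500.0, 320.0, 240.0)
```

Running depth estimation over a whole image and back-projecting every pixel this way yields the point clouds that scene graphs and pose estimators reason over.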
3D Generation
AI generating 3D objects and scenes from text or images. Point-E, Shap-E (OpenAI), DreamFusion (Google), Meshy, and others create 3D assets. Applications: gaming, architecture, product design, virtual worlds.
Physics Simulation
AI learning physical dynamics: how objects fall, collide, deform, and interact. Differentiable physics engines combine traditional simulation with neural networks. Enables more realistic prediction of physical outcomes.
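The dynamics a learned physics model must capture can be sketched with a classical integrator. This is a toy semi-implicit Euler simulation of a bouncing ball, not any particular engine's API; the time step, gravity, and restitution values are illustrative:

```python
# Semi-implicit Euler integration of a ball dropped under gravity,
# with an inelastic bounce when it hits the floor at y = 0.

def simulate(height, steps, dt=0.01, g=-9.81, restitution=0.8):
    y, vy = height, 0.0
    for _ in range(steps):
        vy += g * dt                # gravity updates velocity first
        y += vy * dt                # then velocity updates position
        if y < 0.0:                 # collision with the floor
            y = 0.0
            vy = -vy * restitution  # lose energy on each bounce
    return y, vy

final_y, final_vy = simulate(height=1.0, steps=200)
```

Differentiable physics engines implement steps like this so that gradients flow through the integration loop, letting neural networks learn or correct the dynamics from data.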
Robotics and Embodied AI
Robots that perceive, understand, and act in physical space. Foundation models for robotics (RT-2, Octo) use vision-language-action training. The challenge: bridging the sim-to-real gap between virtual training and physical deployment.
Autonomous Navigation
Self-driving cars, drones, and delivery robots navigating complex 3D environments. Combines perception (cameras, LiDAR), mapping (SLAM), planning (path optimization), and real-time decision making.
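The planning stage above can be illustrated with the simplest possible planner: breadth-first search for a shortest path on a 2D occupancy grid. Real systems use far richer maps and cost functions; the grid here is a made-up example:

```python
# Shortest path on an occupancy grid (1 = obstacle, 0 = free)
# via breadth-first search over 4-connected neighbors.
from collections import deque

def shortest_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, [start])])
    seen = {start}
    while queue:
        (r, c), path = queue.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append(((nr, nc), path + [(nr, nc)]))
    return None  # no route around the obstacles

grid = [[0, 0, 0],
        [1, 1, 0],   # a wall forces a detour to the right
        [0, 0, 0]]
route = shortest_path(grid, (0, 0), (2, 0))
```

In a real stack, the occupancy grid comes from the perception and SLAM layers, and the path feeds a lower-level controller that tracks it in continuous space.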
World Models for Spatial AI
AI systems that build internal models of how the physical world works. Predict "what happens next" in physical scenarios. Critical for planning physical actions and understanding consequences before acting.
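The "predict before acting" loop can be sketched in a few lines. Here `predict_next` is a hand-written stand-in for a trained neural dynamics model, and the candidate actions and goal are made up for illustration:

```python
# World-model-style planning: roll a learned transition function forward
# in imagination to score candidate actions before executing any of them.

def predict_next(position, velocity, action, dt=0.1):
    """Stand-in dynamics model: the action is an acceleration command."""
    velocity = velocity + action * dt
    position = position + velocity * dt
    return position, velocity

def evaluate(action, goal=1.0, horizon=10):
    """Imagine a rollout and score how close we end up to the goal."""
    pos, vel = 0.0, 0.0
    for _ in range(horizon):
        pos, vel = predict_next(pos, vel, action)
    return -abs(goal - pos)  # higher is better

# Pick the acceleration whose imagined rollout lands nearest the goal.
best_action = max([0.0, 1.0, 2.0, 3.0], key=evaluate)
```

Replacing the hand-written dynamics with a network trained on real interaction data, and the exhaustive action search with a learned policy or sampling-based planner, gives the basic shape of world-model agents.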
AR/VR and Spatial Computing
Apple Vision Pro, Meta Quest, and spatial computing platforms need AI that understands 3D space. Real-time object recognition, scene understanding, hand tracking, and spatial anchoring all require spatial intelligence.
Multimodal Spatial Understanding
Combining language with spatial reasoning: "put the cup on the table to the left of the book." Requires grounding language in 3D space. Models like SpatialVLM and 3D-LLM bridge language and spatial understanding.
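Grounding a phrase like "to the left of the book" ultimately means comparing 3D coordinates. A toy sketch of that final resolution step (the scene, object names, and coordinates are all hypothetical; real models learn this mapping rather than hard-coding it):

```python
# Resolve "X to the left of Y" against 3D object positions by comparing
# x-coordinates from the viewer's perspective (+x is the viewer's right).

scene = {
    "book":  (0.5, 0.0, 1.0),   # (x, y, z) in meters
    "table": (-0.3, 0.0, 1.0),
    "lamp":  (0.9, 0.0, 1.2),
}

def left_of(scene, anchor):
    """Return objects whose x-coordinate is smaller than the anchor's."""
    ax = scene[anchor][0]
    return [name for name, (x, _, _) in scene.items()
            if name != anchor and x < ax]

candidates = left_of(scene, "book")  # which objects are left of the book?
```

Models like SpatialVLM learn these relations end to end from paired language and 3D data, handling viewer-relative versus object-relative frames that this sketch ignores.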
The Gap to Close
A 2-year-old child has better spatial understanding than the most advanced AI. Closing this gap requires both better architectures (world models, embodied training) and better data (real-world interaction at scale).
NeRF: Neural Radiance Fields, an AI technique for reconstructing photorealistic 3D scenes from 2D photos.
Embodied AI: AI systems with physical bodies (robots) that learn through real-world interaction.
SLAM: Simultaneous Localization and Mapping, building a map of an environment while tracking location within it.
Sim-to-Real: transferring AI skills learned in simulation to real-world physical environments.