The concept of foundation models: large pre-trained models adapted for many tasks.
Foundation models are large AI models pre-trained on broad, diverse datasets that can be adapted to a wide range of downstream tasks. The term was coined by Stanford's HAI center to describe a paradigm shift: instead of training a new model for each task, you train one massive model and then adapt it through fine-tuning, prompting, or few-shot learning.
GPT-4, Claude, Gemini, and Llama are all foundation models. Their power comes from scale, both in parameters and training data, which gives them general capabilities that can be directed toward specific applications without starting from scratch each time.
What Makes a Model "Foundational"
Broad pre-training on diverse data enables adaptation to virtually any downstream task (translation, coding, analysis, creative writing) without training a new model each time.
Pre-training at Scale
Foundation models are trained on trillions of tokens from books, web pages, code repositories, and scientific papers. This massive exposure creates general-purpose knowledge and language understanding.
Transfer Learning
Knowledge gained during pre-training transfers to specific tasks. A model trained on general text can be adapted for medical diagnosis, legal analysis, or code generation without starting from scratch.
Generalist vs Specialist Trade-offs
Foundation models are generalists: good at many things but not the best at any one. Task-specific fine-tuning creates specialists that excel in narrow domains at the cost of versatility.
The Model Ecosystem
Base models → fine-tunes → distilled versions → API services. Each step in the chain trades generality for specificity, or size for speed, creating a rich ecosystem of model variants.
Open Foundation Models
Llama 3, Qwen 2.5, Mistral, OLMo: anyone can download, run locally, fine-tune, and inspect these models. Open weights enable research, customization, and privacy-sensitive deployments.
Closed Foundation Models
GPT-4, Claude, Gemini: accessed only via API. These typically offer the highest performance but with less transparency and control. You pay per token and trust the provider.
Fine-tuning for Domains
Adapting a foundation model for specific domains (medical, legal, coding) using curated datasets. This preserves general capabilities while dramatically improving domain-specific performance.
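The mechanics can be illustrated with a toy model: "pre-train" a tiny linear regressor on broad general data, then continue training briefly on a small domain dataset with a lower learning rate so the general fit is preserved. This is a deliberately simplified sketch of the idea, not how LLM fine-tuning is actually implemented.

```python
# Toy illustration of fine-tuning: a tiny linear model "pre-trained" on a
# general dataset, then briefly trained further on domain-specific data.
# All datasets and hyperparameters here are illustrative.

def train(w, b, data, lr, epochs):
    """Plain per-sample gradient descent on squared error for y ~ w*x + b."""
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def mse(w, b, data):
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

# "Pre-training": broad data roughly following y = 2x + 0.1
general = [(x, 2.0 * x + 0.1) for x in range(-5, 6)]
w, b = train(0.0, 0.0, general, lr=0.01, epochs=50)

# "Fine-tuning": a small domain dataset with a shifted relationship
# (y = 2x + 1), using a lower learning rate and fewer steps.
domain = [(x, 2.0 * x + 1.0) for x in range(0, 4)]
before = mse(w, b, domain)
w, b = train(w, b, domain, lr=0.005, epochs=30)
after = mse(w, b, domain)
```

The pattern mirrors the real workflow: the model keeps its "general knowledge" (the slope learned in pre-training) while adapting cheaply to the domain shift.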
Few-Shot Learning
Providing examples in the prompt to guide the model without any retraining. Foundation models can learn new tasks on-the-fly from just 2-5 examples, a capability that emerges at scale.
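In practice, few-shot prompting is just careful string assembly: a handful of labeled examples followed by the new input. The exact format below (Input/Output pairs, a sentiment task) is illustrative; real prompt templates vary by model and provider.

```python
# Minimal sketch of few-shot prompt construction: labeled examples are
# embedded in the prompt so the model infers the task with no retraining.

def build_few_shot_prompt(examples, query, task="Classify the sentiment"):
    lines = [f"{task}.", ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # End with the unanswered query; the model completes the final "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

examples = [
    ("The product exceeded my expectations.", "positive"),
    ("Shipping took forever and the box was damaged.", "negative"),
    ("It works fine, nothing special.", "neutral"),
]
prompt = build_few_shot_prompt(examples, "Absolutely love it!")
print(prompt)
```

The resulting string would be sent as-is to a completion endpoint; the trailing "Output:" cues the model to produce the label.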
Cost of Training
Frontier foundation models cost $100M+ to pre-train from scratch. But fine-tuning costs $100-$10K, and prompting is nearly free; the ecosystem makes foundation model capabilities accessible at every budget level.
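These orders of magnitude can be made concrete with a back-of-envelope calculation. Every figure below (a flat fine-tuning cost, $5 per million tokens, prompt lengths) is an illustrative assumption, not an actual provider rate.

```python
# Back-of-envelope adaptation-cost comparison. All dollar figures are
# assumed for illustration, not quotes from any provider.

def prompting_cost(tokens_per_call, calls, price_per_million_tokens):
    """Total API spend for a given per-call token count and call volume."""
    return tokens_per_call * calls / 1_000_000 * price_per_million_tokens

CALLS = 10_000
PRICE = 5.00  # assumed $ per million tokens

# Pure prompting: long prompts (examples + instructions in every call).
prompt_total = prompting_cost(1_000, CALLS, PRICE)

# Fine-tuning: assumed one-time $500 training cost, then shorter prompts
# because the task no longer needs in-context examples.
finetune_total = 500.00 + prompting_cost(200, CALLS, PRICE)

print(f"prompting: ${prompt_total:.2f}, fine-tuning: ${finetune_total:.2f}")
```

Under these assumptions prompting wins at low volume, while fine-tuning amortizes its upfront cost as call volume grows and prompts shrink.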
Foundation Model: Large model pre-trained on broad data, adapted to many tasks via fine-tuning or prompting.
Transfer Learning: Applying knowledge gained from one task/dataset to a different but related task.
Few-Shot Learning: Providing a few examples in the prompt to guide model behavior without retraining.
Fine-Tuning: Further training a pre-trained model on task-specific data to improve its performance on that task.