Constitutional AI

Published December 2022. A method for training AI systems to be helpful and harmless using a set of principles (a "constitution") rather than human feedback on individual outputs. The model critiques and revises its own outputs based on these principles.

Scaling Laws

Co-authored research at OpenAI (2020) showing predictable relationships between model size, dataset size, compute, and performance. These findings influenced the development of GPT-3 and subsequent large language models.

AI Safety Research

Anthropic's research agenda includes interpretability (understanding what models are doing internally), evaluations (measuring dangerous capabilities), and alignment (making models behave as intended).

Interpretability

Mechanistic analysis of neural network internals. Published work on sparse autoencoders and feature visualization.

Evaluations

Testing for dangerous capabilities: biosecurity, cybersecurity, deception, persuasion.

Alignment

Techniques to ensure models follow human intent and refuse harmful requests.

Claude

Anthropic's AI assistant, first released March 2023. Current versions include Claude 4 (February 2026), Claude 3.5 Sonnet, and Claude 3 Opus. Available via API and consumer products (claude.ai).