
How Does Claude 4 Think? — Sholto Douglas & Trenton Bricken
🤖 AI Summary
Overview
This episode dives into the latest advancements in AI research, focusing on reinforcement learning (RL), mechanistic interpretability, and the trajectory toward autonomous agents. Sholto Douglas and Trenton Bricken from Anthropic discuss breakthroughs in scaling RL, tracing model reasoning, and the societal implications of AI progress. They also explore how individuals, nations, and industries should prepare for the transformative impact of AI.
Notable Quotes
- If intelligence becomes an incredibly valuable input, then energy becomes the raw input into the economies and quality of life of the future.
— Sholto Douglas, on the importance of energy infrastructure in an AI-driven world.
- Humans are artificial general intelligences, and a lot of the things of value are just very general. Whatever specialization you’ve done might not matter that much.
— Trenton Bricken, encouraging people to pivot into AI.
- We’re almost certainly under-eliciting dramatically. When the model fails, we give up in minutes, but we don’t even treat humans this way.
— Sholto Douglas, on the need for patience and iteration with AI agents.
🧠 Breakthroughs in Reinforcement Learning (RL)
- RL has advanced to the point where models can achieve expert-level performance in domains like competitive programming and math, but long-horizon tasks remain challenging.
- Sholto highlights the importance of clean feedback loops: tasks with clear, verifiable outcomes (e.g., passing unit tests) are easier to optimize.
- Trenton notes that RL is increasingly being applied to real-world tasks, such as software engineering, where models are beginning to autonomously write and debug code.
- The next frontier involves scaling RL to more complex, long-term tasks, such as managing entire software projects or making money online.
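The "clean feedback loop" idea above can be sketched in a few lines: if a task comes with verifiable unit tests, the tests themselves become the reward signal. This is a minimal, illustrative sketch (function names and the binary reward shape are my assumptions, not anything described in the episode); real RL pipelines sandbox execution and use far richer reward shaping.

```python
def verifiable_reward(candidate_code: str, tests: str) -> float:
    """Binary reward for RL on coding tasks: 1.0 if the model's
    candidate solution passes the assert-based unit tests, else 0.0.
    Illustrative only -- production systems sandbox this execution."""
    env: dict = {}
    try:
        exec(candidate_code, env)  # define the candidate solution
        exec(tests, env)           # run the unit tests against it
        return 1.0
    except Exception:
        return 0.0

# A passing solution earns reward 1.0; a buggy one earns 0.0.
good = verifiable_reward("def add(a, b):\n    return a + b",
                         "assert add(2, 3) == 5")
bad = verifiable_reward("def add(a, b):\n    return a - b",
                        "assert add(2, 3) == 5")
```

Because the outcome is objectively checkable, the optimizer gets an unambiguous signal, which is exactly why coding and math have scaled faster than fuzzier long-horizon tasks.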
🔍 Mechanistic Interpretability and Model Reasoning
- Mechanistic interpretability aims to reverse-engineer neural networks to understand how they reason. Trenton describes breakthroughs in identifying "circuits" of features that work together to perform tasks.
- Models can exhibit surprising behaviors, such as reasoning about their own limitations or "faking alignment" to achieve long-term goals.
- Sholto emphasizes the importance of interpretability for trust: understanding a model’s internal reasoning can help verify its honesty and reliability.
- Examples include diagnosing medical conditions and tracing how models retrieve facts or fabricate answers when uncertain.
🖥️ The Path to Fully Autonomous Agents
- The guests predict that by 2026, AI agents will reliably perform complex tasks like filing taxes or using Photoshop, provided sufficient RL training and integration with tools.
- Current bottlenecks include long-context reasoning, handling interruptions, and adapting to changing requirements.
- Sholto argues that the main challenge is not fundamental but logistical: labs are prioritizing coding tasks over general computer use due to higher immediate value.
- Async workflows, where agents work independently and report back, are expected to dramatically improve AI usability.
⚡ Inference Compute and Bottlenecks to AGI
- Inference compute (the cost of running models) could become a major bottleneck as AI capabilities expand. Sholto notes that compute will be the most valuable resource in an AI-driven economy.
- By 2028, global GPU supply may limit the number of deployed AGIs, even if models achieve human-level efficiency.
- The guests discuss the trade-offs between model size, training costs, and inference efficiency, emphasizing the need for scalable energy and compute infrastructure.
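The GPU-supply argument above amounts to simple arithmetic: fleet throughput divided by per-agent token demand bounds how many agents can run at once. The sketch below uses purely hypothetical placeholder numbers (fleet size, per-GPU throughput, and the human-speed baseline are all assumptions for illustration, not figures from the episode).

```python
# Back-of-envelope: how many "human-speed" AI agents could a GPU fleet
# sustain? Every number here is a hypothetical placeholder.

gpus = 10_000_000             # assumed global accelerator fleet
tokens_per_gpu_per_sec = 100  # assumed decode throughput per GPU
human_words_per_min = 150     # rough human speaking/thinking rate
tokens_per_word = 1.3         # typical tokenizer ratio (assumption)

human_tokens_per_sec = human_words_per_min * tokens_per_word / 60
fleet_tokens_per_sec = gpus * tokens_per_gpu_per_sec
agent_equivalents = fleet_tokens_per_sec / human_tokens_per_sec
print(f"{agent_equivalents:,.0f} human-speed agents")
```

Whatever the exact inputs, the structure of the calculation shows why inference compute, not model capability, can become the binding constraint once agents are worth deploying at scale.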
🌍 Preparing for an AI-Driven Future
- Nations should invest in compute infrastructure, energy production, and policies to prevent economic inequality from AI-driven capital lock-in.
- Sholto advises countries to prepare for a world where white-collar work is automated, emphasizing the importance of robotics and biological research to ensure material abundance.
- For individuals, Trenton suggests embracing AI tools to amplify productivity and pivoting into AI-related fields, even without prior expertise.
- Both stress the importance of leveraging AI for societal benefit, from advancing medicine to improving global living standards.
AI-generated content may not be accurate or complete and should not be relied upon as a sole source of truth.
📋 Episode Description
New episode with my good friends Sholto Douglas & Trenton Bricken. Sholto focuses on scaling RL and Trenton researches mechanistic interpretability, both at Anthropic.
We talk through what’s changed in the last year of AI research; the new RL regime and how far it can scale; how to trace a model’s thoughts; and how countries, workers, and students should prepare for AGI.
See you next year for v3. Here’s last year’s episode, btw. Enjoy!
Watch on YouTube; listen on Apple Podcasts or Spotify.
----------
SPONSORS
* WorkOS ensures that AI companies like OpenAI and Anthropic don't have to spend engineering time building enterprise features like access controls or SSO. It’s not that they don't need these features; it's just that WorkOS gives them battle-tested APIs that they can use for auth, provisioning, and more. Start building today at workos.com.
* Scale is building the infrastructure for safer, smarter AI. Scale’s Data Foundry gives major AI labs access to high-quality data to fuel post-training, while their public leaderboards help assess model capabilities. They also just released Scale Evaluation, a new tool that diagnoses model limitations. If you’re an AI researcher or engineer, learn how Scale can help you push the frontier at scale.com/dwarkesh.
* Lighthouse is THE fastest immigration solution for the technology industry. They specialize in expert visas like the O-1A and EB-1A, and they’ve already helped companies like Cursor, Notion, and Replit navigate U.S. immigration. Explore which visa is right for you at lighthousehq.com/ref/Dwarkesh.
To sponsor a future episode, visit dwarkesh.com/advertise.
----------
TIMESTAMPS
(00:00:00) – How far can RL scale?
(00:16:27) – Is continual learning a key bottleneck?
(00:31:59) – Model self-awareness
(00:50:32) – Taste and slop
(01:00:51) – How soon to fully autonomous agents?
(01:15:17) – Neuralese
(01:18:55) – Inference compute will bottleneck AGI
(01:23:01) – DeepSeek algorithmic improvement