Richard Sutton – Father of RL thinks LLMs are a dead end
🤖 AI Summary
Overview
Richard Sutton, a pioneer in reinforcement learning (RL) and winner of the 2024 Turing Award, argues that large language models (LLMs) are fundamentally limited due to their inability to learn continually from experience. Sutton advocates for a paradigm shift toward experiential learning architectures, which he believes will surpass LLMs and enable AI systems to learn dynamically, akin to humans and animals.
Notable Quotes
- "For me, having a goal is the essence of intelligence. Something is intelligent if it can achieve goals."
– Richard Sutton, on why LLMs lack true intelligence.
- "If we understood a squirrel, I think we'd be almost all the way there to understanding human intelligence."
– Richard Sutton, emphasizing the importance of studying basic animal learning processes.
- "The dream of large language models is you can teach the agent everything, and it won't have to learn anything online during its life. But the world is so huge that you're going to have to learn it along the way."
– Richard Sutton, on the limitations of pre-trained systems.
🧠 RL vs. LLMs: Divergent Paths in AI
- Sutton critiques LLMs for mimicking human behavior without building actionable world models. He argues that LLMs predict what people would say, not what will happen in the real world.
- Reinforcement learning, by contrast, focuses on understanding the world through trial and error, enabling agents to learn from direct experience and adapt dynamically.
- Sutton dismisses the idea of using LLMs as a foundation for experiential learning, asserting that their lack of goals and ground truth makes them unsuitable for continual learning.
🌍 The Era of Experience: A New AI Paradigm
- Sutton envisions a future where AI systems learn continually from their interactions with the world, eliminating the need for distinct training and deployment phases.
- He emphasizes the importance of intrinsic motivation, such as understanding the environment, alongside external rewards like achieving specific tasks.
- This paradigm mirrors the way animals and humans learn, relying on sensation, action, and reward streams to build knowledge incrementally.
🔄 Challenges in Generalization and Transfer Learning
- Sutton highlights the poor generalization capabilities of current architectures, including RL and deep learning systems, which often fail to transfer knowledge effectively across tasks or states.
- He critiques the reliance on human-engineered solutions to achieve generalization, arguing that scalable methods must emerge to automate this process.
- Sutton underscores the need for algorithms that promote good generalization, rather than relying on manual adjustments by researchers.
🌌 AI Succession and the Future of Intelligence
- Sutton predicts an inevitable transition to superintelligent AI systems, driven by their ability to gain resources and power over time.
- He frames this shift as a major stage in the universe’s evolution, transitioning from replication-based intelligence (humans and animals) to design-based intelligence (AI).
- Sutton encourages a positive outlook, suggesting that humanity should celebrate its role in enabling this transformation, while remaining mindful of ethical considerations and the risks of corruption in decentralized AI systems.
📋 Episode Description
Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he thinks LLMs are a dead end.
After interviewing him, my steelman of Richard's position is this: LLMs aren't capable of learning on the job, so no matter how much we scale, we'll need some new architecture to enable continual learning.
And once we have it, we won't need a special training phase — the agent will just learn on the fly, like all humans, and indeed, like all animals.
This new paradigm will render our current approach with LLMs obsolete.
In our interview, I did my best to represent the view that LLMs might function as the foundation on which experiential learning can happen… Some sparks flew.
A big thanks to the Alberta Machine Intelligence Institute for inviting me up to Edmonton and for letting me use their studio and equipment.
Enjoy!
Watch on YouTube; listen on Apple Podcasts or Spotify.
Sponsors
* Labelbox makes it possible to train AI agents in hyperrealistic RL environments. With an experienced team of applied researchers and a massive network of subject-matter experts, Labelbox ensures your training reflects important, real-world nuance. Turn your demo projects into working systems at labelbox.com/dwarkesh
* Gemini Deep Research is designed for thorough exploration of hard topics. For this episode, it helped me trace reinforcement learning from early policy gradients up to current-day methods, combining clear explanations with curated examples. Try it out yourself at gemini.google.com
* Hudson River Trading doesn’t silo their teams. Instead, HRT researchers openly trade ideas and share strategy code in a mono-repo. This means you’re able to learn at incredible speed and your contributions have impact across the entire firm. Find open roles at hudsonrivertrading.com/dwarkesh
Timestamps
(00:00:00) – Are LLMs a dead-end?
(00:13:04) – Do humans do imitation learning?
(00:23:10) – The Era of Experience
(00:33:39) – Current architectures generalize poorly out of distribution
(00:4