
🤖 AI Summary
Overview
This episode dives into a massive internet outage caused by Google Cloud Platform (GCP), exploring the technical missteps, the broader implications for Google's reputation, and the role of AI and human error in software development.
Notable Quotes
- AI is now writing well over 30% of Google's code, and so many people immediately jump to blame Gemini.
– On the growing role of AI in coding and its potential pitfalls.
- An outage of this magnitude can cost companies millions upon millions of dollars, but the damage to Google's reputation might be even worse.
– On the financial and reputational stakes of cloud service failures.
🖥️ The Internet Outage and Its Scale
- A bad code deployment by Google Cloud Platform caused widespread outages, affecting major services like Snapchat, Spotify, Discord, Gmail, and Google Drive.
- Cloudflare's Workers KV service also experienced nearly 100% error rates, creating a domino effect across the internet.
- The outage lasted over four hours, with significant disruptions to businesses and users worldwide.
📉 Repercussions for Google Cloud
- Google Cloud, already trailing AWS and Azure in market share, faces reputational damage that could further slow its growth.
- Companies affected by the outage may claim SLA (Service Level Agreement) credits, which could cost Google millions in refunds.
- The incident highlights the risks of centralized cloud infrastructure and the immense power these providers hold over the internet.
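To put the SLA stakes in perspective, here is a rough back-of-the-envelope sketch of what a four-hour outage does to a monthly uptime figure. It assumes a 30-day month and treats the full four hours as downtime; the 99.9% target is only an illustrative threshold, not a figure taken from any actual Google Cloud SLA.

```python
def monthly_uptime_percent(downtime_hours: float, days_in_month: int = 30) -> float:
    """Percentage of the month the service was available."""
    total_hours = days_in_month * 24
    if not 0 <= downtime_hours <= total_hours:
        raise ValueError("downtime must be between 0 and the length of the month")
    return 100.0 * (total_hours - downtime_hours) / total_hours


def allowed_downtime_minutes(target_percent: float, days_in_month: int = 30) -> float:
    """Downtime budget (in minutes) that still meets a given uptime target."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


# Four hours of downtime in a 30-day month (assumed values).
print(f"{monthly_uptime_percent(4):.2f}%")  # prints "99.44%"

# Illustrative target only: a 99.9% month leaves roughly 43 minutes of budget.
print(f"{allowed_downtime_minutes(99.9):.1f} minutes")
```

Under these assumptions, a single four-hour incident uses up several times the downtime budget of a 99.9% month, which is why outages of this length tend to trigger credit claims.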
🧑‍💻 The Technical Breakdown
- The issue stemmed from a new quota policy check added on May 29, 2025, which introduced a dormant bug due to a lack of proper error handling.
- On June 12, 2025, a policy change triggered the dormant code path; the resulting null pointer exception sent the API management service into a crash loop.
- Although a rollback mechanism existed, the rollback took 40 minutes to initiate and the system took four hours to stabilize, underscoring how hard such failures are to mitigate.
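The failure mode described above can be sketched in a few lines. This is a simplified Python analogue, not Google's actual code: the `QuotaPolicy` shape, the function names, and the choice to "fail open" on a malformed policy are all assumptions made for illustration. In Python, the equivalent of a null pointer exception is an error raised when code operates on `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    # Hypothetical shape; the real policy schema is not described in the video.
    name: str
    limit: Optional[int] = None  # a missing field shows up as None


def check_quota_unsafe(policy: QuotaPolicy, usage: int) -> bool:
    # No error handling: if limit is None, the comparison raises TypeError.
    # In a server that restarts and re-reads the same policy, that becomes a crash loop.
    return usage <= policy.limit


def check_quota_safe(policy: Optional[QuotaPolicy], usage: int) -> bool:
    # Defensive version: validate before use.
    if policy is None or policy.limit is None:
        # Assumption: allow the request ("fail open") instead of crashing.
        return True
    return usage <= policy.limit


bad_policy = QuotaPolicy(name="example-policy")  # limit left blank

try:
    check_quota_unsafe(bad_policy, usage=5)
except TypeError as err:
    print(f"unsafe check crashed: {err}")

print(check_quota_safe(bad_policy, usage=5))
```

The bug stays dormant as long as every policy has its fields filled in, which is why it can pass ordinary testing and only surface weeks later.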
🤖 The Role of AI and Human Error
- Speculation arose about whether Google's AI, Gemini, was responsible for the faulty code, given that AI now writes over 30% of Google's code.
- However, the critical code was likely written by a human, emphasizing the importance of rigorous testing and error handling in software engineering.
- The incident serves as a cautionary tale about the interplay between AI-generated and human-written code in high-stakes systems.
🌐 Lessons for the Tech World
- The outage underscores the need for robust staging environments and comprehensive testing to catch dormant bugs before deployment.
- It also highlights the importance of having efficient rollback mechanisms to minimize downtime during crises.
- For businesses relying on cloud providers, the incident is a reminder to diversify infrastructure and prepare for potential outages.
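One common way to make rollback fast is to put new code behind a feature flag, so it can be switched off without redeploying. The sketch below is a minimal illustration of that idea under stated assumptions: the `FeatureFlags` class, the flag name `new_quota_check`, and the two check functions are hypothetical and are not taken from Google's systems.

```python
from typing import Callable, Dict


class FeatureFlags:
    """Tiny in-memory flag store; a real system would read flags from shared config."""

    def __init__(self) -> None:
        self._flags: Dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code paths are opt-in.
        return self._flags.get(name, False)


def handle_request(
    flags: FeatureFlags,
    usage: int,
    new_check: Callable[[int], bool],
    legacy_check: Callable[[int], bool],
) -> bool:
    # The new code path only runs while its flag is on.
    if flags.is_enabled("new_quota_check"):
        return new_check(usage)
    return legacy_check(usage)


def legacy_check(usage: int) -> bool:
    return usage <= 100


def new_check(usage: int) -> bool:
    return usage <= 50


flags = FeatureFlags()
flags.set("new_quota_check", True)
print(handle_request(flags, 75, new_check, legacy_check))  # new path: False

# "Rollback": flip the flag off and traffic returns to the legacy path.
flags.set("new_quota_check", False)
print(handle_request(flags, 75, new_check, legacy_check))  # legacy path: True
```

The design choice here is that disabling a flag is a configuration change rather than a code deployment, which can shorten the time between spotting a bad change and neutralizing it.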
📋 Video Description
Build better apps with PostHog https://posthog.com/fireship
Last week, Google Cloud Platform managed to take down a large chunk of the internet by pushing some bad code into production. In today's video, we'll find out exactly what happened from a software engineering perspective.
#tech #coding #programming
💬 Chat with Me on Discord
https://discord.gg/fireship
🔗 Resources
https://techcrunch.com/2025/06/12/google-cloud-outage-brings-down-a-lot-of-the-internet/
🔖 Topics Covered
- How Google Cloud caused an internet outage
- The repercussions for Google
- How the GCP outage happened