A new or growing SRE team. A copy of the SRE book. A company that says it cares about reliability. What happens next? Usually… not much.
In this episode, I sit down with Dave O’Connor, a 16-year Google SRE veteran, to talk about what happens when organizations cargo-cult reliability practices without understanding the context they were born in.
You might know him for his self-deprecating wit and legendary USENIX blurb about being “complicit in the development of the SRE function.”
This one’s a treat — less “here’s a shiny new tool” and more “here’s what reliability actually looks like when you’ve seen it all.”
✨ No vendor plugs from Dave at all, just a good old-fashioned chat about what works and what doesn’t.
Here’s what we dive into:
The adoption trap: Why SRE efforts often fail before they begin, especially when new hires care more about reliability than the org ever intended.
The SRE book dilemma: Dave’s take on why following the SRE book chapter-by-chapter is a trap for most companies (and what to do instead).
The cost of “caring too much”: How engineers burn out trying to force reliability into places it was never funded to live.
You build it, you run it (but should you?): Why not everyone’s cut out for incident command, and how pretending otherwise sets teams up to fail.
Buying vs. building: The real reason even conservative enterprises are turning into software shops — and the reliability nightmare that follows.
We also discuss the evolving role of reliability in organizations today, from being mistaken for “just ops” to becoming a strategic investment (when done right).
Dave’s seen the waves come and go in SRE, and he’s still optimistic. That alone is worth a listen.