Most teams talk about reliability with a margin for error. “What’s our SLO? What’s our budget for failure?”
But in the energy sector? There is no acceptable downtime. Not even a little.
In this episode, I talk with Wade Harris, Director of FAST Engineering in Australia, who’s spent 15+ years designing and rolling out monitoring and control systems for critical energy infrastructure like power stations, solar farms, SCADA networks, you name it.
What makes this episode different is that Wade isn’t a reliability engineer by title, but reliability is baked into everything his team touches. And that matters more than ever as software creeps deeper into operational technology (OT) and the cloud tries to stake its claim in critical systems.
We cover:
Why 100% uptime is the minimum bar, not a stretch goal
How the rise of renewables has increased system complexity — and what that means for monitoring
Why bespoke integration and SCADA spaghetti are still normal (and here to stay)
The reality of cloud risk in critical infrastructure (“the cloud is just someone else’s computer”)
What software engineers need to understand if they want their products used in serious environments
This isn’t about observability dashboards or DevOps rituals. This is reliability when the lights go out and people risk getting hurt if you get it wrong.
And it’s a reminder: not every system lives in a feature-driven world. Some systems just have to work. Always. No matter what.