Reliability Enablers (SREpath)
Reliability Enablers
Can ITIL Benefit from Site Reliability Engineering?

I asked Dr. Vladislav Ukis this question after noticing that ITIL practitioners are taking an interest in reliability engineering. Here's his answer...

According to Vlad Ukis, many enterprises organize their IT functions around ITIL. SRE serves a completely different purpose.

SRE is not for setting up the IT function. It is for enabling the product organization to operate online services reliably at scale.

However, the problem is that many in the industry are NOT using SRE principles but instead handing over complex services to a more traditional IT function.

Dr. Vladislav Ukis is well qualified to talk about reliability: at Siemens Healthineers, he leads 250 people globally who deliver the company's cloud platform, which runs on Microsoft Azure.

We discussed key concepts from his book, Establishing SRE Foundations: A Step-by-Step Guide to Introducing Site Reliability Engineering in Software Delivery Organizations.

Unlike other technical books in this field, Dr. Ukis’ book is aimed at technology professionals who are just beginning their reliability journey.

This is different from the Site Reliability Engineering (2016) book by Google, which covers all the bells and whistles that SRE encompasses. That book requires a degree of prior knowledge and also prior experience in the field.

Vlad wanted to make it more accessible:

What I did with my book is to say, ‘Okay, so now you've never done operations, but you now are thrown in the world of online services where you have to operate them. How do you get started?’ So this is what the book is for. So for people who want to learn how to get started in the world of operating online services.

ITIL was originally developed by the UK government in the 1980s to improve IT governance. It relates to SRE most closely through its service management and incident management components. But it is designed for managing systems that are more predictable and can be handled through strict process control.

Modern product delivery doesn’t have the luxury of bureaucratic levels of predictability that older IT services have. It requires a more engineer-oriented approach to solving problems/incidents and providing services.

So how was Vlad’s experience bringing SRE into an organization that previously had run solely on the ITIL model?

Siemens Healthineers for many years operated like a traditional software development organization. In other words, they were developing on-prem software, not cloud software.

The company would ship the physical software product to its hospital customers and then those hospitals would have the software operated and supported by their IT departments.

The change came about when Siemens Healthineers began to work on a new digital health platform, which would be cloud-based from the beginning. They would no longer ship physical software on discs to customers, but instead provide online services centrally in the cloud for customers to use.

In the early days, operations were handled haphazardly, although the software was deployed to the cloud with no major issues. Few customers were on the cloud platform, so the team could get away with “handcrafted operating procedures”.

But as traffic and service count started to rise rapidly, the Healthineers team learned that they needed a more professional approach. They began to understand that their initial approach to operations could not continue as-is.

This is when Vladislav began to drive SRE practices in the organization.

This sub-30-minute conversation covered a lot of ground relevant to organizations looking to transition to delivering online services at scale.

Have a listen.

Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE, DevOps, and more. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more.