Reliability Enablers (SREpath)
#37 An SRE Approach to Managing Technology Risk

This episode covers Chapter 3 of the Site Reliability Engineering book (2016). In this first part, we talk about embracing risk from the SRE perspective.

We'll cover how this differs from the typical IT risk management mindset.

Here are key takeaways from our conversation:

  1. Embrace Risk with Velocity: Rather than being hindered by traditional governance models and change approval boards, consider embracing risk while maintaining development velocity. Strive to find a balance between risk management and the speed of innovation.

  2. Reevaluate Risk Management Approaches: Challenge traditional approaches to risk management, especially in larger organizations with extensive governance procedures. Explore alternative methods that prioritize agility and efficiency without compromising reliability.

  3. Conceptualize Risk as a Continuum: View risk as a continuous spectrum and assess it based on various dimensions, such as the complexity of changes, the criticality of systems, and the impact on user experience. Continuously evaluate and adjust risk management strategies accordingly.

  4. Balance Stability and Innovation: Recognize that extreme reliability comes at a cost and may hinder the pace of innovation. Aim for an optimal balance between stability and innovation, prioritizing user satisfaction and efficient service operations.

  5. Implement Service-Level Objectives (SLOs): Deliver services with explicitly delineated levels of service, allowing clients to make informed risk and cost trade-offs when building their systems. Define SLOs based on the importance and criticality of services to enable better decision-making.

  6. Visualize Risk Assessment: Utilize visual representations, such as whiteboard diagrams, to assess and communicate different levels of risk within your software systems. Encourage collaborative discussions among team members to determine acceptable risk levels.

  7. Prioritize Customer Impact: Consider the impact of changes on customer experience and prioritize risk management efforts accordingly. Differentiate between critical user journeys and cosmetic changes to allocate scrutiny appropriately.
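
To make takeaways 4 and 5 a little more concrete: an availability target implies a fixed amount of downtime that a service is allowed within a given window, and each extra "nine" shrinks that allowance tenfold. The Python sketch below is only an illustration under stated assumptions. The function name `allowed_downtime_minutes`, the 30-day window, and the example targets are hypothetical and were not discussed in the episode.

```python
def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the minutes of downtime an availability target permits in the window.

    slo_target is a fraction, e.g. 0.999 for "three nines" (assumed example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    total_minutes = window_days * 24 * 60  # minutes in the evaluation window
    return (1.0 - slo_target) * total_minutes


if __name__ == "__main__":
    # Compare a few illustrative targets over the assumed 30-day window.
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target:.2%} availability allows about {minutes:.1f} minutes of downtime")
```

Over 30 days, these targets allow roughly 432, 43, and 4 minutes of downtime respectively, which is one way to discuss how much reliability a service actually needs before committing to a stricter target.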

Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE, DevOps, and more. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more.