Reliability Enablers (SREpath)
#47 How to Grow Team Impact Through Learning Culture

This topic is not 100% focused on reliability, but its ideas will 100% support efforts to improve your team's reliability work and its value to the organization. Also, there's a bonus if you read all the way through.

The common refrain after an incident is “We could and should learn from this”.

To me, that points to the need for a robust learning culture.

We might think we already have a good learning culture because we talk about problems and dive deep into them in retrospectives.

But how often do we explore the nuances of how we are learning?

Sorrel Harriet is an expert in helping software engineering teams develop a stronger learning culture. She was a “Continuous Learning Lead” at Armakuni (a software consultancy) and now does the same work under her own banner.

Her work ties in well with the ideas shared by Manuel Pais in episode #45 about how enabling teams can support a continuous learning culture.

We tackled issues like the value of certifications, comparing technical with non-technical skills, and more.

You can connect with Sorrel via LinkedIn.

Learn more about what Sorrel does via LaaS.consulting.


Here’s a bonus section because you read all the way through. It covers five public outages and how the affected teams could improve their learning culture:

1. Slack Outage (February 2023)

Slack experienced a global outage disrupting communication for hours due to backend infrastructure issues. Perhaps the team could focus their learning on more robust infrastructure management and resilience improvement.

2. Twitter Algorithm Glitch (April 2023)

A glitch in Twitter's algorithm caused timeline issues, stemming from a problematic software update. Perhaps the team could focus their learning on thorough testing and game days to rectify critical system errors swiftly.

3. Microsoft Azure AD Outage (March 2023)

Azure Active Directory faced a significant outage due to an internal configuration change. Perhaps the team could focus their learning on the importance of rigorous change management and how to address misconfigurations quickly.

4. Google Cloud Platform Networking Issue (May 2023)

Google Cloud Platform experienced widespread service disruptions from a software bug in its networking infrastructure. Perhaps the team could focus their learning on the need for comprehensive testing and preventing disruptions.

5. GitHub Outage (June 2023)

GitHub suffered a major outage caused by a cascading failure in its storage infrastructure. Perhaps the team could focus their learning on robust fault-tolerance mechanisms and ways to address the root causes of failures.

Software reliability is a tough topic for engineers in many organizations. The Reliability Enablers (Ash Patel and Sebastian Vietz) know this from experience. Join us as we demystify reliability jargon like SRE, DevOps, and more. We interview experts and share practical insights. Our mission is to help you boost your success in reliability-enabling areas like observability, incident response, release engineering, and more.