SRE is not a monolithic role
SRE is gaining traction, and with it a misconception is gaining steam among senior stakeholders: that SRE is a monolithic role, much as “programmer” was in the 90s. Let’s dispel that misconception…
SRE is a broad, overarching responsibility that needs a multitude of role considerations to pull off properly.
It is not a monolithic role where all SREs do pretty much the same thing, the way programmers in the 90s (supposedly) all pumped out code in similar strokes. Now we have front-end engineers, back-end engineers and everything in between.
SRE is the same: a mélange of diverse role opportunities.
I will cover the nuances of SRE roles in more detail below.
No, SREs are not…
working one-size-fits-all roles: their scope of work depends on the needs of the software systems they are responsible for, e.g. more alerting work if they look after critical services
ops-on-steroids: a highly skilled site reliability engineer should not get pigeonholed full-time into sysadmin tasks like running Bash scripts or spinning up VMs
stereotypical introverts: they are capable of being both technicians and leaders, with vocal contributions to areas like architecture, project management and team collaboration
able to offer turnkey SRE on their own: an individual may be able to “run SRE” for a smaller org with limited scope but won’t come close to covering the full scope of the SRE domain (it’s HUGE)
SREs have a wide scope of available work
SRE work tends to call for T-shaped abilities, where you are a specialist in one area but have enough breadth of knowledge to be “dangerous enough” in many related areas
Solid SRE teams benefit from a combination of generalist and specialist engineers: one SRE might work across many responsibility areas while another focuses solely on chaos engineering
Roles may become more fluid in the future: SRE leaders may guide individual SREs toward broader responsibilities within one or two areas, such as performance engineering or QA
SREs work on systems and software at the same time
Some SREs are systems pros with a reasonable ability to code their way out of trouble
Other SREs are code mavens who want to get their hands dirty with infrastructure work
Others still are neither: they learn enough code to modify open-source tools to their needs, and enough systems to make sure IPv6 won’t (hopefully) make life harder
Whatever they do, SREs should spend at least half of their time automating, moving away from toil and otherwise improving systems (proactive) rather than responding to incidents (reactive)
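The “at least half” guideline is easy to check with simple arithmetic. Here is a minimal sketch, assuming a hypothetical weekly time log where each entry is tagged with a work category. The category names, the hours and the `proactive_share` helper are all made up for illustration and don’t come from any particular tool.

```python
from collections import defaultdict

# Hypothetical time log for one SRE over a week: (category, hours).
# Categories and numbers are invented for illustration.
TIME_LOG = [
    ("automation", 14.0),
    ("system_improvement", 8.0),
    ("incident_response", 10.0),
    ("manual_toil", 8.0),
]

# Assumption: these categories count as proactive work.
PROACTIVE = {"automation", "system_improvement"}


def proactive_share(log):
    """Return the fraction of logged hours spent on proactive work."""
    totals = defaultdict(float)
    for category, hours in log:
        totals[category] += hours
    total_hours = sum(totals.values())
    if total_hours == 0:
        return 0.0  # nothing logged, so nothing to report
    proactive_hours = sum(h for c, h in totals.items() if c in PROACTIVE)
    return proactive_hours / total_hours


share = proactive_share(TIME_LOG)
print(f"Proactive share: {share:.0%}")
print("Meets the 50% guideline" if share >= 0.5 else "Below the 50% guideline")
```

Which categories count as proactive is a judgment call for each team; the point is only that the split is measurable once time is tracked at all.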
SREs can be injected across the enterprise
SRE roles can be designed to embed at various levels of the enterprise. The Scaled Agile Framework (SAFe) is among the most widely used agile frameworks at mid-sized and larger companies. Its steering group has already worked out how SRE can fit into the various levels.
I’ve broken down the roles and responsibilities below:
Service-level SREs
Entry-to-mid-level SREs who are responsible for a single service
Provide app-level support for critical software services
Implement tooling and coach teams toward more seamless DevOps
Own SLOs and error budgets for their service
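Owning an SLO and its error budget, as in the last item above, comes down to a small calculation. The sketch below assumes a request-based availability SLO measured over some window; the `error_budget` helper, the 99.9% target and the request counts are hypothetical examples, not figures from a real service.

```python
def error_budget(slo_target, total_requests, failed_requests):
    """Summarise error-budget consumption for one service over a window.

    slo_target is the fraction of requests that must succeed, e.g. 0.999.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")

    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
    }


# Hypothetical numbers: 99.9% target, 1M requests, 400 failures.
report = error_budget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
print(f"Budget consumed: {report['budget_consumed']:.0%}")
print(f"Failures left before the SLO is breached: {report['remaining_failures']:.0f}")
```

A value of `budget_consumed` above 1.0 means the budget is spent; what the team does then (for example, pausing risky releases) is a policy decision the code does not capture.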
System-level SREs
Senior SREs who help release train engineers manage multiple services
Coordinate multiple product streams in the release train
Guide system architecture and production readiness
Own SLOs and error-budget tracking across the system
Enterprise-level SREs
The most senior SREs, reporting directly to the CTO
Run the SRE center of excellence (CoE) for the enterprise
Develop the SRE platform and best practices
Provide architecture support
In conclusion…
Doing SRE well means the difference between high-performance software and painful 5xx errors. So let’s get our SRE team roles right and not mistake SRE for a monolithic role.