Reliability Enablers (SREpath)
#60 How to NOT fail in Platform Engineering

Ankit Wal from ThoughtWorks Asia Pacific gave me the inside word on this hot topic.

Here’s what we covered:

Defining Platform Engineering

  • Platform engineering: Building compelling internal products to help teams reuse capabilities with less coordination.

  • Cloud computing connection: Enterprises can now compose platforms from cloud services, creating mature, internal products for all engineering personas.

Ankit’s career journey

  • Didn't choose platform engineering; it found him.

  • Early start in programming (since age 11).

  • Transitioned from a product engineer mindset to building internal tools and platforms.

  • Key experience across startups, the public sector, unicorn companies, and private cloud projects.

Singapore Public Sector Experience

  • Public sector: Highly advanced digital services (e.g., identity services for tax, housing).

  • Exciting environment: Software development in Singapore’s public sector is fast-paced and digitally progressive.

Platform Engineering Turf Wars

  • Turf wars: Debate among DevOps, SRE, and platform engineering.

    • DevOps: Collaboration between dev and ops to think systemically.

    • SRE: Operations done the software engineering way.

    • Platform engineering: Delivering operational services as internal, self-service products.

Dysfunctional Team Interactions

  • Issue: Requiring tickets to get work done creates bottlenecks.

    • Ideal state: Teams should be able to work autonomously without raising tickets.

    • Spectrum of dysfunction: Ranges from one ticket for one service to multiple tickets across several teams, which leads to delays and misconfigurations.

Quadrant Model (Autonomy vs. Cognitive Load)

  • Challenge: Balancing user autonomy against cognitive load.

  • Goal: Give product teams autonomy while keeping their cognitive load low.

  • Solution: Platforms should abstract unnecessary complexity while still giving teams the autonomy to operate independently.

    How it pans out

    • Low autonomy, low cognitive load: Teams depend on the platform team, but the process is simple.

    • Low autonomy, high cognitive load: Requires interacting with multiple teams and understanding technical details (worst case).

    • High autonomy, high cognitive load: Teams have full access (e.g., AWS accounts) but face infrastructure burden and fragmentation.

    • High autonomy, low cognitive load: Ideal situation—teams get what they need quickly without detailed knowledge.
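
The four quadrants above can be sketched as a small lookup, which may help when assessing where a given team currently sits. This is only an illustrative sketch: the names `Level`, `QUADRANTS`, `describe_quadrant`, and `is_ideal` are hypothetical and were not discussed in the episode, and the descriptions simply paraphrase the list above.

```python
from enum import Enum


class Level(Enum):
    LOW = "low"
    HIGH = "high"


# Keys are (autonomy, cognitive_load). Descriptions paraphrase the quadrant
# list in these notes; the structure itself is an assumption for illustration.
QUADRANTS = {
    (Level.LOW, Level.LOW): "dependent on the platform team, but the process is simple",
    (Level.LOW, Level.HIGH): "must interact with multiple teams and understand technical details (worst case)",
    (Level.HIGH, Level.HIGH): "full access, but infrastructure burden and fragmentation",
    (Level.HIGH, Level.LOW): "teams get what they need quickly without detailed knowledge (ideal)",
}


def describe_quadrant(autonomy: Level, cognitive_load: Level) -> str:
    """Return the description of the quadrant a team falls into."""
    return QUADRANTS[(autonomy, cognitive_load)]


def is_ideal(autonomy: Level, cognitive_load: Level) -> bool:
    """The target state is high autonomy combined with low cognitive load."""
    return autonomy is Level.HIGH and cognitive_load is Level.LOW


if __name__ == "__main__":
    # Example: a team with its own cloud accounts but no shared abstractions.
    print(describe_quadrant(Level.HIGH, Level.HIGH))
    print(is_ideal(Level.HIGH, Level.HIGH))
```

A two-level scale is a deliberate simplification; in practice both axes are a spectrum, as the ticketing discussion earlier suggests.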

Shift from Product Thinking to Cognitive Load

  • Cognitive load focus: More important than just product thinking—consider the human experience when using the system.

  • Team Topologies: Mentioned as a key reference on this concept of cognitive load management.

Platform as a Product Mindset

  • Collaboration: Building the platform in close collaboration with initial users (pilot teams) is crucial for success.

  • Product Management: Essential to have a product manager or team dedicated to communication, user journeys, and internal marketing.

Self-Service as a Platform Requirement

  • Definition: Users should easily discover, understand, and use platform capabilities without human intervention.

  • User Testing: Watch how users interact with the platform to understand stumbling points and improve the self-service experience.

Platform Team Cognitive Load

  • Burnout Prevention: Platform engineers need low cognitive load as well. Moving from a reactive (ticket-based) model to a proactive, self-service approach can reduce the strain.

  • Proactive Approach: Self-service models allow platform teams to prioritize development and avoid being overwhelmed by constant requests.
