#60 How to NOT fail in Platform Engineering

Reliability Enablers

Ankit Wal from ThoughtWorks Asia Pacific gave me the inside word on this hot topic

Oct 01, 2024

Here’s what we covered:

Platform engineering: Building compelling internal products to help teams reuse capabilities with less coordination.
Cloud computing connection: Enterprises can now compose platforms from cloud services, creating mature, internal products for all engineering personas.

Ankit didn't choose platform engineering; it found him.
Early start in programming (since age 11).
Transitioned from a product engineer mindset to building internal tools and platforms.
Key experience across startups, the public sector, unicorn companies, and private cloud projects.

Public sector: Highly advanced digital services (e.g., identity services for tax, housing).
Exciting environment: Software development in Singapore’s public sector is fast-paced and digitally progressive.

Turf wars: Debate among DevOps, SRE, and platform engineering.
- DevOps: Collaboration between dev and ops to think systemically.
- SRE: Operations done the software engineering way.
- Platform engineering: Delivering operational services as internal, self-service products.

Issue: Requiring tickets to get work done creates bottlenecks.
- Ideal state: Teams should be able to work autonomously without raising tickets.
- Spectrum of dysfunction: Ranges from one ticket for one service to multiple tickets across teams, which leads to delays and misconfigurations.

Challenge: Balancing user autonomy against cognitive load.
Goal: Give product teams autonomy without adding to their cognitive load.
Solution: Platforms should abstract unnecessary complexity while still giving teams the autonomy to operate independently.
How it pans out:
- Low autonomy, low cognitive load: Dependent on platform teams but a simple process.
- Low autonomy, high cognitive load: Requires interacting with multiple teams and understanding technical details (worst case).
- High autonomy, high cognitive load: Teams have full access (e.g., AWS accounts) but face infrastructure burden and fragmentation.
- High autonomy, low cognitive load: Ideal situation, where teams get what they need quickly without detailed knowledge of the underlying infrastructure.
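
The four quadrants above can be sketched as a small classifier. This is only an illustration, not something from the episode: the inputs (`tickets_required`, `concepts_to_learn`) and the `load_threshold` default are hypothetical proxies for autonomy and cognitive load, and a real assessment would rely on user research rather than two numbers.

```python
from enum import Enum


class Quadrant(Enum):
    """The four autonomy / cognitive-load combinations from the notes."""
    DEPENDENT_BUT_SIMPLE = "low autonomy, low cognitive load"
    WORST_CASE = "low autonomy, high cognitive load"
    FULL_ACCESS_BURDEN = "high autonomy, high cognitive load"
    IDEAL = "high autonomy, low cognitive load"


def classify(tickets_required: int, concepts_to_learn: int,
             load_threshold: int = 3) -> Quadrant:
    """Place a platform capability in one of the four quadrants.

    Assumptions (hypothetical, for illustration only):
    - a team is autonomous when it needs zero tickets to get the capability;
    - cognitive load is high when the team must learn more than
      `load_threshold` infrastructure concepts to use it.
    """
    high_autonomy = tickets_required == 0
    high_load = concepts_to_learn > load_threshold

    if high_autonomy:
        return Quadrant.FULL_ACCESS_BURDEN if high_load else Quadrant.IDEAL
    return Quadrant.WORST_CASE if high_load else Quadrant.DEPENDENT_BUT_SIMPLE


# Example: a self-service database that needs no tickets and one concept.
print(classify(tickets_required=0, concepts_to_learn=1).value)
```

Under these assumptions, a raw cloud account with no guardrails (zero tickets, many concepts) lands in the high autonomy, high cognitive load quadrant, while a multi-team ticket chain lands in the worst case.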

Cognitive load focus: More important than product thinking alone; consider the human experience of using the system.
Team Topologies: Mentioned as a key reference on this concept of cognitive load management.

Collaboration: Building the platform in close collaboration with initial users (pilot teams) is crucial for success.
Product Management: Essential to have a product manager or team dedicated to communication, user journeys, and internal marketing.

Definition of self-service: Users should be able to discover, understand, and use platform capabilities without human intervention.
User Testing: Watch how users interact with the platform to understand stumbling points and improve the self-service experience.

Burnout Prevention: Platform engineers need low cognitive load as well. Moving from a reactive (ticket-based) model to a proactive, self-service approach can reduce the strain.
Proactive Approach: Self-service models allow platform teams to prioritize development and avoid being overwhelmed by constant requests.