Why You Should Have a Platform Team
by James Burns
The fundamental role of a platform team is to increase developer velocity.
They do this primarily in two ways: platform teams lessen the cognitive load developers face when interacting with production systems, and they optimize common developer activities. Developers end up able to make bigger changes more quickly and with more confidence. Let’s dive into how that’s possible.
If you don’t yet have product/market fit, time spent working on anything but finding that fit is wasted. Once you find it, the need to expand rapidly doesn’t change, yet the systems you built to find that fit are often not optimized to keep it or to expand. The symptoms are familiar:
- It takes too long to make new developers effective.
- It takes too long to roll out new features.
- Customers’ use of the system is endangered by big changes.
- Keep-the-lights-on work is overwhelming customer-impacting work.
One of the things that slows developers down is cognitive load. Think of using a new system as akin to driving around an unfamiliar city, running into unexpected traffic jams and road closures. As a driver, this is a stressful experience, especially if you need to arrive at your destination on time. By comparison, a platform team can make life easier for developers by enforcing a standardized “road map” of systems, so that the traffic patterns, turns, and general layout of different cities are nearly identical. In this way, a developer who is familiar with one system can easily navigate the next, with a minimum of frustration.
A platform team’s first job is to survey how things have been done: to understand how development patterns and habits have changed throughout the history of the company, and how those trends are reflected in existing code. With this understanding, they can plan what will lighten the cognitive load fastest. Very often this means reducing the number of ways a system can be built and making sure the supported ways look the same for deployment, observability, and monitoring.
For a platform team to be successful, it cannot just set standards by fiat and hope for compliance. It will often need to do substantial work, either moving existing systems to the new standard itself or making the move and its verification a task that takes a development team less than a day. Unless the platform team has done the work of moving a service to the new standard, it’s all too common for them to dramatically underestimate the cost to a development team. If the problem you’re trying to solve is developer velocity but you impose a multi-sprint velocity hit, the standard is going to be a hard sell and ultimately self-contradictory.
The standards the platform team sets—and the platform itself—should optimize common developer activities, particularly:
- Creating New Services
- Testing Services
- Deploying Changes to Services Safely
- Understanding the Performance and Correctness of Deployed Services
If the cost in developer time for creating and integrating new services is high, developers will lump new functionality in with unrelated functionality because they need to get it shipped. The chance that unrelated but individually critical functionality fails together goes up. If the cost to create and integrate a new service is low, then new functionality will more often be placed into new services. Services stay smaller (micro) and easier to understand, lowering the cognitive load associated with switching to work on a new service.
Testing a proposed change to a service is a necessary and common activity. However, if it takes more than a few minutes, or if teams are working quickly, changes pile up. If something turns out to be broken, all the changes behind it usually are broken too. Platform teams create and optimize common ways of testing a service and make sure they scale out appropriately, so that changes are validated and ready to deploy a few minutes after being proposed. Keeping services simple by making it easy to create new ones also helps with this.
Deployments are what put changes in front of customers, letting them see new features or products. Unfortunately, they’re also what’s most likely to create bad experiences, downtime, even lost data. Providing confidence, not that mistakes will never be made but that their customer impact can be minimized, is the most critical role of the platform team. By developing tools and methods like canaries and progressive deploys, the platform team makes it possible to detect anything that might impact customers right away and roll it back before there’s an outage.
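To make the idea of a progressive deploy concrete, here is a minimal sketch. It assumes a hypothetical `error_rate_at` callback that reports the error rate observed after a given percentage of traffic has been shifted to the new version. The stage percentages and the 1% threshold are illustrative assumptions, not recommendations, and a real rollout would also wait for a bake period at each stage.

```python
from typing import Callable, Sequence, Tuple


def progressive_deploy(
    stages: Sequence[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[str, int]:
    """Step through traffic percentages and stop at the first unhealthy stage.

    Returns ("promoted", last_stage) if every stage stayed healthy, or
    ("rolled_back", pct) where pct is the stage at which the problem appeared.
    """
    for pct in stages:
        # Hypothetical hook: shift `pct` percent of traffic to the new
        # version and report the error rate observed at that level.
        observed = error_rate_at(pct)
        if observed > max_error_rate:
            # Only `pct` percent of customers ever saw the bad version.
            return ("rolled_back", pct)
    return ("promoted", stages[-1])


if __name__ == "__main__":
    # Simulated bad release: errors appear once 10% of traffic is shifted.
    result = progressive_deploy(
        [1, 10, 50, 100],
        lambda pct: 0.05 if pct >= 10 else 0.0,
    )
    print(result)
```

The first stage acts as the canary: a regression that shows up at a small percentage of traffic is caught before most customers are exposed to it.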
Platform teams make deployed software observable by default so that engineers and operators can see what is happening, see what has happened, and make reasonable guesses about what will happen. By presenting information about applications and services in a way that’s comparable and comprehensible, platform teams enable developers to work together effectively on confusing and complex problems and to switch between projects with little friction. Developers don’t have to spend weeks learning how a particular team instrumented its application; they can use what they know from previous projects to be effective right away.
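One way to picture "observable by default" is a shared wrapper, provided by the platform, that every service handler passes through. The sketch below is an illustration only: the `observed` decorator, the in-memory `METRICS` store, and the `checkout.charge` example are hypothetical names, and a real platform would export to a metrics or tracing backend instead of a dictionary.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical in-memory store standing in for a real metrics backend.
METRICS = defaultdict(lambda: {"calls": 0, "errors": 0, "total_seconds": 0.0})


def observed(service: str):
    """Record calls, errors, and latency under a standard '<service>.<function>' key."""

    def decorator(func):
        key = f"{service}.{func.__name__}"

        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            METRICS[key]["calls"] += 1
            try:
                return func(*args, **kwargs)
            except Exception:
                METRICS[key]["errors"] += 1
                raise  # instrumentation must not swallow the failure
            finally:
                METRICS[key]["total_seconds"] += time.perf_counter() - start

        return wrapper

    return decorator


@observed("checkout")
def charge(amount_cents: int) -> int:
    """Example handler; the business logic is a placeholder."""
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    return amount_cents
```

Because every service reports the same three numbers under the same naming scheme, a developer moving between teams already knows where to look.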
Platform teams are force multipliers: by reducing cognitive load and making common developer activities fast and easy, they let developers focus on feature velocity. Platform teams require dedicated investment and commitment. The investment is transformative not just to software development but also to the entire business. See how you can get started building a platform team and have your team look into observable-by-default applications with Lightstep.