Guest post author: Vijay, SVP of Engineering at Databricks
Unfortunately, it’s not clear that all the teams adopting microservices understand why they’re doing so. I’ve heard shallow rationalizations about performance, "cleanliness," and even blind imitation of well-known practices at Google and elsewhere. Yet I rarely hear anyone cite the one truly excellent reason to adopt microservices: shipping their org chart.
Org chart as a service
The "org chart as a service" is the primary use case I’ve found for microservices. Debugging and running microservices is hard. Anyone who thinks it’s easier to debug microservices than a monolith is misinformed or inexperienced. However, using microservices is absolutely the right way to scale your engineering organization because you‘re going to ship your org chart, no matter what. Before I get into all that, let’s take a step back to consider microservices in general and academic findings about organizational design.
My views on microservices
I posted this tweet back in January, and it largely reflects my views on microservices:
Most engineering leaders don’t understand why things are done the way they are, and this leads to cargo cults. Given that software development is central to the future of companies, there is far too little hard research on development best practices.
Why is it so slow to develop big software applications?
There are two seemingly conflicting pieces of research on the impact of organizational structure on software quality. The first, The Influence of Organizational Structure On Software Quality: An Empirical Case Study by Nachiappan Nagappan, Brendan Murphy and Victor Basili, suggested that globally distributed teams didn’t perform worse than colocated teams. However, Splitting the Organization and Integrating the Code: Conway's Law Revisited by Rebecca Grinter and James D. Herbsleb found the opposite: globally distributed teams performed worse than colocated teams. So which report was correct? It turns out both were, once you control for team size. Because team size determines how many people must stay in sync, the single largest factor across these studies (on the effects of organizational structure) appears to be communication overhead. Period.
Decrease communications overhead to maximize productivity
To maximize productivity, you need to cut communication overhead. That means smaller teams, more tightly scoped work, and more time-boxing: cut features to keep time constant. Systemic requirements for collaboration are an anti-pattern. As Stu Feldman put it, "We just don’t know how to get more than 10 or 12 communicating together effectively nor how to keep them all on top of what’s going on so they could do their jobs well."
Microservices can help you scale your team. Large, inflexible, monolithic applications often lead to large, inefficient engineering teams because, no matter what you do, you’re going to ship your org chart. Try not to fight this: it’s a law of nature, not an avoidable side effect. Two "laws" are in play here: Conway’s law and Brooks’ law (from The Mythical Man-Month). Conway’s law is perhaps the better known: a system (software or otherwise) will end up as a structural copy of the organization’s communication structure (i.e., its org chart). Brooks’ law states, "Since software construction is inherently a systems effort – an exercise in complex interrelationships – communication effort is great, and it quickly dominates the decrease in individual task time brought about by partitioning."
So how does this map to microservices? Microservices are "Conway’s law as a feature." They help you scale your engineering organization, not your product. Smaller, more nimble teams are created to own the individual microservices, which reduces communication overhead and increases team efficiency. When properly designed, each microservice communicates with only a small number of adjacent microservices; per Brooks, this reduces the drag of human communication paths, whose number grows roughly as N² with the number of people who must coordinate.
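To make the N² point concrete, here is a back-of-the-envelope sketch. It uses Brooks’ count of n(n-1)/2 pairwise communication paths for n people and compares one 50-person team with ten 5-person teams arranged in a ring, each talking only to its two adjacent teams. The team sizes, the ring topology, and the assumption that each pair of adjacent teams needs just one channel (its service interface) are illustrative choices, not data from the studies above.

```python
def channels(n: int) -> int:
    """Pairwise communication paths among n people: n(n-1)/2 (Brooks)."""
    return n * (n - 1) // 2


def partitioned_channels(num_teams: int, team_size: int, neighbors_per_team: int) -> int:
    """Paths when people talk freely within their own team and each team
    keeps one channel (its service interface) to each adjacent team.

    Illustrative assumption: each inter-team link is shared by two teams,
    so there are num_teams * neighbors_per_team // 2 such links.
    """
    intra_team = num_teams * channels(team_size)
    inter_team = num_teams * neighbors_per_team // 2
    return intra_team + inter_team


print(channels(50))                    # one 50-person team       -> 1225
print(partitioned_channels(10, 5, 2))  # ten 5-person teams, ring -> 110
```

Under these made-up numbers the partitioned organization has about a tenth of the communication paths. The exact ratio depends entirely on the assumptions, but the quadratic shape is the point.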
Observability is key
While the productivity gains from teams building microservices are clear, running microservices is hard. Microservices are much more complicated to develop and debug. With a monolith, when there’s an issue, you probably know exactly whom to call (or a stack trace and a 'git blame' will get you close enough). With microservices, you’re probably paging the wrong teams much of the time because you don’t know what’s going on. Observability is key to overcoming these challenges. Products like Lightstep exist, and are absolutely necessary, specifically because it’s so extraordinarily difficult to make sense of the complexity and concurrency in a microservices environment.
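As a toy illustration of why tracing helps route pages, the sketch below models one request trace as a list of spans (each with a parent span and the service that produced it) and picks the team to page. Everything here is hypothetical: the `Span` fields, the `OWNERS` map from service to team, and the root-cause heuristic (the erroring span whose children did not error) are simplifications for illustration, not Lightstep’s API or any particular tracing standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str              # service that produced this span
    error: bool               # did this unit of work fail?


# Hypothetical ownership map: which team gets paged for which service.
OWNERS = {
    "checkout": "payments-team",
    "ledger": "payments-team",
    "inventory": "catalog-team",
    "pricing": "catalog-team",
}


def root_cause_span(spans: list[Span]) -> Optional[Span]:
    """Return the first erroring span whose children did not error.

    Simplifying heuristic: a failure that did not come from a failing
    child most likely originated in that span's own service.
    """
    has_erroring_child = {s.parent_id for s in spans if s.error and s.parent_id is not None}
    for span in spans:
        if span.error and span.span_id not in has_erroring_child:
            return span
    return None


def team_to_page(spans: list[Span], owners: dict[str, str]) -> Optional[str]:
    """Map the suspected root-cause span to the team that owns its service."""
    span = root_cause_span(spans)
    return owners.get(span.service) if span is not None else None


# One request: checkout calls inventory and pricing; pricing calls ledger.
TRACE = [
    Span("a", None, "checkout", error=True),
    Span("b", "a", "inventory", error=False),
    Span("c", "a", "pricing", error=True),
    Span("d", "c", "ledger", error=False),
]

print(team_to_page(TRACE, OWNERS))  # catalog-team
```

Paging the owner of the entry-point service (`checkout`) would wake up the payments team, even though the error started in `pricing`. Real tracing systems propagate trace context across service boundaries and do far richer analysis; this only shows the shape of the idea.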
To keep collaboration overhead from killing you, break your work down across teams and connect them through good interfaces. Build small organizations that do a few things well and expose good APIs and/or contracts. The upper limit on team size should be five.
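One way to make "good interfaces" concrete in code is for each team to publish a typed contract and for other teams to depend only on that contract. The minimal sketch below uses Python’s `typing.Protocol` for structural typing; the `InventoryService` contract, its `units_in_stock` method, and the team names in the comments are hypothetical examples.

```python
from __future__ import annotations

from typing import Protocol


class InventoryService(Protocol):
    """Contract published by the (hypothetical) catalog team."""

    def units_in_stock(self, sku: str) -> int: ...


class InMemoryInventory:
    """One implementation, owned by the catalog team. It never names
    InventoryService; matching the method signature is enough."""

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)

    def units_in_stock(self, sku: str) -> int:
        return self._stock.get(sku, 0)


def can_checkout(inventory: InventoryService, sku: str, qty: int) -> bool:
    """Owned by the (hypothetical) checkout team; depends only on the contract."""
    return qty > 0 and inventory.units_in_stock(sku) >= qty


inventory = InMemoryInventory({"sku-123": 3})
print(can_checkout(inventory, "sku-123", 2))  # True
print(can_checkout(inventory, "sku-999", 1))  # False
```

Because static type checkers such as mypy match a `Protocol` by structure rather than inheritance, the checkout team can substitute a fake in its tests and the catalog team can change its implementation freely, as long as the contract holds, without either team coordinating with the other.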
A closing metaphor
I would like to close my guest blog post with another tweet.