Microservices Architecture: When and How To Move To Microservices
Building an application using microservices means building small, independent components, each with a narrowly defined function. Your application is no longer just a (monolithic) shopping app, but a products service, a reviews service, a recommendations service, a shopping cart service, and a check-out service, not to mention services for users, preferences, authentication, history, and a dozen other components of your app.
While many would say the advantages of microservices are faster releases, better scalability, or more flexibility in choosing languages or frameworks, the real reason to build a microservices-based application is that it will enable the teams in your organization to work more independently. And as part of working more independently, they will also work more efficiently. So while it might seem obvious that your next green-field application should be based on microservices, what about your current application? When, and how, should you move your existing code to a microservice-based architecture? There are big costs and risks in doing so, and choosing the right time and approach can make all the difference in getting a return on your investment.
Since microservices are really only necessary when parts of your org need to work independently, the question of when you should move to microservices is really a question of how you are thinking about your org and how it’s going to change over time. Are you scaling your team to build out several new sets of features? Is your performance backlog finally catching up with you? Is your operations team burning out? Or maybe your developers are just complaining that they’re spending all of their time in meetings, and you’re looking to boost productivity.
Understanding how to break up your monolithic application is really about understanding what you want your organization to look like over the next few years. You’ll want to consider your business strategy, where you’re going to need to invest, and where you and your organization are going to have the highest leverage.
Often the transition to microservices is framed as a migration. Realistically, it’s better to think about this as a journey: you will learn a lot along the way, including more about your software and your organization. Prepare yourself (and your org) to be in a state of transition for a while.
To begin that journey, start with functionality that’s critical to your business. While it’s tempting to pick something ancillary, what you really want to do is build the case for continuing that journey. Picking something business-critical will help you better understand how microservices can support your organization and business and, if it’s successful, give you a model to emulate moving forward. And if it’s not successful… well, you’ll have great data about what to change next time around.
Defining a clear API for your first microservice (and your second, and your third…) is important because it lets you create an ongoing measure of success: after all, an API is the most direct way of describing what a microservice does. Those APIs will also be a key part of defining Service Level Objectives (SLOs) for each microservice. SLOs are part of the contract between a microservice and the services that depend on it. No microservice provides perfect service 100% of the time, but an SLO sets expectations about the level of service its users will get. How fast should your service respond? What percentage of errors is acceptable? When is downtime okay, and how much? You’re probably used to dealing with SLOs (and the associated SLAs) for your managed service providers; be ready to define these for each microservice as it comes online.
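As a rough illustration, the three questions above (how fast, how many errors, how much downtime) can be written down as data and checked against measurements. This is only a sketch: the `SLO` and `Measurement` classes, the `violations` function, the "shopping-cart" service name, and every threshold are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    """Hypothetical objectives for a single microservice."""
    service: str
    max_p99_latency_ms: float  # how fast should the service respond?
    max_error_rate: float      # what fraction of requests may fail?
    min_availability: float    # how much downtime is acceptable?


@dataclass(frozen=True)
class Measurement:
    """What was actually observed over some reporting window."""
    p99_latency_ms: float
    error_rate: float
    availability: float


def violations(slo: SLO, observed: Measurement) -> list[str]:
    """Return the names of the objectives that were missed."""
    missed = []
    if observed.p99_latency_ms > slo.max_p99_latency_ms:
        missed.append("latency")
    if observed.error_rate > slo.max_error_rate:
        missed.append("error_rate")
    if observed.availability < slo.min_availability:
        missed.append("availability")
    return missed


# Illustrative numbers only.
cart_slo = SLO("shopping-cart", max_p99_latency_ms=300.0,
               max_error_rate=0.001, min_availability=0.999)
last_week = Measurement(p99_latency_ms=280.0, error_rate=0.004,
                        availability=0.9995)
print(violations(cart_slo, last_week))  # ['error_rate']
```

Writing the objectives down in a form like this, next to the API definition, gives dependent teams something concrete to plan against.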
Of course, SLOs are only useful if you measure the underlying metrics and report on whether you’re meeting those objectives. Even as you begin your journey, it’s important to make sure that teams are bubbling up these reports. Someone needs to have organization-wide visibility into how services are performing and hold teams accountable for meeting their goals; it’s critical that you establish a culture of accountability early. Without this, you will inevitably find yourself in a place where performance or reliability is suffering, and you’re not sure whom to blame.
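One common way to report against an objective is to express it as an "error budget": the number of failures the SLO permits in a window, and how much of that allowance is left. The sketch below assumes a success-rate objective and uses made-up request counts; the function name and the per-service numbers are illustrative, not a prescribed reporting format.

```python
def error_budget_remaining(slo_target: float, total: int, failed: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent).

    slo_target is the required success rate, e.g. 0.999 for "99.9% of
    requests succeed"; total and failed are request counts for the window.
    """
    allowed_failures = (1.0 - slo_target) * total
    if allowed_failures == 0:
        # No failures are permitted (or there was no traffic at all).
        return 0.0 if failed == 0 else float("-inf")
    return 1.0 - failed / allowed_failures


# Hypothetical window: (success-rate target, total requests, failed requests).
window = {
    "products": (0.999, 1_000_000, 250),
    "reviews": (0.99, 200_000, 3_000),
}
report = {name: round(error_budget_remaining(*args), 2)
          for name, args in window.items()}
print(report)  # {'products': 0.75, 'reviews': -0.5}
```

A single table like `report`, covering every service, is one way to give someone the organization-wide view described above: a negative value means that team has missed its objective for the window.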
One thing that will change with a move to microservices is that, while before the "customers" for your developers were, well, predominantly, your business customers, now most developers provide service to other developers. While it’s tempting to continue to rely on informal communication channels to resolve issues, to reap the benefits of a microservices architecture — independent and fast-moving teams — you’ll need to put some new tools in place to monitor your production systems. Without them, your teams will spend countless hours trying to track down problems across teams, often blaming the wrong team for an issue.
Defining and measuring SLOs is the beginning of understanding performance and reliability, but SLOs will only tell you when something has gone wrong, and usually not why. Most production failures in a microservice architecture are the result of an interaction between two or more components. That is partly because, while each team can test its service’s changes in isolation before deployment, it’s incredibly hard to predict how those changes will affect other services until they are live in production.
Unfortunately, taking traditional approaches to monitoring (metrics and logging) and just "scaling them up" won’t work, for two reasons. First, the amount of telemetry data generated by your application will scale with the number of microservices. This is particularly problematic when it comes to logging, as you are likely paying (either in terms of vendor costs or developer cycles committed to maintenance) based on data volume, and more microservices will mean higher costs, even for a fixed transaction rate. Second, while additional metrics and logging might help each team understand their own services better, they fail to help teams understand when changes in other services are impacting their own.
While distributed tracing is sometimes viewed as a "third pillar" to complement metrics and logging in a microservices-based architecture, it’s critical that all monitoring tools become "microservices-aware."
This is easiest to understand for logging, where it’s important to be able to organize logs by transaction, even when that transaction touches dozens or hundreds of microservices. In some ways, many distributed tracing tools are just this: logs and some lightweight timing information that’s collected and presented in a transaction-oriented way rather than a service-oriented one.
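To make the difference between service-oriented and transaction-oriented views concrete, here is a minimal sketch that regroups log records by transaction. It assumes every record already carries a transaction ID; the field names (`ts`, `service`, `txn`, `msg`) and the sample records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical log records, in the order a central log store received them.
logs = [
    {"ts": 3, "service": "cart", "txn": "t-1", "msg": "item added"},
    {"ts": 1, "service": "frontend", "txn": "t-1", "msg": "request received"},
    {"ts": 2, "service": "auth", "txn": "t-2", "msg": "token checked"},
    {"ts": 2, "service": "products", "txn": "t-1", "msg": "price looked up"},
]


def by_transaction(records):
    """Group records by transaction ID, ordered by timestamp within each."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["txn"]].append(record)
    for txn_records in grouped.values():
        txn_records.sort(key=lambda r: r["ts"])
    return dict(grouped)


# The path transaction t-1 took through the services, in time order.
timeline = [r["service"] for r in by_transaction(logs)["t-1"]]
print(timeline)  # ['frontend', 'products', 'cart']
```

Real tracing tools also record timing and parent-child relationships between calls, but the core idea is the same regrouping.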
It’s also true for metrics: while it’s useful to understand how latency for, say, your recommendations service has changed, it’s also important to understand whether it is affecting your front page as much as shopping cart views, as well as which end-user segments are most affected. Just as for logging, developers will need to understand how a given metric affects, and is affected by, other services up and down the stack.
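One way to picture a "microservices-aware" metric is a latency sample that carries context about who was calling and on whose behalf. The sketch below assumes each recommendations-service sample is tagged with the calling page and the end-user segment; the tag values and latencies are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical latency samples for the recommendations service:
# (calling page, end-user segment, latency in milliseconds).
samples = [
    ("front-page", "mobile", 120.0),
    ("front-page", "desktop", 80.0),
    ("shopping-cart", "mobile", 300.0),
    ("shopping-cart", "desktop", 100.0),
]


def mean_latency_by(records, tag_index):
    """Average latency grouped by one tag (0 = caller, 1 = user segment)."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[tag_index]].append(record[2])
    return {tag: mean(values) for tag, values in buckets.items()}


print(mean_latency_by(samples, 0))  # {'front-page': 100.0, 'shopping-cart': 200.0}
print(mean_latency_by(samples, 1))  # {'mobile': 210.0, 'desktop': 90.0}
```

With the same samples, one grouping shows which page is hit hardest and the other shows which users are; an untagged service-level average would hide both.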
To make your monitoring tools microservices-aware, it’s critical that you build transaction IDs and other context into your telemetry and tools from day one of your microservices journey.
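A minimal sketch of what building transaction IDs into your telemetry can look like inside one service, using only the Python standard library. The header name `X-Transaction-Id`, the helper functions, and the "cart" logger are assumptions made for illustration; in practice you would more likely adopt an existing tracing library and its propagation format.

```python
import contextvars
import logging
import uuid

# Holds the ID of the transaction currently being handled.
txn_id = contextvars.ContextVar("txn_id", default="-")

HEADER = "X-Transaction-Id"  # hypothetical header name, not a standard


class TxnFilter(logging.Filter):
    """Attach the current transaction ID to every log record."""

    def filter(self, record):
        record.txn_id = txn_id.get()
        return True


def start_request(incoming_headers):
    """Reuse the caller's transaction ID, or mint one at the edge."""
    tid = incoming_headers.get(HEADER) or uuid.uuid4().hex
    txn_id.set(tid)
    return tid


def outgoing_headers():
    """Headers to add to every downstream call made for this transaction."""
    return {HEADER: txn_id.get()}


logger = logging.getLogger("cart")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(txn_id)s %(name)s %(message)s"))
handler.addFilter(TxnFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

start_request({"X-Transaction-Id": "t-42"})
logger.info("item added")   # log line is prefixed with the ID t-42
print(outgoing_headers())   # {'X-Transaction-Id': 't-42'}
```

The important property is that the ID is attached automatically, so individual developers do not have to remember to pass it along on every log line and outgoing request.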
While there are many pitfalls along the microservices journey, it’s also a great opportunity to promote team development. New architectures and new tools mean that you can attract fresh talent, and it’s also a chance to foster new skills and new roles for your existing team members. After all, when heading out on a long road-trip, isn’t it your traveling companions that matter the most?