Want to Reduce Service Cost and Resource Waste? Start Squeeze Testing
by James Burns
For growing businesses, it’s normal to size service deployments based on the intuitions of the development team involved. Sizing almost always includes some safety margin so that sudden spikes in demand don’t take down the service or wake up whoever is on call. Unfortunately, these intuitions, even if correct at the beginning, often don’t stay up to date with changes in the service or its dependencies, leading to both outages and unhelpful paging.
The practice of “squeeze testing” exists to keep information about the performance requirements of a service up to date for a given traffic load. By selectively steering traffic or workload to a service running on a different instance size or with different container resource constraints, the impact of sizing alternatives or differences between releases can be clearly seen without customer impact. With investment in tooling, this process can be automated to provide teams with up-to-date information about performance options on an ongoing basis.
Squeeze Testing Prerequisites
There are two key prerequisites to make squeeze testing possible and allow you to optimize your service cost and resource footprint. We’ll talk about each in detail in the rest of this article. The first is to have the behavior of the service be observable. If you’re unable to see whether the squeeze test is having negative impact on customers even at the 99th or 99.9th percentile, then it’s not going to be safe to run.
The second prerequisite is the ability to control the proportion of traffic going to a subset of the service (the canary). Many simple load balancing schemes use round-robin or least-loaded selection, with all deployed instances of the application having the same target amount of traffic. For squeeze testing, there is instead a target percentage of traffic that is independent of the number of available service instances.
Service Observability Prerequisites
When doing squeeze testing using production traffic against a canary, observability is important for understanding any customer impact, especially outlier impact. At the very least, your service should track the performance of all resources and methods separately. For instance, if you have a service for getting store stock information, GET /stores, GET /items, POST /items, and PUT /stores will have very different performance characteristics. If you rely on service-level metrics and some endpoints are rarely hit, you are unlikely to notice critical regressions from new configurations or code until it is too late. Even within a particular resource and method combination, performance can vary significantly and meaningfully with client request patterns or with factors such as how much data each user has. If a change in instance size changes caching behavior but only for some customers, it’s good to know which customers those are and how much it changes performance for them. Perhaps the changes will be acceptable, but without knowing first, you could negatively impact key high-value customers.
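To illustrate why per-endpoint tracking matters, here is a minimal sketch in Python. The RouteMetrics class and the traffic numbers are hypothetical, made up for illustration rather than taken from any particular metrics library. It records errors per method and resource pair and compares the result with a single service-level number.

```python
from collections import defaultdict


class RouteMetrics:
    """Tracks request and error counts per (method, resource) pair."""

    def __init__(self):
        self.requests = defaultdict(int)
        self.errors = defaultdict(int)

    def record(self, method, resource, is_error=False):
        key = (method, resource)
        self.requests[key] += 1
        if is_error:
            self.errors[key] += 1

    def error_rate(self, method, resource):
        """Error rate for one endpoint; 0.0 if it has seen no traffic."""
        key = (method, resource)
        total = self.requests.get(key, 0)
        return self.errors.get(key, 0) / total if total else 0.0

    def service_error_rate(self):
        """Error rate across all endpoints combined."""
        total = sum(self.requests.values())
        return sum(self.errors.values()) / total if total else 0.0


# Hypothetical traffic: a busy, healthy endpoint and a rarely hit, failing one.
metrics = RouteMetrics()
for _ in range(990):
    metrics.record("GET", "/items")
for _ in range(10):
    metrics.record("PUT", "/stores", is_error=True)

print(f"service-level error rate: {metrics.service_error_rate():.1%}")
print(f"PUT /stores error rate:   {metrics.error_rate('PUT', '/stores'):.1%}")
```

With these made-up numbers the service-level error rate is 1%, which could easily pass unnoticed, while every PUT /stores request is failing. The same keying idea extends to per-customer breakdowns by adding a customer identifier to the key.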
Key metrics for comparison between the canary and control instances of the service will be latency and error rate. Since part of the process will be purposefully increasing or decreasing the proportion of traffic handled by a service instance, request rates are less likely to be comparable. When comparing latency it’s especially important to look at tail latency. Perhaps only 1% of requests’ performance has changed, but they’ve gotten ten times slower. Investigation of the impact of that will be necessary, but again that’s only possible if it’s noticed. Likewise, aggregate error rate may stay the same but per customer error rate could spike for particular customers. Noticing allows for an informed choice.
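As a rough illustration of the tail latency point, the sketch below compares a control and a canary where 1% of canary requests are ten times slower. The latency values are invented for the example, and the nearest-rank percentile helper is one simple definition among several; a real metrics platform may compute percentiles differently.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the value at rank ceil(pct% of n) in sorted order."""
    if not samples:
        raise ValueError("percentile requires at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def mean(samples):
    return sum(samples) / len(samples)


# Hypothetical latencies in milliseconds.
control = [20.0] * 1000
canary = [20.0] * 990 + [200.0] * 10  # 1% of requests are ten times slower

for name, samples in (("control", control), ("canary", canary)):
    print(
        f"{name}: mean={mean(samples):.1f} "
        f"p50={percentile(samples, 50):.1f} "
        f"p99={percentile(samples, 99):.1f} "
        f"p99.9={percentile(samples, 99.9):.1f}"
    )
```

In this example the median is unchanged and the mean moves only from 20 ms to 21.8 ms, but the 99.9th percentile jumps from 20 ms to 200 ms. Because exactly 1% of requests are slow, the nearest-rank 99th percentile still lands on a fast request, which is one reason to watch more than one tail percentile.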
Load Balancing Prerequisites
The other essential component of squeeze testing is the ability to control the amount of traffic going to the service being tested. If the change is dramatic, for instance dropping two instance sizes, then it may be prudent to start with a small amount of traffic before ramping up. If trying to discover the actual performance threshold for a current instance size, then sending two or three times the normal traffic (or more) may make sense. However, in order to do any of these things, it’s necessary to have a load balancing system that separates the number of instances of a service “in load balancer” from the proportion of traffic that they receive.
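One way to picture this separation is a two-step choice: first pick the pool (canary or control) according to a target fraction, then pick an instance within that pool. The sketch below is a toy model under that assumption, with hypothetical instance names; it is not how any specific load balancer or proxy implements weighting.

```python
import random


def pick_backend(canary_instances, control_instances, canary_fraction, rng=random):
    """Pick a backend so that the canary pool receives roughly canary_fraction
    of requests, regardless of how many instances each pool contains."""
    if not 0.0 <= canary_fraction <= 1.0:
        raise ValueError("canary_fraction must be between 0 and 1")
    # Step 1: choose the pool by weight, not by instance count.
    use_canary = rng.random() < canary_fraction
    pool = canary_instances if use_canary else control_instances
    # Step 2: spread load evenly within the chosen pool.
    return rng.choice(pool)


# Hypothetical fleet: 1 canary and 19 control instances, with a 1% canary target.
canary_pool = ["canary-0"]
control_pool = [f"control-{i}" for i in range(19)]
rng = random.Random(42)  # seeded so the simulation is repeatable

total_requests = 100_000
canary_hits = sum(
    pick_backend(canary_pool, control_pool, 0.01, rng) in canary_pool
    for _ in range(total_requests)
)
print(f"canary share: {canary_hits / total_requests:.2%}")
```

With fair balancing, one canary among 20 instances would receive about 5% of traffic. Here the canary share stays near the 1% target, and raising canary_fraction step by step gives a simple ramp-up without adding or removing instances.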
Often teams following a canary process will just add a single instance of the new code or configuration and watch for behavior changes. While this might work for services with many instances deployed, there are some fundamental problems. First, load balancing attempts to maintain fair traffic distribution, so the canary gets 1/x of the traffic, where x is the number of instances. 1/100th of traffic (1%) might be an acceptable amount for some sorts of canaries, but even that might be considered too risky. Much more often it’s 1/20th or 1/10th of traffic. A bad configuration or code change can then significantly affect customers and even the business.
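The arithmetic behind the 1/x problem can be sketched in a few lines. The function names are hypothetical, and the model assumes perfectly fair balancing across identical instances.

```python
import math
from fractions import Fraction


def fair_canary_share(total_instances):
    """Share of traffic a single canary receives under fair load balancing."""
    if total_instances < 1:
        raise ValueError("total_instances must be at least 1")
    return Fraction(1, total_instances)


def fleet_size_for_share(target_share):
    """Smallest fleet size at which one canary's fair share is at or below
    target_share (given as a Fraction)."""
    if target_share <= 0:
        raise ValueError("target_share must be positive")
    return math.ceil(1 / Fraction(target_share))


for instances in (10, 20, 100):
    share = fair_canary_share(instances)
    print(f"{instances} instances: canary gets {float(share):.0%} of traffic")

print("fleet size needed for a 1% canary:", fleet_size_for_share(Fraction(1, 100)))
```

Under these assumptions, a team running 10 or 20 instances exposes 10% or 5% of traffic to the canary, and would need 100 instances before a single canary drops to 1%.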
Many of the newer generations of edge and service proxies support finer-grained traffic control. Look for follow-up posts with step-by-step directions for this.
Squeeze testing, the process of understanding the behavior of a service with different traffic volumes or resource allocations, can reduce resource waste, ensure effective service scaling based on demand, and cut service costs. Squeeze testing, when included in a continuous testing practice, can proactively alert on performance regressions from code or config changes, removing the guesswork from successful service deployment and operations.
Service observability that can provide high-resolution comparison of performance requires a platform built for handling high-cardinality data and sampling it as late as possible to ensure accuracy. Lightstep was built specifically to enable these sorts of insights. To see examples that you can play with, check out the Sandbox at https://lightstep.com/play.
James Burns, Developer Advocate
From network load balancers to FPGAs to ASICs to embedded security to cloud ops at scale, James has seen how systems work but, more interestingly, how they fail. He is passionate about sharing what he's learned to level up teams, make developers happier, and improve customer experiences.