
Move fast and know what’s broken: Announcing Service Health for Deployments

Superstition runs in my blood. When I was a product manager, I used to tell my team, “Never launch on a Friday!” The gut-wrenching feeling I had when customers complained about a new release or ran into newly deployed issues didn’t seem so different from what engineers, especially those on call, felt about deploying.

At Lightstep, we started to think about this problem. What did we need to give ourselves peace of mind when pushing something into production? We realized that we wanted to be able to:

  1. Quickly catch a performance regression if something (inevitably) went wrong.

  2. Easily isolate the root cause of that regression.

We wrestled with how to bring this level of confidence to deployments, and after months of work, we are proud to launch Service Health for Deployments, the first observability tool to automatically show you what impacted your service’s performance during and after a deployment, and to surface why it happened.

With this release, developers no longer need to wonder whether a deployment has impacted their service level indicators, such as latency, error rate, or throughput. There’s no reason to get stuck in a cycle of rolling forward, getting alerted, guessing at possible fixes, rolling back, and then doing it all over again when you can get an immediate understanding of service and system health.

How Does Service Health for Deployments Work?

Service Health for Deployments allows developers to proactively monitor deployments or reactively investigate regressions by: 

  • Comparing historical latency histogram distributions 

  • Identifying tags that have the biggest impact on latency through Correlations

  • Viewing operations or service diagrams to see the lifecycle of a request through a service or system

  • Comparing before and after views of a regression

  • Viewing the latency, error ratio and throughput for a service, all while seeing when a deployment for that service occurred
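
To make the first of those ideas concrete, here is a minimal Python sketch of a before-and-after latency comparison. It is an illustration only, not how Lightstep computes anything: the function names (`percentile`, `compare_windows`) and the input shape (plain lists of latencies in milliseconds) are assumptions made for this example.

```python
from statistics import quantiles


def percentile(samples, p):
    """Return the p-th percentile (1 to 99) of a list of latency samples."""
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cuts = quantiles(samples, n=100, method="inclusive")
    return cuts[p - 1]


def compare_windows(baseline_ms, regression_ms, percentiles=(50, 95, 99)):
    """Compare latency percentiles between a baseline and a regression window.

    Both arguments are hypothetical inputs: lists of request latencies in
    milliseconds collected before and after a deployment.
    """
    report = {}
    for p in percentiles:
        before = percentile(baseline_ms, p)
        after = percentile(regression_ms, p)
        report[p] = {"before": before, "after": after, "ratio": after / before}
    return report
```

A ratio well above 1.0 at the higher percentiles would suggest that the slowest requests got slower after the deploy, which is the kind of shift a histogram comparison is meant to make visible.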

Service Health in Real Life

Imagine this: It’s Friday afternoon and you are deploying a new version of the inventory service powering your ecommerce platform. There are some risky changes going out, so you want to make sure everything looks healthy before you leave for the weekend. And it’s your partner’s birthday dinner, so you don’t want to be late. 

You jump into Lightstep and see that the operation update-inventory in your inventory service is showing a latency increase. 

You click on the latency spike and start investigating. You select a baseline an hour before your 2 p.m. deployment in order to compare how performance changed before and after you rolled out. 

You start to compare tags to see if anything correlates strongly with this latency increase. You see that the large_batch:true tag is new and correlates with high latency.

Now you want to identify which operation is taking the most time in the service you just deployed. You use the Operation Diagram to find the critical path within the service. You see that the write-cache operation is contributing the most to latency.

You validate this by grouping traces on the large_batch tag to see if there is a marked difference. The trace analysis table confirms that your deploy to submit large batches is resulting in considerably more latency. 
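
If you wanted to reproduce that grouping by hand, it might look something like the sketch below. This is not Lightstep's trace analysis table, just a rough Python illustration of the idea. The span shape (a dict with `tags` and `duration_ms` keys) and the function name `latency_by_tag` are assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean


def latency_by_tag(spans, tag_key):
    """Group spans by the value of one tag and summarize latency per group.

    `spans` is a hypothetical list of dicts shaped like
    {"tags": {"large_batch": "true"}, "duration_ms": 950}.
    """
    groups = defaultdict(list)
    for span in spans:
        # Spans that do not carry the tag are collected under "<missing>".
        value = span["tags"].get(tag_key, "<missing>")
        groups[value].append(span["duration_ms"])

    return {
        value: {
            "count": len(durations),
            "mean_ms": mean(durations),
            "max_ms": max(durations),
        }
        for value, durations in groups.items()
    }
```

Calling `latency_by_tag(spans, "large_batch")` on a sample of spans from the regression window would give one summary row per tag value, so a marked gap between the `true` group and the rest is easy to spot.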

Thankfully, this is a quick fix and it’s not even 3 p.m. yet. 

Wait, there’s more: Instrumentation Quality Scores

In addition to providing insights into service health, Lightstep can also help ensure that your services are properly instrumented and therefore always ready to provide you the value you need during a fire. 

With Instrumentation Quality Scores, your team gets specific advice on how to make improvements to instrumentation, and an easy-to-understand overview of the quality of that instrumentation.

Our hope is to make developers' lives as easy as possible. We recognize the pain that comes with a multi-tab, manual investigation, and we’re changing that with a one-tab solution that automatically surfaces the regressions you should care about and why they happened.

Check out these features for yourself in our interactive sandbox.

Interested in joining our team? See our open positions here.

January 27, 2020

About the author

Talia Moyal
