Lightstep’s Change Intelligence combines metric and tracing telemetry so you get full observability into your system from one tool. Your metric dashboards and alerts don’t just show you when there’s a problem; they also become actionable tools that help you find its source. You don’t need to be a DevOps engineer or know the dependencies in your system. Lightstep works them out for you and can find issues deep in your stack.
When you notice a deviation on a metric chart, you can use Change Intelligence to correlate that deviation with traces from your services and find what in your system may have caused the change. Change Intelligence uses trace data from your service instrumentation to determine upstream and downstream dependencies, and then finds changes along that path that happened at the same time as the metric deviation.
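To make the dependency idea concrete, here is a minimal Python sketch of how upstream and downstream services can be inferred from trace data: every span whose parent belongs to a different service implies a caller-to-callee edge, and walking those edges in either direction yields a service’s upstream or downstream dependencies. This is not Lightstep’s implementation; the Span record, the service_edges and reachable helpers, and the inventory service in the example are hypothetical simplifications.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    # Hypothetical, heavily simplified span record (real spans carry timing,
    # operation names, attributes, and more).
    span_id: str
    parent_id: Optional[str]
    service: str


def service_edges(spans):
    """Infer (caller, callee) service edges from parent/child span links."""
    by_id = {s.span_id: s for s in spans}
    edges = set()
    for span in spans:
        parent = by_id.get(span.parent_id) if span.parent_id else None
        # Only a parent in a *different* service implies a cross-service call.
        if parent is not None and parent.service != span.service:
            edges.add((parent.service, span.service))
    return edges


def reachable(edges, start, downstream=True):
    """Breadth-first walk from `start`, following calls (downstream)
    or reversed calls (upstream). Returns every service reached."""
    graph = defaultdict(set)
    for caller, callee in edges:
        if downstream:
            graph[caller].add(callee)
        else:
            graph[callee].add(caller)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# One trace: iOS -> inventory (hypothetical middle service) -> warehouse.
spans = [
    Span("a", None, "iOS"),
    Span("b", "a", "inventory"),
    Span("c", "b", "warehouse"),
]
edges = service_edges(spans)
print("upstream of warehouse:", sorted(reachable(edges, "warehouse", downstream=False)))
print("downstream of iOS:", sorted(reachable(edges, "iOS")))
```

Aggregating edges across many traces, rather than one, is what lets a tool like this build a service diagram without you describing the dependencies by hand.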
When you find an issue in a dashboard or chart (even a chart in an alert), you can click into the deviation and choose "What caused this change?"
Change Intelligence begins by setting baseline and deviation time windows and then compares and analyzes the performance of the service that sent the metric data before and during the deviation.
When Change Intelligence finds performance changes on any of that service’s Key Operations, it determines the magnitude of each change and lists the operations in descending order (most changed first). For each operation, it displays sparkcharts for latency, operation rate, and error rate.
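The baseline-versus-deviation comparison can be sketched like this, assuming (for illustration only) that "magnitude" means the relative change in p99 latency for each Key Operation. Lightstep doesn’t spell out its exact scoring here, and the real feature also tracks operation rate and error rate; the p99 and rank_operations helpers and the get-inventory operation below are hypothetical.

```python
import math


def p99(latencies_ms):
    """Nearest-rank 99th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(99 * len(ordered) / 100)
    return ordered[rank - 1]


def rank_operations(baseline, deviation):
    """Rank operations by relative change in p99 latency, largest first.

    `baseline` and `deviation` map an operation name to the latencies (ms)
    observed for it in that time window. Operations missing from either
    window are skipped.
    """
    changes = []
    for op in baseline.keys() & deviation.keys():
        before = p99(baseline[op])
        after = p99(deviation[op])
        # Assumed magnitude: |after - before| / before (inf if baseline was 0).
        magnitude = abs(after - before) / before if before else math.inf
        changes.append((op, before, after, magnitude))
    return sorted(changes, key=lambda c: c[3], reverse=True)


# Illustrative latencies only; get-inventory is a hypothetical operation.
baseline = {
    "update-catalog": [20, 22, 25, 30],
    "get-inventory": [10, 11, 12, 13],
}
deviation = {
    "update-catalog": [90, 110, 120, 150],
    "get-inventory": [10, 12, 13, 14],
}
for op, before, after, magnitude in rank_operations(baseline, deviation):
    print(f"{op}: p99 {before} -> {after} ms ({magnitude:.0%} change)")
```

With these sample latencies, update-catalog ranks first because its p99 goes from 30 ms to 150 ms, while get-inventory barely moves. Using a relative change keeps fast and slow operations comparable on one list.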
In this example, Change Intelligence shows us that the update-catalog Key Operation on the warehouse service experienced the most change (in p99 latency) at the same time as the metric deviation.
Once it finds a Key Operation with meaningful performance changes, Change Intelligence looks for traces with that operation. It analyzes those traces, searching for attributes that appear frequently on spans from services up and down the stack during the performance regression. In other words, if an attribute appears on a number of traces with performance issues (and not on traces that are stable), it’s likely that something about that attribute is causing the issue. These attributes are displayed as likely causes of the change.
In this example, the attribute customer:ProWool was found on over 41% of traces during the deviation and on fewer than 7% during the baseline. Latency on traces with that attribute increased 5x, and the operation rate for those spans also increased.
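At its core, this kind of attribute analysis compares how often each attribute appears in traces from the two windows. The sketch below assumes a simple "lift" score (share of deviation traces minus share of baseline traces) with an arbitrary 0.2 threshold; these are illustrative choices, not Lightstep’s published method, and the sketch ignores the latency and rate changes on matching spans. The sample traces are synthetic, built only to mirror the 41% and under-7% figures above, and the region attribute is hypothetical.

```python
from collections import Counter


def attribute_prevalence(traces):
    """Fraction of traces carrying each attribute.

    Each trace is modeled as a set of (key, value) pairs gathered from
    all of its spans.
    """
    counts = Counter()
    for attrs in traces:
        counts.update(set(attrs))  # count each attribute once per trace
    total = len(traces)
    return {attr: n / total for attr, n in counts.items()}


def likely_causes(baseline_traces, deviation_traces, min_lift=0.2):
    """Attributes far more common during the deviation than the baseline.

    Assumed score ("lift"): share of deviation traces minus share of
    baseline traces. The 0.2 default threshold is arbitrary.
    """
    before = attribute_prevalence(baseline_traces)
    during = attribute_prevalence(deviation_traces)
    scored = []
    for attr, share in during.items():
        base_share = before.get(attr, 0.0)
        lift = share - base_share
        if lift >= min_lift:
            scored.append((attr, base_share, share, lift))
    return sorted(scored, key=lambda s: s[3], reverse=True)


# Synthetic traces mirroring the example: customer:ProWool on 41 of 100
# deviation traces and 6 of 100 baseline traces. region is hypothetical.
PROWOOL = ("customer", "ProWool")
REGION = ("region", "us-east")
deviation_traces = [{PROWOOL, REGION}] * 41 + [{REGION}] * 59
baseline_traces = [{PROWOOL, REGION}] * 6 + [{REGION}] * 94

for attr, before, during, lift in likely_causes(baseline_traces, deviation_traces):
    key, value = attr
    print(f"{key}:{value} on {before:.0%} of baseline traces, {during:.0%} during the deviation")
```

Here customer:ProWool is the only attribute flagged: region:us-east appears on every trace in both windows, so it carries no signal about the regression.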
Looking at the service diagram, you can see that attribute is being sent on traces coming from the /api/get-catalog operation on the iOS service. The diagram also shows that there's one service in between the iOS service and the warehouse service.
Change Intelligence collects exemplar traces that include the correlated attribute and show the performance issue. Clicking View sample traces lets you choose one and open it in the Trace view.
In the trace, it looks like the customer ProWool sent 1,000 requests and the database write is overwhelmed. That's likely why the CPU metric spiked.
Change Intelligence was able to pinpoint the part of the system that is likely causing the change in the metric performance. By combining metrics with tracing, instead of just knowing that a change occurred, you can find the source without leaving Lightstep.