Distributed tracing generates a stream of rich, contextual data. But as systems grow in scale and complexity, it can be challenging to navigate through thousands of traces to quickly find answers to performance questions.
When things are on fire, how do you know if your hypothesis is a good one?
To help answer that question, we built Snapshot Analyzer. It’s a simple way to investigate cross-service performance at scale.
With Snapshot Analyzer, you can filter comprehensive views of complete system behavior (Snapshots) across any dimension in your system, and group cross-service traces by any attribute.
Think of it as being able to perform SQL-like operations on large amounts of trace data.
Narrow the Scope of Your Investigation to What Really Matters
When you’re investigating an issue in a complex or deep system, it can be difficult to narrow the scope to the most likely culprit.
Tracing provides context for end-to-end requests, which often include multiple services. With Snapshot Analyzer, you can filter these traces by one or more services, operations, or tags and focus your analysis on only the traces relevant to your investigation.
In the gif above, we start with a Snapshot of the most recent traces from our system. From here, we can filter by the tags error:true and canary:true to find only the traces that are returning errors after a canary release.
Digging into a single trace, we can quickly glance at the right logs and find the issue! Additionally, the filter suggestions themselves are scoped to only those that match the filters already applied. This allows you to perform flexible, exploratory analysis on complex tracing data.
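To make the filtering step concrete, here is a minimal sketch in Python. It is not Lightstep's API or data model: the Trace class, the filter_traces helper, and the sample values are all hypothetical, and the tags are treated as plain string key/value pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical, simplified trace record (not Lightstep's data model)."""
    trace_id: str
    service: str
    latency_ms: float
    tags: dict = field(default_factory=dict)


def filter_traces(traces, **required_tags):
    """Keep only the traces whose tags match every required key/value pair."""
    return [
        t for t in traces
        if all(t.tags.get(key) == value for key, value in required_tags.items())
    ]


# Made-up sample data standing in for a Snapshot of recent traces.
snapshot = [
    Trace("t1", "checkout", 120.0, {"error": "true", "canary": "true"}),
    Trace("t2", "checkout", 80.0, {"error": "false", "canary": "true"}),
    Trace("t3", "inventory", 300.0, {"error": "true", "canary": "false"}),
]

# Equivalent to filtering by error:true and canary:true.
failing_canary = filter_traces(snapshot, error="true", canary="true")
print([t.trace_id for t in failing_canary])
```

Each extra keyword argument narrows the result further, which mirrors adding one more filter to the analysis.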
What If You Have No Idea Where to Start Your Investigation?
In certain (scary) situations, you may be notified of an issue by your customer, and you don’t know where to begin investigating.
Fear not! Backed by Correlations, Snapshot Analyzer can help you reduce distractions from red herrings and speed up root cause identification.
Correlations automatically surfaces attributes that are associated with latency. With Snapshot Analyzer, you can dive deeper into the insights provided by Correlations, add additional filters, and focus your investigation on rapid hypothesis generation and validation.
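As a rough intuition for what "associated with latency" can mean, the sketch below compares the average latency of traces that carry a given tag value against the traces that do not. This is only an illustrative assumption, not how Correlations is actually implemented; the function name, the trace layout, and the sample values are hypothetical.

```python
from statistics import mean


def latency_gap_by_tag(traces):
    """Map each (tag, value) pair to: mean latency with it minus mean latency without it."""
    pairs = {(key, value) for t in traces for key, value in t["tags"].items()}
    gaps = {}
    for key, value in pairs:
        with_tag = [t["latency_ms"] for t in traces if t["tags"].get(key) == value]
        without = [t["latency_ms"] for t in traces if t["tags"].get(key) != value]
        if with_tag and without:  # skip pairs that cannot be compared
            gaps[(key, value)] = mean(with_tag) - mean(without)
    return gaps


# Made-up traces: latency differs by region but not by canary status.
traces = [
    {"latency_ms": 900.0, "tags": {"region": "us-west-1", "canary": "true"}},
    {"latency_ms": 800.0, "tags": {"region": "us-west-1", "canary": "false"}},
    {"latency_ms": 100.0, "tags": {"region": "region-b", "canary": "true"}},
    {"latency_ms": 200.0, "tags": {"region": "region-b", "canary": "false"}},
]

gaps = latency_gap_by_tag(traces)
# Largest absolute gap first: these are the attributes worth filtering on next.
ranked = sorted(gaps.items(), key=lambda item: abs(item[1]), reverse=True)
for (key, value), gap in ranked:
    print(f"{key}={value}: {gap:+.1f} ms")
```

In this toy data the region tag separates slow traces from fast ones while the canary tag does not, so region is the attribute to investigate first.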
Group Traces By Any Attribute
Snapshot Analyzer also allows you to group traces that share a certain attribute and compare performance characteristics across groups.
So, how does this work?
In the example below, we’re investigating what we initially guess is a database issue. We start by filtering the set of traces down to only those that have a db.type=cassandra tag. We then group these traces by region to see aggregate statistics for each region, including us-west-1. The differences in error percentage and average latency tell us that the issue is actually region-specific. We can dig into a trace in the affected region to get the context we need to mitigate the issue. This ability to group traces by a tag of any cardinality is invaluable for quickly corroborating or eliminating hypotheses.
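The filter-then-group step can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Lightstep's implementation: the group_stats helper, the row layout, the region name region-b, and all the numbers are hypothetical.

```python
from collections import defaultdict

# Made-up trace rows. "region-b" is a placeholder name, not a real region.
traces = [
    {"db.type": "cassandra", "region": "us-west-1", "latency_ms": 900.0, "error": True},
    {"db.type": "cassandra", "region": "us-west-1", "latency_ms": 700.0, "error": False},
    {"db.type": "cassandra", "region": "region-b", "latency_ms": 100.0, "error": False},
    {"db.type": "cassandra", "region": "region-b", "latency_ms": 140.0, "error": False},
    {"db.type": "mysql", "region": "us-west-1", "latency_ms": 50.0, "error": False},
]


def group_stats(rows, filter_tag, filter_value, group_by):
    """Filter rows on one tag, then aggregate latency and errors per group."""
    groups = defaultdict(list)
    for row in rows:
        if row.get(filter_tag) == filter_value:
            groups[row.get(group_by)].append(row)

    stats = {}
    for key, members in groups.items():
        count = len(members)
        stats[key] = {
            "count": count,
            "avg_latency_ms": sum(m["latency_ms"] for m in members) / count,
            "error_pct": 100.0 * sum(1 for m in members if m["error"]) / count,
        }
    return stats


# Roughly: SELECT region, COUNT(*), AVG(latency), error % WHERE db.type = 'cassandra' GROUP BY region
stats = group_stats(traces, "db.type", "cassandra", "region")
for region, values in stats.items():
    print(region, values)
```

Comparing the per-group averages and error percentages side by side is what lets you tell a region-specific problem apart from a database-wide one.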
Select Attributes Across Traces
Snapshot Analyzer provides the ability to view additional contextual information across traces. The Add Column feature allows the user to view the value for any tag in the Trace Analysis table. Having this information next to the individual spans helps identify patterns and anomalies in your system. It can expedite hypothesis generation when performing root cause analysis by helping you narrow down the trace search space.
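Conceptually, adding a column projects one tag's value onto every row of the table. The sketch below shows that idea in Python; the build_table helper, the span layout, and the customer tag are hypothetical and are not taken from the product.

```python
# Made-up spans; the second one deliberately lacks a "region" tag.
spans = [
    {
        "operation": "GET /cart",
        "latency_ms": 42.0,
        "tags": {"customer": "acme", "region": "us-west-1"},
    },
    {
        "operation": "GET /cart",
        "latency_ms": 870.0,
        "tags": {"customer": "globex"},
    },
]


def build_table(spans, extra_columns=()):
    """Return one row per span, with one extra column per requested tag."""
    table = []
    for span in spans:
        row = {"operation": span["operation"], "latency_ms": span["latency_ms"]}
        for tag in extra_columns:
            # None marks spans that do not carry the tag at all.
            row[tag] = span["tags"].get(tag)
        table.append(row)
    return table


table = build_table(spans, extra_columns=["customer", "region"])
for row in table:
    print(row)
```

Seeing the tag value next to each span's latency makes it easier to spot, for example, that the slow spans all belong to one customer.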
Want to Give Snapshot Analyzer a Try? Check Out Our Free Sandbox
Go to lightstep.com/play and use Snapshot Analyzer to resolve a performance regression in under 10 minutes.
Interested in joining our team? See our open positions here.
November 4, 2019
About the author
Karthik Kumar