Root Cause Analysis in Three Clicks: Announcing Major Updates to Lightstep’s Distributed Tracing
by Talia Moyal
Since the early days of distributed tracing at Google, we’ve been working to make complex systems easier to understand. Today, we’re excited to announce updates — including the ability to search logs on traces — that enable developers to identify the root cause of virtually any regression in three simple clicks.
Root Cause Analysis in Three Clicks
Click 1: See what’s changed
Lightstep now highlights where latency, error rates, or other service level indicators (SLIs) have experienced the greatest change. This enables anyone on the team to understand system-wide service health in seconds.
Click 2: Compare before and after
Once you’ve identified a regression, you can quickly compare it to baseline performance with a simple before-and-after view.
Added bonus: this comparison is also available between deployment versions (both real-time and historical). This means you can quickly see which canary version performs best over any period of time.
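To make the before-and-after idea concrete, here is a minimal sketch of that kind of comparison. It is an illustration only, not Lightstep's implementation: the function names (`p95`, `compare_windows`), the choice of 95th-percentile latency as the SLI, and the sample data are all assumptions made for the example.

```python
from statistics import quantiles


def p95(samples):
    """95th percentile of a list of latency samples (milliseconds)."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def compare_windows(baseline, regression):
    """Compare p95 latency per operation between two time windows.

    Returns (operation, baseline_p95, regression_p95, relative_change) rows,
    sorted so the operation with the largest increase comes first.
    """
    rows = []
    for op in baseline.keys() & regression.keys():
        before = p95(baseline[op])
        after = p95(regression[op])
        change = (after - before) / before
        rows.append((op, before, after, change))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Hypothetical latency samples in milliseconds, keyed by operation name.
baseline = {
    "checkout": [100, 110, 105, 120, 115, 108, 112, 118, 103, 109],
    "search": [40, 42, 41, 45, 43, 44, 39, 40, 42, 41],
}
regression = {
    "checkout": [100, 310, 305, 120, 315, 108, 312, 318, 103, 309],
    "search": [40, 43, 41, 45, 44, 44, 39, 41, 42, 41],
}

report = compare_windows(baseline, regression)
for op, before, after, change in report:
    print(f"{op}: p95 {before:.1f} ms -> {after:.1f} ms ({change:+.0%})")
```

The same function works for a version-to-version comparison: pass samples from one deployment version as `baseline` and samples from the canary as `regression`.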
Click 3: Pinpoint the exact logs and traces needed to resolve an issue
The before-and-after view takes you straight to our RCA (root cause analysis) page, where we automatically surface the traces, metrics, logs, tags, and operations associated with increased latency or errors.
With logs added to traces, you can now view and search log messages in full system context, identify where a regression occurred down to a single line of code, and understand its impact, all in one tab.
Click 4: There is no click 4! That’s it.
Rather than switching back and forth between traces and logs (and spending hours grepping through log files), you can now view log messages in the context of the problem you're trying to solve, even if you have little to no knowledge of the application or system you're investigating.
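As a rough picture of what "logs in the context of a trace" means, the sketch below attaches log events to the spans that emitted them and searches across the whole trace. The data model (`Span`, `LogEvent`), the `search_logs` helper, and the sample trace are hypothetical and simplified for illustration; they are not Lightstep's data format or API.

```python
from dataclasses import dataclass, field


@dataclass
class LogEvent:
    timestamp_ms: int  # offset from the start of the span
    message: str


@dataclass
class Span:
    service: str
    operation: str
    duration_ms: int
    logs: list = field(default_factory=list)


def search_logs(trace, query):
    """Return (span, log) pairs whose message contains the query (case-insensitive)."""
    needle = query.lower()
    return [
        (span, log)
        for span in trace
        for log in span.logs
        if needle in log.message.lower()
    ]


# A hypothetical trace: three spans from one slow request.
trace = [
    Span("frontend", "GET /cart", 950),
    Span("cart", "load_items", 900, [LogEvent(12, "cache miss for user cart")]),
    Span(
        "inventory-db",
        "SELECT items",
        870,
        [
            LogEvent(15, "query started"),
            LogEvent(880, "Timeout waiting for connection pool"),
        ],
    ),
]

hits = search_logs(trace, "timeout")
for span, log in hits:
    print(f"{span.service} / {span.operation} (+{log.timestamp_ms} ms): {log.message}")
```

Because each log message stays attached to its span, a match already carries the service, operation, and timing it belongs to, which is the context a plain grep over log files does not give you.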
To recap, as part of this update, you can now:
- Automatically see which service level indicators (SLIs) have meaningfully changed
- Instantly identify which logs you need to resolve an incident or investigate a regression
- View side-by-side version comparisons for canary deployments, and immediately know which version is performing best