Root cause analysis in three clicks: Announcing major updates to Lightstep’s distributed tracing

Talia Moyal

Since the early days of distributed tracing at Google, we’ve been working to make complex systems easier to understand. Today, we’re excited to announce updates — including the ability to search logs on traces — that enable developers to identify the root cause of virtually any regression in three simple clicks.

Root Cause Analysis in Three Clicks

Click 1: See what’s changed

Lightstep now highlights where latency, error rates, or other service level indicators (SLIs) have experienced the greatest change. This enables anyone on the team to understand system-wide service health in seconds.
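To make the idea concrete, here is a minimal sketch of ranking SLIs by how much they moved between a baseline window and the current window. It is not Lightstep's actual algorithm: the `SliSample` type, the `relative_change` and `top_changes` helpers, and the sample values are all hypothetical and exist only to illustrate the concept.

```python
from dataclasses import dataclass


@dataclass
class SliSample:
    # Hypothetical record of one SLI measured in two time windows.
    service: str
    sli: str          # e.g. "p99_latency_ms" or "error_rate"
    baseline: float   # value in the "before" window
    current: float    # value in the "after" window


def relative_change(sample: SliSample) -> float:
    """Fractional change from baseline; infinite if the baseline is zero and the value rose."""
    if sample.baseline == 0:
        return float("inf") if sample.current > 0 else 0.0
    return (sample.current - sample.baseline) / sample.baseline


def top_changes(samples, limit=3):
    """Sort SLIs by the magnitude of their relative change, largest first."""
    return sorted(samples, key=lambda s: abs(relative_change(s)), reverse=True)[:limit]


# Made-up sample data for illustration only.
samples = [
    SliSample("checkout", "p99_latency_ms", 200.0, 500.0),
    SliSample("search", "error_rate", 0.010, 0.011),
    SliSample("auth", "p99_latency_ms", 80.0, 60.0),
]

for s in top_changes(samples):
    print(f"{s.service} {s.sli}: {relative_change(s):+.0%}")
```

A production system would likely also weigh statistical significance and traffic volume, so that a small but noisy SLI does not crowd out a meaningful regression.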

Lightstep Top Changes

Click 2: Compare before and after

Once you’ve identified a regression, you can quickly compare it to baseline performance with a simple before-and-after view.

Lightstep version comparisons

Added bonus: this comparison is also available between deployment versions (both real-time and historical). This means you can quickly see which version of your canaries performs best over any period of time.

Click 3: Pinpoint the exact logs and traces needed to resolve an issue

The before-and-after view takes you straight to our RCA (root cause analysis) page, where we automatically surface the traces, metrics, logs, tags, and operations associated with increased latency or errors.

By adding logs to tracing, you can now view and search log messages in full system context, identify where a regression occurred down to a single line of code, and understand its impact, all in one tab.
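As a rough illustration of what searching logs in trace context means, the sketch below models spans that carry their own log messages and filters them by text. This is an assumption-laden toy, not Lightstep's data model or API: the `Span` type, the `search_logs` helper, and the sample spans are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    # Hypothetical, simplified span: real tracing systems carry far more metadata.
    trace_id: str
    service: str
    operation: str
    duration_ms: float
    error: bool = False
    logs: list = field(default_factory=list)  # log messages recorded during this span


def search_logs(spans, query, errors_only=False):
    """Return (span, message) pairs whose log message contains the query text."""
    needle = query.lower()
    hits = []
    for span in spans:
        if errors_only and not span.error:
            continue
        for message in span.logs:
            if needle in message.lower():
                hits.append((span, message))
    return hits


# Made-up sample data for illustration only.
spans = [
    Span("t1", "checkout", "charge_card", 480.0, error=True,
         logs=["retrying payment gateway", "Timeout calling payment gateway"]),
    Span("t1", "cart", "load_cart", 12.0, logs=["cache hit"]),
    Span("t2", "checkout", "charge_card", 35.0, logs=["payment gateway ok"]),
]

for span, message in search_logs(spans, "timeout", errors_only=True):
    print(f"{span.trace_id} {span.service}/{span.operation} ({span.duration_ms} ms): {message}")
```

Because each hit comes back with its span, the matching log line arrives together with the service, operation, and latency it belongs to, which is the context a plain grep over log files does not provide.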

Lightstep RCA errors

Click 4: There is no click 4! That’s it.

Rather than switching back and forth between traces and logs (and spending hours grepping through log files), you can now view log messages in the context of the problem you're trying to solve, even if you have little to no knowledge of the application or system you're investigating.

To recap, as part of this update, you can now:

  • Automatically see which service level indicators (SLIs) have meaningfully changed
  • Instantly identify which logs you need to resolve an incident or investigate a regression
  • View side-by-side version comparisons for canary deployments, and immediately know which version is performing best

How can I try out these new features?
