Resolve errors faster with Lightstep and Rollbar
by Fran Thorpe
Every day developers wade through a growing number of error alerts emerging from changes and issues across their interdependent systems. But identifying the critical errors and mapping them to the root cause and resolution involves too much unproductive, time-consuming guesswork.
At Lightstep we’re always on the lookout for ways to make observability best practices easier for any developer to adopt. Recently we’ve combined efforts with Rollbar, a platform for error monitoring and analysis. By connecting our collective expertise, we’ve created workflows that remove the guesswork and improve error resolution in pre-production and production software.
Rollbar is an early warning system for errors. Using call stack fingerprints, they’ve automated the process of surfacing and triaging new and critical errors. Once the error is identified, Rollbar provides actionable insights as next steps for engineering teams. The latest example is the integration with Lightstep that enables a developer to move seamlessly from a critical error alert in Rollbar to rapid, deep-dive root cause analysis.
Customer-visible reliability issues are happening more frequently as high-velocity engineering teams push multiple deployments and config changes per day. Rollbar analysis cuts through the overwhelming error volume to surface newly introduced errors, which allows developers to jump straight to addressing any new, unwanted changes in their system behavior.
Rollbar uses the Lightstep Snapshots API to capture a full system trace across all services and dependencies at the moment of the error. This means no more wasted time manually digging through logs and system metrics trying to reconstruct what happened at the moment things went wrong! Instead, the Rollbar error information is connected directly to the system context at the time of the error in the form of permanently persisted Lightstep distributed traces. Developers get far more information to understand the issue and can get straight to resolving the error.
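To picture how an error and its trace can be tied together, here is a minimal Python sketch. It assumes each span carries the Rollbar error UUID as a tag. The `Span` shape, the `rollbar.error_uuid` tag key, and both helper functions are hypothetical illustrations, not part of the Rollbar or Lightstep APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical tag key; the real integration may use a different name.
ERROR_UUID_TAG = "rollbar.error_uuid"


@dataclass
class Span:
    """One operation in a distributed trace (simplified, assumed shape)."""
    trace_id: str
    service: str
    operation: str
    tags: Dict[str, str] = field(default_factory=dict)


def index_spans_by_error(spans: List[Span]) -> Dict[str, List[Span]]:
    """Group spans by the error UUID embedded in their tags."""
    index: Dict[str, List[Span]] = {}
    for span in spans:
        error_uuid = span.tags.get(ERROR_UUID_TAG)
        if error_uuid is not None:
            index.setdefault(error_uuid, []).append(span)
    return index


def trace_for_error(index: Dict[str, List[Span]], error_uuid: str) -> Optional[str]:
    """Return the trace ID of the first span tagged with the error UUID, if any."""
    spans = index.get(error_uuid)
    return spans[0].trace_id if spans else None
```

With an index like this, a link from an error report to its persisted trace is a single dictionary lookup, and the same tag lets a trace view point back to the error.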
Rollbar and Lightstep now have shared access to detailed call stack information and full persisted traces of every newly detected error in the production system. With a single button in Rollbar, developers can jump directly to the full system context available in Lightstep.
In Lightstep, developers can further explore the error to understand which services and operations are causing, contributing to, correlated with, or otherwise affected by the error. In this example, Lightstep Trace Analysis identifies the operation
A double click on the span reveals the full details of the trace associated with the error with the Rollbar error UUID embedded in the trace:
The investigation can easily jump back to the Rollbar view of the associated call stacks and source code:
Leveraging each other’s strengths, Lightstep and Rollbar are removing guesswork from the error resolution path, giving developers time back to focus on shipping quality code.
Recently Lightstep published a GitHub Action that enables developers to understand system health inside a pull request, before they merge their code. The Lightstep Pre-Deploy Check GitHub Action leverages publicly available APIs from Lightstep and Lightstep Partners, including Rollbar, to provide a summary of deployment risk ahead of a code change going to a production environment.
When the Rollbar Versions feature is enabled, errors are collected for a service instrumented in Lightstep. The Lightstep action checks for new errors since the last deploy and links to the Rollbar UI from the GitHub PR in the pre-deploy check details.
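The "new errors since the last deploy" check can be sketched in a few lines of Python. This is only an illustration of the idea under stated assumptions: the `ErrorItem` record, its field names, and the summary wording are hypothetical, and the real action may compute its result differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class ErrorItem:
    """Hypothetical error record; field names are illustrative only."""
    title: str
    first_seen: datetime


def new_errors_since(items: List[ErrorItem], last_deploy: datetime) -> List[ErrorItem]:
    """Return the errors whose first occurrence is after the last deploy."""
    return [item for item in items if item.first_seen > last_deploy]


def summarize(items: List[ErrorItem], last_deploy: datetime) -> str:
    """Build a one-line summary suitable for a pull request check."""
    fresh = new_errors_since(items, last_deploy)
    if not fresh:
        return "No new errors since last deploy"
    titles = ", ".join(item.title for item in fresh)
    return f"{len(fresh)} new error(s) since last deploy: {titles}"
```

Comparing on first-seen time rather than last-seen time is a deliberate choice in this sketch: it keeps long-running, already-known errors out of the summary so the check highlights only what the most recent change may have introduced.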