Lightstep adds complete system context to PagerDuty alerts


by Fran Thorpe


10-23-2020

“Now developers automatically have PagerDuty on-call details inside of a pull request, alongside system health details, at their fingertips in one screen.” Steve Gross, Sr. Director, Strategic Ecosystem Development at PagerDuty

There is a lot of noise surrounding the term “Observability”. While vendors and pundits debate three pillars, Lightstep has partnered with PagerDuty to ensure software teams can move from the context of an incident to quickly understanding and determining its root cause. Together we’re augmenting incident response solutions to cover pre-production scenarios.

Today, when a developer gets an early-morning notification and it’s unfortunately a major incident, they immediately want to know the context surrounding that incident. Lightstep adds extensive insights and correlation detail for the production system to PagerDuty's incident response workflow. Given the rich Lightstep and PagerDuty datasets and the context they bring, we saw an opportunity to help developers understand an incident even before opening the runbook.

The moment before you hit Merge

Alert and observability context can also provide relevant insights just before developers make an important code change to a production system. For example, when service owners working in GitHub are about to merge a pull request that has passed code review, they are likely missing important information unless they switch between different tools. They don’t have access to product health context, and they don't necessarily know who is on-call and responsible for the service in production. For Lightstep and PagerDuty, this provides an opportunity to ask and answer “Is the code ready to deploy?”

Recently Lightstep published the Lightstep Pre-Deploy Check GitHub Action, which provides an opinionated view of the health of the whole system, inside a pull request, before developers merge their service’s code. Automatically surfacing complementary data from Lightstep and PagerDuty just before a merge is initiated helps teams ship quickly and reliably. Developers gain additional context: who owns which service, information about the on-call team, and even an immediate view of system health and performance via a Lightstep Snapshot.

If issues are surfaced by the Action, the developer has what’s needed to investigate before clicking the merge button. This is very different from a production issue or ongoing incident. The Action gives the developer visibility into the grey area where latency might be slightly higher but the customer experience is not yet adversely impacted. The developer now has all the context needed, including the name of the person on-call for the service, before they decide the system is all clear to deploy new code.

Adding more context with a PagerDuty Change Event

Lightstep brings context to services in PagerDuty using the new Change Events API. When the Action detects issues with the production system, it generates a Change Event. In addition to a customized message (e.g., “Lightstep Pre-Deploy Check failed”), the Action attaches metadata: the pull request and a Lightstep Snapshot.
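To make the shape of such a Change Event more concrete, here is a minimal Python sketch of building and sending one. It is not the Action's actual implementation. The endpoint and field names (`routing_key`, `payload.summary`, `payload.source`, `payload.timestamp`, `links`) follow our reading of PagerDuty's public Change Events documentation and should be verified there. The function names, the `source` value, and all URLs and keys are hypothetical placeholders.

```python
import json
from datetime import datetime, timezone
from urllib import request

# Endpoint as we understand it from PagerDuty's Change Events docs; verify before use.
CHANGE_EVENTS_URL = "https://events.pagerduty.com/v2/change/enqueue"


def build_change_event(routing_key, summary, pull_request_url, snapshot_url,
                       source="lightstep-predeploy-check"):
    """Build a Change Event body that links a pull request and a Lightstep Snapshot.

    The `source` default is a hypothetical label, not a value used by the real Action.
    """
    return {
        "routing_key": routing_key,  # integration key of the PagerDuty service
        "payload": {
            "summary": summary,
            "source": source,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
        # Links appear alongside the change event in PagerDuty.
        "links": [
            {"href": pull_request_url, "text": "Pull request"},
            {"href": snapshot_url, "text": "Lightstep Snapshot"},
        ],
    }


def send_change_event(event):
    """POST the event as JSON and return the HTTP status code."""
    req = request.Request(
        CHANGE_EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status
```

A caller would build the event with the message from the failed check, for example `build_change_event(key, "Lightstep Pre-Deploy Check failed", pr_url, snapshot_url)`, and pass the result to `send_change_event`. Keeping payload construction separate from the network call makes the payload easy to inspect or test without contacting PagerDuty.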

The Incident Response team now has real-time access to all the telemetry for a production system at the time the code merged: all the traces, metrics, and correlations, presented in an easy-to-consume UI that includes a service diagram. With Lightstep Pre-Deploy Check and the PagerDuty Change Event, developers and Incident Response teams have more control, and a simple and clear way to see all the interactions between what they are developing, deploying, and then investigating when something inevitably goes wrong.

How can I try out these new features?
