Managing SLOs and SLIs in Lightstep


by Ashley Rahimi Syed

03-18-2020

Why are service levels important?

Because ultimately, service level objectives, indicators, and agreements (SLOs, SLIs, and SLAs) reflect customer expectations. They help developers manage the kinds of failures that stand in the way of your customers’ success.

Lightstep can help you monitor and meet your Service-Level Objectives (SLOs) and resolve incidents quickly. It’s easy to set custom alerts and notify your team as soon as SLI behavior trends towards a regression. Here’s how it works.

How to track key SLOs with custom alerting 

Let’s start with a visual representation of the performance history of a specific service, operation, or query. In Lightstep, we refer to this as a stream.

Here we have a stream for the KrakenD API Gateway service in this system. I’ll click the “Create Condition” button in the upper-right corner to open the dialog seen below. I can choose exactly which signal I’d like to monitor – latency, error percentage, or operation rate – along with the threshold and evaluation window. In this instance, I’ve indicated that I’d like to be alerted if the error percentage for this stream surpasses 10% in a ten-minute period.

PagerDuty Lightstep gif
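To make the logic of that condition concrete, here is a minimal Python sketch of an error-percentage check over a ten-minute window. It is not Lightstep's implementation or API: the `Span` record, `error_percentage`, and `condition_breached` are hypothetical names I've made up for illustration, and it assumes each finished request is recorded with a timestamp and an error flag.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Span:
    """Hypothetical record of one finished request on the stream."""
    finished_at: datetime
    is_error: bool


def error_percentage(spans, now, window):
    """Percentage of spans in [now - window, now] that ended in an error."""
    recent = [s for s in spans if now - window <= s.finished_at <= now]
    if not recent:
        return 0.0  # no traffic in the window, so nothing to alert on
    errors = sum(1 for s in recent if s.is_error)
    return 100.0 * errors / len(recent)


def condition_breached(spans, now, threshold_pct=10.0,
                       window=timedelta(minutes=10)):
    """True when the error percentage strictly surpasses the threshold."""
    return error_percentage(spans, now, window) > threshold_pct
```

With 20 requests in the window, 3 errors (15%) would breach the condition, while 2 errors (exactly 10%) would not, since the condition fires only when the threshold is surpassed.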

Now that I’ve set my conditions, I have to add an alerting rule. From here, I’ll select the PagerDuty integration from the list of available options, along with the destination and update interval.

Lightstep view of PagerDuty integration
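For a sense of what ends up on the PagerDuty side, here is a sketch of a trigger event in the shape PagerDuty's Events API v2 accepts. How Lightstep builds its own notifications is not covered in this post, so treat this as an illustration only: `build_trigger_event`, the routing key, the stream name, and the `custom_details` fields are placeholders I've assumed.

```python
def build_trigger_event(routing_key, stream_name, error_pct,
                        threshold_pct, window_minutes):
    """Build a PagerDuty Events API v2 'trigger' event as a plain dict.

    The dict would be sent as JSON in a POST to
    https://events.pagerduty.com/v2/enqueue. Nothing is sent here.
    """
    return {
        "routing_key": routing_key,      # integration key of the destination
        "event_action": "trigger",
        # Same key for repeat notifications, so PagerDuty groups them
        # into one incident instead of opening a new one each time.
        "dedup_key": f"{stream_name}-error-percentage",
        "payload": {
            "summary": (
                f"Error percentage {error_pct:.1f}% surpassed "
                f"{threshold_pct:.1f}% over the last {window_minutes} minutes"
            ),
            "source": stream_name,
            "severity": "critical",
            # Hypothetical extra context for whoever gets paged.
            "custom_details": {
                "error_percentage": error_pct,
                "threshold_percentage": threshold_pct,
                "window_minutes": window_minutes,
            },
        },
    }


event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",  # placeholder, not a real key
    stream_name="krakend-api-gateway",   # hypothetical stream name
    error_pct=15.0,
    threshold_pct=10.0,
    window_minutes=10,
)
```

Reusing one `dedup_key` per condition is a design choice that fits the update interval described above: repeated updates about the same breach land on the same incident.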

Lightstep will automatically review the percentage of errors affecting the service’s performance health over the last ten minutes, and report any findings in the sidebar. It is clear in the image below that the condition I set was breached sometime during the last ten minutes.

PagerDuty Alert in Lightstep

As a result, PagerDuty triggered a page as soon as the breach was detected.

Lightstep showing PagerDuty Alert

Now I can immediately investigate the error using Lightstep and, hopefully, identify and implement a solution more quickly.

What’s Different About Investigations with Lightstep

When trying to rapidly restore service, it can be difficult to separate good hypotheses from bad ones. But Lightstep can help you avoid the guesswork entirely: with unlimited cardinality and a high-fidelity dataset uncompromised by sampling, Lightstep reveals issues that conventional monitoring solutions can’t surface. It instantly analyzes thousands of traces from your system to produce root-cause insights for performance regressions, so your team can resolve issues quickly and meet SLOs.

Want to see it for yourself? Check out our free interactive sandbox, where you can debug an iOS error or resolve a performance regression using our suite of observability tools.
