Enhancements in LightStep Monitoring
by Joe Blubaugh
Our customers often tell us they’re able to diagnose their systems’ problems much faster with LightStep than with other tools. One key reason for this is LightStep’s ability to quickly and accurately monitor latency statistics and error rates. However, many customers also ran into limits with our SLAs: each saved search could have only one latency SLA and one error rate SLA.
We’re happy to announce that we’re removing those limits. SLAs are now called Conditions, and they have powerful new capabilities.
Monitor any number of conditions for each saved search
You can now have any number of conditions in LightStep – create three latency conditions for different latency percentiles, different error conditions to notify different groups, or send notifications to different places depending on the severity of the condition. You can write monitoring conditions that are flexible and fit in with your team’s workflow.
New signals to monitor
Monitor latency, operation rate, or error rate
In addition to monitoring the latency and error rate in your saved searches, you can now monitor the operations-per-second rate for a saved search. This is useful if you need to monitor whether a service is online and the load is in line with your expectations.
You can also monitor whether any value is above or below a threshold. This is great for setting bounds on operation rates, so you get notified if traffic is significantly above or below your expectations. Operation rates, error rates, and latency percentiles can all have “greater than” and “less than” conditions.
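To make the “greater than” and “less than” idea concrete, here is a minimal sketch of how such a condition could be modeled. It is an illustration only, not LightStep’s API or data model: the names `Signal`, `Condition`, `is_violated`, and the threshold values are all hypothetical.

```python
import operator
from dataclasses import dataclass
from enum import Enum


class Signal(Enum):
    """Hypothetical names for the three signal types described above."""
    LATENCY_P99_MS = "latency_p99_ms"
    ERROR_RATE = "error_rate"
    OPS_PER_SECOND = "ops_per_second"


# Map the two comparison kinds onto plain Python operators.
COMPARISONS = {"gt": operator.gt, "lt": operator.lt}


@dataclass(frozen=True)
class Condition:
    """Illustrative model: one signal, one comparison, one threshold."""
    signal: Signal
    comparison: str  # "gt" (greater than) or "lt" (less than)
    threshold: float

    def is_violated(self, value: float) -> bool:
        # Violated when the observed value crosses the threshold
        # in the direction the comparison names.
        return COMPARISONS[self.comparison](value, self.threshold)


# Bounding an operation rate on both sides takes two conditions
# on the same saved search (threshold values are made up).
too_quiet = Condition(Signal.OPS_PER_SECOND, "lt", 50.0)
too_busy = Condition(Signal.OPS_PER_SECOND, "gt", 5000.0)


def violated(conditions, value):
    """Return the conditions that the observed value violates."""
    return [c for c in conditions if c.is_violated(value)]
```

Calling `violated([too_quiet, too_busy], 10.0)` would return only `too_quiet`, which is the “traffic is far below expectations” case described above.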
Monitor over any time range
LightStep monitoring has been tuned for quick response to signals, and by default, it monitors the last two minutes of data from your system. We’ve learned from you that sometimes you’d prefer to monitor the last hour, day, or even last week of performance data for some systems. LightStep monitoring now lets you control the evaluation window of your conditions, giving you flexibility between extremely sensitive monitoring and the ability to detect more sustained trends.
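The trade-off between a short and a long evaluation window can be sketched in a few lines. This is a simplified illustration under our own assumptions, not how LightStep computes its statistics: the function `windowed_error_rate` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def windowed_error_rate(samples, now, window):
    """Fraction of errors among samples that fall inside the window.

    samples: list of (timestamp, is_error) pairs.
    Returns None when the window contains no samples.
    """
    recent = [is_error for ts, is_error in samples if now - window <= ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)


now = datetime(2020, 1, 1, 12, 0, 0)

# Made-up data: one error 30 seconds ago, plus nine successful
# operations spread over the previous hour.
samples = [(now - timedelta(seconds=30), True)]
samples += [(now - timedelta(minutes=5 * i + 10), False) for i in range(9)]

short = windowed_error_rate(samples, now, timedelta(minutes=2))
long = windowed_error_rate(samples, now, timedelta(hours=1))
```

With this data the two-minute window sees only the recent error, while the one-hour window averages it against an hour of healthy traffic. A short window reacts quickly to a spike; a long window is better suited to spotting sustained trends.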
When a condition is violated, you can now send notifications to as many Slack, PagerDuty, or Webhook destinations as you like, rather than one to each.
Send notifications to PagerDuty, Slack, or Webhooks
You can temporarily “snooze” any condition to prevent it from sending notifications. This can be useful when experimenting with new condition settings or during a pager storm. We’ll keep track of who applied the snooze and when, so you’ll always be able to answer “why didn’t I get paged?”
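As a rough mental model of how snoozing interacts with multiple notification destinations, consider the sketch below. Everything in it is hypothetical (the `Snooze` and `MonitoredCondition` classes, their fields, and the returned strings); it only illustrates the behavior described above and says nothing about how LightStep implements it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Snooze:
    """Records who applied the snooze, when, and for how long."""
    applied_by: str
    applied_at: datetime
    duration: timedelta

    def is_active(self, now: datetime) -> bool:
        return self.applied_at <= now < self.applied_at + self.duration


@dataclass
class MonitoredCondition:
    name: str
    # Each destination is a callable standing in for a Slack,
    # PagerDuty, or webhook sender.
    destinations: List[Callable[[str], None]]
    snooze: Optional[Snooze] = None

    def notify(self, message: str, now: datetime) -> str:
        # While a snooze is active, send nothing but report who
        # snoozed the condition and when.
        if self.snooze is not None and self.snooze.is_active(now):
            return (
                f"suppressed: snoozed by {self.snooze.applied_by} "
                f"at {self.snooze.applied_at.isoformat()}"
            )
        for send in self.destinations:
            send(message)
        return f"sent to {len(self.destinations)} destination(s)"
```

Keeping the snooze as a record on the condition, rather than simply deleting its destinations, is what makes the “who and when” question answerable later.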
Getting started with monitoring
All your existing SLAs work the same as they did before, and you can find them in the “Monitoring” section of the LightStep navigation bar, represented by the bell. You can create new Conditions for a saved search on the “Saved Search” page by opening the “Conditions” drawer on the right-hand side.
Monitoring is under the Bell icon on the left-hand side
We’re excited to give you improved and more powerful monitoring in LightStep, so you can go even faster from an alert to a root cause. Give it a try and share your feedback with us.
Joe Blubaugh, Software Engineer
Joe Blubaugh works on data storage systems and data analysis tools for LightStep customers. He’s interested in real-time data systems, distributed system design, and data visualization.