Twilio Improves Mean Time To Resolution (MTTR) by 92% with Lightstep
by Kristin Brennan
Lightstep enables companies to pinpoint the root cause of issues quickly, and Twilio used Lightstep to improve mean time to resolution (MTTR) by 92%.
Challenge: reducing time to detect and remediate issues
When we first talked to the team at Twilio, they said they wanted to be able to identify traces of specific, noteworthy events, and that traditional approaches, like centralized logging, fell short. According to Jason Hudak, VP of Platform Engineering at Twilio, such approaches were “simply not the right solution. Logging solutions can provide information about who, what, and where things happened, but Lightstep answers why things happened and helps us do root cause analysis very quickly.”
Lightstep’s Satellite Architecture yields targeted insights
Lightstep is built on its cutting-edge Satellite Architecture, which distributes data collection and statistical analysis, yielding targeted insights from anywhere within today’s software systems. To help customers reduce MTTR, Lightstep delivers prompt, content-rich alerts and provides real-time traces that show exactly how separate services and parts of an application interact with each other.
Root causes for anomalous latency spikes or errors are often buried in some backend service, making them extremely difficult to uncover. Lightstep lets users easily drill down and examine the complex service interactions for very large traces across arbitrary time ranges and for any latency band to diagnose those issues. Lightstep further analyzes these services within the context of one another for every trace to help users quickly determine the critical path, and it presents log information and payloads inline for each transaction of interest. These capabilities enable customers like Twilio to visualize, identify, and resolve issues faster.
Visualize, identify, and resolve latency spikes and errors faster with Lightstep
Lightstep has demystified root cause analysis at Twilio. As Hudak said, “With Lightstep, our ability to detect and remediate issues has dramatically improved. When we go through exercises to test the system, root cause analysis for many complex failures has been reduced from an average of 40 minutes to less than three minutes with Lightstep. This saves our engineering team nearly 20 hours each week.”
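For readers who want to check the arithmetic, the short sketch below computes the percentage reduction implied by the figures in Hudak's quote. It assumes the headline 92% is derived from the drop from 40 minutes to roughly three minutes, which the article does not state explicitly. The function name `percent_reduction` is purely illustrative.

```python
def percent_reduction(before_minutes: float, after_minutes: float) -> float:
    """Return the percentage drop from before_minutes to after_minutes."""
    if before_minutes <= 0:
        raise ValueError("before_minutes must be positive")
    return (before_minutes - after_minutes) * 100 / before_minutes


# Figures from the quote: an average of 40 minutes, down to "less than three".
# Using 3 as the upper bound is an assumption; the real value is somewhat lower.
reduction = percent_reduction(40, 3)
print(f"{reduction:.1f}% reduction")
```

Because the quote says "less than three minutes," three is an upper bound, so the result is a floor on the improvement. That is consistent with the roughly 92% figure in the headline.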
Read the full case study, Twilio Improves Mean Time To Resolution (MTTR) by 92% with Lightstep, to get all of the details about Twilio’s success.