Twilio Improves Mean Time To Resolution (MTTR) by 92% with LightStep [x]PM

March 13, 2018 | Kristin Brennan | Case Studies

LightStep [x]PM has helped some of the world’s most innovative companies, including Twilio, monitor what matters most and diagnose anomalies within seconds. LightStep enables companies to pinpoint the root cause of issues quickly, and Twilio used [x]PM to improve mean time to resolution (MTTR) by 92%.

Challenge: reducing time to detect and remediate issues

When we first talked to the team at Twilio, they told us they wanted to identify traces of specific, noteworthy events, but that traditional approaches – like centralized logging – were “simply not the right solution. Logging solutions can provide information about who, what, and where things happened, but LightStep [x]PM answers why things happened and helps us do root cause analysis very quickly,” said Jason Hudak, VP of Platform Engineering at Twilio.

LightStep [x]PM satellite architecture yields targeted insights

[x]PM is built on LightStep’s Satellite Architecture, which distributes data collection and statistical analysis, yielding targeted insights from anywhere within today’s software systems. To help customers reduce MTTR, [x]PM delivers prompt, content-rich alerts and provides real-time traces that show exactly how separate services and parts of an application interact with each other.

Root causes for anomalous latency spikes or errors are often buried in some backend service, making them extremely difficult to uncover. [x]PM lets users easily drill down and examine the complex service interactions for very large traces across arbitrary time ranges and for any latency band to diagnose those issues. [x]PM further analyzes these services within the context of one another for every trace to help users quickly determine the critical path, and it presents log information and payloads inline for each transaction of interest. These capabilities enable customers like Twilio to visualize, identify, and resolve issues faster.
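To make the idea of a critical path concrete, here is a minimal sketch in Python. It is not LightStep's algorithm, which the post does not describe. It assumes a simplified trace model in which each span records its parent and its start and end times, and it uses a common heuristic: starting at the root span, repeatedly follow the child that finishes last, since that is the call the parent most likely waited on. The `Span` type, the `critical_path` function, and the service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span: one timed operation in one service."""
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    start_ms: float
    end_ms: float


def critical_path(spans: List[Span]) -> List[Span]:
    """Walk from the root, always following the child that ends last.

    This is a simplifying heuristic, not a full critical-path analysis:
    it ignores gaps between children and work done by the parent itself.
    """
    children: Dict[str, List[Span]] = {}
    root: Optional[Span] = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children.setdefault(span.parent_id, []).append(span)
    if root is None:
        raise ValueError("trace has no root span")

    path = [root]
    current = root
    while children.get(current.span_id):
        # The last child to finish is the one that held the parent open.
        current = max(children[current.span_id], key=lambda s: s.end_ms)
        path.append(current)
    return path


# Made-up trace: a gateway calls auth and messaging; messaging calls
# a cache and a slow database.
trace = [
    Span("a", None, "api-gateway", 0, 120),
    Span("b", "a", "auth", 5, 20),
    Span("c", "a", "messaging", 22, 115),
    Span("d", "c", "database", 30, 110),
    Span("e", "c", "cache", 25, 28),
]

path_services = [span.service for span in critical_path(trace)]
print(" -> ".join(path_services))  # api-gateway -> messaging -> database
```

In this made-up trace the slow database call sits two levels below the entry point, which is the kind of buried backend cause the paragraph above describes.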

[Image: LightStep [x]PM Real Time Trace for Root Cause Analysis]
Visualize, identify, and resolve latency spikes and errors faster with LightStep [x]PM

[x]PM has demystified root cause analysis at Twilio. As Hudak said, “With [x]PM, our ability to detect and remediate issues has dramatically improved. When we go through exercises to test the system, root cause analysis for many complex failures has been reduced from an average of 40 minutes to less than three minutes with [x]PM. This saves our engineering team nearly 20 hours each week.”
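The headline figure follows from the numbers in Hudak's quote. The short Python sketch below works through the arithmetic, treating "less than three minutes" as exactly three minutes, which is a conservative assumption. The function and variable names are illustrative only. The final calculation, how many analyses per week the quoted 20 hours would correspond to, is an inference from the quoted figures and not a number Twilio reported.

```python
def mttr_improvement(before_min: float, after_min: float) -> float:
    """Fractional reduction in resolution time, e.g. 0.5 means 50% faster."""
    if before_min <= 0:
        raise ValueError("before_min must be positive")
    return (before_min - after_min) / before_min


BEFORE_MIN = 40.0  # average root cause analysis time before [x]PM (from the quote)
AFTER_MIN = 3.0    # assumed upper bound for "less than three minutes"

improvement = mttr_improvement(BEFORE_MIN, AFTER_MIN)
print(f"MTTR improvement: {improvement:.1%}")  # MTTR improvement: 92.5%

# Inference only: how many analyses per week would add up to
# roughly 20 hours saved at 37 minutes saved per analysis?
saved_per_analysis_min = BEFORE_MIN - AFTER_MIN
analyses_per_week = (20 * 60) / saved_per_analysis_min
print(f"Implied analyses per week: {analyses_per_week:.0f}")  # Implied analyses per week: 32
```

Because the actual time is below three minutes, the true reduction is at least 92.5%, which is consistent with the 92% cited in the title.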

Read the full case study, Twilio Improves Mean Time To Resolution (MTTR) by 92% with LightStep [x]PM, to get all of the details about Twilio’s success.
