DigitalOcean Uses LightStep [x]PM to Get a Reliable Picture of its Distributed System and Saves 1,000 Hours of Developer Time per Month

April 5, 2018 | Kristin Brennan | Case Studies

LightStep [x]PM enables customers, including DigitalOcean, to use end-to-end traces to diagnose performance problems across service boundaries and identify the teams that can fix them. DigitalOcean uses [x]PM to monitor 100+ apps in real time across its distributed system. [x]PM also helps engineers work together more productively, saving 1,000 hours of developer time per month.

Challenge: performing root cause analysis in a distributed system

As its software team was growing quickly, DigitalOcean wanted to improve the way it responded to errors and performance degradations. The company needed a source of truth: a complete, reliable, real-time picture of the system that would give every team the same baseline information. Teams were shipping features efficiently, but communication across engineering teams had suffered. Because it was so difficult to pinpoint the exact origin of a performance problem, it was also difficult to determine the right person to address it. Teams had logs at their disposal, but correlating events in log data was like looking for a needle in a haystack, and it wasted countless developer hours each week. According to Dave Smith, Sr. Director of Engineering at DigitalOcean: “In our increasingly complex environment, it was impossible for a single person to understand the entire system. Root cause analysis was becoming difficult, and we couldn’t find an application performance monitoring system robust enough to work with our heterogeneity.”

Find the root cause and assign the right team to fix it quickly

[x]PM was able to fit into DigitalOcean’s complex ecosystem, and now gives the engineers a real-time view of the entire system. 100+ apps are being monitored using [x]PM, and the organization is using the results to promote intra-company accountability and visibility. They also have 144 company-wide visible dashboards that help each team understand their services’ performance and see how it relates to all the other services hosted by other teams.

DigitalOcean Uses [x]PM’s End-to-End Traces Along with Customizable Dashboards and Alerts

Customize dashboards to measure application performance along any dimension: by team ownership, customer transactions, or even individual services.

[x]PM has also changed how teams collaborate on root cause analysis. Prior to using [x]PM, logs were one of the main ways to drill into issues and identify a root cause. Engineers had to dig through multiple databases and external services to identify the problem, then search at length through logs to find the cause. Identifying the team responsible for fixing the issue was an additional challenge before final remediation. With [x]PM’s end-to-end traces, alongside customizable dashboards and alerts, this process was cut down to 2-3 steps and completed in less than 15 minutes. [x]PM breaks down a performance issue into detailed traces, which connect the dots and explicitly highlight the root cause. This makes it easy to identify the team that can mitigate the issue, even when it crosses team and service boundaries. “[x]PM scales beautifully with our business and our use cases. We’re very pleased with our decision to standardize on it for application performance management,” said Smith.
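To make the idea of tracing an issue across service boundaries concrete, here is a minimal Python sketch of how a trace could point to a root cause and an owning team. It is an illustration only, not [x]PM's data model or API: the Span fields, the team and service names, and the find_root_cause helper are all hypothetical, and it assumes each span records its parent, its service, an owning team, and an error flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One unit of work in a trace (hypothetical shape, not a real tracer's schema)."""
    span_id: str
    parent_id: Optional[str]  # None for the root span of the trace
    service: str
    team: str                 # assumed ownership tag attached to each span
    error: bool = False


def find_root_cause(spans: List[Span]) -> Optional[Span]:
    """Return an errored span that has no errored child.

    Errors usually propagate upward from the failing call to its callers,
    so the deepest errored span is a reasonable first suspect.
    """
    # IDs of spans that have at least one errored child.
    parents_of_errors = {
        s.parent_id for s in spans if s.error and s.parent_id is not None
    }
    candidates = [
        s for s in spans if s.error and s.span_id not in parents_of_errors
    ]
    return candidates[0] if candidates else None


# Hypothetical trace: a request fails at the gateway because a
# downstream database call failed two hops away.
trace = [
    Span("a", None, "api-gateway", "edge", error=True),
    Span("b", "a", "billing", "payments", error=True),
    Span("c", "b", "ledger-db", "storage", error=True),
    Span("d", "a", "auth", "identity"),
]

culprit = find_root_cause(trace)
if culprit is not None:
    print(f"Suspect service: {culprit.service} (owned by team {culprit.team})")
```

In this toy trace the error surfaces at the gateway, but following the parent-child links leads to the database span and therefore to the team that owns it, which is the kind of hand-off the log-based workflow made slow.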

Read the full case study, DigitalOcean Uses LightStep [x]PM as a Source of Truth for its Distributed System, Saving 1000 Hours of Developer Time per Month, to get more information about DigitalOcean’s success.
