Taming Modern Architectures with Lightstep and Sumo Logic
by Andrew Chee
Traditional monitoring tools built for the monolith simply can’t handle the complexity of modern distributed architectures. They can’t trace the flow of a transaction across service boundaries, and more importantly, they weren’t designed to tell stories about what’s actually happening in the system. That leaves the on-call engineer, woken at 2 am to fight a performance fire, in a precarious position. To address this, Lightstep and Sumo Logic have joined forces to help organizations tame their modern systems and make engineering teams more efficient and effective at solving problems, which ultimately reduces MTTR and improves customer experiences.
Lightstep is a platform built for the modern microservice architecture that provides detailed tracing information across service boundaries as well as powerful application-level metrics. Sumo Logic is the leading cloud-native machine data analytics platform. Recently, Lightstep and Sumo Logic partnered to create an integration that provides customers with a full suite of observability tools to help them measure and explain performance issues.
The integration includes two main use cases. The first exposes Lightstep application latency metrics within the Sumo Logic UI. Lightstep ingests 100% of production tracing data, which allows it to calculate latency metrics at any percentile you choose. You can then correlate distributed transaction performance with the machine data (logs and performance metrics) gathered through Sumo Logic. Lining up these data sources on the same timeline can show you where performance bottlenecks are occurring.
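To make the percentile idea concrete, here is a minimal sketch of computing latency percentiles from a complete set of transaction durations. It uses the nearest-rank method, which is an assumption chosen for simplicity and not necessarily how Lightstep computes its metrics; the `percentile` function and the sample latencies are hypothetical.

```python
import math


def percentile(durations_ms, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it. Illustrative only, not Lightstep's method."""
    if not durations_ms:
        raise ValueError("need at least one duration")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(durations_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end latencies (milliseconds) for one operation.
latencies_ms = [12, 15, 11, 14, 13, 250, 16, 12, 18, 900]

p50 = percentile(latencies_ms, 50)  # typical request
p99 = percentile(latencies_ms, 99)  # tail request
```

Because every transaction is kept rather than sampled, a rare slow request such as the 900 ms one still shows up in the tail percentile instead of being dropped.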
Lightstep application metrics correlated in Sumo Logic dashboard.
Having time-correlated metrics is an important way to narrow down the problem area and the offending services. But digging deeper in a distributed services environment requires the ability to trace individual calls across service boundaries to see where the problem is in the stack. Lightstep also captures example traces along the latency distribution, so you are guaranteed a trace for everything from your median-latency transactions (P50) all the way to your 1-in-1,000 (P99.9) transactions and beyond. Currently, this is exposed in the Lightstep UI.
Lightstep alerts and example traces exposed in Sumo Logic dashboard.
The second use case exposes alerts and example traces directly within Sumo Logic. Lightstep alerts always include traces of transactions that violated the alerting condition, and these alerts can be exposed directly inside the Sumo Logic UI. There they can be visualized in dashboards or searches, providing a single pane of glass with a holistic view of your entire application environment.
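As a rough illustration of an alert that carries its own evidence, the sketch below evaluates a simple latency threshold and attaches the slowest violating traces to the alert. The `TraceSummary` and `Alert` types, the `evaluate_latency_alert` function, and the threshold are all hypothetical; they are not Lightstep's or Sumo Logic's actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class TraceSummary:
    trace_id: str
    operation: str
    duration_ms: float


@dataclass
class Alert:
    condition: str
    example_traces: list = field(default_factory=list)


def evaluate_latency_alert(traces, threshold_ms, max_examples=3):
    """Return an Alert with the slowest violating traces attached,
    or None if no trace exceeded the threshold."""
    violating = [t for t in traces if t.duration_ms > threshold_ms]
    if not violating:
        return None
    violating.sort(key=lambda t: t.duration_ms, reverse=True)
    return Alert(
        condition=f"latency > {threshold_ms} ms",
        example_traces=violating[:max_examples],
    )


# Hypothetical recent traces for a checkout operation.
recent = [
    TraceSummary("t1", "checkout", 120.0),
    TraceSummary("t2", "checkout", 950.0),
    TraceSummary("t3", "checkout", 610.0),
    TraceSummary("t4", "checkout", 80.0),
]

alert = evaluate_latency_alert(recent, threshold_ms=500)
```

The design point is that whoever receives the alert starts with concrete offending transactions to open, not just a number that crossed a line.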
Of course, when problems occur, you need to investigate and identify the root cause(s). Lightstep can provide the additional depth of a distributed trace.
Traces direct users to problem areas with tag and log context in one place.
Selecting a problematic trace within Sumo Logic takes you directly to the Lightstep Trace View, where you can view the transaction as well as the operation within it that’s causing the issue. In modern distributed systems, a transaction crosses multiple service boundaries, which makes it very difficult to identify the problem operation. In the Lightstep distributed trace UI, however, the problem areas are immediately highlighted. Additional context such as metadata tags (key/value pairs) and log statements can also be included, which provides more information about the offending operation or service.
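One common way to surface the problem operation in a trace is to compare each span's self time, meaning its duration minus the time spent in its direct children. The sketch below shows that heuristic on a hypothetical four-span trace; the `Span` type, the tag names, and the assumption that child spans run sequentially without overlap are all illustrative, and this is not a description of how the Lightstep UI chooses what to highlight.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    service: str
    operation: str
    duration_ms: float
    tags: dict = field(default_factory=dict)  # key/value metadata


def slowest_operation(spans):
    """Return the span with the largest self time (own duration minus the
    summed duration of its direct children, assumed non-overlapping)."""
    child_time = {}
    for s in spans:
        if s.parent_id is not None:
            child_time[s.parent_id] = child_time.get(s.parent_id, 0.0) + s.duration_ms
    return max(spans, key=lambda s: s.duration_ms - child_time.get(s.span_id, 0.0))


# Hypothetical trace crossing four services.
trace = [
    Span("a", None, "api-gateway", "GET /checkout", 480.0),
    Span("b", "a", "cart", "load_cart", 40.0),
    Span("c", "a", "payments", "charge_card", 420.0, {"customer.tier": "free"}),
    Span("d", "c", "payments-db", "INSERT charge", 390.0, {"db.rows": "1"}),
]

culprit = slowest_operation(trace)
```

Here the root span is the longest overall, but almost all of its time is spent waiting on children; the self-time view points at the database insert, and the tags on that span give the extra context for the investigation.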
The ultimate goal of any observability system is to quickly identify a problem and its root cause. Together, Sumo Logic and Lightstep provide the full suite of signals required to provide immediate, actionable information to help solve today’s complicated performance problems.