
OpenTelemetry: Emerging standard for all DevOps solutions - error analytics

Lightstep wants to help customers and our DevOps partners adopt OpenTelemetry. As a founding member and core contributor of the standard, we have the expertise, tools, and templates to help vendors easily adopt it in their solutions.

We are launching a series of OpenTelemetry-based tutorials and example instrumentation for different DevOps solutions. We’ll show how to connect these solutions to observability data within Lightstep, and how that makes for a better user experience when running experiments, operating cloud services, or investigating errors. In earlier posts, we showed how you can extend instrumentation to AWS cloud services and feature flag solutions.

In this post we discuss the value of adding instrumentation to error analytics tools as part of your overall monitoring and incident response workflow.

Error Analytics

Error analytics are at the center of troubleshooting potential problems with software, especially unexpected problems with code. Error analytics solutions provide detailed analysis that can connect specific lines of code to a problem. They are a key part of developer workflows that happen behind the scenes when a customer receives a message that says something went wrong.

Connecting errors to root cause

Error analytics solutions work well when developers can connect a specific signal to an error. For example, a new version of an app is released and the error count spikes. A quick look at error analytics for that app points to an exception coming from a new line of code that didn’t cover some edge case.

From speaking with Lightstep customers, we know that some of the trickiest kinds of errors are those that are not connected to an obvious cause, or that are missed entirely. This is especially problematic when teams scale to dozens or hundreds of services.

Here are some examples we’ve seen of customer-facing errors whose root causes were not at all obvious:

  • Latency in retrieving data from a cloud-based object storage service (itself a dependency of a backend service) occasionally caused requests from mobile app users in Europe to fail with a cryptic error message.

  • Requests suddenly fail for a small subset of customers when a pod restarts in Kubernetes.

  • A dependency of a backend service returns a 500 error when a specific request is made by customers with over 1,000 active subscribers.

At the center of investigating all three problems is an error trace in an error analytics tool. Unfortunately, getting to the root cause can be extremely difficult: the services involved are owned by different teams, and the types of telemetry each team collects vary and live in different places.

OpenTelemetry allows developers to link errors and their associated rich metadata (like the line of code where the error was observed) to their telemetry, specifically distributed traces. With a single line of code, this can be done automatically using a plugin based on open standards.
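To make the idea concrete, here is a minimal, library-free sketch of what linking an error to a trace can look like. It does not use the OpenTelemetry SDK or any Rollbar plugin. The `new_span`, `record_error`, and `charge_customer` helpers are hypothetical names made up for illustration, and the span is just a dictionary. The `exception.*` attribute keys follow OpenTelemetry's semantic conventions for exceptions; the `code.filepath` and `code.lineno` keys are an assumption about how the source location might be named.

```python
import secrets
import traceback


def new_span(name):
    """Create a minimal span record (hypothetical stand-in for a real span)."""
    return {
        "name": name,
        "trace_id": secrets.token_hex(16),  # 128-bit trace ID, hex encoded
        "span_id": secrets.token_hex(8),    # 64-bit span ID, hex encoded
        "status": "OK",
        "events": [],
    }


def record_error(span, exc):
    """Attach an exception and its source location to the span as an event."""
    # The last traceback frame is the one where the exception was raised.
    frame = traceback.extract_tb(exc.__traceback__)[-1]
    span["events"].append({
        "name": "exception",
        "attributes": {
            "exception.type": type(exc).__name__,
            "exception.message": str(exc),
            "exception.stacktrace": "".join(
                traceback.format_exception(type(exc), exc, exc.__traceback__)
            ),
            # Assumed attribute names for the source location.
            "code.filepath": frame.filename,
            "code.lineno": frame.lineno,
        },
    })
    span["status"] = "ERROR"


def charge_customer(subscribers):
    """Hypothetical handler that fails on an uncovered edge case."""
    if subscribers > 1000:
        raise ValueError("subscriber limit exceeded")
    return "charged"


span = new_span("charge_customer")
try:
    charge_customer(1500)
except ValueError as exc:
    record_error(span, exc)

event = span["events"][0]["attributes"]
print(span["trace_id"], event["exception.type"], event["code.lineno"])
```

Because the error event carries the same trace ID as the rest of the request, anyone looking at the trace can see which exception was raised and where, without switching tools first.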

Here’s Rollbar error information embedded in Lightstep’s Trace View:

[Image: Error analysis]

With workflow links, it’s then possible to jump directly into Rollbar to troubleshoot errors. This complements our previous OpenTracing-based integration with Rollbar.

Next steps

Check our Lightstep Developer Toolkit:

Contact us if you’d like to learn more about OpenTelemetry or find out what’s planned for future integrations.

March 18, 2021

About the author

Clay Smith