OpenTelemetry 101: What Is Observability?

By now, you've likely heard about OpenTelemetry, an open source observability framework created by the merger of OpenTracing and OpenCensus.

What is Observability?

You may be asking yourself, “What’s observability, anyway?” Observability has been in the news a lot recently, and it seems that every application monitoring vendor is trying to rebrand as an ‘observability’ vendor. This post aims to demystify observability, explain the concepts you need to know in order to understand OpenTelemetry, and show why this project matters.

The term ‘observability’ stems from control theory, an engineering discipline that concerns itself with how to keep dynamic systems in check, and refers to the ability to infer the internal state of a system based on its external outputs. An applied example of control theory can be seen in cruise control for cars. Under constant power, a car’s speed would decrease as it drives up a hill; in order to keep the vehicle’s speed consistent, an algorithm increases the power output of the engine in response to the measured speed. This is also an application of observability: the cruise control subsystem is able to infer the state of the engine by observing the measured output (in this case, the speed of the car).
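The cruise control example can be sketched as a small feedback loop. This is a toy model, not real vehicle dynamics: the function name `simulate`, the drag model, and every constant are assumptions chosen for illustration. The point it shows is that the controller only ever reads the measured speed, yet still compensates for a hill it never observes directly.

```python
def simulate(target_speed, hill_drag, steps=200, gain=0.5):
    """Return the speed history of a toy car under simple feedback control.

    The physics and the constants here are invented for illustration.
    """
    speed = target_speed
    power = target_speed  # power that exactly holds the target on flat ground
    history = []
    for step in range(steps):
        # The hill begins at step 50; the controller is never told about it.
        drag = hill_drag if step >= 50 else 0.0
        # Toy physics: speed drifts toward (power - drag).
        speed += 0.2 * ((power - drag) - speed)
        # Feedback: adjust engine power using only the observed output.
        error = target_speed - speed
        power += gain * error
        history.append(speed)
    return history


with_control = simulate(target_speed=100.0, hill_drag=10.0)
constant_power = simulate(target_speed=100.0, hill_drag=10.0, gain=0.0)
```

With `gain=0.0` the power stays constant and the speed settles below the target once the hill starts; with feedback enabled the speed dips and then recovers toward the target.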

Observability in software

In software, observability generally refers to the ability to understand an application’s performance based on output data, or telemetry. In distributed systems, this telemetry can be divided into three major categories:

  • Traces: contextual data about a request through a system

  • Metrics: quantitative information about processes

  • Logs: specific messages emitted by a process or service

Historically, these three verticals have been referred to as the “three pillars” of observability. The growing scale and complexity of software have led to changes in this model, however, as practitioners have not only identified the interrelationships between these types of telemetry data, but coordinated workflows involving them.

For example, time series metrics dashboards can be used to identify a subset of traces that point to underlying issues or bugs. Log messages associated with those traces can identify the root cause of the issue. When resolving the issue, new metrics can be configured to more proactively identify similar issues before the next incident.
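A minimal sketch of that workflow, using plain Python dictionaries as stand-ins for telemetry. The record shapes, field names such as `trace_id` and `duration_ms`, and the helper functions are all hypothetical; real telemetry backends store and query this data differently. What it illustrates is the hand-off: a metric narrows the search to a few traces, and the logs attached to those traces explain what went wrong.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names are illustrative only.
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 120},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 2300},
    {"trace_id": "t3", "service": "checkout", "duration_ms": 95},
]
logs = [
    {"trace_id": "t1", "message": "order placed"},
    {"trace_id": "t2", "message": "payment gateway timeout, retrying"},
    {"trace_id": "t3", "message": "order placed"},
]


def slow_traces(spans, threshold_ms):
    """Step 1: use a latency metric to narrow down to suspicious traces."""
    return [s["trace_id"] for s in spans if s["duration_ms"] > threshold_ms]


def logs_for(trace_ids, logs):
    """Step 2: collect the log messages associated with those traces."""
    by_trace = defaultdict(list)
    for entry in logs:
        by_trace[entry["trace_id"]].append(entry["message"])
    return {trace_id: by_trace[trace_id] for trace_id in trace_ids}


suspects = slow_traces(spans, threshold_ms=1000)
evidence = logs_for(suspects, logs)
```

The shared `trace_id` is what makes the correlation possible here; without a common identifier, the three kinds of telemetry would remain separate silos.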

Observability + OpenTelemetry

The ultimate goal for OpenTelemetry is to ensure that this telemetry data is a built-in feature of cloud-native software. This means that libraries, frameworks, and SDKs should emit this telemetry data without requiring end-users to proactively instrument their code. To accomplish this, OpenTelemetry is producing a single set of system components and language-specific libraries that can create, collect, and capture these sources of telemetry data and export them to analysis tools through a simple exporter model.
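The exporter idea can be illustrated with a small standard-library sketch. To be clear, this is not the OpenTelemetry API: `Tracer`, `InMemoryExporter`, and `handle_request` are hypothetical names invented here to show the shape of the model, in which instrumented library code produces telemetry and a pluggable exporter decides where it goes.

```python
import time
from contextlib import contextmanager


class InMemoryExporter:
    """Hypothetical exporter; a real one would ship spans to an analysis tool."""

    def __init__(self):
        self.exported = []

    def export(self, span):
        self.exported.append(span)


class Tracer:
    """Hypothetical stand-in for the object an instrumented library would use."""

    def __init__(self, exporter):
        self._exporter = exporter

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # A span is handed to the exporter when its block finishes.
            elapsed = time.perf_counter() - start
            self._exporter.export({"name": name, "duration_s": elapsed})


def handle_request(tracer):
    # Imagine this lives inside a framework: the application author
    # gets these spans without writing any instrumentation.
    with tracer.span("handle_request"):
        with tracer.span("query_database"):
            pass


exporter = InMemoryExporter()
handle_request(Tracer(exporter))
names = [span["name"] for span in exporter.exported]
```

Swapping `InMemoryExporter` for a different class with the same `export` method would redirect the telemetry without touching `handle_request`, which is the design benefit the exporter model is aiming for.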

In summary, observability in software is about the integration of multiple forms of telemetry data, which together can help you better understand how your software is operating. It differs from traditional application monitoring in its focus on the relationships between those forms of data, not just on each one in isolation. Observability doesn't stop at the capture of telemetry data, however: the most critical aspect of the practice is what you do with the data once it's been collected. This is where a tool like Lightstep comes in handy, providing features such as correlation detection, historical context, and automatic point-in-time snapshots through analysis of your telemetry data.

In the next part of this series, we'll take a deeper dive into telemetry data sources, starting with tracing.


September 29, 2019

About the author

Austin Parker
