What is OpenTelemetry?
Today’s cloud-native software, and the organizations that build it, require observability to perform at the highest level. Observability depends on high-quality telemetry: data that reports on the state of a system. By analyzing this data, SRE teams and developers can model and understand their systems, improve performance, see the impact of changes, retain control over their data, and build better end-user experiences.
OpenTelemetry is an open source project from the Cloud Native Computing Foundation (CNCF). Created by merging OpenTracing and OpenCensus, it continues the same standards-based approach as its predecessors. The goal of the project is to provide SRE teams and developers with ubiquitous, high-quality telemetry data to support their observability practice. This is accomplished through a collection of APIs (Application Programming Interfaces), SDKs (Software Development Kits), and integration tooling for creating and capturing telemetry data from application stacks.
In this article, we’ll take a closer look at what telemetry data OpenTelemetry provides, why the project is important, and how it works.
What is Telemetry Data?
Telemetry data consists of signals emitted from applications and resources about their internal state. The three most common types of signals are traces, metrics, and logs, also known as the three pillars of observability.
A trace represents the flow of a transaction through a software system. Traces are composed of many individual units of work known as spans. Because each span records its timing and status, a trace implicitly captures rate, error, and duration (RED) metrics, which are three of the four golden signals. Each trace corresponds to a single end user’s request as it flows through the system, and each span includes useful contextual metadata such as request properties, method arguments, and error messages.
Traces highlight performance bottlenecks in the workflow of an application from the perspective of an end user, making them a core signal in any observability strategy.
A metric is a numeric value measured over a regular interval of time. Metrics are compact and highly structured, designed to be aggregated from multiple sources and stored for long periods of time.
Metrics are commonly used to understand the health of system resources, such as CPU utilization, memory pressure, or disk consumption. They can also be used to measure application statistics, such as the number of concurrent users logged in to an application, or the count of messages published to a queue.
A log is a time-stamped record of an event that occurs in an application or resource. Logs are designed to be human-readable, and vary in their structure. They can contain a wide variety of data about the system depending on their purpose, but they are difficult to aggregate and search due to the volume of logs produced by a system.
Logs are commonly used to diagnose specific problems in a particular component of a system. For example, developers can understand failures in builds or deployments by inspecting the log output of the deployment tooling.
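As a generic illustration (using Python's standard `logging` module rather than an OpenTelemetry API), a log is simply a time-stamped, human-readable record of an event:

```python
import logging

# Each record carries a timestamp, severity level, logger name, and message.
logging.basicConfig(
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("deploy")

# The kind of message a developer might inspect to diagnose a failed deploy.
log.error("deployment failed: image %s not found in registry", "api:v1.4.2")
```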
Why is OpenTelemetry Important?
Traces, metrics, and logs are used by SRE teams and developers to monitor their systems and troubleshoot problems. Without telemetry data, it is impossible to diagnose problems or make informed decisions about improving a system. Telemetry also enables the continuous monitoring phase of the DevOps lifecycle, which tightens feedback loops and allows for iterative improvement of a system.
OpenTelemetry provides common standards and tools to create and capture telemetry data for continuous monitoring. This means that SRE teams and developers aren’t locked into a vendor-provided agent or library in order to monitor their systems: OpenTelemetry data is portable to any monitoring tool. OpenTelemetry also provides standard conventions for system metadata, ensuring that common attributes (like CPU utilization, request duration, or request size) are consistently recorded across all languages and runtimes.
The vendor-agnostic nature of OpenTelemetry means that it’s quickly becoming an industry standard for telemetry data. This not only means more industry-leading organizations are adopting it as a pillar of their monitoring efforts, but also that more popular frameworks and libraries are integrating with OpenTelemetry, in order to support innovative application development and better end-user experiences.
How Does OpenTelemetry Work?
At a high level, OpenTelemetry instruments a system to collect telemetry signals and exports those signals for analysis.
OpenTelemetry signals (traces, metrics, and logs) share a common context propagation subsystem, which allows them to be correlated with each other through shared transaction-level identifiers and common attributes. OpenTelemetry instrumentation is available in 11 different programming languages, and is designed as a series of composable packages, listed here:
API
SDK
OTLP and Semantic Conventions
Collector and Contributed Packages
Let’s look at each in detail.
API
The OpenTelemetry API provides a framework to define data types and operations for generating telemetry data. The API is independent of the underlying SDK, meaning that developers can include it as a lightweight dependency of third-party libraries or frameworks. The API is also composable, meaning that it can be integrated with existing instrumentation systems by re-implementing API functions as needed.
SDK
The OpenTelemetry SDK acts as a reference implementation of the API. It provides configuration parameters, extension methods for processing and transforming telemetry data, and export pipelines to send this data elsewhere. The SDK is required to actually generate telemetry data and send it to a back-end analysis tool such as Jaeger, Zipkin, Prometheus, OpenSearch, a commercial offering, or the OpenTelemetry Collector.
OTLP and Semantic Conventions
The OpenTelemetry Protocol (OTLP) and OpenTelemetry Semantic Conventions define a standard wire format for expressing and transmitting telemetry signal data, as well as common metadata keys and values for that data. This metadata includes attributes corresponding to common system components, like HTTP servers, deployment artifacts, RPC frameworks, database servers, and more.
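For a flavor of the conventions, the dictionaries below use well-known attribute names from the OpenTelemetry semantic conventions; the values are illustrative. Using these standard keys, rather than ad-hoc ones, is what keeps data consistent across languages and back-ends.

```python
# Standard span attributes for an HTTP server request.
span_attributes = {
    "http.request.method": "GET",
    "http.response.status_code": 200,
    "server.address": "shop.example.com",
}

# Standard resource attributes identifying the emitting service.
resource_attributes = {
    "service.name": "checkout-service",
    "service.version": "1.4.2",
}
```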
Collector and Contributed Packages
These are software packages that are separate from the SDK, but provide useful functionality to an OpenTelemetry implementation. Examples include custom processors and exporters, or libraries to instrument third-party libraries such as SQL clients.
Additionally, a variety of external tooling exists to receive and process telemetry. The OpenTelemetry Collector is the foremost of these, as it provides dozens of integrations to receive and transform existing telemetry signals, convert them into OTLP, and send that data to a supported back-end. It can also perform processing: data masking and redaction, transforming telemetry attributes, batching and compressing telemetry data, or advanced sampling to reduce the amount of telemetry signals emitted.
To summarize: OpenTelemetry instruments application code using its APIs. The SDK translates these API calls into telemetry data, which is then exported using OTLP to a Collector. The Collector processes this data and exports it to a back-end.
The Benefits of OpenTelemetry
OpenTelemetry’s primary benefit is that it provides consistent, high-quality telemetry data to enable an observability practice.
Data consistency and quality
OpenTelemetry provides a common, vendor-agnostic framework for telemetry signals across different languages, cloud providers, and third-party frameworks and libraries. It offers drop-in instrumentation in many languages, letting SRE teams and developers produce high-quality telemetry data without code changes. This consistency is key, as it ensures that telemetry is of uniform quality across multiple teams throughout an organization.
Telemetry data alone isn’t enough to build a resilient system, but observability cannot be achieved without high-quality data. OpenTelemetry enables advanced, cloud-native observability for SRE teams and developers through its semantic conventions and shared context, ensuring that signals are linked together to provide a complete picture of system state and health. OpenTelemetry also reduces the time it takes to see value from an observability practice, as it provides the signals required to build high-quality alerts and Service Level Objectives.
How Does OpenTelemetry Relate to Observability?
OpenTelemetry is a prerequisite for observability, providing a foundation of high-quality, consistent telemetry data about your system. Using this data, you can understand the state of your system and make informed decisions about how to improve it. Observability is more than just OpenTelemetry, however: you also need to store, query, and analyze the data it produces. Lightstep performs this latter role, giving your SRE team and developers a planet-scale telemetry observability platform.
Observability is not the same thing as finding a tracing tool, a metrics tool, and a logging tool: it’s about solving problems by putting that telemetry data to work. In this way, you can think of OpenTelemetry as the first step in any observability strategy since high-quality, vendor-neutral data is a great starting point.
Where to Go From Here
OpenTelemetry has contributors from dozens of prominent cloud, networking, and observability companies. Lightstep is a founding member of, and top contributor to, OpenTelemetry. To learn more about OpenTelemetry, you can visit its website or GitHub. Want to keep up-to-date? There are several ways to stay in touch with the project:
Become a part of the
Follow the OpenTelemetry tag on
Subscribe to the
Join the OpenTelemetry
In the future, OpenTelemetry plans to support a variety of initiatives, such as improved application-specific metrics for big data systems like Kafka and Hadoop, improved telemetry data from Kubernetes and Windows hosts, and streamlined APIs, installation experience, and configuration. You can read more about how OpenTelemetry works, and about future initiatives, in the documentation.
While the exact details differ between implementations, the general process of integrating with OpenTelemetry is consistent:
1. First, get your process and core libraries instrumented. Auto-instrumentation is a great first step.
2. Validate your instrumentation by sending it to an
3. Learn how to troubleshoot instrumentation issues.
4. With a good understanding of your telemetry, enrich your data with custom attributes and events.
5. Explore advanced topics and best practices to improve collection and analysis.
Steps 4 and 5 are ongoing, and should be revisited regularly so you can ensure your telemetry evolves alongside your observability and business needs.
OpenTelemetry and Lightstep
OpenTelemetry is a Cloud Native Computing Foundation sandbox project and was founded in 2019. OpenTelemetry brings together experts from open source projects, vendors, and the observability community to make it easy to get telemetry out of cloud native applications – and into tools that can analyze that telemetry, like Lightstep.
Cloud-native systems require cloud-native observability to diagnose and remediate issues that impact end-user experience. OpenTelemetry provides a standardized, open-source, vendor-agnostic solution to provide high quality telemetry data for this purpose.
Using OpenTelemetry in your application allows you to leverage the power of platforms like Lightstep. Lightstep is an all-in-one system for observability and incident response that SRE teams and developers depend on every day. Lightstep uses the telemetry data provided by OpenTelemetry to power its best-in-class analysis tools, helping you to more quickly answer the question of what’s changed in your system. You can get started with incident response or book a demo to learn more about the observability features.