OpenTelemetry (OTel) is an observability framework and toolkit that provides a shared language, a common data model, and unified context-propagation behavior. OTel is designed for creating and managing telemetry data such as traces, metrics, and logs, and it is open source, allowing for greater flexibility.
OpenTelemetry is the result of a merger between two prior projects, OpenTracing and OpenCensus. Both of these projects were created to solve the same problem: the lack of a standard for how to instrument code and send telemetry data to an observability backend. However, neither project was fully able to solve the problem on its own, and so the two projects merged to form OpenTelemetry so that they could combine their strengths and truly offer a single standard.
Crucially, OpenTelemetry is vendor- and tool-agnostic, meaning it can be used with a broad variety of observability backends, including open-source tools like Jaeger and Prometheus as well as commercial offerings. OTel is a Cloud Native Computing Foundation (CNCF) project, built to be a robust, portable instrumentation solution across many languages. By providing a single set of APIs, libraries, agents, and collector services to capture distributed traces and metrics from your application, OpenTelemetry makes true observability possible.
"Observability" is a term that has gained significant attention in the world of software engineering and system administration—almost to the point of losing some of its meaning. Today, even many of those who work in IT use the term loosely to indicate any kind of system visibility. But what exactly is observability, and how does it differ from traditional monitoring?
Monitoring is the process of actively watching and checking the system's status, typically using predefined thresholds and alerts. While this approach can call your attention to the impacted systems when something goes wrong, monitoring often doesn't provide you with insights into why the issue is occurring or how to best resolve it.
Observability, on the other hand, is a more comprehensive approach. It refers to the ability to understand what's happening inside a system by analyzing its external outputs. Observability allows you to ask any question about what occurred at any time, without having predefined what you wanted to know in advance. Essentially, monitoring can tell you when a problem exists, but observability helps you understand why that problem occurred.
There are three main pillars that are essential to observability:
Traces: A trace represents the journey of a request through a system and can provide a detailed picture of service interactions. OpenTelemetry traces are organized into spans that contain attributes and events, describing and contextualizing the work being done across various processes and services.
Metrics: Metrics are numerical representations of data measured over intervals of time. They allow for the continuous monitoring of system performance and behavior. In OpenTelemetry, the Metrics API can be used to record, set, and add system metrics, offering tools for filtering and aggregate analysis later on.
Logs: Logs are text-based records generated by systems and applications. They provide a chronological account of events and can be vital for debugging and understanding system behavior.

While these three elements have long been considered the most essential factors in observability, the increasing scale and complexity of modern distributed systems has forced this traditional model to evolve. Practitioners have begun to recognize that the three pillars are not isolated, but fundamentally interconnected and should be used in coordination with each other.
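To make these signals concrete, here is a minimal sketch using the OpenTelemetry Python SDK that emits one example of each: a span, a counter increment, and an application log line. The service, span, and metric names are illustrative, and console exporters stand in for a real backend.

```python
import logging

from opentelemetry import trace, metrics
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader, ConsoleMetricExporter

# Traces: spans describe named, timed units of work.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(tracer_provider)
tracer = trace.get_tracer("checkout-service")  # illustrative name

# Metrics: numerical measurements aggregated over time.
meter_provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())]
)
metrics.set_meter_provider(meter_provider)
meter = metrics.get_meter("checkout-service")
orders_counter = meter.create_counter("orders_processed", unit="1")

# Logs: plain application logs, which OTel can also collect via its logging bridge.
logging.basicConfig(level=logging.INFO)

with tracer.start_as_current_span("process-order"):
    orders_counter.add(1, {"payment.method": "card"})
    logging.info("order processed")
```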
OpenTelemetry plays a crucial role in enabling observability in distributed, microservice-based systems. Its emphasis on portability, availability of implementations and software development kits (SDKs) across most languages, and especially its exporter and collector models, makes it a key tool in gathering and distributing data appropriate for the system.
With OpenTelemetry, teams can efficiently gather key trace and metrics data needed to analyze the system's behavior. This alignment with observability practices allows for a more profound understanding of system performance, thereby enhancing the ability to diagnose and solve issues.
Just as traces are one of the three pillars of observability, tracing is a crucial practice within software development, typically used to profile and analyze application code through specialized debugging tools. In the context of OpenTelemetry, tracing takes on a more nuanced meaning, referring most often to distributed tracing, an essential concept in today's complex architectures.
Distributed tracing represents the application of traditional tracing techniques to modern, microservices-driven applications. Unlike monolithic applications where diagnosing a failure might involve following a single stack trace, distributed systems are simply too complex for this approach—when an application consists of potentially thousands of services running across numerous hosts, individual traces become insufficient. Distributed tracing solves this problem by profiling requests as they move across different service boundaries, generating high-quality data for analysis. This method provides capabilities such as:
Anomaly detection
Workload modeling
Steady-state problem diagnosis

In OpenTelemetry, a trace is a collection of linked spans, which are named and timed operations representing a unit of work in a request. A span may have a parent, or it can be a root span, which describes the end-to-end latency of the entire trace. Child spans represent sub-operations within the trace.
Root span: The parent or starting point of the trace, representing the entire request's latency.
Child span: A specific sub-operation within the trace.

Spans encapsulate essential information, including the operation's name, start and end timestamps, attributes, and any events that occurred during the span. They also contain links to other spans and the operation's status.
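As a brief illustration, the following sketch uses the OpenTelemetry Python SDK to create a root span and two child spans. The service and operation names are illustrative, and a console exporter stands in for a real backend.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("order-service")  # illustrative name

# The outermost span has no parent, so it becomes the root span; its duration
# describes the end-to-end latency of the trace.
with tracer.start_as_current_span("handle-request"):
    # Spans started while another span is active become its children, each
    # representing a sub-operation within the trace.
    with tracer.start_as_current_span("query-database"):
        pass
    with tracer.start_as_current_span("render-response"):
        pass
```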
Given the complexity of distributed system trace data, having access to context and other relevant details can make a major positive difference in data analysis. In OpenTelemetry, this descriptive information takes the form of attributes and events:
Attributes: Key-value pairs (known in OpenTracing as “tags”) that can be added to a span to assist in analyzing trace data.
Events: Timestamped strings with an optional set of attributes that provide further description, enabling a more detailed analysis.

Spans in OpenTelemetry are generated by the Tracer, an object that tracks the currently active span and allows the creation of new spans. Propagator objects support transferring the context across process boundaries, a vital aspect of tracing in distributed systems. The Tracer dispatches completed spans to the Exporter in the OTel SDK, which is responsible for sending the spans to a backend system for further analysis.
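A minimal sketch of these concepts in the Python SDK is shown below: attributes and events are recorded on a span, and the configured propagator injects the trace context into an outgoing request's headers. The carrier dictionary and all names are illustrative.

```python
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider

trace.set_tracer_provider(TracerProvider())
tracer = trace.get_tracer("cart-service")  # illustrative name

with tracer.start_as_current_span("add-item") as span:
    # Attributes: key-value pairs that describe the work being done.
    span.set_attribute("cart.item_count", 3)

    # Events: timestamped strings with an optional set of attributes.
    span.add_event("item validated", {"item.sku": "ABC-123"})

    # Propagation: inject the current trace context into a carrier (for example,
    # outgoing HTTP headers) so the next service can continue the trace.
    headers = {}
    inject(headers)  # adds a W3C "traceparent" entry by default
```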
While tracing provides detailed insights into individual requests and operations, it also ties into the broader context of observability within OpenTelemetry, including metrics. By connecting trace data with relevant metrics, organizations can gain a comprehensive understanding of their system's behavior and performance.
In the context of OpenTelemetry, the Metrics API is designed to continually process and summarize raw measurements. This gives organizations enhanced visibility into vital operational metrics, such as process memory utilization and error rates. Measurements are captured through various metric instruments, which aggregate and record them. The OpenTelemetry Metrics API classifies metric instruments according to their semantic meaning rather than the final type of value they export.
The Metrics API provides six distinct metric instruments, each of which fulfills a specific function. These instruments are created and defined through calls to a Meter API—the user's entry point to the SDK. Metric instruments are either synchronous or asynchronous.
Synchronous instruments are invoked within a request and possess an associated distributed context. Synchronous instruments include:
Counter: The Counter instrument accepts positive values through the Add() function. It is ideal for counting data such as bytes received, requests completed, and errors encountered, and is particularly well suited to rate measurements.
UpDownCounter: An extension of the Counter, UpDownCounter processes both positive and negative increments through the Add() function. It is useful for monitoring quantities like active requests, memory in use, and queue size.
ValueRecorder: The ValueRecorder instrument captures values for discrete events using the Record() function. It is used for capturing latency, request size, and queue length, emphasizing a distribution of values.

Unlike synchronous instruments, asynchronous instruments lack an associated context. These instruments are reported by a callback and are called only once per collection interval. Asynchronous instruments include:
SumObserver: The SumObserver instrument utilizes the Observe() function for asynchronous monotonic sums, and is most suitable for data such as cache misses and system CPU usage.
UpDownSumObserver: UpDownSumObserver is an asynchronous, non-monotonic counter that accepts positive and negative sums with the Observe() function. This instrument is used for measurements that capture the rise and fall of sums, such as process heap size.
ValueObserver: Finally, the ValueObserver gives organizations fine-grained control over non-additive measurements using the Observe() function. It is ideal when the focus of the measurement is a distribution.

Metric events captured by any instrument consist of a timestamp, an instrument definition (name, kind, description, and unit of measure), a label set (keys and values), a value (integer or floating-point number), resources associated with the SDK, and distributed context (for synchronous events only). The variety of metric instruments available offers flexibility in capturing different types of data, aiding in the analysis and monitoring of various system attributes.
Whether monitoring error rates, latency, or resource utilization, OpenTelemetry's metric instruments provide diverse options for developers to gain critical insights into their applications.
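The sketch below records a few such measurements with the OpenTelemetry Python SDK. Note that the stable releases of the metrics API use slightly different instrument names than those listed above: Histogram corresponds to ValueRecorder, and the Observer instruments became ObservableCounter, ObservableUpDownCounter, and ObservableGauge. All metric names, values, and labels here are illustrative.

```python
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader, ConsoleMetricExporter

metrics.set_meter_provider(
    MeterProvider(metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())])
)
meter = metrics.get_meter("payment-service")  # illustrative name

# Synchronous instruments: recorded inline with a request.
requests_counter = meter.create_counter("http.requests", unit="1")
active_requests = meter.create_up_down_counter("http.active_requests", unit="1")
request_latency = meter.create_histogram("http.request.duration", unit="ms")

active_requests.add(1)
requests_counter.add(1, {"http.route": "/checkout"})
request_latency.record(23.5, {"http.route": "/checkout"})
active_requests.add(-1)

# Asynchronous instrument: reported by a callback once per collection interval.
def read_queue_size(options: CallbackOptions):
    yield Observation(42, {"queue.name": "orders"})  # hypothetical measurement

meter.create_observable_gauge("queue.size", callbacks=[read_queue_size], unit="1")
```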
The exporter is responsible for batching and transporting telemetry data to a backend system for analysis and alerting. OpenTelemetry's exporter model supports the integration of instrumentation at three different levels:
Service-level integration: This involves declaring a dependency on the relevant OpenTelemetry package within your code and deploying it accordingly.
Library dependencies: Similar to service-level integration, but specific to libraries, which typically declare a dependency only on the OpenTelemetry API.
Platform dependencies: These are independent software components that support your service, like Envoy and Istio. They deploy their own copy of OpenTelemetry and emit trace context useful for your service.

The exporter interface, implemented by the OpenTelemetry SDKs, uses a plug-in model that translates telemetry data into the format required by a particular backend system before transmitting the data. This model also supports the composition and chaining of exporters, facilitating shared functionality across different protocols.
One significant advantage of OpenTelemetry's approach is the ease of switching or adding export components. This sets it apart from OpenTracing, where switching backends required replacing the entire tracer component.
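For instance, the following sketch adds two exporters to the Python SDK: a console exporter and an OTLP exporter pointed at a local collector. The endpoint shown is the conventional local OTLP/gRPC address and is illustrative; adding or swapping a backend is a configuration change rather than a tracer replacement.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()

# Exporters plug into the SDK through span processors; chaining a second
# exporter does not require touching application instrumentation.
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)

trace.set_tracer_provider(provider)
tracer = trace.get_tracer("billing-service")  # illustrative name

with tracer.start_as_current_span("charge-card"):
    pass  # when the span ends, it is handed to both exporters
```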
While the exporter model offers great convenience, certain organizational or technical constraints may prevent the easy redeployment of a service to add a new exporter. The OpenTelemetry collector is designed to act as a 'sink' for telemetry data from multiple processes, exporting it to various backend systems like ServiceNow Cloud Observability, Jaeger, or Prometheus. The collector can be deployed either as an agent alongside a service or as a remote application:
Agent: Deployed with your service, running as a separate process or sidecar.
Remote collector: Deployed separately in a container or virtual machine, receiving telemetry data from each agent and exporting it to backend systems.

OpenTelemetry consists of the following major components:
A specification for all components
A standard protocol that defines the shape of telemetry data
Semantic conventions that define a standard naming scheme for common telemetry data types
APIs that define how to generate telemetry data
A library ecosystem that implements instrumentation for common libraries and frameworks
Automatic instrumentation components that generate telemetry data without requiring code changes
Language SDKs that implement the specification, APIs, and export of telemetry data
The OpenTelemetry Collector, a proxy that receives, processes, and exports telemetry data
Various other tools, such as the OpenTelemetry Operator for Kubernetes

Compatible with a wide variety of open-source ecosystem integrations, OpenTelemetry is also supported by a vast number of vendors, many of whom provide commercial support for OpenTelemetry and contribute to the project directly.
OpenTelemetry is designed to be extensible. Some examples of how it can be extended include:
Adding a receiver to the OpenTelemetry Collector to support telemetry data from a custom source
Loading custom instrumentation into an SDK
Creating a distribution of an SDK or the Collector tailored to a specific use case
Creating a new exporter for a custom backend that doesn't yet support the OpenTelemetry Protocol (OTLP)
Creating a custom propagator for a nonstandard context propagation format

Although most users will not need to extend OpenTelemetry, the project is designed to make this possible at nearly every level.
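As one example of these extension points, the sketch below outlines a custom span exporter for a hypothetical backend that does not yet accept OTLP. The endpoint and payload format are invented for illustration; only the SpanExporter interface comes from the OpenTelemetry Python SDK.

```python
import json
import urllib.request
from typing import Sequence

from opentelemetry.sdk.trace import ReadableSpan
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult


class MyBackendExporter(SpanExporter):
    """Translates finished spans into a hypothetical backend's JSON format."""

    def __init__(self, endpoint: str = "http://localhost:9999/spans"):  # hypothetical endpoint
        self._endpoint = endpoint

    def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
        # Translate spans into the backend's own (invented) format before sending.
        payload = [
            {"name": span.name, "trace_id": format(span.context.trace_id, "032x")}
            for span in spans
        ]
        request = urllib.request.Request(
            self._endpoint,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        try:
            urllib.request.urlopen(request)
            return SpanExportResult.SUCCESS
        except OSError:
            return SpanExportResult.FAILURE

    def shutdown(self) -> None:
        pass  # nothing to clean up in this sketch
```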
With the rise of cloud computing, microservices architectures, and increasingly complex business requirements, the need for observability has never been greater.
To make a system observable, it must be instrumented: that is, its code must emit traces, metrics, and logs. The emitted telemetry data must then be sent to an observability backend.
OpenTelemetry does two important things:
It allows you to own the data you generate rather than being locked into a proprietary data format or tool.
It allows you to learn a single set of APIs and conventions.

Together, these two things give teams and organizations the flexibility they need in today's modern computing world.
OpenTelemetry is not an observability backend like Jaeger, Prometheus, or commercial offerings. Instead, it is focused on the generation, collection, management, and export of telemetry data; the storage and visualization of that data is intentionally left to other tools.
The need for end-to-end visibility is more crucial than ever. OpenTelemetry provides a standardized approach to telemetry data collection across diverse applications and programming languages. By offering comprehensive insights into system behavior, OTel enables organizations to optimize performance, troubleshoot issues, and deliver a seamless user experience. But when it comes to observability, there is always room for improvement.
ServiceNow Cloud Observability, formerly known as Lightstep, supports OpenTelemetry as the way to collect telemetry data (traces, metrics, and logs) as requests travel through services and other infrastructure. Cloud Observability ingests OpenTelemetry data via the native OpenTelemetry Protocol (OTLP), and OTLP data can be sent to Cloud Observability over either HTTP or gRPC.
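As a rough sketch, configuring the Python SDK to send spans over OTLP/gRPC might look like the following. The ingest endpoint and access-token header name are assumptions here; consult the Cloud Observability documentation for the exact values for your project.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Assumed endpoint and header name; replace with the values from your
# Cloud Observability project settings.
exporter = OTLPSpanExporter(
    endpoint="ingest.lightstep.com:443",
    headers=(("lightstep-access-token", "<YOUR_ACCESS_TOKEN>"),),
)

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
```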
To further enhance observability and governance in cloud-native environments, ServiceNow also launched the Service Graph Connector for OpenTelemetry (SGC). Leveraging OpenTelemetry data and the ServiceNow Cloud Observability backend, this solution revolutionizes how businesses gain visibility and insights into their cloud-native applications and Kubernetes infrastructure. SGC automatically discovers and maps service dependencies, creating an accurate and up-to-date service topology. By enabling organizations to assess the impact of changes, streamline incident management, and foster cross-functional collaboration, this solution improves efficiency, reduces risk, and helps ensure the best possible observability into complex IT environments.
Experience the transformative capabilities of Service Graph Connector for OpenTelemetry; contact ServiceNow today!