Lightstep from ServiceNow Logo






The Big Pieces: OpenTelemetry client design and architecture

In this blog series, The Big Pieces, we’re going right to the heart of it: OpenTelemetry’s raison d'être, design philosophy, and resulting architecture.

In this first installment, we’re going to cover the high-level goals of the project as a whole, then dive into the design goals of the OpenTelemetry client components which run within the application process.

OpenTelemetry: The Bigger Picture

Observing a distributed system can be a complex operation. It’s important to understand how OpenTelemetry fits into the bigger picture of observability, and how it works with other tools in your observability toolkit. Let’s start with the root of the name itself.

te·lem·e·try /təˈlemətrē/ Noun The science or process of collecting information about objects that are far away and sending the information somewhere electronically.

OpenTelemetry transmits observations about distributed computer systems. The signals start from within OSS libraries, frameworks, and applications, and they end at the doorstep of a distant analysis tool or storage system.

The OpenTelemetry project has two primary goals: to create a standardized language for describing distributed systems, and to provide a place for the community to share the mountain of code required to instrument and observe this gigantic, ever-growing world of software.

The OpenTelemetry project also has an anti-goal. We do not want to standardize the analysis of data. We believe that data analysis will become a greenfield of possibility once a stable bridge has been built between systems which describe what they do and systems which analyze those descriptions. That stable bridge is OpenTelemetry.

And, yep, that’s also why we named it OpenTelemetry. A single transmission system for many analysis systems.

OpenTelemetry Client Architecture

The vast bulk of the code in an observability system is instrumentation code, which lives within frameworks, libraries, and application code. We believe that code should be self-instrumenting, and not rely upon instrumentation plugins written by others.

Observability is a cross-cutting concern; many, many packages make many, many calls against the instrumentation API. Shared packages, such as popular OSS libraries, must run in many environments and are sensitive to taking on dependencies. "Do no harm" is an important goal when designing and supporting such a widely used API.

To allow for maximum compatibility, the OpenTelemetry clients are designed to have a clean separation of concerns between their interface and their implementation.

The OpenTelemetry API

If we want important, widely-shared libraries to self-instrument, there are several design goals the instrumentation API needs to hit.

  • Minimize the potential for version incompatibility: the newest API package must be backwards compatible with all prior API packages, no exceptions.

  • Minimize the number of dependencies: transitive dependencies can create their own version incompatibilities. Looking at you, gRPC. Since we cannot control these other codebases, we should not rely on them to meet our compatibility requirements.

An independent API allows for multiple implementations. OpenTelemetry currently provides two implementations per language. The API includes a no-op implementation which is used by default, and a production implementation is provided in a separate package called the SDK.
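To make the idea of "one interface, several implementations" concrete, here is a minimal sketch in Python. It is a toy model, not the real OpenTelemetry API: the names `Tracer`, `NoOpTracer`, `RecordingTracer`, `set_tracer`, and `get_tracer` are all made up for illustration. The point is only that library code talks to the interface, a no-op implementation is in place by default, and something SDK-like can be swapped in later.

```python
"""Toy model of an API package that ships a no-op default implementation.

All names here (Tracer, NoOpTracer, RecordingTracer, set_tracer, get_tracer)
are hypothetical and are NOT the real OpenTelemetry API.
"""
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Iterator, List


class Tracer(ABC):
    """The interface that instrumented libraries code against."""

    @abstractmethod
    def start_span(self, name: str):
        """Return a context manager that wraps one unit of work."""


class NoOpTracer(Tracer):
    """Default implementation: accepts every call and records nothing."""

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        yield


class RecordingTracer(Tracer):
    """Stand-in for an SDK: a separate implementation of the same interface."""

    def __init__(self) -> None:
        self.finished: List[str] = []

    @contextmanager
    def start_span(self, name: str) -> Iterator[None]:
        try:
            yield
        finally:
            # Record the span name once the wrapped work has finished.
            self.finished.append(name)


# The "API package" holds the active implementation; no-op until told otherwise.
_tracer: Tracer = NoOpTracer()


def set_tracer(tracer: Tracer) -> None:
    """Swap in a different implementation (done by the application, not libraries)."""
    global _tracer
    _tracer = tracer


def get_tracer() -> Tracer:
    return _tracer


def library_function() -> str:
    """Library code: depends only on the interface, never on an implementation."""
    with get_tracer().start_span("library_function"):
        return "done"
```

In this sketch, `library_function()` behaves identically whether or not anything is installed. Calling `set_tracer(RecordingTracer())` changes what gets recorded without touching the library.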

Other implementations are possible. For example, an implementation which binds to OpenTelemetry C++ via foreign function calls may be extremely performant in some scenarios. In other scenarios, the C++ dependency might be a deal-breaker. The ability to switch to a specialized implementation when the situation permits is a great example of robust design leading to flexibility and optimization.

The OpenTelemetry SDK

The OpenTelemetry SDK implements the API, and provides a framework for processing the data. Like most frameworks, the SDK follows the observer pattern: at the lowest level, lifecycle hooks can be attached to every API call. Higher-level plugin interfaces are also provided:

  • Samplers: control whether data should be collected or dropped to conserve resources.

  • SpanProcessors: once a span is complete, span processors can manipulate the resulting SpanData object.

  • Exporters: serialize spans into a data protocol and send them off to be processed.
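The three plugin types above can be sketched as a tiny pipeline. This is an illustration under stated assumptions, not the SDK's actual classes: `EveryNthSampler`, `TaggingProcessor`, `InMemoryJsonExporter`, and `ToyTracer` are hypothetical names, and the `SpanData` here is a simplified stand-in for the object mentioned in the list. The "exporter" just appends JSON strings to a list instead of talking to a network.

```python
"""Toy SDK pipeline: sampler -> span processor -> exporter.

Hypothetical names; this does not reproduce the real OpenTelemetry SDK classes.
"""
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SpanData:
    """Simplified stand-in for a finished span."""
    name: str
    attributes: Dict[str, str] = field(default_factory=dict)


class EveryNthSampler:
    """Sampler: keep the first span out of every `n`, drop the rest."""

    def __init__(self, n: int) -> None:
        self.n = n
        self._seen = 0

    def should_sample(self, name: str) -> bool:
        self._seen += 1
        return (self._seen - 1) % self.n == 0


class InMemoryJsonExporter:
    """Exporter: serialize each span to JSON and 'send' it to a list."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def export(self, span: SpanData) -> None:
        payload = {"name": span.name, "attributes": span.attributes}
        self.sent.append(json.dumps(payload, sort_keys=True))


class TaggingProcessor:
    """Span processor: runs when a span ends, then hands it to the exporter."""

    def __init__(self, exporter: InMemoryJsonExporter, service_name: str) -> None:
        self.exporter = exporter
        self.service_name = service_name

    def on_end(self, span: SpanData) -> None:
        # Manipulate the finished span before it is exported.
        span.attributes["service.name"] = self.service_name
        self.exporter.export(span)


class ToyTracer:
    """Wires the three plugins together."""

    def __init__(self, sampler: EveryNthSampler, processor: TaggingProcessor) -> None:
        self.sampler = sampler
        self.processor = processor

    def record(self, name: str) -> Optional[SpanData]:
        """Create and immediately finish a span; return None if it was dropped."""
        if not self.sampler.should_sample(name):
            return None
        span = SpanData(name)
        self.processor.on_end(span)
        return span
```

With `EveryNthSampler(2)`, recording four spans exports only the first and third, each tagged with the service name by the processor before serialization.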

You don’t really want to mix and match sampling plugins and exporters. In fact, most configuration is just boilerplate, once you’ve made a few key decisions.

These plugins can be bundled up and preconfigured into a coherent telemetry client, called an SDK distro. This makes it easy to install SDKs which are designed to talk to Jaeger, Lightstep, AWS, GCP, and other systems, without having a single massive distro which pulls in all of the collective dependencies that come with all of these systems.

By default, OpenTelemetry provides an SDK distro designed to flush data to a nearby collector as fast as possible, minimizing buffering and configuration within the application process.

API vs SDK: a separation of concerns

How does this API/SDK separation work in practice? While the SDK performs all of the work, it does so primarily in the background. Application developers initialize and configure the SDK at program startup, before loading the rest of their code, and may need to access the SDK again during program shutdown. Outside of setup and shutdown, there is no need to reference the SDK.

The rest of the program – a mix of OSS libraries and application logic – only accesses the OpenTelemetry API. The SDK provides flexibility, and the API provides stability.
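Here is a rough sketch of that lifecycle in Python. Again, this is a toy under loud assumptions: `ToySdk`, `install`, `trace`, and `handle_request` are invented for this example and are not OpenTelemetry functions. What it tries to show is the shape of the program: the SDK is named only in `main()`, at startup and shutdown, while everything else calls a small API function.

```python
"""Toy application lifecycle: configure at startup, flush at shutdown.

Hypothetical names throughout; not the real OpenTelemetry API or SDK.
"""
from typing import List, Optional


class ToySdk:
    """Stand-in SDK: buffers span names and flushes them on shutdown."""

    def __init__(self) -> None:
        self._buffer: List[str] = []
        self.exported: List[str] = []

    def record(self, name: str) -> None:
        self._buffer.append(name)

    def shutdown(self) -> None:
        # Flush whatever is still buffered before the process exits.
        self.exported.extend(self._buffer)
        self._buffer.clear()


# --- "API" surface: the only thing the rest of the program touches ---
_sdk: Optional[ToySdk] = None


def install(sdk: ToySdk) -> None:
    """Called once by the application at startup."""
    global _sdk
    _sdk = sdk


def trace(name: str) -> None:
    """Record a span name; silently does nothing if no SDK is installed."""
    if _sdk is not None:
        _sdk.record(name)


# --- Library / application logic: knows nothing about ToySdk ---
def handle_request(path: str) -> str:
    trace(f"GET {path}")
    return f"200 {path}"


# --- Program entry point: the only place the SDK is referenced ---
def main() -> ToySdk:
    sdk = ToySdk()
    install(sdk)  # startup: configure before running the rest of the code
    try:
        handle_request("/cart")
        handle_request("/checkout")
    finally:
        sdk.shutdown()  # shutdown: flush remaining data
    return sdk
```

Note that `handle_request` works the same before `install` is ever called. That is the stability half of the bargain, and the freedom to change what `main()` installs is the flexibility half.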

And that’s all you need to know about OpenTelemetry’s client architecture. Check out The Big Pieces: Part 2 on OpenTelemetry Collectors. Interested in more? Check out our latest video on how to instrument our OpenTelemetry Launchers.

Interested in joining our team? See our open positions here.

November 19, 2020

About the author

Ted Young
