The Big Pieces: OpenTelemetry client design and architecture
by Ted Young
In this blog series, The Big Pieces, we’re going right to the heart of it: OpenTelemetry’s raison d'être, design philosophy, and resulting architecture.
In this first installment, we’re going to cover the high-level goals of the project as a whole, then dive into the design goals of the OpenTelemetry client components which run within the application process.
Observing a distributed system can be a complex operation. It’s important to understand how OpenTelemetry fits into the bigger picture of observability, and how it works with other tools in your observability toolkit. Let’s start with the root of the name itself.
te·lem·e·try /təˈlemətrē/ Noun The science or process of collecting information about objects that are far away and sending the information somewhere electronically.
OpenTelemetry transmits observations about distributed computer systems. The signals start from within OSS libraries, frameworks, and applications, and they end at the doorstep of a distant analysis tool or storage system.
The OpenTelemetry project has two primary goals: to create a standardized language for describing distributed systems, and to provide a place for the community to share the mountain of code required to instrument and observe this gigantic, ever-growing world of software.
The OpenTelemetry project also has an anti-goal. We do not want to standardize the analysis of data. We believe that data analysis will become a greenfield of possibility once a stable bridge has been built between systems which describe what they do and systems which analyse those descriptions. That stable bridge is OpenTelemetry.
And, yep, that’s also why we named it OpenTelemetry. A single transmission system for many analysis systems.
The vast bulk of the code in an observability system is instrumentation code, which lives within frameworks, libraries, and application code. We believe that code should be self-instrumenting, and not rely upon instrumentation plugins written by others.
Observability is a cross-cutting concern; many, many packages make many, many API calls against the instrumentation API. Shared packages, such as popular OSS libraries, must run in many environments and are sensitive to taking on dependencies. “Do no harm” is an important goal when designing and supporting such a widely used API.
To allow for maximum compatibility, the OpenTelemetry clients are designed to have a clean separation of concerns between their interface and their implementation.
If we want important, widely-shared libraries to self-instrument, there are several design goals the instrumentation API needs to hit.
- Minimize the potential for version incompatibility: the newest API package must be backwards compatible with all prior API packages, no exceptions.
- Minimize the number of dependencies: transitive dependencies can create their own version incompatibilities. Looking at you, gRPC. Since we cannot control these other codebases, we should not rely on them to meet our compatibility requirements.
An independent API allows for multiple implementations. OpenTelemetry currently provides two implementations per language. The API includes a no-op implementation which is used by default, and a production implementation is provided in a separate package called the SDK.
Other implementations are possible. For example, an implementation which binds to OpenTelemetry C++ via foreign function calls may be extremely performant in some scenarios. In other scenarios, the C++ dependency might be a deal-breaker. The ability to switch to a specialized implementation when the situation permits is a great example of robust design leading to flexibility and optimization.
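The arrangement described above can be sketched in a few lines: an API whose default implementation is a no-op, plus a registration hook that lets a different implementation be swapped in. Every name here (`Tracer`, `NoOpTracer`, `RecordingTracer`, `set_tracer`, `get_tracer`) is hypothetical and chosen for illustration; this is a minimal model of the pattern, not the actual OpenTelemetry API.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class Tracer(ABC):
    """Hypothetical interface that instrumented code programs against."""

    @abstractmethod
    def start_span(self, name):
        """Return a context manager that wraps one unit of work."""


class NoOpTracer(Tracer):
    """Default implementation: accepts every call and records nothing."""

    @contextmanager
    def start_span(self, name):
        yield


class RecordingTracer(Tracer):
    """Stand-in for an SDK: keeps the names of finished spans in memory."""

    def __init__(self):
        self.finished = []

    @contextmanager
    def start_span(self, name):
        try:
            yield
        finally:
            self.finished.append(name)


# The API is usable out of the box, with no other implementation installed.
_tracer = NoOpTracer()


def set_tracer(tracer):
    """Called by the application to install an implementation."""
    global _tracer
    _tracer = tracer


def get_tracer():
    """Called by libraries; they never learn which implementation is active."""
    return _tracer


def library_function():
    """Library code depends on the API only (assumed example function)."""
    with get_tracer().start_span("library_function"):
        return 42
```

With nothing installed, `library_function()` simply returns its result. After `set_tracer(RecordingTracer())`, the same unmodified library code starts producing data, which is the property that makes it safe for shared libraries to instrument themselves.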
The OpenTelemetry SDK implements the API, and provides a framework for processing the data. Like most frameworks, the SDK follows the observer pattern: at the lowest level, lifecycle hooks can be attached to every API call. Higher-level plugin interfaces are also provided:
- Samplers: control whether data should be collected or dropped to conserve resources.
- SpanProcessors: once a span is complete, span processors can manipulate the resulting SpanData object.
- Exporters: serialize spans into a data protocol and send them off to be processed.
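To make the division of labor between these three plugin types concrete, here is a small sketch of how they might chain together. It is a simplified model under stated assumptions, not the SDK's real interfaces: the class names, the method names (`should_sample`, `on_end`, `export`), and the percentage-based sampling rule are all invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SpanData:
    """Hypothetical record of a finished span."""
    name: str
    trace_id: int
    attributes: dict = field(default_factory=dict)


class PercentSampler:
    """Sampler: keeps a deterministic share of traces based on the trace id."""

    def __init__(self, keep_percent):
        self.keep_percent = keep_percent

    def should_sample(self, trace_id):
        return trace_id % 100 < self.keep_percent


class AttributeProcessor:
    """Span processor: adds a fixed attribute to every finished span."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def on_end(self, span):
        span.attributes[self.key] = self.value
        return span


class ListExporter:
    """Exporter: serializes spans to JSON and 'sends' them to a list."""

    def __init__(self):
        self.sent = []

    def export(self, span):
        payload = {"name": span.name, "attributes": span.attributes}
        self.sent.append(json.dumps(payload, sort_keys=True))


class Pipeline:
    """Wires one sampler, any number of processors, and one exporter."""

    def __init__(self, sampler, processors, exporter):
        self.sampler = sampler
        self.processors = processors
        self.exporter = exporter

    def record_span(self, name, trace_id):
        # Sampling is decided first, so dropped spans cost almost nothing.
        if not self.sampler.should_sample(trace_id):
            return False
        span = SpanData(name, trace_id)
        # Once the span is complete, each processor may modify it in turn.
        for processor in self.processors:
            span = processor.on_end(span)
        self.exporter.export(span)
        return True
```

For example, a `Pipeline(PercentSampler(50), [AttributeProcessor("env", "test")], ListExporter())` keeps a span with `trace_id=10` and drops one with `trace_id=75`; the kept span reaches the exporter with the `env` attribute already attached.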
You don’t really want to mix and match sampling plugins and exporters. In fact, most configuration is just boilerplate, once you’ve made a few key decisions.
These plugins can be bundled up and preconfigured into a coherent telemetry client, called an SDK distro. This makes it easy to install SDKs which are designed to talk to Jaeger, Lightstep, AWS, GCP, and other systems, without having a single massive distro which pulls in all of the collective dependencies that come with all of these systems.
By default, OpenTelemetry provides an SDK distro designed to flush data to a nearby collector as fast as possible, minimizing buffering and configuration within the application process.
How does this API/SDK separation work in practice? While the SDK performs all of the work, it does so primarily in the background. Application developers initialize and configure the SDK at program startup, before loading the rest of their code, and may need to access the SDK again during program shutdown. Outside of setup and shutdown, there is no need to reference the SDK.
The rest of the program – a mix of OSS libraries and application logic – only accesses the OpenTelemetry API. The SDK provides flexibility, and the API provides stability.
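The lifecycle described above, in which the SDK is touched only at startup and shutdown, might look like the following sketch. It assumes a hypothetical SDK that buffers data in batches, which is one reason an application would need to reach it again at shutdown: anything still buffered must be flushed. `BufferingSdk`, `install`, `record_span`, and `handle_request` are illustrative names, not real OpenTelemetry calls.

```python
class BufferingSdk:
    """Hypothetical SDK: buffers span names and delivers them in batches."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self._buffer = []
        self.delivered = []

    def record(self, name):
        self._buffer.append(name)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        self.delivered.extend(self._buffer)
        self._buffer.clear()

    def shutdown(self):
        # Deliver whatever is still buffered before the process exits.
        self.flush()


# --- API side: the only surface that libraries and app logic touch ---
_recorder = None


def install(sdk):
    """Register an implementation behind the API (hypothetical hook)."""
    global _recorder
    _recorder = sdk


def record_span(name):
    """No-op unless an implementation has been installed."""
    if _recorder is not None:
        _recorder.record(name)


# --- Library code: references the API, never the SDK ---
def handle_request(n):
    record_span(f"request-{n}")
    return n * 2


# --- Application: the SDK appears only at startup and shutdown ---
def main():
    sdk = BufferingSdk(batch_size=2)
    install(sdk)  # startup, before the rest of the program runs
    try:
        results = [handle_request(n) for n in range(3)]
    finally:
        sdk.shutdown()  # shutdown, so the final partial batch is not lost
    return results, sdk.delivered
```

In this sketch the third request would sit in the buffer indefinitely without the `shutdown()` call, since the batch size of two is never reached a second time.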
And that’s all you need to know about OpenTelemetry’s client architecture.