The Big Pieces: OpenTelemetry context propagation


by Ted Young

01-27-2021

In this blog series, The Big Pieces, we dive into the major architectural components of OpenTelemetry. In our last post, we discussed client architecture. In this second installment, we’re going to cover context propagation.

Context propagation is the fundamental feature that separates distributed tracing from traditional logging and metrics. Context propagation is at the heart of what makes OpenTelemetry so useful, and it can be a little confusing to new users who have never encountered it. But! If you can understand the basics of context propagation, then everything else about OpenTelemetry’s architecture will make sense.

OpenTelemetry context propagation, defined

Context is within a process, propagation is between processes.

Context propagation can be broken down into two components: context, and… propagation. The short, short version is that for tracing to work, you need a bag of values that is always accessible in your code, from the beginning of the transaction in your web client, all the way through to your backend services.

For example, when you start a trace, you create a traceID. Every trace event in the transaction needs to include this traceID when it is recorded, so that you can find all of the events later and reconstruct the trace. In fact, adding a traceID to all of your logs is the most basic definition of tracing that I can think of.
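
As a rough illustration of that idea, here is a sketch using Python's standard logging module that stamps every log line with a trace ID. It assumes the ID is kept in a contextvars.ContextVar; the names current_trace_id and TraceIDFilter are hypothetical, and generating the ID with secrets.token_hex(16) is only a stand-in for starting a real trace.

```python
import contextvars
import io
import logging
import secrets

# Hypothetical slot holding the trace ID of the transaction in progress.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIDFilter(logging.Filter):
    """Stamp every log record with the current trace ID."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True  # never drop records, only annotate them


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TraceIDFilter())
handler.setFormatter(logging.Formatter("trace=%(trace_id)s %(message)s"))

log = logging.getLogger("checkout")
log.addHandler(handler)
log.setLevel(logging.INFO)
log.propagate = False  # keep output in our stream only

current_trace_id.set(secrets.token_hex(16))  # stand-in for "start a trace"
log.info("cart loaded")
log.info("payment sent")
print(stream.getvalue())
```

Searching the output for one trace ID then returns every line from that transaction, which is the reconstruction step described above, limited to a single process.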

So that traceID has to follow along as your code executes, hopping from service to service, so that the tracer can access it no matter how many services are involved in the transaction. How do you pass it around? As a programmer, you don’t want that tracing data mixed up with your application data, because tracing is a totally separate system from your application; mixing the two would violate the separation of concerns and make a mess of your code. But at the same time, you can’t fully separate your tracing code from your application code, since tracing your application code is the entire point. For this reason, tracing is sometimes referred to as a cross-cutting concern.

How do you keep your tracing data self-contained? Context propagation solves this issue. This handy mechanism has two parts. The context object passes these values around within your service, and the propagators pass the values to the next service whenever there is a network call.

Having access to these values at any point in the transaction is what makes distributed tracing work.

Context

The context object is an immutable dictionary of key-value pairs. The immutability is important, as it gives us thread safety without the need for locks.
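
A minimal sketch of such an immutable context in Python, assuming a hypothetical Context class (not OpenTelemetry's own type). Its set method never modifies the receiver; it returns a new context containing the extra entry, so two threads holding the same context can never see it change underneath them.

```python
from types import MappingProxyType
from typing import Any, Mapping, Optional


class Context:
    """A minimal immutable key-value bag (illustrative, not the OTel class)."""

    __slots__ = ("_values",)

    def __init__(self, values: Optional[Mapping[str, Any]] = None) -> None:
        # Copy the input, then expose it only through a read-only view.
        self._values = MappingProxyType(dict(values or {}))

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def set(self, key: str, value: Any) -> "Context":
        # Never touch self: build a new Context with the extra entry.
        updated = dict(self._values)
        updated[key] = value
        return Context(updated)


root = Context()
child = root.set("spanID", "12345")
print(root.get("spanID"))   # None: the original context is unchanged
print(child.get("spanID"))  # 12345
```

Because no reader can ever observe a half-finished update, sharing a context across threads needs no lock; "changing" it always means swapping in a new object.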

That seems both really simple and really useful. But there’s a catch: passing the context object around as an explicit parameter, like a regular object, is really annoying. You would have to add a context object as a parameter to every function. Besides being a lot of work, that would mean breaking all of your APIs when you add tracing, as all of your method signatures would change. And how would you convince third-party code to do the same? Even in Go, the one language where this is the official way to pass a context, it can be difficult.

It would be much better if you could just ask for the current context whenever you needed it. In most languages, the context object can be managed through a combination of language primitives (such as thread locals and closures) so that it is always available without having to be passed as a parameter. This works differently in each language, but generally speaking, you can just call getCurrentContext() to get the context, and setCurrentContext(context, closure) to run a closure in which the new context is set as the current context. If you don’t know what a closure is, just think of it as a function, like a callback in JavaScript.

An extremely basic context API might look like this:

// Context basics using pseudocode

// example of printing a spanID
function printSpanID(){
  // Access the context using the context manager
  context = otel.getCurrentContext()

  // print the current value of "spanID"
  print(context.get("spanID"))
}

function contextExample(){
  // prints empty string since nothing is set
  printSpanID()

  // Access the context using the context manager
  context = otel.getCurrentContext()

  // Set the spanID. Since contexts are immutable,
  // this creates a new context.
  newContext = context.set("spanID", "12345")

  // Give the context manager newContext
  // and a closure. Within this closure,
  // newContext will be the current context.
  otel.setCurrentContext(newContext, function(){

    // prints "12345"
    printSpanID()

  });

  // outside of the closure, the old context is still active
  printSpanID()   // prints empty string
}
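
The pseudocode above can be turned into runnable Python with a thread-local slot for the current context and a try/finally that restores the previous one. This is a sketch under stated assumptions: get_current_context, set_current_context, and current_span_id are hypothetical names mirroring the pseudocode, contexts are plain dicts that are never mutated in place, and the "closure" is any zero-argument callable.

```python
import threading
from typing import Any, Callable, Mapping, TypeVar

T = TypeVar("T")

# One "current context" slot per thread. Contexts stored here are treated
# as immutable: we replace them, we never edit them in place.
_local = threading.local()
_EMPTY: Mapping[str, Any] = {}


def get_current_context() -> Mapping[str, Any]:
    # A thread that has never set a context sees the empty one.
    return getattr(_local, "current", _EMPTY)


def set_current_context(ctx: Mapping[str, Any], fn: Callable[[], T]) -> T:
    """Run fn (the 'closure') with ctx as the current context, then restore."""
    previous = get_current_context()
    _local.current = ctx
    try:
        return fn()
    finally:
        # Restore even if fn raises, so ctx never leaks outside the closure.
        _local.current = previous


def current_span_id() -> str:
    return get_current_context().get("spanID", "")


new_ctx = {**get_current_context(), "spanID": "12345"}
print(repr(current_span_id()))                              # ''
print(repr(set_current_context(new_ctx, current_span_id)))  # '12345'
print(repr(current_span_id()))                              # '' again
```

The finally clause is what turns the closure into a scope: once it returns or raises, the caller's context is back in place. A thread-local only works when one thread corresponds to one flow of execution, though, which is not true for async code.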

In some languages, like Go and Python, there is a standard context object already available for this purpose. In other languages, there are no agreed-upon standards, and the OpenTelemetry project provides its own context implementation. We prefer that language runtimes manage the context object, and encourage every language to include a built-in mechanism for context management.
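
In Python, that standard mechanism is the contextvars module. The sketch below (the _current variable and handle_request are hypothetical names) runs two concurrent asyncio tasks on a single thread. A thread-local would be shared by both tasks, so one could read the other's spanID; with contextvars, each task runs in its own copy of the context.

```python
import asyncio
import contextvars
from types import MappingProxyType
from typing import Any, List, Mapping

# Standard-library slot for "the current context". Its value is an
# immutable mapping, mirroring the context object in the pseudocode.
_current = contextvars.ContextVar("current_context", default=MappingProxyType({}))


def get_current_context() -> Mapping[str, Any]:
    return _current.get()


async def handle_request(span_id: str) -> str:
    # Build a new immutable context and make it current for this task only.
    new_ctx = MappingProxyType({**get_current_context(), "spanID": span_id})
    token = _current.set(new_ctx)
    try:
        # Yield to the event loop so the two tasks interleave on one thread.
        await asyncio.sleep(0)
        return get_current_context()["spanID"]
    finally:
        _current.reset(token)


async def main() -> List[str]:
    # gather() wraps each coroutine in a Task, and each Task runs in its
    # own copy of the context captured when the Task was created.
    return await asyncio.gather(handle_request("aaa"), handle_request("bbb"))


print(asyncio.run(main()))  # ['aaa', 'bbb']
```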

Propagation

In a monolith, a context object might be enough. But these are distributed transactions, so the context must be sent from service to service to follow the flow of execution. This means that every time a client makes an HTTP request, the context must be added to the request as headers. On the server side, those headers can be pulled off of the received request and turned back into a context object, allowing the trace to continue.

Serializing a context object into headers is called injection. Deserializing headers into a context object is called extraction. The object which knows the details about how to serialize and deserialize the context is called a propagator.

// Propagation basics using pseudocode

// On the client, add the current context to HTTP headers
// every time you send an HTTP request
function setContextHeaders(headers){
  propagator = otel.newTraceContextPropagator()
  context = otel.getCurrentContext()
  newHeaders = propagator.inject(context, headers)
  return newHeaders
}

// On the server, create a context object from the headers
// in the HTTP request every time you receive a request.
function getContextFromHeaders(headers){
  propagator = otel.newTraceContextPropagator()
  context = propagator.extract(headers)
  return context
}
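
For the W3C Trace-Context format, the traceparent header (version 00) has the shape 00-<32 hex trace ID>-<16 hex parent span ID>-<2 hex trace flags>. The following is a simplified sketch of inject and extract for just that header, assuming the context is a plain dict with hypothetical traceID, spanID, and traceFlags keys. It ignores tracestate and future header versions, so it is not a complete propagator.

```python
import re
from typing import Dict, Mapping

# traceparent, version 00: 00-<trace-id>-<parent-id>-<trace-flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(context: Mapping[str, str], headers: Dict[str, str]) -> Dict[str, str]:
    """Serialize the trace fields of `context` into a copy of `headers`."""
    out = dict(headers)
    if "traceID" in context and "spanID" in context:
        flags = context.get("traceFlags", "01")  # "01" means sampled
        out["traceparent"] = f"00-{context['traceID']}-{context['spanID']}-{flags}"
    return out


def extract(headers: Mapping[str, str]) -> Dict[str, str]:
    """Deserialize traceparent into a new context; empty if absent or invalid."""
    # HTTP header names are case-insensitive.
    value = next((v for k, v in headers.items() if k.lower() == "traceparent"), None)
    match = _TRACEPARENT.match(value.strip()) if value else None
    if match is None:
        return {}
    trace_id, span_id, flags = match.groups()
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return {}  # all-zero IDs are invalid
    return {"traceID": trace_id, "spanID": span_id, "traceFlags": flags}


client_ctx = {"traceID": "4bf92f3577b34da6a3ce929d0e0e4736", "spanID": "00f067aa0ba902b7"}
headers = inject(client_ctx, {"accept": "application/json"})
print(headers["traceparent"])  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
print(extract(headers)["traceID"] == client_ctx["traceID"])  # True
```

On the server, the extracted spanID identifies the caller's span, which becomes the parent of the span the server creates next.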

You might be wondering what headers actually get used for this purpose. Glad you asked! Through the W3C, we have been working to standardize these propagation headers for HTTP. These official tracing headers, traceparent and tracestate, are referred to as Trace-Context headers. The Zipkin B3 headers are another popular format.

When you install OpenTelemetry, it defaults to using Trace-Context headers. That is usually fine, but you may need to configure this if you are trying to mix OpenTelemetry with other tracing systems which use different headers. If various services in your system are configured to use different headers, they won’t be able to propagate the trace.
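
To see why mismatched headers break a trace, here is a sketch with two simplified formats: the W3C traceparent header and Zipkin's multi-header B3 form (X-B3-TraceId, X-B3-SpanId, X-B3-Sampled). All function names are hypothetical, validation is omitted, and header names are matched case-sensitively for brevity even though HTTP header names are case-insensitive. The extract_any helper tries each format in turn, loosely similar to configuring more than one propagator, which OpenTelemetry SDKs allow (for example through the OTEL_PROPAGATORS environment variable).

```python
from typing import Callable, Dict, List, Mapping

Headers = Dict[str, str]
Ctx = Dict[str, str]
Extractor = Callable[[Mapping[str, str]], Ctx]


def inject_tracecontext(ctx: Ctx) -> Headers:
    return {"traceparent": f"00-{ctx['traceID']}-{ctx['spanID']}-01"}


def extract_tracecontext(headers: Mapping[str, str]) -> Ctx:
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4:
        return {}
    return {"traceID": parts[1], "spanID": parts[2]}


def inject_b3(ctx: Ctx) -> Headers:
    return {"X-B3-TraceId": ctx["traceID"], "X-B3-SpanId": ctx["spanID"], "X-B3-Sampled": "1"}


def extract_b3(headers: Mapping[str, str]) -> Ctx:
    if "X-B3-TraceId" not in headers or "X-B3-SpanId" not in headers:
        return {}
    return {"traceID": headers["X-B3-TraceId"], "spanID": headers["X-B3-SpanId"]}


def extract_any(extractors: List[Extractor]) -> Extractor:
    """Try each format in order and keep the first one that finds a context."""
    def extract(headers: Mapping[str, str]) -> Ctx:
        for fn in extractors:
            ctx = fn(headers)
            if ctx:
                return ctx
        return {}
    return extract


client_ctx = {"traceID": "4bf92f3577b34da6a3ce929d0e0e4736", "spanID": "00f067aa0ba902b7"}
wire = inject_tracecontext(client_ctx)  # the client speaks Trace-Context

print(extract_b3(wire))  # {}: a B3-only server finds nothing and starts a new trace
print(extract_any([extract_tracecontext, extract_b3])(wire) == client_ctx)  # True
```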

Gotchas

As an application developer, you rarely need to interact with the context propagation APIs directly. But you may still have to debug related issues, which is why understanding the concept is so important.

The first issue is forgetting to extract or inject the context, which will cause the trace to stop propagating. This is usually caused by failing to install instrumentation for your HTTP client or your HTTP server.

The second issue is misconfigured propagators, as mentioned above. If the client is configured to inject W3C Trace-Context headers, but the server is configured to extract B3 headers, the trace will stop propagating.

The third issue is that the context gets separated from the flow of execution. One way this happens is when execution switches threads but the context is not moved to the new thread.
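
A sketch of the thread-switch problem using Python's contextvars and a ThreadPoolExecutor (span_id and background_work are hypothetical names). The submitted function runs in the worker thread's own context, so it does not see a value the caller set; copying the caller's context with contextvars.copy_context() and submitting snapshot.run carries it across.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context slot holding the current span ID.
span_id = contextvars.ContextVar("span_id", default="")


def background_work() -> str:
    # Reads whatever context is current on the thread that runs this.
    return span_id.get()


with ThreadPoolExecutor(max_workers=1) as pool:
    pool.submit(background_work).result()  # make sure the worker thread exists

    span_id.set("12345")  # visible in the caller's context only

    # Broken: the task runs in the worker thread's own context,
    # which was never given the caller's value.
    lost = pool.submit(background_work).result()

    # Fixed: snapshot the caller's context and run the task inside it.
    snapshot = contextvars.copy_context()
    kept = pool.submit(snapshot.run, background_work).result()

print("lost:", repr(lost))
print("kept:", repr(kept))  # kept: '12345'
```

For asyncio code, asyncio.to_thread already propagates the current context to the worker thread, so the manual copy is mainly needed with executors and hand-rolled threads.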

If you see that all of your spans are being created, but not connected together, look for where the break is. If the break happens between services, then it is a propagation issue. If the break happens within a service, it is a context issue.
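
That rule of thumb can be written down as a rough heuristic. The sketch assumes you have collected the spans from what should have been one request, sorted by start time, and that each span records its service, its parent (None for a root), and an OpenTelemetry-style span kind. Every root after the first marks a break: a SERVER root means a request arrived without usable context (propagation), while any other root means a service lost its context internally. The Span fields and the diagnose function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    name: str
    service: str
    kind: str                 # "SERVER", "CLIENT", or "INTERNAL"
    trace_id: str
    parent_id: Optional[str]  # None for a root span


def diagnose(spans: List[Span]) -> List[str]:
    """Explain every unexpected root span in what should have been one trace."""
    roots = [s for s in spans if s.parent_id is None]
    findings = []
    for root in roots[1:]:  # assume the first root is the real start of the request
        if root.kind == "SERVER":
            where = "between services: propagation issue"
        else:
            where = "within a service: context issue"
        findings.append(f"{root.service}/{root.name}: break {where}")
    return findings


spans = [
    Span("GET /checkout", "frontend", "SERVER", "t1", None),
    Span("POST /pay", "frontend", "CLIENT", "t1", "s1"),
    Span("POST /pay", "payments", "SERVER", "t2", None),        # headers were lost
    Span("charge card", "payments", "INTERNAL", "t3", None),    # context was lost
]
for line in diagnose(spans):
    print(line)
```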

That’s all, folks!

And that’s all you need to know about context propagation to make use of OpenTelemetry! If you want to dig in deeper, I recommend playing around with the propagator and context APIs in your language of choice, and having a look inside the instrumentation that OpenTelemetry provides to get a sense of how they are used. You could also read the OTEP I wrote which originally defined context propagation for OpenTelemetry. I hear it is an excellent cure for insomnia.

Thanks for reading, and stay tuned for the next edition of The Big Pieces.
