The Big Pieces: OpenTelemetry specification

In this installment of The Big Pieces, we’re going to cover how OpenTelemetry designs itself, and how you can get involved.

OpenTelemetry is a big project. Every language requires an OpenTelemetry client, which can be complex and tricky to build, and which requires features that programmers normally do not worry about. For example, OpenTelemetry clients have strict rules around dependency management and backward compatibility, and they need to follow the flow of execution without being passed around as function parameters. The clients need to feel similar across all languages, produce the same data, and interoperate with each other. There is more than one way to interpret OpenTelemetry data when exporting it to Prometheus or Jaeger, for example; if every implementation varied its approach, we would have a mess. We want to provide clarity through consistency.
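That "follow the flow of execution" requirement is usually met with implicit context propagation. Here is a minimal sketch of the idea using Python's standard-library contextvars, which is what the real Python client builds on; the names `start_span` and `current_span` are illustrative, not the actual OpenTelemetry API:

```python
# A minimal sketch of implicit context propagation: the mechanism that
# lets a tracing client follow the flow of execution without spans being
# passed as function parameters. start_span and current_span are
# illustrative names, not the real OpenTelemetry API.
from contextlib import contextmanager
from contextvars import ContextVar

current_span: ContextVar[str] = ContextVar("current_span", default="<none>")

@contextmanager
def start_span(name: str):
    token = current_span.set(name)  # activate this span in ambient context
    try:
        yield
    finally:
        current_span.reset(token)   # restore the previously active span

def do_work() -> str:
    # No span argument needed: the active span is read from the context.
    return current_span.get()

with start_span("parent"):
    print(do_work())  # prints "parent"
print(do_work())      # prints "<none>" once the span has ended
```

Because the context variable travels with the thread of execution (including across async tasks), instrumentation deep in a call stack can always find the active span without any plumbing through intermediate function signatures.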

OpenTelemetry is also a community project. From the outset, we knew that we did not want to take the benevolent dictator approach and simply trust the instincts of a single person or a small group of insiders. We want our decisions to be well researched and peer-reviewed, and we want anyone to be able to participate in the design process.

All of this led towards a specification model for OpenTelemetry. The project began as a merger between OpenTracing and OpenCensus. We knew that we wanted to keep attributes from both projects, but what would the result look like? The initial version of the specification resulted from these discussions.

How we make changes

Specification-driven OpenTelemetry

Step 1: Read the room

Every week, we have a specification meeting (you can find all of our meeting times on our calendar). The spec meeting is a good place to bring up ideas and concerns, and to talk through issues when the writing process gets stuck. I recommend floating your idea here to gauge interest and identify stakeholders before putting in a lot of work. It’s not strictly necessary, but it helps.

Step 2: Create a tracking issue

Official work starts with an issue raised against the specification. This tracking issue serves as a reference point and helps keep track of progress as the work moves through the project.

For small additions that do not significantly change the meaning of the spec, you can just make a pull request and that’s the end.

Step 3: Propose a solution

To actually propose the change, you need to make an RFC. We call ours OTEPs – OpenTelemetry Enhancement Proposals. This allows the entire proposal, supporting arguments, prior art, etc., to be collected in one place and reviewed as a complete work.

As part of creating an OTEP, I strongly recommend prototyping in several languages. It can be hard to understand how a big change will actually play out. English is a sloppy language, and we tend to think in terms of our favorite programming language. The bigger the change, the bigger the effort, the sooner you want to confirm that the design proposal is universal and won’t run into implementation problems. I have found that prototyping identifies issues and solutions much faster than trying to work everything out in your mind like some kind of Greek philosopher.

Since it can be a fair amount of work, I try to recruit a group from different SIGs to help with prototyping and drafting the OTEP before I submit it. If no one is interested in prototyping with me, I’ve received my first piece of feedback and I go back to working on my pitch or I move on to a more popular problem.

In order to be accepted, OTEPs require four approvals from the specification team. OTEPs are our highest bar; we want all of the issues identified and worked out before we try to add something to the spec (and definitely not after we add something to the spec).

Step 4: Update the specification

Once your OTEP is approved, you’re ready to add it to the spec.

We want the spec to be as clear as possible about what the requirements are, so think hard about how to remove incidental detail and phrasing that sounds overly restrictive. More than once I’ve submitted spec changes that encoded the requirements as a particular solution when more than one solution exists! Be on the lookout for that, and try to simplify as much as possible.

When writing your pull request, it’s important to look over the entire spec, as larger changes often affect more than one section. Use the IETF RFC keywords to clarify which required features MUST be implemented and which optional features SHOULD be implemented, and don’t forget to update the changelog and the glossary!
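To make that concrete, here is a hypothetical spec clause written with the RFC 2119 keywords; the requirement itself is invented for illustration, not taken from the actual specification:

```
The SDK MUST provide a way to configure the span exporter at startup.
The SDK SHOULD allow the exporter to be configured via environment
variables. Implementations MAY offer additional language-specific
configuration mechanisms.
```

Note how MUST marks the non-negotiable behavior, SHOULD marks the strongly recommended default, and MAY leaves room for per-language judgment.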

Step 5: Cut a new version of the spec and alert every working group

Once your PR is merged, it will go out in the next release of the specification and every SIG (special interest group) will implement the change. The specification maintainers can help with this, but as the champion, I like to follow up and ensure that the issues are properly created in every backlog.

And that’s all you need to know about the specification process! If you’d like to see an example of this process in action, please read on.

A walkthrough of the OpenTelemetry specification process

Sometimes an example of a process can help explain things. I recently went through the specification journey, defining the versioning procedures and stability guarantees for the OpenTelemetry clients. Since we do all of our work in public, it’s easy to reconstruct the entire process. Here’s how it went, from start to finish.

(I should mention that this OTEP is on the extreme end of organizing effort; most issues are simpler than this, so please don’t be intimidated.)

We had been discussing versioning over the course of the project, and we knew that we wanted strict backward compatibility guarantees – if we broke the API, that would break every project which depends on OpenTelemetry. That was an acceptable risk in beta, but once mainstream adoption began, it would no longer be an option. Now that we were getting close to finishing the tracing portion of the project, we needed to buckle down and define how we would deliver the stability guarantees we needed while still leaving a pathway open for experimentation and improvements. Someone needed to champion this issue, and since I had already been giving it a lot of thought, I raised my hand.
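The stability guarantee rests on semantic versioning: once v1.0 ships, only a major-version bump may break the API. A tiny helper captures the rule; it is a sketch for explanation, not part of any OpenTelemetry client:

```python
# Illustrative semver rule behind the stability guarantee: an upgrade is
# potentially breaking only when the major version changes. Minor and
# patch releases must be purely additive or corrective.
def is_breaking_upgrade(installed: str, candidate: str) -> bool:
    """True if moving from `installed` to `candidate` may break callers."""
    return installed.split(".")[0] != candidate.split(".")[0]

print(is_breaking_upgrade("1.4.2", "2.0.0"))  # True: major bump, API may change
print(is_breaking_upgrade("1.4.2", "1.9.0"))  # False: minor bump, additive only
```

The hard part the OTEP had to settle was how to honor this rule while signals mature at different rates, without freezing experimentation.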

Big Pieces OpenTelemetry Specification Infographics

I like to add infographics when I can; I find they really help explain things.

I started by creating a spec issue, as usual. The GitHub collaboration tools aren’t always the best at this stage, so I made the first draft of my OTEP in a shared Google Doc so that others could comment and edit. We like to talk things out over Zoom, so I scheduled a set of meetings and brought it up at every spec meeting. We record all of our meetings, so those are public too (here’s an example).

Working with maintainers of different OpenTelemetry implementations, we prototyped our approach to versioning and stability by going through an exercise. We imagined that tracing was now stable, and that we were going to add metrics using the process proposed in our rough draft. Listing out all the actions we would take (versions, releases, movement of packages, and so on), we identified a number of gotchas and differences in how languages manage backward compatibility. This exercise helped a lot.

After that, a refined version of the initial draft was submitted as an OTEP. There was still a lot of work to do, but because of our drafting and prototyping, I knew that the basic structure was working for people. Line-by-line commenting in GitHub would now work for discussing the remaining details.

Once the OTEP was approved, I added all of the relevant content to the specification as a pull request. Along the way, I noticed that the overview and the glossary did not do the greatest job of describing some of the client architecture that was relevant to versioning, so I improved those sections as well.

Some last-minute concerns were raised as well. We had made a rule about how instrumentation could be updated to ensure that future changes to the data OpenTelemetry emitted would not break dashboards. Members of the metrics working group pointed out that some of our assumptions about how metrics data could be updated did not hold for Stackdriver and possibly other systems. Rather than hold the entire process up, we removed that section and marked it TBD. I find this is a common pattern – you reach 90% agreement, but there are one or two thorny issues on which consensus can’t seem to be reached. In these cases, we always try to merge the 90% if we can, and not let perfect be the enemy of good enough.

With those issues resolved, we let the pull request sit open for a week, and then merged it in. Now OpenTelemetry had a plan for versioning, and not a moment too soon – the tracing specification had stabilized, and we were ready to release v1.0!

Thanks for reading. I hope the specification process is clearer now, and I look forward to your changes.

Cheers, Tedsuo

February 22, 2021
8 min read
OpenTelemetry


About the author

Ted Young
