It is no longer possible to instrument the world.
Over the past 20 years, there has been an explosion of software development. With that growing interest has come an ever-diversifying, fractal landscape of widely shared software libraries. Applying these shared libraries makes up most of modern programming.
This has been a wonderful experience, but there’s a catch. Observing modern applications now means obtaining data from a constantly expanding list of programming languages, application frameworks, message queues, platforms, and runtimes. All interconnected.
To observe our systems properly, these shared libraries must produce observations. Either this happens innately, as a feature provided by the library, or it must be added later as ad hoc instrumentation.
Today, even though they are heavily relied upon, almost no shared library includes any form of instrumentation. Certainly not instrumentation that would be coherent with the instrumentation any other library might provide. So ad hoc is what we have been stuck with.
When we don’t have an option, we muddle. Traditionally, this has meant that if you wanted to observe the world of software, you inevitably had to roll up your sleeves and re-instrument that world. That used to work. Today, the cost is untenable. Monitoring systems can no longer find success by focusing on a small handful of application frameworks. The world has moved past that.
In response, we realized that we were over the muddle, and it was time to get organized.
Develop a common language for describing the operations of distributed systems.
Pool our resources as a community, and maintain a shared repository of clients and instrumentation packages.
Provide a safe mechanism for widely shared software libraries to ship with native instrumentation, alleviating the need for third-party instrumentation.
That realization came in 2016, four years ago. Two projects were started in response. Two years later, they merged to form a single standard.
Knowing that it would set the entire effort back a year, we chose to merge OpenTracing and OpenCensus. From a distance, it would be easy to see the creation of OpenTelemetry and ask “another standard? Where does it end?” The xkcd comicxkcd comic is usually flashed at this point; now there are 15 competing standards. So why do it?
To answer that, you have to join the community. Because up close, the view could not be more different. Before there were two competing projects, now there is one single standard. This was a real change, and it had a substantial impact on both perception and participation. Participation spiked — total contributions quickly surpassed both preceding projects. We didn’t go from fourteen to fifteen, we went from two to one.
Ultimately, standards are not an act of coding, they are an act of organizing. For the past four years, I have focused my organizing on building the trust needed to make this standard happen. A spec on its own is not enough.
For a real standard to emerge, everyone must either be directly involved or accepting of the results. Otherwise, by definition, it would not be a standard. When people ask me “Why did these two projects merge?” I like to answer, “Because that was the point.”
“Sandbox projects have jumped in usage by 238%. The top three projects being evaluated by respondents are OpenTelemetry (20%), Service Mesh Interface (14%) and OpenMetrics (14%).”– CNCF, 2020 Survey2020 Survey
Today marks two years since we solidified into a single effort, and the OpenTelemetryOpenTelemetry project is preparing for its first stable release. Industry participation is nearly universal. All major infrastructure providers, plus every major vendor, are now core contributorscore contributors. Huge applications, such as Shopify, are both building and using OpenTelemetry in production. OpenTelemetry is the largest and most active project within the CNCFCNCF, after Kubernetes.
As we become stable, monitoring and observability systems are looking at retiring their current instrumentation and transitioning to OpenTelemetry as their recommended client architectureclient architecture.
This is incredibly noteworthy. Instrumentation is a critical part of an observabilityobservability system – ultimately, a data product is only as good as its data. To switch away from a solution where you have complete control to one where you must share the decision making with others is a real act of trust. And yet, several major vendors have already announced they are doing precisely that.
A prediction: every future observability system will leverage OpenTelemetry instrumentation as its starting point. Why recreate all of that effort, when you could be focused on shipping features?
OpenTelemetry is here to stay.
Interested in joining our team? See our open positions herehere.
December 10, 2020
4 min read
About the author
Ted YoungRead moreRead more
Explore more articles
From Day 0 to Day 2: Reducing the anxiety of scaling up cloud-native deploymentsJason English | Mar 7, 2023
The global cloud-native development community is facing a reckoning. There are too many tools, too much telemetry data, and not enough skilled people to make sense of it all. See how you can.Learn moreLearn more
OpenTelemetry Collector in Kubernetes: Get started with autoscalingMoh Osman | Jan 6, 2023
Learn how to leverage a Horizontal Pod Autoscaler alongside the OpenTelemetry Collector in Kubernetes. This will enable a cluster to handle varying telemetry workloads as the collector pool aligns to demand.Learn moreLearn more
Observability-Landscape-as-Code in PracticeAdriana Villela, Ana Margarita Medina | Oct 25, 2022
Learn how to put Observability-Landscape-as-Code in this hands-on tutorial. In it, you'll use Terraform to create a Kubernetes cluster, configure and deploy the OTel Demo App to send Traces and Metrics to Lightstep, and create dashboards in Lightstep.Learn moreLearn more
Lightstep sounds like a lovely idea
Monitoring and observability for the world’s most reliable systems