OpenTelemetry 101: What Are Metrics?


Austin Parker

by Austin Parker


10-10-2019

OpenTelemetry is an open-source observability framework for generating, capturing, and collecting telemetry data for cloud-native software. In the previous posts in this series, I discussed what observability is as it relates to OpenTelemetry, and took a closer look at tracing with OpenTelemetry. Today, I’d like to cover the second major area of the OpenTelemetry API: metrics.

Note: The information in this post is subject to change as the specification for OpenTelemetry continues to mature.

The OpenTelemetry Metrics API supports capturing measurements about the execution of a computer program at run time. The Metrics API is designed explicitly for processing raw measurements, generally with the intent to produce continuous summaries of those measurements, giving developers visibility into their service’s operational metrics.

Most developers are familiar with metrics in some fashion. It is extremely common, for instance, to monitor metric values such as process memory utilization or error rate, and to create alerts to indicate when a service is violating a predetermined threshold. In addition to these common measurements, metrics events streaming to these instruments can be applied in other unique ways, including being aggregated and recorded by tracing or logging systems. With that in mind, let’s look at the instruments available through the OpenTelemetry Metrics API and discuss how they can be used.

The Metrics API distinguishes between the metric instruments available through their semantic meaning rather than the eventual type of the value they export. The word "semantic" or "semantics" as used here refers to how we give meaning to metric events, as they take place under the API. The term is used extensively in the Metrics API to define and explain these API functions and how we should interpret them.

This is somewhat unconventional, and it stems from the design of OpenTelemetry itself: because the API is separate from the SDK, the SDK ultimately determines what happens with any specific metric event, and it could implement a given instrument in a non-obvious or non-standard way. If you’re familiar with existing metrics APIs (such as the Prometheus API), this explains why there is no method to export a histogram or summary distribution.

Metric instruments and their functions

In OpenTelemetry, the Metrics API provides six metric instruments. These instruments are created and defined through calls to a Meter API, which is the user-facing entry point to the SDK. Each instrument supports a single function, named to help convey the instrument's semantics, and is either synchronous or asynchronous.

Synchronous instruments are called inside a request and have an associated distributed context. There are two synchronous additive instruments, Counter and UpDownCounter. They support an Add() function, signifying that they add to a sum, as opposed to capturing a sum. The synchronous non-additive instrument, ValueRecorder, supports a Record() function, signifying that it captures, rather than augments, metric event data.

Asynchronous instruments, on the other hand, are reported by a callback, and are only called once per collection interval. These instruments do not have an associated context. There are two asynchronous additive instruments, SumObserver and UpDownSumObserver. The asynchronous non-additive instrument is ValueObserver. All three instruments support an Observe() function, as they only capture one value per measurement interval.

Metric events captured through any instrument will consist of:

  • timestamp (implicit)
  • instrument definition (name, kind, description, unit of measure)
  • label set (keys and values)
  • value (signed integer or floating point number)
  • resources associated with the SDK at startup
  • distributed context (for synchronous events only)
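To make the list above concrete, here is a minimal sketch of a metric event as a plain data structure. The class and field names (InstrumentDefinition, MetricEvent, and so on) are hypothetical and chosen only to mirror the bullets; they are not types from any OpenTelemetry SDK.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class InstrumentDefinition:
    # Hypothetical container for the "instrument definition" bullet.
    name: str
    kind: str              # e.g. "Counter" or "ValueRecorder"
    description: str = ""
    unit: str = ""


@dataclass
class MetricEvent:
    # Hypothetical container mirroring the remaining bullets.
    instrument: InstrumentDefinition
    labels: Dict[str, str]
    value: Union[int, float]
    resource: Dict[str, str] = field(default_factory=dict)
    context: Optional[Dict[str, str]] = None   # synchronous events only
    timestamp: float = field(default_factory=time.time)  # implicit


request_bytes = InstrumentDefinition(
    name="request.bytes", kind="Counter", description="Bytes received", unit="bytes"
)
event = MetricEvent(
    instrument=request_bytes,
    labels={"path": "/api/getFoo/{id}"},
    value=512,
    resource={"service": "my_application"},
    context={"trace_id": "abc123"},
)
```

Note that the caller never supplies the timestamp; it is filled in when the event is created, which is what "implicit" means in the list.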

Counter

The Counter instrument supports the Add() function, which accepts positive values only. Use a Counter to count data such as:

  • bytes received
  • requests completed
  • error incidence

Counters are well-suited for rate measurements. Divide the sum by the time interval and you have metrics like request rate and error rate.
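The arithmetic is simple enough to sketch. The Counter class below is a toy stand-in written for this post, not the SDK's implementation, and rate_per_second is a hypothetical helper; in practice the division usually happens in your metrics backend.

```python
class Counter:
    """Toy monotonic counter (an illustration, not the OpenTelemetry SDK)."""

    def __init__(self, name):
        self.name = name
        self.total = 0

    def add(self, value):
        # A Counter is monotonic, so decrements are rejected.
        if value < 0:
            raise ValueError("a Counter cannot decrease")
        self.total += value


def rate_per_second(previous_total, current_total, interval_seconds):
    # Rate = change in the sum divided by the length of the interval.
    return (current_total - previous_total) / interval_seconds


requests_completed = Counter("requests.completed")
before = requests_completed.total
for _ in range(120):                 # 120 requests finish during the interval
    requests_completed.add(1)

request_rate = rate_per_second(before, requests_completed.total, 60)
```

With 120 completed requests over a 60-second interval, request_rate works out to 2.0 requests per second.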

UpDownCounter

UpDownCounter is able to process positive and negative increments with its Add() function. This makes it useful for monitoring quantities that rise and fall during a request, like system resources. Use UpDownCounter to count data such as:

  • number of active requests
  • memory in use
  • queue size

ValueRecorder

True to its name, this synchronous instrument uses its Record() function to record the values of discrete events so they can be summarized as a distribution, capturing the information needed to find rate, mean, and range. Latency is a classic example of the type of measurement captured with ValueRecorder.

ValueRecorder can be either non-additive or additive; the emphasis with this instrument is on capturing data where a distribution of values is of principal interest. This makes it ideal for many observability concerns, including:

  • latency, which is an example of a non-additive implementation
  • request size and queue length, examples of additive implementations

While a particular sum might be important for measurements like request size and queue length, ValueRecorder should be used when the measurement is being considered as part of a distribution.
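Here is a small sketch of what "considered as part of a distribution" can mean in practice. This toy ValueRecorder keeps only count, sum, minimum, and maximum, which is one possible summary; how a real SDK aggregates recorded values is up to that SDK, so treat the class and its summary() method as assumptions.

```python
class ValueRecorder:
    """Toy recorder that keeps a running summary (not the OpenTelemetry SDK)."""

    def __init__(self, name):
        self.name = name
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def record(self, value):
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)

    def summary(self):
        if self.count == 0:
            return None
        return {
            "count": self.count,                # events seen; divide by time for rate
            "mean": self.total / self.count,
            "min": self.minimum,
            "max": self.maximum,
        }


request_latency = ValueRecorder("request.latency")
for seconds in (0.25, 0.5, 0.75):
    request_latency.record(seconds)

latency_summary = request_latency.summary()
```

For the three latencies recorded here, the summary has a count of 3, a mean of 0.5 seconds, and a range from 0.25 to 0.75 seconds.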

SumObserver

Sometimes measurements need to be taken asynchronously. When computing a monotonic sum on every request is unnecessary or cost-prohibitive (for example, because it requires a system call), SumObserver is the instrument to use, with its Observe() function. Some examples of such data are:

  • cache misses
  • system CPU
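To show the callback style, here is a toy SumObserver whose callback is run once each time a (hypothetical) collect() method is called, standing in for the SDK's collection interval. The class is an illustration written for this post; the only real library call is Python's time.process_time(), which returns the CPU time used by the current process.

```python
import time


class SumObserver:
    """Toy asynchronous monotonic instrument (not the OpenTelemetry SDK)."""

    def __init__(self, name, callback):
        self.name = name
        self._callback = callback
        self.last_sum = None

    def _observe(self, value):
        # The observed value is the sum itself, not an increment to it.
        if self.last_sum is not None and value < self.last_sum:
            raise ValueError("SumObserver expects a sum that never decreases")
        self.last_sum = value

    def collect(self):
        # Assumed to be invoked by the SDK once per collection interval.
        self._callback(self._observe)
        return self.last_sum


def observe_cpu_seconds(observe):
    # The potentially expensive call happens only at collection time.
    observe(time.process_time())


cpu_time = SumObserver("process.cpu.seconds", observe_cpu_seconds)
first = cpu_time.collect()
second = cpu_time.collect()
```

The request path never touches this instrument, which is the point: the cost is paid once per interval rather than once per request.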

UpDownSumObserver

UpDownSumObserver is OpenTelemetry’s asynchronous non-monotonic counter, meaning it can accept negative and positive sums with its Observe() function on a periodic basis. Use UpDownSumObserver for measurements that capture the rise and fall of sums, such as:

  • number of active shards
  • process heap size

ValueObserver

For more fine-grained control of when a non-additive measurement is made, you want to use the ValueObserver instrument, especially when the purpose of the measurements is a distribution. ValueObserver instruments also leverage the Observe() function.

Before we continue, here is a quick recap of the six metric instruments and their properties:

Name               Synchronous  Additive  Monotonic  Function
Counter            Yes          Yes       Yes        Add()
UpDownCounter      Yes          Yes       No         Add()
ValueRecorder      Yes          No        No         Record()
SumObserver        No           Yes       Yes        Observe()
UpDownSumObserver  No           Yes       No         Observe()
ValueObserver      No           No        No         Observe()
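If it helps, the recap can also be read as a lookup: decide whether the measurement is synchronous, additive, and monotonic, and the instrument follows. The sketch below encodes the same six rows; the InstrumentKind type and the choose_instrument helper are hypothetical conveniences, not part of the Metrics API.

```python
from typing import NamedTuple


class InstrumentKind(NamedTuple):
    synchronous: bool
    additive: bool
    monotonic: bool
    function: str


# One entry per instrument, copied from the recap of the six instruments.
INSTRUMENTS = {
    "Counter": InstrumentKind(True, True, True, "Add()"),
    "UpDownCounter": InstrumentKind(True, True, False, "Add()"),
    "ValueRecorder": InstrumentKind(True, False, False, "Record()"),
    "SumObserver": InstrumentKind(False, True, True, "Observe()"),
    "UpDownSumObserver": InstrumentKind(False, True, False, "Observe()"),
    "ValueObserver": InstrumentKind(False, False, False, "Observe()"),
}


def choose_instrument(synchronous, additive, monotonic=False):
    """Return the instrument name whose properties match the arguments."""
    wanted = (synchronous, additive, monotonic)
    for name, kind in INSTRUMENTS.items():
        if (kind.synchronous, kind.additive, kind.monotonic) == wanted:
            return name
    raise LookupError("no instrument has that combination of properties")


for_request_count = choose_instrument(synchronous=True, additive=True, monotonic=True)
for_heap_size = choose_instrument(synchronous=False, additive=True)
```

Combinations that do not appear in the recap, such as a non-additive monotonic instrument, raise LookupError.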

Meter provider

A concrete MeterProvider implementation can be obtained by initializing and configuring an OpenTelemetry Metrics SDK. Once configured, the application or library chooses whether it will use a global instance of the MeterProvider interface, or whether it will use dependency injection for greater control over configuring the provider.
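The difference between the two approaches is easier to see in code. This is a sketch with stand-in classes: Meter, MeterProvider, get_global_provider, and CheckoutService are all hypothetical names, and a real SDK's provider will have its own configuration and method names.

```python
class Meter:
    def __init__(self, name):
        self.name = name


class MeterProvider:
    """Toy provider that hands out one Meter per name (not the real SDK)."""

    def __init__(self):
        self._meters = {}

    def get_meter(self, name):
        if name not in self._meters:
            self._meters[name] = Meter(name)
        return self._meters[name]


# Option 1: a process-wide provider that any library can reach.
_global_provider = MeterProvider()


def get_global_provider():
    return _global_provider


class CheckoutService:
    # Option 2: accept a provider as a dependency, falling back to the global.
    def __init__(self, provider=None):
        if provider is None:
            provider = get_global_provider()
        self.meter = provider.get_meter("checkout")


uses_global = CheckoutService()

injected_provider = MeterProvider()      # e.g. a separately configured provider
uses_injected = CheckoutService(injected_provider)
```

The global is convenient for libraries that cannot be handed a provider, while injection makes it straightforward to swap in a differently configured provider, for instance in tests.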

See the Metrics API spec for more information on implementing a MeterProvider and finer instrument detail.

Metrics and distributed context

Synchronous measurements are implicitly associated with the distributed Context at runtime, which may include a Span context and Correlation values. Correlation context is supported in OpenTelemetry as a means for labels to propagate from one process to another in a distributed computation. Sometimes it is useful to aggregate metric data using distributed correlation values as metric labels.

The use of correlation context must be explicitly configured, using the (WIP) Views API to select specific key correlation values that should be applied as labels. The default SDK will not automatically use correlation context labels in the export pipeline, since using correlation labels can be a significant expense.
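Since the Views API is still a work in progress, I can only sketch the idea rather than the real configuration. The hypothetical function below copies correlation values into a metric's labels only for keys that were explicitly selected; everything else in the correlation context is ignored, which keeps label cardinality (and cost) under control.

```python
def labels_with_correlations(labels, correlations, selected_keys):
    """Merge explicitly selected correlation values into a label set.

    A hypothetical helper for illustration; not the Views API.
    """
    merged = dict(labels)
    for key in selected_keys:
        # Labels set directly on the event take precedence over correlations.
        if key in correlations and key not in merged:
            merged[key] = correlations[key]
    return merged


event_labels = {"path": "/checkout"}
correlation_context = {"tenant": "acme", "session_id": "s-42"}

# Only "tenant" was opted in, so "session_id" never becomes a label.
export_labels = labels_with_correlations(
    event_labels, correlation_context, selected_keys=("tenant",)
)
```

Here export_labels ends up containing the path and tenant labels but not the high-cardinality session_id.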

Implementing metric instruments

The exact mechanism for each of these instruments can vary between implementations — OpenTelemetry, by design, is trying to allow each language’s SIG to implement the API in a way that is conventional for the given language. This means the exact details of creating a new metric event may not match the general specification precisely, and you should consult the documentation for your particular OpenTelemetry implementation for more information.

At a high level, however, here’s how it works:

First, you’ll need to create an instrument of the appropriate kind, and give it a descriptive name. Each instrument name needs to be unique inside its process. You can also provide label keys, which are optional key values that are used to optimize the metric export pipeline. You will also need to initialize a LabelSet, which is a set of labels (both keys and values) that correspond to attributes being set on your metric events. What does this look like?

// Initialize instruments statically or in an initializer: a Counter and a ValueRecorder
meter = global.Meter("my_application")
requestBytes = meter.NewIntCounter("request.bytes", WithUnit(unit.Bytes))
requestLatency = meter.NewFloatValueRecorder("request.latency", WithUnit(unit.Second))

// Then, in a request handler, define the labels that apply to the request
labels = {"path": "/api/getFoo/{id}", "host": "host.name"}

Again, the specifics are going to be slightly different for each language, but the gist is the same: early on, create the actual metric instruments by giving them a name. Once this is accomplished, recording metric events is fairly straightforward:

requestBytes.Add(req.bytes, labels)
requestLatency.Record(req.latency, labels)

Another option, useful in performance-critical scenarios, is to utilize a bound instrument. This effectively skips a step, by precomputing the instrument and labels. Bound instruments automatically associate their labels with metric events when they are used:

requestBytesBound = requestBytes.Bind(labels)
requestLatencyBound = requestLatency.Bind(labels)

for req in requestBatch {
    requestBytesBound.Add(req.bytes)
    requestLatencyBound.Record(req.latency)
}

An important thing to remember with bound instruments is that you are responsible for cleaning up: call Unbind() on a bound instrument once it is no longer needed so that its resources can be freed.
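One way to make that cleanup hard to forget is to pair the bind and unbind calls in a try/finally block. The sketch below uses toy Counter and BoundCounter classes written for this post (they are not the SDK's types) to show both ideas: the label set is encoded once at bind time, and the binding is released even if the loop raises.

```python
class Counter:
    """Toy counter keyed by an encoded label set (not the OpenTelemetry SDK)."""

    def __init__(self, name):
        self.name = name
        self.sums = {}       # encoded labels -> running sum
        self.bindings = {}   # encoded labels -> number of live bound instruments

    @staticmethod
    def encode(labels):
        return tuple(sorted(labels.items()))

    def add(self, value, labels):
        # Unbound call: the labels are encoded on every event.
        self._add_encoded(value, self.encode(labels))

    def _add_encoded(self, value, key):
        self.sums[key] = self.sums.get(key, 0) + value

    def bind(self, labels):
        key = self.encode(labels)          # encoded once, reused for each event
        self.bindings[key] = self.bindings.get(key, 0) + 1
        return BoundCounter(self, key)

    def _release(self, key):
        self.bindings[key] -= 1
        if self.bindings[key] == 0:
            del self.bindings[key]


class BoundCounter:
    def __init__(self, counter, key):
        self._counter = counter
        self._key = key

    def add(self, value):
        self._counter._add_encoded(value, self._key)

    def unbind(self):
        if self._counter is not None:      # safe to call more than once
            self._counter._release(self._key)
            self._counter = None


request_bytes = Counter("request.bytes")
labels = {"path": "/api/getFoo/{id}", "host": "host.name"}

bound = request_bytes.bind(labels)
try:
    for size in (100, 250, 50):
        bound.add(size)
finally:
    bound.unbind()                         # always release the binding

total = request_bytes.sums[Counter.encode(labels)]
```

After the loop, total is 400 and the counter holds no live bindings.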

To summarize the Metrics API details:

  • Metric events are recorded through three functions and six instruments: Add() for Counter and UpDownCounter instruments; Record() for ValueRecorder instruments; and Observe() for SumObserver, UpDownSumObserver, and ValueObserver instruments
  • Create a unique instrument for each metric you want to record, and give it a unique name
  • Additional metadata can be applied to your metric events with labels
  • Recording metric events is performed either by calling the appropriate method on the instrument itself, or by calling the method on a bound instrument

Hopefully, this has given you a better understanding of how the OpenTelemetry Metrics API functions. In the next section, we will cover OpenTelemetry’s Exporter and Collector, which are the tools to get data out of OpenTelemetry and into a backend system like Lightstep.
