How Queries at Lightstep Work

At Lightstep, we treat metric queries and span queries almost exactly the same. In this post, we’ll explore how the different stages of a query interact with each other to take your raw data from our data storage layer and aggregate it into useful visualizations.

Data Ingest

Before we dive into how data points are queried, we first need to understand how data is stored in Lightstep. We continuously collect customer data points that carry a wide variety of attributes, such as customer_id or hostname. For both metric and span data points, we partition the points based on these attributes, as well as the name of the span or metric itself. Points with the exact same set of attributes are stored together in chronological order by timestamp. Each such ordered sequence of points is known as a timeseries.
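
To make the partitioning concrete, here is a minimal Python sketch. It is not Lightstep's storage code: the tuple layout, the sample data, and the helper names series_key and partition are all hypothetical, chosen only to illustrate that a timeseries is identified by its name plus its exact attribute set.

```python
from collections import defaultdict

# Hypothetical raw points: (name, attributes, timestamp_seconds, value).
points = [
    ("requests", {"customer_id": "a", "hostname": "h1"}, 20, 3.0),
    ("requests", {"customer_id": "a", "hostname": "h1"}, 10, 1.0),
    ("requests", {"customer_id": "b", "hostname": "h2"}, 10, 5.0),
]


def series_key(name, attributes):
    """Identify a timeseries by its name plus its exact attribute set."""
    return (name, frozenset(attributes.items()))


def partition(points):
    """Group points into timeseries, each sorted chronologically."""
    series = defaultdict(list)
    for name, attributes, timestamp, value in points:
        series[series_key(name, attributes)].append((timestamp, value))
    for samples in series.values():
        samples.sort(key=lambda sample: sample[0])
    return dict(series)


timeseries = partition(points)
```

The first two points share a name and attribute set, so they land in one timeseries in timestamp order; the third differs in its attributes and starts a second timeseries.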

Four timeseries being ingested and stored in Lightstep

Query Pipeline

Every query at Lightstep can be broken down into multiple stages. The most common stages found in almost every query are:

  • Fetch

  • Filter

  • Align

  • Group by

These stages are applied to the data in the order defined by the user, which is usually the same order as they are listed above. Each of these stages uses the output of the previous stage and produces a collection of timeseries. Let’s look at each of these stages in more detail.

Query Pipeline: Fetch and Filter

Now that we have data stored as timeseries, we can query that data. The first stage of every query pipeline is called the fetch operation. Fetch uses a predicate to determine which subset of timeseries needs to be loaded from the data storage layer. In Lightstep’s Unified Query Language (UQL), a text-based query language for metrics and spans, this is spelled metric <metric_name> for metrics and spans count or spans latency for span queries, along with a filter operation (optional for metrics). In our visual query builder, the fetch and filter options correspond to the first section.

Fetch and filter in the visual query builder for metrics
Fetch and filter in the visual query builder for spans

💡 TIP: For those readers who are familiar with the SQL language, the fetch and filter stages are quite similar to SQL’s FROM and WHERE clauses.

The fetch operation will grab all the timeseries that match the predicate. These points may not be temporally aligned, so the next stage in every pipeline aligns the data.
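
As a rough illustration of the SQL analogy, the sketch below models fetch as selecting timeseries by name and filter as a predicate over attributes. The in-memory store, the sample metric names, and the functions fetch and filter_series are hypothetical stand-ins, not Lightstep APIs.

```python
# Hypothetical store: each entry is (name, attributes, samples).
store = [
    ("requests", {"service": "web", "host": "A"}, [(10, 1.0), (20, 2.0)]),
    ("requests", {"service": "api", "host": "B"}, [(10, 4.0)]),
    ("cpu.usage", {"service": "web", "host": "A"}, [(10, 0.5)]),
]


def fetch(store, metric_name):
    """Select every timeseries with the given name (like SQL's FROM)."""
    return [series for series in store if series[0] == metric_name]


def filter_series(series_list, predicate):
    """Keep timeseries whose attributes satisfy the predicate (like SQL's WHERE)."""
    return [series for series in series_list if predicate(series[1])]


fetched = fetch(store, "requests")
web_only = filter_series(fetched, lambda attrs: attrs.get("service") == "web")
```

Note that both steps select whole timeseries; neither one changes or combines the points inside them.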

Query Pipeline: Align

Raw timeseries data for queries can often contain thousands of individual data points. If we attempted to plot this raw data on a chart, we’d quickly run into a fundamental issue: there aren’t enough pixels to render each value! To avoid this issue, we pick a few hundred timestamps that we want data for and align the raw data to match. We’ll also combine points across multiple timeseries in the “group by” stage. To do that, the points must first have the same timestamp. And so, the align stage is required for all telemetry queries at Lightstep.

For each output timestamp that has been chosen, we aggregate all the points from the original timeseries that have a timestamp between the output timestamp and [output timestamp - input window]. The default “input window” is just the distance between each output point, which results in each output point aggregating a unique set of input points. By specifying an input window that is larger than the distance between output points, the resultant data can be smoothed; the larger the input window the more smoothing occurs.
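
The effect of the input window can be seen in a small Python sketch. The function name align, the sample data, and the choice of a half-open window (t - input_window, t] are assumptions made for illustration; the post does not say how window boundaries or empty windows are handled.

```python
def align(samples, output_timestamps, input_window, aggregate):
    """For each output timestamp t, aggregate samples in (t - input_window, t]."""
    aligned = []
    for t in output_timestamps:
        window = [value for ts, value in samples if t - input_window < ts <= t]
        if window:  # Assumption: emit no point when the window is empty.
            aligned.append((t, aggregate(window)))
    return aligned


def mean(values):
    return sum(values) / len(values)


# One hypothetical timeseries with irregular timestamps (in seconds).
samples = [(1, 2.0), (4, 4.0), (6, 6.0), (9, 8.0), (12, 4.0), (14, 2.0)]
output_timestamps = [5, 10, 15]

# Default: the window equals the spacing between output points.
default = align(samples, output_timestamps, input_window=5, aggregate=mean)
# Larger window: each output point also sees older input points.
smoothed = align(samples, output_timestamps, input_window=10, aggregate=mean)
```

With the default window, the outputs are 3.0, 7.0, and 3.0. With the window doubled, they become 3.0, 5.0, and 5.0: the peak is flattened because neighboring windows now overlap.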

A large input window smoothing data as points are aligned

This type of aggregation is called temporal aggregation because we combine points across the time dimension. Note that the points being combined during a temporal aggregation all come from the same timeseries and all have different timestamps. The following aggregators are available to use for temporal aggregation:

💡 TIP: Think of temporal aggregation as horizontal aggregation.

Alignment aggregator | UQL spelling | Description
-------------------- | ------------ | -----------
Latest               | latest       | Output the last point found in the window.
Delta                | delta        | Output the change in the metric from the earliest point to the latest point.
Rate                 | rate         | Output the rate at which the metric is changing (per second).
Sum                  | reduce sum   | Output the mathematical sum of all points.
Max                  | reduce max   | Output the maximum point.
Min                  | reduce min   | Output the minimum point.
Mean                 | reduce mean  | Output the average of all points.
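
Reading the descriptions above literally, each alignment aggregator can be sketched as a small Python function over one window of (timestamp, value) points from a single timeseries. These are illustrative assumptions, not Lightstep's implementation. In particular, dividing by the time between the earliest and latest point in rate is a guess, as is returning 0.0 for a single-point window.

```python
def latest(window):
    """Value of the last point in the window."""
    return window[-1][1]


def delta(window):
    """Change in value from the earliest point to the latest point."""
    return window[-1][1] - window[0][1]


def rate(window):
    """Per-second change (assumption: elapsed time is measured between points)."""
    elapsed = window[-1][0] - window[0][0]
    if elapsed == 0:
        return 0.0  # Assumption: a single point has no measurable rate.
    return delta(window) / elapsed


def reduce_sum(window):
    return sum(value for _, value in window)


def reduce_max(window):
    return max(value for _, value in window)


def reduce_min(window):
    return min(value for _, value in window)


def reduce_mean(window):
    return reduce_sum(window) / len(window)


# A hypothetical window, sorted by timestamp (seconds).
window = [(0, 10.0), (20, 16.0), (60, 40.0)]
results = {
    "latest": latest(window),
    "delta": delta(window),
    "rate": rate(window),
    "reduce sum": reduce_sum(window),
    "reduce max": reduce_max(window),
    "reduce min": reduce_min(window),
    "reduce mean": reduce_mean(window),
}
```

Every function consumes points from one timeseries at different timestamps and returns a single value, which is what makes this aggregation temporal.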

Continuing the visual from above, here we show the fetch and align stages.

Align stage

Query Pipeline: Group by

Now that we have aligned the timeseries data, we can combine it in a pipeline stage called “group by.” When grouping, we only consider attributes with the keys defined by the user and ignore all others. The “group by” stage combines all timeseries where the considered attributes’ key/values are equal using a specified aggregator.

For example, if we group by “host” and half of our fetched timeseries have a host value of “A” and the other half have value “B,” we end up with two timeseries where these two sets of values are aggregated.
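
The host example can be sketched in Python as follows. The data layout, the extra zone attribute, and the function group_by are hypothetical; the sketch assumes the input timeseries are already aligned to the same timestamps.

```python
from collections import defaultdict

# Hypothetical aligned timeseries: (attributes, {timestamp: value}).
aligned = [
    ({"host": "A", "zone": "us"}, {10: 1.0, 20: 2.0}),
    ({"host": "A", "zone": "eu"}, {10: 3.0, 20: 4.0}),
    ({"host": "B", "zone": "us"}, {10: 5.0, 20: 6.0}),
    ({"host": "B", "zone": "eu"}, {10: 7.0, 20: 8.0}),
]


def group_by(series_list, keys, aggregate):
    """Combine timeseries whose values for `keys` match, one timestamp at a time."""
    buckets = defaultdict(lambda: defaultdict(list))
    for attributes, points in series_list:
        # Only the listed keys are considered; all other attributes are ignored.
        group = tuple(attributes.get(key) for key in keys)
        for timestamp, value in points.items():
            buckets[group][timestamp].append(value)
    return {
        group: {ts: aggregate(values) for ts, values in sorted(by_ts.items())}
        for group, by_ts in buckets.items()
    }


summed = group_by(aligned, ["host"], sum)
counted = group_by(aligned, ["host"], len)
```

Four input timeseries collapse into two, one per host value, and the zone attribute disappears because it was not listed as a grouping key. Passing a different function (here len) swaps the aggregator without changing the grouping logic.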

This type of aggregation is called spatial aggregation because we combine points across the space of all the timeseries in each group. Note that the points being combined during a spatial aggregation all have the same exact timestamp and all come from a different timeseries. This is why the “align” stage is so crucial; it ensures that the points during spatial aggregation all have the same timestamp. The following aggregators are available to use for spatial aggregation:

💡 TIP: Think of spatial aggregation as vertical aggregation.

Group by aggregator | UQL spelling         | Description
------------------- | -------------------- | -----------
Sum                 | group_by [...], sum   | For scalar values, all points are added together. For distribution values, the distributions are combined. This is the only supported aggregator for distribution metrics and span latency queries.
Mean                | group_by [...], mean  | The mathematical mean of all points.
Max                 | group_by [...], max   | The maximum value of combined points.
Min                 | group_by [...], min   | The minimum value of combined points.
Count               | group_by [...], count | Ignores the values of points and just returns the number of points combined.

Let’s look at the “group by” stage in action. While this only shows two underlying timeseries in each group, in practice we can combine hundreds of timeseries in each group.

Group by stage

Tying it all together

Now that we've seen how each stage of the query pipeline works, let's put it together. Over the course of this post we’ve been building this query.

Example Query Screenshot

Here is the same query represented in UQL:

Query represented in UQL

And here is the full animation of that query as it is processed through each pipeline stage.

All stages combined
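
For readers who prefer code to animation, the four stages can be chained in one compact Python sketch. This is not the query from the screenshot and not Lightstep's engine: the store, the metric name, the pod attribute, and the function run_query are hypothetical, the align step uses the "latest" aggregator, and the group by step uses sum.

```python
from collections import defaultdict

# Hypothetical store: (name, attributes, samples sorted by timestamp).
store = [
    ("requests", {"service": "web", "host": "A", "pod": "1"}, [(2, 1.0), (4, 2.0), (8, 3.0)]),
    ("requests", {"service": "web", "host": "A", "pod": "2"}, [(3, 4.0), (9, 5.0)]),
    ("requests", {"service": "web", "host": "B", "pod": "3"}, [(1, 6.0), (6, 7.0)]),
    ("requests", {"service": "api", "host": "B", "pod": "4"}, [(5, 8.0), (10, 9.0)]),
]


def run_query(store, name, predicate, output_timestamps, input_window, keys):
    """Fetch by name, filter by predicate, align with latest, group by keys with sum."""
    groups = defaultdict(lambda: defaultdict(float))
    for series_name, attributes, samples in store:
        # Fetch and filter: skip timeseries that do not match.
        if series_name != name or not predicate(attributes):
            continue
        group = tuple(attributes.get(key) for key in keys)
        for t in output_timestamps:
            # Align: take the latest value in the window (t - input_window, t].
            window = [value for ts, value in samples if t - input_window < ts <= t]
            if window:
                # Group by: sum aligned values that share a group and timestamp.
                groups[group][t] += window[-1]
    return {group: dict(points) for group, points in groups.items()}


result = run_query(
    store,
    name="requests",
    predicate=lambda attrs: attrs["service"] == "web",
    output_timestamps=[5, 10],
    input_window=5,
    keys=["host"],
)
```

Three of the four stored timeseries survive fetch and filter, each is aligned to the timestamps 5 and 10, and the result holds one summed timeseries per host.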

Hopefully, you have a better understanding of how our query pipeline works for these basic queries. It works almost identically for metrics and spans, which means that once you know how to query one, it’s super easy to query the other! Stay tuned for part 2 where we will cover stages like joins that allow you to build even more powerful queries!

October 24, 2022

About the author

Brian Lamb
