Integrating LightStep [x]PM with Istio

November 15, 2018 | Julian Griggs | Technical

Service Mesh technologies decouple application logic from service infrastructure concerns. This separation enables organizations to converge on standard patterns of observability, making it easier for them to consume metrics, logs, and, most excitingly (OK, we’re biased), distributed traces across all their services! In this post, we’ll introduce a LightStep [x]PM integration we built for Istio and show how it works with an example application deployed with Istio. This integration makes it faster and easier to get started with distributed tracing at scale. If this is your first time hearing about Istio, Envoy, or Service Mesh, check out the Istio website.

Distributed Tracing with Istio

Istio 1.x supports distributed tracing via two mechanisms:

  1. Directly through the Envoy sidecars
  2. Through an adapter to the Mixer component

Istio supports both mechanisms because a core design principle of the Istio project is to ensure it can be used with or without the Mixer component.

While using the Envoy sidecars directly gives you access to all of Envoy’s distributed tracing support, using Istio’s Mixer component can provide more fine-grained control over the collection of trace data.

LightStep [x]PM with Istio

While we plan to eventually build a Mixer Adapter for [x]PM, we opted to first focus on plugging into the Envoy sidecars directly. There were two main reasons for this decision:

  1. We wanted to support Istio deployments that don’t use Mixer
  2. We already had an Envoy integration that we could leverage

With this decision made, integrating [x]PM was mostly a matter of configuration plumbing. If you’re interested, you can check out the GitHub PR and the operator documentation for details.
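Since the integration is mostly configuration, here is a minimal Python sketch of what that configuration might look like when installing Istio from its Helm chart. It flattens a nested dictionary of Helm values into `--set key=value` flags. The value keys (`global.proxy.tracer`, `global.tracer.lightstep.address`, `accessToken`, `secure`, `cacertPath`) are modeled on the LightStep settings in the Istio Helm chart of later 1.x releases and are an assumption for any given version; the satellite address, token, and certificate path are hypothetical placeholders.

```python
from typing import Any, Dict, List


def to_helm_set_args(values: Dict[str, Any], prefix: str = "") -> List[str]:
    """Flatten nested Helm values into a list of `--set key=value` arguments."""
    args: List[str] = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so nested keys become the dotted paths that `--set` expects.
            args.extend(to_helm_set_args(value, path))
        else:
            # Python renders True as "True"; Helm's --set treats lowercase true/false as booleans.
            rendered = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={rendered}"])
    return args


# Assumed value keys; every concrete value below is a hypothetical placeholder.
LIGHTSTEP_VALUES = {
    "global": {
        "proxy": {"tracer": "lightstep"},
        "tracer": {
            "lightstep": {
                "address": "lightstep-satellite.example:443",  # hypothetical
                "accessToken": "YOUR_ACCESS_TOKEN",            # hypothetical
                "secure": True,
                "cacertPath": "/etc/lightstep/cacert.pem",     # hypothetical
            }
        },
    }
}

if __name__ == "__main__":
    print(" ".join(to_helm_set_args(LIGHTSTEP_VALUES)))
```

The resulting flags would be passed to `helm template` along with the Istio chart; the chart’s own `values.yaml` is the authority on which keys your release supports.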

In order to get a better idea of what the integration looks like, let’s walk through a deployment of the Bookinfo application that’s using Istio with our [x]PM integration enabled.

[Image: The Bookinfo application deployed with Istio]

The image above depicts the Bookinfo application when deployed with Istio. When an HTTP request arrives at the Ingress Envoy (a front proxy), it is routed to the Product Page Service, which calls a Details Service and one of three variants of a Reviews Service (v1, v2, v3); Reviews Service v2 and v3 in turn call a Ratings Service. Despite being a simple application, there’s already a fair amount of complexity in the service-to-service communication. Let’s take a look at what this trace looks like when Istio is integrated with [x]PM.
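To make that call graph concrete, here is a minimal Python sketch that models it as an adjacency map and lists the service-to-service calls a single page load makes for a given Reviews version. The labels (`ingress`, `productpage`, `reviews-v2`, and so on) are illustrative names, not Istio identifiers, and the sketch encodes only what the description above states: v1 does not call Ratings, while v2 and v3 do.

```python
from typing import Dict, List, Tuple

# Bookinfo call graph as described in the text; "reviews" is a logical
# service that gets routed to one concrete version per request.
CALLS: Dict[str, List[str]] = {
    "ingress": ["productpage"],
    "productpage": ["details", "reviews"],
    "details": [],
    "reviews-v1": [],
    "reviews-v2": ["ratings"],
    "reviews-v3": ["ratings"],
    "ratings": [],
}


def rpcs_for_request(reviews_version: str) -> List[Tuple[str, str]]:
    """Return (caller, callee) pairs, depth-first, for one page load."""
    rpcs: List[Tuple[str, str]] = []

    def visit(service: str) -> None:
        for callee in CALLS[service]:
            # Route the logical "reviews" service to the chosen variant.
            target = f"reviews-{reviews_version}" if callee == "reviews" else callee
            rpcs.append((service, target))
            visit(target)

    visit("ingress")
    return rpcs


if __name__ == "__main__":
    for version in ("v1", "v2", "v3"):
        print(version, rpcs_for_request(version))
```

A v1 request makes three calls and a v2 or v3 request makes four, so even this small app produces noticeably different request shapes depending on routing.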

[Image: Trace of a Bookinfo page load in [x]PM]
Spans are automatically generated whenever a Bookinfo service is invoked.

In the trace, which was captured when a user loaded the product page, each row is a span corresponding to an invocation of a Bookinfo service. In the Bookinfo demo application, all of the spans are generated by the Envoy proxies. As a result, there are two spans for each RPC between services: one for the client side of the RPC and one for the server side. There’s a lot of interesting information in each of these spans, so let’s take a minute to parse it out.
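The two-spans-per-RPC structure can be sketched in Python as a simplified model (this is not Envoy’s actual span format): for each (caller, callee) pair it emits a client span from the caller’s proxy and a server span from the callee’s proxy, with the server span parented to the client span. The `Span` fields, the sequential integer IDs, and the `ingress` label are hypothetical.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    span_id: int
    parent_id: Optional[int]  # None marks the root of the trace
    service: str
    kind: str  # "client" or "server"


def spans_for_rpcs(rpcs: List[Tuple[str, str]]) -> List[Span]:
    """Emit one client span and one server span per (caller, callee) RPC."""
    ids = count(1)
    spans: List[Span] = []
    # Latest server span recorded for each service, so that a service's
    # outgoing calls nest under the request it is currently handling.
    server_span_of: Dict[str, int] = {}
    for caller, callee in rpcs:
        # Client span: emitted by the caller's proxy.
        client = Span(next(ids), server_span_of.get(caller), caller, "client")
        # Server span: emitted by the callee's proxy, child of the client span.
        server = Span(next(ids), client.span_id, callee, "server")
        spans.extend([client, server])
        server_span_of[callee] = server.span_id
    return spans


if __name__ == "__main__":
    page_load = [
        ("ingress", "productpage"),
        ("productpage", "details"),
        ("productpage", "reviews-v2"),
        ("reviews-v2", "ratings"),
    ]
    for span in spans_for_rpcs(page_load):
        print(span)
```

Four RPCs yield eight spans, which is why a trace generated purely by proxies shows client and server rows in pairs.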

[Image: The Bookinfo trace in [x]PM with one span highlighted]
[x]PM shows parent-child relationships and contextual info for spans involved in loading the Bookinfo application.

In the image above, the highlighted span has three pieces of information directly available in the trace view: Operation, Service, and Duration. Let’s analyze each one separately.

Operation: reviews.default.svc.cluster.local:9080/*
The span “operation” or “name” identifies which action in your system the span represents. In our example, the spans are all created by Envoy, which by default sets the operation name to the host and port of the invoked service. Here, the operation represents the server side of the RPC between the Product Page service and the Reviews service.
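Because the host in that operation name follows the Kubernetes service DNS convention (`<service>.<namespace>.svc.cluster.local`), it can be taken apart mechanically. The Python sketch below assumes exactly the `host:port/path` shape shown above and the default `cluster.local` domain; `Operation` and `parse_operation` are hypothetical helpers, not part of [x]PM or Istio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    service: str
    namespace: str
    port: int
    path: str


def parse_operation(name: str) -> Operation:
    """Split an operation name such as 'reviews.default.svc.cluster.local:9080/*'."""
    host_port, slash, path = name.partition("/")
    host, _, port = host_port.rpartition(":")
    labels = host.split(".")
    # Expect <service>.<namespace>.svc.<cluster domain> (assumed layout).
    if len(labels) < 3 or labels[2] != "svc":
        raise ValueError(f"unexpected host format: {host!r}")
    return Operation(labels[0], labels[1], int(port), slash + path)


if __name__ == "__main__":
    print(parse_operation("reviews.default.svc.cluster.local:9080/*"))
```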

Service: reviews.default: proxy server
A span’s “service” is its place of origin. In this example, we can see that the span came from the Reviews service in the default namespace. We’re also given an additional bit of information that this span was generated by the Envoy proxy acting as an RPC server.

Duration: 1.26s
This is the amount of time the span took to complete from start to finish. Looking at the highlighted span, we can see that it has multiple child spans. The duration of 1.26 seconds encompasses the time spent in the highlighted span itself as well as in all of its children.
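Since a parent’s duration includes the time spent waiting on its children, it can help to separate out the parent’s own “self” time. A minimal Python sketch, assuming spans are given as (start, end) pairs on a shared clock: it subtracts the union of the children’s intervals, clipped to the parent’s window, from the parent’s duration. The numbers in the usage line are made up for illustration and are not taken from the trace above.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) on a shared clock, e.g. milliseconds


def self_time(parent: Interval, children: List[Interval]) -> float:
    """Parent duration minus the time during which at least one child was running."""
    p_start, p_end = parent
    # Clip each child to the parent's window, drop non-overlapping ones, sort by start.
    clipped = sorted(
        (max(s, p_start), min(e, p_end))
        for s, e in children
        if e > p_start and s < p_end
    )
    covered = 0.0
    run: Optional[Interval] = None  # current merged run of overlapping children
    for s, e in clipped:
        if run is None or s > run[1]:
            # Disjoint from the current run: bank it and start a new one.
            if run is not None:
                covered += run[1] - run[0]
            run = (s, e)
        else:
            # Concurrent children are merged so their overlap is counted once.
            run = (run[0], max(run[1], e))
    if run is not None:
        covered += run[1] - run[0]
    return (p_end - p_start) - covered


if __name__ == "__main__":
    # Hypothetical 1000 ms parent with two overlapping children.
    print(self_time((0, 1000), [(50, 600), (400, 900)]))
```

Merging overlapping children matters because concurrent calls would otherwise be double-counted and could push the computed self time below zero.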

In addition to the three high-level pieces of information shown in the trace view, each span comes with Tag information, shown in the sidebar on the right. These Tags provide valuable contextual data about the span in question. For example, we might be surprised to see that a request to the Reviews service is taking longer than a second. From the architecture diagram, we know that there are currently three different versions of the Reviews service deployed. Maybe the issue is with one specific version? Looking at the associated Tags, we see:

Node_id: sidecar~

This additional metadata tells us that this span came from reviews-v3, allowing us to target our debugging.
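Pulling the version out of that tag can be scripted. The Python sketch below assumes the `node_id` value has the tilde-separated layout `<type>~<ip>~<pod name>.<namespace>~<domain>` and that pod names start with a Deployment name such as `reviews-v3` followed by generated suffixes; both are assumptions, and the sample value in the usage line is hypothetical, not the tag shown above.

```python
import re
from typing import Optional


def version_from_node_id(node_id: str) -> Optional[str]:
    """Return a version label such as 'v3' from a sidecar node_id, or None."""
    # Assumed layout: <type>~<ip>~<pod name>.<namespace>~<domain>
    parts = node_id.split("~")
    if len(parts) < 4:
        return None  # truncated or unexpected value
    pod_name = parts[2].split(".", 1)[0]
    # Assumes a Deployment name like 'reviews-v3' followed by generated suffixes.
    match = re.search(r"-(v\d+)(?:-|$)", pod_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical node_id, for illustration only.
    example = "sidecar~10.0.0.7~reviews-v3-5d7b9c8f4d-abcde.default~default.svc.cluster.local"
    print(version_from_node_id(example))
```

Grouping span durations by the returned label would then show whether one version is consistently slower than the others.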

Closing Thoughts

Service Mesh technologies like Istio not only make it easier to manage and operate microservices architectures, but they can also greatly simplify the process of capturing distributed traces across your application. Used alongside your Istio deployment, [x]PM can be an invaluable tool for figuring out what went wrong when the inevitable happens.

If you’re already using [x]PM and Istio (or plan to soon), we’d love to hear what you’d like to see next for this integration as we continue to expand on this initial release.
