
Observability-Landscape-as-Code in Practice

If you follow Adriana's writings on Observability, you may recall a post from back in June introducing the concept of Observability-Landscape-as-Code (OLaC).

An Observability Landscape is made up of the following pieces:

  • Application instrumentation

  • Collecting and storing application telemetry

  • An Observability back-end

  • A set of meaningful SLOs

  • Alerts for on-call Engineers

Keeping that in mind, OLaC is simply the codification of your Observability Landscape, thereby ensuring consistency, maintainability, and reproducibility.

That’s all well and good, but how about seeing this thing in action? Well, my friend, you’ve come to the right place, because today, you get to see a tutorial featuring a number of OLaC practices in action!

Today, Ana Margarita Medina (my fellow DevRel and partner in crime) and I will be highlighting the following aspects of OLaC:

  1. Application instrumentation

    How: À la OpenTelemetry, via the OpenTelemetry Demo App, which contains examples of Traces and Metrics instrumentation.

  2. Collecting & storing application telemetry

    How: The OpenTelemetry Collector is deployed via code (Helm chart), alongside the various services that make up the OpenTelemetry Demo App.

  3. Codifying your Observability back-end configuration

    How: Using the Lightstep Terraform Provider to create dashboards in Lightstep.

We wanted to showcase OLaC principles with a real-life example using modern cloud-native tooling, which means using Kubernetes for our cloud infrastructure with Google Cloud’s Kubernetes offering, GKE. Now, since we are good practitioners of OLaC and SRE, we won’t just be setting things up through the clickity click of a UI. No sirreee. Instead, we’ll be #automatingAllTheThings using HashiCorp Terraform. Terraform allows us to do infrastructure-as-code (IaC), and gives us tons of added benefits like better control over our resources and standardization, both key principles in OLaC and IaC.

We will be deploying the OpenTelemetry Demo App to our cluster. The Demo App has been instrumented using OpenTelemetry, and will send Traces and Metrics through the OpenTelemetry Collector to Lightstep.

Are you ready??? Let’s get started!

Tutorial

Pre-Requisites

Before you begin, you will need the following:

Steps

In today’s tutorial, we’ll be running Terraform code to showcase OLaC in action. For convenience, the main components have been broken up into separate Terraform modules that will do the following:

  1. Create a Kubernetes cluster (GKE) in Google Cloud using the Google Terraform Provider. This is defined in the k8s module in our repo.

  2. Deploy the OpenTelemetry Demo App to the cluster using the Helm Terraform Provider. This is defined in the otel_demo_app module in our repo.

  3. Create Metrics dashboards in Lightstep using the Lightstep Terraform Provider. This is defined in the lightstep module in our repo.

The full code listing for this tutorial can be found here.

1- Clone the example repo

Let's start by cloning the example repo:

git clone https://github.com/lightstep/unified-observability-k8s-kubecon.git

2- Initialize Sub-Modules

This project makes use of a few Git submodules, so in order to ensure that things work nicely, you’ll need to pull them in:

cd unified-observability-k8s-kubecon
git submodule init && git submodule update
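
If you want to double-check that the submodules actually got pulled in, here is a minimal sketch. It relies on `git submodule status` prefixing uninitialized submodules with a `-`. The helper name `list_uninitialized_submodules` is our own invention for illustration and is not part of the repo.

```shell
# Hypothetical helper (not part of the example repo): print the path of every
# submodule that has not been initialized yet. "git submodule status" marks
# those entries with a leading "-" in front of the commit SHA.
list_uninitialized_submodules() {
  git submodule status | awk '/^-/ { print $2 }'
}
```

Run `list_uninitialized_submodules` from the repo root. If it prints nothing, you're good to go; if it prints any paths, re-run the two submodule commands above.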

3- Google Cloud Login

Before we can create a GKE cluster you must authenticate your Google Cloud account:

gcloud auth application-default login --no-launch-browser

You will be presented with a link which you need to open up in a browser, to authenticate your Google ID. Once you are authenticated, the browser will display an authorization token for you to paste in the command line, as follows:

gcloud auth prompt

4- Create terraform.tfvars

Now that you’re authenticated, let’s get ready to Terraform! Before you can do that, we need to create a terraform.tfvars file.

Lucky for you, we have a handy-dandy template that you can use to get started:

cd k8s-cluster-with-otel-demo/terraform
cp terraform.tfvars.template terraform.tfvars

Next, populate the following values in the file:

  • <your_gcp_project>: The name of your Google Cloud project. Don’t know your project name? No problem! Just run gcloud config get-value project to find out what it is!

  • <your_gke_cluster_name>: The name you wish to give your GKE cluster. Make sure it follows Kubernetes cluster naming conventions (i.e. no underscores _ or special characters).

  • <your_lightstep_access_token>: Your Lightstep Access Token. This is used to send Traces to your Lightstep Project.

  • <your_lightstep_api_key>: Your Lightstep API key. This is used to create our Metrics dashboards.

  • <your_lightstep_org_name>: Your Lightstep organization name. Not sure what your organization is called? No problem! Log into Lightstep, and click on the person icon on the bottom left of your screen. This will pop up a little menu. The organization name can be found under the “Account Management” heading, like this:

Finding your Lightstep organization name

Notice that my organization is called “LightStep”. Yours will be different. Note also that Organization names are case-sensitive.

Note: terraform.tfvars is in .gitignore and won't be put into version control.
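
Before moving on, it can save you a failed run to confirm that no template placeholders are left in the file. Here is a minimal sketch, assuming unfilled values still look like the `<your_...>` placeholders listed above. Both helper names are hypothetical (they are not in the repo), and the cluster-name rule encoded below (lowercase letters, digits, and hyphens, starting with a letter, not ending with a hyphen, at most 40 characters) is our understanding of GKE's naming rules, so treat it as an assumption.

```shell
# Hypothetical helper: print every line of a tfvars file that still contains
# a "<your_...>" placeholder. Returns 0 if the file is fully filled in,
# 1 if placeholders remain, and 2 if the file does not exist.
find_unfilled_placeholders() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 2
  fi
  if grep -n '<your_[a-z_]*>' "$file"; then
    return 1
  fi
  return 0
}

# Hypothetical helper: check a cluster name against the naming rule we assume
# GKE applies (see the paragraph above). Returns 0 if the name looks valid.
is_valid_cluster_name() {
  name="$1"
  [ "${#name}" -le 40 ] || return 1
  printf '%s\n' "$name" | LC_ALL=C grep -Eq '^[a-z]([a-z0-9-]*[a-z0-9])?$'
}
```

For example, `find_unfilled_placeholders terraform.tfvars` prints the offending lines (with line numbers) if you forgot one, and `is_valid_cluster_name otel_demo` fails because of the underscore.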

5- Run Terraform

This step will initialize Terraform (install providers locally), and then will apply the Terraform plan.

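It will:

  • Create a Kubernetes cluster (GKE) in Google Cloud

  • Deploy the OpenTelemetry Demo App (and the OpenTelemetry Collector) to the cluster

  • Create Metrics dashboards in Lightstep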

Before running the commands below, make sure that you’re already in the k8s-cluster-with-otel-demo/terraform folder.

terraform init
terraform apply -auto-approve

Please note that this step may take up to 30 minutes, depending on GKE’s disposition. Be patient. 😄
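
The -auto-approve flag skips Terraform's confirmation prompt. If you'd rather look before you leap, here is a minimal sketch of a plan-then-apply flow using a saved plan file. The function name `plan_then_apply` and the default plan file name `tfplan` are our own choices for illustration, not something the repo defines.

```shell
# Hypothetical helper: initialize Terraform, save a plan to a file, and only
# apply that exact plan after an explicit "y" from the user.
plan_then_apply() {
  plan_file="${1:-tfplan}"
  terraform init || return 1
  terraform plan -out="$plan_file" || return 1
  printf 'Apply %s? [y/N] ' "$plan_file"
  read -r answer
  case "$answer" in
    y|Y) terraform apply "$plan_file" ;;
    *)   echo "Skipped apply."; return 0 ;;
  esac
}
```

Run `plan_then_apply` from the k8s-cluster-with-otel-demo/terraform folder. Keep in mind that a saved plan file can contain the values from terraform.tfvars, so treat it like the tfvars file and keep it out of version control.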

6- Update your kubeconfig

Now that the cluster is created, you can add it to your kubeconfig file! By default, the file is saved at $HOME/.kube/config.

Before you can update your kubeconfig, you first need to make sure that you have the gke-gcloud-auth-plugin installed:

gcloud components install gke-gcloud-auth-plugin
gke-gcloud-auth-plugin --version
echo "export USE_GKE_GCLOUD_AUTH_PLUGIN=True" >> ~/.bashrc

Now we can add the cluster to kubeconfig:

gcloud container clusters get-credentials $(terraform output -raw kubernetes_cluster_name) --region $(terraform output -raw region)

This gets the kubernetes_cluster_name and region output values from Terraform (that’s the terraform output -raw stuff), and plunks those into your gcloud container clusters get-credentials command.

Or, if you closed the terminal in which you were running Terraform and lost your output values, you can also do this:

gcloud container clusters get-credentials <cluster_name> --region <region>

Where <cluster_name> and <region> correspond to the values you entered in Step 4 in your terraform.tfvars file.
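
The two options above can be folded into one small helper. This is a minimal sketch, not something from the repo: the function name `connect_to_cluster` is hypothetical, and it assumes the Terraform outputs are named kubernetes_cluster_name and region, as in the command above. It tries the Terraform outputs first, falls back to the arguments you pass in, and then prints the active kubeconfig context so you can confirm you're pointed at the new cluster.

```shell
# Hypothetical helper: fetch GKE credentials using the Terraform outputs when
# they are available, otherwise using the cluster name and region passed as
# arguments. Prints the current kubeconfig context on success.
connect_to_cluster() {
  cluster="$(terraform output -raw kubernetes_cluster_name 2>/dev/null)" || cluster=""
  region="$(terraform output -raw region 2>/dev/null)" || region=""
  cluster="${cluster:-$1}"
  region="${region:-$2}"
  if [ -z "$cluster" ] || [ -z "$region" ]; then
    echo "usage: connect_to_cluster <cluster_name> <region>" >&2
    return 1
  fi
  gcloud container clusters get-credentials "$cluster" --region "$region" &&
    kubectl config current-context
}
```

Usage: `connect_to_cluster` from the Terraform folder, or `connect_to_cluster <cluster_name> <region>` from anywhere else.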

7- Check out the OTel Demo app

Now that you’ve got the OpenTelemetry Demo App deployed to your cluster, let’s take a look at it! First, let’s peek into Kubernetes to see what’s up.

If you run kubectl get ns, you’ll notice that there’s now a new namespace called otel-demo:

Results of running "kubectl get ns"

This is where we deployed the OTel Demo app. Let’s look into this namespace to see what we’ve created. First, let’s look at the pods with kubectl get pods -n otel-demo:

Results of running "kubectl get pods"

Notice how we deployed a bunch of different services that make up the OTel Demo App, including adservice, cartservice, recommendationservice, etc.
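
If some of those pods are still starting up, you can have kubectl wait for you instead of re-running the command above. Here is a minimal sketch using `kubectl wait`; the function name `wait_for_demo` and the 300-second default timeout are our own assumptions, and it assumes the Demo App services run as Kubernetes Deployments.

```shell
# Hypothetical helper: block until every Deployment in the namespace reports
# the "Available" condition, or until the timeout expires.
wait_for_demo() {
  namespace="${1:-otel-demo}"
  timeout="${2:-300s}"
  kubectl wait --for=condition=Available deployment --all \
    -n "$namespace" --timeout="$timeout"
}
```

Calling `wait_for_demo` with no arguments waits on the otel-demo namespace; a non-zero exit code means something didn't become available in time.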

We also deployed an OTel Collector. Its configuration YAML is stored in a configmap. We can take a peek by running kubectl describe configmap otel-demo-app-otelcol -n otel-demo:

Results of running "kubectl describe configmap"

You can see that we also reference a variable called ${LS_TOKEN}, which represents the Lightstep Access Token that you set in terraform.tfvars. But where is it? The token is mounted to the OTel Collector container instance as a secret called otel-collector-secret. Let’s take a look at the secret by running kubectl describe secret otel-collector-secret -n otel-demo:

Results of running "kubectl describe secret"

All this magic happens in otel-demo-app-values-ls.yaml. This is a version of values.yaml from the OTel Demo App Helm Chart with updates to the Collector configs so that we can configure the OTel Collector to send Traces to Lightstep.

8- Run the OTel Demo App

Okay...enough Kubernetes talk. Let’s look at the OpenTelemetry Demo App! You can access the Demo App using Kubernetes port-forward:

kubectl port-forward -n otel-demo svc/otel-demo-app-frontend 8080:8080

To access the front-end, go to http://localhost:8080:

OTel Demo App UI running on http://localhost:8080
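
If the page doesn't load, a quick check from a second terminal (the port-forward command keeps the first one busy) can tell you whether the front-end is reachable at all. This is a minimal sketch: the function name `frontend_is_up` is hypothetical, and it assumes the front-end answers with HTTP 200 at its root URL.

```shell
# Hypothetical helper: succeed only if the given URL answers with HTTP 200.
# -s silences progress output, -o /dev/null discards the body, and
# -w '%{http_code}' prints just the status code.
frontend_is_up() {
  url="${1:-http://localhost:8080}"
  code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  [ "$code" = "200" ]
}
```

For example: `frontend_is_up && echo "front-end is up" || echo "check your port-forward"`.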

Go ahead and explore the amazing selection of telescopes and accessories, and buy a few. 😉🔭

9- See Traces in Lightstep

We can now pop over to Lightstep and check out some Traces. Let’s do this by creating a Notebook.

First, click on the little page icon on the left nav bar (highlighted in blue, below). That will bring up this page:

Lightstep create blank notebook page

Next, we build our query for our Traces. Let’s look at the traces from the recommendationservice. We’ll do this by entering recommendationservice in the field next to “All telemetry”. Because this is a service, select the second value from the drop-down, which says, “Use ‘recommendationservice’ as service value”, as per below:

Creating a notebook for the OTel Demo App's recommendationservice

After you select that value, you’ll see a chart like this:

Results of recommendationservice Trace query

The little green dots represent trace exemplars from that Service. Hover over one of them to see for yourself!

Trace exemplars for recommendationservice in Lightstep notebook

If you click on one of these dots, you’ll get taken to the Trace view. Before you click, be sure to save your Notebook first (don’t worry, you’ll get a reminder before you navigate away from the page)!

Here’s the Trace view we see when we click on the get_product_list dot (Operation) above:

OTel Demo App's recommendationservice Trace view

Pretty cool, amirite?

10- See Kubernetes Metrics in Lightstep

But wait...there’s more! Not only can we see Traces in Lightstep, we can also see Metrics!

Remember when you ran terraform apply? Well, not only did it create a Kubernetes cluster and deploy the OTel Demo App (and OTel Collector), it also created some handy-dandy Metrics dashboards for us.

You can check out the newly-created Metrics dashboards by going to the Dashboards icon (the icon with 4 little squares) on the left navigation bar:

Lightstep dashboard list for OTel Demo App

First, let’s check out the Kubernetes / Compute Resources / Cluster dashboard. This dashboard lets you see the state of your cluster.

Kubernetes Compute Resources Cluster Dashboard

We then have various other Metrics called Kubernetes Workload Metrics. These are the dashboards with names that start with “Kubernetes / Compute Resources / Workload”. These dashboards are specific to the services you are running. They take into account the Kubernetes Workloads in your various namespaces, using kube-state-metrics. For a closer look, check out otel_demo_app_k8s_dashboard.tf.

We used Lightstep’s Prometheus Kubernetes OpenTelemetry Collector to get these Metrics into Lightstep. This Helm chart is inspired by kube-prometheus-stack, but with one crucial difference -- no Prometheus! We’re able to use recent enhancements to the OpenTelemetry Operator for Kubernetes, such as support for Service Monitors, in order to scrape Prometheus metrics from pods, system components, and more.

The best part is that you don’t need to install and maintain a Prometheus instance to be able to run it! But wait…there’s more! You also don’t need to use Lightstep as your Observability back-end in order to take advantage of this special Collector! How cool is that??

Note: You can learn more about the Prometheus Kubernetes OpenTelemetry Collector by checking out the docs here.

For example, the Kubernetes / Compute Resources / Workload / otel-demo-app-cartservice dashboard displays metrics for the OTel Demo App’s cartservice. In it we can see how our containers and pods are doing based on Metrics such as those for CPU and Memory.

Workload dashboard for OTel Demo App's cartservice

11- See Application Metrics in Lightstep

Ah...but we’re not done with Metrics just yet! If you go back to the dashboard view and scroll to the very end of the list, you’ll see the OTel Demo App - Application Metrics dashboard.

Lightstep dashboard list for OTel Demo App

Let’s click on it to take a quick little peek!

OTel Demo App Metrics Dashboard

The latest version of the OTel Demo App emits both auto-instrumented and manually-instrumented Metrics. In today’s demo, we wanted to highlight some of the Metrics from the recommendationservice.

First, we have the auto-instrumented Python Metrics, which are captured from the Python runtime:

  • runtime.cpython.cpu_time: Tracks the amount of time being spent in different states of the CPU. This includes user (time running application code) and system (time spent in the operating system). This metric is represented as total elapsed time in seconds.

  • runtime.cpython.memory: Memory utilization

  • runtime.cpython.gc_count: Number of times the garbage collector has been called.

We also have one manually-instrumented Metric:

  • app_recommendations_counter: Cumulative count of the number of recommended products per service call

For more on the recommendationservice Metrics, check out this doc. For more on Metrics captured by other services, check out the OTel Demo App service docs.

12- Teardown

If you’re no longer using this environment, don’t forget to tear down its resources, to avoid running up a huge cloud bill. You’re welcome. 😉

terraform destroy -auto-approve

This step can take up to 30 minutes, so please be patient! Also, you’ll probably notice that on first run, you’ll see the following error:

Terraform destroy error

Don’t panic! If you run terraform destroy -auto-approve again, it will finish nukifying all the things.
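
Since the second run is expected to succeed, you could wrap the retry in a tiny helper. This is a minimal sketch; the function name `destroy_with_retry` and the default of two attempts are our own choices, based only on the behaviour described above.

```shell
# Hypothetical helper: run "terraform destroy -auto-approve" up to N times
# (default 2), stopping as soon as one attempt succeeds.
destroy_with_retry() {
  max_attempts="${1:-2}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "terraform destroy: attempt $attempt of $max_attempts"
    if terraform destroy -auto-approve; then
      return 0
    fi
    attempt=$((attempt + 1))
  done
  echo "terraform destroy still failing after $max_attempts attempts" >&2
  return 1
}
```

Run `destroy_with_retry` from the same Terraform folder. If it still fails after the last attempt, read the error output before trying again, since it may be a different problem than the one shown above.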

Final Thoughts

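Today we got to see some aspects of Observability-Landscape-as-Code (OLaC) in practice! Specifically, we looked at the following elements:

  • Application instrumentation

  • Collecting & storing application telemetry

  • Codifying your Observability back-end configuration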

We showcased this by using Terraform to:

  • Deploy the OpenTelemetry Demo App to Kubernetes. The OTel Demo App showcases the Traces and Metrics instrumentation of different services in different languages using OpenTelemetry.

  • Deploy an OpenTelemetry Collector to Kubernetes (part of the Demo App deployment). The Collector is used to send application Traces and Metrics to Lightstep.

  • Configure Lightstep dashboards. The Lightstep Terraform provider allowed us to codify this.

Codifying our Observability Landscape means that we can tear down and recreate our application, Collector, and dashboards as needed, knowing that we’ll have consistency across the board every single time. Plus, it means that we can version control it, so that it’s not lost in the ether somewhere, or sitting in a secret server under Bob’s desk. Bonus!

Hopefully this gives you a nice little flavour of the power of OLaC, and will inspire you to go out there and start OLaC-ing too! (I just made up a new verb. You’re welcome.)

Whew! That was a lot to think about and take in! Give yourself a pat on the back, because we’ve covered a LOT! Now, please enjoy this picture of Adriana’s rat, Bunny, enjoying an almond!

Bunny the rat enjoying an almond treat

Peace, love, and code. 🦄 🌈 💫


The OpenTelemetry Demo App is always looking for feedback and contributors. Please consider joining the OTel Community to help make OpenTelemetry AWESOME!


Got questions about Observability-Landscape-as-Code? Talk to us! Feel free to connect with us through e-mail.

Hope to hear from y’all!

October 25, 2022
12 min read
OpenTelemetry

About the authors

Adriana Villela

Ana Margarita Medina