Lightstep from ServiceNow Logo


Developing a culture of observability

In the race to attract and retain customers, businesses must deliver remarkable customer experiences, release reliable products fast, and review costs to achieve consistent growth. That can either be a well-oiled machine or a tangle of disjointed communications and workflows that frustrates customers, employees, and management alike. But, by developing a culture of observability, you can have a framework that harmonizes the experience for everyone.

What is observability and why does it matter?

Observability is the ability to gain insight quickly and efficiently into the health of your tech estate through gathering, correlating, and interpreting metrics, distributed traces, and log data. This data is also called telemetry.

Let’s look at an example of observability in action in the world of healthcare.

If you are having a possible heart attack, you have to act quickly to save precious heart muscle and your life. As first responders arrive on the scene to help you (or you check yourself into the emergency room), they gather metrics (blood pressure, heart rate, blood oxygen saturation) and begin to log the event along with other important information such as health history and recent meals and drinks. But to really gain insight into the health of the heart, they take a trace of the heart with an electrocardiogram, or ECG.

If you’ve ever watched a medical show or movie, or been in a hospital, you’ve probably seen screens like the one below. At the top is an ECG trace along with aggregated metrics of the types of events encountered. This trace and the shape of the waves in green describe the health, timing, and condition of the actual physical heart to the provider.

ECG Screen

Below is a diagram of the human heart showing how the shape of the trace corresponds to the path the signal takes through the heart, and therefore where a problem most likely lies. To a healthcare practitioner, the shape of the trace conveys the specific health of the heart, just as tracing running code helps us understand where problems may exist and how to mitigate them.

Cardiac activity

Public domain


Additional metrics, from blood draws, can also be captured to confirm the diagnosis and guide the care team to remediation and recovery. All of these elements are necessary to make a quick diagnosis and determine the correct measures to save the patient’s life. And it’s all a form of observability.

It’s the same for our code and systems. Just as with life-saving treatments, time matters when it comes to the health of your tech estate. Deep, real-time inspection of running processes helps reduce mean time to resolution (MTTR).

Ultimately, you want your customers and your employees to have a positive, seamless experience. And if an issue does pop up, you want a quick resolution and a painless way to examine the outcomes. Everyone benefits.

How can you develop a culture of observability?

Your observability journey may begin with a few enthusiastic team members evangelizing the practice and enrolling stakeholders to adopt it. This is where proofs of concept, demos, and showcases come in. By helping key technical and organizational leaders understand why strong observability will have a positive impact on the business, you’re more likely to get buy-in and sponsorship.

BTW: The OpenTelemetry Demo repository is a great way for teams using different programming languages to get started.

Often, a Site Reliability Engineering (SRE) team introduces the concept of observability, as it’s central to the SRE practice that originated at Google. Engineering and Product teams may also experiment with OpenTelemetry by adding it to services to help show its power.

That said, fostering a culture of observability can begin anywhere. When an executive asks product or engineering, “How could we have caught that?” it can trigger an observability exercise to determine the services involved and how to resolve the issues. If a customer contacts support with an issue, the product or support teams should lead the exercise. Invite people to the conversation and create modes of accountability through health targets like Service Level Objectives (SLOs). Reducing incidents and improving quality should be a consistent message, and one that is encouraged and incentivized.
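To make a health target concrete, here is a minimal sketch of checking an availability Service Level Objective against observed request counts. The function name `evaluate_slo`, the `SloReport` fields, and the 99.9% target are assumptions chosen for illustration; they do not come from any particular product or from OpenTelemetry.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    availability: float      # fraction of requests that succeeded
    error_budget: float      # number of failed requests the target allows
    budget_remaining: float  # fraction of that budget still unspent


def evaluate_slo(total_requests: int, failed_requests: int, target: float) -> SloReport:
    """Compare observed availability against an SLO target such as 0.999."""
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    availability = (total_requests - failed_requests) / total_requests
    # The error budget is the share of requests allowed to fail under the target.
    error_budget = (1.0 - target) * total_requests
    budget_remaining = 1.0 - failed_requests / error_budget
    return SloReport(availability, error_budget, budget_remaining)


# Hypothetical numbers for one reporting window.
report = evaluate_slo(total_requests=1_000_000, failed_requests=400, target=0.999)
print(f"availability: {report.availability:.4%}")
print(f"error budget remaining: {report.budget_remaining:.0%}")
```

A negative `budget_remaining` would mean the window has already spent more than its error budget, which is the kind of signal a team could use to prioritize reliability work over new features.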

Why should you develop a culture of observability?

It’s important to ask, “Why would we do this?” Then, gather a list of possible reasons and of key team members who can help realize the vision. Here are some answers to consider:

  • Improved mean time to resolution (MTTR)

  • Increased operational efficiency

  • Proactive detection of issues before they impact customers

  • Improved Customer Experience (CX)

  • Ability to redirect resources to innovation

  • Business and/or revenue growth

  • Reduced dev burnout / employee turnover

  • Increased speed to market for new products and services

  • Predictable operating expense (OpEx) spend

You can see that the list addresses three things:

  1. Customer success outcomes

  2. The forces that can lead to team burnout

  3. The high-level business outcomes an enterprise might see

Observability is a journey

It’s important to note that observability is a journey of continuous improvement. To start, identify your critical workflows or services, instrument them, and observe them over a specified time period to determine health targets (such as service level objectives). Then, alert thresholds can be determined. Your goal is not to overwhelm product or support teams with “alert noise,” but to alert at an accurate level backed by strong health targets. Take the time to observe first; you can then set thresholds more precisely from the observed p90 and p99 values.

The diagram below is a good example of this journey. It shows continuous improvement and communicates to everyone that analysis and refinement bring excellence to the organization and help manage change along the way.
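As a rough illustration of the observe-then-set-thresholds step, the sketch below computes p90 and p99 from a batch of latency samples using Python's standard library. The sample data is synthetic, and the `headroom` multiplier in `suggest_alert_threshold` is a hypothetical rule of thumb, not a recommendation from this article.

```python
import random
import statistics


def latency_percentiles(samples_ms):
    """Return the p90 and p99 of a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99, so index 89 is p90.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p90": cuts[89], "p99": cuts[98]}


def suggest_alert_threshold(samples_ms, headroom=1.2):
    """Hypothetical rule: alert when latency exceeds the observed p99 plus headroom."""
    return latency_percentiles(samples_ms)["p99"] * headroom


# Synthetic latencies standing in for telemetry gathered during the observation period.
random.seed(7)
observed = [random.lognormvariate(4.0, 0.5) for _ in range(5000)]

pcts = latency_percentiles(observed)
print(f"p90={pcts['p90']:.1f} ms, p99={pcts['p99']:.1f} ms")
print(f"suggested alert threshold={suggest_alert_threshold(observed):.1f} ms")
```

Alerting relative to a high percentile rather than the mean keeps the threshold tied to what the slowest real requests look like, which helps avoid both missed regressions and alert noise.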

Diagram of the continuous journey cycle in observability

Observability is a culture

Think back to the example of the heart attack. Without some of these metrics, logs, or traces, not only would patients face higher mortality rates, but providers could also face negative outcomes. Institutional funding could be jeopardized by negative KPIs. This may be happening in your organization as well. If so, it’s time to evolve and see the overall health of your organization improve. Observability should not be seen as a sunk cost but as an opportunity cost. Adopting this culture, from leadership to individual contributors, will help and offer ways to create profitability and efficiency through practice.

By adopting open standard technologies and refining your observability strategy, you can help your organization grow.

Learn how you can increase operational efficiency and make your tech estate and your team more resilient. Schedule a demo to see it in action.

January 4, 2023
6 min read
Observability

About the author

Doug Odegaard
