How we built Lightstep Metrics: 6 core principles to our user experience
by Kristen Bouchard and Kay Watson
If you’re an engineer, you might find it challenging to answer these three high-level questions in a typical troubleshooting scenario:
- What is the severity of the issue?
- Is my service the problem or is it one of my dependencies?
- What are the biggest changes that most likely caused the issue?
We hear these questions from our customers and from our competitors’ customers so often that answering them as easily as possible has become the most important goal of our user experience.
Our goal is to enable engineers of all levels to resolve incidents with the skill and speed of a highly experienced engineer, by guiding them through hypotheses about “what caused a change” quickly and easily.
Earlier this year, Lightstep launched a set of new features that expand our product into a more robust observability tool focused on helping developers troubleshoot issues with clarity and confidence.
Behind these newly released features are 6 key principles we follow in designing a useful, intuitive, and guided user experience. We formulated these principles by listening to many engineers and watching their workflows.
- Above all else, workflows matter
- Context should be organized to tell a story
- Sharing is a first-class citizen
- Progressively reveal info to help simplify the chaos and guide the user
- Use color and language thoughtfully
- When a user’s actions yield no results, be explicit about what happened and how to resolve it
For years, it’s been the norm to have a lot of siloed features in monitoring tools. In turn, engineers have to gather data across multiple tools and multiple UIs as they form hypotheses. They constantly jump between dashboards, trace views, and logs, building ad hoc queries to prove and disprove hypotheses. It can be extremely frustrating and highly stressful.
To identify the severity of an issue, engineers are constantly looking for patterns and anomalies in the shape of the data:
- Did something change suddenly and then stay at the new level?
- Did something change then stop?
- Was there a gradual slope in a problematic direction?
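To make those three shapes concrete, here is a minimal Python sketch of how a sudden step, a temporary spike, and a gradual slope might be told apart in a list of evenly spaced samples. It is an illustrative heuristic only, not how Lightstep detects changes; the function name `classify_change`, the 20% threshold, and the thirds-based windows are all assumptions made for this example.

```python
from statistics import mean


def classify_change(series, threshold=0.2):
    """Label the shape of a change in evenly spaced samples.

    Hypothetical heuristic for illustration; the threshold and the
    windowing are assumptions, not anything taken from Lightstep.
    """
    n = len(series)
    if n < 6:
        return "not enough data"

    third = n // 3
    start = mean(series[:third])      # level at the beginning
    end = mean(series[n - third:])    # level at the end
    # Sample in the middle window that is farthest from the starting level.
    peak = max(series[third:n - third], key=lambda v: abs(v - start))

    scale = abs(start) or 1.0         # avoid dividing by zero
    end_shift = abs(end - start) / scale
    peak_shift = abs(peak - start) / scale

    if end_shift < threshold:
        # The series returned to where it started.
        return "spike" if peak_shift >= threshold else "no change"

    # The series ends at a new level: was the move sudden or gradual?
    biggest_jump = max(abs(b - a) for a, b in zip(series, series[1:]))
    if biggest_jump >= 0.5 * abs(end - start):
        return "step"
    return "gradual"
```

With this sketch, a jump that holds, such as `[10] * 6 + [20] * 6`, is labeled a step, while a series that climbs by one unit per sample is labeled gradual.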
With Lightstep’s newly released features, we streamlined the loop, reduced the manual process of sifting through data, and reduced the number of views and tools an engineer has to look at.
In Lightstep, your workflow to investigate these issues is triggered from the data itself. On any time series chart, in either a dashboard or an alert view, you can take action right from the data so you can quickly narrow down the problem.
It’s easy to become aware of a problem with your telemetry, but it’s very difficult to identify what caused a change. As an engineer is formulating a hypothesis, they constantly ask themselves questions like:
- If there was a spike in throughput for my service, was error rate also affected?
- What changes happened within my service?
- What changes happened in my dependencies?
It’s this process that helps them isolate and narrow down the issue.
With Lightstep’s Change Intelligence, you can answer these questions at a glance, all within a single interface, so you can quickly narrow down the problem. We sort the data by the biggest changes and highlight the magnitude of those changes so you know where to focus your attention.
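As a rough illustration of what sorting by the biggest changes can look like, the Python sketch below compares a value per attribute (say, error rate per service) between a baseline window and a deviation window, then ranks the attributes by relative change. The function `rank_by_change`, the input shape, and the service names are hypothetical; this is not Change Intelligence's actual algorithm.

```python
def rank_by_change(baseline, deviation):
    """Rank keys by how much their value changed between two windows.

    Both arguments are dicts mapping a hypothetical attribute (for
    example a service name) to a value such as error rate. Returns
    (key, before, after, relative_change) tuples, biggest change first.
    """
    rows = []
    for key in sorted(set(baseline) | set(deviation)):
        before = baseline.get(key, 0.0)
        after = deviation.get(key, 0.0)
        if before == 0:
            # A key that only shows up in the deviation window is
            # treated as unbounded growth (an assumption of this sketch).
            change = float("inf") if after > 0 else 0.0
        else:
            change = (after - before) / before
        rows.append((key, before, after, change))
    # Sort on magnitude so large drops rank alongside large increases.
    return sorted(rows, key=lambda row: abs(row[3]), reverse=True)


# Made-up error rates for three made-up services.
baseline = {"checkout": 0.01, "cart": 0.02, "search": 0.05}
deviation = {"checkout": 0.09, "cart": 0.02, "search": 0.04}
ranked = rank_by_change(baseline, deviation)
```

Here `checkout` ranks first because its error rate grew the most relative to its baseline, and `cart` ranks last because it did not change.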
Engineers tend to have a lot of uncertainty regarding the performance of their dependencies. Some are even unaware of what their service’s dependencies are.
With Change Intelligence, you can see in plain language where the most likely cause of health issues is coming from: upstream, downstream, or within the service itself. This information is not only useful when identifying the root cause, but can also be an educational tool that helps someone new to a team become familiar with callers and callees.
Engineers on teams collaborate frequently. When you’re debugging and forming a hypothesis, you need to share all the context you’ve built up so your teammates can jump in and help. Sharing that context can be frustrating when filters or states are not persisted in URLs, or when you want to point a colleague right to a particular bit of information on a larger page.
In Lightstep’s Change Intelligence in particular, you can share the most likely cause of a problem as an anchor link so your colleagues can see it along with all the other context.
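The general idea of keeping filters in the URL and pointing at one part of a page with an anchor can be sketched with Python's standard `urllib.parse` module. The URL, parameter names, and anchor id below are made up for illustration and do not reflect Lightstep's real link format.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def build_share_link(base_url, filters, time_range, anchor):
    """Encode view state in the query string and a page target in the fragment.

    All parameter names here are hypothetical.
    """
    query = urlencode({**filters, "start": time_range[0], "end": time_range[1]})
    scheme, netloc, path, _, _ = urlsplit(base_url)
    return urlunsplit((scheme, netloc, path, query, anchor))


def read_share_link(url):
    """Recover the view state and the anchor from a shared link."""
    parts = urlsplit(url)
    # parse_qs returns a list per key; this sketch keeps the first value.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return params, parts.fragment


link = build_share_link(
    "https://example.com/investigate",            # placeholder URL
    {"service": "checkout", "region": "us-east"},  # made-up filters
    (1700000000, 1700003600),                      # made-up time range
    "likely-cause-1",                              # made-up anchor id
)
params, anchor = read_share_link(link)
```

Because the filters, time range, and anchor all live in the link itself, a colleague who opens it can be shown the same view at the same spot without rebuilding the query by hand.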
In other tools, you’re often drowning in unhelpful features. Overloading an interface with features can make you blind to capabilities or information that might be helpful in your workflow. When using common monitoring tools, you might find yourself thinking, “I don’t know what to do next” or “I’m overwhelmed,” or you might resort to just clicking on everything to see what happens.
Progressively revealing information and actions reduces the chaos and guides you through an interface as you’re trying to find answers to a problem, which is when you need that guidance the most.
Inevitably, there is more data available than we can reasonably display. Knowing when to surface data in a workflow and when to show it upon request is one of the hardest aspects of designing an observability tool.
The language used in a user experience can either bring more confusion or the clarity you need in your workflow.
For example, when building a query, even the most advanced engineers can second-guess which aggregation method is right for the question they are asking about their data.
In Lightstep’s new workflows, we were very intentional with the language used to help educate new users and to ensure you have clarity each time you use our product. We also wanted to tell the story of what caused a change, how metrics and traces are connected, and how data might have affected any health changes.
Color is another element we use thoughtfully to ensure our interfaces are accessible to everyone. At Lightstep, we are very conscious of accessibility in our product.
For example, we avoid colors that already have intrinsic meaning in observability (red, yellow, and green) in our new data visualization color palettes. For someone who is colorblind, the elements that signal what is good or bad don’t depend on color alone; visual markers help clearly identify them.
When there is a lack of data, users need guidance as well! In Lightstep, we embrace the need to collect and display data from a variety of sources. This can sometimes lead to gaps in the query, which results in an incomplete or empty chart. We make a conscious effort to quickly surface the specifics of the incomplete query and point the viewer in the right direction to fix the issue.
Getting the answers you need from your observability tools can be challenging. When you’re a customer of Lightstep, you help us build better user experiences so you can get those answers easily. Understanding the “why” and finding the patterns behind the feedback is our ultimate goal.
Our solutions are influenced by the behaviors and needs of our end-users. We watch, listen, and go deep on discovering the problems our customers are trying to solve in their daily work. We work to understand the context and meaning behind the feedback we receive, and we are thoughtful in applying that feedback.
We know there is more we can learn and improve in our solutions, and we partner with our customers to get there. Together we will provide engineering teams with the clarity and confidence they need to build and operate the software that powers our daily lives.
Take Lightstep for a spin using our interactive Sandbox.
Interested in joining our team? See our open positions here.