Transform your troubleshooting journey with Lightstep
by Sachi Shah
Our team has been hard at work building, iterating, and transforming the troubleshooting journey with Lightstep Notebooks. We strongly believe Notebooks are an indispensable tool for engineering teams working together to investigate, mitigate, and resolve incidents. Since Notebooks became available to select customers in March of this year, they have helped many SRE and DevOps teams streamline and accelerate their troubleshooting journeys. Coupled with our new Lightstep Incident Response postmortem capabilities, Notebooks enable teams to collaborate on resolving issues in real time and to learn from those issues so they can prevent them in the future.
With Notebooks, ad-hoc analysis of your data is simpler and more streamlined. Analysis in context reduces mean time to resolution and drives proactive performance improvements. Without breaking your investigative flow, you can now quickly understand the scope of a problem by looking upstream and downstream with data powered by your tracing. From a Notebook (or Dashboard chart), you can easily see supporting evidence for a correlation and understand how your SLIs have deviated for each correlation, then share your findings and collaborate across teams. Lightstep empowers teams to see which parts of the system are correlated and how they are related, in context, so they can instantly understand changes in service health and, most importantly, what caused those changes.
So, what else have we been up to when it comes to improving the troubleshooting journey for organizations?
Lightstep is known for its ability to handle seriously large volumes of data. With that much data, you need a way to sift through the firehose quickly. We are officially launching Lightstep Heatmaps. Heatmaps offer intuitive insights through powerful data visualizations so you can progress quickly in your analysis. Lightstep’s newest visualization takes a developer favorite, histograms, and adds a new dimension to the data, essentially creating “histograms over time.” Heatmaps let you visualize the distribution of data for a given query over time, giving you a 3-dimensional view of your data so you can more easily spot “interesting moments,” identify outliers and edge cases to speed up investigation into problems impacting customers, and see how the shape of the distribution changes over time. Lightstep Heatmaps are now available as a chart type in dashboards and Notebooks.
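To make the “histograms over time” idea concrete, here is a minimal sketch in Python. It is not Lightstep’s implementation; the function name build_heatmap, the sample data, and the bucket boundaries are all hypothetical and chosen only for illustration. Each row of the grid is a time window, each column is a latency bucket, and each cell counts the samples that fall into both.

```python
from bisect import bisect_right


def build_heatmap(samples, window_seconds, latency_edges_ms):
    """Bucket (timestamp, latency) samples into a time-by-latency grid of counts.

    Hypothetical helper for illustration only. Rows are consecutive time
    windows starting at the earliest timestamp; columns are latency buckets
    split at the given edges, with one extra column for values at or above
    the last edge.
    """
    if not samples:
        return []
    start = min(ts for ts, _ in samples)
    end = max(ts for ts, _ in samples)
    n_windows = int((end - start) // window_seconds) + 1
    n_buckets = len(latency_edges_ms) + 1
    grid = [[0] * n_buckets for _ in range(n_windows)]
    for ts, latency in samples:
        row = int((ts - start) // window_seconds)
        col = bisect_right(latency_edges_ms, latency)  # index of the latency bucket
        grid[row][col] += 1
    return grid


# Made-up (timestamp in seconds, latency in ms) samples.
samples = [(0, 12), (5, 48), (20, 15), (61, 14), (70, 950), (118, 40), (125, 13)]
grid = build_heatmap(samples, window_seconds=60, latency_edges_ms=[25, 100, 500])
for row in grid:
    print(row)
```

In a rendered heatmap, each cell’s count would map to a color intensity. The single slow request in the second window lands in the last column by itself, which is the kind of outlier an average or a single percentile line can hide.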
Lightstep Notebooks helps organizations spend less time debugging and more time delivering value. Improve incident response with the context of the problem at hand, and begin collaborating on alerts and incidents. Notebooks is a collaborative tool designed to help developers, SREs, and Ops teams work together to resolve issues in real time. The average number of experts involved in triage increases as systems become more complex. Ensure your team is equipped to resolve the issues that matter most.
Notebooks give organizations the ability to tell data-driven stories about incidents, combining text, graphs, and various other sources of data in a single, comprehensive view.
Combined with powerful correlations from Change Intelligence, Lightstep’s analysis engine, Lightstep Notebooks make it possible to run real-time queries while investigating an incident, which supports more effective executive reporting and post-incident analysis. You can access Change Intelligence within Lightstep Notebooks by clicking the “Analyze Deviation” button. Any developer, operator, or SRE can then instantly understand changes in their service’s health and, most importantly, what caused those changes, with suggested routes for continuing the investigation. Observability solutions enable teams to ask questions of their data; Lightstep Notebooks’ deviation analysis helps organizations ask the right questions.
From the moment your pager goes off to the official close of the incident, Lightstep now lets you create postmortems across the entire life of an issue. As every organization works to improve the availability and performance of its services, the reality is that outages and human error are inevitable, even for the largest, most robust organizations. We’ve already covered how Lightstep Notebooks can help with your postmortems, but did you know that you can carry your postmortem view into your incident response solution as well? Lightstep Incident Response helps teams auto-populate postmortems throughout the incident process. Once the service is restored, a postmortem review meeting can walk through what happened and capture lessons learned. The team can create a postmortem document to formally track the root cause analysis of the issue and the actions required to prevent it from recurring.
Monitoring solutions box you into specific investigative flows, often overlooking adjacent problem areas or treating them as separate issues. Lightstep correlates across service boundaries and isn’t limited by telemetry type, so it can identify issues wherever they occur in your stack. We launched our Unified Query Builder back in December to harness the power of Lightstep across all your telemetry: metrics, logs, and traces. With Notebooks, you can search across all your data types. In true observability fashion, there is no need to know what you’re looking for before you start looking.
We believe Notebooks from Lightstep are an indispensable tool for engineering teams working together to investigate, mitigate, and resolve incidents.
Lightstep Notebooks enable DevOps and SRE teams to run ad-hoc queries while investigating an incident or proactively optimizing their application. Notebooks, which leverage Change Intelligence, allow any developer, operator, or SRE to instantly understand changes in their service’s health and, most importantly, what caused those changes. Teams can then quickly share and collaborate on findings for faster incident resolution and proactive optimization. This is critical when investigating an incident, collaborating across teams, and quickly documenting learnings in Notebooks to share across the organization.