Lightstep Brings Observability Automation to GitHub
by Fran Thorpe
You’ve finished your pull request, merged it, and deployed the code. Then your phone starts freaking out. It looks like your service is taking forever to respond, but where’s the bug?
Your change is one of many in this version, so it’s hard to predict its impact on production until it goes live. Why? Because your service relies on other services, each with dependencies of its own, and their performance can drag down yours. In short, you’re responsible for far more than you can possibly control.
At Lightstep, we’re also developers, and we know that deploys often cause regressions that take time to investigate. You can burn hours trying to discover what changed when you shipped your code. While investigating, you often have to jump from tab to tab: reviewing logs and metrics, monitoring dashboards, and answering messages from your team.
We knew there had to be a better way. And since we already use GitHub, we wondered what would happen if we integrated our powerful service health features with GitHub source code management.
We put the idea to the test during our recent Lightstep hackathon and built a proof of concept that’s now making its way to beta. Let’s take a look!
Imagine you’ve recently deployed a change to your “Checkout Service” for the “process-pull-request” operation. In Lightstep, you can see your recent deployment activity and understand the performance of your latest release. Now, we’ve added the ability to view and compare pull request versions.
In the following image, you can review a comparison between the previous version, V5.3.0, and V5.3.1, which is currently being deployed. As illustrated, a noticeable latency regression occurred right after deployment. With this information, you can identify the pull requests that shipped with this deployment.
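Under the hood, a version comparison like this comes down to asking whether a latency percentile got worse after the deploy. The sketch below is a minimal illustration, not Lightstep’s actual detection logic. It assumes you already have per-request latency samples in milliseconds for each version. It also uses a hypothetical 20% threshold for what counts as a regression.

```python
import statistics


def p95(samples: list[float]) -> float:
    """Return the 95th percentile of a list of latency samples (needs at least two samples)."""
    # With n=100, quantiles() returns 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples, n=100)[94]


def is_latency_regression(baseline: list[float], candidate: list[float],
                          threshold: float = 0.20) -> bool:
    """Flag a regression when the candidate version's p95 exceeds the baseline's by more
    than `threshold` (a hypothetical cutoff chosen for illustration)."""
    return p95(candidate) > p95(baseline) * (1 + threshold)
```

A high percentile is compared rather than the mean because a slowdown that hits only some requests shows up clearly in the tail. The same slowdown might barely move the average.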
Understanding what changed, and who changed it, is the next step. When you click “View Pull Requests”, a prompt appears listing every change that shipped in the deployment. This gives you a quick summary of who and what might have triggered the regression in the “Compute and Store” operation.
We know that context and screen switching slows you down. Since your work lives in GitHub, we want to keep you there as much as possible. So we took the workflow one step further and automated messaging between Lightstep and GitHub. Now, if a deployment introduces a regression, Lightstep automatically posts a message to the related pull request conversation in GitHub. In the following image, the service owner can see a latency issue notification and dive straight into understanding why.
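Lightstep doesn’t describe how the notification is delivered, so the following is only a sketch of the general idea. It builds, but doesn’t send, a request to GitHub’s REST endpoint for commenting on a pull request conversation. Pull request conversations use the issues comments endpoint. The owner, repository, PR number, token, and message wording are hypothetical placeholders.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_regression_comment(owner: str, repo: str, pr_number: int, token: str,
                             service: str, operation: str,
                             baseline_ms: float, current_ms: float) -> urllib.request.Request:
    """Build (without sending) a POST request that would add a latency-regression
    comment to a pull request conversation. All arguments are hypothetical inputs."""
    # Pull request conversation comments go through the issues comments endpoint.
    url = f"{GITHUB_API}/repos/{owner}/{repo}/issues/{pr_number}/comments"
    body = (
        f"Latency regression detected after deploy: {service} / {operation} "
        f"p95 went from {baseline_ms:.0f} ms to {current_ms:.0f} ms."
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"body": body}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Actually sending it would be a call to `urllib.request.urlopen(req)`, using a token that is allowed to comment on the repository.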
Code and configuration changes are among the most common causes of performance regressions. With Lightstep, you know immediately whether your service is working as designed, and you can quickly identify and address issues when they occur.