Three Pillars with Zero Answers – Towards a New Scorecard for Observability

This article originally appeared on Medium.

The orthodoxy

Have you heard about the “three pillars of observability” yet? No? The story goes like this:

If you’re using microservices, you already know that they’re nearly impossible to understand using a conventional monitoring toolchain: since microservices were literally designed to prevent “two-pizza” devops teams from knowing about each other, it turns out that it’s incredibly difficult for any individual to understand the whole, or even their nearest neighbor in the service graph.

Well, Google and Facebook and Netflix were building microservices before they were called microservices, and I read on Twitter that they already solved all of these problems… phew! They did it using Metrics, Logging, and Distributed Tracing, so you should, too – those are called “The Three Pillars of Observability,” and you probably even know what they look like already:

[Image] Metrics!
[Image] Logs!
[Image] Traces!

So, if you want to solve observability problems like Google and Facebook and Twitter, it’s simple… find a metrics provider, a logging provider, a tracing provider, and voila: your devops teams will bask in the light of an observable distributed system.

Fatal flaws

Perhaps the above is hyperbolic. Still, for those who deployed “the three pillars” as bare technologies, the initial excitement dissipated quickly as fatal flaws emerged.

Metrics and cardinality

For Metrics, we all needed to learn a new vocab word: cardinality. The beauty of metrics is that they make it easy to see when something bad happened: the metric looks like a squiggly line, and you can see it jump up (or down) when something bad happens. But diagnosing those anomalous moments is deeply difficult using metrics alone… the best we can do is to “drill down,” which usually means grouping the metric by a tag, hoping that a specific tag value explains the anomaly, then filtering by that tag and iterating on the drill-down process.

“Cardinality” refers to the number of elements in a set. In the case of metrics, cardinality refers to the number of values for any particular metric tag. If there are 5 values, we’re probably fine; 50 might be ok; 500 is probably too expensive; and once we get into the thousands, you simply can’t justify the ROI. Unfortunately, many real-world tags have thousands or millions of values (e.g., user-id, container-id, and so forth), so metrics often prove to be a dead end from an investigative standpoint.
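To make the arithmetic concrete, here is a minimal sketch (the tag names and counts below are hypothetical) of how tag cardinality multiplies into the number of distinct time series a metrics backend has to store and index:

```python
# Hypothetical tags on a single request-latency metric.
tag_cardinalities = {
    "region": 5,             # fine
    "endpoint": 50,          # might be ok
    "customer_id": 5_000,    # probably too expensive
    "container_id": 50_000,  # a dead end for most metrics backends
}

series_count = 1
for tag, distinct_values in tag_cardinalities.items():
    series_count *= distinct_values
    print(f"after tagging by {tag!r}: {series_count:,} distinct time series")
```

Each additional high-cardinality tag multiplies, rather than adds to, the storage and query cost, which is why “just add a user-id tag” rarely survives contact with the bill.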

Logging volumes with microservices

For Logs, the problem is simpler to describe: they just become too expensive, period. I was at a speaker dinner before a monitoring conference last year, and one of the other presenters – a really smart, reputable individual who ran the logging efforts for one of today’s most iconic tech companies – was giving a talk the following day about how to approach logging in a microservices environment. I was excited about the topic and asked him what his basic thesis was. He said, “Oh, it’s very simple: don’t log things anymore.”

It’s easy to understand why: if we want to use logs to account for individual transactions (like we used to in the days of a monolithic web server’s request logs), we would need to pay for the following:

[Image] Logging costs

Logging systems can’t afford to store data about every transaction anymore because the cost of those transactional logs is proportional to the number of microservices touched by an average transaction. Not to mention that the logs themselves are less useful (independent of cost) due to the analytical need to understand concurrency and causality in microservice transactions. So conventional logging isn’t sufficient in our brave new architectural world.
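As a back-of-envelope sketch (the numbers below are invented for illustration; they are not the figure from the original post), the cost of transaction-centric logging scales multiplicatively with traffic, with the number of services each transaction touches, and with retention:

```python
# Invented, illustrative numbers: adjust for your own system.
transactions_per_second = 2_000
services_per_transaction = 30   # microservices touched by an average transaction
log_lines_per_service = 10
bytes_per_line = 200
retention_days = 30
cost_per_gb_month = 0.50        # hypothetical storage + indexing cost

bytes_per_day = (transactions_per_second * 86_400 * services_per_transaction
                 * log_lines_per_service * bytes_per_line)
retained_gb = bytes_per_day * retention_days / 1e9
print(f"~{retained_gb:,.0f} GB retained, roughly ${retained_gb * cost_per_gb_month:,.0f}/month")
```

The multiplier that hurts is services_per_transaction: splitting a monolith into thirty services multiplies transactional log volume by roughly thirty without adding any new information about the business.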

Tracing and foreknowledge

Which brings us to “Distributed Tracing,” a technology specifically developed to address the above problem with logging systems. I built out the Dapper project at Google myself. It certainly had its uses, especially for steady-state latency analysis, but we dealt with the data volume problem by applying braindead, entirely random, and very aggressive sampling. This has long been the elephant in the room for distributed tracing, and it’s the reason why Dapper was awkward to apply in on-call scenarios.

The obvious answer would be to avoid sampling altogether, but for scaled-out microservices the cost is a non-starter. It’s more realistic to defer the sampling decision until the transaction has completed. That is an improvement, but it still leaves a crucial question unanswered: which traces should we sample, anyway? If we restrict our analysis to individual traces, we typically focus on “the slow ones” or those that result in an error; however, performance and reliability problems in production software are usually a byproduct of interference between transactions, and understanding that interference requires much more sophisticated sampling strategies that aggregate across related traces contending for the same resources.
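To make “defer the sampling decision” concrete, here is a minimal, hypothetical sketch of tail-based sampling: buffer each trace’s spans, and only once the trace completes decide whether to keep it. The structure and thresholds are assumptions for illustration, not Dapper’s design or any particular vendor’s implementation, and note that per-trace heuristics like these still miss the cross-trace interference described above.

```python
import random
from collections import defaultdict

# All names and thresholds below are hypothetical.
LATENCY_THRESHOLD_MS = 500
BASELINE_SAMPLE_RATE = 0.001        # keep a small random baseline of "normal" traces

buffered_spans = defaultdict(list)  # trace_id -> list of span dicts

def record_span(span):
    """Buffer a finished span until its whole trace has completed."""
    buffered_spans[span["trace_id"]].append(span)

def on_trace_complete(trace_id):
    """Decide, after the fact, whether this trace is worth keeping."""
    spans = buffered_spans.pop(trace_id, [])
    if not spans:
        return
    total_ms = sum(s["duration_ms"] for s in spans)
    has_error = any(s.get("error") for s in spans)
    if has_error or total_ms > LATENCY_THRESHOLD_MS or random.random() < BASELINE_SAMPLE_RATE:
        export(spans)

def export(spans):
    # Stand-in for shipping the kept trace to a tracing backend.
    print(f"keeping trace {spans[0]['trace_id']} with {len(spans)} spans")
```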

In any case, a single distributed trace is occasionally useful, but it’s a bit of a Hail Mary. Sampling the right distributed traces and extracting meaningful, accessible insights from them is a harder challenge, and a far more valuable one to solve.

And about emulating Google (et al.) in the first place…

Another issue with “The Three Pillars” is the very notion that we should always aspire to build software that’s appropriate for the “planet-scale” infrastructure at Google (or Facebook, or Twitter, and so on). Long story short: don’t emulate Google. This is not a dig on Google – there are some brilliant people there, and they’ve done some terrific work given their requirements.

But! Google’s technologies are built to scale like crazy, and that isn’t necessarily “good”: Jeff Dean (one of those brilliant Googlers who deserves all of the accolades – he even has his own meme) would sometimes talk about how it’s nearly impossible to design a software system that’s appropriate for more than 3-4 orders of magnitude of scale. Further, there is a natural tension between a system’s scalability and its feature set.

Google’s microservices generate about 5 billion RPCs per second; building observability tools that scale to 5B RPCs/sec therefore boils down to building observability tools that are profoundly feature poor. If your organization is doing more like 5 million RPCs/sec, that’s still quite impressive, but you should almost certainly not use what Google uses: at 1/1000th the scale, you can afford much more powerful features.

Bits vs Benefits

So each “pillar” has a fatal flaw (or three), and that’s a problem. The other problem is even more fundamental: Metrics, Logs, and Distributed Traces are just bits. Each describes a particular type of data structure, and when we think of them, we tend to think of the most trivial visualization of those data structures: metrics look like squiggly lines; logs look like a chronological listing of formatted strings; and traces look like those nested waterfall timing diagrams.

None of the above directly addresses a particular pain point, use case, or business need. That is, with the “three pillars” orthodoxy, we implicitly leave the extraordinarily complex task of actually analyzing the metric, log, and trace data as “an exercise for the reader.” And, given the fatal flaws above and the subtle interactions and co-dependencies between these three types of data, our observability suffers greatly as a result.

In our next installment…

We need to put “metrics, logs, and tracing” back in their place: as implementation details of a larger strategy – they are the fuel, not the car. We need a new scorecard: stay tuned for our next post, where we will introduce and rationalize a novel way to independently measure and grade an observability strategy.

Hippies, Ants, and Healthy Microservices

This article originally appeared on Medium.

For any organization that expects its developers to produce powerful software, the decision to adopt microservices should be an easy one – developer velocity is king, and that’s hard to come by in the byzantine build-test-release lifecycles of monolithic software architectures. But the initial commitment to adopt microservices is much simpler than the decisions that follow about how to structure that adoption: there are uncountably many blog posts addressing various and sundry technical details, and dozens of partially-overlapping solutions for every problem, even (especially?) for the ones you haven’t encountered yet.

And yet, despite the mountains of content about the technical details, it surprises me how little has been written about the biggest failure mode I’ve seen out there in the wild: a fundamental misunderstanding of the goals surrounding a microservices migration, and how those goals best translate into engineering management practices. In particular, the conventional wisdom makes a microservices-oriented engineering organization sound like a hippie commune. But it should probably feel more like an ant colony.

Hippies

[Image] Hippies!

Before I proceed, let it be known that I have a soft spot for the hippies of yore. I love idealists as long as they’re peaceful, and you can’t get much more peaceful or idealistic than a good, old-fashioned hippie. If the hippie ethos could be distilled into a single value, it would be the freedom to make independent decisions and act on them.

There are other posts offering greater detail (I’m especially fond of this article about the intersection of management and microservices from Vijay Gill, SVP Eng at Databricks), but to summarize: the only good reason to adopt microservices is to accelerate development through reduced human communication overhead. The idea is that each microservice gets its own development team, and these teams stay out of each other’s way – i.e., they make completely independent decisions that further their own goals, and they try to allow others to do the same. It’s like “the Me generation” for software.

But I don’t think hippies have the right instincts for engineering management. What happens if we try to truly maximize the independence of distinct microservice teams? Every dev team chooses the language, frameworks, message queue, CI/CD strategy, and naming conventions (etc) that make the most sense for their service and their expertise as a group. Since every service and situation is different, this appears to be a rational strategy: after all, aren’t microservices about increasing parallelism in decision-making (if not the software itself)?

Ants

[Image] Ants!

And yet there are many flavors of independence. Ants are certainly enterprising little creatures: they readily explore every nook and cranny, they can famously carry up to 50x their own body weight, and some species build architecturally marvelous structures for themselves. But they use social and utterly standardized behaviors (and some pheromones) to facilitate their own versions of load-balancing, discovery, security, and replication.

[Image] Service discovery for ants

While one can observe an individual ant and reason about its actions in the context of its environment, ants’ most adaptive behaviors rely on “biological standardization.” For instance, if an individual wanders its way to a plentiful food source, that ant will emit a “trail pheromone” and head straight back to the colony; its fellow ants pick up the scent and use it to backtrack to the food source.

Similarly, when ants are in an alarmed or panicked state, they emit chemicals that alert their peers to the threat and protect the group. And so on and so forth: for every macrobehavior that benefits the colony, there is a standard chemical mechanism that all individuals understand and obey that facilitates that macrobehavior.

These collective adaptations have made ants one of the most “horizontally scalable” animals on Earth: the largest ant colony is 3,700 miles wide and is home to billions of individual organisms. They are remarkable animals!

…and back to microservices

There’s no question that hippies are more independent than ants. And I suppose I should acknowledge that ants would make lousy engineering managers (they can’t even drink coffee). But when we’re spinning up microservices, we have a lot to learn from ants and other hive-minded animals: their reliance on the rigid standardization of certain functions facilitates optimal outcomes for the group as a whole.

There’s always a temptation to allow each service team to decide on a language, a stack, and a set of primitives that feel familiar or appropriate to them. This is well-intentioned, as it seems to maximize the autonomy of the distinct service teams. But in a microservices deployment – especially at scale – we must also facilitate cross-cutting concerns like deployment, load-balancing, service discovery, security, and observability. If we encourage our two-pizza teams to make entirely independent decisions about each of these critical aspects, we are left with a monstrous challenge when operating our distributed application, especially as teams disband and services go into maintenance mode.

When transitioning towards a microservices architecture, it’s best to create a limited number of choices – ideally only one – for each cross-cutting aspect of the larger system. For example:

  • Programming language(s)
  • Service (and infrastructure) naming conventions
  • Orchestration and auto-scaling
  • Web/RPC framework
  • Service-to-service authentication
  • Instrumentation for logging
  • Instrumentation for tracing (I am obligated as a co-creator to plug OpenTracing for this)
  • Instrumentation for metrics
  • Service discovery
  • Load balancing
  • (and so on…)

By standardizing in these areas, a central team can manage these well-factored facets of the larger system, and the developers working on the microservices themselves can focus on what’s most important: building something valuable.
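As one small illustration of what that standardization buys: if every team instruments against the same vendor-neutral tracing API (OpenTracing, from the list above), a central platform team can configure or swap the concrete tracer once, for everyone. Below is a minimal sketch in Python; the operation and tag names are invented for the example.

```python
import opentracing

# Every service calls the same API; a shared bootstrap module decides which
# concrete tracer implementation backs opentracing.global_tracer().
tracer = opentracing.global_tracer()

def handle_checkout(request):
    # Standardized operation and tag names keep cross-service traces legible.
    with tracer.start_active_span("checkout.handle") as scope:
        scope.span.set_tag("customer.id", request["customer_id"])
        ...  # business logic goes here
```

The particular API matters less than the fact that there is exactly one of them across the organization.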

Performance is a Shape, Not a Number

This article originally appeared on Medium.

Applications have evolved – again – and it’s time for performance analysis to follow suit

In the last twenty years, the internet applications that improve our lives and drive our economy have become far more powerful. As a necessary side-effect, these applications have become far more complex, and that makes it much harder for us to measure and explain their performance – especially in real-time. Despite that, the way that we both reason about and actually measure performance has barely changed.

I’m not here to argue for the importance of understanding real-time performance in the face of rising complexity – by now, we all realize it’s vital – but to make the case that we need a better mental model for recognizing and diagnosing anomalies. When assessing “right now,” our industry relies almost entirely on averages and percentile estimates: these are not enough to efficiently diagnose performance problems in modern systems. Performance is a shape, not a number, and effective tools and workflows should present and explore that shape, as we illustrate below.

We’ll divide the evolution of application performance measurement into three “phases.” Each phase had its own deployment model, its own predominant software architecture, and its own way of measuring performance. Without further ado, let’s go back to the olden days: before AWS, before the smartphone, and before Facebook (though perhaps not Friendster)…


Phase 1: Bare Metal and average latency (~2002)

[Image] The stack (2002): a monolith running on a hand-patched server with a funny hostname in a datacenter you have to drive to yourself.

If you measured application performance at all in 2002, you probably did it with average request latency. Simple averages work well for simple things: namely, normally-distributed things with low variance. They are less appropriate when there’s high variance, and they are particularly bad when the sample values are not normally distributed. Unfortunately, latency distributions today are rarely normally distributed, can have high variance, and are often multimodal to boot. (More on that later)
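Before looking at real data, here is a tiny synthetic illustration (the distribution below is invented) of why averages mislead once latency is multimodal: the mean lands between the modes and describes almost no real request.

```python
import random
import statistics

random.seed(0)

# Invented bimodal latency sample: most requests are fast, a minority hit a slow path.
latencies_ms = ([random.gauss(5, 1) for _ in range(9_500)] +    # ~5 ms mode
                [random.gauss(120, 15) for _ in range(500)])    # ~120 ms mode
latencies_ms.sort()

def percentile(q):
    return latencies_ms[int(q * len(latencies_ms)) - 1]

print(f"mean  = {statistics.mean(latencies_ms):6.1f} ms")  # ~10.8 ms: describes almost nobody
print(f"p50   = {percentile(0.50):6.1f} ms")
print(f"p99   = {percentile(0.99):6.1f} ms")
print(f"p99.9 = {percentile(0.999):6.1f} ms")
```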

To make this more concrete, here’s a chart of average latency for one of the many microservice handlers in LightStep’s SaaS:

[Image] Recent average latency for an important internal microservice API call at LightStep.

It holds steady at around 5ms, essentially all of the time. Looks good! 5ms is fast. Unfortunately it’s not so simple: average latency is a poor leading indicator of reliability woes, especially for scaled-out internet applications. We’ll need something better…

Phase 2: Cloud VMs and p99 latency (~2012)

[Image] The stack (2012): a monolith running in AWS with a few off-the-shelf services doing special-purpose heavy lifting (Solr, Redis, etc.).

Even if average latency looks good, we still don’t know a thing about the outliers. Per this great Jeff Dean talk, in a microservices world with lots of fanout, an end-to-end transaction is only as fast as its slowest dependency. As our applications transitioned to the cloud, we learned that high-percentile latency was an important leading indicator of systemic performance problems.

Of course, this is even more true today: when ordinary user requests touch dozens or hundreds of service instances, high-percentile latency in backends translates to average-case user-visible latency in frontends.
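A quick back-of-envelope calculation (illustrative, not measured from any particular system) shows why: if a single user request fans out to N backend calls, the chance that at least one of them lands in the slowest 1% grows rapidly with N.

```python
# Probability that at least one of N fanned-out calls hits the slowest 1% tail.
tail_fraction = 0.01
for fanout in (1, 10, 100, 300):
    p_slow = 1 - (1 - tail_fraction) ** fanout
    print(f"fanout={fanout:3d}: {p_slow:.0%} of user requests see p99-or-worse backend latency")
```

At a fanout of 100, roughly two thirds of user requests experience some backend’s p99, which is exactly how backend tail latency becomes frontend average-case latency.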

To emphasize the importance of looking (very) far from the mean, let’s look at recent p95 for that nice, flat, 5ms average latency graph from above:

[Image] Recent p95 latency for the same important internal microservice API call at LightStep.

The p95 latency is higher than the average we saw above, of course, but it’s still pretty boring. That said, when we plot recent measurements of p99.9, we notice meaningful instability and variance over time:

[Image] Recent p99.9 latency for the same microservice API call. Now we see some instability.

Now we’re getting somewhere! With a p99.9 like that, we suspect that the shape of our latency distribution is not a nice, clean bell curve, after all… But what does it look like?

Phase 3: Microservices and detailed latency histograms (2018)

[Image] The stack (2018): a few legacy holdovers (monoliths or otherwise) surrounded, and eventually replaced, by a growing constellation of orchestrated microservices.

When we reason about a latency distribution, we’re trying to understand the distinct behaviors of our software application. What is the shape of the distribution? Where are the “bumps” (i.e., the modes of the distribution) and why are they there? Each mode in a latency distribution is a different behavior in the distributed system, and before we can explain these behaviors we must be able to see them.

In order to understand performance “right now”, our workflow ought to look like this:

  1. Identify the modes (the “bumps”) in the latency histogram
  2. Triage to determine which modes we care about: consider both their performance (latency) and their prevalence
  3. Explain the behaviors that characterize these high-priority modes
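As a rough sketch of step 1 only (real tools are far more robust than this), here is one naive way to surface the “bumps”: bucket latencies on a log scale and report the local maxima along with their share of traffic, which also feeds the triage in step 2. The bucketing scheme and threshold below are arbitrary assumptions.

```python
import math
from collections import Counter

def latency_modes(latencies_ms, buckets_per_decade=10, min_share=0.01):
    """Naively find local maxima ("bumps") in a log-scale latency histogram."""
    histogram = Counter(
        round(math.log10(ms) * buckets_per_decade) for ms in latencies_ms if ms > 0
    )
    total = sum(histogram.values())
    modes = []
    for bucket in sorted(histogram):
        count = histogram[bucket]
        if (count >= histogram.get(bucket - 1, 0)
                and count > histogram.get(bucket + 1, 0)
                and count / total >= min_share):
            modes.append((10 ** (bucket / buckets_per_decade), count / total))
    return modes  # list of (approximate mode latency in ms, share of traffic)
```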

Too often we just panic and start clicking around in hopes that we stumble upon a plausible explanation. Other times we are more disciplined, but our tools only expose bare statistics without context or relevant example transactions.

This article is meant to be about ideas (rather than a particular product), but the only real-world example I can reference is the recently released Live View functionality in LightStep [x]PM. Live View is built around an unsampled, filterable, real-time histogram representation of performance that’s tied directly to distributed tracing for root-cause analysis. To get back to our example, below is the live latency distribution corresponding to the percentile measurements above:

[Image] A real-time view of latency for a particular API call in a particular microservice. We can clearly distinguish distinct modes (the “bumps”) in the distribution; if we want to restrict our analysis to traces from the slowest mode, we filter interactively.

The histogram makes it easy to identify the distinct modes of behavior (the “bumps” in the histogram) and to triage them. In this situation, we care most about the high-latency outliers on the right side. Compare this data with the simple statistics from “Phase 1” and “Phase 2” where the modes are indecipherable.

Having identified and triaged the modes in our latency distribution, we now need to explain the concerning high-latency behavior. Since [x]PM has access to all (unsampled) trace data, we can isolate and zoom in on any feature regardless of its size. We filter interactively to home in on an explanation: first by restricting to a narrow latency band, and then further by adding key:value tag restrictions. Here we see how the live latency distribution varies from one project_id to the next (project_id being a high-cardinality tag for this dataset):

[Image] Given 100% of the (unsampled) data, we can isolate and zoom in on any feature, no matter how small. Here the user restricts the analysis to project_id 22, then project_id 36 (which have completely different performance characteristics). The same can be done for any other tag, even those with high cardinality: experiment ids, release ids, and so on.

Here we are surprised to learn that project_id 36 experienced consistently slower performance than the aggregate. Again: Why? We restrict our view to project_id=36, filter to examine the latency outliers, and open a trace. Since [x]PM can assemble these traces retroactively, we always find an example, even for rare behavior:

[Image] To attempt end-to-end root cause analysis, we need end-to-end transaction traces. Here we filter to outliers for project_id 36, choose a trace from a few seconds ago, and realize it took 109ms to acquire a mutex lock: our smoking gun.

The (rare) trace we isolated shows us the smoking gun: that contention around mutex acquisition dominates the critical path (and explains why this particular project — with its own highly-contended mutex — has inferior performance relative to others). Again, compare against a bare percentile: simply measuring p99 latency is a far cry from effective performance analysis.

Stepping back and looking forward…

As practitioners, we must recognize that countless disconnected timeseries statistics are not enough to explain the behavior of modern applications. While p99 latency can still be a useful statistic, the complexity of today’s microservice architectures warrants a richer and more flexible approach. Our tools must identify, triage, and explain latency issues, even as organizations adopt microservices.

If you made it this far, I hope you’ve learned some new ways to think about latency measurements and how they play a part in diagnostic workflows. LightStep continues to invest heavily in this area: to that end, please share your stories and points of view in the comment section, or reach out to me directly (Twitter, Medium, LinkedIn), either to provide feedback or to nudge us in a particular direction. I love to nerd out along these lines and welcome outside perspectives.

Want to work on this with me and my colleagues? It’s fun! LightStep is hiring.

Want to make your own complex software more comprehensible? We can show you exactly how LightStep [x]PM works.

KubeCon 2017: The Application Layer Strikes Back

You know it’s a special event when it snows in Texas.

Several of my delightful colleagues and I just returned from a remarkably chilly – and remarkably memorable – trip to Austin for KubeCon+CloudNativeCon Americas. We went because we were excited to talk shop about the future of microservices with 4,500 others involved with the larger cloud-native ecosystem. We had high hopes for the conference, as you won’t find a higher-density group of attendees when it comes to strategic, forward-thinking infrastructure people; yet even our lofty expectations were outdone by the buzz and momentum on display at the event.

On Wednesday at 7:55am, I emerged from my hotel room and had the good fortune of running into the inimitable Kelsey Hightower on my way to the elevator. I never miss an opportunity to learn something from Kelsey, so I asked him what was new and special in k8s-land these days. His response, paraphrased, was that “the big feature news is that – finally – we don’t have a big new feature in Kubernetes.” He went on to explain that this newfound stability at the infrastructural layer is a huge milestone for the k8s movement and opens the door to innovation above and around Kubernetes proper.

From an ecosystem standpoint, I was also lucky to speak with Chen Goldberg as part of a dinner that IBM organized. It was fascinating to hear how she and her team have architected the boundaries of Kubernetes to optimize for community. The project nails down the parts of the system that require standardization, while carving out white-space for projects and vendors to innovate around those core primitives.

This Kubernetes technology and project vision, along with its API stability, have led us to the present reality: Kubernetes has won when it comes to container orchestration and scheduling. That was not clear last year and was very far from clear two or three years ago, but with even the likes of AWS going all-in on Kubernetes, we now have OSS developers, startup vendors, and all of the big cloud providers bought into the platform. So now everyone and their dog are going to become a Kubernetes expert, right?

Not really. It’s even better than that: our industry is evolving towards a reality where everyone and their dog are going to depend on Kubernetes and containers, yet nobody will need to care about Kubernetes and containers. This is a huge and much-needed transformation, and reminiscent of how microservice development looked within Google: every service did indeed run in a container which was managed by an orchestration and scheduling system (internally code-named “Borg”), but developers didn’t know or care how the containers were built, nor did they need to know or care how Borg worked.

So what will devs and devops care about? They will care about application-layer primitives, and those primitives are what KubeCon + CloudNativeCon was about this year. As I mentioned in my keynote on Wednesday, this means that devs and devops will be able to take advantage of CNCF technologies like service mesh (e.g., Envoy and Istio) as well as OpenTracing in order to tell coherent stories about the interaction between their microservices and monoliths.

We were humbled to hear existing LightStep customers telling folks who stopped by our booth how our solution has helped them tell clear stories about the most urgent issues affecting their own systems. Because LightStep integrates at the application layer – through OpenTracing, Envoy, transcoded logging data, or in-house tracing systems – it’s easy to connect our explanations for system behavior to the business logic and application semantics, and to steer clear of the poor signal-to-noise ratio of unfiltered container-level data.

Given the momentum behind Kubernetes and microservices in general, KubeCon felt like a glimpse into the future. That future will empower devs/devops to build and ship features faster and with greater independence. With CNCF’s portfolio of member projects fleshing out the stack around and above Kubernetes, we’re all moving to a world where we can stop caring about containers and keep our focus where it belongs: at the application layer where our developers write and debug their own software.

Announcing LightStep: A New Approach for a New Software Paradigm

(Image Credit: daneden.me)

Today, LightStep emerged from stealth and announced its first product, LightStep [x]PM, as well as its Series A and Series B funding.

With today’s launch, we’re excited to speak more openly about what we’ve been up to here at LightStep. As a company, we focus on delivering deep insights about every aspect of high-stakes production software. With our first product, LightStep [x]PM, we identify and troubleshoot the most impactful performance and reliability issues. This post is about how we got here and why we’re so excited.

I started thinking about this problem in 2004. It began during an impromptu conversation I had with Sharon Perl, a brilliant research scientist who came to Google in the early days. She was mainly working on an object store (à la S3) at the time but also had a few prototype side projects. We talked through five of them, I believe, but one captured my attention in particular: Dapper.

Dapper circa 2004 was not fully baked, though the idea was magical to me: Google was operating thousands of independently-scalable services (they’d be called “microservices” today), and Dapper was able to automatically follow user requests across service boundaries, giving developers and operators a clear picture of why some requests were slow and others ended with an error message. I was so enamored of the idea that I dropped what I was doing at the time, adopted the (orphaned) Dapper prototype, and built a team to get something production-ready deployed across 100% of Google’s services. What we built was (and is still) essential for long-term performance analysis, but in order to contend with the scale of the systems being monitored, Dapper only centrally recorded 0.01% of the performance data; this meant that it was challenging to apply to certain use cases, such as real-time incident response (i.e., “most firefighting”).

Ten years later, Ben Cronin, Spoons (Daniel Spoonhower), and I co-founded LightStep. Enterprises are in the midst of an architectural transformation, and the systems our customers and prospects build look a lot like the ones I grew up with at Google. We visit with enterprise engineering and ops leaders frequently, and what we see are businesses that live (or die) by their software, yet often struggle to stay in control of it given the overwhelming scale and complexity of their own systems.

We built LightStep to help with this, and we started with LightStep [x]PM to focus on performance and reliability in particular. Our platform is not a reimplementation of Dapper, but an evolution and a broadening of its value prop: with LightStep’s unconventional architecture, we can analyze 100.0% of transaction data rather than 0.01% of it like we did with Dapper. This unique – and technically sophisticated – approach gives our customers the freedom to focus on the performance issues that are most vital to their business and jump to the root cause with detailed end-to-end traces, all in real-time.

For instance, Lyft sends us a vast amount of data – LightStep analyzes 100,000,000,000 microservice calls every day. At first glance, that data is all noise and no signal: overwhelming and uncorrelated. Yet by considering the entirety of it, LightStep can measure how performance affects different aspects of Lyft’s business, then explain issues and anomalies using end-to-end traces that extend from their mobile apps to the bottom of their microservices stack. The story is similar for Twilio, GitHub, Yext, DigitalOcean, and the rest of our customers: they run LightStep 100% of the time, in production, and use it to answer pressing questions about the behavior of their own complex software.

The credit for what LightStep has accomplished goes to our team. We value technical skill and motivation, of course; that said, we also value emotional sensitivity, situational awareness, and the ability to prioritize and leverage our limited resources. LightStep will continue to innovate and grow well into the future, and the people here and their relationships with our inspiring customers are the reason why. The company has also benefited in innumerable ways from early investors Aileen Lee and Michael Dearing, the staff at Heavybit, and of course our board members from Redpoint and Sequoia, Satish Dharmaraj and Aaref Hilaly. Our board brings deep company-building experience as well as a humility and humor that we don’t take for granted.

It’s no secret that software is getting more powerful every day. As it does, it becomes more complex. LightStep exists in order to decipher that complexity, and ultimately to deliver insights and information that let our customers get back to innovating. Nothing gets us more excited than the success stories we hear from our customers. As we continue to build towards our larger vision, we look forward to hearing many more.