APM is dying — and that’s okay

In APM’s heyday (think “New Relic and AppDynamics circa 2015”), the value prop was straightforward: “Just add this one magic agent and you’ll never need to wonder why your monolithic app is broken!”

APM is Dying

But then everything changed, and APM wasn’t able to change along with it. Here’s what happened…

Systems Got Deep

APM was designed for monoliths, where development revolved around a single application server. It turned out that monoliths slowed down developer velocity, so we broke them into layer upon layer of microservices.

In doing so, we enabled developers to build and release software faster, and with greater independence, as they were no longer beholden to the elaborate, Sisyphean release processes associated with monolithic architectures. 

But as these now-distributed systems scaled, it became increasingly difficult for developers to see how their own services depend on or affect other services, especially after a deployment or during an outage, where speed and accuracy are critical.

Conventional APM tools weren’t built to understand or even represent these multi-layered architectures, let alone provide guidance on how to identify and improve performance when it matters most.

Deep Systems

There are two ways systems scale. They can scale wide, or they can scale deep.

Countless real-world systems scale wide: Lakes scale wide. Pizzas scale wide. Traffic jams scale wide. And in software, MapReduces and memcache pools scale wide. When things scale wide, you “just add more of them”: more water, more dough, more cars, more processes.

But some systems scale deep: Cities scale deep. Brains scale deep. And when things scale deep, they don’t just get bigger, they get different. Paris is nothing like a very large village. The brain in your pet goldfish is nothing like the brain in your head.

And when microservice architectures scale, they scale deep.

To make the depth of typical microservices architectures more tangible, here are images taken from typical microservice architecture diagrams (blurred for confidentiality reasons). Even with just a dozen services, there are already 6+ layers of depth!

[Image: blurred microservice architecture diagrams]
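To put a number on "depth," here is a minimal sketch that measures the longest call chain in a service dependency graph. The service names and the `CALLS` mapping are invented for illustration and are not taken from the diagrams above; the sketch also assumes the call graph has no cycles.

```python
# Hypothetical call graph: each service maps to the services it calls.
# These names are illustrative only, not from any real architecture.
CALLS = {
    "gateway": ["auth", "orders"],
    "auth": ["users"],
    "orders": ["inventory", "payments"],
    "inventory": ["catalog"],
    "payments": ["ledger"],
    "ledger": ["users"],
    "users": ["db-proxy"],
    "catalog": [],
    "db-proxy": [],
}


def depth(service, calls):
    """Count the layers on the longest call path starting at `service`.

    Assumes the graph is acyclic; a cycle would recurse forever.
    """
    memo = {}

    def visit(name):
        if name not in memo:
            children = calls.get(name, [])
            # A leaf service is one layer; otherwise add one to the deepest child.
            memo[name] = 1 + max((visit(child) for child in children), default=0)
        return memo[name]

    return visit(service)


print(depth("gateway", CALLS))  # 6
```

Only nine services are involved, yet a request entering at `gateway` can pass through six layers before it reaches `db-proxy`.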

Telemetry Got Portable

The most valuable thing about APM had been the agents. They gave us telemetry where before there had been — literally — none. Recently, though, OpenTracing, OpenCensus, and now OpenTelemetry have made that telemetry portable — and free. 

Conventional APM tools are not only ill-suited to analyzing deep systems; they are also typically priced per host or per container. That is neither the vendor’s unit of cost (COGS) nor the customer’s unit of perceived value. With the container explosion, that’s brutal for customers.

And perhaps the biggest problem for APM is that deep systems aren’t just bigger than monoliths, they’re different, and products designed for one don’t work for the other.

APM has a lucrative sweet spot; it just doesn’t cover where large-scale systems are headed.

What’s Replacing APM

Historically, most approaches to monitoring or observability have had almost no way to analyze or represent the elaborate dependencies between services in deep systems.

They treat metrics and logs (and possibly traces) as loosely coupled products or tools, and fundamentally lack the context required to solve the complex challenges of today’s multi-layered architectures.

In recent years, metrics- or logging-oriented products have thrown in distributed traces “on the side,” typically as individual data points that can be inspected manually in a trace visualizer. This blunt, simplistic approach can be effective in identifying a limited number of egregious problems, but complex issues in production are more subtle.

Lightstep’s approach is unique: We ingest 100% of event data, and aggregate and analyze in order to address specific high-value questions:

  • “What went wrong during this release?” 

  • “Why has performance degraded over the past quarter?” 

  • “Why did my pager just go off?!” 

For instance, one of our customers recently experienced a sudden regression in the performance of a particular backend, deep in their stack. It turns out that the underlying issue was that one of their 100,000 customers changed their traffic pattern by 2000x. This was obvious within seconds after looking at aggregate trace statistics, though they estimated it would have taken days just looking at logs, metrics, or even individual traces on their own. 
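As a rough illustration of why aggregate statistics surface this kind of problem so quickly, the sketch below counts spans per customer in two time windows and reports the customer whose traffic grew the most. It is not Lightstep's implementation: the span dictionaries, the `customer_id` tag, and the synthetic data are all assumptions made for the example.

```python
from collections import Counter


def request_counts(spans):
    """Count spans per customer, using a hypothetical `customer_id` tag."""
    return Counter(span["customer_id"] for span in spans)


def largest_traffic_shift(before, after):
    """Return (customer_id, ratio) for the customer whose span count grew the most."""
    base = request_counts(before)
    current = request_counts(after)
    # Treat customers absent from the baseline as having one request,
    # which avoids dividing by zero.
    ratios = {cust: current[cust] / max(base[cust], 1) for cust in current}
    worst = max(ratios, key=ratios.get)
    return worst, ratios[worst]


# Synthetic data: 100 customers send one request each in the baseline window,
# then customer "c42" sends 2000 requests in the following window.
before = [{"customer_id": f"c{i}"} for i in range(100)]
after = before + [{"customer_id": "c42"}] * 1999

print(largest_traffic_shift(before, after))  # ('c42', 2000.0)
```

A single pass over the aggregated data points straight at the outlier. Finding the same customer by opening traces one at a time would mean inspecting thousands of them.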

This is all possible because Lightstep’s Satellite architecture grants us access to about 100x more data than a conventional SaaS solution at the same (or lower) cost. With so much more data, and colocated storage and compute, we extract more context about deep systems. This is why we have earned the trust of customers like Lyft, GitHub, Twilio, Under Armour, and many more.

Check out our documentation to learn more about Lightstep, or request a custom demo now.

December 15, 2019
About the author

Ben Sigelman
