Last week, we participated in our first Tech Field Day. We had a great time hosting bloggers from all over the world, and we shared the LightStep story with them and with everyone who watched the livestream. It was an opportunity to geek out and show them why we’re so passionate about application performance management for today’s complex distributed systems, from monoliths to microservices and serverless.
After we finished giving them a tour of our office and highlighting all of our donut-themed conference rooms, we got down to business. First up was Ben Sigelman, CEO and Co-founder at LightStep, who shared his thoughts on why companies are moving away from monoliths to microservices and discussed the complexities that arise with this transition. He’s been thinking about these complexities since his days at Google, where he focused on Google’s large-scale distributed monitoring solutions, including Dapper (a large-scale distributed tracing system) and Monarch.
People are moving to microservices because of the promise of increased speed, agility, and scalability. Plus, microservices solve a number of managerial challenges. With microservices architectures, services and the teams that support them are independent actors that are managed on their own and can break free of the logjam created by monolithic architectures and organizational structures. Once you have 100 developers, or even potentially 50, the move away from the monolith to microservices is inevitable. Communication overhead grows quickly with team size, and Conway’s Law observes that organizations end up shipping their org chart. Vijay Gill recently shared in a guest blog post that shipping the org chart is the only good reason to adopt microservices.
Ben went on to explain that while microservices solve managerial issues, they also introduce new dependencies and complexities. The reality of microservices is that people often feel a little out of control and have poor visibility into what’s happening in the system. Application performance management tooling needs to adapt to deal with that complexity, which is precisely why LightStep [x]PM and OpenTracing, the distributed tracing standard, were created.
Next up was Dennis Chu, our head of Product Marketing at LightStep. He gave a great demo of LightStep [x]PM, which customers use to diagnose problems and pinpoint the root cause of application performance issues in their distributed systems and microservices environments. Dennis showed how [x]PM users can go from a distributed trace directly to the critical path of the transaction and identify the function or microservice that’s causing the latency. He also demonstrated how [x]PM can move beyond simple p99 latency metrics and use histograms to understand the shape of the latency distribution and get better insight into what’s normal and what might be an anomaly. [x]PM doesn’t do any upfront sampling, so teams never miss an outlier or anomalous transaction. Dennis demonstrated that with no limits to cardinality, teams can monitor what matters most to them. Ben chimed in to show how our approach is completely different from the traditional APM way of monitoring systems. LightStep can use any correlation ID to stitch together the flow of the transaction. This enables [x]PM users to isolate the performance of their biggest customers or of a specific release or project.
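To make the point about p99 versus histograms concrete, here is a minimal Python sketch. It is not how [x]PM computes anything; the synthetic latencies, the `percentile` and `histogram` helpers, and the 100 ms bucket width are all assumptions chosen for illustration. It shows that a single p99 number hides the fact that the latencies fall into two distinct groups, while a histogram makes that shape obvious.

```python
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n avoids floating-point rounding surprises.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def histogram(values, bucket_ms):
    """Count values per fixed-width bucket, keyed by the bucket's lower bound."""
    counts = Counter((int(v) // bucket_ms) * bucket_ms for v in values)
    return dict(sorted(counts.items()))


# Synthetic data (an assumption): 950 fast requests around 20-29 ms
# and 50 slow requests around 400-449 ms.
fast = [20 + i % 10 for i in range(950)]
slow = [400 + i % 50 for i in range(50)]
latencies = fast + slow

print("p50:", percentile(latencies, 50), "ms")
print("p99:", percentile(latencies, 99), "ms")

# The histogram reveals two separate populations that p99 alone cannot show.
for lower, count in histogram(latencies, 100).items():
    print(f"{lower:>4}-{lower + 99:<4} ms | {count}")
```

The p99 value lands inside the slow group, but on its own it does not say whether latency degrades gradually or whether a small, separate population of requests is slow. The bucket counts answer that question directly.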
Spoons, CTO and Co-founder at LightStep, then explained how the [x]PM magic happens. The unique [x]PM satellite architecture enables customers to send 100x more data than they send to any other monitoring solution. He explained what a distributed system is, and joked that distributed is a term computer scientists put in front of things to mean “but harder”.
[x]PM has two primary components: Satellites and the LightStep engine (SaaS). The Satellites run within the customer’s environment or network region. They observe all transactions and communicate continuously with the LightStep engine (SaaS). They can scrub PII or other data before it exits the customer’s environment. The LightStep engine (SaaS) uses statistical models and customer configuration to identify and record detailed time-series data and the end-to-end traces that will be most valuable to the customer.
OpenTracing is one way to get data into the system, but people can use any correlation ID, including those from home-grown tracing solutions, logs, etc. There’s also a great blog post that goes into more depth on the [x]PM architecture.
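As a rough illustration of what stitching by correlation ID means, the Python sketch below groups events from different services by a shared ID and orders them by start time. It is not the [x]PM implementation or an OpenTracing API; the event fields (`correlation_id`, `service`, `start_ms`, `customer`), the `stitch` helper, and the sample data are all hypothetical.

```python
from collections import defaultdict


def stitch(events, key="correlation_id"):
    """Group events that share a correlation ID and order each group by start time."""
    flows = defaultdict(list)
    for event in events:
        flows[event[key]].append(event)
    for flow in flows.values():
        flow.sort(key=lambda e: e["start_ms"])
    return dict(flows)


# Hypothetical events, as they might be parsed from logs or a home-grown tracer.
events = [
    {"correlation_id": "req-1", "service": "db", "start_ms": 12, "customer": "acme"},
    {"correlation_id": "req-2", "service": "gateway", "start_ms": 3, "customer": "globex"},
    {"correlation_id": "req-1", "service": "gateway", "start_ms": 0, "customer": "acme"},
    {"correlation_id": "req-1", "service": "auth", "start_ms": 4, "customer": "acme"},
]

flows = stitch(events)
for correlation_id, flow in flows.items():
    path = " -> ".join(e["service"] for e in flow)
    print(correlation_id, path)

# Because every event keeps its tags, a flow can be filtered by any of them,
# for example to look only at one customer's transactions.
acme_flows = {cid: f for cid, f in flows.items() if f[0]["customer"] == "acme"}
```

Any field that is present on every event of a transaction could serve as the key here, which is the sense in which the ID does not have to come from a particular tracing library.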
Ben then brought us home with some closing commentary on why we need a new rubric for observability. The conventional wisdom has been that metrics, logging, and tracing are “the three pillars” of observability, yet organizations check these boxes and still find themselves grasping at straws during emergencies. The problem is that metrics, logs, and traces are just data – if what we need is a car, all we’re talking about is the fuel. We’ll continue to disappoint ourselves until we reframe observability.
Watch the Tech Field Day videos to get lots more info and hear the Q&A with the bloggers.