This article originally appeared on Medium.
There’s a rule of thumb about startup fundraising: “Raise when you can.”
But that’s not how we think about it. For our recent Series C financing, the idea wasn’t to “raise when we could” — in that case, we would have closed something much earlier. Rather, it was to “raise when we’re ready to use it,” and that’s exactly the situation we find ourselves in: everywhere we look, there are high-ROI projects to bet on, and we should pursue as many as possible.
Thinking back on the past year, it’s remarkable how much has changed. Business is great, of course — we are excited about each and every customer we’ve brought on. Lately, we are working with more traditional enterprises who have embarked on their own microservices adventure and want to use LightStep [x]PM to maintain confidence and control along the way. From a people standpoint, we’ve more than doubled in size, established multiple new departments while keeping our collaborative culture, formalized our values and built practices around them, and tried to lay the foundations — in particular, a company-wide sense of responsibility and autonomy — for more rapid, healthy growth in the years to come.
Our [x]PM product has evolved and continues to lead the industry in terms of microservice APM and observability at scale. If I had to summarize:
- (Not) Sampling: Random sampling hobbles distributed tracing, especially during incident resolution: you miss the outliers because they are, by definition, rare. This has been obvious to us since we founded LightStep, and indeed we’ve been running in production without upfront sampling since 2015, but it still bears repeating since many of our fellow vendors are now “inventing” this idea. 🙂
- Traces are the fuel, not the car: The most impactful work we’ve done in the past year involves trace aggregates: when we can look at the statistics of trace structures, we can make higher-level statements about our customers’ systems that go well beyond the mysteries of individual transactions.
- Performance is a Shape: High-percentile latency tells you more than median latency, but neither holds a candle to real-time histograms with no cardinality limitations.
- Snapshots: In my recent KubeCon talk about a new scorecard for observability, I presented what Cindy Sridharan referred to as the “CAP Theorem for Observability”: namely that a positive-ROI observability solution can’t have high throughput, high cardinality, historical context, and unsampled data all at once. This is where Snapshots come in: they give us the fidelity of unsampled data, but in the past, all at scale and without the cardinality limits that cause trouble for traditional time-series statistics. The “big picture” for Snapshots will come into greater focus in 2019 as we deliver insights on top of the core abstraction; suffice it to say that Snapshots will present a more detailed, actionable picture of system behavior than anything else that’s out there.
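To put rough numbers on the sampling point above, here is a minimal sketch of the arithmetic. It assumes each trace is kept independently with a fixed probability (plain upfront random sampling). The function names and the example figures (20 outlier traces, a 1% sampling rate) are hypothetical and chosen only for illustration; they do not describe any particular product or customer.

```python
def p_miss_all(sample_rate: float, outlier_count: int) -> float:
    """Probability that uniform random sampling keeps none of the outlier traces.

    Assumes each trace is kept independently with probability `sample_rate`.
    """
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample_rate must be between 0 and 1")
    if outlier_count < 0:
        raise ValueError("outlier_count must be non-negative")
    return (1.0 - sample_rate) ** outlier_count


def expected_outliers_kept(sample_rate: float, outlier_count: int) -> float:
    """Expected number of outlier traces that survive sampling."""
    return sample_rate * outlier_count


if __name__ == "__main__":
    # Hypothetical incident: 20 pathological requests hidden in normal traffic.
    outliers = 20
    for rate in (0.001, 0.01, 0.1, 1.0):
        print(
            f"sample rate {rate:>5}: "
            f"P(miss every outlier) = {p_miss_all(rate, outliers):.3f}, "
            f"expected outliers kept = {expected_outliers_kept(rate, outliers):.2f}"
        )
```

Under these assumptions, a 1% sample has roughly an 82% chance of keeping none of the 20 outlier traces, while keeping everything (a rate of 1.0) cannot miss them.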
As a company, LightStep exists to give developers and operators greater confidence as their software scales. What we’ve seen is that scaled-out software begets many developers, many developers beget many small teams, and the presence of many small teams forces an organization to adopt microservices and/or serverless for managerial reasons. Vijay Gill described this in his excellent blog post about the only good reason to adopt microservices. And, sure enough, big enterprises are now running microservices; not in some zero-throughput labs environment, but in production, powering their bread-and-butter applications.
Of course this shift was evident at KubeCon — in fact, it probably originated there! This month at KubeCon North America, my colleague Ted Young organized the first-ever Observability Practitioners Summit, with speakers from academia, open-source observability projects, other great vendors, and in-house practitioners. The talks went beyond “Observability 101” material, delving deep into the details of these new monitoring technologies, visualization strategies, and novel use cases. The slides for all of the talks are available via the link above (and are recommended to any other observability nerds out there). Moreover, the O.P.S. event was packed: two years ago we were still explaining what distributed tracing was, and now the conversation is far more developed and far larger to boot. Ted also gave a great talk during the main KubeCon event about using distributed traces as a way to make “distributed assertions” about the behavior of microservice applications: Trace Driven Testing.
I remember doing internal tech talks at Google twelve years ago, trying to get highly-specialized Google software engineers (who develop scaled-out distributed systems for a living) to care at all about my Dapper project and distributed tracing in general. Frankly, it was a bit like pushing a car uphill — at the time, the concepts were simply a bit too new. Jump to 2018, where Lew Cirne, New Relic’s founding CEO, is talking about distributed tracing by name during NEWR’s quarterly earnings call. What planet are we living on here? I’m not sure, but it’s a lot of fun.
In closing, what could be better than a growing company, built around a remarkable team of wonderful people with strong shared values, creating a novel product that’s leading the industry into a dynamic and rapidly-growing market? There’s a lot to be excited about, and that’s why we raised when we did: not because we needed to, not because we could, but because we know what to build with it, and we want to build it faster. We can’t wait for 2019.