With today’s launch, we’re excited to speak more openly about what we’ve been up to here at Lightstep. As a company, we focus on delivering deep insights about every aspect of high-stakes production software. With our first product, Lightstep, we identify and troubleshoot the most impactful performance and reliability issues. This post is about how we got here and why we’re so excited.
I started thinking about this problem in 2004. It began during an impromptu conversation I had with Sharon Perl, a brilliant research scientist who came to Google in the early days. She was mainly working on an object store (à la S3) at the time but also had a few prototype side projects. We talked through five of them, I believe, but one captured my attention in particular: Dapper.
Dapper circa 2004 was not fully baked, though the idea was magical to me: Google was operating thousands of independently-scalable services (they’d be called “microservices” today), and Dapper was able to automatically follow user requests across service boundaries, giving developers and operators a clear picture of why some requests were slow and others ended with an error message. I was so enamored of the idea that I dropped what I was doing at the time, adopted the (orphaned) Dapper prototype, and built a team to get something production-ready deployed across 100% of Google’s services. What we built was (and is still) essential for long-term performance analysis, but in order to contend with the scale of the systems being monitored, Dapper only centrally recorded 0.01% of the performance data; this meant that it was challenging to apply to certain use cases, such as real-time incident response (i.e., “most firefighting”).
Ten years later, Ben Cronin, Spoons (Daniel Spoonhower), and I co-founded Lightstep. Enterprises are in the midst of an architectural transformation, and the systems our customers and prospects build look a lot like the ones I grew up with at Google. We visit with enterprise engineering and ops leaders frequently, and what we see are businesses that live (or die) by their software, yet often struggle to stay in control of it given the overwhelming scale and complexity of their own systems.
We built Lightstep to help with this, and we started with Lightstep to focus on performance and reliability in particular. Our platform is not a reimplementation of Dapper, but an evolution and a broadening of its value prop: with Lightstep’s unconventional architecture, we can analyze 100.0% of transaction data rather than the 0.01% we recorded with Dapper. This unique – and technically sophisticated – approach gives our customers the freedom to focus on the performance issues that are most vital to their business and jump to the root cause with detailed end-to-end traces, all in real time.
For instance, Lyft sends us a vast amount of data – Lightstep analyzes 100,000,000,000 microservice calls every day. At first glance, that data is all noise and no signal: overwhelming and uncorrelated. Yet by considering the entirety of it, Lightstep can measure how performance affects different aspects of Lyft’s business, then explain issues and anomalies using end-to-end traces that extend from their mobile apps to the bottom of their microservices stack. The story is similar for Twilio, GitHub, Yext, DigitalOcean, and the rest of our customers: they run Lightstep 100% of the time, in production, and use it to answer pressing questions about the behavior of their own complex software.
The credit for what Lightstep has accomplished goes to our team. We value technical skill and motivation, of course; that said, we also value emotional sensitivity, situational awareness, and the ability to prioritize and leverage our limited resources. Lightstep will continue to innovate and grow well into the future, and the people here and their relationships with our inspiring customers are the reason why. The company has also benefited in innumerable ways from early investors Aileen Lee and Michael Dearing, the staff at Heavybit, and of course our board members from Redpoint and Sequoia, Satish Dharmaraj and Aaref Hilaly. Our board brings deep company-building experience as well as a humility and humor that we don’t take for granted.
It’s no secret that software is getting more powerful every day. As it does, it becomes more complex. Lightstep exists in order to decipher that complexity, and ultimately to deliver insights and information that let our customers get back to innovating. Nothing gets us more excited than the success stories we hear from our customers. As we continue to build towards our larger vision, we look forward to hearing many more.
Interested in joining our team? See our open positions here.
November 12, 2017
About the author
Ben Sigelman