
The Think Tank: The Future of Cloud-Native Observability

Observability is one of the hottest, and most overloaded, topics of the day. What does it really mean, and who is it really for? In this episode of The Think Tank, I brought together two preeminent voices in the space, Ben Sigelman, co-founder of Lightstep, and Dr. KellyAnn Fitzpatrick of RedMonk, to discuss what's new and next for the observability world.

You can read the full transcript below, but here are the highlights:

  • Observability is an overloaded concept, but fundamentally it's about understanding change and forecasting performance in a production software system.

  • Change is inevitable, and it’s happening faster than ever -- quality observability is a must for every team. As you scale to hundreds or thousands of deployments a day, it’s a real requirement.

  • The three biggest challenges in adoption of observability are data quality, the ROI of telemetry creation and storage, and a lack of clarity about how observability ties into business goals.

  • Solutions to these challenges include OpenTelemetry, sampling, and instruments such as SLOs which can tie observability efforts to specific business KPIs.

  • Central observability teams can greatly aid in the adoption of observability; making it every individual's problem can lead to inconsistent results.

  • OpenTelemetry and improved developer tools are two of the most exciting future developments in this space.

If you’d like to listen to the interview in full, you can catch a recording here.

--

The following transcript has been lightly edited for clarity.

Austin Parker: Welcome to The Think Tank. My idea here is really, there are a lot of different ways to talk about what's going on right now in the world of cloud and cloud native, right? I think we've actually lost out on a bit by losing that hallway track experience at different conferences. You'd see someone, and then six months later, or a year later, you'd see them again. You'd talk and say, "Hey, what are you working on? What's going on?" That kind of active sharing and listening meant a lot to me personally. And obviously with the pandemic having gone the way it's gone, those events are kind of few and far between. So I wanted to get together with some people I know, some people I don't know, ask them some questions, and see if we can all find some new understanding. So today our first topic, our first show, is going to be about the future of cloud native observability. And to talk with me about this, I have two excellent human beings. Ben Sigelman, CEO of Lightstep. I actually don't know what your title is now officially, after the acquisition.

Ben Sigelman: That's a good question. I'll just say co-founder of Lightstep, and then I usually make a joke about how I try to be the CEO, but ServiceNow wouldn't let me do it anymore. So, general manager, it doesn't matter.

Austin Parker: Yeah. Co-founder of Lightstep, formerly of Google, helped create Dapper, did a lot of cool observability-through-tracing stuff. And also joining us is Dr. KellyAnn Fitzpatrick from RedMonk.

KellyAnn Fitzpatrick: Hey all.

Austin Parker: So, yeah.

KellyAnn Fitzpatrick: And I guess I should say what RedMonk is real quick: an industry analyst firm. Basically we follow tech trends, and we specifically care about the tech landscape from the perspective of practitioners like developers.

Austin Parker: Fantastic. Yeah. RedMonk, great people, love them all. One of these days Barktoberfest will ride again and we will all go have a beer together, and it'll be fantastic. Hopefully this year.

KellyAnn Fitzpatrick: Hopefully.

Austin Parker: Knock on wood, everyone. With our introductions out of the way, let's get into it. So just for everyone's reference, for how I want to do this, at least for the first little bit of this, I just kind of want to have some prepared questions that the three of us can talk about. And then once we get done with those, I'll open the floor up. And if people have questions or comments, or want to add something, then we'll do that. So take notes, I guess, if you want to come back in later. Let's start off with the obvious one: what is observability to you? I was actually doing research on this just this morning.

Austin Parker: If you look over the last five or 10 years in the world of developer tooling, APM, monitoring, whatever, observability has gone from this very niche topic to the point where it seems like you can throw a rock now and you're going to hit an observability startup, right? Or you're going to hit a very large company that has rebranded to become an observability company. So I'm wondering, is there a single useful, general definition of observability that we can all agree on? Or are we a little too far into the hype cycle for that now? Ben, Kelly, whichever one of you wants to start.

Ben Sigelman: Dr. Kelly, you want to go first?

KellyAnn Fitzpatrick: You are too kind. My perspective on this is that, as an analyst, I'm not sitting within the viewpoint of a specific vendor or a specific technology, which for me makes it an even more open question, because I don't have a perspective on observability that I'm trying to sell, or one that I've helped found, which I think some folks have. And there are a couple of different ways to go at it. I always like going back to the control theory definition of observability: understanding a system from its outputs. I think that's one that gives us a lot of flexibility as the observability landscape, the tooling within it, and what it's capable of all expand from here.

KellyAnn Fitzpatrick: But I think it's also useful to think of observability in terms of negative definitions. So you have folks who are very adamant that observability cannot and should not be defined by the three-pillars model of logs, traces, and metrics. And I think that's an interesting way to look at it as well, because clearly there are people who do like to define it that way for various reasons. So, yeah, I'll pause there, because I feel like Ben, you have a lot to add to this.

Ben Sigelman: Oh, I don't know about that, but I'll take the adding part. I definitely think about this a lot. Austin, your question mentioned that this is so far along the hype cycle that it's difficult, and that's absolutely the case, unfortunately. If I had to step back and talk about observability really broadly, not just production software or cloud native software, the control theory definition from the '60s is the most academically accurate thing we can go back to. But maybe one way to explain it would be that observability is trying to understand something without changing it. Right?

Ben Sigelman: So it used to be the case that you would attach a debugger to a process running on your own box. And that provided a really high level of observability, because you could inspect the stack and even stop the process, and look around, and stuff like that. Right? And now we're trying to do that for production software. And it's hard, right? Now, in terms of a more practical definition, I guess the simplest way I'd put it is that observability is about understanding changes and forecasting changes in production software. That's really what it comes down to, and doing so without becoming an expert on how the tooling is built, just being able to follow a guided path. That's what observability should be. But it's absolutely the case that a number of really large vendors have acquired their way into a "platform play," and then they call that observability. And if you have enough people doing that with different definitions, it gets pretty muddled. So, yeah, it's a problem.

KellyAnn Fitzpatrick: And just jumping in, I love that. And I'm thinking of it from the Clayton Christensen jobs-to-be-done perspective: what is it that devs and SREs need to do their jobs, to be able to make changes, and to figure out what's going on because of those changes? Because as software becomes more complex, there are more changes to be made during every cycle.

Austin Parker: Yeah. I think those are both really good points. One thing I want to circle back on is, we're talking about complexity, right? And I think the past couple of years have shown... If we pick an arbitrary point in time, say March 2020, we had this sudden, very large change in how we work. And because of that change, we had a shock not only to the economy, not only to the world, but a pretty fundamental shock in how a lot of us worked and how knowledge was transferred. Right? Because I think there's still a model, for most people I've talked to and most development teams, where you have this tribal knowledge, right?

Austin Parker: Something breaks, and maybe you've codified all of that into a runbook, maybe you haven't. And so you're going around saying, "Ah, who knows what broke? Who knows how to fix this?" How many times have you said, "Something happened, who's seen this before?" Because we always have to go back to those people in our organizations, the individuals who have shepherded these complex systems from wherever they were to wherever they are now. So observability, I think, has been a part of that transition, or maybe hasn't been as much of a part as it should be, as we've moved towards this much more distributed world. But I think that also helps explain why we see things getting more complex, and why the nature of how we're actually interacting with these systems is changing.

Austin Parker: Like you said, Ben, it used to be, if you wanted to know what was going on in prod, you would need to attach a debugger and so on and so forth. You would need to halt production in some way, or you would need to set up a production-like environment where you could replicate the problem. So I think that's why we're seeing those rebrandings. It's a live debugger, so now it's an observability thing. It's a network inspection thing, so now it's an observability thing. Right? What does that mean for the future of the observability space, but also maybe dev tooling in general, that observability is kind of... It's like a foam, right? It's filling these cracks in the dev tool landscape.

KellyAnn Fitzpatrick: And I can defer to Ben on this one, because I feel like I had to go first last time.

Ben Sigelman: It's a real privilege to go second, though, because then I have more time to think. I don't know, Austin, you packed a lot of stuff in there. I'm not going to try and address all of it. But I would say I actually don't really, frankly, think the pandemic has a whole lot to do with the trend to rebrand everything as observability. I think part of it is just buyer behavior and it seeming like the new cool thing, but I think the trend towards observability really began with the trend towards distributing teams, distributing software development, and parallelizing software development. Maybe the pandemic helped that along a little bit, but I think the main driver for the move to, let's call it cloud native technology, was about increasing release velocity. That's ultimately, from a business standpoint, the reason why cloud native makes sense: you can take 1,000 people who are developing software and split them into smaller teams that can operate with some independence.

Ben Sigelman: I mean, limited independence, but some independence, and that increases velocity. And then the natural side effect of the increase in velocity is that you go from deploying once every six months, or once every month, or even once every two weeks, to deploying thousands of times per day, if you're talking about individual CI/CD deployments in some organizations. And that change rate, especially given that the changes can appear anywhere in the stack, creates so many opportunities for unintended side effects, and that's when observability comes into play. So in my mind, the pandemic might have hastened the move to distributed teams, because they're physically distributed and communication-wise more distributed. So maybe that was a bump towards cloud and cloud native. But I think the cloud native and distributed teams trend is really what created the need for what I would call true observability: understanding change across team boundaries and system boundaries.

Ben Sigelman: And then, because there's so much money racing into that category, you do see... I think the foam analogy is good, Austin. You see a lot of other tooling that feels a bit niche trying to rebrand itself as observability, because then you become part of this larger category, which improves access to funding and things like that. That's a bit cynical, but that's how I see it.

Austin Parker: Yeah. I think you actually got to what I was trying to say with the pandemic stuff: hey, everyone is distributed now, right? Maybe it feels a little bit different depending on whether you work at a cool San Francisco startup that was very heavily remote or remote-friendly, but a lot of people who moved into a remote work situation weren't necessarily in the same boat. But, yeah, you got what I meant. Kelly, do you have anything to add on this?

KellyAnn Fitzpatrick: I always have things to add.

Austin Parker: Excellent.

KellyAnn Fitzpatrick: And I also really like the foam analogy, the idea that this is the space in between all these things, because I think what we are seeing with observability, and the collapse of what used to be distinct categories into this observability bucket, is something we're seeing elsewhere in tech as well. We have so many pieces of little niche tooling for different things that, at some point, putting them together in a larger bucket, something that can then be understood and marketed in different ways, is... I think that's part of a larger trend.

KellyAnn Fitzpatrick: For observability specifically, I agree that the pandemic is probably not the only thing that affected this, but it has certainly grown an appetite for adopting cloud native architectures and ways of doing things among folks who before may not have been ready to do so. We've heard stories of companies that had absolutely no mechanism for going about their day-to-day business without having people there on site, and folks like that had to adapt very quickly. The stories of people having to do that, the amount of mindshare that moved to distributed work, and even this option of different types of architectures happening at the same time: that has created opportunities for technology buckets like observability to end up being something that people are willing to pay for where they would not have been before.

Austin Parker: Yeah. Great points. So let's dive into this a little bit. One thing that I've been thinking a lot about is what the pain points are, not so much the pain points that lead you to adopt an observability practice in the first place. Like Ben said, you want to know what's changed, right? You need to know what's changed. So you have these table stakes of, well, I know there's this problem I have, and I know that I need the answer to solve it. But when it comes to the actual implementation, when the rubber meets the road, so to speak, let's maybe pick one or two things that hold observability back at the individual level and also at a leadership level. And maybe they're the same thing, I don't know. But where are the real roadblocks, in your mind, right now?

Ben Sigelman: Well, I guess I'll go first. I'd like to restrict the question to observability for applications that are starting to distribute the work across multiple services and teams. Right? So the cloud native applications.

Austin Parker: Yeah.

Ben Sigelman: Otherwise it's just too broad of a question, I think, for me to take a crack at. But I guess there's a lot of discussion out there, and I'm probably part of the problem, not the solution here, about very highfalutin, sophisticated aspects of observability and best practices and so on and so forth. But when you go and actually look at what's happening on a team-by-team basis, there's still a struggle, I think, just to get adequate signal quality out of the data. There's a lot of really poor data coming in from sources that are manifold. And this is a straight-up lead-in to the OpenTelemetry pitch, which is not a vendor pitch, but it's super, super important.

Ben Sigelman: I think having high quality data sort of ubiquitously available is still holding people back quite a bit. A lot of the really sophisticated tooling that's out there works much better if the data is high quality. So I would say that's the first thing, and I think moving towards OpenTelemetry as a standard, both for end users as well as vendors and other people in the ecosystem, cloud providers, et cetera, is a really, really important aspect of improving observability for everyone. The other thing I would say, and now we're talking more about the persona that's responsible for thinking about ROI, so not just about features, but whether the tooling is worth it: there's a little bit of a dirty secret right now, I think, in cloud native observability, particularly around just how much of the data is ever used for any purpose, like whether it's ever queried. We did a study at Google.

Ben Sigelman: We instrumented the query path, and something like 5% of the data that's collected was ever queried by a human being for any reason in the observability stack. The other 95% of it is really expensive, but kind of provably useless. Now, it's not an easy thing to predict exactly which 5% is going to be used, but it's actually not that hard to find the 50 or 70% that's never going to be used. And that's been something that I think incumbents have not been very eager to solve, because that's also their revenue stream. But there's a bit of a glaring issue, I think, around ROI, particularly around high cardinality metrics and conventional log data that just doesn't scale that well in cloud native from a utility standpoint. And I think that's really holding back the practice, because it creates a trust issue with the tooling, and that turns into artificial limitations on the way the tooling is adopted and used in order to try to manage that cost in a sort of inefficient way. So that's my quick answer for you guys.
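
To make Ben's point about telemetry ROI concrete, here is a minimal sketch, not Lightstep's or Google's actual approach, of head-based trace sampling with the OpenTelemetry Python SDK. The 10% ratio and the service and span names are illustrative assumptions, not recommendations.

```python
# Minimal sketch: head-based sampling with the OpenTelemetry Python SDK.
# The 10% ratio and the service/span names are illustrative assumptions.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Keep roughly 1 in 10 traces; the rest are dropped before export, which is
# one way to stop paying to store telemetry nobody ever queries.
provider = TracerProvider(sampler=TraceIdRatioBased(0.1))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # hypothetical service name

with tracer.start_as_current_span("charge-card"):
    pass  # only the sampled ~10% of these traces are exported
```

Tail-based sampling, deciding after a trace completes (often in an OpenTelemetry Collector), is the usual answer when you want to keep the interesting traces rather than a random slice, but the cost trade-off Ben describes is the same.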

KellyAnn Fitzpatrick: I mean, I can't disagree with any answer around observability that points to data and access to data. We're definitely seeing companies like Observe Inc., for instance, which is built on Snowflake, saying, all right, here you can store this giant metric ton of data at some kind of reasonable cost. And you get the argument, again, for sampling, right? So you have companies that I think are actually looking at observability from that data perspective.

KellyAnn Fitzpatrick: But I think there are also, for individual developers who are not already in that observability world and don't even know where to start, perhaps still some other blockers, in terms of dealing with silos in their own organizations, right? In being the person who is trying to introduce observability processes and practices there. There's no tech without the social systems around it as well. To be the person who says, "I want to bring this in here," and suddenly to have the burden of being the person responsible for this stuff working, to then somehow also potentially be the person who inherits responsibility for those systems working, the person who's there when the outages happen...

KellyAnn Fitzpatrick: I feel like there's a very complicated... ecosystem is the wrong word, but it goes beyond, I think, one little point. And I think we could very easily talk about this one question for the rest of the entire hour as well.

Ben Sigelman: I would agree.

Austin Parker: Yeah.

Ben Sigelman: One thing I would add on to that, Kelly, is that... I shouldn't name a specific customer, but I was talking with a giant, giant financial institution, right? Like a Fortune 10 kind of thing. I was talking to someone whose job is basically to try and institute modern observability practices across this organization, which is pretty tough. Right? Because the estate there goes back to mainframes and also includes totally serverless Kubernetes stuff, whatever. So it's the full spectrum. And it's hard to do that, right? I think it's really hard to take something that, as we were saying earlier in this conversation, is pretty mushy; observability doesn't have a very clean definition. And then this person decided to say, "Well, actually, I'm going to make this narrower and make this about SLOs."

Ben Sigelman: Okay, great. So SLOs. It's tighter, it's service level objectives, it's a little bit less mushy than observability, and it feels like you can define what that is and there's a project there. However, that also ran into trouble, because it wasn't a priority for anyone except this person to actually make SLOs something that cuts across the various functions and roles of the organization. And then there was a third attempt. And this one actually worked, which is interesting, which was to say, "Well, I'm not going to talk about observability, although that's what I'm doing. I'm not going to talk about SLOs, although that's what I'm doing. I'm going to go to this critical line-of-business application and say: you have a problem in that you're trying to release software quickly, but you are creating a reliability issue, and you can't decide what's an acceptable amount of risk in terms of software release velocity. I'm going to solve that problem for you and increase your release velocity for the PM organization while maintaining acceptable reliability for the engineering organization," and built the deck around that. And that actually worked, right?

Ben Sigelman: Which of course is the whole point of SLOs. But it was leveled up to something that made more sense to the business, at the altitude of the business and the leadership. And I think that actually was somewhat effective. A lot of times when we're talking about observability practices, or even some aspect of observability like SLOs, those of us who spend all of our time thinking about this, whether it's people like me, or you, Kelly, studying this from a vendor or analyst perspective, or the champion for observability at your organization who's willing to spend their lunch hour joining a Twitter space like this one, it's really important to remember that most people don't really want to think that hard about this stuff.

Ben Sigelman: And it's much easier if you can translate things into their language, which is, "We're releasing software too slowly," or "Our software is incredibly unreliable," or both. That's how they were able to get some traction; those sorts of imperatives are much easier for people to relate to. So I think that's really the point I'm trying to make: the need to always frame things in terms of velocity, or reliability, or both.
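
The SLO framing Ben describes reduces to simple arithmetic: an availability target implies an error budget that the product organization can spend on release velocity. A rough sketch, with illustrative numbers only:

```python
# Rough sketch of SLO error-budget arithmetic; all numbers are illustrative.
WINDOW_MINUTES = 30 * 24 * 60   # 30-day rolling window
SLO_TARGET = 0.999              # e.g. 99.9% availability

error_budget_minutes = WINDOW_MINUTES * (1 - SLO_TARGET)
print(f"Error budget: {error_budget_minutes:.1f} bad minutes per 30 days")  # ~43.2

# A deploy that caused a 10-minute partial outage consumes part of that budget;
# what remains is the "acceptable risk" the team can spend on further releases.
remaining = error_budget_minutes - 10
print(f"Remaining budget this window: {remaining:.1f} minutes")
```

Framing it this way gives the PM and engineering organizations a shared number: release as fast as you like while the remaining budget is positive, slow down when it isn't.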

KellyAnn Fitzpatrick: Yeah. And that's a great point and it gets to the other part of Austin's question. So we started out talking about the individual developer level, like what it-

Ben Sigelman: You're right.

KellyAnn Fitzpatrick: ... Right? And then tying into what may be holding observability practice back at the leadership level. And I think your point is that they are related in ways that can be really hard to sever.

Ben Sigelman: Yeah. I remember when I was working on Monarch, which was Google's multi-tenant metrics infrastructure. Basically, I was way, way, way down in the weeds of figuring out how this was going to work, and I had designed, I won't go into the details in the interest of time, a pretty elaborate, but in my mind incredibly elegant, way to define schemas for the monitoring of these large distributed systems at Google. And I was pretty proud of it. The only catch was that it was really complicated, and you had to read this lengthy document to understand all the nouns and verbs and so on and so forth. And I was pitching and pitching and pitching it. And then this person, Manoj Plakal, who is incredibly smart, sort of took me aside.

Ben Sigelman: And I remember what he said. He was like, "People don't want to be monitoring astronauts. They don't want to understand this. They just want to check the box and move on." Which is true. It's absolutely true. And I think for those of us who are on the bleeding edge of observability, it's important to remember that most people who are consuming observability don't want to be observability astronauts, they just want to get their job done. And I think that's a very difficult and painful thing to remember, but also critical for success and for getting these initiatives to gain traction in a brownfield environment, in a real application.

KellyAnn Fitzpatrick: Yeah, absolutely. And to the earlier part of our discussion, about observability becoming this kind of category that other things have collapsed into, but one that has more recognition: that's actually a good thing for anyone who is trying to translate these granular, developer-level concerns and ideas of what observability can be into the larger business needs of an organization, in order to get support for it.

Austin Parker: So on that topic, let's keep talking about the enterprise for a second. Personally, in a previous life, I was the guy that brought monitoring to a system that didn't have any. It was complex, it broke a lot, there was a ton of confusion about why it broke this time, and we had very poor observability into our system. And I got a wild hair because I got tired of waking up at 3:00 AM to deal with this, and I said, "I'm going to bring in a tool, I'm going to set up some dashboards, and you can go look at all the SQL metrics you want." And it worked out very well, except that people didn't actually use it. And that was a problem, obviously. So that's one example of the observability champion at a low level coming in, bringing these ideas in, and then it not panning out. In your experience, or just in your opinion, is this something that is solved by having that executive buy-in to begin with?

Austin Parker: I guess what I'm really trying to say is: where does observability sit in the enterprise? Is this something we should have at the individual dev team level? Is this something an existing platform team should take ownership of? Should there be a separate observability team? Kelly, do you want to go first?

KellyAnn Fitzpatrick: Yeah, sure. And I love the narrative you told. It is a familiar one to me as well. And I think it very much speaks to the idea that when you're talking about software developers, it is very difficult to walk in there and tell them what to use. In fact, one of my colleagues and one of the RedMonk co-founders, Steve O'Grady, wrote an entire book called The New Kingmakers about developer-led adoption, and how things like open source and self-service offerings mean that developers actually have a lot more influence on the stuff they use, partly because they will go and just use things without telling anybody higher up that that's what they're using, and getting to the point where all you have to do is convince your team to use this and just see how long you can get away with it.

KellyAnn Fitzpatrick: And I think what you're talking about is the flip side of that: just because a technology or a tool, or something that could be very useful, is there does not mean that it is going to get used. And part of that can be because of education, right? You set up a system, but at that point you may have been the only person who knew how it worked or what it could do. But there's also the question of whose job it is to take on those responsibilities. And what we are absolutely seeing is that developers, even your typical app developer, are expected to do so much more than just write code, and the pressure for them to take on more is increasing. So you see things shift left in security, and shift left in, say, testing.

KellyAnn Fitzpatrick: And then something like being the person who deals with observability tooling: who decides where that lands? Is every single developer now supposed to have a base education in observability, in what it means, how to use it, and how it can benefit them, in order to be a developer? I don't think we are anywhere close to that at all. And what we tend to see, or what I've seen with the folks and organizations that I talk to, is that sometimes you do have that observability champion, that single person, but often it does fall to specific teams. And the names of those teams are changing as well. So it's not just ops teams; sometimes you have platform teams, or even DevOps teams or developer experience teams.

KellyAnn Fitzpatrick: And I think there's something to be said for having some idea or method by which observability can have this base of knowledge within an organization, or within parts of organizations, so that it doesn't become one of those things where nobody is doing it because everybody should be doing it, or because it's somebody else's job.

Ben Sigelman: I agree with almost everything you said. I would add to it a couple of critical observations about observability in particular. If we're talking about observability of an application that has many teams involved, the fact of the matter is that these teams are interconnected in pretty fundamental ways, because they all work on a single application. And the corollary there is that observability solutions, no matter how they're constructed, need to be coherent across team boundaries. And that says a lot, to your question, Austin, about how things have to be structured. I think with OpenTelemetry you've shown that it is possible, I don't know how, it's not easy and it definitely involves a ton of effort and coordination, but it is possible to get the participants in the ecosystem to actually collaborate and to find conventions, or standards if you want to use the S word, for how data is going to arrive in the tool, which is honestly the easiest part.

Ben Sigelman: But there are really three layers. There is the telemetry itself, there is the durable store for that telemetry, and then there are all the workflows built on top of that. And to actually have a coherent observability architecture at an organization, that experience at the workflow level ideally is pretty unified and consistent across team boundaries. And I do not know how to do that in a completely, truly independent, dev-driven way until things are so commoditized that you could have some set of rigorous standards for how the different pieces of observability fit together. But we haven't even really defined what those are; we're still in the second inning or something.

Ben Sigelman: I don't remember how many innings are in baseball, but definitely a lot more than two, right? So we've got a long way to go. And that basically forces the issue of having a platform-level decision about observability. Now, whether that should be an observability team or a platform engineering team, or something else, is somewhat TBD. I think the hallmark of when it's done well is when that team shares objectives with people who are trying to think about product. I will say, and I'll stop talking in a second, that I actually get a bit irritated about how much Google is looked at as an example of how SRE should be done. Having worked at Google for nine years, I was actually very frustrated about how SRE was run there, with no fault to any of the individuals involved. The entire SRE organization reported into a completely separate VP who did not share most of the management chain with product engineering or infrastructure engineering.

Ben Sigelman: And so there was not a healthy tension within SRE, balancing off product velocity and reliability. It was just reliability all the time. And that tension was only resolved, in cases of conflict, by going way, way, way up the org chart. I would say, regardless of the reporting structure, it's important that the crosscutting team that thinks about observability and other aspects of the platform really shares goals with the product team. Otherwise you can get into this world, like we were saying earlier, where there's a kind of purity about the solution that's just impractical and can actually result in deadlock for the organization. So my vote is yes, a central team for observability, regardless of what you call it. And also, that team needs to share objectives with, and collaborate in a really coherent, solution-minded way with, the product engineering organization.

Austin Parker: Cool. Well, since you both got us started on this idea of standards, let's talk about some standards. Well, not actually officially standards... I can only-

Ben Sigelman: Conventions.

Austin Parker: ... [crosstalk 00:39:36]. Yeah. Some conventions, shall we say. Of course, I'm referring to OpenTelemetry, for those that don't know. Many of us on this space, I believe, have vested interests or many hours sunk into this project, but it is a CNCF (Cloud Native Computing Foundation) project seeking to build, effectively, an open standard for the creation and representation of software telemetry: things like application and infrastructure metrics, traces, and logs. So otel, which is the short version of it, has been pretty exciting in terms of the amount of adoption we're seeing, both from the vendor side and also the end user side. What do you see as maybe the three big things that otel has either already done, or that you think it's going to do over the next, let's say, nine to 12 months, to push this area forward? Start with Ben, since he's still unmuted.
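
For readers who haven't touched the project, here is a minimal sketch of what creating telemetry with OpenTelemetry looks like from application code, using the Python API; the service, span, and attribute names are made up for illustration.

```python
# Minimal sketch of manual instrumentation with the OpenTelemetry Python API.
# Service, span, and attribute names are hypothetical.
from opentelemetry import trace

tracer = trace.get_tracer("payment-service")

def handle_payment(order_id: str, amount_cents: int) -> None:
    # Each unit of work becomes a span; attributes carry the context
    # (order id, amount, outcome) that someone on call can later query on.
    with tracer.start_as_current_span("handle_payment") as span:
        span.set_attribute("order.id", order_id)
        span.set_attribute("payment.amount_cents", amount_cents)
        # ... business logic goes here ...

handle_payment("A-1001", 2599)
```

In a real service an SDK TracerProvider and exporter would be configured at startup (as in the earlier sampling sketch); with no SDK configured, these API calls are no-ops, which is part of what makes OpenTelemetry safe to add to shared libraries.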

Ben Sigelman: Sure. Yeah. And I'm kind of biased on otel because I've spent a ton of time on it myself, but I think what I'm most excited about is coming up soon. OpenTelemetry has suffered a little bit from just the amount of interest in the project. I think we were surprised by how much momentum we got from a committer standpoint, and we now have over 3,000 committers or something, but we don't really have the project management apparatus that something like Kubernetes had from an earlier age. We're starting to get some of the governance pieces in place that will, I think, make the project more parallelizable and scalable. It's a bit of a meta benefit, but I think it will allow us to hit the roadmaps we have more consistently. So that's a meta thing, but I will say, from the inside out, I'm excited about that.

Ben Sigelman: And what I think that will turn into, from a delivery standpoint, is that... I think we've already done a good job of [inaudible 00:41:52] the tracing piece of this, and OpenTracing, which is the predecessor project for OpenTelemetry, was finally formally deprecated and archived actually just this week. So it's really good to see that that has happened. And then metrics, which, everyone knows what those are... We've been working pretty hard with the Prometheus community to make OpenTelemetry a viable co-participant in the metrics ecosystem, to be compatible with Prometheus deployments, and also to bring in the benefits of having a multi-signal approach to telemetry. I think that's going to become a reality in the next couple of months.

Ben Sigelman: And I think that will be a really good thing, because tracing was greenfield, but it's also a little bit of a nice-to-have for some organizations; metrics are an absolute must-have. And I think having OpenTelemetry offer its pretty significant value proposition, ubiquitous, high quality telemetry, in the metrics ecosystem is going to be a really big thing for the way the whole landscape fits together. Particularly since you'll never need to select a vendor because of their integration surface again. You can just choose a vendor based on what they offer from a functionality standpoint, and let open standards take the place of the integrations page. So I think that's a great thing for metrics.
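
As a rough illustration of the metrics side Ben is describing, here is a minimal OpenTelemetry metrics sketch in Python. A console exporter stands in for the Prometheus or OTLP exporter you would use in practice, and the meter and metric names are assumptions for illustration.

```python
# Minimal sketch of OpenTelemetry metrics in Python.
# ConsoleMetricExporter stands in for a Prometheus/OTLP exporter;
# the meter and metric names are illustrative.
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("checkout-service")
requests = meter.create_counter("http.server.requests", description="Handled HTTP requests")

# Record one handled request with a couple of dimensions.
requests.add(1, {"http.route": "/checkout", "http.status_code": 200})
```

In practice you would swap the console exporter for a Prometheus reader or an OTLP exporter pointed at whichever backend you've chosen; the instrumentation code itself doesn't change, which is the vendor-neutrality point Ben is making.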

KellyAnn Fitzpatrick: And, of course, I insisted that Ben go first without even saying so, because I feel like you have much more insight into the workings of the OpenTelemetry project than I do. For me, the one thing I will say is that OpenTelemetry comes up in increasingly more conversations. I think I had first heard of not just OpenTelemetry, but OpenTracing or OpenCensus, only a couple of years ago, and now it's something where it's not just on my radar; it's on the radar of vendors and prospective tech companies that come to talk to us.

KellyAnn Fitzpatrick: What I find fascinating is how companies are using their relationship with OpenTelemetry in different ways. First, because it is a project, it becomes something that folks can point to and say, "Look, I'm contributing to this larger community and world of observability, because our organization is helping contribute to this project." Which in some small part, I think, contributes to the volume of contributions you were speaking about, Ben, which then leads into, okay, now how do we actually manage all of this? But then also, even for companies who have not necessarily embraced being all in on OpenTelemetry, it becomes something they have to address if they are framing themselves as an observability company.

KellyAnn Fitzpatrick: So it's very difficult for any vendor to talk about an observability offering without, at some point, addressing their relationship to and understanding of OpenTelemetry. And I think that can't be a bad thing.

Austin Parker: I mean, I certainly think it's an interesting, almost prisoner's dilemma from the observability vendor side, right? I think at first there was a lot of concern about whether people were going to get behind this, and I'm pleasantly surprised at how many people have. Maybe it's due to the fear of missing out, or the cynic's take on this, right? Everyone sort of realized, like Ben brought up earlier, that instrumentation is a requirement. You have to have code that instruments your code. And everyone had their solutions for this, everyone had their agents or whatever they would drop in, and that was all pretty commodified. Otel means that maybe we can actually pull that out of these specific implementations and upstream it into the libraries and the products that people are using.

Austin Parker: And I think you're starting to see that, surprisingly, not just in third party code like Kafka or MySQL, or whatever, right? There's been some stuff I've heard about them integrating otel into their clients or into the actual servers, but you're also seeing companies like AWS emitting otel metrics, or at least allowing otel as a translation layer to emit internal metrics or internal traces through. So that's pretty interesting, and we'll see how that all shakes out in the future. I want to move on to one last question for both of you, and then we can switch into sort of an open floor. First off, thank you both, you've been fantastic guests. I think we've had a really great conversation. And I just want to know: what's the next big thing you're excited about? Right? We're talking about the future of cloud native observability. So what's the next big thing for cloud native observability? Ben, do you want to start?

Ben Sigelman: Yeah, sure. I mean, of course there's a time horizon that we're talking about, but I would say we are nearing the point where OpenTelemetry has crossed some critical threshold of ubiquity. And that's going to be an amazing, amazing thing for the space in general. Because I think once we have high quality data everywhere, then the focus for observability will shift way up the stack in terms of what observability can provide, and focus more on unification of the data. Right now, unfortunately, I think there's this pitch vendors are making that if you are able to just centralize metrics, traces, and logs into one vendor, then you've won.

Ben Sigelman: It's absolutely not sufficient. I mean, it is necessary, but it's not sufficient. And I think what we're going to find is a much clearer articulation of what it means to do that thoughtfully and to actually join that data to benefit the end users who are trying to use this tooling to do something, right? Because that conversation right now is still very mushy and markety, and it's not based on anything you can verify or even really try, in many cases. So I think what we're going to see is a push towards unification at the data layer and at the actual workflow layer, for SREs and developers who are on call, to take advantage of a much higher quality signal that's coming in, at the ground floor, from OpenTelemetry.

KellyAnn Fitzpatrick: Okay. Going second is great, because you have more time to think, but then Ben says such a great thing and you wonder how you follow that in any way, shape, or form. But I'm going to take a slightly different angle on it. I think what I'm excited about next for observability is twofold. First, what are some of the evolutions of tooling that improve the experience of working with observability and just make it better? For instance, there's a company called Lightrun that is a very developer-first kind of company. It's about working with you in the IDE and being able to change instrumentation from there without bringing down a pod.

KellyAnn Fitzpatrick: And I think things like that are a really good sign for where observability can go from that developer perspective. But then I'm also just really excited to see what else improved observability enables. Here we're thinking of things like progressive delivery, a term we talk about at RedMonk; one of my colleagues, James Governor, really helped coin it. It feels like the evolution of continuous delivery, one that takes advantage of things like canarying, feature flags, and A/B testing, these advances in the way software is delivered that build on very cloud native technologies. And observability is one of the things that, I think, is driving something like that in terms of how people are building, delivering, and experiencing software.

Austin Parker: Cool, great answers, great responses, both of you. Personally, I'm just excited for what's next in general. Like Ben said, with otel it's so close you can taste it, you can touch it. It's very cold and metallic. But once we get there, it's going to be, I think, a real revolution in what we think of as, and around, observability as a product or a tool.

February 8, 2022
40 min read
Observability


About the author

Austin Parker
