Why Join LightStep Engineering

I joined the LightStep engineering team 7 months ago, and so far, it’s been an incredible ride. Like many startups, we’re hiring quickly, and I’ve been involved in several interview panels. The question I get asked the most by candidates always boils down to “Why did you join LightStep?” or “What keeps you excited here?” My answer has always been the same, and it usually ends up being pretty long.

The people

I’ve had the opportunity to work with a number of people within engineering and across different departments, and I’ve been blown away by how smart and humble everyone is, and by how passionate they are about their work. At LightStep, we take our work seriously (without taking ourselves too seriously).

The technical challenges

Working on the LightStep SaaS distributed system poses all kinds of exciting technical challenges that you can sink your teeth into (e.g., anomaly detection, time-series visualization, and aggregate data searching). We aim to build high-throughput, low-latency services for our customers, and we continuously optimize the performance, reliability, scalability, and operability of our system.

The values

The company values fit exactly what I was looking for. We believe in knowing why our work is important, seeking out diverse opinions, being a team multiplier, and learning from both our failures and our successes.

The product

Last, but definitely not least, I love working on a cutting-edge product that I really believe in. LightStep gives developers visibility into complex systems in ways that I’ve never seen before. We empower engineers to quickly resolve performance degradations, diagnose anomalies, and monitor what matters the most. And we enable engineers to build better, faster systems.

And that about sums it up! If this aligns with what you’re looking for next, check out our Careers page – we’re hiring! Or to get a glimpse of the team, check out our Instagram.

DevOps and Site Reliability Engineering – What’s Different at LightStep

At LightStep, we spend every day helping our customers understand performance behavior in their distributed applications. We’re proud that our product is used to diagnose problems in many important software systems. And because it is a tool used to improve performance and reliability in other applications, we must hold our own product to even higher standards when it comes to those metrics. At the same time, we challenge ourselves to innovate quickly while still meeting (or exceeding) those standards.

As one of the co-founders and the CTO at LightStep, I’d like to share a bit of what it’s like to work on the engineering team, how we collaborate, and our process for bringing ideas to market.

One critical part of running highly available services is determining who is responsible for making sure that those services are available. Two related terms that get tossed around a lot here are DevOps and Site Reliability Engineering (SRE). Unfortunately, neither of these terms is particularly well defined – just Google them and see for yourself!

One of the parts of DevOps that I like best (though certainly not the only part) is that individual teams are responsible for the entire application lifecycle, from design, to coding and testing, to deployment and ongoing maintenance. This gives teams the flexibility to choose the processes and tools that will work best for them. However, that autonomy can lead to fragmentation across the org in how services are managed and duplication of effort across teams.

On the other side, SRE is often used to describe organizations that are laser-focused on product availability, performance, and incident response. While these are all important, these SRE organizations can sometimes build antagonistic relationships with the rest of engineering where SRE is seen as impeding progress for the sake of its own goals.

At LightStep, we believe in a hybrid implementation of these two philosophies, where our engineers are organized into small groups with split responsibilities but shared objectives. SREs at LightStep are responsible in part for building shared infrastructure that is leveraged by the whole organization, but they are also embedded within teams to help spread best practices and understand current developer pain points. This structure has enabled our teams to remain agile, to conduct rapid product experiments, and to have the flexibility to quickly adopt new (or discard old) technologies and tools. Retaining the natural and healthy tension between maintaining product stability and accelerating innovation to market ensures every decision we make is a balance that ultimately focuses on our customers’ success.

When considering prospective DevOps engineers or SREs (titles don’t really matter much to us at LightStep), we look for engineers who are excited about working side-by-side with the rest of our team. To us, SRE isn’t a separate organization so much as a mindset: we look for engineers who are excited to collaborate and apply a broad set of tools – including traditional operational tools like automation and monitoring as well as robust software development practices – to improve the reliability of our product and increase the velocity of individual teams and of our organization as a whole.

We’re always striving to improve how we do things and looking to new team members to help us on this journey. All of our engineers bring complementary skills and experience from both academia and industry. Above all, we value those who respect differing opinions, communicate clearly, and are empathetic towards their peers.

If you’d like to be part of this journey and would enjoy working on these engineering challenges, we’d love to hear from you!