
Tips for Monitoring Your System Over the Holidays

What do the holidays mean for developers?

For many, the joys of the holidays also include a major increase in system traffic — coupled with a major reduction in support and coverage. 

In short: The holidays can be an especially challenging time for software developers.  

For the teams responsible for monitoring system health — and handling issues when things go wrong — the combination of increased demand and reduced support can create a cauldron of stress. Fear not. With the right planning and communication strategies, your team can keep the holiday season merry and full of cheer. 

We have some tried-and-true advice to help you prepare for this unique time of the year. 

Understand System Challenges

Be prepared! Do your best to know when and where things are most likely to go wrong in your system.

  • Look at system performance data from previous holiday seasons to understand the shift in patterns unique to your system

  • Map these patterns to your updated numbers to run tests and simulate increased traffic loads  
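One way to map last season's pattern onto this year's numbers is to compute how far the previous holiday peak rose above the ordinary baseline, then apply that ratio to today's baseline. The sketch below is only an illustration: the function names, the 20% headroom, and all traffic figures are assumptions, not values from any real system.

```python
def holiday_multiplier(baseline_rps: float, holiday_peak_rps: float) -> float:
    """Ratio of last season's holiday peak to its ordinary baseline."""
    if baseline_rps <= 0:
        raise ValueError("baseline_rps must be positive")
    return holiday_peak_rps / baseline_rps


def projected_peak(current_baseline_rps: float, multiplier: float,
                   headroom: float = 0.2) -> float:
    """Scale today's baseline by last season's multiplier, plus a safety margin."""
    return current_baseline_rps * multiplier * (1 + headroom)


# Hypothetical figures: last year ran at 400 requests/s normally
# and 1000 requests/s at the holiday peak; today's baseline is 600.
m = holiday_multiplier(400, 1000)
target = projected_peak(600, m)
print(f"multiplier={m:.2f}, load-test target={target:.0f} requests/s")
```

The resulting target is a starting point for a load test, not a forecast; a system whose traffic mix has changed since last year may need a different multiplier.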

Focus on your system’s critical path: which services are most likely to experience extra stress?

  • Have autoscaling and redundancy solutions in place for your most relied-on services

  • Avoid barebones teams for these mission-critical services
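For the autoscaling and redundancy point, a rough capacity check can show how many instances a critical service might need at peak. This is a minimal sketch assuming you know an approximate per-instance capacity; the function name, the single spare instance, and the numbers are hypothetical.

```python
import math


def replicas_needed(peak_rps: float, per_replica_rps: float, spares: int = 1) -> int:
    """Instances required to serve peak_rps, plus spare instances for redundancy."""
    if per_replica_rps <= 0:
        raise ValueError("per_replica_rps must be positive")
    return math.ceil(peak_rps / per_replica_rps) + spares


# Hypothetical: an 1800 requests/s peak, with each instance handling about 250.
print(replicas_needed(1800, 250))
```

A figure like this is most useful as the upper bound you give your autoscaler, so that scaling is not capped below the expected holiday peak.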

Make sure your automated alerts reflect updated performance expectations. 

  • With so much on the line for you and your customers, adjust alert thresholds around uptime, latency, and error rate so your team can stay ahead of a debilitating problem

  • Structure alert information to be as helpful as possible: there is a real cost to downtime
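The two alerting points above can be sketched together: keep a separate set of thresholds for the holiday period, and have each alert say which limit was crossed and by how much. Everything here is an assumption for illustration, including the threshold values, the metric names, and the choice to tighten rather than relax limits; the right direction depends on your system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    min_uptime_pct: float      # alert when uptime falls below this
    max_p99_latency_ms: float  # alert when p99 latency rises above this
    max_error_rate: float      # alert when the error fraction rises above this


# Hypothetical values only; derive real ones from your own performance data.
NORMAL = Thresholds(min_uptime_pct=99.5, max_p99_latency_ms=800, max_error_rate=0.02)
HOLIDAY = Thresholds(min_uptime_pct=99.9, max_p99_latency_ms=500, max_error_rate=0.01)


def breaches(metrics: dict, limits: Thresholds) -> list:
    """Return one descriptive message per threshold the metrics violate."""
    found = []
    if metrics["uptime_pct"] < limits.min_uptime_pct:
        found.append(f"uptime {metrics['uptime_pct']}% is below {limits.min_uptime_pct}%")
    if metrics["p99_latency_ms"] > limits.max_p99_latency_ms:
        found.append(
            f"p99 latency {metrics['p99_latency_ms']} ms exceeds {limits.max_p99_latency_ms} ms"
        )
    if metrics["error_rate"] > limits.max_error_rate:
        found.append(
            f"error rate {metrics['error_rate']:.1%} exceeds {limits.max_error_rate:.1%}"
        )
    return found


sample = {"uptime_pct": 99.95, "p99_latency_ms": 620, "error_rate": 0.004}
for message in breaches(sample, HOLIDAY):
    print("ALERT:", message)
```

With these sample metrics, the holiday thresholds flag the latency while the normal thresholds stay quiet, which is the kind of earlier warning the adjusted thresholds are meant to give.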

Prepare Your People

Before the holidays get started, clearly communicate on-call responsibilities and strategies. 

  • This isn’t just a concern for dev teams! Make sure sales, customer success, and other customer-facing teams know who to reach in the event of an issue

  • No software is an island: know who will be on call with your system’s critical third-party dependencies 

Have an agreed-upon backup plan, especially if your current procedures assume everyone and everything will work properly.

  • Have personnel redundancies in place! 

  • What will you do if your alerting tool goes down? Make sure somebody is actively paying attention to system health
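One common safeguard against a silent alerting tool is a heartbeat check: the tool reports in on a schedule, and a separate process raises the alarm when the reports stop. The sketch below assumes you can obtain the time of the last heartbeat somehow; the function name, the five-minute window, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_stale(last_heartbeat: datetime, now: datetime,
                       max_silence: timedelta = timedelta(minutes=5)) -> bool:
    """True when the alerting tool has been silent for longer than max_silence."""
    return now - last_heartbeat > max_silence


# Hypothetical timestamps: the last heartbeat arrived 12 minutes ago.
now = datetime(2019, 12, 25, 9, 0, tzinfo=timezone.utc)
last_heartbeat = now - timedelta(minutes=12)
if heartbeat_is_stale(last_heartbeat, now):
    print("Alerting tool looks silent; notify the backup on-call person")
```

A check like this should run somewhere independent of the alerting tool itself, otherwise both can fail together.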

Be Empathetic

It’s the holiday season! Many employees look forward to this time all year long to visit rarely seen friends and family. Consider additional compensation for on-call employees. This will show you value their availability during this special time of year, and underscore the business importance of keeping systems well-monitored and performant.

Interested in joining our team? See our open positions here.

December 17, 2019
3 min read
Monitoring


About the author

Eric O'Rear