Tips for Monitoring Your System Over the Holidays
by Eric O'Rear
What do the holidays mean for developers?
For many, the joys of the holidays also include a major increase in system traffic — coupled with a major reduction in support and coverage.
In short: The holidays can be an especially challenging time for software developers.
For the teams responsible for monitoring system health — and handling issues when things go wrong — the combination of increased demand and reduced support can create a cauldron of stress. Fear not. With the right planning and communication strategies, your team can keep the holiday season merry and full of cheer.
We have some tried-and-true advice to help you prepare for this unique time of the year.
Be prepared! Do your best to anticipate when and where things are most likely to go wrong in your system.
- Look at system performance data from previous holiday seasons to understand the shift in patterns unique to your system
- Apply these patterns to your current traffic numbers, then run load tests that simulate the increase you expect
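As a rough illustration of the two steps above, here is a minimal Python sketch that derives a load-test target from last year's holiday pattern. The function names, the sample request rates, and the 20% headroom factor are all hypothetical; substitute figures from your own monitoring data.

```python
from statistics import mean


def holiday_uplift(baseline_rps, holiday_rps):
    """Ratio of last season's holiday peak to its typical pre-holiday load."""
    return max(holiday_rps) / mean(baseline_rps)


def load_test_target(current_baseline_rps, uplift, headroom=1.2):
    """Scale today's typical load by last season's uplift, plus a safety margin.

    The headroom value is an assumption, not a recommendation.
    """
    return mean(current_baseline_rps) * uplift * headroom


# Hypothetical requests-per-second samples.
last_year_baseline = [100, 110, 90]
last_year_holiday = [180, 250, 220]
this_year_baseline = [200, 210, 190]

uplift = holiday_uplift(last_year_baseline, last_year_holiday)
target_rps = load_test_target(this_year_baseline, uplift)
print(f"uplift: {uplift:.2f}x, load-test target: {target_rps:.0f} rps")
```

The resulting target is only a starting point for your load tests. If your traffic mix changed since last year, the uplift for individual services may differ from the overall ratio.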
Focus on your system’s critical path: what services are most likely to experience extra stress?
- Have autoscaling and redundancy solutions in place for your most relied-on services
- Avoid barebones teams for these mission-critical services
Make sure your automated alerts reflect updated performance expectations.
- With so much on the line for you and your customers, adjust alert thresholds around uptime, latency, and error rate so your team can stay ahead of a debilitating problem
- Structure alert information to be as helpful as possible, so whoever responds can act on it quickly: there is a real cost to downtime
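One way to picture both points is a small Python sketch that checks metrics against holiday-specific thresholds and packages each breach as a structured alert. The class names, threshold values, and alert fields here are illustrative assumptions, not the format of any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    limit: float
    higher_is_worse: bool = True  # False for metrics like uptime


@dataclass
class Alert:
    service: str
    metric: str
    observed: float
    limit: float
    runbook: str   # where the responder should start
    contact: str   # who is on call for this service


def evaluate(service, metrics, thresholds, runbook, contact):
    """Return one structured Alert per breached threshold."""
    alerts = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            continue  # metric not reported; handle separately if needed
        breached = value > t.limit if t.higher_is_worse else value < t.limit
        if breached:
            alerts.append(Alert(service, t.metric, value, t.limit, runbook, contact))
    return alerts


# Hypothetical holiday thresholds; tune these to your own expectations.
HOLIDAY_THRESHOLDS = [
    Threshold("uptime_pct", 99.9, higher_is_worse=False),
    Threshold("p95_latency_ms", 400.0),
    Threshold("error_rate_pct", 1.0),
]

alerts = evaluate(
    "checkout",
    {"uptime_pct": 99.95, "p95_latency_ms": 520.0, "error_rate_pct": 0.4},
    HOLIDAY_THRESHOLDS,
    runbook="(link to your runbook)",
    contact="primary on-call",
)
for alert in alerts:
    print(alert)
```

Carrying the observed value, the limit, a runbook pointer, and a contact in every alert means a responder on a reduced holiday team does not have to look any of that up first.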
Before the holidays get started, clearly communicate on-call responsibilities and strategies.
- This isn’t just a concern for dev teams! Make sure sales, customer success, and other customer-facing teams know who to reach in the event of an issue
- No software is an island: know who will be on call for your system’s critical third-party dependencies
Have an agreed-upon backup plan in case your current procedures assume everyone and everything will work properly.
- Have personnel redundancies in place!
- What will you do if your alerting tool goes down? Make sure somebody is actively paying attention to system health
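For the alerting-tool question above, one common safeguard is a heartbeat check (sometimes called a dead man's switch). The alerting tool reports in on a schedule, and a separate process raises the alarm if it goes quiet. The sketch below is a minimal Python illustration; the class name and the five-minute silence window are assumptions, and a real setup would run this check somewhere independent of the alerting tool itself.

```python
import time


class HeartbeatWatchdog:
    """Flags an alerting tool as stale when it stops checking in."""

    def __init__(self, max_silence_seconds):
        self.max_silence_seconds = max_silence_seconds
        self.last_seen = None

    def record_heartbeat(self, now=None):
        """Call whenever the alerting tool checks in."""
        self.last_seen = time.time() if now is None else now

    def is_stale(self, now=None):
        """True if no heartbeat has arrived within the allowed window."""
        now = time.time() if now is None else now
        if self.last_seen is None:
            return True  # never heard from it at all
        return (now - self.last_seen) > self.max_silence_seconds


# Hypothetical window: treat five minutes of silence as a problem.
watchdog = HeartbeatWatchdog(max_silence_seconds=300)
watchdog.record_heartbeat()
if watchdog.is_stale():
    print("Alerting tool is silent: page the backup on-call person")
```

A check like this supplements, rather than replaces, having a person actively watching system health.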
It’s the holiday season! Many employees look forward to this time all year long to visit rarely seen friends and family. Consider additional compensation for on-call employees. This will show you value their availability during this special time of year, _and_ underscore the business importance of keeping systems well-monitored and performant.