Over Christmas, when many are away from work, we want to know that the data centre, and everything it represents for the business, is safe. Moreover, if a problem does occur we want to find out about it quickly so that we can act accordingly.
To achieve this, we need monitoring systems, processes and people. You may think all of this is in place, but when did you last test them, and how many levels of human redundancy do you have? Who has the codes and spare keys, and what is the rota? How will you know if the alert systems fail, and how will you manage each disaster that could befall the data centre?
This is the time of year not just to confirm that the processes are written down and accessible, but to make sure they are known to all who need to play their role in protecting the DC. It is time to test! In the same way that military personnel or airline pilots go through simulations, the IT staff who will be on call over the holiday season should rehearse too. They should know the steps to take for each incident that could occur and, just as importantly, understand the path of escalation if the problem is worse than initially thought or deteriorates beyond their skillset.
You no doubt test the failover between individual servers and clusters already to ensure the data centre continues to fulfill all services in the event of a crash or hardware failure, but what about when a power failure occurs? Is everything that should happen when the DC loses power happening? Is it switching over to generators, informing staff and shutting down any unnecessary servers consuming power? These kinds of tests should be happening regularly and should also be run for connectivity.
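As a rough illustration of checking a power-failure drill against expectations, here is a minimal Python sketch. The action names and the idea of a drill log as a simple list are assumptions made for the example; they are not taken from any particular monitoring product. It only compares what should have happened with what was recorded.

```python
# Hypothetical action names, based on the three steps mentioned above.
EXPECTED_ACTIONS = {
    "generator_started",
    "staff_notified",
    "nonessential_servers_shut_down",
}


def missing_actions(observed):
    """Return the expected actions not seen during the drill, sorted by name."""
    return sorted(EXPECTED_ACTIONS - set(observed))


if __name__ == "__main__":
    # Illustrative drill log: the shutdown step never happened.
    drill_log = ["generator_started", "staff_notified"]
    for action in missing_actions(drill_log):
        print(f"Drill gap: {action} did not happen")
```

Anything the function returns is a gap to fix before the holiday rota starts.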
The final thought I want to leave you with is on the monitoring systems that you are using and, more specifically, how they are configured. Now is not the time to put in a new system, but it is important to check what the existing one is monitoring, and the conditions and parameters that will trigger an action or alert.
Are they tight enough, or in place at all? Use this time to fully assess them against your processes and IT ‘red list’ of problems. Check that alerts are going to the right people – it is more common than most would like to admit for someone who left the company two years ago to still be in the monitoring software.
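One way to catch departed staff in the alert list is to compare it with a current staff list. The sketch below assumes you can export both as lists of email addresses; the function name and the addresses are hypothetical, and how you export the lists will depend on your own monitoring software and HR records.

```python
def stale_recipients(alert_recipients, current_staff):
    """Return alert recipients who do not appear in the current staff list.

    Addresses are compared case-insensitively with surrounding spaces removed.
    """
    staff = {email.strip().lower() for email in current_staff}
    recipients = {email.strip().lower() for email in alert_recipients}
    return sorted(recipients - staff)


if __name__ == "__main__":
    # Illustrative data only.
    recipients = ["ann@example.com", "bob@example.com"]
    staff = ["ann@example.com"]
    for email in stale_recipients(recipients, staff):
        print(f"Still receiving alerts but not on the staff list: {email}")
```

A shared mailbox or an external contractor may show up here legitimately, so treat the output as a list to review rather than a list to delete.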
Finally, get your monitoring software to bring you good news too. Better to get a daily report and know all is well than to have a system configured only to send alerts with bad news! Silence breeds fear, and you’ll just worry about whether the data centre has disappeared down a sinkhole!
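The daily report also doubles as a check on the alerting system itself: if the report stops arriving, something is wrong. A minimal Python sketch of that idea follows. The 26-hour allowance (a day plus some slack) and the function name are assumptions for the example, and the check should run somewhere independent of the monitoring system it is watching.

```python
from datetime import datetime, timedelta


def report_overdue(last_report, now, max_age=timedelta(hours=26)):
    """Return True if the last 'all is well' report is older than max_age."""
    return now - last_report > max_age


if __name__ == "__main__":
    # Illustrative timestamps: the last report arrived 27 hours ago.
    now = datetime.now()
    last_report = now - timedelta(hours=27)
    if report_overdue(last_report, now):
        print("No daily report received: check the monitoring system itself")
```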
Have a great Christmas and use the time you have now to make sure of it!