Click to learn more about author Tim Mullahy.
Are you on top of your data center’s maintenance schedule? You need to be – and Cloud Automation can help you get there.
In January, the Australian Taxation Office announced plans for extended downtime. The reason? To fix ongoing hardware problems that emerged over a month ago.
This will be the latest in a series of ‘critical maintenance’ periods undertaken by the organization since the event, with downtime occurring both on the weekend of the 14 and over the Christmas break. According to a representative of the organization, the event was both unusual and unprecedented. A ‘world first.’
According to the ATO, the main reason the repair has taken so long is that the hardware failure made system restoration extremely complex.
“What compounded the problem beyond the initial failure was the subsequent failure of our back-up arrangements to work as planned,” Commissioner of Taxation Chris Jordan explained to ZDNet. “The failure of our back-up arrangements meant that restoration and resumption of data and services has been very complex and time consuming.”
On the one hand, it’s very possible that this was a worst-case scenario, and that both systems failed in spite of regular maintenance. This may have been the sort of situation data center operators have nightmares about: an unpreventable failure that more or less crippled their organization. On the other hand, it’s also possible that this whole snafu was the result of one small, apparently minor issue that IT put off fixing in favor of more pressing matters, or even forgot about entirely. In other words, a failure in Data Center Infrastructure Management (DCIM).
After all, human error is one of the leading causes of data center downtime.
Data centers are incredibly complex beasts – complex enough that human beings aren’t really capable of managing them unassisted. Documented operational processes and employee accountability are a good first step in keeping your staff on top of maintenance and testing. With the Cloud, you can take things a step further.
By integrating a sensor network with a cloud workflow management platform, it’s possible to give your staff real-time visibility into the status of each piece of equipment in your facility. You can configure automatic notifications that fire the moment something’s awry, the moment a system shows even the faintest sign of failure. More importantly, you can give yourself direct visibility into how your employees address the problem, and take action if they don’t.
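To make that workflow a little more concrete, here is a minimal Python sketch of the idea: a sensor reading crosses a threshold, a notification goes out, and the alert is escalated if nobody on staff acknowledges it in time. Everything in it is an assumption made for illustration, not the API of any real DCIM product: the `MaintenanceMonitor` class, the `inlet_temp_c` metric, the 27.0 threshold, and the 15-minute escalation window are all hypothetical, and notifications are simply collected in a list rather than sent anywhere.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Alert:
    """One open problem on one piece of equipment (hypothetical structure)."""
    equipment_id: str
    metric: str
    value: float
    raised_at: float                      # seconds, supplied by the caller
    acknowledged_by: Optional[str] = None
    escalated: bool = False


class MaintenanceMonitor:
    """Illustrative only: raises alerts on threshold breaches and
    escalates any alert that staff have not acknowledged in time."""

    def __init__(self, thresholds: Dict[str, float], escalate_after_s: float = 900.0):
        self.thresholds = thresholds          # metric name -> upper limit
        self.escalate_after_s = escalate_after_s
        self.open_alerts: Dict[Tuple[str, str], Alert] = {}
        self.notifications: List[str] = []    # stand-in for email, SMS, or chat

    def ingest(self, equipment_id: str, metric: str, value: float, now: float) -> Optional[Alert]:
        """Record a sensor reading; open an alert if it exceeds its limit."""
        limit = self.thresholds.get(metric)
        key = (equipment_id, metric)
        if limit is None or value <= limit or key in self.open_alerts:
            return None                       # in range, unknown metric, or already open
        alert = Alert(equipment_id, metric, value, raised_at=now)
        self.open_alerts[key] = alert
        self.notifications.append(f"ALERT {equipment_id}: {metric}={value} exceeds {limit}")
        return alert

    def acknowledge(self, equipment_id: str, metric: str, staff: str) -> bool:
        """Mark an open alert as picked up by a named staff member."""
        alert = self.open_alerts.get((equipment_id, metric))
        if alert is None:
            return False
        alert.acknowledged_by = staff
        return True

    def escalate_stale(self, now: float) -> List[Alert]:
        """Escalate, once each, alerts left unacknowledged past the window."""
        stale = [
            a for a in self.open_alerts.values()
            if a.acknowledged_by is None
            and not a.escalated
            and now - a.raised_at >= self.escalate_after_s
        ]
        for a in stale:
            a.escalated = True
            self.notifications.append(
                f"ESCALATE {a.equipment_id}: {a.metric} alert unacknowledged"
            )
        return stale


# Example usage with made-up equipment names and timestamps (in seconds).
monitor = MaintenanceMonitor({"inlet_temp_c": 27.0}, escalate_after_s=900.0)
monitor.ingest("crac-1", "inlet_temp_c", 31.5, now=0.0)    # opens an alert
monitor.ingest("crac-2", "inlet_temp_c", 29.0, now=0.0)    # opens another
monitor.acknowledge("crac-2", "inlet_temp_c", staff="on-call tech")
overdue = monitor.escalate_stale(now=1000.0)               # only crac-1 is overdue
```

The point of keeping the acknowledgement on the alert itself is that the record shows not just that something went wrong, but who picked it up and whether anyone did at all. A real deployment would replace the in-memory list with whatever notification and ticketing services the platform provides.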
“The arguments for Cloud-based DCIM solutions are getting stronger and stronger,” reads a post on the Schneider Electric Blog. “The economics are compelling with no upfront capital costs and no impact upon existing staff resources. Clearly documented and visible security standards mean that critical organisational data is not compromised. Stuff scales better and provides better performance than a traditional client-based server architecture, [while] sharing and integration of data across silos enables cost reductions through better efficiencies and effectiveness.”
In other words, by incorporating the Cloud into your DCIM processes, you can ensure that problems are fixed the moment they arise, and that your facility operates like a well-maintained, well-oiled machine. And while it’s impossible to completely do away with downtime, that’ll go a long way towards reducing it.