Ten Steps To Resolving Issues In DCIM Implementations


Andrew Waterston, transformation executive/independent consultant, has written an exclusive follow-up to his DCIM/Dead Camel In Motion article, with 10 steps to resolving issues in DCIM implementations.

Andrew Waterston

Following my blog suggesting that data centre infrastructure management (DCIM) tools are like the horse designed by committee, and hence should be renamed ‘Dead Camel In Motion’, several folk have asked how to mitigate risk in DCIM implementations. Ironically, some of these folk tried to avoid risk by selecting large ‘financially robust’ vendors and are now in the middle of expensive DCIM implementation failures, with the vendors pulling support and laying off staff. These are no longer ‘risks’, but are now ‘issues’. A ‘risk’ has both ‘likelihood’ and ‘impact’, while an ‘issue’ has only ‘impact’, because it is a risk that has already happened.
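The distinction can be made concrete in a risk register. What follows is a minimal Python sketch rather than a prescribed method: the class names, the 0-to-1 likelihood scale and the likelihood-times-impact exposure score are all assumptions made for illustration, and the figures are invented.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """Something that may happen: it has a likelihood and an impact."""
    description: str
    likelihood: float  # assumed scale: probability between 0 and 1
    impact_gbp: int    # assumed measure: cost to the benefits if it happens

    def exposure(self) -> float:
        # A common scoring convention (an assumption here): likelihood x impact.
        return self.likelihood * self.impact_gbp

    def materialise(self) -> "Issue":
        # Once the risk has happened, likelihood drops out; only impact remains.
        return Issue(self.description, self.impact_gbp)


@dataclass
class Issue:
    """A risk that has happened: only the impact remains."""
    description: str
    impact_gbp: int


# Invented example values, for illustration only.
risk = Risk("Vendor withdraws support for the DCIM product", 0.25, 400_000)
issue = risk.materialise()
print(risk.exposure())   # 100000.0
print(issue.impact_gbp)  # 400000
```

The point of the sketch is that an issue cannot be discounted by its likelihood: the full impact has to be managed.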

Before considering how to mitigate risk and resolve issues we should establish a baseline of understanding. Both risks and issues need to be considered in the context of their impact on benefits realisation.

Successful implementation of a DCIM tool will deliver outcomes. For example:

• To see, and therefore manage, assets throughout their lifecycle, thereby reducing ghost equipment and enabling effective configuration and capacity management

• To understand power and network connectivity

• To enable auditable changes to physical assets (devices, cables, ports etc)

• To understand power usage and highlight power and cooling risks etc.

These outcomes will deliver agreed, measurable, time-bound benefits defined in the business case, e.g.

• To delay the purchase of additional capacity in DC1 by 18 months saving £3.25m in FY18

• To enable DC3 to close on 24th July 2018 (end of contract) with capacity moved to DC4 saving £2.5m in FY18

• To delay the purchase of additional network, power and server capacity by four months, saving £216k in Q2 2017

• To reduce the cost of IT maintenance by £55k per quarter

• To enable a reduction in energy consumption of £50k per month

• To delay the hiring of 3 DC ops staff by 12 months, saving £100k in FY17

• To reduce corporate risks ‘XR24 & CR19’ relating to cyber security (perimeter threat) and sensitive data loss to levels deemed acceptable by the risk board

In reality, quantifying the benefits and attributing them solely to the DCIM implementation is often difficult, which raises the risk that the benefits are overstated in order to justify the high cost of the DCIM.
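One way to keep such claims honest is to record, for each benefit, how much of it is credited to the DCIM. A minimal sketch follows, assuming a hypothetical `attribution` share per benefit. The gross figures are the two FY18 examples listed above; the 50% and 100% shares are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Benefit:
    description: str
    gross_saving_gbp: int
    attribution: float  # hypothetical: share of the saving credited to the DCIM (0 to 1)

    def attributed_saving(self) -> float:
        return self.gross_saving_gbp * self.attribution


# Gross figures come from the FY18 examples in the business case list above;
# the attribution shares are illustrative assumptions, not real data.
fy18 = [
    Benefit("Delay capacity purchase in DC1 by 18 months", 3_250_000, 0.5),
    Benefit("Close DC3 at end of contract", 2_500_000, 1.0),
]

gross = sum(b.gross_saving_gbp for b in fy18)
attributed = sum(b.attributed_saving() for b in fy18)
print(f"Gross FY18 saving:      £{gross:,.0f}")
print(f"Attributed to the DCIM: £{attributed:,.0f}")
```

Reporting both totals side by side makes any gap between the headline saving and the defensible saving visible to whoever approves the business case.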

So, there are implementation issues, the DCIM is not delivering the outcomes expected, costs are escalating and benefits are not being realised. What can you do?

‘Traditional’ organisations talk about ‘failure’ and how to avoid it. Failure is embarrassing and especially ‘career limiting’ if it is expensive (as it is in DCIM). This leads to a culture that refuses to recognise the ‘Dead Camel In Motion’ for as long as possible, in the hope that it will either miraculously cure itself or, more likely, that the senior responsible owner (SRO) who approved the investment, and who is accountable for delivering the benefits, will move on. The next executive will then be able to lambast and write off the programme and start again.

What seems to be happening in data centres at the moment is that after a lengthy implementation, the original DCIM product is end of life before it has been fully installed. This offers the SRO some interesting opportunities for issue management.

Next generation organisations take a different approach to ‘failure’. They see failure as successfully proving that something does not work. Their mantra is to “fail as quickly and as cheaply as possible and move on” as the next idea could well be the one that delivers massive benefits to the company.

So, here is an SRO’s 10-step plan to resolve issues in failed DCIM implementations:

1. Accept you have successfully proved the original DCIM tool does not work and stop flogging the dead camel. Document the lessons learned and the data and processes that can be reused.

2. Talk to finance about the impact of writing off the original investment.

3. If culpability is clear (which is unlikely), consider an action against the original DCIM supplier to recover some of the original costs. Successfully implementing a simpler alternative may well help with such an action.

4. Extend the original programme to incorporate the upgrade to an entirely new DCIM solution, which can be completely different and should build on the lessons learned. A low-cost replacement might be fully funded from the savings in the forward costs of support and maintenance of the original tool.

5. Do a MoSCoW analysis on your requirements based on lessons learned, and your benefits realisation plan, to focus on the areas that will deliver the greatest benefits. In practice this always starts with effective asset management (an asset being anything there is value in managing, rather than having intrinsic value).

6. Carry out a 100% audit of the data centres and all outlying technology equipment rooms, and get the asset information into a simple, proven tool (e.g. Trackit Mobile) that will let you see the data immediately and give you confidence that the data can be exported into your chosen DCIM replacement. Depending on its audit workload, Trackit sometimes offers free use of Trackit Pro (data centre operations tool) for up to six months after completing an audit.

7. Implement a robust change process in the data centres so you can be sure that the asset data is always accurate. Include the ability to carry out quick ‘check’ audits that a manager can do to keep the data centre operators on their toes and build trust in the quality of the data.

8. Procure a much simpler, cheaper, modern data centre operations tool. The lower the cost, the fewer benefits need to be realised to justify the investment. Import the audit data into a demo environment of the new tool before you buy. Start realising the benefits.

9. Create the API links between the new DC Ops tool and your power monitoring solution.

10. Create the API links to the configuration management system to highlight inaccuracies in the CMDB.
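Steps 6, 7 and 10 all rest on comparing two sets of asset records. Here is a minimal sketch of that comparison, assuming that both the audit export and the CMDB can be reduced to a mapping from asset tag to rack location. The function name, tags and locations are hypothetical, and no real tool's API or export format is implied.

```python
def reconcile(audit: dict[str, str], cmdb: dict[str, str]) -> dict[str, list[str]]:
    """Compare audited assets with CMDB records, keyed by asset tag."""
    return {
        # In the CMDB but not found on the floor: candidate stale records.
        "missing_from_audit": sorted(cmdb.keys() - audit.keys()),
        # Found on the floor but unknown to the CMDB: candidate untracked equipment.
        "missing_from_cmdb": sorted(audit.keys() - cmdb.keys()),
        # Present in both, but recorded in a different location.
        "location_mismatch": sorted(
            tag for tag in audit.keys() & cmdb.keys() if audit[tag] != cmdb[tag]
        ),
    }


# Hypothetical sample data: asset tag -> rack location.
audit = {"A100": "DC1/R01/U10", "A101": "DC1/R01/U12", "A103": "DC1/R02/U04"}
cmdb = {"A100": "DC1/R01/U10", "A101": "DC1/R03/U12", "A102": "DC1/R02/U20"}

report = reconcile(audit, cmdb)
for category, tags in report.items():
    print(category, tags)
```

The same comparison, run on a small sample of racks instead of the full estate, would serve as the quick ‘check’ audit described in step 7.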

Now that you have a successful programme, continue to develop your processes and reporting to realise new benefits.