The headlines about failed outsourced public sector IT projects in the UK are becoming incredibly monotonous, with contracts in the NHS, HMRC and other departments being halted or delayed despite huge sums being invested. But the UK is not alone in facing this kind of difficulty. The Australian Tax Office (ATO) recently disclosed that its new arrays had an ‘issue’ that put 1PB of public data at risk. Ten days later, the ATO was still trying to restore data from backup tapes – a situation that is far from ideal.
With this in mind, Peter McCallum, director of data centre solution architecture at FalconStor, explains what lessons the UK can learn from Down Under.
Public sector outsourcing has proved hugely popular around the world as a way to deliver better value for taxpayers, harness private sector expertise and deliver much-needed improvement to vital digital services. Yet it has a serious downside. What is our recourse when things go wrong? What happens when SLAs are not met? And when projects do go off course, who foots the bill?
The headlines about failed outsourced public sector IT projects in the UK have a disturbing ‘Groundhog Day’ feel about them. Contracts in the NHS, HMRC and a range of other departments have been halted or delayed despite huge sums being invested in order to deliver a range of vital services or improvements.
Less than a year ago, the Cabinet Office recommended a review of £500m of outsourced IT contracts, and separately, the Scottish Police Authority terminated an outsourced IT contract because ‘the technical solution cannot be delivered within expected timeframes and budget’. Last October, the Financial Times ran a story that called for the risk of public sector IT projects to be shared with providers, rather than being ‘borne by the state alone’.
2017 had barely begun when a National Audit Office report detailed how IT problems were a major contributing factor to the failure of an HMRC contract intended to save £1bn over three years. Yet the UK is far from alone in facing this kind of difficulty. A closer look at some of the recent challenges faced in Australia helps underline the point that governments and public sector organisations should look beyond their own borders and learn from each other, no matter where they are.
The Australian Tax Office (ATO), for example, ran for many years on a heterogeneous infrastructure platform (many vendors and products working together) until it changed approach and went with a single-vendor strategy on what was, at the time, a ‘true active/active symmetric highly available environment’.
In mid-December 2016, the ATO disclosed that its new arrays had an ‘issue’ that put over 1PB of public data at risk. As it turns out, one storage system experienced data corruption and replicated it to the other. Ten days later, the ATO was still trying to restore data from backup tapes. For weeks following the incident, there were maintenance outages as systems were brought back online and repaired.
Remember, the tax office moved away from a multi-vendor system in which different technologies were bridged via ‘abstraction layer’ software to provide services. In that kind of environment, individual hardware failures are far less impactful. Think on this: if you have two of the exact same item, a flaw in one is a flaw in the other. Activating that flaw in one is highly likely to activate it in the other.
This ‘flaw-sharing’ issue is compounded by the fact that these arrays (and almost every SAN on the market) are only designed to talk to their own kind. A 3PAR SAN cannot replicate to an EMC SAN; any deduplication and snapshots are all located within the same box; and any reporting or early warning systems only apply to the array itself. It’s a systemic condition designed to retain customers and force a single-vendor affinity.
Counting the cost of counting
Similarly, finding out who your citizens are is a very expensive process. The five-year budget for Australia’s census was roughly $470m, and the Australian Bureau of Statistics (ABS) hired a service provider to put it all online and save taxpayers over $100m.
On the night of the census, an apparent problem in the networking equipment triggered false alarms of attacks, which shut down the website. The result was that 23 million people (or more) were unable to access the service. The ABS website was down for over 40 hours due to a bit of code that apparently didn’t load correctly during a reboot of a piece of hardware. By way of comparison, in the US, the Affordable Care Act (ObamaCare) website cost $2.1bn and didn’t function as it was supposed to on day one either.
The reason why this is important is that in the follow-on press conferences with government officials, the service provider was squarely blamed for the problem. The service provider, of course, blamed one of its own providers, who, of course, said that the service provider had been offered, but did not take up, the service that would have prevented this. ABS stated that it made some mistakes and probably should have extended the consulting time a little longer.
But look at it this way: geography is, in one regard, irrelevant. Take location out of it and there’s a very familiar ring to these issues.
So, what lessons can be shared internationally? Maybe if systems were designed with resiliency in mind from the outset, these problems wouldn’t have happened? In the case of the census project, for example, there is no excuse in this day and age for not having a full backup system running in another data centre, behind a different router. There is no reason why the website couldn’t have failed over within minutes to Amazon Web Services, load balanced with Microsoft Azure, for example.
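To make the failover idea concrete, here is a minimal sketch in Python of how a service might choose which site to send traffic to when the primary is unavailable. It is an illustration under stated assumptions, not a description of how the census site was or should have been built: the site names, provider labels and URLs are hypothetical, and the health check is simulated rather than a real network probe.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Site:
    """One deployment of the service. All names and URLs are hypothetical."""
    name: str
    provider: str
    url: str


def pick_active_site(
    sites: Sequence[Site],
    is_healthy: Callable[[Site], bool],
) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        try:
            if is_healthy(site):
                return site
        except Exception:
            # A health check that raises is treated the same as an unhealthy site.
            continue
    return None


# Priority order: primary data centre first, then standbys hosted with
# different providers so that one provider's fault does not take out every copy.
SITES = [
    Site("primary-dc", "in-house", "https://primary.example.invalid"),
    Site("standby-a", "cloud-provider-a", "https://standby-a.example.invalid"),
    Site("standby-b", "cloud-provider-b", "https://standby-b.example.invalid"),
]


def simulated_check(down: set) -> Callable[[Site], bool]:
    """Build a fake health check that reports the named sites as down."""
    return lambda site: site.name not in down


if __name__ == "__main__":
    # Simulate the primary data centre being unreachable.
    active = pick_active_site(SITES, simulated_check({"primary-dc"}))
    print(active.name if active else "no healthy site")
```

In a real deployment the simulated check would be replaced by an actual probe, and the decision would normally sit in DNS or a load balancer rather than in application code. The point of the sketch is only that the fallback targets sit with different providers and in different locations, so a single fault cannot take them all down together.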
Every system today needs to be designed from a multi-vendor, multi-location perspective. Point products (products that can only do one thing) cannot be allowed to operate in oversight vacuums and must integrate or be monitored as part of an intelligent system. Public sector bodies need to demand more from their infrastructure and stop operating to the limitations of their service providers and their technology. By demanding accountability from vendors, providers and their technology choices, there is a real prospect of better delivery, where savings and service benefits make the headlines more often than the failures.