By Tony Collins
Experts are questioning BA’s explanation of the power problems that disrupted the travel plans of 75,000 people at the weekend.
BA says it is “reviewing” what went wrong at the weekend but is under no regulatory duty to publish the findings.
There is little pressure from shareholders to hold BA to account. The share price of BA’s parent International Airlines Group is higher today than a month ago.
Yesterday the BBC’s business editor Simon Jack accused IAG of dodging tough questions it will “surely have to answer” and the FT quoted IT and electricity experts who are sceptical of the airline’s explanations.
But MPs on the Transport select committee – a new one will be formed after the general election next week – could decide, if pressed by their constituents, to have an inquiry into BA’s power problems.
If so, they could question BA’s chief Alex Cruz or Willie Walsh, the chief executive of IAG.
In 1997 the committee held an inquiry into the escalating costs and problems on IT contracts at the Swanwick air traffic control centre in Hampshire. MPs decided to publish the contents of an independent report into the problems by technology consultancy Arthur D Little.
Any 2017 inquiry by the committee could hold BA to account in a way that would not otherwise be possible. Lessons from the failures may be useful to the public and private sectors.
Meanwhile, accounts of what went wrong, and why, seem confused.
The Telegraph says the BA review is focusing on the uninterruptible power supply (UPS) to Boadicea House, one of two data centres close to Heathrow airport.
The UPS in question delivers power through the mains, diesel and batteries.
On Saturday morning, shortly after 8.30am, power to Boadicea House through its UPS was shut down. The reasons are unclear.
If power had returned to the servers in Boadicea House slowly, this would have allowed the airline’s other Heathrow data centre, at Comet House, to take up some of the slack, said the Telegraph.
But, on Saturday morning, just minutes after the UPS went down, power was resumed in what one Telegraph source described as an “uncontrolled fashion”.
This caused “catastrophic physical damage” to BA’s servers, which contain everything from customer and crew information to operational details and flight paths.
The Telegraph said that if power had been restored more gradually, BA would have been able to cope with the outage, and return services far more quickly than was the case.
The FT said yesterday that the UPS malfunctioned, cutting off the power supply. But it said that “some people working in the field have questioned” the explanation. They said it is very rare for UPS systems to fail. Even if they do, it should not affect the continued supply of mains electricity to the data centres they serve.
Not a technology problem?
BA has said there was an “immediate loss of power” from the UPS. When power returned, a surge physically damaged its IT servers. It had to replace the damaged equipment.
Willie Walsh said the meltdown was not a technology problem. The FT quoted him as saying, “You give me any IT system in the world and I’ll show you how good it is when it doesn’t have any electrical power going to it.”
Walsh insisted there was “no data loss, no data corruption”. He said the IT systems “functioned how they are supposed to function.”
But the FT quoted Jonathan Glover, co-founder of PSI, a company that helps businesses protect their equipment against sudden, unexpected power surges, who said the failure of a UPS “was relatively unlikely as they are robust and well-proven pieces of equipment”.
He added that, even if the UPS system did fail, it should not make a difference to the power supply to the airline’s IT system. The answers given don’t make a lot of sense, he said.
Alan Woodward, visiting professor at the department of computer science at the University of Surrey, agreed. He told the FT,
“It is like on your laptop and if you just pull the plug out of the back, it shouldn’t affect your laptop. It keeps running until the battery runs down. Even if you unplug the battery [of a laptop], it doesn’t like it from a data perspective, but plug it back in again, you don’t suddenly get a big power surge.”
Woodward said one possible explanation was that a voltage regulator within the UPS might have malfunctioned, but he added that when such regulators fail the power usually stops.
Another expert on UPS technology said that even if the system had failed, it would simply have been bypassed and normal electricity supply should have continued.
Why would the failure of the UPS affect BA’s back-up data centre? The answer is unknown. BA would not comment on whether their two Heathrow-based data centres relied on the same UPS.
Ryanair on Tuesday pointed out that it had IT systems in three locations around Europe and if one went down, there were backups at each of its data centres. Ryanair’s data centres are not close to each other.
Two electricity companies whose low-voltage networks cover Heathrow airport and the surrounding area have denied there were any issues on their networks on Saturday morning.
Transient voltage surge arresters can shield against power surges from the local electricity network and against malfunctions in a company’s own equipment, but it is unclear whether BA had these fitted and, if it did, whether they worked.
The FT quoted an expert as saying that BA either had inadequate defences or didn’t have the right level of industrial surge protection. BA has not commented on what protection measures it had.
Will BA publish its review?
BA may be reluctant to reveal the results of its review for various reasons. It appears that parts of its IT in the UK could be run by non-BA staff. The failures could raise questions about the corporate oversight of any non-BA specialists, possibly at board level.
It is also possible that an internal review could highlight fundamental managerial weaknesses – such as unclear or confused IT responsibilities in the UK or at IAG – after the outsourcing of IT skills to India last year.
Damian Brewer, an analyst at RBC Capital Markets, told the Telegraph that if BA’s early diagnosis of the cause of the crisis is correct, bosses’ failure to prepare for such an incident in the light of other carriers’ problems “suggests fundamental management and planning weakness”.
“It seems highly questionable why similar incidents with major US carriers in the last year have failed to see IAG move to ensure its airlines had plans in place to mitigate this risk, already seen elsewhere, and also to have contingency plans in place,” he said.
“At present, it appears that BA management have seemingly not taken account of IT risk precedent already seen and already known at other carriers.”
Much of what BA has said publicly about the IT problems has focused on what didn’t happen (a cyber attack) and on who was not responsible (Tata in India or energy companies). It told the BBC the problems were “definitely not a consequence of underinvestment or cost-cutting.”
“All the parties involved around this particular event have not been involved with any type of outsourcing in any foreign country,” said Cruz. “They have all been local issues around a local data centre who [sic] has been managed and fixed by local resources.”
Without an inquiry by the newly-formed Transport Committee, BA will find it easy to keep the lid on the results of its inquiry into the failures. This would be a pity given the lessons that could be learned.
It’s ironic that the aviation industry has an exemplary reputation for reporting even minor problems that relate to safety. There is a duty to report even a ruffled carpet in an aircraft aisle that could trip up passengers or crew.
But there is no duty to account for an IT failure that disrupted the lives of 75,000 people across the world because it was not a safety issue. Provided the company pays satisfactory compensation, the fiasco will probably be out of the public eye in a few months.
But MPs, on behalf of their constituents, could hold BA to account.
Anyone who wants to ask MPs to hold an inquiry into the BA failures could write to:
House of Commons
Telephone: 020 7219 3266
The Committee’s clerk is Gordon Clarke: firstname.lastname@example.org
Thank you to Dave Orr for his regular updates on the BA problems
BA’s IT: Will Transport Committee MPs ask the tough questions? – Government Computing
Full details of meltdown revealed (says Daily Telegraph)
BA board to demand IT chaos inquiry – Simon Jack, BBC