By Tony Collins
Outsourcing to India and losing IBM mainframe skills in the process? The failure of CA-7 batch scheduling software which had a knock-on effect on multiple feeder systems?
As RBS continues to try to clear the backlog from last week’s crash during a software upgrade, many in the IT industry are asking how it could have happened.
Stephen Hester, RBS’s boss, told the BBC today:
“In simple terms there was a software change which didn’t go right. Although that was put right quickly there then was a big backlog of things that had to be reprocessed in sequence. That got on top of our technical teams … it is like the landing path at Heathrow. Once you get out of sequence it takes a time to get back into sequence even if the original fault is put right.
“Our people are working incredibly hard … I am pleased to report that as of today RBS and Natwest systems are operating normally.
“We need to make sure they stay normal for the next few days. There is still some significant catch-up today, much less tomorrow and so on as we go through the week.”
The immediate technical cause of the problems may not have been too difficult for those inside the bank to identify. But finding out how and why it happened, why processes were not in place to stop a backlog of work building up, and why testing of the upgrade did not anticipate the failure could take weeks, possibly months.
Attributing blame could take many years. BSkyB appointed EDS in 2000 to supply a CRM system; the project failed, and it was ten years before a court reached a judgment on blame. The cause of the failed project was never definitively established.
Official cause of system crash
The official cause of RBS/Natwest’s problems was given at the weekend by Susan Allen, Director of Customer Services at RBS Group, which includes Natwest and Ulster Bank. She told Paul Lewis of BBC’s Moneybox programme:
“Earlier this week we had a problem in our overnight backup. So a piece of software failed that started all the updates that happened to our systems overnight.
“What that has meant practically is that information on customers’ accounts has not been updated… It is horrendous.
“The underlying problem has been fixed, so the computer software that failed has been replaced. That is in and working. The challenge we now have is bringing all the systems back up and working through all the data that should have been gone through over the last three nights …
“We have 12 million customers in Natwest and RBS and just over 100,000 in Ulster Bank. So it is affecting a serious number of people. It is having a terrible impact.
“We are encouraging all of our customers to call us, come and see us in our branches … we have branches open late … and have doubled the number of people on the phone. Call centres are open 24 hours a day.”
Call centres use 0845 numbers, which some callers have to pay for.
Lewis: Why are you making people pay to fix a problem that’s your fault?
“Customers should not be having to pay for those calls,” replied Allen. “If that is a problem for people we will take a look at that.”
Lewis: Will you reimburse people for their calls?
“Absolutely. We recognise there will be lots of different expenses as a result of this. We apologise and want to make sure they are not out of pocket. If people have got claims they should put them through to us … we will need the information to deal with the claims.”
Lewis: Will you refund late-payment charges imposed by credit card companies?
“We will. We will… we will make sure nobody is out of pocket… in one instance we got cash in a cab to a customer’s home… clearly we trust our customers so if we can see that somebody has a certain amount coming in every week we will give them money against that. So we ask people to come in and bring identification with them such as their bank card, we will do what we can to help.
“We will look after our customers. We realise this has had a huge impact on people. We are not underestimating it … clearly there are things that have gone wrong and we cannot put everything right.”
Lewis: How much damage has this done to the reputation of the bank?
“Time will tell. For us it is pretty devastating. We pride ourselves on being a bank that really cares about our customers and wants to deliver great service. We absolutely mean it.”
Lewis: Should you get a bonus?
“We only get performance bonuses when we perform and this has not been a good performance.”
Her explanation of the cause of the IT crash was unclear: she spoke of a problem in the “overnight backup”, then described a failure of the software that starts the overnight updates. Otherwise, Susan Allen’s answers to Paul Lewis’s questions were exemplary. Her openness and unaffected humility are surely the best way to handle a PR crisis. Small comfort for the millions affected, though.
Technical cause of the crash?
Some of those commenting on The Register appear to have a good knowledge of RBS systems. Some suggest that RBS has lost important IBM mainframe software skills through outsourcing.
One or two have suggested that the crash was caused by a failure of the bank’s CA-7 batch scheduling software. In February RBS had an “urgent requirement” in Hyderabad, India, for people with four to seven years’ experience of CA-7.
One comment on The Register said that RBS runs updates on customer accounts overnight on an IBM mainframe, via a number of feeder systems that include BACS. “The actual definitive customer account updates were carried out by a number of programs written in assembly language dating back to about 1969-70, and updated since then. These were also chock-full of obscure business rules … and I do not believe anyone there really knew how it all worked anymore, even back in 2001…
“Of course the moral is complex mainframe systems require staff with the skills, and in this case, the specific system knowledge to keep things smooth. The fewer of these you have, the more difficult it is to recover from problems like this.”
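The comments above describe a chain of overnight jobs in which each step depends on earlier ones. The minimal Python sketch below illustrates, under that assumption, why one failed step can hold up far more than itself. It is not how CA-7 or RBS’s systems actually work: the job names, the dependency map and the `run_batch` function are all hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_batch(dependencies, failed):
    """Walk a hypothetical job stream in dependency order.

    dependencies maps each job to the set of jobs that must finish first.
    failed is the set of jobs that break when they run.
    Returns (completed, held): jobs that ran successfully, and jobs that
    either failed or were held back because something upstream failed,
    listed in the order they would have to be rerun.
    """
    completed, held = [], []
    blocked = set()
    for job in TopologicalSorter(dependencies).static_order():
        upstream_blocked = any(dep in blocked for dep in dependencies.get(job, ()))
        if job in failed or upstream_blocked:
            blocked.add(job)   # anything depending on this job is now stuck too
            held.append(job)
        else:
            completed.append(job)
    return completed, held


# Hypothetical overnight job stream; names are invented for illustration.
overnight_jobs = {
    "load_bacs_feed": set(),
    "load_card_feed": set(),
    "post_to_accounts": {"load_bacs_feed", "load_card_feed"},
    "update_balances": {"post_to_accounts"},
    "produce_statements": {"update_balances"},
    "card_fraud_report": {"load_card_feed"},
}

completed, held = run_batch(overnight_jobs, failed={"post_to_accounts"})
print("completed:", completed)
print("held back:", held)
```

In this toy run a single failed posting job strands every job downstream of it, while unrelated work still finishes. The held list comes out in dependency order, which loosely mirrors Hester’s point that backlogged work had to be “reprocessed in sequence”, with each new night’s work queuing behind it.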
Robert Peston, the BBC’s Business Editor, asks whether outsourcing was to blame.
“In my conversations with RBS bankers, there is an implication that outsourcing contributed to the problems – though they won’t say whether this is an issue of basic competence or of the complexities of co-ordinating a rescue when a variety of parties are involved.”
An RBS spokesperson told The Register that the error occurred in a piece of UK-based software.