The dirty secret of systems development in financial services and capital markets is that poor quality has become the norm. Technology ‘support’ has become a euphemism for addressing user complaints and fixing poorly performing software. But IT systems can be designed and implemented so that they work straight out of the box and keep on working.
The dirty secret of systems development in financial services and capital markets is that poor quality has become the norm. Late and over-budget delivery of new systems is the perennial experience of many, while a single word — “support” — summarizes the quality problem. How many finance directors have scratched their heads and asked, “What are they all doing?” when confronted with headcounts and budgets for support personnel? How many have received a convincing answer? Here is the honest answer: Support is just a euphemism for fixing it.
In this context, “support” does not mean the essential processes needed for real-time service delivery. Instead, it means the process of translating a multitude of complaints from internal and external customers about poor-quality software (functionality, utility, and performance) into a list of items that either are missing and should have been there in the first place, or are in there but not working correctly. Low customer satisfaction is the outcome of this quality gap. Even when adding new functionality for new versions (a fundamentally worthwhile activity), support personnel can consume much more time and money than expected because of underlying quality problems with the system.
For the avoidance of doubt, “support” is not a cunning plan to extract more money in perpetuity from customers when the system is built by a third party, though that is how it turns out in reality. Rather, “support” has been allowed to become an industry norm. If everyone assumes — based on experience — that all systems will have to be fixed on an ongoing and permanent basis, then everyone behaves accordingly when planning, budgeting and building. But that is a mistaken assumption.
The fact that support of the fixing variety does not arise in some business domains because it cannot be allowed to arise shows us what is achievable for our industry as well. Those domains of zero support are ones where software is embedded on the chip of a device out in the field, meaning that updating or patching the code is impossible, inconvenient or undesirable. Such domains include military, space, and consumer goods technologies (e.g., nuclear missiles, deep space exploration satellites, and washing machines). Mobile technology and the Internet of Things have recently made real-time updating possible on some remote devices; but even where it is possible in theory, it might still be undesirable. You definitely would not want internet communications on many military products, unless you relish the idea of the kid next door holding the world to ransom!
It is no coincidence that countries where the education system for computing is centred on embedded software development are producing engineers who do not assume post-hoc fixing. Those educators and engineers have a mind-set in which any need for subsequent fixing of a system once built and deployed is viewed, as it was described to me, as “total project failure” — even when it is possible to implement the fix (e.g., software sitting on a server in a bank). It is a mind-set baffled by the easy acceptance of lower-quality software in our industry.
Those countries (or at least the ones with which I am familiar) are in Eastern Europe. The quality difference that such an education system makes to technology-based products, services and systems is huge. The more I worked with people and firms from these regions over the past 20 years, the more I realized that support did not exist for them because they had a different mind-set (different assumptions), not because they did the work differently. They didn’t do work differently; they did work better. Like the Ferrari Formula One racing team, I also change the wheels on my car — but they do it a lot better than me! Eastern European software engineers and firms have a “right first time” mind-set that is so deeply embedded as to be self-fulfilling, giving the right functionality, utility and performance (especially reliability and stability). “Support” goes back to being what it is supposed to be: helping customers use and maximize the value of the system, product or service.
Where does quality come from? Culture is clearly a crucial driver, but what is it that comes out of culture that ensures quality? It does not come from testing, or at least not entirely. Not if tests are based on a set of requirements that are incomplete or just plain wrong (one of the main causes of “support”). What is missed originally cannot be tested for subsequently. Yet quality cannot come only from requirements documents; otherwise the obsessive effort of requirements definition in “waterfall” software development methods would produce the best solutions, when in reality it often produces the worst. In my experience, the best business solutions come from iterative agile methods in which requirements and designs pivot (change) on feedback from real customers after regular releases and demonstrations of updated and evolving prototypes during the development cycle — i.e., not at the end of the process with the big reveal of the new system that you get in waterfall approaches. However, iterative approaches do not automatically solve the support issue either, as it is the technical implementation in all its guises (rather than the business design) that is the crux of whether the solution will need to be fixed in perpetuity.
The way that technical design affects quality can be compared to the infinite number of ways that the components of a car could be configured into a machine that “worked”: the driver facing backwards and using the mirrors to see where he or she is going could work in theory, but accidents (a.k.a. support calls) would be rife. Now think of the infinite opportunity for system developers to configure code into objects and modules and then join them together. Sure, you can join them together in ways that appear to work in a development environment. They may even appear to work in many user acceptance testing environments. (And when they don’t, someone always says, “Don’t worry. That issue will be fixed before the big day!”) But once they go into operational acceptance testing, the fact that they have not been designed and integrated in the most logical, efficient and effective way will soon become apparent.
So why do so many poor systems go into production? Answers:
- [the obvious one, and a real bugbear of mine] Weak or non-existent user and operational acceptance testing.
- Political, competitive or egotistical drives for systems to go live on a certain date, no matter what.
- Lack of operational (service delivery) skills and experience in the development team. (Thank goodness for the DevOps movement, but it has a long way to go to be the norm and to have the power that it needs to say “no.”)
- Poor communication between the development team and the service delivery and infrastructure management organisations (see the comment above about DevOps).
Sometimes the consequences of these deficiencies are so serious that they hit the national and international press, with service outages for hours, days or weeks for new high-profile systems — e.g., the major online banking outage for a UK bank in 2018. Or new system developments are abandoned completely and all costs are written off — e.g., an incredible £11 billion written off in a National Health Service system development in the UK. In 2012, McKinsey reported that 17% of large IT projects fail so critically as to threaten the very existence of the company.
When a poorly designed or poorly built system gets into production, that is when the support industry around it becomes most expensive, perpetually fixing a system that is fundamentally flawed and effectively unfixable. That explains why there are more support tickets after three years than after one year, and more after five years than after three (happy to solve that mystery for those who were wondering). You cannot fix an unfixable system; all you can do is cover it in Band-Aids (and those Band-Aids themselves get added to the list to be fixed). When the same mind-set is used to support a system as was used to build it, then all you ever do is make things worse. That is when you feel like you are slowly sinking to the bottom of the pond but cannot understand why. Then, one day someone finally proposes that you build a new system to replace the old one.
In summary, it doesn’t have to be this way. The dirty secret can be eliminated, but only if we admit its existence and do something about it. The need for perpetual fixing of IT systems is not preordained. IT systems can be designed and implemented such that they work straight out of the box, having arrived on time and on budget, and keep on working. For all IT systems in all industries to work the way that they should, customers and suppliers need to believe that it can be done, and then demand of each other that their cultures and methods are consistent with that assumption.
If you design systems well, adopt the right assumptions, employ people with the best attitudes, education and motivation, and measure quality in terms of customer satisfaction and whole-of-life cost, then solutions can be built on time, on budget and to the required quality, and you will not spend the rest of your life paying to fix them and trying in vain to make customers happy. It can be done!
By Cliff Moyce,
Chairman of Advisory Board, Finance Practice at DataArt.
Originally published at TABB Forum.