Fighting Entropy, one data element at a time

Martijn Groot

Entropy means a lack of order or predictability; a gradual decline into disorder. In the world of physics, entropy is linked to the second law of thermodynamics, which states that heat dissipates and disorder increases unless energy is added to the system. In information theory, entropy was introduced by Claude Shannon in the late 1940s as a measure of the surprise or unexpectedness in the information that is conveyed.
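To make Shannon's notion concrete, here is a minimal sketch in Python that computes the entropy, in bits, of the values observed in a single data field. The function name and the two sample currency fields are hypothetical and chosen only for illustration; this is not a recommended data quality metric, just a way to see that a field recorded in one consistent way carries no surprise, while the same fact recorded in four different ways does.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Return the Shannon entropy, in bits, of the observed values."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over distinct values of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical currency fields from two made-up systems (illustration only).
consistent = ["USD", "USD", "USD", "USD"]
fragmented = ["USD", "usd", "US Dollar", "Dollar"]

print(shannon_entropy(consistent))  # 0.0
print(shannon_entropy(fragmented))  # 2.0
```

The second field describes the same currency every time, yet it scores two bits of entropy because every record spells it differently. That is the kind of dissipation of meaning this article is about.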

In the world of financial information management and IT systems, we observe first-hand how entropy seems to grow naturally over time. The continuous change and growth in the number of incoming data sets, the changing set of required reports and output sets, and the evolution of the business logic in the application landscape all contribute to this. Add the number of internal stakeholders and departments, who each bring their own functional perspective on a data set, plus a tendency to keep multiple copies of similar data sets, and we have the perfect recipe for a proliferation of data and definitions, ambiguity and dissipation of meaning.

There are many forces at work to increase data chaos, leading to high operational risk stemming from information uncertainty. There is also a vicious feedback loop at work here: because users don’t trust the data available, they will try to warehouse and guard their own copies. This leads to further proliferation and, to add insult to injury, a continuously increasing IT and operational cost base. Few organisations can afford this cost and, at the same time, regulators increasingly scrutinise the soundness of the processes behind the numbers and reports firms submit.

There are two ways to fight this tendency of entropy growth. One is the ‘cure’ of trying to impose policies and procedures around introducing new information sets, data governance policies and top-down semantics in the form of enterprise standards. This is essentially a form of damage control, trying to contain the spread of chaos. As any physician will tell you, prevention is better than cure. The second way to fight entropy growth is to try to nip it in the bud. This approach focuses on ensuring key capabilities to maintain a grip on infrastructure: data lineage, tracking as much contextual information as possible, such as quality measures, and avoiding redundant data copies by bringing calculation logic to the data rather than the other way around. It focuses on keeping infrastructure and architecture simple and accessible to prevent overgrowth of separate databases. It also includes clear choices between internally sourced IT development and externally sourced standard products and services with clear SLAs. It seeks to prevent improvisation that inevitably becomes anchored in business processes. Prevention is better than cure. Cheaper too.