
7 Data Sins Series: Metadata matters

Martijn Groot

Tracking the contextual information of financial data

Asset managers and other financial services firms face rapidly growing volumes of data, both in the investment process and in client and regulatory reporting. Providing easy access to that data for different user types, whether for reporting, querying, discovery, or modelling, is perhaps the most important data management function.

In our seven data sins series, we have been exploring different aspects and challenges of data management. One area that is rarely the primary focus or driver of improvement initiatives is tracking the metadata surrounding basic financial information such as issuer data, corporate actions, terms and conditions and, above all, market data.

Tracking and exposing contextual information can happen top-down – at a data set level – as well as bottom-up – looking at the metadata of individual data attributes. To cater to the requirements of different stakeholders, firms need to do both.

From a top-down perspective, it is critical to know which data sets a firm has at its disposal. These include commercial data sets from market and reference data providers, but also public data sources, internally produced (proprietary) data, and data that comes from business relationships, such as customer data.

Different Channels of Sourcing Financial Information

Properly harnessing all these data sets first requires putting stakeholders within a firm on a common footing by exposing what data is available, through a data catalog or another inventory of data sets. Contextual information at the data set level includes usage permissions and, for commercially acquired data, any license restrictions. It can also include geographic restrictions on data usage or transfer imposed by legislation in different jurisdictions. For the most part, data-set-level metadata describes where a data set may be used: for which use cases, user roles, business applications, departments, or geographies.

Metadata at this level can also cover sourcing frequency; destinations, including current usage by users and applications; any quality checks defined at the data set level; and the derivation rules or models the data set feeds into. As the number and diversity of data sets continue to grow, keeping track of what data is already available is critical to increasing productivity, shortening the turnaround time for getting the data you need, and preventing redundant data sourcing.
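To make the data-set-level metadata above more concrete, here is a minimal sketch of a catalog entry and a usage check. The class `DatasetEntry`, the functions `can_use` and `find_datasets`, and all field and sample values are hypothetical illustrations, not part of any product API; a real catalog would hold far richer license and permission models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Hypothetical data-catalog record holding data-set-level metadata."""
    name: str
    provider: str                    # commercial vendor, public source, or "internal"
    licensed_use_cases: frozenset    # use cases the license permits
    permitted_roles: frozenset       # user roles allowed to access the data
    allowed_geographies: frozenset   # where the data may be used or transferred
    sourcing_frequency: str          # e.g. "daily", "intraday"
    destinations: tuple = ()         # applications currently consuming the data
    quality_checks: tuple = ()       # data-set-level quality rules
    feeds_into: tuple = ()           # derivation rules or models fed by the data


def can_use(entry: DatasetEntry, use_case: str, role: str, geography: str) -> bool:
    """Allow usage only if the use case, role and geography are all permitted."""
    return (
        use_case in entry.licensed_use_cases
        and role in entry.permitted_roles
        and geography in entry.allowed_geographies
    )


def find_datasets(catalog, keyword: str):
    """Search the catalog by name, e.g. before sourcing a data set again."""
    keyword = keyword.lower()
    return [entry.name for entry in catalog if keyword in entry.name.lower()]


# Illustrative catalog with one hypothetical vendor data set.
catalog = [
    DatasetEntry(
        name="EOD Equity Prices",
        provider="VendorA",
        licensed_use_cases=frozenset({"valuation", "risk"}),
        permitted_roles=frozenset({"risk_analyst", "pricing_ops"}),
        allowed_geographies=frozenset({"EU", "UK"}),
        sourcing_frequency="daily",
        destinations=("risk_engine",),
        quality_checks=("stale_price_check",),
        feeds_into=("fair_value_model",),
    ),
]

print(can_use(catalog[0], "valuation", "risk_analyst", "EU"))         # True
print(can_use(catalog[0], "client_reporting", "risk_analyst", "EU"))  # False
print(find_datasets(catalog, "equity"))                               # ['EOD Equity Prices']
```

Keeping permissions, license scope and geography in the same record as sourcing and destination details means one lookup answers both "may I use this?" and "do we already have this?".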

From a bottom-up perspective, tracking metadata means tracing the individual actions that took place at the attribute level. This includes tracing the lineage of a data field: for example, which sources, business rules, and validations went into a price used to value a specific position. Increasingly, firms need to document the data points that went into any decision. Clients, regulators, and auditors alike may question how the values of individual data fields were arrived at. Regulation such as MiFID II has imposed further requirements for documenting decision making around order execution.
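Attribute-level lineage can be pictured as a small tree: each derived value keeps references to the inputs, rule and validations that produced it. The sketch below assumes a hypothetical `LineageNode` record, a hypothetical averaging rule named `average_of_vendors`, and made-up vendor names and prices; it only illustrates the idea of walking back from a price to its origins.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LineageNode:
    """Hypothetical record of how one attribute value was produced."""
    attribute: str                                        # e.g. "close_price"
    value: float
    source: Optional[str] = None                          # set for raw vendor inputs
    rule: Optional[str] = None                            # business rule that derived the value
    validations: List[str] = field(default_factory=list)  # checks the value passed
    inputs: List["LineageNode"] = field(default_factory=list)


def trace(node: LineageNode, depth: int = 0) -> List[str]:
    """Walk back from a value to its origins, one indented line per step."""
    indent = "  " * depth
    origin = f"source={node.source}" if node.source else f"rule={node.rule}"
    checks = ",".join(node.validations) or "none"
    lines = [f"{indent}{node.attribute}={node.value} ({origin}; checks: {checks})"]
    for parent in node.inputs:
        lines.extend(trace(parent, depth + 1))
    return lines


def raw_sources(node: LineageNode) -> List[str]:
    """Collect the raw sources at the leaves of the lineage tree."""
    if node.source:
        return [node.source]
    result = []
    for parent in node.inputs:
        result.extend(raw_sources(parent))
    return result


# Two vendor quotes combined into one price used to value a position.
vendor_a = LineageNode("close_price", 101.0, source="VendorA", validations=["not_stale"])
vendor_b = LineageNode("close_price", 103.0, source="VendorB", validations=["not_stale"])
golden = LineageNode(
    "close_price",
    (vendor_a.value + vendor_b.value) / 2,
    rule="average_of_vendors",
    validations=["tolerance_2pct"],
    inputs=[vendor_a, vendor_b],
)

print("\n".join(trace(golden)))
```

Printing the trace yields one line for the derived price followed by an indented line per vendor input, which is the kind of answer a client, regulator or auditor asking "where did this price come from?" would expect.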

The trend towards closer integration of data and analytics has further increased the need to document the properties of data sets. Advances in analytics have sped up automated decision making and the assessment of information. This raises the risk that financial models and algorithms go off the rails when fed inappropriate data, and because certain algorithms are black boxes, firms may be unable to explain what happened.

Adopting a strategy of Data Quality Intelligence, that is, tracking the data quality rules that acted on the data as well as their results, helps both to continuously improve data operations and to shed light on whether a data set is fit for purpose in a specific context. Tracking the impact of the exceptions raised, such as whether they were false positives or led to manual intervention, helps to calibrate rules and optimize business logic. Furthermore, tracking the derivation operations that a data set feeds into helps document its proper use cases.
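One simple way to use exception outcomes for calibration is to compute, per rule, the share of raised exceptions that turned out to be false positives. The sketch below assumes a hypothetical `ExceptionOutcome` record, two outcome labels (`"false_positive"` and `"manual_intervention"`), made-up rule names, and an arbitrary 50% threshold; it is an illustration of the idea, not a prescribed metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionOutcome:
    """Hypothetical record of what happened to one raised exception."""
    rule: str        # quality rule that raised the exception
    record_id: str   # data record the exception was raised on
    outcome: str     # "false_positive" or "manual_intervention"


def false_positive_rates(outcomes):
    """Share of each rule's exceptions that turned out to be false positives."""
    raised = defaultdict(int)
    false_pos = defaultdict(int)
    for o in outcomes:
        raised[o.rule] += 1
        if o.outcome == "false_positive":
            false_pos[o.rule] += 1
    return {rule: false_pos[rule] / raised[rule] for rule in raised}


def rules_to_recalibrate(outcomes, threshold=0.5):
    """Rules whose false-positive rate exceeds the (assumed) threshold."""
    rates = false_positive_rates(outcomes)
    return sorted(rule for rule, rate in rates.items() if rate > threshold)


outcomes = [
    ExceptionOutcome("price_jump_5pct", "BOND-1", "false_positive"),
    ExceptionOutcome("price_jump_5pct", "BOND-2", "false_positive"),
    ExceptionOutcome("price_jump_5pct", "BOND-3", "manual_intervention"),
    ExceptionOutcome("missing_coupon", "BOND-4", "manual_intervention"),
]

print(rules_to_recalibrate(outcomes))  # ['price_jump_5pct']
```

Here the price-jump rule fires mostly on correct data, suggesting its threshold is too tight, while every missing-coupon exception led to a real fix.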

Data quality has several aspects, including timeliness, completeness, and accuracy, and different use cases can require different trade-offs between them. A complete picture requires tracking the rules, how they have changed over time, and the changes to the actual data values.
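Tracking both rule changes and value changes amounts to keeping an append-only history and answering "as of" questions against it. The sketch below uses a hypothetical `VersionedStore` class on a single timeline (a production system would typically separate when a value was valid from when it was recorded); the rule name, thresholds, instrument key and prices are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime


class VersionedStore:
    """Hypothetical append-only history: every change is kept with its timestamp."""

    def __init__(self):
        self._history = {}  # key -> list of (timestamp, value) in time order

    def record(self, key, value, at: datetime):
        versions = self._history.setdefault(key, [])
        if versions and at < versions[-1][0]:
            raise ValueError("changes must be recorded in time order")
        versions.append((at, value))

    def as_of(self, key, at: datetime):
        """Return the value in effect at time `at`, or None if none existed yet."""
        versions = self._history.get(key, [])
        times = [t for t, _ in versions]
        idx = bisect_right(times, at)
        return versions[idx - 1][1] if idx else None

    def changes(self, key):
        """Full change history for a key, oldest first."""
        return list(self._history.get(key, []))


# The same structure tracks rule parameters and data values.
rules = VersionedStore()
rules.record("price_jump_threshold", 0.05, datetime(2021, 1, 1))
rules.record("price_jump_threshold", 0.10, datetime(2021, 6, 1))

prices = VersionedStore()
prices.record("BOND-1/close", 99.5, datetime(2021, 3, 15))
prices.record("BOND-1/close", 99.8, datetime(2021, 3, 16))  # later correction

when = datetime(2021, 3, 15, 12)
print(rules.as_of("price_jump_threshold", when), prices.as_of("BOND-1/close", when))
```

Because nothing is overwritten, one can reconstruct which rule setting and which value applied at any moment, which is what explaining a past decision requires.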

Alveo recently launched its Ops360 solution, which gives users a complete overview of pricing information, reference data, and other data sets, their sourcing status, and any exceptions flagged by business rules. It also supports configuring different workflows to make sure data is used properly. Through our data lineage capabilities, we provide complete insight into the origin of any data field.

Metadata matters. Tracking and easily exposing user permissions, quality rules, sources, and destinations as well as changes over time is increasingly part and parcel of core capabilities in data management.