Do regulatory data projects really need design-time data lineage? Probably not.
- by 7wData
Your regulatory data project likely has no use case for design-time data lineage.
Mapping data lineage at design time, as an end in itself, has no regulatory use case or ROI. Buying a specialist tool to support that mapping has even less ROI.
Regulations treat that kind of documentary data lineage as ancillary at best. Most regulators won’t ask to see the visualizations; they will ask for the specific data values that make up the regulatory reports, i.e., a query-time view of the data and where it came from. Put another way, they ask for the workings: the constituent values of each reported item at the time the report was run.
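To make “the workings” concrete, here is a minimal Python sketch, assuming a reported figure is a simple sum of input values. The names `Constituent`, `ReportedItem`, `report_sum`, and the source identifiers are hypothetical, chosen only for illustration. The idea is that the reported value travels with the exact constituent values, and their sources, captured at the moment the report runs.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Constituent:
    """One input value used in a reported figure, with where it came from."""
    source: str  # hypothetical source system or table name
    key: str     # hypothetical record key within that source
    value: float


@dataclass(frozen=True)
class ReportedItem:
    """A reported figure plus the 'workings' captured when the report ran."""
    name: str
    value: float
    run_at: datetime
    constituents: tuple[Constituent, ...]


def report_sum(name: str, inputs: list[Constituent]) -> ReportedItem:
    """Compute a summed metric and keep every constituent value alongside it.

    Provenance is captured at query time, from the values actually used,
    rather than inferred from a design-time diagram.
    """
    return ReportedItem(
        name=name,
        value=sum(c.value for c in inputs),
        run_at=datetime.now(timezone.utc),
        constituents=tuple(inputs),
    )


item = report_sum("gross_exposure", [
    Constituent("positions_db", "desk-A", 120.0),
    Constituent("positions_db", "desk-B", 80.0),
    Constituent("adjustments", "netting-01", -15.5),
])
print(item.name, item.value)  # gross_exposure 184.5
for c in item.constituents:
    print(c.source, c.key, c.value)
```

Answering a request for the numbers behind a reported item then becomes a lookup over stored `ReportedItem` records rather than weeks of manual reconstruction.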
Software vendors will have you believe that buying their lineage tool is enough to meet those requirements. Instead, you need to invest in capabilities that capture data provenance at query time in a data store. That store records the data flow path, the values used in the calculations, and the results reported. It holds the data bitemporally, tracking both when each value applied and when it was recorded, so that the clock can be wound back to when those queries were run. Being able to interrogate, analyze, and publish that data as of any point in time will meet both the form and the substance of the regulations and, most importantly, give you a host of valuable insights once you hold that data.
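A bitemporal store keeps two timelines for every value: valid time (when the fact applied in the business) and recorded time (when the store learned it). The Python sketch below is a minimal in-memory illustration under that assumption; `BitemporalStore`, `record`, `as_of`, and the dates are hypothetical, not any product’s API. Corrections close the old row in recorded time instead of overwriting it, so a query with `known_at` set to the original run time returns exactly what the report saw.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

OPEN = datetime.max  # sentinel: period has no end yet


@dataclass
class Row:
    key: str
    value: float
    valid_from: datetime     # when the fact starts to apply in the business
    valid_to: datetime       # exclusive end of business validity
    recorded_from: datetime  # when the store learned this version
    recorded_to: datetime = OPEN  # when it was superseded (OPEN = still current)


class BitemporalStore:
    """Append-only, in-memory sketch: rows are closed, never deleted or overwritten."""

    def __init__(self) -> None:
        self.rows: list[Row] = []

    def record(self, key: str, value: float, valid_from: datetime,
               valid_to: datetime, recorded_at: datetime) -> None:
        for row in list(self.rows):
            overlaps = row.valid_from < valid_to and valid_from < row.valid_to
            if row.key == key and row.recorded_to == OPEN and overlaps:
                # Supersede the old version in recorded time...
                row.recorded_to = recorded_at
                # ...but re-assert the parts of its validity the new fact doesn't cover.
                if row.valid_from < valid_from:
                    self.rows.append(Row(key, row.value, row.valid_from, valid_from, recorded_at))
                if valid_to < row.valid_to:
                    self.rows.append(Row(key, row.value, valid_to, row.valid_to, recorded_at))
        self.rows.append(Row(key, value, valid_from, valid_to, recorded_at))

    def as_of(self, key: str, valid_at: datetime, known_at: datetime) -> Optional[float]:
        """Value of `key` in effect at `valid_at`, as the store knew it at `known_at`."""
        for row in self.rows:
            if (row.key == key
                    and row.valid_from <= valid_at < row.valid_to
                    and row.recorded_from <= known_at < row.recorded_to):
                return row.value
        return None


store = BitemporalStore()
# Original position, recorded on 4 Jan 2021 and valid from 1 Jan onwards.
store.record("desk-A", 100.0, datetime(2021, 1, 1), OPEN, datetime(2021, 1, 4))
# A correction recorded on 10 Jan: the position was 110.0 from 4 Jan onwards.
store.record("desk-A", 110.0, datetime(2021, 1, 4), OPEN, datetime(2021, 1, 10))

report_run = datetime(2021, 1, 5, 9, 0)
print(store.as_of("desk-A", datetime(2021, 1, 4), report_run))              # 100.0 (what the report saw)
print(store.as_of("desk-A", datetime(2021, 1, 4), datetime(2021, 1, 11)))  # 110.0 (corrected view)
print(store.as_of("desk-A", datetime(2021, 1, 2), datetime(2021, 1, 11)))  # 100.0 (earlier period, uncorrected)
```

A production store would more likely sit in a database with temporal table support than in a Python list; the sketch only shows why nothing is ever overwritten.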
Finally, good data lineage visualization is a consequence of a well-managed estate rather than a goal in and of itself. A well-realized API or data quality framework will provide data lineage metadata as a byproduct of its delivery.
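As an illustration of lineage falling out of a data quality framework, here is a minimal Python sketch. The `quality_checked` decorator, the `LINEAGE` list, and the dataset names are hypothetical. Because a quality gate already has to know a step’s inputs and output, recording lineage edges costs one extra line rather than a separate mapping project.

```python
from functools import wraps
from typing import Callable

# Lineage edges accumulate here as a side effect of running checked steps:
# (source dataset, step name, target dataset)
LINEAGE: list[tuple[str, str, str]] = []

Rule = Callable[[list[dict]], bool]


def quality_checked(sources: list[str], target: str, rules: list[Rule]):
    """Hypothetical data quality gate that also records lineage.

    The gate must already know a step's inputs and output to validate it,
    so emitting lineage edges is a byproduct, not a separate exercise.
    """
    def decorator(step):
        @wraps(step)
        def wrapper(*args, **kwargs):
            rows = step(*args, **kwargs)
            failed = [rule.__name__ for rule in rules if not rule(rows)]
            if failed:
                raise ValueError(f"{step.__name__} failed checks: {failed}")
            # Record lineage only for output that passed its checks.
            LINEAGE.extend((src, step.__name__, target) for src in sources)
            return rows
        return wrapper
    return decorator


def no_null_amounts(rows: list[dict]) -> bool:
    return all(r.get("amount") is not None for r in rows)


@quality_checked(sources=["positions_db", "fx_rates"], target="exposure_report",
                 rules=[no_null_amounts])
def build_exposures(positions: list[dict], fx: dict[str, float]) -> list[dict]:
    # Convert each desk's position into the reporting currency.
    return [{"desk": p["desk"], "amount": p["amount"] * fx[p["ccy"]]} for p in positions]


rows = build_exposures([{"desk": "A", "amount": 100.0, "ccy": "USD"}], {"USD": 0.8})
print(LINEAGE)
# [('positions_db', 'build_exposures', 'exposure_report'), ('fx_rates', 'build_exposures', 'exposure_report')]
```

The lineage graph that a visualization tool would draw is simply the accumulated edge list, produced by work the estate has to do anyway.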
Just after the publication of BCBS 239, I was sitting in a meeting with the Risk function of a large bank. A consultant declared he was “passionate about lineage” and proudly walked us through examples of his work in spreadsheets and drawing tools, all of it static and manually compiled by an army of graduates. The consultant leaned back in his chair, looking inordinately self-satisfied, while I saw six months of work and hundreds of thousands of pounds squandered, leaving the bank no closer to meeting its regulatory requirements.
A couple of years later, a vendor presented a slide on their design-time data lineage tool, which worked by scanning platforms’ metadata, asserting *that it may* be able to help with regulatory compliance, especially BCBS 239, CCAR, and GDPR. When prompted, they could provide no further explanation. The product was rejected.
In 2021, the Fed asked a bank to provide all of the specific numbers used to make up a series of risk metrics on a report, and where those numbers came from at the time the report was generated. Complying took that bank eight weeks and many manual queries.
What does this mean?
Data lineage for its own sake, mapped at design time, has little or no value and, post-2021, has no explicit regulatory justification.
Lineage vendors still invoke BCBS 239, CCAR, IFRS-9, and GDPR as major reasons for buying their tools. As we’ve seen with the Fed’s recent request, buying a specialist data lineage tool that scans systems, code, and related metadata is a waste of time and money if your primary use case is regulatory compliance and all you’re doing is mapping and visualizing data flows.
Data lineage isn’t mentioned at all in the commonly cited regulations. In Dr. Irina Steenbeek’s excellent discussion of data lineage, she notes:
My professional journey to data lineage has started with investigating the Basel Committee requirements [for BCBS 239]…. Many specialists consider data lineage the ultimate ‘remedy’ to meet these requirements.
The funny thing is that you never find the term ‘data lineage’ literally mentioned in these regulatory documents.
It recalls the argument advanced by the great* Australian jurist Dennis Denuto: “… it’s the vibe of the thing …”
It’s the vibe of the thing…
The vendors and experts try to assert, with varying degrees of success, that design-time data lineage is implied by both the language of the regulations and current best practice in data management.