Do regulatory data projects really need design-time data lineage? Probably not.

Your regulatory data project likely has no use case for design-time data lineage.

Mapping data lineage at design time, for its own sake, has no regulatory use case or ROI. Buying a specialist tool to support that mapping has even less.

Regulations see that kind of documentary data lineage as ancillary at best. Most regulators won’t ask to see the visualizations but will ask for the specific data values that make up the regulatory reports, i.e., a query-time view of the data and where it came from. Put another way, they ask for the workings: the constituent values behind each reported item as of when the report was run.

Software vendors will have you believe that buying their lineage tool is enough to meet those regulations’ requirements. It is not. Rather, you need to invest in capabilities that capture data provenance at query time in a data store. This store will include the data flow path, the values used in the calculations, and the results reported. It will hold the data bitemporally so that the clock can be wound back to when those queries were run. Being able to interrogate, analyze, and publish that data from any point in time will meet the form and the substance of the regulations and, most importantly, give you a host of valuable insights from holding that data.
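To make the idea of a query-time, bitemporal provenance store concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the names `ProvenanceRecord`, `ProvenanceStore`, `capture`, and `as_of`, the metric `credit_var`, and the figures are hypothetical, and a real implementation would sit on a durable database rather than a list. It keeps two time axes: the reporting date the figure applies to, and the moment the query ran and was recorded.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    report_item: str        # the reported metric
    reporting_date: date    # valid time: the date the figure applies to
    flow_path: tuple        # systems the data passed through
    inputs: dict            # constituent values used in the calculation
    result: float           # the value that was reported
    recorded_at: datetime   # transaction time: when the query ran


class ProvenanceStore:
    """Append-only: restatements add records, nothing is overwritten."""

    def __init__(self) -> None:
        self._records: list = []

    def capture(self, record: ProvenanceRecord) -> None:
        self._records.append(record)

    def as_of(self, report_item: str, reporting_date: date,
              known_at: datetime) -> Optional[ProvenanceRecord]:
        """Return what was known about an item at a given moment."""
        candidates = [
            r for r in self._records
            if r.report_item == report_item
            and r.reporting_date == reporting_date
            and r.recorded_at <= known_at
        ]
        # Latest record that had been stored by `known_at`, if any.
        return max(candidates, key=lambda r: r.recorded_at, default=None)


# Hypothetical data: an original run and a later restatement.
store = ProvenanceStore()
quarter_end = date(2021, 3, 31)
run_1 = datetime(2021, 3, 31, 18, 0, tzinfo=timezone.utc)
run_2 = datetime(2021, 4, 15, 9, 0, tzinfo=timezone.utc)
path = ("trades_db", "risk_engine", "regulatory_report")

store.capture(ProvenanceRecord(
    "credit_var", quarter_end, path,
    {"exposure": 1000.0, "weight": 0.08}, 80.0, run_1))
store.capture(ProvenanceRecord(
    "credit_var", quarter_end, path,
    {"exposure": 1100.0, "weight": 0.08}, 88.0, run_2))

# Wind the clock back to the first run, then look at the current view.
original = store.as_of("credit_var", quarter_end, known_at=run_1)
restated = store.as_of(
    "credit_var", quarter_end,
    known_at=datetime(2021, 5, 1, tzinfo=timezone.utc))
```

Because the store is append-only, the question "what did we report, from which values, when the report was run" becomes a single lookup rather than a forensic exercise: `original` still carries the inputs and flow path from the first run even after the restatement.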

Finally, good data lineage visualization is a consequence of a well-managed estate rather than a goal in and of itself. A well-realized API or Data Quality framework will provide data lineage metadata as a byproduct of its delivery.
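As a rough illustration of lineage metadata falling out as a byproduct, the sketch below wraps a transformation step in a data quality check that also logs where the data came from and where it went. The decorator `quality_checked`, the `LINEAGE_LOG` sink, and the dataset names are all hypothetical, not the API of any particular framework.

```python
from datetime import datetime, timezone
from functools import wraps

LINEAGE_LOG = []  # hypothetical in-memory sink for lineage metadata


def quality_checked(source, target, rule):
    """Run a quality rule on a step's output and log lineage as a side effect."""
    def decorator(func):
        @wraps(func)
        def wrapper(rows):
            out = func(rows)
            LINEAGE_LOG.append({
                "step": func.__name__,
                "source": source,
                "target": target,
                "rows_in": len(rows),
                "rows_out": len(out),
                "quality_passed": all(rule(r) for r in out),
                "ran_at": datetime.now(timezone.utc),
            })
            return out
        return wrapper
    return decorator


@quality_checked(source="trades_raw", target="trades_clean",
                 rule=lambda r: r["notional"] >= 0)
def drop_cancelled(rows):
    # The business logic knows nothing about lineage.
    return [r for r in rows if r["status"] != "cancelled"]


clean = drop_cancelled([
    {"status": "live", "notional": 100.0},
    {"status": "cancelled", "notional": 50.0},
])
```

Nobody mapped this flow by hand; the entry in `LINEAGE_LOG` exists only because the quality check ran.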

Just after the publication of BCBS 239, I was sitting in a meeting with the Risk function of a large bank. A consultant declared he was “passionate about lineage” and proudly walked us through examples of his work in spreadsheets and drawing tools. All static and manually compiled by an army of graduates. The consultant leaned back in his chair, looking inordinately self-satisfied while I saw six months of work and hundreds of thousands of pounds squandered – leaving the bank no closer to meeting its regulatory requirements.

A couple of years later, a vendor presented a slide on their design-time data lineage tool, which worked by scanning platforms’ metadata, and asserted *that it may* be able to help with regulatory compliance, especially BCBS 239, CCAR, and GDPR. When prompted, they could provide no further explanation. The product was rejected.

In 2021, the Fed asked a bank to provide all of the specific numbers used to make up a series of risk metrics on a report, and where they came from when the report was generated. Complying with this request took that bank eight weeks and many manual queries.

What does this mean? 

Data lineage for its own sake, mapped at design time, has little or no value and, post-2021, has no explicit regulatory justification.

Lineage vendors still invoke BCBS 239, CCAR, IFRS-9, and GDPR as major reasons for buying their tools. As we’ve seen with the Fed’s recent request, buying a specialist data lineage tool that scans systems, code, and related metadata is a waste of time and money if your primary use case is regulatory compliance and all you’re doing is mapping and visualizing data flows.

Data Lineage isn’t mentioned at all in the commonly cited regulations. In Dr. Irina Steenbeek’s excellent discussion on Data Lineage, she notes:

“My professional journey to data lineage has started with investigating the Basel Committee requirements [for BCBS 239]…. Many specialists consider data lineage the ultimate ‘remedy’ to meet these requirements.

The funny thing is that you never find the term ‘data lineage’ literally mentioned in these regulatory documents.”

What remains is the argument advanced by the great* Australian jurist Dennis Denuto: “… it’s the vibe of the thing …”

It’s the vibe of the thing…

The vendors and experts try to assert, with varying degrees of success, that design-time Data Lineage is implied by both the language of the regulations and the current set of best practices in data management.
