Self-Service Data Presentation: Data Quality, Lineage and Cataloging

Once an organization has mastered automated data ingest and the appropriate application of metadata, several additional concerns arise when using data at scale: data quality, data lineage, and a searchable data catalog. All three contribute to an effective and useful data catalog, which is the foundation of the self-service capability for a business-facing data presentation and transformation layer.

It is best to apply data quality rules in an automated fashion. Simple rules can be defined, like operators in a programming language, corresponding to atomic operations such as greater than or less than. Those simple operators can be organized hierarchically into collections of operations that establish basic rules, such as social security number validation. Rules, in turn, can be grouped logically into “rule sets”. For example, a rule set might contain all the data quality operations needed for a specific data feed, such as business transactions that require application validation and pre-processing (loan processing, say, or credit line increase requests). All the fields in that set of data can then be analyzed automatically, producing clean records for downstream analytics. This type of data quality pre-processing is exactly what we as consumers of data should be demanding of our data ingest process.
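As a rough illustration of that layering, the Python sketch below builds atomic operators, combines them into a rule, and groups rules into a rule set. It is not taken from any particular product: every name in it (`greater_than`, `all_of`, `RuleSet`, the `loan_application` fields) is hypothetical, and the social security number check covers only the basic format and the reserved number ranges.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Check = Callable[[Any], bool]


# Atomic operators: each returns a check for a single value.
def greater_than(limit: float) -> Check:
    return lambda v: isinstance(v, (int, float)) and v > limit


def less_than(limit: float) -> Check:
    return lambda v: isinstance(v, (int, float)) and v < limit


def matches(pattern: str) -> Check:
    compiled = re.compile(pattern)
    return lambda v: isinstance(v, str) and compiled.fullmatch(v) is not None


def all_of(*checks: Check) -> Check:
    # Combine operators into one rule; stops at the first failing check.
    return lambda v: all(check(v) for check in checks)


def ssn_not_reserved(value: str) -> bool:
    # Assumes the value already has the NNN-NN-NNNN shape.
    area, group, serial = value.split("-")
    return (
        area not in ("000", "666")
        and not area.startswith("9")
        and group != "00"
        and serial != "0000"
    )


# A rule built from simpler operators.
is_valid_ssn = all_of(matches(r"\d{3}-\d{2}-\d{4}"), ssn_not_reserved)


@dataclass
class RuleSet:
    """All the rules that apply to one data feed, keyed by field name."""
    name: str
    rules: Dict[str, Check]

    def failures(self, record: Dict[str, Any]) -> List[str]:
        return [f for f, check in self.rules.items() if not check(record.get(f))]

    def split(self, records: List[Dict[str, Any]]) -> Tuple[list, list]:
        clean, rejected = [], []
        for record in records:
            (rejected if self.failures(record) else clean).append(record)
        return clean, rejected


loan_rules = RuleSet(
    name="loan_application",
    rules={
        "ssn": is_valid_ssn,
        "amount": all_of(greater_than(0), less_than(1_000_000)),
    },
)

records = [
    {"ssn": "123-45-6789", "amount": 25000},
    {"ssn": "000-12-3456", "amount": 5000},
    {"ssn": "123-45-6789", "amount": -1},
]
clean, rejected = loan_rules.split(records)
```

Here only the first record lands in `clean`. Calling `loan_rules.failures(...)` on a rejected record names the fields that failed, which is the kind of detail an ingest process could log alongside the rejected row.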

Knowing where data came from and how it was transformed, not only by data quality rules but by all the transformations that the business rules of the use case require, is invaluable. Many data users must be able to retrace the steps of data ingest and processing to meet regulatory requirements. Data in both banking and pharmaceutical research, for example, must be traceable under the regulations of the respective industries. At a lower level, the data engineering level, data lineage is also a very valuable debugging tool, both for business process debugging (is our business process correct?) and for data science (are advanced algorithms functioning correctly?). With Bedrock, all of these concerns can be addressed by maintaining proper data lineage for datasets over time.
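To make the idea of retracing steps concrete, here is a minimal Python sketch of a lineage log. It is an assumption-laden illustration, not Bedrock's API: `LineageLog`, `LineageStep`, and the dataset and operation names are all hypothetical. Each processing step records its output, the operation applied, and its inputs, so any dataset can be walked back to its sources.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LineageStep:
    output: str
    operation: str
    inputs: Tuple[str, ...]


class LineageLog:
    """Records which operation produced each dataset, and from what."""

    def __init__(self) -> None:
        self._steps: Dict[str, LineageStep] = {}

    def record(self, output: str, operation: str, inputs: Iterable[str] = ()) -> None:
        self._steps[output] = LineageStep(output, operation, tuple(inputs))

    def trace(self, dataset: str) -> List[str]:
        # Walk back from a dataset toward its sources.
        lines: List[str] = []
        pending = [dataset]
        seen = set()
        while pending:
            name = pending.pop()
            if name in seen:
                continue
            seen.add(name)
            step = self._steps.get(name)
            if step is None:
                continue  # an original source with no recorded step
            lines.append(f"{step.output} <- {step.operation}({', '.join(step.inputs)})")
            pending.extend(step.inputs)
        return lines


log = LineageLog()
log.record("loans_raw", "ingest", ["loan_feed.csv"])
log.record("loans_clean", "apply_rule_set", ["loans_raw"])
log.record("loans_scored", "risk_model", ["loans_clean"])

history = log.trace("loans_scored")
```

For the three steps recorded above, `history` lists the scoring step first, then the cleaning step, then the original ingest. The same walk serves an auditor asking where a figure came from and an engineer asking which step introduced a bad value.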

A functioning data catalog seems like such a simple concept in this day and age. The fact is that many organizations still face the challenge of simply finding data. Having data is not the problem. Everyone has data now. What many are missing is an effective method for finding datasets in what can be a sea of ingested data. Part of the issue is simple organization.
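A small Python sketch shows how little organization is needed before datasets become findable. It assumes an in-memory catalog where each entry carries a name, a description, and tags. The classes `CatalogEntry` and `DataCatalog` and the sample datasets are hypothetical, and a real catalog would add ownership, schema, and access information.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)


class DataCatalog:
    """A toy searchable catalog: every query term must appear in an entry."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def search(self, query: str) -> List[str]:
        terms = query.lower().split()

        def searchable_text(entry: CatalogEntry) -> str:
            parts = [entry.name, entry.description, *sorted(entry.tags)]
            return " ".join(parts).lower()

        return [
            entry.name
            for entry in self._entries
            if all(term in searchable_text(entry) for term in terms)
        ]


catalog = DataCatalog()
catalog.register(
    CatalogEntry("loans_clean", "Validated loan applications", {"lending", "curated"})
)
catalog.register(
    CatalogEntry("web_clicks", "Unprocessed clickstream events", {"marketing", "raw"})
)

lending_hits = catalog.search("loan curated")
raw_hits = catalog.search("raw")
```

With these two entries, `lending_hits` contains only `loans_clean` and `raw_hits` contains only `web_clicks`. The search works only because each dataset was described and tagged when it was registered.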

 
