Self-Service Data Presentation: Data Quality, Lineage and Cataloging

by 7wData
November 7, 2016

Once an organization has mastered automated data ingest and the appropriate application of metadata, several further concerns must be addressed when using data at scale: data quality, data lineage, and a searchable data catalog. Quality and lineage both feed into an effective and useful data catalog, and the data catalog is the foundation of the self-service capability for a business-facing data presentation and transformation layer.

It is best to apply data quality rules in an automated fashion. Simple rules can be defined, like operators in a programming language, corresponding to atomic operations such as greater than or less than. Those simple operators can be organized hierarchically into a collection of operations that establishes a basic rule, such as social security number validation. Rules can in turn be grouped logically into collections called "rule sets". For example, a rule set might contain all the data quality operations needed for a specific data feed. Business transactions such as loan processing or credit line increase requests, which require application validation and pre-processing, are a typical case. All the fields in that set of data can be analyzed automatically, producing clean records for downstream analytics. This type of data quality pre-processing is exactly what we as consumers of data should be demanding of our data ingest process.
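The layering described above (atomic operators, rules built from them, and rule sets built from rules) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `OPERATORS`, `Rule`, and `RuleSet`, the field names, and the loan amount limit are all hypothetical, and the social security number check tests the format only, not whether the number is actually valid.

```python
import operator
import re
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Atomic operators, comparable to comparison operators in a programming language.
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "gt": operator.gt,
    "lt": operator.lt,
    "ge": operator.ge,
    "le": operator.le,
    "matches": lambda value, pattern: re.fullmatch(pattern, str(value)) is not None,
}


@dataclass
class Rule:
    """A named rule: every (operator, operand) check must pass for one field."""
    name: str
    field_name: str
    checks: List[Tuple[str, Any]]

    def passes(self, record: Dict[str, Any]) -> bool:
        value = record.get(self.field_name)
        if value is None:
            return False  # a missing field fails the rule
        return all(OPERATORS[op](value, operand) for op, operand in self.checks)


@dataclass
class RuleSet:
    """A collection of rules, for example everything one data feed needs."""
    name: str
    rules: List[Rule] = field(default_factory=list)

    def failures(self, record: Dict[str, Any]) -> List[str]:
        return [rule.name for rule in self.rules if not rule.passes(record)]

    def split(self, records):
        """Separate clean records from rejected ones (with the failed rule names)."""
        clean, rejected = [], []
        for record in records:
            failed = self.failures(record)
            if failed:
                rejected.append((record, failed))
            else:
                clean.append(record)
        return clean, rejected


# Hypothetical rule set for a loan application feed.
loan_rules = RuleSet("loan_application_feed", [
    Rule("ssn_format", "ssn", [("matches", r"\d{3}-\d{2}-\d{4}")]),  # format only
    Rule("amount_range", "amount", [("gt", 0), ("le", 1_000_000)]),  # assumed limit
])
```

Calling `loan_rules.split(records)` on a batch of ingested records returns the clean records alongside the rejected ones, each paired with the names of the rules it failed, so the rejects can be reviewed rather than silently dropped.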

Knowing where data came from and how it was transformed, not only by data quality rules but by all of the transformations required by the business rules of the use case at hand, is invaluable. Being able to retrace the steps in data ingest and processing is critical for many data users because of regulatory requirements. Data in both banking and pharmaceutical research, for example, must be traceable under the regulations of the respective industries. At a lower, data engineering level, being able to track data lineage is also a very valuable debugging tool. This applies both to business process debugging (is our business process correct?) and to data science (are advanced algorithms functioning correctly?). With Bedrock, all of these concerns can be addressed by maintaining proper data lineage for datasets over time.
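To make the idea of retracing steps concrete, here is a minimal sketch of a lineage log in Python. It is not Bedrock's API; `LineageStep`, `LineageLog`, and the dataset names used in the usage note are hypothetical names chosen for illustration. Each processing step records which datasets it consumed and which dataset it produced, and `trace` walks those records backwards from any dataset.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class LineageStep:
    output: str              # dataset produced by this step
    inputs: Tuple[str, ...]  # datasets consumed by this step
    operation: str           # human-readable description of the transformation


class LineageLog:
    """Records which step produced each dataset (hypothetical helper)."""

    def __init__(self) -> None:
        self._produced_by: Dict[str, LineageStep] = {}

    def record(self, output: str, inputs: Iterable[str], operation: str) -> LineageStep:
        step = LineageStep(output, tuple(inputs), operation)
        self._produced_by[output] = step
        return step

    def trace(self, dataset: str) -> List[LineageStep]:
        """Return every recorded step that contributed to `dataset`,
        starting with the step that produced it."""
        steps: List[LineageStep] = []
        pending = [dataset]
        seen = set()
        while pending:
            current = pending.pop()
            step = self._produced_by.get(current)
            if step is None or current in seen:
                continue  # a source dataset, or one already visited
            seen.add(current)
            steps.append(step)
            pending.extend(step.inputs)
        return steps
```

For example, after recording that `loans_validated` was derived from `loans_raw` and that `loans_scored` was derived from `loans_validated`, calling `trace("loans_scored")` returns both steps. That is the information an auditor or a debugging engineer would need to retrace how the scored dataset came to be.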

A functioning data catalog seems like such a simple concept in this day and age. The fact is that many organizations still face the challenge of simply finding data. Having data is not the problem. Everyone has data now. What many are missing is an effective method for finding datasets in what can be a sea of ingested data. Part of the issue is simple organization.
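As a rough illustration of what "finding datasets" can mean in practice, the sketch below keeps a description and a set of tags for each dataset and supports a simple keyword search over them. The names `CatalogEntry` and `DataCatalog` are hypothetical, and a real catalog would typically sit on a proper search index rather than a linear scan.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatalogEntry:
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)

    def searchable_text(self) -> str:
        # Name, description, and tags combined into one lowercase string.
        return " ".join([self.name, self.description, *sorted(self.tags)]).lower()


class DataCatalog:
    """In-memory catalog with keyword search (hypothetical helper)."""

    def __init__(self) -> None:
        self._entries: List[CatalogEntry] = []

    def register(self, entry: CatalogEntry) -> None:
        self._entries.append(entry)

    def search(self, query: str) -> List[str]:
        """Return names of entries whose text contains every query term."""
        terms = query.lower().split()
        return [
            entry.name
            for entry in self._entries
            if all(term in entry.searchable_text() for term in terms)
        ]
```

Even this small amount of organization (a consistent name, a one-line description, and a few tags per dataset) is enough to turn a pile of ingested files into something a business user can search.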
