Improving the Quality of Data on Hadoop

As the value and volume of data explode, so does the need for mature data management. Big data now receives the same treatment as relational data (integration, transformation, process orchestration, and error recovery), so the quality of big data is becoming critical.

Because of the promise and capacity of Hadoop, data quality was initially overlooked. However, not all Hadoop use cases are for analytics; some are driving critical business processes. Data quality is now a key consideration for process improvement and decision making based on data coming out of Hadoop.

Given the size of our data stores in Hadoop, we must consider whether data quality practices can scale to the potential immensity of big data. Hadoop shatters the limits of data storage, not only in terms of data volume and variety but also in terms of structure. One way a conventional data warehouse maintains data quality is by imposing strict limits on the volume, variety, and structure of data. This runs directly counter to the advantages that Hadoop and NoSQL offer.

We must also consider the cost of poor data quality within a Hadoop cluster. From an analytics perspective, "bad data" may not be as troublesome as it once was, if we consider the statistical insignificance of incorrect, incomplete, or inaccurate records. The effect of a statistical outlier or anomaly is reduced by the massive amounts of data around it; the sheer volume effectively drowns it out.
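The dilution argument can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes a single isolated corrupt value among otherwise identical records and uses the mean as the statistic. Under those assumptions, one bad value shifts the mean by (bad_value - true_value) / n, so its influence shrinks as the record count n grows. The function name and default values are hypothetical.

```python
import statistics


def mean_shift_from_one_bad_record(n, true_value=100.0, bad_value=1_000_000.0):
    """Return how far one corrupt value moves the mean of n records.

    Hypothetical setup: n - 1 records hold true_value and one holds bad_value.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    clean = [true_value] * n
    dirty = clean[:-1] + [bad_value]  # replace one record with a bad value
    return statistics.fmean(dirty) - statistics.fmean(clean)


if __name__ == "__main__":
    for n in (10, 1_000, 1_000_000):
        print(n, mean_shift_from_one_bad_record(n))
```

Note that this only covers isolated errors. A systematic defect, such as a faulty feed that corrupts a fixed fraction of all records, grows along with the data and is not drowned out by volume.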

In conventional data analysis and data warehousing practice, "bad data" was something to be detected, cleansed, reconciled, and purged.
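For illustration, the four conventional steps might look like this in a tiny batch job. This is a minimal sketch under stated assumptions, not a method from the original text. The Record fields, the VALID_COUNTRIES reference set, and the rules are all hypothetical. "Reconcile" is taken here to mean collapsing duplicate customer_id records and keeping the highest version. The sketch also cleanses before detecting, so that purely cosmetic problems such as stray whitespace or letter case do not cause a record to be purged.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical reference set of valid country codes.
VALID_COUNTRIES = {"US", "GB", "DE", "FR"}


@dataclass
class Record:
    customer_id: str
    email: str
    country: str
    version: int  # hypothetical update counter; higher means newer


def cleanse(r: Record) -> Record:
    """Normalize formatting (whitespace, case) without changing meaning."""
    return Record(r.customer_id.strip(), r.email.strip().lower(),
                  r.country.strip().upper(), r.version)


def detect_problems(r: Record) -> list[str]:
    """Apply simple, hypothetical validation rules; an empty list means valid."""
    problems = []
    if not r.customer_id:
        problems.append("missing customer_id")
    if "@" not in r.email:
        problems.append("malformed email")
    if r.country not in VALID_COUNTRIES:
        problems.append("unknown country")
    return problems


def reconcile(records: list[Record]) -> list[Record]:
    """Collapse duplicates on customer_id, keeping the highest version."""
    latest: dict[str, Record] = {}
    for r in records:
        current = latest.get(r.customer_id)
        if current is None or r.version > current.version:
            latest[r.customer_id] = r
    return list(latest.values())


def run_pipeline(raw: list[Record]):
    """Cleanse, detect, purge invalid records, then reconcile the survivors."""
    kept, purged = [], []
    for r in (cleanse(x) for x in raw):
        problems = detect_problems(r)
        if problems:
            purged.append((r, problems))  # purged records keep their reasons
        else:
            kept.append(r)
    return reconcile(kept), purged


raw = [
    Record(" c1 ", "Alice@Example.com ", "us", 1),
    Record("c1", "alice@new.example.com", "US", 2),
    Record("c2", "not-an-email", "GB", 1),
    Record("", "x@y.z", "DE", 1),
]
kept, purged = run_pipeline(raw)
```

Keeping the purged records together with their reasons, rather than silently discarding them, makes the rules auditable. At Hadoop scale, the same per-record functions would typically run inside a distributed job instead of a Python loop.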
