Data quality for big data should include a focus on usability
- by 7wData
Data quality processes have become more prominent in organizations, often as part of data governance programs....
For many companies, the growing interest in quality is commensurate with an increased need to ensure that analytics data is trustworthy. That's especially true with data quality for big data; more data usually means more data problems.
One of the main challenges of effective data quality management is articulating what quality really means to a company. The commonly cited dimensions of data quality include accuracy, consistency, timeliness and conformity. But there are many different lists of dimensions, and even some common terms have different meanings from list to list. As a result, relying solely on a particular list, without an underlying foundation for what you're trying to accomplish, is a simplistic approach to data quality.
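To make the four dimensions concrete, here is a minimal Python sketch that scores a handful of records on each one. It is an illustration, not a method from the article. The records, field names, reference values, country code list, email pattern and 90-day freshness window are all hypothetical, and the working definitions are assumptions. For instance, consistency is treated here as agreement with a shared code list, which another list of dimensions might call validity. That ambiguity is the problem described above.

```python
import re
from datetime import datetime, timedelta

# Hypothetical sample records; field names and values are invented for illustration.
records = [
    {"id": 1, "email": "a@example.com", "country": "US", "updated": datetime(2024, 5, 1)},
    {"id": 2, "email": "not-an-email", "country": "US", "updated": datetime(2023, 1, 15)},
    {"id": 3, "email": "c@example.com", "country": "usa", "updated": datetime(2024, 4, 20)},
    {"id": 4, "email": None, "country": "DE", "updated": datetime(2024, 5, 10)},
]

# Assumed rules: a trusted reference source, an agreed country code list,
# a simple email format and a 90-day freshness window.
REFERENCE_COUNTRY = {1: "US", 2: "US", 3: "US", 4: "FR"}
ALLOWED_COUNTRIES = {"US", "DE", "FR"}
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MAX_AGE = timedelta(days=90)


def accuracy(rows, field, reference):
    """Share of rows whose value matches the trusted reference for that id."""
    return sum(r[field] == reference.get(r["id"]) for r in rows) / len(rows)


def consistency(rows, field, allowed):
    """Share of rows whose value is drawn from the agreed code list."""
    return sum(r[field] in allowed for r in rows) / len(rows)


def timeliness(rows, field, as_of, max_age):
    """Share of rows updated no more than max_age before as_of."""
    return sum(as_of - r[field] <= max_age for r in rows) / len(rows)


def conformity(rows, field, pattern):
    """Share of non-missing values that match the expected format (1.0 if none)."""
    values = [r[field] for r in rows if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values) if values else 1.0


as_of = datetime(2024, 5, 15)  # hypothetical evaluation date
report = {
    "accuracy": accuracy(records, "country", REFERENCE_COUNTRY),
    "consistency": consistency(records, "country", ALLOWED_COUNTRIES),
    "timeliness": timeliness(records, "updated", as_of, MAX_AGE),
    "conformity": conformity(records, "email", EMAIL_RE),
}
for name, score in report.items():
    print(f"{name:12s} {score:.2f}")
```

Each function returns a fraction rather than a verdict. This keeps the result open to interpretation by whoever uses the data, instead of baking in one list's idea of "good."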
This challenge becomes more acute with big data. In Hadoop clusters and other big data systems, data volumes are exploding, and data variety is increasing. An organization might accumulate data from numerous sources for analysis -- for example, transaction data from different internal systems, clickstream logs from e-commerce sites and streams of data from social networks.
Additionally, the design of big data platforms exacerbates the potential problems. A company might create data in on-premises servers, syndicate it to cloud databases and distribute filtered data sets to systems at remote sites. This new world creates issues that aren't covered in conventional lists of data quality dimensions. To compensate, we need to re-examine what is meant by quality in the context of a big data analytics environment.
Too often, we equate the concept of data quality with discrete notions such as data correctness or currency, putting in place processes to fix data values or objects that aren't accurate or up to date. But managing data quality for big data is also likely to include measures designed to help data scientists and other analysts figure out how to use the data they have effectively. In other words, we must move from simply producing a black-and-white specification of good versus bad data to supporting a spectrum of data usability.
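One way to picture the shift from a good-versus-bad verdict to a spectrum of usability is to compare a single pass/fail gate with an assessment tied to a particular use. The Python sketch below is a hypothetical illustration, not a technique prescribed by the article. The dimension scores, the 0.95 global threshold, the per-use requirements and the 0.1 tolerance are all invented for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Assessment:
    level: str  # "fit", "usable with caveats" or "not fit"
    caveats: list[str] = field(default_factory=list)


def passes(scores: dict[str, float], threshold: float = 0.95) -> bool:
    """Black-and-white view: every dimension must clear one global bar."""
    return all(score >= threshold for score in scores.values())


def assess_usability(scores: dict[str, float], needs: dict[str, float],
                     tolerance: float = 0.1) -> Assessment:
    """Spectrum view: judge the data against what one particular use needs.

    `needs` maps the dimensions that matter for this use to a minimum score;
    other dimensions are ignored. A shortfall no larger than `tolerance`
    (an assumed value) is recorded as a caveat instead of disqualifying the data.
    """
    result = Assessment("fit")
    for dim, minimum in needs.items():
        actual = scores.get(dim, 0.0)  # an unmeasured dimension counts as 0
        if actual >= minimum:
            continue
        result.caveats.append(f"{dim}: {actual:.2f} below required {minimum:.2f}")
        if minimum - actual > tolerance:
            result.level = "not fit"
        elif result.level == "fit":
            result.level = "usable with caveats"
    return result


# Hypothetical dimension scores for one data set and hypothetical per-use needs.
scores = {"accuracy": 0.50, "consistency": 0.75, "timeliness": 0.75, "conformity": 0.67}
uses = {
    "weekly trend report": {"timeliness": 0.70, "consistency": 0.70},
    "churn model": {"consistency": 0.80, "timeliness": 0.70},
    "email campaign": {"conformity": 0.90, "timeliness": 0.70},
}

print("passes global gate:", passes(scores))
for use, needs in uses.items():
    assessment = assess_usability(scores, needs)
    print(f"{use}: {assessment.level}", assessment.caveats)
```

With these made-up inputs, the global gate rejects the data set outright. The usability assessment, by contrast, finds it fit for the trend report, usable with caveats for the churn model and not fit for the email campaign. The design choice is that requirements belong to the use rather than to the data, so the same data set can serve some analysts well while others are told exactly where it falls short.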