Why Data Quality Should Be the Red Thread of your Data Strategy [Interview]
- by 7wData
I believe this year's Gartner Magic Quadrant for Data Quality Tools report represents further proof of the fundamental shift in what enterprises need to support their Data Quality initiatives. With the volume, variety, and velocity of data collected and managed in data lakes showing no sign of slowing, it makes sense that the requirements for data quality tasks such as defining relevancy, recency, and range are increasing at a similar pace.
Open source is likely more causal than correlative in the proverbial "changing of the guard" taking place in the market in terms of companies' approach to data integration in the era of big data. Over the last decade, there has been increasing acceptance of open-source technologies as formidable enterprise solutions, enabling frameworks like Apache Spark to replace their proprietary and now antiquated counterparts. As a result of this change, we see the emergence of new customer requirements demanding interoperability with their framework of choice, which gives them the flexibility to adapt to ever-evolving market needs. This makes one wonder: Are vendors whose solutions restrict or exclude interoperability with Spark out of touch with the business and customer demands of not only today but also the future? Perhaps proprietary vendors still believe they know best.
We believe this year's Gartner Magic Quadrant for Data Quality Tools confirms that market dynamics are changing in a direction Talend forecast some time ago. The market is shifting to cloud and big data, and customers need flexible platforms that can keep pace with the rapidly evolving technologies that help them manage those new frontiers. As I see it, the only way to do this is to be open source-based. Talend has always been open source-based, but what many may not know is that data quality has also always been part of our Data Integration DNA, which is why it is at the core of our Talend Data Fabric platform. As the saying goes, "garbage in, garbage out." How can companies make accurate decisions based on poor quality data? We believe Talend's move from a Visionary to a Leader in this year's Gartner Magic Quadrant for Data Quality Tools is due to our completeness of vision and ability to execute, further validating that Talend is moving in the right direction by addressing customers' otherwise unmet need to be more data-driven.
Now, I imagine the publication of this MQ will prompt blogs, announcements, and articles opining on the merits of various Data Quality products or approaches. For my part, I'd like to highlight an interview I had recently with one of our community members, Michael Covert, CEO of Analytics Inside. In our discussion, he spoke about his company's use of Talend to solve a Data Quality and Governance initiative for a Healthcare customer.
Nick: When starting a data governance initiative, what's one of the first things organizations should do?
MC: One of the first things we advise our customers to do is undergo a data review and cleansing task. It is important to gain a quick understanding of just what you are dealing with... get a sense of how "dirty" the data is, whether date formats are invalid, whether the data requires preprocessing to remove punctuation or fix capitalization, and so on. In this particular project, the customer had a variety of data sources, both structured and unstructured, from which they needed to extract legal entity information. This consisted of company names, addresses, phone numbers, employer identification numbers (EINs), and other pieces of information that could be placed into a corporate-wide master file.
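To make the review-and-cleansing step concrete, here is a minimal Python sketch of the kinds of checks described above. It is not the approach used in the project discussed in the interview (that was built with Talend); the field names (`name`, `registered`, `ein`), the accepted date formats, and the EIN pattern are all assumptions chosen for illustration.

```python
import re
import string
from datetime import datetime

# Assumed EIN layout for illustration: two digits, a hyphen, seven digits.
EIN_PATTERN = re.compile(r"^\d{2}-\d{7}$")

# Assumed date formats to try, in order; real ones would come from profiling the data.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d-%b-%Y")


def normalize_name(raw):
    """Strip punctuation, collapse whitespace, and upper-case a company name."""
    no_punct = raw.translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split()).upper()


def parse_date(raw):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def review_record(record):
    """Clean one record (hypothetical field names) and list its quality issues."""
    issues = []
    date = parse_date(record.get("registered", ""))
    if date is None:
        issues.append("invalid date")
    ein = record.get("ein", "").strip()
    if not EIN_PATTERN.match(ein):
        issues.append("invalid EIN")
    cleaned = {
        "name": normalize_name(record.get("name", "")),
        "registered": date,
        "ein": ein,
    }
    return cleaned, issues
```

Running `review_record` over a sample of each source and counting the issues per field gives a quick, rough sense of how "dirty" the data is before any heavier cleansing work begins.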
Nick: That's not an easy task, given the variety of both file type and format.