How to reuse your data effectively
- by 7wData
More and more enterprises are embracing meaningful data strategies to drive innovation, with the ability to reuse and repurpose data serving as a core component of long-term success. Successful planning for reuse and flexibility is about preserving optionality and agility and avoiding lock-in. That means, specifically, lock-in to one application platform or service; lock-in to closed data formats that are difficult to transform data into and out of; and lock-in to one system or environment, whether through data gravity or an inability to serve or move data. By planning for the flexibility to move and share data across applications and environments, organizations are better prepared to react to new, unplanned demands on their data. Data backup and recovery, for example, are becoming common areas where organizations find opportunities for data reuse: originally designed to meet business continuity requirements, backup and retention copies of data are now being fed into modern analytics tools to drive ongoing business outcomes. Analytics is another area driving greater data sharing and reuse, with common data formats such as Parquet and Apache Iceberg allowing a single data set to be shared and analyzed by different applications. To plan for data reusability, organizations should assess and invest in data platforms and technologies that truly enable flexibility in modes of use, along with the scalability of capacity and processing required to handle greater future demands.
Given that data is the lifeblood of nearly every industry and the foundation of modern machine learning, an increasing number of companies treat the creation of a “golden data set” (data that is perfectly reusable, clean, integrated and compliant) as a mission-critical task. To that end, much-needed investments in data governance, data lakes, clear lineage tracking and data observability tools that automatically surface issues are accelerating. There’s just one problem: in seeking perfection, companies often end up delivering perfectly reusable data that no one actually uses. Ironically, the best way to ensure data reusability is to spend less time on planning and processes and more on flexibly arming internal customers of data with what they need to make models work and to establish feedback loops with real-world systems for ongoing active learning. In that sense, investing in machine learning is a great shortcut to data reusability because it forces you to learn from past data and continuously improve by default. Reusability aside, many organizations simply need to better understand their data. Teams deploying deep-learning models (like ones that scan images of store shelves to automate inventory orders), for example, often lack visibility into how the model performs in the real world until either a human labeler checks a small subset of individual predictions (was milk really out of stock?) or something goes wrong (customers complain). Even tech companies with sophisticated teams struggle to have AI consistently flag things like hate speech; better insights and monitoring, not just reusability, are needed.
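The spot-check described above, in which a human labeler reviews a small subset of predictions, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_for_review`, `estimate_accuracy` and `collect_corrections` are hypothetical, the 95% margin uses a simple normal approximation, and a real monitoring setup would depend on the team's own tooling.

```python
import math
import random


def sample_for_review(predictions, sample_size, seed=0):
    """Pick a random subset of predictions for a human labeler to check.

    Hypothetical helper: `predictions` is any list of prediction records.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(predictions))
    return rng.sample(predictions, k)


def estimate_accuracy(reviewed):
    """Estimate accuracy from (predicted, human_label) pairs.

    Returns (accuracy, margin), where margin is a rough 95% half-width
    based on the normal approximation; it is unreliable for tiny samples.
    """
    n = len(reviewed)
    if n == 0:
        raise ValueError("no reviewed predictions")
    correct = sum(1 for predicted, actual in reviewed if predicted == actual)
    accuracy = correct / n
    margin = 1.96 * math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy, margin


def collect_corrections(reviewed):
    """Return the pairs where the human disagreed with the model.

    These disagreements are the raw material for a feedback loop:
    they can be added back to the training data.
    """
    return [(predicted, actual) for predicted, actual in reviewed
            if predicted != actual]
```

A team might call `sample_for_review` on each day's predictions, have a labeler fill in the true values, and then track `estimate_accuracy` over time while feeding the output of `collect_corrections` back into retraining.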
Our survey of 850 C-suite executives across 20 industries found that 57% regard AI as a critical enabler of their strategic priorities. CEOs and CIOs are pushing for data reusability because it can accelerate speed and scale. But there’s a gap between wanting to reuse data and possessing the mechanical ability to achieve that goal. Closing that gap effectively calls for the following three steps. Connect data to business value: Making data reusable is costly and time-consuming.