If Big Data Is the New Crude, Data Virtualization Is the New Refinery

If Big Data Is the New Crude

Big data is like an abundant, expanding natural resource emerging from the modern data landscape. IoT (sensor), mobile, social, clickstream, web and open data are important contributors to the proliferation of data we’re witnessing today. Worldwide data is expected to increase tenfold by 2025—reaching a total of 163 ZB—according to a recent IDC-Seagate study.

Data is plentiful, but not necessarily useful in its raw, unrefined form. As with any natural resource, “crude” data must be refined before it can be harnessed for productive purposes, such as equipment maintenance, product innovation, competitive intelligence, marketing, data monetization and active health care. The refinement process can incorporate data exploration, preparation, correlation and contextualization, labeling and annotating, unification and integration, and application of security and governance policies. Metadata is also an important component, as it serves a role in both the input and output stages of the overall data-refinement process.

The extent to which data analysis contributes to unbiased conclusions, accurate predictions and insightful decision-making is constrained by the veracity of that data. If it hasn’t been provisioned for analysis, the data may suffer from fragmentation, minimal labeling and missing information. Such characteristics can be evident in electronic health records (EHRs), which illustrate the challenges of data refinement. One hurdle to gathering and analyzing EHR data is the scarcity of proper labeling and consistent semantics.

EHRs are designed primarily to fulfill patient-care, administrative and financial needs. The multipurpose objectives of EHRs—which don’t take into account data analysis per se—can create data fragmentation, which requires rectification before the data can be provisioned for analyses such as clinical research. Another challenge to building data sets from shared patient health records is the lack of standardization in how EHRs are implemented among health-care organizations, and even within the same health-care system. For example, distinct departments (e.g., radiology, orthopedics and internal medicine) in the same hospital may employ EHRs differently to satisfy their unique data-entry requirements, documentation and ordering needs, and preferences, thereby creating data silos.
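To make the silo problem concrete, here is a minimal sketch of one small refinement step: renaming department-specific fields to a shared vocabulary. The department field names and the `FIELD_MAPS` table are hypothetical, invented purely for illustration; real EHR systems use their own schemas and coding standards.

```python
# Hypothetical per-department field names mapped to one shared schema.
# These names are illustrative assumptions, not taken from any real EHR product.
FIELD_MAPS = {
    "radiology": {"pt_id": "patient_id", "dx": "diagnosis"},
    "orthopedics": {"PatientID": "patient_id", "Diagnosis": "diagnosis"},
}


def harmonize(record, department):
    """Rename a department's fields to the shared schema.

    Fields without a known mapping are dropped rather than guessed at.
    """
    mapping = FIELD_MAPS[department]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


radiology_row = harmonize({"pt_id": "A1", "dx": "fracture"}, "radiology")
ortho_row = harmonize({"PatientID": "A1", "Diagnosis": "fracture"}, "orthopedics")
```

After harmonization, both rows carry the same keys, so they can be compared or combined without department-specific logic.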

Data security and privacy can also be impediments to analyzing regulated data, such as that in EHRs. The best approach to surmounting this obstacle is applying proper security and governance during the refinement process. Companies such as Google are experimenting with federated learning in their effort to advance analytics while ensuring privacy.
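The core idea behind federated learning can be sketched in a few lines: each site trains on its own records and shares only model parameters, which a coordinator then averages. The toy example below fits a one-parameter model y = w * x and is an illustrative assumption throughout; it is not Google's implementation, and the function names and data are invented.

```python
def local_update(weight, data, lr=0.1):
    """One gradient step of least squares for y = w * x, run at a single site.

    Returns the updated weight and the site's sample count; raw data never leaves.
    """
    grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
    return weight - lr * grad, len(data)


def federated_average(updates):
    """Combine per-site weights, weighting each by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Two hypothetical sites whose private data both follow y = 2 * x.
sites = [[(1.0, 2.0)], [(2.0, 4.0), (3.0, 6.0)]]

weight = 0.0
for _ in range(20):
    weight = federated_average([local_update(weight, data) for data in sites])
```

Only the `(weight, sample_count)` pairs cross site boundaries in this sketch; production systems typically add further protections on top of this, which are out of scope here.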

Data refinement is crucial to achieving reliable outcomes from data analysis, including meaningful conclusions, accurate predictions and informed decisions. Ideally, the process of refining raw data to produce complete and meaningful information does the following:

Modern analytics relies on data from myriad fragmented data sources. Experience tells us that big data sources aren’t always amenable to replicating and relocating when the data is distributed across multiple systems. Data virtualization delivers the scale to work effectively with big data sources by offering an alternative paradigm: move processing to the data.
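As a rough illustration of moving processing to the data, the sketch below treats two in-memory SQLite databases as stand-ins for separate sources. Instead of copying every row to a central store, it asks each source to compute a small summary and combines only those summaries. The table name, columns and helper functions are assumptions made for this example; actual data virtualization platforms handle query planning and pushdown in far more sophisticated ways.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


def pushed_down_summary(conn):
    """Run the aggregation inside the source; only one small row comes back."""
    query = "SELECT COUNT(*), COALESCE(SUM(value), 0) FROM readings"
    return conn.execute(query).fetchone()


def virtual_average(sources):
    """Combine per-source partial results into a single overall average."""
    partials = [pushed_down_summary(conn) for conn in sources]
    count = sum(n for n, _ in partials)
    total = sum(s for _, s in partials)
    return total / count if count else None


sources = [
    make_source([("a", 1.0), ("b", 3.0)]),
    make_source([("c", 5.0)]),
]
overall = virtual_average(sources)
```

The design point is that the volume of data crossing the network depends on the size of the answer, not the size of the sources.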
