If Big Data Is the New Crude, Data Virtualization Is the New Refinery

by 7wData
July 16, 2017

Big data is like an abundant, expanding natural resource emerging from the modern data landscape. IoT (sensor), mobile, social, clickstream, web and open data are important contributors to the proliferation of data we’re witnessing today. Worldwide data is expected to increase tenfold by 2025—reaching a total of 163 ZB—according to a recent IDC-Seagate study.

Data is plentiful, but not necessarily useful in its raw, unrefined form. As with any natural resource, “crude” data must be refined before it can be harnessed for productive purposes, such as equipment maintenance, product innovation, competitive intelligence, marketing, data monetization and active health care. The refinement process can incorporate data exploration, preparation, correlation and contextualization, labeling and annotating, unification and integration, and application of security and governance policies. Metadata is also an important component, as it serves a role in both the input and output stages of the overall data-refinement process.

The extent to which data analysis contributes to unbiased conclusions, accurate predictions and insightful decision-making is constrained by the veracity of that data. If it hasn’t been provisioned for analysis, the data may suffer from fragmentation, minimal labeling and missing information. Such characteristics can be evident in electronic health records (EHRs), which illustrate the challenges of data refinement. One hurdle to gathering and analyzing EHR data is the scarcity of proper labeling and consistent semantics.

EHRs are designed primarily to fulfill patient-care, administrative and financial needs. The multipurpose objectives of EHRs—which don’t take into account data analysis per se—can create data fragmentation, which requires rectification before the data can be provisioned for analyses such as clinical research. Another challenge to building data sets from shared patient health records is the lack of standardization in how EHRs are implemented among health-care organizations, and even within the same health-care system. For example, distinct departments (e.g., radiology, orthopedics and internal medicine) in the same hospital may employ EHRs differently to satisfy their unique data-entry requirements, documentation and ordering needs, and preferences, thereby creating data silos.
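To make the silo problem concrete, here is a minimal Python sketch of one refinement step: renaming department-specific fields to a shared schema. The department field names (`pid`, `finding`, `patient`, `dx`) and the `UnifiedRecord` type are hypothetical, invented for illustration; real EHR systems are far more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnifiedRecord:
    patient_id: str
    department: str
    diagnosis: Optional[str]  # None marks missing information explicitly


# Hypothetical per-department field names mapped to the shared schema.
FIELD_MAPS = {
    "radiology": {"pid": "patient_id", "finding": "diagnosis"},
    "orthopedics": {"patient": "patient_id", "dx": "diagnosis"},
}


def unify(department: str, raw: dict) -> UnifiedRecord:
    """Rename department-specific fields to the shared schema."""
    mapping = FIELD_MAPS[department]
    # Fields with no mapping (e.g. free-text notes) are dropped in this sketch.
    renamed = {mapping[k]: v for k, v in raw.items() if k in mapping}
    return UnifiedRecord(
        patient_id=str(renamed["patient_id"]),  # normalize IDs to strings
        department=department,
        diagnosis=renamed.get("diagnosis"),
    )


records = [
    unify("radiology", {"pid": 17, "finding": "hairline fracture"}),
    unify("orthopedics", {"patient": "17", "notes": "follow-up in 2 weeks"}),
]
```

Once both records share the same field names and ID format, they can be joined on `patient_id`, and the missing orthopedics diagnosis shows up as an explicit `None` rather than a silently absent field.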

Data security and privacy can also be impediments to analyzing regulated data, such as that in EHRs. The best approach to surmounting this obstacle is applying proper security and governance during the refinement process. Companies such as Google are experimenting with federated learning in their effort to advance analytics while ensuring privacy.
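The idea behind federated learning can be illustrated with a toy Python sketch of federated averaging: each site trains on its own data, and only model weights are shared and averaged. This is an assumption-laden illustration for a one-parameter model, not a description of how Google or any other company implements it; the function names and data are made up.

```python
def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps for the model y = w * x on one site's data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, sites):
    """Average the locally trained weights, weighted by each site's sample count."""
    total = sum(len(d) for d in sites)
    return sum(local_update(w_global, d) * len(d) for d in sites) / total


# Two hypothetical sites; the raw (x, y) pairs never leave their site.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.0, 2.0), (3.0, 6.0), (0.5, 1.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, sites)
```

In this sketch the coordinating code sees only one number per site per round, yet `w` converges toward the slope that fits all the data. Production systems typically layer further protections on top, which are outside the scope of this example.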

Data refinement is crucial to achieving reliable outcomes from data analysis, including meaningful conclusions, accurate predictions and informed decisions. Ideally, the process of refining raw data to produce complete and meaningful information does the following:

Modern analytics relies on data from myriad fragmented data sources. Experience tells us that big data sources aren’t always amenable to replicating and relocating when the data is distributed across multiple systems. Data virtualization delivers the scale to work effectively with big data sources by offering an alternative paradigm: move processing to the data.
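As a rough sketch of what "move processing to the data" can mean in practice, the Python example below pushes an aggregation down to each source and combines only the small partial results. The two in-memory SQLite databases stand in for separate systems, and the table and function names are hypothetical; a real data virtualization platform does considerably more, including query planning and security enforcement.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory database standing in for one remote data source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    return conn


sources = [
    make_source([("a", 1.0), ("a", 3.0), ("b", 10.0)]),
    make_source([("a", 5.0), ("b", 20.0)]),
]


def virtual_average(sources):
    """Compute per-sensor averages without copying raw rows out of any source."""
    totals = {}
    query = "SELECT sensor, SUM(value), COUNT(*) FROM readings GROUP BY sensor"
    for conn in sources:
        # Each source does its own aggregation; only one row per sensor comes back.
        for sensor, partial_sum, count in conn.execute(query):
            acc = totals.setdefault(sensor, [0.0, 0])
            acc[0] += partial_sum
            acc[1] += count
    return {sensor: s / n for sensor, (s, n) in totals.items()}


averages = virtual_average(sources)
```

Sums and counts are combined rather than per-source averages, because averaging the averages would give the wrong answer when sources hold different numbers of rows.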
