What is big data? Real-time analytics of disparate data at web scale Blog

What is big data? Real-time analytics of disparate data at web scale

by 7wData
September 9, 2017

Every day human beings eat, sleep, work, play, and produce data—lots and lots of data. According to IBM, the human race generates 2.5 quintillion (25 billion billion) bytes of data every day. That’s the equivalent of a stack of DVDs reaching to the moon and back, and encompasses everything from the texts we send and photos we upload to industrial sensor metrics and machine-to-machine communications.

That’s a big reason why “big data” has become such a common catch phrase. Simply put, when people talk about big data, they mean the ability to take large portions of this data, analyze it, and turn it into something useful.

But big data is much more than that. It’s about:

In the early days, the industry came up with an acronym to describe three of these four facets: VVV, for volume (the vast quantities), variety (the different kinds of data and the fact that data changes over time), and velocity (speed).

What the VVV acronym missed was the key notion that data did not need to be permanently changed (transformed) to be analyzed. That nondestructive analysis meant that organizations could both analyze the same pools of data for different purposes and could analyze data from sources gathered for different purposes.

By contrast, the data warehouse was purpose-built to analyze specific data for specific purposes, and the data was structured and converted to specific formats, with the original data essentially destroyed in the process, for that specific purpose—and no other—in what was called extract, transform, and load (ETL). Data warehousing’s ETL approach limited analysis to specific data for specific analyses. That was fine when all your data existed in your transaction systems, but not so much in today’s internet-connected world with data from everywhere.

However, don’t think for a moment that big data makes the data warehouse obsolete. Big Data systems let you work with unstructured data largely as it comes, but the type of query results you get is nowhere near the sophistication of the data warehouse. After all, the data warehouse is designed to get deep into data, and it can do that precisely because it has transformed all the data into a consistent format that lets you do things like build cubes for deep drilldown? Data warehousing vendors have spent many years optimizing their query engines to answer the queries typical of a business environment.

Big data lets you anayze much more data from more sources, but at less resolution. Thus, we will be living with both traditional data warehouses and the new style for some time to come.

To accomplish the four required facets of big data—volume, variety, nondestructive use, and speed—required several technology breakthroughs, including the development of a distributed file system (Hadoop), a method to make sense of disparate data on the fly (first Google’s MapReduce, and more recently Apache Spark), and a cloud/internet infrastructure for accessing and moving the data as needed.

Until about a dozen years ago, it wasn’t possible to manipulate more than a relatively small amount of data at any one time. (Well, we all thought our data warehouses were massive at the time. The context has shifted dramatically since then as the internet produced and connected data everywhere.) Limitations on the amount and location of data storage, computing power, and the ability to handle disparate data formats from multiple sources made the task all but impossible.

Then, sometime around 2003, researchers at Google developed MapReduce.

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

What is big data? Real-time analytics of disparate data at web scale

Leave a Reply Cancel reply

Upcoming Events

MarkLogic World | Amsterdam

Knowledge Graph — The Ultimate Center of Excellence

From Text to Value: Pairing Text Analytics and Generative AI

Bringing Data Closer to Decision Makers with Data Fabric

Categories

Tags

You Might Be Interested In

AI can’t replace doctors. But it can make them better.

Artificial Intelligence Continues to Evolve in Government and Elsewhere

Machine-Learning Solar Tracking Technology Nudges PV Field Production Nearer Optimum Levels

Recent Jobs

IT Engineer

Data Engineer

Applications Developer

D365 Business Analyst

Do You Want to Share Your Story?

Join our community

Our Services

Company

Work With Us

Follow Us

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.

What is big data? Real-time analytics of disparate data at web scale

Leave a Reply Cancel reply

Upcoming Events

Categories

Tags

You Might Be Interested In

Recent Jobs

Do You Want to Share Your Story?

Join our community

Our Services

Company

Work With Us

Follow Us

Get the 3 STEPS

To Drive Analytics Adoption And manage change

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.

To Drive Analytics Adoption
And manage change