What is big data? Real-time analytics of disparate data at web scale

What is big data? Real-time analytics of disparate data at web scale

Every day human beings eat, sleep, work, play, and produce data—lots and lots of data. According to IBM, the human race generates 2.5 quintillion (25 billion billion) bytes of data every day. That’s the equivalent of a stack of DVDs reaching to the moon and back, and encompasses everything from the texts we send and photos we upload to industrial sensor metrics and machine-to-machine communications.

That’s a big reason why “big data” has become such a common catch phrase. Simply put, when people talk about big data, they mean the ability to take large portions of this data, analyze it, and turn it into something useful.

But big data is much more than that. It’s about:

In the early days, the industry came up with an acronym to describe three of these four facets: VVV, for volume (the vast quantities), variety (the different kinds of data and the fact that data changes over time), and velocity (speed).

What the VVV acronym missed was the key notion that data did not need to be permanently changed (transformed) to be analyzed. That nondestructive analysis meant that organizations could both analyze the same pools of data for different purposes and could analyze data from sources gathered for different purposes.

By contrast, the data warehouse was purpose-built to analyze specific data for specific purposes, and the data was structured and converted to specific formats, with the original data essentially destroyed in the process, for that specific purpose—and no other—in what was called extract, transform, and load (ETL). Data warehousing’s ETL approach limited analysis to specific data for specific analyses. That was fine when all your data existed in your transaction systems, but not so much in today’s internet-connected world with data from everywhere.

However, don’t think for a moment that big data makes the data warehouse obsolete.  Big Data systems let you work with unstructured data largely as it comes, but the type of query results you get is nowhere near the sophistication of the data warehouse. After all, the data warehouse is designed to get deep into data, and it can do that precisely because it has transformed all the data into a consistent format that lets you do things like build cubes for deep drilldown? Data warehousing vendors have spent many years optimizing their query engines to answer the queries typical of a business environment.

Big data lets you anayze much more data from more sources, but at less resolution. Thus, we will be living with both traditional data warehouses and the new style for some time to come.  

To accomplish the four required facets of big data—volume, variety, nondestructive use, and speed—required several technology breakthroughs, including the development of a distributed file system (Hadoop), a method to make sense of disparate data on the fly (first Google’s MapReduce, and more recently Apache Spark), and a cloud/internet infrastructure for accessing and moving the data as needed.

Until about a dozen years ago, it wasn’t possible to manipulate more than a relatively small amount of data at any one time. (Well, we all thought our data warehouses were massive at the time. The context has shifted dramatically since then as the internet produced and connected data everywhere.) Limitations on the amount and location of data storage, computing power, and the ability to handle disparate data formats from multiple sources made the task all but impossible.

Then, sometime around 2003, researchers at Google developed MapReduce.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

AI can’t replace doctors. But it can make them better.

7 May, 2019

Several years ago Vinod Khosla, the Silicon Valley investor, wrote a provocative article titled “Do We Need Doctors or Algorithms?” …

Read more

Artificial Intelligence Continues to Evolve in Government and Elsewhere

13 Sep, 2022

The concept of using artificial intelligence to help mitigate dull, repetitive or manpower-intensive jobs within government is nothing new. For …

Read more

Machine-Learning Solar Tracking Technology Nudges PV Field Production Nearer Optimum Levels

13 Jul, 2017

Solar energy products and services developers and vendors continue to leverage the latest in distributed information and communications technology (ICT) …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.