Evolution of the Modern Data Warehouse

There are a lot of definitions of the data warehouse. I grabbed a random definition off the web. It fits the general understanding in the data management industry of what a data warehouse is, and what it isn’t.

If you’re looking at that definition and thinking, “That looks right to me,” then read on. Once upon a time, I probably would have agreed with this definition as well. But times have changed.

The processes and technologies of data warehousing have changed a lot in the last ten years, but as industry professionals, we often still think about data warehousing the same way. In our minds, we’re still using the same definition of a data warehouse that we used a decade ago.

So, in what ways is that definition wrong?

In the last decade, the data management field has radically transformed. Most people don’t believe that change touched the data warehouse in any way. That is a misconception.

Doug Laney’s classic three V’s (volume, variety, and velocity) hit the data management industry as a tsunami of data, and with it came a sea change in the types of analytics we could do with that data. Old-school analytical databases could not handle that much data affordably without valuable data falling through the cracks. We had business demand for analysis of types of data we’d never even tried to deal with in the past: semi-structured data like JSON and Avro, log files from sensors and components, geospatial data, clickstream data, and on and on. We had data coming at us too fast for the old technology to take it in, scrub it, combine it with other data sets, and provide it to the business in a useful way.

But that was only part of the problem.

Having all this data meant we could do things we’d never done before. Machine learning, data science, and artificial intelligence were all fields of study back in the ’90s when I was at college, but we didn’t have the capacity to put them to useful work back then.

The problem was also the promise.

Having tons of data in all these varieties of formats enabled new, exciting, and potentially industry-disrupting analyses. Early experiments showed machine learning could provide impressive improvements in existing systems, whole new systems for making organizations more successful, and even whole new industries and business models.

The challenge was figuring out a way to store and process all that data to get it into a good form for all those cool new types of advanced data analytics.

Hadoop to the Rescue … or Not.

Along came a cute yellow elephant with a lifeboat, promising to store all that data affordably and process it for us, plus give us a great platform for fancy big data analytics like machine learning. It seemed like exactly what we needed. This was the new hotness.

Throw away your old and busted data warehouse that you’ve been running essential Business Intelligence on for decades.

What could go wrong?

Obviously, a lot of things. Putting all their important data on a giant group of reasonably priced servers along with a bunch of other data didn’t work out so well for a lot of companies. A dumping ground for data was never what the people who invented the data lake concept intended, but that was largely what it became. It was a great place to archive years of data that wouldn’t fit in transactional systems and didn’t seem valuable enough to put in the data warehouse itself. Great. It was all stored. Then what?

Let’s not even talk about Business Intelligence. Hadoop vendors claimed data warehouse-like capabilities, but a query engine on top of a big pile of data does not a data warehouse make. It lacked concurrency, so only a few people could use it. It lacked security, and it lacked governance. Above all, it lacked the reliability and speed that were the hallmarks of the data warehouse.

Data scientists were supposed to be the ones who could do amazing things on Hadoop.
