The Evolution of Data Warehousing to Modern Data Engineering

The data industry has seen a great deal of evolution since the early days of traditional data warehousing. We now rely on the data engineer, as opposed to the ETL developer. DevOps has made its way into the data strategy and is a clear differentiator between data warehousing and modern data engineering.

Tools like Spark and Python have become crucial for the data engineer. Algorithms are starting to play a larger part in Business Intelligence and decision-making. Soon enough we will be able to extract analytics without even knowing where the underlying data is located.

Joe: Good question, and because I’m a consultant, I can say it depends. I can talk about the trends in what we’re doing with our clients, which is a pretty good sample set of what’s going on in the industry and in the world.

So modern data engineering for us typically means being on the cloud. Nearly 100% of our work in 2017 has been either migrating to the cloud or building something from scratch in the cloud. If there is some legacy on-premises tech there, sometimes we extend it, but typically any new initiative where analytics is a central feature is going to be a cloud solution. Whether it’s AWS or Google or Azure or something else is a mixed bag. It really depends on the features, functions, and religions that are involved.

So, with that, once we have a cloud infrastructure, what’s typically included is some kind of object data store. If it’s AWS, that’s S3; if it’s Google Cloud, it would be Google Cloud Storage, GCS; on Azure it would be what they call Blob Storage. Which flavor of the cloud doesn’t really matter that much. So now we have cloud storage, an object store, and then we need some kind of queryable, BI-friendly environment. Typically that is still some kind of relational MPP-type database on the cloud. So in Google it would be BigQuery. On AWS it would be Redshift or Snowflake.
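The pairings Joe lists can be written down as a small lookup table. This is only an illustrative sketch: the `CloudStack` class, the `STACKS` dictionary, and the `describe` helper are hypothetical names, and the table holds only the services named in the interview (no warehouse is named for Azure, so that entry is left empty rather than guessed).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CloudStack:
    object_store: str             # where raw files land
    warehouses: Tuple[str, ...]   # queryable, BI-friendly MPP databases


# Only the pairings mentioned in the interview; nothing here is exhaustive.
STACKS = {
    "aws": CloudStack("S3", ("Redshift", "Snowflake")),
    "google": CloudStack("Google Cloud Storage", ("BigQuery",)),
    "azure": CloudStack("Blob Storage", ()),  # warehouse not named in the interview
}


def describe(cloud: str) -> str:
    """Return a one-line summary of the storage and warehouse layer for a cloud."""
    stack = STACKS[cloud.lower()]
    warehouses = " or ".join(stack.warehouses) or "not named in the interview"
    return f"{cloud}: object store = {stack.object_store}; warehouse = {warehouses}"
```

Calling `describe("aws")` returns a single line naming S3 alongside Redshift or Snowflake, which mirrors the point that the flavor of cloud changes the product names but not the shape of the stack.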

Then the final piece is the data transformations and the orchestration. Typically we’ve been using Spark for all of that. All of the ETL, all of the movement of data from the object data store to the relational database, is done using Python code or SQL code, always in Spark. The analytics on top of it would still be Spark, and then on the relational database sits some kind of newer lightweight BI tool. That’s really the infrastructure.
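The flow described above (read raw files from the object store, transform them, load them into a relational database for BI) can be sketched in a few lines. To keep the example self-contained and runnable anywhere, it swaps in stand-ins: a plain dictionary plays the object store, ordinary Python plays the role Spark has in Joe's setup, and an in-memory SQLite database plays the MPP warehouse. The object keys, the `purchases` table, and the function names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical raw objects, keyed like files in a bucket (stand-in for S3 or GCS).
RAW_OBJECTS = {
    "events/2018-01-01.json": '[{"user_id": "a", "amount": "10.5"}, {"user_id": "b", "amount": "bad"}]',
    "events/2018-01-02.json": '[{"user_id": "a", "amount": "4.5"}]',
}


def extract(objects):
    """Yield one raw record per JSON array element, across all objects."""
    for key in sorted(objects):
        for record in json.loads(objects[key]):
            yield record


def transform(records):
    """Cast amounts to float and drop records that are missing fields or fail to parse."""
    for record in records:
        try:
            yield record["user_id"], float(record["amount"])
        except (KeyError, ValueError):
            continue


def load(rows, conn):
    """Write cleaned rows into a relational table that a BI tool could query."""
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(objects, conn):
    """Run extract, transform, load, then return total spend per user."""
    load(transform(extract(objects)), conn)
    return conn.execute(
        "SELECT user_id, SUM(amount) FROM purchases GROUP BY user_id ORDER BY user_id"
    ).fetchall()
```

Running `run_pipeline(RAW_OBJECTS, sqlite3.connect(":memory:"))` loads the two valid records for user `a` and skips the unparseable one. In a Spark-based version the three steps would stay the same, with the per-record Python replaced by DataFrame or SQL operations.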

This all fits very neatly into what we call “the corporate data pyramid”, which we’ll get into. So that’s just the infrastructure side and then we have to build it, maintain it, and productionalize it.

In 2011, probably 20% of our business was doing this and it has increased by 10% to 20% each year, and now in 2018 it’s all of our business. But what has changed in the past year is that we’re now integrating the concept of DevOps with the analytics platform. And this is one of the big differences between traditional data warehousing and modern data engineering and analytics platforms. You used to have some kind of business application that would generate all of the data, then it would waterfall all of the data into a data warehouse, and then the warehouse would do reporting for human consumption. The really big thing that’s changed in the last couple of years is that the analytics platform is now tightly coupled with the business applications. So the business applications now depend on the analytics in order to function and to create the user experience, based on recommendation engines or scoring, things like propensity to buy or propensity to do whatever we’re trying to measure.
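To make the coupling concrete, here is a minimal sketch of an application branching on a propensity-to-buy score at request time. It assumes a simple logistic model; the feature names, weights, bias, and threshold are invented for illustration and do not come from the interview.

```python
import math

# Hypothetical weights; in practice these would come from a model
# trained and published by the analytics platform.
WEIGHTS = {"visits_last_30d": 0.08, "cart_adds_last_30d": 0.5}
BIAS = -3.0


def propensity_to_buy(features):
    """Return a logistic score in (0, 1) from a dict of numeric features."""
    z = BIAS + sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def choose_experience(features, threshold=0.5):
    """Pick what the application shows; the branch depends on the analytics score."""
    return "show_offer" if propensity_to_buy(features) >= threshold else "show_default"
```

Because `choose_experience` sits in the request path, a broken or stale score changes what the user sees. That is the dependency the passage above describes, and it is why the analytics side can no longer be treated as a reporting-only system.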

So now we need different kinds of SLAs. And what’s happening is that the development and deployment of changes on the analytics platform have to be very tightly coupled with the business applications.
