Why data preparation needs to change
- by 7wData
ETL is giving way to ELT. Why does it matter? Because data preparation remains one of the toughest obstacles that a data-aware organization must overcome. Whether it's a data lake, a cloud data warehouse, or AI-enabled data prep, there are critical factors to consider.
Cycles of innovation in data management and analytics appear to be driving the classic ETL (Extract-Transform-Load) pattern toward its reversal, ELT (Extract-Load-Transform). The implications of this reversal go well beyond the mere reordering of steps.
The cornerstone of ETL was one or more predefined schemas; nothing happened on the fly. The source data could be incomprehensible, but ETL always knew the target. ELT does not necessarily know the target in advance.
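A minimal sketch of that contrast, using pandas and an in-memory SQLite database as a stand-in target (the table and column names are hypothetical, not taken from any particular product): in ETL the data is shaped to a known target schema before it is loaded, while in ELT the raw data lands first and is transformed later, inside the target platform.

```python
import sqlite3
import pandas as pd

# Stand-in "warehouse": an in-memory SQLite database (hypothetical target).
warehouse = sqlite3.connect(":memory:")

# Messy source extract with hypothetical columns.
raw = pd.DataFrame({
    "order_id": [1, 2, 3],
    "amount":   ["10.5", "n/a", "7.25"],
    "country":  ["us", "US", "de"],
})

# ETL: transform first, against a known target schema, then load.
curated = raw.copy()
curated["amount"] = pd.to_numeric(curated["amount"], errors="coerce")
curated["country"] = curated["country"].str.upper()
curated.dropna(subset=["amount"]).to_sql("orders_curated", warehouse, index=False)

# ELT: load the raw data as-is, then transform later inside the target.
raw.to_sql("orders_raw", warehouse, index=False)
warehouse.execute("""
    CREATE TABLE orders_clean AS
    SELECT order_id,
           CAST(amount AS REAL) AS amount,
           UPPER(country)       AS country
    FROM orders_raw
    WHERE amount GLOB '[0-9]*'   -- crude filter for non-numeric source values
""")
```

The point of the sketch is where the target schema lives: in the ETL branch it is baked into the pipeline code before loading, while in the ELT branch it is expressed after the fact, as a query over raw data that was loaded without knowing how it would eventually be used.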
Today, with many variations in information architecture, "prep" (shorthand for turning raw data into valuable materializations, metadata, abstractions, processes, pipelines, and models) continues to play a crucial role. With ELT, its role is actually elevated, because the "T" in ELT is more lightweight than its ETL predecessor. The numerous ways of handling information today (cloud architectures, data lakes, data lakehouses, and cloud-native data warehouses) each require distinct prep capabilities for data science, AI, analytics, edge intelligence, or data warehousing.
The terms "data prep" (or simply "prep") and "wrangling" (or "data wrangling") refer to the preparation of data for analytics, reporting, or any downstream application. Prep refers to directly manipulating data after accessing it from a source or sources. Wrangling once referred separately to preparing data during interactive data analysis and model building. The two were once distinct disciplines but are nearly the same now, so for simplicity we use "prep" for both.
In the same way, many of the data management architectures that have proliferated align naturally with an ELT approach, and prep plays an essential role in each of them.
Drivers of change in data
"Après Moi le deluge." After me, the flood," attributed to French King Louis XV or Madame de Pompadour (history is a little fuzzy about the attribution), alludes to signs about the approaching Revolution. The virtually unlimited amount of data, processing capacity, and tools to leverage it are a modern deluge without a doubt. The challenge today is not the volume of data. It's making sense of it, at scale, continuously.
The vision of big data freed organizations to capture far more data sources at lower levels of detail and vastly greater volumes, which exposed a massive semantic dissonance problem. For example, data science always consumes "historical" data, and there is no guarantee that the semantics of older datasets are the same, even if their names are. Pushing data to a data lake and assuming it is ready for use is a mistake.
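To make the semantic-dissonance point concrete, here is a hedged sketch (the datasets, column names, and the 50% threshold are invented for illustration) that compares two extracts sharing the same column names and flags columns whose type or basic statistics have drifted, which is often the first symptom that their meaning has quietly changed.

```python
import pandas as pd

def drift_report(old: pd.DataFrame, new: pd.DataFrame) -> list[str]:
    """Flag same-named columns whose type or basic statistics differ."""
    issues = []
    for col in old.columns.intersection(new.columns):
        if old[col].dtype != new[col].dtype:
            issues.append(f"{col}: dtype changed {old[col].dtype} -> {new[col].dtype}")
        elif pd.api.types.is_numeric_dtype(old[col]):
            # A large shift in the mean hints that the column's meaning changed
            # (e.g. a currency or unit change), even though its name did not.
            old_mean, new_mean = old[col].mean(), new[col].mean()
            if old_mean and abs(new_mean - old_mean) / abs(old_mean) > 0.5:
                issues.append(f"{col}: mean shifted {old_mean:.2f} -> {new_mean:.2f}")
    return issues

# Hypothetical 2019 vs 2024 extracts with identical column names.
orders_2019 = pd.DataFrame({"revenue": [100.0, 120.0, 90.0]})
orders_2024 = pd.DataFrame({"revenue": [9200.0, 11000.0, 8700.0]})
print(drift_report(orders_2019, orders_2024))
```

A check like this does not prove the semantics changed, but it surfaces the older datasets that should not be pushed into a lake and assumed ready for use.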
Harkening to the call to be "data-driven" (we prefer the term "data-aware"), organizations strove to ramp up skills in all manner of predictive modeling, machine learning, AI, and even deep learning. And, of course, the existing analytics could not be left behind, so any solution must satisfy those requirements as well. Integrating data from your own ERP and CRM systems may be a chore, but for today's data-aware applications, the fabric of data is multi-colored. So-called secondary data, such as structured interviews, transcripts from focus groups, email, query logs, published texts, literature reviews, and observation records, presents a challenging comprehension problem. Records written and kept by individuals (such as diaries and journals) and later accessed by other people are also secondary sources.
The primary issue is that enterprise data no longer lives solely in a data center or in a single cloud; it is spread across multiple clouds and combinations of on-premises and cloud environments.