why data preparation needs to change

ETL is giving way to ELT. Why does it matter? Because data preparation remains one of the toughest obstacles that a data-aware organization must overcome. Whether it's a data lake, a cloud data warehouse, or AI-enabled data prep, there are critical factors to consider.

Cycles of innovation in data management and analytics appear to be driving a reversal of classic ETL (Extract-Transform-Load) into ELT (Extract-Load-Transform). The implications of this reversal go beyond the order of the steps.

The cornerstone of ETL was a schema, or numerous schemas, defined in advance; nothing was done on the fly. The source data could be incomprehensible, but ETL always knew the target. ELT does not necessarily see the target.
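To make the contrast concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for the target store. The records, table names, and function names are hypothetical and exist only for illustration; real pipelines will look different. In the ETL path the target schema is fixed before anything is loaded, while in the ELT path the raw payload is loaded untouched and the "T" runs later against data already in the store.

```python
import json
import sqlite3

# Hypothetical raw records from a source system; field names are illustrative.
RAW = [
    {"id": 1, "amount": "19.99", "currency": "usd"},
    {"id": 2, "amount": "5.00", "currency": "EUR", "note": "unexpected field"},
]


def etl(conn, records):
    """ETL: shape every record to a known target schema, then load it."""
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, currency TEXT)")
    rows = [(r["id"], float(r["amount"]), r["currency"].upper()) for r in records]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def elt(conn, records):
    """ELT: load the raw payload as-is; no target schema is imposed yet."""
    conn.execute("CREATE TABLE raw_orders (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_orders VALUES (?)",
        [(json.dumps(r),) for r in records],
    )


def transform_after_load(conn):
    """The deferred 'T': prep runs against raw data already in the store."""
    shaped = []
    for (payload,) in conn.execute("SELECT payload FROM raw_orders"):
        r = json.loads(payload)
        shaped.append((r["id"], float(r["amount"]), r["currency"].upper()))
    return shaped
```

Note that the ETL path silently drops the unexpected "note" field because the target has no place for it, whereas the ELT path keeps it in the raw table for whichever later transformation may need it.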

Today, across many variations of information architecture, "prep" (shorthand for turning raw data into valuable materializations, metadata, abstractions, processes, pipelines, and models) continues to play a crucial role. With ELT, that role is elevated because the "T" in ELT is more lightweight than its predecessor. The numerous ways of handling information today, including cloud architectures, data lakes, data lakehouses, and cloud-native data warehouses, each require unique prep capabilities for data science, AI, analytics, edge intelligence, or data warehousing.

The terms "data prep" (or simply "prep") and "wrangling" (or "data wrangling") refer to the preparation of data for analytics, reporting, or any downstream application. Prep refers to directly manipulating data after accessing it from one or more sources. Wrangling used to refer separately to preparing data during interactive data analysis and model building. The two were once different disciplines, but they are nearly the same now, so for the sake of simplicity we use "prep" for both.
In the same way, a profusion of data management architectures now exists that align with an ELT approach, and prep plays an essential role in the process.

Drivers of change in data
"Après moi, le déluge" ("After me, the flood"), attributed to the French King Louis XV or to Madame de Pompadour (history is a little fuzzy about the attribution), alludes to signs of the approaching Revolution. The virtually unlimited amount of data, processing capacity, and tools to leverage it is without a doubt a modern deluge. The challenge today is not the volume of data. It's making sense of it, at scale, continuously.
The vision of big data freed organizations to capture far more data sources, at lower levels of detail and in vastly greater volumes, which exposed a massive semantic dissonance problem. For example, data science always consumes "historical" data, and there is no guarantee that the semantics of older datasets match those of newer ones, even if their field names do. Pushing data to a data lake and assuming it is ready for use is a mistake.
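As a rough illustration of how the same name can hide different semantics, the sketch below compares the median of each same-named numeric column across an older and a newer dataset and flags columns whose scale has shifted (for instance, a price once stored in dollars and later in cents). The function names, the datasets-as-dictionaries layout, and the factor-of-ten threshold are all assumptions made for this example; a real check would use richer profiling and business metadata.

```python
from statistics import median


def column_median(values):
    """Median of the numeric entries in a column, or None if there are none."""
    nums = [v for v in values if isinstance(v, (int, float))]
    return median(nums) if nums else None


def flag_scale_drift(old, new, ratio_threshold=10.0):
    """Return same-named columns whose medians differ by ratio_threshold or more.

    `old` and `new` map column names to lists of values (a hypothetical,
    simplified stand-in for two vintages of the same dataset).
    """
    flagged = []
    for col in old.keys() & new.keys():
        a, b = column_median(old[col]), column_median(new[col])
        if not a or not b:
            continue  # skip columns with no numeric data or a zero median
        hi, lo = max(abs(a), abs(b)), min(abs(a), abs(b))
        if hi / lo >= ratio_threshold:
            flagged.append(col)
    return sorted(flagged)


# Illustrative data: "price" keeps its name but changes units between vintages.
old_data = {"price": [10.0, 12.5, 11.0], "qty": [1, 2, 3]}
new_data = {"price": [1000, 1250, 1100], "qty": [2, 2, 3]}
suspect_columns = flag_scale_drift(old_data, new_data)
```

A check like this only catches crude shifts in scale. It says nothing about subtler changes in meaning, such as a status code whose definition was revised, which is why prep still needs human and metadata context.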

Heeding the call to be "data-driven" (we prefer the term "data-aware"), organizations strove to ramp up skills in all manner of predictive modeling, machine learning, AI, and even deep learning. And, of course, existing analytics could not be left behind, so any solution must satisfy those requirements as well. Integrating data from your own ERP and CRM systems may be a chore, but for today's data-aware applications, the fabric of data is multi-colored. So-called secondary data, such as structured interviews, transcripts from focus groups, email, query logs, published texts, literature reviews, and observation records, is challenging to interpret. Records written and kept by individuals (such as diaries and journals) and accessed by other people are also secondary sources.
The primary issue is that enterprise data no longer exists solely in a data center or even in a single cloud; it may span several clouds, or a combination of clouds and data centers.
