The modern data stack — a short intro
- by 7wData
Data has become a valuable asset in every company, as it forms the basis of predictions, personalization of services, optimization, and insights in general. While companies have been aggregating data for decades, the tech stack has greatly evolved and is now referred to as "the modern data stack".
The modern data stack is an architecture and a strategy that allows companies to become data-driven in every decision they make. The goal of the modern data stack is to set up the tools, processes and teams to build end-to-end data pipelines.
A data pipeline is a process where data flows from a source to a destination, typically with many intermediate steps. Historically this process was called ETL, referring to the 3 main steps of the process:
- E = Extract: getting the data out of the source
- T = Transform: transforming the raw source data into understandable data
- L = Load: loading the data into a data warehouse
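The three steps above can be sketched end to end. This is a minimal illustration, not a real pipeline: a CSV string stands in for the source and a plain list stands in for the warehouse table.

```python
import csv
import io

def extract(raw_csv: str) -> list[dict]:
    """E: read raw rows out of a source (a CSV string stands in for a real source)."""
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    """T: turn raw source data into understandable data (trim names, parse numbers)."""
    return [{"user": r["user"].strip().lower(), "amount": float(r["amount"])}
            for r in rows]

def load(rows: list[dict], warehouse: list[dict]) -> None:
    """L: append the transformed rows to the destination table."""
    warehouse.extend(rows)

warehouse: list[dict] = []
raw = "user,amount\nAlice ,10.5\nBOB,3\n"
load(transform(extract(raw)), warehouse)
```

In a real stack each function would be a separate tool (an extractor, a transformation engine, a warehouse loader), but the data flow is the same.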
The first step in a data pipeline is to extract data from the source. The challenge is that there are many different types of sources: databases and log files, but also business applications. Modern tools such as Stitch and Fivetran make it easy to extract data from a wide range of sources, including SaaS business applications. For the latter, the APIs of those SaaS applications are used to read the data incrementally.
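Incremental reads typically work by remembering a cursor (for example, the highest `updated_at` value seen so far) and asking the API only for records changed since then. A sketch under assumptions: `fetch_since` and the in-memory record list are hypothetical stand-ins for a real SaaS API.

```python
# Hypothetical "SaaS API": records carrying an updated_at timestamp.
RECORDS = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
    {"id": 3, "updated_at": "2024-03-01"},
]

def fetch_since(cursor: str) -> list[dict]:
    """Stand-in for an API call that returns only records changed after `cursor`."""
    return [r for r in RECORDS if r["updated_at"] > cursor]

def extract_incremental(state: dict) -> list[dict]:
    """Read records changed since the last run, then advance the saved cursor."""
    new = fetch_since(state.get("cursor", ""))
    if new:
        state["cursor"] = max(r["updated_at"] for r in new)
    return new

state: dict = {}
first = extract_incremental(state)   # everything on the first run
second = extract_incremental(state)  # nothing new on the second run
```

The `state` dictionary is what an extractor like Stitch or Fivetran persists between runs so it never re-reads unchanged data.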
For large databases, the concept of CDC is often used. CDC (change data capture) is a method where all changes that occur in a source database (inserts of new data, updates of existing data and deletions of data) are tracked and sent to a destination database (or data warehouse) to recreate the data. CDC avoids the need to make a daily or hourly full dump of the data, which would take too long to import.
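The core of CDC is replaying a stream of change events against the destination. A minimal sketch, assuming a simplified event format (`op` plus the affected row) and a destination table keyed by `id`:

```python
def apply_change(table: dict, event: dict) -> None:
    """Replay one CDC event (insert/update/delete) against a destination table."""
    op, row = event["op"], event["row"]
    if op in ("insert", "update"):
        table[row["id"]] = row
    elif op == "delete":
        table.pop(row["id"], None)

dest: dict = {}
change_log = [
    {"op": "insert", "row": {"id": 1, "name": "Ann"}},
    {"op": "insert", "row": {"id": 2, "name": "Bob"}},
    {"op": "update", "row": {"id": 1, "name": "Anna"}},
    {"op": "delete", "row": {"id": 2}},
]
for event in change_log:
    apply_change(dest, event)
# dest now mirrors the source table without ever taking a full dump
```

Real CDC tools read these events from the database's transaction log rather than from application code, but the replay logic is the same idea.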
Data transformations can be done in different ways, for example with visual tools where users drag and drop blocks to implement a pipeline that has multiple steps. Each step is one block in the visual workflow and applies one type of transformation.
Another popular and "modern" approach is to use SQL queries to define transformations. This makes sense because SQL is a powerful language known by many users, and it's also "declarative", meaning that users can define in a concise manner what they want to accomplish. The SQL queries are executed by the data warehouse to do the actual transformations of the data. Typically this means that data moves from one table to the next, until the final result is available in a set of golden tables. The most popular tool to implement pipelines using SQL queries is called "dbt".
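The pattern can be illustrated with a single declarative statement that materializes one table from another. Here sqlite3 stands in for the warehouse; in a real stack the same SQL would run inside Snowflake, BigQuery, etc., typically orchestrated by dbt.

```python
import sqlite3

con = sqlite3.connect(":memory:")  # sqlite3 as a stand-in warehouse
con.execute("CREATE TABLE raw_orders (customer TEXT, amount REAL)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)",
                [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)])

# One declarative SQL statement moves data from the raw table to the next table.
con.execute("""
    CREATE TABLE orders_by_customer AS
    SELECT customer, SUM(amount) AS total
    FROM raw_orders
    GROUP BY customer
""")
totals = dict(con.execute("SELECT customer, total FROM orders_by_customer"))
```

Note that the query states *what* the result should be (totals per customer), not *how* to compute it; the warehouse engine decides the execution plan.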
Data scientists will often use programming languages to transform the data, for example Python or R.
The third step in a classic ETL pipeline is to load the data into tables in a data warehouse. A well-known pattern for organizing the loaded data is the so-called star schema, which defines the structure of the tables and how information is organized in these tables.
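In a star schema, one central fact table holds the measurable events and references surrounding dimension tables by key. A minimal sketch (again using sqlite3 in place of a warehouse; table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Fact table in the middle, dimension tables around it: the "star".
con.executescript("""
    CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, day TEXT);
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer(customer_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        amount      REAL
    );
""")
con.execute("INSERT INTO dim_customer VALUES (1, 'Alice')")
con.execute("INSERT INTO dim_date VALUES (1, '2024-05-21')")
con.execute("INSERT INTO fact_sales VALUES (1, 1, 9.99)")

# Analytical questions are answered by joining the fact back to its dimensions.
row = con.execute("""
    SELECT c.name, d.day, f.amount
    FROM fact_sales f
    JOIN dim_customer c USING (customer_id)
    JOIN dim_date d USING (date_id)
""").fetchone()
```

Keeping descriptive attributes in dimensions and numeric measures in the fact table keeps the fact table narrow, which matters when it holds millions of rows.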
Funnily enough, ETL does not cover the entire data pipeline. An ETL pipeline stops at the data warehouse as its final destination. The data warehouse is a central location to store data, but the end goal is typically either a BI tool that creates dashboards (Qlik, Tableau, Power BI, Looker) or a machine learning model that uses the data from the data warehouse to, for example, make predictions.
More recently, companies have adopted an ELT approach, switching the L and the T. This means that the data is extracted and then loaded into a central repository, but the transformation takes place later, if and when it's necessary. The switch from ETL to ELT is a result of the explosion in the amount of data being generated. Since the transformation step is the most complex and most costly step, it makes sense to wait and only transform the data that is actually required at a certain point in time.
ELT is considered a more modern approach compared to ETL.