Data Pipelines of Tomorrow

by 7wData
December 15, 2018

By the time humans got around to building systems that imported user data at fixed, regular intervals (e.g., banks running nightly uploads via ETL), they had also begun to see that input data could provide an effective feedback loop on the system itself. After all, data was by then not just a message but a key part of how the data pipeline, or the organization using it, would construct and harmonize itself.

In business systems, analytics data was also used to improve the process or product in question. Banking data, for instance, was fed back to consumers as account balance statements while also being used to optimize business processes: it was used to calculate incentive interest rates and fees for account holders automatically, and to show product owners which demographics preferred which financial products.

Nowadays, data and data pipelines are nearly ubiquitous. Data no longer flows only one way, from nightly batch ingestion into central data stores and out to user dashboards; it typically flows in both directions. Consumer devices may even have their own built-in data pipelines that provide input and feedback to the larger system. This polydirectional flow of data through such systems is just one of many factors causing the amount of data in the greater datasphere to grow exponentially.

Indeed, as IDC points out, the 16.1 zettabytes of data generated worldwide in 2016 is expected to grow tenfold to 163 zettabytes by 2025. Far from the days of nightly import cycles at the bank, users in this world will interact with a data-driven endpoint about once every 18 seconds on average.
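The headline numbers can be sanity-checked with a little arithmetic. This Python sketch assumes a simple compound-growth model between the two IDC endpoints; that model is our assumption for illustration, not IDC's methodology. It computes the growth factor, the implied annual growth rate, and what "one interaction every 18 seconds" means per day.

```python
# Back-of-the-envelope check of the IDC figures quoted above.
DATA_2016_ZB = 16.1   # zettabytes generated in 2016 (IDC, as quoted)
DATA_2025_ZB = 163.0  # zettabytes projected for 2025 (IDC, as quoted)
YEARS = 2025 - 2016   # nine years of growth

growth_factor = DATA_2025_ZB / DATA_2016_ZB
# Assumption: steady compound growth between the two endpoints.
cagr = growth_factor ** (1 / YEARS) - 1

SECONDS_PER_DAY = 24 * 60 * 60
interactions_per_day = SECONDS_PER_DAY / 18  # one interaction every 18 seconds

print(f"growth factor: {growth_factor:.1f}x")
print(f"implied annual growth: {cagr:.1%}")
print(f"interactions per user per day: {interactions_per_day:.0f}")
```

Under that assumption, "tenfold in nine years" works out to roughly 29% growth per year, and one interaction every 18 seconds is 4,800 interactions per day.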

To get a better sense of this future, we'll look at data and data pipelines from a few different perspectives: which direction the data of the future will flow, what data engineers can expect from distributed ledgers and blockchain technologies, and how regulatory compliance will work in a world built on the immutable, ordered event log. We'll also consider the requirements our future pipelines will have to meet, such as scalability, performance, and ease of design.

Today, data often flows in near real time and polydirectionally, is nearly ubiquitous for users, and can even help save lives. Consider the core-to-endpoint (also known as core-to-edge, or C2E) data pipeline.

In the past, data went into one end of a pipeline (often as a batch import) and came out the other end as analytics or a dashboard that helped a typically limited group of users understand it.

In a C2E model, data may flow polydirectionally: from many points of ingestion to central or edge data stores for processing, aggregation, or analytics, and then back out to endpoint devices or dashboards for further processing. The data can also serve as instructions or training data for downstream systems that run on AI (more on this later). There's no longer a one-size-fits-all data pipeline.
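To make the round trip concrete, here is a minimal Python sketch of a C2E loop. Everything in it is hypothetical: `EdgeDevice`, `Core`, and the use of a simple mean as the "instruction" sent back to the edge are illustrative assumptions, not a real framework. Data flows from the edge to the core, is aggregated there, and the result flows back out to change what each edge device does locally.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class EdgeDevice:
    """Hypothetical endpoint that ingests data locally and accepts feedback from the core."""
    name: str
    readings: list[float] = field(default_factory=list)
    threshold: Optional[float] = None  # instruction pushed back from the core

    def ingest(self, value: float) -> None:
        self.readings.append(value)

    def flagged(self) -> list[float]:
        # Once the core has sent a threshold, the edge can act on it locally.
        if self.threshold is None:
            return []
        return [r for r in self.readings if r > self.threshold]


@dataclass
class Core:
    """Hypothetical central store that aggregates edge data and sends results back out."""
    store: dict[str, list[float]] = field(default_factory=dict)

    def collect(self, device: EdgeDevice) -> None:
        # Edge -> core: copy the device's readings into the central store.
        self.store[device.name] = list(device.readings)

    def push_feedback(self, devices: list[EdgeDevice]) -> float:
        # Core -> edge: aggregate across all devices, then send the result back.
        all_values = [v for values in self.store.values() for v in values]
        threshold = mean(all_values)
        for device in devices:
            device.threshold = threshold
        return threshold


devices = [EdgeDevice("sensor-a"), EdgeDevice("sensor-b")]
for value in (1.0, 2.0, 9.0):
    devices[0].ingest(value)
for value in (3.0, 5.0):
    devices[1].ingest(value)

core = Core()
for d in devices:
    core.collect(d)
threshold = core.push_feedback(devices)
print(threshold, devices[0].flagged(), devices[1].flagged())
```

In a real deployment the "instruction" might be a trained model or a configuration update rather than a single number, but the shape of the loop (ingest at the edge, aggregate at the core, push back out) is the same.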

Where can we see examples of C2E pipelines?

Order is critical for transactions on the blockchain. If events were written in arbitrary order, it would be impossible to reconstruct the state of your data at any given point in time, or to determine who did what to whom, and when, in a given transaction.
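Order matters because current state is derived by replaying events in sequence. The Python sketch below is a hypothetical, single-node illustration, not a blockchain implementation: `EventLog`, its SHA-256 hash chaining, and the `account`/`delta` fields are our assumptions. It shows how an ordered, append-only log lets you rebuild state at any sequence number and detect when events have been reordered.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


@dataclass(frozen=True)
class Event:
    seq: int        # position in the log; defines the order
    actor: str      # who did it
    account: str    # to whom / what it was done
    delta: int      # amount added to (or removed from) the account
    prev_hash: str  # digest of the previous event, chaining the log together
    digest: str     # SHA-256 over this event's fields plus prev_hash


def _digest(seq: int, actor: str, account: str, delta: int, prev_hash: str) -> str:
    payload = json.dumps([seq, actor, account, delta, prev_hash])
    return hashlib.sha256(payload.encode()).hexdigest()


class EventLog:
    """Hypothetical append-only log: events are appended in order, never edited."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def append(self, actor: str, account: str, delta: int) -> Event:
        seq = len(self.events)
        prev_hash = self.events[-1].digest if self.events else GENESIS
        event = Event(seq, actor, account, delta, prev_hash,
                      _digest(seq, actor, account, delta, prev_hash))
        self.events.append(event)
        return event

    def state_at(self, seq: int) -> dict[str, int]:
        # Replay events 0..seq in order to rebuild balances at that point in time.
        balances: dict[str, int] = {}
        for e in self.events[: seq + 1]:
            balances[e.account] = balances.get(e.account, 0) + e.delta
        return balances

    def verify(self) -> bool:
        # Recompute the chain; any reordering or edit breaks it.
        prev_hash = GENESIS
        for i, e in enumerate(self.events):
            if e.seq != i or e.prev_hash != prev_hash:
                return False
            if _digest(i, e.actor, e.account, e.delta, prev_hash) != e.digest:
                return False
            prev_hash = e.digest
        return True


log = EventLog()
log.append("alice", "acct-1", 100)
log.append("bob", "acct-1", -30)
log.append("alice", "acct-2", 50)

tampered = EventLog()
tampered.events = [log.events[1], log.events[0], log.events[2]]  # swap two events

print(log.state_at(1), log.verify(), tampered.verify())
```

Because each event commits to the digest of the one before it, swapping two events (as in `tampered`) is detected, and `state_at` answers "what did the balances look like after event N?" by a deterministic replay.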

However, whenever data is partitioned and distributed across a network, one must consider the CAP theorem: when a network partition occurs, a distributed system cannot guarantee both consistency and availability, so its users may need to tune the tradeoff between the two, especially at scale. For this reason, we expect to see more users implementing their distributed ledgers as tunably consistent distributed databases.
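One common way databases expose tunable consistency is quorum replication, used for example in Dynamo-style stores such as Apache Cassandra: with N replicas, a write must be acknowledged by W of them and a read consults R of them, and in this simple model choosing R + W > N guarantees that every read overlaps the latest write. The Python sketch below is a hypothetical in-memory simulation of that rule, not any particular database's API; `QuorumStore` and its deliberately worst-case choice of which replicas are reached are our assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    version: int = 0
    value: Optional[str] = None


class QuorumStore:
    """Hypothetical in-memory store with N replicas, write quorum W, read quorum R."""

    def __init__(self, n: int, w: int, r: int) -> None:
        if not (1 <= w <= n and 1 <= r <= n):
            raise ValueError("quorum sizes must be between 1 and n")
        self.replicas = [Replica() for _ in range(n)]
        self.w, self.r = w, r
        self.version = 0

    def write(self, value: str) -> None:
        # Worst case: the write is acknowledged only by the FIRST w replicas.
        self.version += 1
        for replica in self.replicas[: self.w]:
            replica.version, replica.value = self.version, value

    def read(self) -> Optional[str]:
        # Worst case: the read reaches only the LAST r replicas; newest version wins.
        contacted = self.replicas[-self.r :]
        return max(contacted, key=lambda rep: rep.version).value

    @property
    def overlapping(self) -> bool:
        # R + W > N forces every read set to share at least one replica with every write set.
        return self.r + self.w > len(self.replicas)


strong = QuorumStore(n=3, w=2, r=2)  # favors consistency
strong.write("v1")
strong.write("v2")

fast = QuorumStore(n=3, w=1, r=1)    # needs fewer replicas to respond
fast.write("v1")

print(strong.read(), strong.overlapping)
print(fast.read(), fast.overlapping)
```

With `n=3, w=2, r=2` the read returns the latest value even in this worst case; with `w=1, r=1` it can return a stale (here, empty) value, trading consistency for needing fewer replicas to be reachable.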

Today, data is commonly moved from one data store or location to another using a publish-subscribe (pub-sub) architecture.
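The core idea of pub-sub is that producers publish messages to named topics without knowing who consumes them, and every subscriber to a topic receives a copy. The minimal Python sketch below assumes a single-process, synchronous broker; `Broker` and the `orders` topic are hypothetical, and production systems (Apache Kafka, for example) add durability, partitioning, and ordering guarantees that this sketch omits.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class Broker:
    """Hypothetical in-memory broker: publishers and subscribers share only topic names."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        # Deliver to every subscriber of the topic; return how many received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


warehouse: list[dict] = []  # stands in for a downstream data store
dashboard: list[str] = []   # stands in for a user-facing view

broker = Broker()
broker.subscribe("orders", warehouse.append)
broker.subscribe("orders", lambda msg: dashboard.append(f"order {msg['id']}"))

delivered = broker.publish("orders", {"id": 1, "amount": 25})
unrouted = broker.publish("refunds", {"id": 7})  # no subscribers yet
print(delivered, unrouted, warehouse, dashboard)
```

The producer of `orders` never references the warehouse or the dashboard directly, which is what lets new consumers be added without changing upstream code.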
