Data Pipelines of Tomorrow

By the time humans got around to creating systems that imported user data at regular, fixed intervals (e.g., banks with nightly uploads over ETL), they also began to see the potential for input data to provide an effective feedback loop on the system itself. By then, after all, data was not just a message but a key part of how the data pipeline (or the organization using it) would construct and harmonize itself.

In business systems, analytics data was also used to improve the process or product in question. Banking data, for instance, was fed back to consumers as account balance statements while also being used to optimize the business itself: it drove the automatic calculation of incentive interest rates and fees for account holders, and showed product owners which demographics preferred which financial products.

Nowadays, data (and data pipelines) are practically ubiquitous: data no longer flows merely from nightly batch ingestion into central data stores and out to user dashboards; it typically flows in both directions. Consumer devices may even have their own data pipelines built in, providing input and feedback to the larger system. This polydirectionality of data flowing through such systems is just one of many factors causing the amount of data in the greater datasphere to grow exponentially.

Indeed, as IDC points out, the 16.1 zettabytes of user data generated around the world in 2016 is expected to grow tenfold to 163 zettabytes by 2025. Far from the days of nightly import cycles at the bank, users in this world will be interacting with a data-driven endpoint on average once every 18 seconds.

To get a better sense of this future, we'll look at data, and data pipelines, from a few different perspectives: which direction the data of the future will flow, what data engineers can expect from distributed ledgers and blockchain technologies, and how regulatory compliance will work in a future built on the immutable, ordered event log. We'll also consider requirements for our future pipelines, such as scalability, performance, and design(ability).

Today, data often moves in near real time, runs polydirectionally, is effectively ubiquitous to users, and can even help save lives. Consider the core-to-endpoint (also known as core-to-edge, or C2E) data pipeline.

In the past, a data pipeline was a one-way affair: data went in one end (often as a batch import) and came out the other in the form of analytics or a dashboard that helped an (often fairly limited) group of users understand the data.

In a C2E model, data may run polydirectionally: from many points of ingestion back to central or edge data stores for processing, aggregation, or analytics, and then back out to endpoint devices or dashboards for further processing. The data can also serve as instructions or training data for downstream systems that run on AI (more on this later). There's no one-size-fits-all for data pipelines anymore.
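
As a toy illustration, here's a minimal Python sketch of such a loop; the `CoreAggregator` and `EdgeDevice` classes and the sensor scenario are hypothetical stand-ins, not any vendor's API. Readings flow from the edge into the core, the core aggregates them, and per-device feedback flows back out for further local processing.

```python
# Minimal sketch of a core-to-edge (C2E) feedback loop.
# All names (CoreAggregator, EdgeDevice, the sensor scenario) are illustrative.
from statistics import mean


class CoreAggregator:
    """Central store: ingests edge readings, pushes feedback back out."""

    def __init__(self):
        self.readings = {}  # device_id -> latest reading

    def ingest(self, device_id, value):
        self.readings[device_id] = value

    def feedback(self):
        """Aggregate all readings and return a per-device correction."""
        fleet_avg = mean(self.readings.values())
        return {dev: value - fleet_avg for dev, value in self.readings.items()}


class EdgeDevice:
    """Endpoint that both produces data and consumes feedback."""

    def __init__(self, device_id, reading):
        self.device_id = device_id
        self.reading = reading

    def report(self, core):
        core.ingest(self.device_id, self.reading)

    def apply(self, correction):
        # Local processing of the core's feedback: nudge toward the fleet average.
        self.reading -= correction / 2


core = CoreAggregator()
devices = [EdgeDevice("sensor-a", 21.0), EdgeDevice("sensor-b", 23.5)]

for device in devices:            # edge -> core (ingestion)
    device.report(core)

corrections = core.feedback()     # core: aggregation
for device in devices:            # core -> edge (feedback)
    device.apply(corrections[device.device_id])
```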

Where can we see examples of C2E pipelines?

Order is critical for transactions on the blockchain. As you might expect, if events were written in an arbitrary order, it would be impossible to reconstruct the state of your data at any point in time, or to tell who did what to whom, and when, in a given transaction.
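
A minimal event-sourcing sketch in Python makes this concrete (the accounts and events are illustrative): because every event carries its position in a total order, folding the log up to any sequence number recovers the state at exactly that point in time.

```python
# Replaying an ordered event log to reconstruct state at a point in time.
# Log entries are illustrative; each is (sequence_number, account, delta).
log = [
    (1, "alice", +100),  # deposit
    (2, "alice", -30),   # withdrawal
    (3, "bob", +30),     # transfer credited to bob
]


def state_at(log, seq):
    """Fold the log, in order, up to and including sequence number `seq`."""
    balances = {}
    for n, account, delta in sorted(log):  # the total order is what makes this valid
        if n > seq:
            break
        balances[account] = balances.get(account, 0) + delta
    return balances


print(state_at(log, 1))  # {'alice': 100}
print(state_at(log, 3))  # {'alice': 70, 'bob': 30}
```

Shuffle or drop the sequence numbers and a question like "what was alice's balance after event 2?" no longer has a well-defined answer.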

However, whenever data is partitioned and distributed across a network, one must consider the CAP theorem: under a network partition, such a system cannot offer both full data consistency and full availability, so users at scale need to tune the tradeoff between the two. For this reason, we expect to see more users implementing their distributed ledgers as tunably consistent distributed databases.
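
One common way that tuning surfaces is Dynamo-style quorum arithmetic: with N replicas, a write acknowledged by W of them and a read consulting R of them stays strongly consistent whenever R + W > N, since every read set then overlaps every write set. A quick sketch (the replica counts are illustrative):

```python
def is_strongly_consistent(n, r, w):
    """Quorum rule: every read overlaps the latest write iff R + W > N."""
    return r + w > n


N = 3  # replicas per partition

# Tuned for consistency: quorum reads and quorum writes always overlap.
print(is_strongly_consistent(N, r=2, w=2))  # True

# Tuned for availability/latency: fast, but reads may return stale data.
print(is_strongly_consistent(N, r=1, w=1))  # False
```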

Currently, a publish-subscribe (pub-sub) architecture is typically what moves data from one datastore or location to another.
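
To make that shape concrete, here's a minimal in-memory pub-sub sketch in Python; it's a stand-in for a real broker such as Kafka or Pulsar, not an implementation of either. Producers and consumers are decoupled by a topic, which is what lets a single event fan out to multiple downstream datastores.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory pub-sub broker: a topic decouples producers from consumers."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Fan out: every subscriber on the topic receives its own copy.
        for callback in self.subscribers[topic]:
            callback(message)


broker = Broker()
broker.subscribe("orders", lambda m: print("analytics store got:", m))
broker.subscribe("orders", lambda m: print("archive store got:", m))
broker.publish("orders", {"id": 42, "amount": 19.99})
```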
