Going with the stream: unbounded data processing with Apache Flink

by 7wData
February 8, 2017

Previously, we introduced streaming, saw some of the benefits it can bring and discussed some of the architectural options and vendors / engines that can support streaming-oriented solutions. We now focus on one of the key players in this space, Apache Flink, and the commercial entity that employs many of the Flink committers and provides Flink-related services, data Artisans (dA).

We talked with dA CEO and Flink PMC member, Kostas Tzoumas. Tzoumas, who has a solid engineering background and was one of the co-creators of Flink, was keen to elaborate on an array of topics: from the streaming paradigm itself and its significance for applications, to Flink's latest release and roadmap and dA's commercial offering and plans.

A few people, including Tzoumas, have made the case for seeing traditional, bounded data and processing as a special case of its unbounded counterparts. While this may seem like a theoretical construct, its implications can be far-reaching.

As Dean Wampler, author of Fast Data Architectures for Streaming Applications, argues, "if everything is considered a 'stream' -- either finite (as in batch processing) or unbounded -- then the same infrastructure doesn't just unify the batch and speed layers, but batch processing becomes a subset of stream processing."
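To make the "batch is a subset of streaming" idea concrete, here is a minimal sketch in plain Python (not Flink code). The function name `running_totals` and the sample inputs are made up for illustration. The same processing logic is applied to a finite list and to a generator that never ends, and it does not need to know which one it received.

```python
import itertools
from typing import Iterable, Iterator


def running_totals(events: Iterable[int]) -> Iterator[int]:
    """Emit a running sum, one result per incoming event."""
    total = 0
    for value in events:
        total += value
        yield total


# Bounded input ("batch"): the stream simply happens to end.
bounded = [3, 1, 4]
batch_result = list(running_totals(bounded))

# Unbounded input: a counter that never ends, so we only look at a prefix.
unbounded = itertools.count(start=1)
stream_prefix = list(itertools.islice(running_totals(unbounded), 5))

print(batch_result)   # results for the finite input
print(stream_prefix)  # first five results for the endless input
```

The only difference between the two runs is whether the input terminates; the processing code is identical.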

This is a paradigm shift, argues Tzoumas, as it means that the database is no longer the keeper of the global truth.

The global truth is in the stream: an always-on, immutable flow of data that is processed by an unbounded processing engine. State becomes a view on that unbounded data, specific to each application and kept locally utilizing whatever storage makes sense for the application.
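One way to picture "state as a view on the stream" is the following sketch in plain Python. It is an illustration under assumptions, not how Flink stores state: the `Order` type, the two application functions and the sample log are all hypothetical. Two applications read the same immutable sequence of events and each derives its own local state from it.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class Order(NamedTuple):
    customer: str
    amount_cents: int


def spend_per_customer(stream: Iterable[Order]) -> Dict[str, int]:
    """Application A: its local state is a running total per customer."""
    state: Dict[str, int] = defaultdict(int)
    for order in stream:
        state[order.customer] += order.amount_cents
    return dict(state)


def largest_order(stream: Iterable[Order]) -> Optional[Order]:
    """Application B: its local state is just the biggest order seen so far."""
    best: Optional[Order] = None
    for order in stream:
        if best is None or order.amount_cents > best.amount_cents:
            best = order
    return best


# The shared source of truth: an immutable sequence of events (a tuple here).
log = (Order("ann", 500), Order("bob", 1200), Order("ann", 250))

totals = spend_per_customer(log)
biggest = largest_order(log)
print(totals, biggest)
```

Neither application writes to a shared database; each view can be thrown away and rebuilt by replaying the same events.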

In this architecture, applications consume streams of unbounded data, but they also use streams to publish their own data that may in turn be consumed by other applications. So the streaming engine becomes the hub of the entire data ecosystem. According to Tzoumas:

"What most people think of when it comes to streaming is applications like real-time analytics or IoT. We do believe that these are super-important and we fully support them, however what unbounded processing has the potential to do is offer a new pathway for all applications whose nature fits the streaming model, and those go way beyond the typical examples one would think of. Basically, these are all applications that periodically update their data. They may not necessarily be real time -- they may have latency that goes into the hours range, but that's not the point. As long as there is an inflow of data, we see them as streaming data applications. These are operational, not analytics applications. So for me streaming is not in any way confined to the analytics world, or the Hadoop world for that matter."

Tzoumas further says that:

Tzoumas offers two reasons for this:

1) Time management. You can manage time correctly (by using event time and watermarks), so you can group records correctly, based on when an event occurred, and not just artificially, based on when an event was ingested or processed (which is very often wrong).

2) State management. By modeling your problem as a streaming problem, you can keep state across boundaries. So two events that arrive in different time intervals but still belong to the same logical group can still be correlated with each other.
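The two points above can be sketched together in plain Python. This is a simplified illustration, not Flink's API: the window size, the allowed lateness, the `Event` type and the function `event_time_windows` are all assumptions made for the example. Events are processed in arrival order but grouped by the time they occurred; a watermark (here, the highest event time seen minus an assumed maximum delay) decides when a window can be closed, and a dictionary of open windows plays the role of state that survives from one arrival to the next.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

WINDOW = 10     # tumbling window size in seconds (assumed)
MAX_DELAY = 5   # assumed bound on how late an event may arrive


class Event(NamedTuple):
    key: str
    event_time: int  # when the event occurred, not when it arrived
    value: int


def event_time_windows(
    events: Iterable[Event],
) -> Tuple[List[Tuple[int, str, int]], Dict[Tuple[int, str], int]]:
    """Sum values per (window, key) by event time.

    Returns the closed windows as (window_start, key, sum) and the
    windows still open when the input ran out.
    """
    open_windows: Dict[Tuple[int, str], int] = defaultdict(int)  # kept state
    closed: List[Tuple[int, str, int]] = []
    watermark = float("-inf")

    for e in events:
        start = (e.event_time // WINDOW) * WINDOW
        if start + WINDOW <= watermark:
            continue  # window already closed; this sketch drops such events
        open_windows[(start, e.key)] += e.value

        # Advance the watermark, then close every window that ends at or before it.
        watermark = max(watermark, e.event_time - MAX_DELAY)
        ready = sorted(k for k in open_windows if k[0] + WINDOW <= watermark)
        for w_start, key in ready:
            closed.append((w_start, key, open_windows.pop((w_start, key))))

    return closed, dict(open_windows)


# Arrival order differs from event order: the event at t=8 arrives after t=11.
arrivals = [
    Event("sensor-a", 2, 1),
    Event("sensor-a", 11, 1),
    Event("sensor-a", 8, 1),   # straggler that belongs to window [0, 10)
    Event("sensor-a", 16, 1),  # pushes the watermark to 11, closing [0, 10)
]

closed, still_open = event_time_windows(arrivals)
print(closed)
print(still_open)
```

The straggler at t=8 is counted in the window where it occurred, even though it arrived after an event from the next window. Grouping by arrival order would have split these two related events apart.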

This may sound compelling, but why go for Flink when there are so many alternatives out there?

Why would anyone choose Flink over Spark specifically, which enjoys wide popularity and vendor support? After all, if it's latency you're worried about, as Tom Reilly, Cloudera's CEO, put it, "we're able to offer sub-second responses and we don't hear any complaints from customers."

Spark has a really good community, but as a platform it has some fundamental problems when it comes to streaming, argues Tzoumas.

"It's not about sub-second responses, it's about how to approach a continuous application that needs to keep state.
