Hortonworks unveils roadmap to make Hadoop cloud-native
- by 7wData
It would be an understatement to say that the world has changed since Hadoop debuted just over a decade ago. Rewind the tape five to ten years: if you wanted to work with Big Data, Hadoop was pretty much the only game in town. Open source software was the icing on the cake of cheap compute and storage infrastructure, which together made processing and storing petabytes of data thinkable.
Since then, storage and compute have continued to get cheaper. So has bandwidth, as 10 GbE connections have supplanted the 1 GbE connections that were the norm a decade ago. The cloud, edge computing, smart devices, and the Internet of Things have changed the Big Data landscape, while dedicated Spark and AI services offer alternatives to firing up full Hadoop clusters. Capping it off, as we previously noted, cloud storage has become the de facto data lake.
Today you can run Hadoop in the cloud, but Hadoop does not yet fully exploit the capabilities of the cloud. Aside from slotting in S3 or other cloud storage in place of HDFS, it does not take full advantage of cloud architecture. Making Hadoop cloud-native is not a matter of buzzword compliance; it is about making the platform more fleet-footed.
The need for Hadoop to get there is attributable not simply to competition from bespoke big data cloud services, but to the inevitability of cloud deployment. In addition to cloud-based Hadoop services from the usual suspects, we estimate that about 25% of workloads from the Hadoop incumbents -- Cloudera, Hortonworks, and MapR -- are currently running in the cloud. More importantly, we predict that by next year half of all new big data workloads will be deployed in the cloud.
So what is it like to work with Hadoop in the cloud today? It can take 20 minutes or more to provision a cluster with all its components. That runs counter to the expectation of being able to fire up a Spark or machine learning service within minutes -- or less. That is where containerization and microservices come in: they can isolate workloads or entire clusters, making multi-tenancy real, and they can make launching Hadoop workloads far more efficient.
Another key concept for cloud operation is separating compute from storage. This actually flies in the face of Hadoop's original design pattern, where the idea was to bring compute to the data to minimize data movement. Today, the pipes have grown fat enough to make that almost a non-issue. As noted above, separating compute from storage is already standard practice in most managed cloud-based Hadoop services, although in EMR, Amazon does provide the option of running HDFS.
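The separation can be sketched in a few lines. The snippet below is purely illustrative (not a Hadoop or AWS API): a dict-backed `ObjectStore` class stands in for remote object storage such as S3, and a compute task pulls its input from the store rather than relying on data living on its local disks, so compute nodes can be added or torn down independently of where the data sits.

```python
# Illustrative sketch of compute/storage separation. "ObjectStore" is a
# stand-in for S3-style storage; it is not a real Hadoop or AWS class.

class ObjectStore:
    """Data lives here, independent of any compute node."""
    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def word_count(store, key):
    """A 'compute' task that can run anywhere: it fetches its input from
    the store (over the network, in a real deployment) instead of
    assuming data-local disks, as classic HDFS-era Hadoop did."""
    counts = {}
    for word in store.get(key).split():
        counts[word] = counts.get(word, 0) + 1
    return counts


store = ObjectStore()
store.put("logs/part-0000", "error warn error info")
print(word_count(store, "logs/part-0000"))  # {'error': 2, 'warn': 1, 'info': 1}
```

Because the task only needs a reference to the store, an elastic cluster can spin up many such workers against the same data and discard them afterwards, which is exactly what the decoupled cloud model enables.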
We're still in the early days of making Hadoop container-friendly. MapR fired the first shot with its support of persistent containers in its platform, allowing you to isolate workloads to reduce contention for resources. Hadoop 3.1 in turn now lets you launch Docker containers from YARN. But while Kubernetes will inevitably be on Hadoop's roadmap, there is no timeline yet for when it will make it into the trunk.
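As a rough illustration of what the YARN-side Docker support involves, the Hadoop 3.1 documentation describes enabling the Docker runtime on NodeManagers via `yarn-site.xml`; a minimal fragment looks something like the following (exact values vary by deployment, and the container-executor must also be configured separately):

```xml
<!-- yarn-site.xml fragment: allow YARN's Linux container runtime to
     launch Docker containers alongside the default runtime. -->
<property>
  <name>yarn.nodemanager.runtime.linux.allowed-runtimes</name>
  <value>default,docker</value>
</property>
```

Applications then opt in at submission time through environment variables such as `YARN_CONTAINER_RUNTIME_TYPE=docker` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE`, which tell YARN to run a given container inside the named Docker image rather than as a bare process.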