Hortonworks unveils roadmap to make Hadoop cloud-native
- by 7wData
It would be an understatement to say that the world has changed since Hadoop debuted just over a decade ago. Rewind the tape five to ten years, and if you wanted to work with Big Data, Hadoop was pretty much the only platform in town. Open source software was the icing on the cake of cheap compute and storage infrastructure that made processing and storing petabytes of data thinkable.
Since then, storage and compute have continued to get cheaper. So has bandwidth, as 10 GbE connections have supplanted the 1 GbE connections that were the norm a decade ago. The cloud, edge computing, smart devices, and the Internet of Things have changed the big data landscape, while dedicated Spark and AI services offer alternatives to firing up full Hadoop clusters. And to cap it off, as we previously noted, cloud storage has become the de facto data lake.
Today you can run Hadoop in the cloud, but Hadoop is not yet a platform that fully exploits the cloud's capabilities. Aside from slotting in S3 or other cloud storage in place of HDFS, it makes limited use of cloud architecture. Making Hadoop cloud-native is not a matter of buzzword compliance, but of making it more fleet-footed.
The need for Hadoop to get there stems not simply from competition from other bespoke big data cloud services, but from the inevitability of cloud deployment. In addition to cloud-based Hadoop services from the usual suspects, we estimate that about 25% of workloads from Hadoop incumbents -- Cloudera, Hortonworks, and MapR -- are currently running in the cloud. More importantly, we predict that by next year, half of all new big data workloads will be deployed in the cloud.
So what's it like to work with Hadoop in the cloud today? It can often take 20 minutes or more to provision a cluster with all the components. That runs counter to the expectation of being able to fire up a Spark or machine learning service within minutes -- or less. That is where containerization and microservices come in -- they can isolate workloads or entire clusters, making multi-tenancy real. And they can make it far more efficient to launch Hadoop workloads.
Another key concept for cloud operation is separating compute from storage. This flies in the face of Hadoop's original design pattern, where the idea was to bring compute to the data to minimize data movement. Today, the pipes have grown fat enough to make that almost a non-issue. As noted above, separating compute from storage is already standard practice with most managed cloud-based Hadoop services, although in EMR, Amazon does provide the option of running HDFS.
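To put rough numbers on the "fat pipes" argument, here is a back-of-the-envelope sketch in Python. It compares idealized transfer times over the 1 GbE and 10 GbE links mentioned earlier. The 1 TB dataset size and the `transfer_seconds` helper are illustrative assumptions, not figures from the vendors, and the calculation ignores protocol overhead, contention, and storage throughput.

```python
# Idealized transfer time over a network link.
# Assumption: the link is fully utilized, with no protocol overhead,
# contention, or storage bottleneck (real-world numbers will be worse).

def transfer_seconds(data_bytes: float, link_gbps: float) -> float:
    """Seconds to move data_bytes over a link of link_gbps gigabits per second."""
    bits = data_bytes * 8
    return bits / (link_gbps * 1e9)

ONE_TB = 1e12  # hypothetical dataset size: 1 terabyte (decimal)

for gbps in (1, 10):
    minutes = transfer_seconds(ONE_TB, gbps) / 60
    print(f"{gbps:>2} GbE: {minutes:.1f} minutes per TB")
```

Under these assumptions, moving a terabyte drops from roughly 133 minutes at 1 GbE to roughly 13 minutes at 10 GbE, which is why keeping compute next to the data matters less than it did a decade ago.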
We're still in the early days of making Hadoop container-friendly. MapR fired the first shot with its support of persistent containers in its platform, allowing you to isolate workloads to reduce contention for resources. Hadoop 3.1 in turn now lets you launch Docker containers from YARN. But while Kubernetes will inevitably be on Hadoop's roadmap, there is no timeline yet for when it will make it into the trunk.