Five often-overlooked Hadoop, Big Data analytics project killers

by 7wData
September 7, 2016

When you’re getting ready to perform analytics on a data set, attention often focuses on the software you’re going to use to analyze the data and create your reports. Often, companies think about how they are going to store data and build visualizations for one project and one instance. However, to truly achieve maturity in your big data analytics projects, you have to think about the big picture. You must think through all the criteria that lead to success, both today and in the future.

Overlooked One: How will I get and manage the data?

In organizations where data management is immature, users and business units tend to hoard data. Business users often mistakenly believe that if you own the data, you own the power. As IT professionals, we should move the organization toward data sharing: the enemy is not within, it is your data-savvy competitors. IT can help by introducing technologies that make it easy to democratize the data. By supporting technologies like Kafka, IT can set up a publish-and-subscribe infrastructure for the data to help break up the data fiefdoms.
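To make the publish-and-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is only an illustration of the pattern, not Kafka's API: the `TopicBroker` class, the `orders` topic and the two consuming teams are all hypothetical names invented for this example. The point it tries to show is that a producer publishes once to a shared topic and any team can subscribe, including a team that arrives late and replays what was already published.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Record = dict
Handler = Callable[[Record], None]


class TopicBroker:
    """Toy in-memory broker (hypothetical); illustrates the pattern only."""

    def __init__(self) -> None:
        # topic name -> append-only list of every record published so far
        self._logs: DefaultDict[str, List[Record]] = defaultdict(list)
        # topic name -> handlers to call for each newly published record
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler, from_beginning: bool = True) -> None:
        # A late subscriber can replay the retained records first,
        # so no team depends on another team to hand over the data.
        if from_beginning:
            for record in self._logs[topic]:
                handler(record)
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: Record) -> None:
        # The producer writes once; every subscriber receives the record.
        self._logs[topic].append(record)
        for handler in self._subscribers[topic]:
            handler(record)


broker = TopicBroker()
finance_view: List[Record] = []
marketing_view: List[Record] = []

# Finance subscribes before any data arrives.
broker.subscribe("orders", finance_view.append)
broker.publish("orders", {"order_id": 1, "amount": 120.0})
broker.publish("orders", {"order_id": 2, "amount": 75.5})

# Marketing subscribes later and still sees the full history.
broker.subscribe("orders", marketing_view.append)
```

In this sketch both lists end up holding the same two records. A real deployment would add durability, partitioning and access control, which this toy version leaves out on purpose.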

At HPE, we of course support Kafka in our HPE Vertica platform. In addition, we’re working on the data democratization problem by supporting Hadoop file formats such as ORC, Parquet, JSON and others, so that data can be loaded into the analytics platform and anyone can be a data consumer. The high performance of our analytics database is not only about speeds and feeds; it’s also about giving more end users the capability to leverage the data. We rely on a strong partner network for ETL and data curation, including Informatica, Talend, Syncsort, Tamr and Pentaho, to name just a few.

Overlooked Two: Am I running the right hardware for the task?

Corporations often have banks of IT infrastructure that they can draw upon. HPE has sold a ton of ProLiant DL380p servers over the years, offering a solid foundation for most IT tasks and a very predictable plan for power usage, management and operations. However, you should consider that different workloads in your project may have different requirements for compute, storage and latency. For example, in the Hadoop world, ETL jobs may require lots of storage and the fastest network connection to deliver performance, while BI dashboards rely on fast CPUs and lots of memory to perform well. By thinking through how the hardware is going to be used, you can optimize and save.

This is really what HPE is accomplishing with our recent announcements of big data reference architectures for Vertica SQL on Hadoop. We have been working with the open source community on reference architectures for both the ProLiant and Apollo platforms that can be optimized for the task. For example, if you need to turn up Hadoop compute resources, you can adjust some settings in YARN and get them. If you need to turn up storage performance, adjust the YARN labels and go.
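As a rough mental model of label-based placement, the Python sketch below tags each node with a label and lets each workload ask for the label it needs. It does not call YARN and does not reproduce YARN's configuration; the node names, the `compute` and `storage` labels and the two workloads are assumptions made up for illustration, loosely following the ETL versus BI dashboard contrast described above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    name: str
    label: str  # hypothetical label describing what the node is optimized for


@dataclass(frozen=True)
class Workload:
    name: str
    required_label: str  # the kind of node this workload should run on


def eligible_nodes(workload: Workload, nodes: List[Node]) -> List[str]:
    """Return the names of nodes whose label matches the workload's request."""
    return [node.name for node in nodes if node.label == workload.required_label]


# Hypothetical cluster: two compute-optimized nodes, one storage-optimized node.
cluster = [
    Node("node-a", "compute"),
    Node("node-b", "compute"),
    Node("node-c", "storage"),
]

# ETL leans on storage; dashboards lean on CPU and memory.
etl = Workload("nightly-etl", "storage")
dashboards = Workload("bi-dashboards", "compute")

placement: Dict[str, List[str]] = {
    w.name: eligible_nodes(w, cluster) for w in (etl, dashboards)
}
```

Relabeling a node in this sketch changes which workloads can land on it without touching the workloads themselves, which is the appeal of tuning by label rather than by rebuilding the cluster.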

Overlooked Three: Is it scalable and elastic?

The big data reference architectures also help when you start to get killed by your own success. Project managers should consider what to do if the project is a wild success and you get more data, more users and more queries.
