Five often-overlooked Hadoop, Big Data analytics project killers


When you’re getting ready to perform analytics on a data set, attention often gets focused on the software you’re going to use to analyze the data and create your reports. Companies frequently think only about how they are going to store data and build visualizations for one project and one instance. However, to truly achieve maturity in your big data analytics projects, you have to think about the big picture. You must think through all the criteria that can make them successful, both today and in the future.

Overlooked One: How will I get and manage the data?

In organizations where data management is immature, users and business units tend to hoard data. Business users often mistakenly believe that if you own the data, you own the power. As IT professionals, we should move the organization toward data sharing; the enemy is not within, it is your data-savvy competitors. IT can help by introducing technologies that make it easy to democratize the data. By supporting technologies like Kafka, IT can set up a publish-and-subscribe infrastructure for the data that helps break up the data fiefdoms.
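To see why publish-and-subscribe breaks fiefdoms, consider the pattern itself: producers publish records to a named topic, and any number of consumers subscribe to that topic without the producer knowing or approving them. The toy sketch below illustrates the pattern in plain Python (it is not Kafka; Kafka implements this same idea durably and at scale, with topics partitioned across brokers). The topic name and record fields are invented for illustration.

```python
from collections import defaultdict

class Bus:
    """Toy in-process publish/subscribe bus. Producers publish to named
    topics; consumers subscribe to topics without knowing the producers."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Any team can register a handler for a topic -- no gatekeeper.
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Every subscriber gets its own copy of the record.
        for handler in self._subscribers[topic]:
            handler(message)

bus = Bus()
seen_by_finance, seen_by_marketing = [], []

# Two business units subscribe to the same data stream independently.
bus.subscribe("sales", seen_by_finance.append)
bus.subscribe("sales", seen_by_marketing.append)

# The producing system publishes once; both consumers receive the record.
bus.publish("sales", {"region": "EMEA", "amount": 1200})
```

Because consumers attach themselves to the topic rather than negotiating with the data owner, adding a new data consumer costs the producer nothing, which is the organizational point.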

At HPE, of course we support Kafka in our HPE Vertica platform. In addition, we’re working on the data democratization problem by doing things like supporting Hadoop file formats such as ORC, Parquet, JSON and others, so that data can be loaded into the analytics platform and anyone can be a data consumer. The high performance of our analytics database is not only about speeds and feeds; it’s also about giving more end users the ability to leverage the data. We rely on a strong partner network for ETL and data curation, including Informatica, Talend, SyncSort, Tamr and Pentaho, to name just a few.
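As a concrete sketch of what consuming those Hadoop file formats looks like, Vertica’s `COPY` statement can load ORC or Parquet files directly by naming the format as the parser. The table definition and HDFS paths below are hypothetical placeholders, not a tested deployment:

```sql
-- Hypothetical table matching the columns stored in the Parquet files.
CREATE TABLE sales (region VARCHAR(20), amount INT);

-- Load Parquet data produced elsewhere in the Hadoop cluster.
COPY sales FROM 'hdfs:///data/sales/*.parquet' PARQUET;

-- The same pattern works for ORC files.
COPY sales FROM 'hdfs:///data/sales_archive/*.orc' ORC;
```

The point is that data written once by a Hadoop job becomes queryable by any SQL user, without a bespoke export step per consumer.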

Overlooked Two: Am I running the right hardware for the task?

Corporations often have banks of IT infrastructure they can draw upon. HPE has sold a great many ProLiant DL380p servers over the years, offering a solid foundation for most IT tasks and a very predictable plan for power usage, management and operations. However, you should consider that different workloads in your project may have different requirements for compute, storage and latency. For example, in the Hadoop world, ETL jobs may require lots of storage and the fastest network connection to deliver performance, while BI dashboards rely on fast CPUs and lots of memory to perform better. By thinking through how the hardware is going to be used, you can optimize and save.

This is really what HPE is accomplishing with our recent announcements of big data reference architectures for Vertica SQL on Hadoop. We have been working with the open source community on reference architectures for both the ProLiant and Apollo platforms that can be optimized for the task. For example, if you need more Hadoop compute resources, you can adjust a few settings in YARN and get them. If you need more storage performance, adjust the YARN node labels and go.
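The YARN node-label mechanism mentioned above works roughly as follows: you enable labels in `yarn-site.xml`, define labels in the ResourceManager, tag specific nodes with them, and then grant queues access so that jobs land on hardware matched to their workload. The label names, hostnames and queue names below are illustrative assumptions, not values from any HPE reference architecture:

```
# yarn-site.xml: enable node labels and give them a store location.
#   yarn.node-labels.enabled = true
#   yarn.node-labels.fs-store.root-dir = hdfs:///yarn/node-labels

# Define labels for the two hardware profiles (names are hypothetical).
yarn rmadmin -addToClusterNodeLabels "fast-cpu,big-storage"

# Tag nodes: BI-oriented boxes get fast-cpu, ETL boxes get big-storage.
yarn rmadmin -replaceLabelsOnNode "bi-node1.example.com=fast-cpu"
yarn rmadmin -replaceLabelsOnNode "etl-node1.example.com=big-storage"

# capacity-scheduler.xml: let each queue use its matching label, e.g.
#   yarn.scheduler.capacity.root.bi.accessible-node-labels = fast-cpu
```

Shifting capacity between workload types then becomes a matter of re-labeling nodes rather than re-racking hardware, which is what “adjust the YARN labels and go” amounts to.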

Overlooked Three: Is it scalable and elastic?

The big data reference architectures also help when you start to get killed by your own success. Project managers should consider what to do if the project is a wild success and brings more data, more users and more queries.
