Rethinking Data Marts in the Cloud

by 7wData
April 16, 2018

Many of us are all too familiar with the traditional way enterprises operate when it comes to on-premises data warehousing and data marts: the enterprise data warehouse (EDW) is often the center of the universe. Frequently, the EDW is treated a bit like Fort Knox; it’s a protected resource, with strict regulations and access rules. This setup translates into long lead times to get new data sets into an EDW (weeks, if not months). It also makes exploratory analysis on large data sets impractical, because an EDW is an expensive platform whose computational processing is shared and prioritized across all users. The friction associated with getting a data sandbox has also resulted in the proliferation of spreadmarts, unmanaged data marts, and other data extracts used for siloed data analysis. The good news is that these restrictions can be lifted in the public cloud.

Business intelligence (BI) and analytics in the cloud is an area that has gained the attention of many organizations looking to provide a better user experience for their data analysts and engineers. The reason frequently cited for the consideration of BI in the cloud is that it provides flexibility and scalability. Organizations find they have much more agility with analytics in the cloud and can operate at a lower cost point than has been possible with legacy on-premises solutions.

Flexibility, scalability, and lower cost make for a powerful one-two-three punch.

Because the cloud offers the ability to decouple storage and compute, all of an organization’s data can now live in a single place, thus eliminating data silos, while departments and teams provision compute to run analytics for their own use cases as needed. This new arrangement means self-service BI and analytics are a reality for those who adopt such a model. And with an open architecture, there are no worries about technology lock-in.
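To make the decoupling concrete, here is a minimal sketch in Python. It is an illustration only: `SharedStore` and `Compute` are hypothetical in-memory stand-ins, not a real cloud API, and the sample data is invented. The point is that two teams size their compute independently while reading the same single copy of the data.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for cloud object storage: one copy of the data."""
    objects: dict = field(default_factory=dict)

    def put(self, key, rows):
        self.objects[key] = list(rows)

    def get(self, key):
        return self.objects[key]


@dataclass
class Compute:
    """Hypothetical stand-in for a team's cluster; it holds no data itself."""
    team: str
    workers: int

    def run(self, store, key, fn):
        # Compute reads from the shared store at query time.
        return fn(store.get(key))


# Invented sample data, stored once for every team.
store = SharedStore()
store.put("sales/2018", [
    {"region": "EU", "amount": 120},
    {"region": "US", "amount": 200},
    {"region": "EU", "amount": 80},
])

# Each team provisions compute sized for its own workload.
finance = Compute(team="finance", workers=4)
marketing = Compute(team="marketing", workers=2)

total = finance.run(store, "sales/2018",
                    lambda rows: sum(r["amount"] for r in rows))
eu_orders = marketing.run(store, "sales/2018",
                          lambda rows: sum(1 for r in rows if r["region"] == "EU"))
```

Neither team copies the data into its own mart, so there is no second extract to drift out of sync.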

Now that we’ve discussed the technology options for BI in the cloud, what should an organization consider?

Generally speaking, there are two common use cases for BI and analytics in the cloud that map to the two main architecture patterns: long-lived clusters and short-lived (or transient) clusters. Let’s discuss each in more detail.

Often, data analysts, data scientists, and data engineers want to investigate new and potentially interesting data sets with as little friction as possible. It’s quite common for data sets to originate in the cloud, so storing and analyzing them there is a no-brainer. As a first step, such data sets can easily be brought into cloud object storage, such as Amazon S3 or Azure Data Lake Store (ADLS), in their raw form.

Next, a cluster can easily be provisioned with the instance type and configuration of choice, including potentially using spot instances to reduce cost. Generally, instances for transient clusters need only minimal local disk space, since data processing runs directly on the data in the cloud storage. There are tools, like Cloudera Director, that can assist with the instance provisioning and software deployment, making it as easy as a few clicks to provision and launch a cluster. Once the cluster is ready, data exploration can take place, allowing the data analyst to perform an analysis. If a new data set will be created as part of the work, it can be saved back to the cloud storage.
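The transient-cluster workflow described above (provision, explore, save results back, tear down) can be sketched as follows. This is a simulation under stated assumptions, not how Cloudera Director or any cloud provider's API works: `transient_cluster`, the `"hypothetical.large"` instance type, the storage keys, and the sample rows are all invented for illustration, and the "cluster" is just a dictionary.

```python
from contextlib import contextmanager

# Hypothetical stand-in for cloud object storage holding the raw data set.
cloud_storage = {
    "raw/events": [
        {"user": "a", "clicks": 3},
        {"user": "b", "clicks": 5},
        {"user": "a", "clicks": 2},
    ]
}

lifecycle_log = []


@contextmanager
def transient_cluster(instance_type, workers, use_spot=True):
    """Provision a simulated cluster and always tear it down afterwards."""
    cluster = {
        "instance_type": instance_type,
        "workers": workers,
        "spot": use_spot,
        "state": "running",
    }
    lifecycle_log.append("provisioned")
    try:
        yield cluster
    finally:
        # Teardown runs even if the analysis fails, so nothing keeps billing.
        cluster["state"] = "terminated"
        lifecycle_log.append("terminated")


def clicks_per_user(rows):
    """Example exploratory analysis: total clicks per user."""
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0) + row["clicks"]
    return totals


with transient_cluster("hypothetical.large", workers=3) as cluster:
    # Processing runs directly against data in cloud storage.
    result = clicks_per_user(cloud_storage["raw/events"])
    # The derived data set is saved back to cloud storage, not local disk.
    cloud_storage["derived/clicks_per_user"] = result
```

Because both the input and the output live in cloud storage, the derived data set outlives the cluster, which is why the instances themselves need little local disk.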
