Rethinking Data Marts in the Cloud

Rethinking Data Marts in the Cloud

Many of us are all too familiar with the traditional way enterprises operate when it comes to on-premises data warehousing and data marts: the enterprise data warehouse (EDW) is often the center of the universe. Frequently, the EDW is treated a bit like Fort Knox; it’s a protected resource, with strict regulations and access rules. This setup translates into lengthy times to get new data sets into an EDW (weeks, if not months) as well as the inability to do exploratory analysis on large data sets because an EDW is an expensive platform and computational processing is shared and prioritized across all users. Friction associated with getting a data sandbox has also resulted in the proliferation of spreadmarts, unmanaged data marts, or other data extracts used for siloed data analysis. The good news is these restrictions can be lifted in the public cloud.

Business intelligence (BI) and analytics in the cloud is an area that has gained the attention of many organizations looking to provide a better user experience for their data analysts and engineers. The reason frequently cited for the consideration of BI in the cloud is that it provides flexibility and scalability. Organizations find they have much more agility with analytics in the cloud and can operate at a lower cost point than has been possible with legacy on-premises solutions.

These capabilities make for an amazing one-two-three punch.

Because the cloud offers the ability to decouple storage and compute, all of an Organization’s data can now live in a single place, thus eliminating data silos, and departments and teams can provision computes to run analytics for their use cases as needed. This new arrangement means self-service BI and analytics are a reality for those who adopt such a model. And with an open architecture, there are no worries about technology lock-ins.

Now that we’ve discussed what technology options there are for BI in the cloud, what are the considerations an Organization should think about?

Generally speaking, there are two common use cases for BI and analytics in the cloud that map to the two main architecture patterns: long-lived clusters and short-lived (or transient) clusters. Let’s discuss each in more detail.

Often, data analysts, data scientists, and data engineers want to investigate new and potentially interesting data sets, but would like to avoid as much friction as possible in doing so. It’s quite common for data sets to originate in the cloud, so storing and analyzing them in the cloud is a no-brainer. Such data sets can easily be brought into S3 or ADLS in their raw form as a first step.

Next, a cluster can easily be provisioned with the instance type and configuration of choice, including potentially using spot instances to reduce cost. Generally, instances for transient clusters need only minimal local disk space, since data processing runs directly on the data in the cloud storage. There are tools, like Cloudera Director, that can assist with the instance provisioning and software deployment, making it as easy as a few clicks to provision and launch a cluster. Once the cluster is ready, data exploration can take place, allowing the data analyst to perform an analysis. If a new data set will be created as part of the work, it can be saved back to the cloud storage.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Aerospike, ThoughtSpot, Alteryx and AI-inspired integration

18 Mar, 2019

Partnerships are nothing new in the analytics world, and neither are integrations between technologies. But this week has seen a …

Read more

What Hiring Managers Need to Understand About Data Scientists

2 May, 2016

Michael Li recently wrote inEntrepreneur.com, “AtThe Data Incubator, I’ve spoken to hundreds of employers looking to hire data scientists — …

Read more

How to Build the Organisation of Tomorrow?

23 Oct, 2018

A question that I get asked a lot is how should organisations navigate today’s digital transformation? An interesting question that …

Read more

Recent Jobs

Senior Cloud Engineer (AWS, Snowflake)

Remote (United States (Nationwide))

9 May, 2024

Read More

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.