Five Essential Capabilities: Elastic Data Processing & Storage

by 7wData
May 4, 2019

Over the last 20 years of doing business, we have seen a number of analytical data storage and query concepts fall in and out of favor. Throughout this time, a wave of digital transformation in business has dramatically increased the volume of collected data. Machine learning and other probabilistic methods benefit greatly from the law of large numbers, so if it was not already clear, all that talk about “big data” has really been about the analytics it enables. As a result, today’s knowledge workers are predisposed to data hoarding: they prefer to save everything, including data with no known use case, since its future value to the organization may yet be discovered.

We have intentionally avoided describing this capability as either a data lake or a data warehouse, as we have observed that both these terms are highly subjective and can carry vastly different connotations from business to business. This concept is less about how you choose to model the data, and more about how elasticity (often characterized as “the cloud”) fundamentally changes the economics of data storage and processing in ways that will reduce the time and cost to solve familiar problems, as well as dramatically lower the risks associated with experimentation and innovation. Let’s explore the key elements of data processing and storage elasticity that make it such a valuable capability for the competitive data-driven enterprise.

Separate Storage and Compute
Traditional relational database management systems (RDBMS) and warehouse appliances couple data storage to data processing power. If you wish to store more data, you must also purchase additional processing capacity, and because these platforms cannot be scaled dynamically, you must make these purchases in large stepwise increments (e.g. software licenses, blades, or racks) that will almost certainly remain underutilized. Modern analytical data processing technologies such as Snowflake, Amazon Athena, Hadoop and Spark are designed with separate storage and compute in mind. These newer models allow you to keep your data in object or distributed storage such as Amazon S3 or HDFS, and to query or process it where it sits using a wide variety of fit-for-purpose engines. With this key point of differentiation from traditional RDBMS, platform owners can lower their cost of queryable storage from quoted highs of around $10,000 per terabyte per year for an appliance to as little as $40 on an elastic platform.
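To make the economics concrete, here is a minimal sketch of the arithmetic, using the two per-terabyte figures quoted above. The 50 TB purchase increment and the 120 TB data volume are hypothetical values chosen purely for illustration; real appliance pricing and sizing will differ.

```python
import math

# Figures quoted in the article (USD per terabyte per year).
APPLIANCE_COST_PER_TB_YEAR = 10_000
ELASTIC_COST_PER_TB_YEAR = 40


def appliance_cost(data_tb, increment_tb=50):
    """Yearly cost when capacity must be bought in fixed increments.

    increment_tb is a hypothetical purchase unit (for example, one rack).
    """
    # Round the required capacity up to the next whole increment.
    purchased_tb = math.ceil(data_tb / increment_tb) * increment_tb
    return purchased_tb * APPLIANCE_COST_PER_TB_YEAR


def elastic_cost(data_tb):
    """Yearly cost when you pay only for the terabytes actually stored."""
    return data_tb * ELASTIC_COST_PER_TB_YEAR


if __name__ == "__main__":
    data_tb = 120  # hypothetical data volume
    print("appliance:", appliance_cost(data_tb))
    print("elastic:  ", elastic_cost(data_tb))
```

Note that the appliance figure is driven by purchased capacity rather than stored data, which is where the underutilization described above comes from.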

Massive Parallelism and Dynamic Scaling
Modern analytic data platforms are designed with parallelism in mind. They allow the platform owner or analyst to dynamically increase or decrease the resources available, which in turn can speed up queries and reduce the time (and the additional hardware and software) required for both data transformation tasks and machine learning model training. A natively parallel architecture also allows these systems to scale up or down seamlessly, without the costly data replication and clunky overhead of classic “database clustering” paradigms.
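The partition-and-combine idea behind this kind of parallelism can be sketched on a single machine. The example below is only an analogy, not how any particular platform is implemented: it uses Python's standard-library thread pool, and the function names are made up for illustration. The point is that the worker count is just a parameter, and changing it does not change the answer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(values, partition_count):
    """Split a list into roughly equal, contiguous partitions."""
    size = max(1, -(-len(values) // partition_count))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_total(values, workers):
    """Sum each partition on its own worker, then combine the partial sums."""
    partitions = split_into_partitions(values, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, partitions))
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(1, 101))
    # "Scaling" here is only a change to the workers argument.
    print(parallel_total(data, workers=2))
    print(parallel_total(data, workers=8))
```

In CPython, threads will not speed up CPU-bound work like this; a real platform spreads the partitions across many machines. The shape of the computation is the same, though: split, process independently, combine.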

Business Agility & Innovation
Again, pay-per-use models prevail here: they can be leveraged to manage expenses, and they also encourage data experimentation and innovation. For example, a project team could quickly provision an Amazon Elastic MapReduce (EMR) cluster to run an experimental data processing job on data already stored in Amazon S3.
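A rough way to see why pay-per-use lowers the risk of experimentation is to compare a short-lived cluster with one left running all month. The sketch below assumes a flat hourly price per node; the 0.25 USD rate, the node count, and the job duration are all hypothetical and are not actual EMR prices.

```python
def transient_cluster_cost(nodes, hours, hourly_rate_per_node):
    """Cost of a cluster that exists only for the duration of the job."""
    return nodes * hours * hourly_rate_per_node


def always_on_cluster_cost(nodes, hourly_rate_per_node, days=30):
    """Cost of keeping the same cluster running around the clock."""
    return nodes * 24 * days * hourly_rate_per_node


if __name__ == "__main__":
    rate = 0.25  # hypothetical USD per node-hour
    print("3-hour experiment:", transient_cluster_cost(10, 3, rate))
    print("30 days always on:", always_on_cluster_cost(10, rate))
```

Under these assumed numbers, a failed experiment costs a few dollars rather than a month of idle capacity, which is the incentive the paragraph above describes.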
