Five Essential Capabilities: Elastic Data Processing & Storage

Over the last 20 years of doing business, we have seen a number of analytical data storage and query concepts fall in and out of favor. Throughout this time, a wave of digital transformation in business has dramatically increased the volume of collected data. Machine learning and other probabilistic methods benefit greatly from the law of large numbers, so if it wasn’t already clear, all that talk about “big data” has really been about the analytics it enables. As a result, today’s knowledge workers are predisposed to data hoarding: they prefer to save everything, including data with no known use case, since its future value to the organization may yet be discovered.

We have intentionally avoided describing this capability as either a data lake or a data warehouse, as we have observed that both of these terms are highly subjective and can carry vastly different connotations from business to business. This concept is less about how you choose to model the data and more about how elasticity (often characterized as “the cloud”) fundamentally changes the economics of data storage and processing. Elasticity reduces the time and cost to solve familiar problems and dramatically lowers the risks associated with experimentation and innovation. Let’s explore the key elements of data processing and storage elasticity that make it such a valuable capability for the competitive data-driven enterprise.

Separate Storage and Compute
Traditional relational database management systems (RDBMS) and warehouse appliances couple data storage with data processing power. If you wish to store more data, you must also purchase additional processing capacity, and because these platforms cannot be scaled dynamically, you must make these purchases in large stepwise increments (e.g. software licenses, blades, or racks) that will almost certainly remain underutilized. Modern analytical data processing technologies such as Snowflake, Amazon Athena, Hadoop and Spark are designed with separate storage and compute in mind. These newer models allow you to keep your data in object storage such as Amazon S3, or in a distributed file system such as HDFS, and query or process it where it sits using a wide variety of fit-for-purpose engines. With this key point of differentiation from traditional RDBMS, platform owners can lower their cost of queryable storage from quoted highs of around $10,000 per terabyte per year for an appliance to as little as $40 per terabyte per year on an elastic platform.
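To make the stepwise-purchase point concrete, here is a rough sketch of the arithmetic in Python. The two per-terabyte prices are the figures quoted above. The 50 TB increment size and the function names are hypothetical, chosen only for illustration. Applying the appliance price to every purchased terabyte is also an assumption, and the elastic figure covers storage only, not the compute billed when queries run.

```python
import math

# Figures quoted in the article (USD per terabyte per year).
APPLIANCE_COST_PER_TB_YEAR = 10_000
ELASTIC_COST_PER_TB_YEAR = 40


def purchased_capacity_tb(data_tb, increment_tb=50):
    """Capacity that must be bought when it only comes in whole increments.

    increment_tb is a hypothetical step size (think: one more rack).
    """
    return math.ceil(data_tb / increment_tb) * increment_tb


def appliance_annual_cost(data_tb, increment_tb=50):
    """Assumes the quoted price applies to every purchased TB, used or not."""
    return purchased_capacity_tb(data_tb, increment_tb) * APPLIANCE_COST_PER_TB_YEAR


def elastic_annual_cost(data_tb):
    """Pay only for the terabytes actually stored (storage cost only)."""
    return data_tb * ELASTIC_COST_PER_TB_YEAR


def appliance_utilization(data_tb, increment_tb=50):
    """Fraction of the purchased capacity that actually holds data."""
    return data_tb / purchased_capacity_tb(data_tb, increment_tb)


if __name__ == "__main__":
    data_tb = 60  # hypothetical data volume
    print(appliance_annual_cost(data_tb))   # 1000000 (100 TB purchased)
    print(elastic_annual_cost(data_tb))     # 2400
    print(appliance_utilization(data_tb))   # 0.6
```

Under these assumptions, 60 TB of data forces the purchase of 100 TB of appliance capacity, 40% of which sits idle, while the elastic platform bills only for the 60 TB in use.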

Massive Parallelism and Dynamic Scaling
Modern analytic data platforms are designed with parallelism in mind. They allow the platform owner or analyst to dynamically increase or decrease the resources available to a workload, which in turn can speed up queries and reduce the time (and the additional hardware and software) required for both data transformation tasks and machine learning model training. A natively parallel architecture also allows these systems to scale up or down seamlessly, without the costly data replication and clunky overhead of classic “database clustering” paradigms.
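The basic shape of this kind of parallelism (partition the data, process partitions independently, combine the partial results) can be sketched with Python's standard library. This is a toy illustration, not how any particular platform is implemented: the function names are hypothetical, and threads in CPython will not actually speed up CPU-bound work. The point is only that the worker count is a parameter that can change between runs without changing the data or the answer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions independent lists."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """Work done by one worker on its own partition."""
    return sum(partition)


def parallel_total(rows, num_workers):
    """Aggregate rows using num_workers workers, then combine the partials."""
    partitions = split_into_partitions(rows, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    return sum(partials)


if __name__ == "__main__":
    rows = list(range(1, 1001))
    # "Scale" from 2 workers to 8: same data, same result, no reload.
    print(parallel_total(rows, num_workers=2))  # 500500
    print(parallel_total(rows, num_workers=8))  # 500500
```

Because each partition is processed independently, adding or removing workers changes only how the work is divided, which is what lets an elastic platform resize without reorganizing the stored data.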

Business Agility & Innovation
Here too, pay-per-use models prevail: they can be used to manage expenses, and they also encourage data experimentation and innovation. For example, a project team could quickly provision an Amazon EMR (Elastic MapReduce) cluster to run an experimental data processing job on data already stored in Amazon S3.
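As a rough sketch of what provisioning such a cluster might look like, the Python below builds a request for the EMR `run_job_flow` API and submits it with boto3. Everything specific here is an assumption made for illustration: the cluster name, release label, instance types, node count, region, S3 paths, and the use of the default EMR IAM roles. Setting `KeepJobFlowAliveWhenNoSteps` to `False` asks EMR to terminate the cluster once the step finishes, which is what keeps an experiment pay-per-use.

```python
def build_experiment_cluster_request(script_s3_uri, log_s3_uri, core_nodes=2):
    """Build keyword arguments for EMR's run_job_flow call.

    All names, sizes, and versions below are illustrative assumptions.
    """
    return {
        "Name": "experimental-job",          # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",        # assumed EMR release
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": core_nodes},
            ],
            # Shut the cluster down when the step list is exhausted.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "run-experiment",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_uri],
            },
        }],
        # Assumes the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def submit(request, region="us-east-1"):
    """Submit the request; needs boto3 installed and AWS credentials configured."""
    import boto3  # third-party; imported here so the builder runs without it
    emr = boto3.client("emr", region_name=region)
    return emr.run_job_flow(**request)["JobFlowId"]


if __name__ == "__main__":
    # Hypothetical bucket and script paths.
    request = build_experiment_cluster_request(
        "s3://example-bucket/jobs/experiment.py",
        "s3://example-bucket/emr-logs/",
    )
    print(request["Instances"]["InstanceGroups"][1]["InstanceCount"])  # 2
```

Calling `submit(request)` would start a real, billable cluster, so the sketch keeps building the request separate from sending it.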
