Backing Up Big Data? Chances Are You’re Doing It Wrong

Backing Up Big Data? Chances Are You’re Doing It Wrong

The increasing pervasiveness of social networking, multi-cloud applications and Internet of Things (IoT) devices and services continues to drive exponential growth in big data solutions. As businesses become more data driven and larger, more current data sets become important to support the online business processes, analytics, intelligence and decisions. Additionally, data availability and integrity become increasingly critical as more and more businesses and their partners rely on these (near) real-time analytics and insights to drive their business. These big data solutions typically are built upon a new class of hyper-scale, distributed, multi-cloud, data-centric applications.

While these NoSQL, semi-structured, highly distributed data stores are perfect for handling vast amounts of big data on a large number of systems, they can no longer be effectively supported by legacy data management and protection models. Not only based on the sheer data size and the vast number of storage and compute nodes, but also because of built-in data replication, data distribution, and data versioning capabilities – a different approach for backup and recovery is needed. Even though these next-generation data stores have integrated high availability and DR capabilities, events like logical data corruption, application defects, and/or simple user errors still require another level of recoverability.

To meet the requirements of these high-volume and real-time applications in a scale-out, cloud centric environments, a wave of new data stores and persistence models has emerged. Gone are the days of just files, objects and relational databases. The next-generation key-value stores, XML/JSON document stores, arbitrary width column stores and graph-databases (sometimes characterized as NoSQL stores) share several fundamental characteristics that enable the big data driven IT. Almost without exception, all big data repositories are based on a cloud-enabled, scale-out, distributed data persistence model that leverages commodity infrastructure while providing some form of integrated data replication, multi-cloud distribution and high-availability. The big data challenges aren’t limited to just the data ingest, data storage, data processing, data queries, result set capturing, visualization, but also pose increasing difficulties around data integrity, availability, recoverability, accessibility and mobility/movement. Let’s see how this plays out in a couple example case studies.

A first case study revolves around an Identity and Access Management service provider that uses Cassandra as its core persistence technology. The IDaaS (Identity as a service) is a multi-tenant service with a mixture of large enterprise, SMB and development customers and partners. The Cassandra database provides them with a highly scalable, distributed, high available data store that supports per tenant custom user and group profiles (i.e. read dynamic extensible schemas). While the data set may not be very large in absolute storage size, the number of records definitely will be in the 10’s, if not 100’s of millions.

What drives the unique requirements for recoverability is the multi-tenancy and the 100% availability targets of the service. Whether it is through user error, data integration defects and changes, or simply tenant migrations, it may be required to recover a single tenant’s data set without having to restore the whole Cassandra cluster (or replica thereof) in order to restore just one tenant instance. Similarly, the likelihood that the complete Cassandra cluster is corrupt is slim and in order to maintain (close to) 100% availability for most tenant service instances, partial recovery would be required. This drives the need for some level of application aware protection and recovery.

Share it:
Share it:

[Social9_Share class=”s9-widget-wrapper”]

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

You Might Be Interested In

Precision Medicine, Big Data Partnerships Will Enhance Treatment

29 Dec, 2016

The calendar year may be coming to its end, but the nation’s work on precision medicine is just getting started.  …

Read more

Why Cortana’s new boss is obsessed with artificial intelligence

15 Oct, 2016

Last week, Microsoft took the unusual step of placing its Cortana and Bing product teams inside the same organization as …

Read more

We Almost Gave Up On Building Artificial Brains

16 Oct, 2017

Today artificial neural networks are making art, writing speeches, identifying faces and even driving cars. It feels as if we’re …

Read more

Recent Jobs

IT Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Data Engineer

Washington D.C., DC, USA

1 May, 2024

Read More

Applications Developer

Washington D.C., DC, USA

1 May, 2024

Read More

D365 Business Analyst

South Bend, IN, USA

22 Apr, 2024

Read More

Do You Want to Share Your Story?

Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.

Get the 3 STEPS

To Drive Analytics Adoption
And manage change

3-steps-to-drive-analytics-adoption

Get Access to Event Discounts

Switch your 7wData account from Subscriber to Event Discount Member by clicking the button below and get access to event discounts. Learn & Grow together with us in a more profitable way!

Get Access to Event Discounts

Create a 7wData account and get access to event discounts. Learn & Grow together with us in a more profitable way!

Don't miss Out!

Stay in touch and receive in depth articles, guides, news & commentary of all things data.