Backing Up Big Data? Chances Are You’re Doing It Wrong

by 7wData
November 27, 2017

The increasing pervasiveness of social networking, multi-cloud applications, and Internet of Things (IoT) devices and services continues to drive exponential growth in big data solutions. As businesses become more data-driven, larger and more current data sets become important to support online business processes, analytics, intelligence and decisions. Data availability and integrity also become increasingly critical as more businesses and their partners rely on these (near) real-time analytics and insights to drive their business. These big data solutions are typically built on a new class of hyper-scale, distributed, multi-cloud, data-centric applications.

While these NoSQL, semi-structured, highly distributed data stores are well suited to handling vast amounts of big data across a large number of systems, they can no longer be effectively supported by legacy data management and protection models. A different approach to backup and recovery is needed, not only because of the sheer data size and the vast number of storage and compute nodes, but also because of the built-in data replication, data distribution and data versioning capabilities. Even though these next-generation data stores have integrated high availability and disaster recovery (DR) capabilities, events such as logical data corruption, application defects and simple user errors still require another level of recoverability.
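To see why built-in replication does not by itself provide recoverability, consider the toy model below. It is a minimal sketch, not the API of any real data store: the `ReplicatedStore` class, its `snapshot` and `restore` methods, and the sample key are hypothetical names introduced purely for illustration. The point it shows is that an erroneous delete is faithfully copied to every replica, so only an independent point-in-time copy can bring the data back.

```python
import copy


class ReplicatedStore:
    """Toy key-value store that applies every write to all replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, key, value):
        for replica in self.replicas:
            replica[key] = value

    def delete(self, key):
        # A delete is replicated exactly like any other write.
        for replica in self.replicas:
            replica.pop(key, None)

    def get(self, key):
        return self.replicas[0].get(key)

    def snapshot(self):
        # Independent point-in-time copy, kept outside the replica set.
        return copy.deepcopy(self.replicas[0])

    def restore(self, backup):
        self.replicas = [copy.deepcopy(backup) for _ in self.replicas]


store = ReplicatedStore()
store.put("user:1", {"name": "alice"})
backup = store.snapshot()

# Simulated user error: the delete reaches every replica.
store.delete("user:1")
lost_everywhere = all("user:1" not in replica for replica in store.replicas)

# Only the separate backup can recover the record.
store.restore(backup)
recovered = store.get("user:1")
```

Replication here protects against losing a node, but it offers nothing against a mistake that the system carries out correctly on all nodes.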

To meet the requirements of these high-volume, real-time applications in scale-out, cloud-centric environments, a wave of new data stores and persistence models has emerged. Gone are the days of just files, objects and relational databases. The next-generation key-value stores, XML/JSON document stores, arbitrary-width column stores and graph databases (sometimes characterized as NoSQL stores) share several fundamental characteristics that enable big-data-driven IT. Almost without exception, big data repositories are based on a cloud-enabled, scale-out, distributed data persistence model that leverages commodity infrastructure while providing some form of integrated data replication, multi-cloud distribution and high availability. The big data challenges are not limited to data ingest, storage, processing, queries, result-set capture and visualization; they also pose increasing difficulties around data integrity, availability, recoverability, accessibility and mobility. Let’s see how this plays out in a couple of example case studies.

The first case study revolves around an Identity and Access Management service provider that uses Cassandra as its core persistence technology. The IDaaS (Identity as a Service) offering is a multi-tenant service with a mixture of large enterprise, SMB and development customers and partners. The Cassandra database provides a highly scalable, distributed, highly available data store that supports custom per-tenant user and group profiles (that is, dynamic, extensible schemas). While the data set may not be very large in absolute storage size, the number of records will be in the tens, if not hundreds, of millions.

What drives the unique requirements for recoverability is the multi-tenancy and the 100% availability target of the service. Whether through user error, data integration defects and changes, or simply tenant migrations, it may be necessary to recover a single tenant’s data set without restoring the whole Cassandra cluster (or a replica thereof). Similarly, the likelihood that the complete Cassandra cluster is corrupt is slim, so maintaining (close to) 100% availability for most tenant service instances requires partial recovery. This drives the need for some level of application-aware protection and recovery.
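The tenant-level recovery described above can be sketched in a few lines. This is an illustration under stated assumptions rather than how the provider or Cassandra actually does it: rows are modeled as plain Python dictionaries, every row is assumed to carry a `tenant_id` field, and the `restore_tenant` function and the sample tenants are hypothetical. A real deployment would work against cluster snapshots rather than in-memory lists.

```python
def restore_tenant(live_rows, backup_rows, tenant_id):
    """Return a new row list in which only `tenant_id` is rolled back to the backup."""
    # Rows of every other tenant are kept exactly as they are now.
    untouched = [row for row in live_rows if row["tenant_id"] != tenant_id]
    # Rows of the affected tenant are taken from the backup instead.
    restored = [dict(row) for row in backup_rows if row["tenant_id"] == tenant_id]
    return untouched + restored


# State captured at backup time (hypothetical sample data).
backup_rows = [
    {"tenant_id": "acme", "user": "alice", "groups": ["admin"]},
    {"tenant_id": "acme", "user": "bob", "groups": ["dev"]},
    {"tenant_id": "globex", "user": "carol", "groups": ["ops"]},
]

# Current state: a faulty integration removed bob and emptied alice's groups
# for tenant "acme", while tenant "globex" legitimately added dave.
live_rows = [
    {"tenant_id": "acme", "user": "alice", "groups": []},
    {"tenant_id": "globex", "user": "carol", "groups": ["ops"]},
    {"tenant_id": "globex", "user": "dave", "groups": ["ops"]},
]

result = restore_tenant(live_rows, backup_rows, "acme")
acme_users = sorted(row["user"] for row in result if row["tenant_id"] == "acme")
globex_users = sorted(row["user"] for row in result if row["tenant_id"] == "globex")
```

Restoring the whole data set from the backup would have repaired "acme" but silently dropped the new "globex" user, which is why the protection layer needs to understand tenant boundaries.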
