Busting 5 Myths about Data Lakes

by 7wData
June 23, 2017

The data lake is still quite new, so it's natural that a few myths and misunderstandings have proliferated across the data management community. To set the record straight, I'd like to bust five myths that I keep hearing. First, I need to define the data lake so that we're all on the same page.

A data lake is a user-defined method for organizing large volumes of highly diverse data. The data of a lake may be deployed on diverse data management platforms, including Hadoop clusters, relational databases, clouds, or a combination of these. Depending on the platform, a data lake may handle diverse data types, ranging from unstructured to semistructured to structured data. For most organizations, a data lake supports multiple use cases, including broad data exploration, advanced analytics, data warehouse extensions, and data landing and staging. Lakes may also serve specific departments (marketing, supply chain) or industries (healthcare, logistics).

Busted! Although it's true that any database can become a dumping ground, this isn't what successful early adopters are doing with their data lakes. Lake owners interviewed by TDWI say that a data lake is a balancing act. Some users are allowed to dump, but others are not. For example, data analysts, data scientists, and some power users need to create "sandboxes" of data in their work; they are allowed to bring data into and out of the lake freely as long as they govern themselves. Most other users must petition the lake steward or curator, who vets incoming data.

Myth #2: Data lakes are only for Internet firms.

Busted! Hadoop and the data lake were pioneered by Internet firms, and we owe them our thanks for those innovations. However, TDWI has found data lakes in production in several mainstream industries, including finance, insurance, telco, pharma, and healthcare. As noted above, some lakes serve departmental operations or analytics.
TDWI has also found multiple forms of analytics operating on lake data, including data and text mining, clustering, graph analytics, predictive analytics, and natural language processing. Lake-based analytics supports a number of analytics application types, including risk calculations, customer segmentation, and the detection of fraud, security breaches, and insider trading. Data lakes serve a wide range of enterprises, and each lake is typically multitenant because it serves multiple business units and use cases.

Busted! A recent TDWI survey showed that over half of data lakes in production are on Hadoop exclusively (53 percent).
