The great data science hope: Machine learning can cure your terrible data hygiene

Will there ever be a technology that can fix decades of poor data hygiene? Probably not, but that isn't going to stop technology vendors from trying. The good news: Machine learning may come closest to saving your data management hide.

Data hygiene isn't easy. You can't hire enough interns to even come close to rectifying past mistakes. The reality is that enterprises haven't been creating data dictionaries, metadata and clean information for years. Sure, this data hygiene effort may have improved a bit, but let's get real: Humans aren't up for the job and never have been. ZDNet's Andrew Brust put it succinctly: Humans aren't meticulous enough. And without clean data, a data scientist can't create algorithms or a model for analytics.

Luckily, technology vendors have a magic elixir to sell you...again. The latest concept is to create an abstraction layer that can manage your data, bring analytics to the masses and use machine learning to make predictions and create business value. And the grand setup for this analytics nirvana is to use machine learning to do all the work that enterprises have neglected.

I know you've heard this before. The last magic box was the data lake where you'd throw in all of your information--structured and unstructured--and then use a Hadoop cluster and a few other technologies to make sense of it all. Before big data, the data warehouse was going to give you insights and solve all your problems along with business intelligence and enterprise resource planning. But without data hygiene in the first place, enterprises replicated a familiar but failed strategy: Poop in. Poop out. And you wouldn't want to make your in-demand data scientists deal with poo.

Seth Dobrin, chief data officer for IBM, said "the idea that you could use a data lake and Hadoop (MapReduce) instance where you can dump all this crap in is a mistake."
