Data quality and cloud computing: What are the risks?

If I have one gripe about technology today, it’s the seemingly universal belief that it creates social and business issues for the first time. Sure, Twitter may be a relatively new invention, and only recently have people been fired for tweeting really inappropriate jokes. Still, if you think that losing your job for publicly saying something objectionable is new, think again.

And the same holds true with respect to data quality. Yes, contemporary cloud computing arrived fairly recently, although its roots in grid computing go back decades. Make no mistake, though: the notion that duplicate, erroneous, invalid and incomplete information harms a business is hardly new. It predates today’s rampant technology, big data and even the modern-day computer. Bum handwritten general-ledger entries caused problems centuries ago.

This raises the question: What are the data quality risks specific to cloud computing? In a nutshell, I see three.

In the traditional on-premises world, data is typically integrated and stored through a tightly controlled series of periodic ETL batch jobs. For the most part, that data is internal to the enterprise, although exceptions certainly exist, such as interfaces to banks, insurance carriers and the like. In an era of cloud computing, though, data flows much more frequently and quickly, usually via APIs, and much of it originates outside the enterprise. Examples include feeds from Twitter, LinkedIn and Google Maps.

With much less control over the data, organizations could certainly see “their” data quality take a hit.
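One way to claw back some of that control is to screen external records at the point where they enter your systems. What follows is a minimal sketch in Python, not a definitive implementation: the field names (`id`, `name`, `email`), the `screen_records` function and the `QualityReport` class are all hypothetical, and the email check is deliberately crude. It simply sorts incoming records into the familiar buckets of incomplete, invalid and duplicate data before anything is stored.

```python
from dataclasses import dataclass, field

# Hypothetical schema: the fields every incoming record is assumed to need.
REQUIRED_FIELDS = ("id", "name", "email")


@dataclass
class QualityReport:
    """Buckets for records arriving from an external feed."""
    accepted: list = field(default_factory=list)
    incomplete: list = field(default_factory=list)
    invalid: list = field(default_factory=list)
    duplicates: list = field(default_factory=list)


def screen_records(records):
    """Sort records into quality buckets before they are stored."""
    report = QualityReport()
    seen_ids = set()
    for record in records:
        # Incomplete: a required field is absent or empty.
        if any(record.get(name) in (None, "") for name in REQUIRED_FIELDS):
            report.incomplete.append(record)
        # Invalid: a crude, illustrative format check on the email field.
        elif "@" not in str(record["email"]):
            report.invalid.append(record)
        # Duplicate: an id that has already been accepted.
        elif record["id"] in seen_ids:
            report.duplicates.append(record)
        else:
            seen_ids.add(record["id"])
            report.accepted.append(record)
    return report


# Made-up sample data standing in for an external API feed.
feed = [
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 1, "name": "Ann", "email": "ann@example.com"},
    {"id": 2, "name": "", "email": "bob@example.com"},
    {"id": 3, "name": "Cy", "email": "not-an-email"},
]
report = screen_records(feed)
```

In a batch ETL job, a check like this would run once per load. With API feeds it has to run on every arriving record, which is part of why the shift to the cloud raises the stakes.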

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.