6 best practices on data governance for big data environments


In many organizations, data governance used to be relatively straightforward. The business data being governed was mainly generated internally in transaction processing systems and ensconced behind the firewall. Data analysis and reporting applications enabled by the governance program were the province of a select group of IT and BI professionals, who typically used slow-changing processes to analyze data and planned projects well in advance.

As a result, data governance efforts were often treated as a behind-the-scenes IT process.

“Governance was considered synonymous with a bureaucracy tax within traditional data environments to manage risk and drive multiyear data and analytics initiatives,” said Yasmeen Ahmad, vice president of global business analytics at data platform vendor Teradata.


The rise of low-cost storage and compute resources and access to more types of data changed all that, inspiring data scientists and business users throughout the enterprise to find new ways to analyze data for operational insights and a competitive edge. Data analytics became decentralized and more self-service, allowing businesses to move faster. “But with greater freedom to access and leverage data comes great responsibility,” Ahmad said.

The advent of big data analytics has increased that responsibility. Data governance for big data requires keeping pace with a much faster rate of change. With continuous incremental application updates and the addition of new data sources and analytics methods, data governance has gone from a one-time bureaucratic tax to an integral, and highly dynamic, component of big data projects.

Big data governance must track data access and usage across multiple platforms, monitor analytics applications for ethical issues and mitigate the risks of improper use of data. In a big data environment, it’s also important that data governance programs validate new data sources and ensure both data quality and data integrity. In addition, enterprises need to watch for ways that data from different sources could be combined into new data sets that violate privacy regulations.

Based on those needs, here are six best practices for managing and improving data governance for big data environments.

Big data isn’t just about large amounts of data; it’s also about different types of data and where the data is coming from. Cloud services, social media and mobile apps provide new sources of data to organizations for use in enterprise applications. Companies are also finding ways to democratize the use of this data in order to expand their analytics applications and make them more productive.

But the images, videos, tweets and tracking data that give companies a better understanding of their customers and other aspects of business operations also create a variety of governance challenges, said Ana Maloberti, a big data architect at IT consultancy Globant. For example, new data privacy laws like GDPR and the California Consumer Privacy Act add urgency to getting the governance of big data right. The challenges presented by new sources of data were there in the past, Maloberti added, “but nowadays all companies are scrutinized like never before, so a breach or policy violation could mean heavy fines and the loss of customer trust.”

New sources of data also introduce challenges for data quality and reliability, Maloberti said. The validations needed to keep a big data environment trustworthy require up-to-date technologies and monitoring tools. It’s also important to confer with the legal department on what policies and regulations need to be considered when adding new sources to a big data platform.

Data governance for big data must pay special attention to data quality, agreed Emily Washington, executive vice president of product management at Infogix, a vendor of data governance and management software.
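To make the idea of validating a new data source a little more concrete, here is a minimal Python sketch of the kind of basic quality check a governance program might run on an incoming batch. It is an illustration only, not a method described by the people quoted above: the function `validate_batch`, the `QualityReport` class, the field names and the 5% threshold are all hypothetical, and a real deployment would more likely rely on dedicated data quality and monitoring tools.

```python
from dataclasses import dataclass, field


@dataclass
class QualityReport:
    """Summary of basic quality checks for one batch (hypothetical structure)."""
    total: int = 0
    missing_required: int = 0
    duplicate_ids: int = 0
    issues: list = field(default_factory=list)

    @property
    def passed(self) -> bool:
        # A batch passes only if it is non-empty and no issue was recorded.
        return self.total > 0 and not self.issues


def validate_batch(records, required_fields, id_field, max_missing_rate=0.05):
    """Run simple completeness and uniqueness checks on a list of dict records.

    The threshold and field names are illustrative assumptions.
    """
    report = QualityReport(total=len(records))
    seen_ids = set()
    for record in records:
        # Completeness: a required field is absent, None, or an empty string.
        if any(record.get(name) in (None, "") for name in required_fields):
            report.missing_required += 1
        # Uniqueness: the same identifier appears more than once in the batch.
        record_id = record.get(id_field)
        if record_id is not None:
            if record_id in seen_ids:
                report.duplicate_ids += 1
            seen_ids.add(record_id)

    if report.total == 0:
        report.issues.append("batch is empty")
        return report
    missing_rate = report.missing_required / report.total
    if missing_rate > max_missing_rate:
        report.issues.append(
            f"missing-field rate {missing_rate:.0%} exceeds {max_missing_rate:.0%}"
        )
    if report.duplicate_ids:
        report.issues.append(
            f"{report.duplicate_ids} duplicate value(s) in '{id_field}'"
        )
    return report


# Example batch from a hypothetical mobile/web event feed.
batch = [
    {"event_id": "e1", "channel": "mobile", "event": "login"},
    {"event_id": "e2", "channel": "", "event": "purchase"},
    {"event_id": "e1", "channel": "web", "event": "login"},
]
report = validate_batch(
    batch, required_fields=["event_id", "channel"], id_field="event_id"
)
print(report.passed, report.issues)
```

A check like this covers only completeness and uniqueness. Questions such as whether a source may legally be used, or whether it can be combined with other data, still need the policy and legal review described above.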


Enjoyed this summary? Read the complete article at the source:

Continue at searchdatamanagement.techtarget.com →

Yves Mulkers

Yves Mulkers is the founder of 7wData and a widely followed voice in the data and AI community. He curates the 7wData and AI Beat newsletters, reaching hundreds of thousands of data and AI professionals, and writes on data strategy, analytics, AI, and the evolving data ecosystem.