Understanding the role of governance in data lakes and warehouses
- by 7wData
Data lakes and data warehouses are both used to store data. While they differ by nature and serve organizations differently, one universal thread runs through both, and without it they would be useless: data governance.
Data lakes are repositories of data that can be structured or unstructured and can contain traditional transaction-type data, phone logs – you name it! A data lake is truly a repository of all types of organizational data.
With data lakes, data can be brought in quickly, without complex provisioning, and no time is spent up front on how it relates to, or should interact with, other data sitting in the lake. The data should be kept as close to its raw form as possible so that it can be used in multiple functions and isn’t locked into a particular use. Because all data is available, a data lake allows for much deeper analytics.
Data lakes allow more flexibility for what-if analysis and modeling to identify relationships and likely outcomes that may not otherwise have been obvious, as with market basket analysis. With data scientists able to quickly access more information to identify such obscure relationships, companies can in turn use that information to serve customers better.
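To make the market basket idea concrete, here is a minimal sketch in Python. It assumes each transaction is simply a list of item names; the function name `pair_lift` and the sample baskets are hypothetical and purely illustrative. It computes the lift of each item pair, that is, how much more often two items appear together than would be expected if they were bought independently.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair that co-occurs at least once."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    lifts = {}
    for (a, b), count in pair_counts.items():
        support_ab = count / n  # share of baskets containing both items
        expected = (item_counts[a] / n) * (item_counts[b] / n)
        lifts[(a, b)] = support_ab / expected
    return lifts


# Hypothetical sample data, not taken from any real system.
baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["milk"],
    ["bread", "butter"],
]
lifts = pair_lift(baskets)
for pair, lift in sorted(lifts.items()):
    print(pair, round(lift, 2))
```

A lift above 1 suggests the two items are bought together more often than chance would predict; a lift below 1 suggests the opposite. Real analyses usually also filter on minimum support so that rare pairs do not dominate.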
At the same time, data lakes allow negative indicators to be identified, which helps protect the business by surfacing risks early so they can be mitigated.
A key example comes from the regulatory side. Probability of default is an important metric in regulatory reporting: models are built to calculate the probability of default for different classifications of customers (whether based on geographic location, credit limit, etc.).
With a wide range of factors used in the model, data lakes can provide access to more data more quickly, which can greatly increase the accuracy of the models. This in turn allows organizations to better serve their clients and gives them insight into possible risks early on so that they can be mitigated.
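As a rough illustration of the probability-of-default idea, the sketch below estimates a default rate per customer classification as the observed share of defaulted customers in each segment. This is a deliberate simplification: production models typically use statistical techniques with many factors, and the function name, segment labels, and records here are hypothetical.

```python
from collections import defaultdict


def default_rate_by_segment(records):
    """Return {segment: observed default rate} from (segment, defaulted) pairs."""
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for segment, defaulted in records:
        totals[segment] += 1
        if defaulted:
            defaults[segment] += 1
    return {segment: defaults[segment] / totals[segment] for segment in totals}


# Hypothetical records: (geographic segment, whether the customer defaulted).
records = [
    ("north", True), ("north", False), ("north", False), ("north", False),
    ("south", True), ("south", True), ("south", False), ("south", False),
]
rates = default_rate_by_segment(records)
for segment, rate in sorted(rates.items()):
    print(segment, rate)
```

The more raw history a data lake makes available per segment, the less each estimate depends on a handful of records, which is the sense in which broader data access can help such models.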
Data warehouses are structured data sets that include both current and historical data. They are structured to meet reporting or analytical requirements. Creating a “single source of truth” for multiple reporting and analytical requirements reduces the risk of inconsistent and inaccurate reporting across the enterprise.
Data warehouses bring data together in a structured way: the data is modeled and set up in physical structures according to a set of requirements, with performance and the capture of consistent data relationships as the key goals.