Data mesh: it’s not just about tech, it’s about ownership and communication
- by 7wData
As companies continue to digitize key areas of their business, they collect more and more data about their own processes and about their customers. Consequently, they want to use this data to drive fact-based decision making in order to better serve their customers’ needs. In some industries, the level of data-drivenness, i.e. how quickly a company can make decisions based on data instead of gut feeling, has already become the deciding competitive advantage.
In traditional business intelligence (BI), a centrally maintained data warehouse is the basis for many business decisions, e.g. by providing up-to-date reports that support those decisions. As big data technology has matured and data science has grown in popularity, many companies have invested in building a central data lake, sometimes to replace the data warehouse but more often in addition to it. The main difference between the two approaches is when curation and modeling happen: with the data warehouse, data is already transformed at ingestion time to fit a certain application;
One of those patterns that I’ve seen again and again is that of an overwhelmed and stressed-out central “data team”. This team maintains the central data infrastructure, be it the data warehouse or the data lake. More importantly, however, this team is solely responsible for delivering data sets or reports to stakeholders, product teams and data scientists in a reliable and timely fashion. I am consciously calling this a data team, and not more specifically a data engineering or a data insights team, because the name reflects the unclear mix of responsibilities that this team often deals with.
Consequently, the members of this data team often find themselves in a tight spot. They spend a lot of their time “firefighting” and fixing issues that have been introduced by data-producing teams, while bearing the brunt of frustration from data-consuming stakeholders. That is particularly sad, because those team members are often the most data-savvy individuals in the company, and
Now why can’t such capable engineers solve this situation? The reason is that the problem is not a technological but an organizational one. One of the main issues is an unfortunate distribution of responsibilities among the parties involved.
One party, the data producers, has the domain expertise, i.e. they understand the meaning of the data and they can directly change the way the data is shaped; another party, the data consumers, has the vested interest in the data, understands its business potential and can therefore clearly describe requirements about,
Instead, as a goal state, data producers and data consumers should be working together as closely as possible. From an organizational perspective, the ideal situation is when the same team both produces and consumes the same data, so that interest, responsibility, and ability are combined in one team. In practice, this is often not feasible, as a data-producing team already owns so many responsibilities in its particular domain that it cannot fully own a data-consuming application too. Thus, splitting those roles into two teams that communicate directly, without middlemen, is already a big step forward. The goal of a data-producing team should be to provide its data in such a way that others can get value out of that data without requiring detailed domain knowledge, i.e. data producers should hide “implementation details.” Such a data-producing team can, of course, also be in a data-consuming position at the same time. There are consumer-oriented data domains that are complex enough to justify a whole team of domain experts, and those teams in turn consume data from a source-aligned data domain.
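To make the idea of hiding “implementation details” a bit more concrete, here is a minimal sketch in Python, assuming a hypothetical orders domain. All names in it (`InternalOrderRow`, `PublishedOrder`, `publish`, the numeric status codes, the euro amounts) are invented for illustration and do not come from any particular system; it shows one possible shape of such an interface, not a prescribed one.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical internal record, shaped by the producing team's own system.
@dataclass(frozen=True)
class InternalOrderRow:
    ord_id: int
    cust_ref: str
    amt_cents: int
    status_code: int  # assumed internal encoding: 1 = open, 2 = shipped, 3 = cancelled
    created: date


# Hypothetical published record: the only shape consumers are meant to rely on.
@dataclass(frozen=True)
class PublishedOrder:
    order_id: int
    customer_id: str
    amount_eur: float
    status: str
    order_date: date


# Domain knowledge that stays with the producing team.
STATUS_LABELS = {1: "open", 2: "shipped", 3: "cancelled"}


def publish(row: InternalOrderRow) -> PublishedOrder:
    """Translate the internal shape into the consumer-facing one."""
    return PublishedOrder(
        order_id=row.ord_id,
        customer_id=row.cust_ref,
        amount_eur=row.amt_cents / 100,
        status=STATUS_LABELS[row.status_code],
        order_date=row.created,
    )
```

In this sketch, consumers never need to know that the status is stored as an integer or that amounts are kept in cents. The producing team can change either detail internally, as long as `publish` keeps returning the same published shape.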