In the Middle of DatA Integration Is AI
- by 7wData
The connection between data integration and Artificial Intelligence is growing.
Have you never noticed that acronym hiding in plain sight in the midst of data integration? Artificial Intelligence (AI) can be like that -- it creeps up on you unnoticed until suddenly it's all over the place.
How Are Data Integration and AI Related?
One early example of this connection appears in a 2013 research paper: "Data Curation at Scale: The Data Tamer System" by Michael Stonebraker and others. Although the paper calls it curation, the topic is largely the same as data preparation, data integration, data unification, ETL, MDM, DWA, or whatever you choose to call it. In essence, it is the set of processes needed between the multiple, inchoate data sources of a modern business and any cohesive system claiming to deliver consistent insights from them.
According to the authors, "At ... scale, data curation cannot be a manual (human) effort, but must entail Machine Learning approaches with a human assist only when necessary." This MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) research was productized in 2014 as Tamr.
Combining machine learning (AI) with human assistance makes sense in the context of data integration. Training an unsupervised AI system requires enormous amounts of data, even where the approach is technically appropriate. Supervised learning, in which humans tag the training set, is often more effective when training data is limited. In data integration, where data volumes are comparatively modest, the likely scenario is AI training followed by human audit and correction of the results.
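To make the "human assist only when necessary" idea concrete, here is a minimal sketch of one common pattern for record matching, assuming records are Python dicts. It is not the Data Tamer or Tamr algorithm. String similarity from the standard library's `difflib` stands in for a real model, `fit_threshold` is a hypothetical one-parameter classifier fitted to human-labeled pairs, and the `margin` value is an illustrative assumption. Confident pairs are decided automatically; only the uncertain band is routed to a person.

```python
from difflib import SequenceMatcher


def similarity(a: dict, b: dict, fields: list[str]) -> float:
    """Average case-insensitive string similarity (0..1) across the chosen fields."""
    scores = [
        SequenceMatcher(None, str(a.get(f, "")).lower(), str(b.get(f, "")).lower()).ratio()
        for f in fields
    ]
    return sum(scores) / len(scores)


def fit_threshold(labeled: list[tuple[float, bool]]) -> float:
    """Hypothetical 'training' step: choose the cutoff that best separates
    human-labeled (score, is_match) examples."""
    best_t, best_correct = 0.5, -1
    for t, _ in labeled:  # try each observed score as a candidate cutoff
        correct = sum((score >= t) == is_match for score, is_match in labeled)
        if correct > best_correct:
            best_t, best_correct = t, correct
    return best_t


def route(pairs, fields, threshold, margin=0.1):
    """Split candidate pairs into auto-matched, auto-rejected, and human-review lists."""
    matched, rejected, review = [], [], []
    for a, b in pairs:
        s = similarity(a, b, fields)
        if s >= threshold + margin:
            matched.append((a, b))
        elif s < threshold - margin:
            rejected.append((a, b))
        else:
            review.append((a, b))  # uncertain: ask a human
    return matched, rejected, review


if __name__ == "__main__":
    # Scores a human has already labeled as match / non-match (illustrative data).
    labeled = [(0.95, True), (0.85, True), (0.40, False), (0.30, False)]
    cutoff = fit_threshold(labeled)

    base = {"name": "Acme Corp", "city": "Boston"}
    candidates = [
        (base, {"name": "Acme Corp", "city": "Boston"}),
        (base, {"name": "Zyxw Ltd", "city": "Paris"}),
        (base, {"name": "Acme Corporation", "city": "Boston"}),
    ]
    m, r, v = route(candidates, ["name", "city"], cutoff)
    print(f"matched={len(m)} rejected={len(r)} review={len(v)}")
```

Widening `margin` sends more pairs to people and trades throughput for accuracy; narrowing it automates more. Corrections gathered during review can be appended to `labeled` and the cutoff refitted, which is the audit-and-correct loop described above in its simplest form.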
The Data Tamer (and Tamr product) methodology explicitly calls out three levels of context-setting information that are encountered in data integration and how they can be addressed:
Machine learning aside, this approach dates back to the earliest days of data warehousing.
Dr. Stonebraker's most recent white paper extends his thinking to "The Seven Tenets of Scalable Data Unification." Here, he claims that traditional data integration approaches miss most or all of the listed conditions, while Tamr (unsurprisingly) meets them all. Longtime practitioners of data warehouse population are unlikely to agree with the first claim: each tenet has been considered and implemented in some form by data integration products as technology allowed.
As AI plays an increasing role in data integration, the important questions become how completely the process can be automated and what degree of human assistance is still required.