AI powers the catalogs of next-generation big data

Data’s relevance doesn’t always jump out at you. It takes work to distill useful insights from enterprise data lakes that are increasingly too large, diverse and dynamic to be explored through entirely manual methods.
Discoverability and visibility are what unlock data’s value. More enterprises are embracing big-data catalogs to harness insights that would otherwise stay dormant and overlooked. Recognizing this growing demand, more data management solution providers are building sophisticated catalogs into their solution portfolios, as discussed in Wikibon’s recent big-data market study.
Artificial intelligence is a key force driving the evolution of big-data catalogs into enterprisewide platforms for collaborative curation. Increasingly, providers are integrating AI into their offerings to help users discover, refine, explore, analyze and apply complex data sets more rapidly and intelligently to diverse applications.
Among data management vendors, Informatica LLC has set the pace in weaving AI-infused metadata-management capabilities into its solution portfolio. In the breadth and sophistication of its AI capabilities, Informatica stands apart from other data catalog solution providers such as Alation Inc., Cloudera Inc., Hortonworks Inc. and Microsoft Corp.
The company briefed Wikibon last summer on its roadmap to integrate AI as an enabling capability across its entire product line, with its Enterprise Data Catalog at the center. At that time, Informatica had already incorporated AI — which it brands as “CLAIRE” — into its catalog to automate data clustering, tagging, and domain/entity recognition. The AI-powered catalog intelligently scans data assets from across the enterprise and automatically adds business context metadata. In its data integration offerings, Informatica had already integrated such CLAIRE AI technologies as genetic algorithms (to identify complex data sub-structures), natural language processing algorithms (to drive semantics-based modifications to data models) and machine learning algorithms (to parse clickstream, log, system, JSON and other “internet of things” data).
At Informatica World 2017, CEO Anil Chakravarthy spoke to theCUBE about how CLAIRE figures into the company’s product roadmap. “When we built CLAIRE,” he said, “we did not invent the artificial intelligence or the machine learning. A lot of that is already available. So we took a lot of the best algorithms in machine learning and applied them to metadata and data management. That’s the secret sauce. It’s not the building the AI itself, it’s the use of the AI for data management.”


