Automate the business intelligence pipeline
- by 7wData
Demand for data by today’s business users is growing exponentially in two ways. First, business users have exhausted the opportunities in the data they hold. They want more sources of data to find new value, and they want the data to be accurate to deliver analytic outcomes. Second, the number of data-savvy business analysts is larger than ever and growing fast. To satisfy the increasing demand, IT departments must field a continuous stream of data requests—big and small.
While business execs see tremendous potential in putting more data at more users’ fingertips, organizations struggle to deliver that data. Traditional data integration tools, developed in the 1990s for populating data warehouses, are too brittle to scale to many data sources. They’re also prohibitively time-consuming and costly.
At the same time, new big data platforms only solve the data collection part of the problem. The data loaded into Hadoop is never enterprise-ready; it requires preparation before reaching a business user. The reality is that most enterprises are dealing with hundreds and thousands of sources. The only path to delivering clean, unified data across all of these sources is automation.
Let’s take a simple example from the insurance industry. For instance, business analysts want to understand claim risk of an upcoming flood. They pull customers for the affected geographic region and filter for those with active home insurance. Immediately they need to drill into high property values and sparsely populated areas. This requires individual policies to be classified into more granular categories. Then they want a broad, accurate view so that they can correlate customers with both home and car insurance (in a different system, of course).
They also want to enrich the data with up-to-date property value information, so they need to bring in an external benchmark of real estate pricing. Finally, this analysis needs to be done in near-real time to take action. Waiting a quarter or even a full month is out of the question. In all of these steps, the analysts need to be working off of clean and trusted data in order to draw correct conclusions and make data-driven decisions. To summarize the challenges that need to be met:
The two most challenging aspects of automating the delivery of data across many different sources are mastering and classification. Competent ETL engineers can do basic transforms like look-ups or minor calculations quickly and easily. But advanced tasks such as identifying global corporate entities or product categories across millions of records can't be easily scripted and maintained. We’ve all heard the example of matching “I.B.M.” to “International Business Machines,” but the problem is actually much more difficult. IBM has hundreds of subsidiaries, brands, and products. Mastering all of those together and bundling it up to be used by a businessperson is no small matching feat.
At Tamr, we’ve built a solution to automate these complex tasks to ensure rich and accurate data. Tamr uses a machine-driven but human-guided workflow to ensure the automation is efficient, accurate, and trustworthy. Tamr uses machine learning algorithms to predict how individual records should be classified and matched (as products, organizations, or individuals). For instance, if an invoice comes in with the description “Latex Gloves,” our algorithm might classify it as “Laboratory Supplies” and match it to the product “Rubber Gloves.” Tamr uses the entire record to “predict” these classifications and matches—everything from the description to the price to less obvious indicators like who created that invoice.
To ensure that these predictions are accurate, we have a workflow for experts and users to give feedback. Tamr’s algorithms are built to iterate on the feedback. Under the covers, we use supervised learning techniques to tune weights and improve accuracy.
[Social9_Share class=”s9-widget-wrapper”]
Upcoming Events
From Text to Value: Pairing Text Analytics and Generative AI
21 May 2024
5 PM CET – 6 PM CET
Read MoreCategories
You Might Be Interested In
Enterprise-grade property graphs debut on cloud
25 Jul, 2016Graph databases are powerful tools for many business applications. A graph database stores data based on its relationship to other …
Five Maturity Levels in Data-Driven Organizations
18 Apr, 2017Data is a critical competitive factor and is becoming more and more crucial for achieving success. In theory, hardly anyone …
Artificial intelligence quietly relies on workers earning $2 per hour
10 Dec, 2021In the late 18th Century, an automaton chess master known as the ‘Mechanical Turk’ toured Europe and the US. Designed …
Recent Jobs
Do You Want to Share Your Story?
Bring your insights on Data, Visualization, Innovation or Business Agility to our community. Let them learn from your experience.