Automate the business intelligence pipeline

by 7wData
February 13, 2017

Demand for data among today’s business users is growing rapidly, for two reasons. First, business users have exhausted the opportunities in the data they already hold. They want more sources of data to find new value, and they need that data to be accurate enough to support analytic outcomes. Second, the number of data-savvy business analysts is larger than ever and growing fast. To satisfy the increasing demand, IT departments must field a continuous stream of data requests, big and small.

While business execs see tremendous potential in putting more data at more users’ fingertips, organizations struggle to deliver that data. Traditional data integration tools, developed in the 1990s for populating data warehouses, are too brittle to scale to many data sources. They’re also prohibitively time-consuming and costly.

At the same time, new big data platforms solve only the data collection part of the problem. The data loaded into Hadoop is never enterprise-ready; it requires preparation before it reaches a business user. The reality is that most enterprises are dealing with hundreds or even thousands of sources. The only path to delivering clean, unified data across all of these sources is automation.

Let’s take a simple example from the insurance industry. Business analysts want to understand the claim risk from an upcoming flood. They pull customers for the affected geographic region and filter for those with active home insurance. Immediately they need to drill into high property values and sparsely populated areas, which requires individual policies to be classified into more granular categories. Then they want a broad, accurate view so that they can correlate customers who hold both home and car insurance (in a different system, of course).

They also want to enrich the data with up-to-date property values, so they need to bring in an external benchmark of real estate pricing. Finally, the analysis has to run in near-real time so they can act on it; waiting a month, let alone a quarter, is out of the question. At every step, the analysts need clean, trusted data in order to draw correct conclusions and make data-driven decisions. To summarize, the challenges are classifying records into granular categories, unifying customers across separate systems, enriching internal data with external sources, and delivering the result in near-real time.
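The flood analysis described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Policy` fields, the `flood_exposure` function, the 500,000 threshold, and the `property_values` lookup are all hypothetical names invented here, and the sketch assumes customer IDs are already consistent across the home and car systems, which is exactly the part that is hard in practice.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    # Hypothetical record layout, not taken from any real insurance system.
    customer_id: str
    kind: str      # "home" or "car"
    region: str
    active: bool


def flood_exposure(home_policies, car_policies, property_values, region,
                   high_value=500_000):
    """Build one row per active home policy in the affected region.

    property_values stands in for the external real estate benchmark:
    a dict mapping customer_id to an estimated property value.
    """
    # Customers who also hold an active car policy (from the "other" system).
    car_customers = {p.customer_id for p in car_policies if p.active}

    rows = []
    for p in home_policies:
        if p.kind != "home" or not p.active or p.region != region:
            continue
        value = property_values.get(p.customer_id)  # may be missing
        rows.append({
            "customer_id": p.customer_id,
            "property_value": value,
            "high_value": value is not None and value >= high_value,
            "has_car_policy": p.customer_id in car_customers,
        })
    return rows
```

The filter and the join are the easy part. The sketch works only because every source already agrees on `customer_id` and on what counts as a "home" policy; when that agreement has to be established across many sources, the effort shifts to mastering and classification.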

The two most challenging aspects of automating the delivery of data across many different sources are mastering and classification. Competent ETL engineers can do basic transforms like look-ups or minor calculations quickly and easily. But advanced tasks, such as identifying global corporate entities or product categories across millions of records, can’t be easily scripted and maintained. We’ve all heard the example of matching “I.B.M.” to “International Business Machines,” but the problem is actually much more difficult. IBM has hundreds of subsidiaries, brands, and products. Mastering all of those together and bundling them up for use by a businessperson is no small feat.
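To see why hand-written rules struggle here, consider a minimal rule-based matcher. It is a sketch written for this article, not anyone’s production logic: the `SUFFIXES` list and the functions `normalize`, `acronym`, and `naive_match` are hypothetical, and “Indiana Bell Maintenance” is a made-up company name used only to trigger a false match.

```python
import re

# A small, deliberately incomplete list of legal suffixes to ignore.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return [t for t in cleaned.split() if t not in SUFFIXES]


def acronym(tokens):
    """First letter of each token, e.g. ['red', 'hat'] -> 'rh'."""
    return "".join(t[0] for t in tokens)


def naive_match(a, b):
    """Match on identical normalized names or on a simple acronym rule."""
    ta, tb = normalize(a), normalize(b)
    if ta == tb:
        return True
    # Treat a single-token name as a possible acronym of the other name.
    if len(ta) == 1 and len(tb) > 1 and ta[0] == acronym(tb):
        return True
    if len(tb) == 1 and len(ta) > 1 and tb[0] == acronym(ta):
        return True
    return False


print(naive_match("I.B.M.", "International Business Machines"))  # True
print(naive_match("I.B.M.", "Indiana Bell Maintenance"))          # True (wrong)
print(naive_match("Lotus Software", "IBM Corp."))                 # False
```

The textbook case passes, but the same rule wrongly merges any unrelated name with the same initials, and it knows nothing about ownership: Lotus Software, a brand IBM acquired, does not match. Each fix means another special case to write and maintain, which is the scaling problem described above.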

At Tamr, we’ve built a solution that automates these complex tasks to ensure rich and accurate data. It relies on a machine-driven but human-guided workflow so that the automation is efficient, accurate, and trustworthy. Machine learning algorithms predict how individual records should be classified and matched (as products, organizations, or individuals). For instance, if an invoice comes in with the description “Latex Gloves,” our algorithm might classify it as “Laboratory Supplies” and match it to the product “Rubber Gloves.” Tamr uses the entire record to “predict” these classifications and matches: everything from the description to the price to less obvious indicators like who created that invoice.
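The idea of classifying from the whole record, rather than from the description alone, can be illustrated with a toy classifier. The sketch below is a small naive Bayes model written from scratch; it is not Tamr’s algorithm, which the article does not describe in detail. The field names (`description`, `price`, `created_by`), the price cutoff of 50, and the training records are all assumptions made up for the example, and it covers only classification, not product matching.

```python
import math
from collections import Counter, defaultdict


def features(record):
    """Turn the whole invoice record into feature strings."""
    feats = [f"word={w}" for w in record["description"].lower().split()]
    feats.append("price=" + ("low" if record["price"] < 50 else "high"))
    feats.append(f"creator={record['created_by']}")
    return feats


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, records, labels):
        for record, label in zip(records, labels):
            self.label_counts[label] += 1
            for f in features(record):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, record):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for f in features(record):
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for illustration.
train = [
    {"description": "Nitrile gloves box", "price": 12.0, "created_by": "lab_manager"},
    {"description": "Pipette tips", "price": 30.0, "created_by": "lab_manager"},
    {"description": "Office chair", "price": 180.0, "created_by": "facilities"},
    {"description": "Standing desk", "price": 450.0, "created_by": "facilities"},
]
labels = ["Laboratory Supplies", "Laboratory Supplies",
          "Office Furniture", "Office Furniture"]

model = NaiveBayes().fit(train, labels)
invoice = {"description": "Latex Gloves", "price": 9.5, "created_by": "lab_manager"}
print(model.predict(invoice))  # Laboratory Supplies
```

In this toy data the word “latex” never appears in training. The prediction still lands on “Laboratory Supplies” because the other signals in the record (the word “gloves,” the low price, and the person who created the invoice) all point the same way.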

To ensure that these predictions are accurate, we have a workflow for experts and users to give feedback. Tamr’s algorithms are built to iterate on the feedback. Under the covers, we use supervised learning techniques to tune weights and improve accuracy.
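As a rough illustration of how expert feedback can tune weights, the sketch below applies one logistic-regression gradient step per reviewed pair. The article says only that supervised learning is used, so the choice of logistic regression, the three features (a bias term, a name-similarity score, and a same-price-band flag), the learning rate, and the function names are all assumptions.

```python
import math


def predict(weights, x):
    """Probability that a candidate pair is a true match."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def apply_feedback(weights, x, is_match, lr=0.5):
    """One gradient step on a single expert-labeled pair."""
    error = (1.0 if is_match else 0.0) - predict(weights, x)
    return [w + lr * error * xi for w, xi in zip(weights, x)]


# Hypothetical features: [bias, name_similarity, same_price_band]
weights = [0.0, 0.0, 0.0]
similar_pair = [1.0, 0.9, 1.0]      # e.g. "Latex Gloves" vs "Rubber Gloves"
dissimilar_pair = [1.0, 0.1, 0.0]

before = predict(weights, similar_pair)

# An expert confirms the first pair and rejects the second, several times over.
for _ in range(20):
    weights = apply_feedback(weights, similar_pair, is_match=True)
    weights = apply_feedback(weights, dissimilar_pair, is_match=False)

after = predict(weights, similar_pair)
print(before < after)                          # True
print(predict(weights, dissimilar_pair) < 0.5)  # True
```

Each confirmation or rejection nudges the weights, so the model’s confidence in similar pairs rises while its confidence in dissimilar ones falls. A real system would batch feedback and validate against held-out labels, but the loop has the same shape.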
