Cognitive Analytics Answers the Question: What’s Interesting in Your Data?

Dimensionality reduction is a critical component of any solution dealing with massive data collections. Being able to sift through a mountain of data efficiently in order to find the key descriptive, predictive and explanatory features of the collection is a fundamental required capability for coping with the Big Data avalanche. Identifying the most interesting dimensions of data is especially valuable when visualizing high-dimensional (high-variety) big data and when telling your data’s story.

There is a “good news, bad news” angle here. First, the bad news: the human capacity for visualizing multiple dimensions is very limited. Three or four dimensions are manageable; five or six are possible; but more dimensions are difficult-to-impossible to assimilate. Now for the good news: the human cognitive ability to detect patterns, anomalies, changes, or other “features” in a large complex “scene” surpasses most computer algorithms for speed and effectiveness. In this case, a “scene” refers to any small-_n_ projection of a larger-N parameter space of variables.

In data visualization, a systematic ordered parameter sweep through an ensemble of small-_n_ projections (scenes) is often referred to as a “grand tour”, which allows a human viewer of the visualization sequence to see quickly any patterns or trends or anomalies in the large-N parameter space. Even such “grand tours” can miss salient (explanatory) features of the data, especially when the ratio N/_n_ is large.
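To make the scale of such a sweep concrete, here is a minimal sketch that enumerates every axis-aligned small-_n_ scene for a set of N variables. It is a simplification: it only covers projections onto pairs of original variables, not the rotated projections a full grand tour may include. The variable names (`x1` … `x6`) and the helper `scenes` are hypothetical.

```python
from itertools import combinations
from math import comb

def scenes(variables, n=2):
    """Return every axis-aligned n-variable projection (a "scene") in a fixed order."""
    return combinations(variables, n)

# Hypothetical data set with N = 6 variables, viewed n = 2 at a time
variables = [f"x{i}" for i in range(1, 7)]
tour = list(scenes(variables, n=2))
print(len(tour), tour[:3])

# For fixed n, the number of scenes grows quickly with N
for N in (6, 20, 100):
    print(N, comb(N, 2))
```

With 100 variables there are already 4950 two-variable scenes, which suggests why a viewer stepping through the sequence can miss the few frames that matter.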

Machine learning algorithms (e.g., the random forest algorithm) are increasingly effective at finding the most explanatory (most predictive) features in big data. But that presumes that you already know what needs to be explained! That is a supervised learning approach (in which you know in advance the key classes of objects and events represented within your data). But what if you don’t know those key classes yet? How do you find the interesting features within your data in the first place? That requires an unsupervised learning approach along with some human understanding of what defines “interesting.”

Consequently, a cognitive analytics approach that combines the best of both worlds (machine learning algorithms and human perception) will enable efficient and effective exploration of large high-dimensional data. One such approach is to apply computer vision algorithms, which are designed to emulate human perception and cognitive abilities.

Computer Vision (CV) is a methodology (based on a set of algorithms) that enables computers to interpret what a sensor visually perceives. CV is not a new field, but it has traditionally been applied primarily to image processing and image analysis. CV algorithms include edge-detection, gradient-detection, motion-detection, change-detection, object-detection, segmentation, template-matching, and pattern recognition. Many of these same algorithms can be applied to high-dimensional data streams that are not images but are “scenes” (such as still frames in a grand tour) that are projections of high-dimensionality data into lower-dimension parameter spaces. This is truly a cognitive analytics approach.
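As a rough illustration of treating a scene as an image, the sketch below rasterizes a two-variable scene into a grid of counts and then applies a simple central-difference gradient, one of the most basic forms of gradient detection. The function names, the grid size, and the toy data are all assumptions made for this example; a real pipeline would more likely use an established image-processing library.

```python
def bin_scene(points, bins, lo, hi):
    """Rasterize 2-D points into a bins x bins grid of counts (a crude "image")."""
    grid = [[0] * bins for _ in range(bins)]
    width = (hi - lo) / bins
    for x, y in points:
        col = min(int((x - lo) / width), bins - 1)
        row = min(int((y - lo) / width), bins - 1)
        grid[row][col] += 1
    return grid

def gradient_magnitude(grid):
    """Central-difference gradient magnitude on interior cells; border cells stay 0."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = (grid[r][c + 1] - grid[r][c - 1]) / 2
            gy = (grid[r + 1][c] - grid[r - 1][c]) / 2
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# Hypothetical scene: three points per cell, filling only the left half of the plot
points = [(c + 0.5, r + 0.5) for c in range(3) for r in range(6) for _ in range(3)]
grid = bin_scene(points, bins=6, lo=0.0, hi=6.0)
edges = gradient_magnitude(grid)
print(grid[2])   # counts along one row of the scene
print(edges[2])  # gradient magnitude along the same row
```

The gradient is nonzero only in the two columns that straddle the boundary between the populated and empty halves, which is the kind of "feature" an edge detector would flag in this scene.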

One possible outcome of using CV is the generation of “interestingness metrics” that signal to the data end-user the most interesting and informative features (or combinations of features) in high-dimensional data (or that are discovered in a grand tour). Interestingness can be measured using specific observable parameters or can be inferred via the detection of interesting patterns in the data. An example of the latter is latent (hidden) variable discovery.
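One very simple stand-in for an interestingness metric is the absolute Pearson correlation between the two variables in a scene. The sketch below ranks all two-variable scenes by that score. This is only an assumption chosen for illustration, not the metric the article has in mind; the data and the names `pearson` and `rank_scenes` are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def rank_scenes(data):
    """Score every two-variable scene by |r| and sort from most to least interesting."""
    scored = [((a, b), abs(pearson(data[a], data[b])))
              for a, b in combinations(data, 2)]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical data: b is exactly 2 * a, while c is only weakly related to either
data = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 12],
    "c": [5, 1, 4, 2, 6, 3],
}
ranking = rank_scenes(data)
for pair, score in ranking:
    print(pair, round(score, 3))
```

A linear correlation score misses nonlinear structure such as clusters or rings, so in practice it would be one of several scores rather than a complete definition of "interesting."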

Latent variables are not explicitly observed but are inferred from the observed features in a data set. Latent variables are inferred primarily because they are the variables that cause the all-important interesting descriptive, predictive, and explanatory patterns seen in the data set. Latent variables can also be concepts that are implicitly represented by the data (e.g.
