How Does a GPU Database Play in Your Machine Learning Stack?

Machine learning (ML) has become one of the hottest areas in data, with computational systems now able to learn patterns in data and act on that information. The applications are wide-ranging: autonomous robots, image recognition, drug discovery, and fraud detection, among others.

At the cutting edge is deep learning, which draws its inspiration from the networks of neurons that comprise the cerebral cortex. These networks are massively parallel. As such, it’s no surprise that an increasing number of ML approaches are turning to graphics processing units (GPUs), a key hardware component for general-purpose parallel computation.

Kinetica has been leveraging GPUs for massively parallel data analysis since 2012. As an in-memory analytical database, Kinetica is able to utilize multiple GPUs across many nodes to perform massively parallel statistical and analytical queries. Users can also apply custom code for analytical processing by leveraging user-defined functions, allowing Kinetica to integrate with a growing number of GPU-accelerated ML libraries, such as TensorFlow, Caffe, Torch, and BIDMach.

But this raises the question: if your ML library is already leveraging GPUs, what does Kinetica add to the ML stack?

Kinetica is tried and tested in large-scale enterprises, with production clusters deployed over dozens of nodes. At this scale most ML models are trained on subsets of the raw data, and most do not actually retain this raw data. Instead, they use the raw data to learn a state (e.g., the strengths of various network connections) before disposing of it—or siloing it in a data warehouse, never to be seen again.

With Kinetica, data can be stored in-memory and rapidly accessed by the ML model as necessary. One key advantage of keeping the data closely integrated with the model is that the user can always go back and refit the model against the raw data.

Consider an example using time series data. It turns out that by learning the data in two stages—first forwards in real time but then again backwards—you will generally achieve a better overall fit to the entire dataset (i.e., Kalman smoothing vs. Kalman filtering).
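The difference between the two passes can be sketched in a few lines of Python. This is a minimal illustration, not Kinetica code: it assumes a one-dimensional random-walk signal observed with Gaussian noise, uses NumPy, and the function names and noise variances (`q`, `r`) are chosen here purely for the example. The forward pass is a standard Kalman filter; the backward pass is a Rauch-Tung-Striebel smoother, one common way to implement Kalman smoothing.

```python
import numpy as np


def kalman_filter(z, q, r, x0=0.0, p0=1.0):
    """Forward pass for the model x[k] = x[k-1] + w,  z[k] = x[k] + v.

    q is the process-noise variance, r the measurement-noise variance.
    Returns filtered and one-step-predicted means and variances.
    """
    n = len(z)
    x_pred, p_pred = np.empty(n), np.empty(n)
    x_filt, p_filt = np.empty(n), np.empty(n)
    x, p = x0, p0
    for k in range(n):
        # Predict: a random walk keeps the mean and inflates the variance.
        x_pred[k], p_pred[k] = x, p + q
        # Update: blend the prediction with the new measurement.
        gain = p_pred[k] / (p_pred[k] + r)
        x = x_pred[k] + gain * (z[k] - x_pred[k])
        p = (1.0 - gain) * p_pred[k]
        x_filt[k], p_filt[k] = x, p
    return x_filt, p_filt, x_pred, p_pred


def rts_smoother(x_filt, p_filt, x_pred, p_pred):
    """Backward pass: revise each estimate using the data that came after it."""
    x_s, p_s = x_filt.copy(), p_filt.copy()
    for k in range(len(x_filt) - 2, -1, -1):
        c = p_filt[k] / p_pred[k + 1]
        x_s[k] = x_filt[k] + c * (x_s[k + 1] - x_pred[k + 1])
        p_s[k] = p_filt[k] + c * c * (p_s[k + 1] - p_pred[k + 1])
    return x_s, p_s


# Simulated data (assumed values, for illustration only).
rng = np.random.default_rng(0)
n, q, r = 2000, 0.1, 1.0
truth = np.cumsum(rng.normal(0.0, np.sqrt(q), n))
z = truth + rng.normal(0.0, np.sqrt(r), n)

x_filt, p_filt, x_pred, p_pred = kalman_filter(z, q, r)
x_smooth, p_smooth = rts_smoother(x_filt, p_filt, x_pred, p_pred)

mse_filt = np.mean((x_filt - truth) ** 2)
mse_smooth = np.mean((x_smooth - truth) ** 2)
print(f"filter MSE:   {mse_filt:.3f}")
print(f"smoother MSE: {mse_smooth:.3f}")
```

The backward pass is only possible if the raw measurements, or at least the stored filter outputs, are still available once the forward pass has finished, which is the point of keeping the data close to the model.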

To return to the neuroscience analogy, there is a close parallel to the wake-sleep cycle in animals. The networks of the brain are thought to learn online throughout the course of the day but require a period of sleep in which these models are re-fit to stored memories, most famously in the auto-associative networks of the hippocampus.

Theorists in machine learning have long been aware of the No Free Lunch Theorem. Simply put, there is no magic algorithm that can perform any better than any other in general — that is, when averaged over all conceivable inputs. What this means is that ML models can only succeed to the extent they are well-constructed for the problem at hand. A model that has been developed for image recognition is unlikely to do well when applied to credit card fraud.

This is true even with deep learning. It is often asserted that deep learning is a fundamentally new innovation that solves the feature selection problem: that is, deep learning will learn features from raw data, obviating the need for feature selection. Unfortunately, there is no getting around the No Free Lunch Theorem.
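The averaging argument behind the No Free Lunch Theorem can be checked by brute force on a toy problem. The sketch below is an illustration under stated assumptions, not a proof: it takes every possible binary labelling of the eight 3-bit inputs as an equally likely target, trains on five inputs, and scores predictions on the three held-out inputs. The two learners and all names are hypothetical, chosen only for the example.

```python
from itertools import product

# All 3-bit inputs; the first five are seen in training, the rest are held out.
inputs = list(product([0, 1], repeat=3))
train_x, test_x = inputs[:5], inputs[5:]


def majority_learner(train):
    """Always predict the most common training label (ties go to 1)."""
    ones = sum(train.values())
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def nearest_neighbor_learner(train):
    """Predict the label of the training input closest in Hamming distance."""
    def predict(x):
        nearest = min(train, key=lambda t: sum(a != b for a, b in zip(t, x)))
        return train[nearest]
    return predict


def average_test_accuracy(learner):
    """Held-out accuracy averaged over every possible target function."""
    correct = trials = 0
    for labels in product([0, 1], repeat=len(inputs)):
        target = dict(zip(inputs, labels))
        predict = learner({x: target[x] for x in train_x})
        correct += sum(predict(x) == target[x] for x in test_x)
        trials += len(test_x)
    return correct / trials


print(average_test_accuracy(majority_learner))          # 0.5
print(average_test_accuracy(nearest_neighbor_learner))  # 0.5
```

Both learners land on exactly chance. For any fixed set of training labels, the held-out labels take every combination equally often, so whatever a learner predicts is right half the time. A learner only pulls ahead when the set of plausible targets is narrowed, and that narrowing is exactly the domain knowledge the article argues for.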

Let’s again consider the cerebral cortex. It is certainly true that the cortex is capable of selecting and refining features via feedback, such as in the early visual cortex. But note that before even arriving in the cortex, visual information has been extensively filtered, such as in the complex circuitry of the human retina. And most of this is fairly hard-wired: if the rules of physics suddenly changed, your eyes would probably not be of much use.

What this means for ML is that models can benefit enormously from incorporating field expertise and the discovered insights of data scientists.

Here Kinetica is an invaluable addition to your machine learning stack.
