The truth about in-memory computing

A few weeks back, one of my favourite analysts, Merv Adrian, tweeted the following:

“Just move it to memory and it will speed up.” Not so fast (pun intended). Serious engineering required – even for a KV store.

I could not help but smile when I saw this. I’ve spent years telling anyone who would listen that putting data into memory doesn’t instantly transform software originally written for disk-based data into “in-memory” software.

In 1988, at White Cross Systems (a pioneer in MPP in-memory systems, which later evolved into Kognitio), we set out to use the concept of MPP (massively parallel processing) to build a database that would support what we now call data analytics, but what was then called Decision Support. Most databases at that time were designed and optimised for transaction processing rather than decision support, so we were effectively starting from scratch. We wanted to build a system that was fast enough to support train-of-thought analysis and could scale linearly to support large and growing data volumes.

We never set out to build an in-memory system, but it became clear to us early on that if we wanted to exploit massive parallelisation, we could not be limited by disk IO speeds. Reading data from slow physical disks seriously limits the amount of parallelisation you can effectively deploy to any task: the CPUs (processors) very quickly become starved of data as everything becomes disk IO bound.

This is the most basic and important point that is often missed when talking about in-memory. It’s not the putting of data “in memory” that makes things faster. Memory, like disk, is just another place to park the data. It’s the processors, or CPUs, that run the actual data analysis code. Keeping the data in memory gives the CPUs fast access to it, keeping them fed with data and enabling parallelisation.
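To make that concrete, here is a minimal Python sketch (my illustration, not White Cross code) that scans the same data set twice: once re-read from a scratch file on disk, and once already resident in RAM. The file name and sizes are hypothetical, and on repeated runs the operating system’s page cache will blur the gap, but the shape of the difference is the point.

```python
import os
import time
from array import array

N = 10_000_000                       # ~40 MB of 32-bit ints; illustrative size
PATH = "scratch_numbers.bin"         # hypothetical scratch file

# Park the same data set in both places: on disk and in RAM.
data = array("i", range(N))
with open(PATH, "wb") as f:
    data.tofile(f)

def scan_from_disk():
    # Re-read the file on every pass: the CPU waits on disk IO.
    a = array("i")
    with open(PATH, "rb") as f:
        a.fromfile(f, N)
    return sum(a)

def scan_in_memory():
    # Data already resident in RAM: the CPU is the only bottleneck.
    return sum(data)

for fn in (scan_from_disk, scan_in_memory):
    start = time.perf_counter()
    total = fn()
    print(f"{fn.__name__}: total={total}, elapsed={time.perf_counter() - start:.2f}s")

os.remove(PATH)
```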

For this reason we decided to build a system which kept the data of interest in fast computer memory, or RAM (Random Access Memory). In retrospect this was a brave decision to make in the late 80s. Memory was still very expensive, but because we were rather young and naïve, we believed that the price would fall relatively quickly, making the holding of large data sets in memory an economical proposition. Ultimately we were right, even if it did take a couple of decades longer than we thought!

The point I’m making is this: when we took the decision to go in-memory, it dramatically changed our code design philosophy. Not being disk IO bound meant we became CPU bound, so code efficiency became hugely important. Every CPU cycle was precious and needed to be used as effectively as possible. For example, in the mid-90s, we incorporated “dynamic code generation” into the software, a technique that dynamically turns the execution phase of any query into low-level machine code, which is then distributed across all of the CPUs in the system. This technique reduced code path lengths by 10-100 times. I am not saying that advanced techniques like machine code generation are essential components of an in-memory system, but I am saying that using an efficient programming language is important when machine cycles matter. So probably not Java.
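White Cross generated actual machine code; the sketch below only illustrates the principle one level up, in Python. A generic interpreter re-walks a predicate tree for every row, while a “generated” function is compiled once for this exact query, collapsing the per-row code path. The predicate shape, operators, and names here are all hypothetical.

```python
import time

rows = list(range(2_000_000))

# A query predicate represented as data, e.g. "x > 100 AND x < 1500000".
predicate = ("and", (">", 100), ("<", 1_500_000))

def interpret(pred, x):
    # Generic path: re-dispatch on the tree for every single row.
    op = pred[0]
    if op == "and":
        return interpret(pred[1], x) and interpret(pred[2], x)
    if op == ">":
        return x > pred[1]
    if op == "<":
        return x < pred[1]
    raise ValueError(f"unknown operator: {op}")

def generate(pred):
    # "Code generation": emit source for this exact predicate and compile it
    # once, so the per-row interpretive overhead disappears.
    def emit(p):
        if p[0] == "and":
            return f"({emit(p[1])} and {emit(p[2])})"
        return f"(x {p[0]} {p[1]})"
    namespace = {}
    exec(f"def specialised(x):\n    return {emit(pred)}", namespace)
    return namespace["specialised"]

specialised = generate(predicate)

for name, fn in [("interpreted", lambda x: interpret(predicate, x)),
                 ("generated", specialised)]:
    start = time.perf_counter()
    matches = sum(1 for x in rows if fn(x))
    print(f"{name}: {matches} rows, elapsed={time.perf_counter() - start:.2f}s")
```

Real query compilers go much further, of course, but even this toy version shows why specialising the code to the query shortens the path the CPU has to execute per row.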

Designing code specifically for in-memory operation has another important benefit: besides being faster, RAM is also accessed in a different way to disk.
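One well-known aspect of that difference: RAM supports cheap random access, whereas a disk-oriented design pays a seek for every non-sequential read. A rough sketch of that contrast, again with a hypothetical scratch file (and the same page-cache caveat as before):

```python
import os
import random
import struct
import time
from array import array

N = 1_000_000
PATH = "scratch_values.bin"          # hypothetical scratch file

with open(PATH, "wb") as f:
    array("i", range(N)).tofile(f)

# 100,000 reads in a deliberately random order.
picks = [random.randrange(N) for _ in range(100_000)]

def random_reads_from_disk():
    # Every access pays for a seek plus a tiny read.
    total = 0
    with open(PATH, "rb") as f:
        for i in picks:
            f.seek(i * 4)
            total += struct.unpack("i", f.read(4))[0]
    return total

def random_reads_from_memory():
    # Load once, then jump around freely: random access is what RAM is for.
    with open(PATH, "rb") as f:
        buf = f.read()
    total = 0
    for i in picks:
        total += struct.unpack_from("i", buf, i * 4)[0]
    return total

for fn in (random_reads_from_disk, random_reads_from_memory):
    start = time.perf_counter()
    total = fn()
    print(f"{fn.__name__}: total={total}, elapsed={time.perf_counter() - start:.2f}s")

os.remove(PATH)
```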
