How to embrace event-driven graph analytics using Neo4j and Apache Kafka

With the new Neo4j Kafka streams now available, my fellow Neo4j colleague Tom Geudens and I were keen to try it out. We have many use-cases in mind that leverage the power of graph databases and event-driven architectures. The first one we explore combines the power of Graph Algorithms with a transactional database.
The new Neo4j Kafka streams library is a Neo4j plugin that you can add to each of your Neo4j instances. It enables three types of Apache Kafka mechanisms:
You can get a more detailed overview of how each of these might look like here.
Graph algorithms provide powerful analytical abilities. They help us understand the context of our data better by analysing relationships. For example, graph algorithms are used to:
Neo4j offers a set of graph algorithms out of the box via a plugin that can run directly on data within Neo4j. This library of algorithms has been very popularly received. Many times I’ve received feedback that the plugins are as fast or faster than what clients have used before. With such wonderful feedback, why wouldn’t we want to apply these optimised and performant algorithms on a Neo4j database?
Getting the full advantage of any analytical process needs resources. To get a nice, performant experience, we want to provide as much CPU and memory as we can afford.
Now, we could run this kind of work on our transactional cluster. But in this typical architecture, we’re going to run into some challenges. For example, if one machine is big, the other machines in the cluster should be matching. This could mean that the set up architecture is expensive.
The other challenge we face is that our cluster is supposed to be managing transactions — day-to-day queries such as processing requests. We don’t want to weigh it down with crunching through various iterations and permutations of a model. Ideally, we want to offload this along with associated analytical work.
If we know that the heavy querying that is going to take place is read-only, then it’s an easy solution. We can spin up read replicas to manage the load. This keeps the cluster focussed on what it’s supposed to be doing, supporting an operational, transactional system.
But how do we handle write backs to the operational graph as part of the analytical processing? We want those results, such as recommendations, as soon as they are available.


