Fetching all edges of a SuperNode using Janusgraph

This article addresses one problem: the read timeout exception that occurs when fetching all the edges of a supernode in JanusGraph.

Many discussions of this problem can be found online on GitHub, Google forums, and Stack Overflow.

In addition, this article covers the following:

  1. What is a graph database (Janusgraph here) and how to use it.
  2. A simple Janusgraph Java client for OLAP queries.
  3. How to enable JanusGraph to support a supernode, i.e. a node that has millions of incoming and/or outgoing edges.

1. What is a graph database (JanusGraph) and how to use it.

A graph database consists of a collection of nodes (called vertices) and edges (relationships between the nodes). A node represents an object (or an entity, in database terms), and an edge represents a relationship between two nodes.

Nodes in a graph are connected via edges. A node may have different kinds of relationships with other nodes. For example, a Person may have several kinds of relationships with other Persons, and may also have a relationship with a different kind of entity, such as an Item or product.
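To make the vertex and edge vocabulary concrete, here is a toy in-memory sketch in plain Java. It is not JanusGraph code: the class TinyGraph, its nested types, and the labels used are all hypothetical names chosen for illustration. It only shows how labelled vertices, labelled directed edges, and a per-vertex edge count relate to each other.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy property-graph model (illustrative only, not the JanusGraph API). */
final class TinyGraph {

    /** A node, e.g. a Person or an Item. */
    static final class Vertex {
        final String label;
        final String name;

        Vertex(String label, String name) {
            this.label = label;
            this.name = name;
        }
    }

    /** A directed, labelled relationship from one vertex to another. */
    static final class Edge {
        final Vertex from;
        final String label;
        final Vertex to;

        Edge(Vertex from, String label, Vertex to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    private final List<Edge> edges = new ArrayList<>();

    void addEdge(Vertex from, String label, Vertex to) {
        edges.add(new Edge(from, label, to));
    }

    /** Number of edges leaving the given vertex. */
    long outDegree(Vertex v) {
        return edges.stream().filter(e -> e.from == v).count();
    }

    /** Number of edges arriving at the given vertex. */
    long inDegree(Vertex v) {
        return edges.stream().filter(e -> e.to == v).count();
    }

    public static void main(String[] args) {
        TinyGraph graph = new TinyGraph();
        Vertex alice = new Vertex("Person", "alice");
        Vertex bob = new Vertex("Person", "bob");
        Vertex book = new Vertex("Item", "book");

        // A Person can relate to another Person and to an Item.
        graph.addEdge(alice, "knows", bob);
        graph.addEdge(alice, "bought", book);
        graph.addEdge(bob, "bought", book);

        System.out.println("alice out-degree: " + graph.outDegree(alice));
        System.out.println("book in-degree: " + graph.inDegree(book));
    }
}
```

A supernode is simply a vertex for which one of these counts runs into the millions, which is why fetching all of its edges in one go is expensive.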

There are different kinds of graph databases available currently. We are focusing primarily on Janusgraph for the scope of this article.

Next, how to use Janusgraph as your graph database?

The JanusGraph documentation covers the internals, the basics, and much more.

To connect to JanusGraph, you first need to choose a storage backend. The options are Cassandra, HBase, Google Bigtable, Oracle Berkeley DB, or an in-memory backend. We will be focusing primarily on Cassandra here.

JanusGraph provides the following Cassandra options:

  • cql - CQL based driver. This is the recommended driver.
  • cassandrathrift - JanusGraph’s Thrift connection pool driver. (deprecated now)
  • cassandra - Astyanax driver. The Astyanax project is retired. (deprecated)
  • embeddedcassandra - Embedded driver for running Cassandra and JanusGraph within the same JVM (deprecated)

Note: As of JanusGraph version 0.4.1, all non-CQL backends are deprecated, including cassandrathrift, cassandra and embeddedcassandra.

So we are left with the CQL-based driver only.

We will be using Janusgraph Embedded Mode for the scope of this article. Here is a sample configuration for embedded mode:

These are the minimum properties required to connect to JanusGraph in embedded mode. Many more properties are available, and you can tune them as your requirements dictate.
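As a rough illustration of what such a minimal configuration can look like, the sketch below builds the settings in plain Java and writes them to a properties file. The keys storage.backend, storage.hostname and storage.cql.keyspace are JanusGraph configuration options; the class name EmbeddedConfigSketch, the host 127.0.0.1 and the keyspace name are assumptions made for this example, not values taken from the article.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Builds a minimal CQL-backed configuration (illustrative values only). */
final class EmbeddedConfigSketch {

    static Properties minimalCqlConfig(String hostname, String keyspace) {
        Properties p = new Properties();
        p.setProperty("storage.backend", "cql");         // the CQL-based driver
        p.setProperty("storage.hostname", hostname);     // Cassandra contact point
        p.setProperty("storage.cql.keyspace", keyspace); // keyspace holding the graph
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Host and keyspace are placeholders; substitute your own.
        Properties config = minimalCqlConfig("127.0.0.1", "janusgraph");

        Path file = Files.createTempFile("janusgraph-cql", ".properties");
        try (OutputStream out = Files.newOutputStream(file)) {
            config.store(out, "Minimal embedded-mode settings (illustrative)");
        }
        // With JanusGraph on the classpath, this path could then be passed
        // to JanusGraphFactory.open(file.toString()).
        System.out.println("Wrote " + file);
    }
}
```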

2. A simple Janusgraph Java client for OLAP queries.

You can get a simple Java client from the following GitHub repository:

Janusgraph Java Client Github Repository

and follow the readme file.

A simple, ready-to-use Java client that adds nodes using JanusGraph is as follows:

Another standalone Java client to get the supernode and its edge count is as follows:
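As a rough sketch of both steps (adding nodes, then finding a supernode's edge count), the example below is one possible shape for such a client. It is not the code from the repository mentioned above. It assumes the JanusGraph and TinkerPop libraries are on the classpath, and it uses the in-memory backend so that it can run without Cassandra (depending on the JanusGraph release, that backend may ship as a separate module). The class name, the labels "person" and "follows", and the property key "name" are made up for illustration.

```java
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class SupernodeEdgeCountSketch {

    public static void main(String[] args) throws Exception {
        // Assumption: in-memory backend for a self-contained run. For the setup
        // in this article, use storage.backend=cql plus storage.hostname instead.
        JanusGraph graph = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .open();
        try {
            GraphTraversalSource g = graph.traversal();

            // Add one hub vertex and many vertices that point at it.
            Vertex hub = g.addV("person").property("name", "hub").next();
            for (int i = 0; i < 1000; i++) {
                Vertex follower = g.addV("person").property("name", "follower-" + i).next();
                g.addE("follows").from(follower).to(hub).iterate();
            }
            g.tx().commit();

            // Look the hub up again and count its edges in each direction.
            // No index is defined here, so this lookup scans all vertices.
            long incoming = g.V().has("person", "name", "hub").inE().count().next();
            long total = g.V().has("person", "name", "hub").bothE().count().next();
            System.out.println("incoming=" + incoming + ", total=" + total);

            g.tx().rollback(); // end the read transaction
        } finally {
            graph.close();
        }
    }
}
```

On a real supernode the hub would have millions of edges rather than a thousand, and it is the edge count query at the end that runs into the read timeout discussed in the next section.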

3. How to enable JanusGraph to support a supernode, i.e. a node that has millions of incoming and/or outgoing edges.

So now the question is: how do you look up a supernode, or fetch the incoming or outgoing edges of a node that has millions of them?

And the answer to this lies in the following options:

  1. If cost is not an issue and you are already running JanusGraph and the storage backend in cluster mode, scaling the system horizontally will work.
  2. If you are running a single instance of both JanusGraph and the storage backend, try moving both to a clustered setup.
  3. If you are running JanusGraph and Cassandra on the same machine, try moving them onto separate machines.
  4. If you are using JanusGraph in embedded mode with cassandrathrift as the storage backend, switching to cql resolves the read timeout when fetching the supernode. In the properties file, simply change

     storage.backend=cassandrathrift

to

     storage.backend=cql

and the supernode query should work fine.
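The backend switch in option 4 amounts to changing the value of storage.backend from cassandrathrift to cql. The plain Java sketch below applies that change to a set of properties. The class and method names are hypothetical. The port adjustment is an added assumption that only matters if storage.port was set explicitly: Thrift conventionally listens on 9160, while the CQL native protocol uses 9042.

```java
import java.util.Properties;

/** Rewrites a Thrift-based configuration to use the CQL driver (illustrative). */
final class BackendSwitchSketch {

    static Properties switchThriftToCql(Properties original) {
        Properties updated = new Properties();
        updated.putAll(original); // leave the caller's properties untouched

        if ("cassandrathrift".equals(updated.getProperty("storage.backend"))) {
            updated.setProperty("storage.backend", "cql");
            // Assumption: an explicit Thrift port must move to the CQL port.
            if ("9160".equals(updated.getProperty("storage.port"))) {
                updated.setProperty("storage.port", "9042");
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        Properties before = new Properties();
        before.setProperty("storage.backend", "cassandrathrift");
        before.setProperty("storage.hostname", "127.0.0.1"); // placeholder host

        Properties after = switchThriftToCql(before);
        System.out.println("storage.backend=" + after.getProperty("storage.backend"));
    }
}
```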

Many more options are listed in the configuration reference section of the JanusGraph documentation. They are worth experimenting with for other kinds of requirements.

I hope this helps many of you resolve major production issues.

Happy Learning and Happy Coding!!!

Software Architect