I am often asked how to design the Space object data model so that a complex object graph is managed optimally while stored within the In-Memory-Data-Grid. Here are a few recommendations that I have seen implemented in the field:
Large Collection Handling
When your application constructs a collection of objects that is stored within a single space object, you should take into consideration the overhead of replicating the object with its collection to the backup space, and the garbage collection activity that occurs as the primary and backup JVMs create and discard the collection objects.
The problem becomes severe when the business logic is collocated with the space and the collection is very large and updated very rapidly, or when you write, remove, or update a large number of space objects in a single batch and each of them stores a collection with a large number of objects.
In such a case, the overhead of the replication and the garbage collection (mainly at the backup space) could be enormous, because the space JVM is constructing and discarding a large number of objects in a very short time. Such a scenario could move the JVM into a “panic mode”, i.e. repeatedly allocating a large amount of memory and releasing it shortly afterwards. This could go on for a relatively long time. During this time the primary space is waiting for the backup space to stabilize, which in turn could stall and halt the clients: a classic ripple effect in a distributed environment.
Here is an example of the memory utilization of a JVM in a “panic mode”:
To eliminate the above problem, you should avoid writing, updating, or removing large collections as part of the space object. As an alternative, store a collection of object IDs or proxies and construct the actual collection data on demand. The object data within the collection could be fetched from an external resource such as a database (this also reduces the overall footprint and memory used by the Data-Grid), or from the space itself. In the latter case the actual collection objects are stored as separate space objects. The readById operation that was introduced with XAP 7.0 could be very useful in such a case.
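The ID-based approach described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not GigaSpaces code: the Order and LineItem classes, the loadLineItems helper, and the Map used as a stand-in for the space are all hypothetical names introduced for the example. In a real application the map lookup would be replaced by a read-by-ID call against the space or by a database query.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OrderByIdSketch {

    // Hypothetical child entry; in the grid it would be its own space object.
    static class LineItem {
        final String id;
        final String description;

        LineItem(String id, String description) {
            this.id = id;
            this.description = description;
        }
    }

    // Hypothetical parent entry: it keeps only the IDs of its items,
    // so replicating an Order never drags the full collection along.
    static class Order {
        final String id;
        final List<String> lineItemIds = new ArrayList<>();

        Order(String id) {
            this.id = id;
        }
    }

    // Builds the actual collection on demand. The Map is an assumed stand-in
    // for the space (or a database); unknown IDs are simply skipped.
    static List<LineItem> loadLineItems(Order order, Map<String, LineItem> store) {
        List<LineItem> items = new ArrayList<>();
        for (String itemId : order.lineItemIds) {
            LineItem item = store.get(itemId);
            if (item != null) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        Map<String, LineItem> store = new HashMap<>();
        store.put("i1", new LineItem("i1", "keyboard"));
        store.put("i2", new LineItem("i2", "mouse"));

        Order order = new Order("o1");
        order.lineItemIds.add("i1");
        order.lineItemIds.add("i2");

        for (LineItem item : loadLineItems(order, store)) {
            System.out.println(item.id + " " + item.description);
        }
    }
}
```

With this layout, updating a single item touches one small object, and the parent object changes only when items are added to or removed from the ID list.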
Space Object with Multiple References to the Same Object
When you construct a Space class object and write it into a collocated space from business logic such as a Task or Service Executor, or a Polling or Notify container with an embedded space, and the space object's fields reference other objects (different ones or the same one), be aware that once the space object is replicated to the backup space, the referenced objects are serialized separately. This means that if the original object has multiple references to the same object, the replicated object at the backup space will consume more memory than the original primary copy.
In the same manner, a remote client that writes a space object with two or more root fields pointing to the same object will have the same issue with the primary space (and also the backup space). Within the space, each reference is translated into a different object, and when the object is read back from the space, these fields point to different objects. In other words, they are duplicated objects with identical data but different references in memory.
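The effect can be reproduced outside the space with standard Java serialization. The sketch below is only an analogy, assuming that each field is serialized on its own as described above; it does not use the space's actual replication code, and the Customer, Address, and copyFieldByField names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedReferenceSketch {

    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;
        final String city;

        Address(String city) {
            this.city = city;
        }
    }

    // Hypothetical space class with two root fields that may share one object.
    static class Customer {
        Address billing;
        Address shipping;
    }

    // Serializes and deserializes a single value through its own stream.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Mimics per-field serialization: each field travels separately,
    // so the knowledge that both fields shared one object is lost.
    static Customer copyFieldByField(Customer original) {
        Customer copy = new Customer();
        copy.billing = roundTrip(original.billing);
        copy.shipping = roundTrip(original.shipping);
        return copy;
    }

    public static void main(String[] args) {
        Address home = new Address("Boston");
        Customer original = new Customer();
        original.billing = home;
        original.shipping = home;

        Customer copy = copyFieldByField(original);
        System.out.println(original.billing == original.shipping); // prints "true"
        System.out.println(copy.billing == copy.shipping);         // prints "false"
    }
}
```

The copy holds two Address instances with identical data where the original held one, which is the extra footprint described above.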
To minimize the extra memory footprint at the backup space and the duplicated objects, make sure you design your Space domain classes to have only a single reference to a particular object. This can be implemented via a getter method that will be used by the relevant business logic or data access method.
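One possible shape for such a class is sketched below, assuming a hypothetical Customer whose billing and shipping addresses are the same object. Only one field is stored, and the second view is exposed through a getter, so there is a single reference to serialize. The class and method names are illustrative and not part of any GigaSpaces API.

```java
public class SingleReferenceSketch {

    static class Address {
        final String city;

        Address(String city) {
            this.city = city;
        }
    }

    // Hypothetical space class: one stored reference, two logical views.
    static class Customer {
        private Address address; // the only field that holds the Address

        void setAddress(Address address) {
            this.address = address;
        }

        Address getBillingAddress() {
            return address;
        }

        // Derived view used by business logic; no second field is stored,
        // so there is nothing to duplicate when the object is serialized.
        Address getShippingAddress() {
            return address;
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer();
        customer.setAddress(new Address("Boston"));
        System.out.println(
            customer.getBillingAddress() == customer.getShippingAddress()); // prints "true"
    }
}
```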