I had interesting discussions in the past few weeks about the use of Object Graphs and how that approach maps into a space model. It reminds me of the discussions I use to have when I used Versant as an object database and object store in the late 90s. Object Graphs assume a certain hierarchy in which the data is ordered. The model also assumes that the navigation path to that data takes that specific order.
One of the promises of the Object Graph model is that through navigation you are able to point directly to the data and therefore be very efficient in how fast you can retrieve it. The fact that it could be just a pointer makes the navigation task very simple and native to the programmer. For example — all you do is Customer.getAccounts().. and you will get a direct reference to the relevant account list.
So the idea behind object graphs is that by ordering the data in a certain hierarchy you can speed up significantly the time it takes to retrieve large sets of data simply by pointing directly to it.
Another reason is usability, because browsing through the data model in an object-oriented fashion can be more intuitive. In this post I’ll focus mostly on the first aspect – the performance. I’ll address the usability in a separate post.
So how do Object Graphs relate to the space model?
The space stores tuples as objects, or entries, as they are called in JavaSpaces terminology. A space maintains entries as objects which can contain graphs as attributes. The way a space handles the relationship between different entries is closer to a relational model than it is to an object-oriented model. In other words, you can view each entry-class as a table and each entry-instance as a row within that table. That means that two separate entries CAN’T maintain a direct Java reference to the same attribute. As with a relational model, it is possible to use logical references through object IDs and even create a hierarchical view out of the individual entries using a parent-child pattern.
The following link provides a reference to how you can build Object Graphs with GigaSpaces. Hibernate is also a good example of how you could create Object Graphs out of the relational model using O/R mapping techniques, and I can imagine you can build equivalent mapping tools over a space model to address such requirements.
Seems straightforward, right? Well, the underlying implementation can be complex, especially when we need to deal with large data structures. In such cases we need to provide the ability to get specific objects, smart indexing, explicit locking of related objects, cascading deletes, etc. Due to the relationship between the different objects this approach tends to be relatively costly on inserts and updates. It also tends to be limited in terms of concurrency, due to the complex locking structure.
While all these are technically doable, I ask: do we really need all that? What value do I get if I order my data in such a hierarchical model? If performance is the objective, isn’t there a different way to achieve it?
What if I could retrieve that data just at the same speed or even faster through a search?
Google Desktop as a search mechanism for emails is a good analogy. Even though there are lots of technical differences on the implementation side, it provides a very good equivalent to the issue above.
Let me explain:
Before desktop search technologies such as Google Desktop, the way we arranged our emails and documents was in hierarchical folders. Some people use the sender name as the hierarchy model, some use date — but we all ordered our information in some sort of hierarchy, and when we looked for a specific item, we simply navigated to the appropriate folder and then browsed to the appropriate item. As long as the way we looked for that data-item was consistent, the model worked fine, but even then, if we ordered the data based on name and then wanted to look an item up by date, the model broke.
So the main reason we organized our emails in folders was to speed up the way we could find data items (archived emails in this specific case). The main reason we all went through that process was because the original search provided by Microsoft was too slow. Now, when things such as Google Desktop emerged, and search was as fast as navigation — if not faster — that changed the entire method in which I organized my data. I stopped placing items in folders. It doesn’t make sense anymore. With this approach I don’t have to organize the information in a certain hierarchical order I simply select the right keywords and search for the relevant item from the search results.
Now back to my original point:
A space, and an In-Memory-Data-Grid, provides the opportunity to run search in-memory, which can be as fast as navigation because the index in a space points directly to the reference of the object in-memory. There are also open source projects such Compass, led by a GigaSpaces colleague of mine, Shay Banon, that provides Google-like search for POJOs. Compass has the promise of having an effect similar to that of Google Desktop on the way we order our in-memory data. That could change the whole motivation for using object graphs in the first place.
I’m over-simplifying things here for the sake of the discussion, however, unless I’m missing something fundamental, I think that we should really ask ourselves the question of: why browse (or navigate) when you can simply search?
We need to ask that question before we start thinking about how we build complex distributed object graphs, how we deal with referential integrity, locking of hierarchical trees, managing lifecycles of our objects based on reference counts — all those “nice” things that need to be addressed as soon as you start dealing with graphs.
Anyway, those are just my initial thoughts and I hope that I’m making sense here – I would be very interested in hearing what others think about this topic.