From time to time, I like to visit NoSQL conferences and talk to people from the community to see how they solve their problems. I guess I find it more neutral to do this sometimes, as opposed to speaking to our customers, as they might be biasedJ. In any case, it seems that the hype around MongoDB is increasing in an amazing way … and so as no surprise, they got a huge investment and they have an excellent product, which has come a long way.
One thing though (or maybe two) that you keep hearing from the MongoDB community (and that I would think also applies to Cassandra and HBase) is the lack of real transactions support. For the record, MongoDB does provide some support for transactions, but to have real distributed transaction support in Mongo is not an easy task (and Mongo is probably the most advanced solution when it comes to transactions as compared to its competitors).
The Challenge of Distributed Transactions
Now where do we need distributed transactions? Let’s say that we have a financial system that stores accounts across the cluster, so we can have many partitions and each one of them would store 100K accounts. Now we want to perform operations between multiple accounts, for instance transferring funds from one account to another. We can’t guarantee that both of the transactions would be located in the same partition (node)—and actually, it’s more likely that they would be in separate nodes. That means that we need to have a distributed transaction that would deduct funds from the first account on partition one, add funds to the second account on partition two, and then commit, so this operation has to be atomic. In this case, the entire operation will be a success or a failure, so we must ensure that both accounts are updated.
Why is it so complicated then? Well, there are many things that can go wrong, starting from locking of items to failure of nodes. Simple scenarios to demonstrate this would be a failure of the second node right after we updated the first node, and then we are left with the first account that has less funds, which didn’t go to the second account since there was a failure.
Mongo, as stated above, does provide some sort of transaction capabilities, but when it comes to distributed transactions, it provides kind of a “like transactions support.” The user still needs to maintain the state of the transaction and to either commit or roll back on the base of some factors.
Adding a Real-Time Distributed Transactional Layer on top of MongoDB
Let’s meet XAP, an in-memory data grid provided by GigaSpaces that can help solve all of the challenges stated above—and much more. XAP, similar to MongoDB, is a distributed environment introducing a cluster of multiple nodes on multiple machines that stores the data and business logic in memory. That very same business logic which is running in memory can be transactional. XAP is using spring behind the scenes and that is why it also benefits from spring capabilities—also the transaction support.
If we take the example we had before, we would store multiple accounts in the XAP cluster. Moreover, those accounts will be stored in memory so it would be much faster to access them, query, update, and so on. Then we can have a polling container which is a piece of code that is running within the cluster in the same JVM processes that store the data and is very similar to a JMS queue. So this polling container would “listen” to some criteria, let’s say a new Funds Transfer Request object with the state of NEW, and then validate the request (validating that both accounts exist and that the source account has enough funds in the balance to make that transfer). The actual validation part would be in the processing method of that polling container and then it would update the state of that object to VALID. Now we have another polling container that listens to the Valid Funds Transfer Request object that makes the actual transfer between the accounts—this piece of code can be declared as transactional as simple as with an @Transactional annotation support like in spring, and the processing part would be transactional and updated in one atomic operation, so that logic would either entirely go through or fail.
Highly Available Cluster That Supports Strong Consistency
All of these behaviors would benefit from the strong consistency and high availability that XAP provides. In the case of machine or node failure, XAP has automatic failover from primary to backup node and the backup is completely synchronized with the primary. Unlike many other distributed environments, XAP ensures strong consistency and not eventual consistency. That means that when you write an object, you get acknowledgment of the operation once both the primary and the backup got the update, ensuring that the backup nodes are reliable in case of failure of the primaries.
How to Integrate Transactional XAP with MongoDB
XAP provides very intuitive integration to MongoDB (also to Cassandra and other relational DBs—you can have integration to your own data store just by implementing an interface). That integration would utilize XAP as the system of record and asynchronously persist the changes to MongoDB. It is also possible to store only part of the data in XAP and the historical data in Mongo. Even if one needs to fetch some items which are not present in XAP (since they were evicted already and now stored in the long-term storage—our MongoDB), XAP will know to do it automatically and load them from Mongo. If we have polling containers (or other business logic) which are transactional (or not), the changes will be performed on the XAP cluster, ensuring automatic failover and strong data consistency—and only after that will it be asynchronously persisted to MongoDB, only once we know if the entire operation has succeed in XAP or rolled back.
To conclude, XAP is an in-memory data grid that can provide an ultra fast data access layer and also strong consistency distributed transactions layer support on top of Mongo. The integration to Mongo is supported in the product out of the box, so go ahead and try XAP now!