Integration and interoperability are often as crucial to an organization as low latency and scalability. None of us wants to use software that would take months or even years to integrate into our current environment, force us to throw away all existing resources, and work with only a few applications in the organization because it supports a limited or proprietary set of APIs.
As I feel very strongly about this topic and often get questions about it, I’d like to share with you my view on some of these FAQs.
Wouldn’t your SBA approach lock me into GigaSpaces?
One of the main principles we try to maintain is making SBA seamless from the developer's point of view. The idea is that you can still develop your application as if it were running on a single server, just as you did with the tier-based approach; the differences live mostly in the implementation of the middleware components, leaving your application pretty much intact. The reason I chose not to use the word “transparent” is simply that you need to be aware of the underlying architecture, since there will be performance and scaling implications if you do not design it correctly. So in reality, I believe you will need to change your existing application to make the best use of SBA. However, the layer of abstraction ensures that your application can still work with existing alternative approaches even after applying those changes. I know this may sound like “markitecture”, but if you examine the amount of effort we are putting into this part of the product, you will see how important it is to our overall approach. I'll try to give you some idea of what I mean by that:
1. From an API perspective we support JDBC, JMS, JCache, and of course JavaSpaces. We also support common programming patterns and models such as DAO, declarative transactions, and Remoting (I'll go through the details of that in my next post). This lets you get the benefits of SBA through existing APIs (see the sketch after this list).
2. From the community perspective we’ve taken some strategic steps to leverage the community effort as a means to open our interfaces and make them fit into existing and popular frameworks. These include:
a) Supporting Open Source projects
i. Rio – an SLA-driven lightweight container based on Jini
ii. Spring – JavaSpaces modules (see also my post on this issue)
iii. Mule – JavaSpaces connector
b) Establishing commercial agreements to support leading Open Source projects. We see strong value in contributing to Open Source community projects.
i. Interface21 – the owner of the Spring framework
ii. MuleSource – the owner of the Mule ESB framework
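To make the API point above a little more concrete, here is a minimal sketch against the standard JavaSpaces (Jini) API, with nothing GigaSpaces-specific in it. The Order entry is just an illustration, and how you obtain the JavaSpace proxy (Jini lookup, a configured factory, and so on) is deliberately left out.

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

/**
 * Minimal sketch of working against the standard JavaSpaces API.
 * Obtaining the JavaSpace proxy is left out; it is simply passed in here.
 */
public class OrderExchange {

    /** A plain entry: public object fields and a no-arg constructor, as the JavaSpaces spec requires. */
    public static class Order implements Entry {
        public String symbol;
        public Integer quantity;

        public Order() {
        }

        public Order(String symbol, Integer quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    public void publish(JavaSpace space) throws Exception {
        // Write an order into the space (leased forever for simplicity).
        space.write(new Order("VOD.L", 100), null, Lease.FOREVER);
    }

    public Order consume(JavaSpace space) throws Exception {
        // Null fields in the template act as wildcards; take() removes the matching entry.
        Order template = new Order();
        return (Order) space.take(template, null, 5000 /* ms timeout */);
    }
}
```

The point is that this is the same API you would use against any JavaSpaces implementation; the partitioning and clustering sit underneath it.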
Will the use of the In Memory Data Grid (IMDG) replace the use of databases?
In most cases, we are NOT totally replacing the RDBMS; we are simply putting it to its proper use. The RDBMS is not a solution for everything, and we forecast a technology shift in this area. If we followed the logic that old technology (e.g. the RDBMS) will never be replaced by new technology because so much has been invested in the old, we would never see any innovative technology adopted, which is clearly not the case. When the pain reaches a certain degree and the new technology delivers significant enough value, the forces of change overcome the forces of resistance.
We do believe, however, that the RDBMS will continue to play a major role in the areas it is best suited for, i.e. managing large sets of persistent data, which is why we have invested significantly in integration with the RDBMS. The interesting thing is that you can make the two work together seamlessly and combine the benefits of both worlds. You can have the performance-sensitive part of your application working against the IMDG (In Memory Data Grid), while the IMDG synchronizes its state with the RDBMS. Other applications can still see the data through the RDBMS as if it had been written to it directly. The fact that memory can serve as a reliable store enables asynchronous writes to the database while still ensuring zero data loss. In addition, the actual writes can be done on the same machine as the database through a Mirror service. Combining the two reduces the synchronization overhead significantly.
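To illustrate the asynchronous write path I just described, here is a minimal sketch of the write-behind idea under my own naming, not the actual Mirror service API: the application's write completes against the reliable in-memory store, and a background worker later drains batched changes to the database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative write-behind mirror: the replicated in-memory store is what the
 * application waits on, while a background worker drains changes to the database.
 * Names and structure are illustrative only, not the GigaSpaces Mirror API.
 */
public class WriteBehindMirror {

    /** Abstraction over the actual persistence call, e.g. a JDBC batch insert/update. */
    public interface DatabaseWriter {
        void writeBatch(List<Object> changes);
    }

    private final BlockingQueue<Object> pending = new LinkedBlockingQueue<Object>();
    private final DatabaseWriter db;

    public WriteBehindMirror(DatabaseWriter db) {
        this.db = db;
        Thread drainer = new Thread(new Runnable() {
            public void run() {
                drainLoop();
            }
        }, "mirror-drainer");
        drainer.setDaemon(true);
        drainer.start();
    }

    /** Called once the replicated in-memory write has been acknowledged. */
    public void onWrite(Object change) {
        pending.add(change); // returns immediately; the database is off the critical path
    }

    private void drainLoop() {
        List<Object> batch = new ArrayList<Object>();
        while (true) {
            try {
                batch.add(pending.take());   // block until at least one change arrives
                pending.drainTo(batch, 99);  // then batch up to 100 changes per round trip
                db.writeBatch(batch);        // single bulk write to the RDBMS
                batch.clear();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Because the in-memory copy is itself replicated and reliable, the database write can lag behind without risking data loss, and batching it next to the database keeps the synchronization overhead low.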
In our case, it is not cost, nor the choice between an in-memory and a disk-based approach, that drives customers to use space-based technology; it is the architecture that matters. With traditional approaches, we used to think in either messaging terms or database terms. In reality, almost every application requires a combination of both: we use messaging to synchronize state and a database to share state, and developers need to coordinate the two. That is fine as long as we can use a centralized approach. Once you start thinking of scaling out, i.e. scaling by adding more boxes, this model breaks. It breaks because each tier is built as a stand-alone piece with its own clustering and high-availability model, and in many cases comes from a different product. Bundling different messaging and database solutions within the same application context and trying to make that linearly scalable will not work, because of that inherent limitation and complexity.
You can find more about this issue in my previous post, ‘Persistency and the reliability myth’.
It looks like in your approach messaging and data become unified – is that the case? Does that mean that I don’t need to integrate with external messaging systems to distribute my data across the network?
In the traditional middleware world, the purpose of messaging is to deliver data in a reliable fashion, while the purpose of a database is to enable sharing of that data in a consistent manner. The reality is that in almost every distributed application you need a combination of the two: a database for sharing data and messaging to trigger changes on that data. The fact that these are separate technologies, and in many cases separate products, means that each has its own clustering and scalability model. This is one of the major reasons for the complexity of existing middleware: in most cases you need to choose a primary approach (messaging or data) and complement the other through the application. You would normally choose messaging as the primary approach for high-performance applications and data for complex stateful applications.
Distributed data sharing combines the two: messaging becomes implicit, used to synchronize data between the different data grid instances. That being said, the need for explicit messaging becomes questionable. Once we have distributed data sharing, do we really need to separate the two? The fact that you can write an object on one node and read it on another addresses what in most cases falls under the category of messaging. In our case, this aspect is also addressed by the In Memory Data Grid (IMDG).
Using the IMDG for sharing data as well as distributing it enables us to use a single technology, clustering model, and product to address both aspects, thus reducing the complexity and improving the scalability, performance, and even reliability of the application. The JavaSpaces model provides the ability to trigger events about changes to data items (through “notify”), and combined with SQL queries it can deliver continuous query events, which is one of the more complex forms of messaging.
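Here is a rough sketch of what the “notify” mechanism looks like in plain JavaSpaces terms; the combination with SQL-style continuous queries is GigaSpaces-specific and not shown. In a full Jini deployment the listener must also be exported as a remote object, which I have omitted for brevity.

```java
import java.rmi.RemoteException;
import net.jini.core.event.RemoteEvent;
import net.jini.core.event.RemoteEventListener;
import net.jini.core.event.UnknownEventException;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

/**
 * Sketch of event-driven "messaging" over the space: register interest and
 * react whenever a matching entry is written anywhere in the grid.
 */
public class SpaceEventSketch implements RemoteEventListener {

    public void notify(RemoteEvent event) throws UnknownEventException, RemoteException {
        // A matching entry was written somewhere in the grid; typically you would
        // now read or take it and process the change.
        System.out.println("Entry written, event sequence " + event.getSequenceNumber());
    }

    public static void register(JavaSpace space) throws Exception {
        // A null template matches any entry; in practice you would pass a template
        // (or, with GigaSpaces, a SQL-style query) describing the data you care about.
        space.notify(null, null, new SpaceEventSketch(), Lease.FOREVER, null);
    }
}
```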
Does GigaSpaces integrate with existing application servers?
Absolutely. It fits nicely as an add-on to existing application server solutions and enables those applications to scale. Combined with our Spring support, the integration becomes seamless: users who are already using Spring can use our Spring abstraction layer and run their application either outside a J2EE container or within one.
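As a purely hypothetical illustration of what that abstraction buys you (the class names below are mine, not part of the GigaSpaces or Spring APIs): the application code depends only on a DAO interface, and the Spring wiring decides whether the implementation behind it talks to JDBC or to the space, inside or outside a J2EE container.

```java
// TradeDao.java: the only contract the application code knows about.
public interface TradeDao {
    void save(String tradeId, double amount);
    double amountOf(String tradeId);
}

// TradeService.java: depends only on the interface. Spring injects the
// implementation, which could be a JDBC-backed DAO today and a space-backed
// DAO tomorrow, without this class changing at all.
public class TradeService {

    private final TradeDao dao;

    public TradeService(TradeDao dao) {
        this.dao = dao;
    }

    public void book(String tradeId, double amount) {
        dao.save(tradeId, amount);
    }
}
```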
Does SBA = the JavaSpaces API? Can it be compared to any other transparent partitioning middleware?
To be clear, SBA is not about the API; it is about the architecture. SBA can be implemented with different APIs. In fact, we expose several APIs as part of our product, including JavaSpaces, JDBC, and JMS. With our Spring support, we also provide a more declarative approach in which the API becomes completely abstracted.
As always, I hope you found this post useful, and I am looking forward to reading your comments.
Nati