There are a few questions about Space-Based Architecture (SBA) that I frequently encounter. In this post I'd like to share the answers to some of them, in particular those about the types of applications that benefit most from SBA, and how stateful applications can be part of that group.
Does SBA fit only greenfield applications?
No. SBA defines a holistic view (the first attempt that I'm aware of) of how to scale stateful applications. You can apply SBA gradually: start by virtualizing just one layer, then gradually move your business model into services that fit easily into the processing unit of work. Many of our users start by introducing only the In-Memory Data Grid (IMDG) part to improve the performance of their existing database, and only later move their application into processing units.
Even when you move your application entirely to SBA, you can still interoperate with existing legacy applications through the Mirror Service and the IMDG CacheStore. The Mirror and CacheStore services ensure that your existing database stays fully in sync with the state in the SBA layer. This allows legacy applications to continue working with the database as always. In this case only the performance- and latency-sensitive parts of your application need to move to SBA.
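To make the "database stays in sync" idea a bit more concrete, here is a minimal sketch in Python. It assumes a write-behind style of mirroring, where changes are applied in memory first and replayed to the database afterwards. The `MirroredGrid` class, the `accounts` table, and the explicit `flush()` call are all hypothetical names for illustration; this is a toy model, not the GigaSpaces API.

```python
import sqlite3
from collections import deque


class MirroredGrid:
    """Toy in-memory store that replays its changes to a database (hypothetical)."""

    def __init__(self, connection):
        self._data = {}          # stands in for the in-memory data grid
        self._pending = deque()  # changes not yet applied to the database
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS accounts (id TEXT PRIMARY KEY, balance INTEGER)"
        )

    def write(self, key, value):
        # The application only touches memory; the database is updated later.
        self._data[key] = value
        self._pending.append((key, value))

    def read(self, key):
        return self._data.get(key)

    def flush(self):
        """Replay pending changes to the database in the order they happened."""
        count = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._db.execute(
                "INSERT OR REPLACE INTO accounts (id, balance) VALUES (?, ?)",
                (key, value),
            )
            count += 1
        self._db.commit()
        return count


db = sqlite3.connect(":memory:")  # stands in for the existing legacy database
grid = MirroredGrid(db)
grid.write("ACC-1", 100)
grid.write("ACC-1", 125)
flushed = grid.flush()

# A legacy application reading the database sees the latest state.
legacy_view = db.execute(
    "SELECT balance FROM accounts WHERE id = ?", ("ACC-1",)
).fetchone()[0]
```

In a real deployment the replay would run in the background rather than through an explicit call; the point here is only that the application writes to memory while the legacy side keeps reading a consistent database.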
Who should consider SBA, and what would be the trigger for using SBA or just a plain In-Memory Data Grid (IMDG)?
In general, anyone who wishes to scale an application just by plugging in more machines, without touching the application code, should look into SBA. The classic early adopters of the technology are those who hit the wall with the alternatives due to complexity, cost, or performance. Typical applications in this category are:
- Low latency transactional applications (Trading, Market-Data etc.) – The main driver for these applications is scaling in order to handle the increase in data/trade volumes while keeping the latency low at low cost.
- Real-time analytics – These are data intensive analytical applications, which basically need to process some sort of analytical algorithm over a large set of data in a short period of time. Typical applications that fall into this category would be P&L calculation, reconciliation in the financial sector, fraud detection in telco etc.
- High performance stateful SOA – This is a more horizontal category that is not industry specific. Many organizations have initiatives to move to an SOA-based approach because of the flexibility inherent in the loosely coupled nature of SOA. Current SOA implementations mostly target the integration problem between multiple applications. There is a different class of applications that need to be built as a set of services within the context of a specific application. For those applications, the Web Services approach isn't an option due to its performance overhead and the stateless nature of that environment. For these applications SBA provides a high performance SOA platform that supports stateful applications.
- High performance J2EE applications – This is also a horizontal category. In general, J2EE applications that need to share state at high performance can benefit from using the IMDG as a fast storage option, as well as from the master-worker approach, which can parallelize the processing inside or outside the J2EE container.
I can see how this model fits stateless compute-type applications, but how can you ensure data affinity and FIFO ordering in this type of environment?
Traditionally, scalability and statefulness have been seen as contradictory: a common industry practice for scaling applications was to build them as stateless applications. (As a side comment, IMO there is no such thing as a stateless application in the real world. By stateless, users really mean that the state doesn't reside within the presentation or business tier but exists only in the database tier.) The downside of making your application stateless is performance: every operation that involves a state change needs to go to the database. I think we all know what that means in terms of performance, right?
I agree that scaling stateful applications is indeed a big challenge, because it requires the business tier to support and implement database capabilities for maintaining the consistency and reliability of its state.
As you can imagine by now, the first thing that needs to be addressed is the introduction of database capabilities into the business logic. This is handled by the In-Memory Data Grid (IMDG) layer. In an SBA environment we can have multiple IMDG instances, each co-located with a processing unit.
The incoming requests, however, are routed to the processing units via the messaging layer. A message could therefore be routed to a processing unit that doesn't contain the data required to process that request. This results in either a failure or an additional network call to another IMDG instance. In that case the processing unit is no longer self-sufficient, since it depends on data that lives on another processing unit. Ensuring that the request is routed to where the data resides is referred to as data affinity.
GigaSpaces assures data affinity by making the key used to store the data consistent with the key used to route the request. We refer to this key as the routing-index.
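Here is a small sketch of that idea in Python. It is a toy model under my own assumptions, not the GigaSpaces implementation: the partition count, the CRC32 hash, and the `ProcessingUnit`, `store`, and `route_request` names are all hypothetical. What it illustrates is that one function, applied to the same key, decides both where the data is stored and where the request is sent.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical number of processing units


def partition_for(routing_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a routing key to a partition using a stable hash (CRC32)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_partitions


class ProcessingUnit:
    """Toy processing unit: a co-located data store plus business logic."""

    def __init__(self, unit_id: int) -> None:
        self.unit_id = unit_id
        self.data = {}  # stands in for the co-located IMDG instance

    def handle(self, account_id: str, amount: int) -> int:
        # The lookup is local because the request was routed by the same key
        # that was used to store the data.
        balance = self.data[account_id] + amount
        self.data[account_id] = balance
        return balance


units = [ProcessingUnit(i) for i in range(NUM_PARTITIONS)]


def store(account_id: str, balance: int) -> None:
    """Store data in the partition chosen by the routing key."""
    units[partition_for(account_id)].data[account_id] = balance


def route_request(account_id: str, amount: int) -> int:
    """Route a request to the partition chosen by the same routing key."""
    return units[partition_for(account_id)].handle(account_id, amount)


store("ACC-1", 100)
store("ACC-2", 50)
new_balance = route_request("ACC-1", 25)
```

If `route_request` used a different key or a different hash than `store`, the `handle` call would hit a unit that doesn't hold the account, which is exactly the failure-or-extra-network-hop situation described above.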
As for FIFO, each request is first written to the space that lives within a specific processing unit. At that level the space behaves as a queue: the business logic services "take" the requests from the space and process them, and the space ensures that they receive the requests in the order in which they were written.
I hope that this post answered some of your questions. In future posts I'll be sharing more of these…
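The write/take behavior can be sketched with a plain in-process queue. Again, this is only an illustration in Python: `ToySpace` and its `write` and `take` methods are hypothetical stand-ins that mimic the queue-like behavior described here, not the actual space API, and the sample orders are made up.

```python
from collections import deque


class ToySpace:
    """Toy space that hands requests back in the order they were written."""

    def __init__(self):
        self._entries = deque()

    def write(self, request):
        # New requests go to the back of the queue.
        self._entries.append(request)

    def take(self):
        """Remove and return the oldest request, or None if the space is empty."""
        return self._entries.popleft() if self._entries else None


space = ToySpace()
for order in ("buy 100 ACME", "sell 40 ACME", "buy 10 INIT"):
    space.write(order)

# A business logic service drains the space; requests arrive oldest first.
processed = []
request = space.take()
while request is not None:
    processed.append(request)
    request = space.take()
```

Because each processing unit has its own space, the ordering guarantee in this sketch is per unit: requests that share a routing key land in the same queue and keep their relative order.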