A customer recently contacted us to discuss how to use GigaSpaces XAP to minimize the number of times an expensive computation must run. The particular domain is option price stress testing, but the problem is more general and the solution demonstrates a couple of interesting GigaSpaces XAP features.
Consider a service that executes an expensive computation. In this case, “expensive” means that it takes a significant amount of time to process a request. Many clients will be making requests nearly concurrently. Some requests from different clients will be identical. The result for a particular request is valid through the end of the day.
The system is configured to have a number of instances of the service running simultaneously. Communication with the service instances is via the the space: clients write requests to the space and retrieve the results when they are available. Either blocking read or notification can be used to get the result.
The primary requirement of this system is that the service should not be invoked unnecessarily. This means that results, once computed, should be available for as long as they are valid and that requests that arrive while an identical request is being processed should not cause a second service invocation.
An additional requirement is that any mechanism for meeting the processing requirements should be transparent to the clients. Clients should simply write requests and read results.
The first requirement, that results should be available for as long as they are valid, is trivially easy with GigaSpaces XAP. When the service computes a result, it writes it into the space with a lease of the appropriate length. From then on the result is available immediately to any client making an identical request.
The second requirement, not processing identical requests even when the result is not yet available, is more technically interesting. The basic technique, unsurprisingly, uses a mutex. When a request is written to the space, it is picked up by a worker that is responsible for invoking the service. Before that invocation, the worker checks for an exiting result. If one exists, the worker moves on to another request. If no result is found, the worker attempts to write a mutex, based on the unique identifiers of the request, into the space. If an identical mutex already exists, the worker moves on to another request. If the mutex write succeeds, the worker invokes the service and, when it completes, writes the result to the space. That is:
take request if no result exists try to write mutex if write succeeds invoke service write result
The operations involved in taking the request from the space and writing the result are executed in the same transactional context. If the worker fails for any reason, the transaction will abort and the request will reappear in the space.
The implementation of the mutex takes advantage of GigaSpaces’ unique UID to cause the write to fail if an identical mutex is already in the space.
If the worker (or the machine on which the worker is running) fails after writing the mutex but before invoking the service, processing of the request could block indefinitely. Leasing, the second technically interesting GigaSpaces XAP feature in this solution, addresses this problem.
Entries in the space, like many components of the GigaSpaces XAP Space Based Architecture (SBA) have an associated lease. This is the period of time that the space agrees to hold the entry. When the lease expires, the entry is removed from the space.
Leases can be renewed before they expire. This is the feature that allows a mutex to be maintained only so long as the worker that wrote it is running. The worker first writes the mutex with a lease considerably shorter than the expected service execution time. It then creates a lease renewal manager to renew the lease periodically while the service is executing. If the worker or service fail for any reason, the mutex’ lease will not be renewed and it will disappear from the space. This will allow another worker to process the request.
To eliminate the possibility of a worker not finding a result that is about to be written and therefore invoking the service unnecessarily, when the result in written to the space, the mutex’ lease is updated to overlap slightly with the lifetime of the result.
This description is almost as long as the fully documented code, which is available at http://www.openspaces.org/display/SCN/Shared+Computation.
It’s worth noting that this technique, along with many others such as traditional master-worker, map-reduce, implicit workflow, and simple distributed caching, is just one straightforward use case of an SBA supported by GigaSpaces XAP. These techniques can be mixed and matched as appropriate for the solution being developed, all the while taking advantage of GigaSpaces XAP to address enterprise non-functional requirements, including:
- Scalability (near linear across thousands of nodes)
- Resiliency (self-healing, high availability)
- Management and monitoring
- SLA driven dynamic deployment
- extremely low latency
- high throughput