When implementing an SBA (Space-Based Architecture) application, one of the first artifacts you need to build is a “processor”.
The processor is the core of the system. It consumes incoming data, digests it, and produces some output. You can think of the processor as a subscriber listening for incoming messages sent to a queue and executing business logic based on the incoming message data. With the SBA approach, a processor is implemented via a Polling Container.
The polling container processing speed is affected by the following:
– Feeder speed
– Feeder write batch size (in case you are using batch write)
– Feeder message size
– Feeder message complexity – more fields mean more serialization and marshaling
– Scope of the transactions used by the feeder and/or the processor
– The type and number of destructive operations used by the processor polling container
– Processor polling container ReceiveOperationHandler size
– Network and CPU speed
One of the most important decisions in a polling container implementation is which ReceiveOperationHandler to use. The ReceiveOperationHandler is described here.
Choosing the right ReceiveOperationHandler is important since it determines the way the data is processed.
Take an order management system as an example: you might have periods of peak load and periods of low load. The system should be able to process data without losing transactions, as fast as it can, with minimum latency. Latency here means the time from the moment the order object enters the system until it has been fully processed by the polling container. Part of the processing might involve generating other objects or reading data from existing objects.
Taking these activities into consideration, you need to make sure your processor can keep up with incoming orders and can scale within the same machine or across multiple machines. Choosing the right ReceiveOperationHandler helps you get the expected behavior.
The different ReceiveOperationHandlers let you either consume incoming data by removing it from the space (SingleTakeReceiveOperationHandler or MultiTakeReceiveOperationHandler), or read it from the space and modify its “processed” attribute once processed (ExclusiveReadReceiveOperationHandler or MultiExclusiveReadReceiveOperationHandler). The MultiTakeReceiveOperationHandler and MultiExclusiveReadReceiveOperationHandler poll a batch of objects from the space, which means the whole batch is committed as a single transaction.
Going back to our order management system, the SingleTakeReceiveOperationHandler or the MultiTakeReceiveOperationHandler can be used to consume the incoming orders and write back Execution objects. The ExclusiveReadReceiveOperationHandler or MultiExclusiveReadReceiveOperationHandler can be used to update Inventory or Account objects.
There are cases where a Notify Container is more appropriate. We will cover that container in a separate post.
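To make the two consumption styles concrete, here is a minimal sketch that models a space as a plain in-memory list. ToySpace, Entry, takeBatch and readAndMarkBatch are hypothetical names invented for this illustration. They are not part of the GigaSpaces API, and the sketch ignores transactions, locking and concurrency. It only shows that a take removes the matched objects, that a read-and-update leaves them in place with the processed flag flipped, and that a batch size of 1 corresponds to the single handlers.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy in-memory stand-in for a space. Hypothetical; NOT the GigaSpaces API.
class ToySpace {
    // Hypothetical entry type: an id plus a "processed" flag.
    static final class Entry {
        final int id;
        boolean processed;
        Entry(int id) { this.id = id; }
    }

    private final List<Entry> entries = new ArrayList<>();

    void write(int id) { entries.add(new Entry(id)); }

    int size() { return entries.size(); }

    long countUnprocessed() {
        return entries.stream().filter(e -> !e.processed).count();
    }

    // Take-style consumption: remove up to maxEntries unprocessed entries.
    List<Entry> takeBatch(int maxEntries) {
        List<Entry> batch = new ArrayList<>();
        Iterator<Entry> it = entries.iterator();
        while (it.hasNext() && batch.size() < maxEntries) {
            Entry e = it.next();
            if (!e.processed) {
                batch.add(e);
                it.remove();
            }
        }
        return batch;
    }

    // Read-style consumption: entries stay in place, only the flag changes.
    List<Entry> readAndMarkBatch(int maxEntries) {
        List<Entry> batch = new ArrayList<>();
        for (Entry e : entries) {
            if (batch.size() == maxEntries) break;
            if (!e.processed) {
                e.processed = true;
                batch.add(e);
            }
        }
        return batch;
    }
}

class HandlerSemanticsDemo {
    public static void main(String[] args) {
        ToySpace takeSpace = new ToySpace();
        ToySpace readSpace = new ToySpace();
        for (int i = 0; i < 5; i++) {
            takeSpace.write(i);
            readSpace.write(i);
        }

        takeSpace.takeBatch(1);        // "single take": one entry removed
        takeSpace.takeBatch(3);        // "multi take": three entries removed
        readSpace.readAndMarkBatch(1); // "single read": one entry marked
        readSpace.readAndMarkBatch(3); // "multi read": three entries marked

        System.out.println("take space size: " + takeSpace.size());               // 1
        System.out.println("read space size: " + readSpace.size());               // 5
        System.out.println("read space unprocessed: " + readSpace.countUnprocessed()); // 1
    }
}
```

In the take case the consumed orders are gone and anything that should survive must be written back as a new object. In the read case the same objects remain in the space and are simply no longer matched by a template that looks for unprocessed entries.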
We have conducted a benchmark measuring the processing speed of the different ReceiveOperationHandlers currently supported. Since the actual processing does very little, this benchmark essentially measures the polling container overhead.
See the results below:
Figure 1 – Polling Container Benchmark – Implementations Options
The benchmark ran without a backup space, and the data to be processed was loaded before the processor started. There was one consumer running as part of the polling container.
Conclusions:
– From the results above we see that the Multi read and Multi take ReceiveOperationHandlers provide better processing rates than the Single read and Single take ReceiveOperationHandlers. Ignoring the non-transactional results (which act as a control group), we see that the transactional Multi read and Multi take can process up to 55,000 messages/sec with a single consumer and a batch size of 50 messages. The latency when processing incoming data in such a configuration would be a bit larger than with the Single ReceiveOperationHandlers (provided they can keep up with the rate of incoming messages).
– The Multi read and the Single read ReceiveOperationHandlers behave better when running with a backup, since they perform fewer destructive operations (read and update) than the Single and Multi take ReceiveOperationHandlers (take and write).
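As a rough back-of-the-envelope check on the batch figure in the first conclusion, the sketch below converts a message rate and a batch size into transactions per second and an average time per transaction. BatchArithmetic and its methods are made-up names for this illustration, and the numbers are simple averages that assume the quoted rate is sustained. They are not additional benchmark results.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
class BatchArithmetic {
    // Transactions per second needed to sustain a message rate at a given batch size.
    static double commitsPerSecond(double messagesPerSecond, int batchSize) {
        return messagesPerSecond / batchSize;
    }

    // Average time available per transaction, in milliseconds.
    static double millisPerCommit(double messagesPerSecond, int batchSize) {
        return 1000.0 / commitsPerSecond(messagesPerSecond, batchSize);
    }

    public static void main(String[] args) {
        double rate = 55_000; // messages/sec, the figure quoted in the conclusions
        int batchSize = 50;   // messages per batch, as in the benchmark

        // prints "commits/sec: 1100"
        System.out.println(String.format(Locale.ROOT, "commits/sec: %.0f",
                commitsPerSecond(rate, batchSize)));
        // prints "ms per commit: 0.909"
        System.out.println(String.format(Locale.ROOT, "ms per commit: %.3f",
                millisPerCommit(rate, batchSize)));
    }
}
```

Because a batch is committed as one transaction, every message in it becomes fully processed only when the whole batch commits. That is one way to read the slightly higher latency of the Multi handlers: each message shares its transaction with up to 49 others.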
Notes:
– Running multiple concurrent transactional consumers does not yield a linear performance increase, due to synchronization points within the space transaction mechanism implementation. This will be resolved in future versions.
– To scale the system you should run your application with multiple partitions, each having its own processor collocated. You may have one partition per JVM or multiple partitions running within the same JVM.
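One common way to spread objects across partitions is to hash a routing value (here the message id) modulo the number of partitions, so that each partition's collocated processor sees only its own share of the load. The sketch below assumes that scheme purely for illustration. This post does not describe how GigaSpaces actually routes objects, and PartitionRoutingSketch and partitionFor are hypothetical names.

```java
// Hypothetical illustration of hash-based routing; not GigaSpaces code.
class PartitionRoutingSketch {
    // Maps a routing value to a partition index in [0, partitionCount).
    // Math.floorMod keeps the result non-negative for negative hash codes.
    static int partitionFor(Integer routingValue, int partitionCount) {
        return Math.floorMod(routingValue.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        int partitions = 4;
        int[] counts = new int[partitions];

        // Route 1,000 sequential message ids and count how many land in each partition.
        for (int id = 0; id < 1000; id++) {
            counts[partitionFor(id, partitions)]++;
        }

        // With sequential ids, each of the 4 partitions receives 250 messages.
        for (int p = 0; p < partitions; p++) {
            System.out.println("partition " + p + ": " + counts[p]);
        }
    }
}
```

With an even spread like this, each partition's processor handles roughly a quarter of the messages, which is why adding partitions scales the system even though a single consumer's rate stays the same.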
The processed Object:
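// Package names as in the GigaSpaces annotation API; verify against your XAP version
import com.gigaspaces.annotation.pojo.SpaceProperty;
import com.gigaspaces.annotation.pojo.SpaceProperty.IndexType;

public class Message {
    private Integer id; // ID field
    private String info;
    // Indexed field; the processor's template matches on it
    private Boolean processed = Boolean.FALSE;

    // … getters and setters

    @SpaceProperty(index = IndexType.BASIC)
    public Boolean getProcessed() {
        return processed;
    }

    public void setProcessed(Boolean processed) {
        this.processed = processed;
    }
}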
The processing business logic:
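// Package names as in the OpenSpaces events API; verify against your XAP version
import org.openspaces.events.EventDriven;
import org.openspaces.events.EventTemplate;
import org.openspaces.events.TransactionalEvent;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.polling.Polling;

@EventDriven
@TransactionalEvent(timeout = 100000)
@Polling
public class Processor {
    // Marks the message as processed and returns it to the container
    @SpaceDataEvent
    public Message processMessage(Message msg) {
        msg.setProcessed(true);
        return msg;
    }

    // …

    // Template that matches only unprocessed messages
    @EventTemplate
    Message getTemplate() {
        Message temp = new Message();
        temp.setProcessed(false);
        return temp;
    }
}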
The benchmark was conducted with:
OS: Sun Solaris 10 x86-64
Architecture: 4 x AMD Opteron Dual-core 8220 2.8GHz
RAM Size: 16GB
Machine Type: Sun Fire X4600
JVM: Sun Java jdk1.6.0_05, 32-bit
GigaSpaces XAP 6.6.0 ga (build 2601)
Shay