I went to visit this extraordinary and shocking exhibition a few weeks ago – Bodies.
It is located near Pier 17 in Manhattan (not far from South Street Seaport, which I mentioned in my previous blog post).
You can find the same exhibition all around the world.
When walking around the different exhibition halls that present real, dead, skinless human bodies, you can't ignore the need to satisfy your curiosity. Looking into what is going on inside our own bodies is fascinating. Some of the bodies are sliced into thin "steaks" where you can actually see the different layers comprising the human body. In a similar effort, this post will slice the C++ processing unit into its different pieces.
When I posted my previous blog entry, I mentioned the ability to have your C++ code deployed into the SLA-driven container. For whoever is curious and would like to dive into the low-level details, this post is for you. It includes information about this unique and powerful capability included as part of GigaSpaces XAP 6.5 and its C++ POCO framework.
I will explain the different pieces comprising this state-of-the-art product component and how you can leverage it when building your own C++ application using Space Based Architecture (SBA).
Scalability – How Can I Get More Horsepower?
Historically, software systems have scaled by adding more hardware running more software instances.
You can always add more CPUs and more memory to the same machine to be able to process more data per unit of time, but at some point you will reach the physical limit of what a single machine can actually run.
The concept of the OpenSpaces processing unit was designed around this fact. Your unit of scale is the processing unit. In order to scale, you need to run more processing units. You can scale your application by:
– running multiple threads concurrently within the same process
– running multiple processes concurrently within the same machine
– deploying multiple processes across multiple machines that run concurrently and utilize your networked compute resources in an optimized manner (aka a grid)
Nevertheless, we often cannot fully take advantage of the available horsepower because we get stuck at the data access layer; i.e. we cannot feed the processes with the relevant data fast enough for them to fully leverage their CPU, network, and memory resources, complete a given job in the quickest manner, and move efficiently to the next one.
To solve this bottleneck, the processing unit allows you to collocate the business logic and the data; i.e. both run within the same process, sharing the same memory address space. In fact, the POCO C++ framework allows you to build your business logic without taking the final deployment topology into consideration. The code can be designed, implemented and unit-tested using a single embedded space collocated with the business logic on your development machine. The same code can then be deployed across a system involving hundreds of machines with hundreds of spaces (collocated or not) running the C++ business logic, ultimately processing millions of data items per second with sub-millisecond latency.
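To make the point concrete, here is a minimal sketch of worker logic written once against a space proxy, using the API names that appear later in this post. The find-by-URL call and the Order POCO class are assumptions for illustration; in a deployed processing unit the proxy is injected by the container, as shown below.

// The same business logic runs unchanged whether the space is embedded
// (collocated, "/./mySpace") or remote/clustered ("jini://*/./mySpace").
// SpaceFinder::find(url) and the Order class are illustrative assumptions.
void processOneOrder(const char* spaceUrl)
{
    SpaceFinder finder;
    SpaceProxyPtr proxy = finder.find(spaceUrl);

    Order orderTemplate; // empty template - matches any Order object
    Order* order = proxy->take(&orderTemplate, NULL_TX, Lease::FOREVER);
    // ... perform the calculation and write the result back into the space ...
}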
Collocated or Not Collocated?
Before actually collocating the required state and business logic, you should take the following into consideration:
- Is your business logic designed to process incoming data events without accessing remote data located in other partitions?
- Is your data model designed to support stickiness – to be routed to the same logical partition based on its content?
- Is your C++ process designed to cope with a self-healing mechanism that will restart the failed C++ instance somewhere else on the network, allowing the system to continue and function without disruption as long as there are machines available to run the application?
- What is the amount of work involved to process the incoming events? Does it involve lots of IO operations accessing many different resources?
All the above considerations are also relevant for Java and .Net business logic, since both have the ability to collocate the required state.
To help you make the right decision when deploying your application, below are some guidelines that correlate to the above considerations:
- The C++ business logic may access either only its collocated space or all cluster members.
- If the collocated space can store both the data required for the processing and the consumed data, there is a good chance you can use the collocated mode.
- If the business logic needs data stored within other partitions, you might use two space proxies – one that accesses only the collocated space and consumes the incoming "tasks" that need to be processed, and one that accesses all cluster members and fetches data needed for the processing using space SQL queries.
- Advanced implementations would use the Map Reduce technique (at GigaSpaces, we call this the "Service Virtualization Framework" or "Remoting"). This popular technique invokes business logic at the relevant partitions, each producing intermediate results. These results are then delivered to the client, which aggregates them and returns the final result to the original caller.
- To ensure incoming data lands at the correct partition, associated objects should have the same routing field value. A client accessing a clustered space has, by default, a proxy running a simple algorithm that calculates the target partition for each space operation. By default, the calculation uses the hash code value of a field declared as the routing field. Each POCO class should have one routing field declared; the actual field value can be assigned by combining data from possibly several other fields. Here is an example of the POCO decoration XML config:
<class name="myPOCO">
    <property name="routingField" type="int" index="true"/>
    <routing name="routingField"/>
    <property name="myData" type="string" index="true"/>
    <property name="uid" type="string"/>
    <id name="uid" auto-generate="true"/>
</class>
The routingField value hash code will be used to route write/read operations to the correct partition.
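Conceptually, the default routing calculation boils down to the following sketch (illustrative only, not the exact GigaSpaces implementation):

#include <cstdlib> // abs

// Content-based routing sketch: the routing field's hash code (for an int
// field, simply its value) selects the target partition.
// Illustrative only - not the exact GigaSpaces implementation.
int targetPartition(int routingFieldHashCode, int numberOfPartitions)
{
    // abs() guards against negative hash code values.
    return std::abs(routingFieldHashCode) % numberOfPartitions;
}

Objects sharing the same routing field value therefore always land in the same partition, which is what makes collocated processing of related data possible.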
Note: for fail-safe operation, a partition may have one or more dedicated backup spaces running in standby mode and holding identical data.
- With the SBA model, business logic state must be stored within the space; i.e. the space is a shared memory resource at the application level. To ensure data consistency and coherency, you should conduct destructive operations using a transaction – i.e. have these run as one atomic operation. Since a primary space may have in-memory backup space(s) running on different machine(s), you would never lose your required state. Once a primary fails, the existing backup spaces conduct an election process in which one of them becomes the new primary. As a result, the collocated C++ business logic associated with the space also moves into active mode. At the same time, the GigaSpaces Grid Service Manager (GSM) looks for an available grid container in which to launch the "missing" backup, obeying and maintaining the defined SLA for that service. This constructs a self-healing system, allowing your application to continue functioning as long as you have machines running the GigaSpaces-provided SLA-driven containers.
- When a processing unit hosts your C++ business logic but accesses a remote space – running as a separate process on the same machine or on a different, remote machine – there is some cost involved, which varies depending on the topology. The remote call overhead depends on the network speed, network bandwidth, data complexity (serialization involved) and data size. The larger the serialized and transported data, the longer a remote operation takes to complete. This applies to both write and read operations.
- When a processing unit hosting your C++ business logic has the space collocated as well, no remote calls are involved when the C++ business logic accesses the space. Some memory allocation is conducted – a result of the C++ runtime passing data into the space across the JNI boundary – via a very efficient protocol (I will have a separate post on this). If the time spent performing the business logic (worker calculation time) is much longer than the time it takes for the worker to retrieve the task from the space, write back the result, or read required data from the space, it might make sense to run the C++ worker as a stand-alone processing unit, separately from the space. As a rule of thumb, a good ratio for using remote stand-alone workers would be 1:10 or more – i.e. if the average time to perform the 3 basic remote space calls (take, read, write) is 1 ms and the time it takes to perform the relevant worker calculation (unrelated to the space) is 10 ms, it would be wise to run the C++ worker as a stand-alone processing unit. If the ratio is less than 1:10, go for the embedded space deployment topology.
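Expressed as a quick back-of-the-envelope check (the numbers are illustrative; measure them in your own environment):

// Back-of-the-envelope check for the 1:10 rule of thumb above.
// The numbers are illustrative; measure them in your own environment.
bool preferStandAloneWorker()
{
    double avgRemoteSpaceCallsMs = 1.0;  // avg take + read + write against a remote space
    double avgWorkerCalcMs       = 10.0; // avg pure calculation time, no space access

    // A ratio of 1:10 or more favors a stand-alone (remote) worker;
    // anything less favors the embedded (collocated) space topology.
    return (avgWorkerCalcMs / avgRemoteSpaceCallsMs) >= 10.0;
}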
The Processing Unit Declaration – Make sure you have a good XML editor!
The processing unit is declared using a simple XML file that follows the Spring Framework standard. It comprises the:
– Space identity you want to inject into the C++ worker and its scope (local or clustered)
– C++ worker(s) you wish to deploy
That's it! See the example below:
<beans>
    <os-core:space id="space" url="/./space" />
    <os-core:giga-space id="gigaSpace" space="space" />
    <bean id="cpp" class="com.gigaspaces.javacpp.openspaces.CXXBean">
        <property name="gigaSpace" ref="gigaSpace" />
        <property name="workerName" value="CppService" />
    </bean>
</beans>
The url="/./space" setting instructs the GigaSpaces runtime to start the C++ worker with a collocated space instance running within the same process. This allows the C++ worker to perform space operations in-memory without any remote calls involved. If a cluster SLA is declared, accessing other remote cluster members will involve remote calls.
Having url="jini://*/./space" means the C++ worker will access remote space(s). These spaces may span multiple machines and may have any clustered topology (replicated or partitioned).
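For example, switching the earlier declaration from an embedded space to a remote one requires changing only the url attribute:

<os-core:space id="space" url="jini://*/./space" />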
Optional settings you may include as part of the processing unit declaration:
– Space properties settings
– Transaction Manager settings
– SLA settings such as cluster topology
– Local cache/view settings
– External Data Source settings
– Security settings
– Space Filter settings
– Replication Filter settings
– Space Mode Context Loader settings
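For instance, here is a hedged example of two such optional settings, space properties and a local transaction manager (the element names follow the OpenSpaces core schema; treat the specific property key as illustrative):

<os-core:space id="space" url="/./space">
    <os-core:properties>
        <props>
            <!-- Illustrative space property; adjust to your needs -->
            <prop key="space-config.engine.memory_usage.high_watermark_percentage">90</prop>
        </props>
    </os-core:properties>
</os-core:space>
<os-core:local-tx-manager id="transactionManager" space="space"/>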
Here is an example of a processing unit with a C++ worker deployed, using a clustered SLA-driven container running in a partitioned topology with one backup:
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:os-core="http://www.openspaces.org/schema/core"
       xmlns:os-sla="http://www.openspaces.org/schema/sla"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
       http://www.springframework.org/schema/beans/spring-beans.xsd
       http://www.openspaces.org/schema/sla http://www.openspaces.org/schema/sla/openspaces-sla.xsd
       http://www.openspaces.org/schema/core http://www.openspaces.org/schema/core/openspaces-core.xsd">
    <bean id="propertiesConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>
    <os-core:space id="space" url="/./space" />
    <os-core:giga-space id="gigaSpace" space="space" />
    <bean id="cpp" class="com.gigaspaces.javacpp.openspaces.CXXBean">
        <property name="gigaSpace" ref="gigaSpace" />
        <property name="workerName" value="CppService" />
    </bean>
    <os-sla:sla cluster-schema="partitioned-sync2backup" number-of-instances="2" number-of-backups="1"
                max-instances-per-vm="1"/>
</beans>
Multiple C++ Workers within the Same Processing Unit
In some cases your business logic may involve multiple C++ workers running within the same processing unit and dealing simultaneously with different aspects of the processing. These workers may perform different parts of the processing independently or may depend on each other (workflow), exchanging data via the space.
Here is an example of a declaration for a processing unit with two C++ workers:
<bean id="cppWorker1" class="com.gigaspaces.javacpp.openspaces.CXXBean">
    <property name="gigaSpace" ref="gigaSpace" />
    <property name="workerName" value="CppService1" />
</bean>
<bean id="cppWorker2" class="com.gigaspaces.javacpp.openspaces.CXXBean">
    <property name="gigaSpace" ref="gigaSpace" />
    <property name="workerName" value="CppService2" />
</bean>
The Worker Implementation – The ICppWorker Base Class – It is about time to see some C++ code
The C++ business logic deployed into the SLA-driven container should inherit from a C++ base class called ICppWorker. The ICppWorker includes a few methods you must implement:
virtual const char* cppType() = 0;
virtual const char* className() = 0;
virtual bool Initialize(IWorkerPeer* Host) = 0;
virtual CommandObjectPtr run(CommandObjectPtr work) = 0;
virtual bool Destroy() = 0;
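Putting these together, a concrete worker skeleton might look like the following minimal sketch (everything beyond the ICppWorker contract, such as the class name and the m_host member, is a placeholder):

// Minimal worker skeleton. Everything beyond the ICppWorker contract
// (the class name, the m_host member) is a placeholder.
class CppService : public ICppWorker
{
public:
    virtual const char* cppType()   { return "CppService"; }
    virtual const char* className() { return "CppService"; }

    virtual bool Initialize(IWorkerPeer* Host)
    {
        // Called when the SLA-driven container instantiates the worker.
        // Acquire resources (proxies, caches, etc.) here.
        m_host = Host;
        return true;
    }

    virtual CommandObjectPtr run(CommandObjectPtr work)
    {
        // The continuous processing loop goes here - see the example below.
        return work;
    }

    virtual bool Destroy()
    {
        // Called when the SLA-driven container shuts down.
        // Release any resources acquired in Initialize.
        return true;
    }

private:
    IWorkerPeer* m_host;
};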
The Initialize method is called once the object is instantiated by the SLA-driven container.
The Destroy method is called once the SLA-driven container is shut down.
The most important method is the run method. This is where you place the code that will run continuously. This code would have a very simple flow:
- Getting some data from the space
- Performing some work – might involve reading additional data from the space
- Writing back into the space some resulting information
Here is what each phase should include:
- The first phase would call the take or takeMultiple operations. In some cases these would be transactional operations. FIFO take mode might be very relevant here.
- The second phase might include some calculations. Here you might call read or readMultiple (in some cases a space iterator) to feed the calculations with the required data. These can read data from the same space the task was retrieved from in phase one, or from other space(s).
- The third phase might include write or writeMultiple calls, writing calculated results back into the space. These will be collected by another worker responsible for aggregating results and delegating them to the end clients. The write or writeMultiple would use the same transaction as the first phase. This phase would commit the transaction.
Here is a simple example:
CommandObjectPtr CppService::run(CommandObjectPtr Object)
{
    // The container passes in the id of the injected space proxy
    // (embedded or remote, single or clustered).
    genericVector replyParams = Object->getParameters();
    SpaceProxyPtr proxy;
    SpaceFinder finder;
    long long proxyId = any_cast<long long>(replyParams[0]);
    proxy = finder.attach(proxyId, false, m_callback);
    Task taskTemplate;
    while (true)
    {
        // Blocking take - waits until a matching Task arrives in the space.
        Task* task = proxy->take(&taskTemplate, NULL_TX, Lease::FOREVER);
        Result result = task->execute();
        // Write the execution result back into the space.
        proxy->write(result, NULL_TX, Lease::FOREVER);
    }
}
The above code attaches to the injected space proxy (the proxy could be remote or embedded, single or clustered), performs a blocking take using the taskTemplate object, invokes the Task::execute() method, and writes the Result object of the execution back into the space. Once this cycle is completed, another one starts all over again.
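The same loop can also be batched using the takeMultiple/writeMultiple operations mentioned earlier. The sketch below reuses proxy and taskTemplate from the example above; the batch operations are part of the API discussed in this post, but the exact C++ signatures and container types shown here are assumptions:

// Batched variant of the processing loop - a sketch only. takeMultiple and
// writeMultiple are discussed above, but their exact C++ signatures and the
// std::vector-based containers used here are assumptions.
while (true)
{
    std::vector<Task*> tasks =
        proxy->takeMultiple(&taskTemplate, NULL_TX, 100 /* max batch size */);
    std::vector<Result> results;
    for (std::size_t i = 0; i < tasks.size(); ++i)
        results.push_back(tasks[i]->execute());
    proxy->writeMultiple(results, NULL_TX, Lease::FOREVER);
}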
So How Can I Deploy My C++ Processing Unit?
Deploying your C++ processing unit requires two steps:
– Placing the processing unit declaration in the deployment folder. The default deployment folder is located at <GigaSpaces Root>\deploy. You can change this using the GSM settings.
– Placing the processing unit libraries in the native libraries folder. The default location for these is <GigaSpaces Root>\lib\ServiceGrid\native. You can change this using the <GigaSpaces Root>\bin\setenv.bat/sh script.
Here is how your deployed processing unit folder should look:
<GigaSpaces Root>\deploy\cppPUexample
├───META-INF
│ └───spring <-- Here you should place your processing unit declaration - called pu.xml
└───shared-lib
Run <GigaSpaces Root>\bin\gsm and a few <GigaSpaces Root>\bin\gsc instances, then deploy your processing unit using the following command:
<GigaSpaces Root>\bin\gs pudeploy cppPUexample
That’s it!
Shay Hassidim
Deputy CTO
GigaSpaces