GigaSpaces Tera-Scale Computing over Cisco UCS
General Architecture Overview:
GigaSpaces’ Tera-Scale Computing architecture follows the exact principles that I outlined in Part I of this post.
Unlike alternative approaches, which are built by fusing different products that are pre-packaged and branded together, GigaSpaces was designed holistically, with a single, consistent clustering architecture across the entire stack.
In addition, the entire stack was designed to run completely in-memory and support highly concurrent workloads.
Another interesting benefit of this approach, compared with the alternatives, is that it relies on no expensive hardware (InfiniBand, high-end storage…) beyond what comes out of the box with the UCS machine. The single clustering mechanism eliminates a great deal of redundant synchronization overhead, making the entire platform more efficient and reliable, and the completely in-memory stack yields extreme utilization and low latency.
GigaSpaces-UCS Integration:
The actual integration work with Cisco UCS was based on two main parts:
1. Built-in automation and provisioning to enable zero configuration
One of the unique characteristics of the Cisco UCS machine is that it exposes an API backed by the low-level bare-metal components. This makes it possible to interact with the lower layers of the hardware and to set up a complete network of blades programmatically, without relying on a hypervisor or an operating system as a middleman.
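To make this concrete, here is a minimal sketch (in plain Java, using only the standard library) of what talking to the UCS Manager XML API can look like: a login call that returns a session cookie, followed by a query for the blades in the chassis. The host, credentials, and naive parsing here are assumptions for illustration based on Cisco's publicly documented XML API, not our actual integration code.

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

// Minimal sketch of a UCS Manager XML API client (hypothetical host and credentials).
// aaaLogin returns a session cookie that authenticates subsequent queries
// such as configResolveClass.
public class UcsApiSketch {

    private static final String UCSM_URL = "https://ucsm.example.com/nuova"; // assumed host

    public static void main(String[] args) throws IOException {
        // 1. Log in and obtain a session cookie.
        String loginResponse = post(
            "<aaaLogin inName=\"admin\" inPassword=\"secret\"/>");
        String cookie = extractAttribute(loginResponse, "outCookie");

        // 2. Ask UCS Manager for every blade it knows about.
        String blades = post(
            "<configResolveClass cookie=\"" + cookie + "\" classId=\"computeBlade\"/>");
        System.out.println(blades);
    }

    private static String post(String xml) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(UCSM_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml.getBytes("UTF-8"));
        }
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) sb.append(line);
            return sb.toString();
        }
    }

    // Naive attribute extraction; good enough for a sketch.
    private static String extractAttribute(String xml, String name) {
        int i = xml.indexOf(name + "=\"") + name.length() + 2;
        return xml.substring(i, xml.indexOf('"', i));
    }
}

Note how the blade inventory comes straight from UCS Manager: no agent, hypervisor, or operating system on the blades themselves is involved in answering the query.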
In addition, the integration with the UCS Manager makes it possible to manage and allocate a pool of blades dynamically. We can turn machines on and off on demand, discover new machines as they are plugged into the rack, and use their capacity immediately, without any human intervention.
A blade doesn't need GigaSpaces or even Java installed to join the pool. A provisioning agent takes care of installing the JVM and GigaSpaces on the machine remotely as soon as it is discovered and allocated.
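As a rough illustration of that flow, the sketch below shows what the discovery-then-provision loop could look like. Every name in it (BladePool, RemoteInstaller, and so on) is hypothetical and invented for this post; the actual agent is part of the downloadable integration mentioned below.

// Hypothetical sketch of the discovery/provisioning loop; all names are invented.
public class ProvisioningAgentSketch {

    interface BladePool {                            // wraps the UCS Manager API
        java.util.List<String> discoverNewBlades();  // blades just plugged into the rack
        void powerOn(String bladeId);
    }

    interface RemoteInstaller {                      // pushes bits onto a bare blade
        void install(String bladeId, String packageName);
    }

    private final BladePool pool;
    private final RemoteInstaller installer;

    ProvisioningAgentSketch(BladePool pool, RemoteInstaller installer) {
        this.pool = pool;
        this.installer = installer;
    }

    // Bare blades join the pool with nothing pre-installed; the agent
    // provisions a JVM and GigaSpaces the moment a blade is discovered.
    void provisionLoop() throws InterruptedException {
        while (true) {
            for (String blade : pool.discoverNewBlades()) {
                pool.powerOn(blade);
                installer.install(blade, "jvm");
                installer.install(blade, "gigaspaces-xap");
                // From here on, the blade's capacity is usable immediately,
                // with no human intervention.
            }
            Thread.sleep(5000);   // poll interval; arbitrary for the sketch
        }
    }
}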
The diagram below shows the various components that comprise this Tera-Scale Computing system.
The SLA-Driven Container and the Data & Messaging services are the services that come with the standard GigaSpaces XAP installation. The GigaSpaces UCS Scaling Adaptor is the service responsible for all communication with the physical UCS pool and for turning it into a GigaSpaces pool.
You can find more details on how this integration works, including a downloadable version of the scaling and provisioning agent, here.
2. Performance tuning and optimization
The second part of the work is around performance optimization and tuning of GigaSpaces on the UCS platform.
Below are some of the first benchmark reports we published. The benchmark was geared specifically toward real-time analytics applications. In it, we were able to tune the system to reach roughly 300 GB on a single node, with a record throughput of 7.5M reads/sec in an embedded (scale-up) scenario and 300K ops/sec in a remote (scale-out) scenario.
The table below shows the history of the performance optimization process we carried out through the various stages of tuning. It also includes comparisons with similar benchmarks that were run on older versions of UCS and on other platforms.
What we saw is that we were able to improve performance continuously, through optimizations both to the GigaSpaces platform and to the hardware itself.
These days we are working on another set of optimizations, targeted at latency-sensitive applications.
With this effort we can save the majority of the performance-tuning cycles and come up with pre-defined configuration profiles that are baked into our platform and the hardware, getting us closer to our zero-configuration goal.
How does GigaSpaces Tera-Scale Plug into Existing Applications?
There are basically two modes of operation in which existing applications can plug into this solution.
1. Virtual Appliance mode: In appliance mode, GigaSpaces and the UCS machine are used as a service that exposes any of the APIs that are currently supported (SQL, Memcached, JMS, MapReduce…). The application remains unchanged and sees the joint solution as a better implementation of any of those services.
The interesting aspect of this mode is that even though the platform is consumed as a service, a user can still use our executor API to offload pieces of the application code into the appliance and run them within the box (see the first sketch after this list).
This is particularly useful in cases where the application wants to leverage the processing power of the UCS machine without necessarily moving the entire application into the platform.
2. Complete pre-engineered platform: In this mode, both GigaSpaces and the application run on the UCS platform. GigaSpaces acts as the container for the application services, and the application can exploit the full capacity of the UCS machine through the GigaSpaces in-memory middleware stack. For Java-based applications, GigaSpaces supports a standard deployment model that enables seamless deployment onto the platform (see the second sketch below).
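To illustrate the offloading described in the first mode, here is a minimal sketch based on the GigaSpaces (OpenSpaces) Task executor API. The task class and the computation it performs are made up for illustration; check the XAP documentation for the exact API of your version.

import org.openspaces.core.executor.Task;

// Hypothetical task that we offload into the appliance; the computation
// itself (summing some numbers) is a stand-in for real application logic.
public class SumTask implements Task<Integer> {

    private final int[] values;

    public SumTask(int[] values) {
        this.values = values;
    }

    @Override
    public Integer execute() throws Exception {
        // Runs inside the box, next to the data, on the UCS blades.
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }
}

Submitting the task from the client side is then a one-liner along the lines of gigaSpace.execute(new SumTask(values)), which returns an AsyncFuture whose get() call blocks until the result comes back from the appliance.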
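And to illustrate the deployment model of the second mode, here is a hedged sketch of deploying a Java application, packaged as a GigaSpaces processing unit, through the Admin API. The "myapp.jar" file stands in for your own processing unit archive, and the exact API may vary by version.

import java.io.File;
import org.openspaces.admin.Admin;
import org.openspaces.admin.AdminFactory;
import org.openspaces.admin.gsm.GridServiceManager;
import org.openspaces.admin.pu.ProcessingUnitDeployment;

// Sketch: deploy a processing unit onto the grid; the platform decides
// which blades host it and maintains its SLA from there on.
public class DeploySketch {
    public static void main(String[] args) {
        Admin admin = new AdminFactory().createAdmin();
        GridServiceManager gsm = admin.getGridServiceManagers().waitForAtLeastOne();
        gsm.deploy(new ProcessingUnitDeployment(new File("myapp.jar"))); // placeholder archive
        admin.close();
    }
}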
Final words
These are exciting days in the industry, with lots of new breakthrough technologies being introduced on a daily basis. Having been at the cutting edge of that curve for many years, I feel really excited about the work with Cisco UCS, not just because of the elegant design of its hardware but also because the Cisco team that worked with us throughout this process has been a great partner.
Going forward, I see great potential here, as we can finally break the wall between hardware and software. On one hand, the fact that the hardware can finally interact with the software that runs on top of it, with no middleman, puts a lot of power in our hands. On the other hand, we see a shift where application developers can focus on the business while the platform distributes work and maintains the consistency, availability, performance, and security of the underlying resources, including CPU, memory, disk, and network. In doing so, developers become completely abstracted from the underlying operating system and hardware.
With that in mind, we are no longer limited to the services that are provided by the operating system or even by the JVM.
As platform providers, we can call directly into the network interface, and we can use deep inspection of resource capacity and the traffic matrix to respond proactively and effectively to demand surges, benefiting from them immediately while keeping the application unchanged. We can also create much more robust platforms, since we can get immediate alerts directly from the device when something goes wrong. I can only imagine how robust our platform could become just by getting rid of all the loops we need to maintain today, throughout the entire software stack, to detect failure.
We are only scratching the surface with this offering, and there are probably more areas for innovation that we have not yet explored. This could be a great opportunity to send us your wish list and be part of this project.