Over the past few months, we have seen XAP implementations running millions of transactions every second across very large, multi-terabyte data grid clusters holding billions of items. XAP’s ability to collocate incoming data with its processing logic and the required reference data allows it to execute a vast number of transactions every second.
However, many companies still face the following challenges:
- Reducing the overall system cost: This means reducing the hardware footprint and leveraging SSD/HDD instead of expensive RAM when managing large amounts of data.
- Continuously writing the outcome of in-memory transactions to external systems: For example, performing write-behind to the enterprise database without creating a major lag between XAP and the backend database.
- Sharing the state of the IMDG with remote data centers running as disaster recovery sites.
GigaSpaces XAP offers three major solutions:
1. XAP Native Persistence: Allows for elastic, parallel, fast persistence and fast data reload, using a hybrid storage approach that relies mainly on SSD, NVMe, or network storage devices to store IMDG data.
2. Distributed Write-Behind: Allows for fast, parallel delegation of IMDG transactions to external enterprise backend database systems.
3. Distributed Gateway: Allows for fast, parallel replication of IMDG transaction state to remote data centers.
Let’s dive a little deeper to understand each solution.
XAP Native Persistence
This functionality allows you to maintain a large amount of data within the XAP data grid, using a fast storage device (local or shared) as the main storage fabric instead of RAM (heap). RAM is used as an L1 cache and to store the indexes. With this approach, data access is still fast because queries are evaluated against indexes that reside in memory, while the data on file is highly compressed.
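To make the idea concrete, here is a minimal sketch of the hybrid layout described above: an index and a small cache kept in memory, with compressed values on file. It is a toy illustration only, not XAP code. The `HybridStore` class, its method names, and the use of `zlib` and an LRU cache are all assumptions we introduce for the example.

```python
import os
import tempfile
import zlib
from collections import OrderedDict


class HybridStore:
    """Toy store (hypothetical): index and small LRU cache in RAM, compressed values on file."""

    def __init__(self, path, cache_size=2):
        self._file = open(path, "w+b")
        self._index = {}             # key -> (offset, length); always held in memory
        self._cache = OrderedDict()  # small LRU cache of hot values, the "L1" layer
        self._cache_size = cache_size
        self.disk_reads = 0          # counts reads that had to go to the file

    def write(self, key, value: bytes):
        blob = zlib.compress(value)  # data on file is stored compressed
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(blob)
        self._index[key] = (offset, len(blob))
        self._remember(key, value)

    def read(self, key):
        if key in self._cache:       # cache hit: no file access needed
            self._cache.move_to_end(key)
            return self._cache[key]
        if key not in self._index:   # the in-memory index answers "not found" without I/O
            return None
        offset, length = self._index[key]
        self._file.seek(offset)
        value = zlib.decompress(self._file.read(length))
        self.disk_reads += 1
        self._remember(key, value)
        return value

    def _remember(self, key, value):
        self._cache[key] = value
        self._cache.move_to_end(key)
        while len(self._cache) > self._cache_size:
            self._cache.popitem(last=False)  # evict the least recently used entry

    def close(self):
        self._file.close()


# Usage: five writes with a two-entry cache, then two reads of an evicted key.
store = HybridStore(os.path.join(tempfile.mkdtemp(), "segment.dat"), cache_size=2)
for i in range(5):
    store.write(f"order-{i}", (f"payload-{i}" * 100).encode())
store.read("order-0")   # evicted earlier, so this read is served from the file
store.read("order-0")   # now cached, so this read is served from memory
print("reads served from file:", store.disk_reads)
store.close()
```

The point of the sketch is the split of responsibilities: lookups and misses are resolved in memory, and only the payload of a cold entry costs a storage read.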
Looking at the benchmark below, you can see how XAP’s Native Persistence reduces the overall footprint by 75% compared with a regular “All on Heap” setup. The more data you store within XAP, the bigger the ROI, because you avoid expensive hardware. This allows XAP to act as the system of record for multi-terabyte data sets with billions of items.
In terms of performance, the following graph illustrates XAP’s Native Persistence capacity when executing write and read transactions with collocated business logic. The numbers represent a single data grid node, so capacity grows linearly as you add nodes.
These numbers are based on a benchmark we conducted using a Samsung PM1725 NVMe PCIe SSD on a Dell 920 machine (Xeon E7-4860 v2 @ 2.60GHz, 12 cores, 24 threads), with a 1K payload and 24 concurrent client threads.
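Because capacity scales linearly with the number of nodes, a first-pass sizing estimate is simple arithmetic. The sketch below assumes perfectly linear scaling. The function names are ours, and the per-node figure in the usage lines is a placeholder, not a result from the benchmark above.

```python
def cluster_capacity(nodes, per_node_tps):
    """Aggregate throughput under the linear-scaling assumption."""
    return nodes * per_node_tps


def nodes_needed(target_tps, per_node_tps):
    """Smallest node count whose aggregate capacity meets the target."""
    if per_node_tps <= 0:
        raise ValueError("per_node_tps must be positive")
    return -(-target_tps // per_node_tps)  # integer ceiling division


# Placeholder inputs: substitute the per-node number you measure on your own hardware.
per_node = 100_000   # hypothetical transactions per second for one node
target = 1_050_000   # hypothetical cluster-wide target
n = nodes_needed(target, per_node)
print(n, "nodes give", cluster_capacity(n, per_node), "tps")
```

Treat the result as a lower bound: it leaves no headroom for backups, failover, or uneven key distribution.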
Distributed Write-Behind
A large-scale IMDG may produce a volume of activity that requires a distributed write-behind setup instead of the default single write-behind agent (also known as the Mirror service). This architecture allows each IMDG partition to use a separate mirror instance. Each mirror instance may run on a different machine, utilizing that machine's full CPU and network bandwidth while pushing updates to the enterprise database at maximum speed.
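The per-partition mirror idea can be sketched in a few lines. This is a conceptual model only and does not use the XAP Mirror API. `PartitionMirror`, `write_behind`, the CRC-based routing, and the in-memory list standing in for the backend database are all assumptions made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def partition_of(key, partitions):
    """Stable routing of a key to a partition (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode()) % partitions


class PartitionMirror:
    """Hypothetical write-behind agent dedicated to a single partition."""

    def __init__(self, partition_id, batch_size=100):
        self.partition_id = partition_id
        self.batch_size = batch_size
        self.pending = []    # operations not yet persisted
        self.persisted = []  # stands in for rows written to the backend database

    def enqueue(self, op):
        self.pending.append(op)

    def flush(self):
        """Drain pending operations in batches; returns the number of batches written."""
        batches = 0
        while self.pending:
            batch = self.pending[:self.batch_size]
            self.pending = self.pending[self.batch_size:]
            self.persisted.extend(batch)  # a real mirror would issue a bulk database write here
            batches += 1
        return batches


def write_behind(ops, partitions, batch_size=100):
    """Route each operation to its partition's mirror, then flush all mirrors in parallel."""
    mirrors = [PartitionMirror(p, batch_size) for p in range(partitions)]
    for key, value in ops:
        mirrors[partition_of(key, partitions)].enqueue((key, value))
    # Each mirror touches only its own state, so the flushes can run concurrently.
    with ThreadPoolExecutor(max_workers=partitions) as pool:
        list(pool.map(PartitionMirror.flush, mirrors))
    return mirrors


mirrors = write_behind([(i, i * 2) for i in range(1000)], partitions=4, batch_size=50)
for m in mirrors:
    print("partition", m.partition_id, "persisted", len(m.persisted), "operations")
```

Because no mirror shares a queue with another, there is no single agent through which every update must pass, and ordering is still preserved within each partition.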
Distributed Gateway
By default, a single gateway instance performs replication from a local data grid cluster to a remote space cluster. For most systems, this provides sufficient throughput to handle the activity generated by a single clustered data grid.
In some cases, we may be dealing with large space clusters or with systems producing a large volume of activity, both of which require a distributed (multi-instance) gateway setup. The distributed Gateway architecture allows each partition (primary and backup instances) to replicate its activity via a dedicated gateway to a remote data grid cluster. Each gateway instance can run on a separate machine, utilizing its full CPU and network bandwidth. This setup also allows the source and target data grids to run different numbers of partitions.
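The following sketch models one dedicated gateway per source partition replicating into a target cluster with a different partition count. It is not the XAP Gateway API. The `Cluster` and `Gateway` classes and the CRC-based routing are hypothetical stand-ins that show why the two sides do not need matching topologies.

```python
import zlib


def partition_of(key, partitions):
    """Stable routing of a key to a partition (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode()) % partitions


class Cluster:
    """Toy partitioned data grid: one dict per partition."""

    def __init__(self, partitions):
        self.partitions = [dict() for _ in range(partitions)]

    def apply(self, key, value):
        # Routing uses this cluster's own partition count.
        self.partitions[partition_of(key, len(self.partitions))][key] = value

    def get(self, key):
        return self.partitions[partition_of(key, len(self.partitions))].get(key)


class Gateway:
    """Hypothetical gateway dedicated to one source partition."""

    def __init__(self, source_partition, target):
        self.source_partition = source_partition
        self.target = target
        self.replicated = 0

    def replicate(self, ops):
        for key, value in ops:
            # The target re-routes each entry, so partition counts may differ.
            self.target.apply(key, value)
            self.replicated += 1


source = Cluster(partitions=4)   # local data grid
target = Cluster(partitions=6)   # remote data grid with a different topology
for i in range(100):
    source.apply(f"trade-{i}", i)

gateways = [Gateway(p, target) for p in range(len(source.partitions))]
for gw in gateways:
    gw.replicate(source.partitions[gw.source_partition].items())

print("entries replicated:", sum(gw.replicated for gw in gateways))
```

Each gateway only carries the traffic of its own partition, which is what lets the replication load spread across machines.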
Implementing a production-grade, large-scale distributed system requires special attention to system sizing, the overall hardware footprint, core data processing latency, downstream data flow to external systems, and disaster recovery architecture. Your system will only be as strong as its weakest link, so each of these pieces must support parallel activity, durability, and scalability while avoiding a single point of contention anywhere within the architecture.