NoSQL databases provide vast storage and high availability, but at the cost of losing transactions, relational integrity, consistency, and read performance. This post presents an architecture that combines and in-memory datagrid as a high performance transactional layer to a NoSQL database, providing a complete application platform with the high scalability of NoSQL, with the transactionality, real time event processing, distributed execution, and ultra-low latencies of the XAP data grid.
Business have been driven by to NoSQL data stores because of several virtues, including economics, very large storage capacity, elasticity, high availability, reliability, write scaling, and a flexible data model. All of these virtues are purchased at a cost, which to a large degree can be tied to moving to a distributed architecture.
In order to scale horizontally on cheap hardware (a key NoSQL attribute), several compromises had to be made, including abandoning some highly valuable characteristics of relational databases. Among them, read consistency, transactions (and transactional isolation), stored procedure, triggers, and robust security. Since data it typically replicated across multiple cluster nodes, write operations cannot be accommodated across all replicas with acceptable latency. Comparable latencies in XAP are minimal because synchronous writes are to RAM only. This means that “eventual consistency” approaches are employed to synchronize data nodes after a write has completed. In practical terms it means that a reader is not guaranteed to get the most recent version of data. The lack of transactions means that multiple arbitrary data elements can not be modified or created atomically. This opens the door to readers seeing partial writes. The lack of stored procedures makes anything but the most simplistic data manipulation inefficient, and the lack of triggers make even simplistic event processing impossible.
The main consequence of the “compromises” is that use cases are limited. Highly concurrent transactional applications need not apply, whereas the class of massive, unstructured “information gathering” applications is well served. A typical approach to these compromises is to simply partition the system architecture into transactional and NoSQL divisions, with perhaps a data scrubbing/enrichment connection between them. For example, raw data from user web interactions is written to NoSQL, and a batch process performs ETL on it to make it suitable for relational (and/or OLAP) clients. This can manifest itself as two completely separate stacks, or a hybrid.
The problem with these solutions is they are complex, typically multi-vendor, difficult to scale, not elastic, and inefficient. Complex because of many products, multiple interfaces to manage, multiple domains of expertise, multiple upgrade cycles etc.. It is difficult to scale the entire stack because each tier scales independently in different ways. Since each part of the architecture is not elastic in it’s own right, the whole is also inelastic. Inefficiency in processing is caused by multiple physical tiers and boundaries that messages must be marshalled and transmitted across.
But what if we could access NoSQL data transactionally, retaining its scalable, elastic and highly available nature, while providing stored procedures and triggers? Delivering all of this, along with in-memory speed, is what can be achieved by integrating the Gigaspaces XAP in-memory application platform with a NoSQL database.
What Does No Compromises NoSQL Look Like?
By combining the virtues of the XAP in-memory platform with NoSQL storage, a system (not just a data store) of high scalability and low latency emerges. The attributes of the “no compromises” architecture:
Fully consistent reads
Collocated, native language business logic (procs)
Real time eventing (“triggers”/”continuous query”)
Complex event processing
Elastic & self healing
Reads in the low 10s microsecond range
Redundant writes in the low 100s microsecond range
Role based security for both data and management
The idea is simple. A two product layered architecture with data flowing through the in-memory cluster. As data flows through the in-memory subsystem, a portion is retained (typically in LRU fashion, but not necessarily) in memory. This retained data serves as a very low latency distributed cache for reads. Writers will experience data grid latencies across the distributed in memory data set, as writes forwarded to the NoSQL store are asynchronous. Clients accessing the grid for transactional processing will experience full ACID support. Data flowing through the in-memory subsystem can trigger internal distributed or external event handlers for complex event processing/enrichment, and distributed RPCs on the in-memory cluster can provide efficient business process level services (equivalent to distributed stored procedures) to clients.
The two technologies are a natural complement to each other, with XAP providing in-memory processing, transactions, elasticity, and high availability, and NoSQL providing the economical storage of vast amounts (100s of terabytes) of data. The architecture is greatly simplified and capable compared to the partitioned architecture “workarounds” mentioned earlier, with only two products necessary (as XAP even provides a JEE web container).
How XAP Makes It Possible
XAP is a distributed in-memory storage and processing platform. XAP partitions data and processing across multiple physical or virtual nodes to provide the highest operational performance possible. Data is stored redundantly in memory for safety and availability, and is fully transactional (including XA support). From the perspective of a system client, the memory across all machines appears as a continuous large memory space, and supports distributed RPC, distributed task execution, messaging, and event driven programming paradigms. These combined features lowers system complexity be reducing the number of physical tier and system interfaces. XAP supports runtime resizing of the cluster (elasticity), strongly typed data (objects), and weakly typed data (documents).
XAP also supports persistence to external data stores, and this includes conventional relational databases as well as NoSQL stores. Modifications to in-memory storage are queued for asynchronous write-behind behavior, and cache misses are forwarded to the underlying store. XAP also integrates dynamic type introduction and index creation with underlying stores for a seamless integration.
In addition to NoSQL persistence, XAP can perform off-heap persistence to flash memory. This allows XAP to present a potentially vast (100s TB) flash backed store, in combination with conventional RAM and NoSQL. This capability enables a 3 tier memory architecture that can be tuned to application memory cost vs capacity/latency requirements. The flash integration will be available in the upcoming XAP 10 release.
The combination of XAP and NoSQL is powerful because of their similarities and their differences. From the availability, scalability, and elasticity perspective, the technologies have similar virtues, with NoSQL providing the long term and very large scale persistence. From the user use case perspective, XAP fills a large gap by providing conventional support for transactions, effectively creating a bridge between transactional clients and NoSQL data, and in many cases eliminating the need for a relational database at all. From the performance perspective, XAP provides in-memory performance for both reads and writes, as well as real time event processing and distributed colocated logic. From the system management and complexity perspective, only two products are required to provide a complete application stack.
The value of a two tier, ultra-high performance, highly available, self healing, and horizontally scalable system is enormous, not only on the technical level, but also on the business TCO level, especially when large relational databases and related software can be reduced or eliminated. This capability is available and in production today, with current official support for Cassandra and MongoDB, and more on the way.