Contents
Clarifying the In-Memory Conundrum
Not all In-Memory Data Grids Are Created Equal
Clarifying the In-Memory Conundrum
Although in-memory computing has been commercially available since the late 1990s and is widely deployed, its terminology still confuses many people. In-memory computing holds tremendous potential for many industries, from personalized recommendations in ecommerce and real-time fraud detection to high-frequency trading. As memory capacities and technologies continue to evolve, in-memory computing will play an even more significant role in the future of data-driven applications.
In-memory computing isn’t a single technology, but rather an umbrella term encompassing various approaches. The disciplines it encapsulates include In-Memory Data Grids (IMDGs) and In-Memory Databases (IMDBs). Are they one and the same?
The answer is no. So let’s break it down.
In-Memory Database
An In-Memory Database (IMDB) is a full-featured standalone database management system that primarily relies on RAM (Random Access Memory) for computer data storage. In other words, rather than employing a disk storage mechanism, it uses RAM.
The rationale is simple: the hard disk drive (HDD), which is based on magnetic storage technology first introduced by IBM in 1956, is orders of magnitude slower than RAM. IMDBs are designed first to achieve minimal response time by eliminating the need to access the disk, and second to provide data scalability.
They are, however, limited in terms of application scalability. Some IMDBs require customers to rip and replace their existing databases. They may also be limited in the types of data models they can store. Depending on whether data is stored in a row or columnar layout, IMDBs can provide fast response times for write-intensive or read-intensive workloads, respectively.
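To make the row versus columnar distinction concrete, here is a minimal Python sketch. The records, field names, and helper functions are invented for illustration and do not reflect how any particular IMDB lays out memory. It holds the same data in both layouts and shows which operation each one favors.

```python
# Hypothetical sample data; the field names are illustrative only.
rows = [
    {"id": 1, "customer": "alice", "amount": 30.0},
    {"id": 2, "customer": "bob", "amount": 12.5},
    {"id": 3, "customer": "carol", "amount": 7.5},
]

# Columnar layout: one list per field instead of one dict per record.
columns = {
    "id": [r["id"] for r in rows],
    "customer": [r["customer"] for r in rows],
    "amount": [r["amount"] for r in rows],
}


def insert_row(row_store, record):
    """Row layout: a write appends one complete record in a single step."""
    row_store.append(record)


def insert_columnar(column_store, record):
    """Columnar layout: a write must touch every column."""
    for field, value in record.items():
        column_store[field].append(value)


def total_amount_rows(row_store):
    """Row layout: an aggregate has to visit every full record."""
    return sum(r["amount"] for r in row_store)


def total_amount_columnar(column_store):
    """Columnar layout: an aggregate scans one field and skips the rest."""
    return sum(column_store["amount"])
```

Both layouts return the same total; the difference lies in how much data each operation has to touch, which is why row layouts tend to suit write-heavy workloads and columnar layouts tend to suit analytical reads.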
In-Memory Data Grid
An In-Memory Data Grid (IMDG) is a simple-to-deploy, highly distributed, and cost-effective solution for accelerating and scaling services and applications. It is a high-throughput, low-latency data fabric that minimizes access to high-latency storage based on hard disk drives or solid-state drives.
The application and the data co-locate in the same memory space, reducing data movement over the network and providing both data and application scalability. Some in-memory data engines actually support any data model that can be directly ingested to the data grid (multi-model store) from real-time data sources or copied from an RDBMS, NoSQL, or other data storage infrastructure into RAM where processing is much faster.
Some data grids also provide a unified API to access data from external databases and data lakes, and in essence expand the data managed to petabytes, while accelerating queries and analytics.
In-Memory Computing
Gartner states that “the IMDG is a key technology enabler for in-memory computing (IMC), a computing style in which the primary data store for applications (the database of record) is in the central (or main) memory of the (distributed) computing environment running the applications”.
This design is optimized for negligible data access latency, even when large data volumes need to be queried for real-time analytics.
So What’s the Difference?
While In-Memory Data Grids share many of the features of In-Memory Databases, there are important differences.
One significant difference is that with an In-Memory Data Grid you can co-locate the business logic (application) with the data. With an In-Memory Database, the engine running the business logic or models resides in the application (or client) while the data resides on the server side. This is not merely a semantic distinction. In the latter case, the data must travel over the network, which adds overhead and is significantly slower than running in the same memory space.
This also affects scalability. While IMDBs can handle data scalability, an In-Memory Data Grid’s distributed design allows both data and application load to scale simply by adding a new node to the cluster.
Other differences have to do with the data type. While IMDBs usually handle structured data, some IMDGs also support semi-structured and unstructured data. And finally, some IMDGs seamlessly integrate with machine and deep learning frameworks.
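As a rough illustration of how a grid can spread data across nodes, the following Python sketch assigns keys to nodes with a stable hash. The function names and the `order-N` keys are assumptions made for this example. Production grids typically use a fixed set of logical partitions or consistent hashing to limit how much data moves when a node joins, which this sketch does not attempt.

```python
import hashlib


def partition_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(keys, node_count):
    """Return one list of keys per node."""
    nodes = [[] for _ in range(node_count)]
    for key in keys:
        nodes[partition_for(key, node_count)].append(key)
    return nodes


# Hypothetical keys, distributed first over three nodes and then over four.
keys = [f"order-{i}" for i in range(1000)]
three_nodes = distribute(keys, 3)
four_nodes = distribute(keys, 4)
```

With four nodes, the same keys are spread across one more list, so each node holds a smaller share of the data and serves a smaller share of the requests.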
Comparison of In-Memory Database vs. In-Memory Data Grid vs. GigaSpaces In-Memory Computing Platform
Not all In-Memory Data Grids Are Created Equal
When considering an in-memory data engine, it’s imperative to pay attention to several important features that may differ from one in-memory data grid to another.
Consistency
Some in-memory data grids offer strong consistency, which means that a read always returns the most recent data. Others offer only eventual consistency, meaning that a read shortly after an update might return the value from before the update. In some eventually consistent systems, even a write that was reported as successful may be lost.
Eventual consistency is adequate for non-critical use cases. However, critical applications such as booking, ordering, billing, or money transfers require strong consistency.
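The difference can be sketched with a toy primary and replica pair in Python. The class, its method names, and the seat-booking data are hypothetical and do not model any specific product; the point is only to show when a stale read can occur.

```python
class ReplicatedStore:
    """Toy primary/replica pair used to contrast the two consistency modes."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # updates not yet applied to the replica

    def write(self, key, value, synchronous):
        self.primary[key] = value
        if synchronous:
            # Strong consistency: the replica is updated before write returns.
            self.replica[key] = value
        else:
            # Eventual consistency: the replica catches up later.
            self.pending.append((key, value))

    def read_from_replica(self, key):
        return self.replica.get(key)

    def replicate(self):
        """Apply queued updates, as a background process would."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write("seat-12A", "free", synchronous=True)
store.write("seat-12A", "booked", synchronous=False)
stale = store.read_from_replica("seat-12A")  # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("seat-12A")  # replica now matches the primary
```

In a booking system, the stale read is exactly the case that could lead to the same seat being sold twice, which is why such applications call for strong consistency.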
ACID
ACID (atomicity, consistency, isolation, durability) is a set of database transaction properties intended to guarantee validity even in the event of errors or power failures.
A sequence of data operations that satisfies the ACID properties is called a transaction.
For example, a transfer of funds from one bank account to another is a single transaction, even though it involves multiple changes such as debiting one account and crediting another. Some IMDGs support ACID transactions; some don’t.
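The funds-transfer example can be sketched with Python's built-in `sqlite3` module and an in-memory SQLite database. SQLite is used here only as a convenient stand-in to show atomicity; it is not an IMDG, and the table, account names, and `transfer` helper are assumptions made for this example.

```python
import sqlite3


def transfer(conn, source, target, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone.
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 20)]
)
conn.commit()

first = transfer(conn, "alice", "bob", 70)   # alice has enough funds
second = transfer(conn, "alice", "bob", 70)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the second call the credit to the target account succeeds before the debit fails, and the rollback undoes that credit, so the balances are left exactly as they were after the first transfer.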
Balancing cost and performance
Some IMDGs are limited to RAM only; in other words, they support hot storage but not warm storage such as SSD (Solid State Drive) or cold storage such as cloud object storage. As data volumes grow, intelligent data life-cycle management that moves data between tiers according to customized business logic can accelerate access to data held on SSD and in external databases, data lakes, and warehouses, balancing cost and performance.
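A minimal sketch of tiering, in Python, might look like the following. The `TieredStore` class is hypothetical, and it uses a simple least-recently-used rule as a stand-in for the customized business logic a real platform would apply. A plain dictionary plays the role of the warm tier.

```python
from collections import OrderedDict


class TieredStore:
    """Keeps recently used entries in a small hot tier; demotes the rest."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # stands in for RAM
        self.warm = {}            # stands in for SSD or an external store

    def put(self, key, value):
        self.warm.pop(key, None)  # avoid keeping a stale copy in the warm tier
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._demote_if_needed()

    def get(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        value = self.warm.pop(key)     # raises KeyError for unknown keys
        self.put(key, value)           # promote on access
        return value

    def _demote_if_needed(self):
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.warm[old_key] = old_value
```

The hot tier stays at a fixed size, so RAM cost is bounded, while data that has not been touched recently remains reachable at the higher latency of the warm tier.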
Auto-scaling provides another way to balance cost and performance. With this approach, a cluster can automatically scale up and down based on the workload, so that payment is only made for the resources that are actually needed, with no need for overprovisioning.
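One simple way to express such a scaling rule is a proportional calculation, sketched below in Python. The function name, the 60 percent target, and the node limits are assumptions chosen for illustration, not settings taken from any product.

```python
def desired_nodes(current_nodes, utilization_pct, target_pct=60,
                  min_nodes=1, max_nodes=10):
    """Return the node count that brings average utilization near the target.

    Utilization values are whole percentages, which keeps the arithmetic
    in integers and avoids floating-point rounding surprises.
    """
    total_load = current_nodes * utilization_pct
    needed = (total_load + target_pct - 1) // target_pct  # ceiling division
    return max(min_nodes, min(max_nodes, needed))


scale_up = desired_nodes(4, 90)    # cluster is running hot
scale_down = desired_nodes(4, 15)  # cluster is mostly idle
capped = desired_nodes(8, 95)      # demand exceeds the configured maximum
```

The minimum and maximum bounds keep the cluster from shrinking to nothing during a quiet period or growing without limit during a spike.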
Indexing is another major factor affecting performance. Some IMDGs offer only primary key access. This makes them unsuitable for users who need to perform complex queries on the data.
SQL
SQL is the de facto query language of business analysts. If an IMDG uses a proprietary query language, the organization’s analysts will need to learn a new language.
But the implications of not using SQL go beyond just the learning curve. For example, some proprietary query languages don’t support distributed JOIN clauses. This limits the application’s ability to run complex queries and calculations.
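The following Python sketch shows why primary-key-only access is limiting. The `IndexedStore` class and its `city` field are hypothetical; the sketch compares a lookup through a secondary index with the full scan that is the only option when no such index exists.

```python
from collections import defaultdict


class IndexedStore:
    """Key-value store with one secondary index on a chosen field."""

    def __init__(self, indexed_field):
        self.by_key = {}                  # primary-key access
        self.indexed_field = indexed_field
        self.by_field = defaultdict(set)  # secondary index: value -> keys

    def put(self, key, record):
        old = self.by_key.get(key)
        if old is not None:
            # Remove the stale index entry before re-indexing the record.
            self.by_field[old[self.indexed_field]].discard(key)
        self.by_key[key] = record
        self.by_field[record[self.indexed_field]].add(key)

    def find_by_field(self, value):
        """Indexed lookup: touches only the matching records."""
        keys = sorted(self.by_field.get(value, ()))
        return [self.by_key[k] for k in keys]

    def scan_by_field(self, value):
        """Without an index, every record has to be checked."""
        return [
            record
            for _, record in sorted(self.by_key.items())
            if record[self.indexed_field] == value
        ]
```

Both methods return the same records, but the scan grows with the total size of the store while the indexed lookup grows only with the number of matches.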
Multi-Model Store
Many data grids are limited to one type of data store: key-value, object, or document. This means that when the need arises to handle other types of data, such as semi-structured data or unstructured data like text or images, you will need to deploy and maintain additional platforms that can handle them.
To future-proof the technology stack, ensure that the data grid solution supports multiple types of data: structured, semi-structured, and unstructured. This enables faster and smarter insights by accelerating applications in which real-time analytics, machine learning, and deep learning are used for real-time insights such as predictive maintenance, live risk analysis, fraud detection, location-based advertising, dynamic pricing, and more.
Event-Driven Analytics
Event-Driven Analytics allows a method to be triggered when an event takes place. For example, with an online processing service, a canceled payment is an event that can trigger a notification.
Additionally, the ability to contextualize streaming and transactional data with historical data, at speed and at scale, feeds machine learning feature vectors and allows continuous retraining of models to maintain the required accuracy.
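The canceled-payment example can be sketched as a small publish and subscribe mechanism in Python. The `EventBus` class, the event names, and the payment identifiers are all hypothetical; a real data grid would expose its own notification API.

```python
from collections import defaultdict


class EventBus:
    """Calls every handler registered for an event type when it is published."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._handlers.get(event_type, []):
            handler(payload)


notifications = []


def notify_customer(payment):
    """Handler triggered by the event; here it only records a message."""
    notifications.append(f"Payment {payment['id']} was canceled")


bus = EventBus()
bus.subscribe("payment_canceled", notify_customer)
bus.publish("payment_canceled", {"id": "p-1001"})
bus.publish("payment_completed", {"id": "p-1002"})  # no handler registered
```

Only the cancellation triggers the handler; the completed payment is published but ignored because nothing subscribed to that event type.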
Multi-Region Replication
Multi-Region Replication describes the ability to replicate data between different regions or clouds, where each space can be physically located in a different geographical location. It is commonly used for disaster recovery planning and failover, and for adding or removing remote sites without system shutdown. In addition, it enables the maintenance of data locality per site, which enhances performance and lowers latency.
Multi-Region Replication can also support hybrid and multi-cloud deployments in an optimal manner. GigaSpaces’ WAN Gateway offers total data consistency and conflict resolution support among other advanced features.
Last words
In-Memory Computing encompasses various flavors, including In-Memory Data Grids and In-Memory Databases. While IMDBs rely on RAM for data storage, IMDGs are distributed, cost-effective solutions that co-locate application and data in memory, minimizing latency and enhancing scalability. IMDBs focus on standalone databases with RAM-based storage, while IMDGs such as GigaSpaces XAP Skyline provide distributed, memory-centric solutions for scalability and extreme performance.