Possible Impossibility – The Race to Zero Latency


The term “zero latency” is commonly used when discussing the overhead of the infrastructure used to run enterprise mission-critical systems.

Looking at today’s software-based systems running on commodity hardware, we can see that systems can achieve practically “zero latency” overhead when running with a shared-memory-based data store. In this post I will describe why and how we can do that today, and present results from recent benchmarks.


Results

Here are a few quick numbers for anyone looking for a ballpark figure without diving into the details presented below:

  • Read/write operations (embedded space): 99% of reads completed below 18 microseconds and 99% of writes below 22 microseconds.

  • Remote task execution: 99% below 600 microseconds.

  • Notifications: 99% below 1320 microseconds (remote) and below 340 microseconds (embedded).

  • Synchronous replication: 99% below 480 microseconds.

The Application Building Blocks

Every system, whether running in a distributed environment or in a standalone setup, performs the following basic activities:

  • read or write data

  • calculate/process data

  • react to operations triggered by some other entity within the system

  • preserve state – to recover from abnormal failures

All the above activities can be implemented with disk-based media (a database) or with a memory-based data store (an IMDG).

How would we implement this with a database?

With a database you will be using a JDBC driver, setting up a connection pool, generating the relevant SQL statements, reading data from the database over the wire, mapping results to objects, and probably using some transaction manager to coordinate the entire flow. In some situations you might be using stored procedures, triggers, or even messaging-based systems (JMS) or parallel execution tools (Hadoop). The usual classic multi-tier architecture – nothing you are not familiar with. Every data access routine drags along data transformation, serialization, and network calls. This is expensive and impacts the overall application end-to-end latency in a dramatic manner.
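
To make that overhead concrete, here is a minimal sketch of the classic flow described above, assuming a hypothetical trades table and a pre-configured DataSource connection pool (both illustrative); each step adds driver, network, and mapping costs to the end-to-end latency:

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class TradeDao {

    private final DataSource pool; // connection pool, configured elsewhere

    public TradeDao(DataSource pool) {
        this.pool = pool;
    }

    /** Reads one trade over the wire and maps the relational row to an object. */
    public Trade findTrade(long id) throws SQLException {
        // Borrowing a pooled connection, shipping SQL over the network,
        // and mapping the ResultSet back into an object all add latency.
        try (Connection con = pool.getConnection();
             PreparedStatement ps = con.prepareStatement(
                     "SELECT id, symbol, price FROM trades WHERE id = ?")) {
            ps.setLong(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;
                }
                return new Trade(rs.getLong("id"), rs.getString("symbol"),
                        rs.getBigDecimal("price"));
            }
        }
    }

    /** Minimal mapped object for the sketch. */
    public static class Trade {
        final long id;
        final String symbol;
        final BigDecimal price;

        Trade(long id, String symbol, BigDecimal price) {
            this.id = id;
            this.symbol = symbol;
            this.price = price;
        }
    }
}
```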

How would we implement this with an IMDG?

Here is how the above basic activities are performed using an IMDG (a code sketch follows the list):

  • read or write data – with an IMDG this means using the write operation to push the data into a shared memory area so that other clients can read it.

  • calculate/process data – with an IMDG this means sending the business logic to the data (and not sending the data to the business logic, which can be very costly). This is called a task executor.

  • react to operations – with an IMDG this means using notifications that invoke a listener.

  • preserve state – with an IMDG this means replicating the state of the primary IMDG instance to a backup copy running on a different physical machine, and starting new backup copies to ensure high availability even after multiple system failures.
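
For comparison with the JDBC flow, here is a hedged sketch of the read/write building block against a space. It uses the OpenSpaces GigaSpace facade and @SpaceId annotation as I recall them; treat the exact class and method names as assumptions and check the GigaSpaces documentation for your version:

```java
import com.gigaspaces.annotation.pojo.SpaceId;
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class SpaceReadWrite {

    // Space entries are plain POJOs with a no-arg constructor and an ID property.
    public static class Trade {
        private Long id;
        private String symbol;

        public Trade() {
        }

        public Trade(Long id, String symbol) {
            this.id = id;
            this.symbol = symbol;
        }

        @SpaceId
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }

        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }
    }

    public static void main(String[] args) {
        // "/./mySpace" starts an embedded (colocated) space instance, so the
        // write and read below never cross a process boundary.
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer("/./mySpace").space()).gigaSpace();

        // Write: push the object into shared memory so other clients can read it.
        gigaSpace.write(new Trade(1L, "ACME"));

        // Read: template matching - null fields act as wildcards.
        Trade template = new Trade();
        template.setId(1L);
        Trade result = gigaSpace.read(template);
        System.out.println("Read back: " + result.getSymbol());
    }
}
```

With an embedded (“/./”) URL the data lives in the application’s own address space, which is exactly the colocation discussed later in this post; a remote client would connect to the same space over the network instead.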

Space Operations

We will measure the latency of each of these operations (read/write, calculate, handle events, and preserve results) with the GigaSpaces IMDG (aka the space) and examine the results using latency histogram charts that show the exact data point distribution and the outliers we had throughout the test.

Disk-based vs. Shared Memory based Data Storage

The disk-based solution depends on the speed of the disk media and the usual physical boundaries of magnetic storage devices. These suffer from an inability to access different data items at the same time – a stumbling limitation that dramatically impacts their scalability. A typical Java application using a database (on commodity disk devices) sees response times of about 20 ms with no load, and up to several seconds (3-5 seconds) under high load. This latency includes transforming objects into a relational model (ORM), transporting data over the wire, and replicating the transaction data from the primary database instance to a secondary hot-backup instance located within the same LAN.

Relational databases are designed and optimized for commodity disk drives. Even if you run a relational database in memory, the code and internal structures are not optimized for operation in an in-memory distributed environment.

There has been some progress lately with solid-state drive (SSD) products as alternatives to commodity disk devices, but relational databases are still monolithic servers that cannot operate in a distributed environment, take advantage of the entire network’s resources, or scale in an elastic manner.

An IMDG, on the other hand, is designed to serve more clients at the same time, and to access more than a single data item at the same time for a given client. This is possible thanks to the latest multi-core CPUs, blazing-fast memory chips, and advances in virtual machine garbage collection algorithms. An IMDG that is designed to run colocated business logic, within the same memory space as the application, allows applications to access data without going through any remote calls. This also avoids the serialization and temporary object creation delays that negatively impact latency. In addition, a good IMDG is elastic, designed to utilize network resources by expanding its capacity and executing operations simultaneously across all the IMDG nodes.

Well-Known, Deterministic Behavior

When looking closely at financial, aviation, healthcare, telecom, and defense systems, among others, they all share the same fundamental requirement: immediate response time. There is zero tolerance for a delayed response. These systems require deterministic, predictable, precisely known response times. Sometimes the negative effect of a delay in these systems is lost money; in other cases, it could lead to the loss of human lives.

The Very Basic Business Logic Flow

Remember, every system predominantly goes through the same basic steps: consume data (1), process it into some other meaningful output (2), and preserve that state for later use (3). In many cases, step 2 includes hundreds of calculations, sometimes conducted serially, where the entire process (1, 2, and 3) must be completed as one atomic operation – aka a transaction.
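
As a hedged sketch of this three-step flow against a space (hypothetical RawOrder/ProcessedOrder types; in a real deployment the three steps would run under a transaction manager so they commit atomically):

```java
import org.openspaces.core.GigaSpace;

public class OrderFlow {

    // Hypothetical payload types, for illustration only.
    public static class RawOrder {
    }

    public static class ProcessedOrder {
    }

    /** One business transaction: consume (1), process (2), preserve (3). */
    public static void handleNext(GigaSpace gigaSpace) {
        // 1. Consume: a blocking take removes the next raw event from the
        //    space (waits up to one second for a matching entry).
        RawOrder raw = gigaSpace.take(new RawOrder(), 1000);
        if (raw == null) {
            return; // nothing arrived within the timeout
        }

        // 2. Process: with an IMDG this step dominates the latency budget,
        //    since the data already sits in the process address space.
        ProcessedOrder done = process(raw);

        // 3. Preserve: writing back stores the state in the primary
        //    instance, which replicates it synchronously to its backup.
        gigaSpace.write(done);
    }

    private static ProcessedOrder process(RawOrder raw) {
        return new ProcessedOrder(); // business logic goes here
    }
}
```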

When looking at these fundamental operations with a traditional relational database, steps 1 and 3 are usually the more expensive ones, because communicating with the data store is slow. With an IMDG, however, step 2 is by far the most expensive, because the processes innately have the data available in their address space.

The expense then becomes a factor of the complexity of the business logic and of the different entities that comprise the system – the data model, the different types of users the system serves, or the need to interact with third-party systems. On average, such activity will consume 10-50 ms of the overall latency required to complete the flow. In some unique systems – high-frequency algorithmic trading, for instance – it can actually go below the 3 ms range.

The Results

In the following sections we will review the results of a simple benchmark I created that measures all the basic elements needed to implement the application building blocks mentioned above: read, write, calculate/process data (execute), react to operations (notify), and preserve the transaction state (replication).
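
The post does not say which measurement tool produced the histograms, so as an illustrative assumption, here is how such percentile data can be captured with the open-source HdrHistogram library:

```java
import org.HdrHistogram.Histogram;

public class LatencyProbe {

    public static void main(String[] args) {
        // Track values from 1 ns up to 1 s, with 3 significant digits.
        Histogram histogram = new Histogram(1_000_000_000L, 3);

        for (int i = 0; i < 100_000; i++) {
            long start = System.nanoTime();
            doOperation(); // e.g. a space write, read, execute, or notify
            histogram.recordValue(System.nanoTime() - start);
        }

        // Report the same percentiles used throughout this post.
        for (double p : new double[] {50, 90, 99}) {
            System.out.printf("%2.0fth percentile: %d us%n",
                    p, histogram.getValueAtPercentile(p) / 1_000);
        }
    }

    private static void doOperation() {
        // Placeholder for the operation under test.
    }
}
```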

Read/Write Latency

For write and read operations executed at a rate of 800 operations/second with an embedded space, the latency histograms below show that:

  • 20% of the measured latency data points for write and read operations were below 5 microseconds. In other words, a maximum of 200,000 operations/sec by a single client thread.

  • 50% of the measured latency data points for write and read operations were below 7 microseconds.

  • 88% of the measured latency data points for write and read operations were below 10 microseconds.

  • 90% of the measured latency data points for write and read operations were below 15 microseconds.

  • 99% of the measured latency data points for read operations were below 18 microseconds.

  • 99% of the measured latency data points for write operations were below 22 microseconds.

(The per-thread maximums quoted here and in the following sections simply invert the latency: a thread issuing blocking operations back-to-back can perform at most 1/latency operations per second, e.g. 1 / 5 microseconds = 200,000 operations/sec.)

[Figure: write/read latency histogram]

Execute Latency

A fundamental framework for distributed computing is the map/reduce pattern. It is a cornerstone of almost every system that runs activities in parallel, whether across multiple processes on the same machine or across different machines. The IMDG task executor operation, inspired by the Java executor service and its Future concept, allows a client application to submit a task to be executed asynchronously; the result can be consumed by the client application when desired.
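
Here is a hedged sketch of that pattern using the OpenSpaces Task interface and AsyncFuture (signatures as I recall them; treat them as assumptions): the task is shipped to the node holding the data, runs there, and the client collects the result through a future.

```java
import com.gigaspaces.async.AsyncFuture;
import org.openspaces.core.GigaSpace;
import org.openspaces.core.executor.Task;

public class CountTask implements Task<Integer> {

    // This method runs inside the space process, next to the data,
    // instead of pulling the data over the wire to the client.
    @Override
    public Integer execute() throws Exception {
        // ... count or aggregate colocated entries here ...
        return 42; // dummy result for the sketch
    }

    public static void submit(GigaSpace gigaSpace) throws Exception {
        AsyncFuture<Integer> future = gigaSpace.execute(new CountTask());
        Integer result = future.get(); // consume the result when desired
        System.out.println("Task result: " + result);
    }
}
```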

The IMDG executor operation latency histogram presented below demonstrates the relatively low overhead observed when submitting an empty dummy task to a remote space. Together with the above read/write operations, we can achieve constant low-latency overhead when performing complex computations across a large number of IMDG partitions.

[Figure: execute latency histogram]

  • 90% of the measured latency data points were below 340 microseconds. This works out to a maximum of roughly 3000 remote execute operations per second by a single thread.

  • 99% of the measured latency data points were below 600 microseconds. This yields a maximum of 1666 remote execute operations per second by a single thread.

Notification Latency

Getting notified about specific events that occur within the IMDG is also a relatively fast activity. As we can see from the histogram below, most events are delivered from the IMDG to event handlers within one millisecond. This one-millisecond duration includes the time it takes to identify the event within the IMDG, the network latency, and the time it takes to transfer the event data (including serialization and deserialization) from the IMDG process to the application process. The event sent to the client includes the data object that originated the notification, allowing the application to react and perform the necessary activities.
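
A hedged sketch of registering for such notifications with the OpenSpaces annotation-driven notify container (annotation and configurer names as I recall them; treat them as assumptions). It reuses the hypothetical Trade POJO from the read/write sketch above:

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.events.EventDriven;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.notify.Notify;
import org.openspaces.events.notify.SimpleNotifyContainerConfigurer;

@EventDriven
@Notify
public class TradeListener {

    // Invoked by the notify container whenever a matching object is written.
    @SpaceDataEvent
    public void onTrade(SpaceReadWrite.Trade trade) {
        System.out.println("Notified about trade " + trade.getId());
    }

    public static void register(GigaSpace gigaSpace) {
        // Template: an empty Trade matches every Trade written to the space.
        new SimpleNotifyContainerConfigurer(gigaSpace)
                .template(new SpaceReadWrite.Trade())
                .eventListenerAnnotation(new TradeListener())
                .notifyContainer();
    }
}
```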

[Figure: remote notify latency histogram]

  •  90% of the measured latency data points were below 930 microseconds.

  •  99% of the measured latency data points were below 1320 microseconds.

In some cases, the notification is delivered from a colocated IMDG instance, in which case there is no network overhead or data transformation involved. As we can see from the histogram below, the latency involved is dramatically lower:

[Figure: embedded notify latency histogram]

  •  90% of the measured latency data points were below 230 microseconds. This gives a rough maximum of 4300 notifications per second per client thread.

  •  99% of the measured latency data points were below 340 microseconds. With this, there’s a maximum of 3000 notifications per second per client thread.

Synchronous Replication Latency

Saving the outcome of the calculation and the processing – i.e., preserving the state of the calculation – is essential. Without it, the application will never be able to operate reliably. The data must be replicated as soon as it is generated and committed in order to survive a system failure; replication cannot be performed as a background activity. This means that once the system acknowledges that it has completed a business transaction (a specific unit of work), its state must be preserved in at least two different physical locations. To achieve this, the IMDG must use a synchronous replication approach that mirrors the changes performed on the primary instance to a backup copy hosted in a different VM.
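
A hedged sketch of how such a topology is typically declared with the classic GigaSpaces space URL syntax (schema and parameter names as I recall them; treat them as assumptions): a partitioned space where each primary replicates synchronously to one backup.

```java
import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class SyncReplicatedSpace {

    public static void main(String[] args) {
        // partitioned-sync2backup: every primary partition replicates each
        // change synchronously to its backup before acknowledging the client.
        // total_members=2,1 means two primary partitions with one backup
        // each; id selects which cluster member this process hosts.
        GigaSpace gigaSpace = new GigaSpaceConfigurer(
                new UrlSpaceConfigurer(
                        "/./mySpace?cluster_schema=partitioned-sync2backup"
                        + "&total_members=2,1&id=1").space())
                .gigaSpace();

        // From here on, every write through gigaSpace is acknowledged only
        // after the backup copy (hosted in a different VM) has applied it.
        System.out.println("Space started: " + gigaSpace.getName());
    }
}
```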

The overhead involved in replicating the state of an average transaction is relatively low (under 320 microseconds in most cases). With that kind of overhead, most business logic activities will easily complete with less than one millisecond of latency. Additional concurrent users or large amounts of data will not impact these numbers, since we break the IMDG into logical partitions, each with its own dedicated replication engine.

[Figure: synchronous replication latency histogram]

  • 90% of the measured latency data points were below 320 microseconds. Maximum of 3100 replication events per second per thread.

  • 99% of the measured latency data points were below 480 microseconds. Maximum of 2100 replication events per second per thread.

Conclusion

When taking into consideration the average time it takes to consume data from the IMDG, the time it takes the business logic to process the incoming data and digest it into meaningful output data, and the time to preserve it to survive system failures, we see that the actual impact of the IMDG on the overall business transaction latency is between 2 and 10 percent (around one millisecond). This overhead is constant regardless of the IMDG size, the application size, or the data model complexity.

Comparing this overhead to a disk-based system, where the overhead starts at 20 ms and can reach 5,000 ms of latency (up to three orders of magnitude larger than the IMDG!), we can say we have almost zero latency overhead with the IMDG. Thus, the zero-latency “impossibility” is now a reality.