THE RACE TO ZERO LATENCY 


In today’s NOW economy, with the ongoing digitalization of business, there is increasing emphasis on ingesting, processing and analyzing data “in the moment” so that real-time insights can have an immediate impact on business logic and decisions. Achieving speed without compromising scale pushes the limits of how most existing big data solutions work, and drives new models and technologies that break current speed boundaries.

THIS BENCHMARK PRESENTS RESULTS AS TESTED ON GIGASPACES IN-MEMORY COMPUTING PLATFORMS, ACHIEVING LATENCY AS LOW AS 460 NANOSECONDS

HARDWARE SPECIFICATION

BENCHMARK CONFIGURATION

RESULT SUMMARY
in microseconds*

READ & WRITE LATENCY
in microseconds*

ABOUT GIGASPACES

GigaSpaces provides leading in-memory computing platforms for real-time insight to action and extreme transactional processing. With GigaSpaces, enterprises can operationalize machine learning and transactional processing to gain real-time insights on their data and act upon them in the moment. Its always-on platforms for mission-critical applications, deployed on cloud, on-premises or hybrid environments, are used by hundreds of Tier-1 and Fortune-listed organizations worldwide across financial services, retail, transportation, telecom, healthcare, and more. GigaSpaces has offices in the US, Europe and Asia.

Web Application Scalability

The benchmark used a classic eCommerce application (Pet Clinic) running on the web application support of the GigaSpaces XAP application server, and measured the number of pages per second generated as the number of concurrent users increased.

The goal of this benchmark was to identify a cost-effective hardware and software architecture that leverages new commodity multi-core hardware to scale web applications efficiently while minimizing the number of physical machines deployed.

The diagram below shows the architecture we used for this test application.

As the diagram shows, we used an Apache load balancer in front of up to 3 web containers, MySQL as the database, and the GigaSpaces data grid as a front end to the database. We used a Map/Reduce pattern to query the entire data set and obtain aggregated results.
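The Map/Reduce query pattern mentioned above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not GigaSpaces code: each partition is modeled as a hypothetical in-memory list of order amounts, the "map" step computes a partial count and sum on every partition in parallel, and the client-side "reduce" step combines the partial results. In the benchmark the map step would run inside the data-grid partitions, next to the data; here a thread pool stands in for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical Map/Reduce aggregation over partitioned data; not the GigaSpaces API.
public class MapReduceSketch {

    // Partial result produced by one partition: number of entries and their sum.
    record Partial(long count, long sum) {}

    // "Map" step: runs against one partition's local data only.
    static Partial mapPartition(List<Long> partition) {
        long sum = 0;
        for (long amount : partition) {
            sum += amount;
        }
        return new Partial(partition.size(), sum);
    }

    // "Reduce" step: combines the small partial results on the client side.
    static Partial reduce(List<Partial> partials) {
        long count = 0;
        long sum = 0;
        for (Partial p : partials) {
            count += p.count();
            sum += p.sum();
        }
        return new Partial(count, sum);
    }

    // Runs the map step on every partition in parallel, then reduces.
    static Partial aggregate(List<List<Long>> partitions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<CompletableFuture<Partial>> futures = new ArrayList<>();
            for (List<Long> partition : partitions) {
                futures.add(CompletableFuture.supplyAsync(() -> mapPartition(partition), pool));
            }
            List<Partial> partials = new ArrayList<>();
            for (CompletableFuture<Partial> f : futures) {
                partials.add(f.join());
            }
            return reduce(partials);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = List.of(
                List.of(10L, 20L),
                List.of(5L),
                List.of(1L, 2L, 3L));
        Partial total = aggregate(partitions);
        System.out.println(total.count() + " " + total.sum()); // prints "6 41"
    }
}
```

Shipping only small partial aggregates back to the client, rather than the raw entries, is what keeps this pattern cheap for queries over the whole data set.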

The physical hardware deployment was as follows:
  • 3 server boxes with 24 cores each

Results

  • Page views - 16,000 page views/sec (about 1.4 billion page views a day)
  • Latency - 6 msec (in a LAN environment)
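The two page-view figures are consistent with each other, assuming the 16,000 page views/sec rate is sustained over a full day (86,400 seconds):

```latex
16{,}000\ \frac{\text{page views}}{\text{s}} \times 86{,}400\ \frac{\text{s}}{\text{day}}
  = 1{,}382{,}400{,}000 \approx 1.4 \times 10^{9}\ \frac{\text{page views}}{\text{day}}
```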

Benchmark Summary

This benchmark showed that, with relatively inexpensive hardware and a small number of machines, combining the scale-up and scale-out approaches provides a very cost-effective way to scale web applications. Adding web servers dynamically (using the GigaSpaces dynamic web scaling integration) helped keep latency under control as load grew, while increasing the number of pages that could be served.

Raw Performance of Space Operations in Java and .Net

The benchmarks below measure the raw performance of basic space operations: read, write, take and notify. The first benchmark uses a remote space, where the space and the client run on two separate machines. The second uses an embedded space, where the space and the client coexist in the same process.
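To make the four operations concrete, the sketch below models their usual tuple-space semantics with a hypothetical in-memory `ToySpace` class; it is not the GigaSpaces API. `write` stores an entry, `read` returns a matching entry without removing it, `take` removes and returns it, and notify is modeled as a listener callback invoked on each write (named `addListener` because `notify` is a final method of `java.lang.Object`). Matching by a `Predicate` is a simplification of template matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical, in-memory illustration of space semantics; not the GigaSpaces API.
public class ToySpace<T> {

    private final List<T> entries = new ArrayList<>();
    private final List<Consumer<T>> listeners = new CopyOnWriteArrayList<>();

    // write: store an entry, then notify every registered listener.
    public void write(T entry) {
        synchronized (entries) {
            entries.add(entry);
        }
        for (Consumer<T> listener : listeners) {
            listener.accept(entry);
        }
    }

    // read: return the first matching entry without removing it.
    public Optional<T> read(Predicate<T> template) {
        synchronized (entries) {
            return entries.stream().filter(template).findFirst();
        }
    }

    // take: remove and return the first matching entry.
    public Optional<T> take(Predicate<T> template) {
        synchronized (entries) {
            for (int i = 0; i < entries.size(); i++) {
                if (template.test(entries.get(i))) {
                    return Optional.of(entries.remove(i));
                }
            }
            return Optional.empty();
        }
    }

    // notify: register a callback invoked on every subsequent write.
    public void addListener(Consumer<T> listener) {
        listeners.add(listener);
    }

    public static void main(String[] args) {
        ToySpace<String> space = new ToySpace<>();
        List<String> seen = new ArrayList<>();
        space.addListener(seen::add);
        space.write("order-1");
        space.write("order-2");
        System.out.println(space.read(s -> s.endsWith("1"))); // Optional[order-1]
        System.out.println(space.take(s -> s.endsWith("1"))); // Optional[order-1]
        System.out.println(space.read(s -> s.endsWith("1"))); // Optional.empty
        System.out.println(seen);                             // [order-1, order-2]
    }
}
```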

Results

Java Results

In these tests we used a DELL PowerEdge R740 with an Intel Xeon Processor E5-2620 CPU, a 1 Gb network, a 1 KB payload, single operations, and no special JVM tuning.

Scenario                                           Write            Read             Take
Remote, with replication, 50 client threads        45K/sec          90K/sec          45K/sec
Embedded, without replication, 20 client threads   1.1 million/sec  1.8 million/sec  1.1 million/sec

.Net Results

Benchmark Summary

The results show that even a single remote space instance scales fairly well and can handle a large volume of requests per second. The embedded space results show that collocation leads to a significant increase in throughput. Because the same application code works for both remote and local operations, an application can benefit from collocation implicitly, without requiring completely different implementations.
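The throughput ratios between the two rows of the Java table quantify the collocation effect. Note that the rows also differ in replication (on vs. off) and client thread count (50 vs. 20), so these ratios are not a controlled measurement of collocation alone:

```latex
\frac{1{,}100{,}000}{45{,}000} \approx 24.4 \ (\text{write, take}), \qquad
\frac{1{,}800{,}000}{90{,}000} = 20 \ (\text{read})
```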

.Net results in the remote scenario are fairly close to those of Java. In embedded mode we see a significant performance gain, as expected. However, the gain in .Net is lower than in Java, because in .Net read and write operations are not passed by reference as they are in Java. The .Net native local cache provides more than 1M reads/sec, which is relatively close to native .Net operation speed. In a real-life situation, in which each embedded update would need to go through replication for backup purposes, we expect Java and .Net performance to be very close.

Scale-up on Multi-Core Benchmark

Building applications that can exploit multi-core technology requires a specific parallel-programming skill set that is rarely available in typical organizations. Scaling up on multi-core hardware is also often considered to conflict with a scale-out model. The following test was conducted on a DELL PowerEdge R740 machine. To exploit the power of multi-core technology, we used the built-in parallel processing semantics of the GigaSpaces middleware, which enabled scaling up and scaling out at the same time without requiring changes to the application code.

Figure 1: T5240 scale-up performance gain

Benchmark Summary

The GigaSpaces parallel processing API provides a Java programming model, such as the Actor model, equivalent to those available in languages such as Scala and Erlang. Together, these capabilities make it possible to exploit the full potential of multi-core hardware and deliver unparalleled performance at significantly reduced power consumption, without exposing the complexity of such optimization to the application and without introducing a completely new language for that purpose.
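The Actor model referred to above can be expressed in plain Java without a new language. The sketch below is a generic minimal actor, not the GigaSpaces parallel processing API: the hypothetical `CounterActor` owns private state and a mailbox, and a single worker thread processes messages in FIFO order, so many sender threads can use it without locks around the state.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal generic actor: private state, a mailbox, one thread processing messages in order.
public class CounterActor {

    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private long count = 0; // read and written only by the worker thread

    public CounterActor() {
        Thread worker = new Thread(() -> {
            try {
                while (true) {
                    mailbox.take().run(); // process one message at a time
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive
        worker.start();
    }

    // Fire-and-forget message: the state change happens on the worker thread.
    public void increment(long delta) {
        mailbox.add(() -> count += delta);
    }

    // Request/reply message: the reply is completed by the worker thread.
    public CompletableFuture<Long> get() {
        CompletableFuture<Long> reply = new CompletableFuture<>();
        mailbox.add(() -> reply.complete(count));
        return reply;
    }

    public static void main(String[] args) throws InterruptedException {
        CounterActor actor = new CounterActor();
        Thread[] senders = new Thread[4];
        for (int t = 0; t < senders.length; t++) {
            senders[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    actor.increment(1);
                }
            });
            senders[t].start();
        }
        for (Thread s : senders) {
            s.join();
        }
        System.out.println(actor.get().join()); // prints 4000
    }
}
```

Because only the worker thread touches `count`, there is no shared mutable state to synchronize; concurrency is expressed through message passing instead.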

Seamless Transition from Scale-Up to Scale-Out

The user application connects to the space through a GigaSpaces smart proxy. The smart proxy detects whether the space implementation is local or remote, and uses a direct in-process call if it is local or a network call otherwise. The application code is the same in both cases, which makes it possible to design the application for both dimensions of scalability simultaneously, or to switch between scale-up and scale-out models at any time through a simple configuration change.
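The local-versus-remote dispatch described above can be sketched as follows. All names here (`Space`, `LocalSpace`, `RemoteSpaceStub`, `connect`, the `LOCAL_SPACES` registry) are hypothetical and do not reflect the GigaSpaces smart proxy API. The remote stub keeps data in its own map so the example runs, where a real client would serialize each call and send it over the network. The point is that `placeOrder` is written once against the interface and runs unchanged against either implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a proxy that picks direct or network access; not the GigaSpaces API.
public class SmartProxySketch {

    // The single interface the application codes against.
    interface Space {
        void write(String key, String value);
        String read(String key);
    }

    // Collocated implementation: plain in-process method calls.
    static final class LocalSpace implements Space {
        private final Map<String, String> data = new ConcurrentHashMap<>();
        @Override public void write(String key, String value) { data.put(key, value); }
        @Override public String read(String key) { return data.get(key); }
    }

    // Stand-in for a network client. A real one would serialize each call and
    // send it to the remote space; this stub keeps data locally so the example runs.
    static final class RemoteSpaceStub implements Space {
        private final Map<String, String> simulatedServer = new ConcurrentHashMap<>();
        @Override public void write(String key, String value) { simulatedServer.put(key, value); }
        @Override public String read(String key) { return simulatedServer.get(key); }
    }

    // Spaces hosted in this process, keyed by name.
    static final Map<String, Space> LOCAL_SPACES = new ConcurrentHashMap<>();

    // Returns the space itself if it is collocated, otherwise a network stub.
    static Space connect(String spaceName) {
        Space local = LOCAL_SPACES.get(spaceName);
        return (local != null) ? local : new RemoteSpaceStub();
    }

    // Application code: identical regardless of which implementation it receives.
    static String placeOrder(Space space) {
        space.write("order-1", "pending");
        return space.read("order-1");
    }

    public static void main(String[] args) {
        LOCAL_SPACES.put("embeddedSpace", new LocalSpace());
        Space a = connect("embeddedSpace");
        Space b = connect("remoteSpace");
        System.out.println(a.getClass().getSimpleName() + " " + placeOrder(a)); // LocalSpace pending
        System.out.println(b.getClass().getSimpleName() + " " + placeOrder(b)); // RemoteSpaceStub pending
    }
}
```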

Intel Benchmark Shows XAP 12.3 is 300% Faster on Intel Xeon Processor E5-2623 v4

GigaSpaces XAP 12.3 running on the Intel® Xeon® Processor E5-2623 v4 has been shown to be 300% faster than XAP 11 on the best previous Intel processor. The benchmark was run by the fasterAPPS program, a joint initiative of Intel, MPI Europe and Globant aimed at encouraging the migration of financial applications to the latest multi-core technology.

XAP 12.3 - Optimized for Multi-Core

GigaSpaces XAP 12.3 has been specifically optimized to take advantage of multi-core environments, allowing highly multithreaded applications to run as efficiently as possible with the least possible resource contention. Specifically, XAP's in-memory transaction and locking mechanisms have been refactored to use more lightweight locking and synchronization constructs supported by modern processors and later versions of the Java virtual machine. This in turn helped us achieve significant performance gains, as shown in this benchmark.
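The source does not say which constructs XAP uses internally. As one example of the kind of lightweight synchronization available in Java 8 and later, the sketch below uses `java.util.concurrent.locks.StampedLock` for a hypothetical `PriceEntry`. Readers attempt an optimistic read that acquires no lock and fall back to a real read lock only if a concurrent write invalidated the stamp, which reduces contention on read-heavy workloads.

```java
import java.util.concurrent.locks.StampedLock;

// Hypothetical entry showing optimistic reads with StampedLock (Java 8+).
public class PriceEntry {

    private final StampedLock lock = new StampedLock();
    private double bid;
    private double ask;

    // Writers take the exclusive write lock.
    public void update(double newBid, double newAsk) {
        long stamp = lock.writeLock();
        try {
            bid = newBid;
            ask = newAsk;
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    // Readers first try an optimistic read, which acquires no lock, and fall
    // back to a shared read lock only if a write happened in the meantime.
    public double spread() {
        long stamp = lock.tryOptimisticRead();
        double b = bid;
        double a = ask;
        if (!lock.validate(stamp)) {
            stamp = lock.readLock();
            try {
                b = bid;
                a = ask;
            } finally {
                lock.unlockRead(stamp);
            }
        }
        return a - b;
    }

    public static void main(String[] args) {
        PriceEntry entry = new PriceEntry();
        entry.update(99.5, 100.25);
        System.out.println(entry.spread()); // prints 0.75
    }
}
```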

Benchmark Results Summary

GigaSpaces XAP 12.3 was benchmarked on Intel’s latest Xeon 5500 Nehalem processor and achieved the following results:

  • 1 million write operations/sec (embedded mode) on one machine - "write operations" are data updates.
  • 2.6 million read operations/sec (embedded mode) on one machine - "read operations" are data retrievals.
  • 360% boost in write performance and 570% boost in read performance (embedded mode) compared to XAP 11 running on the previous best Intel processor, which achieved 276K writes/sec and 453K reads/sec.
  • 90K read operations/sec and 40K write operations/sec (remote access) on a partitioned cluster of 3 Nehalem machines, scaling up to 16 threads with near-linear scalability.
  • 30% better scalability than XAP 11 on the best previous Intel processor.
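The write and read boost figures can be checked against the XAP 11 baselines listed above. The ratios suggest that the percentages express the new throughput as a multiple of the old one (roughly 3.6x and 5.7x), rather than as a percentage increase:

```latex
\frac{1{,}000{,}000}{276{,}000} \approx 3.62 \ (\approx 362\%), \qquad
\frac{2{,}600{,}000}{453{,}000} \approx 5.74 \ (\approx 574\%)
```

Expressed as increases over XAP 11, these correspond to roughly 262% for writes and 474% for reads.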

Note: Running XAP in embedded (local) mode is an order of magnitude faster than accessing it remotely. XAP users can collocate data with business logic (Space-Based Architecture) so that all transactions access data locally, which makes the embedded-mode figures relevant for most real-life applications.

Benchmark Figures for Embedded (1 machine) vs. Remote Partitioned Cluster (3 machines)

Benchmark Configuration

  • The single embedded space scenario was conducted on a single server
  • The partitioned remote space scenario was conducted on a cluster of 3 servers

Server Specifications
  • CPU: Two Nehalem 2.80GHz C0 step processors
  • Memory: 24GB DDR3 1333MHz memory
  • Speed Step and Hyper Threading: off
  • Network interface: Mellanox ConnectX QDR Infiniband, connected to Mellanox switch
  • Operating System: CentOS 7.3
  • Java Version: Oracle JDK 8
  • GigaSpaces Version: XAP Premium 12.3 GA (build 3818)

Beyond Unifying Fast-Data Analytics: GigaSpaces InsightEdge Platform, Powered by Intel Architecture

Published in March 2018 by Intel

An Intel benchmark compares the GigaSpaces InsightEdge Platform running on Intel® Xeon® Scalable Processors and Intel® Optane™ SSDs against previous-generation Intel products.
