InsightEdge Platform and XAP 14.2
In this release, we’re breaking new ground for the InsightEdge in-memory computing platform with the launch of the AnalyticsXtreme module. AnalyticsXtreme addresses the challenges faced by enterprises that want to stay competitive and innovative by using their vast amounts of data to gain real-time insights, make informed decisions, and act instantly. It accelerates batch analytics by orders of magnitude, enabling real-time analytics, machine learning, and deep learning on mutable streaming data together with historical data stored on platforms such as Hadoop, Amazon S3, and Azure Blob Storage.
Apache Spark 2.4.0
Release 14.2 includes an upgrade to Apache Spark 2.4.0, which provides the following benefits:
- Scala 2.12 support
- Apache Avro as a built-in data source
- Support for image data source
The upgrade to Apache Spark 2.4.0 also provides the following Kubernetes-related enhancements:
- PySpark and R bindings
- Client mode support for Kubernetes cluster back-end
- Support for mounting volumes in the Spark driver and executor pods
Red Hat OpenShift Support
With the 14.2 release, we are also excited to announce that GigaSpaces products are now available in the Red Hat OpenShift (Kubernetes) catalog. Customers can easily download InsightEdge and XAP container images to create, test, and run applications and deploy them to the target platform.
AnalyticsXtreme – Simply Better Insights to Action
Enterprises are striving to stay innovative and remain competitive, pushing beyond historical analysis. They are investing in operationalizing real-time AI and machine learning for a broad range of applications and use cases.
To achieve this, organizations are moving towards the Lambda architecture, which is designed to handle analytics in real-time using a speed layer, while simultaneously ingesting data into a batch layer for long-running, complex analytics models.
However, blending the batch and speed layer views in the Lambda architecture has drawbacks: it takes time, it is complicated, and it does not support fast decision-making. Although it represents an advance in data analysis, the Lambda architecture still presents challenges, particularly:
- Timely access to dynamic data
- Data quality and consistency
- Architectural complexity
AnalyticsXtreme simplifies enterprise big data architecture in the cloud and on-premises, delivering the fastest software platform that can seamlessly plug into existing infrastructure such as Hadoop and Amazon S3. It shortens long ETL processes, eliminates unnecessary data duplication, and prevents bottlenecks when ingesting data from multiple sources.
Consequently, AnalyticsXtreme accelerates applications in which real-time analytics, machine learning and deep learning are used for real-time insights, such as predictive maintenance, live risk analysis, fraud detection, location-based advertising, dynamic pricing, and more.
AnalyticsXtreme – Benefits to Your Business
The business benefits offered by the AnalyticsXtreme module – faster and smarter analytics, greater simplicity, faster time-to-market and optimized TCO – are illustrated in the following sections.
Faster, Smarter Analytics
AnalyticsXtreme delivers faster and smarter insights by powering real-time processing and analytics on both hot, mutable data and cold historical data. Hot data resides on InsightEdge, leveraging the platform’s in-memory computing and support for unique indexing to provide faster access to historical data stored on data lakes or data warehouses.
For example, a query run on the speed layer (hot data stored in InsightEdge) is executed in 6 milliseconds. By comparison, a query run on the batch layer (cold data stored in HDFS) takes 2.5 seconds.
All data, both structured and unstructured, is ingested once into the speed layer, making the use of multiple types of databases or platforms unnecessary. This reduces the operational overhead and eliminates the need to stream the data multiple times. Once the data is ingested into InsightEdge, a lifecycle policy handles the underlying data movement from AnalyticsXtreme to data lakes or data warehouses without exposing the architectural complexity to the application layer, thereby simplifying security and data management.
As a module within the InsightEdge platform, AnalyticsXtreme also provides seamless multi-region and multi-cloud replication for data lakes and data warehouses.
A unified API, including Spark DataFrame and SQL, allows standard application development with access to data on both the speed layer (InsightEdge) and the batch layer. Fully decoupling the application from the architecture's data model simplifies application development.
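The idea of one API over both layers can be illustrated with a minimal sketch. This is not the AnalyticsXtreme API: `UnifiedView`, its fields, and the in-memory lists standing in for InsightEdge and the data lake are all hypothetical, and the sketch only assumes that the two layers are split at a time boundary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Dict, List

Row = Dict[str, Any]


@dataclass
class UnifiedView:
    """Hypothetical facade that hides the speed/batch split from the caller."""
    speed_rows: List[Row]     # stand-in for hot data held in memory
    batch_rows: List[Row]     # stand-in for cold data in a data lake
    speed_period: timedelta   # how far back the speed layer reaches

    def query(self, start: datetime, end: datetime, now: datetime) -> List[Row]:
        """Return rows with start <= ts < end, touching a layer only if needed."""
        boundary = now - self.speed_period
        result: List[Row] = []
        if start < boundary:
            # Part of the range is older than the boundary: read the batch layer.
            batch_end = min(end, boundary)
            result += [r for r in self.batch_rows if start <= r["ts"] < batch_end]
        if end > boundary:
            # Part of the range is recent: read the speed layer.
            speed_start = max(start, boundary)
            result += [r for r in self.speed_rows if speed_start <= r["ts"] < end]
        return result
```

The caller passes only a time range and never names a layer, which is the decoupling the paragraph above describes.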
AnalyticsXtreme supports interactive SQL queries and machine learning with Spark Datasets and DataFrames. The InsightEdge Apache Zeppelin web notebook, which is part of the enterprise software package, includes an InsightEdge JDBC interpreter for connecting to the data source. Using the AnalyticsXtreme syntax, a complex query can be written in a single string. Additional optimization occurs when the query is pushed down to the data grid, because data filtering and aggregation then occur at the grid level, distributed across the partitions.
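To illustrate why pushing filtering and aggregation down to the partitions helps, here is a minimal sketch in plain Python. It does not use the InsightEdge or Spark APIs; the function names and the list-of-lists model of a partitioned grid are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]


def partition_partial(rows: List[Row],
                      predicate: Callable[[Row], bool],
                      column: str) -> Tuple[float, int]:
    """Runs inside one partition: filter locally and return (sum, count)."""
    values = [r[column] for r in rows if predicate(r)]
    return sum(values), len(values)


def pushed_down_average(partitions: List[List[Row]],
                        predicate: Callable[[Row], bool],
                        column: str) -> Optional[float]:
    """Combine the small per-partition partials into one global average."""
    partials = [partition_partial(p, predicate, column) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

Each partition returns only two numbers instead of its matching rows, so the amount of data that has to leave the grid stays small no matter how many rows are scanned.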
In the following example, the first query is passed to the data grid as a standard Spark query, while the second query utilizes the optimized AnalyticsXtreme syntax.
An InsightEdge JDBC driver is also provided for live connections from BI tools, such as Tableau and Looker, to a unified real-time and historical view. For example, the following graph displays the data from a SQL query that was run on the speed layer (InsightEdge) for 2019 data, and on the batch layer (HDFS) for 2018 data.
Combined with GigaSpaces’ MemoryXtend module, AnalyticsXtreme allows enterprises to use business-driven policies to automatically tier hot and warm data between RAM, Persistent Memory, and SSD, and to move cold data and archived information to data lakes and data warehouses. With these intelligent tiering mechanisms, data is stored in the storage layer that matches its performance requirements, and infrastructure costs are optimized across the entire solution and data lifecycle. Access to data is accelerated by up to 100X, delivering faster, smarter insights that are instantly actionable.
The following is an example of an AnalyticsXtreme data lifecycle policy:
The lifecycle policy defines the data being managed (typeName, in this case StockData), the period of time that the data should reside in the speed layer (speedPeriod, set here to 5 days), the data source and data target, as well as several other parameters such as the time format and when the data should be moved to the batch layer.
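The fields described above can be modeled with a short sketch. This is not the actual AnalyticsXtreme configuration format: only `typeName` (StockData) and `speedPeriod` (5 days) come from the text, while the class, the `time_column` field, the format string, and the routing rule are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class LifecyclePolicy:
    """Hypothetical model of a data lifecycle policy."""
    type_name: str           # the data type being managed (typeName)
    speed_period: timedelta  # how long data stays in the speed layer (speedPeriod)
    time_column: str         # assumed: the field that holds each record's timestamp
    time_format: str         # assumed: strptime pattern for that field

    def layer_for(self, record: dict, now: datetime) -> str:
        """Return 'speed' if the record is within the speed period, else 'batch'."""
        ts = datetime.strptime(record[self.time_column], self.time_format)
        return "speed" if now - ts <= self.speed_period else "batch"


# Values for type_name and speed_period follow the description above.
policy = LifecyclePolicy(
    type_name="StockData",
    speed_period=timedelta(days=5),
    time_column="timestamp",
    time_format="%Y-%m-%d %H:%M:%S",
)
```

With this model, a record timestamped two days ago maps to the speed layer and one from nine days ago maps to the batch layer, which is the decision a lifecycle policy makes when it moves data.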
In today’s fast-paced world of “now,” a process that is too slow often leads to decisions based on stale data and to insights that may no longer be relevant or applicable.
With 100X faster access to data lakes, AnalyticsXtreme powers real-time advanced analytics, machine learning and deep learning on hot, warm and cold (historical) data for smarter insights with sub-second response time, at scale. Consequently, AnalyticsXtreme simplifies big data architecture by delivering:
- One ingestion layer
- One source of data governance and movement from the speed layer to the batch layer
- One API for applications to access all data
- One logical view of all data