Today GigaSpaces announced the InsightEdge Data Lake Accelerator, AnalyticsXtreme, which accelerates machine learning on real-time and historical data for smarter insights with sub-second response times at scale.
As enterprises strive to stay innovative and remain competitive, they are looking to leverage their vast amounts of data to gain real-time insights, make informed decisions and act instantly. Architectural complexity leads to integration and performance challenges, resulting in data that can be neither transformed nor leveraged in real time, when its value is highest.
“Data latency was still far from what our business needed. New data was only accessible to users once every 24 hours, which was too slow to make real-time decisions.”
– Reza Shiftehfar, Lead of Uber’s Hadoop Platform team
In today’s fast-paced world of “now,” a process that is too slow often leads to decisions based on stale data and to insights that may no longer be relevant or applicable.
Our customers not only want to accelerate batch analytics but also require event-driven analytics and machine learning in real time, for smarter insights that can be acted upon instantly. AnalyticsXtreme further simplifies enterprise big data architecture in the cloud and on premises with the fastest software platform that can seamlessly plug and play on existing infrastructure such as Hadoop and Amazon S3. It shortens long ETL processes, eliminates unnecessary data duplication and avoids data ingestion bottlenecks from various data sources.
AnalyticsXtreme accelerates applications in which real-time analytics, machine learning and deep learning are used for real-time insights, such as predictive maintenance, live risk analysis, fraud detection, location-based advertising, dynamic pricing and more.
Interactive queries and machine learning models run simultaneously on both real-time mutable streaming data and historical data stored in data lakes based on Hadoop, Amazon S3 or Azure Blob Storage, as well as in data warehouses such as Snowflake, without requiring a separate data load procedure or data duplication. Moving from on premises to the cloud, or changing technology stacks (for example, from Cloudera to Amazon S3), is seamless to machine learning applications, which increases flexibility while reducing development and maintenance costs. This level of integration is achieved automatically, without requiring any changes to the data structures or logic in the machine learning apps, and it resolves the complexities of big data architectures such as Lambda.
Figure 1: InsightEdge architecture
Combined with the GigaSpaces MemoryXtend module, AnalyticsXtreme allows enterprises to use business-driven policies to automatically tier hot and warm data between RAM, Persistent Memory and SSD, and to automatically move cold data to data lakes and data warehouses for storage and archiving. With this intelligent tiering, data is stored in the right storage layer based on performance requirements, while infrastructure costs are optimized across the entire solution and data lifecycle. Access to data is accelerated by up to 100X for faster, smarter insights that are instantly actionable.
The solution also provides a single logical view of data that spans real-time and historical data platforms and is accessible through SQL, Spark Datasets/DataFrames and BI tools such as Tableau and Looker.
AnalyticsXtreme benefits include:
Faster, smarter analytics
- Real-time access to and analytics on frequently used mutable data and historical data with out-of-the-box ETL
- Acceleration of batch analytics by orders of magnitude, from days to hours or hours to minutes
Faster time-to-market
- Agile application development leveraging unified API access to reliable, strongly consistent data across real-time and historical platforms
- Interactive SQL queries, machine learning with Spark Datasets/DataFrames, and a JDBC driver for live connections from BI tools such as Tableau and Looker, all on a unified real-time and historical view
Greater simplicity
- Simpler operations and data governance: an automatic lifecycle policy handles the underlying data movement, simplifying security and data management
- Seamless multi-region and multi-cloud replication for data lakes and data warehouses
Let’s See It in Action
In this example, trade data from the Deutsche Börse, drawn from the Eurex and Xetra trading systems, is streamed into InsightEdge via Kafka. For every tradable security, the data provides the initial price, lowest price, highest price, final price and volume.
Figure 2: AnalyticsXtreme data flow diagram with InsightEdge and HDFS
The data can be seen below:
The trading data is automatically stored in the tiered storage layers according to defined business priorities, partitioned by time and indexed by security ID.
- Most important (Hot) data from 2019 in RAM
- Less important (Warm) data from 2018 in SSD with MemoryXtend
- Historical (Cold) data from 2017 and earlier in HDFS leveraging AnalyticsXtreme data lifecycle policy
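The year-based placement described in this list can be sketched in a few lines of Python. This is only an illustration of the idea, not the AnalyticsXtreme or MemoryXtend policy format: the `Trade` record, its field names, the `tier_for` function and the tier labels are all hypothetical, and the sketch assumes that 2017 and earlier counts as cold, in line with the 2015-2017 cold query reported later in this post.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoffs taken from the example in this post; a real
# deployment would define these in its lifecycle and caching policies.
HOT_YEAR = 2019
WARM_YEAR = 2018


@dataclass(frozen=True)
class Trade:
    """Illustrative trade record; field names are assumptions."""
    security_id: str
    trade_date: date
    start_price: float
    min_price: float
    max_price: float
    end_price: float
    volume: int


def tier_for(trade: Trade) -> str:
    """Return the storage tier for a trade, based on its year."""
    year = trade.trade_date.year
    if year >= HOT_YEAR:
        return "RAM"   # hot data
    if year == WARM_YEAR:
        return "SSD"   # warm data (MemoryXtend in the example)
    return "HDFS"      # cold data, 2017 and earlier


if __name__ == "__main__":
    sample = Trade("SEC1", date(2018, 6, 15), 10.0, 9.5, 10.4, 10.1, 250)
    print(tier_for(sample))
```

Keeping the rule as a pure function of the record's timestamp mirrors the point made above: placement is driven by a declared policy, not by application code that has to know where each record lives.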
Here’s a snapshot of the data lake lifecycle policy with AnalyticsXtreme:
Here’s a snapshot of the caching policy with MemoryXtend:
The InsightEdge JDBC interpreter provides a unified SQL and Spark API across speed and batch layers: %insightedge_jdbc
Decoupling the API from the data modeling and architecture results in simplified application development.
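To make the idea of one query interface over two layers concrete, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in. It is not the InsightEdge JDBC interpreter and does not use any GigaSpaces API; the table names `trades_speed` and `trades_batch`, the view `trades`, the `count_trades` helper and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory database standing in for both layers (an assumption for
# illustration; the real speed and batch layers are separate systems).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trades_speed (security_id TEXT, trade_date TEXT, volume INTEGER);
    CREATE TABLE trades_batch (security_id TEXT, trade_date TEXT, volume INTEGER);
    -- One logical view over both layers, so callers never pick a layer.
    CREATE VIEW trades AS
        SELECT * FROM trades_speed
        UNION ALL
        SELECT * FROM trades_batch;
""")

# Made-up sample rows: recent trades in the speed layer, older in batch.
conn.executemany("INSERT INTO trades_speed VALUES (?, ?, ?)", [
    ("SEC1", "2019-02-01", 100),
    ("SEC2", "2018-06-15", 250),
])
conn.executemany("INSERT INTO trades_batch VALUES (?, ?, ?)", [
    ("SEC1", "2016-03-10", 75),
    ("SEC1", "2017-11-20", 40),
    ("SEC2", "2015-01-05", 90),
])


def count_trades(first_year: int, last_year: int) -> int:
    """Count trades in an inclusive year range through the unified view."""
    row = conn.execute(
        "SELECT COUNT(*) FROM trades "
        "WHERE CAST(substr(trade_date, 1, 4) AS INTEGER) BETWEEN ? AND ?",
        (first_year, last_year),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(count_trades(2015, 2019))
```

The application code only ever refers to `trades`. Which layer holds a given row can change without touching the query, which is the decoupling this section describes.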
Here’s a view of a query running on Zeppelin, counting the number of trades from 2010 to 2017 via InsightEdge:
Queries run on 2019 (HOT) data were executed in 6 milliseconds
Queries run on 2018 (WARM) data were executed in 72 milliseconds
Queries run on 2015-2017 (COLD) data were executed in 2469 milliseconds
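For a rough sense of scale, the ratios between these three timings can be worked out directly. The sketch below only divides the figures quoted above; they are single reported measurements, not a benchmark, and the `slowdown` helper is a name chosen for illustration.

```python
# Query latencies in milliseconds, as reported above for each tier.
latency_ms = {"hot": 6, "warm": 72, "cold": 2469}


def slowdown(slower: str, faster: str) -> float:
    """Return how many times slower one tier's query was than another's."""
    return latency_ms[slower] / latency_ms[faster]


if __name__ == "__main__":
    print(round(slowdown("warm", "hot"), 1))   # 12.0
    print(round(slowdown("cold", "warm"), 1))  # 34.3
    print(round(slowdown("cold", "hot"), 1))   # 411.5
```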
AnalyticsXtreme is available as part of the InsightEdge in-memory computing platform, which provides high-throughput ACID transactional processing, stream processing, and co-location of applications and analytics, enabling organizations to act on time-sensitive data the moment it is created, with millisecond performance.
AnalyticsXtreme is generally available (GA) as part of InsightEdge release 14.2. To learn more about the solution, watch the recorded webinar hosted with 451 Research.
Watch for our upcoming blog post, which will dive into Lambda architecture complexities and limitations and explain how AnalyticsXtreme simplifies and accelerates your real-time analytics initiatives so that you can become insight-driven.