Capturing Operational Intelligence from Unstructured Data

Ali Hodroj March 31, 2017
5 minutes read

According to a recent IDC estimate, unstructured data accounts for more than 80% of enterprise data by volume. This massive corpus includes call center transcripts, product reviews, feedback forms, support case descriptions, social media posts, and blog articles. In the age of IoT, customers are no longer the only source of unstructured data: sensors and network equipment generate log files and other valuable information. Leaving most of this data untapped keeps enterprises from gaining visibility and insight into customer-facing business operations.

Such data growth places heavy performance and scalability demands on data lake and analytics infrastructure. By combining an in-memory computing architecture (such as GigaSpaces XAP) with distributed analytics and machine learning (GigaSpaces InsightEdge), in-memory data grids can query and analyze live text feeds, operationalizing unstructured data lakes in real time.

Applications of Real-time Text Analysis

Mining an endless stream of textual data can surface deeply hidden insights across many industries. Organizations can apply in-memory data processing along with text mining algorithms to improve customer experience, reduce churn, and predict future customer demands. Let’s consider some use cases where real-time text analytics can move the needle:

Financial Services

Fraud detection is a billion-dollar problem in finance, affecting consumers and banks alike. Financial firms can analyze call center records and voice transcripts, and combine them with geospatial data, to detect and prevent fraud through predictive analytics and machine learning techniques.

Retail

Monitoring product reviews and public activity on social media is now a business necessity for the omnichannel retailer. Retailers want to track topics about their brand that are trending on Twitter for real-time channel retargeting. They also want to be informed instantly when a customer posts something with negative sentiment about their brand.

Healthcare

As healthcare becomes more digital, the accuracy of ambulatory and hospital patient records is critical, and structuring health records is a key requirement for improving the quality of care. Healthcare organizations can analyze unstructured physician notes in real time to predict epidemic outbreaks and feed accurate medical decision support algorithms.

Why Data Lake + Search Engine is Not Enough

The prevailing approach to analyzing unstructured data is to build a data lake architecture. A data lake is a large-scale data store that holds vast amounts of unstructured data, to be transformed and analyzed when needed. In data warehousing terms, a data lake implements an Extract-Load-Transform (ELT) data pipeline. Consequently, the data lake will host hundreds (if not thousands) of terabytes of unstructured data (JSON files, text files, logs), which makes HDFS a common choice for the data store. In such an architecture, search is ultimately a necessary component for both information indexing/retrieval and data catalog discovery.

While popular search engines (Solr, Elasticsearch) are great at managing and indexing data lake contents, they are not built for low-latency, event-driven, real-time text search against flowing streams of data, which is what the above use cases require. What we need is the ability to ingest, consume, and analyze billions of unstructured data points and seamlessly execute continuous real-time queries against them, generating contextual insights that are immediately accessible to customer-facing applications.
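To make the idea of a continuous query concrete, here is a minimal sketch in plain Python. It is not the XAP API or any search engine's API; the class name `ContinuousQueryEngine`, its methods, and the keyword-only matching rule are all assumptions made for illustration. Standing queries are registered once, and every incoming document is evaluated against them on arrival, rather than being stored first and searched later.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class ContinuousQueryEngine:
    """Hypothetical engine: holds standing queries, matches documents on arrival."""

    def __init__(self) -> None:
        # term -> ids of the standing queries that mention that term
        self._queries_by_term: Dict[str, Set[str]] = defaultdict(set)
        self._terms: Dict[str, Set[str]] = {}
        self._callbacks: Dict[str, Callable[[str, str], None]] = {}

    def register(self, query_id: str, terms: Iterable[str],
                 callback: Callable[[str, str], None]) -> None:
        """Register a standing query that fires when ALL terms appear in a document."""
        wanted = {t.lower() for t in terms}
        self._terms[query_id] = wanted
        self._callbacks[query_id] = callback
        for term in wanted:
            self._queries_by_term[term].add(query_id)

    def ingest(self, document: str) -> List[str]:
        """Evaluate one incoming document and fire the callbacks of matching queries."""
        tokens = tokenize(document)
        # Only queries sharing at least one term with the document can match.
        candidates: Set[str] = set()
        for token in tokens:
            candidates |= self._queries_by_term.get(token, set())
        matched = sorted(q for q in candidates if self._terms[q] <= tokens)
        for query_id in matched:
            self._callbacks[query_id](query_id, document)
        return matched
```

A caller would register a query such as `engine.register("fraud", ["card", "stolen"], notify)` and then call `engine.ingest(text)` for each transcript or post as it arrives. The cost per document depends on the number of its tokens and the queries that share them, not on how much data has already flowed through.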

XAP 12.1 Search and Query: Operationalizing Data Lake Intelligence

Because an in-memory data grid consolidates the storage of data in RAM and Flash with the processing of business logic in the same runtime space, real-time and event-driven text analysis can be accomplished in milliseconds, as opposed to the minutes or hours it takes using a traditional search engine.

With XAP 12.1, we’ve extended the data modeling capabilities of our in-memory data grid to allow for full-text indexing against in-memory data. This new capability, along with the others below, provides the core foundation of an operationalized data lake:

In-Memory Full-Text Indexing: XAP 12.1 introduces a Full-Text Search API based on Lucene indexes and analyzers, so users can run search queries (wildcard, fuzzy match) in memory at high throughput and low latency. Combined with XAP’s event-driven container API, applications can trigger events and messages in the moment based on text search criteria.

Hybrid RAM/Flash Data Processing: To expand the in-memory data grid footprint beyond a few terabytes, XAP provides a multi-tiered data storage architecture (also known as MemoryXtend) that can scale low-latency data processing between RAM and Flash arrays across hundreds of terabytes.

Apache Spark Integration via InsightEdge: The core in-memory data grid engine that powers XAP can also be used through any Apache Spark API. This means that any geospatial, full-text, or structured data query result automatically becomes an RDD or a DataFrame. For a data lake architecture, this provides very efficient ad-hoc, in-memory data ingestion and transformation from HDFS to XAP through Spark jobs.
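
As a rough illustration of what in-memory full-text indexing with wildcard and fuzzy matching involves, the sketch below builds a small inverted index in plain Python. It is a stand-in built on the standard library (`fnmatch` for wildcards, `difflib` for approximate matching), not XAP's Full-Text Search API and not Lucene; the class `InMemoryTextIndex` and its method names are hypothetical.

```python
import difflib
import fnmatch
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class InMemoryTextIndex:
    """Hypothetical inverted index kept entirely in memory: term -> document ids."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        """Index a document by each of its lowercase word tokens."""
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[term].add(doc_id)

    def _docs_for(self, terms: Iterable[str]) -> List[int]:
        """Union the postings of the given indexed terms, sorted by document id."""
        ids: Set[int] = set()
        for term in terms:
            ids |= self._postings[term]
        return sorted(ids)

    def wildcard(self, pattern: str) -> List[int]:
        """Match indexed terms against a shell-style pattern such as 'refund*'."""
        terms = fnmatch.filter(list(self._postings), pattern.lower())
        return self._docs_for(terms)

    def fuzzy(self, term: str, cutoff: float = 0.8) -> List[int]:
        """Match indexed terms whose similarity ratio to `term` is at least `cutoff`."""
        vocabulary = list(self._postings)
        terms = difflib.get_close_matches(
            term.lower(), vocabulary, n=max(len(vocabulary), 1), cutoff=cutoff
        )
        return self._docs_for(terms)
```

With three short support tickets indexed, `index.wildcard("refund*")` returns the documents that mention "refund" or "refunds", and `index.fuzzy("routr")` still finds the documents that mention "router". A production index would add analyzers, scoring, and an edit-distance automaton; this sketch only shows the shape of the lookups.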

Pairing an in-memory data grid (such as GigaSpaces XAP) with high-performance distributed analytics tools (GigaSpaces InsightEdge) scales unstructured data and real-time text processing across many computing nodes. The results can be processed at millisecond latencies to produce meaningful insights at the speed of business. Such performance and scalability are among the primary enablers of democratizing insight from data lakes.

 
