Rapid Data Load


Shay Hassidim July 31, 2009
2 minutes read

I was recently involved in a POC where we had to load a large amount of data into the Data Grid and later run some complex queries against it. I took a flight from NY to LA to present the POC and really had only a few hours on the plane to build the POC code from scratch. Building the query code was easy: create the relevant space domain POJO classes (based on the database tables the prospect provided), implement an executor that performs the SQL query and returns the relevant data set, and finally reduce the results on the client side. The ABC of Map-Reduce – Cool.
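The execute-per-partition-then-reduce shape described above can be sketched in plain Java. This is an illustrative toy, not the actual GigaSpaces executor API: the class and method names here are hypothetical, and the "partitions" are just in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy map/reduce over simulated partitions.
// Names are hypothetical, not the GigaSpaces API.
public class MapReduceSketch {

    // "map": each partition filters its own slice of the data
    static List<String> executeOnPartition(List<String> partitionData, String currency) {
        List<String> matches = new ArrayList<>();
        for (String record : partitionData) {
            if (record.startsWith(currency)) {
                matches.add(record);
            }
        }
        return matches;
    }

    // "reduce": the client merges the per-partition result sets
    static List<String> reduce(List<List<String>> partialResults) {
        List<String> merged = new ArrayList<>();
        for (List<String> partial : partialResults) {
            merged.addAll(partial);
        }
        return merged;
    }
}
```

In the real POC the "map" step ran as an executor task inside each partition against the space, and only the reduced result reached the client.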

The problem was how to load a large amount of data without the hassle of creating a database and loading its contents into the Data Grid. A simple solution I managed to build very quickly was a data generator utility that simply pushes data into the Data Grid. The nice thing here is that I could adapt this data generator to push data into a specific Data Grid partition based on a given partition ID. This is how I could imitate what happens when a remote client writes data into the Data Grid, or when data is loaded from a database into the Data Grid once it is started.

The idea was simple: since the data was partitioned on the Currency field (it was a market data application POC), I created groups of currencies (based on their hash codes) that belong to the same logical partition. The number of groups was identical to the number of Data Grid partitions (identified at runtime – so the code was totally dynamic). The data generator would pick a random currency from the list of currencies belonging to a specific group. The only thing that needed to be passed to the data generator was the partition ID.
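The grouping step can be sketched as follows. This is a self-contained illustration, assuming hash-modulo routing (hash code modulo partition count), which is conceptually how content-based routing maps a routing field to a partition; the class and method names are mine, not a GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Illustrative sketch: group currencies by hash code so each group maps to
// one logical partition. The partition count is discovered at runtime in the
// real code; here it is just a parameter.
public class CurrencyGroups {

    static List<List<String>> groupByPartition(List<String> currencies, int partitions) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            groups.add(new ArrayList<>());
        }
        for (String currency : currencies) {
            // hash-modulo routing: hash code modulo partition count
            int partitionId = Math.abs(currency.hashCode()) % partitions;
            groups.get(partitionId).add(currency);
        }
        return groups;
    }

    // the generator picks a random currency belonging to the target partition
    static String pickCurrency(List<List<String>> groups, int partitionId, Random rnd) {
        List<String> group = groups.get(partitionId);
        return group.get(rnd.nextInt(group.size()));
    }
}
```

Any object written with a currency picked from group N then routes to partition N, so each generator instance only ever produces data for its own partition.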

To trigger the data load process I used… yes – an Executor implementation. Since the Task implementation runs within the partition, it can retrieve its “hosting” partition ID and pass it to the data generator to create data that fits the hosting partition.
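A minimal sketch of that task, again self-contained and hypothetical rather than the real GigaSpaces Task API: the runtime would supply the hosting partition ID, and the task generates only routing values that hash back to its own partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the GigaSpaces Task API: each task runs inside
// one partition, learns its hosting partition ID, and generates only
// entries whose routing field (Currency) hashes to that partition.
public class DataLoadTask {
    static final String[] CURRENCIES =
            {"USD", "EUR", "GBP", "JPY", "CHF", "AUD", "CAD", "NZD"};

    final int hostingPartitionId;  // provided by the runtime in the real code
    final int partitionCount;

    DataLoadTask(int hostingPartitionId, int partitionCount) {
        this.hostingPartitionId = hostingPartitionId;
        this.partitionCount = partitionCount;
    }

    // produce 'count' routing values that all land on this task's partition
    List<String> generate(int count) {
        List<String> local = new ArrayList<>();
        for (String c : CURRENCIES) {
            if (Math.abs(c.hashCode()) % partitionCount == hostingPartitionId) {
                local.add(c);
            }
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(local.get(i % local.size()));  // round-robin over matching currencies
        }
        return out;
    }
}
```

Broadcasting one such task to every partition kicks off the load everywhere at once, with each partition generating its own data locally – no network hop per object.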

With the above technique I managed to load a large amount of data (a few million objects) within a few seconds (pushing about 100,000 objects per second into a Data Grid with 4 partitions) on my dual-core Dell laptop.
