Read-Through and Write-Through in Clustered Space

  Search Here
Searching XAP 6.0 Documentation

                                               

Summary: How the ExternalDataSource interface works in both partitioned and replicated clustered spaces.

Overview

This section describes how the ExternalDataSource interface works in both partitioned and replicated clustered spaces.

To enable logging for the ExternalDataSource, edit the <GigaSpaces Root>\config\gs_logging.properties file and set the persistent level to CONFIG or FINER. For more details, refer to the Settings & Configuration section.

Optimistic locking mode is not supported when using ExternalDataSource.

Partitioned Clustered Space

Read-Through in Partitioned Clustered Space

The two diagrams above illustrate how a client application reads from a partitioned clustered space; where the actual data is loaded from a central or non-central data source (i.e., a separate dedicated database instance per space partition), or another external application. The total data stored inside the clustered space is divided into two physical partitions, each in a different JVM, where one of the fields hashcode value inside the Entry determines the partition that stores the Entry.

Each Entry stored inside the primary partition is backed up in a separate dedicated space for a continuous uninterrupted operation in case of "hot failover". In any case, the data is also stored in the database. Each space partition has a ExternalDataSource implemented, which allows the space to load relevant data from the database.

When a partitioned space works with a central data source, each partition is protected from reading data from the database that does not reside in that partition.

The operation proceeds in two stages:

  1. The client application calls the JavaSpaces API read to get object A.
    An example for the client application call:
    Person template = new Person();
    template.setId(ID);
    template.setFirstName("firstName");
    Person person = (Person)space.read(template, null, timeout);
  2. The JavaSpaces API uses the template.id hashcode to route the operation to Partition 1 and initiates data lookup in that partition for a matching Entry. The first matching Entry found from the Person class or its sub-classes instances is returned. If no such Entry is found, the ExternalDataSource interface method read in Partition 1 is called by the space, passing as a parameter the template data. The read method gets the required record data from the database, creates a relevant Person object with the Person class data and returns it back to the space. The space returns the result object to the client implicitly.

    Similarly, object B is read first from Partition 2 using a relevant template. If it is not found, it is read from the database by the partition's ExternalDataSource.read method.


Write-Through in Partitioned Clustered Space

The two diagrams below illustrate how a client application writes to a partitioned clustered space and persists the data in a distributed or central database, or another external application. Each partition has an ExternalDataSource implementation that allows the space to store the data in the database.

The operation proceeds in two stages:

  1. The client application calls the JavaSpace API write method to write object A.
    An example for the client application call:
    Lease lease = space.write(person, null, Lease.FOREVER);
  2. The JavaSpaces API uses the personID field hashcode to determine that Entry A should be located in Partition 1 (JVM2), and writes A into that partition. Entry A is then backed up in the dedicated space in JVM 1. The ExternalDataSource interface method write in partition 1 is called by the space, passing the original data as a parameter. The write method writes the required data to the database in record format.

    Similarly, object B is written first to Partition 2, backed up in the dedicated space in JVM 4, and written to the database by the partition's ExternalDataSource.write method.


Make sure the <fail-over-policy> tag is not used as part of the partitioned schema, or have the failover <policy-type> to be fail-to-backup.

Not enabling the settings above results in an incorrect data load from the database, and incorrect data in the partitions.

A partitioned space using the ExternalDataSource implementation with a central database should have the following displayed when started:

Space schema:

<external-data-source>
     <data-source-class>com.gigaspaces.datasource.hibernate.HibernateDataSource</data-source-class>
     <data-class>class java.lang.Object</data-class>
     <supports-inheritance>true</supports-inheritance>
     <supports-version>false</supports-version>
     <usage>read-write</usage>
</external-data-source>

Cluster schema:

<cache-loader>
    <external-data-source>true</external-data-source>
    <central-data-source>true</central-data-source>
</cache-loader>

Optimization – Loading Data Specific to Partition

To boost the pre-load phase, each partition would need to Query the database using its partition ID provided as part of the ManagedDataSource.STATIC_PARTITION_NUMBER. This will make sure each partition retrieves the exact result set from the database when loading data back into the partition.

The ExternalDataSource interface should be implemented with its initialLoad() method to return an implementation of the com.gigaspaces.datasource.DataIterator that allows you to load into the space the relevant data set. When running partitioned space you need to load the specific data set the partition need to store. This means your database query needs to "slice" the correct data set from the database based on the partition ID. The partition ID can be retrieved from the ExternalDataSource.init(Properties) in the following manner:

public void init(Properties prop) throws DataSourceException
     {
        int numberOfPartitions=((Integer)prop.get(ManagedDataSource.NUMBER_OF_PARTITIONS)).intValue();
 
        //load the data when the hashcode of the routing index MOD numberOfPartitions==partitionNumber-1
        int partitionNumber=((Integer)prop.get(ManagedDataSource.STATIC_PARTITION_NUMBER)).intValue();
     }
 }

For example: if you have a Person class that maps to the Person table and have the PERSON_ID as the routing field , the query each partition would need to perform to fetch the correct result set to load into its space would be:

Select * from Person where "MOD(PERSON_ID," + numberOfPartitions + ") = " + (partitionNumber -1)
Make sure the routing field (i.e. PERSON_ID) will be an Integer type.

This query should be called from the ExternalDataSource.initialLoad() implementation to retrieve the relevant database result set the space should load. The query involves the space partition ID and the relevant table column to retrieve the correct rows. Since each space partition stores a subset of the data , based on the entry routing field hash code value , you need to load the data from the database in the same manner the client load balance the data when interacting with the different partitions.

The database query using the MOD , PERSON_ID , numberOfPartitions and the partitionNumber is identical to the activity done by a space client when performing write/read/take operations with partitioned space.

prop.get(ManagedDataSource.STATIC_PARTITION_NUMBER) first number will be 1. If the space is not partitioned it will return 0.
prop.get(ManagedDataSource.NUMBER_OF_PARTITIONS) will return 1 for non partitioned space.

Replicated Clustered Space

This section shows how the ExternalDataSource interface works in a replicated clustered space.

Read-Through in Replicated Clustered Space

The two diagrams below illustrate how a client application reads from a replicated clustered space, where the actual data is loaded from a distributed or a central data source or another external application. Each space has a ExternalDataSource implementation that provides the space the ability to load data from the database when it is not found inside the space.

The operation proceeds in two stages:

  1. Entry B was previously written to Location 2 and replicated to Location 1.
  2. The client application in Location 1 calls the JavaSpaces API read to get object A.
    An example for the client application call:
    Person person = (Person)space.read(templateA, null, timeout);

    The space initiates data lookup in the space at Location 1, for a matching Entry. The first matching Entry found from the Person class or its sub-classes instances is returned. If no such Entry is found, the ExternalDataSource interface method read in Location 1 is called by the space, passing as a parameter the template data. The read method gets the required record data from the database and returns it back to the space. The space returns the result Entry to the client implicitly. Entry B is read by the client application from the space at Location 1, where it reads the copy of Entry B that was replicated from the original in Location 2. If not found, it is read from the database by the space ExternalDataSource.read method.

    For a distributed database, the database in each location maintains a copy of all the data in both locations, so that a client application backs up and retrieves data from the database in the nearest location.


Write-Through in Replicated Clustered Space

When a replicated space is using the ExternalDataSource implementation with a central data source configuration, write, take, or update operations are not replicated from the primary space to the backup spaces.

Once the backup space becomes active and applications can access it directly, it loads data from the database using its own ExternalDataSource implementation. This ensures data coherency and provides better performance.

When a replicated space uses the ExternalDataSource implementation with a non-central data source configuration – i.e. each space uses a different database instance; the write, take, or update operations are replicated from the primary space to the backup spaces, and persisted into the backup space database.

In both configurations (central and non-central data source), when data is loaded into the active space from the database using its ExternalDataSource, the loaded Entries are not replicated into the replica spaces.


Configuration

When a clustered space is using the ExternalDataSource implementation, you should start all nodes using the following property:

com.gs.cluster.cache-loader.external-data-source=true

When a clustered space is using a central database for all nodes, you should start all nodes using the following property:

com.gs.cluster.cache-loader.central-data-source=true

When a clustered space is using different database instances for each space instance, you should start each node using the following property:

com.gs.cluster.cache-loader.central-data-source=false

The following table summaries the different options:

Cache Policy Central Data Source Recovery works Amount of data loaded Data filtering done as part of initialLoad
LRU YES NO Up to amount of initial load percentage value YES
ALL_IN_CACHE YES YES All database data YES
LRU NO YES Up to amount of initial load percentage value NO
ALL_IN_CACHE NO YES All database data NO
  • Up to amount of initial load percentage value (50%) - means percentage of cache_size value.


IMPORTANT: This is an old version of GigaSpaces XAP. Click here for the latest version.
GigaSpaces 6.0 Documentation Contents (Current Page in Bold)

    Java

    C++

    .NET

    Middleware Capabilities

    Configuration and Management

Add GigaSpaces wiki search to your browser search engines!
(works on Firefox 2 and Internet Explorer 7)

Labels