Summary: How the ExternalDataSource interface works in both partitioned and replicated clustered spaces.
Overview
This section describes how the ExternalDataSource interface works in both partitioned and replicated clustered spaces.
Partitioned Clustered Space
Read-Through in Partitioned Clustered Space
The two diagrams above illustrate how a client application reads from a partitioned clustered space, where the actual data is loaded from a central data source, a non-central data source (i.e., a separate dedicated database instance per space partition), or another external application. The data stored inside the clustered space is divided into two physical partitions, each in a different JVM; the hash code value of one of the Entry's fields determines which partition stores the Entry. Each Entry stored in a primary partition is backed up in a separate dedicated space, so that operation continues uninterrupted in case of a "hot failover". In every case, the data is also stored in the database. Each space partition has an ExternalDataSource implementation, which allows the space to load the relevant data from the database.
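The routing rule described above can be sketched in a few lines of Java. This is a minimal illustration, not the product's actual routing code: the PartitionRouter class and its method are hypothetical names, and the use of Math.floorMod (to keep negative hash codes in range) is an assumption made for the sketch.

```java
// Hypothetical helper for illustration; not part of the GigaSpaces API.
final class PartitionRouter {
    private final int numberOfPartitions;

    PartitionRouter(int numberOfPartitions) {
        if (numberOfPartitions <= 0) {
            throw new IllegalArgumentException("numberOfPartitions must be positive");
        }
        this.numberOfPartitions = numberOfPartitions;
    }

    /** Returns the 0-based index of the partition that would store the Entry. */
    int partitionFor(Object routingValue) {
        // Assumption: floorMod keeps the result in [0, numberOfPartitions)
        // even when the hash code is negative.
        return Math.floorMod(routingValue.hashCode(), numberOfPartitions);
    }
}
```

With two partitions, as in the diagrams, an Integer routing value of 7 maps to index 1 and a value of 10 maps to index 0, because Integer.hashCode() returns the value itself.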
The operation proceeds in two stages:
Write-Through in Partitioned Clustered Space
The two diagrams below illustrate how a client application writes to a partitioned clustered space and persists the data in a distributed or central database, or another external application. Each partition has an ExternalDataSource implementation that allows the space to store the data in the database. The operation proceeds in two stages:
Not enabling the settings above results in an incorrect data load from the database, and incorrect data in the partitions. A partitioned space using the ExternalDataSource implementation with a central database should display the following when started:
Space schema:
    <external-data-source>
        <data-source-class>com.gigaspaces.datasource.hibernate.HibernateDataSource</data-source-class>
        <data-class>class java.lang.Object</data-class>
        <supports-inheritance>true</supports-inheritance>
        <supports-version>false</supports-version>
        <usage>read-write</usage>
    </external-data-source>
Cluster schema:
    <cache-loader>
        <external-data-source>true</external-data-source>
        <central-data-source>true</central-data-source>
    </cache-loader>
Optimization – Loading Data Specific to Partition
To speed up the pre-load phase, each partition needs to query the database using its partition ID, which is provided through ManagedDataSource.STATIC_PARTITION_NUMBER. This ensures that each partition retrieves exactly the result set it should store when loading data back into the partition. The initialLoad() method of the ExternalDataSource implementation should return an implementation of com.gigaspaces.datasource.DataIterator that loads the relevant data set into the space. When running a partitioned space, each partition must load only the data set it needs to store, so the database query has to "slice" the correct data set from the database based on the partition ID.
The partition ID can be retrieved in ExternalDataSource.init(Properties) in the following manner:
    public void init(Properties prop) throws DataSourceException {
        // Total number of partitions in the cluster
        int numberOfPartitions = ((Integer) prop.get(ManagedDataSource.NUMBER_OF_PARTITIONS)).intValue();
        // Number of this partition; the load condition below compares against partitionNumber - 1
        int partitionNumber = ((Integer) prop.get(ManagedDataSource.STATIC_PARTITION_NUMBER)).intValue();
        // Load the data for which:
        // hashcode of the routing index MOD numberOfPartitions == partitionNumber - 1
    }
For example: if you have a Person class that maps to the Person table and uses PERSON_ID as the routing field, the query each partition needs to perform to fetch the correct result set to load into its space is:
    "SELECT * FROM Person WHERE MOD(PERSON_ID, " + numberOfPartitions + ") = " + (partitionNumber - 1)
This query should be called from the ExternalDataSource.initialLoad() implementation to retrieve the database result set that the space should load. The query combines the space partition ID with the relevant table column to select the correct rows. Since each space partition stores a subset of the data, based on the hash code value of the Entry's routing field, you need to load the data from the database in the same manner in which the client load-balances the data when interacting with the different partitions. The database query using MOD, PERSON_ID, numberOfPartitions, and partitionNumber performs the same calculation a space client performs when executing write/read/take operations against a partitioned space.
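To make the slicing concrete, the following sketch builds the per-partition query string and applies the same condition to an in-memory list of IDs. The PartitionSlice class and its methods are hypothetical names introduced for illustration; the sketch assumes non-negative integer IDs and a partitionNumber that starts at 1, as the "partitionNumber - 1" term in the query suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the GigaSpaces API.
final class PartitionSlice {
    /** Builds the query for one partition; table and column names must come from trusted configuration. */
    static String buildQuery(String table, String routingColumn,
                             int numberOfPartitions, int partitionNumber) {
        return "SELECT * FROM " + table
                + " WHERE MOD(" + routingColumn + ", " + numberOfPartitions + ") = "
                + (partitionNumber - 1);
    }

    /** In-memory equivalent of the WHERE clause, valid for non-negative ids. */
    static List<Integer> slice(List<Integer> ids, int numberOfPartitions, int partitionNumber) {
        List<Integer> result = new ArrayList<>();
        for (int id : ids) {
            if (id % numberOfPartitions == partitionNumber - 1) {
                result.add(id);
            }
        }
        return result;
    }
}
```

For IDs 1 through 5 and two partitions, partition 1 would receive 2 and 4 while partition 2 would receive 1, 3, and 5, so the slices are disjoint and together cover every row.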
Replicated Clustered Space
This section shows how the ExternalDataSource interface works in a replicated clustered space.
Read-Through in Replicated Clustered Space
The two diagrams below illustrate how a client application reads from a replicated clustered space, where the actual data is loaded from a distributed or a central data source or another external application. Each space has an ExternalDataSource implementation that provides the space the ability to load data from the database when it is not found inside the space. The operation proceeds in two stages:
Write-Through in Replicated Clustered Space
Once the backup space becomes active and applications can access it directly, it loads data from the database using its own ExternalDataSource implementation. This ensures data coherency and provides better performance. When a replicated space uses the ExternalDataSource implementation with a non-central data source configuration (i.e., each space uses a different database instance), the write, take, or update operations are replicated from the primary space to the backup spaces and persisted into the backup space database. In both configurations (central and non-central data source), when data is loaded into the active space from the database using its ExternalDataSource, the loaded Entries are not replicated into the replica spaces.
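The non-central behavior described above can be pictured with a toy model. This is only a sketch: ToySpace and everything in it are hypothetical, plain maps stand in for the space memory and for each space's own database instance, and only the two rules stated in the text are modeled (writes are replicated and persisted by the replica; entries loaded from the database are not replicated).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model for illustration only; none of these types exist in GigaSpaces.
final class ToySpace {
    final Map<String, String> memory = new HashMap<>();   // entries held in the space
    final Map<String, String> database = new HashMap<>(); // this space's own database instance
    private ToySpace replica;                              // replication target, may be null

    void replicateTo(ToySpace target) {
        this.replica = target;
    }

    /** Write-through: store and persist locally, then replicate the operation. */
    void write(String key, String value) {
        applyWrite(key, value);
        if (replica != null) {
            replica.applyWrite(key, value); // the replica persists into its own database
        }
    }

    private void applyWrite(String key, String value) {
        memory.put(key, value);
        database.put(key, value);
    }

    /** Read-through: on a miss, load from this space's database; the load is not replicated. */
    String read(String key) {
        String value = memory.get(key);
        if (value == null) {
            value = database.get(key);
            if (value != null) {
                memory.put(key, value);
            }
        }
        return value;
    }
}
```

In this model, a write to the primary shows up in both the memory and the database of the backup, whereas an entry that the primary loads from its database on a read miss stays local to the primary.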
Configuration
When a clustered space is using the ExternalDataSource implementation, you should start all nodes using the following property:
    com.gs.cluster.cache-loader.external-data-source=true
When a clustered space is using a central database for all nodes, you should start all nodes using the following property:
    com.gs.cluster.cache-loader.central-data-source=true
When a clustered space is using different database instances for each space instance, you should start each node using the following property:
    com.gs.cluster.cache-loader.central-data-source=false
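As a small illustration of combining these settings, the sketch below assembles both properties for a given topology and renders them as -D style arguments. The CacheLoaderFlags class is a hypothetical helper, and passing the properties as JVM system properties is an assumption; check how your deployment supplies cluster properties before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; only the two property names come from the text above.
final class CacheLoaderFlags {
    static final String EXTERNAL_DATA_SOURCE = "com.gs.cluster.cache-loader.external-data-source";
    static final String CENTRAL_DATA_SOURCE = "com.gs.cluster.cache-loader.central-data-source";

    /** Properties for a clustered space that uses an ExternalDataSource implementation. */
    static Map<String, String> properties(boolean centralDatabase) {
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put(EXTERNAL_DATA_SOURCE, "true");
        props.put(CENTRAL_DATA_SOURCE, String.valueOf(centralDatabase));
        return props;
    }

    /** Renders the properties as -Dkey=value arguments (assumed way of passing them). */
    static String asJvmArguments(boolean centralDatabase) {
        StringBuilder args = new StringBuilder();
        for (Map.Entry<String, String> e : properties(centralDatabase).entrySet()) {
            if (args.length() > 0) {
                args.append(' ');
            }
            args.append("-D").append(e.getKey()).append('=').append(e.getValue());
        }
        return args.toString();
    }
}
```

Calling asJvmArguments(false) yields the combination for a cluster where each space instance has its own database instance; asJvmArguments(true) yields the central-database combination.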
The following table summarizes the different options:







