Summary: A Mirror is an asynchronous replica of data sent by spaces in a cluster. This data is asynchronously batched into the Mirror Service, which interacts with its configured data-source.
Overview
A Mirror is a centralized asynchronous replica of data sent by spaces in a cluster. This data is asynchronously batched into the Mirror Service, which interacts with its configured data source. The default Mirror Service implementation uses Hibernate (RDBMS) to interact with an external data source. You can implement your own Mirror Service that interacts with any data source of your choice.

The Mirror Service is an implementation of the data-cache persistency modules. The com.gigaspaces.datasource.BulkDataPersister interface is the preferred way to persist asynchronously batched data into a data source, because the whole batch is persisted under one transactional context. Other interfaces (see the com.gigaspaces.datasource APIs) persist data and handle each piece of data under its own transactional context. Thus, asynchronous batches maximize efficiency when interacting with the external data source. This activity is sometimes referred to as a write-behind operation.

By nature, asynchronous replication is not reliable, and data might not be replicated if the source space terminates unexpectedly. Reliability can be achieved only in the presence of other synchronously replicated spaces. Achieving reliability is discussed in more detail later.
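To make the idea of persisting a whole batch under one transactional context more concrete, here is a minimal, self-contained sketch. It does not use any GigaSpaces classes: SketchOp and SketchBulkStore are hypothetical stand-ins, and the in-memory map only imitates a data source. The point is the all-or-nothing behavior of the bulk, not the actual Mirror Service implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one batched operation; not a GigaSpaces type. */
final class SketchOp {
    enum Kind { WRITE, REMOVE }

    final Kind kind;
    final String key;
    final String value; // ignored for REMOVE

    SketchOp(Kind kind, String key, String value) {
        this.kind = kind;
        this.key = key;
        this.value = value;
    }
}

/** Hypothetical in-memory "data source" that applies a whole bulk atomically. */
final class SketchBulkStore {
    private Map<String, String> rows = new HashMap<>();

    /** Applies every operation or none: work on a copy, swap it in on success. */
    void executeBulk(List<SketchOp> bulk) {
        Map<String, String> working = new HashMap<>(rows); // "begin transaction"
        for (SketchOp op : bulk) {
            if (op.key == null) {
                // Failing before the swap leaves 'rows' untouched ("rollback").
                throw new IllegalArgumentException("operation without a key");
            }
            if (op.kind == SketchOp.Kind.WRITE) {
                working.put(op.key, op.value);
            } else {
                working.remove(op.key);
            }
        }
        rows = working; // "commit"
    }

    int size() {
        return rows.size();
    }

    String get(String key) {
        return rows.get(key);
    }

    public static void main(String[] args) {
        SketchBulkStore store = new SketchBulkStore();
        store.executeBulk(Arrays.asList(
                new SketchOp(SketchOp.Kind.WRITE, "a", "1"),
                new SketchOp(SketchOp.Kind.WRITE, "b", "2")));
        System.out.println("rows after first bulk: " + store.size());
    }
}
```

With a per-item transactional context, a failure halfway through the list would leave the earlier items persisted; here a failing bulk leaves the store exactly as it was.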
Mirror Service as Space
The Mirror Service implementation replaces the storage back-end of the space.

Loading the Topology
The Mirror Service is loaded just like any other space, with the additional schema property set so that mirror-space-schema.xml is the schema used. There are two ways to load a Mirror Service space:
The Mirror Service configuration only requires that other cluster members know that it is enabled. Cluster members do not need to explicitly define the Mirror as a member in their replication group. There are two ways to enable the Mirror Service:
Viewing the Topology
The GigaSpaces Management Center displays the Mirror Service space in the Cluster view alongside the replication group's cluster members. The Mirror Service, like any other member, has its status displayed (alive - green nodes, and unreachable - yellow nodes), which allows you to provision and track failures or network disruptions.
Configuration
The Mirror Service can be configured using the mirror-space-schema.xml file. The external-data-source section in this XML specifies the data-source-class and some inherent properties which communicate what is supported by the implementation.

Overriding the Mirror Service Implementation
By default, the data-source-class is configured to use Hibernate (RDBMS) to interact with an external data source. When implementing your own Mirror Service, you should override this tag's value with the full class name, for example:

    <external-data-source>
        <!-- default is com.gigaspaces.datasource.hibernate.HibernateDataSource -->
        <data-source-class>com.company.datasource.MyMirrorDataSource</data-source-class>
        ...
    </external-data-source>
Communicating Properties
The Mirror Service properties are derived from those configurable for an external data source; see Settings & Configuration.

Configuring Replication
The mirror-service block in the *-cluster.xml file specifies the interaction between the replication group members and the Mirror Service.
The default values are as follows:

    <mirror-service>
        <enabled>false</enabled>
        <url>jini://*/mirror-service_container/mirror-service</url>
        <bulk-size>100</bulk-size>
        <interval-millis>2000</interval-millis>
        <interval-opers>100</interval-opers>
    </mirror-service>
    <repl-policy>
        <repl-original-state>true</repl-original-state>
    </repl-policy>
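The interplay of bulk-size, interval-millis and interval-opers can be pictured with a small write-behind buffer. The sketch below is only one plausible reading of these settings, stated as an assumption: a flush becomes due when either interval-millis has elapsed since the last flush or interval-opers operations have accumulated, and the pending operations are then sent in chunks of at most bulk-size. MirrorBufferSketch and its methods are hypothetical names, not part of the GigaSpaces API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical write-behind buffer; field names mirror the XML tags. */
final class MirrorBufferSketch {
    private final int bulkSize;
    private final long intervalMillis;
    private final int intervalOpers;
    private final List<String> pending = new ArrayList<>();
    private long lastFlushMillis;

    MirrorBufferSketch(int bulkSize, long intervalMillis, int intervalOpers, long nowMillis) {
        this.bulkSize = bulkSize;
        this.intervalMillis = intervalMillis;
        this.intervalOpers = intervalOpers;
        this.lastFlushMillis = nowMillis;
    }

    /** Records one operation and returns the bulks to send now (possibly none). */
    List<List<String>> add(String op, long nowMillis) {
        pending.add(op);
        return flushIfDue(nowMillis);
    }

    /** Also meant to be called from a timer tick: flushes when time or count is due. */
    List<List<String>> flushIfDue(long nowMillis) {
        boolean timeDue = nowMillis - lastFlushMillis >= intervalMillis;
        boolean opersDue = pending.size() >= intervalOpers;
        List<List<String>> bulks = new ArrayList<>();
        if (pending.isEmpty() || !(timeDue || opersDue)) {
            return bulks; // nothing to send yet
        }
        // Split the backlog into chunks of at most bulkSize operations.
        for (int from = 0; from < pending.size(); from += bulkSize) {
            int to = Math.min(from + bulkSize, pending.size());
            bulks.add(new ArrayList<>(pending.subList(from, to)));
        }
        pending.clear();
        lastFlushMillis = nowMillis;
        return bulks;
    }
}
```

Passing the clock in as a parameter keeps the sketch deterministic; a real replicator would read the system clock and run the time check on its own thread.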
Custom Mirror Implementation
The example below (the MirrorBench class) shows a custom Mirror implementation. It calculates the throughput at which events are sent to the Mirror from the primary spaces. Since the events are sent from the primary spaces in an asynchronous, periodic manner, the throughput oscillates between small and large numbers of events per second, i.e. there are periods of high throughput and periods of low throughput. Here is an illustration of the expected behavior:
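The oscillation can also be reproduced with a small simulation that is independent of GigaSpaces. The sketch assumes, purely for illustration, that writers produce operations at a steady rate and that the primary flushes its whole backlog to the Mirror at a fixed time interval; MirrorOscillationSketch and receivedPerSecond are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of bursty delivery to a mirror. */
final class MirrorOscillationSketch {
    /**
     * Returns the number of operations the mirror receives in each one-second
     * window, assuming a steady write rate at the primary and a full flush
     * every flushEverySeconds seconds.
     */
    static List<Integer> receivedPerSecond(int opsPerSecond, int flushEverySeconds, int seconds) {
        List<Integer> received = new ArrayList<>();
        int buffered = 0;
        for (int second = 1; second <= seconds; second++) {
            buffered += opsPerSecond; // writes accumulate at the primary
            if (second % flushEverySeconds == 0) {
                received.add(buffered); // burst: the whole backlog arrives
                buffered = 0;
            } else {
                received.add(0); // quiet window at the mirror
            }
        }
        return received;
    }

    public static void main(String[] args) {
        // prints [0, 1000, 0, 1000, 0, 1000]
        System.out.println(receivedPerSecond(500, 2, 6));
    }
}
```

The writers never slow down in this model; only the delivery to the mirror is bursty, which is the pattern the benchmark output is expected to show.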
The MirrorBench class implements the BulkDataPersister and the ManagedDataSource interfaces.

    package com.j_spaces.examples.datasource;

    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.atomic.AtomicInteger;

    import com.gigaspaces.datasource.BulkDataPersister;
    import com.gigaspaces.datasource.BulkItem;
    import com.gigaspaces.datasource.DataIterator;
    import com.gigaspaces.datasource.DataSourceException;
    import com.gigaspaces.datasource.ManagedDataSource;

    public class MirrorBench implements BulkDataPersister, ManagedDataSource<Person> {
        // Total number of operations received so far.
        AtomicInteger count = new AtomicInteger();
        long lastTime = System.currentTimeMillis();
        int lastVal = 0;

        // Called with each asynchronous bulk sent to the Mirror.
        public void executeBulk(List<BulkItem> bulk) throws DataSourceException {
            int val = count.addAndGet(bulk.size());
            int delta = val - lastVal;
            // Report once more than 1000 operations arrived since the last report.
            if (delta > 1000) {
                lastVal = val;
                long now = System.currentTimeMillis();
                long dur = Math.max(1, now - lastTime); // guard against a zero-length window
                lastTime = now;
                // Operations per second over the window just closed.
                double tp = delta * 1000.0 / dur;
                System.out.println("got:" + val + " operations" + " TP[sec]:" + tp);
            }
        }

        public void init(Properties arg) throws DataSourceException {
            System.out.println("init" + arg);
        }

        // Nothing is loaded into the space from this data source.
        public DataIterator<Person> initialLoad() throws DataSourceException {
            return null;
        }

        public void shutdown() throws DataSourceException {
        }
    }

The MirrorBench would be configured using the following properties (mirror.properties file):

    space-config.persistent.enabled=true
    space-config.persistent.StorageAdapterClass=com.j_spaces.sadapter.datasource.DataAdapter
    space-config.external-data-source.data-source-class=com.j_spaces.examples.datasource.MirrorBench
    space-config.engine.cache_policy=0
    space-config.external-data-source.init-properties-file=/config/mirror/hibernate.properties
The MirrorBench would be started using:

    gsInstance "/./mirror-service?schema=mirror&properties=mirror" "..;..\classes;%JARS%"

The partitioned clustered space would be started using:

    gsInstance "/./mySpace?cluster_schema=partitioned-sync2backup&total_members=2,1&id=1&mirror"
    gsInstance "/./mySpace?cluster_schema=partitioned-sync2backup&total_members=2,1&id=2&mirror"
    gsInstance "/./mySpace?cluster_schema=partitioned-sync2backup&total_members=2,1&id=1&backup_id=1&mirror"
    gsInstance "/./mySpace?cluster_schema=partitioned-sync2backup&total_members=2,1&id=2&backup_id=1&mirror"

The client feeder application that performs the write operations would contain the following:

    import net.jini.core.lease.Lease;
    import com.j_spaces.core.IJSpace;
    import com.j_spaces.core.client.SpaceFinder;

    IJSpace space = (IJSpace) SpaceFinder.find("jini://*/*/mySpace");
    long lastTime = System.currentTimeMillis();
    for (int i = 0; i < 100000; i++) {
        space.write(new Person("first-" + i, "last-" + i, Integer.valueOf(i)), null, Lease.FOREVER);
        // Report every 1000 objects; skip i == 0, where nothing has been measured yet.
        if (i > 0 && i % 1000 == 0) {
            long now = System.currentTimeMillis();
            long dur = Math.max(1, now - lastTime); // guard against a zero-length window
            lastTime = now;
            // 1000 objects were written during the last dur milliseconds.
            double tp = 1000 * 1000.0 / dur;
            System.out.println("wrote:" + i + " objects to space TP:" + tp);
        }
    }

Usage Scenarios
Writing Asynchronously to Mirror Data Source | Reading from Mirror Data Source | Partitioning Over Central Mirror Data Source
Writing Asynchronously to Mirror Data Source
The following is a schematic flow of a synchronously replicated cluster with 3 members, which communicate with a Mirror Service:
The topology was loaded with the following command lines:

    gsInstance "/./mirror-service?schema=mirror"
    gsInstance "/./mySpace?cluster_schema=sync_replicated&total_members=3&id=1&mirror"
    gsInstance "/./mySpace?cluster_schema=sync_replicated&total_members=3&id=2&mirror"
    gsInstance "/./mySpace?cluster_schema=sync_replicated&total_members=3&id=3&mirror"

Reading from Mirror Data Source
The Mirror Service space is used to asynchronously persist data into the data source. As noted above, the Mirror is not a regular space and should not be interacted with directly. Thus, data cannot be read from the data source using the Mirror Service space. Nonetheless, the data might be read by other spaces which are configured with an external data source.

The default Hibernate (RDBMS) implementation implements all external data source interfaces, including the ones used by the Mirror Service space (i.e. com.gigaspaces.datasource.BulkDataPersister). You might have noticed this, since we didn't change the default data-source-class. The only thing that needs to be altered is the usage property. By default, the usage property in the persistent-space-schema.xml is set to read-write. This means that the space members can also perform write operations into the data source.
    <external-data-source>
        ...
        <!-- data source usage mode - options - read-write,read-only -->
        <!-- default is read-write-->
        <usage>read-only</usage>
        ...
    </external-data-source>

The cluster schema needs to be configured to use an external data source which, when dealing with a Mirror, is central to the cluster.

    <cache-loader>
        <external-data-source>true</external-data-source>
        <central-data-source>true</central-data-source>
    </cache-loader>

Here is a schematic flow of how a Mirror Service space asynchronously receives data to persist into an external data source, while the cluster is synchronously reading data directly from it.
The topology was loaded with the following command lines:

    gsInstance "/./mirror-service?schema=mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=sync_replicated&total_members=3&id=1&mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=sync_replicated&total_members=3&id=2&mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=sync_replicated&total_members=3&id=3&mirror"
Partitioning Over Central Mirror Data Source
When partitioning data, each partition asynchronously replicates data into the Mirror Service. Each partition can read back data that belongs to it (according to the load-balancing policy defined). Here is a schematic flow of how two partitions (each a primary-backup pair) asynchronously interact with an external data source:
The topology was loaded with the following command lines:

    gsInstance "/./mirror-service?schema=mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=partitioned-sync2backup&total_members=2,1&id=1&mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=partitioned-sync2backup&total_members=2,1&id=1&backup_id=1&mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=partitioned-sync2backup&total_members=2,1&id=2&mirror"
    gsInstance "/./mySpace?schema=persistent&properties=datasource&cluster_schema=partitioned-sync2backup&total_members=2,1&id=2&backup_id=1&mirror"
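The idea that each partition reads back only the data that belongs to it can be sketched as follows. The routing rule used here (hash code modulo the number of partitions) is an assumption chosen for illustration; the actual rule depends on the load-balancing policy configured for the cluster. PartitionRoutingSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical hash-modulo routing; the real policy is defined by the cluster schema. */
final class PartitionRoutingSketch {
    /** Maps a routing value to a partition id in the range 1..partitions. */
    static int partitionFor(Object routingValue, int partitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), partitions) + 1;
    }

    /** Keeps only the ids that the given partition would own when reading back. */
    static List<Integer> ownedBy(int partitionId, int partitions, List<Integer> ids) {
        List<Integer> owned = new ArrayList<>();
        for (Integer id : ids) {
            if (partitionFor(id, partitions) == partitionId) {
                owned.add(id);
            }
        }
        return owned;
    }
}
```

With two partitions, as in the topology above, this rule would give partition 1 the even ids and partition 2 the odd ids, so the two partitions never load the same row from the central data source.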
Achieving Reliability
Since asynchronous replication is not reliable by definition, achieving reliability requires at least one reliable member to sync with. Reliability is compromised when data is not asynchronously replicated due to an unexpected termination of a replicating source member. Therefore, reliability can be achieved only in the presence of other synchronously replicated spaces. As long as there is a synchronous member around, it asynchronously replicates data (and losses) into the Mirror Service space.
Configuration
All synchronous cluster schemas contain a reliable tag as part of their async-replication block, and in these schemas it is set to true. The default value is false, which is applied in all asynchronous cluster schemas.

    <repl-policy>
        <replication-mode>sync</replication-mode>
        <recovery>true</recovery>
        ...
        <async-replication>
            ...
            <reliable>true</reliable>
        </async-replication>
    </repl-policy>
Recovery from Data Loss
When joining a Mirror Service to a synchronous cluster, e.g., primary_backup-cluster-schema.xml, the backups serve as the reliable counterpart from which a Mirror Service space can obtain data in the absence of the initial primary space. If the reliable tag is set to false and the primary is terminated before its data has been replicated into the Mirror Service space, that data is lost: the newly elected primary (a former backup) does not replicate data that it received via replication channels.
Reliability ensures that the backup (once it becomes primary) retransmits replicated data into the Mirror Service space. This 'playback' consists of replicated data for which the backup did not receive acknowledgments. As long as the topology is up and running, acknowledgments are sent from the Mirror Service space to the primary, which forwards them to the backup. If the primary terminates at any time, the backup continues to replicate, beginning from the last acknowledgment sent, thus ensuring reliability.

Usage Examples
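The acknowledgment-based playback described under Recovery from Data Loss can be sketched with a small log of unacknowledged operations kept on the backup. This is an illustrative model under stated assumptions (operations carry increasing sequence numbers, and an acknowledgment covers everything up to and including its sequence number); UnackedLogSketch and its methods are hypothetical and do not reflect the internal GigaSpaces implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical backup-side log of operations not yet acknowledged by the mirror. */
final class UnackedLogSketch {
    private static final class Entry {
        final long sequence;
        final String op;

        Entry(long sequence, String op) {
            this.sequence = sequence;
            this.op = op;
        }
    }

    private final Deque<Entry> unacked = new ArrayDeque<>();

    /** Called when the backup receives a replicated operation from the primary. */
    void onReplicated(long sequence, String op) {
        unacked.addLast(new Entry(sequence, op));
    }

    /** Called when the primary forwards a mirror acknowledgment up to 'sequence' inclusive. */
    void onAck(long sequence) {
        while (!unacked.isEmpty() && unacked.peekFirst().sequence <= sequence) {
            unacked.removeFirst();
        }
    }

    /** On failover: the operations the new primary has to retransmit, in order. */
    List<String> playback() {
        List<String> ops = new ArrayList<>();
        for (Entry entry : unacked) {
            ops.add(entry.op);
        }
        return ops;
    }
}
```

If the backup has received operations 1 to 5 and acknowledgments up to 3, a failover would replay operations 4 and 5. Without such a log (the reliable tag set to false in the terms of this section), those two operations would never reach the mirror.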
Considerations
Known Issues
Troubleshooting

Log Messages
The external data source logging level can be modified as part of the <GigaSpaces Root>\config\gs_logging.properties file. By default, it is set to java.util.logging.Level.INFO:

    com.gigaspaces.persistent.level = INFO

Logging is divided according to java.util.logging.Level as follows:
Configuration messages when loading the default Mirror Service at a CONFIG level:

    > gsInstance "/./mirror-service?schema=mirror&groups=mygroup"
    ...
    CONFIG [com.gigaspaces.persistent]:
    schema-xml configuration:
    <external-data-source>
        <data-source-class>com.gigaspaces.datasource.hibernate.HibernateDataSource</data-source-class>
        <data-class>class java.lang.Object</data-class>
        <supports-inheritance>true</supports-inheritance>
        <supports-version>false</supports-version>
        <usage>read-write</usage>
    </external-data-source>
    cluster-xml configuration:
    <cache-loader>
        <external-data-source>false</external-data-source>
        <central-data-source>false</central-data-source>
    </cache-loader>

INFO messages displayed when loading one of the replication group members connected to a Mirror Service:

    > gsInstance "/./mySpace?cluster_schema=sync_replicated&total_members=2&id=1&mirror&groups=mygroup"
    ...
    INFO [com.gigaspaces.core.cluster.replication]: Mirror Service Connector : Started
    Source space : mySpace_container1:mySpace
    Mirror URL : jini://*/mirror-service_container/mirror-service?groups=mygroup
    BulkSize : 100
    IntervalMillis : 2000
    IntervalOpers : 100
    ...
    Replicator: Connection established with target space
    [ source space: mySpace_container1:mySpace ]
    [ target space: mirror-service_container:mirror-service ; target space url: jini://*/mirror-service_container/mirror-service?groups=mygroup&timeout=5000&state=started ]
    ...

INFO messages displayed when the Mirror Service establishes a connection with a replication group member:

    INFO [com.gigaspaces.core.cluster.replication]:
    Joined new [mySpace_container2:mySpace] member to the mirror-service_container:mirror-service Mirror Service.

FINER messages displayed when the Mirror Service receives an asynchronous bulk to persist:

    FINER [com.gigaspaces.persistent]: ENTRY [
    BulkDataItem<Op: WRITE, IGSEntry<com.j_spaces.examples.datasource.Person,
    UID: -1989577544^39^0^0^0, Fields: firstName: first-0, id: 0, lastName: last-0, >>,
    BulkDataItem<Op: WRITE, IGSEntry<com.j_spaces.examples.datasource.Person,
    UID: -1989577544^39^1^0^0, Fields: firstName: first-1, id: 1, lastName: last-1, >>,
    ...
    BulkDataItem<Op: UPDATE, IGSEntry<com.j_spaces.examples.datasource.Person,
    UID: -1989577544^39^7^0^0, Fields: firstName: first-7 update, id: 7, lastName: last-7 update, >>,
    BulkDataItem<Op: REMOVE, IGSEntry<com.j_spaces.examples.datasource.Person,
    UID: -1989577544^39^0^0^0, Fields: firstName: null, id: 0, lastName: null, >>
    ...
    ]
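Since the levels above are plain java.util.logging levels, the same logger can presumably also be adjusted through the standard java.util.logging API, for example in a test harness. The sketch below uses only JDK classes; the logger name is taken from the gs_logging.properties key shown earlier, and whether GigaSpaces later overrides a level set this way is not covered here. PersistentLoggingSketch is a hypothetical helper name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper that adjusts the external data source logger via the JDK API. */
final class PersistentLoggingSketch {
    // Logger name taken from the com.gigaspaces.persistent.level property key.
    static final String LOGGER_NAME = "com.gigaspaces.persistent";

    /**
     * Sets the logger level and returns the logger. Callers should keep the
     * returned reference, because the JDK holds loggers only weakly.
     */
    static Logger setLevel(Level level) {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        logger.setLevel(level);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = setLevel(Level.FINER);
        // The attached handlers must also allow FINER for messages to be printed.
        System.out.println("FINER enabled: " + logger.isLoggable(Level.FINER));
    }
}
```

Editing gs_logging.properties remains the documented way to change the level; the programmatic form is mainly useful when the configuration file cannot be changed.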
Handling Failover
This section describes how the GigaSpaces Mirror Service handles different failure scenarios. The following table lists the services involved, and how the failure is handled in the cluster. Active services are green, while failed services are red.
Unlikely Failure Scenarios
The following failure scenarios are highly unlikely. However, it might be useful to understand how such scenarios are handled by GigaSpaces. This is detailed in the table below. Active services are green, while failed services are red.



