|
Summary: A replication group defines replication between spaces in the cluster.
Overview | An Efficient Mechanism | Destructive Operations are Smoothly Replicated | Replication Granularity | Recovery from a Replication Partner | Replicate Notify Template and Trigger Notify Template Options | Switching Off Replication for Specific Operation(s) | Replication Filter | Replicate Original State | How Replication Works When Using Transactions | Entry Lease Cluster Support | Synchronous vs. Asynchronous Replication | Asynchronous Replication | Synchronous Replication
OverviewReplication is the process of duplicating or copying application data and operations from a source space to a target space for a variety of purposes such as: space load-balancing, synchronization of multiple spaces located in different locations, or the active backup of spaces. It is crucial to replicate application data and operations since grid environments' deployment involves multiple clusters of workstations which need to share the same data. An Efficient MechanismThe replication mechanism's default design is an efficient one. Any changes that do not need propagation to the target space are not propagated. For example, if an Entry is written and removed inside the same transaction, none of these operations are replicated. You can modify such behavior by enabling the repl-original-state option. Destructive Operations are Smoothly ReplicatedWhen client performs a destructive operation on a space - such as, write, take, notify, or transaction commit - a replicator thread at the source space is triggered, and handles all the work required to copy the operations and associated Entries into the target space(s). Replication GranularityBy default, all data and operations are replicated to all target spaces that belong to the same replication group (symmetrical replication). You can define replication granularity to start at the space level and end up at the Entry level. Below are the different replication granularity options:
Replication Groups Determine Replication PolicyThere can be several replication groups in a cluster, each space can belong to one group only. The replication group determines the replication policy, which has the following main attributes:
Replication Transmission MatrixWhen a replication group is defined, by default, each member of the group replicates all changes to all other members. For example, consider a replication group of four members: SP1, SP2, SP3 and SP4. Below is the replication diagram:
<cluster-config> <groups> <group> <group-name>replicationMatrixExample</group-name> <group-members> <member> <member-name>container1:sp1</member-name> <repl-transmission-policy> <disable-transmission>false</disable-transmission> <target-member>container2:sp2</target-member> <replication-mode>sync</replication-mode> <communication-mode>multicast</communication-mode> </repl-transmission-policy> </member> <member> <member-name>container2:sp2</member-name> <repl-transmission-policy> <disable-transmission>false</disable-transmission> <target-member>container1:sp1</target-member> <sync-on-commit>false</sync-on-commit> <replication-mode>async</replication-mode> <communication-mode>unicast</communication-mode> </repl-transmission-policy> </member> </group-members> </group> </groups> </cluster-config> The Space Browser allows you to construct a Replication Transmission Matrix using the Cluster Wizard:
Recovery from a Replication PartnerThe recovery process is initiated when a space is started. Transient spaces in a replication group can be configured to pull their content from another space in the group after they initialize. This is achieved by setting the recovery property to true in the replication group settings. Persistent spaces do not pull all of the source space's data, but only the changes that took place in the time they were offline (the source space maintains a redo log for every target space). Recovery involves two types of data to recover: Entries and notify registrations. Entry recovery involved two phases: a snapshot phase and completion phase. During the snapshot phase, all source space Entries are sent to the target space in batches. The completion phase plays back the accumulated operations conducted during the Entries recovery phase. During the Entries recovery phase, the space logs each batch replication event. Once the recovery process is complete, a full report including the total amount of recovered Entries and notify registrations and their class types is logged. During the recovery phase, the source space is available and the target space is unavailable to clients. The recovery property can specify that the space should restore its content from one of the members in its replication group, or from a specific space (the "recovery source"). If there is no available source space to recover from, an error is logged and recovery is not performed. Important notes:
Recovery ProcessWhen a primary space identifies that its replica backup space is not available, is starts to accumulate the operations in its redo-log. The redo-log Entry includes:
Accumulating primary space operations in the redo log due to replication channel disconnection, or due to replica space shutdown or total failure are handled in a similar manner. This will be changed in future versions. When the primary space identifies that its backup replica space becomes available (replication connection is reestablished), it:
With in-memory spaces, the redo-log is essentially relevant only for the duration of the recovery time.
Replicate Notify Template and Trigger Notify Template OptionsThe replication group includes the ability to replicate notification registration and ability to trigger events that are result of replication and not direct client operation to the source space. Here are the system behaviors when using these options:
Switching Off Replication for Specific Operation(s)When using a replicated topology, you can define that an operation(s) be performed only on the master space, instead of being replicated to all cluster members. Meaning, you can actually "switch off" replication for a certain operation(s). This can be done in two main ways:
GigaSpaces 6.0.2 allows you to control replication at the operation level without having to write code. This is done by adding the <permitted-operations> tag to your replication schema. Inside this tag, specify a comma-separated list of the operations you want to be replicated to all spaces. The available operations (in the form they should appear in your schema) are as follows:
For example: <repl-policy> ... <permitted-operations>write, extend_lease, expire_lease, notify_registration</permitted-operations> ... </repl-policy>
Accordingly, in the example above, the take and update operations are not replicated to all spaces, while the write, extend_lease, expire_lease, and notify_registration operations are replicated to all spaces.
<repl-policy> ... <permitted-operations></permitted-operations> ... </repl-policy> no operations are replicated (all operations are performed only on the master space). To summarize, you have the following options:
It is also possible to specify for a certain operation to be replicated in a distributed manner. This is described in the section below. Distributed ReplicationIn GigaSpaces, replicated operations are usually performed first on the master space, and only then passed to the rest of the members. This ensures consistency. Distributed replication allows you to define that an operation is performed on all replicated spaces at once. A typical use-case of this feature is with the take (clear) operation. Usually, the take operation is performed first on the master space, then a list of UIDs is passed to the replica spaces, and the objects are taken from these. When working with large amounts of objects (hundreds of thousands or millions), there might be a need for the replication to be faster. With distributed replication, the objects are taken from all replicated spaces at once.
ConfigurationTo enable distributed replication, exclude the operation you want to perform from the permitted-operations list, and enable broadcast mode (using <broadcast-condition>unconditional</broadcast-condition>, see below). For example, if you want to perform the take operation in a distributed manner, configure the following: <repl-policy> ... <permitted-operations>write, notify</permitted-operations> ... </repl-policy> <load-bal-policy> ... <take> <policy-type>hash-based</policy-type> <broadcast-condition>unconditional</broadcast-condition> </take> ... </load-bal-policy>
Replication FilterYou can call your own business logic whenever the data is replicated. For example, you can modify the Entries' data, compress/decompress, or block specific operations and Entries from being replicated to other spaces (this can also be done using configuration, see above). To do so, implement the IReplicationFilter interface (com.j_spaces.core.cluster.IReplicationFilter; see Javadoc). Your business logic is called whenever the replication packet leaves the source space (output event) and arrives at the target space (input event).
Replicate Original StateWhenever you run in asynchronous replication mode and perform write and update operations, the target space might get the latest state of the entry, both for the write and the update operations. In order to replicate the entry original state with every operation, not just the latest one, turn on the <repl-original-state> option.
Cluster configuration file example: <repl-policy> <repl-original-state>true</repl-original-state> </repl-policy> How Replication Works When Using TransactionsWhenever you use space operations together with transactions, the operations are replicated to the target space only when the commit() method is called. The client gets an acknowledgement regarding the commit operation after the target space gets the transaction data and is committed. Entry Lease Cluster SupportEntryinformation is replicated across all relevant space cluster nodes. When renewing or cancelling the Entry lease, the operation is also replicated into all relevant cluster nodes. If the cancel/renew operation is performed on a primary space that has already failed, or fails at the exact time the operation is being performed, the system (the same as with any other operation) passes immediately to the backup spaces, and cancels/renews the lease there.
Synchronous vs. Asynchronous ReplicationThe GigaSpaces cluster provides synchronous and asynchronous replication schemes.
Can Synchronous Replication And Asynchronous Replication Be Mixed?Yes. You can create a cluster that mixes synchronous and asynchronous replication between a source space and multiple target spaces – each source/target pair can have a different replication mode. You might define such a configuration when there is a need to provide different levels of quality of service. For example, there might be cases where some of the target space data is less sensitive and does not need to have 100% coherency with source space – such as data used for reporting and monitoring. In such a case, asynchronous replication is sufficient. In contrast, whenever the target space is used for the runtime environment or with an active-active configuration, synchronous replication is required. An example might be disaster recovery sites which need to have full coherency with the operational site in order to ensure 100% recoverability in a failover situation. Asynchronous ReplicationHow it Works with Multiple Concurrent Users Accessing the Space | Reliable Multicast - Multicast Vs. Unicast Options | How Asynchronous Replication Works | Sync-On-Commit vs. Synchronous-Replication | Replication/Recovery Conflict Resolution
How it Works with Multiple Concurrent Users Accessing the SpaceGigaSpaces synchronous replication supports concurrent clients. Whenever multiple clients access the same space and synchronous replication uses unicast, data will be replicated to all target spaces simultaneously through separate threads, each replication channel using a different thread. The threads are taken from a dedicated thread pool whose minimum and maximum size can be configured. Reliable Multicast – Multicast Vs. Unicast OptionsIn unicast communication, where one sender sends a message to one receiver, there are both UDP and TCP message transmission modes. UDP is unreliable since packets may get lost, duplicated, or may arrive out of order, and there is a maximum packet size restriction. TCP is also unicast, but takes care of message retransmission for missing messages, weeds out duplicates as well as fragment packets that are too big, and presents messages to the application in the order in which they were sent.
How Asynchronous Replication Works
Sync-On-Commit vs. Synchronous-ReplicationThe asynchronous replication mode includes the sync-on-commit option. In some ways, this can be compared to synchronous replication. The following items compare these two options:
The cluster XML schema configuration file includes tags that define the replication parameters and policies. This file also includes parameters that define the synchronous and asynchronous parameters, as well as common parameters. Replication/Recovery Conflict ResolutionGigaspaces does not support a replication/recovery conflict resolution. This feature will be supported in future versions. When an Entry is updated by different cluster nodes at the same time (might happen with asynchronous replication mode), the last replication update event overrides the existing version, resetting the Entry's version number to ZERO.
Synchronous ReplicationWhen to Use Synchronous Replication? | How Synchronous Replication Works | Splitting Large Batches | Synchronous Replication and the Replication Filter | Should Synchronous Replication Be Used For Master/Backup Replication?
When to Use Synchronous Replication?Synchronous replication is most beneficial in the following scenarios: How Synchronous Replication Works
If the timeout (todo queue), that is responsible for getting missing packets from the target space has been elapsed, the target space requests missing packets from the source space's redo log. When running in sync-rec-ack mode, the source space receives the acknowledgement from the network layer for the replication packet arrival to the target space. Replication packet processing confirmation sent from target space to source space via heartbeat. If source space fails to receive heartbeat within defined timeout period – it tries to reestablish replication channel with the target space. Splitting Large Batches
When performing batch operations (writeMultiple, updateMultiple, takeMultiple) in a sync-replicated cluster, the actual data (Entries/UID) is replicated to the target spaces in one batch. Some of the target (replica) spaces might not be able to deal with large amounts of data, resulting in OutOfMemory exceptions. To solve this problem, you can split these Entries into several batches, thus providing better memory usage, stability, and scalability. For example, when performing the take (clear) operation, you don't necessarily know how many Entries exist in the space, and all of these need to be removed. In this case, if you know there should be a large amount of Entries in the space, it can be useful to take the Entires in several batches (instead of trying to take all of them at once). Even when performing writeMultiple, updateMultiple, and takeMultiple, where you can control the amount of Entries you are performing the operation on, it can still be useful to split the operation into batches (if you are working with a large amount of Entries). Splitting large batches is defined using the <multiple-opers-chunk-size> cluster schema element. This element's default value is -1. This means that by default the operation is performed on one batch of Entries. To split the Entries into batches, you must define a certain value. This value defines the number of batches. For example, when performing take and setting: <multiple-opers-chunk-size>100</multiple-opers-chunk-size> the Entries are taken in 100 different batches.
Synchronous Replication and the Replication FilterIf you employ synchronous replication in unicast mode, a dedicated replication channel is constructed for each target space. The replication filter will be called for each replication channel. If you employ multicast communication protocol, there is only one replication channel for all the target spaces. This means that the replication filter will be called only once for all the target spaces. Should Synchronous Replication Be Used For Master/Backup Replication?Synchronous replication, since it occurs simultaneously, offers a much higher rate of concurrency as well as full reliability if failover occurs. In order to ensure 100% data coherency in failover situations, synchronous replication must be defined between the master and its backup spaces. However, if you use asynchronous replication between the master and backup spaces running in in-memory mode, there might be a loss of data if the master space fails, while its redo log still has data that has to be replicated.
|
(works on Firefox 2 and Internet Explorer 7)

For a full description of the replication schema and its different options, refer to the