Replication Group - Cluster Schema

  Search Here
Searching XAP 6.0 Documentation

                                               

Overview

Replication is the process of duplicating or copying application data and operations from a source space to a target space for a variety of purposes such as: space load-balancing, synchronization of multiple spaces located in different locations, or the active backup of spaces. It is crucial to replicate application data and operations since grid environments' deployment involves multiple clusters of workstations which need to share the same data.
In order to perform the application data and operations replication, a special replicator component runs inside the engine of each replicated space. This component synchronizes the space synchronously or asynchronously with other spaces that belong to the same replication group in the cluster.

An Efficient Mechanism

The replication mechanism's default design is an efficient one. Any changes that do not need propagation to the target space are not propagated. For example, if an Entry is written and removed inside the same transaction, none of these operations are replicated. You can modify such behavior by enabling the repl-original-state option.
The replicator components find all its replica spaces spontaneously and reconnect automatically in case target space discovered.

Destructive Operations are Smoothly Replicated

When client performs a destructive operation on a space - such as, write, take, notify, or transaction commit - a replicator thread at the source space is triggered, and handles all the work required to copy the operations and associated Entries into the target space(s).

Replication Granularity

By default, all data and operations are replicated to all target spaces that belong to the same replication group (symmetrical replication). You can define replication granularity to start at the space level and end up at the Entry level. Below are the different replication granularity options:

  • Space level replication - using the replication transmission matrix you can setup asymmetrical replication relationships where only specific operations are replicated from a source space to specific target space(s).
  • Class level replication - using the partial replication option and the IReplicatable tag interface (com.j_spaces.core.client.Ireplicatable; see Javadoc), these replicate only specific class instances.
  • Entry Level replication - using configuration, you can block specific Entries from being replicated.

Replication Groups Determine Replication Policy

There can be several replication groups in a cluster, each space can belong to one group only. The replication group determines the replication policy, which has the following main attributes:

  • Full or partial replication – in full replication mode, every Entry written to the space is replicated to all other space members in the replication group. In partial replication mode, only Entries that implement the IReplicatable tag interface (no methods to implement) are replicated to other spaces in the replication group. This can improve the performance.
  • Replication to all or some spaces – all-to-all replication (fully symmetric replication) might be non-relevant in large clusters, and is often not necessary. You can define selective replication using the replication transmission matrix.
  • Notifications – Whether or not notification templates are propagated, and whether or not replicated Entries should trigger notifications on the target space.
  • Replication mode – common, synchronous or asynchronous
Since all of these attributes have usable defaults, the replication mechanism is extremely easy to use, yet flexible and powerful.

Replication Transmission Matrix

When a replication group is defined, by default, each member of the group replicates all changes to all other members. For example, consider a replication group of four members: SP1, SP2, SP3 and SP4. Below is the replication diagram:

Sometimes a finer replication definition is needed - some spaces do not need to replicate all operations and data to other spaces. For example, assume that SP1 is an agency, connected to the agent computers SP2, SP3, and SP4. SP1 will replicate and receive replications from SP2, SP3 and SP4, but SP2, SP3 and SP4 will not replicate data and operations between them.

To provide this functionality, define a transmission matrix over the replication group. For each pair (source and target space) define the following:
Whether or not the pair replicates with each other.

  • Which operations are allowed to replicate - write (write a new Entry/extend lease of existing Entry; update; take (take, delete an Entry, lease cancellation); and notify (replicate a notify template). By default, all operations are replicated.
  • Replication mode - asynchronous or synchronous. If using synchronous mode, you can choose to use multicast or unicast as the communication protocol. If using asynchronous, you can choose to use the sync_on_commit option.
  • Recovery mode - if enabled, the space to use for recovery when started.
    Replication Filter - you can define for each source/target pair the replication filter implementation class.
    By using the transmission matrix, you can improve performance and reduce communication overheads.
    Following is an example of replication matrix definitions with two spaces where the replication from container1: sp1 node to container2: sp2 node occurs in synchronous mode with multicast, and the replication from container2: sp2 node to container1: sp1 occurs in asynchronous unicast mode.
<cluster-config>
    <groups>
       <group>
          <group-name>replicationMatrixExample</group-name>
              <group-members>
                  <member>
                      <member-name>container1:sp1</member-name>
                      <repl-transmission-policy>
                      <disable-transmission>false</disable-transmission>
                      <target-member>container2:sp2</target-member>
                      <replication-mode>sync</replication-mode>
                      <communication-mode>multicast</communication-mode>
                      </repl-transmission-policy>
                   </member>
                   <member>
                       <member-name>container2:sp2</member-name>
                       <repl-transmission-policy>
                       <disable-transmission>false</disable-transmission>
                       <target-member>container1:sp1</target-member>
                       <sync-on-commit>false</sync-on-commit>
                       <replication-mode>async</replication-mode>
                       <communication-mode>unicast</communication-mode>
                       </repl-transmission-policy>
                    </member>
                  </group-members>
             </group>
       </groups>
</cluster-config>

The Space Browser allows you to construct a Replication Transmission Matrix using the Cluster Wizard:

For details on constructing a Replication Transmission Matrix, refer to the Defining Replication Policy in Static Cluster section.

Recovery from a Replication Partner

The recovery process is initiated when a space is started. Transient spaces in a replication group can be configured to pull their content from another space in the group after they initialize. This is achieved by setting the recovery property to true in the replication group settings. Persistent spaces do not pull all of the source space's data, but only the changes that took place in the time they were offline (the source space maintains a redo log for every target space).

Recovery involves two types of data to recover: Entries and notify registrations.

Entry recovery involved two phases: a snapshot phase and completion phase.

During the snapshot phase, all source space Entries are sent to the target space in batches. The completion phase plays back the accumulated operations conducted during the Entries recovery phase.

During the Entries recovery phase, the space logs each batch replication event. Once the recovery process is complete, a full report including the total amount of recovered Entries and notify registrations and their class types is logged.

During the recovery phase, the source space is available and the target space is unavailable to clients.

The recovery property can specify that the space should restore its content from one of the members in its replication group, or from a specific space (the "recovery source"). If there is no available source space to recover from, an error is logged and recovery is not performed.

Important notes:

  • Replication input filter events are called during recovery (into the target space).
  • Space filter events are not called during recovery.
  • The restarted space locates a space to recover from using the Jini Lookup Service – each replication group has a unique name. The source space looks for a matching space with the same replication group to recover from.
  • Partial recovery – the restarted space recovers only IReplicatable class instances (turned on only when partial replication is enabled).

Recovery Process

When a primary space identifies that its replica backup space is not available, is starts to accumulate the operations in its redo-log.

The redo-log Entry includes:

  • 3 long data types
  • 3 int data types
  • Class name (string data type)
  • UID (string data type)

Accumulating primary space operations in the redo log due to replication channel disconnection, or due to replica space shutdown or total failure are handled in a similar manner. This will be changed in future versions.

When the primary space identifies that its backup replica space becomes available (replication connection is reestablished), it:

  • Clears its redo-log – with persistent spaces, this phase does not occur if the backup has data in its database.
  • The primary space sends a full image of its content to the backup (memory recovery). With persistent spaces, this phase does not occur if the backup has data in its database.
  • The primary accumulates incoming operations in its redo log and sends these to the backup.
  • The replica space includes a mechanism that ensures no duplicate operations are conducted.

With in-memory spaces, the redo-log is essentially relevant only for the duration of the recovery time.

This means that in some cases, you can safely limit the redo-log capacity since memory recovery time is relatively small, and there should not be many operations in the redo log during this time period. Over the LAN, recovery can replicate 10K/second redo log Entries with a size of 1K.

Replicate Notify Template and Trigger Notify Template Options

The replication group includes the ability to replicate notification registration and ability to trigger events that are result of replication and not direct client operation to the source space.

Here are the system behaviors when using these options:

Replicate Notify Template Setting Trigger Notify Template Setting Explanation
True False Client gets notification from the master space while it is active after registration.
If failover has been configured, it gets notification from replica space when master space fails.
False True Client gets notification only from those spaces that it registered for notification.
Notification occurs when data has been delivered to the space, either by a client application or from the replication.
True True Client gets notification from all cluster spaces after registration.
Client gets multiple notifications for every space event.
False False Client gets notification only from those spaces to which it registered.
Client does not get notifications from spaces which received their data by replication.
Replicate notify templates and trigger notify templates are orthogonal. However, if you enable them both, you should be aware that for each Entry that matches the notify template and is replicated to another space you get an event. This may result in more events than you initially intended. You can use the source of the event to check which space triggered it.
Notify Recovery
When the <notify-recovery> element is true, it tries to recover the notify template from one of the replication group members. If it doesn't succeed, it then tries to recover notify templates from the other cluster members.

Switching Off Replication for Specific Operation(s)

When using a replicated topology, you can define that an operation(s) be performed only on the master space, instead of being replicated to all cluster members. Meaning, you can actually "switch off" replication for a certain operation(s). This can be done in two main ways:

  • Using replication filters.
  • Using configuration – supported in GigaSpaces version 6.0.2 and onwards.

GigaSpaces 6.0.2 allows you to control replication at the operation level without having to write code. This is done by adding the <permitted-operations> tag to your replication schema. Inside this tag, specify a comma-separated list of the operations you want to be replicated to all spaces.

The available operations (in the form they should appear in your schema) are as follows:

  • write
  • take
  • extend_lease
  • update
  • expire_lease
  • notify_registration

In GigaSpaces replicated topology, the take and clear operations are identical. Therefore, referrals to the take operation in this section are also correct for the clear operation.

For example:

<repl-policy>
...
  <permitted-operations>write, extend_lease, expire_lease, notify_registration</permitted-operations>
...
</repl-policy>

Note that even though you want to switch off replication for a certain operation(s), the way to do this is by switching replication on for all of the other operations. The operation(s) you leave out of the list (in the permitted-operations element) is (are) not replicated.

Accordingly, in the example above, the take and update operations are not replicated to all spaces, while the write, extend_lease, expire_lease, and notify_registration operations are replicated to all spaces.

The permitted-operations element is not part of the replication schema by default. To enable it, you must add it to your schema.

If you add the permitted-operations element to your schema, but leave it empty:

<repl-policy>
...
  <permitted-operations></permitted-operations>
...
</repl-policy>

no operations are replicated (all operations are performed only on the master space).

To summarize, you have the following options:

  • Not adding the permitted-operations element to your replication schema – in this case, replication takes place as usual (all operations are replicated to all members).
  • Adding the permitted-operations element to your replication schema with the following value:
    • A comma-separated list of operations. In this case, the specified operations are replicated, while the operations that aren't specified aren't replicated.
    • No value (empty) – in this case, no operations are replicated.

It is also possible to specify for a certain operation to be replicated in a distributed manner. This is described in the section below.

Distributed Replication

In GigaSpaces, replicated operations are usually performed first on the master space, and only then passed to the rest of the members. This ensures consistency. Distributed replication allows you to define that an operation is performed on all replicated spaces at once.

A typical use-case of this feature is with the take (clear) operation. Usually, the take operation is performed first on the master space, then a list of UIDs is passed to the replica spaces, and the objects are taken from these. When working with large amounts of objects (hundreds of thousands or millions), there might be a need for the replication to be faster. With distributed replication, the objects are taken from all replicated spaces at once.

If you choose to use distributed replication, make sure no other operations are performed on the cluster while a distributed operation (in this case, take) is being performed. Otherwise, you might encounter inconsistent data in some of the members.

Configuration

To enable distributed replication, exclude the operation you want to perform from the permitted-operations list, and enable broadcast mode (using <broadcast-condition>unconditional</broadcast-condition>, see below). For example, if you want to perform the take operation in a distributed manner, configure the following:

<repl-policy>
...
  <permitted-operations>write, notify</permitted-operations>
...
</repl-policy>

<load-bal-policy>
...
   <take>
      <policy-type>hash-based</policy-type>
      <broadcast-condition>unconditional</broadcast-condition>
   </take>
...
</load-bal-policy>

As you can see in the configuration above, take is not specified as part of the permitted-operations tag. The listed operations (write, notify) are for the purpose of the example and aren't relevant in this case.

Replication Filter

You can call your own business logic whenever the data is replicated. For example, you can modify the Entries' data, compress/decompress, or block specific operations and Entries from being replicated to other spaces (this can also be done using configuration, see above). To do so, implement the IReplicationFilter interface (com.j_spaces.core.cluster.IReplicationFilter; see Javadoc). Your business logic is called whenever the replication packet leaves the source space (output event) and arrives at the target space (input event).

For more details on how to implement the replication filter, refer to the Replication Filters section.

Replicate Original State

Whenever you run in asynchronous replication mode and perform write and update operations, the target space might get the latest state of the entry, both for the write and the update operations. In order to replicate the entry original state with every operation, not just the latest one, turn on the <repl-original-state> option.
When this option is set to true, the replication mechanism maintains for every Entry its full state, together with the operation. For example, if an Entry has been written to the space and updated several times within a specific replication batch, then the replication mechanism replicates the operations in their correct order, together with the Entry's original state for each operation.
When this option is set to false, the replication mechanism replicates the latest state of the Entry to the target space. For example, if an Entry has been written to the space and updated several times within a specific replication batch, the replication mechanism replicates the operations in their correct order. However, the replication mechanism still uses the Entry with its latest state for all operations.

repl-original-state Impacts FIFO
In order to perform FIFO based operations using replicated data, make sure the repl-original-state value is true.

Cluster configuration file example:

<repl-policy>
    <repl-original-state>true</repl-original-state>  
</repl-policy>

How Replication Works When Using Transactions

Whenever you use space operations together with transactions, the operations are replicated to the target space only when the commit() method is called. The client gets an acknowledgement regarding the commit operation after the target space gets the transaction data and is committed.

Entry Lease Cluster Support

Entryinformation is replicated across all relevant space cluster nodes. When renewing or cancelling the Entry lease, the operation is also replicated into all relevant cluster nodes. If the cancel/renew operation is performed on a primary space that has already failed, or fails at the exact time the operation is being performed, the system (the same as with any other operation) passes immediately to the backup spaces, and cancels/renews the lease there.

When registering for lease expiration events and trigger-notify-templates is enabled, the client also gets notifications from the backup space.

Synchronous vs. Asynchronous Replication

The GigaSpaces cluster provides synchronous and asynchronous replication schemes.
In a synchronous replication scheme, the client receives acknowledgement for any destructive operations only after both sets of the spaces - source and target - have performed the operation.

When the target space is defined in a backup mode but it is not available the client receives acknowledgement from the active master space; but the operation on the backup space will be performed only when the source space will reestablish the connection with the backup. The master space logs all destructive operations until the backup is started. Same behavior happens when running in asynchronous replication mode.
In the asynchronous scheme, destructive operations are performed in the source space, and acknowledgement is immediately returned to the client.

Operations are accumulated in the source space and sent asynchronously to the target space after a defined period of time elapsed or a defined number of operations were performed (the first one that occurs). The downside of this scheme is the possibility of data loss if the source space fails while transferring the accumulated operations to the target space. Another problem is data coherency - the source and the target will not have identical data all the time.
See below for a full comparison between synchronous and asynchronous replication:

Aspect Synchronous Replication Asynchronous Replication
Data loss Each space operation waits until completion is confirmed at the master space as well as the backup space.
An incomplete operation is rolled back at both locations; therefore the remote copy is always an exact duplicate of the primary.
Might sometimes lose some data if there is an unplanned failover to the backup space.
In a failover situation the backup space will be available for clients only after all data from its redo log has been processed.
This might slow down the failover.
Distance Sensitive to latency, which is tied directly to distance. Highly tolerant of latency and can be used over the master and backup spaces, which are located in different geographical sites (different cities).
Performance impact Client must wait for confirmation of each space operation from the source space and target space(s).
Performance is mainly based on source space resources (CPU/memory), target space resources (CPU/memory), and the network bandwidth and latency between the source space and the target space.
Client will get acknowledged immediately after the source space processed incoming operations.
Performance is mainly based on source space resources (CPU/memory).
Data integrity Very accurate. Less accurate
Failover time Very rapid.
Backup space does not need to process redo log data.
Backup space needs to process redo log data.
The recovery time is based on the amount of operations.
Bandwidth requirements LAN WAN/LAN

Can Synchronous Replication And Asynchronous Replication Be Mixed?

Yes. You can create a cluster that mixes synchronous and asynchronous replication between a source space and multiple target spaces – each source/target pair can have a different replication mode. You might define such a configuration when there is a need to provide different levels of quality of service. For example, there might be cases where some of the target space data is less sensitive and does not need to have 100% coherency with source space – such as data used for reporting and monitoring. In such a case, asynchronous replication is sufficient. In contrast, whenever the target space is used for the runtime environment or with an active-active configuration, synchronous replication is required. An example might be disaster recovery sites which need to have full coherency with the operational site in order to ensure 100% recoverability in a failover situation.

Asynchronous Replication

How it Works with Multiple Concurrent Users Accessing the Space

GigaSpaces synchronous replication supports concurrent clients. Whenever multiple clients access the same space and synchronous replication uses unicast, data will be replicated to all target spaces simultaneously through separate threads, each replication channel using a different thread. The threads are taken from a dedicated thread pool whose minimum and maximum size can be configured.

Reliable Multicast – Multicast Vs. Unicast Options

In unicast communication, where one sender sends a message to one receiver, there are both UDP and TCP message transmission modes. UDP is unreliable since packets may get lost, duplicated, or may arrive out of order, and there is a maximum packet size restriction. TCP is also unicast, but takes care of message retransmission for missing messages, weeds out duplicates as well as fragment packets that are too big, and presents messages to the application in the order in which they were sent.
With multicast, where one sender sends a message to many receivers, IP Multicast extends UDP: a sender sends messages to a multicast address and the receivers have to join that multicast address to receive them. Like UDP, message transmission is still unreliable, and there is no notion of membership (who has currently joined the multicast address).
GigaSpaces extends reliable unicast message transmission, such as TCP, to multicast settings. As described above, GigaSpaces multicast support provides reliability and group membership on top of IP Multicast. Since every application has different reliability needs, GigaSpaces offers users a choice of their preferred communication protocol.
If the application must replicate only between two spaces, a unicast connection is preferred. Unicast utilizes a P2P (point to point) connection using TCP/IP protocols. Between two spaces, this is the most efficient communication protocol.
On the other hand, if there are multiple target spaces, multicast communication protocol optimizes the network bandwidth consumption. With multicast, replication is published to many spaces simultaneously. All target spaces, which request to receive it, may do so. Multicast implements UDP protocols at a very rapid speed.

In some cases, unicast-based replication might perform better than multicast (up to 10 replicas).

How Asynchronous Replication Works

  1. A destructive space operation is called.
  2. The source space:
    • Performs the operation.
    • Logs the operation and the relevant Entry UID into an asynchronous redo log.
    • Sends acknowledgement to the client.
  3. A replication event is triggered:
    • The replicator constructs a batch of operations in the source space.
    • Packs these into one object – a replication packet.
    • Transmits the packet into the target space.
  4. Received at the target space, operations are processed according to their order.
  5. The next batch is sent when the target space completes processing the replication packet

Sync-On-Commit vs. Synchronous-Replication

The asynchronous replication mode includes the sync-on-commit option. In some ways, this can be compared to synchronous replication. The following items compare these two options:

  • Sync-on-commit:
    • Provides a user defined "checkpoint".
    • Useful when more then one destructive operation is performed under transaction.
    • Where there are five or more operations per transaction, performance is expected to be higher with sync-on-commit vs. synchronous-replication.
  • Sync-replication:
    • Every destructive operation is replicated.
    • Consistent overhead.
    • More efficient in cases where every operation needs to be replicated, such as for backup purposes.

The cluster XML schema configuration file includes tags that define the replication parameters and policies. This file also includes parameters that define the synchronous and asynchronous parameters, as well as common parameters.
When you use the replication transmission matrix, you can define for each source/target pair the replication mode - synchronous or asynchronous. Asynchronous is the default replication mode. If you use synchronous, choose either unicast or multicast communication protocol. All source/target pairs utilize common synchronous or asynchronous parameters.

Replication/Recovery Conflict Resolution

Gigaspaces does not support a replication/recovery conflict resolution. This feature will be supported in future versions.

When an Entry is updated by different cluster nodes at the same time (might happen with asynchronous replication mode), the last replication update event overrides the existing version, resetting the Entry's version number to ZERO.

  • In case of a take or an update replication event for a non-existing Entry, a Wrong clustered space usage – UID X doesn't exist in target Y error is added into the space log file.
  • In case of a write replication event for an Entry that already exists, a Wrong clustered space usage – UID X already exists in space error is added into the space log file.
  • In case of a cancelled or renewed lease replication event for an Entry that no longer exists, a This entry might be already expired or canceled error is added into the space log file.

Synchronous Replication

When to Use Synchronous Replication?

Synchronous replication is most beneficial in the following scenarios:
Whenever an application must replicate highly sensitive, mission critical data as soon as it arrives at the source space.
Whenever any space operation must be duplicated with 100% data integrity to the target space.

How Synchronous Replication Works

  1. A destructive space operation is called.
  2. The source space:
    • Performs the operation.
    • Logs the operation and the relevant Entry UID into an asynchronous redo log.
    • Sends acknowledgement to the client.
  3. The target space receives a replication packet processed via a todo queue. The todo queue:
    • Makes sure parallel operations are processed in the correct order.
    • Verifies that there are no missing packets.
  4. With synchronous replication, packets can arrive in any order.
  5. When all missing packets arrive, the target space sends a confirmation to the source space.

If the timeout (todo queue), that is responsible for getting missing packets from the target space has been elapsed, the target space requests missing packets from the source space's redo log.

When running in sync-rec-ack mode, the source space receives the acknowledgement from the network layer for the replication packet arrival to the target space.

Replication packet processing confirmation sent from target space to source space via heartbeat. If source space fails to receive heartbeat within defined timeout period – it tries to reestablish replication channel with the target space.

Splitting Large Batches

In GigaSpaces replicated topology, the take and clear operations are identical. Therefore, referrals to the take operation in this section are also correct for the clear operation.

When performing batch operations (writeMultiple, updateMultiple, takeMultiple) in a sync-replicated cluster, the actual data (Entries/UID) is replicated to the target spaces in one batch. Some of the target (replica) spaces might not be able to deal with large amounts of data, resulting in OutOfMemory exceptions.

To solve this problem, you can split these Entries into several batches, thus providing better memory usage, stability, and scalability.

For example, when performing the take (clear) operation, you don't necessarily know how many Entries exist in the space, and all of these need to be removed. In this case, if you know there should be a large amount of Entries in the space, it can be useful to take the Entires in several batches (instead of trying to take all of them at once). Even when performing writeMultiple, updateMultiple, and takeMultiple, where you can control the amount of Entries you are performing the operation on, it can still be useful to split the operation into batches (if you are working with a large amount of Entries).

Splitting large batches is defined using the <multiple-opers-chunk-size> cluster schema element. This element's default value is -1. This means that by default the operation is performed on one batch of Entries. To split the Entries into batches, you must define a certain value. This value defines the number of batches. For example, when performing take and setting:

<multiple-opers-chunk-size>100</multiple-opers-chunk-size>

the Entries are taken in 100 different batches.

Splitting large batches isn't supported when performing Transaction.commit().

Synchronous Replication and the Replication Filter

If you employ synchronous replication in unicast mode, a dedicated replication channel is constructed for each target space. The replication filter will be called for each replication channel.

If you employ multicast communication protocol, there is only one replication channel for all the target spaces. This means that the replication filter will be called only once for all the target spaces.

Should Synchronous Replication Be Used For Master/Backup Replication?

Synchronous replication, since it occurs simultaneously, offers a much higher rate of concurrency as well as full reliability if failover occurs. In order to ensure 100% data coherency in failover situations, synchronous replication must be defined between the master and its backup spaces. However, if you use asynchronous replication between the master and backup spaces running in in-memory mode, there might be a loss of data if the master space fails, while its redo log still has data that has to be replicated.

For a full description of the replication schema and its different options, refer to the Replication Schema section.


GigaSpaces 6.0 Documentation Contents (Current Page in Bold)

    Java

    C++

    .NET

    Middleware Capabilities

    Configuration and Management

Add GigaSpaces wiki search to your browser search engines!
(works on Firefox 2 and Internet Explorer 7)

Labels

 
(None)