Clusters Over WAN

  GigaSpaces 5.X

Documentation Home
Quick Start Guide
Release Notes

Previous release

  Search Here
Searching GigaSpaces Platform 5.X Documentation

                                               

Overview

Enterprise wide applications require the sharing of data across remote geographical sites. Several instances of the application need to share data, while the communication bandwidth and speed between the different sites is relatively slow. In many cases, due to security and limited bandwidth, each site requires only a portion of the local site data to be replicated to other remote sites. In such cases, data should be filtered before being replicated to other remote sites based on user business logic.

Support for Wide-Area Network (WAN) data distribution is addressed by the following components:

  • Asynchronous Replication – GigaSpaces allows batching Space operations and replicating them based on time intervals or amount of operations. The space includes a redo log mechanism that logs operations and entries in an efficient manner and performs a handshake between every batch replication and the relevant target Spaces. The batch asynchronous replication allows the source space to continue and serve local site clients without impacting their latency. In addition, network utilization can be optimized to replicate batches and non sparse scattered data.
  • Space Workers – GigaSpaces includes an invocation mechanism for user-defined business logic. GigaSpaces manages the user?s business logic within the Space memory address, and allows the business logic to interact with remote sites implicitly without impacting local site performance.
  • Multiple Unicast-Based Service Lookups – by default, GigaSpaces distributed Lookup Service discovers using multicast without the need to specify a specific host machine. In WAN based environments, multicast might be blocked across the different sites. To resolve this problem, GigaSpaces allows defining multiple machines that host the Lookup Service, and the discovery of these lookups is done via a unicast protocol.
  • Mirroring – the GigaSpaces Mirror Service allows local site nodes to replicate their operations in a reliable asynchronous mode into a central service whose sole responsibility is to act as the gateway to remote sites. With the Mirror Service users may perform optimizations, data conversions and transformations before moving the data to remotes sites.
  • Data Replication Granularity – WAN environments create the need to define fine-grained replication – sometimes at the object level. GigaSpaces supports space, class and object level (based on content) replication granularity that is decoupled from the actual application business logic and done via external configuration.
  • Data Bulk Load – in WAN environments there is a need to pre-load the different nodes prior to their availability for public use. GigaSpaces includes a built-in mechanism to pre-load spaces before they are accessible to clients from external resources.
  • Local Cache/View (Hub & Spoke) – in WAN environments, you might want to form smart dynamic caches on the client-side that are decoupled from the backend spaces, and store only data relevant to the application instance. This client-cache runs in the client memory address and is updated using streaming techniques with a push/pull model. The local cache can be loaded on-demand or pre-loaded using standard SQL.
  • Automatic Data Recovery – GigaSpaces provides a data and event notification registration recovery mechanism, allowing restarted spaces to pre-load data from a partner space.
  • Originator Identification – with GigaSpaces, the Entry's Unique ID (UID) includes, by default, information about the original hosting space. This allows applications on a WAN to block cyclical data replication, avoiding a resonance effect.
  • Custom Entry UID Generation – many systems have their own unique ID generated internally by an application-level ID generator. You may use this application-level ID as the basis for GigaSpaces Entry UID and manipulate the data using this UID.
  • Loosely-Coupled Architecture – because data is stored in memory within logical units (spaces) you can construct loosely-coupled architectures, having data distributed without physical relationships in terms of data complexity, machine host names or IP or actual space location.
  • System-Level Events and Fault-Detection – GigaSpaces "nervous system" – the Service Grid – provides system-level events, allowing external applications and internal components to be aware of the status of the various distributed components, such as their deployment, active mode or failure – and act upon receiving this information.
  • Intuitive Provisioning via a GUI – GigaSpaces provides an extensive enterprise-level monitoring, provisioning, management and administration GUI tool. Using one management console the user can view and query the status of the deployed services, the In-Memory Data Grid, and space in real time.

This section illustrates several patterns you can apply to construct a WAN-based cluster.

Replication Over WAN – One Cluster Using Different Groups

Replication Over WAN is archived by using one cluster configuration with multiple groups: two load-balancing groups, and one replication group using the replication matrix. With this topology, each partition at each site has a replica space. Each partition directly replicates its data and operations into his partner.

The <GigaSpacesEE ROOT>\examples\Advanced\Data_Grid\multi-site includes cluster configuration using the multiple groups approach.

Multi-Cluster Architecture over WAN using IWorker

The IWorker implementation is used to synchronize operations conducted at one local site with operations conducted in another local site. In this way, you may have different clusters with a different number of partitions at each site. Every site runs a Gateway IWorker implementation that is notified with any changes occurring in each local site partition. The IWorker registers for notifications once the space hoting it started. The IWorker will receive notifications only if the space is in active mode.

The Gateway Worker delegates each operation conducted at the local site into the remote site. Since the Gateway Worker acts as a regular client when accessing the remote site, every space operation is distributed into the remote site partitions according to the remote site load-balancing policy.
When running in a partitioned-backup topology, the IWorker implementation will perform operations only at the active space which ensures consistency.
In a case of a failure with an active space at the remote site, the IWorker implementation will route the operations to the relevant backup space.
With this approach, the IWorker implementation can register for notifications on specific space events, specific classes and even specific Entries. These are synchronized with the remote site.

For more details about implementing the IWorker interface, refer to the Space Worker Options section.

Multi-Cluster Architecture Over the WAN using Mirror Service

An advanced approach you can take when constructing a WAN-based environment is to use embedded spaces running together as part of a mirror to provide an enhanced level of high-availability to the data passed across the different sites, and also better network utilization.

With this approach, the mirror writes its incoming data from the local site spaces into an embedded space. The embedded space is part of a different cluster and has a peer in each remote site. In this way, data is replicated in an asynchronous manner across the WAN. If preferred, the embedded space can use the persistent schema running in LRU cache policy mode, allowing the embedded space to evict data from its cache and allowing an unlimited amount of data storage without exposing the VM to out-of-memory problem.

Two remote sites are shown below, A and B using a 2-way cluster mirror gateway. The gateway cluster acts as one replication channel. With this approach, an IWorker implementation is not used and there are no proxy operations conducted directly into the remote site – i.e. there is no entity at site A that is using site B's clustered proxy to interact directly with spaces at site B over the WAN. Local site operations are distributed into the remote site via a remote agent – this is the gateway cluster. Each gateway cluster node runs within the same JVM as the local site mirror – i.e. the gateway cluster node runs as an embedded space in the mirror implementation.

Once the data has been replicated from Site A's gateway cluster node to Site B's gateway cluster node, the gateway cluster node at site B fans it out into the relevant Site B cluster spaces via its output replication filter. The output replication filter uses Cluster B's clustered proxy – which provides transparent use of Cluster B's load-balancing and failover policies.

With this approach, there is one replication channel for all site nodes – you can have multiple partitions at site A or B – all using one replication channel. Since we have a dedicated cluster acting as this replication channel, we can control it very easily using the asynchronous replication settings. This capability is very important in WAN-based environments, because the network utilization is measured and might be expensive.

New in GigaSpaces 5.1
The Mirror Service is supported in GigaSpaces version 5.1 and onwards.

Resonance Affect

One important issue to deal with when implementing the cluster mirror gateway is the resonance affect. The resonance affect occurs when:

  1. Clients at Site A write data into Site A spaces.
  2. The mirror at site A writes data via the gateway cluster into the gateway node at site B.
  3. The gateway node at site B writes the data to Site B's spaces.
  4. Site B's spaces replicate the data to Site B's mirror.
  5. Site B's mirror replicates the data via the gateway cluster to the gateway node at site A.
  6. The gateway node at site A writes the data back to cluster A's spaces.

(SiteA --> MirrorA --> GatewayA --> GatewayB --> SiteB --> MirrorB --> GatewayB --> GatewayA --> SiteA, etc.)

To prevent the resonance effect, filter out data originating from a remote site when the mirror writes its data into its embedded gateway cluster node – i.e. Entries originated at site A and replicated from Site A to site B would not be replicated back from Site B to site A.

By default, the Entry's UID includes the originated container and space name – so you can use the Entry's UID to perform the filtering.

Mirror as Service Grid Service

The Service Grid can monitor the mirror and the embedded gateway life cycle and restart these once there is a failure. To accomplish this, you need to deploy the mirror into the Service Grid. Once the mirror is started, it constructs the embedded gateway node as part of its initialized() method. In this case, the Service Grid is unaware of the embedded gateway node.

If the mirror fails, the source space accumulates the data as part of its redo log to be sent to the mirror. There is no need to provide high-availability for the mirror (i.e replicated cluster for the mirror service). There is a handshake mechanism between the source space and the mirror. If the mirror implementation CachBulk.store() fails due to a mirror JVM failure, the source space resends the last batch to the mirror again when the mirror is restarted. Additionally, if the mirror fails, the Service Grid restarts it on some other available GSC – this allows the source space(s) to resend the last replication packet to the mirror (and through it to the remote site via the embedded gateway cluster space).

Recovery from Complete Shutdown

If there is a complete shutdown of all spaces, data must be pushed from relevant data source (database, remote site) into the local site using a helper service or a program that fetches all relevant data and writes it into the restarted site. One simple way to do this is via the CacheLoader interface implementation which is activated on startup of the spaces (see Read-Through and Write-Through Examples) – data loaded into the space via the CacheLoader would not send back into the mirror. In this case, the reasonance affect would not have occurred.

More in this Section

For more details, refer to the following sections:


Wiki Content Tree


Your Feedback Needed!

We need your help to improve this wiki site. If you have any suggestions or corrections, write to us at techw@gigaspaces.com. Please provide a link to the wiki page you are referring to.

Labels