The client needed multi-site replication. The topology was a peer-to-peer multi-site deployment, with one site also used to run periodic reports that required data from all the other sites. The fundamental reasons for using a multi-site deployment were scaling and high availability for clients: scaling in the sense that two (or more) grids are bigger than one, and availability in the sense that one site could fail over to another transparently. The GigaSpaces WAN replication feature takes care of both with a simple configuration effort. By replicating user request-processing state between sites, user state is always available both for requests in progress and for user access between requests.
The interesting wrinkle in this engagement was the need for reporting at one of the sites. One site was designated to periodically run reports that required data from all the other sites. While GigaSpaces WAN replication guarantees delivery of data from the configured spaces, it can't guarantee network availability or performance. The challenge was providing the report-generating site with enough information to be assured that all data from the other sites had been replicated as of a certain time. In other words, to generate a summary report of total WAN site activity between 10am and 11am, that site needs to know it has all the data written at every other site within that time window.
The solution is made possible by the in-order processing of space operations by the replication channel. It also relies on the participating systems' clocks being synchronized. The simple data structure that facilitates the solution is a per-site timestamp, whose state consists of a conventional UNIX timestamp (Long) and an originating location ID (String). Since it is a space class, it also has a routing ID, which in this particular design is a constant so that all timestamps are written to the same partition.
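A minimal sketch of such a timestamp class might look like the following; the class and field names are illustrative rather than taken from the actual project:

import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;
import com.gigaspaces.annotation.pojo.SpaceRouting;

// Hypothetical per-site timestamp space class (names are illustrative).
@SpaceClass
public class SiteTimestamp {
    private String id;            // space-generated ID
    private Long timestamp;       // conventional UNIX timestamp (milliseconds)
    private String originSiteId;  // originating location ID
    private Integer routingKey;   // constant, so all timestamps land in one partition

    public SiteTimestamp() {}     // no-arg constructor required for space classes

    @SpaceId(autoGenerate = true)
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public Long getTimestamp() { return timestamp; }
    public void setTimestamp(Long timestamp) { this.timestamp = timestamp; }

    public String getOriginSiteId() { return originSiteId; }
    public void setOriginSiteId(String originSiteId) { this.originSiteId = originSiteId; }

    @SpaceRouting
    public Integer getRoutingKey() { return routingKey; }
    public void setRoutingKey(Integer routingKey) { this.routingKey = routingKey; }
}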
Once per second (by convention), a timestamp is written into the same space as the application's transactional data. If more than one space were being replicated, a timestamp writer would need to write into each one. The idea is that the timestamps become intermingled with the replication flow of the application data and represent a close approximation of the time the nearby objects were replicated. The actual UNIX timestamp field is set by an outbound replication filter, to provide the highest accuracy. Upon receiving a replicated timestamp object, the receiving space can be reasonably sure that earlier changes in the source space have been replicated locally. However, there are limits to the accuracy of the correlation between timestamps and object replication. Under high load, for example, thread scheduling between the thread writing the timestamps and the threads writing the domain objects can result in small discrepancies. The intent of the design is not for the receiver to assume, with millisecond accuracy, that all objects written before the timestamp will actually arrive before the timestamp at the destination site. In fact, depending on various tuning factors in GigaSpaces (and the OS, for that matter), a "fudge factor" is appropriate for the receiver to have high confidence that the desired replication has occurred. This is partly because GigaSpaces does not (and cannot) guarantee ordering between objects in different partitions unless they participate in a distributed transaction. The larger the fudge factor, the greater the level of confidence. To accommodate the possibility of partition failover, a factor of 1 minute or more may be appropriate, depending on how GigaSpaces is configured.
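A sketch of the once-per-second timestamp writer, assuming the SiteTimestamp class above and a standard OpenSpaces GigaSpace proxy (in the real design the timestamp field is stamped by an outbound replication filter rather than at write time):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.openspaces.core.GigaSpace;

// Illustrative writer; in the actual design an outbound replication filter
// sets the timestamp just before the object leaves the site, for better accuracy.
public class TimestampWriter {

    private final GigaSpace gigaSpace;
    private final String localSiteId;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public TimestampWriter(GigaSpace gigaSpace, String localSiteId) {
        this.gigaSpace = gigaSpace;
        this.localSiteId = localSiteId;
    }

    public void start() {
        // Write one timestamp per second into the same space as the application data,
        // so the timestamps intermingle with the normal replication flow.
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                SiteTimestamp ts = new SiteTimestamp();
                ts.setOriginSiteId(localSiteId);
                ts.setRoutingKey(0);                         // constant routing key: single partition
                ts.setTimestamp(System.currentTimeMillis()); // overwritten by the replication filter in practice
                gigaSpace.write(ts);
            }
        }, 0, 1, TimeUnit.SECONDS);
    }

    public void stop() {
        scheduler.shutdown();
    }
}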
WAN Replication Performance Measurement
Now that timestamps are flowing between all sites, intermingled with the data of interest, the basic building blocks of replication performance management exist. When the WAN gateway on the receiving site writes incoming timestamps into the local space, a singleton XAP polling container is waiting for them. The polling container (let's call it a TimeWatcher) performs a continuous query on the receiving space for all timestamp objects that do not carry the local site identifier. Recall that the local site's own timestamps are written to the same space, and we don't want to pick those up. The TimeWatcher polling container looks something like this:
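(The following is a sketch assuming the SiteTimestamp class from earlier and OpenSpaces annotation-driven event containers; the class, field, and query names are illustrative rather than the project's actual code.)

import com.j_spaces.core.client.SQLQuery;

import org.openspaces.core.GigaSpace;
import org.openspaces.events.EventDriven;
import org.openspaces.events.EventTemplate;
import org.openspaces.events.adapter.SpaceDataEvent;
import org.openspaces.events.polling.Polling;

// Singleton polling container that consumes timestamps replicated from remote sites.
@EventDriven
@Polling
public class TimeWatcher {

    private final String localSiteId;

    public TimeWatcher(String localSiteId) {
        this.localSiteId = localSiteId;
    }

    // Match every timestamp that did NOT originate at the local site.
    @EventTemplate
    public SQLQuery<SiteTimestamp> remoteTimestamps() {
        SQLQuery<SiteTimestamp> query =
                new SQLQuery<SiteTimestamp>(SiteTimestamp.class, "originSiteId <> ?");
        query.setParameter(1, localSiteId);
        return query;
    }

    // The matched timestamp is taken (removed) from the space and handed to us.
    @SpaceDataEvent
    public void onTimestamp(SiteTimestamp ts, GigaSpace gigaSpace) {
        long latencyMillis = System.currentTimeMillis() - ts.getTimestamp();
        // Update (or create) the TimeRecord for ts.getOriginSiteId() with this latency
        // and the timestamp itself; see the TimeRecord sketch below.
    }
}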
When the TimeWatcher gets a timestamp from a remote site, it immediately calculates the WAN latency for that timestamp and adds the timestamp, along with the latency, to a TimeRecord object, of which there is one per remote site in the space. The original timestamp is "taken" from the space and discarded. The purpose of the TimeRecord class is to record WAN statistics for a particular remote site. In this design, TimeRecords retain the last 10 latencies calculated for a given site.
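A TimeRecord along those lines could be as simple as the following sketch (the fixed window of 10 matches the description above; field names are illustrative):

import java.util.LinkedList;

import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;

// One TimeRecord per remote site, holding the last 10 observed WAN latencies.
@SpaceClass
public class TimeRecord {

    private static final int WINDOW = 10;

    private String remoteSiteId;
    private LinkedList<Long> latenciesMillis = new LinkedList<Long>();
    private Long lastTimestamp;  // most recent timestamp received from this site

    public TimeRecord() {}

    public void addSample(long timestamp, long latencyMillis) {
        lastTimestamp = timestamp;
        latenciesMillis.addLast(latencyMillis);
        if (latenciesMillis.size() > WINDOW) {
            latenciesMillis.removeFirst();  // keep only the most recent 10 samples
        }
    }

    @SpaceId
    public String getRemoteSiteId() { return remoteSiteId; }
    public void setRemoteSiteId(String remoteSiteId) { this.remoteSiteId = remoteSiteId; }

    public LinkedList<Long> getLatenciesMillis() { return latenciesMillis; }
    public void setLatenciesMillis(LinkedList<Long> l) { this.latenciesMillis = l; }

    public Long getLastTimestamp() { return lastTimestamp; }
    public void setLastTimestamp(Long lastTimestamp) { this.lastTimestamp = lastTimestamp; }
}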
The final pieces of the puzzle are a singleton space object called ReplicationStatus, and a simple service that polls the various TimeRecords. ReplicationStatus holds a simple three-state description of the overall WAN status, along with the oldest of the most recent timestamps received from each site. By observing the state of this object, the client can make a simple determination of whether the WAN is performing in optimal, degraded, or down mode. The timestamp can be read to determine how up-to-date the local site's WAN data is. More than likely, event listeners will be used to detect changes in the global WAN state and update the monitoring code. For more fine-grained monitoring, event listeners may be installed on each of the TimeRecord objects. One can easily imagine a graphical representation of all WAN sites and their replication performance to each other site, displayed as green, yellow, or red, for example.
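One possible shape for the ReplicationStatus object is sketched below; the three states come from the description above, while the evaluation thresholds are purely illustrative and would be tuned per deployment:

// Illustrative singleton space object describing overall WAN health.
public class ReplicationStatus {

    public enum State { OPTIMAL, DEGRADED, DOWN }

    private String id = "REPLICATION_STATUS";  // fixed ID: one instance per site
    private State state;
    private Long oldestRecentTimestamp;  // oldest of the most recent timestamps from each site

    // A simple evaluation rule with assumed thresholds: every site fresh within
    // 5 seconds -> OPTIMAL; within 60 seconds -> DEGRADED; otherwise DOWN.
    public static State evaluate(long oldestRecentTimestamp, long now) {
        long ageMillis = now - oldestRecentTimestamp;
        if (ageMillis < 5000) {
            return State.OPTIMAL;
        }
        if (ageMillis < 60000) {
            return State.DEGRADED;
        }
        return State.DOWN;
    }

    // getters/setters omitted for brevity (required for a space class)
}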
Another requirement of the system was to demonstrate that periodic, large-scale writes of reference data to a grid wouldn't interfere with normal transactional data replication. The system depends on a large body of reference data that the application logic needs in order to process user requests and messages properly, and this data must be refreshed periodically. A concern was that if this data flooded into the normal replication flow of timestamps and transactional data, it would essentially clog up the works, distort the measured latencies, and possibly even raise false alarms. The solution is to realize that while the transactional data needs to be replicated to every site in a multi-master sense, the reference data really just needs to be "pushed" out from a single loading site. Fortunately, the WAN gateway design makes the solution to this problem a simple configuration step. WAN replication is configured on a per-space basis, not a per-grid basis. This means that a site with N different spaces can run N different replication topologies. In this design, the transactional data was placed in a space running multi-master replication, while the reference data was placed in a space running a master-slave topology. Both topologies run simultaneously and independently, making the concern about "clogging up the works" disappear. In one of the system tests, normal transactional replication proceeded smoothly even as massive amounts of reference data were dumped into the grid.
This simple design, which wouldn't be nearly so simple on traditional platforms, demonstrates how the XAP platform makes a production WAN-capable architecture relatively easy to build, understand, and manage.