Summary: GigaSpaces EDG is an In-Memory Data Grid solution that supports all common distributed topologies (including replicated, partitioned and H/A), and provides unique features including continuous query, central management and a mirror service.
This page is specific to:
GigaSpaces 6.0
Overview
Today's business systems must accommodate growing numbers of end-users accessing an ever-expanding set of business applications. Many IT organizations face the challenge of delivering the scalability and responsiveness needed for near-real-time applications.
Trying to solve the problem simply by adding more hardware is an expensive and often ineffective response. Hardware is almost always the brute-force tool, but a calibrated software solution is often a better alternative. Distributed caching is one such solution. But while caching solutions provide scalability and data accessibility, they often struggle to meet the requirements of highly transactional applications, resulting in limited performance. In addition, many applications need ways to use available memory resources in order to boost performance.
In-Memory Data Grid solutions (IMDG; see the Wikipedia definition) allow system designers to optimize data access and increase performance by situating the data close to the business logic – wherever that application logic might be located.
The GigaSpaces Enterprise Data Grid (EDG) is an IMDG solution that provides rich caching functionality using distributed memory resources. A GigaSpaces space functions as a cache instance, and clusters of spaces allow deployment of all common cache topologies. GigaSpaces also offers unique features including SQL queries, JDBC access, continuous query, central management and a mirror service.
But GigaSpaces EDG is much more than a high-end IMDG solution. The classic IMDG approach has serious limitations in the context of an Enterprise Grid – namely, the difficulty of achieving data awareness, and the fact that the IMDG must be coupled to a specific application, instead of functioning as a shared resource that serves all applications on the Enterprise Grid. GigaSpaces EDG uses a new approach – an SLA-Driven Data Grid – which transforms the IMDG into an enterprise-level service, while enabling data awareness with no special integration effort.
GigaSpaces Data Grid Concepts
Basic Terms
- Data Grid instance – an independent data storage unit, also called a cache. The Data Grid is comprised of all the Data Grid instances running on the network.
- Space – a distributed, shared, memory-based repository for objects. A space runs in a space container – this is usually transparent to the developer. In GigaSpaces each Data Grid instance is implemented as a space, and the Data Grid is implemented as a cluster of spaces organized in one of several predefined topologies.
- Grid Service Container – a generic container that can run one or more space instances (together with their space containers) and other services. This container is launched on each machine that participates in the Data Grid, and hosts the Data Grid instances.
- Replication – a relationship in which data is copied between two or more Data Grid instances, with the aim of having the same data in some or all of them.
- Synchronous replication – replication in which applications using the Data Grid are blocked until their changes are propagated to all Data Grid instances. This guarantees that everyone sees the same data, but reduces performance.
- Asynchronous replication – replication in which changes are propagated to Data Grid instances in the background; applications do not have to wait for their changes to be propagated. Asynchronous replication does not negatively affect performance, but on the other hand, changes are not instantly available to everyone.
- Partitioning – new data or operations on data are routed to one of several Data Grid instances (partitions). Each Data Grid instance holds a subset of the data, with no overlap. Partitioning is done according to an index field in the data – operations are routed to partitions based on the value of this field.
- Topology – a specific configuration of Data Grid instances. For example, a replicated topology is a configuration in which some or all Data Grid instances replicate data between them. In GigaSpaces, Data Grid topologies are defined by cluster policies (explained in the following section).
- Reading – one way to retrieve data from the Data Grid, which will be used in this tutorial, is to call the JavaSpaces read method, supplying a read template object which specifies what needs to be read. JavaSpaces is the native API of the space.
- Notifications – GigaSpaces allows applications to be notified when changes are made to objects in the Data Grid. Applications register in advance to be notified about specific events. When these events occur, a notification is triggered on the application, which delivers the actual data that triggered the event.
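The template-based read described under "Reading" can be illustrated with a small, self-contained sketch. It follows the JavaSpaces matching rule, in which a null template field acts as a wildcard and a non-null field must equal the entry's field. The `Trade` and `ToySpace` classes are hypothetical names invented for this illustration and are not part of the GigaSpaces or JavaSpaces API; the real JavaSpaces read method also takes a transaction and a timeout, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical entry type; public nullable fields mimic the JavaSpaces entry style. */
class Trade {
    public String symbol;
    public Integer quantity;

    Trade(String symbol, Integer quantity) {
        this.symbol = symbol;
        this.quantity = quantity;
    }
}

/** Toy in-memory store illustrating template-based reads; not the GigaSpaces API. */
class ToySpace {
    private final List<Trade> entries = new ArrayList<>();

    void write(Trade t) {
        entries.add(t);
    }

    /** A null template field is a wildcard; a non-null field must be equal. */
    static boolean matches(Trade template, Trade candidate) {
        return (template.symbol == null || Objects.equals(template.symbol, candidate.symbol))
            && (template.quantity == null || Objects.equals(template.quantity, candidate.quantity));
    }

    /** Returns the first matching entry without removing it, or null if none matches. */
    Trade read(Trade template) {
        for (Trade t : entries) {
            if (matches(template, t)) {
                return t;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToySpace space = new ToySpace();
        space.write(new Trade("IBM", 100));
        space.write(new Trade("ORCL", 250));
        // Only the symbol is specified, so any quantity matches.
        Trade hit = space.read(new Trade("ORCL", null));
        System.out.println(hit.symbol + " " + hit.quantity); // prints "ORCL 250"
    }
}
```

A template with every field set to null matches any entry of the type, which is how a "read anything" request is usually expressed in this style.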
Clustering Concepts
In GigaSpaces, a cluster is a grouping of several spaces running in one or more containers. For an application trying to access data, the cluster appears as one space, but in fact consists of several spaces which may be distributed across several physical machines. The spaces in the cluster are also called cluster members.
A cluster group is a logical collection of cluster members, which defines how these members interact. The only way to define relationships between clustered spaces in GigaSpaces is to add them to a group and define policies. A cluster can contain several, possibly overlapping groups, each of which defines some relations between some cluster members – this provides much flexibility in cluster configuration.
A GigaSpaces cluster group can have one or more of the following policies:
- Replication Policy – defines replication between two or more spaces in the cluster, and replication options such as synchronous/asynchronous and replication direction.
- Load Balancing Policy – because user requests are submitted to the entire cluster, there is a need to distribute the requests between cluster members. The load balancing policy defines an algorithm according to which requests are routed to different members. For example, in a replicated topology, requests are divided evenly between cluster members; in a partitioned topology they are routed according to the partitioning key.
- Failover Policy – defines what happens when a cluster member fails. Operations on the cluster member can be transparently routed to another member in the group, or to another cluster group.
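The difference between the synchronous and asynchronous options of a replication policy can be sketched as follows. This is a single-threaded toy model under stated assumptions: `ToyReplicatedPair` is a hypothetical class, the two maps stand in for two spaces, and `flush()` stands in for the background propagation that a real implementation would perform on its own thread.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy primary/replica pair; hypothetical, not the GigaSpaces replication API. */
class ToyReplicatedPair {
    final Map<String, String> primary = new HashMap<>();
    final Map<String, String> replica = new HashMap<>();
    // Changes accepted by the primary but not yet applied to the replica.
    private final Queue<String[]> pending = new ArrayDeque<>();

    /** Synchronous: the call returns only after the replica holds the change. */
    void writeSync(String key, String value) {
        primary.put(key, value);
        replica.put(key, value);
    }

    /** Asynchronous: the call returns at once; the replica catches up later. */
    void writeAsync(String key, String value) {
        primary.put(key, value);
        pending.add(new String[] {key, value});
    }

    /** Stands in for the background replication step that drains the queue. */
    void flush() {
        String[] change;
        while ((change = pending.poll()) != null) {
            replica.put(change[0], change[1]);
        }
    }

    public static void main(String[] args) {
        ToyReplicatedPair pair = new ToyReplicatedPair();
        pair.writeAsync("order:1", "NEW");
        System.out.println(pair.replica.containsKey("order:1")); // prints "false"
        pair.flush();
        System.out.println(pair.replica.containsKey("order:1")); // prints "true"
    }
}
```

The window between `writeAsync` and `flush` is the period in which the two instances can disagree, which is the consistency cost of asynchronous replication.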
A cluster schema is an XML file which defines a cluster – the cluster name, which spaces are included in the cluster, which groups are defined on them, and which policies are defined for each group. GigaSpaces provides predefined cluster schemas for all common cluster topologies. Each topology is a certain combination of replication, load balancing and failover policies.
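The two load-balancing behaviours mentioned above (even distribution for a replicated topology, key-based routing for a partitioned one) can be sketched in a few lines. `ToyLoadBalancer` and its method names are hypothetical; the hash-modulo rule is only one possible routing algorithm and is an assumption of this sketch, not the algorithm GigaSpaces necessarily uses.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy request router; hypothetical, unrelated to the GigaSpaces policy classes. */
class ToyLoadBalancer {
    private final int members;
    private final AtomicInteger next = new AtomicInteger();

    ToyLoadBalancer(int members) {
        if (members <= 0) {
            throw new IllegalArgumentException("members must be positive");
        }
        this.members = members;
    }

    /** Replicated topology: any member can serve, so requests are spread evenly. */
    int routeRoundRobin() {
        return Math.floorMod(next.getAndIncrement(), members);
    }

    /** Partitioned topology: the same routing value always maps to the same member. */
    int routeByKey(Object routingValue) {
        // floorMod keeps the result in [0, members) even for negative hash codes.
        return Math.floorMod(routingValue.hashCode(), members);
    }

    public static void main(String[] args) {
        ToyLoadBalancer lb = new ToyLoadBalancer(3);
        System.out.println(lb.routeRoundRobin() + " " + lb.routeRoundRobin() + " " + lb.routeRoundRobin()); // prints "0 1 2"
        System.out.println(lb.routeByKey("account-42") == lb.routeByKey("account-42")); // prints "true"
    }
}
```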
Data Grid Topologies
GigaSpaces EDG has common clustering topologies built-in, and allows you to implement advanced custom topologies using a cluster schema configuration file. The following table details the most common topologies built into the product; for the rest, see All Data Grid Topologies.
| Topology | Description | Common Use | Options |
| --- | --- | --- | --- |
| Replicated | Two or more space instances with replication between them. | Allowing two or more applications to work with their own dedicated data store, while working on the same data as the other applications. | Replication can be synchronous (slower, but guarantees consistency) or asynchronous (faster, but less reliable, as it does not guarantee identical content). Space instances can run within the application (embedded; allows faster read access) or as a separate process (remote; allows multiple applications to use the space and is easier to manage). |
| Partitioned | Data and operations are split between two or more spaces (partitions) according to an index field defined in the data. An algorithm, defined in the Load-Balancing Policy, maps values of the index field to specific partitions. | Holding a large volume of data in the In-Memory Data Grid, even if it is larger than the memory of a single machine, by splitting the data into several partitions. | Several routing algorithms to choose from. With or without a backup space for each partition. |
| Master-Local | Each application has a lightweight, embedded cache, which is initially empty. The first time data is read, it is loaded from a master cache to the local cache (lazy load); the next time the same data is read, it is loaded quickly from the local cache. Later on, data is either updated from the master or evicted from the cache. | Boosting read performance for frequently used data. A useful rule of thumb is to use a local cache when over 80% of all operations are read operations. | The master cache can be clustered in any of the other topologies: replicated, partitioned, etc. |
| Local-View | Similar to master-local, except that data is pushed to the local cache. The application defines a filter, using a JavaSpaces read template or an SQL query, and data matching the filter is streamed to the cache from the master cache. | Achieving maximal read performance for a predetermined subset of data. | The master cache can be clustered in any of the other topologies: replicated, partitioned, etc. |
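The lazy-load behaviour of the master-local topology can be sketched as a small read-through cache. `ToyLocalCache` is a hypothetical class; a plain map stands in for the master cache, and the `masterReads` counter exists only to make the saved round trips visible.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy master-local cache: the local map starts empty and is filled on first read. */
class ToyLocalCache {
    private final Map<String, String> master;                  // stands in for the master cache
    private final Map<String, String> local = new HashMap<>(); // embedded local cache
    int masterReads = 0;                                        // counts round trips to the master

    ToyLocalCache(Map<String, String> master) {
        this.master = master;
    }

    /** Serves from the local cache when possible; otherwise lazily loads from the master. */
    String read(String key) {
        String value = local.get(key);
        if (value == null) {
            masterReads++;
            value = master.get(key);
            if (value != null) {
                local.put(key, value);
            }
        }
        return value;
    }

    /** Models eviction: the next read of this key goes back to the master. */
    void evict(String key) {
        local.remove(key);
    }

    public static void main(String[] args) {
        Map<String, String> master = new HashMap<>();
        master.put("user:1", "Alice");
        ToyLocalCache cache = new ToyLocalCache(master);
        cache.read("user:1"); // miss: loaded from the master
        cache.read("user:1"); // hit: served locally
        System.out.println(cache.masterReads); // prints "1"
    }
}
```

The sketch shows why the topology suits read-heavy workloads: repeated reads of the same key cost one trip to the master, while every update or eviction forces another.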
Beyond the Basics: An SLA-Driven Data Grid
GigaSpaces EDG introduces the notion of a container managed by Service-Level Agreements (SLAs), or an SLA-Driven Container, a generic hosting environment for Data Grid instances. GigaSpaces IMDG instances can run independently (either standalone or deployed by the Enterprise Grid), but can also run inside SLA-Driven Containers that lend them unique capabilities.
The defining characteristic of the SLA-Driven Container is that it is sensitive to the available hardware resources. Unlike regular IMDG instances, the SLA-Driven Container can provide more services if memory and CPU resources are plentiful, or shut down services when resources are scarce. Specific IMDG topologies can have predefined service levels, and the IMDG deploys sufficient instances on machines with sufficient resources, to ensure that the desired service levels are reached.
A key advantage of this approach is the ease of maintaining high availability. If one of the IMDG instances fails, it is automatically relocated to an available container. The state of the instance is recovered implicitly before the relocated instance becomes available, ensuring that the application accessing the data continues working without interruption.
SLA-Driven containers are defined and deployed using Open Spaces, the Spring-based development environment provided in GigaSpaces 6.X.
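The relocation behaviour described in this section can be illustrated with a toy placement routine that moves instances off a failed container onto live containers with enough free memory. All names here (`ToyContainer`, `ToySlaManager`, the first-fit placement rule and the per-instance memory figure) are assumptions made for illustration; the sketch does not model state recovery and is not how Open Spaces expresses an SLA.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy container with a memory budget; hypothetical, not the Grid Service Container API. */
class ToyContainer {
    final String name;
    int freeMemoryMb;
    boolean alive = true;
    final List<String> instances = new ArrayList<>();

    ToyContainer(String name, int freeMemoryMb) {
        this.name = name;
        this.freeMemoryMb = freeMemoryMb;
    }
}

class ToySlaManager {
    /**
     * Marks the container as failed and moves each of its instances to the first
     * live container with enough free memory. Returns the instances that could
     * not be placed because no container had sufficient resources.
     */
    static List<String> relocate(ToyContainer failed, List<ToyContainer> all, int memoryPerInstanceMb) {
        failed.alive = false;
        List<String> unplaced = new ArrayList<>();
        for (String instance : failed.instances) {
            ToyContainer target = null;
            for (ToyContainer candidate : all) {
                if (candidate.alive && candidate.freeMemoryMb >= memoryPerInstanceMb) {
                    target = candidate;
                    break;
                }
            }
            if (target == null) {
                unplaced.add(instance);
            } else {
                target.instances.add(instance);
                target.freeMemoryMb -= memoryPerInstanceMb;
            }
        }
        failed.instances.clear();
        return unplaced;
    }

    public static void main(String[] args) {
        ToyContainer a = new ToyContainer("a", 0);
        a.instances.add("partition-1");
        ToyContainer b = new ToyContainer("b", 256);
        ToyContainer c = new ToyContainer("c", 1024);
        relocate(a, Arrays.asList(a, b, c), 512);
        // Container b is too small, so the instance lands on c.
        System.out.println(c.instances); // prints "[partition-1]"
    }
}
```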
EDG Unique Features
| Feature | Benefits |
| --- | --- |
| Extended and standard queries based on SQL, and the ability to connect to the IMDG using standard JDBC connectors. | Makes the IMDG accessible to standard reporting tools. Makes accessing the IMDG identical to accessing a JDBC-compatible database, reducing the learning curve. |
| SQL-based continuous query support. | Brings relevant data close to the local memory of the relevant application instance. |
| Central management, monitoring and control. | Allows the entire IMDG to be controlled and viewed from an administrator's console. |
| Mirror Service: transparent persistence of data from the entire IMDG to a legacy database or other data source. | Allows seamless integration with existing reporting and back-office systems. |
| Real-time event notification: application instances can selectively subscribe to specific events. | Provides capabilities usually provided by messaging systems, including slow-consumer support, FIFO support, batching, pub/sub, and content-based routing. |
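Continuous query and event notification share one idea: the application registers a filter once, and matching data is delivered to it as it arrives. The sketch below is a minimal illustration of that idea. `ToyNotifyingGrid` is a hypothetical class, a Java predicate stands in for the SQL query or template, and delivering the already-stored matches at subscription time is an assumption of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Toy grid that pushes matching entries to subscribers; not the GigaSpaces API. */
class ToyNotifyingGrid<T> {
    /** A registered filter together with the listener that receives matches. */
    private static final class Subscription<E> {
        final Predicate<E> filter;
        final Consumer<E> listener;

        Subscription(Predicate<E> filter, Consumer<E> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final List<T> data = new ArrayList<>();
    private final List<Subscription<T>> subscriptions = new ArrayList<>();

    /** Registers a filter; existing matches are delivered first (assumption), new ones follow. */
    void subscribe(Predicate<T> filter, Consumer<T> listener) {
        for (T item : data) {
            if (filter.test(item)) {
                listener.accept(item);
            }
        }
        subscriptions.add(new Subscription<>(filter, listener));
    }

    /** Stores the entry and notifies every subscriber whose filter it matches. */
    void write(T item) {
        data.add(item);
        for (Subscription<T> s : subscriptions) {
            if (s.filter.test(item)) {
                s.listener.accept(item);
            }
        }
    }

    public static void main(String[] args) {
        ToyNotifyingGrid<Integer> prices = new ToyNotifyingGrid<>();
        prices.subscribe(p -> p > 100, p -> System.out.println("matched " + p));
        prices.write(50);  // filtered out, nothing printed
        prices.write(150); // prints "matched 150"
    }
}
```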
Enabling Data Awareness on the Enterprise Grid
GigaSpaces can enable data awareness on the Enterprise Grid in two ways, each relevant to a different operational scenario:
| Scenario | Method of Providing Data Awareness |
| --- | --- |
| IMDG instances deployed directly by the Enterprise Grid (without SLA-Driven Containers). | Integration using affinity keys: the Enterprise Grid and users submitting tasks share special keys that identify the relevant data to each task. This way, the Enterprise Grid can execute tasks on the same machine as the relevant data. |
| SLA-Driven Containers are launched by the Enterprise Grid (each container launches the relevant IMDG instances). | Provides data awareness implicitly: data-intensive procedures can run in the SLA-Driven Container, together with the IMDG instances. As the container itself is data aware, data affinity can be guaranteed, without making the Enterprise Grid itself data aware. |

The first method uses GigaSpaces as a typical IMDG – without SLA-Driven Containers – which means it is more complex and relies more heavily on the Enterprise Grid. The second method is much more elegant, requiring no integration effort and almost no involvement on the part of the Enterprise Grid.
EDG as a Shared Enterprise Grid Resource
Because EDG is an SLA-Driven IMDG, it can function as a shared resource for the entire Enterprise Grid. This has two big advantages:
- Lower total cost of ownership (TCO) – installation, testing, configuration, maintenance and administration of the IMDG is performed centrally for all the applications on the Grid.
- Improved resource utilization – with the IMDG as a shared resource, memory and CPUs available to the IMDG instances can be shared among different applications, depending on their current data loads. It is also much easier to scale the IMDG to respond to changing data needs.
GigaSpaces meets all the unique requirements for an Enterprise IMDG:
- Sensitivity to demand and available resources – as their name implies, the GigaSpaces SLA-Driven Containers are keenly aware of the underlying memory and CPU resources, and can dynamically change the IMDG topology to meet the service levels required by each application.
- Multi-tenancy – the GigaSpaces SLA-Driven Containers can host several IMDG topologies at the same time, routing applications transparently to the relevant instance.
- Hot failover – if one container fails, its IMDG instances are relocated to another container, and applications are transparently routed to the new container with no downtime.
- Versioning with no downtime – GigaSpaces uses a downloadable codebase, with a separate classloader for each application. Thus, change management becomes as simple as updating classes in the predefined codebase server - the new code is automatically loaded by all relevant IMDG instances.
- Configuration changes with no downtime – configuration changes are made at the container level, without affecting the IMDG instances, which can be moved between containers.
- Schema evolution with no downtime – in the GigaSpaces IMDG instances, meta-data is stored separately from the data itself, so schema changes do not affect the data. In addition, GigaSpaces supports class inheritance, so that the data structure can be augmented without affecting existing classes.
- Isolation of IMDG groups – the SLA-Driven Containers are clustered, allowing IMDG instances to be dynamically allocated into virtual groups across the Enterprise Grid.
- Isolation of IMDG instances – applications do not communicate directly with the IMDG instances; rather, they connect to a container and are transparently routed to the relevant IMDG instance within the container.
- Isolation at the data level – GigaSpaces IMDG instances can allow or block access to specific objects based on users and roles.
- Content-based security – GigaSpaces provides role-based security, allowing administrators to define fine-grained permissions, even for specific classes. Only authenticated users with permissions to access a specific class can read objects in that class from the IMDG. GigaSpaces also offers encryption of data read from and written to the IMDG instances.
- Explicit control over IMDG instances – the GigaSpaces administration UI shows exactly which IMDG instances are running on which containers, and allows administrators to relocate instances between containers at the click of a button.
- Integration with existing systems – the GigaSpaces Mirror Service allows data from the entire IMDG to be automatically persisted to an external database; the Focal Server uses the JMX standard to notify external systems about the status and attributes of IMDG components. Put together, these features allow both back-office systems and administrative tools to integrate easily with the IMDG using a single point of contact.
Further Reading
For further reading, see the Help Portal for XAP EDG Users.