|
Search Solutions & Best Practices
Browse Solutions & Best Practices
|
Summary: Moving into Production Checklist
Sharing Infrastructure | Binding the Process into a Machine IP Address | Ports | Client LRMI Connection Pool and Server LRMI Connection Thread Pool | Lookup Locators and Groups | The Runtime Environment - GSA, LUS, GSM and GSCs | Zones | Capacity Planning | PU Packaging and CLASSPATH | JVM Tuning | Space Memory Management | Local Cache | Primaries Space Distribution | Rebalancing - Dynamic Data Redistribution | Serialization Mode | NO_RETURN_VALUE Modifier | Runtime Files Location | Log Files | Hardware Selection | OS Considerations
Author: Shay Hassidim, Deputy CTO, GigaSpaces
The following list should provide you with the main activities to be done prior moving your system into production. Reviewing this list and executing the relevant recommendations should result in a stable environment with a low probability of unexpected behavior or failures that are result of a GigaSpaces environment misconfiguration. Sharing InfrastructureThere are numerous ways allowing different systems/applications/groups to share the same pool of servers (in development or production environment) on the network. A non-exhaustive list of some of the options is delineated below: 2. Using Multiple Zones: A single GigaSpaces runtime environment spans all servers, where each group of GigaSpaces containers (across several machines) are labeled with a specific Zone. You may have multiple Zones used by different containers on the same server. For example, have on server A two containers labeled with zoneX and four containers labeled with zoneY and on server B two containers labeled with zoneX and four containers labeled with zoneY. 3. Using Multiple Lookup Groups (multicast lookup discovery): All servers running multiple GigaSpaces runtime environments, where each GigaSpaces container using a specific Lookup Group when registering with the Lookup Service. At deployment time, application services (aka processing Unit) are deployed using a specific lookup group. Use of multiple lookup group breaks logically the Infrastructure into different segments. The Lookup Group value controlled via the LOOKUPGROUPS environment variable. When using this option you should make sure multicast is enabled on all machines. 4. Using Multiple Lookup Locators (unicast lookup discovery): All servers running multiple GigaSpaces runtime environments, where each GigaSpaces container using a specific Lookup locator when registering with the Lookup Service. At deployment time, application services (aka processing Unit) are deployed using a specific lookup locator. Use of multiple lookup locators breaks logically the Infrastructure into different segments. If you have multiple lookup services running on the same server, each will use a different listening port. You may control this port using the com.sun.jini.reggie.initialUnicastDiscoveryPort system property. The Lookup Locators value controlled via the LOOKUPLOCATORS environment variable. 5. Using a shared GigaSpaces runtime environment: A single GigaSpaces runtime environment spans all servers, with no use of Zones or Lookup Groups/Locators. Application services share the servers and allocation done in a random manner without using any pre-defined segmentation. For any of the above options, GigaSpaces exposes the ability to control a deployed application service in run-time, such that new application service instances can be created or existing instances can be relocated. This tight operational control enables even more creative resource sharing possibilities. Devising the appropriate resource sharing strategy for your system should consider the breadth of operational requirements and application services' characteristics. For example, two applications with variable load may run into trouble running on a fixed-size shared environment if peak loads coincide. Binding the Process into a Machine IP AddressIn many cases, the machines that are running GigaSpaces (i.e., a GSA, GSM, or GSC), or running GigaSpaces client applications (e.g., web servers or standalone JVM/.Net/CPP processes) have multiple network cards with multiple IP addresses. To make sure that the GigaSpaces processes or the GigaSpaces client application processes bind themselves to the correct IP addresses - accessible from another machines - you should use the NIC_ADDR environment variable, or the java.rmi.server.hostname system property. Both should be set to the IP of the machine (one of them in case of a machine with multiple IP addresses). Without having this environment/property specified, in some cases, a client process is not able to be notified of events generated by the GigaSpaces runtime environment or the space. Examples: export NIC_ADDR=10.10.10.100 ./gs-agent.sh & java -Djava.rmi.server.hostname=10.10.10.100 MyApplication
PortsGigaSpaces uses TCP/IP for most of its remote operations. The following components within GigaSpaces require open ports:
Here are examples of how to set different LRMI listening ports for the GS-UI, and another set of ports for the GSA/GSC/GSM/Lookup Service: export EXT_JAVA_OPTIONS=-Dcom.gs.transport_protocol.lrmi.bind-port=7000-7500 ./gs-agent.sh & export EXT_JAVA_OPTIONS=-Dcom.gs.transport_protocol.lrmi.bind-port=8000-8100 ./gs-ui.sh & A running GSC tries to use the first free port that is not used out of the port range specified. The same port might be used for several connections (via a multiplexed protocol). If all of the port range is exhausted, an error is displayed.
Client LRMI Connection Pool and Server LRMI Connection Thread PoolThe GigaSpaces LRMI uses two independent resource pools working collaboratively allowing a client to communicate with a server in a scalable manner. The client connection pool is configured via the com.gs.transport_protocol.lrmi.max-conn-pool and a server connection thread pool is configured via the com.gs.transport_protocol.lrmi.max-threads, both should be configured on the server side as system properties. You may configure these two pools' sizes and their resource timeouts to provide maximum throughput and low latency when a client communicates with a server. The default LRMI behavior will open a different connection at the client side and start a connection thread at the server side, once a multithreaded client accesses a server component. All client connections may be shared between all the client threads when communicating with the server. All server side connection threads may be shared between all client connections. Client LRMI Connection PoolThe client LRMI connection pool is maintained per server component - i.e. by each space partition. For each space partition a client maintains a dedicated connection pool shared between all client threads accessing a specific partition. When having multiple partitions (N) hosted within the same GSC, a client may open maximum of N * com.gs.transport_protocol.lrmi.max-conn-pool connections against the GSC JVM process.
Server LRMI Connection Thread PoolThe LRMI connection thread pool is a server side component. It is in charge of executing the incoming LRMI invocations. It is a single thread pool within the JVM that executes all the invocations, from all the clients and all the replication targets.
Lookup Locators and GroupsA space (or any other service, such as a GSC or GSM) publishes (or registers/exports) itself within the Lookup Service. The lookup service acts as the system directory service. The lookup service (aka service proxy) keeps information about each service, such as its location and its exposed remote methods. Every client or service needs to discover a lookup service as part of its bootstrap process. There are 2 main options for how to discover a lookup service:
To configure the GigaSpaces runtime components (GSA,GSC,GSM,LUS) to use unicast discovery you should set the LOOKUPLOCATORS variable: export LOOKUPLOCATORS=MachineA,MachineB ./gs-agent.sh & To configure the GigaSpaces runtime components (GSA,GSC,GSM,LUS) to use multicast discovery you should set the LOOKUPGROUPS variable: export LOOKUPGROUPS=Group1,Group2 ./gs-agent.sh & When running multiple systems on the same network infrastructure, you should isolate these by having a dedicated set of lookup services (and GSC/GSM) for each system. Each should have different locators/groups settings. Space URL ExamplesSee below for examples of Space URLs you should be familiar with:
Space Configuration with Unit TestsWhen running unit tests, you might want these set up so that no remote client can access the space they are running. This includes regular clients or the GS-UI.
Here is a simple confguration you should place within your pu.xml to disable the lookup service startup, disable the space registration with the lookup service, and disable the space registration with the Rmi registry, when the space starts as a PU or running as a standalone: <os-core:space id="space" url="/./myspace" > <os-core:properties> <props> <prop key="com.j_spaces.core.container.directory_services.jini_lus.start-embedded-lus">false</prop> <prop key="com.j_spaces.core.container.directory_services.jini_lus.enabled">false</prop> <prop key="com.j_spaces.core.container.directory_services.jndi.enabled">false</prop> <prop key="com.j_spaces.core.container.embedded-services.httpd.enabled">false</prop> </props> </os-core:properties> </os-core:space> The Runtime Environment - GSA, LUS, GSM and GSCsIn a dynamic environment where you want to start GSCs and GSMs remotely, manually or dynamically, the GSA is the only component you should have running on the machine that is hosting the GigaSpaces runtime environment. This lightweight service acts as an agent and starts a GSC/GSM/LUS when needed. You should plan the initial number of GSCs and GSMs based on the application memory footprint, and the amount of processing you might need. The most basic deployment should include 2 GSMs (running on different machines), 2 Lookup services (running on different machines), and 2 GSCs (running on each machine). These host your Data-Grid or any other application components (services, web servers, Mirror) that you deploy. In general, the total amount of GSCs you are running across the machines that host the system depends on:
Configuring the Runtime Environment
Running Multiple GroupsYou may have a set of LUS/GSM managing GSCs associated to a specific group. Let's assume you would like to "break" your network into 2 groups. Here is how you should start the GigaSpaces runtime environment:
Running Multiple LocatorsYou may have a set of LUS/GSM managing GSCs associated to a specific locaator. Let's assume you would like to "break" your network into 2 groups using different lookup locators. Here is how you should start the GigaSpaces runtime environment:
The lookup service runs by default as a standalone JVM process started by the GSA. You can also embed it to run together with the GSM. In general, you should run 2 lookup services per system. Running more than 2 lookup services may cause an overhead, due to the chatting and heartbeat mechanism performed between the services and the lookup service, to signal the existence of the service. ZonesThe GigaSpaces Zone allows you to "label" a running GSC(s) before starting it. The GigaSpaces Zone should be used to isolate applications and a Data-Grid running on the same network. It has been designed to allow users to deploy a processing unit into specific set of GSCs where all these sharing the same set of LUSs and GSMs. The Zone property can be used for example to deploy your Data-Grid into a specific GSC(s) labeled with specific zone(s). The zone is specified prior to the GSC startup, and cannot be changed once the GSC has been started.
To use Zones when deploying your PU you should: export EXT_JAVA_OPTIONS=-Dcom.gs.zones=webZone ${EXT_JAVA_OPTIONS}
gs-agent gsa.gsc 2
2. Deploy the PU using the -zones option. Example: gs deploy -zones webZone myWar.war Running Multiple ZonesYou may have a set of LUS/GSM managing multiple zones (recommended) or have a separate LUS/GSM set per zone. In such a case (set of LUS/GSM managing multiple zones) you should run these in the following manner:
Note that with XAP 7.1.1 new variables provided that allows you to set different JVM arguments for GSC,GSM,LUS,GSA separately (GSA_JAVA_OPTIONS , GSC_JAVA_OPTIONS , GSM_JAVA_OPTIONS , LUS_JAVA_OPTIONS). Capacity PlanningIn order to estimate the amount of total RAM and CPU required for your application, you should take the following into consideration:
The Capacity Planning section provides a detailed explanation how to estimate the resources required. PU Packaging and CLASSPATHUser PU Application LibrariesA Processing Unit JAR file, or a Web Application WAR file should include within its lib folder, all the necessary JARs required for the application. Resource files should be placed within one of the JAR files within the PU JAR, located under the lib folder. In addition, the PU JAR should include the pu.xml within the META-INF\spring folder. Data-Grid PU LibrariesWhen deploying a Data-Grid PU, it is recommended that you include all space classes and their dependency classes as part a PU JAR file. This PU JAR file should include a pu.xml within the META-INF\spring, to include the space declarations and relevant tuning parameters. GS-UI LibrariesIt is recommended that you include all space classes and their dependency classes as part of the GS-UI CLASSAPTH . This makes sure that you can query the data via the GS-UI. To set the GS-UI classpath, set the POST_CLASSPATH variable prior to calling the GS-UI script to have the application JARs locations.
JVM TuningIn most cases, the applications using GigaSpaces are leveraging machines with very fast CPUs, where the amount of temporary objects created is relatively large for the JVM garbage collector to handle with its default settings. This means careful tuning of the JVM is very important to ensure stable and flawless behavior of the application. See below examples of JVM settings recommended for applications that might generate large number of temporary objects. In such situations you afford long pauses due to garbage collection activity.
These settings are good for cases where you are running a IMDG or when the business logic and the IMDG are collocated. For example IMDG with collocated Polling /Notify containers, Task executors or Service remoting: CMS mode - good for low latency: -server -Xms8g -Xmx8g -Xmn300m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:MaxPermSize=256m -XX:+ExplicitGCInvokesConcurrent -XX:+UseCompressedOops -XX:+CMSClassUnloadingEnabled -XX:+CMSParallelRemarkEnabled
Permanent Generation SpaceFor applications that are using relatively large amount of third party libraries (PU using large amount of jars) the default permanent generation space size may not be adequate. In such a case, you should increase the permanent generation space size and please also refer to the suggested parameters above that should be used together with the other CMS parameters (-XX:+CMSClassUnloadingEnabled). Here are a suggested values: -XX:PermSize=512m -XX:MaxPermSize=512m
See the Tuning Java Virtual Machines section and the Java SE 6 HotSpot Virtual Machine Garbage Collection Tuning for detailed JVM tuning recommendations. Space Memory ManagementThe Space supports two Memory Management modes:
When running with ALL_IN_CACHE, the memory management:
When running with ALL_IN_CACHE, you should make sure the default memory management parameters are tuned according the JVM heap size. A large heap size (over 2 G RAM) requires special attention. Here is an example of memory manager settings for a 10 G heap size: <os-core:space id="space" url="/./mySpace" > <os-core:properties> <props> <prop key="space-config.engine.memory_usage.high_watermark_percentage">95</prop> <prop key="space-config.engine.memory_usage.write_only_block_percentage">94</prop> <prop key="space-config.engine.memory_usage.write_only_check_percentage">93</prop> <prop key="space-config.engine.memory_usage.low_watermark_percentage">92</prop> </props> </os-core:properties> </os-core:space> Local CacheThe local cache is used as a client side cache that stores objects the client application reads from the space. It speeds up repeated read operations of the same object. The readById/readByIds operation has a special optimization with a local cache that speeds up the retrieval time of the object from the local cache, in the case that it is already cached. The local cache evicts objects once a threshold is met. When there is a client application with a large heap size, you might want to configure the local cache eviction parameters to control the eviction behavior: <os-core:space id="space" url="jini://*/*/mySpace" /> <os-core:local-cache id="localCacheSpace" space="space" update-mode="PULL" > <os-core:properties> <props> <prop key="space-config.engine.cache_size">5000000</prop> <prop key="space-config.engine.memory_usage.high_watermark_percentage">75</prop> <prop key="space-config.engine.memory_usage.write_only_block_percentage">73</prop> <prop key="space-config.engine.memory_usage.write_only_check_percentage">71</prop> <prop key="space-config.engine.memory_usage.low_watermark_percentage">45</prop> <prop key=" space-config.engine.memory_usage.eviction_batch_size">1000</prop> <prop key="space-config.engine.memory_usage.retry_yield_time">100</prop> <prop key="space-config.engine.memory_usage.retry_count">20</prop> </props> </os-core:properties> </os-core:local-cache> <os-core:giga-space id="gigaSpace" space="localCacheSpace"/>
Primaries Space DistributionBy default, when running GSCs on multiple machines and deploying a space with backups, GigaSpaces tries to provision primary spaces to all available GSCs across all the machines. Without setting the max-instances-per-vm and the max-instances-per-machine, GigaSpaces might provision a primary and a backup instance of the same partition into GSCs running on the same physical machine. To avoid this behavior, you should set the max-instances-per-vm=1 and the max-instances-per-machine=1. This makes sure that the primary and backup instances of the same partition are provisioned into different GSCs running on different machines. If there is one machine running GSCs and max-instances-per-machine=1, backup instances are not provisioned. Here is an example of how you should deploy a Data-Grid with 4 partitions, with a backup per partition (total of 8 spaces), where you have 2 spaces per GSC, and the primary and backup are not running on the same box (even when you have other GSCs running): gs deploy-space -cluster schema=partitioned-sync2backup total_members=4,1 -max-instances-per-vm 2 -max-instances-per-machine 1 MySpace
Rebalancing - Dynamic Data RedistributionAutomatic RebalancingGigaSpaces supports automatic discovery, rebalancing (aka Dynamic Redistribution of Data) and expansion/contraction of the IMDG while the application is running. When deploying an IMDG, the system partitions the data (and the collocated business logic) into logical partitions. You may choose the number of logical partitions or let GigaSpaces calculate this number. The logical partitions may initially run on certain containers, and later get relocated to other containers (started after the data grid has been deployed) on other machines, thus allowing the system to expand and increase its memory and CPU capacity while the application is still running. The number of logical partitions and replicas per partition should be determined at deployment time. The number of containers hosting the IMDG instances may be changed at runtime.
The component that is responsible to scale the IMDG at runtime is called the Elastic Service Manager (ESM) and it is used with the Elastic Processing Unit:
Serialization ModeWhen a client application accessing a remote space (using a clustered topology or non-clustered) the data is serialized and sent over the network to the relevant JVM hosting the target space partition. The serialization involves some overhead. The Serialization Type parameter allows you to control the serialization activities perform by GigaSpaces when non-primitive fields used with your space class. Native Serialization ModeThe default serialization mode (called also native) performs serialization of all non-primitive fields at the client side, and then de-serialize these at the space side before stored within the space. This mode is optimized for scenarios when there is a business logic colocated with the space (e.g. notify/polling container) or when having business logic that is sent to be executed within the space (e.g. Task Executor). The colocated business logic access non-primitive space object fields without going through any serialization. This speeds up any activity performed by the colocated business logic. The downside with this mode, is the relative overhead associated with remote client due-to the serialization/de-serialization involved with non-primitive space object fields. Light Serialization ModeWhen having space objects that embed large collections (e.g. List, Map data types) where there is no colocated business logic running with the space (e.g. polling/notify container colocated with the space), you should use the Light serialization type. When running with this mode, the collections within the space object are serialized at the client side but are not de-serialized at the space side before stored within the space; these are stored in their binary form. When reading the space object back into the client side, these collections are sent back into the client application without going through any serialization at the space side (as they are already stored in their binary serialized form), and de-serialized at the client side. Due-to this optimization, this mode speeds up write and read performance when the space object involves collections with relatively large amount of elements. To deploy your space using the light serialization mode your pu.xml should include the following: <os-core:space id="space" url="/./mySpace" > <os-core:properties> <props> <prop key="space-config.serialization-type">1</prop> </props> </os-core:properties> </os-core:space> More details about the different supported serialization modes can be found at the Controlling Serialization and the Externalizable Support sections. NO_RETURN_VALUE ModifierBy default the write operation returns a LeaseContext object with Lease object or the previous version of the object (via the LeaseContext.getObject()). To avoid this overhead you should use the NO_RETURN_VALUE modifier with the write operation. Once used, the write operation will have a null as a return value. This avoids the usual network traffic generated by sending the previous version of the object (update operation) or the Lease object (write operation) back into the client side. Use this option to improve application write operation performance both with remote and embedded space. Here is how you can use the NO_RETURN_VALUE modifier: gigaspace.write(employee, Lease.FOREVER, 0, UpdateModifiers.NO_RETURN_VALUE | UpdateModifiers.UPDATE_OR_WRITE); Another option to turn on the NO_RETURN_VALUE mode is to set the UpdateModifiers default mode once you get the space proxy: GigaSpace gigaspace = new GigaSpaceConfigurer(new UrlSpaceConfigurer("jini://*/*/mySpace")).gigaSpace(); gigaspace.getSpace().setUpdateModifiers(UpdateModifiers.NO_RETURN_VALUE | UpdateModifiers.UPDATE_OR_WRITE); ... gigaspace.write(employee); The writeMultiple call (batch write) support the NO_RETURN_VALUE as well: gigaspace.writeMultiple(employeesArray, Lease.FOREVER, UpdateModifiers.NO_RETURN_VALUE | UpdateModifiers.UPDATE_OR_WRITE );
Runtime Files LocationGigaSpaces generates some files while the system is running. You should change the location of the generated files location using the following system properties. See below how:
Log FilesGigaSpaces generates log files for each running component . This includes GSA, GSC, GSM, Lookup service and client side. By default, these are created within the <gigaspaces-xap-root>\logs folder. After some time you might end up with a large number of files that are hard to maintain and search. You should backup old log files or delete these. You can use the logging backup-policy to manage your log files. Hardware SelectionThe general rule when selecting the HW to run GigaSpaces would be: The faster the better. Multi-core machines with large amount of memory would be most cost effective since these will allow GigaSpaces to provide ultimate performance leveraging large JVM heap size handling simultaneous requests with minimal thread context switch overhead. Running production systems with 30G-50G heap size is doable with some JVM tuning when leveraging multi-core machines. The recommended HW is Intel® Xeon® Processor 5600 Series. Here is an example for recommended server configuration:
CPUSince most of the application activities are conducted in-memory, the CPU speed impacts your application performance fairly drastically. You might have a machine with plenty of CPU cores, but a slow CPU clock speed, which eventually slows down the application or the Data-Grid response time. So as a basic rule, pick the fastest CPU you can find. Since the Data-Grid itself and its container are highly multi-threaded components, it is important to use machines with more than a single core to host the GSC to run your Data-Grid or application. A good number for the amount of GSCs per machine is half of the total number of cores. DiskPrior to XAP 7.1, GigaSpaces Data-Grid did not overflow to a disk, and does not require a large disk space to operate. Still, log files are generated, and for these you need at least 100M of free disk size per machine running GSC(s). Make sure you delete old log files or move them to some backup location. XAP Data-Grid may overflow data to disk when there is a long replication disconnection or delay, the location of the work directory should be on a local storage at each node in order to make this replication back log data always available to the node, this storage should have enough space to store the replication back log as explained in Controlling the Replication Redo Log page. OS ConsiderationsIn general, GigaSpaces runs on every OS supporting the JVM technology (Windows, Linux, Solaris, AIX, HP, etc). No special OS tuning is required for most of the applications. See below for OS tuning recommendations that most of the applications running on GigaSpaces might need. File DescriptorsThe GigaSpaces LRMI layer opens network connections dynamically. With large scale applications or with clients that are running a large number of threads accessing the Data-Grid, you might end up having a large number of file descriptors used. The Linux OS by default has a relatively small number of file descriptors available (1024). You should make sure that your standalone clients, or GSM/GSC/LUS using a user account, have their maximum open file descriptors configured to a high value. A good value is 65536. Setting the max open file descriptors is done via the following call: ulimit -n 65536 |
Additional resources: XAP Application Server | XAP Data Grid | XAP for Cloud Computing | XAP J2EE Support



