Performance Tuning

Summary: Helpful recommendations for tuning GigaSpaces, boosting its performance, and improving its scalability.
Overview | Check Your Infrastructure First | Benchmark | Design Your Entries | Make proper use of indexes | Determine Cache Size | Determine Database Connection Pools | Blocking Take and Thread Consumption | Use Batch Operations | Use Transactions Cautiously | Query Optimizations | Use an Embedded Space if Possible | Use Notify Delegator to Retrieve the Entry that Triggered the Event | Use Update Instead of Take and Write | Distribute Data and User Requests among Several Spaces | Controlling Serialization | Visibility of Entries Under Transaction (Dirty Read) | Use NIO Protocol | Memory Usage Considerations | Choose Local Transactions Over Distributed Transactions | Remote GC | Using Snapshot to Reduce Object Creation and Entry Data Inspection | Max Processes and File Descriptors Limit | Tuning Java Virtual Machines

Overview

This section lists helpful recommendations for tuning GigaSpaces, boosting its performance, and improving its scalability.

Check Your Infrastructure First

No matter which kind of optimization you perform, you cannot ignore your infrastructure. Therefore, you must verify that you have the following:

  • Sufficient physical and virtual memory
  • Sufficient disk speed
  • A tuned database
  • Sufficient CPU power to handle the load
  • Network cards configured for speed
  • A JVM with a fast JIT

Before you move any system into production, we recommend that the relevant team within your organization review the following:
  • Part I: http://natishalom.typepad.com/nati_shaloms_blog/2007/08/testability-the.html
  • Part II: http://natishalom.typepad.com/nati_shaloms_blog/2007/08/testability-t-1.html

The following configuration should be reviewed:
  • Client configuration: local cache/local view settings, if these are used
  • Space class configuration: indexes, data types, payload, serialization
  • Space configuration: memory manager, delayed operations thread pool
  • Cluster configuration: replication timeouts, replication thread pool, failover, active election, notify recovery
  • GSC, GSM, LUS, and transaction manager configuration: heartbeat settings, monitoring settings
  • Communication protocol: LRMI thread pool, connection timeout
  • JVM configuration: GC, memory usage, thread usage
  • Network configuration: multicast settings
  • Log file configuration
  • Logger configuration

The following tests should be conducted:
  • Sanity test: run the GigaSpaces out-of-the-box benchmark against the topology the application is about to use.
  • System and load tests: monitor the memory, thread, and CPU usage of ALL JVMs for 24 hours to make sure there are no memory or thread leaks.
  • Failover/failback tests: kill primary spaces, and verify that the backups take over and that new primaries are instantiated.
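For the 24-hour system and load tests, one way to watch a JVM from the inside is the standard java.lang.management API. The sketch below is not a GigaSpaces tool: the class name JvmSampler, the one-minute interval, and logging to standard output are assumptions. The idea is to run an equivalent sampler (or an external monitor such as JConsole) against every JVM in the topology and look for heap usage or thread counts that keep growing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: records heap usage, live thread count and system load
// so that steady growth (a possible leak) shows up over a long test run.
class JvmSampler {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final ThreadMXBean threads = ManagementFactory.getThreadMXBean();
    private final OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();

    /** One snapshot: {used heap bytes, live threads, peak threads}. */
    long[] sample() {
        long usedHeap = memory.getHeapMemoryUsage().getUsed();
        return new long[] { usedHeap, threads.getThreadCount(), threads.getPeakThreadCount() };
    }

    /** Formats a sample as one log line; the load average is negative where unavailable. */
    String describe(long[] s) {
        return String.format("heapUsed=%dKB threads=%d peakThreads=%d load=%.2f",
                s[0] / 1024, s[1], s[2], os.getSystemLoadAverage());
    }

    public static void main(String[] args) throws InterruptedException {
        JvmSampler sampler = new JvmSampler();
        // Assumed schedule: one sample per minute for 24 hours (1440 samples).
        for (int i = 0; i < 1440; i++) {
            System.out.println(sampler.describe(sampler.sample()));
            Thread.sleep(60_000L);
        }
    }
}
```

Comparing the first and last hours of output is usually enough to spot a leak: the heap after GC and the live thread count should level off rather than climb.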

Benchmark

The Benchmark View provides a user interface for benchmarking the space.

For more details, refer to:

Design Your Entries

Pay attention to the size of your Entries: do you really need all this information in the space? The bigger your Entries, the longer it takes to move them around, store them to disk, and fetch them back. Consider replacing a heavyweight blob field with a simple string URL, and use it later for fetching on demand. Contact GigaSpaces support for an example of this pattern. If you are using user-defined classes for Entry fields, try implementing java.io.Externalizable efficiently in these classes. This reduces the amount of data transferred over the network and written to/read from the database, saving both time and space.
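The following sketch applies both ideas from this paragraph to a hypothetical Entry field class: a URL replaces the heavyweight blob, and java.io.Externalizable writes the fields with compact primitive encodings instead of default field-by-field serialization. The class and field names are illustrative only; a real Entry class would follow your own data model.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical Entry field class: stores a URL to a large document instead of
// the document bytes, and serializes its fields explicitly.
class DocumentRef implements Externalizable {
    private long id;
    private String url;      // fetched on demand; must be non-null (writeUTF rejects null)
    private int sizeBytes;

    public DocumentRef() { } // Externalizable requires a public no-arg constructor

    DocumentRef(long id, String url, int sizeBytes) {
        this.id = id;
        this.url = url;
        this.sizeBytes = sizeBytes;
    }

    long getId() { return id; }
    String getUrl() { return url; }
    int getSizeBytes() { return sizeBytes; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeLong(id);       // fixed order, compact primitive encodings
        out.writeUTF(url);
        out.writeInt(sizeBytes);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        id = in.readLong();      // must mirror the order used in writeExternal
        url = in.readUTF();
        sizeBytes = in.readInt();
    }

    /** Serializes and deserializes an instance, e.g. to check that both methods agree. */
    static DocumentRef roundTrip(DocumentRef ref) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(ref);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DocumentRef) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }
}
```

Keep writeExternal and readExternal in the same field order; adding a field later means changing both methods.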

Make proper use of indexes

GigaSpaces includes a sophisticated built-in indexing module (regardless of whether the space is persistent or not) that maintains index list data for each indexed entry class attribute. If you keep a large number of Entries of the same class in a space, consider defining one or more indexes for attributes used with template matching. Defining indexes can significantly improve the response time of read, take, and clear operations, making them up to ten to fifty times faster. Remember that indexes also affect the response time of write and update operations, so choose your indexed fields carefully: each index has an overhead. GigaSpaces offers implicit indexing and explicit indexing.
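To see why an index speeds up matching and slows down writes, consider this conceptual sketch. It is not the GigaSpaces indexing module: Trade, IndexedStore, and the symbol attribute are hypothetical, and a plain HashMap stands in for the index. Matching on an indexed attribute touches only the objects holding that value, while an unindexed lookup scans every stored object; in exchange, every write also updates the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical space object with one attribute used for matching.
class Trade {
    final String symbol;
    final int quantity;

    Trade(String symbol, int quantity) {
        this.symbol = symbol;
        this.quantity = quantity;
    }
}

// Conceptual store with an optional hash index on the symbol attribute.
class IndexedStore {
    private final List<Trade> all = new ArrayList<>();
    private final Map<String, List<Trade>> bySymbol = new HashMap<>();
    private final boolean indexSymbol;
    int comparisons; // objects examined while matching

    IndexedStore(boolean indexSymbol) {
        this.indexSymbol = indexSymbol;
    }

    void write(Trade t) {
        all.add(t);
        if (indexSymbol) {
            // Extra work on every write: the cost of maintaining the index.
            bySymbol.computeIfAbsent(t.symbol, k -> new ArrayList<>()).add(t);
        }
    }

    List<Trade> readBySymbol(String symbol) {
        if (indexSymbol) {
            // Indexed: go straight to the objects holding this value.
            List<Trade> hits = bySymbol.getOrDefault(symbol, Collections.emptyList());
            comparisons += hits.size();
            return new ArrayList<>(hits);
        }
        // Not indexed: every stored object must be examined.
        List<Trade> hits = new ArrayList<>();
        for (Trade t : all) {
            comparisons++;
            if (t.symbol.equals(symbol)) {
                hits.add(t);
            }
        }
        return hits;
    }
}
```

With 1,000 stored objects of which 100 match, the unindexed lookup examines all 1,000 while the indexed one examines only the 100 matches.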

Determine Cache Size

When using a persistent space and reusing data, you must take caching into account. The cache manager caches entries for use and performs an LRU (Least Recently Used) based cleanup on the cache. When searching for an entry, the cache is searched first. Set the cache size to the number of Entries that your environment can reasonably keep resident in virtual memory. This prevents unnecessary queries on your database. If you want the cache size to be based on the memory of the JVM running the space, you can use the memory usage options.
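The LRU policy and the "search the cache first" rule can be sketched with java.util.LinkedHashMap in access order. This is an illustration, not the GigaSpaces cache manager: maxEntries stands in for the configured cache size, and the loader function stands in for a hypothetical database query that runs only on a cache miss.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative read-through LRU cache (not the GigaSpaces cache manager).
class LruCache<K, V> {
    private final Function<K, V> loader; // hypothetical stand-in for a database query
    private final Map<K, V> map;
    int misses; // how many times the loader ("database") was called

    LruCache(int maxEntries, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    V get(K key) {
        V value = map.get(key);          // search the cache first
        if (value == null) {
            misses++;
            value = loader.apply(key);   // fall back to the database on a miss
            map.put(key, value);
        }
        return value;
    }

    int size() {
        return map.size();
    }
}
```

For example, with maxEntries = 2 the access sequence 1, 2, 1, 3, 1, 2 reaches the loader four times: entry 2 is evicted when 3 is loaded, and entry 3 when 2 is loaded again.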

Determine Database Connection Pools

When you use a persistent space and a large number of users/threads access it concurrently, each of them requires a database connection. Set enough connections in the connection pool so that users are not blocked. Calculate the number of concurrent requests the space needs to handle based on the number of users that will access the space simultaneously.
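One common way to turn expected load into a pool size is Little's law: the average number of requests in flight equals the arrival rate multiplied by the time each request spends holding a connection. The sketch below assumes you can measure both numbers for your space; the class name, method name, and headroom factor are hypothetical.

```java
// Hypothetical sizing helper based on Little's law:
// average requests in flight = arrival rate x average time holding a connection.
class PoolSizing {
    /**
     * @param requestsPerSecond peak rate of space operations that reach the database
     * @param avgDbMillis       average time one operation holds a database connection
     * @param headroom          safety factor, e.g. 1.5 for 50% spare capacity (an assumption)
     * @return suggested number of connections, rounded up
     */
    static int connectionsNeeded(double requestsPerSecond, double avgDbMillis, double headroom) {
        double inFlight = requestsPerSecond * avgDbMillis / 1000.0; // average concurrent requests
        return (int) Math.ceil(inFlight * headroom);
    }
}
```

For example, 400 requests per second that each hold a connection for 25 ms average 10 concurrent requests; with 50% headroom, connectionsNeeded(400, 25, 1.5) returns 15.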

Blocking Take and Thread Consumption

When performing blocking operations (read or take with a timeout greater than 0), we recommend setting the operation timeout to a short duration (5-30 seconds) rather than to FOREVER.

This allows the space's internal thread pool to balance the different requests without exhausting the pending operations thread pool.

The following settings in the space schema control the space communication connection thread pool and the pending operations thread pool size:

<lrmi-stub-handler> 
    <min-worker-threads>16</min-worker-threads> 
    <max-worker-threads>500</max-worker-threads> 
</lrmi-stub-handler> 
  
<engine> 
    <min_threads>4</min_threads> 
    <!--maximum threads in engine--> 
    <max_threads>64</max_threads> 
</engine>
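The short-timeout pattern can be illustrated with java.util.concurrent, since the space API itself is not shown here. In this analogy a BlockingQueue stands in for the space and poll(timeout) for a take with a timeout: each wait is bounded, and the caller re-issues the request instead of parking a single call forever. The class name, the retry count, and the 10-second constant are assumptions chosen within the 5-30 second range recommended above.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// JDK analogy only (not the space API): wait in short, bounded steps and
// re-issue the request, instead of blocking forever on a single call.
class BoundedTakeLoop {
    static final long TIMEOUT_SECONDS = 10; // assumed value within the 5-30 s range

    /** Returns the next item, or null if nothing arrived within maxAttempts timeouts. */
    static <T> T takeWithRetries(BlockingQueue<T> queue, int maxAttempts, long timeout, TimeUnit unit) {
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                T item = queue.poll(timeout, unit); // null when the timeout expires
                if (item != null) {
                    return item;
                }
                // Timeout expired: the pending request is released here, so the
                // caller could check for shutdown, log, or back off before retrying.
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return null;
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("job-1");
        System.out.println(takeWithRetries(queue, 6, TIMEOUT_SECONDS, TimeUnit.SECONDS));
    }
}
```

With six attempts of 10 seconds each, a caller waits at most one minute in total, but never holds a single request open for longer than 10 seconds.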

Use Batch Operations

Batch operations (writeMultiple, readMultiple, takeMultiple, updateMultiple) perform actions on groups of Entries in one call. Instead of paying a penalty (remote call, database access, ...) for every entry, you pay it only once. Try to design your hot spots around batch operations; this can drastically improve your application performance, making it up to ten to fifty times faster.
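The saving comes from the number of round trips, which this cost-model sketch makes visible. CountingProxy is hypothetical: its write and writeMultiple methods only mirror the names of the space operations and count one remote call per invocation. The writeInBatches helper, also an assumption, shows one way to feed a large collection to a batch operation in chunks of a fixed size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical proxy that only counts round trips; the method names mirror the
// space operations, but this is not the GigaSpaces API.
class CountingProxy {
    int remoteCalls;
    final List<Object> stored = new ArrayList<>();

    void write(Object entry) {
        remoteCalls++;                  // one round trip per entry
        stored.add(entry);
    }

    void writeMultiple(Object[] entries) {
        remoteCalls++;                  // one round trip for the whole batch
        for (Object e : entries) {
            stored.add(e);
        }
    }

    /** Sends a large collection as fixed-size batches (batchSize is an assumed tuning knob). */
    void writeInBatches(List<?> entries, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int from = 0; from < entries.size(); from += batchSize) {
            int to = Math.min(from + batchSize, entries.size());
            writeMultiple(entries.subList(from, to).toArray());
        }
    }
}
```

Writing 1,000 entries one at a time costs 1,000 round trips; writeInBatches with a batch size of 100 costs 10.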

Use Transactions Cautiously

Each transaction has an overhead. Do not use read under a transaction if you do not have a very good reason to do so. Use non-transactional read instead. This reduces database access for persistent spaces and eliminates transaction locks. If you really need to do some operations inside a transaction, use batch operations with transactions.

Query Optimizations

When your query (JDBC, or JavaSpaces with SQLQuery) combines the OR logical operation with AND operations, you can speed up query execution by adding the AND conditions to each OR condition.
For example:

select uid,* from table where (A = 'X' or A = 'Y') and (B > '2000-10-1' and B < '2003-11-1')

would be executed much faster when changing it to be:

select uid,* from table where (A = 'X' and B > '2000-10-1' and B < '2003-11-1') 
or (A = 'Y' and B > '2000-10-1' and B < '2003-11-1')
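If queries are assembled in code, this rewrite can be automated by distributing the shared AND conditions into every OR branch, since (a OR b) AND c is equivalent to (a AND c) OR (b AND c). The helper below is a hypothetical sketch that works on plain SQL text fragments; it does no parsing or escaping, so it is only suitable for conditions you build yourself.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: (a OR b) AND c  ==>  (a AND c) OR (b AND c).
// Conditions are plain SQL text fragments; no parsing or escaping is done.
class OrAndRewriter {
    static String distribute(List<String> orConditions, List<String> andConditions) {
        String common = String.join(" and ", andConditions);
        StringJoiner where = new StringJoiner(" or ");
        for (String alternative : orConditions) {
            // Each OR branch carries its own copy of the shared AND conditions.
            where.add(common.isEmpty()
                    ? "(" + alternative + ")"
                    : "(" + alternative + " and " + common + ")");
        }
        return where.toString();
    }
}
```

Called with the conditions from the example, distribute(List.of("A = 'X'", "A = 'Y'"), List.of("B > '2000-10-1'", "B < '2003-11-1'")) produces exactly the faster where clause shown above.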

Use an Embedded Space if Possible

If you access the space from a single JVM, or access it a large number of times from one JVM, use the embedded space mode. This eliminates the overhead of remote calls to the space. The slower your network is compared to other resources (for example, a disk), the more noticeable the improvement.

Use Notify Delegator to Retrieve the Entry that Triggered the Event

If you want to be notified of certain events and receive the entry that caused a specific event, a regular notify followed by a take is more complex and involves at least one extra call to the space. Using Notify Delegator is easy and efficient: the call to get the entry that caused the event is a local call on the Notify Delegator object in the client JVM.

Use Update Instead of Take and Write

If you need to modify entry data, use update instead of take and write. This saves you an extra call to the space.
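The difference in call count can be seen with a hypothetical store that counts invocations. CallCountingStore is not the space API: its methods wrap a ConcurrentHashMap, with take and write mapped to remove and put, and update mapped to the single atomic replace call.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical store that counts calls; not the space API. take/write map to
// remove/put, and update maps to ConcurrentHashMap.replace.
class CallCountingStore<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    int calls;

    void write(K key, V value) { calls++; data.put(key, value); }

    V read(K key) { calls++; return data.get(key); }

    V take(K key) { calls++; return data.remove(key); }

    /** Changes an existing value in one atomic call; returns false if the key is absent. */
    boolean update(K key, V value) { calls++; return data.replace(key, value) != null; }
}
```

After the initial write, modifying one value through take and write costs two calls, and update costs one. In this sketch the take-and-write path also leaves a moment in which the key is absent, which the single replace call avoids.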

Distribute Data and User Requests among Several Spaces

A single machine is always limited in the amount of data and user requests it can handle.

You can use several spaces in a cluster to distribute the load, and use partial replication to partition the data.

See High Availability Combined with Scalability for a relevant example.
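Partitioning spreads data by routing each entry to a single partition based on a key. The sketch below shows a common hash-based scheme under assumed names (PartitionRouter, routingKey); GigaSpaces' own routing rules may differ, so treat it as an illustration of the concept rather than of the product's algorithm.

```java
// Hypothetical hash-based router: each routing key maps to exactly one partition.
class PartitionRouter {
    private final int partitions;

    PartitionRouter(int partitions) {
        if (partitions <= 0) {
            throw new IllegalArgumentException("partitions must be positive");
        }
        this.partitions = partitions;
    }

    /** Maps a routing key to a partition index in [0, partitions). */
    int partitionFor(Object routingKey) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingKey.hashCode(), partitions);
    }
}
```

With four partitions, the Integer keys 0 to 999 land evenly, 250 per partition. Math.floorMod keeps negative hash codes in range, which the % operator alone would not.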

Controlling Serialization

You can control the serialization mode of Entry attributes when they are written to or read from the space.

For more details, refer to: Controlling Serialization.

Visibility of Entries Under Transaction (Dirty Read)

The JavaSpaces specification defines the visibility of Entries for read operations as follows: a read operation performed under a null transaction can only access entries that are not write-locked by non-null transactions. In other words, entries that were written or taken by active transactions (transactions that have not been committed or rolled back) are not visible to the user performing a read operation.

Sometimes it is desirable for non-transactional read operations to have full visibility of the Entries in the space. Once set, the dirty read property gives read/readIfExists operations performed under a null transaction with the JavaSpace.NO_WAIT timeout this complete visibility.

In order to set the dirty read property, set the <dirty_read> tag to true in the configuration file.

Use NIO Protocol

If your application runs multiple threads that access the space, you should set the LRMI-stub-handler protocol-name to NIO instead of RMI. For more details, refer to the Setting Communication Protocol section.

Memory Usage Considerations

Here are several guidelines to reduce the client and space server memory footprint:

  • ClientUIDHandler and the Jini UID factory produce lots of String/char[] objects on the client side. Use them very carefully.
  • In order to reduce memory consumption, you can store multiple long/integer Entry attribute values as part of a long/integer array. If you have lots of entries this will improve the server footprint.
  • Use indexes only for attributes used for matching. Make sure your space uses the value -1 for the space implicit indexing property. This ensures that indexes are created only on request (explicit indexing, i.e. __getSpaceIndexedFields()).
  • Instead of extending MetaDataEntry, you can use the following methods to set/get the entry UID. This reduces the number of EntryInfo objects created:
    public void __setEntryUID(String inUid)
    public String __getEntryUID()
  • Make sure the statistics filter is turned off.
  • Make sure all space workers are turned off.
  • If you do not need Lease objects use the NOWriteLease=true as part of the URL.
  • Consider using object pools.
  • Sun JDK 1.5 seems to have better memory management than 1.4.2.
  • Encapsulate all non-indexed fields in an inner custom class, and in that class use primitive types (int, long) instead of wrapper classes (Integer, Long, ...).
  • Replace public String Entry fields with a custom implementation that supports only the basic ASCII subset (backed by a byte array).
  • Replace String fields that have a small number of possible values (a source, for instance) with an enum.
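The last three suggestions can be sketched as follows. All names are hypothetical: TradeDetails groups non-indexed attributes using primitive types, AsciiString keeps ASCII-only text in a byte array, and Source replaces a String field that has a small set of values. The saving from AsciiString applies mainly to JVMs before Java 9, where String is backed by a char[] with two bytes per character; later JVMs store Latin-1 strings compactly on their own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical holder for non-indexed attributes: primitives instead of
// wrappers avoid a separate wrapper object per field.
final class TradeDetails {
    int quantity;    // instead of Integer
    long timestamp;  // instead of Long
    double price;    // instead of Double
}

// Hypothetical enum replacing a String field with a small set of possible values.
enum Source { FEED_A, FEED_B, MANUAL }

// Hypothetical ASCII-only string backed by a byte[] (one byte per character).
final class AsciiString {
    private final byte[] bytes;

    AsciiString(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 127) {
                throw new IllegalArgumentException("non-ASCII character at index " + i);
            }
        }
        this.bytes = s.getBytes(StandardCharsets.US_ASCII);
    }

    int length() {
        return bytes.length;
    }

    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof AsciiString && Arrays.equals(bytes, ((AsciiString) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}
```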

Choose Local Transactions Over Distributed Transactions

Distributed transactions perform a lengthy two-phase commit (2PC) process. Local transactions, in contrast, perform a one-phase commit (1PC). If you are performing operations under a transaction on a single space, use local transactions. The slower your network is compared to other resources (for example, a disk), the more noticeable the improvement.

For more details on transaction support, refer to the JavaSpaces Transaction Support section.

Remote GC

The sun.rmi.dgc.client.gcInterval and sun.rmi.dgc.server.gcInterval properties are set by default to 60000 milliseconds (60 seconds).

In some cases this might cause the JVM process to slow down every 60 seconds. To reduce the performance impact of redundant GC cycles, increase the interval to one hour (3600000 milliseconds) for both the space JVM and the client JVM.

When starting the space in embedded mode, or running it in remote mode using the gsInstance or gsc commands, make sure you set the following system properties:
  • -Dsun.rmi.dgc.client.gcInterval=3600000
  • -Dsun.rmi.dgc.server.gcInterval=3600000
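Because these properties are easy to forget on one of several JVMs, a startup check can report them. This is a hypothetical helper that only reads System.getProperty; it does not change any setting, and the class name and warning text are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical startup check: reports RMI distributed GC intervals that are
// missing or shorter than the recommended one hour. It only reads the values
// this JVM sees; it does not change them.
class DgcIntervalCheck {
    static final long RECOMMENDED_MILLIS = 3_600_000L;
    static final String[] PROPERTIES = {
        "sun.rmi.dgc.client.gcInterval",
        "sun.rmi.dgc.server.gcInterval"
    };

    static List<String> misconfigured() {
        List<String> bad = new ArrayList<>();
        for (String name : PROPERTIES) {
            String value = System.getProperty(name);
            if (value == null) {
                bad.add(name);               // not set on the command line
                continue;
            }
            try {
                if (Long.parseLong(value.trim()) < RECOMMENDED_MILLIS) {
                    bad.add(name);           // shorter than recommended
                }
            } catch (NumberFormatException e) {
                bad.add(name);               // not a number
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        for (String name : misconfigured()) {
            System.err.println("WARNING: consider starting this JVM with -D" + name + "=" + RECOMMENDED_MILLIS);
        }
    }
}
```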

Using Snapshot to Reduce Object Creation and Entry Data Inspection

When you use the same template for matching repeatedly, consider using a snapshot template. A snapshot template is the object returned by the JavaSpace.snapshot call. It contains the GigaSpaces internal representation of the template object, which does not need to undergo any inspection before it is sent to the GigaSpaces server.

The snapshot returns an object you can use for subsequent matching as a template.
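The benefit of a snapshot comes from inspecting a template once and reusing the result. The sketch below illustrates that idea with reflection; it is not how JavaSpace.snapshot is implemented. PreparedTemplate and Order are hypothetical, public fields follow the Jini Entry convention, null template fields act as wildcards as in JavaSpaces matching, and equals() is a simplification of the real matching rules.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical entry class with public fields, in the style of a Jini Entry.
class Order {
    public String symbol;
    public Integer quantity;

    Order(String symbol, Integer quantity) {
        this.symbol = symbol;
        this.quantity = quantity;
    }
}

// Illustration of the idea behind snapshot, not its implementation: inspect the
// template once, keep the extracted (field, value) pairs, and reuse them.
final class PreparedTemplate {
    static int inspections; // templates inspected via reflection so far

    private final Class<?> type;
    private final List<Field> fields = new ArrayList<>();
    private final List<Object> values = new ArrayList<>();

    PreparedTemplate(Object template) {
        inspections++;
        type = template.getClass();
        try {
            for (Field f : type.getFields()) {
                if (Modifier.isStatic(f.getModifiers())) {
                    continue;
                }
                Object v = f.get(template);
                if (v != null) {             // null fields act as wildcards
                    fields.add(f);
                    values.add(v);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Matches a candidate using only the pairs extracted in the constructor. */
    boolean matches(Object candidate) {
        if (!type.isInstance(candidate)) {
            return false;
        }
        try {
            for (int i = 0; i < fields.size(); i++) {
                if (!Objects.equals(values.get(i), fields.get(i).get(candidate))) {
                    return false;
                }
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Matching 100 candidates against one PreparedTemplate inspects the template once, not 100 times.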

Max Processes and File Descriptors Limit

Linux limits both the maximum number of processes per user and the number of file descriptors allowed (which relates to processes, files, sockets, and threads). The process limit controls how many processes a user on the server may have.
To improve performance and stability, set the process limit for the super-user root to at least 8192; 32k or even unlimited is also adequate:

ulimit -u unlimited
When setting the file descriptor limit, make sure you use the -n option, e.g. ulimit -n 8192 rather than ulimit 8192. With no option, ulimit defaults to ulimit -f, which sets the maximum file size (in blocks) instead; a file size limit set this way by mistake might cause a fatal process crash.

How do I configure the file descriptors?

On Solaris, the file descriptor hard limit should be set to 8192 in /etc/system, and the file descriptor soft limit should be increased from 1024 to 8192, as shown below:

set rlim_fd_max=8192
set rlim_fd_cur=8192

Edit /etc/system as root and reboot the server. After the reboot, run the following from the application account:
ulimit -n
It should report 8192.

On Linux, to change the default value, modify the /etc/security/limits.conf file.

Increase the ulimit values when many concurrent users access the space.

Tuning Java Virtual Machines

GigaSpaces, being a Java process, requires a Java virtual machine (JVM) to run. A JVM provides the runtime execution environment for Java-based applications, and GigaSpaces can run on JVMs from different providers. As part of configuring GigaSpaces, you can fine-tune JVM settings to improve how the system uses the JVM. When GigaSpaces starts, it writes information about the JVM, including the JVM provider, to the log file and to the standard output.

Even though JVM tuning is dependent on the JVM provider, general tuning concepts apply to all JVMs.

For more details, refer to: Tuning Java Virtual Machines.

