Memory Management Facilities

Overview

The Memory Management facility helps the client avoid situations in which a space server runs into an Out Of Memory failure. Based on the configured cache policy, the memory manager protects the space (and the application, when running in collocated mode) from consuming memory beyond the defined threshold.

The client application is expected to include business logic that handles a com.j_spaces.core.MemoryShortageException or org.openspaces.core.SpaceMemoryShortageException thrown by the GigaSpaces memory manager. Without such logic, the space server or a client local cache may eventually exhaust all the memory available to its parent process.
Most of the considerations described in this topic are also relevant for a client application running a Local Cache, which uses the LRU cache policy by default.
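
The kind of business logic mentioned above could, for example, retry the operation after a short pause and give up after a bounded number of attempts. The following is only a minimal sketch of that idea, not a GigaSpaces recommendation: StandInMemoryShortageException is a hypothetical stand-in for org.openspaces.core.SpaceMemoryShortageException so that the code compiles without the GigaSpaces jars, and ShortageRetry, callWithBackoff and its parameters are illustrative names.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for org.openspaces.core.SpaceMemoryShortageException. */
class StandInMemoryShortageException extends RuntimeException {
    StandInMemoryShortageException(String message) { super(message); }
}

final class ShortageRetry {
    private ShortageRetry() {}

    /**
     * Runs the operation and retries it when a memory shortage is reported.
     * The pause grows linearly with the attempt number; the last shortage
     * exception is rethrown once maxAttempts is exhausted.
     */
    static <T> T callWithBackoff(Supplier<T> operation, int maxAttempts, long basePauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        StandInMemoryShortageException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (StandInMemoryShortageException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(basePauseMillis * attempt);
                    } catch (InterruptedException interrupted) {
                        Thread.currentThread().interrupt(); // preserve the interrupt flag
                        throw e;                            // stop retrying when interrupted
                    }
                }
            }
        }
        throw last; // only reached after maxAttempts failed attempts
    }
}
```

In a real client the supplier would wrap the space operation (for example a write), and the application would decide what to do when the final attempt still fails, such as rejecting the request or alerting an operator.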

The space's memory can be managed with the following facilities:

  • Using eviction policies: you can set the policy to ALL IN CACHE or LRU (Least Recently Used).
  • Using the Memory Manager: the memory manager provides options for controlling the space memory utilization and allows you to define thresholds for situations where memory becomes over-utilized.
    The space includes a dedicated thread that is responsible for clearing expired objects - the lease manager. For more details, refer to the Lease Manager section.

Cache Eviction Policies

The space supports two cache eviction policies (0 - LRU POLICY, 1 - ALL IN CACHE ) defined via the following property:

space-config.engine.cache_policy
  • ALL IN CACHE (1) - the space uses only the available physical memory. When running in persistent space mode with an External Data Source defined, the space data is backed by the underlying database, but the overall capacity of the space does not exceed the capacity of the available physical memory.
    When using ALL IN CACHE, the cache size parameter is ignored.
  • Least Recently Used (0) - the space evicts the "oldest" objects from its memory. "Oldest" objects are determined by the time they were written, updated or read in the space. In persistent space mode, evicting a space object means that the object is removed from the space memory but is still available through the underlying RDBMS. The space reloads the object into memory only if it is requested by a specific read operation.

The space memory manager uses a dedicated thread called the Evictor. This thread handles the eviction of objects and identifies memory shortage events. In general, eviction can be based on:

  • Maximum amount of space objects - evicts objects one by one, not in batches. This is a very moderate mechanism, and it is turned on by default when running in LRU mode.
  • Available memory - eviction is done in batches.

Evicting an object from the space requires the space engine to lock the LRU chain during the object removal and to update the relevant indexes. This means that eviction based on available memory, which is done in batches, might impact the space's responsiveness to client requests. Still, you might need to use it when you cannot estimate the amount of objects within the space.

Defining the Cache Size

When a persistent space (one using an External Data Source or the JDBC Storage Adapter) running in LRU cache policy mode is started or deployed, it loads data from the underlying durable data source before becoming available to clients. The default behavior is to load data up to 50% of the space-config.engine.cache_size value.

When space-config.engine.memory_usage.enabled is true (evicting data from the space based on free heap size), it is recommended to set a large value for the space-config.engine.cache_size property. This instructs the space engine to ignore the amount of space objects when launching the eviction mechanism, so that eviction is based only on free heap memory.

The combination of the above (a large space-config.engine.cache_size and a space restart) may lead to out-of-memory problems. To avoid this, configure space-config.engine.initial_load with a low value (5 below means 5% of the space-config.engine.cache_size; the default is 50%):

space-config.engine.initial_load=5
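
To see why a large cache size combined with the default initial load can be a problem, it helps to work out how many objects the space may try to load at startup. The sketch below is only illustrative arithmetic based on the description above (initial load as a percentage of the cache size, counted in objects); InitialLoadEstimate and objectsLoadedAtStartup are hypothetical names, not part of the product.

```java
final class InitialLoadEstimate {
    private InitialLoadEstimate() {}

    /**
     * Upper bound on the number of objects loaded at startup:
     * initialLoadPercent percent of cacheSize.
     */
    static long objectsLoadedAtStartup(long cacheSize, int initialLoadPercent) {
        if (cacheSize < 0 || initialLoadPercent < 0 || initialLoadPercent > 100) {
            throw new IllegalArgumentException("cacheSize must be >= 0 and percent within 0..100");
        }
        return cacheSize * initialLoadPercent / 100;
    }
}
```

With a cache size of 5,000,000, the default of 50% allows up to 2,500,000 objects to be loaded before the space becomes available, while an initial load of 5 caps this at 250,000.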

The space-config.engine.initial_load_class property can be used to specify the class(es) whose data should be loaded.

Monitoring the Memory Manager's Activity

You can monitor the memory manager's activity by setting the com.gigaspaces.core.memorymanager logging entry to ALL.
It displays log entries when evicting objects (at the start, during, and on completion of the eviction cycle) and when waiting for incoming activity.
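
The logging level is normally set in the logging configuration file. For a quick experiment it could also be raised programmatically; the sketch below assumes that this logger is a standard java.util.logging logger, which this page does not state explicitly. MemoryManagerLogging and enableAll are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class MemoryManagerLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger may be lost.
    private static final Logger MEMORY_MANAGER_LOGGER =
            Logger.getLogger("com.gigaspaces.core.memorymanager");

    private MemoryManagerLogging() {}

    /** Sets the memory manager logger to ALL and returns it. */
    static Logger enableAll() {
        MEMORY_MANAGER_LOGGER.setLevel(Level.ALL);
        return MEMORY_MANAGER_LOGGER;
    }
}
```

Note that with java.util.logging the attached handlers have their own levels, so the fine-grained entries only appear if the handler level also allows them.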

How LRU Eviction Works

The LRU eviction has 2 eviction strategies:
1. Based on the maximum amount of objects within the space - provides very deterministic behavior in terms of garbage collection, memory use and space responsiveness. With a reasonable client request rate, this provides very constant behavior without client hiccups when memory is reclaimed by the JVM. This strategy runs by default with the LRU cache policy. To turn it off, set a very large number for the cache size property.

This strategy checks the amount of space objects and evicts the relevant object. One object is evicted when the maximum amount of objects is reached. The eviction routine is called when:

  • A new object is written into the space
  • A transaction is committed or rolled back

2. Based on the amount of available memory in the JVM hosting the space - when using this strategy, you should perform some tuning to get deterministic behavior. This strategy is turned on when the space-config.engine.memory_usage.enabled value is true. It is very complex to use when multiple spaces run within the same JVM.
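
The first strategy (one object evicted whenever the maximum count is exceeded) behaves much like a count-bounded LRU map. The sketch below illustrates the idea with java.util.LinkedHashMap in access order; it is an analogy only and is not how the space engine is implemented. CountBoundedLru and maxObjects are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Count-bounded LRU map: evicts the least recently used entry, one at a time. */
final class CountBoundedLru<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxObjects;

    CountBoundedLru(int maxObjects) {
        super(16, 0.75f, true); // accessOrder = true: reads also refresh recency
        this.maxObjects = maxObjects;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; removes exactly one entry when over the limit.
        return size() > maxObjects;
    }
}
```

Because only a single entry is released per write, the garbage collector has little work to do at any one time, which matches the moderate behavior described for this strategy.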

The Eviction Flow

The LRU eviction based on the amount of available memory performs the following:

  • Check used memory. If it has not breached the space-config.engine.memory_usage.high_watermark_percentage, exit. Otherwise, start the eviction cycle:

    Start eviction loop

    1. Evict a batch - this releases objects from the space.
    2. Objects evicted? If no - exit eviction loop.
    3. Wait for the JVM garbage collection to reclaim the released memory. The more objects are evicted in one batch, the longer it takes to reclaim the memory. This wait time is configured using the space-config.engine.memory_usage.retry_yield_time parameter. This step makes sure the eviction cycle does not evict too many objects. That problem occurs when the Check used memory phase runs before the memory of the evicted objects has been reclaimed, causing the JVM to report a misleading used-memory figure.
    4. Check used memory. See below the exact calculation performed.
    5. If the amount of memory used has dropped below the low watermark percentage, exit eviction loop.
    6. Increase the eviction counter by one.
    7. If the eviction counter value is larger than space-config.engine.memory_usage.retry_count, throw MemoryShortageException
      End eviction loop
  • If the amount of used memory is above the space-config.engine.memory_usage.high_watermark_percentage (for a non-write operation) or space-config.engine.memory_usage.write_only_block_percentage (for a write operation) - throw MemoryShortageException.
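
The flow above can be sketched in code as follows. This is a simplified model written from the description on this page, not the actual memory manager: SketchMemoryShortageException is a hypothetical stand-in for com.j_spaces.core.MemoryShortageException, the two suppliers stand in for the memory check and the batch eviction, and step 5 is interpreted as "used memory is at or below the low watermark". The caller passes the high watermark as blockThreshold for non-write operations, or the write_only_block_percentage for write operations.

```java
import java.util.function.DoubleSupplier;
import java.util.function.IntSupplier;

/** Hypothetical stand-in for com.j_spaces.core.MemoryShortageException. */
class SketchMemoryShortageException extends RuntimeException {
    SketchMemoryShortageException(String message) { super(message); }
}

final class EvictionCycleSketch {
    private EvictionCycleSketch() {}

    /** Returns the last measured used-memory percentage, or throws on shortage. */
    static double run(DoubleSupplier usedMemoryPercent, IntSupplier evictBatch,
                      double highWatermark, double lowWatermark, double blockThreshold,
                      int retryCount, long retryYieldMillis) {
        double used = usedMemoryPercent.getAsDouble();
        if (used <= highWatermark) {
            return used;                                // high watermark not breached
        }
        int evictionCounter = 0;
        while (true) {
            int evicted = evictBatch.getAsInt();        // 1. evict a batch
            if (evicted == 0) break;                    // 2. nothing left to evict
            pause(retryYieldMillis);                    // 3. let the GC reclaim memory
            used = usedMemoryPercent.getAsDouble();     // 4. check used memory again
            if (used <= lowWatermark) break;            // 5. low watermark reached
            evictionCounter++;                          // 6. count this retry
            if (evictionCounter > retryCount) {         // 7. too many retries
                throw new SketchMemoryShortageException("retry count exceeded at " + used + "%");
            }
        }
        if (used > blockThreshold) {                    // final check after the loop
            throw new SketchMemoryShortageException("still at " + used + "% after eviction");
        }
        return used;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt flag and continue
        }
    }
}
```

The sketch makes the role of retry_yield_time visible: if the pause in step 3 is too short for the JVM to reclaim memory, step 4 keeps reporting the same usage and the loop evicts batch after batch until the retry count is exhausted.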

The used memory rate is calculated via:

Used_memory_rate = ((Runtime.totalMemory() - Runtime.freeMemory()) * 100.0) / Runtime.maxMemory()
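
The calculation expresses the used heap (total minus free) as a percentage of the maximum heap. A minimal sketch using the standard java.lang.Runtime methods is shown below; UsedMemoryRate, percent and current are hypothetical names chosen for illustration.

```java
final class UsedMemoryRate {
    private UsedMemoryRate() {}

    /** Used heap as a percentage of the maximum heap. */
    static double percent(long totalMemory, long freeMemory, long maxMemory) {
        return (totalMemory - freeMemory) * 100.0 / maxMemory;
    }

    /** The same calculation for the JVM running this code. */
    static double current() {
        Runtime runtime = Runtime.getRuntime();
        return percent(runtime.totalMemory(), runtime.freeMemory(), runtime.maxMemory());
    }
}
```

Since freeMemory() only changes after the garbage collector has actually run, the value can stay high right after an eviction, which is why the eviction flow waits before measuring again.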

The Memory Manager

The space-config.engine.memory_usage properties provide options for controlling the space memory utilization and allow you to evict objects from the space. Objects are evicted when the number of cached objects reaches its maximum size or the memory usage reaches its limit.
These are the default parameters for memory usage. The percentage values should be in the following order:

high_watermark_percentage >= write_only_block_percentage >= write_only_check_percentage >= low_watermark_percentage

space-config.engine.cache_policy=0
space-config.engine.cache_size=5000000
space-config.engine.memory_usage.enabled=true
space-config.engine.memory_usage.high_watermark_percentage=95
space-config.engine.memory_usage.write_only_block_percentage=85
space-config.engine.memory_usage.write_only_check_percentage=76
space-config.engine.memory_usage.low_watermark_percentage=75
space-config.engine.memory_usage.eviction_batch_size=500
space-config.engine.memory_usage.retry_count=5
space-config.engine.memory_usage.explicit-gc=false
space-config.engine.memory_usage.retry_yield_time=2000
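
Since the four percentage values must respect the ordering shown above, a custom configuration can be sanity-checked before deployment. The following is a small illustrative check over a java.util.Properties object that uses the property names from this page; WatermarkOrder and its methods are hypothetical helpers, not part of the product.

```java
import java.util.Properties;

final class WatermarkOrder {
    private static final String PREFIX = "space-config.engine.memory_usage.";

    private WatermarkOrder() {}

    /** True when high >= write_only_block >= write_only_check >= low, within 0..100. */
    static boolean isValid(int high, int writeOnlyBlock, int writeOnlyCheck, int low) {
        return high <= 100 && low >= 0
                && high >= writeOnlyBlock
                && writeOnlyBlock >= writeOnlyCheck
                && writeOnlyCheck >= low;
    }

    /** Reads the four percentages from the given properties and checks their order. */
    static boolean isValid(Properties props) {
        return isValid(read(props, "high_watermark_percentage"),
                       read(props, "write_only_block_percentage"),
                       read(props, "write_only_check_percentage"),
                       read(props, "low_watermark_percentage"));
    }

    private static int read(Properties props, String name) {
        String value = props.getProperty(PREFIX + name);
        if (value == null) {
            throw new IllegalArgumentException("missing property " + PREFIX + name);
        }
        return Integer.parseInt(value.trim());
    }
}
```

For the values listed above (95, 85, 76, 75) the check passes; swapping the high watermark and the write-only block percentage would make it fail.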

The space-config.engine.memory_usage.enabled default value is true in GigaSpaces version 6.5 and onwards.

SpaceMemoryShortageException

The org.openspaces.core.SpaceMemoryShortageException (it wraps com.j_spaces.core.MemoryShortageException) is thrown when:

  • For non write-type operations - There are no more space objects to evict and the used amount of memory is above the space-config.engine.memory_usage.high_watermark_percentage.
  • For a write-type operation - There are no more space objects to evict and the used amount of memory is between space-config.engine.memory_usage.write_only_block_percentage and high_watermark_percentage.

The org.openspaces.core.SpaceMemoryShortageException or com.j_spaces.core.MemoryShortageException includes information about:

  • Space host name
  • Space container name
  • Space name
  • Total available memory
  • Total used memory

Here is an example for the org.openspaces.core.SpaceMemoryShortageException message:

org.openspaces.core.SpaceMemoryShortageException at: host: MachineHostName, container: mySpace_container1_1, space mySpace, 
total memory: 1820 mb, used memory: 1283 mb

If a client is running a local cache, and the local cache cannot evict its data fast enough or there is otherwise no memory available for the local cache to function, the following is thrown:

org.openspaces.core.SpaceMemoryShortageException: Memory shortage at: host: MachineHostName, 
container: mySpace_container_container1, space mySpace_container_DCache, total memory: 1527 mb, 
used memory: 1497 mb
Note the _DCache suffix in the space name: it indicates that the exception was thrown from the client local cache. In such a case, you should increase the space-config.engine.memory_usage.retry_count to a larger number. See more details at the Moving into Production Checklist page.

space-config.engine.memory_usage.explicit-gc

The memory manager has a very delicate feature called explicit-gc. When enabled, the space performs an explicit GC call before checking how much memory is used. This blocks clients from accessing the space during the GC activity, which can cause a domino effect resulting in unneeded failover or a total client hang. The problem is more severe in a clustered environment, where both the primary and the backup space JVMs call GC explicitly at the same time, holding back the primary from both serving the client and sending operations to the backup.

With a small value for space-config.engine.memory_usage.retry_yield_time, or when space-config.engine.memory_usage.explicit-gc is turned off (set to false), the space might evict most of its data once the space-config.engine.memory_usage.write_only_block_percentage or the space-config.engine.memory_usage.high_watermark_percentage is breached.

This happens because the JVM hosting the space might not perform garbage collection immediately between eviction cycles, so the measured memory usage remains unchanged and another eviction cycle is triggered.

When using the space-config.engine.memory_usage.explicit-gc option:

  • Make sure -XX:+DisableExplicitGC isn't set.
  • Adding -XX:+ExplicitGCInvokesConcurrent might help to reduce the impact of the System.gc() calls.
  • System.gc() is called before calculating available memory.

Calculating available memory is performed when the following operations are called:

  • abort
  • changeReplicationState
  • clear
  • commit
  • count
  • getReplicationStatus
  • getRuntimeInfo
  • getSpacePump
  • getTemplatesInfo
  • joinReplicationGroup
  • leaveReplicationGroup
  • notify
  • prepare
  • prepareAndCommit
  • read
  • readMultiple
  • replace
  • spaceCopy
  • update
  • updateMultiple
  • write

Garbage Collection Behavior and Space Response Time Tango

In general, when the JVM Garbage Collection (GC) runs, there is a chance that clients accessing the space will be affected.
If the JVM is not using incremental GC mode (i.e. regular behavior), the GC shows the familiar saw-tooth behavior: rapid reclaim of the memory held by recently evicted/referenced objects. This means quick garbage collection, with potential delays on the client side, or a phantom OOME if memory has not been freed fast enough.

See below regular GC behavior when eviction is going on (based on available memory) and new objects are written into the space:

With incremental GC, activity is more moderate: garbage collection is ongoing, without the risk of missing a garbage collection and getting an OOME. See below the behavior when eviction is going on (based on available memory) and new objects are written into the space:

When the LRU eviction is based on the maximum amount of objects, the memory utilization graph looks like this - very small amplitude.

This behavior is achieved because the memory manager evicts objects from the space one by one and not in batches, so the amount of work the JVM garbage collector needs to perform is relatively small. It also does not affect the clients communicating with the space and provides very deterministic response time, i.e. a very small chance of client hiccups.

If you can estimate the amount of objects the space will hold and use eviction based on the maximum amount of objects within the space, you can eliminate hiccups and get very deterministic and constant response time.

Memory Manager Activity when initializing the space

In this phase of the space life cycle, the space checks the amount of available memory. This is relevant when the space performs a warm start, such as ExternalDataSource.initialLoad(), or for a persistent space using the SA with an RDBMS or an embedded H2 database.

Memory Manager and Transient Objects

Transient objects are specified using the @SpaceClass(persist=false) decoration. You may specify the transient decoration at the class level or at the object level (field/method level decoration).
When using transient objects, note that they are:

  • Included in the free heap size calculation.
  • Included in the count of total objects (for max cache size).
  • Not evicted when running in LRU cache policy mode.
You may use the transient object option to prevent the space from evicting objects that you do not want removed from the space when running in LRU cache policy mode.

How Can I Get Deterministic Behavior When Evicting Objects?

To get deterministic behavior from the memory manager when evicting objects based on the amount of free memory, such that it:

  • Does not evict too many objects
  • Does not consume too much time when reclaiming the memory of released objects
  • Has minimum impact on client response time

You should:

  • Have a small eviction batch size - a very good rule of thumb is: the amount of new objects added to each space partition per second * 2. For example, if the clients add 1000 new objects to the space per second and there are 2 partitions, the batch size should be 1000.
  • Allow a sensible time for the GC to reclaim the evicted objects - a very good rule of thumb is 2 seconds per 1000 objects of 5K size. Needless to say, the CPU speed has an effect here. This recommendation is good for a 2 GHz Intel CPU.
  • Limit the amount of objects within the space using the space-config.engine.cache_size parameter - this makes sure the space does not miss a garbage collection. Use a reasonable number here as a protection mechanism.
  • Have a small amplitude between the high and low watermark percentages - remember that with a 2G heap size, every 1% means 20M of memory. Reclaiming such an amount of memory takes 1-2 seconds.
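
The two numeric rules of thumb above can be written down as simple arithmetic. The sketch below assumes that the batch size rule applies to the write rate of a single partition, which is inferred from the example (1000 objects per second over 2 partitions gives a batch size of 1000); EvictionTuningRules and its methods are hypothetical names.

```java
final class EvictionTuningRules {
    private EvictionTuningRules() {}

    /** Batch size = new objects per second arriving at one partition, times 2. */
    static long evictionBatchSize(long newObjectsPerSecond, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        return 2 * newObjectsPerSecond / partitions;
    }

    /** Whole megabytes of heap represented by one watermark percentage point. */
    static long megabytesPerPercent(long heapMegabytes) {
        return heapMegabytes / 100;
    }
}
```

For a 2G heap (2048 MB) each percentage point is about 20 MB, so a 10-point gap between the high and low watermarks means roughly 200 MB has to be reclaimed in one eviction cycle.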

Here are good settings for a JVM with a 2G heap size and 5K object size. With the following settings, eviction starts once the JVM consumes more than 1.4G:

space-config.engine.cache_policy=0
space-config.engine.cache_size=200000
space-config.engine.memory_usage.enabled=true
space-config.engine.memory_usage.high_watermark_percentage=70
space-config.engine.memory_usage.write_only_block_percentage=68
space-config.engine.memory_usage.write_only_check_percentage=65
space-config.engine.memory_usage.low_watermark_percentage=60
space-config.engine.memory_usage.eviction_batch_size=2000
space-config.engine.memory_usage.retry_count=100
space-config.engine.memory_usage.explicit-gc=false
space-config.engine.memory_usage.retry_yield_time=4000

Here are the Java arguments (using incremental GC) to use for the JVM running the Space/GSC:

-Xmx2g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:ParallelGCThreads=8 -XX:+UseParNewGC 
-XX:+CMSIncrementalPacing -XX:MaxGCPauseMillis=1000

When the space holds a small amount of objects (less than 50,000) of relatively large size (100K and above) and you are running with the LRU cache policy, you should:

  • Have a small value for the space-config.engine.memory_usage.eviction_batch_size. A value of 10 is a good number.
  • Have a relatively large value for the space-config.engine.memory_usage.retry_yield_time. A value of 200 (ms) is a good number.

Memory Manager Parameters

Property | Description | Default value
--- | --- | ---
space-config.engine.cache_size | Defines the maximum size of the space cache. This is the total amount of space objects across all space class instances within a single space. This parameter is ignored when running in ALL_IN_CACHE cache policy. | 100000
space-config.engine.memory_usage.high_watermark_percentage | Specifies a maximum threshold for memory use. If the space container's memory usage exceeds this threshold, a com.j_spaces.core.MemoryShortageException is thrown. | 95
space-config.engine.memory_usage.low_watermark_percentage | Specifies the recommended lower threshold for the JVM heap size that should be occupied by the space container. When the system reaches the high_watermark_percentage, it evicts objects on an LRU basis and attempts to reach this low_watermark_percentage. This process continues until there are no more objects to be evicted, or memory use reaches the low_watermark_percentage. | 75
space-config.engine.memory_usage.eviction_batch_size | Specifies the amount of objects to evict each time. This option is relevant only in LRU cache management policy. | 500
space-config.engine.memory_usage.write_only_block_percentage | Specifies a lower threshold for blocking write-type operations. Above this level only read/take operations are allowed. | 85
space-config.engine.memory_usage.write_only_check_percentage | Specifies an upper threshold for checking only write-type operations. Above this level all operations are checked. | 76
space-config.engine.memory_usage.retry_count | Number of retries to lower the memory level below the low_watermark_percentage. If after all retries the memory level is still above space-config.engine.memory_usage.write_only_block_percentage, a com.j_spaces.core.MemoryShortageException is thrown for that write request. | 5
space-config.engine.memory_usage.explicit-gc | If true, the garbage collector is called explicitly before trying to evict. When using the LRU cache policy, space-config.engine.memory_usage.explicit-gc=false means that fewer objects than the defined minimum (low watermark percentage) might be evicted. This tag is false by default because calling the garbage collector explicitly consumes a large amount of CPU, thus affecting performance. Therefore, it is recommended to set it to true only if you want to ensure that the minimum amount of objects is evicted from the space (and not less than the minimum). | false
space-config.engine.memory_usage.retry_yield_time | Time (in milliseconds) to wait after evicting a batch of objects and before measuring the current memory utilization. | 50

A com.j_spaces.core.MemoryShortageException or org.openspaces.core.SpaceMemoryShortageException is thrown only when the JVM garbage collection and the eviction mechanism do not free enough memory. This can happen if the space-config.engine.memory_usage.low_watermark_percentage value is too high.

When a persistent space (using an External Data Source or the JDBC Storage Adapter) running in LRU cache policy mode is started, it loads data from the underlying durable data source before becoming available to clients. The default behavior is to load an amount of objects up to 50% of the space-config.engine.cache_size value.

When space-config.engine.memory_usage.enabled=true (evicting data from the space is also based on free heap size), you might want to set a large value for the space-config.engine.cache_size property. This essentially instructs the space engine to ignore the amount of space objects when triggering the eviction mechanism, so that eviction is based only on free JVM heap memory.

The combination of the above (a large space-config.engine.cache_size and a space restart) might lead to out-of-memory problems when the space starts. To avoid this problem, configure space-config.engine.initial_load to a low value: space-config.engine.initial_load=5

Exceeding Physical Memory Capacity

The overall space capacity is not necessarily limited to the capacity of its physical memory. Currently there are two options for exceeding this limit:

  • Using LRU and an External Data Source - in this mode, all the space data is kept in the database, and therefore the space capacity depends on the database capacity rather than the memory capacity. The space maintains in memory a partial image of the persistent view on an LRU basis.
  • Using a Partitioned Space - in this mode, the space utilizes the physical memory of multiple JVMs. This means the application using the space can access all the space instances transparently, as if they were a single space with higher memory capacity.