Cloud Data Persistency


Overview

When running your application on the cloud, you may store your application data within the IMDG and also persist it into a database for long-term storage. Once your data is stored in the database, it can be used for recovery after a complete shutdown, complex reporting, or data mining.
When the application writes its data into the IMDG, the data is stored as objects in memory and persisted into the database tables via the ORM mapping layer. By default, the ORM mapping uses Hibernate.

The application may keep all its data within the IMDG (ALL_IN_CACHE cache policy) or evict some of it, while all the data remains stored in the database. You may evict data from the IMDG by running in LRU cache policy mode or by giving objects a limited lease when writing them into the IMDG.

Persistency on the cloud is supported in the following modes:

Synchronous-Persistency

Operations within the IMDG are persisted synchronously into a MySQL database running on the cloud.

Asynchronous-Persistency

Operations within the IMDG are persisted asynchronously via the Mirror Service, which stores the data into a MySQL database running on the cloud.

In both cases the MySQL database is configured to use EBS. You may configure the MySQL connection information and the relevant EBS volume-id.
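
The difference between the two modes can be pictured with a toy model. This is only a sketch under simplifying assumptions, not the XAP implementation: two in-memory maps stand in for the IMDG and the MySQL database, a queue stands in for the operations handed to the Mirror Service, and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of synchronous vs. asynchronous persistency (hypothetical names). */
final class PersistencySketch {
    final Map<String, String> memory = new HashMap<>();   // stands in for the IMDG
    final Map<String, String> database = new HashMap<>(); // stands in for MySQL
    private final Queue<String[]> pending = new ArrayDeque<>(); // operations awaiting the mirror
    private final boolean synchronous;

    PersistencySketch(boolean synchronous) {
        this.synchronous = synchronous;
    }

    /** Writes to memory; persists immediately or queues the operation. */
    void write(String key, String value) {
        memory.put(key, value);
        if (synchronous) {
            database.put(key, value);               // persisted before write() returns
        } else {
            pending.add(new String[] {key, value}); // persisted later by flushPending()
        }
    }

    /** Drains queued operations in order, as a mirror-like component would. */
    void flushPending() {
        String[] op;
        while ((op = pending.poll()) != null) {
            database.put(op[0], op[1]);
        }
    }
}
```

In the synchronous case the write returns only after the database has been updated; in the asynchronous case the database lags behind the in-memory data until the queued operations are applied.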

How Does Persistency on the Cloud Work?

When a database machine is started on the cloud the following happens:

  1. The database initialization and creation commands, database configuration files, Mirror Override File, and Mirror PU files are downloaded from the Application Repository onto the database machine.
  2. The latest MySQL database patches are downloaded and installed.
  3. The MySQL database volume files are mounted on the EBS volume.
  4. The MySQL database configuration files are copied into /etc/mysql/conf.d/ and /etc/mysql/.
  5. The MySQL database is started.
  6. The database initialization script is called.
  7. A GSC is started using the Mirror Override File.
  8. The database host name is injected into the Mirror pu.xml.
  9. The Mirror Service is deployed and provisioned into the GSC started on the database machine.
  10. The database host name is injected into the IMDG pu.xml.
  11. The IMDG is deployed.

When the IMDG is configured to use an External Data Source, the following happens:

  1. The IMDG PU configuration is injected with the database machine host name
  2. The space connects to the database server
  3. Data is loaded from the database (EDS.initialLoad is called), with each IMDG partition loading the data that belongs to it. By default, irrelevant data is filtered out before it is loaded into the partition. More advanced and efficient filtering can be done by customizing the External Data Source.
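
The per-partition filtering in step 3 can be illustrated with a small sketch. It assumes a hash-modulo routing rule (the routing value's hash code modulo the number of partitions) and 1-based partition ids; both are assumptions made for illustration, the class and method names are hypothetical, and this is not the actual External Data Source code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: decides which loaded routing values a partition keeps. */
final class InitialLoadFilter {

    /**
     * Assumed routing rule: non-negative hash of the routing value modulo the
     * number of partitions. Partition ids are treated as 1-based (assumption).
     */
    static boolean belongsToPartition(Object routingValue, int partitionId, int totalPartitions) {
        int bucket = Math.floorMod(routingValue.hashCode(), totalPartitions); // 0-based bucket
        return bucket == partitionId - 1;
    }

    /** Returns only the routing values owned by the given partition, in input order. */
    static <T> List<T> filter(List<T> routingValues, int partitionId, int totalPartitions) {
        List<T> kept = new ArrayList<>();
        for (T value : routingValues) {
            if (belongsToPartition(value, partitionId, totalPartitions)) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

Filtering after loading means every partition still reads the full result set from the database. A customized data source could instead push the same condition into the query so that each partition reads only its own rows.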

How to Enable Persistency on the Cloud

To enable persistency, complete the following steps:

  1. Add the database_machine with the relevant settings to the Cloud Application Deployment File.
  2. Construct the Database Initialization Script and the Database Creation Commands and place them in the Application Cloud Repository.
  3. Construct the Database Configuration File and place it in the Application Cloud Repository.
  4. Construct the Mirror Override File and place it in the Application Cloud Repository.
  5. Make sure the Mirror pu.xml has the relevant SLA settings.
  6. Make sure the Processing Unit used to deploy the IMDG (space) has the correct EDS configuration.
  7. Construct the Mirror Service Processing Unit and place it in the Application Cloud Repository. The Mirror PU should be a separate PU jar added to the processing unit list in your cloud deploy configuration. The Mirror Service PU is required when running in Asynchronous-Persistency mode.
  8. Make sure your domain classes have Hibernate mapping definitions (annotations or hbm config files).
  9. Bundle the relevant ORM libraries and database driver with your IMDG PU and Mirror PU. See the Required libraries section for the exact list of libraries to place in the shared-lib folder of the IMDG PU and Mirror PU.

The host name of the machine running the database server is injected at deployment time by the deployment process. To allow the deployment process to inject the database machine host name, use $mirror-url as part of the database connection URL. This applies to both the space processing unit pu.xml and the Mirror pu.xml.
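
As a rough illustration of the substitution, the sketch below replaces the $mirror-url token in a connection URL with a host name. The class name, the sample URL, the port, and the database name are hypothetical; the deployment process performs the real injection, and its internals are not shown here.

```java
/** Hypothetical illustration of the host-name injection done at deployment time. */
final class MirrorUrlInjector {
    static final String PLACEHOLDER = "$mirror-url";

    /** Replaces every occurrence of the placeholder with the database host name. */
    static String inject(String urlTemplate, String databaseHost) {
        if (!urlTemplate.contains(PLACEHOLDER)) {
            throw new IllegalArgumentException("URL does not contain " + PLACEHOLDER);
        }
        // String.replace treats both arguments literally, so '$' needs no escaping.
        return urlTemplate.replace(PLACEHOLDER, databaseHost);
    }
}
```

For example, `MirrorUrlInjector.inject("jdbc:mysql://$mirror-url:3306/mydb", "10.0.0.5")` yields `jdbc:mysql://10.0.0.5:3306/mydb`. A URL without the token cannot be rewritten, which is why the sketch rejects it.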

Cloud Application Deployment File Settings

To enable persistency, the Cloud Application Deployment File should include the following settings:

Tag Name                Default value   Description
<is-active>             false           Boolean value. Enables the Mirror Service on the cloud
<volume-id>                             EBS volume-id. Optional
<snapshot-id>                           Optional
<override-file-name>                    Optional
<database-name>                         Database name to connect to
<url>                                   Database connection URL
<size>                  10              Size in gigabytes of the new volume. Optional
<zone>                  us-east-1a      EC2 zone
<database-user-name>                    Database user name
<database-password>                     Database user password
<init-file-name>                        Database initialization file. Optional

Database Initialization Script

The database initialization file, mirror-init-file.sh, runs the database table creation commands. It should contain the following commands:

The mirror-init-file.sh
#!/bin/bash

/etc/init.d/mysql restart
# Creating database
mysql < /home/gsadmin/createDB.sql
/etc/init.d/mysql restart

Database Creation Commands

This command script (e.g. createDB.sql) is used to create the database schema tables.
Make sure the script includes the relevant user permission commands:

GRANT ALL PRIVILEGES ON *.* TO 'user'@'%' IDENTIFIED BY 'password';
GRANT ALL PRIVILEGES ON *.* TO 'user'@'localhost' IDENTIFIED BY 'password';

These commands should be run by the database initialization script (mirror-init-file.sh).

Database Configuration file

The database configuration file includes the specific settings you would like the database server to use when started.

MySQL Database configuration file
[mysqld]
innodb_file_per_table
datadir          = /vol/lib/mysql
log_bin          = /vol/log/mysql/mysql-bin.log
max_binlog_size  = 1000M
log_slow_queries = /vol/log/mysql/mysql-slow.log
long_query_time  = 10

lower_case_table_names=1

Mirror Override File

The Mirror Override File is used by the GSC running on the database machine. It marks this GSC with a special property that instructs the GSM to provision the Mirror Service onto this specific machine rather than an arbitrary GSC.
Here is an example of such an override file:

The mirror-override.xml
<overrides>
    <Component Name="org.jini.rio.qos">
        <Parameter Name="addPlatformCapabilities">
        <![CDATA[
        new org.jini.rio.qos.capability.PlatformCapability[] {            
            new org.jini.rio.qos.capability.software.SoftwareSupport(
                new Object[]{"Name", "Mirror"})
            }
        ]]>
        </Parameter>
    </Component>
</overrides>

The "Name" property above is set to "Mirror". This matches the SoftwareSupport requirement in the SLA of the Mirror PU.

Mirror PU

The pu.xml used to deploy the Mirror Service should include the following SLA, which instructs the GSM to provision the Mirror into the GSC that uses the above override file:

The Mirror pu.xml
<beans> <!-- namespace declarations (os-core, os-sla) omitted -->
	<os-sla:sla number-of-instances="1">
	    <os-sla:requirements>
			<os-sla:system name="SoftwareSupport">
				<os-sla:attributes>
					<entry key="Name" value="Mirror" />
				</os-sla:attributes>
			</os-sla:system>
		</os-sla:requirements>
	</os-sla:sla>
    
	<os-core:space id="space" url="/./mirror-service" schema="mirror" external-data-source="hibernateDataSource"/>
</beans>

IMDG PU General Settings

The IMDG should specify the EDS settings. These should be:

  • Cache Policy mode (ALL_IN_CACHE or LRU)
  • If LRU is used - Cache Size and Memory Usage settings
  • EDS Cluster mode

The IMDG PU General Settings should include the following:

IMDG pu.xml
<beans> <!-- namespace declarations (os-core) omitted -->
	<os-core:space id="space" url="/./space" schema="persistent" mirror="true" external-data-source="hibernateDataSource">
	    <os-core:properties>
	        <props>
	            <!-- Use ALL IN CACHE - Read Only from the database-->
	            <prop key="space-config.engine.cache_policy">1</prop>
	            <prop key="space-config.external-data-source.usage">read-only</prop>
	            <prop key="cluster-config.cache-loader.external-data-source">true</prop>
	            <prop key="cluster-config.cache-loader.central-data-source">true</prop>
	        </props>
	    </os-core:properties>
	</os-core:space>
</beans>

When running in ALL_IN_CACHE cache policy mode, you may evict objects from the IMDG using the space object lease duration parameter. This gives space objects a specific Time To Live (TTL) once they are written into the IMDG. Expired objects are not removed from the database and do not trigger a Mirror operation.
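
The effect of a lease can be pictured with a toy model. This is a sketch under simplifying assumptions rather than the space's lease mechanism: plain maps stand in for the IMDG and the database, time is passed in explicitly, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Toy model: lease expiry removes objects from memory only (hypothetical names). */
final class LeasedCacheSketch {
    final Map<String, String> memory = new HashMap<>();   // stands in for the IMDG
    final Map<String, String> database = new HashMap<>(); // stands in for the database
    private final Map<String, Long> expiryByKey = new HashMap<>();

    /** Writes an object with a lease; it is persisted regardless of the lease. */
    void write(String key, String value, long nowMillis, long leaseMillis) {
        memory.put(key, value);
        database.put(key, value);
        expiryByKey.put(key, nowMillis + leaseMillis);
    }

    /** Evicts objects whose lease has ended; the database is left untouched. */
    void expire(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = expiryByKey.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMillis) {
                memory.remove(entry.getKey());
                it.remove();
            }
        }
    }
}
```

After the lease ends, the object is gone from memory but still present in the database, which mirrors the behavior described above.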

Mirror and IMDG PU External Data Source Settings

The Mirror and the IMDG (Space) PU configuration files (pu.xml) should use the following variables in their database connection settings:

  • data-source-url
  • data-source-username
  • data-source-password

The values of these variables are injected at deployment time.

The following example shows how these variables are used in the Mirror and the IMDG pu.xml:

pu.xml with Database connection settings
<beans>
    <bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
        <property name="driverClassName" value="${data-source-driver}"/>
        <property name="url" value="${data-source-url}"/>
        <property name="username" value="${data-source-username}"/>
        <property name="password" value="${data-source-password}"/>
        <property name="maxActive" value="100"/>
        <property name="initialSize" value="10"/>
    </bean>

    <bean id="sessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
        <property name="dataSource" ref="dataSource"/>
        <property name="annotatedClasses">
            <list>
                <value>com.example.Class1</value>
                <value>com.example.Class2</value>
            </list>
        </property>
        <property name="hibernateProperties">
            <props>
                <prop key="hibernate.dialect">org.hibernate.dialect.MySQLDialect</prop>                
                <prop key="hibernate.cache.provider_class">org.hibernate.cache.NoCacheProvider</prop>
                <prop key="hibernate.cache.use_second_level_cache">false</prop>
                <prop key="hibernate.cache.use_query_cache">false</prop>
                <prop key="hibernate.hbm2ddl.auto">update</prop>
            </props>
        </property>
    </bean>

    <bean id="hibernateDataSource" class="org.openspaces.persistency.hibernate.StatelessHibernateExternalDataSource">
        <property name="sessionFactory" ref="sessionFactory"/>
    </bean>
</beans>

Snapshotting

New In Cloud Tools 2.3.5
Disk persistency on the cloud is based on the snapshot concept. Once a machine is attached to an EBS device, its volume data is cleared when the machine is shut down. You must copy the machine's EBS volume data into S3 (i.e. take a snapshot) so that you can recover the data when the machine is restarted. This means that once the machine is started and attached to the EBS volume, you should copy the latest snapshot data back into the file system volume.

The above is relevant for the database machine profile, allowing the database server to have its files recovered before the database processes are started.

The database machine supports two initialization options:

  • No snapshot usage: no snapshot ID should be provided. The database initialization script is called, allowing the user to create the relevant database tables and insert bootstrap data into them.
  • With snapshot usage: the latest snapshot ID should be provided. The snapshot is copied back into the volume before the database processes are started.

When to Snapshot your data?

  • In general, you should periodically snapshot your volume data using a scheduled cron job. This involves suspending disk activity by the OS and the database for a short time.
  • You should also call the snapshot script before the machine is shut down.
  • The current cloud CLI includes the ec2-snapshot script, which should be used to perform the snapshot.
  • The latest snapshot ID is preserved as part of the Amazon account info.

In the future we will provide enhanced snapshot management that handles snapshots automatically.

See more about Amazon EBS Volumes and Snapshots.

Required libraries

Make sure the PU running the IMDG (Space) and the PU running the Mirror have the following libraries in their shared-lib folder:
antlr-2.7.6.jar, asm-1.5.3.jar, asm-attrs-1.5.3.jar, cglib-2.1_3.jar, commons-collections-2.1.1.jar, commons-dbcp-1.2.1.jar, commons-pool-1.2.jar, dom4j-1.6.1.jar, ehcache-1.2.3.jar, geronimo-spec-jta-1.0.1B-rc4.jar, hibernate-3.2.6.ga.jar, hibernate-annotations-3.2.1.ga.jar, mysql-connector-java-5.0.5.jar, persistence-api-1.0.jar

Package the application's space domain classes as a library and place it in the shared-lib folder within the PU jar of both the IMDG and the Mirror.

Example

IMDG with Synchronous-Persistency Topology

Here are example files you may use to deploy an IMDG with the Synchronous-Persistency topology:

This example deploys an IMDG PU and a Feeder PU on the cloud. The Feeder writes data into the IMDG. The objects stored within the IMDG are persisted synchronously into a MySQL database running on the cloud.

IMDG with Asynchronous-Persistency Topology

Here are example files you may use to deploy an IMDG with the Asynchronous-Persistency topology:

This example deploys an IMDG PU, a Mirror PU, and a Feeder PU on the cloud. The Feeder writes data into the IMDG. The objects stored within the IMDG are persisted asynchronously via the Mirror into a MySQL database running on the cloud.
