Depending on the failure scenario, a recovery situation can involve the planned execution of several remedial activities.

Typical activities involved in the recovery of a failed Enterprise Server Cluster include:

- Troubleshooting a network connection so that it can be reactivated as soon as possible, ideally before the connection is marked as disabled.
- Collecting information to allow later analysis of the failure (see the sketch after this list).
- Releasing the locks held by an Enterprise Server Cluster client.
- Restoring a database.
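
What to collect depends on the site, but the collection step is easier to perform under pressure if it has been scripted in advance. The following Python sketch is a hypothetical example only: the source paths and the destination directory are placeholders that an administrator would replace with the log and configuration locations used by their own Enterprise Server regions.

#!/usr/bin/env python3
"""Minimal sketch of a failure-evidence collector.

All paths below are placeholders; substitute the log and configuration
locations used by your own Enterprise Server regions.
"""
import shutil
import time
from pathlib import Path

# Hypothetical locations of region logs and configuration to preserve.
SOURCES = [
    Path("/var/region1/logs"),     # placeholder path
    Path("/etc/region1.conf"),     # placeholder path
]

def collect(dest_root="/tmp/cluster-failure"):
    """Copy the configured sources into a timestamped directory."""
    dest = Path(dest_root) / time.strftime("%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    for src in SOURCES:
        if not src.exists():
            print("skipping missing source:", src)
            continue
        if src.is_dir():
            shutil.copytree(src, dest / src.name)
        else:
            shutil.copy2(src, dest / src.name)
    return dest

if __name__ == "__main__":
    print("evidence collected in", collect())

Running a script of this kind immediately after the failure preserves the evidence even if the affected directories are later cleaned up or the region is restarted.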

The main objective of any recovery process is to limit work disruption as much as possible. Disruption can be minimised by careful preparation for a cluster failure:

- Identify the possible failure scenarios and prepare for them.
- Ensure your preparation work is documented, and that the system administrator and/or operator knows what needs to be done at the point of failure, so that the duration of the failure is kept to a minimum.

Recovery scenarios

There are two primary causes of an Enterprise Server Cluster failure:

- A permanent connection failure to the Global Lock Manager (GLM).
- Catastrophic GLM failure, caused by a disk failure, memory corruption, a resource shortage, etc.

Note: The system will tolerate non-permanent connection failures for a period defined by the ES_GLM_TIMEOUT environment variable. Once the duration of a connection failure exceeds the value set by this variable, the state of the connection between the cluster client and the GLM is marked as disabled.
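
As an illustration only, the following Python sketch shows one way to make sure ES_GLM_TIMEOUT is set in the environment from which a region is started. The timeout value and the start command are assumptions: check the product documentation for the unit the variable expects, and substitute the command your site actually uses to start the region.

#!/usr/bin/env python3
"""Sketch: start a region with ES_GLM_TIMEOUT set in its environment.

The timeout value and the start command are placeholders; consult the
product documentation for the unit ES_GLM_TIMEOUT expects and use the
command your site actually uses to start the region.
"""
import os
import subprocess

env = os.environ.copy()
env["ES_GLM_TIMEOUT"] = "120"      # placeholder value

# Placeholder command; replace with your site's region start command.
start_command = ["start-my-region.sh"]

subprocess.run(start_command, env=env, check=True)

Setting the variable in the environment used to start the region, rather than ad hoc, helps ensure the same timeout applies each time the region is restarted.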

Once the connection is marked as disabled, any attempt to acquire global locks will fail and the following message is displayed in the JCL job log:

JCLCM2000E Unable to acquire global lock for job JRX0033. JCLCM0181S JOB ABENDED - COND CODE S922

The state of the connection will be reset to enabled as soon as the GLM reconnects.