Friday, 11 April 2014

Diagnosing Installation Issue during CRS upgrade

Case Study: Diagnosing Installation Issue during CRS installation(ASM creation 10.2.0.4) and upgrade to 11.2.0.3 -- AIX 6.1

History:

In order to migrate SIEBEL CRM 2 node RAC database(which was running on 10.2.0.4) on new servers with SSD storage, I had to prepare 10G RAC setup first then restoration of 10g database then upgrade to 11.2.0.3.

Problems:

Prob1. During ASM instance creation after 10.2.0.4 installation, encountered below error.

ORA-00600: internal error code, arguments: [KGHLATCH_REG4], [0x000000000], [], [], [], [], [], []

Prob2. During CRS upgrade to 11.2.0.3(Issue during runinstaller), encountered below error.

[INS-41712] Installer has detected that the Oracle Clusterware software on this cluster is not functioning properly on the following nodes [test02].

test02  - PRVF-7590 : "CRS daemon" is not running on node "test02"  - PRVF-7590 : "CSS daemon" is not running on node "test02"  - PRVF-7590 : "EVM daemon" is not running on node "test02"

Prob3. During installation of 11.2.0.3 RAC binaries with database upgrade option, dbua taking forever while gathering database information.

Analysis:

Prob1.
The problem is that the (Oracle internal) latch directory size is too small when CPU_COUNT > 135. This is Bug 7115828.

Prob2. cluvfy showed.

Liveness of all the daemons
  Node Name     CRS daemon                CSS daemon                EVM daemon
  ------------  ------------------------  ------------------------  ----------
  test02      no                        no                        no
  test01      yes                       yes                       yes

Prob3.

select count(*) from v$rman_backup_job_details where status = 'RUNNING'

Resolution:

Prob1.

Patches applied 9119284 & 7115828

Prob2.

Cleaned up 10 CRS, installed 11.2.0.3 clusterware.

Step 1: Stopped CRS on all nodes.
Step 2: Taken backup of below mentioned files and directories(These files and directories can be used in case of rollback).

/etc/init.cssd
/etc/init.crs
/etc/init.crsd
/etc/init.evmd
/etc/rc.d/rc2.d/K96init.crs
/etc/rc.d/rc2.d/S96init.crs
/etc/oracle/scls_scr
/etc/oracle/oprocd
/etc/inittab.crs
/etc/inittab
/etc/oratab
/etc/oraInst.loc
/etc/oracle/ocr.loc


Step 3: Cleaned of 10G CRS.

rm /etc/init.cssd
rm /etc/init.crs
rm /etc/init.crsd
rm /etc/init.evmd
rm /etc/rc.d/rc2.d/K96init.crs
rm /etc/rc.d/rc2.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -Rf /etc/oracle/oprocd
rm /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
rm -f /tmp/.oracle/*
Removed the ocr.loc

Step 4: Changed orainventory location in /etc/oraInst.loc file.
Step 5: Removed ASM and database instance entries from /etc/oratab file.
Step 6: Taken one disk of 5 GB to create DG for ocr and voting and made changes accordingly(created alias change ownership to oradb:dba and permission to 660).
Step 7 : Installed 11.2.0.3 clusterware. asm instances and listener(including scan listener) created.
Step 8 : mounted all the DGs with 11G asm on all nodes.

alter system set asm_diskstring='/dev/vot*','/dev/asm*';
alter system set asm_diskgroups='DATA','ARCH','OCR_VOTE';

asmcmd mount DATA
asmcmd mount ARCH

Step 9: Added database and instances in OCR from 10G home.

srvctl add database -d CRMDB -o /Oracle/oracle/10G
srvctl add instance -d CRMDB -i CRMDB1 -n test01
srvctl add instance -d CRMDB -i CRMDB2 -n test02

Step 10 : Tried to start instances and failed with below error.

ORA-29702:error occurred in Cluster Group Service operation.

Step 11: RAC nodes pinned, execute below from root user

crsctl pin css -n test01 test02

Step 12: instances started.

Step 13: Add RAC home entry in central orainventory.

./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/Oracle/oracle/10G" ORACLE_HOME_NAME="ORACLE_HOME" LOCAL_NODE='test01' CLUSTER_NODES=test01,test02

Prob3.

Step 1: Cancelled DBUA

Step 2: Delete the statistics collected on the system table X$KCCRSR:

exec dbms_stats.DELETE_TABLE_STATS('SYS','X$KCCRSR');

Step 3: Run dbua again

Benefits for other upgrades:

This method can be used where one of CRS node removed without proper clean from OCR(i.e. node deletion was not proper).

===============================================================


2 comments:

  1. Thank you sir for above valuable information.

    ReplyDelete
  2. We are expecting more on performance tuning with RAC... Vivekanand

    ReplyDelete