Case Study: Diagnosing Installation Issues during CRS Installation (ASM Creation on 10.2.0.4) and Upgrade to 11.2.0.3 -- AIX 6.1
History:
To migrate a two-node SIEBEL CRM RAC database (running on 10.2.0.4) to new servers with SSD storage, I first had to build a 10g RAC setup, then restore the 10g database onto it, and finally upgrade to 11.2.0.3.
Problems:
Prob1. During ASM instance creation after the 10.2.0.4 installation, the following error was encountered.
ORA-00600: internal error code, arguments: [KGHLATCH_REG4], [0x000000000], [], [], [], [], [], []
Prob2. During the CRS upgrade to 11.2.0.3 (while running runInstaller), the following error was encountered.
[INS-41712] Installer has detected that the Oracle Clusterware software on this cluster is not functioning properly on the following nodes [test02].
test02 - PRVF-7590 : "CRS daemon" is not running on node "test02" - PRVF-7590 : "CSS daemon" is not running on node "test02" - PRVF-7590 : "EVM daemon" is not running on node "test02"
Prob3. During installation of the 11.2.0.3 RAC binaries with the database upgrade option, DBUA took forever while gathering database information.
Analysis:
Prob1.
The problem is that the (Oracle internal) latch directory size is too small when CPU_COUNT > 135. This is Bug 7115828.
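A minimal sketch of how the CPU_COUNT condition above could be checked before creating the ASM instance. The function names are hypothetical, and the CPU count is passed in by hand (for example, the value reported by "show parameter cpu_count" in SQL*Plus); only the threshold of 135 and the bug number come from the analysis above.

```shell
#!/bin/sh
# Threshold from the analysis above: the error appears when CPU_COUNT > 135.
LATCH_BUG_CPU_LIMIT=135

# exceeds_latch_limit COUNT
# Returns 0 (true) when COUNT is above the limit, 1 otherwise.
exceeds_latch_limit() {
    [ "$1" -gt "$LATCH_BUG_CPU_LIMIT" ]
}

# report_latch_risk COUNT
# Prints a one-line verdict for the given CPU count.
report_latch_risk() {
    if exceeds_latch_limit "$1"; then
        echo "CPU_COUNT=$1 is above $LATCH_BUG_CPU_LIMIT: check Bug 7115828"
    else
        echo "CPU_COUNT=$1 is within the limit of $LATCH_BUG_CPU_LIMIT"
    fi
}

# 136 is only an illustrative value, not the count of the servers in this case.
report_latch_risk 136
```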
Prob2. cluvfy showed that none of the clusterware daemons were running on test02:
Liveness of all the daemons
Node Name     CRS daemon                CSS daemon                EVM daemon
------------  ------------------------  ------------------------  ----------
test02        no                        no                        no
test01        yes                       yes                       yes
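When the cluster has more than a couple of nodes, the same table can be scanned automatically. The sketch below assumes the liveness table has the four-column layout shown above (node name followed by three yes/no columns); the function name nodes_with_daemons_down is hypothetical, and the sample input is just the output from this case.

```shell
#!/bin/sh
# nodes_with_daemons_down
# Reads a cluvfy liveness table on stdin and prints every node that reports
# "no" for at least one of the CRS, CSS or EVM daemons.
# Title, header and separator rows are skipped because they either do not
# have exactly four fields or contain no "no" value.
nodes_with_daemons_down() {
    awk 'NF == 4 && ($2 == "no" || $3 == "no" || $4 == "no") { print $1 }'
}

# Sample input copied from the cluvfy output in this case study.
nodes_with_daemons_down <<'EOF'
Liveness of all the daemons
Node Name CRS daemon CSS daemon EVM daemon
------------ ------------------------ ------------------------ ----------
test02 no no no
test01 yes yes yes
EOF
```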
Prob3. DBUA was waiting on the following query:
select count(*) from v$rman_backup_job_details where status = 'RUNNING'
Resolution:
Prob1.
Applied patches 9119284 and 7115828.
Prob2.
Cleaned up the 10g CRS and installed the 11.2.0.3 clusterware.
Step 1: Stopped CRS on all nodes.
Step 2: Took a backup of the following files and directories (they can be used in case of rollback).
/etc/init.cssd
/etc/init.crs
/etc/init.crsd
/etc/init.evmd
/etc/rc.d/rc2.d/K96init.crs
/etc/rc.d/rc2.d/S96init.crs
/etc/oracle/scls_scr
/etc/oracle/oprocd
/etc/inittab.crs
/etc/inittab
/etc/oratab
/etc/oraInst.loc
/etc/oracle/ocr.loc
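The backup in Step 2 can be scripted so that the original directory layout is preserved under one backup root, which makes a rollback a simple copy back. This is only a sketch: the function name backup_paths and the backup directory /backup/crs10g are assumptions, not what was used in this case, and the paths passed in are expected to be absolute.

```shell
#!/bin/sh
# backup_paths BACKUP_ROOT PATH...
# Copies each existing absolute PATH under BACKUP_ROOT, keeping its full
# directory structure (e.g. /etc/oratab -> BACKUP_ROOT/etc/oratab).
# Missing paths are reported on stderr and skipped.
backup_paths() {
    backup_root=$1
    shift
    for p in "$@"; do
        if [ -e "$p" ]; then
            parent=$(dirname "$p")
            mkdir -p "$backup_root$parent" || return 1
            cp -Rp "$p" "$backup_root$parent/" || return 1
        else
            echo "skipping missing path: $p" >&2
        fi
    done
    return 0
}

# Example call as root (backup directory is hypothetical):
# backup_paths /backup/crs10g \
#     /etc/init.cssd /etc/init.crs /etc/init.crsd /etc/init.evmd \
#     /etc/rc.d/rc2.d/K96init.crs /etc/rc.d/rc2.d/S96init.crs \
#     /etc/oracle/scls_scr /etc/oracle/oprocd /etc/inittab.crs \
#     /etc/inittab /etc/oratab /etc/oraInst.loc /etc/oracle/ocr.loc
```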
Step 3: Cleaned up the 10g CRS.
rm /etc/init.cssd
rm /etc/init.crs
rm /etc/init.crsd
rm /etc/init.evmd
rm /etc/rc.d/rc2.d/K96init.crs
rm /etc/rc.d/rc2.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -Rf /etc/oracle/oprocd
rm /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
rm -f /tmp/.oracle/*
Removed the ocr.loc file (/etc/oracle/ocr.loc).
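Because the removals in Step 3 are destructive, one cautious variant is to delete a file only when a copy of it already exists under the backup root. The sketch below is an assumption about how that guard could look, not the procedure that was actually run; remove_if_backed_up and /backup/crs10g are hypothetical names, and the backup is expected to mirror the original absolute path.

```shell
#!/bin/sh
# remove_if_backed_up BACKUP_ROOT TARGET
# Removes TARGET only if BACKUP_ROOT/TARGET exists, i.e. a backup copy with
# the same absolute path was taken first. Returns 1 without deleting
# anything when no backup is found or TARGET is empty.
remove_if_backed_up() {
    backup_root=$1
    target=$2
    [ -n "$target" ] || return 1
    if [ ! -e "$backup_root$target" ]; then
        echo "no backup found for $target, not removing" >&2
        return 1
    fi
    rm -Rf "$target"
}

# Example calls as root (backup directory is hypothetical):
# remove_if_backed_up /backup/crs10g /etc/init.cssd
# remove_if_backed_up /backup/crs10g /etc/oracle/scls_scr
```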
Step 4: Changed the oraInventory location in the /etc/oraInst.loc file.
Step 5: Removed the ASM and database instance entries from the /etc/oratab file.
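Editing /etc/oratab by hand is fine for two nodes; the sketch below shows one way to filter the entries instead. It assumes the usual SID:ORACLE_HOME:flag layout of oratab lines. The function name strip_oratab_entries is hypothetical, and +ASM1 is an assumed ASM instance name (only CRMDB1 and CRMDB2 appear in this case study).

```shell
#!/bin/sh
# strip_oratab_entries FILE SID...
# Prints FILE to stdout without the lines whose first colon-separated
# field matches one of the given SIDs. Comments and blank lines are kept.
strip_oratab_entries() {
    file=$1
    shift
    awk -F: -v sids="$*" '
        BEGIN {
            n = split(sids, list, " ")
            for (i = 1; i <= n; i++) drop[list[i]] = 1
        }
        /^#/ || NF == 0 { print; next }   # keep comments and blank lines
        !($1 in drop)   { print }         # keep entries for other SIDs
    ' "$file"
}

# Example (writes to a new file so the original stays untouched):
# strip_oratab_entries /etc/oratab +ASM1 CRMDB1 > /tmp/oratab.new
```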
Step 6: Allocated one 5 GB disk to create the disk group for OCR and voting, and prepared it accordingly (created an alias, changed ownership to oradb:dba and set permissions to 660).
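Before starting the installer it is worth confirming on every node that the disk alias really has the ownership and permissions described in Step 6. The helpers below are a sketch that parses "ls -ldL" output; the function names and the device name /dev/vote1 are hypothetical (the case study only shows the /dev/vot* pattern).

```shell
#!/bin/sh
# has_perms PATH PERMS
# True when the nine permission characters of PATH equal PERMS,
# e.g. rw-rw---- for mode 660. The file-type character is ignored.
has_perms() {
    actual=$(ls -ldL "$1" | cut -c2-10)
    [ "$actual" = "$2" ]
}

# has_owner PATH USER:GROUP
# True when PATH is owned by USER and GROUP as shown by ls.
has_owner() {
    actual=$(ls -ldL "$1" | awk '{ print $3 ":" $4 }')
    [ "$actual" = "$2" ]
}

# Example check (device name is hypothetical):
# has_perms /dev/vote1 rw-rw---- && has_owner /dev/vote1 oradb:dba \
#     || echo "fix ownership or permissions on /dev/vote1"
```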
Step 7: Installed the 11.2.0.3 clusterware. The ASM instances and listeners (including the SCAN listener) were created.
Step 8: Mounted all the disk groups with the 11g ASM on all nodes.
alter system set asm_diskstring='/dev/vot*','/dev/asm*';
alter system set asm_diskgroups='DATA','ARCH','OCR_VOTE';
asmcmd mount DATA
asmcmd mount ARCH
Step 9: Added the database and instances to the OCR from the 10g home.
srvctl add database -d CRMDB -o /Oracle/oracle/10G
srvctl add instance -d CRMDB -i CRMDB1 -n test01
srvctl add instance -d CRMDB -i CRMDB2 -n test02
Step 10: Tried to start the instances, which failed with the following error.
ORA-29702: error occurred in Cluster Group Service operation
Step 11: Pinned the RAC nodes by executing the following as the root user.
crsctl pin css -n test01 test02
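Pre-11.2 databases need the nodes to be pinned under 11.2 clusterware, which is why the 10g instances failed until this step. A quick way to confirm the result is to look for nodes still reported as Unpinned. The sketch assumes the output of "olsnodes -t" lists the node name first and the pin state last on each line; the function name unpinned_nodes and the sample lines are illustrative, not captured from this cluster.

```shell
#!/bin/sh
# unpinned_nodes
# Reads "olsnodes -t" style output on stdin and prints the names of
# nodes whose last column is "Unpinned".
unpinned_nodes() {
    awk '$NF == "Unpinned" { print $1 }'
}

# Illustrative input only; on a real cluster use: olsnodes -t | unpinned_nodes
unpinned_nodes <<'EOF'
test01 Pinned
test02 Unpinned
EOF
```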
Step 12: The instances started.
Step 13: Added the RAC home entry to the central oraInventory.
./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/Oracle/oracle/10G" ORACLE_HOME_NAME="ORACLE_HOME" LOCAL_NODE='test01' CLUSTER_NODES=test01,test02
Prob3.
Step 1: Cancelled DBUA.
Step 2: Deleted the statistics collected on the system table X$KCCRSR:
exec dbms_stats.DELETE_TABLE_STATS('SYS','X$KCCRSR');
Step 3: Ran DBUA again.
Benefits for other upgrades:
This method can also be used where one of the CRS nodes was removed without being properly cleaned from the OCR (i.e. the node deletion was not done properly).
===============================================================