Saturday, 12 April 2014

Diagnosing Performance Issue Due To Unnest Subquery

History:

There were two databases (ABCD2 and ABCD3) on one server. Due to increased workload and limited resources, it was decided to migrate the ABCD2 database to a new server. A new database named ABCD was created on the new server. The full database was replicated, and all the MVIEWs were replicated in the same way as they existed on the ABCD2 database. All the applications running against ABCD2 then had to be migrated, and one application was identified to migrate first.

Problems:

After repointing the application from the existing database (ABCD2, running on 10.2.0.4) to the new one (ABCD, running on 10.2.0.5), the number of active sessions increased sharply. The run queue exceeded 200 within 5 minutes, so the changes were reverted.

Analysis:

During root cause analysis, it was found that the execution plans had changed because of unwanted subquery unnesting (visible as VW_SQ_n views in the plans). The good plan contained only one unnested subquery, while the bad plans contained five (VW_SQ_1 to VW_SQ_5).
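
One quick way to confirm the unnesting in a cached plan is to look for the VW_SQ_n row sources; a minimal sketch, where the &sql_id substitution is a placeholder for an affected statement:

select sql_id, plan_hash_value, id, operation, options, object_name
from v$sql_plan
where object_name like 'VW_SQ%';

select * from table(dbms_xplan.display_cursor('&sql_id', null, 'TYPICAL'));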

Resolution:

After setting the hidden parameter below at the instance level, the plans changed back to the good ones and the issue was resolved.

alter system set "_unnest_subquery"=false;

After setting this hidden parameter, the plans of all queries changed as expected, except for a few.

For those few queries, stored outlines were created after setting _unnest_subquery to true at session level.
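
A minimal sketch of that workaround; the outline name, category, and sample query below are hypothetical, not the application's actual statements:

alter session set "_unnest_subquery"=true;

create or replace outline OL_APP_Q1 for category APP_FIX on
select order_id from orders o
where exists (select 1 from order_items i where i.order_id = o.order_id);

alter system set use_stored_outlines=APP_FIX;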

In this way the overall issue was resolved and all the applications were migrated smoothly.

Benefits for other databases:

Later, when all the ABCD databases were upgraded to 10.2.0.5, the same bug was fixed in a similar way.

Friday, 11 April 2014

Diagnosing Installation Issue during CRS upgrade

Case Study: Diagnosing installation issues during CRS installation (ASM creation on 10.2.0.4) and upgrade to 11.2.0.3 -- AIX 6.1

History:

In order to migrate a SIEBEL CRM two-node RAC database (running on 10.2.0.4) to new servers with SSD storage, I had to prepare a 10g RAC setup first, then restore the 10g database, and then upgrade to 11.2.0.3.

Problems:

Prob1. During ASM instance creation after the 10.2.0.4 installation, the below error was encountered.

ORA-00600: internal error code, arguments: [KGHLATCH_REG4], [0x000000000], [], [], [], [], [], []

Prob2. During the CRS upgrade to 11.2.0.3 (issue during runInstaller), the below error was encountered.

[INS-41712] Installer has detected that the Oracle Clusterware software on this cluster is not functioning properly on the following nodes [test02].

test02  - PRVF-7590 : "CRS daemon" is not running on node "test02"
        - PRVF-7590 : "CSS daemon" is not running on node "test02"
        - PRVF-7590 : "EVM daemon" is not running on node "test02"

Prob3. During installation of the 11.2.0.3 RAC binaries with the database upgrade option, DBUA took forever while gathering database information.

Analysis:

Prob1.
The problem is that the (Oracle internal) latch directory size is too small when CPU_COUNT > 135. This is Bug 7115828.

Prob2. cluvfy showed the following:

Liveness of all the daemons
  Node Name     CRS daemon                CSS daemon                EVM daemon
  ------------  ------------------------  ------------------------  ----------
  test02      no                        no                        no
  test01      yes                       yes                       yes
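
For reference, the daemon check above came from a post-installation cluster verification run; a minimal sketch of the command (run from the existing CRS home):

cluvfy stage -post crsinst -n test01,test02 -verbose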

Prob3.

DBUA was found to be stuck on the query below, which reads the RMAN history stored in the controlfile (the underlying fixed table is X$KCCRSR):

select count(*) from v$rman_backup_job_details where status = 'RUNNING';
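
One way to confirm what DBUA is stuck on is to look at its database session; a minimal sketch, where filtering on the JDBC program name is an assumption:

select s.sid, s.event, s.seconds_in_wait, q.sql_text
from v$session s, v$sql q
where s.sql_id = q.sql_id
and s.program like '%JDBC%';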

Resolution:

Prob1.

Applied patches 9119284 and 7115828.
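
A minimal sketch of the patch application with OPatch; the staging path is hypothetical and the home path is taken from the later attachHome step:

# repeated for each patch (9119284 and 7115828)
export ORACLE_HOME=/Oracle/oracle/10G
cd /stage/patches/7115828
$ORACLE_HOME/OPatch/opatch apply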

Prob2.

Cleaned up the 10g CRS and installed 11.2.0.3 clusterware.

Step 1: Stopped CRS on all nodes.
Step 2: Took a backup of the files and directories listed below; they can be used for rollback if needed (a sketch of both steps follows the list).

/etc/init.cssd
/etc/init.crs
/etc/init.crsd
/etc/init.evmd
/etc/rc.d/rc2.d/K96init.crs
/etc/rc.d/rc2.d/S96init.crs
/etc/oracle/scls_scr
/etc/oracle/oprocd
/etc/inittab.crs
/etc/inittab
/etc/oratab
/etc/oraInst.loc
/etc/oracle/ocr.loc
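
A minimal sketch of Steps 1 and 2 (run as root on each node; the archive path is hypothetical):

# Step 1: stop 10g CRS
crsctl stop crs

# Step 2: back up the rollback files
tar -cvf /backup/crs_rollback.tar \
    /etc/init.cssd /etc/init.crs /etc/init.crsd /etc/init.evmd \
    /etc/rc.d/rc2.d/K96init.crs /etc/rc.d/rc2.d/S96init.crs \
    /etc/oracle/scls_scr /etc/oracle/oprocd \
    /etc/inittab.crs /etc/inittab /etc/oratab \
    /etc/oraInst.loc /etc/oracle/ocr.loc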


Step 3: Cleaned up the 10g CRS files.

rm /etc/init.cssd
rm /etc/init.crs
rm /etc/init.crsd
rm /etc/init.evmd
rm /etc/rc.d/rc2.d/K96init.crs
rm /etc/rc.d/rc2.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -Rf /etc/oracle/oprocd
rm /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
rm -f /tmp/.oracle/*
rm /etc/oracle/ocr.loc

Step 4: Changed the oraInventory location in the /etc/oraInst.loc file.
Step 5: Removed the ASM and database instance entries from the /etc/oratab file.
Step 6: Took one 5 GB disk to create a diskgroup for OCR and voting, and made the required changes (created a device alias, changed ownership to oradb:dba and permissions to 660; see the sketch after the Step 8 commands).
Step 7: Installed the 11.2.0.3 clusterware; the ASM instances and listeners (including the SCAN listener) were created.
Step 8: Mounted all the diskgroups with 11g ASM on all nodes.

alter system set asm_diskstring='/dev/vot*','/dev/asm*';
alter system set asm_diskgroups='DATA','ARCH','OCR_VOTE';

asmcmd mount DATA
asmcmd mount ARCH
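
Going back to Step 6, a minimal sketch of the ownership and permission changes; the alias name /dev/vot_ocr1 is hypothetical, chosen to match the /dev/vot* diskstring above:

chown oradb:dba /dev/vot_ocr1
chmod 660 /dev/vot_ocr1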

Step 9: Added the database and instances to the OCR using srvctl from the 10g home.

srvctl add database -d CRMDB -o /Oracle/oracle/10G
srvctl add instance -d CRMDB -i CRMDB1 -n test01
srvctl add instance -d CRMDB -i CRMDB2 -n test02

Step 10: Tried to start the instances, which failed with the error below.

ORA-29702: error occurred in Cluster Group Service operation.

Step 11: Pinned the RAC nodes (required for running pre-11.2 databases under 11.2 clusterware) by executing the command below as the root user.

crsctl pin css -n test01 test02

Step 12: Started the instances successfully.
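
One way to do that, assuming the instances were started through the 10g srvctl registered in Step 9:

srvctl start database -d CRMDB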

Step 13: Added the RAC home entry to the central oraInventory.

./runInstaller -silent -ignoreSysPrereqs -attachHome ORACLE_HOME="/Oracle/oracle/10G" ORACLE_HOME_NAME="ORACLE_HOME" LOCAL_NODE='test01' CLUSTER_NODES=test01,test02

Prob3.

Step 1: Cancelled DBUA.

Step 2: Deleted the statistics collected on the fixed table X$KCCRSR:

exec dbms_stats.DELETE_TABLE_STATS('SYS','X$KCCRSR');

Step 3: Ran DBUA again.

Benefits for other upgrades:

This method can also be used when a CRS node was removed without being properly cleaned out of the OCR (i.e. the node deletion was not done correctly).

===============================================================