Oracle University Istanbul Expert Summit 2014

We had three great days with experts in their fields.

Pete Finnigan focused on security: designing the database according to security principles, encryption, leakage of critical data, dangerous functions, and the use of incorrect privileges.
Jonathan Lewis gave tons of examples, from simple to complex, and showed how to get the best out of the optimizer. The optimizer is getting smarter and smarter, but it can still surprise you :)
Frits Hoogland brought great experience with best practices: how to set up a database environment and what to look out for to make it easier to administer and to improve performance. He also shared his extensive observations and investigations on database I/O, focusing on the wait events.

Session 1: November 5th
PLSQL Secure Coding with Pete Finnigan

Session 2: November 6th
Beating the Optimizer with Jonathan Lewis

Session 3: November 7th
Best Practices in Oracle Exadata and Database Optimization with Frits Hoogland
Oracle Database I/O Implementation Deep Dive with Frits Hoogland

ORA-4068 Errors for Valid Objects

Compiling Oracle objects is pretty easy when there is no activity on the database. If the objects are heavily accessed while the compilation takes place, you should be prepared for worst-case scenarios. Depending on the object dependencies and access rates, sometimes downtime is required.
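
When a heavily used object is compiled online, the typical symptom on the application side is a burst of errors from the ORA-04068 family, which usually look like this (the package name below is just a placeholder):

ORA-04068: existing state of packages has been discarded
ORA-04061: existing state of package body "MYSCHEMA.MY_PKG" has been invalidated
ORA-04065: not executed, altered or dropped package body "MYSCHEMA.MY_PKG"
ORA-06508: PL/SQL: could not find program unit being called: "MYSCHEMA.MY_PKG"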

We took the system offline, compiled the objects, and then recompiled the invalid ones. One weird thing happened while we were compiling a heavily accessed object: everything worked perfectly, we compiled all the invalid objects and put the system back online, but afterwards users kept reporting invalid objects, even though when we checked there were no invalid objects on the system.

select *
  from dba_objects o
 where last_ddl_time > trunc(sysdate)+18/24
   and status  <> 'VALID';

The query returned 0 rows, so it was pretty clear that there were no invalid objects on the system.

Then we found a document on My Oracle Support, “ORA-4068 Errors for Valid Objects (Doc ID 835792.1)”. There happened to be a logical corruption in the dependency tree, and the only way to find it is to query the underlying SYS $ views (explained in the document):

select do.obj# d_obj,
       do.name d_name,
       do.type# d_type,
       po.obj# p_obj,
       po.name p_name,
       to_char(p_timestamp, 'DD-MON-YYYY HH24:MI:SS') "P_Timestamp",
       to_char(po.stime, 'DD-MON-YYYY HH24:MI:SS') "STIME",
       decode(sign(po.stime - p_timestamp), 0, 'SAME', '*DIFFER*') X
  from sys.obj$ do, sys.dependency$ d, sys.obj$ po
 where P_OBJ# = po.obj#(+)
   and D_OBJ# = do.obj#
   and do.status = 1 /*dependent is valid*/
   and po.status = 1 /*parent is valid*/
   and po.stime != p_timestamp /*parent timestamp not match*/
 order by 2, 1;

The support document says that you need to drop/create symbolic links, but the objects we had were procedures.
The solution we found to fix the dependency tree was simple: add a new line inside the procedure, recompile it, and then re-run the dependency query to check that the mismatched rows are gone.
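
A minimal sketch of that fix, assuming a hypothetical procedure MY_SCHEMA.MY_PROC: edit its source so it really changes (for example add a comment line), re-create it, and re-run the dependency query above to confirm the *DIFFER* rows have disappeared.

SQL> create or replace procedure my_schema.my_proc as
  2  begin
  3    -- added line: forces a real source change and a fresh compilation
  4    null;  -- the original procedure body goes here
  5  end;
  6  /

Procedure created.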

I can say it is one of the worst things you can face on a production system :)

Restore Corrupt Datafile using Another Standby

If one of the datafiles on a Data Guard standby is corrupt, you can restore it using another standby. I will explain how to do the restore operation with RMAN on Oracle 11.2.

ON THE WORKING DATAGUARD

Back up the datafile as an image copy.

RMAN> BACKUP AS COPY DATAFILE 178 FORMAT "+data/test2/datafile/test_datafile178";

Uncatalog the image backup on the working dataguard.

RMAN> LIST COPY OF DATAFILE 178;
List of Datafile Copies
=======================
Key File S Completion Time Ckp SCN Ckp Time
------- ---- - --------------- ---------- ---------------
2208 178 A 01-JUL-14 125562119884 01-JUL-14
Name: +DATA/test2/datafile/test_datafile178
Tag: TAG20140112T080845

RMAN> CHANGE DATAFILECOPY '+data/test2/datafile/test_datafile178' UNCATALOG;
uncataloged datafile copy
datafile copy file name=+DATA/test2/datafile/test_datafile178 RECID=2208 STAMP=852710960
Uncataloged 1 objects

ON THE CORRUPT DATAGUARD

Copy the datafile image backup to the remote host. If both standbys are on the same host, you do not need to copy anything; assume that is the case for now. Catalog the datafile image copy.

RMAN> CATALOG DATAFILECOPY "+data/test2/datafile/test_datafile178";
using target database control file instead of recovery catalog
cataloged datafile copy
datafile copy file name=+DATA/test2/datafile/test_datafile178 RECID=2730 STAMP=852711106

Switch the datafile to the copy.

RMAN> SWITCH DATAFILE 178 TO COPY ;
using target database control file instead of recovery catalog
datafile 178 switched to datafile copy "+data/test2/datafile/test_datafile178"

Delete the corrupt datafile.
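
A minimal sketch of that cleanup, assuming the old corrupt file lives in ASM (the file name below is a placeholder; after the switch the standby control file no longer references it). Once it is removed, restart managed recovery on this standby so the switched datafile is rolled forward.

[oracle@host1 ~]$ asmcmd
ASMCMD> rm +data/test2/datafile/<old_corrupt_datafile_name>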

Active Dataguard in Action for Block Corruptions

Active Dataguard enables a standby database to be open read only while redo apply runs at the same time. This way you have a near real-time (depending on the gap with your primary) read-only copy of your primary system that still has all the functionality of a normal standby. One of the cool features of Active Dataguard is Automatic Block Repair. In this blog I will demonstrate this feature.

A data block can be lost during the write phase, zeroed out, or otherwise corrupted. The first time that block is read, an error about the corruption is returned to the end user. If Active Dataguard is enabled, the corrupt block on the primary is instead restored from one of the active standbys and no error is returned to the users; everything happens in the background, transparently to the users.

I assume the following points:
* the block corruption has NOT been propagated to the standby (a quick check is shown below)
* you can access all the redo logs and archive logs necessary for recovering that block (not many archive logs are required if your lag is small)
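
A quick way to check the first assumption on the standby: no rows in v$database_block_corruption means RMAN has not recorded any corrupt blocks there (run an RMAN VALIDATE first if you want the view to be fully up to date).

SQL> select file#, block#, blocks, corruption_type
  2    from v$database_block_corruption;

no rows selected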

Be careful: you need an extra licence to enable Active Dataguard.
http://docs.oracle.com/cd/E11882_01/license.112/e47877/options.htm#DBLIC141

Start up the dataguard as an active dataguard.

SQL> startup mount ;     
Database mounted.

SQL> alter database open read only ;
Database altered.

SQL> select open_mode from v$database ;

OPEN_MODE
--------------------
READ ONLY

SQL> alter database recover managed standby database using current logfile disconnect ;
Database altered.

SQL> select open_mode from v$database ;

OPEN_MODE
--------------------
READ ONLY WITH APPLY

Set up the Test Case on Primary Database

SQL> conn mehmeteser
Enter password: 
Connected.
SQL> create table junk_tbl tablespace JUNK_TBS as select * from dba_objects ;
Table created.

SQL> select *
  2    from (select distinct dbms_rowid.rowid_block_number(rowid) block_number
  3            from junk_tbl)
  4   where rownum < 2;

BLOCK_NUMBER
------------
        1427

SQL> alter system flush buffer_cache;
System altered.

Now the fun part: let’s zero out block 1427 of the test table.

[oracle@host1 ~]$ dd of=/u01/datafile/VTYTEST_junk_tbs01  bs=8192 seek=1427 conv=notrunc count=1 if=/dev/zero
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 3.6328e-05 seconds, 226 MB/s

Let’s see what happens when we try to query the table:

WITH ACTIVE STANDBY

The block will be restored and recovered transparently in the background.

SQL> create table junk_tbl_bck tablespace junk_tbs as select * from junk_tbl;
Table created.

alert log :

Tue Jun 10 16:17:50 2014
Hex dump of (file 5, block 1427) in trace file /u01/app/oracle/diag/rdbms/vtytest/VTYTEST/trace/VTYTEST_ora_115127.trc
Corrupt block relative dba: 0x01400593 (file 5, block 1427)
Completely zero block found during multiblock buffer read
Reading datafile '/u01/datafile/VTYTEST_junk_tbs01' for corruption at rdba: 0x01400593 (file 5, block 1427)
Reread (file 5, block 1427) found same corrupt data (no logical check)
Automatic block media recovery requested for (file# 5, block# 1427)
Tue Jun 10 16:21:27 2014
Thread 1 advanced to log sequence 5211 (LGWR switch)
  Current log# 3 seq# 5211 mem# 0: +DATA/vtytest/onlinelog/group_3.299.839760905
  Current log# 3 seq# 5211 mem# 1: +DATA/vtytest/onlinelog/group_3.297.839760909
Tue Jun 10 16:21:27 2014
Archived Log entry 10399 added for thread 1 sequence 5210 ID 0x8cf991b8 dest 1:
Tue Jun 10 16:21:28 2014
Automatic block media recovery successful for (file# 5, block# 1427)
Tue Jun 10 16:21:28 2014

WITHOUT ACTIVE STANDBY

A manual restore is required; RMAN block recovery can be used (a sketch is shown after the alert log below).

SQL> create table junk_tbl_bck tablespace junk_tbs as select * from junk_tbl;
create table junk_tbl_bck tablespace junk_tbs as select * from junk_tbl
                                                               *
ERROR at line 1:
ORA-01578: ORACLE data block corrupted (file # 5, block # 1427)
ORA-01110: data file 5: '/u01/datafile/VTYTEST_junk_tbs01'

alert log :

Tue Jun 10 16:29:16 2014
Hex dump of (file 5, block 1427) in trace file /u01/app/oracle/diag/rdbms/vtytest/VTYTEST/trace/VTYTEST_ora_115127.trc
Corrupt block relative dba: 0x01400593 (file 5, block 1427)
Completely zero block found during multiblock buffer read
Reading datafile '/u01/datafile/VTYTEST_junk_tbs01' for corruption at rdba: 0x01400593 (file 5, block 1427)
Reread (file 5, block 1427) found same corrupt data (no logical check)
Errors in file /u01/app/oracle/diag/rdbms/vtytest/VTYTEST/trace/VTYTEST_ora_115127.trc  (incident=59009):
ORA-01578: ORACLE data block corrupted (file # 5, block # 1427)
ORA-01110: data file 5: '/u01/datafile/VTYTEST_junk_tbs01'
Incident details in: /u01/app/oracle/diag/rdbms/vtytest/VTYTEST/incident/incdir_59009/VTYTEST_ora_115127_i59009.trc
Tue Jun 10 16:29:16 2014
Corrupt Block Found
         TSN = 6, TSNAME = JUNK_TBS
         RFN = 5, BLK = 1427, RDBA = 20972947
         OBJN = 24975, OBJD = 24975, OBJECT = JUNK_TBL, SUBOBJECT = 
         SEGMENT OWNER = MEHMETESER, SEGMENT TYPE = Table Segment
Tue Jun 10 16:29:20 2014
Sweep [inc][59009]: completed
Errors in file /u01/app/oracle/diag/rdbms/vtytest/VTYTEST/incident/incdir_59009/VTYTEST_m000_130104_i59009_a.trc:
ORA-19583: conversation terminated due to error
ORA-01122: database file 5 failed verification check
ORA-01110: data file 5: '/u01/datafile/VTYTEST_junk_tbs01'
ORA-01200: actual file size of 131072 is smaller than correct size of 131072 blocks
Tue Jun 10 16:29:20 2014
Dumping diagnostic data in directory=[cdmp_20140610162920], requested by (instance=1, osid=115127), summary=[incident=59009].
Tue Jun 10 16:29:43 2014
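
For reference, a minimal sketch of the manual fix in this case: RMAN block media recovery on the primary, using the file and block numbers from the error above (it needs a usable backup, flashback logs, or a standby to source the good block from):

RMAN> RECOVER DATAFILE 5 BLOCK 1427;

or, to repair every block currently listed in v$database_block_corruption:

RMAN> RECOVER CORRUPTION LIST;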

Tips on Linux Kernel Upgrade

A kernel upgrade can be applied node by node on Exadata, so there is no service interruption. Kernel upgrades are needed when you want new functionality or when you hit bugs in the current kernel version. I had to upgrade the kernel of a box and it was a good experience. The following procedure is based on a kernel upgrade on Oracle Linux 5.8 with the Unbreakable Enterprise Kernel [2.6.32] on an Exadata compute node.

PRE-UPGRADE

==> If you have EM12c, the targets on the host will be unavailable during the upgrade. Put them into a blackout so that no false alarms are generated for them.

==> Run the upgrade steps in an X session such as VNC. This prevents problems caused by SSH client disconnections.

==> Disable all NFS mounts on the system; check /etc/rc.local and /etc/fstab.
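
A minimal sketch of that step (the mount points and file entries are site specific):

[root@host1 ~]# umount -a -t nfs
[root@host1 ~]# grep -i nfs /etc/fstab /etc/rc.local    # comment out whatever is reported here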

==> Check whether any ASM operations are running on the system and wait for them to finish. Make sure no rebalance job is running on the ASM side; check v$asm_operation.
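
For example, on the ASM instance (no rows returned means no rebalance or other long-running ASM operation is in progress):

SQL> select group_number, operation, state, power, est_minutes
  2    from v$asm_operation;

no rows selected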

==> Back up the GRUB startup file /boot/grub/grub.conf; you might need it for a rollback.
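
For example (the backup file name is arbitrary):

[root@host1 ~]# cp -p /boot/grub/grub.conf /boot/grub/grub.conf.pre_kernel_upgrade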

==> Shut down the CRS and disable CRS autostart. Also shut down any databases or listeners that are not registered with the CRS.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl disable crs
CRS-4621: Oracle High Availability Services autostart is disabled.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl stop crs -f

==> Make sure crs is not running
[root@host1 ~]# ps -ef | grep d.bin
root 66664 60395 0 09:55 pts/1 00:00:00 grep d.bin

==> Reboot the system and make sure it is able to restart before any kernel changes :)

==> Check the ILOM problem page and make sure there are no open problems on the server. If there are any, such as memory problems, fix them first.

==> Record the current kernel
[root@host1 ~]# uname -r
2.6.32-400.11.1.el5uek

==> Check the server version and make sure the next kernel is designed for the server.
[root@host1 ~]# dmidecode -s system-product-name
SUN FIRE X4170 M3

==> Make sure enough space is available
[root@host1 ~]# df -h

==> Shut down any databases or listeners that have not been registered with the CRS, and check the CRS one last time.
[root@host1 ~]# ps -ef | grep d.bin
root 66664 60395 0 09:55 pts/1 00:00:00 grep d.bin

UPGRADE
==> Upgrade the kernel

[root@host1 ~]# rpm -ivh kernel-uek-firmware-2.6.32-400.34.1.el5uek.noarch.rpm kernel-uek-2.6.32-400.34.1.el5uek.x86_64.rpm ofa-2.6.32-400.34.1.el5uek-1.5.1-4.0.58.1.x86_64.rpm
Preparing… ########################################### [100%]
1:kernel-uek-firmware ########################################### [ 33%]
2:kernel-uek ########################################### [ 67%]
3:ofa-2.6.32-400.34.1.el5########################################### [100%]

==> Reboot the system
[root@host1 ~]# reboot

POST-UPGRADE
==> Check the ILOM for any errors. Check /var/log/messages for any errors.

==> Check the new kernel version
[root@host1 ~]# uname -r
2.6.32-400.34.1.el5uek

==> Start the crs and enable crs auto start
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl enable crs
CRS-4622: Oracle High Availability Services autostart is enabled.
[root@host1 ~]# /u01/app/11.2.0.3/grid/bin/crsctl start crs

==> Check if crs is starting

[root@host1 ~]# ps -ef | grep d.bin
root 11852 1 4 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ohasd.bin reboot
oracle 12013 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/oraagent.bin
oracle 12025 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/mdnsd.bin
oracle 12109 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gpnpd.bin
root 12119 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/orarootagent.bin
oracle 12122 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/gipcd.bin
root 12137 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/osysmond.bin
root 12150 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdmonitor
root 12167 1 0 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/cssdagent
oracle 12169 1 1 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/diskmon.bin -d -f
oracle 12187 1 2 10:22 ? 00:00:00 /u01/app/11.2.0.3/grid/bin/ocssd.bin
root 12389 10620 0 10:23 pts/0 00:00:00 grep d.bin
[root@host1 ~]#

==> Re-enable the NFS mounts on the system and mount them.

==> In EM12c, end the blackout for the targets.

Now you can move on to the next server in the cluster.

Kdump – Kernel Crashdump

Many people agree that the Linux kernel is pretty stable; it doesn't crash very often. But there are times you hit a bug and it suddenly crashes. In that case kdump is useful for investigating the reason.
Kdump can simply be defined as the Linux kernel crash dump mechanism. It is used to gather diagnostic information during a Linux system crash: if the system crashes, a dump file (a memory image, vmcore) is generated.
This core file can be pretty huge depending on your memory usage. To investigate the root cause you need to generate a stack trace from this file. For more information refer to “How to get the stack trace of kdump in Exadata (Doc ID 1363245.1)”.

Example :

crash /u01/crashfiles/127.0.0.1-2012-05-12-02:31:16/crashcore  /usr/lib/debug/lib/modules/"kernel version"/vmlinux 

After a system crash, it may take a long time for the Linux machine to start when the kdump service is active. You can TEMPORARILY disable the service; it is NOT advisable to disable it permanently, since you may lose valuable diagnostic information.

Disable/Enable on Start

[root@host1 ~]# dcli -g dbs_group  -l root "chkconfig kdump off  "
[root@host1 ~]# dcli -g dbs_group  -l root "chkconfig --list kdump "
host1: kdump              0:off   1:off   2:off   3:off   4:off   5:off   6:off
host2: kdump              0:off   1:off   2:off   3:off   4:off   5:off   6:off

[root@host1 ~]# dcli -g dbs_group  -l root "chkconfig kdump on   "
[root@host1 ~]# dcli -g dbs_group  -l root "chkconfig --list kdump "
host1: kdump              0:off   1:off   2:on    3:on    4:on    5:on    6:off
host2: kdump              0:off   1:off   2:on    3:on    4:on    5:on    6:off

Stop/Start Kdump Service

[root@host1 ~]#  dcli -g dbs_group  -l root " service kdump stop  "
host1: Stopping kdump:[  OK  ]
host2: Stopping kdump:[  OK  ]
[root@host1 ~]#  dcli -g dbs_group  -l root " service kdump status  "
host1: Kdump is not operational
host2: Kdump is not operational

[root@host1 ~]#  dcli -g dbs_group  -l root " service kdump start   "
host1: Starting kdump:[  OK  ]
host2: Starting kdump:[  OK  ]
[root@host1 ~]#  dcli -g dbs_group  -l root " service kdump status  "
host1: Kdump is operational
host2: Kdump is operational

for more info => http://docs.oracle.com/cd/E37670_01/E37355/html/ol_kdump_diag.html

Files Used :

/boot/grub/grub.conf
The crashkernel option appended to the kernel line here specifies the amount of memory reserved for the crash kernel and an optional offset.
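
A hypothetical example of such a kernel line (the kernel version, reservation size and offset are placeholders; the values on your system may differ):

kernel /vmlinuz-2.6.32-400.34.1.el5uek ro root=<root device> crashkernel=128M@48M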

/etc/kdump.conf
Sets the location where the dump file can be written, the filtering level for the makedumpfile command, and the default behavior to take if the dump fails.
See the comments in the file for information about the supported parameters.
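
A short, hypothetical example of the kind of entries it contains:

# where the vmcore file is written
path /var/crash
# compress and filter the dump (dump level 31 excludes pages not needed for kernel debugging)
core_collector makedumpfile -c --message-level 1 -d 31
# what to do if saving the dump fails
default reboot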

When was the last time you changed a flashdisk or harddisk on Exadata?

Why is this question important for a DBA? The results can be pretty surprising when you run the query below and see the insert date of each harddisk and flashdisk. Put another way, that information shows you when a faulty harddisk or flashdisk was last replaced on the Exadata. If you have deleted the alert history on that cell, this information can still give you a quick idea. It also tells you roughly when the system was first installed :) assuming most of the disks have not been replaced.

[root@testdbadm01 ~]#  dcli -g cellnodes.txt  -l root "cellcli -e list physicaldisk  attributes name,diskType,physicalInsertTime " | sort -k 4
testdb01-priv: 20:0             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:10            HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:11            HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:1             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:2             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:3             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:4             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:5             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:7             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:8             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: 20:9             HardDisk        2013-03-26T11:23:53+02:00       
testdb01-priv: FLASH_1_0        FlashDisk       2013-03-26T11:23:53+02:00       
testdb01-priv: FLASH_1_1        FlashDisk       2013-03-26T11:23:53+02:00       
testdb01-priv: FLASH_1_2        FlashDisk       2013-03-26T11:23:53+02:00       
. 
. 
. 
testdb07-priv: FLASH_4_0        FlashDisk       2013-03-26T12:05:59+02:00       
testdb07-priv: FLASH_4_1        FlashDisk       2013-03-26T12:05:59+02:00       
testdb07-priv: FLASH_4_2        FlashDisk       2013-03-26T12:05:59+02:00       
testdb07-priv: FLASH_4_3        FlashDisk       2013-03-26T12:05:59+02:00       
testdb07-priv: FLASH_5_0        FlashDisk       2013-09-20T11:10:31+03:00       
testdb07-priv: FLASH_5_1        FlashDisk       2013-09-20T11:10:31+03:00       
testdb07-priv: FLASH_5_2        FlashDisk       2013-09-20T11:10:31+03:00       
testdb07-priv: FLASH_5_3        FlashDisk       2013-09-20T11:10:31+03:00       
testdb06-priv: FLASH_5_0        FlashDisk       2013-10-10T16:30:15+03:00       
testdb06-priv: FLASH_5_1        FlashDisk       2013-10-10T16:30:15+03:00       
testdb06-priv: FLASH_5_2        FlashDisk       2013-10-10T16:30:15+03:00       
testdb06-priv: FLASH_5_3        FlashDisk       2013-10-10T16:30:15+03:00       
testdb01-priv: 20:6             HardDisk        2013-11-18T09:45:08+02:00       
testdb06-priv: FLASH_1_0        FlashDisk       2013-11-22T16:13:33+02:00       
testdb06-priv: FLASH_1_1        FlashDisk       2013-11-22T16:13:33+02:00       
testdb06-priv: FLASH_1_2        FlashDisk       2013-11-22T16:13:33+02:00       
testdb06-priv: FLASH_1_3        FlashDisk       2013-11-22T16:13:33+02:00       
