Oracle MVA

Tales from a Jack of all trades

Archive for the ‘CRS’ Category

When ACFS crashes…

Today three out of five nodes of a cluster crashed while a load test was running on two of its nodes. The cluster is an ACFS cluster with OSB and SOA running on top of it. It uses the ACFS disk for logging and configuration; all binaries are on local disk. The GI version is 11.2.0.1, running on 64-bit OEL 5.5.

This blogpost is mostly a note to myself, but it might help some other people as well.

While looking through the CRS logfiles for the cause of the node failures I found this error:


view /u01/app/grid/log/some_server/agent/crsd/oraagent_oracle/oraagent_oracle.l01


2010-10-12 10:21:05.060: [ora.DGGRID.dg][1536899392] [check] InstConnection::connectInt (2) Exception OCIException
2010-10-12 10:21:05.060: [ora.DGGRID.dg][1536899392] [check] Exception type=2 string=ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 491521
Additional information: 8
Process ID: 0
Session ID: 0 Serial number: 0
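These agent logs get long, so a quick tally of the ORA- codes can show which error dominates. Below is a minimal sketch; the function name ora_error_summary is made up for illustration, and it only assumes that the errors appear in the log as ORA-nnnnn, as in the excerpt above.

```shell
# Hypothetical helper: read a CRS agent log on stdin and print each
# ORA- error code followed by the number of times it occurs.
ora_error_summary() {
    grep -o 'ORA-[0-9][0-9]*' | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (path taken from the post, adjust to your own server name):
#   ora_error_summary < /u01/app/grid/log/some_server/agent/crsd/oraagent_oracle/oraagent_oracle.l01
```

The output is sorted by error code, one code per line.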

The ASM instance had 1 GB set for both memory_target and memory_max_target. So somehow ACFS uses more memory under heavy load. I am not aware of any formulas or best practices for calculating memory_target for an ASM instance that is just running ACFS. The 1 GB was a guesstimate based on 11.1 knowledge. If anyone has some pointers for me regarding memory settings for ASM with just ACFS, please comment on this blogpost.

Some more checking, this time on the Linux side (OEL 5), turned up more:


dmesg


[Oracle ACFS] FSCK-NEEDED set for volume /dev/asm/v_disk-170 . Internal ACFS Location: 916 .
[Oracle ACFS] A problem has been detected with
[Oracle ACFS] the file system metadata in /dev/asm/v_disk-170 .
[Oracle ACFS] Normal operation can continue, but it is advisable
[Oracle ACFS] to run fsck on the file system as soon as it is
[Oracle ACFS] feasible to do so.  See the Storage Admin
[Oracle ACFS] Guide for more information about FSCK-NEEDED.
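With more than one ACFS volume it helps to pull just the flagged device names out of the kernel log. The sketch below assumes the message format shown above; the function name acfs_fsck_needed is my own invention, not an Oracle tool.

```shell
# Hypothetical helper: read kernel log text on stdin and print every
# distinct volume that ACFS has marked FSCK-NEEDED.
acfs_fsck_needed() {
    sed -n 's/.*\[Oracle ACFS\] FSCK-NEEDED set for volume \([^ ]*\).*/\1/p' | sort -u
}

# Usage:
#   dmesg | acfs_fsck_needed
```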

Now this seems like trouble, so I stopped all nodes of the cluster (*AIKS*) and started an fsck. This ran for ages, doing nothing but this:


lseek(4, 9781714944, SEEK_SET) = 9781714944
read(4, "\202\1\6P\17dG\26o\315\324_\3262\363 \tG\2"..., 4096) = 4096
lseek(4, 9781706752, SEEK_SET) = 9781706752
read(4, "\202\1\6P\17dG\26o\315\324(\6\363\23\tG\2"..., 4096) = 4096
lseek(4, 9781649408, SEEK_SET) = 9781649408
read(4, "\202\1\6P\17dG\26o\315\324\262\226\352\20 \10G\2"..., 4096) = 4096
lseek(4, 9781739520, SEEK_SET) = 9781739520
read(4, "\202\1\5P\17dG\26o\315\324\257\27=\233\200\tG\2"..., 4096) = 4096
lseek(4, 2315993088, SEEK_SET) = 2315993088
read(4, "\202\1\5P\17dG\26o\315\324\340O\10\310@\v\212"..., 4096) = 4096

Now I’m no C programmer, nor a filesystem specialist, so I don’t exactly know what’s going on (yet). After 4 hours I decided that waiting any longer was futile; it’s just a freaking 10 GB disk!
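One thing the trace does show is where on the volume fsck is reading. Each read is 4096 bytes, so dividing an lseek offset by 4096 gives a block number, and dividing by 1048576 gives the position in MiB. A small sketch of that arithmetic (offset_info is just an illustrative name):

```shell
# Convert a byte offset from an lseek() line into a 4096-byte block
# number and a position in whole MiB.
offset_info() {
    offset=$1
    echo "block=$(( offset / 4096 )) mib=$(( offset / 1048576 ))"
}

offset_info 9781714944   # → block=2388114 mib=9328
offset_info 2315993088   # → block=565428 mib=2208
```

So most of the reads shown sit a little over 9 GiB into the volume, with one jump back to roughly 2.2 GiB. That says where it was reading, not why it took so long.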

I started fsck again, only this time with some extra parameters:


$ fsck -a -v -y -t acfs /dev/asm/v_disk-170


OfsCheckOnDiskGBM entered
fsck.acfs: OfsReadMeta at offset: 67112960 (0x4001000)    size: 327680 (0x50000)
OfsCheckFileEntry entered for:
ACFS Internal File: [ACFS Snap Map]
fenum: 19 (0x13)   disk offset: 79360 (0x13600)


fsck.acfs: OfsReadMeta at offset: 79360 (0x13600)    size: 512 (0x200)
OfsCheckFileExtents entered for:
ACFS Internal File: [ACFS Snap Map]
fenum: 19 (0x13)   disk offset: 79360 (0x13600)


fsck.acfs: OfsReadMeta at offset: 67440640 (0x4051000)    size: 512 (0x200)


Checking if any files are orphaned...


Phase 1 Orphan check...


fsck.acfs: OfsReadMeta at offset: 81920 (0x14000)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82432 (0x14200)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82944 (0x14400)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83456 (0x14600)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83968 (0x14800)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84480 (0x14a00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84992 (0x14c00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 85504 (0x14e00)    size: 512 (0x200)


Phase 2 Orphan check...


fsck.acfs: OfsReadMeta at offset: 81920 (0x14000)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82432 (0x14200)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82944 (0x14400)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83456 (0x14600)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83968 (0x14800)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84480 (0x14a00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84992 (0x14c00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 85504 (0x14e00)    size: 512 (0x200)


0 orphans found


fsck.acfs: fsck.acfs: Checker completed with the following results:
File System Errors:   2
Fixed:            2
Not Fixed:        0

This caused fsck to finish in a couple of minutes, after which I could mount the ACFS disk on the cluster again.
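Before remounting it is worth confirming that nothing was left unfixed. Here is a minimal sketch that checks the summary block of a saved fsck log; it assumes the "Not Fixed:" line looks like the one above, and both fsck_all_fixed and fsck.log are hypothetical names.

```shell
# Hypothetical helper: read fsck.acfs output on stdin and succeed only
# if a "Not Fixed:" line is present and reports 0.
fsck_all_fixed() {
    awk -F: '
        /Not Fixed/ { gsub(/[ \t]/, "", $2); found = 1; bad = ($2 != "0") }
        END { if (found && !bad) exit 0; exit 1 }
    '
}

# Usage, with the fsck output saved to a file first:
#   fsck_all_fixed < fsck.log && echo "nothing left unfixed"
```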

Written by Jacco H. Landlust

October 12, 2010 at 4:49 pm

Small things you notice

In the past you always had to remember to start your database services after you (re)started a RAC database. To automate this I used to create a script in $CRS_HOME/racg/usrco.
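For reference, such a callout could look roughly like the sketch below. This is not my original script. It assumes the clusterware calls every executable in usrco with the event type as the first argument followed by key=value pairs (for example database=... and status=up); the handle_event name and the overridable SRVCTL variable are made up for illustration.

```shell
#!/bin/sh
# Sketch of a callout for $CRS_HOME/racg/usrco.
# Assumed argument shape (not verified against every version):
#   INSTANCE VERSION=1.0 database=ORCL instance=ORCL1 host=node1 status=up ...
# SRVCTL can be overridden, e.g. SRVCTL=echo for a dry run.
SRVCTL=${SRVCTL:-srvctl}

handle_event() {
    event_type=$1
    shift
    database=
    status=
    # Pick the database name and status out of the key=value arguments.
    for pair in "$@"; do
        case $pair in
            database=*) database=${pair#database=} ;;
            status=*)   status=${pair#status=} ;;
        esac
    done
    # Only react when an instance has come up.
    if [ "$event_type" = "INSTANCE" ] && [ "$status" = "up" ] && [ -n "$database" ]; then
        "$SRVCTL" start service -d "$database"
    fi
}

if [ $# -gt 0 ]; then
    handle_event "$@"
fi
```

Running it by hand with SRVCTL=echo prints the srvctl arguments instead of executing them, which is a cheap way to check the parsing.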

Just today I noticed that Oracle starts your services automatically when you have GI version 11.2.0.2.0. Can’t say for sure this is also the case on 11.2.0.1.0.

Yet another custom script can be trashed. I’m starting to love the defaults more and more every day 🙂

Written by Jacco H. Landlust

September 15, 2010 at 1:26 pm

Posted in CRS, Dataguard

11gR2 grid install, just some random things I noticed

Obviously I am intending to spend my weekend on 11gR2; luckily my girlfriend is at some golf tournament 😉 Anyway, here’s a preliminary post with some of the things I noticed while installing 11gR2.

Read the rest of this entry »

Written by Jacco H. Landlust

September 5, 2009 at 2:13 pm

Posted in CRS

Creating a failover disk using ascrs

I certainly hope that installing CRS is no longer magic for most DBAs. If it is, please refer to Tim Hall’s website and follow the guide. Certain things in the CRS installation are less well documented though:

  1. X forwarding has been a nuisance for a lot of DBAs. Refer to this post for some more information about X forwarding.
  2. I notice that most guides on VMware ask you to reboot after adding disks. Refer to this post to see how to scan your bus without rebooting.
  3. Shared disks and VMware Workstation are a pain in the behind. Obviously someone else felt the same pain too.

I installed CRS on my laptop running VMware Workstation. The machines are called wls1 and wls2 (guess what will be on them when I’m done 😉 ). After installing CRS, I installed ascrs, which is delivered on the Companion CD of Oracle Fusion Middleware 11g. It installs by just unzipping the ascrs.zip file in your CRS tree. Next, simply call the configure script in the $CRS_HOME/ascrs/bin directory. If you want to use ascrs on all nodes of the cluster, you need to unzip the file on all nodes.
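Since the unzip has to be repeated on every node, it is easy to miss one. The sketch below only prints the commands per node so you can review them first; the CRS home /u01/app/crs and the zip location /tmp/ascrs.zip are placeholders I picked for illustration, and print_ascrs_unzip_cmds is a made-up name.

```shell
# Print (do not run) the unzip command for each node.
# Arguments: CRS home, path to ascrs.zip, then the node names.
print_ascrs_unzip_cmds() {
    crs_home=$1
    zip_path=$2
    shift 2
    for node in "$@"; do
        echo "ssh $node 'cd $crs_home && unzip -o $zip_path'"
    done
}

# Placeholder paths; wls1 and wls2 are the node names from this post.
print_ascrs_unzip_cmds /u01/app/crs /tmp/ascrs.zip wls1 wls2
```

The configure step is left out on purpose; run that by hand from $CRS_HOME/ascrs/bin once the files are in place.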

Read the rest of this entry »

Written by Jacco H. Landlust

August 18, 2009 at 3:36 pm