Oracle MVA

Tales from a Jack of all trades

Archive for October 2010

running a WLS domain as a different user

When you’re creating a WebLogic environment you have to think about separation of privileges. Technical administrators need different privileges than functional administrators, and developers need yet another set. All this requires some additional configuration while creating a WebLogic domain. Installing WLS and creating a domain is a pretty straightforward process that is widely documented by now. Separating privileges is not.

For this document, the following is assumed:
– Oracle Fusion Middleware (i.e. WebLogic) has been installed with user oracle
– User oracle has primary group dba and UMASK 022 (exactly as in the Oracle documentation)
– The middleware home (MW_HOME) is /u01/app/oracle/middleware
– All lines starting with “#” are actions performed by root
– All lines starting with a username followed by a “$” are actions performed by the user preceding the “$”

Before a domain is created, first set up a new group and a user that will be the “owner” of the domain. In this example the group is called domgrp1 and the user is called domusr1:

# groupadd domgrp1
# useradd -g domgrp1 -G dba -s /bin/bash -m -c "Domain Owner for dom1" domusr1

As you can see, the user has a secondary group: dba. Membership of this group is needed for read access to the MW_HOME. In addition to read access, certain directories and files have to be made group-writable:

oracle$ chmod g+w ${MW_HOME}/logs
oracle$ touch ${MW_HOME}/domain-registry.xml
oracle$ chmod g+w ${MW_HOME}/domain-registry.xml
oracle$ touch ${MW_HOME}/wlserver_10.3/common/nodemanager/nodemanager.domains
oracle$ chmod g+w ${MW_HOME}/wlserver_10.3/common/nodemanager/nodemanager.domains
oracle$ chmod g+w ${MW_HOME}/wlserver_10.3/server/lib
oracle$ chmod g+w ${MW_HOME}/wlserver_10.3/server/lib/*.jks
oracle$ chmod g+w ${MW_HOME}/oracle_common/sysman
oracle$ find ${MW_HOME}/oracle_common/modules -type d -exec chmod g+rx {} \;
oracle$ find ${MW_HOME}/oracle_common/modules -type f -exec chmod g+r {} \;
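A forgotten chmod is easy to miss and usually only shows up when config.sh fails. Below is a minimal sketch that lists every path from the commands above that still lacks the group write bit. The function name missing_group_write is hypothetical. MW_HOME falls back to the value assumed at the top of this post.

```shell
#!/usr/bin/env bash
# Sketch: report paths that are still not group-writable after the chmods.
# MW_HOME defaults to the value assumed in this post (an assumption).
MW_HOME="${MW_HOME:-/u01/app/oracle/middleware}"

# missing_group_write PATH... : print every PATH without the g+w bit
# (a path that does not exist is reported as well)
missing_group_write() {
    local p
    for p in "$@"; do
        # -perm -020 matches only when the group write bit is set
        if [ -z "$(find "$p" -maxdepth 0 -perm -020 2>/dev/null)" ]; then
            echo "$p"
        fi
    done
}

# Any output means a chmod (or touch) from the list above was missed
missing_group_write \
    "$MW_HOME/logs" \
    "$MW_HOME/domain-registry.xml" \
    "$MW_HOME/wlserver_10.3/common/nodemanager/nodemanager.domains" \
    "$MW_HOME/wlserver_10.3/server/lib" \
    "$MW_HOME"/wlserver_10.3/server/lib/*.jks \
    "$MW_HOME/oracle_common/sysman"
```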

Now everything is set up and a domain can be created as the domain user:

domusr1$ ${MW_HOME}/oracle_common/common/bin/config.sh

I prefer to locate the domains and applications outside of $MW_HOME, e.g. in /u01/app/user_projects. After the domain has been created, start the domain as the domain owner and you are all set. Users who need access to the domain must have domgrp1 as a secondary group.
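Root grants the secondary group with `usermod -a -G domgrp1 <user>`. The `-a` flag appends the group instead of replacing the existing secondary groups. The user has to log in again before the new group takes effect in their sessions. Below is a minimal sketch to verify the membership afterwards. The function name has_group and the user jdoe are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether USER is a member of GROUP (primary or secondary).
# Granting the membership is a root action, e.g. for a hypothetical user:
#   # usermod -a -G domgrp1 jdoe

# has_group USER GROUP : exit status 0 if USER is in GROUP, 1 otherwise
has_group() {
    local user="$1" group="$2" g
    # id -nG lists all group names of the user, separated by spaces
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Usage:
# has_group jdoe domgrp1 && echo "jdoe can access the domain"
```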

If you forgot to set the umask before running config.sh, or if you cannot run with umask 022 for some reason, you have to set the privileges under /u01/app/user_projects manually. I trust you are able to set privileges on files and directories properly. Just to give a hint:

domusr1$ chmod g+rx /u01/app/user_projects/domains/${domain}
domusr1$ chmod g+rx /u01/app/user_projects/domains/${domain}/servers
domusr1$ chmod g+rx /u01/app/user_projects/domains/${domain}/servers/${servername}
domusr1$ chmod g+rx /u01/app/user_projects/domains/${domain}/servers/${servername}/logs

And set the setgid bit on the log directory, so that new log files inherit the directory's group:

domusr1$ chmod g+s /u01/app/user_projects/domains/${domain}/servers/${servername}/logs
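Running these commands by hand gets tedious when a domain has several managed servers. Below is a minimal sketch that applies the same hints to every server directory of a domain. It assumes the servers/<name>/logs layout used in the commands above. The function name and the domain name dom1 are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: open a domain for its group, for all servers at once.
# Assumption: each server keeps its logs in servers/<name>/logs.

# open_domain_for_group DOMAIN_HOME
open_domain_for_group() {
    local domain_home="$1" server_dir
    chmod g+rx "$domain_home" "$domain_home/servers" || return 1
    for server_dir in "$domain_home"/servers/*; do
        [ -d "$server_dir" ] || continue   # skip if the glob matched nothing
        chmod g+rx "$server_dir"
        if [ -d "$server_dir/logs" ]; then
            # g+rx lets the group read the logs; g+s (setgid) makes new
            # files created in the directory inherit the directory's group
            chmod g+rxs "$server_dir/logs"
        fi
    done
}

# Usage, as the domain owner:
# open_domain_for_group /u01/app/user_projects/domains/dom1
```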

The next phase is to create users in the WLS console with the corresponding privileges (or, even better, to put these users in an LDAP directory).

Hope this helps…

Written by Jacco H. Landlust

October 20, 2010 at 2:59 pm

Posted in Weblogic

when ACFS crashes…

Today three out of five nodes of a cluster crashed while a load test was running on two of its nodes. The cluster is an ACFS cluster with OSB and SOA production environments on top of it. It uses the ACFS disk for logging and configuration; all binaries are on local disk. The Grid Infrastructure (GI) version is 11.2.0.1, running on 64-bit OEL 5.5.

This blog post is mostly a note to myself, but the content might help some other people as well.

While searching the CRS log files for the cause of the node failures, I found this error:


view /u01/app/grid/log/some_server/agent/crsd/oraagent_oracle/oraagent_oracle.l01


2010-10-12 10:21:05.060: [ora.DGGRID.dg][1536899392] [check] InstConnection::connectInt (2) Exception OCIException
2010-10-12 10:21:05.060: [ora.DGGRID.dg][1536899392] [check] Exception type=2 string=ORA-01034: ORACLE not available
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 1
Additional information: 491521
Additional information: 8
Process ID: 0
Session ID: 0 Serial number: 0
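When the agent logs are large, it helps to see at a glance which ORA- errors they contain and how often. Below is a minimal sketch using grep, sort and uniq. The function name ora_errors is hypothetical. The log path in the usage line is the one from this post, so adjust the node name for your own cluster.

```shell
#!/usr/bin/env bash
# Sketch: count the distinct ORA- error codes in one or more log files
# (reads stdin when no file is given), most frequent first.
ora_errors() {
    # -o prints only the matching part, -h suppresses file name prefixes
    grep -ohE 'ORA-[0-9]{5}' "$@" | sort | uniq -c | sort -rn
}

# Usage:
# ora_errors /u01/app/grid/log/some_server/agent/crsd/oraagent_oracle/oraagent_oracle.l01
```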

The ASM instance had 1 GB set for both memory_target and memory_max_target, so somehow ACFS uses more memory under heavy load. I am not aware of any formulas or best practices for calculating the memory_target of an ASM instance that only serves ACFS; the 1 GB was a guesstimate based on 11.1 knowledge. If anyone has some pointers for me regarding memory settings for ASM with just ACFS, please comment on this blog post.

Further checking, this time of Linux (OEL 5) itself, revealed more:


dmesg


[Oracle ACFS] FSCK-NEEDED set for volume /dev/asm/v_disk-170 . Internal ACFS Location: 916 .
[Oracle ACFS] A problem has been detected with
[Oracle ACFS] the file system metadata in /dev/asm/v_disk-170 .
[Oracle ACFS] Normal operation can continue, but it is advisable
[Oracle ACFS] to run fsck on the file system as soon as it is
[Oracle ACFS] feasible to do so.  See the Storage Admin
[Oracle ACFS] Guide for more information about FSCK-NEEDED.
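On a node with a busy kernel log, the affected volume is easy to overlook. Below is a minimal sketch that pulls the flagged volume names out of kernel messages, e.g. `dmesg | fsck_needed_volumes`. The function name is hypothetical. The matched message text is taken from the dmesg lines above and may differ between ACFS versions.

```shell
#!/usr/bin/env bash
# Sketch: print each ACFS volume flagged FSCK-NEEDED (once) from stdin.
# The pattern assumes messages like the dmesg output shown above.
fsck_needed_volumes() {
    sed -n 's|.*FSCK-NEEDED set for volume \(/dev/asm/[^ ]*\).*|\1|p' | sort -u
}

# Usage:
# dmesg | fsck_needed_volumes
```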

Now this seemed like trouble, so I stopped all nodes of the cluster (*yikes*) and started an fsck. This ran for ages, doing nothing but this:


lseek(4, 9781714944, SEEK_SET) = 9781714944
read(4, "\202\1\6P\17dG\26o\315\324_\3262\363 \tG\2"..., 4096) = 4096
lseek(4, 9781706752, SEEK_SET) = 9781706752
read(4, "\202\1\6P\17dG\26o\315\324(\6\363\23\tG\2"..., 4096) = 4096
lseek(4, 9781649408, SEEK_SET) = 9781649408
read(4, "\202\1\6P\17dG\26o\315\324\262\226\352\20 \10G\2"..., 4096) = 4096
lseek(4, 9781739520, SEEK_SET) = 9781739520
read(4, "\202\1\5P\17dG\26o\315\324\257\27=\233\200\tG\2"..., 4096) = 4096
lseek(4, 2315993088, SEEK_SET) = 2315993088
read(4, "\202\1\5P\17dG\26o\315\324\340O\10\310@\v\212"..., 4096) = 4096

Now, I’m no C programmer, nor a file system specialist, so I don’t know exactly what is going on (yet). After 4 hours I decided that waiting any longer was futile; it’s just a freaking 10 GB disk!

I started fsck again, only this time with some extra parameters:


$ fsck -a -v -y -t acfs /dev/asm/v_disk-170


OfsCheckOnDiskGBM entered
fsck.acfs: OfsReadMeta at offset: 67112960 (0x4001000)    size: 327680 (0x50000)
OfsCheckFileEntry entered for:
ACFS Internal File: [ACFS Snap Map]
fenum: 19 (0x13)   disk offset: 79360 (0x13600)


fsck.acfs: OfsReadMeta at offset: 79360 (0x13600)    size: 512 (0x200)
OfsCheckFileExtents entered for:
ACFS Internal File: [ACFS Snap Map]
fenum: 19 (0x13)   disk offset: 79360 (0x13600)


fsck.acfs: OfsReadMeta at offset: 67440640 (0x4051000)    size: 512 (0x200)


Checking if any files are orphaned...


Phase 1 Orphan check...


fsck.acfs: OfsReadMeta at offset: 81920 (0x14000)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82432 (0x14200)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82944 (0x14400)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83456 (0x14600)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83968 (0x14800)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84480 (0x14a00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84992 (0x14c00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 85504 (0x14e00)    size: 512 (0x200)


Phase 2 Orphan check...


fsck.acfs: OfsReadMeta at offset: 81920 (0x14000)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82432 (0x14200)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 82944 (0x14400)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83456 (0x14600)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 83968 (0x14800)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84480 (0x14a00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 84992 (0x14c00)    size: 512 (0x200)
fsck.acfs: OfsReadMeta at offset: 85504 (0x14e00)    size: 512 (0x200)


0 orphans found


fsck.acfs: fsck.acfs: Checker completed with the following results:
File System Errors:   2
Fixed:            2
Not Fixed:        0

With these parameters fsck finished in a couple of minutes, after which I could mount the ACFS disk on the cluster again.
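Whether remounting is safe comes down to the checker summary at the end of the output. Below is a minimal sketch that reads the "Not Fixed:" counter from saved fsck output. The function name fsck_all_fixed and the log file name are hypothetical. The summary format is assumed to match the output shown above.

```shell
#!/usr/bin/env bash
# Sketch: exit status 0 only if the fsck.acfs summary (read from stdin)
# reports "Not Fixed: 0". No summary found is treated as a failure.
fsck_all_fixed() {
    local not_fixed
    not_fixed="$(sed -n 's/^[[:space:]]*Not Fixed:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | tail -n 1)"
    [ -n "$not_fixed" ] && [ "$not_fixed" -eq 0 ]
}

# Usage (as root, with the same fsck command as above):
# fsck -a -v -y -t acfs /dev/asm/v_disk-170 2>&1 | tee /tmp/fsck_acfs.log
# fsck_all_fixed < /tmp/fsck_acfs.log && echo "safe to mount again"
```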

Written by Jacco H. Landlust

October 12, 2010 at 4:49 pm