Oracle MVA

Tales from a Jack of all trades

Archive for the ‘Exalogic’ Category

On Exalogic, OTD and Multicast

Oracle Traffic Director (OTD) is Oracle’s software load-balancing product that you can use on Exalogic. When you deploy OTD on Exalogic, you can choose to configure high availability. How this works is fully described in the manuals, and it typically works nicely when you try it on your local test systems (e.g. in VirtualBox). Additional quirks that you have to be aware of are also described, e.g. on Donald Forbes’s blog here and here. I encourage you to read all of that.

However, when deploying such a configuration I kept running into issues with my active/passive failover groups. To describe the issue in more detail, let me first show you what a typical architecture looks like. A typical setup with OTD and an application looks like the image below:
OTD HA

There is a public network, colored green in this case. The public network runs on a bonded network interface, identified by 1. This is the network that your clients use to access the environment. Secondly, there is an internal network that is non-routable and only available within the Exalogic. This network is colored red and runs via the bonded interface identified as 2. OTD sits in the middle: it proxies traffic coming in on interface 1 and forwards it via interface 2 to the backend WebLogic servers (as a non-transparent proxy).

When you set up an active/passive failover group, the VIP you want to run is mounted on interface 1 (the public network; again, see Donald Forbes’s blog for the implementation). If you create such a configuration via tadm (or in the GUI), what happens under the covers is that keepalived is configured to use VRRP. You can find this configuration in the keepalived.conf configuration file that is stored with the instance.

This configuration looks something like this:

vrrp_instance otd-vrrp-router-1 {
        priority 250
        interface bond1
        virtual_ipaddress {
                XXX.XXX.XXX.XXX/XX
        }
        virtual_router_id 33
}

On the second OTD node you would see the same configuration, however the priority will be different. Based on priority the VIP is mounted on either one or the other OTD node.
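To see why dropped multicast ends with the VIP on both nodes, here is a toy model of the priority-based election. It is a simplified reading of VRRPv2 (RFC 3768), assuming preemption is enabled (keepalived’s default); it is not how keepalived is implemented, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of VRRPv2 master election (simplified from RFC 3768), assuming
# preemption is enabled, as it is by default in keepalived.
# Not keepalived's implementation; for illustration only.

# vrrp_state OWN_PRIORITY [PEER_PRIORITY]
#   PEER_PRIORITY: priority carried in the peer's advertisements.
#   Omit it when no advertisements arrive (e.g. multicast is dropped).
vrrp_state() {
  local own=$1 peer=${2:-}
  if [ -z "$peer" ]; then
    # Silence until Master_Down_Interval expires: the node assumes it is
    # alone, becomes master and mounts the VIP.
    echo MASTER
  elif [ "$own" -gt "$peer" ]; then
    echo MASTER
  elif [ "$own" -eq "$peer" ]; then
    # RFC 3768 breaks ties on the higher primary IP address (not modelled).
    echo TIE
  else
    echo BACKUP
  fi
}

# master_down_interval ADVERT_INT OWN_PRIORITY
#   RFC 3768: 3 * Advertisement_Interval + (256 - Priority) / 256 seconds
master_down_interval() {
  awk -v a="$1" -v p="$2" 'BEGIN { printf "%.4f\n", 3 * a + (256 - p) / 256 }'
}
```

With advertisements flowing, `vrrp_state 250 100` prints MASTER and `vrrp_state 100 250` prints BACKUP. With multicast dropped, neither node gets a peer priority, so both `vrrp_state 250` and `vrrp_state 100` print MASTER: the double-VIP situation. `master_down_interval 1 250` prints 3.0234, so a node with priority 250 that hears no advertisements waits roughly three seconds before taking over.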

As you can see in this configuration file, only interface 1 is in play. This means that all OTD cluster traffic is sent over interface 1, which is the public network. The problem with this is twofold:

  1. Multicast over the public network doesn’t always work
  2. Sending cluster traffic over the public network is a bad idea from a security perspective, especially since OTD’s VRRP configuration does not require authentication

Looking at the architecture picture, I prefer to send cluster traffic over the private network (via interface 2) instead of the public one. In my last endeavor the external switches didn’t allow any multicast traffic, so the OTD nodes weren’t able to find each other and both mounted the VIP. I found out that multicast traffic was being dropped by running tcpdump on the network interface (no multicast packets from other hosts arrived). Since tcpdump puts the network interface in promiscuous mode, I get a call from the security team every time I run it. Therefore I typically stay away from tcpdump and simply read the keepalived output in /var/log/messages when both OTD nodes are up. If you see that one node is running as backup and the other as master, you are okay. You can also check the network interfaces: if the VIP is mounted on both nodes, you are in trouble.

The latter was the case for me: trouble. The VIP was mounted on both OTD nodes. Somehow this did not lead to IP conflicts; however, when the second OTD node was stopped, the ARP table was not updated, so traffic was not forwarded to the remaining OTD node.
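The two checks that avoid tcpdump (the keepalived state in /var/log/messages and the VIP on the interfaces) can be scripted roughly as follows. This is a sketch under assumptions: keepalived 1.x style log messages ("Entering MASTER STATE"), root access to /var/log/messages, and the iproute2 `ip` command. The function names and the example VIP 192.0.2.10 are hypothetical.

```shell
#!/usr/bin/env bash
# Print the most recent keepalived VRRP state (MASTER, BACKUP or FAULT)
# found in a syslog file. Assumes keepalived 1.x style messages such as:
#   Keepalived_vrrp[1234]: VRRP_Instance(otd-vrrp-router-1) Entering MASTER STATE
last_vrrp_state() {
  local log=${1:-/var/log/messages}
  grep -oE 'Entering (MASTER|BACKUP|FAULT) STATE' "$log" | tail -n 1 | awk '{print $2}'
}

# Return success if the VIP (plain address, no prefix) is configured on IFACE.
# Relies on the iproute2 "ip" command; the leading space and trailing "/"
# keep 10.0.0.5 from matching 110.0.0.5 or 10.0.0.50.
vip_mounted() {
  local iface=$1 vip=$2
  ip -o -4 addr show dev "$iface" | grep -qF " $vip/"
}
```

Run both on each OTD node, e.g. `last_vrrp_state` and `vip_mounted bond1 192.0.2.10 && echo "VIP mounted here"`. One MASTER and one BACKUP is what you want; MASTER, or the VIP, on both nodes is the trouble described above.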

After a long search on Google, My Oracle Support and all kinds of other sources, I almost started crying: there was no documentation on how to configure this. So I started fiddling with the configuration, just to see if I could fix it. Here’s what I found:

The interface directive in keepalived.conf sets the interface used for cluster communication (the VRRP advertisements). However, you can mount a VIP on any interface by adding a dev directive to the virtual_ipaddress entry. So here’s my corrected configuration:

vrrp_instance otd-vrrp-router-1 {
    # Specify the default network interface, used for cluster traffic
    interface bond2
    # The virtual router ID must be unique to each VRRP instance that you define
    virtual_router_id 33
    priority 250
    virtual_ipaddress {
        # dev mounts the VIP on an interface other than the one above
        XXX.XXX.XXX.XXX/XX dev bond1
    }
}

What this does is send all keepalived traffic (meaning: cluster traffic) via bond2, while the VIP is mounted on bond1. If you also want to introduce authentication, the authentication block is your new best friend; advert_int sets the advertisement interval in seconds. Example snippet to add to keepalived.conf within the otd-vrrp-router-1 block:

    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1066
    }
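To show where the snippets end up, the sketch below prints a complete vrrp_instance block that combines the bond2/bond1 split with authentication. The helper render_vrrp_instance, its default interface names and the example VIP are illustrative assumptions; compare the output with the keepalived.conf that OTD generated for your instance rather than copying it blindly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a vrrp_instance block that combines the corrected
# interface/dev split with authentication.
# render_vrrp_instance PRIORITY VIP_CIDR [CLUSTER_IF] [VIP_IF]
render_vrrp_instance() {
  local prio=$1 vip=$2 cluster_if=${3:-bond2} vip_if=${4:-bond1}
  cat <<EOF
vrrp_instance otd-vrrp-router-1 {
    # VRRP advertisements (cluster traffic) go over the private network
    interface ${cluster_if}
    virtual_router_id 33
    priority ${prio}
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1066
    }
    virtual_ipaddress {
        # the VIP itself is mounted on the public interface
        ${vip} dev ${vip_if}
    }
}
EOF
}
```

Run it with a different priority per node, e.g. `render_vrrp_instance 250 192.0.2.10/24` on the first node and a lower value on the second; virtual_router_id, advert_int and the authentication settings have to match on both nodes. Keep in mind that PASS authentication sends the password in clear text, so it mainly protects against accidental crosstalk with other VRRP instances.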

Hope this helps.

Written by Jacco H. Landlust

June 6, 2016 at 9:29 am

Resource usage on Exalogic

For the coming conference season I decided not to present. I have had plenty of interesting experiences, several of which could be interesting enough to present about, but working with new products requires so much energy that I’d rather skip it for now. Right after this decision I noticed that I just can’t help myself. I like to share what I do, not only to share but also to learn (and perhaps even to be told that I am wrong).

Over the last two years, much of my work has revolved around engineered systems. Last year (2014) I was involved with multiple customers as platform architect, deploying all kinds of engineered systems, including a total of 11 Exalogic racks (half or quarter racks). Most of these racks were fitted with OVM (a.k.a. Exalogic Virtual); five of these were Hybrid (half “bare metal” and half OVM). There are plenty of things to say about the Exalogic product: what does and does not work, what is missing, etc. I would like to stay away from all that, not least because Exalogic 12c has been announced. Instead, I’ll just share some of the tooling, scripts and notes that I wrote to support my everyday work.

One of the topics my customers keep asking about is actual resource usage. Obviously, tooling like Oracle Enterprise Manager can help with that, but not every customer is running OEM (yes, these customers exist, for numerous reasons). That leaves you, as consultant/administrator/local techie, with a challenge: how do you find out how many VMs are actually running on your rack? In particular, the Exalogic control stack has no features that make its output Excel-friendly (apparently a must-have feature for any resource report).
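As a hypothetical illustration of the Excel-friendly part (not the script from this post): Oracle VM Server is Xen based, so assuming you can run `xm list` on each compute node, a small awk filter can turn its output into CSV. The column layout (Name, ID, Mem, VCPUs, State, Time(s)) is an assumption about the usual `xm list` format, and xm_list_to_csv is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `xm list` style output (on stdin) into CSV rows
#   host,name,id,mem_mb,vcpus,state
# Assumes the usual xm list columns: Name ID Mem VCPUs State Time(s).
# The header line and Domain-0 (the control domain) are skipped,
# so every CSV row is one guest VM.
xm_list_to_csv() {
  local host=$1
  awk -v host="$host" 'NR > 1 && $1 != "Domain-0" {
    printf "%s,%s,%s,%s,%s,%s\n", host, $1, $2, $3, $4, $5
  }'
}
```

For example, `ssh root@cn01 xm list | xm_list_to_csv cn01 >> vms.csv` for each compute node (cn01 is a placeholder hostname); the number of lines in vms.csv is then the number of guests that were listed.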

Written by Jacco H. Landlust

July 17, 2015 at 12:26 am

Posted in Exalogic

OTD-62015 An error occurred while creating server certificates

<UPDATE !!!>
One of my colleagues asked for help creating an OTD configuration on an engineered system. For some reason the creation of the administration server failed. Here’s the command he issued:

-bash-3.2$ export ORACLE_HOME=/u01/app/oracle/product/otd
-bash-3.2$ export PATH=$ORACLE_HOME/bin:$PATH
-bash-3.2$ $ORACLE_HOME/bin/tadm configure-server --host=my_host --java-home=$ORACLE_HOME/jdk --port=8989 --user=admin --instance-home=/u01/app/oracle/admin/otd/otdadmin --server-user=oracle --port 8989 --verbose
This command will create the administration server. The password that is provided will be required to access the administration server.
Enter admin-user-password>
Enter admin-user-password again>
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ConfigureServer validateRuntimeUser
FINEST: Checking availability of valid runtime user...
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance init
FINEST: Initing AdminServerInstance
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: Initing ServerInstance...
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance prepareDirsAndFiles
FINEST: AdminServerInstance.prepareDirsAndFiles()
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance prepareInstanceNameAndDir
FINEST: AdminServerInstance.prepareInstanceNameAndDir()
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance prepareTokens
FINEST: AdminServerInstance.prepareTokens()
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance prepareTokens
FINEST: ServerInstance.prepareTokens()
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: isWindows = false
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: oracleHome = /u01/app/oracle/product/otd
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: instanceHome = /u01/app/oracle/admin/otd/otdadmin
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: cfgTmplPath = /u01/app/oracle/product/otd/lib/templates/config
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: scriptsTmplPath = /u01/app/oracle/product/otd/lib/templates/scripts
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: configName = admin-server
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: unixUser = null
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: isZip = false
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance init
FINEST: createService = false
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance
FINEST: In AdminServerInstance constructor :: after calling super
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance
FINEST: 		 logger is null = false
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance createInstance
FINEST: Starting to create server instance...
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.ServerInstance createDirectories
FINEST: Starting to create instance directory structure...
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance setupSecurityDB
FINEST: AdminServerInstance.setupSecurityDB
Jan 14, 2014 11:06:52 AM com.sun.web.admin.configurator.AdminServerInstance setupSecurityDB
FINEST: dbDir = /u01/app/oracle/admin/otd/otdadmin/admin-server/config
Jan 14, 2014 11:06:54 AM com.sun.web.admin.configurator.AdminServerInstance createAdminCerts
FINEST: Starting to setup the administration self-signed certificates
Jan 14, 2014 11:06:55 AM com.sun.web.admin.configurator.AdminServerInstance createAdminCerts
FINEST: java.lang.SecurityException: Unable to initialize security library
com.sun.web.admin.security.NSSDBException: java.lang.SecurityException: Unable to initialize security library
	at com.sun.web.admin.security.SecurityUtil.initDB(SecurityUtil.java:69)
	at com.sun.web.admin.configurator.AdminServerInstance.createAdminCerts(AdminServerInstance.java:161)
	at com.sun.web.admin.configurator.AdminServerInstance.setupSecurityDB(AdminServerInstance.java:101)
	at com.sun.web.admin.configurator.ServerInstance.createInstance(ServerInstance.java:604)
	at com.sun.web.admin.configurator.ConfigureServer.configureServer(ConfigureServer.java:111)
	at com.sun.web.admin.cli.commands.ConfigureServerCommand.configure(ConfigureServerCommand.java:93)
	at com.sun.web.admin.cli.commands.ConfigureServerCommand.configureServer(ConfigureServerCommand.java:48)
	at com.sun.web.admin.cli.commands.ConfigureServerCommand.runCommand(ConfigureServerCommand.java:29)
	at com.sun.enterprise.cli.framework.CLIMain.invokeCommand(CLIMain.java:171)
	at com.sun.web.admin.cli.shelladapter.WSadminShell.invokeFramework(WSadminShell.java:162)
	at com.sun.web.admin.cli.shelladapter.WSadminShell.main(WSadminShell.java:79)
Caused by: java.lang.SecurityException: Unable to initialize security library
	at org.mozilla.jss.CryptoManager.initializeAllNative(Native Method)
	at org.mozilla.jss.CryptoManager.initialize(CryptoManager.java:919)
	at org.mozilla.jss.CryptoManager.initialize(CryptoManager.java:885)
	at com.sun.web.admin.security.SecurityUtil.initDB(SecurityUtil.java:62)
	... 10 more

OTD-62015 An error occurred while creating server certificates: java.lang.SecurityException: Unable to initialize security library

This seemed interesting to me, since I had never seen this error before. So, fond of tracing as I am, I ran the command under strace:

strace -f -o /tmp/tadm.trc $ORACLE_HOME/bin/tadm configure-server --host=my_host --java-home=$ORACLE_HOME/jdk --port=8989 --user=admin --instance-home=/u01/app/oracle/admin/otd/otdadmin --server-user=oracle --port 8989 --verbose

This gave me a rather extensive trace file (close to 12k lines), which I won’t bother you with. One relevant line that drew my attention was:

fcntl(3, F_SETLK, {type=F_RDLCK, whence=SEEK_SET, start=1073741824, len=1}) = -1 ENOLCK (No locks available)

So, it is an NFS locking issue! Checking /etc/mtab showed me that the instance home was on an NFS mount (/u01/app/oracle/admin/otd). I changed the mount options to include noac,nolock and this instantly solved the error.
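The two steps of this hunt, spotting failed lock calls in the trace and finding out which mount a directory lives on, can be sketched as shell functions. This assumes Linux (/proc/mounts) and GNU readlink; the function names are hypothetical. It reads /proc/mounts, which comes from the kernel, rather than /etc/mtab, which on older systems is a file maintained by mount.

```shell
#!/usr/bin/env bash
# List failed fcntl() lock calls (ENOLCK) in an strace output file.
failed_locks() {
  grep -E 'fcntl(64)?\(.*F_SETLKW?.*= -1 ENOLCK' "$1"
}

# Print "mountpoint fstype options" for the filesystem that holds PATH.
# Picks the longest mount point that is a path prefix of PATH; on equal
# length the later /proc/mounts entry wins, since it is the one on top.
mount_of() {
  local path
  path=$(readlink -f -- "$1") || return 1
  awk -v p="$path" '
    BEGIN { best = -1 }
    {
      mp = ($2 == "/") ? "/" : $2 "/"
      if (index(p "/", mp) == 1 && length($2) >= best) {
        best = length($2); line = $2 " " $3 " " $4
      }
    }
    END { if (line != "") print line }' /proc/mounts
}
```

`failed_locks /tmp/tadm.trc` lists the ENOLCK lines, and `mount_of /u01/app/oracle/admin/otd/otdadmin` prints the mount point, filesystem type and options; an nfs or nfs4 type in the second field points at NFS locking.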

Hope this helps.

<UPDATE>
Well, that noac option caused some severe performance issues. It seems this database best practice doesn’t work so well on Exalogic.

The nolock option should be handled with care. If you are absolutely sure that files can only be opened from one location, this could solve the issue, but experts told me to avoid it as much as possible. Removing the nolock option did bring me back to a crashing tadm, though. Back to the drawing board...

Written by Jacco H. Landlust

January 14, 2014 at 12:32 pm

yum exclude list for Exalogic vServers

Recently I have been doing some work on Exalogic. While building a template for vServers on Exalogic I ran into an issue: after executing yum update followed by a reboot, I wasn’t able to connect to the vServers anymore. This is caused by an issue with the network stack which, in the end, is caused by a documentation error.

It seems that the yum exclude list for vServers is not documented correctly, and Oracle Support Document 1594674.1 (Exalogic Virtual Environment – Guest vServer Upgrade to Oracle Linux v5.10) also seems to be off. The exclusion list that didn’t break the operating system after a yum update is:

exclude=kernel* compat-dapl* dapl* ib-bonding* ibacm* ibutils* ibsim* infiniband-diags* kmod-ovmapi-uek* libibcm* libibmad* libibumad* libibverbs* libmlx4* libovmapi* librdmacm* libsdp* mpi-selector* mpitests_openmpi_gcc* mstflint* mvapich* ofa* ofed* openmpi_gcc* opensm* ovm-template-config* ovmd* perftest* qperf* rds-tools* sdpnetstat* srptools* exalogic* infinibus* xenstoreprovider* initscripts* nfs-utils*
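The exclude= line goes in the [main] section of /etc/yum.conf (for a one-off run, yum also accepts --exclude on the command line). To check which packages this list actually protects, the sketch below matches package names against the globs with bash pattern matching. It approximates yum’s own matching for these simple trailing-* patterns and is not yum itself; is_excluded and EXCLUDE_LINE are hypothetical names.

```shell
#!/usr/bin/env bash
# Check package names against yum-style exclude globs. Bash pattern matching
# approximates yum's matching for these simple "name*" globs; it is not yum.
EXCLUDE_LINE="kernel* compat-dapl* dapl* ib-bonding* ibacm* ibutils* ibsim* \
infiniband-diags* kmod-ovmapi-uek* libibcm* libibmad* libibumad* libibverbs* \
libmlx4* libovmapi* librdmacm* libsdp* mpi-selector* mpitests_openmpi_gcc* \
mstflint* mvapich* ofa* ofed* openmpi_gcc* opensm* ovm-template-config* ovmd* \
perftest* qperf* rds-tools* sdpnetstat* srptools* exalogic* infinibus* \
xenstoreprovider* initscripts* nfs-utils*"

# is_excluded PACKAGE_NAME: succeed if any exclude glob matches the name.
is_excluded() {
  local pkg=$1 pattern
  local -a patterns
  # read -a splits on whitespace without filename expansion,
  # so "kernel*" is never expanded against files in the current directory.
  read -r -a patterns <<< "$EXCLUDE_LINE"
  for pattern in "${patterns[@]}"; do
    # Unquoted $pattern in a case arm is treated as a glob pattern.
    case $pkg in
      $pattern) return 0 ;;
    esac
  done
  return 1
}
```

For example, `rpm -qa --qf '%{NAME}\n' | while read -r p; do is_excluded "$p" && echo "$p"; done` lists the installed packages that yum update will leave alone. To test the list from your own yum.conf instead, first set `EXCLUDE_LINE=$(sed -n 's/^exclude=//p' /etc/yum.conf)`.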

Written by Jacco H. Landlust

January 3, 2014 at 3:17 pm