Oracle MVA

Tales from a Jack of all trades

Archive for the ‘Uncategorized’ Category

On Exalogic, OTD and Multicast

with 2 comments

Oracle Traffic Director is Oracle’s software loadbalancing product that you can use on Exalogic. When you deploy OTD on Exalogic, you can choose to configurge high availability. How this works is fully described within manuals and typically works all nice when you try this on your local testsystems (e.g. in VirtualBox). Additional quircks that you have to be aware of are described also, e.g. on Donals Forbes his blog here and here. I encourage you to read all of that.

However when deploying such a configuration I kept on running into issues with my active/passive failover groups. To describe the issue in somewhat more detail, let me first show you how a typical architecture looks. A typical setup with OTD and an application looks like the image depicted below:
OTD HA

There is a public network, in this case it is collored green. The public network runs on a bonded network interface, identified by 1. This is the network that your clients use to access the environment. Secondly there is an internal network that is non-routable and only available within the Exalogic. This network is collored red and is running via bonded interface identified as 2. The OTD sits in the middle and basically proxies traffic comming in on 1 and forward the traffic non-transparent for the client via interface 2 to the backend weblogic servers.

When you setup a active/passive failover group, the VIP you want to run is mounted in interface 1 (public network. Again see Donals Forbes blog for implementation again. If you create such a configuration via tadm (or in the GUI) what happens under the covers, is that keepalived is configured to use VRRP. You can find this configuration in the keepalived.conf configuration file that is stored with the instance.

This configuration looks something like this:

vrrp_instance otd-vrrp-router-1 {
        priority 250
        interface bond1
        virtual_ipaddress {
                XXX.XXX.XXX.XXX/XX
        }
        virtual_router_id 33
}

On the second OTD node you would see the same configuration, however the priority will be different. Based on priority the VIP is mounted on either one or the other OTD node.

As you can see in this configuration file, only only interface 1 is into play currently. This means that all traffic regarding OTD is send over interface 1. This is public network. The problem with this is two-fold:

  1. Multicast over public network doesn’t always work
  2. Sending cluster traffic over public network is a bad idea from security perspective, especially since OTD’s VRRP configuration does not require authentication

When I look at the architecture picture, I prefer to send cluster traffic over the private network (via interface 2) instead of via public. In my last endeavor the external switches didn’t allow any multicast traffic, so actually the OTD nodes weren’t able to find each other and both mounted the VIP. I found that multicast traffic was dropped by performing a tcpdump on the network interface (no multicast packets from other hosts arrived). Since tcpdump puts the network interface in a promiscuous mode, I get called by the security team after every time I perform a tcpdump. Therefore I typcally stay away from tcpdump and simply read the keepalived output in /var/log/messages when both OTD nodes are up. If you can see that one node is running as backup and one as master you are okay. Also you can see this by checking the network interfaces: if the VIP is mounted on both nodes you are in trouble.

The latter was the case for me: trouble. The VIP was mounted on both OTD nodes. This somehow did not lead to IP conflicts, however when the second OTD node was stopped the ARP table was not updated and hence traffic was not forwarded to the remaining OTD.

After a long search on Google, My Oracle Support and all kinds of other sources I almost started crying: no documentation how to configure this was to be found. Therefore I started fiddling with the configuration, just to see if I could fix this. Here’s what I found:

The directive interface in the keepalived.conf is the interface that you use for clustering communication. However you can run a VIP on every interface by adding a dev directive to the virtual_ipaddress configuration. So here’s my corrected configuration:

vrrp_instance otd-vrrp-router-1 {
#   Specify the default network interface, used for cluster traffic
    interface bond2
#   The virtual router ID must be unique to each VRRP instance that you define
    virtual_router_id 33
    priority 250
    virtual_ipaddress {
       # add dev to route traffic via a non-default interface
       XXXX.XXXX.XXXX.XXXX/XX dev bond1
    }
}

So what this does, is send all keepalived traffic (meaning: cluster traffic) via bond2, however the VIP is mounted on bond1. If you also want to introduce authentication, the directive advert_int 1 is your new best friend. Example snippet to add to keepalived.conf within the otd_vrrp-router configuration:

    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1066
    }

Hope this helps.

Written by Jacco H. Landlust

June 6, 2016 at 9:29 am

Lengthy errormessage while creating OID

with one comment

For most of the Fusion Middleware domains or systems components I setup for clients I use scripts. Today, while creating a OID, I decided to type in the commands manually. This resulted in the following:

$ /u01/app/oracle/admin/instances/XXXXX/oid_instance_XXXXXXXX/bin/opmnctl  createcomponent -componentType OID -componentName area51_oid -Db_info rdbms-scan.area51.local:1521:srv4oid -Host oid.area51.local -Port 3060 -Sport 3131

Command requires login to weblogic admin server (oid.area51.local):
  Username: weblogic
  Password:

Creating empty component directories...Done
Provisioning OID files for area51_oid
  OID onCreate....

Enter ODS password:
  Validating OID input parameters
Enter ODSSM password:
oracle.as.config.ProvisionException: Error deleting credential odssm from CSF
        at oracle.iam.management.oid.install.wls.OIDComponentHelper.delCredFromCSF(OIDComponentHelper.java:2373)
        at oracle.iam.management.oid.install.wls.OIDComponentHelper.removeCreds(OIDComponentHelper.java:2345)
        at oracle.iam.management.oid.install.wls.OIDComponent.onRemove(OIDComponent.java:429)
        at oracle.as.config.impl.OracleASComponentBaseImpl.remove(OracleASComponentBaseImpl.java:287)
        at oracle.as.config.impl.OracleASComponentBaseImpl.remove(OracleASComponentBaseImpl.java:174)
        at oracle.as.config.impl.OracleASComponentBaseImpl.remove(OracleASComponentBaseImpl.java:155)
        at oracle.iam.management.oid.install.wls.OIDComponent.onCreate(OIDComponent.java:227)
        at oracle.as.config.impl.OracleASComponentBaseImpl.createComponent(OracleASComponentBaseImpl.java:597)
        at oracle.as.config.impl.OracleASComponentBaseImpl.create(OracleASComponentBaseImpl.java:106)
        at oracle.as.config.provisioner.commands.CreateComponentCommand.execute(CreateComponentCommand.java:40)
        at oracle.as.config.provisioner.InstallerCmdLine.run(InstallerCmdLine.java:146)
        at oracle.as.config.provisioner.InstallerCmdLine.main(InstallerCmdLine.java:46)
Caused by: java.security.PrivilegedActionException: oracle.as.config.ProvisionException:
        at java.security.AccessController.doPrivileged(Native Method)
        at oracle.iam.management.oid.install.wls.OIDComponentHelper.delCredFromCSF(OIDComponentHelper.java:2354)
        ... 11 more
Caused by: oracle.as.config.ProvisionException:
        at oracle.iam.management.oid.install.wls.OIDComponentHelper$3.run(OIDComponentHelper.java:2367)
        ... 13 more
Caused by: oracle.security.jps.config.JpsConfigurationException: /u01/app/oracle/admin/instances/eoid2/oid_instance_oesv9510/config/JPS/jps-config-jse.xml (No such file or directory)
        at oracle.security.jps.internal.config.xml.XmlConfigurationFactory.initDefaultConfiguration(XmlConfigurationFactory.java:439)
        at oracle.security.jps.internal.config.xml.XmlConfigurationFactory.getDefaultConfiguration(XmlConfigurationFactory.java:338)
        at oracle.security.jps.internal.config.xml.XmlConfigurationFactory.getConfiguration(XmlConfigurationFactory.java:160)
        at oracle.security.jps.internal.core.runtime.JpsContextFactoryImpl.(JpsContextFactoryImpl.java:112)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at java.lang.Class.newInstance0(Class.java:355)
        at java.lang.Class.newInstance(Class.java:308)
        at oracle.security.jps.util.JpsUtil.newInstance(JpsUtil.java:190)
        at oracle.security.jps.JpsContextFactory$1.run(JpsContextFactory.java:74)
        at oracle.security.jps.JpsContextFactory$1.run(JpsContextFactory.java:72)
        at java.security.AccessController.doPrivileged(Native Method)
        at oracle.security.jps.JpsContextFactory.getContextFactory(JpsContextFactory.java:71)
        at oracle.iam.management.oid.install.wls.OIDComponentHelper$3.run(OIDComponentHelper.java:2357)
        ... 13 more
Caused by: java.io.FileNotFoundException: /u01/app/oracle/admin/instances/eoid2/oid_instance_oesv9510/config/JPS/jps-config-jse.xml (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.(FileInputStream.java:120)
        at oracle.security.jps.internal.common.util.XmlSchemaValidationUtil.doValidation(XmlSchemaValidationUtil.java:96)
        at oracle.security.jps.internal.config.xml.XmlConfigurationFactory.initDefaultConfiguration(XmlConfigurationFactory.java:418)
        ... 28 more
    Skipping oesv9510_oid unregistration. It is not currently registered with the adminserver.
    Deleting oesv9510_oid directories
    Invoking opmn reload...Done
Command failed: Exception in onCreate()
Details are logged in /u01/app/oracle/admin/instances/eoid2/oid_instance_oesv9510/diagnostics/logs/OPMN/opmn/provision.log

opmnctl createcomponent: failed.

Now this is a rather lengthy error message and it really surprised me. Since I just associated the security store for the domain to a database, and this error pointing towards JPS-config I figured something must be wrong with the reassociateSecurityStore wlst comamnd. So I checked logfiles, My Oracle Support and Google before I checked the provision.log.

The provision log showed me these messages

SEVERE: Command failed:
oracle.as.config.ProvisionException: Exception in onCreate()
        at oracle.iam.management.oid.install.wls.OIDComponent.onCreate(OIDComponent.java:235)
        at oracle.as.config.impl.OracleASComponentBaseImpl.createComponent(OracleASComponentBaseImpl.java:597)
        at oracle.as.config.impl.OracleASComponentBaseImpl.create(OracleASComponentBaseImpl.java:106)
        at oracle.as.config.provisioner.commands.CreateComponentCommand.execute(CreateComponentCommand.java:40)
        at oracle.as.config.provisioner.InstallerCmdLine.run(InstallerCmdLine.java:146)
        at oracle.as.config.provisioner.InstallerCmdLine.main(InstallerCmdLine.java:46)
Caused by: oracle.as.config.ProvisionException: -Namespace parameter missing
        at oracle.iam.management.oid.install.wls.OIDComponentHelper.validateParams(OIDComponentHelper.java:314)
        at oracle.iam.management.oid.install.wls.OIDComponent.onCreate(OIDComponent.java:158)
        ... 5 more

Aha. So that was a enormous error message that tried to tell me “Hey, you missed the -Namespace parameter”.

P.S. 11.1.1.7 also introduced a new process called oiddispd. documentation” gives this description:
“Beginning with Oracle Internet Directory 11g Release 1 (11.1.1.7.0), the OIDLDAPD process is separated as the OIDDISPD (dispatcher) process and the OIDLDAPD (server) process. On UNIX and Linux systems, however, the ps -ef command will continue to show both of these processes as OIDLDAPD at runtime.”

If you happen to separate software from configuration like I do (binaries owned by oracle, oid processes run as some other user) you need to chown the $ORACLE_HOME/bin/oiddispd process and chmod it to 4740

Hope this helps.

Written by Jacco H. Landlust

July 19, 2013 at 4:03 pm

Posted in Uncategorized

mandatory columns

with 5 comments

**UPDATE: Joaquín González pointed out to me that Carlos Sierra has blogged about this too . It was his presentation where the topic came up. **

While attending Hotsos 2013 I learned of a new 11.2.0.3 feature explain to me by Toon Koppelaars. This new feature has to do with adding a mandatory column to an existing table.

Whenever I want to add a mandatory column to a table, I added the column, set the value to whatever I preferred and then set the column to not null.

JACCO@DEMODB > describe test;

Name                                    Null?    Type
--------------------------------------- -------- ---------------------------------
ID                                      NOT NULL NUMBER

JACCO@DEMODB > select * from test;

ID
----------
1
2

2 rows selected.

JACCO@DEMODB > alter table test add string_value varchar2(100);

Table altered.

JACCO@DEMODB > alter table test modify string_value  not null;

alter table test modify string_value  not null

ERROR at line 1:
ORA-02296: cannot enable (JACCO.) - null values found

JACCO@DEMODB > update test set string_value = 'not null';

2 rows updated.

JACCO@DEMODB > commit;

Commit complete.

JACCO@DEMODB > alter table test modify string_value  not null;

Table altered.

So far so good. As of 11.2.0.3 you can also add a column with a default value and set null directly without updating the table first:

JACCO@DEMODB > set timing on
JACCO@DEMODB > alter table test add string_value2 varchar2(100) default 'my value' not null;

Table altered.

Elapsed: 00:00:00.02

Very very fast (and the statement keeps running very fast, even when you have a lot of rows in the table) and it seems a very convenient statement. Especially with very big tables this seems useful.

If I would select from the table, I can actually see the value in there:

JACCO@DEMODB > column string_value format a10
JACCO@DEMODB > column string_value2 format a10
JACCO@DEMODB > select * from test;

ID  STRING_VAL STRING_VAL
--- ---------- ----------
1  not null   my value
2  not null   my value

2 rows selected.

Now this seems al very cool and such, but I think there’s a real caveats. Lets say you would have a somewhat larger table:

JACCO@DEMODB > create table my_objects as
               (select rownum as id, t.object_name from all_objects t);

Table created.

JACCO@DEMODB > alter table my_objects add constraint pk_my_objects primary key (id);

Table altered.

JACCO@DEMODB > select count(*) from my_objects;

COUNT(*)
----------
66883
1 row selected.

lets say you would add a column to this table, and make it mandatory

JACCO@DEMODB > alter table my_objects add null_test_column char(1) default 'N' not null;

Table altered.

JACCO@DEMODB > select null_test_column, count(*) from my_objects group by null_test_column;

N   COUNT(*)
- ----------
N      66883

1 row selected.

Now lets add another row (to create some skew), add an index and gather stats:

JACCO@DEMODB > insert into my_objects (id,object_name,null_test_column) values (66884,'test','Y');

1 row created.

JACCO@DEMODB > commit;

Commit complete.

JACCO@DEMODB > create index idx_null_test on my_objects (null_test_column);

Index created.

JACCO@DEMODB > exec dbms_stats.gather_table_stats(ownname => user, tabname => 'MY_OBJECTS', cascade => true, method_opt => 'for all columns size auto');

PL/SQL procedure successfully completed.

When I analyze the results, the following appears:

JACCO@DEMODB > l
1  select num_distinct, num_nulls, num_buckets, histogram,
2         utl_raw.cast_to_varchar2(low_value) as low_value,
3         utl_raw.cast_to_varchar2(high_value) as high_value
4*   from user_tab_columns where table_name = 'MY_OBJECTS' and column_name = 'NULL_TEST_COLUMN'
JACCO@DEMODB > /

NUM_DISTINCT  NUM_NULLS NUM_BUCKETS HISTOGRAM       LOW_VALUE       HIGH_VALUE
------------ ---------- ----------- --------------- --------------- ---------------
           2          0           1 NONE            N               Y

1 row selected.

The number of distinct values is correct, the number of nulls sounds correct, but given the skew in the data I would have expected a histogram. So something seems wrong. Also when I run a query on the table, I get some funny results

JACCO@DEMODB > explain plan for
2  select * from my_objects where null_test_column = 'Y';

Explained.

JACCO@DEMODB > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

Plan hash value: 880823944

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 33442 |  1045K|   106   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MY_OBJECTS | 33442 |  1045K|   106   (1)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("NULL_TEST_COLUMN"='Y')

13 rows selected.

There must be something wrong, it’s a wrong access path. Why is this query running a full table scan when it should run over an index? The clue is the predicate information, it’s a filter. Now where does this filter come from? Or is dms_xplan lying to me again?

To find out I ran an 10053 trace:

JACCO@DEMODB > alter session set events '10053 trace name context forever, level 1';

Session altered.

JACCO@DEMODB > select * from my_objects where null_test_column = 'Y';

ID         OBJECT_NAME                    N
---------- ------------------------------ -
66884      test                           Y

1 rows selected.

When I read the trace file I noticed this:

----- Plan Table -----
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation          | Name      | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |           |       |       |   106 |           |
| 1   |  TABLE ACCESS FULL | MY_OBJECTS|   33K | 1045K |   106 |  00:00:02 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter(NVL("NULL_TEST_COLUMN",'N')='Y')

Where did that NVL come from? That just appeared because of the default value that I was set on the column without updating the values first. This is rather unexpected and it can bite you when you don’t expect this (especially when you run into this issue 6 months after the column was added to the table). A very unpleasant side-effect of a new feature that seems very promising upon first usage. This shows again that you really have to be careful when using new features.

Hope this helps.

http://carlos-sierra.net/2013/03/12/interesting-case-where-a-full-table-scan-is-chosen-instead-of-an-index

Written by Jacco H. Landlust

March 13, 2013 at 11:15 pm

Posted in Uncategorized