Oracle MVA

Tales from a Jack of all trades

mandatory columns

with 5 comments

**UPDATE: Joaquín González pointed out to me that Carlos Sierra has blogged about this too. It was Carlos's presentation where the topic came up.**

While attending Hotsos 2013 I learned of a new 11.2.0.3 feature, explained to me by Toon Koppelaars, that has to do with adding a mandatory column to an existing table.

Whenever I wanted to add a mandatory column to a table, I used to add the column, update it to whatever value I preferred, and then set the column to NOT NULL.

JACCO@DEMODB > describe test;

Name                                    Null?    Type
--------------------------------------- -------- ---------------------------------
ID                                      NOT NULL NUMBER

JACCO@DEMODB > select * from test;

ID
----------
1
2

2 rows selected.

JACCO@DEMODB > alter table test add string_value varchar2(100);

Table altered.

JACCO@DEMODB > alter table test modify string_value  not null;

alter table test modify string_value  not null

ERROR at line 1:
ORA-02296: cannot enable (JACCO.) - null values found

JACCO@DEMODB > update test set string_value = 'not null';

2 rows updated.

JACCO@DEMODB > commit;

Commit complete.

JACCO@DEMODB > alter table test modify string_value  not null;

Table altered.

So far so good. As of 11.2.0.3 you can also add a column with a default value and make it NOT NULL directly, without updating the table first:

JACCO@DEMODB > set timing on
JACCO@DEMODB > alter table test add string_value2 varchar2(100) default 'my value' not null;

Table altered.

Elapsed: 00:00:00.02

Very fast (and the statement stays fast even when the table holds a lot of rows), which makes it a very convenient statement, especially for very big tables.

When I select from the table, I can actually see the value there:

JACCO@DEMODB > column string_value format a10
JACCO@DEMODB > column string_value2 format a10
JACCO@DEMODB > select * from test;

ID  STRING_VAL STRING_VAL
--- ---------- ----------
1  not null   my value
2  not null   my value

2 rows selected.
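A quick way to see that this new-style default lives in the data dictionary rather than in the rows themselves is to look at the column metadata (a sketch, assuming the TEST table from above):

```sql
-- Sketch: inspect the column metadata for the TEST table.
-- The default expression and NOT NULL flag are dictionary entries;
-- the pre-existing rows were not physically updated.
select column_name, data_default, nullable
  from user_tab_columns
 where table_name = 'TEST'
 order by column_id;
```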

Now this all seems very cool, but I think there is a real caveat. Let's say you have a somewhat larger table:

JACCO@DEMODB > create table my_objects as
               (select rownum as id, t.object_name from all_objects t);

Table created.

JACCO@DEMODB > alter table my_objects add constraint pk_my_objects primary key (id);

Table altered.

JACCO@DEMODB > select count(*) from my_objects;

COUNT(*)
----------
66883
1 row selected.

Let's say you add a column to this table and make it mandatory:

JACCO@DEMODB > alter table my_objects add null_test_column char(1) default 'N' not null;

Table altered.

JACCO@DEMODB > select null_test_column, count(*) from my_objects group by null_test_column;

N   COUNT(*)
- ----------
N      66883

1 row selected.

Now let's add another row (to create some skew), add an index, and gather stats:

JACCO@DEMODB > insert into my_objects (id,object_name,null_test_column) values (66884,'test','Y');

1 row created.

JACCO@DEMODB > commit;

Commit complete.

JACCO@DEMODB > create index idx_null_test on my_objects (null_test_column);

Index created.

JACCO@DEMODB > exec dbms_stats.gather_table_stats(ownname => user, tabname => 'MY_OBJECTS', cascade => true, method_opt => 'for all columns size auto');

PL/SQL procedure successfully completed.

When I analyze the results, the following appears:

JACCO@DEMODB > l
1  select num_distinct, num_nulls, num_buckets, histogram,
2         utl_raw.cast_to_varchar2(low_value) as low_value,
3         utl_raw.cast_to_varchar2(high_value) as high_value
4*   from user_tab_columns where table_name = 'MY_OBJECTS' and column_name = 'NULL_TEST_COLUMN'
JACCO@DEMODB > /

NUM_DISTINCT  NUM_NULLS NUM_BUCKETS HISTOGRAM       LOW_VALUE       HIGH_VALUE
------------ ---------- ----------- --------------- --------------- ---------------
           2          0           1 NONE            N               Y

1 row selected.

The number of distinct values is correct and the number of nulls sounds correct, but given the skew in the data I would have expected a histogram. So something seems wrong. And when I run a query on the table, I get some funny results:

JACCO@DEMODB > explain plan for
2  select * from my_objects where null_test_column = 'Y';

Explained.

JACCO@DEMODB > select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------

Plan hash value: 880823944

--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            | 33442 |  1045K|   106   (1)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| MY_OBJECTS | 33442 |  1045K|   106   (1)| 00:00:02 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("NULL_TEST_COLUMN"='Y')

13 rows selected.

Something must be wrong here: this is the wrong access path. Why does this query run a full table scan when it should use the index? The clue is in the predicate information: it is a filter. Now where does this filter come from? Or is dbms_xplan lying to me again?

To find out I ran a 10053 trace:

JACCO@DEMODB > alter session set events '10053 trace name context forever, level 1';

Session altered.

JACCO@DEMODB > select * from my_objects where null_test_column = 'Y';

ID         OBJECT_NAME                    N
---------- ------------------------------ -
66884      test                           Y

1 row selected.

When I read the trace file I noticed this:

----- Plan Table -----
============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation          | Name      | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |           |       |       |   106 |           |
| 1   |  TABLE ACCESS FULL | MY_OBJECTS|   33K | 1045K |   106 |  00:00:02 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
1 - filter(NVL("NULL_TEST_COLUMN",'N')='Y')

Where did that NVL come from? It appeared because of the default value that I set on the column without updating the existing rows first. This is rather unexpected, and it can bite you when you don't expect it (especially when you run into the issue six months after the column was added to the table). A very unpleasant side effect of a new feature that seems very promising on first use. It shows once again that you really have to be careful when using new features.
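If you want to avoid the surprise, two approaches come to mind (a sketch, untested on your system): gather statistics with a full sample and explicit histogram buckets so the optimizer sees the skew, or fall back to the classic add/update/modify sequence so the default value is physically stored in every row and no NVL rewrite is needed:

```sql
-- Sketch, option 1: full-sample stats with explicit histogram buckets
begin
  dbms_stats.gather_table_stats(
    ownname          => user,
    tabname          => 'MY_OBJECTS',
    estimate_percent => 100,
    cascade          => true,
    method_opt       => 'for all columns size 254');
end;
/

-- Sketch, option 2: the classic three-step approach, which physically
-- stores the value in every row (the column name here is hypothetical)
alter table my_objects add null_test_column2 char(1);
update my_objects set null_test_column2 = 'N';
commit;
alter table my_objects modify null_test_column2 default 'N' not null;
```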

Hope this helps.

http://carlos-sierra.net/2013/03/12/interesting-case-where-a-full-table-scan-is-chosen-instead-of-an-index


Written by Jacco H. Landlust

March 13, 2013 at 11:15 pm

Posted in Uncategorized

5 Responses


  1. This issue was also discussed yesterday at
    http://carlos-sierra.net/2013/03/12/interesting-case-where-a-full-table-scan-is-chosen-instead-of-an-index/

    Best Regards
    Joaquin Gonzalez

    joaquingonzalez

    March 14, 2013 at 10:00 am

    • I didn’t see that post yet. Thanks for mentioning it, I will update the post.

      Jacco H. Landlust

      March 14, 2013 at 10:20 am

  2. Your test case, same as my real-life case, shows a column with two distinct values (low and high are different) but just one histogram bucket. DBMS_STATS may miss some buckets when data is very skewed (see bug 10174050). It will be fixed on 12c. Then, having only one bucket, when your equality predicate is on the missing value, its selectivity is computed as half the size of the smallest bucket (see bug 5483301). Notice you get half the cardinality on table. Once CBO underestimates cardinality to 50% table size it will go for a FTS. Then you see the NVL predicate, which makes sense since the rows that were in table before the new column was added have still NULL in data block, but should now have the default value specified at the moment the column was created. If the CBO uses an index on that new column, since the values do exist on this index, there is no need of NVL.
    For your tests you can disable 5483301 with:
ALTER SESSION SET "_fix_control" = '5483301:OFF';
ALTER SYSTEM SET "_fix_control" = '5483301:ON';
    Bottom line: The NVL is not causing the FTS. The stats are causing the FTS. And the FTS cause the NVL.

    Carlos Sierra

    March 14, 2013 at 1:03 pm

    • Thanks for your comment.

      You are correct, absence of the histogram seems to be caused by the bug you mentioned. As workaround I should test doing a full calculation (not auto sample size) and retest. That should create the histogram, agreed?

      As soon as I get to my test system (tonight), I will retry.

      Jacco H. Landlust

      March 14, 2013 at 1:14 pm

      • with estimate_percent you should get the missing bucket created, then cardinality should be estimated correctly.

        Carlos Sierra

        March 15, 2013 at 12:54 pm

