设计两个方案,表结构、数据量、数据分布、索引情况、统计信息等都一样,只有数据入库的顺序不一样,如下:
方案一说明:插入测试表的数据顺序为“111…111222…222333…333…”
方案二说明:插入测试表的数据顺序为“123…123…123…123…”

方案一:

1. 创建测试表

-- Create table
drop table alex_t03 purge;
create table alex_t03 (
id number, 
col1 number, 
name varchar2(100)
);

-- Create/Recreate primary, unique and foreign key constraints 
alter table alex_t03 add constraint pk_alex_t03 primary key (id) using index;
create index idx_alex_t03_col1 on alex_t03 (col1);

-- Insert table
declare
seq number:=1;
begin
for i in 1..&tabcnt loop
insert into alex_t03 values (i, seq, 'alex');
if mod(i,trunc(&tabcnt/&discnt))=0 then 
seq:=seq+1;
end if;
end loop;
commit;
end;
/

--Gather Statistics
exec dbms_stats.gather_table_stats('dbmgr','alex_t03')
exec dbms_stats.gather_index_stats('dbmgr','pk_alex_t03')
exec dbms_stats.gather_index_stats('dbmgr','idx_alex_t03_col1')

2. 查看测试表的信息

SQL> desc alex_t03
 名称                                      是否为空? 类型
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 COL1                                               NUMBER
 NAME                                               VARCHAR2(100)

SQL> select count(*) tcount, count(distinct col1) coldc_col1 from alex_t03;
    TCOUNT COLDC_COL1
---------- ----------
    100000         20

SQL> pro 索引理想选择度
SQL> select trunc((1/count(distinct col1))*100,2)||'%' col_ratio from alex_t03;
COL_RATIO
--------------------
5%

SQL> pro 索引实际选择度
SQL> select trunc((select count(*) from alex_t03 where col1=1)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual;
SELECT_RATIO
--------------------
5%

SQL> select table_name, index_name, column_name from user_ind_columns where table_name='ALEX_T03';
TABLE_NAME           INDEX_NAME           COLUMN_NAME
-------------------- -------------------- --------------------
ALEX_T03             PK_ALEX_T03          ID
ALEX_T03             IDX_ALEX_T03_COL1    COL1

3. 查询一下索引列col1=1

SQL> select * from alex_t03 where col1=1;
----------------------------------------------------------
Plan hash value: 577105205
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   |  5025 | 60300 |    26   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALEX_T03          |  5025 | 60300 |    26   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALEX_T03_COL1 |  5025 |       |    12   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

发现是和预期一样,走了索引的。

方案二:

1. 创建测试表

-- Create table
drop table alex_t03 purge;
create table alex_t03 (
id number, 
col1 number, 
name varchar2(100)
);

-- Create/Recreate primary, unique and foreign key constraints 
alter table alex_t03 add constraint pk_alex_t03 primary key (id) using index;
create index idx_alex_t03_col1 on alex_t03 (col1);

-- Insert table
declare
seq number:=1;
begin
for i in 1..&tabcnt loop
insert into alex_t03 values (i, seq, 'alex');
seq:=seq+1;
if mod(i,&discnt)=0 then 
seq:=1;
end if;
end loop;
commit;
end;
/

--Gather Statistics
exec dbms_stats.gather_table_stats('dbmgr','alex_t03')
exec dbms_stats.gather_index_stats('dbmgr','pk_alex_t03')
exec dbms_stats.gather_index_stats('dbmgr','idx_alex_t03_col1')

2. 查看测试表的信息

SQL> desc alex_t03
 名称                                      是否为空? 类型
 ----------------------------------------- -------- ----------------------------
 ID                                        NOT NULL NUMBER
 COL1                                               NUMBER
 NAME                                               VARCHAR2(100)

SQL> select count(*) tcount, count(distinct col1) coldc_col1 from alex_t03;
    TCOUNT COLDC_COL1
---------- ----------
    100000         20

SQL> pro 索引理想选择度
SQL> select trunc((1/count(distinct col1))*100,2)||'%' col_ratio from alex_t03;
COL_RATIO
--------------------
5%

SQL> pro 索引实际选择度
SQL> select trunc((select count(*) from alex_t03 where col1=1)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual;
SELECT_RATIO
--------------------
5%

SQL> select table_name, index_name, column_name from user_ind_columns where table_name='ALEX_T03';
TABLE_NAME           INDEX_NAME           COLUMN_NAME
-------------------- -------------------- --------------------
ALEX_T03             PK_ALEX_T03          ID
ALEX_T03             IDX_ALEX_T03_COL1    COL1

3. 查询一下索引列col1=1

SQL> select * from alex_t03 where col1=1;
----------------------------------------------------------
Plan hash value: 4180973395
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          |  4973 | 59676 |    87   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| ALEX_T03 |  4973 | 59676 |    87   (5)| 00:00:02 |
------------------------------------------------------------------------------

发现没有走索引,而是走了全表扫描。

总结分析:
(1)可以看到,两个方案的数据、索引、统计信息(包括直方图)都是一样的,按照预期应该不会出现执行计划偏差的;
(2)CBO优化器的算法其实是将所有操作转换成单块读,来累积计算成本的,所以两个方案COST不一样,导致执行计划不一样;
(3)方案一中,COL1=1的数据存储在相对集中的data block中,要查询COL1=1的数据,相对查询的data block较少,所以选择走索引,通过rowid返回数据;
(4)方案二中,则相反,要返回COL1=1的数据,几乎要查询所有的data block,故走了全表扫描。
综上所述,CBO判断数据分布是按照block的分布情况来选择的,而非数值的情况。故,在考察数据分布的时候,不仅要考察数值分布的情况,更需要考察数据实际存储block的分布情况。

Trackback

no comment untill now

Add your comment now

切换到手机版