设计两个方案,表结构、数据量、数据分布、索引情况、统计信息等都一样,只有数据入库的顺序不一样,如下:
方案一说明:插入测试表的数据顺序为“111…111222…222333…333…”
方案二说明:插入测试表的数据顺序为“123…123…123…123…”
方案一:
1. 创建测试表
-- Create table drop table alex_t03 purge; create table alex_t03 ( id number, col1 number, name varchar2(100) ); -- Create/Recreate primary, unique and foreign key constraints alter table alex_t03 add constraint pk_alex_t03 primary key (id) using index; create index idx_alex_t03_col1 on alex_t03 (col1); -- Insert table declare seq number:=1; begin for i in 1..&tabcnt loop insert into alex_t03 values (i, seq, 'alex'); if mod(i,trunc(&tabcnt/&discnt))=0 then seq:=seq+1; end if; end loop; commit; end; / --Gather Statistics exec dbms_stats.gather_table_stats('dbmgr','alex_t03') exec dbms_stats.gather_index_stats('dbmgr','pk_alex_t03') exec dbms_stats.gather_index_stats('dbmgr','idx_alex_t03_col1')
2. 查看测试表的信息
SQL> desc alex_t03 名称 是否为空? 类型 ----------------------------------------- -------- ---------------------------- ID NOT NULL NUMBER COL1 NUMBER NAME VARCHAR2(100) SQL> select count(*) tcount, count(distinct col1) coldc_col1 from alex_t03; TCOUNT COLDC_COL1 ---------- ---------- 100000 20 SQL> pro 索引理想选择度 SQL> select trunc((1/count(distinct col1))*100,2)||'%' col_ratio from alex_t03; COL_RATIO -------------------- 5% SQL> pro 索引实际选择度 SQL> select trunc((select count(*) from alex_t03 where col1=1)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual; SELECT_RATIO -------------------- 5% SQL> select table_name, index_name, column_name from user_ind_columns where table_name='ALEX_T03'; TABLE_NAME INDEX_NAME COLUMN_NAME -------------------- -------------------- -------------------- ALEX_T03 PK_ALEX_T03 ID ALEX_T03 IDX_ALEX_T03_COL1 COL1
3. 查询一下索引列col1=1
SQL> select * from alex_t03 where col1=1; ---------------------------------------------------------- Plan hash value: 577105205 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5025 | 60300 | 26 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| ALEX_T03 | 5025 | 60300 | 26 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_ALEX_T03_COL1 | 5025 | | 12 (0)| 00:00:01 | -------------------------------------------------------------------------------------------------
发现是和预期一样,走了索引的。
方案二:
1. 创建测试表
-- Create table drop table alex_t03 purge; create table alex_t03 ( id number, col1 number, name varchar2(100) ); -- Create/Recreate primary, unique and foreign key constraints alter table alex_t03 add constraint pk_alex_t03 primary key (id) using index; create index idx_alex_t03_col1 on alex_t03 (col1); -- Insert table declare seq number:=1; begin for i in 1..&tabcnt loop insert into alex_t03 values (i, seq, 'alex'); seq:=seq+1; if mod(i,&discnt)=0 then seq:=1; end if; end loop; commit; end; / --Gather Statistics exec dbms_stats.gather_table_stats('dbmgr','alex_t03') exec dbms_stats.gather_index_stats('dbmgr','pk_alex_t03') exec dbms_stats.gather_index_stats('dbmgr','idx_alex_t03_col1')
2. 查看测试表的信息
SQL> desc alex_t03 名称 是否为空? 类型 ----------------------------------------- -------- ---------------------------- ID NOT NULL NUMBER COL1 NUMBER NAME VARCHAR2(100) SQL> select count(*) tcount, count(distinct col1) coldc_col1 from alex_t03; TCOUNT COLDC_COL1 ---------- ---------- 100000 20 SQL> pro 索引理想选择度 SQL> select trunc((1/count(distinct col1))*100,2)||'%' col_ratio from alex_t03; COL_RATIO -------------------- 5% SQL> pro 索引实际选择度 SQL> select trunc((select count(*) from alex_t03 where col1=1)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual; SELECT_RATIO -------------------- 5% SQL> select table_name, index_name, column_name from user_ind_columns where table_name='ALEX_T03'; TABLE_NAME INDEX_NAME COLUMN_NAME -------------------- -------------------- -------------------- ALEX_T03 PK_ALEX_T03 ID ALEX_T03 IDX_ALEX_T03_COL1 COL1
3. 查询一下索引列col1=1
SQL> select * from alex_t03 where col1=1; ---------------------------------------------------------- Plan hash value: 4180973395 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 4973 | 59676 | 87 (5)| 00:00:02 | |* 1 | TABLE ACCESS FULL| ALEX_T03 | 4973 | 59676 | 87 (5)| 00:00:02 | ------------------------------------------------------------------------------
发现没有走索引,而是走了全表扫描。
总结分析:
(1)可以看到,两个方案的数据、索引、统计信息(包括直方图)都是一样的,按照预期应该不会出现执行计划偏差的;
(2)CBO优化器的算法其实是将所有操作转换成单块读,来累积计算成本的,所以两个方案COST不一样,导致执行计划不一样;
(3)方案一中,COL1=1的数据存储在相对集中的data block中,要查询COL1=1的数据,相对查询的data block较少,所以选择走索引,通过rowid返回数据;
(4)方案二中,则相反,要返回COL1=1的数据,几乎要查询所有的data block,故走了全表扫描。
综上所述,CBO判断数据分布是按照block的分布情况来选择的,而非数值的情况。故,在考察数据分布的时候,不仅要考察数值分布的情况,更需要考察数据实际存储block的分布情况。
no comment untill now