接着上篇BLOG的情况,接下去继续分析数据分布情况对索引的影响。

1. 在上例中数据分布是均匀的,我们现在来制造一些数据分布的倾斜,考察一下情况。

#############################################################################
# 制造索引列数据分布倾斜
#############################################################################
SQL> select col1,count(*) from alex_t03 group by col1 order by 1;
      COL1   COUNT(*)
---------- ----------
         1      10000
         2      10000
         3      10000
         5      50000
         9      10000
        10      10000

已选择6行。

SQL> pro 索引理想选择度
SQL> select trunc((1/count(distinct col1))*100,2)||'%' col_ratio from alex_t03;
COL_RATIO
--------------------
16.66%

SQL> pro 索引实际选择度(col1=1)
SQL> select trunc((select count(*) from alex_t03 where col1=1)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual;
SELECT_RATIO
--------------------
10%

SQL> pro 索引实际选择度(col1=5)
SQL> select trunc((select count(*) from alex_t03 where col1=5)/(select count(*) from alex_t03)*100,2)||'%' select_ratio from dual;
SELECT_RATIO
--------------------
50%

上述看到,数据分布倾斜后,索引的理想选择度和实际选择度不一样了,CBO会如何选择呢?

2. 使用旧有的统计信息,不重新收集,发现CBO默认还是理想的均匀数据分布的算法。

SQL> select * from alex_t03 where col1=1;
----------------------------------------------------------
Plan hash value: 577105205
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   | 10013 |   117K|    53   (2)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALEX_T03          | 10013 |   117K|    53   (2)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALEX_T03_COL1 | 10013 |       |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

SQL> select * from alex_t03 where col1=5;
----------------------------------------------------------
Plan hash value: 577105205
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   | 10013 |   117K|    53   (2)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALEX_T03          | 10013 |   117K|    53   (2)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALEX_T03_COL1 | 10013 |       |    26   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

3. 重新收集统计信息,但不构建直方图。

#############################################################################
# 重新收集统计信息(不构建直方图)
#############################################################################
SQL> select * from alex_t03 where col1=1;
----------------------------------------------------------
Plan hash value: 4180973395
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 16649 |   195K|    87   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| ALEX_T03 | 16649 |   195K|    87   (5)| 00:00:02 |
------------------------------------------------------------------------------

SQL> select * from alex_t03 where col1=5;
----------------------------------------------------------
Plan hash value: 4180973395
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 16649 |   195K|    87   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| ALEX_T03 | 16649 |   195K|    87   (5)| 00:00:02 |
------------------------------------------------------------------------------

发现不论实际选择度=10%的COL1=1的情况,还是实际选择度=50%的COL1=5的情况,都走了全表扫描。
这样对COL1=5的情况固然没有问题,但对COL1=1的情况,就不对了。

4. 再重新收集统计信息,这次构建直方图。

#############################################################################
# 重新收集统计信息(构建直方图)
#############################################################################
SQL> select * from alex_t03 where col1=1;
----------------------------------------------------------
Plan hash value: 577105205
-------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                   | 10345 |   121K|    59   (2)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| ALEX_T03          | 10345 |   121K|    59   (2)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IDX_ALEX_T03_COL1 | 10461 |       |    31   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------

SQL> select * from alex_t03 where col1=5;
----------------------------------------------------------
Plan hash value: 4180973395
------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 48527 |   568K|    87   (5)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| ALEX_T03 | 48527 |   568K|    87   (5)| 00:00:02 |
------------------------------------------------------------------------------

这次,我们发现实际选择度=10%的COL1=1的情况,走了索引扫描,实际选择度=50%的COL1=5的情况,走了全表扫描,这样就对了。

5. 总结
(1)在缺少直方图的情况下,对于索引列的数据分布,尽可能保证其均匀分布,否则会执行计划偏差。
(2)由于收集直方图成本较高,一般情况下不建议收集。
(3)索引列的数据分布在设计之初,尽可能保证较均匀分布,即理想选择度和实际选择度尽可能相等。

Trackback

no comment untill now

Add your comment now

切换到手机版