1. 问题背景:
一个Oracle数据库从HP-UX平台迁移到Solaris平台,由于数据库高并发写入的压力较大,导致LGWR进程写redo log过慢,需要分析研究一下优化方案。
HP-UX:redo log卷开启Direct IO;
Solaris:redo log卷开启Direct IO,系统和数据库均开始AIO。

2. 对比一下HP-UX与Solaris平台的LGWR写入性能。
(1)测试单表100万条数据插入过程的redo分析:

select b.NAME, a.VALUE
  from v$sysstat a, v$statname b
 where b.name in ('redo write time', 'redo log space requests',
        'redo log space wait time', 'messages sent',
        'redo entries', 'redo size', 'redo synch writes',
        'redo wastage', 'redo writes', 'redo blocks written')
   and a.STATISTIC# = b.STATISTIC#;

drop table alex;
create table alex(tab1_col number);

begin
  for i in 1 .. 1000000 loop
    insert into alex values (i);
  end loop;
  commit;
end;
/

select b.NAME, a.VALUE
  from v$sysstat a, v$statname b
 where b.name in ('redo write time', 'redo log space requests',
        'redo log space wait time', 'messages sent',
        'redo entries', 'redo size', 'redo synch writes',
        'redo wastage', 'redo writes', 'redo blocks written')
   and a.STATISTIC# = b.STATISTIC#;

在Oracle层面扑捉到的一些信息如下,Solaris的redo产生总量和wastage都有优势,但是写入redo的次数和block数反而更多了, Redo的写入时间更长,后台的message sent更多。

---------------------------------------------------
| 	                      |   HP-UX	  |  Solaris  |
--------------------------------------|------------
|messages sent	          |       297	|      1178 |
|redo synch writes	      |         3	|         3 |
|redo entries	            |   1006690	|   1007117 |
|redo size	              | 265153640	| 245097116 |
|redo wastage             |    139096	|     58704 |
|redo writes	            |       285	|       306 |
|redo blocks written	    |    263260	|    500614 |
|redo write time	        |       180	|       320 |
|redo log space requests	|         0	|         0 |
|redo log space wait time	|         0	|         0 |
---------------------------------------------------

可以推断Solaris平台上,redo block size较小,写入效率也差,写redo的时候,要求的后台进程交互也更多。
但是,同样的事务量,Solaris比HP-UX产生的redo量(redo size + redo wastage)较小。

(2)Redo Block Size的分析
Redo block的大小,由操作系统决定,可以通过select max(lebsz) from x$kccle得到。

redo log block size = 16 + (redo size + redo wastage) / redo blocks written。
  HP-UX:redo log block size = 16+(265153640+139096)/263260≈1024
Solaris: redo log block size = 16+( 245097116+58704)/500614≈512

Redo log block size的大小取决于操作系统的最小RAW I/O单元,部分操作系统取决于操作系统的最小文件系统I/O单元,主流操作系统如下表:

---------------------------------------------------------------------------------------------------------------------
|  Log Block Size  |	      Operating Systems                                                                       |
|------------------|-------------------------------------------------------------------------------------------------
|   512 bytes	     |     Solaris, AIX, Windows NT/2000, Linux, Irix, DG/UX, OpenVMS, NetWare, UnixWare, DYNIX/ptx   |
|   1024 bytes	   |     HP-UX, Tru64 Unix                                                                          |
|   2048 bytes	   |     SCO Unix, Reliant Unix                                                                     |
|   4096 bytes	   |     MVS, MPE/ix                                                                                |
---------------------------------------------------------------------------------------------------------------------

从oracle层面数据的那个表格看,solaris的redo block较小,solaris 512bytes,HP 1024bytes。
那么当有1025 byte的redo buffer需要写入的时候,
Solaris下需要写入3次,redo wastage有511 bytes;HP下需要写2次,redo wastage有1023 bytes,符合表格的情况。
写的次数多,IO交互多,且后台进程之间的message sent多,通信交互多,导致性能慢,也是必然的。
这样的话,对于解决log file sync之类的redo log file写入方面的问题,Solaris处理能力是要比HP-UX差了。

3. 通过OS层面LGWR进程的trace来分析一下。
(1)HP-UX平台,通过tust跟踪分析:

( Attached to process 10631 ("ora_lgwr_perftest") [64-bit] )
12:40:35 [10631] pw_wait(0x9fffffffffffcae0) .......................................................... [sleeping]
12:40:35 [10631] (0.182779) pw_wait(0x9fffffffffffcae0) ............................................... ERR#11 EAGAIN
12:40:37 [10631] pw_wait(0x9fffffffffffcae0) .......................................................... [sleeping]
12:40:37 [10631] (2.023364) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000020) lseek(26218, 437163008, SEEK_SET) ......................................... = 437163008
12:40:37 [10631] (0.003642) write(26218, "01B \0\0\00683a5\0\0\02 8010T 1b".., 3072) .................. = 3072
12:40:37 [10631] (0.000018) gettimeofday(0x9fffffffffffcdd0, NULL) .................................... = 0
12:40:37 [10631] (0.000017) lseek(26218, 437166080, SEEK_SET) ......................................... = 437166080
12:40:37 [10631] (0.000304) write(26218, "01B \0\0\00683a8\0\0\02 8010af9d".., 2048) .................. = 2048
12:40:37 [10631] (0.000797) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000016) lseek(26218, 437168128, SEEK_SET) ......................................... = 437168128
12:40:37 [10631] (0.000472) write(26218, "01B \0\0\00683aa\0\0\02 8010o e1".., 1024) .................. = 1024
12:40:37 [10631] (0.000020) pw_post(1530765251985982590) .............................................. = 0
12:40:37 [10631] (0.030276) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000023) lseek(26218, 437169152, SEEK_SET) ......................................... = 437169152
12:40:37 [10631] (0.001414) write(26218, "01B \0\0\00683ab\0\0\02 8010( d6".., 5120) .................. = 5120
12:40:37 [10631] (0.004568) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000024) lseek(26218, 437174272, SEEK_SET) ......................................... = 437174272
12:40:37 [10631] (0.000418) write(26218, "01B \0\0\00683b0\0\0\02 8010p \a".., 5120) .................. = 5120
12:40:37 [10631] (0.000020) pw_post(1530765251985982590) .............................................. = 0
12:40:37 [10631] (0.080157) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000021) lseek(26218, 437179392, SEEK_SET) ......................................... = 437179392
12:40:37 [10631] (0.000719) write(26218, "01B \0\0\00683b5\0\0\02 8010: c2".., 25600) ................. = 25600
12:40:37 [10631] (0.098210) pw_wait(0x9fffffffffffcae0) ............................................... = 0
12:40:37 [10631] (0.000023) lseek(26218, 437204992, SEEK_SET) ......................................... = 437204992
12:40:37 [10631] (0.010094) write(26218, "01B \0\0\00683ce\0\0\02 8010cdf ".., 1048576) ............... = 1048576
……
……

(2)Solaris平台,通过truss跟踪分析:

3305:	psargs: ora_lgwr_perftest
3305/2:		kaio(6, 0x00000000, 0xFFFFFFFF7AB4A300, 0x00000010, 0xFFFFFFFF7AB4BF98, 0xFFFFFFFF7B900A00) (sleeping...)
3305/8:		lwp_park(0x00000000, 0)		(sleeping...)
3305/4:		lwp_park(0x00000000, 0)		(sleeping...)
3305/6:		lwp_park(0x00000000, 0)		(sleeping...)
3305/7:		lwp_park(0x00000000, 0)		(sleeping...)
3305/1:		semtimedop(654311517, 0xFFFFFFFF7FFFDB04, 1, 0xFFFFFFFF7FFFDAF0) (sleeping...)
3305/11:	lwp_park(0x00000000, 0)		(sleeping...)
3305/10:	lwp_park(0x00000000, 0)		(sleeping...)
3305/3:		lwp_park(0x00000000, 0)		(sleeping...)
3305/5:		lwp_park(0x00000000, 0)		(sleeping...)
3305/9:		lwp_park(0x00000000, 0)		(sleeping...)
3305/1:		 2.8592	semtimedop(654311517, 0xFFFFFFFF7FFFDB04, 1, 0xFFFFFFFF7FFFDAF0) Err#11 EAGAIN
3305/1:		 0.0003	times(0xFFFFFFFF7FFFDCD0)			= 782024163
3305/1:		 0.0001	times(0xFFFFFFFF7FFFDC20)			= 782024163
3305/1:		semtimedop(654311517, 0xFFFFFFFF7FFFDB04, 1, 0xFFFFFFFF7FFFDAF0) (sleeping...)
3305/1:		 2.9513	semtimedop(654311517, 0xFFFFFFFF7FFFDB04, 1, 0xFFFFFFFF7FFFDAF0) = 0
3305/1:		 0.0282	lwp_unpark(5)					= 0
3305/5:		 5.8391	lwp_park(0x00000000, 0)				= 0
3305/5:		 0.0021	pwrite(260, "01 "\0\0\002E3 E\0\0\0 &".., 5120, 0x05C68A00) = 5120
3305/5:		 0.8412	kaio(AIONOTIFY, 1)				= 0
3305/1:		 0.8433	kaio(AIOWAIT, 0xFFFFFFFF7FFFD600)		= 1
3305/1:		 0.0004	write(2, " * * *   2 0 1 3 - 1 0 -".., 27)	= 27
3305/1:		 0.0004	write(2, "\n", 1)				= 1
3305/1:		 0.0008	write(2, " W a r n i n g :   l o g".., 39)	= 39
3305/1:		 0.0003	write(2, "\n", 1)				= 1
3305/1:		 0.0005	semctl(654311517, 24, SETVAL, 1)		= 0
3305/1:		 0.0326	semtimedop(654311517, 0xFFFFFFFF7FFFDB04, 1, 0xFFFFFFFF7FFFDAF0) = 0
3305/1:		 0.0002	lwp_unpark(7)					= 0
3305/7:		 6.7176	lwp_park(0x00000000, 0)				= 0
3305/7:		 0.0018	pwrite(260, "01 "\0\0\002E3 O\0\0\0 &".., 7168, 0x05C69E00) = 7168
……
……

(3)LGWR进程跟踪日志:
跟踪LGWR进程,HP-UX物理写入次数为539次,Solaris物理写入次数为408次。
但是Solaris日志可以看到写warning:log,如果加上这部分,有936次写操作。

3305/1:             0.0003     write(2, " * * *   2 0 1 3 - 1 0 -".., 27)       = 27
3305/1:             0.0005     write(2, "\n", 1)                                   = 1
3305/1:             0.0008     write(2, " W a r n i n g :   l o g".., 43)        = 43
3305/1:             0.0004     write(2, "\n", 1)                                   = 1

至于这部分write是在写什么呢?

TIPS:
- 从10.2.0.3或者10.2.0.4版本开始,在实际日志写入过慢时,LGWR进程会dump出一些有用的信息,如单次写的时间,单位为毫秒。
- 相关控制参数: _side_channel_batch_timeout_ms
LGWR进程trace文件:
*** 2010-03-10 11:36:06.759
Warning: log write time 690ms, size 19KB
*** 2010-03-10 11:37:23.778
Warning: log write time 52710ms, size 0KB
*** 2010-03-10 11:37:27.302
Warning: log write time 3520ms, size 144KB

由此可以判断,Solaris平台的LGWR效率差是因为redo log卷的存储性能差。

4. 总结:
(1)redo log block的大小由操作系统类型决定;
(2)同样的事务量,Solaris比HP-UX产生的redo量(redo size + redo wastage)较小;
(3)从Oracle数据库层面来分析,对于日志写入压力较大(log file sysnc较严重)的系统,写入同样的数据量时,redo log block越小,效率越差,写入次数多了,后台交互也多了;
(4)但是从OS层面来看并非如此,redo log block小的,其实际物理写的次数也可能较少,效率较好。

Trackback

no comment untill now

Add your comment now

切换到手机版