Database Forum / Oracle / Oracle Server / February 2006
Performance problem with Oracle server
|
|
Thread rating:  |
bubblecat@gmail.com - 27 Feb 2006 18:05 GMT We're experiencing "general slowness" with our application which has an Oracle back-end. As far as possible, I've eliminated the front-end (Forms) and network as bottle necks. Now, I'm not a DBA, but I did run a query on the v@system_event table as follows:
SQL> col EVENT format a30 SQL> select EVENT, 2 TOTAL_WAITS, 3 TOTAL_TIMEOUTS, 4 round(TIME_WAITED/6000) MINS_WAITED, 5 round(AVERAGE_WAIT/100,2) SECS_PER_WAIT 6 from V$SYSTEM_EVENT 7 where round(AVERAGE_WAIT/100,2) > 0 8 and EVENT not in ('pmon timer','smon timer','rdbms ipc reply','rdbms ipc message','SQL*Net message from client') 9 order by decode(EVENT,'SQL*Net message from client',-1,TIME_WAITED) desc;
And here's the output:
EVENT TOTAL_WAITS TOTAL_TIMEOUTS MINS_WAITED SECS_PER_WAIT ------------------------------ ----------- -------------- ----------- ------------- db file sequential read 90866422 0 7638 .01 jobq slave wait 36588 34709 1835 3.01 buffer busy waits 15291768 296 1618 .01 PL/SQL lock timer 35445 35439 1202 2.03 enqueue 29681 20584 1125 2.27 latch free 3547660 1935540 903 .02 pipe get 4417 1802 76 1.03 log file sync 208723 6 45 .01 log file parallel write 380461 0 41 .01 write complete waits 2303 2302 39 1.02 log file sequential read 17030 0 7 .03 log file switch completion 2165 0 5 .13 library cache load lock 163 36 3 1.05 local write wait 287 131 2 .5 process startup 1298 3 1 .04 library cache pin 354 6 1 .11 row cache lock 19 9 1 1.8 SQL*Net message from dblink 165 0 0 .14 wait list latch free 252 0 0 .02 control file heartbeat 1 1 0 4.09 BFILE open 106 0 0 .02 BFILE closure 106 0 0 .01 log buffer space 31 0 0 .04 db file parallel read 45 0 0 .02 BFILE internal seek 106 0 0 .01 BFILE read 106 0 0 .01 control file single write 12 0 0 .01 switch logfile command 3 0 0 .03 checkpoint completed 1 0 0 .08 reliable message 1 0 0 .01 BFILE get path object 2 0 0 .01
31 rows selected.
(Paste into monospaced font to get nice columns.)
In my lay opinion, events jobq slave wait, buffer busy waits, PL/SQL lock timer,enqueue, and latch free seem to take up a lot of time relative to 'db file sequential read'.
Is my crude analysis correct? I'd like an expert opinion.
Thanks.
frank.van.bortel@gmail.com - 27 Feb 2006 19:00 GMT State platform and version of your backend.
Assuming you are running a supported version (9i or higher), install statspack, get a snapshot, experience your "general slowness", take another snapshot, report on those two snapshots.
Tip: do *not* exclude SQL*Net message from client - maybe it *is* interaction with Forms (e.g. filling drop down lists, etc). Also, db links are excluded in this way.
xhoster@gmail.com - 28 Feb 2006 03:08 GMT > We're experiencing "general slowness" with our application which has an > Oracle back-end. As far as possible, I've eliminated the front-end [quoted text clipped - 9 lines] > 6 from V$SYSTEM_EVENT > 7 where round(AVERAGE_WAIT/100,2) > 0 That is a pretty foolish thing to do. Just because the average wait time rounds to zero doesn't mean the totatl wait time isn't important.
> 8 and EVENT not in ('pmon timer','smon timer','rdbms ipc > reply','rdbms ipc message','SQL*Net message from client') [quoted text clipped - 19 lines] > latch free 3547660 1935540 903 > .02 ...
> (Paste into monospaced font to get nice columns.) > > In my lay opinion, events jobq slave wait, buffer busy waits, PL/SQL > lock timer, Both jobq slave wait and PL/SQL lock timer are idle waits, so generally they aren't relevant.
> enqueue, and latch free seem to take up a lot of time > relative to 'db file sequential read'. Minutes are minutes, and 7638 is more than 1618 + 1125 + 903.
Xho
 Signature -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB
Roman Klesel - 28 Feb 2006 07:26 GMT Hello,
bubblecat@gmail.com schrieb:
> We're experiencing "general slowness" with our application which has an > Oracle back-end. As far as possible, I've eliminated the front-end > (Forms) and network as bottle necks. Now, I'm not a DBA, but I did run > a query on the v@system_event table as follows: Doing performance tuning on this level is pretty hard. Have you checked the more obvious things before?
CPU load IO Load
Top SQL Top Sessions Invalid Objects
...
These are much more accessible. And if you find a problem here it is usually much easier fixed. (like danaged HD, broken indexes, missing stats, ...)
Greetings Roman
bubblecat@gmail.com - 28 Feb 2006 14:31 GMT OK, Frank/Xho/Roman, tks for your comments. Our dbase is 9.2.0.5.0.
I edited the query and ran it again:
SQL> col EVENT format a30 heading 'Event' SQL> col TOTAL_WAITS format 999,999,999 heading 'Waits' SQL> col TOTAL_TIMEOUTS format 999,999,999 heading 'Timeouts' SQL> col MINS_WAITED format 999,999 heading 'Total|Waited|(Minutes)' SQL> col SECS_PER_WAIT format 9,999.99 heading 'Average|Wait|(Seconds)' SQL> select EVENT, 2 TOTAL_WAITS, 3 TOTAL_TIMEOUTS, 4 round(TIME_WAITED/6000) MINS_WAITED, 5 round(AVERAGE_WAIT/100,2) SECS_PER_WAIT 6 from V$SYSTEM_EVENT 7 where EVENT not in ('pmon timer','smon timer','rdbms ipc reply','rdbms ipc message') 8 order by MINS_WAITED desc;
Total Average Waited Wait Event Waits Timeouts (Minutes) (Seconds) ------------------------------ ------------ ------------ --------- --------- SQL*Net message from client 82,948,635 0 315,437 .23 db file sequential read 125,360,178 0 9,652 .00 db file scattered read 74,566,347 0 4,255 .00 jobq slave wait 62,099 58,927 3,117 3.01 PL/SQL lock timer 59,312 59,305 2,011 2.03 buffer busy waits 18,059,575 346 1,882 .01 enqueue 32,270 22,079 1,202 2.24 latch free 4,511,028 2,748,645 1,113 .01 pipe get 6,342 2,919 112 1.06 log file parallel write 548,996 0 54 .01 log file sync 272,566 7 54 .01 write complete waits 2,339 2,337 40 1.02 SQL*Net more data to client 20,199,490 0 11 .00 log file sequential read 20,735 0 9 .02 log file switch completion 2,359 0 5 .13 library cache load lock 269 57 5 1.04 control file parallel write 53,151 0 4 .00 control file sequential read 55,311 0 3 .00 local write wait 341 131 2 .43 SQL*Net more data from client 223,086 0 2 .00 SQL*Net break/reset to client 51,140 0 2 .00 library cache pin 586 6 2 .17 process startup 2,320 3 1 .03 row cache lock 21 9 1 1.64 SQL*Net message to client 82,949,434 0 1 .00 LGWR wait for redo copy 19,891 2,882 1 .00 wait list latch free 260 0 0 .02 <...>
Frank - Re your point that 'SQL*Net message from client' event should be included, I'm not sure I follow this. Indeed this event comes out on top, but isn't that normal? You would get this event every time the dbase waits for the front-end, right? So assuming users don't respond at lightning speed and even go for coffee with the form still open, isn't it to be expected that this event accounts for most of the waiting time? Please explain if I misunderstand this event.
Xho - I know read time exceeds enqueue/latch time, but I mean, wouldn't the enqueue and latch times normally be lower on a properly tuned system, given the scattered/sequential read times? What is typical?
Roman - CPU is high, I/O and mem are fine. Of course we can add CPU's, but I'd like to know anything is amiss other than insufficent CPU power, before we go ahead with that.
Thanks all.
xhoster@gmail.com - 28 Feb 2006 16:07 GMT > Frank - Re your point that 'SQL*Net message from client' event should > be included, I'm not sure I follow this. Indeed this event comes out on [quoted text clipped - 3 lines] > isn't it to be expected that this event accounts for most of the > waiting time? Please explain if I misunderstand this event. At this level of granularity, the message from client event is pretty useless, as there is no way to distinguish harmless ones (end user is getting some coffee, whatever) from harmful ones (the end-user's client is constantly programmatically yacking back and forth with the server, suffering the death of a thousand small "idle" waits). But if you are tracing individual sessions, then it should not be pre-emptively dismissed.
If you can get the actual histogram of "message from client" wait times, then with that and a little intuition it is usually obvious where to draw the line between "Waiting for the end-user personally" and "Waiting for the end-user's client program". For example, ignore all waits more than 1 second, and see what the total is of the ones under 1 second each.
> Xho - I know read time exceeds enqueue/latch time, but I mean, wouldn't > the enqueue and latch times normally be lower on a properly tuned > system, given the scattered/sequential read times? I'll go out on a limb and say the proportion of enqueue waits have little to do with tuning. If the task your database is trying to accomplish is inherently contentious, or is contentious through bad design, then you will high enqueue waits; but this is at the level of design and/or reality, not the tuning level. If you have badly tuned DML, they can hold locks longer than they otherwise would, increasing enqueue waits. But that would also increase the other waits more or less proportionally. (However, it could be out of proportion if there are multiple thing queued up on the same resource. If one session holds a lock while it does 5 seconds of IO, and 4 sessions are blocked waiting for that row, this results in 5 seconds of IO wait and 20 seconds of enqueue wait.) So, assuming you can't redesign your applications at this point, focus on the sequential read time (which most likely means focus on the logical IO behind the reads) and enqueue will likely come down as well. For that matter, it is likely the latch waits and perhaps even the buffer busy waits will as well.
BTW, do you know if the buffer busy waits are mostly for data blocks, or header blocks, or what?
> What is typical? There is no typical. Your database needs to do what you need it do do for you, my database needs to do what I need it to do for me. That said, unless your task at hand is inherently highly contentious, the enqueue waits to do seem out of line to me.
Xho
 Signature -------------------- http://NewsReader.Com/ -------------------- Usenet Newsgroup Service $9.95/Month 30GB
bubblecat@gmail.com - 28 Feb 2006 14:40 GMT OK, Frank/Xho/Roman, tks for your comments. Our dbase is 9.2.0.5.0.
I edited the query based on what u said, and ran it again:
SQL> col EVENT format a30 heading 'Event' SQL> col TOTAL_WAITS format 999,999,999 heading 'Waits' SQL> col TOTAL_TIMEOUTS format 999,999,999 heading 'Timeouts' SQL> col MINS_WAITED format 999,999 heading 'Total|Waited|(Minutes)' SQL> col SECS_PER_WAIT format 9,999.99 heading 'Average|Wait|(Seconds)' SQL> select EVENT, 2 TOTAL_WAITS, 3 TOTAL_TIMEOUTS, 4 round(TIME_WAITED/6000) MINS_WAITED, 5 round(AVERAGE_WAIT/100,2) SECS_PER_WAIT 6 from V$SYSTEM_EVENT 7 where EVENT not in ('pmon timer','smon timer','rdbms ipc reply','rdbms ipc message') 8 order by MINS_WAITED desc;
Total Average Waited Wait Event Waits Timeouts (Minutes) (Seconds) ------------------------------ ------------ ------------ --------- --------- SQL*Net message from client 82,948,635 0 315,437 .23 db file sequential read 125,360,178 0 9,652 .00 db file scattered read 74,566,347 0 4,255 .00 jobq slave wait 62,099 58,927 3,117 3.01 PL/SQL lock timer 59,312 59,305 2,011 2.03 buffer busy waits 18,059,575 346 1,882 .01 enqueue 32,270 22,079 1,202 2.24 latch free 4,511,028 2,748,645 1,113 .01 pipe get 6,342 2,919 112 1.06 log file parallel write 548,996 0 54 .01 log file sync 272,566 7 54 .01 write complete waits 2,339 2,337 40 1.02 SQL*Net more data to client 20,199,490 0 11 .00 log file sequential read 20,735 0 9 .02 log file switch completion 2,359 0 5 .13 library cache load lock 269 57 5 1.04 control file parallel write 53,151 0 4 .00 control file sequential read 55,311 0 3 .00 local write wait 341 131 2 .43 SQL*Net more data from client 223,086 0 2 .00 SQL*Net break/reset to client 51,140 0 2 .00 library cache pin 586 6 2 .17 process startup 2,320 3 1 .03 row cache lock 21 9 1 1.64 SQL*Net message to client 82,949,434 0 1 .00 LGWR wait for redo copy 19,891 2,882 1 .00 wait list latch free 260 0 0 .02 <...>
Frank - Re your point that 'SQL*Net message from client' event should be included, I'm not sure I follow this. Indeed this event comes out on top, but isn't that normal? You would get this event every time the dbase waits for the front-end, right? So assuming users don't respond at lightning speed and even go for coffee with the form still open, isn't it to be expected that this event accounts for most of the waiting time? Please explain if I misunderstand this event.
Roman - CPU utilization is on the high side, I/O is fine, mem is fine. But what I'm trying to find out is if there is anything amiss other than insufficient CPU capacity, so that we can tune/optimize before we add more CPU's/mem/disks.
Xho - I know the read times exceed the latch/enqueue times, but what I meant was: Is it normal for latch/enqueue to be so high given the sequential/scattered read times? I mean, wouldn't latch/enqueue normally be lower for a properly tuned system?
Thanks all.
frank.van.bortel@gmail.com - 28 Feb 2006 19:30 GMT Xho pretty much said it all, and to the point.
One addition: I was thinking along the line of statspack: take a snapshot, let the db and programs run, take one more snapshot, *and compare the two snapshots*.
What you have here, is data from instance startup onwards; Xho used the phrase "at this level of granularity". Very politely put. I'd say: there is none.
Compare this (your) query (as of 10:00 this morning) to this query, as of 10:10 this morning.
Regards, Frank van Bortel
|
|
|