Database Forum / Oracle / Oracle Server / May 2005

iostat - multiblock read count

utkanbir - 18 May 2005 09:47 GMT
Hi,

My system is a two-node RAC on Red Hat Linux 2.1, with 4 ia64 CPUs,
8 GB RAM, OCFS and EMC RAID 10.

The db_block_size is 16 KB and the current
db_file_multiblock_read_count is 64 (which makes 1 MB per read).

Below are some samples for different multiblock read counts. It seems
that when I decrease the multiblock read count, performance increases:

Sample query :

select /*+parallel(m,8)*/count(*) from taniadm.MERKEZ_CIKIS_34 m

The table is stored in a locally managed tablespace. The tablespace
uses uniform extent sizes of 10 MB. (I have just created it for this
test, and chose this large extent size to allow the database server to
use multiblock reads effectively.)

parallel 8 , 64 multiblock takes : 1:07 min.

Device:     rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdh1      43890.20    0.00   896.80  1.40  50860.00    1.40     56.63    119.07  131.83   1.09  100.00

parallel 8 , 32 multiblock takes : 59 sec

Device:     rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdh1      56885.00    0.00  1051.60  1.20  58651.60    1.20     55.71    138.40  130.24   0.93  100.00

parallel 8 , 16 multiblock takes : 45 secs

Device:     rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdh1      66734.20    0.00  1347.80  1.80  76781.60    1.80     56.89     48.62   36.00   0.72  100.00

parallel 8 , 8 multiblock takes: 44 sec

Device:     rrqm/s  wrqm/s      r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz   await  svctm   %util
sdh1      66598.40    0.00  1371.40  2.00  77467.20    2.00     56.41     24.22   17.66   0.71  100.02
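
One way to read the queue figures in the four samples above is Little's law: the average number of requests in flight should be roughly the request rate multiplied by the time each request spends in the system. The sketch below only replays the posted numbers; the dictionary keys and the function name are made up for illustration, and it assumes iostat's await is in milliseconds.

```python
# iostat -x figures copied from the four runs above (device sdh1).
# Keys are the multiblock read count used for each run.
samples = {
    64: {"r_s": 896.80, "w_s": 1.40, "avgqu_sz": 119.07, "await_ms": 131.83},
    32: {"r_s": 1051.60, "w_s": 1.20, "avgqu_sz": 138.40, "await_ms": 130.24},
    16: {"r_s": 1347.80, "w_s": 1.80, "avgqu_sz": 48.62, "await_ms": 36.00},
    8: {"r_s": 1371.40, "w_s": 2.00, "avgqu_sz": 24.22, "await_ms": 17.66},
}


def estimated_queue_depth(sample):
    """Little's law: requests in flight = request rate x time per request."""
    iops = sample["r_s"] + sample["w_s"]
    return iops * sample["await_ms"] / 1000.0


for mbrc, s in samples.items():
    est = estimated_queue_depth(s)
    print(f"mbrc={mbrc:2d}  reported avgqu-sz={s['avgqu_sz']:7.2f}  "
          f"iops*await={est:7.2f}")
```

For all four runs the estimate lands within about 1% of the reported avgqu-sz, so the columns are at least internally consistent: the smaller multiblock settings complete slightly more requests per second while holding a much shorter queue.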

1. It seems that when I decrease the multiblock read count parameter,
rsec/s increases, but I expected to see the opposite. What is wrong
here?

2. Are the values in r/s normal, or do they indicate that my disks are
saturated? I have read an article explaining that 100 or 200 reads per
second is enough to saturate a disk, but here I see much larger values.

3. Related to my second question: since I use RAID 10 (striping +
mirroring), is it possible to get higher I/O rates? Looking at the
iostat values, (rsec/s) / (r/s) comes to about 56 sectors, which is
roughly 28 KB per read. That seems very low to me. On Sun Solaris with
a UFS file system, for instance, I can achieve 128 KB or even 1 MB per
read by tuning the parameters. I don't understand why my Linux box is
different.
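
The per-read size quoted in question 3 can be recomputed from the posted samples. This is a minimal sketch that assumes iostat's rsec/s counts 512-byte sectors; the names `runs`, `avg_read_kb` and `throughput_mb_s` are illustrative only.

```python
SECTOR_BYTES = 512  # assumption: iostat reports rsec/s in 512-byte sectors

# multiblock read count -> (rsec/s, r/s), copied from the samples above
runs = {
    64: (50860.00, 896.80),
    32: (58651.60, 1051.60),
    16: (76781.60, 1347.80),
    8: (77467.20, 1371.40),
}


def avg_read_kb(rsec_s, r_s):
    """Average size of one read request reaching the device, in KB."""
    return rsec_s / r_s * SECTOR_BYTES / 1024


def throughput_mb_s(rsec_s):
    """Read throughput in MB/s."""
    return rsec_s * SECTOR_BYTES / (1024 * 1024)


for mbrc, (rsec_s, r_s) in runs.items():
    print(f"mbrc={mbrc:2d}  {avg_read_kb(rsec_s, r_s):5.2f} KB/read  "
          f"{throughput_mb_s(rsec_s):5.2f} MB/s")
```

Every run comes out at about 28 KB per read regardless of the multiblock setting, while throughput rises from roughly 25 MB/s at 64 to roughly 38 MB/s at 8.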

Any help will be appreciated.
Kind Regards,
tolga
DA Morgan - 18 May 2005 19:31 GMT
> Hi ,
>
[quoted text clipped - 63 lines]
> Kind Regards,
> tolga

To answer your questions really requires a StatsPack or AWR report.

Upgrading to Red Hat 3 will likely improve things substantially.

Depending on your hardware you could look at Ethernet bonding to
increase throughput. It depends on what the limiting factor is.
Signature

Daniel A. Morgan
http://www.psoug.org
damorgan@x.washington.edu
(replace x with u to respond)

chao_ping - 19 May 2005 06:04 GMT
One possible reason could be your global cache management. If you
shut down one node, the result may be different.

One question: is your OCFS doing direct I/O reads? Otherwise the
filesystem cache can mask your results.
utkanbir - 23 May 2005 08:04 GMT
Hi Chao ,

I have checked the statspack output for this query. In the raw trace
file I see lots of 'global cache cr request' wait events, but the time
they take is very small compared to the disk read events:

select /*+NOPARALLEL(M) */count(*)
from
taniadm.MERKEZ_CIKIS_34 m  

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        3      0.00       0.01          1          1          0           0
Execute      3      0.00       0.00          0          0          0           0
Fetch        3     61.10     767.28     307181     307209          0           3
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        9     61.10     767.30     307182     307210          0           3

Misses in library cache during parse: 1
Optimizer goal: CHOOSE
Parsing user id: 46

Rows     Row Source Operation
-------  ---------------------------------------------------
       1  SORT AGGREGATE (cr=102403 r=102399 w=0 time=262103077 us)
10133769   TABLE ACCESS FULL MERKEZ_CIKIS_34 (cr=102403 r=102399 w=0 time=257079273 us)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                       8        0.00          0.00
  SQL*Net message from client                     8       78.11        173.33
  global cache cr request                    154082        0.12          6.73
  db file scattered read                      23984        1.01        707.97
  latch free                                      6        0.03          0.05
  SQL*Net break/reset to client                   2        0.00          0.00
  library cache lock                              4        0.00          0.00
  db file sequential read                         3        0.01          0.02
*******************************************************************************

Here the number of waits for global cache cr request is large, but the
total time waited is very small. The majority of the query time is
spent in disk I/O.
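
The trace figures above can be turned into per-wait averages to make that comparison concrete. This is only a sketch over the posted numbers: the variable names are illustrative, the 16 KB block size is taken from the first post, and the blocks-per-read figure ignores the three single-block reads.

```python
DB_BLOCK_KB = 16            # db_block_size from the first post
fetch_elapsed_s = 767.28    # "elapsed" column of the Fetch line
disk_blocks_read = 307181   # "disk" column of the Fetch line (3 executions)

# event name -> (times waited, total seconds waited), from the trace above
waits = {
    "global cache cr request": (154082, 6.73),
    "db file scattered read": (23984, 707.97),
}


def avg_wait_ms(times_waited, total_waited_s):
    """Average duration of one wait, in milliseconds."""
    return total_waited_s / times_waited * 1000.0


scattered_count, scattered_total_s = waits["db file scattered read"]
blocks_per_read = disk_blocks_read / scattered_count
io_share = scattered_total_s / fetch_elapsed_s

for event, (count, total_s) in waits.items():
    print(f"{event:26s} {avg_wait_ms(count, total_s):8.3f} ms per wait")
print(f"blocks per scattered read: {blocks_per_read:.1f} "
      f"(~{blocks_per_read * DB_BLOCK_KB:.0f} KB)")
print(f"scattered-read share of fetch time: {io_share:.0%}")
```

On these numbers a multiblock read waits about 29.5 ms on average against about 0.04 ms for a global cache request, each multiblock read returns roughly 12.8 blocks (about 205 KB), and multiblock reads account for a little over 90% of the fetch time.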

For direct I/O, I checked the Oracle executables by stracing them
(especially the open system calls) and saw the O_DIRECT flag. Also:

filesystemio_options is set to none. (I was told it is not necessary
to set it for OCFS, since OCFS uses direct I/O without this parameter.)

Kind Regards,

> One possible reason could be your global cache management. If you
> shut down one node, the result may be different.
>
> One question: is your OCFS doing direct I/O reads? Otherwise the
> filesystem cache can mask your results.
Noons - 23 May 2005 08:44 GMT
> My system is a two node rac on redhat linux 2.1 , 4 ia64 cpus ,
> 8gb.ram , ocfs and emc raid 10

time to move to RH3?  ;)

> The db_block_size is 16kb. , current multi_block_read_Count is 64
> (which makes 1mb. of read)

can mean nothing in Linux, read on...

> 1.It seems when i decrease the multiblock read count parameter , the
> rsec/s increases , but i expect to see the opposite.What's wrong with
> this?

I've got a funny feeling you just hit the 32K default Linux I/O limit.
You see, until kernel release 2.6 (or patched 2.4), Linux will
"secretly" transform any single I/O request for more than 32K bytes
into as many 32K requests as needed.  This takes time and physical
overhead from the disk controller(s).   When you reduce the dbfmr,
you reduce this overhead and paradoxically (my my, what a long
word for "D'uh!"...) you end up with a little more r/s. Read on.
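
To put rough numbers on the splitting described above, here is a small sketch. It takes the 32 KB figure at face value as an assumption (it is the hypothesis in this post, not something verified here) and uses the 16 KB block size from the original question; the function name is made up.

```python
import math

DB_BLOCK_KB = 16  # db_block_size from the original post
SPLIT_KB = 32     # assumed per-request limit described in this post


def pieces_per_read(mbrc, block_kb=DB_BLOCK_KB, split_kb=SPLIT_KB):
    """Return (request size in KB, number of device requests after splitting)."""
    request_kb = mbrc * block_kb
    return request_kb, math.ceil(request_kb / split_kb)


for mbrc in (64, 32, 16, 8):
    request_kb, pieces = pieces_per_read(mbrc)
    print(f"mbrc={mbrc:2d}: one {request_kb:4d} KB read -> {pieces:2d} requests "
          f"of at most {SPLIT_KB} KB")
```

Under that assumption a single 1 MB multiblock read at a setting of 64 becomes 32 device requests, while a setting of 8 produces only 4 per read, which is consistent with the roughly constant 28 KB request size seen in the iostat samples.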

> 3. Related to my second question , since i use raid 10 (striping +
> mirroring ) is it possible to get higher io rates ? Looking at the
[quoted text clipped - 3 lines]
> per read by playing with the parameters , i dont understand why my
> linux box is different.

There is a patch for RHAS at Oracle Metalink that gets rid of this
32K limitation.  It applies AFAIK only to 2.4.21 onwards, until
RHAS4 whereupon the 2.6 kernel takes over and it's not a problem
anymore.  However I'm not sure if this patch is compatible with
anything under Oracle 10g, so CHECK first with support.

If this is your problem you need first to upgrade to the adequate
level of RHAS3, *then* apply the Oracle patch.  You'll need to
request it from Oracle support themselves.

Go here:
http://www.oracle.com/technology/deploy/availability/pdf/ora_lcs.pdf
for all the nasty details.
HTH
chao_ping - 24 May 2005 10:28 GMT
Hi Noons,
   This is the first I have heard of this:
>>I've got a funny feeling you just hit the 32K default Linux I/O limit.
>>You see, until kernel release 2.6 (or patched 2.4), Linux will
>>"secretly" transform any single I/O request for more than 32K bytes
>>into as many 32K requests as needed
  Can you provide some details about this? For example, a Metalink ID,
or a URL about the Linux kernel that covers it.

And Utkanbir, since this is a test, can you shut down one node and
run your test again?
Thanks
Noons - 24 May 2005 12:53 GMT
chao_ping apparently said,on my timestamp of 24/05/2005 7:29 PM:

>>>You see, until kernel release 2.6 (or patched 2.4), Linux will
>>>"secretly" transform any single I/O request for more than 32K bytes
[quoted text clipped - 5 lines]
> And Utkanbir, since it is for test, can you shutdown one node and
> perform your test again?

That's funny: somehow Utkanbir's reply never made it
to my server.  I wonder why...

Anyways: see bug 4039598 (which is not really a bug...)
and Red Hat's Bugzilla problem 148838.
This doc also has some info on it:
http://www.oracle.com/technology/deploy/availability/pdf/ora_lcs.pdf
And never hearing about it doesn't mean it's incorrect: it just
means nobody told you before the simple facts.  Now you know.
;)

Signature

Cheers
Nuno Souto
in sunny Sydney, Australia
wizofoz2k@yahoo.com.au.nospam

Fabrizio - 24 May 2005 17:32 GMT
> I've got a funny feeling you just hit the 32K default Linux I/O limit.
> You see, until kernel release 2.6 (or patched 2.4), Linux will
[quoted text clipped - 3 lines]
> you reduce this overhead and paradoxically (my my, what a long
> word for "D'uh!"...) you end up with a little more r/s. Read on.

Is this a Redhat limit?

None of my distributions seem to have a 32K I/O limit, even on the now
old-fashioned 2.4 kernel.

Here is an example from a SLES8:

# dd if=/dev/zero of=/boot/foo bs=4096k count=10
10+0 records in
10+0 records out

Device:    rrqm/s    wrqm/s   r/s     w/s  rsec/s    wsec/s  rkB/s     wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
/dev/sda1    0.00  40636.00  0.00  322.00    0.00  81920.00   0.00  40960.00    254.41    222.40  687.58  24.84  80.00

The iostat output shows 322 real calls to the device (while 40636
requests were merged into these 322 thanks to the asynch i/o).

40960.00 K / 322 = 127.20 K

That is almost 128 K, which may be the limit of my physical device.

(by the way: redhat bug 148838 speaks about a 128K limit on qlogic).

Regards

Signature

Fabrizio Magni

fabrizio.magni@mycontinent.com

replace mycontinent with europe

Noons - 25 May 2005 03:13 GMT
> Is this a Redhat limit?

AFAIK, yes.  That's why it is in the RedHat Bugzilla.
It's not really a limit: just a default design decision of
the 2.4 kernel people, I guess.  And only for 2.4 kernel level.
Ie, supposedly fixed in RHAS4 which is kernel 2.6 (or so I'm told...).

And of course in SLES 9 onwards?  But then again, SLES always
traditionally seemed to have less restrictions than RH.  Looking
at the compatibility tables in Metaclick for asynchIO and directio
for different flavours of Linux, it comes out very clearly SLES
paid attention to this disk stuff a lot more than RH.  And it
also supports a lot more file systems...
(but I'm biased towards Suse, so take this with a grain of salt!)

> 40960.00 K / 322 = 127.20 K
>
> almost 128 K which can be the limit of my physical device.

Yeah.  I think the patch lets it go to 1M which is what
you need for sequential IO in devices like the low-cost RAID
stuff Oracle is looking at now.

> (by the way: redhat bug 148838 speaks about a 128K limit on qlogic).

Yup, very much so.  Which happens to involve all disk IO.
See the best practices pdf link posted before for details.
Fabrizio - 25 May 2005 09:52 GMT
>>Is this a Redhat limit?
>
> AFAIK, yes.  That's why it is in the RedHat Bugzilla.
> It's not really a limit: just a default design decision of
> the 2.4 kernel people, I guess.  And only for 2.4 kernel level.
> Ie, supposedly fixed in RHAS4 which is kernel 2.6 (or so I'm told...).

I had a look at the associated bug on metalink. It seems the limit of
32K involves only i/o on raw and OCFS of a specific redhat version.

I cannot see any limit in 2.4 kernel related to i/o (at least I couldn't
spot any references in the source code or in the kernel mailing list).

As a test I took an old vanilla kernel (2.4.12) and performed the
previous dd (always on ext3 and reiserfs). The limit is not there and
the asynch i/o is doing its duty.

The interesting thing about the bug you pointed out is that it shows
only on OCFS and raw devices: those use synch OS i/o instead of aio.
I would conclude that it is really a redhat issue only.

May I ask for the sources of your statement about a 2.4 "design limit"?
(I'd like to investigate this further: so far it seems a myth).

> Yup, very much so.  Which happens to involve all disk IO.
> See the best practices pdf link posted before for details.

Thank you. The document (and the related ones) is really useful.

Signature

Fabrizio Magni

fabrizio.magni@mycontinent.com

replace mycontinent with europe

Noons - 25 May 2005 10:23 GMT
> I had a look at the associated bug on metalink. It seems the limit of
> 32K involves only i/o on raw and OCFS of a specific redhat version.

Not sure.  Its symptoms appear to show up in at least my RH test
box and I'm not running raw or OCFS...

> As a test I took an old vanilla kernel (2.4.12) and performed the
> previous dd (always on ext3 and reiserfs). The limit is not there and
> the asynch i/o is doing its duty.

'sOK with SLES, AFAIK.  (yay!)

> May I ask for the sources of your statement about a 2.4 "design limit"?
> (I'd like to investigate this further: so far it seems a myth).

The pdf document I pointed out plus the blurb in the RH bugzilla.
I'd say it's similar to the old Unix limit on max IO size:
used to be 64K, then some makers bumped it up to 1M.

> Thank you. The document (and the related ones) is really useful.

Pleasure.  Please let me know of anything you find: running
RH and this could have a big impact for me in the near future.
Here or via email in the header.  I'm going to find this particular
module in the source and see if I can figger who/what runs when.
That should show what's going on.  Gotta load the RH sources first,
across the bigpond: s-l-o-w...
hopehope_123 - 25 May 2005 15:42 GMT
Hi Friends ,

Thank you very much for your replies. Fabrizio, it is good to hear from
you again. I have run your tests here, and these are the results:

1. Red Hat Linux Advanced Server ia64, OCFS file system on EMC
RAID 10 + Fibre Channel,
no async I/O, but direct I/O active

time dd if=/dev/zero of=/oracle/koccrm01/test bs=4096k count=100
100+0 records in
100+0 records out

real    0m5.964s
user    0m0.001s
sys     0m3.923s

Device:     rrqm/s    wrqm/s      r/s     w/s     rsec/s     wsec/s  avgrq-sz    avgqu-sz    await   svctm   %util
sdn        7699.00   7624.00   347.00  227.00    8046.00    7851.00     27.70        1.28     2.22    0.65   38.10

7851.00 wsec/s = 7851 * (512/1024) = 3925.5 KB/s  (1 sector = 512 bytes)

3925.5 / 227.00 = 17.29 KB per write

2. same server:  redhat linux advanced server  ia64  , this time ext3
file system ,

for aio : /proc/sys/fs/aio_nr = 0 so it is not active.

time dd if=/dev/zero of=/oracle/stagetmp/tmp/test bs=4096k count=100
100+0 records in
100+0 records out

real    0m1.089s
user    0m0.002s
sys     0m1.071s

I can't capture it in iostat since it finishes too fast, so I tried
again with a larger file:

time dd if=/dev/zero of=/oracle/stagetmp/tmp/test bs=4096k count=1000

Device:     rrqm/s    wrqm/s      r/s     w/s     rsec/s     wsec/s  avgrq-sz    avgqu-sz    await   svctm   %util
sdl1          0.00  20951.00     1.00  958.00       8.00  175048.00    182.54  4190621.19   379.70    1.02  100.00

175048 wsec/s = 175048 * (512/1024) = 87524 KB/s

87524 KB / 958 = 91 KB per write

3. redhat linux x86 , ext3 file system , aio is same with above (not
active)

time dd if=/dev/zero of=/oracle/tolga/test bs=4096k count=100
100+0 records in
100+0 records out
   3.97s real     0.00s user     2.23s system

Device:     rrqm/s    wrqm/s      r/s     w/s     rsec/s     wsec/s  avgrq-sz    avgqu-sz    await   svctm   %util
sdb5          0.00  22098.00     0.00   31.00       0.00  178408.00   5755.10      644.40  2183.87  116.13   36.00

178408 wsec/s = 178408 * (512/1024) = 89204 KB/s

89204 KB / 31 = 2878 KB per write   (huh!)

4.  sun solaris  , ufs file system :

time dd if=/dev/zero of=/data/spss/test bs=4096k count=100

real    0m6.206s
user    0m0.000s
sys     0m2.830s

                 extended device statistics
device       r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
sd81         0.0  241.6    0.0  47195.2  29.7  11.9  172.2  24  74

47195.2 / 241.6 = 195 KB per write

These results are also interesting.

Chao, I tried the SQL statements with one of the nodes stopped, and
the duration did not change. I think this is also clear from the
statspack output, which shows lots of global cache cr request events
with a very low total wait time.

Kind Regards,
tolga
Noons - 26 May 2005 02:04 GMT
> 1.  redhat linux advanced server  ia64  , ocfs file system on emc
> raid10. + fibre channel ,
> no asyc , but direct_io active

I think I'm a bit in the blank here.  How is direct_io active?
By file system mount?  If it is through Oracle, then a test of
IO speed with "dd" in this case means you're writing to buffer cache,
very very fast.  Might as well use "hdparm -T <any raw device>"
and see how fast you can write to buffer cache in one simple go?

Last time I looked, dd uses normal file system io if "of" is not
a raw device.  Which means we're merrily going through the buffer
cache.

So what is the point of testing like this? Or am I missing something?
hopehope_123 - 26 May 2005 07:07 GMT
Hi Nuno ,

Thank you very much for your correction. In fact, direct I/O is enabled
by default for OCFS. But since dd uses normal file I/O, this test
fails. (There are versions of the dd and cp commands for Linux that
also use direct I/O.)

The point here is just to see whether the symptom you mention (the
32 KB I/O barrier) exists on my system.

Kind Regards,
tolga
Noons - 26 May 2005 15:36 GMT
hopehope_123 apparently said,on my timestamp of 26/05/2005 4:07 PM:

> by default for the ocfs. But since dd uses normal file io, this test
> fails. ( There exists  a version of dd , cp commands for linux which
> uses direct io also.)

No worries, now I get it.  There is a howto on the Linux Documentation
Project website that goes in detail to all the tools available for
accurate testing. Worth a search for these bits of doco:
use "IO Performance" and have a quiet read.  The problem with raw disk
IO (and hence any IO that uses raw disks as base, including f/s)
for 2.4 kernels is described in detail in one of them.  Fixed on 2.4.17
onwards with the vary-io patch which is also referred in the Oracle
doco mentioned before.

I'm toying around with test suites based on dd, 1M-32K-8K-4K-2K IO size
for both cooked(file system) and raw IO, similar to yours.  Some very
surprising results in my systems, with net-based raid as well as native
disks, raw and ext3!  Once I finish making sense of the results will
pop the scripts here or on dizwell for others to try.

> The point here is just to see whether the symptom you mention (the
> 32 KB I/O barrier) exists on my system.

Really hard to click into unless raw: the file system layer masks
it all out for me.  In fact, after 8k it makes bugger all difference
what the IO size is if using ext3.

Signature

Cheers
Nuno Souto
in sunny Sydney, Australia
wizofoz2k@yahoo.com.au.nospam

Fabrizio - 26 May 2005 20:32 GMT
> Hi Friends ,
>
> Thank you very much for your replies , Fabrizio it is good to hear from
> you again , i have made your tests here , these are the results:

Always around (lurking). I'm too busy fighting against clusters, fiber
devices and SANs... and, of course, losing... :(

But I seem to be attracted by your posts. ;)

Unfortunately I cannot add anything about your problem with the
multiblock read count. Probably Noons is right and you are hitting an
"i/o fragmentation" issue.

Instead I would be interested in the methodology you used for the result
you posted.
They appear... weird...

I'm going to provide the steps I followed to set up my test environment.

I chose a device where I was sure no one was writing but me (you can
test this by running iostat and querying /proc) so that my metrics
won't be tainted.

Then, with two shells:

In the first I write from another device (a pseudo-device, since it is
/dev/zero).

In the other I probe the output device with the command:

iostat -x /dev/sda1 1

It gives me one line per second.

I check that all the lines are zeros (*do not go by the first line of
an iostat because it is the average since system boot*).

When I run the dd in the first shell (the parameters are chosen to
make the dd last less than one second) I see only one line with
non-zero values; that is the one I post and use to calculate my I/O
rate.

In my opinion iostat is not the right tool for precision measurements.
You can always go and query the /proc before and after the dd and
calculate your own result.
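
The before/after approach could be scripted along these lines. This is a hedged sketch, not something from the thread: it assumes the 14-column /proc/diskstats layout of later (2.6-era) kernels, whereas the 2.4 systems discussed here expose their counters in /proc/partitions with a different layout, so the column indexes would need adjusting. The function names and both snapshot strings are made up; the snapshot values were chosen to mirror the 322-write dd example earlier in the thread.

```python
def parse_diskstats(text):
    """Parse /proc/diskstats-style text into {device: counters}.

    Assumed columns: major minor name reads reads_merged sectors_read
    ms_reading writes writes_merged sectors_written ms_writing ...
    """
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip short or malformed lines
        stats[fields[2]] = {
            "reads": int(fields[3]),
            "sectors_read": int(fields[5]),
            "writes": int(fields[7]),
            "sectors_written": int(fields[9]),
        }
    return stats


def avg_write_kb(before, after, dev):
    """Average KB per completed write between two snapshots (512-byte sectors)."""
    writes = after[dev]["writes"] - before[dev]["writes"]
    sectors = after[dev]["sectors_written"] - before[dev]["sectors_written"]
    return sectors * 512 / 1024 / writes if writes else 0.0


# Synthetic snapshots for illustration only (not real measurements).
BEFORE = "   8    1 sda1 100 0 800 50 200 0 1600 90 0 100 140"
AFTER = "   8    1 sda1 100 0 800 50 522 40636 83520 310 0 180 900"

before = parse_diskstats(BEFORE)
after = parse_diskstats(AFTER)
print(f"{avg_write_kb(before, after, 'sda1'):.2f} KB per write")
```

On a live system the two snapshots would come from reading the stats file once before and once after the dd, which avoids depending on iostat's sampling interval.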

> 3. redhat linux x86 , ext3 file system , aio is same with above (not
> active)
[quoted text clipped - 12 lines]
>
> 89240kb / 31 =  2878 kb.   (huh!)

This appears too good to be true... :(

Could you try another set of measurements?

Thank you.

Signature

Fabrizio Magni

fabrizio.magni@mycontinent.com

replace mycontinent with europe

Noons - 27 May 2005 10:32 GMT
> In my opinion iostat is not the right tool for precision measurements.
> You can always go and query the /proc before and after the dd and
> calculate your own result.

Same problem here.  I've quite taken to:
watch --interval 5 cat /proc/partitions
and then keep an eye on the relevant columns.

> This appears too good to be true... :(

Dunno:  just got 66Mb/s sustained off the Xserve!  ;)
hopehope_123 - 27 May 2005 12:38 GMT
Hi friends,

Fabrizio, thank you very much for your corrections. Here are more
accurate results:

2. same server: redhat linux advanced server ia64 , ext3 file system ,

[oracle@tanidw1 tmp]$ time dd if=/dev/zero of=/oracle/stagetmp/tmp/test
bs=4096k count=5
5+0 records in
5+0 records out

real    0m0.063s
user    0m0.000s
sys     0m0.063s

Device:     rrqm/s    wrqm/s      r/s     w/s     rsec/s     wsec/s  avgrq-sz    avgqu-sz    await   svctm   %util
sdl1          0.00   4678.00     0.00  456.00       0.00   41072.00     90.07       91.11   199.79    0.89   41.60

41072.00 / 2 = 20536 KB/s    20536 / 456 = 45 KB per write

4. sun solaris

device       r/s    w/s   kr/s     kw/s  wait  actv  svc_t  %w  %b
sd81         0.0  105.0    0.0  20483.3   4.8   3.2   76.7  11  30

195KB. per write

I have logged a TAR for the bug issue.

Kind Regards,
tolga
hopehope_123 - 31 May 2005 15:23 GMT
Dear Friends ,

I have logged a TAR for the bug issue, but Oracle says:

Oracle Bug:4039598 is an Internal bug and it's for RH 3.0 x86 (2.4.21)
and you running RH 2.1 IA64 (2.4.18)
The bug it's not releated to your system

But I believe I have the symptoms. I am a little bit confused now.

tolga
 