Database Forum / Informix Topics / November 2007

long checkpoints

Apostrof - 21 Nov 2007 13:01 GMT
hi,
i have a problem with long-running checkpoints in IDS 7.31.UD8
on HP/UX.
every day a batch utility makes a high volume of inserts and updates.
this batch runs for about 5-10 minutes (checkpoint times included).
during this batch load one or two long checkpoints occur
(unfortunately blocking checkpoints).

17:32:53  Checkpoint Completed:  duration was 160 seconds.
17:32:53  Checkpoint loguniq 1250, logpos 0x1842484
17:38:59  Checkpoint Completed:  duration was 155 seconds.
17:38:59  Checkpoint loguniq 1253, logpos 0x8fa098
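
To track checkpoint durations like the ones above over time, a small script can pull them out of online.log. This is only a sketch: it assumes the message format shown in the two sample lines, and the 30-second threshold is an arbitrary illustrative value, not an Informix recommendation.

```python
import re

# Sample lines copied from the online.log excerpt above.
LOG = """\
17:32:53  Checkpoint Completed:  duration was 160 seconds.
17:32:53  Checkpoint loguniq 1250, logpos 0x1842484
17:38:59  Checkpoint Completed:  duration was 155 seconds.
17:38:59  Checkpoint loguniq 1253, logpos 0x8fa098
"""

# Matches "HH:MM:SS  Checkpoint Completed:  duration was N seconds."
CKPT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s+Checkpoint Completed:\s+duration was (\d+) seconds\."
)


def checkpoint_durations(text):
    """Return a list of (timestamp, duration_in_seconds) tuples."""
    found = []
    for line in text.splitlines():
        match = CKPT_RE.match(line)
        if match:
            found.append((match.group(1), int(match.group(2))))
    return found


def long_checkpoints(text, threshold_s=30):
    """Keep only checkpoints longer than threshold_s (hypothetical cutoff)."""
    return [c for c in checkpoint_durations(text) if c[1] > threshold_s]


if __name__ == "__main__":
    for stamp, seconds in long_checkpoints(LOG):
        print(stamp, seconds)
```
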

some of the onconfig parameters are

PHYSDBS         rootdbs         # Location (dbspace) of physical log
PHYSFILE        100000          # Physical log file size (Kbytes)

LOGFILES        40              # Number of logical log files
LOGSIZE         20000           # Logical log size (Kbytes)
LOGSMAX         50

LOCKS           150000          # Maximum number of locks
BUFFERS         200000          # Maximum number of shared buffers
NUMAIOVPS                       # Number of IO vps
PHYSBUFF        64              # Physical log buffer size (Kbytes)
LOGBUFF         64              # Logical log buffer size (Kbytes)
CLEANERS        50              # Number of buffer cleaner processes
SHMBASE         0x0               # Shared memory base address
SHMVIRTSIZE     64000           # initial virtual shared memory segment size
SHMADD          32000           # Size of new shared memory segments (Kbytes)
SHMTOTAL        0               # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL       200             # Check point interval (in sec)
LRUS            50              # Number of LRU queues
LRU_MAX_DIRTY   60              # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY   50              # LRU percent dirty end cleaning limit
LTXHWM          50              # Long transaction high water mark percentage
LTXEHWM         60              # Long transaction high water mark (exclusive)
TXTIMEOUT       0x12c             # Transaction timeout (in sec)
STACKSIZE       64              # Stack size (Kbytes)

when a checkpoint starts i look at the onstat -F output and see one or two
lines with state C. the chunk writes take up most of the checkpoint
time.
i changed LRU_MAX_DIRTY to 10 and LRU_MIN_DIRTY to 5 and tried the batch
again. but this time, due to LRU writes,
the batch utility got slower and finished its job in about 20 minutes.
i've read in some notes that LRU writes are worse than chunk writes.
so what can i do to make this batch process faster and shorten
checkpoint times?
thanks

Abdullah
Art S. Kagel - 21 Nov 2007 13:21 GMT
> hi,

Abdullah,

You've got a lot going on.  First thing I see is there should be a
checkpoint between the two you've published since you have CKPTINTVL
set to 200 or 3 mins 20 secs.  Hopefully this is just because the
intervening chkpt was shorter so you didn't post it.

I agree that your settings for LRU_MAX/MIN_DIRTY are way too high for
this kind of system and I would go further even than your attempt at
moving from 60/50 to 10/5 and go all the way to 2/1.

Now, what might have happened when you reduced the LRU flush settings
to cause LRU writes (yes any significant number of these - say more
than 10 - is death to throughput)?  I would guess that you need more
BUFFERS.  What do your server's metrics look like?  Post onstat -D,
onstat -p, and the first 20 and last 20 lines of the onstat -P output
with the time since the stats were last zero'd (onstat -z) and I'll
try to calculate them for you.

I suspect that at least part of the problem is slow disk writes.  This
wouldn't be a RAID5 setup and/or using COOKED chunks, would it?  RAID5
alone will slow down your updates/bulk loads SIGNIFICANTLY.
Similarly, you have only the default AIO VPs configured; if you are
using COOKED filesystem chunks, that will also be slowing you down.
Please post more configuration information.

Art S. Kagel

Apostrof - 21 Nov 2007 13:46 GMT
Hi Art,
our server has 4gb of memory and informix is using 500mb of it. there is
another application which uses 1-2gb of memory. the batch utility
is part of this application.
we are using cooked chunks. there is no raid5, but the hp mirroring tool is
used with the volume groups where the chunks reside.
you are right about the checkpoint information. there is only one
checkpoint in my sample output while the batch utility is working.
after the batch we take an onunload backup. i think the other checkpoint
takes place while we take this backup. sometimes one, sometimes two
checkpoints occur during the batch utility. the second checkpoint may
occur during the backup.
these are the outputs you asked for.

17:32:53  Checkpoint Completed:  duration was 160 seconds.
17:32:53  Checkpoint loguniq 1250, logpos 0x1842484
17:33:05  Logical Log 1250 Complete.
17:33:07  Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1250 Complete." "Logical Log 1250 Complete."
17:33:52  Logical Log 1251 Complete.
17:33:53  Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1251 Complete." "Logical Log 1251 Complete."
17:35:19  Logical Log 1252 Complete.
17:35:20  Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1252 Complete." "Logical Log 1252 Complete."
17:38:59  Checkpoint Completed:  duration was 155 seconds.
17:38:59  Checkpoint loguniq 1253, logpos 0x8fa098

onstat -D output:
-----------------

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line -- Up 2 days 07:08:22 -- 501872 Kbytes

Dbspaces
address  number   flags    fchunk   nchunks  flags    owner    name
e41dc158 1        1        1        1        N        informix rootdbs
e41ddf38 2        1        2        1        N        informix llogspace
e4206b48 3        1        3        4        N        informix datadbs
e4206c08 4        2001     7        1        N T      informix tempdbs
4 active, 2047 maximum

Chunks
address  chk/dbs offset   page Rd  page Wr  pathname
e41dc218 1   1   0        4114     102979   /work_a1/db/rootchunk
e41ddc20 2   2   0        9969     108051   /data1/logs/llogchunk1
e41ddd28 3   3   0        9313329  14220    /data1/db/datachunk1
e41dde30 4   3   0        7721858  116564   /data1/db/datachunk2
e4206830 5   3   0        5        0        /data1/db/datachunk3
e4206938 6   3   0        5        0        /data1/db/datachunk4
e4206a40 7   4   0        92433    92610    /work_a1/db/tempchunk
7 active, 2047 maximum

onstat -p output:
-----------------

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line -- Up 2 days 07:09:13 -- 501872 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
13163890 17141713 73963987 82.20   141207   434424   2013940  92.99

isamtot  open     start    read     write    rewrite  delete   commit   rollbk
15105310 93462    1625540  7330333  521236   2764     382820   4434     0

gp_read  gp_write gp_rewrt gp_del   gp_alloc gp_free  gp_curs
0        0        0        0        0        0        0

ovlock   ovuserthread ovbuff   usercpu  syscpu   numckpts flushes
0        0            0        1473.22  635.22   50       1870

bufwaits lokwaits lockreqs  deadlks  dltouts  ckpwaits compress seqscans
705363   0        102952514 0        0        13       38773    13827

ixda-RA  idx-RA   da-RA    RA-pgsused lchwaits
31895    2292     11835066 11832481   675752

onstat -P | head -20
--------------------

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line -- Up 2 days 07:10:38 -- 501872 Kbytes
partnum  total    btree    data     other    resident dirty
0        8529     7575     788      166      0        0
1048578  2        1        1        0        0        0
1048579  10       5        5        0        0        0
1048580  1        1        0        0        0        0
1048582  1        1        0        0        0        0
1048584  3        2        1        0        0        0
1048595  1        1        0        0        0        0
1048606  1        1        0        0        0        0
1048703  1        1        0        0        0        0
3145730  26       11       15       0        0        0
3145731  1        1        0        0        0        0
3145732  46       16       30       0        0        0
3145733  7        2        5        0        0        0
3145734  10       5        5        0        0        0
3145735  2        1        1        0        0        0
3145736  8        4        4        0        0        0
3145737  2        1        1        0        0        0

onstat -P | tail -20
--------------------

3145888  6        0        6        0        0        0
3145891  17       0        17       0        0        0
3145892  154      0        154      0        0        0
3145893  4        0        4        0        0        0
3145894  1        0        1        0        0        0
3145895  7        0        7        0        0        0
3145899  1        1        0        0        0        0
3145900  1        0        1        0        0        0
3145901  1        0        1        0        0        0
3145902  478      1        477      0        0        0
3145903  22       3        19       0        0        0
3145904  60       0        60       0        0        0

Totals:  200000   146840   52876    284      0        0

Percentages:
Data  26.44
Btree 73.42
Other 0.14
Superboer - 21 Nov 2007 14:38 GMT
i suggest you try to tune this first on a test box!!

avoid cooked.... USE RAW. and therefore grab more buffers....

afaik HP has a kernel param which says how much memory will be used
for the unix FS cache.
this can take up 50 %... (DBC_MAX_PCT) you could set this to 20 or 10
and set DBC_MIN_PCT to 5

this memory can not be used for informix memory.... when DBC_MAX_PCT
is 50....

lru cleaning is less efficient than checkpoint cleaning.
lru is/can be random while checkpoint cleaning will sort the
buffers to be written so it can do bigger i/o's (16 pages max....)

you could try to do your own checkpoints...
use onmode -B followed by onmode -c
onmode -B is not documented and does cleaning the same as a checkpoint, but
does not block anything.
if however your i/o subsystem does not cope then you are still in
trouble, regardless of onmode -B.
post an onstat -R -r 5 during checkpointing; also look at sar/iostat
to see how busy your disks are..
maybe one is full and the rest is doing nothing......

Per Art : NO RAID 5 NO RAID 5 NO RAID 5!!!!!!!!!!!!!

Further:

NUMAIOVPS                       # Number of IO vps
please set this to a value!!!!!!

i did not see RESIDENT; you may want to set it to 1 or -1 or -2.
you may consider kernel i/o -->> check the release notes.

Superboer.

way fast=http://www.clipjes.nl/clip/nederlands/n/normaal_-_oerend_hard.html
Art S. Kagel - 21 Nov 2007 15:31 GMT
<Previous post SNIPPED>
> Hi Art,

Hi,

Your server's healthy overall.  Metrics look good:

BR = 3.68
BTR = 1.73/hour
RAU = 99.69%  - not perfect but acceptable. It probably could not hurt
to reduce the RA threshold by 50%.
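
The thread does not spell out how BR, BTR and RAU are derived. The sketch below assumes the commonly cited definitions (bufwaits ratio, buffer turnover rate per hour, read-ahead utilization) and applies them to the onstat -p counters posted earlier; the formulas are assumptions rather than something stated here, although they land on or very near the figures quoted above. It also assumes the counters were last zeroed at server start, so the uptime from the onstat header is used as the interval.

```python
# Counters copied from the onstat -p output posted earlier in the thread.
STATS = {
    "dskreads": 13163890, "pagreads": 17141713, "bufreads": 73963987,
    "dskwrits": 141207, "pagwrits": 434424, "bufwrits": 2013940,
    "bufwaits": 705363,
    "ixda-RA": 31895, "idx-RA": 2292, "da-RA": 11835066,
    "RA-pgsused": 11832481,
}
BUFFERS = 200000                               # from the posted onconfig
UPTIME_HOURS = 2 * 24 + 7 + 9 / 60 + 13 / 3600  # "Up 2 days 07:09:13"


def bufwaits_ratio(s):
    """BR (assumed formula): bufwaits as a percentage of page activity."""
    return 100.0 * s["bufwaits"] / (s["pagreads"] + s["bufwrits"])


def buffer_turnover_per_hour(s, buffers, hours):
    """BTR (assumed formula): times the buffer pool turns over per hour."""
    return (s["bufwrits"] + s["pagreads"]) / buffers / hours


def readahead_utilization(s):
    """RAU (assumed formula): share of read-ahead pages actually used."""
    return 100.0 * s["RA-pgsused"] / (s["ixda-RA"] + s["idx-RA"] + s["da-RA"])


def read_cache_pct(s):
    """Read %cached as printed by onstat -p."""
    return 100.0 * (s["bufreads"] - s["dskreads"]) / s["bufreads"]


def write_cache_pct(s):
    """Write %cached as printed by onstat -p."""
    return 100.0 * (s["bufwrits"] - s["dskwrits"]) / s["bufwrits"]


if __name__ == "__main__":
    print("BR   %.2f" % bufwaits_ratio(STATS))
    print("BTR  %.2f/hour" % buffer_turnover_per_hour(STATS, BUFFERS, UPTIME_HOURS))
    print("RAU  %.2f%%" % readahead_utilization(STATS))
    print("read cache  %.2f%%" % read_cache_pct(STATS))
    print("write cache %.2f%%" % write_cache_pct(STATS))
```
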

> our server has 4gb memory and informix is using 500mb of it. there is
> another application which is using 1-2gb of memory. the batch utility
> is part of this application.
> we are using cooked chunks. there is no raid5 but hp mirroring tool is

COOKED chunks without NUMAIOVPS set properly is your BIG problem.  You
have 7 chunks and on 7.31 you'll need 1.5 AIO VPs per chunk plus a few
extra for message log and other cooked IO (like SET EXPLAIN output).
That means I'd set:
      NUMAIOVPS  16
To begin with and monitor onstat -g iov over a normal workload
period.  If you see that there are one or more AIO VPs with io/wup >=
1.0 it means that at least some of the time you have IOs waiting for a
VP to free up so you need to increase the number of AIO VPs for the
next restart.  If there are any AIO VPs showing io/wup == 0.0 you can
reduce NUMAIOVPS to eliminate them.
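
The io/wup check described above can be scripted against saved onstat -g iov output. A minimal sketch, assuming the eleven-column layout seen in the iov output posted later in this thread (class, vp, state, io/s, totalops, dskread, dskwrite, dskcopy, wakeups, io/wup, errors) with each VP on a single line; the three sample rows are taken from that output, and the function names are made up for illustration.

```python
# Three aio rows taken from the onstat -g iov output posted later in the thread.
SAMPLE = """\
 aio  0 s 292.2    83568    81777     1732        0    82030     1.0       0
 aio  2 i   5.3     1523     1337      184        0     2039     0.7       0
 aio  8 i   0.5      132       22      110        0       76     1.7       0
"""


def parse_aio_rows(text):
    """Extract vp number, totalops and io/wup for every 'aio' class line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 11 and fields[0] == "aio":
            rows.append({
                "vp": int(fields[1]),
                "totalops": int(fields[4]),
                "io_per_wakeup": float(fields[9]),
            })
    return rows


def review_aio_vps(rows):
    """Apply the rule of thumb from the post above.

    Returns (saturated, unused): VPs with io/wup >= 1.0 suggest I/Os were
    queued waiting for a VP; VPs with io/wup == 0.0 did no work at all.
    """
    saturated = [r["vp"] for r in rows if r["io_per_wakeup"] >= 1.0]
    unused = [r["vp"] for r in rows if r["io_per_wakeup"] == 0.0]
    return saturated, unused


if __name__ == "__main__":
    saturated, unused = review_aio_vps(parse_aio_rows(SAMPLE))
    print("consider more AIO VPs, saturated:", saturated)
    print("candidates for removal:", unused)
```
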

I still think that you could benefit from more buffers and I suspect
that is the reason that onstat -P shows that 73% of your buffer cache
is dedicated to index pages.  I'm guessing that index pages and data
pages are thrashing the cache a bit.  That's why your Read Cache hit
rate is only 82.2% (your Write Cache hit rate is 92.99%).  You want
to see these above 95% and 85% respectively (90 would be better but
that's application dependent) and it is also part of the cause of the
LRU Writes - though the AIO VP configuration is the biggest culprit.

Art S. Kagel

Finally, I notice that there is almost as much write activity in the
rootdbs dbspace as in the data dbspace.  This is almost entirely
because you have the physical log configured there.  You should move
the physical log to a separate dbspace, preferably on its own disk
structure away from data, root, and logical logs as much as possible.
Notice that your data write activity, physical log write activity and
logical log write activity are all about the same volume.  So, the
more you can isolate these from each other, the better checkpoints and
other physical write operations will perform.

Keith Simmons - 21 Nov 2007 15:45 GMT
I recall an issue on 7.3 engines where the output from onstat -P
showed high index use and low data use (as in this case). It is caused
by the index pages getting too high a priority. Can't remember the
case number, but there is an onconfig or environment variable that can
be set to force a much higher data usage.

Keith

Christian Knappke - 22 Nov 2007 10:04 GMT
From the keyboard of "Keith Simmons" <smiley73@googlemail.com>:

> I recall an issue on 7.3 engines where the output from onstat -P
> showed high index use and low data use (as in this case). It is
> caused by the index pages getting too high a priority. Can't
> remember the case number, but there is an onconfig or
> environment variable that can be set to force a much higher data
> usage.

You mean LRUAGE = 1 in the engine's environment?

It was one of our standard "must set this" recommendations for 7.31.

Regards
Christian
Signature

#include <std_disclaimer.h>
/* The opinions stated above are my own and not
  necessarily those of my employer. */

Apostrof - 22 Nov 2007 16:45 GMT
i am starting to think that the main problem is slow disk access,
because after all the configuration changes and attempts nothing got better
in checkpoint times. they are all high.
today i first changed the onconfig and set LRUAGE to 1 in the environment.
i also unset NUMAIOVPS again, because in this setting it also
creates two aio VPs for each chunk. (i saw it in the onstat -g iov output)
then i started the instance.
here are the related lines from online.log

16:45:06  Onconfig parameter BUFFERS modified from 200000 to 300000.
16:45:06  Onconfig parameter CLEANERS modified from 50 to 8.
16:45:06  Onconfig parameter NUMAIOVPS modified from 24 to 2.

and the lines from online.log about checkpoints while the batch
utility was working.

17:29:57  Logical Log 1110 Complete.
17:29:58  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical Log 1110 Complete."
17:34:03  Checkpoint Completed:  duration was 160 seconds.
17:34:03  Checkpoint loguniq 1111, logpos 0x14f8308

17:34:19  Logical Log 1111 Complete.
17:34:20  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1111 Complete." "Logical Log 1111 Complete."
17:34:54  Logical Log 1112 Complete.
17:34:55  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1112 Complete." "Logical Log 1112 Complete."
17:35:41  Logical Log 1113 Complete.
17:35:42  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1113 Complete." "Logical Log 1113 Complete."
17:40:33  Checkpoint Completed:  duration was 178 seconds.
17:40:33  Checkpoint loguniq 1114, logpos 0x230539c

during the checkpoints sar -d 5 20 output is like this:

Average    c1t0d0   90.34    0.50     148    1906    4.94    9.52
Average    c2t0d0   73.89    0.50     117    1450    4.93    9.12
Average    c1t2d0    1.03    0.50       2      19    3.24    9.39
Average    c2t2d0    1.02    0.50       2      18    3.53    8.18

all the database files reside on the disks with high load. most of
the volumes on the first two disks are mirrored.

after this i created a new volume (unfortunately on the same disks)
and moved the physical log to that dbspace, but didn't change the size,
and undid the config changes that i made in the previous step.

17:58:38  Onconfig parameter PHYSDBS modified from rootdbs to physlogdbs.
17:58:38  Onconfig parameter BUFFERS modified from 300000 to 200000.
17:58:38  Onconfig parameter CLEANERS modified from 8 to 50.

i did not understand why, but the checkpoint times got worse in this
case. can it be because of the sar command?

18:05:58  Checkpoint Completed:  duration was 228 seconds.
18:05:58  Checkpoint loguniq 1117, logpos 0x148a748

18:06:18  Logical Log 1117 Complete.
18:06:20  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1117 Complete." "Logical Log 1117 Complete."
18:07:14  Logical Log 1118 Complete.
18:07:15  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1118 Complete." "Logical Log 1118 Complete."
18:08:58  Logical Log 1119 Complete.
18:08:59  Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1119 Complete." "Logical Log 1119 Complete."
18:15:29  Checkpoint Completed:  duration was 359 seconds.
18:15:29  Checkpoint loguniq 1120, logpos 0x3cb328

some onstat -g iov outputs when checkpoint got started and going on.

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line (CKPT REQ) -- Up 00:04:46 -- 501872 Kbytes
Blocked:CKPT

AIO I/O vps:
class/vp s  io/s totalops  dskread dskwrite  dskcopy  wakeups  io/wup  errors
 msc  0 i   0.2       48        0        0        0       49     1.0       0
 aio  0 s 292.2    83568    81777     1732        0    82030     1.0       0
 aio  1 s 118.1    33790    33533      255        0    34494     1.0       0
 aio  2 i   5.3     1523     1337      184        0     2039     0.7       0
 aio  3 s   1.3      370      207      161        0      320     1.2       0
 aio  4 i   0.9      248      113      133        0      208     1.2       0
 aio  5 i   0.5      148       28      118        0      102     1.5       0
 aio  6 i   0.5      145       22      122        0       94     1.5       0
 aio  7 i   0.4      127       18      108        0       85     1.5       0
 aio  8 i   0.5      132       22      110        0       76     1.7       0
 aio  9 i   0.4      120       22       98        0       80     1.5       0
 aio 10 i   0.4      106       14       92        0       71     1.5       0
 aio 11 i   0.4      108       12       96        0       74     1.5       0
 aio 12 i   0.4      101       15       86        0       68     1.5       0
 aio 13 s   0.4      103       21       82        0       64     1.6       0
 aio 14 s   0.3       92        7       85        0       61     1.5       0
 aio 15 s   0.3       83        5       78        0       60     1.4       0
 pio  0 i   3.0      856        0      856        0      857     1.0       0
 lio  0 i   2.9      824        0      824        0      825     1.0       0

/usr/informix>onstat -g iov

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line (CKPT REQ) -- Up 00:05:18 -- 501872 Kbytes
Blocked:CKPT

AIO I/O vps:
class/vp s  io/s totalops  dskread dskwrite  dskcopy  wakeups  io/wup  errors
 msc  0 i   0.2       48        0        0        0       49     1.0       0
 aio  0 i 273.6    86992    85127     1806        0    85401     1.0       0
 aio  1 s 108.9    34627    34232      393        0    35314     1.0       0
 aio  2 s   5.5     1763     1438      323        0     2271     0.8       0
 aio  3 i   1.9      614      314      298        0      558     1.1       0
 aio  4 i   1.2      390      125      263        0      350     1.1       0
 aio  5 i   1.0      306       56      248        0      249     1.2       0
 aio  6 i   0.9      287       40      246        0      236     1.2       0
 aio  7 i   0.8      248       20      227        0      209     1.2       0
 aio  8 s   0.8      245       28      217        0      188     1.3       0
 aio  9 i   0.7      215       24      191        0      177     1.2       0
 aio 10 s   0.6      198       31      167        0      164     1.2       0
 aio 11 i   0.5      172       15      157        0      138     1.2       0
 aio 12 i   0.5      151       15      136        0      120     1.3       0
 aio 13 i   0.5      151       21      130        0      112     1.3       0
 aio 14 i   0.4      135        7      128        0      104     1.3       0
 aio 15 i   0.4      127        8      119        0      100     1.3       0
 pio  0 i   2.7      856        0      856        0      857     1.0       0
 lio  0 i   2.6      824        0      824        0      825     1.0       0

i will try to arrange a system reboot to change the kernel parameters for
the file system cache. maybe it helps.
any other ideas?
Christian Knappke - 23 Nov 2007 13:44 GMT
From the keyboard of Apostrof <abdullah.akoglu@gmail.com>:

> during the checkpoints sar -d 5 20 output is like this:
>
[quoted text clipped - 5 lines]
> all the database files resides on the disks with high load. most
> of the volumes on the first two disks are mirrored.

So all chunks are on just two devices. How many physical disks
hide behind one device? As you have four chunks in your datadbs,
two of them show I/O activity and the Informix release is 7.31,
the whole database cannot be larger than four Gig. I assume that
there is only one physical disk behind every device. When you have
parallel I/O requests to different files/chunks on the same disk,
you only gain I/O contention. Have you checked I/O waits?

BTW, how many physical CPUs are in the server?

Anyway, I'd recommend:

- make sure that chunk and mirror are on separate physical disks
- move logical and physical log to separate disks and separate
 from data disks
- set LRUS = (# of physical CPUs * 2); minimum 4
- set CLEANERS = (LRUS * 2)
- set NUMAIOVPS = (# of physical disks); minimum 2

Tune the latter according to "onstat -g iov" output:
(sample values)

AIO I/O vps:                                                    
class/vp s io/s totalops dskread dskwrite dskcopy wakeups io/wup
 msc  0 i  0.0      113       0        0       0     114  1.0  
 aio  0 i  0.2   138065  137679      383       0  137945  1.0  
 aio  1 i  0.0      892     483      409       0     803  1.1  
 aio  2 i  0.0      249      24      224       0     133  1.9  
 pio  0 i  0.0       30       0       29       0      65  0.5  
 pio  1 i  0.0       29       0       29       0      30  1.0  
 lio  0 i  0.0      278       0      277       0     312  0.9  
 lio  1 i  0.0      275       0      275       0     276  1.0  

look at the lines that start with "aio": if most of the work
("totalops") is done by the first AIO VP(s), then there are enough
of them.
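
The starting values from the list above (the cooked-chunk case) can be written down as a small helper. This is only a sketch of the rules of thumb as stated in this post; the function name and the example inputs (2 physical CPUs, 4 physical disks) are illustrative assumptions, and the results are starting points to be tuned with onstat -g iov, not definitive settings.

```python
def suggest_starting_values(physical_cpus, physical_disks):
    """Starting onconfig values per the rules of thumb listed above.

    LRUS      = physical CPUs * 2, minimum 4
    CLEANERS  = LRUS * 2
    NUMAIOVPS = number of physical disks, minimum 2
    """
    if physical_cpus < 1 or physical_disks < 1:
        raise ValueError("need at least one CPU and one disk")
    lrus = max(4, physical_cpus * 2)
    return {
        "LRUS": lrus,
        "CLEANERS": lrus * 2,
        "NUMAIOVPS": max(2, physical_disks),
    }


if __name__ == "__main__":
    # Hypothetical inputs: 2 physical CPUs, 4 physical disks.
    for name, value in suggest_starting_values(2, 4).items():
        print("%-10s %d" % (name, value))
```
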

If you have the chance to move to raw devices and KAIO, then do
it. The same recommendations apply, only set NUMAIOVPS = 2 and
NUMCPUVPS = floor((# of physical CPUs - 1) / 2)[1] if there is
more than one.

Christian
[1] on a server that also runs the application. On a standalone DB
server you can go up to NUMCPUVPS = (# of physical CPUs)
Apostrof - 23 Nov 2007 14:46 GMT
i will try those for LRUS, CLEANERS and NUMAIOVPS.
there is only one disk for each device shown in the sar output.
there are two physical cpus.
by the way, what do you mean by "whole database cannot be larger than
four Gig"? is there any limit on the database size in this release
of informix?

dropping the indexes and recreating them after the batch utility will not be
useful for me, because users start working as soon as the batch utility
finishes its job. the index create time would be longer than the
checkpoints.
Christian Knappke - 23 Nov 2007 14:57 GMT
From the keyboard of Apostrof <abdullah.akoglu@gmail.com>:

> by the way what do you mean by "whole database cannot be larger
> than four Gig" ?

I did not mean that the DB cannot grow beyond 4 GB. There are two
chunks that are used. 7.31 chunks cannot be larger than 2 GB. From
that I derived that there is not more than 4 GB data in the DB.

> is there any limit for the database size in
> this release of informix?

Yes. The bottom line of onstat -d tells you: max 2047 chunks. Times
2 GB means max ~4 TB.
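
The arithmetic behind that limit, as a quick sketch using only the two figures given above (2047 chunks, 2 GB per chunk); binary units are assumed here.

```python
MAX_CHUNKS = 2047        # from the bottom line of onstat -d
MAX_CHUNK_GIB = 2        # per-chunk limit stated for 7.31

# Total addressable space if every chunk is at its maximum size.
total_gib = MAX_CHUNKS * MAX_CHUNK_GIB
total_tib = total_gib / 1024

if __name__ == "__main__":
    print("%d GiB, about %.1f TiB" % (total_gib, total_tib))
```
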

To get on topic again: depending on the data model it may also help
to not throw all tables into one big bucket but to separate the most
wanted tables and distribute them over more dbspaces. But that requires
more physical disks.

regards
Christian
Superboer - 23 Nov 2007 16:25 GMT
your sar says

90% busy doing 148 r/w per sec, which is 1906 blocks per sec

meaning approx. 1 MB per sec for a disk.... with an average i/o size
of about 6 Kb...

onstat -Rr |grep dirty during checkpoint will tell you how many pages
per 5 secs are written.
Also it will tell how many pages are really dirty....
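
The sar figures above work out as follows. This sketch assumes sar -d reports blks/s in 512-byte blocks, which is not stated in the thread; the input values are the c1t0d0 averages from the posted sar output.

```python
BLOCK_BYTES = 512  # assumption: sar -d counts 512-byte blocks


def disk_throughput(ops_per_s, blocks_per_s, block_bytes=BLOCK_BYTES):
    """Return (MiB per second, average KiB per I/O) for one device."""
    bytes_per_s = blocks_per_s * block_bytes
    mib_per_s = bytes_per_s / 2**20
    kib_per_io = bytes_per_s / ops_per_s / 1024
    return mib_per_s, kib_per_io


if __name__ == "__main__":
    # c1t0d0: 148 reads+writes per second, 1906 blocks per second.
    mib, kib = disk_throughput(148, 1906)
    print("%.2f MiB/s, %.1f KiB per I/O" % (mib, kib))
```

Under that assumption the busiest disk is moving a little under 1 MB per second in roughly 6 KB requests while already about 90% busy.
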

The program you have may be useless when you try this on cooked files.

(maybe this: fd = open( "cifx_204", O_WRONLY|O_SYNC);  will bypass
the cache..... dunno, need to test it...
i have played in the past with such a writer (sorry, do not recall if it
had O_SYNC...) and that yielded unpredictable results
because of the fs cache!!!!!!!!!)

You need to do this on char mode raw devs.

the problem is that the unix file system cache is spoiling the
measurement.
with a small number of i/os you can get high numbers, but with a
database-sized amount
of action the filesystem cache will kick in for sure.

your tuning job will not be easy.

i would setup a test system with char mode raw devs and start tuning
there.

also more disks... striping/mirroring???!!!!!!!!!!!

also have a look at the stripe size... 2 times maxio = 2 * 16 pages =
64k or 128 kb in order to avoid split writes....

Superboer

way fast=http://www.clipjes.nl/clip/nederlands/n/normaal_-_oerend_hard.html

Apostrof - 22 Nov 2007 17:04 GMT
> From the keyboard of "Keith Simmons" <smile...@googlemail.com>:
>
[quoted text clipped - 15 lines]
> /* The opinions stated above are my own and not
>    necessarily those of my employer. */

i am starting to think that the main problem is slow disk access.
because after all configuration changes and tries nothing got better
in checkpoint times. they are all high.
today first i changed the onconfig and set LRUAGE environment to 1.
unset the NUMAIOVPS unset again,because in this setting it also
creates two aio for each chunk. (i saw it in the onstat -g iov output)
then started the instance.
here is the related lines from online.log

16:45:06  Onconfig parameter BUFFERS modified from 200000 to 300000.
16:45:06  Onconfig parameter CLEANERS modified from 50 to 8.
16:45:06  Onconfig parameter NUMAIOVPS modified from 24 to 2.

and the lines from online.log about checkpoints while the batch
utility was working.

17:29:57  Logical Log 1110 Complete.
17:29:58  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical
Log 1110 Complete."
17:34:03  Checkpoint Completed:  duration was 160 seconds.
17:34:03  Checkpoint loguniq 1111, logpos 0x14f8308

17:34:19  Logical Log 1111 Complete.
17:34:20  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1111 Complete." "Logical
Log 1111 Complete."
17:34:54  Logical Log 1112 Complete.
17:34:55  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1112 Complete." "Logical
Log 1112 Complete."
17:35:41  Logical Log 1113 Complete.
17:35:42  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1113 Complete." "Logical
Log 1113 Complete."
17:40:33  Checkpoint Completed:  duration was 178 seconds.
17:40:33  Checkpoint loguniq 1114, logpos 0x230539c

during the checkpoints sar -d 5 20 output is like this:

Average    c1t0d0   90.34    0.50     148    1906    4.94    9.52
Average    c2t0d0   73.89    0.50     117    1450    4.93    9.12
Average    c1t2d0    1.03    0.50       2      19    3.24    9.39
Average    c2t2d0    1.02    0.50       2      18    3.53    8.18

all the database files resides on the disks with high load. most of
the volumes on the first two disks are mirrored.

after this i created a new volume (unfortunately on the same disks)
and move the physical log to that dbspace. but didn't change the size.
and undo the config changes that i made in previous step.

17:58:38  Onconfig parameter PHYSDBS modified from rootdbs to
physlogdbs.
17:58:38  Onconfig parameter BUFFERS modified from 300000 to 200000.
17:58:38  Onconfig parameter CLEANERS modified from 8 to 50.

i did not understand why but the checkpoint times get worse in this
case. can it be because of sar command?

18:05:58  Checkpoint Completed:  duration was 228 seconds.
18:05:58  Checkpoint loguniq 1117, logpos 0x148a748

18:06:18  Logical Log 1117 Complete.
18:06:20  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1117 Complete." "Logical
Log 1117 Complete."
18:07:14  Logical Log 1118 Complete.
18:07:15  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1118 Complete." "Logical
Log 1118 Complete."
18:08:58  Logical Log 1119 Complete.
18:08:59  Process exited with return code 163: /bin/sh /bin/sh -c /usr/
informix/etc/log_full.sh 2 23 "Logical Log 1119 Complete." "Logical
Log 1119 Complete."
18:15:29  Checkpoint Completed:  duration was 359 seconds.
18:15:29  Checkpoint loguniq 1120, logpos 0x3cb328

some onstat -g iov outputs when checkpoint got started and going on.

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line (CKPT REQ) -- Up 00:04:46 -- 501872 Kbytes
Blocked:CKPT

AIO I/O vps:
class/vp s  io/s totalops  dskread dskwrite  dskcopy  wakeups  io/wup  errors
 msc  0 i   0.2       48        0        0        0       49     1.0       0
 aio  0 s 292.2    83568    81777     1732        0    82030     1.0       0
 aio  1 s 118.1    33790    33533      255        0    34494     1.0       0
 aio  2 i   5.3     1523     1337      184        0     2039     0.7       0
 aio  3 s   1.3      370      207      161        0      320     1.2       0
 aio  4 i   0.9      248      113      133        0      208     1.2       0
 aio  5 i   0.5      148       28      118        0      102     1.5       0
 aio  6 i   0.5      145       22      122        0       94     1.5       0
 aio  7 i   0.4      127       18      108        0       85     1.5       0
 aio  8 i   0.5      132       22      110        0       76     1.7       0
 aio  9 i   0.4      120       22       98        0       80     1.5       0
 aio 10 i   0.4      106       14       92        0       71     1.5       0
 aio 11 i   0.4      108       12       96        0       74     1.5       0
 aio 12 i   0.4      101       15       86        0       68     1.5       0
 aio 13 s   0.4      103       21       82        0       64     1.6       0
 aio 14 s   0.3       92        7       85        0       61     1.5       0
 aio 15 s   0.3       83        5       78        0       60     1.4       0
 pio  0 i   3.0      856        0      856        0      857     1.0       0
 lio  0 i   2.9      824        0      824        0      825     1.0       0

/usr/informix>onstat -g iov

IBM Informix Dynamic Server Version 7.31.UD8     -- On-Line (CKPT REQ) -- Up 00:05:18 -- 501872 Kbytes
Blocked:CKPT

AIO I/O vps:
class/vp s  io/s totalops  dskread dskwrite  dskcopy  wakeups  io/wup  errors
 msc  0 i   0.2       48        0        0        0       49     1.0       0
 aio  0 i 273.6    86992    85127     1806        0    85401     1.0       0
 aio  1 s 108.9    34627    34232      393        0    35314     1.0       0
 aio  2 s   5.5     1763     1438      323        0     2271     0.8       0
 aio  3 i   1.9      614      314      298        0      558     1.1       0
 aio  4 i   1.2      390      125      263        0      350     1.1       0
 aio  5 i   1.0      306       56      248        0      249     1.2       0
 aio  6 i   0.9      287       40      246        0      236     1.2       0
 aio  7 i   0.8      248       20      227        0      209     1.2       0
 aio  8 s   0.8      245       28      217        0      188     1.3       0
 aio  9 i   0.7      215       24      191        0      177     1.2       0
 aio 10 s   0.6      198       31      167        0      164     1.2       0
 aio 11 i   0.5      172       15      157        0      138     1.2       0
 aio 12 i   0.5      151       15      136        0      120     1.3       0
 aio 13 i   0.5      151       21      130        0      112     1.3       0
 aio 14 i   0.4      135        7      128        0      104     1.3       0
 aio 15 i   0.4      127        8      119        0      100     1.3       0
 pio  0 i   2.7      856        0      856        0      857     1.0       0
 lio  0 i   2.6      824        0      824        0      825     1.0       0
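
In these two snapshots the io/s column looks like a since-startup average (83568 total operations over 286 seconds of uptime is about 292.2), so it says little about the rate during the checkpoint itself. A rough way to get the rate for the interval is to subtract the counters of the first snapshot from the second. The Python below is only a sketch: the helper names are invented, it assumes each row has been joined onto one line, and it uses just the first three aio rows copied from the output above.

```python
def uptime_seconds(hms):
    """Convert the 'Up HH:MM:SS' value from the onstat banner to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def parse_iov_row(row):
    """Pick the VP name and the read/write counters out of one joined row."""
    fields = row.split()
    return {
        "vp": f"{fields[0]} {fields[1]}",   # class and vp number, e.g. "aio 0"
        "dskread": int(fields[5]),
        "dskwrite": int(fields[6]),
    }

def interval_rates(before_rows, after_rows, seconds):
    """Return {vp: (reads_per_s, writes_per_s)} between two snapshots."""
    before = {r["vp"]: r for r in map(parse_iov_row, before_rows)}
    rates = {}
    for row in map(parse_iov_row, after_rows):
        prev = before[row["vp"]]
        rates[row["vp"]] = (
            (row["dskread"] - prev["dskread"]) / seconds,
            (row["dskwrite"] - prev["dskwrite"]) / seconds,
        )
    return rates

# Rows copied from the snapshot taken at "Up 00:04:46".
first = [
    "aio  0 s 292.2    83568    81777     1732        0    82030     1.0       0",
    "aio  1 s 118.1    33790    33533      255        0    34494     1.0       0",
    "aio  2 i   5.3     1523     1337      184        0     2039     0.7       0",
]
# Rows copied from the snapshot taken at "Up 00:05:18".
second = [
    "aio  0 i 273.6    86992    85127     1806        0    85401     1.0       0",
    "aio  1 s 108.9    34627    34232      393        0    35314     1.0       0",
    "aio  2 s   5.5     1763     1438      323        0     2271     0.8       0",
]

elapsed = uptime_seconds("00:05:18") - uptime_seconds("00:04:46")
rates = interval_rates(first, second, elapsed)
for vp, (reads, writes) in rates.items():
    print(f"{vp}: {reads:.1f} reads/s, {writes:.1f} writes/s")
```

For the 32 seconds between the snapshots this works out to roughly 105 reads/s and 2 writes/s on aio 0, and about 4 writes/s each on aio 1 and aio 2.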

i will try to arrange a system reboot to change the kernel parameters for
the file system cache. maybe it will help.
any other ideas?
TBP - 21 Nov 2007 20:23 GMT
> <Previous post SNIPPED>
>> Hi Art,
[quoted text clipped - 24 lines]
> next restart.  If there are any AIO VPs showing io/wup == 0.0 you can
> reduce NUMAIOVPS to eliminate them.
<SNIP>

Well, I would ... err ... disagree.

Without NUMAIOVPS set, the server will allocate 2 AIO VPs per chunk. With
this instance using files, and only 5 chunks actually doing any
work, too many AIO VPs can actually cause a bottleneck.

I think an onstat -g ioa would be interesting along with an onstat -g glo.

Myself (just to be provocative) I would set CLEANERS to 8 and NUMAIOVPS
to 4 :D (Awaiting flame :o) )

And I would agree with superboer .... check the DBC_MAX_PCT values, and
shut them right down (well, keep it to represent about 128 to 256 MB).
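
Since the HP-UX tunable is a percentage of physical memory, which values land in the suggested 128 to 256 MB range depends on how much RAM the box has. The thread does not state the RAM size, so the sizes below are purely hypothetical, and `dbc_pct_range` is an illustrative helper rather than anything shipped with HP-UX or Informix.

```python
def dbc_pct_range(ram_mb, low_mb=128, high_mb=256):
    """Whole-number percentages of RAM whose size falls within [low_mb, high_mb]."""
    return [
        pct for pct in range(1, 101)
        if low_mb <= ram_mb * pct / 100 <= high_mb
    ]

# Hypothetical RAM sizes in MB; the real figure is not given in the thread.
for ram_mb in (2048, 4096, 8192):
    print(ram_mb, "MB RAM ->", dbc_pct_range(ram_mb))
```

Under these assumptions a machine with 4096 MB of RAM would need a maximum of 4 to 6 percent to stay inside that range, while a setting of 10 percent would allow about 410 MB of buffer cache.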
Apostrof - 22 Nov 2007 07:29 GMT
> > <Previous post SNIPPED>
> >> Hi Art,
[quoted text clipped - 40 lines]
> And I would agree with superboer .... check the DBC_MAX_PCT values, and
> shut them right down (well, keep it to represent about 128 to 256 Mb).

yesterday i changed NUMAIOVPS to 16 and ran a test. during
the load i looked at the onstat -g iov output for io/wup values.
a lot of values were over 1, so i increased NUMAIOVPS to 24 and
ran a new test. most of the time the io/wup values were below 1; i rarely
saw a maximum value of 1.2.
but there was no decrease in checkpoint durations.
i've checked the kernel parameters DBC_MAX_PCT and DBC_MIN_PCT.
our settings are 10 and 5.
changing them needs a reboot, so i gave up for now. (maybe i can try
this in 1 or 2 days' time)

today i will test these scenarios:
moving the physical log to a new dbspace located on a separate
disk (logical volume).
setting CLEANERS=8 (any suggestions for NUMAIOVPS? leave it unset, or
16, or 24?)
setting BUFFERS=300000
Neil Truby - 22 Nov 2007 08:38 GMT
>> > <Previous post SNIPPED>

> yesterday i have changed the NUMAIOVPS to 16 and made a test. during
> the load i looked at the onstat -g iov output for io/wup values.
[quoted text clipped - 13 lines]
> 16 or 24? )
> setting BUFFERS=300000

At checkpoint time, what does sar -d show for the disk service times? For example:

sar -d 5 20
stefan@weideneder.de - 22 Nov 2007 17:29 GMT
> hi,
> i have some problem with the long running checkpoints in IDS 7.31.UD8
[quoted text clipped - 55 lines]
>
> Abdullah

Hi Abdullah,

Have you ever tried to find out your I/O speed? You might compile the
following C code and run the test program.

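cat <<'eof' > prog.c
/*******save to prog.c ****************/
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
 int i;
 int fd = -1;
 char buffer_vc[4096*2];
 char *ptr_pc;

 /* advance to the first 1024-byte-aligned address inside the buffer */
 for ( ptr_pc  = buffer_vc; ( (unsigned long)ptr_pc % 1024 ) ; ptr_pc++ );

 memset( ptr_pc, 'A', 2048 );
 /* O_SYNC: each write returns only after the data has reached the disk */
 fd = open( "cifx_204", O_WRONLY|O_SYNC);

 if ( fd == -1 )
 {
       perror( "Check existence of file cifx_204! touch cifx_204");
       exit(1);
 }
 /* 10000 synchronous writes of 2048 bytes each, about 20 MB in total */
 for ( i = 0; i < 10000; i++ )
       if ( write( fd, ptr_pc, 2048 ) != 2048 )
       {
             perror( "write" );
             exit(1);
       }
 close( fd );
 return 0;
}
/*********** end of code ****************/
eof
cc -o prog prog.c
touch cifx_204
time ./prog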

I'm just interested in how long it takes to write approx. 20 MB.

Best regards

Stefan
TBP (The Big Potato) - 22 Nov 2007 20:49 GMT
<SNIP

> Hi Abdullah,
>
[quoted text clipped - 39 lines]
>
> Stefan

Or, even check the read speed from the actual chunks :

timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=20240

and

timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=10240

The first reads about 40 MB and the second 20 MB (bit late for me, so even basic maths is tricky), and you should be getting at least 4 MB a second, so
they should take about 10 and 5 seconds or less respectively.

As an aside, you only have the one temp dbspace, which appears pretty active. Create another one or two and add them to DBSPACETEMP
in the $ONCONFIG.
david@smooth1.co.uk - 22 Nov 2007 22:17 GMT
> <SNIP
>
[quoted text clipped - 57 lines]
>
> - Show quoted text -

1. As per http://www-1.ibm.com/support/docview.wss?uid=swg21250366, set
TRACECKPT=1 before starting the engine. What does that give?

2. What does onstat -g ioq give?
Apostrof - 23 Nov 2007 06:30 GMT
On Nov 23, 12:17 am, "da...@smooth1.co.uk" <da...@smooth1.co.uk>
wrote:

> > <SNIP
>
[quoted text clipped - 63 lines]
>
> 2.  What does onstat -g ioq give?

i ran the prog.c program in the chunks directory.
results:
real 2:06.7
user 0.0
sys 0.0

output for timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=20240

real 0.85
user 0.03
sys 0.33

output for timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=10240

real 0.14
user 0.02
sys 0.12
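
To put these timings on a common scale, they can be turned into throughput and per-operation figures. This is only a back-of-the-envelope sketch in Python: `throughput` is an invented helper, the block sizes and counts are taken from prog.c and the dd commands in the thread, and "real 2:06.7" is read as 126.7 seconds.

```python
def throughput(block_bytes, count, seconds):
    """Summarise one timed run of `count` transfers of `block_bytes` each."""
    total_mb = block_bytes * count / 1_000_000  # decimal megabytes
    return {
        "total_mb": total_mb,
        "mb_per_s": total_mb / seconds,
        "ops_per_s": count / seconds,
        "ms_per_op": seconds * 1000 / count,
    }

runs = {
    "prog.c O_SYNC writes": throughput(2048, 10000, 126.7),
    "dd read count=20240": throughput(2048, 20240, 0.85),
    "dd read count=10240": throughput(2048, 10240, 0.14),
}

for name, result in runs.items():
    print(
        f"{name}: {result['total_mb']:.1f} MB, "
        f"{result['mb_per_s']:.2f} MB/s, "
        f"{result['ms_per_op']:.2f} ms per operation"
    )
```

Read this way, each synchronous 2 KB write took about 12.7 ms (roughly 79 writes per second, or 0.16 MB/s), while the dd reads ran at tens of MB per second and may have been served largely from the file system cache.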
RoB - 23 Nov 2007 13:41 GMT
> ...
>every day a batch utility making hig volume of insert and updates. and
>this batch works about 5-10 minutes(checkpoint times included).
>during this batch load one or two (long) checkpoints occur.
>(unfortunately blocking checkpoints)
> ...

Just a thought. Have you thought about disabling/dropping any indexes
before the load and then enabling/recreating them after the load? If
this could be done (perhaps concurrency issues won't let you) your
load would finish quicker and your indexes would be more compact and
efficient. During the load it would also reduce the number of logical
log records generated, pages needed to be read in from disk and pages
needed to be written to disk.

RoB
Thomas J. Girsch - 23 Nov 2007 17:01 GMT
I'm just curious:  What's the row size of the table that's getting all
the inserts?  And how does that compare to your system's page size?

> hi,
> i have some problem with the long running checkpoints in IDS 7.31.UD8
[quoted text clipped - 55 lines]
>
> Abdullah
 