Database Forum / Informix Topics / November 2007
long checkpoints
Apostrof - 21 Nov 2007 13:01 GMT
hi, i have a problem with long-running checkpoints in IDS 7.31.UD8 on HP-UX. every day a batch utility makes a high volume of inserts and updates. the batch runs for about 5-10 minutes (checkpoint times included). during this batch load one or two long checkpoints occur (unfortunately blocking checkpoints).
17:32:53 Checkpoint Completed: duration was 160 seconds.
17:32:53 Checkpoint loguniq 1250, logpos 0x1842484
17:38:59 Checkpoint Completed: duration was 155 seconds.
17:38:59 Checkpoint loguniq 1253, logpos 0x8fa098
some of the onconfig parameters are
PHYSDBS rootdbs # Location (dbspace) of physical log
PHYSFILE 100000 # Physical log file size (Kbytes)
LOGFILES 40 # Number of logical log files
LOGSIZE 20000 # Logical log size (Kbytes)
LOGSMAX 50
LOCKS 150000 # Maximum number of locks
BUFFERS 200000 # Maximum number of shared buffers
NUMAIOVPS # Number of IO vps
PHYSBUFF 64 # Physical log buffer size (Kbytes)
LOGBUFF 64 # Logical log buffer size (Kbytes)
CLEANERS 50 # Number of buffer cleaner processes
SHMBASE 0x0 # Shared memory base address
SHMVIRTSIZE 64000 # initial virtual shared memory segment size
SHMADD 32000 # Size of new shared memory segments (Kbytes)
SHMTOTAL 0 # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 200 # Check point interval (in sec)
LRUS 50 # Number of LRU queues
LRU_MAX_DIRTY 60 # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 50 # LRU percent dirty end cleaning limit
LTXHWM 50 # Long transaction high water mark percentage
LTXEHWM 60 # Long transaction high water mark (exclusive)
TXTIMEOUT 0x12c # Transaction timeout (in sec)
STACKSIZE 64 # Stack size (Kbytes)
when a checkpoint starts i look at the onstat -F output and see one or two lines with state C. the chunk writes take most of the checkpoint time. i changed LRU_MAX_DIRTY to 10 and LRU_MIN_DIRTY to 5 and tried the batch again, but this time, due to LRU writes, the batch utility got slower and finished its job in about 20 minutes. i've read in some notes that LRU writes are worse than chunk writes. so what can i do to make this batch process faster and shorten checkpoint times? thanks
Abdullah
Art S. Kagel - 21 Nov 2007 13:21 GMT
> hi,
Abdullah,
You've got a lot going on. First thing I see is there should be a checkpoint between the two you've published since you have CKPTINTVL set to 200 or 3 mins 20 secs. Hopefully this is just because the intervening chkpt was shorter so you didn't post it.
I agree that your settings for LRU_MAX_DIRTY/LRU_MIN_DIRTY are way too high for this kind of system, and I would go even further than your attempt at moving from 60/50 to 10/5 and go all the way to 2/1.
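As a rough illustration of why the dirty thresholds matter for checkpoint length, the sketch below estimates how many pages may still be dirty when a checkpoint starts. It assumes a 2 KB page size (not stated in the thread) and treats LRU_MAX_DIRTY as an upper bound across the whole buffer pool; the function name is made up for this example.

```python
def max_dirty_backlog(buffers, lru_max_dirty_pct, page_kb=2):
    """Return (pages, megabytes) that may be dirty when a checkpoint starts.

    page_kb=2 is an assumption; the thread does not state the page size.
    """
    pages = buffers * lru_max_dirty_pct // 100
    return pages, pages * page_kb / 1024


# BUFFERS 200000 comes from the posted onconfig; 60, 10 and 2 are the
# LRU_MAX_DIRTY values discussed in the thread.
for pct in (60, 10, 2):
    pages, mb = max_dirty_backlog(200000, pct)
    print(f"LRU_MAX_DIRTY {pct:>2}: up to {pages} pages (~{mb:.0f} MB)")
```

Under these assumptions the backlog drops from about 120000 pages at 60% to about 4000 pages at 2%, which is the work the blocking checkpoint no longer has to do.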
Now, what might have happened when you reduced the LRU flush settings to cause LRU writes (yes any significant number of these - say more than 10 - is death to throughput)? I would guess that you need more BUFFERS. What do your server's metrics look like? Post onstat -D, onstat -p, and the first 20 and last 20 lines of the onstat -P output with the time since the stats were last zero'd (onstat -z) and I'll try to calculate them for you.
I suspect that at least part of the problem is slow disk writes. This wouldn't be a RAID5 setup and/or using COOKED chunks, would it? RAID5 alone will slow down your updates/bulk loads SIGNIFICANTLY. Similarly, you have only the default AIO VPs configured; if you are using COOKED filesystem chunks, that will also be slowing you down. Please post more configuration information.
Art S. Kagel
Apostrof - 21 Nov 2007 13:46 GMT
Hi Art, our server has 4 GB of memory and informix is using 500 MB of it. there is another application which is using 1-2 GB of memory. the batch utility is part of this application. we are using cooked chunks. there is no raid5, but the hp mirroring tool is used with the volume groups where the chunks reside. you are right about the checkpoint information. there is only one checkpoint in my sample output while the batch utility is working. after the batch we take an onunload backup. i think the other checkpoint takes place while we take this backup. sometimes one, sometimes two checkpoints occur during the batch utility. the second checkpoint may occur during the backup. these are the outputs you wanted.
17:32:53 Checkpoint Completed: duration was 160 seconds.
17:32:53 Checkpoint loguniq 1250, logpos 0x1842484
17:33:05 Logical Log 1250 Complete.
17:33:07 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1250 Complete." "Logical Log 1250 Complete."
17:33:52 Logical Log 1251 Complete.
17:33:53 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1251 Complete." "Logical Log 1251 Complete."
17:35:19 Logical Log 1252 Complete.
17:35:20 Process exited with return code 156: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1252 Complete." "Logical Log 1252 Complete."
17:38:59 Checkpoint Completed: duration was 155 seconds.
17:38:59 Checkpoint loguniq 1253, logpos 0x8fa098
onstat -D output:
-----------------
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:08:22 -- 501872 Kbytes
Dbspaces
address  number flags fchunk nchunks flags owner    name
e41dc158 1      1     1      1       N     informix rootdbs
e41ddf38 2      1     2      1       N     informix llogspace
e4206b48 3      1     3      4       N     informix datadbs
e4206c08 4      2001  7      1       N T   informix tempdbs
4 active, 2047 maximum
Chunks
address  chk/dbs offset page Rd  page Wr  pathname
e41dc218 1   1   0      4114     102979   /work_a1/db/rootchunk
e41ddc20 2   2   0      9969     108051   /data1/logs/llogchunk1
e41ddd28 3   3   0      9313329  14220    /data1/db/datachunk1
e41dde30 4   3   0      7721858  116564   /data1/db/datachunk2
e4206830 5   3   0      5        0        /data1/db/datachunk3
e4206938 6   3   0      5        0        /data1/db/datachunk4
e4206a40 7   4   0      92433    92610    /work_a1/db/tempchunk
7 active, 2047 maximum
onstat -p output:
-----------------
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:09:13 -- 501872 Kbytes
Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
13163890 17141713 73963987 82.20   141207   434424   2013940  92.99
isamtot  open  start   read    write  rewrite delete commit rollbk
15105310 93462 1625540 7330333 521236 2764    382820 4434   0
gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
0       0        0        0      0        0       0
ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
0      0            0      1473.22 635.22 50       1870
bufwaits lokwaits lockreqs  deadlks dltouts ckpwaits compress seqscans
705363   0        102952514 0       0       13       38773    13827
ixda-RA idx-RA da-RA    RA-pgsused lchwaits
31895   2292   11835066 11832481   675752
onstat -P | head -20
--------------------
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line -- Up 2 days 07:10:38 -- 501872 Kbytes
partnum total btree data other resident dirty
0       8529  7575  788  166   0        0
1048578 2     1     1    0     0        0
1048579 10    5     5    0     0        0
1048580 1     1     0    0     0        0
1048582 1     1     0    0     0        0
1048584 3     2     1    0     0        0
1048595 1     1     0    0     0        0
1048606 1     1     0    0     0        0
1048703 1     1     0    0     0        0
3145730 26    11    15   0     0        0
3145731 1     1     0    0     0        0
3145732 46    16    30   0     0        0
3145733 7     2     5    0     0        0
3145734 10    5     5    0     0        0
3145735 2     1     1    0     0        0
3145736 8     4     4    0     0        0
3145737 2     1     1    0     0        0
onstat -P | tail -20
--------------------
3145888 6     0     6    0     0        0
3145891 17    0     17   0     0        0
3145892 154   0     154  0     0        0
3145893 4     0     4    0     0        0
3145894 1     0     1    0     0        0
3145895 7     0     7    0     0        0
3145899 1     1     0    0     0        0
3145900 1     0     1    0     0        0
3145901 1     0     1    0     0        0
3145902 478   1     477  0     0        0
3145903 22    3     19   0     0        0
3145904 60    0     60   0     0        0
Totals: 200000 146840 52876 284 0 0
Percentages:
Data 26.44
Btree 73.42
Other 0.14
Superboer - 21 Nov 2007 14:38 GMT
i suggest you try to tune this first on a test box!!
avoid cooked.... USE RAW. and therefore grab more buffers....
afaikr HP has a kernel param which says how much memory will be used for unix FS cache. this can take up 50 %... (DBC_MAX_PCT) you could set this to 20 or 10 and set DBC_MIN_PCT to 5
this memory can not be used for informix memory.... when DBC_MAX_PCT is 50....
lru cleaning is less efficient than checkpoint cleaning. lru is/can be random while checkpoint cleaning will sort the buffers to be written so it can do bigger i/o's (16 pages max....)
you could try to do your own checkpoints... use onmode -B followed by onmode -c. onmode -B is not documented and does the same cleaning as a checkpoint, but does not block anything. if however your i/o subsystem does not cope then you are still in trouble, regardless of onmode -B. post an onstat -R -r 5 during checkpointing; also look at sar/iostat to see how busy your disks are.. maybe one is full and the rest is doing nothing......
Per Art : NO RAID 5 NO RAID 5 NO RAID 5!!!!!!!!!!!!!
Further:
NUMAIOVPS # Number of IO vps please set this to a value!!!!!!
i did not see resident you may want to set it to 1 or -1 or -2. you may consider kernel i/o -->> check rel notes.
Superboer.
way fast=http://www.clipjes.nl/clip/nederlands/n/normaal_-_oerend_hard.html
Art S. Kagel - 21 Nov 2007 15:31 GMT <Previous post SNIPPED>
> Hi Art, Hi,
Your server's healthy overall. Metrics look good:
BR = 3.68
BTR = 1.73/hour
RAU = 99.69% - not perfect but acceptable. It probably could not hurt to reduce the RA threshold by 50%.
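The thread does not spell out how these three figures are derived. The sketch below uses formulas that happen to reproduce them from the posted onstat -p numbers, reading BR as a bufwaits ratio, BTR as a buffer turnover rate and RAU as read-ahead utilization; both the formulas and those expansions are assumptions, not something Art states.

```python
def server_metrics(p, buffers, uptime_hours):
    """Derive BR, BTR and RAU from onstat -p counters (assumed formulas)."""
    work = p["pagreads"] + p["bufwrits"]
    ra_total = p["ixda_RA"] + p["idx_RA"] + p["da_RA"]
    return {
        "BR": 100.0 * p["bufwaits"] / work,           # percent
        "BTR": work / buffers / uptime_hours,         # pool turnovers per hour
        "RAU": 100.0 * p["RA_pgsused"] / ra_total,    # percent
    }


# Counters copied from the onstat -p output posted in the thread.
profile = {
    "pagreads": 17141713, "bufwrits": 2013940, "bufwaits": 705363,
    "ixda_RA": 31895, "idx_RA": 2292, "da_RA": 11835066,
    "RA_pgsused": 11832481,
}
uptime_hours = 2 * 24 + 7 + 9 / 60 + 13 / 3600   # "Up 2 days 07:09:13"
metrics = server_metrics(profile, buffers=200000, uptime_hours=uptime_hours)
for name, value in metrics.items():
    print(f"{name} = {value:.2f}")
```

The small difference in the BTR value compared with the quoted 1.73/hour presumably comes from rounding or a slightly different uptime.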
> our server has 4gb memory and informix is using 500mb of it. there is
> another application which is using 1-2gb of memory. the batch utility
> is part of this application.
> we are using cooked chunks. there is no raid5 but hp mirroring tool is
COOKED chunks without NUMAIOVPS set properly is your BIG problem. You have 7 chunks, and on 7.31 you'll need 1.5 AIO VPs per chunk plus a few extra for the message log and other cooked IO (like SET EXPLAIN output). That means I'd set:
NUMAIOVPS 16
to begin with, and monitor onstat -g iov over a normal workload period. If you see one or more AIO VPs with io/wup >= 1.0, it means that at least some of the time you have IOs waiting for a VP to free up, so you need to increase the number of AIO VPs for the next restart. If there are any AIO VPs showing io/wup == 0.0, you can reduce NUMAIOVPS to eliminate them.
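Art's sizing rule and his io/wup check can be sketched roughly as follows. The function names are hypothetical, the "few extra" is taken as 5 only because that reaches his suggested 16, and the parser assumes the onstat -g iov column layout shown in this thread (io/wup as the tenth whitespace-separated field).

```python
import math


def initial_aio_vps(chunks, per_chunk=1.5, extra=0):
    """Starting NUMAIOVPS: 1.5 AIO VPs per chunk plus some spare ones."""
    return math.ceil(chunks * per_chunk) + extra


def classify_aio_vps(iov_text):
    """Split AIO VPs into (queued, unused) by their io/wup value."""
    queued, unused = [], []
    for line in iov_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or fields[0] != "aio":
            continue
        vp, io_per_wakeup = int(fields[1]), float(fields[9])
        if io_per_wakeup >= 1.0:      # IOs were waiting for a VP
            queued.append(vp)
        elif io_per_wakeup == 0.0:    # VP never did any work
            unused.append(vp)
    return queued, unused


# Three rows copied from the onstat -g iov output posted in the thread.
sample = """\
aio 0 s 292.2 83568 81777 1732 0 82030 1.0 0
aio 2 i 5.3 1523 1337 184 0 2039 0.7 0
aio 8 i 0.5 132 22 110 0 76 1.7 0
"""
print(initial_aio_vps(7, extra=5))
print(classify_aio_vps(sample))
```

Feeding the function a full capture taken over a normal workload period, as Art suggests, is more meaningful than a single snapshot.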
I still think that you could benefit from more buffers, and I suspect that is the reason that onstat -P shows that 73% of your buffer cache is dedicated to index pages. I'm guessing that index pages and data pages are thrashing the cache a bit. That's why your Read Cache hit rate is only 82.2%, while your Write Cache hit rate is 92.99%. You want to see these above 95% and 85% respectively (90 would be better for writes, but that's application dependent). The thrashing is also part of the cause of the LRU Writes - though the AIO VP configuration is the biggest culprit.
Finally, I notice that there is almost as much write activity in the rootdbs dbspace as in the data dbspace. This is almost entirely because you have the physical log configured there. You should move the physical log to a separate dbspace, preferably on its own disk structure, away from data, root, and logical logs as much as possible. Notice that your data write activity, physical log write activity and logical log write activity are all about the same volume. So, the more you can isolate these from each other, the better checkpoints and other physical write operations will perform.
Art S. Kagel
Keith Simmons - 21 Nov 2007 15:45 GMT
I recall an issue on 7.3 engines where the output from onstat -P showed high index use and low data use (as in this case). It is caused by the index pages getting too high a priority. I can't remember the case number, but there is an onconfig parameter or environment variable that can be set to force a much higher data usage.
Keith
Christian Knappke - 22 Nov 2007 10:04 GMT
From the keyboard of "Keith Simmons" <smiley73@googlemail.com>:
> I recall an issue on 7.3 engines where the output from onstat -P
> showed high index use and low data use (as in this case). It is
> caused by the index pages getting too high a priority. Can't
> remember the case number, but there is an onconfig or
> environment variable that can be set to force a much higher data
> usage.
You mean LRUAGE = 1 in the engine's environment?
It was one of our standard "must set this" recommendations for 7.31.
Regards Christian
#include <std_disclaimer.h>
/* The opinions stated above are my own and not necessarily those of my employer. */
Apostrof - 22 Nov 2007 16:45 GMT
i am starting to think that the main problem is slow disk access, because after all the configuration changes and tries nothing got better in checkpoint times. they are all high. today i first changed the onconfig and set the LRUAGE environment variable to 1. i left NUMAIOVPS unset again, because with that setting the engine also creates two aio VPs for each chunk (i saw it in the onstat -g iov output). then i started the instance. here are the related lines from online.log
16:45:06 Onconfig parameter BUFFERS modified from 200000 to 300000.
16:45:06 Onconfig parameter CLEANERS modified from 50 to 8.
16:45:06 Onconfig parameter NUMAIOVPS modified from 24 to 2.
and the lines from online.log about checkpoints while the batch utility was working.
17:29:57 Logical Log 1110 Complete.
17:29:58 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical Log 1110 Complete."
17:34:03 Checkpoint Completed: duration was 160 seconds.
17:34:03 Checkpoint loguniq 1111, logpos 0x14f8308
17:34:19 Logical Log 1111 Complete.
17:34:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1111 Complete." "Logical Log 1111 Complete."
17:34:54 Logical Log 1112 Complete.
17:34:55 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1112 Complete." "Logical Log 1112 Complete."
17:35:41 Logical Log 1113 Complete.
17:35:42 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1113 Complete." "Logical Log 1113 Complete."
17:40:33 Checkpoint Completed: duration was 178 seconds.
17:40:33 Checkpoint loguniq 1114, logpos 0x230539c
during the checkpoints sar -d 5 20 output is like this:
Average c1t0d0 90.34 0.50 148 1906 4.94 9.52
Average c2t0d0 73.89 0.50 117 1450 4.93 9.12
Average c1t2d0  1.03 0.50   2   19 3.24 9.39
Average c2t2d0  1.02 0.50   2   18 3.53 8.18
all the database files reside on the disks with high load. most of the volumes on the first two disks are mirrored.
after this i created a new volume (unfortunately on the same disks) and moved the physical log to that dbspace, but didn't change its size. i also undid the config changes that i made in the previous step.
17:58:38 Onconfig parameter PHYSDBS modified from rootdbs to physlogdbs.
17:58:38 Onconfig parameter BUFFERS modified from 300000 to 200000.
17:58:38 Onconfig parameter CLEANERS modified from 8 to 50.
i do not understand why, but the checkpoint times got worse in this case. can it be because of the sar command?
18:05:58 Checkpoint Completed: duration was 228 seconds.
18:05:58 Checkpoint loguniq 1117, logpos 0x148a748
18:06:18 Logical Log 1117 Complete.
18:06:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1117 Complete." "Logical Log 1117 Complete."
18:07:14 Logical Log 1118 Complete.
18:07:15 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1118 Complete." "Logical Log 1118 Complete."
18:08:58 Logical Log 1119 Complete.
18:08:59 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1119 Complete." "Logical Log 1119 Complete."
18:15:29 Checkpoint Completed: duration was 359 seconds.
18:15:29 Checkpoint loguniq 1120, logpos 0x3cb328
some onstat -g iov outputs when checkpoint got started and going on.
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:04:46 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps:
class/vp s io/s  totalops dskread dskwrite dskcopy wakeups io/wup errors
msc  0   i 0.2   48       0       0        0       49      1.0    0
aio  0   s 292.2 83568    81777   1732     0       82030   1.0    0
aio  1   s 118.1 33790    33533   255      0       34494   1.0    0
aio  2   i 5.3   1523     1337    184      0       2039    0.7    0
aio  3   s 1.3   370      207     161      0       320     1.2    0
aio  4   i 0.9   248      113     133      0       208     1.2    0
aio  5   i 0.5   148      28      118      0       102     1.5    0
aio  6   i 0.5   145      22      122      0       94      1.5    0
aio  7   i 0.4   127      18      108      0       85      1.5    0
aio  8   i 0.5   132      22      110      0       76      1.7    0
aio  9   i 0.4   120      22      98       0       80      1.5    0
aio 10   i 0.4   106      14      92       0       71      1.5    0
aio 11   i 0.4   108      12      96       0       74      1.5    0
aio 12   i 0.4   101      15      86       0       68      1.5    0
aio 13   s 0.4   103      21      82       0       64      1.6    0
aio 14   s 0.3   92       7       85       0       61      1.5    0
aio 15   s 0.3   83       5       78       0       60      1.4    0
pio  0   i 3.0   856      0       856      0       857     1.0    0
lio  0   i 2.9   824      0       824      0       825     1.0    0
/usr/informix>onstat -g iov
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:05:18 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps:
class/vp s io/s  totalops dskread dskwrite dskcopy wakeups io/wup errors
msc  0   i 0.2   48       0       0        0       49      1.0    0
aio  0   i 273.6 86992    85127   1806     0       85401   1.0    0
aio  1   s 108.9 34627    34232   393      0       35314   1.0    0
aio  2   s 5.5   1763     1438    323      0       2271    0.8    0
aio  3   i 1.9   614      314     298      0       558     1.1    0
aio  4   i 1.2   390      125     263      0       350     1.1    0
aio  5   i 1.0   306      56      248      0       249     1.2    0
aio  6   i 0.9   287      40      246      0       236     1.2    0
aio  7   i 0.8   248      20      227      0       209     1.2    0
aio  8   s 0.8   245      28      217      0       188     1.3    0
aio  9   i 0.7   215      24      191      0       177     1.2    0
aio 10   s 0.6   198      31      167      0       164     1.2    0
aio 11   i 0.5   172      15      157      0       138     1.2    0
aio 12   i 0.5   151      15      136      0       120     1.3    0
aio 13   i 0.5   151      21      130      0       112     1.3    0
aio 14   i 0.4   135      7       128      0       104     1.3    0
aio 15   i 0.4   127      8       119      0       100     1.3    0
pio  0   i 2.7   856      0       856      0       857     1.0    0
lio  0   i 2.6   824      0       824      0       825     1.0    0
i will try to arrange a system reboot to change the kernel parameters for the file system cache. maybe it helps. any other ideas?
Christian Knappke - 23 Nov 2007 13:44 GMT
From the keyboard of Apostrof <abdullah.akoglu@gmail.com>:
> during the checkpoints sar -d 5 20 output is like this:
> [quoted text clipped - 5 lines]
> all the database files resides on the disks with high load. most
> of the volumes on the first two disks are mirrored.
So all chunks are on just two devices. How many physical disks hide behind one device? As you have four chunks in your datadbs, two of them show I/O activity and the Informix release is 7.31, the whole database cannot be larger than four Gig. I assume that there is only one physical disk behind every device. When you have parallel I/O requests to different files/chunks on the same disk, you only earn I/O contention. Have you checked I/O waits?
BTW, how many physical CPUs are in the server?
Anyway, I'd recommend:
- make sure that chunk and mirror are on separate physical disks
- move logical and physical log to separate disks and separate from data disks
- set LRUS = (# of physical CPUs * 2); minimum 4
- set CLEANERS = (LRUS * 2)
- set NUMAIOVPS = (# of physical disks); minimum 2
Tune the latter according to "onstat -g iov" output: (sample values)
AIO I/O vps:
class/vp s io/s totalops dskread dskwrite dskcopy wakeups io/wup
msc 0    i 0.0  113      0       0        0       114     1.0
aio 0    i 0.2  138065   137679  383      0       137945  1.0
aio 1    i 0.0  892      483     409      0       803     1.1
aio 2    i 0.0  249      24      224      0       133     1.9
pio 0    i 0.0  30       0       29       0       65      0.5
pio 1    i 0.0  29       0       29       0       30      1.0
lio 0    i 0.0  278      0       277      0       312     0.9
lio 1    i 0.0  275      0       275      0       276     1.0
Look at the lines that start with "aio": if most of the work ("totalops") is done by the first AIO VP(s), then there are enough of them.
If you have the chance to move to raw devices and KAIO, then do it. The same recommendations apply, only set NUMAIOVPS = 2 and NUMCPUVPS = floor((# of physical CPUs - 1) / 2)[1] if there is more than one.
Christian
[1] on a server that also runs the application. On a standalone DB server you can go up to NUMCPUVPS = (# of physical CPUs)
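Christian's rules of thumb for LRUS, CLEANERS and NUMAIOVPS on cooked chunks can be written down as a small calculation. This is only a sketch of the formulas as he states them; the function name is made up, and the example inputs (2 physical CPUs, 4 physical disks) are the figures reported elsewhere in the thread.

```python
def cooked_io_settings(physical_cpus, physical_disks):
    """Apply the rules of thumb: LRUS = 2 * CPUs (min 4),
    CLEANERS = 2 * LRUS, NUMAIOVPS = number of disks (min 2)."""
    lrus = max(4, physical_cpus * 2)
    return {
        "LRUS": lrus,
        "CLEANERS": lrus * 2,
        "NUMAIOVPS": max(2, physical_disks),
    }


settings = cooked_io_settings(physical_cpus=2, physical_disks=4)
for name, value in settings.items():
    print(name, value)
```

These are starting values only; the onstat -g iov check described above is what decides whether NUMAIOVPS should move up or down.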
Apostrof - 23 Nov 2007 14:46 GMT
i will try those settings for LRUS, CLEANERS and NUMAIOVPS. there is only one disk for each device shown in the sar output. there are two physical cpu's. by the way, what do you mean by "whole database cannot be larger than four Gig"? is there any limit for the database size in this release of informix?
dropping indexes and recreating them after batch utility will not be useful for me. because users start working as soon as batch utility finishes its job. index create time will be longer than the checkpoints.
Christian Knappke - 23 Nov 2007 14:57 GMT
From the keyboard of Apostrof <abdullah.akoglu@gmail.com>:
> by the way what do you mean by "whole database cannot be larger
> than four Gig" ?
I did not mean that the DB cannot grow beyond 4 GB. There are two chunks that are used. 7.31 chunks cannot be larger than 2 GB. From that I derived that there is not more than 4 GB data in the DB.
> is there any limit for the database size in
> this release of informix?
Yes. The bottom line of onstat -d tells you: max 2047 chunks. Times 2 GB means max ~4 TB.
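The two size estimates in this reply are plain multiplication; a quick sketch, using only the limits Christian quotes (2 GB per chunk on 7.31, 2047 chunks maximum):

```python
MAX_CHUNK_GB = 2      # per-chunk limit quoted for 7.31
MAX_CHUNKS = 2047     # "2047 maximum" from the onstat output

chunks_in_use = 2     # the two data chunks that show I/O activity
current_upper_bound_gb = chunks_in_use * MAX_CHUNK_GB
instance_limit_gb = MAX_CHUNKS * MAX_CHUNK_GB

print(f"data in use: at most {current_upper_bound_gb} GB")
print(f"instance limit: {instance_limit_gb} GB (~{instance_limit_gb / 1024:.1f} TB)")
```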
To get on topic again: depending on the data model it may also help to not throw all tables into one big bucket but to separate the most wanted tables and distribute them over more dbspaces. But that requires more physical disks.
regards Christian
Superboer - 23 Nov 2007 16:25 GMT
your sar says
90% busy doing 148 r/w per sec, which is 1906 blocks per sec
meaning approx 1 MB/s for a disk.... with an average i/o of about 6 KB in size...
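The arithmetic behind those two figures can be checked as follows. The sketch assumes that sar reports 512-byte blocks, which the thread does not state; the function name is made up for this example.

```python
BLOCK_BYTES = 512  # assumption: sar -d block size is not stated in the thread


def sar_disk_summary(ios_per_sec, blocks_per_sec, block_bytes=BLOCK_BYTES):
    """Return (throughput in MB/s, average I/O size in KB)."""
    bytes_per_sec = blocks_per_sec * block_bytes
    return bytes_per_sec / 2**20, bytes_per_sec / ios_per_sec / 1024


# Busiest device in the posted sar output: 148 r+w/s, 1906 blocks/s.
mb_per_sec, kb_per_io = sar_disk_summary(148, 1906)
print(f"{mb_per_sec:.2f} MB/s, {kb_per_io:.2f} KB per I/O")
```

A disk that is 90% busy while moving roughly 1 MB/s in small I/Os is consistent with the suspicion that the writes are mostly random.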
onstat -Rr |grep dirty during checkpoint will tell you how many pages per 5 secs are written. Also it will tell how many pages are really dirty....
The program you have may be useless when you try this on cooked files.
(may be this: fd = open( "cifx_204", O_WRONLY|O_SYNC); will bypass the cache..... dono need to test it... have played in the past with such a writer (sorry do not recall if it had O_SYNC...) and that yielded in unpredictable results because of the fs cache.!!!!!!!!!)
You need to do this on char mode raw devs.
the problem is that unix file system cache is spoiling the measurement. with a small number of i/os you can get high numbers, but with a database size amount of action the filesystem cache will kick in for sure.
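For reference, the kind of timed writer mentioned above could look roughly like this in Python. It mirrors the open(..., O_WRONLY|O_SYNC) call from the post; the file name, block size and write count are placeholders, and on a cooked file the result is still subject to exactly the filesystem-cache doubts raised here.

```python
import os
import tempfile
import time


def timed_sync_writes(path, count=64, block_size=2048):
    """Write `count` blocks with O_SYNC and return (bytes written, seconds)."""
    # O_SYNC asks the OS to complete each write before returning;
    # getattr keeps the sketch importable where the flag is missing.
    flags = os.O_WRONLY | os.O_CREAT | getattr(os, "O_SYNC", 0)
    buf = b"\0" * block_size
    fd = os.open(path, flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return count * block_size, elapsed


with tempfile.TemporaryDirectory() as workdir:
    written, seconds = timed_sync_writes(os.path.join(workdir, "probe.dat"))
    print(f"{written} bytes in {seconds:.4f} s")
```

Pointing the same function at a character-mode raw device, as recommended, would need appropriate permissions and a device that is safe to overwrite.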
your tune job will not be easy.
i would setup a test system with char mode raw devs and start tuning there.
also more disks... striping/mirroring???!!!!!!!!!!!
also have a look at the stripe size... 2 times maxio = 2 * 16 pages = 64k or 128 kb in order to avoid split writes....
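The stripe-size suggestion is again simple arithmetic. Reading "64k or 128 kb" as the results for a 2 KB and a 4 KB page size is an interpretation, not something the post states; the function name is made up.

```python
def suggested_stripe_kb(page_kb, max_io_pages=16, factor=2):
    """Stripe size = factor * largest I/O (max_io_pages pages)."""
    return factor * max_io_pages * page_kb


for page_kb in (2, 4):
    print(f"{page_kb} KB pages -> {suggested_stripe_kb(page_kb)} KB stripe")
```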
Superboer
way fast=http://www.clipjes.nl/clip/nederlands/n/normaal_-_oerend_hard.html
> From the keyboard of Apostrof <abdullah.ako...@gmail.com>: > [quoted text clipped - 22 lines] > /* The opinions stated above are my own and not > necessarily those of my employer. */ Apostrof - 22 Nov 2007 17:04 GMT > From the keyboard of "Keith Simmons" <smile...@googlemail.com>: > [quoted text clipped - 15 lines] > /* The opinions stated above are my own and not > necessarily those of my employer. */ i am starting to think that the main problem is slow disk access. because after all configuration changes and tries nothing got better in checkpoint times. they are all high. today first i changed the onconfig and set LRUAGE environment to 1. unset the NUMAIOVPS unset again,because in this setting it also creates two aio for each chunk. (i saw it in the onstat -g iov output) then started the instance. here is the related lines from online.log
16:45:06 Onconfig parameter BUFFERS modified from 200000 to 300000. 16:45:06 Onconfig parameter CLEANERS modified from 50 to 8. 16:45:06 Onconfig parameter NUMAIOVPS modified from 24 to 2.
and the lines from online.log about checkpoints while the batch utility was working.
17:29:57 Logical Log 1110 Complete. 17:29:58 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical Log 1110 Complete." 17:34:03 Checkpoint Completed: duration was 160 seconds. 17:34:03 Checkpoint loguniq 1111, logpos 0x14f8308
17:34:19 Logical Log 1111 Complete. 17:34:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1111 Complete." "Logical Log 1111 Complete." 17:34:54 Logical Log 1112 Complete. 17:34:55 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1112 Complete." "Logical Log 1112 Complete." 17:35:41 Logical Log 1113 Complete. 17:35:42 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1113 Complete." "Logical Log 1113 Complete." 17:40:33 Checkpoint Completed: duration was 178 seconds. 17:40:33 Checkpoint loguniq 1114, logpos 0x230539c
during the checkpoints sar -d 5 20 output is like this:
Average c1t0d0 90.34 0.50 148 1906 4.94 9.52 Average c2t0d0 73.89 0.50 117 1450 4.93 9.12 Average c1t2d0 1.03 0.50 2 19 3.24 9.39 Average c2t2d0 1.02 0.50 2 18 3.53 8.18
all the database files resides on the disks with high load. most of the volumes on the first two disks are mirrored.
after this i created a new volume (unfortunately on the same disks) and move the physical log to that dbspace. but didn't change the size. and undo the config changes that i made in previous step.
17:58:38 Onconfig parameter PHYSDBS modified from rootdbs to physlogdbs. 17:58:38 Onconfig parameter BUFFERS modified from 300000 to 200000. 17:58:38 Onconfig parameter CLEANERS modified from 8 to 50.
i did not understand why but the checkpoint times get worse in this case. can it be because of sar command?
18:05:58 Checkpoint Completed: duration was 228 seconds. 18:05:58 Checkpoint loguniq 1117, logpos 0x148a748
18:06:18 Logical Log 1117 Complete. 18:06:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1117 Complete." "Logical Log 1117 Complete." 18:07:14 Logical Log 1118 Complete. 18:07:15 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1118 Complete." "Logical Log 1118 Complete." 18:08:58 Logical Log 1119 Complete. 18:08:59 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1119 Complete." "Logical Log 1119 Complete." 18:15:29 Checkpoint Completed: duration was 359 seconds. 18:15:29 Checkpoint loguniq 1120, logpos 0x3cb328
some onstat -g iov outputs when checkpoint got started and going on.
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:04:46 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps: class/vp s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors msc 0 i 0.2 48 0 0 0 49 1.0 0 aio 0 s 292.2 83568 81777 1732 0 82030 1.0 0 aio 1 s 118.1 33790 33533 255 0 34494 1.0 0 aio 2 i 5.3 1523 1337 184 0 2039 0.7 0 aio 3 s 1.3 370 207 161 0 320 1.2 0 aio 4 i 0.9 248 113 133 0 208 1.2 0 aio 5 i 0.5 148 28 118 0 102 1.5 0 aio 6 i 0.5 145 22 122 0 94 1.5 0 aio 7 i 0.4 127 18 108 0 85 1.5 0 aio 8 i 0.5 132 22 110 0 76 1.7 0 aio 9 i 0.4 120 22 98 0 80 1.5 0 aio 10 i 0.4 106 14 92 0 71 1.5 0 aio 11 i 0.4 108 12 96 0 74 1.5 0 aio 12 i 0.4 101 15 86 0 68 1.5 0 aio 13 s 0.4 103 21 82 0 64 1.6 0 aio 14 s 0.3 92 7 85 0 61 1.5 0 aio 15 s 0.3 83 5 78 0 60 1.4 0 pio 0 i 3.0 856 0 856 0 857 1.0 0 lio 0 i 2.9 824 0 824 0 825 1.0 0
/usr/informix>onstat -g iov
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:05:18 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps: class/vp s io/s totalops dskread dskwrite dskcopy wakeups io/wup errors msc 0 i 0.2 48 0 0 0 49 1.0 0 aio 0 i 273.6 86992 85127 1806 0 85401 1.0 0 aio 1 s 108.9 34627 34232 393 0 35314 1.0 0 aio 2 s 5.5 1763 1438 323 0 2271 0.8 0 aio 3 i 1.9 614 314 298 0 558 1.1 0 aio 4 i 1.2 390 125 263 0 350 1.1 0 aio 5 i 1.0 306 56 248 0 249 1.2 0 aio 6 i 0.9 287 40 246 0 236 1.2 0 aio 7 i 0.8 248 20 227 0 209 1.2 0 aio 8 s 0.8 245 28 217 0 188 1.3 0 aio 9 i 0.7 215 24 191 0 177 1.2 0 aio 10 s 0.6 198 31 167 0 164 1.2 0 aio 11 i 0.5 172 15 157 0 138 1.2 0 aio 12 i 0.5 151 15 136 0 120 1.3 0 aio 13 i 0.5 151 21 130 0 112 1.3 0 aio 14 i 0.4 135 7 128 0 104 1.3 0 aio 15 i 0.4 127 8 119 0 100 1.3 0 pio 0 i 2.7 856 0 856 0 857 1.0 0 lio 0 i 2.6 824 0 824 0 825 1.0 0
i will try to arrange a system reboot to change kernel parameters for file system cache. may be it helps. any other ideas?
Apostrof - 22 Nov 2007 17:33 GMT > From the keyboard of "Keith Simmons" <smile...@googlemail.com>: > [quoted text clipped - 15 lines] > /* The opinions stated above are my own and not > necessarily those of my employer. */ i am starting to think that the main problem is slow disk access. because after all configuration changes and tries nothing got better in checkpoint times. they are all high. today first i changed the onconfig and set LRUAGE environment to 1. unset the NUMAIOVPS unset again,because in this setting it also creates two aio for each chunk. (i saw it in the onstat -g iov output) then started the instance. here is the related lines from online.log
16:45:06 Onconfig parameter BUFFERS modified from 200000 to 300000. 16:45:06 Onconfig parameter CLEANERS modified from 50 to 8. 16:45:06 Onconfig parameter NUMAIOVPS modified from 24 to 2.
and the lines from online.log about checkpoints while the batch utility was working.
17:29:57 Logical Log 1110 Complete. 17:29:58 Process exited with return code 163: /bin/sh /bin/sh -c /usr/ informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical Log 1110 Complete." 17:34:03 Checkpoint Completed: duration was 160 seconds. 17:34:03 Checkpoint loguniq 1111, logpos 0x14f8308
Apostrof - 22 Nov 2007 17:56 GMT
> From the keyboard of "Keith Simmons" <smile...@googlemail.com>:
> [quoted text clipped - 15 lines]
> /* The opinions stated above are my own and not
> necessarily those of my employer. */

I am starting to think that the main problem is slow disk access, because after all the configuration changes and tests the checkpoint times did not get any better. They are all still high. Today I first changed the onconfig and set the LRUAGE environment variable to 1. I also unset NUMAIOVPS again, because with this setting the server also creates two AIO VPs for each chunk (I saw this in the onstat -g iov output). Then I started the instance. Here are the related lines from online.log:
16:45:06 Onconfig parameter BUFFERS modified from 200000 to 300000.
16:45:06 Onconfig parameter CLEANERS modified from 50 to 8.
16:45:06 Onconfig parameter NUMAIOVPS modified from 24 to 2.
and the lines from online.log about checkpoints while the batch utility was working.
17:29:57 Logical Log 1110 Complete.
17:29:58 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1110 Complete." "Logical Log 1110 Complete."
17:34:03 Checkpoint Completed: duration was 160 seconds.
17:34:03 Checkpoint loguniq 1111, logpos 0x14f8308

17:34:19 Logical Log 1111 Complete.
17:34:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1111 Complete." "Logical Log 1111 Complete."
17:34:54 Logical Log 1112 Complete.
17:34:55 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1112 Complete." "Logical Log 1112 Complete."
17:35:41 Logical Log 1113 Complete.
17:35:42 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1113 Complete." "Logical Log 1113 Complete."
17:40:33 Checkpoint Completed: duration was 178 seconds.
17:40:33 Checkpoint loguniq 1114, logpos 0x230539c
during the checkpoints sar -d 5 20 output is like this:
Average c1t0d0 90.34 0.50 148 1906 4.94 9.52
Average c2t0d0 73.89 0.50 117 1450 4.93 9.12
Average c1t2d0  1.03 0.50   2   19 3.24 9.39
Average c2t2d0  1.02 0.50   2   18 3.53 8.18
All the database files reside on the disks with the high load. Most of the volumes on the first two disks are mirrored.
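As a rough cross-check of the sar figures above, here is a minimal sketch. It assumes the usual HP-UX sar -d column order (device, %busy, avque, r+w/s, blks/s, avwait, avserv) and 512-byte blocks; neither is stated in the post, and the function names are made up for illustration.

```python
def parse_sar_d_line(line):
    """Split one `sar -d` data line into named fields (assumed column order)."""
    fields = line.split()
    if fields[0] == "Average":      # drop the leading "Average" label
        fields = fields[1:]
    return {
        "device": fields[0],
        "busy_pct": float(fields[1]),
        "avque": float(fields[2]),
        "ios_per_s": float(fields[3]),
        "blks_per_s": float(fields[4]),
        "avwait_ms": float(fields[5]),
        "avserv_ms": float(fields[6]),
    }


def kib_per_s(row, block_bytes=512):
    """Throughput in KiB/s, assuming blks/s counts 512-byte blocks."""
    return row["blks_per_s"] * block_bytes / 1024


def avg_io_kib(row):
    """Average size of one I/O request in KiB."""
    return kib_per_s(row) / row["ios_per_s"]


row = parse_sar_d_line("Average c1t0d0 90.34 0.50 148 1906 4.94 9.52")
print(f"{row['device']}: {kib_per_s(row):.0f} KiB/s, "
      f"{avg_io_kib(row):.2f} KiB per I/O at {row['busy_pct']}% busy")
```

If those assumptions hold, the busiest disk is moving a bit under 1 MiB per second in requests of roughly 6 KiB each while it is about 90% busy.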
After this I created a new volume (unfortunately on the same disks) and moved the physical log to that dbspace, but did not change its size. I also undid the config changes that I had made in the previous step.
17:58:38 Onconfig parameter PHYSDBS modified from rootdbs to physlogdbs.
17:58:38 Onconfig parameter BUFFERS modified from 300000 to 200000.
17:58:38 Onconfig parameter CLEANERS modified from 8 to 50.
I do not understand why, but the checkpoint times got worse in this case. Could it be because of the sar command?
18:05:58 Checkpoint Completed: duration was 228 seconds.
18:05:58 Checkpoint loguniq 1117, logpos 0x148a748

18:06:18 Logical Log 1117 Complete.
18:06:20 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1117 Complete." "Logical Log 1117 Complete."
18:07:14 Logical Log 1118 Complete.
18:07:15 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1118 Complete." "Logical Log 1118 Complete."
18:08:58 Logical Log 1119 Complete.
18:08:59 Process exited with return code 163: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 1119 Complete." "Logical Log 1119 Complete."
18:15:29 Checkpoint Completed: duration was 359 seconds.
18:15:29 Checkpoint loguniq 1120, logpos 0x3cb328
Some onstat -g iov output from when the checkpoint started and while it was running:
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:04:46 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps:
class/vp s   io/s totalops  dskread dskwrite dskcopy  wakeups io/wup errors
msc    0 i    0.2       48        0        0       0       49    1.0      0
aio    0 s  292.2    83568    81777     1732       0    82030    1.0      0
aio    1 s  118.1    33790    33533      255       0    34494    1.0      0
aio    2 i    5.3     1523     1337      184       0     2039    0.7      0
aio    3 s    1.3      370      207      161       0      320    1.2      0
aio    4 i    0.9      248      113      133       0      208    1.2      0
aio    5 i    0.5      148       28      118       0      102    1.5      0
aio    6 i    0.5      145       22      122       0       94    1.5      0
aio    7 i    0.4      127       18      108       0       85    1.5      0
aio    8 i    0.5      132       22      110       0       76    1.7      0
aio    9 i    0.4      120       22       98       0       80    1.5      0
aio   10 i    0.4      106       14       92       0       71    1.5      0
aio   11 i    0.4      108       12       96       0       74    1.5      0
aio   12 i    0.4      101       15       86       0       68    1.5      0
aio   13 s    0.4      103       21       82       0       64    1.6      0
aio   14 s    0.3       92        7       85       0       61    1.5      0
aio   15 s    0.3       83        5       78       0       60    1.4      0
pio    0 i    3.0      856        0      856       0      857    1.0      0
lio    0 i    2.9      824        0      824       0      825    1.0      0
/usr/informix>onstat -g iov
IBM Informix Dynamic Server Version 7.31.UD8 -- On-Line (CKPT REQ) -- Up 00:05:18 -- 501872 Kbytes Blocked:CKPT
AIO I/O vps:
class/vp s   io/s totalops  dskread dskwrite dskcopy  wakeups io/wup errors
msc    0 i    0.2       48        0        0       0       49    1.0      0
aio    0 i  273.6    86992    85127     1806       0    85401    1.0      0
aio    1 s  108.9    34627    34232      393       0    35314    1.0      0
aio    2 s    5.5     1763     1438      323       0     2271    0.8      0
aio    3 i    1.9      614      314      298       0      558    1.1      0
aio    4 i    1.2      390      125      263       0      350    1.1      0
aio    5 i    1.0      306       56      248       0      249    1.2      0
aio    6 i    0.9      287       40      246       0      236    1.2      0
aio    7 i    0.8      248       20      227       0      209    1.2      0
aio    8 s    0.8      245       28      217       0      188    1.3      0
aio    9 i    0.7      215       24      191       0      177    1.2      0
aio   10 s    0.6      198       31      167       0      164    1.2      0
aio   11 i    0.5      172       15      157       0      138    1.2      0
aio   12 i    0.5      151       15      136       0      120    1.3      0
aio   13 i    0.5      151       21      130       0      112    1.3      0
aio   14 i    0.4      135        7      128       0      104    1.3      0
aio   15 i    0.4      127        8      119       0      100    1.3      0
pio    0 i    2.7      856        0      856       0      857    1.0      0
lio    0 i    2.6      824        0      824       0      825    1.0      0
I will try to arrange a system reboot to change the kernel parameters for the file system cache. Maybe it will help. Any other ideas?
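To see how unevenly the work is spread over the AIO VPs in onstat -g iov output like the above, a small sketch along these lines could be used. It assumes every row carries the eleven columns named in the header (class, vp, s, io/s, totalops, dskread, dskwrite, dskcopy, wakeups, io/wup, errors); parse_iov_rows and the four-row sample (taken from the first snapshot) are illustrative only.

```python
from typing import NamedTuple


class IovRow(NamedTuple):
    vp_class: str
    vp: int
    io_per_s: float
    totalops: int
    dskread: int
    dskwrite: int
    io_per_wakeup: float


def parse_iov_rows(text):
    """Parse whitespace-separated onstat -g iov rows, 11 tokens per row."""
    tokens = text.split()
    rows = []
    for i in range(0, len(tokens) - 10, 11):
        c = tokens[i:i + 11]
        rows.append(IovRow(c[0], int(c[1]), float(c[3]), int(c[4]),
                           int(c[5]), int(c[6]), float(c[9])))
    return rows


# First four aio rows of the first snapshot (Up 00:04:46)
SAMPLE = """
aio 0 s 292.2 83568 81777 1732 0 82030 1.0 0
aio 1 s 118.1 33790 33533 255 0 34494 1.0 0
aio 2 i 5.3 1523 1337 184 0 2039 0.7 0
aio 3 s 1.3 370 207 161 0 320 1.2 0
"""

rows = parse_iov_rows(SAMPLE)
total_ops = sum(r.totalops for r in rows)
for r in rows:
    share = 100.0 * r.totalops / total_ops
    print(f"{r.vp_class} {r.vp}: {share:5.1f}% of ops, "
          f"{r.dskread} reads, {r.dskwrite} writes")
```

Feeding in all the rows of a snapshot works the same way, because the parser only relies on the token count per row.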
TBP - 21 Nov 2007 20:23 GMT
> <Previous post SNIPPED>
>> Hi Art,
[quoted text clipped - 24 lines]
> next restart. If there are any AIO VPs showing io/wup == 0.0 you can
> reduce NUMAIOVPS to eliminate them.
<SNIP>
Well, I would ... err ... disagree.
Without NUMAIOVPS set, the server will allocate 2 AIO VPs per chunk, and with this instance using files where only 5 chunks are actually doing any work, too many AIO VPs can actually cause a bottleneck.
I think an onstat -g ioa would be interesting along with an onstat -g glo.
Myself (just to be provocative) I would set CLEANERS to 8 and NUMAIOVPS to 4 :D (Awaiting flame :o) )

And I would agree with superboer .... check the DBC_MAX_PCT values, and shut them right down (well, keep it to represent about 128 to 256 MB).
Apostrof - 22 Nov 2007 07:29 GMT
> > <Previous post SNIPPED>
> >> Hi Art,
[quoted text clipped - 40 lines]
> And I would agree with superboer .... check the DBC_MAX_PCT values, and
> shut them right down (well, keep it to represent about 128 to 256 Mb).

Yesterday I changed NUMAIOVPS to 16 and ran a test. During the load I watched the io/wup values in the onstat -g iov output. A lot of the values were over 1, so I increased NUMAIOVPS to 24 and ran a new test. Most of the time the io/wup values were below 1; I rarely saw a maximum value of 1.2. But there was no decrease in checkpoint durations. I checked the kernel parameters DBC_MAX_PCT and DBC_MIN_PCT; our settings are 10 and 5. This type of change needs a reboot, so I gave up for now (maybe I can try it in 1 or 2 days' time).

Today I will test these scenarios:
moving the physical log to a new dbspace located on a separate disk (logical volume)
setting CLEANERS=8 (any suggestions for NUMAIOVPS? leave it unset, or 16, or 24?)
setting BUFFERS=300000
Neil Truby - 22 Nov 2007 08:38 GMT
>> > <Previous post SNIPPED>

> yesterday i have changed the NUMAIOVPS to 16 and made a test. during
> the load i looked at the onstat -g iov output for io/wup values.
[quoted text clipped - 13 lines]
> 16 or 24? )
> setting BUFFERS=300000

At checkpoint time, what does sar -d show about the disk service times? E.g.:

sar -d 5 20
stefan@weideneder.de - 22 Nov 2007 17:29 GMT
> hi,
> i have some problem with the long running checkpoints in IDS 7.31.UD8
[quoted text clipped - 55 lines]
>
> Abdullah

Hi Abdullah,

have you ever tried to find out your I/O speed? You might compile the following C code and run the test program.
cat <<eof > prog.c
/******* save to prog.c ****************/
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
    int i;
    int fd = -1;
    char buffer_vc[4096 * 2];
    char *ptr_pc;

    /* advance to the first 1024-byte aligned address inside the buffer */
    for (ptr_pc = buffer_vc; ((unsigned long)ptr_pc % 1024); ptr_pc++)
        ;

    memset(ptr_pc, 'A', 2048);
    /* O_SYNC: each write returns only after the data has been written out */
    fd = open("cifx_204", O_WRONLY | O_SYNC);

    if (fd == -1) {
        perror("Check existence of file cifx_204! touch cifx_204");
        exit(1);
    }
    /* 10000 synchronous writes of 2048 bytes each, about 20 MB in total */
    for (i = 0; i < 10000; i++)
        write(fd, ptr_pc, 2048);
    close(fd);
    return 0;
}
/*********** end of code ****************/
eof
cc -o prog prog.c
touch cifx_204
time ./prog
I'm just interested in how long it will take to write approx. 20 MB.
Best regards
Stefan
TBP (The Big Potato) - 22 Nov 2007 20:49 GMT
<SNIP

> Hi Abdullah,
> [quoted text clipped - 39 lines]
>
> Stefan

Or, even check the read speed from the actual chunks:
timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=20240
and
timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=10240
The above read roughly 40 MB and 20 MB respectively (bit late for me, so even basic maths is tricky), and you should be getting at least 4 MB a second, so they should take about 10 and 5 seconds or less.

As an aside, you only have the one temp dbspace, which appears pretty active; create another one or two and add them to DBSPACETEMP in the $ONCONFIG.
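Working the dd sizes out explicitly, as a minimal sketch: it assumes bs=2k means 2048-byte blocks and reads the "4 Mb a second" floor as 4 MiB/s; the helper names are invented for the example.

```python
MIB = 1024 * 1024
BLOCK_BYTES = 2 * 1024  # dd bs=2k


def dd_size_mib(count, block_bytes=BLOCK_BYTES):
    """Amount of data one dd invocation reads, in MiB."""
    return count * block_bytes / MIB


def max_seconds(size_mib, min_rate_mib_s=4.0):
    """Longest acceptable run time at the given minimum read rate."""
    return size_mib / min_rate_mib_s


for count in (20240, 10240):
    size = dd_size_mib(count)
    print(f"count={count}: {size:.1f} MiB, "
          f"at most {max_seconds(size):.1f} s at 4 MiB/s")
```

So count=10240 is exactly 20 MiB, while count=20240 is close to 40 MiB and would be allowed about twice as long at the same rate.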
david@smooth1.co.uk - 22 Nov 2007 22:17 GMT
> <SNIP
> [quoted text clipped - 57 lines]
>
> - Show quoted text -

1. As per http://www-1.ibm.com/support/docview.wss?uid=swg21250366 set TRACECKPT=1 before starting the engine. What does that give?
2. What does onstat -g ioq give?
Apostrof - 23 Nov 2007 06:30 GMT
On Nov 23, 12:17 am, "da...@smooth1.co.uk" <da...@smooth1.co.uk> wrote:

> > <SNIP
> [quoted text clipped - 63 lines]
>
> 2. What does onstat -g ioq give?

I ran the prog.c program in the chunks directory. Results:

real 2:06.7
user 0.0
sys 0.0

Output for timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=20240:

real 0.85
user 0.03
sys 0.33

Output for timex dd if=/data1/db/datachunk1 of=/dev/null bs=2k count=10240:

real 0.14
user 0.02
sys 0.12
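The timings above can be turned into rates with a short sketch like this one. It assumes the prog.c run did its full 10000 writes of 2048 bytes and that dd used 2048-byte blocks; parse_real and rate_mib_s are hypothetical helper names.

```python
MIB = 1024 * 1024


def parse_real(value):
    """Convert a `real` time such as '2:06.7' or '0.85' to seconds."""
    minutes, _, seconds = value.rpartition(":")
    return (int(minutes) if minutes else 0) * 60 + float(seconds)


def rate_mib_s(total_bytes, real):
    """Average transfer rate in MiB/s for a run that took `real`."""
    return total_bytes / MIB / parse_real(real)


# prog.c: 10000 synchronous (O_SYNC) writes of 2048 bytes each
write_secs = parse_real("2:06.7")
print(f"sync writes: {10000 / write_secs:.1f} writes/s, "
      f"{write_secs / 10000 * 1000:.2f} ms per write, "
      f"{rate_mib_s(10000 * 2048, '2:06.7'):.2f} MiB/s")

# dd reads with bs=2k
for count, real in ((20240, "0.85"), (10240, "0.14")):
    print(f"dd count={count}: {rate_mib_s(count * 2048, real):.1f} MiB/s")
```

On these numbers the synchronous writes come out at roughly 79 per second (about 12.7 ms each, around 0.15 MiB/s), while the reads are far above the 4 MB/s floor. The reads may well have been served at least partly from the file system cache, so they probably say little about the disks themselves.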
RoB - 23 Nov 2007 13:41 GMT
> ...
>every day a batch utility making hig volume of insert and updates. and
>this batch works about 5-10 minutes(checkpoint times included).
>during this batch load one or two (long) checkpoints occur.
>(unfortunately blocking checkpoints)
> ...

Just a thought. Have you thought about disabling/dropping any indexes before the load and then enabling/recreating them after the load? If this could be done (perhaps concurrency issues won't let you) your load would finish quicker and your indexes would be more compact and efficient. During the load it would also reduce the number of logical log records generated, pages needed to be read in from disk and pages needed to be written to disk.
RoB
Thomas J. Girsch - 23 Nov 2007 17:01 GMT
I'm just curious: What's the row size of the table that's getting all the inserts? And how does that compare to your system's page size?

> hi,
> i have some problem with the long running checkpoints in IDS 7.31.UD8
[quoted text clipped - 55 lines]
>
> Abdullah