Database Forum / Informix Topics / December 2008
Checkpoint durations
Neil Truby - 28 Nov 2008 19:45 GMT
IDS 10.0FC8W2 on HP-UX 11.31
I'm tuning IDS on quite a large system (18 cpus, 65g memory, Tier One SAN).
I have BUFFERS set to 10,000,000 (ie 20g), and lru_max_dirty to 0.5% (ie 50,000 pages or 100m). lru_min_dirty is effectively zero.
At times I'm seeing checkpoints of 5 to 8 seconds. According to the checkpoint tracer nearly all of this is waiting on disk. And my question is: on a large OLTP system, does this seem reasonable throughput, a maximum of 100m in 5-8 seconds?
Obviously I can reduce lru_max_dirty further, but I'm just interested to know if this 12-20MByte/s rate is comparable to others' experience.
Checkpoint statistics from other large users would be most welcome.
thx Neil
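For readers following the arithmetic in this post, here is a minimal sketch of how the quoted figures fit together. It assumes 2 KB pages (implied by 10,000,000 buffers being described as 20g) and reads "100m" as roughly 100 decimal megabytes; the function names are made up for illustration and are not Informix utilities.

```python
PAGE_KB = 2  # assumption: 2 KB pages, since 10,000,000 buffers are described as 20g

def dirty_page_limit(buffers, lru_max_dirty_pct):
    """Pages allowed to be dirty when lru_max_dirty is a percentage of BUFFERS."""
    return int(buffers * lru_max_dirty_pct / 100)

def flush_rate_mb_s(pages, seconds, page_kb=PAGE_KB):
    """Average write rate in MB/s (decimal MB, matching the post's '100m')."""
    return pages * page_kb / 1000 / seconds

pages = dirty_page_limit(10_000_000, 0.5)
for secs in (5, 8):
    print(f"{pages} pages in {secs}s = {flush_rate_mb_s(pages, secs):.1f} MB/s")
```

With these assumptions the limit is 50,000 pages, and flushing them in 5 to 8 seconds works out to between 20 and 12.5 MB/s, which is the 12-20 MByte/s range Neil quotes.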
Fernando Nunes - 28 Nov 2008 23:17 GMT
> IDS 10.0FC8W2 on HP-UX 11.31
> [quoted text clipped - 15 lines]
> thx
> Neil
Neil,
Are you sure you don't have more than 100MB to write? Are you basing this number on observations or just on LRU_MAX_DIRTY? You should also check the cleaners' work during the checkpoint. Are they all reasonably busy, or are you left with one or two for the last seconds?
 Signature Fernando Nunes Portugal
http://informix-technology.blogspot.com My email works... but I don't check it frequently...
Neil Truby - 28 Nov 2008 23:43 GMT
>> IDS 10.0FC8W2 on HP-UX 11.31
>> [quoted text clipped - 24 lines]
> You should also check the cleaners work during checkpoint. Are they all
> reasonably busy, or are you left with one or two for the last seconds?
Yes, I am sure. I have a script that monitors the number of dirty pages etc, and see it change in real time. Also, checkpoint tracing gives you the number of pages written. Not too sure how to check the cleaners; any advice?
Can you answer my original question from experience: do the checkpoint durations I cite above look good, or look long to you?
cheers Neil
Fernando Nunes - 29 Nov 2008 01:02 GMT
>>> IDS 10.0FC8W2 on HP-UX 11.31
>>> [quoted text clipped - 36 lines]
> cheers
> Neil
onstat -F during checkpoint will show cleaner activity. Column "data" is the chunk number... If the I/O load is not balanced you'll probably see a "burst" of activity on all of them when the checkpoint starts, then possibly most of them will go back to sleep and one or two may be there longer. This can also happen if for some weird reason the chunk is on "slow disk" (hard to understand on modern systems...)
I can't give you any numbers at the moment... and they wouldn't be comparable to your system. The higher-load system I work on daily has a much smaller number of buffers, very old hardware and HDR... And I'm not tracing the checkpoints (I can't even remember the max_dirty/min_dirty :) ). I could risk something like 15K to 17K buffers in about 3-4 seconds... that would be 30-34MB in about half the time... If these numbers are correct maybe your numbers are not very good, but I need to check these numbers.
At first glance I wouldn't be shocked by your numbers... It has to order the buffers, and write them. Maybe you can test your disk throughput with dd? It's a completely different situation. It should be clearly higher than what you get at checkpoints, but maybe it can give you an idea...
Let's wait for more...
P.S.: I don't need to remind *you* but for others, IDS 11 would be nice for that system...
Regards,
 Signature Fernando Nunes Portugal
http://informix-technology.blogspot.com My email works... but I don't check it frequently...
Neil Truby - 29 Nov 2008 09:13 GMT
> At first glance I wouldn't be shocked by your numbers... It has to order
> the buffers, and write them. Maybe you can test your disk throughput with
> dd? It's a completely different situation. It should be clearly higher
> than what you get at checkpoints, but maybe it can give you an idea...
Yes, I have tried dd. The results are somewhat variable, and therefore inconclusive, hence the question about checkpoint times on other users' large systems.
Obnoxio The Clown - 29 Nov 2008 09:45 GMT
> Yes, I have tried dd. the results are somewhat variable, and therefore
> inconclusive, hence the question about checkpoint times on other users'
> large systems.
Can you please post them up so that people can understand what you mean by "somewhat variable"?
 Signature Cheers, Obnoxio The Clown
http://obotheclown.blogspot.com
david@smooth1.co.uk - 29 Nov 2008 17:25 GMT
> IDS 10.0FC8W2 on HP-UX 11.31
> [quoted text clipped - 15 lines]
> thx
> Neil
Well, a certain customer with an IBM p690 with 32 cpus was treating 5-11 second checkpoints as normal, but that was IDS 9.40.
Another Informix customer had an HP-UX Superdome with 32 cpus and a fully loaded XP12000 array capable of 500,000 io/s. They created a 60GB chunk then tried to load 100 million rows via lots of parallel loads. This was IDS 9.40 with the checkpoint interval set to 15 minutes. As this was all to one chunk the checkpoints were single threaded and taking >15 minutes per checkpoint!! Once they partitioned across lots of 2GB chunks, checkpoints started using lots of cpus and dropped to 30 seconds to 1 minute.
It depends on how much there is to flush, how many chunks the dirty pages are spread across, and disk performance.
Run onstat -u and look for entries with the flags column ending in F, to see how the writes are balanced across page flushers.
Try running iostat -x 4 or whatever the HP-UX equivalent is to see disk i/o times per device. Possibly HP-UX has something equivalent to vxstat that can see i/o performance per volume.
What array is behind the SAN? Something like an XP12000 has lots of stats/settings that can be monitored/tuned by the HP guys.
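The single-chunk story above can be pictured with a toy model. This is only a sketch under loud assumptions: one flusher thread per chunk, every thread writing at the same fixed rate, and no shared bottleneck in the SAN. The function name and the 15 MB/s figure are hypothetical, chosen only to show the shape of the effect.

```python
def checkpoint_seconds(dirty_mb_per_chunk, mb_per_sec_per_thread):
    """Toy model: each chunk is flushed by its own thread in parallel,
    so the checkpoint lasts as long as the busiest chunk takes."""
    return max(dirty_mb_per_chunk) / mb_per_sec_per_thread

# Hypothetical numbers: 100 MB of dirty pages, 15 MB/s per flusher thread.
one_chunk = checkpoint_seconds([100.0], 15.0)        # everything in one chunk
ten_chunks = checkpoint_seconds([10.0] * 10, 15.0)   # spread evenly over ten

print(f"one chunk:  {one_chunk:.1f}s")
print(f"ten chunks: {ten_chunks:.1f}s")
```

In a real system the threads share disks, ports and cache, so the gain from more chunks will be smaller than this model suggests; the point is only that the busiest chunk sets the floor.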
Neil Truby - 29 Nov 2008 20:38 GMT
On 28 Nov, 19:45, "Neil Truby" <neil.tr...@ardenta.com> wrote:
> IDS 10.0FC8W2 on HP-UX 11.31
What I really need, David, is someone who works for a huge company with lots of big tin - er, exactly like you as it happens! ;-) - to tell me what throughput *you* get on checkpoints etc, so I can know for sure if this customer's is bad.
>> It depends on how much to flush and also how many chunks the dirty pages are spread across and disk performance.
In this test case there is a root, physlog, llog, tempdbs and appdbs, each with one chunk.
>> Try running iostat -x 4 or whatever the HP-UX equivalent is to see disk i/o times per device. Possibly HP-UX has something equivalent to vxstat that can see i/o performance per volume.
Service time on the "disks" seems brilliant, a millisecond or two at most.
>> What array is behind the SAN, something like an XP12000 has lots of stats/settings that can be monitored/tuned by the HP guys.
An EMC Symmetrix. The Daddy of them All!
OK, to address Obnoxio's question, a dd of 1g in 2k pages gives:
# time dd if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out
real 4:35.5
user 0.4
sys 21.1
This is much worse than a small VM at Ardenta attached to an EMC Clariion, and worse even than a tiny Itanium at Ardenta copying within its own single internal disk.
But the customer has had HP and EMC check performance thoroughly, and has been given a clean bill of health. I have an IBM PMR open; they suspect a disk problem.
So I'm seeking some comparative data from others on checkpoint times, or even dd rates to try to verify that there is actually no problem.
Ian Goddard - 29 Nov 2008 22:01 GMT
> On 28 Nov, 19:45, "Neil Truby" <neil.tr...@ardenta.com> wrote:
>> IDS 10.0FC8W2 on HP-UX 11.31
[quoted text clipped - 41 lines]
> So I'm seeking some comparative data from others on checkpoint times, or
> even dd rates to try to verify that there is actually no problem.
Maybe you should repeat that writing to /dev/null in order to check what proportion of this is read time.
 Signature Ian
Hotmail is for spammers. Real mail address is igoddard at nildram co uk
Neil Truby - 29 Nov 2008 23:26 GMT
> Maybe you should repeat that writing to /dev/null in order to check what
> proportion of this is read time.
Yes, maybe. What I'd really like is someone to answer my original questions about their experience of checkpoint throughputs and dd times ... ;-)
Fernando Nunes - 30 Nov 2008 00:45 GMT
>> Maybe you should repeat that writing to /dev/null in order to check
>> what proportion of this is read time.
>
> Yes, maybe. What I'd really like is someone to answer my original
> questions about their experience of checkpoint throughputs and dd times
> ... ;-)
Don't forget "onstat -F" during checkpoint...
Regards,
 Signature Fernando Nunes Portugal
http://informix-technology.blogspot.com My email works... but I don't check it frequently...
Ian Goddard - 30 Nov 2008 12:10 GMT
>> Maybe you should repeat that writing to /dev/null in order to check
>> what proportion of this is read time.
>
> Yes, maybe. What I'd really like is someone to answer my original
> questions about their experience of checkpoint throughputs and dd times
> ... ;-)
But in regard to dd times you've already answered your question - it's piss-poor compared to small-iron.
 Signature Ian
Hotmail is for spammers. Real mail address is igoddard at nildram co uk
Neil Truby - 29 Nov 2008 23:47 GMT
> Maybe you should repeat that writing to /dev/null in order to check what
> proportion of this is read time.
What does this tell me then?
# time dd if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out
real 4:24.8
user 0.4
sys 21.8
# time dd if=/dev/vg00/lvol7 of=/dev/null bs=2k count=500000
500000+0 records in
500000+0 records out
real 2:07.9
user 0.4
sys 12.3
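To put those two runs into MB/s terms, here is a small sketch that converts the `real` times into throughput. It assumes bs=2k means 2048-byte blocks and that dd alternates reads and writes, so the copy time is roughly read time plus write time; the helper names are made up for illustration.

```python
def to_seconds(clock):
    """Convert a 'M:SS.s' elapsed time, as printed by time, into seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)

def mib_per_sec(records, block_bytes, elapsed_s):
    """Throughput in MiB/s for `records` blocks of `block_bytes` bytes."""
    return records * block_bytes / (1024 * 1024) / elapsed_s

copy_s = to_seconds("4:24.8")   # read from lvol7, write to rneil_1
read_s = to_seconds("2:07.9")   # read from lvol7, discard to /dev/null
write_s = copy_s - read_s       # rough estimate of the write share only

print(f"copy : {mib_per_sec(500000, 2048, copy_s):.2f} MiB/s")
print(f"read : {mib_per_sec(500000, 2048, read_s):.2f} MiB/s")
print(f"write: {mib_per_sec(500000, 2048, write_s):.2f} MiB/s (estimated)")
```

On these figures roughly half of the copy time is spent reading, and neither side gets beyond single-digit MiB/s for a single sequential 2k stream.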
Art Kagel - 30 Nov 2008 02:20 GMT
It's telling you that it takes about 3.2 seconds to write 500,000 Informix pages to disk on your SAN, taking half of the /dev/null time as reading time and half as writing time. If your checkpoints are 5-8 seconds then the IO speed is the biggest part of the bottleneck, which is what IDS is telling you - no matter what EMC says. It may be the network connections to the SAN, the SAN itself, or the disk array configuration (can you say NO RAID5????).
Another possibility is the dbspace setup. At chunk write time (ie during a checkpoint) IDS writes out each chunk's dirty pages with a single IO thread. If you are pushing all of that changed data to a small number of very large chunks you are not taking very good advantage of IDS's parallelism.
Similarly, are you using RAW chunks or COOKED? If COOKED, two points: 1) do you have enough AIO VPs configured (you should have at least 1-1.5 AIO VPs per chunk), 2) know that on HP-UX enabling and properly tuning KAIO and using RAW chunks will make a significant difference in IO throughput (>25%).
Finally, make sure that the SAs and EMC have not configured the SAN with a RAID5 array configuration with a very large block size. Databases, especially but not only Informix, do MUCH better with the performance of RAID10 than RAID5 (not to mention that RAID5 is NOT SAFE AT ANY SPEED - see the papers at www.baarf.com) and with small block sizes (16K - 64K at most). EMC is in love with 256K blocks which work well for filesystems but are poor for database style IO.
Art
 Signature Art S. Kagel Oninit (www.oninit.com) IIUG Board of Directors (art@iiug.org)
Disclaimer: Please keep in mind that my own opinions are my own opinions and do not reflect on my employer, Oninit, the IIUG, nor any other organization with which I am associated either explicitly or implicitly. Neither do those opinions reflect those of other individuals affiliated with any entity with which I am affiliated nor those of the entities themselves.
InDeep - 30 Nov 2008 04:10 GMT
> It's telling you that it takes about 3.2 seconds to write 500,000
> Informix pages to disk on your SAN Taking half of the dev/null time as
[quoted text clipped - 20 lines]
>
> Art
In addition to what Art said...
Check with your SAN Admin to see how many servers and disks are on the same ports that these drives are on. On my last project we found 200, yes 200 servers on one port. The previous SAN Admin had EMC training and still allowed this to happen. If too many systems are on one fibre channel no amount of tuning of your software is going to matter. A good SAN Admin is going to distribute the load across all the available ports and channels to make sure your system gets the right throughput. Hint: You can get this information with powermt commands. Ask your system administrator or SAN Admin to run reports on I/Os and get a powermt display of your ports. You should see multiple ports--at least two--for each disk. If you don't then you should ask why. This alone has a major impact on performance. EMC can get you those I/O reports as well, all their people are trained to do this.
Regarding EMC RAID5, EMC loves their RAID5, so it's most likely already set up this way. I'd be very surprised if it isn't. EMC loves to sell you on their RAID5 being 'superior' to conventional RAID5. The reason the customer uses it is most likely because they cry when they figure out how expensive RAID1-0 is going to be on Symmetrix, so they fall back to RAID5 and move on. Flash drives are coming, hide your wallet!
-ID-
Neil Truby - 30 Nov 2008 11:07 GMT
>> It's telling you that it takes about 3.2 seconds to write 500,000
>> Informix pages to disk on your SAN Taking half of the dev/null time as
[quoted text clipped - 18 lines]
>> (16K - 64K at most). EMC is in love with 256K blocks which work well for
>> filesystems but are poor for database style IO.
HP and EMC have told the customer that their parts of the system are performing well. What I would really like to see, if anyone could provide it, is some checkpoint write rates/dd run times on other Big Tin to assess whether the rates we are seeing here are reasonable. If they aren't I could perhaps press the issue. Otherwise I can't. Sending me tuning tips is great, and would be the next valuable step. But firstly I would really value some statistics from others so that I can assess if I even have a problem or not.
thanks Neil
Obnoxio The Clown - 30 Nov 2008 17:15 GMT
> What I would really like to see, if anyone could provide
> it, is some checkpoint write rates/dd run times on other Big Tin to assess
> whether the rates we are seeing here are reasonable.
But if your "crappy virtual machine" has faster dd times, what more do you need?
 Signature Cheers, Obnoxio The Clown
http://obotheclown.blogspot.com
InDeep - 30 Nov 2008 18:47 GMT
>> What I would really like to see, if anyone could provide it, is some
>> checkpoint write rates/dd run times on other Big Tin to assess whether
>> the rates we are seeing here are reasonable.
>
> But if your "crappy virtual machine" has faster dd times, what more do
> you need?
Good point.
There should be two metrics in play. Your disk speed is indicated by the disk drives' specs. The difference is what you're getting from the SAN and the software. You can't go faster than the drive itself, so everything from that point forward is your performance penalty. Like I said before, EMC can give you I/O reports, their people know how to get that for you, so you know what your raw I/O is. The speed of the database, and the OS, to read/write are next. You the DBA can get your stats, and really, it doesn't matter what you see from other sites: the difference between what the disk drive is capable of and what you are getting is all you need to worry about.
I'd be willing to bet you're probably going to be faster than others simply because you care. Most of the shops I work in don't think about it; they assume the SAN is set up with the right multi-channel architecture, and that the SAN people, yes, even EMC's own people, know what they are doing. Most of them I've seen I can't give the highest marks, but they will do what you need if you know what to ask--and they can ask their own people if they don't know what to do. EMC likes to sell you the goods, but they expect you to go to their training, and learn how to use it. If you buy it and don't know what to ask, in most cases they do not offer much unless you pay for their professional services.
-ID-
Neil Truby - 30 Nov 2008 19:29 GMT
>> What I would really like to see, if anyone could provide it, is some
>> checkpoint write rates/dd run times on other Big Tin to assess whether
>> the rates we are seeing here are reasonable.
>
> But if your "crappy virtual machine" has faster dd times, what more do you
> need?
Well my crappy virtual machine is attached to a Clariion which is likely doing nothing else, whereas (as someone else suggested) the customer's DMX may be busy servicing many other calls. But more generally the message back was that a 2k dd was a simplistic and unrealistic test.
Neil Truby - 30 Nov 2008 21:13 GMT (In response to a suggestion from Art):
Well yes, 8k is ***far*** superior to 2k. But what does this tell me, and how can it be more representative of what Informix does ... which is write 2k pages, isn't it?
# time dd if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out
real 4m31.09s
user 0m0.46s
sys 0m21.65s
# time dd if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=8k count=125000
125000+0 records in
125000+0 records out
real 1m19.58s
user 0m0.11s
sys 0m5.21s
# time dd if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=8k count=125000
125000+0 records in
125000+0 records out
real 1m15.62s
user 0m0.12s
sys 0m5.91s
cheers N
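As a rough comparison of the three runs above, the sketch below converts each `real` time into MiB/s. It assumes bs=2k and bs=8k mean 2 KiB and 8 KiB blocks, so all three runs move the same amount of data; the function name is made up for illustration.

```python
def mib_per_sec(records, block_kib, minutes, seconds):
    """Throughput in MiB/s for a dd run of `records` blocks of `block_kib` KiB."""
    return records * block_kib / 1024 / (minutes * 60 + seconds)

runs = {
    "bs=2k":        mib_per_sec(500000, 2, 4, 31.09),
    "bs=8k, run 1": mib_per_sec(125000, 8, 1, 19.58),
    "bs=8k, run 2": mib_per_sec(125000, 8, 1, 15.62),
}

for label, rate in runs.items():
    print(f"{label:13s} {rate:6.2f} MiB/s")

# Same data volume in every run, so the speedup is just the ratio of the times.
speedup = (4 * 60 + 31.09) / (1 * 60 + 19.58)
print(f"8k run 1 is {speedup:.1f}x faster than the 2k run")
```

The same gigabyte moves about three and a half times faster in 8k blocks, which suggests the 2k figure is bounded more by the number of requests than by raw bandwidth.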
Richard Kofler - 01 Dec 2008 09:00 GMT
Neil Truby wrote:
> (In response to a suggestion from Art):
> [quoted text clipped - 28 lines]
> cheers
> N
Hi Neil,
Testing using dd is not an easy thing. You must consider: if your host has a lot of memory and is running LINUX or SOLARIS you will see that if the size of your sequential file fits into memory, every repetition of your test will be I/O-less, i.e. very fast.
You must destroy your cache inside the host memory, and you can do this by shrinking memory by allocating it to another process, which eats memory. I use an IDS instance to do this and I set RESIDENT to 1. Or you can read another file huge enough to be sure that it destroys the cache left by your dd.
Also you should use more than 1 dd in parallel. Best of course if you already know how many cleaners are running in parallel during a checkpoint. During a checkpoint you will never see all cleaners starting to run in parallel and ending at the same time, but generally you will have many cleaners active for the first 75% of your waiting time, and only 1 will run at the very end. It is impossible to map this behavior to dd, but you can simulate the beginning, or the 'average' behavior. Note that starting dd in parallel will flood
- your input side of dd (disk reads)
- your output side of dd (writes to the I/O-subsystem)
- cache(s) in your I/O-subsystem
- the interconnecting network
So if you read and write from the same I/O-subsystem (which is typical) you must add input and output MB/sec and IOPS. Many FC switches are able to monitor and display their throughput per channel, and there you can see the network load.
EMC can monitor cache rates and I/O rates per host. You should ask for these figures during your test.
A good test on large iron will use a 15-20GB file size and start 4, 8 and 12 dds in parallel. This will give you an impression of when you hit the ceiling, and what resource is limiting your speed.
dic_k
 Signature Richard Kofler SOLID STATE EDV Dienstleistungen GmbH Vienna/Austria/Europe
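One way to set up the parallel test described in this post is sketched below. It only builds the dd command lines, one per stream, each covering its own slice of the source via `skip` (and `seek` when a real target is given). The function name, the 16 GiB size and the use of /dev/null as the default target are assumptions for illustration; pointing `dst` at a real device overwrites whatever is on it.

```python
def parallel_dd_commands(src, streams, total_blocks, dst="/dev/null", bs="8k"):
    """Build one dd command per stream, each reading a distinct slice of src.

    Blocks are counted in units of `bs`. With a real target device, each
    stream also seeks to the matching offset so the slices do not overlap.
    """
    per_stream = total_blocks // streams
    commands = []
    for i in range(streams):
        offset = i * per_stream
        cmd = ["dd", f"if={src}", f"of={dst}", f"bs={bs}",
               f"count={per_stream}", f"skip={offset}"]
        if dst != "/dev/null":
            cmd.append(f"seek={offset}")   # write slices side by side
        commands.append(cmd)
    return commands

# 16 GiB in 8 KiB blocks, split over 4 readers (all numbers are assumptions).
cmds = parallel_dd_commands("/dev/vg00/lvol7", 4, 16 * 1024 * 1024 // 8)
for cmd in cmds:
    print(" ".join(cmd))
```

To actually run the streams together, each list could be handed to `subprocess.Popen` and then waited on; repeating that with 4, 8 and 12 streams gives the scaling curve Richard describes.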
Ian Michael Gumby - 02 Dec 2008 03:24 GMT
> You must destroy your cache inside the host memory, and you can do
> this by shrinking memory by allocating it to another process, which
> eats memory. I use an IDS instance to do this and I set RESIDENT to 1.
> Or you can read another huge file big enough to be sure, that it
> destroys your cache from your dd.
Slightly off topic, the whole "memory resident" thing isn't a good idea.
The short reason... Your Unix Kernel usually tunes itself prior to the start up of IDS during the boot process. So you're tying up memory that the system anticipates as being available. The other part of the argument is that if IDS is being utilized it will be in memory so you don't need to set the memory resident flag.
(When was the last time you set your kernel's parameters manually?)
-G
InDeep - 02 Dec 2008 04:40 GMT
>> You must destroy your cache inside the host memory, and you can do
>> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 14 lines]
>
> -G
Gumpy,
1. Type this command on your linux command-line:
/sbin/sysctl -a
2. Pull foot out of mouth
Ian Michael Gumby - 02 Dec 2008 16:53 GMT
> >> You must destroy your cache inside the host memory, and you can do
> >> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 22 lines]
>
> 2. Pull foot out of mouth
Again, how often do you tune your kernel?
It used to be that you had to. Then it was limited because some parameters were functions of other parameters. Now most tend to do this automatically.
Of course I'm going back to SCO days, HP-UX and AIX many moons ago. Back when you were doing that porn thing on Sybase. ;-)
InDeep - 03 Dec 2008 03:35 GMT
>>>> You must destroy your cache inside the host memory, and you can do
>>>> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 19 lines]
>
> Again, how often do you tune your kernel?
Every single installation. Every single kernel compile.
> It used to be that you had to. Then it was limited because some
> parameters were functions of other parameters. Now most tend to do
> this automatically.
>
> Of course I'm going back to SCO days, HP-UX and AIX many moons ago.
> Back when you were doing that porn thing on Sybase. ;-)
I'm surprised you don't understand kernel tuning better, having more kernel experience than a lot of people. Oracle installations require certain kernel params, same for Informix.
-ID-
Ian Michael Gumby - 03 Dec 2008 18:27 GMT
I do have more experience tuning kernels. That's why I don't try to tune them unless I have to.
HP-UX starting around version 9 started using formulas when it came to tunable parameters. If you replace the formula with a hard value, you could screw things up with unintended consequences. If you modify the formula with a different formula, you could screw up other variables and again, screw things up with unintended consequences.
Also there came to be this separation of DBA and Sys Admin roles. So you have Logical DBAs, Physical DBAs, Sys Admins, and network Admins who all get involved on the back end. If it's a web based app, toss in your webmaster/app server master too.
Since most machines served dual or multiple purposes, rather than just a dedicated database server, you erred on the side of caution by not tuning the kernel.
In today's machines, when the machines boot, they do a self discovery and they tend to tune themselves. In addition, it takes a bit of time to make the changes and test them. This costs money. Any performance boost you may get is minimal and you end up facing the statement ... "Just add more CPU/Memory/Disk. It's cheaper and lasts longer than the cost of a good consultant." Also, all of the changes you just made get tossed out of whack when they decide to add an additional application on the server.
The point is that you don't muck with the kernel. You're better off tuning the engine to what you have than mucking up the kernel. There are other areas that you can change that will have a greater impact on performance.
To give you an example...
I have a Browning A-Bolt Rem 7mm sitting in storage since I don't get to hunt much anymore. I spent a lot of time and money in ammo to get my rifle sighted in to 200 yds and harmonically balanced to a specific brand of ammo (Hornady 139gr moly coated SSTs). I can get sub MOA shots under good conditions from a bench. I use this ammo primarily for coyotes and feral dogs, but it's also effective on deer.
If I shift to a different ammunition and a heavier bullet, my grouping expands and the point of impact will shift slightly. (You want a heavier slug for larger game and different bullets have different characteristics upon impact). We're talking 1-3" depending on bullet shape, weight, quality of cartridge, and weather conditions. (And of course, I'm estimating the distance too which has the largest impact on performance).
The point is that I could spend the time and money in ammunition to sight my rifle for each different load. But why? In the field, I'm still able to put the target down because the rifle performs well enough. There are other factors which will have a greater impact on performance than trying to tune the barrel harmonics. (Oh and if I switch to the muzzle brake, all bets are off and you have to tune it again, although you'll have a good starting point.)
Your trying to mod the kernel is akin to tuning the barrel harmonics. You can do it, but you'd be better off spending your money elsewhere.
-G
Davorin Kremenjas - 02 Dec 2008 17:14 GMT
> (In response to a suggestion from Art):
>
> Well yes, 8k is ***far*** superior to 2k. But, what does this tell me, and
> how can it more representative of what Informix does ... which is write 2k
> pages, isn't it?
Neil,
I don't think it necessarily does, at least not if you use KAIO. This is output from "onstat -g iob" on my main instance:
IBM Informix Dynamic Server Version 10.00.FC8W2 -- On-Line (Prim) -- Up 5 days 10:07:46 -- 58418704 Kbytes
AIO big buffer usage summary:
class                        reads                                     writes
          pages      ops  pgs/op    holes  hl-ops  hls/op      pages      ops  pgs/op
kio    96831278 46041964    2.10  5094718  851714    5.98    6224761  1581086    3.94
So, almost 4 pages per average IO write request. And this is on an average day, not peak load.
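The pgs/op columns in that output are simply pages divided by operations, which can be checked with a short sketch. The counters below are copied from the post; the dictionary layout and function name are made up for illustration, and the page size is left as a parameter because it depends on the platform.

```python
def pages_per_op(pages, ops):
    """Average number of pages moved per I/O operation."""
    return pages / ops

# (pages, ops) pairs for the kio class, copied from the onstat -g iob output.
kio = {
    "reads":  (96831278, 46041964),
    "holes":  (5094718, 851714),
    "writes": (6224761, 1581086),
}

for name, (pages, ops) in kio.items():
    print(f"{name:6s} {pages_per_op(pages, ops):.2f} pages/op")

def avg_write_kb(page_kb):
    """Average write size in KB for a given page size (page_kb is an assumption)."""
    pages, ops = kio["writes"]
    return pages_per_op(pages, ops) * page_kb
```

Calling `avg_write_kb` with the instance's page size turns the 3.94 pages per write into an average request size, which is the figure that matters when comparing against a dd block size.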
But back to your original question, some numbers finally :)
We had a very similar situation to yours until a few months ago, even worse: we flushed approx 5 MB/sec on our SAN. We did all possible tests: dd (with parallel threads), pfread, table (un)loading, index building and modeling the main business processes/application requests.
Always got 5MB/sec max, no matter what. Until we reconfigured the SAN, pretty much along the lines of Art's suggestions. Now we can flush 80 to 100 MB per second - a factor 20 increase!
Since you're already tracing the checkpoints you have "a proof" that it's dskflush() that is taking the checkpoint time, and not wait4critex().
What we found out is that it's not really a throughput problem in terms of MB/sec, it's the number of IO requests outstanding on physical disks in the SAN. The reason for this was that the LUNs were built using a 4k "segment size" because we also thought Informix only writes page by page (4k in our case, AIX). With such small segment/stripe sizes you get a lot of IOs on a busy day. We changed that to 8 pages, so 32k (we've gathered AIX block size stats and decided 32k is a good starting point and trade-off between optimising for average day and peak time load; also, Informix big buffers should be 8 pages if that's relevant at all). So we slashed the number of IO requests by a factor of 8 (theoretically, but close to truth during the peak load). And voila, using the same tests as before we went from 5 MB/s to 100 MB/s.
HTH
Davorin
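The "factor of 8" in this post follows from simple division, sketched below. It assumes, as a simplification, that every segment-sized piece of a write turns into one physical IO request; real arrays coalesce and cache, so treat the counts as illustrative only. The function name and the 100 MB figure are made up for the example.

```python
def io_requests(total_kb, segment_kb):
    """Requests needed to move total_kb when each request carries one segment.

    Uses ceiling division so a partial final segment still counts as a request.
    """
    return -(-total_kb // segment_kb)

flush_kb = 100 * 1024                    # hypothetical 100 MB flush
small = io_requests(flush_kb, 4)         # 4k segment size
large = io_requests(flush_kb, 32)        # 32k segment size (8 pages of 4k)

print(f"4k segments : {small} requests")
print(f"32k segments: {large} requests")
print(f"reduction   : {small // large}x")
```

Under that assumption the same flush needs 25,600 requests at 4k and 3,200 at 32k, which is the eightfold reduction Davorin describes.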
david@smooth1.co.uk - 02 Dec 2008 18:09 GMT
> (In response to a suggestion from Art):
> [quoted text clipped - 28 lines]
> cheers
> N
Informix does not always use 2k:
a) it can use 16k "big buffers" for i/o
b) it can coalesce writes together at checkpoint time
Obnoxio The Clown - 30 Nov 2008 23:56 GMT
>>> What I would really like to see, if anyone could provide it, is some
>>> checkpoint write rates/dd run times on other Big Tin to assess whether
[quoted text clipped - 7 lines]
> But more generally the message back was that a 2k dd was a simplistic and
> unrealistic test.
Well, they would say that, wouldn't they? However, it's been a pretty standard test that has served me and others well for two decades now.
 Signature Cheers, Obnoxio The Clown
http://obotheclown.blogspot.com
Mark Townsend - 01 Dec 2008 06:43 GMT
>> But more generally the message back was that a 2k dd was a simplistic
>> and unrealistic test.
>
> Well, they would say that, wouldn't they? However, it's been a pretty
> standard test that has served me and others well for two decades now.
FYI - over in O-land we used to hear the "unrealistic test" refrain a lot. So to overcome that we have the Orion tool - see http://www.oracle.com/technology/software/tech/orion/index.html
Not going to be applicable here, as it models Oracle I/O, but something similar should be reasonably easy to do for Informix, and is a very useful tool for customers wishing to make informed choices around storage.
TBP - 01 Dec 2008 10:26 GMT
>>> But more generally the message back was that a 2k dd was a simplistic
>>> and unrealistic test.
[quoted text clipped - 9 lines]
> similar should be reasonably easy to do for Informix, and is a very
> useful tool for customers wishing to make informed choices around storage.
To me, this would seem more applicable than dd tests.
At least it will (hopefully) show up what would be "bad" for Oracle.
Neil Truby - 10 Dec 2008 05:31 GMT
>>> But more generally the message back was that a 2k dd was a simplistic
>>> and unrealistic test.
[quoted text clipped - 9 lines]
> similar should be reasonably easy to do for Informix, and is a very useful
> tool for customers wishing to make informed choices around storage.
Have had a look and think it *might* be very useful in diagnosing disk problems. Unfortunately for this case it is not available for HP-UX.
rgds Neil
Neil Truby - 30 Nov 2008 11:04 GMT
> It's telling you that it takes about 3.2 seconds to write 500,000 Informix pages to disk on your SAN Taking half of the dev/null time as reading time and half as writing time.
> [quoted text clipped]
Thanks Art.
Not sure where you get 3.2 seconds. The figures from dd were 264s to write 500,000 2k pages to disk, and 127s to "write" the same to /dev/null.
Usually we do the SAN administration for customers so I have a view over all this stuff. But here, the customer pays EMC to do it, so I'm in the dark really. But we are informed that the mirroring is RAID-10.
I take all your points about the layout of the SANs and dbs. I know all this stuff but maybe I can't see the wood for the trees. What I would really like to see, if anyone could provide it, is some checkpoint write rates/dd run times on other Big Tin to assess whether the rates we are seeing here are reasonable.
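For reference, the dd figures above can be turned into a throughput number with a few lines of arithmetic. This is only a sketch: it assumes a 2k page is 2048 bytes and 1 MB is 10**6 bytes, and the second figure rests on the further assumption that the /dev/null run measures everything except the disk write, so that subtracting it leaves pure write time.

```python
PAGE_BYTES = 2 * 1024   # 2k page, as used in the dd test
PAGES = 500_000         # pages written in the dd test


def mb_per_second(pages, seconds, page_bytes=PAGE_BYTES):
    """Throughput in MB/s, with 1 MB taken as 10**6 bytes."""
    return pages * page_bytes / seconds / 1e6


# Whole run to disk: 264 seconds.
to_disk = mb_per_second(PAGES, 264)

# Assumption: the 127 s /dev/null run is pure overhead (read side etc.),
# so 264 - 127 = 137 s is the time attributable to the write itself.
write_only = mb_per_second(PAGES, 264 - 127)

print(f"{to_disk:.1f} MB/s")     # 3.9 MB/s
print(f"{write_only:.1f} MB/s")  # 7.5 MB/s
```

Either way the dd rate comes out below the 12-20 MByte/s seen during the checkpoints themselves.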
david@smooth1.co.uk - 30 Nov 2008 05:34 GMT > <da...@smooth1.co.uk> wrote in message > [quoted text clipped - 7 lines] > throughput *you* get on checkpoints etc, so I can know for sure if this > customer's is bad. No you do not. I run different applications producing a different load, using different machines with potentially different SAN switches, and using different arrays with a different disk layout and different performance requirements.
What you need to know is: what is the maximum checkpoint time your customer is willing to accept?
> >> It depends on how much to flush and also how many chunks the dirty > > pages are spread across and disk performance. > > In this test case there is a root, physlog, llog, tempdbs and appdbs, each > with one chunk. How are the writes spread across the dbspaces in the test case? Can you create more dbspaces to spread the load across different chunks e.g. would separating data and indexes into different dbspaces help? Or different tables into different dbspaces? One dbspace for an app is not enough if you want good checkpoint times.
> >> Try running iostat -x 4 or whatever the HP-UX equivalent is to see > > disk i/o times per device. Possibly HP-UX has something equivalent to > vxstat that can see i/o performance per volume. > > Service time on the "disks" seem brilliant, a millisecond or two at least OK, so running top whilst the checkpoint is running, how many cpus are busy out of the 18?
Is something else using cpu time? Is the box swapping/paging heavily? How much memory is free on the box?
> >> What array is behind the SAN, something like an XP12000 has lots of > > stats/settings that can be monitoring/tuned by the HP guys. > > An EMC Symmetrix. The Daddy of them All! How many i/os per second are you getting in the last 2-3 seconds of the checkpoint and across how many different volumes?
> OK, to address Obnoxio's question, a dd of 1g in 2k pages gives: > [quoted text clipped - 16 lines] > So I'm seeking some comparative data from others on checkpoint times, or > even dd rates to try to verify that there is actually no problem. EMC with Symmetrix used to guarantee so many i/os per second; do they still do that? What do they say for this setup?
Can they tune the array to be able to go faster given your test case?
Your two dd's above are the same aren't they? Ask EMC why the times vary so much. Are other machines accessing the array at the same time?
pokeyman76@yahoo.com - 30 Nov 2008 06:50 GMT Hate to say the obvious, but... you could try upgrading to IDS 11 and not worry about checkpoints at all since it has nonblocking checkpoint technology. It also includes automatically configuring LRU flushing and lots of nice performance features.
Just a suggestion :)
On Nov 29, 9:34 pm, "da...@smooth1.co.uk" <da...@smooth1.co.uk> wrote:
> On 29 Nov, 20:38, "Neil Truby" <neil.tr...@ardenta.com> wrote:> <da...@smooth1.co.uk> wrote in message > [quoted text clipped - 80 lines] > vary so much. > Are other machines accessing the array at the same time?
Neil Truby - 30 Nov 2008 11:11 GMT >> <pokeyman76@yahoo.com> wrote in message >> news:b1d117a3-e7ec-4443-944e-c19b8cf758be@z1g2000yqn.googlegroups.com... >> Hate to say the obvious, but... you could try upgrading to IDS 11 and not worry about checkpoints at all since it has nonblocking checkpoint technology. It also includes automatically configuring LRU flushing and lots of nice performance features.
It isn't an option here as the underlying application is not certified for 11.
Neil Truby - 30 Nov 2008 11:10 GMT >> <da...@smooth1.co.uk> wrote in message >> [quoted text clipped - 16 lines] > What you need to know is what is the maximum checkpoint time your > customers is willing to accept? Well, zero would probably be acceptable ;-) And of course I can tune Informix to do virtually everything as LRU writes to achieve that.
What I would like to assess is whether, by doing so, I'm addressing the symptom rather than the cause. In this respect then, even though you are using different hardware, SAN and application profile, if you or anyone else could provide some checkpoint write rates/dd run times run on other Big Tin, it really *would* help me to assess whether the rates we are seeing here are reasonable.
Richard Kofler - 30 Nov 2008 17:05 GMT Neil Truby wrote:
>>> <da...@smooth1.co.uk> wrote in message >>> [quoted text clipped - 28 lines] > on other Big Tin, it really *would* help me to assess whether the rates > we are seeing here are reasonable. Hi Neil,
here are some outdated figures from a former customer of mine, now running Oracle. Version was 9.40. There were 32 CPUs and a multipathed 1Gbit, full duplex FC network (bonding 2 paths actually). 2 I/O subsystems were used exclusively by the database server. Each I/O subsystem was quad connected (4x 1Gbit FC). The host used 8 FC boards. We saw 24 flushers being active almost to the end of checkpoint. Throughput was around 185 MB/sec, a bit more than 23K IOPS doing 8KB writes, which we were monitoring as normal behavior, evenly spread over both I/O subsystems. To reach this we had to ensure:
- that long blocks are written (8KB per start-I/O)
- that the load is spread evenly over both I/O subsystems
- that the load is spread over 60 LUNs as evenly as possible
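As a quick cross-check of the figures in this post, throughput and IOPS are tied together by the write size. The sketch below is plain arithmetic; the only assumptions are the unit convention (1 MB taken as 1000 KB or as 1024 KB, both shown) and that the writes are all exactly 8KB.

```python
def iops(mb_per_second, io_kb, kb_per_mb=1000):
    """I/O operations per second needed to sustain a given throughput."""
    return mb_per_second * kb_per_mb / io_kb


print(iops(185, 8))        # 23125.0  (1 MB = 1000 KB)
print(iops(185, 8, 1024))  # 23680.0  (1 MB = 1024 KB)

# Spread evenly over 60 LUNs, that is a few hundred writes/s per LUN.
print(round(iops(185, 8) / 60))  # 385
```

Both conventions agree with "a bit more than 23K IOPS" for 185 MB/sec of 8KB writes.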
This was neither easy nor a goal to be reached very fast.
When we had like 40% of this throughput, everyone and her sister tried to convince me that everything is OK...... But in fact it was not. That has made me say ever since: Best practice does not always mean 'good' ;)
And another well hated lil signature of mine: Why spend +50% money to get 50% less performance out of the boxes?
The figures you present from dd make me believe that, if you are on a DMX3, your test was heavily hampered by upstaging, i.e. had to wait for physical disks, because the write cache in the EMC was full at all times. This happens when large block sequential writes are there at the same time as your short block writes (8KB during checkpoint). On the other hand I do know from other customers of mine that on a DMX3 a write throughput of up to 300MB/sec is possible, though not for too long (less than 1 minute), because if the write cache in the subsystem is full .... (see above). One more thing to pay attention to: is the I/O subsystem doing replication (like SRDF)? If so, all depends on the interlink speed and latency, and a typical value is that your writes are slower by 30%, even on a 4Gbit FC attached system when the interlink is 1 Gbit only.
HTH dic_k
 Signature Richard Kofler SOLID STATE EDV Dienstleistungen GmbH Vienna/Austria/Europe
Neil Truby - 30 Nov 2008 19:28 GMT > One more thing to pay attention to: Is the I/O- subsystem doing > replication (like SRDF)?. If so, all depends on the interlink > speed and latency and a typical value is that your writes are > slower by 30%, even on 4Gbit FC attachment system when the interlink > is 1 GBIT only. Thanks for that. There is no SRDF at present, but the intention is to start SRDF sync. replication, which will make the disk times worse of course.
Keith Simmons - 01 Dec 2008 12:42 GMT 2008/11/28 Neil Truby <neil.truby@ardenta.com>:
> IDS 10.0FC8W2 on HP-UX 11.31 > [quoted text clipped - 20 lines] > Informix-list@iiug.org > http://www.iiug.org/mailman/listinfo/informix-list Neil
P570, 4 x twin processors, 16 Gb memory total (not huge but might help), 500 Gb fibre attached, sole use disk. 600,000 Buffers (= 2.4 Gb), LRU max 15 %, min 8%. Snapshot checkpoint around 7.7% dirty (5 mins from last) (= 185 Mb) with Fuzzy Checkpoints (are you fuzzy?) < 1 second. Looking back over November my average checkpoints have been around 1 second. Any help?
Keith
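The dirty-page volumes quoted in this thread follow directly from the buffer count, the dirty percentage and the page size. A small sketch of that arithmetic is below; the 4 KB page size for the P570 system is inferred from "600,000 Buffers (= 2.4 Gb)" rather than stated outright, and 1 MB is taken as 1000 KB.

```python
def dirty_mb(buffers, dirty_pct, page_kb):
    """Data held in dirty buffers, in MB (1 MB = 1000 KB)."""
    return buffers * dirty_pct / 100 * page_kb / 1000


# P570 system: 600,000 buffers, 7.7% dirty, 4 KB pages (inferred).
print(f"{dirty_mb(600_000, 7.7, 4):.0f} MB")     # 185 MB

# HP-UX system from the first post: 10,000,000 buffers, 0.5% dirty, 2 KB pages.
print(f"{dirty_mb(10_000_000, 0.5, 2):.0f} MB")  # 100 MB
```

So the two systems flush a broadly similar amount of data per checkpoint (roughly 185 MB in under a second versus roughly 100 MB in 5-8 seconds), which is what makes the comparison useful.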
scottishpoet - 02 Dec 2008 09:51 GMT > IDS 10.0FC8W2 on HP-UX 11.31 > [quoted text clipped - 15 lines] > thx > Neil If large amounts of time are spent waiting on disk, how well does a dd of large amounts of data to the disk perform without Informix? Is that acceptable?
scottishpoet - 02 Dec 2008 09:55 GMT > > IDS 10.0FC8W2 on HP-UX 11.31 > [quoted text clipped - 21 lines] Oops, should have read the whole thread! For some reason I thought there had only been 1 response.