Database Forum / Informix Topics / December 2008

Checkpoint durations

Neil Truby - 28 Nov 2008 19:45 GMT
IDS 10.0FC8W2 on HP-UX 11.31

I'm tuning IDS on quite a large system (18 cpus, 65g memory, Tier One SAN).

I have BUFFERS set to 10,000,000 (i.e. 20g), and lru_max_dirty to 0.5% (i.e.
50,000 pages or 100m).  lru_min_dirty is effectively zero.

At times I'm seeing checkpoints of 5 to 8 seconds.  According to the
checkpoint tracer nearly all of this is waiting on disk.  And my question
is: on a large OLTP system, does this seem reasonable throughput, a maximum
of 100m in 5-8 seconds?

Obviously I can reduce lru_max_dirty further, but I'm just interested to
know if this 12-20MByte/s rate is comparable to others' experience.

Checkpoint statistics from other large users would be most welcome.
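
For anyone checking the arithmetic behind those figures, here is a rough sketch in Python. It assumes 2 KB pages (which is what the 20g figure implies) and decimal megabytes; the variable and function names are made up for illustration.

```python
# Back-of-envelope check of the figures in the post.
# Assumptions: 2 KB buffer pages, 1 MB = 10**6 bytes.
PAGE_BYTES = 2 * 1024
BUFFERS = 10_000_000                      # BUFFERS setting from the post

buffer_pool_bytes = BUFFERS * PAGE_BYTES
max_dirty_pages = BUFFERS * 5 // 1000     # lru_max_dirty of 0.5 %
max_dirty_bytes = max_dirty_pages * PAGE_BYTES


def flush_rate_mb_s(nbytes, seconds):
    """Average write rate in MB/s for nbytes flushed in the given time."""
    return nbytes / seconds / 1e6


slow = flush_rate_mb_s(max_dirty_bytes, 8)   # 8-second checkpoint
fast = flush_rate_mb_s(max_dirty_bytes, 5)   # 5-second checkpoint

print(f"buffer pool     : {buffer_pool_bytes / 1e9:.2f} GB")
print(f"max dirty pages : {max_dirty_pages}")
print(f"max dirty data  : {max_dirty_bytes / 1e6:.1f} MB")
print(f"flush rate      : {slow:.1f} to {fast:.1f} MB/s")
```

This only bounds the rate from above: if fewer than 50,000 pages are dirty when the checkpoint fires, the real rate is lower still.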

thx
Neil
Fernando Nunes - 28 Nov 2008 23:17 GMT
> IDS 10.0FC8W2 on HP-UX 11.31
>
[quoted text clipped - 15 lines]
> thx
> Neil

Neil,

Are you sure you're not having more than 100MB to write?
Are you basing this number on observations or just because of LRU_MAX_DIRTY?
You should also check the cleaners work during checkpoint. Are they all
reasonably busy, or are you left with one or two for the last seconds?

Signature

Fernando Nunes
Portugal

http://informix-technology.blogspot.com
My email works... but I don't check it frequently...

Neil Truby - 28 Nov 2008 23:43 GMT
>> IDS 10.0FC8W2 on HP-UX 11.31
>>
[quoted text clipped - 24 lines]
> You should also check the cleaners work during checkpoint. Are they all
> reasonably busy, or are you left with one or two for the last seconds?

Yes, I am sure.  I have a script that monitors the number of dirty pages
etc, and see it change in real time.  Also, checkpoint tracing gives you the
number of pages written. Not too sure how to check the cleaners; any advice?

Can you answer my original question from experience: do the checkpoint
durations I cite above look good, or look long to you?

cheers
Neil
Fernando Nunes - 29 Nov 2008 01:02 GMT
>>> IDS 10.0FC8W2 on HP-UX 11.31
>>>
[quoted text clipped - 36 lines]
> cheers
> Neil

onstat -F during checkpoint will show cleaner activity.
Column "data" is the chunk number... If the I/O load is not balanced you'll
probably see a "burst" of activity on all of them when the checkpoint starts, then
most of them will probably go back to sleep while one or two keep working for
longer. This can also happen if for some weird reason the chunk is on a "slow
disk" (hard to understand on modern systems...)

I can't give you any numbers at the moment... and they wouldn't be comparable
to your system.
The higher-load system I work on daily has a much smaller number of buffers, very
old hardware and HDR... And I'm not tracing the checkpoints (I can't even
remember the max_dirty/min_dirty :) ) I could risk something like 15K to 17K
buffers in about 3 to 4 seconds... that would be 30-34MB in about half the time...
If these numbers are correct maybe your numbers are not very good, but I need to
check them.

At first glance I wouldn't be shocked by your numbers... It has to order the
buffers, and write them. Maybe you can test your disk throughput with dd? It's
a completely different situation. It should be clearly higher than what you get
at checkpoints, but maybe it can give you an idea...

Let's wait for more...

P.S.: I don't need to remind *you* but for others, IDS 11 would be nice for
that system...
Regards,


Neil Truby - 29 Nov 2008 09:13 GMT
> At first glance I wouldn't be shocked by your numbers... It has to order
> the buffers, and write them. Maybe you can test your disk throughput with
> dd? It's a completely different situation. It should be clearly higher
> than what you get at checkpoints, but maybe it can give you an idea...

Yes, I have tried dd.  the results are somewhat variable, and therefore
inconclusive, hence the question about checkpoint times on other users'
large systems.
Obnoxio The Clown - 29 Nov 2008 09:45 GMT
> Yes, I have tried dd.  the results are somewhat variable, and therefore
> inconclusive, hence the question about checkpoint times on other users'
> large systems.

Can you please post them up so that people can understand what you mean
by "somewhat variable"?

Signature

Cheers,
Obnoxio The Clown

http://obotheclown.blogspot.com

david@smooth1.co.uk - 29 Nov 2008 17:25 GMT
> IDS 10.0FC8W2 on HP-UX 11.31
>
[quoted text clipped - 15 lines]
> thx
> Neil

Well, a certain customer with an IBM p690 with 32 cpus was treating 5-11
second checkpoints as normal, but that was IDS 9.40.

Another Informix customer had a HP-UX Superdome with 32cpus and a
fully loaded XP12000 array capable of 500,000 io/s. They
created a 60GB chunk then tried to load 100 million rows via lots of
parallel loads. This was IDS 9.40 with the checkpoint interval set to
15 minutes. As this was all to one chunk the checkpoints were single
threaded and taking >15 minutes per checkpoint!! Once they partitioned
across lots of 2GB chunks checkpoints started using lots of cpus and
dropped to 30sec-1 minute.

It depends on how much to flush and also how many chunks the dirty
pages are spread across and disk performance.

Run onstat -u and look for entries with the flags column ending in F,
see how the writes are balanced across page flushers.

Try running iostat -x 4 or whatever the HP-UX equivalent is to see
disk i/o times per device. Possibly HP-UX has something equivalent to
vxstat that can show i/o performance per volume.

What array is behind the SAN? Something like an XP12000 has lots of
stats/settings that can be monitored/tuned by the HP guys.
Neil Truby - 29 Nov 2008 20:38 GMT
On 28 Nov, 19:45, "Neil Truby" <neil.tr...@ardenta.com> wrote:
> IDS 10.0FC8W2 on HP-UX 11.31

What I really need, David, is someone who works for a huge company with lots
of big tin - er, exactly like you as it happens! ;-) - to tell me what
throughput *you* get on checkpoints etc, so I can know for sure if this
customer's is bad.

> It depends on how much to flush and also how many chunks the dirty
> pages are spread across and disk performance.

In this test case there is a root, physlog, llog, tempdbs and appdbs, each
with one chunk.

> Try running iostat -x 4 or whatever the HP-UX equivalent is to see
> disk i/o times per device. Possibly HP-UX has something equivalent to
> vxstat that can show i/o performance per volume.

Service time on the "disks" seems brilliant, a millisecond or two at least

> What array is behind the SAN? Something like an XP12000 has lots of
> stats/settings that can be monitored/tuned by the HP guys.

An EMC Symmetrix.  The Daddy of them All!

OK, to address Obnoxio's question, a dd of 1g in 2k pages gives:

# time dd  if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out

real     4:35.5
user        0.4
sys        21.1

This is much worse than a small VM at Ardenta attached to an EMC Clariion,
and worse even than a tiny Itanium at Ardenta copying within its own single
internal disk.

But the customer has had HP and EMC check performance thoroughly, and has
been given a clean bill of health.  I have an IBM PMR open; they suspect a
disk problem.

So I'm seeking some comparative data from others on checkpoint times, or
even dd rates to try to verify that there is actually no problem.
Ian Goddard - 29 Nov 2008 22:01 GMT
> On 28 Nov, 19:45, "Neil Truby" <neil.tr...@ardenta.com> wrote:
>> IDS 10.0FC8W2 on HP-UX 11.31
[quoted text clipped - 41 lines]
> So I'm seeking some comparative data from others on checkpoint times, or
> even dd rates to try to verify that there is actually no problem.

Maybe you should repeat that writing to /dev/null in order to check what
proportion of this is read time.

Signature

Ian

Hotmail is for spammers.  Real mail address is igoddard
at nildram co uk

Neil Truby - 29 Nov 2008 23:26 GMT
> Maybe you should repeat that writing to /dev/null in order to check what
> proportion of this is read time.

Yes, maybe.  What I'd really like is someone to answer my original questions
about their experience of checkpoint throughputs and dd times ... ;-)
Fernando Nunes - 30 Nov 2008 00:45 GMT
>> Maybe you should repeat that writing to /dev/null in order to check
>> what proportion of this is read time.
>
> Yes, maybe.  What I'd really like is someone to answer my original
> questions about their experience of checkpoint throughputs and dd times
> ... ;-)

Don't forget "onstat -F" during checkpoint...
Regards,


Ian Goddard - 30 Nov 2008 12:10 GMT
>> Maybe you should repeat that writing to /dev/null in order to check
>> what proportion of this is read time.
>
> Yes, maybe.  What I'd really like is someone to answer my original
> questions about their experience of checkpoint throughputs and dd times
> ... ;-)

But in regard to dd times you've already answered your question - it's
piss-poor compared to small-iron.


Neil Truby - 29 Nov 2008 23:47 GMT
> Maybe you should repeat that writing to /dev/null in order to check what
> proportion of this is read time.

What does this tell me then?

# time dd  if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out

real     4:24.8
user        0.4
sys        21.8

# time dd if=/dev/vg00/lvol7 of=/dev/null bs=2k count=500000
500000+0 records in
500000+0 records out

real     2:07.9
user        0.4
sys        12.3
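
For reference, a small Python sketch that turns the two timings above into rates. It assumes bs=2k means 2048 bytes, uses decimal megabytes, and treats the write time as simply the copy time minus the read-only time, which is only a rough approximation for a single dd that reads and writes alternately. The function name is made up.

```python
# Convert the two dd timings above into MB/s (1 MB = 10**6 bytes).
BLOCK_BYTES = 2048        # bs=2k
COUNT = 500_000           # count=500000
TOTAL_BYTES = BLOCK_BYTES * COUNT


def rate_mb_s(nbytes, seconds):
    """Average transfer rate in MB/s."""
    return nbytes / seconds / 1e6


copy_s = 4 * 60 + 24.8    # real 4:24.8, read from lvol7 and write to rneil_1
read_s = 2 * 60 + 7.9     # real 2:07.9, same read with output to /dev/null
write_s = copy_s - read_s # rough estimate of the time spent writing

print(f"copy : {rate_mb_s(TOTAL_BYTES, copy_s):.2f} MB/s over {copy_s:.1f} s")
print(f"read : {rate_mb_s(TOTAL_BYTES, read_s):.2f} MB/s over {read_s:.1f} s")
print(f"write: {rate_mb_s(TOTAL_BYTES, write_s):.2f} MB/s over {write_s:.1f} s (estimate)")
```

On these numbers roughly half the elapsed time is the read side, so neither the source volume nor the target looks fast at 2k blocks.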
Art Kagel - 30 Nov 2008 02:20 GMT
It's telling you that it takes about 3.2 seconds to write 500,000 Informix
pages to disk on your SAN, taking half of the /dev/null time as reading time
and half as writing time.  If your checkpoints are 5-8 seconds then the IO
speed is the biggest part of the bottleneck, which is what IDS is telling you
- no matter what EMC says.  It may be the network connections to the SAN, the
SAN itself, or the disk array configuration (can you say NO RAID5????).

Another possibility is the dbspace setup.  At chunk write time (i.e. during a
checkpoint) IDS writes out each chunk's dirty pages with a single IO
thread.  If you are pushing all of that changed data to a small number of
very large chunks you are not taking very good advantage of IDS's
parallelism.

Similarly, are you using RAW chunks or COOKED?  If COOKED, two points:
1) do you have enough AIO VPs configured (you should have at least
1-1.5 AIO VPs per chunk), 2) know that on HP-UX enabling and properly tuning
KAIO and using RAW chunks will make a significant difference in IO
throughput (>25%).

Finally, make sure that the SAs and EMC have not
configured the SAN with a RAID5 array configuration with a very large block
size.  Databases, especially but not only Informix, do MUCH better with the
performance of RAID10 than RAID5 (not to mention that RAID5 is NOT SAFE AT
ANY SPEED - see the papers at www.baarf.com) and with small block sizes (16K
- 64K at most).  EMC is in love with 256K blocks which work well for
filesystems but are poor for database style IO.
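
To illustrate the point about one IO thread per chunk, here is a deliberately idealised Python model. It is not how IDS schedules its cleaners; it just assumes each chunk is flushed by one writer at a fixed rate and that the chunks do not contend with each other, so the checkpoint lasts as long as the busiest chunk. The 15 MB/s per-writer rate and the function name are hypothetical.

```python
# Idealised model: one writer per chunk, checkpoint ends when the
# chunk with the most dirty pages has been flushed.
def checkpoint_seconds(dirty_pages_per_chunk, page_bytes=2048, writer_mb_s=15.0):
    """Estimated checkpoint duration for a given spread of dirty pages.

    writer_mb_s is a made-up per-writer rate in MB/s (1 MB = 10**6 bytes).
    """
    busiest = max(dirty_pages_per_chunk, default=0)
    return busiest * page_bytes / (writer_mb_s * 1e6)


one_chunk = checkpoint_seconds([50_000])          # all dirty pages in one chunk
eight_chunks = checkpoint_seconds([6_250] * 8)    # same pages spread evenly

print(f"1 chunk : {one_chunk:.2f} s")
print(f"8 chunks: {eight_chunks:.2f} s")
print(f"speed-up: {one_chunk / eight_chunks:.1f}x")
```

In practice the speed-up flattens out once the SAN path or the array itself saturates, so this is an upper bound on what extra chunks can buy.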

Art

> > Maybe you should repeat that writing to /dev/null in order to check what
> > proportion of this is read time.
[quoted text clipped - 21 lines]
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list

Signature

Art S. Kagel
Oninit (www.oninit.com)
IIUG Board of Directors (art@iiug.org)

Disclaimer: Please keep in mind that my own opinions are my own opinions and
do not reflect on my employer, Oninit, the IIUG, nor any other organization
with which I am associated either explicitly or implicitly.  Neither do
those opinions reflect those of other individuals affiliated with any entity
with which I am affiliated nor those of the entities themselves.

InDeep - 30 Nov 2008 04:10 GMT
> It's telling you that it takes about 3.2 seconds to write 500,000
> Informix pages to disk on your SAN Taking half of the dev/null time as
[quoted text clipped - 20 lines]
>
> Art

In addition to what Art said...

Check with your SAN Admin to see how many servers and disks are on the same
ports that these drives are on.  On my last project we found 200, yes 200
servers on one port.  The previous SAN Admin had EMC training and still
allowed this to happen.  If too many systems are on one fibre channel no
amount of tuning of your software is going to matter.  A good SAN Admin is
going to distribute the load across all the available ports and channels
to make sure your system gets the right throughput.  Hint:  You can get
this information with powermt commands.  Ask your system administrator or
SAN Admin to run reports on I/Os and get a powermt display of your ports.
You should see multiple ports--at least two--for each disk.  If you don't
then you should ask why.  This alone has a major impact on performance.
EMC can get you those I/O reports as well, all their people are trained
to do this.

Regarding EMC RAID5, EMC loves their RAID5, so it's most likely already
set up this way.  I'd be very surprised if it isn't.  EMC loves to sell you
on their RAID5 being 'superior' to conventional RAID5.  The reason the
customer uses it is most likely because they cry when they figure out how
expensive RAID1-0 is going to be on Symmetrix, so they fall back to RAID5
and move on.  Flash drives are coming, hide your wallet!

-ID-

>     "Ian Goddard" <goddai01@hotmail.co.uk
>     <mailto:goddai01@hotmail.co.uk>> wrote in message
[quoted text clipped - 26 lines]
>     Informix-list@iiug.org <mailto:Informix-list@iiug.org>
>     http://www.iiug.org/mailman/listinfo/informix-list
Neil Truby - 30 Nov 2008 11:07 GMT
>> It's telling you that it takes about 3.2 seconds to write 500,000
>> Informix pages to disk on your SAN Taking half of the dev/null time as
[quoted text clipped - 18 lines]
>> (16K - 64K at most).  EMC is in love with 256K blocks which work well for
>> filesystems but are poor for database style IO.

HP and EMC have told the customer that their parts of the system are
performing well.   What I would really like to see, if anyone could provide
it, is some checkpoint write rates/dd run times on other Big Tin to assess
whether the rates we are seeing here are reasonable.  If they aren't I could
perhaps press the issue.  Otherwise I can't.  Sending me tuning tips is
great, and would be the next valuable step.  But firstly I would really value
some statistics from others so that I can assess if I even have a problem or
not.

thanks
Neil
Obnoxio The Clown - 30 Nov 2008 17:15 GMT
> What I would really like to see, if anyone could provide
> it, is some checkpoint write rates/dd run times on other Big Tin to assess
> whether the rates we are seeing here are reasonable.  

But if your "crappy virtual machine" has faster dd times, what more do
you need?


InDeep - 30 Nov 2008 18:47 GMT
>> What I would really like to see, if anyone could provide it, is some
>> checkpoint write rates/dd run times on other Big Tin to assess whether
>> the rates we are seeing here are reasonable.  
>
> But if your "crappy virtual machine" has faster dd times, what more do
> you need?

Good point.

There should be two metrics in play.  Your disk speed is indicated by the
disk drives' specs.  The difference is what you're getting from the SAN and
the software.  You can't go faster than the drive itself, so everything from
that point forward is your performance penalty.   Like I said before, EMC can
give you I/O reports, their people know how to get that for you, so you know
what your raw I/O is.  The speed of the database, and the OS, to read/write are
next.  You the DBA can get your stats, and really, it doesn't matter what you
see from other sites, the difference from what the disk drive is capable of
and what you are getting is all you need to worry about.  I'd be willing to
bet you're probably going to be faster than others simply because you care,
most of the shops I work in don't think about it, they assume the SAN is set
up with the right multi-channel architecture, and that the SAN people, yes,
even EMC's own people, know what they are doing.  Most of them I've seen I
can't give the highest marks, but they will do what you need if you know
what to ask--and they can ask their own people if they don't know what to do.
EMC likes to sell you the goods, but they expect you to go to their training,
and learn how to use it.  If you buy it and don't know what to ask, in most
cases they do not offer much unless you pay for their professional services.

-ID-
Neil Truby - 30 Nov 2008 19:29 GMT
>> What I would really like to see, if anyone could provide it, is some
>> checkpoint write rates/dd run times on other Big Tin to assess whether
>> the rates we are seeing here are reasonable.
>
> But if your "crappy virtual machine" has faster dd times, what more do you
> need?

Well my crappy virtual machine is attached to a Clariion which is likely
doing nothing else, whereas (as someone else suggested) the customer's DMX
may be busy servicing many other calls.
But more generally the message back was that a 2k dd was a simplistic and
unrealistic test.
Neil Truby - 30 Nov 2008 21:13 GMT
(In response to a suggestion from Art):

Well yes, 8k is ***far*** superior to 2k.  But what does this tell me, and
how can it be more representative of what Informix does ... which is write 2k
pages, isn't it?

# time dd  if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=2k count=500000
500000+0 records in
500000+0 records out

real    4m31.09s
user    0m0.46s
sys     0m21.65s

# time dd  if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=8k count=125000
125000+0 records in
125000+0 records out

real    1m19.58s
user    0m0.11s
sys     0m5.21s

# time dd  if=/dev/vg00/lvol7 of=/dev/vg01/rneil_1 bs=8k count=125000
125000+0 records in
125000+0 records out

real    1m15.62s
user    0m0.12s
sys     0m5.91s
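
Put as rates, the three runs above look like this. A small Python sketch, assuming bs=2k and bs=8k mean 2048 and 8192 bytes and using decimal megabytes; the helper names are made up.

```python
import re


def parse_real(text):
    """Convert a 'real' time such as '4m31.09s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def rate_mb_s(block_bytes, count, real):
    """Average copy rate in MB/s (1 MB = 10**6 bytes)."""
    return block_bytes * count / parse_real(real) / 1e6


# (label, block size in bytes, count, 'real' time) for the runs above
runs = [
    ("bs=2k", 2048, 500_000, "4m31.09s"),
    ("bs=8k", 8192, 125_000, "1m19.58s"),
    ("bs=8k", 8192, 125_000, "1m15.62s"),
]

for label, block_bytes, count, real in runs:
    print(f"{label}: {rate_mb_s(block_bytes, count, real):6.2f} MB/s ({real})")
```

All three runs move the same 1,024,000,000 bytes, so the difference is purely down to the size of each request.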

cheers
N
Richard Kofler - 01 Dec 2008 09:00 GMT
Neil Truby schrieb:
> (In response to a suggestion from Art):
>
[quoted text clipped - 28 lines]
> cheers
> N

Hi Neil,

testing using dd is not an easy thing.
You must consider:
If your host has a lot of memory and is running LINUX or SOLARIS
you will see that if the size of your sequential file fits into memory,
every repetition of your test will be I/O-less, i.e. very fast.

You must destroy your cache inside the host memory, and you can do
this by shrinking memory by allocating it to another process, which
eats memory. I use an IDS instance to do this and I set RESIDENT to 1.
Or you can read another huge file big enough to be sure, that it
destroys your cache from your dd.

Also you should use more than 1 dd in parallel. Best of course, if you
already know how many cleaners are running in parallel during checkpoint.
During a checkpoint you will never see all cleaners starting to run
in parallel and ending at the same time, but generally you will have
many cleaners active for the first 75% of your waiting time, and only 1
will run at the very end.
It is impossible to map this behavior to dd but you can simulate the
beginning, or the 'average' behavior.
Note that starting dd in parallel will flood
- your input side of dd (disk reads)
- your output side of dd (writes to the I/O-subsystem)
- cache(s) in your I/O-subsystem
- the interconnecting network

So if you read and write from the same I/O-subsystem (which is
typical) you must add input and output MB/sec and IOPS.
Many FC switches are able to monitor and display their thruput
channelwise, and there you can see the network load.

EMC can monitor cache rates and I/O rates per host.
You should ask for these figures during your test.

A good test on large iron will use a 15-20GB file size and start
4, 8 and 12 dds in parallel. This will give you an impression of when
you hit the ceiling, and what resource is limiting your speed.
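
If you prefer a script to a handful of dd commands, the parallel write side can be sketched in Python roughly as follows. This is an illustration only: it writes zero-filled files through the filesystem rather than to raw devices, so all the caching caveats above still apply, and the function names, file names and the tiny sizes in the demo are made up.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_stream(path, block_bytes, count):
    """Write count blocks of block_bytes to path, fsync, return bytes written."""
    block = b"\0" * block_bytes
    written = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(count):
            written += os.write(fd, block)
        os.fsync(fd)          # make sure the data has left the OS cache
    finally:
        os.close(fd)
    return written


def parallel_write_test(directory, streams, block_bytes, count):
    """Run several writers at once; return (bytes, seconds, MB/s)."""
    paths = [os.path.join(directory, f"stream_{i}.dat") for i in range(streams)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        total = sum(pool.map(lambda p: write_stream(p, block_bytes, count), paths))
    elapsed = time.perf_counter() - start
    for path in paths:
        os.remove(path)
    return total, elapsed, total / elapsed / 1e6


if __name__ == "__main__":
    # Tiny demo sizes; a real test would use far larger files.
    with tempfile.TemporaryDirectory() as scratch:
        for streams in (4, 8, 12):
            total, elapsed, rate = parallel_write_test(scratch, streams, 2048, 1000)
            print(f"{streams:2d} streams: {rate:8.1f} MB/s")
```

Running it with 4, 8 and 12 streams, as in the demo, mirrors the idea of looking for the point where the aggregate rate stops growing.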

dic_k

Signature

Richard Kofler
SOLID STATE EDV
Dienstleistungen GmbH
Vienna/Austria/Europe

Ian Michael Gumby - 02 Dec 2008 03:24 GMT
> You must destroy your cache inside the host memory, and you can do
> this by shrinking memory by allocating it to another process, which
> eats memory. I use an IDS instance to do this and I set RESIDENT to 1.
> Or you can read another huge file big enough to be sure, that it
> destroys your cache from your dd.

Slightly off topic, the whole "memory resident" thing isn't a good
idea.

The short reason... Your Unix Kernel usually tunes itself prior to the
start up of IDS during the boot process. So you're tying up memory
that the system anticipates as being available. The other part of the
argument is that if IDS is being utilized it will be in memory so you
don't need to set the memory resident flag.

(When was the last time you set your kernel's parameters manually?)

-G
InDeep - 02 Dec 2008 04:40 GMT
>> You must destroy your cache inside the host memory, and you can do
>> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 14 lines]
>
> -G

Gumpy,

1.  Type this command on your linux command-line:

/sbin/sysctl -a

2.  Pull foot out of mouth
Ian Michael Gumby - 02 Dec 2008 16:53 GMT
> >> You must destroy your cache inside the host memory, and you can do
> >> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 22 lines]
>
> 2.  Pull foot out of mouth

Again, how often do you tune your kernel?

It used to be that you had to. Then it was limited because some
parameters were functions of other parameters. Now most tend to do
this automatically.

Of course I'm going back to SCO days, HP-UX and AIX many moons ago.
Back when you were doing that porn thing on Sybase. ;-)
InDeep - 03 Dec 2008 03:35 GMT
>>>> You must destroy your cache inside the host memory, and you can do
>>>> this by shrinking memory by allocating it to another process, which
[quoted text clipped - 19 lines]
>
> Again, how often do you tune your kernel?

Every single installation.  Every single kernel compile.

> It used to be that you had to. Then it was limited because some
> parameters were functions of other parameters. Now most tend to do
> this automatically.
>
> Of course I'm going back to SCO days, HP-UX and AIX many moons ago.
> Back when you were doing that porn thing on Sybase. ;-)

I'm surprised you don't understand kernel tuning better, having more
kernel experience than a lot of people.  Oracle installations require
certain kernel params, same for Informix.

-ID-
Ian Michael Gumby - 03 Dec 2008 18:27 GMT
I do have more experience tuning kernels.
That's why I don't try to tune them unless I have to.

HP-UX starting around version 9 started using formulas when it came to tunable parameters.
If you replace the formula with a hard value, you could screw things up with unintended consequences.
If you modify the formula with a different formula, you could screw up other variables and again, screw things up with unintended consequences.

Also there came to be this separation of DBA and Sys Admin roles. So you have Logical DBAs, Physical DBAs, Sys Admins, and network Admins who all get involved on the back end. If it's a web-based app, toss in your webmaster/app server master too.

Since most machines served dual or multiple purposes, rather than just a dedicated database server, you erred on the side of caution by not tuning the kernel.

In today's machines, when the machines boot, they do a self discovery and they tend to tune themselves. In addition, it takes a bit of time to make the changes and test them. This costs money. Any performance boost you may get is minimal and you end up facing the statement ... "Just add more CPU/Memory/Disk. It's cheaper and lasts longer than the cost of a good consultant.". Also, all of the changes you just made get tossed out of whack when they decide to add an additional application on the server.

The point is that you don't muck with the kernel.
You're better off tuning the engine to what you have than mucking up the kernel.
There are other areas that you can change that will have a greater impact on performance.

To give you an example...

I have a Browning A-Bolt Rem 7mm sitting in storage since I don't get to hunt much anymore.
I spent a lot of time and money in ammo to get my rifle sighted in to 200 yards and harmonically balanced to a specific brand of ammo.
(Hornady 139gr moly-coated SSTs) I can get sub-MOA shots under good conditions from a bench. I use this ammo primarily for coyotes and feral dogs, but it is also effective on deer.

If I shift to a different ammunition and a heavier bullet, my grouping expands and the point of impact will shift slightly.
(You want a heavier slug for larger game and different bullets have different characteristics upon impact).
We're talking 1-3" depending on bullet shape, weight, quality of cartridge, and weather conditions. (And of course, I'm estimating the distance too which has the largest impact on performance).

The point is that I could spend the time and money in ammunition to sight my rifle for each different load. But why? In the field, I'm still able to put the target down because the rifle performs well enough. There are other factors which will have a greater impact on performance than trying to tune the barrel harmonics. (Oh and if I switch to the muzzle brake, all bets are off and you have to tune it again, although you'll have a good starting point.)

Your trying to mod the kernel is akin to tuning the barrel harmonics. You can do it, but you'd be better off spending your money elsewhere.

-G

> Date: Tue, 2 Dec 2008 19:35:26 -0800
> From: indeep@indeep.com
[quoted text clipped - 43 lines]
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list

Davorin Kremenjas - 02 Dec 2008 17:14 GMT
> (In response to a suggestion from Art):
>
> Well yes, 8k is ***far*** superior to 2k.  But what does this tell me, and
> how can it be more representative of what Informix does ... which is write 2k
> pages, isn't it?

Neil,

I don't think it necessarily does it, at least not if you use KAIO.
This is  output from "onstat -g iob" on my main instance:

IBM Informix Dynamic Server Version 10.00.FC8W2   -- On-Line (Prim) -- Up 5 days 10:07:46 -- 58418704 Kbytes

AIO big buffer usage summary:
class                     reads                                        writes
          pages       ops  pgs/op    holes  hl-ops  hls/op      pages      ops  pgs/op
kio    96831278  46041964    2.10  5094718  851714    5.98    6224761  1581086    3.94

So, almost 4 pages per average IO write request. And this is on an
average day, not peak load.
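
The pgs/op columns are just pages divided by ops, which is easy to check, and multiplying by the page size gives the average request size. A quick Python sketch using the kio line above; the 4 KB page size is an assumption for this AIX instance, and the names are made up.

```python
# Recompute pgs/op from the kio line of the onstat -g iob output above.
PAGE_BYTES = 4 * 1024      # assumption: 4 KB pages on this AIX instance

kio = {
    "read_pages": 96_831_278,
    "read_ops": 46_041_964,
    "write_pages": 6_224_761,
    "write_ops": 1_581_086,
}


def pages_per_op(pages, ops):
    """Average number of pages moved per I/O request."""
    return pages / ops


read_ppo = pages_per_op(kio["read_pages"], kio["read_ops"])
write_ppo = pages_per_op(kio["write_pages"], kio["write_ops"])

print(f"reads : {read_ppo:.2f} pages/op, about {read_ppo * PAGE_BYTES / 1024:.1f} KB per request")
print(f"writes: {write_ppo:.2f} pages/op, about {write_ppo * PAGE_BYTES / 1024:.1f} KB per request")
```

So a single-page dd is, on average, a smaller request than what the engine actually issues with KAIO.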

But back to your original question, some numbers finally :)

We had a very similar situation to yours until a few months ago, even
worse: we flushed approx 5 MB/sec on our SAN. We did all possible tests:
dd (with parallel threads), pfread, table (un)loading, index building
and modeling the main business processes/application requests.

We always got 5 MB/sec max, no matter what. Until we reconfigured the SAN,
pretty much along the lines of Art's suggestions. Now we can flush 80
to 100 MB per second - up to a factor of 20 increase!

Since you're already tracing the checkpoints you have "a proof" that it's
dskflush() that is taking that checkpoint time, and not wait4critex().

What we found out is that it's not really a throughput problem in MB/sec;
it's the number of IO requests outstanding on physical disks in the
SAN. The reason for this was that the LUNs were built using a 4k "segment size"
because we also thought Informix only writes page by page (4k in our
case, AIX). With such small segment/stripe sizes you get a lot of IOs
on a busy day.
We changed that to 8 pages, so 32k (we've gathered AIX block size stats
and decided 32k is a good starting point and trade-off between
optimising for average day and peak time load; also, Informix big
buffers should be 8 pages if that's relevant at all). So we slashed
the number of IO requests by a factor of 8 (theoretically, but close to
the truth during the peak load). And voila, using the same tests as before
we went from 5 MB/s to 100 MB/s.
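
To put a number on the "factor of 8", here is a small Python sketch counting how many requests it takes to move 100 MB at each segment size. It assumes every request is exactly one segment and uses decimal megabytes; the function name is made up.

```python
import math


def io_requests(total_bytes, io_bytes):
    """Number of requests needed to move total_bytes in io_bytes pieces."""
    return math.ceil(total_bytes / io_bytes)


FLUSH_BYTES = 100 * 10**6                        # 100 MB, i.e. one second at 100 MB/s

small = io_requests(FLUSH_BYTES, 4 * 1024)       # 4k segment size
large = io_requests(FLUSH_BYTES, 32 * 1024)      # 32k segment size

print(f"4k segments : {small} requests")
print(f"32k segments: {large} requests")
print(f"ratio       : {small / large:.2f}")
```

Sustaining 100 MB/s at 4k would therefore mean on the order of 24,000 requests per second, against about 3,000 at 32k.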

HTH

Davorin
david@smooth1.co.uk - 02 Dec 2008 18:09 GMT
> (In response to a suggestion from Art):
>
[quoted text clipped - 28 lines]
> cheers
> N

Informix does not always use 2k

a) it can use 16k "big buffers" for i/o
b) it can coalesce writes together at checkpoint time
Obnoxio The Clown - 30 Nov 2008 23:56 GMT
>>> What I would really like to see, if anyone could provide it, is some
>>> checkpoint write rates/dd run times on other Big Tin to assess whether
[quoted text clipped - 7 lines]
> But more generally the message back was that a 2k dd was a simplistic and
> unrealistic test.

Well, they would say that, wouldn't they? However, it's been a pretty
standard test that has served me and others well for two decades now.


Mark Townsend - 01 Dec 2008 06:43 GMT
>> But more generally the message back was that a 2k dd was a simplistic
>> and unrealistic test.
>
> Well, they would say that, wouldn't they? However, it's been a pretty
> standard test that has served me and others well for two decades now.

FYI - over in O-land we used to hear the "unrealistic test" refrain a
lot. So to overcome that we have the Orion tool - see
http://www.oracle.com/technology/software/tech/orion/index.html

Not going to be applicable here, as it models Oracle I/O, but something
similar should be reasonably easy to do for Informix, and is a very
useful tool for customers wishing to make informed choices around storage.
TBP - 01 Dec 2008 10:26 GMT
>>> But more generally the message back was that a 2k dd was a simplistic
>>> and unrealistic test.
[quoted text clipped - 9 lines]
> similar should be reasonably easy to do for Informix, and is a very
> useful tool for customers wishing to make informed choices around storage.

To me, this would seem more applicable than dd tests.

At least it will (hopefully) show up what would be "bad" for Oracle.
Neil Truby - 10 Dec 2008 05:31 GMT
>>> But more generally the message back was that a 2k dd was a simplistic
>>> and unrealistic test.
[quoted text clipped - 9 lines]
> similar should be reasonably easy to do for Informix, and is a very useful
> tool for customers wishing to make informed choices around storage.

Have had a look and think it *might* be very useful in diagnosing disk
problems.  Unfortunately for this case it is not available for HP-UX.

rgds
Neil
Neil Truby - 30 Nov 2008 11:04 GMT
> It's telling you that it takes about 3.2 seconds to write 500,000 Informix
> pages to disk on your SAN Taking half of the dev/null time as reading time
[quoted text clipped - 18 lines]
> for database style IO.

Thanks Art.

Not sure where you get 3.2 seconds.  The figures from dd were 264s to write
500,000 2k pages to disk, and 127s to "write" the same to /dev/null.
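
For anyone wanting to turn those dd timings into a transfer rate, here is a
rough sketch of the arithmetic. It only uses the figures quoted above
(500,000 pages of 2k, 264s to disk, 127s to /dev/null). Treating the
/dev/null run as pure read overhead and subtracting it is an assumption made
for illustration, not something the dd output proves.

```python
# Figures from the dd test in this thread.
PAGE_BYTES = 2 * 1024      # one 2k Informix page
PAGES = 500_000            # pages copied by dd


def mb_per_second(total_bytes, seconds):
    """Throughput in MB/s, with 1 MB taken as 1,000,000 bytes."""
    return total_bytes / seconds / 1_000_000


total_bytes = PAGES * PAGE_BYTES

# Raw rates straight from the two elapsed times.
to_disk = mb_per_second(total_bytes, 264)   # roughly 3.9 MB/s
to_null = mb_per_second(total_bytes, 127)   # roughly 8.1 MB/s

# ASSUMPTION: the /dev/null run is all read-side overhead, so the
# remaining 137s of the disk run is the cost of the writes alone.
net_write = mb_per_second(total_bytes, 264 - 127)   # roughly 7.5 MB/s

print(f"to disk:   {to_disk:.2f} MB/s")
print(f"to null:   {to_null:.2f} MB/s")
print(f"net write: {net_write:.2f} MB/s (under the assumption above)")
```

Either way the single-stream 2k dd comes out at a few MB/s, which is well
below the checkpoint rate being discussed, so it says more about small
synchronous writes than about what the array can sustain.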

Usually we do the SAN administration for customers so I have a view over
all this stuff.  But here, the customer pays EMC to do it, so I'm in the
dark really.  But we are informed that the mirroring is RAID-10.

I take all your points about the layout of the SANs and dbs.  I know all
this stuff but maybe I can't see the wood for the trees.  What I would
really like to see, if anyone could provide it, is some checkpoint write
rates/dd run times on other Big Tin to assess whether the rates we are
seeing here are reasonable.
david@smooth1.co.uk - 30 Nov 2008 05:34 GMT
> <da...@smooth1.co.uk> wrote in message
>
[quoted text clipped - 7 lines]
> throughout *you* get on checkpoints etc, so I can know for sure if this
> customer's is bad.

  No, you do not. I run different applications producing a different
load, on different machines, with potentially different SAN switches,
different arrays, a different disk layout and different performance
requirements.

  What you need to know is what is the maximum checkpoint time your
customer is willing to accept?

> >> It depends on how much to flush and also how many chunks the dirty
>
> pages are spread across and disk performance.
>
> In this test case there is a root, physlog, llog, tempdbs and appdbs, each
> with one chunk.

  How are the writes spread across the dbspaces in the test case?
  Can you create more dbspaces to spread the load across different
chunks, e.g. would separating data and indexes into different
dbspaces help? Or different tables into different dbspaces? One
dbspace for an app is not enough if you want good checkpoint times.

> >> Try running iostat -x 4 or whatever the HP-UX equivalent is to see
>
> disk i/o times per device. Possible HP-UX has something equivalent to
> vxstat that can see i/o performence per volume.
>
> Service time on the "disks" seem brilliant, a millisecond or two at least

  OK, so running top whilst the checkpoint is running, how many CPUs
are busy out of the 18?

  Is something else using cpu time? Is the box swapping/paging
heavily? How much memory is free on the box?

> >> What array is behind the SAN, something like an XP12000 has lots of
>
> stats/settings that can be monitoring/tuned by the HP guys.
>
> An EMC Symmetrix.  The Daddy of them All!

  How many i/os per second are you getting in the last 2-3 seconds of
the checkpoint and across how many different volumes?

> OK, to address Obnoxio's question, a dd of 1g in 2k pages gives:
>
[quoted text clipped - 16 lines]
> So I'm seeking some comparative data from others on checkpoint times, or
> even dd rates to try to verify that there is actually no problem.

EMC used to guarantee a certain number of I/Os per second with Symmetrix.
Do they still do that? What do they say for this setup?

Can they tune the array to be able to go faster given your test case?

Your two dd's above are the same aren't they? Ask EMC why the times
vary so much.
Are other machines accessing the array at the same time?
pokeyman76@yahoo.com - 30 Nov 2008 06:50 GMT
Hate to say the obvious, but... you could try upgrading to IDS 11 and
not worry about checkpoints at all since it has nonblocking checkpoint
technology. It also includes automatically configuring LRU flushing
and lots of nice performance features.

Just a suggestion :)

On Nov 29, 9:34 pm, "da...@smooth1.co.uk" <da...@smooth1.co.uk> wrote:
> On 29 Nov, 20:38, "Neil Truby" <neil.tr...@ardenta.com> wrote:> <da...@smooth1.co.uk> wrote in message
>
[quoted text clipped - 80 lines]
> vary so much.
> Are other machines accessing the array at the same time?
Neil Truby - 30 Nov 2008 11:11 GMT
>> <pokeyman76@yahoo.com> wrote in message
>> news:b1d117a3-e7ec-4443-944e-c19b8cf758be@z1g2000yqn.googlegroups.com...
> Hate to say the obvious, but... you could try upgrading to IDS 11 and
> not worry about checkpoints at all since it has nonblocking checkpoint
> technology. It also includes automatically configuring LRU flushing
> and lots of nice performance features.

It isn't an option here as the underlying application is not certified for
11.
Neil Truby - 30 Nov 2008 11:10 GMT
>> <da...@smooth1.co.uk> wrote in message
>>
[quoted text clipped - 16 lines]
>   What you need to know is what is the maximum checkpoint time your
> customers is willing to accept?

Well, zero would probably be acceptable ;-)
And of course I can tune Informix to do virtually everything as LRU writes
to achieve that.

What I would like to assess is whether, by doing so, I'm addressing the
symptom rather than the cause.  In this respect then, even though you are
using different hardware, SAN and application profile,  if you or anyone
else could provide some checkpoint write rates/dd run times run on other Big
Tin, it really *would* help me to assess whether the rates we are seeing
here are reasonable.
Richard Kofler - 30 Nov 2008 17:05 GMT
Neil Truby wrote:
>>> <da...@smooth1.co.uk> wrote in message
>>>
[quoted text clipped - 28 lines]
> on other Big Tin, it really *would* help me to assess whether the rates
> we are seeing here are reasonable.

Hi Neil,

here are some outdated figures from a former customer of mine, now running
Oracle. Version was 9.40. There were 32 CPUs and a multipathed 1Gbit,
full duplex FC network (bonding 2 paths actually).
2 I/O Subsystems were used exclusively by the database server.
Each I/O subsystem was quad connected (4x 1Gbit FC). The host used 8
FC boards.
We saw 24 flushers being active almost to the end of checkpoint.
Thruput was around 185 MB/sec, a bit more than 23K IOPS doing 8KB
writes we were monitoring as normal behavior, evenly spread over both
I/O subsystems.
To reach this we had to ensure:
- that long blocks are written (8KB per start-I/O)
- spread the load evenly on both I/O-subsystems
- spread the load over 60 LUNs as evenly as possible
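
As a rough cross-check of the figures above, throughput is just IOPS times
the write size. The sketch below uses only the numbers quoted in this post
(about 23K IOPS, 8KB writes, around 185 MB/sec, and the 40% case mentioned
further down). Taking 8KB as 8192 bytes and 1 MB as 1,000,000 bytes is an
assumption, so expect the result to land near, not exactly on, 185.

```python
def mb_per_second(iops, block_bytes):
    """Throughput in MB/s (1 MB = 1,000,000 bytes) for a given IOPS rate."""
    return iops * block_bytes / 1_000_000


# About 23K IOPS of 8KB writes, as observed during checkpoints.
full_rate = mb_per_second(23_000, 8 * 1024)   # roughly 188 MB/s

# The "40% of this thruput" situation, based on the quoted 185 MB/sec.
reduced_rate = 0.40 * 185                     # roughly 74 MB/s

print(f"23K IOPS x 8KB: {full_rate:.1f} MB/s")
print(f"40% of 185:     {reduced_rate:.0f} MB/s")
```

So the IOPS and MB/sec figures are consistent with each other, and even the
"everything is OK" state was moving something like 74 MB/sec.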

This was neither easy nor a goal to be reached very fast.

When we had like 40% of this thruput everyone and her sister
tried to convince me, that everything is OK......
But in fact - it was not - that made me say since then:
Best practice does not always mean 'good' ;)

And another well hated lil signature of mine:
Why spend +50% money to get 50% less performance out of the boxes?

The figures you present from dd make me believe that if you are on
a DMX3, your test was heavily hampered by upstaging, i.e. had to
wait for physical disks, because the write cache in the EMC was
full at all times. This happens when large block sequential writes
are there at the same time as your short block writes (8KB during
checkpoint).
On the other hand I do know from other customers of mine, that on
a DMX3 a write thruput of up to 300MB/sec is possible, though not too
long (less than 1 minute) because if the write cache in the subsystem
is full .... (see above)
One more thing to pay attention to: Is the I/O- subsystem doing
replication (like SRDF)?. If so, all depends on the interlink
speed and latency and a typical value is that your writes are
slower by 30%, even on 4Gbit FC attachment system when the interlink
is 1 GBIT only.

HTH
dic_k

Signature

Richard Kofler
SOLID STATE EDV
Dienstleistungen GmbH
Vienna/Austria/Europe

Neil Truby - 30 Nov 2008 19:28 GMT
> One more thing to pay attention to: Is the I/O- subsystem doing
> replication (like SRDF)?. If so, all depends on the interlink
> speed and latency and a typical value is that your writes are
> slower by 30%, even on 4Gbit FC attachment system when the interlink
> is 1 GBIT only.

Thanks for that.
There is no SRDF at present, but the intention is to start SRDF sync.
replication, which will make the disk times worse of course.
Keith Simmons - 01 Dec 2008 12:42 GMT
2008/11/28 Neil Truby <neil.truby@ardenta.com>:
> IDS 10.0FC8W2 on HP-UX 11.31
>
[quoted text clipped - 20 lines]
> Informix-list@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list

Neil

P570, 4 x twin processors, 16 GB memory total (not huge, but might help)
500 GB fibre attached, sole use disk.
600,000 buffers (= 2.4 GB), LRU max 15%, min 8%. Snapshot checkpoint
around 7.7% dirty (5 mins from last) (= 185 MB) with Fuzzy Checkpoints
(are you fuzzy?) < 1 second.
Looking back over November my average checkpoints have been around 1 second.
Any help?

Keith
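
To put these numbers next to the ones in the opening post, here is a small
sketch of the dirty-page arithmetic. The 4k page size for this system is
inferred from 600,000 buffers being about 2.4 GB, and the 2k page size,
10,000,000 buffers and 0.5% lru_max_dirty come from the opening post; the
function name is made up for illustration. With fuzzy checkpoints not every
dirty page is necessarily written at checkpoint time, so the first figure is
best read as an upper bound.

```python
def dirty_mb(buffers, dirty_fraction, page_bytes):
    """Approximate dirty data in MB (1 MB = 1,000,000 bytes)."""
    return buffers * dirty_fraction * page_bytes / 1_000_000


# This system: 600,000 buffers, 7.7% dirty, page size inferred as 4k.
keith_mb = dirty_mb(600_000, 0.077, 4 * 1024)      # roughly 189 MB

# Opening post: 10,000,000 buffers, lru_max_dirty 0.5%, 2k pages.
neil_mb = dirty_mb(10_000_000, 0.005, 2 * 1024)    # roughly 102 MB

# Implied checkpoint write rates for the opening post (5 to 8 seconds).
neil_slow = neil_mb / 8    # roughly 12.8 MB/s
neil_fast = neil_mb / 5    # roughly 20.5 MB/s

print(f"dirty here:         {keith_mb:.0f} MB in about 1 second")
print(f"dirty in first post: {neil_mb:.0f} MB in 5-8 seconds")
print(f"implied rate there:  {neil_slow:.1f} to {neil_fast:.1f} MB/s")
```

On those assumptions the two systems differ by roughly an order of magnitude
in checkpoint write rate, which is the comparison being asked for.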
scottishpoet - 02 Dec 2008 09:51 GMT
> IDS 10.0FC8W2 on HP-UX 11.31
>
[quoted text clipped - 15 lines]
> thx
> Neil

If large amounts of time are spent waiting on disk, how well does dd of
large amounts of data to the disk perform without informix, is that
acceptable?
scottishpoet - 02 Dec 2008 09:55 GMT
> > IDS 10.0FC8W2 on HP-UX 11.31
>
[quoted text clipped - 21 lines]
>
> - Show quoted text -

Oops, should have read the whole thread! For some reason I thought there
had only been 1 response.
 

©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.