Database Forum / DB2 Topics / January 2005
Is there a rule for determining number of Page Cleaners?
|
|
Thread rating:  |
Jean-David Beyer - 17 Jan 2005 20:49 GMT I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts of a database on each drive. The main index is on one SCSI drive all to itself. The main data are on the other three SCSI drives. Small relations are on one EIDE drive, and the logfiles are on the other EIDE drive. When running the task, below, the rest of the machine is not doing much.
I do not remember where I saw it, but somewhere I got the idea that the number of page cleaners might well be one more than the number of hard drives, so I setup my database with 7 page cleaners (and 7 page fetchers).
Now a big (for me) task gets into a state where, on Linux, all seven page cleaners are in the D state. It is all working, but slow. vmstat reveals that the IO is going no where near the maximum rate that the IO channels would support (about 6 Megabytes per second, even though the IO system can easily do 36 Megabytes per second (e.g., when doing a database reorg); the SCSI drives are 10,000rpm Ultra/320 SCSI drives and the SCSI controller is on a PCI-X bus all its own.). The problem pretty clearly is the time the drives spend seeking. Now perhaps I could increase the size of the buffer pools, move stuff around on the drives, etc. In fact, the big hog is the logging.
But what I would like to know is: is there any point in using more page cleaners than I do now? Is there a better rule than the one I indicated above?
 Signature .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jersey http://counter.li.org ^^-^^ 15:40:01 up 17 days, 5:00, 6 users, load average: 4.10, 4.16, 4.12
Knut Stolze - 17 Jan 2005 21:43 GMT > I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts > of a database on each drive. The main index is on one SCSI drive all to > itself. The main data are on the other three SCSI drives. Small relations > are on one EIDE drive, and the logfiles are on the other EIDE drive. When > running the task, below, the rest of the machine is not doing much. [...]
> In fact, the big hog is the > logging. You should place the log on the fastest drive you have.
 Signature Knut Stolze Information Integration IBM Germany / University of Jena
Jean-David Beyer - 18 Jan 2005 02:30 GMT >>I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts >>of a database on each drive. The main index is on one SCSI drive all to [quoted text clipped - 8 lines] > > You should place the log on the fastest drive you have. I am sure you are right. But that does not answer the question on the proper ratio of page cleaners to drive spindles.
 Signature .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jersey http://counter.li.org ^^-^^ 21:25:00 up 17 days, 10:45, 3 users, load average: 4.33, 4.20, 4.11
Serge Rielau - 18 Jan 2005 02:35 GMT >>> I have six hard drives (4 SCSI and 2 EIDE) on my main machine with parts >>> of a database on each drive. The main index is on one SCSI drive all to [quoted text clipped - 13 lines] > I am sure you are right. But that does not answer the question on the > proper ratio of page cleaners to drive spindles. Can't answer that either (maybe Matt can jump in). But if you say the log is the issue then the page cleaners won't help I guess. BTW, would the configuration assistant be of any help? Cheers Serge
Jean-David Beyer - 18 Jan 2005 03:07 GMT >>>> I have six hard drives (4 SCSI and 2 EIDE) on my main machine with >>>> parts [quoted text clipped - 17 lines] > Can't answer that either (maybe Matt can jump in). But if you say the > log is the issue then the page cleaners won't help I guess. I guess so, too, but if all 7 are in D-state, they are a bottleneck. Now if they are all in D-state because they are waiting for disk seeks, would throughput improve with more of them?
> BTW, would the configuration assistant be of any help? I do not know what the configuration assistant is. I tried to run it, and it wants:
$ db2ca sh: line 1: /opt/IBMJava2-131/jre/bin/java: No such file or directory DB2JAVIT : RC = 127
This is true enough. I would have thought a Java around a year old should be old enough to use with the current product. Guess not.
trillian:jdbeyer[~]$ ls -l /opt/ total 48 drwxr-xr-x 3 root root 4096 Jan 1 15:06 IBM drwxr-xr-x 8 root root 4096 Mar 1 2004 IBMJava2-141 <---<<< drwxr-xr-x 2 root root 4096 Jan 1 13:30 IBM_PDF_DOCS
On my old machine, I found the Java stuff ran so slowly that it is useless, so I either use C++ programs or the command line interpreter. That was on a machine with 512Meg RAM and two Ultra/2 10,000rpm SCSI hard drives and two 550MHz Pentium IIIs.
This machine has 4GBytes RAM, 2 Hyperthreaded 3.06GHz Xeons with 1 Megabyte L3 cache, four Ultra/320 10,000rpm SCSI hard drives. But I haven't tried that stuff. I find the documentation almost impossible to use with DB2 V8.1.?. Before I used DB2 V6.1.? and the documentation was pretty clear.
 Signature .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jersey http://counter.li.org ^^-^^ 21:50:00 up 17 days, 11:10, 3 users, load average: 4.31, 4.12, 4.09
Matt Emmerton - 18 Jan 2005 05:40 GMT > >>>> I have six hard drives (4 SCSI and 2 EIDE) on my main machine with > >>>> parts [quoted text clipped - 16 lines] > > Can't answer that either (maybe Matt can jump in). But if you say the > > log is the issue then the page cleaners won't help I guess. <jumping in ; re-stating system details for context>
I didn't see what DB2 version you are using, so I'm going to give a general answer with a bit of historical context, assuming v7 and v8 UDB.
> I do not remember where I saw it, but somewhere I got the idea that > the number of page cleaners might well be one more than the number > of hard drives, so I setup my database with 7 page cleaners (and 7 > page fetchers). IIRC, this was the recommendation provided in the v7 performance tuning guide.
Prior to v8, all page cleaner I/O was synchronous -- meaning that while an I/O (write) was in progress, no other cleaning-related activity (aging pages, figuring out what page to write next, etc) was being done by that cleaner process. Thus, the recommendation was to have one cleaner per physical disk (spindle), in order to keep all the drives busy with cleaning activity.
In v8, we starting using AIO (asynchronous I/O), which allows each cleaner process to fire off a batch of I/Os to the OS, and then go off and do housekeeping work (aging pages and determining the next batch of pages to clean) while the current batch of I/Os are in progress. This is a much more efficient use of CPU (less context-switching and pre-emption), and scales much better (especially on large systems with many arrays of disks.)
In v8, the rule of thumb is to use one cleaner per CPU.
> Now a big (for me) task gets into a state where, on Linux, all > seven page cleaners are in the D state. It is all working, but slow. [quoted text clipped - 7 lines] > move stuff around on the drives, etc. In fact, the big hog is the > logging. Note that the max throughput of your devices is based on sequential access. Page cleaner access is usually quite random, due to the way pages are hashed into slots in the bufferpool and the aging characteristics of pages based on the workload. I'd say a 16% throughput rate is "pretty good" for a random write workload.
[ Aside: To get a rough estimate of what throughput to expect on a random workload, take the track-to-track seek time, and divide it by the average seek time. This represents the access time ratio between a sequential workload (which will seek to an adjacent track) and a random workload (which will seek halfway across the disk, on average.) Apply this ratio to your rated [sequential] throughput and you will have a good estimate of what throughput to expect for a random workload. Scale as neccessary based on read/write mix and sequential/random properties thereof. This usually works out to be somewhere between 1/10 (10%) and 1/7 (14%). ]
Also, many of the memory structures (lists) used internally by the bufferpool are based on the number of cleaners. One of the things we've seen migrating workloads from v7 to v8 is that reducing the number of cleaners can actually improve performance. Part of this is due to reduced CPU contention (having 7 cleaners on a 4-way in v8 means that they will be competing for CPU; whereas using 4 will promote fair sharing), and the async nature of the cleaners in v8 allows the OS to make better decisions with scheduling I/Os, as each cleaner will batch off N I/Os at the same time.
Hopefully this answers the posters questions.
-- Matt Emmerton IBM Toronto Software Lab
Jean-David Beyer - 18 Jan 2005 14:29 GMT [snip]
> <jumping in ; re-stating system details for context> > > I didn't see what DB2 version you are using, so I'm going to give a > general answer with a bit of historical context, assuming v7 and v8 > UDB. I think I said, but if not, I am running DB2 UDB V8.1.7 at the moment on a machine with two 3.06 GHz hyperthreaded Intel Xeon processors -- the ones with 1 MByte L3 caches (Linux thinks there are four processors). I have 4 GBytes RAM. In the database of interest, most of the relations and their indices are DMS.
>> I do not remember where I saw it, but somewhere I got the idea that >> the number of page cleaners might well be one more than the number of [quoted text clipped - 20 lines] > > In v8, the rule of thumb is to use one cleaner per CPU. So I might get better performance by reducing the number of page cleaners (and probably page fetchers, though they are not a bottleneck) from the 7 I have now to 4 each, right?
>> Now a big (for me) task gets into a state where, on Linux, all seven >> page cleaners are in the D state. It is all working, but slow. vmstat [quoted text clipped - 23 lines] > thereof. This usually works out to be somewhere between 1/10 (10%) and > 1/7 (14%). ] According to Maxtor for the SCSI drives, seek times are
track-to-track: 0.3ms average seek: 4.5ms full stroke: 11. ms
So that comes to 0.3/4.5 = 0.0667 or about 7%.
Now the disk controllers can do 320MBytes/sec advertizing maximum, though I to not recall seeing much over 55MBytes/sec in actual use over an extended (> 1 second) period of time. When running a REORGANIZE DATABASE on a DB2 database, I see 35 MBytes/second for extended (many seconds) periods. But when just populating a database with INSERTs, it runs 6 to 12 MBytes/sec.
The low number is when setiathome is running at nice level 19, the high number, when setiathome is not running. I attribute this to the fact that DB2 must wait for IOs in this case, and while it is waiting, setiathome dirtys up the caches in the processors so they run at RAM speed instead of cache speed.
So without setiathome, according to this calculation, things should run at about 21 MBytes/sec, not 12. That is less than a 50% error, so perhaps that is as good as rules of thumb are likely to be.
> Also, many of the memory structures (lists) used internally by the > bufferpool are based on the number of cleaners. One of the things [quoted text clipped - 7 lines] > > Hopefully this answers the posters questions. That answers a lot of them. Thank you.
It seems, therefore, that I should reduce the number of page cleaners (and probably page fetchers) from 7 to 4. I should also (though that would be more difficult, take the part of the database on one of the SCSI drives and put it onto an EIDE drive, and move the log files to a SCSI drive. That is more difficult and I will probably do that later.
 Signature .~. Jean-David Beyer Registered Linux User 85642. /V\ PGP-Key: 9A2FC99A Registered Machine 241939. /( )\ Shrewsbury, New Jersey http://counter.li.org ^^-^^ 09:05:00 up 17 days, 22:25, 3 users, load average: 4.21, 4.21, 4.19
Matt Emmerton - 19 Jan 2005 00:58 GMT > [snip] > > <jumping in ; re-stating system details for context> [quoted text clipped - 37 lines] > (and probably page fetchers, though they are not a bottleneck) from the 7 > I have now to 4 each, right? Correct. Anywhere between 2 and 4 cleaners (accounting for HTT) would be a reasonable starting point on your system.
> >> Now a big (for me) task gets into a state where, on Linux, all seven > >> page cleaners are in the D state. It is all working, but slow. vmstat [quoted text clipped - 48 lines] > about 21 MBytes/sec, not 12. That is less than a 50% error, so perhaps > that is as good as rules of thumb are likely to be. You also need to factor in the overhead of whatever read activity you are doing as well -- so 12MB/sec may not be all that unreasonable.
The reason you see 35MB/sec while doing a reorganize is because DB2 does a lot of sequential reads (which are quick) and the data that is written out by the page cleaners is a lot less random than during a "normal" workload.
> > Also, many of the memory structures (lists) used internally by the > > bufferpool are based on the number of cleaners. One of the things [quoted text clipped - 15 lines] > and put it onto an EIDE drive, and move the log files to a SCSI drive. > That is more difficult and I will probably do that later. Agreed.
-- Matt Emmerton IBM Toronto Software Lab
|
|
|