Database Forum / DB2 Topics / April 2007
IBM 595 dual core CPU DB partitioning question
|
|
Thread rating:  |
Woody Ling - 30 Mar 2007 15:32 GMT I am starting to config a 64 bits DB2 in IBM 595 AIX box with 2 dual core CPU and I would like to assigned one 'processor' for one db partition. Should I config it as a 4 nodes or 2 nodes instances? How about other setting such as IO cleaner, Default degree etc?
james_dey@hotmail.com - 30 Mar 2007 16:12 GMT > I am starting to config a 64 bits DB2 in IBM 595 AIX box with 2 dual > core CPU and I would like to assigned one 'processor' for one db > partition. Should I config it as a 4 nodes or 2 nodes instances? How > about other setting such as IO cleaner, Default degree etc? There's no rules of thumb. It depends on expected volumetrics and the nature of the database that you're working with.
There's a "configuration advisor" in Control Center, if you get desperate
Liam Finnie - 30 Mar 2007 17:56 GMT On Mar 30, 11:12 am, james_...@hotmail.com wrote:
> > I am starting to config a 64 bits DB2 in IBM 595 AIX box with 2 dual > > core CPU and I would like to assigned one 'processor' for one db [quoted text clipped - 6 lines] > There's a "configuration advisor" in Control Center, if you get > desperate Hi Woody,
Any particular reason why you want a partitioned instance here? Each database partition will use up some fixed resources (CPU and memory) - the more partitions you have on a machine, the less resources are available for your actual workload. As far as machine sizes go, I probably wouldn't recommend using multiple partitions for a 4-CPU machine, unless you're planning to scale out and add more machines in the future. A single partition can fully utilize all the CPUs on your machine.
Cheers, Liam.
Mark A - 30 Mar 2007 18:20 GMT > Hi Woody, > [quoted text clipped - 9 lines] > Cheers, > Liam. Partitioning a table enables query parallelism. If this is a data warehouse application with a lot of table scans, then it will help performance. If it is an OLTP application or access is usually returning a small answer set via index access, then it is probably not a good idea.
The number of CPU's is not a determinant as to whether to partition, but you generally should have at least one CPU per partition (dual core can count as 2 CPU's). If you have a data warehouse application using a lot of table scans and the server has one dual core CPU, partitioning the table is probably a good idea.
Woody Ling - 30 Mar 2007 19:50 GMT > > Hi Woody, > [quoted text clipped - 20 lines] > scans and the server has one dual core CPU, partitioning the table is > probably a good idea. The idea of 1 CPU for 1 partition can be shown by the following example:
Suppose we have a complex query to select 10000 rows that use total 580s to return result (with 315s CPU time and 332s sort time). When we turn on the intra-parallel option and use 2 CPU to process the query, the total time becomes 533s (with 508s CPU times and 493s sort time [sum of 2 cpu] ). Although the 2 CPU are fully utilize in intra- parallel partition and the elapsed time is shorter, the efficiency is not as good as 1 CPU for 1 partition.
Assume there is 30% overhead for using multiple partitions, the elapsed time become ( 580/2 ) * 1.3 = 377sec because each partition handles 5000 rows only.
Do you agree with this example?
Mark A - 30 Mar 2007 22:14 GMT > The idea of 1 CPU for 1 partition can be shown by the following > example: [quoted text clipped - 12 lines] > > Do you agree with this example? I don't understanding what you are saying.
Just because you turn on intra-partition parallelism (not needing DPF) does not mean that the query is actually running in parallel mode. And intra-partition parallelism is significantly different than inter-partition parallelism (DPF) where a single table is physically partitioned (within a single server or across servers). Using DPF to partition and with a query that does a table scan, the total elapsed time will be reasonably close to 50% of a non-partitioned table (unless it is a very fast query).
With DPF, parallel overhead is not anywhere near 30%, although the overhead can be noticed if all the partitions are on the same physical node and you don't have enough bandwidth on your disk sub-system to run both partitions at full speed on a table scan. Preferably, each partition should have its own disk controller, and each partition must have its own disks to scale in a linear (or near linear) fashion on a large table scan.
Woody Ling - 31 Mar 2007 05:51 GMT > > The idea of 1 CPU for 1 partition can be shown by the following > > example: [quoted text clipped - 31 lines] > > - Show quoted text - Yes, I agree with you so that the elasped time for handling the complex query is:
a. 1 CPU without DPF and intra-parallel OFF is 580s (with 315s CPU time and 332s sort time). {figure from IBM} b. 2 CPU without DPF and intra-parallel ON is 533s (with 508s CPU times and 493s sort time [sum of 2 cpu] ). {figure from IBM} c. 2 CPU with DPF and intra-parallel OFF is 377s (each partition handles 50% workload and assume 30% overhead for cross join, combining result etc).
30% overhead for DPF is an assumption. I just want to show you that even there is overhead for using DPF, it is still much better than using intra-parallel with same number of CPU.
Think about if we have 4 CPU, which configuration is better? a. 2 CPU for 1 partition and create total 2 db partitions with intra- parallel ON b. 1 CPU for 1 partition and create total 4 db partitions with intra- paralle OFF
Mark A - 31 Mar 2007 14:28 GMT > Yes, I agree with you so that the elasped time for handling the > complex query is: [quoted text clipped - 16 lines] > b. 1 CPU for 1 partition and create total 4 db partitions with intra- > paralle OFF You are still confused. Let me repeat:
Just because you turn on intra-partition parallelism (without DPF) does not mean that the query is actually running in parallel mode.
Woody Ling - 31 Mar 2007 18:45 GMT > > Yes, I agree with you so that the elasped time for handling the > > complex query is: [quoted text clipped - 23 lines] > > - Show quoted text - I totally agree with you again.
>From the example, it already showed that the overhead of using intra- partition parallelism is very large. If intra-parallel makes the query running in parallel mode, the elasped time should be much shorter (may be 1/2 of original). In fact, it is not (533s vs 580s). So if I have 2 CPU, I will create 2 db partitions and assign 1 CPU for each partition and turn off intra-parallel to fully utilize all CPU resource.
Please correct me if I have a wrong concept.
Ian - 02 Apr 2007 08:00 GMT > Yes, I agree with you so that the elasped time for handling the > complex query is: [quoted text clipped - 10 lines] > even there is overhead for using DPF, it is still much better than > using intra-parallel with same number of CPU. Making an assumption of 30% overhead based on a single result is more than a little misleading.
> Think about if we have 4 CPU, which configuration is better? > a. 2 CPU for 1 partition and create total 2 db partitions with intra- > parallel ON > b. 1 CPU for 1 partition and create total 4 db partitions with intra- > paralle OFF The decision of whether to use 1 CPU / partition or 2 CPUs/partition is totally dependent on your workload. Certain workloads will benefit from having more CPU resources available per partition. Many don't, and the benefit of having more (smaller) partitions is better. Best practices today start with 1 CPU / partition.
Knut Stolze - 02 Apr 2007 10:57 GMT >> Think about if we have 4 CPU, which configuration is better? >> a. 2 CPU for 1 partition and create total 2 db partitions with intra- [quoted text clipped - 7 lines] > benefit of having more (smaller) partitions is better. Best practices > today start with 1 CPU / partition. Actually, there is a misconception in that one could assign a CPU to a specific logical partition in DPF. Instead, all logical partitions on the same machine share all CPUs and compete for CPU resources.
 Signature Knut Stolze DB2 z/OS Utilities Development IBM Germany
Liam Finnie - 02 Apr 2007 13:19 GMT > >> Think about if we have 4 CPU, which configuration is better? > >> a. 2 CPU for 1 partition and create total 2 db partitions with intra- [quoted text clipped - 16 lines] > DB2 z/OS Utilities Development > IBM Germany If you're on DB2 9, you can actually "bind" a partition to a resource set, by specifying the resource set name in your db2nodes.cfg file. On AIX, a resource set is composed of one or more processors, along with some amount of RAM. You can configure your own resource sets as needed, or you can use any predefined resource sets. This is particularly useful for NUMA machines (not sure if 595's are NUMA-like or not offhand), but you should also be able to use this on non-NUMA machines to enable CPU binding. This will probably require a bit of fiddling around to get right though - I think there are some kernel tuneables that need to be configured, and some privileges you need to grant the instance owning ID - search for "DB2 node configuration file" in the online books for more details.
Cheers, Liam.
Woody Ling - 02 Apr 2007 18:47 GMT > > >> Think about if we have 4 CPU, which configuration is better? > > >> a. 2 CPU for 1 partition and create total 2 db partitions with intra- [quoted text clipped - 34 lines] > > - Show quoted text - Thanks. I found the manual of db2 8.2 about how to define resource set on AIX. URL here:
http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.db2. udb.doc/start/r0006351.htm
But I have no idea if the agents and subagents processes are also "bind" by this setting too. May be they are shared by all CPU. On the other hand, I am afraid I cannot assign one logical CPU of dual core processor to one node by using resource set which is designed for physical resource only.
I also agree that we cannot make conclusion base on one simple example. Although I can make use of DPF, I should also increase memory and I/O to keep linear scale.
However, I have another question. Should I turn on intra-parallel if I assign 2 CPU for 1 node?
Liam Finnie - 03 Apr 2007 13:55 GMT > > > >> Think about if we have 4 CPU, which configuration is better? > > > >> a. 2 CPU for 1 partition and create total 2 db partitions with intra- [quoted text clipped - 52 lines] > However, I have another question. Should I turn on intra-parallel if I > assign 2 CPU for 1 node? Hi Woody,
If you have 2 dual-core chips, then you have 4 physical CPUs, and you should be able to assign single CPUs to resource sets. If these 4 CPUs have SMT enabled (8 CPU "threads"), then you can't assign each CPU thread independently.
The way the binding works is that the initial system controller process is bound to the resource set, and each process that is forked from that system controller will inherit the same binding, so will use the same CPU(s).
I'll leave that intra-parallel question for someone else :-) My naive approach would be that if you have multiple concurrent applications, it's not as important to enable intra-parallel, than if you have only a single application running at a time.
Cheers, Liam.
Ian - 06 Apr 2007 17:12 GMT > However, I have another question. Should I turn on intra-parallel if I > assign 2 CPU for 1 node? There is no rule for this decision. It is similar to the decision on whether you go with a ratio of 1 or 2 CPUs per database partition. And as I said earlier, this is very dependent on your workload.
Fortunately, turning INTRA_PARALLEL on or off is a lot easier than changing the number of partitions in your database, so it's easier to evaluate. ;-)
Ian - 06 Apr 2007 17:07 GMT > Actually, there is a misconception in that one could assign a CPU to a > specific logical partition in DPF. Instead, all logical partitions on the > same machine share all CPUs and compete for CPU resources. Sorry if I wasn't clear. I was just talking about the ratio of CPU / database partitions, not trying to imply that a CPU is dedicated.
|
|
|