Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / September 2008

Tip: Looking for answers? Try searching our database.

Include all columns in an unique index

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Henry J. - 28 Sep 2008 03:54 GMT
I have a big table (20M rows) whose primary key is composed of 2
columns.  The other 10 columns are all double values.  The queries on
the table are mostly aggregating on the value columns.

In order to improve performance, is it a good idea to add all the 10
value columns to the "include" part of the primary key?

I guess the disk usage of the table will double as the value columns
are duplicated in the index.  Is this right?  Would the select
performance be improved significantly?

Thanks!
Mark A - 28 Sep 2008 07:53 GMT
>I have a big table (20M rows) whose primary key is composed of 2
> columns.  The other 10 columns are all double values.  The queries on
[quoted text clipped - 8 lines]
>
> Thanks!

1. What is the ratio of selects, versus inserts/updates/deletes.
2. How may rows are typically accessed in each select statement (assuming we
are talking about the selects where you want to see all the columns).
3. How large is your bufferpool(s)
Henry J. - 29 Sep 2008 03:52 GMT
> >I have a big table (20M rows) whose primary key is composed of 2
> > columns.  The other 10 columns are all double values.  The queries on
[quoted text clipped - 13 lines]
> are talking about the selects where you want to see all the columns).
> 3. How large is your bufferpool(s)

During each batch run, the table is truncated and then populated, and
finally queried (by aggregation type of queries).  There is no update
or delete.

The queries can be queried by several jobs each accessing different
number of rows (varied from 10% to 80%).

Not sure about the bufferpool.

To benefit from putting all columns into "include", what should I ask
our DBAs to do?  Increate table space? Bufferpool?  Please advise.

Thanks!
Mark A - 29 Sep 2008 03:58 GMT
> To benefit from putting all columns into "include", what should I ask
> our DBAs to do?  Increate table space? Bufferpool?  Please advise.
>
>Thanks!

It probably will help your select performance, but it will take up more disk
space (which I assume you are OK with).
Henry J. - 29 Sep 2008 04:10 GMT
> > To benefit from putting all columns into "include", what should I ask
> > our DBAs to do?  Increate table space? Bufferpool?  Please advise.
[quoted text clipped - 3 lines]
> It probably will help your select performance, but it will take up more disk
> space (which I assume you are OK with).

Thanks.  I guess the insertion will be slower, right?  Let me test it
out.
Henry J. - 29 Sep 2008 04:12 GMT
> > To benefit from putting all columns into "include", what should I ask
> > our DBAs to do?  Increate table space? Bufferpool?  Please advise.
[quoted text clipped - 3 lines]
> It probably will help your select performance, but it will take up more disk
> space (which I assume you are OK with).

Thanks.  I guess insertion will be slower, right?
w.l.fischer@googlemail.com - 28 Sep 2008 19:16 GMT
> I have a big table (20M rows) whose primary key is composed of 2
> columns.  The other 10 columns are all double values.  The queries on
> the table are mostly aggregating on the value columns.
>
> In order to improve performance, is it a good idea to add all the 10
> value columns to the "include" part of the primary key?

I tried it on a ~100M rows table and it significantly improved
performance (factor 3...10, don't remember)

> I guess the disk usage of the table will double as the value columns
> are duplicated in the index.  Is this right?  Would the select
> performance be improved significantly?

Disk usage will increase significantly, obviously.
Henry J. - 29 Sep 2008 04:10 GMT
On Sep 28, 2:16 pm, w.l.fisc...@googlemail.com wrote:

> > I have a big table (20M rows) whose primary key is composed of 2
> > columns.  The other 10 columns are all double values.  The queries on
[quoted text clipped - 11 lines]
>
> Disk usage will increase significantly, obviously.

Thanks.
DA Morgan - 29 Sep 2008 07:24 GMT
>> I have a big table (20M rows) whose primary key is composed of 2
>> columns.  The other 10 columns are all double values.  The queries on
[quoted text clipped - 11 lines]
>
> Disk usage will increase significantly, obviously.

Doesn't DB2 have a table or index type comparable to Oracle's
Index Organized Table or SQL Server's Cluster Index where that
would not be the case?
Signature

Daniel A. Morgan
University of Washington
damorgan@x.washington.edu (replace x with u to respond)

Mark A - 29 Sep 2008 08:13 GMT
> Doesn't DB2 have a table or index type comparable to Oracle's
> Index Organized Table or SQL Server's Cluster Index where that
> would not be the case?

DB2 has index clustering which can specify the order of the table and
multi-dimensional clustering (MDC), but that affects the physical ordering
of the data, and not whether it exists in the index and the table. In DB2,
the table always has all the columns even if all the columns are also in one
of the indexes, and even if clustering is used. In DB2 clustering (ordering
of the table data) is usually implemented of a subset of the columns, as
defined in one of the indexes, and not on the entire the table (usually).
And in DB2 clustering is not exact, so insert performance is much better
than index ordered tables because there are no page splits and no
rearranging of pages (when a clustered index is defined, DB2 looks for the
best page to place a data row, not necessary the exact sequence in the data
page, and if there is no room on the correct page, then a nearby page, or
eventually any page is chosen for the insert--but MDC is different and bit
more complicated).

By "including" the extra non-unique columns in the unique index (and not
affecting the original uniqueness of the index based on the PK) he will get
index-only access on his selects. The table will not need to be accessed for
selects.

In this particular situation, it will require about 1.6 GB of extra disk
space.
DA Morgan - 29 Sep 2008 13:30 GMT
> In this particular situation, it will require about 1.6 GB of extra disk
> space.

Thank you.

Given the cost of disk these days not something I would expect anyone
to get excited about.
Signature

Daniel A. Morgan
University of Washington
damorgan@x.washington.edu (replace x with u to respond)

Henry J. - 29 Sep 2008 23:35 GMT
> > In this particular situation, it will require about 1.6 GB of extra disk
> > space.
[quoted text clipped - 7 lines]
> University of Washington
> damor...@x.washington.edu (replace x with u to respond)

So the cost of disk is increasing these days?  I thought it should
drop over time.
Mark A - 30 Sep 2008 00:02 GMT
> So the cost of disk is increasing these days?  I thought it should
> drop over time.

He said it is not enough space to get excited about (not enough to worry
about).
Serge Rielau - 30 Sep 2008 11:51 GMT
> Doesn't DB2 have a table or index type comparable to Oracle's
> Index Organized Table or SQL Server's Cluster Index where that
> would not be the case?
I do not know SQL Server's Cluster Index, can't comment there.
DB2 does not have an "index only table".
There are two table types that have "some" overlap:
* Range Clustered Tables (RCT). These are essentially arrays on disk.
  Typically used for for static look up tables
* Multi dimensional clustering. (MDC)
  Organizing the data in an n-dimensional cube.
  The individual blcoks are indexed instead of the rows and located by
  intersecting (or or-ing) the dimensions. As a result block indexes are
  tiny with one entry potentially covering thousands of rows.

So in a way both these approaches are "index-less" tables. The end
result is the same: Fast access without duplication of data.

Cheers
Serge

Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.