Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / June 2006

Tip: Looking for answers? Try searching our database.

Star Joins

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
gwise@pressganey.com - 05 Jun 2006 21:44 GMT
I've got a database with a star schema and I'm kind of confused about
what the optimizer is doing with my queries.  I've got a fact table
(about 24 M rows), which I'm joining to 3 dimensions.  Each dimension
ID in the fact has it's own index.

No matter what I do, I've been unable to get the optimizer to choose a
star join (IXAND or the like).  It always just picks one of the
dimensions and does a nested loop join and then HSJOINs in the rest.
To be clear the queries I've tried are like this:

select a.field, b.field, count(*)
from fact f, dim1 a, dim2 b, dim 3 c
where
f.dim1_key = a.dim1_key and
f.dim2_key = b.dim2_key and
f.dim3_key = c.dim3_key and
a.field like 'Data%' and
b.field like 'Data%' and
c.field like 'Data%'
group by a.field, b.field

The really odd-ball thing is that I've got another dimension, which has
absolutely no selectivity -- i.e. All the items in the fact are in a
single dimension.  (I know this seems stupid, but it's a test
database).  If I join in this dimension (in queries similar to the one
above), in many of my queries, the optimizer will actually choose this
dimension to drive the NLJOIN with the fact.  How it could ever use
that dimension over anything else is beyond me as the result is
essentially a full table scan.

So, what I'm wondering is 2 things:

1)  Can anyone point me to a good reference about how star joins work
or offer any advice regarding gettting the optimizer to use a star
join?
2)  Does anyone have any idea why the optimizer would choose worst
possible index under any circumstances?

Oh, by the way, the stats are all up-to-date.

Thanks,
Greig Wise
DB2 DBA
Press Ganey Associates
gwise@pressganey.com - 06 Jun 2006 04:51 GMT
I'm following up my own message here.  I've learned more.  In regards
to the problem where the optimizer selects the dimension with only one
matching pair in the fact table, I think I understand what's going on
there.  Although there's only one value for this particular dimension
in the fact table, there are actually 49 other records in the dimension
for which no facts exist.  It seems that that the optimizer is assuming
2% selectivity since I'm selecting one of 50 records in the dimension
(even though the fact only has data for the one single value of that
dimension that I'm specifying in the where clause of the query).

To prove this is true to myself, I removed the other 49 records from
the dimension, reran stats and re-explained the query and indeed it no
longer used this index to drive the NLJOIN.

It seems strange that the optimizer would assume a totally even
distribution of a dimension within a fact like that.  Is that what it's
really doing or am I missing something?  It seems pretty odd,
especially since I've run stats with distrubution I'd think the
optimizer would be able to tell that the dimension wasn't evenly
distributed in the fact table.

Any information would be helpful.

Thanks,
Greig Wise

> I've got a database with a star schema and I'm kind of confused about
> what the optimizer is doing with my queries.  I've got a fact table
[quoted text clipped - 40 lines]
> DB2 DBA
> Press Ganey Associates
Serge Rielau - 06 Jun 2006 08:18 GMT
> I'm following up my own message here.  I've learned more.  In regards
> to the problem where the optimizer selects the dimension with only one
[quoted text clipped - 61 lines]
>>
>> Oh, by the way, the stats are all up-to-date.
.. but which stats did you run?

Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

IOD Conference
http://www.ibm.com/software/data/ondemandbusiness/conf2006/

gwise@pressganey.com - 07 Jun 2006 04:32 GMT
I did runstats on table with distribution and detailed indexes all for
every table in the database.

Greig

> >> Oh, by the way, the stats are all up-to-date.
> .. but which stats did you run?
[quoted text clipped - 6 lines]
> IOD Conference
> http://www.ibm.com/software/data/ondemandbusiness/conf2006/
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.