Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / October 2008

Tip: Looking for answers? Try searching our database.

outer join table filtering

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Henry J. - 02 Oct 2008 16:47 GMT
Is it true that the following query:

   SELECT emp.empno, emp.lastname, dept.deptname
    FROM emp LEFT OUTER JOIN dept
         ON emp.workdept = dept.deptno
     WHERE emp.salary > 50000.00;

will run faster in general if rewritten as:

 SELECT emp.empno, emp.lastname, dept.deptname
  FROM  (SELECT empno, lastname FROM emp WHERE salary > 50000.00) as
e
     LEFT OUTER JOIN dept
       ON emp.workdept = dept.deptno

I thought the where clause in the 1st query would filter out rows in
emp before joining with dept, just like the 2nd query does?  How about
this:

    SELECT emp.empno, emp.lastname, dept.deptname
    FROM emp LEFT OUTER JOIN dept
         ON emp.salary > 50000.00 and emp.workdept = dept.deptno

Thanks.
--CELKO-- - 03 Oct 2008 14:00 GMT
>> I thought the where clause in the 1st query would filter out rows in  emp [sic: unless the set has only one member, use collective or plural names for tables] before joining with dept, just like the 2nd query does? <<

NO, the WHERE clause is done after the FROM cluase has constructed a
working table.  Here is how OUTER JOINs work in SQL-92.  Assume you
are given:

Table1       Table2
a   b        a   c
======       ======
1   w        1   r
2   x        2   s
3   y        3   t
4   z

and the outer join expression:

Table1
LEFT OUTER JOIN
Table2
ON Table1.a = Table2.a      <== join condition
   AND Table2.c = 't';      <== single table condition

We call Table1 the "preserved table" and Table2 the "unpreserved
table" in the query.  What I am going to give you is a little
different, but equivalent to the ANSI/ISO standards.

1) We build the CROSS JOIN of the two tables.  Scan each row in the
result set.

2) If the predicate tests TRUE for that row, then you keep it. You
also remove all rows derived from it from the CROSS JOIN

3) If the predicate tests FALSE or UNKNOWN for that row, then keep the
columns from the preserved table, convert all the columns from the
unpreserved table to NULLs and remove the duplicates.

So let us execute this by hand:

Let @ = passed the first predicate
Let * = passed the second predicate

Table1 CROSS JOIN Table2
a   b        a   c
=========================
1   w       1   r @
1   w       2   s
1   w       3   t *
2   x       1   r
2   x       2   s @
2   x       3   t *
3   y       1   r
3   y       2   s
3   y       3   t @* <== the TRUE set
4   z       1   r
4   z       2   s
4   z       3   t *

Table1 LEFT OUTER JOIN Table2
a   b        a   c
=========================
3   y     3      t      <= only TRUE row
-----------------------
1   w     NULL   NULL   Sets of duplicates
1   w     NULL   NULL
1   w     NULL   NULL
-----------------------
2   x     NULL   NULL
2   x     NULL   NULL
2   x     NULL   NULL
3   y     NULL   NULL  <== derived from the TRUE set - Remove
3   y     NULL   NULL
-----------------------
4   z     NULL   NULL
4   z     NULL   NULL
4   z     NULL   NULL

the final results:

Table1 LEFT OUTER JOIN Table2
a   b        a   c
=========================
1   w     NULL   NULL
2   x     NULL   NULL
3   y     3      t
4   z     NULL   NULL

The basic rule is that every row in the preserved table is represented
in the results in at least one result row.

There are limitations and very serious problems with the extended
equality version of an outer join used in some diseased mutant
products.  Consider the two Chris Date tables

Suppliers        SupParts
supno             supno partno qty
=========        ==============
S1               S1   P1    100
S2               S1   P2    250
S3               S2   P1    100
                 S2   P2    250

and let's do an extended equality outer join like this:

SELECT *
 FROM Supplier, SupParts
WHERE Supplier.supno *= SupParts.supno
  AND qty < 200;

If I do the outer first, I get:

Suppliers LOJ SupParts
supno supno partno qty
=======================
S1     S1   P1    100
S1     S1   P2    250
S2     S2   P1    100
S2     S2   P2    250
S3   NULL  NULL   NULL

Then I apply the (qty < 200) predicate and get

Suppliers LOJ SupParts
supno supno partno qty
===================
S1   S1   P1    100
S2   S2   P1    100

Doing it in the opposite order

Suppliers LOJ SupParts
supno supno partno qty
===================
S1   S1   P1    100
S2   S2   P1    100
S3   NULL NULL  NULL

Sybase does it one way, Oracle does it the other and Centura (nee
Gupta) lets you pick which one -- the worst of both non-standard
worlds!  In SQL-92, you have a choice and can force the order of
execution. Either do the predicates after the join ...

SELECT *
  FROM Supplier
       LEFT OUTER JOIN
       SupParts
       ON Supplier.supno = SupParts.supno
WHERE qty < 200;

... or do it in the joining:

SELECT *
 FROM Supplier
      LEFT OUTER JOIN
      SupParts
      ON Supplier.supno = SupParts.supno
         AND qty < 200;

Another problem is that you cannot show the same table as preserved
and unpreserved in the extended equality version, but it is easy in
SQL-92.  For example to find the students who have taken Math 101 and
might have taken Math 102:

SELECT C1.student, C1.math, C2.math
 FROM (SELECT * FROM Courses WHERE math = 101) AS C1
      LEFT OUTER JOIN
      (SELECT * FROM Courses WHERE math = 102) AS C2
      ON C1.student = C2.student;
ChrisC - 03 Oct 2008 19:07 GMT
> Is it true that the following query:
>
>     SELECT emp.empno, emp.lastname, dept.deptname
>      FROM emp LEFT OUTER JOIN dept
>           ON emp.workdept = dept.deptno
>       WHERE emp.salary > 50000.00;

CELKO has a good discussion about the logic behind what is going on.
However, if you look at the plan that DB2 is using to generate this,
you will find that it takes several shortcuts to speed up processing.
DB2 will look at the statistics for the tables behind the query and
decide what to do based on those statistics.  I suspect since you
imply that your first rewrite of the query was faster than the second
one that DB2 might have decided that joining the two tables up would
filter out more rows than applying the "emp.salary > 50000.00"
condition would, and so it joined them up first and then filtered.  If
the statisics were different, DB2 may decide to filter first and then
join.  It may be that with better statistics with distribution turned
on that DB2 might run this query better for you.

In reply to CELKO, since this criteria is on the OUTER part of the
join, it is safe to apply the filter before the join - there is no
chance of getting extra rows this way.  If the criteria was on the
inner part of the join, then yes, bad results would happen.

-Chris
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.