Maybe someone has an idea for this:
table "customer_article" contains prices for every article by customer
and has about 400.000 now.
customer char(5)
article char(5)
price double
There is a table "customer_article_bak" with the same structure but
contains the situation before the data update.
Now I like a report like "Show me all new articles but only if the
customer exists in the old table" (means: only existing customers get a
hint which article are new to them).
My first approach is doing something, but after 1 hour I stopped the query.
select new.* from customer_article as new
full outer join customer_article_bak as old
on new.customer=old.customer
where old.article IS null
Any ideas how to speed this up?
Bernd

Signature
In den USA war die letzte Zigarette, die man dem zum Tode Verurteilten
gewährte, eine Tradition, ein angestammtes Recht, bis vor kurzem der
Gouverneur von Alabama diese verbot - und zwar aus gesundheitlichen Gründen.
richard_mccutcheon@hcm.honda.com - 09 Jan 2006 20:22 GMT
How does it behave if you use an inner join instead of an outer join
(your WHERE clause will turn the outer join into an inner join
anyway.......) ?
Also, the classic questions are:
1. what indexes do you have?
2. have you run reorg and runstats lately?
3. are you table scanning?
Regards,
Richard M.
Bernd Hohmann - 09 Jan 2006 22:20 GMT
> How does it behave if you use an inner join instead of an outer join
0 Records found
> Also, the classic questions are:
>
> 1. what indexes do you have?
PRIMARY KEY ("ARTICLE","CUSTOMER") on both tables
> 2. have you run reorg and runstats lately?
db2 reorg table cmm.customer_article
db2 runstats on table cmm.customer_article with distribution and
detailed indexes all shrlevel change
Yes...
> 3. are you table scanning?
If you like to flagellate yourself:
http://www.harddiskcafe.de/db2/sql.txt has the original table
definition, the query and the "optimized" query.
http://www.harddiskcafe.de/db2/sql.jpg is the explain graph.
So if you ask for table scanning: yes, the beast does a lot of it <g>
Bernd
Art S. Kagel - 09 Jan 2006 21:38 GMT
> Maybe someone has an idea for this:
>
[quoted text clipped - 20 lines]
>
> Any ideas how to speed this up?
How about:
select new.*
from customer_article as new
where exists (
select 1
from customer_article_bak as old
where old.customer = new.customer
);
OR even better:
select new.*
from customer_article as new
where customer in (
select unique old.customer
from customer_article_bak as old
);
Art S. Kagel
Bernd Hohmann - 09 Jan 2006 22:05 GMT
>> Any ideas how to speed this up?
>
[quoted text clipped - 7 lines]
> where old.customer = new.customer
> );
Works great - it pumps zillions of rows on the monitor but doesn't
answer the question "which article is in customer_article but not in
customer_article_bak" :)
Bernd
Art S. Kagel - 10 Jan 2006 02:32 GMT
>>> Any ideas how to speed this up?
>>
[quoted text clipped - 11 lines]
> answer the question "which article is in customer_article but not in
> customer_article_bak" :)
Oh, but your original post says:
"Show me all new articles but only if the customer exists in the old table"
Misinterpreted 'new' to mean articles in the 'new' table. In that case:
select new.*
from customer_article as new
where exists (
select 1
from customer_article_bak as old
where old.customer = new.customer
) and not exists (
select 1
from customer_article_bak as old
where old.customer = new.customer
and old.article = new.article
);
Art S. Kagel
Phil Sherman - 10 Jan 2006 13:39 GMT
The first "exists" clause is potentially an issue because there is no
easy way to verify that a specific customer exists. This should cause
either a scan and sort of both tables (possibly one table and 1 index)
or multiple scans of customer_article_bak.
Try adding an index on customer_article_bak(customer).
Why do you want to notify the customers only when there are new items?
From an application view, what do you do when an item is eliminated or
the price is changed?
If your application accesses the data by customer, then you may want to
add an index customer_article(customer,article)
If there are really only three columns in the table, then you can carry
the price as additional data in the index. This increates the size of
the index, will slow updates to price, and will support index only
access to the data, improving read performance.
Phil Sherman
>>>> Any ideas how to speed this up?
>>>
[quoted text clipped - 31 lines]
>
> Art S. Kagel
Bernd Hohmann - 10 Jan 2006 14:16 GMT
> Try adding an index on customer_article_bak(customer).
This index exists [customer,article] so the index is ordered by customer
first.
> Why do you want to notify the customers only when there are new items?
> From an application view, what do you do when an item is eliminated or
> the price is changed?
In reality the table contains prices, rebaites, conditions for volume
sales and so on. Items are eliminated 2 years after the article is
withdrawn, after this time nobody needs to know about this (decision
made by my customer).
Bernd
Bernd Hohmann - 10 Jan 2006 14:19 GMT
[...]
> Misinterpreted 'new' to mean articles in the 'new' table. In that case:
After 1 hour runtime without any result I stopped the query.
Any other ideas to speedup this stuff?
A very primitive variant will be: exporting both tables ordered by
customer,article in two sequential files and comparing them by reading
line for line - which will take some minutes but not hours like DB2 needs.
Serge, Knut?
Bernd

Signature
In den USA war die letzte Zigarette, die man dem zum Tode Verurteilten
gewährte, eine Tradition, ein angestammtes Recht, bis vor kurzem der
Gouverneur von Alabama diese verbot - und zwar aus gesundheitlichen Gründen.
--CELKO-- - 10 Jan 2006 16:54 GMT
>> There is a table "customer_article_bak" with the same structure but
contains the situation before the data update. <<
Your design flaw is called "attribute splitting"; you have the same
entity in more than one table, where the tables are the values of some
attribute -- this looks like purchase date. You also cannot use
floating point numbers for money, you have no keys, etc, You are
mimicking a tape file, not an RDBMS.
Please post real DDL in the future; my guess is that youy meant:
CREATE TABLE CustomerArticles
(customer_id CHAR(5) NOT NULL,
article_id CHAR(5) NOT NULL,
purchase_date DATE DEFAULT CURRENT_TIMESTAMP NOT NULL,
article_price DECIMAL (7,2) NOT NULL
CHECK (article_price > 0.00),
PRIMARY KEY(customer_id, article_id, purchase_date));
>> Now I like a report like "Show me all new articles but only if the customer exists in the old table" (means: only existing customers get a hint which article are new to them). <<
SELECT C1.customer_id, C1.article_id AS new_article
FROM CustomerArticles AS C1
WHERE NOT EXISTS
(SELECT *
FROM CustomerArticles AS C2
WHERE C1.customer_id = C2.customer_id
AND C1.article_id = C2.article_id
AND C1.purchase_date > C2.purchase_date);