
Signature
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
Thanks for the quick response guys.. I had cross posted it due to the
fact that I had a seen a previous posting on
database.pick regarding attributes..
I am in fact using DB2. I didn't fully understand tony's explanation
regarding the MV values. but the relational
part doesn't seem to really work well since in my example I listed two
attributes types.. if the attributes are
going to dynamically be added and removed change in nature.. I think
it will be incredibly hard to group
combinations of them to store the item ids in deliminated fields.
which is why I was hopeing a EAV (not EVA as I had previously
mentioned) schema would be useful.
In regards to serge's response.. I definately found the pivot example
a step in the right direction. I have reduced
my thousand selects to one for a count and one for retrieving the
data.. However it is proving to still be very slow
when executing.
I'll briefly describe the setup and the sql queries.. please tell me
if there is anything I could do to optimise
this.. or is it just unavoidable with this size table.
Attribute table:
property_id (BIGINT), ATTRIBUTE_ID (BIGINT), ATTRIBUTE_VALUE (BIGINT)
Index1:
property_id,ATTRIBUTE_ID,ATTRIBUTE_VALUE (as primary)
Index2:
ATTRIBUTE_ID,ATTRIBUTE_VALUE
Dimension:
ATTRIBUTE_ID
I filled the table with 1,000,000 sample entries each one with 10
attribute definitions yeilding 10million rows.
I think one of the the biggest issues here is disk performance..
executing the count query takes a few minutes to complete.
I have a feeling it might go faster if it didn't have to select ALL
the property_ids and THEN then filter them.. but I couldn't find a way
to reference the CASE columns by alias. so I had to use a derived
table to capture all of them
then select ids from it where the values are what I need them to be!
The general idea is to get a count for each individual attribute value
NOT selected after having satisfied the
selected attributes. So the derived table sql counts the instances of
the attributes then returns a 1 if they
existed then the select is executed on that table with a where clause
ensuring all returned fields are = 1.
Any assistance would be greatly appriciated once again. I'm starting
to think I might have to scrap what I have made
and the thought is getting to me since the rest of the app is complete
SQL STATEMENTS:
////////////////////////////////////////////////////////////////////////
SELECT SQL:
////////////////////////////////////////////////////////////////////////
select
count (CASE WHEN attribute_id = 4 AND attribute_value = 46
THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
THEN attribute_value ELSE NULL END) as turnkey_count
from test2 INNER JOIN
table(
select property_id from
table(
select property_id,
COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE NULL END) as property_characteristics,
COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent
from test2 group by property_id
) table_of_items_that_satisfy_filter
where table_of_items_that_satisfy_filter.property_price_rent=1
and
table_of_items_that_satisfy_filter.property_characteristics=1
) B
ON test2.property_id = B.property_id
////////////////////////////////////////////////////////////////////////
FETCH DATA SQL:
////////////////////////////////////////////////////////////////////////
select *
from
table(
select property_id from table(
select property_id,
COUNT(CASE WHEN ATTRIBUTE_ID = 5 AND ATTRIBUTE_VALUE = 4 THEN 1 ELSE
NULL END) as property_characteristics,
COUNT(CASE WHEN ATTRIBUTE_ID = 7 AND ATTRIBUTE_VALUE >344633 THEN 1
ELSE NULL END) as property_price_rent
from test2 group by property_id
) B
where B.property_price_rent=1 and B.property_characteristics=1
) filtered_table
INNER JOIN places ON places.places_id = filtered_table.property_id
Thanks again,
Vasili
Serge Rielau - 04 Dec 2007 12:35 GMT
Vasili,
I'm dropping PICK from the posts..
Which version of DB2 are you on? DB2 for LUW added some query rewrite
improvements very recently.
Cheers
Serge
PS: Have you considered XML? Schema chaos is XML turf.

Signature
Serge Rielau
DB2 Solutions Development
IBM Toronto Lab
vasilip - 04 Dec 2007 18:05 GMT
I'm using DB2 9.1 (windows) the express-c version currently till this
project pics up and I need to scale.. I plan on moving to linux but
once I saw what a pain it was to convert some of the tables
(especially the ones containing spatial data) I decided to postpone
the switch till I have completed the project and these priority issues
are out of the way.
my original method was using xml where each entity had an xml field
that contained
<Attributes>
<Attribute id='5' value='6'>
...
</Attributes>
I was querying the table using XQueries and XMLExists caluses.. this
proved to be quite slow even though I had indexed //Attribute[@id] and
[@value]
I also tried creating a VIEW (based on the xml fields) which was in
the form of ID, Attribute_ID, and Attribute_Value which also performed
slowly (for a couple thousand records..) this view is what prompted me
to scrap that idea and go for a relational table with the same 3
fields, which brought me to my original question that you helped me
with (the self join vs pivot table issue)
The pivot table method was working pretty well when the number of
items was under 100,000 or so.. but once the table got much bigger all
I hear is hard disk thrashing with about 20% cpu usage for a few
minutes.
I can I can export the table for you and include my SQL queries
together with the Explain results for the queries if it would be of
any help.
thanks again,
Vasili
> Vasili,
>
[quoted text clipped - 12 lines]
> DB2 Solutions Development
> IBM Toronto Lab
ChrisC - 05 Dec 2007 23:30 GMT
Hi.
...
> Dimension:
> ATTRIBUTE_ID
What do you mean by 'Dimension'? Is this an MDC dimension by any
chance?
Also, looking at your SQL statements, if you include some values in
the WHERE clause this might perform much faster (depending, of course,
on your data). In your query you only reference 4 total attribute_id
values, and if you filter by those, it should work better:
select
count (CASE WHEN attribute_id = 4 AND attribute_value = 46
THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
THEN attribute_value ELSE NULL END) as turnkey_count
from test2 INNER JOIN
table(
select property_id from
table(
select property_id,
COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE NULL END) as property_characteristics,
COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent
from test2
/* first added where clause */
WHERE attribute_id IN (4, 5)
group by property_id
) table_of_items_that_satisfy_filter
where table_of_items_that_satisfy_filter.property_price_rent=1
and
table_of_items_that_satisfy_filter.property_characteristics=1
) B
ON test2.property_id = B.property_id
/* second added where clause */
WHERE attribute_id in (2, 4, 8)
Thanks,
Chris
vasilip - 06 Dec 2007 09:40 GMT
Hello!
Yup, its a MDC dimension.. After briefly reading what the purpose of
such dimensions were I had a feeling it might go faster if I created a
dimension based on attribute_id. not really sure if it did much .. but
I guess if the data is grouped by attribute ids it would be better
when filtering by them.
your suggestion on the where clause did help speed it up
significantly. I'm going to try adding the attribute values to the
where clause as well aside from the attribute ids!
thanks,
Vasili
> Hi.
>
[quoted text clipped - 47 lines]
> Thanks,
> Chris