Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / December 2007

Tip: Looking for answers? Try searching our database.

Many Attributes & Selection Drilldown Issue

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
vasilip - 01 Dec 2007 14:47 GMT
Does anyone have any recomendations for design of a database to be
used in a site such as mobile.de or funda.nl?

different product categories, with diferent attributes between them..
some shared some not.. and a variation of types.. multivalue, single
valued and range bracketed attributes.

Asside from the delema between a EVA and wide table approach.. which I
don't seem to be able to resolve.. since the only practical way would
be EVA which is very slow (assuming I want to store a lot of different
types of attributes and be able to add on the fly)...

my main question is:

take the following clip from mobile.de:

Make
VW (178062)
Mercedes-Benz (113969)
BMW (100948)
Opel (83552)
Audi (82859)

Fuel Type
Petrol (593881)
Diesel (387654)
-LPG (3989)

this is a sample of the "drill down" option you have once browsing
results.. there are obiviously a lot more listed... how does one get
around to selecting the count for each attribute value based on the
current "filter" of attribute selections whithought executing a single
SQL query for each one?.. with the EVA model.. not only do I end up
with 5-10 self joins for the search result.. but ends up being * 50 in
order to produce the count for the drill down attribute selection
assumign I'm listing only the most popular attributes. If you add the
cost of executing 50 individual queries from PHP the system is almost
too slow to even use.

Any thoughts on the issue would be much appriciated!

Vasili Papaconstantinou
Serge Rielau - 01 Dec 2007 17:28 GMT
Vasili,

I can only comment on SQL.
Presuming by EVA you mean a system where each row represents an
attribute of the form: (rowid, attrid, value)
Such systems are used for example in Websphere Commerce.
To get counts there is no need for self joins.
Take a look at my "SQL on Fire" presentation (easily found on google)
for a primer on pivoting.

Cheers
Serge
Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

vasilip - 03 Dec 2007 19:42 GMT
Thanks for the quick response guys.. I had cross posted it due to the
fact that I had a seen a previous posting on
database.pick regarding attributes..

I am in fact using DB2. I didn't fully understand tony's explanation
regarding the MV values. but the relational
part doesn't seem to really work well since in my example I listed two
attributes types.. if the attributes are
going to  dynamically be added and removed change in nature.. I think
it will be incredibly hard to group
combinations of them to store the item ids in deliminated fields.
which is why I was hopeing a EAV (not EVA as I had previously
mentioned) schema would be useful.

In regards to serge's response.. I definately found the pivot example
a step in the right direction. I have reduced
my thousand selects to one for a count and one for retrieving the
data.. However it is proving to still be very slow
when executing.

I'll briefly describe the setup and the sql queries.. please tell me
if there is anything I could do to optimise
this.. or is it just unavoidable with this size table.

Attribute table:
property_id (BIGINT), ATTRIBUTE_ID (BIGINT), ATTRIBUTE_VALUE (BIGINT)

Index1:
property_id,ATTRIBUTE_ID,ATTRIBUTE_VALUE (as primary)

Index2:
ATTRIBUTE_ID,ATTRIBUTE_VALUE

Dimension:
ATTRIBUTE_ID

I filled the table with 1,000,000 sample entries each one with 10
attribute definitions yeilding 10million rows.
I think one of the the biggest issues here is disk performance..
executing the count query takes a few minutes to complete.

I have a feeling it might go faster if it didn't have to select ALL
the property_ids and THEN then filter them.. but I couldn't find a way
to reference the CASE columns by alias. so I had to use a derived
table to capture all of them
then select ids from it where the values are what I need them to be!

The general idea is to get a count for each individual attribute value
NOT selected after having satisfied the
selected attributes. So the derived table sql counts the instances of
the attributes then returns a 1 if they
existed then the select is executed on that table with a where clause
ensuring all returned fields are = 1.

Any assistance would be greatly appriciated once again. I'm starting
to think I might have to scrap what I have made

and the thought is getting to me since the rest of the app is complete

SQL STATEMENTS:

////////////////////////////////////////////////////////////////////////
SELECT SQL:
////////////////////////////////////////////////////////////////////////

select

count (CASE WHEN attribute_id = 4 AND attribute_value = 46
          THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
          THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
          THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
          THEN attribute_value ELSE NULL END) as turnkey_count

from test2 INNER JOIN

table(

select property_id from
    table(
        select  property_id,
   COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE  NULL END) as property_characteristics,

   COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent

   from test2 group by property_id
   ) table_of_items_that_satisfy_filter

   where table_of_items_that_satisfy_filter.property_price_rent=1
and
   table_of_items_that_satisfy_filter.property_characteristics=1

) B

ON test2.property_id = B.property_id

////////////////////////////////////////////////////////////////////////
FETCH DATA SQL:
////////////////////////////////////////////////////////////////////////

select *
from

table(
    select property_id from table(
        select property_id,
        COUNT(CASE WHEN ATTRIBUTE_ID = 5 AND ATTRIBUTE_VALUE = 4 THEN 1 ELSE
NULL END) as property_characteristics,
        COUNT(CASE WHEN ATTRIBUTE_ID = 7 AND ATTRIBUTE_VALUE >344633 THEN 1
ELSE NULL END) as property_price_rent
        from test2 group by property_id
    ) B

    where B.property_price_rent=1 and B.property_characteristics=1
) filtered_table

INNER JOIN places ON places.places_id = filtered_table.property_id

Thanks again,

Vasili
Serge Rielau - 04 Dec 2007 12:35 GMT
Vasili,

I'm dropping PICK from the posts..

Which version of DB2 are you on? DB2 for LUW added some query rewrite
improvements very recently.

Cheers
Serge

PS: Have you considered XML? Schema chaos is XML turf.

Signature

Serge Rielau
DB2 Solutions Development
IBM Toronto Lab

vasilip - 04 Dec 2007 18:05 GMT
I'm using DB2 9.1 (windows) the express-c version currently till this
project pics up and I need to scale..  I plan on moving to linux but
once I saw what a pain it was to convert some of the tables
(especially the ones containing spatial data) I decided to postpone
the switch till I have completed the project and these priority issues
are out of the way.

my original method was using xml where each entity had an xml field
that contained
<Attributes>
  <Attribute id='5' value='6'>
  ...
</Attributes>

I was querying the table using XQueries and XMLExists caluses.. this
proved to be quite slow even though I had indexed //Attribute[@id] and
[@value]

I also tried creating a VIEW (based on the xml fields) which was in
the form of ID, Attribute_ID, and Attribute_Value which also performed
slowly (for a couple thousand records..) this view is what prompted me
to scrap that idea and go for a relational table with the same 3
fields, which brought me to my original question that you helped me
with (the self join vs pivot table issue)

The pivot table method was working pretty well when the number of
items was under 100,000 or so.. but once the table got much bigger all
I hear is hard disk thrashing with about 20% cpu usage for a few
minutes.

I can I can export the table for you and include my SQL queries
together with the Explain results for the queries if it would be of
any help.

thanks again,
Vasili

> Vasili,
>
[quoted text clipped - 12 lines]
> DB2 Solutions Development
> IBM Toronto Lab
ChrisC - 05 Dec 2007 23:30 GMT
Hi.

...
> Dimension:
> ATTRIBUTE_ID

What do you mean by 'Dimension'?  Is this an MDC dimension by any
chance?

Also, looking at your SQL statements, if you include some values in
the WHERE clause this might perform much faster (depending, of course,
on your data).  In your query you only reference 4 total attribute_id
values, and if you filter by those, it should work better:

select
count (CASE WHEN attribute_id = 4 AND attribute_value = 46
          THEN attribute_value ELSE NULL END) as
property_position_count,
count (CASE WHEN attribute_id = 2 AND attribute_value = 7
          THEN attribute_value ELSE NULL END) as
property_base_type_count,
count (CASE WHEN attribute_id = 4 AND attribute_value >10
          THEN attribute_value ELSE NULL END) as
property_price_rent_count,
count (CASE WHEN attribute_id = 8 AND attribute_value = 22
          THEN attribute_value ELSE NULL END) as turnkey_count
from test2 INNER JOIN
table(
select property_id from
    table(
        select  property_id,
   COUNT(CASE WHEN test2.attribute_id = 4 AND test2.attribute_value =
46 THEN 1 ELSE  NULL END) as property_characteristics,
   COUNT(CASE WHEN test2.attribute_id = 5 AND attribute_value >1000
THEN 1 ELSE NULL END) as property_price_rent
   from test2
/* first added where clause */
WHERE attribute_id IN (4, 5)
group by property_id
   ) table_of_items_that_satisfy_filter
   where table_of_items_that_satisfy_filter.property_price_rent=1
and
   table_of_items_that_satisfy_filter.property_characteristics=1
) B
ON test2.property_id = B.property_id
/* second added where clause */
WHERE attribute_id in (2, 4, 8)

Thanks,
Chris
vasilip - 06 Dec 2007 09:40 GMT
Hello!

Yup, its a MDC dimension.. After briefly reading what the purpose of
such dimensions were I had a feeling it might go faster if I created a
dimension based on attribute_id. not really sure if it did much .. but
I guess if the data is grouped by attribute ids it would be better
when filtering by them.

your suggestion on the where clause did help speed it up
significantly. I'm going to try adding the attribute values to the
where clause as well aside from the attribute ids!

thanks,

Vasili

> Hi.
>
[quoted text clipped - 47 lines]
> Thanks,
> Chris
Tony Gravagno - 01 Dec 2007 19:50 GMT
This inquiry was cross-posted to DB2 and the Pick groups - are which
DBMS are you running?  Your references to attributes and multivalues
is distinctly Pick but the references to SQL make it sound like you're
using DB2.

For an MV environment this is easy.  Your data can be stored in a
cross-reference, for example as Make*FuelType*1 where 1 means it's
data, and Make*FuelType*0 is a count that gets update when the
database is changed.  Rather than maintaining two record types you
could just use regular MV indices (depending on your platform) or use
BASIC to DCOUNT the values in a record.

For you relational guys what that means is that the data is stored
with a primary key like 178062*593881 and all of the items for that
combination are stored as value-delimited fields which look like this:
 7436366757]256765843545]6788945646]6884566537
That eliminates the need for another query on yet another table.  A
DCOUNT in the above case would return 4.

In MV the data can also be stored in common save-lists, so you just
use the equivalent of a stored procedure to GET-LIST Make*FuelType and
use that to directly reference the primary file.  That's 2 queries no
matter how deep the combination of features.

So far I don't see a problem no matter which MV DBMS you're using or
which language you're using to get to it.  In fact, the same solutions
apply to an RDBMS.  I'm sure I've missed something in your specific
example but I do this sort of thing every day so I'm sure about the
general approach.

HTH
Tony Gravagno
Nebula Research and Development
TG@ removethispleaseNebula-RnD.com

>Does anyone have any recomendations for design of a database to be
>used in a site such as mobile.de or funda.nl?
[quoted text clipped - 38 lines]
>
>Vasili Papaconstantinou
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.