Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / June 2005

Tip: Looking for answers? Try searching our database.

Why spurious tuples with fifth normal form?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
stowellt@gmail.com - 15 Jun 2005 18:14 GMT
Hello All,

I have spent countless hours searching the net to find the answer, but
am still confused. I have a db structure where I have decomposed a
table into three smaller tables, but when I join them all back
together, I get spurious tuples. Now I thought that the whole point of
fifth normal form was to allow you to successfully decompose a table
into smaller tables, then join them back together without spurious
results. Do I have the wrong idea about this, or if not, what am I
doing wrong? Thanks.
Tony Andrews - 15 Jun 2005 21:14 GMT
> Hello All,
>
[quoted text clipped - 6 lines]
> results. Do I have the wrong idea about this, or if not, what am I
> doing wrong? Thanks.

Maybe you decomposed it wrong?  Difficult to say based on the
information you have given so far (i.e. none!)
stowellt@gmail.com - 15 Jun 2005 23:19 GMT
Here is some more info:

The original table before I decomposed it was like this:

parameter_id     // FK to a list of chemical parameters (silver, gold,
etc.)
classification_id   // FK to a list of classifications (drinking water,
irrigation water, livestock
url_id // FK to a list of url links to web sites

I then decomposed the table into the following three relvars:

parameter_url
classification_url
parameter_classification

If I then re-join all three tables, I get spurious results. I am
stumped as to the problem. did I decompse wrong? There is a complete
many-to-many-to-many relationship between each of the three FK's. Any
help is greatly appreciated.
Jan Hidders - 16 Jun 2005 00:23 GMT
> Here is some more info:
>
[quoted text clipped - 5 lines]
> irrigation water, livestock
> url_id // FK to a list of url links to web sites

Does parameter_id identify a single parameter or a list of parameters
(which is what you seem to say in your comment)?

At first sight this looks very much like it's already in 5NF, but of
course this all depends on which functional dependencies (FDs),
multi-valued dependencies (MVDs) and join-dependencies (JDs) you think
hold. Can you tell us a bit more about that?

> If I then re-join all three tables, I get spurious results. I am
> stumped as to the problem. did I decompse wrong? There is a complete
> many-to-many-to-many relationship between each of the three FK's.

You mean that with every parameter_id there can be multiple
classification-id's associated, et cetera? That only tells you that
there are no FDs betweeen the individual columns. There could still be
FDs between combinations of columns, and it says absolutely nothing
about the possiblity of MVDs and JDs.

If you want to check for MVDs (multi-valued dep.'s) then the relevant
question is if these relationship are independent, for example if the
fact that a parameter_id is associated with a certain classification_id
is completely independent of any involved url_id.

-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 04:43 GMT
I really appreciate your info on this.

To answer your questions,

parameter_id refers to a chemical from a list (table) of chemicals,
i.e.

parameter_id  parameter
1.............silver
2.............gold
3.............cobalt

etc.

classification_id and url_id each link to similar tables. Basically,
all three fields are part of the primary key. The whole point of the
table is just to store the fact that silver in drinking water has a
link to x website. Or gold in irrigation water has a link to the same
website, or many websites, etc. So I don't think there are any mvd's
because all three are inter-dependant (needed to identify a unique
fact), not independant. Please correct me if I'm wrong.

As far as a join dependency, I'm not sure how to check for that. Is
there an easy way? Thanks.
Jan Hidders - 16 Jun 2005 08:31 GMT
> As far as a join dependency, I'm not sure how to check for that. Is
> there an easy way?

Let's for brevity call the relation R(P,C,U). I think you are right that
there are no MVDs, which means that we can already rule out all binary
JDs, since those are always equivalent with MVDs (e.g. MVD P->>C is
equivalent with a JD {PC, PY}).

Since there are only three columns the only other possible non-trivial
JD is JD {PC, PU, CU}. So does this hold? You can formulate this
constraint in at least two ways:

(1) It always holds that R = R[PC] JOIN R[PU] JOIN R[CU]
    where R[PC] denotes the projection of R on the columns P and C, etc.

Bascially this says that the associated decomposition is information
preserving, and you already saw that this is not always the case. Be
sure to check that you have to do *two* joins, not just one.

A reformulation of (1) that some find easier to understand is the following:

(2) If there is a pair (p,c) in R[PC], a pair (p,u) in R[PU] and a pair
(c,u) in R[CU] then the triple (p,c,u) appears in R

So it should hold that if a certain parameter p is associated with a
classification c, and this parameter is also associated with a URL u,
and that URL is itself associated with classification c, then it must
hold that (p,c,u) is necessarily in the relation.

Since you got spurious tuples when joining everything back together,
that latter statement is apparently not true, and the JD does not hold.

Congratulations, you have reached 5NF. :-)

-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 16:17 GMT
Thank you for the information, I really appreciate you taking the time
to explain. I can now  be confident that leaving everything in the one
table is correct :) Out of curiousity, I have seen data sets where
there is a join dependency due to a cyclic constraint, but I'm not sure
what exactly constitutes a cyclic constraint. I thought my dataset had
one, but apparently not. How does one go about determining a cyclic
constraint? I apologize for all the questions, but it's the only way I
learn and I find it very interesting. Thanks again.
Jan Hidders - 16 Jun 2005 20:13 GMT
> Out of curiousity, I have seen data sets where
> there is a join dependency due to a cyclic constraint, but I'm not sure
> what exactly constitutes a cyclic constraint. I thought my dataset had
> one, but apparently not. How does one go about determining a cyclic
> constraint?

Do you want to know how to recognize if a certain join dependency holds,
or do you want to know how to check if a certain JD that you found is
cyclic? But first, perhaps, is it precisly clear to you what a JD is?

-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 20:39 GMT
Good questions. Please correct me if I'm wrong, but isn't a join
dependency when you are able to join decomposed tables on common keys
to get the original table? In my case, I thought I had found a join
dependency because most of the time, if I joined the decomposed tables
together I got the original back. But every so often a spurious tuple
showed up. Would this be an example of checking if a JD held? (and it
didn't hold when the spurious tuple showed up)? Is the only way to
check this by trial-and-error?

So if there would have been no spurious results, then I assume the JD
would have held and thus I should have decomposed. I'm just not sure
why my data set didn't have the JD when I have seen similar sets of
data (with three inter-dependant columns) that did have JD's. Are all
JD's cyclic? Thanks, I appreciate the explanations.
Jan Hidders - 17 Jun 2005 18:43 GMT
> Please correct me if I'm wrong, but isn't a join
> dependency when you are able to join decomposed tables on common keys
> to get the original table?

Almost. The common columns on which you join don't have to be keys.

> In my case, I thought I had found a join
> dependency because most of the time, if I joined the decomposed tables
> together I got the original back. But every so often a spurious tuple
> showed up. Would this be an example of checking if a JD held?

Yes. That is precisely what it is.

> (and it
> didn't hold when the spurious tuple showed up)? Is the only way to
> check this by trial-and-error?

In practice, yes, but if you have very good analytical and mathematical
skills then it is also possible to recognize the constraints that imply
this. They usually have the form that I already mentioned; loosely
speaking something like "if certain separate combinations of subsets of
columns appear in the relation then the full combination also must appear".

> So if there would have been no spurious results, then I assume the JD
> would have held and thus I should have decomposed.

No, not always, only if the JD that you found is not implied by the
candidate keys. How can you check this? You use the following very
simple algorithm:

Input:
  - A set of candidate keys { K_1, ..., K_n }
  - A join dependency JD { C_1, ..., C_m }
    where C_1, .., C_m are sets of column names

Steps:
1. Let H be the set { C_1, ..., C_m }
2. If for two different sets C_i and C_j in H the intersection of C_i
and C_j contains the candidate key K_p, then replace C_i and C_j in H
with the union of C_i and C_j
3. Repeat step 2 until there are no more two different C_i and C_j with
a common candidate key
4. If one of the sets in the resulting H contains all the column names
of the relation then the answer is yes, otherwise no.

To give a small example: let for the relation R(A,B,C,D,E) the candidate
keys be AB and C, and assume that the JD { ABC, ABE, CE } holds. Is it
implied?:

step 1. H = { ABC, ABD, CE }
step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB)
step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C)
step 3. no more two distinct components with common cand. key
step 4. The JD is implied by the cand. keys because ABCDE is in H

So even though this join dependency holds, you do not have to split.

Exercise for the reader: explain why trivial JDs (i.e. JDs where at
least one component contains all column names) are according to the
algorithm above implied by any set of candidate keys, including the
empty set.

> I'm just not sure
> why my data set didn't have the JD when I have seen similar sets of
> data (with three inter-dependant columns) that did have JD's.

Hard to say. In my experience JDs that are not simply FDs or MVDs are
pretty rare, and JDs that are not implied by the candidate keys even
more so. So my guess would be that actually what you saw where FDs or
MVDs but weren't recognized as such. But that's just a guess.

> Are all JD's cyclic?

No. If you really want me to, I could explain to you what it means (it
refers to the acyclicity of the hypergraph being described by the JD)
but actually it is irrelevant for the decision of splitting or not. For
that you only have to know if the JD is implied by the candidate keys,
as I explained above.

-- Jan Hidders
Jan Hidders - 17 Jun 2005 20:15 GMT
> To give a small example: let for the relation R(A,B,C,D,E) the candidate
> keys be AB and C, and assume that the JD { ABC, ABE, CE } holds. Is it
> implied?:

Oops. The JD should be (see below) JD {ABC, AB*D*, CE}

> step 1. H = { ABC, ABD, CE }
> step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB)
[quoted text clipped - 3 lines]
>
> So even though this join dependency holds, you do not have to split.

-- Jan hidders
stowellt@gmail.com - 20 Jun 2005 16:08 GMT
Just wanted to say thanks again for the info. I will try to put it into
practice in future designs.
Jan Hidders - 17 Jun 2005 20:22 GMT
Ahem. *cough* Just checking if you guys were paying attention. :-)

> step 1. H = { ABC, ABD, CE }
> step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB)
                                                 ^^^
Should be "ABD"

> step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C)
                                      ^^^^^

Should be "ABCD"

> step 3. no more two distinct components with common cand. key
> step 4. The JD is implied by the cand. keys because ABCDE is in H
>
> So even though this join dependency holds, you do not have to split.

That's still correct. :-)

-- Jan Hidders
Jan Hidders - 15 Jun 2005 23:56 GMT
> I have spent countless hours searching the net to find the answer, but
> am still confused. I have a db structure where I have decomposed a
[quoted text clipped - 4 lines]
> results. Do I have the wrong idea about this, or if not, what am I
> doing wrong?

You are only supposed to split if there is a join-dependency that does
not follow from the key constraints. If this is not the case then you
are already in 5NF. It sounds like you are decomposing your relation
without ever checking if there was a join-dependency that required this.
In fact, if there are spurious tuples then that actually proves (pretty
much by definition) that your decomposition does not correspond to a
join-dependency.

-- Jan Hidders
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.