Database Forum / General DB Topics / DB Theory / June 2005
Why spurious tuples with fifth normal form?
|
|
Thread rating:  |
stowellt@gmail.com - 15 Jun 2005 18:14 GMT Hello All,
I have spent countless hours searching the net to find the answer, but am still confused. I have a db structure where I have decomposed a table into three smaller tables, but when I join them all back together, I get spurious tuples. Now I thought that the whole point of fifth normal form was to allow you to successfully decompose a table into smaller tables, then join them back together without spurious results. Do I have the wrong idea about this, or if not, what am I doing wrong? Thanks.
Tony Andrews - 15 Jun 2005 21:14 GMT > Hello All, > [quoted text clipped - 6 lines] > results. Do I have the wrong idea about this, or if not, what am I > doing wrong? Thanks. Maybe you decomposed it wrong? Difficult to say based on the information you have given so far (i.e. none!)
stowellt@gmail.com - 15 Jun 2005 23:19 GMT Here is some more info:
The original table before I decomposed it was like this:
parameter_id // FK to a list of chemical parameters (silver, gold, etc.) classification_id // FK to a list of classifications (drinking water, irrigation water, livestock url_id // FK to a list of url links to web sites
I then decomposed the table into the following three relvars:
parameter_url classification_url parameter_classification
If I then re-join all three tables, I get spurious results. I am stumped as to the problem. did I decompse wrong? There is a complete many-to-many-to-many relationship between each of the three FK's. Any help is greatly appreciated.
Jan Hidders - 16 Jun 2005 00:23 GMT > Here is some more info: > [quoted text clipped - 5 lines] > irrigation water, livestock > url_id // FK to a list of url links to web sites Does parameter_id identify a single parameter or a list of parameters (which is what you seem to say in your comment)?
At first sight this looks very much like it's already in 5NF, but of course this all depends on which functional dependencies (FDs), multi-valued dependencies (MVDs) and join-dependencies (JDs) you think hold. Can you tell us a bit more about that?
> If I then re-join all three tables, I get spurious results. I am > stumped as to the problem. did I decompse wrong? There is a complete > many-to-many-to-many relationship between each of the three FK's. You mean that with every parameter_id there can be multiple classification-id's associated, et cetera? That only tells you that there are no FDs betweeen the individual columns. There could still be FDs between combinations of columns, and it says absolutely nothing about the possiblity of MVDs and JDs.
If you want to check for MVDs (multi-valued dep.'s) then the relevant question is if these relationship are independent, for example if the fact that a parameter_id is associated with a certain classification_id is completely independent of any involved url_id.
-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 04:43 GMT I really appreciate your info on this.
To answer your questions,
parameter_id refers to a chemical from a list (table) of chemicals, i.e.
parameter_id parameter 1.............silver 2.............gold 3.............cobalt
etc.
classification_id and url_id each link to similar tables. Basically, all three fields are part of the primary key. The whole point of the table is just to store the fact that silver in drinking water has a link to x website. Or gold in irrigation water has a link to the same website, or many websites, etc. So I don't think there are any mvd's because all three are inter-dependant (needed to identify a unique fact), not independant. Please correct me if I'm wrong.
As far as a join dependency, I'm not sure how to check for that. Is there an easy way? Thanks.
Jan Hidders - 16 Jun 2005 08:31 GMT > As far as a join dependency, I'm not sure how to check for that. Is > there an easy way? Let's for brevity call the relation R(P,C,U). I think you are right that there are no MVDs, which means that we can already rule out all binary JDs, since those are always equivalent with MVDs (e.g. MVD P->>C is equivalent with a JD {PC, PY}).
Since there are only three columns the only other possible non-trivial JD is JD {PC, PU, CU}. So does this hold? You can formulate this constraint in at least two ways:
(1) It always holds that R = R[PC] JOIN R[PU] JOIN R[CU] where R[PC] denotes the projection of R on the columns P and C, etc.
Bascially this says that the associated decomposition is information preserving, and you already saw that this is not always the case. Be sure to check that you have to do *two* joins, not just one.
A reformulation of (1) that some find easier to understand is the following:
(2) If there is a pair (p,c) in R[PC], a pair (p,u) in R[PU] and a pair (c,u) in R[CU] then the triple (p,c,u) appears in R
So it should hold that if a certain parameter p is associated with a classification c, and this parameter is also associated with a URL u, and that URL is itself associated with classification c, then it must hold that (p,c,u) is necessarily in the relation.
Since you got spurious tuples when joining everything back together, that latter statement is apparently not true, and the JD does not hold.
Congratulations, you have reached 5NF. :-)
-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 16:17 GMT Thank you for the information, I really appreciate you taking the time to explain. I can now be confident that leaving everything in the one table is correct :) Out of curiousity, I have seen data sets where there is a join dependency due to a cyclic constraint, but I'm not sure what exactly constitutes a cyclic constraint. I thought my dataset had one, but apparently not. How does one go about determining a cyclic constraint? I apologize for all the questions, but it's the only way I learn and I find it very interesting. Thanks again.
Jan Hidders - 16 Jun 2005 20:13 GMT > Out of curiousity, I have seen data sets where > there is a join dependency due to a cyclic constraint, but I'm not sure > what exactly constitutes a cyclic constraint. I thought my dataset had > one, but apparently not. How does one go about determining a cyclic > constraint? Do you want to know how to recognize if a certain join dependency holds, or do you want to know how to check if a certain JD that you found is cyclic? But first, perhaps, is it precisly clear to you what a JD is?
-- Jan Hidders
stowellt@gmail.com - 16 Jun 2005 20:39 GMT Good questions. Please correct me if I'm wrong, but isn't a join dependency when you are able to join decomposed tables on common keys to get the original table? In my case, I thought I had found a join dependency because most of the time, if I joined the decomposed tables together I got the original back. But every so often a spurious tuple showed up. Would this be an example of checking if a JD held? (and it didn't hold when the spurious tuple showed up)? Is the only way to check this by trial-and-error?
So if there would have been no spurious results, then I assume the JD would have held and thus I should have decomposed. I'm just not sure why my data set didn't have the JD when I have seen similar sets of data (with three inter-dependant columns) that did have JD's. Are all JD's cyclic? Thanks, I appreciate the explanations.
Jan Hidders - 17 Jun 2005 18:43 GMT > Please correct me if I'm wrong, but isn't a join > dependency when you are able to join decomposed tables on common keys > to get the original table? Almost. The common columns on which you join don't have to be keys.
> In my case, I thought I had found a join > dependency because most of the time, if I joined the decomposed tables > together I got the original back. But every so often a spurious tuple > showed up. Would this be an example of checking if a JD held? Yes. That is precisely what it is.
> (and it > didn't hold when the spurious tuple showed up)? Is the only way to > check this by trial-and-error? In practice, yes, but if you have very good analytical and mathematical skills then it is also possible to recognize the constraints that imply this. They usually have the form that I already mentioned; loosely speaking something like "if certain separate combinations of subsets of columns appear in the relation then the full combination also must appear".
> So if there would have been no spurious results, then I assume the JD > would have held and thus I should have decomposed. No, not always, only if the JD that you found is not implied by the candidate keys. How can you check this? You use the following very simple algorithm:
Input: - A set of candidate keys { K_1, ..., K_n } - A join dependency JD { C_1, ..., C_m } where C_1, .., C_m are sets of column names
Steps: 1. Let H be the set { C_1, ..., C_m } 2. If for two different sets C_i and C_j in H the intersection of C_i and C_j contains the candidate key K_p, then replace C_i and C_j in H with the union of C_i and C_j 3. Repeat step 2 until there are no more two different C_i and C_j with a common candidate key 4. If one of the sets in the resulting H contains all the column names of the relation then the answer is yes, otherwise no.
To give a small example: let for the relation R(A,B,C,D,E) the candidate keys be AB and C, and assume that the JD { ABC, ABE, CE } holds. Is it implied?:
step 1. H = { ABC, ABD, CE } step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB) step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C) step 3. no more two distinct components with common cand. key step 4. The JD is implied by the cand. keys because ABCDE is in H
So even though this join dependency holds, you do not have to split.
Exercise for the reader: explain why trivial JDs (i.e. JDs where at least one component contains all column names) are according to the algorithm above implied by any set of candidate keys, including the empty set.
> I'm just not sure > why my data set didn't have the JD when I have seen similar sets of > data (with three inter-dependant columns) that did have JD's. Hard to say. In my experience JDs that are not simply FDs or MVDs are pretty rare, and JDs that are not implied by the candidate keys even more so. So my guess would be that actually what you saw where FDs or MVDs but weren't recognized as such. But that's just a guess.
> Are all JD's cyclic? No. If you really want me to, I could explain to you what it means (it refers to the acyclicity of the hypergraph being described by the JD) but actually it is irrelevant for the decision of splitting or not. For that you only have to know if the JD is implied by the candidate keys, as I explained above.
-- Jan Hidders
Jan Hidders - 17 Jun 2005 20:15 GMT > To give a small example: let for the relation R(A,B,C,D,E) the candidate > keys be AB and C, and assume that the JD { ABC, ABE, CE } holds. Is it > implied?: Oops. The JD should be (see below) JD {ABC, AB*D*, CE}
> step 1. H = { ABC, ABD, CE } > step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB) [quoted text clipped - 3 lines] > > So even though this join dependency holds, you do not have to split. -- Jan hidders
stowellt@gmail.com - 20 Jun 2005 16:08 GMT Just wanted to say thanks again for the info. I will try to put it into practice in future designs.
Jan Hidders - 17 Jun 2005 20:22 GMT Ahem. *cough* Just checking if you guys were paying attention. :-)
> step 1. H = { ABC, ABD, CE } > step 2. (#1) H = { ABCD, CE } (because ABC and ABC share the c.k. AB) ^^^ Should be "ABD"
> step 2. (#2) H = { ABCDE } (because ABCDE and CE share the c.k. C) ^^^^^
Should be "ABCD"
> step 3. no more two distinct components with common cand. key > step 4. The JD is implied by the cand. keys because ABCDE is in H > > So even though this join dependency holds, you do not have to split. That's still correct. :-)
-- Jan Hidders
Jan Hidders - 15 Jun 2005 23:56 GMT > I have spent countless hours searching the net to find the answer, but > am still confused. I have a db structure where I have decomposed a [quoted text clipped - 4 lines] > results. Do I have the wrong idea about this, or if not, what am I > doing wrong? You are only supposed to split if there is a join-dependency that does not follow from the key constraints. If this is not the case then you are already in 5NF. It sounds like you are decomposing your relation without ever checking if there was a join-dependency that required this. In fact, if there are spurious tuples then that actually proves (pretty much by definition) that your decomposition does not correspond to a join-dependency.
-- Jan Hidders
|
|
|