Database Forum / General DB Topics / DB Theory / February 2008
Principle of Orthogonal Design
|
|
Thread rating:  |
JOG - 09 Jan 2008 01:51 GMT I was wondering what your current stances towards the principle of current design is cdt - info about the POOD is actually pretty sparse on google, which has not helped my own understanding. I gather that Date has realigned his opinion - although what to I know not - and that Darwen rejected the original POOD paper outright given that McGovern posits that:
R1 { X INTEGER, Y INTEGER } R2 { A INTEGER, B INTEGER }
violates the principle, whatever the relations' attribute names. Instinctively it does seem rather odd that a predicates such as:
* on Day:X the shop had noCustomers:Y * on Roll:A, the dice showed the Number:B
cannot share the same database. Have I interpreted the debate correctly? Any insights or corrections are, as ever, appreciated - POOD is certainly thought provoking, and the concept that an update need not require specifcation of a table name is an interesting one.
DM Unseen - 09 Jan 2008 12:49 GMT > I was wondering what your current stances towards the principle of > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 16 lines] > POOD is certainly thought provoking, and the concept that an update > need not require specifcation of a table name is an interesting one. My understanding is that the debate centers around typing.
Date advocates strong typing: ie X is of type "shop number" and A is a "roll number", both represent different aspects of the UOD so they have a different type (I disregard subtyping here for simplicity), *even if* the actual underlying representations (e.g. natural numbers) would be the same. Having different types, means the type engine can distinguish them.
DM Unseen
JOG - 10 Jan 2008 01:31 GMT > > I was wondering what your current stances towards the principle of > > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 25 lines] > would be the same. Having different types, means the type engine can > distinguish them. Hmmm, I see. I have always viewed a type as a set of values and a collection of allowable operations. If there exist two data types where these values and operations are the same, well as far as I can logically see, that makes them the same thing (even if they have two references).
Now I could follow the concept more if Date & McGovern weren't conflating the concept of attribute and type, and defined an attribute as a (role-name, type) pair, but as it stands is seems very odd to me.
> DM Unseen TroyK - 09 Jan 2008 20:37 GMT > I was wondering what your current stances towards the principle of > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 16 lines] > POOD is certainly thought provoking, and the concept that an update > need not require specifcation of a table name is an interesting one. I'm sure you've run across this exchange (since it seems to be the source of the example cited), but: http://www.dbdebunk.com/page/page/3010532.htm
From Pascal, addressing McGovern: "It is quite possible that the difference between you and CJD stems from the difference between his header/body view of a relation, which you (and recently me) do not espouse, and therefore differing definitions of a tuple."
Regarding the example you gave above, it seems the disagreement may be stemming, as you indicate, from a confusion over whether the types for the relations' attributes are declared from a syntactic domain or from a semantic domain. The example looks, to me, like the former. I would expect if it were the latter, domain names would more closely reflect that.
TroyK
TroyK - 09 Jan 2008 21:01 GMT > I was wondering what your current stances towards the principle of > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 16 lines] > POOD is certainly thought provoking, and the concept that an update > need not require specifcation of a table name is an interesting one. I thought I'd work up an example that illustrates part of McGovern's argument (as I understand it). Comments and corrections are most welcome:
Assuming we want to represent employees and their status as either salaried, commissioned, or both, we might come up with the following design (assuming the most obvious interpretations):
Table: Emp Emp# IsSalaried IsCommissioned ==== ---------- -------------- 1 Y N 2 N Y 3 Y Y
Another possible design could be: Table: Emp Emp# ==== 1 2 3
Table: SalariedEmp (FK, Emp# references Emp.Emp#) Emp# ==== 1 3
Table: CommissionedEmp (FK, Emp# references Emp.Emp#) Emp# ==== 2 3
Although we can derive the same information from each design, the former is to be prefered because the latter violates POOD (two tables with identical headers, even semantically speaking). Additionally, it requires that we inspect the tables' names in order to gather the full meaning of the propositions.
TroyK
JOG - 10 Jan 2008 02:23 GMT > > I was wondering what your current stances towards the principle of > > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 59 lines] > > TroyK I like this example Troy - It is both concise and clear. I was just mulling and came up with a possible schema for a "kennel club", that might help clarify some of the issues I'm seeing.
owners = {name, age, joined} PK(name) dogs = {name, age, joined} PK(name) ownership = {owner, dog} FK(owner, owners.name) FK(dog, dogs.name)
As it stand this is clearly not in POOD, and a proposition such as <Bess, 18, 01/08/08> is ambiguous. So it got me thinking about how we know in real life what a proposition refers to.
1) I know the context implicitly of <Bess, 18, 01/08/08> (from previous conversation, who i'm talking to, etc) 2) Surrounding text: <Bess, 18, 01/08/08> = "The dog called Bess is 18 years old and joined the kennel club on 01/08/08"
I don't think we can deign to rely on a computer ever being able to cope with implict context (leave that to the CYC people to bang there heads against ad infintum), so that appears to leave me with option 2. In turn, I see two ways of integrating this explicit surrounding context.
1) Via role names so, the proposition becomes: There is a dog with (dogsName:Bess) of (age:18) and who (joined: 01/08/08)
2) Via another attribute: There (is_a:dog) with (name:Bess) of (age:18) and who (joined: 01/08/08)
Which is preferable, and why that would be so I am uncertain. It would be easy to say "this is all just a design decision", but I'm starting to think there might be some sort of salient lesson in there. Somewhere. Maybe.
DM Unseen - 10 Jan 2008 12:32 GMT > > > I was wondering what your current stances towards the principle of > > > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 97 lines] > > - Tekst uit oorspronkelijk bericht weergeven - Could be something like
joiners= {name, age, joined, type} PK(name,type) where type in {'Owner','Dog'}
ownership = {owner, dog} FK(owner, joiners.name AND joiners.type='Owner') FK(dog, joiners.name AND joiners.type='Dog')
This could still be generalised further bit I think this would be in POOD.
And yes POOD implies "semantical" types and not syntactical. This could also imply that we are dropping too much information in the conversion from semantical (conceptual) models like NIAM/ORM to logical models like the relational models.
DM Unseen
TroyK - 10 Jan 2008 20:13 GMT > > > > I was wondering what your current stances towards the principle of > > > > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 117 lines] > > - Show quoted text - You make a good point about the possible lossy conversion from conceptual to logical (inasmuch as "encoding" information in the relations' names implies this loss).
I'm wondering if we should consider this type of POOD violation a non- issue, though. In SQL, we always need to specify which table we're talking about (both for updates and data retrieval), so that forces one to be specific about the propositions they're interested in. Even in a more pure RM implementation, we would need to reference which relvar in particular we're updating (or deriving from).
In practice, I use POOD more to inform my design decisions around constraint-based table/view partitioning.
TroyK
Brian Selzer - 13 Jan 2008 13:54 GMT >> > I was wondering what your current stances towards the principle of >> > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 57 lines] >> requires that we inspect the tables' names in order to gather the full >> meaning of the propositions. I think that the requirement that we inspect table names comes from the correlation between RM and predicate logic. In predicate logic, there are predicate symbols and there are individual symbols, and under an interpretation, both the predicate symbols and the individual symbols are assigned meaning. The two relations,
sine {x, y}
and
cosine {x, y}.
have the same heading, but have totally different meanings--even though the same individuals are exemplified in both whenever (x - pi / 4) modulus pi is zero. The tuple, {pi / 4, sqrt(2) / 2}, appears in both relations yet has a different meaning assigned to it from each predicate. Interestingly, though, there is still only one individual represented by that tuple even though it appears in both relations.
>> TroyK > [quoted text clipped - 33 lines] > to think there might be some sort of salient lesson in there. > Somewhere. Maybe. JOG - 13 Jan 2008 14:14 GMT > >> > I was wondering what your current stances towards the principle of > >> > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 71 lines] > > have the same heading, but have totally different meanings Interesting example. So sine and cosine as stated would break POOD (which appears intended to assist with view updates). To avoid breaking the POOD, we'd have either:
operation{ input:x, output, y, type:t } t E {sin, cos}
or,
sin {sin_input:x, sin_output:y} cos {cos_input:x, cos_output:y}
Well, my instinct tells me that the latter is kludgy given x plays exactly the same role in both predicates, and as such one would expect them to have exactly the same name (and who knows, that might be useful information). But then the former would generate enormous tables given it would have to incorporate every conceivable operation with an input and an output of the same type as used by sin and cos. Practically that appears to be undesirable. Theoretically I guess there is nothing wrong with it.
> --even though the > same individuals are exemplified in both whenever (x - pi / 4) modulus pi is [quoted text clipped - 40 lines] > > to think there might be some sort of salient lesson in there. > > Somewhere. Maybe. Brian Selzer - 13 Jan 2008 16:39 GMT >> >> > I was wondering what your current stances towards the principle of >> >> > current design is cdt - info about the POOD is actually pretty [quoted text clipped - 76 lines] > Interesting example. So sine and cosine as stated would break POOD > (which appears intended to assist with view updates). ... Wasn't POOD intended to eliminate redundancy across relations? For example, in the two following relations,
ES {EmployeeNumber, EmployeeName, Salary} EC {EmployeeNumber, EmployeeName, Commission}
where EmployeeNumber is the key, the dependent attribute, EmployeeName, is redundant for employees that receive both salary and commission.
So because EmployeeNumber --> EmployeeName appears in both and "means the same thing" in both, it is a violation of POOD. I submit that it would not be a violation of POOD if there were a database constraint,
IS_EMPTY(ES JOIN EC {EmployeeNumber})
Which would mean that employees couldn't receive both a salary and a commission. Since it would not then be possible for an individual to be exemplified by both ES and EC, there isn't any overlapping meaning, so there isn't a POOD violation.
For a similar example, consider the following database schemata,
Case 1:
PS {Part#, Serial#, Warehouse, Bin} KEY {Part#, Serial#}
PSL {Part#, Serial#, Warehouse, Bin, Lot#} KEY {Part#, Serial#}
IS_EMPTY(PS JOIN PSL {Part#, Serial#})
In this case, PS is for the location of parts that are being tracked by serial number only, whereas PSL is for the location of parts that are being tracked by both lot and serial number.
Note that {Part#, Serial#} --> {Warehouse, Bin} appears in both.
Case 2:
PS {Part#, Serial#, Warehouse, Bin, HasLot} KEY {Part#, Serial#} HasLot IN {'Yes', 'No'}
PSL {Part#, Serial#, Lot#} KEY {Part#, Serial#} FOREIGN KEY {Part#, Serial#} REFERENCES PS
In this case, PS is for the location of parts that are being tracked by serial number or by lot number and serial number, PSL is for recording the lot numbers for parts that are being tracked by both lot and serial number, and there is an explicit indication in HasLot as to whether there should be a tuple in PSL for parts that are being tracked by both lot and serial.
I think that Case 1 does not violate POOD due to the database constraint,
IS_EMPTY(PS JOIN PSL {Part#, Serial#}).
This database constraint limits the possible interpretations of the predicates of PS and PSL so that the meanings of {Part#, Serial#, Warehouse, Bin} in PS and PSL must be different.
I prefer case 1 because in it the distinction between parts that are being tracked by serial number only and parts that are being tracked by both lot number can be determined due to the schema only, whereas in case 2, that distinction can only be interpreted by examining the value of HasLot.
> ... To avoid > breaking the POOD, we'd have either: [quoted text clipped - 62 lines] >> > to think there might be some sort of salient lesson in there. >> > Somewhere. Maybe. Jonathan Leffler - 14 Jan 2008 06:35 GMT >[...] >> I think that the requirement that we inspect table names comes from the [quoted text clipped - 22 lines] > sin {sin_input:x, sin_output:y} > cos {cos_input:x, cos_output:y} In its strongest form, POOD discounts names of attributes as well as names of relations - so the 'or' option would not pass muster. The first option mostly passes POOD - but not the real world test.
The general intention is that presented with a tuple, using just the types, the DBMS should be able to determine which table the row should be inserted in.
The sine and cosine tables are fairly compelling examples.
Another that I find compelling as a counter-example (illustrating that POOD is not the be-all and end-all of DB design) is a Bill of Materials. It is kinda important to know which value is the 'part that is built' and which is the 'part that is used to build'. And both, by definition, have to be from the same domain of part numbers.
[Aside: I think there's a typo in the 'operation' example; the comma separating output and y should probably be a colon. And then I'm not clear but it seems that the x and y in the examples are types, whereas in the original mention of sine and cosine, they were attribute names of a common type.]
> Well, my instinct tells me that the latter is kludgy given x plays > exactly the same role in both predicates, and as such one would expect [quoted text clipped - 4 lines] > Practically that appears to be undesirable. Theoretically I guess > there is nothing wrong with it. As an occasionally useful twist on DB design, I consider POOD worth having at the back of my mind. However, I don't think it is practically of much significance -- even if the DBMS were able to tell where to insert data when presented with a tuple. In my opinion, ignoring attribute names is not healthy.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/
publictimestamp.org/ptb/PTB-2270 haval 2008-01-14 06:00:04 A3E3F8DD3E3BD5472410A4CC430A01CE
DM Unseen - 14 Jan 2008 10:17 GMT > >> > I was wondering what your current stances towards the principle of > >> > current design is cdt - info about the POOD is actually pretty sparse [quoted text clipped - 116 lines] > > - Tekst uit oorspronkelijk bericht weergeven -- Tekst uit oorspronkelijk bericht niet weergeven - Mmm,
I'm not sure using functions is going to help us to understand POOD, beacause it is linked to the semantical meaning of types, not the syntactical meaning. This is absent in mathematical functions.
DM Unseen using relations to simulatie functions
Brian Selzer - 17 Jan 2008 14:14 GMT On 13 jan, 14:54, "Brian Selzer" <br...@selzer-software.com> wrote: [snip]
>> I think that the requirement that we inspect table names comes from the >> correlation between RM and predicate logic. In predicate logic, there are [quoted text clipped - 17 lines] >> though, there is still only one individual represented by that tuple even >> though it appears in both relations. [snip]
>Mmm, > [quoted text clipped - 4 lines] >DM Unseen >using relations to simulatie functions The functions were simple examples that everyone here should have been able to recognize, but the properties exhibited by these mathematical functions can also be exhibited by relations that are not also mathematical functions. For example, suppose that you have a relation for which machines an employee knows how to run, and an relation for which machines each employee is actually running:
MachinesEmployeesAreTrainedOn {Employee, Machine}
and
MachinesEmployeesAreRunning {Employee, Machine}
where in each case Employee draws its values from the domain Employees, Machine draws its values from the domain Machines, and the key is the entire heading.
An employee can be trained on zero to many machines, and an employee can be running zero to many machines. An employee can be trained on a machine that they are not running--there can be a tuple in MachinesEmployeesAreTrainedOn without a tuple in MachinesEmployeesAreRunning, an employee may be being trained on a machine--there can be a tuple in MachinesEmployeesAreRunning without one in MachinesEmployeesAreTrainedOn, or an employee may be running a machine that they're trained on--there can be a tuple in MachinesEmployeesAreRunning that is identical to a tuple in MachinesEmployeesAreTrainedOn.
As in the function example, the above relations have the same heading, but different meanings. So if the relation name corresponds to the predicate symbol and the values from the domains correspond to individual symbols, then under an interpretation, only part of the meaning can come from individual assignments: the balance must be from the assignment of meaning to the predicate symbol. This calls into question the idea that it should be possible to determine which relation an inserted tuple is destined for.
Jan Hidders - 17 Jan 2008 17:08 GMT > This calls into question the idea that it should > be possible to determine which relation an inserted tuple is destined for. Apologies for being lazy, but could someone explain to me in a nutshell why this should be possible at all? At first sight this looks like complete nonsense to me.
-- Jan Hidders
JOG - 18 Jan 2008 01:57 GMT > > This calls into question the idea that it should > > be possible to determine which relation an inserted tuple is destined for. > > Apologies for being lazy, but could someone explain to me in a > nutshell why this should be possible at all? At first sight this looks > like complete nonsense to me. Well, as far as I gather, being able to determine a unique predicate for any proposition being inserted into the database is desirable in order to allow view updates to be more easily be translated to changes in underlying base relations.
I cannot claim to fully understand the reasoning behind this, but view updating hence appears to have been POOD's underlying motivation. There is a relatively old draft paper by Date and McGoveran at living at http://www.dbdebunk.com/page/page/622331.htm which might be more illuminating.
> -- Jan Hidders paul c - 18 Jan 2008 03:13 GMT ...
>> Apologies for being lazy, but could someone explain to me in a >> nutshell why this should be possible at all? At first sight this looks [quoted text clipped - 5 lines] > in underlying base relations. > ... POOD seems too-little discussed to me. Maybe POFN too.
I'm glad the example was about machines because I like the mechanical view, saves a lot of time when I've forgotten to charge the batteries for my mystic-meter. In an ideal world where every design decision was considered against every attribute/role name, I think the example would be considered a straw man - if it were up to me, I wouldn't allow those two tables/relvars to use the same attribute names.
I realize there are people here who might say what's wrong with same role names because the RM doesn't forbid them. All that proves is that there are naive people here as well as elsewhere! I was just laughing over one of CJ Date's latest examples about the tuple that states that the supplier is not in Paris. I think one must cater one's design towards the mechanical advantage a "machine" offers and using the same names seems a bit thoughtless to me.
It might look neat for defining terse union views without having to resort to some kind of 'rename' macro operator baggage but I have a feeling avoiding a rename is too cute and will bite one back, sooner or later. Maybe rename also gives some kind of legitimacy for inserting to both 'sides' of a union view?
Jan Hidders - 18 Jan 2008 12:51 GMT > > > This calls into question the idea that it should > > > be possible to determine which relation an inserted tuple is destined for. [quoted text clipped - 13 lines] > athttp://www.dbdebunk.com/page/page/622331.htmwhich might be more > illuminating. It mentions the view update problem but doesn't really explain how it is connected.
Anyway, the stronger POOD that requires that headers are distinct sounds like nonsense to me. Why would R(A, B) be a worse design than R(R_A, R_B)?
The weaker POOD looks more interesting to me. I even found a published paper about it:
http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf
What's interesting is that it seems to address the problem of normalization at the schema level, so not just restricted to one relation, as is usual in normalization theory. You might even formulate a new normal form at that level:
A schema is said to be in HNF if all database dependencies follow logically from the dependencies at relation level (so domain, tuple and relation constraints) and the inclusion dependencies.
So basically it says that inclusion dependencies are the only needed dependencies at database level. Note the similarity with 5NF (all join dependencies logically follow from the key dependencies) which says that the key constraints are the only needed constraints at relation level. That similarity is not a coincidence. In both cases we are eliminating redundancy. To get to a real normalization theory you would have to identify more concretely the type of database constraints / dependencies we are talking about.
What POOD has to do with HNF is left to the reader as an exercise. ;-)
-- Jan Hidders
mAsterdam - 18 Jan 2008 14:20 GMT >>>> This calls into question the idea that it should >>>> be possible to determine which relation an inserted tuple is destined for. [quoted text clipped - 23 lines] > > http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf Unfortunately the link times out.
> What's interesting is that it seems to address the problem of > normalization at the schema level, so not just restricted to one [quoted text clipped - 15 lines] > > What POOD has to do with HNF is left to the reader as an exercise. ;-) -- What you see depends on where you stand.
Jan Hidders - 18 Jan 2008 14:51 GMT > >>>> This calls into question the idea that it should > >>>> be possible to determine which relation an inserted tuple is destined for. [quoted text clipped - 25 lines] > > Unfortunately the link times out. Hmm, not for me. But to help you out:
http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf
-- Jan Hidders
mAsterdam - 18 Jan 2008 19:49 GMT ...
>>> Anyway, the stronger POOD that requires that headers are distinct >>> sounds like nonsense to me. Why would R(A, B) be a worse design than [quoted text clipped - 7 lines] > > http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf Thank you for helping me out by providing the document.
I started reading "Extended Principle of Orthogonal Design" by Erki Eessaar
User defined datatypes (UDT) give the designer of a database more decisions, more room for wrong decisions. Row-, array-, reference-, multiset collection types - choices, choices, choices. How to deal with that freedom? Enter the Principle of Orthogonal Design (POD): If you have a tuple-type, make sure to have only one base relvar for recording that tuple-type. Nonloss decomposition is explained briefly.
And here is where I lost Date & McGoveran (or Erki Eessaars interpration, I don't know) in Chapter 2): "The meanings of R1A(t) and R1B(t) are said to overlap iff it is possible to construct some tuple t so that R1A(t) and R1B(t) are both true".
Note that it does /not/ say 'all tuples t', but 'some tuple t', So it does, in particular, /not/ exclude the possibilty that R1A(t') is true and R1B(t') is not true or vice versa, in other words, that R1A and R1B may be mutually independent. With this trick, meaning is forced into synonymity with the signature of the relation. I call BS. Guidelines to reduce design choices are welcome, but not this one.
The extended version (Chapter 4.2.) implicitly uses the same criterion for detecting "overlapping meanings".
Brian Selzer (elsewhere in this thread) did a good job by providing a sensible situation with two relations with the same signature but distinct, independent meanings.
Let's not extend this.
-- What you see depends on where you stand.
Jan Hidders - 19 Jan 2008 12:06 GMT > ... > >>> Anyway, the stronger POOD that requires that headers are distinct [quoted text clipped - 22 lines] > If you have a tuple-type, make sure to have only one base > relvar for recording that tuple-type. Hmm, I would call that the strong POOD (or strong POD), and that, I would agree, seems to make no sense.
> Nonloss decomposition is explained briefly. > [quoted text clipped - 11 lines] > With this trick, meaning is forced into synonymity with the > signature of the relation. No, no, not exactly. It could be that there are tuple constraints that don't allow you to construct the tuple in question. Say you have R(a,b) and S(a,b) and for R the tuple constraint that b > 5 and for S the tuple constraint that b <= 5. In that case you cannot construct a tuple that is both in R and S, but they still have the same header.
What they probably should have said is: R and S are said to have overlapping meaning if it does not follow from the constraints / dependencies that the intersection of R and S is always empty.
Note that I'm talking about dependencies and not the relvar predicates as they do, but the intent is the same. So this, or rather the design principle based on this definition, I would call the weak POOD. Does that make more sense to you?
-- Jan Hidders
mAsterdam - 19 Jan 2008 15:55 GMT >> ... >>>>> Anyway, the stronger POOD that requires that headers are distinct [quoted text clipped - 10 lines] >> I started reading "Extended Principle of Orthogonal Design" >> by Erki Eessaar below: EE
>> User defined datatypes (UDT) give the designer of a database >> more decisions, more room for wrong decisions. [quoted text clipped - 7 lines] > Hmm, I would call that the strong POOD (or strong POD), and that, I > would agree, seems to make no sense. We agree on that, that is clear. So ok, but - from the huge category of easily asked but hard to answer questions: - why?
Why doesn't the (strong) PoOD make sense to you?
It does not make sense to me, and I am putting some effort into finding out why it doesn't. The track I am on now gives rejection because of inappropriate equalization of meaning with form (i.c. the heading). This track does not give me any distinction between the validity of the strong (EE: original) and the weak (EE: extended) PoOD, because both build on the sentence quoted hereunder.
>> EE: Chapter 2: >> "The meanings of R1A(t) and R1B(t) are said to overlap [quoted text clipped - 13 lines] > the tuple constraint that b <= 5. In that case you cannot construct a > tuple that is both in R and S, but they still have the same header. Which could be rephrased as different B domains for R and S (so that R, S becomes an acceptable schema to PoOD), but I won't for the sake of argument.
> What they probably should have said is: R and S are said to have > overlapping meaning if it does not follow from the constraints / > dependencies that the intersection of R and S is always empty. Maybe, but it would not take away my objection. As long as R\S and S\R are allowed to be non-empty, R and S are independent, regardless of their heading.
> Note that I'm talking about dependencies and not the relvar predicates > as they do, but the intent is the same. So this, or rather the design > principle based on this definition, I would call the weak POOD. Does > that make more sense to you? Maybe stating the obvious: not really.
I'll try to elaborate:
Design case: S and R are sets of tuples we are trying to schematize:
If we do not allow R\S and S\R to be non-empty, the R,S design will be rejected, because R and S represent the same set, not because of (weak or strong) PoOD. If we allow R\S and S\R to be non-empty, R and S are completely independent. If we allow only one of R\S and S\R to be non-empty, one of R and S is a subset of the other.
Wether a tuple in S carries the same meaning as or a different meaning than an identical tuple in R is not based on their having the same signature (heading if you prefer).
This changes when we apply DM Unseen's 'object to role transformation': Stick the original relation name in a role attribute, like so:
T(role, a, b), role∈{'R','S'},
which forces any stuff together - based on shape, not on meaning.
-- What you see depends on where you stand.
Jan Hidders - 20 Jan 2008 12:40 GMT > >> ... > >>>>> Anyway, the stronger POOD that requires that headers are distinct [quoted text clipped - 30 lines] > > Why doesn't the (strong) PoOD make sense to you? It is at the same time too strong and too weak. It is too strong because it forbids cases where there is in fact no redundancy, and at the same time allows cases where there is redundancy. Moreover, you can always trivially satisfy it by renaming R(a,b) to R(r_a,r_b). How exactly does that improve the design?
> It does not make sense to me, and I am putting some effort > into finding out why it doesn't. The track I am on now gives [quoted text clipped - 3 lines] > of the strong (EE: original) and the weak (EE: extended) > PoOD, because both build on the sentence quoted hereunder. Note that my definition differs in an important way from theirs. I forgot to emphasize this the last time.
> >> EE: Chapter 2: > >> "The meanings of R1A(t) and R1B(t) are said to overlap [quoted text clipped - 17 lines] > R, S becomes an acceptable schema to PoOD), but I won't for the sake > of argument. For the sake of the argument, please do. :-) It would allow me to reply with the following: In that case there is still a counterexample, namely if there is a tuple constraint a < b for R and a >= b for S. Of course if you want, you can redefine the notion of header such that this is also included.
> > What they probably should have said is: R and S are said to have > > overlapping meaning if it does not follow from the constraints / [quoted text clipped - 3 lines] > As long as R\S and S\R are allowed to be non-empty, > R and S are independent, regardless of their heading. In that case you still might have redundancy. If you decompose in to the following three, R' = R/S, RS' = R intersect S, S' = S/R, then you have removed that redundancy. That's the motivation if the definition of "overlapping meaning" that I gave.
Actually, to really remove all such redundancy one would would need a stronger notion of "overlapping meaning", so we you can also deal with overlap modulo renaming, but I don't want to complicate things too much at this point.
-- Jan Hidders
JOG - 20 Jan 2008 21:35 GMT > > >> ... > > >>>>> Anyway, the stronger POOD that requires that headers are distinct [quoted text clipped - 36 lines] > can always trivially satisfy it by renaming R(a,b) to R(r_a,r_b). How > exactly does that improve the design? Well this is an interesting question, because I am yet to see any reason as to why it be beneficial, but can see how it makes life harder given that it prevents a union without a preceding rename.
I find the example of phone numbers useful (a scenario also discussed on TTM lists a while back): for a selection of propositions discussing people's work, home and mobile numbers is it possible to state a consistent preference for a schema from:
1) Contacts = {phone_type, name, number} 2) Home = {name, number} Work = {name, number} Mobile = {name, number}
I find it interesting that the PoOD recommends 1, the option that I believe most designers would intuitively go for, but that in its 'strong form' also suggests that 2 is perfectly satisfactory if roles are simply renamed as:
Home = {name, home_number} Work = {name, work_number} Mobile = {name, mobile_number}
Without any evidence of improvement this might confer to view updating, I certainly see little sense therein. However, I wonder if there is not a preferable guideline that also recommends 1 along the lines of eliminating application bias (a Principle of Bias Elimination perhaps). Such a guideline would also reject other feasible schema such as,
Toms_numbers = {type, number} Franks_numbers = {type, number} etc...
that one might never realistically consider, but differ little conceptually to a Home, Work, Mobile divisions, suggesting instead that if it is possible for propositions to be collectivized under a single predicate, then they should be?
Either way, I certainly find that appealing to notions of "meaning" within formal design recommendations seems to head towards very slippery ground.
> > It does not make sense to me, and I am putting some effort > > into finding out why it doesn't. The track I am on now gives [quoted text clipped - 54 lines] > > -- Jan Hidders Bob Badour - 20 Jan 2008 23:23 GMT >>>>>... >>>>> [quoted text clipped - 63 lines] > Work = {name, work_number} > Mobile = {name, mobile_number} The RM proponents I have read are all strongly opposed to encoding meaning in attribute names, relation names etc., and were all opposed at the time folks were discussing POOD a lot.
I do not believe POOD ever suggests 2 is perfectly satisfactory.
[snip]
mAsterdam - 20 Jan 2008 23:53 GMT > ... (a scenario also discussed on TTM lists a while back) ... Is it alive? More than a year back I sent a 'request to subscribe' to ttm-subscribe@thethirdmanifesto.com but I never got a response. Is the address correct?
> Either way, I certainly find that appealing to notions of "meaning" > within formal design recommendations seems to head towards very > slippery ground. Jonathan Leffler - 21 Jan 2008 07:13 GMT >> ... (a scenario also discussed on TTM lists a while back) ... > > Is it alive? More than a year back I sent a > 'request to subscribe' to ttm-subscribe@thethirdmanifesto.com > but I never got a response. Is the address correct? The TTM list is alive and delivering emails most days.
From one of today's messages:
List-Help: <mailto:ttm-help@thethirdmanifesto.com> List-Unsubscribe: <mailto:ttm-unsubscribe@thethirdmanifesto.com> List-Subscribe: <mailto:ttm-subscribe@thethirdmanifesto.com> Reply-To: ttm@thethirdmanifesto.com
I don't know why you ran into problems a year ago, but I've been on the list since August 2005 without problem.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/
publictimestamp.org/ptb/PTB-2326 ripemd160 2008-01-21 06:00:04 0639216F2AD6EC81B98F21B9AE2308216FB8E46B
mAsterdam - 21 Jan 2008 08:13 GMT > The TTM list is alive and delivering emails most days. > [quoted text clipped - 7 lines] > I don't know why you ran into problems a year ago, but I've been on the > list since August 2005 without problem. Thank you. I retried and received a WELCOME email.
Brian Selzer - 21 Jan 2008 14:15 GMT >> > >> ... >> > >>>>> Anyway, the stronger POOD that requires that headers are distinct [quoted text clipped - 77 lines] > that if it is possible for propositions to be collectivized under a > single predicate, then they should be? I think that there are times when it makes sense to 'collectivize' under a single predicate, and there are times when it does not. In particular, if the individuals represented are just different kinds of the same sort of thing, such as is the case with the different kinds of phone numbers, then it makes sense to 'collectivize' those attributes that are common to all of those individuals under a single predicate. If, on the other hand, the individuals represented are different sorts of things that just happen to have the same attributes, then it doesn't. A disjunctive predicate--something like, for example, each tuple exemplifies an individual that is an X or a Y, just doesn't make any sense to me. A conjunctive predicate on the other hand--for a relation that has multiple keys, for example--is not a problem since each tuple would exemplify an individual that is /both/ an X /and/ a Y. So if the predicate can be stated so that it is not disjunctive, then 'collectivizing' is to be preferred.
> Either way, I certainly find that appealing to notions of "meaning" > within formal design recommendations seems to head towards very [quoted text clipped - 59 lines] >> >> -- Jan Hidders David Cressey - 21 Jan 2008 15:09 GMT > I think that there are times when it makes sense to 'collectivize' under a > single predicate, and there are times when it does not. In particular, if [quoted text clipped - 10 lines] > that is /both/ an X /and/ a Y. So if the predicate can be stated so that it > is not disjunctive, then 'collectivizing' is to be preferred. It seems to me that you are describing the generalization-specialization pattern. Am I reading you right?
> > Either way, I certainly find that appealing to notions of "meaning" > > within formal design recommendations seems to head towards very > > slippery ground. Why? If you called it "the semantics of the data" instead, would that make it less slippery ground?
JOG - 21 Jan 2008 17:58 GMT > > I think that there are times when it makes sense to 'collectivize' under a > > single predicate, and there are times when it does not. In particular, if [quoted text clipped - 24 lines] > Why? If you called it "the semantics of the data" instead, would that make > it less slippery ground? Absolutely not - worse in fact, because then it would be greased with buzzwords too ;) Years of working with the semantic web, ontologies, expert systems, etc have emphasised to me that "meaning" is only something that can be obtained via situated embodiement within the environment concerned, and that context is far too complex a beast to be tamed by computerized encoding. As such I'd rather rules appealed to functional dependencies and logical consequents than to appeal to the slippery notion of "meaning".
Jan Hidders - 22 Jan 2008 08:53 GMT > > "Brian Selzer" <br...@selzer-software.com> wrote in message > [quoted text clipped - 37 lines] > to functional dependencies and logical consequents than to appeal to > the slippery notion of "meaning". Hear hear. But I think the part of POOD that actually does make sense and really removes redundancy and update anomalies can be defined that way. For example, if there is an inclusion dependency between a relation R and S then it makes sense to say that there is overlap in meaning. You can generalize this by defining the notion of overlap such that there is overlap if there is an inclusion dependency between (1) the result of a selection of R and (2) S, or vice versa. Or even more general, if "a selection of R" is interpreted as any conjunctive query, possibly with equalities and inequalities, that selects a subset of R. In all these case it holds that if you violate the POOD rule that says that there can be no overlap in meaning between different components of non-trivial join dependencies, then there is indeed redundancy. Moreover, in those cases you can solve that redundancy by making the redundant part explicit in a relation by horizontal and vertical splits.
Does that cover all types of redundancy caused by dependencies? I suspect not. For that you probably need to generalize the rule further such that it not only looks at overlap between components of join dependencies but also between the result of arbitrary conjunctive queries and components of join dependencies. It's currently not clear to me under which circumstances you are then still sure that this redundancy can be removed.
-- Jan Hidders
mAsterdam - 22 Jan 2008 09:46 GMT >>>> I think that there are times when it makes sense to >>>> 'collectivize' under a single predicate, and there [quoted text clipped - 39 lines] > relation R and S then it makes sense to say that there is overlap in > meaning. ?
inclusion dependency --> meaning overlap
?
Slowly, please - where has the 'might' (from the redundancy in your other post) gone? And the 'qualified' in the inclusion dependency?
> You can generalize this by defining the notion of overlap > such that there is overlap if there is an inclusion dependency between > (1) the result of a selection of R and (2) S, or vice versa. ... Jan Hidders - 22 Jan 2008 13:18 GMT > >>>> I think that there are times when it makes sense to > >>>> 'collectivize' under a single predicate, and there [quoted text clipped - 43 lines] > > inclusion dependency --> meaning overlap Yes. An inclusions dependency implies meaning overlap. It is a sufficient condition, but not a necessary one.
> ? > > Slowly, please - where has the 'might' (from the redundancy in > your other post) gone? That applied to my improved version of EE's definition of meaning overlap. Under the definition I'm using here this "might" has become a "must".
> And the 'qualified' in the inclusion dependency? Saying that there is a qualified inclusion dependency between R and S is equivalent with saying that there is an inclusion dependency between (1) the result of a conjunctive query that selects tuples from R and (2) S, which I talked about in the subsequent generalization steps in the preceding posting. I though that a step by step introduction might make the concept and the reasoning behind it a bit easier to grasp.
Apologies if this is all a bit confusing, but I'm developing these ideas as I post, and I'm not sure yet what the best way is to explain them.
-- Jan Hidders
mAsterdam - 22 Jan 2008 18:57 GMT >>>>>>> ... I certainly find that appealing to notions >>>>>>> of "meaning" within formal design recommendations [quoted text clipped - 20 lines] > >> ? The easiest case of meaning overlap, then, should be with the the most simple form of inclusion dependency: RI, the foreign key. How does the meaning overlap with simple RI?
>> Slowly, please - where has the 'might' (from the redundancy in >> your other post) gone? > > That applied to my improved version of EE's definition of meaning > overlap. Under the definition I'm using here this "might" has become a > "must". Another change of the definition of meaning overlap? To what?
>> And the 'qualified' in the inclusion dependency? > [quoted text clipped - 5 lines] > introduction might make the concept and the reasoning behind it a bit > easier to grasp. I don't think the difficulties are in the genericity of the definition of the inclusion dependency. ISTM the problems are in the definition of 'meaning overlap'. You changed it twice since I mentioned my objection to Erki's definition.
> Apologies if this is all a bit confusing, but I'm developing these > ideas as I post, and I'm not sure yet what the best way is to explain > them. Making the changes in your used definitions stand out more would definitely help. It is very hard to keep track of what you consider invariant and variant to your line of reasoning if definitions change without mention.
Jan Hidders - 23 Jan 2008 01:45 GMT > >>>>>>> ... I certainly find that appealing to notions > >>>>>>> of "meaning" within formal design recommendations [quoted text clipped - 25 lines] > RI, the foreign key. > How does the meaning overlap with simple RI? Well, if you have R(a,b) and S(c,d) and a FK R[a,b] to S[c,d], then there is meaning overlap. The inclusion dependencies we are considering have to cover all the attributes of the involved relations.
> >> Slowly, please - where has the 'might' (from the redundancy in > >> your other post) gone? [quoted text clipped - 4 lines] > > Another change of the definition of meaning overlap? To what? Ok. For the time being I promise to stick to the following definition:
DEFINITION: Two relations R and S are said to have overlap in meaning if there is an inclusion dependency from all attributes of R to all attributes of S, or vice versa.
Once you are comfortable with this we can move on the the more subtle cases of meaning overlap. To be complete let met also fix the POOD rule for the moment:
DEFINITION: A schema is said to violate POOD if it contains two different relations R and S such that a component of a non-trivial join dependency of R overlaps in meaning with a component of a non- trivial join dependency of S.
So far so good?
-- Jan Hidders
mAsterdam - 23 Jan 2008 15:00 GMT >>>>>>>>> ... I certainly find that appealing to notions >>>>>>>>> of "meaning" within formal design recommendations [quoted text clipped - 26 lines] > considering have to cover all the attributes of the involved > relations. Your definiens is close to D&M's ¹², even the isomorphism requirement is now there ('cover all the attributes'), with an added application restriction to inclusion dependencies.
The defieniendum 'meaning overlap' is defined even narrower again. Let me try an analogy here. Our coffeemachine design and test team, defines 'coffee' as 'the substance we can put in the coffeeholder of our espresso machine without wrecking the machine.' Chris: "Let's try caustic soda." David: "No coffee, the pressure will get too high. Tea is good coffee, though." Chris: "When the beans are grinded too fine the stuff will clutter the sieve". Jan: "We'll have to exclude that, to."
It is meaning overlap, Jim, but not as we know it.
An inconvenient implication: "In other words, such an 'individual user database' ought not to include any views and/or base tables whose meanings overlap" ² together with your definiens gives: all foreign keys are out.
I'll repeat that: all foreign keys are out.
¹ "The Principle of Orthogonal Design" draft, Date & McGoveran part 1: http://www.dbdebunk.com/page/page/622331.htm ² idem part 2: http://www.dbdebunk.com/page/page/622312.htm
>>>> Slowly, please - where has the 'might' (from the redundancy in >>>> your other post) gone? [quoted text clipped - 19 lines] > > So far so good? Does PoODling flush bathwater, baby, or part of both? The jury isn't even out yet.
Jan Hidders - 24 Jan 2008 10:58 GMT > >>>>>>>>> ... I certainly find that appealing to notions > >>>>>>>>> of "meaning" within formal design recommendations [quoted text clipped - 30 lines] > requirement is now there ('cover all the attributes'), with > an added application restriction to inclusion dependencies. Correct. A small but very important difference.
> The defieniendum 'meaning overlap' is defined even > narrower again. Correct. I wanted to start simple. So you would like a wider definition? Your wish is my command:
DEFINITION: Two relations R and S are said to have overlap in meaning if there is a dependency that requires that sometimes some tuples of R are also in S after renaming the attribute names, or vice versa.
It's a bit hand-wavy for my taste, because it doesn't define the type of dependencies but it will probably do for the moment.
> An inconvenient implication: > "In other words, such an 'individual user database' ought > not to include any views and/or base tables whose meanings overlap" ² > together with your definiens gives: all foreign keys are out. Under their definition of the POOD rule, yes, but not under mine. Btw. normalization rules never should be applied to views, only to base tables, and it is a normalization rule we are talking about here.
Unfortunately I also just noticed a related flaw in my POOD rule, so I'm going to have to redefine it slightly. First some additional terminology:
DEFINITION: A join dependency is said to be a proper if it does not hold that any of its components is a subset of another component.
Now the rule:
DEFINITION: A schema is said to violate POOD if it contains relations R and S such that a component C of a proper join dependency of R overlaps in meaning with a component D of a proper join dependency of S, and either R and S are different, C and D are different, or both.
> > So far so good? > > Does PoODling flush bathwater, baby, or part of both? > The jury isn't even out yet. I suspect with the new POOD the baby is quite safe. :-)
-- Jan Hidders
mAsterdam - 25 Jan 2008 08:18 GMT > ... I wanted to start simple ... Even though this suggests otherwise, I'll risk assuming that you understood the coffee analogy, and snipped it for focus.
Let's do some coffee-machine refactoring,..
> DEFINITION: Two relations R and S are said to have overlap in meaning > if there is a dependency that requires that sometimes some tuples of R > are also in S after renaming the attribute names, or vice versa. ..and start simple ;-)
1. The 'renaming etc...' is only there to exclude trivialities, caused by a clumsy (at least for this coffee-machine) definition of tuple-types, clumsy because it includes attribute names. We'll have a definiton without attribute names (to be formulated later if necessary).
2. It is not at all clear a priori what if anything this has to with meaning. To avoid distraction by connotation we'll substitute the definiendum by one with less connotation, and see if we can substitute it back when the conclusions are clear. My first hunch was 'symbolic redundancy' (thinking of Neo) but that would be loaded as well. It is the chaff to be filtered out by the PoOD, so PoOD-chaff. Clean enough?
1. 'renaming etc...' -- out 2. PoOD-chaff
DEFINITION: Two relations R and S are said to have PoOD-chaff if there is a dependency that requires that sometimes some tuples of R are also in S.
> It's a bit hand-wavy for my taste, because it doesn't define the type > of dependencies but it will probably do for the moment. Ok.
Note: The isomorphy of R and S is implicit. The tuples can be in both R and S so R-tuples and S-tuples have to be of the same type.
> ...additional terminology: > > DEFINITION: A join dependency is said to be a proper if it does not > hold that any of its components is a subset of another component. Proper is a nicer than non-trivial.
> Now the rule: > [quoted text clipped - 8 lines] > > I suspect with the new POOD the baby is quite safe. :-) DEFINITION: A schema is said to violate POOD if it contains relations R and S such that a component C of a proper join dependency of R has PoOD-chaff with a component D of a proper join dependency of S, and either R and S are different, C and D are different, or both.
Now let's see if there is meaning overlap, that is, is it ok to s/PoOD-chaff/meaning overlap/ back? To check that, I'll first have to make sure I understand the definition.
It looks to me that in order to have this problem, we need to have two relations R and S, where 5NF is relevant in both, with isomorphy between components of R and components of S (which is quite imaginable when integrating databases in the same business domain, e.g. after a merger, or with similar but distinct types of complex contracts). Am I correct thusfar?
I don't have much experience with reasoning about 5NF. Aside: The reason for that may be, that whenever I come across the not immediately obvious, I start stating easily understood natural language sentences, expressing the elementary facts we are modeling. From consensus on those we'll formalize to predicates and check again. From these predicates the already normalized relations follow (except the naming, but there is strong support in the predicates).
Jan Hidders - 25 Jan 2008 16:25 GMT > > ... I wanted to start simple ... > > Even though this suggests otherwise, I'll risk assuming > that you understood the coffee analogy, and snipped it > for focus. I *think* I understood it, but I didn't want to get into a debate about it's appropriateness. My position is that in this case we don't really know precisely what this "coffee" thing is anyway. It is a rather vague and informal notion, so we can take the liberty to redefined in a way that seems to most practical for the time being. What matters is if we end up with a normal form that makes sense, not if we have precisely established what meaning overlap is. The latter is more a philosophical question anyway. But if this confuses you, then I have no problem with calling it something else.
> Let's do some coffee-machine refactoring,.. > [quoted text clipped - 9 lines] > We'll have a definiton without attribute names (to be formulated > later if necessary). No. I'm sorry, but that is not acceptable. Not allowing renaming in the tuples changes the properties of the normal form. If you want we can change from the named to the unnamed perspective where tuples are ordered unlabeled tuples, but then you will need to replace the end with "after reordering the elements of the tuple", so not much would be gained. Note that the definition is talking about tuples, not tuple types. Redefining the notion of tuple type would nog change anything. Redefining the notion of tuple would, but as I already said, then there is not much simplification.
Btw. not allowing relabeling / reordering in inclusion dependencies provably restricts their expressive power in ways that cannot be solved by picking better attribute names in your schema. So I would not say that these are "trivialities that are caused by a clumsy definition of tuple types".
> 2. It is not at all clear a priori what if anything this > has to with meaning. To avoid distraction by connotation [quoted text clipped - 4 lines] > that would be loaded as well. It is the chaff to be filtered out > by the PoOD, so PoOD-chaff. Not *all* overlap has to be removed, so I wouldn't always want to call it chaff. It would also be nicer if the name reflected that there is some kind of overlap or that the relations partially coincide and that this is caused by dependencies. Could we settle on "two relations have dependency-induced overlap" for the time being?
> 1. 'renaming etc...' -- out > 2. PoOD-chaff [quoted text clipped - 11 lines] > can be in both R and S so R-tuples and S-tuples have to be > of the same type. You mean that the headers are isomorphic, not the relations themselves, right? But what do you mean by isomorphic? Can I replace attribute names or domains or both? Actually, now that I think about it, whatever your definition, this is only true if you assume that different domains are always disjoint.
> > ...additional terminology: > > > DEFINITION: A join dependency is said to be a proper if it does not > > hold that any of its components is a subset of another component. > > Proper is a nicer than non-trivial. Note that it's semantically different. Not all proper join dependencies are non-trivial, and not all non-trivial join dependencies are proper.
> > Now the rule: > [quoted text clipped - 17 lines] > s/PoOD-chaff/meaning overlap/ back? To check that, I'll > first have to make sure I understand the definition. Ok.
> It looks to me that in order to have this problem, > we need to have two relations R and S, where 5NF is [quoted text clipped - 4 lines] > with similar but distinct types of complex contracts). > Am I correct thusfar? Roughly. The "5NF is relevant" is a bit misleading. All that is required is that there are proper join dependencies, but these are always present. For example, any relation R(a,b,c) will have always the trivial join dependency JD [{a,b,c}], i.e., the join dependency with one component that contains all attribute names in the header. Although trivial, it is also proper. So you can violate the rule, even if there is no non-trivial join dependency. A simple example would be as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates the rule because you also have for S the trivial and proper JD [{c,d}]. So do you think that there is redundancy here that can be removed?
Btw. the "with isomorphism between components" can be simplified to "with components of equal size". After all, components are just sets of attribute names, so they have equal size iff they are isomorphic.
> I don't have much experience with reasoning about 5NF. I actually don't think that is necessary. Whether the relations are in 5NF or not is not really irrelevant. But you understand what lossless decompositions are, no? And I assume you also understand how they are implied by functional dependencies. So then you might start first with considering cases where all the JDs are implied by FDs. For example, say we have R(a,b,c) and S(d,e,f) with POOD chaff / dependency-induced overlap between R[a,b] and S[d,e], for example because we have ID R[a,b] -> S[d,e]. We would then violate the rule if there is for R a JD [{a,b}, {a,c}] and for S a JD [{d,e}, {d,f}]. These JDs are implied if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in that case, do you think that there is redundancy that can be removed?
-- Jan Hidders
mAsterdam - 26 Jan 2008 21:06 GMT This is all way over my head. Jan, you may be used to sailing these waters; I am not. I think it is interesting, but very time-consuming. Am I talking rubbish? I try not to, but I don't know - nobody else chimes in. You keep replying, but hey, you are a nice guy :-)
I'll make some bold statements here anyway, but I can't keep this up.
>>> ... I wanted to start simple ... >> Even though this suggests otherwise, I'll risk assuming [quoted text clipped - 6 lines] > rather vague and informal notion, so we can take the liberty to > redefined in a way that seems to most practical for the time being... Informal as far as most of its nature is concerned, sure. Its existence, however, is neither vague nor informal. Without it, the coffee-machine loses its purpose. It is because of this, that pragmatical redefinitions must indeed be temporary and tracked.
From D&M's draft, ch 5 'The question of meaning', ( http://www.dbdebunk.com/page/page/622331.htm ): "... every table, be it a base table, a view, a query result, or whatever certainly does have an associated meaning. And, of course, users must be aware of those meanings if they are to use the database correctly and effectively."
>> Let's do some coffee-machine refactoring,.. >> [quoted text clipped - 15 lines] > with "after reordering the elements of the tuple", so not much would > be gained. Wether the relation between heading and tuples goes via names or ordering is relevant or not. If it is not I want it out of scope. Remarks by others make me doubt wether it is relevant or not.
Very early in this thread, ---- JOG wrote:
> DM Unseen wrote: >> My understanding is that the debate centers around typing. [quoted text clipped - 15 lines] > conflating the concept of attribute and type, and defined an attribute > as a (role-name, type) pair, but as it stands is seems very odd to me. ----
> Note that the definition is talking about tuples, not tuple > types. Redefining the notion of tuple type would nog change anything. That is simply not true. It just would not change the /wording/ of the definition. Redefining the notion of tuple type may change the notion of tuple.
> Redefining the notion of tuple would, but as I already said, then > there is not much simplification. [quoted text clipped - 4 lines] > not say that these are "trivialities that are caused by a clumsy > definition of tuple types". And the remedy is to include the attribute names in the tuple-types? So it seems. D&M draft: "...type-compatibility as we define it requires the two tables to have identical column names." I am trying to be careful about the side-effects. The example below shows that they may not be nice.
>> 2. It is not at all clear a priori what if anything this >> has to with meaning. To avoid distraction by connotation [quoted text clipped - 10 lines] > this is caused by dependencies. Could we settle on "two relations have > dependency-induced overlap" for the time being? ... Ok.
DEFINITION: Two relations R and S are said to have dependency-induced overlap if there is a dependency that requires that sometimes some tuples(1) of R are also in S.
(1) for some definition of tuples that allows restricted reshuffling of its values. To do.
>> Note: The isomorphy of R and S is implicit. The tuples >> can be in both R and S so R-tuples and S-tuples have to be >> of the same type. > > You mean that the headers are isomorphic, not the relations > themselves, right? No, that is not what I meant. In the vocabulary I am more or less used to: when the headers are isomorph, the relations are. Iff, even.
> But what do you mean by isomorphic? > Can I replace attribute names or domains or both? Attribute name replacement is irrelevant in this.
Maybe an example helps.
Three relations, BOM, MOB and WIRE. +---------------------------------------------+
|Name: BOM | |Predicate: Part PART# is part of IS_PART_OF# | |Key: (PART#, IS_PART_OF#) | +---------------------------------------------+
|Domain: |PTNUM | PTNUM | |Attribute: |PART# | IS_PART_OF# | +---------------------------------------------+
|Values: 2 1 | | 3 1 | | 4 1 | | 4 5 | +---------------------------------------------+
MOB is a view on BOM with the attributes switched.
+--------------------------------------------------------+
|Name: WIRES | |Predicate: We have a wire to connect L_PART# to R_PART# | |Key: (L_PART#, R_PART#) | +--------------------------------------------------------+
|Domain: |PTNUM | PTNUM | |Attribute: |L_PART# | R_PART# | +--------------------------------------------------------+
|Values: 2 3 | | 3 2 | | 4 3 | | 4 5 | +--------------------------------------------------------+
BOM and WIRES are isomorph, BOM and MOB are isomorph, WIRES and MOB are isomorph. The attributenames are irrelevant to this. Their isomorphism is determined by their tupletypes. All tuples of BOM, MOB and WIRES are of type (PTNUM, PTNUM).
This seems pretty clear and straightfoward to me, but it is not compliant with D&M's "...type-compatibility as we define it requires the two tables to have identical column names." Making it compliant I'll lose clarity for what? For /maybe/ a feature that can determine which tuples to touch based on their type alone.
> Actually, now that I think about > it, whatever your definition, this is only true if you assume that > different domains are always disjoint. How so?
>>> ...additional terminology: >>> DEFINITION: A join dependency is said to be a proper if it does not [quoted text clipped - 3 lines] > dependencies are non-trivial, and not all non-trivial join > dependencies are proper. Thanks, I have to read more carefully.
Trivial dependencies are the ones implied by the candidate keys. You include some of those in the problem space while leaving some non-trivial dependencies out based on wether any of its components is a subset of another component?
Let's see. This is because we are dealing with two relations S and R, and we have to make sure that we get all dependency-overlap, including the overlap of the trivial JD's.
Ok. I missed that.
>>> Now the rule: >>> DEFINITION: A schema is said to violate POOD if it contains relations [quoted text clipped - 31 lines] > with one component that contains all attribute names in the header. > Although trivial, it is also proper. Yep, I think I get that.
> So you can violate the rule, even > if there is no non-trivial join dependency. A simple example would be [quoted text clipped - 6 lines] > "with components of equal size". After all, components are just sets > of attribute names, so they have equal size iff they are isomorphic. Now I'm lost again.
>> I don't have much experience with reasoning about 5NF. > > I actually don't think that is necessary. Whether the relations are in > 5NF or not is not really irrelevant. But you understand what lossless > decompositions are, no? And I assume you also understand how they are > implied by functional dependencies. Up to BCNF.
> So then you might start first with > considering cases where all the JDs are implied by FDs. For example, [quoted text clipped - 4 lines] > if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in > that case, do you think that there is redundancy that can be removed? Jan Hidders - 27 Jan 2008 12:20 GMT mAsterdam schreef:
> This is all way over my head. Jan, you may be used to > sailing these waters; I am not. I think it is interesting, [quoted text clipped - 3 lines] > > I'll make some bold statements here anyway, but I can't keep this up. Your non-bold statements are also be highly appreciated. :-)
Too keep things manageable I'll try to trim the discussion heavily and keep what I think is really essential. So if I snip any of your comments it is not because I don't think they are interesting, but just that at the moment I think they are not at the core of the issue.
> >>> ... I wanted to start simple ... > >> Even though this suggests otherwise, I'll risk assuming [quoted text clipped - 12 lines] > It is because of this, that pragmatical redefinitions > must indeed be temporary and tracked. Sure, we agree on that. It's just that I think it is sufficient to say at the beginning something like "for this discussion we are going to define 'overlapping meaning' as .." and you want to be more careful and really use another word. As I already said, I'm fine with that, so I don't think more discussion is needed here.
[... very big and painful snip ...]
> DEFINITION: Two relations R and S are said to have dependency-induced > overlap if there is a dependency that requires that sometimes some > tuples(1) of R are also in S. > > (1) for some definition of tuples that allows restricted > reshuffling of its values. To do. The only way to achieve (1) so that it also takes all normal inclusion dependencies into account is to define tuples as something equivalent to bags of values. Such an operation on its internal organs is going to change the relational model beyond recognition. I'm going to strongly insist that we stick to the classical definitions of the named perspective and state in the definition that we are talking about inclusion up to relabeling:
DEFINITION: Two relations R and S are said to have dependency-induced overlap if there is a dependency that requires that sometimes some tuples of R are also in S after renaming the attribute names.
[... another big and painful snip ...]
> >>> ...additional terminology: > >>> DEFINITION: A join dependency is said to be a proper if it does not [quoted text clipped - 8 lines] > Trivial dependencies are the ones implied by the > candidate keys. Not precisely. Trivial dependencies are those that hold in any schema. For example the FK a,b -> a is always true as long as you have attributes 'a' and 'b' in the header. Another example is the JD [ {a,b}, {b,c}, {a,b,c} ] if the attributes in the header are {a,b,c}. This is always a lossless decomposition, but also a very redundant one. Another is the JD [ {a,b,c} ].
> You include some of those in the problem space > while leaving some non-trivial dependencies out based on wether [quoted text clipped - 3 lines] > S and R, and we have to make sure that we get all > dependency-overlap, including the overlap of the trivial JD's. Yes. The intuition is that if there is a JD the relation is in fact a combination of one or more predicates namely the ones that correspond to each component of the JD. It is for these predicates that we want to check overlap. We want to explicitly include the case where it is actually just one, so we also want to consider the trivial JD [ {a, b, c} ] for the header {a, b, c}.
In that terminology you can also perhaps understand why we only care about proper JDs. Suppose we have JD [ {a, b}, {b, c} ]. Then it follows that the following JD also holds: JD [ {a}, {a, b}, {b, c} ]. I think you'll understand that if the first defines a lossless decomposition, then so does the second for the simple reason that it contains at least as much information as the first. But it is not true that the component {a} also necessarily corresponds to some meaningful predicate. In fact, it probably doesn't. So that is why we want to exclude those.
Having said that, there are actually more JD's that need to be excluded than I initially thought. For example, if the JD [ {a, b}, {b, c} ] holds then also the JD [ {a, c}, {a, b}, {b, c} ] holds. Also here there is not necessarily a meaningful intuitive predicate associated with the component {a, c}. So, at the risk of causing you a serious migraine I'm going to have to tweak my definitions a little.
DEFINITION: A join dependency is said to be minimal if there is not another join dependency with a set of components that is a proper subset of the set of components of the first.
Examples: If JD [ {a, b}, {b, c} ] holds then JD [ {a, b}, {b, c}, {a, c} ] is not minimal. The JD [ {a}, {a, b} ] is never minimal because the trivial JD [ {a, b} ] always holds.
Note that a minimal join dependency is always also a proper join dependency.
The new rule then becomes:
DEFINITION: A schema is said to violate POOD if it contains relations R and S such that a component C of a minimal join dependency of R overlaps in meaning with a component D of a minimal join dependency of S, and either R and S are different, C and D are different, or both.
I just realized by rereading your comments that there might be some confusion on the notion of "component". From now on I'll strictly use the word component for a set of attribute names in a join dependency. So I would like to make the wording in the rule a bit more explicit: (changed parts are emphasized with "_")
DEFINITION: A schema is said to violate POOD if it contains relations R and S such that _the projection of R_ on a component C of a minimal join dependency of R overlaps in meaning with _the projection of S_ on a component D of a minimal join dependency of S, and either R and S are different, C and D are different, or both.
> >> It looks to me that in order to have this problem, > >> we need to have two relations R and S, where 5NF is [quoted text clipped - 26 lines] > > Now I'm lost again. My fault. I thought you were talking about components of JDs.
> >> I don't have much experience with reasoning about 5NF. > > [quoted text clipped - 4 lines] > > Up to BCNF. Good. Let's for the moment restrict ourselves to just CKs. So say we have a relation R(a,b,c,d) with a CK {a,b}. What are the minimal join dependencies that hold?
And what if we have the CKs {a,b} and {b,c}?
So do you see for which projections we need to check overlap with other projections of other relations?
-- Jan Hidders
Jan Hidders - 27 Jan 2008 12:45 GMT > mAsterdam schreef: > [quoted text clipped - 128 lines] > overlaps in meaning with a component D of a minimal join dependency of > S, and either R and S are different, C and D are different, or both. Oops. The "overlaps in meaning" should of course read "has dependency- induced overlap". Please apply Hanlon's razor. ;-)
-- Jan Hidders
David Cressey - 27 Jan 2008 12:49 GMT > mAsterdam schreef: > > This is all way over my head. Jan, you may be used to > > sailing these waters; I am not. I think it is interesting, > > but very time-consuming. > > Am I talking rubbish? I try not to, but I don't know - nobody > > else chimes in. You keep replying, but hey, you are a nice guy :-) It's even further over my head.
> > >>> DEFINITION: A join dependency is said to be a proper if it does not > > >>> hold that any of its components is a subset of another component. [quoted text clipped - 14 lines] > This is always a lossless decomposition, but also a very redundant > one. Another is the JD [ {a,b,c} ]. I really need to understand the word "trivial" as used above in a more careful manner than I do.
Unfortunately, my intuitive grasp of the word "trivial" comes from hearing it used by engineers rather than mathematicians. When engineers use the word "trivial", there is almost always an overtone of snobbism present. A problem is "trivial" if it's unworthy of the attention of someone so important as the person speaking. It should be assigned to a "junior" person.
In mathematics, the "trivial case" is almost always easy, but easiness is not what relegates it to triviality. It's more like "orthogonality to the peculiarities of the item under discussion", or something like that.
So I need to understand "trivial dependencies", in a really tight fashion. My loose understanding of one form of trivial dependence is that a component of a composite key is trivially dependent on that key. In this case, "trivial" means, to me, that the dependency provides no additonal information about anything. The sort of thing that, in common parlance, would draw the response, "well, duh!"
But I don't have a general understanding of what makes trivial dependencies trivial.
Bob Badour - 27 Jan 2008 13:28 GMT >>mAsterdam schreef: >> [quoted text clipped - 50 lines] > But I don't have a general understanding of what makes trivial dependencies > trivial. I might suggest: "Inferable from the definition."
Presumably one can infer (calculate/prove) from the definition of FK that: FK a,b -> a
Or did he mean FD above instead of FK?
Jan Hidders - 27 Jan 2008 20:15 GMT > >>mAsterdam schreef: > [quoted text clipped - 52 lines] > > I might suggest: "Inferable from the definition." Yes. A good suggestion.
> Presumably one can infer (calculate/prove) from the definition of FK that: > FK a,b -> a > > Or did he mean FD above instead of FK? Yes, I did. Apologies for the typo.
-- Jan Hidders
Marshall - 27 Jan 2008 18:35 GMT > "Jan Hidders" <hidd...@gmail.com> wrote in message > > But I don't have a general understanding of what makes trivial dependencies > trivial. My understanding of "trivial" is that it has a very specific meaning in the context of FDs, namely that A -> B is trivial iff B is a subset of A.
Marshall
Jan Hidders - 27 Jan 2008 20:51 GMT > > mAsterdam schreef: > > > This is all way over my head. Jan, you may be used to [quoted text clipped - 26 lines] > I really need to understand the word "trivial" as used above in a more > careful manner than I do. Ok. I'll try to give it a shot.
So the general meaning is that a dependency is trivial if it holds in any schema in which it makes sense (i.e. that defines the used relation and attribute names) and independent of which constraints / dependencies have been specified.
For functional dependencies, for example, these are exactly all FDs X -
> Y such that Y is a subset of X. So FD a,b -> b is such an FD and FD a -> b is not. So the trivial FDs always hold, and the non-trivial ones only hold if they have been explicitly specified or if they follow from constraints that have been explicitly specified. The intuition behind the term "trivial" is that if you write down a complete list of the dependencies that hold, then you can always write down the trivial ones first, because these will hold, no matter what the specific situation is.
This notion has been generalized for all kinds of dependencies. For multi-value dependencies (MVDs) this characterization is a bit more complicated. An MVD X ->> Y is trivial iff Y is a subset of X or the union of X and Y is the header of the relation. So for R(a,b,c) the MVD a ->> b,c always holds.
For join dependencies it actually gets easier again. We know that a JD [ X1, .. , Xn ] is trivial iff one of the components Xi is the header of the relation. Clearly if one of the components of your decomposition is the whole original relation, the decomposition is going to be lossless.
> So I need to understand "trivial dependencies", in a really tight fashion. > My loose understanding of one form of trivial dependence is that a component > of a composite key is trivially dependent on that key. In this case, > "trivial" means, to me, that the dependency provides no additonal > information about anything. The sort of thing that, in common parlance, > would draw the response, "well, duh!" Yes, that intuition is roughly correct. And you can generalize this even beyond the types of dependencies I mentioned. As usually defined, all dependencies are a specific syntactical subclass of statements / formulas in first order logic. All trivial dependencies are then defined as the tautologies within that subclass.
Does that shed some light?
-- Jan Hidders
David Cressey - 27 Jan 2008 22:57 GMT > Yes, that intuition is roughly correct. And you can generalize this > even beyond the types of dependencies I mentioned. As usually defined, > all dependencies are a specific syntactical subclass of statements / > formulas in first order logic. All trivial dependencies are then > defined as the tautologies within that subclass.
> Does that shed some light? Yes, it does. Thanks.
|
|