Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / February 2008

Tip: Looking for answers? Try searching our database.

Principle of Orthogonal Design

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
JOG - 09 Jan 2008 01:51 GMT
I was wondering what your current stances towards the principle of
current design is cdt - info about the POOD is actually pretty sparse
on google, which has not helped my own understanding. I gather that
Date has realigned his opinion - although what to I know not - and
that Darwen rejected the original POOD paper outright given that
McGovern posits that:

R1 { X INTEGER, Y INTEGER }
R2 { A INTEGER, B INTEGER }

violates the principle, whatever the relations' attribute names.
Instinctively it does seem rather odd that a predicates such as:

* on Day:X the shop had noCustomers:Y
* on Roll:A, the dice showed the Number:B

cannot share the same database. Have I interpreted the debate
correctly? Any insights or corrections are, as ever, appreciated -
POOD is certainly thought provoking, and the concept that an update
need not require specifcation of a table name is an interesting one.
DM Unseen - 09 Jan 2008 12:49 GMT
> I was wondering what your current stances towards the principle of
> current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 16 lines]
> POOD is certainly thought provoking, and the concept that an update
> need not require specifcation of a table name is an interesting one.

My understanding is that the debate centers around typing.

Date advocates strong typing: ie X is of type "shop number" and A is a
"roll number", both represent different aspects of the UOD so they
have a different type (I disregard subtyping here for simplicity),
*even if* the actual underlying representations (e.g. natural numbers)
would be the same. Having different types, means the type engine can
distinguish them.

DM Unseen
JOG - 10 Jan 2008 01:31 GMT
> > I was wondering what your current stances towards the principle of
> > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 25 lines]
> would be the same. Having different types, means the type engine can
> distinguish them.

Hmmm, I see. I have always viewed a type as a set of values and a
collection of allowable operations. If there exist two data types
where these values and operations are the same, well as far as I can
logically see, that makes them the same thing (even if they have two
references).

Now I could follow the concept more if Date & McGovern weren't
conflating the concept of attribute and type, and defined an attribute
as a (role-name, type) pair, but as it stands is seems very odd to me.

> DM Unseen
TroyK - 09 Jan 2008 20:37 GMT
> I was wondering what your current stances towards the principle of
> current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 16 lines]
> POOD is certainly thought provoking, and the concept that an update
> need not require specifcation of a table name is an interesting one.

I'm sure you've run across this exchange (since it seems to be the
source of the example cited), but: http://www.dbdebunk.com/page/page/3010532.htm

From Pascal, addressing McGovern: "It is quite possible that the
difference between you and CJD stems from the difference between his
header/body view of a relation, which you (and recently me) do not
espouse, and therefore differing definitions of a tuple."

Regarding the example you gave above, it seems the disagreement may be
stemming, as you indicate, from a confusion over whether the types for
the relations' attributes are declared from a syntactic domain or from
a semantic domain. The example looks, to me, like the former. I would
expect if it were the latter, domain names would more closely reflect
that.

TroyK
TroyK - 09 Jan 2008 21:01 GMT
> I was wondering what your current stances towards the principle of
> current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 16 lines]
> POOD is certainly thought provoking, and the concept that an update
> need not require specifcation of a table name is an interesting one.

I thought I'd work up an example that illustrates part of McGovern's
argument (as I understand it). Comments and corrections are most
welcome:

Assuming we want to represent employees and their status as either
salaried, commissioned, or both, we might come up with the following
design (assuming the most obvious interpretations):

Table: Emp
Emp#  IsSalaried  IsCommissioned
====  ----------  --------------
1     Y           N
2     N           Y
3     Y           Y

Another possible design could be:
Table: Emp
Emp#
====
1
2
3

Table: SalariedEmp (FK, Emp# references Emp.Emp#)
Emp#
====
1
3

Table: CommissionedEmp (FK, Emp# references Emp.Emp#)
Emp#
====
2
3

Although we can derive the same information from each design, the
former is to be prefered because the latter violates POOD (two tables
with identical headers, even semantically speaking). Additionally, it
requires that we inspect the tables' names in order to gather the full
meaning of the propositions.

TroyK
JOG - 10 Jan 2008 02:23 GMT
> > I was wondering what your current stances towards the principle of
> > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 59 lines]
>
> TroyK

I like this example Troy - It is both concise and clear. I was just
mulling and came up with a possible schema for a "kennel club", that
might help clarify some of the issues I'm seeing.

owners = {name, age, joined} PK(name)
dogs = {name, age, joined}  PK(name)
ownership = {owner, dog}  FK(owner, owners.name) FK(dog, dogs.name)

As it stand this is clearly not in POOD, and a proposition such as
<Bess, 18, 01/08/08> is ambiguous. So it got me thinking about how we
know in real life what a proposition refers to.

1) I know the context implicitly of <Bess, 18, 01/08/08> (from
previous conversation, who i'm talking to, etc)
2) Surrounding text: <Bess, 18, 01/08/08> = "The dog called Bess is 18
years old and joined the kennel club on 01/08/08"

I don't think we can deign to rely on a computer ever being able to
cope with implict context (leave that to the CYC people to bang there
heads against ad infintum), so that appears to leave me with option 2.
In turn, I see two ways of integrating this explicit surrounding
context.

1) Via role names so, the proposition becomes:
There is a dog with (dogsName:Bess) of (age:18) and who (joined:
01/08/08)

2) Via another attribute:
There (is_a:dog) with (name:Bess) of (age:18) and who (joined:
01/08/08)

Which is preferable, and why that would be so I am uncertain. It would
be easy to say "this is all just a design decision", but I'm starting
to think there might be some sort of salient lesson in there.
Somewhere. Maybe.
DM Unseen - 10 Jan 2008 12:32 GMT
> > > I was wondering what your current stances towards the principle of
> > > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 97 lines]
>
> - Tekst uit oorspronkelijk bericht weergeven -

Could be something like

joiners= {name, age, joined, type} PK(name,type)
where type in {'Owner','Dog'}

ownership = {owner, dog}  FK(owner, joiners.name AND
joiners.type='Owner') FK(dog, joiners.name AND joiners.type='Dog')

This could still be generalised further bit I think this would be in
POOD.

And yes POOD implies "semantical" types and not syntactical.
This could also imply that we are dropping too much information in the
conversion from semantical (conceptual) models like NIAM/ORM to
logical models like the relational models.

DM Unseen
TroyK - 10 Jan 2008 20:13 GMT
> > > > I was wondering what your current stances towards the principle of
> > > > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 117 lines]
>
> - Show quoted text -

You make a good point about the possible lossy conversion from
conceptual to logical (inasmuch as "encoding" information in the
relations' names implies this loss).

I'm wondering if we should consider this type of POOD violation a non-
issue, though. In SQL, we always need to specify which table we're
talking about (both for updates and data retrieval), so that forces
one to be specific about the propositions they're interested in. Even
in a more pure RM implementation, we would need to reference which
relvar in particular we're updating (or deriving from).

In practice, I use POOD more to inform my design decisions around
constraint-based table/view partitioning.

TroyK
Brian Selzer - 13 Jan 2008 13:54 GMT
>> > I was wondering what your current stances towards the principle of
>> > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 57 lines]
>> requires that we inspect the tables' names in order to gather the full
>> meaning of the propositions.

I think that the requirement that we inspect table names comes from the
correlation between RM and predicate logic.  In predicate logic, there are
predicate symbols and there are individual symbols, and under an
interpretation, both the predicate symbols and the individual symbols are
assigned meaning.  The two relations,

sine {x, y}

and

cosine {x, y}.

have the same heading, but have totally different meanings--even though the
same individuals are exemplified in both whenever (x - pi / 4) modulus pi is
zero.  The tuple, {pi / 4, sqrt(2) / 2}, appears in both relations yet has a
different meaning assigned to it from each predicate.  Interestingly,
though, there is still only one individual represented by that tuple even
though it appears in both relations.

>> TroyK
>
[quoted text clipped - 33 lines]
> to think there might be some sort of salient lesson in there.
> Somewhere. Maybe.
JOG - 13 Jan 2008 14:14 GMT
> >> > I was wondering what your current stances towards the principle of
> >> > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 71 lines]
>
> have the same heading, but have totally different meanings

Interesting example. So sine and cosine as stated would break POOD
(which appears intended to assist with view updates). To avoid
breaking the POOD, we'd have either:

operation{ input:x, output, y, type:t }
t E {sin, cos}

or,

sin  {sin_input:x, sin_output:y}
cos {cos_input:x, cos_output:y}

Well, my instinct tells me that the latter is kludgy given x plays
exactly the same role in both predicates, and as such one would expect
them to have exactly the same name (and who knows, that might be
useful information). But then the former would generate enormous
tables given it would have to incorporate every conceivable operation
with an input and an output of the same type as used by sin and cos.
Practically that appears to be undesirable. Theoretically I guess
there is nothing wrong with it.

> --even though the
> same individuals are exemplified in both whenever (x - pi / 4) modulus pi is
[quoted text clipped - 40 lines]
> > to think there might be some sort of salient lesson in there.
> > Somewhere. Maybe.
Brian Selzer - 13 Jan 2008 16:39 GMT
>> >> > I was wondering what your current stances towards the principle of
>> >> > current design is cdt - info about the POOD is actually pretty
[quoted text clipped - 76 lines]
> Interesting example. So sine and cosine as stated would break POOD
> (which appears intended to assist with view updates).  ...

Wasn't POOD intended to eliminate redundancy across relations?  For example,
in the two following relations,

ES {EmployeeNumber, EmployeeName, Salary}
EC {EmployeeNumber, EmployeeName, Commission}

where EmployeeNumber is the key, the dependent attribute, EmployeeName, is
redundant for employees that receive both salary and commission.

So because EmployeeNumber --> EmployeeName appears in both and "means the
same thing" in both, it is a violation of POOD.  I submit that it would not
be a violation of POOD if there were a database constraint,

IS_EMPTY(ES JOIN EC {EmployeeNumber})

Which would mean that employees couldn't receive both a salary and a
commission.  Since it would not then be possible for an individual to be
exemplified by both ES and EC, there isn't any overlapping meaning, so there
isn't a POOD violation.

For a similar example, consider the following database schemata,

Case 1:

PS {Part#, Serial#, Warehouse, Bin}
 KEY {Part#, Serial#}

PSL {Part#, Serial#, Warehouse, Bin, Lot#}
 KEY {Part#, Serial#}

IS_EMPTY(PS JOIN PSL {Part#, Serial#})

In this case, PS is for the location of parts that are being tracked by
serial number only, whereas PSL is for the location of parts that are being
tracked by both lot and serial number.

Note that {Part#, Serial#} --> {Warehouse, Bin} appears in both.

Case 2:

PS {Part#, Serial#, Warehouse, Bin, HasLot}
 KEY {Part#, Serial#}
 HasLot IN {'Yes', 'No'}

PSL {Part#, Serial#, Lot#}
 KEY {Part#, Serial#}
 FOREIGN KEY {Part#, Serial#} REFERENCES PS

In this case, PS is for the location of parts that are being tracked by
serial number or by lot number and serial number, PSL is for recording the
lot numbers for parts that are being tracked by both lot and serial number,
and there is an explicit indication in HasLot as to whether there should be
a tuple in PSL for parts that are being tracked by both lot and serial.

I think that Case 1 does not violate POOD due to the database constraint,

IS_EMPTY(PS JOIN PSL {Part#, Serial#}).

This database constraint limits the possible interpretations of the
predicates of PS and PSL so that the meanings of {Part#, Serial#, Warehouse,
Bin} in PS and PSL must be different.

I prefer case 1 because in it the distinction between parts that are being
tracked by serial number only and parts that are being tracked by both lot
number can be determined due to the schema only, whereas in case 2, that
distinction can only be interpreted by examining the value of HasLot.

> ...  To avoid
> breaking the POOD, we'd have either:
[quoted text clipped - 62 lines]
>> > to think there might be some sort of salient lesson in there.
>> > Somewhere. Maybe.
Jonathan Leffler - 14 Jan 2008 06:35 GMT
>[...]
>> I think that the requirement that we inspect table names comes from the
[quoted text clipped - 22 lines]
> sin  {sin_input:x, sin_output:y}
> cos {cos_input:x, cos_output:y}

In its strongest form, POOD discounts names of attributes as well as
names of relations - so the 'or' option would not pass muster.  The
first option mostly passes POOD - but not the real world test.

The general intention is that presented with a tuple, using just the
types, the DBMS should be able to determine which table the row should
be inserted in.

The sine and cosine tables are fairly compelling examples.

Another that I find compelling as a counter-example (illustrating that
POOD is not the be-all and end-all of DB design) is a Bill of Materials.
 It is kinda important to know which value is the 'part that is built'
and which is the 'part that is used to build'.  And both, by definition,
have to be from the same domain of part numbers.

[Aside: I think there's a typo in the 'operation' example; the comma
separating output and y should probably be a colon.  And then I'm not
clear but it seems that the x and y in the examples are types, whereas
in the original mention of sine and cosine, they were attribute names of
a common type.]

> Well, my instinct tells me that the latter is kludgy given x plays
> exactly the same role in both predicates, and as such one would expect
[quoted text clipped - 4 lines]
> Practically that appears to be undesirable. Theoretically I guess
> there is nothing wrong with it.

As an occasionally useful twist on DB design, I consider POOD worth
having at the back of my mind.  However, I don't think it is practically
of much significance -- even if the DBMS were able to tell where to
insert data when presented with a tuple.  In my opinion, ignoring
attribute names is not healthy.

Signature

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/

publictimestamp.org/ptb/PTB-2270 haval 2008-01-14 06:00:04
A3E3F8DD3E3BD5472410A4CC430A01CE

DM Unseen - 14 Jan 2008 10:17 GMT
> >> > I was wondering what your current stances towards the principle of
> >> > current design is cdt - info about the POOD is actually pretty sparse
[quoted text clipped - 116 lines]
>
> - Tekst uit oorspronkelijk bericht weergeven -- Tekst uit oorspronkelijk bericht niet weergeven -

Mmm,

I'm not sure using functions is going to help us to understand POOD,
beacause it is linked to the semantical meaning of types, not the
syntactical meaning. This is absent in mathematical functions.

DM Unseen
using relations to simulatie functions
Brian Selzer - 17 Jan 2008 14:14 GMT
On 13 jan, 14:54, "Brian Selzer" <br...@selzer-software.com> wrote:
[snip]

>> I think that the requirement that we inspect table names comes from the
>> correlation between RM and predicate logic. In predicate logic, there are
[quoted text clipped - 17 lines]
>> though, there is still only one individual represented by that tuple even
>> though it appears in both relations.

[snip]

>Mmm,
>
[quoted text clipped - 4 lines]
>DM Unseen
>using relations to simulatie functions

The functions were simple examples that everyone here should have been able
to recognize, but the properties exhibited by these mathematical functions
can also be exhibited by relations that are not also mathematical functions.
For example, suppose that you have a relation for which machines an employee
knows how to run, and an relation for which machines each employee is
actually running:

MachinesEmployeesAreTrainedOn {Employee, Machine}

and

MachinesEmployeesAreRunning {Employee, Machine}

where in each case Employee draws its values from the domain Employees,
Machine draws its values from the domain Machines, and the key is the entire
heading.

An employee can be trained on zero to many machines, and an employee can be
running zero to many machines.  An employee can be trained on a machine that
they are not running--there can be a tuple in MachinesEmployeesAreTrainedOn
without a tuple in MachinesEmployeesAreRunning, an employee may be being
trained on a machine--there can be a tuple in MachinesEmployeesAreRunning
without one in MachinesEmployeesAreTrainedOn, or an employee may be running
a machine that they're trained on--there can be a tuple in
MachinesEmployeesAreRunning that is identical to a tuple in
MachinesEmployeesAreTrainedOn.

As in the function example, the above relations have the same heading, but
different meanings.  So if the relation name corresponds to the predicate
symbol and the values from the domains correspond to individual symbols,
then under an interpretation, only part of the meaning can come from
individual assignments: the balance must be from the assignment of meaning
to the predicate symbol.  This calls into question the idea that it should
be possible to determine which relation an inserted tuple is destined for.
Jan Hidders - 17 Jan 2008 17:08 GMT
> This calls into question the idea that it should
> be possible to determine which relation an inserted tuple is destined for.

Apologies for being lazy, but could someone explain to me in a
nutshell why this should be possible at all? At first sight this looks
like complete nonsense to me.

-- Jan Hidders
JOG - 18 Jan 2008 01:57 GMT
> > This calls into question the idea that it should
> > be possible to determine which relation an inserted tuple is destined for.
>
> Apologies for being lazy, but could someone explain to me in a
> nutshell why this should be possible at all? At first sight this looks
> like complete nonsense to me.

Well, as far as I gather, being able to determine a unique predicate
for any proposition being inserted into the database is desirable in
order to allow view updates to be more easily be translated to changes
in underlying base relations.

I cannot claim to fully understand the reasoning behind this, but view
updating hence appears to have been POOD's underlying motivation.
There is a relatively old draft paper by Date and McGoveran at living
at
http://www.dbdebunk.com/page/page/622331.htm which might be more
illuminating.

> -- Jan Hidders
paul c - 18 Jan 2008 03:13 GMT
...
>> Apologies for being lazy, but could someone explain to me in a
>> nutshell why this should be possible at all? At first sight this looks
[quoted text clipped - 5 lines]
> in underlying base relations.
> ...

POOD seems too-little discussed to me.  Maybe POFN too.

I'm glad the example was about machines because I like the mechanical
view, saves a lot of time when I've forgotten to charge the batteries
for my mystic-meter.  In an ideal world where every design decision was
considered against every attribute/role name, I think the example would
be considered a straw man - if it were up to me, I wouldn't allow those
two tables/relvars to use the same attribute names.

I realize there are people here who might say what's wrong with same
role names because the RM doesn't forbid them.  All that proves is that
there are naive people here as well as elsewhere!  I was just laughing
over one of CJ Date's latest examples about the tuple that states that
the supplier is not in Paris.  I think one must cater one's design
towards the mechanical advantage a "machine" offers and using the same
names seems a bit thoughtless to me.

It might look neat for defining terse union views without having to
resort to some kind of 'rename' macro operator baggage but I have a
feeling avoiding a rename is too cute and will bite one back, sooner or
later.  Maybe rename also gives some kind of legitimacy for inserting to
both 'sides' of a union view?
Jan Hidders - 18 Jan 2008 12:51 GMT
> > > This calls into question the idea that it should
> > > be possible to determine which relation an inserted tuple is destined for.
[quoted text clipped - 13 lines]
> athttp://www.dbdebunk.com/page/page/622331.htmwhich might be more
> illuminating.

It mentions the view update problem but doesn't really explain how it
is connected.

Anyway, the stronger POOD that requires that headers are distinct
sounds like nonsense to me. Why would R(A, B) be a worse design than
R(R_A, R_B)?

The weaker POOD looks more interesting to me. I even found a published
paper about it:

http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf

What's interesting is that it seems to address the problem of
normalization at the schema level, so not just restricted to one
relation, as is usual in normalization theory. You might even
formulate a new normal form at that level:

 A schema is said to be in HNF if all database dependencies follow
logically from the dependencies at relation level (so domain, tuple
and relation constraints) and the inclusion dependencies.

So basically it says that inclusion dependencies are the only needed
dependencies at database level. Note the similarity with 5NF (all join
dependencies logically follow from the key dependencies) which says
that the key constraints are the only needed constraints at relation
level. That similarity is not a coincidence. In both cases we are
eliminating redundancy. To get to a real normalization theory you
would have to identify more concretely the type of database
constraints / dependencies we are talking about.

What POOD has to do with HNF is left to the reader as an exercise. ;-)

-- Jan Hidders
mAsterdam - 18 Jan 2008 14:20 GMT
>>>> This calls into question the idea that it should
>>>> be possible to determine which relation an inserted tuple is destined for.
[quoted text clipped - 23 lines]
>
> http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf

Unfortunately the link times out.

> What's interesting is that it seems to address the problem of
> normalization at the schema level, so not just restricted to one
[quoted text clipped - 15 lines]
>
> What POOD has to do with HNF is left to the reader as an exercise. ;-)

--
What you see depends on where you stand.
Jan Hidders - 18 Jan 2008 14:51 GMT
> >>>> This calls into question the idea that it should
> >>>> be possible to determine which relation an inserted tuple is destined for.
[quoted text clipped - 25 lines]
>
> Unfortunately the link times out.

Hmm, not for me. But to help you out:

http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf

-- Jan Hidders
mAsterdam - 18 Jan 2008 19:49 GMT
...
>>> Anyway, the stronger POOD that requires that headers are distinct
>>> sounds like nonsense to me. Why would R(A, B) be a worse design than
[quoted text clipped - 7 lines]
>
> http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf

Thank you for helping me out by providing the document.

I started reading "Extended Principle of Orthogonal Design"
by Erki Eessaar

User defined datatypes (UDT) give the designer of a database
more decisions, more room for wrong decisions.
Row-, array-, reference-, multiset collection types
- choices, choices, choices.
How to deal with that freedom?
Enter the Principle of Orthogonal Design (POD):
If you have a tuple-type, make sure to have only one base
relvar for recording that tuple-type.
Nonloss decomposition is explained briefly.

And here is where I lost Date & McGoveran
(or Erki Eessaars interpration, I don't know) in
Chapter 2):
"The meanings of R1A(t) and R1B(t) are said to overlap
iff it is possible to construct some tuple t so that R1A(t)
and R1B(t) are both true".

Note that it does /not/ say 'all tuples t', but 'some tuple t',
So it does, in particular,  /not/ exclude the possibilty
that R1A(t') is true and R1B(t') is not true or vice versa,
in other words, that R1A and R1B may be mutually independent.
With this trick, meaning is forced into synonymity with the
signature of the relation. I call BS. Guidelines to reduce
design choices are welcome, but not this one.

The extended version (Chapter 4.2.) implicitly uses the
same criterion for detecting "overlapping meanings".

Brian Selzer (elsewhere in this thread) did a good job by
providing a sensible situation with two relations with the
same signature but distinct, independent meanings.

Let's not extend this.

--
What you see depends on where you stand.
Jan Hidders - 19 Jan 2008 12:06 GMT
> ...
> >>> Anyway, the stronger POOD that requires that headers are distinct
[quoted text clipped - 22 lines]
> If you have a tuple-type, make sure to have only one base
> relvar for recording that tuple-type.

Hmm, I would call that the strong POOD (or strong POD), and that, I
would agree, seems to make no sense.

> Nonloss decomposition is explained briefly.
>
[quoted text clipped - 11 lines]
> With this trick, meaning is forced into synonymity with the
> signature of the relation.

No, no, not exactly. It could be that there are tuple constraints that
don't allow you to construct the tuple in question. Say you have
R(a,b) and S(a,b) and for R the tuple constraint that b > 5 and for S
the tuple constraint that b <= 5. In that case you cannot construct a
tuple that is both in R and S, but they still have the same header.

What they probably should have said is: R and S are said to have
overlapping meaning if it does not follow from the constraints /
dependencies that the intersection of R and S is always empty.

Note that I'm talking about dependencies and not the relvar predicates
as they do, but the intent is the same. So this, or rather the design
principle based on this definition, I would call the weak POOD. Does
that make more sense to you?

-- Jan Hidders
mAsterdam - 19 Jan 2008 15:55 GMT
>> ...
>>>>> Anyway, the stronger POOD that requires that headers are distinct
[quoted text clipped - 10 lines]
>> I started reading "Extended Principle of Orthogonal Design"
>> by Erki Eessaar
below: EE

>> User defined datatypes (UDT) give the designer of a database
>> more decisions, more room for wrong decisions.
[quoted text clipped - 7 lines]
> Hmm, I would call that the strong POOD (or strong POD), and that, I
> would agree, seems to make no sense.

We agree on that, that is clear.
So ok, but - from the huge category of easily asked but
hard to answer questions: - why?

Why doesn't the (strong) PoOD make sense to you?

It does not make sense to me, and I am putting some effort
into finding out why it doesn't. The track I am on now gives
rejection because of inappropriate equalization of meaning
with form (i.c. the heading).
This track does not give me any distinction between the validity
of the strong (EE: original) and the weak (EE: extended)
PoOD, because both build on the sentence quoted hereunder.

>> EE: Chapter 2:
>> "The meanings of R1A(t) and R1B(t) are said to overlap
[quoted text clipped - 13 lines]
> the tuple constraint that b <= 5. In that case you cannot construct a
> tuple that is both in R and S, but they still have the same header.

Which could be rephrased as different B domains for R and S (so that
R, S becomes an acceptable schema to PoOD), but I won't for the sake
of argument.

> What they probably should have said is: R and S are said to have
> overlapping meaning if it does not follow from the constraints /
> dependencies that the intersection of R and S is always empty.

Maybe, but it would not take away my objection.
As long as R\S and S\R are allowed to be non-empty,
R and S are independent, regardless of their heading.

> Note that I'm talking about dependencies and not the relvar predicates
> as they do, but the intent is the same. So this, or rather the design
> principle based on this definition, I would call the weak POOD. Does
> that make more sense to you?

Maybe stating the obvious: not really.

I'll try to elaborate:

Design case:
S and R are sets of tuples we are trying to schematize:

If we do not allow R\S and S\R to be non-empty, the R,S design will be
rejected, because R and S represent the same set, not because of
(weak or strong) PoOD.
If we allow R\S and S\R to be non-empty, R and S are completely
independent. If we allow only one of R\S and S\R to be non-empty,
one of R and S is a subset of the other.

Wether a tuple in S carries the same meaning as or a different meaning
than an identical tuple in R is not based on their having the same
signature (heading if you prefer).

This changes when we apply DM Unseen's
'object to role transformation':
Stick the original relation name in a role attribute, like so:

T(role, a, b), role∈{'R','S'},

which forces any stuff together - based on shape, not on meaning.

--
What you see depends on where you stand.
Jan Hidders - 20 Jan 2008 12:40 GMT
> >> ...
> >>>>> Anyway, the stronger POOD that requires that headers are distinct
[quoted text clipped - 30 lines]
>
> Why doesn't the (strong) PoOD make sense to you?

It is at the same time too strong and too weak. It is too strong
because it forbids cases where there is in fact no redundancy, and at
the same time allows cases where there is redundancy. Moreover, you
can always trivially satisfy it by renaming R(a,b) to R(r_a,r_b). How
exactly does that improve the design?

> It does not make sense to me, and I am putting some effort
> into finding out why it doesn't. The track I am on now gives
[quoted text clipped - 3 lines]
> of the strong (EE: original) and the weak (EE: extended)
> PoOD, because both build on the sentence quoted hereunder.

Note that my definition differs in an important way from theirs. I
forgot to emphasize this the last time.

> >> EE: Chapter 2:
> >> "The meanings of R1A(t) and R1B(t) are said to overlap
[quoted text clipped - 17 lines]
> R, S becomes an acceptable schema to PoOD), but I won't for the sake
> of argument.

For the sake of the argument, please do. :-) It would allow me to
reply with the following: In that case there is still a
counterexample, namely if there is a tuple constraint a < b for R and
a >= b for S. Of course if you want, you can redefine the notion of
header such that this is also included.

> > What they probably should have said is: R and S are said to have
> > overlapping meaning if it does not follow from the constraints /
[quoted text clipped - 3 lines]
> As long as R\S and S\R are allowed to be non-empty,
> R and S are independent, regardless of their heading.

In that case you still might have redundancy. If you decompose in to
the following three, R' = R/S, RS' = R intersect S, S'  = S/R, then
you have removed that redundancy. That's the motivation if the
definition of "overlapping meaning" that I gave.

Actually, to really remove all such redundancy one would would need a
stronger notion of "overlapping meaning", so we you can also deal with
overlap modulo renaming, but I don't want to complicate things too
much at this point.

-- Jan Hidders
JOG - 20 Jan 2008 21:35 GMT
> > >> ...
> > >>>>> Anyway, the stronger POOD that requires that headers are distinct
[quoted text clipped - 36 lines]
> can always trivially satisfy it by renaming R(a,b) to R(r_a,r_b). How
> exactly does that improve the design?

Well this is an interesting question, because I am yet to see any
reason as to why it be beneficial, but can see how it makes life
harder given that it prevents a union without a preceding rename.

I find the example of phone numbers useful (a scenario also discussed
on TTM lists a while back):  for a selection of propositions
discussing people's work, home and mobile numbers is it possible to
state a consistent preference for a schema from:

1) Contacts = {phone_type, name, number}
2) Home = {name, number}
Work = {name, number}
Mobile = {name, number}

I find it interesting that the PoOD recommends 1,  the option that I
believe most designers would intuitively go for, but that in its
'strong form' also suggests that 2 is perfectly satisfactory if roles
are simply renamed as:

Home = {name, home_number}
Work = {name, work_number}
Mobile = {name, mobile_number}

Without any evidence of improvement this might confer to view
updating, I certainly see little sense therein. However, I wonder if
there is not a preferable guideline that also recommends 1 along the
lines of eliminating application bias (a Principle of Bias Elimination
perhaps). Such a guideline would also reject other feasible schema
such as,

Toms_numbers = {type, number}
Franks_numbers = {type, number}
etc...

that one might never realistically consider, but differ little
conceptually to a Home, Work, Mobile divisions, suggesting instead
that if it is possible for propositions to be collectivized under a
single predicate, then they should be?

Either way, I certainly find that appealing to notions of "meaning"
within formal design recommendations seems to head towards very
slippery ground.

> > It does not make sense to me, and I am putting some effort
> > into finding out why it doesn't. The track I am on now gives
[quoted text clipped - 54 lines]
>
> -- Jan Hidders
Bob Badour - 20 Jan 2008 23:23 GMT
>>>>>...
>>>>>
[quoted text clipped - 63 lines]
> Work = {name, work_number}
> Mobile = {name, mobile_number}

The RM proponents I have read are all strongly opposed to encoding
meaning in attribute names, relation names etc., and were all opposed at
the time folks were discussing POOD a lot.

I do not believe POOD ever suggests 2 is perfectly satisfactory.

[snip]
mAsterdam - 20 Jan 2008 23:53 GMT
> ... (a scenario also discussed on TTM lists a while back) ...

Is it alive? More than a year back I sent a
'request to subscribe' to ttm-subscribe@thethirdmanifesto.com
but I never got a response. Is the address correct?

> Either way, I certainly find that appealing to notions of "meaning"
> within formal design recommendations seems to head towards very
> slippery ground.
Jonathan Leffler - 21 Jan 2008 07:13 GMT
>> ... (a scenario also discussed on TTM lists a while back) ...
>
> Is it alive? More than a year back I sent a
> 'request to subscribe' to ttm-subscribe@thethirdmanifesto.com
> but I never got a response. Is the address correct?

The TTM list is alive and delivering emails most days.

From one of today's messages:

  List-Help: <mailto:ttm-help@thethirdmanifesto.com>
  List-Unsubscribe: <mailto:ttm-unsubscribe@thethirdmanifesto.com>
  List-Subscribe: <mailto:ttm-subscribe@thethirdmanifesto.com>
  Reply-To: ttm@thethirdmanifesto.com

I don't know why you ran into problems a year ago, but I've been on the
list since August 2005 without problem.

Signature

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/

publictimestamp.org/ptb/PTB-2326 ripemd160 2008-01-21 06:00:04
0639216F2AD6EC81B98F21B9AE2308216FB8E46B

mAsterdam - 21 Jan 2008 08:13 GMT
> The TTM list is alive and delivering emails most days.
>
[quoted text clipped - 7 lines]
> I don't know why you ran into problems a year ago, but I've been on the
> list since August 2005 without problem.

Thank you.
I retried and received a WELCOME email.
Brian Selzer - 21 Jan 2008 14:15 GMT
>> > >> ...
>> > >>>>> Anyway, the stronger POOD that requires that headers are distinct
[quoted text clipped - 77 lines]
> that if it is possible for propositions to be collectivized under a
> single predicate, then they should be?

I think that there are times when it makes sense to 'collectivize' under a
single predicate, and there are times when it does not.  In particular, if
the individuals represented are just different kinds of the same sort of
thing, such as is the case with the different kinds of phone numbers, then
it makes sense to 'collectivize' those attributes that are common to all of
those individuals under a single predicate.  If, on the other hand, the
individuals represented are different sorts of things that just happen to
have the same attributes, then it doesn't.  A disjunctive
predicate--something like, for example, each tuple exemplifies an individual
that is an X or a Y, just doesn't make any sense to me.  A conjunctive
predicate on the other hand--for a relation that has multiple keys, for
example--is not a problem since each tuple would exemplify an individual
that is /both/ an X /and/ a Y.  So if the predicate can be stated so that it
is not disjunctive, then 'collectivizing' is to be preferred.

> Either way, I certainly find that appealing to notions of "meaning"
> within formal design recommendations seems to head towards very
[quoted text clipped - 59 lines]
>>
>> -- Jan Hidders
David Cressey - 21 Jan 2008 15:09 GMT
> I think that there are times when it makes sense to 'collectivize' under a
> single predicate, and there are times when it does not.  In particular, if
[quoted text clipped - 10 lines]
> that is /both/ an X /and/ a Y.  So if the predicate can be stated so that it
> is not disjunctive, then 'collectivizing' is to be preferred.

It seems to me that you are describing the generalization-specialization
pattern.
Am I reading you right?

> > Either way, I certainly find that appealing to notions of "meaning"
> > within formal design recommendations seems to head towards very
> > slippery ground.

Why?  If you called it "the semantics of the data"  instead, would that make
it less slippery ground?
JOG - 21 Jan 2008 17:58 GMT
> > I think that there are times when it makes sense to 'collectivize' under a
> > single predicate, and there are times when it does not.  In particular, if
[quoted text clipped - 24 lines]
> Why?  If you called it "the semantics of the data"  instead, would that make
> it less slippery ground?

Absolutely not - worse in fact, because then it would be greased with
buzzwords too ;) Years of working with the semantic web, ontologies,
expert systems, etc have emphasised to me that "meaning" is only
something that can be obtained via situated embodiement within the
environment concerned, and that context is far too  complex a beast to
be tamed by computerized encoding. As such I'd rather rules appealed
to functional dependencies and logical consequents than to appeal to
the slippery notion of "meaning".
Jan Hidders - 22 Jan 2008 08:53 GMT
> > "Brian Selzer" <br...@selzer-software.com> wrote in message
>
[quoted text clipped - 37 lines]
> to functional dependencies and logical consequents than to appeal to
> the slippery notion of "meaning".

Hear hear. But I think the part of POOD that actually does make sense
and really removes redundancy and update anomalies can be defined that
way. For example, if there is an inclusion dependency between a
relation R and S then it makes sense to say that there is overlap in
meaning. You can generalize this by defining the notion of overlap
such that there is overlap if there is an inclusion dependency between
(1) the result of a selection of R and (2) S, or vice versa. Or even
more general, if "a selection of R" is interpreted as any conjunctive
query, possibly with equalities and inequalities, that selects a
subset of R. In all these case it holds that if you violate the POOD
rule that says that there can be no overlap in meaning between
different components of non-trivial join dependencies, then there is
indeed redundancy. Moreover, in those cases you can solve that
redundancy by making the redundant part explicit in a relation by
horizontal and vertical splits.

Does that cover all types of redundancy caused by dependencies? I
suspect not. For that you probably need to generalize the rule further
such that it not only looks at overlap between components of join
dependencies but also between the result of arbitrary conjunctive
queries and components of join dependencies. It's currently not clear
to me under which circumstances you are then still sure that this
redundancy can be removed.

-- Jan Hidders
mAsterdam - 22 Jan 2008 09:46 GMT
>>>> I think that there are times when it makes sense to
>>>> 'collectivize' under a single predicate, and there
[quoted text clipped - 39 lines]
> relation R and S then it makes sense to say that there is overlap in
> meaning.

?

inclusion dependency --> meaning overlap

?

Slowly, please - where has the 'might' (from the redundancy in
your other post) gone? And the 'qualified' in the inclusion dependency?

> You can generalize this by defining the notion of overlap
> such that there is overlap if there is an inclusion dependency between
> (1) the result of a selection of R and (2) S, or vice versa. ...
Jan Hidders - 22 Jan 2008 13:18 GMT
> >>>> I think that there are times when it makes sense to
> >>>> 'collectivize' under a single predicate, and there
[quoted text clipped - 43 lines]
>
> inclusion dependency --> meaning overlap

Yes. An inclusions dependency implies meaning overlap. It is a
sufficient condition, but not a necessary one.

> ?
>
> Slowly, please - where has the 'might' (from the redundancy in
> your other post) gone?

That applied to my improved version of EE's definition of meaning
overlap. Under the definition I'm using here this "might" has become a
"must".

> And the 'qualified' in the inclusion dependency?

Saying that there is a qualified inclusion dependency between R and S
is equivalent with saying that there is an inclusion dependency
between (1) the result of a conjunctive query that selects tuples from
R and (2) S, which I talked about in the subsequent generalization
steps in the preceding posting. I though that a step by step
introduction might make the concept and the reasoning behind it a bit
easier to grasp.

Apologies if this is all a bit confusing, but I'm developing these
ideas as I post, and I'm not sure yet what the best way is to explain
them.

-- Jan Hidders
mAsterdam - 22 Jan 2008 18:57 GMT
>>>>>>> ... I certainly find that appealing to notions
>>>>>>> of "meaning" within formal design recommendations
[quoted text clipped - 20 lines]
>
>> ?

The easiest case of meaning overlap, then, should be
with the the most simple form of inclusion dependency:
RI, the foreign key.
How does the meaning overlap with simple RI?

>> Slowly, please - where has the 'might' (from the redundancy in
>> your other post) gone?
>
> That applied to my improved version of EE's definition of meaning
> overlap. Under the definition I'm using here this "might" has become a
> "must".

Another change of the definition of meaning overlap? To what?

>> And the 'qualified' in the inclusion dependency?
>
[quoted text clipped - 5 lines]
> introduction might make the concept and the reasoning behind it a bit
> easier to grasp.

I don't think the difficulties are in the genericity of the
definition of the inclusion dependency. ISTM the problems
are in the definition of 'meaning overlap'. You changed
it twice since I mentioned my objection to Erki's definition.

> Apologies if this is all a bit confusing, but I'm developing these
> ideas as I post, and I'm not sure yet what the best way is to explain
> them.

Making the changes in your used definitions stand
out more would definitely help. It is very hard to
keep track of what you consider invariant and variant
to your line of reasoning if definitions change
without mention.
Jan Hidders - 23 Jan 2008 01:45 GMT
> >>>>>>> ... I certainly find that appealing to notions
> >>>>>>> of "meaning" within formal design recommendations
[quoted text clipped - 25 lines]
> RI, the foreign key.
> How does the meaning overlap with simple RI?

Well, if you have R(a,b) and S(c,d) and a FK R[a,b] to S[c,d], then
there is meaning overlap. The inclusion dependencies we are
considering have to cover all the attributes of the involved
relations.

> >> Slowly, please - where has the 'might' (from the redundancy in
> >> your other post) gone?
[quoted text clipped - 4 lines]
>
> Another change of the definition of meaning overlap? To what?

Ok. For the time being I promise to stick to the following definition:

DEFINITION: Two relations R and S are said to have overlap in meaning
if there is an inclusion dependency from all attributes of R to all
attributes of S, or vice versa.

Once you are comfortable with this we can move on the the more subtle
cases of meaning overlap. To be complete let met also fix the POOD
rule for the moment:

DEFINITION: A schema is said to violate POOD if it contains two
different relations R and S such that a component of a non-trivial
join dependency of R overlaps in meaning with a component of a non-
trivial join dependency of S.

So far so good?

-- Jan Hidders
mAsterdam - 23 Jan 2008 15:00 GMT
>>>>>>>>> ... I certainly find that appealing to notions
>>>>>>>>> of "meaning" within formal design recommendations
[quoted text clipped - 26 lines]
> considering have to cover all the attributes of the involved
> relations.

Your definiens is close to D&M's ¹², even the isomorphism
requirement is now there ('cover all the attributes'), with
an added application restriction to inclusion dependencies.

The defieniendum 'meaning overlap' is defined even
narrower again. Let me try an analogy here.
Our coffeemachine design and test team, defines
'coffee' as 'the substance we can put in the coffeeholder
of our espresso machine without wrecking the machine.'
Chris: "Let's try caustic soda."
David: "No coffee, the pressure will get too high.
       Tea is good coffee, though."
Chris: "When the beans are grinded too fine the stuff
       will clutter the sieve".
Jan:   "We'll have to exclude that, to."

It is meaning overlap, Jim, but not as we know it.

An inconvenient implication:
"In other words, such an 'individual user database' ought
not to include any views and/or base tables whose meanings overlap" ²
together with your definiens gives: all foreign keys are out.

I'll repeat that: all foreign keys are out.

¹ "The Principle of Orthogonal Design" draft, Date & McGoveran
part 1: http://www.dbdebunk.com/page/page/622331.htm
² idem part 2: http://www.dbdebunk.com/page/page/622312.htm

>>>> Slowly, please - where has the 'might' (from the redundancy in
>>>> your other post) gone?
[quoted text clipped - 19 lines]
>
> So far so good?

Does PoODling flush bathwater, baby, or part of both?
The jury isn't even out yet.
Jan Hidders - 24 Jan 2008 10:58 GMT
> >>>>>>>>> ... I certainly find that appealing to notions
> >>>>>>>>> of "meaning" within formal design recommendations
[quoted text clipped - 30 lines]
> requirement is now there ('cover all the attributes'), with
> an added application restriction to inclusion dependencies.

Correct. A small but very important difference.

> The defieniendum 'meaning overlap' is defined even
> narrower again.

Correct. I wanted to start simple. So you would like a wider
definition? Your wish is my command:

DEFINITION: Two relations R and S are said to have overlap in meaning
if there is a dependency that requires that sometimes some tuples of R
are also in S after renaming the attribute names, or vice versa.

It's a bit hand-wavy for my taste, because it doesn't define the type
of dependencies but it will probably do for the moment.

> An inconvenient implication:
> "In other words, such an 'individual user database' ought
> not to include any views and/or base tables whose meanings overlap" ²
> together with your definiens gives: all foreign keys are out.

Under their definition of the POOD rule, yes, but not under mine. Btw.
normalization rules never should be applied to views, only to base
tables, and it is a normalization rule we are talking about here.

Unfortunately I also just noticed a related flaw in my POOD rule, so
I'm going to have to redefine it slightly. First some additional
terminology:

DEFINITION: A join dependency is said to be a proper if it does not
hold that any of its components is a subset of another component.

Now the rule:

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that a component C of a proper join dependency of R
overlaps in meaning with a component D of a proper join dependency of
S, and either R and S are different, C and D are different, or both.

> > So far so good?
>
> Does PoODling flush bathwater, baby, or part of both?
> The jury isn't even out yet.

I suspect with the new POOD the baby is quite safe. :-)

-- Jan Hidders
mAsterdam - 25 Jan 2008 08:18 GMT
> ... I wanted to start simple ...
Even though this suggests otherwise, I'll risk assuming
that you understood the coffee analogy, and snipped it
for focus.

Let's do some coffee-machine refactoring,..

> DEFINITION: Two relations R and S are said to have overlap in meaning
> if there is a dependency that requires that sometimes some tuples of R
> are also in S after renaming the attribute names, or vice versa.

..and start simple ;-)

1. The 'renaming etc...' is only there to exclude trivialities,
caused by a clumsy (at least for this coffee-machine)
definition of tuple-types, clumsy because it includes attribute names.
We'll have a definiton without attribute names (to be formulated
later if necessary).

2. It is not at all clear a priori what if anything this
has to with meaning. To avoid distraction by connotation
we'll substitute the definiendum by one with less connotation,
and see if we can substitute it back when the conclusions are
clear.
My first hunch was 'symbolic redundancy' (thinking of Neo) but
that would be loaded as well. It is the chaff to be filtered out
by the PoOD, so PoOD-chaff.
Clean enough?

1. 'renaming etc...' -- out
2. PoOD-chaff

DEFINITION: Two relations R and S are said to have PoOD-chaff
if there is a dependency that requires that sometimes some
tuples of R are also in S.

> It's a bit hand-wavy for my taste, because it doesn't define the type
> of dependencies but it will probably do for the moment.
Ok.

Note: The isomorphy of R and S is implicit. The tuples
can be in both R and S so R-tuples and S-tuples have to be
of the same type.

> ...additional terminology:
>
> DEFINITION: A join dependency is said to be a proper if it does not
> hold that any of its components is a subset of another component.

Proper is a nicer than non-trivial.

> Now the rule:
>
[quoted text clipped - 8 lines]
>
> I suspect with the new POOD the baby is quite safe. :-)

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that a component C of a proper join dependency of R
has PoOD-chaff with a component D of a proper join dependency of
S, and either R and S are different, C and D are different, or both.

Now let's see if there is meaning overlap, that is, is it ok to
s/PoOD-chaff/meaning overlap/ back? To check that, I'll
first have to make sure I understand the definition.

It looks to me that in order to have this problem,
we need to have two relations R and S, where 5NF is
relevant in both, with isomorphy between components
of R and components of S
(which is quite imaginable when integrating databases
in the same business domain, e.g. after a merger, or
with similar but distinct types of complex contracts).
Am I correct thusfar?

I don't have much experience with reasoning about 5NF.
Aside: The reason for that may be, that whenever I come across
       the not immediately obvious, I start stating easily understood
       natural language sentences, expressing the elementary facts we
       are modeling. From consensus on those we'll formalize to
       predicates and check again. From these predicates the already
       normalized relations follow (except the naming, but there
       is strong support in the predicates).
Jan Hidders - 25 Jan 2008 16:25 GMT
> > ... I wanted to start simple ...
>
> Even though this suggests otherwise, I'll risk assuming
> that you understood the coffee analogy, and snipped it
> for focus.

I *think* I understood it, but I didn't want to get into a debate
about it's appropriateness. My position is that in this case we don't
really know precisely what this "coffee" thing is anyway. It is a
rather vague and informal notion, so we can take the liberty to
redefined in a way that seems to most practical for the time being.
What matters is if we end up with a normal form that makes sense, not
if we have precisely established what meaning overlap is. The latter
is more a philosophical question anyway. But if this confuses you,
then I have no problem with calling it something else.

> Let's do some coffee-machine refactoring,..
>
[quoted text clipped - 9 lines]
> We'll have a definiton without attribute names (to be formulated
> later if necessary).

No. I'm sorry, but that is not acceptable. Not allowing renaming in
the tuples changes the properties of the normal form. If you want we
can change from the named to the unnamed perspective where tuples are
ordered unlabeled tuples, but then you will need to replace the end
with "after reordering the elements of the tuple", so not much would
be gained. Note that the definition is talking about tuples, not tuple
types. Redefining the notion of tuple type would nog change anything.
Redefining the notion of tuple would, but as I already said, then
there is not much simplification.

Btw. not allowing relabeling / reordering in inclusion dependencies
provably restricts their expressive power in ways that cannot be
solved by picking better attribute names in your schema. So I would
not say that these are "trivialities that are caused by a clumsy
definition of tuple types".

> 2. It is not at all clear a priori what if anything this
> has to with meaning. To avoid distraction by connotation
[quoted text clipped - 4 lines]
> that would be loaded as well. It is the chaff to be filtered out
> by the PoOD, so PoOD-chaff.

Not *all* overlap has to be removed, so I wouldn't always want to call
it chaff. It would also be nicer if the name reflected that there is
some kind of overlap or that the relations partially coincide and that
this is caused by dependencies. Could we settle on "two relations have
dependency-induced overlap" for the time being?

> 1. 'renaming etc...' -- out
> 2. PoOD-chaff
[quoted text clipped - 11 lines]
> can be in both R and S so R-tuples and S-tuples have to be
> of the same type.

You mean that the headers are isomorphic, not the relations
themselves, right? But what do you mean by isomorphic? Can I replace
attribute names or domains or both? Actually, now that I think about
it, whatever your definition, this is only true if you assume that
different domains are always disjoint.

> > ...additional terminology:
>
> > DEFINITION: A join dependency is said to be a proper if it does not
> > hold that any of its components is a subset of another component.
>
> Proper is a nicer than non-trivial.

Note that it's semantically different. Not all proper join
dependencies are non-trivial, and not all non-trivial join
dependencies are proper.

> > Now the rule:
>
[quoted text clipped - 17 lines]
> s/PoOD-chaff/meaning overlap/ back? To check that, I'll
> first have to make sure I understand the definition.

Ok.

> It looks to me that in order to have this problem,
> we need to have two relations R and S, where 5NF is
[quoted text clipped - 4 lines]
> with similar but distinct types of complex contracts).
> Am I correct thusfar?

Roughly. The "5NF is relevant" is a bit misleading. All that is
required is that there are proper join dependencies, but these are
always present. For example, any relation R(a,b,c) will have always
the trivial join dependency JD [{a,b,c}], i.e., the join dependency
with one component that contains all attribute names in the header.
Although trivial, it is also proper. So you can violate the rule, even
if there is no non-trivial join dependency. A simple example would be
as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates
the rule because you also have for S the trivial and proper JD
[{c,d}]. So do you think that there is redundancy here that can be
removed?

Btw. the "with isomorphism between components" can be simplified to
"with components of equal size". After all, components are just sets
of attribute names, so they have equal size iff they are isomorphic.

> I don't have much experience with reasoning about 5NF.

I actually don't think that is necessary. Whether the relations are in
5NF or not is not really irrelevant. But you understand what lossless
decompositions are, no? And I assume you also understand how they are
implied by functional dependencies. So then you might start first with
considering cases where all the JDs are implied by FDs. For example,
say we have R(a,b,c) and S(d,e,f) with POOD chaff / dependency-induced
overlap between R[a,b] and S[d,e], for example because we have ID
R[a,b] -> S[d,e]. We would then violate the rule if there is for R a
JD [{a,b}, {a,c}] and for S a JD [{d,e}, {d,f}]. These JDs are implied
if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in
that case, do you think that there is redundancy that can be removed?

-- Jan Hidders
mAsterdam - 26 Jan 2008 21:06 GMT
This is all way over my head. Jan, you may be used to
sailing these waters; I am not. I think it is interesting,
but very time-consuming.
Am I talking rubbish? I try not to, but I don't know - nobody
else chimes in. You keep replying, but hey, you are a nice guy :-)

I'll make some bold statements here anyway, but I can't keep this up.

>>> ... I wanted to start simple ...
>> Even though this suggests otherwise, I'll risk assuming
[quoted text clipped - 6 lines]
> rather vague and informal notion, so we can take the liberty to
> redefined in a way that seems to most practical for the time being...

Informal as far as most of its nature is concerned, sure.
Its existence, however, is neither vague nor informal.
Without it, the coffee-machine loses its purpose.
It is because of this, that pragmatical redefinitions
must indeed be temporary and tracked.

From D&M's draft, ch 5 'The question of meaning',
( http://www.dbdebunk.com/page/page/622331.htm ):
"... every table, be it a base table, a view, a query result,
or whatever certainly does have an associated meaning.  And,
of course, users must be aware of those meanings if they are
to use the database correctly and effectively."

>> Let's do some coffee-machine refactoring,..
>>
[quoted text clipped - 15 lines]
> with "after reordering the elements of the tuple", so not much would
> be gained.

Wether the relation between heading and tuples goes via names or
ordering is relevant or not. If it is not I want it out of scope.
Remarks by others make me doubt wether it is relevant or not.

Very early in this thread,
----
JOG wrote:
> DM Unseen wrote:
>> My understanding is that the debate centers around typing.
[quoted text clipped - 15 lines]
> conflating the concept of attribute and type, and defined an attribute
> as a (role-name, type) pair, but as it stands is seems very odd to me.
----

> Note that the definition is talking about tuples, not tuple
> types. Redefining the notion of tuple type would nog change anything.

That is simply not true.
It just would not change the /wording/ of the definition.
Redefining the notion of tuple type may change the notion of tuple.

> Redefining the notion of tuple would, but as I already said, then
> there is not much simplification.
[quoted text clipped - 4 lines]
> not say that these are "trivialities that are caused by a clumsy
> definition of tuple types".

And the remedy is to include the attribute names in the tuple-types?
So it seems. D&M draft: "...type-compatibility as we define it
requires the two tables to have identical column names."
I am trying to be careful about the side-effects.
The example below shows that they may not be nice.

>> 2. It is not at all clear a priori what if anything this
>> has to with meaning. To avoid distraction by connotation
[quoted text clipped - 10 lines]
> this is caused by dependencies. Could we settle on "two relations have
> dependency-induced overlap" for the time being? ...

Ok.

DEFINITION: Two relations R and S are said to have dependency-induced
overlap if there is a dependency that requires that sometimes some
tuples(1) of R are also in S.

(1) for some definition of tuples that allows restricted
reshuffling of its values. To do.

>> Note: The isomorphy of R and S is implicit. The tuples
>> can be in both R and S so R-tuples and S-tuples have to be
>> of the same type.
>
> You mean that the headers are isomorphic, not the relations
> themselves, right?

No, that is not what I meant. In the vocabulary I am
more or less used to: when the headers are isomorph,
the relations are. Iff, even.

> But what do you mean by isomorphic?
> Can I replace attribute names or domains or both?

Attribute name replacement is irrelevant in this.

Maybe an example helps.

Three relations, BOM, MOB and WIRE.
+---------------------------------------------+
|Name: BOM                                    |
|Predicate: Part PART# is part of IS_PART_OF# |
|Key: (PART#, IS_PART_OF#)                    |
+---------------------------------------------+
|Domain:    |PTNUM  | PTNUM                   |
|Attribute: |PART#  | IS_PART_OF#             |
+---------------------------------------------+
|Values:        2        1                    |
|               3        1                    |
|               4        1                    |
|               4        5                    |
+---------------------------------------------+

MOB is a view on BOM with the attributes switched.

+--------------------------------------------------------+
|Name: WIRES                                             |
|Predicate: We have a wire to connect L_PART# to R_PART# |
|Key: (L_PART#, R_PART#)                                 |
+--------------------------------------------------------+
|Domain:    |PTNUM    | PTNUM                            |
|Attribute: |L_PART#  | R_PART#                          |
+--------------------------------------------------------+
|Values:        2        3                               |
|               3        2                               |
|               4        3                               |
|               4        5                               |
+--------------------------------------------------------+

BOM and WIRES are isomorph,
BOM and MOB are isomorph,
WIRES and MOB are isomorph.
The attributenames are irrelevant to this.
Their isomorphism is determined by
their tupletypes.
All tuples of BOM, MOB and WIRES are of
type (PTNUM, PTNUM).

This seems pretty clear and straightfoward to me,
but it is not compliant with D&M's
"...type-compatibility as we define it requires the two
tables to have identical column names."
Making it compliant I'll lose clarity for what?
For /maybe/ a feature that can determine which
tuples to touch based on their type alone.

> Actually, now that I think about
> it, whatever your definition, this is only true if you assume that
> different domains are always disjoint.

How so?

>>> ...additional terminology:
>>> DEFINITION: A join dependency is said to be a proper if it does not
[quoted text clipped - 3 lines]
> dependencies are non-trivial, and not all non-trivial join
> dependencies are proper.

Thanks, I have to read more carefully.

Trivial dependencies are the ones implied by the
candidate keys. You include some of those in the problem space
while leaving some non-trivial dependencies out based on wether
any of its components is a subset of another component?

Let's see. This is because we are dealing with two relations
S and R, and we have to make sure that we get all
dependency-overlap, including the overlap of the trivial JD's.

Ok. I missed that.

>>> Now the rule:
>>> DEFINITION: A schema is said to violate POOD if it contains relations
[quoted text clipped - 31 lines]
> with one component that contains all attribute names in the header.
> Although trivial, it is also proper.

Yep, I think I get that.

> So you can violate the rule, even
> if there is no non-trivial join dependency. A simple example would be
[quoted text clipped - 6 lines]
> "with components of equal size". After all, components are just sets
> of attribute names, so they have equal size iff they are isomorphic.

Now I'm lost again.

>> I don't have much experience with reasoning about 5NF.
>
> I actually don't think that is necessary. Whether the relations are in
> 5NF or not is not really irrelevant. But you understand what lossless
> decompositions are, no? And I assume you also understand how they are
> implied by functional dependencies.

Up to BCNF.

> So then you might start first with
> considering cases where all the JDs are implied by FDs. For example,
[quoted text clipped - 4 lines]
> if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in
> that case, do you think that there is redundancy that can be removed?
Jan Hidders - 27 Jan 2008 12:20 GMT
mAsterdam schreef:
> This is all way over my head. Jan, you may be used to
> sailing these waters; I am not. I think it is interesting,
[quoted text clipped - 3 lines]
>
> I'll make some bold statements here anyway, but I can't keep this up.

Your non-bold statements are also be highly appreciated. :-)

Too keep things manageable I'll try to trim the discussion heavily and
keep what I think is really essential. So if I snip any of your
comments it is not because I don't think they are interesting, but
just that at the moment I think they are not at the core of the issue.

> >>> ... I wanted to start simple ...
> >> Even though this suggests otherwise, I'll risk assuming
[quoted text clipped - 12 lines]
> It is because of this, that pragmatical redefinitions
> must indeed be temporary and tracked.

Sure, we agree on that. It's just that I think it is sufficient to say
at the beginning something like "for this discussion we are going to
define 'overlapping meaning' as .." and you want to be more careful
and really use another word. As I already said, I'm fine with that, so
I don't think more discussion is needed here.

[... very big and painful snip ...]

> DEFINITION: Two relations R and S are said to have dependency-induced
> overlap if there is a dependency that requires that sometimes some
> tuples(1) of R are also in S.
>
> (1) for some definition of tuples that allows restricted
> reshuffling of its values. To do.

The only way to achieve (1) so that it also takes all normal inclusion
dependencies into account is to define tuples as something equivalent
to bags of values. Such an operation on its internal organs is going
to change the relational model beyond recognition. I'm going to
strongly insist that we stick to the classical definitions of the
named perspective and state in the definition that we are talking
about inclusion up to relabeling:

DEFINITION: Two relations R and S are said to have dependency-induced
overlap if there is a dependency that requires that sometimes some
tuples of R are also in S after renaming the attribute names.

[... another big and painful snip ...]

> >>> ...additional terminology:
> >>> DEFINITION: A join dependency is said to be a proper if it does not
[quoted text clipped - 8 lines]
> Trivial dependencies are the ones implied by the
> candidate keys.

Not precisely. Trivial dependencies are those that hold in any schema.
For example the FK a,b -> a is always true as long as you have
attributes 'a' and 'b' in the header. Another example is the JD
[ {a,b}, {b,c}, {a,b,c} ] if the attributes in the header are {a,b,c}.
This is always a lossless decomposition, but also a very redundant
one. Another is the JD [ {a,b,c} ].

> You include some of those in the problem space
> while leaving some non-trivial dependencies out based on wether
[quoted text clipped - 3 lines]
> S and R, and we have to make sure that we get all
> dependency-overlap, including the overlap of the trivial JD's.

Yes. The intuition is that if there is a JD the relation is in fact a
combination of one or more predicates namely the ones that correspond
to each component of the JD. It is for these predicates that we want
to check overlap. We want to explicitly include the case where it is
actually just one, so we also want to consider the trivial JD [ {a, b,
c} ] for the header {a, b, c}.

In that terminology you can also perhaps understand why we only care
about proper JDs. Suppose we have JD [ {a, b}, {b, c} ]. Then it
follows that the following JD also holds: JD [ {a}, {a, b}, {b, c} ].
I think you'll understand that if the first defines a lossless
decomposition, then so does the second for the simple reason that it
contains at least as much information as the first. But it is not true
that the component {a} also necessarily corresponds to some meaningful
predicate.  In fact, it probably doesn't. So that is why we want to
exclude those.

Having said that, there are actually more JD's that need to be
excluded than I initially thought. For example, if the JD [ {a, b},
{b, c} ] holds then also the JD  [ {a, c}, {a, b}, {b, c} ] holds.
Also here there is not necessarily a meaningful intuitive predicate
associated with the component {a, c}. So, at the risk of causing you a
serious migraine I'm going to have to tweak my definitions a little.

DEFINITION: A join dependency is said to be minimal if there is not
another join dependency with a set of components that is a proper
subset of the set of components of the first.

Examples: If JD [ {a, b}, {b, c} ] holds then JD [ {a, b}, {b, c}, {a,
c} ] is not minimal. The JD [ {a}, {a, b} ] is never minimal because
the trivial JD [ {a, b} ] always holds.

Note that a minimal join dependency is always also a proper join
dependency.

The new rule then becomes:

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that a component C of a minimal join dependency of R
overlaps in meaning with a component D of a minimal join dependency of
S, and either R and S are different, C and D are different, or both.

I just realized by rereading your comments that there might be some
confusion on the notion of "component". From now on I'll strictly use
the word component for a set of attribute names in a join dependency.
So I would like to make the wording in the rule a bit more explicit:
(changed parts are emphasized with "_")

DEFINITION: A schema is said to violate POOD if it contains relations
R and S such that _the projection of R_ on a component C of a minimal
join dependency of R overlaps in meaning with _the projection of S_ on
a component D of a minimal join dependency of S, and either R and S
are different, C and D are different, or both.

> >> It looks to me that in order to have this problem,
> >> we need to have two relations R and S, where 5NF is
[quoted text clipped - 26 lines]
>
> Now I'm lost again.

My fault. I thought you were talking about components of JDs.

> >> I don't have much experience with reasoning about 5NF.
> >
[quoted text clipped - 4 lines]
>
> Up to BCNF.

Good. Let's for the moment restrict ourselves to just CKs. So say we
have a relation R(a,b,c,d) with a CK {a,b}. What are the minimal join
dependencies that hold?

And what if we have the CKs {a,b} and {b,c}?

So do you see for which projections we need to check overlap with
other projections of other relations?

-- Jan Hidders
Jan Hidders - 27 Jan 2008 12:45 GMT
> mAsterdam schreef:
>
[quoted text clipped - 128 lines]
> overlaps in meaning with a component D of a minimal join dependency of
> S, and either R and S are different, C and D are different, or both.

Oops. The "overlaps in meaning" should of course read "has dependency-
induced overlap".  Please apply Hanlon's razor. ;-)

-- Jan Hidders
David Cressey - 27 Jan 2008 12:49 GMT
> mAsterdam schreef:
> > This is all way over my head. Jan, you may be used to
> > sailing these waters; I am not. I think it is interesting,
> > but very time-consuming.
> > Am I talking rubbish? I try not to, but I don't know - nobody
> > else chimes in. You keep replying, but hey, you are a nice guy :-)

It's even further over my head.

> > >>> DEFINITION: A join dependency is said to be a proper if it does not
> > >>> hold that any of its components is a subset of another component.
[quoted text clipped - 14 lines]
> This is always a lossless decomposition, but also a very redundant
> one. Another is the JD [ {a,b,c} ].

I really need to understand the word "trivial"  as used above in a more
careful manner than I do.

Unfortunately, my intuitive grasp of the word "trivial"  comes from hearing
it used by engineers rather than mathematicians.  When engineers use the
word "trivial",  there is almost always an overtone of snobbism present.  A
problem is "trivial"  if it's unworthy of the attention of someone so
important as the person speaking.  It should be assigned to a "junior"
person.

In mathematics,  the "trivial case" is almost always easy,  but easiness is
not what relegates it to triviality.  It's more like "orthogonality to the
peculiarities of the item under discussion",  or something like that.

So I need to understand "trivial dependencies",  in a really tight fashion.
My loose understanding of one form of trivial dependence is that a component
of a composite key is trivially dependent on that key.  In this case,
"trivial" means,  to me,  that the dependency provides no additonal
information about anything.  The sort of thing that, in common parlance,
would draw the response,  "well, duh!"

But I don't have a general understanding of  what makes trivial dependencies
trivial.
Bob Badour - 27 Jan 2008 13:28 GMT
>>mAsterdam schreef:
>>
[quoted text clipped - 50 lines]
> But I don't have a general understanding of  what makes trivial dependencies
> trivial.

I might suggest: "Inferable from the definition."

Presumably one can infer (calculate/prove) from the definition of FK that:
FK a,b -> a

Or did he mean FD above instead of FK?
Jan Hidders - 27 Jan 2008 20:15 GMT
> >>mAsterdam schreef:
>
[quoted text clipped - 52 lines]
>
> I might suggest: "Inferable from the definition."

Yes. A good suggestion.

> Presumably one can infer (calculate/prove) from the definition of FK that:
> FK a,b -> a
>
> Or did he mean FD above instead of FK?

Yes, I did. Apologies for the typo.

-- Jan Hidders
Marshall - 27 Jan 2008 18:35 GMT
> "Jan Hidders" <hidd...@gmail.com> wrote in message
>
> But I don't have a general understanding of  what makes trivial dependencies
> trivial.

My understanding of "trivial" is that it has a very specific meaning
in the context of FDs, namely that A -> B is trivial iff B is a subset
of A.

Marshall
Jan Hidders - 27 Jan 2008 20:51 GMT
> > mAsterdam schreef:
> > > This is all way over my head. Jan, you may be used to
[quoted text clipped - 26 lines]
> I really need to understand the word "trivial"  as used above in a more
> careful manner than I do.

Ok. I'll try to give it a shot.

So the general meaning is that a dependency is trivial if it holds in
any schema in which it makes sense (i.e. that defines the used
relation and attribute names) and independent of which constraints /
dependencies have been specified.

For functional dependencies, for example, these are exactly all FDs X -
> Y such that Y is a subset of X. So FD a,b -> b is such an FD and FD
a -> b is not. So the trivial FDs always hold, and the non-trivial
ones only hold if they have been explicitly specified or if they
follow from constraints that have been explicitly specified. The
intuition behind the term "trivial" is that if you write down a
complete list of the dependencies that hold, then you can always write
down the trivial ones first, because these will hold, no matter what
the specific situation is.

This notion has been generalized for all kinds of dependencies. For
multi-value dependencies (MVDs) this characterization is a bit more
complicated. An MVD X ->> Y is trivial iff Y is a subset of X or the
union of X and Y is the header of the relation. So for R(a,b,c) the
MVD a ->> b,c always holds.

For join dependencies it actually gets easier again. We know that a JD
[ X1, .. , Xn ] is trivial iff one of the components Xi is the header
of the relation. Clearly if one of the components of your
decomposition is the whole original relation, the decomposition is
going to be lossless.

> So I need to understand "trivial dependencies",  in a really tight fashion.
> My loose understanding of one form of trivial dependence is that a component
> of a composite key is trivially dependent on that key.  In this case,
> "trivial" means,  to me,  that the dependency provides no additonal
> information about anything.  The sort of thing that, in common parlance,
> would draw the response,  "well, duh!"

Yes, that intuition is roughly correct. And you can generalize this
even beyond the types of dependencies I mentioned. As usually defined,
all dependencies are a specific syntactical subclass of statements /
formulas in first order logic.  All trivial dependencies are then
defined as the tautologies within that subclass.

Does that shed some light?

-- Jan Hidders
David Cressey - 27 Jan 2008 22:57 GMT
> Yes, that intuition is roughly correct. And you can generalize this
> even beyond the types of dependencies I mentioned. As usually defined,
> all dependencies are a specific syntactical subclass of statements /
> formulas in first order logic.  All trivial dependencies are then
> defined as the tautologies within that subclass.

> Does that shed some light?

Yes, it does.  Thanks.