Database Forum / General DB Topics / DB Theory / September 2007
Multiple-Attribute Keys and 1NF
JOG - 28 Aug 2007 13:26 GMT
I am still fighting with the theoretical underpinning for 1NF. As such, any comments would be gratefully accepted. The reason for my concern is that there /seem/ to be instances where 1NF is insufficient. An example occurred to me while I was wiring up a dimmer switch (at the behest of Mrs. JOG, to whom there may only be obedience). Now I don't know the situation in the US, but in the UK a while back the colour codes for domestic main circuit wiring changed. Naturally the two schemes exist in tandem, as exhibited in every house I've had the joy of doing some DIY in:
Brown -> live.
Red -> live.
Blue -> neutral.
Black -> neutral.
Green and yellow -> earth.
The issue with encoding these propositions is that the candidate key for each proposition may consist of one _or_ two colours. Now I have a couple of options, none of which seem satisfactory. I could leave green & yellow as some sort of set-value composite, but obviously this would affect my querying capabilities, so that's out straight off the bat. Similarly adding attributes Colour1 and a nullable Colour2 is simply so hideous it isn't worth consideration. So, I could ungroup to give me:
Colour   Type
-----------------
Brown    live
Red      live
Black    neutral
Blue     neutral
Green    earth
Yellow   earth
-----------------
But again this is unsatisfactory as I have lost the information that one wire is green and yellow, but none is brown /and/ red.
I could introduce a surrogate to give me:
Id  Colour   Type
-----------------
1   Brown    live
2   Red      live
3   Black    neutral
4   Blue     neutral
5   Green    earth
5   Yellow   earth
-----------------
But this seems wholly artificial given that all the information I required for identification was available in the original propositions, and that did not require some artificial id. A [shudder] non 1NF variation such as:
Id  Colour   Type
-----------------
1   Brown    live
2   Red      live
3   Black    neutral
4   Blue     neutral
5   Green,   earth
    Yellow
-----------------
is clearly hideous as it denies the fundamental mathematical principle that one attribute should take one value from one domain, never mind the fact that it introduces query bias.
I could of course introduce nested relations, but I am uncertain as to the theoretical consequences of having a nested relation as a key (I guess it would be fine, if adding seemingly unnecessary complexity to subsequent queries). But moreover it again seems unintuitive, given that in this case it would be indicating that the original propositions contained, as a value for one of their attributes, a further proposition, and this was not the case.
I am having a crisis of faith with the way 1NF is currently viewed. Any ideas to solve my dilemma? Am I on my own in being perturbed?
Regards, Jim.
Kevin Kirkpatrick - 28 Aug 2007 13:51 GMT
> I am still fighting with the theoretical underpinning for 1NF. As
> such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 77 lines]
>
> Regards, Jim.
Sounds like there are not "wire colors" with a meaning so much as "wire patterns", with a meaning:
Wire Pattern     | Meaning
Brown            | live
Red              | live
Black            | neutral
Blue             | neutral
Green and Yellow | earth
Rainbow          | telephone
Wire Pattern     | color
Brown            | Brown
Red              | Red
Black            | Black
Blue             | Blue
Green and Yellow | Green
Green and Yellow | Yellow
Rainbow          | Red
Rainbow          | Yellow
Rainbow          | Blue
Rainbow          | Green
Would this clear up the issue?
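As a rough illustration of how the two relations above could be queried (a minimal Python sketch; the names `pattern_meaning`, `pattern_colour`, `meanings_featuring` and `colours_of` are hypothetical and not part of the thread), a join on the pattern recovers both the "green and yellow belong together" fact and the individual colours:

```python
# Hypothetical encoding of the two relations as sets of (pattern, value) rows.
pattern_meaning = {
    ("Brown", "live"),
    ("Red", "live"),
    ("Black", "neutral"),
    ("Blue", "neutral"),
    ("Green and Yellow", "earth"),
    ("Rainbow", "telephone"),
}
pattern_colour = {
    ("Brown", "Brown"),
    ("Red", "Red"),
    ("Black", "Black"),
    ("Blue", "Blue"),
    ("Green and Yellow", "Green"),
    ("Green and Yellow", "Yellow"),
    ("Rainbow", "Red"),
    ("Rainbow", "Yellow"),
    ("Rainbow", "Blue"),
    ("Rainbow", "Green"),
}


def meanings_featuring(colour):
    """Join the two relations on pattern; keep meanings whose pattern features the colour."""
    return {
        meaning
        for pattern, meaning in pattern_meaning
        for other_pattern, c in pattern_colour
        if other_pattern == pattern and c == colour
    }


def colours_of(pattern):
    """All colours that make up one wire pattern."""
    return {c for p, c in pattern_colour if p == pattern}
```

Here `colours_of("Green and Yellow")` gives back both colours as a unit, while `meanings_featuring("Green")` answers the single-colour query, so neither piece of information is lost.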
David Cressey - 28 Aug 2007 14:35 GMT
> I am still fighting with the theoretical underpinning for 1NF. As
> such, any comments would be gratefully accepted. The reason for my
> concern is that there /seem/ to be instances where 1NF is insufficient.
Insufficient for what? I wasn't able to infer this from your example.
> An
> example occurred to me while I was wiring up a dimmer switch (at the
[quoted text clipped - 9 lines]
> Black -> neutral.
> Green and yellow -> earth.
In the US, house current is typically at a nominal 120V, except for a few circuits, like stoves, that are driven at a nominal 240V. Nominal 120V can vary all the way down to 110V. At some point below that, "brown out" begins.
Where the coded meaning of the wires gets to be "interesting" is where you have an overhead light controlled by a wall switch. If there are two double pole switches controlling the same light it gets more interesting.
In general, the meaning is:
Black -- live
Red -- live (out of phase with black)
White -- neutral
Green -- ground
bare -- ground.
However, in many homes, the wire from the appliance to the controlling switch has been The stove in my house has a clock/timer on it that is driven by 120V is wired with the standard 3 connector wire consisting of a white wire, a black wire, and a bare wire.
In this case, the black wire is used to carry (unswitched) power from the overhead junction box to the switch. The companion white wire is used to carry (switched) power from the switch to the power side of the light circuit, which is a black wire.
The above results in a white wire being connected to a black wire. This looks "wrong" to a DIY neophyte. The official code uses bits of colored tape to indicate such things as "white coded as black", but that's over my head.
The electrical wiring in some homes dates back about a century, before the wires had colors. Things get really interesting then.
All of this is a digression from 1NF. Again, 1NF is insufficient for what?
JOG - 28 Aug 2007 15:06 GMT
> > I am still fighting with the theoretical underpinning for 1NF. As
> > such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 51 lines]
> The electrical wiring in some homes dates back about a century, before the
> wires had colors. Things get really interesting then.
Ah, when men were real men, and wired electrics up with their teeth ;)
> All of this is a digression from 1NF. Again, 1NF is insufficient for
> what?
Insufficient is probably the wrong word. I'm having trouble finding a direct 1NF encoding of propositions such as:
Brown -> live.
Red -> live.
Blue -> neutral.
Black -> neutral.
Green ^ yellow -> earth.
that doesn't require the addition of a surrogate identifier or the use of nested relations, neither of which seem to exist in the original propositions.
It's perhaps a symptom of a concern with propositions where two components play the same role, such as a friendship relation. E.g. if Fred and Barney are friends we have to encode them under different attribute names in RM, whereas they are actually playing exactly the same role (of "friend"). Using "Friend1" and "Friend2" doesn't seem wholly elegant. If I took Kevin's approach (which I am still chewing over) for this latter example I'd end up with:
Id  Friend
--------------
1   Fred
1   Barney
2   Wilma
2   Betty
again with the use of some sort of surrogate identifier, to assuage the fact that a friendship (just like a pattern) is identified by more than one attribute playing equal roles. This addition seems to be adding complexity where it did not originally exist. (The caveat to all this being that historically any misgivings with RM I've had have turned out to be down to a weakness in my grasp of it). J.
Bob Badour - 28 Aug 2007 17:05 GMT
> I am still fighting with the theoretical underpinning for 1NF. As
> such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 77 lines]
>
> Regards, Jim.
There is one obvious way to represent that in 1NF:
Create a color domain where a single value represents green and yellow, another value represents green, and a third represents yellow etc. The domain could even represent thick green/thin yellow as a separate value from thick yellow/thin green if one chooses.
Regardless whether one creates only the domain or also uses it as a candidate key for some sort of lookup table, the resulting relation is simply:
Colour   Type
=======  -------
...
Your ID above is one example of such a domain. However, the domain need not be numeric or have any external numeric representations. It need only exist with a distinct value for green and yellow.
JOG - 28 Aug 2007 17:22 GMT
> > I am still fighting with the theoretical underpinning for 1NF. As
> > such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 96 lines]
> not be numeric or have any external numeric representations. It need
> only exist with a distinct value for green and yellow.
Well, practically, the surrogate key is the way that I would go. My question is rather whether this corresponds naturally to the original propositions, which don't require a new domain in order to be manipulated in FOL. I seem to remember a while back there was a discussion involving Marshall and a few others considering situations where a nested relation was /necessary/ (I need to have a dig for it), and it didn't sit comfortably then.
paul c - 28 Aug 2007 17:37 GMT
> ... I seem to remember a while back there was a
> discussion involving Marshall and a few others considering situations
> where a nested relation was /necessary/ (I need to have a dig for it),
> and it didn't sit comfortably then.
> ...
My old favourite is the relation that shows combinations, say with two attributes that make a composite key. It's a bit obscure maybe, but an example query is "combinations of parts that have ever been shipped". I think the only way to answer it *verbatim* is to group {shipment#, part#} on part# and then project away shipment#. I seem to recall that Date uses a "catalog" example involving different sets of candidate keys for a single relation.
p
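To make the "group on part#, then project away shipment#" query concrete, here is a minimal Python sketch. The sample data and the name `part_combinations` are assumptions for illustration only; they are not from the thread or from Date's example.

```python
from collections import defaultdict

# Hypothetical {shipment#, part#} relation as a set of rows.
shipped = {
    (1, "bolt"),
    (1, "nut"),
    (2, "bolt"),
    (2, "nut"),
    (3, "washer"),
}


def part_combinations(rel):
    """Group part# per shipment#, then project away shipment#.

    Each result element is a set-valued attribute (a frozenset of parts);
    identical combinations from different shipments collapse to one element.
    """
    parts_by_shipment = defaultdict(set)
    for shipment_no, part_no in rel:
        parts_by_shipment[shipment_no].add(part_no)
    return {frozenset(parts) for parts in parts_by_shipment.values()}
```

With this data, shipments 1 and 2 carry the same parts, so the result holds two combinations rather than three.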
Bob Badour - 28 Aug 2007 17:43 GMT
>>>I am still fighting with the theoretical underpinning for 1NF. As
>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 101 lines]
> propositions, which don't require a new domain in order to be
> manipulated in FOL.
You assume a color domain so imagining a different color domain changes the design without adding anything new.
> I seem to remember a while back there was a
> discussion involving Marshall and a few others considering situations
> where a nested relation was /necessary/ (I need to have a dig for it),
> and it didn't sit comfortably then.
I would argue that nested relations are never necessary; although, they are certainly handy at times. I would choose the discipline of base relations having no nested relations. In fact, the principle of cautious design suggests--as tools evolve toward nested relations--to allow them only in derived relations.
JOG - 28 Aug 2007 18:34 GMT
> >>>I am still fighting with the theoretical underpinning for 1NF. As
> >>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 104 lines]
> You assume a color domain so imagining a different color domain changes
> the design without adding anything new.
Okay, you're right - not a new domain, just a different one. If I had started with a domain of all colours C (clearly containing the colour "grey" given the presence of the u there), I read you as proposing that it be replaced with a labelled powerset of C. Howwwever, would Occam's razor not suggest that we should prefer a domain made up of atomic individuals, as opposed to aliased sets, which will require an extra step to decompose?
I guess my question is heading towards what is theoretically wrong about having:
wires = { { (Colour, green), (Colour,yellow), (Type, earth) }, { (Colour, black), (Type, neutral) } }
as opposed to:
wires = { { (Pattern, greenAndYellow), (Type, earth) }, { (Pattern, solidBlack), (Type, neutral) } }
patterns = { { (Pattern, greenAndYellow), (Contains, green) }, { (Pattern, greenAndYellow), (Contains, yellow) }, { (Pattern, solidBlack), (Contains, black) } }
The first version seems so much simpler, and while it doesn't accord to the traditional view of 1NF, I am unclear as to how it would harm manipulation.
> I seem to remember a while back there was a
>
[quoted text clipped - 7 lines]
> cautious design suggests--as tools evolve toward nested relations--to
> allow them only in derived relations.
I would agree with this.
Bob Badour - 28 Aug 2007 19:41 GMT
>>>>>I am still fighting with the theoretical underpinning for 1NF. As
>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 113 lines]
> atomic individuals, as opposed to aliased sets, which will require an
> extra step to decompose?
I don't recall suggesting anything about sets--just a domain that has a distinct value that means "green and yellow".
JOG - 28 Aug 2007 19:53 GMT
> >>>>>I am still fighting with the theoretical underpinning for 1NF. As
> >>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 116 lines]
> I don't recall suggesting anything about sets--just a domain that has a
> distinct value that means "green and yellow".
Okay, sure. But then to be able to query for green and yellow individually one must employ a further relation encoding two more propositions that state "'Green and yellow' contains 'Green'" and that "'Green and yellow' contains 'Yellow'" respectively. One then has a schema with two domains - one for the composites and one for individual colours (which is what I was inferring when I initially said a new one was being added).
paul c - 28 Aug 2007 20:05 GMT
...
>> I don't recall suggesting anything about sets--just a domain that has a
>> distinct value that means "green and yellow".
[quoted text clipped - 6 lines]
> individual colours (which is what I was inferring when I initially
> said a new one was being added).
I took Bob B to mean something else, eg., allowing the colour purple doesn't require an app to record what primary colours purple is usually fashioned from (maybe they're green and blue, I forget, but which they are doesn't change the point).
p
Bob Badour - 28 Aug 2007 20:23 GMT
>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 124 lines]
> individual colours (which is what I was inferring when I initially
> said a new one was being added).
Assuming one has a need to query for green separately, I suppose one can define an operator on the domain to that effect. If one invents a requirement that requires a second domain, then one will need a second domain regardless.
JOG - 28 Aug 2007 23:43 GMT
> >>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> >>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 129 lines]
> requirement that requires a second domain, then one will need a second
> domain regardless.
Well that sort of brings us full circle back to my query as to whether a structure that doesn't require that second domain, such as a set where elements themselves are pure mathematical relations containing attribute/value pairs:
Wires = { {(Color, Yellow), (Color, Green), (Type, earth)}, {(Color, blue), (Type, live)} }
has any negative theoretical impacts. I can see immediately that this would affect WHERE and ON clauses in the algebra, and one would get more use out of GROUP/UNGROUP statements, but I see nothing inherently /bad/.
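One way to picture "an operator on the domain" is the following minimal Python sketch. The `ColourCode` enum and its `features` method are hypothetical names chosen for illustration; nothing here is claimed to be how any particular DBMS defines domains. Each colour code is a single value of one domain, and the operator answers the single-colour question without a second relation:

```python
from enum import Enum


class ColourCode(Enum):
    """A single domain: each member is one indivisible colour-code value."""

    BROWN = ("brown",)
    RED = ("red",)
    BLACK = ("black",)
    BLUE = ("blue",)
    GREEN_AND_YELLOW = ("green", "yellow")

    def features(self, colour):
        """Domain operator: does this colour code feature the given colour?"""
        return colour in self.value


# The relation itself stays in 1NF: one colour-code value per row.
wires = {
    (ColourCode.BROWN, "live"),
    (ColourCode.RED, "live"),
    (ColourCode.BLACK, "neutral"),
    (ColourCode.BLUE, "neutral"),
    (ColourCode.GREEN_AND_YELLOW, "earth"),
}

types_featuring_green = {t for code, t in wires if code.features("green")}
```

The trade-off sketched here is that the decomposition lives in the type definition rather than in a relation, so it can be used in a restriction but is not itself queryable as data.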
Yet that is.
Bob Badour - 29 Aug 2007 00:12 GMT
>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 132 lines]
> Well that sort of brings us full circle back to my query as to
> whether a structure that doesn't require that second domain,
Let me be clear: Unless you invent a requirement that requires a second domain, no second domain is required. If one invents such a requirement, one is.
> such as a
> set where elements themselves are pure mathematical relations
> containing attribute/value pairs:
>
> Wires = { {(Color, Yellow), (Color, Green), (Type, earth)}, {(Color,
> blue), (Type, live)} }
But a set is a second domain. You have 1) colors and 2) sets of colors. Actually, you have sets of some supertype of color and type?!? Yuck!
> has any negative theoretical impacts. I can see immediately that this
> would affect WHERE and ON clauses in the algebra, and one would get
> more use out of GROUP/UNGROUP statements, but I see nothing
> inherently /bad/.
>
> Yet that is.
Supposing you have a requirement that you must be able to use the green and the yellow separately and supposing you choose to use a set of values (i.e. an RVA), you have already identified the problem that when one ungroups, one loses the information that green and yellow belong as a pair.
To preserve this information, the dbms would have to have some facility to generate an artificial identifier for the pair. However, if one normalized the base relations, one would already have the identifier, and it would be rather simple to construct the RVA in a derived relation.
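The information loss on ungrouping, and the way a normalized base avoids it, can be sketched in a few lines of Python. This is an illustration under stated assumptions: the relation contents, the casing identifiers 1, 2 and 5, and every function name are hypothetical.

```python
from collections import defaultdict

# Nested (RVA-style) relation: each row pairs a set of colours with a type.
nested = {
    (frozenset({"Green", "Yellow"}), "earth"),
    (frozenset({"Brown"}), "live"),
    (frozenset({"Red"}), "live"),
}


def ungroup(rel):
    """Flatten each colour set into one (colour, type) row per colour."""
    return {(colour, wire_type) for colours, wire_type in rel for colour in colours}


def group_by_type(rel):
    """Regroup flat rows on type, the only grouping key left after ungrouping."""
    colours_by_type = defaultdict(set)
    for colour, wire_type in rel:
        colours_by_type[wire_type].add(colour)
    return {(frozenset(cs), t) for t, cs in colours_by_type.items()}


# Normalized base relations carrying an identifier for each casing.
casing_type = {(1, "live"), (2, "live"), (5, "earth")}
casing_colour = {(1, "Brown"), (2, "Red"), (5, "Green"), (5, "Yellow")}


def derive_nested(types, colours):
    """Construct the nested relation as a derived relation from the base."""
    colours_by_id = defaultdict(set)
    for casing_id, colour in colours:
        colours_by_id[casing_id].add(colour)
    return {(frozenset(colours_by_id[cid]), t) for cid, t in types}


round_trip = group_by_type(ungroup(nested))
```

In this sketch `round_trip` differs from `nested`: brown and red get merged into a single {Brown, Red} live casing that never existed, whereas `derive_nested(casing_type, casing_colour)` reproduces `nested` exactly because the identifier keeps the pairs apart.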
JOG - 29 Aug 2007 00:24 GMT
> >>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> >>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 147 lines]
> But a set is a second domain. You have 1) colors and 2) sets of colors.
> Actually, you have sets of some supertype of color and type?!? Yuck!
I'm worrying that you have misinterpreted what I have sketched out there. I haven't specified any domain sets at all in the above - it is just a set of propositions, and as with RM, each element is a mapping from attribute names onto values, that's all. I have no idea why you think I have supertypes, etc, in there. (which I agree would be yuck)
> > has any negative theoretical impacts. I can see immediately that this
> > would affect WHERE and ON clauses in the algebra, and one would get
[quoted text clipped - 13 lines]
> normalized the base relations, one would already have the identifier,
> and it would be rather simple to construct the RVA in a derived relation.
Much food for thought. Thanks for the responses Bob.
Bob Badour - 29 Aug 2007 00:42 GMT
>>>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
>>>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 154 lines]
> why you think I have supertypes, etc, in there. (which I agree would
> be yuck)
Based on this set: {(Color, Yellow), (Color, Green), (Type, earth)}
That is not a tuple. A tuple would be:
{(Color, {Yellow, Green}), (Type, earth)}
The names must be unique within a tuple.
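The uniqueness requirement amounts to saying that a tuple is a function from attribute names to values. A minimal Python sketch of that check follows; the name `is_tuple` is hypothetical, and `frozenset` stands in for a set-valued attribute.

```python
def is_tuple(pairs):
    """True when the (name, value) pairs form a function: each name occurs once."""
    names = [name for name, _ in pairs]
    return len(names) == len(set(names))


# The set from the earlier post: "Color" occurs twice, so it is not a function.
repeated_name = {("Color", "Yellow"), ("Color", "Green"), ("Type", "earth")}

# The tuple form: one "Color" attribute whose single value is a set.
set_valued = {("Color", frozenset({"Yellow", "Green"})), ("Type", "earth")}
```

Under this definition `is_tuple(repeated_name)` is false and `is_tuple(set_valued)` is true, which is the distinction being drawn between the two notations.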
>>>has any negative theoretical impacts. I can see immediately that this
>>>would affect WHERE and ON clauses in the algebra, and one would get
[quoted text clipped - 15 lines]
>
> Much food for thought. Thanks for the responses Bob.
You are very welcome. Some of this stuff seems obvious to me now, but at one time it was anything but.
JOG - 29 Aug 2007 01:05 GMT
> >>>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> >>>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 156 lines]
>
> Based on this set: {(Color, Yellow), (Color, Green), (Type, earth)}
Still not seeing where you get supertypes from. I just see a mapping of roles in a proposition to corresponding values.
> That is not a tuple. A tuple would be:
>
> {(Color, {Yellow, Green}), (Type, earth)}
Yes, I realize it is not a db-tuple, because if one relaxes 1NF then one doesn't have a db-relation at all. That set-valued element still represents a proposition however, and is in fact a relation in the true mathematical sense. I find this representation interesting because a JOIN becomes a union of these elements, and a natural join is generated by default as one would expect.
> The names must be unique within a tuple.
>
[quoted text clipped - 20 lines]
> You are very welcome. Some of this stuff seems obvious to me now, but at
> one time it was anything but.
JOG - 29 Aug 2007 01:11 GMT
> > >>>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> > >>>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 170 lines]
> because a JOIN becomes a union of these elements, and a natural join
> is generated by default as one would expect.
or perhaps I am doing the opposite and keeping 1NF and relaxing the use of finite partial functions to represent tuples. I find the definition of 1NF to be a pretty nebulous beast. The "NFNF" mob for example seem to produce relations with set-values which seem entirely in 1NF to me.
> > The names must be unique within a tuple.
>
[quoted text clipped - 20 lines]
> > You are very welcome. Some of this stuff seems obvious to me now, but at
> > one time it was anything but.
David Cressey - 29 Aug 2007 06:23 GMT
> or perhaps I am doing the opposite and keeping 1NF and relaxing the
> use of finite partial functions to represent tuples. I find the
> definition of 1NF to be a pretty nebulous beast. The "NFNF" mob for
> example seem to produce relations with set-values which seem entirely
> in 1NF to me.
If you model your data using relations, and if you accept the proposition that all relations are inherently in 1NF, then the definition of 1NF becomes moot, for your purposes. Maybe that's why it's so nebulous.
I still work with the older definition of 1NF, and I model my data into SQL tables rather than relations. Given this starting place, the question "is the table under discussion in 1NF or not" is still a relevant one, and it has a clear answer. Nothing nebulous about it.
(At the conceptual level, I don't model my data into anything but attributes, with associated entities and relationships. That's a different discussion, and 1NF need not enter that discussion).
Jan Hidders - 29 Aug 2007 06:39 GMT
> > >>>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> > >>>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 166 lines]
> Yes, I realize it is not a db-tuple, because if one relaxes 1NF then
> one doesn't have a db-relation at all.
That is irrelevant. In a NFNF setting tuples are still defined as a certain kind of function, and what you gave is not a function.
> That set-valued element still
> represents a proposition however, and is in fact a relation in the
> true mathematical sense. I find this representation interesting
> because a JOIN becomes a union of these elements, and a natural join
> is generated by default as one would expect.
That depends a little on what one would expect. ;-) One elegant definition of the natural join of two relations R and S is for example { t1 + t2 | t1 in R, t2 in S, t1 + t2 is a tuple }. If you change the definition of tuple as you propose this doesn't work anymore. Of course it is not hard to come up with a definition that does generalize the natural join correctly.
-- Jan Hidders
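The join definition quoted above, { t1 + t2 | t1 in R, t2 in S, t1 + t2 is a tuple }, translates almost literally into Python. This is a minimal sketch under stated assumptions: tuples are modelled as frozensets of (name, value) pairs, `+` is read as set union, and the names `is_tuple`, `natural_join`, `wires` and `patterns` are hypothetical.

```python
def is_tuple(pairs):
    """A tuple is a function: no attribute name may occur twice."""
    names = [name for name, _ in pairs]
    return len(names) == len(set(names))


def natural_join(r, s):
    """{ t1 + t2 | t1 in R, t2 in S, t1 + t2 is a tuple }, with + as set union."""
    return {t1 | t2 for t1 in r for t2 in s if is_tuple(t1 | t2)}


wires = {
    frozenset({("Pattern", "greenAndYellow"), ("Type", "earth")}),
    frozenset({("Pattern", "solidBlack"), ("Type", "neutral")}),
}
patterns = {
    frozenset({("Pattern", "greenAndYellow"), ("Contains", "green")}),
    frozenset({("Pattern", "greenAndYellow"), ("Contains", "yellow")}),
    frozenset({("Pattern", "solidBlack"), ("Contains", "black")}),
}

joined = natural_join(wires, patterns)

# Dropping the "is a tuple" condition, as the relaxed definition would:
relaxed = {t1 | t2 for t1 in wires for t2 in patterns}
```

The union of two tuples that disagree on `Pattern` contains that name twice, so the `is_tuple` test is what filters out non-matching pairs. Without it, `relaxed` degenerates into every pairwise union, which is one way to see why the definition stops working once repeated names are allowed.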
JOG - 29 Aug 2007 12:05 GMT
> > > >>>>>>>>>>>I am still fighting with the theoretical underpinning for 1NF. As
> > > >>>>>>>>>>>such, any comments would be gratefully accepted. The reason for my
[quoted text clipped - 169 lines]
> That is irrelevant. In a NFNF setting tuples are still defined as a
> certain kind of function, and what you gave is not a function.
Aye, I am aware I have dropped the correspondence between a tuple and a finite partial function, and left an unrestricted mathematical relation in its place. What I really wanted to explore were the connotations of doing this.
> > That set-valued element still
> > represents a proposition however, and is in fact a relation in the
[quoted text clipped - 6 lines]
> { t1 + t2 | t1 in R, t2 in S, t1 + t2 is a tuple }. If you change the
> definition of tuple as you propose this doesn't work anymore.
Well hey, the definition of a tuple in terms of RM is a bit quirky anyhow (never mind cross-product). Why let Codd have all the fun ;)
> Of course it is not hard to come up with a definition that does generalize the
> natural join correctly.
>
> -- Jan Hidders
paul c - 29 Aug 2007 17:23 GMT
...
>>>> That is not a tuple. A tuple would be:
>>>> {(Color, {Yellow, Green}), (Type, earth)}
[quoted text clipped - 23 lines]
>> Of course it is not hard to come up with a definition that does generalize the
>> natural join correctly.
I too am interested in seeing it explored even though it might require discarding some conventional ideas. How "natural join" might work does seem a basic question since inferencing is a main attraction of Codd's model that most people understand fairly readily, just as they dig his table presentation idea. Maybe a different kind of group/ungroup and it would be important to define "insert" as well. I'm not pre-supposing that if it is feasible that it wouldn't end up being a way to implement a physical layer rather than the logical view a user sees.
p
David Cressey - 29 Aug 2007 06:10 GMT
> Okay, sure. But then to be able to query for green and yellow
> individually one must employ a further relation encoding two more
[quoted text clipped - 3 lines]
> individual colours (which is what I was inferring when I initially
> said a new one was being added).
It took me a while to realize that what you meant from your original description was that "a green and yellow wire means earth". I had thought you meant "a green wire means earth" and "a yellow wire means earth". Pardon me for being dense.
Clearly what we have here is not a domain of colors, but a domain of color codes, where a color code contains one or more colors, and maybe a "thick or thin" qualifier on each color.
It's not clear to me why you need to be able to query on simple colors, unless you need to decompose the color coding scheme into its constituent parts for some reason.
There are a lot of code domains where each code is made up of a set of more primitive elements. Perhaps a very relevant one might be "character code". If I have the following primitive elements:
B1, B2, B4, B8, B16, B32, B64, B128 (which might be an odd way of labelling bits 0 through 7 of a byte), I can think of the character code for 'A' as being B64+B1. Now I could query on all the character codes without necessarily having an operator that would yield "all the codes that include B1".
I think that the colors, as constituents of color codes, play the same role as bits, as constituents of character codes. Do you agree?
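The character-code analogy can be made concrete with a short Python sketch. The names `BITS`, `constituents` and `codes_including` are hypothetical; the B1..B128 labels follow the post, and `ord` supplies the character code.

```python
# The primitive elements B1..B128, labelled by the value of the bit they denote.
BITS = {f"B{1 << i}": 1 << i for i in range(8)}


def constituents(ch):
    """Decompose a character code into the primitive elements it is made of."""
    code = ord(ch)
    return {name for name, mask in BITS.items() if code & mask}


def codes_including(bit_name, chars):
    """The optional extra operator: all the codes that include a given element."""
    return {ch for ch in chars if ord(ch) & BITS[bit_name]}
```

For example, `constituents("A")` is {"B64", "B1"}, matching B64+B1 in the post. Equality tests and lookups on 'A' need neither function, which mirrors the point that one can query on whole codes without ever decomposing them.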
JOG - 29 Aug 2007 11:55 GMT
> > Okay, sure. But then to be able to query for green and yellow
> > individually one must employ a further relation encoding two more
[quoted text clipped - 31 lines]
> I think that the colors, as constituents of color codes, play the same role
> as bits, as constituents of character codes. Do you agree?
Yes. I mean no. No, yes. Gnngh ;)
Ok, of course I understand your point - a wire can be viewed as having a colour code, which itself has constituent parts. But it's just one interpretation, right? I am still seeing a difference between the propositions:
* There is a colour-code "yellow and green" that denotes "earth".
* The casing of an earth wire features the colour yellow and the colour green.
It's just like the difference between the propositions:
* My office is B42
* My office is on floor B, room 42.
There are instances where I may well want to encode the second form of these propositions. And /if/ that were the case (iff), well 1NF is precluding me from doing this in terms of the wire example.
Bob Badour - 29 Aug 2007 12:49 GMT
>>>Okay, sure. But then to be able to query for green and yellow
>>>individually one must employ a further relation encoding two more
[quoted text clipped - 49 lines]
> proposition forms. And /if/ that were the case (iff), well 1NF is
> precluding me from doing this in terms of the wire example.
I disagree. You have already noted that 1NF allows this with exactly 2 relations (or with 1 relation and one or more operations on the color code domain.)
JOG - 29 Aug 2007 14:16 GMT
> >>"JOG" <j...@cs.nott.ac.uk> wrote in message
>
[quoted text clipped - 57 lines]
> relations (or with 1 relation and one or more operations on the color
> code domain.)
True, I do see that, but it does so by requiring the invention of a colour-code concept which isn't in the proposition "The casing of an earth wire features the colour yellow and the colour green".
Brian Selzer - 29 Aug 2007 19:03 GMT
>> >>"JOG" <j...@cs.nott.ac.uk> wrote in message
>>
[quoted text clipped - 72 lines]
> colour-code concept which isn't in the proposition "The casing of an
> earth wire features the colour yellow and the colour green".
You have to consider the entire relation value: what about the propositions (treating "or" exclusively, of course):
"The casing of a live wire features the colour brown or the colour red."
"The casing of a neutral wire features the colour blue or the colour black."
Write a predicate for the relation schema that when existentially quantified and extended yields a set of atomic formulae that implies all three of the propositions above. I think you'll find that the colour-code concept is in that predicate.
JOG - 29 Aug 2007 22:21 GMT
> >> >>"JOG" <j...@cs.nott.ac.uk> wrote in message
>
[quoted text clipped - 84 lines]
> propositions above. I think you'll find that the colour-code concept is in
> that predicate.
I agree. I hold little stock with set based values so in RM I would go for the addition of a colour-code foreign key.
But what if we weren't tied to a traditional relational schema and tweaked the system so it could allow propositions with more than one role of the same name without decomposing them? As Jan pointed out 'tuples' are no longer functions - they would be unrestricted binary relations (subsets of attribute x values). We could produce a comparatively simpler and less cluttered schema, predicate in a very similar manner as before, and with a few simple alterations could have an equally effective WHERE mechanism. My concern however would be the consequences to JOIN.
Brian Selzer - 30 Aug 2007 01:41 GMT
>> >> >>"JOG" <j...@cs.nott.ac.uk> wrote in message
>>
[quoted text clipped - 109 lines]
> an equally effective WHERE mechanism. My concern however would be the
> consequences to JOIN.
I'm not sure I understand what you are driving at. In the example you provided, it is the combinations of values from a simple domain that have significance, regardless of whether they're wrapped in a single attribute or not. To me it doesn't make sense to have multiple attributes with the same name--the attribute names correspond to free variables in a predicate: how could you assign multiple values to the same variable? But you can certainly assign a set of values to a variable that expects a set of values, since a set is a value! And you can certainly have a predicate with free variables that range over relations and free variables that range over individuals--it's just that the predicate is no longer first order.
JOG - 30 Aug 2007 12:27 GMT
> >> "JOG" <j...@cs.nott.ac.uk> wrote in message
>
[quoted text clipped - 120 lines]
> name--the attribute names correspond to free variables in a predicate: how
> could you assign multiple values to the same variable?
Well consider it this way. If I have the propositions:
The person named Jim speaks the language English The person named Jim speaks the language German The person named Brian speaks the language English
I have three propositions, and hopefully we'd agree there are two roles in these propositions: name and speaks_language. So in FOL I could write these propositions as: [P1] Name(x, Jim) -> speaks_language(x, English) [P2] Name(x, Jim) -> speaks_language(x, English) [P3] Name(x, Brian) -> speaks_language(x, English)
Are we agreed up to there? If so then [P1] ^ [P2] gives us (via composition): Name(x, Jim) -> speaks_language(x, English) ^ speaks_language(x, English)
and we are left with a sentence that has two distinct roles, one of which appears twice. All of this sort of thinking has been driven by a distaste us having to add a magic 'header' component to a relation (probably as a consequence of reading pascal's work), and the desire to bring roles back into the equation.
> But you can > certainly assign a set of values to a variable that expects a set of values, > since a set is a value! And you can certainly have a predicate with free > variables that range over relations and free variables that range over > individuals--it's just that the predicate is no longer first order. Bob Badour - 30 Aug 2007 13:46 GMT >>>>"JOG" <j...@cs.nott.ac.uk> wrote in message >> [quoted text clipped - 150 lines] >>variables that range over relations and free variables that range over >>individuals--it's just that the predicate is no longer first order. Where did Germany go?
JOG - 30 Aug 2007 14:30 GMT > >>"JOG" <j...@cs.nott.ac.uk> wrote in message > [quoted text clipped - 156 lines] > > Where did Germany go? Good grief, the perils of cut and paste. It should of course have been:
-----------------
[P1] Name(x, Jim) -> speaks_language(x, English)
[P2] Name(x, Jim) -> speaks_language(x, German)
[P3] Name(x, Brian) -> speaks_language(x, English)
Are we agreed up to there? If so then [P1] ^ [P2] gives us (via composition): Name(x, Jim) -> speaks_language(x, English) ^ speaks_language(x, German) -----------------
Was fur ein dummkopf....
David Cressey - 30 Aug 2007 15:26 GMT > Well consider it this way. If I have the propositions: > [quoted text clipped - 19 lines] > (probably as a consequence of reading pascal's work), and the desire > to bring roles back into the equation. Is the subject of speakers and languages contrived? (I'm at risk of becoming obsessive on the subject of "contrived"). If it is, I'd like to suggest that we return to a classic contrived example for this newsgroup: the subject of pizzas and toppings.
You can, if you like, extend the topic to pizzas, toppings, and cheeses. You can then go to Google Groups and look up the history of the discussion of the subject of pizzas, toppings, and cheeses in this newsgroup (c.d.t.).
You will find an extensive discussion of the questions that arise when an attribute value can be a set, rather than just a single value. Some of that discussion makes sense to me. Some of it is just pure blather. There are plenty of inputs from the MV sect of the NFNF religion.
My apologies, JOG, if you were a participant in those discussions. My memory fails me. If not, the principal value of recapitulating that example, rather than covering speakers and languages, or cows and colors, or wires and color codes, is that a lot of the responses are already neatly collected in google groups. Those who do not learn from on line discussions are condemned to repeat them. (Apologies to Santayana enthusiasts).
> > But you can > > certainly assign a set of values to a variable that expects a set of values, > > since a set is a value! And you can certainly have a predicate with free > > variables that range over relations and free variables that range over > > individuals--it's just that the predicate is no longer first order. See above.
Neo - 30 Aug 2007 23:08 GMT > I'd like to suggest that we return to > a classic contrived example for this newsgroup: > the subject of pizzas and toppings. The dbd script below models two orders with pizzas, the second with multiple cheeses and toppings.
(new 'order) (new 'pizza) (new 'size) (new 'crust) (new 'cheese) (new 'topping) (new 'coke 'drink)
(; Create order#1 consisting of a small pizza)
(new 'order#1 'order)
(set (it) item (block (new)
                      (set pizza instance (it))
                      (set+ (it) size 'small)
                      (set+ (it) crust 'thin)
                      (set+ (it) cheese 'mozzarella)
                      (set+ (it) topping 'veggies)
                      (it)))
(; Create order#2 consisting of a large pizza and 4 drinks)
(new 'order#2 'order)
(set (it) item (block (new)
                      (set pizza instance (it))
                      (set+ (it) size 'large)
                      (set+ (it) crust 'thick)
                      (set+ (it) cheese 'mozzarella)
                      (set+ (it) cheese 'parmesan)
                      (set+ (it) topping 'sausage)
                      (set+ (it) topping 'pepperoni)
                      (set+ (it) topping 'olive)
                      (it)))
(set+ (it) item coke quantity '4)
(; Get orders with small pizza) (; Gets order#1) (and (get order instance *) (get * item (and (get pizza instance *) (get * size small))))
(; Get orders with pizza with mozzarella cheese) (; Gets order#1 and order#2) (and (get order instance *) (get * item (and (get pizza instance *) (get * cheese mozzarella))))
For details, see www.dbfordummies.com/example/ex305.asp
Brian Selzer - 30 Aug 2007 18:41 GMT >> >> "JOG" <j...@cs.nott.ac.uk> wrote in message >> [quoted text clipped - 161 lines] > > Are we agreed up to there? Not exactly. What you have are the roles Name and Language which appear as free variables in the predicate Speaks. A sentence in FOL is a closed formula, for example,
exists Name exists Language Speaks(Name,Language)
where each quantifier binds a free variable in Speaks. Supposing that the domains for Name and Language are,
Names {Jim, Brian, Sue} and Languages {English, German, French}
respectively, an interpretation of the sentence gives,
Speaks(Jim,English) /\ Speaks(Jim,German) /\ ~Speaks(Jim,French) /\
Speaks(Brian,English) /\ ~Speaks(Brian,German) /\ ~Speaks(Brian,French) /\
~Speaks(Sue,English) /\ ~Speaks(Sue,German) /\ ~Speaks(Sue,French)
Which under the closed world assumption becomes,
Speaks(Jim,English) /\ Speaks(Jim,German) /\ Speaks(Brian,English)
From this it can be deduced that Jim speaks both English and German, and that Jim and Brian both speak English. Under the domain closure assumption, it can be deduced that Sue does not exist, and that the only languages that exist are English and German. Sue can exist and French can exist, but since neither are referenced, neither does. It should be noted that just because Brian exists and German exists doesn't mean that Brian speaks German. The truth value for Speaks(Brian,German) was assigned false under the given interpretation.
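A minimal Python sketch of the reasoning above, assuming the relation is held as a plain set of (name, language) pairs. The variable names are illustrative only and do not come from any particular DBMS:

```python
from itertools import product

# The two domains given above.
names = {"Jim", "Brian", "Sue"}
languages = {"English", "German", "French"}

# The facts asserted to be true (the extension of Speaks).
speaks = {("Jim", "English"), ("Jim", "German"), ("Brian", "English")}

# Closed world assumption: any pair that is not asserted is taken to be false.
interpretation = {pair: pair in speaks for pair in product(names, languages)}

# Domain closure assumption: only the individuals actually referenced exist.
known_names = {name for name, _ in speaks}
known_languages = {language for _, language in speaks}

print(interpretation[("Brian", "German")])  # False
print(sorted(known_names))                  # ['Brian', 'Jim']
print(sorted(known_languages))              # ['English', 'German']
```

Note that Sue and French remain members of the declared domains; they simply drop out of the set of referenced individuals.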
> If so then [P1] ^ [P2] gives us (via > composition): [quoted text clipped - 13 lines] >> variables that range over relations and free variables that range over >> individuals--it's just that the predicate is no longer first order. JOG - 30 Aug 2007 19:22 GMT > >> "JOG" <j...@cs.nott.ac.uk> wrote in message > [quoted text clipped - 171 lines] > > exists Name exists Language Speaks(Name,Language) Well that is certainly one possibility, and of course I realise that it is how Codd prescribed encoding a proposition in his 1969 paper. I am suggesting that:
Ex has_Name(x, persons_name) -> speaks_language(x, language)
is an equally valid, if not better option. Why? Because we can explicitly incorporate attribute names (which remember Codd just bolted on, redefining a mathematical relation in the process), and secondly the key is clearly expressed (all attributes to the left of the ->) - there is no need for a magic header.
> where each quantifier binds a free variables in Speaks. Supposing that the > domains for Name and Language are, [quoted text clipped - 13 lines] > ~Speaks(Sue,German) /\ > ~Speaks(Sue,French) Aye, but where have those all important attribute names disappeared to?
> Which under the closed world assumption becomes, > [quoted text clipped - 10 lines] > truth value for Speaks(Brian,German) was assigned false under the given > interpretation. I understand this view, but all CWA (imo) should do is tell us that no /propositions/ exist discussing sue. Surely we want a database that, if we ask it whether sue exists, responds "not as far as I've been told" instead of "no definitely not. no sireee. never ever"?
> > If so then [P1] ^ [P2] gives us (via > > composition): [quoted text clipped - 13 lines] > >> variables that range over relations and free variables that range over > >> individuals--it's just that the predicate is no longer first order. Bob Badour - 30 Aug 2007 20:14 GMT >>>>"JOG" <j...@cs.nott.ac.uk> wrote in message >> [quoted text clipped - 183 lines] > secondly the key is clearly expressed (all attributes to the left of > the ->) - there is no need for a magic header. How does it express multiple candidate keys?
JOG - 30 Aug 2007 21:33 GMT > >>"JOG" <j...@cs.nott.ac.uk> wrote in message > [quoted text clipped - 189 lines] > > How does it express multiple candidate keys? Bloody good question sir. I hadn't really thought about it - there is no notion of a key in predicate logic. In fact if one observes multiple keys you're probably encoding more than one proposition. I dinked about Google for a common example and ended up with: {empID, SSN, city, zip} where empID and SSN are both candidates. In that case we've actually got:
empID -> SSN ^ city ^ zip
SSN -> empID ^ city ^ zip
Off the top of my head, I'd say record either format and specify in the set's intension that SSN<-> empID.
Bob Badour - 30 Aug 2007 21:43 GMT >>>>"JOG" <j...@cs.nott.ac.uk> wrote in message >> [quoted text clipped - 202 lines] > Off the top of my head, I'd say record either format and specify in > the set's intension that SSN<-> empID. I was thinking more along the lines of say a schedule relation:
Teacher  Room  Time
Jim      100   4:00pm
Bob      100   3:00pm
Bob      200   4:00pm
Jim      200   3:00pm
It has two candidate keys {Teacher,Time} and {Room,Time}
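A small Python sketch of what those two keys demand of the data, assuming each tuple is held as a dict. The helper name satisfies_key is hypothetical. A key is a declared constraint, so the check can only show that a given body is consistent with a key, not that the key holds in general (in this four-row sample {Teacher,Room} also happens to be unique, without being a key).

```python
schedule = [
    {"Teacher": "Jim", "Room": 100, "Time": "4:00pm"},
    {"Teacher": "Bob", "Room": 100, "Time": "3:00pm"},
    {"Teacher": "Bob", "Room": 200, "Time": "4:00pm"},
    {"Teacher": "Jim", "Room": 200, "Time": "3:00pm"},
]

def satisfies_key(rows, key):
    """Return True if no two rows agree on every attribute in `key`."""
    projected = {tuple(row[attr] for attr in key) for row in rows}
    return len(projected) == len(rows)

print(satisfies_key(schedule, ("Teacher", "Time")))  # True
print(satisfies_key(schedule, ("Room", "Time")))     # True
print(satisfies_key(schedule, ("Time",)))            # False

# Putting Jim in a second room at 4:00pm violates {Teacher,Time}.
clash = schedule + [{"Teacher": "Jim", "Room": 300, "Time": "4:00pm"}]
print(satisfies_key(clash, ("Teacher", "Time")))     # False
```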
Neo - 30 Aug 2007 22:42 GMT > Teacher Room Time > Jim 100 4:00pm > Bob 100 3:00pm > Bob 200 4:00pm > Jim 200 3:00pm The dbd script below models the above. In addition, Jim, Bob and Brian teach in room 300 at 5:00 pm.
(new 'teacher) (new 'room)
(new 'tuple1 'TeacherRoomTime) (set+ (it) teacher 'jim) (set+ (it) room '100) (set+ (it) time '1600)
(new 'tuple2 'TeacherRoomTime) (set+ (it) teacher 'bob) (set+ (it) room '100) (set+ (it) time '1500)
(new 'tuple3 'TeacherRoomTime) (set+ (it) teacher 'bob) (set+ (it) room '200) (set+ (it) time '1600)
(new 'tuple4 'TeacherRoomTime) (set+ (it) teacher 'jim) (set+ (it) room '200) (set+ (it) time '1500)
(new 'tuple5 'TeacherRoomTime) (set+ (it) teacher 'jim) (set+ (it) teacher 'bob) (set+ (it) teacher 'brian) (set+ (it) room '300) (set+ (it) time '1700)
(; Get teachers for room 300 at time 1700) (; Gets jim, bob and brian) (get (& (get TeacherRoomTime instance *) (get * room 300) (get * time 1700)) teacher *)
JOG - 30 Aug 2007 23:46 GMT > >>>>"JOG" <j...@cs.nott.ac.uk> wrote in message > [quoted text clipped - 212 lines] > > It has two candidate keys {Teacher,Time} and {Room,Time} Ahhh, the good old irreducible tuple, overlapping superkey example. It's been too long I tell you, too long. A good example. However, there's a much better argument against my 'left of the -> is the key' reasoning. I like to call it the 'that makes no sense whatsoever' complaint. Or the 'that's enough whisky for you' retort. I wrote nay but a few posts back:
[P1] Name(x, Jim) -> speaks_language(x, English)
[P2] Name(x, Jim) -> speaks_language(x, German)
Name:Jim appears twice as an antecedent. Genius. Hardly a key then. It is an enigmatic soul who manages to disprove himself in his own examples :/
I'm currently standing by the fact that it's good to have attribute names explicitly stated though.
Brian Selzer - 31 Aug 2007 00:25 GMT [big snip]
>> > Are we agreed up to there? >> [quoted text clipped - 16 lines] > secondly the key is clearly expressed (all attributes to the left of > the ->) - there is no need for a magic header. I won't go here, since you've already realized that Name isn't a key. It is true that Name multidetermines Language, however.
>> where each quantifier binds a free variables in Speaks. Supposing that >> the [quoted text clipped - 17 lines] > Aye, but where have those all important attribute names disappeared > to? They are encoded in the predicate Speaks. It is not important to know the exact composition of the predicate Speaks, other than that the only free variables that appear in it are Name and Language. The parameterized notation above is easier to write yet conveys the same information as
(Speaks such that Name := Jim and Language := English) /\ (Speaks such that Name := Jim and Language := German) /\ ...etc....
>> Which under the closed world assumption becomes, >> [quoted text clipped - 20 lines] > that, if we ask it whether sue exists, responds "not as far as I've > been told" instead of "no definitely not. no sireee. never ever"? But that's more of an open world view, which shades the meaning of every aspect of the database. I'm not saying it is wrong, but it should be understood, then, that the content in the database is limited to what is known to be true instead of what is actually true. It also provides a basis for 3VL, since by accepting the open world interpretation, you are accepting that there can be facts that are not known.
Also, it's not "never ever," but rather "definitely not in this picture of reality."
>> > If so then [P1] ^ [P2] gives us (via >> > composition): [quoted text clipped - 14 lines] >> >> variables that range over relations and free variables that range over >> >> individuals--it's just that the predicate is no longer first order. Neo - 30 Aug 2007 22:18 GMT > I have three propositions, and hopefully we'd agree there are two > roles in these propositions: name and speaks_language. So in FOL I > could write these propositions as: > [P1] Name(x, Jim) -> speaks_language(x, English) > [P2] Name(x, Jim) -> speaks_language(x, German) > [P3] Name(x, Brian) -> speaks_language(x, English) In dbd, the above are expressed as:
(new 'speak 'verb)
(new 'english 'language) (new 'german 'language)
(new 'jim 'person) (set jim speak english) (set jim speak german)
(new 'brian 'person) (set brian speak english)
(; Get persons who speaks english) (; Gets jim and brian) (get * speak english)
(; Get persons who speak english and german) (; Gets jim) (& (get * speak english) (get * speak german))
Bob Badour - 30 Aug 2007 01:42 GMT >>>>>>"JOG" <j...@cs.nott.ac.uk> wrote in message >> [quoted text clipped - 97 lines] > an equally effective WHERE mechanism. My concern however would be the > consequences to JOIN. What would you offer in place of the RM's logical identity?
JOG - 30 Aug 2007 12:34 GMT > >>Write a predicate for the relation schema that when extentially quantified > >>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 15 lines] > > What would you offer in place of the RM's logical identity. Nothing. I am utterly convinced by Date et al's arguments in favour of logical identity. (Why would I need to replace it?) I just wanna model propositions, and they are always identified by their contents.
Bob Badour - 30 Aug 2007 13:44 GMT >>>>Write a predicate for the relation schema that when extentially quantified >>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 19 lines] > logical identity. (Why would I need to replace it?) I just wanna model > propositions, and they are always identified by their contents. In: {{(Color: green), (Color: yellow), (Type: earth)}}
What provides logical identity?
JOG - 30 Aug 2007 14:27 GMT > >>>>Write a predicate for the relation schema that when extentially quantified > >>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 23 lines] > > What provides logical identity? I may be misunderstanding you, but let me take a stab. The identity of any set of course lies in its elements (i.e. in this case of a single proposition, the ordered pairs). Given we know Colors are the antecedents in the proposition we are modelling, this has to have been defined in the collectivizing predicate for the whole collection of rows. We also know therefore there may not exist another set of pairs containing the same Colors, so we can identify the whole proposition through examination of just those roles. All works just as per normal in RM. Is this what you meant?
Bob Badour - 30 Aug 2007 14:55 GMT >>>>>>Write a predicate for the relation schema that when extentially quantified >>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 33 lines] > through examination of just those roles. All works just as per normal > in RM. Is this what you meant? I haven't got a clue what you said. In the RM, every value is uniquely identifiable by the combination of relation name, attribute name and any candidate key value. That's logical identity as it was originally spelled out.
Two values above have the same attribute name.
JOG - 30 Aug 2007 16:41 GMT > >>>>>>Write a predicate for the relation schema that when extentially quantified > >>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 35 lines] > > I haven't got a clue what you said. I just regurgitated leibniz identity.
> In the RM, every value is uniquely > identifiable by the combination of relation name, attribute name and any > candidate key value. That's logical identity as it was originally > spelled out. > > Two values above have the same attribute name. Now you've lost me. A "value" is not identifiable by its relation name and attribute name. This makes no sense to me. Where in predicate logic does that come from? A value is just a value. It is identifiable in its own right as being an individual from a domain.
An individual piece of /data/ however (which is perhaps what you mean by a value) has an identity made up of a combination of an attribute name and a corresponding value. One needs both to identify the data item. A proposition in turn is identifiable by its contents, which is a set of those data items. Regards, J.
Bob Badour - 30 Aug 2007 17:00 GMT >>>>>>>>Write a predicate for the relation schema that when extentially quantified >>>>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 49 lines] > logic does that come from? A value is just a value. It is identifiable > in its own right as being an individual from a domain. I misspoke. "Any value represented in a relvar"
> An individual piece of /data/ however (which is perhaps what you mean > by a value) has an identity made up of a combination of an attribute > name and a corresponding value. One needs both to identify the data > item. A proposition in turn is identifiable by its contents, which is > a set of those data items. Regards, J. I repeat: two pieces of data have the same name, Color.
JOG - 30 Aug 2007 17:58 GMT > >>>>>>>>Write a predicate for the relation schema that when extentially quantified > >>>>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 51 lines] > > I mispoke. "Any value represented in a relvar" Well it is still just a value whether its in a relvar or not - it needs no extra identity. A database table is just a set of propositions. A proposition is encoded as a set of attribute-value pairs. That's it surely?
Any notion of identity is as defined by set theory.
> > An individual piece of /data/ however (which is perhaps what you mean > > by a value) has an identity made up of a combination of an attribute [quoted text clipped - 3 lines] > > I repeat: two pieces of data have the same name, Color. Well no - a piece of data doesn't have a 'name' does it? It's just a combination of attribute and value. The number-7. name-Fred. color- red. A datum's identity is defined by the /combination/ of these two parts, and that alone - not by a label, or an alias, or an OID (as I'm sure you'd agree).
And if two data share one of these parts (the attribute component), well so what - they are still identifiably different things. I'm scratching my head to see the problem you are envisaging here.
Bob Badour - 30 Aug 2007 18:33 GMT >>>>>>>>>>Write a predicate for the relation schema that when extentially quantified >>>>>>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 72 lines] > parts, and that alone - not by a label, or an alias, or an OID (as I'm > sure you'd agree). No, I don't agree. I suggest you see the definition of Logical Identity in Codd's 12 rules.
JOG - 30 Aug 2007 19:00 GMT > >>>>>>>>>>Write a predicate for the relation schema that when extentially quantified > >>>>>>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 75 lines] > No, I don't agree. I suggest you see the definition of Logical Identity > in Codd's 12 rules. Well, I have to contest again - you are no doubt referring to "rule 2:The guaranteed access rule", and that makes no reference to the term identity (...and that is what you asked me about.) Rule 2 is stating : "every individual value in the database must be logically addressable by specifying the name of the table, the name of the column and the primary key value of the containing row."
Logically "addressable" - that's a very different kettle of fish to identity. In your original question did you mean to ask then: "What provides logical addressability?" if one has two attributes playing the same role? I won't respond to that in advance, because I don't want to put words into your mouth. Regards, J.
Bob Badour - 30 Aug 2007 20:12 GMT >>>>>>>>>>>>Write a predicate for the relation schema that when extentially quantified >>>>>>>>>>>>and extended yields a set of atomic formulae that implies all three of the [quoted text clipped - 88 lines] > the same role? I won't respond to that in advance, because I don't > want to put words into your mouth. Regards, J. Yes. If you prefer, "logical addressability". The term I have known for quite some time is "logical identity", but if you prefer, we can call it "logical addressability".
JOG - 30 Aug 2007 21:13 GMT > Yes. If you prefer, "logical addressability". Identity has a very specific meaning in set theory, hence the confusion. Addressability makes a lot more sense to me.
> The term I have known for > quite some time is "logical identity", but if you prefer, we can call it > "logical addressability". The simple answer to your question is that allowing multiple attributes would break the guaranteed access rule. (It's probably worth pointing out that it's seen a lot of contention anyhow).
To me the point of the rule is to preclude non-logical access via pointers and OID's, and in that I agree with it wholeheartedly. It was also aimed at proscribing set based values, and I agree with that too. Wouldn't change a thing there - and relaxing 1NF doesn't affect that as far as I can tell.
What negative impact do you envision? An insert is unaffected, as is delete, and an update still just replaces one proposition with another. The only real consequence I can see would be to where-clauses, which would require a tweak.
Brian Selzer - 31 Aug 2007 03:13 GMT >> >>>>>>>>>>Write a predicate for the relation schema that when extentially >> >>>>>>>>>>quantified [quoted text clipped - 99 lines] > by specifying the name of the table, the name of the column and the > primary key value of the containing row." Pardon me for being a stickler about this. I got this from dbdebunk:
"Each and every datum (atomic value) is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name."
A datum is an /atomic/ value, not an individual value. Atomic--implying that it cannot be separated into components.
So having more than one value for a particular role violates the guaranteed access rule either way you look at it. If the column names aren't unique, then you can't access a particular datum by a column name. If a value is a collection of component values, then you can't access a particular datum (component value), but only the collection in which it is contained.
But you're right that accessibility has nothing to do with identity. A value can appear many times in many different tuples and in many different relations. Logical identity ensures that no matter how many times a value appears in a database, it always maps to the same individual in the universe of discourse.
> Logically "addressable" - that's a very different kettle of fish to > identity. In your original question did you mean to ask then: "What > provides logical addressibality?" if one has two attributes playing > the same role? I won't respond to that in advance, because I don't > want to put words into your mouth. Regards, J. JOG - 31 Aug 2007 11:37 GMT >[snip] > "JOG" <j...@cs.nott.ac.uk> wrote in message [quoted text clipped - 6 lines] > > Pardon me for being a stickler about this. I got this from dbdebunk: no worries - stickling is fine.
> "Each and every datum (atomic value) is guaranteed to be logically > accessible by resorting to a combination of table name, primary key value > and column name." Coupla things - Date and Darwen argue against the idea of atomicity, and they also complain about the use of 'primary key'. I also think Codd's use of the term datum is incorrect. Throughout history data has required an attribute-value pair. The word is derived from the Latin for 'statement of fact', and its use in science always requires that the value is described. It's common sense really - if we don't know what a value means, well it's just noise. Imagine the binary value 1000001. Ascii(1000001) makes it an A, Number(1000001) makes it 65, etc.
Either way, this doesn't matter as long as we know what each other mean.
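The 1000001 example can be checked with a few lines of Python. The function interpret and the role names "Ascii" and "Number" are just illustrative labels for the two readings mentioned above:

```python
def interpret(role, bits):
    """Read the same bit string according to the role it plays."""
    value = int(bits, 2)  # parse the string as a base-2 integer
    if role == "Number":
        return value
    if role == "Ascii":
        return chr(value)  # character with that code point
    raise ValueError(f"unknown role: {role}")

print(interpret("Number", "1000001"))  # 65
print(interpret("Ascii", "1000001"))   # A
```

The bare bit string is the same in both calls; only the role paired with it decides what it means.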
> A datum is an /atomic/ value, not an individual value. Atomic--implying > that it cannot be separated into components. [quoted text clipped - 4 lines] > collection of component values, then you can't access a particular datum > (component value), but only the collection in which it is contained. Well I've never suggested multiple values contained in a collection. But yes as I said, multiple roles do break the guaranteed access rule. My question now (in the continuing hunt for the theory behind 1NF) is why on earth that would be a problem. I don't see any effect on the relational algebra.
> But you're right that accessibility has nothing to do with identity. A > value can appear many times in many different tuples and in many different [quoted text clipped - 7 lines] > > the same role? I won't respond to that in advance, because I don't > > want to put words into your mouth. Regards, J. Brian Selzer - 31 Aug 2007 12:21 GMT >>[snip] >> "JOG" <j...@cs.nott.ac.uk> wrote in message [quoted text clipped - 42 lines] > behind 1NF) is why on earth would that be a problem? I don't see any > affect on the relational algebra. What about restriction?
R {{A:4, A:5, B:3}, {A:3,A:4,B:4}}
R WHERE A = 3? Do you return an empty relation, or {{A:3,A:4,B:4}}? If A = 3 is true, then A = 4 is also true, but shouldn't that be impossible?
If A were a set, then you could write, R WHERE 3 IN A
R WHERE A = 4 AND A = 5? Shouldn't A = 4 AND A = 5 always return false?
>> But you're right that accessibility has nothing to do with identity. A >> value can appear many times in many different tuples and in many [quoted text clipped - 10 lines] >> > the same role? I won't respond to that in advance, because I don't >> > want to put words into your mouth. Regards, J. JOG - 31 Aug 2007 12:47 GMT > >>[snip] > >> "JOG" <j...@cs.nott.ac.uk> wrote in message [quoted text clipped - 52 lines] > If A = 3 is true, then A = 4 is also true, but shouldn't that be > impossible? Well in my own musings, the former. I viewed the WHERE clause in the light of set membership, so one asks whether the tuple contains the pair (A,3): i.e. WHERE contains(A, 3) => { { (A,3), (A,4), (B,4) } } WHERE contains(A, 4) => { { (A,3), (A,4), (B,4) }, { (A,4), (A,5), (B,2) } }
But you could also ask for existence of tuples. i.e WHERE exists(A, 1) => {} which is asking to return only propositions where there is only 1 pair featuring A as the attribute.
Or generally: i.e. WHERE exists(Role, x) => { p ε R | ∃x (Role, x) ε p }
> If A were a set, then you could write, > R WHERE 3 IN A [quoted text clipped - 16 lines] > >> > the same role? I won't respond to that in advance, because I don't > >> > want to put words into your mouth. Regards, J. JOG - 31 Aug 2007 12:53 GMT > Or generally: > i.e. WHERE exists(Role, x) => { p ε R | ∃x (Role, x) > ε p } so much for trying to use html char codes. Lets try again:
Or generally:
WHERE exists(Role, 1) => { p E R | EXISTS!x (Role, x) E p }
WHERE exists(Role, n) => { p E R | EXISTS=n x (Role, x) E p }
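A rough Python sketch of the two restriction styles proposed here, assuming each proposition is a frozenset of (role, value) pairs. The function names where_contains and where_exists are made up for the illustration; this is one possible reading of the semantics, not an established operator:

```python
# The relation used in the contains(...) examples above.
R = {
    frozenset({("A", 3), ("A", 4), ("B", 4)}),
    frozenset({("A", 4), ("A", 5), ("B", 2)}),
}

def where_contains(relation, role, value):
    """Keep the propositions that include the pair (role, value)."""
    return {p for p in relation if (role, value) in p}

def where_exists(relation, role, n):
    """Keep the propositions with exactly n pairs whose role is `role`."""
    return {p for p in relation
            if sum(1 for r, _ in p if r == role) == n}

print(len(where_contains(R, "A", 3)))  # 1
print(len(where_contains(R, "A", 4)))  # 2
print(len(where_exists(R, "A", 1)))    # 0
print(len(where_exists(R, "A", 2)))    # 2
```

Under this reading "A = 4 AND A = 5" becomes two membership tests that can both succeed, which is exactly the departure from ordinary equality that the restriction question points at.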
Bob Badour - 31 Aug 2007 13:29 GMT >>[snip] >>"JOG" <j...@cs.nott.ac.uk> wrote in message [quoted text clipped - 40 lines] > behind 1NF) is why on earth would that be a problem? I don't see any > affect on the relational algebra. You earlier suggested that union would suffice for join. But supposing
{{(Color: green), (Color: yellow), (Type: earth)} ,{(Color: black), (Type: neutral)}}
is valid, then the following is valid too:
{{(Color: green), (Color: yellow), (Color: black) , (Type: earth), (Type: neutral)}}
Which, of course, is a union of two of your propositions.
How does that not affect the algebra?
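The objection can be made concrete with a short Python sketch, assuming propositions are frozensets of (attribute, value) pairs and join is taken to be plain set union as suggested earlier. The variable names are illustrative:

```python
earth = frozenset({("Color", "green"), ("Color", "yellow"), ("Type", "earth")})
neutral = frozenset({("Color", "black"), ("Type", "neutral")})

# Union of the two propositions.
merged = earth | neutral
print(len(merged))  # 5

# A different, wrong pair of propositions produces the very same union,
# so the union no longer records which colours belong to which type.
wrong_a = frozenset({("Color", "green"), ("Type", "neutral")})
wrong_b = frozenset({("Color", "yellow"), ("Color", "black"), ("Type", "earth")})
print((wrong_a | wrong_b) == merged)  # True
```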
David Cressey - 31 Aug 2007 13:58 GMT > Well I've never suggested multiple values contained in a collection. > But yes as I said, multiple roles does break the guaranteed access > rule. My question is now (in the continuuing hunt for the theory > behind 1NF) is why on earth would that be a problem? I don't see any > affect on the relational algebra. I honestly think that the impetus behind "normalization" in the Codd 1970 paper is more of a stopgap than a theory. (I'm not familiar with the 1969 paper, and I only read the 1970 paper after I began participating in the discussions in c.d.t.) In the 1970 paper, Codd suggests that it may be worthwhile to consider the subset of schemas that contain only atomic attributes. (He didn't use the word "schemas", but I hope I can use it without introducing confusion.)
He pointed out that such a restriction did not thereby reduce the expressiveness of the system, in that for every unnormalized schema, there existed an equivalent normalized schema. "Normalized" in the 1970 paper is called 1NF in later writings, once further normal forms were discovered.
There is one other piece of the 1NF definition in the 1970 paper, the "no duplicates rule". The no duplicates rule has to do with the representation of a relation, and not with a relation itself. Codd imagined (correctly) that the first relational database systems would use records to represent tuples and (virtual) arrays of records to represent relations. In a relation, there is no such thing as "a tuple appearing twice". However, in an array of records, there is such a thing as two of the records having identical contents. Codd ruled that out as a practical stop gap, in order to prevent the implementations from diverging from the properties of mathematical relations in an unnecessary and harmful way. This is my reading of the 1970 paper, in regard to 1NF theory.
There's a connection between the "atomic values" rule and the "no duplicates rule", at the implementation level.
consider the following fact:
Jack speaks English and German.
Let's say we are about to include this fact in a relation stored somewhere in a relational database, and that one of the columns of a relational table is "set of languages spoken".
Further, let's say that there is already a tuple in the relation with the following fact stored:
Jack speaks German and English.
As a practical matter, in terms of the representation of data inside a database, it can be extraordinarily difficult to ascertain that these two propositions, together, violate the "no duplicates rule"
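A small Python sketch of the difficulty, assuming the stored representation of the "set of languages spoken" column is an ordered list. The helper normalize is hypothetical:

```python
# Two records stating the same fact with the languages in a different order.
stored = ("Jack", ["German", "English"])
incoming = ("Jack", ["English", "German"])

# Compared as stored records, the two look different.
print(stored == incoming)  # False

def normalize(record):
    """Treat the language column as a set, so that order is irrelevant."""
    name, languages = record
    return (name, frozenset(languages))

# Compared as tuples with a set-valued attribute, they are duplicates.
print(normalize(stored) == normalize(incoming))  # True
```

An implementation that wanted to enforce the no-duplicates rule would need some canonical form of this kind for every set-valued column.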
Notice that my focus has been entirely on the implementation, and not on the relational algebra itself. With regard to the relational algebra itself, I believe your understanding is correct.
So what the heck are implementation oriented issues doing in the 1970 paper? I believe Codd wanted to get across two main ideas: that building a system for relational databases would be a good idea, and that building such a system was also feasible. It's for this second reason that I believe Codd added some material that is primarily about implementation, rather than about the power of relational algebra itself.
This is my insight, such as it is. I hope it helps.
Bob Badour - 31 Aug 2007 14:22 GMT >>Well I've never suggested multiple values contained in a collection. >>But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 26 lines] > mathematical relations in an unnecessary and harmful way. This is my > reading of the 1970 paper, in regard to 1NF theory. I disagree slightly with your interpretation. Codd did not disallow physical duplication. Duplicates in sets have no meaning, thus: { 1, 2, 1 } = { 1, 2 } = { 2, 2, 2, 2, 1 } etc.
At the logical level, duplicates count only once, and the physical structure conveys no meaning. By divorcing physical structure from logical interpretation, one enables physical independence. Thus duplicating information in an index alters the performance characteristics without changing the meaning of queries etc.
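The set equalities quoted above can be checked directly with Python's built-in sets. The second half is a rough sketch of the physical/logical split, assuming a hypothetical list `physical_rows` that stands in for stored records; it is not a model of any real storage engine.

```python
# The three set literals from the post denote the same set.
a = {1, 2, 1}
b = {1, 2}
c = {2, 2, 2, 2, 1}
all_equal = a == b == c

# Hypothetical physical storage that happens to hold a record twice.
physical_rows = [
    ("Jack", "German"),
    ("Jack", "English"),
    ("Jack", "German"),  # physical duplicate
]

# The logical relation counts each tuple once, whatever the storage holds.
logical_relation = set(physical_rows)
```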
> There's a connection between the "atomic values" rule and the "no duplicates > rule", at the implementation level. [quoted text clipped - 28 lines] > > This is my insight, such as it is. I hope it helps. Again, I disagree slightly. While I do not know Codd's intent other than the intent expressed in his works, his observation that one can normalize quite mechanistically provides an implementation for the RVA.
Of course, that won't necessarily protect one from the update anomalies the higher normal forms address.
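As a rough illustration of how mechanical that normalization is, a set-valued attribute can be ungrouped into one tuple per member. This is a minimal sketch, assuming a relation of (person, set of languages) pairs; the function name `ungroup` and the second person, Jill, are made up for the example.

```python
def ungroup(rows):
    """Flatten (person, set_of_languages) rows into (person, language) tuples."""
    return {
        (person, language)
        for person, languages in rows
        for language in languages
    }


# Hypothetical unnormalized relation with a set-valued attribute.
nested = [
    ("Jack", frozenset({"English", "German"})),
    ("Jill", frozenset({"French"})),
]

# The normalized relation has one tuple per person/language pair.
flat = ungroup(nested)
```

As the post says, this only removes the nesting; it does nothing about the update anomalies that the higher normal forms address.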
David Cressey - 31 Aug 2007 19:12 GMT > >>Well I've never suggested multiple values contained in a collection. > >>But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 36 lines] > duplicating information in an index alters the performance > characteristics without changing the meaning of queries etc. It looks like I was wrong. I scanned the 1970 paper again and didn't find a mention of the "No duplicate rule". I must have confused the 1970 paper with some of the writings that refer to it.
The case I was mentioning would be more like
{1, 2} = {2, 1}
than the ones you outlined. The only near reference to this in the 1970 paper is in the discussion of "Project", where he says that after eliminating some columns, duplicates left in the result table must be eliminated. I infer from this that he did not intend to allow duplicate rows to be passed in response to a query, at least in the case of a project.
My comments have nothing to do with indexes.
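The behaviour of "Project" described above can be sketched as follows. This is a minimal sketch under assumptions: rows are modelled as Python dicts, the result as a set of tuples, and the names `project` and `speaks` are hypothetical.

```python
def project(rows, attributes):
    """Keep only the named attributes; the set drops duplicate rows."""
    return {tuple(row[name] for name in attributes) for row in rows}


# Hypothetical relation: one row per person/language pair.
speaks = [
    {"person": "Jack", "language": "English"},
    {"person": "Jack", "language": "German"},
]

# Dropping the language column leaves two identical rows,
# which collapse into a single tuple in the result.
people = project(speaks, ["person"])
```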
> > There's a connection between the "atomic values" rule and the "no duplicates > > rule", at the implementation level. [quoted text clipped - 32 lines] > the intent expressed in his works, his observation that one can > normalize quite mechanistically provides an implementation for the RVA. Hmm....
> Of course, that won't necessarily protect one from the update anomalies > the higher normal forms address. Agreed.
JOG - 31 Aug 2007 14:28 GMT > > Well I've never suggested multiple values contained in a collection. > > But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 59 lines] > > This is my insight, such as it is. I hope it helps. I think there is a lot of truth in what you say. Codd was impressive in his ability to be both theoretical and pragmatic. His success came in part, I believe, from his awareness of the climate in which he was working.
Brian Selzer - 01 Sep 2007 02:31 GMT >> Well I've never suggested multiple values contained in a collection. >> But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 10 lines] > attributes. (He didn't use the word "schemas", but I hope I can use it > without introducing confusion.) The whole point of 1NF boils down to the following two sentences from the June, 1970 article: "The adoption of a relational model of data, as described above, permits the development of a universal data sublanguage based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in normal form." Therefore, the impetus behind "normalization" is to model the universal data sublanguage after a first-order logic rather than some higher order logic.
> He pointed out that such a restriction did not thereby reduce the > expressiveness to the system, in that for every unnormalized schema, [quoted text clipped - 54 lines] > > This is my insight, such as it is. I hope it helps.
JOG - 01 Sep 2007 11:21 GMT > >> Well I've never suggested multiple values contained in a collection. > >> But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 18 lines] > impetus behind "normalization" is to model the universal data sublanguage > after a first-order logic rather than some higher order logic. That's a great quote - probably the best reasoning I've seen for 1NF. Thanks.
> > He pointed out that such a restriction did not thereby reduce the > > expressiveness to the system, in that for every unnormalized schema, [quoted text clipped - 54 lines] > > > This is my insight, such as it is. I hope it helps.
dawn - 02 Sep 2007 21:36 GMT > > "David Cressey" <cresse...@verizon.net> wrote in message > [quoted text clipped - 25 lines] > That's a great quote - probably the best reasoning i've seen for 1NF. > Thanks. I've suggested before that I can do much simpler things in my code if the user will put up with fewer features. Sometimes they go for that. There are trade-offs. That might make the code more reliable, faster, more maintainable, less costly to write, or this or that, but users don't just say "sure, whatever is easiest within the code, the machine, or the theory, that is all I need."
In fact, most of us would not suggest that users roll over and play dead in their efforts to improve their jobs just because of some complexity in providing them the features they desire. Similarly, users of data models should not just accept first-order logic as the end of the story. Those users are working to get a job done, with requirements to do such things as rippling deletes, ordering data, blending multiple propositions into one in various ways, handling many-to-many relationships, etc.
So, even if it were the beginning of the story, the ability to employ first order logic is not the end of the story on this one, I would hope. Don't give up. The industry needs you. Cheers! --dawn (going back to lurking, not to worry).
<snip>
David Cressey - 01 Sep 2007 13:17 GMT > >> Well I've never suggested multiple values contained in a collection. > >> But yes as I said, multiple roles does break the guaranteed access [quoted text clipped - 18 lines] > impetus behind "normalization" is to model the universal data sublanguage > after a first-order logic rather than some higher order logic. Thank you, Brian. This clarifies the issue enormously for me.