In an RDBMS, what does "Data" mean?
Anthony W. Youngman - 14 May 2004 00:44 GMT In relational theory, everyone seems to be talking about modelling "data", but I've never seen an explanation of what "data" is. As far as I can tell, C&D took this philosophical concept of "data", and then built their relational theory on top of it. That's okay. We have a (fairly) simple, consistent model. But what the heck IS data?
Okay. Let's explain where I'm coming from. You've seen me going on about "evidence" and "science" etc etc. So I'm going to drag science into this, Newtonian Mechanics, to be precise (of course).
Newton came up with these philosophical concepts called "mass", "energy", "space" and "time". On these, he built his (fairly) simple consistent model. And then Einstein came along and said he'd got his fundamentals wrong - mass and energy were the same thing, and space and time were the same thing. And because Newton didn't take into account the fact that these things were interchangeable, his model didn't work when compared to reality.
Okay. So what is "data"? Because if we can't anchor that in the real world, we have no way of knowing if, or how strongly, relational theory is relevant (and usable) in the real world.
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Leandro Guimarães Faria Corsetti Dutra - 14 May 2004 02:29 GMT > Newton came up with these philosophical concepts called "mass", > "energy", "space" and "time". On these, he built his (fairly) simple [quoted text clipped - 3 lines] > fact that these things were interchangeable, his model didn't work > when compared to reality. Nice story. But irrelevant.
> Okay. So what is "data". Because if we can't anchor that in the real > world, we have no way of knowing if, or how strongly, relational > theory is relevant (and usable) in the real world. So you are suggesting Newton wasn't (and isn't) relevant in the real world? Or are you just trying to be smart?
Now, it is a nice thing to be smart. But remember it is not every day we face situations where Relativity is relevant and usable in the real world... in everyday life Newtonian physics is quite useful and, unless you are in some limit situation, relevant -- and much simpler than The Real Thing.
 Signature Leandro Guimarães Faria Corsetti Dutra +55 (11) 5685 2219 Av Sgto Geraldo Santana, 1100 6/71 +55 (11) 5686 9607 04.674-000 São Paulo, SP BRASIL http://br.geocities.com./lgcdutra/
x - 14 May 2004 13:46 GMT "Leandro Guimaraes Faria Corsetti Dutra" <leandro@dutra.fastmail.fm> wrote
> > Newton came up with these philosophical concepts called "mass", > > "energy", "space" and "time". On these, he built his (fairly) simple [quoted text clipped - 18 lines] > and unless you are in some limit situation relevant -- and much > simpler than The Real Thing. Anthony said because we work with data, we should know what data is. He would want an answer to his question: "But what the heck IS data ?" Of course this is a trivial question for you :-) I remember Fabian Pascal started one of his seminars with several such "trivial" questions.
Why you have not answered the question ?
Anthony W. Youngman - 15 May 2004 22:58 GMT >> Now, it is a nice thing to be smart. But remember it is not >> everyday we face situations where Relativity is relevant and usable in [quoted text clipped - 9 lines] > >Why you have not answered the question ? Thanks, X.
I take it Leandro is parading his ignorance, rather than seeking enlightenment.
But I'll try to enlighten him, anyway. We now know that mass as it really is, and mass as it is defined in Newton's model, aren't quite the same thing. Therefore, as Leandro says, we know that Newtonian Mechanics, for the most part, works, and we also know where it doesn't work.
But *I* don't know what "data" is "as it really is", and from the answers I've got so far I don't think anybody else does. The best definition so far is for data as it is defined in the relational model (and that's pretty much the only proper definition anybody's tried to give).
And if we haven't got a philosophical definition, we can't compare the philosophical and theoretical definitions, and therefore we haven't got a clue as to whether either "the relational model mostly works", or (and this is important) where its limitations are and where it breaks down.
Cheers, Wol
mAsterdam - 16 May 2004 01:10 GMT ...
>> Why you have not answered the question ? ...
> But *I* don't know what "data" is "as it really is", and from the > answers I've got so far I don't think anybody else does. The best [quoted text clipped - 6 lines] > a clue as to whether either "the relational model mostly works", or (and > this is important) where its limitations are and where it breaks down. I won't answer the original question either (I'll just rephrase it), but I will share some thoughts about just what "data" means. Just a few associated concepts I have used to have some grasp of this - a semantic network, if you will. I have no sources or proofs, no famous philosopher to refer you to.
The network roughly consists of: sign, media, shape and meaning.
We have signs. They serve to communicate. Signs: A handshake, a hieroglyph, an ideogram (e.g. a Chinese character), a phonogram (Roman, Arabic character), a facial expression, a traffic light on red, an alarm - these are elementary, but I would also include: the collected works of <your favorite moviestar>
In order to (just) exist all of these signs have media and shape, their pure existence does *not* require human (or just active) interpretation to assign meaning to them. Their function (purpose, ie communication), however *does* require some interpretation activity.
This combination of sign and meaning we call data.
To illustrate that this is not trivial: Data (but not signs by themselves) can represent other signs: I can write "The traffic light was red", but they can also represent other data: "We stopped because of the traffic light".
Aside: From here (sign and meaning) on "up" there is actually a lot of philosophical work and practical research. Disciplines: Semiotics, semiology and linguistics. (Note: no computer needed)
Now, when we assign the same or similar meanings to bit patterns, most of the time conveniently represented by the same shape but evidently on another medium, we have computer data, data for short.
Finally, the rephrase of your question: How does the type of DBMS affect what we consider data?
mAsterdam - 16 May 2004 02:02 GMT ...
>> Why you have not answered the question ? ...
> But *I* don't know what "data" is "as it really is", and from the > answers I've got so far I don't think anybody else does. The best [quoted text clipped - 6 lines] > a clue as to whether either "the relational model mostly works", or (and > this is important) where its limitations are and where it breaks down. I won't answer the original question either (I'll just rephrase it), but I will share some thoughts about just what "data" means. Just a few associated concepts I have used to have some grasp of this - a semantic network, if you will. I have no sources or proofs, no famous philosopher to refer you to.
The network roughly consists of: sign, media, shape and meaning.
We have signs. They serve to communicate. Signs: A handshake, a hieroglyph, an ideogram (e.g. a Chinese character), a phonogram (Roman, Arabic character), a facial expression, a traffic light on red, an alarm - these are elementary, but I would also include: the collected works of <your favorite moviestar>
In order to (just) exist all of these signs have media and shape, their pure existence does *not* require human (or just active) interpretation. Their function (purpose, ie communication), however *does* require some interpretation activity to assign meaning to them.
This combination of sign and meaning we call data.
To illustrate that this is not trivial: Data (but not signs by themselves) can represent other signs: I can write "The traffic light was red", but they can also represent other data: "We stopped because of the traffic light".
Aside: From here (sign and meaning) on "up" (towards information, knowledge, insight, wisdom, action, ...) there is actually a lot of philosophical work and practical research. Disciplines: Semiotics, semiology and linguistics. (Note: no computer needed)
Now, when we assign the same or similar meanings to bit patterns, most of the time conveniently represented by the same shape but evidently on another medium, we have computer data, data for short.
Finally, the rephrase of your question: How does the type of DBMS affect what we consider data?
Anthony W. Youngman - 17 May 2004 14:37 GMT >... >>> Why you have not answered the question ? [quoted text clipped - 15 lines] >I have no sources or proofs, no famous >philosopher to refer you to. <major chomp>
>Aside: From here (sign and meaning) on "up" (towards >information, knowledge, insight, wisdom, action, ...) [quoted text clipped - 10 lines] >Finally, the rephrase of your question: >How does the type of DBMS affect what we consider data? Okay. That's actually a very good insight ...
Now let's go back to "The Philosophy of Science" :-) and Newton :-) For my first attempt at a Masters, practically the first thing we did was "The philosophy of Science". And, helped by both students and a lecturer who didn't have a clue (the student extrapolated a line from the origin, through an asymptote, to a random position in number-space, and then used this to ridicule the theory he didn't like. And the lecturer said "good argument" !?!?!? )
I'm going to start saying "metaphysics" instead of philosophy here - I think it's a subset of philosophy, and a better word to use, but as you can see, I'm really getting into territory I don't understand ...
Anyway. Newtonian Mechanics is a self-contained, consistent, mathematical theory. It relies on the concepts (call them "axioms") of mass, energy, space, and time (and maybe more). We can define mass in mathematical terms as "F=ma, where mass m is the constant property describing the resistance of an object to a change in its velocity". Likewise, space "is a co-ordinate system with distance measured in metres along three mutually perpendicular axes". I won't attempt to define energy or time ...
But just as those four concepts have neat, clean, mathematical definitions they also have messy real world definitions. Mass can be defined as "my god it's heavy", or "come on! PUSH!". Space is "where are you?" or "I'm here, you're there".
Metaphysics is, I believe, the attempt to clarify both the real-world definitions and the mathematical definitions, and to try to make sure that they are describing the same thing. This is why, despite knowing that Newtonian Mechanics is wrong, we find it so useful. We know the two definitions don't match, we know WHERE they don't match, and we can predict with certainty that where the discrepancy is minimal, Newtonian Mechanics will give us a suitably accurate answer.
Now I'm going to get into the difference between "relational theory" and "relational database theory" :-) Another analogy coming up - Linux and microkernels :-) Linus realised that all this research into "Microkernel Operating Systems" was actually just as applicable to "Operating Systems". I'm putting people's noses out of joint because, whether they realise it or not, they believe in "relational database theory" (think Tanenbaum saying he'd give Linus an F :-) And yet, I keep on saying Pick data should be normalised! So I'm actually very pro relational theory (just leave relational databases out of it! :-)
Now here comes the crunch. As I see it, in relational *database* theory, the concept of "data" lies on this metaphysical boundary. And this is why I view every relational database I've ever seen as a tangled mess of spaghetti. What the hell is "data"! What's the real world equivalent? Like any true mathematician :-) the relational database theorists seem to be saying "metaphysics? that's not our problem. That's just an implementation detail!". Except that, going back to Newton, the fact that energy and mass are interchangeable and, as such, the equation "F=ma where m is a constant" isn't true, isn't an "implementation detail". Well, to God it may be, but it certainly isn't to us!
Going to another thread, where Lauri asked what were the advantages of Pick, I'd say that one of them is a very clear metaphysical interface. To compare Pick and Relational Database Theory ...
A Pick FILE is a real-world collective noun. What's a relational table?
A Pick RECORD is a real-world object. What's a relational row? A noun? An adjective? A gerund? (relation, for those who don't know their grammar)
A Pick FIELD is a real-world adjective. What's a relational column? An adjective? A gerund?
Because Pick's metaphysical layer is at a higher level than Relational Database Theory, we can then implement relational theory WITHIN our model without having the nasty spaghetti of a vague and undefined real-world interface. And I can righteously and reasonably throw my hands up in horror and tear my hair out when presented with a Pick database that hasn't been normalised :-)
So. Can anyone come up with a clear, simple, and NON-VAGUE definition of what "data" means when specified in a real-world, not a mathematical, context. Or come up with a perfectly good reason of why you don't have to! (Basically, because you've done it somewhere else, because you've got to do it somewhere!)
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk The society which scorns excellence in plumbing because plumbing is a humble activity, and tolerates shoddiness in philosophy because it is an exalted activity, will have neither good plumbing nor good philosophy. Neither its pipes nor its theories will hold water. John W Gardner
Dawn M. Wolthuis - 17 May 2004 23:23 GMT <snip>
> And yet, I keep on saying Pick > data should be normalised! So I'm actually very pro relational theory > (just leave relational databases out of it! :-) This wasn't the crux of your post, Wol, but just a minor point: relational theorists take all of the functional dependency normal forms and state at the front of each that the data must FIRST be in FIRST NORMAL FORM, and some would state that the definition of normalization requires that the data be in 1NF. So, while I accept normal forms that are based on functional dependency logic, I'm fine with keeping a list of valid e-mail addresses together during this process. I don't want to put words in your mouth, but when you are pro normalization, are you including 1NF in that? --dawn
Anthony W. Youngman - 18 May 2004 23:43 GMT ><snip> >> And yet, I keep on saying Pick [quoted text clipped - 10 lines] >mouth, but when you are pro normalization, are you including 1NF in >that? --dawn As a tool of analysis, yes. For storing the data, no.
Why first normal? If data is normalised, there is no redundancy. Like so many things relational, First Normal Form seems to be a case of carrying things to unnecessary and not-required extremes.
It's incredibly easy to transform other normal forms to first normal. It's not easy to go the other way (assuming you wish to reconstruct a real-world object, that is). So NFNF is functionally equivalent to FNF, but the reverse is not true.
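The asymmetry claimed here can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout, the field names and the sample customers below are hypothetical, not taken from any real Pick system. Flattening a nested record to 1NF rows is a mechanical loop, while regrouping the rows needs to be told which column identifies the real-world object and which column repeats.

```python
# Hypothetical Pick-style records: one real-world object per key,
# with a multi-valued "emails" field kept together (NFNF).
nested = {
    "C1": {"name": "Alice", "emails": ["alice@example.com", "a@example.org"]},
    "C2": {"name": "Bob", "emails": ["bob@example.com"]},
}


def unnest(records):
    """NFNF -> 1NF: emit one flat (key, name, email) row per email value."""
    return [
        (key, rec["name"], email)
        for key, rec in records.items()
        for email in rec["emails"]
    ]


def nest(rows):
    """1NF -> NFNF: regroup rows by key.

    This only works because the function hard-codes the knowledge that
    column 0 is the object key and column 2 is the repeating value.
    The flat rows themselves do not carry that information.
    """
    out = {}
    for key, name, email in rows:
        out.setdefault(key, {"name": name, "emails": []})["emails"].append(email)
    return out


flat = unnest(nested)
rebuilt = nest(flat)
```

One caveat of this particular sketch: a record whose emails list is empty produces no rows at all when unnested, so it cannot be rebuilt from the flat form without a separate table for the object itself.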
Cheers, Wol
x - 18 May 2004 10:28 GMT > >... > >>> Why you have not answered the question ? [quoted text clipped - 116 lines] > to! (Basically, because you've done it somewhere else, because you've > got to do it somewhere!) I have read somewhere that in addition to mass, energy, space, and time there is also information. I'm not an expert, but I've heard that DNA is an example of this. I've read that "information" shows up in systems with cycles. By accident, I've found this http://www.bkent.net/Doc/darxrp.htm
Chris Hoess - 19 May 2004 07:55 GMT > I've read that "information" show up in systems with cycles. > By accident, I've found this http://www.bkent.net/Doc/darxrp.htm I probably don't do the book justice just from a quick skim of the extract, but I felt compelled to comment on one point of the extract. The author claims, quite reasonably, that data models are artificial constructs and can never completely represent the true nature of information, and goes on to provide various philosophical examples of recategorization. While this will doubtless stimulate discussion from many here, I think it may be a red herring from a purely database perspective, in that these categories already exist, to some degree, in the way information is handled. Databases don't exist in vacuo; they're fed (and consulted) by users who would have some system of mental categorization even if they were shuffling everything around with paper and pencil. So while it may be philosophically interesting, the questions raised may not impinge directly on databases--except that we must recognize that the organization of data within a database can and will change with circumstances, and the database should provide facilities for changing this structure with minimum inconvenience.
 Signature Chris Hoess
mAsterdam - 19 May 2004 00:20 GMT <major chomp>
>> How does the type of DBMS affect what we consider data? >> [quoted text clipped - 60 lines] > > A Pick FILE is a real-world collective noun. What's a relational table? 1. A contradictio in terminis. 2. A collection of similarly shaped utterances.
> A Pick RECORD is a real-world object. What's a relational row? A noun? > An adjective? A gerund? (relation, for those who don't know their grammar) 1. A contradictio in terminis. 2. One utterance.
> A Pick FIELD is a real-world adjective. What's a relational column? An > adjective? A gerund? Mu.
You compare P.FILE to S.TABLE, P.RECORD to S.ROW and P.FIELD to S.COLUMN. What do we learn from this comparison? Nothing. These terms are all taken out of the context where they have meaning. One may just as well choose to compare P.FILE to S.SCHEMA, P.RECORD to S.VIEW and P.FIELD to S.TABLE - it doesn't mean anything. It is out of context.
> Because Pick's metaphysical layer is at a higher level That depends on which terms you compare from one realm to which other terms from the other. It's your pick. (sorry :-)
> than Relational > Database Theory, we can then implement relational theory WITHIN our > model without having the nasty spaghetti of a vague and undefined > real-world interface. And I can righteously and reasonably throw my > hands up in horror and tear my hair out when presented with a Pick > database that hasn't been normalised :-) Yup. That even goes for very old fixed record batch processing.
> So. Can anyone come up with a clear, simple, and NON-VAGUE definition of > what "data" means when specified in a real-world, not a mathematical, > context. Or come up with a perfectly good reason of why you don't have > to! (Basically, because you've done it somewhere else, because you've > got to do it somewhere!) Yup. It seems most people prefer to have that done implicitly or at least by someone else.
n++k - 28 May 2004 10:14 GMT > A Pick FILE is a real-world collective noun. What's a relational table? A sentence that has not yet been uttered, because it relates "unknown values."
> A Pick RECORD is a real-world object. What's a relational row? A noun? > An adjective? A gerund? (relation, for those who don't know their > grammar) A statement of fact, as an utterance of the "meta" sentence described above.
> A Pick FIELD is a real-world adjective. What's a relational column? An > adjective? A gerund? Any piece of utterable information.
Karel Miklav - 18 May 2004 12:10 GMT > ... >>> Why you have not answered the question ? [quoted text clipped - 4 lines] >> or (and this is important) where its limitations are and where it >> breaks down. It mostly works, but we have some clues where it breaks too: metadata, use patterns...
> I won't answer the original question either (I'll just rephrase it), > but I will share some thoughts about just what "data" means. [quoted text clipped - 11 lines] > I would also include: the collected works > of <your favorite moviestar> I think our aim is to model reality and entertain users by creating nice illusions or giving them competitive advantage by reducing entropy in their work environment or by predicting the future.
There are many realities, but let me mention two: the reality of current IT with its implemented infrastructure, and the worldview of a modern intellectual. Our interpretation of what's implemented in our (heads) is what we try to model in our toys. And from what we have learnt, this is nothing like the mechanical wheels of a watch, nor a computer's random access memory, nor even the relational database. The problem is in compressing the representation of data and easing the recall of that data. Here it becomes useful to know what data is, but for the current state of the art that has unfortunately already been settled.
> In order to (just) exist all of these signs have media and shape, > their pure existence does *not* require human (or just active) > interpretation. Their function (purpose, ie > communication), however *does* require some > interpretation activity to assign meaning to them. That's what you think and if I'm ever your customer, you won't model it that way :) Seriously, I don't believe in _pure_ sh.ts or that anything exists without being observed/interpreted, but I'll not go deeper as it may look like off-topic religion bashing.
> This combination of sign and meaning we call data. I'd say the fixation of this on a medium is called data, because otherwise you can't recall it later. And there is a very important thing that folks miss: if you vanish and nobody knows the way you fixed that data, there's just (a series of ones and zeros) without meaning. Thus a fixation can't generally be called data without a known way to interpret it.
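A small Python sketch of that last point: the same four bytes read under three different interpretations. The byte values and the three encodings are chosen purely for illustration; nothing here comes from a real system.

```python
import struct

# Four bytes "fixed on a medium". Nothing in the bytes says how to read them.
raw = bytes([0x42, 0x28, 0x00, 0x00])

# Interpretation 1: a big-endian unsigned 32-bit integer.
as_uint = struct.unpack(">I", raw)[0]

# Interpretation 2: a big-endian IEEE 754 single-precision float.
as_float = struct.unpack(">f", raw)[0]

# Interpretation 3: the first two bytes as ASCII text.
as_text = raw[:2].decode("ascii")

print(as_uint, as_float, as_text)
```

Each reading is equally valid as far as the bits are concerned. Which one counts as the data depends entirely on an agreement that lives outside the bytes.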
Regards, Karel Miklav
mAsterdam - 18 May 2004 23:32 GMT >> ... >>>> Why you have not answered the question ? [quoted text clipped - 7 lines] > It mostly works, but we have some clues where it breaks too: metadata, > use patterns... [snip]
>> The network roughly consists of: sign, media, shape and meaning. >> [quoted text clipped - 8 lines] > illusions or giving them competitive advantage by reducing entropy in > their work environment or by predicting the future. The modeling of *what* of reality? Surely not all of it.
> There are many realities, but let me mention two; the reality of the > current IT with implemented infrastructure and the worldview of a modern [quoted text clipped - 3 lines] > and not even the relational database. The problem is in compressing the > representation of data and easing the recall of that data. Here you are speaking of data already gathered, right?
> Here it > becomes useful to know what data is, but for the current > state of the art that has unfortunately already been settled. Settled? I don't think the understanding of what we now call data has grown beyond the metaphor level yet (unlike for instance our understanding of 'number' or 'motion').
>> In order to (just) exist all of these signs have media and shape, >> their pure existence does *not* require human (or just active) [quoted text clipped - 6 lines] > exists without being observed/interpreted, but I'll not go deeper as it > may look like off-topic religion bashing. Watch out, cats! :-)
>> This combination of sign and meaning we call data. > [quoted text clipped - 3 lines] > just (series of ones and zeros) without meaning. Thus a fixation can't > be generally called data without known way to interpret it. Although this suggests you have a way around Schrödinger's cat without reverting to 'purity' or 'essence' etc... (and I don't) we do agree on that. Do you have an idea *why* folks miss this?
Karel Miklav - 19 May 2004 07:14 GMT >> I think our aim is to model reality and entertain users by creating >> nice illusions or giving them competitive advantage by reducing >> entropy in their work environment or by predicting the future. > > The modeling of *what* of reality? Surely not all of it. As little as possible to solve the case, I don't see the problem here.
>> Here it becomes useful to know what data is, but for the current state >> of the art that has unfortunately already been settled. > > Settled? I don't think the understanding of what we now call data > has grown beyond the metaphor level yet (unlike for instance > our understanding of 'number' or 'motion'). Computers can mostly only work with data that's captured as a sequence of bits. 17th century philosophers made the model, 20th century computer scientists implemented it and I don't see how you could escape that now. And most people here have clients with limited resources and strong competition and there's very, very little margin for experimentation.
>>> This combination of sign and meaning we call data. >> [quoted text clipped - 9 lines] > (and I don't) we do agree on that. Do you have an idea > *why* folks miss this? We were taught that way, now we're trying to adapt to the world as we see it.
Regards, Karel Miklav
Leandro Guimaraens Faria Corsetti Dutra - 17 May 2004 16:23 GMT > I take it Leandro is parading his ignorance, rather than seeking > enlightenment. If you had any to offer...
> But *I* don't know what "data" is "as it really is", and from the > answers I've got so far I don't think anybody else does. As far as I remember my Philosophy, that's where English Objectivists -- that's not their real name, I forget it -- went wrong. They wanted to start from data, and couldn't define that.
That's one reason for my not answering the original question -- there is no answer, other than the trivial -- and useless -- ones already given. The other reason: it's irrelevant to our discussions here.
> The best definition so far is for data as it is defined in the > relational model (and that's pretty much the only proper definition > anybody's tried to give). Which definition, in which version of whose version of it?
> And if we haven't got a philosophical definition, we can't compare the > philosophical and theoretical definitions, and therefore we haven't got > a clue as to whether either "the relational model mostly works", or (and > this is important) where its limitations are and where it breaks down. It would be more interesting to compare not to a non-existing, non-achievable philosophical definition, but to misunderstanding. Like the differentiation of data and metadata.
x - 14 May 2004 08:39 GMT > Okay. So what is "data". Because if we can't anchor that in the real > world, we have no way of knowing if, or how strongly, relational theory > is relevant (and usable) in the real world.
Data:
----------
1. facts
2. encoded information
Dawn M. Wolthuis - 14 May 2004 14:02 GMT > > Okay. So what is "data". Because if we can't anchor that in the real > > world, we have no way of knowing if, or how strongly, relational theory [quoted text clipped - 4 lines] > 1. facts > 2. encoded information I'd vote for adding this nice short, crisp definition of data to our glossary. --dawn
x - 14 May 2004 15:19 GMT > > > Okay. So what is "data". Because if we can't anchor that in the real > > > world, we have no way of knowing if, or how strongly, relational theory [quoted text clipped - 7 lines] > I'd vote for adding this nice short, crisp definition of data to our > glossary. --dawn Oops. I forgot one archaic meaning: FATE :-)
mAsterdam - 14 May 2004 21:58 GMT >>>Data: >>>---------- [quoted text clipped - 5 lines] > > Oops. I forgot one archaic meaning: FATE :-) And the plural of datum (eng: date)?
So it should be:
[Data]
0. fate
1. facts
2. encoded information
3. dates
- except I think it doesn't help at all. Maybe this is how the metadata modellers got to 900. :-)
Anthony W. Youngman - 14 May 2004 19:54 GMT >> > Okay. So what is "data". Because if we can't anchor that in the real >> > world, we have no way of knowing if, or how strongly, relational theory [quoted text clipped - 7 lines] >I'd vote for adding this nice short, crisp definition of data to our >glossary. --dawn It is nice and crisp. But (see my other post) if "data" is the philosophical gateway linking the real world and database theory, then it's far too simplistic.
Cheers, Wol
Mike Nicewarner - 14 May 2004 18:12 GMT I agree that Data is defined as facts and that the facts could be encoded in some way. However, information is simply defined as data in context. For instance, a value of data could be 12. 12 by itself is data, but it lacks meaning until you put it in context to say it is a specific baby's weight at 1 year, taken at the doctor's office on a specific date. Then, the data in context becomes information that can be used. Much of the data in a database is in a very limited and incomplete context, and is incorrectly called information, because of business assumptions about the missing context.
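To make the "12" example concrete, here is a minimal Python sketch. Every name and value in it (the Measurement class, the unit, the subject label, the date) is a hypothetical stand-in chosen for illustration; the post itself gives no unit or date.

```python
from dataclasses import dataclass
from datetime import date

# The bare value: data with no context.
bare = 12


@dataclass(frozen=True)
class Measurement:
    """The same value with the context that turns it into information."""
    value: int
    unit: str        # assumed unit, not stated in the post
    quantity: str
    subject: str     # hypothetical identifier for the baby
    taken_on: date   # hypothetical date of the doctor's visit


def describe(m: Measurement) -> str:
    """Render the measurement as a sentence a person could act on."""
    return f"{m.subject}: {m.quantity} {m.value} {m.unit} on {m.taken_on.isoformat()}"


in_context = Measurement(12, "kg", "weight", "baby-1", date(2004, 5, 14))
print(describe(in_context))
```

A typical table column would store only the 12 and leave the unit, the quantity and often the subject to the column name and to business assumptions, which is the "limited and incomplete context" described above.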
My 2.5 cents. :-)
 Signature Mike Nicewarner [TeamSybase] http://www.datamodel.org mike@nospam!datamodel.org Sybase product enhancement requests: http://www.isug.com/cgi-bin/ISUG2/submit_enhancement
> In relational theory, everyone seems to be talking about modelling > "data", but I've never seen an explanation of what "data" is. As far as [quoted text clipped - 20 lines] > Cheers, > Wol
Mike Preece - 04 Jun 2004 03:39 GMT Sorry for the delayed response.
> I agree that Data is defined as facts and that the facts could be encoded in > some way. However, information is simply defined as data in context. Context. Important.
> For > instance, a value of data could be 12. 12 by itself is data, but it lacks [quoted text clipped - 4 lines] > called information, because of business assumptions about the missing > context. I'm thinking back to a previous thread in this ng where it was noted that relationships between data can be implied by their physical proximity in a Pick database. It makes good sense logically to physically store data in context. Never mind Codd's wallop.
Mike.
> My 2.5 cents. :-)
Dawn M. Wolthuis - 04 Jun 2004 03:46 GMT > Sorry for the delayed response. <snip>
> I'm thinking back to a previous thread in this ng where the fact that > relationships between data can be implied by their physical proximity > in a Pick database. It makes good sense logically to physically store > data in context. Never mind Codd's wallop. And popping up one level on that, since I let others care about the physical storage, it makes sense to logically model data in context as well. Cheers! --dawn.
Alan - 14 May 2004 18:29 GMT From "Fundamentals of Database Systems", Elmasri & Navathe [some direct quote, some rephrased for brevity] :
Data: Known facts that can be recorded and have implicit meaning. [direct quote]
Database: A logically coherent collection of related real-world data assembled for a specific purpose. [rephrased]
See? It's not all that complicated. You are applying way too much GRAVITY to your question.
> In relational theory, everyone seems to be talking about modelling > "data", but I've never seen an explanation of what "data" is. As far as [quoted text clipped - 20 lines] > Cheers, > Wol Anthony W. Youngman - 14 May 2004 19:53 GMT In message <2gkdtnF3saspU1@uni-berlin.de>, Alan <alan@erols.com> writes
>From "Fundamentals of Database Systems", Elmasri & Navathe [some direct >quote, some rephrased for brevity] : > >Data: Known facts that can be recorded and have implicit meaning. [direct >quote] Nice quote. But I'm being philosophical here. Mass, Energy, and Time are all (from Newton's standpoint) simple, immutable things. Space is as well, although it's slightly different, because it's three orthogonal instances of length.
By these standards, "data" is woefully vague and undefined. And it's not even atomic! Within the theory it's chopped up into tuples, which are themselves chopped up into (I'm not into terminology here) keys, attributes, relations, and probably other stuff besides.
>Database: A logically coherent collection of related real-world data >assembled for a specific purpose. [rephrased] Given that "data" is so vague, how do we know it's related to the real world?
>See? It's not all that complicated. You are applying way too much GRAVITY to >your question. > :-) But I'm looking for the TOE of data. We know Newton got it wrong. Energy and mass are the same thing. Time is merely a fourth dimension of space. But at least Newton had his philosophical anchors to the real world firmly in place, even if he knew something was wrong.
"data" is not an anchor. It's a formless cloud. One fact may be "object X exists". Another may be "Person A is the mother of Person B". And again, "object Z is blue". Each of those is a different *type* of fact, a different "immutable object". And RDBMS theory lumps them all together in the amorphous philosophical concept of data, and then dismantles them inside the theory, despite the fact that they can't be dismantled in the real world.
Just as we couldn't combine mass and energy and move them inside the theory until we realised that they were interchangeable - E=mc^2 - so we can't move "data" inside relational theory and deal with it there unless we have a rule that can transform one type of data into another. And until we have that rule, we need to treat the different types of data as external to the theory, and have a one-to-one mapping of those with reality.
Cheers, Wol
Alan - 14 May 2004 20:47 GMT Not everything can be expressed in a formula. Data is an example. You are also confused about the way data is "chopped up". It isn't. Data has different characteristics depending on its use and the point of view from which one views it (very much like relativity). Let me try to express this in a way that may be satisfactory to you. It is necessary to do a top-down explanation, I think.
Given a minimum of third normal form:
An instantiated database contains stored information about a miniworld (domain, if you like).
Information is represented by one or more tuples in one or more tables that may or may not be joined to one or more other tables, or itself (a table).
Tables consist of tuples (rows). A tuple represents a piece of complete, bounded, and finite information about the primary key.
The primary key is a piece of discrete data, as is each attribute (column) in the tuple (row).
Under ideal circumstances, the primary key is a piece of meaningful data from the miniworld (sometimes it is necessary to create an artificial key, but this is still a piece of discrete data). Sometimes the primary key is made up of several attributes (composite key), but this is consistent for each tuple. Although each attribute represents a discrete piece of data, when combined into a composite key, the composite key is also a discrete piece of data, but now contains more information. In chemistry, this would be a "compound" made from several "elements".
Because the Primary Key is unique, and all attributes in a tuple are about that key and nothing but the key, each tuple is complete, bounded, and finite. If the tuple is complete, bounded, and finite, then each element of the tuple (the attributes) must also be complete, bounded, and finite. The attributes themselves do not contain "information" until the tuple is realized.
Knowledge is realized by the examination of all of the information.
So, we have
data is contained in attributes
information is contained in tuples (which are made of attributes)
information is also contained in tables, which are really just many tuples
knowledge is contained in a database
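The hierarchy above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table name weighing, its columns, and the composite key (child_id, measured_on) are invented here and do not come from the post.

```python
import sqlite3

# The database: the container for everything below (in memory only).
conn = sqlite3.connect(":memory:")

# A table whose primary key is a composite of two attributes.
# Table and column names are hypothetical, chosen for illustration.
conn.execute("""
    CREATE TABLE weighing (
        child_id    INTEGER NOT NULL,   -- attribute: one discrete piece of data
        measured_on TEXT    NOT NULL,   -- attribute: one discrete piece of data
        weight_kg   REAL    NOT NULL,   -- attribute describing the key
        PRIMARY KEY (child_id, measured_on)
    )
""")

# A tuple (row): the attributes taken together say something about the key.
conn.execute("INSERT INTO weighing VALUES (1, '2004-05-14', 12.0)")
row = conn.execute(
    "SELECT child_id, measured_on, weight_kg FROM weighing"
).fetchone()

# The key is unique, so a second tuple about the same key is rejected.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO weighing VALUES (1, '2004-05-14', 13.5)")
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row, duplicate_rejected)
```

The rejected second insert is what makes each tuple "bounded": there is at most one statement per key value.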
It's not confusing at all if you don't want it to be.
"Relation" is a term from logical modeling and can be thought of as a "superclass" term that encompasses "entities" and "relationships" and has no place in this argument.
BTW, Time is not the 4th dimension of space. Space is expressed in three dimensions. Time is another dimension for sure, but of something larger that we can't yet identify. For now, we can say that space and time are dimensions of the universe. Space is measured by three dimensions. The universe is measured by the three dimensions of space plus the dimension of time. Of course, there may be more dimensions.
Laconic2 - 14 May 2004 21:26 GMT If we are going to bring Newton into this forum, again, let's go back to the data. And let's see if we can get it right, this time.
Tycho Brahe made years worth of very careful meticulous observations as to the positions of the planets, at observed points in time. That's data.
Johannes Kepler studied Brahe's observations for years, and discovered that the orbits of the planets were elliptical, with one focus at the sun. He also discovered the "equal areas in equal times" rule for how fast they are moving. That's analysis.
What Newton added were the laws of motion, and the law of gravitation. That's physics.
All this talk about how "Newton got it wrong, and Einstein got it right" is a bunch of claptrap. The people in this forum, for the most part, don't know what they are talking about.
There are internal problems, at the cosmological level, with Newton's view of the universe. But that's not what led Einstein to push the envelope further. Physics was in crisis in the 19th century, due to results like the Michelson-Morley experiment. That's more data.
It's data that Einstein had and Newton did not.
Tony - 15 May 2004 15:22 GMT > If we are going to bring Newton into this forum, again, let's go back to > the data. And let's see if we can get it right, this time. [quoted text clipped - 13 lines] > is a bunch of claptrap. The people in this forum, for the most part, don't > know what they are talking about. True. For the most part our expertise, if any, is in databases not physics. But some people just can't help bringing their secondary school-level knowledge of physics into every topic for some reason (not that I'm claiming to have any more than that myself). It is very tiresome.
Anthony W. Youngman - 16 May 2004 00:43 GMT >> All this talk about how "Newton got it wrong, and Einstein got it right" >> is a bunch of claptrap. The people in this forum, for the most part, don't [quoted text clipped - 5 lines] >(not that I'm claiming to have any more than that myself). It is very >tiresome. And some of us like bringing our 3rd-year undergrad Physics knowledge (from a top-5 Uni) into it, too :-)
It's just that I find Newtonian mechanics an excellent analogy. To express it in computerese, both Newtonian Mechanics and Relational Theory are instances of the class Mathematical_Theory. BOTH are mathematically perfect (well, I know Newtonian Mechanics is).
I just find it fascinating that, while we know that Newtonian Mechanics doesn't belong in the set Accurately_Matches_The_Real_World, so many people here (on the grounds of its mathematical correctness) seem to believe that relational theory does. That argument just doesn't make sense to me.
Cheers, Wol
Alfredo Novoa - 16 May 2004 12:45 GMT >It's just that I find Newtonian mechanics an excellent analogy. To >express it in computerese, both Newtonian Mechanics and Relational >Theory are instances of the class Mathematical_Theory. You never learn. Newtonian Mechanics is not a mathematical theory; it is physics. It is derived from observation of phenomena in the physical world.
>I just find it fascinating that, while we know that Newtonian Mechanics >doesn't belong in the set Accurately_Matches_The_Real_World False. Newtonian Mechanics matches the physical world very accurately in many circumstances and in almost every practical circumstance. They are extremely useful.
If we compare The Relational Model with Newtonian Mechanics, then the Pick approach should be compared with troglodyte superstition.
Bill H - 16 May 2004 17:54 GMT "Alfredo Novoa" <alfredo@ncs.es> wrote in message
> ...Newtonian Mechanics matches the physical world very accurately > in many circumstances and in almost every practical circumstance. They > are extremely useful. > > If we compare The Relational Model with Newtonian Mechanics, then the > Pick approach should be compared with troglodyte superstition. I'm surprised at seeing such a miscomparison. Maybe I shouldn't be. :-)
Bill
Leandro Guimaraens Faria Corsetti Dutra - 20 May 2004 05:22 GMT > the >> Pick approach should be compared with troglodyte superstition. > > I'm surprised at seeing such a miscomparison. Why do you consider it a miscomparison?
 Signature Leandro Guimarães Faria Corsetti Dutra +55 (11) 5685 2219 Av Sgto Geraldo Santana, 1100 6/71 +55 (11) 5686 9607 04.674-000 São Paulo, SP BRASIL http://br.geocities.com./lgcdutra/
Dawn M. Wolthuis - 16 May 2004 13:02 GMT > >> All this talk about how "Newton got it wrong, and Einstein got it right" > >> is a bunch of claptrap. The people in this forum, for the most part, don't [quoted text clipped - 19 lines] > believe that relational theory does. That argument just doesn't make > sense to me. While I have no knowledge related to Newtonian Mechanics, I can agree with your comparison when it comes to applying Mathematical theories. There are folks who think that Mathematics, like science, is a discipline of discovery. Others, like me, believe it to be a creative act -- our use of the logic in our brains to propose axioms and then draw logical conclusions from those. We create Mathematics, sometimes in order to address the real world (counting sheep, for example) and sometimes without such a trigger in nature. Mathematical errors can be found by proving new theorems or showing where previous proofs were incorrect. There is no need to talk about anything in the real world in order to talk about such Mathematics. Folks on this list who want to discuss "relational theory" as strictly a Mathematical theory are correct in suggesting that my questions, pretty much all of them, are outside of the scope of such a theory and would, therefore, be unwelcoming of such questions in this forum.
If we have such a mathematical theory we can "apply it". That act is a scientific one and one that can easily be done poorly. The application of Mathematics is like the application of a metaphor (I know, I know, I've said that many times before) where the Mathematics will fit some aspects of our target domain and possibly not fit others. While it might lay down perfectly on top of its target application, it is likely there will be many areas physically related to the domain for which the Mathematical theory is irrelevant. For example, with the counting of sheep, we can apply the set of Integers with some basic arithmetic functions and we can get the counting right. But that will not tell us what to do if one sheep is missing. Such a question would be orthogonal to the "Counting Theory" that so many shepherds are into. A shepherd who is immersed only in such a theory could lose their entire flock while sticking to the truth of their theory, convinced that if they only study it more and learn more about it, they will solve this problem too. That is why "sheep herding theory" is not the same as "counting theory".
My interest is in helping Little Bo Peep as well as the owner of those sheep. I'm curious about why when she took a course in college about shepherding, most of the time was spent talking about counting them, which didn't actually help her much when she got to the "real world". That is why I do not feel guilty about bringing up issues about databases in a database theory newsgroup. If this were a "relational theory" newsgroup where the goal were to push the edges of a Mathematical theory without interest in whether this theory were useful to databases or in what way it might be useful or not, that would be a different discussion.
Cheers! --dawn
Tony - 16 May 2004 14:30 GMT > >> All this talk about how "Newton got it wrong, and Einstein got it right" > >> is a bunch of claptrap. The people in this forum, for the most part, don't [quoted text clipped - 8 lines] > And some of us like bringing our 3rd-year undergrad Physics knowledge > (from a top-5 Uni) into it, too :-) I am suitably impressed and humbled... ;-)
> It's just that I find Newtonian mechanics an excellent analogy. To > express it in computerese, both Newtonian Mechanics and Relational [quoted text clipped - 6 lines] > believe that relational theory does. That argument just doesn't make > sense to me. You keep saying that (on and on, tediously...) but it just doesn't work, does it? After all, didn't NASA put a man on the moon using Newtonian Mechanics? Expensive and complex successful experiments have been done to observe the effects of relativity, but it hardly impacts on the real world as lived in by us humans, does it? If your analogy holds any water at all (to give you the benefit of very large doubt), it suggests that relational theory will do just fine for pretty much anything we ever want to do "in the real world".
Tony - 16 May 2004 19:21 GMT > > I just find it fascinating that, while we know that Newtonian Mechanics > > doesn't belong in the set Accurately_Matches_The_Real_World, so many [quoted text clipped - 10 lines] > doubt), it suggests that relational theory will do just fine for > pretty much anything we ever want to do "in the real world". Perhaps more to the point, Newtonian Mechanics is an attempt (accurate or not) to model "how the world works". By contrast, database theory (any database theory) is merely trying to come up with the best way to computerize book-keeping. The two are hardly comparable endeavours, are they?
Anthony W. Youngman - 17 May 2004 09:21 GMT >> > I just find it fascinating that, while we know that Newtonian Mechanics >> > doesn't belong in the set Accurately_Matches_The_Real_World, so many [quoted text clipped - 16 lines] >computerize book-keeping. The two are hardly comparable endeavours, >are they? But both are attempts to apply a mathematical model to a real world problem. Viewed from a dispassionate oversight, both are instances of the SAME problem, and the same techniques can be applied to solving them. Namely "how well does my mathematical model work in the real world?".
Cheers, Wol
Marshall Spight - 22 May 2004 04:25 GMT > But both are attempts to apply a mathematical model to a real world > problem. Viewed from a dispassionate oversight, both are instances of > the SAME problem, and the same techniques can be applied to solving > them. Namely "how well does my mathematical model work in the real > world?". It's clear you love this analogy, but it doesn't work.
What we put in the database is data, not the real world. Neither do we attempt to say anything about the real world with our databases. Consider a payroll database. Does it contain one single fact about the natural world? It does not. It has names, social security numbers, addresses, salaries, phone numbers, etc. These are all 100% human constructs; none of them are found anywhere in the real world; they are exclusively in our heads.
I suppose you will counter with some NASA database or something. But what will it have in it? Let's say it's full of the positions of rocks. But how do we record those positions? With a GPS machine that tells us latitude and longitude. Note that we don't have *actual* rocks in the database; we only have data for the lat/lon pairs. You could comb all over the surface of Mars or Earth and never find a latitude line.
The internal predicate is in the database; the external predicate is in our heads. Humans convert from one to the other; machines can't. It's imperative that the humans be able to tell the difference.
Marshall
Anthony W. Youngman - 22 May 2004 13:54 GMT >> But both are attempts to apply a mathematical model to a real world >> problem. Viewed from a dispassionate oversight, both are instances of [quoted text clipped - 11 lines] >constructs; none of them are found anywhere in the real world; they >are exclusively in our heads. Well, if they're not facts about the real world, then I presume they are imaginary musings? In which case they are no better than fantasy. So why bother with them?
Names, Social Security Numbers, etc etc are all ways of describing real things (in these cases a person). An address describes a real thing - a building. Etcetera.
But the point is, if you do not have some way of FORMALLY converting between a person (you, me, whoever) or a phone (a physical thing you can hold) or a building (something you can look at), and the data that describes those things, then your theory of data MUST be unscientific.
Bearing in mind that this is the study of philosophy ("does the tree, continue to be, if no-one's there to see") I'm quite happy with a scrappy attempt to explain things. But the conversion has to be both ways - with "mass" we know exactly what Newton meant in his mathematical theory, and we know exactly what we mean in the real world when we pick up a heavy object. And we (now, thanks to Einstein) know that those two definitions (the real and the mathematical) don't quite tie up.
But if you can't give me a way of converting between "data" and the real-world objects it describes - in both directions! - then by definition any theory of data must be unfalsifiable, therefore it is unscientific, therefore it lives very firmly in the realms of mathematics and religion. I'm sorry, but I'm a scientist by training and I most definitely don't believe in that religion.
Cheers, Wol
x - 24 May 2004 13:48 GMT > >> But both are attempts to apply a mathematical model to a real world > >> problem. Viewed from a dispassionate oversight, both are instances of [quoted text clipped - 15 lines] > imaginary musings? In which case they are no better than fantasy. So why > bother with them? But they could be fantasy :-)
> Names, Social Security Numbers, etc etc are all ways of describing real > things (in these cases a person). An address describes a real thing - a > building. Etcetera. Or an imaginary thing :-) The question is: How do you test if some "fact" is real or imaginary ?
> But the point is, if you do not have some way of FORMALLY converting > between a person (you, me, whoever) or a phone (a physical thing you can > hold) or a building (something you can look at), and the data that > describes those things, then your theory of data MUST be unscientific. Well, we can have data about many kinds of "things": physical, chemical, imaginary, etc.:-) Why are you interested only in "physical" ones ? :-)
> Bearing in mind that this is the study of philosophy ("does the tree, > continue to be, if no-one's there to see") I'm quite happy with a [quoted text clipped - 3 lines] > up a heavy object. And we (now, thanks to Einstein) know that those two > definitions (the real and the mathematical) don't quite tie up. Many of us gave up asking WHY a long time ago. Instead, we ask HOW MANY/MUCH :-)
> But if you can't give me a way of converting between "data" and the > real-world objects it describes - in both directions! - then by > definition any theory of data must be unfalsifiable, therefore it is > unscientific, therefore it lives very firmly in the realms of mathematics > and religion. I'm sorry, but I'm a scientist by training and I most > definitely don't believe in that religion. We have NOTARIES, ACCOUNTANTS, LAWYERS,... :-)
Anthony W. Youngman - 17 May 2004 09:18 GMT >> I just find it fascinating that, while we know that Newtonian Mechanics >> doesn't belong in the set Accurately_Matches_The_Real_World, so many [quoted text clipped - 10 lines] >doubt), it suggests that relational theory will do just fine for >pretty much anything we ever want to do "in the real world". I think you need to read up - and fast!
If NASA had used Newtonian Mechanics, from what I know, the astronauts would never have come back.
Even under such "near earth" conditions as that, the discrepancy between Newtonian Mechanics and Relativity would have been enough to ensure the rockets ran out of fuel, stranding the astronauts in space.
We're talking velocities of 7 miles a second here, more than fast enough for relativity to make itself felt. That's roughly c*10^-5 - not small beer. Actually - it looks like we probably need relativity even with the Shuttle!
Cheers, Wol
Tony - 18 May 2004 10:59 GMT > >> I just find it fascinating that, while we know that Newtonian Mechanics > >> doesn't belong in the set Accurately_Matches_The_Real_World, so many [quoted text clipped - 12 lines] > > I think you need to read up - and fast! I will indeed read up - though don't worry, there is no real urgency: I am not personally involved in putting men on the Moon.
> If NASA had used Newtonian Mechanics, from what I know, the astronauts > would never have come back. [quoted text clipped - 7 lines] > beer. Actually - it looks like we probably need relativity even with the > Shuttle! Despite being no expert, I am pretty confident that you are completely wrong here. 0.0000376 * c sounds pretty small to me. How much discrepancy in fuel usage could that lead to - a millilitre even? I bet whatever difference it makes is insignificant compared to other more mundane factors such as the accuracy of measuring the rate of fuel use, quality of fuel, etc.
But yes, I will do a little Googling to see if you are right. If I had a hat, I'd be prepared to eat it if it turned out you were correct.
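For a sense of scale, here is a back-of-the-envelope check in Python of the figures traded above. It only computes the ratio v/c for 7 miles per second and the corresponding Lorentz factor; it says nothing about fuel budgets or mission planning, and the variable names are illustrative.

```python
import math

C_KM_PER_S = 299_792.458   # speed of light in km/s (exact by definition)
KM_PER_MILE = 1.609344     # international mile in km (exact by definition)

v_km_per_s = 7 * KM_PER_MILE          # the "7 miles a second" from the thread
beta = v_km_per_s / C_KM_PER_S        # v/c, roughly 3.76e-5

# Lorentz factor minus one: the fractional size of the special-relativistic
# correction to quantities such as time intervals or kinetic energy.
gamma_minus_1 = 1.0 / math.sqrt(1.0 - beta**2) - 1.0

# For small beta this is well approximated by beta^2 / 2.
approx = beta**2 / 2

print(f"v/c       = {beta:.3e}")
print(f"gamma - 1 = {gamma_minus_1:.3e}")
print(f"beta^2/2  = {approx:.3e}")
```

The speed ratio is a few parts in 100,000, but the correction factor scales with its square, so it comes out at roughly 7 parts in 10 billion.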
Laconic2 - 18 May 2004 15:28 GMT > But yes, I will do a little Googling to see if you are right. If I > had a hat, I'd be prepared to eat it if it turned out you were > correct. You are right, Tony. Your hat, if you had a hat, would be safe. The divergence between Newtonian mechanics and Einsteinian mechanics for the entire Apollo mission is less than the margin of error in the instruments on board.
OTOH, the Apollo missions did carry a fair number of instruments to the moon whose purpose was to capture data that would confirm or contradict Einstein's predictions. AFAIK, Einstein is batting a thousand.
You actually don't have to go so far afield to find a connection between Einstein's theories and everyday life. Some percentage (I don't know how much) of Europe's electric energy is generated by nuclear plants. Inside those plants, nuclear fission is the source of energy. And that energy corresponds to the reduction of mass that results from the splitting of certain kinds of nuclei.
Chris Hoess - 19 May 2004 07:41 GMT > It's just that I find Newtonian mechanics an excellent analogy. To > express it in computerese, both Newtonian Mechanics and Relational > Theory are instances of the class Mathematical_Theory. BOTH are > mathematically perfect (well, I know Newtonian Mechanics is). But you're missing an important point, namely, Newtonian mechanics incorporates into it distinct physical concepts such as mass, distance, and time. Relational theory does not. This is why we can't set up some experiment to test "relational theory" as such against the real world and see what happens: only by creating a specific schema which links together machine-readable definitions of relations and constraints and the semantic import of those relations can we try and test relational theory, or any other general theory of data modelling, against the real world.
To put it another way, relational theory is analogous to the equation for a Gaussian distribution, f(x) = ae^(-bx^2). Were I to assert that Gaussian distributions are useful in describing scientific phenomena, you might ask me for a test; and what are f, a, b, and x? And when I tell you that it depends on the phenomenon we are trying to describe, and that f, a, b, and x can be many different things, you might mistake it to be of no practical value, as it makes no verifiable predictions. But if I were to substitute C, the concentration, for f; C0/sqrt(4piDt) for a; 1/(4Dt) for b; and proclaim x to be distance, I would have made use of a Gaussian distribution to describe the process of diffusion, and it could be checked experimentally and the predictions of the equation (Fick's Second Law) verified. Only by giving a physical interpretation to the variables of the Gaussian distribution does it become a scientifically verifiable theory; and only by creating a schema which we associate with semantics are we able to test the application of the relational model to our problems.
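As a sketch of what "giving a physical interpretation" buys, the Python below plugs the substitutions into the Gaussian and checks numerically that the result satisfies the one-dimensional diffusion equation dC/dt = D * d2C/dx2 (Fick's second law). The function names, the default values C0 = 1 and D = 1, the sample point, and the step size h are all assumptions made for illustration.

```python
import math


def concentration(x, t, c0=1.0, d=1.0):
    """Gaussian f(x) = a*exp(-b*x^2) with a = C0/sqrt(4*pi*D*t), b = 1/(4*D*t)."""
    a = c0 / math.sqrt(4.0 * math.pi * d * t)
    b = 1.0 / (4.0 * d * t)
    return a * math.exp(-b * x * x)


def fick_residual(x, t, d=1.0, h=1e-3):
    """dC/dt - D * d2C/dx2, estimated with central finite differences.

    A value near zero means the Gaussian obeys the diffusion equation
    at the point (x, t), up to discretisation error of order h^2.
    """
    dc_dt = (concentration(x, t + h, d=d) - concentration(x, t - h, d=d)) / (2.0 * h)
    d2c_dx2 = (concentration(x + h, t, d=d)
               - 2.0 * concentration(x, t, d=d)
               + concentration(x - h, t, d=d)) / (h * h)
    return dc_dt - d * d2c_dx2


residual = fick_residual(0.5, 1.0)
print(f"residual at x=0.5, t=1.0: {residual:.2e}")
```

With a, b and x left uninterpreted there is nothing to check; once they are tied to C0, D, t and distance, the formula makes a claim that a numerical test (or a laboratory one) can confirm or refute.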
Having established that the relational model is an underlying mathematical framework bound to reality by the "glue" of the schemas we create, we're on better grounds to discuss the applicability of the model without premature calls for "experiment". We know that data in the relational model is formulated as logical propositions whose validity is evaluated by first-order logic. Hence my tentative suggestion in a post here about a month ago for examining alternatives to the relational model: are logical propositions the best way to formulate data, and do we need more power than first-order logic can bring us (and what trade-offs does that present)?
(Incidentally, can we agree that while consistency is not sufficient to prove the correctness of a data model, it is necessary?)
 Signature Chris Hoess
Anthony W. Youngman - 20 May 2004 00:28 GMT >> It's just that I find Newtonian mechanics an excellent analogy. To >> express it in computerese, both Newtonian Mechanics and Relational [quoted text clipped - 9 lines] >import of those relations can we try and test relational theory, or any >other general theory of data modelling, against the real world. If we can't set up an experiment (even a Gedanken thought experiment), then relational theory is not provable, therefore it is not scientific, therefore it is irrelevant to the real world, therefore why the hell are we using it :-)
As a scientist/engineer type, not a mathematician, I want some experimental proof at least. Unfortunately, all the (anecdotal) evidence I have says that other models work better ...
>To put it another way, relational theory is analogous to the equation for a >Gaussian distribution, f(x) = ae^(-bx^2). Were I to assert that Gaussian [quoted text clipped - 11 lines] >only by creating a schema which we associate with semantics are we able to >test the application of the relational model to our problems. Yup! We have an experiment!
>Having established that the relational model is an underlying mathematical >framework bound to reality by the "glue" of the schemas we create, we're on [quoted text clipped - 5 lines] >propositions the best way to formulate data, and do we need more power than >first-order logic can bring us (and what trade-offs does that present)? If we accept that data is an abstract proposition INSIDE relational theory, then I might well agree that logical propositions, first-order logic etc may well be the best way to formulate data. But that implies that data is fundamental to database theory in the same way as mass and energy individually are fundamental to Special Relativity - ie they are NOT - there is a supra-concept called mass-energy, and the transformation between mass and energy is part of the theory and nothing to do with the metaphysical interface to reality ...
>(Incidentally, can we agree that while consistency is not sufficient to >prove the correctness of a data model, it is necessary?) Of course. I'd actually rephrase that. While (internal) consistency may prove the model to be correct (mathematically), we need external consistency to prove the model accurate (here we go - arguing over the meaning of words again :-)
Cheers, Wol
Paul - 20 May 2004 10:18 GMT > If we can't set up an experiment (even a Gedanken thought experiment), > then relational theory is not provable, therefore it is not scientific, > therefore it is irrelevant to the real world, therefore why the hell are > we using it :-) Newtonian mechanics is more like a particular instance of a database in the relational model, rather than the model itself.
The relational model is really just an implementation of first-order predicate logic that is suitable for computers.
Logic is more like a "meta-theory": it's kind of how we reason *about how we reason*, so it's a bit self-referential.
For a particular database we can test it experimentally: we add data, query it and check that the answers correspond with reality.
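A toy version of that experiment, using Python's built-in sqlite3 module. The parent_of table, the three people, and the "observed" grandparent fact are all invented for illustration; the point is only the shape of the test: assert some facts, derive an answer by query, and compare it with what is independently known.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: add data. Each row asserts "parent is a parent of child".
# Table name and contents are hypothetical.
conn.execute("""
    CREATE TABLE parent_of (
        parent TEXT NOT NULL,
        child  TEXT NOT NULL,
        PRIMARY KEY (parent, child)
    )
""")
conn.executemany("INSERT INTO parent_of VALUES (?, ?)",
                 [("Ann", "Bob"), ("Bob", "Cid")])

# Step 2: query it. A self-join derives grandparent/grandchild pairs,
# which were never entered directly.
derived = set(conn.execute("""
    SELECT g.parent, p.child
    FROM parent_of AS g
    JOIN parent_of AS p ON g.child = p.parent
"""))

# Step 3: check the answer against what we observe outside the database.
observed = {("Ann", "Cid")}   # assumed to be known independently
matches_reality = derived == observed

print(derived, matches_reality)
```

If the comparison fails, the fault could lie in the data entered or in the schema's interpretation, but either way it is this particular database that has been tested, not first-order logic.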
For first-order predicate logic itself, it's almost axiomatic that it corresponds to reality, because we are saying this is how we argue logically by definition. Godel proved that first order logic is "complete" in some sense (see here for example: http://www.sm.luth.se/~torkel/eget/godel/completeness.html), though the whole area of Godel is guaranteed to cause confusion and misunderstanding, and will possibly explode your brain.
>> (Incidentally, can we agree that while consistency is not sufficient to >> prove the correctness of a data model, it is necessary?) [quoted text clipped - 3 lines] > consistency to prove the model accurate (here we go - arguing over the > meaning of words again :-) But in order to prove the model is accurate externally we'd have to use logic. So we've got a chicken and egg situation here. What logic is external to logic itself?
Paul.
mountain man - 20 May 2004 11:32 GMT > > If we can't set up an experiment (even a Gedanken thought experiment), > > then relational theory is not provable, therefor it is not scientific, [quoted text clipped - 32 lines] > logic. So we've got a chicken and egg situation here. What logic is > external to logic itself? Random truths (Chaitin) and unprovable truths (Godel). See http://www.mountainman.com.au/GIF/logic_space_1.jpg
Pete Brown Falls Creek Oz
Laconic2 - 20 May 2004 15:28 GMT > If we can't set up an experiment (even a Gedanken thought experiment), > then relational theory is not provable, therefor it is not scientific, [quoted text clipped - 4 lines] > experimental proof at least. Unfortunately, all the (anecdotal) evidence > I have says that other models work better ... You make an interesting point here. I would add that the same arguments that would render relational theory not provable would equally well render the theory non falsifiable. In that case, the question for the engineer becomes moot. The question "why the hell are we using it" can be countered by "why the hell not".
I would suggest that the disciplined practices of engineers are based on several sources. One is prior experience, either the personal experience of an individual engineer, or the distilled experience of other engineers. Another is the accumulated results of science, and of other specialties within engineering. A third is the results of mathematics. A fourth is the study of how people carry out certain data management and data manipulation tasks in the absence of automation. A fifth is the study of the strengths and defects of "legacy systems".
Sorry the list got so long.
My personal experience tells me that the relational data model can be, in certain circumstances, an enormous aid in managing the complexity of defining the data itself, and in clarifying certain issues in the development of application software.
This is a far cry from saying that "all data should be in 1NF".
Anthony W. Youngman - 20 May 2004 22:31 GMT >> If we can't set up an experiment (even a Gedanken thought experiment), >> then relational theory is not provable, therefor it is not scientific, [quoted text clipped - 11 lines] >The question "why the hell are we using it" can be countered by "why the >hell not". Actually, you and I have both just said exactly the same thing. "Provable" and "falsifiable" both mean exactly the same thing as far as science goes - to take that widely misquoted saying "the exception proves the rule (is wrong)". The bit in parentheses is ignored or unknown to most people who quote the saying ... Look in a dictionary. "To prove" can mean "to test".
As for "why the hell not" - well we should be looking for theories that ARE provable/falsifiable. Because if relational theory is not falsifiable, then equally we can't show that it works in practice (for any suitable value of "works"). Would you trust an engineer using Newtonian Mechanics if you had no way of knowing whether relativistic effects were likely in that particular application?
>I would suggest that the disciplined practices of engineers are based on >several sources. One is prior experience, either the personal experience [quoted text clipped - 6 lines] > >Sorry the list got so long. Well, when I was talking about Newtonian Mechanics metaphysics my list was about that long :-) but if things have to be complete and comprehensive, then sometimes they do get long ...
>My personal experience tells me that the relational data model can be, in >certain circumstances, an enormous aid in managing the complexity of >defining the data itself, and in clarifying certain issues in the >development of application software. I would very much agree ... indeed I would say it almost invariably is a great help, if used as your TOOL and not as your MASTER!
>This is a far cry from saying that "all data should be in 1NF". And the same here. As far as I am concerned, the job of the "database analyst designer" is to take real-world information, and convert it to a data schema for the database. If that includes conversion to 1NF, this involves a massive loss of metadata, meaning that the conversion is one-way and cannot be reversed, and therefore the act of conversion renders the whole thing unprovable / unfalsifiable / unscientific.
Cheers, Wol
Laconic2 - 21 May 2004 14:07 GMT > I would very much agree ... indeed I would say it almost invariably is a > great help, if used as your TOOL and not as your MASTER! Agreed. The old saying is, "A fool with a tool is still a fool".
Or perhaps the pragma is better expressed as:
"If you can dream, and not make dreams your master,
If you can think, and not make thoughts your aim,
If you can meet with triumph and disaster,
And treat those two impostors just the same,"
On this subthread, I think the difference between you and me is more how we choose to express things than on the substance of the matter.
Marshall Spight - 22 May 2004 04:40 GMT > As for "why the hell not" - well we should be looking for theories that > ARE provable/falsifiable. How might one falsify arithmetic? If arithmetic was falsified, would that mean it wasn't useful anymore?
Marshall
Paul - 22 May 2004 11:26 GMT > How might one falsify arithmetic? If arithmetic was falsified, would > that mean it wasn't useful anymore? It depends what you mean by "falsify arithmetic". There's a result by Tarski ( http://plato.stanford.edu/entries/tarski-truth/ ) that says essentially that no language can talk about the truth of sentences contained within itself without leading to things like the liar paradox.
You need to have a "meta-language" for arithmetic in order to talk about whether statements in arithmetic are true or not.
I guess the problem is where do you start? Set theory I suppose but then how do you talk rigorously about set theory?
All this stuff is very subtle but I think it is useful to know a bit about this kind of thing if you're interested in relational database theory.
If you mean that arithmetic is inconsistent, i.e. there is a statement where you can prove both it and its negation, then that means *everything* in arithmetic is both true and false.
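The step from a single contradiction to *everything* is the classical principle of explosion. A minimal Lean 4 sketch of it follows; the names P, Q, hp and hnp are only illustrative:

```lean
-- Given a proof of P and a proof of ¬P, any proposition Q whatsoever follows.
example (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```

Nothing here is specific to arithmetic: Q is arbitrary, which is why one provable contradiction makes every statement of a classical theory provable.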
Check out this though: http://plato.stanford.edu/entries/mathematics-inconsistent/ Inconsistent Mathematics, where you have theories that use non-classical logic and can deal with inconsistencies without collapsing in on themselves.
Paul.
Anthony W. Youngman - 22 May 2004 14:34 GMT >> As for "why the hell not" - well we should be looking for theories that >> ARE provable/falsifiable. > >How might one falsify arithmetic? If arithmetic was falsified, would >that mean it wasn't useful anymore? What an excellent question !!! Because if I answer it properly, it clearly explains the difference between mathematics and science. Thanks!
Arithmetic is part of mathematics. Therefore, it is NOT falsifiable. We merely prove it correct or incorrect. The best example is "reductio ad absurdum" - if from our starting point we end up with two mutually exclusive results then either our starting point or our logic must be wrong. Now what's this got to do with science?
Let's go from arithmetic to geometry. In three dimensions we have Euclidean geometry. We can prove it correct (or self-consistent - same thing). In four dimensions, we have special relativity, and again we can prove it correct. In two dimensions, we have planar, spherical, and toroidal geometry, and yet again, we can prove them correct.
NOW! Let's apply all three of our two-dimensional geometries to the surface of the earth. THIS is the "falsifiable" bit.
Let's use planar geometry to describe the little bit of the world we can see. I know a little bit of American geography, as do many others, so I'll use that. Let's say we're in Kansas. We know New York is 1500 miles east, and Dallas is 2000 miles south. So we predict the distance and direction from New York to Dallas. The reality is we are going to be well wrong - we've just falsified the assumption that the world is flat. Or, to put it another way, "planar geometry does not describe the world". Toroidal geometry will come up with a similar mess.
Spherical geometry, on the other hand will be pretty close. So either we've cocked up on our geometry or, as is actually the case, the earth is an approximate sphere not a perfect one. Newton mapped his mathematical "mass", "energy", "space" and "time" to the real-world equivalents, and came up with a load of predictions that mostly worked. So he concluded that his maths was wrong. If he'd concluded that reality wasn't quite as he envisaged it, he might well have beaten Einstein to the theory of relativity!
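The planar and spherical predictions can be put side by side in a few lines of Python. This is only a sketch under stated assumptions: the 1500 and 2000 mile legs are the thread's round numbers rather than real geography, the two legs are treated as great-circle arcs meeting at a right angle, and the Earth is taken as a perfect sphere of radius 3959 miles.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius of a perfectly spherical Earth


def planar_distance(east_miles, south_miles):
    """Flat-earth prediction: plain Pythagoras."""
    return math.hypot(east_miles, south_miles)


def spherical_distance(east_miles, south_miles, radius=EARTH_RADIUS_MILES):
    """Round-earth prediction for a right-angled spherical triangle.

    Uses the spherical Pythagorean theorem cos(c) = cos(a) * cos(b),
    where a, b, c are the side lengths expressed as angles (length / radius).
    """
    a = east_miles / radius
    b = south_miles / radius
    return radius * math.acos(math.cos(a) * math.cos(b))


flat = planar_distance(1500, 2000)
round_earth = spherical_distance(1500, 2000)
print(f"planar: {flat:.0f} miles, spherical: {round_earth:.0f} miles")
```

With these inputs the flat-map figure comes out longer than the spherical one by some tens of miles. Both calculations are internally consistent mathematics; only a measurement of the real distance can say which geometry describes the Earth.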
So no. Your question "how do we falsify arithmetic" is meaningless. But science is about falsifying theories *based* *on* arithmetic (and other branches of mathematics). Use the maths to make a prediction about the real world, and then prove (as in test) the theory by seeing if the prediction is true or false. And if the prediction is falsified by an exception, then you've just got an example of "the exception proves the theory is wrong".
And that's why I say Newtonian Mechanics is scientific - it is a mathematical theory that can be proved/falsified, while Relational Theory is unscientific because I can see no way - not even with a Gedanken thought experiment - of trying to falsify it.
Cheers, Wol
Bill H - 24 May 2004 17:17 GMT Wol:
"Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message
[snipped]
> ... As far as I am concerned, the job of the "database > analyst designer" is to take real-world information, and convert it to a > data schema for the database. If that includes conversion to 1NF, this > involves a massive loss of metadata, meaning that the conversion is > one-way and cannot be reversed, and therefore the act of conversion > renders the whole thing unprovable / unfalsifiable / unscientific. But it still might be useful under the circumstances. :-)
Bill
Marshall Spight - 22 May 2004 04:35 GMT > If we can't set up an experiment (even a Gedanken thought experiment), > then relational theory is not provable, therefor it is not scientific, Correct, relational theory is not scientific.
> therefor it is irrelevant to the real world, therefor why the hell are > we using it :-) Because it is *mathematical.*
I can imagine giving you a four function calculator, and you saying, how can I devise a real-world, scientific experiment to verify the validity of this thing, and then throwing it out because you couldn't.
Four function calculators are not scientific, but they are still useful, mathematically.
> As a scientist/engineer type, not a mathematician, I want some > experimental proof at least. I am a computer scientist, which is a kind of mathematician. I have no illusion that what I do relates to the physical world.
> Unfortunately, all the (anecdotal) evidence > I have says that other models work better ... I have this gedanken experiment that says, what if I have two apples and I try to take away three. In the real world, I get an error, because once I have taken away two, I no longer have any apples that I can take away. Therefore, only positive integers are scientific. I have no use for negative numbers because they are not scientific, either, since there are no negative numbers I can observe in the natural world.
Marshall
Anthony W. Youngman - 22 May 2004 15:58 GMT >> If we can't set up an experiment (even a Gedanken thought experiment), >> then relational theory is not provable, therefor it is not scientific, > >Correct, relational theory is not scientific. Good. We agree :-)
>> therefor it is irrelevant to the real world, therefor why the hell are >> we using it :-) > >Because it is *mathematical.* So I can use any theory I like, so long as it's mathematical, then?
You'd be quite happy for me to calculate your aeroplane's route from A to B using whatever geometrical theory I cared for, and you wouldn't object if I used a theory whose practical effect was to destroy your aircraft in a huge fireball as it underwent a "controlled flight into terrain", just as long as I could prove the maths I was using was perfectly sound. The fact that it was the wrong theory for the real-world task in hand wouldn't bother you in the slightest?
>I can imagine giving you a four function calculator, and you saying, >how can I devise a real-world, scientific experiment to verify the >validity of this thing, and then throwing it out because you couldn't. > >Four function calculators are not scientific, but they are still >useful, mathematically. Well, actually, I could think of an experiment. "If I type '4' '+' '4' '*' '4' '=' into this thing, then it should come up '20' but might come up '32' ". And either way, I will be happy at using it because I can predict (ie "do science") that it will come up with a "correct" answer, and I can verify that answer.
Actually, I've just done exactly that with the calculator in my copy of Windows, and guess which answer it came up with ...
Cheers, Wol
Gene Wirchenko - 23 May 2004 23:25 GMT [snip]
>>Four function calculators are not scientific, but they are still >>useful, mathematically. [quoted text clipped - 7 lines] >Actually, I've just done exactly that with the calculator in my copy of >Windows, and guess which answer it came up with ... I would have to guess since it could come up with either!
In Standard view, the answer is 32.
In Scientific view, the answer is 20. (In this view, parentheses are available for grouping operations.)
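The two answers come from two different evaluation rules applied to the same keystrokes. The sketch below illustrates those rules in Python; it is not the actual Calculator code, and the function names are made up for the example.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def immediate_execution(tokens):
    """Apply each operator as soon as its right operand arrives, left to right."""
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        result = OPS[tokens[i]](result, tokens[i + 1])
    return result


def with_precedence(tokens):
    """Evaluate with * and / binding tighter than + and - (two-stack method)."""
    values = [tokens[0]]
    pending = []

    def apply_top():
        right = values.pop()
        left = values.pop()
        values.append(OPS[pending.pop()](left, right))

    for i in range(1, len(tokens), 2):
        op = tokens[i]
        # Finish any pending operator that binds at least as tightly.
        while pending and PRECEDENCE[pending[-1]] >= PRECEDENCE[op]:
            apply_top()
        pending.append(op)
        values.append(tokens[i + 1])
    while pending:
        apply_top()
    return values[0]


keys = [4, "+", 4, "*", 4]
print(immediate_execution(keys))  # 32, as in Standard view
print(with_precedence(keys))      # 20, as in Scientific view
```

Either rule gives a predictable, checkable result, which is the point of the experiment: the prediction can be tested once you know which rule the device follows.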
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Anthony W. Youngman - 15 May 2004 23:05 GMT >All this talk about how "Newton got it wrong, and Einstein got it right" >is a bunch of claptrap. The people in this forum, for the most part, don't >know what they are talking about. Well, I would say Newton got it wrong, and I do know what I'm talking about, and I know I'm right :-)
>There are internal problems, at the cosmological level, with Newton's view >of the universe. But that's not what led Einstein to push the envelope >further. Physics was in crisis in the 19th century, due to results like >the Michelson-Morley experiment. That's more data. > >It's data that Einstein had and Newton did not. Except Newton DID have data that told him he was wrong. And he spent pretty much the rest of his life trying to work out why his theory didn't work completely.
Fundamental to Newtonian Mechanics is the conservation of mass - it cannot be created or destroyed. To Newton, this seemed obvious. To us, well, we know he got it wrong - we know the rule is that mass-energy is conserved, and that mass CAN be created and destroyed.
Mercury's orbit is relativistic, not classical. Try as he might, Newton just could not get his calculations and Tycho's data to agree.
Einstein just had a couple of insights that Newton didn't, due quite likely as you say to Michelson-Morley amongst other things. More data always does make life easier :-) and the data he had led him to suspect that the law of conservation of mass might actually be wrong ... the rest as they say is history ... (there's a nice story of the same sort of thing happening to Dick Feynman :-)
Cheers, Wol
Laconic2 - 16 May 2004 02:34 GMT Try as I might, I cannot find confirmation of your extraordinary assertion that the relativistic precession of Mercury was observable in Tycho's data. I don't think you are right about this.
Keep in mind that the vast majority of Mercury's precession is explainable, in classical Newtonian mechanics, by the gravitational attraction of the other planets.
In the timelines I've seen, the observation of 35 arcseconds per century of excess precession of Mercury was attributed to an observation in 1845 by Leverrier. It was further corrected to an excess of 43 arcseconds per century by Newcomb in 1882. Before Einstein, the excess precession of Mercury was attributed to a hitherto unknown (and, it turns out, nonexistent) planet inside the orbit of Mercury, to which they gave the name "Vulcan". (Live long and prosper).
But the descriptions of the amount of time for which you need observations of Mercury to obtain these findings are very long. So long that I find it doubtful that Tycho could have observed for long enough for his data to detect the Einsteinian precession.
As far as Newton refining and cross checking his work, and seeking to verify or falsify it down to the last epsilon (so to speak), I find that very easy to believe. In fact, his own assessment of his work is that he felt like a little child, playing with the shells on the seashore, while the vast ocean of truth lay undiscovered before him. And Einstein, when asked to comment on Newton's work, said that his own work would have been impossible without Newton's earlier work.
Those people in this forum who seem to have every human gift except humility might do well to learn from such people as Newton and Einstein.
Anthony W. Youngman - 20 May 2004 00:42 GMT >Try as I might, I cannot find confirmation of your extraordinary assertion >that the relativisitc precession of mercury was observable in Tycho's data. [quoted text clipped - 8 lines] >Leverrier. It was further corrected to an excess of 43 arcseconds per >century by Newcomb in 1882. I did a "google" on "mercury orbit newton relativity", and it gave me a load of good pages. About the first one I looked at (the third or so it found) gave me rather bigger figures than yours for precession (although it did have a few problems...)
>Before Einstein, the excess precession of Mercury was attributed to a >hitherto unknown (and, it turns out nonexistent) planet inside the orbit of >mercury, to which they gave the name "Vulcan". (Live long and prosper). And apparently half the excess precession found by Newton was due to relativity ...
>But the descriptions of the amount of time for which you need observations >of Mercury to obtain these findings are very long. So long that I find it >doubtful that Tycho could have observed for long enough for his data to >detect the Einsteinian precession. How long? Don't forget. Tycho STARTED these observations in about 1550. Newton was around about 1750. So he actually had about 200 years worth of data to play with.
>As far as Newton refining and cross checking his work, and seeking to verify >or falsify it down to the last epsilon (so to speak), I find that very easy [quoted text clipped - 6 lines] >Those people in this forum who seem to have every human gift except >humility might do well to learn from such people as Newton and Einstein. And modern man would do well to learn humility from the ancients. 1 arcsecond is easily detected today, I would think. And it wouldn't surprise me if Newton had access to some pretty accurate instruments too - why shouldn't he be able to resolve with that sort of accuracy too? With two centuries of data, that makes well over an arcminute due to relativity alone. We know he could detect that sort of accuracy, because he was trying to explain it!
(The website I looked at said the precession was more like 540 arcseconds a year, but it also said there were 360 arcseconds in a degree, so I think it has mislaid a few powers of ten somewhere :-)
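The "well over an arcminute" figure can be checked with a few lines of arithmetic. The sketch assumes the 43 arcseconds per century of excess precession quoted earlier in the thread and the two centuries of observations claimed above; the variable names are illustrative.

```python
ARCSEC_PER_ARCMIN = 60
ARCSEC_PER_DEGREE = 60 * 60  # 60 arcminutes of 60 arcseconds each

excess_arcsec_per_century = 43  # figure quoted earlier in the thread
centuries_of_data = 2           # span of observations assumed in the post

total_arcsec = excess_arcsec_per_century * centuries_of_data
total_arcmin = total_arcsec / ARCSEC_PER_ARCMIN
total_degrees = total_arcsec / ARCSEC_PER_DEGREE

print(f"{total_arcsec} arcsec = {total_arcmin:.2f} arcmin "
      f"= {total_degrees:.4f} degrees")
```

That is 86 arcseconds, a little under one and a half arcminutes, accumulated over the whole period.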
Cheers, Wol
Laconic2 - 20 May 2004 16:02 GMT > I did a "google" on "mercury orbit newton relativity", and it gave me a > load of good pages. About the first one I looked at (the third or so it > found) gave me rather bigger figures than yours for precession (although > it did have a few problems...) Yes. The figures are considerably bigger because the precession of Mercury is, for the most part, due to attraction from the other planets. The only figures I quoted were the "excess" (that is, non-Newtonian) observed precession of Mercury.
> (The website I looked at said the precession was more like 540 > arcseconds a year, but it also said there were 360 arcseconds in a > degree, so I think it has mislaid a few powers of ten somewhere :-) There are 3600 arcseconds in a degree.
As far as the connection to this forum goes, I think the discussion in here reminds me more of the discussions between Simplicio, Salviati, and Sagredo in Galileo's writings.
I can just hear Simplicio saying something like:
<quote> Aristotle's axioms are self evident, and his logic is irrefutable. Therefore his conclusions are correct.
Therefore, if you report experimental observations that contradict his conclusions, then you are either lying or you have been misled by your infatuation with experimental observation.
If you had the proper respect for your betters you would restrain yourself from making such rash claims, in contradiction of the wisdom of the ancients. And if you had proper training in philosophical thinking, you would be able to confirm Aristotle's work for yourself, instead of all this nonsense about taking cannonballs up to the top of a tower and dropping them.
</quote>
Anthony W. Youngman - 20 May 2004 22:36 GMT >> (The website I looked at said the precession was more like 540 >> arcseconds a year, but it also said there were 360 arcseconds in a >> degree, so I think it has mislaid a few powers of ten somewhere :-) > >There are 3600 arcseconds in a degree. Yup :-) 60 seconds times 60 minutes = 3600 seconds in a degree :-) I knew that.
>As far as the connection to this forum goes, I think the discussion in here >reminds me more of the discussions between [quoted text clipped - 18 lines] > ></quote> PERFECT! That's a quote I would have loved to have had available to me earlier :-)
Thanks.
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk The society which scorns excellence in plumbing because plumbing is a humble activity, and tolerates shoddiness in philosophy because it is an exalted activity, will have neither good plumbing nor good philosophy. Neither its pipes nor its theories will hold water. John W Gardner
Laconic2 - 21 May 2004 14:30 GMT > PERFECT! That's a quote I would have loved to have had available to me > earlier :-) The thing is, I'm not satisfied with the arguments of either the pro relational camp or their challengers in this forum.
There's clearly a lot of intelligence and erudition in here, but it seems to be savagely misused, on both sides of the argument.
I've used the power of relational joins, ever since I was first exposed to the concept. And my first use involved nothing more sophisticated than Datatrieve and indexed files on a VAX. And the theorists in this forum who dismiss that as "not relational" have a fundamental synapse missing with regard to the connection between theory and pragma.
I've never used PICK, but from what I've read in here, if one were to study the reason why certain PICK applications were (and possibly still are) successful, and do the same for Datatrieve, one would find a surprising overlap. And I think data models would play a minor role in both studies. I'd love to see some rational discussion of that. But we'd have to get away from some of the cultural norms of this forum.
For me, the migration from Datatrieve to VAX Rdb/VMS, and later from that to Oracle were pretty natural. While I find much to criticize about SQL, it's far, far better than the access languages that grew up around CODASYL databases! If a better language can be designed, implemented and adopted, I'm all for that! But don't expect me to wait!
Anthony W. Youngman - 21 May 2004 23:37 GMT >> PERFECT! That's a quote I would have loved to have had available to me >> earlier :-) [quoted text clipped - 10 lines] >dismiss that as "not relational" have a fundamental synapse missing with >regard to the connection between theory and pragma. Well, think of a join, and then think of that join being along a "cascading delete" link - ie the linked table is an attribute of the master table.
In Pick, that join wouldn't be necessary, the linked table would logically and physically be part of the master table ...
And if you haven't got a cascading delete, chances are you either have a code lookup; or you're only interested in viewing fields in one table, but need the other table for certain SELECT fields. In the former case you declare a virtual field in Pick, and in the latter it may require (slightly) more effort on the part of the programmer, but a lot less effort on the part of the database...
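The contrast between a join along a "cascading delete" link and a nested record can be sketched with Python's built-in sqlite3 module. Everything named here is hypothetical (the invoice and invoice_line tables, their columns, the sample rows), and the Python dict merely stands in for a Pick-style record with an embedded list; it is not Pick syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE invoice_line (
        invoice_id INTEGER REFERENCES invoice(id) ON DELETE CASCADE,
        item TEXT,
        qty INTEGER
    );
    INSERT INTO invoice VALUES (1, 'Acme');
    INSERT INTO invoice_line VALUES (1, 'widget', 3), (1, 'gadget', 1);
""")

# Relational view: the lines live in their own table and are joined on demand.
rows = conn.execute("""
    SELECT i.customer, l.item, l.qty
    FROM invoice AS i
    JOIN invoice_line AS l ON l.invoice_id = i.id
    ORDER BY l.item
""").fetchall()

# Nested view: the same facts with the lines stored inside the master record.
nested = {
    "id": 1,
    "customer": rows[0][0],
    "lines": [{"item": item, "qty": qty} for _, item, qty in rows],
}

# The cascade is what marks the lines as "part of" the invoice:
# deleting the master row removes them too.
conn.execute("DELETE FROM invoice WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]

print(rows)
print(nested)
print(remaining)
```

The two forms carry the same information in this small case; what differs is where the work is done, in a join at query time or in the shape of the stored record.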
Cheers, Wol
Jonathan Leffler - 15 May 2004 01:30 GMT > In message <2gkdtnF3saspU1@uni-berlin.de>, Alan <alan@erols.com> writes > [quoted text clipped - 10 lines] > > By these standards, "data" is woefully vague and undefined. The recording of a mass, or of the energy of an object, or the time at which an event was perceived to occur, or any of a myriad other things, could be data. So, data encompasses all of the things you mention and many other things too.
> And it's not > even atomic! Within the theory it's chopped up into tuples, which are > themselves chopped up into (I'm not into terminology here) keys, > attributes, relations, and probably other stuff besides. Hmmm - that's an odd set of comments. Data generally are atomic facts. The way I'd view it is that a tuple is composed of individual pieces of data. And tuples are certainly not chopped up into relations. Attributes within a tuple contain 'atomic facts' (though when you get into sub-structures such as relation-valued attributes, the definition of atomic is more complex). Keys are properties of relations, etc. My goodness me, that paragraph is so confused as to be close to meaningless - and what meaning there is is almost all completely antithetical to the theory behind an RDBMS, which is what the subject of the thread is discussing.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
mountain man - 15 May 2004 17:43 GMT ...[trim sci.physics thread] ...
> Okay. So what is "data". Because if we can't anchor that in the real > world, we have no way of knowing if, or how strongly, relational theory > is relevant (and usable) in the real world. Relational theory is useful and relevant. For the people who are database academics, database technical, indeed anything database-centric, the theory is generally all they need and require to do what they do (within E2 below)
There are 3 software environments:
E1 = Operating system and network os layer
E2 = RDBMS layer
E3 = application layer
The "data" is bound within E2, and although it is operated on within E2 (hopefully in accordance with the RM), the ultimate control over these operations comes from the end user within an organisation, via the app layer (E3). (GIGO)
However in the real world the data within the RDBMS is in fact owned by an organisation, not by the RDBMS vendor, nor the application vendor/developer, nor the RM, and in reality only has meaningful context for that organisation, at that instant in space & time. (production data backup)
[An aside: now I can see where the physics thread may have become self-emergent ;-]
The RM does not reflect the actuality of the above, nor make any provision for the management of the E3 layer because it is not yet completely evolved.
The catch-cry "the RM is just as applicable to database systems today, as it was in the early 1980's" should be taken as an indication that something is wrong with it as a pedagogic device for 2004.
The reason for this is that E2 and E3 have changed a lot since 1980, particularly E2, the RDBMS software. Due to the emergence of addressable stored procedures in the RDBMS, there has been an effective "migration" of intelligence (code) from E3 to E2.
The boundaries between E2 and E3 are now probably best described as fractal, whereas in the past they were heavily demarcated.
Back to your question on the "data". It is physically anchored by a backup, and theoretically anchored by the database schema, constraints, etc. from the perspective of the (incomplete) RM.
However in practice, it is a dynamic fluid element that must be managed, with the assistance of, but also outside the realm of the present applicability of the RM.
Change management is the name given to the bag that holds together everything that falls through the cracks of theory and out into the world of practice.
Pete Brown Falls Creek Oz
Alfredo Novoa - 16 May 2004 12:56 GMT >There are 3 software environments: >E1 = Operating system and network os layer >E2 = RDBMS layer >E3 = application layer
>The RM does not reflect the actuality of the above, nor >make any provision for the management of the E3 layer >because it is not yet completely evolved. No, the application layer is what must be adapted to the RM and not the contrary. What is not evolved is the application layer.
>The catch-cry "the RM is just as applicable to database >systems today, as it was in the early 1980's" should be >taken as an indication that something is wrong with it as >a pedagogic device for 2004. There are many things wrong in the application layer. For instance the application programming languages.
>The reason for this is that E2 and E3 have changed alot >since 1980, particularly E2, the RDBMS software. Due >to the emergence of addressable stored procedures in >the RDBMS But complete RDBMS's still don't exist.
>, there has been an effective "migration" of >intelligence (code) from E3 to E2. But not enough, and in the last years we are seeing a regression. A migration of business logic from SQL DBMS's to the crappy "Application Servers".
Regards Alfredo
mountain man - 17 May 2004 14:56 GMT > >There are 3 software environments: > >E1 = Operating system and network os layer [quoted text clipped - 7 lines] > No, the application layer is what must be adapted to the RM and not > the contrary. What is not evolved is the application layer. Demonstrated here is the entire application layer contained in the RDBMS software. Zero apps on clients: http://www.mountainman.com.au/software/southwind
This uses stored procedures, which are DBMS objects. These objects have functional relationships to the data structures and the data structures have an evolving structure via the objects. All is heavily inter-related and unified within the database system.
But the RM in its present state cannot reference this other-side-of-the-coin object data. It should be able to in the future, perhaps.
> >The catch-cry "the RM is just as applicable to database > >systems today, as it was in the early 1980's" should be [quoted text clipped - 10 lines] > > But complete RDBMS's still don't exist. Machines using the basic "un-blessed" principles of the RM have only been around for 25 years. These are good enough for me, because they (especially the more recent ones) do actually incorporate *much* of the basics of the RM.
> >, there has been an effective "migration" of > >intelligence (code) from E3 to E2. > > But not enough, Then you do agree that there exists (object) "data" within the SQL DBMS's that is unable to be referenced by the relational model of "data"?
> and in the last years we are seeing a regression. A > migration of business logic from SQL DBMS's to the crappy "Application > Servers". What do you think are the major elements behind this migration to these (I actually agree with you here) crappy "Apps boxes"? I used to suspect they were "caused by bad apps".
Pete Brown Falls Creek Oz
Alfredo Novoa - 18 May 2004 13:32 GMT >Machines using the basic "un-blessed" principles of the RM >have only been around for 25 years. These are good enough >for me, because they (especially the more recent ones) do >actually incorporate *much* of the basics of the RM. A truly RDBMS would be a lot better. Most of the everyday problems of the database programmers are due to the flaws of the current DBMSs.
>> >, there has been an effective "migration" of >> >intelligence (code) from E3 to E2. [quoted text clipped - 4 lines] >within the SQL DBMS's that is unable to be referenced >by the relational model of "data"? No, I mean that most people do not know how to take advantage of the little that SQL DBMS's offer.
> and in the last years we are seeing a regression. A >> migration of business logic from SQL DBMS's to the crappy "Application [quoted text clipped - 4 lines] >"Apps boxes"? I used to suspect they were "caused by bad >apps". The key elements are ignorance and the flaws of SQL DBMS's
Regards Alfredo
mountain man - 19 May 2004 09:41 GMT > >Machines using the basic "un-blessed" principles of the RM > >have only been around for 25 years. These are good enough > >for me, because they (especially the more recent ones) do > >actually incorporate *much* of the basics of the RM. > > A truly RDBMS would be a lot better. Well where is it?
> Most of the everyday problems of > the database programmers are due to the flaws of the current DBMSs. Not if you program in SQL from within the RDBMS.
> >> >, there has been an effective "migration" of > >> >intelligence (code) from E3 to E2. [quoted text clipped - 7 lines] > No, I mean that most people does not know how to take advantage on the > few that SQL DBMS's offer. Well, that may certainly be true, but does not relate to the applicability, or in this instance, the ineffectiveness of the current RM to address this (object) data.
> > and in the last years we are seeing a regression. A > >> migration of business logic from SQL DBMS's to the crappy "Application [quoted text clipped - 6 lines] > > The key elements are ignorance and the flaws of SQL DBMS's Either way, application servers are (usually) a step backwards. My focus is building suites of application system components as SQL stored procedures within the (R)DBMS to the extent that there exists zero components external to the (R)DBMS.
The modern (R)DBMS environment is capable of "internalising" the entire applications environment.
Pete Brown Falls Creek Oz
Alfredo Novoa - 19 May 2004 12:14 GMT >> A truly RDBMS would be a lot better. > >Well where is it? I hope it is in the near future.
>> Most of the everyday problems of >> the database programmers are due to the flaws of the current DBMSs. > >Not if you program in SQL from within the RDBMS. You suffer the problems especially if you program in SQL from within the SQL DBMS.
See Date's writings about the SQL flaws.
>Well, that may certainly be true, but does not relate >to the applicability, or in this instance, the ineffectiveness >of the current RM to address this (object) data. The RM supports objects. See The Third Manifesto.
>> The key elements are ignorance and the flaws of SQL DBMS's > >Either way, application servers are (usually) a step backwards. Agreed. They are network DBMS's without a storage engine.
>My focus is building suites of application system components >as SQL stored procedures within the (R)DBMS to the extent >that there exists zero components external to the (R)DBMS. And what is the problem with The Relational Model?
>The modern (R)DBMS environment is capable of >"internalising" the entire applications environment. And the future TRDBMS's will do it a lot better.
Regards Alfredo
mountain man - 19 May 2004 14:04 GMT "Alfredo Novoa" <alfredo@ncs.es> wrote :
...[trim]...
> >My focus is building suites of application system components > >as SQL stored procedures within the (R)DBMS to the extent > >that there exists zero components external to the (R)DBMS. > > And what is the problem with The Relational Model? It has a Godel-like incompleteness: http://www.mountainman.com.au/software/history/relational_model_incomplete.htm
Pete Brown Falls Creek Oz
Todd B - 19 May 2004 22:37 GMT > "Alfredo Novoa" <alfredo@ncs.es> wrote : > > > > And what is the problem with The Relational Model? > > It has a Godel-like incompleteness: > http://www.mountainman.com.au/software/history/relational_model_incomplete.htm I'm no mathematician, but didn't Godel prove that 'any' formal system is incomplete?
Also, the interpretation in the 'real' world of the symbols of any formal system seems to be pretty much up in the air.
Todd
mountain man - 20 May 2004 03:11 GMT > > "Alfredo Novoa" <alfredo@ncs.es> wrote : > > > > > > And what is the problem with The Relational Model? > > > > It has a Godel-like incompleteness: http://www.mountainman.com.au/software/history/relational_model_incomplete.htm
> I'm no mathematician, but didn't Godel prove that 'any' formal system > is incomplete? Yes, he did. But I am being specific about provision of one specific instance in which the incompleteness of the RM is comprehensible.
> Also, the interpretation in the 'real' world of the symbols of any > formal system seems to be pretty much up in the air. In a database, relational or otherwise, the interpretations are usually sorted out in advance with respect to the data elements. They are interpreted with respect to the organisation (IMO).
Pete Brown Falls Creek Oz
Paul - 20 May 2004 10:37 GMT >>>>And what is the problem with The Relational Model? >>> >>>It has a Godel-like incompleteness: > > http://www.mountainman.com.au/software/history/relational_model_incomplete.htm I don't quite understand what you mean here. Even if you think that relational theory is missing something, I don't think it is a "Godel-like" incompleteness.
>>I'm no mathematician, but didn't Godel prove that 'any' formal system >>is incomplete? > > Yes, he did. But I am being specific about provision of one specific > instance in which the incompleness of the RM is comprehendable. Well, Godel actually proved that first-order predicate logic (upon which the relational model is based) is complete in some sense. The Incompleteness theorem only applies to theories that are above a certain complexity. To add to the confusion, there are slightly different meanings of "complete" here. See this page for more details: http://www.sm.luth.se/~torkel/eget/godel/completeness.html
I think essentially the difference is that you need to use logic to show that some other theories are incomplete, but to show the completeness of logic itself you've got a bit of a self-referential paradox. I could be completely wrong here though. Very interesting though.
Paul.
Anthony W. Youngman - 20 May 2004 22:39 GMT >>>>>And what is the problem with The Relational Model? >>>> [quoted text clipped - 19 lines] >different meanings of "complete" here. See this page for more details: >http://www.sm.luth.se/~torkel/eget/godel/completeness.html And isn't there something about if they are complete, then they also have to be simplistic (and therefore cannot be real-world accurate)?
>I think essentially the difference is that you need to use logic to >show that some other theories are incomplete, but to show the >completeness of logic itself you've got a bit of a self-referential >paradox. I could be completely wrong here though. Very interesting though. Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Christopher Browne - 20 May 2004 23:52 GMT Quoth "Anthony W. Youngman" <wol@thewolery.demon.co.uk>:
> And isn't there something about if they are complete, then they also > have to be simplistic (and therefore cannot be real-world accurate)? No, there isn't. You're just making that up because it would be convenient to your position.
 Signature (reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc")) http://cbbrowne.com/info/finances.html Rules of the Evil Overlord #22. "No matter how tempted I am with the prospect of unlimited power, I will not consume any energy field bigger than my head. <http://www.eviloverlord.com/>
Todd B - 20 May 2004 23:05 GMT > >>>>And what is the problem with The Relational Model? > >>> [quoted text clipped - 5 lines] > relational theory is missing something, I don't think it is a > "Godel-like" incompleteness. I'm not entirely certain, but it seems to me that any logic model that is consistent (i.e. theorems derived from the axioms do not contradict the axioms or other theorems so derived) will be unable to find certain truths within the system. And that seems to be Godel's sword in the stone (you know, he's actually not the first to come up with the idea, but the first to apply it to number theory). In other words, pretty much everything is Godel-like, unless you adapt an informal system, but then when you do that, you lose the power of logic altogether.
> >>I'm no mathematician, but didn't Godel prove that 'any' formal system > >>is incomplete? [quoted text clipped - 8 lines] > meanings of "complete" here. See this page for more details: > http://www.sm.luth.se/~torkel/eget/godel/completeness.html Good short article that touches on some key points of the theorem and its implications. But seriously, I'm a bit over my head here, since my only source on Godel is the book "Godel, Escher, Bach: a Golden Braid". I haven't read the actual Incompleteness proof.
Todd
mountain man - 27 May 2004 04:11 GMT > > >>>>And what is the problem with The Relational Model? > > >>> > > >>>It has a Godel-like incompleteness: http://www.mountainman.com.au/software/history/relational_model_incomplete.htm
> > I don't quite understand what you mean here. Even if you think that > > relational theory is missing something, I don't think it is a [quoted text clipped - 9 lines] > informal system, but then when you do that, you lose the power of > logic altogether. Not necessarily. Deduction goes out the window, true, but inference is still as valid as ever. The measure of the power of inference over the power of deduction is a tricky subject area, for sure.
...[trim]...
Pete Brown Falls Creek Oz
Paul - 27 May 2004 10:42 GMT >>I'm not entirely certain, but it seems to me that any logic model that >>is consistent (i.e. theorems derived from the axioms do not contradict [quoted text clipped - 10 lines] > power of inference over the power of deduction is a > tricky subject area, for sure. What's the difference between inference and deduction? Are they not the same thing?
Paul.
Tony - 20 May 2004 14:11 GMT > > "mountain man" <hobbit@southern_seaweed.com.op> wrote in message > news:<LXIqc.47648$TT.3115@news-server.bigpond.net.au>... [quoted text clipped - 12 lines] > instance > in which the incompleness of the RM is comprehendable. You may consider that the RM is incomplete, but it is NOT a "Godel-like" incompleteness: you are just attaching a fancy-sounding but irrelevant label to your claim. It is like describing any kind of uncertainty as "Heisenberg-like" or any kind of cat as "Schrodinger-like"!
Eric Kaun - 19 May 2004 22:20 GMT > There are 3 software environments: > E1 = Operating system and network os layer [quoted text clipped - 11 lines] > make any provision for the management of the E3 layer > because it is not yet completely evolved. I disagree, although your use of the term "management" is questionable. E1 provides services for E2; E2 does an analogous thing for E3. E2 provides E3 with data (whatever it means) and inferences about that data; that's its job.
> The catch-cry "the RM is just as applicable to database > systems today, as it was in the early 1980's" should be > taken as an indication that something is wrong with it as > a pedagogic device for 2004. It's more than a pedagogic device, but it's at least that.
> The reason for this is that E2 and E3 have changed alot > since 1980, particularly E2, the RDBMS software. Due > to the emergence of addressable stored procedures in > the RDBMS, there has been an effective "migration" of > intelligence (code) from E3 to E2. You contradict yourself here, as the migration doesn't mean that the previous services supplied by E2 are any more or less different. The code is running in a different place. Date's Intro to DB Systems book discusses various "levels" of the DB schemata - I forget the exact terms he uses, and I don't have the book here. So yes, there is a distinction between "shared" and "app-specific" components, whether they're running in the DBMS or not.
But that doesn't alter the fact that the code "objects" (E2b) in the DBMS are different from the relational "objects" (E2a) in the DBMS. There's still a logical separation, with E2b relying on E2a just like E2 relies on E1, and E3 on E2. Capiche?
I'm still curious what sorts of concepts (other than the vacuous term "management") you think the relational model should include for such things.
> The boundaries between E2 and E3 are now probably > best described as fractal, whereas in the past they were > heavily demarked. They're hardly fractal, and I would say that the layering is more severe in the E2b category. E2a remains solidly relational.
> However in practice, it is a dynamic fluid element that > must be managed, with the assistance of, but also outside [quoted text clipped - 3 lines] > together everything that falls through the cracks of theory > and out into the world of practice. I think I'm starting to agree with you on this, but still don't think that's the province of relational - at least not yet. I would like to see discussions on a standard system catalog - in essence, relational statements about relvars! Since the catalog is a set of relvars like the others, yet describes those others, we now have true relational metadata, an interesting topic...
- erk
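As an aside on the point above about a catalog that is itself queryable like any other table, here is a minimal sketch using Python's standard `sqlite3` module. SQLite is an SQL DBMS, not a true relational one, and its `sqlite_master` catalog is specific to SQLite rather than any standard; the `supplier` and `part` tables and the `catalog_tables` helper are hypothetical names chosen only for illustration.

```python
import sqlite3

def catalog_tables(conn):
    """Return the names of user tables by querying SQLite's own catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]

# Hypothetical schema, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (sno TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE part (pno TEXT PRIMARY KEY, colour TEXT)")

# The catalog is read with the same SELECT machinery as the data it describes.
tables = catalog_tables(conn)
print(tables)
```

The point of the sketch is only that statements about tables can be retrieved with the same query language as statements in tables; it says nothing about how a standard relational catalog ought to be designed.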
Dawn M. Wolthuis - 20 May 2004 01:20 GMT <snip>
> Change management is the name given to the bag that holds > > together everything that falls through the cracks of theory > > and out into the world of practice. I think we agree on the problem being a rather narrow definition of databases or a vertical partitioning of the software development problem that isn't necessarily the best. Your solution to place everything "in the database" (if I understand you correctly) is fine from my perspective, but not the one I lean toward naturally -- I'd prefer an OO language to a declarative one, particularly a vendor-specific declarative language that is difficult to convert from one db to another.
> I think I'm starting to agree with you on this, but still don't think that's > the province of relational - at least not yet. I would like to see > discussions on a standard system catalog - in essence, relational statements > about relvars! Since the catalog is a set of relvars like the others, yet > describes those others, we now have true relational metadata, an interesting > topic... Agreed that the system catalog would be a good place to make some industry gains. For metadata such as "keywords" I'd think that employing those nested relations might give SQL-DBMS's or RDBMS's a place to start getting their operators shaped up for relation-valued attributes (or whatever one wants to call them) -- just a thought. --dawn
Eric Kaun - 21 May 2004 14:42 GMT > <snip> > > Change management is the name given to the bag that holds [quoted text clipped - 4 lines] > databases or a vertical partitioning of the software development problem > that isn't necessarily the best. Yes, I think so - my preference is to push things that have to work together from a single source, via a template-driven code generation approach (in lieu of higher-level languages, though code generation amounts to the same).
> Your solution to place everything "in the > database" (if I understand you correctly) is fine from my perspective, but > not the one I lean toward naturally -- I'd prefer an OO language to a > declarative one, Why not both? Something like Tutorial D, which offers both OO capabilities for domains, plus relations? In any event, we'll have to agree to disagree on the value of declarative... I'll simply point to the explosion of Java frameworks (Struts, Avalon, blah blah blah) and even language extensions (e.g. AspectJ). The "config files" are in most cases declarative statements of system structure, and in the case of aspects, are "cross-cutting" concerns which indicate the failings of the OO language (and I'd argue the OO paradigm itself). Languages like Lisp and Haskell don't require such "cross-cutting" because the languages themselves support abstractions orthogonal to objects. More than OO is needed, I think.
> particularly a vendor-specific declarative language that is > difficult to convert from one db to another. Couldn't agree more - a vendor-specific language is worthless. Witness even the various proprietary "extensions" to SQL...
> Agreed that the system catalog would be a good place to make some industry > gains. For metadata such as "keywords" I'd think that employing those > nested relations might give SQL-DBMS's or RDBMS's a place to start getting > their operators shaped up for relation-valued attributes (or whatever one > wants to call them) -- just a thought. --dawn Interesting - certainly relation-valued attributes would require decent relational operations. Or at least one would think. Industry might always, in its infinite wisdom, decide it knows better.
- erk
mountain man - 20 May 2004 03:11 GMT > > There are 3 software environments: > > E1 = Operating system and network os layer [quoted text clipped - 16 lines] > with data (whatever it means) and inferences about that data; that's its > job. The term management reflects the mandatory overview of all components in the system, and their coordination. As I have outlined, I have constructed an arrangment whereby all of E3 has been subsumed in the form of stored procedures, within the E2 environment.
The Relational Model and theory cannot distinguish this specific arrangment from any other, because it disregards E3 (application layer) because of its traditional frame of reference, which in historical terms is understandable, but is not so important in today's world.
This specific arrangement developed however is complete, and requires no other support to function. So you see, we may have a large number of stored procedures which act as E3 components, each written is SQL, each syntactically as per Date's exemplary treatment, each relating precisely and specifically to known data structures defined in the RM.
Yet the model can say nothing in its present state. This is an absurd state of affairs for database systems managment.
> > The catch-cry "the RM is just as applicable to database > > systems today, as it was in the early 1980's" should be > > taken as an indication that something is wrong with it as > > a pedagogic device for 2004. > > It's more than a pedagogic device, but it's at least that. It is an incomplete device.
> > The reason for this is that E2 and E3 have changed alot > > since 1980, particularly E2, the RDBMS software. Due [quoted text clipped - 8 lines] > I don't have the book here. So yes, there is a distinction between "shared" > and "app-specific" components, whether they're running in the DBMS or not. Date has one diagram and a few words to say about the apps environment. Sure, in my argument, as you note, the code is running in a different place. But which place? Inside the RDBMS software environment?
Date and the RM are not capable of uttering anything sensible about this state of affairs. The RM cannot address stored procedure object data, end of story.
> But that doesn't alter the fact that the code "objects" (E2b) in the DBMS > are different from the relational "objects" (E2a) in the DBMS. There's still [quoted text clipped - 3 lines] > I'm still curious what sorts of concepts (other than the vacuous term > "management") you think the relational model should include for such things. I understand the inter-dependencies, but you seem not to understand the term management. This term means the ability, or lack thereof, to properly look after everything in that environment.
> > The boundaries between E2 and E3 are now probably > > best described as fractal, whereas in the past they were > > heavily demarked. > > They're hardly fractal, and I would say that the layering is more severe in > the E2b category. E2a remains solidly relational. Look up the term "fractal basin boundary".
> > However in practice, it is a dynamic fluid element that > > must be managed, with the assistance of, but also outside [quoted text clipped - 10 lines] > describes those others, we now have true relational metadata, an interesting > topic... The relational model was an ideal of Codd and the pioneers that has been promulgated by Date et al. It is at least 30 y/o and is not consistent with technological reality.
It has a Godel-like incompleteness: http://www.mountainman.com.au/software/history/relational_model_incomplete.htm
Pete Brown Falls Creek Oz
Eric Kaun - 21 May 2004 15:00 GMT > > I disagree, although your use of the term "management" is questionable. E1 > > provides services for E2; E2 does an analogous thing for E3. E2 provides [quoted text clipped - 4 lines] > The term management reflects the mandatory overview of all > components in the system, and their coordination. The term "management" has at least the same "Goedel-like" incompleteness that you refer to elsewhere, unless you really believe "overview" and "coordination" are precise. My point is simply that I don't know which aspects of "management" you're referring to - it has many definitions, components, and dimensions. Be precise.
However, I do have your papers printed out, and will be reading them shortly - so far I'm only judging by what you've written in these posts.
> As I have > outlined, I have constructed an arrangment whereby all of E3 > has been subsumed in the form of stored procedures, within the > E2 environment. The fact that they're executing in E2 doesn't imply "subsumption." They are still "objects" of a very different sort than those E2 traditionally "manages." For instance, those stored procedures could be written in arbitrary languages, and executed anywhere. It seems you see their value in their genericity, rather than in where they happen to execute.
> The Relational Model and theory cannot distinguish this specific > arrangment from any other, because it disregards E3 (application > layer) because of its traditional frame of reference, which in > historical terms is understandable, but is not so important in > today's world. So why does E1 disregard E2? Don't you think that's short-sighted and incomplete? How would you rectify that? Shouldn't they all be subsumed into E* ("*" as in transitive closure) ?
> This specific arrangement developed however is complete, In a Goedelian sense? Doubtful.
> and requires no other support to function. Not even E1?
> So you see, we > may have a large number of stored procedures which act > as E3 components, each written is SQL, each syntactically > as per Date's exemplary treatment, each relating precisely > and specifically to known data structures defined in the RM. That's no different than any other E3 component, is it? Like a Java program doing JDBC and issuing SQL Strings in exchange for ResultSets?
> Yet the model can say nothing in its present state. This is an > absurd state of affairs for database systems managment. So once E3 components are "subsumed" in E2, what more can be said about them? Specifically, what special properties or powers or whatever do they derive from executing in E2 rather than being regarded as part of E3?
> Date has one diagram and a few words to say about the apps environment. > Sure, in my argument, as you note, the code is running in a different place. > But which place? Inside the RDBMS software environment? Could be - why does it matter? They're still code components, not relvars. What difference does it make where they run? I agree with it, don't get me wrong - but I think then you just happen to have E3 components running in E2, which indicates to me that your operational levels are perhaps orthogonal to the real issue of what "types" of components are being "managed".
> Date and the RM are not capable of uttering anything sensible about > this state of affairs. What should they say? In contrast to their muteness, say something - anything. I have no idea what you're looking to be said.
> The RM cannot address stored procedure object data, end of story. So say something, even informally, that should be part of some "RM++" theory. I have no idea what you're hinting at - this argument reminds me of internal auditors at a former company, who could point out things done incorrectly but were not allowed to say a word about what SHOULD be done instead.
> > I'm still curious what sorts of concepts (other than the vacuous term > > "management") you think the relational model should include for such > things. > > I understand the inter-dependencies, but you seem not to understand > the term management. Perhaps, but you're not helping. If you understand it thoroughly, then your words aren't helping the rest of us... granted that this isn't a "management" class, but some clarification would help.
> This term means the ability, or lack thereof, to > properly look after everything in that environment. "Properly look after"? That's clarification?
> > > However in practice, it is a dynamic fluid element that > > > must be managed, with the assistance of, but also outside [quoted text clipped - 17 lines] > has been promulgated by Date et al. It is at least 30 y/o and is > not consistent with technological reality.
> It has a Godel-like incompleteness: http://www.mountainman.com.au/software/history/relational_model_incomplete.htm
And your theory is Goedel-complete? Doubtful. Let's agree to stop waving Goedel and Occam about, and concentrate on specific areas of incompleteness that matter in both theory and practice...
- erk
Todd B - 21 May 2004 18:52 GMT > And your theory is Goedel-complete? Doubtful. Let's agree to stop waving > Goedel and Occam about, and concentrate on specific areas of incompleteness > that matter in both theory and practice... > > - erk Well said.
In a way, however, Godel's theorem is pertinent because it touches on the fact that a database, no matter what its design is or underlying structure is, will 'definitely' not be able to answer every question we want to ask it. Not that I'm being doomsday about logic :) I just think there can be a source of frustration in being able to answer a corporation's questions, and the culprit may not always be the database choice or database design, but may be that the question is simply unanswerable (although I have to admit, this has never actually happened to me, so take me with a grain of salt). It's something to think about, though.
Todd
Paul - 21 May 2004 20:08 GMT > In a way, however, Godel's theorem is pertinent because it touches on > the fact that a database, no matter what its design is or underlaying > structure is, will 'definitely' not be able to answer every question > we want to ask it. Are you certain this is true?
As I understand it: 1) Godel's Incompleteness theorem only applies to systems that are powerful enough to model arithmetic. 2) It's impossible to model arithmetic using only first-order logic. 3) Relational theory (which basically *is* first-order logic) is actually both complete and consistent.
I'm not a professional logician though, and I know Godel's results are very open to misinterpretation, so I could well be wrong. I guess it depends on the exact definitions of "model", "theory", "system", "logic" etc., and what exactly we mean by "complete" and "consistent".
Also, does it actually matter? Because for example suppose I'm right and relational theory is complete, there are still questions like the transitive closure which can't be answered. That's because these questions can't even be written down in first order logic so they are meaningless within the system (so the system is still complete). But they are meaningful in a "real-world" sense, because we are thinking in a larger system which includes second-order logic.
I suppose at least we would know that in theory, every query that it is possible to formulate in some given relational query language can be answered.
Paul.
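To make the transitive-closure remark above concrete, here is a minimal Python sketch. It assumes a binary relation is represented as a set of pairs; the `reports_to` data and the function names are hypothetical. Any fixed number of joins only reaches paths of a fixed length, so the closure is computed by repeating the join until nothing new appears, which is the looping step a single first-order expression cannot state.

```python
def compose(r, s):
    """Relational composition: pairs (a, d) such that (a, b) is in r and (b, d) is in s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    """Repeat the join step until a fixpoint is reached (no new pairs are added)."""
    closure = set(r)
    while True:
        bigger = closure | compose(closure, r)
        if bigger == closure:
            return closure
        closure = bigger

# Hypothetical reporting chain: ann -> bob -> cat -> dan.
reports_to = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}
closure = transitive_closure(reports_to)
print(sorted(closure))
```

SQL:1999 added recursive queries for exactly this kind of question, which is one way practice stepped outside plain first-order expressiveness.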
Todd B - 23 May 2004 23:22 GMT > > In a way, however, Godel's theorem is pertinent because it touches on > > the fact that a database, no matter what its design is or underlaying [quoted text clipped - 9 lines] > 3) Relational theory (which basically *is* first-order logic) is > actually both complete and consistent. To be honest, I don't know. I'll do some reading and certainly revisit this topic in this group (regardless of whether it bothers the other readers or not) after some good research.
> Also, does it actually matter? Because for example suppose I'm right and > relational theory is complete, there are still questions like the [quoted text clipped - 3 lines] > they are meaningful in a "real-world" sense, because we are thinking in > a larger system which includes second-order logic. Good point.
> I suppose at least we would know that in theory, every query that it is > possible to formulate in some given relational query language can be > answered. Can you give me an example of where there is proof of first order logic being complete? Keep in mind I'm sticking to the definition of complete as 'things that we prove true within the system are also true in the reality which we use the system to describe'. Is first order logic 'consistent'? Well, of course it is; it's kind of a requirement. Is it 'complete', though? I don't think so, but please prove me wrong or point me to some articles that do.
So, in summary, this last thing you say about every query being answerable is, IMO, 'incompletely' untrue :)
Perhaps there is a query that one could conjure in their head, but would be impossible to write down absolutely? I hate to do this, but I'm going to drop back into a classic example of unprovability because I'm lazy. Prove to me, without brute force methods, that the number 2481997 is prime. (Don't try, because it's not). The point, for me anyway, is that - okay maybe from a more optimistic perspective - we have the ability to come up with questions that any logic system will fall short in answering.
Todd
Bill H - 24 May 2004 17:09 GMT
Todd:
Does this pass the "reasonableness" test? The thought that: ...there are questions that can't be answered so they're meaningless and, thus, ignored (so the system is still complete) doesn't say much for consistency (i.e. anything that shows inconsistency is ignored so we still have consistency).
With postulates like these, I'm depressed about getting A's in college logic and statistics classes, as they were obviously worthless. :-)
Bill
"Todd B" <toddkennethbenson@yahoo.com> wrote in message [snipped]
> > Also, does it actually matter? Because for example suppose I'm right and
> > relational theory is complete, there are still questions like the
[quoted text clipped - 3 lines]
> > they are meaningful in a "real-world" sense, because we are thinking in
> > a larger system which includes second-order logic.
Paul - 24 May 2004 19:02 GMT
> Does this pass the "reasonableness" test? The thought that: ...there are
> questions that can't be answered so they're meaningless and, thus, ignored
> (so the system is still complete) doesn't say much for consistency (i.e.
> anything that shows inconsistency is ignored so we still have consistency).
The point is that these questions can't even be *asked* in the system.
The system can still be internally complete and consistent though.
To talk about statements in a language we always need a meta-language. It can be that questions can be posed in the meta-language that can't in the language itself.
Suppose you have a "theory", e.g. field theory, with its various axioms. Then you can have various "models" that are kind of examples of this theory, for example the real numbers, complex numbers, etc.
Now what the Completeness Theorem says is that if something is true in every model of a given theory, it will be possible to prove it in the theory itself (using first order logic). So in other words if you start from your axioms and apply first order logic to them, it's possible for you to extract every possible true statement of your theory.
So I guess the applicability of databases here is that your relations are the axioms of your "theory". Your real-world interpretations of those relations are your "models" of the theory. And the Completeness Theorem assures you that everything you expect to be true in the real world will in fact be provable by the DBMS.
For example suppose I have a database containing the tuple: (1, 2, 'blue')
There could be many interpretations of what this means: for example
"customer #1 bought 2 blue widgets"
"on day 1 of the study we saw 2 blue cars"
But in any of these models if I look for distinct values of the third argument of my predicate (i.e. project on the "third" column) I expect to get the answer ('blue').
So my query language (which is really just first order logic) is guaranteed to give the right answer when I do: "SELECT DISTINCT colour FROM r"
And this same argument holds even for more complicated queries.
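The projection example above can be tried directly. This is only a sketch under stated assumptions: it uses SQLite through Python's standard `sqlite3` module, and the column names `a` and `b` are invented here, since the post names only the relation `r` and the column `colour`.

```python
import sqlite3

# In-memory database holding the single tuple (1, 2, 'blue').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (a INTEGER, b INTEGER, colour TEXT)")  # a, b are assumed names
conn.execute("INSERT INTO r VALUES (1, 2, 'blue')")

# Project on the third column; the answer is the same whatever the
# tuple is taken to mean (widgets bought, cars seen, ...).
rows = conn.execute("SELECT DISTINCT colour FROM r").fetchall()
print(rows)  # [('blue',)]
conn.close()
```

The query never consults the interpretation, which is the point being made: the answer holds in every model of the stored tuple.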
The interesting thing is that if you go up into second-order logic, there is no corresponding completeness theorem. So you may either have things that are true in all interpretations of a theory but you can't prove them in the theory itself, or you'd have things you can prove in the theory but which aren't true in some model of the theory.
So maybe Codd was wise to stick with first-order logic!
Paul.
Anthony W. Youngman - 25 May 2004 18:12 GMT
>Suppose you have a "theory", e.g. field theory, with its various
>axioms. Then you can have various "models" that are kind of examples of
[quoted text clipped - 11 lines]
>Theorem assures you that everything you expect to be true in the real
>world will in fact be provable by the DBMS.
And if they turn out to be false in the real world and provable in the DBMS, then the DBMS theory is wrong ... (or the DBMS predicts something is false when it turns out to be true ...)
Or if you can't prove it in the DBMS, then the theory is incomplete ...
Cheers, Wol
Paul - 26 May 2004 10:51 GMT
>> So I guess the applicability of databases here is that your relations
>> are the axioms of your "theory". Your real-world interpretations of
[quoted text clipped - 5 lines]
> DBMS, then the DBMS theory is wrong ... (or the DBMS predicts something
> is false when it turns out to be true ...)
Well, the Completeness Theorem has a converse called the Soundness Theorem (http://en.wikipedia.org/wiki/Soundness_theorem), which assures us that first order logic is consistent. i.e. everything that you can prove in the DBMS is true in real life. This was known long before the Completeness Theorem I think, and is easier to prove.
> Or if you can't prove it in the DBMS, then the theory is incomplete ...
The Completeness Theorem proves the "complete" part. i.e. everything that is true in all models or interpretations of the database will be provable by the DBMS.
Note that Godel's Incompleteness Theorem is something slightly different. That's really talking about the completeness of theories that just happen to be manipulated with first order logic. The Completeness Theorem is talking about the completeness of first-order logic itself. So in the first instance you could say first order logic is being a meta-language, but in the second instance it is just being a language.
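For reference, the two theorems mentioned above are usually written as follows. This is the standard textbook notation, not anything used elsewhere in this thread: $T$ stands for a set of first-order axioms (here, the contents of the database), $\varphi$ for a sentence, $T \vdash \varphi$ for "$\varphi$ can be derived from $T$", and $T \models \varphi$ for "$\varphi$ holds in every model of $T$".

```latex
% Soundness: whatever can be derived from T holds in every model of T.
T \vdash \varphi \;\Longrightarrow\; T \models \varphi

% Completeness: whatever holds in every model of T can be derived from T.
T \models \varphi \;\Longrightarrow\; T \vdash \varphi
```

Note that both statements quantify over every model of $T$; neither says anything about whether a particular real-world reading of $T$ is the right one.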
Paul.
x - 26 May 2004 13:43 GMT
> >> So I guess the applicability of databases here is that your relations
> >> are the axioms of your "theory". Your real-world interpretations of
[quoted text clipped - 17 lines]
> that is true in all models or interpretations of the database will be
> provable by the DBMS.
Is something that is true in only one model provable by the DBMS? What does this "all models" thing have to do with databases? Wouldn't just one model be enough?
Paul - 26 May 2004 14:43 GMT
>>The Completeness Theorem proves the "complete" part. i.e. everything
>>that is true in all models or interpretations of the database will be
[quoted text clipped - 3 lines]
> What does this "all models" thing have to do with databases?
> Wouldn't just one model be enough?
No, it'd have to be all models, because the DBMS can only prove things that are true under all circumstances, or in the most general case.
Suppose for example I have the following tuples in a relation:
('Alan', 'Bill')
('Bill', 'Chas')
Now in one model, this might mean: Alan is an ancestor of Bill. Bill is an ancestor of Chas.
So in this model, the tuple ('Alan', 'Chas') could also be legitimately added to this relation. i.e. the proposition 'Alan is an ancestor of Chas' is true.
Similarly if it means "is a brother of".
But consider the model where it means: Alan is a friend of Bill. Bill is a friend of Chas.
Then it doesn't follow that Alan is a friend of Chas. It could easily be that Alan hates Chas.
So the DBMS shouldn't be able to prove that ('Alan', 'Chas') is a legitimate tuple for that relation, because the DBMS has no idea what model is being used to interpret the database. And there's no way it could have an idea either.
I guess what it is really saying is that the model is larger than the theory, in the sense that it has concepts external to the theory. The theory can only prove things that are common to all models based on the theory (and the Completeness Theorem says it can *always* do this).
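The ancestor/friend example can be sketched in a few lines. This is an illustration only, with the relation held as a plain Python set and a helper named `transitive_closure` that is invented here; it is not how any particular DBMS works.

```python
# The stored relation: just two tuples, with no meaning attached.
r = {("Alan", "Bill"), ("Bill", "Chas")}


def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing changes."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new


# What follows from the stored tuples alone, under any interpretation:
print(("Alan", "Chas") in r)                      # False

# Only with the extra axiom "this relation is transitive" (true for
# "is an ancestor of", not for "is a friend of") does the tuple follow:
print(("Alan", "Chas") in transitive_closure(r))  # True
```

The transitivity rule is something the interpretation supplies. Unless it is stated to the system as an additional axiom, the tuple ('Alan', 'Chas') is not derivable from the two stored tuples.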
I'm not an expert though, so it's quite possible I've either misunderstood the theorem or misapplied it - please correct me if you think this is the case.
Paul.
Anthony W. Youngman - 26 May 2004 23:47 GMT
>>> So I guess the applicability of databases here is that your
>>>relations are the axioms of your "theory". Your real-world
[quoted text clipped - 10 lines]
>prove in the DBMS is true in real life. This was known long before the
>Completeness Theorem I think, and is easier to prove.
So if you use Newtonian Mechanics to prove where Mercury was 400 years ago, your proof is more accurate than Tycho Brahe's observations - which place it somewhere else?
You are making exactly the mistake that made me start this thread - you are assuming that the DBMS *defines* reality, rather than carrying out experiments to show that the DBMS accurately *describes* reality.
What you should have said is "IF the dbms is an accurate model of real life then ...". Which is basically what I said - if the dbms and real life disagree then the dbms model must be wrong. You seem to be saying that it's reality that's wrong ...
The problem I have is that the mathematicians seem to have taken C&D's idea of "data" and built this wonderful theory on top of it. Unfortunately, what they have not done is to define "data" in real-world terms (rather than mathematical), and as such there is no way we can go from a "proof within the model" to a formal description of the reality that that proof represents. So you can come up with all the proofs you like within the dbms, but you cannot show that the equivalent real-life scenario is true because you cannot describe that scenario accurately. So by definition the theory is unscientific because you cannot show that the dbms proof is true (or false) in real life.
Cheers, Wol
Paul - 27 May 2004 11:16 GMT
> So if you use Newtonian Mechanics to prove where Mercury was 400 years
> ago, your proof is more accurate than Tycho Brahe's observations - which
> place it somewhere else?
The proof will still be 100% accurate. Newtonian Dynamics assumes certain axioms, which we now know to be slightly wrong. The first-order logic is still perfectly accurate; it's just your starting assumptions have changed.
> You are making exactly the mistake that made me start this thread - you
> are assuming that the DBMS *defines* reality, rather than carrying out
[quoted text clipped - 4 lines]
> life disagree then the dbms model must be wrong. You seem to be saying
> that it's reality that's wrong ...
I'm just talking about the system of logic that enables us to talk about our database (our "theory" if you like). Whether our theory has axioms that correspond to the real world, or whether our interpretation (or "model") of our theory is accurate, is a totally different question.
> The problem I have is that the mathematicians seem to have taken C&D's
> idea of "data" and built this wonderful theory on top of it.
[quoted text clipped - 4 lines]
> like within the dbms, but you cannot show that the equivalent real-life
> scenario is true because you cannot describe that scenario accurately.
What I'm saying isn't really relying on DBMSs at all, it's just pure logic. A DBMS is just an example of a system that uses it. We have several layers:
1. First-order logic itself (our meta-language)
2. Our theory (all the relations and tuples in the database, our axioms)
3. Our model (how we interpret our theory in the real world)
All I'm saying is that we know that part 1 is guaranteed to be complete and consistent. Parts 2 & 3 can be totally wrong, which is when your database will give answers that diverge from reality.
> So by definition the theory is unscientific because you cannot show that
> the dbms proof is true (or false) in real life.
Given that your axioms and your interpretation are correct, then I think you can show the DBMS proof is true in real life (for the reasons given above and in previous posts).
I know that the language used by logicians can seem very impenetrable but I think it does actually make sense; it's not just a conspiracy of people talking gibberish and pretending to understand each other.
I don't know how much you've read about logic but it is very mathematical and well worth the steep learning curve. Wikipedia is a good place to start. Be warned though: logicians do have a tendency to go insane in later life; it is a serious brainfuck if you think about it too much!
Paul.
Dawn M. Wolthuis - 27 May 2004 12:11 GMT
> > So if you use Newtonian Mechanics to prove where Mercury was 400 years
> > ago, your proof is more accurate than Tycho Brahe's observations - which
[quoted text clipped - 3 lines]
> Newtonian Dynamics assumes certain axioms, which we now know to be
> slightly wrong.
If talking about mathematical axioms, they are not right or wrong -- they just are. It is the use of those axioms in some setting or another that could be inappropriate, not useful, or lead one to draw incorrect conclusions due to applying a poor mathematical analogy (metaphor) to the situation.
> The first-order logic is still perfectly accurate; it's
> just your starting assumptions have changed.
So the mathematics is right, but the science is wrong -- and I think that is a major point of this thread.
> > You are making exactly the mistake that made me start this thread - you
> > are assuming that the DBMS *defines* reality, rather than carrying out
[quoted text clipped - 9 lines]
> that correspond to the real world, or whether our interpretation (or
> "model") of our theory is accurate, is a totally different question.
Exactly -- so I think you and Wol (and I) are in agreement on that. It is why whenever anyone suggests that the best way to set up a database is by employing relational theory BECAUSE relational theory is based on mathematics, I laugh (then cry). I have an appreciation of what mathematics is and what it isn't. How do we determine whether a mathematical model is a good metaphor for what we are doing? We have to step outside of mathematics to do that. So, the proof that various aspects of relational theory have been good for use with DBMS's is not within mathematics.
> > The problem I have is that the mathematicians seem to have taken C&D's
> > idea of "data" and built this wonderful theory on top of it.
[quoted text clipped - 16 lines]
> and consistent. Parts 2 & 3 can be totally wrong, which is when your
> database will give answers that diverge from reality.
Additionally, the metaphor we choose might limit us so that what we say is true, but not the whole story. And another possibility is that our metaphor is useful and provides accurate answers, but does so in a clumsy fashion so as to cost more than it needs to. The cost of one metaphor might be higher than another because the human brain or people in a particular culture might find one metaphor easier to grasp. If I tell a person on the street that I have data in a relation (using a mathematical metaphor), that might not be as good as telling them I have data in a folder (a non-mathematical metaphor), for example.
[Slight digression: If we could the 1st-order predicate logic behind the "folder" metaphor (ah ha -- how 'bout a function?) we could make some progress perhaps?]
> > So by definition the theory is unscientific because you cannot show that
> > the dbms proof is true (or false) in real life.
>
> Given that your axioms and your interpretation are correct, then I think
> you can show the DBMS proof is true in real life (for the reasons given
> above and in previous posts).
And how do you show that your interpretation is correct -- by not showing it to be incorrect, by showing many cases where it is correct? I think that is central to this discussion. I'm about to read the book someone mentioned, "Data and Reality," and perhaps that will shed some more light on that question.
Summarizing -- three questions:
1) (How) can we prove that our mathematical model (e.g. relational theory) aligns with what we are applying it to (e.g. databases)? I think we can only disprove it or fail to disprove it.
2) Are we missing some important aspects of databases (e.g. mountain man's concerns) if we limit ourselves to a single mathematical metaphor (e.g. to what relational theory can tell us, or can tell us today)?
3) Are we applying the best, most effective, most efficient, etc metaphor or is there something better to either supplement or replace it?
--dawn <snip>
Paul - 28 May 2004 15:36 GMT
>> Newtonian Dynamics assumes certain axioms, which we now know to be
>> slightly wrong.
[quoted text clipped - 4 lines]
> incorrect conclusions due to applying a poor mathematical analogy
> (metaphor) to the situation.
Well, OK, when I say the axioms are wrong I mean that the axioms don't quite give a theory on which we can base an accurate model of reality. (Though they may be good enough for an approximate model of reality).
> So the mathematics is right, but the science is wrong -- and I think
> that is a major point of this thread.
My point is that the DBMS is only concerned with the mathematical part, and theory proves that it does it perfectly. The science part is beyond the scope of the DBMS - making sure that is OK is up to the database users.
>> I'm just talking about the system of logic that enables us to talk
>> about our database (our "theory" if you like). Whether our theory
[quoted text clipped - 6 lines]
> database is by employing relational theory BECAUSE relational theory
> is based on mathematics, I laugh (then cry).
Why? This seems like a reasonable statement. Suppose for example we based our DBMS on second-order logic. Then theory tells us we will have incompleteness (ignoring the fact that databases are finite!). So this would tell us that the mathematical part of the DBMS is on shaky ground. As it happens that DBMSs use first-order logic, we know it is rock-solid because of Godel's Completeness Theorem. That seems very reassuring to me. Maybe this point seems so obvious that people just take it for granted - they don't even realise that there is something to be proved in the first place.
Now it may well be that the "multivalue" database model also just uses first-order logic presented in a slightly obfuscated way, in which case you'd have the peace of mind for that as well.
> I have an appreciation of what mathematics is and what it isn't. How
> do we determine whether a mathematical model is a good metaphor for
> what we are doing? We have to step outside of mathematics to do
> that. So, the proof that various aspects of relational theory have
> been good for use with DBMS's is not within mathematics.
The proof of the usefulness of the mathematical part of DBMSs is definitely within mathematics. But as you say, deciding whether your model is a good metaphor for linking your database to reality is beyond the scope of both DBMSs and mathematics.
> [Slight digression: If we could the 1st-order predicate logic behind
> the "folder" metaphor (ah ha -- how 'bout a function?) we could make
> some progress perhaps?]
I think the problem here is that if you want trees you can't do it with first-order logic.
>> Given that your axioms and your interpretation are correct, then I
>> think you can show the DBMS proof is true in real life (for the
[quoted text clipped - 5 lines]
> read the book someone mentioned, "Data and Reality," and perhaps that
> will shed some more light on that question.
You can't; it's impossible. To show that your interpretation is correct we move away from mathematics into science. And in science you can never prove something, only disprove it. You just hypothesize that something is true and try to find a counterexample to show you were wrong.
> Summarizing -- three questions: 1) (How) can we prove that our
> mathematical model (e.g. relational theory) aligns with what we are
> applying it to (e.g. databases)? I think we can only disprove it or
> fail to disprove it.
Well we kind of go right to the very basis of everything: logic is by definition what we think of as truth, so it applies to everything. If p is true and q is true, then so is "p and q" true. We could build a DBMS around a logic where this isn't the case, but I don't think it would be very helpful! Alternatively we can go upwards to a more complex logic, but theory tells us this could cause incompleteness problems.
> 2) Are we missing some important aspects of databases (e.g. mountain
> man's concerns) if we limit ourselves to a single mathematical
> metaphor (e.g. to what relational theory can tell us, or can tell us
> today)?
I'm not quite sure what mountain man's point is. Is it that we should store things like constraints, view definitions, etc. in a relational format rather than as strings in some query language? I can see the appeal of this idea but I think how we store statements in our "meta-language" doesn't change the fact that our actual data is stored in relations. Or is it that we could store things like form layouts and application flow logic in tables - if so then I don't think this is a totally new idea, though maybe an interesting one to explore. MS Access had something like this built-in I think, created by a form wizard - "table-driven forms".
Either way I think this is orthogonal (excuse the buzzword!) to the central idea of relational database theory: to base things as closely as possible on first-order predicate logic.
> 3) Are we applying the best, most effective, most efficient, etc
> metaphor or is there something better to either supplement or replace
> it?
I think we are. I think the insight that Codd had was to start with logic and build upwards from there, instead of putting together an ad-hoc data model first and then trying to reconcile it downwards to logic.
I think the only ways we could go would be to different logics e.g. multi-valued logic or "fuzzy" logic etc. I don't claim to know what these all are but a search should bring up various weird and wonderful logics.
Or upwards to higher-order logic, although I don't know if incompleteness becomes an issue then. Maybe because we are always dealing with unbounded but finite systems it doesn't apply or something. I think if you go this route you end up with things like Datalog or Prolog.
Paul.
Anthony W. Youngman - 28 May 2004 19:27 GMT
>>> Newtonian Dynamics assumes certain axioms, which we now know to be
>>>slightly wrong.
[quoted text clipped - 14 lines]
>theory proves that it does it perfectly. The science part is beyond the
>scope of the DBMS - making sure that is OK is up to the database users.
So, what you're saying, basically, is every time the users have a problem "it's an implementation thing", despite the fact that it may well be down to a screwy axiom? What you're saying is that you couldn't care less whether your axioms are correct or not?
Or, to put it more bluntly, you don't know whether your model correctly models the real world, and you don't care whether your model correctly models the real world, yet you loudly trumpet that it is the only model that can model the real world ... excuse me while I puke ...
Surely you owe it to your users to at least try and make sure the foundations of your theory are securely anchored in the real world, rather than building castles in the air and then blaming users when their real applications built on those castles come tumbling to earth in an awful heap.
Sorry, I don't want this to come over as nasty, but that last paragraph of yours that I quoted is basically just abdicating any responsibility on the part of mathematicians as to whether the theory is useful in any shape or form whatsoever; and worse, blames the users for incompetence if they can't get it to work.
Cheers, Wol
Tony - 29 May 2004 18:15 GMT
> >>> Newtonian Dynamics assumes certain axioms, which we now know to be
> >>>slightly wrong.
[quoted text clipped - 39 lines]
> Cheers,
> Wol
You just don't get it, do you Wol? No matter how many times people try to explain it to you it just doesn't sink in. The relational model is NOT a model of "the real world" and it therefore doesn't have to correspond to the real world. It is a model of data, which is an abstract concept.
Now, when someone uses the relational model to build a database corresponding to some real world thing, say a payroll system, then it is up to the database designer (not the relational model) to ensure that what he builds corresponds to the reality he is building it for.
To go back to your favourite analogy (apologies everyone), it is like saying that algebra was responsible for the shortcomings of Newton's model of planetary motion. But it wasn't the algebra that "got it wrong", it was Newton's application of it. The relational model corresponds to algebra in this analogy, not to Newton's model of the solar system - that corresponds to a specific database design. Einstein didn't invent a better algebra, he designed a better model using the SAME algebra - like a later designer designing a better payroll database, but still using the same RDBMS.
Dawn M. Wolthuis - 29 May 2004 20:25 GMT
> > >>> Newtonian Dynamics assumes certain axioms, which we now know to be
> > >>>slightly wrong.
[quoted text clipped - 45 lines]
> to correspond to the real world. It is a model of data, which is an
> abstract concept.
and I just responded to Alfredo who said that data were facts and I thought for sure the idea was that these facts corresponded to reality.
I don't have a problem with the mathematical theory termed "relational theory" except when the words used are those used in set theory and the definitions are different ;-)
If there is a tight mathematical definition of "data" within relational theory, then that's great, but it is not the commonly used definition, I suspect. It is in the leap from doing relational theory to thinking that the application of such theory is the best approach to storing/retrieving propositions using computers by a business -- that is where there is a rather significant leap of faith. That connection is NOT science, although we could conceivably set up some experiments to collect a bit more information about whether it is better than some other approach. I'm not opposed to faith, but we need to call it what it is. There is mathematical relational theory and then a leap of faith in the use of relational theory for anything.
> Now, when someone uses the relational model to build a database
> corresponding to some real world thing, say a payroll system, then it
> is up to the database designer (not the relational model) to ensure
> that what he builds corresponds to the reality he is building it for.
And perhaps that person opts out of using (at least all of) relational theory and that's fine, right?
> To go back to your favourite analogy (apologies everyone), it is like
> saying that algebra was responsible for the shortcomings of Newton's
> model of planetary motion. But it wasn't the algebra that "got it
> wrong", it was Newton's application of it.
Agreed!
> The relational model
> corresponds to algebra in this analogy,
YES!
> not to Newton's model of the
> solar system - that corresponds to a specific database design.
wrong -- that corresponds to the use of relational theory at all while working with computers. It is not the specific implementation only that could be wrong -- it is the use of this theory AT ALL related to "data processing" that COULD BE wrong (I don't think it is entirely irrelevant, but there is nothing that proves its relevance except where "the proof is in the pudding" -- scientific observation, for example).
> Einstein didn't invent a better algebra, he designed a better model
> using the SAME algebra - like a later designer designing a better
> payroll database, but still using the same RDBMS.
No, like a later database theorist designing a graphical theory or a functional theory that is better than the relational theory before it.
smiles. --dawn
Gene Wirchenko - 30 May 2004 05:31 GMT
[snip]
>If there is a tight mathematical definition of "data" within relational
>theory, then that's great, but it is not the commonly used definition, I
[quoted text clipped - 7 lines]
>relational theory and then a leap of faith in the use of relational theory
>for anything.
It is an even bigger leap of faith to operate without a theory underlying what you are doing.
>> Now, when someone uses the relational model to build a database
>> corresponding to some real world thing, say a payroll system, then it
[quoted text clipped - 3 lines]
>And perhaps that person opts out of using (at least all of) relational
>theory and that's fine, right?
What is the replacing theory? Is it better or worse? How do you know? Is the consideration of better/worse a leap of faith? If not, why not?
[snip]
sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Anthony W. Youngman - 01 Jun 2004 23:34 GMT
>[snip]
>
[quoted text clipped - 24 lines]
>know? Is the consideration of better/worse a leap of faith? If not,
>why not?
I was thinking of replying to Tony, but I think I can answer here.
And no, Tony, Einstein did NOT "build a better model" using the same algebra. What he DID do was realise that Newton's fundamental axioms were wrong. He redefined the metaphysical interface between reality and the model.
And the problem I have is that I cannot see any metaphysical interface between reality and relational theory. This is basically Dawn's point about "is relational theory even the right theory to use?".
As for Gene, I agree we need a theory, and actually, I think relational theory is a great theory. Unfortunately it is a theory about a - call it abstract, call it imaginary, they're the same thing - concept called "data" that does not seem to have any basis in the real world.
So what do I think should replace it? Nothing actually, we can just improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION :-)
Go back to my analogies :-) In hindsight, we just can't understand why the Church couldn't see that Copernicus' theory that the planets orbit the sun made sense. Except that *WE* have got Copernicus' theory wrong. He thought that the planets *circled* the sun. And as a result his theory was just as much of a mess (if not more) as that of the Church who said the planets and sun orbited the earth. I think *that* is the current state of database theory.
What we NEED is a "theory of business analysis" - a formal theory that tells analysts how to analyse the real world. And I'm pretty damn confident that you can NOT create a theory that will do a reversible mapping between the real world and relational data.
This theory will then be the equivalent of Kepler and Newton discovering ellipses and calculus, or of Einstein realising that mass and energy were interchangeable. Basically, pretty much ALL of relational theory's axioms are taken as given by the mathematicians, and no thought is given as to whether they actually match the real world.
To give you a simple example, the business analyst analyses an invoice, and you design the database to store the data. Can you then ask the DATABASE to give you the invoice data back? Certainly with current relational databases accessed with SQL, you're relying on either an application programmed OVER the database, or a view which gives you multiple copies of data of which the original only had one.
Yes I know people are likely to say that "SQL is not genuine relational", but you're still relying on a view - even a valid relational one - or an application.
If we can't go - using formal theory - from the database back through the analysis to get back to the real world we started from, then we have no idea if our axioms are correct, and as Dawn says, we have no idea if relational theory is the correct theory to solve real world problems.
And as I said before, if we have no idea if it's the correct theory, why are we using it? Dawn was going on about faith. Do you have faith in business analysts to get the analysis correct, or would you rather have a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific, whatever term you want to use) logical theory to do it for you?
Cheers, Wol
Gene Wirchenko - 02 Jun 2004 01:42 GMT [snip]
>As for Gene, I agree we need a theory, and actually, I think relational >theory is a great theory. Unfortunately it is a theory about a - call it >abstract, call it imaginary, they're the same thing - concept called >"data" that does not seem to have any basis in the real world. That is not surprising since data is abstract.
>So what do I think should replace it? Nothing actually, we can just >improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION >:-) I do not think so. See further.
>Go back to my analogies :-) In hindsight, we just can't understand why >the Church couldn't see that Copernicus' theory that the planets orbit [quoted text clipped - 3 lines] >Church who said the planets and sun orbited the earth. I think *that* is >the current state of database theory. No, the mess was smaller. The new theory was a better theory.
Newton's is pretty good and will work for everyday situations fine. Einstein's refines Newton's to cover yet more cases.
The world is nearly flat. The variation from that is a small fraction of an inch per mile. If you are dividing your backyard into plots for gardening, you are safe assuming that the world is flat. When you hit the big time, a different theory is needed. Before then, it is more complicated than you need.
[snip]
>If we can't go - using formal theory - from the database back through >the analysis to get back to the real world we started from, then we have >no idea if our axioms are correct, and as Dawn says, we have no idea if >relational theory is the correct theory to solve real world problems. There is meaning that the DBMS understands (for example, FK and RI), and there is meaning that the user understands (and the DBMS does not) such as what a location is.
A database models relevant portions of the Real World. What does relevant mean? Of interest to someone.
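The split between what the DBMS understands and what only the user understands can be sketched with Python's built-in sqlite3 module. The tables (location, shipment) and their contents are hypothetical. The DBMS rejects a shipment that points at a location that does not exist, because referential integrity is declared to it; it has no opinion on whether a location name describes a real place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        name        TEXT
    );
    CREATE TABLE shipment (
        shipment_id INTEGER PRIMARY KEY,
        location_id INTEGER NOT NULL REFERENCES location(location_id)
    );
""")

conn.execute("INSERT INTO location VALUES (1, 'Main warehouse')")
conn.execute("INSERT INTO shipment VALUES (100, 1)")  # valid reference

# Meaning the DBMS understands: location 99 does not exist, so this fails
try:
    conn.execute("INSERT INTO shipment VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Meaning only the user understands: any text is accepted as a location name
conn.execute("INSERT INTO location VALUES (2, 'not a real place')")
location_count = conn.execute("SELECT COUNT(*) FROM location").fetchone()[0]

print(rejected, location_count)
```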
>And as I said before, if we have no idea if it's the correct theory, why >are we using it? Dawn was going on about faith. Do you have faith in It is the closest that we know of.
>business analysts to get the analysis correct, or would you rather have >a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific, >whatever term you want to use) logical theory to do it for you? I would rather have the theory, but in its absence, I will use what I have.
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Anthony W. Youngman - 04 Jun 2004 00:18 GMT >[snip] > [quoted text clipped - 4 lines] > > That is not surprising since data is abstract. Well, is "mass" abstract? Or "energy"?
No they are not. They have formal mathematical definitions within Newtonian Mechanics or relativity, but they also have clear metaphysical descriptions within reality.
As far as I can tell, "relational data" does not have that metaphysical description.
>>So what do I think should replace it? Nothing actually, we can just >>improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION [quoted text clipped - 11 lines] > > No, the mess was smaller. The new theory was a better theory. Basically, Kepler corrected Copernicus' axiom that "orbit == circle"
> Newton's is pretty good and will work for everyday situations >fine. Einstein's refines Newton's to cover yet more cases. More improvements here :-) The mathematical definition is steadily getting closer to the metaphysical reality ...
> The world is nearly flat. The variation from that is a small >fraction of an inch per mile. If you are dividing your backyard into >plots for gardening, you are safe assuming that the world is flat. >When you hit the big time, a different theory is needed. Before then, >it is more complicated than you need. But it doesn't make it correct ...
>[snip] > [quoted text clipped - 14 lines] > > It is the closest that we know of. It is the closest that YOU know of.
>>business analysts to get the analysis correct, or would you rather have >>a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific, >>whatever term you want to use) logical theory to do it for you? > > I would rather have the theory, but in its absence, I will use >what I have. Great. So why aren't you prepared to question the accuracy of the axiom that "data comes in tuples"?
Yes, relational data DOES come in tuples - because that's what the definition says.
But if you can't come up with some formal way of converting between "real-world-data" and "relational tuples", then surely you have to come to the conclusion (which my and Dawn's EXPERIENCE has forced us to) that your tuple is equivalent to a Copernican circle - it may be close to reality but there's something seriously wrong somewhere that needs correcting - and it CAN'T be done WITHIN the theory, because the fault lies in the theory-to-reality map.
Cheers, Wol
Gene Wirchenko - 04 Jun 2004 01:40 GMT >>[snip] >> [quoted text clipped - 6 lines] > >Well, is "mass" abstract? Or "energy"? No, but data is.
[snip]
>> No, the mess was smaller. The new theory was a better theory. > >Basically, Kepler corrected Copernicus' axiom that "orbit == circle" Yup.
>> Newton's is pretty good and will work for everyday situations >>fine. Einstein's refines Newton's to cover yet more cases. [quoted text clipped - 9 lines] > >But it doesn't make it correct ... It makes it correct *enough* for the simple case. And, of course, using the simpler form does introduce the possibility of scaling problems later.
Einstein's might not be the ultimate either, but that is not going to stop people from dividing up their backyards into plots for gardening.
[snip]
>> A database models relevant portions of the Real World. What does >>relevant mean? Of interest to someone. [quoted text clipped - 5 lines] > >It is the closest that YOU know of. Produce your theory, please, with rigour comparable to Codd's.
>>>business analysts to get the analysis correct, or would you rather have >>>a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific, [quoted text clipped - 8 lines] >Yes, relational data DOES come in tuples - because that's what the >definition says. You have just answered your question for me.
>But if you can't come up with some formal way of converting between >"real-world-data" and "relational tuples", then surely you have to come [quoted text clipped - 3 lines] >correcting - and it CAN'T be done WITHIN the theory, because the fault >lies in the theory-to-reality map. So maybe we need a second theory that deals with something that Codd's does not. In the meantime, I will not throw out the baby with the bathwater, and people will keep dividing up backyards.
Sincerely,
Gene Wirchenko
Eric Kaun - 04 Jun 2004 15:18 GMT > [SNIP] > But if you can't come up with some formal way of converting between [quoted text clipped - 4 lines] > correcting - and it CAN'T be done WITHIN the theory, because the fault > lies in the theory-to-reality map. True, but I have yet to hear a better proposal. When it comes to modeling information, I suspect there will always be a gap. Relational advocates favor being able to derive truths from other truths, acknowledging of course that the internal predicates must be defined relative to an external one, and that that's a human effort which can always go awry. You and Dawn, as best I can understand, place more value on reproduction of the original inputs. I suspect there are simply different expectations; I'd rather stretch the computer to avoid stretching humans in ways they're not good at (e.g. repetitive symbolic manipulation).
- erk
Bill H - 05 Jun 2004 17:27 GMT erk:
Several notes below.
> > [SNIP] > > But if you can't come up with some formal way of converting between [quoted text clipped - 6 lines] > > True, but I have yet to hear a better proposal. I've noticed that many people aren't interested in a better proposal, or even a different proposal. Dogma rules. :-)
The main reason others use different data models is that they allow a much closer interaction between the language of dbms and applications and the environment they're designed to operate in (mostly the business community). Because of this, the cost of development, maintenance, and administration is significantly lower than those models having additional expertise and liaison requirements.
Now, this advantage may not be what you are looking for. It may not be, for that matter, what the CIO of a large company is looking for. However, in the world of small to medium sized businesses (SMBs) this cost advantage means something.
I might also note there are numerous features of an RDBMS that may or may not be available in competing data models, so any analysis will have to take this into account.
> When it comes to modeling > information, I suspect there will always be a gap. Relational advocates [quoted text clipped - 3 lines] > best I can understand, place more value on reproduction of the original > inputs. I can't speak for Anthony and Dawn, but I place more value not on the original inputs but the original concept. An invoice _is_ something that usually has multiple items ordered. It is an object in and of itself that needs no "chopping up", so to speak.
This is where simpler means don't destroy the properties of the invoice in order to make the data fit into an arbitrary data model with tautological axioms and theorems. Keep the business objects as close to what they are. A data model that can do this has many advantages.
> I suspect there are simply different expectations; I'd rather > stretch the computer to avoid stretching humans in ways they're not good at > (e.g. repetitive symbolic manipulation). I think you're right here. I've been in business for many years. I would like development to be easy for me. We can watch the pendulum swinging towards making software development easier for those of us using the software. .NET, for better or worse, is attempting to make development easier (if it wasn't for the bizarre data typing and variable scoping it would be a lot easier). Hopefully dbms theory will contribute to this too.
Bill
Eric Kaun - 07 Jun 2004 19:18 GMT > [SNIP] > I've noticed that many people aren't interested in a better proposal, or > even a different proposal. Dogma rules. :-) A fun movie... :-)
> The main reason others use different data models is that they allow a much > closer interaction between the language of dbms and applications and the [quoted text clipped - 3 lines] > significantly lower than those models having additional expertise and > liaison requirements. I am all for lowering this cost - decreasing the "impedance mismatch", so to speak. However, I think my ideas move in the opposite direction - making application languages more relational, rather than DBMSs more procedural (or OO, if you like).
> Now, this advantage may not be what you are looking for. It may not be, for > that matter, what the CIO of a large company is looking for. However, in > the world of small to medium sized businesses (SMBs) this cost advantage > means something. Agreed - however, while my experience comes from a large company, it's work done for a relatively small business unit. I was the only developer on several of the projects, and my user base was fairly small. I was DBA, developer, customer support, etc. And I still found the relational metaphor (even though I had to use SQL) much easier than XML. I've never used Pick - sounds like their environment gives them a lot of power, and while that's nice, I'd still never think of an invoice as a single proposition or "object". It's not. It's a fairly complex series of them. Just like an "order", an invoice is a fairly complex confluence of phenomena, and not even a static one (modifications / confirmations to various invoice "pieces" were common in my world, as an invoice was often correlated with multiple shipments and warehouses).
> I can't speak for Anthony and Dawn, but I place more value not on the > original inputs but the original concept. An invoice _is_ something that > usually has multiple items ordered. And I disagree. An invoice is many somethings. If your questions deal only with the set (e.g. presenting an invoice on a screen), then great - treat it as one. But when you're attempting to analyze the distribution of parts across warehouses and across time, "viewing" the invoice as a number of components is far, far more useful. So it depends on your needs, but I'd far rather place my bet on something that allows me to scale my queries and reports to more detailed questions than one that restricts me. And I still think having to correlate multiple line-item attributes across multiple MV attributes in a single File is nonsensical and error-prone.
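The two views of an invoice argued over here can be put side by side in a short Python sketch. The record layout is a hypothetical stand-in, not Pick syntax: the invoice is held as one object whose line-item attributes are parallel lists correlated by position. Presenting the invoice needs only the whole record; asking how parts are distributed across warehouses means unnesting it into one row per line item first.

```python
from collections import defaultdict

# Hypothetical invoice held as a single record; the three lists are
# correlated by position (entry 0 of each list describes line 1, etc.)
invoice = {
    "invoice_no": 1,
    "customer": "Acme Ltd",
    "item": ["widget", "gadget", "widget"],
    "qty": [10, 2, 5],
    "warehouse": ["Leeds", "Leeds", "York"],
}

def line_items(inv):
    """Unnest the positionally correlated lists into one row per line item."""
    columns = (inv["item"], inv["qty"], inv["warehouse"])
    # The correlation is only implicit, so check the lists are in step
    if len({len(column) for column in columns}) != 1:
        raise ValueError("multivalued attributes are out of step")
    return [
        {"invoice_no": inv["invoice_no"], "item": i, "qty": q, "warehouse": w}
        for i, q, w in zip(*columns)
    ]

def qty_by_warehouse(rows):
    """Total quantity per (warehouse, item) across any number of invoices."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["warehouse"], row["item"])] += row["qty"]
    return dict(totals)

rows = line_items(invoice)
totals = qty_by_warehouse(rows)
print(totals)
```

The length check in line_items is the piece of bookkeeping that the positional layout leaves to the programmer; in the row-per-line form each row carries its own item, quantity and warehouse together.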
> It is an object in and of itself that > needs no "chopping up", so to speak. Yes, it does. "Analysis" means chopping up. We gain power in chopping up. Our problems are solvable when they're chopped; our solutions are scalable and provable when they're chopped. Domains are intellectually tractable when they're separated. Holism may be fine in medicine (???) where human psychology is involved, but any translation of a "real world" domain to an automated system involves "chopping up." You can either acknowledge it and chop in a rational way, or pay the price later on.
While I'm not dogmatic about 1NF (believe it or not), or even relational, I do believe based on experience that the balance point for using relational is far, far sooner than critics would believe.
> This is where simpler means don't destroy the properties of the invoice in > order to make the data fit into an arbitrary data model with tautological > axioms and theorems. Tautological? Arbitrary? Any logical model is arbitrary; an invoice has no shape, or at least none beyond that of a piece of paper, and as I've said, if all they want to do is store the invoice, let's scan the thing into a JPG and be done with it.
"Making the data fit" is also nonsense; whatever physical and logical model you choose, you're pushing the data into something. You can either push it into something with maximum power or a lesser degree of power. Perhaps you gain short-term efficiency; in my experience with XML, you gain squat.
> Keep the business objects as close to what they are. So forgetting an invoice for a moment, what "is" a paint color? A paint formula? A carmaker code? A digital certificate store? What's their "natural form"?
There is none. What we do is unnatural. (<insert unnatural-act joke here>)
> A data model that can do this has many advantages. That can do what - model arbitrary data in its "natural form", whatever that means? I agree. If you show that to me, I'll use it.
> > I suspect there are simply different expectations; I'd rather > > stretch the computer to avoid stretching humans in ways they're not good [quoted text clipped - 7 lines] > wasn't for the bizarre data typing and variable scoping it would be a lot > easier). Hopefully dbms theory will contribute to this too. I hope so - that would be nice. I think XPath and XQuery, while convoluted, are reasonable enough operators over an XML type / type generator. I just see far more benefit from the structures and declarative constraints of relational.
- erk
Anthony W. Youngman - 07 Jun 2004 22:58 GMT >> A data model that can do this has many advantages. > >That can do what - model arbitrary data in its "natural form", whatever that >means? I agree. If you show that to me, I'll use it. And there we have our problem.
Yep, I can see where you're coming from, in practical terms. But can't you see where we're coming from? My problem, as I see it, is that 'arbitrary data in its "natural form" ' is NOT amenable to easy coercion into "relational data".
From that, it follows that relational databases are the wrong tool to model natural data with.
However, as I said, I do feel that it's like the circle/ellipse problem that Copernicus had. IF people are prepared to *look* at real data in its "natural form" and develop a model that really addresses that, while it will make one hell of a mess of current relational theory, combining a "natural form data" model with the relational model will yield a very powerful database theory.
After all, isn't that exactly what I do when I insist on normalising all my data within Pick FILEs? And I really don't see the problems you do, even if the line-items come from multiple warehouses etc etc. If the relational analyst didn't foresee that, you're going to end up in an equally big mess (experience says "even bigger" mess) than a Pickie, if both are faced with the same analysis failure.
Cheers, Wol
Eric Kaun - 09 Jun 2004 16:33 GMT > >> A data model that can do this has many advantages. > > [quoted text clipped - 7 lines] > 'arbitrary data in its "natural form" ' is NOT amenable to easy coercion > into "relational data". My point is that data has no natural form, plain and simple. I've encountered too many cases over the years where accepting the "natural form" as the users stated it would have resulted in brittle design - where abstraction and extension yielded immediate results.
> From that, it follows that relational databases are the wrong tool to > model natural data with. I don't believe natural data exists. It's all unnatural. To simply accept the intuitive "sense" of business data as the users see it gives you something quickly, but in every case I've ever encountered, that simplicity is a limitation not just on future requirements, but on immediate ones as well, and on the ability to craft a strong solution to current problems. In every case I've ever experienced, abstracting and structuring beyond what the users would consider "natural" (and for which we still have no definition) has benefitted me in terms of shorter development time, more flexibility for future extensions, and better ability to explain nuances of their situations to users (e.g. better analysis).
> However, as I said, I do feel that it's like the circle/ellipse problem > that Copernicus had. IF people are prepared to *look* at real data in > its "natural form" and develop a model that really addresses that, It's difficult to address it when it's so intuitive. I understand this is a human discipline, but formal guidelines can be useful in meeting real needs.
> while > it will make one hell of a mess of current relational theory, combining > a "natural form data" model with the relational model will yield a very > powerful database theory. I doubt it, but am willing to think about it. What is that "natural form"? Is it just 1NF? Is that still the linchpin of the argument?
> After all, isn't that exactly what I do when I insist on normalising all > my data within Pick FILEs? And I really don't see the problems you do, > even if the line-items come from multiple warehouses etc etc. OK, I'll accept that - it just sounds like a massive pile of aggravation to me. Predicates can expand as requirements do (and as you learn more about them), which is where normalization acquires its power. To just pile on attribute after attribute or sub-attribute after sub-attribute, and to have to keep straight (for myself and future developers) which of those are correlated with others just sounds like a leap of faith that's both annoying and unnecessary.
> If the > relational analyst didn't foresee that, you're going to end up in an > equally big mess (experience says "even bigger" mess) than a Pickie, if > both are faced with the same analysis failure. Perhaps that's true - I can't say it is, but don't have a strong counter-argument - but in my experience, analysis failures are much less destructive when you've normalized properly.
- erk
Laconic2 - 09 Jun 2004 17:28 GMT > My point is that data has no natural form, plain and simple. I've > encountered too many cases over the years where accepting the "natural form" > as the users stated it would have resulted in brittle design - where > abstraction and extension yielded immediate results. Form is in the eye of the beholder.
The ER model has given me very good results, when it comes to data analysis, and two way communication with subject matter experts who are typically not systems experts.
The relational model, such as I know it, has given me very good results when it comes to data design, with the exception of certain cases, where a different model would have been more natural. But those are the exceptions rather than the rule.
SQL or indexed files have given me very good results when implementing a relational design. The principal difference between "SQL databases" and indexed files has been in the areas of classical DBMS services and data independence. But you can implement a relational data model using either one.
Which form is "natural"? It depends. Who are we talking to?
Anthony W. Youngman - 10 Jun 2004 00:38 GMT >> >> A data model that can do this has many advantages. >> > [quoted text clipped - 13 lines] >as the users stated it would have resulted in brittle design - where >abstraction and extension yielded immediate results. Fine. But if data has no "natural form", then in the real world there is no such thing as data. Therefore there is no point in building a system to model it :-)
>> From that, it follows that relational databases are the wrong tool to >> model natural data with. [quoted text clipped - 9 lines] >flexibility for future extensions, and better ability to explain nuances of >their situations to users (e.g. better analysis). It's all unnatural? As I said above, then surely it doesn't exist ...
>> However, as I said, I do feel that it's like the circle/ellipse problem >> that Copernicus had. IF people are prepared to *look* at real data in [quoted text clipped - 22 lines] >correlated with others just sounds like a leap of faith that's both annoying >and unnecessary. Except that actually, it works EXTREMELY WELL in practice. We think in terms of language. To me a "table" is a noun. A field (your "column") is typically an adjective, or grouped into an adjectival clause. It can also be a gerund (your foreign key) which is, sort of, an adjective.
Basically, it fits the way nature has designed our brains to work. So the practice is simple. By imposing abstraction, the relational model is forcing the data into a framework that our brains are not designed to understand.
Try describing your tables in terms of natural language. I can guarantee you'll end up with a mess ... :-) one may be a noun, another is an adjectival phrase, another is a bunch of gerunds - what the hell - in simple intuitive terms - is a table?
>> If the >> relational analyst didn't foresee that, you're going to end up in an [quoted text clipped - 4 lines] >counter-argument - but in my experience, analysis failures are much less >destructive when you've normalized properly. Which is why I'm such a fan of normalising within a Pick FILE. It forces you to analyse properly, and reduces the likelihood of an analysis failure. Supporting our Pick systems at work can be a real pain, I admit. But EVERY screwup can be attributed - directly - to an analysis failure that was just plain sloppy, or poor programming practice like not separating updates from reports :-(
Unfortunately, that's typical of Pick systems - so many systems are *written* by USERS for USERS, so that while they work extremely well they also have the computer pros tearing their hair out.
But surely that says something - wasn't the design aim of SQL to be "so easy that users can use it"? Pick has actually achieved that aim!
Cheers, Wol
Eric Kaun - 10 Jun 2004 20:18 GMT > >My point is that data has no natural form, plain and simple. I've > >encountered too many cases over the years where accepting the "natural form" [quoted text clipped - 4 lines] > no such thing as data. Therefore there is no point in building a system > to model it :-) Cute... but the whole point of building a system is to impart some form to the data, to render it manipulable. Otherwise we can stick with pieces of paper in filing cabinets, if that meets the users' needs.
On second thought, you're right - there is no such thing in the real world as data. Data is our model of the real world, or at least part of that model. "Data modeling" is thus a misnomer, though I can't think of a better gerund than "modeling". "Data creation" might be confused with the actual population of a database.
> >OK, I'll accept that - it just sounds like a massive pile of aggravation to > >me. Predicates can expand as requirements do (and as you learn more about [quoted text clipped - 13 lines] > forcing the data into a framework that our brains are not designed to > understand. Heh. You're a scientist, right? Surely much of science is somewhat unnatural, at least until after years learning it? Quantum theory not even then... but in any event, "naturalness" is a poor criterion for use.
> Try describing your tables in terms of natural language. I can guarantee > you'll end up with a mess ... :-) one may be a noun, another is an > adjectival phrase, another is a bunch of gerunds - what the hell - in > simple intuitive terms - is a table? Each relation is a sentence.
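One common reading of "each relation is a sentence", which the post does not spell out, is that a relation stands for a sentence template (a predicate) and each tuple fills in the blanks to give one statement asserted to be true. A minimal Python sketch under that assumption; the predicate wording, heading and tuples are all hypothetical.

```python
# Hypothetical predicate: the sentence template the relation stands for
PREDICATE = "Invoice {invoice_no} bills {qty} of {item} to {customer}."

# Attribute names of the relation, in the order used by the tuples below
HEADING = ("invoice_no", "customer", "item", "qty")

# The body of the relation: a set of tuples, one per true statement
BODY = {
    (1, "Acme Ltd", "widget", 10),
    (1, "Acme Ltd", "gadget", 2),
}

def propositions(heading, body, predicate):
    """Fill the predicate with each tuple to get one sentence per tuple."""
    return sorted(predicate.format(**dict(zip(heading, row))) for row in body)

sentences = propositions(HEADING, BODY, PREDICATE)
for sentence in sentences:
    print(sentence)
```

On this reading the table as a whole is neither a noun nor an adjective; it is the set of sentences of one shape that are currently held to be true.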
> >> If the > >> relational analyst didn't foresee that, you're going to end up in an [quoted text clipped - 11 lines] > failure that was just plain sloppy, or poor programming practice like > not separating updates from reports :-( OK, I can buy that. But I'm still somewhat wary - you normalize, but not all the time. You suggest that normalization assists in verifying the correctness of your analysis. So then at some point you de-normalize. What triggers you to do so? Or do you not denormalize because something in your analysis causes you not to normalize a specific aspect of the model?
> Unfortunately, that's typical of Pick systems - so many systems are > *written* by USERS for USERS, so that while they work extremely well > they also have the computer pros tearing their hair out. > > But surely that says something - wasn't the design aim of SQL to be "so > easy that users can use it"? Pick has actually achieved that aim! Perhaps, I'm not sure. It certainly has failed in that regard, although with a few well-designed views and functions, I've taught some of my users SQL. But I think there's certainly an application level that sits above relational (and SQL), and hides some of the details that are important (or else they shouldn't be there) - namely joins.
- Eric
Laconic2 - 11 Jun 2004 11:39 GMT > Cute... but the whole point of building a system is to impart some form to > the data, to render it manipulable. Otherwise we can stick with pieces of > paper in filing cabinets, if that meets the users' needs. I'd like to suggest that "form follows function" applies here. It's architecture 101.
The function of data inside a database is profoundly different from the function of data inside a file cabinet. That's why the form is different.
> On second thought, you're right - there is no such thing in the real world > as data. Data is our model of the real world, or at least part of that > model. "Data modeling" is thus a misnomer, though I can't think of a better > gerund than "modeling". "Data creation" might be confused with the actual > population of a database. How about "data design"? Notice I didn't say "database design"...
Anthony W. Youngman - 10 Jun 2004 00:21 GMT >>> A data model that can do this has many advantages. >> [quoted text clipped - 24 lines] >an equally big mess (experience says "even bigger" mess) than a Pickie, >if both are faced with the same analysis failure. Following up to myself, a line item on an order is different to a line item on a delivery note is different to a line item on an invoice. I know analysts screw up and get it wrong, but I'll draw an almost identical analogy :-)
If your customer's warehouse moves, you do NOT want changing the warehouse address to change the delivery address on all your old invoices... so while your line item on the invoice may point to the line item on the delivery note, it MUST NOT be the same "object". Because things change. After all, in the transition from order to delivery note, it's quite possible for you to substitute an equivalent. And you can't even guarantee that the invoice line will always be identical to the delivery note rather than the order - because certainly under UK law, if the supplier substitutes on their own initiative they are obliged to bill the cheaper item, not the one that was actually supplied ...
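The warehouse example can be sketched with Python's built-in sqlite3 module. The schema is hypothetical: the invoice keeps a reference to the warehouse and also its own copy of the delivery address taken at the time of invoicing. After the warehouse moves, following the reference gives the new address, while the copy on the old invoice is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse (
        warehouse_id INTEGER PRIMARY KEY,
        address      TEXT
    );
    CREATE TABLE invoice (
        invoice_no       INTEGER PRIMARY KEY,
        warehouse_id     INTEGER REFERENCES warehouse(warehouse_id),
        delivery_address TEXT  -- copy taken when the invoice was raised
    );
""")

conn.execute("INSERT INTO warehouse VALUES (1, '1 Old Road')")

# Raise the invoice: copy the address as it stands today
conn.execute("""
    INSERT INTO invoice
    SELECT 1001, warehouse_id, address FROM warehouse WHERE warehouse_id = 1
""")

# Later the customer's warehouse moves
conn.execute("UPDATE warehouse SET address = '2 New Street' WHERE warehouse_id = 1")

# Following the reference gives the current address ...
by_reference = conn.execute("""
    SELECT w.address
    FROM invoice i JOIN warehouse w ON w.warehouse_id = i.warehouse_id
    WHERE i.invoice_no = 1001
""").fetchone()[0]

# ... while the invoice's own copy still says where the goods went
snapshot = conn.execute(
    "SELECT delivery_address FROM invoice WHERE invoice_no = 1001"
).fetchone()[0]

print(by_reference, "|", snapshot)
```

Which of the two an old invoice should show is an analysis decision; the sketch only shows that the two can diverge once the referenced row changes.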
Cheers, Wol
Bill H - 09 Jun 2004 17:52 GMT Eric:
[snipped]
> Agreed - however, while my experience comes from a large company, it's work > done for a relatively small business unit. I was the only developer on [quoted text clipped - 8 lines] > various invoice "pieces" was common in my world, as an invoice was often > correlated with multiple shipments and warehouses). An excellent example of what I was talking about. In an SMB, or small business unit, there should be no staff to support the dbms, server, or any other IT function. Very part time support is all that should be necessary. This is one of the cost issues we discuss all the time.
Secondly, as a business person, an invoice _is_ a single object. I view its function much differently than the way IT might think of it. It has a singular purpose: to get cash for the company. Any discussion of an invoice needs to keep this in mind. Again decomposition/recomposition are issues.
> > I can't speak for Anthony and Dawn, but I place more value not on the > > original inputs but the original concept. An invoice _is_ something that [quoted text clipped - 9 lines] > think having to correlate multiple line-item attributes across multiple MV > attributes in a single File is nonsensical and error-prone. See my comment above. An invoice is a business object that serves a business purpose. Neither of us will ever get our payroll checks unless this invoice is handled as a business object. (remember, get the cash!)
> > It is an object in and of itself that needs no "chopping up", so to speak.
> Yes, it does. "Analysis" means chopping up. We gain power in chopping up. > Our problems are solvable when they're chopped; our solutions are scalable > and provable when they're chopped. [snipped]
This is true only when decomposition doesn't alter the fundamental characteristics of the object; otherwise analysis has tremendous risk introduced. In a business environment, IT personnel (especially DBAs) are not usually in a position to assess such risk. So why put them in this position?
My point is: since databases are such natural extensions of business, why make decomposition of business objects a requirement of storing data? Also, why make the language of databases so obscure to ordinary business people that new expertise, with its attendant costs, is required?
> > This is where simpler means don't destroy the properties of the invoice in > > order to make the data fit into an arbitrary data model with tautological [quoted text clipped - 9 lines] > into something with maximum power or a lesser degree of power. Perhaps you > gain short-term efficiency; in my experience with XML, you gain squat. Perhaps I am being a little unfair here. There are three fundamental rules in business and finance: 1) get the cash, 2) get the cash, and 3) get the cash. :-) Seriously, IT and databases provide support to a business. Their rules and nomenclature had better fit in with this environment otherwise their usefulness becomes less than cost effective. How much cost a business will tolerate is dependent on a number of factors.
I'm trying not to lose sight of the fundamental purpose of data and a dbms.
Bill
Dawn M. Wolthuis - 10 Jun 2004 00:40 GMT > > [SNIP] > > I've noticed that many people aren't interested in a better proposal, or > > even a different proposal. Dogma rules. :-) > > A fun movie... :-) indeed [I'm sure I've missed a bunch since my ISP first had nntp down and then seemed to reinitialize the database (is that the right term?) but I'll read a bit before a long weekend away from news again.]
> > The main reason others use different data models is that they allow a much > > closer interaction between the language of dbms and applications and the [quoted text clipped - 10 lines] > application languages more relational, rather than DBMSs more procedural (or > OO, if you like). And the likelihood of that is ... NIL (choosing not to use that NULL set designation). Why? Because people tend to choose solutions that work. If there were overwhelmingly good evidence that you get a better bang for the buck by using relational theory, that would be a different story. I'd strongly suggest we nudge relational databases toward pragmatism ;-)
> > Now, this advantage may not be what you are looking for. It may not be, > for [quoted text clipped - 7 lines] > developer, customer support, etc. And I still found the relational metaphor > (even though I had to use SQL) much easier than XML. Didn't some of that have to do with having to perform conversions to and from XML which might not have been necessary if the data were stored in the way it was sent? OR was it the loosey-gooseyness of it where there are not as many texts with rules for "how to"?
> I've never used Pick - > sounds like their environment gives them a lot of power, and while that's > nice, I'd still never think of thinking of an invoice as a single > proposition or "object". It's not. Perhaps you've never seen one? ;-)
> It's a fairly complex series of them. That too, but through how many portals would you want to have to go to collect all such? This has to do with how the "user" (application developer or dba, for example) should view the data.
> Just like an "order", an invoice is a fairly complex confluence of > phenomena, and not even a static one (modifications / confirmations to [quoted text clipped - 4 lines] > > original inputs but the original concept. An invoice _is_ something that > > usually has multiple items ordered. Yes and I'm trying to narrow that down a bit while trying to tap into just how I do database design given that I don't start with 1NF. It has to do with people, places and things and entities that are not functionally dependent on any other entities in the system. What is that top level of nodes after ENTITY in a system, such as PEOPLE, PLACES, THINGS?
> And I disagree. An invoice is many somethings. If your questions deal only > with the set (e.g. presenting an invoice on a screen), then great - treat it > as one. But when you're attempting to analyze the distribution of parts > across warehouses and across time, "viewing" the invoice as a number of > components is far, far more useful. I see where you are coming from. No, an invoice is just one of these things, but the data from the invoice is also available through other data portals (for lack of a better word -- don't make me use the word "view"!) such as warehouses and parts. I can see that one difference is that the same data from my perspective is available as an invoice and as parts-invoiced. These are different entities with the same or similar data accessed. Each portal can see everything you can "get to" from there (via declared links as one might have in a join statement).
> So it depends on your needs, but I'd far > rather place my bet on something that allows me to scale my queries and > reports to more detailed questions than one that restricts me. And I still > think having to correlate multiple line-item attributes across multiple MV > attributes in a single File is nonsensical and error-prone. I'll grant that there are pros and cons and not everyone designs an invoice identically no matter what the database, but when you add in the virtual fields (derived data or data found elsewhere), the INVOICE vocabulary for everyone has what it needs to show an invoice.
> > It is an object in and of itself that > > needs no "chopping up", so to speak. > > Yes, it does. "Analysis" means chopping up. We gain power in chopping up. and putting back together
> Our problems are solvable when they're chopped; our solutions are scalable > and provable when they're chopped. again, I think you are confusing something here -- perhaps physical and logical (although I think I've ascertained that would not be like you) but perhaps it is your notion that data can only be accessed through one place - its base relation. Remove that obstacle -- free yourself. Yes, we still divide it all up, but into wholes, not pieces.
> Domains are intellectually tractable when > they're separated. Holism may be fine in medicine (???) where human > psychology is involved, but any translation of a "real world" domain to an > automated system involves "chopping up." You can either acknowledge it and > chop in a rational way, or pay the price later on. yes, there is some chopping up and the functional dependency thing takes you quite far for that, even if you allow for both scalar values and compound ones (such as lists).
> While I'm not dogmatic about 1NF (believe it or not), or even relational, I > do believe based on experience that the balance point for using relational > is far, far sooner than critics would believe. Someday grasshopper ...
> > This is where simpler means don't destroy the properties of the invoice in > > order to make the data fit into an arbitrary data model with tautological [quoted text clipped - 4 lines] > if all they want to do is store the invoice, let's scan the thing into a JPG > and be done with it. No, the data needs to be available to other entities as well, as you pointed out.
> "Making the data fit" is also nonsense; whatever physical and logical model > you choose, you're pushing the data into something. You can either push it [quoted text clipped - 6 lines] > formula? A carmaker code? A digital certificate store? What's their "natural > form"? It is relational folks who become democratic about this and start thinking about understanding the nature of any particular noun outside of its use in "this" context. Define it based on its use and if a new use comes up, redefine it if necessary, otherwise add qualifiers to it.
> There is none. What we do is unnatural. (<insert unnatural-act joke here>) OK and it's funny, but never mind.
> > A data model that can do this has many advantages. > > That can do what - model arbitrary data in its "natural form", whatever that > means? I agree. If you show that to me, I'll use it. as entities. Still working on how to show it.
> > > I suspect there are simply different expectations; I'd rather > > > stretch the computer to avoid stretching humans in ways they're not good [quoted text clipped - 13 lines] > see far more benefit from the structures and declarative constraints of > relational. Have you found that when you map from xml to relational, you don't need to add anything to the information in your source, but when you go the other direction, you need to add data (such as ordering)?
> - erk Cheers! --dawn
mAsterdam - 10 Jun 2004 01:34 GMT > [I'm sure I've missed a bunch since my ISP first had nntp down and then > seemed to reinitialize the database (is that the right term?) but I'll read > a bit before a long weekend away from news again.] A shared databank of messages - check the glossary ... yep it's a database! Probably a hierarchical one (MV maybe? - nah just protocol), definitely not designed with the relational model in mind, though. I have no way of viewing it as tables - because the crosspost I made about "database - prolog and relational" to two newsgroups forces me to check both groups for replies.
It works quite well, though :-)
> ... people tend to choose solutions that work. If > there were overwhelmingly good evidence that you get a better bang for the > buck by using relational theory, that would be a different story. I'd > strongly suggest we nudge relational databases toward pragmatism ;-) Roman numerals still exist. They work quite well in some contexts. Besides, there is tradition. Do you know what the QWERTY keyboard was designed for?
> ... an invoice is just one of these > things, but the data from the invoice is also available through other data [quoted text clipped - 4 lines] > accessed. Each portal can see everything you can "get to" from there (via > declared links as one might have in a join statement). Yep. The guys (mostly) who check the deliveries simply can't afford having just the invoice as their unit of work. They need to do it item by item - yep it's there/no it's not.
> again, I think you are confusing something here -- perhaps physical and > logical (although I think I've ascertained that would not be like you) but > perhaps it is your notion that data can only be accessed through one place - > it's base relation. Remove that obstacle -- free yourself. Yes, we still > divide it all up, but into wholes, not pieces. So - let's pay the whole invoice or not *if* one minor item is not there? I guess it's a way of doing business - but I would prefer to not have the database implementation decision determine this business style.
> It is relational folks who become democratic about this and start thinking > about understanding the nature of any particular noun outside of its use in > "this" context. Define it based on its use and if a new use comes up, > redefine it if necessary, otherwise add qualifiers to it. The first department to get a database wins. The rest has to jiggle their stuff into the imposed hierarchy.
> Have you found that when you map from xml to relational, you don't need to > add anything to the information in your source, but when you go the other > direction, you need to add data (such as ordering)? If the order is *that* important, you can model it, but indeed most relational modellers have a blind spot there. However, getting rid of the possible contradictions is much more difficult.
Dawn M. Wolthuis - 10 Jun 2004 14:40 GMT > > [I'm sure I've missed a bunch since my ISP first had nntp down and then > > seemed to reinitialize the database (is that the right term?) but I'll read [quoted text clipped - 10 lines] > > It works quite well, though :-) I'm sure it would work much better if implemented in Oracle, but, ah well ... ;-)
> > ... people tend to choose solutions that work. If > > there were overwhelmingly good evidence that you get a better bang for the [quoted text clipped - 4 lines] > Besides, there is tradition. > Do you know what the QWERTY keyboard was designed for? I was told once it was to keep the mechanical hammers attached to the keys from hitting each other, so keys you would likely hit one after the other were placed so that they were not close together.
> > ... an invoice is just one of these > > things, but the data from the invoice is also available through other data [quoted text clipped - 19 lines] > to not have the database implementation decision > determine this business style. I'm still not saying this both accurately and clearly. I'll think about it some more. There is no problem paying one line item from an invoice and I'm not sure why you think there would be. Again, this is a logical way of looking at the data, but if you looked at a physical implementation, such as a paper invoice form, does it seem difficult to you to check off one line item from that form? Would it be easier conceptually or in any way for this to come on multiple sheets of paper so you could retrieve the one piece of paper related to this line item and check it off that way?
> > It is relational folks who become democratic about this and start thinking > > about understanding the nature of any particular noun outside of its use in [quoted text clipped - 3 lines] > The first department to get a database wins. > The rest has to jiggle their stuff into the imposed hierarchy. Not at all! Dept #2 identifies their major entities, some of which might align with Dept #1, others of which might be able to see information that Dept #1 maintains. There actually is no issue whatsoever that crops up here. There could be the usual types of changes that need to be made -- adding files, fields, functions, but it works just fine and again I'll have to think of how to make that perfectly clear.
> > Have you found that when you map from xml to relational, you don't need to > > add anything to the information in your source, but when you go the other [quoted text clipped - 3 lines] > most relational modellers have a blind spot there. > However, getting rid of the possible contradictions is much more difficult. As Wol has said, you can take any PICK database and view it as relational, but you can't go the other way around. If you could, then this discussion would be moot -- we could just toggle between different perspectives on the data. Now, it is possible to design a relational database that can do that, but you have to design for it. If you had a relational data modeler and a PICK data modeler both address the same problem space, I would not be surprised if the PICK modeler actually encoded more information than the relational one. One example that pops to mind is with classifiers -- the relational modeler who identifies that 90% of the time a single entity fits into a single classification, and that if it doesn't then the user can pick one without anyone dying, will likely then proceed to make that a rule by putting a classification code as an attribute on that entity, rather than splitting it out into a separate table for the few times when the entity could be classified in two ways. The PICK modeler doesn't blink and identifies that this classification code is multivalued -- if you need to put two classification codes on this entity, you do so. Overly simplified example, but ...
Later. --dawn
Eric Kaun - 10 Jun 2004 20:57 GMT > > The first department to get a database wins. > > The rest has to jiggle their stuff into the imposed hierarchy. [quoted text clipped - 5 lines] > adding files, fields, functions, but it works just fine and again I'll have > to think of how to make that perfectly clear. I understand what you're saying - but as the number of departments (or even job roles within a department) demands different views of the data, I believe that whatever vocabularies you layer on top, your "base" data design tends toward normalized relations.
> As Wol has said, you can take any PICK database and view it as relational, > but you can't go the other way around. If you could, then this discussion > would be moot -- we could just toggle between different perspectives on the > data. He did say that, and I've been thinking about it, and am not sure it's accurate. The order of values in a list attribute in a Pick file seems primarily to correlate with other attributes that relate to the same "nested" entity - e.g. a line item. Those can easily be spit out in correlated lists by foreign key traversal. Other ordering would have to be imposed, and maybe that's where the discrepancy is. Relational requires that if order is important, you make it an attribute. I've never found such to be a problem - in most cases, orderings are pseudo-IDs.
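A minimal sketch of that point, assuming a made-up record layout (the field names `items` and `qty` and the helper names `unnest` and `nest` are hypothetical, not Pick syntax): correlated multivalued attributes can be spat out as flat rows, and the list position becomes an ordinary attribute so the ordering is not lost.

```python
VM = "]"  # stand-in for the value mark, as printed in the thread's examples


def unnest(record, correlated_fields):
    """Turn correlated multivalued attributes into flat rows, one per position."""
    columns = [record[field].split(VM) for field in correlated_fields]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("correlated attributes must hold the same number of values")
    # The position is stored as a plain attribute, so order survives in a relation.
    return [(pos, *values) for pos, values in enumerate(zip(*columns), start=1)]


def nest(rows):
    """Rebuild the multivalued strings from rows, using the position attribute."""
    ordered = sorted(rows)  # rows may arrive in any order; position restores it
    return [VM.join(column) for column in zip(*(row[1:] for row in ordered))]


# Hypothetical invoice record with two correlated lists (line items and quantities).
invoice = {"items": "WIDGET]GADGET]SPROCKET", "qty": "2]10]1"}
rows = unnest(invoice, ["items", "qty"])
for row in rows:
    print(row)
```

Because the position is carried as data, `nest` can rebuild the original lists even if the rows come back in a different order, which is the round trip being argued about here.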
> Now, it is possible to design a relational database that can do that, > but you have to design for it. I would not be surprised if you had a [quoted text clipped - 10 lines] > to put two classification codes on this entity, you do so. Overly > simplified example, but ... That's a good example, though... I'll have to give that some thought. The question is whether any power is gained by using another relation, since it's slightly more work; I'm assuming that the classification codes themselves are stored in another relation/file, and thus you want some referential integrity so nonexistent codes don't get entered...
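To make the trade-off concrete, here is a rough sketch using Python's built-in `sqlite3` module. The table and column names (`classification`, `entity`, `entity_classification`) are invented for illustration; the point is only that splitting the codes into their own relation lets an entity carry several codes while the DBMS rejects codes that do not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE classification (code TEXT PRIMARY KEY);
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (entity, code): an entity may have one code or several.
    CREATE TABLE entity_classification (
        entity_id INTEGER NOT NULL REFERENCES entity(id),
        code      TEXT    NOT NULL REFERENCES classification(code),
        PRIMARY KEY (entity_id, code)
    );
    INSERT INTO classification VALUES ('A'), ('B');
    INSERT INTO entity VALUES (1, 'sample entity');
""")


def classify(entity_id, code):
    """Try to attach a classification code; return True if the DBMS accepts it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO entity_classification VALUES (?, ?)",
                (entity_id, code),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Two valid codes on the same entity, then a code that was never defined.
results = [classify(1, "A"), classify(1, "B"), classify(1, "Z")]
print(results)
```

The extra relation costs one more table and a join, but the check against nonexistent codes is declared once rather than repeated in each program that writes classifications.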
- Eric
Laconic2 - 11 Jun 2004 12:09 GMT > I understand what you're saying - but as the number of departments (or even > job roles within a department) demands different views of the data, I > believe that whatever vocabularies you layer on top, your "base" data design > tends toward normalized relations. I believe what you say. I also believe that this is what databases are for: sharing data between organizations that don't have a common view of the data being shared.
Half of the databases being built today should have been built using file systems. It would have been faster and cheaper. And there's no significant sharing being done. It's all encapsulated in a single subsystem.
> He did say that, and I've been thinking about it, and am not sure it's > accurate. The order of values in a list attribute in a Pick file seems [quoted text clipped - 4 lines] > if order is important, you make it an attribute. I've never found such to be > a problem - in most cases, orderings are pseudo-IDs. Months ago, I asked whether a pizza with pepperoni and onion was the same as a pizza with onion and pepperoni.
I got several cute responses, but nobody really addressed the underlying issue. Sounds like you've got a handle on it.
Eric Kaun - 11 Jun 2004 21:39 GMT > > I understand what you're saying - but as the number of departments (or > even [quoted text clipped - 11 lines] > And there's no significant sharing being done. It's all encapsulated in a > single subsystem. I agree with you from one viewpoint; on the other hand, an RDBMS doing its job (we have a bit of an employment gap in that regard) would also help your design (not just data design); I'm thinking of the ability of a TRDBMS to encode business rules. If that engine also had a significant client-side presence (which it should!), you'd be doing design in a more general sense than just "data design".
> Months ago, I asked whether a pizza with pepperoni and onion was the same > as a pizza with onion and pepperoni. > > I got several cute responses, but nobody really addressed the underlying > issue. Sounds like you've got a handle on it. It's a slippery handle, but maybe - but be careful asking about "the same as" in an OO context - that subject gets very confusing to OOers. :-)
A related and interesting issue is that of relation-valued attributes as primary keys; for example, from one of Date's non-free papers, a relation with a single column: a relation of siblings. Since in a relation order is irrelevant, you couldn't insert the tuple ( {Eric, Curt, Amy} ) if the relation already contained ( {Amy, Curt, Eric} ), for example. He did a similar thing with prime factors; the relation consisted of 2 columns: Integer and {Integer}.
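A small Python sketch of the sibling example, with `frozenset` standing in for a relation-valued (set-valued) attribute. This is only an analogy for the idea, not Date's notation, and the `insert` helper is made up for the illustration.

```python
# The relation is a set of 1-tuples; each tuple holds one set of sibling names.
siblings = set()


def insert(relation, names):
    """Insert a tuple whose only attribute is a set; reject duplicates."""
    candidate = (frozenset(names),)
    if candidate in relation:
        return False  # same set of names, whatever order they were given in
    relation.add(candidate)
    return True


first = insert(siblings, ["Amy", "Curt", "Eric"])
second = insert(siblings, ["Eric", "Curt", "Amy"])  # same set, different order
print(first, second, len(siblings))
```

The second insert is refused because the two tuples are equal as sets, which is the sense in which order is irrelevant to the key.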
Anyway, I'm rambling...
- erk
Laconic2 - 11 Jun 2004 23:07 GMT > > Months ago, I asked whether a pizza with pepperoni and onion was the same > > as a pizza with onion and pepperoni. [quoted text clipped - 14 lines] > > Anyway, I'm rambling... I don't think it's rambling at all... It's precisely where I was heading with the question.
There's a second question, along the same lines.
In the recent Pick example, showing an invoice, there's a list of account numbers, and a correlated list of amounts. That is, the second amount "goes with" the second account number. But, in the earlier pizza pick example we had a list of three toppings and an uncorrelated list of three cheeses. Now my question is this: how the heck do you know that in one case the two lists are correlated and in the other example they are uncorrelated?
Are you "just expected to know" the logical structure of invoices and pizzas enough to draw this inference? Not that there aren't things you "just have to know" in a schema of tables, but the Pick people treat it as though it's "intuitively obvious". Maybe to an SME, but maybe not to everybody else.
Bill H - 12 Jun 2004 02:36 GMT Sir:
This is a good question.
> There's a second question, along the same lines. > [quoted text clipped - 11 lines] > but the Pick people treat it as though it's "intuitively obvious". Maybe to > an SME, but maybe not to everybody else. The database side doesn't normally enforce this relationship (it could be enforced with a trigger). However, considering the number of business rules associated with such a module, and the fact that the data is usually managed from a single application, these rules are best kept in the application code. This is because the business person is much closer to the application and database, and its tools. The database nomenclature is not unique and words mean what they've always meant (i.e. no one refers to a "row" or a "column" when referencing a customer or a list of their outstanding invoices).
The field definitions are where the descriptions of the field are kept. Any such relationships that exist (such as field#s 9 & 10 below) are also kept in the field definition. Again, it is not the database that enforces these rules, it's the application. You might see the following:
009 1010]1020]1050]1090
010 2500]32500]17525]15
011 9]12]33]34]35]36]37]38]39
in a customer record where:
009 - The G/L acct#s of recurring monthly billings (such as support fees).
010 - The recurring billing amount for each G/L acct#.
011 - The unpaid invoices still associated with this customer.
This is very usual and a single disk read gets the salient properties of the customer record. The dictionary for the G/L acct#s may be defined as being the controlling field with a relationship to field# 10 while field# 10 is dependent on field# 9.
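As a rough illustration only: the sketch below models the field definitions as a plain Python dict. The keys `controls` and `depends_on`, the field names, and the helper `associated_rows` are all invented here and are not real Pick dictionary syntax. It shows how a program could learn from the definitions, rather than from hard-coded knowledge, that fields 9 and 10 pair up position by position while field 11 stands alone.

```python
VM = "]"  # stand-in for the value mark, as printed in the example record

# The customer record from the example, keyed by attribute number.
record = {
    9: "1010]1020]1050]1090",
    10: "2500]32500]17525]15",
    11: "9]12]33]34]35]36]37]38]39",
}

# Hypothetical field definitions: field 9 controls field 10; field 11 is independent.
dictionary = {
    9: {"name": "GL.ACCT", "controls": [10]},
    10: {"name": "GL.AMT", "depends_on": 9},
    11: {"name": "OPEN.INVOICES"},
}


def associated_rows(record, dictionary, controlling_field):
    """Pair up a controlling field with the fields its definition says it controls."""
    fields = [controlling_field] + dictionary[controlling_field].get("controls", [])
    columns = [record[field].split(VM) for field in fields]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("associated fields must hold the same number of values")
    return list(zip(*columns))


billings = associated_rows(record, dictionary, 9)
open_invoices = associated_rows(record, dictionary, 11)
print(billings)
```

Note that in this sketch the length check lives in the helper, so every program that reads or writes the record has to go through it for the association to hold.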
As you can tell, a well defined mvDbms application uses the field definitions to describe the data (as it should be) and relationships with other data (or other tables for that matter). Naturally, the field definitions are nothing more than data maintained in the database since they're just data too. :-)
Bill
Mikito Harakiri - 12 Jun 2004 03:04 GMT > The field definitions are where the descriptions of the field are kept. Any > such relationships that exist (such as field#s 9 & 10 below) are also kept [quoted text clipped - 4 lines] > 010 2500]32500]17525]15 > 011 9]12]33]34]35]36]37]38]39 Is "]" the hole in the punch card that you store those records on? How do you manage the space? Never mind, you could code it as 2 adjacent holes!
Gene Wirchenko - 13 Jun 2004 02:00 GMT >Sir: > [quoted text clipped - 13 lines] >> Are you "just expected to know" the logical structure of invoices and >> pizzas enough to draw this inference? From what Bill H wrote below, it appears he thinks so.
>> Not that there aren't things you "just have to know" in a schema of >tables, [quoted text clipped - 7 lines] >from a single application, these rules are best kept in the application >code. This is because the business person is much closer to the application Every application module that deals with that relationship is going to have to have that code. If just one of them gets it wrong, trouble. If the rule changes, trouble.
That is why it would be better to put it in the database. Do it once, and do it right.
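A hedged sketch of "do it once" using Python's built-in `sqlite3` module. The table `recurring_billing` and the two "modules" are hypothetical stand-ins for separate application programs. The rule that every account has a positive amount is declared in the schema, so a module that gets it wrong is stopped by the database rather than by code copied into each module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE recurring_billing (
        customer_id INTEGER NOT NULL,
        gl_acct     TEXT    NOT NULL,
        amount      INTEGER NOT NULL CHECK (amount > 0),
        PRIMARY KEY (customer_id, gl_acct)
    )
""")


def module_a(customer_id, gl_acct, amount):
    """A well-behaved application module."""
    conn.execute(
        "INSERT INTO recurring_billing VALUES (?, ?, ?)",
        (customer_id, gl_acct, amount),
    )


def module_b(customer_id, gl_acct):
    """A module with a bug: it forgets to supply the amount."""
    conn.execute(
        "INSERT INTO recurring_billing (customer_id, gl_acct) VALUES (?, ?)",
        (customer_id, gl_acct),
    )


def attempt(module, *args):
    """Run a module inside a transaction and report what the database decided."""
    try:
        with conn:  # commits on success, rolls back on error
            module(*args)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


outcomes = [
    attempt(module_a, 1, "1010", 2500),  # valid row
    attempt(module_b, 1, "1020"),        # missing amount violates NOT NULL
    attempt(module_a, 1, "1010", 99),    # duplicate account violates PRIMARY KEY
]
print(outcomes)
```

If the rule changes, only the table definition changes; neither module needs to be edited to stay consistent with it.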
I have an app where I do not have the integrity rules coded in the database. It is all in the application code. It is biting me very badly right now. It made sense at the time (or rather more accurately, it did not make as much non-sense at the time), but I am certainly feeling it now.
Now, when I go to change code of this sort, any that is in more than one place is taking me a lot of time to change.
It starts off being easy to make changes, and then it gradually grows to the point where it is not so easy. Then, it can get rather awkward.
>and database, and its tools. The database nomenclature is not unique and >words mean what they've always meant (i.e. noone refers to a "row" or a >"column" when referencing a customer or a list of their outstanding >invoices). No one? You are sure that it is impossible? "This column is..." or "This row has the subtotals for...".
[snip]
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Bill H - 14 Jun 2004 04:56 GMT Gene:
Perhaps I was not specific enough.
> >> Are you "just expected to know" the logical structure of invoices and > >> pizzas enough to draw this inference? > > From what Bill H wrote below, it appears he thinks so. Not at all. A field definition defines a field. It also defines any relationships between fields and multiple values within those fields. So my example of an A/P invoice with G/L accts and amounts would be defined as being related. As Dawn indicates one could reference the values singularly (as pairs) or as a whole.
> Every application module that deals with that relationship is > going to have to have that code. If just one of them gets it wrong, > trouble. If the rule changes, trouble. No, they just have to know of the relationship, which is defined in the field definition(s).
> That is why it would be better to put it in the database. Do it > once, and do it right. That's where the definition resides.
> I have an app where I do not have the integrity rules coded in > the database. It is all in the application code. It is biting me > very badly right now. It made sense at the time (or rather more > accurately, it did not make as much non-sense at the time), but I am > certainly feeling it now. Understandably so. Those kinds of constraints can be loaded into the database with a trigger; if that's what one wants.
> >and database, and its tools. The database nomenclature is not unique and > >words mean what they've always meant (i.e. no one refers to a "row" or a [quoted text clipped - 3 lines] > No one? You are sure that it is impossible? "This column is..." > or "This row has the subtotals for...". Well, no one within the management and administration group of the business.
:-) Bill
Gene Wirchenko - 14 Jun 2004 17:04 GMT [snip]
Well, I may be having the chance to work with some of this first-hand. I have a job interview today, and the job description included mention of a hierarchical DBMS. It will be an interesting contrast.
>> >and database, and its tools. The database nomenclature is not unique and >> >words mean what they've always meant (i.e. noone refers to a "row" or a [quoted text clipped - 6 lines] >Well, noone within the management and administration group of the business. >:-) Come now. "row" and "column" are ordinary English words. Try showing a child how to add multi-digit numbers without using the word "column". It is possible, but I submit that it is much easier to use "column". 1s and 10s columns and all that.
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Eric Kaun - 14 Jun 2004 17:19 GMT > Sir: > [quoted text clipped - 24 lines] > from a single application, these rules are best kept in the application > code. This is, more than anything, the philosophical divide between relational and Pick folks. The more rules, the more they should be kept OUT of the application code. "Application" means just that: a judicious application. Of what? Rules. Application != definition, just as implementation != specification.
> This is because the business person is much closer to the application > and database, and its tools. Their closeness is irrelevant; they should of course be given tools that let them do their job. But encoding the rules in those tools, as opposed to having those tools generated from and respectful of the rules, is a big difference.
Granted that some rules should be configurable; that doesn't imply that all should be. The business, after all, has (or needs!) some structure.
> The field definitions are where the descriptions of the field are kept. Any > such relationships that exist (such as field#s 9 & 10 below) are also kept [quoted text clipped - 21 lines] > definitions are nothing more than data maintained in the database since > they're just data too. :-) While I see many examples like the above, can you give us an example of how the dictionary defines those? What language do you use to define the dictionary? Is it user-accessible?
- erk
Laconic2 - 14 Jun 2004 17:54 GMT Was: In an RDBMS, what does "Data" mean?
> This is, more than anything, the philosophical divide between relational and > Pick folks. The more rules, the more they should be kept OUT of the > application code. "Application" means just that: a judicious application. Of > what? Rules. Application != definition, just as implementation != > specification. It isn't just the Pick folks. The OO folks also feel that the business rules belong encapsulated inside the objects that "really know what's going on", as opposed to formalized as metadata and shared the same way data is shared.
In the days when databases were being spread to the old COBOL and files gang, this divide was called the difference between "process centric" and "data centric" views of the world. I think it's really the same divide, over and over again.
It even happens within the RDBMS vendors. I've been watching SQL gradually evolve from a bad answer to the requirement for a "universal data sublanguage" into a bad programming language, in its own right.
Bill H - 14 Jun 2004 19:41 GMT Laconic2:
It's amazing how one discovers this when one gets a little more mature (in age). :-)
Bill
"Laconic2" <laconic2@comcast.net> wrote in message
> In the days when databases were being spread to the old COBOL and files > gang, this divide was called the difference between "process centric" and > "data centric" views of the world. I think it's really the same divide, > over and over again.
Eric Kaun - 14 Jun 2004 21:01 GMT Yes, you're right on all counts.
It seems that at some point in the history of computing, software developers decided to traipse down the path of implementation, rather than the other fork: declarative logic. Somehow thinking like a processor, juggling long procedures and registers (objects), is deemed better than writing engines / JVMs / compilers that take declarative statements and generate the necessary procedures.
> Was: In an RDBMS, what does "Data" mean? > [quoted text clipped - 19 lines] > evolve from a bad answer to the requirement for a "universal data > sublanguage" into a bad programming language, in its own right.
Dawn M. Wolthuis - 14 Jun 2004 22:36 GMT > Was: In an RDBMS, what does "Data" mean? > [quoted text clipped - 10 lines] > on", as opposed to formalized as metadata and shared the same way data is > shared. Although then "they" (or is that "we"?) start spec'ing things as parms, perhaps pulled into the OO code by way of xml documents. You can send inputs to pre-existing functions and expect outputs (declarative) and/or write functions along with the inputs and outputs (procedural). Any way you cut it, there are functions that get executed. Those can be written by the Oracle corporation so that if you decouple your assets from the products of that company you have less than an entire solution, or you can write the functions in a language that doesn't give you quite the same ties to a single corporation (such as Java -- and of course we could then argue the merits of the Java approach, which doesn't interest me so much as discussing the merits of the Oracle approach). Or we could store rules/constraints as data in the database -- declarative -- and then write functions for that part of the application rather than having the database provide those -- procedural.
> In the days when databases were being spread to the old COBOL and files > gang, this divide was called the difference between "process centric" and > "data centric" views of the world. I think it's really the same divide, > over and over again. yup, definitely
> It even happens within the RDBMS vendors. I've been watching SQL gradually > evolve from a bad answer to the requirement for a "universal data > sublanguange" into a bad programming language, in its own right. So, how should we fix the situation or is declarative vs procedural a matter of taste? --dawn
mAsterdam - 15 Jun 2004 02:08 GMT >>>This is, more than anything, the philosophical divide >>>between relational and Pick folks. The more rules, [quoted text clipped - 41 lines] > declarative vs procedural a matter > of taste? --dawn Anybody running an operation with a large shared databank, in practice, has had to bridge this gap. I haven't seen it done in theory yet, though.
Can we fix it? (in theory, that is) -- I am positive we can. But it will take a lot of unlearning. "They" and "We" does not help.
Eric Kaun - 15 Jun 2004 16:28 GMT > So, how should we fix the situation or is declarative vs procedural a matter > of taste? --dawn Declarative is better; there are enough different styles of declarative to satisfy many (not all) of those who find, say, Prolog distasteful. It's simply easier to lapse into procedural; you quickly find yourself in a quagmire, but that apparently is a lesson not easily learned (even by those who've been through one quagmire after another). It's that resistance to learning abstraction that's made me somewhat less tolerant of bad code than I used to be... the knowledge that in most cases, they'll just do the same type of thing again. And I'm even less tolerant of my own bad code... but until we use our procedural abilities to write "engines" that interpret declarations, we'll keep writing spaghetti. The trouble, of course, is that those focusing on procedural (and OO) generally don't see the value in bothering to deal with declarations at all.
What we're trying to accomplish is basic logic and computation; the restrictions of the languages we use, and the adherence to algorithmic thinking, keep us from advancing very far.
And, of course, the above is all just hand-waving and generalities, though generally true. I've just been debugging some horrific splicings of Java and InstallAnywhere (a rotten package with a GUI and no language at all), and am in a foul mood...
- erk
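As a rough illustration of what an "engine" that interprets declarations might look like, here is a minimal sketch in Python. The rule format, the field names, and the `violations` helper are all invented for this example; nothing here comes from a real product, and a real engine would need far more than this.

```python
import operator

# Hypothetical rule format: (field, operator name, operand).
# The rules are plain data; nothing below says when or where to check them.
RULES = [
    ("quantity", "ge", 1),
    ("status", "in", {"open", "shipped", "closed"}),
]

# The "engine": a small, fixed piece of procedural code that gives
# each operator name its meaning.
OPERATORS = {
    "ge": operator.ge,
    "le": operator.le,
    "eq": operator.eq,
    "in": lambda value, allowed: value in allowed,
}


def violations(record, rules=RULES):
    """Return the rules that the record fails; an empty list means it passes."""
    return [
        (field, op, operand)
        for field, op, operand in rules
        if not OPERATORS[op](record[field], operand)
    ]
```

Adding a rule means adding a tuple to `RULES`, not writing another procedure, which is roughly the trade being argued for above.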
Dawn M. Wolthuis - 15 Jun 2004 17:56 GMT > > So, how should we fix the situation or is declarative vs procedural a > matter [quoted text clipped - 21 lines] > InstallAnywhere (a rotten package with a GUI and no language at all), and am > in a foul mood... Take a deep breath and then delegate ;-)
Both relational and declarative seem rather obvious choices to you and neither does to me. My issues with declarative include:
1) There seem to be no standards for the black box that does something with the declarations. I don't need standards-committees with years of process to adopt a standard -- I don't even know anything that would help ensure portability of such declarations. SQL has come the closest and does have standards, but we all know you can't just take any code you write against one database and run it against another.
2) It doesn't read like English -- the verbs are missing, for example. I'd like to keep some of Grace Hopper's goal alive of writing code that human beings can read.
3) While hiding much that should be hidden, it "feels like" so much gets hidden that people spend time trying to figure out how it does things in order to be good at writing declarations.
4) Invariably functions become one of the things to get specified. If we are going to specify both data and functions, which, after all, is what needs to happen, then what benefit is there to specifying a function and specifying where it is to be used rather than specifying the function and then using it where it needs to be used?
It's all about data; It's all about functions. --dawn
Laconic2 - 15 Jun 2004 18:49 GMT > It's all about data; It's all about functions. --dawn Congratulations. You've just reinvented LISP.
Laconic2 - 15 Jun 2004 18:55 GMT > 2) It doesn't read like English -- the verbs are missing, for example. I'd > like to keep some of Grace Hopper's goal alive of writing code that human > beings can read This is where I agree with you.
This is where the "priesthood" consistently underestimates the "laity". In their ability to understand "code" and in the value of writing code they can read.
I would say the same applies whether it's declarative or imperative.
> 3) While hiding much that should be hidden, it "feels like" so much gets > hidden that people spend time trying to figure out how it does things in > order to be good at writing declarations That's because people are trying to be too clever by half.
Gene Wirchenko - 15 Jun 2004 22:01 GMT [snip]
>Both relational and declarative seem rather obvious choices to you and >neither does to me. My issues with declarative include: [quoted text clipped - 5 lines] >standards, but we all know you can't just take any code you write against >one database and run it against another. Let it be implemented in the manner that the implementer determines is best for the given implementation. Not having to deal with the physical level helps considerably with abstraction.
>2) It doesn't read like English -- the verbs are missing, for example. I'd >like to keep some of Grace Hopper's goal alive of writing code that human >beings can read Because it is not English? French does not read like English either.
I can read a lot of code much more easily than English. English can be very ambiguous.
>3) While hiding much that should be hidden, it "feels like" so much gets >hidden that people spend time trying to figure out how it does things in >order to be good at writing declarations Typically, a waste of time. Take variable declarations. All I need know is the behaviour of that type. I do not need to know the details of implementation in order to use the type. If someone wants to get to that level of detail, fine, but it is not required.
I think it is a confusion of not knowing the appropriate theory and thinking that examining the implementation will give that information. It will not.
>4) Invariably functions become one of the things to get specified. If we >are going to specify both data and functions, which, afterall, is what needs >to happen, then what benefit is there to specifying a function and >specifying where it is to be used rather than specifying the function and >then using it where it needs to be used? You do not get bit when the function[ality] is required in four places and you put it in only three of them. With declaration, the programming system takes care of where.
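The "four places, only three of them" problem can be sketched in Python. This is only an analogy, assuming a made-up `Account` class and a hypothetical `Checked` descriptor; it is not how any particular DBMS works. The rule is declared once on the attribute, and every code path that assigns to it is checked without having to remember to call anything.

```python
class Checked:
    """Descriptor: the rule is declared once and applied on every assignment."""

    def __init__(self, predicate, message):
        self.predicate = predicate
        self.message = message

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        # Reject the value before it is stored, whichever method assigned it.
        if not self.predicate(value):
            raise ValueError(f"{self.name}: {self.message}")
        obj.__dict__[self.name] = value


class Account:
    # The declaration: stated once, next to the data it governs.
    balance = Checked(lambda v: v >= 0, "must not be negative")

    def __init__(self, balance):
        self.balance = balance

    # Neither method below mentions the rule, yet both are covered by it.
    def withdraw(self, amount):
        self.balance = self.balance - amount

    def transfer_out(self, amount):
        self.balance -= amount
```

A fifth method added next year is covered too, which is the point about the programming system taking care of "where".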
[snip]
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Tony - 16 Jun 2004 11:30 GMT > 2) It [declarative code] doesn't read like English -- the verbs are missing, for example. I'd > like to keep some of Grace Hopper's goal alive of writing code that human > beings can read So why not add "Ensure that " or similar in front of the rule?
Laconic2 - 16 Jun 2004 14:10 GMT was: One Ring to Bind them
> > 2) It [declarative code] doesn't read like English -- the verbs are missing, for example. I'd > > like to keep some of Grace Hopper's goal alive of writing code that human > > beings can read > > So why not add "Ensure that " or similar in front of the rule? You are right, Tony. At first I agreed with Dawn's statement, because I want to keep that goal alive, as well.
But the verbs aren't missing from "declarative" sentences. The declarative mood is a feature of a verb. A declarative sentence has a verb. It's not an imperative verb, but it's a verb.
Tony Douglas - 16 Jun 2004 12:10 GMT <snip>
> Take a deep breath and then delegate ;-) > [quoted text clipped - 7 lines] > standards, but we all know you can't just take any code you write against > one database and run it against another. Well, I'm not sure about that; if you like Prolog style declarativeness, then that's already subject to an ISO standard. Alternatively, if you like Haskell, then the Haskell Standards Committee does a good job of keeping that in good order. So, if either of those is used for declarative constraints then yes, you should be able to port around. (Although you might like to have an argument about the "declarativeness" of Prolog when cut is used.)
> 2) It doesn't read like English -- the verbs are missing, for example. I'd > like to keep some of Grace Hopper's goal alive of writing code that human > beings can read This is no bad thing - what makes English excellent for writing poetry and novels makes it hopeless for writing any sort of formal prose - either systemised for computers or systemised for lawyers.
> 3) While hiding much that should be hidden, it "feels like" so much gets > hidden that people spend time trying to figure out how it does things in > order to be good at writing declarations I agree with the other poster - this is because people are trying to be too clever for their own good.
> 4) Invariably functions become one of the things to get specified. If we > are going to specify both data and functions, which, afterall, is what needs > to happen, then what benefit is there to specifying a function and > specifying where it is to be used rather than specifying the function and > then using it where it needs to be used? I'm a little confused by this point; care to expand on this ?
> It's all about data; It's all about functions. --dawn Cheers,
- Tony
Eric Kaun - 16 Jun 2004 15:45 GMT > Take a deep breath and then delegate ;-) Unfortunately, I am the delegate...
> Both relational and declarative seem rather obvious choices to you and > neither does to me. I don't expect everyone to agree with me; I'll just point out that I've never worked in academia, and my opinion (however wrong) is based strictly on commercial work (some internal for the company, some for resale to customers) in the manufacturing, media, and print industries.
> My issues with declarative include: > [quoted text clipped - 4 lines] > standards, but we all know you can't just take any code you write against > one database and run it against another. True enough; SQL is so convoluted, and the standard so large, that it's difficult to implement. Contrast with the J2EE spec which, while also large (and overwritten in Sun's usual verbose style, writing 100 pages where 10 would do), is implemented fairly quickly by commercial and open-source vendors within 6 months of its release. SQL is atrocious.
That said, the standard for the black box should be a coherent spec, which implies the need for a coherent language.
> 2) It doesn't read like English -- the verbs are missing, for example. I'd > like to keep some of Grace Hopper's goal alive of writing code that human > beings can read I think that dream is dead. English is a poor basis for automation, and defining a useful subset would get us into even murkier water than the c.d.t glossary. Nonetheless, relational offers a more useful basic structure (the predicate, which is a sentence) than Pick/MV, which aspires to nouns. Objects also aspire to be nouns, leaving verbs and sentences "encapsulated", whatever that means (I know what it means, but doubt its utility in all things).
> 3) While hiding much that should be hidden, it "feels like" so much gets > hidden that people spend time trying to figure out how it does things in > order to be good at writing declarations A valid criticism. I could chalk much of that up to the operationally-minded education of most programmers ("we're going to teach you to program just like a compiler writes machine code!"), but it is true that it's harder initially. I just think it would pay off, and that the learning curve, while somewhat steep, has dividends (and not just in the long term; perhaps in the "medium" range).
> 4) Invariably functions become one of the things to get specified. If we > are going to specify both data and functions, which, afterall, is what needs > to happen, But I draw a distinction between specifying a function and implementing it operationally. I can code a function in Java (even though I have to attach it to a class), but contrast that with any algebraic specification of a function; maybe something like Prolog, where you are simply defining terms using input patterns (pardon the gross oversimplification). True, you may have to optimize later; but you don't have to decide on the optimization (or on a sub-optimal algorithm) early.
> then what benefit is there to specifying a function and > specifying where it is to be used rather than specifying the function and > then using it where it needs to be used? I'm not sure what you mean here. If you're arguing for the separation of data and function, I tend to agree; however, that's a very non-OO position to take... is that what you intended?
- erk
Dawn M. Wolthuis - 16 Jun 2004 16:20 GMT <snip> <snip>
> My issues with declarative include: > > [quoted text clipped - 14 lines] > That said, the standard for the black box should be a coherent spec, which > implies the need for a coherent language. One issue ties back into the UI used by a developer for specifying anything. We aren't just talking about languages that are typed in, but some drawn with boxes, some spec'd with drop-down boxes, etc, so that the data collected for a specification (whether declarative or not) is related to the proprietary UI. Somehow we need to get not only a language captured, but also an IDE (bad word for me), sort of. So, there are standards being developed for IDEs. Whether a declarative or procedural or OO language flows from the UI, the user (developer) ends up specifying data that gets stored by the toolset as well as generating anything that might be needed. That was said in a convoluted way, but my point is that language standards are not enough to protect my investment as soon as I opt for a tool from Oracle or IBM or MS or whomever.
> > 2) It doesn't read like English -- the verbs are missing, for example. > I'd [quoted text clipped - 8 lines] > whatever that means (I know that it means, but doubt its utility in all > things). and the others who spoke up were right that a declarative language (such as SQL) does have declarations that are like sentences (SELECT this, that FROM arelation) while a language that is spec'd -- options chosen -- is a set of parms (kinda RPG-like for those who don't think that means role-playing games). And I do agree that English as we speak it is not the goal, but just like the Palm Pilot taught us to write so it could understand, I do think that is a reasonable way to approach a language. COBOL has its charm. Java, along with other modern languages, is unnecessarily cryptic to the seasoned IT professional picking up the language for the first time.
> > 3) While hiding much that should be hidden, it "feels like" so much gets > > hidden that people spend time trying to figure out how it does things in [quoted text clipped - 3 lines] > education of most programmers ("we're going to teach you to program just > like a compiler writes machine code!"), I'd argue that it is not just the instruction, but the nature of people that we think in terms of what to do in what order.
> but it is true that it's harder > initially. I just think it would pay off, and that the learning curve, while
> somewhat steep, has dividends (and not just in the long term; perhaps in the > "medium" range). [quoted text clipped - 7 lines] > operationally. I can code a function in Java (even though I have to attach > it to a class) or upset the OO purists and write a class that IS a function ;-)
> , but contrast that with any algebraic specification of a
> function; maybe something like Prolog, where you are simply defining terms > using input patterns (pardon the gross oversimplification). True, you may [quoted text clipped - 8 lines] > data and function, I tend to agree; however, that's a very non-OO position > to take... is that what you intended? No, rather the opposite -- data and functions are two sides of the same coin. So, clearly I wasn't writing clearly. My point is that you either write a function directly in procedural or OO code or you spec them and spec when they are to be used -- 6 of one, half-dozen of the other. Having seen so many languages without trying to collect them over my career, I have not yet seen one that makes an order of magnitude break-through in productivity for the developer. It seems to me that Java and OO languages had a good chance of providing large chunks for reuse, but they aren't there. It isn't a silver bullet, but I'm thinking the services strategy, which is language independent, has a chance at helping to get some of those bigger gains. Knock on wood.
--dawn
Gene Wirchenko - 16 Jun 2004 17:42 GMT [snip]
>That was said in a convoluted way, but my point is that language standards >are not enough to protect my investment as soon as I opt for a tool from >Oracle or IBM or MS or whomever. Particularly so when some companies consider it their duty to break standards.
[snip]
>and the others who spoke up were right that a declarative language (such as >SQL) does have declarations that are like sentences (SELECT this, that FROM >arelation) while a language that is spec'd -- options chosen -- is a set of A select is not a declaration. A declaration would be, for example, a constraint: accttype in ("A","B","C")
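For what it is worth, the constraint in that example can be written as a declaration in SQL itself. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `account` table and the `try_insert` helper are made up for illustration, and other SQL products may differ in the details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The constraint is declared once, as part of the table definition.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " accttype TEXT NOT NULL CHECK (accttype IN ('A', 'B', 'C'))"
    ")"
)
conn.execute("INSERT INTO account (accttype) VALUES ('A')")


def try_insert(accttype):
    """Return True if the row was accepted, False if the CHECK rejected it."""
    try:
        conn.execute("INSERT INTO account (accttype) VALUES (?)", (accttype,))
        return True
    except sqlite3.IntegrityError:
        return False
```

No application code decides whether 'Z' is acceptable; the engine refuses the row on the strength of the declaration alone.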
>parms (kinda RPG-like for those who don't think that means role-playing >games). And I do agree that English as we speak it is not the goal, but Help, help! Acronym poisoning! For a few minutes, I was trying to figure out what Rocket-Propelled Grenades had to do with the situation. Taking out bad implementations?
Then, I realised you probably meant the language whose acronym expands to "Report Program Generator".
>just like the Palm Pilot taught us to write so it could understand, I do >think that is a reasonable way to approach a language. COBOL has its charm. There are rules. Some are explicit, some are not. Some help, some hinder.
>Java, along with other modern languages are unnecessarily cryptic to the >seasoned IT professional picking up the language for a first time. I do not like the corner cases. The switch statement in C and C-derived languages is a nearly useless thing that puts the corner case of falling through a case onto a pedestal. Yuck!
[snip]
>No, rather the opposite -- data and functions are two sides of the same >coin. So, clearly I wasn't writing clearly. My point is that you either >write a function directly in procedural or OO code or you spec them and spec >when they are to be used -- 6 of one, half-dozen of the other. Having seen Maybe by declaration.
>so many languages without trying to collect them over my career, I have not >yet seen one that makes an order of magnitude break-through in productivity [quoted text clipped - 3 lines] >independent, has a chance at helping to get some of those bigger gains. >Knock on wood. Or a compatible substitute? <g>
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Bill H - 16 Jun 2004 17:54 GMT Eric:
[snipped]
> English is a poor basis for automation, and > defining a useful subset would get us into even murkier water than the c.d.t [quoted text clipped - 3 lines] > whatever that means (I know that it means, but doubt its utility in all > things). ...a more useful basic structure? Some solutions, yes; some solutions, no; some solutions, debatable. Most competing systems have unique benefits and this is no different. The question: is the application the best place to store business rules in a language understandable to those defining those rules?
You state no (I think). I state yes! There is a tremendous cost advantage for my position, but as I've said before, cost is not an issue in all scenarios. The view that the RDM is more useful is tautological. In other words, a primary axiom is: it is. So one has no alternative but to conclude so.
There is a huge disjoint here that encourages this ongoing debate. Many MV developers work with other RDBMS products too. They're exposed to two different environments and don't like the inefficiency of the RDM that requires decomposition of business objects into a language that is not understandable. This creates uncertainty, additional cost, and additional instability. On a limited budget this is generally unacceptable.
On the other hand, there are good reasons, from my perspective, to use the RDM. Tools, ease of programming, rich graphical environment, decomposition and recomposition that's completely hidden (the black box syndrome).
> > 3) While hiding much that should be hidden, it "feels like" so much gets > > hidden that people spend time trying to figure out how it does things in [quoted text clipped - 6 lines] > somewhat steep, has dividends (and not just in the long term; perhaps in the > "medium" range). I program in several procedural languages, enhanced BASIC, Javascript, HTML, VB and as little C{whatever} as I can. BASIC is by far the easiest and the one everyone I work with can read and understand. However, it is not the most useful in each circumstance.
It can always be said that we can learn a new language. But what does it bring to the table? Is it worth it? English is the most useful language in the world because it is the primary language of international business. This says nothing about its usefulness locally in Germany, Egypt, China, etc (which is minimal).
> > 4) Invariably functions become one of the things to get specified. If we > > are going to specify both data and functions, which, afterall, is what [quoted text clipped - 8 lines] > have to optimize later; but you don't have to decide on the optimization (or > on a sub-optimal algorithm) early. These are areas where I would want to defer to IT professionals. All I'd like is to use the function to accomplish my local business task; as Dawn noted, a date function. The date is stored in the database internally but the functions work miracles extracting the various pieces of a date.
In the MV model the dbms contains numerous functions to allow data extraction. This is the benefit with the dbms being both a database and an application server and a development environment. These functions become part of the data when applications are developed.
I think this is the largest misunderstanding in this thread. We're talking past each other when the RDM and the MVM are compared. The MVM includes, as I noted before, both an application server and development environment. The RDM is more constrained by design.
> > then what benefit is there to specifying a function and > > specifying where it is to be used rather than specifying the function and [quoted text clipped - 3 lines] > data and function, I tend to agree; however, that's a very non-OO position > to take... is that what you intended? Again, we're talking past each other. The MVM includes the application server so I think this explains why some think it appropriate to include functions in the database, because the function call is stored in the database.
Bill
Chris Hoess - 18 Jun 2004 04:04 GMT > Eric: > [quoted text clipped - 22 lines] > words, a primary axion is: it is. So one has no alternative but to conclude > so. Huh? I think you've missed Eric's point. As he said, the relational model is based on predicates: sentences which make a single, descriptive assertion. I confess that I don't fully understand the correspondence between Pick internals and parts of speech, but the point is clear nonetheless: our task in compiling databases, as I see it, is to be able to accurately describe relevant parts of the world and draw inferences from those descriptions. It is from this premise that Eric's conclusion follows: the best model for making descriptions of things is that whose basic unit is description.
I'm not 100% sure how it follows that storing constraints in the database of necessity results in a disadvantage in costs. As I just posted in another followup here, the constraints themselves are easily accessible to the applications given a good system catalog. For that matter, it doesn't follow that the constraints must be presented exactly as represented internally in the database. It seems to me that it should be possible to make a 1:1 mapping of some terse, "programmer-friendly" language into a more English-like statement of the constraint, suitable for checking by "domain experts".
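A toy version of that 1:1 mapping might look like the following Python sketch. The tuple representation of a constraint, the `PHRASES` table, and both function names are assumptions made up for this example; the "Ensure that" wording borrows the suggestion made earlier in the thread.

```python
# Hypothetical constraint representation: (column, operator, operand).
PHRASES = {
    "in": "is one of",
    ">=": "is at least",
    "<=": "is at most",
    "=": "equals",
}


def to_terse(constraint):
    """Render the constraint in a terse, programmer-friendly form."""
    column, op, operand = constraint
    if op == "in":
        return f"{column} in ({', '.join(repr(v) for v in operand)})"
    return f"{column} {op} {operand!r}"


def to_english(constraint):
    """Render the same constraint as an English-like sentence."""
    column, op, operand = constraint
    if op == "in":
        *head, last = operand
        listed = f"{', '.join(map(str, head))} or {last}" if head else str(last)
    else:
        listed = str(operand)
    return f"Ensure that {column} {PHRASES[op]} {listed}."
```

Both renderings come from the same stored tuple, so the version a domain expert reads cannot drift away from the version the system enforces.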
>> > 3) While hiding much that should be hidden, it "feels like" so much gets >> > hidden that people spend time trying to figure out how it does things in >> > order to be good at writing declarations Is this universal, or is this just a by-product of SQL's redundancy and the need for SQL query optimizers? (e.g. can we build a system where all logically equivalent statements in this declarative language run in more or less the same amount of time, and if so, will that stop people from trying to outguess the compiler?)
> These are areas where I would want to defer to IT professionals. All I'd > like is to use the function to accomplish my local business task; as Dawn [quoted text clipped - 5 lines] > application server and a development environment. These functions become > part of the data when applications are developed. I'll have to go back home and look, but I wonder if much of this would be soluble by TTM's elevation of views to first-class relational citizens, so to speak.
 Signature Chris Hoess
Anthony W. Youngman - 19 Jun 2004 00:24 GMT >> You state no (I think). I state yes! There is a tremendous cost advantage >> for my position, but as I've said before, cost is not an issues in all [quoted text clipped - 11 lines] >follows: the best model for making descriptions of things is that whose >basic unit is description. And I think that you've missed Bill's point. Predicates describe relational data. So Eric's point holds. But relational ASSUMES that relational data can be used to describe the real world - it's an axiom. Bill doesn't think that that holds in the real world, and I don't either which is why I asked the original question that started all this. Logic (ie predicates) is great for showing that a position is self-consistent. It is useless for showing that that position is relevant or useful.
>I'm not 100% sure how it follows that storing constraints in the database >of necessity results in a disadvantage in costs. As I just posted in [quoted text clipped - 5 lines] >more English-like statement of the constraint, suitable for checking by >"domain experts". But are you storing your constraints as a trigger? (Which I would sort-of consider an application in itself.)
As I understand "the database", in order to store constraints in the database, you must store the constraints as *meta*data. Which means your end-user developers (programmer, dba, whatever) MUST be able to program *inside* the db engine so it can recognise metadata. At which point we get user-defined data types and your relational database has started down the road of mutating into an object database :-)
Actually, I've just reread what you wrote. Do you mean "constraint" as in a relational constraint - foreign-key type stuff; or as a general term for enforcing integrity. I was thinking the latter, hence my reference to triggers, but I suspect you might be meaning the former. If you did mean the former, Pick doesn't have them because it achieves the same effect as a side-effect of its implementation. So the fact that relational needs them is de-facto a hindrance relative to Pick.
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Tony Douglas - 21 Jun 2004 11:46 GMT <snip>
> And I think that you've missed Bill's point. Predicates describe > relational data. So Eric's point holds. But relational ASSUMES that [quoted text clipped - 3 lines] > (ie predicates) is great for showing that a position is self-consistent. > It is useless for showing that that position is relevant or useful. Well, that's that darn "closed world assumption" again. Famously, you may assert that "the King of France is bald". But as far as I know no automatic logic system can tell that that's a total fib - unless you want to wire your systems up to Google or Yahoo, in which case you've abandoned any sense of a logical basis for what you're up to. Consistency is (I will hedge and say probably) the best you can achieve - correctness is beyond any automated logic system I'm aware of. As an aside, if this isn't good enough for you, what would you prefer to base your database systems on ? Intuition ? Appeals to authority ? Artist's impressions ?
> But are you storing your constraints as a trigger? (Which I would > sort-of consider an application in itself.) Oh god, here we go with storing things and triggers. How dully procedural ;)
> As I understand "the database", in order to store constraints in the > database, you must store the constraints as *meta*data. Which means your > end-user developers (programmer, dba, whatever) MUST be able to program > *inside* the db engine so it can recognise metadata. Ummm, no. The constraints would go in the catalogues, so they just appear as data too. What do you mean by "*inside* the db engine" ? Even if/when I get the source code to a DBMS server, I wouldn't expect to be updating that to add constraints !
> At which point we get user-defined data types and your relational database > has started down the road of mutating into an object database :-) Umm, no not really - it would just be turning into a relational database. Bit of a non sequitur though (constraints -> metadata -> user defined types ?)
> Actually, I've just reread what you wrote. Do you mean "constraint" as > in a relational constraint - foreign-key type stuff; or as a general [quoted text clipped - 3 lines] > same effect as a side-effect of its implementation. So the fact that > relational needs them is de-facto a hindrance relative to Pick. I would like to know your differential between a "relational constraint - foreign-key type stuff" and "a general term for enforcing integrity". How do you partition your constraints into those categories ? Personally I view a constraint as a boolean expression that must never evaluate to false. Some are called "general constraints" or "assertions" because they can refer to arbitrary combinations of columns from tables, others as "base table" or "column" constraints because they refer to one particular table. (In addition to the most fundamental constraint of course - that of declaring the type of each column.)
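The view of a constraint as "a boolean expression that must never evaluate to false" can be sketched in a few lines of Python. The toy database, the table names, and the three example rules are invented for illustration; the only point is that column, table, and multi-table constraints are all the same kind of thing, differing in how much of the database they look at.

```python
# A toy database state: table name -> list of rows.
db = {
    "invoice": [{"id": 1, "total": 30}],
    "invoice_detail": [
        {"invoice_id": 1, "amount": 10},
        {"invoice_id": 1, "amount": 20},
    ],
}


def column_constraint(db):
    """Refers to one column of one table: every amount is positive."""
    return all(d["amount"] > 0 for d in db["invoice_detail"])


def table_constraint(db):
    """Refers to one table as a whole: invoice ids are unique."""
    ids = [inv["id"] for inv in db["invoice"]]
    return len(ids) == len(set(ids))


def assertion(db):
    """Refers to several tables: each total equals the sum of its details."""
    return all(
        inv["total"]
        == sum(d["amount"] for d in db["invoice_detail"] if d["invoice_id"] == inv["id"])
        for inv in db["invoice"]
    )


CONSTRAINTS = [column_constraint, table_constraint, assertion]


def is_consistent(db):
    """The database is acceptable only while no constraint evaluates to false."""
    return all(check(db) for check in CONSTRAINTS)
```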
> Cheers, > Wol Cheers !
- Tony
Anthony W. Youngman - 25 Jun 2004 23:28 GMT ><snip> > [quoted text clipped - 16 lines] >prefer to base your database systems on ? Intuition ? Appeals to >authority ? Artist's impressions ? What would I prefer to base my database systems on? Well, actually, I'd like to base them on science. On some evidence (which by its very nature, must be experimental and statistical) that says it's actually relevant to the real world.
Anything that relies solely on logic (whether automated or not) is useless. If we relied solely on logic then both Aristotle and Galileo would be right, as would Ptolemy and Copernicus (and with hindsight we would laugh BOTH the latter two out of court, despite BOTH of them having impeccable logic. Because we have "experimental" evidence that tells us their models are irrelevant. "correctness" doesn't come into it). Logic merely shows that your theories are self-consistent. But what do you do when you have TWO theories, both of which are logical and self-consistent, but are mutually inconsistent? If I took your argument at face value, I would have to believe both ...
>> But are you storing your constraints as a trigger? (Which I would >> sort-of consider an application in itself.) [quoted text clipped - 11 lines] >Even if/when I get the source code to a DBMS server, I wouldn't expect >to be updating that to add constraints ! That was why I made the comment about "user-defined types". If you define something as an integer, it is enforced by the database. If you define something as "someone's age", it cannot be negative, and it can have the values "unknown" and "dead".
So we now have the position that either you do some of your "type-constraint"ing inside the database and some outside, or you have to have some way of pushing the validation "inside" the database.
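As a rough illustration of the "someone's age" example, here is a Python sketch of a user-defined type that carries its own validation. The class name `Age` and the constant `SPECIAL_AGES` are hypothetical; the rules (no negative values, plus the two special values "unknown" and "dead") are the ones given in the post.

```python
from dataclasses import dataclass
from typing import Union

# The two non-numeric values mentioned in the post.
SPECIAL_AGES = ("unknown", "dead")


@dataclass(frozen=True)
class Age:
    """A hypothetical user-defined type: validation happens on construction."""

    value: Union[int, str]

    def __post_init__(self) -> None:
        if isinstance(self.value, bool):
            # bool is a subclass of int in Python, so reject it explicitly.
            raise ValueError("age must be an integer, 'unknown' or 'dead'")
        if isinstance(self.value, int):
            if self.value < 0:
                raise ValueError("age cannot be negative")
        elif self.value not in SPECIAL_AGES:
            raise ValueError(f"not a valid age: {self.value!r}")


print(Age(42), Age("unknown"))
```

Whether such a check lives inside the database or in the application is exactly the question raised above; the sketch only shows what the check itself amounts to.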
>> At which point we get user-defined data types and your relational database >> has started down the road of mutating into an object database :-) [quoted text clipped - 21 lines] >fundamental constraint of course - that of declaring the type of each >column.) I've mentioned it elsewhere, but I see two types of integrity. What I call "natural law" and "statute law". Statute law says you can't have a car without an owner, but either can cease to exist without affecting the existence of the other. That was the general term - the latter of the two I was thinking of. Natural law says the existence of one depends on the existence of the other - you can't have an invoice detail without having an invoice for it to be part of. I should have said "enforcing integrity between tables". In that case, MV simply allocates the same primary key to both - if the primary key goes, everything else goes with it :-)
And MV doesn't constrain the type of each column :-) I'd like to be able to do something like that, actually, but I'd rather validate than constrain :-) Is it relational that enforces strong typing, or just current implementations? Either way, I think it's wrong. But MV is untyped and that's as bad the other way :-) - I would love the *ability* to enforce typing, I just don't think it should be *mandatory*.
Cheers, Wol
mAsterdam - 26 Jun 2004 13:10 GMT > I would love the *ability* to enforce typing, > I just don't think it should be *mandatory*. With mandatory type enforcement, what prevents you from using types that are not very restrictive? IOW: Mandatory soup isn't eaten as hot as it is served (extended dutch saying).
Anthony W. Youngman - 05 Jul 2004 22:26 GMT >> I would love the *ability* to enforce typing, >> I just don't think it should be *mandatory*. [quoted text clipped - 3 lines] >IOW: Mandatory soup isn't eaten as hot as it is served >(extended dutch saying). You mean declaring stuff as "variant"? Fine!
I just think of my Fortran days when I *chose* to use the "declare all variables" switch. I just think that enforcing strict typing is as bad as no typing at all (and "variant" is a nice middle ground - "this variable is explicitly untyped" :-)
Cheers, Wol
Tony Douglas - 21 Jun 2004 15:07 GMT <snip>
> Actually, I've just reread what you wrote. Do you mean "constraint" as > in a relational constraint - foreign-key type stuff; or as a general [quoted text clipped - 3 lines] > same effect as a side-effect of its implementation. So the fact that > relational needs them is de-facto a hindrance relative to Pick. Additionally, with regards to your last point - how does Pick have "foreign-key type stuff" as a "side-effect of its implementation" ? Is this down to its multi-valuedness ? What happens if you use Pick in faux-relational mode - do you just lose this kind of constraint ?
> Cheers, > Wol - Tony
Anthony W. Youngman - 25 Jun 2004 23:42 GMT ><snip> > [quoted text clipped - 10 lines] >this down to its multi-valuedness ? What happens if you use Pick in >faux-relational mode - do you just lose this kind of constraint ? If by "faux relational", you mean splitting a normal-form FILE into a bunch of first-normal-form FILEs, then yes, we do lose this constraint (in MV mode, anyway. Any modern MV will let you declare a relational constraint, but you are now invoking a load of code (and overhead) to do what would have happened naturally).
Because, in MV, a "cell" can itself contain a "column", imagine the "colour" column for a car. It can contain a list of colours, and deleting a car's "row" will take out the list of colours. In relational, that list would be in a different table and would require a constraint, that would effectively have to do a select followed by a multi-row delete.
So while we need a transaction mechanism to update an accounts system because we need to update the bank, the customer file, the general ledger, and other stuff besides in one hit; we do NOT need a transaction mechanism to eg delete a car, because in MV, "delete car" is atomic (I'm being a little unfair here because we ought to update owner and do a few other things as well, which might need a transaction). But the point is, if it's atomic in the real world, it should be atomic in an MV database. A relational database has to assume it's not atomic, because 9 times out of 10 normalisation means it can't be.
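The car-and-colours case can be sketched in Python as follows. This is an illustration under assumptions, not Pick code: a nested dict stands in for the MV record, SQLite (via the standard `sqlite3` module) stands in for a relational engine, and the table and column names are made up.

```python
import sqlite3

# MV-style sketch: the colour list lives inside the car record itself,
# so removing the record removes the list in the same single step.
cars_mv = {"ABC123": {"make": "Mini", "colours": ["red", "white"]}}
del cars_mv["ABC123"]

# Normalised sketch: colours live in a second table, so the delete touches
# two tables and is wrapped in a transaction to keep it all-or-nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (reg TEXT PRIMARY KEY, make TEXT)")
conn.execute("CREATE TABLE car_colour (reg TEXT, colour TEXT)")
conn.execute("INSERT INTO car VALUES ('ABC123', 'Mini')")
conn.executemany(
    "INSERT INTO car_colour VALUES (?, ?)",
    [("ABC123", "red"), ("ABC123", "white")],
)
conn.commit()

with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("DELETE FROM car_colour WHERE reg = ?", ("ABC123",))
    conn.execute("DELETE FROM car WHERE reg = ?", ("ABC123",))

cars_left = conn.execute("SELECT COUNT(*) FROM car").fetchone()[0]
colours_left = conn.execute("SELECT COUNT(*) FROM car_colour").fetchone()[0]
print(cars_left, colours_left)
```

The two-statement delete could instead be declared once as a cascading foreign key, which is the "load of code (and overhead)" the post refers to; either way the engine does the extra work that the nested record avoids.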
Cheers, Wol
Tony Douglas - 18 Jun 2004 18:09 GMT <snip>
> ...a more useful basic structure? Some solutions, yes; some solutions, no; > some solutions, debatable. Most competing systems have unique benefits and [quoted text clipped - 7 lines] > words, a primary axiom is: it is. So one has no alternative but to conclude > so. And I will state categorically "no". To paraphrase my friend Roy, "a database is for life - applications are for Christmas". There can't be a cost advantage for many when the applications are more mobile than the data underneath, resulting in reimplementing the same logic over and over (in Cobol, or C, or J2EE, or .Net, or whatever the next fad will be). And if you have to change the constraints in the database, that either means that the business you're dealing with has changed (which is fair enough) or you missed something in your model (which isn't, really).
Could I refer you to Roy's recent presentation at CA World 2004 and the UK Ingres Users Association on Constraints for Performance ? It is quite Ingres specific, but it may provide food for thought. It's available on http://www.rationalcommerce.com/resources/constraints.htm.
> There is a huge disjoint here that encourages this ongoing debate. Many MV > developers work with other RDBMS products too. They're exposed to two > different environments and don't like the inefficiency of the RDM that > requires decomposition of business objects into a language that is not > understandable. This creates uncertainty, additional cost, and additional > instability. On a limited budget this is generally unacceptable. So, can I paraphrase as "we don't want to use relational, because we disapprove of the perceived inefficiency of implementations, and because we don't like the way relational modelling handles our data".
> On the other hand, there are good reasons, from my perspective, to use the > RDM. Tools, ease of programming, rich graphical environment, decomposition > and recomposition that's completely hidden (the black box syndrome). But then, "we like the fact that there are lots of nice bits and bobs to paper over the bits we don't like" ?
> I program in several procedural languages, enhanced BASIC, Javascript, HTML, > VB and as little C{whatever} as I can. BASIC is by far the easiest and the > one everyone I work with can read and understand. However, it is not the > most useful in each circumstance. It is my firm (and hardening) view that the imperative model of programming, with its silly word/record at a time view of the world and reliance on fiddling with variables, is the source of the majority of the programming world's ills. I think it is simply bizarre that in the 21st century we are still being encouraged to think in terms of updatable cells of storage and simplistic kiddie steps when far higher levels of abstraction are readily available. This is one of my two main bugbears with TTM; that it rejects declarative / applicative / referentially transparent models of programming - so although it's much better in terms of handling relations, in terms of programming, operator definition etc. it's just more of the same old stuff.
> It can always be said that we can learn a new language. But what does it > bring to the table? Is it worth it? English is the most useful language in > the world because it is the primary language of international business. > This says nothing about its usefulness locally in Germany, Egypt, China, etc > (which is minimal). Hmmmmmmmm ! Depending on the language you're using, how about provability ? Executable specifications ? No more worrying about race conditions (if you don't have shared memory, how can you have race conditions ?) ? Handling logically infinite data structures ? Type inference ? Simpler programming by case analysis ?
> > > 4) Invariably functions become one of the things to get specified. If > we [quoted text clipped - 10 lines] > (or > > on a sub-optimal algorithm) early. To reply to Eric's point in passing - or, you could simply execute the specification ... :)
> These are areas where I would want to defer to IT professionals. All I'd > like is to use the function to accomplish my local business task; as Dawn > noted, a date function. The date is stored in the database internally but > the functions work miracles extracting the various pieces of a date. Side question : are these *dbms* functions, or are they operators defined on values of the date type ? There is a difference, and it *is* an important difference...
> In the MV model the dbms contains numerous functions to allow data > extraction. This is the benefit with the dbms being both a database and an > application server and a development environment. These functions become > part of the data when applications are developed. Well, we have to be clear here; which functions are we talking about - operators on data types (such as the date functions mentioned above) which are independent of any given application or functions specific to some particular application ?
> I think this is the largest misunderstanding in this thread. We're talking > past each other when the RDM and the MVM are compared. The MVM includes, as > I noted before, both an application server and development environment. The > RDM is more constrained by design. The inclusion of an application server and/or a development environment are implementation decisions - there's nothing in the relational data model to say you can't do the same thing in an implementation of an RDBMS if you wanted to. Necessarily RDM doesn't prescribe anything about that AS or IDE.
<snip>
Anyway, it's 6 o'clock on Friday evening - it's time to be in the pub, not writing on Google !!!
Cheers !
- Tony
Bill H - 24 Jun 2004 03:26 GMT > > "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message news:<40d07bda$1_7@corp.newsgroups.com>...
> > The question: is the application the best place to > > store business rules in a language understandable to those defining those [quoted text clipped - 15 lines] > is fair enough) or you missed something in your model (which isn't, > really). Why would you think these same issues are less in a client/server model? If the rules are kept in the client application they're spread out everywhere and are far more difficult to update and maintain. In addition, in the client world there are a lot more kinds of languages du jour available to cause this very difficulty you've identified.
If the application were moved to an application server this eliminates a lot of issues present in a client/server model. If, however, a dbms included an application server the application would, by definition, be included in the database. So, I think your comments, although proper for a RD model are not so for other models.
> > There is a huge disjoint here that encourages this ongoing debate. Many MV > > developers work with other RDBMS products too. They're exposed to two [quoted text clipped - 6 lines] > disapprove of the perceived inefficiency of implementations, and > because we don't like the way relational modelling handles our data". You can paraphrase if you'd like. :-) I would, however, note that I specifically stated the RD model is perfectly useful at times. I don't agree it is useful at _all_ times and I think there are perfectly useful and adequate alternatives that don't adhere to relational axioms. I just happen to think it is useful to keep things as simple as possible and decomposition/recomposition creates complexity and, for me anyway, confusion. I am not, unfortunately, a rocket scientist and a business mogul all in one.
> > On the other hand, there are good reasons, from my perspective, to use the > > RDM. Tools, ease of programming, rich graphical environment, decomposition > > and recomposition that's completely hidden (the black box syndrome). > > But then, "we like the fact that there are lots of nice bits and bobs > to paper over the bits we don't like" ? Lawyers see the world through legalistic terms...everything seems to be a legal conflict resolvable via dispute resolution. I tend to appreciate a broader perspective (even though I suffer from the same human condition). The tools and other "...nice bits and bobs..." aren't associated with the RD model (they aren't part of it).
> It is my firm (and hardening) view that the imperative model of > programming, with its silly word/record at a time view of the world [quoted text clipped - 7 lines] > much better in terms of handling relations, in terms of programming, > operator definition etc. it's just more of the same old stuff. I prefer a more flexible view of the world. I'm not looking for the next "Theory of Relativity". :-) I'm simply looking for more clarity and ease of use.
> > It can always be said that we can learn a new language. But what does it > > bring to the table? Is it worth it? English is the most useful language in [quoted text clipped - 7 lines] > conditions ?) ? Handling logically infinite data structures ? Type > inference ? Simpler programming by case analysis ? And there you have some points to bring to the collective table. :-)
> > These are areas where I would want to defer to IT professionals. All I'd > > like is to use the function to accomplish my local business task; as Dawn [quoted text clipped - 4 lines] > defined on values of the date type ? There is a difference, and it > *is* an important difference... It is important if that's the structure or rules under which we're operating. If, on the other hand, a model exists where the functions are both dbms and user defined this has some advantages too. Of course the dbms functions are application independent but more functions should be able to be added.
> > I think this is the largest misunderstanding in this thread. We're talking > > past each other when the RDM and the MVM are compared. The MVM [quoted text clipped - 7 lines] > implementation of an RDBMS if you wanted to. Necessarily RDM doesn't > prescribe anything about that AS or IDE. They are implementation decisions if it is defined as such. Some dbms products are also application servers, so there is no such decision to make (except with the purchase). This makes web development pretty simple but client/server development using SQL more difficult.
Bill
Marshall Spight - 27 Jun 2004 02:04 GMT > I prefer a more flexible view of the world. I'm not looking for the next > "Theory of Relativity". :-) I'm simply looking for more clarity and ease > of use. The defining characteristic of the next "Theory of Relativity" will be the huge increase in clarity and ease of use it brings.
Marshall
Bill H - 27 Jun 2004 22:02 GMT Marshall:
If we look at this through a statistical perspective we'll note that the "Theory of Relativity" comes about once in every (pick your high number). Such rigidity of focus isn't, therefore, required for most of our business tasks and can have a deleterious effect on potential solutions.
It is a great attribute of human nature that so many people can come up with so many unique ways of solving, what else, so many business problems. I would suggest these unique ways be embraced instead of laughed at because they don't meet a narrow solution model. :-)
Bill
> > I prefer a more flexible view of the world. I'm not looking for the next > > "Theory of Relativity". :-) I'm simply looking for more clarity and ease [quoted text clipped - 4 lines] > > Marshall
Marshall Spight - 27 Jun 2004 22:56 GMT > If we look at this through a statistical perspective we'll note that the > "Theory of Relativity" comes about once in every (pick your high number). > Such rigidity of focus isn't, therefore, required for most of our business > tasks and can have a deleterious effect on potential solutions. What makes you think I'm rigidly focused? In fact, I work on business tasks for most of the week; I occasionally dabble in theory on the weekends. All work and no play makes Jack etc. Most of my work is done with Java, SQL, and HTML; I think that qualifies me pretty well as someone who can make compromises for practical business realities.
I would also assert that "required for most of our business tasks" is not the defining characteristic of this group; otherwise it would be called comp.databases.businesstasks. Since it's comp.databases.theory, I think focus (whether rigid or not) on the next "theory of relativity" is quite on-topic.
Neither is "required for ... business tasks" a filter through which to live one's life. Getting stuff done is good, but so is looking up at the stars. You can't have a balanced life without both, and more still.
> It is a great attribute of human nature that so many people can come up with > so many unique ways of solving, what else, so many business problems. Again, business problems are only a portion of the scope of data management.
> I would suggest these unique ways be embraced instead of laughed at because > they don't meet a narrow solution model. :-) I didn't hear any laughing. Nor do I subscribe to a narrow solution model.
Marshall
Marshall Spight - 27 Jun 2004 02:01 GMT > It is my firm (and hardening) view that the imperative model of > programming, with its silly word/record at a time view of the world [quoted text clipped - 7 lines] > much better in terms of handling relations, in terms of programming, > operator definition etc. it's just more of the same old stuff. Hear hear! Bravo! Bravo!
Please, tell me what is your *other* main bugbear with TTM.
Marshall
Tony Douglas - 02 Jul 2004 10:59 GMT > Hear hear! Bravo! Bravo! > > Please, tell me what is your *other* main bugbear with TTM. My other main bugbear with TTM was over the type system, but I must admit I'm softening my line on that, though possibly not for the normal reason. I felt that the facility to have multiple possible representations for a type only served to create complications, without adding a lot to the party. I still feel that way, purely from the programming language point of view - but in terms of logical & physical independence over time, as implementations of the types might change in the database, a facility along these lines is probably necessary - otherwise, changing a type representation would require a fair bit of work altering values of that type in the database. Still not a great fan of type hierarchies though, although I can grudgingly accept some rationales for them. (Interestingly, I think Alphora 2 drops type hierarchies as well.)
> Marshall Cheers,
- Tony
Marshall Spight - 03 Jul 2004 01:59 GMT > > Hear hear! Bravo! Bravo! > > [quoted text clipped - 5 lines] > representations for a type only served to create complications, > without adding a lot to the party. It doesn't seem hugely useful to me, either. If you consider the canonical example of the CartesianPoint(int x, int y) vs. the PolarPoint(float theta, float r), using the one constructor with the other implementation is annoyingly expensive. You have a leaky abstraction problem, in that you can't get away from the fact that your underlying type-implementation has performance consequences for the choice of interface.
Alternatively, you could have a type Point, and types CartesianPoint <: Point, and PolarPoint <: Point, which all support the same set of methods/operators/what have you. You still have the leaky abstraction problem with interface calls; getting the radius is much easier with PolarPoint than with CartesianPoint; getting the X or Y attributes is easier the other way. But it seems like less of an issue even so. I suppose one could also declare that constructing Point is done with one or the other Point subclass.
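The subtype alternative described above might look roughly like this in Python. The class names follow the post; the property names and the use of `abc` are assumptions made for the sketch. Each subclass answers the attributes of its own representation directly and has to compute the others, which is where the abstraction leaks.

```python
import math
from abc import ABC, abstractmethod


class Point(ABC):
    """Common interface; callers need not know the representation."""

    @property
    @abstractmethod
    def x(self) -> float: ...

    @property
    @abstractmethod
    def y(self) -> float: ...

    @property
    @abstractmethod
    def r(self) -> float: ...

    @property
    @abstractmethod
    def theta(self) -> float: ...


class CartesianPoint(Point):
    def __init__(self, x: float, y: float) -> None:
        self._x, self._y = x, y

    @property
    def x(self) -> float:
        return self._x  # stored: cheap

    @property
    def y(self) -> float:
        return self._y  # stored: cheap

    @property
    def r(self) -> float:
        return math.hypot(self._x, self._y)  # computed on every call

    @property
    def theta(self) -> float:
        return math.atan2(self._y, self._x)  # computed on every call


class PolarPoint(Point):
    def __init__(self, theta: float, r: float) -> None:
        self._theta, self._r = theta, r

    @property
    def x(self) -> float:
        return self._r * math.cos(self._theta)  # computed on every call

    @property
    def y(self) -> float:
        return self._r * math.sin(self._theta)  # computed on every call

    @property
    def r(self) -> float:
        return self._r  # stored: cheap

    @property
    def theta(self) -> float:
        return self._theta  # stored: cheap


points = [CartesianPoint(3, 4), PolarPoint(0.0, 2.0)]
radii = [p.r for p in points]  # same call, different cost per subclass
print(radii)
```

The list comprehension at the end is subtype polymorphism in miniature: the caller asks every `Point` for `r` and never inspects which representation it holds.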
> Still > not a great fan of type hierarchies though, although I can grudgingly > accept some rationales for them. What about polymorphism? Polymorphism is the best thing about OO, and a good thing in general. You can't have subtype polymorphism without subtypes.
Marshall
Anthony W. Youngman - 18 Jun 2004 18:21 GMT >Was: In an RDBMS, what does "Data" mean? > [quoted text clipped - 10 lines] >on", as opposed to formalized as metadata and shared the same way data is >shared. Yes, but relational formalises metadata INTO data. Once it's in an RDBMS it's no longer metadata, because the rdbms doesn't understand any meaning in it and can't take advantage of that meaning so it's just data.
The ordering in a list is metadata. Convert that into a set to put into an rdbms and ORDER is now just a meaningless (as far as the db engine is concerned) bit of data.
That's where MV and OO fundamentally differ. They try to *avoid* converting metadata to data, so that the db engine can be intelligent and take advantage of it to optimise things.
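A small Python sketch of the claim about list order, with made-up data: when a list is turned into a set of tuples, the order survives only as an explicit position value that has to be stored and sorted on like any other column.

```python
# A list: the order is part of the structure itself.
colours = ["red", "white", "blue"]

# The same information as a relation: an unordered set of tuples, with the
# position stored explicitly as ordinary data (hypothetical column layout).
colour_rows = {(pos, colour) for pos, colour in enumerate(colours, start=1)}

# The original order is recoverable only by sorting on the stored position.
recovered = [colour for _, colour in sorted(colour_rows)]
print(recovered)
```

Whether the engine can still exploit that position value is the point in dispute in this thread; the sketch only shows the conversion itself.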
Cheers, Wol
Tony - 19 Jun 2004 13:00 GMT > The ordering in a list is metadata. Convert that into a set to put into > an rdbms and ORDER is now just a meaningless (as far as the db engine is [quoted text clipped - 3 lines] > converting metadata to data, so that the db engine can be intelligent > and take advantage of it to optimise things. Very funny! How can MV optimise ANYTHING given that you have already determined the access paths for all the data? Suppose we want to get the data out ordered by product code instead of by line number within order, which is how we chose to store it. How can MV optimise for that? An RDBMS may choose a different access path to get the data depending on the ORDER BY clause.
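The order-line scenario can be tried out with SQLite through Python's standard `sqlite3` module. This is only a sketch with hypothetical table, index and product names: the rows are keyed by line number within order, a secondary index exists on product code, and the same rows are requested in two different orders. Which access path the engine actually takes for each query is its own decision and is not shown or asserted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line ("
    " order_no INTEGER, line_no INTEGER, product_code TEXT,"
    " PRIMARY KEY (order_no, line_no))"
)
# A secondary index the engine may choose to use when ordering by product.
conn.execute("CREATE INDEX order_line_by_product ON order_line (product_code)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 1, "ZEBRA"), (1, 2, "APPLE"), (1, 3, "MANGO")],
)

by_line = [
    row[0]
    for row in conn.execute(
        "SELECT product_code FROM order_line WHERE order_no = 1 ORDER BY line_no"
    )
]
by_product = [
    row[0]
    for row in conn.execute(
        "SELECT product_code FROM order_line WHERE order_no = 1 ORDER BY product_code"
    )
]
print(by_line)
print(by_product)
```

The query text states only the desired order; nothing in it names the index or the storage order.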
Bill H - 24 Jun 2004 03:37 GMT > Very funny! How can MV optimise ANYTHING given that you have already > determined the access paths for all the data? The default access path is the storage algorithm, managed by the db engine. It is debatable whether this is an optimization.
> Suppose we want to get > the data out ordered by product code instead of by line number within > order, which is how we chose to store it. How can MV optimise for > that? It will optimize how the applications people tell it to.
> An RDBMS may choose a different access path to get the data > depending on the ORDER BY clause. The access path is always the same as the default, unless otherwise specified as noted above. Output may or may not be ordered BY whatever.
Marshall Spight - 27 Jun 2004 02:21 GMT > Yes, but relational formalises metadata INTO data. Once it's in an RDBMS > it's no longer metadata, because the rdbms doesn't understand any > meaning in it and can't take advantage of that meaning so it's just > data. This is just fundamentally wrong. It's so pervasively wrong it's almost hard to know where to start.
Okay, simple example: foreign key. A foreign key relationship is metadata. DBMSs record foreign key values, and they also record foreign key relationships, in a table in the catalog. The database knows what this metadata means, and may take advantage of this knowledge in deciding how to store the data.
Those tables that form the catalog are the tables that the DBMS understands the meaning of "out of the box." (If you include no metadata when you add a user-defined table, then it could indeed be said that the DBMS doesn't understand the meaning of the new tables. Such as when someone using MySQL builds a database without specifying any integrity constraints. But that's pathological.)
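As one concrete instance of a DBMS recording foreign key relationships as queryable metadata, here is a sketch using SQLite through Python's `sqlite3` module. SQLite is only a stand-in for the catalog idea and the invoice tables are hypothetical; `PRAGMA foreign_key_list` is SQLite's own way of exposing this information, and other products use catalog tables or views instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY);
    CREATE TABLE invoice_line (
        invoice_no INTEGER NOT NULL
            REFERENCES invoice (invoice_no) ON DELETE CASCADE,
        line_no    INTEGER NOT NULL,
        PRIMARY KEY (invoice_no, line_no)
    );
    """
)

# Each result row describes one foreign key column:
# (id, seq, parent table, child column, parent column,
#  on_update, on_delete, match)
fks = conn.execute("PRAGMA foreign_key_list('invoice_line')").fetchall()
for fk in fks:
    print(fk)
```

The relationship is stored once, as data about the data, and the engine consults it on every relevant update.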
Perhaps I misunderstand, but MV has only the one kind of relationship it is capable of understanding: containment. Yes, it understands the meaning of this, so it knows what a one-to-many relationship means. Does it have any other facilities for understanding meaning? It can do ON DELETE CASCADE but can it do ON DELETE RESTRICT? Can it handle and understand many-to-many relationships? Can it understand and enforce arbitrary constraints a la SQL's CHECK? (These are actual questions, not rhetorical ones; I'm not that familiar with MV.)
Marshall
Laconic2 - 27 Jun 2004 11:27 GMT > > Yes, but relational formalises metadata INTO data. Once it's in an RDBMS > > it's no longer metadata, because the rdbms doesn't understand any [quoted text clipped - 3 lines] > This is just fundamentally wrong. It's so pervasively wrong it's almost > hard to know where to start. Excellent!
Laconic2 - 27 Jun 2004 11:44 GMT > Perhaps I misunderstand, but MV has only the one kind of > relationship it is capable of understanding: containment. > Yes, it understands the meaning of this, so it knows what > a one-to-many relationship means. Does it have any other > facilities for understanding meaning? I don't know much about MV, either. What I've read in here reminds me of LISP. Only in the sense that there are lots of pointers, everything is a tree, and every value can be replaced by a subtree.
If that's correct, then I would suggest that there is another relationship that MV can understand: sequence. Sequence is inherent in a list. Actually, the combination of sequence and containment is quite powerful. Almost powerful enough to constitute the basis for a database system!
So near and yet so far.
Bill H - 27 Jun 2004 22:43 GMT Marshall:
> Perhaps I misunderstand, but MV has only the one kind of > relationship it is capable of understanding: containment. I'm not sure why it is so difficult to express this concept. An MV environment is both a data store and an application server. It is _NOT_ an RD model. To discuss its attributes strictly from a datastore perspective is neither fair nor accurate. To understand its methodologies for directly solving business problems requires the willingness to work with its two functions: storage and application properties/methods/rules/etc.
Secondly, solving business problems requires a great deal of flexibility. A non-relational model can, in a number of instances, provide additional flexibility over and above what the RD model can. Not because the RD model is incapable, but because the RD model declares for itself particular limitations and methods of operation. This structure doesn't always work ideally. Do I understand RD proponents to declare that it does in all circumstances?
This is fine. Why is it we can't allow that our own prejudices create limitations on the ability to formalize solutions? It takes Godel's Incompleteness Theorem to declare that some certainties have gotten too big for their britches. It can happen to anybody. :-)
For years HP calculators used RPN (reverse polish notation) instead of standard algebraic entry mode (AEM). Nowadays they offer both. Does this mean RPN is worthless or worse than AEM? No. Many people prefer RPN. Is AEM more often used? Of course, but that doesn't say anything about RPN or those who prefer to use it. The same can be said about the RD model. Not everyone prefers it.
> Yes, it understands the meaning of this, so it knows what > a one-to-many relationship means. Does it have any other [quoted text clipped - 4 lines] > constraints a la SQL's CHECK? (These are actual questions, > not rhetorical ones; I'm not that familiar with MV.) Of course it does, and can. Remember it is both a datastore and an application environment wrapped into one. So, whatever needs to be done can be done. It is simply that this additional functionality is stored in the datastore too. Most of the MV products can even understand and cope with SQL functionality.
Like I said before, this is not to say that the MV model does everything...as nothing can. But it provides an interesting confluence of tools and capabilities that render the model very useful in solving business problems for many people and businesses.
Bill
Marshall Spight - 28 Jun 2004 00:03 GMT > > Perhaps I misunderstand, but MV has only the one kind of > > relationship it is capable of understanding: containment. [quoted text clipped - 5 lines] > solving business problems requires the willingness to work with its two > functions: storage and application properties/methods/rules/etc. I read that paragraph a bunch of times, but it didn't seem to address my statement that MV has only one kind of relationship it is capable of understanding. Does it have relationships besides containment that it can understand? An example of a non-containment relationship would be cool, if the example does not require hand-written application code to work.
I think the MV and the RM world divide things up very differently. I will note first that "storage" is not a first-tier property of RM, but it is a useful, second tier function that most products support and that most applications take advantage of. It is perfectly reasonable, and useful, to have an RDBMS that does not persist its relations. We could still call this an RDBMS, but we couldn't call it a "datastore."
Another example is managing data integrity in procedural application code. In RM this is considered a "stupid database trick" to quote from another thread. There are significant disadvantages to application-managed integrity rules, to the point where I do not consider it an approach worth discussing (and yes, I've used that approach in the real world.) However, it may be that this approach has lower overhead in situations where you have small development teams and single-application databases.
> Secondly, solving business problems requires a great deal of flexibility. A > non-relational model can, in a number of instances, provide additional > flexibility over and above what the RD model can. Please be specific. I am very interested in specific examples of specific operations or structures that you feel are hard to solve with RM or SQL and easy to solve with MV. I do believe there are some, but I want to know better what they are. As it stands I have a hard time evaluating the claims of the MV people, even the smart/nice ones such as you and Dawn. I'm not saying I believe, and I'm not saying I disbelieve. I just want to hear more specifics.
> Not because the RD model > is incapable, but because the RD model declares for itself particular > limitations and methods of operation. This structure doesn't always work > ideally. Do I understand RD proponents to declare that it does in all > circumstances? I don't know how to measure the idealness of a solution, so I have no particular claims about whether the RM is ideal or not.
> For years HP calculators used RPN (reverse polish notation) instead of > standard algebraic entry mode (AEM). Nowadays they offer both. Does this > mean RPN is worthless or worse than AEM? No. Many people prefer RPN. Is > AEM more often used? Of course, but that doesn't say anything about RPN or > those who prefer to use it. The same can be said about the RD model. Not > everyone prefers it. The problem with this analogy is that there is a simple one-to-one mapping between AEM and RPN. It is easy to show that the two methods are equivalent. I do not believe the RM and MV have such a mapping, nor do they support the same operations nor structures.
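To make the "simple mapping" between algebraic entry and RPN concrete, here is a minimal sketch of one direction of it, using the standard shunting-yard conversion. The function names and the sample expression are made up for illustration; only the four left-associative binary operators are handled.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def to_rpn(tokens):
    """Convert a tokenized infix expression to RPN (shunting-yard, left-associative ops)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of equal or higher precedence before pushing this one.
            while stack and stack[-1] in PRECEDENCE and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while stack:
        output.append(stack.pop())
    return output

def eval_rpn(rpn):
    """Evaluate an RPN token list with a plain operand stack."""
    stack = []
    for tok in rpn:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack[0]

infix = "( 3 + 4 ) * 2 - 10 / 5".split()
rpn = to_rpn(infix)
value = eval_rpn(rpn)
```

Both notations denote the same expression tree, which is why the conversion is mechanical. Whether a comparably mechanical mapping exists between RM and MV structures is exactly the point in dispute.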
> > Yes, it understands the meaning of this, so it knows what > > a one-to-many relationship means. Does it have any other [quoted text clipped - 8 lines] > application environment wrapped into one. So, whatever needs to be done can > be done. If by this you mean that you can implement these features by hand-writing application code, then I don't consider that any achievement. I can say the same thing about some Java code and a hashtable, but it's not a good solution.
For example, in another thread someone said (over and over if I remember correctly :-) that if you delete an invoice, all the line items go with it, automatically. Okay, this is the same thing as ON DELETE CASCADE. But sometimes you want ON DELETE RESTRICT. (In other words, if you want to delete a container but it is still containing something, you have to dispose of the contained things first; you can't just throw them away.) Can you do this declaratively in MV? How is it done?
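For readers who have not used the two options, a minimal sketch of the difference, assuming SQLite through Python's sqlite3 module. The invoice and line_item tables are hypothetical, chosen to match the example above.

```python
import sqlite3

def build(on_delete):
    """Create an in-memory invoice/line-item schema with the given ON DELETE rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE line_item ("
        " id INTEGER PRIMARY KEY,"
        " invoice_id INTEGER NOT NULL REFERENCES invoice(id) ON DELETE " + on_delete + ")"
    )
    conn.execute("INSERT INTO invoice (id) VALUES (1)")
    conn.execute("INSERT INTO line_item (id, invoice_id) VALUES (10, 1)")
    conn.execute("INSERT INTO line_item (id, invoice_id) VALUES (11, 1)")
    return conn

def count(conn, table):
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]

# CASCADE: deleting the container silently removes what it contains.
cascade = build("CASCADE")
cascade.execute("DELETE FROM invoice WHERE id = 1")
cascade_items_left = count(cascade, "line_item")

# RESTRICT: the delete is refused while line items still reference the invoice.
restrict = build("RESTRICT")
try:
    restrict.execute("DELETE FROM invoice WHERE id = 1")
    restrict_blocked = False
except sqlite3.IntegrityError:
    restrict_blocked = True
restrict_items_left = count(restrict, "line_item")
```

In both cases the rule is stated once in the table definition, and no application code has to remember it.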
Can it handle many-to-many? I've heard some people say it can, but is integrity enforced automatically, or is it just done with references that are application managed?
Can it *automatically* enforce declared integrity constraints? Can you have an integer attribute and declare that it must always be divisible by 4? Is that enforced by auditing your application code and manually inserting a check at each place the attribute is updated, or is it enforced by declaring the constraint centrally? Does the constraint have a hole in it if you add a new place the attribute can change and forget to put the %4 check in?
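As a point of comparison, this is roughly what "declaring the constraint centrally" looks like on the SQL side. It is a sketch assuming SQLite through Python's sqlite3 module; the widget table and qty column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The divisible-by-4 rule is declared once, in the schema.
conn.execute(
    "CREATE TABLE widget ("
    " id INTEGER PRIMARY KEY,"
    " qty INTEGER NOT NULL CHECK (qty % 4 = 0))"
)

def try_sql(sql, params=()):
    """Run a statement; report False if the database rejects it on integrity grounds."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

ok_insert = try_sql("INSERT INTO widget (id, qty) VALUES (1, 8)")
bad_insert = try_sql("INSERT INTO widget (id, qty) VALUES (2, 7)")
# A new code path that updates the column is still checked, with no extra code.
bad_update = try_sql("UPDATE widget SET qty = ? WHERE id = 1", (10,))
final_qty = conn.execute("SELECT qty FROM widget WHERE id = 1").fetchone()[0]
```

Every statement that touches qty goes through the same check, so there is no "hole" to forget about when a new update path is added.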
> Like I said before, this is not to say that the MV model does > everything...as nothing can. I don't think I agree. For example, Java, C++, and BASIC are all able to compute anything that can be computed. They do everything that can be done; no programming language of the future can ever do anything more. (Which is not the same thing as saying there is no room for improvement: FORTRAN < C < C++ < Java < {OCaml, Haskell}, IMHO. But these are usability and expressivity issues, not computability issues; we need to be clear on the distinction.)
> But it provides an interesting confluence of > tools and capabilities that render the model very useful in solving business > problems for many people and businesses. This is not so much what is under discussion in this newsgroup. I will readily acknowledge that many people use MV to do useful work, and that they solve business problems, and that they enjoy themselves doing so. The on-topic question is the theoretical basis for the tools. Are they complete? Are they correct? Are they self-consistent?
SQL is relationally complete, over its lame type system. It could really use a better type system. This will make it more usable but it won't make it any more complete. SQL is already really good at automatically enforcing integrity; it's a real strong point. OTOH, it's not so good at ease-of-use, and could really stand to improve. I suspect MV is much better at ease of use and worse at enforcing integrity. Understanding why and where one is better and one is worse will help us better use our own systems, evaluate others, and also to build the next generation. In this respect, I think Dawn and I are engaged in exactly the same exploration, although we come at it from different backgrounds.
Marshall
Dawn M. Wolthuis - 30 Jun 2004 02:13 GMT > > > Perhaps I misunderstand, but MV has only the one kind of > > > relationship it is capable of understanding: containment. [quoted text clipped - 12 lines] > relationship would be cool, if the example does not require hand-written > application code to work. I'm not sure whether this answers your question as it depends on what you mean by "relationship" but here is another type of relationship -- each file(function/entity) requires a unique identifier for each record (instance/row-ish) so that you have this relationship for a file named People, for example
People(12345)={all attributes of this person including those stored directly as part of the People function and those derived via links to other functions}
Another type of relationship it understands is a link placed in a "virtual field" for derived data. So, even if the street address for People(12345) is not part of the "base relation" (is not stored "in" People), the function to link a foreign key to another file is a relationship that is understood. So, once that virtual field is defined, I can ask the database to
List People Name Address
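A rough sketch of the idea, not of any real MV dictionary format: each file is modelled as a mapping from record id to stored fields, and a virtual field is modelled as a function that follows stored keys into another file. All names and data below are hypothetical.

```python
# Hypothetical data: two "files", each a mapping from record id to stored fields.
ADDRESSES = {"A1": {"Street": "12 Elm St"}, "A2": {"Street": "9 Oak Ave"}}
PEOPLE = {
    "12345": {"Name": "Pat Smith", "AddressIds": ["A1", "A2"]},
    "67890": {"Name": "Lee Jones", "AddressIds": []},
}
FILES = {"People": PEOPLE}

# A "virtual field" here is just a named function over a record (an assumption
# of this sketch): Address is derived by following keys into ADDRESSES.
VIRTUAL_FIELDS = {
    "People": {
        "Address": lambda rec: [ADDRESSES[a]["Street"] for a in rec["AddressIds"]],
    }
}

def list_file(file_name, *fields):
    """Return one row per record; stored and virtual fields are requested the same way."""
    rows = []
    for rec_id, rec in sorted(FILES[file_name].items()):
        row = {"id": rec_id}
        for field in fields:
            virtual = VIRTUAL_FIELDS.get(file_name, {}).get(field)
            row[field] = virtual(rec) if virtual else rec[field]
        rows.append(row)
    return rows

rows = list_file("People", "Name", "Address")
```

The caller of list_file cannot tell which fields are stored and which are derived, which is the property the query above relies on.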
> I think the MV and the RM world divide things up very differently. > I will note first that "storage" is not a first-tier property of RM, > but it is a useful, second tier function that most products support > and that most applications take advantage of. It is perfectly reasonable, > and useful, to have an RDBMS that does not persist its relations. > We could still call this an RDBMS, but we couldn't call it a "datastore." Yes -- if you remove the storage feature of MV, you get something very close to XML. So, if you add it back in, you have something very close to "an XML database" which is the creature that the big database vendors are saying doesn't exist and won't be able to save your company from having to pay for an RDBMS. Hmmm.
> Another example is managing data integrity in procedural application > code. In RM this is considered a "stupid database trick" to quote from [quoted text clipped - 3 lines] > However, it may be that this approach has lower overhead in situations > where you have small development teams and single-application databases. I think I agree in principle that we do not want constraints in application code, but would add that we don't want them stuck in the proprietary database language, inaccessible to the application either. The odd thing is that it really "seems like" the cause and effect are different -- you GET smaller development teams when you use this approach and that is concerning to me. Something is decidedly less expensive in terms of time for maintaining and having the constraints in the same language as the rest of the application just might be one of the keys to that.
> > Secondly, solving business problems requires a great deal of flexibility. A > > non-relational model can, in a number of instances, provide additional [quoted text clipped - 7 lines] > and Dawn. I'm not saying I believe, and I'm not saying I disbelieve. > I just want to hear more specifics. But you see, I have a hard time evaluating the claims of people like me. I don't have proof. I am very confident that I can find aspects of the relational model that are not based on either mathematics or science (we've had many such discussions in the past half year). I do not have any scientific evidence that models other than relational have anything better going for them. I have personal experience that is insufficient as proof and a collection of anecdotes. I'm in search of better science on the matter and a mathematical model that is as useful to the practitioner as the RM.
How do you think we could get evidence? It seems to me that a class of databases that advance the "older" approaches of Cache' and PICK could beat today's SQL databases in a number of categories. How could I prove that starting with PICK would be better than starting with SQL Server if we want to provide highly scalable but relatively inexpensive and agile software development environments in the future? It seems the best I can do is prove that the relational model is not purely mathematics, but contains some amount of religious claims.
I've considered other approaches such as approaching the Mountain Dew folks to see if they would sponsor a "Dew IT" event where we put some hypotheses to the test more. I don't know what the equivalent of a placebo would be in our tests, however. Cheers --dawn
Marshall Spight - 30 Jun 2004 16:01 GMT > I'm not sure whether this answers your question as it depends on what you > mean by "relationship" but here is another type of relationship -- each [quoted text clipped - 5 lines] > as part of the People function and those derived via links to other > functions} Gotcha.
> Another type of relationship it understands is a link placed in a "virtual > field" for derived data. So, even if the street address for People(12345) > is not part of the "base relation" (is not stored "in" People, the function > to link a foreign key to another file is a relationship that is understood. Let me see if I understand this. You have a "file" of People and it might have, directly in it, a field that is a list of addresses, so we have one:many for People:Addresses.
In another scenario, you might have a file People, and it would have directly in it a virtual field, whose value is a key into another file. The fact that it's virtual is a metadata bit, and the file being referenced is also metadata. Again, one:many for People:Addresses.
The difference between a virtual field and a non-virtual field is one of implementation; the interface is the same either way. (Yes? No?)
> So, once that virtual field is defined, I can ask the database to > > List People Name Address Uh, "List" is a command, "People" is the file, and are Name and Address fields of the file people? (Whether virtual or not?)
These files are functions because you are required to have a primary key, so the file is a function from <primary key domain> to <field range>. Are you limited to having a single field that is marked unique?
> > Another example is managing data integrity in procedural application > > code. In RM this is considered a "stupid database trick" to quote from [quoted text clipped - 7 lines] > code, but would add that we don't want them stuck in the proprietary > database language, inaccessible to the application either. Yes, we've discussed this before, and I believe we agree that it's important that constraints be available to applications.
> The odd thing is > that it really "seems like" the cause and effect are different -- you GET > smaller development teams when you use this approach and that is concerning > to me. I didn't quite follow this.
> Something is decidedly less expensive in terms of time for > maintaining and having the constraints in the same language as the rest of > the application just might be one of the keys to that. I'd buy that in a second. But I still want my constraints enforced (at least) centrally.
> > Please be specific. I am very interested in specific examples of specific > > operations or structures that you feel are hard to solve with RM or SQL [quoted text clipped - 6 lines] > But you see, I have a hard time evaluating the claims of people like me. I > don't have proof. I'm not asking for proof. I know you care a lot about proof, but I don't so much. Right now I'm more interested in hearing a lot of people's stories. So if you have use-cases for situations where you feel MV is better than the relational approach, I'm happy to hear them.
> I am very confident that I can find aspects of the > relational model that are not based on either mathematics or science (we've > had many such discussions in the past half year). I do not have any > scientific evidence that models other than relational have anything better > going for them. I have personal experience that is insufficient as proof > and a collection of anecdotes. Bring on the anecdotes!
> I'm in search of better science on the > matter and a mathematical model that is as useful to the practitioner as the > RM. > > How do you think we could get evidence? Give me ten million dollars and 5 years and it should be no problem. Since I have neither, I'm willing to forego the whole proof thing.
> It seems to me that a class of > databases that advance the "older" approaches of Cache' and PICK could beat > today's SQL databases in a number of categories. How could I prove that > starting with PICK would be better than starting with SQL Server if we want > to provide highly scalable but relatively inexpensive and agile software > development environments in the future? I have serious doubts about the scalability claim, but then I have an extreme view of scalability which has been skewed by my workplace. However I can believe the agile part.
> It seems the best I can do is prove > that the relational model is not purely mathematics, but contains some > amount of religious claims. If I just stipulate that, will it help?
Any time we are building a model, what we are doing is making design choices. It is good if these choices are consistent with good mathematics, but even if we completely succeed at that, it doesn't mean we are doing math and not design. It's always design.
And there's not just one math, either. You come up with a formalism, and if it is useful, then we rejoice. It's certainly possible for a formalism to be completely sound and self-consistent and utterly useless.
Marshall
Bill H - 05 Jul 2004 03:23 GMT Marshall:
My comments are embedded.
> "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message ... > > "Marshall Spight" <mspight@dnai.com> wrote in message [quoted text clipped - 5 lines] > > I'm not sure why it is so difficult to express this concept. An MV > > environment is both a data store and an application server. It is _NOT_ an
> > RD model. To discuss its attributes strictly from a datastore perspective
> > is neither fair nor accurate. To understand its methodologies for directly
> > solving business problems requires the willingness to work with its two > > functions: storage and application properties/methods/rules/etc. [quoted text clipped - 12 lines] > and useful, to have an RDBMS that does not persist its relations. > We could still call this an RDBMS, but we couldn't call it a "datastore." An MV relationship isn't an RM relationship; at least it isn't stored as such. It is an expression of a relationship, the containment of which resides in the database. e.g. a relationship exists between a vendor and invoices, between a check and invoices, between a bank transaction and a check, and between a reconciliation and checks.
So, your statement that the only relationship an MV dbms can understand is containment is not true; though not exactly false either because it does fundamentally understand that. The MV model understands defined relationships, which are stored (or contained) within the database. These defined relationships are then understood by the MV model.
I can set a relationship between a vendor and invoices by simply storing the data required for the relationship then defining the relationship. So, I can say:
:select vendors invoices
:sort invoices with no pddate by pddate

I can then define the above as a stored procedure named "List-Unpaid-Invoices", then execute:

:List-Unpaid-Invoices '12345'

which will list all unpaid invoices for vend# '12345'.
Now, was this only containment? I think it was much more than that. However, it is what it is and the RM model will do the same thing; just differently. Notice that the relationship and the relationship data has to be stored somewhere in both models. One of the interesting aspects of the MV model is the data and relationship is stored in the database. The application will usually initiate the creation of the relationship data but it can be done via table triggers or relationship triggers separate from the application (as long as it's defined that way).
There's nothing tricky about this. All dbms models have to do the same things to accomplish the same tasks. The MV model doesn't do some miraculous mumbo-jumbo and neither does the RM model. Both store data, both store relationships, and both store constraints. In the MV model all this is stored in the database!
> Another example is managing data integrity in procedural application > code. In RM this is considered a "stupid database trick" to quote from [quoted text clipped - 3 lines] > However, it may be that this approach has lower overhead in situations > where you have small development teams and single-application databases. The models we use create "stupid" tricks. It's the models that create the constraints to make some tricks stupid and others smart. An RM "stupid" trick may be an MV smart move, and vice versa. However, most design and development are constrained by the base delivery model: server vs client/server.
> > Secondly, solving business problems requires a great deal of flexibility. A
> > non-relational model can, in a number of instances, provide additional > > flexibility over and above what the RD model can. [quoted text clipped - 6 lines] > and Dawn. I'm not saying I believe, and I'm not saying I disbelieve. > I just want to hear more specifics. Let's reconcile a bank account. We need a primary account table and a transaction table in both models. However, in the MV model we don't need anything more than this. We will define the keys of the transactions to include the account#, so the account table can (and probably will) contain the ref# of the transactions. The transaction key would look like: Account# and transaction#. The account row would include all of the uncleared transaction#s and the key of each transaction would include the account key. We have a defined relation in a format other than as defined in the RD model. Don't let this fool us: a relation is a relation and has to be defined and stored somewhere. In the MV model it is simply stored in the database. My goodness, we've just defined a many to many relationship (please note: this description is fundamentally viewed from an MV model perspective).
Now we get a simple download file from our financial institution which usually includes the fed route#, the account#, the transaction#, the date cleared, and the amount of the cleared transaction. It can be in any format, we don't care as long as it's consistent. :-)
Our transaction# is encoded on the financial instrument (the check or the deposit) so the bank sends it back to us as their transaction#. Part of the transaction# returned by the bank is our account#, since it was part of our transaction key!
Now, what good is this? As Mr Youngman points out it only takes one disk read to get the account and its relationships to the uncleared transactions. This is an almost instantaneous response to our web clients. So there's an upside.
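The reconciliation described above can be sketched as follows. This is only an illustration in Python with plain dictionaries standing in for the two files; the account numbers, the "account#*transaction#" key format, and the bank file layout are assumptions made for the example.

```python
# Hypothetical records. The account row carries its uncleared transaction numbers
# (a multivalued field); each transaction is keyed by "account#*transaction#".
accounts = {"1001": {"uncleared": ["501", "502", "503"]}}
transactions = {
    "1001*501": {"amount": 250.00, "cleared": None},
    "1001*502": {"amount": 75.50, "cleared": None},
    "1001*503": {"amount": 40.00, "cleared": None},
}

# Rows as they might arrive in the bank's download file:
# (account#, transaction#, date cleared, amount)
bank_file = [
    ("1001", "501", "2004-07-01", 250.00),
    ("1001", "503", "2004-07-02", 40.00),
]

def reconcile(bank_rows):
    """Mark matching transactions cleared and drop them from the account's list."""
    mismatches = []
    for acct, txn, cleared_on, amount in bank_rows:
        key = acct + "*" + txn
        record = transactions.get(key)
        if record is None or record["amount"] != amount:
            mismatches.append(key)
            continue
        record["cleared"] = cleared_on
        # The account row and the transaction row must be updated together;
        # nothing in this sketch enforces that except this function.
        if txn in accounts[acct]["uncleared"]:
            accounts[acct]["uncleared"].remove(txn)
    return mismatches

mismatches = reconcile(bank_file)
```

Reading accounts["1001"] gives the uncleared list in a single lookup, which is the upside claimed above. The cost is that the same fact lives in two places and is kept consistent only by code like reconcile.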
> > Not because the RD model > > is incapable, but because the RD model declares for itself particular [quoted text clipped - 4 lines] > I don't know how to measure the idealness of a solution, so I have > no particular claims about whether the RM is ideal or not. I sit in a corporate VP meeting and discuss this with them and they with me. We all see the same thing. I'm not the odd man out here. In addition, I can almost directly translate their vast knowledge to the dbms design and relationship definition. I think this is good!
> > For years HP calculators used RPN (reverse polish notation) instead of > > standard algebraic entry mode (AEM). Nowadays they offer both. Does this > > mean RPN is worthless or worse than AEM? No. Many people prefer RPN. Is > > AEM more often used? Of course, but that doesn't say anything about RPN or
> > those who prefer to use it. The same can be said about the RD model. Not > > everyone prefers it. [quoted text clipped - 3 lines] > equivalent. I do not believe the RM and MV have such a mapping, > nor do they support the same operations nor structures. They really do. They do primarily the same things. We're not talking nuclear reactors and cigarettes. :-)
> > > Yes, it understands the meaning of this, so it knows what > > > a one-to-many relationship means. Does it have any other [quoted text clipped - 7 lines] > > Of course it does, and can. Remember it is both a datastore and an > > application environment wrapped into one. So, whatever needs to be done can
> > be done. > > If by this you mean that you can implement these features by > hand-writing application code, then I don't consider that any > achievement. I can say the same thing about some Java code and > a hashtable, but it's not a good solution. Yes and no. You can always hand write code in an application. You can also store the code in the dbms as triggers, constraints, relations, etc. The difference is that the RD model does some things one way and the MV model does some things the other way. The MV model is much more application-centric. This is only bad when working with the RD model, where this is defined as bad (or "stupid"). Most things are done the same though.
:-) It's nice to have a model implement some features for us. It saves us time, and I realize, and appreciate, this. Most of my experience working with RD models is: you give me data and I'll give you data.
> For example, in another thread someone said (over and over > if I remember correctly :-) that if you delete an invoice, all [quoted text clipped - 5 lines] > throw them away.) Can you do this declaratively in MV? How > is it done? Let me point out that the MV model communicates with the database, with respect to data maintenance, via an application language. Where a RD model might say:
INSERT ...
the MV model would need to:
OPEN My file
READ and Lock New record (make sure no one else is)
or
READ and Lock Item to change (make sure no one else is)
CHANGE data
WRITE data TO My file
Lock contention is a part of the dbms. There is no such thing as "optimistic" locking (unless one is an idiot). :-) But this is an MV perspective, not an RD perspective.
> Can it *automatically* enforce declared integrity constraints? > Can you have an integer attribute and declare that it must [quoted text clipped - 4 lines] > it if you add a new place the attribute can change and forget > to put the %4 check in? Remember, a constraint is defined and stored somewhere. The only value in storing it outside the application is if some other application is using it. This is not a usual requirement but an MV model can simply enforce this via a trigger. We're much more inclined to place this in the application because all MV applications are server-centric and run in the dbms.
> > Like I said before, this is not to say that the MV model does > > everything...as nothing can. [quoted text clipped - 7 lines] > usability and expressivity issues, not computability issues; we > need to be clear on the distinction.) Rule one in life: never say never. Rule two in life: never say I can do everything. :-)
> > But it provides an interesting confluence of > > tools and capabilities that render the model very useful in solving business
> > problems for many people and businesses. > [quoted text clipped - 4 lines] > basis for the tools. Are they complete? Are they correct? Are they > self-consistent? I thoroughly agree. That's what keeps us all here...the amount of knowledge and interesting thought-provoking ideas elucidated.
> SQL is relationally complete, over its lame type system. It could > really use a better type system. This will make it more usable but [quoted text clipped - 7 lines] > and I are engaged in exactly the same exploration, although we > come at it from different backgrounds.
:-) Bill
Marshall Spight - 05 Jul 2004 19:05 GMT > > I read that paragraph a bunch of times, but it didn't seem to > > address my statement that MV has only one kind of relationship [quoted text clipped - 38 lines] > However, it is what it is and the RM model will do the same thing; just > differently. So I read all of your comments, and I couldn't figure out what they meant. I didn't see any clear answer to whether MV supports relationships besides containment. In fact you evaluated that statement as "not true but not exactly false." I have no idea what that means.
> Notice that the relationship and the relationship data has to > be stored somewhere in both models. Of course.
> One of the interesting aspects of the > MV model is the data and relationship is stored in the database. Uh, same with RM.
> > Another example is managing data integrity in procedural application > > code. In RM this is considered a "stupid database trick" to quote from [quoted text clipped - 9 lines] > development are constrained by the base delivery model: server vs > client/server. I disagree. There are specific well-documented and *fundamental* disadvantages to managing integrity in applications instead of centrally. This is independent of MV vs. RM vs. whatever.
> > Please be specific. I am very interested in specific examples of specific > > operations or structures that you feel are hard to solve with RM or SQL [quoted text clipped - 10 lines] > the ref# of the transactions. The transaction key would look like: > Account# and transaction#. You're going to reuse transaction numbers in different accounts? And you're also going to include the transaction number in the account table? That kind of redundancy leads directly to data corruption.
> Now, what good is this? As Mr Youngman points out it only takes one disk > read to get the account and its relationships to the uncleared transactions. > This is an almost instantaneous response to our web clients. So there's an > upside. Ugh. Let's please not talk about disk reads.
> > I don't know how to measure the idealness of a solution, so I have > > no particular claims about whether the RM is ideal or not. [quoted text clipped - 3 lines] > can almost directly translate their vast knowledger to the dbms design and > relationship definition. I think this is good! How is this a response to what I wrote? It sounds like what you are saying is "I work in the computer industry."
> > > For years HP calculators used RPN (reverse polish notation) instead of > > > standard algebraic entry mode (AEM). Nowadays they offer both. Does this [quoted text clipped - 11 lines] > They really do. They do primarily the same things. We're not talking > nuclear reactors and cigarettes. :-) Okay, how do you map a relational table like this into MV:
create table Tri (
    a int,
    b int,
    c int,
    unique(a,b),
    unique(b,c),
    unique(a,c)
);
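To show what this table actually constrains, here is a small sketch that loads it into SQLite through Python's sqlite3 module and tries a few rows. The sample rows are made up; the point is that each of the three column pairs must be unique independently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Tri (
        a INT, b INT, c INT,
        UNIQUE (a, b), UNIQUE (b, c), UNIQUE (a, c)
    )
""")

def insert(a, b, c):
    """Try to insert a row; report False if any uniqueness rule rejects it."""
    try:
        conn.execute("INSERT INTO Tri VALUES (?, ?, ?)", (a, b, c))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    insert(1, 1, 1),  # first row
    insert(1, 2, 2),  # shares only the value of a with the first row
    insert(1, 1, 3),  # repeats (a, b) = (1, 1)
    insert(2, 1, 1),  # repeats (b, c) = (1, 1)
    insert(1, 3, 1),  # repeats (a, c) = (1, 1)
]
```

No single column or pair acts as "the" key here, so the table is not a function from one chosen identifier to the remaining fields. That is what makes it an awkward case for a one-record-id-per-file structure.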
> > > > Yes, it understands the meaning of this, so it knows what > > > > a one-to-many relationship means. Does it have any other [quoted text clipped - 16 lines] > > Yes and no. This kind of answer is hard to work with. It's much easier to understand you when you give me a straight answer. Saying "yes and no" is worse than not responding, because it adds confusion.
> You can always hand write code in an applicaton. You can also > store the code in the dbms as triggers, constraints, relations, etc. The > difference is that the RD model does some things one way and the MV model > does some things the other way. Remember when I asked you to "be specific?"
> The MV model is much more > application-centric. This is only bad when working with the RD model, where > this is defined as bad (or "stupid"). Most things are done the same though. No, it's bad for more fundamental reasons. If you don't enforce constraints centrally, then integrity support becomes ad-hoc and application-dependent, so one application might fail to enforce a constraint. A constraint that isn't enforced centrally is a constraint that won't necessarily hold.
> > For example, in another thread someone said (over and over > > if I remember correctly :-) that if you delete an invoice, all [quoted text clipped - 24 lines] > "optimistic" locking (unless one is an idiot). :-) But this is an MV > perspective, not an RD perspective. Is this supposed to be a response to my earlier paragraph? Because I don't see the answer to my question about "can MV do ON DELETE RESTRICT" anywhere. Can it? What relevance does lock contention have to my question?
> > Can it *automatically* enforce declared integrity constraints? > > Can you have an integer attribute and declare that it must [quoted text clipped - 10 lines] > via a trigger. We're much more inclined to place this in the application > because all MV application are server-centric and run in the dbms. So, is that a "yes?" Are you saying it *is* possible to enforce a constraint centrally?
> > > Like I said before, this is not to say that the MV model does > > > everything...as nothing can. [quoted text clipped - 10 lines] > Rule one in life: never say never. Rule two in life: never say I can do > everything. :-) It sounds like you don't understand Turing completeness. Also note that I didn't say "everything." I said "anything that can be computed." And I stand by my statement that BASIC can compute anything that can be computed; it is a Turing complete language. This is not the same thing as saying that it is a good language, though.
I appreciate you're trying to help me understand, but I'm having trouble following your posts. It seems like you quote me, then respond, but the response, while interesting, isn't a response per se but you talking about something else. I get lost.
Marshall
Bill H - 08 Jul 2004 08:48 GMT Marshall:
"Marshall Spight" <mspight@dnai.com> wrote...
[snipped]
> So I read all of your comments, and I couldn't figure out what > they meant. I didn't see any clear answer to whether MV supports > relationships besides containment. In fact you evaluated that stament > as "not true but not exactly false." I have no idea what that means. One of the primary impediments to communication is a different use of words and definitions. From a non-RD model perspective a relationship exists when the properties of two pieces of data can be defined as having an aspect or quality that connects them as being or belonging or working together or as being of the same kind <the relation of time and space>. This seems obvious to me but I do not use the RD model, or mathematical, definition.
So, there exists a relationship between vendors and invoices. Containment has nothing to do with that relationship, except the relationship is contained within the database.
What exactly is this relationship and how is it stored? I can store the invoice#s within the vendor in the vendor table. This defines a relationship in the MV model (although there are a number of other ways to do so). How is this relationship going to be exposed? An example would be to create a virtual field definition in the vendor table so that when asked, will deliver the list of invoices associated with this vendor and any data contained within the invoice table.
The phrase "...not true but not exactly false..." was intended to reflect my desire to avoid being argumentative or didactic. My apologies for being obtuse and misleading. :-)
> > One of the interesting aspects of the > > MV model is the data and relationship is stored in the database. > > Uh, same with RM. I'm sorry to say this is another of those "yes but not really" observations. The relationship is stored in the relational database but not really like it is stored in the MV database. This is true because the MV model treats everything like regular data; unlike the RD model. As such, everything is stored in the database tables right along with all the other data; names, addresses, relationships, metadata, functions, constraints, stored procedures, application code, compiled code, etc. All MV tools, like RD tools, are available for these additionally defined data; it's just stored with all other data in the exact same formats.
What makes this different is only that these tables are usually part of the database structure of the production data. So, you'd have tables for constraints, stored procedures, relationships, metadata, application code, and data all within a single database structure built for an application. The RD model would normally keep this kind of data separate from the production data within its own special system tables.
However, once again, the stuff stored is the same. It's just where it's stored in relation to the normal everyday production data that's different.
> > The models we use create "stupid" tricks. It's the models that create the > > constraints to make some tricks stupid and others smart. An RM "stupid" [quoted text clipped - 5 lines] > disadvantages to managing integrity in applications instead of centrally. > This is independent of MV vs. RM vs. whatever. Don't forget, these "well documented" disadvantages revolve around the RD model, as its structure requires a different dance. If a dbms stores integrity constraints in the dbms, and the application is stored and runs in the dbms, then it makes little difference whether the integrity constraint is in or out of the application, as the application is located centrally in the dbms. I would point out that from this perspective it is wise to modularize the application so other applications can utilize the defined constraints.
> > Let's reconcile a bank account. We need a primary account table and a > > transaction table in both models. However, in the MV model we don't need [quoted text clipped - 7 lines] > the account table? That kind of redundancy leads directly > to data corruption. Ah, excuse me? One reuses check#s in different accounts all the time. One reuses invoice#s for different vendors all the time too. To include the transaction# in the account table is to do nothing different than needs to be done anyway to define a relation; A > B and B < A. Redundancy? Storing the transaction#s in the account saves having to store the "transaction to account" relationship, as it is already defined by the transaction key. So this reduces redundancy. Data corruption? No different than anywhere else. Synchronization code performs the same task in all dbms products, although sometimes differently.
> > > I don't know how to measure the idealness of a solution, so I have > > > no particular claims about whether the RM is ideal or not. [quoted text clipped - 6 lines] > How is this a response to what I wrote? It sounds like what you > are saying is "I work in the computer industry." I'm not taking a stand here claiming the RD model is bad. Nor am I stating that other models are necessarily better. I'm merely pointing out there are other methods and tools and dbms models that work.
My point is the nomenclature, syntax, and concepts within the MV model are specifically modeled after those of business. Business people feel at ease working with the model because of its business friendly terms and concepts. That's why a lot of the MV modeling is done in a rapid development structure directly with business people.
> > They really do. They do primarily the same things. We're not talking > > nuclear reactors and cigarettes. :-) [quoted text clipped - 10 lines] > unique(a,c) > ); A good example of the point I was trying to make, and have made before, about deconstruction/reconstruction. To a business person this is complete nonsense. However, it isn't nonsense to make sure a group of values are not duplicates or to make sure that certain fields are certain data types.
So, for instance, it is important that all invoice#s for a particular vendor are unique (we certainly wouldn't want to pay the same invoice twice). In the MV model the key is _not_ part of the data set but is part of the key (I would read a dataset using the key as the unique identifier). Thus both models do the same thing but a little differently. Other fields can be constrained. However, they're not constrained in the syntax of the table creation statement. They're done differently. So I can say any invoice must have a unique invoice# and a unique creation-stamp.
> > > If by this you mean that you can implement these features by > > > hand-writing application code, then I don't consider that any [quoted text clipped - 6 lines] > you when you give me a straight answer. Saying "yes and no" is worse > than not responding, because it adds confusion. I can understand your frustration. But it is true. You can write application code to implement these features. This code isn't at all one monolithic .exe. Remember, the application code sits inside the dbms just like any other data so its proximity to the datastore is significantly closer than in the RD model. A simple application may contain a thousand executables and a vast portion of the application is probably nothing more than functions that enforce integrity constraints, relationships, business rules, etc. So a function or API can be written and used by the application code just like an .OCX or .dll or .exe can be used. I would call this written in the application but, in the RD model this could easily be defined as a separate API residing on the application server serving any application wishing to use its functionality.
> > You can always hand write code in an application. You can also > > store the code in the dbms as triggers, constraints, relations, etc. The > > difference is that the RD model does some things one way and the MV model > > does some things the other way. > > Remember when I asked you to "be specific?" I can set a trigger to enforce integrity within the bank account table so if a bank transaction is cleared, the uncleared reference to it within the bank account table is removed. So, right here I've set both a trigger and constraint on a relation at the same time. I know the RD model accomplishes the same task but differently.
> > The MV model is much more > > application-centric. This is only bad when working with the RD model, where [quoted text clipped - 4 lines] > so one application might fail to enforce a constraint. A constraint that > isn't enforced centrally is a constraint that won't necessarily hold. I cannot emphasize this enough; the MV model is located centrally! The application server and dbms server reside within the same environment, on the same machine. Therefore, all constraints are enforced centrally. The centralized application APIs can be called from outside the application. Additional constraints can be developed to provide service to more than one application and to meet ever-changing requirements.
> Is this supposed to be a response to my earlier paragraph? Because > I don't see the answer to my question about "can MV do ON DELETE > RESTRICT" anywhere. Can it? What relevance does lock contention > have to my question? The answer is an emphatic yes. But not by saying: "ON DELETE RESTRICT"; unless one wants to utilize the SQL functionality within the dbms. More like:
:select table with no defined_constraint
:delete table
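For comparison, here is a minimal sketch of what the ON DELETE RESTRICT in the question does on the SQL side, written with Python's built-in sqlite3 module. The vendor and invoice tables, their columns, and the helper names are assumptions made up for illustration; nothing here reproduces the MV commands above, whose exact syntax depends on the product.

```python
import sqlite3

def make_db():
    """Build an in-memory database with a hypothetical vendor/invoice schema."""
    db = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    db.execute("PRAGMA foreign_keys = ON")
    db.execute("CREATE TABLE vendor (vendor_id TEXT PRIMARY KEY)")
    db.execute(
        "CREATE TABLE invoice ("
        " invoice_no TEXT,"
        " vendor_id TEXT NOT NULL REFERENCES vendor(vendor_id) ON DELETE RESTRICT,"
        " PRIMARY KEY (vendor_id, invoice_no))"
    )
    db.execute("INSERT INTO vendor VALUES ('V1')")
    db.execute("INSERT INTO vendor VALUES ('V2')")
    db.execute("INSERT INTO invoice VALUES ('1272', 'V1')")
    return db

def try_delete_vendor(db, vendor_id):
    """Return True if the vendor row was deleted, False if the constraint refused."""
    try:
        db.execute("DELETE FROM vendor WHERE vendor_id = ?", (vendor_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

In this sketch a vendor that still has invoices cannot be deleted, while a vendor with none can; the rule is declared once on the table rather than in each caller.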
> So, is that a "yes?" Are you saying it *is* possible to enforce a > constraint centrally? Remember, the constraints are stored centrally, as are the application APIs, custom business rules, relationships, functions, data, metadata, etc. So not only is it possible to enforce constraints centrally but it is required, and assumed from this model's perspective.
> > Rule one in life: never say never. Rule two in life: never say I can do > > everything. :-) [quoted text clipped - 4 lines] > can be computed; it is a Turing complete language. This is not > the same thing as saying that it is a good language, though. There is a lot in the universe I don't understand. I understand the word tautology, though. :-)
My tendencies are to accept imperfections and deal with them rather than think I'm correct, if only within my limited definition of what correct is.
> I appreciate you're trying to help me understand, but I'm > having trouble following your posts. It seems like you > quote me, then respond, but the response, while interesting > isn't a response per se but you talking about something else. > I get lost. And here I was thinking I was answering your queries directly, albeit in a slightly different perspective. Perhaps my writing skills, and clarity of thought, will improve with time. :-)
Bill
Marshall Spight - 10 Jul 2004 16:44 GMT > "Marshall Spight" <mspight@dnai.com> wrote... > > One of the primary impediments to communication is a different use of words > and definitions. Yes. I'm trying to learn different terminology.
> From a non-RD model perspective a relationship exists when > the properties of two pieces of data can be defined as having an aspect or > quality that connects them as being or belonging or working together or as > being of the same kind <the relation of time and space>. In other words, a relation is anything we say it is. This works for me.
> This seems obvious > to me but I do not use the RD model, or mathematical, definition. Actually, the mathematical definition (as best I understand) is pretty much the same thing: a "relation" is a set of pairs. How do we decide what the set is? It's anything we care to say it is.
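A minimal sketch of the "set of pairs" reading, in Python. The vendor and invoice identifiers are made up for illustration, and the function names are hypothetical; the point is only that one set of pairs can be read in either direction.

```python
# A relation as a plain set of (vendor, invoice#) pairs; the data is invented.
vendor_invoice = {("V1", "1272"), ("V1", "7214-2"), ("V2", "1272")}

def invoices_of(relation, vendor):
    """All invoice numbers paired with the given vendor."""
    return {inv for v, inv in relation if v == vendor}

def vendors_of(relation, invoice_no):
    """All vendors paired with the given invoice number."""
    return {v for v, inv in relation if inv == invoice_no}
```

Note that the same invoice number can appear under two vendors without any conflict, because the pair, not the invoice number alone, is the element of the set.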
> So, there exists a relationship between vendors and invoices. Containment > has nothing to do with that relationship, except the relationship is > contained within the database. I dunno. If every invoice has exactly one vendor, I think "containment" is a pretty good term to describe that relationship. (And a popular one as well.) Do you have a preferred term for "every x has an associated y?"
> What exactly is this relationship and how is it stored? I can store the > invoice#s within the vendor in the vendor table. I want to make sure I understand: when you say "invoice#***s***" (I especially note the "s") you mean to say that the vendors table/file/collection has an attribute/field that is a **list** of invoice numbers? Or is it a list of invoices?
> This defines a > relationship in the MV model (although there are a number of other ways to > do so). As an aside: can you enumerate the different ways?
> How is this relationship going to be exposed? An example would be > to create a virtual field definition in the vendor table so that when asked, > will deliver the list of invoices associated with this vendor and any data > contained within the invoice table. The term "virtual" here; what does it mean? Is there an online reference you like that I could use to read about this?
> The phrase "...not true but not exactly false..." was intended to reflect my > desire to avoid being argumentative or didactic. My apologies for being > obtuse and misleading. :-) 'Tis nothing. Thank you for having the conversation with me. I hope I did not come off as impatient.
> > > One of the interesting aspects of the > > > MV model is the data and relationship is stored in the database. > > > > Uh, same with RM. > > I'm sorry to say this is another of those "yes but not really" observations.
:-)
> The relationship is stored in the relational database but not really like it > is stored in the MV database. This is true because the MV model treats [quoted text clipped - 4 lines] > tools, are available for these additionally defined data; it's just stored > with all other data in the exact same formats. The distinction you're drawing is that in addition to all the stuff that both models store (names, addresses, relationships, metadata, functions, constraints, stored procedures) the MV model additionally stores application code, compiled code, MV tools, etc. Is that right?
Again, it's something I'd like to try out. Can you recommend a free solution; the mysql of MV?
> What makes this different is only that these tables are usually part of the > database structure of the production data. So, you'd have tables for > constraints, stored procedures, relationships, metadata, application code, > and data all within a single database structure built for an application. > The RD model would normally keep this kind of data separate from the > production data within its own special system tables. This separation you describe is not much of a separation.
> However, once again, the stuff stored is the same. It's just where it's > stored in relation to the normal everyday production data that's different. Okay.
> > I disagree. There are specific well-documented and *fundamental* > > disadvantages to managing integrity in applications instead of centrally. [quoted text clipped - 8 lines] > modularize the application so other applications can utilize the defined > constraints. I am hesitant here. On the one hand, that which makes application-enforced constraints not a good choice would seem to apply whether one kept the applications in the DB or on the filesystem. But having the applications stored centrally makes them central as well.
What if you have two comparatively unrelated applications that work against the same schema; both applications are in the dbms; one application enforces a constraint and one doesn't (for whatever reason: a bug, or the programmer just forgot about it.) Wouldn't that be a pathway for data corruption to enter the system?
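The two-application scenario can be sketched in Python with the built-in sqlite3 module. Everything named here (the payment table, the "amount must be positive" rule, the two application functions) is a hypothetical stand-in, not something from either poster's system; the sketch only contrasts a rule checked in application code with the same rule declared on the table.

```python
import sqlite3

def make_db(central_check):
    """Create a payment table, optionally with the rule declared on the table."""
    db = sqlite3.connect(":memory:")
    check = " CHECK (amount > 0)" if central_check else ""
    db.execute(f"CREATE TABLE payment (invoice_no TEXT, amount INTEGER{check})")
    return db

def app_a_record(db, invoice_no, amount):
    """Application A remembers to validate before writing."""
    if amount <= 0:
        return False
    db.execute("INSERT INTO payment VALUES (?, ?)", (invoice_no, amount))
    return True

def app_b_record(db, invoice_no, amount):
    """Application B forgets the rule and writes whatever it is given."""
    try:
        db.execute("INSERT INTO payment VALUES (?, ?)", (invoice_no, amount))
        return True
    except sqlite3.IntegrityError:
        return False

def bad_rows(db):
    """Count rows that violate the intended rule."""
    return db.execute("SELECT COUNT(*) FROM payment WHERE amount <= 0").fetchone()[0]
```

Without the table-level check, application B's write gets in even though application A would have refused it; with the check, both applications are held to the rule regardless of what their own code does.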
> > > Let's reconcile a bank account. We need a primary account table and a > > > transaction table in both models. However, in the MV model we don't [quoted text clipped - 14 lines] > transaction# in the account table is to do nothing different than needs to > be done anyway to define a relation; A > B and B < A. I agree up until the last sentence. You don't need both A > B and B < A to define a relation; you only need one or the other. Likewise, you don't need a list of invoice numbers in the accounts table *and* an account number in the invoices table; that's a denormalization that will lead to corruption. You need one or the other, but both is bad, (unless they are just different views on the same data. Are they? Or are they stored separately, and able to become out of sync.)
> Redundancy? Storing > the transaction#s in the account saves having to store the "transaction to > account" relationship, as it is already defined by the transaction key. So > this reduces redundancy. Uh, no. I mean, it's less redundancy than storing it three times, but it's more than just storing it once.
> Data corruption? No different than anywhere else. > Synchronization code performs the same task in all dbms products, although > sometimes differently. If you don't store the same information more than once, then the entire concept of "synchronization code" (first time I've heard the term) is unnecessary.
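A small Python sketch of the out-of-sync concern, under loudly stated assumptions: the account and transaction identifiers are invented, transaction ids are treated as unique across accounts purely to keep the example short, and the helper names are hypothetical. It stores the same relationship twice and shows how an update to one copy leaves the two disagreeing.

```python
# Hypothetical data: the same account-transaction relationship stored twice.
account_txns = {"ACC1": ["T1", "T2"]}          # list kept on the account record
txn_account = {"T1": "ACC1", "T2": "ACC1"}     # pointer kept on each transaction

def out_of_sync(account_txns, txn_account):
    """Return the (account, txn) pairs present in one copy but not the other."""
    from_account_side = {(a, t) for a, txns in account_txns.items() for t in txns}
    from_txn_side = {(a, t) for t, a in txn_account.items()}
    return from_account_side ^ from_txn_side

def reassign_txn_only(txn_account, txn_id, new_account):
    """An update that touches only the transaction-side copy (the bug illustrated)."""
    txn_account[txn_id] = new_account
```

Code like `out_of_sync` is only needed because there are two copies; with a single stored copy there would be nothing to compare.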
> I'm not taking a stand here claiming the RD model is bad. Nor am I stating > that other models are necessarily better. I'm merely pointing out there are > other methods and tools and dbms models that work. Sure; yes. My interest in these conversations is to understand what works well in each of various approaches, and also what doesn't.
> My point is the nomenclature, syntax, and concepts within the MV model are > specifically modeled after those of business. Hmmm. Data management is something that is very useful to business, but it is not business-oriented in and of itself. Same with adding up columns of numbers.
> > Okay, how do you map a relational table like this into MV: > > [quoted text clipped - 11 lines] > about deconstruction/reconstruction. To a business person this is complete > nonsense. You mean because they don't understand SQL? I don't get why we're talking about business people here; we're discussing data management.
> So, for instance, it is important that all invoice#s for a particular vendor > are unique (we certainly wouldn't want to pay the same invoice twice). In [quoted text clipped - 4 lines] > creation statement. They're done differently. So I can say any invoice > must have a unique invoice# and a unique creation-stamp. Hmmm. You didn't really answer my question.
I don't really care whether the constraints are part of the table declaration statement or not; I care about whether they are declarative, automatically enforced, and at least flexible enough to model that each of these three pairs must be unique: {a,b}, {a,c}, {b,c}
Is there a way to do that? I'm guessing not.
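For reference, the SQL side of the requirement can be sketched with Python's built-in sqlite3 module. The table name t and the integer column types are assumptions, since the original CREATE TABLE was clipped from the quote; only the three pairwise uniqueness rules come from the post. Whether an MV product offers an equivalent declarative form is the open question here and is not shown.

```python
import sqlite3

def make_table():
    """Create a table where {a,b}, {a,c} and {b,c} must each be unique."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE t ("
        " a INTEGER NOT NULL, b INTEGER NOT NULL, c INTEGER NOT NULL,"
        " UNIQUE (a, b), UNIQUE (a, c), UNIQUE (b, c))"
    )
    return db

def try_insert(db, a, b, c):
    """Return True if the row was accepted, False if a uniqueness rule refused it."""
    try:
        db.execute("INSERT INTO t (a, b, c) VALUES (?, ?, ?)", (a, b, c))
        return True
    except sqlite3.IntegrityError:
        return False
```

Each rule is stated once in the table definition and is applied to every insert, whichever program issues it.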
> I can set a trigger to enforce integrity within the bank account table so if > a bank transaction is cleared, the uncleared reference to it within the bank > account table is removed. So, right here I've set both a trigger and > constraint on a relation at the same time. I know the RD model accomplishes > the same task but differently. Do you have to set these up manually every place a bank transaction clear is invoked, or do you just do it once?
> I cannot emphasize this enough; the MV model is located centrally! The > application server and dbms server reside within the same environment, on > the same machine. Therefore, all constraints are enforced centrally. The > centralized application APIs can be called from outside the application. > Additional constraints can be developed to provide service to more than one > application and to meet ever-changing requirements. Where can I read more about this intriguing concept?
> > It sounds like you don't understand Turing completeness. Also note > > that I didn't say "everything." I said "anything that can be computed." [quoted text clipped - 4 lines] > There is a lot in the universe I don't understand. I understand the word > tautology, though. :-)
:-) back. But it's not actually a tautology. There are languages that can compute a lot of things but not everything. SQL is one such language.
> > I appreciate you're trying to help me understand, but I'm > > having trouble following your posts. It seems like you [quoted text clipped - 5 lines] > slightly different perspective. Perhaps my writing skills, and clarity of > thought, will improve with time. :-) Actually, I found this most recent message quite comprehensible. Also, I want to thank you for hanging in with my questions as my frustration grew. You are a gentleman, sir, and the world and this newsgroup needs more gentlemen. (And ladies, of course.)
Marshall
Bill H - 11 Jul 2004 03:51 GMT Marshall:
> "Marshall Spight" <mspight@dnai.com> wrote... > > "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote... [quoted text clipped - 3 lines] > as well.) Do you have a preferred term for "every x has an associated > y?" My apologies. I thought you defined containment differently. This will do for me, although I'd probably not use that term for many to many relationships. But if you like, I'll stick with it.
> > What exactly is this relationship and how is it stored? I can store the > > invoice#s within the vendor in the vendor table. [quoted text clipped - 3 lines] > has an attribute/field that is a **list** of invoice numbers? Or is it a list > of invoices? A list of invoice numbers, in my example, is correct. It may look like:
Field#  Contents.....
005     1272]7214-2]B715Z]1714A16
The order doesn't matter in this example. So the record set not only contains data but arrays/lists/collections/whatever. You get the point. The dbms tools work with this. I can then reference data in the related table as though the data were local to the referenced table (e.g. list vendor inv# invdate invdesc invamount...) and this will list the vendors with their associated invoice data extracted from the related table.
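A rough Python sketch of the idea, not of any MV product's actual behavior. It assumes the "]" shown in the listing is the separator between values, and the invoice file contents, dates, amounts, and function names are all invented for illustration. It splits field 005 into its values and then resolves each one against a related invoice file, which is roughly what the vendor listing with invoice columns is described as doing.

```python
VALUE_MARK = "]"  # delimiter as displayed in the post; an assumption about the raw form

# Hypothetical records: field 005 of one vendor, and an invoice file keyed by invoice#.
vendor = {"id": "V1", "005": "1272]7214-2]B715Z]1714A16"}
invoices = {
    "1272": {"invdate": "2004-06-01", "invamount": 150},
    "7214-2": {"invdate": "2004-06-15", "invamount": 75},
    "B715Z": {"invdate": "2004-07-01", "invamount": 20},
    "1714A16": {"invdate": "2004-07-09", "invamount": 310},
}

def multivalues(record, field):
    """Split a multivalued field into its individual values."""
    raw = record.get(field, "")
    return raw.split(VALUE_MARK) if raw else []

def virtual_field(record, list_field, related_file, related_field):
    """Look up each stored key in the related file and return the requested field."""
    return [related_file[key][related_field] for key in multivalues(record, list_field)]
```

For example, `virtual_field(vendor, "005", invoices, "invamount")` walks the four invoice numbers and returns their amounts in stored order.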
> > This defines a relationship in the MV model (although there are > > a number of other ways to do so). > > As an aside: can you enumerate the different ways? An index can define a relationship, so can a constraint and a function. I can also create custom rule relationships that operate off a trigger.
> > How is this relationship going to be exposed? An example would be > > to create a virtual field definition in the vendor table so that when asked, [quoted text clipped - 3 lines] > The term "virtual" here; what does it mean? Is there an online reference > you like that I could use to read about this? I alluded to it above. It is a field definition that doesn't reference data in the referenced table but references data in a related table. I gave an example of the vendor table list that actually returns data from the related invoice table.
> > The relationship is stored in the relational database but not really like it > > is stored in the MV database. This is true because the MV model treats [quoted text clipped - 9 lines] > constraints, stored procedures) the MV model additionally stores > application code, compiled code, MV tools, etc. Is that right? Yes. When I deliver an application the entire application is delivered in the database, including code, table structure, field definitions, etc, etc, etc. Users can access and run it without loading any application code onto their workstations.
> Again, it's something I'd like to try out. Can you recommend a free > solution; the mysql of MV? Try the following:
http://www-306.ibm.com/software/data/u2/universe/
http://www.jbase.com/products/jbase_download.html
http://www.revelation.com/SOFTWARE.NSF/06fb58066b4ed717852564030070163e?OpenView
There are some others but this will get anyone started. Remember, the product is both a dbms and an application environment. Don't expect it to just be a dbms where one starts the service and accesses it via SQL (although one can). I personally use the IBM product and another one but you can't get the other one for free, so I won't send you to their web site.
:-)
> > What makes this different is only that these tables are usually part of the > > database structure of the production data. So, you'd have tables for [quoted text clipped - 4 lines] > > This separation you describe is not much of a separation. Remember this is both a dbms and an application server product. It has more capabilities and additional tools.
> What if you have two comparatively unrelated applications that work > against the same schema; both applications are in the dbms; one > application enforces a constraint and one doesn't (for whatever reason: > a bug, or the programmer just forgot about it.) Wouldn't that be a > pathway for data corruption to enter the system? Under this scenario, one would have to design a non-RD model dbms application like any other good application where the APIs are designed to return and/or do stuff via calls, where the user interface is separate from the business rules. However, there's just no solution for multiple applications sharing data in a dbms and having different business rules (constraints/relationships/etc) that affect the data. Which is controlling? Can application A build the business rule APIs and application B use them?
I'd say this is an ideal place for the RD model dbms, as many people who don't know the business well can do the development and queries.
> > Ah, excuse me? One reuses check#s in different accounts all the time. One > > reuses invoice#s for different vendors all the time too. To include the [quoted text clipped - 8 lines] > (unless they are just different views on the same data. Are they? > Or are they stored separately, and able to become out of sync.) My last sentence only meant to describe my defined relation between the vendor and the invoice and the invoice and the vendor; nothing more. :-)
If the invoice is related to the vendor then the invoice has to have access to the vendor, somehow, somewhere, plain and simple. If not, I could never get a list of invoices with the vendor# too. The same is true with the vendor. I described how to do this, explicitly. There's no need to talk about data corruption, considering the synchronization tools are available and, hopefully, are in place.
These are the kinds of issues, however, we have to face as DBAs and developers, no matter what we're developing with/in. True?
> > Redundancy? Storing > > the transaction#s in the account saves having to store the "transaction to [quoted text clipped - 3 lines] > Uh, no. I mean, it's less redundancy than storing it three times, but it's > more than just storing it once. I'm more inclined to design for errors with a "little" redundancy. :-) (although it isn't absolutely necessary).
> If you don't store the same information more than once, then > the entire concept of "synchronization code" (first time I've > heard the term) is unnecessary. If the RD model does a cascading delete of a vendor and its associated invoices, it _has_ to know the invoices associated with the vendor too! I wrote the code once (15 years ago) and use it all the time in this environment. That's because the mvDbms environment is more than a dbms and considers the application side of its nature.
Some things have to be written because it is "assumed" better to do so in the application (hey, what can I say). We've developed applications where bad things happen to the hardware and, thus, the data. In accounting data some redundancy is an absolute requirement, if for no other reason than to know of problems and be able to rebuild.
Is this assumption good? I don't know. Probably mostly yes in some development environments and mostly no in other development environments.
> Hmmm. Data management is something that is very useful to > business, but it is not business-oriented in and of itself. > Same with adding up columns of numbers. > > You mean because they don't understand SQL? I don't get why we're > talking about business people here; we're discussing data management. This is one view. Mine is different because I have to pay bills and employees. If there's not enough cash I don't get paid. So, for me, it _is_ all about business. :-)
> > I can set a trigger to enforce integrity within the bank account table so if > > a bank transaction is cleared, the uncleared reference to it within the bank [quoted text clipped - 4 lines] > Do you have to set these up manually every place a bank transaction clear > is invoked, or do you just do it once? No. It's done once in the appropriate table definition, or perhaps the field definition.
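As a point of comparison, the "define it once on the table" idea can be sketched on the SQL side with Python's built-in sqlite3 module. The table names, columns, and trigger name are all hypothetical, and the separate uncleared_ref table is only a relational stand-in for the list of uncleared transaction#s described as living in the account record; this is not the MV definition being discussed.

```python
import sqlite3

def make_db():
    """Build a hypothetical schema where clearing a transaction drops its uncleared reference."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE txn (txn_id TEXT PRIMARY KEY, account_id TEXT, cleared INTEGER DEFAULT 0);
        CREATE TABLE uncleared_ref (account_id TEXT, txn_id TEXT);
        -- Defined once on the table; fires whichever code path clears a transaction.
        CREATE TRIGGER drop_uncleared_ref
        AFTER UPDATE OF cleared ON txn
        WHEN NEW.cleared = 1
        BEGIN
            DELETE FROM uncleared_ref WHERE txn_id = NEW.txn_id;
        END;
        INSERT INTO txn (txn_id, account_id) VALUES ('T1', 'ACC1'), ('T2', 'ACC1');
        INSERT INTO uncleared_ref VALUES ('ACC1', 'T1'), ('ACC1', 'T2');
    """)
    return db

def clear_txn(db, txn_id):
    """Mark a transaction as cleared; the trigger handles the reference."""
    db.execute("UPDATE txn SET cleared = 1 WHERE txn_id = ?", (txn_id,))

def uncleared(db, account_id):
    """List the transaction ids still recorded as uncleared for an account."""
    rows = db.execute(
        "SELECT txn_id FROM uncleared_ref WHERE account_id = ? ORDER BY txn_id",
        (account_id,),
    )
    return [r[0] for r in rows]
```

The caller of `clear_txn` never mentions the uncleared reference; removing it is the trigger's job, set up a single time in the schema.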
> > I cannot emphasize this enough; the MV model is located centrally! The > > application server and dbms server reside within the same environment, on [quoted text clipped - 4 lines] > > Where can I read more about this intriguing concept? You can read a lot of information about it at:
http://www-306.ibm.com/software/data/u2/
and you can google to newsgroups and read comp.databases.pick. It's been around forever and has a lot of fun stuff. And they're mostly polite, except for the occasional individual who forgot to take his lithium. :-)
> > And here I was thinking I was answering your queries directly, albeit in a > > slightly different perspective. Perhaps my writing skills, and clarity of [quoted text clipped - 4 lines] > grew. You are a gentleman, sir, and the world and this newsgroup needs > more gentlemen. (And ladies, of course.) Thank you for your kind words and I look forward to passing them on to the next person on this list. :-)
Bill
Anthony W. Youngman - 13 Jul 2004 00:22 GMT >The distinction you're drawing is that in addition to all the stuff that >both model store, (names, addresses, relationships, metadata, functions, [quoted text clipped - 3 lines] >Again, it's something I'd like to try out. Can you recommend a free >solution; the mysql of MV? See my sig :-) Note however, it's open source and somewhere between alpha and beta status ... (plus I wouldn't call it "pure MV" - more like an "MV BASIC compiler over a relational database" almost :-( so it's likely to be very misleading as an intro into the MV model.
The other thing is to download one of the commercial variants. There are three at least that are "free for non-commercial use" namely Jbase (www.jbase.com), and UniVerse and UniData (both now owned by IBM). I'm not sure of the urls for those two - there's no point searching the IBM site because it's a needle in a haystack, but if you go to www.u2ug.org and look at the FAQs, it'll almost certainly be there.
Cheers, Wol
 Signature Anthony W. Youngman <pixie@thewolery.demon.co.uk> 'Yings, yow graley yin! Suz ae rikt dheu,' said the blue man, taking the thimble. 'What *is* he?' said Magrat. 'They're gnomes,' said Nanny. The man lowered the thimble. 'Pictsies!' Carpe Jugulum, Terry Pratchett 1998 Visit the MaVerick web-site - <http://www.maverick-dbms.org> Open Source Pick
Marshall Spight - 13 Jul 2004 03:12 GMT > The other thing is to download one of the commercial variants. There are > three at least that are "free for non-commercial use" namely Jbase > (www.jbase.com), and UniVerse and UniData (both now owned by IBM). I've downloaded UniVerse this weekend, but haven't gotten too far with it.
What's the difference between UniVerse and UniData?
Marshall
Bill H - 14 Jul 2004 13:53 GMT Marshall:
> I've downloaded UniVerse this weekend, but haven't gotten too far > with it. > > What's the difference between UniVerse and UniData? Universe merged several different mvDbms products and is now able to be used by code from several different heritages (within the mvDbms market that is). Unidata is a cleaner version and never really worried about trying to be all things to all people (within the mvDbms market that is). Anybody moving to Unidata had to convert their applications and table structures to Unidata's while this wasn't really necessary with Universe.
So, despite the basic similarities of the environment, the underlying dbms code (I mean the actual code written to run the dbms) is significantly different.
It has often been said if you're converting from one mvDbms to another move to Universe but if you're designing from scratch use Unidata. :-)
Hope this helps.
Bill
Marshall Spight - 14 Jul 2004 18:28 GMT > > What's the difference between UniVerse and UniData? > > It has often been said if you're converting from one mvDbms to another move > to Universe but if you designing from scratch use Unidata. :-) > > Hope this helps. That's exactly what I needed to know. Thanks!
Marshall
Mike Preece - 19 Jul 2004 05:14 GMT > > > What's the difference between UniVerse and UniData? > > [quoted text clipped - 6 lines] > > Marshall So how's it going Marshall? Are you getting to grips with UniVerse? Did you decide to go with UniData instead?
Mike.
Marshall Spight - 19 Jul 2004 05:31 GMT > > > It has often been said if you're converting from one mvDbms to another move > > > to Universe but if you designing from scratch use Unidata. :-) [quoted text clipped - 5 lines] > So how's it going Marshall? Are you getting to grips with UniVerse? > Did you decide to go with UniData instead? Uh, I de-installed UniVerse and installed UniData. I haven't done much with it yet, but I've browsed some docs. I think the thing I really care about is learning the query language. I'm particularly interested in learning the part about how they deal with MV attributes. (No surprise, I suppose; that's the part that's different.)
I was a bit daunted by the pages about what one has to do to get the sample database installed; it was longer than I would have hoped. Still, I expect I'll slog through it all.
Marshall
Anthony W. Youngman - 10 Jul 2004 00:00 GMT >> The models we use create "stupid" tricks. It's the models that create the >> constraints to make some tricks stupid and others smart. An RM "stupid" [quoted text clipped - 5 lines] >disadvantages to managing integrity in applications instead of centrally. >This is independent of MV vs. RM vs. whatever. Well, yes ...
But my experience is that that begs the question. Do you want your data to be "consistent" OR "accurate"? Constraints enforce consistency, but what do you do when real-life decides that IT is going to be INconsistent?
With flexibility comes power. MV solutions are more flexible. And with that flexibility comes the ability to shoot yourself in the foot.
>> > Please be specific. I am very interested in specific examples of specific >> > operations or structures that you feel are hard to solve with RM or SQL [quoted text clipped - 15 lines] >the account table? That kind of redundancy leads directly >to data corruption. What redundancy? MV does not normally contain redundant data. Unless you mean storing a foreign key is "redundancy", and isn't that what relational databases do all the time?
>> Now, what good is this? As Mr Youngman points out it only takes one disk >> read to get the account and its relationships to the uncleared transactions. >> This is an almost instantaneous response to our web clients. So there's an >> upside. > >Ugh. Let's please not talk about disk reads. So you're quite happy to give your clients a system that, running on a Cray, still makes an old Z80-based system look like a speed demon?
The whole reason we hammer on about disk accesses is because we KNOW we can't be beaten. And the whole reason relational people like you don't like it is because you can't compete. Isn't that always the rule of competition - try to make the other guy's advantage look like a disadvantage?
At the end of the day, by avoiding things like disk reads, you are saying "performance is irrelevant". Taken to its extreme, that means that you would be quite happy delivering a system that guaranteed to eg crack a 4096-bit RSA key. The fact that it wouldn't finish its first run before the heat-death of the solar system isn't your problem ...
Cheers, Wol
Marshall Spight - 10 Jul 2004 03:14 GMT > But my experience is that that begs the question. Do you want your data > to be "consistent" OR "accurate"? Constraints enforce consistency, but > what do you do when real-life decides that IT is going to be > INconsistent? Can you give me an example? Also note, DBMSs are for managing data, not for managing real life.
> With flexibility comes power. MV solutions are more flexible. Easier to use, maybe. But less flexible, from what I can tell.
> >> Now, what good is this? As Mr Youngman points out it only takes one disk > >> read to get the account and its relationships to the uncleared transactions. [quoted text clipped - 8 lines] > The whole reason we hammer on about disk accesses is because we > KNOW we can't be beaten. I don't wish to condescend, but when you talk about performance, I get the impression that it's not something you know very much about, and that you have an extremely simplified view of how it works. In any event, the topic is *extremely* complicated, to the point that counting disk reads is a useless endeavour.
Sigh. All right.
Is there a canonical MV application that I can get easily and try out, so as to evaluate your performance claims? If someone were to ask me the same about a SQL dbms, I'd say "mysql." Is there a mysql of MV?
I'd like to compare some complicated query performance on mysql vs. X-MV. Not that complicated query performance is a mysql strong point.
> At the end of the day, by avoiding things like disk reads, you are > saying "performance is irrelevant". Performance is very relevant. I know what it is that I'm saying, and it's not what you're saying I'm saying.
> Taken to its extreme, that means > that you would be quite happy ... You're trying to put words in my mouth. Don't do that.
Marshall
Anthony W. Youngman - 09 Jul 2004 23:25 GMT >> I'm not sure whether this answers your question as it depends on what you >> mean by "relationship" but here is another type of relationship -- each [quoted text clipped - 24 lines] >The difference between a virtual field and a non-virtual field is >one of implementation; the interface is the same either way. (Yes? No?) Not quite. Yes the interface is the same, but your first example would have a PEOPLE file with an ADDRESS datafield.
The second example would have a PEOPLE file with an ADDRESS-KEY datafield and an ADDRESS virtual field. From the point of the person using the query language, they would neither know nor care that the two ADDRESS fields are fundamentally different "under the bonnet".
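To make the stored-versus-virtual distinction concrete, here is a minimal Python sketch. It is not MV dictionary syntax; the dictionaries, the `VIRTUAL_FIELDS` table and the `get_field` helper are all hypothetical names invented for illustration. It only shows that a caller asking for ADDRESS gets the same answer whether the value is stored in the record or derived through ADDRESS-KEY.

```python
# Hypothetical in-memory "files"; illustrative only, not real MV syntax.
ADDRESSES = {"A1": "1 High Street", "A2": "2 Low Road"}

# Variant 1: ADDRESS is stored directly in the PEOPLE record.
PEOPLE_STORED = {"P1": {"NAME": "Alice", "ADDRESS": "1 High Street"}}

# Variant 2: PEOPLE stores only ADDRESS-KEY; ADDRESS is derived on read.
PEOPLE_KEYED = {"P1": {"NAME": "Alice", "ADDRESS-KEY": "A1"}}

# A virtual field is modelled here as a function of the record.
VIRTUAL_FIELDS = {"ADDRESS": lambda rec: ADDRESSES[rec["ADDRESS-KEY"]]}


def get_field(record, name, virtual=None):
    """Return a field value, whether it is stored or virtual."""
    if name in record:
        return record[name]
    if virtual and name in virtual:
        return virtual[name](record)
    raise KeyError(name)


stored = get_field(PEOPLE_STORED["P1"], "ADDRESS")
derived = get_field(PEOPLE_KEYED["P1"], "ADDRESS", VIRTUAL_FIELDS)
```

Both calls return the same string, which is the sense in which the interface is the same while the implementation differs.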
>> So, once that virtual field is defined, I can ask the database to >> [quoted text clipped - 7 lines] ><field range>. Are you limited having a single field that is marked >unique? Integrity-wise, the only uniqueness that the database itself enforces is the primary key. Yes, this could be improved on ...
>> > Another example is managing data integrity in procedural application >> > code. In RM this is considered a "stupid database trick" to quote from [quoted text clipped - 17 lines] > >I didn't quite follow this. Putting constraints in the app not the database leads to smaller development teams.
>> Something is decidedly less expensive in terms of time for >> maintaining and having the constraints in the same language as the rest of >> the application just might be one of the keys to that. > >I'd buy that in a second. But I still want my constraints enforced (at least) >centrally. So would I :-) But I want my constraints *optional*.
>> > Please be specific. I am very interested in specific examples of specific >> > operations or structures that you feel are hard to solve with RM or SQL [quoted text clipped - 11 lines] >So if you have use-cases for situations where you feel MV is better than >the relational approach, I'm happy to hear them. Well, you saw my example about the Australian breweries? Where one brewery stole a march on the rest and hammered the lot in the market place - apart from the one MV-based brewery that responded quickly enough to ride up with them?
>> I am very confident that I can find aspects of the >> relational model that are not based on either mathematics or science (we've [quoted text clipped - 4 lines] > >Bring on the anecdotes! The Witwatersrand study that said MV-based companies spent *half* the money that relational-based companies did on their databases.
The experience of MV practitioners involved in conversions from MV to relational - they *ALL* say that any company escaping with *just* a *doubling* in head count (plus the same in licence fees) has got off very lightly cost-wise.
The story I like, where consultants spent SIX MONTHS tuning a complex query so's it ran faster than the MV system it was replacing - and when they crowed to management that the new system was 10% faster than the old system they were brought down to earth with a big bang as the guy supporting the MV system pointed out that was running on an ancient P90 - the new system was a twin Xeon-800 box and surely it should be able to do better than just 10%? (Oh - and I'm prepared to bet dollars to cents that the MV query wasn't optimised AT ALL.)
>> I'm in search of better science on the >> matter and a mathematical model that is as useful to the practitioner as the [quoted text clipped - 4 lines] >Give me ten million dollars and 5 years and it should be no problem. >Since I have neither, I'm willing to forego the whole proof thing. Well, the first thing you'd have to do is find some way of showing that "data == tuple". It's all very well the relational model *asserting* that it is, but unless you've got some real-world conjecture that links the two, you're going to get nowhere.
Science has recently been surprised by the apparent existence of 5-quark baryons. I think investigating the relationship between "real data" and "relational tuples" (in other words, trying to formalise business analysis) might provide a few (to say the least) surprises ...
>> It seems to me that a class of >> databases that advance the "older" approaches of Cache' and PICK could beat [quoted text clipped - 6 lines] >extreme view of scalability which has been skewed by my workplace. >However I can believe the agile part. Anecdotally ... but there are apparently some pretty huge MV databases out there, and they haven't hit problems yet. At least, not ones attributable to the database - maybe the hardware isn't powerful enough, but relational would have hit the same problems a lot harder AND sooner.
Or redundancy, hardware scalability, what have you but all things that are external to the database.
>> It seems the best I can do is prove >> that the relational model is not purely mathematics, but contains some [quoted text clipped - 10 lines] >and if it useful, then we rejoice. It's certainly possible for a formalism >to be completely sound and self-consistent and utterly useless. Exactly. So you see why we object when people say "relational MUST be right because it's based on mathematics". It's formal, sound, self-consistent, and ... :-)
>Marshall Cheers, Wol
Marshall Spight - 10 Jul 2004 03:43 GMT > >Let me see if I understand this. You have a "file" of People and it might > >have, directly in it, a field that is a list of addresses, so we have one:many [quoted text clipped - 15 lines] > using the query language, they would neither know nor care that the two > ADDRESS fields are fundamentally different "under the bonnet". In other words, the implementation is different ("under the bonnet") but the interface is the same.
Here's a question: let's imagine a giant nested data structure. You have records nested ten levels deep, and the data in each level is ten times as much as the level it's contained in.
Let's say you want to query only the top level, and not pull in all that extra stuff. Can you do that? What does the query look like? What if you want only 3 levels deep?
Is there a reference for the MV query language I could read somewhere?
> >Yes, we've discussed this before, and I believe we agree that it's important > >that constraints be available to applications. [quoted text clipped - 8 lines] > Putting constraints in the app not the database leads to smaller > development teams. I don't believe it. I'd believe that it only *works* with smaller development teams.
> >> Something is decidedly less expensive in terms of time for > >> maintaining and having the constraints in the same language as the rest of [quoted text clipped - 4 lines] > > > So would I :-) But I want my constraints *optional*. An optional constraint is a contradiction in terms.
What is it that makes you want it optional? What's an example of a rule you want enforced sometimes but not other times?
> >I'm not asking for proof. I know you care a lot about proof, but I don't > >so much. Right now I'm more interested in hearing a lot of people's stories. [quoted text clipped - 5 lines] > place - apart from the one MV-based brewery that responded quickly > enough to ride up with them? That's not a use-case, but it *is* a great success story.
I'm interested in lower-level, specific details. The nitty-gritty. Like, here's table 1 and here's table 2 and I wanted to figure out x, so I typed xxx and it was really easy and the comparable SQL is xxxxxxxxxx which is really hard.
> >Bring on the anecdotes! > [quoted text clipped - 6 lines] > do better than just 10%? (Oh - and I'm prepared to bet dollars to cents > that the MV query wasn't optimised AT ALL.) Again, a nice story, but without the *specific* query and the specific tables, I don't learn much from it.
I guess a key thing I'm looking for is results that I can reproduce at home.
> Science has recently been surprised by the apparent existence of 5-quark > baryons. I think investigating the relationship between "real data" and > "relational tuples" (in other words, trying to formalise business > analysis) might provide a few (to say the least) surprises ... Can you propose a methodology? My best guess is that the question you are posing is meaningless.
> >I have serious doubts about the scalability claim, but then I have an > >extreme view of scalability which has been skewed by my workplace. [quoted text clipped - 4 lines] > attributable to the database - maybe the hardware isn't powerful enough, > but relational would have hit the same problems a lot harder AND sooner. How huge? My job involves working on a dataset that is measured in terabytes.
> Or redundancy, hardware scalability, what have you but all things that > are external to the database. Agreed.
> >And there's not just one math, either. You come up with a formalism, > >and if it useful, then we rejoice. It's certainly possible for a formalism [quoted text clipped - 3 lines] > right because it's based on mathematics". It's formal, sound, > self-consistent, and ... :-) Mmmm. That might be a fair criticism of this newsgroup as a whole, but I don't know if it would stick to me that well. I don't recall saying "relational MUST be right" at any point. I'm more along the lines of "I like how well relational handles many-to-many relationships."
Marshall
Dawn M. Wolthuis - 30 Jun 2004 02:19 GMT > > > Perhaps I misunderstand, but MV has only the one kind of > > > relationship it is capable of understanding: containment. I should have read from the top of the topic down, but I now understand what you mean. As far as the database itself, without any triggers written, nor any application code, the only relationship "between relations" that it understands is that of parent-child. --dawn
Marshall Spight - 30 Jun 2004 16:05 GMT > > > > Perhaps I misunderstand, but MV has only the one kind of > > > > relationship it is capable of understanding: containment. [quoted text clipped - 3 lines] > any application code, the only relationship "between relations" that it > understands is that of parent-child. --dawn So what do you do in the face of many:many relationships? I bet it's the same thing that OO does: you have links on one side and links on the other, and manage them in code.
Many to many relationships are one thing that the RM just totally nails. I bring this up not to run a whole "mine's bigger" thing but because I believe that if this entire years-long conversation has a use, it is to highlight the areas where each side succeeds, so that we may begin to work towards a new model that encompasses the best of several existing systems.
In programming languages, they are talking more and more about "multiparadigm." I think we should follow their lead.
Marshall
Dawn M. Wolthuis - 30 Jun 2004 17:50 GMT > > > > > Perhaps I misunderstand, but MV has only the one kind of > > > > > relationship it is capable of understanding: containment. [quoted text clipped - 7 lines] > it's the same thing that OO does: you have links on one side and > links on the other, and manage them in code. yup
> Many to many relationships are one thing that the RM just totally > nails. I bring this up not to run a whole "mine's bigger" thing but > because I believe that if this entire years-long conversation has > a use, it is to highlight the areas where each side succeeds, so > that we may begin to work towards a new model that encompasses > the best of several existing systems. Sounds good.
RM does do well with M:M, the most conceptually difficult case for the user, but it does nothing to simplify the presentation or ease of use for the user. Viewing books and their authors from one perspective and then authors and their books from another makes sense to a person. Viewing it as a many-to-many is not as helpful (each book-author pair sits on a separate line, so you have neither one row for each book nor one row for each author). RM also has difficulty with multiple 1:M with the same 1 when there is a need for counting, summation, or other arithmetic and visuals/reporting against the same. STAR joins have helped a bit with that, I think.
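As a rough illustration of the two perspectives, the Python sketch below starts from book-author pairs, one pair per row, and regroups them by either side. The sample titles and names, and the `group` helper, are hypothetical and chosen only for the example.

```python
from collections import defaultdict

# Hypothetical book-author pairs, one pair per row (the many-to-many view).
PAIRS = [
    ("Book A", "Author X"),
    ("Book A", "Author Y"),
    ("Book B", "Author X"),
]


def group(pairs, key_index):
    """Group the pairs by one side, collecting the other side as a list."""
    grouped = defaultdict(list)
    for pair in pairs:
        grouped[pair[key_index]].append(pair[1 - key_index])
    return dict(grouped)


authors_by_book = group(PAIRS, 0)  # one entry per book
books_by_author = group(PAIRS, 1)  # one entry per author
```

The same three pairs yield one row per book in the first view and one row per author in the second, which is the presentation a person usually wants to read.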
> In programming languages, they are talking more and more about > "multiparadigm." I think we should follow their lead. agreed. --dawn
Marshall Spight - 01 Jul 2004 05:30 GMT > RM > also has difficulty with multiple 1:M with the same 1 when there is a need > for counting, summation, or other arithmetic and visuals/reporting against > the same. STAR joins have helped a bit with that, I think. Could you expand on that a bit? I didn't quite follow.
Marshall
Laconic2 - 30 Jun 2004 19:13 GMT > Many to many relationships are one thing that the RM just totally > nails. I bring this up not to run a whole "mine's bigger" thing but [quoted text clipped - 5 lines] > In programming languages, they are talking more and more about > "multiparadigm." I think we should follow their lead. Hear, Hear!
Eric Kaun - 09 Jul 2004 03:51 GMT >>Many to many relationships are one thing that the RM just totally >>nails. I bring this up not to run a whole "mine's bigger" thing but [quoted text clipped - 7 lines] > > Hear, Hear! In support of this effort, I've taken all of my dimes and created a set (or is it a list?) of stacks, 2 in each stack. Thus ends my support of multi pair-o-dimes.
The problem, of course, is deciding where those paradigms apply... but I certainly support the desire to merge them somehow. The Xen effort, funded by Microsoft Research (I forget the researchers and am too lazy to look them up) looks somewhat promising, though from what I've seen it still completely lacks any declarative constraints.
- erk
Anthony W. Youngman - 05 Jul 2004 23:57 GMT >For example, in another thread someone said (over and over >if I remember correctly :-) that if you delete an invoice, all [quoted text clipped - 5 lines] >throw them away.) Can you do this declaratively in MV? How >is it done? You mean a bit like you can't delete a company if there are any outstanding invoices? No that can't (or rather, shouldn't) be done natively and declaratively in MV. But I wouldn't call that a "container and contents".
>Can it handle many-to-many? I've heard some people say it >can, but is integrity enforced automatically, or is it just done >with references that are application managed? Let's give an example - an owner can have multiple cars, and a car can have multiple owners.
What I'd do is have an OWNERS field in the CARS file, and declare it as an index. So if I want to know who owns a car, I just list the car and pull the owners into the listing. If I want to know what cars someone owns, I list all cars owned by that person.
Actually, thinking about it, this seems like a perfect case of "ON DELETE RESTRICT" - don't delete an owner if any cars only have that owner. But MV would leave that to the app (I'd rather be able to enforce it, but it doesn't seem to be a problem in real life ... :-)
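Leaving that rule to the app might look something like the following Python sketch. It assumes a hypothetical in-memory CARS file with a multivalued OWNERS field; `delete_owner` is an invented helper, not an MV or SQL API. It refuses to delete an owner who is the sole owner of any car, which is the effect "ON DELETE RESTRICT" would give declaratively.

```python
# Hypothetical CARS file: each record holds a multivalued OWNERS field.
CARS = {
    "CAR1": {"OWNERS": ["ANN", "BOB"]},
    "CAR2": {"OWNERS": ["BOB"]},
}
OWNERS = {"ANN": {}, "BOB": {}}


def delete_owner(owner_id):
    """Delete an owner unless some car would be left with no owner."""
    blocked = [car for car, rec in CARS.items() if rec["OWNERS"] == [owner_id]]
    if blocked:
        raise ValueError(f"{owner_id} is the sole owner of {blocked}")
    # Safe to delete: drop the owner from every car that lists them.
    for rec in CARS.values():
        if owner_id in rec["OWNERS"]:
            rec["OWNERS"].remove(owner_id)
    del OWNERS[owner_id]
```

Every application that deletes owners has to remember to go through a check like this, which is the cost of not having the constraint enforced centrally.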
>Can it *automatically* enforce declared integrity constraints? >Can you have an integer attribute and declare that it must [quoted text clipped - 7 lines] >> Like I said before, this is not to say that the MV model does >> everything...as nothing can. And centrally enforced constraints are a lack in the MV model ... but design-enforced relations are its strength. Because it doesn't need 90% of the constraints that are necessary in relational, it hasn't bothered with the other constraints (which I agree is a pity).
>I don't think I agree. For example, Java, C++, and BASIC are >all able to compute anything that can be computed. They do [quoted text clipped - 15 lines] >basis for the tools. Are they complete? Are they correct? Are they >self-consistent? Are they even *RELEVANT*? Take any theory in PURE mathematics. It's complete, it's correct, it's self-consistent. And if it assumes that parallel lines in three dimensions can meet, it doesn't break. It just models a completely different world to the one we actually live in ...
>SQL is relationally complete, over its lame type system. It could >really use a better type system. This will make it more usable but [quoted text clipped - 7 lines] >and I are engaged in exactly the same exploration, although we >come at it from different backgrounds. I think I'd agree here ... MV is great at ease of use. It's great at enforcing entity-level integrity (can't have an adjective (or adjectival clause) without a noun for it to describe). It's not great at enforcing constraints *between* *entities*. But then, neither is the real world :-)
Relational fits theory fine. MV fits the real world fine.
>Marshall Cheers, Wol
Marshall Spight - 07 Jul 2004 02:47 GMT > >For example, in another thread someone said (over and over > >if I remember correctly :-) that if you delete an invoice, all [quoted text clipped - 10 lines] > natively and declaratively in MV. But I wouldn't call that a "container > and contents". Why do you say "shouldn't?" It seems pretty clear to me that a declarative approach is always better than a procedural one. (Dawn? Care to rebut?)
If not container/contents, what terminology would you use?
> Let's give an example - an owner can have multiple cars, and a car can > have multiple owners. [quoted text clipped - 3 lines] > pull the owners into the listing. If I want to know what cars someone > owns, I list all cars owned by that person. What does "declare it an index" mean? Is it like a pointer or foreign key?
> >> But [MV] provides an interesting confluence of > >> tools and capabilities that render the model very useful in solving business [quoted text clipped - 11 lines] > parallel lines in three dimensions can meet, it doesn't break. It just > models a completely different world to the one we actually live in ... I'm sorry, you say this why? Because you have traced some lines from one end of the universe to the other and checked that they don't meet? Actually, even the very idea of the "world we live in" having lines in it doesn't work for me. Walking around my house, I never saw an infinite sequence of colinear points.
> Relational fits theory fine. MV fits the real world fine. That statement just seems totally bogus to me. Does subtraction fit the real world? What happens when I subtract 5 lemons from 3 lemons? Do I get -2 lemons? Can you send me a picture of -2 lemons via email; I want to see what they look like.
Marshall
Anthony W. Youngman - 10 Jul 2004 00:20 GMT >> >For example, in another thread someone said (over and over >> >if I remember correctly :-) that if you delete an invoice, all [quoted text clipped - 14 lines] >declarative approach is always better than a procedural one. (Dawn? >Care to rebut?) What seems clear to me is not clear to you, and vice versa. Read Dick Feynman. Different brains are wired differently, and see the world differently. Just because it SEEMS to you that declarative is better than procedural it does not mean that that is the case.
>If not container/contents, what terminology would you use? To me, an invoice is a container, and line items are contents thereof. You can't have the latter without the former.
A company does NOT contain its invoices - a company can go bust but the invoices are still outstanding ... okay - we now get into all sorts of semantics such as "does a company entry in a database represent a real company, or just a fictional representation thereof?".
But I view those two relationships as being fundamentally different, and they are modelled completely differently in MV. I don't think relational can see any difference between them.
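One way to picture the difference is the Python sketch below. The record layout is an assumption made for illustration, not a description of any particular MV product: line items are nested inside the invoice record, while the company is only referenced by a key. Deleting the invoice necessarily takes its line items with it; deleting the company leaves the invoices where they are.

```python
# Hypothetical records. Line items live inside the invoice record,
# while the company is only referenced by key.
COMPANIES = {"C1": {"NAME": "Acme"}}
INVOICES = {
    "INV1": {
        "COMPANY-KEY": "C1",
        "LINES": [
            {"ITEM": "widget", "QTY": 2},
            {"ITEM": "gadget", "QTY": 1},
        ],
    }
}


def line_items():
    """Collect every line item reachable through an invoice."""
    return [line for inv in INVOICES.values() for line in inv["LINES"]]


before = len(line_items())
del COMPANIES["C1"]                 # reference: invoices are untouched
after_company_delete = len(line_items())
del INVOICES["INV1"]                # containment: line items go too
after_invoice_delete = len(line_items())
```

After the company is deleted, the invoice still carries a COMPANY-KEY that points at nothing, and in this sketch nothing stops that from happening.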
>> Let's give an example - an owner can have multiple cars, and a car can >> have multiple owners. [quoted text clipped - 5 lines] > >What does "declare it an index" mean? Is it like a pointer or foreign key? Surely you declare indices in relational dbs? Same thing here. So's I can say "SELECT CARS WITH OWNER EQ 'X'", and it doesn't need to search the entire CARS file, but just goes to the index and grabs a list of primary keys into the CARS file from the index.
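The index lookup described here can be sketched in Python as an inverted index from each owner to the primary keys of the cars that list them. The data, `build_index` and `select_cars_with_owner` are hypothetical names for illustration; a real MV engine maintains its indices internally.

```python
from collections import defaultdict

# Hypothetical CARS file with a multivalued OWNER field.
CARS = {
    "CAR1": {"OWNER": ["X", "Y"]},
    "CAR2": {"OWNER": ["Y"]},
    "CAR3": {"OWNER": ["X"]},
}


def build_index(file, field):
    """Map each value of a multivalued field to the keys that contain it."""
    index = defaultdict(list)
    for key, record in file.items():
        for value in record[field]:
            index[value].append(key)
    return index


OWNER_INDEX = build_index(CARS, "OWNER")


def select_cars_with_owner(owner):
    """Rough analogue of SELECT CARS WITH OWNER EQ 'X', using only the index."""
    return list(OWNER_INDEX.get(owner, []))
```

The query never scans CARS itself; it reads the list of primary keys held under the owner's entry in the index.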
>> >> But [MV] provides an interesting confluence of >> >> tools and capabilities that render the model very useful in [quoted text clipped - 18 lines] >lines in it doesn't work for me. Walking around my house, I never >saw an infinite sequence of colinear points. I'm just saying that being complete, correct and self-consistent isn't enough. All that proves is that it works as pure maths. But if it doesn't work as *applied* maths, then it's the wrong theory for the problem at hand.
>> Relational fits theory fine. MV fits the real world fine. > >That statement just seems totally bogus to me. Does subtraction >fit the real world? What happens when I subtract 5 lemons >from 3 lemons? Do I get -2 lemons? Can you send me a picture >of -2 lemons via email; I want to see what they look like. Relational is complete, correct, and self-consistent. It's fine as a pure-maths theory.
MV just seems to *fit* the real world rather better :-)
Cheers, Wol
Marshall Spight - 10 Jul 2004 03:25 GMT > >Why do you say "shouldn't?" It seems pretty clear to me that a > >declarative approach is always better than a procedural one. (Dawn? > >Care to rebut?) > > What seems clear to me is not clear to you, and vice versa. I notice you didn't answer my question.
> >If not container/contents, what terminology would you use? > > > To me, an invoice is a container, and line items are contents thereof. > You can't have the latter without the former. > > A company does NOT contain its invoices ... Okay. What does contain the invoices? Or are they a top-level concept? If so, how are they related to companies?
> - a company can go bust but the > invoices are still outstanding ... Their financial situation is irrelevant. Perhaps you are confusing the company and the record of the company in the dbms.
> okay - we now get into all sorts of > semantics such as "does a company entry in a database represent a real > company, or just a fictional representation thereof?". This is a simple question with a simple answer. The company entry in the database represents a real-world company. It is not an actual company, nor is it a representation of a representation of a company.
> But I view those two relationships as being fundamentally different, and > they are modelled completely differently in MV. I don't think relational > can see any difference between them. Okay, so *how* are they different?
> >What does "declare it an index" mean? Is it like a pointer or foreign key? > > > Surely you declare indices in relational dbs? Same thing here. MV terminology is quite foreign to me, so I do not assume that when an MV person uses a word I'm used to, they are using it in the same way. Note when I say foreign, I just mean that I'm not familiar with it; I don't have any opinion on the goodness or badness of the terminology. (Well, I might think "file" is an unfortunately-overloaded term.)
> So's I > can say "SELECT CARS WITH OWNER EQ 'X'", and it doesn't need to search > the entire CARS file, but just goes to the index and grabs a list of > primary keys into the CARS file from the index. Okay.
> >> Relational fits theory fine. MV fits the real world fine. > > [quoted text clipped - 7 lines] > > MV just seems to *fit* the real world rather better :-) The simplest explanation here is that it's what you're used to, and hence it seems to fit best for you. You haven't given any evidence that it actually does fit the real world any better. Note that I consider that question unanswerable and hence irrelevant.
Marshall
Anthony W. Youngman - 05 Jul 2004 22:43 GMT >Marshall: > [quoted text clipped - 12 lines] >datastore too. Most of the MV products can even understand and cope with >SQL functionality. Thanks Bill. Not knowing what ON DELETE RESTRICT means, I couldn't really respond ...
>Like I said before, this is not to say that the MV model does >everything...as nothing can. But it provides an interesting confluence of >tools and capabilities that render the model very useful in solving business >problems for many people and businesses. Well, it can't actually *do* ON DELETE CASCADE (in native mode anyway) - it's just that what it can do has the same effect :-) As Bill says, it's *just* *different*
Cheers, Wol
Eric Kaun - 09 Jul 2004 01:52 GMT > Yes, but relational formalises metadata INTO data. No formalization is needed; metadata is data. It's just data with a different domain, but there's no reason to think it obeys different laws or requires different structure.
> Once it's in an RDBMS > it's no longer metadata, because the rdbms doesn't understand any > meaning in it and can't take advantage of that meaning so it's just data. I'm confused. How does placing it in an RDBMS make it no longer metadata? The system catalog (metadata - data about your data) can be represented relationally (or as XML if you're feeling masochistic).
How does the RDBMS "understand" no meaning in it? And how do other DBMSs "understand" meaning? The constraints and relation definitions of the metadata are as much meaning as the RDBMS can have.
> The ordering in a list is metadata. Convert that into a set to put into > an rdbms and ORDER is now just a meaningless (as far as the db engine is > concerned) bit of data. No, in that case order is gone, vanished. If you don't state it, the RDBMS doesn't know about it. On the other hand, it doesn't assume anything either. Order is easily represented, and again if you're masochistic, you can store a list-typed attribute.
> That's where MV and OO fundamentally differ. They try to *avoid* > converting metadata to data, so that the db engine can be intelligent > and take advantage of it to optimise things. So by treating metadata as something other than data (what would that be?), they can be intelligent and optimize? Intelligent how? Optimize what?
- erk
Anthony W. Youngman - 10 Jul 2004 22:52 GMT >> Yes, but relational formalises metadata INTO data. > >No formalization is needed; metadata is data. It's just data with a >different domain, but there's no reason to think it obeys different >laws or requires different structure. Yes - but metadata can be used by the database while data can't.
>> Once it's in an RDBMS it's no longer metadata, because the rdbms >>doesn't understand any meaning in it and can't take advantage of that [quoted text clipped - 3 lines] >metadata? The system catalog (metadata - data about your data) can be >represented relationally (or as XML if you're feeling masochistic). Because you've converted it to data! And the system catalog doesn't let you store ALL metadata AS metadata. It will only let you store metadata it recognises.
>How does the RDBMS "understand" no meaning in it? And how do other >DBMSs "understand" meaning? The constraints and relation definitions of >the metadata are as much meaning as the RDBMS can have. In other words, an RDBMS is incomplete. :-)
>> The ordering in a list is metadata. Convert that into a set to put >>into an rdbms and ORDER is now just a meaningless (as far as the db [quoted text clipped - 4 lines] >anything either. Order is easily represented, and again if you're >masochistic, you can store a list-typed attribute. But if you DO state it, the RDBMS doesn't know anything about it, either! What do you mean by a "list-typed attribute"? Do you mean a column that contains ordering information?
>> That's where MV and OO fundamentally differ. They try to *avoid* >>converting metadata to data, so that the db engine can be intelligent [quoted text clipped - 3 lines] >be?), they can be intelligent and optimize? Intelligent how? Optimize >what? The whole point of a database is it STORES data, it does *not* UNDERSTAND data. By converting metadata into data, you are now forcing "intelligence" into the application.
A relational database thinks in terms of sets. In order to have a list, you need to create extra DATA, and the database itself can't take advantage of it, because it doesn't understand it.
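The "extra DATA" in question is typically an explicit position attribute. As a minimal sketch (the function names are hypothetical), a list can be encoded as an unordered set of (position, value) tuples and recovered by sorting on the position:

```python
def list_to_relation(items):
    """Encode a list as an unordered set of (position, value) tuples."""
    return {(pos, value) for pos, value in enumerate(items)}


def relation_to_list(relation):
    """Recover the list by sorting on the explicit position attribute."""
    return [value for _, value in sorted(relation)]


original = ["c", "a", "b"]
relation = list_to_relation(original)
restored = relation_to_list(relation)
```

The order survives the round trip, but only because the position was stored as ordinary data; the set itself carries no ordering.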
DATA is what is stored IN a database. METADATA is data that is USED BY the database. There *is* a difference, and the difference is crucial. The more metadata you can leave as metadata, rather than convert to data, the more information the database has available to it to optimise.
How does an RDBMS optimise access to a list, if it doesn't have any understanding of what a list is?
That's the point of storing metadata *as* *metadata*. Because the database understands it.
Cheers, Wol
Marshall Spight - 13 Jul 2004 03:16 GMT > The whole point of a database is it STORES data, it does *not* > UNDERSTAND data. The whole point of a database management system is to manage data. As a side effect, it might also store it in a persistent storage mechanism, but this is not a requirement. It has to be able to manage the appropriate structure, enforce integrity, and allow manipulation. ("Structure, integrity, manipulation.") It can do this because it understands the data.
Marshall
Eric Kaun - 19 Jul 2004 18:11 GMT >> No formalization is needed; metadata is data. It's just data with a >> different domain, but there's no reason to think it obeys different >> laws or requires different structure. > Yes - but metadata can be used by the database while data can't. Not quite true - if a DBMS supports a user-extensible typing system, then it can "use" those types without understanding anything about them. This is where SQL and so many other DBMSs completely fall down:
1. by requiring the user to rely solely on the types already provided by the vendor
2. (the case with SQL today) making the type system so baroque as to be useless
3. (also the case with SQL and JDBC/ODBC/etc. today) making the embedding of data in programs so difficult as to force a compromise back to the lowest-common-denominator primitives again.
Or some combination of the three.
>> I'm confused. How does placing it in an RDBMS make it no longer >> metadata? The system catalog (metadata - data about your data) can be [quoted text clipped - 3 lines] > you store ALL metadata AS metadata. It will only let you store metadata > it recognises. Of course. So are you saying that 1) lists should be commonly-understood "metadata", or 2) that Pick/MV let you extend the metadata recognized by the DBMS?
If you're saying #1, then I could argue as well for other types (and would say relation-valued attributes are far more powerful and useful than lists). If you're saying #2, then again the typing mechanism would help, though user-defined functions and views can aid somewhat.
What types of metadata does the DBMS need to recognize?
>> How does the RDBMS "understand" no meaning in it? And how do other >> DBMSs "understand" meaning? The constraints and relation definitions >> of the metadata are as much meaning as the RDBMS can have. > > In other words, an RDBMS is incomplete. :-) Heh.
>>> The ordering in a list is metadata. Convert that into a set to put >>> into an rdbms and ORDER is now just a meaningless (as far as the db [quoted text clipped - 8 lines] > either! What do you mean by a "list-typed attribute"? Do you mean a > column that contains ordering information? No, I meant a single attribute that stores a list, much like in MV. The difference is that it's not "first order" to the database; user-defined types are orthogonal to relations.
So what does a Pick DB "know" that the RDBMS wouldn't? And how do you tell it?
>>> That's where MV and OO fundamentally differ. They try to *avoid* >>> converting metadata to data, so that the db engine can be intelligent >>> and take advantage of it to optimise things. I've read this several times, and still don't know what you mean. How does OO avoid converting metadata to data? I'd say you're wrong; in Java you can use classes like Class, Constructor, Method, etc. to do "higher-order" operations, so the metadata is effectively converted to the same sorts of things you write your programs in (i.e. classes). The new JDK1.5 metadata will simply expand this; the metadata will still be accessible as "data".
Other languages do similar things (albeit in a much more elegant way than Java).
> The whole point of a database is it STORES data, it does *not* > UNDERSTAND data. By converting metadata into data, you are now forcing > "intelligence" into the application. No, you're forcing intelligence [sic] into the RDBMS. You're telling it what's allowed and what's not. What other meaning of "data definition" is there?
My ongoing gripe about declaration vs. procedure is based on descriptions of meaning. With procedural code, the meaning is implicit; if you're lucky, the code was written in a clear way, and you can see the meaning. With declarative, you don't guess (nor do you have to implement in an algorithmic sense). The language/engine/DBMS does the monkey work for you.
> A relational database thinks in terms of sets. In order to have a list, > you need to create extra DATA, and the database itself can't take > advantage of it, because it doesn't understand it. Right, it understands relations and values; the types of those values are something different. But what exactly does it matter? You seem to be implying that lists are so useful as to be first-class citizens to the DBMS, and I say they're not; I'd prefer sets, for one thing (and no, from that standpoint, RDBMSs don't "do" sets either). Or even bags. Or perhaps relations themselves. Lists are in so many cases poor substitutes for a real data structure - witness the presence of "they-gotta-be-correlated" attributes in Pick files (e.g. a QUANTITY list-valued attribute and a PRODUCT list-valued attribute to store line item data for an order - better not lose the order or an item in one, or you're hosed).
> DATA is what is stored IN a database. METADATA is data that is USED BY > the database. There *is* a difference, and the difference is crucial. > The more metadata you can leave as metadata, rather than convert to > data, the more information the database has available to it to optimise. That's ignoring what you mentioned earlier - the metadata that the DBMS can understand. Are you saying that the metadata needs to be left in so that later on, when the DBMS is extended in some way, it can now comprehend what previously meant nothing to it?
And again, the concept of metadata (at least in the discussion at hand) only has meaning in the context of datatypes. You seem to be saying that, because lists are Very Important Things, the DBMS must "understand" them as metadata, in much the same way as it understands files and fields. I'm saying that's not needed, because you can define a List type which the RDBMS can manipulate like any other type you want to define, though if you want the benefit of relational manipulation (a good thing which would eliminate, for example, many many lines of code), you must express the data relationally.
> How does an RDBMS optimise access to a list, if it doesn't have any > understanding of what a list is? So it's an optimization question? In short, it wouldn't - no more than it would optimize access to an Order type I've defined (including line items). Then again, if it were a relation-valued attribute, it could optimize that with the same machinery with which it optimizes the rest of the relations.
But again, the main point here is what's important and what's not. Lists and their status as first-class DBMS citizens seem to be the point in question.
> That's the point of storing metadata *as* *metadata*. Because the > database understands it. It can only understand what it understands. What other types besides Lists need to "be" metadata?
- erk
Eric Kaun - 09 Jul 2004 01:53 GMT > Yes, but relational formalises metadata INTO data. No formalization is needed; metadata is data. It's just data with a different domain, but there's no reason to think it obeys different laws or requires different structure.
> Once it's in an RDBMS > it's no longer metadata, because the rdbms doesn't understand any > meaning in it and can't take advantage of that meaning so it's just data. I'm confused. How does placing it in an RDBMS make it no longer metadata? The system catalog (metadata - data about your data) can be represented relationally (or as XML if you're feeling masochistic).
How does the RDBMS "understand" no meaning in it? And how do other DBMSs "understand" meaning? The constraints and relation definitions of the metadata are as much meaning as the RDBMS can have.
> The ordering in a list is metadata. Convert that into a set to put into > an rdbms and ORDER is now just a meaningless (as far as the db engine is > concerned) bit of data. No, in that case order is gone, vanished. If you don't state it, the RDBMS doesn't know about it. On the other hand, it doesn't assume anything either. Order is easily represented, and again if you're masochistic, you can store a list-typed attribute.
> That's where MV and OO fundamentally differ. They try to *avoid* > converting metadata to data, so that the db engine can be intelligent > and take advantage of it to optimise things. So by treating metadata as something other than data (what would that be?), they can be intelligent and optimize? Intelligent how? Optimize what?
- erk
Bill H - 14 Jun 2004 20:12 GMT > "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message... > > [quoted text clipped - 18 lines] > > as being the controlling field with a relationship to field# 10 while > > field# 10 is dependent on field# 9. [snipped]
> While I see many examples like the above, can you give us an example of how > the dictionary defines those? Here's the field definition for the Accounts and Amounts:
accounts
001 A
002 9
003 ACCT
004 C;10
005
006
007
008
009 L
010 5
011
012
013
014
015
016
017 The G/L acct#s associated with this invoice (controls field# 10)

amounts
001 A
002 10
003 ACCT/AMTS
004 D;9
005
006
007 MR2,M
008
009 RN
010 13
011
012
013
014
015
016
017 The amounts associated with each G/L acct# in field# 9.
Field# 004 in the above definitions defines the controlling and dependent fields. The above structure may be different in different mvDbms products. Anyway, these definitions are data just like other data and reside in the database.
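To make the controlling/dependent idea concrete, here is a minimal Python sketch of how the C;10 and D;9 entries in attribute 004 could be used to pair up associated multivalues. It is only an illustration: the `dictionary` and `record` structures and the `associated_rows` helper are hypothetical stand-ins, not how any mvDbms actually stores or reads its dictionaries, and the record values are made up.

```python
# Hypothetical in-memory stand-ins for the two dictionary items quoted above;
# only the field number and the C;/D; entry from attribute 004 are modelled.
dictionary = {
    "ACCOUNTS": {"field": 9, "assoc": "C;10"},   # controlling field
    "AMOUNTS": {"field": 10, "assoc": "D;9"},    # dependent field
}

# A made-up record: field number -> list of values (the multivalues).
record = {
    9: ["5170", "3370"],
    10: ["-1012.61", "-1963.84"],
}


def associated_rows(dictionary, record, controlling_name):
    """Pair each controlling value with the values of its dependent fields."""
    spec = dictionary[controlling_name]
    kind, _, targets = spec["assoc"].partition(";")
    if kind != "C":
        raise ValueError(f"{controlling_name} is not a controlling field")
    # Assumption: several dependents would be listed as C;10;11 and so on.
    dependent_fields = [int(n) for n in targets.split(";")]
    columns = [record[spec["field"]]] + [record[f] for f in dependent_fields]
    return list(zip(*columns))


rows = associated_rows(dictionary, record, "ACCOUNTS")
```

Here `rows` pairs each account with the amount at the same position, which is the nested-table view the association describes. If the two lists ever get out of step, `zip` silently truncates, which is the kind of risk the association is meant to flag.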
> What language do you use to define the dictionary? Is it user-accessible? > > - erk As you can see, the definitions are just data. They describe the data, and the definitions themselves have a pre-defined structure (the dbms defines this structure). One builds a dictionary through various tools (line editor, screen editor, GUI editor, GUI dictionary editor, etc). The query language uses the field definitions so I could:
LIST APINVOICES ACCOUNTS AMOUNTS
and get the following output:
apopen....  ACCT.  ACCT/AMTS.... *
555*1011    5070            6.73
340*VR3-2   5170       1,012.61-
            3370       1,963.84-
            5170            0.00
            3370            0.00
9999*3907   5000          300.00
555*1018    5070           29.53
340*VR11-1  5170         999.22-
            3370       1,977.23-
            5170            0.00
            3370            0.00
So the data is accessible by users or developers and the field definitions can be accessed in the same way (since they're just data too) using the query language:
LIST DICT APOPEN 'ACCOUNTS' 'AMOUNTS' D/CODE A/AMC S/NAME V/STRUC V/TYP V/MAX
DICT APOPEN  code  A/AMC  S/NAME..............  c/d struc.  TP  MAX

ACCTS        A     9      ACCT                  C;10        L   5
AMTS         A     10     ACCT/AMTS             D;9         RN  13
[405] 2 items listed out of 2 items.
Hope this helps.
Bill
Anthony W. Youngman - 18 Jun 2004 18:16 GMT >This is, more than anything, the philosophical divide between relational and >Pick folks. The more rules, the more they should be kept OUT of the >application code. "Application" means just that: a judicious application. Of >what? Rules. Application != definition, just as implementation != >specification. Actually, as a Pickie, I'm very much inclined to agree. Rules should sit BETWEEN the application and the data store. So no, I don't quite agree with the relational approach, but I think the Pick approach is lacking here.
>> This is because the business person is much closer to the application >> and database, and its tools. [quoted text clipped - 6 lines] >Granted that some rules should be configurable; that doesn't imply that all >should be. The business, after all, has (or needs!) some structure. Beyond relational's strict data typing, aren't all relational rules configurable? Okay, it's done using constraints, triggers, etc etc but it's still configured by the programmer or dba, as far as I can see.
I want that power in Pick :-) (Okay we've got it with triggers, but I don't necessarily think that's the best way to do it)
>> As you can tell, a well defined mvDbms application uses the field >> definitions to describe the data (as it should be) and relationships with [quoted text clipped - 5 lines] >the dictionary defines those? What language do you use to define the >dictionary? Is it user-accessible? I said earlier "the generic trumps the specific". Relational has a rule that says "the data definition must be accessible using the same tools as are used to access the data" (C&D rule 4, my paraphrase).
My multi-value rule 5 - Database self description :-) The database management system shall describe itself using FILEs. FILEs are described by other FILEs. The database itself is described by a FILE. The database shall have no fundamental mechanism to differentiate between FILEs containing data and FILEs containing metadata about that data.
So yes, Pick uses the same language to access both data and definition. And yes, it's user accessible. Because the db engine is forbidden to know that there is a difference between data and metadata.
Which is why I think of a database as layered. And why I find the relational emphasis on "push it into the database" difficult. I see it as a four-layer thing.
Layer 1: the data store.
Layer 2: the integrity layer. Where relational has triggers, constraints, and all that guff. Pick doesn't have anything native here (although a lot of what relational has, Pick doesn't need because the model is different - constraints for example). But Pick really should have a native mechanism here. It hasn't had it in the past because we manage fine without it (really we do - it's just that, in a FEW cases, we can see that relational is worth copying here).
Layer 3: the presentation layer. What the apps see - relational tables and views, Pick FILEs. Really, this is where views are defined, so Pick really neither has it nor needs it.
Layer 4: applications - split into 4a database tools and 4b user apps.
I get the impression relational is trying to have a monolithic database layer which is trying to be all things to all men. And if that's the case, it's bound to fail. Break things up into tasks and layers, and don't just have "the database".
Cheers, Wol
Laconic2 - 18 Jun 2004 20:14 GMT > Actually, as a Pickie, I'm very much inclined to agree. Rules should sit > BETWEEN the application and the data store. So no, I don't quite agree > with the relational approach, but I think the Pick approach is lacking > here. As a relational (as in SQL), I'm glad to see some agreement. But I'd offer yet another opinion.
Rules should sit both ABOVE and BEYOND the application and the data store. Both the CREATE script for the data store, and the code generation phase of the application should be able to include rules, when necessary, from some common rule repository. This rule repository would do for rules what a data dictionary does for data definitions.
Or not?
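One way to picture the common rule repository is a single list of rules that feeds both the CREATE script and the application-side check. The Python sketch below is only that, a sketch under stated assumptions: the `RULES` layout, the `create_script` and `make_validator` helpers, and the `invoice`/`amount` names are all hypothetical, and a real repository would need far richer rule types than a numeric range.

```python
# Hypothetical rule repository: one entry per rule, shared by both consumers.
RULES = [
    {"table": "invoice", "column": "amount", "min": 0, "max": 100000},
]


def create_script(table, columns, rules):
    """Emit a CREATE TABLE statement with a CHECK clause per matching rule."""
    parts = [f"{name} {sql_type}" for name, sql_type in columns]
    for rule in rules:
        if rule["table"] == table:
            parts.append(
                f"CHECK ({rule['column']} BETWEEN {rule['min']} AND {rule['max']})"
            )
    return f"CREATE TABLE {table} ({', '.join(parts)})"


def make_validator(table, rules):
    """Build an application-side check from the same repository entries."""
    relevant = [rule for rule in rules if rule["table"] == table]

    def validate(row):
        return all(
            rule["min"] <= row[rule["column"]] <= rule["max"] for rule in relevant
        )

    return validate


ddl = create_script("invoice", [("id", "INTEGER"), ("amount", "NUMERIC")], RULES)
validate_invoice = make_validator("invoice", RULES)
```

The point of the design is that the range is written down once; the data store and the application each derive their own enforcement from it.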
Dawn M. Wolthuis - 18 Jun 2004 21:43 GMT > > Actually, as a Pickie, I'm very much inclined to agree. Rules should sit > > BETWEEN the application and the data store. So no, I don't quite agree [quoted text clipped - 11 lines] > > Or not? Yes-ish. It is hard to split out metadata, including rules, from data. If a type or a maximum length is designated as binding information regarding an attribute, then those are rules, right? And they are constraints, right? And they are metadata, right? And if we want to store all of our rules for use by a rules engine, then these rules should be there, right? So, what should be in a system catalog or as DBMS constraint specifications outside of a rules repository? Nothing. So, the rules repository should include whatever aspects of the data dictionary are in need of enforcement. The data dictionary is then descriptive for use in queries, not another rules repository. I think of a data dictionary as like a Land's End catalog -- something from which to shop for the information I want.
--dawn
Anthony W. Youngman - 19 Jun 2004 23:42 GMT >> Actually, as a Pickie, I'm very much inclined to agree. Rules should sit >> BETWEEN the application and the data store. So no, I don't quite agree [quoted text clipped - 11 lines] > >Or not? Except I don't understand what you mean by "above" and "beyond". The app sits above the rules, the datastore sits below. In order for the app to write to the datastore it has to go through the rule layer. That's the way I see it. You seem to be saying the rules are somewhere else. I think I see what you mean, but it doesn't make sense to me.
The way I would implement it (in Pick) would be to attach an "integrity check routine" to the FILE. Think of it as a SQL "whole-table trigger" - you can't write to file without setting off this thing (if it exists) and it can reject the write with a "this data is invalid" error.
And as a first thing, I would add the OPTION to make dictionary descriptions prescriptive, so you could enforce eg "this column/FIELD is a number" :-)
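A rough Python sketch of that idea follows: a write goes through an optional check routine attached to the file, and the routine can reject it. Everything here is hypothetical (the `File` class, `InvalidData`, the `AMOUNT` field name); it is not Pick code, just an illustration of a whole-file check plus one prescriptive "this FIELD is a number" rule.

```python
class InvalidData(Exception):
    """Raised when a write is rejected by the integrity check."""


class File:
    """Toy stand-in for a Pick FILE: a dict of records keyed by item id."""

    def __init__(self, integrity_check=None):
        self.records = {}
        self.integrity_check = integrity_check  # optional, as in the post

    def write(self, item_id, record):
        # Every write passes through the check routine, if one is attached.
        if self.integrity_check is not None:
            problem = self.integrity_check(record)
            if problem:
                raise InvalidData(problem)
        self.records[item_id] = record


def amount_is_number(record):
    """Prescriptive dictionary rule: the AMOUNT field must be numeric."""
    try:
        float(record["AMOUNT"])
    except (KeyError, ValueError):
        return "this data is invalid: AMOUNT must be a number"
    return None


invoices = File(integrity_check=amount_is_number)
invoices.write("555*1011", {"AMOUNT": "6.73"})
try:
    invoices.write("555*1012", {"AMOUNT": "six"})
    rejected = False
except InvalidData:
    rejected = True
```

A file created without an `integrity_check` behaves as before, which matches the "OPTION" wording: the dictionary stays descriptive unless someone chooses to make it prescriptive.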
Cheers, Wol
Tony - 19 Jun 2004 12:40 GMT > >This is, more than anything, the philosophical divide between relational and > >Pick folks. The more rules, the more they should be kept OUT of the [quoted text clipped - 6 lines] > with the relational approach, but I think the Pick approach is lacking > here. Perhaps you agree more than you realise, since a DBMS is a database MANAGEMENT system, not just a data STORE. The DBMS sits between the application and the dumb data store, which is the file system. That's why the rules belong in the DBMS.
Laconic2 - 19 Jun 2004 13:59 GMT > Perhaps you agree more than you realise, since a DBMS is a database > MANAGEMENT system, not just a data STORE. The DBMS sits between the > application and the dumb data store, which is the file system. That's > why the rules belong in the DBMS. Excellent point!
This gets to be even more true when the database integrates data from more than one application.
mAsterdam - 20 Jun 2004 09:53 GMT > ... Rules should sit > BETWEEN the application and the data store. In the relational approach, by separating the rules and the data store, that is exactly where they are: BETWEEN the application and the data store.
> I get the impression relational is trying to have a monolithic database > layer which is trying to be all things to all men. And if that's the > case, it's bound to fail. Break things up into tasks and layers, and > don't just have "the database". Why did you do it: because I can. A lot of presentational application code has tabular structure. While there may be no need to share that (it is not user data, it is code) it is convenient to put it into something which has a track record of storing tables. The content of the tables resulting from this practise (the use of the tables managed by the DBMS to contain application code) should be treated as what it is, part of the code: apply change management discipline, include the tables in packaged releases etc.
Dawn M. Wolthuis - 13 Jun 2004 03:30 GMT > > > Months ago, I asked whether a pizza with pepperoni and onion was the > same [quoted text clipped - 31 lines] > Are you "just expected to know" the logical structure of invoices and > pizzas enough to draw this inference? I think the way this is handled is one of the (rather few) areas that is not the same with each MV database on the market. In the UniData environment, with which I am most familiar, if there are "associated multivalues" then they are identified as such and this "association" is named in the dictionary -- the vocabulary of the view of the data through a particular portal. So, I can talk about each multivalued field individually, or the association (nested table-ish) by its name.
Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK systems is descriptive of the data and not constraining. The same data can be described in many different ways. The association would really be a type of derived data.
> Not that there aren't things you "just have to know" in a schema of tables, > but the Pick people treat it as though it's "intuitively obvious". Maybe to > an SME, but maybe not to everybody else. No we don't -- oddly enough, it is though. smiles. --dawn
Eric Kaun - 14 Jun 2004 17:21 GMT > Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK systems > is descriptive of the data and not constraining. The same data can be > described in many different ways. The association would really be a type of > derived data. How is this useful? I've seen this in COBOL layouts, and was underwhelmed; it always seemed to cause more problems (and invite even others) than it appeared to solve. How is this more effective than a view, for example?
- erk
Dawn M. Wolthuis - 14 Jun 2004 21:43 GMT > > Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK > systems [quoted text clipped - 6 lines] > it always seemed to cause more problems (and invite even others) than it > appeared to solve. How is this more effective than a view, for example? Logically that is what it is, I guess, but it can be nested.
Take all of the nouns you want to consider and look at their relationships. Month, Day, and Year are three such nouns and you might want another that is made up of exactly these three -- so you can derive Date as Month | Day | Year or derive month, for example, using a function as Month(Date). Now, if you are looking at a list of dates, you can do the same thing, performing functions to group or separate various data.
I'm not sure that answered your concern. I think being underwhelmed regarding derived data is appropriate in 2004. smiles. --dawn
Eric Kaun - 16 Jun 2004 15:56 GMT > > How is this useful? I've seen this in COBOL layouts, and was underwhelmed; > > it always seemed to cause more problems (and invite even others) than it [quoted text clipped - 11 lines] > I'm not sure that answered your concern. I think being underwhelmed > regarding derived data is appropriate in 2004. smiles. --dawn I don't think any of the above represents derivation; it looks more to me like operations over types. I think of Date as a type, as well as Day, Month, and Year.
So I'd set up equivalences like these:
Month(Date(Y, M, D)) = M
Day(Date(Y, M, D)) = D
Year(Date(Y, M, D)) = Y
which assumes only that you have a selector (constructor) Date(Y,M,D). You could set up others, of course, and you'd need domain specifiers over M and Y, and then a constructor for Day that took Month into account.
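As a minimal sketch of those equivalences, assuming nothing more than the selector Date(Y, M, D), the following Python defines the type and the three operators. The names mirror the post; the lack of any domain checking on month or day is a deliberate simplification, not a recommendation.

```python
from typing import NamedTuple


class Date(NamedTuple):
    """Selector Date(Y, M, D); no domain checks on the components here."""

    y: int
    m: int
    d: int


def Year(date):
    """Operator returning the year component of a Date value."""
    return date.y


def Month(date):
    """Operator returning the month component of a Date value."""
    return date.m


def Day(date):
    """Operator returning the day component of a Date value."""
    return date.d


sample = Date(2004, 6, 16)
```

With this in place, `Month(Date(Y, M, D))` gives back `M` for any integers supplied, which is all the three equivalences claim.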
And then the individual types would have other semantics. In particular, you'd have to introduce the notion of calendars (the above is GregorianDate), and the base type all of them rely on (not "derived from") is something like Timestamp, an instant in time.
But I think I've gone far afield of your original points...
- erk
Dawn M. Wolthuis - 16 Jun 2004 16:35 GMT > > > How is this useful? I've seen this in COBOL layouts, and was > underwhelmed; [quoted text clipped - 20 lines] > like operations over types. I think of Date as a type, as well as Day, > Month, and Year. Well, see now, I knew that about you (and your type ;-) and so I poked. Derived data is just applying operations to stored data, whether a "type operation" or any other function one wants to write or that comes with the licensed toolset. There remain these two things: data and functions and the coolest of these is derived data (applying functions, by whatever name, to data).
> So I'd set up equivalences like these: > [quoted text clipped - 5 lines] > could set up others, of course, and you'd need domain specifiers over M and > Y, and then a constructor for Day that took Month into account. yup, or use something like number-of-days-since-D-day and then functions on that to view it in whatever way is desired
> And then the individual types would have other semantics. In particular, > you'd have to introduce the notion of calendars (the above is > GregorianDate), and the base type all of them rely on (not "derived from") > is something like Timestamp, an instant in time. > > But I think I've gone far afield of your original points... No problem - we've wandered a ways from Wol's original point, so I'll summarize with it. Now that we've determined that relational modeling/theory is NOT mathematics, we can get back to the science of it, which is where experience comes in. Since we all have different experiences and there are not enough good studies (if any) to validate industry best practices, I think we do well to learn from the experiences of others.
cheers! --dawn
Anthony W. Youngman - 18 Jun 2004 18:27 GMT >> Not that there aren't things you "just have to know" in a schema of >tables, [quoted text clipped - 3 lines] > >No we don't -- oddly enough, it is though. smiles. --dawn Oddly enough, I've just been trying to get to grips with our new SQL database. And I asked "how do I know which tables belong together?" I was told that, given an individual table, I couldn't find out which other tables "join"ed to it. I "just had to know".
With Pick, you just have to LIST DICT FILENAME, and chances are it'll be there in front of you. Certainly you get all the files that FILENAME links to, if not the files that link to FILENAME (probably the same set).
Cheers, Wol
Laconic2 - 18 Jun 2004 20:21 GMT > Oddly enough, I've just been trying to get to grips with our new SQL > database. And I asked "how do I know which tables belong together?" I > was told that, given an individual table, I couldn't find out which > other tables "join"ed to it. I "just had to know". Bad database, don't you think? If the appropriate REFERENCES clauses had been included, you would be able to figure out the join conditions, wouldn't you?
Of course, there are times when the ability to join data in an unanticipated way is actually useful.
As I wrote in "Stupid database tricks", it's possible to create a totally inscrutable database, comprehensible only to programmers, in SQL. It's also possible to write spaghetti code in C.
Nothing is foolproof, because fools are so ingenious. Or so it says on somebody's tag line.
Anthony W. Youngman - 19 Jun 2004 23:49 GMT >> Oddly enough, I've just been trying to get to grips with our new SQL >> database. And I asked "how do I know which tables belong together?" I [quoted text clipped - 4 lines] >been included, you would be able to figure out the join conditions, >wouldn't you? Maybe they have been. How would I find out? Or maybe the RDBMS doesn't support REFERENCES. It's MS Squirrel Server :-)
Cheers, Wol
Tony - 19 Jun 2004 12:47 GMT > >> Not that there aren't things you "just have to know" in a schema of > tables, [quoted text clipped - 8 lines] > was told that, given an individual table, I couldn't find out which > other tables "join"ed to it. I "just had to know". Either the person you asked was an idiot, or you have a crap SQL DBMS, or both. What DBMS is it? Every SQL DBMS I know has a data dictionary that shows the RI constraints between the tables, which gives you what you need. Of course, some application-centric idiot "designers" don't bother to define these, because the application "knows".
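For what it's worth, here is a small Python sketch of reading declared references back out of a catalog. It uses SQLite (via the standard `sqlite3` module) purely because it is easy to run; the DBMS in the thread is a different product, and there one would look at that product's own catalog views instead. The `customer` and `invoice` tables and the `references_from` helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
""")


def references_from(conn, table):
    """Return (column, referenced_table, referenced_column) per foreign key."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Result columns: id, seq, table, from, to, on_update, on_delete, match
    return [(row[3], row[2], row[4]) for row in rows]


links = references_from(conn, "invoice")
```

If the REFERENCES clause had been left out of the CREATE script, `links` would simply be empty, which is the "just had to know" situation described above.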
Laconic2 - 19 Jun 2004 14:04 GMT > "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message > > Oddly enough, I've just been trying to get to grips with our new SQL > > database. And I asked "how do I know which tables belong together?" I > > was told that, given an individual table, I couldn't find out which [quoted text clipped - 6 lines] > "designers" don't bother to define these, because the application > "knows". I ran into one of those, a few years back. It was the "Great Plains" order processing system for dotcoms. No constraints in the DB, although the DBMS supports them. Rows in the same table that represented different "record types", depending on a value in one of the columns. Sets formed by doubly linked lists of foreign keys. No documentation. The whole nine yards.
The programmers told me it was "very advanced". Yeah, right.
Eric Kaun - 14 Jun 2004 17:14 GMT > [...] > In the recent Pick example, showing an invoice, there's a list of account [quoted text clipped - 10 lines] > but the Pick people treat it as though it's "intuitively obvious". Maybe to > an SME, but maybe not to everybody else. I think the choice of data structure is important, both in terms of correctness and in communication. I see this a lot in Java - people using ArrayList everywhere just because they can, and then doing nausea-inducing searches through the lists, as opposed to using a Set or Bag or some other structure. And besides the simple bad choice, I keep thinking "O how I wish I could do an in-memory SELECT here..."
Anthony W. Youngman - 18 Jun 2004 17:56 GMT >> It's a slippery handle, but maybe - but be careful asking about "the same >> as" in an OO context - that subject gets very confusing to OOers. :-) [quoted text clipped - 24 lines] >Are you "just expected to know" the logical structure of invoices and >pizzas enough to draw this inference? No. Pick stores metadata in its dictionaries, and has a concept called ASSOCiation.
With the pizza, there is no ASSOCiation defined between CHEESE and TOPPING, but with the invoice there is an ASSOCiation between ACCOUNT.NO and AMOUNT. That is, if the programmer has remembered to define it ...
Cheers, Wol
Laconic2 - 18 Jun 2004 20:09 GMT > No. Pick stores metadata in its dictionaries, and has a concept called > ASSOCiation. > > With the pizza, there is no ASSOCiation defined between CHEESE and > TOPPING, but with the invoice there is an ASSOCiation between ACCOUNT.NO > and AMOUNT. That is, if the programmer has remembered to define it ... That's a fair enough answer to the question as stated. Data is only self describing if somebody made it that way.
As far as "if the programmer has remembered to define it" goes, I find that no more, and no less, of a pitfall than the REFERENCES constraint in SQL.
Dawn M. Wolthuis - 18 Jun 2004 21:48 GMT > > No. Pick stores metadata in its dictionaries, and has a concept called > > ASSOCiation. [quoted text clipped - 9 lines] > that no more, and no less, of a pitfall than the REFERENCES constraint in > SQL. Only slightly different, for better or worse. If the REFERENCES constraint is not there, you can still create a query with the appropriate join, so there is nothing to force you to put in the REFERENCES constraint. In PICK, the user will complain that they can't get the data out the way they want if the ASSOC is not there, so the dictionary, which is just descriptive, does get that type of accuracy quite soon after deployment if it is not built in from the start.
--dawn
Anthony W. Youngman - 18 Jun 2004 17:51 GMT >> He did say that, and I've been thinking about it, and am not sure it's >> accurate. The order of values in a list attribute in a Pick file seems [quoted text clipped - 12 lines] >I got several cute responses, but nobody really addressed the underlying >issue. Sounds like you've got a handle on it. In other words, is it a set, a bag, or a list?
Note that it's easy to go from a list to either of the other two. But in order to go back, the set or bag needs to contain extra data (ie the order) over the list.
Because Pick stores attributes as lists (if relevant) the order is available to the db engine as metadata if required. And it can't be accidentally lost by an analyst :-) So I would argue that storing things as lists is better, because you can always get the other two if you want.
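A minimal sketch of the list-to-bag and list-to-set direction described above, assuming Python as the illustration language (the thread itself names none) and using made-up topping values. It only shows that two different lists can collapse to the same bag and the same set, which is why the reverse trip needs the order to be kept somewhere.

```python
from collections import Counter

toppings = ["pepperoni", "onion", "pepperoni"]  # an ordered list with a repeat

as_bag = Counter(toppings)   # bag: keeps counts, drops order
as_set = set(toppings)       # set: drops both counts and order

# A differently ordered list collapses to the same bag and the same set,
# so neither can be turned back into the original list on its own.
other = ["onion", "pepperoni", "pepperoni"]
same_bag = Counter(other) == as_bag
same_set = set(other) == as_set
```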
After all, your question could be taken to mean "Is it a pizza with both pepperoni and onion" or "is it a pizza with pepperoni on it then onion on top of that".
If the analyst hasn't taken it into account, then relational is sunk without a redesign. With Pick you just tell your till-operatives that it's to be entered as an ordered list, not a set :-)
Cheers, Wol
Marshall Spight - 27 Jun 2004 02:42 GMT > Note that it's easy to go from a list to either of the other two. But in > order to go back, the set or bag needs to contain extra data (ie the > order) over the list. I don't see how you could consider that data "extra" if it was there originally.
Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3, C) } where [] denotes an ordered collection and {} denotes an unordered collection.
Going back to the list from the information-preserving set is not that hard.
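The round trip described above can be sketched as follows. Python and the helper names `list_to_set` and `set_to_list` are assumptions made for illustration; positions are 1-based to match the { (1, A), (2, B), (3, C) } notation.

```python
def list_to_set(items):
    """Return the list as an unordered set of (position, value) pairs, 1-based."""
    return {(pos, val) for pos, val in enumerate(items, start=1)}

def set_to_list(pairs):
    """Rebuild the list by sorting the pairs on their position component."""
    return [val for _, val in sorted(pairs, key=lambda p: p[0])]

pairs = list_to_set(["A", "B", "C"])
rebuilt = set_to_list(pairs)
```

The position component is what carries the order; drop it and the set no longer determines the list.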
> Because Pick stores attributes as lists (if relevant) the order is > available to the db engine as metadata if required. And it can't be > accidentally lost by an analyst :-) So I would argue that storing things > as lists is better, because you can always get the other two if you > want. Although I don't think having lists as the only collection primitive is a good idea, there is one key point that I will gladly grant you:
Lists are very common, and SQL doesn't handle them well at all. RM doesn't handle them well either.
(I can also believe that MV handles them quite well, although I have no direct experience to that end.)
Does anyone know of a "list calculus" or "list algebra" with a formal definition? Is it just too simple for anyone to have cared about?
Marshall
mAsterdam - 27 Jun 2004 10:51 GMT >>Note that it's easy to go from a list to either of the other two. But in >>order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 8 lines] > > Going back to the list from the information-preserving set is not that hard. There is more to it.
[forced meaning/overspecification] The above is one way of expressing the list as a set. There are other ways, and that is where the complication comes in. One has to decide which.
First, the way you chose is not without problems:
Let's put a new element 'N' into the list, right after the 'B':
The new list is [A, B, N, C]. Now what is the set?
is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?
In the case of UC1 we will have to make sure that there is always room for inserts. In the case of UC2 every insert causes updates to all elements that should be after the new element.
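The two candidate representations can be compared with a small sketch. Python and the function names `insert_uc1` and `insert_uc2` are assumptions for illustration only; UC1 here picks the midpoint between neighbouring ranks, which is one possible way of "leaving room", not the only one.

```python
def insert_uc1(pairs, after_pos, value):
    """UC1: pick a rank strictly between the neighbours; no other pair changes."""
    later = [p for p, _ in pairs if p > after_pos]
    new_pos = (after_pos + min(later)) / 2 if later else after_pos + 1
    return pairs | {(new_pos, value)}

def insert_uc2(pairs, after_pos, value):
    """UC2: keep positions as consecutive integers; shift every later pair up by one."""
    shifted = {(p + 1 if p > after_pos else p, v) for p, v in pairs}
    return shifted | {(after_pos + 1, value)}

base = {(1, "A"), (2, "B"), (3, "C")}
uc1 = insert_uc1(base, 2, "N")   # adds (2.5, "N") and touches nothing else
uc2 = insert_uc2(base, 2, "N")   # adds (3, "N") and rewrites (3, "C") to (4, "C")
```

Both results sort back to the same list [A, B, N, C]; they differ only in what the rank values look like and in how many existing pairs had to change.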
Now there are other ways of representing lists as unordered collection, better or worse at some or other aspects - but that is not the problem.
The problem is we have to make a choice with consequences to get from the list to an unordered collection carrying the same meaning, so we are capable of reconstructing the list. It is a strange thing: we require an 'unordered collection' to preserve order - now if that isn't asking for trouble ... :-) but I am digressing.
Thing is we *have* to make a complex choice, because (and only because) we decided that the order was meaningful. But the choice we have to make has *more* consequences than we bargained for. I could imagine people calling that 'extra data' - but what does the extra data mean? In the above solutions we get an extra column (or attribute) stating rank. In another solution we might get an extra column designating the 'next in order' element - or would 'prior' be better?
I don't care, but I must decide. It is somewhat like having to choose which side on the road to drive on without traffic signs and without knowing what country you are in.
There are even more worms in this can, I think, but I would appreciate your comments on this one.
<snip>
> Lists are very common, and SQL doesn't handle them well at all. > RM doesn't handle them well either. [quoted text clipped - 5 lines] > formal definition? It is just too simple for anyone to have > cared about? Lisp and prolog come to mind - but that is not what you are looking for - at least it is not what I am looking for.
Marshall Spight - 27 Jun 2004 16:55 GMT > >>Note that it's easy to go from a list to either of the other two. But in > >>order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 9 lines] > [...] > First, the way you chose is not without problems: For sure!
> Let's put a new element 'N' into > the list, right after the 'B': [quoted text clipped - 4 lines] > is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or > is it UC2: { (1, A), (2, B), (4, C), (3, N) } ? It's UC2.
> In the case of UC2 every insert causes updates to > all elements that should be after the new element. Let us also consider the two other common implementations of lists: linked lists and arrays.
In UC2, an insert requires O(N) time: every index above the one inserted requires updating. In a linked list, an insert requires O(1) time, but locating an item in order to insert in front of it is O(N). In an array, an insert requires O(N) time, because you have to move all the higher elements up.
The issue here is not performance, it's interface. Using SQL to manipulate ordered data *where the ordering is positional* and *not by any logical component of the data* is just way too hard.
"Lists are very common, and SQL doesn't handle them well at all. RM doesn't handle them well either."
Let me expand on what I mean by "handle," and to do so, I'll use the "structure, integrity, manipulation" definition of a DBMS.
If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list *structure* works just fine. All the query operators you'd expect to have, you have: get me item 2, how many items are there, etc. You'd also get some not-so-common queries: at what indices does the letter "Q" occur?
But let's consider manipulation. Simple insert, using your (3,N) tuple.
Java: list.insert(3, N);
SQL:
begin;
update List set position = position + 1 where position >= 3;
insert into List (position, value) values (3, N);
commit;
[note if you do them in the wrong order you're screwed.]
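A runnable sketch of the shift-then-insert sequence above, assuming Python's standard `sqlite3` module and an in-memory table. The table layout is a guess at what the post intends, and no uniqueness constraint is declared, deliberately, so the sketch does not depend on how a particular engine checks uniqueness while the update is in progress.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE List (position INTEGER NOT NULL, value TEXT NOT NULL)")
conn.executemany("INSERT INTO List (position, value) VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C")])
conn.commit()

# Both statements run in one transaction; the shift must come before the insert.
with conn:
    conn.execute("UPDATE List SET position = position + 1 WHERE position >= ?", (3,))
    conn.execute("INSERT INTO List (position, value) VALUES (?, ?)", (3, "N"))

rows = conn.execute("SELECT position, value FROM List ORDER BY position").fetchall()
```

Compared with the one-line Java call, the caller has to know about the position column and get the statement order right.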
Wait, I just remembered: Date says integrity enforcement should be at the statement level, not the transaction level. So the first update will fail, because it leaves a hole in the sequence. So I've gotta figure out how to update all those pos values at once, which I think I can do with mod and some offsets. I expect it would take me a few hours to figure out.
Now: integrity. I have to specify some checks. Uh:
unique(pos)
check( pos >= 0 )
check( select max(pos) from List = select count(pos) from List )
Did that get it? I think it did, but I'm not sure. Also, if I saw that in a table declaration, would I say "list" or would I say "bunch of integrity constraints"?
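One way to probe whether those three checks pin down a list is to write them out next to the intended property and try a few cases. This is a sketch in Python with hypothetical helper names; treating an empty table as valid is an assumption. With the lower bound at 0 as written, a position set such as {0, 2, 3} is unique, non-negative, and has max equal to count, yet has a hole at 1; with a lower bound of 1 the three checks together do force 1..n.

```python
def is_list(positions):
    """True when the positions are exactly 1, 2, ..., n with no repeats or holes."""
    return sorted(positions) == list(range(1, len(positions) + 1))

def passes_quoted_checks(positions, lower_bound=0):
    """The three checks as written: unique, pos >= lower_bound, max == count."""
    if not positions:
        return True  # assumption: an empty table counts as a valid (empty) list
    unique = len(set(positions)) == len(positions)
    bounded = all(p >= lower_bound for p in positions)
    return unique and bounded and max(positions) == len(positions)

slips_through = passes_quoted_checks([0, 2, 3]) and not is_list([0, 2, 3])
caught_with_one = not passes_quoted_checks([0, 2, 3], lower_bound=1)
```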
> The problem is we have to make a choice with consequences to get from > the list to a unordered collection carrying the same meaning, so we are > capable of reconstructing the list. I don't disagree, but I might say it differently: what matters is what options we have to enforce integrity, and what operators we have to perform manipulation. I think RM is a step backwards in ease-of-use for each of these where lists are concerned. (Which is not to say that I think we should make our decisions on that basis alone, but I do think it's significant.)
> Thing is we *have* to make a complex choice, > because (and only because) we decided that the order was > meaningful. I don't think the choice is intrinsically complex. I think the issue is just that SQL (and TTM for that matter) don't give you even the most basic list manipulation or integrity checks.
> There are even more worms in this can, I think, but > I would appreciate your comments on this one. [quoted text clipped - 12 lines] > Lisp and prolog come to mind - but that is not what you are > looking for - at least it is not what I am looking for. No, if I want a PL with list operators, I can find them anywhere. I was more thinking of something like the relational algebra, with its minimal set of operators, additional operators defined in terms of the minimal ones, and a definition of what it means for a set of operators to be complete. Only for lists.
Frighteningly, if I can't find something like that, I may have to do it myself.
Marshall
mAsterdam - 27 Jun 2004 21:38 GMT >>>>Note that it's easy to go from a list to either of the other two. But in >>>>order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 21 lines] > > It's UC2. Ok. No argument, just wondering: any particular reason to discard UC1?
>>In the case of UC2 every insert causes updates to >>all elements that should be after the new element. > > Let us also consider the two other common implementations > of lists: linked lists and arrays. Yes, there are more alternatives, all with pros and cons.
> In UC2, an insert requires O(N) time: every index above > the one inserted requires updating. In a linked list, an insert [quoted text clipped - 3 lines] > > The issue here is not performance, ... Indeed.
> it's interface. Is it really? Just interface? Maybe I just don't understand what you mean by that. I suspect it goes beyond interface. To be more specific: I think it is related to the information principle:
> The entire information content of a relational database > is represented in one and only one way: namely, as > attribute values within tuples within relations. As soon as some order is said to have information content, this principle requires that to have it in a relational database, this content must be represented as attribute values.
So we have to answer the question: which attribute(s') values? The answer seems obvious: list typed attributes. But that's not the way it is done - so what stops that? I know: the lack of the "Spight list algebra"!
BTW: Attribute is still on the glossary's todo list (hint :-)
> Using SQL > to manipulate ordered data *where the ordering is positional* [quoted text clipped - 25 lines] > commit > [note if you do them in the wrong order you're screwed.] No problem. Assume a preprocessor and make a macro^H^H^H^H^H shorthand.
> Wait, I just remembered: Date says integrity enforcement > should be at the statement level, not the transaction [quoted text clipped - 12 lines] > saw that in a table declaration, would I say "list" or would > I say "bunch of integrity constraints."
:-)
>>The problem is we have to make a choice with consequences to get from >>the list to a unordered collection carrying the same meaning, so we are [quoted text clipped - 6 lines] > that I think we should make our decisions on that basis alone, but I > do think it's significant.)
>>Thing is we *have* to make a complex choice, >>because (and only because) we decided that the order was [quoted text clipped - 4 lines] > give you even the most basic list manipulation or integrity > checks. The more options we have, the more serious the problem is. Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B' ambiguous?
Say we have a pizza-attribute 'topping' of type 'list-of-toppings' Is 'constraint FK topping refers to toppings' ambiguous?
If it is intrinsically simple as you say - what is your explanation why SQL (and TTM) do simply not address meaningful order?
Maybe we are just overlooking something obvious.
> I was more thinking of something like the relational algebra, > with its minimal set of operators, additional operators defined [quoted text clipped - 3 lines] > Frighteningly, if I can't find something like that, I may have > to do it myself. "Spight list algebra". Sounds good. :-)
Marshall Spight - 27 Jun 2004 23:27 GMT > >>Let's put a new element 'N' into > >>the list, right after the 'B': [quoted text clipped - 9 lines] > Ok. No argument, just wondering: any particular reason to > discard UC1? If you deviate from the integer domain for your index, you no longer have a list; you now have just another set with just another regular attribute. A list or sequence is a mapping from the natural numbers to another set. For example, a string is a mapping from nat -> char.
UC1 is a perfectly valid approach; it's just not a list.
> > The issue here is not performance, ... > [quoted text clipped - 14 lines] > this principle requires that to have it in a relational database, > this content must be represented as attribute values. Something the information principle *doesn't* say is that those relations cannot be a subtype of relation. Ordered relation is a subtype of unordered relation, and list is a subtype of ordered relation. And if we allow relation-valued attributes (RVAs) then that means we also allow lists as attributes.
> So we have to answer the question: which attribute(s') > values? The answer seems obvious: list typed attributes. > But that's not the way it is done - so what stops that? > I know: the lack of the "Spight list algebra"! Actually, I think it's exactly that. (Not that it needs to have my name on it, of course.) I think we need a *theoretical* understanding of the subtyping relationship between sets, totally ordered sets, partially ordered sets, and lists. (Mathematicians, of course, have had this understanding for years, but not too many people in data management are there yet. Still, I wish I knew as much math as, say Mikito Harakiri.)
We also need the relational language to have a type system that isn't antediluvian. SQL's type system is right in line with other languages of the day, such as FORTRAN, Cobol, and Pascal (or C for that matter.) It hasn't advanced much, at least not in the type system, and its extraordinary market success has defeated all newcomers.
> > [note if you do them in the wrong order you're screwed.] > > No problem. Assume a preprocessor and make a > macro^H^H^H^H^H shorthand. Agreed, but I think that for practical purposes you *have* to have this shorthand form.
> > I don't think the choice is intrinsically complex. I think > > the issue is just that SQL (and TTM for that matter) don't [quoted text clipped - 4 lines] > Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B' > amibiguous? I'd say "no" and "no." I'm not certain I see your point, though. Maybe it's that the current situation is quite complicated if you want to handle lists? I'd agree with that. That's why I think we need those list primitives (derived from that list algebra) to make it simple.
> Say we have a pizza-attribute 'topping' of type 'list-of-toppings' Why is that a list? When I go to pizza hut and ask for ham and pineapple, I get the same thing as if I asked for pineapple and ham. Pizza toppings aren't a list; they're a set.
I think part of the reason people see lists everywhere is because they're used to supplying information in the form of a list, even when the order info isn't relevant. This makes for potential program bugs, because
List {ham, pineapple} != List {pineapple, ham} but Set {ham, pineapple} == Set {pineapple, ham}
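The same comparison, written out as a tiny sketch (Python's built-in `list` and `set` are used as stand-ins for the List and Set in the post; that mapping is an assumption):

```python
order_a = ["ham", "pineapple"]
order_b = ["pineapple", "ham"]

lists_equal = order_a == order_b            # position matters for lists
sets_equal = set(order_a) == set(order_b)   # only membership matters for sets
```

A program that stores the toppings as a list and then compares two orders for equality would treat these as different pizzas.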
> Is 'constraint FK topping refers to toppings' ambiguous? I don't get the question.
> If it is intrinsically simple as you say - what is your > explanation why SQL (and TTM) do simply not address meaningful > order? > > Maybe we are just overlooking something obvious. I think Date et al.'s usual culprit suffices here: lack of education. I would note, ironically, that that group has never managed to clarify the (IMHO) essential distinction between partial order and total order when discussing order.
> > I was more thinking of something like the relational algebra, > > with its minimal set of operators, additional operators defined [quoted text clipped - 5 lines] > > "Spight list algebra". Sounds good. :-) I'll get right on it. :-)
Marshall
mAsterdam - 28 Jun 2004 23:36 GMT > mAsterdam wrote: >> [quoted text clipped - 19 lines] > > UC1 is a perfectly valid approach; it's just not a list. Now I am puzzled. What makes the integer *not* a regular attribute? -
I'll try myself: it is easier to hide. And that is exactly what is necessary. We can pretend it's not there, and we should, because in the unambiguous list presentation it is not there. Any more data than exactly necessary for presenting the order should not be visible.
(Yep: interface :-)
<snip>
> Something the information principal *doesn't* say is that > those relations cannot be a subtype of relation. > Ordered relation is a subtype of unordered relation, and > list is a subtype of ordered relation. And if we allow relation- > valued attributes (RVAs) then that means we also allow > lists as attributes. How does this eliminate the extra choice to make? (- I think you agree it should).
>>So we have to answer the question: which attribute(s') >>values? The answer seems obvious: list typed attributes. [quoted text clipped - 16 lines] > at least not in the type system, and it's extraordinary market > success has defeated all newcomers. Yep. Types. Sigh. Gimme gimme gimme :-)
But later. One thing at a time. For now: lists!
>>>[note if you do them in the wrong order you're screwed.] >> [quoted text clipped - 3 lines] > Agreed, but I think that for practical purposes you *have* to > have this shorthand form. /me nods.
>>>I don't think the choice is intrinsically complex. I think >>>the issue is just that SQL (and TTM for that matter) don't [quoted text clipped - 16 lines] > pineapple, I get the same thing as if I asked for pineapple and > ham. Pizza toppings aren't a list; they're a set. To connoisseurs it's a list. The order of the toppings matters. I assumed that this foundation of the pizza-model was common knowledge ;-)
> I think part of the reason people see lists everywhere is because > they're used to supplying information in the form of a list, [quoted text clipped - 8 lines] > > I don't get the question. After the above pizza-model 101 you probably do.
It's just to state that the list-member should be first class^H^H^H^H^H type citizens.
>>If it is intrinsically simple as you say - what is your >>explanation why SQL (and TTM) do simply not address meaningful [quoted text clipped - 6 lines] > to clarify the (IHMO) essential distinction between partial > order and total order when discussing order. I have to admit that I suppressed bringing it (partial order) in earlier. I think complete sequence (non-partial order) is to be viewed as a special case of partial ordering (1 part). So maybe the simplest strategy is: get the simple, special case first in thorough detail, then tackle the more complicated (partial order), then show the simple case is a specialization of the complex. Not unlike the approach in 'Temporal data'.
>>>I was more thinking of something like the relational algebra, >>>with its minimal set of operators, additional operators defined [quoted text clipped - 7 lines] > > I'll get right on it. :-) Just lists for now, ok?
Marshall Spight - 29 Jun 2004 02:53 GMT > >>>It's UC2. > >> [quoted text clipped - 21 lines] > > (Yep: interface :-) Mostly I agree. But I don't think the "we should" part always applies. Sometimes we should, and sometimes we shouldn't. It depends on what particular operations we want to do. The most flexible way is where we get to choose.
> <snip> > > [quoted text clipped - 7 lines] > How does this eliminate the extra choice to make? > (- I think you agree it should). No, I don't think I do. What you have to do is decide how you are going to model your data. *That* choice is fundamental; you can't get rid of it. Do your pizza toppings come in order or not? The data model you're working with won't help you make that decision; it only comes into play when you have to figure out how to *express* that decision.
> But later. One thing at a time. For now: lists! Deal.
> >>Say we have a pizza-attribute 'topping' of type 'list-of-toppings' > > [quoted text clipped - 5 lines] > matters. I assumed that this foundation of the > pizza-model was common knowledge ;-) Well, okay; whether a particular aspect of your data model is ordered or not is part of the conceptual modelling; that's the part that's all art, no science.
Pizza Hut's pizza-ordering web app treats pizza toppings as unordered; I postulate a chain called "Bob's Totally Delicious Pizza with Totally Ordered Pizza Toppings" that keeps them in order.
> > I think part of the reason people see lists everywhere is because > > they're used to supplying information in the form of a list, [quoted text clipped - 13 lines] > It's just to state that the list-member should be first > class^H^H^H^H^H type citizens. Ah; I see. I agree the list members should be first-class citizens. Thus, you should be able to say, I have a list of items which are each foreign keys to <whatever>.
> I have to admit that I suppressed bringing it (partial order) in > earlier. I think complete sequence (non-partial order) is to [quoted text clipped - 4 lines] > the simple case is a specialization of the complex. > Not unlike the approach in 'Temporal data'. Yes, but you wanted to talk about lists, and a totally ordered set is not the same thing as a list!
Consider the string "abca". This is a list of characters, but it certainly isn't a totally ordered set over {abc}.
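A short sketch of why "abca" is a list but not a set, assuming Python and treating the string as a mapping from positions to characters. The repeated "a" survives in the position-to-character mapping and disappears in the set.

```python
word = "abca"

as_mapping = dict(enumerate(word, start=1))  # position -> character
as_set = set(word)                           # membership only; the repeat is gone

# The mapping is enough to rebuild the string; the set is not.
rebuilt = "".join(as_mapping[i] for i in range(1, len(as_mapping) + 1))
```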
I wish it was simpler, but it's not.
Marshall
mAsterdam - 01 Jul 2004 22:33 GMT >>>>>It's UC2. >>>> [quoted text clipped - 27 lines] > particular operations we want to do. The most > flexible way is where we get to choose. The flexible way would unfortunately also be the way that would force us to make decicions where there are no a priori criteria.
If *all* we want is to assiocate some meaning with the order of some values, the *only* meaningful thing besides those values should be the order of them. Nothing more - for if we would have more, we'd have to decide between them, and thus be forced to assign meaning where we have none.
The fact that there are so many ways to represent lists (just read a few of Jan Hidders' links), all with different consequences, suggests that there should be a visible, simple part (only the values in order + operators) and an invisible, complex part (the values, and all that is necessary to keep them in order under the visible operators).
The complex, invisible part could use the flexibility to make efficiency decisions based on the (a posteriori) actual content and usage of the values, but these choices should in no way affect the content of the visible, meaningful part.
>><snip> >> [quoted text clipped - 15 lines] > only comes into play when you have to figure out how > to *express* that decision. We decided that, beside the values, the order is relevant. For now, it would be a good thing (TM) to also decide/assume that *only* the order and the values are relevant.
>>But later. One thing at a time. For now: lists! > [quoted text clipped - 13 lines] > is ordered or not is part of the conceptual modelling; that's > the part that's all art, no science. Yep, hence the assumptions.
> Pizza Hut's pizza-ordering web app treats pizza toppings > as unordered; I postulate a chain called "Bob's Totally > Delicious Pizza withTotally Ordered Pizza Toppings" > that keeps them in order.
:-)
>>>I think part of the reason people see lists everywhere is because >>>they're used to supplying information in the form of a list, [quoted text clipped - 17 lines] > citizens. Thus, you should be able to say, I have a list > of items which are each foreign keys to <whatever>. Exactly.
>>I have to admit that I suppressed bringing it (partial order) in >>earlier. I think complete sequence (non-partial order) is to [quoted text clipped - 10 lines] > Consider the string "abca". This is a list of characters, > but it certainly isn't a totally ordered set over {abc}. Yep, that is clear. I was also thinking about precedence ordering (CPM/PERT type planning graphs) to *exclude* them from the problem space (at least for now).
> I wish it was simpler, but it's not. You wouldn't have volunteered if it were :-)
Anthony W. Youngman - 10 Jul 2004 01:02 GMT >>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings' >> Why is that a list? When I go to pizza hut and ask for ham and [quoted text clipped - 4 lines] >matters. I assumed that this foundation of the >pizza-model was common knowledge ;-) Tomato always goes at the bottom, cheese on the top :-)
Cheers, Wol
Marshall Spight - 10 Jul 2004 02:42 GMT > >>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings' > >> Why is that a list? When I go to pizza hut and ask for ham and [quoted text clipped - 6 lines] > > Tomato always goes at the bottom, cheese on the top :-) Good point.
Is crust a topping?
Marshall
Gene Wirchenko - 10 Jul 2004 05:58 GMT >>>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings' >>> Why is that a list? When I go to pizza hut and ask for ham and [quoted text clipped - 6 lines] > >Tomato always goes at the bottom, cheese on the top :-) What? I always have tomato on top. Maybe, you are confusing tomato and tomato sauce?
Just to be safe: I live in Kamloops, British Columbia, Canada. Please do not open a pizzeria in Kamloops.
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Dawn M. Wolthuis - 30 Jun 2004 01:07 GMT > > >>Note that it's easy to go from a list to either of the other two. But in > > >>order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 22 lines] > > It's UC2. Yes, there is some such list algebra in the DataBASIC language associate with PICK and this is how such an insert would be handled logically.
> > In the case of UC2 every insert causes updates to > > all elements that should be after the new element. Only if the number is stored rather than implied by the ordering of the data, in a linked list for example.
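A sketch of the "implied rather than stored" case, assuming Python and a hand-rolled singly linked list (the `Node`, `insert_after` and `to_list` names are hypothetical, and this says nothing about how PICK actually stores its data). No position number exists anywhere, so an insert touches only the one link it splices into.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def insert_after(node, value):
    """Splice a new node in after `node`; no other node is touched."""
    node.next = Node(value, node.next)
    return node.next

def to_list(head):
    """Walk the chain; each value's position is implied by when it is reached."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out

c = Node("C")
b = Node("B", c)
a = Node("A", b)
insert_after(b, "N")   # the list is now A, B, N, C
```

The cost moves elsewhere: finding the node to insert after still means walking the chain from the head.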
> Let us also consider the two other common implementations > of lists: linked lists and arrays. [quoted text clipped - 9 lines] > and *not by any logical component of the data* is just way > too hard. yes, exactly!
> "Lists are very common, and SQL doesn't handle them well at all. > RM doesn't handle them well either." [quoted text clipped - 37 lines] > saw that in a table declaration, would I say "list" or would > I say "bunch of integrity constraints." very good point
> > The problem is we have to make a choice with consequences to get from > > the list to a unordered collection carrying the same meaning, so we are [quoted text clipped - 15 lines] > give you even the most basic list manipulation or integrity > checks. RM, or at least SQL, focuses on a) scalar values and b) relations. Adding c) lists as a type with its own functions (operators) native to the database and SQL language is a start. Viewing relations and lists both as functions allows us to think about what functions can be applied to which others, in particular, which functions can be applied to functions where the, uh, domain of one of the parameters is the natural numbers (to mix vocabulary). These would be the ordered lists.
> > There are even more worms in this can, I think, but > > I would appreciate your comments on this one. [quoted text clipped - 5 lines] > > > (I can also believe that MV handles them quite well, although I > > > have no direct experience to that end.) sometimes yes, sometimes no.
> > > Does anyone know of a "list calculus" or "list algebra" with a > > > formal definition? It is just too simple for anyone to have [quoted text clipped - 11 lines] > Frighteningly, if I can't find something like that, I may have > to do it myself. I look forward to it! --dawn
Anthony W. Youngman - 10 Jul 2004 00:35 GMT >> Let's put a new element 'N' into >> the list, right after the 'B': [quoted text clipped - 6 lines] > >It's UC2. Why? (Okay, I do agree with you. But why?) I think it's because your ordering column is actually "list position" which by definition must be a sequential integer. But that column could equally be defined as "relative position" in which case UC1 would be fine.
>> In the case of UC2 every insert causes updates to >> all elements that should be after the new element. [quoted text clipped - 12 lines] >and *not by any logical component of the data* is just way >too hard. Is it "way too hard" or is it simply because the underlying relational model has no concept of order? You cannot manipulate something that has no conceptual existence.
>"Lists are very common, and SQL doesn't handle them well at all. >RM doesn't handle them well either."
:-) > >Let me expand on what I mean by "handle," and to do so, >I'll use the "structure, integrity, manipulation" definition of >a DBMS. Of a DBMS, or the relational version of that definition?
Cheers, Wol
Marshall Spight - 10 Jul 2004 02:41 GMT > >> Let's put a new element 'N' into > >> the list, right after the 'B': [quoted text clipped - 10 lines] > ordering column is actually "list position" which by definition must be > a sequential integer. Exactly.
> >> In the case of UC2 every insert causes updates to > >> all elements that should be after the new element. [quoted text clipped - 15 lines] > Is it "way too hard" or is it simply because the underlying relational > model has no concept of order? It's too hard because SQL doesn't have primitives that make it easy. If it did, it would be easy.
> >"Lists are very common, and SQL doesn't handle them well at all. > >RM doesn't handle them well either." > > :-) I figured we'd agree on that point.
> >Let me expand on what I mean by "handle," and to do so, > >I'll use the "structure, integrity, manipulation" definition of > >a DBMS. > > Of a DBMS, or the relational version of that definition? It's a definition of DBMS; it's independent of whether the DBMS is relational or not.
Marshall
Jan Hidders - 01 Jul 2004 01:59 GMT > Does anyone know of a "list calculus" or "list algebra" with a > formal definition? It is just too simple for anyone to have > cared about? Yes, it exists, and, no, it is not too simple for anyone to have cared about. There are in fact lots of them. The most interesting ones are based on the the comprehension syntax.
Comprehension syntax. Peter Buneman, Leonid Libkin, Dan Suciu, Val Tannen and Limsoon Wong. SIGMOD Record, 23 (1994), 87-96.
Related to those are the ones based on monads.
Comprehending monads Philip Wadler. Mathematical Structures in Computer Science, Special issue of selected papers from 6'th Conference on Lisp and Functional Programming, 2:461-493, 1992.
The core of XQuery is to some extent based on that. With Google and citeseer you should be able to find on-line versions. There are many many more papers on this, but these should set you in the right direction.
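For readers who have not met comprehension syntax, Python's list comprehensions are one familiar instance of the idea. The sketch below is only an illustration, not taken from the cited papers: the orders and lines data and the flat_map helper are assumptions. It writes the same selection and join over ordered lists twice, once as a comprehension and once with a monad-style "bind" for lists.

```python
# Hypothetical data: lists, so element order is significant and preserved.
orders = [("o1", "alice"), ("o2", "bob"), ("o3", "alice")]
lines = [("o1", "skirt"), ("o1", "hat"), ("o3", "scarf")]

# Selection plus projection as a comprehension.
alice_orders = [oid for (oid, cust) in orders if cust == "alice"]

# A join is a comprehension with two generators and a filter.
alice_items = [
    item
    for (oid, cust) in orders
    if cust == "alice"
    for (line_oid, item) in lines
    if line_oid == oid
]


def flat_map(f, xs):
    """List 'bind': apply f to each element and concatenate the results."""
    return [y for x in xs for y in f(x)]


# The same join written with flat_map only, in the monadic style.
alice_items_monadic = flat_map(
    lambda o: flat_map(lambda l: [l[1]] if l[0] == o[0] else [], lines)
    if o[1] == "alice"
    else [],
    orders,
)

print(alice_orders)          # ['o1', 'o3']
print(alice_items)           # ['skirt', 'hat', 'scarf']
print(alice_items_monadic)   # ['skirt', 'hat', 'scarf']
```

Both forms keep the input order, which is the property that separates a list calculus from its set-based counterpart.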
If you are into that sort of thing you might want to read about calculi for even more general data structures such as pomsets (partially ordered multisets, a generalization of sets, bags and lists):
An Algebra for Pomsets Stéphane Grumbach and Tova Milo Proceedings of the 5th International Conference on Database Theory 191-207, 1995
Happy reading,
-- Jan Hidders
Anthony W. Youngman - 10 Jul 2004 00:26 GMT >> Note that it's easy to go from a list to either of the other two. But in >> order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 6 lines] >where [] denotes an ordered collection and {} denotes an unordered >collection. Well, you have just created a field called ORDER, and created the values 1, 2 and 3.
The point is that the data was NOT there initially - it was implicit. If something is implicit, then it quite clearly does not have existence in its own right.
>Going back to the list from the information-preserving set is not that hard. Correct. But surely it's better to throw away the implicit ordering if it's unnecessary at the point of use, than to suddenly discover that it was necessary but that it's been thrown away by accident ... :-)
Cheers, Wol
Marshall Spight - 10 Jul 2004 02:37 GMT > >> Note that it's easy to go from a list to either of the other two. But in > >> order to go back, the set or bag needs to contain extra data (ie the [quoted text clipped - 8 lines] > > Well, you have just created a field called ORDER, Yes.
> and created the values 1, 2 and 3. No, they were there already.
> The point is that the data was NOT there initially - it was implicit. It sounds like what you're saying is that information encoded as order is "not there." This is incorrect; order encodes information.
> If something is implicit, then it quite clearly does not have existence > in its own right. Strongly disagree. If it didn't exist, then it couldn't carry any information.
> >Going back to the list from the information-preserving set is not that hard. > > Correct. But surely it's better to throw away the implicit ordering if > it's unnecessary at the point of use, than to suddenly discover that it > was necessary but that it's been thrown away by accident ... :-) If you can't tell what information is necessary and what isn't, you're not going to be able to manage that information anyway.
Also, there's no reason why the list [A, B, C] and the set { (1, A), (2, B), (3, C) } can't have the exact same in-memory representation.
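The round trip being argued about can be sketched in a few lines of Python. This is a minimal illustration under my own naming (list_to_relation and relation_to_list are hypothetical helpers), not a claim about how any DBMS represents lists.

```python
from collections import Counter


def list_to_relation(items):
    """Make the implicit order explicit: [A, B, C] -> {(1, A), (2, B), (3, C)}."""
    return {(position, value) for position, value in enumerate(items, start=1)}


def relation_to_list(relation):
    """Rebuild the list by sorting on the explicit position attribute."""
    return [value for _, value in sorted(relation, key=lambda pair: pair[0])]


original = ["A", "B", "A", "C"]

as_relation = list_to_relation(original)
round_trip = relation_to_list(as_relation)

# Converting to a bag or a plain set is the lossy direction.
as_bag = Counter(original)   # keeps duplicates, drops order
as_set = set(original)       # drops duplicates and order

print(round_trip == original)   # True
print(as_bag["A"])              # 2
print(len(as_set))              # 3
```

The set of (position, value) pairs carries exactly the information the list did, so nothing is invented on the way back; the bag and the plain set cannot be turned back into the list without adding data.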
Marshall
mAsterdam - 10 Jun 2004 21:16 GMT >>Roman numerals still exist. They work quite well in some contexts. >>Besides, there is tradition. >>Do you know what the QWERTY keyboard was designed for?
> I was told once it was to keep the mechanical hammers attached to the keys > from hitting each other, so they needed to put keys you would likely hit one > after the other so they were not close together. IOW: It was designed to *slow* the typing *down*. Alternative keyboards have been designed, built, and tested to significantly (> 2x) increase the speed of typing. E.g. velotype (http://www.velotype.com/index.html) is used in specialised applications, but it never really caught on. People tend to like and cling to what is there and works.
>>>... an invoice is just one of these things, but >>> the data from the invoice is also available [quoted text clipped - 9 lines] >>>you can "get to" from there (via >>>declared links as one might have in a join statement). Phew ;-) So a portal is similar to a view. What is the difference?
[snip]
> I'm still not saying this both accurately and clearly. I'll think about it > some more. There is no problem paying one line item from an invoice and I'm [quoted text clipped - 4 lines] > to come on multiple sheets of paper so you could retrieve the one piece of > paper related to this line item and check it off that way? No, I would prefer to have one listing - whether on paper or on my handheld pen-computer. So - I need to be able to treat the whole invoice as one thing, and I need to be able to treat the invoice as being composed of items. So the model of the invoice should be designed to cater for both needs. Though earlier posts suggested that the chopping up is somehow a typical relational, bad 1NF thing to do, I suspect that it is rather easy to do, both in MV and in an RDBMS.
Now, I also want one listing for the measurements of stock turnaround, in order to aim for just-in-time logistics and optimally sized orders. In an RDBMS I would create another view on the same schema. Would, in MV, another portal be the way to approach this?
>>The first department to get a database wins. >>The rest has to jiggle their stuff into the imposed hierarchy. [quoted text clipped - 5 lines] > adding files, fields, functions, but it works just fine and again I'll have > to think of how to make that perfectly clear. Another poster already asked an ownership question, so I won't go into that here. "see information that Dept #1 maintains." gives me an uneasy feeling, though.
> As Wol has said, you can take any PICK database and view it as relational, > but you can't go the other way around. And you will have seen that I asked him a question about that.
> If you could, then this discussion > would be moot -- we could just toggle between different perspectives on the [quoted text clipped - 12 lines] > to put two classification codes on this entity, you do so. Overly > simplified example, but ... In two ways. The straitjacket feeling this gives comes from oversimplification in the relational design. :-) But it is a realistic example, I think.
Eric Kaun - 10 Jun 2004 20:48 GMT > > I am all for lowering this cost - decreasing the "impedance mismatch", so > to [quoted text clipped - 8 lines] > buck by using relational theory, that would be a different story. I'd > strongly suggest we nudge relational databases toward pragmatism ;-) Several things here: 1. I doubt "overwhelmingly good evidence" motivated people to pick up Pick (or any other technology) 2. People tend to use that with which they're familiar 3. The market doesn't necessarily guarantee anything (and no, I'm not anti-capitalism) about what it produces 4. The gadgeteering which makes software development fun also produces a tendency to wallow in what you know, and in what seems "cool", orthogonal to any actual value 5. Finally, the criteria businesses have for their solutions also factor in learning ability, prevalence of those taught in technology X, etc.
> > Agreed - however, while my experience comes from a large company, it's work
> > done for a relatively small business unit. I was the only developer on > > several of the projects, and my user base was fairly small. I was DBA, [quoted text clipped - 6 lines] > way it was sent? OR was it the loosey-gooseyness of it where there are not > as many texts with rules for "how to"? Good questions.
1. The XML was mapped into Java objects (actually object graphs), which turns out to be trivial. 2. However: that requires lots and lots of redundant and overlapping methods to query that object graph (e.g. I want to find a certification with a destination URL of "http:blah", so I write a method, then later I need to find one by ID, etc. etc.). I can get around it (now) using expression-parsing libraries (JXPath, Jelly, others). Still, those aren't type-checked, which gives them some agility but gives me less comfort (I've seen how easy it is for them to go wrong type-wise) 3. Doing "agile XML", at least here, resulted in multiple files with overlapping attributes. True, some of that was just bad design, but from what I've seen here, those apps using Oracle are more disciplined. Maybe because they have to be? Are just encouraged to be? Not sure... 4. XML's type system is impoverished - some validation is easy, but no constraints 5. We use libraries for XML, including Castor to generate Java objects, so the how-to isn't lacking in this case.
> > I've never used Pick - > > sounds like their environment gives them a lot of power, and while that's > > nice, I'd still never think of thinking of an invoice as a single > > proposition or "object". It's not. > > Perhaps you've never seen one? ;-) My wife handles the finances, since she's damn good at it. And all of the invoices I "see" are so... flat. Two-dimensional. Surely ripe for relational? :-)
> > It's a fairly complex series of them. > > That too, but through how many portals would you want to have to go to > collect all such? This has to do with how the "user" (application developer > or dba, for example) should view the data. Oh, I agree the user should have few portals. But application developers want to see the messy back-room (to extend the department-store metaphor). Or more accurately, developers are like the store managers who map many different suppliers' products into departments, clusters, and shelves.
> > Just like an "order", an invoice is a fairly complex confluence of > > phenomena, and not even a static one (modifications / confirmations to [quoted text clipped - 11 lines] > dependent on any other entities in the system. What is that top level of > nodes after ENTITY in a system, such as PEOPLE PLACES THINGS. Ah, I see. Yes, I agree that those drive UIs, reports, etc. - at least for a while. I focus on those technologies that will make that part easy, AND give me some assurance in their consistency and that I can drive more complex requirements easily. And those complex ones always arise quickly, I've found... if I've oversimplified early (and I've done the entity/object style of design before), I usually regret it. Sometimes that's warranted, if time-to-market is the critical success factor.
> > And I disagree. An invoice is many somethings. If your questions deal only > > with the set (e.g. presenting an invoice on a screen), then great - treat [quoted text clipped - 11 lines] > accessed. Each portal can see everything you can "get to" from there (via > declared links as one might have in a join statement). I think we're on the same page - I just think (based on comparison with other things) that relational makes the best logical support for a multi-portal system. And if you think about it, those portals can be small and nested... UIs inside other UIs, etc. - whatever the user needs to get the job done. Since those portals start to look like mini-apps, that makes their common logical foundation all the more important.
> > So it depends on your needs, but I'd far > > rather place my bet on something that allows me to scale my queries and [quoted text clipped - 6 lines] > fields (derived data or data found elsewhere), the INVOICE vocabulary for > everyone has what it needs to show an invoice. And I think I'm seeing more and more value to a path-like / hierarchical expression as a user tool. I see it as best layered atop relational, since I anticipate more views (if my data is useful, and I'm trying to help the business's departments interoperate) but I think we agree philosophically with the notion of packaging for the user.
> > > It is an object in and of itself that > > > needs no "chopping up", so to speak. [quoted text clipped - 11 lines] > it's base relation. Remove that obstacle -- free yourself. Yes, we still > divide it all up, but into wholes, not pieces. I agree, and didn't mean to give the impression that data should only be accessed through base relations. Far from it. Relations are a necessary (to me) but not sufficient condition for good application design.
> > Domains are intellectually tractable when > > they're separated. Holism may be fine in medicine (???) where human [quoted text clipped - 5 lines] > quite far for that, even if you allow for both scalar values and compound > ones (such as lists). For users, yes, lists are useful (I'd argue that sets are more often, and that relations are even better, but I'll lighten up on that). The other linchpin of relational, of course, is types. I distrust technologies with weak typing, but that's a different discussion; suffice it to say that having a LINE_ITEMS attribute in a file would make me far less queasy if the elements of that list were real objects, with real operations defined over them.
> > > This is where simpler means don't destroy the properties of the invoice > in [quoted text clipped - 10 lines] > No, the data needs to be available to other entities as well, as you pointed > out. Sure, I was being facetious - so there are 2 questions: 1. What is the nature of the "other entities" that will need to use the data? 2. In what form does the data need to be to provide those entities with easy access; and even to make those entities easy to develop?
I see those entities as applications (including GUIs and reports and batch processes), and contend that relational is the best answer for #2. But hierarchies are useful for #1. There is still an impedance mismatch, though it is much more tractable at this level than object-relational mappings.
> > "Making the data fit" is also nonsense; whatever physical and logical > model [quoted text clipped - 13 lines] > "this" context. Define it based on its use and if a new use comes up, > redefine it if necessary, otherwise add qualifiers to it. Hmmm. Okay, I'm all for agility where it makes sense - still, I think a little extra work up front goes a long way. But if you've got your DB-upgrade and redeployment processes automated, and unit tests and all, this can work...
> > That can do what - model arbitrary data in its "natural form", whatever > that > > means? I agree. If you show that to me, I'll use it. > > as entities. Still working on how to show it. I'm getting the idea.
> > I hope so - that would be nice. I think XPath and XQuery, while > convoluted, [quoted text clipped - 5 lines] > add anything to the information in your source, but when you go the other > direction, you need to add data (such as ordering)? Most of the XML I deal with requires no ordering, so that's a wash. I think XML is a relatively poor notation for anything requiring explicit ordering, but that's just my gut feel. Usually I find hints that the XML designer really wanted relational; they've got IDs and IDREFs, and then in the code they're manually coding searches through the hierarchy - which is where an in-memory RDBMS would be nice. It's not so horrid now, but in this industry (print industry), there's a standard called JDF that is currently manifesting itself as a 1.36MB set of XML Schema specs. Needless to say, there are LOTS of cross-links, and regardless of the storage technology, relations would have helped break this down considerably... even with the ordering requirements (which are there, but much less than the cross-linked references to node IDs).
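As a rough illustration of the ID/IDREF point, the Python sketch below flattens a small XML document into two in-memory "relations" and resolves the cross-links with a join instead of walking the hierarchy by hand. The document is invented for the example (it is not JDF), and the element and attribute names (paper, ink, step, uses, ref) are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical job description with ID-style cross-references.
DOC = """
<job>
  <resources>
    <paper id="r1" name="A4 gloss"/>
    <ink id="r2" name="Cyan"/>
  </resources>
  <steps>
    <step name="print"><uses ref="r1"/><uses ref="r2"/></step>
    <step name="trim"><uses ref="r1"/></step>
  </steps>
</job>
"""

root = ET.fromstring(DOC.strip())

# Relation 1: resource id -> (kind, name), built in one pass over the tree.
resources = {
    el.get("id"): (el.tag, el.get("name"))
    for el in root.iter()
    if el.get("id") is not None
}

# Relation 2: (step name, referenced id) pairs.
uses = [
    (step.get("name"), use.get("ref"))
    for step in root.iter("step")
    for use in step.iter("uses")
]

# The cross-links are now a join on the id, not a search through the tree.
step_resources = [(step, resources[ref][1]) for step, ref in uses]

print(step_resources)
# [('print', 'A4 gloss'), ('print', 'Cyan'), ('trim', 'A4 gloss')]
```

A real in-memory RDBMS would add declared keys and constraint checking on top of this; the sketch only shows the shape of the lookup.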
- erk
Laconic2 - 11 Jun 2004 11:59 GMT > 1. I doubt "overwhelmingly good evidence" motivated people to pick up Pick > (or any other technology) I'll bet it was the path of least resistance. My guess, from Dawn's description, is that it's real easy to learn, and it puts together, in one package, the tools needed to store and retrieve data, and the tools needed to capture and present data. Add in some fairly trivial computing capability, and you've got a pretty powerful system... regardless of the data model.
> 2. People tend to use that with which they're familiar And they become familiar with that which they use. It's feedback.
> 3. The market doesn't necessarily guarantee anything (and no, I'm not > anti-capitalism) about what it produces If nobody buys, then it doesn't sell. No invoices, no cash. That's more a guarantee of apparent value than real value. But measuring real value is extraordinarily difficult.
> 4. The gadgeteering which makes software development fun also produces a > tendency to wallow in what you know, and in what seems "cool", orthogonal to > any actual value To a kid with a hammer, "normalization" means "flattening".
> 5. Finally, the criteria businesses have for their solutions also factors in > learning ability, prevalence of those taught in technology X, etc. Good point.
Dawn M. Wolthuis - 13 Jun 2004 04:30 GMT <snip>
> My wife handles the finances, since she's damn good at it. And all of the > invoices I "see" are so... flat. Two-dimensional. Surely ripe for > relational? :-) doubt it (not doubting the wife's skills) -- I'm guessing at least some have header info and multiple line items ... ?
> > > It's a fairly complex series of them. > > [quoted text clipped - 7 lines] > Or more accurately, developers are like the store managers who map many > different suppliers' products into departments, clusters, and shelves. Sure, the developers need to know the information required to set up the store into nice tidy departments.
> > > Just like an "order", an invoice is a fairly complex confluence of > > > phenomena, and not even a static one (modifications / confirmations to [quoted text clipped - 19 lines] > of design before), I usually regret it. Sometimes that's warranted, if > time-to-market is the critical success factor. I very much agree, but seem to arrive at a different conclusion on how best to set up for handling these sudden changes to requirements.
> > > And I disagree. An invoice is many somethings. If your questions deal > only [quoted text clipped - 21 lines] > the job done. Since those portals start to look like mini-apps, that makes > their common logical foundation all the more important. More mind meld here -- similar thought processes drawing different conclusions.
> > > So it depends on your needs, but I'd far > > > rather place my bet on something that allows me to scale my queries and [quoted text clipped - 15 lines] > business's departments interoperate) but I think we agree philosophically > with the notion of packaging for the user. OK, now read what the purpose of the relational model is (somewhere towards the front of Date's latest edition of the textbook). If "the user" (whether a s/w developer or an end-user) can work with data thinking entirely in this walk-our-way-through-the-vocabulary fashion for queries of any sort, then what, again was the need for the relational model in this? You are correct, however when you asked somewhere whether one can update through these portals -- not really, but it works for managers & high level designers, making anything more an implementation detail ;-)
> > > > It is an object in and of itself that > > > > needs no "chopping up", so to speak. [quoted text clipped - 39 lines] > elements of that list were real objects, with real operations defined over > them. How and where what rules/constraints are applied to the data is one of those topics where I'm not yet where I want to be in understanding various options and how they influence agility/maintainability. So, I can sympathize but I can't get too upset about descriptions of the data that go further than the constraints that are applied to it (that might not have made sense to anyone but me, so ignore if it didn't).
> > > > This is where simpler means don't destroy the properties of the > invoice [quoted text clipped - 18 lines] > 1. What is the nature of the "other entities" that will need to use the > data? We will know in time.
> 2. In what form does the data need to be to provide those entities with easy > access; and even to make those entities easy to develop? We will know in time.
But we definitely should think about what are the most likely changes on the horizon and what our strategy would be for each of those. I'm not completely in the XP camp where we only think about the requirements (stories) for this iteration of development and worry about tomorrow, tomorrow.
> I see those entities as applications (including GUIs and reports and batch > processes), and contend that relational is the best answer for #2. But > hierarchies are useful for #1. The impedance mismatch, though much more > tractable at this level than object-relational mappings. I hope to eventually agree with you on the best approach to #2. That is not the same statement as saying that I hope to eventually agree with your current opinion on the matter.
> > > "Making the data fit" is also nonsense; whatever physical and logical > > model [quoted text clipped - 21 lines] > DB-upgrade and redeployment processes automated, and unit tests and all, > this can work... Yes, I agree that identifying such potential risks is a good idea, but not in terms of the semantics of the data required for this round, but rather the likelihood of various possible new requirements. <snip>
Cheers! --dawn
Eric Kaun - 15 Jun 2004 16:40 GMT > > And I think I'm seeing more and more value to a path-like / hierarchical > > expression as a user tool. I see it as best layered atop relational, since I
> > anticipate more views (if my data is useful, and I'm trying to help the > > business's departments interoperate) but I think we agree philosophically > > with the notion of packaging for the user. > > OK, now read what the purpose of the relational model is (somewhere towards > the front of Date's latest edition of the textbook). Don't have it here, but if it suggests making data easy for end-users, that's a myth. Its creators said the same thing about COBOL. Yes there are users who can understand it, but most users have other considerations and want to focus on their tasks. There are always power-users who write reports and queries; they can grok relational, in many cases, and can compose views.
> If "the user" (whether > a s/w developer or an end-user) can work with data thinking entirely in this > walk-our-way-through-the-vocabulary fashion for queries of any sort, then > what, again was the need for the relational model in this? In the parlance of Pick, to establish the various vocabularies for various users. To act as its foundation. The minute I have more than one view of the data, I need the model to be neutral (or else you get something like a pipeline of XSLT transforms, which is declarative but easily made intractable). In the case of Pick, though, there's still an implicitly "right" view of the FILE - all of its attributes in gory detail, I assume. That still assumes a hierarchy, meaning other views are relative to that. Basing all the views on a predicate-based structure makes those easier; granted that it gives none of them special status, but I view that as a good thing.
> > Sure, I was being facetious - so there are 2 questions: > > 1. What is the nature of the "other entities" that will need to use the [quoted text clipped - 13 lines] > (stories) for this iteration of development and worry about tomorrow, > tomorrow. I agree. I think agile arose because big up-front documents suck, and that's still true. What's never been really pushed outside academia are real models, checkable ones, and theorem-proving. Alloy is a nice tool in the model-checking camp; and check out how it subsumes OO-like structures inside relations! In any event, if our designs were concise and checkable, and even used as the basis of code generation, we get correctness and agility in one. But given languages like Java and platforms like the J2EE, while code generation helps, agile is still important. I'm in the Martin Fowler camp, not quite trusting the complete absence of up-front analysis. Like Reagan, I think the agilists talk much tougher than they walk; I think you'd find far more up-front analysis and design than they admit in rhetoric.
> > I see those entities as applications (including GUIs and reports and batch > > processes), and contend that relational is the best answer for #2. But [quoted text clipped - 4 lines] > the same statement as saying that I hope to eventually agree with your > current opinion on the matter. Hey, that's fine... we're just discussing, not brainwashing...
- erk
Anthony W. Youngman - 07 Jun 2004 22:48 GMT >> When it comes to modeling >> information, I suspect there will always be a gap. Relational advocates >> favor being able to derive truths from other truths, acknowledging of >course >> that the internal predicates must be defined relative to an external one, >> and that that's a human effort which can always go awry. Yep. I have no problem with that. It's just that the whole point of this thread was me asking "what is that external one", and I still haven't got an answer. What I *have* got, though, is loads of people having a go at me for having the temerity to ask the question ...
>> You and Dawn, as >> best I can understand, place more value on reproduction of the original [quoted text clipped - 4 lines] >usually has multiple items ordered. It is an object in and of itself that >needs no "chopping up", so to speak. Yep. I think you're certainly speaking for me here.
>This is where simpler means don't destroy the properties of the invoice in >order to make the data fit into an arbitrary data model with tautological [quoted text clipped - 5 lines] >at >> (e.g. repetitive symbolic manipulation). Except that relational theory DOES stretch humans in ways they're not good at. Didn't Tony have a go at me for "can't you handle the abstraction?". Why should I, when the MV model tells me I don't have to?
>I think you right here. I've been in business for many years. I would like >development to be easy for me. We can watch the pendulum swinging towards >making software development easier for those of us using the software. >.NET, for better or worse, is attempting to make development easier (if it >wasn't for the bizarre data typing and variable scoping it would be a lot >easier). Hopefully dbms theory will contribute to this too. I want development to be easy, too :-)
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk The society which scorns excellence in plumbing because plumbing is a humble activity, and tolerates shoddiness in philosophy because it is an exalted activity, will have neither good plumbing nor good philosophy. Neither its pipes nor its theories will hold water. John W Gardner
mAsterdam - 08 Jun 2004 00:49 GMT > ... What I *have* got, though, is loads of people having a go > at me for having the temerity to ask the question ... You asked a good question, phrased in a relational-bashing way. I rephrased it. You went on.
> I want development to be easy, too :-) Moi aussi.
Tony - 02 Jun 2004 10:43 GMT > And no, Tony, Einstein did NOT "build a better model" using the same > algebra. What he DID do was realise that Newton's fundamental axioms > were wrong. Amounts to the same thing: Einstein developed equations that took into account additional information not known to Newton. These equations gave correct results for more cases than Newton's. This is much like a database designer realising that an earlier designer's database had some rules missing, and adding them.
> He redefined the metaphysical interface between reality and the model. Baloney. His model was different from Newton's. It wasn't the same model with a different "metaphysical interface". What the hell is one of those anyway?
> And the problem I have is that I cannot see any metaphysical interface > between reality and relational theory. This is basically Dawn's point > about "is relational theory even the right theory to use?". You are looking for a nonsense. No wonder you can't find it.
Eric Kaun - 02 Jun 2004 16:30 GMT > And the problem I have is that I cannot see any metaphysical interface > between reality and relational theory. This is basically Dawn's point [quoted text clipped - 8 lines] > improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION > :-) So what improvements would you make? From what I've heard suggested elsewhere, it's not a transformation beyond recognition.
> What we NEED is a "theory of business analysis" - a formal theory that > tells analysts how to analyse the real world. hahahahahahahaha
Oh... you're serious?
> And I'm pretty damn > confident that you can NOT create a theory that will do a reversible > mapping between the real world and relational data. So what precisely is different about other theories of data that do allow a reversible mapping? And are there properties other than reversibility that are desirable in such a model?
> This theory will then be the equivalent of Kepler and Newton discovering > ellipses and calculus, or of Einstein realising that mass and energy > were interchangeable. Basically, pretty much ALL of relational theory's > axioms are taken as given by the mathematicians, and no thought is given > as to whether they actually match the real world. Which axioms don't match? I wasn't really aware there were axioms per se.
> To give you a simple example, the business analyst analyses an invoice, > and you design the database to store the data. Can you then ask the > DATABASE to give you the invoice data back? Sure.
> Certainly with current > relational databases accessed with SQL, you're relying on either an > application programmed OVER the database, or a view which gives you > multiple copies of data of which the original only had one. Huh?
> Yes I know people are likely to say that "SQL is not genuine > relational", but you're still relying on a view - even a valid > relational one - or an application. So what do you want - the invoice paper? Maybe we should just rely on scanners producing JPGs - non-lossy, of course.
> If we can't go - using formal theory - from the database back through > the analysis to get back to the real world we started from, then we have > no idea if our axioms are correct, and as Dawn says, we have no idea if > relational theory is the correct theory to solve real world problems. Most real-world problems are more than just round-trip regurgitation. Surely any trivial serialization scheme fits that bill?
> And as I said before, it we have no idea if it's the correct theory, why > are we using it? So what do we have that's correct? You mean the round-trip is your litmus test?
> Dawn was going on about faith. Do you have faith in > business analysts to get the analysis correct, or would you rather have > a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific, > whatever term you want to use) logical theory to do it for you? Sure. I also want to fly, eat infinite amounts of ice cream without gaining weight, and drive at very fast speeds with no possibility of injury.
- erk
Dawn M. Wolthuis - 02 Jun 2004 18:12 GMT > > And the problem I have is that I cannot see any metaphysical interface > > between reality and relational theory. This is basically Dawn's point [quoted text clipped - 22 lines] > > confident that you can NOT create a theory that will do a reversible > > mapping between the real world and relational data. I agree and figure that we will have a useful theory of this sort when we have the same on "how to parent a teenager". However, in some ways, Wol is making a similar argument to mountain man (even if they might not agree with that) in identifying that even if relational theory were good to apply, it is useful in a rather small portion of what we do in addressing the "data processing" needs of a business.
> So what precisely is different about other theories of data that do allow a > reversible mapping? And are there properties other then reversibility that [quoted text clipped - 7 lines] > > Which axioms don't match? I wasn't really aware there were axioms per se. There are at least the axioms of set theory and then some things were tossed in the mixed without any proof from these axioms, such as restricting the sets from which elements can come to sets of scalar values (which has been changed now, but 1NF, however defined, would have to be considered an axiom since it does not arise from any other mathematics)
> > To give you a simple example, the business analyst analyses an invoice, > > and you design the database to store the data. Can you then ask the [quoted text clipped - 8 lines] > > Huh? I think it is worth noting that it is far more difficult to retrieve an invoice the way it looked originally after chopping it up (that 1NF thing again) and then using SQL to show the invoice again. It is possible, however, so perhaps Wol has looked at some more difficult specimens. Loosely stated - SQL can only place on a single line entities that are related to each other on that one line. Stick with me here, I know I said that poorly.
Example:
Qty  Item             Catalogs               Color  Price
1    Beautiful Skirt  Summer Collection      White  $120.00
                      2004 Wardrobe Catalog  Blue
Without arguing the semantics (and mapping of the data to reality) of this particular example, if your invoice looked like this when selling a beautiful skirt in white and blue that comes from two of your catalogs, it is definitely HARDER than a non-1NF environment, though not impossible, to get a SQL statement to show your invoice properly.
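As a rough illustration of the point above, here is a minimal Python sketch using the standard sqlite3 module. The table and column names (invoice_line, line_catalog, line_color) are hypothetical and do not come from the thread; they are just one possible 1NF decomposition of the skirt example. It contrasts a plain join, which repeats the line once per catalog/colour combination, with the extra per-line queries needed to get back one invoice line carrying two lists.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding the skirt example in 1NF.

    All table and column names here are illustrative assumptions.
    """
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE invoice_line (line_id INTEGER PRIMARY KEY,
                                   qty INTEGER, item TEXT, price TEXT);
        CREATE TABLE line_catalog (line_id INTEGER, catalog TEXT);
        CREATE TABLE line_color   (line_id INTEGER, color TEXT);
        INSERT INTO invoice_line VALUES (1, 1, 'Beautiful Skirt', '$120.00');
        INSERT INTO line_catalog VALUES (1, 'Summer Collection');
        INSERT INTO line_catalog VALUES (1, '2004 Wardrobe Catalog');
        INSERT INTO line_color VALUES (1, 'White');
        INSERT INTO line_color VALUES (1, 'Blue');
    """)
    return con


def naive_join(con):
    """A straightforward join: one row per (catalog, colour) combination."""
    return con.execute("""
        SELECT l.qty, l.item, c.catalog, k.color, l.price
        FROM invoice_line l
        JOIN line_catalog c ON c.line_id = l.line_id
        JOIN line_color   k ON k.line_id = l.line_id
    """).fetchall()


def nested_lines(con):
    """Rebuild each invoice line with its catalogs and colours as lists."""
    lines = []
    rows = con.execute(
        "SELECT line_id, qty, item, price FROM invoice_line ORDER BY line_id"
    ).fetchall()
    for line_id, qty, item, price in rows:
        catalogs = [r[0] for r in con.execute(
            "SELECT catalog FROM line_catalog WHERE line_id = ? ORDER BY rowid",
            (line_id,))]
        colors = [r[0] for r in con.execute(
            "SELECT color FROM line_color WHERE line_id = ? ORDER BY rowid",
            (line_id,))]
        lines.append({"qty": qty, "item": item, "catalogs": catalogs,
                      "colors": colors, "price": price})
    return lines
```

With the sample data the plain join yields four rows for what the paper invoice shows as one line, while nested_lines returns a single record with two two-element lists. That is the "harder, though not impossible" step being described; whether the reassembly belongs in SQL or in a presentation tool is the question the thread goes on to argue.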
> > Yes I know people are likely to say that "SQL is not genuine > > relational", but you're still relying on a view - even a valid > > relational one - or an application. > > So what do you want - the invoice paper? Maybe we should just rely on > scanners producing JPGs - non-lossy, of course. No need -- including lists in your data (at least your virtual data!) gets you far enough that you don't notice any more big disconnects. SQL Server permits lists in their UDFs, while Oracle (to my knowledge) does not allow lists returned from their functions (stored procedures)
> > If we can't go - using formal theory - from the database back through > > the analysis to get back to the real world we started from, then we have [quoted text clipped - 17 lines] > Sure. I also want to fly, eat infinite amounts of ice cream without gaining > weight, and drive at very fast speeds with no possibility of injury. As long as we are all aiming for the same things ... smiles. --dawn
mAsterdam - 02 Jun 2004 19:44 GMT > It think it is worth noting that is far more difficult to retrieve an > invoice the way it looked originally after chopping it up You chopped it up. Why?
While chopping it up, you got rid of the layout. What you will retrieve is the data, not the layout. Now if you also have some markup for the abstract invoice, you can just fit the invoice-data you retrieved into the invoice-markup.
I would think you would know all this, if it was not so that over and over you blame
> (that 1NF thing again) for these non-problems.
> and then using SQL to show the invoice again. SQL reports are ugly - I wouldn't want to show one to a customer. Use a tool that was designed to present data.
> It is possible, > however, so perhaps Wol has looked at some more difficult specimens. [quoted text clipped - 15 lines] > is definitely HARDER than a non-1NF environment, though not impossible, to > get a SQL statement to show your invoice properly. Some products have presentation and query integrated. Some of those use (generated, hidden) SQL for the query part. Don't use just SQL and expect anything that looks like a proper invoice.
It is like you expect to be able to prepare a meal by just unpacking the ingredients - you are going to need some kitchen tools.
>>Sure. I also want to fly, eat infinite amounts of ice cream without >>gaining weight, and drive at very fast >> speeds with no possibility of injury. > > As long as we are all aiming for the same things ... smiles. --dawn Sorry if I did not address your problem, but please try distinguishing the retrieval and presentation part if you restate it because I did get it all wrong.
Bon appétit!
Dawn M. Wolthuis - 03 Jun 2004 01:56 GMT > > It think it is worth noting that is far more difficult to retrieve an > > invoice the way it looked originally after chopping it up > > You chopped it up. Why? You know the answer, so I'll move on ...
> While chopping it up, you got rid of the layout. Not JUST the layout, but the ease in retrieving the data required for the layout. In 1NF'ing we can make a nightmare for the retrieval process. And ease of data retrieval seems to me to be one of the most important requirements for any DBMS, right (she says, baiting him)?
> What you will retrieve is the data, not the layout. > Now if you also have some markup for the abstract invoice, [quoted text clipped - 13 lines] > show one to a customer. > Use a tool that was designed to present data. And what will that tool use? And so developers should not build reports directly because they are so ugly? Why not give them a better language?
> > It is possible, > > however, so perhaps Wol has looked at some more difficult specimens. [quoted text clipped - 3 lines] > > > > Example: Qty....Item..........................Catalogs.............................Color.............Price
> > 1 Beautiful Skirt Summer Collection. White [quoted text clipped - 11 lines] > query part. Don't use just SQL and expect anything > that looks like a proper invoice. As a must-have requirement, I wouldn't want to invest in any DBMS that didn't provide a query tool that could be used happily by developers. Sure we need security, reliability, and such, but then it is just plain imperative that the data can be handily retrieved!!
> It is like you expect to be able to prepare a meal by > just unpacking the ingredients - you are going to need > some kitchen tools. Best not to discuss preparing meals with me ;-) I require tools such as phone and car.
> >>Sure. I also want to fly, eat infinite amounts of ice cream without > >>gaining weight, and drive at very fast [quoted text clipped - 5 lines] > try distinguishing the retrieval and presentation > part if you restate it because I did get it all wrong. Data retrieval -- not presentation, but retrieval -- is an extremely important feature of a database and pretty close to the whole point of why you would choose to model the data in one way or another -- so that data retrieval is easy over time (the "over time" bringing up other failings of SQL-DBMS's, but I'll skip that for now).
Cheers! --dawn
> Bon apetit! mAsterdam - 03 Jun 2004 10:02 GMT >>>It think it is worth noting that is far more difficult to retrieve an >>>invoice the way it looked originally after chopping it up >>You chopped it up. Why? > You know the answer, so I'll move on ... Do I? Now I have to guess. You could store an image of the original invoice if you need the original look.
My guess would be: You chopped it up because you want to do something with the pieces other than making invoices. You can store the invoice image anyway.
>>While chopping it up, you got rid of the layout. > > Not JUST the layout, but the ease in retrieving > the data required for the layout. > In 1NF'ing we can make a nightmare for the retrieval process. Your query will give you all data your invoice needs for reconstruction, except the layout. They'll be - agreed - in a clumsy fashion for presentation. That is the loss of ease. (Or am I missing something?)
> And ease of data retrieval seems to me to be one of > the most important requirements for any DBMS, > right (she says, baiting him)? /me nods
[snip]
>>Use a tool that was designed to present data. > > And what will that tool use? > And so developers should not build reports > directly because they are so ugly? > Why not give them a better language? Yes! A markup language - or a tool that generates the markup and the query (for whatever querylanguage/database/normal form). I haven't seen exploitation of databases without them.
[snip]
>>Some products have presentation and query integrated. >>Some of those use (generated, hidden) SQL for the [quoted text clipped - 7 lines] > but then it is just plain imperative that > the data can be handily retrieved!! /me nods again.
>>It is like you expect to be able to prepare a meal by >>just unpacking the ingredients - you are going to need >>some kitchen tools. > > Best not to discuss preparing meals with me ;-) > I require tools such as phone and car. When using the phone you still need a table, plates, knives, glasses - or do you accept the layout as it comes :-)
[snip]
> Data retrieval -- not presentation, but retrieval -- is an extremely > important feature of a database and pretty close to the whole point of why > you would choose to model the data in one way or another -- so that data > retrieval is easy over time (the "over time" bringing up other failings of > SQL-DBMS's, but I'll skip that for now). Ah! Here is the nugget. I knew there was more to it. Thank you for restating.
Why is an MPEG not simply a table of pictures? Because we mostly only need the complete movie. We do not need to share the parts of the movie. The benefits of generality do not outweigh the cost. Or even one picture, JPEG: If we *do* need to get into the picture (I mean we need to retrieve parts of it - think automated fingerprint, face or signature recognition) we need to model the content of the picture differently, and while a table of pixels may or may not be the basis of that, it won't get us very far.
>>Bon apetit! Bill H - 03 Jun 2004 23:03 GMT > > It think it is worth noting that is far more difficult to retrieve an > > invoice the way it looked originally after chopping it up [quoted text clipped - 6 lines] > you can just fit the invoice-data you retrieved into the > invoice-markup. I find it interesting you should say this. All RDBMS products I've seen show data in columns and rows. In fact, that is the language of RDBMS: rows and columns.
It is not unusual, therefore, to define and describe data in a preferred layout?
Bill
mAsterdam - 03 Jun 2004 23:54 GMT >>>It think it is worth noting that is far more difficult to retrieve an >>>invoice the way it looked originally after chopping it up [quoted text clipped - 13 lines] > It is not unusual, therefore, to define and describe data in a preferred > layout? I don't know about the 'therefore', but in my experience their preferred layout is something which domain experts are most comfortable with.
The most important question here, though (the one Dawn refused to answer), is why do you want to chop it up? What exactly are you trying to achieve by doing so?
Dawn M. Wolthuis - 04 Jun 2004 00:06 GMT > >>>It think it is worth noting that is far more difficult to retrieve an > >>>invoice the way it looked originally after chopping it up [quoted text clipped - 21 lines] > Dawn refused to answer) is why do want to chop it up? > What exactly are you trying to achieve by doing so? Sorry, not refusal, but even I get sick of my broken record on 1NF -- that's why things are chopped up unnecessarily, in order to put them into 1NF. So, in the example I gave, there is no reason, in my opinion, not to have a single line of the invoice be stored in a tuple, allowing the lists to be elements of the tuple, just as the single-valued attributes are.
--dawn
mAsterdam - 04 Jun 2004 00:27 GMT >>>>>It think it is worth noting that is far more difficult to retrieve an >>>>>invoice the way it looked originally after chopping it up >>>> >>>>You chopped it up. Why? [chop]
> Sorry, not refusal, but even I get sick of my broken record on 1NF -- > that's why things are chopped up unnecessarily, in order to put them into > 1NF. So, in the example I gave, there is no reason, in my opinion, not to > have a single line of the invoice be stored in a tuple, allowing the lists > to be elements of the tuple, just as the single-valued attributes are. So you don't need to share the internal structure. Don't do that, then.
Dawn M. Wolthuis - 04 Jun 2004 01:05 GMT > >>>>>It think it is worth noting that is far more difficult to retrieve an > >>>>>invoice the way it looked originally after chopping it up [quoted text clipped - 9 lines] > So you don't need to share the internal structure. > Don't do that, then. My understanding of relational structure is that it is for the logical view of the database, not the internal structure. If we opt for something else as the logical level, then we are not doing relational theory, we are doing something else (such as Nelson-Pick [un]theory). There are folks, particularly those working with XML, who have worked on non-relational theories of databases, and I'm reading what I can of what Jan Hidders suggested earlier. But, again, if your data model (logical level) is not relational, then what's the purpose of relational theory? --dawn
mAsterdam - 04 Jun 2004 16:12 GMT >>>>>>>It think it is worth noting that is far more difficult to retrieve an >>>>>>>invoice the way it looked originally after chopping it up >>>>>> >>>>>>You chopped it up. Why? [chop]
>>So you don't need the to share the internal structure. >>Don't do that, then. [quoted text clipped - 7 lines] > earlier. But, again, if your data model (logical level) is not relational, > then what's the purpose of relational theory? --dawn Somehow I get the impression that you put all blame for the chopping (and the need to re-assemble) on relational theory, in particular on 1NF. That is too much blame, I think.
Say we have a date. It has structure, no doubt. Actually it has a different structure for different purposes. It has a different structure in different countries. We may only be interested in one or some parts/aspects of it: day-of-the week, century. Now suppose we did not have a system defined type 'date'. What to do? Are the problems really that different in the context of relational theory?
Well, a little. COBOL has a powerful way of defining types: easy grouping, redefinitions, symbolic values (not without complications, http://home.swbell.net/mck9/cobol/style/88.html) for: the picture clause. Unfortunately it also defines the storage structure - this makes it fragile.
Eric Kaun - 04 Jun 2004 15:11 GMT > > >>>It think it is worth noting that is far more difficult to retrieve an > > >>>invoice the way it looked originally after chopping it up [quoted text clipped - 28 lines] > have a single line of the invoice be stored in a tuple, allowing the lists > to be elements of the tuple, just as the single-valued attributes are. What are your criteria for chopping into the following?
1. files
2. attributes
3. sub-attributes
4. sub-sub-attributes
Anthony W. Youngman - 07 Jun 2004 23:10 GMT >> Sorry, not refusal, but even I get sick of my broken record on 1NF -- >> that's why things are chopped up unnecessarily, in order to put them into [quoted text clipped - 3 lines] > >What are your criteria for chopping into the following: For me, it's simple.
>1. files This represents a "physical" object. A house. A car. A company. A building. An invoice.
>2. attributes This describes the object. A house has an address. A car has a colour, and an owner. A company may have several buildings (so here we have a "foreign key"). A building has an address. An invoice may have several addresses, and several lines.
>3. sub-attributes A building has an address - which may have multiple lines (actually, this is a bad example, but it's a common mistake). An invoice has multiple lines, each of which contains several different types of data.
>4. sub-sub-attributes Simply nest sub-attributes one level deeper. :-)
Basically, to describe it in relational terms, if you link table A to table B, such that deleting a record in A causes a cascading delete of one or more records in B, then I'd make each column of B a column of A, and each row of B into a sub-attribute row of A.
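The rule of thumb above can be sketched in a few lines of Python. This is only an illustration under assumed names: the function nest_children and the field names in the usage check are hypothetical, and real Pick FILEs and multivalues are not modelled. It takes rows of a parent table A and a child table B (where deleting an A row would cascade into B) and folds each parent's B rows into a list-valued sub-attribute of that parent.

```python
from collections import defaultdict


def nest_children(parents, children, parent_key, foreign_key, attr_name):
    """Fold child rows into a list-valued sub-attribute of their parent.

    parents, children: lists of dicts (rows of tables A and B).
    parent_key: name of A's key column.
    foreign_key: name of B's column that points at A.
    attr_name: name of the new nested attribute on each A record.
    """
    grouped = defaultdict(list)
    for child in children:
        # The foreign key is implied by the nesting, so it is dropped here.
        row = {k: v for k, v in child.items() if k != foreign_key}
        grouped[child[foreign_key]].append(row)
    # Parents with no children get an empty list.
    return [{**p, attr_name: grouped.get(p[parent_key], [])} for p in parents]
```

In the nested form, deleting a parent record removes its lines along with it, so the cascade never has to be declared separately. The sketch says nothing about children with more than one parent, which is exactly the case questioned later in the thread.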
And then you use common sense to say "I use these fields all the time, and these fields only rarely" so you split A into two physical FILEs, and make all the columns of A-rarely into virtual columns of A-all-the-time, and vice versa. So for retrievals the user notices nothing (apart from the speed-up), although it does cost a bit extra logic when updating.
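A minimal sketch of that split, again in Python and again with invented names (SplitFile, hot_fields): two dictionaries stand in for the two physical FILEs, reads can merge them back into one logical record, and writes carry the extra logic of routing each field to the right store. It assumes nothing about how Pick actually implements virtual columns.

```python
class SplitFile:
    """Hypothetical wrapper presenting two physical stores as one record."""

    def __init__(self, hot_fields):
        self.hot_fields = set(hot_fields)  # fields used "all the time"
        self.hot = {}    # stands in for FILE A-all-the-time
        self.cold = {}   # stands in for FILE A-rarely

    def read_hot(self, key):
        """Common case: touch only the frequently used store."""
        return dict(self.hot[key])

    def read(self, key):
        """Full logical record: rarely used fields appear as if local."""
        record = dict(self.hot[key])
        record.update(self.cold.get(key, {}))
        return record

    def write(self, key, record):
        """The extra update logic: each field is routed to one store."""
        self.hot[key] = {f: v for f, v in record.items()
                         if f in self.hot_fields}
        self.cold[key] = {f: v for f, v in record.items()
                          if f not in self.hot_fields}
```

The design choice is the one described above: retrieval code sees a single record either way, while the cost of the split is paid once, in write.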
Cheers, Wol
Eric Kaun - 09 Jun 2004 20:41 GMT > >1. files > > This represents a "physical" object. A house. A car. A company. A > building. An invoice. What about relationships between any of those things - e.g. cars and houses owned by companies, and the invoices for their purchases? At what point does something move from being a file to an attribute or vice versa?
> >2. attributes > > This describes the object. A house has an address. A car has a colour, > and an owner. A company may have several buildings (so here we have a > "foreign key"). A building has an address. An invoice may have several > addresses, and several lines. So if an invoice has a Parts attribute, it then needs additional attributes corresponding to each "attribute" of its use of those parts? At what point does the relationship between the two acquire enough "meaning" or enough attributes of its own to warrant being in its own file? Imagine line items on an invoice became very complex, with shipment information and payment information... is there a point at which you say "Enough!" and stop adding those things as sub-attributes to the Parts attribute of the Invoice?
> >3. sub-attributes > > A building has an address - which may have multiple lines (actually, > this is a bad example, but it's a common mistake). An invoice has > multiple lines, each of which contains several different types of data. Yes, but how far do you go? Certainly at some point some of those attributes refer to other things properly categorized as Files. Do you find yourself yanking out attributes or sub-attributes, and moving the lot to Files? At least to me, normalization offers a much clearer view on how to make those data design decisions. Maybe I'm being alarmist, but when I've had to make changes in a SQL data model it's been due to actual changes in the requirements (external predicates), not just acquiring one attribute too many. Granted that I don't know Pick, so haven't walked a mile in your shoes... these criteria just seem very dicey.
> >4. sub-sub-attributes > > Simply nest sub-attributes one level deeper. :-) Ah, hierarchical induction. I'll just have one File in my app. :-)
> Basically, to describe it in relational terms, if you link table A to > table B, such that deleting a record in A causes a cascading delete of > one or more records in B, then I'd make each column of B a column of A, > and each row of B into a sub-attribute row of A. You're describing parent-child relationships. Surely you run into multi-parent and many-to-many scenarios?
> And then you use common sense to say "I use these fields all the time, > and these fields only rarely" so you split A into two physical FILEs, > and make all the colums of A-rarely into virtual columns of > A-all-the-time, and vice versa. So for retrievals the user notices > nothing (apart from the speed-up), although it does cost a bit extra > logic when updating. A logical cost is a big cost. Every updating app needs to know that, right?
So you're definitely describing some physical redesign which sits below the logical view available to users. I think in relational terms, that's what the DBMS vendors should offer, since they can more accurately (and easily) split relations based on usage, and have that happen dynamically. But you do seem to be describing a user- or application-level view of the data, which is layered atop something that is, or leans toward, or could be "more relational." At least that's the way it seems...
- erk
Anthony W. Youngman - 10 Jun 2004 01:14 GMT >> >1. files >> [quoted text clipped - 4 lines] >owned by companies, and the invoices for their purchases? At what point does >something move from being a file to an attribute or vice versa? Just because I own a car, doesn't make the car part of me ... think language, and think nouns and adjectives (and gerunds).
In Britain, a car's registration plate is assigned on first sale, and "deleted" when the car is crushed. Actually, that's not completely true, but near enough.
So my car's registration plate is an attribute of me, and of my car. So in the "car" FILE it would be the primary key, and in the "person" FILE it would be a foreign key (to use relational terminology). You do not put two different "nouns" in the same FILE - you use a foreign key.
>> >2. attributes >> [quoted text clipped - 10 lines] >information... is there a point at which you say "Enough!" and stop adding >those things as sub-attributes to the Parts attribute of the Invoice? In theory, no. In practice, you might choose to split the invoice data across two FILES, where you've promoted sub-attributes of INVOICE to be primary attributes of the secondary file.
>> >3. sub-attributes >> [quoted text clipped - 11 lines] >many. Granted that I don't know Pick, so haven't walked a mile in your >shoes... these criteria just seem very dicey. Changes in Pick are almost invariably due to changes in requirements, too :-)
>> >4. sub-sub-attributes >> >> Simply nest sub-attributes one level deeper. :-) > >Ah, hierarchical induction. I'll just have one File in my app. :-) Nah! FILE = noun :-)
Now if your app consists solely of invoices, then your approach might work :-)
>> Basically, to describe it in relational terms, if you link table A to >> table B, such that deleting a record in A causes a cascading delete of [quoted text clipped - 3 lines] >You're describing parent-child relationships. Surely you run into >multi-parent and many-to-many scenarios? Think about what you've just said. I said "deleting a record in A triggers a cascading delete into B (and C (and D (...)))". Do you want to try that in relational? Deleting one record in relational will cascade and delete your entire database ... ?
You completely missed the point here. Where and why would you use a cascading delete? THINK! Be *practical*. What *works* in *reality* (rather than theory, which can think up a thousand impossible scenarios before breakfast (with apologies to "Alice in Wonderland")).
>> And then you use common sense to say "I use these fields all the time, >> and these fields only rarely" so you split A into two physical FILEs, [quoted text clipped - 4 lines] > >A logical cost is a big cost. Every updating app needs to know that, right? Yep ... but relational theory, which imposes mandatory separation of the logical from the physical, imposes that cost on EVERY app, not just those that update the data.
Furthermore, by actively hindering the programmer from providing hints to the database, relational forces the programmer to rely on the database's artificial intelligence, which is quite likely to guess wrong ...
>So you're definitely describing some physical redesign which sits below the >logical view available to users. I think in relational terms, that's what [quoted text clipped - 3 lines] >is layered atop something that is, or leans toward, or could be "more >relational." At least that's the way it seems... Yup. Let's assume that the Pick database has been designed properly, and that within the FILEs the data has been normalised. I can now present my apps with a *closed* relational view!
My Pick application has also FORCED, by DEFAULT, my database to store related data close to itself (what relational calls clustering, I believe). It's fairly easy to prove, statistically, that this will optimise data retrieval from disk. Sod AI optimisation, Pick doesn't have a choice and it works, which is why in any system lacking sufficient ram a Pick app will kick the equivalent relational app's butt!
Basically, by not hiding the physical implementation from the user, Pick makes it easy to prove there just IS NO room for improvement. By hiding the physical from the user, relational forces you to rely on the AI and you have no way of knowing whether it is efficient or not.
Cheers, Wol
Eric Kaun - 11 Jun 2004 22:54 GMT > >> >1. files > >> [quoted text clipped - 7 lines] > Just because I own a car, doesn't make the car part of me ... think > language, and think nouns and adjectives (and gerunds). What about sentences?
> >So if an invoice has a Parts attribute, it then needs additional attributes > >corresponding to each "attribute" of its use of those parts? At what point [quoted text clipped - 7 lines] > across two FILES, where you've promoted sub-attributes of INVOICE to be > primary attributes of the secondary file. Why would you do the split, when the intent seems to be to keep things whole? Is this purely for performance optimization? I have a hard time keeping up with shifts between logical and physical, and the reasons for the splits. I understand you CAN do these things, but why and when? What are your heuristics?
> >> >4. sub-sub-attributes > >> [quoted text clipped - 3 lines] > > > Nah! FILE = noun :-) Okay; I name my file "MyApplication." :-)
> You completely missed the point here. Where and why would you use a > cascading delete? THINK! Be *practical*. What *works* in *reality* > (rather than theory, which can think up a thousand impossible scenarios > before breakfast (with apologies to "Alice in Wonderland")). Seldom, due to business desires, but to answer the question you're getting at: when there's a foreign-key dependency, and there's one relation that is deemed "important" enough to trigger the cascade. There could be multiple, though that's rare...
> >> And then you use common sense to say "I use these fields all the time, > >> and these fields only rarely" so you split A into two physical FILEs, [quoted text clipped - 8 lines] > logical from the physical, imposes that cost on EVERY app, not just > those that update the data. And it enables EVERY app with EVERY optimization. (not close, but you get my drift)
You've just successfully argued against code sharing, by the way, since if something is coded badly (either slowly, or laden with defects), then every app has to suffer, so you're better off recoding it in each app, right?
> Furthermore, by actively hindering the programmer from providing hints > to the database, relational forces the programmer to rely on the > database's artificial intelligence, which is quite likely to guess wrong And you've also just argued against compilers, since they're so likely to guess wrong about the intention of your code, and therefore will produce badly-optimized machine code.
> Yup. Let's assume that the Pick database has been designed properly, and > that within the FILEs the data has been normalised. I can now present my > apps with a *closed* relational view! What do you mean by that?
> My Pick application has also FORCED, by DEFAULT, my database to store > related data close to itself (what relational calls clustering, I > believe). It's fairly easy to prove, statistically, that this will > optimise data retrieval from disk. For that one access path.
> Sod AI optimisation, Pick doesn't > have a choice and it works, which is why in any system lacking > sufficient ram a Pick app will kick the equivalent relational app's > butt! I can think of several faster alternatives. Using ROWID and stashing hierarchies in Oracle tables would at least close some of the gap. Performance isn't the only point, but oh well...
> Basically, by not hiding the physical implementation from the user, Pick > makes it easy to prove there just IS NO room for improvement. hahahahaha
Oh - you were serious. My bad.
> By hiding > the physical from the user, relational forces you to rely on the AI and > you have no way of knowing whether it is efficient or not. AI? Yes, I'd hate to rely on something like a "computer" or some other fancy "automaton" that does "logic" or some such liberal nonsense... :-)
- erk
Anthony W. Youngman - 18 Jun 2004 19:05 GMT >> In theory, no. In practice, you might choose to split the invoice data >> across two FILES, where you've promoted sub-attributes of INVOICE to be [quoted text clipped - 5 lines] >splits. I understand you CAN do these things, but why and when? What are >your heuristics? The heuristics are probably when it gets too complicated for the brain to comprehend easily.
Okay, I'm getting physical, and messy, but that's the real world. Where do you draw the line between biology and organic chemistry? Between organic and inorganic chemistry? Between chemistry and the physics of atoms?
Okay, there is a pretty clear line between the physics of atoms and atomic physics, but that's an anomaly!
I can understand you want things nice and clear cut, but the real world isn't like that. I use relational theory to help me understand the data down to one or two levels deeper than I need, then I draw the line at whatever level seems appropriate.
Don't forget, I'm a chemist by training. If I'm doing bio-chemistry it's incredibly useful to understand electron orbital theory but I can't WORK at that level. It's just *too* abstract to be meaningful.
By abstracting data down to (and focussing on) the tuple, relational theory has just gone into TOO MUCH detail and lost sight of (indeed, to some extent DESTROYED) any view of the big picture...
What you want to do is present the user with a view of the data at their level, and then analyse it deeper.
As a chemist, I think in molecules. As a businessman, I think people tend to think in terms of customers, invoices, things like that. THAT is the level at which the database should interface with users.
Relational interfaces at the chemical equivalent of atoms - with the tuple. The poor programmer has to think UP to the "business object" level, and then UP AGAIN to the reality equivalent.
With Pick, I can stand at the "business object" interface, and reach DOWN into the data, and UP into reality. It's far easier to stand on the interface reaching in both directions, than to be mired down in the detail, struggling to get out.
I am sorry I can't give you a better answer than that. But the real world is messy. Deal with it!
>> >> >4. sub-sub-attributes >> >> [quoted text clipped - 15 lines] >deemed "important" enough to trigger the cascade. There could be multiple, >though that's rare... So I would seriously consider pulling all the tables into which the cascade went into a single FILE. "Rules are for the guidance of wise men, and obedience of fools" - I would use my intelligence as to whether this made sense.
>> >> And then you use common sense to say "I use these fields all the time, >> >> and these fields only rarely" so you split A into two physical FILEs, [quoted text clipped - 12 lines] >And it enables EVERY app with EVERY optimization. (not close, but you get my >drift) But if the small cost of the "bad" code (which by definition is rarely used) makes a big difference to the cost of the "good" code, then I'm laughing all the way to the bank. Who cares if I add an hour to a job that runs once a month, if by doing so I can shave a second off a job that 50 users use several hundred times a day?
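The trade-off in that last question can be put into rough numbers. The sketch below is only back-of-envelope arithmetic: the post says "several hundred times a day" and gives no working calendar, so the defaults of 200 runs per user per day and 20 working days per month are assumptions, as is the function name.

```python
def monthly_net_saving_seconds(users=50,
                               runs_per_user_per_day=200,   # assumed value
                               seconds_saved_per_run=1,
                               working_days_per_month=20,   # assumed value
                               extra_batch_seconds=3600):
    """Seconds saved on the common job minus seconds added to the monthly one."""
    saved = (users * runs_per_user_per_day
             * seconds_saved_per_run * working_days_per_month)
    return saved - extra_batch_seconds
```

Under those assumptions the common job saves 50 x 200 x 1 x 20 = 200,000 seconds a month (roughly 55 hours) against 3,600 seconds added to the monthly job. The comparison only holds if the frequently used path really is as dominant as claimed.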
>You've just successfully argued against code sharing, by the way, since if >something is coded badly (either slowly, or laden with defects), then every [quoted text clipped - 13 lines] > >What do you mean by that? I mean it's like a relational view, but if I've got a "one to many" relationship, the "one" data only appears once, not replicated for every instance of the "many".
>> My Pick application has also FORCED, by DEFAULT, my database to store >> related data close to itself (what relational calls clustering, I >> believe). It's fairly easy to prove, statistically, that this will >> optimise data retrieval from disk. > >For that one access path. Here you go again - crippling the race horse so we can have a "fair" race against the crippled old nag ...
>> Sod AI optimisation, Pick doesn't >> have a choice and it works, which is why in any system lacking [quoted text clipped - 11 lines] > >Oh - you were serious. My bad. Yes I was :-)
>> By hiding >> the physical from the user, relational forces you to rely on the AI and >> you have no way of knowing whether it is efficient or not. > >AI? Yes, I'd hate to rely on something like a "computer" or some other >fancy "automaton" that does "logic" or some such liberal nonsense... :-) I don't. I rely on statistics to tell me the Pick model does a better job than AI.
That crack about the race horse was deliberate. Relational seeks to make all access paths equal. Fair enough. Rather like the UK educational system that sees competition as "unfair" and wants all schoolkids to leave Uni with a first class degree, not caring whether they are a dunce or a genius (sadly, I'm serious about our education :-(
You said "for that one access path". But that access path IS THE MOST COMMON PATH! So. I can prove that it's the most common path. I can prove it's the most efficient path.
Can you prove, that by crippling the most common path, you can improve the "worst path" cases enough to make it worth-while? Was it Knuth that said "premature optimisation is the worst evil"? I couldn't give a damn if the nag trails in last by a racecourse. I want the thoroughbred to win.
Cheers, Wol
Dawn M. Wolthuis - 18 Jun 2004 21:32 GMT > >> In theory, no. In practice, you might choose to split the invoice data > >> across two FILES, where you've promoted sub-attributes of INVOICE to be [quoted text clipped - 8 lines] > The heuristics are probably when it gets too complicated for the brain > to comprehend easily. I suspect there are some papers about this as it relates to XML documents and Jan Hidders did point to some papers re normalization for XML at one point IIRC. The rule of thumb is to try to match up to the way people think when designing the logical structure of the data in a "nested relational" structure (I don't like that description of XML, but it gives some hint that we are still working with relations)
> Okay, I'm getting physical, and messy, but that's the real world. Where > do you draw the line between biology and organic chemistry? Between [quoted text clipped - 6 lines] > I can understand you want things nice and clear cut, but the real world > isn't like that. so, then, you are NOT a Calvinist? ;-)
> I use relational theory to help me understand the data > down to one or two levels deeper than I need, then I draw the line at [quoted text clipped - 14 lines] > tend to think in terms of customers, invoices, things like that. THAT is > the level at which the database should interface with users. The analogy to molecules is an excellent one. Looking for, and defining, the molecules among the data in our problem domain is really what we do with our logical data models.
> Relational interfaces at the chemical equivalent of atoms - with the > tuple. The poor programmer has to think UP to the "business object" [quoted text clipped - 17 lines] > > > >Okay; I name my file "MyApplication." :-) That's where experience and best practices come into play, where you would play to the strengths of whatever tools you were using and would, likely, not name your file that.
> >> You completely missed the point here. Where and why would you use a > >> cascading delete? THINK! Be *practical*. What *works* in *reality* [quoted text clipped - 10 lines] > men, and obedience of fools" - I would use my intelligence as to whether > this made sense. and it is definitely the case that for many of the non-DBMS's that are really enhanced file systems (which is where PICK really falls) that the developer is given enough rope to hang themselves.
> >> >> And then you use common sense to say "I use these fields all the time, > >> >> and these fields only rarely" so you split A into two physical FILEs, [quoted text clipped - 18 lines] > that runs once a month, if by doing so I can shave a second off a job > that 50 users use several hundred times a day? And you do have the option of having "services" that test business rules as well as others that perform updates to ensure the same degree of consistency and decoupling of app and database as an RDBMS. These same services can be used to determine the appropriate GUI components. Excellent software can be written, but the database does not require anything of the developer -- there is considerable freedom (to shoot yourself in the foot) and this is what also provides the ease of maintenance.
> >You've just successfully argued against code sharing, by the way, since if > >something is coded badly (either slowly, or laden with defects), then every > >app has to suffer, so you're better off recoding it in each app, right? No -- then fix it.
> >> Furthermore, by actively hindering the programmer from providing hints > >> to the database, relational forces the programmer to rely on the [quoted text clipped - 13 lines] > relationship, the "one" data only appears once, not replicated for every > instance of the "many". The replication in a SQL-DBMS is only upon viewing granular data. I think I know what you mean by this, but perhaps it makes more sense to state it differently. A FILE would include the "one" and the many manys. We would not have separate relations defined for each many, with multiple rows for each of the "one" in order to link such relations back to the "one" (master) relation. One FILE of PEOPLE in a non-1NF structure (such as XML docs or PICK) could easily turn into 20 relations in a SQL-DBMS. The 1 million records in that one PICK file could turn into those 1 million rows plus multiple rows for each of these records in each of the other 19 relations that were split out when putting into 1NF. Since each of those 19 files needs to have a candidate key, a lot of generated keys get built to accomplish this 1NF. So, there is a lot of extra data stored in the relational structure (in the form of lots and lots of keys).
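As a rough illustration of the split described above, here is a minimal Python sketch that turns one nested record into a master row plus one child row per repeated value. The record layout, the field names and the `(person_id, seq)` generated key are assumptions made up for the example; they are not taken from PICK or from any SQL product.

```python
# One nested (non-1NF) record: a person with two multivalued fields.
# All field names here are hypothetical.
person = {
    "id": 1,
    "name": "Ann",
    "phones": ["555-0100", "555-0101"],
    "emails": ["ann@example.com"],
}

def to_1nf(record, multivalued):
    """Split a nested record into a master row and per-field child rows.

    Each child row repeats the parent key and gets a generated sequence
    number so that (person_id, seq) can serve as its candidate key.
    """
    master = {k: v for k, v in record.items() if k not in multivalued}
    children = {}
    for field in multivalued:
        children[field] = [
            {"person_id": record["id"], "seq": i, field: value}
            for i, value in enumerate(record[field], start=1)
        ]
    return master, children

master, children = to_1nf(person, ["phones", "emails"])
total_rows = 1 + sum(len(rows) for rows in children.values())
print(master)
print(children)
print(total_rows)
```

The single nested record becomes four rows across three relations here, and every child row carries a copy of the parent key plus a generated sequence number, which is the "extra keys" cost the post refers to.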
I have the feeling I didn't actually CLARIFY your statement, Wol, sorry, but the way it is stated, I would object when my relational hat is on.
> >> My Pick application has also FORCED, by DEFAULT, my database to store > >> related data close to itself (what relational calls clustering, I [quoted text clipped - 5 lines] > Here you go again - crippling the race horse so we can have a "fair" > race against the crippled old nag ... Now, there's no reason to call me names ;-)
> >> Sod AI optimisation, Pick doesn't > >> have a choice and it works, which is why in any system lacking [quoted text clipped - 13 lines] > > Yes I was :-) Yes, he was. I don't know about the PROVE part, but I do know that there is a HUGE difference between letting queries fly on the MV side of the house compared to tuning SQL statements ad infinitum on the SQL side. After trying to migrate people from the old (PICK) to the newer (SQL), I have become completely convinced that is a significant step backwards. If there were something comparable to ODBC for non-SQL structures, there would be no reason at all to consider SQL in those environments.
> >> By hiding > >> the physical from the user, relational forces you to rely on the AI and [quoted text clipped - 11 lines] > leave Uni with a first class degree, not caring whether they are a dunce > or a genius (sadly, I'm serious about our education :-( A "no child left behind" jab on state of US education would be in order, but it is so upsetting that I can't muster one right now.
> You said "for that one access path". But that access path IS THE MOST > COMMON PATH! So. I can prove that it's the most common path. I can prove [quoted text clipped - 5 lines] > if the nag trails in last by a racecourse. I want the thoroughbred to > win. Another good analogy. By trying to treat all relations as if they are of the same weight -- as if none is a more important entry-point into the data, we are breaking down the molecules and focussing only on the atoms, without giving any hints as to where the molecules are to be found. Those "one" relations in the one to many you mentioned above are often like named molecules, but invisible to the database user -- a case of missing the forest for the trees. Views can be built to add these molecules back in, but there is nothing in relational modeling that makes it clear, or even suggests, how to handle this.
Cheers! --dawn
Tony - 19 Jun 2004 12:51 GMT > Was it Knuth that said "premature optimisation is the worst evil"? I don't know, but YOU are the one who wants to optimise prematurely, i.e. while designing the database. The relationalists prefer to optimise at the last possible moment, i.e. when we know what the query is.
Laconic2 - 19 Jun 2004 14:06 GMT > > Was it Knuth that said "premature optimisation is the worst evil"? > > I don't know, but YOU are the one who wants to optimise prematurely, > i.e. while designing the database. The relationalists prefer to > optimise at the last possible moment, i.e. when we know what the query > is. Not only when we know the query, but also when we know, approximately, the data volumes.
Given that access strategy costs are non linear, this is an important input to optimization.
Anthony W. Youngman - 20 Jun 2004 00:27 GMT >> "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message >news:<L4l1l0HM7y0AFwZd@thewolery.demon.co.uk>... [quoted text clipped - 10 lines] >Given that access strategy costs are non linear, this is an important input >to optimization. Okay. Let's try to explain. It's statistics, so it's totally outside relational theory (statistics is fuzzy logic, after all :-)
Let's assume we've got an invoice file, with a thousand invoices. Each invoice has ten detail lines.
Your app gets one detail line from the table. The database grabs another line as a pre-fetch ...
What's the chance that the next thing your app asks for is either another line, or the invoice header? Quite high. If it's another line, what's the chance that the one it asks for is the one the db has prefetched. Easy - it's one in ten thousand. But if, by chance or design, it happens to belong to the same invoice as the first line, the chances have improved to one in ten or twenty - a massive improvement.
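The arithmetic in this example can be written out as a small Python sketch. It assumes the next request is uniformly distributed over the candidate lines and that exactly one line is prefetched; the function name and constants are made up for illustration, and the simplification ignores the fact that one line has already been read.

```python
from fractions import Fraction

INVOICES = 1000
LINES_PER_INVOICE = 10
TOTAL_LINES = INVOICES * LINES_PER_INVOICE

def hit_chance(candidates: int) -> Fraction:
    """Chance that a single prefetched line is the one requested next,
    assuming the request is uniform over `candidates` lines."""
    return Fraction(1, candidates)

# Prefetch could be any line in the whole file.
random_prefetch = hit_chance(TOTAL_LINES)

# Prefetch is known to come from the same invoice as the first line.
clustered_prefetch = hit_chance(LINES_PER_INVOICE)

improvement = clustered_prefetch / random_prefetch

print(random_prefetch)     # 1/10000
print(clustered_prefetch)  # 1/10
print(improvement)         # 1000
```

Under these assumptions the clustered prefetch is a thousand times more likely to be useful, but only when the next request really does stay within the same invoice.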
Look at your apps. How many data fetches are "random" (ie depend on external input), and how many are of "related data", ie given that you have one record already, you're retrieving data that shares a "foreign key" relationship with the data you already have. I'd guess that, typically, for every one of the first type you have a hundred, maybe more, of the second.
So, by actively clustering related data together, you could massively improve database performance. In other words, you'd be copying what Pick achieves by default.
Let's phrase it in a different way. Those invoice detail lines can be considered as a bag of lists. By default, relational will treat them as a random set. It has a one-in-ten-thousand chance of guessing the next one correctly, unless it has a clever optimisation engine or the DBA gives it a hint.
Pick by default knows it's a bag of lists. If the app really does ask for a random "next line", Pick's worst case is the same as relational. But if, as is normally the case, the next line comes from the same invoice, Pick's DEFAULT chance of getting it right is the same as the chance of two consecutive lines coming from the same invoice - ie pretty high.
So Pick's worst case is equal to relational's worst case - in the highly UNlikely scenario of random data access. But in the normal case, of accessing rows that are somehow linked, relational relies on AI or the DBA "hinting" to the database. Pick just does it that way "by default".
And that's why ALL the anecdotes I've ever seen say that Pick outperforms relational by a huge margin.
Eric Kaun - 19 Jul 2004 18:48 GMT > I can understand you want things nice and clear cut, but the real world > isn't like that. I do understand that, but machines are remarkably literal; they sort of need things spelled out to them, so at some point we need to decide what's important and then how to represent it. Software is description. [Michael Jackson's quote, not mine, though I agree with it.]
> I use relational theory to help me understand the data > down to one or two levels deeper than I need, then I draw the line at > whatever level seems appropriate. This is interesting: 1. If you're just understanding the data, how do you know what you need, much less what "one or two levels" beyond that is? 2. What's a level? 3. Appropriate with respect to... what? Some function of requirements, I imagine, but it would be nice to formalize the rules-of-thumb.
> Don't forget, I'm a chemist by training. If I'm doing bio-chemistry it's > incredibly useful to understand electron orbital theory but I can't WORK > at that level. It's just *too* abstract to be meaningful. Couldn't agree more.
> By abstracting data down to (and focussing on) the tuple, relational > theory has just gone into TOO MUCH detail and lost sight of (indeed, to > some extent DESTROYED) any view of the big picture... The big picture as presented to users is a different thing, outside the domain of relational. How relational losing sight of the big picture destroys it is amusing, but
And the tuple isn't the focus - the relation is, including constraints on relations. As logical predicates, they bear an uncanny resemblance to many "classes" of rules and requirements.
Of course, you can look at just the "raw data" - typed attributes, values in tuples. Or you can look at individual predicates. Or you can look at database constraints (constraints over the set of relations). Wow! That's three... Three... THREE levels of abstraction in one!
Is that the final word? No, of course not. Date draws a distinction between physical (DBMS storage), logical (predicates), and conceptual (mapping to the user). Views, screens, reports, etc. all help paint the "big picture" to the user. But the big picture won't fully develop if the components aren't there, or the logical underpinnings are suspect.
> What you want to do is present the user with a view of the data at their > level, and then analyse it deeper. So you're talking about a RAD / prototyping / extreme programming approach to data design? This seems more like process than logical definition of the data.
> As a chemist, I think in molecules. As a businessman, I think people > tend to think in terms of customers, invoices, things like that. THAT is > the level at which the database should interface with users. Not a bad idea, though the users of a database tend to be developers. Some users ("power users") can handle SQL and reporting and such, but not that many. Still, given that relational's domain is "shared data banks," a logical representation which supports multiple users and multiple applications (and user views) is likely to be lower level - you need to be concerned with those issues to make decisions that help the big picture look real purty.
> Relational interfaces at the chemical equivalent of atoms - with the > tuple. There are no tuple-level operators in relational; although each tuple is a fact, operations are defined over relations (predicates), not individual tuples (which are not to be singled out).
> The poor programmer has to think UP to the "business object" > level, and then UP AGAIN to the reality equivalent. Again, sometimes the poor programmer is thinking the other way - not of the order, but of which customers in certain states have placed more than 3 orders which contain both condoms and ice cream cones (or some such combination). From that standpoint, the "business object" is... what? In a hierarchical data definition, the most common operations to data-entry clerks are obvious; everything else becomes convoluted procedural logic. (generalization noted)
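To make the "other way" query concrete, here is a hedged sketch using Python's built-in sqlite3 module. The three tables, their columns, the placeholder products `item_a` and `item_b`, and the sample data are all invented for the example; the point is only that the question is asked across customers, orders and items at once, not from inside any single order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer   (customer_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE orders     (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_item (order_id INTEGER, product TEXT);
""")

# Hypothetical data: customer_id -> (state, number of orders).
# Every order contains both placeholder products.
customers = {1: ("NY", 4), 2: ("NY", 3), 3: ("CA", 4)}
order_id = 0
for customer_id, (state, n_orders) in customers.items():
    conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, state))
    for _ in range(n_orders):
        order_id += 1
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        conn.executemany(
            "INSERT INTO order_item VALUES (?, ?)",
            [(order_id, "item_a"), (order_id, "item_b")],
        )

# Customers in a given state with more than 3 orders containing both products.
QUERY = """
    SELECT c.customer_id
    FROM customer c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE c.state = ?
      AND EXISTS (SELECT 1 FROM order_item i
                  WHERE i.order_id = o.order_id AND i.product = 'item_a')
      AND EXISTS (SELECT 1 FROM order_item i
                  WHERE i.order_id = o.order_id AND i.product = 'item_b')
    GROUP BY c.customer_id
    HAVING COUNT(*) > 3
    ORDER BY c.customer_id
"""
result = [row[0] for row in conn.execute(QUERY, ("NY",))]
print(result)  # [1]
```

Only customer 1 qualifies: customer 2 has just three such orders, and customer 3 is in a different state.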
> With Pick, I can stand at the "business object" interface, and reach > DOWN into the data, and UP into reality. It's far easier to stand on the > interface reaching in both directions, than to be mired down in the > detail, struggling to get out. Hey, the interface to the real world is messy - deal with it. That interface is an ever-shifting beast in any "real business" I've ever dealt with. With a shifting interface (one which changes radically as you follow data from department to department), a lower-level (sic) definition of data is a better support system than a "big picture."
By "standing on" the interface, you depend on its stability. I believe it to be far, far less stable than the logical definition of the data, which perhaps might be "small picture." But I'll deal with it there, where I have some power.
> I am sorry I can't give you a better answer than that. But the real > world is messy. Deal with it! That's exactly what we're all trying to do; fuzzy definitions like "messy", "real world", and "big picture" aren't going to give us much purchase in the attempt.
>> Seldom, due to business desires, but to answer the question you're >> getting [quoted text clipped - 8 lines] > men, and obedience of fools" - I would use my intelligence as to whether > this made sense. I agree - if it makes sense, it's not a bad idea. I have just found very few cases where it has made sense, but not every database I've designed has been fully normalized (and that was by design).
One case in point: an issue-reporting database that I did during a meeting as a prototype which quickly went production. Before it did go production, I normalized what had been denormalized structures; and was glad I did, because subsequent report, query, data export, and even screen view requirements would have been tricky without it. Not impossible by any means, but far less intuitive.
>>> Yep ... but relational theory, which imposes mandatory separation of the >>> logical from the physical, imposes that cost on EVERY app, not just [quoted text clipped - 9 lines] > that runs once a month, if by doing so I can shave a second off a job > that 50 users use several hundred times a day? True (with reservations based on the context of those processes) - but I've been talking more about new requirements for reports, views, data imports/exports, etc. A normalized relational structure supports new requirements better; a denormalized one adds some initial overhead. As far as performance, I have no doubt that Pick performs well, but haven't been so constrained in terms of hardware and design that I would denormalize to save... something.
>>> Yup. Let's assume that the Pick database has been designed properly, and >>> that within the FILEs the data has been normalised. I can now present my [quoted text clipped - 5 lines] > relationship, the "one" data only appears once, not replicated for every > instance of the "many". Well, a hierarchical database is likely to deal natively with hierarchies (which is what you're talking about). From a relational viewpoint, you don't have "one thing" - you're talking about two predicates (the parent and the child).
>>> My Pick application has also FORCED, by DEFAULT, my database to store >>> related data close to itself (what relational calls clustering, I [quoted text clipped - 4 lines] > > Here you go again - Uh oh... flashbacks of the Jimmy Carter - Ronald Reagan debate... (in which Reagan repeatedly used the phrase "there you go again" to avoid actually having to rebut a point).
> crippling the race horse so we can have a "fair" > race against the crippled old nag ... A poor analogy, though I can't think of a better one. I'm talking about being able to support new and changed requirements with a minimum of change, as well as being able to firewall the integrity of my data from programmer error (including my own!), and declaring the meaning of my data (for humans as well as for enforcement).
For the record, yes, I'm sure Pick can load that record real darn fast. But I have yet to see an application that, based on a primary key, couldn't load the vast majority of its associated hierarchy in an unnoticeable amount of time. It just isn't that hard. And in terms of code burden, mapping tools make it a no-brainer.
So I'll form my own analogy: you're talking about having that racehorse cross the finish line 1 second faster (when it already was beating the other horses anyway), albeit dropping the jockey on his arse en route. :-)
> That crack about the race horse was deliberate. Relational seeks to make > all access paths equal. Fair enough. Rather like the UK educational [quoted text clipped - 5 lines] > COMMON PATH! So. I can prove that it's the most common path. I can prove > it's the most efficient path. Sure - but taking alternative paths to that same endpoint isn't much slower. Yeah they're slower... but so what? It doesn't always make a difference, and in my experience, it's the reports and ad hoc queries and dataloads and such that demand performance optimizations. I don't give a damn whether loading my Order is done in one read in 0.09 seconds, or in several reads in 1 second, since my UI is probably going to take its sweet time painting anyway...
Obviously I'm not that naive, but I think your optimizations to the most-common path, while certainly an improvement, may have a less-than-noticeable impact on most users. But I could certainly be wrong.
> Can you prove, that by crippling the most common path, you can improve > the "worst path" cases enough to make it worth-while?
> Was it Knuth that > said "premature optimisation is the worst evil"? "Premature optimization is the root of all evil", and while Knuth gets the credit, he says Tony (C.A.R.) Hoare said it first.
> I couldn't give a damn > if the nag trails in last by a racecourse. I want the thoroughbred to win. Well, we differ. If the horses are functions / applications, I want as many horses as possible to finish before the jockeys drop dead of old age. :-)
Thanks, Anthony, for the lively exchange(s).
- erk
Marshall Spight - 19 Jul 2004 19:38 GMT > Again, sometimes the poor programming is thinking the other way - not of > the order, but of which customers in certain states have placed more > than 3 orders which contain both condoms and ice cream cones (or some > such combination). I don't know what you've got planned for tonight, Homer, but count me out.
Marge
Eric Kaun - 25 Jul 2004 05:32 GMT >>Again, sometimes the poor programming is thinking the other way - not of >>the order, but of which customers in certain states have placed more [quoted text clipped - 5 lines] > > Marge Heh.
Oh - and duckbilled platypi (sp?). Need those too.
Leandro Guimaraens Faria Corsetti Dutra - 04 Jun 2004 00:13 GMT > that is the language of RDBMS: rows and columns. Rather tuples and attributes.
 Signature Leandro Guimarães Faria Corsetti Dutra +55 (11) 5685 2219 Av Sgto Geraldo Santana, 1100 6/71 leandro@dutra.fastmail.fm 04.674-000 São Paulo, SP BRASIL http://br.geocities.com./lgcdutra/
Anthony W. Youngman - 04 Jun 2004 00:35 GMT >> It think it is worth noting that is far more difficult to retrieve an >> invoice the way it looked originally after chopping it up [quoted text clipped - 19 lines] >show one to a customer. >Use a tool that was designed to present data. THAT WAS MY POINT!
The tool is external to the database ...
Thanks for proving it :-)
Cheers, Wol
mAsterdam - 04 Jun 2004 01:18 GMT >>> It think it is worth noting that is far more difficult to retrieve an >>> invoice the way it looked originally after chopping it up [quoted text clipped - 24 lines] > > Thanks for proving it :-) Thank you for your trust but modesty dictates me to say I did not prove anything. I was just giving pragmatic guidance based on opinion.
Dawn M. Wolthuis - 04 Jun 2004 01:50 GMT > >>> It think it is worth noting that is far more difficult to retrieve an > >>> invoice the way it looked originally after chopping it up [quoted text clipped - 30 lines] > I was just giving pragmatic > guidance based on opinion. LOL --dawn
Anthony W. Youngman - 07 Jun 2004 23:12 GMT >>> SQL reports are ugly - I'ld would not want to >>> show one to a customer. [quoted text clipped - 8 lines] >I was just giving pragmatic >guidance based on opinion. But it doesn't stop your modest opinion being a perfect example of what I was trying to say :-)
Cheers, Wol
mAsterdam - 08 Jun 2004 00:52 GMT >>>> SQL reports are ugly - I'ld would not want to >>>> show one to a customer. [quoted text clipped - 12 lines] > But it doesn't stop your modest opinion being a perfect example of what > I was trying to say :-) :-)
Eric Kaun - 02 Jun 2004 22:00 GMT > > > And I'm pretty damn > > > confident that you can NOT create a theory that will do a reversible [quoted text clipped - 6 lines] > is useful in a rather small portion of what we do in addressing the "data > processing" needs of a business. Possibly, though that depends on how you define data processing, of course. Certainly software concepts like concurrency and distributed computing aren't addressed by relational. When it comes to data, I would say that a presentation language would be a nice relational add-on, as well as the definition of a system catalog. The two of those would make a nice combination.
In any event, Dataphor (and reporting products mAsterdam refers to) infer a great deal already, and present both UIs and other user-facing artifacts derived from... relations! Why you'd structure your data based on its eventual output is beyond me, since that output is neither unique nor static.
> There are at least the axioms of set theory and then some things were tossed > in the mixed without any proof from these axioms, such as restricting the > sets from which elements can come to sets of scalar values (which has been > changed now, but 1NF, however defined, would have to be considered an axiom > since it does not arise from any other mathematics) 1NF says only that the relational model doesn't treat types specially; it defines the "domain" of the relational model as distinct from the user-definable (therefore extensible) portions. Not sure why that's a problem - in any event, lists can be seen as scalars, as long as you're not requiring the base algebra to in some way acknowledge them. Not sure why you'd want to, since you do have user-definable types and operations.
And as far as data types go, lists are sometimes nice but there are far better ones for most purposes - e.g. relation-valued attributes. For example, let's say that you chose to have a "LineItems" attribute. If you had only lists, you'd have to have several attributes - one for part numbers, one for quantities, etc. And you'd have to adopt the convention that PartNumber[n] corresponds to Quantity[n], with no guarantees that the lists couldn't end up with different sizes, if your code is bad. With a relation LineItems = {Part#, Qty}, not only would you have a guarantee, you could even query that attribute, rather than writing a stupid loop!
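A small Python sketch of the contrast drawn above, with parallel lists on one side and a set of (part number, quantity) tuples standing in for a relation-valued attribute on the other. The names `LineItem`, `lists_consistent` and `parts_with_min_qty` are hypothetical, and a Python frozenset is only an approximation of a relation value.

```python
from typing import NamedTuple

class LineItem(NamedTuple):
    part_no: str
    qty: int

# Parallel lists: the pairing PartNumber[n] <-> Quantity[n] is only a convention.
part_numbers = ["A1", "B2", "C3"]
quantities = [5, 1]  # one entry short; nothing in the structure prevents this

def lists_consistent(parts, qtys):
    """The only check available: compare lengths after the fact."""
    return len(parts) == len(qtys)

# Relation-valued attribute: each tuple carries its own part number and quantity,
# so a quantity can never exist without its part number.
line_items = frozenset({LineItem("A1", 5), LineItem("B2", 1), LineItem("C3", 2)})

def parts_with_min_qty(items, minimum):
    """Restrict, then project: part numbers whose quantity is at least `minimum`."""
    return {item.part_no for item in items if item.qty >= minimum}

print(lists_consistent(part_numbers, quantities))  # False
print(sorted(parts_with_min_qty(line_items, 2)))   # ['A1', 'C3']
```

The comprehension is still a loop under the hood in Python; in a relational language the same restriction would be written declaratively against the attribute.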
So relations give you rich attributes in a far more consistent and powerful way than lists, which are impoverished little suckers. I rarely use them in Java - the other types are far more useful and powerful. They're the type-generators of last resort.
But if you do want to see lists done right, check out any Lisp dialect. Easier processing, nicer syntax, and better reuse than Java.
(and yes, I understand we're mixing code and data again...)
> It think it is worth noting that is far more difficult to retrieve an > invoice the way it looked originally after chopping it up (that 1NF thing > again) and then using SQL to show the invoice again. True, but it's far easier to derive other facts about that invoice, and its relationship to the customer and other invoices and parts and shipments, when it's not all one big blob. Again, if your main task is showing people the invoice, why transform it at all? How do you know when you've gone too far? In other words, what are your normalization rules? Is it just unhooking 2NF from 1NF?
> Without arguing the semantics (and mapping of the data to reality) of this > particular example, if your invoice looked like this when selling a > beautiful skirt in white and blue that comes from two of your catalogs, it > is definitely HARDER than a non-1NF environment, though not impossible, to > get a SQL statement to show your invoice properly. That's true, primarily because SQL is a bad reporting tool (in addition to being a bad relational derivative). There are better ones.
> > So what do you want - the invoice paper? Maybe we should just rely on > > scanners producing JPGs - non-lossy, of course. > > No need -- including lists in your data (at least your virtual data!) gets > you far enough that you don't notice any more big disconnects. It gets you somewhere, but as I said above, relation-valued attributes get you much further. Why not them?
> SQL Server > permits lists in their UDFs, while Oracle (to my knowledge) does not allow > lists returned from their functions (stored procedures) Does SQL Server's SQL dialect address lists directly?
> > > Dawn was going on about faith. Do you have faith in > > > business analysts to get the analysis correct, or would you rather have [quoted text clipped - 6 lines] > > As long as we are all aiming for the same things ... smiles. --dawn True - ease of development and assurance of data integrity are mine. Both benefit the user...
- erk
Anthony W. Youngman - 04 Jun 2004 00:34 GMT >> > Certainly with current >> > relational databases accessed with SQL, you're relying on either an [quoted text clipped - 10 lines] >related to each other on that one line. Stick with me here, I know I said >that poorly. What I'm trying to say is that if we use SQL, we get a corrupted version of the data back (ie data that went in ONCE comes back MULTIPLY DUPLICATED), *or* we use an application (such as CrystalReports) which is not part of the database to retrieve the relevant bits from the relevant table.
The database itself doesn't know which tables represent "the set of invoices" and doesn't know how to retrieve a single instance of "invoice" from it - it needs to be told by an external influence, namely a query.
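The duplication being described can be seen in a minimal sketch with Python's built-in sqlite3 module. The two tables, their columns and the sample invoice are assumptions invented for the example; the regrouping step at the end stands in for whatever external tool or query reassembles the invoice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_id INTEGER PRIMARY KEY,
        customer   TEXT
    );
    CREATE TABLE invoice_line (
        invoice_id INTEGER REFERENCES invoice,
        line_no    INTEGER,
        part       TEXT,
        PRIMARY KEY (invoice_id, line_no)
    );
    INSERT INTO invoice VALUES (1, 'Acme');
    INSERT INTO invoice_line VALUES (1, 1, 'skirt-white');
    INSERT INTO invoice_line VALUES (1, 2, 'skirt-blue');
    INSERT INTO invoice_line VALUES (1, 3, 'belt');
""")

# A plain join: the header value stored once comes back on every detail row.
rows = conn.execute("""
    SELECT i.customer, l.line_no, l.part
    FROM invoice i
    JOIN invoice_line l ON l.invoice_id = i.invoice_id
    ORDER BY l.line_no
""").fetchall()
header_copies = [customer for customer, _, _ in rows]

# Regrouping outside the database gives the "one header, many lines" shape.
nested = {"customer": rows[0][0], "lines": [part for _, _, part in rows]}

print(header_copies)  # ['Acme', 'Acme', 'Acme']
print(nested)
```

The customer name was inserted once and is returned three times by the join; whether that counts as "corruption" or simply as the flat result of a query is exactly what the thread is arguing about.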
Cheers, Wol
Anthony W. Youngman - 04 Jun 2004 00:29 GMT >> This theory will then be the equivalent of Kepler and Newton discovering >> ellipses and calculus, or of Einstein realising that mass and energy [quoted text clipped - 3 lines] > >Which axioms don't match? I wasn't really aware there were axioms per se. BLOODY HELL ...
I don't mean to sound stunned, but this takes the biscuit ...
ALL mathematical theories are based on axioms.
Science is basically the search for experimental proof that the axioms correctly describe the real world.
If you can't describe relational theory in terms of axioms and logical deductions, then it isn't maths and can't be science!
An axiom is basically "any statement which the model ASSUMES to be true". In relational theory, I would guess that at least one axiom could be phrased as "data comes in tuples".
So, if you don't have experiments to show that real-world data ALSO comes in tuples (or a close approximation thereof), then you can't conclude that a relational database is a good place to store real-world data. (Oh - and if you conclude that real-world data DOES come in tuples, but in several different types of tuple, then your theory needs to take that into account!)
Sorry for ignoring the rest of your post, but this is ABSOLUTELY FUNDAMENTAL!!!
Cheers, Wol
Dawn M. Wolthuis - 04 Jun 2004 01:16 GMT > >> This theory will then be the equivalent of Kepler and Newton discovering > >> ellipses and calculus, or of Einstein realising that mass and energy [quoted text clipped - 15 lines] > If you can't describe relational theory in terms of axioms and logical > deductions, then it isn't maths and can't be science! By George, you've got it, Wol!!! Perfect!
Relational theory, once some choice axioms are added in (without being stated as axioms and without it being obvious that they ought to be axiomatic when measured by any map to reality) does then proceed with mathematics, but there is a lot of "tossing stuff in and out" going on because there is not that match with reality at each point.
Cheers! --dawn
Eric Kaun - 04 Jun 2004 15:51 GMT > [SNIP] > > If you can't describe relational theory in terms of axioms and logical [quoted text clipped - 7 lines] > there is a lot of "tossing stuff in and out" going on because there is not > that match with reality at each point. So what mathematical axioms do you know of that "map to reality"? I didn't realize that was the fundamental aspect of an axiom's value. And if it is, then again, what data axioms do you propose as a good start? They needn't be formal, but have to have more meaning than "data comes in tuples".
- erk
Dawn M. Wolthuis - 04 Jun 2004 16:53 GMT > > [SNIP] > > > If you can't describe relational theory in terms of axioms and logical [quoted text clipped - 10 lines] > > So what mathematical axioms do you know of that "map to reality"? Those arithmetic ones have worked OK for me.
> I didn't > realize that was the fundamental aspect of an axiom's value. It is only of worth if you want to apply the mathematics to something, such as databases.
> And if it is, > then again, what data axioms do you propose as a good start? They needn't be > formal, but have to have more meaning than "data comes in tuples". I'm not in that spot yet and I do want them to be formal. smiles. --dawn
Eric Kaun - 04 Jun 2004 21:05 GMT > > So what mathematical axioms do you know of that "map to reality"? > > Those arithmetic ones have worked OK for me. I agree, they work well. But what "reality" do they map to? They're synthetic, albeit extremely useful. How would you correlate them with reality?
> > I didn't realize that was the fundamental aspect of an axiom's value. > It is only of worth if you want to apply the mathematics to something, such > as databases. Right, but how exactly does one determine the applicability of mathematics to, say, physics? In other words, what axioms does any branch of mathematics have that correlate to something in the real world?
- erk
Dawn M. Wolthuis - 05 Jun 2004 06:53 GMT > > > So what mathematical axioms do you know of that "map to reality"? > > [quoted text clipped - 3 lines] > synthetic, albeit extremely useful. How would you correlate them with > reality? Without looking up the axioms themselves, I map the number 1 to a single sheep and then with addition, I add in sheep. It's all about sheep.
> > > I didn't realize that was the fundamental aspect of an axiom's value. > > It is only of worth if you want to apply the mathematics to something, [quoted text clipped - 4 lines] > to, say, physics? In other words, what axioms does any branch of mathematics > have that correlate to something in the real world? I think that is where Wol's line of discussion was. As far as I'm concerned, they correlate as metaphors when they are used and then they are used for that which they work for. So, the correlation is very pragmatic. There is no proof of such a correlation, but you can disprove an exact correlation just as you can come up with a fault in a metaphor.
Somehow I don't think I'm tapping into your questions right 'cause I think you and I agree on these points and are arguing anyway -- otherwise, without asking a question for the answer, where do you think we have a disagreement in this area? --dawn
Eric Kaun - 07 Jun 2004 19:22 GMT > Without looking up the axioms themselves, I map the number 1 to a single > sheep and then with addition, I add in sheep. It's all about sheep. I disagree - it's all about turtles, stacked up on an elephant. Or maybe vice versa. But in either case, there are no sheep. Unless it's the ones the turtles are dreaming about.
> > Right, but how exactly does one determine the applicability of mathematics > > to, say, physics? In other words, what axioms does any branch of [quoted text clipped - 11 lines] > asking a question for the answer, where do you think we have a disagreement > in this area? --dawn Uh... I object. By golly, I object. (flashback to an old Bloom County cartoon with Opus in court, pounding on the desk with a gavel saying "By golly I object" repeatedly...)
Anyway, you're right - I don't think we actually disagree with that. Systems development is unnatural, and discards detail because the real world is unautomatable. We can only model and simulate tiny segments of it, and my assertion is that those models gain far more power from the nature of the model than from correlation with the real world. That correlation is nice, and certainly there must be a mapping... but that balance point is what we're arguing about, not the points above.
Sorry...
- erk
Anthony W. Youngman - 07 Jun 2004 23:39 GMT >> By George, you've got it, Wol!!! Perfect! >> [quoted text clipped - 9 lines] >then again, what data axioms do you propose as a good start? They needn't be >formal, but have to have more meaning than "data comes in tuples". e=mc^2 ?
Yep. I know it's bl**dy difficult. But if you're not prepared to attempt it, then you're admitting your theory is irrelevant to the real world (and cannot be used to solve real-world problems).
Let's take the evolution of that theory I keep on throwing out as an example.
Copernicus : orbit == circle
Kepler : orbit == ellipse
Newton : F=ma; E=1/2mv^2 where m is constant
Einstein : e=mc^2
Each change may only subtly modify the previous axioms, but the result is theory/model that is a closer fit to reality.
Going back to relational theory. Does the THEORY distinguish between a "join" and a "join with a cascading delete"? Or a "join" and a "join with a foreign key that must exist (cannot be null)"?
Because if relational theory cannot cope with that, then the Pick model can. And surely, a relational table whose rows are meaningful in their own right MUST be different from a table whose rows are meaningless without another table to relate to?
Cheers, Wol
Eric Kaun - 09 Jun 2004 20:58 GMT > >So what mathematical axioms do you know of that "map to reality"? I didn't > >realize that was the fundamental aspect of an axiom's value. And if it is, [quoted text clipped - 6 lines] > it, then you're admitting your theory is irrelevant to the real world > (and cannot be used to solve real-world problems). That's silly. The following have all been used to solve real-world problems:
- Pick
- SQL
- Relational (assuming Dataphor has at least one real-world solution in place somewhere)
- Flat files
- XML
I don't know the "axioms" of the flat-file solution, and don't think that Unix's "everything is a file" is really an axiom. What they're saying is that we have a useful model that treats all data as files.
Besides, who is attempting it? What attempt has MV made? I don't understand what you're looking for here - do you want science, math, or neither? In whatever category you want, what does Pick/MV offer? You seem unwilling to pick up your own gauntlet.
> Let's take the evolution of that theory I keep on throwing out as an > example. [quoted text clipped - 6 lines] > Each change may only subtly modify the previous axioms, but the result > is theory/model that is a closer fit to reality. I don't think the above are axioms in the mathematical sense, though I could be wrong.
> Going back to relational theory. Does the THEORY distinguish between a > "join" and a "join with a cascading delete"? Cascading deletes are useful for implementations, not part of the theory - a cascading delete is simply nice shorthand for an implicit multi-update (as advocated by Date in recent writings), and roughly corresponds to the usefulness of the "foreign key" concept in place of a longer-winded constraint definition.
> Or a "join" and a "join with a foreign key that must exist (cannot be null)".
In relational, all foreign keys must exist, and no attribute value can be null.
> Because if relational theory cannot cope with that, then the Pick model can.
"Cannot cope with that" implies that there is some objective reality that's presenting X, and that a model that doesn't "cope with" X is a poor match to reality. While I agree with the implication overall, the premise is false - there's no objective reality "presenting" cascading deletes or nulls. Those are both aspects of modeling data. There's no objective reality with which those correspond. At best, you're pitting Data Model A against Data Model B, and claiming B is lacking in attribute C, when C doesn't even enter into Data Model A.
> And surely, a relational table whose rows are meaningful in their > own right MUST be different from a table whose rows are meaningless > without another table to relate to? "Meaningful in their own right" is rhetorical - every relation has a meaning (the external predicate). To turn the question on its ear, surely a Pick file which requires applications to enforce the correspondence between values in several distinct attributes MUST be different from a file whose attributes refer to the IDs of other files?
- erk
Anthony W. Youngman - 10 Jun 2004 01:34 GMT >> >So what mathematical axioms do you know of that "map to reality"? I >didn't [quoted text clipped - 26 lines] >whatever category you want, what does Pick/MV offer? You seem unwilling to >pick up your own gauntlet. What I'm saying is that maths is great at building a model. But without science you can't say that any model is useful. Without science, a model is just an intellectual exercise of no value to the real world.
>> Let's take the evolution of that theory I keep on throwing out as an >> example. [quoted text clipped - 9 lines] >I don't think the above are axioms in the mathematical sense, though I could >be wrong. Yes they are. Copernicus ASSUMED that the planets went in circles, and then he used logic on top of that. Therefore, "orbit == circle" is an axiom.
Kepler realised that "orbit == ellipse", and that explained why Copernicus' logic was so screwy.
Newton ASSUMED that m and E could neither be created nor destroyed, therefore they are axioms.
Einstein realised that m and E were interchangeable, and that explained why Newton couldn't predict the orbit of Mercury.
Basically, any assumption that underlies mathematical logic is an axiom. Copernican orbital theory is a mathematical model. Newtonian Mechanics is a mathematical model. Therefore the assumptions that underlie them must be axioms.
>> Going back to relational theory. Does the THEORY distinguish between a >> "join" and a "join with a cascading delete"? [quoted text clipped - 4 lines] >usefulness of the "foreign key" concept in place of a longer-winded >constraint definition. In other words, the theory has no way of coping with what I call "the adjectival clause" - a table whose contents are meaningless without the existence of another table to point to. An invoice line item cannot exist without an invoice for it to belong to!
Or, in other words again, relational theory is deficient because it has no way of coping with real-world constructs that "obviously" exist.
>> Or a "join" and a "join with a foreign key that must exist (cannot be >null)". [quoted text clipped - 13 lines] >and claiming B is lacking in attribute C, when C doesn't even enter into >Data Model A. But there IS objective reality. A line-item on an invoice, for example. The former has no existence outwith the latter.
>> And surely, a relational table whose rows are meaningful in their >> own right MUST be different from a table whose rows are meaningless [quoted text clipped - 5 lines] >values in several distinct attributes MUST be different from a file whose >attributes refer to the IDs of other files? I don't get that. But I think you're making the logical blunder of expecting your logic to PREscribe the world's behaviour, rather than DEscribe it.
Please explain to me how, in the real world, an invoice line item can have an existence in the absence of the invoice to which it belongs ... because as I read you you are saying that relational theory says it can ...
Cheers, Wol
Eric Kaun - 10 Jun 2004 21:15 GMT > What I'm saying is that maths is great at building a model. But without > science you can't say that any model is useful. Without science, a model > is just an intellectual exercise of no value to the real world. You've probably said it before, and it makes sense. But most sciences verify their theorems using prediction and comparisons with real-world phenomena. We don't have that luxury, since we can build working systems with various data models and languages and frameworks, and any experiments designed to measure overall productivity in system creation and maintenance have to take colossal human factors into account... is there some way of testing the models - or actually the meta-models?
> >> Let's take the evolution of that theory I keep on throwing out as an > >> example. [quoted text clipped - 27 lines] > is a mathematical model. Therefore the assumptions that underlie them > must be axioms. OK, my mistake - I thought you meant the formulae themselves were axioms. You're referring to underlying assumptions, which of course are.
> >Cascading deletes are useful for implementations, not part of the theory - a > >cascading delete is simply nice shorthand for an implicit multi-update (as [quoted text clipped - 6 lines] > existence of another table to point to. An invoice line item cannot > exist without an invoice for it to belong to! Right, but cascading delete is different than that. Constraints encode what you just said ("A cannot exist without B"); cascades are useful shorthands for updates, designed to make sets of operations easier while obeying the constraints.
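The distinction drawn here between a constraint ("A cannot exist without B") and a cascade (shorthand for the updates needed to keep obeying it) can be sketched in SQL. The following is a minimal illustration using Python's sqlite3 module; the invoice and line_item tables are hypothetical, and SQLite stands in for "a SQL DBMS" only because it ships with Python. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Hypothetical schema: the REFERENCES clause is the constraint,
# ON DELETE CASCADE is the shorthand for the accompanying deletes.
conn.executescript("""
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY);
    CREATE TABLE line_item (
        invoice_id INTEGER NOT NULL
            REFERENCES invoice(invoice_id) ON DELETE CASCADE,
        line_no    INTEGER NOT NULL,
        PRIMARY KEY (invoice_id, line_no)
    );
    INSERT INTO invoice VALUES (1);
    INSERT INTO line_item VALUES (1, 1), (1, 2);
""")

# The constraint alone: a line item for a non-existent invoice is rejected.
try:
    conn.execute("INSERT INTO line_item VALUES (99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The cascade: deleting the invoice also removes its line items.
conn.execute("DELETE FROM invoice WHERE invoice_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0]

print(orphan_rejected, remaining)
```

Without the ON DELETE CASCADE clause the constraint would still hold; the DELETE on invoice would simply be refused until the line items were removed first, which is the "longer-winded" route the cascade abbreviates.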
> >"Meaningful in their own right" is rhetorical - every relation has a meaning > >(the external predicate). To turn the question on its ear, surely a Pick [quoted text clipped - 5 lines] > expecting your logic to PREscribe the world's behaviour, rather than > DEscribe it. That's not what I'm trying to do, but describing the world's behavior isn't all software development is. My requirements have always included demands for information which isn't directly manifest in the real world - in other words, beyond just being able to reproduce an invoice (thus showing that my database describes invoices), my data is also the basis for
> Please explain to me how, in the real world, an invoice line item can > have an existence in the absence of the invoice to which it belongs ... > because as I read you you are saying that relational theory says it can I may have expressed myself badly. What relational theory says is that statements about line items on an invoice state truths about values which are unrelated to the invoice as a whole, though each line item of course depends on the invoice (header). That statement, while "related to" the invoice header (in that it can't exist without it), has logical meaning on its own - I can formulate useful queries over line items which don't involve the header.
If cascading delete is your guide, then would deleting a customer cause all of that customer's invoices (and their line items) to be deleted as well? Danger aside, does that imply that invoices are an attribute of customers? I understand there's a difference between customer:invoice and invoice:line item relationships, but I'm trying to boil it down to something more than "they're part of the same thing". When it comes to general ledger entries, invoices, payments, shipments, contacts, etc., the line between what is and isn't part of a customer gets a little murkier. Or does it?
- Eric
Anthony W. Youngman - 14 Jun 2004 23:12 GMT >> Basically, any assumption that underlies mathematical logic is an axiom. >> Copernican orbital theory is a mathematical model. Newtonian Mechanics [quoted text clipped - 3 lines] >OK, my mistake - I thought you meant the formulae themselves were axioms. >You're referring to underlying assumptions, which of course are. Some of the formulae may have to be axioms too. If you need to assume, then it's an axiom, if you can derive from your previous assumptions then it's a theorem.
>> >Cascading deletes are useful for implementations, not part of the >theory - a [quoted text clipped - 13 lines] >for updates, designed to make sets of operations easier while obeying the >constraints. But the need for a cascading delete is metadata - information that should be *implicit* within the database. You're turning it into an *external* constraint - putting it where it does NOT belong!
>> Please explain to me how, in the real world, an invoice line item can >> have an existence in the absence of the invoice to which it belongs ... [quoted text clipped - 7 lines] >its own - I can formulate useful queries over line items which don't involve >the header. I think we're having a bit of fun here :-) You're saying you want to extract data from certain "columns" without caring what the primary key is. Fine - no problem there. Ignore the columns you're not interested in.
I'm saying that deleting the primary key should delete all related rows - even those in other tables! If your analyst forgot to specify a cascading delete (and you say that they're external to the theory, anyway), what you're saying is that the theory FAILS to enforce data integrity in that you're using something external to theory to keep the tables in sync.
Pick just stores it all together so that taking out the primary key takes out everything else.
>If cascading delete is your guide, then would deleting a customer cause all >of that customer's invoices (and their line items) to be deleted as well? [quoted text clipped - 4 lines] >invoices, payments, shipments, contacts, etc., the line between what is and >isn't part of a customers gets a little murkier. Or does it? Well, if you follow accounting rules, yes it should :-) Although I think that really goes the other way - you can't create an invoice for a non-existent customer :-)
I think I know what you mean though, when you say "gets a little murkier". Except, in practice, it doesn't. "customer" is a noun - it gets its own FILE. "invoice" likewise. "line item" - is it a noun or adjectival clause? Pick Business Analysis would unhesitatingly place it in the category of adjectival clause. But I know why you would want to treat it as a noun.
Probably because it makes the General Ledger so much easier :-) you want to analyse by line-item, and not by invoice. Actually, that's not difficult at all - you just add ledger code as an attribute of invoice, grouped as part of line-item :-) But yep. I can see why you wouldn't think it as clean - I'm inclined to agree with you. If I was programming this, I'd probably say that "line-item" in the general ledger wasn't the same as "line-item" in the invoice and that would make my life nice and simple :-) but it would have the relational people throwing their hands up in horror. Or just make the entries in the GENERAL-LEDGER FILE a list of foreign keys pointing at the line item in the invoice file - not hard at all. Just a smidgeon more work for the database (but rather more mental contortion for the programmer).
But I've been thinking about a few other things while this reply has been sitting half-composed on my computer ... Relational Theory is all about capturing *data*. BUT - a lot of information is *metadata* which an RDBMS is incapable of storing as such. We were discussing ordering - an RDBMS only captures this - as data - if the analyst thinks it important. A Pick database captures it as a matter of course.
And constraints - I categorise them as "natural constraints" and "business constraints". You can't have an invoice line item without an invoice - that's a "natural constraint". But you *can* have an invoice without a valid company. It might be an error, or it might be called a receipt. But there's nothing to stop the accounts dept screwing up and issuing an invoice to a non-existent company :-) That's what I call a "business constraint". You seem to think that should be captured as *data*. Pick captures it as *metadata*.
Now compare the amount of *metadata* available to Pick and/or relational. It doesn't matter what your database is, the data in it is, as far as the dbms is concerned, a meaningless "blob". To optimise performance, storage, whatever, the only thing available of any use to the dbms is *metadata*. Which Pick has in abundance.
That's why I describe Pick as a superset of relational - it can convert metadata into data and present it to the app. It can also USE the metadata to optimise itself. Relational can only store this sort of information as *data*, and as such the information is not available to the dbms for its internal use.
Cheers, Wol
Tony - 15 Jun 2004 10:38 GMT > Now compare the amount of *metadata* available to Pick and/or > relational. It doesn't matter what your database is, the data in it is, [quoted text clipped - 7 lines] > information as *data*, and as such the information is not available to > the dbms for its internal use. False. A relational database contains a lot of metadata: primary/unique keys, foreign keys, other constraints. All of these are available to the RDBMS for optimisation purposes. To take your invoice example, the RDBMS "knows" that a given invoice has 14 invoice lines just as surely as Pick does.
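As a small illustration of keys and foreign keys being held as catalog metadata rather than application data, the sketch below asks SQLite for what it recorded about a declared schema. The table names are hypothetical, and `PRAGMA table_info` / `PRAGMA foreign_key_list` are SQLite-specific; other SQL products expose the same kind of information through their own catalogs. It says nothing about how any particular optimizer uses that information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, declared once; no rows are needed for this demonstration.
conn.executescript("""
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY);
    CREATE TABLE line_item (
        invoice_id INTEGER NOT NULL REFERENCES invoice(invoice_id),
        line_no    INTEGER NOT NULL,
        PRIMARY KEY (invoice_id, line_no)
    );
""")

# foreign_key_list columns: id, seq, table, from, to, on_update, on_delete, match
fks = conn.execute("PRAGMA foreign_key_list(line_item)").fetchall()
fk_summary = [(fk[2], fk[3], fk[4]) for fk in fks]

# table_info columns: cid, name, type, notnull, dflt_value, pk
# (pk is the 1-based position within the primary key, or 0 if not part of it)
pk_cols = [row[1] for row in conn.execute("PRAGMA table_info(line_item)") if row[5] > 0]

print(fk_summary)  # which parent table and columns line_item refers to
print(pk_cols)     # which columns make up line_item's primary key
```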
Laconic2 - 15 Jun 2004 13:39 GMT > False. A relational database contains a lot of metadata: > primary/unique keys, foreign keys, other constraints. All of these > are available to the RDBMS for optimisation purposes. To take your > invoice example, the RDBMS "knows" that a given invoice has 14 invoice > lines just as surely as Pick does. Excellent, excellent point. So many people fail to recognize this feature of an RDBMS.
When I built a "data mart" in Oracle as a star schema, I included all the primary and foreign key constraints, even though it slowed down loading. The advantage came when I went to copy the star into Cognos (Impromptu or Power Play, I forget) Both Cognos and the Oracle optimizer recognized my star schema for what it was, and made appropriate use of that fact.
x - 15 Jun 2004 14:28 GMT > > "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message > news:<qZztNxCLLizAFwnY@thewolery.demon.co.uk>... [quoted text clipped - 7 lines] > Excellent, excellent point. So many people fail to recognize this feature > of an RDBMS. And all this "knowledge" about data won't spare you from writing code for assembling several SQL statements.
What Anthony said is this:
1) we entered all data in an invoice *at once* in the database
2) we should be able to work on the *whole* as well as on a part of this data by means of the DBMS
3) we should be able to ask for the *same* data we entered in the database as a whole by means of the DBMS.
If we are not able to do this with an RDBMS, then something is missing. He calls this metadata.
> When I built a "data mart" in Oracle as a star schema, I included all the > primary and foreign key constraints, even though it slowed down loading. > The advantage came when I went to copy the star into Cognos (Impromptu or > Power Play, I forget) > Both Cognos and the Oracle optimizer recognized my star schema for what it > was, and made appropriate use of that fact. You were lucky.
Laconic2 - 15 Jun 2004 16:23 GMT > > When I built a "data mart" in Oracle as a star schema, I included all the > > primary and foreign key constraints, even though it slowed down loading. [quoted text clipped - 4 lines] > > You were lucky. I don't think so. The engineers who built the CBO for Oracle were real smart. And they had the example of the DEC Rdb optimizer to guide them. And there was a note somewhere in the release notes saying they had implemented a thing they called a "star join". That was enough for me.
And the Oracle DBA, who had plenty of experience with databases that ran like molasses, was amazed at the performance I got out of this beast. Especially when she looked at my code, and didn't find any "hints" and already knew that my tablespaces had nothing but default parameter settings.
It's amazing how many times you "get lucky" by just following simple, sound design, and by keeping things "as simple as possible, but not simpler than that." In the few places where you end up with a performance problem, you can typically tune locally, without ripples spreading all over the system.
The engineers who built the Cognos data extraction tool were real smart. And if they knew MDDB down cold (which they must have), then they almost certainly knew how to recognize a star schema when they saw one. The way I knew that they knew was by looking at the SQL the Cognos tool used to extract the data from my star schema. Sure enough, they "got it".
A lot of people in this business get a lot of bang for the buck by assuming that "everybody but me is an idiot". I've gotten a lot of bang for the buck by assuming just the opposite: "nobody in this business is an idiot. But everybody makes mistakes, and some of them are idiotic."
(I occasionally call people "idiots". But that's just venting.)
Dawn M. Wolthuis - 15 Jun 2004 18:21 GMT > > > When I built a "data mart" in Oracle as a star schema, I included all > the [quoted text clipped - 12 lines] > And there was a note somewhere in the release notes saying they had > implemented a thing they called a "star join". That was enough for me. Is the star join a relational concept? I heard someone suggest that fact-dimension tables with star schema is bad design, but I forget the rationale for that and they seem to be very effective.
> And the Oracle DBA, who had plenty of experience with databases that ran > like molasses, was amazed at the performance I got out of this beast. [quoted text clipped - 16 lines] > by assuming just the opposite: "nobody in this business is an idiot. But > everybody makes mistakes, and some of them are idiotic." Good rule of thumb.
> (I occasionally call people "idiots". But that's just venting.) and that's fine behind closed doors. I'm thankful that there is less of that in public on this list than there was when I started (I'm not sure now what to do with the balls I had to grow at that time, but pleased that I no longer need them to chat here ;-)
Cheers! --dawn
Laconic2 - 15 Jun 2004 20:49 GMT > Is the star join a relational concept? I heard someone suggest that > fact-dimension tables with star schema is bad design, but I forget the > rationale for that and they seem to be very effective. As near as I can make out, a "star join" is yet another join algorithm, that is added to the ones previously implemented.
Earlier join algorithms include the "loop join" and the "merge join". I could describe these in more detail, but you may already know them. They all achieve the same result: a join. They differ in performance, and different ones are better in different cases. A smart optimizer picks the best algorithm given the available information.
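For readers who have not met them, here is a rough sketch of the two algorithms named above, written as plain Python over lists of (key, value) pairs. The function names and sample data are invented for illustration, and real DBMS implementations work on pages, indexes, and sorted runs rather than in-memory lists. The point is only that both produce the same join and differ in how much work they do.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row: O(len(left) * len(right))."""
    return [(lk, lv, rv) for lk, lv in left for rk, rv in right if lk == rk]


def merge_join(left, right):
    """Sort both inputs on the key, then advance two cursors in step."""
    left = sorted(left, key=lambda pair: pair[0])
    right = sorted(right, key=lambda pair: pair[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every pairing within the two runs, then skip past them.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    result.append((lk, lv, rv))
            i, j = i_end, j_end
    return result


# Hypothetical sample data: invoice headers and line items keyed by invoice id.
invoices = [(1, "ACME"), (2, "Globex"), (3, "Initech")]
lines = [(2, "gadget"), (1, "widget"), (1, "sprocket"), (4, "orphan")]

loop_result = nested_loop_join(invoices, lines)
merge_result = merge_join(invoices, lines)
print(sorted(loop_result) == sorted(merge_result))
```

The results are compared after sorting because the two algorithms may emit rows in different orders; as a set of rows the join is the same either way.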
A star schema is not a relational concept as such. A star schema is a projection of the multidimensional model onto databases like Oracle, DB2, etc. that I still refer to as "relational DBMSes", except in this forum, where I will be scolded by the keepers of the faith if I do.
In order to implement a successful star schema, you have to unlearn most of what you learned in normalization catechism. I would have said that would be fun for you, except that you don't unlearn 1NF.
Is it bad design? It depends. For certain types of uses, it is far more useful than a fully normalized relational design. Especially reporting, warehousing, and OLAP. Like almost everything in life, sometimes it's a good idea, sometimes it's a bad idea.
But I wouldn't recommend that you run off and learn star schema immediately, although it might be useful if you could incorporate that into some of the SQL you teach in college. What I would recommend, for what it's worth, is that you learn a little MDDB and OLAP, if you haven't already. Then, I think you would find it quite easy to back your way into star schema.
It's just a blend of MDDB concepts with the relational and SQL concepts you already know.
Dawn M. Wolthuis - 15 Jun 2004 21:09 GMT > > Is the star join a relational concept? I heard someone suggest that > > fact-dimension tables with star schema is bad design, but I forget the [quoted text clipped - 8 lines] > different ones are better in different cases. A smart optimizer picks the > best algorithm given the available information. Yes, I do have plenty of star join experience (with SQL-DBMS's and with OLAP "cubes" in various tools)
> A star schema is not a relational concept as such. A star schema is a > projection of the multidimensional model onto databases like Oracle, DB2, > etc. that I still refer to as "relational DBMSes", except in this forum, > where I will be scolded by the keepers of the faith if I do. I've switched from RDBMS to SQL-DBMS for that reason. I think TRDBMS is the same as RDBMS.
> In order to implement a successful star schema, you have to unlearn most of > what you learned in normalization catechism. > I would have said that would be fun for you, except that you don't unlearn > 1NF. Yes, interesting, eh? It makes you think that 1NF is decidedly a different animal. But since I've done stars(-ish) in Pick as well, I can say with certainty that 1NF is not required.
> Is it bad design? It depends. For certain types of uses, it is far more > useful than a fully normalized relational design. > Especially reporting, warehousing, and OLAP. Like almost everything in > life, sometimes it's a good idea, sometimes it's a bad idea. There are OLTP (online transaction processing) designs that can double handily for OLAP (online analytical processing). I'd tell you what they are, but I'm trying to trim back my use of the P word. There are good reasons to rehost data into some other format, but if all of the data you need are in a single OLTP system and you don't need a frozen point in time, then it is such a shame that so many people feel a need to pull their data out of their SQL-DBMS's and/or reshape it just so they can get information back out (reporting), don't you think?
> But I wouldn't recommend that you run off and learn star schema immediately, > although it might be useful if you could incorporate that into some of the > SQL you teach in college. What I would recommend, for what it's worth, is > that you learn a little MDDB and OLAP, if you haven't already. Then, I > think you would find it quite easy to back your way into star schema. Sorry to mislead you, I'm well-versed in the ways of the stars -- more so than I am with other relational joins in SQL-DBMS's.
> It's just a blend of MDDB concepts with the relational and SQL concepts you > already know. Maybe the relational complaint is about implementing fact-dimension strategies in MOLAP or other non-RDBMS products. I thought I had heard someone state that designing star schemas was both unnecessary and outside of relational modeling. I'll check Date's book later to see what he says.
cheers! --dawn
Eric Kaun - 16 Jun 2004 20:16 GMT > Is the star join a relational concept? I heard someone suggest that > fact-dimension tables with star schema is bad design, but I forget the > rationale for that and they seem to be very effective. Star schemas are created primarily for performance reasons, because SQL DBMSs are so bad. They're typically denormalized extracts / transformations of normalized schemas, and thus can be regarded as large views. I think they're 2NF but not 3+NF. In any event, you wouldn't want to update one of them, because not only are they denormalized enough that you'd need to update N other rows, but expressing the constraints as triggers in a SQL database, as a derivation of the real integrity rules in the source normalized schema, would be ugly (to say the least). Thus the ETL (extract-transform-load) as basically a big function over the original database.
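To make the "ETL as basically a big function over the original database" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, region, invoice, invoice_line, dim_customer, fact_sales) and the sample rows are invented for illustration; they do not come from anyone's actual schema in this thread.

```python
import sqlite3

def build_star(conn):
    """Rebuild the star tables from scratch as a pure function of the source tables."""
    conn.executescript("""
        DROP TABLE IF EXISTS dim_customer;
        DROP TABLE IF EXISTS fact_sales;

        -- Denormalized dimension: the region name is folded into the customer row.
        CREATE TABLE dim_customer AS
        SELECT c.customer_id, c.name, r.region_name
        FROM customer c JOIN region r ON r.region_id = c.region_id;

        -- Fact table: one row per invoice line, carrying keys and a measure.
        CREATE TABLE fact_sales AS
        SELECT l.invoice_no, l.line_no, i.customer_id, i.invoice_date,
               l.quantity * l.unit_price_cents AS amount_cents
        FROM invoice_line l JOIN invoice i ON i.invoice_no = l.invoice_no;
    """)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical normalized source schema.
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                           region_id INTEGER REFERENCES region(region_id));
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customer(customer_id),
                          invoice_date TEXT);
    CREATE TABLE invoice_line (invoice_no INTEGER REFERENCES invoice(invoice_no),
                               line_no INTEGER, quantity INTEGER,
                               unit_price_cents INTEGER,
                               PRIMARY KEY (invoice_no, line_no));
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO customer VALUES (1, 'Acme', 1), (2, 'Bolt', 2);
    INSERT INTO invoice VALUES (100, 1, '2004-06-15'), (101, 2, '2004-06-16');
    INSERT INTO invoice_line VALUES (100, 1, 2, 1000), (100, 2, 1, 500),
                                    (101, 1, 3, 400);
""")

build_star(conn)

# A typical reporting query: one join from the fact table to a dimension.
totals = dict(conn.execute("""
    SELECT d.region_name, SUM(f.amount_cents)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.region_name
""").fetchall())
```

Because the star tables are dropped and rebuilt from the source each time, nothing ever updates them in place, which is one way to sidestep the update problem described above.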
> and that's fine behind closed doors. I'm thankful that there is less of > that in public on this list than there was when I started (I'm not sure now > what to do with the balls I had to grow at that time, but pleased that I no > longer need them to chat here ;-) I won't ask where they go when you're not using them, or whether you're still able to regrow them at will like the gender-changing frogs whose DNA provided the catalyst for the dino-crisis in Jurassic Park... oh wait, I guess I just begged the question I was too demure to ask directly. :-\
I do wonder whatever happened to Bob... maybe he's reading, maybe not. I can't say I miss the abuse, but do think he knew his stuff relationally.
- Eric
Dawn M. Wolthuis - 17 Jun 2004 00:43 GMT > > Is the star join a relational concept? I heard someone suggest that > > fact-dimension tables with star schema is bad design, but I forget the [quoted text clipped - 27 lines] > > - Eric Yes, and I feel bad about him leaving. I didn't let him bully me off the list and I think that contributed to him either leaving or going into a silent state. I figured the list was big enough for both of us and I also think he knew a ton about relational theory and would prefer to learn from him than have him gone. I don't miss being called names constantly, but I hope he is doing well. So, Bob B, we miss you (even if not your abuse). If you need me to leave before you return, let me know and I will bow out.
--dawn
Tony - 15 Jun 2004 21:48 GMT > > When I built a "data mart" in Oracle as a star schema, I included all the > > primary and foreign key constraints, even though it slowed down loading. [quoted text clipped - 4 lines] > > You were lucky. Yes, and the harder he works on database design, the luckier he gets ;-)
Eric Kaun - 16 Jun 2004 19:49 GMT > >OK, my mistake - I thought you meant the formulae themselves were axioms. > >You're referring to underlying assumptions, which of course are. > > Some of the formulae may have to be axioms too. If you need to assume, > then it's an axiom, if you can derive from your previous assumptions > then it's a theorem. Yes, agreed...
> >Right, but cascading delete is different than that. Constraints encode what > >you just said ("A cannot exist without B"); cascades are useful shorthands [quoted text clipped - 4 lines] > should be *implicit* within the database. You're turning it into an > *external* constraint - putting it where it does NOT belong! I don't understand your distinction between "implicit" and "external" here. External to what? In relational, both are part of the database, which includes both relations (actually relvars, relation-typed variables which are updated with new relation values) and constraints. In most businesses there are rules which bind multiple relations; I'm sure you have something similar in Pick, though it may be enforced by the application. Using first-order logic over relvars, you can specify most of these (if not all).
Haven't you ever seen a Pick app where deleting from (or updating) a record in FILE A requires a corresponding delete/update in FILE B, yet you don't have the ability to encode that in the file or dictionary?
I think you're suggesting that the data structures themselves should encode the constraints, which gets you into dangerous territory, leading to novel data structures for each individual enterprise. That's fine as long as the query and update operators stay consistent, but you'd quickly find constraints undoing that, leaving you with custom persistence and no standard at all. Remember that even foreign and primary key constraints are just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a constraint declaration.
> >I may have expressed myself badly. What relational theory says is that > >statements about line items on an invoice state truths about values which [quoted text clipped - 5 lines] > > I think we're having a bit of fun here :-) Yes, apparently we both have a sick notion of fun. :-)
> You're saying you want to > extract data from certain "columns" without caring what the primary key > is. Fine - no problem there. Ignore the columns you're not interested > in. But then why include them at all? Certainly I can ignore attributes, for example in updating attribute A I ignore attribute B. Consistently treating a set of Pick attributes as a group (e.g. the line item attributes), while they're part of INVOICE, seems logically wrong; those attributes are different than, for example, the INVOICE_DATE.
> I'm saying that deleting the primary key should delete all related rows > - even those in other tables! If your analyst forgot to specify a > cascading delete (and you say that they're external to the theory, > anyway), what you're saying is that the theory FAILS to enforce data > integrity in that you're using something external to theory to keep the > tables in sync. Nothing external about it. In Pick the integrity of the files is enforced by the application, yet you don't regard the app as external (at least I've seen arguments to the contrary). Constraints are different from relations because they make statements about those relations. Both are integral to relational.
> Pick just stores it all together so that taking out the primary key > takes out everything else. A fine shorthand, and again the CASCADE DELETE (not always what you want, by the way) is simple enough to do, and even to add in later (unlike in Pick, where you have to make that decision up front).
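For readers who haven't seen it spelled out, a cascading delete can be sketched as follows with Python's sqlite3 module. The invoice and invoice_line tables are made up for the example, and SQLite is just a convenient stand-in for whichever SQL DBMS you use; note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_no   INTEGER PRIMARY KEY,
        invoice_date TEXT NOT NULL
    );
    CREATE TABLE invoice_line (
        invoice_no  INTEGER NOT NULL
                    REFERENCES invoice(invoice_no) ON DELETE CASCADE,
        line_no     INTEGER NOT NULL,
        description TEXT,
        PRIMARY KEY (invoice_no, line_no)
    );
    INSERT INTO invoice VALUES (100, '2004-06-16');
    INSERT INTO invoice_line VALUES (100, 1, 'widget'), (100, 2, 'gadget');
""")

# The constraint itself: a line item cannot exist without its invoice.
try:
    conn.execute("INSERT INTO invoice_line VALUES (999, 1, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The cascade: deleting the parent row removes its line items too.
conn.execute("DELETE FROM invoice WHERE invoice_no = 100")
remaining = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
```

The foreign key is the constraint ("A cannot exist without B"); the ON DELETE CASCADE clause is the optional shorthand layered on top of it, and it lives in the table declaration rather than in application code.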
I object less to this than you'd expect; I can see some cases where this buys you a short-term gain. I just see little long-term gain, and expect long-term cost. I've been trying to think of past databases I've worked on, and whether MVs would have bought me anything. Haven't found anything yet... in the few cases where multi-descriptions or multi-coding would have helped, I had cross-business unit and internationalization issues that would have prevented leveraging them anyway. And I can remember a few cases where properly treating a simple code as its own "noun", rather than an adjective, saved me much work later.
Short version: I see adjectives "becoming" relations fairly frequently. I see relations which remain "unused" as such infrequently. Of course, I'm aware that our perceptions are less than objective, and that the lexicons in our head guide our observations more than they should.
> I think I know what you mean though, when you say "gets a little > murkier". Except, in practice, it doesn't. "customer" is a noun - it > gets its own FILE. "invoice" likewise. "line item" - is it a noun or > adjectival clause? Pick Business Analysis would unhesitatingly place it > in the category of adjectival clause. But I know why you would want to > treat it as a noun. And I can see the desire to make it an adjective - believe me, I understand the object-oriented view, the desire to treat the entire business notion as a single object. But I've been bitten too much by doing so, and rarely by "overnormalizing" - and I can usually see an impending need to "add more intelligence" to that "attribute."
> Probably because it makes the General Ledger so much easier :-) you want > to analyse by line-item, and not by invoice. Actually, that's not [quoted text clipped - 8 lines] > at all. Just a smidgeon more work for the database (but rather more > mental contortion for the programmer). And I certainly understand the development-time advantage in reports and GUI screens that a list attribute gives you. I have no doubt that Pick leverages the MV paradigm far, far more than either SQL or SQL libraries leverage relational (or even SQL, for that matter). There are better environments and libraries, but they're far from good.
If you read Michael Jackson (not the king of pop, not the beer expert, but the English software engineer), he has an interesting approach to business (domain) analysis - and it advocates predicates prior to (or instead of) object analysis. It's interesting not just because it accords more with relational (a side benefit), but because it treats "phenomena", modeled by predicates and "owned" by different domains, as the basis for design.
> But I've been thinking about a few other things while this reply has > been sitting half-composed on my computer ... Relational Theory is all > about capturing *data*. BUT - a lot of information is *metadata* which > an RDBMS is incapable of storing as such. We were discussing ordering - > an RDBMS only captures this - as data - if the analyst thinks it > important. A Pick database captures it as a matter of course. True. An avenue for discussion might also be what other metadata is useful, other than order. I think that more general question would cut more to the heart of why different data models appeal in different ways.
> And constraints - I categorise them as "natural constraints" and > "business constraints". You can't have an invoice line item without an [quoted text clipped - 4 lines] > "business constraint". You seem to think that should be captured as > *data*. Pick captures it as *metadata*. Given that relational advocates (at least in recent writings by Date) a system catalog that is also composed of relations (and which effectively represents a partial second-order relational algebra/calculus), I'd say that relational definitely wants all data, even metadata, as relations and constraints. I have no particular reason to think that that's not desirable, but it also begs the question: what metadata is there, what's useful, and how does the importance of a given "type" of metadata influence the utility of a given data model?
There may be research on such... just haven't stumbled across it.
> Now compare the amount of *metadata* available to Pick and/or > relational. It doesn't matter what your database is, the data in it is, > as far as the dbms is concerned, a meaningless "blob". To optimise > performance, storage, whatever, the only thing available of any use to > the dbms is *metadata*. Which Pick has in abundance. What, other than ordering?
> That's why I describe Pick as a superset of relational - it can convert > metadata into data and present it to the app. True enough about the ordering, but I'd argue that without any constraints (ordering is implicit), Pick doesn't offer much else. I have to admit not knowing enough about the dictionary, but that seems to be functional transformation, not actual constraints on what's placed into a file... and in particular no constraints that cross multiple files. I think relational constraints can be much, much more descriptive (as well as being proscriptive) - far better than SQL would let on.
> It can also USE the metadata to optimise itself. How so? I thought the programmer had to make the optimization, by choosing what data is retrieved at one time by virtue of being in the same file? I may be missing something.
> Relational can only store this sort of > information as *data*, and as such the information is not available to > the dbms for its internal use. Well, if the catalog is relational (as it should be), then I'd say this isn't quite correct. One could even enforce database design / naming standards using constraints over system catalog relations!
In any event, it would seem useful even in Pick if there were certain "implicit" files that represented the files in the system - for example, a file called FILE with one record per file, and perhaps an attribute called ATTRIBUTES containing a list of attributes... anyway, you can probably see the utility of that for app generation, enforcing standards, and even implementing the Pick engine (and extensions/plugins). Date advocates that, and I believe that Dataphor uses that heavily.
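As a rough illustration of querying the catalog like any other data, here is a sketch against SQLite, whose sqlite_master table and table_info pragma expose table and column definitions. The lower_snake_case rule and the sample tables are invented for the example, and this is a procedural check run by the application, not the declarative constraint over catalog relations that Date describes.

```python
import re
import sqlite3

# Hypothetical naming standard: lower_snake_case identifiers only.
SNAKE = re.compile(r"^[a-z][a-z0-9_]*$")

def naming_violations(conn):
    """Return (table, column) pairs that break the standard; column is None
    when the table name itself is the problem."""
    bad = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        if not SNAKE.match(table):
            bad.append((table, None))
        # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
        for col in conn.execute(f'PRAGMA table_info("{table}")'):
            if not SNAKE.match(col[1]):
                bad.append((table, col[1]))
    return bad

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY, invoice_date TEXT);
    CREATE TABLE "InvoiceLine" (invoice_no INTEGER, "LineNo" INTEGER);
""")

violations = naming_violations(conn)
```

The point is only that once the catalog is queryable with the same operators as user data, design standards become something you can test for rather than merely document.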
I do wish the Dataphor folks would chime in... it would be nice to hear something from a real relational engine.
- erk
Dawn M. Wolthuis - 17 Jun 2004 00:34 GMT > > >OK, my mistake - I thought you meant the formulae themselves were axioms. > > >You're referring to underlying assumptions, which of course are. [quoted text clipped - 36 lines] > just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a > constraint declaration. I suspect that Wol was talking about one of the more common relationships between relations -- that of parent and child. A parent-child relationship is designed and then specified and no additional constraints or logic of any sort is required to ENSURE there are no children without a parent and if the parent goes, the children are gone too. Of course this can be accomplished handily in a SQL-based solution, but it isn't quite as intuitive.
> > >I may have expressed myself badly. What relational theory says is that > > >statements about line items on an invoice state truths about values which [quoted text clipped - 9 lines] > > Yes, apparently we both have a sick notion of fun. :-) Count me in on the sick fun, but, never mind -- I deleted my first response, which is just as well.
> > You're saying you want to > > extract data from certain "columns" without caring what the primary key [quoted text clipped - 6 lines] > they're part of INVOICE, seems logically wrong; those attributes are > different than, for example, the INVOICE_DATE. I'm missing your point. INVOICE_DATE is an attribute of an INVOICE and INVOICE_LINE_ITEM is an attribute of an INVOICE, even if it has both cardinality and degree greater than 1.
> > I'm saying that deleting the primary key should delete all related rows > > - even those in other tables! If your analyst forgot to specify a [quoted text clipped - 5 lines] > Nothing external about it. In Pick the integrity of the files is enforced by > the application, In the case that Wol is talking about -- the parent-child relationship, it is the database that enforces integrity, not the application.
> yet you don't regard the app as external (at least I've > seen arguments to the contrary). all a matter of definition
> Constraints are different from relations > because they make statements about those relations. Both are integral to > relational. There are constraints that are part of the relation -- an attribute being part of a relation is a constraint of sorts, for example.
> > Pick just stores it all together so that taking out the primary key > > takes out everything else. > > A fine shorthand, and again the CASCADE DELETE (not always what you want, by > the way) is simple enough to do, and even to add in later (unlike in Pick, > where you have to make that decision up front). Yes, but it does get missed often and application developers have to know whether such logic is left to the app or is encoded in the database.
> I object less to this than you'd expect; I can see some cases where this > buys you a short-term gain. I just see little long-term gain, and expect > long-term cost. It might not be this particular feature, but I suspect it is a part of what makes for agile software development -- it is easy to make a mess of Pick design over time, but there are an amazing number of twenty-year-old systems out there (in need of database refactoring, no doubt).
> I've been trying to think of past databases I've worked on, > and whether MVs would have bought me anything. Haven't found anything yet... [quoted text clipped - 3 lines] > properly treating a simple code as its own "noun", rather than an adjective, > saved me much work later. I think some code-offs in the future might be in order.
> Short version: I see adjectives "becoming" relations fairly frequently. I > see relations which remain "unused" as such infrequently. Of course, I'm > aware that our perceptions are less than objective, and that the lexicons in > our head guide our observations more than they should. Language does A LOT to guide our perceptions. I was just in a meeting with a project manager who is implementing a PICK application (although he doesn't know that) when the last project he managed was SAP on Oracle. He said that comparatively this was a piece of cake except that it is so different that he doesn't know if he is asking all of the right questions. I wanted to tell him to ask the questions that he would have if he had never been in an SAP or Oracle shop, but opted not to say that. It seems to me that relational thinking trains something out of us rather than training something into us. Just thinking out loud.
> > I think I know what you mean though, when you say "gets a little > > murkier". Except, in practice, it doesn't. "customer" is a noun - it [quoted text clipped - 8 lines] > "overnormalizing" - and I can usually see an impending need to "add more > intelligence" to that "attribute." It is the combination of the initial structure plus the ability to make changes over time that helps to handle these impending changes. I agree with your statement as it relates to 2nd & 3rd normal forms (functional dependency issues).
> > Probably because it makes the General Ledger so much easier :-) you want > > to analyse by line-item, and not by invoice. Actually, that's not [quoted text clipped - 13 lines] > the MV paradigm far, far more than either SQL or SQL libraries leverage > relational (or even SQL, for that matter). One person mentioned that Pick is archaic, old-fashioned, or whatever. That is true and you should not give it too much credit, especially on the GUI side (given that didn't exist in the 70's and there has been little enhancement to Pick in the past few decades -- some will disagree with me). My interest in it for the future is as a better starting point for the industry than the SQL-DBMS's are. I can see that it has provided its users with better agility than the SQL-DBMS and that it "thinks like people think" about data (way too vague, I realize).
> There are better environments and > libraries, but they're far from good. [quoted text clipped - 31 lines] > relational definitely wants all data, even metadata, as relations and > constraints. Yes, that is my understanding of the theory (not the practice)
> I have no particular reason to think that that's not desirable, > but it also begs the question: what metadata is there, what's useful, and [quoted text clipped - 10 lines] > > What, other than ordering? The logical structures are built on lots of derived data (sort-of analogous to stored procedures).
<snip> I'll stop there - I can't get through the entire thing -- you guys can last a long time! --dawn
Anthony W. Youngman - 19 Jun 2004 01:31 GMT >> But the need for a cascading delete is metadata - information that >> should be *implicit* within the database. You're turning it into an [quoted text clipped - 7 lines] >similar in Pick, though it may be enforced by the application. Using >first-order logic over relvars, you can specify most of these (if not all). See lower in my original post - an invoice line can't exist without an invoice, whereas a car can exist without an owner ...
>Haven't you ever seen a Pick app where deleting from (or updating) a record >in FILE A requires a corresponding delete/update in FILE B, yet you don't >have the ability to encode that in the file or dictionary? It probably happens, but MUCH less than relational. So much so, that the need has never been worth doing anything about :-)
>I think you're suggesting that the data structures themselves should encode >the constraints, which gets you into dangerous territory, leading to novel [quoted text clipped - 15 lines] >they're part of INVOICE, seems logically wrong; those attributes are >different than, for example, the INVOICE_DATE. Because statistics tell me that if I access a line item, then I am highly likely to want to access the invoice date at the same time. The cost of retrieving it unnecessarily turns out to be worth it in making it available in case I want it.
But that's a physical thing that relational theory refuses to address...
>> I'm saying that deleting the primary key should delete all related rows >> - even those in other tables! If your analyst forgot to specify a [quoted text clipped - 8 lines] >because they make statements about those relations. Both are integral to >relational. No they are NOT enforced by the application. They are enforced by the DESIGN. An invoice line has as its primary key the invoice number (plus an *implicit* sequence number). Delete that primary key and all associated data disappears including all the invoice lines.
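The "enforced by the DESIGN" idea can be sketched with an ordinary nested structure. This is plain Python standing in for a multivalued record; it is not Pick's storage format or any Pick API, and the field names are invented.

```python
# Each invoice record carries its line items as a nested, ordered list.
# A line's identity is (invoice number, position in the list): the sequence
# number is implicit in the ordering rather than stored as a column.
invoices = {
    100: {
        "invoice_date": "2004-06-16",
        "lines": [
            {"item": "widget", "qty": 2},
            {"item": "gadget", "qty": 1},
        ],
    },
    101: {
        "invoice_date": "2004-06-17",
        "lines": [{"item": "sprocket", "qty": 5}],
    },
}

def line_keys(store):
    """List every line item as (invoice_no, implicit sequence number)."""
    return [(no, seq)
            for no, inv in store.items()
            for seq, _ in enumerate(inv["lines"], start=1)]

before = line_keys(invoices)

# Removing the invoice key removes its lines with it; there is no separate
# child collection in which an orphaned line could survive.
del invoices[100]

after = line_keys(invoices)
```

Nothing here stops a different record elsewhere from holding a stale reference to invoice 100, which is the cross-file case the relational side of the thread is asking about.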
>> Pick just stores it all together so that taking out the primary key >> takes out everything else. > >A fine shorthand, and again the CASCADE DELETE (not always what you want, by >the way) is simple enough to do, and even to add in later (unlike in Pick, >where you have to make that decision up front). Yup, you do "make that decision up front", but it's an obvious decision. I know you can't design something to be fool-proof, but you really do have to be an idiot to make a design blunder of this magnitude...
>I object less to this than you'd expect; I can see some cases where this >buys you a short-term gain. I just see little long-term gain, and expect [quoted text clipped - 10 lines] >aware that our perceptions are less than objective, and that the lexicons in >our head guide our observations more than they should. And Pick treats foreign keys as "just another attribute", so I'm sorry but I'd just dismiss your "adjectives become relations" with "so what!". We don't see it as a problem.
>> But I've been thinking about a few other things while this reply has >> been sitting half-composed on my computer ... Relational Theory is all [quoted text clipped - 6 lines] >other than order. I think that more general question would cut more to the >heart of why different data models appeal in different ways. Part of the problem is that relational theory explicitly ignores implementation. The main reason for keeping metadata as metadata not data, is that it assists greatly in optimisation - an implementation issue.
Keeping metadata is worthless in the relational paradigm, which I suspect is why so many people have trouble with me repeatedly talking about statistics :-)
>> And constraints - I categorise them as "natural constraints" and >> "business constraints". You can't have an invoice line item without an [quoted text clipped - 23 lines] > >What, other than ordering? Which data is *tightly* linked to other data, and which data is *loosely* linked. Information which helps it guarantee that after one disk access, the next few requests can be met from cache not disk ...
>> That's why I describe Pick as a superset of relational - it can convert >> metadata into data and present it to the app. [quoted text clipped - 6 lines] >constraints can be much, much more descriptive (as well as being >proscriptive) - far better than SQL would let on. The dictionary doesn't declare constraints between FILEs, but seeing as a FILE usually contains contents equivalent to several relational tables, it has no need to ... that's why Pick doesn't have constraints like relational does - even those of us who understand relational constraints just can't see the point of implementing them in Pick :-)
>> It can also USE the metadata to optimise itself. > [quoted text clipped - 17 lines] >implementing the Pick engine (and extensions/plugins). Date advocates that, >and I believe that Dataphor uses that heavily. Sounds interesting ... and from another post of mine, you'll see that if I've understood you correctly, Pick does indeed have something like that ... or if it doesn't then it would be easily implemented if it made any sense within the model.
Cheers, Wol
Alfredo Novoa - 10 Jun 2004 14:43 GMT >> Copernicus : orbit == circle >> Kepler : orbit == ellipse [quoted text clipped - 6 lines] >I don't think the above are axioms in the mathematical sense, though I could >be wrong. Of course they are not!
An axiom is a proposition regarded as self-evidently true without proof.
http://mathworld.wolfram.com/Axiom.html
Regards Alfredo
Bill H - 10 Jun 2004 23:32 GMT Alfredo:
[snipped]
>> Let's take the evolution of that theory I keep on throwing out as an >> example. [quoted text clipped - 9 lines] >I don't think the above are axioms in the mathematical sense, though I could >be wrong.
> An axiom is a proposition regarded as self-evidently true without > proof. > > http://mathworld.wolfram.com/Axiom.html I think this definition is too rigid. Thinking of an axiom this rigidly often produces a rigid, narrow analysis. :-)
An axiom can easily be thought of as both a self-evident truth (so what's self-evident?) or an assumption to use to base a further analysis. Newton's 3 laws of motion are generally referred to as axioms that are used as assumptions (or postulates) for further theoretical analysis.
Since databases are natural companions to multiple environments (business, gov't, etc) we shouldn't be limiting our inquiry with such rigid definitions of useful words.
Bill
Laconic2 - 11 Jun 2004 12:20 GMT > An axiom can easily be thought of as both a self-evident truth (so what's > self-evident?) or an assumption to use to base a further analysis. Newton's > 3 laws of motion are generally referred to as axioms that are used as > assumptions (or postulates) for further theoretical analysis. People in this forum tend to confuse "axiom" and "hypothesis".
But then, they also tend to confuse "math" and "science".
Alfredo Novoa - 14 Jun 2004 15:46 GMT >> An axiom is a proposition regarded as self-evidently true without >> proof. >> >> http://mathworld.wolfram.com/Axiom.html > >I think this definition is too rigid. No, it is correct.
>An axiom can easily be thought of as both a self-evident truth (so what's >self-evident?) Absolutely trivial and self contained. You don't need to operate with the statement to see that it is true.
For instance, here is the first of Euclid's postulates:
"A straight line segment can be drawn joining any two points."
This is contained in the line definition. Nothing new.
>or an assumption to use to base a further analysis. Newton's >3 laws of motion are generally referred to as axioms that are used as >assumptions (or postulates) for further theoretical analysis. It is a very bad use of the terms. Postulates are not assumptions, postulates are axioms: truths.
Newton's 3 laws of motion are not evident, self-consistent nor true.
>Since databases are natural companions to multiple environments (business, >gov't, etc) we shouldn't be limiting our inquiry with such rigid definitions >of useful words. Rigid and correct are different things.
Regards Alfredo
Todd B - 14 Jun 2004 22:05 GMT > >> An axiom is a proposition regarded as self-evidently true without > >> proof. [quoted text clipped - 4 lines] > > No, it is correct. Yep, it sure is a good statement of the concept.
> >An axiom can easily be thought of as both a self-evident truth (so what's > >self-evident?) [quoted text clipped - 7 lines] > > This is contained in the line definition. Nothing new. Axioms are based within a system of thought. For example, Euclid was thinking about planar geometry. Is it possible that if your straight line bent by space could not connect two points in that space? Ah, then you might think, "Well then, it's not a straight line anymore." But from whose perspective? I'm thinking about Einsteinian physics, or even touching on n-dimensional concepts. Axioms just set down rules (and the rules don't have to make 'sense' in the real world) for a logical system. They are not 'true' inherently to the real world. They simply are a base for logical deduction. Although looking back at your post, we may be thinking the same thing.
> >or an assumption to use to base a further analysis. Newton's > >3 laws of motion are generally referred to as axioms that are used as > >assumptions (or postulates) for further theoretical analysis. Referring to this earlier post, I'd say: Newton's laws are not postulates (axioms). They are theorems in physics based upon his original hypotheses. These physical theorems, as far as I know, are different than mathematical theorems, where the former are elucidations about the physical world we perceive, the latter are conclusions derived from the original axioms with certain rules applied to those axioms. Newton's laws, in other words, make bad examples in this discussion about axioms.
> It is a very bad use of the terms. Postulates are not assumptions, > postulates are axioms: truths. Well said, but, truths in the real world, or within the system? Because I can build any logical system with a set of axioms. They will always be true (if they don't contradict each other) because that's where I started. I made them true, like an act of God. I said, "This is how it is; where do we go from here." IMO, I think that is what the Wolfram definition is stating rather clearly.
> Newton's 3 laws of motion are not evident, self-consistent nor true. Correct.
Todd "Nothing is True" -- not a Zen koan, but very paradoxically self-referential
Alfredo Novoa - 15 Jun 2004 17:27 GMT >Axioms are based within a system of thought. For example, Euclid was >thinking about planar geometry. Is it possible that if your straight >line bent by space could not connect two points in that space? Ah, >then you might think, "Well then, it's not a straight line anymore." >But from whose perspective? From the planar geometry perspective.
> I'm thinking about Einsteinian physics, >or even touching on n-dimensional concepts. But this is not the case. Euclid's postulates are about planar geometry and only about that.
> Axioms just set down >rules (and the rules don't have to make 'sense' in the real world) for >a logical system. They are not 'true' inherently to the real world. >They simply are a base for logical deduction. Although looking back >at your post, we may be thinking the same thing. Yes, I completely agree with you. Axioms are independent of the physical world.
>Referring to this earlier post, I'd say: Newton's laws are not >postulates (axioms). They are theorems in physics based upon his >original hypotheses. And in observations of the physical world.
>> It is a very bad use of the terms. Postulates are not assumptions, >> postulates are axioms: truths. > >Well said, but, truths in the real world, or within the system? Within the system. We cannot know if something is true in the physical world.
>Because I can build any logical system with a set of axioms. They >will always be true (if they don't contradict each other) because >that's where I started. If they contradict then they are not axioms.
> I made them true, like an act of God. They were always true because you are saying the same in two ways.
When you say line you are saying the shortest join of two points. Axioms are redundant.
>Todd >"Nothing is True" -- not a Zen koan, but very paradoxically >self-referential Like: all generalizations are bad :-)
or
There are two groups of people in the world; those who believe that the world can be divided into two groups of people, and those who don't. :-)
Regards Alfredo
Paul - 18 Jun 2004 22:42 GMT >>>or an assumption to use to base a further analysis. Newton's >>>3 laws of motion are generally referred to as axioms that are used as [quoted text clipped - 8 lines] > applied to those axioms. Newton's laws, in other words, make bad > examples in this discussion about axioms. I think we have to distinguish between Newton's laws as a practical way to discuss reality, and Newton's theory as a mathematical model.
The mathematical model may be inspired by reality but it exists on its own as well. In this sense the postulates are axioms.
Reality is the semantics, mathematical models are the syntax. Mathematical models always need a human to map them to reality.
Also I think there is a difference between theorems and theories: Theorems are purely mathematical, they can be proved. If they haven't yet been proved they are just conjectures. Theories are the maps between models and reality, they can only be disproved (not proved).
Paul.
Alfredo Novoa - 20 Jun 2004 02:37 GMT >I think we have to distinguish between Newton's laws as a practical way >to discuss reality, and Newton's theory as a mathematical model. > >The mathematical model may be inspired by reality but it exists on its >own as well. In this sense the postulates are axioms. They are not postulates nor axioms, they are assumptions.
Regards Alfredo
Anthony W. Youngman - 07 Jun 2004 23:29 GMT >> >> This theory will then be the equivalent of Kepler and Newton >discovering [quoted text clipped - 25 lines] >there is a lot of "tossing stuff in and out" going on because there is not >that match with reality at each point. Fine. This seems as good a place as any to say what I thought of after that previous post.
This is for all those people who think "if I don't understand it, then it must be wrong" (is Tony listening :-)
Now. It's not words of one syllable, I'm afraid, but I'm trying to explain something very heavy as simply as I can.
Let's start by defining what the words mean.
A "theory", a "model" and an "axiom" are ALL things that have not been proven correct. BUT - and here we hit our first point of confusion - with the exception of a "mathematical theory", they are all things that CANNOT be proven correct. Once proven, a mathematical theory becomes a "theorem", but a mathematical axiom by definition cannot be proven true, scientific theories and models can only be shown to be false, and a mathematical model cannot be proven to be true because it relies on axioms which cannot be proven true.
Okay. Now ALL models (scientific or mathematical) belong to the set "IF {axioms} THEN {theorems}". Read C&D's twelve rules. Ask yourself which rules are axioms, which rules are convenient constraints, and what else? Basically, what fundamental mathematical category does each rule fall into?
I think Codd (maybe Date) is even on record as saying that various rules were "convenient constraints". In other words, they are axioms with as much validity as Euclid's "parallel lines never meet" - they make the maths easy with no real grounding in reality.
Once you've identified those axioms, ask yourself "what proof do we have in favour of them?" and DON'T FORGET that you CANNOT use logic! "if/then" is NOT TRANSITIVE! Just because the theorems are true doesn't mean you can conclude the axioms are true - indeed - it's the exact opposite - you can only prove the theorems are true BECAUSE you have ASSUMED the axioms are true.
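The point above can be checked mechanically. What the post calls "not transitive" is, in the usual vocabulary, the fact that an implication cannot be run backwards (implication itself is transitive, but it is not reversible). The following is a minimal sketch, assuming ordinary two-valued propositional logic; the names `implies` and `is_valid` are made up for this illustration. It enumerates the truth table and confirms that "axioms imply theorems, axioms hold, so theorems hold" is a valid argument form, while "axioms imply theorems, theorems hold, so axioms hold" is not.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def is_valid(premises, conclusion) -> bool:
    """An argument form is valid if the conclusion holds in every
    truth-table row in which all of the premises hold."""
    for axioms, theorems in product([False, True], repeat=2):
        if all(p(axioms, theorems) for p in premises):
            if not conclusion(axioms, theorems):
                return False
    return True


# Modus ponens: from (axioms -> theorems) and axioms, conclude theorems.
modus_ponens = is_valid(
    [lambda a, t: implies(a, t), lambda a, t: a],
    lambda a, t: t,
)

# Affirming the consequent: from (axioms -> theorems) and theorems,
# try to conclude axioms.
affirming_consequent = is_valid(
    [lambda a, t: implies(a, t), lambda a, t: t],
    lambda a, t: a,
)

print("modus ponens valid:", modus_ponens)
print("affirming the consequent valid:", affirming_consequent)
```

The second form fails on the row where the axioms are false and the theorems happen to be true, which is exactly the case the post is warning about.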
LOOK at the subject of this thread again. It is an AXIOM of relational theory that data comes in tuples. Show me that that's true! And because it's an axiom, mathematics itself tells you that logic CAN not give you an answer!
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Tony - 08 Jun 2004 10:51 GMT > This is for all those people who think "if I don't understand it, then > it must be wrong" (is Tony listening :-) Do you mean me? If so, I'm not sure what you are referring to. Do you mean that I think "If I (Tony) don't understand it, then it must be wrong"? If so, where did you get that idea?
> Now. It's not words of one syllable, I'm afraid, but I'm trying to > explain something very heavy as simply as I can. Thanks, Professor ;-)
> LOOK at the subject of this thread again. It is an AXIOM of relational > theory that data comes in tuples. Show me that that's true! And because > it's an axiom, mathematics itself tells you that logic CAN not give you > an answer! Where did you get that axiom from that "data comes in tuples"? Codd's rule #1 says that all data in the database is to be REPRESENTED in only one way: as values in attributes of tuples. It is a prescribed RULE for building relational databases, it is not a claim that anything in the real world "comes in tuples". We have a similar rule in English that all objects are represented by words made up from the 26 letters of the alphabet; it is not an "axiom" that says that objects "come in" combinations of the letters A-Z.
Your problem is that you consistently confuse data and reality.
Of course, this all doesn't mean that tuples are the BEST way to represent data, or even that ALL data can be represented by tuples. But you could easily disprove a theorem that said that "all data can be represented by tuples" by finding a counter-example. Bet you can't though!
Paul - 08 Jun 2004 12:44 GMT > Where did you get that axiom from that "data comes in tuples"? Codd's > rule #1 says that all data in the database is to be REPRESENTED in > only one way: as values in attributes of tuples. Here's a thought: where do constraints fit in? They are kind of like data, since they give you some information about the real-world system you are modelling.
For a fixed snapshot of a database I guess they don't add anything extra, since the tuples satisfy the constraints already. But if you think about a database evolving over time, they do add information. For example suppose you had a constraint "Age < 60" on some relation/column. Then you could ask the question: "Can I add a person aged 65 to my database?" Now in current DBMSs I think you'd do that by trying it and seeing if you get an error. (or maybe by querying the system tables).
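The "try it and see if you get an error" approach can be sketched as follows. This is only an illustration, assuming SQLite through Python's standard `sqlite3` module; the table `people` and the helper `can_add` are hypothetical names, not anything from the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint "Age < 60" from the post, declared as a CHECK constraint.
conn.execute(
    "CREATE TABLE people (name TEXT NOT NULL, age INTEGER CHECK (age < 60))"
)


def can_add(connection, name, age):
    """Ask "could this row be added?" by attempting the insert and then
    rolling it back, so the table is left unchanged either way."""
    try:
        connection.execute(
            "INSERT INTO people (name, age) VALUES (?, ?)", (name, age)
        )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK constraint rejects the row.
        return False
    finally:
        connection.rollback()


print(can_add(conn, "Ann", 42))
print(can_add(conn, "Bob", 65))
```

The alternative mentioned in the post, querying the system tables, would in SQLite mean reading the stored `CREATE TABLE` text from `sqlite_master`, which gives the constraint back only as a string rather than as something that can be queried directly.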
In databases we assume anything that isn't recorded as true is false (closed world assumption). So maybe constraints give a stronger form of truth than tuples in this sense: If I have no-one aged 65 in my tuples I could say: "the real-world system I'm (partially) modelling may have people aged 65, but my database doesn't". But if I have a constraint "age < 60" it's like I'm making a stronger claim: that not only does my database have no-one aged 60 or over, but also the real-world situation I'm modelling has no-one aged 60 or over.
Another question: do current systems use the constraints when optimising queries? Would it be feasible for them to do so? For example suppose I have a billion people in my table, with the constraint "Age < 60". If I do "SELECT * FROM people WHERE age = 65" the optimizer could in theory use the constraint to quickly return an answer.
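As a toy illustration of the idea, and not a description of how any particular DBMS works, the shortcut could look like this. The function `query_age_equals` and its `age_upper_bound` parameter are invented for the sketch; the bound stands for a declared constraint of the form "age < bound".

```python
def query_age_equals(rows, target, age_upper_bound=None):
    """Return (matching_rows, rows_scanned) for the predicate age = target.

    If a declared constraint "age < age_upper_bound" contradicts the
    predicate, the answer must be empty and no row needs to be read.
    """
    if age_upper_bound is not None and target >= age_upper_bound:
        return [], 0  # the constraint alone proves the result is empty

    matches = []
    scanned = 0
    for row in rows:
        scanned += 1
        if row["age"] == target:
            matches.append(row)
    return matches, scanned


people = [{"name": f"p{i}", "age": i % 60} for i in range(1000)]

# Without the constraint every row is examined.
print(query_age_equals(people, 65))
# With "age < 60" known, the query is answered without a scan.
print(query_age_equals(people, 65, age_upper_bound=60))
```

The shortcut is only sound if the constraint is actually enforced on every write, which is the reason the information has to come from the declared constraint and not from a guess about the data.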
You could also think of examples where an index wouldn't be feasible:
constraint: "name NOT LIKE '%x%'"
query: "SELECT * FROM people WHERE name LIKE '%axw%'"
Paul.
Laconic2 - 08 Jun 2004 13:08 GMT > For a fixed snapshot of a database I guess they don't add anything > extra, since the tuples satisfy the constraints already. But if you [quoted text clipped - 3 lines] > database?" Now in current DBMSs I think you'd do that by trying it and > seeing if you get an error. (or maybe by querying the system tables). Yes.
In particular, the optimizer can use information made available by the constraints in order to generate additional correct strategies, ones that could not be guaranteed to be correct in the absence of such information.
In particular, entity integrity and referential integrity constraints can be used to "prove" that, in certain cases, "SELECT ALL" and "SELECT DISTINCT" will yield identical results. This can result in generating a faster strategy.
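A small demonstration of the entity-integrity case, assuming SQLite via Python's standard `sqlite3` module (the table and the helper `same_rows` are hypothetical): when the selected columns include the primary key, no two result rows can be equal, so `SELECT ALL` and `SELECT DISTINCT` must agree and the duplicate-elimination step could be skipped. When the key is projected away, the guarantee is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(1, "Ann"), (2, "Bob"), (3, "Ann")],
)


def same_rows(columns):
    """Compare SELECT ALL with SELECT DISTINCT for the given column list.

    `columns` is a trusted, hard-coded string in this sketch.
    """
    all_rows = conn.execute(
        f"SELECT ALL {columns} FROM people ORDER BY {columns}"
    ).fetchall()
    distinct_rows = conn.execute(
        f"SELECT DISTINCT {columns} FROM people ORDER BY {columns}"
    ).fetchall()
    return all_rows == distinct_rows


print(same_rows("id, name"))  # key included: results must coincide
print(same_rows("name"))      # key dropped: duplicates can appear
```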
The information that a given snapshot happens to conform to a constraint could be made available by examining the snapshot, rather than examining the constraint, but the cost of obtaining that knowledge would be prohibitive.
So a constraint that is known to be valid can be used to advantage, even in the context of a snapshot.
Paul - 08 Jun 2004 13:38 GMT > In particular, entity integrity and referential integrity constraints can > be used to "prove" that, in certain cases, > "SELECT ALL" and "SELECT DISTINCT" will yield identical results. This can > result in generating a faster strategy. OK, but uniqueness constraints and referential integrity constraints are a very small subset of all possible constraints. They're quite simple for a DBMS to understand and use. What about ones that are even a little bit more complicated? I guess the constraints mentioned above don't require knowledge of particular types or operators (other than equality), but ones like "Age < 60" do.
In general a constraint could be any expression in first order logic. And then to complicate matters further you've got non-relational operators (like "<") added in.
> The information that a given snapshot happens to conform to a constraint > could be made available by examining the snapshot, rather than examining the > constraint, but the cost of obtaining that knowledge would be prohibitive. Here's a thought: consider a database with the constraints "Age < 65" and "Age < 60". Should there be something to say this isn't normalised in some sense? I know that normalization and eliminating redundancy are different things but maybe there should be some kind of "constraint normalization"?
Paul.
Laconic2 - 08 Jun 2004 16:08 GMT > OK, but uniqueness constraints and referential integrity constraints are > a very small subset of all possible constraints. They're quite simple > for a DBMS to understand and use. What about ones that are even a little > bit more complicated? I guess the constraints mentioned above don't > require knowledge of particular types or operators (other than > equality), but ones like "Age < 60" do. I didn't mean to imply that all constraints were useful in the way I set forth. Just that some were.
The DBMS can make use of value limiting constraints to compress data better. For instance, if there is a column called
, ZIP_CODE CHAR(10)
(the tenth character is for the hyphen), and a value is to be stored that is CHAR(15), but the last five characters are blanks, a suitable DBMS could go ahead and store the value anyway, knowing that it can reconstruct the CHAR(15) value later, if necessary.
The same comment goes for a "field" defined as CHAR(10) by the way.
> In general a constraint could be any expression in first order logic. And > then to complicate matters further you've got non-relational operators > (like "<") added in. I don't understand. What makes "<" a non relational operator? I had been taught that "x < y" is a relation on x and y. This was in math, not comp. sci.
Eric Kaun - 09 Jun 2004 16:43 GMT > Here's a thought: consider a database with the constraints "Age < 65" > and "Age < 60". Should there be something to say this isn't normalised > in some sense? I know that normalization and eliminating redundancy are > different things but maybe there should be some kind of "constraint > normalization"? Implication is one part of it; since [Age<60] implies [Age<65], the latter is unnecessary. The relational model, by relying on such, makes more optimizations possible than some of the ad hocisms of SQL and the like - and they certainly get much more complex than this example...
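For the simple family of constraints in this example, the implication check is easy to sketch. The code below assumes every constraint has the form "age < b" for some bound b; `normalize_upper_bounds` is a made-up name, and general first-order constraints would need a real theorem prover instead.

```python
def normalize_upper_bounds(bounds):
    """Given constraints "age < b" for each b in bounds, return the one
    bound that must be kept and the list of bounds it makes redundant."""
    tightest = min(bounds)
    redundant = sorted(b for b in set(bounds) if b != tightest)
    return tightest, redundant


def implied_by(tight, loose, sample=range(0, 200)):
    """Brute-force check over a sample of ages that
    (age < tight) implies (age < loose)."""
    return all(age < loose for age in sample if age < tight)


kept, dropped = normalize_upper_bounds([65, 60])
print(kept, dropped)
print(implied_by(60, 65), implied_by(65, 60))
```

Here "Age < 60" is kept and "Age < 65" is reported as redundant, matching the observation that the weaker constraint adds no information once the stronger one is declared.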
An interesting related point is the overlap between types and constraints. In the above, isn't it really an example of an Age type, with values [0, 1, ..., 59]? (assuming integers here)
- erk
Eric Kaun - 09 Jun 2004 16:39 GMT > Where did you get that axiom from that "data comes in tuples"? Codd's > rule #1 says that all data in the database is to be REPRESENTED in [quoted text clipped - 4 lines] > 26 letters of the alphabet; it is not an "axiom" that says that > objects "come in" combinations of the letters A-Z. Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across in a new way... thanks.
Another, often-cited, is the difference between "flat" tables and relations, and the way people assume relations are 2-dimensional. Consider the following:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
Wow! A flat cube! Nifty! I was just too lazy to type out a tesseract (hypercube)...
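The same point can be made in a few lines of code. This is just a sketch (the function name `hypercube_vertices` is invented here): a relation with n attributes, each drawn from {0, 1}, lists the corners of an n-dimensional cube, however "flat" it looks when printed as rows.

```python
from itertools import product


def hypercube_vertices(n):
    """All n-tuples over {0, 1}: the vertices of an n-dimensional cube,
    in the same order as a binary count from 0 to 2**n - 1."""
    return list(product((0, 1), repeat=n))


cube = hypercube_vertices(3)       # 3 attributes: an ordinary cube
tesseract = hypercube_vertices(4)  # 4 attributes: the hypercube

for vertex in cube:
    print(*vertex)
print(len(tesseract), "tuples describe the tesseract")
```

The number of attributes sets the dimensionality of the thing described; the two-dimensional printout is only a presentation of the tuples.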
> Your problem is that you consistently confuse data and reality. > > Of course, this all doesn't mean that tuples are the BEST way to > represent data, or even that ALL data can be represented by tuples. Keep in mind that even if you say this, the orthogonal dimension is Type (Domain), which introduces wrinkles of its own.
> But you could easily disprove a theorem that said that "all data can > be represented by tuples" by finding a counter-example. Bet you can't > though! Yes, good point - find us that black swan.
- erk
Anthony W. Youngman - 10 Jun 2004 01:44 GMT >> Where did you get that axiom from that "data comes in tuples"? Codd's >> rule #1 says that all data in the database is to be REPRESENTED in [quoted text clipped - 7 lines] >Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across >in a new way... thanks. But in the reality we live in, all objects DO come in combinations of A-Z. So it has to be a theorem or an axiom. And if it's a theorem, from what axioms is it derived?
>> Your problem is that you consistently confuse data and reality. I may well be confused. But that's because I'm actually trying to understand the link between the two. After all, isn't that the subject of this thread? And if there IS no link, what the hell's the point of studying data, since it is no use to us here in reality, anyway :-)
>> Of course, this all doesn't mean that tuples are the BEST way to >> represent data, or even that ALL data can be represented by tuples. [quoted text clipped - 7 lines] > >Yes, good point - find us that black swan. I'd suggest going to visit the Serpentine in Hyde Park :-) You'll find plenty there.
Cheers, Wol
Tony - 10 Jun 2004 10:25 GMT > >> Where did you get that axiom from that "data comes in tuples"? Codd's > >> rule #1 says that all data in the database is to be REPRESENTED in [quoted text clipped - 7 lines] > >Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across > >in a new way... thanks. (Yes, I fear it is flawed too!)
> But in the reality we live in, all objects DO come in combinations of > A-Z. So it has to be a theorem or an axiom. And if it's a theorem, from > what axioms is it derived? IT ISN'T AN AXIOM OR A THEOREM!!!!!!!!!!! That's my point! I could today invent a new way of representing objects, using only the 12 letters A to L, or using shapes and colours, whatever. It would surely work, but it would be neither an axiom ("it is self-evidently true that all real-world objects come in combinations of the letters A to L") nor a theorem. It would just be a method for representing real-world objects.
Dawn M. Wolthuis - 10 Jun 2004 01:17 GMT > >> >> This theory will then be the equivalent of Kepler and Newton > >discovering [quoted text clipped - 63 lines] > opposite - you can only prove the theorems are true BECAUSE you have > ASSUMED the axioms are true. Minor point, but another way to say it is that theorems are true with respect to the axioms.
> LOOK at the subject of this thread again. It is an AXIOM of relational > theory that data comes in tuples. Show me that that's true! And because > it's an axiom, mathematics itself tells you that logic CAN not give you > an answer! Excellent, excellent, point. I would love to hear if there is any disagreement on this point. If not, then perhaps we can work this into the glossary somehow related to "relational theory" or "axioms". Cheers! --dawn
> Cheers, > Wol
mAsterdam - 10 Jun 2004 01:43 GMT >>LOOK at the subject of this thread again. It is an AXIOM of relational >>theory that data comes in tuples. Show me that that's true! And because [quoted text clipped - 4 lines] > disagreement on this point. If not, then perhaps we can work this into the > glossary somehow related to "relational theory" or "axioms". The "information principle" would qualify as an axiom, I suspect - but I am not well-versed in this math/logic area (I did read some - just never discussed it) - so somebody else will have to make a clean, copy & pastable piece of prose for inclusion.
It surely fits the 'lengthy misunderstandings' criterion :-)
Eric Kaun - 04 Jun 2004 15:46 GMT > >Which axioms don't match? I wasn't really aware there were axioms per se. > [quoted text clipped - 13 lines] > true". In relational theory, I would guess that at least one axiom could > be phrased as "data comes in tuples". That's hardly an axiom that I would recognize, since while "tuple" is defined in terms of other more basic terms (axioms?), "data" is hardly well-defined. And what does "comes in" mean?
I believe the axioms of set theory and predicate calculus apply (those in set theory limited somewhat, to sets of tuples perhaps), but don't claim to know formally what those are.
> So, if you don't have experiments to show that real-world data ALSO > comes in tuples (or a close approximation thereof), then you can't > conclude that a relational database is a good place to store real-world > data. Sure you can; evidence <> proof. The nice work logicians and mathematicians have done with predicate calculus over the years, while perhaps not corresponding to "the real world" (tm, MTV Networks), gives us nice machinery with which to manipulate... well, data. What, precisely, would allow you to conclude that a <datamodel> database is a "good place" to store real-world data?
> Sorry for ignoring the rest of your post, but this is ABSOLUTELY > FUNDAMENTAL!!! Perhaps, but I still don't think "data comes in tuples" is anything like an axiom. I could certainly be wrong.
- erk
Anthony W. Youngman - 07 Jun 2004 23:47 GMT >> So, if you don't have experiments to show that real-world data ALSO >> comes in tuples (or a close approximation thereof), then you can't [quoted text clipped - 7 lines] >allow you to conclude that a <datamodel> database is a "good place" to store >real-world data? Yup. Evidence does not equal proof. But that was not what I was getting at. Note my careful use of the phrase "or a close approximation thereof" :-)
If "real world" data is not a close approximation of "relational data", then it is reasonable to conclude that a relational database is not a good place to put it ... :-) And if the two are a close approximation, then a relational database may not be the *best* place, but it has to be a *good* place.
Don't forget - I'm a scientist :-) If the stats are 95% confident, that's not "proof", but it's "good enough".
>> Sorry for ignoring the rest of your post, but this is ABSOLUTELY >> FUNDAMENTAL!!! > >Perhaps, but I still don't think "data comes in tuples" is anything like an >axiom. I could certainly be wrong. Read C&D's first rule! "Data comes in rows" - which is as far as I can make out, a synonym for "data comes in tuples". I'm sure a relational guru will disagree, but I can't see the difference ...
Cheers, Wol
Eric Kaun - 09 Jun 2004 16:45 GMT > >Perhaps, but I still don't think "data comes in tuples" is anything like an > >axiom. I could certainly be wrong. > > Read C&D's first rule! "Data comes in rows" - which is as far as I can > make out, a synonym for "data comes in tuples". I'm sure a relational > guru will disagree, but I can't see the difference ... And as stated elsewhere, those aren't axioms anyway... he used the word "representation", and the context fully suggests that he's not correlating it with the real world.
Anthony W. Youngman - 10 Jun 2004 01:51 GMT >> >Perhaps, but I still don't think "data comes in tuples" is anything like >an [quoted text clipped - 7 lines] >"representation", and the context fully suggests that he's not correlating >it with the real world. I know. After writing that I thought rather more about what C&D's twelve rules actually are. And that they don't seem to contain any axioms at all.
Which leads to the conclusion that relational theory is axiom-free. Which means that it cannot be a valid model. Which means that its application to the real world has no basis in anything whatsoever.
Okay, I'm sure that the mathematicians who've built on it have fleshed out the fundamentals somewhat, but it certainly means that if your sole criterion for defining a "relational database" is that "it complies with C&D's 12 rules", then such a database has no grounding in formal logic whatsoever.
Cheers, Wol
Eric Kaun - 10 Jun 2004 21:19 GMT > >And as stated elsewhere, those aren't axioms anyway... he used the word > >"representation", and the context fully suggests that he's not correlating [quoted text clipped - 6 lines] > Which leads to the conclusion that relational theory is axiom-free. > Which means that it cannot be a valid model. Some possibilities (I'm running too short on time to explore them):
- The axioms may be simply implicit in his rules
- What are MV's axioms? If it has them, then we could map at least some of them to relational (since there are commonalities)
> Which means that its > application to the real world has no basis in anything whatsoever. Maybe, but again, what sort of data model would have axioms? I'm not sure this is possible... and if it is, again, relational would have somewhat-similar ones. Surely there's at least a partial homomorphism between data models?
> Okay, I'm sure that the mathematicians who've built on it have fleshed > out the fundamentals somewhat, but it certainly means that if your sole > criteria for defining a "relational database" is that "it complies with > C&D's 12 rules", then such a database has no grounding in formal logic > whatsoever. Mathematics requires axioms - does logic? I thought it was purely symbolic manipulation, which is defined for relational.
- erk
Bill H - 11 Jun 2004 00:09 GMT ...Crossposted from comp.databases.theory...
An interesting question. Can someone proffer some suggestions?
>"Eric Kaun" <ekaun@yahoo.com> wrote in message news:An3yc.2600 > [quoted text clipped - 14 lines] > > erk
Kevin Powick - 11 Jun 2004 03:23 GMT > ...Crossposted from comp.databases.theory... > > An interesting question. And the answer will change my life.. how?
I want to move to theory... Everything works there.
<flame shields up> <set thread to ignore>
 Signature Kevin Powick
Laconic2 - 11 Jun 2004 12:15 GMT > And the answer will change my life.. how? "I come that you may know the truth. And the truth will make you free."
Kevin Powick - 11 Jun 2004 15:48 GMT > > And the answer will change my life.. how? > > "I come that you may know the truth. And the truth will make you free." Thanks Neo ;-)
 Signature Kevin Powick
Anthony W. Youngman - 14 Jun 2004 23:24 GMT >> >And as stated elsewhere, those aren't axioms anyway... he used the word >> >"representation", and the context fully suggests that he's not [quoted text clipped - 12 lines] >- What are MV's axioms? If it has them, then we could map at least some of >them to relational (since there are commonalities) If they're implicit, then they need to be made explicit (hence my comment about mathematicians "fleshing out" the theory).
>> Which means that its >> application to the real world has no basis in anything whatsoever. > >Maybe, but again, what sort of data model would have axioms? I'm not sure >this is possible... and if it is, again, relational would somewhat-similar >ones. Surely there's at least a partial homomorphism between data models? The generic always trumps the specific. I suspect Pick axioms are very similar to relational. But just as C&D's first rule says that data comes in 2-dimensional tables (or arrays), I've defined "Pick's first rule" that says data comes in n-dimensional arrays. So relational is the specific subset of Pick where n=2. :-)
>> Okay, I'm sure that the mathematicians who've built on it have fleshed >> out the fundamentals somewhat, but it certainly means that if your sole [quoted text clipped - 4 lines] >Mathematics requires axioms - does logic? I thought it was purely symbolic >manipulation, which is defined for relational. Logic is used to manipulate axioms to give theorems. The result is a model.
So no, if you're being pedantic, maybe logic doesn't require axioms. But in the same way as an axe doesn't *require* wood. Just as an axe with nothing to chop is useless, so is logic without axioms to manipulate.
Cheers, Wol
Dawn M. Wolthuis - 15 Jun 2004 02:29 GMT <snip>
> >Mathematics requires axioms - does logic? I thought it was purely symbolic > >manipulation, which is defined for relational. [quoted text clipped - 5 lines] > in the same way as an axe doesn't *require* wood. Just as an axe with > nothing to chop is useless, so is logic without axioms to manipulate. Logic as a branch of mathematics most definitely requires axioms. --dawn
Eric Kaun - 16 Jun 2004 19:57 GMT > Logic as a branch of mathematics most definitely requires axioms. --dawn I'll posit that it doesn't; I would regard mathematics as a derivation of (wrong term, I know) logic. Logic is a system or metasystem of symbolic manipulation, and can be applied to many different "maths".
As I think of it, I'd say math has axioms and logic doesn't, and derives its power for precisely that reason; it can be applied to different sets of axioms. At least that's how I've always thought of it... I could be dead wrong, and would expect this to be contested.
- erk
Paul - 18 Jun 2004 22:55 GMT >>Logic as a branch of mathematics most definitely requires axioms. --dawn > [quoted text clipped - 6 lines] > axioms. At least that's how I've always thought of it... I could be dead > wrong, and would expect this to be contested. This is from http://en.wikipedia.org/wiki/First-order_predicate_calculus
---
Like any logical theory, first-order calculus consists of
* a specification of how to construct syntactically correct statements (the well-formed formulas)
* a set of axioms, each axiom being a well-formed formula itself
* a set of inference rules which allow one to prove theorems from axioms or earlier proven theorems.
There are two types of axioms: the logical axioms which embody the general truths about proper reasoning involving quantified statements, and the axioms describing the subject matter at hand, for instance axioms describing sets in set theory or axioms describing numbers in arithmetic. ---
Now ultimately set theory is used for the foundation of everything in mathematics. So you use set theory to build your logical theory.
You might ask how do you specify your set theory without using logic, otherwise you've got a chicken and egg situation. I'm not too sure what the answer is there, I think there is some kind of hand-waving appeal to "naive logic". Or maybe it is more rigorous, I don't know.
Very philosophically interesting these questions of foundation though.
Paul.
Alfredo Novoa - 30 May 2004 12:32 GMT >> You just don't get it, do you Wol? No matter how many times people >> try to explain it to you it just doesn't sink in. The relational [quoted text clipped - 4 lines] >and I just responded to Alfredo who said that data were facts and I thought >for sure the idea was that these facts corresponded to reality. Not necessarily. They can be false or about a fantasy world.
>If there is a tight mathematical definition of "data" within relational >theory, then that's great, but it is not the commonly used definition, I >suspect. The common use of the term is sloppy like with many other terms.
> It is in the leap from doing relational theory to thinking that >the application of such theory is the best approach to storing/retrieving >propositions using computers by a business -- that is where there is a >rather significant leap of faith. You are wrong. It was mathematically proven that it is better than the graph based approaches.
>That connection is NOT science It is maths, but the word science has many meanings.
>, although >we could conceivably set up some experiments to collect a bit more >information about whether it is better than some other approach. We don't need the experiments and it was proved in the 70's that The Relational Model is better than the other approaches.
It was explained zillions of times in this group.
> I'm not >opposed to faith I am completely opposed to faith and other forms of irrationalism. The Relational Model is maths not irrational faith.
Regards Alfredo
mAsterdam - 30 May 2004 16:06 GMT [snip]
>>It is in the leap from doing relational theory to thinking that >>the application of such theory is the best approach to storing/retrieving [quoted text clipped - 3 lines] > You are wrong. It was mathematically proven > that it is better than the graph based approaches. This is a very strange statement. It gets stated over and over again, not only in this newsgroup. Outside this newsgroup I am supposed to take it for granted and not take time to think about it.
But here I can ask the people in support of this statement: - Better at what? - What exactly was proven? - Could you please give a reference?
I happen to like the relational model for thinking about data in a detailed fashion, checking and double checking the database and the support it gives to the whole of the system it is part of.
I happen to like graph based approaches for the overall picture and to elicit design ideas from non-IT professionals.
But that is both just preference and personal experience, not proof.
[snip]
>>I'm not opposed to faith > > I am completely opposed to faith and other forms of irrationalism. The > Relational Model is maths not irrational faith. Rationalism is as irrational(/rational) as any other faith. I see reason (ratio) as a tool, even more so than language (some posts ago somebody claimed languages are tools).
Alfredo Novoa - 31 May 2004 00:06 GMT >> You are wrong. It was mathematically proven >> that it is better than the graph based approaches. > >This is a very strange statement. >It gets stated over and over again, >not only in this newsgroup. This is very basic knowledge, taught in every serious introductory database course.
>But here I can ask the people in support of this statement: > - Better at what? Simplicity
> - What exacltly was proven? That the Relational Model is superior.
> - Could you please give a reference? Codd, E. F. and C. J. Date. "Interactive Support for Nonprogrammers: The Relational and Network Approaches." IBM Research Report RJ1400 (June 6th, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann Arbor, Michigan, May 1974. Also in C. J. Date, Relational Database: Selected Writings. Reading, Mass.: Addison-Wesley, 1986.
http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml
http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml
>I happen to like graph based approaches >for the overall picture and to elicit design >ideas from non-IT professionals. We are talking about very different things. I am not talking about drawings, I am talking about the network and hierarchical approaches.
>> I am completely opposed to faith and other forms of irrationalism. The >> Relational Model is maths not irrational faith. > >Rationalism is as irrational(/rational) as any other faith. What nonsense!
Regards Alfredo
Bill H - 31 May 2004 10:46 GMT Ahhh. Descartes Corollary: I think therefore I'm right. :-)
Bill
"Alfredo Novoa" <alfredo@ncs.es> wrote in message
[snipped]
> > - What exactly was proven? > > That the Relational Model is superior.
Alfredo Novoa - 31 May 2004 11:44 GMT >Ahhh. Descartes Corollary: I think therefore I'm right. :-) This is the "logic" of the pickies and the likes.
Regards Alfredo
Laconic2 - 31 May 2004 14:09 GMT > Ahhh. Descartes Corollary: I think therefore I'm right. :-) It's all nonsense. "better" and "superior" are value judgements. They are not the subject matter of mathematics.
This is cant.
Alfredo Novoa - 31 May 2004 15:55 GMT >It's all nonsense. "better" and "superior" are value judgements. Wrong. Quality can be compared.
> They are >not the subject matter of mathematics. It is the subject matter of Computing Science.
mAsterdam - 31 May 2004 11:47 GMT >>>... It was mathematically proven >>>that it is better than the graph [quoted text clipped - 4 lines] > This is a very basic knowledge taught in every > serious database introductory course. The statement is made in just about every database course, without demonstrating it - that's exactly what I think is strange about it. If it's proven why not give or at least reference the proof? The way it is put, it is propaganda, not basic knowledge.
>> ... - Better at what? > Simplicity This reduces the statement to "It was mathematically proven that it is simpler than the graph based approaches." and leaves the judgement to the reader/student. An improvement, but it still leaves the questions unanswered: simpler at what? etc.
>> - What exacltly was proven? > That the Relational Model is superior. [quoted text clipped - 3 lines] > Codd, E. F. and C. J. Date. "Interactive Support for Nonprogrammers: > The Relational and Network Approaches." ... 1974 ... Part of it was quoted at your second url, see below. I could only find the abstract on-line.
abstract (from http://portal.acm.org/citation.cfm?id=811529&dl=ACM&coll=portal): <quote>
The objectives and strategies of the relational and network approaches are compared. The status of support for non-programming users is examined. General purpose support for such users entails provision of an augmented relationally complete retrieval capability without branching, explicit iteration, or cursors. It is clear how this capability can be realized with the relational approach—whether with a formal or informal language interface. It is not at all clear how the network approach can reach this goal, so long as the principal schema includes owner-coupled sets “bearing information essentially”. A relational discipline is suggested as a way out for DBTG users.
</quote>
Apparently the information principle is discussed avant la lettre there. For the people who do not know that cornerstone:
Chris Date in "EDGAR F. CODD 08/23/1923 – 04/18/2003 A TRIBUTE" at http://www.dbdebunk.com/page/page/621965.htm : <quote>
The concept of essentiality, introduced by Ted in this debate, is a great aid to clear thinking in discussions regarding the nature of data and DBMSs. In particular, The Information Principle (which I heard Ted refer to on occasion as the fundamental principle underlying the relational model) relies on it, albeit not very explicitly:
The entire information content of a relational database is represented in one and only one way: namely, as attribute values within tuples within relations.
</quote>
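For readers who want to see the Information Principle in concrete terms, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Relation` type, the `restrict` and `join` helpers, and the supplier/shipment data are all invented for this example and come from neither Codd's paper nor Date's tribute. The point it tries to show is that the link between a supplier and its shipments is carried by a shared attribute value inside tuples, not by a pointer or an ordering.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    """Hypothetical toy relation: a heading plus a set of tuples."""
    heading: Tuple[str, ...]
    body: FrozenSet[tuple]


def restrict(rel: Relation, attr: str, value) -> Relation:
    """Keep only the tuples whose attribute `attr` equals `value`."""
    i = rel.heading.index(attr)
    return Relation(rel.heading, frozenset(t for t in rel.body if t[i] == value))


def join(a: Relation, b: Relation) -> Relation:
    """Natural join on the attributes the two headings have in common."""
    common = [x for x in a.heading if x in b.heading]
    extra = [x for x in b.heading if x not in a.heading]
    body = set()
    for ta in a.body:
        for tb in b.body:
            if all(ta[a.heading.index(c)] == tb[b.heading.index(c)] for c in common):
                body.add(ta + tuple(tb[b.heading.index(x)] for x in extra))
    return Relation(a.heading + tuple(extra), frozenset(body))


# Invented sample data: every fact is an attribute value within a tuple.
suppliers = Relation(("sno", "city"),
                     frozenset({("S1", "London"), ("S2", "Paris")}))
shipments = Relation(("sno", "pno"),
                     frozenset({("S1", "P1"), ("S1", "P2"), ("S2", "P1")}))

# "Which parts are shipped by London suppliers?" answered by value matching only.
london = join(restrict(suppliers, "city", "London"), shipments)
london_parts = {t[london.heading.index("pno")] for t in london.body}
```

Nothing in the sketch relies on record position or on links between records, which is the property the quoted principle describes.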
> http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml While this does give some insights in the history of the use of 'data model' and related terms (for the people here who showed interest in that topic), it doesn't at all claim to mathematically prove anything.
> http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml Here it gets very interesting. From the overview: <quote> Of course, the battle between relations and networks is ancient history now. (The good guys won.) This fact notwithstanding, Codd's paper -- even though it was written over 25 years ago -- is still worth reading today as a beautiful example of clear thinking. Indeed, it's quite remarkable to see how, on a topic where muddled thinking was the norm at the time, Codd was able to do such a good job of cutting to the chase and focusing on the real underlying issues. Let me elaborate:
* First of all, Codd realized that to compare the very concrete CODASYL specifications and the much more abstract relational model would be an apples-and-oranges comparison and would involve numerous distracting irrelevancies.
* Hence, it would be necessary first to define an abstract "network model." The comparison could then be done on a level playing field, as it were, in a fair and sensible manner.
* Codd therefore proceeded to define an abstraction of the CODASYL specifications that might reasonably be regarded as such a model. (And then, of course, he went on to compare that abstraction with the relational model.)
</quote>
Relevancy to the 'mathematical proof' statement under discussion: a fair comparison (a precondition for the claimed mathematical proof) would require specification on the same (or at least similar) levels of abstraction.
I don't know if anybody after this has provided another formalization of the network model, so AFAIK this comparison stands.
But what exactly is compared? Relational model versus network model for interactive support(1) for nonprogrammers. To dismiss all graph based approaches for all purposes based on it is overstretching it, IMO, jumping to conclusions.
>>I happen to like graph based approaches >>for the overall picture and to elicit design >>ideas from non-IT professionals. > > We are talking about very different things. I am not talking about > drawings, I am talking about the network and hierarchical approaches. Equating network approaches to graph based approaches, for all purposes? The network approach is Codd's formalization of the CODASYL specification for the purpose of interactive support(1) for nonprogrammers, in the documents you referenced. (Or should I say pointed me to :-)
To determine whether it is possible to generalise Codd's comparisons to relational approach vs. graph based approach, some more levelling is needed. Generalising the stated purpose is not trivial, either.
>>>I am completely opposed to faith and other forms of irrationalism. The >>>Relational Model is maths not irrational faith. >> >>Rationalism is as irrational(/rational) as any oher faith. > > What a nonsense! Very faithful ;-)
I suspect we will not be able to agree on this one.
However, maybe we can try to agree on the 'mathematical proof' issue, by clearly stating what exactly was proven.
Anyway, thank you for the nice read.
==== Footnote: (1) : interactive in textmode is implicitly meant - but that's another can of worms.
Dawn M. Wolthuis - 31 May 2004 15:02 GMT > >>>... It was mathematically proven > >>>that it is better than the graph [quoted text clipped - 11 lines] > The way it is put, it is propaganda, not basic > knowledge. Exactly. I've read a lot of what people have suggested is a mathematical proof that relational database theory is good for business. While the mathematical theory itself is fine, the application of it to databases can have no mathematical proof of its usefulness (math does not prove its usefulness!) and seems to also have no scientific proof of its usefulness either. There are exceptions to this, such as logically proving/showing that if you handle functional dependencies one way or another, it affects what changes need to be made when requirements change. So, I use those techniques. There are tradeoffs. You design one way with agility in mind and mitigate the risks.
> >> ... - Better at what? > > Simplicity [quoted text clipped - 5 lines] > but it still leaves the questions unanswered: > simpler at what? etc. There is surely some mathematics that is simpler when putting data into what-once-was-the-def-of-1NF (no repeating groups). But it is also simpler for the logic in retrieving data to have no relation-valued-attributes and yet they have now been tossed into the mix. So, what's simpler? The old version of 1NF or the new version? Is simpler always better? Applying the simplest mathematics to complex problems isn't our goal here.
> >> - What exactly was proven? > > That the Relational Model is superior. [quoted text clipped - 17 lines] > explicit iteration, or cursors. It is clear how this capability can be > realized with the relational approach -- whether with a formal or informal
> language interface. It is not at all clear how the network approach can > reach this goal, so long as the principal schema includes owner-coupled [quoted text clipped - 22 lines] > > </quote> http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml
> While this does give some insights in the > history of the use of 'data model' and > related terms (for the people here who > showed interest in that topic), it doesn't > at all claim to mathematically prove anything. http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml
> Here it gets very interesting. From the overview: > <quote> [quoted text clipped - 11 lines] > model would be an apples-and-oranges comparison and would > involve numerous distracting irrelevancies. Let me guess -- so instead of taking the relational model to an implementation and playing on the IDMS playing field (which would only provide data on one instance of each), he brought CODASYL onto his ball field and then beat it, right? Sorry, I'm getting ahead of you, excited to hear the story unfold.
> * Hence, it would be necessary first to define an abstract > "network model." The comparison could then be done on a > level playing field, as it were, in a fair and sensible > manner. laughing
> * Codd therefore proceeded to define an abstraction of > the CODASYL specifications that might reasonably be [quoted text clipped - 16 lines] > approaches for all purposes based on it is overstretching it, IMO, > jumping to conclusions. Absolutely!
> >>I happen to like graph based approaches > >>for the overall picture and to elicit design > >>ideas from non-IT professionals. > > > > We are talking about very different things. I am not talking about > > drawings, I am talking about the network and hierarchical approaches. As I understand it, the purpose of the relational model is to have a way to "view" the structure of the data. It isn't intended to be the way that it is implemented. So, if users (e.g. me) want to view the data in a graph, then that seems like a good model to use, right?
> Equating network approaches to graph based approaches, for all > purposes? The network approach is Codd's formalization of the CODASYL [quoted text clipped - 15 lines] > > Very faithful ;-) more laughter from the heretic in this corner
> I suspect we will not be able to agree on this one. > > However, maybe we can try to agree on the > 'mathematical proof' issue, by clearly > stating what exactly was proven. Yes, I think you started to get at it. It sounds like it has been proven that a mathematical relational model is simpler than a corresponding network model so it would be good to get this nailed down in precise terms (and I haven't read all suggested readings, but will look at them soon). Although I do believe this has been proven, I would still like a clear, crisp theorem/proof of "the proof" for relational theory.
Has there been any proof, ever, of the use of the relational model providing for a better realized solution for anything than any other model? It is in the application of the model that I think we lack evidence.
> Anyway, thank you for the nice read. quite entertaining! --dawn
Alfredo Novoa - 31 May 2004 17:14 GMT >Exactly. I've read a lot of what people have suggested is a mathematical >proof that relational database theory is good for business. It is good for data management.
> While the >mathematical theory itself is fine, the application of it to databases can >have no mathematical proof of its usefulness (math does not prove its >usefulness!) Of course it can and it did. You can do the same as before with a fraction of the instructions and optimization can be done by the machine.
>Let me guess -- so instead of taking the relational model to an >implementation and playing on the IDMS playing field (which would only >provide data on once instance of each), he brought CODASYL onto his ball >field and then beat it, right? To the field of formalism.
You try to do the same but your field is irrationalism and rough sophistry.
>Yes, I think you started to get at it. It sounds like it has been proven >that a mathematical relational model is simpler than a corresponding network >model It has been proved that it is simpler to manage data with the relational approach.
You are always trying to confuse playing sloppily with words and distorting things.
>Has there been any proof, ever, of the use of the relational model providing >for a better realized solution for anything than any other model? It is not a proof but there are plenty of systems rewritten using a pseudorelational approach which saved a lot of code.
But it would be a waste of time to show them to you.
Regards Alfredo
Eric Kaun - 01 Jun 2004 15:12 GMT > > >>>... It was mathematically proven > > >>>that it is better than the graph [quoted text clipped - 14 lines] > Exactly. I've read a lot of what people have suggested is a mathematical > proof that relational database theory is good for business. I don't think such a thing is even possible. However, it seems obvious to me that in this industry we have the capacity to stay very close to theory, given that computers are very unforgiving, and therefore our programs have to achieve some degree of rigor with respect both to the language at hand and to the business conditions we're trying to automate. So I certainly think that theory is worth a good first look, given that it impacts us in a much more direct way than many other disciplines.
Now look at the relational theory vs. some of the emerging semantics of XPath and XQuery (from Philip Wadler, Don Chamberlin, and many others). It's extraordinarily complex in comparison, and furthermore offers nothing like normalization rules (at least that I've seen - Jan Hidders and others may have seen something like that). Therefore good design criteria are less formal. A more complex theory, with weaker criteria for good design, is (all other things being equal, which they never are) not to be preferred over something with a simpler theory and stronger criteria for good design. At least that's my theory. ;-)
> While the > mathematical theory itself is fine, the application of it to databases can > have no mathematical proof of its usefulness (math does not prove its > usefulness!) and seems to also have no scientific proof of its usefulness > either. One could argue the same for XML, Pick, etc. (for which it would still be useful to see a theory), and we're back to the "...in my experience... bang for the buck..." argument. I'm not criticizing that argument, but "in my experience" I have seen many more problems with Pick-like designs, problems that have a direct impact on agility - on my ability to evolve a system toward the expanding and changing needs of a business.
> There are exceptions to this, such as logically proving/showing > that if you handle functional dependencies one way or another, it affects > what changes need to be made when requirements change. So, I use those > techniques. There are tradeoffs. You design one way with agility in mind > and mitigate the risks. Agreed - in the final analysis it seems somewhat like a typing exercise to me. Whereof one cannot speak, thereof one must be silent... if the business doesn't know of any rules or structure surrounding Field X, but know it's always been captured, then perhaps something that's just an untyped list is best. But you have to ask the question, and have it answered, even if the answer is "I dunno."
> > >> ... - Better at what? > > > Simplicity [quoted text clipped - 10 lines] > for the logic in retrieving data to have no relation-valued-attributes and > yet they have now been tossed into the mix. So, what's simpler? Even those who discuss relation-valued attributes (RVAs) (Codd, Date, Pascal, etc.) acknowledge that they're more complex, and argue against using them. However, in some cases they're simply the best model. Take, for example, a system catalog, family relationships, even prime factors (which introduces to me the interesting notion of "relation-valued functions", especially infinite ones). In all these cases, eliminating the RVA introduces keys with no real meaning, and adds some complexity. RVAs introduce a different type of complexity, perhaps - I'm not going to attempt to characterize the two here.
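To make the prime-factors case concrete, here is a small Python sketch of the same information held once with a relation-valued attribute and once ungrouped. The names `prime_factors`, `with_rva` and `ungrouped` are invented for illustration, and relations are approximated by Python sets of tuples, so multiplicity of factors is deliberately ignored. It is a sketch under those assumptions, not a claim about how any DBMS stores RVAs.

```python
def prime_factors(n: int) -> list:
    """Return the prime factors of n (with repeats) by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


# With an RVA: one tuple per number; the second attribute is itself a
# (unary) relation holding that number's distinct prime factors.
with_rva = {(n, frozenset((p,) for p in prime_factors(n))) for n in (1, 7, 12, 30)}

# Without the RVA: "ungroup" into one tuple per (number, factor) pair.
ungrouped = {(n, p) for (n, factors) in with_rva for (p,) in factors}

# Numbers that appear at all after ungrouping.
numbers_after_ungroup = {n for (n, _p) in ungrouped}
```

In this toy data the number 1 has an empty set of prime factors, so it is present in `with_rva` but has no tuple in `ungrouped`. That is one place where the two designs differ in what they can state directly, and a flat design would need a separate relation of numbers to keep it.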
> The old > version of 1NF or the new version? Is simpler always better? Applying the > simplest mathematics to complex problems isn't our goal here. I disagree completely. If you can successfully apply simple math to complex problems, you're way ahead of the game. Part of the problem these days is jumping on one or more complex technologies because in some vague, unanalyzed way they "seem like" the structure of the problem. Mastery of the basic tools is a useful prerequisite to understanding when and how to apply the complex ones.
> > * First of all, Codd realized that to compare the very concrete > > CODASYL specifications and the much more abstract relational [quoted text clipped - 6 lines] > field and then beat it, right? Sorry, I'm getting ahead of you, excited to > hear the story unfold. There isn't a home-court advantage here. The court had yet to be built. Screaming "that's not fair" at an attempt to compare several models, without implementations of both at hand, is disingenuous. The point of a model, which we should all understand, is to characterize X before going to all the work of implementing X (which requires many other concerns and distractions). Even businesses understand that.
> > * Hence, it would be necessary first to define an abstract > > "network model." The comparison could then be done on a > > level playing field, as it were, in a fair and sensible > > manner. > > laughing In what way is the "playing field" unfair?
> > >>I happen to like graph based approaches > > >>for the overall picture and to elicit design > > >>ideas from non-IT professionals. It's fine as a starting point, but the approach quickly breaks down as you get into details. I have no proof, simply my experience doing JAD sessions and user requirements...
> As I understand it, the purpose of the relational model is to have a way to > "view" the structure of the data. It's a predicate-based model of data - not sure what you mean by "view". It's to define the structure of data, to retrieve relation values based on other values, and to update relation variables with new values.
> It isn't intended to be the way that it > is implemented. So, if users (e.g. me) want to view the data in a graph, > then that's seems like a good model to use, right? No. There is a logical and practical difference between what we do and what users see. The above is like saying that since the users see a graphic, that Adobe Photoshop is the proper GUI modeling tool.
> Has there been any proof, ever, of the use of the relational model providing > for a better realized solution for anything than any other model? It is in > the application of the model that I think we lack evidence. That's a smooth bit of useless rhetoric - the use of the word "ever" in there is especially galling. I'd like to see what "proof" there is of such a thing for any technology, language, model, etc. You're not asking for evidence about the application of the model - you're looking for something like evidence on the statistically supported success of solutions using X compared with solutions using Y, where both X and Y aren't implementations, or even designs, but models on which those designs and implementations can be based. It isn't there.
Let me ask you this: Has there been any proof, ever, of the use of object-orientation providing for a better realized solution for anything than any other model? Has there been any proof, ever, of the use of three-valued logic providing for a better realized solution for anything than any other model? etc. etc... I was going to babble on with more examples but my fingers are tired.
No, there's no proof. Let's move on to a useful discussion...
- Eric
Alfredo Novoa - 31 May 2004 15:55 GMT >> Simplicity > [quoted text clipped - 4 lines] >but it still leaves the questions unanswered: >simpler at what? etc. In number of instructions. Simpler by orders of magnitude, and amenable to many automatic optimizations.
The superiority is very striking and overwhelming. That's why teachers don't spend a lot of time with this topic.
>While this does give some insights in the >history of the use of 'data model' and >related terms (for the people here who >showed interest in that topic), it doesn't >at all claim to mathematically prove anything. Because you are quoting the wrong parts.
What about this?:
<quote>
                      CODASYL   relational
GO TO                    15         0
PERFORM UNTIL             1         0
currency indicators      10         0
IF                       12         0
FIND                      9         0
GET                       4         1
STORE / PUT               2         1
MODIFY                    1         0
MOVE CURRENCY             4         0
other MOVEs               9         1
SUPPRESS CURRENCY         4         0
total statements       > 60         3
The relative simplicity of the relational solution is very striking. Note: In fact, the relational solution could have been reduced to just a single statement, a PUT; the GET and MOVE aren't strictly necessary. What's more (although Codd doesn't mention the fact), the CODASYL "solution" -- which was taken from another source, by the way, not created by Codd himself -- included at least two bugs!
</quote>
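The flavour of that difference can be sketched in a few lines of Python. This is only an analogy under stated assumptions: it is not Codd's example, it is not real CODASYL DML, and the `Record` class, the `next_in_set` link and the sample quantities are all invented here. The first function walks an owner's chain of member records with an explicit cursor, roughly in the spirit of FIND/GET with a currency indicator; the second asks the same question of a set of tuples in one expression.

```python
class Record:
    """Hypothetical record in an owner-coupled set, linked by a pointer."""

    def __init__(self, data):
        self.data = data
        self.next_in_set = None  # next member of the owner's set, if any


def build_set(owner, members):
    """Chain the member records behind their owner."""
    prev = owner
    for member in members:
        prev.next_in_set = member
        prev = member
    return owner


s1 = build_set(Record({"sno": "S1"}),
               [Record({"pno": "P1", "qty": 300}),
                Record({"pno": "P2", "qty": 200})])


def total_qty_navigational(owner):
    """Walk the chain with an explicit cursor and loop."""
    total = 0
    current = owner.next_in_set        # roughly: FIND FIRST member
    while current is not None:         # explicit iteration
        total += current.data["qty"]   # roughly: GET
        current = current.next_in_set  # roughly: FIND NEXT
    return total


# The same facts as plain tuples (sno, pno, qty).
shipments = {("S1", "P1", 300), ("S1", "P2", 200), ("S2", "P1", 100)}


def total_qty_relational(sno):
    """State what is wanted; no cursor, no branching on position."""
    return sum(qty for (s, _pno, qty) in shipments if s == sno)
```

Both functions return the same total for supplier S1. The sketch says nothing about performance, and a statement count from a toy like this should not be read as evidence either way.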
>But what exactly is compared? Relational model versus network model for >interactive support(1) for nonprogrammers. To dissmiss all graph based >approaches for all purposes based on it is overstretching it, IMO, >jumping to conclusions. There is only one graph based approach. The hierarchical approach is only a specialization of the network approach.
>Equating network approaches to graph based approaches, for all >purposes? If you want to formalize the network and hierarchical approaches the only way is graph theory.
> The network approach is Codd's formalization of the CODASYL >specification for the purpose of interactive support(1) for >nonprogrammers, in the documents you referenced. >(Or should I say pointed me to :-) Yes.
>To determine wether it possible generalise Codd's comparisons >to relational approach vs. graph based approach, some >more levelling is needed. No, CODASYL has all the essential features of the network approach.
>I suspect we will not be able to agree on this one. Surely. Rationalism is the contrary of faith. Faith means to believe without reason.
>However, maybe we can try to agree on the >'mathematical proof' issue, by clearly >stating what exactly was proven. I recommend that you read Relational Database: Selected Writings.
Regards Alfredo
Anthony W. Youngman - 01 Jun 2004 23:43 GMT >>>>... It was mathematically proven that it is better than the graph >>>>based approaches. [quoted text clipped - 19 lines] >but it still leaves the questions unanswered: >simpler at what? etc. And Occam's Razor (the Einstein version iirc) says "make things as simple as possible, BUT NO SIMPLER".
For example, Newtonian Mechanics is a damn sight simpler than General Relativity. But precisely *because* it is simpler, it is also a hell of a lot more dangerous, because it is *too* simple, and more prone to screw-ups.
Simplicity - if carried too far - is lethal.
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
mAsterdam - 02 Jun 2004 00:05 GMT mAsterdam writes
>> This reduces the statement to >> "It was mathematically proven that it is simpler [quoted text clipped - 5 lines] > And Occam's Razor (the Einstein version iirc) says "make things as > simple as possible, BUT NO SIMPLER". The examples given in Alfredo's links did a good job at shaving CODASYL's beard by providing the same and better results (for the "no simpler" part) from a much simpler construct. Did you read them?
Anthony W. Youngman - 04 Jun 2004 00:50 GMT >mAsterdam writes >>> This reduces the statement to [quoted text clipped - 10 lines] >results (for the "no simpler" part) from a much >simpler construct. Did you read them? Probably. And I probably didn't understand them.
All I'm trying to say is that simplicity as a goal in itself is a delusion.
And just because relational may be simpler than codasyl doesn't mean that it's a good thing. We have a real-world problem here ... look at the following mapping ...
real world <=> business analysis <=> database
What matters is the complexity (or simplicity) of the WHOLE SYSTEM. There's no point in simplifying the database, if the necessary increase in complexity of the business analysis totally negates it. By focussing on minimising the complexity of one part of the system, we make the system as a whole more complex. That will explain why Dawn's experience is that MV is more productive than relational - the simplicity of the relational database over MV simply pushes all the complexity into the business analysis side, turning that into a total nightmare.
Which is simpler - to model a single real world entity as a single database "table" as MV does (we can model an invoice in a single FILE), or as five or six relational tables? And don't forget - our FILE (should be) normalised, so we can access it just as if it were five or six relational tables ...
Yep. The database itself is more complex. But the business analysis is MUCH simpler, such that the total system complexity is a lot less.
Cheers, Wol
mAsterdam - 04 Jun 2004 02:20 GMT > mAsterdam writes >> mAsterdam writes [quoted text clipped - 13 lines] > > Probably. And I probably didn't understand them. You PROBABLY read them? Schroedingers cat would go from right to left. I'm positively sure about that, I think. Maybe.
> All I'm trying to say is that simplicity as a goal in itself is a delusion. As is clarity. As is having a goal for that matter.
> And just because relational may be simpler than codasyl doesn't mean > that it's a good thing. We have a real-world problem here ... look at > the following mapping ... > > real world <=> business analysis <=> database <=> defined as 'having some mutual metaphorical resemblances to'?
> What matters is the complexity (or simplicity) of the WHOLE SYSTEM. > There's no point in simplifying the database, if the necessary increase > in complexity of the business analysis totally negates it. Very true. Often made mistaek.
> By focussing > on minimising the complexity of one part of the system, we make the > system as a whole more complex. That will explain why Dawn's experience > is that MV is more productive than relational - the simplicity of the > relational database over MV simply pushes all the complexity into the > business analysis side, turning that into a total nightmare. I'll state my intuition (not backed up by experience) about not taking the time to analyse data: postponing the basic issues will bring volatile quick wins, pushing depth investment (cost) of reflection and the real benefits of data assets into the future. So, if and only if your survival depends on quick wins, go for it.
> Which is simpler - to model a single real world entity as a single > database "table" as MV does (we can model an invoice in a single FILE), [quoted text clipped - 4 lines] > Yep. The database itself is more complex. But the business analysis is > MUCH simpler, such that the total system complexity is a lot less. Did you *read* what I replied to your post about mapping concepts from different contexts a while ago? Probably. Maybe. Later.
Anthony W. Youngman - 07 Jun 2004 23:50 GMT >> By focussing on minimising the complexity of one part of the system, >>we make the system as a whole more complex. That will explain why [quoted text clipped - 10 lines] >into the future. So, if and only if your survival >depends on quick wins, go for it. Except that Dawn's experience (and most MV consultants, too) is that the cost of maintaining old MV databases is lower than that of maintaining relational ...
They're cheaper to write, they're cheaper to maintain, and they take a LOT longer to get decrepit ...
Cheers, Wol
mAsterdam - 08 Jun 2004 01:05 GMT > mAsterdam writes > [quoted text clipped - 19 lines] > They're cheaper to write, they're cheaper to maintain, and they take a > LOT longer to get decrepit ... As I understood your writings you claim to analyse your data before taking a well-informed decision to prefer MV implementation above a RDBMS implementation. How can my statement about quick wins trigger this response?
Anthony W. Youngman - 10 Jun 2004 02:01 GMT >>> I'll state my intuition (not backed up by experience) >>> about not taking the time to analyse data: [quoted text clipped - 13 lines] >prefer MV implementation above a RDBMS implementation. >How can my statement about quick wins trigger this response? Except you don't understand. I prefer to put normalised data into a MV database, on the basis of past experience that it is ALWAYS easier to understand the result.
Don't forget. If you've analysed your data properly, then the conversion of data from MV-form to RDBMS-form is trivial and easily done "on the fly" by any modern MV database.
So if I put my data into an MV database I can access it as if it were in an RDBMS. However, the converse is not true.
AND it's a hell of a lot simpler to understand the "real world <=> logical data" mapping in MV as opposed to relational - in MV it is almost invariably one real world object "instance of class noun" maps directly to one "RECORD in a FILE". In relational, typically one "instance of class noun" will map to many rows spread across multiple tables.
Experience says MV is simpler to understand. Maths says MV gives me the best of both worlds.
Cheers, Wol
mAsterdam - 10 Jun 2004 19:30 GMT > So if I put my data into an MV database I can access it as if it were in > an RDBMS. However, the converse is not true. It would be very interesting to know - in some detail - what kind of data gives difficulties in putting stuff from a RDBMS into a MV database. This may be somewhat awkward in this newsgroup, because some will be just waiting to say: See? You *can* express proposition_set(x) in a RDBMS, and you *can't* in MV, therefore MV is better. More is not a priori better.
But I trust you can stand that reaction. Could you give some examples?
Bill H - 11 Jun 2004 00:40 GMT Notes embedded.
> "mAsterdam" <mAsterdam@vrijdag.org> wrote... > [quoted text clipped - 12 lines] > > But I trust you can stand that reaction. Could you give some examples? Here's an A/P invoice record. The view is vertical, not horizontal. The numbers on the left indicate the field location# in the physical string on disk. The unnumbered value is the record key.
340*VR3-2
001 480
002 13279
003 13210
004 74*578
005 LOAN PAYMENT, 3/2004
006 -297645
007 0
008 -297645
009 5170]3370]5170]3370
010 -101261]-196384]0]0
011 200404
012 200404
013
014
015 AUTO
016
017 74
The disk storage would look like:
^^340*VR3-2^480^13279^12114^74*578^LOAN PAYMENT, 3/2004^-297645^0^-297645^5170]3370]5170]3370^-101261]-196384]0]0^200404^200404^^^AUTO^^74
The contents of field#s 009 & 010 are the G/L acct#s assigned this invoice and the "associated" G/L amounts allocated to each G/L acct#. The last two "values" of field# 10 are zero (0), but could just as easily be nothing so the field could look like:
010 -101261]-196384]]
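For anyone who hasn't met this layout, here is a rough Python sketch of how such a string breaks down. It assumes the printable ^ and ] shown in the post are the field and value separators; the record text is the disk-storage line above minus the key, with the line wrap in the second 200404 closed up, and parse_record is a made-up name, not anything from Pick itself.

```python
# Sketch only: "^" separates fields and "]" separates values within a field,
# as displayed in the post. parse_record is a hypothetical helper name.
RAW = ("480^13279^12114^74*578^LOAN PAYMENT, 3/2004^-297645^0^-297645^"
       "5170]3370]5170]3370^-101261]-196384]0]0^200404^200404^^^AUTO^^74")


def parse_record(raw):
    """Return a dict mapping 1-based field number to that field's list of values."""
    return {n: field.split("]") for n, field in enumerate(raw.split("^"), start=1)}


record = parse_record(RAW)

# Fields 9 and 10 are read side by side: value N of one belongs with value N of the other.
for acct, amt in zip(record[9], record[10]):
    print(acct, amt)

# Empty trailing values still hold their positions in the list.
short_form = "-101261]-196384]]".split("]")
print(short_form)
```

The point of the last two lines is that the shortened form of field 010 still has four positions, so it keeps lining up with the four account numbers in field 009.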
There are no data types in this record. Field#s 002 & 003 are dates where each date is the number of days past 31 Dec 1967 (kind of like Unix's # of seconds past midnight on 01 January 1970). The values of field#s 006, 008, and 010 are monetary values with the decimal stripped, so the value 25 would indicate $.25 (or .25 of whatever denomination used).
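The two conversions described here are small enough to sketch in Python. This assumes day 0 is 31 Dec 1967 as stated, and that money always carries exactly two implied decimal places; the function names are invented for illustration.

```python
from datetime import date, timedelta
from decimal import Decimal

PICK_DAY_ZERO = date(1967, 12, 31)  # internal day 0, as described in the post


def pick_to_date(days):
    """Internal day count -> calendar date."""
    return PICK_DAY_ZERO + timedelta(days=int(days))


def date_to_pick(d):
    """Calendar date -> internal day count."""
    return (d - PICK_DAY_ZERO).days


def pick_to_money(stored):
    """Stored integer with the decimal stripped -> amount (assumes 2 decimal places)."""
    return Decimal(int(stored)) / 100


def money_to_pick(amount):
    """Amount -> stored integer (assumes 2 decimal places)."""
    return int(Decimal(amount) * 100)


print(pick_to_date(13279))       # field 002 of the example record
print(pick_to_money("-297645"))  # field 006 of the example record
```

Run against the example record, 13279 comes out as 9 May 2004, which agrees with the 05-09-04 in the LIST output further down.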
To load a valid date into a Pick-like dbms (mvDbms) a conversion needs to be done similar to what is done for Unix. The same is true for money amounts. However, extraction is very easy so that:
::LIST APOPEN '340*VR3-2' INVDATE DUEDATE ACCTS AMTS
Page 1 APOPEN

APOPEN.... INV-DATE DUE-DATE ACCT. ACCT/AMTS....

340*VR3-2  03-01-01 05-09-04 5170      1,012.61-
                             3370      1,963.84-
                             5170           0.00
                             3370           0.00
[405] 1 items listed out of 1 items.
or the relational way:
::LIST APOPEN '340*VR3-2' BY-EXP ACCTS INVDATE DUEDATE ACCTS AMTS
Page 1 APOPEN

APOPEN.... INV-DATE DUE-DATE ACCT. ACCT/AMTS....

340*VR3-2  03-01-01 05-09-04 3370      1,963.84-
340*VR3-2  03-01-01 05-09-04 3370           0.00
340*VR3-2  03-01-01 05-09-04 5170      1,012.61-
340*VR3-2  03-01-01 05-09-04 5170           0.00
[405] 4 items listed out of 1 items.
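What BY-EXP does to the one record can be mimicked in a few lines of Python. This is only a sketch of the effect, not of how the query processor works; the dictionary keys and the explode function are hypothetical, and the values come from fields 009 and 010 of the example.

```python
# Hypothetical in-memory form of the example record (values from the post).
record = {
    "key": "340*VR3-2",
    "ACCTS": ["5170", "3370", "5170", "3370"],
    "AMTS": [-101261, -196384, 0, 0],
}


def explode(rec, by, *others):
    """Turn one record with parallel multivalued fields into one row per value,
    sorted on the exploded field (sorted() is stable, so ties keep record order)."""
    rows = [
        {"key": rec["key"], by: value, **{name: rec[name][i] for name in others}}
        for i, value in enumerate(rec[by])
    ]
    return sorted(rows, key=lambda row: row[by])


rows = explode(record, "ACCTS", "AMTS")
for row in rows:
    print(row)
```

One item goes in and four rows come out, in the same account order as the listing above.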
(if this appears in a proportional font simply cut & paste to notepad)
I hope this answered your question. :-)
Bill
mAsterdam - 11 Jun 2004 07:14 GMT > Notes embedded. >>"mAsterdam" <mAsterdam@vrijdag.org> wrote... [quoted text clipped - 9 lines] >>will be just waiting to say: See? You *can* express proposition_set(x) >>in a RDBMS, and you *can't* in MV, therefore MV is better. Heh. Strange typo - sorry.
>>More is not a priori better. >> [quoted text clipped - 28 lines] > 3/2004^-297645^0^-297645^5170]3370]5170]3370^-101261]-196384]0]0^200404^2004 > 04^^^AUTO^^74 I can see the mapping between the 'storage' and the 'vertical view' representation. The FILE definition (if this is the correct term) would help to get the example clear, no?
> The contents of field#s 009 & 010 are the G/L acct#s assigned this invoice > and the "associated" G/L amounts allocated to each G/L acct#. The last two [quoted text clipped - 6 lines] > each date is the number of days past 31 Dec 1967 (kind of like unix's # of > seconds past midnight on 01 January 1970. So, a different time-offset than unix. Different internal date representation. No big deal, IMO.
> The values of field#s 006, 008, > and 010 are monetary values with the decimal stripped, so the value 25 would [quoted text clipped - 29 lines] > > [405] 4 items listed out of 1 items. Hm... no big deal either, or is it? Grouping / levelled duplicate suppression and it's the same, no?
> (if this appears in a proportional font simply cut & paste to notepad) > > I hope this answered your question. :-) Certainly on the detail level. But what is the problem? I don't see the RDBMS ==> MV problem - but I may be overlooking something. Anyway, thank you for your effort.
Anthony W. Youngman - 15 Jun 2004 00:29 GMT >> So if I put my data into an MV database I can access it as if it were >>in an RDBMS. However, the converse is not true. [quoted text clipped - 8 lines] > >But I trust you can stand that reaction. Could you give some examples? Actually, all you have to do to make RDBMS appear (superficially) to look like MV is to declare the appropriate views. This does, however, have the unfortunate side-effect of presenting your application with apparently redundant data. The app is also unaware of "if I change this, then that will change too" responses. Or "if I delete that, then the other will go with it".
However, what you can not do with RDBMS is predict system response :-) With MV, you can *prove* that it's damn near impossible to improve on it...
You also have difficulty guessing which tables represent which real-world object - while MV has no guarantees either, your chances of being correct "by accident" are much, much higher. Does an address table represent a company address, a billing address, a shipping address or what? While a relational table may make it clear in the name, MV makes it clear because it would be part of the company file, or the invoice file, or whatever.
I gather it is possible to hide the underlying tables such that an app can access them, but only through views that the dba wishes to permit. With MV, that extra "clutter" isn't there...
Basically, MV is so much simpler :-) the database organisation maps roughly one-to-one to real-world reality :-)
Actually, it's quite difficult to answer your question - we've taken so much from relational :-) But we've taken mostly in "the theory of good design", and not made any changes to the fundamental design of MV, just in how we use it. I think the most important difference is that thing about being able to predict real-world system response. Something relational theory actively avoids... and as I comment elsewhere, the fact that MV stores so much more information as metadata, not data, so it's actually available to the dbms to help it optimise.
Cheers, Wol
mAsterdam - 15 Jun 2004 02:20 GMT > mAsterdam writes >> [quoted text clipped - 15 lines] > have the unfortunate side-effect of presenting your application with > apparently redundant data. 'superficially' only? explain. 'unfortunate side-effect of presenting your application with apparently redundant data' what's so unfortunate in having redundancy in presenting stuff? This is getting to look more and more like sales-crap. I don't want to offend you, but please try to understand what I am asking instead of giving a rebuttal to a non-existent attack.
Anthony W. Youngman - 19 Jun 2004 01:41 GMT >> mAsterdam writes >>> [quoted text clipped - 23 lines] >please try to understand what I am asking instead of giving >a rebuttal to a non-existent attack. Sorry. I don't want to offend you either. But MV will present you with a normalised view of the data. If an RDBMS presents you with a view of the same data and it contains a "many" join, then the app will get multiple copies of certain bits of data. In relational, this could lead to an attempt to update one copy without realising that there IS only one copy, so all the others change as well. Yes that would be stupid programming, but an MV app would know that there was only one copy.
That's why I said "superficial". The MV app will know more about the data, because of the way the data is presented by the database.
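The "multiple copies" effect is easy to reproduce with Python's built-in sqlite3 module. The table and column names below are invented for the illustration; the data is borrowed from the invoice example earlier in the thread.

```python
import sqlite3

# Hypothetical two-table layout: one header row, many G/L lines.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE invoice (inv_id TEXT PRIMARY KEY, descr TEXT);
    CREATE TABLE gl_line (inv_id TEXT, acct TEXT, amount INTEGER);
    INSERT INTO invoice VALUES ('340*VR3-2', 'LOAN PAYMENT, 3/2004');
    INSERT INTO gl_line VALUES ('340*VR3-2', '5170', -101261);
    INSERT INTO gl_line VALUES ('340*VR3-2', '3370', -196384);
""")

# A "many" join: the header columns are repeated on every line row.
joined = con.execute("""
    SELECT i.inv_id, i.descr, g.acct, g.amount
    FROM invoice i JOIN gl_line g ON g.inv_id = i.inv_id
    ORDER BY g.acct
""").fetchall()

copies_seen = [row[1] for row in joined]  # the description as the app sees it
stored = con.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]

print(copies_seen)
print(stored)
```

The application sees the description on every joined row although it is stored once, so nothing in the result set itself says that changing it in one row changes it for all of them.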
Cheers, Wol
Eric Kaun - 04 Jun 2004 15:58 GMT > >mAsterdam writes > >>> This reduces the statement to [quoted text clipped - 15 lines] > All I'm trying to say is that simplicity as a goal in itself is a > delusion. Sorry, but it's not. There are other goals, and sometimes the goals short-circuit one another, but simplicity is always a worthy goal, in every aspect of every design of... well, anything. It's a necessary but not sufficient component of design quality.
> And just because relational may be simpler than codasyl doesn't mean > that it's a good thing. We have a real-world problem here ... look at [quoted text clipped - 3 lines] > > What matters is the complexity (or simplicity) of the WHOLE SYSTEM. And the complexity of the individual components at least form an input to the "complexity measure function" of the whole system.
> There's no point in simplifying the database, if the necessary increase > in complexity of the business analysis totally negates it. Agreed.
> By focussing on minimising the complexity of one part of the system, we make the > system as a whole more complex. Not true, at least not necessarily.
> That will explain why Dawn's experience > is that MV is more productive than relational - the simplicity of the > relational database over MV simply pushes all the complexity into the > business analysis side, turning that into a total nightmare. Oh please. You have to be kidding. Moving from a 3NF-but-not-1NF MV structure to a fully 3NF structure produces, from bliss, a "total nightmare"? That's delusion.
> Which is simpler - to model a single real world entity as a single > database "table" as MV does (we can model an invoice in a single FILE), > or as five or six relational tables? Why do you have attributes? You're, like, totally dissecting the holistic nature of the invoice, dude. C'mon, let's harmonize with the cosmic gestalt and just have one big file with attributes INVOICE1, INVOICE2, INVOICE3, etc...
> And don't forget - our FILE (should > be) normalised, so we can access it just as if it were five or six > relational tables ... > > Yep. The database itself is more complex. But the business analysis is > MUCH simpler, such that the total system complexity is a lot less. In what way does storing the invoice in a single file mean the business analysis is simpler? You still had to identify attributes, and then of course make sure the ordering of elements in the MV attributes is the same, and possibly dissect values into sub-values...
- erk
Dawn M. Wolthuis - 04 Jun 2004 16:49 GMT > > >mAsterdam writes <snip>
>> Which is simpler - to model a single real world entity as a single > > database "table" as MV does (we can model an invoice in a single FILE), [quoted text clipped - 4 lines] > and just have one big file with attributes INVOICE1, INVOICE2, INVOICE3, > etc... hey man, now you're talkin' but now we want to ask questions of the data, so we need to tag some parts, without harming any animals, and there you have it ;-)
> > And don't forget - our FILE (should > > be) normalised, so we can access it just as if it were five or six [quoted text clipped - 7 lines] > course make sure the ordering of elements in the MV attributes is the same, > and possibly dissect values into sub-values... One way it makes it easier is that it takes us down to a smaller number of portals, namespaces, vocabularies, means of making our way into the data. If you don't think in terms of equal relations, but of some being important entry points into the data, you simplify things greatly for the user. I don't know about thinking about ordering of the elements -- we don't give it a second thought -- you start tossing those puppies in there and add to the end or leave a few open spaces if you like it that way. No one spends any brain cells considering the ordering of attributes in PICK/MV. --dawn
Eric Kaun - 04 Jun 2004 21:10 GMT > > > >mAsterdam writes > <snip> [quoted text clipped - 11 lines] > we need to tag some parts, without harming any animals, and there you have > it ;-) Heh... oh, so you don't give a darn about plants? Speciesist.
Asking questions is precisely the point of relational, and, as has been pointed out, relational is more egalitarian with respect to what you can ask, in that it requires that only relations (and tuples, which are directly implied) "tag" the data. Tagging, though, implies a high ratio of text to markup - else you end up with the mess that is many XML docs, with a 5:1 ratio of tags to data. And XML tagging has a limited type system, and nothing about constraints.
So when you say "tag parts", you've already decided on the format - an implication that the data comes "naturally" in some form. In my experience, that "natural" form is as useful as natural language is for programming, even with the tags.
Maybe I'd be swayed by a more complete tagging system, but I think it's the cart pulling the horse...
> > In what way does storing the invoice in a single file mean the business > > analysis is simpler? You still had to identify attributes, and then of [quoted text clipped - 6 lines] > If you don't think in terms of equal relations, but of some being important > entry points into the data, you simplify things greatly for the user. Some users. What about the users with other entry points?
> I don't > know about thinking about ordering of the elements -- we don't give it a > second thought -- you start tossing those puppies in there and add to the > end or leave a few open spaces if you like it that way. No one spends any > brain cells considering the ordering of attributes in PICK/MV. --dawn I don't mean the ordering of attributes - I mean the ordering of elements within an attribute. For example, you have an invoice, which probably has a LineItem attribute, and a Part# attribute, and a Quantity attribute. As separate attributes, those aren't connected by anything but convention; and it's those sort of assumptions that normalization is meant to codify.
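To make the "connected by nothing but convention" point concrete, here is a small Python sketch. The field names and part numbers are invented; the only assumption is that Part# and Quantity are held as two parallel lists matched up by position.

```python
# Hypothetical invoice with two parallel multivalued attributes.
invoice = {
    "PART": ["P-100", "P-200", "P-300"],
    "QTY": [5, 1, 12],
}


def line_items(rec):
    """Pair parts with quantities purely by position."""
    return list(zip(rec["PART"], rec["QTY"]))


before = line_items(invoice)

# An update removes the second part but forgets the matching quantity.
del invoice["PART"][1]
after = line_items(invoice)

print(before)
print(after)
```

After the one-sided delete, P-300 is paired with the quantity that belonged to P-200 and the 12 has quietly dropped out of view. Nothing in the data itself objects; only a rule outside the two lists says they have to move together.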
- erk
Anthony W. Youngman - 08 Jun 2004 00:10 GMT >> hey man, now you're talkin' but now we want to ask questions of the data, >so [quoted text clipped - 10 lines] >ratio of tags to data. And XML tagging has a limited type system, and >nothing about constraints. Relational is more egalitarian about what you can ask (actually, I'm not sure about that, but never mind ...)
But it's like being "politically correct". Sure, you may want to know "how many blokes have a car the same colour as their wife's hair?".
BUT! Do you want to make "all questions equally easy to answer" or do you want to make "common questions easier to answer than unusual ones". If by levelling off the ease of asking questions, you simply make the easier questions harder in order to level the playing field, you're doing your users a very big disservice. Is that what you're trying to achieve?
And with a little bit of thought, you can make nearly ANY question in Pick easy to answer. More to the point, you can PROVE that the system can answer the question easily. Given that relational goes to extreme lengths to separate the logical from the physical, relational actually prevents you from even trying to prove the question is easy, merely saying "you have no choice but to trust the optimiser" :-(
Cheers, Wol
Eric Kaun - 09 Jun 2004 20:27 GMT > Relational is more egalitarian about what you can ask (actually, I'm not > sure about that, but never mind ...) [quoted text clipped - 8 lines] > doing your users a very big disservice. Is that what you're trying to > achieve? Not at all. I want a firm logical foundation for any questions, and to have that foundation serve as the basis for making questions easier to answer - in other words, to define useful (though necessarily restricted) views in terms of an egalitarian model. Otherwise my initial choice is simply too risky, at least in complex domains.
> And with a little bit of thought, you can make nearly ANY question in > Pick easy to answer. More to the point, you can PROVE that the system > can answer the question easily. Please elaborate on both of these. I have no idea what proving something easy to answer entails...
> Given that relational goes to extreme > lengths to separate the logical from the physical, ? Extreme lengths to separate? I'd say it simply tries to keep them "naturally" separate, though of course I have no definition for "natural".
:-)
> relational actually > prevents you from even trying to prove the question is easy, merely > saying "you have no choice but to trust the optimiser" :-( Huh? Prove the question is easy? What does that mean?
At any level above hardware, we have no choice - it depends on the processor, and hard disk, and memory speed, and... so of course there's a level of trust involved. Do you trust the Pick compiler / interpreter? I certainly want to delegate the nasty business of optimization (which we've demonstrated is useful in the considerably more-difficult area of compilers) to a machine which can do the job better and faster than I.
- erk
Anthony W. Youngman - 10 Jun 2004 02:06 GMT >> relational actually >> prevents you from even trying to prove the question is easy, merely [quoted text clipped - 8 lines] >demonstrated is useful in the considerably more-difficult area of compilers) >to a machine which can do the job better and faster than I. Basically, if we assume (reasonable assumption) that everything else is irrelevant when compared to disk access, I can prove that (almost) every attempted disk access actually retrieves data that is relevant to the question.
I can also show statistically that the chances of retrieving multiple items of interest with a single access are also high.
Of course, that argument is less relevant now we have huge amounts of ram...
Cheers, Wol
Eric Kaun - 10 Jun 2004 21:23 GMT > >> relational actually > >> prevents you from even trying to prove the question is easy, merely [quoted text clipped - 13 lines] > attempted disk access actually retrieves data that is relevant to the > question. Oh... easy == fast.
Perhaps the above is true, but that requires your data be structured in the same file, which is both boon and bane. In a TRDBMS (remember, this is c.d.t), the DBMS would reorganize base relations' storage based on access patterns, whereas in Pick you have to decide that in advance, and do a lot of work later if it changes. Unless I'm misinterpreting... in any event, access optimization and clustering based on common usage (which can change, especially as reports and ad hoc queries enter the fray) should be dynamic, and analyzed by a computer.
- erk
Anthony W. Youngman - 15 Jun 2004 01:02 GMT >> Basically, if we assume (reasonable assumption) that everything else is >> irrelevant when compared to disk access, I can prove that (almost) every [quoted text clipped - 11 lines] >especially as reports and ad hoc queries enter the fray) should be dynamic, >and analyzed by a computer. And dynamic re-organisation can be prohibitively expensive as it reorganises the data to optimise your year-end reports, only for you to have run your final report five minutes ago. This is the fallacy of making all reports equally "easy" by imposing unnecessary overhead on the common ones!
If you want to know one thing about an invoice, chances are you want to know several. MV will (if properly designed) return EVERYTHING in a single disk hit. If you then want to know about the company it was sent to, a further SINGLE disk hit will return EVERYTHING you want there.
You're trying to optimise everything. If you access one bit of an invoice, the chances of you accessing another bit of the same invoice are HIGH. The chances of the computer guessing correctly whether you want another invoice, or the company, or any other bit of information unrelated to that invoice, are piss-poor. So why try?
MV optimises retrieval of information about any single real-world object. It will step, with blinding speed, down a list of keys. It doesn't even try to second-guess the user's next random data access - what's the point? Was it Knuth said "premature optimisation is the root of all evil"? The design of MV naturally clusters related data. And ignores unrelated data. And it shows! As I've said, again and again, why does experience say that MV beats relational for speed hands down every time, especially for "large" databases?
Cheers, Wol
Anthony W. Youngman - 08 Jun 2004 00:02 GMT >> And don't forget - our FILE (should >> be) normalised, so we can access it just as if it were five or six [quoted text clipped - 7 lines] >course make sure the ordering of elements in the MV attributes is the same, >and possibly dissect values into sub-values... When I analyse our MV system at work, I think in terms of physical objects. I then decompose each physical object into normal form. I DON'T NEED to think about other objects while I'm decomposing the one in front of me.
We're in the process of porting to MS SQL-Server. The data diagram is an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute spaghetti of links EVERYWHERE! I don't have a clue which tables model which physical object, the meaning of links isn't intuitive.
Trying to juggle hundreds of tables is far harder than keeping track of several tens of physical objects to which I can relate, even if each of those objects is then broken down logically into normal form.
The MV database layout imposes a grouping which helps me grasp the system complexity. While I can easily view the MV structure as equal to the relational structure, a true relational database does not give me the MV structure which appears much less complicated by virtue of appealing to the way I naturally view the world.
Take the invoice. From the MV point of view, I see it as a SINGLE object. It is *TRIVIAL* for the database itself to decompose that and present it, via ODBC, to a relational programmer who wouldn't even realise that the MV back-end viewed it as a single object.
Cheers, Wol
Eric Kaun - 09 Jun 2004 21:13 GMT > When I analyse our MV system at work, I think in terms of physical > objects. I then decompose each physical object into normal form. I DON'T > NEED to think about other objects while I'm decomposing the one in front > of me. Physical objects - how quaint. :-)
In my experience, even entities like invoices and orders become unmanageable as "physical objects" - specifically when someone placing an order needs to know some fairly complex interrelationships between the parts in that order, the warehouses and their parts, and parts previously ordered by the same customer. And in a paint formula database, the "physical objects" are both far from clear, and far different than even the users suppose they are.
> We're in the process of porting to MS SQL-Server. The data diagram is > an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute > spaghetti of links EVERYWHERE! I don't have a clue which tables model > which physical object, the meaning of links isn't intuitive. Several points:
1. Diagrams can be messy, and they can be nicely organized.
2. If you were to draw a diagram that captures the functions you present to users in the dictionary, the files you use, and the files split for efficiency, you might see something similarly ugly.
3. The meaning of links should be fairly straightforward if you're thinking in terms of predicates. Then again, if the database designer didn't think that way, you could be in trouble. Surely you've seen bad Pick data models?
> Trying to juggle hundreds of tables is far harder than keeping track of > several tens of physical objects to which I can relate, even if each of > those objects is then broken down logically into normal form. Agreed - good modeling is difficult, and it's certainly very useful to think of things in groups. ER/win always did a good job for me - I could keep a 100-table logical data model segregated into domains [sic]. I never looked at the entire thing all at once.
> The MV database layout imposes a grouping which helps me grasp the > system complexity. That's a very good thing. What's not a good thing is selecting a grouping that makes sense to Function X to push its way into the definition of the data, since Function Y might have a very different idea what groupings should be imposed.
> While I can easily view the MV structure as equal to > the relational structure, a true relational database does not give me > the MV structure which appears much less complicated by virtue of > appealing to the way I naturally view the world. Again, if you only ever need to "view the objects" in one way, I envy you. Experience hasn't been that kind to me.
> Take the invoice. From the MV point of view, I see it as a SINGLE > object. It is *TRIVIAL* for the database itself to decompose that and > present it, via ODBC, to a relational programmer who wouldn't even > realise that the MV back-end viewed it as a single object. I think you're backwards on this:
1. Seeing it as a single object implies some degree of encapsulation.
2. Many queries need to cross encapsulation boundaries - hence the need for the Pick/MV query mechanism to directly support list attributes and sub-attributes and sub-sub-attributes... if it were really one object, you could only "get at" those pieces via the Invoice's defined operations. Encapsulation is a red herring (direct quote from Date) - every "persistence mapping" tool violates it.
3. It's not at all trivial. Say your Invoice has 4 attributes: customer, parts, ship dates, and payments. Part[N] must have a corresponding Date[N] which is the date on which that part was received; Payments is completely separate. How would the mapping "know" that that correspondence exists? Certainly there is "meaning" there? The aggregation has traded one general form of meaning for a very app-specific form; in examples other than the somewhat-hierarchical Invoice/Order, the value of the trade diminishes even further.
4. Given that you do various queries, and that the database answers lots of questions, what value is it for the database to "know" it's a single object? You don't see the "lowest level" of a data model being something "egalitarian", which can support several views? It seems to me much more powerful and general to supply predicates as the foundation, and layer a "path expression" on top which can present any arbitrary view of the data. Example: atop a relational model, I can present the Invoice, as well as a Warehouse which "contains" the parts it shipped and which customers bought them? Reporting and GUI generation, here I come...
- erk
Anthony W. Youngman - 10 Jun 2004 02:20 GMT >> When I analyse our MV system at work, I think in terms of physical >> objects. I then decompose each physical object into normal form. I DON'T >> NEED to think about other objects while I'm decomposing the one in front >> of me. > >Physical objects - how quaint. :-) No comment :-)
>In my experience, even entities like invoices and orders become unmanageable >as "physical objects" - specifically when someone placing an order needs to >know some fairly complex interrelationships between the parts in that order, >the warehouses and their parts, and parts previously ordered by the same >customer. And in a paint formula database, the "physical objects" are both >far from clear, and far different than even the users suppose they are. Except that if you think of it as "noun or adjective", it does actually become a lot clearer ... if the same adjective describes several nouns then you need multiple instances of it :-)
>> We're in the process of porting to MS SQL-Server. The data diagram is >> an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute [quoted text clipped - 9 lines] >in terms of predicates. Then again, if the database designer didn't think >that way, you could be in trouble. Surely you've seen bad Pick data models? Have I seen bad models? I work with them :-(
>> Trying to juggle hundreds of tables is far harder than keeping track of >> several tens of physical objects to which I can relate, even if each of [quoted text clipped - 4 lines] >100-table logical data model segregated into domains [sic]. I never looked >at the entire thing all at once. Pick never expects me to :-)
>> The MV database layout imposes a grouping which helps me grasp the >> system complexity. [quoted text clipped - 3 lines] >data, since Function Y might have a very different idea what groupings >should be imposed. Here again, the noun/adjective paradigm just seems to work a treat ...
>> While I can easily view the MV structure as equal to >> the relational structure, a true relational database does not give me [quoted text clipped - 28 lines] >somewhat-hierarchical Invoice/Order, the value of the trade diminishes even >further. Very easily. Pick metadata actually provides a very simple mechanism for saying which fields are linked, and which are not.
For example, some of our Pick FILES are broken up into three or four tables for export to ODBC. It's not a problem at all.
>4. Given that you do various queries, and that the database answers lots of >questions, what value is it for the database to "know" it's a single [quoted text clipped - 5 lines] >Warehouse which "contains" the parts it shipped and which customers bought >them? Reporting and GUI generation, here I come... Statistics says that if you want to know one thing about an object, then the chances are high that you want to know several things about that object.
Okay, if the query is "please list all customers who bought part X", then Pick gains nothing over relational. But if the query is "please list all parts invoiced on date Y" then Pick gains big time - merely by asking "what invoices are dated Y" I get all the data I want for free as a side effect. If there are ten invoices, I need ten data accesses to get all the invoice details. Relational needs ten data accesses to get the invoice numbers from the date, then loads more accesses to get the actual items from the invoice numbers.
Relational needs an optimiser - Pick gets it for free ... and the stats say that on average it pays off handsomely :-)
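To make the counting argument above concrete, here is a toy sketch in Python. It is not a benchmark and not how any real engine works: the data, the idea of an "access" as one keyed record read, and the names `MV_INVOICES`, `HEADERS`, `ITEMS`, `DATE_INDEX` and `ITEM_INDEX` are all assumptions made up for illustration. A date index is assumed to exist in both layouts and is not counted.

```python
# Toy illustration only: an "access" is one keyed record read, and the
# date index is assumed to exist (and is not counted) in both layouts.

# Pick-style file: each invoice record carries its parts as multivalues.
MV_INVOICES = {
    "INV1": {"date": "2004-06-01", "parts": ["P1", "P2"]},
    "INV2": {"date": "2004-06-01", "parts": ["P3"]},
    "INV3": {"date": "2004-06-02", "parts": ["P1"]},
}

# Normalised layout: invoice headers and invoice items are separate rows.
HEADERS = {inv_id: rec["date"] for inv_id, rec in MV_INVOICES.items()}
ITEMS = {
    (inv_id, line_no): part
    for inv_id, rec in MV_INVOICES.items()
    for line_no, part in enumerate(rec["parts"], start=1)
}

# Hypothetical secondary indexes: date -> invoice ids, invoice id -> item keys.
DATE_INDEX = {}
for inv_id, date in HEADERS.items():
    DATE_INDEX.setdefault(date, []).append(inv_id)
ITEM_INDEX = {}
for key in ITEMS:
    ITEM_INDEX.setdefault(key[0], []).append(key)


def parts_on_date_mv(date):
    """Return (parts, record reads) for the multivalue layout."""
    reads, parts = 0, []
    for inv_id in DATE_INDEX.get(date, []):
        record = MV_INVOICES[inv_id]  # one read returns header and items
        reads += 1
        parts.extend(record["parts"])
    return parts, reads


def parts_on_date_normalised(date):
    """Return (parts, record reads) for the header/item layout."""
    reads, parts = 0, []
    for inv_id in DATE_INDEX.get(date, []):
        _ = HEADERS[inv_id]  # one read for the header row
        reads += 1
        for key in ITEM_INDEX[inv_id]:
            parts.append(ITEMS[key])  # one read per item row
            reads += 1
    return parts, reads


print(parts_on_date_mv("2004-06-01"))          # (['P1', 'P2', 'P3'], 2)
print(parts_on_date_normalised("2004-06-01"))  # (['P1', 'P2', 'P3'], 5)
```

The sketch only reproduces the naive count. A real RDBMS may cluster item rows with their header, read whole pages, or pick a different plan, so the gap shown here should not be read as a measurement.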
Cheers, Wol
Tony - 10 Jun 2004 10:32 GMT > Relational needs an optimiser - Pick gets it for free ... and the stats > say that on average it pays off handsomely :-) Pick doesn't have a "free" optimizer: it just doesn't have one. So either there is only one access path, and too bad if it isn't optimal, or you the designer/programmer are playing the part of the optimizer yourselves (and I don't suppose your time comes free).
Bill H - 10 Jun 2004 23:42 GMT Tony:
> Pick doesn't have a "free" optimizer: it just doesn't have one. So > either there is only one access path, and too bad if it isn't optimal, > or you the designer/programmer are playing the part of the optimizer > yourselves (and I don't suppose your time comes free). I would ask: is the developer more efficient optimizing application data or is the DBA? A Pick-like dbms asks the developer to optimize while RDBMS products ask the DBA (or it's self optimizing based on queries) to optimize.
Bill
Anthony W. Youngman - 15 Jun 2004 01:04 GMT >> Relational needs an optimiser - Pick gets it for free ... and the stats >> say that on average it pays off handsomely :-) [quoted text clipped - 3 lines] >or you the designer/programmer are playing the part of the optimizer >yourselves (and I don't suppose your time comes free). So our time isn't free ... but optimisation is inherent in the design. We don't even think about it - it just happens ...
Cheers, Wol
Dawn M. Wolthuis - 15 Jun 2004 02:25 GMT > >> Relational needs an optimiser - Pick gets it for free ... and the stats > >> say that on average it pays off handsomely :-) [quoted text clipped - 6 lines] > So our time isn't free ... but optimisation is inherent in the design. > We don't even think about it - it just happens ... Having spent some years trying to teach PICK folks to use SQL as a query language (don't worry, I saw the light) I can definitely vouch for this as a MAJOR difference between the MV query language and SQL. Users who had been accustomed to MV query languages couldn't believe they had to work so hard to get a good SQL query from the same data. Of course, there are better optimizers than what they had in the product at that time, but nothing holds a candle to the query language they were used to.
I'm hopeful that someone will write an implementation of GIRLS (mv query) that goes against XML data so others can benefit from the language. It needs some updating since it hasn't changed much in 40 years, but it sure beats XQuery for ease of use by a hardly-trained human.
Cheers! --dawn
mAsterdam - 15 Jun 2004 02:26 GMT >>> Relational needs an optimiser - Pick gets it for free ... and the stats >>> say that on average it pays off handsomely :-) [quoted text clipped - 6 lines] > So our time isn't free ... but optimisation is inherent in the design. > We don't even think about it - it just happens ... Magic! Yey!
Bill H - 05 Jun 2004 17:49 GMT Let's look at
[snipped]
> And just because relational may be simpler than codasyl doesn't mean > that it's a good thing. We have a real-world problem here ... look at > the following mapping ... > > real world <=> business analysis <=> database Here's a visual example:
:LIST APHIST '340*VR11-1' INVDATE DESC ACCTS AMTS

APHIST....  INV-DATE  Description......  Acct..  Acct/Amts....
340*VR11-1  11-01-00  LOAN PAYMENT, NOV  5170          999.22-
                                         3370        1,977.23-
                                         5170            0.00
                                         3370            0.00
This is a single A/P invoice with multiple G/L account#s defined (and amounts). This invoice will update four G/L accounts in the general ledger and financial statements by the amount "associated" with each account#. These are the properties (some anyway) of this invoice. The invoice is _not_ like:
:LIST APHIST '340*VR11-1' BY-EXP ACCTS INVDATE DESC ACCTS AMTS

APHIST....  INV-DATE  Description......  Acct...  Acct/Amts....
340*VR11-1  11-01-00  LOAN PAYMENT, NOV  3370        1,977.23-
340*VR11-1  11-01-00  LOAN PAYMENT, NOV  3370            0.00
340*VR11-1  11-01-00  LOAN PAYMENT, NOV  5170          999.22-
340*VR11-1  11-01-00  LOAN PAYMENT, NOV  5170            0.00
To alter the invoice, when preparing it for data storage, is to alter its fundamental structure. Although, in many instances it can be put back together again, sometimes it cannot. In addition, there's a lot of work to go through in order to decompose then recompose this invoice. This additional complexity, I think, is unnecessary and costly.
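One way to picture the decompose/recompose step is the Python sketch below. It is only an illustration under assumptions: the dictionary layout, the field names and the helpers `explode` and `recompose` are hypothetical, not Pick internals or anyone's actual schema. It suggests one reading of "sometimes it cannot": the flat rows can be put back together only if something (here a value position) records the original order of the account/amount pairs.

```python
from decimal import Decimal

# The A/P invoice from the listing, as one nested record (hypothetical layout).
invoice = {
    "id": "340*VR11-1",
    "inv_date": "11-01-00",
    "desc": "LOAN PAYMENT, NOV",
    "accts": ["5170", "3370", "5170", "3370"],
    "amts": [Decimal("-999.22"), Decimal("-1977.23"), Decimal("0.00"), Decimal("0.00")],
}


def explode(inv):
    """One flat row per (acct, amt) pair, keeping the value position."""
    pairs = zip(inv["accts"], inv["amts"])
    return [
        (inv["id"], inv["inv_date"], inv["desc"], pos, acct, amt)
        for pos, (acct, amt) in enumerate(pairs, start=1)
    ]


def recompose(rows):
    """Rebuild the nested record; relies on the stored position column."""
    rows = sorted(rows, key=lambda r: r[3])
    inv_id, inv_date, desc = rows[0][:3]
    return {
        "id": inv_id,
        "inv_date": inv_date,
        "desc": desc,
        "accts": [r[4] for r in rows],
        "amts": [r[5] for r in rows],
    }


rows = explode(invoice)
by_acct = sorted(rows, key=lambda r: r[4])  # row order of the BY-EXP ACCTS listing

print(recompose(rows) == invoice)                   # True: position was kept
print([r[4] for r in by_acct] == invoice["accts"])  # False: order differs
```

If the position column were dropped from the flat rows, the sorted listing alone would not say which 5170 line came first, which is the extra bookkeeping the flat form has to carry.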
Then again, cost and complexity may not be an issue for some.
Bill
Alfredo Novoa - 02 Jun 2004 15:08 GMT >And Occam's Razor (the Einstein version iirc) says "make things as >simple as possible, BUT NO SIMPLER". It is simpler to build information systems with the relational approach than with any other approach.
Your quote is out of context because it is related to physical theories and not about engineering approaches. You always make the same mistakes.
Regards Alfredo
Dawn M. Wolthuis - 02 Jun 2004 16:01 GMT > >And Occam's Razor (the Einstein version iirc) says "make things as > >simple as possible, BUT NO SIMPLER". > > It is simpler to build information systems with the relational > approach than with any other approach. What are the other approaches you have tried? --dawn <snip>
Alfredo Novoa - 02 Jun 2004 17:27 GMT >> It is simpler to build information systems with the relational >> approach than with any other approach. > >What are the other approaches you have tried? --dawn ><snip> To process files in applications, business objects, xBase, ADO.NET, SQL, etc.
Regards Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 13:26 GMT >>Rationalism is as irrational(/rational) as any other faith. > > What a nonsense! No, it is not. Rationalism, usually defined as the belief that human reasoning is the sole or sufficient test of truth, is circular reasoning.
Now, this is Philosophy in general. In Natural Philosophy, also known as Science, reason is indeed our only resource. It just doesn't hold water when erected as the sole basis of Epistemology, Metaphysics and Ethics.
 Signature Leandro Guimarães Faria Corsetti Dutra +55 (11) 5685 2219 Av Sgto Geraldo Santana, 1100 6/71 leandro@dutra.fastmail.fm 04.674-000 São Paulo, SP BRASIL http://br.geocities.com./lgcdutra/
Alfredo Novoa - 31 May 2004 15:55 GMT >>>Rationalism is as irrational(/rational) as any other faith. >> >> What a nonsense! > > No, it is not. It is incredible nonsense. One of the biggest pieces of nonsense written here.
> Rationalism, usually defined as the belief >that human reasoning is the sole or sufficient test of truth, is >circular reasoning. Wrong.
<quote>
Rationalism, also known as the rationalist movement, is a philosophical doctrine that asserts that the truth should be determined by reason and factual analysis, rather than faith, dogma or religious teaching.
</quote>
http://en.wikipedia.org/wiki/Rationalist
It is only common sense, the less common of the senses.
Faith is a form of superstition.
<quote>
Superstition is a term used to refer to a set of behaviors that may be faith based, or related to magical thinking, whereby the practitioner believes that the future, or the outcome of certain events, can be influenced by certain of his or her behaviors.
Critics argue that superstition is not based on reason, but instead springs from religious feelings that are misdirected or unenlightened, which leads in some cases to rigor in religious opinions or practice, and in other cases to belief in extraordinary events or in charms, omens, and prognostics. Many superstitions can be prompted by misunderstandings of causality or statistics.
</quote>
http://en.wikipedia.org/wiki/Superstition
> Now, this is Philosophy in general. In Natural Philosophy, >also known as Science, reason is indeed our only resource. Of course not!
In science the only resource is observation. Reason is used to understand the observations and to make predictions.
Regards Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 16:53 GMT >> Now, this is Philosophy in general. In Natural Philosophy, >>also known as Science, reason is indeed our only resource. [quoted text clipped - 3 lines] > In science the only resource is observation. Reason is used to > understand the observations and to make predictions. Ergo, observations are only useful in the presence of reason. This makes reason, as sustained by reasonable faith, our most fundamental resource in Natural Philosophy, as we wished to demonstrate.
Anthony W. Youngman - 01 Jun 2004 23:37 GMT >[snip] >>>It is in the leap from doing relational theory to thinking that [quoted text clipped - 14 lines] > - What exactly was proven? > - Could you please give a reference? And by its very definition, maths proves nothing about the real world. Just because relational databases are perfect (as indeed they are) at modelling "data as defined by the relational model", that says nothing about whether the real world can be described by "data" that fits the mathematical definition.
Cheers, Wol
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 13:17 GMT > I am completely opposed to faith and other forms of irrationalism. The > Relational Model is maths not irrational faith. This is OT, but there is such a thing as rational faith. You use it everyday to function in society.
Alfredo Novoa - 31 May 2004 15:55 GMT >> I am completely opposed to faith and other forms of irrationalism. The >> Relational Model is maths not irrational faith. > > This is OT I disagree.
>, but there is such a thing as rational faith. You >use it everyday to function in society. No, faith is irrational by definition. If it is rational then it is not faith. It is deduction based on incomplete information.
Regards Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 16:51 GMT >> This is OT > > I disagree. Too bad, as one of the luminaries of this newsgroup you've just given permission for me to argue.
>> but there is such a thing as rational faith. You >>use it everyday to function in society. > > No, faith is irrational by definition. If it is rational then it is > not faith. It is deduction based on incomplete information. You are messing the concepts of religion and faith.
Religion is, quite by definition, based on faith. But there are quite a lot of other instances of faith besides religious ones, even if you don't differentiate institutional from spiritual religion.
The reason for this is the long association of faith and religion, or more generally spirituality and authority, to the point that even the dictionaries carry only the contaminated definition.
Faith originally was not opposed to reason, but to vision. One has faith not necessarily because one accepts another's authority, but because one continues to believe something even without presently seeing proof of it.
For example, I have no real assurance the building I am working in won't crumble while I write these lines. But based on its apparently solid form, the fact that buildings don't usually crumble without first showing some signs as cracks and shifts, and that the municipality has taken some reasonable steps to check the engineers' work, I have a reasonable faith in being able to leave for home unharmed from falling bricks.
In another example, I have no way to reasonably prove the meal my wife serves me isn't poisoned. But I have faith in her, and good indications in her past behaviour and my knowledge of her psyche, that it isn't.
To carry this into the Informatics realm, I can't really remember all the time all the proofs of why we need RDBMSs. But I've seen the proof, and based on my admittedly partial remembrance of them I have faith this is the way to go.
Without faith, even Science wouldn't be possible. A scientist has faith his reason is reasonably functioning, and so is his colleagues'. He can't keep all the proofs of this all the time at his conscience.
The erection of Science as independent of the very basis of human reasoning is sub-scientific, and ultimately dangerous to Science itself.
Well, I still maintain all this is OT. But sure Philosophy 101 isn't lost time.
Alfredo Novoa - 31 May 2004 19:48 GMT > Faith originally was not opposed to reason Indeed, it was a "solution" used when the reason had no answers. Faith is related to magic and authority.
Fortunately reason has a lot more answers than it had in prehistory.
>One has faith not necessarily because one accepts another's authority, >but because one continues to believe something even without presently >seeing proof of it. No, proofs are not the only reasons. You are confusing faith with deduction based on incomplete information.
Like many other words, though, faith has many meanings, such as fidelity (faithful implementation), and it is even used for reason-based confidence. But I don't think we are talking about these loose uses of the term.
The pickies use the term in order to equate The Relational Model and the primitive approaches. Here is their fallacy:
Irrationalism is faith, rationalism is faith so irrationalism is as good as rationalism therefore The Relational Model is not better than any primitive ad hoc approach.
The flaws of the argument are rather obvious.
> For example, I have no real assurance the building I am >working in won't crumble while I write these lines. But based on its [quoted text clipped - 3 lines] >work, I have a reasonable faith in being able to leave for home >unharmed from falling bricks. And it has nothing to do with the strict sense of faith. You have a reasonable knowledge with a very high degree of certainty. You are making a good use of the available information and applying maths.
When you say that you have no real assurance about the building, you are saying that you do not have faith in it. Faith is all or nothing.
Faith would be to have absolute certainty that the building will collapse without any previous sign or logical reason, because it was revealed.
> In another example, I have no way to reasonably prove the meal >my wife serves me isn't poisoned. But I have faith in her You have confidence in her among other things because you know that she has no reasons to kill you and good reasons to not do it.
>, and good >indications in her past behaviour and my knowledge of her psiche, that >it isn't. Correct. That's why you have reason and experience based knowledge and not faith.
> To carry this into the Informatics realm, I can't really >remember all the time all the proofs of why we need RDBMSs. But I've >seen the proof, and based on my admittedly partial remembrance of them >I have faith this is the way to go. You are using the term very loosely. IMO it is not a good use of the term and you are playing the troll's game.
This is faith according to the strict meaning of the word:
<quote> "We believe", says the Vatican Council (III, iii), "that revelation is true, not indeed because the intrinsic truth of the mysteries is clearly seen by the natural light of reason, but because of the authority of God Who reveals them, for He can neither deceive nor be deceived." </quote>
http://en.wikipedia.org/wiki/Faith
So what the pickies want to say is that we don't promote The Relational Model because of the measurable advantages it has, but because of the authority of Codd (that rhymes with God) and his apostle Date Who reveals them, for They can neither deceive nor be deceived. :-)
Regards Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 20:13 GMT > You are using the term very loosely No I am not. I am giving it its original meaning.
You are dealing with current reinterpretations focusing on spirituality and authority.
Dawn M. Wolthuis - 31 May 2004 20:35 GMT > > Faith originally was not opposed to reason > [quoted text clipped - 22 lines] > good as rationalism therefore The Relational Model is not better than > any primitive ad hoc approach. I was with you until you seem to have jumped on some irrationalism bandwagon with that last paragraph. Given that my heavy programming years were not spent with PICK, I'm not a real pickie (and I also don't know if they call themselves that -- I have used that term as short-hand, but I'm not sure who else might). But I'll classify myself a pickie, if I may, because I see that the implementations based on the Nelson-PICK efforts (which appear to be based on the Postley-Buettell work) provide very cost-effective solutions for business, especially when compared to those claiming to be based on the relational model.
I have worked with a number of different environments, and my opinion from experience is that PICK provides a big bang for the buck and runs "lean and mean" compared to other solutions. I do not claim that I can prove it is better in any way -- it seems better. Intuition is not irrational -- it is based on our brains working in such a way as to arrive at a hypothesis (not proof!) that we might be incapable of defending through any logic, at least at some point in time.
This is NOT irrational thinking. Agreed?
--dawn
<snip>
Dawn M. Wolthuis - 01 Jun 2004 04:04 GMT > >> I am completely opposed to faith and other forms of irrationalism. The > >> Relational Model is maths not irrational faith. [quoted text clipped - 8 lines] > No, faith is irrational by definition. If it is rational then it is > not faith. It is deduction based on incomplete information. Faith does not base decisions of truth or falsity on rationality, but it need not be irrational -- perhaps a-rational would be more accurate? For example if one were to believe that rationality were the only means to determining truth, that might be wrong or it might be outside of rationality for this believer to arrive at that conclusion, but I don't think it is irrational to hold to such a belief. Similarly for other religions.
--dawn
Paul - 02 Jun 2004 00:16 GMT >> To go back to your favourite analogy (apologies everyone), it is >> like saying that algebra was responsible for the shortcomings of [quoted text clipped - 17 lines] > except where "the proof is in the pudding" -- scientific observation, > for example). No, I think in this analogy Newton's model does correspond to a specific database design. The possibility that the relational theory itself is wrong corresponds to the possibility that algebra is wrong.
>> Einstein didn't invent a better algebra, he designed a better model >> using the SAME algebra - like a later designer designing a better [quoted text clipped - 3 lines] > functional theory that is better than the relational theory before > it. I think that would correspond to Einstein inventing a better algebra for his model. I believe tensor algebra was actually used for relativistic mechanics, but it's not really an improvement to standard algebra, just a different example of it.
I think you're mistaking the theory of (algebra, relational databases, logic) itself with theories that can be developed in them, for example (Newtonian mechanics, payroll database, relativistic mechanics). We've got two levels of theories here.
Paul.
mountain man - 02 Jun 2004 02:12 GMT > No, I think in this analogy Newton's model does correspond to a specific > database design. The possibility that the relational theory itself is > wrong corresponds to the possibility that algebra is wrong. Or incomplete, as has been formally demonstrated at least 30 years prior to the emergence of the RM.
Pete Brown Falls Creek Oz
mAsterdam - 02 Jun 2004 09:24 GMT >>No, I think in this analogy Newton's model does correspond to a specific >>database design. The possibility that the relational theory itself is >>wrong corresponds to the possibility that algebra is wrong. > > Or incomplete, as has been formally demonstrated > at least 30 years prior to the emergence of the RM. What do you mean? Do you mean that incomplete is wrong? Or that the Goedel-incompleteness somehow implies that any specific database design/relational theory must be incomplete/wrong? What are you saying?
mountain man - 02 Jun 2004 13:23 GMT > >>No, I think in this analogy Newton's model does correspond to a specific > >>database design. The possibility that the relational theory itself is [quoted text clipped - 7 lines] > any specific database design/relational theory must be > incomplete/wrong? What are you saying? http://www.mountainman.com.au/GIF/logic_space_1.jpg In reference to the above diagram:
Starting with a given set of axioms (yellow), one can derive a specific set of formalised "provable truths" (green). This logic space of provable truth however is not all there is to the notion of truth.
Godel showed that there exists "unprovable truths" in all mathematical systems, which are valid and true, but which are not capable of being referenced by the foundational axioms. More recently Chaitin showed that there exists "random truths", which are valid and true, but which require no reference to any axioms. (purple)
Here is a recent (2000) transcript of a talk given by Chaitin on the relevant details of the history of these developments: http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html entitled Historical Introduction --- A Century of Controversy Over the Foundations of Mathematics
IMO it implies that the complete notion of whatever-it-is-that -is-truth cannot be encapsulated in any traditional mathematical language using the traditional axiomatic methodology everyone has been spoon fed the last few hundred years, and that another approach is required, in the long run. This includes algebra.
How does this apply to relational database theory and the Relational Model, and tables and row values? There will necessarily exist example truths such as those defined above that exist independent of the relational model, and which are not addressable by the model.
I believe that an example of this is:
The intelligence (ie: data) that is encoded in (application level) SQL code captured in RDBMS stored procedures exists right alongside the data, and the constraints, etc. While the RM and theory address the data and constraints, etc, the intelligence (which is data) of the application level processes cannot be formally addressed by it, even though it consists of valid SQL statements expressing manipulations of perfectly valid data objects known to the model and theory.
Pete Brown Falls Creek Oz
mAsterdam - 02 Jun 2004 14:50 GMT >>>>No, I think in this analogy Newton's model does correspond to a specific >>>>database design. The possibility that the relational theory itself is >>>>wrong corresponds to the possibility that algebra is wrong. >>> >>>Or incomplete, as has been formally demonstrated >>>at least 30 years prior to the emergence of the RM. [snip]
> ... There will > necessarily exist example truths such as those defined above > that exist independent of the relational model, and which are > not addressable by the model. Indeed. And you even go on looking for such truths. Chapeau.
> I believe that an example of this is: > [quoted text clipped - 6 lines] > statements expressing manipulations of perfectly valid data > objects known to the model and theory. Some of it may be capturable in the model by redefining the model - but this does not invalidate your statement. Here is another example: http://www.essentialstrategies.com/documents/brules.pdf
mountain man - 03 Jun 2004 04:20 GMT > >>>>No, I think in this analogy Newton's model does correspond to a specific > >>>>database design. The possibility that the relational theory itself is [quoted text clipped - 27 lines] > Here is another example: > http://www.essentialstrategies.com/documents/brules.pdf Looks like an interesting article. Many thanks for the reference.
Pete Brown Falls Creek Oz
Alfredo Novoa - 02 Jun 2004 15:08 GMT >Godel showed that there exists "unprovable truths" in all >mathematical systems, which are valid and true, but which >are not capable of being referenced by the foundational >axioms. More recently Chaitin showed that there exists >"random truths", which are valid and true, but which require >no reference to any axioms. (purple) This means that there are theorems you cannot prove by deriving them from the axioms.
>How does this apply to relational database theory and the >Relational Model, and tables and row values? It means that perhaps you can find unprovable relational theorems. Nothing less and nothing more.
>I believe that an example of this is: > [quoted text clipped - 6 lines] >statements expressing manipulations of perfectly valid data >objects known to the model and theory. No you don't understand anything. It does not have any relationship with what Godel said.
BTW have you read The Third Manifesto?
It has many pages devoted to the integration of The Relational Model with procedural programming (stored procedures). Just what you want to address.
Regards Alfredo
Paul - 03 Jun 2004 00:21 GMT > Godel showed that there exists "unprovable truths" in all > mathematical systems, which are valid and true, but which > are not capable of being referenced by the foundational > axioms. Not exactly, Godel's Incompleteness Theorem only applies to theories or systems that are above a certain complexity. See here for example: http://www.sm.luth.se/~torkel/eget/godel/complete.html
There are certainly complete theories, for example the theories of real numbers, of complex numbers, and of Euclidean geometry. In these theories there are no truths that cannot be proved within the system.
> IMO it implies that the complete notion of whatever-it-is-that > -is-truth cannot be encapsulated in any traditional mathematical > language using the traditional axiomatic methodology everyone > has been spoon fed the last few hundred years, and that another > approach is required, in the long run. This includes algebra. Are you sure? I can't find any definite links to a completeness result for algebra, but it's quite a simple system compared to the one for the whole of arithmetic, so I'd be surprised if it wasn't complete.
> How does this apply to relational database theory and the > Relational Model, and tables and row values? There will > necessarily exist example truths such as those defined above > that exist independent of the relational model, and which are > not addressable by the model. Check out this article on Codd's 1972 paper "Relational Completeness of Data Base Sublanguages": http://www.intelligententerprise.com/db_area/archives/1999/990501/online.jhtml
Unfortunately I can't find an online version of Codd's original paper, but he appears to prove that relational algebra is complete. Whether this is "completeness" used in exactly the same sense as Godel's Incompleteness Theorem I'm not quite sure though.
> I believe that an example of this is: > [quoted text clipped - 6 lines] > statements expressing manipulations of perfectly valid data > objects known to the model and theory. Do you have a simple concrete example of what you mean by this? What kind of stored procedures are you thinking of? Plain single SELECT statements? Or a series of INSERTs, UPDATEs and DELETEs that do some business process? In this latter case you've got procedural code and I think it should be possible to replace it with declarative code. It's difficult to talk about without an example though.
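As a stand-in for the missing example, here is a small sketch of the procedural-versus-declarative distinction, using Python's standard `sqlite3` module. The `orders` table, its columns and the discount rule are invented for illustration and are not taken from anyone's application in this thread. It shows one simple case where a row-by-row loop and a single set-based UPDATE give the same result; it does not show that every stored procedure can be rewritten this way.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, total INTEGER, discount INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO orders (id, total) VALUES (?, ?)",
        [(1, 50), (2, 150), (3, 300)],
    )
    return conn


def discount_procedural(conn):
    """Procedural style: fetch rows, decide in application code, update one by one."""
    rows = conn.execute("SELECT id, total FROM orders").fetchall()
    for order_id, total in rows:
        if total > 100:
            conn.execute(
                "UPDATE orders SET discount = ? WHERE id = ?",
                (total // 10, order_id),
            )


def discount_declarative(conn):
    """Declarative style: state the rule once and let the DBMS apply it."""
    conn.execute("UPDATE orders SET discount = total / 10 WHERE total > 100")


def snapshot(conn):
    return conn.execute(
        "SELECT id, total, discount FROM orders ORDER BY id"
    ).fetchall()


a, b = make_db(), make_db()
discount_procedural(a)
discount_declarative(b)
print(snapshot(a) == snapshot(b))  # True
```

Both versions use integer arithmetic, so `total // 10` in Python and `total / 10` on SQLite integers agree for these values.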
Paul.
mountain man - 03 Jun 2004 04:20 GMT > > Godel showed that there exists "unprovable truths" in all > > mathematical systems, which are valid and true, but which [quoted text clipped - 8 lines] > numbers, of complex numbers, and of Euclidean geometry. In these > theories there are no truths that cannot be proved within the system. I think you should double-check the above. Godel's incompleteness theorem was a statement in elementary number theory (arithmetic). Here is a reference: http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html
> > IMO it implies that the complete notion of whatever-it-is-that > > -is-truth cannot be encapsulated in any traditional mathematical [quoted text clipped - 5 lines] > for algebra, but it's quite a simple system compared to the one for the > whole of arithmetic, so I'd be surprised if it wasn't complete. See above.
> > How does this apply to relational database theory and the > > Relational Model, and tables and row values? There will [quoted text clipped - 4 lines] > Check out this article on Codd's 1972 paper "Relational Completeness of > Data Base Sublanguages": http://www.intelligententerprise.com/db_area/archives/1999/990501/online.jhtml
> Unfortunately I can't find an online version of Codd's original paper, > but he appears to prove that relational algebra is complete. Whether [quoted text clipped - 19 lines] > should be possible to replace it with declarative code. It's difficult > to talk about without an example though. Well here is an example that goes to the extreme: http://www.mountainman.com.au/software/southwind/
The entire (100% of) application software suite is in the form of stored procedures. No intelligence specific to the organization is stored external to the RDBMS software layer.
The Relational Model of the data needs expansion to be able to address this configuration of organizational intelligence. It remains mute to this intelligence.
Pete Brown Falls Creek Oz
Paul - 03 Jun 2004 16:41 GMT >> Not exactly, Godel's Incompleteness Theorem only applies to >> theories or systems that are above a certain complexity. See here [quoted text clipped - 9 lines] > Here is a reference: > http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html I'm fairly sure I'm right. The axioms of number theory or Peano arithmetic are more complicated than those for the theories of real numbers or Euclidean geometry. I think the important ones are probably the "inductive" ones (every number has a successor, and different numbers have different successors). Although it seems intuitively that the theory of real numbers must somehow be a superset of the Peano arithmetic, that's not the case. In the theory of real numbers, you don't have the concept of a successor function, they're all on a continuum and (maybe paradoxically) this makes it simpler.
Here is something about Tarski's completeness proof for Euclidean geometry: http://www.math.psu.edu/simpson/papers/philmath/node15.html
>> Do you have a simple concrete example of what you mean by this? >> What kind of stored procedures are you thinking of? Plain single [quoted text clipped - 6 lines] > Well here is an example that goes to the extreme: > http://www.mountainman.com.au/software/southwind/ I still don't quite see what the killer point is here. Is it that your application just has a single form to display all data? And the form knows what views to use by consulting a table? How is this different to the "Switchboard Manager" functionality that comes with Access for example?
> The entire (100% of) application software suite is in the form of > stored procedures. No intelligence specific to the organization is > stored external to the RDBMS software layer. Isn't this the same as what is done by any of the front-end GUIs you get with most DBMSs? For example SQL Server's Enterprise Manager? It's totally general, all the information it needs is in relations in the database. In theory you could just give every database user a copy of Enterprise Manager and it would suffice for any database, for any purpose. It just wouldn't be very user-friendly.
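A minimal sketch of the kind of table-driven front end being discussed, assuming SQLite via Python's standard sqlite3 module. The table and view names (app_menu, customer_names, customer_cities) are hypothetical and are not taken from the southwind software; the point is only that the generic "form" holds no application-specific knowledge and looks everything up in relations.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the data AND the menu metadata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
        INSERT INTO customers VALUES (1, 'Acme', 'Sydney'), (2, 'Bolt', 'Perth');
        CREATE VIEW customer_names AS SELECT id, name FROM customers ORDER BY id;
        CREATE VIEW customer_cities AS SELECT name, city FROM customers ORDER BY id;
        -- Hypothetical menu table: maps a menu label to the view that serves it.
        CREATE TABLE app_menu (label TEXT PRIMARY KEY, view_name TEXT NOT NULL);
        INSERT INTO app_menu VALUES ('Names', 'customer_names'),
                                    ('Cities', 'customer_cities');
    """)
    return conn


def show(conn, label):
    """Generic 'form': find the view for a menu label, return (columns, rows)."""
    row = conn.execute(
        "SELECT view_name FROM app_menu WHERE label = ?", (label,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no menu entry called {label!r}")
    view_name = row[0]
    # Only query names that really are views in this database.
    is_view = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'view' AND name = ?",
        (view_name,),
    ).fetchone()
    if is_view is None:
        raise LookupError(f"{view_name!r} is not a view")
    cur = conn.execute(f'SELECT * FROM "{view_name}"')
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


conn = build_demo_db()
columns, rows = show(conn, "Names")
print(columns)
for r in rows:
    print(r)
```

Adding a new screen in this sketch means inserting a row into app_menu and creating a view; the Python side does not change.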
I don't think that applications store any essential business knowledge, they are just there to make things easier for the user. If you like they are non-essential business knowledge. For example "Users of the accounting form 53(a) don't need to see the 'foobar' column". I'm assuming here that the foobar column isn't actually restricted for security purposes; it's just not useful for some particular task.
Paul.
Torkel Franzen - 04 Jun 2004 09:38 GMT > I think you should double-check the above. Godel's incompleteness > theorem was a statement in elementary number theory (arithmetic). Statements of arithmetic cannot be expressed in the language of the theory of the real field. This is because although the natural numbers are a subset of the real numbers, you cannot define "(the real number) x is a natural number" using only +,*,0,1,= and quantification over the real numbers.
Paul - 02 Jun 2004 09:37 GMT >>No, I think in this analogy Newton's model does correspond to a specific >>database design. The possibility that the relational theory itself is >>wrong corresponds to the possibility that algebra is wrong. > > Or incomplete, as has been formally demonstrated > at least 30 years prior to the emergence of the RM. Do you mean that algebra as in: http://en.wikipedia.org/wiki/Universal_algebra is incomplete?
I've never come across this, what does it mean, do you have a link?
A confusion here is that our analogy is comparing things from different levels. Algebra is usually thought of as a model, so it comes above logic in "reality". In our analogy, we are comparing it to logic, which is fine as it's only an analogy. But I don't think the analogy extends to having a completeness theorem for algebra.
Paul.
Tony - 02 Jun 2004 10:23 GMT > > No, I think in this analogy Newton's model does correspond to a specific > > database design. The possibility that the relational theory itself is > > wrong corresponds to the possibility that algebra is wrong. > > Or incomplete, as has been formally demonstrated > at least 30 years prior to the emergence of the RM. "Wrong" and "incomplete" are not synonyms. If algebra is correct but incomplete, then it is safe to use it right? There may be a question it can't answer, but there are no questions for which it can give the wrong answer.
Anthony W. Youngman - 04 Jun 2004 00:56 GMT >>> To go back to your favourite analogy (apologies everyone), it is >>> like saying that algebra was responsible for the shortcomings of [quoted text clipped - 18 lines] >database design. The possibility that the relational theory itself is >wrong corresponds to the possibility that algebra is wrong. Exactly.
And Newton's algebra is NOT wrong. It's just that the axioms (on which he based his algebra) don't match reality. And that cannot be proved from WITHIN the algebra.
So you cannot prove that relational theory is right or wrong from WITHIN the theory.
>>> Einstein didn't invent a better algebra, he designed a better model >>> using the SAME algebra - like a later designer designing a better [quoted text clipped - 12 lines] >(Newtonian mechanics, payroll database, relativistic mechanics). We've >got two levels of theories here. And we also have experiments to show that the axioms do (or don't) accurately describe reality. Einstein showed that Newton's axioms didn't describe reality, and replaced them by new axioms that did a better job. He didn't alter Newton's algebra at all - indeed, he used exactly the same algebra ...
Cheers, Wol
Paul - 04 Jun 2004 18:27 GMT >> No, I think in this analogy Newton's model does correspond to a >> specific database design. The possibility that the relational [quoted text clipped - 6 lines] > which he based his algebra) don't match reality. And that cannot be > proved from WITHIN the algebra. We have to be careful here with ambiguities of language. When I said the "possibility that algebra is wrong" I meant the theory of algebra itself, not some model that could be set up using the language of algebra.
Newton's algebra was not wrong, but what we mean here is the model that Newton made using algebra as the meta-language wasn't wrong. Newton didn't invent *an* algebra, he invented a model using algebra.
It's very confusing when you've got algebra as a kind of meta-language for Newton's theory. But then also you have logic as a kind of meta-language for algebra itself. In common speech we use the word "algebra" to mean both the theory of algebra itself, and pieces of code written in algebra.
> So you cannot prove that relational theory is right or wrong from > WITHIN the theory. That depends on what you mean by "right"(!). First-order logic is your meta-language for talking about your database. So you can show that it will be "complete" in the senses mentioned before, because you are kind of jumping "outside" the theory. What you can't do is show that it's the best methodology for managing data.
> And we also have experiments to show that the axioms do (or don't) > accurately describe reality. Einstein showed that Newton's axioms > didn't describe reality, and replaced them by new axioms that did a > better job. He didn't alter Newton's algebra at all - indeed, he used > exactly the same algebra ... OK, agreed.
Hmm, this stuff has the tendency to seem crystal-clear when you're writing it, but then turns to gobbledygook when you re-read it...
:) Paul.
Anthony W. Youngman - 28 May 2004 19:12 GMT >> > So if you use Newtonian Mechanics to prove where Mercury was 400 years >> > ago, your proof is more accurate than Tycho Brahe's observations - which [quoted text clipped - 38 lines] >to do that. So, the proof that various aspects of relational theory have >been good for use with DBMS's is not within mathematics. Thanks, Dawn. It was Laconic, I think, who gave that wonderful quote about "the axioms are self-evident, the logic is flawless, therefore the experiments must be wrong" :-)
That's why you and I tear our hair out when people say relational is best because it's based on mathematics! They're the people who assume the axioms are self-evident ...
And I think that's a major failing in current RDBMS thinking - nobody is questioning the axioms. Unfortunately, to me, they are self-evidently wrong...
Cheers, Wol
Anthony W. Youngman - 28 May 2004 19:07 GMT >> So by definition the theory is unscientific because you cannot show >>that the dbms proof is true (or false) in real life. > >Given that your axioms and your interpretation are correct, then I >think you can show the DBMS proof is true in real life (for the reasons >given above and in previous posts). What do you mean by interpretation? Do you mean the philosophy of data by which you convert your mathematical description to a real-world description?
>I know that the language used by logicians can seem very impenetrable >but I think it does actually make sense; it's not just a conspiracy of >people talking gibberish and pretending to understand each other. But logic is a branch of mathematics. As such, it has nothing to do with philosophy and the matching up of theory with reality. This is a matter of science and experiment, not logic.
>I don't know how much you've read about logic but it is very >mathematical and well worth the steep learning curve. Wikipedia is a >good place to start. Be warned though: logicians do have a tendency to >go insane in later life; it is a serious brainfuck if you think about >it too much! I can imagine :-)
Cheers, Wol
Paul - 30 May 2004 18:35 GMT >> Given that your axioms and your interpretation are correct, then I >> think you can show the DBMS proof is true in real life (for the [quoted text clipped - 3 lines] > by which you convert your mathematical description to a real-world > description? Yes.
>> I know that the language used by logicians can seem very impenetrable >> but I think it does actually make sense; it's not just a conspiracy of [quoted text clipped - 3 lines] > philosophy and the matching up of theory with reality. This is a matter > of science and experiment, not logic. I agree.
I don't think what I'm saying is that controversial really - it's really just what you would intuitively expect, expressed more rigorously in mathematical jargon. Tony explains it very well in another branch of this thread.
I'm not Pick-bashing either, what I'm saying is separate to relational theory. It's just that because the relational model is based so closely on first-order predicate logic, it's easier to see the connections.
I think what might be confusing is just terminology: the words "model" and "theory" can mean slightly different things in different contexts. From one point of view something could be regarded as a model, from another it could be regarded as a theory. Neither is wrong, it's just the context.
Paul.
Todd B - 25 May 2004 01:59 GMT > Todd: > > Does this pass the "reasonableness" test? The thought that: ...there are > questions that can't be answered so they're meaningless and, thus, ignored > (so the system is still complete) doesn't say much for consistency (i.e. > anything that shows inconsistency is ignored so we still have consistency). Yes, in a formal system, everything that fails to show consistency becomes invalid (and is ultimately ignored or denied due to the fact that it lacks basic logic - like: A proves B, and B proves C, but C contradicts A - that would be a very simple example of an informal system). At least that's the way I understand it.
> With postulates like these, I'm depressed about getting A's in college logic > and statistics classes, as they were obviously worthless. :-) > > Bill I wouldn't suggest that logic is 'worthless', or comment that any questions that cannot be answered are 'meaningless'; just that logic is not as complete as most logicians or mathematicians would like it to be.
Now Paul has a point about first order logic - and its completeness - that I would like to look into. I'm still betting that my interpretation of Godel's theorem is correct in the sense that 'Any formal system that is consistent is definitely _not_ complete'.
Is any formal system useful? That's a whole different argument to me, because any formal or informal system can be put to use. So, in a way, there is still hope for us logical (and illogical) people :)
I guess I'm ranting about this topic and I'm sure everyone in this group is hoping I'll shut up. So, before they tell me that, and before I step out of line (too late), since I don't have the impressive logic and math background that some of you have, I sum up Godel's Incompleteness theorem like so: "Within a formal system, there are things that are true within that system that you cannot prove/derive within that system". Whew.
Todd
mountain man - 25 May 2004 11:42 GMT Modern scholars of logic should be aware that whereas the ancient western philosophers and logicians were not aware of the limitations of the system of logic, their ancient eastern counterparts (cf: Indian logic) in fact were aware of its limitations.
And secondly, the following article might be of interest to those who have been thinking about the implications of the work of Godel, Turing and Chaitin in logic:
The article is a transcript of an address given by Chaitin: http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html
Best wishes,
Pete Brown Falls Creek Oz
> Todd: > [quoted text clipped - 18 lines] > > > they are meaningful in a "real-world" sense, because we are thinking in > > > a larger system which includes second-order logic.
Anthony W. Youngman - 25 May 2004 18:04 GMT >> I suppose at least we would know that in theory, every query that it is >> possible to formulate in some given relational query language can be [quoted text clipped - 7 lines] >requirement. Is it 'complete', though? I don't think so, but please >prove me wrong or point me to some articles that do. But that definition of "complete" only works if we can prove, scientifically, that the system accurately describes the reality.
If we cannot show that "the system" and "the reality" match up with each other (which we can't if we don't have a philosophical definition of "data" in "the reality") then it's impossible for "the system" to be complete ...
Cheers, Wol
mountain man - 22 May 2004 00:06 GMT ...[trim]...
> > As I have > > outlined, I have constructed an arrangement whereby all of E3 [quoted text clipped - 6 lines] > arbitrary languages, and executed anywhere. It seems you see their value in > their genericity, rather than in where they happen to execute. I see value in avoiding redundancies. All application code that relates to database I/O and is external to the RDBMS environment requires a redundant definition of the database schema. This is so because the entire system spans two software environments (E2 and E3).
Current technology looks at this as the status quo. The world is used to defining things in a union and conjunction of two separate software systems. However it is very inefficient.
When things can be defined using one software layer alone, the redefinitions referred to above no longer exist.
...[trim]...
Pete Brown Falls Creek Oz
Eric Kaun - 24 May 2004 14:22 GMT > I see value in avoiding redundancies. All application code that relates > to database I/O that is external to the RDBMS environment requires [quoted text clipped - 7 lines] > When things can be defined using one software layer alone, > the redefinitions referred to above no longer exist. I agree, though using some fairly simple techniques definitions can exist in one place, and be propagated to others. If an "object" could be defined in E2, for example, and automatically generate the appropriate changes on E3, how would that fit?
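A minimal sketch of the "define once, propagate" idea, assuming SQLite and Python's standard library. The schema is declared only in the database (the E2 side in the thread's terms) and the application-side record type (the E3 side) is generated from it at run time. The function name record_type_for, the type mapping, and the employee table are all hypothetical, chosen just for illustration.

```python
import sqlite3
from dataclasses import fields, make_dataclass

# Assumed mapping from declared SQLite column types to Python types.
SQLITE_TO_PY = {"INTEGER": int, "TEXT": str, "REAL": float}


def record_type_for(conn, table):
    """Build a dataclass whose fields mirror the columns of `table`."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    if not info:
        raise LookupError(f"no such table: {table!r}")
    spec = [
        (name, SQLITE_TO_PY.get(decl_type.upper(), object))
        for _cid, name, decl_type, *_rest in info
    ]
    return make_dataclass(table.title(), spec)


conn = sqlite3.connect(":memory:")
# The only place the schema is written down:
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ann', 1000.0)")

Employee = record_type_for(conn, "employee")
emp = Employee(*conn.execute("SELECT * FROM employee").fetchone())
print([f.name for f in fields(Employee)])
print(emp)
```

If a column is later added to the table, the generated type picks it up on the next run, so the definition is not maintained twice. The data itself is still copied out of the database into the application.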
I'm also curious why you suggest moving objects from E3 to E2 for reasons of efficiency and duplicate elimination, but don't include E1 in the mix. Certainly services in E1 are relevant to applications?
- Eric
mountain man - 25 May 2004 11:42 GMT > > I see value in avoiding redundancies. All application code that relates > > to database I/O that is external to the RDBMS environment requires [quoted text clipped - 12 lines] > E2, for example, and automatically generate the appropriate changes on E3, > how would that fit? It would still imply replication of existent data, or if you prefer, redundant I/O in that updates would need to be pushed up from the database (E2) into E3.
However, I reckon this would be a better "practice" to engineer, as its operation relies on maintenance of the database and avoids duplicate maintenance at the code level. It is a step closer to optimum running, for sure.
> I'm also curious why you suggest moving objects from E3 to E2 for reasons of > efficiency and duplicate elimination, but don't include E1 in the mix. > Certainly services in E1 are relevant to applications? First steps first. ;-)
Services in E1 are wheel-in, wheel-out redundant services. One organisation uses these in much the same way as the next. Their physical network and their user base may be different; however, IMO, the elements of code resident at E2 and E3 are the uniquely specifying elements for that "organization's intelligence".
The first step, moving objects from E3 to E2, will rid the (database application's) client environment of code.
In the reality of managing RDBMS production sites, this is the big mixmaster of complexity: the most expensive, the least responsive, the change-management headache, and the cause of the bulk of the problems.
Binding the results into E1 should fall out of the consolidation effort (E3 to E2) and represents the logical next step in the theory of database systems, imo.
Pete Brown Falls Creek Oz
Bill H - 15 May 2004 18:23 GMT Wol:
"Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message
> Okay. So what is "data". Because if we can't anchor that in the real > world, we have no way of knowing if, or how strongly, relational theory > is relevant (and usable) in the real world. Anything that can be reduced to an electrical impulse? :-)
Bill