Database Forum / General DB Topics / DB Theory / July 2004

In an RDBMS, what does "Data" mean?

Anthony W. Youngman - 14 May 2004 00:44 GMT
In relational theory, everyone seems to be talking about modelling
"data", but I've never seen an explanation of what "data" is. As far as
I can tell, C&D took this philosophical concept of "data", and then
built their relational theory on top of it. That's okay. We have a
(fairly) simple, consistent model. But what the heck IS data?

Okay. Let's explain where I'm coming from. You've seen me going on about
"evidence" and "science" etc etc. So I'm going to drag science into
this, Newtonian Mechanics, to be precise (of course).

Newton came up with these philosophical concepts called "mass",
"energy", "space" and "time". On these, he built his (fairly) simple
consistent model. And then Einstein came along and said he'd got his
fundamentals wrong - mass and energy were the same thing, and space and
time were the same thing. And because Newton didn't take account of the
fact that these things were interchangeable, his model didn't work when
compared to reality.

Okay. So what is "data". Because if we can't anchor that in the real
world, we have no way of knowing if, or how strongly, relational theory
is relevant (and usable) in the real world.

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Leandro Guimarães Faria Corsetti Dutra - 14 May 2004 02:29 GMT
> Newton came up with these philosophical concepts called "mass",
> "energy", "space" and "time". On these, he built his (fairly) simple
[quoted text clipped - 3 lines]
> fact that these things were interchangeable, his model didn't work
> when compared to reality.

    Nice story.  But irrelevant.

> Okay. So what is "data". Because if we can't anchor that in the real
> world, we have no way of knowing if, or how strongly, relational
> theory is relevant (and usable) in the real world.

    So you are suggesting Newton wasn't (and isn't) relevant in
the real world?  Or are you just trying to be smart?

    Now, it is a nice thing to be smart.  But remember it is not
every day that we face situations where Relativity is relevant and
usable in the real world... in everyday life Newtonian physics is quite
useful and, unless you are in some limiting case, relevant -- and much
simpler than The Real Thing.

Signature

Leandro Guimarães Faria Corsetti Dutra           +55 (11) 5685 2219
Av Sgto Geraldo Santana, 1100 6/71               +55 (11) 5686 9607
04.674-000  São Paulo, SP                                    BRASIL
http://br.geocities.com./lgcdutra/

x - 14 May 2004 13:46 GMT
"Leandro Guimaraes Faria Corsetti Dutra" <leandro@dutra.fastmail.fm> wrote

> > Newton came up with these philosophical concepts called "mass",
> > "energy", "space" and "time". On these, he built his (fairly) simple
[quoted text clipped - 18 lines]
> and unless you are in some limit situation relevant -- and much
> simpler than The Real Thing.

Anthony said because we work with data, we should know what data is.
He would want an answer to his question: "But what the heck IS data ?"
Of course this is a trivial question for you :-)
I remember Fabian Pascal started one of his seminars with several such
"trivial" questions.

Why you have not answered the question ?
Anthony W. Youngman - 15 May 2004 22:58 GMT
>> Now, it is a nice thing to be smart.  But remember it is not
>> everyday we face situations where Relativity is relevant and usable in
[quoted text clipped - 9 lines]
>
>Why you have not answered the question ?

Thanks, X.

I take it Leandro is parading his ignorance, rather than seeking
enlightenment.

But I'll try to enlighten him, anyway. We now know that mass as it
really is, and mass as it is defined in Newton's model, aren't quite the
same thing. Therefore, as Leandro says, we know that Newtonian
Mechanics, for the most part, works, and we also know where it doesn't
work.

But *I* don't know what "data" is "as it really is", and from the
answers I've got so far I don't think anybody else does. The best
definition so far is for data as it is defined in the relational model
(and that's pretty much the only proper definition anybody's tried to
give).

And if we haven't got a philosophical definition, we can't compare the
philosophical and theoretical definitions, and therefore we haven't got
a clue as to whether either "the relational model mostly works", or (and
this is important) where its limitations are and where it breaks down.

Cheers,
Wol

mAsterdam - 16 May 2004 01:10 GMT
...
>> Why you have not answered the question ?
...
> But *I* don't know what "data" is "as it really is", and from the
> answers I've got so far I don't think anybody else does. The best
[quoted text clipped - 6 lines]
> a clue as to whether either "the relational model mostly works", or (and
> this is important) where its limitations are and where it breaks down.

I won't answer the original question either (I'll just rephrase it),
but I will share some thoughts about just what "data" means.
Just a few associated concepts I have used to have some
grasp of this - a semantic network, if you will.
I have no sources or proofs, no famous
philosopher to refer you to.

The network roughly consists of: sign, media, shape and meaning.

We have signs. They serve to communicate.
Signs: A handshake, a hieroglyph, an ideogram (e.g. a Chinese
character), a phonogram (Roman, Arabic character), a facial expression,
a traffic light on red, an alarm - these are elementary, but
I would also include: the collected works
of <your favorite movie star>

In order to (just) exist all of these signs have media and shape;
their pure existence does *not* require human (or just active)
interpretation to assign meaning to them. Their function (purpose, i.e.
communication), however, *does* require some interpretation activity.

This combination of sign and meaning we call data.

To illustrate that this is not trivial:
Data (but not signs by themselves) can represent
other signs: I can write "The traffic light was red",
but they can also represent other data: "We stopped
because of the traffic light".

Aside: From here (sign and meaning) on "up" there is actually
a lot of philosophical work and practical research. Disciplines:
Semiotics, semiology and linguistics.
(Note: no computer needed)

Now, when we assign the same or similar meanings to bit patterns,
most of the time conveniently represented by the same shape
but evidently on another medium, we have computer data,
data for short.

Finally, the rephrase of your question:
How does the type of DBMS affect what we consider data?
mAsterdam - 16 May 2004 02:02 GMT
...
>> Why you have not answered the question ?
...
> But *I* don't know what "data" is "as it really is", and from the
> answers I've got so far I don't think anybody else does. The best
[quoted text clipped - 6 lines]
> a clue as to whether either "the relational model mostly works", or (and
> this is important) where its limitations are and where it breaks down.

I won't answer the original question either (I'll just rephrase it),
but I will share some thoughts about just what "data" means.
Just a few associated concepts I have used to have some
grasp of this - a semantic network, if you will.
I have no sources or proofs, no famous
philosopher to refer you to.

The network roughly consists of: sign, media, shape and meaning.

We have signs. They serve to communicate.
Signs: A handshake, a hieroglyph, an ideogram (e.g. a Chinese
character), a phonogram (Roman, Arabic character), a facial expression,
a traffic light on red, an alarm - these are elementary, but
I would also include: the collected works
of <your favorite movie star>

In order to (just) exist all of these signs have media and shape;
their pure existence does *not* require human (or just active)
interpretation. Their function (purpose, i.e.
communication), however, *does* require some
interpretation activity to assign meaning to them.

This combination of sign and meaning we call data.

To illustrate that this is not trivial:
Data (but not signs by themselves) can represent
other signs: I can write "The traffic light was red",
but they can also represent other data: "We stopped
because of the traffic light".

Aside: From here (sign and meaning) on "up" (towards
information, knowledge, insight, wisdom, action, ...)
there is actually a lot of philosophical work and practical research.
Disciplines:
Semiotics, semiology and linguistics.
(Note: no computer needed)

Now, when we assign the same or similar meanings to bit patterns,
most of the time conveniently represented by the same shape
but evidently on another medium, we have computer data,
data for short.

Finally, the rephrase of your question:
How does the type of DBMS affect what we consider data?
Anthony W. Youngman - 17 May 2004 14:37 GMT
>...
>>> Why you have not answered the question ?
[quoted text clipped - 15 lines]
>I have no sources or proofs, no famous
>philosofer to refer you to.

<major chomp>

>Aside: From here (sign and meaning) on "up" (towards
>information, knowledge, insight, wisdom, action, ...)
[quoted text clipped - 10 lines]
>Finally, the rephrase of your question:
>How does the type of DBMS affect what we consider data?

Okay. That's actually a very good insight ...

Now let's go back to "The Philosophy of Science" :-) and Newton :-) For
my first attempt at a Masters, practically the first thing we did was
"The philosophy of Science". And, helped by both students and a lecturer
who didn't have a clue (the student extrapolated a line from the origin,
through an asymptote, to a random position in number-space, and then
used this to ridicule the theory he didn't like. And the lecturer said
"good argument" !?!?!? )

I'm going to start saying "metaphysics" instead of philosophy here - I
think it's a subset of philosophy, and a better word to use, but as you
can see, I'm really getting into territory I don't understand ...

Anyway. Newtonian Mechanics is a self-contained, consistent,
mathematical theory. It relies on the concepts (call them "axioms") of
mass, energy, space, and time (and maybe more). We can define mass in
mathematical terms as "F=ma, where mass m is the constant property
describing the resistance of an object to a change in its velocity".
Likewise, space "is a co-ordinate system with distance measured in
metres along three mutually perpendicular axes". I won't attempt to
define energy or time ...

But just as those four concepts have neat, clean, mathematical
definitions they also have messy real world definitions. Mass can be
defined as "my god it's heavy", or "come on! PUSH!". Space is "where are
you?" or "I'm here, you're there".

Metaphysics is, I believe, the attempt to clarify both the real-world
definitions and the mathematical definitions, and to try to make sure
that they are describing the same thing. This is why, despite knowing
that Newtonian Mechanics is wrong, we find it so useful. We know the two
definitions don't match, we know WHERE they don't match, and we can
predict with certainty that where the discrepancy is minimal, Newtonian
Mechanics will give us a suitably accurate answer.
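
To put rough numbers on "where the discrepancy is minimal", here is a small sketch (not part of the original post) comparing Newtonian momentum m*v with the relativistic form gamma*m*v. The function names and the sample speeds are illustrative assumptions; only the standard Lorentz factor is used.

```python
import math

C = 299_792_458.0  # speed of light in m/s (exact SI value)

def lorentz_factor(v: float) -> float:
    """Return gamma = 1 / sqrt(1 - v^2/c^2) for a speed v in m/s (|v| < c)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

def newtonian_shortfall(v: float) -> float:
    """Fraction by which Newtonian momentum m*v falls short of gamma*m*v.

    (gamma*m*v - m*v) / (gamma*m*v) = 1 - 1/gamma = 1 - sqrt(1 - v^2/c^2)
    """
    return 1.0 - math.sqrt(1.0 - (v / C) ** 2)

# Illustrative speeds only: a car, a low-orbit satellite, half light speed.
for label, v in [("car", 30.0), ("satellite", 7.8e3), ("0.5 c", 0.5 * C)]:
    print(f"{label:>9}: gamma = {lorentz_factor(v):.12f}, "
          f"Newtonian shortfall = {newtonian_shortfall(v):.3e}")
```

At everyday speeds the shortfall is far below anything one could measure with a bathroom scale, while at half the speed of light it is over ten percent, which is the sense in which we know where the two definitions stop matching.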

Now I'm going to get into the difference between "relational theory" and
"relational database theory" :-) Another analogy coming up - Linux and
microkernels :-) Linus realised that all this research into "Microkernel
Operating Systems" was actually just as applicable to "Operating
Systems". I'm putting people's noses out of joint because, whether they
realise it or not, they believe in "relational database theory" (think
Tanenbaum saying he'd give Linus an F :-) And yet, I keep on saying Pick
data should be normalised! So I'm actually very pro relational theory
(just leave relational databases out of it! :-)

Now here comes the crunch. As I see it, in relational *database* theory,
the concept of "data" lies on this metaphysical boundary. And this is
why I view every relational database I've ever seen as a tangled mess of
spaghetti. What the hell is "data"! What's the real world equivalent?
Like any true mathematician :-) the relational database theorists seem
to be saying "metaphysics? that's not our problem. That's just an
implementation detail!". Except that, going back to Newton, the fact
that energy and mass are interchangeable and, as such, the equation
"F=ma where m is a constant" isn't true, isn't an "implementation
detail". Well, to God it may be, but it certainly isn't to us!

Going to another thread, where Lauri asked what were the advantages of
Pick, I'd say that one of them is a very clear metaphysical interface.
To compare Pick and Relational Database Theory ...

A Pick FILE is a real-world collective noun. What's a relational table?

A Pick RECORD is a real-world object. What's a relational row? A noun?
An adjective? A gerund? (relation, for those who don't know their
grammar)

A Pick FIELD is a real-world adjective. What's a relational column? An
adjective? A gerund?

Because Pick's metaphysical layer is at a higher level than Relational
Database Theory, we can then implement relational theory WITHIN our
model without having the nasty spaghetti of a vague and undefined
real-world interface. And I can righteously and reasonably throw my
hands up in horror and tear my hair out when presented with a Pick
database that hasn't been normalised :-)

So. Can anyone come up with a clear, simple, and NON-VAGUE definition of
what "data" means when specified in a real-world, not a mathematical,
context. Or come up with a perfectly good reason of why you don't have
to! (Basically, because you've done it somewhere else, because you've
got to do it somewhere!)

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
The society which scorns excellence in plumbing because plumbing is a humble
activity, and tolerates shoddiness in philosophy because it is an exalted
activity, will have neither good plumbing nor good philosophy. Neither its
pipes nor its theories will hold water. John W Gardner

Dawn M. Wolthuis - 17 May 2004 23:23 GMT
<snip>
> And yet, I keep on saying Pick
> data should be normalised! So I'm actually very pro relational theory
> (just leave relational databases out of it! :-)

This wasn't the crux of your post, Wol, but just a minor point:
relational theorists take all of the functional dependency normal forms and
state at the front of each that the data must FIRST be in FIRST NORMAL FORM,
and some would state that the definition of normalization requires that the
data be in 1NF.  So, while I accept normal forms that are based on
functional dependency logic, I'm fine with keeping a list of valid e-mail
addresses together during this process.  I don't want to put words in your
mouth, but when you are pro normalization, are you including 1NF in
that?  --dawn
Anthony W. Youngman - 18 May 2004 23:43 GMT
><snip>
>> And yet, I keep on saying Pick
[quoted text clipped - 10 lines]
>mouth, but when you are pro normalization, are you including 1NF in
>hat?  --dawn

As a tool of analysis, yes. For storing the data, no.

Why first normal? If data is normalised, there is no redundancy. Like so
many things relational, First Normal Form seems to be a case of carrying
things to unnecessary extremes.

It's incredibly easy to transform other normal forms to first normal.
It's not easy to go the other way (assuming you wish to reconstruct a
real-world object, that is). So NFNF is functionally equivalent to FNF,
but the reverse is not true.
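
A minimal sketch of the asymmetry being claimed here, using the e-mail list example from earlier in the thread. The record layout, field names, and the `unnest`/`nest` helpers are hypothetical illustrations, not Pick or SQL syntax, and the sketch does not settle whether the two forms are equivalent in general.

```python
# Hypothetical nested (non-first-normal-form) records: one object, many e-mails.
nested = [
    {"id": 1, "name": "Ann", "emails": ["ann@example.org", "a@example.org"]},
    {"id": 2, "name": "Bob", "emails": ["bob@example.org"]},
]

def unnest(records):
    """Flatten to 1NF: one row per (record, e-mail) pair. Purely mechanical."""
    return [
        {"id": r["id"], "name": r["name"], "email": e}
        for r in records
        for e in r["emails"]
    ]

def nest(rows):
    """Rebuild nested records from flat rows.

    This direction needs outside knowledge that the flat rows do not carry:
    that "id" identifies the real-world object and that "email" was the
    repeating group.
    """
    grouped = {}
    for row in rows:
        rec = grouped.setdefault(
            row["id"], {"id": row["id"], "name": row["name"], "emails": []}
        )
        rec["emails"].append(row["email"])
    return list(grouped.values())

flat = unnest(nested)
print(flat)
print(nest(flat) == nested)
```

Two caveats on the sketch: a record with an empty e-mail list disappears entirely when unnested, and the round trip only preserves e-mail order because Python lists are ordered, which a relational table is not.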

Cheers,
Wol

x - 18 May 2004 10:28 GMT
> >...
> >>> Why you have not answered the question ?
[quoted text clipped - 116 lines]
> to! (Basically, because you've done it somewhere else, because you've
> got to do it somewhere!)

I have read somewhere that in addition to mass, energy, space, and time
there is also information.
I'm not an expert, but I've heard that DNA is an example of this.
I've read that "information" shows up in systems with cycles.
By accident, I found this:  http://www.bkent.net/Doc/darxrp.htm
Chris Hoess - 19 May 2004 07:55 GMT
> I've read that "information" show up in systems with cycles.
> By accident, I've found this  http://www.bkent.net/Doc/darxrp.htm

I probably don't do the book justice just from a quick skim of the extract,
but I felt compelled to comment on one point of the extract. The author
claims, quite reasonably, that data models are artificial constructs and can
never completely represent the true nature of information, and goes on to
provide various philosophical examples of recategorization. While this will
doubtless stimulate discussion from many here, I think it may be a red
herring from a purely database perspective, in that these categories already
exist, to some degree, in the way information is handled. Databases don't
exist in vacuo; they're fed (and consulted) by users who would have some
system of mental categorization even if they were shuffling everything
around with paper and pencil. So while it may be philosophically
interesting, the questions raised may not impinge directly on
databases--except that we must recognize that the organization of data
within a database can and will change with circumstances, and the database
should provide facilities for changing this structure with minimum
inconvenience.

Signature

Chris Hoess

mAsterdam - 19 May 2004 00:20 GMT
<major chomp>
>> How does the type of DBMS affect what we consider data?
>>
[quoted text clipped - 60 lines]
>
> A Pick FILE is a real-world collective noun. What's a relational table?

1. A contradictio in terminis.
2. A collection of similarly shaped utterances.

> A Pick RECORD is a real-world object. What's a relational row? A noun?
> An adjective? A gerund? (relation, for those who don't know their grammar)

1. A contradictio in terminis.
2. One utterance.

> A Pick FIELD is a real-world adjective. What's a relational column? An
> adjective? A gerund?

Mu.

You compare
P.FILE to S.TABLE,
P.RECORD to S.ROW and
P.FIELD to S.COLUMN.
What do we learn from this comparison? Nothing.
These terms are all taken out of the context where
they have meaning. One may just as well choose to
compare
P.FILE to S.SCHEMA,
P.RECORD to S.VIEW and
P.FIELD to S.TABLE
- it doesn't mean anything.
It is out of context.

> Because Pick's metaphysical layer is at a higher level  

That depends on which terms you compare from one realm to which other
terms from the other. It's your pick. (sorry :-)

> than Relational
> Database Theory, we can then implement relational theory WITHIN our
> model without having the nasty spaghetti of a vague and undefined
> real-world interface. And I can righteously and reasonably throw my
> hands up in horror and tear my hair out when presented with a Pick
> database that hasn't been normalised :-)

Yup. That even goes for very old fixed record batch processing.

> So. Can anyone come up with a clear, simple, and NON-VAGUE definition of
> what "data" means when specified in a real-world, not a mathematical,
> context. Or come up with a perfectly good reason of why you don't have
> to! (Basically, because you've done it somewhere else, because you've
> got to do it somewhere!)

Yup. It seems most people prefer to have that done
implicitly or at least by someone else.
n++k - 28 May 2004 10:14 GMT
> A Pick FILE is a real-world collective noun. What's a relational table?

A sentence that has not yet been uttered, because it relates "unknown values."

> A Pick RECORD is a real-world object. What's a relational row? A noun?
> An adjective? A gerund? (relation, for those who don't know their
> grammar)

A statement of fact, as an utterance of the "meta" sentence described above.

> A Pick FIELD is a real-world adjective. What's a relational column? An
> adjective? A gerund?

any piece of utterable information.
Karel Miklav - 18 May 2004 12:10 GMT
> ...
>>> Why you have not answered the question ?
[quoted text clipped - 4 lines]
>> or (and this is important) where its limitations are and where it
>> breaks down.

It mostly works, but we have some clues where it breaks too: metadata,
use patterns...

> I won't answer the original question either (I'll just rephrase it),
> but I will share some thoughts about just what "data" means.
[quoted text clipped - 11 lines]
> I would also include: the collected works
> of <your favorite moviestar>

I think our aim is to model reality and entertain users by creating nice
illusions or giving them competitive advantage by reducing entropy in
their work environment or by predicting the future.

There are many realities, but let me mention two; the reality of the
current IT with implemented infrastructure and the worldview of a modern
intellectual. Our interpretation of what's implemented in our (heads) is
what we try to model in our toys. And from what we have learnt, this is
nothing like the mechanical wheels of a watch, nor a computer's random
access memory,
and not even the relational database. The problem is in compressing the
representation of data and easing the recall of that data. Here it
becomes useful to know what data is, but for the current state of the
art that has unfortunately already been settled.

> In order to (just) exist all of these signs have media and shape,
> their pure existence does *not* require human (or just active)
> interpretation. Their function (purpose, ie
> communication), however *does* require some
> interpretation activity to assign meaning to them.

That's what you think and if I'm ever your customer, you won't model it
that way :) Seriously, I don't believe in _pure_ sh.ts or that anything
exists without being observed/interpreted, but I'll not go deeper as it
may look like off-topic religion bashing.

> This combination of sign and meaning we call data.

I'd say fixation of this on a medium is called data, because otherwise you
can't recall it later. And there is a very important thing that folks
miss: if you vanish and nobody knows the way you fixed that data, there's
just (series of ones and zeros) without meaning. Thus a fixation can't
be generally called data without known way to interpret it.
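
The point about a fixation being meaningless without a known way to interpret it can be sketched with Python's standard `struct` module. The four byte values below are an arbitrary assumption chosen for illustration; the same bytes yield different "data" under each reading.

```python
import struct

# Four bytes "fixed on a medium". Nothing in the bytes says how to read them.
raw = bytes([0x42, 0x28, 0x00, 0x00])

as_big_float = struct.unpack(">f", raw)[0]   # IEEE 754 single, big-endian -> 42.0
as_big_int = struct.unpack(">I", raw)[0]     # unsigned 32-bit, big-endian
as_little_int = struct.unpack("<I", raw)[0]  # unsigned 32-bit, little-endian
as_text = raw.decode("latin-1")              # one character per byte

for label, value in [
    ("big-endian float", as_big_float),
    ("big-endian uint32", as_big_int),
    ("little-endian uint32", as_little_int),
    ("latin-1 text", repr(as_text)),
]:
    print(f"{label:>22}: {value}")
```

The format strings play the role of the "known way to interpret it": lose them and only the series of ones and zeros remains.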

Regards,
Karel Miklav
mAsterdam - 18 May 2004 23:32 GMT
>> ...
>>>> Why you have not answered the question ?
[quoted text clipped - 7 lines]
> It mostly works, but we have some clues where it breaks too: metadata,
> use patterns...
[snip]
>> The network roughly consists of: sign, media, shape and meaning.
>>
[quoted text clipped - 8 lines]
> illusions or giving them competitive advantage by reducing entropy in
> their work environment or by predicting the future.

The modeling of *what* of reality? Surely not all of it.

> There are many realities, but let me mention two; the reality of the
> current IT with implemented infrastructure and the worldview of a modern
[quoted text clipped - 3 lines]
> and not even the relational database. The problem is in compressing the
> representation of data and easing the recall of that data.  

Here you are speaking of data already gathered, right?

> Here it
> becomes useful to know what data is, but for the current
> state of the art that has unfortunately already been settled.

Settled? I don't think the understanding of what we now call data
has grown beyond the metaphor level yet (unlike for instance
our understanding of 'number' or 'motion').

>> In order to (just) exist all of these signs have media and shape,
>> their pure existence does *not* require human (or just active)
[quoted text clipped - 6 lines]
> exists without being observed/interpreted, but I'll not go deeper as it
> may look like off-topic religion bashing.

Watch out, cats! :-)

>> This combination of sign and meaning we call data.
>
[quoted text clipped - 3 lines]
> just (series of ones and zeros) without meaning. Thus a fixation can't
> be generally called data without known way to interpret it.

Although this suggests you have a way around Schrödinger's cat
without reverting to 'purity' or 'essence' etc...
(and I don't) we do agree on that. Do you have an idea
*why* folks miss this?
Karel Miklav - 19 May 2004 07:14 GMT
>> I think our aim is to model reality and entertain users by creating
>> nice illusions or giving them competitive advantage by reducing
>> entropy in their work environment or by predicting the future.
>
> The modeling of *what* of reality? Surely not all of it.

As little as possible to solve the case, I don't see the problem here.

>> Here it becomes useful to know what data is, but for the current state
>> of the art that has unfortunately already been settled.
>
> Settled? I don't think the understanding of what we now call data
> we has grown beyond the metaphore level yet (unlike for instance
> our understanding of 'number' or 'motion').

Computers can mostly only work with data that's captured as a sequence
of bits. 17th century philosophers made the model, 20th century computer
scientist implemented it and I don't see how you could escape that now.
And most people here have clients with limited resources and strong
competition and there's very, very little margin for experimentation.

>>> This combination of sign and meaning we call data.
>>
[quoted text clipped - 9 lines]
> (and I don't) we do agree on that. Do you have an idea
> *why* folks miss this?

We were taught that way, now we're trying to adapt to the world as we
see it.

Regards,
Karel Miklav
Leandro Guimaraens Faria Corsetti Dutra - 17 May 2004 16:23 GMT
> I take it Leandro is parading his ignorance, rather than seeking
> enlightenment.

    If you had any to offer...

> But *I* don't know what "data" is "as it really is", and from the
> answers I've got so far I don't think anybody else does.

    As far as I remember my Philosophy, that's where English
Objectivists -- that's not their real name, I forget it -- went wrong.
They wanted to start from data, and couldn't define that.

    That's one reason for my not answering the original
question -- there is no answer, other than the trivial -- and useless --
ones already given.  The other reason: it's irrelevant to our discussions
here.

> The best definition so far is for data as it is defined in the
> relational model (and that's pretty much the only proper definition
> anybody's tried to give).

    Which definition, in which version of whose version of it?

> And if we haven't got a philosophical definition, we can't compare the
> philosophical and theoretical definitions, and therefore we haven't got
> a clue as to whether either "the relational model mostly works", or (and
> this is important) where its limitations are and where it breaks down.

    It would be more interesting to compare not to a non-existing,
non-achievable philosophical definition, but to misunderstanding. Like the
differentiation of data and metadata.


x - 14 May 2004 08:39 GMT
> Okay. So what is "data". Because if we can't anchor that in the real
> world, we have no way of knowing if, or how strongly, relational theory
> is relevant (and usable) in the real world.

Data:
----------
1. facts
2. encoded information
Dawn M. Wolthuis - 14 May 2004 14:02 GMT
> > Okay. So what is "data". Because if we can't anchor that in the real
> > world, we have no way of knowing if, or how strongly, relational theory
[quoted text clipped - 4 lines]
> 1. facts
> 2. encoded information

I'd vote for adding this nice short, crisp definition of data to our
glossary.  --dawn
x - 14 May 2004 15:19 GMT
> > > Okay. So what is "data". Because if we can't anchor that in the real
> > > world, we have no way of knowing if, or how strongly, relational theory
[quoted text clipped - 7 lines]
> I'd vote for adding this nice short, crisp definition of data to our
> glossary.  --dawn

Oops. I forgot one archaic meaning:  FATE   :-)
mAsterdam - 14 May 2004 21:58 GMT
>>>Data:
>>>----------
[quoted text clipped - 5 lines]
>
> Oops. I forgot one archaic meaning:  FATE   :-)

And the plural of datum (eng: date)?

So it should be:
      [Data]
      0. fate
      1. facts
      2. encoded information
      3. dates

- except I think it doesn't help at all.
Maybe this is how the metadata modellers got to 900.

:-)
Anthony W. Youngman - 14 May 2004 19:54 GMT
>> > Okay. So what is "data". Because if we can't anchor that in the real
>> > world, we have no way of knowing if, or how strongly, relational theory
[quoted text clipped - 7 lines]
>I'd vote for adding this nice short, crisp definition of data to our
>glossary.  --dawn

It is nice and crisp. But (see my other post) if "data" is the
philosophical gateway linking the real world and database theory, then
it's far too simplistic.

Cheers,
Wol

Mike Nicewarner - 14 May 2004 18:12 GMT
I agree that Data is defined as facts and that the facts could be encoded in
some way.  However, information is simply defined as data in context.  For
instance, a value of data could be 12.  12 by itself is data, but it lacks
meaning until you put it in context to say it is a specific baby's weight at
1 year, taken at the doctor's office on a specific date.  Then, the data in
the context becomes information that can be used.    Much of the data in a
database is in a very limited and incomplete context, and is incorrectly
called information, because of business assumptions about the missing
context.
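
As a rough sketch of "information is data in context", compare a bare value with the same value carrying its context. The `Measurement` class, its field names, the unit, and the sample date are all assumptions made up for illustration; the post gives only the number 12 and the kind of context it needs.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Measurement:
    """A value together with the context that makes it usable (hypothetical)."""
    value: float
    unit: str        # assumed; the post does not name a unit
    quantity: str
    subject: str
    taken_on: date   # assumed sample date
    taken_at: str

def describe(m: Measurement) -> str:
    """Render the measurement as a sentence a person could act on."""
    return (f"{m.subject}: {m.quantity} of {m.value} {m.unit}, "
            f"taken at the {m.taken_at} on {m.taken_on.isoformat()}")

bare = 12  # data: a value with no stated meaning

in_context = Measurement(
    value=12,
    unit="kg",
    quantity="weight",
    subject="a specific baby at 1 year",
    taken_on=date(2004, 5, 14),
    taken_at="doctor's office",
)

print(bare)
print(describe(in_context))
```

A database column often stores something closer to `bare` than to `in_context`, with the rest of the context living in column names, documentation, or people's heads.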

My 2.5 cents.  :-)

Signature

Mike Nicewarner [TeamSybase]
http://www.datamodel.org
mike@nospam!datamodel.org
Sybase product enhancement requests:
http://www.isug.com/cgi-bin/ISUG2/submit_enhancement

> In relational theory, everyone seems to be talking about modelling
> "data", but I've never seen an explanation of what "data" is. As far as
[quoted text clipped - 20 lines]
> Cheers,
> Wol
Mike Preece - 04 Jun 2004 03:39 GMT
Sorry for the delayed response.

> I agree that Data is defined as facts and that the facts could be encoded in
> some way.  However, information is simply defined as data in context.

Context. Important.

> For
> instance, a value of data could be 12.  12 by itself is data, but it lacks
[quoted text clipped - 4 lines]
> called information, because of business assumptions about the missing
> context.

I'm thinking back to a previous thread in this ng, where the point was
made that relationships between data can be implied by their physical
proximity in a Pick database. It makes good sense logically to physically
store data in context. Never mind Codd's wallop.

Mike.

> My 2.5 cents.  :-)
Dawn M. Wolthuis - 04 Jun 2004 03:46 GMT
> Sorry for the delayed response.

<snip>
> I'm thinking back to a previous thread in this ng, which noted that
> relationships between data can be implied by their physical proximity
> in a Pick database. It makes good sense logically to physically store
> data in context. Never mind Codd's wallop.

And popping up one level on that, since I let others care about the physical
storage, it makes sense to logically model data in context as well.
Cheers!  --dawn.
Alan - 14 May 2004 18:29 GMT
From "Fundamentals of Database Systems", Elmasri & Navathe [some direct
quote, some rephrased for brevity] :

Data: Known facts that can be recorded and have implicit meaning. [direct
quote]

Database: A logically coherent collection of related real-world data
assembled for a specific purpose. [rephrased]

See? It's not all that complicated. You are applying way too much GRAVITY to
your question.

> In relational theory, everyone seems to be talking about modelling
> "data", but I've never seen an explanation of what "data" is. As far as
[quoted text clipped - 20 lines]
> Cheers,
> Wol
Anthony W. Youngman - 14 May 2004 19:53 GMT
In message <2gkdtnF3saspU1@uni-berlin.de>, Alan <alan@erols.com> writes
>From "Fundamentals of Database Systems", Elmasri & Navathe [some direct
>quote, some rephrased for brevity] :
>
>Data: Known facts that can be recorded and have implicit meaning. [direct
>quote]

Nice quote. But I'm being philosophical here. Mass, Energy, and Time are
all (from Newton's standpoint) simple, immutable things. Space is as
well, although it's slightly different, because it's three orthogonal
instances of length.

By these standards, "data" is woefully vague and undefined. And it's not
even atomic! Within the theory it's chopped up into tuples, which are
themselves chopped up into (I'm not into terminology here) keys,
attributes, relations, and probably other stuff besides.

>Database: A logically coherent collection of related real-world data
>assembled for a specific purpose. [rephrased]

Given that "data" is so vague, how do we know it's related to the real
world?

>See? It's not all that complicated. You are applying way too much GRAVITY to
>your question.
>
:-) But I'm looking for the TOE of data.

We know Newton got it wrong. Energy and mass are the same thing. Time is
merely a fourth dimension of space. But at least Newton had his
philosophical anchors to the real world firmly in place, even if he knew
something was wrong.

"data" is not an anchor. It's a formless cloud. One fact may be "object
X exists". Another may be "Person A is the mother of Person B". And
again, "object Z is blue". Each of those is a different *type* of fact,
a different "immutable object". And RDBMS theory lumps them all together
in the amorphous philosophical concept of data, and then dismantles them
inside the theory, despite the fact that they can't be dismantled in the
real world.

Just as we couldn't combine mass and energy and move them inside the
theory until we realised that they were interchangeable - e=mc^2 - so we
can't move "data" inside relational theory and deal with it there unless
we have a rule that can transform one type of data into another. And
until we have that rule, we need to treat the different types of data as
external to the theory, and have a one-2-one mapping of those with
reality.

Cheers,
Wol

>> In relational theory, everyone seems to be talking about modelling
>> "data", but I've never seen an explanation of what "data" is. As far as
[quoted text clipped - 28 lines]
>> as Lies-to-People.
>> The Science of Discworld : (c) Terry Pratchett 1999

Alan - 14 May 2004 20:47 GMT
Not everything can be expressed in a formula. Data is an example. You are
also confused about the way data is "chopped up". It isn't. Data has
different characteristics depending on its use and the point of view from
which one views it (very much like relativity). Let me try to express this
in a way that may be satisfactory to you. It is necessary to do a top-down
explanation, I think.

Given a minimum of third normal form:

An instantiated database contains stored information about a miniworld
(domain, if you like).

Information is represented by one or more tuples in one or more tables that
may or may not be joined to one or more other tables, or itself (a table).

Tables consist of tuples (rows). A tuple represents a piece of complete,
bounded, and finite information about the primary key.

The primary key is a piece of discrete data, as is each attribute (column)
in the tuple (row).

Under ideal circumstances, the primary key is a piece of meaningful data
from the miniworld (sometimes it is necessary to create an artificial key,
but this is still a piece of discrete data). Sometimes the primary key is
made up of several attributes (composite key), but this is consistent for
each tuple. Although each attribute represents a discrete piece of data,
when combined into a composite key, the composite key is also a discrete
piece of data, but now contains more information. In chemistry, this would
be a "compound" made from several "elements".

Because the Primary Key is unique, and all attributes in a tuple are about
that key and nothing but the key, each tuple is complete, bounded, and
finite. If the tuple is complete, bounded, and finite, then each element of
the tuple (the attributes) must also be complete, bounded, and finite. The
attributes themselves do not contain "information" until the tuple is
realized.

Knowledge is realized by the examination of all of the information.

So, we have

data is contained in attributes
information is contained in tuples (which are made of attributes)
information is also contained in tables, which are really just many tuples
knowledge is contained in a database

It's not confusing at all if you don't want it to be.
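
One way to picture the layering above, as a minimal sketch in plain Python rather than in any real DBMS. The `Employee` and `Department` types, their fields, and the sample rows are hypothetical and exist only to illustrate the hierarchy.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    """One tuple (row); emp_id plays the role of the primary key."""
    emp_id: int
    name: str
    dept_id: int

class Department(NamedTuple):
    """One tuple (row); dept_id is the primary key."""
    dept_id: int
    name: str

# A table is a collection of tuples, indexed by primary key so keys stay unique.
employees = {e.emp_id: e for e in [
    Employee(1, "Ada", 10),
    Employee(2, "Grace", 20),
]}
departments = {d.dept_id: d for d in [
    Department(10, "Research"),
    Department(20, "Engineering"),
]}

# The database is the collection of tables describing the miniworld.
database = {"employee": employees, "department": departments}

def department_of(emp_id: int) -> str:
    """Join the two tables to answer a question neither answers alone."""
    emp = database["employee"][emp_id]
    return database["department"][emp.dept_id].name

print(department_of(2))  # Engineering
```

Each attribute value on its own (10, "Ada") is just data; the row tied to its key is the unit that says something, and the join is where the tables together say more than either does alone.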

"Relation" is a term from logical modeling and can be thought of as a
"superclass" term that encompasses "entities" and "relationships" and has no
place in this argument.

BTW, Time is not the 4th dimension of space. Space is expressed in three
dimensions. Time is another dimension for sure, but of something larger that
we can't yet identify. For now, we can say that space and time are
dimensions of the universe. Space is measured by three dimensions. The
universe is measured by the three dimensions of space plus the dimension of
time. Of course, there may be more dimensions.

> In message <2gkdtnF3saspU1@uni-berlin.de>, Alan <alan@erols.com> writes
> >From "Fundamentals of Database Systems", Elmasri & Navathe [some direct
[quoted text clipped - 80 lines]
> >> as Lies-to-People.
> >> The Science of Discworld : (c) Terry Pratchett 1999
Laconic2 - 14 May 2004 21:26 GMT
If we are going to bring Newton into this forum, again,  let's go back to
the data. And let's see if we can get it  right, this time.

Tycho Brahe made years worth of very careful meticulous observations as to
the positions of the planets,  at observed points in time.  That's data.

Johannes Kepler studied Brahe's observations for years,  and discovered that
the orbits of the planets were elliptical,  with one focus at the sun.  He
also discovered the "equal areas in equal times"  rule for how fast they are
moving.  That's analysis.

What Newton added were the laws of motion,  and the law of gravitation.
That's physics.

All this talk about how "Newton got it wrong,  and Einstein got it right"
is a bunch of claptrap.  The people in this forum, for the most part, don't
know what they are talking about.

There are internal problems,  at the cosmological level,  with Newton's view
of the universe.  But that's not what led Einstein to push the envelope
further.  Physics was in crisis in the 19th century,  due to results like
the Michelson-Morley experiment.  That's more data.

It's data that Einstein had and Newton did not.
Tony - 15 May 2004 15:22 GMT
> If we are going to bring Newton into this forum, again,  let's go back to
> the data. And let's see if we can get it  right, this time.
[quoted text clipped - 13 lines]
> is a bunch of claptrap.  The people in this forum, for the most part, don't
> know what they are talking about.

True.  For the most part our expertise, if any, is in databases not
physics.  But some people just can't help bringing their secondary
school-level knowledge of physics into every topic for some reason
(not that I'm claiming to have any more than that myself).  It is very
tiresome.
Anthony W. Youngman - 16 May 2004 00:43 GMT
>> All this talk about how "Newton got it wrong,  and Einstein got it right"
>> is a bunch of claptrap.  The people in this forum, for the most part, don't
[quoted text clipped - 5 lines]
>(not that I'm claiming to have any more than that myself).  It is very
>tiresome.

And some of us like bringing our 3rd-year undergrad Physics knowledge
(from a top-5 Uni) into it, too :-)

It's just that I find Newtonian mechanics an excellent analogy. To
express it in computerese, both Newtonian Mechanics and Relational
Theory are instances of the class Mathematical_Theory. BOTH are
mathematically perfect (well, I know Newtonian Mechanics is).

I just find it fascinating that, while we know that Newtonian Mechanics
doesn't belong in the set Accurately_Matches_The_Real_World, so many
people here (on the grounds of its mathematical correctness) seem to
believe that relational theory does. That argument just doesn't make
sense to me.

Cheers,
Wol

Alfredo Novoa - 16 May 2004 12:45 GMT
>It's just that I find Newtonian mechanics an excellent analogy. To
>express it in computerese, both Newtonian Mechanics and Relational
>Theory are instances of the class Mathematical_Theory.

You never learn. Newtonian Mechanics is not a mathematical theory; it is
physics. It is derived from observation of phenomena in the physical
world.

>I just find it fascinating that, while we know that Newtonian Mechanics
>doesn't belong in the set Accurately_Matches_The_Real_World

False. Newtonian Mechanics matches the physical world very accurately
in many circumstances and in almost every practical circumstance. They
are extremely useful.

If we compare The Relational Model with Newtonian Mechanics, then the
Pick approach should be compared with troglodyte superstition.
Bill H - 16 May 2004 17:54 GMT
"Alfredo Novoa" <alfredo@ncs.es> wrote in message

> ...Newtonian Mechanics matches the physical world very accurately
> in many circumstances and in almost every practical circumstance. They
> are extremely useful.
>
> If we compare The Relational Model with Newtonian Mechanics, then the
> Pick approach should be compared with troglodyte superstition.

I'm surprised at seeing such a miscomparison.  Maybe I shouldn't be.  :-)

Bill
Leandro Guimaraens Faria Corsetti Dutra - 20 May 2004 05:22 GMT
> the
>> Pick approach should be compared with troglodyte superstition.
>
> I'm surprised at seeing such a miscomparison.

    Why do you consider it a miscomparison?

Signature

Leandro Guimarães Faria Corsetti Dutra           +55 (11) 5685 2219
Av Sgto Geraldo Santana, 1100 6/71               +55 (11) 5686 9607
04.674-000  São Paulo, SP                                    BRASIL
http://br.geocities.com./lgcdutra/

Dawn M. Wolthuis - 16 May 2004 13:02 GMT
> >> All this talk about how "Newton got it wrong,  and Einstein got it right"
> >> is a bunch of claptrap.  The people in this forum, for the most part, don't
[quoted text clipped - 19 lines]
> believe that relational theory does. That argument just doesn't make
> sense to me.

While I have no knowledge related to Newtonian Mechanics, I can agree with
your comparison when it comes to applying Mathematical theories.  There are
folks who think that Mathematics, like science, is a discipline of
discovery.  Others, like me, believe it to be a creative act -- our use of
the logic in our brains to propose axioms and then draw logical conclusions
from those.  We create Mathematics, sometimes in order to address the real
world (counting sheep, for example) and sometimes without such a trigger in
nature.  Mathematical errors can be found by proving new theorems or showing
where previous proofs were incorrect.  There is no need to talk about
anything in the real world in order to talk about such Mathematics.  Folks
on this list who want to discuss "relational theory" as strictly a
Mathematical theory are correct in suggesting that my questions, pretty much
all of them, are outside of the scope of such a theory and would, therefore,
be unwelcoming of such in this forum.

If we have such a mathematical theory we can "apply it".  That act is a
scientific one and one that can easily be done poorly.  The application of
Mathematics is like the application of a metaphor (I know, I know, I've said
that many times before) where the Mathematics will fit some aspects of our
target domain and possibly not fit others.  While it might lay down
perfectly on top of its target application, it is likely there will be many
areas physically related to the domain for which the Mathematical theory is
irrelevant.  For example, with the counting of sheep, we can apply the set
of Integers with some basic arithmetic functions and we can get the
counting right.  But that will not tell us what to do if one sheep is
missing.  Such a question would be orthogonal to the "Counting Theory" that
so many shepherds are into.  A shepherd who is immersed only in such a theory
could lose their entire flock while sticking to the truth of their theory,
convinced that if they only study it more and learn more about it, they will
solve this problem too.  That is why "sheep herding theory" is not the same
as "counting theory".

My interest is in helping Little Bo Peep as well as the owner of those
sheep.  I'm curious about why when she took a course in college about
shepherding, most of the time was spent talking about counting them, which
didn't actually help her much when she got to the "real world".  That is why
I do not feel guilty about bringing up issues about databases in a database
theory newsgroup.  If this were a "relational theory" newsgroup where the
goal were to push the edges of a Mathematical theory without interest in
whether this theory were useful to databases or in what way it might be
useful or not, that would be a different discussion.

Cheers!  --dawn
Tony - 16 May 2004 14:30 GMT
> >> All this talk about how "Newton got it wrong,  and Einstein got it right"
> >> is a bunch of claptrap.  The people in this forum, for the most part, don't
[quoted text clipped - 8 lines]
> And some of us like bringing our 3rd-year undergrad Physics knowledge
> (from a top-5 Uni) into it, too :-)

I am suitably impressed and humbled... ;-)

> It's just that I find Newtonian mechanics an excellent analogy. To
> express it in computerese, both Newtonian Mechanics and Relational
[quoted text clipped - 6 lines]
> believe that relational theory does. That argument just doesn't make
> sense to me.

You keep saying that (on and on, tediously...) but it just doesn't
work, does it?  After all, didn't NASA put a man on the moon using
Newtonian Mechanics?  Expensive and complex successful experiments
have been done to observe the effects of relativity, but it hardly
impacts on the real world as lived in by us humans does it?   If your
analogy holds any water at all (to give you the benefit of very large
doubt), it suggests that relational theory will do just fine for
pretty much anything we ever want to do "in the real world".
Tony - 16 May 2004 19:21 GMT
> > I just find it fascinating that, while we know that Newtonian Mechanics
> > doesn't belong in the set Accurately_Matches_The_Real_World, so many
[quoted text clipped - 10 lines]
> doubt), it suggests that relational theory will do just fine for
> pretty much anything we ever want to do "in the real world".

Perhaps more to the point, Newtonian Mechanics is an attempt (accurate
or not) to model "how the world works".  By contrast, database theory
(any database theory) is merely trying to come up with the best way to
computerize book-keeping.  The two are hardly comparable endeavours,
are they?
Anthony W. Youngman - 17 May 2004 09:21 GMT
>> > I just find it fascinating that, while we know that Newtonian Mechanics
>> > doesn't belong in the set Accurately_Matches_The_Real_World, so many
[quoted text clipped - 16 lines]
>computerize book-keeping.  The two are hardly comparable endeavours,
>are they?

But both are attempts to apply a mathematical model to a real world
problem. Viewed from a dispassionate oversight, both are instances of
the SAME problem, and the same techniques can be applied to solving
them. Namely "how well does my mathematical model work in the real
world?".

Cheers,
Wol

Marshall Spight - 22 May 2004 04:25 GMT
> But both are attempts to apply a mathematical model to a real world
> problem. Viewed from a dispassionate oversight, both are instances of
> the SAME problem, and the same techniques can be applied to solving
> them. Namely "how well does my mathematical model work in the real
> world?".

It's clear you love this analogy, but it doesn't work.

What we put in the database is data, not the real world. Neither do
we attempt to say anything about the real world with our databases.
Consider a payroll database. Does it contain one single fact about
the natural world? It does not. It has names, social security numbers,
addresses, salaries, phone numbers, etc. These are all 100% human
constructs; none of them are found anywhere in the real world; they
are exclusively in our heads.

I suppose you will counter with some NASA database or something.
But what will it have in it? Let's say it's full of the positions of rocks.
But how do we record those positions? With a GPS machine that tells
us latitude and longitude. Note that we don't have *actual* rocks
in the database; we only have data for the lat/lon pairs. You could comb
all over the surface of Mars or Earth and never find a latitude line.

The internal predicate is in the database; the external predicate is in
our heads. Humans convert from one to the other; machines can't.
It's imperative that the humans be able to tell the difference.

Marshall
Anthony W. Youngman - 22 May 2004 13:54 GMT
>> But both are attempts to apply a mathematical model to a real world
>> problem. Viewed from a dispassionate oversight, both are instances of
[quoted text clipped - 11 lines]
>constructs; none of them are found anywhere in the real world; they
>are exclusively in our heads.

Well, if they're not facts about the real world, then I presume they are
imaginary musings? In which case they are no better than fantasy. So why
bother with them?

Names, Social Security Numbers, etc etc are all ways of describing real
things (in these cases a person). An address describes a real thing - a
building. Etcetera.

But the point is, if you do not have some way of FORMALLY converting
between a person (you, me, whoever) or a phone (a physical thing you can
hold) or a building (something you can look at), and the data that
describes those things, then your theory of data MUST be unscientific.

Bearing in mind that this is the study of philosophy ("does the tree,
continue to be, if no-one's there to see") I'm quite happy with a
scrappy attempt to explain things. But the conversion has to be both
ways - with "mass" we know exactly what Newton meant in his mathematical
theory, and we know exactly what we mean in the real world when we pick
up a heavy object. And we (now, thanks to Einstein) know that those two
definitions (the real and the mathematical) don't quite tie up.

But if you can't give me a way of converting between "data" and the
real-world objects it describes - in both directions! - then by
definition any theory of data must be unfalsifiable, therefore it is
unscientific, therefore it lives very firmly in the realms of mathematics
and religion. I'm sorry, but I'm a scientist by training and I most
definitely don't believe in that religion.

Cheers,
Wol

x - 24 May 2004 13:48 GMT
> >> But both are attempts to apply a mathematical model to a real world
> >> problem. Viewed from a dispassionate oversight, both are instances of
[quoted text clipped - 15 lines]
> imaginary musings? In which case they are no better than fantasy. So why
> bother with them?

But they could be fantasy   :-)

> Names, Social Security Numbers, etc etc are all ways of describing real
> things (in these cases a person). An address describes a real thing - a
> building. Etcetera.

Or an imaginary thing :-)
The question is: How do you test if some "fact" is real or imaginary ?

> But the point is, if you do not have some way of FORMALLY converting
> between a person (you, me, whoever) or a phone (a physical thing you can
> hold) or a building (something you can look at), and the data that
> describes those things, then your theory of data MUST be unscientific.

Well, we can have data about many kinds of "things": physical, chemical,
imaginary, etc.:-)
Why are you interested only in "physical" ones ? :-)

> Bearing in mind that this is the study of philosophy ("does the tree,
> continue to be, if no-one's there to see") I'm quite happy with a
[quoted text clipped - 3 lines]
> up a heavy object. And we (now, thanks to Einstein) know that those two
> definitions (the real and the mathematical) don't quite tie up.

Many of us gave up asking WHY long time ago.
Instead, we ask HOW MANY/MUCH :-)

> But if you can't give me a way of converting between "data" and the
> real-world objects it describes - in both directions! - then by
> definition any theory of data must be unfalsifiable, therefore it is
> unscientific, therefore it lives very firmly in the realms of mathematics
> and religion. I'm sorry, but I'm a scientist by training and I most
> definitely don't believe in that religion.

We have NOTARIES, ACCOUNTANTS, LAWYERS,... :-)
Anthony W. Youngman - 17 May 2004 09:18 GMT
>> I just find it fascinating that, while we know that Newtonian Mechanics
>> doesn't belong in the set Accurately_Matches_The_Real_World, so many
[quoted text clipped - 10 lines]
>doubt), it suggests that relational theory will do just fine for
>pretty much anything we ever want to do "in the real world".

I think you need to read up - and fast!

If NASA had used Newtonian Mechanics, from what I know, the astronauts
would never have come back.

Even under such "near earth" conditions as that, the discrepancy between
Newtonian Mechanics and Relativity would have been enough to ensure the
rockets ran out of fuel, stranding the astronauts in space.

We're talking velocities of 7 miles a second here, more than fast enough
for relativity to make itself felt. That's roughly c*10^-5 - not small
beer. Actually - it looks like we probably need relativity even with the
Shuttle!

Cheers,
Wol

Tony - 18 May 2004 10:59 GMT
> >> I just find it fascinating that, while we know that Newtonian Mechanics
> >> doesn't belong in the set Accurately_Matches_The_Real_World, so many
[quoted text clipped - 12 lines]
>
> I think you need to read up - and fast!

I will indeed read up - though don't worry, there is no real urgency:
I am not personally involved in putting men on the Moon.

> If NASA had used Newtonian Mechanics, from what I know, the astronauts
> would never have come back.
[quoted text clipped - 7 lines]
> beer. Actually - it looks like we probably need relativity even with the
> Shuttle!

Despite being no expert, I am pretty confident that you are completely
wrong here.  0.0000376 * c sounds pretty small to me.  How much
discrepancy in fuel usage could that lead to - a millilitre even?  I
bet whatever difference it makes is insignificant compared to other
more mundane factors such as the accuracy of measuring the rate of
fuel use, quality of fuel, etc.
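
For what it's worth, the size of the effect can be estimated in a few lines. This is only a back-of-envelope sketch: it takes the 7 miles a second quoted above at face value and computes the special-relativistic factor gamma - 1, which is an assumption about which correction matters; it says nothing about fuel budgets or about how any real mission was planned.

```python
C = 299_792_458.0          # speed of light in m/s (exact by definition)
METRES_PER_MILE = 1609.344

v = 7 * METRES_PER_MILE    # 7 miles per second, the figure quoted above
beta = v / C

# For small beta, gamma - 1 is about beta**2 / 2 (next term 3 * beta**4 / 8).
# The series is used because computing gamma directly and subtracting 1
# would lose most of the significant digits.
gamma_minus_one = beta**2 / 2 + 3 * beta**4 / 8

print(f"v/c       = {beta:.3e}")
print(f"gamma - 1 = {gamma_minus_one:.3e}")
```

With these inputs v/c comes out near 3.76e-5 and gamma - 1 near 7e-10, i.e. a fractional correction of less than one part in a billion.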

But yes, I will do a little Googling to see if you are right.  If I
had a hat, I'd be prepared to eat it if it turned out you were
correct.
Laconic2 - 18 May 2004 15:28 GMT
> But yes, I will do a little Googling to see if you are right.  If I
> had a hat, I'd be prepared to eat it if it turned out you were
> correct.

You are right, Tony.  Your hat, if you had a hat, would be safe.  The
divergence between Newtonian mechanics and Einsteinian mechanics for the
entire Apollo mission is less than the margin of error in the instruments on
board.

OTOH,  the Apollo missions did carry a fair number of instruments to the
moon whose purpose was to capture data that would confirm or contradict
Einstein's predictions.  AFAIK, Einstein is batting a thousand.

You actually don't have to go so far afield to find a connection between
Einstein's theories and everyday life.  Some percentage (I don't know how
much) of Europe's electric energy is generated by nuclear plants.  Inside
those plants, nuclear fission is the source of energy.  And that energy
corresponds to the reduction of mass that results from the splitting of
certain kinds of nuclei.
Chris Hoess - 19 May 2004 07:41 GMT
> It's just that I find Newtonian mechanics an excellent analogy. To
> express it in computerese, both Newtonian Mechanics and Relational
> Theory are instances of the class Mathematical_Theory. BOTH are
> mathematically perfect (well, I know Newtonian Mechanics is).

But you're missing an important point, namely, Newtonian mechanics
incorporates into it distinct physical concepts such as mass, distance, and
time. Relational theory does not. This is why we can't set up some
experiment to test "relational theory" as such against the real world and
see what happens: only by creating a specific schema which links together
machine-readable definitions of relations and constraints and the semantic
import of those relations can we try and test relational theory, or any
other general theory of data modelling, against the real world.

To put it another way, relational theory is analogous to the equation for a
Gaussian distribution, f(x) = ae^(-bx^2). Were I to assert that Gaussian
distributions are useful in describing scientific phenomena, you might ask
me for a test; and what are f, a, b, and x? And when I tell you that it
depends on the phenomenon we are trying to describe, and that f, a, b, and x
can be many different things, you might mistake it to be of no practical
value, as it makes no verifiable predictions. But if I were to substitute
for f C, the concentration, for a C0/sqrt(4piDt), for b 1/4Dt, and
proclaimed x to be distance, I would have made use of a Gaussian
distribution to describe the process of diffusion, and it could be checked
experimentally and the predictions of the equation (Fick's Second Law)
verified. Only by giving a physical interpretation to the variables of the
Gaussian distribution does it become a scientifically verifiable theory; and
only by creating a schema which we associate with semantics are we able to
test the application of the relational model to our problems.
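
As a rough numerical check of the substitution described above, the sketch below plugs a = C0/sqrt(4*pi*D*t) and b = 1/(4*D*t) into the Gaussian and tests whether the result satisfies Fick's Second Law, dC/dt = D * d2C/dx2. The function names, the values chosen for C0 and D, and the finite-difference step are all arbitrary choices made for illustration.

```python
import math

def concentration(x: float, t: float, c0: float = 1.0, d: float = 0.5) -> float:
    """Gaussian f(x) = a*exp(-b*x^2) with a = C0/sqrt(4*pi*D*t), b = 1/(4*D*t)."""
    a = c0 / math.sqrt(4 * math.pi * d * t)
    b = 1 / (4 * d * t)
    return a * math.exp(-b * x * x)

def fick_residual(x: float, t: float, d: float = 0.5, h: float = 1e-3) -> float:
    """dC/dt - D * d2C/dx2, estimated with central finite differences."""
    dc_dt = (concentration(x, t + h, d=d) - concentration(x, t - h, d=d)) / (2 * h)
    d2c_dx2 = (concentration(x + h, t, d=d) - 2 * concentration(x, t, d=d)
               + concentration(x - h, t, d=d)) / (h * h)
    return dc_dt - d * d2c_dx2

# The residual should be close to zero wherever it is sampled.
for x in (0.0, 0.5, 1.5):
    print(x, fick_residual(x, t=1.0))
```

The bare Gaussian predicts nothing until f, a, b, and x are given these meanings; once they are, there is an equation whose residual can be checked, here numerically and in the lab by measurement.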

Having established that the relational model is an underlying mathematical
framework bound to reality by the "glue" of the schemas we create, we're on
better grounds to discuss the applicability of the model without premature
calls for "experiment". We know that data in the relational model is
formulated as logical propositions whose validity is evaluated by
first-order logic. Hence my tentative suggestion in a post here about a month
ago for examining alternatives to the relational model: are logical
propositions the best way to formulate data, and do we need more power than
first-order logic can bring us (and what trade-offs does that present)?

(Incidentally, can we agree that while consistency is not sufficient to
prove the correctness of a data model, it is necessary?)

Signature

Chris Hoess

Anthony W. Youngman - 20 May 2004 00:28 GMT
>> It's just that I find Newtonian mechanics an excellent analogy. To
>> express it in computerese, both Newtonian Mechanics and Relational
[quoted text clipped - 9 lines]
>import of those relations can we try and test relational theory, or any
>other general theory of data modelling, against the real world.

If we can't set up an experiment (even a Gedanken thought experiment),
then relational theory is not provable, therefore it is not scientific,
therefore it is irrelevant to the real world, therefore why the hell are
we using it :-)

As a scientist/engineer type, not a mathematician, I want some
experimental proof at least. Unfortunately, all the (anecdotal) evidence
I have says that other models work better ...

>To put it another way, relational theory is analogous to the equation for a
>Gaussian distribution, f(x) = ae^(-bx^2). Were I to assert that Gaussian
[quoted text clipped - 11 lines]
>only by creating a schema which we associate with semantics are we able to
>test the application of the relational model to our problems.

Yup! We have an experiment!

>Having established that the relational model is an underlying mathematical
>framework bound to reality by the "glue" of the schemas we create, we're on
[quoted text clipped - 5 lines]
>propositions the best way to formulate data, and do we need more power than
>first-order logic can bring us (and what trade-offs does that present)?

If we accept that data is an abstract proposition INSIDE relational
theory, then I might well agree that logical propositions, first-order
logic etc may well be the best way to formulate data. But that implies
that data is fundamental to database theory in the same way as mass and
energy individually are fundamental to Special Relativity - ie they are
NOT - there is a supra-concept called mass-energy, and the
transformation between mass and energy is part of the theory and nothing
to do with the metaphysical interface to reality ...

>(Incidentally, can we agree that while consistency is not sufficient to
>prove the correctness of a data model, it is necessary?)

Of course. I'd actually rephrase that. While (internal) consistency may
prove the model to be correct (mathematically), we need external
consistency to prove the model accurate (here we go - arguing over the
meaning of words again :-)

Cheers,
Wol

Paul - 20 May 2004 10:18 GMT
> If we can't set up an experiment (even a Gedanken thought experiment),
> then relational theory is not provable, therefore it is not scientific,
> therefore it is irrelevant to the real world, therefore why the hell are
> we using it :-)

Newtonian mechanics is more like a particular instance of a database in
the relational model, rather than the model itself.

The relational model is really just an implementation of first-order
predicate logic that is suitable for computers.

Logic is more like a "meta-theory": it's kind of how we reason *about
how we reason*, so it's a bit self-referential.

For a particular database we can test it experimentally: we add data,
query it and check that the answers correspond with reality.
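As a minimal sketch of that experiment, assuming an in-memory SQLite database and a hypothetical `reading` table (the table, the sensor names and the values are all invented for illustration), one could load some independently observed facts, query them back, and compare:

```python
import sqlite3

# Hypothetical "reality": facts we could check independently of the database.
# The sensor names and values are invented purely for illustration.
observed = {"sensor_a": 21, "sensor_b": 19}

def load_and_query(facts):
    """Add data to an in-memory database, then query it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reading (sensor TEXT PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO reading VALUES (?, ?)", list(facts.items()))
    rows = conn.execute("SELECT sensor, value FROM reading ORDER BY sensor").fetchall()
    conn.close()
    return dict(rows)

answers = load_and_query(observed)
# The "experiment": do the query answers correspond with what was observed?
matches_reality = answers == observed
print(matches_reality)
```

This only tests one particular database against one set of observations; it says nothing about the model in general, which is the distinction being drawn here.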

For first-order predicate logic itself, it's almost axiomatic that it
corresponds to reality, because we are saying this is how we argue
logically by definition. Godel proved that first order logic is
"complete" in some sense (see here for example:
http://www.sm.luth.se/~torkel/eget/godel/completeness.html), though the
whole area of Godel is guaranteed to cause confusion and
misunderstanding, and will possibly explode your brain.

>> (Incidentally, can we agree that while consistency is not sufficient to
>> prove the correctness of a data model, it is necessary?)
[quoted text clipped - 3 lines]
> consistency to prove the model accurate (here we go - arguing over the
> meaning of words again :-)

But in order to prove the model is accurate externally we'd have to use
logic. So we've got a chicken and egg situation here. What logic is
external to logic itself?

Paul.
mountain man - 20 May 2004 11:32 GMT
> > If we can't set up an experiment (even a Gedanken thought experiment),
> > then relational theory is not provable, therefore it is not scientific,
[quoted text clipped - 32 lines]
> logic. So we've got a chicken and egg situation here. What logic is
> external to logic itself?

Random truths (Chaitin) and unprovable truths (Godel).
See http://www.mountainman.com.au/GIF/logic_space_1.jpg

Pete Brown
Falls Creek
Oz
Laconic2 - 20 May 2004 15:28 GMT
> If we can't set up an experiment (even a Gedanken thought experiment),
> then relational theory is not provable, therefore it is not scientific,
[quoted text clipped - 4 lines]
> experimental proof at least. Unfortunately, all the (anecdotal) evidence
> I have says that other models work better ...

You make an interesting point here.  I would add that the same arguments
that would render relational theory not provable would equally well render
the theory non-falsifiable.  In that case, the question for the engineer
becomes moot.
The question "why the hell are we using it" can be countered by "why the
hell not".

I would suggest that the disciplined practices  of engineers are based on
several sources.  One is prior experience,  either the personal experience
of an individual engineer, or the distilled experience of other engineers.
Another is the accumulated results of science, and of other specialties
within engineering.  A third is the results of mathematics.  A fourth is the
study of how people carry out certain data management and data manipulation
tasks in the absence of automation.  A fifth is the study of the strengths
and defects of "legacy systems".

Sorry the list got so long.

My personal experience tells me that the relational data model can be,  in
certain circumstances, an enormous aid in managing the complexity of
defining the data itself,  and in clarifying certain issues in the
development of application software.

This is a far cry from saying that "all data should be in 1NF".
Anthony W. Youngman - 20 May 2004 22:31 GMT
>> If we can't set up an experiment (even a Gedanken thought experiment),
>> then relational theory is not provable, therefore it is not scientific,
[quoted text clipped - 11 lines]
>The question "why the hell are we using it" can be countered by "why the
>hell not".

Actually, you and I have both just said exactly the same thing.
"provable" and "falsifiable" both mean exactly the same thing as far as
science goes - to take that widely misquoted saying "the exception
proves the rule (is wrong)". The bit in parentheses is ignored or
unknown to most people who quote the saying ... Look in a dictionary.
"to prove" can mean "to test".

As for "why the hell not" - well we should be looking for theories that
ARE provable/falsifiable. Because if relational theory is not
falsifiable, then equally we can't show that it works in practice (for
any suitable value of "works"). Would you trust an engineer using
Newtonian Mechanics if you had no way of knowing whether relativistic
effects were likely in that particular application?

>I would suggest that the disciplined practices  of engineers are based on
>several sources.  One is prior experience,  either the personal experience
[quoted text clipped - 6 lines]
>
>Sorry the list got so long.

Well, when I was talking about Newtonian Mechanics metaphysics my list
was about that long :-) but if things have to be complete and
comprehensive, then sometimes they do get long ...

>My personal experience tells me that the relational data model can be,  in
>certain circumstances, an enormous aid in managing the complexity of
>defining the data itself,  and in clarifying certain issues in the
>development of application software.

I would very much agree ... indeed I would say it almost invariably is a
great help, if used as your TOOL and not as your MASTER!

>This is a far cry from saying that "all data should be in 1NF".

And the same here. As far as I am concerned, the job of the "database
analyst designer" is to take real-world information, and convert it to a
data schema for the database. If that includes conversion to 1NF, this
involves a massive loss of metadata, meaning that the conversion is
one-way and cannot be reversed, and therefore the act of conversion
renders the whole thing unprovable / unfalsifiable / unscientific.

Cheers,
Wol

Laconic2 - 21 May 2004 14:07 GMT
> I would very much agree ... indeed I would say it almost invariably is a
> great help, if used as your TOOL and not as your MASTER!

Agreed.  The old saying is,  "A fool with a tool is still a fool".

Or perhaps the pragma is better expressed as:

"If you can dream, and not make dreams your master,
If you can think, and not make thoughts your aim,
If you can meet with triumph and disaster,
And treat those two impostors just the same,"

On this subthread, I think the difference between you and me is more how we
choose to express things than on the substance of the matter.
Marshall Spight - 22 May 2004 04:40 GMT
> As for "why the hell not" - well we should be looking for theories that
> ARE provable/falsifiable.

How might one falsify arithmetic? If arithmetic was falsified, would
that mean it wasn't useful anymore?

Marshall
Paul - 22 May 2004 11:26 GMT
> How might one falsify arithmetic? If arithmetic was falsified, would
> that mean it wasn't useful anymore?

It depends what you mean by "falsify arithmetic". There's a result by
Tarski ( http://plato.stanford.edu/entries/tarski-truth/ ) that says
essentially that no language can talk about the truth of sentences
contained within itself without leading to things like the liar paradox.

You need to have a "meta-language" for arithmetic in order to talk about
whether statements in arithmetic are true or not.

I guess the problem is where do you start? Set theory I suppose but then
how do you talk rigorously about set theory?

All this stuff is very subtle but I think it is useful to know a bit
about this kind of thing if you're interested in relational database theory.

If you mean that arithmetic is inconsistent i.e. there is a statement
where you can prove both it and its negation, then that means
*everything* in arithmetic is both true and false.
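That step (from one provable contradiction to every statement being provable) is the classical "principle of explosion". A minimal sketch in Lean 4, where the theorem name `explosion` is ours and `absurd` is the core library lemma:

```lean
-- If both P and its negation are provable, any statement Q follows.
theorem explosion (P Q : Prop) (hp : P) (hnp : ¬P) : Q :=
  absurd hp hnp
```

This is a statement about classical (and intuitionistic) logic only; the non-classical logics mentioned next are designed precisely so that this step does not go through.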

Check out this though:
http://plato.stanford.edu/entries/mathematics-inconsistent/
Inconsistent Mathematics, where you have theories that use non-classical
logic and can deal with inconsistencies without collapsing in on themselves.

Paul.
Anthony W. Youngman - 22 May 2004 14:34 GMT
>> As for "why the hell not" - well we should be looking for theories that
>> ARE provable/falsifiable.
>
>How might one falsify arithmetic? If arithmetic was falsified, would
>that mean it wasn't useful anymore?

What an excellent question !!! Because if I answer it properly, it
clearly explains the difference between mathematics and science. Thanks!

Arithmetic is part of mathematics. Therefore, it is NOT falsifiable. We
merely prove it correct or incorrect. The best example is "reductio ad
absurdum" - if from our starting point we end up with two mutually
exclusive results then either our starting point or our logic must be
wrong. Now what's this got to do with science?

Let's go from arithmetic to geometry. In three dimensions we have
Euclidean geometry. We can prove it correct (or self-consistent - same
thing). In four dimensions, we have special relativity, and again we can
prove it correct. In two dimensions, we have planar, spherical, and
toroidal geometry, and yet again, we can prove them correct.

NOW! Let's apply all three of our two-dimensional geometries to the
surface of the earth. THIS is the "falsifiable" bit.

Let's use planar geometry to describe the little bit of the world we can
see. I know a little bit of American geography, as do many others, so
I'll use that. Let's say we're in Kansas. We know New York is 1500 miles
east, and Dallas is 2000 miles south. So we predict the distance and
direction from New York to Dallas. The reality is we are going to be
well wrong - we've just falsified the assumption that the world is flat.
Or, to put it another way, "planar geometry does not describe the
world". Toroidal geometry will come up with a similar mess.

Spherical geometry, on the other hand will be pretty close. So either
we've cocked up on our geometry or, as is actually the case, the earth
is an approximate sphere not a perfect one. Newton mapped his
mathematical "mass", "energy", "space" and "time" to the real-world
equivalents, and came up with a load of predictions that mostly worked.
So he concluded that his maths was wrong. If he'd concluded that reality
wasn't quite as he envisaged it, he might well have beaten Einstein to
the theory of relativity!

So no. Your question "how do we falsify arithmetic" is meaningless. But
science is about falsifying theories *based* *on* arithmetic (and other
branches of mathematics). Use the maths to make a prediction about the
real world, and then prove (as in test) the theory by seeing if the
prediction is true or false. And if the prediction is falsified by an
exception, then you've just got an example of "the exception proves the
theory is wrong".
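A rough sketch of the two predictions for the Kansas example, under stated assumptions: the Earth radius of 3959 miles is our own figure, and both legs are treated as great-circle arcs meeting at a right angle, which is a simplification. The function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3959.0  # assumed mean radius; not given in the thread

def planar_distance(east_miles, south_miles):
    """Flat-earth prediction: the two legs are sides of a right triangle."""
    return math.hypot(east_miles, south_miles)

def spherical_distance(east_miles, south_miles, radius=EARTH_RADIUS_MILES):
    """Sphere prediction for a right-angled spherical triangle.

    Uses cos(c/R) = cos(a/R) * cos(b/R), treating both legs as
    great-circle arcs that meet at a right angle (a simplification).
    """
    a = east_miles / radius
    b = south_miles / radius
    return radius * math.acos(math.cos(a) * math.cos(b))

flat = planar_distance(1500, 2000)
round_earth = spherical_distance(1500, 2000)
print(f"planar: {flat:.0f} miles, spherical: {round_earth:.0f} miles")
```

The two models give different numbers for the same inputs, so measuring the real distance is an experiment that can count against one of them. How large the gap is depends on the distances involved.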

And that's why I say Newtonian Mechanics is scientific - it is a
mathematical theory that can be proved/falsified, while Relational
Theory is unscientific because I can see no way - not even with a
Gedanken thought experiment - of trying to falsify it.

Cheers,
Wol

Bill H - 24 May 2004 17:17 GMT
Wol:

"Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message

[snipped]

> ... As far as I am concerned, the job of the "database
> analyst designer" is to take real-world information, and convert it to a
> data schema for the database. If that includes conversion to 1NF, this
> involves a massive loss of metadata, meaning that the conversion is
> one-way and cannot be reversed, and therefore the act of conversion
> renders the whole thing unprovable / unfalsifiable / unscientific.

But it still might be useful under the circumstances.  :-)

Bill
Marshall Spight - 22 May 2004 04:35 GMT
> If we can't set up an experiment (even a Gedanken thought experiment),
> then relational theory is not provable, therefore it is not scientific,

Correct, relational theory is not scientific.

> therefore it is irrelevant to the real world, therefore why the hell are
> we using it :-)

Because it is *mathematical.*

I can imagine giving you a four function calculator, and you saying,
how can I devise a real-world, scientific experiment to verify the
validity of this thing, and then throwing it out because you couldn't.

Four function calculators are not scientific, but they are still useful, mathematically.

> As a scientist/engineer type, not a mathematician, I want some
> experimental proof at least.

I am a computer scientist, which is a kind of mathematician.
I have no illusion that what I do relates to the physical world.

> Unfortunately, all the (anecdotal) evidence
> I have says that other models work better ...

I have this gedanken experiment that says, what if I have two apples
and I try to take away three. In the real world, I get an error, because
once I have taken away two, I no longer have any apples that I can
take away. Therefore, only positive integers are scientific. I have no
use for negative numbers because they are not scientific, either, since
there are no negative numbers I can observe in the natural world.

Marshall
Anthony W. Youngman - 22 May 2004 15:58 GMT
>> If we can't set up an experiment (even a Gedanken thought experiment),
>> then relational theory is not provable, therefore it is not scientific,
>
>Correct, relational theory is not scientific.

Good. We agree :-)

>> therefore it is irrelevant to the real world, therefore why the hell are
>> we using it :-)
>
>Because it is *mathematical.*

So I can use any theory I like, so long as it's mathematical, then?

You'd be quite happy for me to calculate your aeroplane's route from A
to B using whatever geometrical theory I cared for, and you wouldn't
object if I used a theory whose practical effect was to destroy your
aircraft in a huge fireball as it underwent a "controlled flight into
terrain", just as long as I could prove the maths I was using was
perfectly sound. The fact that it was the wrong theory for the
real-world task in hand wouldn't bother you in the slightest?

>I can imagine giving you a four function calculator, and you saying,
>how can I devise a real-world, scientific experiment to verify the
>validity of this thing, and then throwing it out because you couldn't.
>
>Four function calculators are not scientific, but they are still
>useful, mathematically.

Well, actually, I could think of an experiment. "If I type '4' '+' '4'
'*' '4' '=' into this thing, then it should come up '20' but might come
up '32' ". And either way, I will be happy at using it because I can
predict (ie "do science") that it will come up with a "correct" answer,
and I can verify that answer.

Actually, I've just done exactly that with the calculator in my copy of
Windows, and guess which answer it came up with ...

Cheers,
Wol

Gene Wirchenko - 23 May 2004 23:25 GMT
[snip]

>>Four function calculators are not scientific, but they are still
>>useful, mathematically.
[quoted text clipped - 7 lines]
>Actually, I've just done exactly that with the calculator in my copy of
>Windows, and guess which answer it came up with ...

    I would have to guess since it could come up with either!

    In Standard view, the answer is 32.

    In Scientific view, the answer is 20.  (In this view, parentheses
are available for grouping operations.)
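The two answers are consistent with two different evaluation rules. The sketch below is an assumption about how each view behaves, not a description of the calculator's actual implementation; the function names are hypothetical.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def immediate_execution(tokens):
    """Apply each operator as soon as its right operand arrives (left to right)."""
    result = tokens[0]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, operand)
    return result

def with_precedence(tokens):
    """Evaluate * and / first, then + and -, as in ordinary algebra."""
    # First pass: fold each * or / into the term it belongs to.
    terms = [tokens[0]]
    pending_ops = []
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op in ("*", "/"):
            terms[-1] = OPS[op](terms[-1], operand)
        else:
            pending_ops.append(op)
            terms.append(operand)
    # Second pass: apply the remaining + and - left to right.
    result = terms[0]
    for op, term in zip(pending_ops, terms[1:]):
        result = OPS[op](result, term)
    return result

keys = [4, "+", 4, "*", 4]
print(immediate_execution(keys), with_precedence(keys))  # prints "32 20"
```

Either rule gives a prediction that can be checked against the device, which is the sense of "experiment" used earlier in the thread.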

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Anthony W. Youngman - 15 May 2004 23:05 GMT
>All this talk about how "Newton got it wrong,  and Einstein got it right"
>is a bunch of claptrap.  The people in this forum, for the most part, don't
>know what they are talking about.

Well, I would say Newton got it wrong, and I do know what I'm talking
about, and I know I'm right :-)

>There are internal problems,  at the cosmological level,  with Newton's view
>of the universe.  But that's not what led Einstein to push the envelope
>further.  Physics was in crisis in the 19th century,  due to results like
>the Michelson-Morley experiment.  That's more data.
>
>It's data that Einstein had and Newton did not.

Except Newton DID have data that told him he was wrong. And he spent
pretty much the rest of his life trying to work out why his theory
didn't work completely.

Fundamental to Newtonian Mechanics is the conservation of mass - it
cannot be created or destroyed. To Newton, this seemed obvious. To us,
well, we know he got it wrong - we know the rule is that mass-energy is
conserved, and that mass CAN be created and destroyed.

Mercury's orbit is relativistic, not classical. Try as he might, Newton
just could not get his calculations and Tycho's data to agree.

Einstein just had a couple of insights that Newton didn't, due quite
likely as you say to Michelson-Morley amongst other things. More data
always does make life easier :-) and the data he had led him to suspect
that the law of conservation of mass might actually be wrong ... the
rest as they say is history ... (there's a nice story of the same sort
of thing happening to Dick Feynman :-)

Cheers,
Wol

Laconic2 - 16 May 2004 02:34 GMT
Try as I might, I cannot find confirmation of your extraordinary assertion
that the relativistic precession of Mercury was observable in Tycho's data.
I don't think you are right about this.

Keep in mind that the vast majority of Mercury's precession is explainable,
in classical Newtonian mechanics,  by the gravitational attraction of the
other planets.

In the timelines I've seen,  the observation of 35 arcseconds per century of
excess precession of Mercury was attributed to an observation in 1845 by
Leverrier.  It was further corrected to an excess of 43 arcseconds per
century by Newcomb in 1882.
Before Einstein, the excess precession of Mercury was attributed to a
hitherto unknown (and, it turns out, nonexistent) planet inside the orbit of
Mercury,  to which they gave the name "Vulcan".  (Live long and prosper).

But the descriptions of the amount of time for which you need observations
of Mercury to obtain these findings are very long.  So long that I find it
doubtful that Tycho could have observed for long enough for his data to
detect the Einsteinian precession.

As far as Newton refining and cross checking his work, and seeking to verify
or falsify it down to the last epsilon (so to speak),  I find that very easy
to believe.  In fact, his own assessment of his work is that he felt like a
little child, playing with the shells on the seashore, while the vast ocean
of truth lay undiscovered before him.  And Einstein, when asked to comment
on Newton's work,  said that his own work would have been impossible without
Newton's earlier work.

Those  people in this forum who seem to have every human gift except
humility might do well to learn from such people as Newton and Einstein.
Anthony W. Youngman - 20 May 2004 00:42 GMT
>Try as I might, I cannot find confirmation of your extraordinary assertion
>that the relativistic precession of Mercury was observable in Tycho's data.
[quoted text clipped - 8 lines]
>Leverrier.  It was further corrected to an excess of 43 arcseconds per
>century by Newcomb in 1882.

I did a "google" on "mercury orbit newton relativity", and it gave me a
load of good pages. About the first one I looked at (the third or so it
found) gave me rather bigger figures than yours for precession (although
it did have a few problems...)

>Before Einstein, the excess precession of Mercury was attributed to a
>hitherto unknown (and, it turns out, nonexistent) planet inside the orbit of
>Mercury,  to which they gave the name "Vulcan".  (Live long and prosper).

And apparently half the excess precession found by Newton was due to
relativity ...

>But the descriptions of the amount of time for which you need observations
>of Mercury to obtain these findings are very long.  So long that I find it
>doubtful that Tycho could have observed for long enough for his data to
>detect the Einsteinian precession.

How long? Don't forget. Tycho STARTED these observations in about 1550.
Newton was around about 1750. So he actually had about 200 years worth
of data to play with.

>As far as Newton refining and cross checking his work, and seeking to verify
>or falsify it down to the last epsilon (so to speak),  I find that very easy
[quoted text clipped - 6 lines]
>Those  people in this forum who seem to have every human gift except
>humility might do well to learn from such people as Newton and Einstein.

And modern man would do well to learn humility from the ancients. 1
arcsecond is easily detected today, I would think. And it wouldn't
surprise me if Newton had access to some pretty accurate instruments too
- why shouldn't he be able to resolve with that sort of accuracy too?
With two centuries of data, that makes well over an arcminute due to
relativity alone. We know he could detect that sort of accuracy, because
he was trying to explain it!
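The arithmetic behind "well over an arcminute" can be sketched as follows, assuming the 43 arcseconds per century excess quoted earlier in the thread and the 200 years of data claimed above. The function name is hypothetical, and the sketch says nothing about whether instruments of the time could resolve the result.

```python
ARCSEC_PER_ARCMIN = 60
ARCSEC_PER_DEGREE = 60 * 60  # 60 arcseconds per arcminute, 60 arcminutes per degree

def accumulated_precession_arcsec(rate_arcsec_per_century, years):
    """Total precession built up over a span of years at a constant rate."""
    return rate_arcsec_per_century * years / 100

excess = accumulated_precession_arcsec(43, 200)
print(excess, "arcseconds =", round(excess / ARCSEC_PER_ARCMIN, 2), "arcminutes")
```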

(The website I looked at said the precession was more like 540
arcseconds a year, but it also said there were 360 arcseconds in a
degree, so I think it has mislaid a few powers of ten somewhere :-)

Cheers,
Wol

Laconic2 - 20 May 2004 16:02 GMT
> I did a "google" on "mercury orbit newton relativity", and it gave me a
> load of good pages. About the first one I looked at (the third or so it
> found) gave me rather bigger figures than yours for precession (although
> it did have a few problems...)

Yes.  The figures are considerably bigger because the precession of Mercury
is, for the most part, due to attraction from the other planets.  The only
figures I quoted were the "excess" (that is, non-Newtonian) observed
precession of Mercury.

> (The website I looked at said the precession was more like 540
> arcseconds a year, but it also said there were 360 arcseconds in a
> degree, so I think it has mislaid a few powers of ten somewhere :-)

There are 3600 arcseconds in a degree.

As far as the connection to this forum goes,  I think the discussion in here
reminds me more of the discussions between
Simplicio, Salviati, and Sagredo in Galileo's writings.

I can just hear Simplicio saying something like:

<quote>
Aristotle's axioms are self evident, and his logic is irrefutable.
Therefore his conclusions are correct.

Therefore, if you report experimental observations that contradict his
conclusions,  then you are either lying or you have been misled by your
infatuation with experimental observation.

If you had the proper respect for your betters you would restrain yourself
from making such rash claims, in contradiction of the wisdom of the
ancients.  And if you had proper training in philosophical thinking, you
would be able to confirm Aristotle's work for yourself,  instead of all
this nonsense about taking cannonballs up to the top of a tower and dropping
them.

</quote>
Anthony W. Youngman - 20 May 2004 22:36 GMT
>> (The website I looked at said the precession was more like 540
>> arcseconds a year, but it also said there were 360 arcseconds in a
>> degree, so I think it has mislaid a few powers of ten somewhere :-)
>
>There are 3600 arcseconds in a degree.

Yup :-) 60 seconds times 60 minutes = 3600 seconds in a degree :-) I
knew that.

>As far as the connection to this forum goes,  I think the discussion in here
>reminds me more of the discussions between
[quoted text clipped - 18 lines]
>
></quote>

PERFECT! That's a quote I would have loved to have had available to me
earlier :-)

Thanks.
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
The society which scorns excellence in plumbing because plumbing is a humble
activity, and tolerates shoddiness in philosophy because it is an exalted
activity, will have neither good plumbing nor good philosophy. Neither its
pipes nor its theories will hold water. John W Gardner

Laconic2 - 21 May 2004 14:30 GMT
> PERFECT! That's a quote I would have loved to have had available to me
> earlier :-)

The thing is, I'm not satisfied with the arguments of either the pro
relational camp or their challengers in this forum.

There's clearly a lot of intelligence and erudition in here,  but it seems
to be savagely misused,  on both sides of the argument.

I've used the power of relational joins,  ever since I was first exposed to
the concept.  And my first use involved nothing more sophisticated than
Datatrieve and indexed files on a VAX.  And the theorists in this forum who
dismiss that as "not relational"  have a fundamental synapse missing with
regard to the connection between theory and pragma.

I've never used PICK,  but from what I've read in here,  if one were to
study the reason why certain PICK applications were
(and possibly still are) successful,  and do the same for Datatrieve,  one
would find a surprising overlap.  And I think data models would play a minor
role in both studies.  I'd love to see some rational discussion of that.
But we'd have to get away from some of the cultural norms of this forum.

For me, the migration from Datatrieve to VAX Rdb/VMS,  and later from that
to Oracle were pretty natural.  While I find much to criticize about SQL,
it's far, far better than the access languages that grew up around CODASYL
databases!  If a better language can be designed, implemented and adopted,
I'm all for that!  But don't expect me to wait!
Anthony W. Youngman - 21 May 2004 23:37 GMT
>> PERFECT! That's a quote I would have loved to have had available to me
>> earlier :-)
[quoted text clipped - 10 lines]
>dismiss that as "not relational"  have a fundamental synapse missing with
>regard to the connection between theory and pragma.

Well, think of a join, and then think of that join being along a
"cascading delete" link - ie the linked table is an attribute of the
master table.

In Pick, that join wouldn't be necessary, the linked table would
logically and physically be part of the master table ...

And if you haven't got a cascading delete, chances are you either have a
code lookup; or you're only interested in viewing fields in one table,
but need the other table for certain SELECT fields. In the former case
you declare a virtual field in Pick, and in the latter it may require
(slightly) more effort on the part of the programmer, but a lot less
effort on the part of the database...
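A rough sketch of the contrast, with heavy caveats: the relational half uses SQLite, the nested half is a plain Python dictionary standing in for a multivalued record (it is not Pick syntax), and the `invoice` / `invoice_line` names and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces cascades when enabled
conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute(
    "CREATE TABLE invoice_line ("
    " invoice_id INTEGER REFERENCES invoice(id) ON DELETE CASCADE,"
    " item TEXT, qty INTEGER)"
)
conn.execute("INSERT INTO invoice VALUES (1, 'Acme')")
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?)",
    [(1, "widget", 3), (1, "gadget", 5)],
)

# Relational shape: a join reassembles the master row and its lines.
joined = conn.execute(
    "SELECT i.customer, l.item, l.qty FROM invoice i"
    " JOIN invoice_line l ON l.invoice_id = i.id ORDER BY l.item"
).fetchall()

# Nested shape: the lines live inside the master record, so no join is needed.
nested = {
    "id": 1,
    "customer": "Acme",
    "lines": [{"item": "gadget", "qty": 5}, {"item": "widget", "qty": 3}],
}

# Deleting the master removes the dependent rows via the cascade.
conn.execute("DELETE FROM invoice WHERE id = 1")
remaining_lines = conn.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]
print(joined, remaining_lines)
```

In the nested form, deleting the record removes its lines trivially because they are part of it; in the relational form the same effect needs the cascade declaration.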

Cheers,
Wol

Jonathan Leffler - 15 May 2004 01:30 GMT
> In message <2gkdtnF3saspU1@uni-berlin.de>, Alan <alan@erols.com> writes
>
[quoted text clipped - 10 lines]
>
> By these standards, "data" is woefully vague and undefined.

The recording of a mass, or of the energy of an object, or the time at
which an event was perceived to occur, or any of a myriad other
things, could be data.  So, data encompasses all of the things you
mention and many other things too.

> And it's not
> even atomic! Within the theory it's chopped up into tuples, which are
> themselves chopped up into (I'm not into terminology here) keys,
> attributes, relations, and probably other stuff besides.

Hmmm - that's an odd set of comments.  Data generally are atomic
facts.  The way I'd view it is that a tuple is composed of individual
pieces of data.  And tuples are certainly not chopped up into
relations.  Attributes within a tuple contain 'atomic facts' (though
when you get into sub-structures such as relation-valued attributes,
the definition of atomic is more complex).  Keys are properties of
relations, etc.  My goodness me, that paragraph is so confused as to
be close to meaningless - and what meaning there is is almost all
completely antithetical to the theory behind a RDBMS, which is what
the subject of the thread is discussing.

Signature

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/

mountain man - 15 May 2004 17:43 GMT
...[trim sci.physics thread] ...

> Okay. So what is "data". Because if we can't anchor that in the real
> world, we have no way of knowing if, or how strongly, relational theory
> is relevant (and usable) in the real world.

Relational theory is useful and relevant. For the people
who are database academics, database technical, indeed
anything database-centric, the theory is generally all they
need and require to do what they do (within E2 below)

There are 3 software environments:
E1 = Operating system and network os layer
E2 = RDBMS layer
E3 = application layer

The "data" is bound within E2, and although it is operated
on within E2 (hopefully in accordance with the RM), the
ultimate control for these operations comes from the end user
within an organisation, via the app layer (E3). (GIGO)

However in the real world the data within the RDBMS is
in fact owned by an organisation, not by the RDBMS vendor,
nor the application vendor/developer, nor the RM, and
in reality only has meaningful context for that organisation,
at that instant in space & time. (production data backup)

[An aside: now I can see where the physics thread may
have become self-emergent ;-]

The RM does not reflect the actuality of the above, nor
make any provision for the management of the E3 layer
because it is not yet completely evolved.

The catch-cry "the RM is just as applicable to database
systems today, as it was in the early 1980's" should be
taken as an indication that something is wrong with it as
a pedagogic device for 2004.

The reason for this is that E2 and E3 have changed a lot
since 1980, particularly E2, the RDBMS software. Due
to the emergence of  addressable stored procedures in
the RDBMS, there has been an effective "migration" of
intelligence (code) from E3 to E2.

The boundaries between E2 and E3 are now probably
best described as fractal, whereas in the past they were
sharply demarcated.

Back to your question on the "data". It is physically
anchored by a backup, and theoretically anchored by
the database schema, constraints, etc. from the perspective
of the (incomplete) RM.

However in practice, it is a dynamic fluid element that
must be managed, with the assistance of, but also outside
the realm of the present applicability of the RM.

Change management is the name given to the bag that holds
together everything that falls through the cracks of theory
and out into the world of practice.

Pete Brown
Falls Creek
Oz
Alfredo Novoa - 16 May 2004 12:56 GMT
>There are 3 software environments:
>E1 = Operating system and network os layer
>E2 = RDBMS layer
>E3 = application layer

>The RM does not reflect the actuality of the above, nor
>make any provision for the management of the E3 layer
>because it is not yet completely evolved.

No, the application layer is what must be adapted to the RM and not
the contrary. What is not evolved is the application layer.

>The catch-cry "the RM is just as applicable to database
>systems today, as it was in the early 1980's" should be
>taken as an indication that something is wrong with it as
>a pedagogic device for 2004.

There are many things wrong in the application layer. For instance the
application programming languages.

>The reason for this is that E2 and E3 have changed a lot
>since 1980, particularly E2, the RDBMS software. Due
>to the emergence of addressable stored procedures in
>the RDBMS

But complete RDBMS's still don't exist.

>, there has been an effective "migration" of
>intelligence (code) from E3 to E2.

But not enough, and in recent years we have been seeing a regression: a
migration of business logic from SQL DBMS's to the crappy "Application
Servers".

Regards
 Alfredo
mountain man - 17 May 2004 14:56 GMT
> >There are 3 software environments:
> >E1 = Operating system and network os layer
[quoted text clipped - 7 lines]
> No, the application layer is what must be adapted to the RM and not
> the contrary. What is not evolved is the application layer.

Demonstrated here is the entire application layer contained
in the RDBMS software. Zero apps on clients:
http://www.mountainman.com.au/software/southwind

This uses stored procedures, which are DBMS objects.
These objects have functional relationships to the data
structures and the data structures have an evolving
structure via the objects.  All is heavily inter-related
and unified within the database system.

But the RM in its present state cannot reference this
other-side-of-the-coin object data.  It should be able to
in the future, perhaps.

> >The catch-cry "the RM is just as applicable to database
> >systems today, as it was in the early 1980's" should be
[quoted text clipped - 10 lines]
>
> But complete RDBMS's still don't exist.

Machines using the basic "un-blessed" principles of the RM
have only been around for 25 years.  These are good enough
for me, because they (especially the more recent ones) do
actually incorporate *much* of the basics of the RM.

> >, there has been an effective "migration" of
> >intelligence (code) from E3 to E2.
>
> But not enough,

Then you do agree that there exists (object) "data"
within the SQL DBMS's that is unable to be referenced
by the relational model of "data"?

> and in the last years we are seeing a regression. A
> migration of business logic from SQL DBMS's to the crappy "Application
> Servers".

What do you think are the major elements behind this
migration to these (I actually agree with you here) crappy
"Apps boxes"?  I used to suspect they were "caused by bad
apps".

Pete Brown
Falls Creek
Oz
Alfredo Novoa - 18 May 2004 13:32 GMT
>Machines using the basic "un-blessed" principles of the RM
>have only been around for 25 years.  These are good enough
>for me, because they (especially the more recent ones) do
>actually incorporate *much* of the basics of the RM.

A true RDBMS would be a lot better. Most of the everyday problems of
the database programmers are due to the flaws of the current DBMSs.

>> >, there has been an effective "migration" of
>> >intelligence (code) from E3 to E2.
[quoted text clipped - 4 lines]
>within the SQL DBMS's that is unable to be referenced
>by the relational model of "data"?

No, I mean that most people do not know how to take advantage of the
little that SQL DBMS's offer.

> and in the last years we are seeing a regression. A
>> migration of business logic from SQL DBMS's to the crappy "Application
[quoted text clipped - 4 lines]
>"Apps boxes"?  I used to suspect they were "caused by bad
>apps".

The key elements are ignorance and the flaws of SQL DBMS's

Regards
 Alfredo
mountain man - 19 May 2004 09:41 GMT
> >Machines using the basic "un-blessed" principles of the RM
> >have only been around for 25 years.  These are good enough
> >for me, because they (especially the more recent ones) do
> >actually incorporate *much* of the basics of the RM.
>
> A true RDBMS would be a lot better.

Well where is it?

> Most of the everyday problems of
> the database programmers are due to the flaws of the current DBMSs.

Not if you program in SQL from within the RDBMS.

> >> >, there has been an effective "migration" of
> >> >intelligence (code) from E3 to E2.
[quoted text clipped - 7 lines]
> No, I mean that most people do not know how to take advantage of the
> little that SQL DBMS's offer.

Well, that may certainly be true, but does not relate
to the applicability, or in this instance, the ineffectiveness
of the current RM to address this (object) data.

> > and in the last years we are seeing a regression. A
> >> migration of business logic from SQL DBMS's to the crappy "Application
[quoted text clipped - 6 lines]
>
> The key elements are ignorance and the flaws of SQL DBMS's

Either way, application servers are (usually) a step backwards.
My focus is building suites of application system components
as SQL stored procedures within the (R)DBMS to the extent
that there exists zero components external to the (R)DBMS.

The modern (R)DBMS environment is capable of
"internalising" the entire applications environment.

Pete Brown
Falls Creek
Oz
Alfredo Novoa - 19 May 2004 12:14 GMT
>> A true RDBMS would be a lot better.
>
>Well where is it?

I hope it is in the near future.

>> Most of the everyday problems of
>> the database programmers are due to the flaws of the current DBMSs.
>
>Not if you program in SQL from within the RDBMS.

You suffer the problems especially if you program in SQL from within
the SQL DBMS.

See Date's writings about the SQL flaws.

>Well, that may certainly be true, but does not relate
>to the applicability, or in this instance, the ineffectiveness
>of the current RM to address this (object) data.

The RM supports objects. See The Third Manifesto.

>> The key elements are ignorance and the flaws of SQL DBMS's
>
>Either way, application servers are (usually) a step backwards.

Agreed. They are network DBMS's without a storage engine.

>My focus is building suites of application system components
>as SQL stored procedures within the (R)DBMS to the extent
>that there exists zero components external to the (R)DBMS.

And what is the problem with The Relational Model?

>The modern (R)DBMS environment is capable of
>"internalising" the entire applications environment.

And the future TRDBMS's will do it a lot better.

Regards
 Alfredo
mountain man - 19 May 2004 14:04 GMT
"Alfredo Novoa" <alfredo@ncs.es> wrote :

...[trim]...

> >My focus is building suites of application system components
> >as SQL stored procedures within the (R)DBMS to the extent
> >that there exists zero components external to the (R)DBMS.
>
> And what is the problem with The Relational Model?

It has a Godel-like incompleteness:
http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

Pete Brown
Falls Creek
Oz
Todd B - 19 May 2004 22:37 GMT
> "Alfredo Novoa" <alfredo@ncs.es> wrote :
> >
> > And what is the problem with The Relational Model?
>
> It has a Godel-like incompleteness:
> http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

I'm no mathematician, but didn't Godel prove that 'any' formal system
is incomplete?

Also, the interpretation in the 'real' world of the symbols of any
formal system seems to be pretty much up in the air.

Todd
mountain man - 20 May 2004 03:11 GMT
> > "Alfredo Novoa" <alfredo@ncs.es> wrote :
> > >
> > > And what is the problem with The Relational Model?
> >
> > It has a Godel-like incompleteness:
> > http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

> I'm no mathematician, but didn't Godel prove that 'any' formal system
> is incomplete?

Yes, he did.  But I am being specific about the provision of one specific
instance in which the incompleteness of the RM is comprehensible.

> Also, the interpretation in the 'real' world of the symbols of any
> formal system seems to be pretty much up in the air.

In a database, relational or otherwise, the interpretations are usually
sorted out in advance with respect to the data elements. They are
interpreted with respect to the organisation (IMO).

Pete Brown
Falls Creek
Oz
Paul - 20 May 2004 10:37 GMT
>>>>And what is the problem with The Relational Model?
>>>
>>>It has a Godel-like incompleteness:
>
> http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

I don't quite understand what you mean here. Even if you think that
relational theory is missing something, I don't think it is a
"Godel-like" incompleteness.

>>I'm no mathematician, but didn't Godel prove that 'any' formal system
>>is incomplete?
>
> Yes, he did.  But I am being specific about provision of one specific
> instance in which the incompleteness of the RM is comprehensible.

Well, Godel actually proved that first-order predicate logic (upon which
the relational model is based) is complete in some sense. The
Incompleteness theorem only applies to theories that are above a certain
complexity. To add to the confusion, there are slightly different
meanings of "complete" here. See this page for more details:
http://www.sm.luth.se/~torkel/eget/godel/completeness.html

I think essentially the difference is that you need to use logic to show
that some other theories are incomplete, but to show the completeness of
logic itself you've got a bit of a self-referential paradox. I could
be completely wrong here, though. Very interesting all the same.

Paul.
Anthony W. Youngman - 20 May 2004 22:39 GMT
>>>>>And what is the problem with The Relational Model?
>>>>
[quoted text clipped - 19 lines]
>different meanings of "complete" here. See this page for more details:
>http://www.sm.luth.se/~torkel/eget/godel/completeness.html

And isn't there something about if they are complete, then they also
have to be simplistic (and therefore cannot be real-world accurate)?

>I think essentially the difference is that you need to use logic to
>show that some other theories are incomplete, but to show the
>completeness of  logic itself you've got a bit of a self-referential
>paradox. I could be completely wrong here though. Very interesting though.

Cheers,
Wol
Christopher Browne - 20 May 2004 23:52 GMT
Quoth "Anthony W. Youngman" <wol@thewolery.demon.co.uk>:
> And isn't there something about if they are complete, then they also
> have to be simplistic (and therefore cannot be real-world accurate)?

No, there isn't.  You're just making that up because it would be
convenient to your position.

(reverse (concatenate 'string "moc.enworbbc" "@" "enworbbc"))
http://cbbrowne.com/info/finances.html
Rules of the  Evil Overlord #22. "No matter how tempted  I am with the
prospect  of unlimited  power, I  will  not consume  any energy  field
bigger than my head. <http://www.eviloverlord.com/>

Todd B - 20 May 2004 23:05 GMT
> >>>>And what is the problem with The Relational Model?
> >>>
[quoted text clipped - 5 lines]
> relational theory is missing something, I don't think it is a
> "Godel-like" incompleteness.

I'm not entirely certain, but it seems to me that any logic model that
is consistent (i.e. theorems derived from the axioms do not contradict
the axioms or other theorems so derived) will be unable to find
certain truths within the system.  And that seems to be Godel's sword
in the stone (you know, he's actually not the first to come up with
the idea, but the first to apply it to number theory).  In other
words, pretty much everything is Godel-like, unless you adopt an
informal system, but then when you do that, you lose the power of
logic altogether.

> >>I'm no mathematician, but didn't Godel prove that 'any' formal system
> >>is incomplete?
[quoted text clipped - 8 lines]
> meanings of "complete" here. See this page for more details:
> http://www.sm.luth.se/~torkel/eget/godel/completeness.html

Good short article that touches on some key points of the theorem and
its implications.  But seriously, I'm a bit over my head here, since
my only source on Godel is the book "Godel, Escher, Bach: an Eternal
Golden Braid".  I haven't read the actual Incompleteness proof.

Todd
mountain man - 27 May 2004 04:11 GMT
> > >>>>And what is the problem with The Relational Model?
> > >>>
> > >>>It has a Godel-like incompleteness:
> > >>>http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

> > I don't quite understand what you mean here. Even if you think that
> > relational theory is missing something, I don't think it is a
[quoted text clipped - 9 lines]
> informal system, but then when you do that, you lose the power of
> logic altogether.

Not necessarily.  Deduction goes out the window, true,
but inference is still as valid as ever.  The measure of the
power of inference over the power of deduction is a
tricky subject area, for sure.

...[trim]...

Pete Brown
Falls Creek
Oz
Paul - 27 May 2004 10:42 GMT
>>I'm not entirely certain, but it seems to me that any logic model that
>>is consistent (i.e. theorems derived from the axioms do not contradict
[quoted text clipped - 10 lines]
> power of inference over the power of deduction is a
> tricky subject area, for sure.

What's the difference between inference and deduction?
Are they not the same thing?

Paul.
Tony - 20 May 2004 14:11 GMT
> > "mountain man" <hobbit@southern_seaweed.com.op> wrote in message
>  news:<LXIqc.47648$TT.3115@news-server.bigpond.net.au>...
[quoted text clipped - 12 lines]
> instance
> in which the incompleteness of the RM is comprehensible.

You may consider that the RM is incomplete, but it is NOT a
"Godel-like" incompleteness: you are just attaching a fancy-sounding
but irrelevant label to your claim.  It is like describing any kind of
uncertainty as "Heisenberg-like" or any kind of cat as
"Schrodinger-like"!
Eric Kaun - 19 May 2004 22:20 GMT
> There are 3 software environments:
> E1 = Operating system and network os layer
[quoted text clipped - 11 lines]
> make any provision for the management of the E3 layer
> because it is not yet completely evolved.

I disagree, although your use of the term "management" is questionable. E1
provides services for E2; E2 does an analogous thing for E3. E2 provides E3
with data (whatever it means) and inferences about that data; that's its
job.

> The catch-cry "the RM is just as applicable to database
> systems today, as it was in the early 1980's" should be
> taken as an indication that something is wrong with it as
> a pedagogic device for 2004.

It's more than a pedagogic device, but it's at least that.

> The reason for this is that E2 and E3 have changed a lot
> since 1980, particularly E2, the RDBMS software. Due
> to the emergence of addressable stored procedures in
> the RDBMS, there has been an effective "migration" of
> intelligence (code) from E3 to E2.

You contradict yourself here, as the migration doesn't mean that the
previous services supplied by E2 are any more or less different. The code is
running in a different place. Date's Intro to DB Systems book discusses
various "levels" of the DB schemata - I forget the exact terms he uses, and
I don't have the book here. So yes, there is a distinction between "shared"
and "app-specific" components, whether they're running in the DBMS or not.

But that doesn't alter the fact that the code "objects" (E2b) in the DBMS
are different from the relational "objects" (E2a) in the DBMS. There's still
a logical separation, with E2b relying on E2a just like E2 relies on E1, and
E3 on E2. Capiche?

I'm still curious what sorts of concepts (other than the vacuous term
"management") you think the relational model should include for such things.

> The boundaries between E2 and E3 are now probably
> best described as fractal, whereas in the past they were
> heavily demarked.

They're hardly fractal, and I would say that the layering is more severe in
the E2b category. E2a remains solidly relational.

> However in practice, it is a dynamic fluid element that
> must be managed, with the assistance of, but also outside
[quoted text clipped - 3 lines]
> together everything that falls through the cracks of theory
> and out into the world of practice.

I think I'm starting to agree with you on this, but still don't think that's
the province of relational - at least not yet. I would like to see
discussions on a standard system catalog - in essence, relational statements
about relvars! Since the catalog is a set of relvars like the others, yet
describes those others, we now have true relational metadata, an interesting
topic...
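
A minimal sketch of the "catalog as relvars" idea, assuming SQLite purely as a convenient stand-in (it is not one of the systems discussed in this thread, and its `sqlite_master` catalog is vendor-specific, not the standard catalog wished for above). The table and trigger names are hypothetical. It shows only that the catalog can be queried with the same relational operations as ordinary tables, and that a code object (here a trigger) appears in it as a row:

```python
import sqlite3

# In-memory database; all object names below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE audit_log (employee_id INTEGER, note TEXT);
    CREATE TRIGGER log_new_employee AFTER INSERT ON employee
    BEGIN
        INSERT INTO audit_log VALUES (NEW.id, 'created');
    END;
""")

# The catalog is itself a table, so an ordinary SELECT describes the schema,
# including the trigger, which is a piece of code stored in the DBMS.
catalog_rows = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()

for row in catalog_rows:
    print(row)
```

Whether this kind of catalog row counts as the RM "addressing" code objects is exactly what the posters disagree about; the sketch does not settle that.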

- erk
Dawn M. Wolthuis - 20 May 2004 01:20 GMT
<snip>
> Change management is the name given to the bag that holds
> > together everything that falls through the cracks of theory
> > and out into the world of practice.

I think we agree on the problem being a rather narrow definition of
databases or a vertical partitioning of the software development problem
that isn't necessarily the best.  Your solution to place everything "in the
database" (if I understand you correctly) is fine from my perspective, but
not the one I lean toward naturally -- I'd prefer an OO language to a
declarative one, particularly a vendor-specific declarative language that is
difficult to convert from one db to another.

> I think I'm starting to agree with you on this, but still don't think that's
> the province of relational - at least not yet. I would like to see
> discussions on a standard system catalog - in essence, relational statements
> about relvars! Since the catalog is a set of relvars like the others, yet
> describes those others, we now have true relational metadata, an interesting
> topic...

Agreed that the system catalog would be a good place to make some industry
gains.  For metadata such as "keywords" I'd think that employing those
nested relations might give SQL-DBMS's or RDBMS's a place to start getting
their operators shaped up for relation-valued attributes (or whatever one
wants to call them) -- just a thought.  --dawn
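
A rough sketch of what a relation-valued attribute for catalog "keywords" might look like, using plain Python sets as stand-ins for relations. The table names and keywords are hypothetical, and `group`/`ungroup` are only loosely modelled on the operators of the same name in Date and Darwen's writing, not an implementation of any DBMS feature:

```python
# Each tuple pairs a table name with a relation-valued attribute:
# a (frozen) set of keywords describing that table.
catalog = {
    ("employee", frozenset({"hr", "people"})),
    ("invoice", frozenset({"finance"})),
}

def ungroup(relation):
    """Flatten the nested keyword sets into a plain (name, keyword) relation."""
    return {(name, kw) for (name, kws) in relation for kw in kws}

def group(flat):
    """Inverse of ungroup: collect the keywords for each name into a nested set."""
    names = {name for (name, _) in flat}
    return {(n, frozenset(kw for (m, kw) in flat if m == n)) for n in names}

flat = ungroup(catalog)
regrouped = group(flat)
```

In this small case `group(ungroup(catalog))` gives back the original relation; a name with an empty keyword set would be lost by `ungroup`, which is one of the details a real operator definition has to pin down.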
Eric Kaun - 21 May 2004 14:42 GMT
> <snip>
>  > Change management is the name given to the bag that holds
[quoted text clipped - 4 lines]
> databases or a vertical partitioning of the software development problem
> that isn't necessarily the best.

Yes, I think so - my preference is to push things that have to work together
from a single source, via a template-driven code generation approach (in
lieu of higher-level languages, though code generation amounts to the same).

> Your solution to place everything "in the
> database" (if I understand you correctly) is fine from my perspective, but
> not the one I lean toward naturally -- I'd prefer an OO language to a
> declarative one,

Why not both? Something like Tutorial D, which offers both OO capabilities
for domains, plus relations? In any event, we'll have to agree to disagree on
the value of declarative... I'll simply point to the explosion of Java
frameworks (Struts, Avalon, blah blah blah) and even language extensions
(e.g. AspectJ). The "config files" are in most cases declarative statements
of system structure, and in the case of aspects, are "cross-cutting"
concerns which indicate the failings of the OO language (and I'd argue the
OO paradigm itself). Languages like Lisp and Haskell don't require such
"cross-cutting" because the languages themselves support abstractions
orthogonal to objects. More than OO is needed, I think.

> particularly a vendor-specific declarative language that is
> difficult to convert from one db to another.

Couldn't agree more - a vendor-specific language is worthless. Witness even
the various proprietary "extensions" to SQL...

> Agreed that the system catalog would be a good place to make some industry
> gains.  For metadata such as "keywords" I'd think that employing those
> nested relations might give SQL-DBMS's or RDBMS's a place to start getting
> their operators shaped up for relation-valued attributes (or whatever one
> wants to call them) -- just a thought.  --dawn

Interesting - certainly relation-valued attributes would require decent
relational operations. Or at least one would think. Industry might always,
in its infinite wisdom, decide it knows better.

- erk
mountain man - 20 May 2004 03:11 GMT
> > There are 3 software environments:
> > E1 = Operating system and network os layer
[quoted text clipped - 16 lines]
> with data (whatever it means) and inferences about that data; that's its
> job.

The term management reflects the mandatory overview of all
components in the system, and their coordination.  As I have
outlined, I have constructed an arrangement whereby all of E3
has been subsumed in the form of stored procedures, within the
E2 environment.

The Relational Model and theory cannot distinguish this specific
arrangement from any other, because it disregards E3 (application
layer) because of its traditional frame of reference, which in
historical terms is understandable, but is not so important in
today's world.

The specific arrangement developed, however, is complete,
and requires no other support to function.  So you see, we
may have a large number of stored procedures which act
as E3 components, each written in SQL, each syntactically
as per Date's exemplary treatment, each relating precisely
and specifically to known data structures defined in the RM.

Yet the model can say nothing in its present state. This is an
absurd state of affairs for database systems management.

> > The catch-cry "the RM is just as applicable to database
> > systems today, as it was in the early 1980's" should be
> > taken as an indication that something is wrong with it as
> > a pedagogic device for 2004.
>
> It's more than a pedagogic device, but it's at least that.

It is an incomplete device.

> > The reason for this is that E2 and E3 have changed a lot
> > since 1980, particularly E2, the RDBMS software. Due
[quoted text clipped - 8 lines]
> I don't have the book here. So yes, there is a distinction between "shared"
> and "app-specific" components, whether they're running in the DBMS or not.

Date has one diagram and a few words to say about the apps environment.
Sure, in my argument, as you note, the code is running in a different place.
But which place?  Inside the RDBMS software environment?

Date and the RM are not capable of uttering anything sensible about
this state of affairs.  The RM cannot address stored procedure object
data, end of story.

> But that doesn't alter the fact that the code "objects" (E2b) in the DBMS
> are different from the relational "objects" (E2a) in the DBMS. There's still
[quoted text clipped - 3 lines]
> I'm still curious what sorts of concepts (other than the vacuous term
> "management") you think the relational model should include for such things.

I understand the inter-dependencies, but you seem not to understand
the term management.  This term means the ability, or lack thereof, to
properly look after everything in that environment.

> > The boundaries between E2 and E3 are now probably
> > best described as fractal, whereas in the past they were
> > heavily demarked.
>
> They're hardly fractal, and I would say that the layering is more severe in
> the E2b category. E2a remains solidly relational.

Look up the term "fractal basin boundary".

> > However in practice, it is a dynamic fluid element that
> > must be managed, with the assistance of, but also outside
[quoted text clipped - 10 lines]
> describes those others, we now have true relational metadata, an interesting
> topic...

The relational model was an ideal of Codd and the pioneers that
has been promulgated by Date et al.   It is at least 30 y/o and is
not consistent with technological reality.

It has a Godel-like incompleteness:
http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

Pete Brown
Falls Creek
Oz
Eric Kaun - 21 May 2004 15:00 GMT
> > I disagree, although your use of the term "management" is questionable. E1
> > provides services for E2; E2 does an analogous thing for E3. E2 provides
[quoted text clipped - 4 lines]
> The term management reflects the mandatory overview of all
> components in the system, and their coordination.

The term "management" has at least the same "Goedel-like" incompleteness
that you refer to elsewhere, unless you really believe "overview" and
"coordination" are precise. My point is simply that I don't know which
aspects of "management" you're referring to - it has many definitions,
components, and dimensions. Be precise.

However, I do have your papers printed out, and will be reading them
shortly - so far I'm only judging by what you've written in these posts.

> As I have
> outlined, I have constructed an arrangement whereby all of E3
> has been subsumed in the form of stored procedures, within the
> E2 environment.

The fact that they're executing in E2 doesn't imply "subsumption." They are
still "objects" of a very different sort than those E2 traditionally
"manages." For instance, those stored procedures could be written in
arbitrary languages, and executed anywhere. It seems you see their value in
their genericity, rather than in where they happen to execute.

> The Relational Model and theory cannot distinguish this specific
> arrangement from any other, because it disregards E3 (application
> layer) because of its traditional frame of reference, which in
> historical terms is understandable, but is not so important in
> today's world.

So why does E1 disregard E2? Don't you think that's short-sighted and
incomplete? How would you rectify that? Shouldn't they all be subsumed into
E* ("*" as in transitive closure) ?

> This specific arrangement developed however is complete,

In a Goedelian sense? Doubtful.

> and requires no other support to function.

Not even E1?

> So you see, we
> may have a large number of stored procedures which act
> as E3 components, each written in SQL, each syntactically
> as per Date's exemplary treatment, each relating precisely
> and specifically to known data structures defined in the RM.

That's no different than any other E3 component, is it? Like a Java program
doing JDBC and issuing SQL Strings in exchange for ResultSets?

> Yet the model can say nothing in its present state. This is an
> absurd state of affairs for database systems management.

So once E3 components are "subsumed" in E2, what more can be said about
them? Specifically, what special properties or powers or whatever do they
derive from executing in E2 rather than being regarded as part of E3?

> Date has one diagram and a few words to say about the apps environment.
> Sure, in my argument, as you note, the code is running in a different place.
> But which place?  Inside the RDBMS software environment?

Could be - why does it matter? They're still code components, not relvars.
What difference does it make where they run? I agree with it, don't get me
wrong - but I think then you just happen to have E3 components running in
E2, which indicates to me that your operational levels are perhaps
orthogonal to the real issue of what "types" of components are being
"managed".

> Date and the RM are not capable of uttering anything sensible about
> this state of affairs.

What should they say? In contrast to their muteness, say something -
anything. I have no idea what you're looking to be said.

> The RM cannot address stored procedure object data, end of story.

So say something, even informally, that should be part of some "RM++"
theory. I have no idea what you're hinting at - this argument reminds me of
internal auditors at a former company, who could point out things done
incorrectly but were not allowed to say a word about what SHOULD be done
instead.

> > I'm still curious what sorts of concepts (other than the vacuous term
> > "management") you think the relational model should include for such
> things.
>
> I understand the inter-dependencies, but you seem not to understand
> the term management.

Perhaps, but you're not helping. If you understand it thoroughly, then your
words aren't helping the rest of us... granted that this isn't a
"management" class, but some clarification would help.

> This term means the ability, or lack thereof, to
> properly look after everything in that environment.

"Properly look after"? That's clarification?

> > > However in practice, it is a dynamic fluid element that
> > > must be managed, with the assistance of, but also outside
[quoted text clipped - 17 lines]
> has been promulgated by Date et al.   It is at least 30 y/o and is
> not consistent with technological reality.

> It has a Godel-like incompleteness:
> http://www.mountainman.com.au/software/history/relational_model_incomplete.htm

And your theory is Goedel-complete? Doubtful. Let's agree to stop waving
Goedel and Occam about, and concentrate on specific areas of incompleteness
that matter in both theory and practice...

- erk
Todd B - 21 May 2004 18:52 GMT
> And your theory is Goedel-complete? Doubtful. Let's agree to stop waving
> Goedel and Occam about, and concentrate on specific areas of incompleteness
> that matter in both theory and practice...
>
> - erk

Well said.

In a way, however, Godel's theorem is pertinent because it touches on
the fact that a database, no matter what its design or underlying
structure is, will 'definitely' not be able to answer every question
we want to ask it.  Not that I'm being doomsday about logic :)  I just
think there can be a source of frustration in trying to answer a
corporation's questions, and the culprit may not always be the
database choice or database design, but may be that the question is
simply unanswerable (although I have to admit, this has never actually
happened to me, so take this with a grain of salt).  It's something to
think about, though.

Todd
Paul - 21 May 2004 20:08 GMT
> In a way, however, Godel's theorem is pertinent because it touches on
> the fact that a database, no matter what its design is or underlaying
> structure is, will 'definitely' not be able to answer every question
> we want to ask it.

Are you certain this is true?

As I understand it:
1) Godel's Incompleteness theorem only applies to systems that are
powerful enough to model arithmetic.
2) It's impossible to model arithmetic using only first-order logic.
3) Relational theory (which basically *is* first-order logic) is
actually both complete and consistent.

I'm not a professional logician though, and I know Godel's results are
very open to misinterpretation, so I could well be wrong. I guess it
depends on the exact definitions of "model", "theory", "system", "logic"
etc., and what exactly we mean by "complete" and "consistent".

Also, does it actually matter? Because for example suppose I'm right and
relational theory is complete, there are still questions like the
transitive closure which can't be answered. That's because these
questions can't even be written down in first order logic so they are
meaningless within the system (so the system is still complete). But
they are meaningful in a "real-world" sense, because we are thinking in
a larger system which includes second-order logic.
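
A small sketch of the transitive-closure point, with Python sets standing in for relations and a hypothetical `reports_to` relation. Any fixed expression built from joins reaches only chains of bounded length, so a loop (or recursion) outside that fixed expression is needed to reach a fixpoint; this illustrates the idea under those assumptions and is not a proof of inexpressibility:

```python
def compose(r, s):
    """Join r and s where r's second attribute equals s's first, keep the ends."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def transitive_closure(r):
    """Repeat the join step until no new pairs appear (a fixpoint)."""
    closure = set(r)
    while True:
        step = closure | compose(closure, r)
        if step == closure:
            return closure
        closure = step

# Hypothetical data: a chain of three direct reporting links.
reports_to = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}

# A single join only finds chains of length two, so ("ann", "dan") is missed.
one_join = reports_to | compose(reports_to, reports_to)

# Iterating to a fixpoint finds every indirect link, whatever the chain length.
closure = transitive_closure(reports_to)
```

Adding more joins to the fixed expression handles longer chains, but some longer chain always escapes it, which is why the loop cannot be replaced by one query of fixed size.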

I suppose at least we would know that in theory, every query that it is
possible to formulate in some given relational query language can be
answered.

Paul.
Todd B - 23 May 2004 23:22 GMT
> > In a way, however, Godel's theorem is pertinent because it touches on
> > the fact that a database, no matter what its design is or underlying
[quoted text clipped - 9 lines]
> 3) Relational theory (which basically *is* first-order logic) is
> actually both complete and consistent.

To be honest, I don't know.  I'll do some reading and certainly
revisit this topic in this group (regardless of whether it bothers the
other readers or not) after some good research.

> Also, does it actually matter? Because for example suppose I'm right and
> relational theory is complete, there are still questions like the
[quoted text clipped - 3 lines]
> they are meaningful in a "real-world" sense, because we are thinking in
> a larger system which includes second-order logic.

Good point.

> I suppose at least we would know that in theory, every query that it is
> possible to formulate in some given relational query language can be
> answered.

Can you give me an example of where there is proof of first order
logic being complete?  Keep in mind I'm sticking to the definition of
complete as 'things that we prove true within the system are also true
in the reality which we use the system to describe'.  Is first order
logic 'consistent'?  Well, of course it is; it's kind of a
requirement.  Is it 'complete', though?  I don't think so, but please
prove me wrong or point me to some articles that do.

So, in summary, this last thing you say about every query being
answerable is, IMO, 'incompletely' untrue :)

Perhaps there is a query that one could conjure in their head, but
would be impossible to write down absolutely?  I hate to do this, but
I'm going to drop back into a classic example of unprovability
because I'm lazy.  Prove to me, without brute force methods, that the
number 2481997 is prime.  (Don't try, because it's not.)  The point,
for me anyway, is that - okay, maybe from a more optimistic
perspective - we have the ability to come up with questions that any
logic system will fall short in answering.
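
For what it's worth, the parenthetical is easy to check mechanically. The sketch below uses plain trial division, i.e. exactly the brute force the challenge rules out, so it says nothing about provability; it only confirms that the number is composite. The function name is made up for illustration.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    # A composite n always has a factor no larger than its square root.
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


n = 2481997
factor = smallest_factor(n)
print(n, "=", factor, "x", n // factor)  # 2481997 = 7 x 354571
```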

Todd
Bill H - 24 May 2004 17:09 GMT
Todd:

Does this pass the "reasonableness" test?  The thought that: ...there are
questions that can't be answered so they're meaningless and, thus, ignored
(so the system is still complete) doesn't say much for consistency (i.e.
anything that shows inconsistency is ignored so we still have consistency).

With postulates like these, I'm depressed about getting A's in college logic
and statistics classes, as they were obviously worthless.  :-)

Bill

"Todd B" <toddkennethbenson@yahoo.com> wrote in message
[snipped]

> > Also, does it actually matter? Because for example suppose I'm right and
> > relational theory is complete, there are still questions like the
[quoted text clipped - 3 lines]
> > they are meaningful in a "real-world" sense, because we are thinking in
> > a larger system which includes second-order logic.
Paul - 24 May 2004 19:02 GMT
> Does this pass the "reasonableness" test?  The thought that: ...there are
> questions that can't be answered so they're meaningless and, thus, ignored
> (so the system is still complete) doesn't say much for consistency (i.e.
> anything that shows inconsistency is ignored so we still have consistency).

The point is that these questions can't even be *asked* in the system.

The system can still be internally complete and consistent though.

To talk about statements in a language we always need a meta-language.
It can be that questions can be posed in the meta-language that can't in
the language itself.

Suppose you have a "theory", e.g. field theory, with its various axioms.
Then you can have various "models" that are kind of examples of this
theory, for example the real numbers, complex numbers, etc.

Now what the Completeness Theorem says is that if something is true in
every model of a given theory, it will be possible to prove it in the
theory itself (using first order logic). So in other words if you start
from your axioms and apply first order logic to them, it's possible for
you to extract every possible true statement of your theory.

So I guess the applicability of databases here is that your relations
are the axioms of your "theory". Your real-world interpretations of
those relations are your "models" of the theory. And the Completeness
Theorem assures you that everything you expect to be true in the real
world will in fact be provable by the DBMS.

For example suppose I have a database containing the tuple:
(1, 2, 'blue')

There could be many interpretations of what this means: for example
"customer #1 bought 2 blue widgets"
"on day 1 of the study we saw 2 blue cars"

But in any of these models if I look for distinct values of the third
argument of my predicate (i.e. project on the "third" column) I expect
to get the answer ('blue').

So my query language (which is really just first order logic) is
guaranteed to give the right answer when I do:
"SELECT DISTINCT colour FROM r"

And this same argument holds even for more complicated queries.
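
A minimal runnable sketch of that query, using SQLite from Python purely for convenience. The first two column names (a and b) are assumptions, since the post only names the third column; the table name r and the column colour come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names a and b are hypothetical; only "colour" appears in the post.
conn.execute("CREATE TABLE r (a INTEGER, b INTEGER, colour TEXT)")
conn.execute("INSERT INTO r VALUES (1, 2, 'blue')")

# The answer is the same whether the row means "customer #1 bought 2 blue
# widgets" or "on day 1 we saw 2 blue cars": the query never sees the meaning.
rows = conn.execute("SELECT DISTINCT colour FROM r").fetchall()
print(rows)  # [('blue',)]
```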

The interesting thing is that if you go up into second-order logic,
there is no corresponding completeness theorem. So you may either have
things that are true in all interpretations of a theory but you can't
prove them in the theory itself, or you'd have things you can prove in
the theory but which aren't true in some model of the theory.

So maybe Codd was wise to stick with first-order logic!

Paul.
Anthony W. Youngman - 25 May 2004 18:12 GMT
>Suppose you have a "theory", e.g. field theory, with its various
>axioms. Then you can have various "models" that are kind of examples of
[quoted text clipped - 11 lines]
>Theorem assures you that everything you expect to be true in the real
>world will in fact be provable by the DBMS.

And if they turn out to be false in the real world and provable in the
DBMS, then the DBMS theory is wrong ... (or the DBMS predicts something
is false when it turns out to be true ...)

Or if you can't prove it in the DBMS, then the theory is incomplete ...

Cheers,
Wol
Paul - 26 May 2004 10:51 GMT
>> So I guess the applicability of databases here is that your relations
>> are the axioms of your "theory". Your real-world interpretations of
[quoted text clipped - 5 lines]
> DBMS, then the DBMS theory is wrong ... (or the DBMS predicts something
> is false when it turns out to be true ...)

Well, the Completeness Theorem has a converse called the Soundness
Theorem (http://en.wikipedia.org/wiki/Soundness_theorem), which assures
us that first order logic is consistent. i.e. everything that you can
prove in the DBMS is true in real life. This was known long before the
Completeness Theorem I think, and is easier to prove.

> Or if you can't prove it in the DBMS, then the theory is incomplete ...

The Completeness Theorem proves the "complete" part. i.e. everything
that is true in all models or interpretations of the database will be
provable by the DBMS.

Note that Godel's Incompleteness Theorem is something slightly
different. That's really talking about the completeness of theories that
just happen to be manipulated with first order logic. The Completeness
Theorem is talking about the completeness of first-order logic itself.
So in the first instance you could say first order logic is being a
meta-language, but in the second instance it is just being a language.

Paul.
x - 26 May 2004 13:43 GMT
> >> So I guess the applicability of databases here is that your relations
> >> are the axioms of your "theory". Your real-world interpretations of
[quoted text clipped - 17 lines]
> that is true in all models or interpretations of the database will be
> provable by the DBMS.

Is something that is true in only one model provable by the DBMS?
What does this "all models" thing have to do with databases?
Wouldn't just one model be enough?
Paul - 26 May 2004 14:43 GMT
>>The Completeness Theorem proves the "complete" part. i.e. everything
>>that is true in all models or interpretations of the database will be
[quoted text clipped - 3 lines]
> What does this "all models" thing have to do with databases?
> Wouldn't just one model be enough?

No, it'd have to be all models, because the DBMS can only prove things
that are true under all circumstances, or in the most general case.

Suppose for example I have the following tuples in a relation:

('Alan', 'Bill')
('Bill', 'Chas')

Now in one model, this might mean:
Alan is an ancestor of Bill.
Bill is an ancestor of Chas.

So in this model, the tuple ('Alan', 'Chas') could also be legitimately
added to this relation, i.e. the proposition 'Alan is an ancestor of
Chas' is true.

Similarly if it means "is a brother of".

But consider the model where it means:
Alan is a friend of Bill.
Bill is a friend of Chas.

Then it doesn't follow that Alan is a friend of Chas. It could easily be
that Alan hates Chas.

So the DBMS shouldn't be able to prove that ('Alan', 'Chas') is a
legitimate tuple for that relation, because the DBMS has no idea what
model is being used to interpret the database. And there's no way it
could have an idea either.
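
A small sketch of the same point, treating the relation as a Python set of tuples. The helper composed_pairs is a made-up name; it stands for the transitivity rule that the "ancestor" reading supplies and the "friend" reading does not. The stored data alone never contains the extra tuple; it only appears once someone adds the rule from outside.

```python
# The relation exactly as stored: two tuples, no interpretation attached.
relation = {("Alan", "Bill"), ("Bill", "Chas")}


def composed_pairs(rel):
    """Pairs (x, z) such that (x, y) and (y, z) are both in rel."""
    return {(x, z) for (x, y1) in rel for (y2, z) in rel if y1 == y2}


# From the stored tuples alone, ('Alan', 'Chas') is not derivable.
stored = ("Alan", "Chas") in relation

# Only if the interpretation licenses transitivity ("ancestor of") may the
# composed pairs be added; under "friend of" this step would be unjustified.
with_transitivity = relation | composed_pairs(relation)

print(stored)                                  # False
print(("Alan", "Chas") in with_transitivity)   # True
```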

I guess what it is really saying is that the model is larger than the
theory, in the sense that it has concepts external to the theory. The
theory can only prove things that are common to all models based on the
theory (and the Completeness Theorem says it can *always* do this).

I'm not an expert though, so it's quite possible I've either
misunderstood the theorem or misapplied it - please correct me if you
think this is the case.

Paul.
Anthony W. Youngman - 26 May 2004 23:47 GMT
>>> So I guess the applicability of databases here is that your
>>>relations are the axioms of your "theory". Your real-world
[quoted text clipped - 10 lines]
>prove in the DBMS is true in real life. This was known long before the
>Completeness Theorem I think, and is easier to prove.

So if you use Newtonian Mechanics to prove where Mercury was 400 years
ago, your proof is more accurate than Tycho Brahe's observations - which
place it somewhere else?

You are making exactly the mistake that made me start this thread - you
are assuming that the DBMS *defines* reality, rather than carrying out
experiments to show that the DBMS accurately *describes* reality.

What you should have said is "IF the dbms is an accurate model of real
life then ...". Which is basically what I said - if the dbms and real
life disagree then the dbms model must be wrong. You seem to be saying
that it's reality that's wrong ...

The problem I have is that the mathematicians seem to have taken C&D's
idea of "data" and built this wonderful theory on top of it.
Unfortunately, what they have not done is to define "data" in real-world
terms (rather than mathematical), and as such there is no way we can go
from a "proof within the model" to a formal description of the reality
that that proof represents. So you can come up with all the proofs you
like within the dbms, but you cannot show that the equivalent real-life
scenario is true because you cannot describe that scenario accurately.
So by definition the theory is unscientific because you cannot show that
the dbms proof is true (or false) in real life.

Cheers,
Wol
Paul - 27 May 2004 11:16 GMT
> So if you use Newtonian Mechanics to prove where Mercury was 400 years
> ago, your proof is more accurate than Tycho Brahe's observations - which
> place it somewhere else?

The proof will still be 100% accurate.
Newtonian Dynamics assumes certain axioms, which we now know to be
slightly wrong. The first-order logic is still perfectly accurate; it's
just your starting assumptions have changed.

> You are making exactly the mistake that made me start this thread - you
> are assuming that the DBMS *defines* reality, rather than carrying out
[quoted text clipped - 4 lines]
> life disagree then the dbms model must be wrong. You seem to be saying
> that it's reality that's wrong ...

I'm just talking about the system of logic that enables us to talk about
our database (our "theory" if you like). Whether our theory has axioms
that correspond to the real world, or whether our interpretation (or
"model") of our theory is accurate, is a totally different question.

> The problem I have is that the mathematicians seem to have taken C&D's
> idea of "data" and built this wonderful theory on top of it.
[quoted text clipped - 4 lines]
> like within the dbms, but you cannot show that the equivalent real-life
> scenario is true because you cannot describe that scenario accurately.

What I'm saying isn't really relying on DBMSs at all, it's just pure
logic. A DBMS is just an example of a system that uses it. We have
several layers:

1. First-order logic itself (our meta-language)
2. Our theory (all the relations and tuples in the database, our axioms)
3. Our model (how we interpret our theory in the real world)

All I'm saying is that we know that part 1 is guaranteed to be complete
and consistent. Parts 2 & 3 can be totally wrong, which is when your
database will give answers that diverge from reality.

> So by definition the theory is unscientific because you cannot show that
> the dbms proof is true (or false) in real life.

Given that your axioms and your interpretation are correct, then I think
you can show the DBMS proof is true in real life (for the reasons given
above and in previous posts).

I know that the language used by logicians can seem very impenetrable
but I think it does actually make sense; it's not just a conspiracy of
people talking gibberish and pretending to understand each other.

I don't know how much you've read about logic but it is very
mathematical and well worth the steep learning curve. Wikipedia is a
good place to start. Be warned though: logicians do have a tendency to
go insane in later life; it is a serious brainfuck if you think about it
too much!

Paul.
Dawn M. Wolthuis - 27 May 2004 12:11 GMT
> > So if you use Newtonian Mechanics to prove where Mercury was 400 years
> > ago, your proof is more accurate than Tycho Brahe's observations - which
[quoted text clipped - 3 lines]
> Newtonian Dynamics assumes certain axioms, which we now know to be
> slightly wrong.

If talking about mathematical axioms, they are not right or wrong -- they
just are.  It is the use of those axioms in some setting or another that
could be inappropriate, not useful, or lead one to draw incorrect
conclusions due to applying a poor mathematical analogy (metaphor) to the
situation.

> The first-order logic is still perfectly accurate; it's
> just your starting assumptions have changed.

So the mathematics is right, but the science is wrong -- and I think that is
a major point of this thread.

> > You are making exactly the mistake that made me start this thread - you
> > are assuming that the DBMS *defines* reality, rather than carrying out
[quoted text clipped - 9 lines]
> that correspond to the real world, or whether our interpretation (or
> "model") of our theory is accurate, is a totally different question.

Exactly -- so I think you and Wol (and I) are in agreement on that.  It is
why whenever anyone suggests that the best way to set up a database is by
employing relational theory BECAUSE relational theory is based on
mathematics, I laugh (then cry).  I have an appreciation of what mathematics
is and what it isn't.  How do we determine whether a mathematical model is a
good metaphor for what we are doing?  We have to step outside of mathematics
to do that.  So, the proof that various aspects of relational theory have
been good for use with DBMS's is not within mathematics.

> > The problem I have is that the mathematicians seem to have taken C&D's
> > idea of "data" and built this wonderful theory on top of it.
[quoted text clipped - 16 lines]
> and consistent. Parts 2 & 3 can be totally wrong, which is when your
> database will give answers that diverge from reality.

Additionally, the metaphor we choose might limit us so that what we say is
true, but not the whole story.  And another possibility is that our metaphor
is useful and provides accurate answers, but does so in a clumsy fashion so
as to cost more than it needs to. The cost of one metaphor might be higher
than another because the human brain or people in a particular culture might
find one metaphor easier to grasp.  If I tell a person on the street that I
have data in a relation (using a mathematical metaphor), that might not be
as good as telling them I have data in a folder (a non-mathematical
metaphor), for example.

[Slight digression: If we could find the 1st-order predicate logic behind the
"folder" metaphor (ah ha -- how 'bout a function?) we could make some
progress perhaps?]

> > So by definition the theory is unscientific because you cannot show that
> > the dbms proof is true (or false) in real life.
>
> Given that your axioms and your interpretation are correct, then I think
> you can show the DBMS proof is true in real life (for the reasons given
> above and in previous posts).

And how do you show that your interpretation is correct -- by not showing it
to be incorrect, by showing many cases where it is correct?  I think that is
central to this discussion.  I'm about to read the book someone mentioned,
"Data and Reality," and perhaps that will shed some more light on that
question.

Summarizing -- three questions:
1) (How) can we prove that our mathematical model (e.g. relational theory)
aligns with what we are applying it to (e.g. databases)?  I think we can
only disprove it or fail to disprove it.

2) Are we missing some important aspects of databases (e.g. mountain man's
concerns) if we limit ourselves to a single mathematical metaphor (e.g. to
what relational theory can tell us, or can tell us today)?

3) Are we applying the best, most effective, most efficient, etc metaphor or
is there something better to either supplement or replace it?

--dawn
<snip>
Paul - 28 May 2004 15:36 GMT
>> Newtonian Dynamics assumes certain axioms, which we now know to be
>> slightly wrong.
[quoted text clipped - 4 lines]
>  incorrect conclusions due to applying a poor mathematical analogy
> (metaphor) to the situation.

Well, OK, when I say the axioms are wrong I mean that the axioms don't
quite give a theory on which we can base an accurate model of reality.
(Though they may be good enough for an approximate model of reality).

> So the mathematics is right, but the science is wrong -- and I think
>  that is a major point of this thread.

My point is that the DBMS is only concerned with the mathematical part, and
theory proves that it does it perfectly. The science part is beyond the
scope of the DBMS - making sure that is OK is up to the database users.

>> I'm just talking about the system of logic that enables us to talk
>>  about our database (our "theory" if you like). Whether our theory
[quoted text clipped - 6 lines]
> databases is by employing relational theory BECAUSE relational theory
>  is based on mathematics, I laugh (then cry).

Why? This seems like a reasonable statement. Suppose for example we
based our DBMS on second-order logic. Then theory tells us we will have
incompleteness (ignoring the fact that databases are finite!). So this
would tell us that the mathematical part of the DBMS is on shaky ground.
As it happens that DBMSs use first-order logic, we know it is rock-solid
because of Godel's Completeness Theorem. That seems very reassuring to
me. Maybe this point seems so obvious that people just take it for
granted - they don't even realise that there is something to be proved
in the first place.

Now it may well be that the "multivalue" database model also just uses
first-order logic presented in a slightly obfuscated way, in which case
you'd have the peace of mind for that as well.

> I have an appreciation of what mathematics is and what it isn't.  How
> do we determine whether a mathematical model is a good metaphor for
> what we are doing?  We have to step outside of mathematics to do
> that.  So, the proof that various aspects of relational theory have
> been good for use with DBMS's is not within mathematics.

The proof of the usefulness of the mathematical part of DBMSs is
definitely within mathematics. But as you say, deciding whether your
model is a good metaphor for linking your database to reality is beyond
the scope of both DBMSs and mathematics.

> [Slight digression: If we could find the 1st-order predicate logic behind
>  the "folder" metaphor (ah ha -- how 'bout a function?) we could make
>  some progress perhaps?]

I think the problem here is that if you want trees you can't do it with
first-order logic.

>> Given that your axioms and your interpretation are correct, then I
>>  think you can show the DBMS proof is true in real life (for the
[quoted text clipped - 5 lines]
> read the book someone mentioned, "Data and Reality," and perhaps that
>  will shed some more light on that question.

You can't; it's impossible. To show that your interpretation is correct
we move away from mathematics into science. And in science you can never
prove something, only disprove it. You just hypothesize that something
is true and try to find a counterexample to show you were wrong.

> Summarizing -- three questions: 1) (How) can we prove that our
> mathematical model (e.g. relational theory) aligns with what we are
> applying it to (e.g. databases)?  I think we can only disprove it or
>  fail to disprove it.

Well we kind of go right to the very basis of everything: logic is by
definition what we think of as truth, so it applies to everything. If p
is true and q is true, then so is "p and q" true. We could build a DBMS
around a logic where this isn't the case, but I don't think it would be
very helpful! Alternatively we can go upwards to a more complex logic,
but theory tells us this could cause incompleteness problems.

> 2) Are we missing some important aspects of databases (e.g. mountain
>  man's concerns) if we limit ourselves to a single mathematical
> metaphor (e.g. to what relational theory can tell us, or can tell us
>  today)?

I'm not quite sure what mountain man's point is. Is it that we should
store things like constraints, view definitions, etc. in a relational
format rather than as strings in some query language? I can see the
appeal of this idea but I think how we store statements in our
"meta-language" doesn't change the fact that our actual data is stored
in relations. Or is it that we could store things like form layouts and
application flow logic in tables - if so then I don't think this is a
totally new idea, though maybe an interesting one to explore. MS Access
had something like this built-in I think, created by a form wizard -
"table-driven forms".

Either way I think this is orthogonal (excuse the buzzword!) to the
central idea of relational database theory: to base things as closely as
possible on first-order predicate logic.

> 3) Are we applying the best, most effective, most efficient, etc
> metaphor or is there something better to either supplement or replace
>  it?

I think we are. I think the insight that Codd had was to start with
logic and build upwards from there, instead of putting together an
ad-hoc data model first and then trying to reconcile it downwards to logic.

I think the only ways we could go would be to different logics e.g.
multi-valued logic or "fuzzy" logic etc. I don't claim to know what
these all are but a search should bring up various weird and wonderful
logics.

Or upwards to higher-order logic, although I don't know if
incompleteness becomes an issue then. Maybe because we are always
dealing with unbounded but finite systems it doesn't apply or something.
I think if you go this route you end up with things like Datalog or Prolog.

Paul.
Anthony W. Youngman - 28 May 2004 19:27 GMT
>>> Newtonian Dynamics assumes certain axioms, which we now know to be
>>>slightly wrong.
[quoted text clipped - 14 lines]
>theory proves that it does it perfectly. The science part is beyond the
>scope of the DBMS - making sure that is OK is up to the database users.

So, what you're saying, basically, is every time the users have a
problem "it's an implementation thing", despite the fact that it may
well be down to a screwy axiom? What you're saying is that you couldn't
care less whether your axioms are correct or not?

Or, to put it more bluntly, you don't know whether your model correctly
models the real world, and you don't care whether your model correctly
models the real world, yet you loudly trumpet that it is the only model
that can model the real world ... excuse me while I puke ...

Surely you owe it to your users to at least try and make sure the
foundations of your theory are securely anchored in the real world,
rather than building castles in the air and then blaming users when
their real applications built on those castles come tumbling to earth in
an awful heap.

Sorry, I don't want this to come over as nasty, but that last paragraph
of yours that I quoted is basically just abdicating any responsibility
on the part of mathematicians as to whether the theory is useful in any
shape or form whatsoever; and worse, blames the users for incompetence
if they can't get it to work.

Cheers,
Wol
Tony - 29 May 2004 18:15 GMT
> >>> Newtonian Dynamics assumes certain axioms, which we now know to be
> >>>slightly wrong.
[quoted text clipped - 39 lines]
> Cheers,
> Wol

You just don't get it, do you Wol?  No matter how many times people
try to explain it to you it just doesn't sink in.  The relational
model is NOT a model of "the real world" and it therefore doesn't have
to correspond to the real world.  It is a model of data, which is an
abstract concept.

Now, when someone uses the relational model to build a database
corresponding to some real world thing, say a payroll system, then it
is up to the database designer (not the relational model) to ensure
that what he builds corresponds to the reality he is building it for.

To go back to your favourite analogy (apologies everyone), it is like
saying that algebra was responsible for the shortcomings of Newton's
model of planetary motion.  But it wasn't the algebra that "got it
wrong", it was Newton's application of it.  The relational model
corresponds to algebra in this analogy, not to Newton's model of the
solar system - that corresponds to a specific database design.
Einstein didn't invent a better algebra, he designed a better model
using the SAME algebra - like a later designer designing a better
payroll database, but still using the same RDBMS.
Dawn M. Wolthuis - 29 May 2004 20:25 GMT
> > >>> Newtonian Dynamics assumes certain axioms, which we now know to be
> > >>>slightly wrong.
[quoted text clipped - 45 lines]
> to correspond to the real world.  It is a model of data, which is an
> abstract concept.

and I just responded to Alfredo who said that data were facts and I thought
for sure the idea was that these facts corresponded to reality.

I don't have a problem with the mathematical theory termed "relational
theory" except when the words used are those used in set theory and the
definitions are different ;-)

If there is a tight mathematical definition of "data" within relational
theory, then that's great, but it is not the commonly used definition, I
suspect.  It is in the leap from doing relational theory to thinking that
the application of such theory is the best approach to storing/retrieving
propositions using computers by a business -- that is where there is a
rather significant leap of faith.  That connection is NOT science, although
we could conceivably set up some experiments to collect a bit more
information about whether it is better than some other approach.  I'm not
opposed to faith, but we need to call it what it is.  There is mathematical
relational theory and then a leap of faith in the use of relational theory
for anything.

> Now, when someone uses the relational model to build a database
> corresponding to some real world thing, say a payroll system, then it
> is up to the database designer (not the relational model) to ensure
> that what he builds corresponds to the reality he is building it for.

And perhaps that person opts out of using (at least all of) relational
theory and that's fine, right?

> To go back to your favourite analogy (apologies everyone), it is like
> saying that algebra was responsible for the shortcomings of Newton's
> model of planetary motion.  But it wasn't the algebra that "got it
> wrong", it was Newton's application of it.

Agreed!

> The relational model
> corresponds to algebra in this analogy,

YES!

> not to Newton's model of the
> solar system - that corresponds to a specific database design.

wrong -- that corresponds to the use of relational theory at all while
working with computers.  It is not the specific implementation only that
could be wrong -- it is the use of this theory AT ALL related to "data
processing" that COULD BE wrong (I don't think it is entirely irrelevant,
but there is nothing that proves its relevance except where "the proof is in
the pudding" -- scientific observation, for example).

> Einstein didn't invent a better algebra, he designed a better model
> using the SAME algebra - like a later designer designing a better
> payroll database, but still using the same RDBMS.

No, like a later database theorist designing a graphical theory or a
functional theory that is better than the relational theory before it.

smiles.  --dawn
Gene Wirchenko - 30 May 2004 05:31 GMT
[snip]

>If there is a tight mathematical definition of "data" within relational
>theory, then that's great, but it is not the commonly used definition, I
[quoted text clipped - 7 lines]
>relational theory and then a leap of faith in the use of relational theory
>for anything.

    It is an even bigger leap of faith to operate without a theory
underlying what you are doing.

>> Now, when someone uses the relational model to build a database
>> corresponding to some real world thing, say a payroll system, then it
[quoted text clipped - 3 lines]
>And perhaps that person opts out of using (at least all of) relational
>theory and that's fine, right?

    What is the replacing theory?  Is it better or worse?  How do you
know?  Is the consideration of better/worse a leap of faith?  If not,
why not?

[snip]

sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Anthony W. Youngman - 01 Jun 2004 23:34 GMT
>[snip]
>
[quoted text clipped - 24 lines]
>know?  Is the consideration of better/worse a leap of faith?  If not,
>why not?

I was thinking of replying to Tony, but I think I can answer here.

And no, Tony, Einstein did NOT "build a better model" using the same
algebra. What he DID do was realise that Newton's fundamental axioms
were wrong. He redefined the metaphysical interface between reality and
the model.

And the problem I have is that I cannot see any metaphysical interface
between reality and relational theory. This is basically Dawn's point
about "is relational theory even the right theory to use?".

As for Gene, I agree we need a theory, and actually, I think relational
theory is a great theory. Unfortunately it is a theory about a - call it
abstract, call it imaginary, they're the same thing - concept called
"data" that does not seem to have any basis in the real world.

So what do I think should replace it? Nothing actually, we can just
improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION
:-)

Go back to my analogies :-) In hindsight, we just can't understand why
the Church couldn't see that Copernicus' theory that the planets orbit
the sun made sense. Except that *WE* have got Copernicus' theory
wrong. He thought that the planets *circled* the sun. And as a result
his theory was just as much of a mess (if not more) as that of the
Church who said the planets and sun orbited the earth. I think *that* is
the current state of database theory.

What we NEED is a "theory of business analysis" - a formal theory that
tells analysts how to analyse the real world. And I'm pretty damn
confident that you can NOT create a theory that will do a reversible
mapping between the real world and relational data.

This theory will then be the equivalent of Kepler and Newton discovering
ellipses and calculus, or of Einstein realising that mass and energy
were interchangeable. Basically, pretty much ALL of relational theory's
axioms are taken as given by the mathematicians, and no thought is given
as to whether they actually match the real world.

To give you a simple example, the business analyst analyses an invoice,
and you design the database to store the data. Can you then ask the
DATABASE to give you the invoice data back? Certainly with current
relational databases accessed with SQL, you're relying on either an
application programmed OVER the database, or a view which gives you
multiple copies of data of which the original only had one.

Yes I know people are likely to say that "SQL is not genuine
relational", but you're still relying on a view - even a valid
relational one - or an application.
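
A rough sketch of the invoice point, using Python's built-in sqlite3 module. The table, column and view names here are made up for illustration and are not taken from any real system. The customer name is stored once, the view hands it back once per line item, and a small piece of application code is needed to fold the rows back into a single invoice.

```python
import sqlite3

# Hypothetical schema: one invoice header, many line items.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        invoice_no INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL
    );
    CREATE TABLE invoice_line (
        invoice_no INTEGER NOT NULL REFERENCES invoice(invoice_no),
        line_no    INTEGER NOT NULL,
        item       TEXT NOT NULL,
        qty        INTEGER NOT NULL,
        PRIMARY KEY (invoice_no, line_no)
    );
    CREATE VIEW invoice_view AS
        SELECT i.invoice_no, i.customer, l.line_no, l.item, l.qty
        FROM invoice AS i
        JOIN invoice_line AS l ON l.invoice_no = i.invoice_no;
""")
conn.execute("INSERT INTO invoice VALUES (1, 'Acme Ltd')")
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?, ?)",
    [(1, 1, "widget", 10), (1, 2, "gadget", 3), (1, 3, "sprocket", 7)],
)

rows = conn.execute(
    "SELECT invoice_no, customer, line_no, item, qty "
    "FROM invoice_view ORDER BY line_no"
).fetchall()

# The customer was entered once but comes back once per line item.
customer_copies = [r[1] for r in rows]


def reassemble(flat_rows):
    """Application-side step that folds the flat rows back into one invoice."""
    first = flat_rows[0]
    return {
        "invoice_no": first[0],
        "customer": first[1],
        "lines": [
            {"line_no": r[2], "item": r[3], "qty": r[4]} for r in flat_rows
        ],
    }


invoice = reassemble(rows)
```

Whether that duplication is a flaw or just an artefact of presenting a join is, of course, the very thing being argued about here.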

If we can't go - using formal theory - from the database back through
the analysis to get back to the real world we started from, then we have
no idea if our axioms are correct, and as Dawn says, we have no idea if
relational theory is the correct theory to solve real world problems.

And as I said before, if we have no idea if it's the correct theory, why
are we using it? Dawn was going on about faith. Do you have faith in
business analysts to get the analysis correct, or would you rather have
a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific,
whatever term you want to use) logical theory to do it for you?

Cheers,
Wol

Gene Wirchenko - 02 Jun 2004 01:42 GMT
[snip]

>As for Gene, I agree we need a theory, and actually, I think relational
>theory is a great theory. Unfortunately it is a theory about a - call it
>abstract, call it imaginary, they're the same thing - concept called
>"data" that does not seem to have any basis in the real world.

    That is not surprising since data is abstract.

>So what do I think should replace it? Nothing actually, we can just
>improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION
>:-)

    I do not think so.  See further.

>Go back to my analogies :-) In hindsight, we just can't understand why
>the Church couldn't see that Copernicus' theory that the planets orbit
[quoted text clipped - 3 lines]
>Church who said the planets and sun orbited the earth. I think *that* is
>the current state of database theory.

    No, the mess was smaller.  The new theory was a better theory.

    Newton's is pretty good and will work for everyday situations
fine.  Einstein's refines Newton's to cover yet more cases.

    The world is nearly flat.  The variation from that is a small
fraction of an inch per mile.  If you are dividing your backyard into
plots for gardening, you are safe assuming that the world is flat.
When you hit the big time, a different theory is needed.  Before then,
it is more complicated than you need.

[snip]

>If we can't go - using formal theory - from the database back through
>the analysis to get back to the real world we started from, then we have
>no idea if our axioms are correct, and as Dawn says, we have no idea if
>relational theory is the correct theory to solve real world problems.

    There is meaning that the DBMS understands (for example, FK and
RI), and there is meaning that the user understands (and the DBMS does
not) such as what a location is.
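
    A minimal sketch of that split, using Python's sqlite3 module with invented table names (an illustration only, not a claim about any particular product). The DBMS rejects a shipment that points at a missing location, because that is meaning it has been told about. It accepts a nonsense location name without complaint, because what a location *is* lives in the user's head.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE location (
        location_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE shipment (
        shipment_id INTEGER PRIMARY KEY,
        location_id INTEGER NOT NULL REFERENCES location(location_id)
    );
""")

# Meaning the DBMS does NOT understand: any text is an acceptable location.
conn.execute("INSERT INTO location VALUES (1, 'Not a real place at all')")
conn.execute("INSERT INTO shipment VALUES (100, 1)")

# Meaning the DBMS DOES understand: a shipment must reference a stored location.
try:
    conn.execute("INSERT INTO shipment VALUES (101, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

shipment_count = conn.execute("SELECT COUNT(*) FROM shipment").fetchone()[0]
```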

    A database models relevant portions of the Real World.  What does
relevant mean?  Of interest to someone.

>And as I said before, if we have no idea if it's the correct theory, why
>are we using it? Dawn was going on about faith. Do you have faith in

    It is the closest that we know of.

>business analysts to get the analysis correct, or would you rather have
>a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific,
>whatever term you want to use) logical theory to do it for you?

    I would rather have the theory, but in its absence, I will use
what I have.

Sincerely,

Gene Wirchenko

Anthony W. Youngman - 04 Jun 2004 00:18 GMT
>[snip]
>
[quoted text clipped - 4 lines]
>
>     That is not surprising since data is abstract.

Well, is "mass" abstract? Or "energy"?

No they are not. They have formal mathematical definitions within
Newtonian Mechanics or relativity, but they also have clear metaphysical
descriptions within reality.

As far as I can tell, "relational data" does not have that metaphysical
description.

>>So what do I think should replace it? Nothing actually, we can just
>>improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION
[quoted text clipped - 11 lines]
>
>     No, the mess was smaller.  The new theory was a better theory.

Basically, Kepler corrected Copernicus' axiom that "orbit == circle"

>     Newton's is pretty good and will work for everyday situations
>fine.  Einstein's refines Newton's to cover yet more cases.

More improvements here :-) The mathematical definition is steadily
getting closer to the metaphysical reality ...

>     The world is nearly flat.  The variation from that is a small
>fraction of an inch per mile.  If you are dividing your backyard into
>plots for gardening, you are safe assuming that the world is flat.
>When you hit the big time, a different theory is needed.  Before then,
>it is more complicated than you need.

But it doesn't make it correct ...

>[snip]
>
[quoted text clipped - 14 lines]
>
>     It is the closest that we know of.

It is the closest that YOU know of.

>>business analysts to get the analysis correct, or would you rather have
>>a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific,
>>whatever term you want to use) logical theory to do it for you?
>
>     I would rather have the theory, but in its absence, I will use
>what I have.

Great. So why aren't you prepared to question the accuracy of the axiom
that "data comes in tuples"?

Yes, relational data DOES come in tuples - because that's what the
definition says.

But if you can't come up with some formal way of converting between
"real-world-data" and "relational tuples", then surely you have to come
to the conclusion (which Dawn's and my EXPERIENCE has forced us to) that
your tuple is equivalent to a Copernican circle - it may be close to
reality but there's something seriously wrong somewhere that needs
correcting - and it CAN'T be done WITHIN the theory, because the fault
lies in the theory-to-reality map.

Cheers,
Wol

Gene Wirchenko - 04 Jun 2004 01:40 GMT
>>[snip]
>>
[quoted text clipped - 6 lines]
>
>Well, is "mass" abstract? Or "energy"?

    No, but data is.

[snip]

>>     No, the mess was smaller.  The new theory was a better theory.
>
>Basically, Kepler corrected Copernicus' axiom that "orbit == circle"

    Yup.

>>     Newton's is pretty good and will work for everyday situations
>>fine.  Einstein's refines Newton's to cover yet more cases.
[quoted text clipped - 9 lines]
>
>But it doesn't make it correct ...

    It makes it correct *enough* for the simple case.  And, of
course, using the simpler form does introduce the possibility of
scaling problems later.

    Einstein's might not be the ultimate either, but that is not
going to stop people from dividing up their backyards into plots for
gardening.

[snip]

>>     A database models relevant portions of the Real World.  What does
>>relevant mean?  Of interest to someone.
[quoted text clipped - 5 lines]
>
>It is the closest that YOU know of.

    Produce your theory, please, with rigour comparable to
Codd's.

>>>business analysts to get the analysis correct, or would you rather have
>>>a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific,
[quoted text clipped - 8 lines]
>Yes, relational data DOES come in tuples - because that's what the
>definition says.

    You have just answered your question for me.

>But if you can't come up with some formal way of converting between
>"real-world-data" and "relational tuples", then surely you have to come
[quoted text clipped - 3 lines]
>correcting - and it CAN'T be done WITHIN the theory, because the fault
>lies in the theory-to-reality map.

    So maybe we need a second theory that deals with something that
Codd's does not.  In the meantime, I will not throw out the baby with
the bathwater, and people will keep dividing up backyards.

Sincerely,

Gene Wirchenko

Eric Kaun - 04 Jun 2004 15:18 GMT
> [SNIP]
> But if you can't come up with some formal way of converting between
[quoted text clipped - 4 lines]
> correcting - and it CAN'T be done WITHIN the theory, because the fault
> lies in the theory-to-reality map.

True, but I have yet to hear a better proposal. When it comes to modeling
information, I suspect there will always be a gap. Relational advocates
favor being able to derive truths from other truths, acknowledging of course
that the internal predicates must be defined relative to an external one,
and that that's a human effort which can always go awry. You and Dawn, as
best I can understand, place more value on reproduction of the original
inputs. I suspect there are simply different expectations; I'd rather
stretch the computer to avoid stretching humans in ways they're not good at
(e.g. repetitive symbolic manipulation).

- erk
Bill H - 05 Jun 2004 17:27 GMT
erk:

Several notes below.

> > [SNIP]
> > But if you can't come up with some formal way of converting between
[quoted text clipped - 6 lines]
>
> True, but I have yet to hear a better proposal.

I've noticed that many people aren't interested in a better proposal, or
even a different proposal.  Dogma rules.  :-)

The main reason others use different data models is that they allow a much
closer interaction between the language of dbms and applications and the
environment they're designed to operate in (mostly the business community).
Because of this, the cost of development, maintenance, and administration is
significantly lower than those models having additional expertise and
liaison requirements.

Now, this advantage may not be what you are looking for.  It may not be, for
that matter, what the CIO of a large company is looking for.  However, in
the world of small to medium sized businesses (SMBs) this cost advantage
means something.

I might also note there are numerous features of an RDBMS that may or may
not be available in competing data models, so any analysis will have to take
this into account.

> When it comes to modeling
> information, I suspect there will always be a gap. Relational advocates
[quoted text clipped - 3 lines]
> best I can understand, place more value on reproduction of the original
> inputs.

I can't speak for Anthony and Dawn, but I place more value not on the
original inputs but the original concept.  An invoice _is_ something that
usually has multiple items ordered.  It is an object in and of itself that
needs no "chopping up", so to speak.

This is where simpler means don't destroy the properties of the invoice in
order to make the data fit into an arbitrary data model with tautological
axioms and theorems.  Keep the business objects as close to what they are.
A data model that can do this has many advantages.

> I suspect there are simply different expectations; I'd rather
> stretch the computer to avoid stretching humans in ways they're not good at
> (e.g. repetitive symbolic manipulation).

I think you're right here.  I've been in business for many years.  I would like
development to be easy for me.  We can watch the pendulum swinging towards
making software development easier for those of us using the software.
.NET, for better or worse, is attempting to make development easier (if it
wasn't for the bizarre data typing and variable scoping it would be a lot
easier).  Hopefully dbms theory will contribute to this too.

Bill
Eric Kaun - 07 Jun 2004 19:18 GMT
> [SNIP]
> I've noticed that many people aren't interested in a better proposal, or
> even a different proposal.  Dogma rules.  :-)

A fun movie... :-)

> The main reason others use different data models is that they allow a much
> closer interaction between the language of dbms and applications and the
[quoted text clipped - 3 lines]
> significantly lower than those models having additional expertise and
> liaison requirements.

I am all for lowering this cost - decreasing the "impedance mismatch", so to
speak. However, I think my ideas move in the opposite direction - making
application languages more relational, rather than DBMSs more procedural (or
OO, if you like).

> Now, this advantage may not be what you are looking for.  It may not be, for
> that matter, what the CIO of a large company is looking for.  However, in
> the world of small to medium sized businesses (SMBs) this cost advantage
> means something.

Agreed - however, while my experience comes from a large company, it's work
done for a relatively small business unit. I was the only developer on
several of the projects, and my user base was fairly small. I was DBA,
developer, customer support, etc. And I still found the relational metaphor
(even though I had to use SQL) much easier than XML. I've never used Pick -
sounds like their environment gives them a lot of power, and while that's
nice, I'd still never think of an invoice as a single
proposition or "object". It's not. It's a fairly complex series of them.
Just like an "order", an invoice is a fairly complex confluence of
phenomena, and not even a static one (modifications / confirmations to
various invoice "pieces" were common in my world, as an invoice was often
correlated with multiple shipments and warehouses).

> I can't speak for Anthony and Dawn, but I place more value not on the
> original inputs but the original concept.  An invoice _is_ something that
> usually has multiple items ordered.

And I disagree. An invoice is many somethings. If your questions deal only
with the set (e.g. presenting an invoice on a screen), then great - treat it
as one. But when you're attempting to analyze the distribution of parts
across warehouses and across time, "viewing" the invoice as a number of
components is far, far more useful. So it depends on your needs, but I'd far
rather place my bet on something that allows me to scale my queries and
reports to more detailed questions than one that restricts me. And I still
think having to correlate multiple line-item attributes across multiple MV
attributes in a single File is nonsensical and error-prone.
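
As a sketch of the kind of question I mean (Python's sqlite3, with table and column names invented for the example): once the line items are stored as separate rows, a question that cuts across invoices, such as how much of each part each warehouse shipped, is a single declarative query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE invoice_line (
        invoice_no INTEGER NOT NULL,
        line_no    INTEGER NOT NULL,
        item       TEXT NOT NULL,
        qty        INTEGER NOT NULL,
        warehouse  TEXT NOT NULL,
        PRIMARY KEY (invoice_no, line_no)
    )
""")
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "widget", 10, "Leeds"),
        (1, 2, "gadget", 3, "Bristol"),
        (2, 1, "widget", 5, "Bristol"),
        (2, 2, "widget", 2, "Leeds"),
    ],
)

# The invoice boundary is irrelevant to this question, so the query ignores it.
per_warehouse = conn.execute("""
    SELECT warehouse, item, SUM(qty)
    FROM invoice_line
    GROUP BY warehouse, item
    ORDER BY warehouse, item
""").fetchall()
```

Here `per_warehouse` holds one total per (warehouse, item) pair, regardless of which invoice each line came from.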

> It is an object in and of itself that
> needs no "chopping up", so to speak.

Yes, it does. "Analysis" means chopping up. We gain power in chopping up.
Our problems are solvable when they're chopped; our solutions are scalable
and provable when they're chopped. Domains are intellectually tractable when
they're separated. Holism may be fine in medicine (???) where human
psychology is involved, but any translation of a "real world" domain to an
automated system involves "chopping up." You can either acknowledge it and
chop in a rational way, or pay the price later on.

While I'm not dogmatic about 1NF (believe it or not), or even relational, I
do believe based on experience that the balance point for using relational
is far, far sooner than critics would believe.

> This is where simpler means don't destroy the properties of the invoice in
> order to make the data fit into an arbitrary data model with tautological
> axioms and theorems.

Tautological? Arbitrary? Any logical model is arbitrary; an invoice has no
shape, or at least none beyond that of a piece of paper, and as I've said,
if all they want to do is store the invoice, let's scan the thing into a JPG
and be done with it.

"Making the data fit" is also nonsense; whatever physical and logical model
you choose, you're pushing the data into something. You can either push it
into something with maximum power or a lesser degree of power. Perhaps you
gain short-term efficiency; in my experience with XML, you gain squat.

> Keep the business objects as close to what they are.

So forgetting an invoice for a moment, what "is" a paint color? A paint
formula? A carmaker code? A digital certificate store? What's their "natural
form"?

There is none. What we do is unnatural. (<insert unnatural-act joke here>)

> A data model that can do this has many advantages.

That can do what - model arbitrary data in its "natural form", whatever that
means? I agree. If you show that to me, I'll use it.

> > I suspect there are simply different expectations; I'd rather
> > stretch the computer to avoid stretching humans in ways they're not good
[quoted text clipped - 7 lines]
> wasn't for the bizarre data typing and variable scoping it would be a lot
> easier).  Hopefully dbms theory will contribute to this too.

I hope so - that would be nice. I think XPath and XQuery, while convoluted,
are reasonable enough operators over an XML type / type generator. I just
see far more benefit from the structures and declarative constraints of
relational.

- erk
Anthony W. Youngman - 07 Jun 2004 22:58 GMT
>> A data model that can do this has many advantages.
>
>That can do what - model arbitrary data in its "natural form", whatever that
>means? I agree. If you show that to me, I'll use it.

And there we have our problem.

Yep, I can see where you're coming from, in practical terms. But can't
you see where we're coming from? My problem, as I see it, is that
'arbitrary data in its "natural form" ' is NOT amenable to easy coercion
into "relational data".

From that, it follows that relational databases are the wrong tool to
model natural data with.

However, as I said, I do feel that it's like the circle/ellipse problem
that Copernicus had. IF people are prepared to *look* at real data in
its "natural form" and develop a model that really addresses that, while
it will make one hell of a mess of current relational theory, combining
a "natural form data" model with the relational model will yield a very
powerful database theory.

After all, isn't that exactly what I do when I insist on normalising all
my data within Pick FILEs? And I really don't see the problems you do,
even if the line-items come from multiple warehouses etc etc. If the
relational analyst didn't foresee that, you're going to end up in an
equally big mess (experience says "even bigger" mess) than a Pickie, if
both are faced with the same analysis failure.

Cheers,
Wol

Eric Kaun - 09 Jun 2004 16:33 GMT
> >> A data model that can do this has many advantages.
> >
[quoted text clipped - 7 lines]
> 'arbitrary data in its "natural form" ' is NOT amenable to easy coercion
> into "relational data".

My point is that data has no natural form, plain and simple. I've
encountered too many cases over the years where accepting the "natural form"
as the users stated it would have resulted in brittle design - where
abstraction and extension yielded immediate results.

>  From that, it follows that relational databases are the wrong tool to
> model natural data with.

I don't believe natural data exists. It's all unnatural. To simply accept
the intuitive "sense" of business data as the users see it gives you
something quickly, but in every case I've ever encountered, that simplicity
is a limitation not just on future requirements, but on immediate ones as
well, and on the ability to craft a strong solution to current problems. In
every case I've ever experienced, abstracting and structuring beyond what
the users would consider "natural" (and for which we still have no
definition) has benefitted me in terms of shorter development time, more
flexibility for future extensions, and better ability to explain nuances of
their situations to users (e.g. better analysis).

> However, as I said, I do feel that it's like the circle/ellipse problem
> that Copernicus had. IF people are prepared to *look* at real data in
> its "natural form" and develop a model that really addresses that,

It's difficult to address it when it's so intuitive. I understand this is a
human discipline, but formal guidelines can be useful in meeting real needs.

> while
> it will make one hell of a mess of current relational theory, combining
> a "natural form data" model with the relational model will yield a very
> powerful database theory.

I doubt it, but am willing to think about it. What is that "natural form"?
Is it just 1NF? Is that still the linchpin of the argument?

> After all, isn't that exactly what I do when I insist on normalising all
> my data within Pick FILEs? And I really don't see the problems you do,
> even if the line-items come from multiple warehouses etc etc.

OK, I'll accept that - it just sounds like a massive pile of aggravation to
me. Predicates can expand as requirements do (and as you learn more about
them), which is where normalization acquires its power. To just pile on
attribute after attribute or sub-attribute after sub-attribute, and to have
to keep straight (for myself and future developers) which of those are
correlated with others just sounds like a leap of faith that's both annoying
and unnecessary.
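
To show what I'm worried about, here is a loose Python imitation of the idea. It is not Pick syntax, and the record layout is my assumption of how parallel multi-valued attributes would look. The line items live in several lists that are correlated only by position, so one list getting out of step silently loses or mismatches data.

```python
# One invoice "record" with parallel multi-valued attributes (hypothetical layout).
record = {
    "invoice_no": 1,
    "items": ["widget", "gadget", "sprocket"],
    "qtys": [10, 3, 7],
    "warehouses": ["Leeds", "Bristol"],  # one value short: an easy slip
}


def lines_by_position(rec):
    """Pair up the values purely by their position in each list."""
    return list(zip(rec["items"], rec["qtys"], rec["warehouses"]))


def is_aligned(rec):
    """The check a developer has to remember to write and to run."""
    lengths = {len(rec["items"]), len(rec["qtys"]), len(rec["warehouses"])}
    return len(lengths) == 1


lines = lines_by_position(record)  # the sprocket line is silently dropped
aligned = is_aligned(record)
```

A real multi-valued system may well guard against this; the sketch only shows the bookkeeping that the correlation implies.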

> If the
> relational analyst didn't foresee that, you're going to end up in an
> equally big mess (experience says "even bigger" mess) than a Pickie, if
> both are faced with the same analysis failure.

Perhaps that's true - I can't say it is, but don't have a strong
counter-argument - but in my experience, analysis failures are much less
destructive when you've normalized properly.

- erk
Laconic2 - 09 Jun 2004 17:28 GMT
> My point is that data has no natural form, plain and simple. I've
> encountered too many cases over the years where accepting the "natural form"
> as the users stated it would have resulted in brittle design - where
> abstraction and extension yielded immediate results.

Form is in the eye of the beholder.

The ER model has given me very good results when it comes to data
analysis, and two way communication with subject matter experts who are
typically not systems experts.

The relational model, such as I know it, has given me very good results
when it comes to data design, with the exception of certain cases, where a
different model would have been more natural.  But those are the exceptions
rather than the rule.

SQL or indexed files have given me very good results when implementing a
relational design.  The principal difference between "SQL databases" and
indexed files has been in the areas of classical DBMS services and data
independence.  But you can implement a relational data model using either
one.

Which form is "natural"?  It depends.  Who are we talking to?
Anthony W. Youngman - 10 Jun 2004 00:38 GMT
>> >> A data model that can do this has many advantages.
>> >
[quoted text clipped - 13 lines]
>as the users stated it would have resulted in brittle design - where
>abstraction and extension yielded immediate results.

Fine. But if data has no "natural form", then in the real world there is
no such thing as data. Therefore there is no point in building a system
to model it :-)

>>  From that, it follows that relational databases are the wrong tool to
>> model natural data with.
[quoted text clipped - 9 lines]
>flexibility for future extensions, and better ability to explain nuances of
>their situations to users (e.g. better analysis).

It's all unnatural? As I said above, then surely it doesn't exist ...

>> However, as I said, I do feel that it's like the circle/ellipse problem
>> that Copernicus had. IF people are prepared to *look* at real data in
[quoted text clipped - 22 lines]
>correlated with others just sounds like a leap of faith that's both annoying
>and unnecessary.

Except that actually, it works EXTREMELY WELL in practice. We think in
terms of language. To me a "table" is a noun. A field (your "column") is
typically an adjective, or grouped into an adjectival clause. It can
also be a gerund (your foreign key) which is, sort of, an adjective.

Basically, it fits the way nature has designed our brains to work. So
the practice is simple. By imposing abstraction, the relational model is
forcing the data into a framework that our brains are not designed to
understand.

Try describing your tables in terms of natural language. I can guarantee
you'll end up with a mess ... :-) one may be a noun, another is an
adjectival phrase, another is a bunch of gerunds - what the hell - in
simple intuitive terms - is a table?

>> If the
>> relational analyst didn't foresee that, you're going to end up in an
[quoted text clipped - 4 lines]
>counter-argument - but in my experience, analysis failures are much less
>destructive when you've normalized properly.

Which is why I'm such a fan of normalising within a Pick FILE. It forces
you to analyse properly, and reduces the likelihood of an analysis
failure. Supporting our Pick systems at work can be a real pain, I
admit. But EVERY screwup can be attributed - directly - to an analysis
failure that was just plain sloppy, or poor programming practice like
not separating updates from reports :-(

Unfortunately, that's typical of Pick systems - so many systems are
*written* by USERS for USERS, so that while they work extremely well
they also have the computer pros tearing their hair out.

But surely that says something - wasn't the design aim of SQL to be "so
easy that users can use it"? Pick has actually achieved that aim!

Cheers,
Wol

Eric Kaun - 10 Jun 2004 20:18 GMT
> >My point is that data has no natural form, plain and simple. I've
> >encountered too many cases over the years where accepting the "natural form"
[quoted text clipped - 4 lines]
> no such thing as data. Therefore there is no point in building a system
> to model it :-)

Cute... but the whole point of building a system is to impart some form to
the data, to render it manipulable. Otherwise we can stick with pieces of
paper in filing cabinets, if that meets the users' needs.

On second thought, you're right - there is no such thing in the real world
as data. Data is our model of the real world, or at least part of that
model. "Data modeling" is thus a misnomer, though I can't think of a better
gerund than "modeling". "Data creation" might be confused with the actual
population of a database.

> >OK, I'll accept that - it just sounds like a massive pile of aggravation to
> >me. Predicates can expand as requirements do (and as you learn more about
[quoted text clipped - 13 lines]
> forcing the data into a framework that our brains are not designed to
> understand.

Heh. You're a scientist, right? Surely much of science is somewhat
unnatural, at least until after years of learning it? Quantum theory not even
then... but in any event, "naturalness" is a poor criterion for use.

> Try describing your tables in terms of natural language. I can guarantee
> you'll end up with a mess ... :-) one may be a noun, another is an
> adjectival phrase, another is a bunch of gerunds - what the hell - in
> simple intuitive terms - is a table?

Each relation is a sentence.
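
To make that concrete, a small Python sketch (the names and wording are made up for the example): the heading of the relation acts as a sentence template, i.e. a predicate, and each row fills in the blanks to give one statement that the database asserts to be true.

```python
# The predicate: a sentence with named blanks, one blank per column.
PREDICATE = (
    "Invoice {invoice_no} bills customer {customer} "
    "for {qty} of item {item}."
)

# The rows: each one supplies values for the blanks.
rows = [
    {"invoice_no": 1, "customer": "Acme Ltd", "item": "widget", "qty": 10},
    {"invoice_no": 1, "customer": "Acme Ltd", "item": "gadget", "qty": 3},
]

# Each row read through the predicate is one plain-language statement.
sentences = [PREDICATE.format(**row) for row in rows]
```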

> >> If the
> >> relational analyst didn't foresee that, you're going to end up in an
[quoted text clipped - 11 lines]
> failure that was just plain sloppy, or poor programming practice like
> not separating updates from reports :-(

OK, I can buy that. But I'm still somewhat wary - you normalize, but not all
the time. You suggest that normalization assists in verifying the
correctness of your analysis. So then at some point you de-normalize. What
triggers you to do so? Or do you not denormalize because something in your
analysis causes you not to normalize a specific aspect of the model?

> Unfortunately, that's typical of Pick systems - so many systems are
> *written* by USERS for USERS, so that while they work extremely well
> they also have the computer pros tearing their hair out.
>
> But surely that says something - wasn't the design aim of SQL to be "so
> easy that users can use it"? Pick has actually achieved that aim!

Perhaps; I'm not sure. SQL certainly has failed in that regard, although with
a few well-designed views and functions, I've taught some of my users SQL.
But I think there's certainly an application level that sits above
relational (and SQL), and hides some of the details that are important (or
else they shouldn't be there) - namely joins.

- Eric
Laconic2 - 11 Jun 2004 11:39 GMT
> Cute... but the whole point of building a system is to impart some form to
> the data, to render it manipulable. Otherwise we can stick with pieces of
> paper in filing cabinets, if that meets the users' needs.

I'd like to suggest that "form follows function"  applies here.  It's
architecture 101.

The function of data inside a database is profoundly different from the
function of data inside a file cabinet.  That's why the form is different.

> On second thought, you're right - there is no such thing in the real world
> as data. Data is our model of the real world, or at least part of that
> model. "Data modeling" is thus a misnomer, though I can't think of a better
> gerund than "modeling". "Data creation" might be confused with the actual
> population of a database.

How about "data design"?  Notice I didn't say "database design"...
Anthony W. Youngman - 10 Jun 2004 00:21 GMT
>>> A data model that can do this has many advantages.
>>
[quoted text clipped - 24 lines]
>an equally big mess (experience says "even bigger" mess) than a Pickie,
>if both are faced with the same analysis failure.

Following up to myself, a line item on an order is different to a line
item on a delivery note is different to a line item on an invoice. I
know analysts screw up and get it wrong, but I'll draw an almost
identical analogy :-)

If your customer's warehouse moves, you do NOT want changing the
warehouse address to change the delivery address on all your old
invoices... so while your line item on the invoice may point to the line
item on the delivery note, it MUST NOT be the same "object". Because
things change. After all, in the transition from order to delivery note,
it's quite possible for you to substitute an equivalent. And you can't
even guarantee that the invoice line will always be identical to the
delivery note rather than the order - because certainly under UK law, if
the supplier substitutes on their own initiative they are obliged to
bill the cheaper item, not the one that was actually supplied ...
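
A minimal Python sketch of the point above, assuming hypothetical names
(`Warehouse`, `InvoiceLine`, `make_invoice_line`) that do not come from any
system discussed in this thread. The invoice line copies the address at
invoicing time and only points at the delivery note line by number, so a
later change of address leaves old invoices alone:

```python
from dataclasses import dataclass


@dataclass
class Warehouse:
    # Mutable on purpose: warehouses move.
    address: str


@dataclass(frozen=True)
class InvoiceLine:
    item: str
    quantity: int
    delivery_address: str    # snapshot taken when the invoice is raised
    delivery_note_line: int  # a reference to the delivery note line, not the same object


def make_invoice_line(item, quantity, warehouse, delivery_note_line):
    # Copy the address value rather than keeping a live link to the warehouse.
    return InvoiceLine(item, quantity, warehouse.address, delivery_note_line)


warehouse = Warehouse("1 Old Wharf Road")
old_line = make_invoice_line("widget", 10, warehouse, delivery_note_line=1)

warehouse.address = "9 New Dock Lane"  # the customer's warehouse moves
new_line = make_invoice_line("widget", 5, warehouse, delivery_note_line=2)

print(old_line.delivery_address)  # still the address that applied at the time
print(new_line.delivery_address)
```

The same separation would let the invoice line carry a different item or
price from the delivery note line when a substitution has been made.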

Cheers,
Wol

Bill H - 09 Jun 2004 17:52 GMT
Eric:

[snipped]

> Agreed - however, while my experience comes from a large company, it's work
> done for a relatively small business unit. I was the only developer on
[quoted text clipped - 8 lines]
> various invoice "pieces" was common in my world, as an invoice was often
> correlated with multiple shipments and warehouses).

An excellent example of what I was talking about.  In an SMB, or small
business unit, there should be no staff to support the dbms, server, or any
other IT function.  Very part time support is all that should be necessary.
This is one of the cost issues we discuss all the time.

Secondly, to me as a business person, an invoice _is_ a single object.  I view its
function much differently than the way IT might think of it.  It has a
singular purpose: to get cash for the company.  Any discussion of an invoice
needs to keep this in mind.  Again decomposition/recomposition are issues.

> > I can't speak for Anthony and Dawn, but I place more value not on the
> > original inputs but the original concept.  An invoice _is_ something that
[quoted text clipped - 9 lines]
> think having to correlate multiple line-item attributes across multiple MV
> attributes in a single File is nonsensical and error-prone.

See my comment above.  An invoice is a business object that serves a
business purpose.  Neither of us will ever get our payroll checks unless
this invoice is handled as a business object.  (remember, get the cash!)

> > It is an object in and of itself that needs no "chopping up", so to
> > speak.

> Yes, it does. "Analysis" means chopping up. We gain power in chopping up.
> Our problems are solvable when they're chopped; our solutions are scalable
> and provable when they're chopped.

[snipped]

This is true only when decomposition doesn't alter the fundamental
characteristics of the object; otherwise analysis introduces tremendous
risk.  In a business environment, IT personnel (especially DBAs) are
not usually in a position to assess such risk.  So why put them in this
position?

My point is: since databases are such natural extensions of business, why
make decomposition of business objects a requirement of storing data?  Also,
why make the language of databases so obscure to ordinary business people
that new expertise, with its attendant costs, is required?

> > This is where simpler means don't destroy the properties of the invoice in
> > order to make the data fit into an arbitrary data model with tautological
[quoted text clipped - 9 lines]
> into something with maximum power or a lesser degree of power. Perhaps you
> gain short-term efficiency; in my experience with XML, you gain squat.

Perhaps I am being a little unfair here.   There are three fundamental rules
in business and finance:  1) get the cash, 2) get the cash, and 3) get the
cash.  :-)  Seriously, IT and databases provide support to a business.
Their rules and nomenclature had better fit in with this environment
otherwise their usefulness becomes less than cost effective.  How much cost
a business will tolerate is dependent on a number of factors.

I'm trying not to lose sight of the fundamental purpose of data and a dbms.

Bill
Dawn M. Wolthuis - 10 Jun 2004 00:40 GMT
> > [SNIP]
> > I've noticed that many people aren't interested in a better proposal, or
> > even a different proposal.  Dogma rules.  :-)
>
> A fun movie... :-)

indeed
[I'm sure I've missed a bunch since my ISP first had nntp down and then
seemed to reinitialize the database (is that the right term?) but I'll read
a bit before a long weekend away from news again.]

> > The main reason others use different data models is that they allow a much
> > closer interaction between the language of dbms and applications and the
[quoted text clipped - 10 lines]
> application languages more relational, rather than DBMSs more procedural (or
> OO, if you like).

And the likelihood of that is ... NIL (choosing not to use that NULL set
designation).   Why?  Because people tend to choose solutions that work.  If
there were overwhelmingly good evidence that you get a better bang for the
buck by using relational theory, that would be a different story.  I'd
strongly suggest we nudge relational databases toward pragmatism ;-)

> > Now, this advantage may not be what you are looking for.  It may not be,
> > for
[quoted text clipped - 7 lines]
> developer, customer support, etc. And I still found the relational metaphor
> (even though I had to use SQL) much easier than XML.

Didn't some of that have to do with having to perform conversions to and
from XML which might not have been necessary if the data were stored in the
way it was sent?  OR was it the loosey-gooseyness of it where there are not
as many texts with rules for "how to"?

> I've never used Pick -
> sounds like their environment gives them a lot of power, and while that's
> nice, I'd still never think of thinking of an invoice as a single
> proposition or "object". It's not.

Perhaps you've never seen one?  ;-)

> It's a fairly complex series of them.

That too, but through how many portals would you want to have to go to
collect all such?  This has to do with how the "user" (application developer
or dba, for example) should view the data.

> Just like an "order", an invoice is a fairly complex confluence of
> phenomena, and not even a static one (modifications / confirmations to
[quoted text clipped - 4 lines]
> > original inputs but the original concept.  An invoice _is_ something that
> > usually has multiple items ordered.

Yes and I'm trying to narrow that down a bit while trying to tap into just
how I do database design given that I don't start with 1NF.  It has to do
with people, places and things and entities that are not functionally
dependent on any other entities in the system.  What is that top level of
nodes after ENTITY in a system, such as PEOPLE, PLACES, THINGS?

> And I disagree. An invoice is many somethings. If your questions deal only
> with the set (e.g. presenting an invoice on a screen), then great - treat it
> as one. But when you're attempting to analyze the distribution of parts
> across warehouses and across time, "viewing" the invoice as a number of
> components is far, far more useful.

I see where you are coming from.  No, an invoice is just one of these
things, but the data from the invoice is also available through other data
portals (for lack of a better word -- don't make me use the word "view"!)
such as warehouses and parts.  I can see that one difference is that the
same data from my perspective is available as an invoice and as
parts-invoiced.  These are different entities with the same or similar data
accessed.  Each portal can see everything you can "get to" from there (via
declared links as one might have in a join statement).

> So it depends on your needs, but I'd far
> rather place my bet on something that allows me to scale my queries and
> reports to more detailed questions than one that restricts me. And I still
> think having to correlate multiple line-item attributes across multiple MV
> attributes in a single File is nonsensical and error-prone.

I'll grant that there are pros and cons and not everyone designs an invoice
identically no matter what the database, but when you add in the virtual
fields (derived data or data found elsewhere), the INVOICE vocabulary for
everyone has what it needs to show an invoice.

> > It is an object in and of itself that
> > needs no "chopping up", so to speak.
>
> Yes, it does. "Analysis" means chopping up. We gain power in chopping up.

and putting back together

> Our problems are solvable when they're chopped; our solutions are scalable
> and provable when they're chopped.

again, I think you are confusing something here -- perhaps physical and
logical (although I think I've ascertained that would not be like you) but
perhaps it is your notion that data can only be accessed through one place -
its base relation.  Remove that obstacle -- free yourself.  Yes, we still
divide it all up, but into wholes, not pieces.

> Domains are intellectually tractable when
> they're separated. Holism may be fine in medicine (???) where human
> psychology is involved, but any translation of a "real world" domain to an
> automated system involves "chopping up." You can either acknowledge it and
> chop in a rational way, or pay the price later on.

yes, there is some chopping up and the functional dependency thing takes you
quite far for that, even if you allow for both scalar values and compound
ones (such as lists).

> While I'm not dogmatic about 1NF (believe it or not), or even relational, I
> do believe based on experience that the balance point for using relational
> is far, far sooner than critics would believe.

Someday grasshopper ...

> > This is where simpler means don't destroy the properties of the invoice in
> > order to make the data fit into an arbitrary data model with tautological
[quoted text clipped - 4 lines]
> if all they want to do is store the invoice, let's scan the thing into a JPG
> and be done with it.

No, the data needs to be available to other entities as well, as you pointed
out.

> "Making the data fit" is also nonsense; whatever physical and logical model
> you choose, you're pushing the data into something. You can either push it
[quoted text clipped - 6 lines]
> formula? A carmaker code? A digital certificate store? What's their "natural
> form"?

It is relational folks who become democratic about this and start thinking
about understanding the nature of any particular noun outside of its use in
"this" context.  Define it based on its use and if a new use comes up,
redefine it if necessary, otherwise add qualifiers to it.

> There is none. What we do is unnatural. (<insert unnatural-act joke here>)

OK and it's funny, but never mind.

> > A data model that can do this has many advantages.
>
> That can do what - model arbitrary data in its "natural form", whatever that
> means? I agree. If you show that to me, I'll use it.

as entities.  Still working on how to show it.

> > > I suspect there are simply different expectations; I'd rather
> > > stretch the computer to avoid stretching humans in ways they're not good
[quoted text clipped - 13 lines]
> see far more benefit from the structures and declarative constraints of
> relational.

Have you found that when you map from xml to relational, you don't need to
add anything to the information in your source, but when you go the other
direction, you need to add data (such as ordering)?

> - erk

Cheers!  --dawn
mAsterdam - 10 Jun 2004 01:34 GMT
> [I'm sure I've missed a bunch since my ISP first had nntp down and then
> seemed to reinitialize the database (is that the right term?) but I'll read
> a bit before a long weekend away from news again.]

A shared databank of messages - check the glossary ...
yep it's a database!
Probably a hierarchical one (MV maybe? - nah just protocol),
definitely not designed with the relational model in mind,
though. I have no way of viewing it as tables -
because the crosspost I made about "database - prolog
and relational" to two newsgroups forces me to check both
groups for replies.

It works quite well, though :-)

> ... people tend to choose solutions that work.  If
> there were overwhelmingly good evidence that you get a better bang for the
> buck by using relational theory, that would be a different story.  I'd
> strongly suggest we nudge relational databases toward pragmatism ;-)

Roman numerals still exist. They work quite well in some contexts.
Besides, there is tradition.
Do you know what the QWERTY keyboard was designed for?

> ... an invoice is just one of these
> things, but the data from the invoice is also available through other data
[quoted text clipped - 4 lines]
> accessed.  Each portal can see everything you can "get to" from there (via
> declared links as one might have in a join statement).

Yep. The guys (mostly) who check the deliveries simply
can't afford having just the invoice as their unit of work.
They need to do it item by item - yep it's there/no it's not.

> again, I think you are confusing something here -- perhaps physical and
> logical (although I think I've ascertained that would not be like you) but
> perhaps it is your notion that data can only be accessed through one place -
> its base relation.  Remove that obstacle -- free yourself.  Yes, we still
> divide it all up, but into wholes, not pieces.

So - let's pay the whole invoice or not *if* one minor item is not
there? I guess it's a way of doing business - but I would prefer
to not have the database implementation decision
determine this business style.

> It is relational folks who become democratic about this and start thinking
> about understanding the nature of any particular noun outside of its use in
> "this" context.  Define it based on its use and if a new use comes up,
> redefine it if necessary, otherwise add qualifiers to it.

The first department to get a database wins.
The rest has to jiggle their stuff into the imposed hierarchy.

> Have you found that when you map from xml to relational, you don't need to
> add anything to the information in your source, but when you go the other
> direction, you need to add data (such as ordering)?

If the order is *that* important, you can model it, but indeed
most relational modellers have a blind spot there.
However, getting rid of the possible contradictions is much more difficult.
Dawn M. Wolthuis - 10 Jun 2004 14:40 GMT
> > [I'm sure I've missed a bunch since my ISP first had nntp down and then
> > seemed to reinitialize the database (is that the right term?) but I'll read
[quoted text clipped - 10 lines]
>
> It works quite well, though :-)

I'm sure it would work much better if implemented in Oracle, but, ah well
... ;-)

> > ... people tend to choose solutions that work.  If
> > there were overwhelmingly good evidence that you get a better bang for the
[quoted text clipped - 4 lines]
> Besides, there is tradition.
> Do you know what the QWERTY keyboard was designed for?

I was told once it was to keep the mechanical hammers attached to the keys
from hitting each other, so keys you would likely hit one
after the other had to be placed so that they were not close together.

> > ... an invoice is just one of these
> > things, but the data from the invoice is also available through other data
[quoted text clipped - 19 lines]
> to not have the database implementation decision
> determine this business style.

I'm still not saying this both accurately and clearly.  I'll think about it
some more.  There is no problem paying one line item from an invoice and I'm
not sure why you think there would be.  Again, this is a logical way of
looking at the data, but if you looked at a physical implementation, such as
a paper invoice form, does it seem difficult to you to check off one line
item from that form?  Would it be easier conceptually or in any way for this
to come on multiple sheets of paper so you could retrieve the one piece of
paper related to this line item and check it off that way?

> > It is relational folks who become democratic about this and start thinking
> > about understanding the nature of any particular noun outside of its use in
[quoted text clipped - 3 lines]
> The first department to get a database wins.
> The rest has to jiggle their stuff into the imposed hierarchy.

Not at all!  Dept #2 identifies their major entities, some of which might
align with Dept #1, others of which might be able to see information that
Dept #1 maintains.  There actually is no issue whatsoever that crops up
here.  There could be the usual types of changes that need to be made --
adding files, fields, functions, but it works just fine and again I'll have
to think of how to make that perfectly clear.

> > Have you found that when you map from xml to relational, you don't need to
> > add anything to the information in your source, but when you go the other
[quoted text clipped - 3 lines]
> most relational modellers have a blind spot there.
> However, getting rid of the possible contradictions is much more difficult.

As Wol has said, you can take any PICK database and view it as relational,
but you can't go the other way around.  If you could, then this discussion
would be moot -- we could just toggle between different perspectives on the
data.  Now, it is possible to design a relational database that can do that,
but you have to design for it.  I would not be surprised if you had a
relational data modeler and a PICK data modeler both address the same
problem space and found that the PICK modeler actually encoded more
information than the relational one.  One example that pops to mind is with
classifiers -- the relational modeler who identifies that 90% of the time a
single entity fits into a single classification and if they don't then the
user can pick one without anyone dying, will likely then proceed to make
that a rule by putting a classification code as an attribute on that entity,
rather than splitting it out into a separate table for the few times when
the entity could be classified in two ways.  The PICK modeler doesn't blink
and identifies that this classification code is multivalued -- if you need
to put two classification codes on this entity, you do so.  Overly
simplified example, but ...
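
To make the classifier example concrete, here is a small Python sketch. The
entity id and the codes `RETAIL` and `WHOLESALE` are made up for
illustration, and the dictionaries merely stand in for a record; this is not
Pick syntax. It contrasts the single-valued attribute with the multivalued
one and shows the two-column relation each would flatten to:

```python
# Single-valued design: exactly one classification code per entity, so the
# occasional second classification has nowhere to go.
single_valued = {"id": "E1", "classification": "RETAIL"}

# Multivalued design: the same attribute simply holds a list of codes.
multivalued = {"id": "E1", "classification": ["RETAIL", "WHOLESALE"]}


def to_rows(entity):
    """Flatten an entity into (entity_id, code) rows, one row per code."""
    codes = entity["classification"]
    if isinstance(codes, str):  # treat a lone code as a one-element list
        codes = [codes]
    return [(entity["id"], code) for code in codes]


single_rows = to_rows(single_valued)
multi_rows = to_rows(multivalued)

print(single_rows)
print(multi_rows)
```

The rows produced by `to_rows` are what a separate classification table
would hold in the relational design.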

Later.  --dawn
Eric Kaun - 10 Jun 2004 20:57 GMT
> > The first department to get a database wins.
> > The rest has to jiggle their stuff into the imposed hierarchy.
[quoted text clipped - 5 lines]
> adding files, fields, functions, but it works just fine and again I'll have
> to think of how to make that perfectly clear.

I understand what you're saying - but as the number of departments (or even
job roles within a department) demanding different views of the data grows, I
believe that whatever vocabularies you layer on top, your "base" data design
tends toward normalized relations.

> As Wol has said, you can take any PICK database and view it as relational,
> but you can't go the other way around.  If you could, then this discussion
> would be moot -- we could just toggle between different perspectives on the
> data.

He did say that, and I've been thinking about it, and am not sure it's
accurate. The order of values in a list attribute in a Pick file seems
primarily to correlate with other attributes that relate to the same
"nested" entity - e.g. a line item. Those can easily be spit out in
correlated lists by foreign key traversal. Other ordering would have to be
imposed, and maybe that's where the discrepancy is. Relational requires that
if order is important, you make it an attribute. I've never found such to be
a problem - in most cases, orderings are pseudo-IDs.
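
A rough Python sketch of "make it an attribute", with invented data and
function names (`unnest`, `nest`). Correlated lists from a nested record
become rows carrying an explicit position, and the lists can be rebuilt from
the rows in any storage order:

```python
def unnest(order_id, items, quantities):
    """Turn two correlated lists into rows with an explicit position column."""
    if len(items) != len(quantities):
        raise ValueError("correlated lists must have the same length")
    return [
        (order_id, position, item, quantity)
        for position, (item, quantity) in enumerate(zip(items, quantities), start=1)
    ]


def nest(rows):
    """Rebuild the two lists; the position attribute supplies the ordering."""
    ordered = sorted(rows, key=lambda row: row[1])
    return [row[2] for row in ordered], [row[3] for row in ordered]


rows = unnest("O1", ["bolt", "nut"], [5, 7])
print(rows)

# Row order in a relation carries no meaning, so shuffle before rebuilding.
items, quantities = nest(list(reversed(rows)))
print(items, quantities)
```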

> Now, it is possible to design a relational database that can do that,
> but you have to design for it.  I would not be surprised if you had a
[quoted text clipped - 10 lines]
> to put two classification codes on this entity, you do so.  Overly
> simplified example, but ...

That's a good example, though... I'll have to give that some thought. The
question is whether any power is gained by using another relation, since
it's slightly more work; I'm assuming that the classification codes
themselves are stored in another relation/file, and thus you want some
referential integrity so nonexistent codes don't get entered...
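
As a sketch of what the extra relation buys, here is a Python example using
the standard `sqlite3` module. The table names, column names and codes are
assumptions chosen for illustration. With a foreign key declared, a
nonexistent classification code is rejected by the database itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE classification (code TEXT PRIMARY KEY)")
conn.execute(
    """
    CREATE TABLE entity_classification (
        entity_id TEXT NOT NULL,
        code      TEXT NOT NULL REFERENCES classification(code),
        PRIMARY KEY (entity_id, code)
    )
    """
)

conn.execute("INSERT INTO classification VALUES ('RETAIL')")
conn.execute("INSERT INTO entity_classification VALUES ('E1', 'RETAIL')")

try:
    # 'BOGUS' is not in the classification table, so this must fail.
    conn.execute("INSERT INTO entity_classification VALUES ('E1', 'BOGUS')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("bogus code rejected:", rejected)
```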

- Eric
Laconic2 - 11 Jun 2004 12:09 GMT
> I understand what you're saying - but as the number of departments (or even
> job roles within a department) demanding different views of the data grows, I
> believe that whatever vocabularies you layer on top, your "base" data design
> tends toward normalized relations.

I believe what you say.  I also believe that this is what databases are for:
sharing data between organizations that don't have a common view of the data
being shared.

Half of the databases being built today should have been built using file
systems.  It would have been faster and cheaper.
And there's no significant sharing being done.  It's all encapsulated in a
single subsystem.

> He did say that, and I've been thinking about it, and am not sure it's
> accurate. The order of values in a list attribute in a Pick file seems
[quoted text clipped - 4 lines]
> if order is important, you make it an attribute. I've never found such to be
> a problem - in most cases, orderings are pseudo-IDs.

Months ago,  I asked whether a pizza with pepperoni and onion was the same
as a pizza with onion and pepperoni.

I got several cute responses, but nobody really addressed the underlying
issue.  Sounds like you've got a handle on it.
Eric Kaun - 11 Jun 2004 21:39 GMT
> > I understand what you're saying - but as the number of departments (or
> > even
[quoted text clipped - 11 lines]
> And there's no significant sharing being done.  It's all encapsulated in a
> single subsystem.

I agree with you from one viewpoint; on the other hand, an RDBMS doing its
job (we have a bit of an employment gap in that regard) would also help your
design (not just data design); I'm thinking of the ability of a TRDBMS to
encode business rules. If that engine also had a significant client-side
presence (which it should!), you'd be doing design in a more general sense
than just "data design".

> Months ago,  I asked whether a pizza with pepperoni and onion was the same
> as a pizza with onion and pepperoni.
>
> I got several cute responses, but nobody really addressed the underlying
> issue.  Sounds like you've got a handle on it.

It's a slippery handle, but maybe - but be careful asking about "the same
as" in an OO context - that subject gets very confusing to OOers. :-)

A related and interesting issue is that of relation-valued attributes as
primary keys; for example, from one of Date's non-free papers, a relation
with a single column: a relation of siblings. Since in a relation order is
irrelevant, you couldn't insert the tuple ( {Eric, Curt, Amy} ) if the
relation already contained ( {Amy, Curt, Eric} ), for example. He did a
similar thing with prime factors; the relation consisted of 2 columns:
Integer and {Integer}.
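
The siblings case can be imitated in a few lines of Python, using a
`frozenset` as a stand-in for a relation-valued attribute. This is only an
analogy, not Date's notation, and the `insert` helper is hypothetical:

```python
def insert(relation, names):
    """Add a set-valued tuple; return False if an equal set is already there."""
    value = frozenset(names)
    if value in relation:
        return False
    relation.add(value)
    return True


siblings = set()
first = insert(siblings, ["Amy", "Curt", "Eric"])
second = insert(siblings, ["Eric", "Curt", "Amy"])  # same set, different order

print(first, second, len(siblings))

# The pizza question in the same terms: lists care about order, sets do not.
as_lists_equal = ["pepperoni", "onion"] == ["onion", "pepperoni"]
as_sets_equal = frozenset(["pepperoni", "onion"]) == frozenset(["onion", "pepperoni"])
print(as_lists_equal, as_sets_equal)
```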

Anyway, I'm rambling...

- erk
Laconic2 - 11 Jun 2004 23:07 GMT
> > Months ago,  I asked whether a pizza with pepperoni and onion was the same
> > as a pizza with onion and pepperoni.
[quoted text clipped - 14 lines]
>
> Anyway, I'm rambling...

I don't think it's rambling at all... It's precisely where I was heading
with the question.

There's a second question, along the same lines.

In the recent Pick example,  showing an invoice,  there's a list of account
numbers,  and a correlated list of amounts.
That is, the second amount "goes with" the second account number.  But, in
the earlier pizza pick example we had a list
of three toppings and an uncorrelated list of three cheeses.  Now my
question is this:  how the heck do you know that in one case the two lists
are correlated and in the other example they are uncorrelated?

Are you "just expected to know"  the logical structure of invoices and
pizzas enough to draw this inference?
Not that there aren't things you "just have to know"  in a schema of tables,
but the Pick people treat it as though it's "intuitively obvious".  Maybe to
an SME,  but maybe not to everybody else.
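
The ambiguity can be shown in Python. The values below are invented (the
toppings and cheeses from the earlier pizza example are not reproduced
here). Two lists of equal length look identical as data, yet they mean very
different things depending on whether they are read as correlated or not:

```python
from itertools import product

# Correlated: the n-th amount belongs to the n-th account number.
accounts = ["1010", "1020", "1050"]
amounts = [2500, 32500, 17525]
correlated = list(zip(accounts, amounts))

# Uncorrelated: every topping sits on the pizza with every cheese.
toppings = ["pepperoni", "onion", "mushroom"]
cheeses = ["mozzarella", "cheddar", "parmesan"]
uncorrelated = list(product(toppings, cheeses))

print(len(correlated))    # one pair per position
print(len(uncorrelated))  # every combination
```

Nothing in the lists themselves says which reading is intended; that
knowledge has to be recorded somewhere else.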
Bill H - 12 Jun 2004 02:36 GMT
Sir:

This is a good question.

> There's a second question, along the same lines.
>
[quoted text clipped - 11 lines]
> but the Pick people treat it as though it's "intuitively obvious".  Maybe to
> an SME,  but maybe not to everybody else.

The database side doesn't normally enforce this relationship (it could be
enforced with a trigger).  However, considering the number of business rules
associated with such a module, and the fact that the data is usually managed
from a single application, these rules are best kept in the application
code.  This is because the business person is much closer to the application
and database, and its tools.  The database nomenclature is not unique and
words mean what they've always meant (i.e. no one refers to a "row" or a
"column" when referencing a customer or a list of their outstanding
invoices).

The field definitions are where the descriptions of the field are kept.  Any
such relationships that exist (such as field#s 9 & 10 below) are also kept
in the field definition.  Again, it is not the database that enforces these
rules, it's the application.  You might see the following:

009 1010]1020]1050]1090
010 2500]32500]17525]15
011 9]12]33]34]35]36]37]38]39

in a customer record where:

009 - The G/L acct#s of recurring monthly billings (such as support fees).
010 - The recurring billing amount for each G/L acct#.
011 - The unpaid invoices still associated with this customer.

This is very usual and a single disk read gets the salient properties of the
customer record.  The dictionary for the G/L acct#s may be defined as being
the controlling field with a relationship to field# 10 while field# 10 is
dependent on field# 9.
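
A Python sketch of how an application might read such a record. The
`ASSOCIATIONS` table is a hypothetical stand-in for the dictionary entry
that names field 9 as controlling and field 10 as dependent; it is not real
Pick dictionary syntax, and "]" is treated simply as the separator shown
above:

```python
VALUE_MARK = "]"  # separator between values, as displayed in the record above

record = {
    9: "1010]1020]1050]1090",
    10: "2500]32500]17525]15",
    11: "9]12]33]34]35]36]37]38]39",
}

# Hypothetical metadata: controlling field -> list of dependent fields.
ASSOCIATIONS = {9: [10]}


def values(rec, field):
    """Split one multivalued field into its individual values."""
    return rec[field].split(VALUE_MARK)


def associated_rows(rec, controlling):
    """Pair each controlling value with the matching dependent values."""
    columns = [values(rec, controlling)]
    columns += [values(rec, dep) for dep in ASSOCIATIONS[controlling]]
    if len({len(col) for col in columns}) != 1:
        raise ValueError("associated fields must have the same number of values")
    return list(zip(*columns))


billing = associated_rows(record, 9)
unpaid = values(record, 11)

print(billing)
print(unpaid)
```

Note that the length check lives in this helper, which is application code;
any other program reading the record would need the same check.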

As you can tell, a well defined mvDbms application uses the field
definitions to describe the data (as it should be) and relationships with
other data (or other tables for that matter).  Naturally, the field
definitions are nothing more than data maintained in the database since
they're just data too.  :-)

Bill
Mikito Harakiri - 12 Jun 2004 03:04 GMT
> The field definitions are where the descriptions of the field are kept.  Any
> such relationships that exist (such as field#s 9 & 10 below) are also kept
[quoted text clipped - 4 lines]
> 010 2500]32500]17525]15
> 011 9]12]33]34]35]36]37]38]39

Is "]" the hole in the punch card that you store those records on? How do
you manage the space? Never mind, you could code it as 2 adjacent holes!
Gene Wirchenko - 13 Jun 2004 02:00 GMT
>Sir:
>
[quoted text clipped - 13 lines]
>> Are you "just expected to know"  the logical structure of invoices and
>> pizzas enough to draw this inference?

    From what Bill H wrote below, it appears he thinks so.

>> Not that there aren't things you "just have to know"  in a schema of
>> tables,
[quoted text clipped - 7 lines]
>from a single application, these rules are best kept in the application
>code.  This is because the business person is much closer to the application

    Every application module that deals with that relationship is
going to have to have that code.  If just one of them gets it wrong,
trouble.  If the rule changes, trouble.

    That is why it would be better to put it in the database.  Do it
once, and do it right.
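
    A small Python and SQLite sketch of "do it once", using made-up table
and function names. The rule that every G/L account in a recurring billing
has an amount is declared once in the schema, and a module that forgets it
is stopped by the database rather than by its own code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE recurring_billing (
        customer_id TEXT    NOT NULL,
        gl_account  TEXT    NOT NULL,
        amount      INTEGER NOT NULL,  -- the rule, stated once
        PRIMARY KEY (customer_id, gl_account)
    )
    """
)


def careful_module(db):
    # This module supplies both halves of the pair.
    db.execute("INSERT INTO recurring_billing VALUES ('C1', '1010', 2500)")


def careless_module(db):
    # This module forgets the amount; the schema catches it.
    db.execute(
        "INSERT INTO recurring_billing (customer_id, gl_account) VALUES ('C1', '1020')"
    )


careful_module(conn)
try:
    careless_module(conn)
    caught = False
except sqlite3.IntegrityError:
    caught = True

print("careless insert rejected:", caught)
```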

    I have an app where I do not have the integrity rules coded in
the database.  It is all in the application code.  It is biting me
very badly right now.  It made sense at the time (or rather more
accurately, it did not make as much non-sense at the time), but I am
certainly feeling it now.

    Now, when I go to change code of this sort, any that is in more
than one place is taking me a lot of time to change.

    It starts off being easy to make changes, and then it gradually
grows to the point where it is not so easy.  Then, it can get rather
awkward.

>and database, and its tools.  The database nomenclature is not unique and
>words mean what they've always meant (i.e. no one refers to a "row" or a
>"column" when referencing a customer or a list of their outstanding
>invoices).

    No one?  You are sure that it is impossible?  "This column is..."
or "This row has the subtotals for...".

[snip]

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Bill H - 14 Jun 2004 04:56 GMT
Gene:

Perhaps I was not specific enough.

> >> Are you "just expected to know"  the logical structure of invoices and
> >> pizzas enough to draw this inference?
>
>      From what Bill H wrote below, it appears he thinks so.

Not at all.  A field definition defines a field.  It also defines any
relationships between fields and multiple values within those fields.  So my
example of an A/P invoice with G/L accts and amounts would be defined as
being related.  As Dawn indicates one could reference the values singularly
(as pairs) or as a whole.

>      Every application module that deals with that relationship is
> going to have to have that code.  If just one of them gets it wrong,
> trouble.  If the rule changes, trouble.

No, they just have to know of the relationship, which is defined in the
field definition(s).

>      That is why it would be better to put it in the database.  Do it
> once, and do it right.

That's where the definition resides.

>      I have an app where I do not have the integrity rules coded in
> the database.  It is all in the application code.  It is biting me
> very badly right now.  It made sense at the time (or rather more
> accurately, it did not make as much non-sense at the time), but I am
> certainly feeling it now.

Understandably so.  Those kinds of constraints can be loaded into the
database with a trigger; if that's what one wants.

> >and database, and its tools.  The database nomenclature is not unique and
> >words mean what they've always meant (i.e. no one refers to a "row" or a
[quoted text clipped - 3 lines]
>      No one?  You are sure that it is impossible?  "This column is..."
> or "This row has the subtotals for...".

Well, no one within the management and administration group of the business.
:-)

Bill
Gene Wirchenko - 14 Jun 2004 17:04 GMT
[snip]

    Well, I may be having the chance to work with some of this
first-hand.  I have a job interview today, and the job description
included mention of a hierarchical DBMS.  It will be an interesting
contrast.

>> >and database, and its tools.  The database nomenclature is not unique and
>> >words mean what they've always meant (i.e. no one refers to a "row" or a
[quoted text clipped - 6 lines]
>Well, no one within the management and administration group of the business.
>:-)

    Come now.  "row" and "column" are ordinary English words.  Try
showing a child how to add multi-digit numbers without using the word
"column".  It is possible, but I submit that it is much easier to use
"column".  1s and 10s columns and all that.

Sincerely,

Gene Wirchenko

Eric Kaun - 14 Jun 2004 17:19 GMT
> Sir:
>
[quoted text clipped - 24 lines]
> from a single application, these rules are best kept in the application
> code.

This is, more than anything, the philosophical divide between relational and
Pick folks. The more rules, the more they should be kept OUT of the
application code. "Application" means just that: a judicious application. Of
what? Rules. Application != definition, just as implementation !=
specification.

> This is because the business person is much closer to the application
> and database, and its tools.

Their closeness is irrelevant; they should of course be given tools that let
them do their job. But encoding the rules in those tools, as opposed to
having those tools generated from and respectful of the rules, is a big
difference.

Granted that some rules should be configurable; that doesn't imply that all
should be. The business, after all, has (or needs!) some structure.

> The field definitions are where the descriptions of the field are kept.  Any
> such relationships that exist (such as field#s 9 & 10 below) are also kept
[quoted text clipped - 21 lines]
> definitions are nothing more than data maintained in the database since
> they're just data too.  :-)

While I see many examples like the above, can you give us an example of how
the dictionary defines those? What language do you use to define the
dictionary? Is it user-accessible?

- erk
Laconic2 - 14 Jun 2004 17:54 GMT
Was:  In an RDBMS, what does "Data" mean?

> This is, more than anything, the philosophical divide between relational and
> Pick folks. The more rules, the more they should be kept OUT of the
> application code. "Application" means just that: a judicious application. Of
> what? Rules. Application != definition, just as implementation !=
> specification.

It isn't just the Pick folks.  The OO folks also feel that the business
rules belong encapsulated inside the objects that "really know what's going
on",  as opposed to formalized as metadata and shared  the same way data is
shared.

In the days when databases were being spread to the old COBOL and files
gang,  this divide was called the difference between "process centric"  and
"data centric"  views of the world.  I think it's really the same divide,
over and over again.

It even happens within the RDBMS vendors.  I've been watching SQL gradually
evolve from a bad answer to the requirement for a "universal data
sublanguage" into a bad programming language, in its own right.
Bill H - 14 Jun 2004 19:41 GMT
Laconic2:

It's amazing how one discovers this when one gets a little more mature (in
age).  :-)

Bill

"Laconic2" <laconic2@comcast.net> wrote in message

> In the days when databases were being spread to the old COBOL and files
> gang,  this divide was called the difference between "process centric"  and
> "data centric"  views of the world.  I think it's really the same divide,
> over and over again.
Eric Kaun - 14 Jun 2004 21:01 GMT
Yes, you're right on all counts.

It seems that at some point in the history of computing, software developers
decided to traipse down the path of implementation, rather than the other
fork: declarative logic. Somehow thinking like a processor, juggling long
procedures and registers (objects), is deemed better than writing engines /
JVMs / compilers that take declarative statements and generate the necessary
procedures.

> Was:  In an RDBMS, what does "Data" mean?
>
[quoted text clipped - 19 lines]
> evolve from a bad answer to the requirement for a "universal data
> sublanguage" into a bad programming language, in its own right.
Dawn M. Wolthuis - 14 Jun 2004 22:36 GMT
> Was:  In an RDBMS, what does "Data" mean?
>
[quoted text clipped - 10 lines]
> on",  as opposed to formalized as metadata and shared  the same way data is
> shared.

Although then "they" (or is that "we"?) start spec'ing things as parms,
perhaps pulled into the OO code by way of xml documents.  You can send
inputs to pre-existing functions and expect outputs (declarative) and/or
write functions along with the inputs and outputs (procedural).  Any way you
cut it, there are functions that get executed.  Those can be written by the
Oracle corporation so that if you decouple your assets from the products of
that company you have less than an entire solution, or you can write the
functions in a language that doesn't give you quite the same ties to a
single corporation (such as Java -- and of course we could then argue the
merits of the Java approach, which doesn't interest me so much as discussing
the merits of the Oracle approach).  Or we could store rules/constraints
as data in the database --  declarative -- and then write functions for that
part of the application rather than having the database provide those --
procedural.

> In the days when databases were being spread to the old COBOL and files
> gang,  this divide was called the difference between "process centric"  and
> "data centric"  views of the world.  I think it's really the same divide,
> over and over again.

yup, definitely

> It even happens within the RDBMS vendors.  I've been watching SQL gradually
> evolve from a bad answer to the requirement for a "universal data
> sublanguage" into a bad programming language, in its own right.

So, how should we fix the situation or is declarative vs procedural a matter
of taste?  --dawn
mAsterdam - 15 Jun 2004 02:08 GMT
>>>This is, more than anything, the philosophical divide
>>>between relational and Pick folks. The more rules,
[quoted text clipped - 41 lines]
> declarative vs procedural a matter
> of taste?  --dawn

Anybody running an operation with a large shared databank,
in practice, has had to bridge this gap.
I haven't seen it done in theory yet, though.

Can we fix it? (in theory, that is) -- I am positive we can.
But it will take a lot of unlearning.
"They" and "We" do not help.
Eric Kaun - 15 Jun 2004 16:28 GMT
> So, how should we fix the situation or is declarative vs procedural a matter
> of taste?  --dawn

Declarative is better; there are enough different styles of declarative to
satisfy many (not all) of those who find, say, Prolog distasteful. It's
simply easier to lapse into procedural; you quickly find yourself in a
quagmire, but that apparently is a lesson not easily learned (even by those
who've been through one quagmire after another). It's that resistance to
learning abstraction that's made me somewhat less tolerant of bad code than
I used to be... the knowledge that in most cases, they'll just do the same
type of thing again. And I'm even less tolerant of my own bad code... but
until we use our procedural abilities to write "engines" that interpret
declarations, we'll keep writing spaghetti. The trouble, of course, is that
those focusing on procedural (and OO) generally don't see the value in
bothering to deal with declarations at all.
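
A minimal sketch of what such an "engine" could look like, written in Python purely for illustration. The rule format, the field names (accttype, balance) and the function name violations are all assumptions made up for this example; nothing here comes from a real product. The rules are plain data, and the only procedural code is the small interpreter that applies them.

```python
import operator

# Hypothetical rule format: (field, operator name, operand).
# The rules are data, so they could just as well live in a table.
RULES = [
    ("accttype", "in", ("A", "B", "C")),
    ("balance", ">=", 0),
]

# The only procedural part: how each operator name is evaluated.
OPERATORS = {
    "in": lambda value, operand: value in operand,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def violations(record, rules=RULES):
    """Return the rules that the record breaks, in declaration order."""
    broken = []
    for field, op, operand in rules:
        if not OPERATORS[op](record[field], operand):
            broken.append((field, op, operand))
    return broken


print(violations({"accttype": "A", "balance": 10}))   # nothing broken
print(violations({"accttype": "Z", "balance": -5}))   # both rules broken
```

Adding a rule means adding a line to RULES; the interpreter does not change.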

What we're trying to accomplish is basic logic and computation; the
restrictions of the languages we use, and the adherence to algorithmic
thinking, keep us from advancing very far.

And, of course, the above is all just hand-waving and generalities, though
generally true. I've just been debugging some horrific splicings of Java and
InstallAnywhere (a rotten package with a GUI and no language at all), and am
in a foul mood...

- erk
Dawn M. Wolthuis - 15 Jun 2004 17:56 GMT
> > So, how should we fix the situation or is declarative vs procedural a matter
[quoted text clipped - 21 lines]
> InstallAnywhere (a rotten package with a GUI and no language at all), and am
> in a foul mood...

Take a deep breath and then delegate ;-)

Both relational and declarative seem rather obvious choices to you and
neither does to me.  My issues with declarative include:

1) There seem to be no standards for the black box that does something with
the declarations.  I don't need standards-committees with years of process
to adopt a standard -- I don't even know anything that would help ensure
portability of such declarations.  SQL has come the closest and does have
standards, but we all know you can't just take any code you write against
one database and run it against another.

2) It doesn't read like English -- the verbs are missing, for example.  I'd
like to keep some of Grace Hopper's goal alive of writing code that human
beings can read

3) While hiding much that should be hidden, it "feels like" so much gets
hidden that people spend time trying to figure out how it does things in
order to be good at writing declarations

4) Invariably functions become one of the things to get specified.  If we
are going to specify both data and functions, which, after all, is what needs
to happen, then what benefit is there to specifying a function and
specifying where it is to be used rather than specifying the function and
then using it where it needs to be used?

It's all about data; It's all about functions.  --dawn
Laconic2 - 15 Jun 2004 18:49 GMT
> It's all about data; It's all about functions.  --dawn

Congratulations.  You've just reinvented LISP.
Laconic2 - 15 Jun 2004 18:55 GMT
> 2) It doesn't read like English -- the verbs are missing, for example.  I'd
> like to keep some of Grace Hopper's goal alive of writing code that human
> beings can read

This is where I agree with you.

This is where the "priesthood" consistently underestimates the "laity":
both their ability to understand "code" and the value of writing code they
can read.

I would say the same applies whether it's declarative or imperative.

> 3) While hiding much that should be hidden, it "feels like" so much gets
> hidden that people spend time trying to figure out how it does things in
> order to be good at writing declarations

That's because people are trying to be too clever by half.
Gene Wirchenko - 15 Jun 2004 22:01 GMT
[snip]

>Both relational and declarative seem rather obvious choices to you and
>neither does to me.  My issues with declarative include:
[quoted text clipped - 5 lines]
>standards, but we all know you can't just take any code you write against
>one database and run it against another.

    Let it be implemented in the manner that the implementer
determines is best for the given implementation.  Not having to deal
with the physical level helps considerably with abstraction.

>2) It doesn't read like English -- the verbs are missing, for example.  I'd
>like to keep some of Grace Hopper's goal alive of writing code that human
>beings can read

    Because it is not English?  French does not read like English
either.

    I can read a lot of code much more easily than English.  English
can be very ambiguous.

>3) While hiding much that should be hidden, it "feels like" so much gets
>hidden that people spend time trying to figure out how it does things in
>order to be good at writing declarations

    Typically, a waste of time.  Take variable declarations.  All I
need know is the behaviour of that type.  I do not need to know the
details of implementation in order to use the type.  If someone wants
to get to that level of detail, fine, but it is not required.

    I think it is a confusion of not knowing the appropriate theory
and thinking that examining the implementation will give that
information.  It will not.

>4) Invariably functions become one of the things to get specified.  If we
>are going to specify both data and functions, which, after all, is what needs
>to happen, then what benefit is there to specifying a function and
>specifying where it is to be used rather than specifying the function and
>then using it where it needs to be used?

    You do not get bit when the function[ality] is required in four
places and you put it in only three of them.  With declaration, the
programming system takes care of where.
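
One way to picture "the programming system takes care of where", sketched in Python. The class names OneOf and Account and the method reclassify are hypothetical, invented for this illustration. The rule is declared once on the class, and the language applies it on every assignment, so there is no fourth call site to forget.

```python
class OneOf:
    """Hypothetical declarative field: the value must be one of the allowed choices."""

    def __init__(self, *allowed):
        self.allowed = allowed

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        # Runs on every assignment to the field, wherever it happens.
        if value not in self.allowed:
            raise ValueError(f"{self.name} must be one of {self.allowed}, got {value!r}")
        obj.__dict__[self.name] = value


class Account:
    accttype = OneOf("A", "B", "C")   # declared once

    def __init__(self, accttype):
        self.accttype = accttype      # checked here

    def reclassify(self, new_type):
        self.accttype = new_type      # and here, with no explicit call to a checker
```

With a hand-written check function, each of those assignments would need its own call, and a missed call would be a silent hole.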

[snip]

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Tony - 16 Jun 2004 11:30 GMT
> 2) It [declarative code] doesn't read like English -- the verbs are missing, for example.  I'd
> like to keep some of Grace Hopper's goal alive of writing code that human
> beings can read

So why not add "Ensure that " or similar in front of the rule?
Laconic2 - 16 Jun 2004 14:10 GMT
was: One Ring to Bind them

> > 2) It [declarative code] doesn't read like English -- the verbs are missing, for example.  I'd
> > like to keep some of Grace Hopper's goal alive of writing code that human
> > beings can read
>
> So why not add "Ensure that " or similar in front of the rule?

You are right, Tony.  At first I agreed with Dawn's statement,  because I
want to keep that goal alive, as well.

But the verbs aren't missing from "declarative" sentences.  The declarative
mood is a feature of a verb.
A declarative sentence has a verb.  It's not an imperative verb,  but it's a
verb.
Tony Douglas - 16 Jun 2004 12:10 GMT
<snip>

> Take a deep breath and then delegate ;-)
>
[quoted text clipped - 7 lines]
> standards, but we all know you can't just take any code you write against
> one database and run it against another.

Well, I'm not sure about that; if you like Prolog style
declarativeness, then that's already subject to an ISO standard.
Alternatively, if you like Haskell, then the Haskell Standards
Committee does a good job of keeping that in good order. So, if either
of those is used for declarative constraints then yes, you should be
able to port around. (Although you might like to have an argument
about the "declarativeness" of Prolog when cut is used.)

> 2) It doesn't read like English -- the verbs are missing, for example.  I'd
> like to keep some of Grace Hopper's goal alive of writing code that human
> beings can read

This is no bad thing - what makes English excellent for writing poetry
and novels makes it hopeless for writing any sort of formal prose -
either systemised for computers or systemised for lawyers.

> 3) While hiding much that should be hidden, it "feels like" so much gets
> hidden that people spend time trying to figure out how it does things in
> order to be good at writing declarations

I agree with the other poster - this is because people are trying to
be too clever for their own good.

> 4) Invariably functions become one of the things to get specified.  If we
> are going to specify both data and functions, which, after all, is what needs
> to happen, then what benefit is there to specifying a function and
> specifying where it is to be used rather than specifying the function and
> then using it where it needs to be used?

I'm a little confused by this point; care to expand on this ?

> It's all about data; It's all about functions.  --dawn

Cheers,

- Tony
Eric Kaun - 16 Jun 2004 15:45 GMT
> Take a deep breath and then delegate ;-)

Unfortunately, I am the delegate...

> Both relational and declarative seem rather obvious choices to you and
> neither does to me.

I don't expect everyone to agree with me; I'll just point out that I've
never worked in academia, and my opinion (however wrong) is based strictly
on commercial work (some internal for the company, some for resale to
customers) in the manufacturing, media, and print industries.

> My issues with declarative include:
>
[quoted text clipped - 4 lines]
> standards, but we all know you can't just take any code you write against
> one database and run it against another.

True enough; SQL is so convoluted, and the standard so large, that it's
difficult to implement. Contrast with the J2EE spec which, while also large
(and overwritten in Sun's usual verbose style, writing 100 pages where 10
would do), is implemented fairly quickly by commercial and open-source
vendors within 6 months of its release. SQL is atrocious.

That said, the standard for the black box should be a coherent spec, which
implies the need for a coherent language.

> 2) It doesn't read like English -- the verbs are missing, for example.  I'd
> like to keep some of Grace Hopper's goal alive of writing code that human
> beings can read

I think that dream is dead. English is a poor basis for automation, and
defining a useful subset would get us into even murkier water than the c.d.t
glossary. Nonetheless, relational offers a more useful basic structure (the
predicate, which is a sentence) than Pick/MV, which aspires to nouns.
Objects also aspire to be nouns, leaving verbs and sentences "encapsulated",
whatever that means (I know what it means, but doubt its utility in all
things).

> 3) While hiding much that should be hidden, it "feels like" so much gets
> hidden that people spend time trying to figure out how it does things in
> order to be good at writing declarations

A valid criticism. I could chalk much of that up to the operationally-minded
education of most programmers ("we're going to teach you to program just
like a compiler writes machine code!"), but it is true that it's harder
initially. I just think it would pay off, and that the learning curve, while
somewhat steep, has dividends (and not just in the long term; perhaps in the
"medium" range).

> 4) Invariably functions become one of the things to get specified.  If we
> are going to specify both data and functions, which, after all, is what needs
> to happen,

But I draw a distinction between specifying a function and implementing it
operationally. I can code a function in Java (even though I have to attach
it to a class), but contrast that with any algebraic specification of a
function; maybe something like Prolog, where you are simply defining terms
using input patterns (pardon the gross oversimplification). True, you may
have to optimize later; but you don't have to decide on the optimization (or
on a sub-optimal algorithm) early.

> then what benefit is there to specifying a function and
> specifying where it is to be used rather than specifying the function and
> then using it where it needs to be used?

I'm not sure what you mean here. If you're arguing for the separation of
data and function, I tend to agree; however, that's a very non-OO position
to take... is that what you intended?

- erk
Dawn M. Wolthuis - 16 Jun 2004 16:20 GMT
<snip>
<snip>
> My issues with declarative include:
> >
[quoted text clipped - 14 lines]
> That said, the standard for the black box should be a coherent spec, which
> implies the need for a coherent language.

One issue ties back into the UI used by a developer for specifying anything.
We aren't just talking about languages that are typed in, but some drawn
with boxes, some spec'd with drop-down boxes, etc, so that the data
collected for a specification (whether declarative or not) is related to the
proprietary UI.  Somehow we need to get not only a language captured, but
also an IDE (bad word for me), sort of.  So, there are standards being
developed for IDEs.  Whether a declarative or procedural or OO language
flows from the UI, the user (developer) ends up specifying data that gets
stored by the toolset as well as generating anything that might be needed.
That was said in a convoluted way, but my point is that language standards
are not enough to protect my investment as soon as I opt for a tool from
Oracle or IBM or MS or whomever.

> > 2) It doesn't read like English -- the verbs are missing, for example.  I'd
[quoted text clipped - 8 lines]
> whatever that means (I know what it means, but doubt its utility in all
> things).

and the others who spoke up were right that a declarative language (such as
SQL) does have declarations that are like sentences (SELECT this, that FROM
arelation) while a language that is spec'd -- options chosen -- is a set of
parms (kinda RPG-like for those who don't think that means role-playing
games).  And I do agree that English as we speak it is not the goal, but
just like the Palm Pilot taught us to write so it could understand, I do
think that is a reasonable way to approach a language.  COBOL has its charm.
Java, along with other modern languages, is unnecessarily cryptic to the
seasoned IT professional picking up the language for a first time.

> > 3) While hiding much that should be hidden, it "feels like" so much gets
> > hidden that people spend time trying to figure out how it does things in
[quoted text clipped - 3 lines]
> education of most programmers ("we're going to teach you to program just
> like a compiler writes machine code!"),

I'd argue that it is not just the instruction, but the nature of people that
we think in terms of what to do in what order.

> but it is true that it's harder
> initially.  I just think it would pay off, and that the learning curve, while
> somewhat steep, has dividends (and not just in the long term; perhaps in the
> "medium" range).
[quoted text clipped - 7 lines]
> operationally. I can code a function in Java (even though I have to attach
> it to a class)

or upset the OO purists and write a class that IS a function ;-)

> , but contrast that with any algebraic specification of a
> function; maybe something like Prolog, where you are simply defining terms
> using input patterns (pardon the gross oversimplification). True, you may
[quoted text clipped - 8 lines]
> data and function, I tend to agree; however, that's a very non-OO position
> to take... is that what you intended?

No, rather the opposite -- data and functions are two sides of the same
coin.  So, clearly I wasn't writing clearly.  My point is that you either
write a function directly in procedural or OO code or you spec them and spec
when they are to be used -- 6 of one, half-dozen of the other.  Having seen
so many languages without trying to collect them over my career, I have not
yet seen one that makes an order of magnitude break-through in productivity
for the developer.  It seems to me that Java and OO languages had a good
chance of providing large chunks for reuse, but they aren't there.  It isn't
a silver bullet, but I'm thinking the services strategy, which is language
independent, has a chance at helping to get some of those bigger gains.
Knock on wood.

--dawn
Gene Wirchenko - 16 Jun 2004 17:42 GMT
[snip]

>That was said in a convoluted way, but my point is that language standards
>are not enough to protect my investment as soon as I opt for a tool from
>Oracle or IBM or MS or whomever.

    Particularly so when some companies consider it their duty to
break standards.

[snip]

>and the others who spoke up were right that a declarative language (such as
>SQL) does have declarations that are like sentences (SELECT this, that FROM
>arelation) while a language that is spec'd -- options chosen -- is a set of

    A select is not a declaration. A declaration would be, for
example, a constraint:
         accttype in ("A","B","C")
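
As a rough illustration of the difference, here is the same rule written as a SQL CHECK constraint and run through Python's sqlite3 module. The table name account and the column acctno are made up for the example; only accttype and its three values come from the post. The declaration limits which rows may exist, while the SELECT merely asks which rows do exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The declaration: a statement about every row the table may ever hold.
conn.execute(
    "CREATE TABLE account ("
    " acctno INTEGER PRIMARY KEY,"
    " accttype TEXT NOT NULL CHECK (accttype IN ('A', 'B', 'C')))"
)

conn.execute("INSERT INTO account VALUES (1, 'A')")

# A row that contradicts the declaration is refused by the engine.
try:
    conn.execute("INSERT INTO account VALUES (2, 'Z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The query: a request for rows, not a rule about them.
rows = conn.execute("SELECT acctno, accttype FROM account").fetchall()
print(rejected, rows)
```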

>parms (kinda RPG-like for those who don't think that means role-playing
>games).  And I do agree that English as we speak it is not the goal, but

    Help, help!  Acronym poisoning!  For a few minutes, I was trying
to figure out what Rocket-Propelled Grenades had to do with the
situation.  Taking out bad implementations?

    Then, I realised you probably meant the language whose acronym
expands to "Report Program Generator".

>just like the Palm Pilot taught us to write so it could understand, I do
>think that is a reasonable way to approach a language.  COBOL has its charm.

    There are rules.  Some are explicit, some are not.  Some help,
some hinder.

>Java, along with other modern languages, is unnecessarily cryptic to the
>seasoned IT professional picking up the language for a first time.

    I do not like the corner cases.  The switch statement in C and
C-derived languages is a nearly useless thing that puts the corner
case of falling through a case onto a pedestal.  Yuck!

[snip]

>No, rather the opposite -- data and functions are two sides of the same
>coin.  So, clearly I wasn't writing clearly.  My point is that you either
>write a function directly in procedural or OO code or you spec them and spec
>when they are to be used -- 6 of one, half-dozen of the other.  Having seen

    Maybe by declaration.

>so many languages without trying to collect them over my career, I have not
>yet seen one that makes an order of magnitude break-through in productivity
[quoted text clipped - 3 lines]
>independent, has a chance at helping to get some of those bigger gains.
>Knock on wood.

    Or a compatible substitute?  <g>

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Bill H - 16 Jun 2004 17:54 GMT
Eric:

[snipped]

> English is a poor basis for automation, and
> defining a useful subset would get us into even murkier water than the c.d.t
[quoted text clipped - 3 lines]
> whatever that means (I know what it means, but doubt its utility in all
> things).

...a more useful basic structure?  Some solutions, yes; some solutions, no;
some solutions, debatable.  Most competing systems have unique benefits and
this is no different.  The question: is the application the best place to
store business rules in a language understandable to those defining those
rules?

You state no (I think).  I state yes!  There is a tremendous cost advantage
for my position, but as I've said before, cost is not an issue in all
scenarios.  The view that the RDM is more useful is tautological.  In other
words, a primary axiom is: it is.  So one has no alternative but to conclude
so.

There is a huge disjoint here that encourages this ongoing debate.  Many MV
developers work with other RDBMS products too.  They're exposed to two
different environments and don't like the inefficiency of the RDM that
requires decomposition of business objects into a language that is not
understandable.  This creates uncertainty, additional cost, and additional
instability.  On a limited budget this is generally unacceptable.

On the other hand, there are good reasons, from my perspective, to use the
RDM.  Tools, ease of programming, rich graphical environment, decomposition
and recomposition that's completely hidden (the black box syndrome).

> > 3) While hiding much that should be hidden, it "feels like" so much gets
> > hidden that people spend time trying to figure out how it does things in
[quoted text clipped - 6 lines]
> somewhat steep, has dividends (and not just in the long term; perhaps in the
> "medium" range).

I program in several procedural languages, enhanced BASIC, Javascript, HTML,
VB and as little C{whatever} as I can.  BASIC is by far the easiest and the
one everyone I work with can read and understand.  However, it is not the
most useful in each circumstance.

It can always be said that we can learn a new language.  But what does it
bring to the table?  Is it worth it?  English is the most useful language in
the world because it is the primary language of international business.
This says nothing about its usefulness locally in Germany, Egypt, China, etc
(which is minimal).

> > 4) Invariably functions become one of the things to get specified.  If we
> > are going to specify both data and functions, which, after all, is what
[quoted text clipped - 8 lines]
> have to optimize later; but you don't have to decide on the optimization (or
> on a sub-optimal algorithm) early.

These are areas where I would want to defer to IT professionals.  All I'd
like is to use the function to accomplish my local business task; as Dawn
noted, a date function.  The date is stored in the database internally but
the functions work miracles extracting the various pieces of a date.

In the MV model the dbms contains numerous functions to allow data
extraction.  This is the benefit with the dbms being both a database and an
application server and a development environment.  These functions become
part of the data when applications are developed.

I think this is the largest misunderstanding in this thread.  We're talking
past each other when the RDM and the MVM are compared.  The MVM includes, as
I noted before, both an application server and development environment.  The
RDM is more constrained by design.

> > then what benefit is there to specifying a function and
> > specifying where it is to be used rather than specifying the function and
[quoted text clipped - 3 lines]
> data and function, I tend to agree; however, that's a very non-OO position
> to take... is that what you intended?

Again, we're talking past each other.  The MVM includes the application
server so I think this explains why some think it appropriate to include
functions in the database, because the function call is stored in the
database.

Bill
Chris Hoess - 18 Jun 2004 04:04 GMT
> Eric:
>
[quoted text clipped - 22 lines]
> words, a primary axiom is: it is.  So one has no alternative but to conclude
> so.

Huh? I think you've missed Eric's point. As he said, the relational model
is based on predicates: sentences which make a single, descriptive
assertion. I confess that I don't fully understand the correspondence
between Pick internals and parts of speech, but the point is clear
nonetheless: our task in compiling databases, as I see it, is to be able
to accurately describe relevant parts of the world and draw inferences from
those descriptions. It is from this premise that Eric's conclusion
follows: the best model for making descriptions of things is that whose
basic unit is description.

I'm not 100% sure how it follows that storing constraints in the database
of necessity results in a disadvantage in costs. As I just posted in
another followup here, the constraints themselves are easily accessible to
the applications given a good system catalog. For that matter, it doesn't
follow that the constraints must be presented exactly as represented
internally in the database. It seems to me that it should be possible to
make a 1:1 mapping of some terse, "programmer-friendly" language into a
more English-like statement of the constraint, suitable for checking by
"domain experts".

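A small sketch of both halves of that idea, using Python and SQLite only because they are easy to run. The first half reads a declared constraint back out of SQLite's catalog table, sqlite_master. The second half renders a terse rule as an English-like sentence; the structured rule form, the TEMPLATES table and the to_english function are hypothetical, and the structured rule is written by hand here rather than parsed from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (accttype TEXT CHECK (accttype IN ('A', 'B', 'C')))"
)

# The catalog keeps the text of the table definition, so an application
# can read the declared constraint instead of duplicating it.
(ddl,) = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'account'"
).fetchone()
print(ddl)

# Hypothetical terse form of a rule, with one English template per operator.
TEMPLATES = {
    "in": "Ensure that {field} is one of {operand}.",
    ">=": "Ensure that {field} is at least {operand}.",
}


def to_english(field, op, operand):
    """Render a terse (field, op, operand) rule as a sentence for review."""
    if isinstance(operand, (tuple, list)):
        operand = ", ".join(str(value) for value in operand)
    return TEMPLATES[op].format(field=field, operand=operand)


print(to_english("accttype", "in", ("A", "B", "C")))
```

Because each operator has exactly one template, the English text is a rendering of the rule rather than a second copy of it that could drift.
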
>> > 3) While hiding much that should be hidden, it "feels like" so much gets
>> > hidden that people spend time trying to figure out how it does things in
>> > order to be good at writing declarations

Is this universal, or is this just a by-product of SQL's redundancy and
the need for SQL query optimizers? (e.g. can we build a system where all
logically equivalent statements in this declarative language run in more
or less the same amount of time, and if so, will that stop people from
trying to outguess the compiler?)

> These are areas where I would want to defer to IT professionals.  All I'd
> like is to use the function to accomplish my local business task; as Dawn
[quoted text clipped - 5 lines]
> application server and a development environment.  These functions become
> part of the data when applications are developed.

I'll have to go back home and look, but I wonder if much of this would be
soluble by TTM's elevation of views to first-class relational citizens, so
to speak.


Chris Hoess

Anthony W. Youngman - 19 Jun 2004 00:24 GMT
>> You state no (I think).  I state yes!  There is a tremendous cost advantage
>> for my position, but as I've said before, cost is not an issue in all
[quoted text clipped - 11 lines]
>follows: the best model for making descriptions of things is that whose
>basic unit is description.

And I think that you've missed Bill's point. Predicates describe
relational data. So Eric's point holds. But relational ASSUMES that
relational data can be used to describe the real world - it's an axiom.
Bill doesn't think that that holds in the real world, and I don't either,
which is why I asked the original question that started all this. Logic
(ie predicates) is great for showing that a position is self-consistent.
It is useless for showing that that position is relevant or useful.

>I'm not 100% sure how it follows that storing constraints in the database
>of necessity results in a disadvantage in costs. As I just posted in
[quoted text clipped - 5 lines]
>more English-like statement of the constraint, suitable for checking by
>"domain experts".

But are you storing your constraints as a trigger? (Which I would
sort-of consider an application in itself.)

As I understand "the database", in order to store constraints in the
database, you must store the constraints as *meta*data. Which means your
end-user developers (programmer, dba, whatever) MUST be able to program
*inside* the db engine so it can recognise metadata. At which point we
get user-defined data types and your relational database has started
down the road of mutating into an object database :-)

Actually, I've just reread what you wrote. Do you mean "constraint" as
in a relational constraint - foreign-key type stuff; or as a general
term for enforcing integrity. I was thinking the latter, hence my
reference to triggers, but I suspect you might be meaning the former. If
you did mean the former, Pick doesn't have them because it achieves the
same effect as a side-effect of its implementation. So the fact that
relational needs them is de-facto a hindrance relative to Pick.

Cheers,
Wol

Tony Douglas - 21 Jun 2004 11:46 GMT
<snip>

> And I think that you've missed Bill's point. Predicates describe
> relational data. So Eric's point holds. But relational ASSUMES that
[quoted text clipped - 3 lines]
> (ie predicates) is great for showing that a position is self-consistent.
> It is useless for showing that that position is relevant or useful.

Well, that's that darn "closed world assumption" again. Famously, you
may assert that "the King of France is bald". But as far as I know no
automatic logic system can tell that that's a total fib - unless you
want to wire your systems up to Google or Yahoo, in which case you've
abandoned any sense of a logical basis for what you're up to.
Consistency is (I will hedge and say probably) the best you can
achieve - correctness is beyond any automated logic system I'm aware
of. As an aside, if this isn't good enough for you, what would you
prefer to base your database systems on ? Intuition ? Appeals to
authority ? Artist's impressions ?

> But are you storing your constraints as a trigger? (Which I would
> sort-of consider an application in itself.)

Oh god, here we go with storing things and triggers. How dully
procedural ;)

> As I understand "the database", in order to store constraints in the
> database, you must store the constraints as *meta*data. Which means your
> end-user developers (programmer, dba, whatever) MUST be able to program
> *inside* the db engine so it can recognise metadata.

Ummm, no. The constraints would go in the catalogues, so they just
appear as data too. What do you mean by "*inside* the db engine" ?
Even if/when I get the source code to a DBMS server, I wouldn't expect
to be updating that to add constraints !
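
For illustration only, a minimal sketch of the point that a declared constraint ends up as ordinary, queryable catalogue data. It assumes Python's built-in sqlite3 module as a stand-in DBMS; the table `person` and its columns are hypothetical, and other products lay out their catalogues differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK clause is the constraint under discussion.
conn.execute(
    "CREATE TABLE person ("
    " id INTEGER PRIMARY KEY,"
    " age INTEGER CHECK (age >= 0))"
)

# SQLite keeps its catalogue in the sqlite_master table, which is read
# with an ordinary SELECT, just like user data.
catalogue_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'person'"
).fetchone()[0]
print(catalogue_sql)

# The engine enforces what the catalogue records; no engine code was changed.
try:
    conn.execute("INSERT INTO person (id, age) VALUES (1, -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("negative age rejected:", rejected)
```

The constraint was added with a declaration, and the only place it lives afterwards is a row in a catalogue table.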

> At which point we get user-defined data types and your relational database
> has started down the road of mutating into an object database :-)

Umm, no not really - it would just be turning into a relational
database. Bit of a non sequitur though (constraints -> metadata ->
user defined types ?)

> Actually, I've just reread what you wrote. Do you mean "constraint" as
> in a relational constraint - foreign-key type stuff; or as a general
[quoted text clipped - 3 lines]
> same effect as a side-effect of its implementation. So the fact that
> relational needs them is de-facto a hindrance relative to Pick.

I would like to know your differential between a "relational
constraint - foreign-key type stuff" and "a general term for enforcing
integrity". How do you partition your constraints into those
categories ? Personally I view a constraint as a boolean expression that
must never evaluate to false. Some are called "general constraints" or
"assertions" because they can refer to arbitrary combinations of
columns from tables, others as "base table" or "column" constraints
because they refer to one particular table. (In addition to the most
fundamental constraint of course - that of declaring the type of each
column.)
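
A rough sketch of that view, assuming a toy in-memory "database" in Python rather than any real DBMS: each constraint is a boolean function of the database state, and the state is acceptable only while none of them evaluates to false. Every table, column and function name here is hypothetical.

```python
from typing import Callable, Dict, List

# A toy "database": table name -> list of rows (dicts). All names are hypothetical.
Database = Dict[str, List[dict]]
Constraint = Callable[[Database], bool]

def age_not_negative(db: Database) -> bool:
    # Column constraint: looks at one column of one table.
    return all(row["age"] >= 0 for row in db["person"])

def person_id_unique(db: Database) -> bool:
    # Base-table constraint: looks at one table as a whole.
    ids = [row["id"] for row in db["person"]]
    return len(ids) == len(set(ids))

def every_car_has_known_owner(db: Database) -> bool:
    # General constraint (assertion): spans two tables.
    owners = {row["id"] for row in db["person"]}
    return all(row["owner_id"] in owners for row in db["car"])

CONSTRAINTS: List[Constraint] = [
    age_not_negative, person_id_unique, every_car_has_known_owner,
]

def is_consistent(db: Database) -> bool:
    # The database is acceptable only if no constraint evaluates to False.
    return all(check(db) for check in CONSTRAINTS)

good = {"person": [{"id": 1, "age": 40}], "car": [{"reg": "ABC", "owner_id": 1}]}
bad = {"person": [{"id": 1, "age": 40}], "car": [{"reg": "XYZ", "owner_id": 2}]}
print(is_consistent(good), is_consistent(bad))
```

On this reading the three categories differ only in how much of the database each expression mentions, not in kind.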

> Cheers,
> Wol

Cheers !

- Tony
Anthony W. Youngman - 25 Jun 2004 23:28 GMT
><snip>
>
[quoted text clipped - 16 lines]
>prefer to base your database systems on ? Intuition ? Appeals to
>authority ? Artist's impressions ?

What would I prefer to base my database systems on? Well, actually, I'd
like to base them on science. On some evidence (which by its very
nature, must be experimental and statistical) that says it's actually
relevant to the real world.

Anything that relies solely on logic (whether automated or not) is
useless. If we relied solely on logic then both Aristotle and Galileo
would be right, as would Ptolemy and Copernicus (and with hindsight we
would laugh BOTH the latter two out of court, despite BOTH of them
having impeccable logic. Because we have "experimental" evidence that
tells us their models are irrelevant. "correctness" doesn't come into
it). Logic merely shows that your theories are self-consistent. But what
do you do when you have TWO theories, both of which are logical and
self-consistent, but are mutually inconsistent? If I took your argument
at face value, I would have to believe both ...

>> But are you storing your constraints as a trigger? (Which I would
>> sort-of consider an application in itself.)
[quoted text clipped - 11 lines]
>Even if/when I get the source code to a DBMS server, I wouldn't expect
>to be updating that to add constraints !

That was why I made the comment about "user-defined types". If you
define something as an integer, it is enforced by the database. If you
define something as "someone's age", it cannot be negative, and it can
have the values "unknown" and "dead".

So we now have the position that either you do some of your
"type-constraint"ing inside the database and some outside, or you have
to have some way of pushing the validation "inside" the database.
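
As a sketch of what such a user-defined type might look like, here is a hypothetical `Age` type in Python. It is not how any particular DBMS defines types; it only shows the validation rules from the paragraph above (never negative, plus the special values "unknown" and "dead") bundled into the type itself.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Age:
    """Hypothetical domain type: a non-negative number of years, or a special state."""
    years: Optional[int] = None
    state: str = "known"   # one of "known", "unknown", "dead"

    def __post_init__(self):
        if self.state not in ("known", "unknown", "dead"):
            raise ValueError(f"invalid state: {self.state!r}")
        if self.state == "known":
            if self.years is None or self.years < 0:
                raise ValueError("a known age must be a non-negative integer")
        elif self.years is not None:
            raise ValueError("unknown/dead ages carry no number of years")

UNKNOWN = Age(state="unknown")
DEAD = Age(state="dead")

print(Age(years=42), UNKNOWN, DEAD)
```

Whether this check runs inside the engine or in the application is exactly the choice described above; the rule itself is the same either way.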

>> At which point we get user-defined data types and your relational database
>> has started down the road of mutating into an object database :-)
[quoted text clipped - 21 lines]
>fundamental constraint of course - that of declaring the type of each
>column.)

I've mentioned it elsewhere, but I see two types of integrity. What I
call "natural law" and "statute law". Statute law says you can't have a
car without an owner, but either can cease to exist without affecting
the existence of the other. That was the general term - the latter of
the two I was thinking of. Natural law says the existence of one depends
on the existence of the other - you can't have an invoice detail without
having an invoice for it to be part of. I should have said "enforcing
integrity between tables". In that case, MV simply allocates the same
primary key to both - if the primary key goes, everything else goes with
it :-)

And MV doesn't constrain the type of each column :-) I'd like to be able
to do something like that, actually, but I'd rather validate than
constrain :-) Is it relational that enforces strong typing, or just
current implementations? Either way, I think it's wrong. But MV is
untyped and that's as bad the other way :-) - I would love the *ability*
to enforce typing, I just don't think it should be *mandatory*.

Cheers,
Wol

mAsterdam - 26 Jun 2004 13:10 GMT
> I would love the *ability* to enforce typing,
> I just don't think it should be *mandatory*.

With mandatory type enforcement, what prevents
you from using types that are not very restrictive?
IOW: Mandatory soup isn't eaten as hot as it is served
(extended dutch saying).
Anthony W. Youngman - 05 Jul 2004 22:26 GMT
>> I would love the *ability* to enforce typing,
>> I just don't think it should be *mandatory*.
[quoted text clipped - 3 lines]
>IOW: Mandatory soup isn't eaten as hot as it is served
>(extended dutch saying).

You mean declaring stuff as "variant"? Fine!

I just think of my Fortran days when I *chose* to use the "declare all
variables" switch. I just think that enforcing strict typing is as bad
as no typing at all (and "variant" is a nice middle ground - "this
variable is explicitly untyped" :-)

Cheers,
Wol

Tony Douglas - 21 Jun 2004 15:07 GMT
<snip>

> Actually, I've just reread what you wrote. Do you mean "constraint" as
> in a relational constraint - foreign-key type stuff; or as a general
[quoted text clipped - 3 lines]
> same effect as a side-effect of its implementation. So the fact that
> relational needs them is de-facto a hindrance relative to Pick.

Additionally, with regards to your last point - how does Pick have
"foreign-key type stuff" as a "side-effect of its implementation" ? Is
this down to its multi-valuedness ? What happens if you use Pick in
faux-relational mode - do you just lose this kind of constraint ?

> Cheers,
> Wol

- Tony
Anthony W. Youngman - 25 Jun 2004 23:42 GMT
><snip>
>
[quoted text clipped - 10 lines]
>this down to its multi-valuedness ? What happens if you use Pick in
>faux-relational mode - do you just lose this kind of constraint ?

If by "faux relational", you mean splitting a normal-form FILE into a
bunch of first-normal-form FILEs, then yes, we do lose this constraint
(in MV mode, anyway. Any modern MV will let you declare a relational
constraint, but you are now invoking a load of code (and overhead) to do
what would have happened naturally).

Because, in MV, a "cell" can itself contain a "column", imagine the
"colour" column for a car. It can contain a list of colours, and
deleting a car's "row" will take out the list of colours. In relational,
that list would be in a different table and would require a constraint,
that would effectively have to do a select followed by a multi-row
delete.
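
A small side-by-side sketch of the two shapes being compared. The MV record is only approximated with a nested Python dict, and the relational side uses Python's built-in sqlite3 module; the `car` and `car_colour` names are made up for the example.

```python
import sqlite3

# MV-style sketch: the colours live inside the car's own record.
cars_mv = {"CAR1": {"make": "Mini", "colours": ["red", "white"]}}
del cars_mv["CAR1"]            # one delete removes the car and its colours

# Relational sketch: colours sit in a second table, tied by a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE car (reg TEXT PRIMARY KEY, make TEXT)")
conn.execute(
    "CREATE TABLE car_colour ("
    " reg TEXT NOT NULL REFERENCES car(reg) ON DELETE CASCADE,"
    " colour TEXT NOT NULL,"
    " PRIMARY KEY (reg, colour))"
)
conn.execute("INSERT INTO car VALUES ('CAR1', 'Mini')")
conn.executemany(
    "INSERT INTO car_colour VALUES (?, ?)",
    [("CAR1", "red"), ("CAR1", "white")],
)

# The declared constraint makes the engine remove the dependent rows.
conn.execute("DELETE FROM car WHERE reg = 'CAR1'")
remaining = conn.execute("SELECT COUNT(*) FROM car_colour").fetchone()[0]
print(remaining)
```

In both cases the caller issues a single delete; the difference under discussion is how much work the engine does behind it.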

So while we need a transaction mechanism to update an accounts system
because we need to update the bank, the customer file, the general
ledger, and other stuff besides in one hit; we do NOT need a transaction
mechanism to eg delete a car, because in MV, "delete car" is atomic (I'm
being a little unfair here because we ought to update owner and do a few
other things as well, which might need a transaction). But the point is,
if it's atomic in the real world, it should be atomic in an MV database.
A relational database has to assume it's not atomic, because 9 times out
of 10 normalisation means it can't be.

Cheers,
Wol

Tony Douglas - 18 Jun 2004 18:09 GMT
<snip>

> ...a more useful basic structure?  Some solutions, yes; some solutions, no;
> some solutions, debatable.  Most competing systems have unique benefits and
[quoted text clipped - 7 lines]
> words, a primary axiom is: it is.  So one has no alternative but to conclude
> so.

And I will state categorically "no". To paraphrase my friend Roy, "a
database is for life - applications are for Christmas". There can't be
a cost advantage for many when the applications are more mobile than the
data underneath, resulting in reimplementing the same logic over and
over (in Cobol, or C, or J2EE, or .Net, or whatever the next fad will
be). And if you have to change the constraints in the database, that
either means that the business you're dealing with has changed (which
is fair enough) or you missed something in your model (which isn't,
really).

Could I refer you to Roy's recent presentation at CA World 2004 and
the UK Ingres Users Association on Constraints for Performance ? It is
quite Ingres specific, but it may provide food for thought. It's
available on http://www.rationalcommerce.com/resources/constraints.htm.

> There is a huge disjoint here that encourages this ongoing debate.  Many MV
> developers work with other RDBMS products too.  They're exposed to two
> different environments and don't like the inefficiency of the RDM that
> requires decomposition of business objects into a language that is not
> understandable.  This creates uncertainty, additional cost, and additional
> instability.  On a limited budget this is generally unacceptable.

So, can I paraphrase as "we don't want to use relational, because we
disapprove of the perceived inefficiency of implementations, and
because we don't like the way relational modelling handles our data".

> On the other hand, there are good reasons, from my perspective, to use the
> RDM.  Tools, ease of programming, rich graphical environment, decomposition
> and recomposition that's completely hidden (the black box syndrome).

But then, "we like the fact that there are lots of nice bits and bobs
to paper over the bits we don't like" ?

> I program in several procedural languages, enhanced BASIC, Javascript, HTML,
> VB and as little C{whatever} as I can.  BASIC is by far the easiest and the
> one everyone I work with can read and understand.  However, it is not the
> most useful in each circumstance.

It is my firm (and hardening) view that the imperative model of
programming, with its silly word/record at a time view of the world
and reliance on fiddling with variables, is the source of the majority
of the programming world's ills. I think it is simply bizarre that in
the 21st century we are still being encouraged to think in terms of
updatable cells of storage and simplistic kiddie steps when far higher
levels of abstraction are readily available. This is one of my two
main bugbears with TTM; that it rejects declarative / applicative /
referentially transparent models of programming - so although it's
much better in terms of handling relations, in terms of programming,
operator definition etc. it's just more of the same old stuff.

> It can always be said that we can learn a new language.  But what does it
> bring to the table?  Is it worth it?  English is the most useful language in
> the world because it is the primary language of international business.
> This says nothing about its usefulness locally in Germany, Egypt, China, etc
> (which is minimal).

Hmmmmmmmm ! Depending on the language you're using, how about
provability ? Executable specifications ? No more worrying about race
conditions (if you don't have shared memory, how can you have race
conditions ?) ? Handling logically infinite data structures ? Type
inference ? Simpler programming by case analysis ?

> > > 4) Invariably functions become one of the things to get specified.  If we
[quoted text clipped - 10 lines]
> > (or on a sub-optimal algorithm) early.

To reply to Eric's point in passing - or, you could simply execute the
specification ... :)

> These are areas where I would want to defer to IT professionals.  All I'd
> like is to use the function to accomplish my local business task; as Dawn
> noted, a date function.  The date is stored in the database internally but
> the functions work miracles extracting the various pieces of a date.

Side question : are these *dbms* functions, or are they operators
defined on values of the date type ? There is a difference, and it
*is* an important difference...

> In the MV model the dbms contains numerous functions to allow data
> extraction.  This is the benefit with the dbms being both a database and an
> application server and a development environment.  These functions become
> part of the data when applications are developed.

Well, we have to be clear here; which functions are we talking about -
operators on data types (such as the date functions mentioned above)
which are independent of any given application or functions specific
to some particular application ?

> I think this is the largest misunderstanding in this thread.  We're talking
> past each other when the RDM and the MVM are compared.  The MVM includes, as
> I noted before, both an application server and development environment.  The
> RDM is more constrained by design.

The inclusion of an application server and/or a development
environment are implementation decisions - there's nothing in the
relational data model to say you can't do the same thing in an
implementation of an RDBMS if you wanted to. Necessarily RDM doesn't
prescribe anything about that AS or IDE.

<snip>

Anyway, it's 6 o'clock on Friday evening - it's time to be in the pub,
not writing on Google !!!

Cheers !

- Tony
Bill H - 24 Jun 2004 03:26 GMT
> > "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message
news:<40d07bda$1_7@corp.newsgroups.com>...

> > The question: is the application the best place to
> > store business rules in a language understandable to those defining those
[quoted text clipped - 15 lines]
> is fair enough) or you missed something in your model (which isn't,
> really).

Why would you think these same issues are less in a client/server model?  If
the rules are kept in the client application they're spread out everywhere
and are far more difficult to update and maintain.  In addition, in the
client world there are a lot more kinds of languages du jour available to
cause this very difficulty you've identified.

If the application were moved to an application server this eliminates a lot
of issues present in a client/server model.  If, however, a dbms included an
application server the application would, by definition, be included in the
database.   So, I think your comments, although proper for a RD model are
not so for other models.

> > There is a huge disjoint here that encourages this ongoing debate.  Many MV
> > developers work with other RDBMS products too.  They're exposed to two
[quoted text clipped - 6 lines]
> disapprove of the perceived inefficiency of implementations, and
> because we don't like the way relational modelling handles our data".

You can paraphrase if you'd like.  :-)  I would, however, note that I
specifically stated  the RD model is perfectly useful at times.  I don't
agree it is useful at _all_ times and I think there are perfectly useful and
adequate alternatives that don't adhere to relational axioms.  I just happen
to think it is useful to keep things as simple as possible and
decomposition/recomposition creates complexity and, for me anyway,
confusion.  I am not, unfortunately, a rocket scientist and a business mogul
all in one.

> > On the other hand, there are good reasons, from my perspective, to use the
> > RDM.  Tools, ease of programming, rich graphical environment, decomposition
> > and recomposition that's completely hidden (the black box syndrome).
>
> But then, "we like the fact that there are lots of nice bits and bobs
> to paper over the bits we don't like" ?

Lawyers see the world through legalistic terms...everything seems to be a
legal conflict resolvable via dispute resolution.  I tend to appreciate a
broader perspective (even though I suffer from the same human condition).
The tools and other "...nice bits and bobs..." aren't associated with the RD
model (they aren't part of it).

> It is my firm (and hardening) view that the imperative model of
> programming, with its silly word/record at a time view of the world
[quoted text clipped - 7 lines]
> much better in terms of handling relations, in terms of programming,
> operator definition etc. it's just more of the same old stuff.

I prefer a more flexible view of the world.  I'm not looking for the next
"Theory of Relativity".  :-)  I'm simply looking for more clarity and ease
of use.

> > It can always be said that we can learn a new language.  But what does it
> > bring to the table?  Is it worth it?  English is the most useful language in
[quoted text clipped - 7 lines]
> conditions ?) ? Handling logically infinite data structures ? Type
> inference ? Simpler programming by case analysis ?

And there you have some points to bring to the collective table.  :-)

> > These are areas where I would want to defer to IT professionals.  All I'd
> > like is to use the function to accomplish my local business task; as Dawn
[quoted text clipped - 4 lines]
> defined on values of the date type ? There is a difference, and it
> *is* an important difference...

It is important if that's the structure or rules under which we're
operating.  If, on the other hand, a model exists where the functions are
both dbms and user defined this has some advantages too.  Of course the dbms
functions are application independent but more functions should be able to
be added.

> > I think this is the largest misunderstanding in this thread.  We're talking
> > past each other when the RDM and the MVM are compared.  The MVM
[quoted text clipped - 7 lines]
> implementation of an RDBMS if you wanted to. Necessarily RDM doesn't
> prescribe anything about that AS or IDE.

They are implementation decisions if it is defined as such.  Some dbms
products are also application servers, so there is no such decision to make
(except with the purchase).  This makes web development pretty simple but
client/server development using SQL more difficult.

Bill
Marshall Spight - 27 Jun 2004 02:04 GMT
> I prefer a more flexible view of the world.  I'm not looking for the next
> "Theory of Relativity".  :-)  I'm simply looking for more clarity and ease
> of use.

The defining characteristic of the next "Theory of Relativity" will
be the huge increase in clarity and ease of use it brings.

Marshall
Bill H - 27 Jun 2004 22:02 GMT
Marshall:

If we look at this through a statistical perspective we'll note that the
"Theory of Relativity" comes about once in every (pick your high number).
Such rigidity of focus isn't, therefore, required for most of our business
tasks and can have a deleterious effect on potential solutions.

It is a great attribute of human nature that so many people can come up with
so many unique ways of solving, what else, so many business problems.  I
would suggest these unique ways be embraced instead of laughed at because
they don't meet a narrow solution model.  :-)

Bill

> > I prefer a more flexible view of the world.  I'm not looking for the next
> > "Theory of Relativity".  :-)  I'm simply looking for more clarity and ease
[quoted text clipped - 4 lines]
>
> Marshall
Marshall Spight - 27 Jun 2004 22:56 GMT
> If we look at this through a statistical perspective we'll note that the
> "Theory of Relativity" comes about once in every (pick your high number).
> Such rigidity of focus isn't, therefore, required for most of our business
> tasks and can have a deleterious effect on potential solutions.

What makes you think I'm rigidly focused? In fact, I work on business
tasks for most of the week; I occasionally dabble in theory on the
weekends. All work and no play makes Jack etc. Most of my work
is done with Java, SQL, and HTML; I think that qualifies me
pretty well as someone who can make compromises for practical
business realities.

I would also assert that "required for most of our business tasks"
is not the defining characteristic of this group; otherwise it
would be called comp.databases.businesstasks. Since it's
comp.databases.theory, I think focus (whether rigid or not) on
the next "theory of relativity" is quite on-topic.

Neither is "required for ... business tasks" a filter through
which to live one's life. Getting stuff done is good, but so is
looking up at the stars. You can't have a balanced life
without both, and more still.

> It is a great attribute of human nature that so many people can come up with
> so many unique ways of solving, what else, so many business problems.

Again, business problems are only a portion of the scope of data
management.

> I would suggest these unique ways be embraced instead of laughed at because
> they don't meet a narrow solution model.  :-)

I didn't hear any laughing. Nor do I subscribe to a narrow solution model.

Marshall
Marshall Spight - 27 Jun 2004 02:01 GMT
> It is my firm (and hardening) view that the imperative model of
> programming, with its silly word/record at a time view of the world
[quoted text clipped - 7 lines]
> much better in terms of handling relations, in terms of programming,
> operator definition etc. it's just more of the same old stuff.

Hear hear! Bravo! Bravo!

Please, tell me what is your *other* main bugbear with TTM.

Marshall
Tony Douglas - 02 Jul 2004 10:59 GMT
> Hear hear! Bravo! Bravo!
>
> Please, tell me what is your *other* main bugbear with TTM.

My other main bugbear with TTM was over the type system, but I must
admit I'm softening my line on that, though possibly not for the normal
reason. I felt that the facility to have multiple possible
representations for a type only served to create complications,
without adding a lot to the party. I still feel that way, purely from
the programming language point of view - but in terms of logical &
physical independence over time, as implementations of the types might
change in the database, a facility along these lines is probably
necessary - otherwise, changing a type representation would require a
fair bit of work altering values of that type in the database. Still
not a great fan of type hierarchies though, although I can grudgingly
accept some rationales for them. (Interestingly, I think Alphora 2
drops type hierarchies as well.)


> Marshall

Cheers,

- Tony
Marshall Spight - 03 Jul 2004 01:59 GMT
> > Hear hear! Bravo! Bravo!
> >
[quoted text clipped - 5 lines]
> representations for a type only served to create complications,
> without adding a lot to the party.

It doesn't seem hugely useful to me, either. If you consider the
canonical example of the CartesianPoint(int x, int y) vs. the
PolarPoint(float theta, float r), using the one constructor with
the other implementation is annoyingly expensive. You have a
leaky abstraction problem, in that you can't get away from the
fact that your underlying type-implementation has performance
consequences for the choice of interface.

Alternatively, you could have a type Point, and types
CartesianPoint <: Point, and PolarPoint <: Point, which
all support the same set of methods/operators/what
have you. You still have the leaky abstraction problem
with interface calls; getting the radius is much easier with
PolarPoint than with CartesianPoint; getting the X or Y
attributes is easier the other way. But it seems like
less of an issue even so. I suppose one could also
declare that constructing Point is done with one or
the other Point subclass.
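
The subtype arrangement could be sketched roughly as follows. This is an assumption-laden illustration in Python, not TTM's or any product's type system: the class and method names are invented, and floats are used throughout for simplicity.

```python
import math
from abc import ABC, abstractmethod

class Point(ABC):
    """Common interface; the class and method names are illustrative."""
    @abstractmethod
    def x(self) -> float: ...
    @abstractmethod
    def y(self) -> float: ...
    @abstractmethod
    def r(self) -> float: ...
    @abstractmethod
    def theta(self) -> float: ...

class CartesianPoint(Point):
    def __init__(self, x: float, y: float):
        self._x, self._y = x, y
    def x(self): return self._x                        # stored: cheap
    def y(self): return self._y
    def r(self): return math.hypot(self._x, self._y)   # computed: dearer
    def theta(self): return math.atan2(self._y, self._x)

class PolarPoint(Point):
    def __init__(self, theta: float, r: float):
        self._theta, self._r = theta, r
    def r(self): return self._r                        # stored: cheap
    def theta(self): return self._theta
    def x(self): return self._r * math.cos(self._theta)  # computed: dearer
    def y(self): return self._r * math.sin(self._theta)

p = CartesianPoint(3.0, 4.0)
q = PolarPoint(p.theta(), p.r())
print(p.r(), round(q.x(), 9), round(q.y(), 9))
```

Callers see one interface, but which calls are cheap still depends on the subclass chosen, which is the leak described above.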

> Still
> not a great fan of type hierarchies though, although I can grudgingly
> accept some rationales for them.

What about polymorphism? Polymorphism is the best thing about
OO, and a good thing in general. You can't have subtype
polymorphism without subtypes.

Marshall
Anthony W. Youngman - 18 Jun 2004 18:21 GMT
>Was:  In an RDBMS, what does "Data" mean?
>
[quoted text clipped - 10 lines]
>on",  as opposed to formalized as metadata and shared  the same way data is
>shared.

Yes, but relational formalises metadata INTO data. Once it's in an RDBMS
it's no longer metadata, because the rdbms doesn't understand any
meaning in it and can't take advantage of that meaning so it's just
data.

The ordering in a list is metadata. Convert that into a set to put into
an rdbms and ORDER is now just a meaningless (as far as the db engine is
concerned) bit of data.
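
To make the list-versus-set point concrete, a tiny Python sketch (the item names are invented): in a list the order is carried by the structure, while in a set of tuples it has to be written down as one more attribute.

```python
# An ordered list: the position of each item is implicit in the structure.
order_lines = ["widget", "gadget", "sprocket"]

# As a set of tuples, position has to become an explicit attribute.
as_relation = {(position, item) for position, item in enumerate(order_lines, start=1)}

# The original sequence is recoverable only by sorting on that attribute.
rebuilt = [item for _, item in sorted(as_relation)]
print(rebuilt == order_lines)
```

Nothing is lost in the conversion; what changes is whether the order lives in the structure or in a column.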

That's where MV and OO fundamentally differ. They try to *avoid*
converting metadata to data, so that the db engine can be intelligent
and take advantage of it to optimise things.

Cheers,
Wol

Tony - 19 Jun 2004 13:00 GMT
> The ordering in a list is metadata. Convert that into a set to put into
> an rdbms and ORDER is now just a meaningless (as far as the db engine is
[quoted text clipped - 3 lines]
> converting metadata to data, so that the db engine can be intelligent
> and take advantage of it to optimise things.

Very funny!  How can MV optimise ANYTHING given that you have already
determined the access paths for all the data?  Suppose we want to get
the data out ordered by product code instead of by line number within
order, which is how we chose to store it.  How can MV optimise for
that?  An RDBMS may choose a different access path to get the data
depending on the ORDER BY clause.
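
One way to watch this happen, sketched with Python's built-in sqlite3 module as a stand-in RDBMS. The `order_line` table and `idx_product` index are hypothetical, and the plan text differs between SQLite versions, so no expected plan is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_line ("
    " order_no INTEGER, line_no INTEGER, product_code TEXT,"
    " PRIMARY KEY (order_no, line_no))"
)
conn.execute("CREATE INDEX idx_product ON order_line (product_code)")
conn.executemany(
    "INSERT INTO order_line VALUES (?, ?, ?)",
    [(1, 1, "P30"), (1, 2, "P10"), (2, 1, "P20")],
)

def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN reports the access path the planner picked;
    # the human-readable detail is the last column of each row.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)

by_line = "SELECT * FROM order_line ORDER BY order_no, line_no"
by_product = "SELECT * FROM order_line ORDER BY product_code"
print(plan(by_line))
print(plan(by_product))

products = [row[2] for row in conn.execute(by_product)]
print(products)
```

The two queries differ only in their ORDER BY clause; comparing the two printed plans shows whether the planner chose a different path for each.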
Bill H - 24 Jun 2004 03:37 GMT
> Very funny!  How can MV optimise ANYTHING given that you have already
> determined the access paths for all the data?

The default access path is the storage algorithm, managed by the db engine.
It is debatable whether this is an optimization.

> Suppose we want to get
> the data out ordered by product code instead of by line number within
> order, which is how we chose to store it.  How can MV optimise for
> that?

It will optimize how the applications people tell it to.

> An RDBMS may choose a different access path to get the data
> depending on the ORDER BY clause.

The access path is always the same as the default, unless otherwise
specified as noted above.  Output may or may not be ordered BY whatever.
Marshall Spight - 27 Jun 2004 02:21 GMT
> Yes, but relational formalises metadata INTO data. Once it's in an RDBMS
> it's no longer metadata, because the rdbms doesn't understand any
> meaning in it and can't take advantage of that meaning so it's just
> data.

This is just fundamentally wrong. It's so pervasively wrong it's almost
hard to know where to start.

Okay, simple example: foreign key. A foreign key relationship is
metadata. DBMSs record foreign key values, and they also record
foreign key relationships, in a table in the catalog. The database
knows what this metadata means, and may take advantage of this
knowledge in deciding how to store the data.

Those tables that form the catalog are the tables that the DBMS
understands the meaning of "out of the box." (If you include no
metadata when you add a user-defined table, then it could indeed
be said that the DBMS doesn't understand the meaning of the
new tables. Such as when someone using MySQL builds a
database without specifying any integrity constraints. But
that's pathological.)

Perhaps I misunderstand, but MV has only the one kind of
relationship it is capable of understanding: containment.
Yes, it understands the meaning of this, so it knows what
a one-to-many relationship means. Does it have any other
facilities for understanding meaning? It can do
ON DELETE CASCADE but can it do ON DELETE
RESTRICT? Can it handle and understand many-to-many
relationships? Can it understand and enforce arbitrary
constraints a la SQL's CHECK? (These are actual questions,
not rhetorical ones; I'm not that familiar with MV.)
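
For readers less familiar with the SQL side of those questions, a minimal sketch of ON DELETE RESTRICT and CHECK using Python's built-in sqlite3 module. The `owner` and `car` tables are invented for the example, and it says nothing about what MV products can or cannot do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE car ("
    " reg TEXT PRIMARY KEY,"
    " owner_id INTEGER NOT NULL REFERENCES owner(id) ON DELETE RESTRICT,"
    " doors INTEGER CHECK (doors BETWEEN 2 AND 5))"
)
conn.execute("INSERT INTO owner VALUES (1, 'Alice')")
conn.execute("INSERT INTO car VALUES ('CAR1', 1, 4)")

def is_rejected(sql: str) -> bool:
    # True when the engine refuses the statement on integrity grounds.
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# RESTRICT: an owner who still has a car cannot be deleted.
restrict_blocks_delete = is_rejected("DELETE FROM owner WHERE id = 1")
# CHECK: an arbitrary boolean condition on the row.
check_blocks_insert = is_rejected("INSERT INTO car VALUES ('CAR2', 1, 9)")
print(restrict_blocks_delete, check_blocks_insert)
```

Both rules are declared once with the table and enforced by the engine on every statement, whichever application issues it.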

Marshall
Laconic2 - 27 Jun 2004 11:27 GMT
> > Yes, but relational formalises metadata INTO data. Once it's in an RDBMS
> > it's no longer metadata, because the rdbms doesn't understand any
[quoted text clipped - 3 lines]
> This is just fundamentally wrong. It's so pervasively wrong it's almost
> hard to know where to start.

Excellent!
Laconic2 - 27 Jun 2004 11:44 GMT
> Perhaps I misunderstand, but MV has only the one kind of
> relationship it is capable of understanding: containment.
> Yes, it understands the meaning of this, so it knows what
> a one-to-many relationship means. Does it have any other
> facilities for understanding meaning?

I don't know much about MV, either.  What I've read in here reminds me of
LISP.   Only in the sense that there are lots of pointers, everything is
a tree, and every value can be replaced by a subtree.

If that's correct,  then I would suggest that there is another relationship
that MV can understand:  sequence.
Sequence is inherent in a list.  Actually, the combination of sequence and
containment is quite powerful.  Almost powerful enough to constitute the
basis for a database system!

So near and yet so far.
Bill H - 27 Jun 2004 22:43 GMT
Marshall:

> Perhaps I misunderstand, but MV has only the one kind of
> relationship it is capable of understanding: containment.

I'm not sure why it is so difficult to express this concept.  An MV
environment is both a data store and an application server.  It is _NOT_ an
RD model.  To discuss its attributes strictly from a datastore perspective
is neither fair nor accurate.  To understand its methodologies for directly
solving business problems requires the willingness to work with its two
functions: storage and application properties/methods/rules/etc.

Secondly, solving business problems requires a great deal of flexibility.  A
non-relational model can, in a number of instances, provide additional
flexibility over and above what the RD model can.  Not because the RD model
is incapable, but because the RD model declares for itself particular
limitations and methods of operation.  This structure doesn't always work
ideally.  Do I understand RD proponents to declare that it does in all
circumstances?

This is fine.  Why is it we can't allow that our own prejudices create
limitations on the ability to formalize solutions?  It takes Godel's
Incompleteness Theorem to declare that some certainties have gotten too big
for their britches.  It can happen to anybody.  :-)

For years HP calculators used RPN (reverse polish notation) instead of
standard algebraic entry mode (AEM).  Nowadays they offer both.  Does this
mean RPN is worthless or worse than AEM?  No.  Many people prefer RPN.  Is
AEM more often used?  Of course, but that doesn't say anything about RPN or
those who prefer to use it.  The same can be said about the RD model.  Not
everyone prefers it.

> Yes, it understands the meaning of this, so it knows what
> a one-to-many relationship means. Does it have any other
[quoted text clipped - 4 lines]
> constraints a la SQL's CHECK? (These are actual questions,
> not rhetorical ones; I'm not that familiar with MV.)

Of course it does, and can.  Remember it is both a datastore and an
application environment wrapped into one.  So, whatever needs to be done can
be done.  It is simply that this additional functionality is stored in the
datastore too.  Most of the MV products can even understand and cope with
SQL functionality.

Like I said before, this is not to say that the MV model does
everything...as nothing can.  But it provides an interesting confluence of
tools and capabilities that render the model very useful in solving business
problems for many people and businesses.

Bill
Marshall Spight - 28 Jun 2004 00:03 GMT
> > Perhaps I misunderstand, but MV has only the one kind of
> > relationship it is capable of understanding: containment.
[quoted text clipped - 5 lines]
> solving business problems requires the willingness to work with its two
> functions: storage and application properties/methods/rules/etc.

I read that paragraph a bunch of times, but it didn't seem to
address my statement that MV has only one kind of relationship
it is capable of understanding. Does it have relationships besides
containment that it can understand? An example of a non-containment
relationship would be cool, if the example does not require hand-written
application code to work.

I think the MV and the RM world divide things up very differently.
I will note first that "storage" is not a first-tier property of RM,
but it is a useful, second tier function that most products support
and that most applications take advantage of. It is perfectly reasonable,
and useful, to have an RDBMS that does not persist its relations.
We could still call this an RDBMS, but we couldn't call it a "datastore."

Another example is managing data integrity in procedural application
code. In RM this is considered a "stupid database trick" to quote from
another thread. There are significant disadvantages to application-managed
integrity rules, to the point where I do not consider it an approach
worth discussing (and yes, I've used that approach in the real world.)
However, it may be that this approach has lower overhead in situations
where you have small development teams and single-application databases.

> Secondly, solving business problems requires a great deal of flexibility.  A
> non-relational model can, in a number of instances, provide additional
> flexibility over and above what the RD model can.

Please be specific. I am very interested in specific examples of specific
operations or structures that you feel are hard to solve with RM or SQL
and easy to solve with MV. I do believe there are some, but I want
to know what they are better. As it stands I have a hard time evaluating
the claims of the MV people, even the smart/nice ones such as you
and Dawn. I'm not saying I believe, and I'm not saying I disbelieve.
I just want to hear more specifics.

> Not because the RD model
> is incapable, but because the RD model declares for itself particular
> limitations and methods of operation.  This structure doesn't always work
> ideally.  Do I understand RD proponents to declare that it does in all
> circumstances?

I don't know how to measure the idealness of a solution, so I have
no particular claims about whether the RM is ideal or not.

> For years HP calculators used RPN (reverse polish notation) instead of
> standard algebraic entry mode (AEM).  Nowadays they offer both.  Does this
> mean RPN is worthless or worse than AEM?  No.  Many people prefer RPN.  Is
> AEM more often used?  Of course, but that doesn't say anything about RPN or
> those who prefer to use it.  The same can be said about the RD model.  Not
> everyone prefers it.

The problem with this analogy is that there is a simple one-to-one mapping
between AEM and RPN. It is easy to show that the two methods are
equivalent. I do not believe the RM and MV have such a mapping,
nor do they support the same operations nor structures.
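
To make "simple mapping" concrete, here is a minimal sketch of a mechanical
translation from algebraic (infix) entry to RPN, using the standard
shunting-yard approach. The function names are made up for illustration, and
only + - * / and parentheses are handled.

```python
import operator

# Hypothetical helper names; only + - * / and parentheses are supported.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def infix_to_rpn(tokens):
    """Convert a list of infix tokens to a list of RPN tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Left-associative: pop operators of higher or equal precedence.
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the "("
        else:
            output.append(tok)  # operand
    while stack:
        output.append(stack.pop())
    return output


def eval_rpn(tokens):
    """Evaluate RPN tokens with a simple operand stack."""
    stack = []
    for tok in tokens:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(float(tok))
    return stack[0]
```

Both notations describe the same computation; the translation needs no
knowledge of what the expression is for.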

> > Yes, it understands the meaning of this, so it knows what
> > a one-to-many relationship means. Does it have any other
[quoted text clipped - 8 lines]
> application environment wrapped into one.  So, whatever needs to be done can
> be done.

If by this you mean that you can implement these features by
hand-writing application code, then I don't consider that any
achievement. I can say the same thing about some Java code and
a hashtable, but it's not a good solution.

For example, in another thread someone said (over and over
if I remember correctly :-) that if you delete an invoice, all
the line items go with it, automatically. Okay, this is the same
thing as ON DELETE CASCADE. But sometimes you want
ON DELETE RESTRICT. (In other words, if you want to
delete a container but it is still containing something, you
have to dispose of the contained things first; you can't just
throw them away.) Can you do this declaratively in MV? How
is it done?
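
For comparison, here is a minimal sketch of what the declarative version
looks like on the SQL side, using Python's built-in sqlite3 module. The
invoice and line_item tables are made up for illustration; this says nothing
about how an MV product would do it.

```python
import sqlite3


def delete_invoice(on_delete):
    """Delete an invoice that still has one line item.

    on_delete is "CASCADE" or "RESTRICT".
    Returns (deleted, remaining_line_items).
    """
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE invoice (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE line_item ("
        " id INTEGER PRIMARY KEY,"
        " invoice_id INTEGER NOT NULL"
        f" REFERENCES invoice(id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO invoice (id) VALUES (1)")
    conn.execute("INSERT INTO line_item (id, invoice_id) VALUES (10, 1)")
    try:
        conn.execute("DELETE FROM invoice WHERE id = 1")
        deleted = True
    except sqlite3.IntegrityError:
        # The DBMS refused the delete; no application code was involved.
        deleted = False
    remaining = conn.execute("SELECT COUNT(*) FROM line_item").fetchone()[0]
    conn.close()
    return deleted, remaining
```

With "CASCADE" the line item disappears along with the invoice; with
"RESTRICT" the delete is rejected and the line item is still there.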

Can it handle many-to-many? I've heard some people say it
can, but is integrity enforced automatically, or is it just done
with references that are application managed?
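
As a point of reference for "enforced automatically", this is roughly how a
many-to-many relationship is declared in SQL: a linking table whose rows must
refer to existing rows on both sides. A minimal sketch with Python's sqlite3
module; the payment and invoice tables are hypothetical.

```python
import sqlite3


def make_payment_db():
    """Build a tiny schema with a many-to-many link table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(
        """
        CREATE TABLE payment (id INTEGER PRIMARY KEY);
        CREATE TABLE invoice (id INTEGER PRIMARY KEY);
        CREATE TABLE payment_invoice (
            payment_id INTEGER NOT NULL REFERENCES payment(id),
            invoice_id INTEGER NOT NULL REFERENCES invoice(id),
            PRIMARY KEY (payment_id, invoice_id)
        );
        INSERT INTO payment (id) VALUES (1), (2);
        INSERT INTO invoice (id) VALUES (100), (101);
        """
    )
    return conn


def link_is_accepted(conn, payment_id, invoice_id):
    """Try to record that a payment covers an invoice."""
    try:
        conn.execute(
            "INSERT INTO payment_invoice (payment_id, invoice_id) VALUES (?, ?)",
            (payment_id, invoice_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

One payment can cover several invoices and one invoice can be covered by
several payments, but a link to an invoice that does not exist, or a
duplicate link, is rejected by the DBMS itself.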

Can it *automatically* enforce declared integrity constraints?
Can you have an integer attribute and declare that it must
always be divisible by 4? Is that enforced by auditing your
application code and manually inserting a check at each
place the attribute is updated, or is it enforced by declaring
the constraint centrally? Does the constraint have a hole in
it if you add a new place the attribute can change and forget
to put the %4 check in?
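
Here is what "declaring the constraint centrally" means for the divisible-by-4
case, as a minimal sketch with Python's sqlite3 module. The widget table and
its column n are invented for the example.

```python
import sqlite3


def make_widget_db():
    """Create a table whose column n must always be divisible by 4."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE widget ("
        " id INTEGER PRIMARY KEY,"
        " n INTEGER NOT NULL CHECK (n % 4 = 0))"
    )
    return conn


def try_sql(conn, sql, params=()):
    """Run a statement; report whether the DBMS accepted it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

The rule is stated once, in the table definition. Any INSERT or UPDATE that
would break it fails, including ones written later by someone who has never
heard of the rule.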

> Like I said before, this is not to say that the MV model does
> everything...as nothing can.

I don't think I agree. For example, Java, C++, and BASIC are
all able to compute anything that can be computed. They do
everything that can be done; no programming language of the
future can ever do anything more. (Which is not the same thing
as saying there is no room for improvement:
FORTRAN < C < C++ < Java < {OCaml, Haskell}, IMHO. But these are
usability and expressivity issues, not computability issues; we
need to be clear on the distinction.)

> But it provides an interesting confluence of
> tools and capabilities that render the model very useful in solving business
> problems for many people and businesses.

This is not so much what is under discussion in this newsgroup.
I will readily acknowledge that many people use MV to do useful
work, and that they solve business problems, and that they
enjoy themselves doing so. The on-topic question is the theoretical
basis for the tools. Are they complete? Are they correct? Are they
self-consistent?

SQL is relationally complete, over its lame type system. It could
really use a better type system. This will make it more usable but
it won't make it any more complete. SQL is already really good
at automatically enforcing integrity; it's a real strong point. OTOH,
it's not so good at ease-of-use, and could really stand to improve.
I suspect MV is much better at ease of use and worse at enforcing
integrity. Understanding why and where one is better and one is
worse will help us better use our own systems, evaluate others,
and also to build the next generation. In this respect, I think Dawn
and I are engaged in exactly the same exploration, although we
come at it from different backgrounds.

Marshall
Dawn M. Wolthuis - 30 Jun 2004 02:13 GMT
> > > Perhaps I misunderstand, but MV has only the one kind of
> > > relationship it is capable of understanding: containment.
[quoted text clipped - 12 lines]
> relationship would be cool, if the example does not require hand-written
> application code to work.

I'm not sure whether this answers your question as it depends on what you
mean by "relationship" but here is another type of relationship -- each
file(function/entity) requires a unique identifier for each record
(instance/row-ish) so that you have this relationship for a file named
People, for example

People(12345)={all attributes of this person including those stored directly
as part of the People function and those derived via links to other
functions}

Another type of relationship it understands is a link placed in a "virtual
field" for derived data.  So, even if the street address for People(12345)
is not part of the "base relation" (is not stored "in" People), the function
to link a foreign key to another file is a relationship that is understood.
So, once that virtual field is defined, I can ask the database to

List People Name Address

> I think the MV and the RM world divide things up very differently.
> I will note first that "storage" is not a first-tier property of RM,
> but it is a useful, second tier function that most products support
> and that most applications take advantage of. It is perfectly reasonable,
> and useful, to have an RDBMS that does not persist its relations.
> We could still call this an RDBMS, but we couldn't call it a "datastore."

Yes -- if you remove the storage feature of MV, you get something very close
to XML.  So, if you add it back in, you have something very close to "an XML
database" which is the creature that the big database vendors are saying
doesn't exist and won't be able to save your company from having to pay for
an RDBMS.  Hmmm.

> Another example is managing data integrity in procedural application
> code. In RM this is considered a "stupid database trick" to quote from
[quoted text clipped - 3 lines]
> However, it may be that this approach has lower overhead in situations
> where you have small development teams and single-application databases.

I think I agree in principle that we do not want constraints in application
code, but would add that we don't want them stuck in the proprietary
database language, inaccessible to the application either.  The odd thing is
that it really "seems like" the cause and effect are different -- you GET
smaller development teams when you use this approach and that is concerning
to me.  Something is decidedly less expensive in terms of time for
maintaining and having the constraints in the same language as the rest of
the application just might be one of the keys to that.

> > Secondly, solving business problems requires a great deal of flexibility.  A
> > non-relational model can, in a number of instances, provide additional
[quoted text clipped - 7 lines]
> and Dawn. I'm not saying I believe, and I'm not saying I disbelieve.
> I just want to hear more specifics.

But you see, I have a hard time evaluating the claims of people like me.  I
don't have proof.  I am very confident that I can find aspects of the
relational model that are not based on either mathematics or science (we've
had many such discussions in the past half year).  I do not have any
scientific evidence that models other than relational have anything better
going for them.  I have personal experience that is insufficient as proof
and a collection of anecdotes.  I'm in search of better science on the
matter and a mathematical model that is as useful to the practitioner as the
RM.

How do you think we could get evidence?  It seems to me that a class of
databases that advance the "older" approaches of Cache' and PICK could beat
today's SQL databases in a number of categories.  How could I prove that
starting with PICK would be better than starting with SQL Server if we want
to provide highly scalable but relatively inexpensive and agile software
development environments in the future?  It seems the best I can do is prove
that the relational model is not purely mathematics, but contains some
amount of religious claims.

I've considered other approaches such as approaching the Mountain Dew folks
to see if they would sponsor a "Dew IT" event where we put some hypotheses
to the test more.  I don't know what the equivalent of a placebo would be in
our tests, however.  Cheers  --dawn
Marshall Spight - 30 Jun 2004 16:01 GMT
> I'm not sure whether this answers your question as it depends on what you
> mean by "relationship" but here is another type of relationship -- each
[quoted text clipped - 5 lines]
> as part of the People function and those derived via links to other
> functions}

Gotcha.

> Another type of relationship it understands is a link placed in a "virtual
> field" for derived data.  So, even if the street address for People(12345)
> is not part of the "base relation" (is not stored "in" People), the function
> to link a foreign key to another file is a relationship that is understood.

Let me see if I understand this. You have a "file" of People and it might
have, directly in it, a field that is a list of addresses, so we have one:many
for People:Addresses.

In another scenario, you might have a file People, and it would have
directly in it a virtual field, whose value is a key into another file.
The fact that it's virtual is a metadata bit, and the file being referenced
is also metadata. Again, one:many for People:Addresses.

The difference between a virtual field and a non-virtual field is
one of implementation; the interface is the same either way. (Yes? No?)

> So, once that virtual field is defined, I can ask the database to
>
> List People Name Address

Uh, "List" is a command, "People" is the file, and are Name and Address
fields of the file people? (Whether virtual or not?)

These files are functions because you are required to have a primary
key, so the file is a function from <primary key domain> to
<field range>. Are you limited to having a single field that is marked
unique?
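
To pin down the reading I am asking about, here is a toy model in Python. It
is purely an assumption about the interface, not a description of how any MV
product works; the class, the field names, and the sample data are all made up.

```python
class File:
    """A toy 'file': a function from record key to fields.

    Stored fields live in the record; virtual fields are computed
    from the record on request (for example by following a key).
    """

    def __init__(self, records, virtual=None):
        self.records = records          # key -> dict of stored fields
        self.virtual = virtual or {}    # field name -> function(record)

    def get(self, key, field):
        record = self.records[key]
        if field in self.virtual:
            return self.virtual[field](record)
        return record[field]

    def list_fields(self, *fields):
        # Rough analogue of "List People Name Address".
        return [tuple(self.get(key, f) for f in fields)
                for key in sorted(self.records)]


addresses = File({"A1": {"Street": "1 Main St"}})
people = File(
    {12345: {"Name": "Pat", "AddressId": "A1"}},
    # "Address" is virtual: it follows AddressId into the other file.
    virtual={"Address": lambda rec: addresses.get(rec["AddressId"], "Street")},
)
```

Here people.list_fields("Name", "Address") reads a stored field and a virtual
field through the same call, which is the "same interface either way" reading.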

> > Another example is managing data integrity in procedural application
> > code. In RM this is considered a "stupid database trick" to quote from
[quoted text clipped - 7 lines]
> code, but would add that we don't want them stuck in the proprietary
> database language, inaccessible to the application either.

Yes, we've discussed this before, and I believe we agree that it's important
that constraints be available to applications.

> The odd thing is
> that it really "seems like" the cause and effect are different -- you GET
> smaller development teams when you use this approach and that is concerning
> to me.

I didn't quite follow this.

> Something is decidedly less expensive in terms of time for
> maintaining and having the constraints in the same language as the rest of
> the application just might be one of the keys to that.

I'd buy that in a second. But I still want my constraints enforced (at least)
centrally.

> > Please be specific. I am very interested in specific examples of specific
> > operations or structures that you feel are hard to solve with RM or SQL
[quoted text clipped - 6 lines]
> But you see, I have a hard time evaluating the claims of people like me.  I
> don't have proof.

I'm not asking for proof. I know you care a lot about proof, but I don't
so much. Right now I'm more interested in hearing a lot of people's stories.
So if you have use-cases for situations where you feel MV is better than
the relational approach, I'm happy to hear them.

> I am very confident that I can find aspects of the
> relational model that are not based on either mathematics or science (we've
> had many such discussions in the past half year).  I do not have any
> scientific evidence that models other than relational have anything better
> going for them.  I have personal experience that is insufficient as proof
> and a collection of anecdotes.

Bring on the anecdotes!

> I'm in search of better science on the
> matter and a mathematical model that is as useful to the practitioner as the
> RM.
>
> How do you think we could get evidence?

Give me ten million dollars and 5 years and it should be no problem.
Since I have neither, I'm willing to forego the whole proof thing.

> It seems to me that a class of
> databases that advance the "older" approaches of Cache' and PICK could beat
> today's SQL databases in a number of categories.  How could I prove that
> starting with PICK would be better than starting with SQL Server if we want
> to provide highly scalable but relatively inexpensive and agile software
> development environments in the future?

I have serious doubts about the scalability claim, but then I have an
extreme view of scalability which has been skewed by my workplace.
However I can believe the agile part.

> It seems the best I can do is prove
> that the relational model is not purely mathematics, but contains some
> amount of religious claims.

If I just stipulate that, will it help?

Any time we are building a model, what we are doing is making design
choices. It is good if these choices are consistent with good mathematics,
but even if we completely succeed at that, it doesn't mean we are
doing math and not design. It's always design.

And there's not just one math, either. You come up with a formalism,
and if it is useful, then we rejoice. It's certainly possible for a formalism
to be completely sound and self-consistent and utterly useless.

Marshall
Bill H - 05 Jul 2004 03:23 GMT
Marshall:

My comments are embedded.

> "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message ...
> > "Marshall Spight" <mspight@dnai.com> wrote in message
[quoted text clipped - 5 lines]
> > I'm not sure why it is so difficult to express this concept. An MV
> > environment is both a data store and an application server. It is _NOT_ an
> > RD model. To discuss its attributes strictly from a datastore perspective
> > is neither fair nor accurate. To understand its methodologies for directly
> > solving business problems requires the willingness to work with its two
> > functions: storage and application properties/methods/rules/etc.
[quoted text clipped - 12 lines]
> and useful, to have an RDBMS that does not persist its relations.
> We could still call this an RDBMS, but we couldn't call it a "datastore."

An MV relationship isn't an RM relationship; at least it isn't stored as
such.  It is an expression of a relationship, the containment of which
resides in the database.  e.g. a relationship exists between a vendor and
invoices, between a check and invoices, between a bank transaction and a
check, and between a reconciliation and checks.

So, your statement that the only relationship an MV dbms can understand is
containment is not true; though not exactly false either because it does
fundamentally understand that.  The MV model understands defined
relationships, which are stored (or contained) within the database.  These
defined relationships are then understood by the MV model.

I can set a relationship between a vendor and invoices by simply storing the
data required for the relationship then defining the relationship.  So, I
can say:

:select vendors invoices
:sort invoices with no pddate by pddate

I can then define the above as a stored procedure named
"List-Unpaid-Invoices", then execute:

:List-Unpaid-Invoices '12345' which will list all unpaid invoices for vend#
'12345'.

Now, was this only containment?  I think it was much more than that.
However, it is what it is and the RM model will do the same thing; just
differently.  Notice that the relationship and the relationship data has to
be stored somewhere in both models.  One of the interesting aspects of the
MV model is the data and relationship is stored in the database.  The
application will usually initiate the creation of the relationship data but
it can be done via table triggers or relationship triggers separate from the
application (as long as it's defined that way).

There's nothing tricky about this.  All dbms models have to do the same
things to accomplish the same tasks.  The MV model doesn't do some
miraculous mumbo-jumbo and neither does the RM model.  Both store data, both
store relationships, and both store constraints.  In the MV model all this
is stored in the database!

> Another example is managing data integrity in procedural application
> code. In RM this is considered a "stupid database trick" to quote from
[quoted text clipped - 3 lines]
> However, it may be that this approach has lower overhead in situations
> where you have small development teams and single-application databases.

The models we use create "stupid" tricks.  It's the models that create the
constraints to make some tricks stupid and others smart.  An RM "stupid"
trick may be an MV smart move; and vice versa.  However, most design and
development are constrained by the base delivery model: server vs
client/server.

> > Secondly, solving business problems requires a great deal of flexibility. A
> > non-relational model can, in a number of instances, provide additional
> > flexibility over and above what the RD model can.
[quoted text clipped - 6 lines]
> and Dawn. I'm not saying I believe, and I'm not saying I disbelieve.
> I just want to hear more specifics.

Let's reconcile a bank account.  We need a primary account table and a
transaction table in both models.  However, in the MV model we don't need
anything more than this.  We will define the keys of the transactions to
include the account#, so the account table can (and probably will) contain
the ref# of the transactions.  The transaction key would look like:
Account# and transaction#.  The account row would include all of the
uncleared transaction#s and the key of each transaction would include the
account key.   We have a defined relation in a format other than as defined
in the RD model.  Don't let this fool us, a relation is a relation and has to
be defined and stored somewhere.  In the MV model it is simply stored in the
database.  My goodness, we've just defined a many to many relationship
(please note: this description is fundamentally viewed from an MV model
perspective).

Now we get a simple download file from our financial institution which
usually includes the fed route#, the account#, the transaction#, the date
cleared, and the amount of the cleared transaction.  It can be in any
format, we don't care as long as it's consistent.  :-)

Our transaction# is encoded on the financial instrument (the check or the
deposit) so the bank sends it back to us as their transaction#.  Part of the
transaction# returned by the bank is our account#, since it was part of our
transaction key!

Now, what good is this?  As Mr Youngman points out it only takes one disk
read to get the account and its relationships to the uncleared transactions.
This is an almost instantaneous response to our web clients.  So there's an
upside.

> > Not because the RD model
> > is incapable, but because the RD model declares for itself particular
[quoted text clipped - 4 lines]
> I don't know how to measure the idealness of a solution, so I have
> no particular claims about whether the RM is ideal or not.

I sit in a corporate VP meeting and discuss this with them and they with me.
We all see the same thing.  I'm not the odd man out here.  In addition, I
can almost directly translate their vast knowledge to the dbms design and
relationship definition.  I think this is good!

> > For years HP calculators used RPN (reverse polish notation) instead of
> > standard algebraic entry mode (AEM). Nowadays they offer both. Does this
> > mean RPN is worthless or worse than AEM? No. Many people prefer RPN. Is
> > AEM more often used? Of course, but that doesn't say anything about RPN or
> > those who prefer to use it. The same can be said about the RD model. Not
> > everyone prefers it.
[quoted text clipped - 3 lines]
> equivalent. I do not believe the RM and MV have such a mapping,
> nor do they support the same operations nor structures.

They really do.  They do primarily the same things.   We're not talking
nuclear reactors and cigarettes.  :-)

> > > Yes, it understands the meaning of this, so it knows what
> > > a one-to-many relationship means. Does it have any other
[quoted text clipped - 7 lines]
> > Of course it does, and can. Remember it is both a datastore and an
> > application environment wrapped into one. So, whatever needs to be done can
> > be done.
>
> If by this you mean that you can implement these features by
> hand-writing application code, then I don't consider that any
> achievement. I can say the same thing about some Java code and
> a hashtable, but it's not a good solution.

Yes and no.  You can always hand write code in an application.  You can also
store the code in the dbms as triggers, constraints, relations, etc.  The
difference is that the RD model does some things one way and the MV model
does some things the other way.  The MV model is much more
application-centric.  This is only bad when working with the RD model, where
this is defined as bad (or "stupid").  Most things are done the same though.
:-)

It's nice to have a model implement some features for us.  It saves us time,
and I realize, and appreciate, this.  Most of my experience working with RD
models is: you give me data and I'll give you data.

> For example, in another thread someone said (over and over
> if I remember correctly :-) that if you delete an invoice, all
[quoted text clipped - 5 lines]
> throw them away.) Can you do this declaratively in MV? How
> is it done?

Let me point out that the MV model communicates with the database, with
respect to data maintenance, via an application language.  Where a RD model
might say:

INSERT ...

the MV model would need to:

OPEN My file
READ and Lock New record (make sure no one else is)
     or
READ and Lock Item to change (make sure no one else is)
CHANGE data
WRITE data TO My file

Lock contention is a part of the dbms.  There is no such thing as
"optimistic" locking (unless one is an idiot).  :-)   But this is an MV
perspective, not an RD perspective.

> Can it *automatically* enforce declared integrity constraints?
> Can you have an integer attribute and declare that it must
[quoted text clipped - 4 lines]
> it if you add a new place the attribute can change and forget
> to put the %4 check in?

Remember, a constraint is defined and stored somewhere.  The only value in
storing it outside the application is if some other application is using it.
This is not a usual requirement but an MV model can simply enforce this
via a trigger.  We're much more inclined to place this in the application
because all MV applications are server-centric and run in the dbms.

> > Like I said before, this is not to say that the MV model does
> > everything...as nothing can.
[quoted text clipped - 7 lines]
> usability and expressivity issues, not computability issues; we
> need to be clear on the distinction.)

Rule one in life:  never say never.  Rule two in life:  never say I can do
everything.  :-)

> > But it provides an interesting confluence of
> > tools and capabilities that render the model very useful in solving business
> > problems for many people and businesses.
>
[quoted text clipped - 4 lines]
> basis for the tools. Are they complete? Are they correct? Are they
> self-consistent?

I thoroughly agree.  That's what keeps us all here...the amount of knowledge
and interesting thought-provoking ideas elucidated.

> SQL is relationally complete, over its lame type system. It could
> really use a better type system. This will make it more usable but
[quoted text clipped - 7 lines]
> and I are engaged in exactly the same exploration, although we
> come at it from different backgrounds.

:-)

Bill
Marshall Spight - 05 Jul 2004 19:05 GMT
> > I read that paragraph a bunch of times, but it didn't seem to
> > address my statement that MV has only one kind of relationship
[quoted text clipped - 38 lines]
> However, it is what it is and the RM model will do the same thing; just
> differently.

So I read all of your comments, and I couldn't figure out what
they meant. I didn't see any clear answer to whether MV supports
relationships besides containment. In fact you evaluated that statement
as "not true but not exactly false." I have no idea what that means.

> Notice that the relationship and the relationship data has to
> be stored somewhere in both models.

Of course.

> One of the interesting aspects of the
> MV model is the data and relationship is stored in the database.

Uh, same with RM.

> > Another example is managing data integrity in procedural application
> > code. In RM this is considered a "stupid database trick" to quote from
[quoted text clipped - 9 lines]
> development are constrained by the base delivery model: server vs
> client/server.

I disagree. There are specific well-documented and *fundamental*
disadvantages to managing integrity in applications instead of centrally.
This is independent of MV vs. RM vs. whatever.

> > Please be specific. I am very interested in specific examples of specific
> > operations or structures that you feel are hard to solve with RM or SQL
[quoted text clipped - 10 lines]
> the ref# of the transactions.  The transaction key would look like:
> Account# and transaction#.

You're going to reuse transaction numbers in different accounts?
And you're also going to include the transaction number in
the account table? That kind of redundancy leads directly
to data corruption.
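
A small Python illustration of how that happens. The record layout and the
"ACCT1*1" key format are assumptions made up for the example, not taken from
any MV product.

```python
# The account record redundantly lists its uncleared transactions,
# and each transaction also records whether it has cleared.
account = {"id": "ACCT1", "uncleared": ["ACCT1*1", "ACCT1*2"]}
transactions = {
    "ACCT1*1": {"amount": 25.00, "cleared": False},
    "ACCT1*2": {"amount": 40.00, "cleared": False},
}


def uncleared_derived(account_id):
    """Work out the uncleared list from the transactions themselves."""
    prefix = account_id + "*"
    return sorted(
        key for key, txn in transactions.items()
        if key.startswith(prefix) and not txn["cleared"]
    )


# One code path marks a transaction as cleared but forgets the account row.
transactions["ACCT1*2"]["cleared"] = True

# The two copies of the same fact now disagree.
stored = account["uncleared"]
derived = uncleared_derived("ACCT1")
```

After the update, stored still lists both transactions while derived lists
only one. Nothing in the data says which copy is right.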

> Now, what good is this?  As Mr Youngman points out it only takes one disk
> read to get the account and its relationships to the uncleared transactions.
> This is an almost instantaneous response to our web clients.  So there's an
> upside.

Ugh. Let's please not talk about disk reads.

> > I don't know how to measure the idealness of a solution, so I have
> > no particular claims about whether the RM is ideal or not.
[quoted text clipped - 3 lines]
> can almost directly translate their vast knowledge to the dbms design and
> relationship definition.  I think this is good!

How is this a response to what I wrote? It sounds like what you
are saying is "I work in the computer industry."

> > > For years HP calculators used RPN (reverse polish notation) instead of
> > > standard algebraic entry mode (AEM). Nowadays they offer both. Does this
[quoted text clipped - 11 lines]
> They really do.  They do primarily the same things.   We're not talking
> nuclear reactors and cigarettes.  :-)

Okay, how do you map a relational table like this into MV:

create table Tri
(
a int,
b int,
c int,
unique(a,b),
unique(b,c),
unique(a,c)
);
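
For anyone who wants to see what those three declarations actually enforce, here is a minimal sketch. It assumes SQLite driven from Python's standard sqlite3 module, which is only a convenient stand-in; the helper names make_tri and try_insert are made up for the illustration. Each pair of columns has to be unique on its own, so a new row is rejected as soon as it repeats any one pair from an existing row.

```python
import sqlite3

def make_tri():
    """Create an in-memory database holding the Tri table from the post."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        create table Tri (
            a int,
            b int,
            c int,
            unique (a, b),
            unique (b, c),
            unique (a, c)
        )
        """
    )
    return conn

def try_insert(conn, a, b, c):
    """Return True if the row is accepted, False if a unique constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("insert into Tri (a, b, c) values (?, ?, ?)", (a, b, c))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_tri()
print(try_insert(conn, 1, 2, 3))  # True: first row
print(try_insert(conn, 1, 2, 9))  # False: (a, b) = (1, 2) already exists
print(try_insert(conn, 7, 2, 3))  # False: (b, c) = (2, 3) already exists
print(try_insert(conn, 1, 8, 3))  # False: (a, c) = (1, 3) already exists
print(try_insert(conn, 4, 5, 6))  # True: shares no pair with the first row
```

The point of the sketch is that nothing in try_insert knows about the rules; the table declaration alone does the rejecting.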

> > > > Yes, it understands the meaning of this, so it knows what
> > > > a one-to-many relationship means. Does it have any other
[quoted text clipped - 16 lines]
>
> Yes and no.

This kind of answer is hard to work with. It's much easier to understand
you when you give me a straight answer. Saying "yes and no" is worse
than not responding, because it adds confusion.

> You can always hand write code in an application.  You can also
> store the code in the dbms as triggers, constraints, relations, etc.  The
> difference is that the RD model does some things one way and the MV model
> does some things the other way.

Remember when I asked you to "be specific?"

> The MV model is much more
> application-centric.  This is only bad when working with the RD model, where
> this is defined as bad (or "stupid").  Most things are done the same though.

No, it's bad for more fundamental reasons. If you don't enforce constraints
centrally, then integrity support becomes ad-hoc and application-dependent,
so one application might fail to enforce a constraint. A constraint that
isn't enforced centrally is a constraint that won't necessarily hold.

> > For example, in another thread someone said (over and over
> > if I remember correctly :-) that if you delete an invoice, all
[quoted text clipped - 24 lines]
> "optimistic" locking (unless one is an idiot).  :-)   But this is an MV
> perspective, not an RD perspective.

Is this supposed to be a response to my earlier paragraph? Because
I don't see the answer to my question about "can MV do ON DELETE
RESTRICT" anywhere. Can it? What relevance does lock contention
have to my question?
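
To pin down what is being asked for, here is a small sketch of ON DELETE RESTRICT on the relational side. It assumes SQLite via Python's sqlite3 module; the vendor and invoice tables, their columns, and the helper names are hypothetical. The rule is declared once, and the DBMS then refuses to delete a vendor that still has invoices, whichever program issues the delete.

```python
import sqlite3

def make_db():
    """Build a tiny vendor/invoice schema with a restricting foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key enforcement off unless asked.
    conn.execute("pragma foreign_keys = on")
    conn.executescript(
        """
        create table vendor (
            vendor_id integer primary key,
            name      text not null
        );
        create table invoice (
            vendor_id  integer not null
                       references vendor(vendor_id) on delete restrict,
            invoice_no text not null,
            primary key (vendor_id, invoice_no)
        );
        insert into vendor values (1, 'Acme'), (2, 'Globex');
        insert into invoice values (1, '1272');
        """
    )
    return conn

def delete_vendor(conn, vendor_id):
    """Return True if the vendor was deleted, False if the DBMS refused."""
    try:
        with conn:
            conn.execute("delete from vendor where vendor_id = ?", (vendor_id,))
        return True
    except sqlite3.IntegrityError:
        return False

conn = make_db()
print(delete_vendor(conn, 1))  # False: vendor 1 still has an invoice
print(delete_vendor(conn, 2))  # True: vendor 2 has none
```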

> > Can it *automatically* enforce declared integrity constraints?
> > Can you have an integer attribute and declare that it must
[quoted text clipped - 10 lines]
> via a trigger.  We're much more inclined to place this in the application
> because all MV applications are server-centric and run in the dbms.

So, is that a "yes?" Are you saying it *is* possible to enforce a
constraint centrally?

> > > Like I said before, this is not to say that the MV model does
> > > everything...as nothing can.
[quoted text clipped - 10 lines]
> Rule one in life:  never say never.  Rule two in life:  never say I can do
> everything.  :-)

It sounds like you don't understand Turing completeness. Also note
that I didn't say "everything." I said "anything that can be computed."
And I stand by my statement that BASIC can compute anything that
can be computed; it is a Turing complete language. This is not
the same thing as saying that it is a good language, though.

I appreciate you're trying to help me understand, but I'm
having trouble following your posts. It seems like you
quote me, then respond, but the response, while interesting,
isn't a response per se but you talking about something else.
I get lost.

Marshall
Bill H - 08 Jul 2004 08:48 GMT
Marshall:

"Marshall Spight" <mspight@dnai.com> wrote...

[snipped]

> So I read all of your comments, and I couldn't figure out what
> they meant. I didn't see any clear answer to whether MV supports
> relationships besides containment. In fact you evaluated that statement
> as "not true but not exactly false." I have no idea what that means.

One of the primary impediments to communication is a different use of words
and definitions.  From a non-RD model perspective a relationship exists when
the properties of two pieces of data can be defined as having an aspect or
quality that connects them as being or belonging or working together or as
being of the same kind <the relation of time and space>.  This seems obvious
to me but I do not use the RD model, or mathematical, definition.

So, there exists a relationship between vendors and invoices.  Containment
has nothing to do with that relationship, except the relationship is
contained within the database.

What exactly is this relationship and how is it stored?  I can store the
invoice#s within the vendor in the vendor table.  This defines a
relationship in the MV model (although there are a number of other ways to
do so).  How is this relationship going to be exposed?  An example would be
to create a virtual field definition in the vendor table so that when asked,
will deliver the list of invoices associated with this vendor and any data
contained within the invoice table.

The phrase "...not true but not exactly false..." was intended to reflect my
desire to avoid being argumentative or didactic.  My apologies for being
obtuse and misleading.  :-)

> > One of the interesting aspects of the
> > MV model is the data and relationship is stored in the database.
>
> Uh, same with RM.

I'm sorry to say this is another of those "yes but not really" observations.
The relationship is stored in the relational database but not really like it
is stored in the MV database.  This is true because the MV model treats
everything like regular data; unlike the RD model.  As such, everything is
stored in the database tables right along with all the other data; names,
addresses, relationships, metadata, functions, constraints, stored
procedures, application code, compiled code, etc.  All MV tools, like RD
tools, are available for these additionally defined data; it's just stored
with all other data in the exact same formats.

What makes this different is only that these tables are usually part of the
database structure of the production data.  So, you'd have tables for
constraints, stored procedures, relationships, metadata, application code,
and data all within a single database structure built for an application.
The RD model would normally keep this kind of data separate from the
production data within its own special system tables.

However, once again, the stuff stored is the same.  It's just where it's
stored in relation to the normal everyday production data that's different.

> > The models we use create "stupid" tricks.  It's the models that create the
> > constraints to make some tricks stupid and others smart.  An RM "stupid"
[quoted text clipped - 5 lines]
> disadvantages to managing integrity in applications instead of centrally.
> This is independent of MV vs. RM vs. whatever.

Don't forget, these "well documented" disadvantages revolve around the RD
model, as its structure requires a different dance.  If a dbms stores
integrity constraints in the dbms, and the application is stored and runs in
the dbms, then it makes little difference whether the integrity constraint
is in or out of the application, as the application is located centrally in
the dbms.  I would point out that from this perspective it is wise to
modularize the application so other applications can utilize the defined
constraints.

> > Let's reconcile a bank account.  We need a primary account table and a
> > transaction table in both models.  However, in the MV model we don't need
[quoted text clipped - 7 lines]
> the account table? That kind of redundancy leads directly
> to data corruption.

Ah, excuse me?  One reuses check#s in different accounts all the time.  One
reuses invoice#s for different vendors all the time too.  To include the
transaction# in the account table is to do nothing different than needs to
be done anyway to define a relation; A > B and B < A.  Redundancy?  Storing
the transaction#s in the account saves having to store the "transaction to
account" relationship, as it is already defined by the transaction key.  So
this reduces redundancy.  Data corruption?  No different than anywhere else.
Synchronization code performs the same task in all dbms products, although
sometimes differently.

> > > I don't know how to measure the idealness of a solution, so I have
> > > no particular claims about whether the RM is ideal or not.
[quoted text clipped - 6 lines]
> How is this a response to what I wrote? It sounds like what you
> are saying is "I work in the computer industry."

I'm not taking a stand here claiming the RD model is bad.  Nor am I stating
that other models are necessarily better.  I'm merely pointing out there are
other methods and tools and dbms models that work.

My point is the nomenclature, syntax, and concepts within the MV model are
specifically modeled after those of business.  Business people feel at ease
working with the model because of its business friendly terms and concepts.
That's why a lot of the MV modeling is done in a rapid development structure
directly with business people.

> > They really do.  They do primarily the same things.   We're not talking
> > nuclear reactors and cigarettes.  :-)
[quoted text clipped - 10 lines]
>  unique(a,c)
> );

A good example of the point I was trying to make, and have made before,
about deconstruction/reconstruction.  To a business person this is complete
nonsense.  However, it isn't nonsense to make sure a group of values are not
duplicates or to make sure that certain fields are certain data types.

So, for instance, it is important that all invoice#s for a particular vendor
are unique (we certainly wouldn't want to pay the same invoice twice).  In
the MV model the invoice# is _not_ part of the data set but is part of the key (I
would read a dataset using the key as the unique identifier).  Thus both
models do the same thing but a little differently.  Other fields can be
constrained.  However, they're not constrained in the syntax of the table
creation statement.  They're done differently.  So I can say any invoice
must have a unique invoice# and a unique creation-stamp.

> > > If by this you mean that you can implement these features by
> > > hand-writing application code, then I don't consider that any
[quoted text clipped - 6 lines]
> you when you give me a straight answer. Saying "yes and no" is worse
> than not responding, because it adds confusion.

I can understand your frustration.  But it is true.  You can write
application code to implement these features.  This code isn't at all one
monolithic .exe.  Remember, the application code sits inside the dbms just
like any other data so its proximity to the datastore is significantly
closer than in the RD model.  A simple application may contain a thousand
executables and a vast portion of the application is probably nothing more
than functions that enforce integrity constraints, relationships, business
rules, etc.  So a function or API can be written and used by the application
code just like an .OCX or .dll or .exe can be used.  I would call this
written in the application but, in the RD model this could easily be defined
as a separate API residing on the application server serving any application
wishing to use its functionality.

> > You can always hand write code in an application.  You can also
> > store the code in the dbms as triggers, constraints, relations, etc.  The
> > difference is that the RD model does some things one way and the MV model
> > does some things the other way.
>
> Remember when I asked you to "be specific?"

I can set a trigger to enforce integrity within the bank account table so if
a bank transaction is cleared, the uncleared reference to it within the bank
account table is removed.  So, right here I've set both a trigger and
constraint on a relation at the same time.  I know the RD model accomplishes
the same task but differently.
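
As a rough picture of the trigger described above, here is a sketch in SQL run through Python's sqlite3 module. This is not MV syntax. The uncleared table is only a relational stand-in for the list of uncleared transaction numbers kept in the account record, and all table, column, and function names are assumptions. The trigger is declared once and fires on every update that clears a transaction.

```python
import sqlite3

SCHEMA = """
create table txn (
    account_no text not null,
    txn_no     text not null,
    cleared    integer not null default 0,
    primary key (account_no, txn_no)
);
-- Stand-in for the multivalued "uncleared transactions" field on the account.
create table uncleared (
    account_no text not null,
    txn_no     text not null,
    primary key (account_no, txn_no)
);
create trigger txn_cleared after update of cleared on txn
when new.cleared = 1
begin
    delete from uncleared
    where account_no = new.account_no and txn_no = new.txn_no;
end;
"""

def make_ledger():
    """Create the schema and load three uncleared transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Transaction number 101 is reused in two accounts; the key is the pair.
    rows = [("A1", "101"), ("A1", "102"), ("B7", "101")]
    with conn:
        conn.executemany("insert into txn (account_no, txn_no) values (?, ?)", rows)
        conn.executemany("insert into uncleared values (?, ?)", rows)
    return conn

def clear(conn, account_no, txn_no):
    """Mark one transaction as cleared; the trigger does the rest."""
    with conn:
        conn.execute(
            "update txn set cleared = 1 where account_no = ? and txn_no = ?",
            (account_no, txn_no),
        )

def uncleared_for(conn, account_no):
    """List the uncleared transaction numbers still recorded for an account."""
    rows = conn.execute(
        "select txn_no from uncleared where account_no = ? order by txn_no",
        (account_no,),
    )
    return [r[0] for r in rows]

conn = make_ledger()
clear(conn, "A1", "101")
print(uncleared_for(conn, "A1"))  # ['102']
print(uncleared_for(conn, "B7"))  # ['101']
```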

> > The MV model is much more
> > application-centric.  This is only bad when working with the RD model, where
[quoted text clipped - 4 lines]
> so one application might fail to enforce a constraint. A constraint that
> isn't enforced centrally is a constraint that won't necessarily hold.

I cannot emphasize this enough; the MV model is located centrally!  The
application server and dbms server reside within the same environment, on
the same machine.  Therefore, all constraints are enforced centrally.  The
centralized application APIs can be called from outside the application.
Additional constraints can be developed to provide service to more than one
application and to meet ever-changing requirements.

> Is this supposed to be a response to my earlier paragraph? Because
> I don't see the answer to my question about "can MV do ON DELETE
> RESTRICT" anywhere. Can it? What relevance does lock contention
> have to my question?

The answer is an emphatic yes.  But not by saying: "ON DELETE RESTRICT";
unless one wants to utilize the SQL functionality within the dbms.  More
like:

:select table with no defined_constraint
:delete table

> So, is that a "yes?" Are you saying it *is* possible to enforce a
> constraint centrally?

Remember, the constraints are stored centrally, as are the application APIs,
custom business rules, relationships, functions, data, metadata, etc.  So
not only is it possible to enforce constraints centrally but it is required,
and assumed from this model's perspective.

> > Rule one in life:  never say never.  Rule two in life:  never say I can do
> > everything.  :-)
[quoted text clipped - 4 lines]
> can be computed; it is a Turing complete language. This is not
> the same thing as saying that it is a good language, though.

There is a lot in the universe I don't understand.  I understand the word
tautology, though.  :-)

My tendencies are to accept imperfections and deal with them rather than
think I'm correct, if only within my limited definition of what correct is.

> I appreciate you're trying to help me understand, but I'm
> having trouble following your posts. It seems like you
> quote me, then respond, but the response, while interesting,
> isn't a response per se but you talking about something else.
> I get lost.

And here I was thinking I was answering your queries directly, albeit in a
slightly different perspective.  Perhaps my writing skills, and clarity of
thought, will improve with time.  :-)

Bill
Marshall Spight - 10 Jul 2004 16:44 GMT
> "Marshall Spight" <mspight@dnai.com> wrote...
>
> One of the primary impediments to communication is a different use of words
> and definitions.

Yes. I'm trying to learn different terminology.

> From a non-RD model perspective a relationship exists when
> the properties of two pieces of data can be defined as having an aspect or
> quality that connects them as being or belonging or working together or as
> being of the same kind <the relation of time and space>.

In other words, a relation is anything we say it is. This works for me.

> This seems obvious
> to me but I do not use the RD model, or mathematical, definition.

Actually, the mathematical definition (as best I understand) is pretty
much the same thing: a "relation" is a set of pairs. How do we decide
what the set is? It's anything we care to say it is.

> So, there exists a relationship between vendors and invoices.  Containment
> has nothing to do with that relationship, except the relationship is
> contained within the database.

I dunno. If every invoice has exactly one vendor, I think "containment"
is a pretty good term to describe that relationship. (And a popular one
as well.) Do you have a preferred term for "every x has an associated
y?"
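
A tiny sketch of both ideas, in plain Python with made-up invoice and vendor names: a relation is just a set of pairs, and "every x has exactly one y" is an extra property that a given set of pairs may or may not have. The function name is invented for the example.

```python
# A binary relation modelled as a plain set of (x, y) pairs.
# The invoice and vendor names below are made up for illustration.
invoice_vendor = {
    ("inv-1272", "Acme"),
    ("inv-7214-2", "Acme"),
    ("inv-B715Z", "Globex"),
}

def every_x_has_exactly_one_y(xs, pairs):
    """True when each x in xs appears on the left of exactly one pair."""
    return all(
        sum(1 for (x, _) in pairs if x == wanted) == 1
        for wanted in xs
    )

invoices = {"inv-1272", "inv-7214-2", "inv-B715Z"}
print(every_x_has_exactly_one_y(invoices, invoice_vendor))  # True

# Giving one invoice a second vendor breaks the "exactly one" property,
# although the result is still a perfectly good relation.
two_vendors = invoice_vendor | {("inv-1272", "Globex")}
print(every_x_has_exactly_one_y(invoices, two_vendors))  # False
```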

> What exactly is this relationship and how is it stored?  I can store the
> invoice#s within the vendor in the vendor table.

I want to make sure I understand: when you say "invoice#***s***" (I
especially note the "s") you mean to say that the vendors table/file/collection
has an attribute/field that is a **list** of invoice numbers? Or is it a list
of invoices?

> This defines a
> relationship in the MV model (although there are a number of other ways to
> do so).

As an aside: can you enumerate the different ways?

> How is this relationship going to be exposed?  An example would be
> to create a virtual field definition in the vendor table so that when asked,
> will deliver the list of invoices associated with this vendor and any data
> contained within the invoice table.

The term "virtual" here; what does it mean? Is there an online reference
you like that I could use to read about this?

> The phrase "...not true but not exactly false..." was intended to reflect my
> desire to avoid being argumentative or didactic.  My apologies for being
> obtuse and misleading.  :-)

'Tis nothing. Thank you for having the conversation with me. I hope I
did not come off as impatient.

> > > One of the interesting aspects of the
> > > MV model is the data and relationship is stored in the database.
> >
> > Uh, same with RM.
>
> I'm sorry to say this is another of those "yes but not really" observations.

:-)

> The relationship is stored in the relational database but not really like it
> is stored in the MV database.  This is true because the MV model treats
[quoted text clipped - 4 lines]
> tools, are available for these additionally defined data; it's just stored
> with all other data in the exact same formats.

The distinction you're drawing is that in addition to all the stuff that
both models store, (names, addresses, relationships, metadata, functions,
constraints, stored procedures) the MV model additionally stores
application code, compiled code, MV tools, etc. Is that right?

Again, it's something I'd like to try out. Can you recommend a free
solution; the mysql of MV?

> What makes this different is only that these tables are usually part of the
> database structure of the production data.  So, you'd have tables for
> constraints, stored procedures, relationships, metadata, application code,
> and data all within a single database structure built for an application.
> The RD model would normally keep this kind of data separate from the
> production data within its own special system tables.

This separation you describe is not much of a separation.

> However, once again, the stuff stored is the same.  It's just where it's
> stored in relation to the normal everyday production data that's different.

Okay.

> > I disagree. There are specific well-documented and *fundamental*
> > disadvantages to managing integrity in applications instead of centrally.
[quoted text clipped - 8 lines]
> modularize the application so other applications can utilize the defined
> constraints.

I am hesitant here. On the one hand, that which makes application-enforced
constraints not a good choice would seem to apply whether one kept
the applications in the DB or on the filesystem. But having the applications
stored centrally makes them central as well.

What if you have two comparatively unrelated applications that work
against the same schema; both applications are in the dbms; one
application enforces a constraint and one doesn't (for whatever reason:
a bug, or the programmer just forgot about it.) Wouldn't that be a
pathway for data corruption to enter the system?
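
The scenario can be sketched as follows, assuming SQLite via Python's sqlite3 module. The rule "amount must be positive", the invoice table, and the two toy applications are all hypothetical. Application A checks the rule itself and application B forgets to; the only thing that varies is whether the table also declares the rule.

```python
import sqlite3

def make_db(central_check):
    """Create an invoice table, optionally with a declared CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    check = "check (amount > 0)" if central_check else ""
    conn.execute(
        "create table invoice ("
        " invoice_no text primary key,"
        f" amount integer not null {check})"
    )
    return conn

def _write(conn, invoice_no, amount):
    """Insert a row; return False if the DBMS rejects it."""
    try:
        with conn:
            conn.execute("insert into invoice values (?, ?)", (invoice_no, amount))
        return True
    except sqlite3.IntegrityError:
        return False

def app_a_add(conn, invoice_no, amount):
    """Application A validates the amount before writing."""
    if amount <= 0:
        return False
    return _write(conn, invoice_no, amount)

def app_b_add(conn, invoice_no, amount):
    """Application B forgot the rule and writes whatever it is given."""
    return _write(conn, invoice_no, amount)

def bad_rows(conn):
    """Count rows that violate the rule."""
    return conn.execute(
        "select count(*) from invoice where amount <= 0"
    ).fetchone()[0]

for central in (False, True):
    conn = make_db(central)
    app_a_add(conn, "1272", -5)
    app_b_add(conn, "B715Z", -5)
    print(central, bad_rows(conn))
# prints "False 1" then "True 0"
```

Without the declared constraint the bad row from application B gets in; with it, the same careless write is refused.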

> > > Let's reconcile a bank account.  We need a primary account table and a
> > > transaction table in both models.  However, in the MV model we don't
[quoted text clipped - 14 lines]
> transaction# in the account table is to do nothing different than needs to
> be done anyway to define a relation; A > B and B < A.

I agree up until the last sentence. You don't need both A > B and B < A
to define a relation; you only need one or the other. Likewise, you
don't need a list of invoice numbers in the accounts table *and* an
account number in the invoices table; that's a denormalization that
will lead to corruption. You need one or the other, but both is bad,
(unless they are just different views on the same data. Are they?
Or are they stored separately, and able to become out of sync?)
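
Here is a sketch of the "one or the other" option, assuming SQLite via Python's sqlite3 module and hypothetical table and column names. The account number is stored once, in the invoice row, and the list of invoice numbers for an account is derived by a query each time it is wanted, so there is no second copy to drift out of step.

```python
import sqlite3

def make_db():
    """Store the account-to-invoice link once, in the invoice table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        create table account (
            account_no integer primary key,
            name       text not null
        );
        create table invoice (
            account_no integer not null references account(account_no),
            invoice_no text not null,
            primary key (account_no, invoice_no)
        );
        insert into account values (1, 'Acme');
        insert into invoice values (1, '1272'), (1, 'B715Z');
        """
    )
    return conn

def invoices_for(conn, account_no):
    """Derive the invoice list for an account instead of storing it."""
    rows = conn.execute(
        "select invoice_no from invoice where account_no = ? order by invoice_no",
        (account_no,),
    )
    return [r[0] for r in rows]

conn = make_db()
print(invoices_for(conn, 1))  # ['1272', 'B715Z']

# Adding an invoice needs no second update anywhere else.
conn.execute("insert into invoice values (1, '7214-2')")
print(invoices_for(conn, 1))  # ['1272', '7214-2', 'B715Z']
```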

>  Redundancy?  Storing
> the transaction#s in the account saves having to store the "transaction to
> account" relationship, as it is already defined by the transaction key.  So
> this reduces redundancy.

Uh, no. I mean, it's less redundancy than storing it three times, but it's
more than just storing it once.

> Data corruption?  No different than anywhere else.
> Synchronization code performs the same task in all dbms products, although
> sometimes differently.

If you don't store the same information more than once, then
the entire concept of "synchronization code" (first time I've
heard the term) is unnecessary.

> I'm not taking a stand here claiming the RD model is bad.  Nor am I stating
> that other models are necessarily better.  I'm merely pointing out there are
> other methods and tools and dbms models that work.

Sure; yes. My interest in these conversations is to understand what
works well in each of various approaches, and also what doesn't.

> My point is the nomenclature, syntax, and concepts within the MV model are
> specifically modeled after those of business.

Hmmm. Data management is something that is very useful to
business, but it is not business-oriented in and of itself.
Same with adding up columns of numbers.

> > Okay, how do you map a relational table like this into MV:
> >
[quoted text clipped - 11 lines]
> about deconstruction/reconstruction.  To a business person this is complete
> nonsense.

You mean because they don't understand SQL? I don't get why we're
talking about business people here; we're discussing data management.

> So, for instance, it is important that all invoice#s for a particular vendor
> are unique (we certainly wouldn't want to pay the same invoice twice).  In
[quoted text clipped - 4 lines]
> creation statement.  They're done differently.  So I can say any invoice
> must have a unique invoice# and a unique creation-stamp.

Hmmm. You didn't really answer my question.

I don't really care whether the constraints are part of the table
declaration statement or not; I care about whether they are
declarative, automatically enforced, and at least flexible
enough to model that each of these three pairs must be
unique: {a,b}, {a,c}, {b,c}

Is there a way to do that? I'm guessing not.

> I can set a trigger to enforce integrity within the bank account table so if
> a bank transaction is cleared, the uncleared reference to it within the bank
> account table is removed.  So, right here I've set both a trigger and
> constraint on a relation at the same time.  I know the RD model accomplishes
> the same task but differently.

Do you have to set these up manually every place a bank transaction clear
is invoked, or do you just do it once?

> I cannot emphasize this enough; the MV model is located centrally!  The
> application server and dbms server reside within the same environment, on
> the same machine.  Therefore, all constraints are enforced centrally.  The
> centralized application APIs can be called from outside the application.
> Additional constraints can be developed to provide service to more than one
> application and to meet ever-changing requirements.

Where can I read more about this intriguing concept?

> > It sounds like you don't understand Turing completeness. Also note
> > that I didn't say "everything." I said "anything that can be computed."
[quoted text clipped - 4 lines]
> There is a lot in the universe I don't understand.  I understand the word
> tautology, though.  :-)

:-) back.

But it's not actually a tautology. There are languages that
can compute a lot of things but not everything. SQL is one
such language.

> > I appreciate you're trying to help me understand, but I'm
> > having trouble following your posts. It seems like you
[quoted text clipped - 5 lines]
> slightly different perspective.  Perhaps my writing skills, and clarity of
> thought, will improve with time.  :-)

Actually, I found this most recent message quite comprehensible. Also,
I want to thank you for hanging in with my questions as my frustration
grew. You are a gentleman, sir, and the world and this newsgroup need
more gentlemen. (And ladies, of course.)

Marshall
Bill H - 11 Jul 2004 03:51 GMT
Marshall:

> "Marshall Spight" <mspight@dnai.com> wrote...
> > "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote...
[quoted text clipped - 3 lines]
> as well.) Do you have a preferred term for "every x has an associated
> y?"

My apologies.  I thought you defined containment differently.  This will do
for me, although I'd probably not use that term for many to many
relationships.  But if you like, I'll stick with it.

> > What exactly is this relationship and how is it stored?  I can store the
> > invoice#s within the vendor in the vendor table.
[quoted text clipped - 3 lines]
> has an attribute/field that is a **list** of invoice numbers? Or is it a list
> of invoices?

A list of invoice numbers, in my example, is correct.  It may look like:

Field#  Contents.....
005      1272]7214-2]B715Z]1714A16

The order doesn't matter in this example.  So the record set not only
contains data but arrays/lists/collections/whatever.  You get the point.
The dbms tools work with this.  I can then reference data in the related
table as though the data were local to the referenced table (e.g. list
vendor inv# invdate invdesc invamount...) and this will list the vendors
with their associated invoice data extracted from the related table.
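
A rough sketch of that lookup in plain Python, not in any MV product's own language. It assumes the "]" shown in field 005 is the separator between values; the invoice dates and amounts, the dictionary layout, and the function names are invented for the illustration.

```python
VALUE_MARK = "]"  # separator as displayed in the post

def split_values(field):
    """Split a multivalued field into its individual values."""
    return field.split(VALUE_MARK) if field else []

def list_vendor_invoices(vendor_record, invoice_table, columns):
    """Resolve each invoice number in field 5 against the invoice table."""
    rows = []
    for invoice_no in split_values(vendor_record[5]):
        invoice = invoice_table[invoice_no]
        rows.append((invoice_no,) + tuple(invoice[c] for c in columns))
    return rows

# Field 005 of a vendor record, copied from the example above.
vendor_record = {5: "1272]7214-2]B715Z]1714A16"}

# Hypothetical invoice rows; the dates and amounts are invented.
invoice_table = {
    "1272": {"invdate": "2004-06-01", "invamount": 150},
    "7214-2": {"invdate": "2004-06-03", "invamount": 75},
    "B715Z": {"invdate": "2004-06-10", "invamount": 320},
    "1714A16": {"invdate": "2004-06-12", "invamount": 40},
}

for row in list_vendor_invoices(vendor_record, invoice_table, ["invdate", "invamount"]):
    print(row)
```

The "virtual" part is that invdate and invamount are never stored in the vendor record; they are fetched from the invoice table when the list is produced.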

> > This defines a relationship in the MV model (although there are
> > a number of other ways to do so).
>
> As an aside: can you enumerate the different ways?

An index can define a relationship, so can a constraint and a function.  I
can also create custom rule relationships that operate off a trigger.

> > How is this relationship going to be exposed?  An example would be
> > to create a virtual field definition in the vendor table so that when asked,
[quoted text clipped - 3 lines]
> The term "virtual" here; what does it mean? Is there an online reference
> you like that I could use to read about this?

I alluded to it above.  It is a field definition that doesn't reference data
in the referenced table but references data in a related table.  I gave an
example of the vendor table list that actually returns data from the related
invoice table.

> > The relationship is stored in the relational database but not really like it
> > is stored in the MV database.  This is true because the MV model treats
[quoted text clipped - 9 lines]
> constraints, stored procedures) the MV model additionally stores
> application code, compiled code, MV tools, etc. Is that right?

Yes.  When I deliver an application the entire application is delivered in
the database, including code, table structure, field definitions, etc, etc,
etc.  Users can access and run it without loading any application code onto
their workstations.

> Again, it's something I'd like to try out. Can you recommend a free
> solution; the mysql of MV?

Try the following:

http://www-306.ibm.com/software/data/u2/universe/

http://www.jbase.com/products/jbase_download.html

http://www.revelation.com/SOFTWARE.NSF/06fb58066b4ed717852564030070163e?OpenView

There are some others but this will get anyone started.  Remember, the
product is both a dbms and an application environment.  Don't expect it to
just be a dbms where one starts the service and accesses it via SQL
(although one can).  I personally use the IBM product and another one but
you can't get the other one for free, so I won't send you to their web site.
:-)

> > What makes this different is only that these tables are usually part of the
> > database structure of the production data.  So, you'd have tables for
[quoted text clipped - 4 lines]
>
> This separation you describe is not much of a separation.

Remember this is both a dbms and an application server product.  It has more
capabilities and additional tools.

> What if you have two comparatively unrelated applications that work
> against the same schema; both applications are in the dbms; one
> application enforces a constraint and one doesn't (for whatever reason:
> a bug, or the programmer just forgot about it.) Wouldn't that be a
> pathway for data corruption to enter the system?

Under this scenario, one would have to design a non-RD model dbms
application like any other good application where the APIs are designed to
return and/or do stuff via calls, where the user interface is separate from
the business rules.   However, there's just no solution for multiple
applications sharing data in a dbms and having different business rules
(constraints/relationships/etc) that affect the data.  Which is controlling?
Can application A build the business rule APIs and application B use them?

I'd say this is an ideal place for the RD model dbms, as many people who
don't know the business well can do the development and queries.

> > Ah, excuse me?  One reuses check#s in different accounts all the time.  One
> > reuses invoice#s for different vendors all the time too.  To include the
[quoted text clipped - 8 lines]
> (unless they are just different views on the same data. Are they?
> Or are they stored separately, and able to become out of sync?)

My last sentence only meant to describe my defined relation between the
vendor and the invoice and the invoice and the vendor; nothing more.  :-)

If the invoice is related to the vendor then the invoice has to have access
to the vendor, somehow, somewhere, plain and simple.  If not, I could never
get a list of invoices with the vendor# too.  The same is true with the
vendor. I described how to do this, explicitly.  There's no need to talk
about data corruption, considering the synchronization tools are available
and, hopefully, are in place.

These are the kinds of issues, however, we have to face as DBAs and
developers, no matter what we're developing with/in.  True?

> >  Redundancy?  Storing
> > the transaction#s in the account saves having to store the "transaction to
[quoted text clipped - 3 lines]
> Uh, no. I mean, it's less redundancy than storing it three times, but it's
> more than just storing it once.

I'm more inclined to design for errors with a "little" redundancy.  :-)
(although it isn't absolutely necessary).

> If you don't store the same information more than once, then
> the entire concept of "synchronization code" (first time I've
> heard the term) is unnecessary.

If the RD model does a cascading delete of a vendor and its associated
invoices, it _has_ to know the invoices associated with the vendor too!  I
write the code once (15 years ago) and use it all the time in this
environment.  That's because the mvDbms environment is more than a dbms and
considers the application side of its nature.

Some things have to be written because it is "assumed" better to do so in
the application (hey, what can I say).  We've developed applications where
bad things happen to the hardware and, thus, the data.  In accounting data
some redundancy is an absolute requirement, if for no other reason than to
know of problems and be able to rebuild.

Is this assumption good?  I don't know.  Probably mostly yes in some
development environments and mostly no in other development environments.

> Hmmm. Data management is something that is very useful to
> business, but it is not business-oriented in and of itself.
> Same with adding up columns of numbers.
>
> You mean because they don't understand SQL? I don't get why we're
> talking about business people here; we're discussing data management.

This is one view.  Mine is different because I have to pay bills and
employees.  If there's not enough cash I don't get paid.  So, for me, it
_is_ all about business.  :-)

> > I can set a trigger to enforce integrity within the bank account table so if
> > a bank transaction is cleared, the uncleared reference to it within the bank
[quoted text clipped - 4 lines]
> Do you have to set these up manually every place a bank transaction clear
> is invoked, or do you just do it once?

No.  It's done once in the appropriate table definition, or perhaps the
field definition.

> > I cannot emphasize this enough; the MV model is located centrally!  The
> > application server and dbms server reside within the same environment, on
[quoted text clipped - 4 lines]
>
> Where can I read more about this intriguing concept?

You can read a lot of information about it at:

http://www-306.ibm.com/software/data/u2/

and you can google to newsgroups and read comp.databases.pick.  It's been
around forever and has a lot of fun stuff.  And they're mostly polite,
except for the occasional individual who forgot to take his lithium.  :-)

> > And here I was thinking I was answering your queries directly, albeit in a
> > slightly different perspective.  Perhaps my writing skills, and clarity of
[quoted text clipped - 4 lines]
> grew. You are a gentleman, sir, and the world and this newsgroup needs
> more gentlemen. (And ladies, of course.)

Thank you for your kind words and I look forward to passing them on to the
next person on this list.  :-)

Bill
Anthony W. Youngman - 13 Jul 2004 00:22 GMT
>The distinction you're drawing is that in addition to all the stuff that
>both model store, (names, addresses, relationships, metadata, functions,
[quoted text clipped - 3 lines]
>Again, it's something I'd like to try out. Can you recommend a free
>solution; the mysql of MV?

See my sig :-) Note however, it's open source and somewhere between
alpha and beta status ... (plus I wouldn't call it "pure MV" - more like
an "MV BASIC compiler over a relational database" almost :-( so it's
likely to be very misleading as an intro into the MV model.

The other thing is to download one of the commercial variants. There are
three at least that are "free for non-commercial use" namely Jbase
(www.jbase.com), and UniVerse and UniData (both now owned by IBM). I'm
not sure of the urls for those two - there's no point searching the IBM
site because it's a needle in a haystack, but if you go to www.u2ug.org 
and look at the FAQs, it'll almost certainly be there.

Cheers,
Wol
Signature

Anthony W. Youngman <pixie@thewolery.demon.co.uk>
'Yings, yow graley yin! Suz ae rikt dheu,' said the blue man, taking the
thimble. 'What *is* he?' said Magrat. 'They're gnomes,' said Nanny. The man
lowered the thimble. 'Pictsies!' Carpe Jugulum, Terry Pratchett 1998
Visit the MaVerick web-site - <http://www.maverick-dbms.org> Open Source Pick

Marshall Spight - 13 Jul 2004 03:12 GMT
> The other thing is to download one of the commercial variants. There are
> three at least that are "free for non-commercial use" namely Jbase
> (www.jbase.com), and UniVerse and UniData (both now owned by IBM).

I've downloaded UniVerse this weekend, but haven't gotten too far
with it.

What's the difference between UniVerse and UniData?

Marshall
Bill H - 14 Jul 2004 13:53 GMT
Marshall:

> I've downloaded UniVerse this weekend, but haven't gotten too far
> with it.
>
> What's the difference between UniVerse and UniData?

Universe merged several different mvDbms products and is now able to be used
by code from several different heritages (within the mvDbms market that is).
Unidata is a cleaner version and never really worried about trying to be all
things to all people (within the mvDbms market that is).  Anybody moving to
Unidata had to convert their applications and table structures to Unidata's
while this wasn't really necessary with Universe.

So, despite the basic similarities of the environment, the underlying dbms
code (I mean the actual code written to run the dbms) is significantly
different.

It has often been said if you're converting from one mvDbms to another move
to Universe but if you're designing from scratch use Unidata.  :-)

Hope this helps.

Bill
Marshall Spight - 14 Jul 2004 18:28 GMT
> > What's the difference between UniVerse and UniData?
>
> It has often been said if you're converting from one mvDbms to another move
> to Universe but if you're designing from scratch use Unidata.  :-)
>
> Hope this helps.

That's exactly what I needed to know. Thanks!

Marshall
Mike Preece - 19 Jul 2004 05:14 GMT
> > > What's the difference between UniVerse and UniData?
> >
[quoted text clipped - 6 lines]
>
> Marshall

So how's it going Marshall? Are you getting to grips with UniVerse?
Did you decide to go with UniData instead?

Mike.
Marshall Spight - 19 Jul 2004 05:31 GMT
> > > It has often been said if you're converting from one mvDbms to another move
> > > to Universe but if you're designing from scratch use Unidata.  :-)
[quoted text clipped - 5 lines]
> So how's it going Marshall? Are you getting to grips with UniVerse?
> Did you decide to go with UniData instead?

Uh, I de-installed UniVerse and installed UniData. I haven't done
much with it yet, but I've browsed some docs. I think the thing
I really care about is learning the query language. I'm particularly
interested in learning the part about how they deal with MV
attributes. (No surprise, I suppose; that's the part that's different.)

I was a bit daunted by the pages about what one has to do to
get the sample database installed; it was longer than I would
have hoped. Still, I expect I'll slog through it all.

Marshall
Anthony W. Youngman - 10 Jul 2004 00:00 GMT
>> The models we use create "stupid" tricks.  It's the models that create the
>> constraints to make some tricks stupid and others smart.  An RM "stupid"
[quoted text clipped - 5 lines]
>disadvantages to managing integrity in applications instead of centrally.
>This is independent of MV vs. RM vs. whatever.

Well, yes ...

But my experience is that that begs the question. Do you want your data
to be "consistent" OR "accurate"? Constraints enforce consistency, but
what do you do when real-life decides that IT is going to be
INconsistent?

With flexibility comes power. MV solutions are more flexible. And with
that flexibility comes the ability to shoot yourself in the foot.

>> > Please be specific. I am very interested in specific examples of specific
>> > operations or structures that you feel are hard to solve with RM or SQL
[quoted text clipped - 15 lines]
>the account table? That kind of redundancy leads directly
>to data corruption.

What redundancy? MV does not normally contain redundant data. Unless you
mean storing a foreign key is "redundancy", and isn't that what
relational databases do all the time?

>> Now, what good is this?  As Mr Youngman points out it only takes one disk
>> read to get the account and its relationships to the uncleared transactions.
>> This is an almost instantaneous response to our web clients.  So there's an
>> upside.
>
>Ugh. Let's please not talk about disk reads.

So you're quite happy to give your clients a system that, running on a
Cray, still makes an old Z80-based system look like a speed demon?

The whole reason we hammer on about disk accesses is because we KNOW we
can't be beaten. And the whole reason relational people like you don't
like it is because you can't compete. Isn't that always the rule of
competition - try to make the other guy's advantage look like a
disadvantage?

At the end of the day, by avoiding things like disk reads, you are
saying "performance is irrelevant". Taken to its extreme, that means
that you would be quite happy delivering a system that guaranteed to eg
crack a 4096-bit RSA key. The fact that it wouldn't finish its first run
before the heat-death of the solar system isn't your problem ...

Cheers,
Wol

Marshall Spight - 10 Jul 2004 03:14 GMT
> But my experience is that that begs the question. Do you want your data
> to be "consistent" OR "accurate"? Constraints enforce consistency, but
> what do you do when real-life decides that IT is going to be
> INconsistent?

Can you give me an example? Also note, DBMSs are for managing
data, not for managing real life.

> With flexibility comes power. MV solutions are more flexible.

Easier to use, maybe. But less flexible, from what I can tell.

> >> Now, what good is this?  As Mr Youngman points out it only takes one disk
> >> read to get the account and its relationships to the uncleared transactions.
[quoted text clipped - 8 lines]
> The whole reason we hammer on about disk accesses is because we
> KNOW we can't be beaten.

I don't wish to condescend, but when you talk about
performance, I get the impression that it's not something you
know very much about, and that you have an extremely
simplified view of how it works. In any event, the
topic is *extremely* complicated, to the point that
counting disk reads is a useless endeavour.

Sigh. All right.

Is there a canonical MV application that I can get easily and
try out, so as to evaluate your performance claims?
If someone were to ask me the same about a SQL dbms,
I'd say "mysql." Is there a mysql of MV?

I'd like to compare some complicated query performance
on mysql vs. X-MV. Not that complicated query performance
is a mysql strong point.

> At the end of the day, by avoiding things like disk reads, you are
> saying "performance is irrelevant".

Performance is very relevant. I know what it is that I'm saying,
and it's not what you're saying I'm saying.

> Taken to its extreme, that means
> that you would be quite happy ...

You're trying to put words in my mouth. Don't do that.

Marshall
Anthony W. Youngman - 09 Jul 2004 23:25 GMT
>> I'm not sure whether this answers your question as it depends on what you
>> mean by "relationship" but here is another type of relationship -- each
[quoted text clipped - 24 lines]
>The difference between a virtual field and a non-virtual field is
>one of implementation; the interface is the same either way. (Yes? No?)

Not quite. Yes the interface is the same, but your first example would
have a PEOPLE file with an ADDRESS datafield.

The second example would have a PEOPLE file with an ADDRESS-KEY
datafield and an ADDRESS virtual field. From the point of the person
using the query language, they would neither know nor care that the two
ADDRESS fields are fundamentally different "under the bonnet".
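
A minimal sketch of that idea in Python, with plain dictionaries standing in for MV files. The record layouts, the sample data, and the read_field helper are all hypothetical, and this is not how any particular MV product implements virtual fields. It only illustrates that the caller asks for ADDRESS in both cases and cannot tell which layout is underneath.

```python
# A separate ADDRESSES "file", keyed by address id.
ADDRESSES = {"A1": "1 High Street", "A2": "2 Low Road"}

# First layout: ADDRESS is a stored (multivalued) data field in PEOPLE.
PEOPLE_STORED = {"P1": {"NAME": "Ann", "ADDRESS": ["1 High Street", "2 Low Road"]}}

# Second layout: PEOPLE stores only ADDRESS-KEY ...
PEOPLE_KEYED = {"P1": {"NAME": "Ann", "ADDRESS-KEY": ["A1", "A2"]}}

# ... and ADDRESS is a virtual field, computed from the keys on read.
VIRTUAL_FIELDS = {
    "ADDRESS": lambda record: [ADDRESSES[key] for key in record["ADDRESS-KEY"]],
}

def read_field(record, field, virtual_fields=None):
    """Return a field value, whether it is stored or virtual."""
    if field in record:
        return record[field]
    return (virtual_fields or {})[field](record)
```

Calling read_field(PEOPLE_STORED["P1"], "ADDRESS") and read_field(PEOPLE_KEYED["P1"], "ADDRESS", VIRTUAL_FIELDS) gives the same list of addresses.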

>> So, once that virtual field is defined, I can ask the database to
>>
[quoted text clipped - 7 lines]
><field range>. Are you limited to having a single field that is marked
>unique?

Integrity-wise, the only uniqueness that the database itself enforces is
the primary key. Yes, this could be improved on ...

>> > Another example is managing data integrity in procedural application
>> > code. In RM this is considered a "stupid database trick" to quote from
[quoted text clipped - 17 lines]
>
>I didn't quite follow this.

Putting constraints in the app not the database leads to smaller
development teams.

>> Something is decidedly less expensive in terms of time for
>> maintaining and having the constraints in the same language as the rest of
>> the application just might be one of the keys to that.
>
>I'd buy that in a second. But I still want my constraints enforced (at least)
>centrally.

So would I :-) But I want my constraints *optional*.

>> > Please be specific. I am very interested in specific examples of specific
>> > operations or structures that you feel are hard to solve with RM or SQL
[quoted text clipped - 11 lines]
>So if you have use-cases for situations where you feel MV is better than
>the relational approach, I'm happy to hear them.

Well, you saw my example about the Australian breweries? Where one
brewery stole a march on the rest and hammered the lot in the market
place - apart from the one MV-based brewery that responded quickly
enough to ride up with them?

>> I am very confident that I can find aspects of the
>> relational model that are not based on either mathematics or science (we've
[quoted text clipped - 4 lines]
>
>Bring on the anecdotes!

The Witwatersrand study that said MV-based companies spent *half* the
money that relational-based companies did on their databases.

The experience of MV practitioners involved in conversions from MV to
relational - they *ALL* say that any company escaping with *just* a
*doubling* in head count (plus the same in licence fees) has got off
very lightly cost-wise.

The story I like, where consultants spent SIX MONTHS tuning a complex
query so's it ran faster than the MV system it was replacing - and when
they crowed to management that the new system was 10% faster than the
old system they were brought down to earth with a big bang as the guy
supporting the MV system pointed out that it was running on an ancient P90
- the new system was a twin Xeon-800 box and surely it should be able to
do better than just 10%? (Oh - and I'm prepared to bet dollars to cents
that the MV query wasn't optimised AT ALL.)

>> I'm in search of better science on the
>> matter and a mathematical model that is as useful to the practitioner as the
[quoted text clipped - 4 lines]
>Give me ten million dollars and 5 years and it should be no problem.
>Since I have neither, I'm willing to forego the whole proof thing.

Well, the first thing you'd have to do is find some way of showing that
"data == tuple". It's all very well the relational model *asserting*
that it is, but unless you've got some real-world conjecture that links
the two, you're going to get nowhere.

Science has recently been surprised by the apparent existence of 5-quark
baryons. I think investigating the relationship between "real data" and
"relational tuples" (in other words, trying to formalise business
analysis) might provide a few (to say the least) surprises ...

>> It seems to me that a class of
>> databases that advance the "older" approaches of Cache' and PICK could beat
[quoted text clipped - 6 lines]
>extreme view of scalability which has been skewed by my workplace.
>However I can believe the agile part.

Anecdotally ... but there are apparently some pretty huge MV databases
out there, and they haven't hit problems yet. At least, not ones
attributable to the database - maybe the hardware isn't powerful enough,
but relational would have hit the same problems a lot harder AND sooner.

Or redundancy, hardware scalability, what have you but all things that
are external to the database.

>> It seems the best I can do is prove
>> that the relational model is not purely mathematics, but contains some
[quoted text clipped - 10 lines]
>and if it is useful, then we rejoice. It's certainly possible for a formalism
>to be completely sound and self-consistent and utterly useless.

Exactly. So you see why we object when people say "relational MUST be
right because it's based on mathematics". It's formal, sound,
self-consistent, and ... :-)

>Marshall

Cheers,
Wol

Marshall Spight - 10 Jul 2004 03:43 GMT
> >Let me see if I understand this. You have a "file" of People and it might
> >have, directly in it, a field that is a list of addresses, so we have one:many
[quoted text clipped - 15 lines]
> using the query language, they would neither know nor care that the two
> ADDRESS fields are fundamentally different "under the bonnet".

In other words, the implementation is different ("under the bonnet")
but the interface is the same.

Here's a question: let's imagine a giant nested data structure.
You have records nested ten levels deep, and the data in each
level is ten times as much as the level it's contained in.

Let's say you want to query only the top level, and not pull in
all that extra stuff. Can you do that? What does the query look
like? What if you want only 3 levels deep?

Is there a reference for the MV query language I could read
somewhere?

> >Yes, we've discussed this before, and I believe we agree that it's important
> >that constraints be available to applications.
[quoted text clipped - 8 lines]
> Putting constraints in the app not the database leads to smaller
> development teams.

I don't believe it. I'd believe that it only *works* with smaller
development teams.

> >> Something is decidedly less expensive in terms of time for
> >> maintaining and having the constraints in the same language as the rest of
[quoted text clipped - 4 lines]
> >
> So would I :-) But I want my constraints *optional*.

An optional constraint is a contradiction in terms.

What is it that makes you want it optional? What's an example of
a rule you want enforced sometimes but not other times?

> >I'm not asking for proof. I know you care a lot about proof, but I don't
> >so much. Right now I'm more interested in hearing a lot of people's stories.
[quoted text clipped - 5 lines]
> place - apart from the one MV-based brewery that responded quickly
> enough to ride up with them?

That's not a use-case, but it *is* a great success story.

I'm interested in lower-level, specific details. The nitty-gritty.
Like, here's table 1 and here's table 2 and I wanted to figure
out x, so I typed xxx and it was really easy and the comparable
SQL is xxxxxxxxxx which is really hard.

> >Bring on the anecdotes!
>
[quoted text clipped - 6 lines]
> do better than just 10%? (Oh - and I'm prepared to bet dollars to cents
> that the MV query wasn't optimised AT ALL.)

Again, a nice story, but without the *specific* query and the specific
tables, I don't learn much from it.

I guess a key thing I'm looking for is results that I can reproduce at home.

> Science has recently been surprised by the apparent existence of 5-quark
> baryons. I think investigating the relationship between "real data" and
> "relational tuples" (in other words, trying to formalise business
> analysis) might provide a few (to say the least) surprises ...

Can you propose a methodology? My best guess is that the question
you are posing is meaningless.

> >I have serious doubts about the scalability claim, but then I have an
> >extreme view of scalability which has been skewed by my workplace.
[quoted text clipped - 4 lines]
> attributable to the database - maybe the hardware isn't powerful enough,
> but relational would have hit the same problems a lot harder AND sooner.

How huge? My job involves working on a dataset that is measured in
terabytes.

> Or redundancy, hardware scalability, what have you but all things that
> are external to the database.

Agreed.

> >And there's not just one math, either. You come up with a formalism,
> >and if it is useful, then we rejoice. It's certainly possible for a formalism
[quoted text clipped - 3 lines]
> right because it's based on mathematics". It's formal, sound,
> self-consistent, and ... :-)

Mmmm. That might be a fair criticism of this newsgroup as a whole, but
I don't know if it would stick to me that well. I don't recall saying
"relational MUST be right" at any point. I'm more along the lines
of "I like how well relational handles many-to-many relationships."

Marshall
Dawn M. Wolthuis - 30 Jun 2004 02:19 GMT
> > > Perhaps I misunderstand, but MV has only the one kind of
> > > relationship it is capable of understanding: containment.

I should have read from the top of the topic down, but I now understand what
you mean.  As far as the database itself, without any triggers written, nor
any application code, the only relationship "between relations" that it
understands is that of parent-child.  --dawn
Marshall Spight - 30 Jun 2004 16:05 GMT
> > > > Perhaps I misunderstand, but MV has only the one kind of
> > > > relationship it is capable of understanding: containment.
[quoted text clipped - 3 lines]
> any application code, the only relationship "between relations" that it
> understands is that of parent-child.  --dawn

So what do you do in the face of many:many relationships? I bet
it's the same thing that OO does: you have links on one side and
links on the other, and manage them in code.

Many to many relationships are one thing that the RM just totally
nails. I bring this up not to run a whole "mine's bigger" thing but
because I believe that if this entire years-long conversation has
a use, it is to highlight the areas where each side succeeds, so
that we may begin to work towards a new model that encompasses
the best of several existing systems.

In programming languages, they are talking more and more about
"multiparadigm." I think we should follow their lead.

Marshall
Dawn M. Wolthuis - 30 Jun 2004 17:50 GMT
> > > > > Perhaps I misunderstand, but MV has only the one kind of
> > > > > relationship it is capable of understanding: containment.
[quoted text clipped - 7 lines]
> it's the same thing that OO does: you have links on one side and
> links on the other, and manage them in code.

yup

> Many to many relationships are one thing that the RM just totally
> nails. I bring this up not to run a whole "mine's bigger" thing but
> because I believe that if this entire years-long conversation has
> a use, it is to highlight the areas where each side succeeds, so
> that we may begin to work towards a new model that encompasses
> the best of several existing systems.

Sounds good.

RM does do well with M:M, the most conceptually difficult for the user, but
not in doing anything to simplify the presentation/ease of use for the user.
Viewing books and their authors from one perspective and then authors and
their books from another makes sense to a person.  Viewing it as a
many-to-many is not as helpful (as each book-author pair on a separate line
so you don't have one row for each book, nor one row for each author).  RM
also has difficulty with multiple 1:M with the same 1 when there is a need
for counting, summation, or other arithmetic and visuals/reporting against
the same.  STAR joins have helped a bit with that, I think.
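
One possible reading of the "multiple 1:M with the same 1" point (an interpretation, not something spelled out in this post) is the familiar fan-out effect: joining two child tables to the same parent multiplies the rows, so a naive SUM over the join is inflated. The sketch below uses Python's sqlite3 module with hypothetical customer, orders, and payments tables.

```python
import sqlite3

def fan_out_demo():
    """Return (naive sums over a double join, sums computed per child table)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY);
        CREATE TABLE orders   (customer_id INTEGER, amount INTEGER);
        CREATE TABLE payments (customer_id INTEGER, amount INTEGER);
        INSERT INTO customer VALUES (1);
        INSERT INTO orders   VALUES (1, 10), (1, 20);
        INSERT INTO payments VALUES (1, 5), (1, 5), (1, 5);
    """)
    # Joining both children at once yields 2 x 3 = 6 rows for the customer,
    # so each order is counted three times and each payment twice.
    naive = conn.execute("""
        SELECT SUM(o.amount), SUM(p.amount)
        FROM customer c
        JOIN orders o   ON o.customer_id = c.id
        JOIN payments p ON p.customer_id = c.id
    """).fetchone()
    # Aggregating each child separately gives the intended totals.
    separate = conn.execute("""
        SELECT (SELECT SUM(amount) FROM orders   WHERE customer_id = c.id),
               (SELECT SUM(amount) FROM payments WHERE customer_id = c.id)
        FROM customer c
    """).fetchone()
    conn.close()
    return naive, separate
```

With this data the naive join reports 90 and 30, while the true totals are 30 and 15.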

> In programming languages, they are talking more and more about
> "multiparadigm." I think we should follow their lead.

agreed.  --dawn
Marshall Spight - 01 Jul 2004 05:30 GMT
> RM
> also has difficulty with multiple 1:M with the same 1 when there is a need
> for counting, summation, or other arithmetic and visuals/reporting against
> the same.  STAR joins have helped a bit with that, I think.

Could you expand on that a bit? I didn't quite follow.

Marshall
Laconic2 - 30 Jun 2004 19:13 GMT
> Many to many relationships are one thing that the RM just totally
> nails. I bring this up not to run a whole "mine's bigger" thing but
[quoted text clipped - 5 lines]
> In programming languages, they are talking more and more about
> "multiparadigm." I think we should follow their lead.

Hear, Hear!
Eric Kaun - 09 Jul 2004 03:51 GMT
>>Many to many relationships are one thing that the RM just totally
>>nails. I bring this up not to run a whole "mine's bigger" thing but
[quoted text clipped - 7 lines]
>
> Hear, Hear!

In support of this effort, I've taken all of my dimes and created a set
(or is it a list?) of stacks, 2 in each stack. Thus ends my support of
multi pair-o-dimes.

The problem, of course, is deciding where those paradigms apply... but I
certainly support the desire to merge them somehow. The Xen effort,
funded by Microsoft Research (I forget the researchers and am too lazy
to look them up) looks somewhat promising, though from what I've seen it
still completely lacks any declarative constraints.

- erk
Anthony W. Youngman - 05 Jul 2004 23:57 GMT
>For example, in another thread someone said (over and over
>if I remember correctly :-) that if you delete an invoice, all
[quoted text clipped - 5 lines]
>throw them away.) Can you do this declaratively in MV? How
>is it done?

You mean a bit like you can't delete a company if there are any
outstanding invoices? No that can't (or rather, shouldn't) be done
natively and declaratively in MV. But I wouldn't call that a "container
and contents".

>Can it handle many-to-many? I've heard some people say it
>can, but is integrity enforced automatically, or is it just done
>with references that are application managed?

Let's give an example - an owner can have multiple cars, and a car can
have multiple owners.

What I'd do is have an OWNERS field in the CARS file, and declare it as
an index. So if I want to know who owns a car, I just list the car and
pull the owners into the listing. If I want to know what cars someone
owns, I list all cars owned by that person.
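
A minimal sketch of that layout in Python, with a dictionary standing in for the CARS file and a list standing in for the multivalued OWNERS field. The keys, makes, and owner names are invented for illustration, and the second lookup is written as a plain scan, which is the work an index on OWNERS would avoid.

```python
# Hypothetical CARS file: each record holds a multivalued OWNERS field.
CARS = {
    "CAR1": {"MAKE": "Mini", "OWNERS": ["ANN", "BOB"]},
    "CAR2": {"MAKE": "Ford", "OWNERS": ["BOB"]},
    "CAR3": {"MAKE": "Saab", "OWNERS": ["CAT"]},
}

def owners_of(cars, car_id):
    """Who owns this car? Read the record and return its OWNERS values."""
    return list(cars[car_id]["OWNERS"])

def cars_owned_by(cars, owner):
    """Which cars does this person own? Scan for records listing the owner."""
    return sorted(car_id for car_id, record in cars.items()
                  if owner in record["OWNERS"])
```

Both directions of the many-to-many are answered from the one file; no separate link table is kept.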

Actually, thinking about it, this seems like a perfect case of "ON
DELETE RESTRICT" - don't delete an owner if any cars only have that
owner. But MV would leave that to the app (I'd rather be able to enforce
it, but it doesn't seem to be a problem in real life ... :-)
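
Leaving that rule to the app might look something like the sketch below. The function names and the record layout (a dictionary of car records, each with an OWNERS list) are assumptions for illustration; nothing here is a built-in MV facility.

```python
def can_delete_owner(cars, owner):
    """False if some car lists this owner as its only owner."""
    return not any(record["OWNERS"] == [owner] for record in cars.values())

def delete_owner(cars, owner):
    """Remove an owner from every car, refusing if a car would be orphaned."""
    if not can_delete_owner(cars, owner):
        raise ValueError(f"{owner} is the sole owner of at least one car")
    for record in cars.values():
        record["OWNERS"] = [o for o in record["OWNERS"] if o != owner]
```

The check only protects data that is changed through delete_owner; any other program writing to the file can bypass it, which is the usual argument for enforcing such rules centrally.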

>Can it *automatically* enforce declared integrity constraints?
>Can you have an integer attribute and declare that it must
[quoted text clipped - 7 lines]
>> Like I said before, this is not to say that the MV model does
>> everything...as nothing can.

And centrally enforced constraints are a lack in the MV model ... but
design-enforced relations are its strength. Because it doesn't need 90%
of the constraints that are necessary in relational, it hasn't
bothered with the other constraints (which I agree is a pity).

>I don't think I agree. For example, Java, C++, and BASIC are
>all able to compute anything that can be computed. They do
[quoted text clipped - 15 lines]
>basis for the tools. Are they complete? Are they correct? Are they
>self-consistent?

Are they even *RELEVANT*? Take any theory in PURE mathematics. It's
complete, it's correct, it's self-consistent. And if it assumes that
parallel lines in three dimensions can meet, it doesn't break. It just
models a completely different world to the one we actually live in ...

>SQL is relationally complete, over its lame type system. It could
>really use a better type system. This will make it more usable but
[quoted text clipped - 7 lines]
>and I are engaged in exactly the same exploration, although we
>come at it from different backgrounds.

I think I'd agree here ... MV is great at ease of use. It's great at
enforcing entity-level integrity (can't have an adjective (or adjectival
clause) without a noun for it to describe). It's not great at enforcing
constraints *between* *entities*. But then, neither is the real world
:-)

Relational fits theory fine. MV fits the real world fine.

>Marshall

Cheers,
Wol

Marshall Spight - 07 Jul 2004 02:47 GMT
> >For example, in another thread someone said (over and over
> >if I remember correctly :-) that if you delete an invoice, all
[quoted text clipped - 10 lines]
> natively and declaratively in MV. But I wouldn't call that a "container
> and contents".

Why do you say "shouldn't?" It seems pretty clear to me that a
declarative approach is always better than a procedural one. (Dawn?
Care to rebut?)

If not container/contents, what terminology would you use?

> Let's give an example - an owner can have multiple cars, and a car can
> have multiple owners.
[quoted text clipped - 3 lines]
> pull the owners into the listing. If I want to know what cars someone
> owns, I list all cars owned by that person.

What does "declare it an index" mean? Is it like a pointer or foreign key?

> >> But [MV] provides an interesting confluence of
> >> tools and capabilities that render the model very useful in solving business
[quoted text clipped - 11 lines]
> parallel lines in three dimensions can meet, it doesn't break. It just
> models a completely different world to the one we actually live in ...

I'm sorry, you say this why? Because you have traced some lines
from one end of the universe to other and checked that they don't
meet? Actually, even the very idea of the "world we live in" having
lines in it doesn't work for me. Walking around my house, I never
saw an infinite sequence of colinear points.

> Relational fits theory fine. MV fits the real world fine.

That statement just seems totally bogus to me. Does subtraction
fit the real world? What happens when I subtract 5 lemons
from 3 lemons? Do I get -2 lemons? Can you send me a picture
of -2 lemons via email; I want to see what they look like.

Marshall
Anthony W. Youngman - 10 Jul 2004 00:20 GMT
>> >For example, in another thread someone said (over and over
>> >if I remember correctly :-) that if you delete an invoice, all
[quoted text clipped - 14 lines]
>declarative approach is always better than a procedural one. (Dawn?
>Care to rebut?)

What seems clear to me is not clear to you, and vice versa. Read Dick
Feynman. Different brains are wired differently, and see the world
differently. Just because it SEEMS to you that declarative is better
than procedural it does not mean that that is the case.

>If not container/contents, what terminology would you use?

To me, an invoice is a container, and line items are contents thereof.
You can't have the latter without the former.

A company does NOT contain its invoices - a company can go bust but the
invoices are still outstanding ... okay - we now get into all sorts of
semantics such as "does a company entry in a database represent a real
company, or just a fictional representation thereof?".

But I view those two relationships as being fundamentally different, and
they are modelled completely differently in MV. I don't think relational
can see any difference between them.
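
A rough sketch of the distinction being drawn, using Python dataclasses. The class and field names are hypothetical, and this is one way of picturing the two relationships rather than a description of how MV stores them: line items are nested inside the invoice, while the company is only referred to by key.

```python
from dataclasses import dataclass, field

@dataclass
class LineItem:
    description: str
    amount: int

@dataclass
class Invoice:
    invoice_id: str
    company_id: str  # reference by key: the company lives elsewhere
    lines: list = field(default_factory=list)  # contained: part of the invoice

def invoice_total(invoice):
    return sum(line.amount for line in invoice.lines)

def delete_company(companies, company_id):
    """Removing a company does not touch the invoices that refer to it."""
    del companies[company_id]

def delete_invoice(invoices, invoice_id):
    """Removing an invoice removes its line items with it."""
    del invoices[invoice_id]
```

After delete_company the invoice still exists and still totals correctly, although its company_id now dangles; after delete_invoice there are no line items left to orphan.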

>> Let's give an example - an owner can have multiple cars, and a car can
>> have multiple owners.
[quoted text clipped - 5 lines]
>
>What does "declare it an index" mean? Is it like a pointer or foreign key?

Surely you declare indices in relational dbs? Same thing here. So's I
can say "SELECT CARS WITH OWNER EQ 'X'", and it doesn't need to search
the entire CARS file, but just goes to the index and grabs a list of
primary keys into the CARS file from the index.
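
What the index buys can be sketched in Python as an inverted map from each owner to the primary keys of the cars that list them. The names build_owner_index and select_cars_with_owner are hypothetical, and a real MV system would maintain the index itself as records are written rather than rebuilding it.

```python
from collections import defaultdict

def build_owner_index(cars):
    """Map each owner to the primary keys of the CARS records listing them."""
    index = defaultdict(list)
    for car_id, record in cars.items():
        for owner in record["OWNERS"]:
            index[owner].append(car_id)
    return index

def select_cars_with_owner(index, owner):
    """Rough equivalent of SELECT CARS WITH OWNER EQ 'X': one index lookup."""
    return list(index.get(owner, []))
```

The lookup touches only the index entry for that owner, not every record in the CARS file.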

>> >> But [MV] provides an interesting confluence of
>> >> tools and capabilities that render the model very useful in
[quoted text clipped - 18 lines]
>lines in it doesn't work for me. Walking around my house, I never
>saw an infinite sequence of colinear points.

I'm just saying that being complete, correct and self-consistent isn't
enough. All that proves is that it works as pure maths. But if it
doesn't work as *applied* maths, then it's the wrong theory for the
problem at hand.

>> Relational fits theory fine. MV fits the real world fine.
>
>That statement just seems totally bogus to me. Does subtraction
>fit the real world? What happens when I subtract 5 lemons
>from 3 lemons? Do I get -2 lemons? Can you send me a picture
>of -2 lemons via email; I want to see what they look like.

Relational is complete, correct, and self-consistent. It's fine as a
pure-maths theory.

MV just seems to *fit* the real world rather better :-)

Cheers,
Wol

Marshall Spight - 10 Jul 2004 03:25 GMT
> >Why do you say "shouldn't?" It seems pretty clear to me that a
> >declarative approach is always better than a procedural one. (Dawn?
> >Care to rebut?)
>
> What seems clear to me is not clear to you, and vice versa.

I notice you didn't answer my question.

> >If not container/contents, what terminology would you use?
> >
> To me, an invoice is a container, and line items are contents thereof.
> You can't have the latter without the former.
>
> A company does NOT contain its invoices ...

Okay. What does contain the invoices? Or are they a top-level concept?
If so, how are they related to companies?

> - a company can go bust but the
> invoices are still outstanding ...

Their financial situation is irrelevant. Perhaps you are confusing
the company and the record of the company in the dbms.

> okay - we now get into all sorts of
> semantics such as "does a company entry in a database represent a real
> company, or just a fictional representation thereof?".

This is a simple question with a simple answer. The company entry
in the database represents a real-world company. It is not an actual
company, nor is it a representation of a representation of a company.

> But I view those two relationships as being fundamentally different, and
> they are modelled completely differently in MV. I don't think relational
> can see any difference between them.

Okay, so *how* are they different?

> >What does "declare it an index" mean? Is it like a pointer or foreign key?
> >
> Surely you declare indices in relational dbs? Same thing here.

MV terminology is quite foreign to me, so I do not assume that when
an MV person uses a word I'm used to, they are using it in the same
way. Note when I say foreign, I just mean that I'm not familiar with
it; I don't have any opinion on the goodness or badness of the terminology.
(Well, I might think "file" is an unfortunately-overloaded term.)

> So's I
> can say "SELECT CARS WITH OWNER EQ 'X'", and it doesn't need to search
> the entire CARS file, but just goes to the index and grabs a list of
> primary keys into the CARS file from the index.

Okay.
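
To make the index lookup concrete, here is a minimal Python sketch. It assumes a toy in-memory CARS file keyed by primary key and a secondary index on OWNER; the names `cars`, `build_index`, `owner_index` and `select_cars_with_owner` are made up for illustration and are not any MV product's API.

```python
from collections import defaultdict

# Hypothetical CARS file: primary key -> record.
cars = {
    "C1": {"OWNER": "X", "MODEL": "Mini"},
    "C2": {"OWNER": "Y", "MODEL": "Golf"},
    "C3": {"OWNER": "X", "MODEL": "Civic"},
}

def build_index(file, field):
    """Map each value of `field` to the list of primary keys that hold it."""
    index = defaultdict(list)
    for key, record in file.items():
        index[record[field]].append(key)
    return index

owner_index = build_index(cars, "OWNER")

def select_cars_with_owner(owner):
    """Answer the query from the index, without scanning the whole CARS file."""
    return [cars[key] for key in owner_index.get(owner, [])]

models = [car["MODEL"] for car in select_cars_with_owner("X")]
```

The index is paid for once when it is built (and on every later write); the query itself only touches the records whose keys the index hands back.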

> >> Relational fits theory fine. MV fits the real world fine.
> >
[quoted text clipped - 7 lines]
>
> MV just seems to *fit* the real world rather better :-)

The simplest explanation here is that it's what you're used to,
and hence it seems to fit best for you. You haven't given any
evidence that it actually does fit the real world any better.
Note that I consider that question unanswerable and hence
irrelevant.

Marshall
Anthony W. Youngman - 05 Jul 2004 22:43 GMT
>Marshall:
>
[quoted text clipped - 12 lines]
>datastore too.  Most of the MV products can even understand and cope with
>SQL functionality.

Thanks Bill. Not knowing what ON DELETE RESTRICT means, I couldn't
really respond ...

>Like I said before, this is not to say that the MV model does
>everything...as nothing can.  But it provides an interesting confluence of
>tools and capabilities that render the model very useful in solving business
>problems for many people and businesses.

Well, it can't actually *do* ON DELETE CASCADE (in native mode anyway) -
it's just that what it can do has the same effect :-) As Bill says, it's
*just* *different*

Cheers,
Wol

Eric Kaun - 09 Jul 2004 01:52 GMT
> Yes, but relational formalises metadata INTO data.

No formalization is needed; metadata is data. It's just data with a
different domain, but there's no reason to think it obeys different laws
or requires different structure.

> Once it's in an RDBMS
> it's no longer metadata, because the rdbms doesn't understand any
> meaning in it and can't take advantage of that meaning so it's just data.

I'm confused. How does placing it in an RDBMS make it no longer
metadata? The system catalog (metadata - data about your data) can be
represented relationally (or as XML if you're feeling masochistic).

How does the RDBMS "understand" no meaning in it? And how do other DBMSs
"understand" meaning? The constraints and relation definitions of the
metadata are as much meaning as the RDBMS can have.

> The ordering in a list is metadata. Convert that into a set to put into
> an rdbms and ORDER is now just a meaningless (as far as the db engine is
> concerned) bit of data.

No, in that case order is gone, vanished. If you don't state it, the
RDBMS doesn't know about it. On the other hand, it doesn't assume
anything either. Order is easily represented, and again if you're
masochistic, you can store a list-typed attribute.
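
As a rough illustration of "order is easily represented", the sketch below states the order as an ordinary attribute and asks for it back at query time. It uses SQLite only because it ships with Python; the `playlist` table and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE playlist ("
    " position INTEGER PRIMARY KEY,"  # the order, stated explicitly as data
    " track TEXT NOT NULL)"
)

# Rows go in with no particular physical order.
conn.executemany(
    "INSERT INTO playlist (position, track) VALUES (?, ?)",
    [(2, "second"), (3, "third"), (1, "first")],
)

# The order comes back only because the query asks for it.
ordered = [
    row[0]
    for row in conn.execute("SELECT track FROM playlist ORDER BY position")
]
```

If the `position` column is left out, the order is simply not recorded, which is the "gone, vanished" case described above.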

> That's where MV and OO fundamentally differ. They try to *avoid*
> converting metadata to data, so that the db engine can be intelligent
> and take advantage of it to optimise things.

So by treating metadata as something other than data (what would that
be?), they can be intelligent and optimize? Intelligent how? Optimize what?

- erk
Anthony W. Youngman - 10 Jul 2004 22:52 GMT
>> Yes, but relational formalises metadata INTO data.
>
>No formalization is needed; metadata is data. It's just data with a
>different domain, but there's no reason to think it obeys different
>laws or requires different structure.

Yes - but metadata can be used by the database while data can't.

>> Once it's in an RDBMS  it's no longer metadata, because the rdbms
>>doesn't understand any  meaning in it and can't take advantage of that
[quoted text clipped - 3 lines]
>metadata? The system catalog (metadata - data about your data) can be
>represented relationally (or as XML if you're feeling masochistic).

Because you've converted it to data! And the system catalog doesn't let
you store ALL metadata AS metadata. It will only let you store metadata
it recognises.

>How does the RDBMS "understand" no meaning in it? And how do other
>DBMSs "understand" meaning? The constraints and relation definitions of
>the metadata are as much meaning as the RDBMS can have.

In other words, an RDBMS is incomplete. :-)

>> The ordering in a list is metadata. Convert that into a set to put
>>into  an rdbms and ORDER is now just a meaningless (as far as the db
[quoted text clipped - 4 lines]
>anything either. Order is easily represented, and again if you're
>masochistic, you can store a list-typed attribute.

But if you DO state it, the RDBMS doesn't know anything about it,
either! What do you mean by a "list-typed attribute"? Do you mean a
column that contains ordering information?

>> That's where MV and OO fundamentally differ. They try to *avoid*
>>converting metadata to data, so that the db engine can be intelligent
[quoted text clipped - 3 lines]
>be?), they can be intelligent and optimize? Intelligent how? Optimize
>what?

The whole point of a database is it STORES data, it does *not*
UNDERSTAND data. By converting metadata into data, you are now forcing
"intelligence" into the application.

A relational database thinks in terms of sets. In order to have a list,
you need to create extra DATA, and the database itself can't take
advantage of it, because it doesn't understand it.

DATA is what is stored IN a database. METADATA is data that is USED BY
the database. There *is* a difference, and the difference is crucial.
The more metadata you can leave as metadata, rather than convert to
data, the more information the database has available to it to optimise.

How does an RDBMS optimise access to a list, if it doesn't have any
understanding of what a list is?

That's the point of storing metadata *as* *metadata*. Because the
database understands it.

Cheers,
Wol

Marshall Spight - 13 Jul 2004 03:16 GMT
> The whole point of a database is it STORES data, it does *not*
> UNDERSTAND data.

The whole point of a database management system is to manage
data. As a side effect, it might also store it in a persistent storage
mechanism, but this is not a requirement. It has to be able
to manage the appropriate structure, enforce integrity, and
allow manipulation. ("Structure, integrity, manipulation.")
It can do this because it understands the data.

Marshall
Eric Kaun - 19 Jul 2004 18:11 GMT
>> No formalization is needed; metadata is data. It's just data with a
>> different domain, but there's no reason to think it obeys different
>> laws or requires different structure.
> Yes - but metadata can be used by the database while data can't.

Not quite true - if a DBMS supports a user-extensible typing system,
then it can "use" those types without understanding anything about them.
This is where SQL and so many other DBMSs completely fall down:
1. by requiring the user to rely solely on the types already provided by
the vendor
2. (the case with SQL today) making the type system so baroque as to be
useless
3. (also the case with SQL and JDBC/ODBC/etc. today) making the
embedding of data in programs so difficult as to force a compromise back
to the lowest-common-denominator primitives again.

Or some combination of the three.

>> I'm confused. How does placing it in an RDBMS make it no longer
>> metadata? The system catalog (metadata - data about your data) can be
[quoted text clipped - 3 lines]
> you store ALL metadata AS metadata. It will only let you store metadata
> it recognises.

Of course. So are you saying that 1) lists should be commonly-understood
"metadata", or 2) that Pick/MV let you extend the metadata recognized by
the DBMS?

If you're saying #1, then I could argue as well for other types (and
would say relation-valued attributes are far more powerful and useful
than lists). If you're saying #2, then again the typing mechanism would
help, though user-defined functions and views can aid somewhat.

What types of metadata does the DBMS need to recognize?

>> How does the RDBMS "understand" no meaning in it? And how do other
>> DBMSs "understand" meaning? The constraints and relation definitions
>> of the metadata are as much meaning as the RDBMS can have.
>
> In other words, an RDBMS is incomplete. :-)

Heh.

>>> The ordering in a list is metadata. Convert that into a set to put
>>> into  an rdbms and ORDER is now just a meaningless (as far as the db
[quoted text clipped - 8 lines]
> either! What do you mean by a "list-typed attribute"? Do you mean a
> column that contains ordering information?

No, I meant a single attribute that stores a list, much like in MV. The
difference is that it's not "first order" to the database; user-defined
types are orthogonal to relations.

So what does a Pick DB "know" that the RDBMS wouldn't? And how do you
tell it?

>>> That's where MV and OO fundamentally differ. They try to *avoid*
>>> converting metadata to data, so that the db engine can be intelligent
>>> and take advantage of it to optimise things.

I've read this several times, and still don't know what you mean. How
does OO avoid converting metadata to data? I'd say you're wrong; in Java
you can use classes like Class, Constructor, Method, etc. to do
"higher-order" operations, so the metadata is effectively converted to
the same sorts of things you write your programs in (i.e. classes). The
new JDK1.5 metadata will simply expand this; the metadata will still be
accessible as "data".

Other languages do similar things (albeit in a much more elegant way
than Java).

> The whole point of a database is it STORES data, it does *not*
> UNDERSTAND data. By converting metadata into data, you are now forcing
> "intelligence" into the application.

No, you're forcing intelligence [sic] into the RDBMS. You're telling it
what's allowed and what's not. What other meaning of "data definition"
is there?

My ongoing gripe about declaration vs. procedure is based on
descriptions of meaning. With procedural code, the meaning is implicit;
if you're lucky, the code was written in a clear way, and you can see
the meaning. With declarative, you don't guess (nor do you have to
implement in an algorithmic sense). The language/engine/DBMS does the
monkey work for you.

> A relational database thinks in terms of sets. In order to have a list,
> you need to create extra DATA, and the database itself can't take
> advantage of it, because it doesn't understand it.

Right, it understands relations and values; the types of those values
are something different. But what exactly does it matter? You seem to be
implying that lists are so useful as to be first-class citizens to the
DBMS, and I say they're not; I'd prefer sets, for one thing (and no,
from that standpoint, RDBMSs don't "do" sets either). Or even bags. Or
perhaps relations themselves. Lists are in so many cases poor
substitutes for a real data structure - witness the presence of
"they-gotta-be-correlated" attributes in Pick files (e.g. a QUANTITY
list-valued attribute and a PRODUCT list-valued attribute to store line item
data for an order - better not lose the ordering or an item in one, or
you're hosed).
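
A small Python sketch of the correlation problem, under the assumption that an order item holds two parallel lists; the record layout and the helper `line_items` are invented for illustration and do not reflect how any particular Pick product stores data.

```python
# Pick-style item: two list-valued attributes correlated only by position.
order_item = {
    "PRODUCT": ["widget", "gadget", "gizmo"],
    "QUANTITY": [2, 5, 1],
}

def line_items(item):
    """Pair the two lists by position; refuse to guess if the lengths differ."""
    products, quantities = item["PRODUCT"], item["QUANTITY"]
    if len(products) != len(quantities):
        raise ValueError("PRODUCT and QUANTITY are out of step")
    return list(zip(products, quantities))

# The same data as a relation-valued attribute: each line item is one tuple,
# so a product and its quantity cannot drift apart.
order_relation = {("widget", 2), ("gadget", 5), ("gizmo", 1)}

# One value lost from one list. A naive zip would pair "gizmo" with 5;
# the length check in line_items at least notices the damage.
damaged = {"PRODUCT": ["widget", "gizmo"], "QUANTITY": [2, 5, 1]}
```

Note that the length check only catches a lost value; two values swapped within one list would go undetected, which is the deeper objection to correlating by position.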

> DATA is what is stored IN a database. METADATA is data that is USED BY
> the database. There *is* a difference, and the difference is crucial.
> The more metadata you can leave as metadata, rather than convert to
> data, the more information the database has available to it to optimise.

That's ignoring what you mentioned earlier - the metadata that the DBMS
can understand. Are you saying that the metadata needs to be left in so
that later on, when the DBMS is extended in some way, it can now
comprehend what previously meant nothing to it?

And again, the concept of metadata (at least in the discussion at hand)
only has meaning in the context of datatypes. You seem to be saying that
because lists are Very Important Things, that the DBMS must "understand"
them as metadata, in much the same way as it understands files and
fields. I'm saying that's not needed, because you can define a List type
which the RDBMS can manipulate like any other type you want to define,
though if you want the benefit of relational manipulation (a good thing
which would eliminate, for example, many many lines of code), you must
express the data relationally.

> How does an RDBMS optimise access to a list, if it doesn't have any
> understanding of what a list is?

So it's an optimization question? In short, it wouldn't - no more than
it would optimize access to an Order type I've defined (including line
items). Then again, if it were a relation-valued attribute, it could
optimize that with the same machinery with which it optimizes the rest
of the relations.

But again, the main point here is what's important and what's not. Lists
and their status as first-class DBMS citizens seems to be the point in
question.

> That's the point of storing metadata *as* *metadata*. Because the
> database understands it.

It can only understand what it understands. What other types besides
Lists need to "be" metadata?

- erk
Bill H - 14 Jun 2004 20:12 GMT
> "Bill H" <wphaskett@THISISMUNGEDatt.net> wrote in message...
> >
[quoted text clipped - 18 lines]
> > as being the controlling field with a relationship to field# 10 while
> > field# 10 is dependent on field# 9.

[snipped]

> While I see many examples like the above, can you give us an example of how
> the dictionary defines those?

Here's the field definition for the Accounts and Amounts:

   accounts
001 A
002 9
003 ACCT
004 C;10
005
006
007
008
009 L
010 5
011
012
013
014
015
016
017 The G/L acct#s associated with this invoice (controls field# 10)

   amounts
001 A
002 10
003 ACCT/AMTS
004 D;9
005
006
007 MR2,M
008
009 RN
010 13
011
012
013
014
015
016
017 The amounts associated with each G/L acct# in field# 9.

Field# 004 in the above definitions defines the controlling and dependent
fields.  The above structure may be different in different mvDbms products.
Anyway, these definitions are data just like other data and reside in the
database.
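
For readers who don't know the dictionary layout, here is a hedged Python sketch of how the two definitions above could be read. It assumes attribute 002 is the field number and attribute 004 holds "C;n" (controlling) or "D;n" (dependent), as in the listing; that several targets may be separated by ";" is my assumption, and the `invoice` record is hypothetical, loosely based on the output further down.

```python
# The two dictionary items from the post, keyed by attribute number.
dictionary = {
    "accounts": {"001": "A", "002": "9", "003": "ACCT", "004": "C;10"},
    "amounts": {"001": "A", "002": "10", "003": "ACCT/AMTS", "004": "D;9"},
}

def association(defn):
    """Split attribute 004 into (role, field numbers it refers to)."""
    code, _, targets = defn["004"].partition(";")
    role = {"C": "controlling", "D": "dependent"}[code]
    return role, [int(n) for n in targets.split(";") if n]

def paired_values(record, controlling, dependent):
    """Line up the multivalues of a controlling field with its dependent field."""
    left = record[int(controlling["002"])]
    right = record[int(dependent["002"])]
    return list(zip(left, right))

# Hypothetical invoice record: field number -> list of values.
invoice = {9: ["5170", "3370"], 10: ["-1012.61", "-1963.84"]}
pairs = paired_values(invoice, dictionary["accounts"], dictionary["amounts"])
```

The point of the sketch is only that the definitions are plain data that a program (or the query language) can read and act on.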

> What language do you use to define the dictionary? Is it user-accessible?
>
> - erk

As you can see, the definitions are just data.  They describe the data, and the
definitions have a pre-defined structure (the dbms defines this structure).
One builds a dictionary through various tools (line editor, screen editor,
GUI editor, GUI dictionary editor, etc).  The query language uses the field
definitions so I could:

LIST APINVOICES ACCOUNTS AMOUNTS

and get the following output:

apopen.... ACCT. ACCT/AMTS....
                *
555*1011   5070          6.73
340*VR3-2  5170      1,012.61-
          3370      1,963.84-
          5170          0.00
          3370          0.00
9999*3907  5000        300.00
555*1018   5070         29.53
340*VR11-1 5170        999.22-
          3370      1,977.23-
          5170          0.00
          3370          0.00

So the data is accessible by users or developers and the field definitions
can be accessed in the same way (since they're just data too) using the
query language:

LIST DICT APOPEN 'ACCOUNTS' 'AMOUNTS' D/CODE A/AMC S/NAME V/STRUC V/TYP V/MAX

DICT APOPEN code A/AMC S/NAME.............. c/d struc. TP MAX

ACCTS       A        9 ACCT                 C;10       L  5
AMTS        A       10 ACCT/AMTS            D;9        RN 13

[405] 2 items listed out of 2 items.

Hope this helps.

Bill
Anthony W. Youngman - 18 Jun 2004 18:16 GMT
>This is, more than anything, the philosophical divide between relational and
>Pick folks. The more rules, the more they should be kept OUT of the
>application code. "Application" means just that: a judicious application. Of
>what? Rules. Application != definition, just as implementation !=
>specification.

Actually, as a Pickie, I'm very much inclined to agree. Rules should sit
BETWEEN the application and the data store. So no, I don't quite agree
with the relational approach, but I think the Pick approach is lacking
here.

>> This is because the business person is much closer to the application
>> and database, and its tools.
[quoted text clipped - 6 lines]
>Granted that some rules should be configurable; that doesn't imply that all
>should be. The business, after all, has (or needs!) some structure.

Beyond relational's strict data typing, aren't all relational rules
configurable? Okay, it's done using constraints, triggers, etc etc but
it's still configured by the programmer or dba, as far as I can see.

I want that power in Pick :-)
(Okay we've got it with triggers, but I don't necessarily think that's
the best way to do it)

>> As you can tell, a well defined mvDbms application uses the field
>> definitions to describe the data (as it should be) and relationships with
[quoted text clipped - 5 lines]
>the dictionary defines those? What language do you use to define the
>dictionary? Is it user-accessible?

I said earlier "the generic trumps the specific". Relational has a rule
that says "the data definition must be accessible using the same tools
as are used to access the data" (C&D rule 4, my paraphrase).

My multi-value rule 5 - Database self description :-)
The database management system shall describe itself using FILEs. FILEs
are described by other FILEs. The database itself is described by a
FILE. The database shall have no fundamental mechanism to differentiate
between FILEs containing data and FILEs containing metadata about that
data.

So yes, Pick uses the same language to access both data and definition.
And yes, it's user accessible. Because the db engine is forbidden to
know that there is a difference between data and metadata.

Which is why I think of a database as layered. And why I find the
relational emphasis on "push it into the database" difficult. I see it
as a four-layer thing.

Layer 1: the data store.
Layer 2: the integrity layer. Where relational has triggers,
constraints, and all that guff. Pick doesn't have anything native here
(although a lot of what relational has, Pick doesn't need because the
model is different - constraints for example). But Pick really should
have a native mechanism here. It hasn't had it in the past because we
manage fine without it (really we do - it's just that, in a FEW cases,
we can see that relational is worth copying here).
Layer 3: the presentation layer. What the apps see - relational tables
and views, Pick FILEs. Really, this is where views are defined, so Pick
really neither has it nor needs it.
Layer 4: applications - split into 4a database tools and 4b user apps.

I get the impression relational is trying to have a monolithic database
layer which is trying to be all things to all men. And if that's the
case, it's bound to fail. Break things up into tasks and layers, and
don't just have "the database".

Cheers,
Wol

Laconic2 - 18 Jun 2004 20:14 GMT
> Actually, as a Pickie, I'm very much inclined to agree. Rules should sit
> BETWEEN the application and the data store. So no, I don't quite agree
> with the relational approach, but I think the Pick approach is lacking
> here.

As a relational  (as in SQL), I'm glad to see some agreement.  But I'd offer
yet another opinion.

Rules should  sit both ABOVE and BEYOND the application and the data store.
Both the CREATE script for the data store, and the code generation phase of
the application should be able to include rules,  when necessary,  from some
common rule repository.  This rule repository would do for rules what a data
dictionary does for data definitions.

Or not?
Dawn M. Wolthuis - 18 Jun 2004 21:43 GMT
> > Actually, as a Pickie, I'm very much inclined to agree. Rules should sit
> > BETWEEN the application and the data store. So no, I don't quite agree
[quoted text clipped - 11 lines]
>
> Or not?

Yes-ish.  It is hard to split out metadata, including rules, from data.  If
a type or a maximum length are designated as binding information regarding
an attribute, then those are rules, right?  And they are constraints, right?
And they are metadata, right?  And if we want to store all of our rules for
use by a rules engine, then these rules should be there, right?  So, what
should be in a system catalog or as DBMS constraint specifications outside
of a rules repository?  Nothing.  So, the rules repository should include
whatever aspects of the data dictionary are in need of enforcement.  The
data dictionary is then descriptive for use in queries, not another rules
repository.  I think of a data dictionary as like a Land's End catalog --
something from which to shop for the information I want.

--dawn
Anthony W. Youngman - 19 Jun 2004 23:42 GMT
>> Actually, as a Pickie, I'm very much inclined to agree. Rules should sit
>> BETWEEN the application and the data store. So no, I don't quite agree
[quoted text clipped - 11 lines]
>
>Or not?

Except I don't understand what you mean by "above" and "beyond". The app
sits above the rules, the datastore sits below. In order for the app to
write to the datastore it has to go through the rule layer. That's the
way I see it. You seem to be saying the rules are somewhere else. I
think I see what you mean, but it doesn't make sense to me.

The way I would implement it (in Pick) would be to attach an "integrity
check routine" to the FILE. Think of it as a SQL "whole-table trigger" -
you can't write to file without setting off this thing (if it exists)
and it can reject the write with a "this data is invalid" error.

And as a first thing, I would add the OPTION to make dictionary
descriptions prescriptive, so you could enforce eg "this column/FIELD is
a number" :-)
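
A rough Python sketch of that idea, purely to pin down what I mean; the `File` class, the `InvalidData` error and the `amount_is_number` rule are all hypothetical names, not anything a real Pick system provides.

```python
class InvalidData(Exception):
    """Raised when the integrity check rejects a write."""

class File:
    def __init__(self, integrity_check=None):
        self._items = {}
        self._check = integrity_check  # optional: a FILE may have no routine

    def write(self, key, record):
        # Every write passes through the check, if one is attached.
        if self._check is not None:
            self._check(key, record)
        self._items[key] = record

    def read(self, key):
        return self._items[key]

def amount_is_number(key, record):
    """Prescriptive dictionary rule: the AMOUNT field must be a number."""
    try:
        float(record["AMOUNT"])
    except (KeyError, TypeError, ValueError):
        raise InvalidData(f"{key}: this data is invalid") from None

invoices = File(integrity_check=amount_is_number)
invoices.write("1001", {"AMOUNT": "29.53"})
```

The application never talks to `_items` directly, so the check sits between the app and the data store, and a FILE with no routine attached behaves exactly as it does today.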

Cheers,
Wol

Tony - 19 Jun 2004 12:40 GMT
> >This is, more than anything, the philosophical divide between relational and
> >Pick folks. The more rules, the more they should be kept OUT of the
[quoted text clipped - 6 lines]
> with the relational approach, but I think the Pick approach is lacking
> here.

Perhaps you agree more than you realise, since a DBMS is a database
MANAGEMENT system, not just a data STORE.  The DBMS sits between the
application and the dumb data store, which is the file system.  That's
why the rules belong in the DBMS.
Laconic2 - 19 Jun 2004 13:59 GMT
> Perhaps you agree more than you realise, since a DBMS is a database
> MANAGEMENT system, not just a data STORE.  The DBMS sits between the
> application and the dumb data store, which is the file system.  That's
> why the rules belong in the DBMS.

Excellent point!

This gets to be even more true when the database integrates data from more
than one application.
mAsterdam - 20 Jun 2004 09:53 GMT
> ... Rules should sit
> BETWEEN the application and the data store.

In the relational approach, by separating the rules and the data store,
that is exactly where they are: BETWEEN the application and the data store.

> I get the impression relational is trying to have a monolithic database
> layer which is trying to be all things to all men. And if that's the
> case, it's bound to fail. Break things up into tasks and layers, and
> don't just have "the database".

Why did you do it: because I can. A lot of presentational
application code has tabular structure. While there may be
no need to share that (it is not user data, it is code)
it is convenient to put it into something which has a
track record of storing tables.
The content of the tables resulting from this practice (the use of the
tables managed by the DBMS to contain application code)
should be treated as what it is, part of the code: apply change
management discipline, include the tables in packaged releases etc.
Dawn M. Wolthuis - 13 Jun 2004 03:30 GMT
> > > Months ago,  I asked whether a pizza with pepperoni and onion was the
> same
[quoted text clipped - 31 lines]
> Are you "just expected to know"  the logical structure of invoices and
> pizzas enough to draw this inference?

I think the way this is handled is one of the (rather few) areas that is not
the same with each MV database on the market.  In the UniData environment,
with which I am most familiar, if there are "associated multivalues" then
they are identified as such and this "association" is named in the
dictionary -- the vocabulary of the view of the data through a particular
portal.  So, I can talk about each multivalued field individually, or the
association (nested table-ish) by its name.

Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK systems
is descriptive of the data and not constraining.  The same data can be
described in many different ways.  The association would really be a type of
derived data.

> Not that there aren't things you "just have to know"  in a schema of tables,
> but the Pick people treat it as though it's "intuitively obvious".  Maybe to
> an SME,  but maybe not to everybody else.

No we don't -- oddly enough, it is though.  smiles.  --dawn
Eric Kaun - 14 Jun 2004 17:21 GMT
> Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK systems
> is descriptive of the data and not constraining.  The same data can be
> described in many different ways.  The association would really be a type of
> derived data.

How is this useful? I've seen this in COBOL layouts, and was underwhelmed;
it always seemed to cause more problems (and invite even others) than it
appeared to solve. How is this more effective than a view, for example?

- erk
Dawn M. Wolthuis - 14 Jun 2004 21:43 GMT
> > Keep in mind that unlike an RDBMS schema, the vocabulary for MV/PICK
> systems
[quoted text clipped - 6 lines]
> it always seemed to cause more problems (and invite even others) than it
> appeared to solve. How is this more effective than a view, for example?

Logically that is what it is, I guess, but it can be nested.

Take all of the nouns you want to consider and look at their relationships.
Month, Day, and Year are three such nouns and you might want another that is
made up of exactly these three -- so you can derive Date as Month | Day |
Year or derive month, for example, using a function as Month(Date).  Now, if
you are looking at a list of dates, you can do the same thing, performing
functions to group or separate various data.

I'm not sure that answered your concern.  I think being underwhelmed
regarding derived data is appropriate in 2004.  smiles.  --dawn
Eric Kaun - 16 Jun 2004 15:56 GMT
> > How is this useful? I've seen this in COBOL layouts, and was underwhelmed;
> > it always seemed to cause more problems (and invite even others) than it
[quoted text clipped - 11 lines]
> I'm not sure that answered your concern.  I think being underwhelmed
> regarding derived data is appropriate in 2004.  smiles.  --dawn

I don't think any of the above represents derivation; it looks more to me
like operations over types. I think of Date as a type, as well as Day,
Month, and Year.

So I'd set up equivalences like these:

Month(Date(Y, M, D)) = M
Day(Date(Y, M, D)) = D
Year(Date(Y, M, D)) = Y

which assumes only that you have a selector (constructor) Date(Y,M,D). You
could set up others, of course, and you'd need domain specifiers over M and
Y, and then a constructor for Day that took Month into account.
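
In Python the equivalences might be sketched like this. It is a bare-bones version with no domain checks at all (so none of the specifiers mentioned above), and the capitalised function names simply mirror the notation used here.

```python
from typing import NamedTuple

class Date(NamedTuple):
    """Selector Date(Y, M, D): a Gregorian date, with no validation."""
    y: int
    m: int
    d: int

def Year(date: Date) -> int:
    return date.y

def Month(date: Date) -> int:
    return date.m

def Day(date: Date) -> int:
    return date.d

sample = Date(2004, 6, 16)
```

With these definitions `Month(Date(Y, M, D))` returns `M` for any integers, which is exactly why the domain specifiers would still be needed in a real type.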

And then the individual types would have other semantics. In particular,
you'd have to introduce the notion of calendars (the above is
GregorianDate), and the base type all of them rely on (not "derived from")
is something like Timestamp, an instant in time.

But I think I've gone far afield of your original points...

- erk
Dawn M. Wolthuis - 16 Jun 2004 16:35 GMT
> > > How is this useful? I've seen this in COBOL layouts, and was
> underwhelmed;
[quoted text clipped - 20 lines]
> like operations over types. I think of Date as a type, as well as Day,
> Month, and Year.

Well, see now, I knew that about you (and your type ;-) and so I poked.
Derived data is just applying operations to stored data, whether a "type
operation" or any other function one wants to write or that comes with the
licensed toolset.  There remain these two things: data and functions and
the coolest of these is derived data (applying functions, by whatever name,
to data).

> So I'd set up equivalences like these:
>
[quoted text clipped - 5 lines]
> could set up others, of course, and you'd need domain specifiers over M and
> Y, and then a constructor for Day that took Month into account.

yup, or use something like number-of-days-since-D-day and then functions on
that to view it in whatever way is desired
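
Something like the following Python sketch, where the stored value is just a day count and everything else is a function over it. The choice of day zero is arbitrary for this example, and the function names are made up.

```python
from datetime import date, timedelta

DAY_ZERO = date(1970, 1, 1)  # arbitrary day zero, chosen only for this sketch

def to_day_number(d: date) -> int:
    """Stored form: whole days since DAY_ZERO."""
    return (d - DAY_ZERO).days

def from_day_number(n: int) -> date:
    """View the stored day count as a calendar date."""
    return DAY_ZERO + timedelta(days=n)

def month_of(n: int) -> int:
    """Derived data: the month, computed from the stored day count."""
    return from_day_number(n).month

stored = to_day_number(date(2004, 6, 16))
```

Only `stored` would live in the file; year, month, day of week and so on are all derived on the way out.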

> And then the individual types would have other semantics. In particular,
> you'd have to introduce the notion of calendars (the above is
> GregorianDate), and the base type all of them rely on (not "derived from")
> is something like Timestamp, an instant in time.
>
> But I think I've gone far afield of your original points...

No problem - we've wandered a ways from Wol's original point, so I'll
summarize with it.  Now that we've determined that relational
modeling/theory is NOT mathematics, we can get back to the science of it,
which is where experience comes in.  Since we all have different experiences
and there are not enough good studies (if any) to validate industry best
practices, I think we do well to learn from the experiences of others.

cheers!  --dawn
Anthony W. Youngman - 18 Jun 2004 18:27 GMT
>> Not that there aren't things you "just have to know"  in a schema of
>tables,
[quoted text clipped - 3 lines]
>
>No we don't -- oddly enough, it is though.  smiles.  --dawn

Oddly enough, I've just been trying to get to grips with our new SQL
database. And I asked "how do I know which tables belong together?" I
was told that, given an individual table, I couldn't find out which
other tables "join"ed to it. I "just had to know".

With Pick, you just have to LIST DICT FILENAME, and chances are it'll be
there in front of you. Certainly you get all the files that FILENAME
links to, if not the files that link to FILENAME (probably the same
set).

Cheers,
Wol

Laconic2 - 18 Jun 2004 20:21 GMT
> Oddly enough, I've just been trying to get to grips with our new SQL
> database. And I asked "how do I know which tables belong together?" I
> was told that, given an individual table, I couldn't find out which
> other tables "join"ed to it. I "just had to know".

Bad database, don't you think?  If the appropriate REFERENCES clauses had
been included,  you would be able to figure out the join conditions,
wouldn't you?
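
A small sketch of reading declared references back out of the catalog. It uses SQLite through Python's `sqlite3` module purely as a stand-in (other DBMSs expose this through their own dictionary views), and the `customer` and `invoice` tables are made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id)
    );
""")


def references_from(conn, table):
    """Return (column, referenced_table, referenced_column) per foreign key."""
    rows = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
    # Row layout: (id, seq, table, from, to, on_update, on_delete, match)
    return [(r[3], r[2], r[4]) for r in rows]


def tables_referencing(conn, target):
    """Return the tables that declare a foreign key pointing at `target`."""
    names = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    return [t for t in names
            if any(ref[1] == target for ref in references_from(conn, t))]
```

This only finds joins that someone declared; an undeclared relationship stays in the "just have to know" category.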

Of course, there are times when the ability to join data in an unanticipated
way is actually useful.

As I wrote in "Stupid database tricks",  it's possible to create a totally
inscrutable database, comprehensible only to programmers,  in SQL.  It's
also possible to write spaghetti code in C.

Nothing is foolproof, because fools are so ingenious.  Or so it says on
somebody's tag line.
Anthony W. Youngman - 19 Jun 2004 23:49 GMT
>> Oddly enough, I've just been trying to get to grips with our new SQL
>> database. And I asked "how do I know which tables belong together?" I
[quoted text clipped - 4 lines]
>been included,  you would be able to figure out the join conditions,
>wouldn't you?

Maybe they have been. How would I find out? Or maybe the RDBMS doesn't
support REFERENCES. It's MS Squirrel Server :-)

Cheers,
Wol

Tony - 19 Jun 2004 12:47 GMT
> >> Not that there aren't things you "just have to know"  in a schema of
>  tables,
[quoted text clipped - 8 lines]
> was told that, given an individual table, I couldn't find out which
> other tables "join"ed to it. I "just had to know".

Either the person you asked was an idiot, or you have a crap SQL DBMS,
or both.  What DBMS is it?  Every SQL DBMS I know has a data
dictionary that shows the RI constraints between the tables, which
gives you what you need.  Of course, some application-centric idiot
"designers" don't bother to define these, because the application
"knows".
Laconic2 - 19 Jun 2004 14:04 GMT
> "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message
> > Oddly enough, I've just been trying to get to grips with our new SQL
> > database. And I asked "how do I know which tables belong together?" I
> > was told that, given an individual table, I couldn't find out which
[quoted text clipped - 6 lines]
> "designers" don't bother to define these, because the application
> "knows".

I ran into one of those, a few years back.  It was the "Great Plains" order
processing system for dotcoms.  No constraints in the DB, although the DBMS
supports them.  Rows in the same table that represented different "record
types", depending on a value in one of the columns.  Sets formed by doubly
linked lists of foreign keys.  No documentation.  The whole nine yards.

The programmers told me it was "very advanced".  Yeah, right.
Eric Kaun - 14 Jun 2004 17:14 GMT
> [...]
> In the recent Pick example,  showing an invoice,  there's a list of account
[quoted text clipped - 10 lines]
> but the Pick people treat it as though it's "intuitively obvious".  Maybe to
> an SME,  but maybe not to everybody else.

I think the choice of data structure is important, both in terms of
correctness and in communication. I see this a lot in Java - people using
ArrayList everywhere just because they can, and then doing nausea-inducing
searches through the lists, as opposed to using a Set or Bag or some other
structure. And besides the simple bad choice, I keep thinking "O how I wish
I could do an in-memory SELECT here..."
Anthony W. Youngman - 18 Jun 2004 17:56 GMT
>> It's a slippery handle, but maybe - but be careful asking about "the same
>> as" in an OO context - that subject gets very confusing to OOers. :-)
[quoted text clipped - 24 lines]
>Are you "just expected to know"  the logical structure of invoices and
>pizzas enough to draw this inference?

No. Pick stores metadata in its dictionaries, and has a concept called
ASSOCiation.

With the pizza, there is no ASSOCiation defined between CHEESE and
TOPPING, but with the invoice there is an ASSOCiation between ACCOUNT.NO
and AMOUNT. That is, if the programmer has remembered to define it ...

Cheers,
Wol

Laconic2 - 18 Jun 2004 20:09 GMT
> No. Pick stores metadata in its dictionaries, and has a concept called
> ASSOCiation.
>
> With the pizza, there is no ASSOCiation defined between CHEESE and
> TOPPING, but with the invoice there is an ASSOCiation between ACCOUNT.NO
> and AMOUNT. That is, if the programmer has remembered to define it ...

That's a fair enough answer to the question as stated.  Data is only self
describing if somebody made it that way.

As far as "if the programmer has remembered to define it"  goes,  I find
that no more, and no less, of a pitfall than the REFERENCES constraint in
SQL.
Dawn M. Wolthuis - 18 Jun 2004 21:48 GMT
> > No. Pick stores metadata in its dictionaries, and has a concept called
> > ASSOCiation.
[quoted text clipped - 9 lines]
> that no more, and no less, of a pitfall than the REFERENCES constraint in
> SQL.

Only slightly different, for better or worse, in that if the REFERENCES
constraint is not there, you can still create a query with the appropriate
join, so there is nothing to force you to put in the REFERENCES constraint,
where in PICK, the user will complain that they can't get the data out the
way they want if the ASSOC is not there, so the dictionary, which is just
descriptive, does get that type of accuracy quite soon after deployment if
it is not built in from the start.

--dawn
Anthony W. Youngman - 18 Jun 2004 17:51 GMT
>> He did say that, and I've been thinking about it, and am not sure it's
>> accurate. The order of values in a list attribute in a Pick file seems
[quoted text clipped - 12 lines]
>I got several cute responses, but nobody really addressed the underlying
>issue.  Sounds like you've got a handle on it.

In other words, is it a set, a bag, or a list?

Note that it's easy to go from a list to either of the other two. But in
order to go back, the set or bag needs to contain extra data (ie the
order) over the list.

Because Pick stores attributes as lists (if relevant) the order is
available to the db engine as metadata if required. And it can't be
accidentally lost by an analyst :-) So I would argue that storing things
as lists is better, because you can always get the other two if you
want.
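
To illustrate the one-way conversion in Python (the topping values are made up for the example): a list collapses to a bag or a set with one call, but two different lists can collapse to the same bag, so the order cannot be recovered from the bag alone.

```python
from collections import Counter

toppings = ["pepperoni", "onion", "pepperoni"]  # list: order and repeats kept

as_bag = Counter(toppings)  # bag: repeats kept, order dropped
as_set = set(toppings)      # set: repeats and order both dropped

# A differently ordered list produces exactly the same bag.
other_order = ["onion", "pepperoni", "pepperoni"]
same_bag = Counter(other_order) == as_bag
```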

After all, your question could be taken to mean "Is it a pizza with both
pepperoni and onion" or "is it a pizza with pepperoni on it then onion
on top of that".

If the analyst hasn't taken it into account, then relational is sunk
without a redesign. With Pick you just tell your till-operatives that
it's to be entered as an ordered list, not a set :-)

Cheers,
Wol

Marshall Spight - 27 Jun 2004 02:42 GMT
> Note that it's easy to go from a list to either of the other two. But in
> order to go back, the set or bag needs to contain extra data (ie the
> order) over the list.

I don't see how you could consider that data "extra" if it was
there originally.

Anyway, the list [A, B, C] expressed as a set is { (1, A), (2, B), (3, C) }
where [] denotes an ordered collection and {} denotes an unordered
collection.

Going back to the list from the information-preserving set is not that hard.
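
A sketch of that round trip in Python, with hypothetical function names. The set carries (position, value) pairs, and sorting on position rebuilds the list:

```python
def list_to_indexed_set(items):
    """[A, B, C] -> {(1, A), (2, B), (3, C)}"""
    return {(i, v) for i, v in enumerate(items, start=1)}


def indexed_set_to_list(pairs):
    """Rebuild the list by sorting the pairs on their position."""
    return [v for _, v in sorted(pairs, key=lambda p: p[0])]
```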

> Because Pick stores attributes as lists (if relevant) the order is
> available to the db engine as metadata if required. And it can't be
> accidentally lost by an analyst :-) So I would argue that storing things
> as lists is better, because you can always get the other two if you
> want.

Although I don't think having lists as the only collection primitive is
a good idea, there is one key point that I will gladly grant you:

Lists are very common, and SQL doesn't handle them well at all.
RM doesn't handle them well either.

(I can also believe that MV handles them quite well, although I
have no direct experience to that end.)

Does anyone know of a "list calculus" or "list algebra" with a
formal definition? Is it just too simple for anyone to have
cared about?

Marshall
mAsterdam - 27 Jun 2004 10:51 GMT
>>Note that it's easy to go from a list to either of the other two. But in
>>order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 8 lines]
>
> Going back to the list from the information-preserving set is not that hard.

There is more to it.

[forced meaning/overspecification]
The above is one way of expressing the list as a set.
There are other ways, and that is where the complication comes in.
One has to decide which.

First, the way you chose is not without problems:

Let's put a new element 'N' into
the list, right after the 'B':

The new list is [A, B, N, C].
Now what is the set?

is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or
is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?

In the case of UC1 we will have to make sure that there is always
room for inserts. In the case of UC2 every insert causes updates to
all elements that should be after the new element.
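
The two options can be sketched side by side in Python. The function names are hypothetical, and using `Fraction` for the UC1 rank is just one way to keep room for inserts:

```python
from fractions import Fraction


def insert_uc1(pairs, rank_before, rank_after, value):
    """UC1: pick a rank between the neighbours; existing pairs are untouched."""
    new_rank = (Fraction(rank_before) + Fraction(rank_after)) / 2
    return pairs | {(new_rank, value)}


def insert_uc2(pairs, position, value):
    """UC2: shift every pair at or after `position` up by one, then add."""
    shifted = {(r + 1 if r >= position else r, v) for r, v in pairs}
    return shifted | {(position, value)}


def in_order(pairs):
    """Read the values back in rank order."""
    return [v for _, v in sorted(pairs, key=lambda p: p[0])]
```

Both give [A, B, N, C] when read back in order; they differ in which existing pairs had to change.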

Now there are other ways of representing lists as unordered collections,
better or worse at some or other aspects - but that is not the problem.

The problem is we have to make a choice with consequences to get from
the list to an unordered collection carrying the same meaning, so we are
capable of reconstructing the list. It is a strange thing: we require an
'unordered collection' to preserve order - now if that isn't asking for
trouble ... :-) but I am digressing.

Thing is we *have* to make a complex choice,
because (and only because) we decided that the order was
meaningful. But the choice we have to make has *more*
consequences than we bargained for. I could imagine
people calling that 'extra data' - but what does the
extra data mean? In the above solutions we get an extra
column (or attribute) stating rank. In another solution
we might get an extra column designating the 'next in order'
element - or would 'prior' be better?

I don't care, but I must decide.
It is somewhat like having to choose which side on the road
to drive on without traffic signs and without
knowing what country you are in.

There are even more worms in this can, I think, but
I would appreciate your comments on this one.

<snip>
> Lists are very common, and SQL doesn't handle them well at all.
> RM doesn't handle them well either.
[quoted text clipped - 5 lines]
> formal definition? Is it just too simple for anyone to have
> cared about?

Lisp and prolog come to mind - but that is not what you are
looking for - at least it is not what I am looking for.
Marshall Spight - 27 Jun 2004 16:55 GMT
> >>Note that it's easy to go from a list to either of the other two. But in
> >>order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 9 lines]
> [...]
> First, the way you chose is not without problems:

For sure!

> Let's put a new element 'N' into
> the list, right after the 'B':
[quoted text clipped - 4 lines]
> is it UC1: { (1, A), (2, B), (3, C), (2.5, N) } or
> is it UC2: { (1, A), (2, B), (4, C), (3, N) } ?

It's UC2.

> In the case of UC2 every insert causes updates to
> all elements that should be after the new element.

Let us also consider the two other common implementations
of lists: linked lists and arrays.

In UC2, an insert requires O(N) time: every index above
the one inserted requires updating. In a linked list, an insert
requires O(1) time, but locating an item in order to insert in
front of it is O(N). In an array, an insert requires O(N) time,
because you have to move all the higher elements up.

The issue here is not performance, it's interface. Using SQL
to manipulate ordered data *where the ordering is positional*
and *not by any logical component of the data* is just way
too hard.

"Lists are very common, and SQL doesn't handle them well at all.
RM doesn't handle them well either."

Let me expand on what I mean by "handle," and to do so,
I'll use the "structure, integrity, manipulation" definition of
a DBMS.

If you have an unchanging list, using (1,A), (2,B), (3,C) etc as a list
*structure* works just fine. All the query operators you'd expect
to have, you have: get me item 2, how many items are there, etc.
You'd also get some not-so-common queries: at what indices
does the letter "Q" occur?

But let's consider manipulation. Simple insert, using your (3,N) tuple.

Java:
list.add(2, "N");  // java.util.List indexes from 0, so index 2 is position 3

SQL:
begin;
update List set position = position + 1 where position >= 3;
insert into List (position, value) values (3, 'N');
commit;
[note if you do them in the wrong order you're screwed.]
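
For anyone who wants to try the shift-then-insert sequence, here is a runnable sketch using SQLite through Python's `sqlite3` module as a stand-in DBMS. The `insert_at` helper is hypothetical, and the table deliberately has no UNIQUE constraint on `position`, since engines differ on when they check uniqueness during a multi-row update:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE List (position INTEGER NOT NULL, value TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO List (position, value) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C")])


def insert_at(conn, position, value):
    """Shift later rows up by one, then insert, all in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "UPDATE List SET position = position + 1 WHERE position >= ?",
            (position,))
        conn.execute(
            "INSERT INTO List (position, value) VALUES (?, ?)",
            (position, value))


insert_at(conn, 3, "N")
rows = conn.execute(
    "SELECT position, value FROM List ORDER BY position").fetchall()
# rows is now [(1, 'A'), (2, 'B'), (3, 'N'), (4, 'C')]
```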

Wait, I just remembered: Date says integrity enforcement
should be at the statement level, not the transaction
level. So the first update will fail, because it leaves a
hole in the sequence. So I've gotta figure out how to
update all those pos values at once, which I think I
can do with mod and some offsets. I expect it would
take me a few hours to figure out.

Now: integrity. I have to specify some checks. Uh:
unique(pos)
check( pos >= 0 )
check( (select max(pos) from List) = (select count(pos) from List) )

Did that get it? I think it did, but I'm not sure. Also, if I
saw that in a table declaration, would I say "list" or would
I say "bunch of integrity constraints."
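
One way to probe the "did that get it?" question is to model the three checks directly. In the Python sketch below (hypothetical helper names), the checks appear to pin the positions to exactly 1..n only when the lower bound is 1; with a lower bound of 0, a set such as {0, 2} slips through with a hole in it:

```python
def passes_checks(positions, minimum):
    """Model of: unique(pos), check(pos >= minimum), max(pos) = count(pos)."""
    unique = len(set(positions)) == len(positions)
    bounded = all(p >= minimum for p in positions)
    dense = max(positions) == len(positions) if positions else True
    return unique and bounded and dense


def is_list_index(positions):
    """True when the positions are exactly 1, 2, ..., n with no gaps."""
    return sorted(positions) == list(range(1, len(positions) + 1))


hole = [0, 2]
accepted_with_zero = passes_checks(hole, minimum=0)  # passes, yet has a gap
accepted_with_one = passes_checks(hole, minimum=1)   # rejected
```

The reasoning: n distinct integers that are all at least 1 and at most n can only be 1..n, whereas allowing 0 leaves one spare slot.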

> The problem is we have to make a choice with consequences to get from
> the list to an unordered collection carrying the same meaning, so we are
> capable of reconstructing the list.

I don't disagree, but I might say it differently: what matters is what
options we have to enforce integrity, and what operators we have
to perform manipulation. I think RM is a step backwards in ease-of-use
for each of these where lists are concerned. (Which is not to say
that I think we should make our decisions on that basis alone, but I
do think it's significant.)

> Thing is we *have* to make a complex choice,
> because (and only because) we decided that the order was
> meaningful.

I don't think the choice is intrinsically complex. I think
the issue is just that SQL (and TTM for that matter) don't
give you even the most basic list manipulation or integrity
checks.

> There are even more worms in this can, I think, but
> I would appreciate your comments on this one.
[quoted text clipped - 12 lines]
> Lisp and prolog come to mind - but that is not what you are
> looking for - at least it is not what I am looking for.

No, if I want a PL with list operators, I can find them anywhere.
I was more thinking of something like the relational algebra,
with its minimal set of operators, additional operators defined
in terms of the minimal ones, and a definition of what it means
for a set of operators to be complete. Only for lists.

Frighteningly, if I can't find something like that, I may have
to do it myself.

Marshall
mAsterdam - 27 Jun 2004 21:38 GMT
>>>>Note that it's easy to go from a list to either of the other two. But in
>>>>order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 21 lines]
>
> It's UC2.

Ok. No argument, just wondering: any particular reason to
discard UC1?

>>In the case of UC2 every insert causes updates to
>>all elements that should be after the new element.
>
> Let us also consider the two other common implementations
> of lists: linked lists and arrays.

Yes, there are more alternatives, all with pros and cons.

> In UC2, an insert requires O(N) time: every index above
> the one inserted requires updating. In a linked list, an insert
[quoted text clipped - 3 lines]
>
> The issue here is not performance, ...

Indeed.

> it's interface.

Is it really? Just interface? Maybe I just don't
understand what you mean by that. I suspect it goes
beyond interface. To be more specific: I think it is related
to the information principle:

>          The entire information content of a relational database
>          is represented in one and only one way: namely, as
>          attribute values within tuples within relations.

As soon as some order is said to have information content,
this principle requires that to have it in a relational database,
this content must be represented as attribute values.

So we have to answer the question: which attribute(s')
values? The answer seems obvious: list typed attributes.
But that's not the way it is done - so what stops that?
I know: the lack of the "Spight list algebra"!

BTW: Attribute is still on the glossary's todo list (hint :-)

> Using SQL
> to manipulate ordered data *where the ordering is positional*
[quoted text clipped - 25 lines]
> commit
> [note if you do them in the wrong order you're screwed.]

No problem. Assume a preprocessor and make a
macro^H^H^H^H^H shorthand.

> Wait, I just remembered: Date says integrity enforcement
> should be at the statement level, not the transaction
[quoted text clipped - 12 lines]
> saw that in a table declaration, would I say "list" or would
> I say "bunch of integrity constraints."

:-)

>>The problem is we have to make a choice with consequences to get from
>>the list to an unordered collection carrying the same meaning, so we are
[quoted text clipped - 6 lines]
> that I think we should make our decisions on that basis alone, but I
> do think it's significant.)

>>Thing is we *have* to make a complex choice,
>>because (and only because) we decided that the order was
[quoted text clipped - 4 lines]
> give you even the most basic list manipulation or integrity
> checks.

The more options we have, the more serious the problem is.
Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B'
ambiguous?

Say we have a pizza-attribute 'topping' of type 'list-of-toppings'
Is 'constraint FK topping refers to toppings' ambiguous?

If it is intrinsically simple as you say - what is your
explanation why SQL (and TTM) do simply not address meaningful
order?

Maybe we are just overlooking something obvious.

> I was more thinking of something like the relational algebra,
> with its minimal set of operators, additional operators defined
[quoted text clipped - 3 lines]
> Frighteningly, if I can't find something like that, I may have
> to do it myself.

"Spight list algebra". Sounds good. :-)
Marshall Spight - 27 Jun 2004 23:27 GMT
> >>Let's put a new element 'N' into
> >>the list, right after the 'B':
[quoted text clipped - 9 lines]
> Ok. No argument, just wondering: any particular reason to
> discard UC1?

If you deviate from the integer domain for your index, you
no longer have a list; you now have just another set with
just another regular attribute. A list or sequence is a mapping
from the natural numbers to another set. For example, a string
is a mapping from nat -> char.

UC1 is a perfectly valid approach; it's just not a list.

> > The issue here is not performance, ...
>
[quoted text clipped - 14 lines]
> this principle requires that to have it in a relational database,
> this content must be represented as attribute values.

Something the information principle *doesn't* say is that
those relations cannot be a subtype of relation.
Ordered relation is a subtype of unordered relation, and
list is a subtype of ordered relation. And if we allow relation-
valued attributes (RVAs) then that means we also allow
lists as attributes.

> So we have to answer the question: which attribute(s')
> values? The answer seems obvious: list typed attributes.
> But that's not the way it is done - so what stops that?
> I know: the lack of the "Spight list algebra"!

Actually, I think it's exactly that. (Not that it needs to
have my name on it, of course.) I think we need a *theoretical*
understanding of the subtyping relationship between
sets, totally ordered sets, partially ordered sets, and lists.
(Mathematicians, of course, have had this understanding
for years, but not too many people in data management
are there yet. Still, I wish I knew as much math as, say
Mikito Harakiri.)

We also need the relational language to have a type system
that isn't antediluvian. SQL's type system is right in line
with other languages of the day, such as FORTRAN, Cobol,
and Pascal (or C for that matter.) It hasn't advanced much,
at least not in the type system, and its extraordinary market
success has defeated all newcomers.

> > [note if you do them in the wrong order you're screwed.]
>
> No problem. Assume a preprocessor and make a
> macro^H^H^H^H^H shorthand.

Agreed, but I think that for practical purposes you *have* to
have this shorthand form.

> > I don't think the choice is intrinsically complex. I think
> > the issue is just that SQL (and TTM for that matter) don't
[quoted text clipped - 4 lines]
> Is list [A, B, N, C] ambiguous? Is 'insert Y into L1 after B'
> ambiguous?

I'd say "no" and "no." I'm not certain I see your point, though.
Maybe it's that the current situation is quite complicated if
you want to handle lists? I'd agree with that. That's why I
think we need those list primitives (derived from that
list algebra) to make it simple.

> Say we have a pizza-attribute 'topping' of type 'list-of-toppings'

Why is that a list? When I go to pizza hut and ask for ham and
pineapple, I get the same thing as if I asked for pineapple and
ham. Pizza toppings aren't a list; they're a set.

I think part of the reason people see lists everywhere is because
they're used to supplying information in the form of a list,
even when the order info isn't relevant. This makes for
potential program bugs, because

List{ham, pineapple} != List {pineapple, ham}
but
Set {ham, pineapple} == Set {pineapple, ham}

> Is 'constraint FK topping refers to toppings' ambiguous?

I don't get the question.

> If it is intrinsically simple as you say - what is your
> explanation why SQL (and TTM) do simply not address meaningful
> order?
>
> Maybe we are just overlooking something obvious.

I think Date et al.'s usual culprit suffices here: lack of education.
I would note, ironically, that that group has never managed
to clarify the (IMHO) essential distinction between partial
order and total order when discussing order.

> > I was more thinking of something like the relational algebra,
> > with its minimal set of operators, additional operators defined
[quoted text clipped - 5 lines]
>
> "Spight list algebra". Sounds good. :-)

I'll get right on it.  :-)

Marshall
mAsterdam - 28 Jun 2004 23:36 GMT
> mAsterdam wrote:
>>
[quoted text clipped - 19 lines]
>
> UC1 is a perfectly valid approach; it's just not a list.

Now I am puzzled. What makes the integer
*not* a regular attribute? -

I'll try myself: it is easier to hide.
And that is exactly what is
necessary. We can pretend it's not there, and
we should, because in the unambiguous list
presentation it is not there. Any more data
than exactly necessary for presenting the
order should not be visible.

(Yep: interface :-)

<snip>

> Something the information principle *doesn't* say is that
> those relations cannot be a subtype of relation.
> Ordered relation is a subtype of unordered relation, and
> list is a subtype of ordered relation. And if we allow relation-
> valued attributes (RVAs) then that means we also allow
> lists as attributes.

How does this eliminate the extra choice to make?
(- I think you agree it should).

>>So we have to answer the question: which attribute(s')
>>values? The answer seems obvious: list typed attributes.
[quoted text clipped - 16 lines]
> at least not in the type system, and its extraordinary market
> success has defeated all newcomers.

Yep. Types. Sigh. Gimme gimme gimme :-)

But later. One thing at a time. For now: lists!

>>>[note if you do them in the wrong order you're screwed.]
>>
[quoted text clipped - 3 lines]
> Agreed, but I think that for practical purposes you *have* to
> have this shorthand form.

/me nods.

>>>I don't think the choice is intrinsically complex. I think
>>>the issue is just that SQL (and TTM for that matter) don't
[quoted text clipped - 16 lines]
> pineapple, I get the same thing as if I asked for pineapple and
> ham. Pizza toppings aren't a list; they're a set.

To connoisseurs it's a list. The order of the toppings
matters. I assumed that this foundation of the
pizza-model was common knowledge ;-)

> I think part of the reason people see lists everywhere is because
> they're used to supplying information in the form of a list,
[quoted text clipped - 8 lines]
>
> I don't get the question.

After the above pizza-model 101 you probably do.

It's just to state that the list-member should be first
class^H^H^H^H^H type citizens.

>>If it is intrinsically simple as you say - what is your
>>explanation why SQL (and TTM) do simply not address meaningful
[quoted text clipped - 6 lines]
> to clarify the (IMHO) essential distinction between partial
> order and total order when discussing order.

I have to admit that I suppressed bringing it (partial order) in
earlier. I think complete sequence (non-partial order) is to
be viewed as a special case of partial ordering (1 part).
So maybe the simplest strategy is: get the simple, special
case first in thorough detail,
then tackle the more complicated (partial order), then show
the simple case is a specialization of the complex.
Not unlike the approach in 'Temporal data'.

>>>I was more thinking of something like the relational algebra,
>>>with its minimal set of operators, additional operators defined
[quoted text clipped - 7 lines]
>
> I'll get right on it.  :-)

Just lists for now, ok?
Marshall Spight - 29 Jun 2004 02:53 GMT
> >>>It's UC2.
> >>
[quoted text clipped - 21 lines]
>
> (Yep: interface :-)

Mostly I agree. But I don't think the "we should"
part always applies. Sometimes we should, and
sometimes we shouldn't. It depends on what
particular operations we want to do. The most
flexible way is where we get to choose.

> <snip>
> >
[quoted text clipped - 7 lines]
> How does this eliminate the extra choice to make?
> (- I think you agree it should).

No, I don't think I do. What you have to do is decide
how you are going to model your data. *That* choice
is fundamental; you can't get rid of it. Do your pizza
toppings come in order or not? The data model you're
working with won't help you make that decision; it
only comes into play when you have to figure out how
to *express* that decision.

> But later. One thing at a time. For now: lists!

Deal.

> >>Say we have a pizza-attribute 'topping' of type 'list-of-toppings'
> >
[quoted text clipped - 5 lines]
> matters. I assumed that this foundation of the
> pizza-model was common knowledge ;-)

Well, okay; whether a particular aspect of your data model
is ordered or not is part of the conceptual modelling; that's
the part that's all art, no science.

Pizza Hut's pizza-ordering web app treats pizza toppings
as unordered; I postulate a chain called "Bob's Totally
Delicious Pizza with Totally Ordered Pizza Toppings"
that keeps them in order.

> > I think part of the reason people see lists everywhere is because
> > they're used to supplying information in the form of a list,
[quoted text clipped - 13 lines]
> It's just to state that the list-member should be first
> class^H^H^H^H^H type citizens.

Ah; I see. I agree the list members should be first-class
citizens. Thus, you should be able to say, I have a list
of items which are each foreign keys to <whatever>.

> I have to admit that I suppressed bringing it (partial order) in
> earlier. I think complete sequence (non-partial order) is to
[quoted text clipped - 4 lines]
> the simple case is a specialization of the complex.
> Not unlike the approach in 'Temporal data'.

Yes, but you wanted to talk about lists, and a totally
ordered set is not the same thing as a list!

Consider the string "abca". This is a list of characters,
but it certainly isn't a totally ordered set over {abc}.
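
The "abca" point can be checked in a few lines of Python. Treating the string as a mapping from positions to characters (the variable names are just for illustration) keeps all four entries, while the underlying set has only three elements, so no ordering of that set alone can reproduce the repeated 'a':

```python
s = "abca"

# A list as a mapping from natural numbers to elements (nat -> char).
as_mapping = dict(enumerate(s, start=1))

# The set of characters the list draws on.
distinct = set(s)

# The mapping reconstructs the string; the set has lost the repeat.
rebuilt = "".join(as_mapping[i] for i in sorted(as_mapping))
```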

I wish it was simpler, but it's not.

Marshall
mAsterdam - 01 Jul 2004 22:33 GMT
>>>>>It's UC2.
>>>>
[quoted text clipped - 27 lines]
> particular operations we want to do. The most
> flexible way is where we get to choose.

The flexible way would unfortunately also
be the way that would force us to make
decisions where there are no
a priori criteria.

If *all* we want is to associate some meaning with
the order of some values, the *only* meaningful thing
besides those values should be the order of them.
Nothing more - for if we would have more, we'd have
to decide between them, and thus be forced to assign
meaning where we have none.

The fact that there are so many ways to represent lists
(just read a few of Jan Hidders' links), all with
different consequences, suggests that there should
be a visible, simple part (only the values in order
+ operators) and an invisible, complex part
(the values, and all that is necessary to keep
them in order under the visible operators).

The complex, invisible part could use the flexibility
to make efficiency decisions based on the (a posteriori)
actual content and usage of the values, but these
choices should in no way affect the content of the
visible, meaningful part.
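
A rough Python sketch of that split, under the assumption that the hidden part happens to use integer ranks (any other representation could be swapped in without the caller noticing). The class and method names are hypothetical:

```python
class OrderedValues:
    """Visible part: values in order plus operators. The rest is hidden."""

    def __init__(self, values=()):
        # Hidden part: (rank, value) pairs kept dense from 1 upwards.
        self._pairs = {(i, v) for i, v in enumerate(values, start=1)}

    def values(self):
        """The only thing a caller sees: the values, in order."""
        return [v for _, v in sorted(self._pairs, key=lambda p: p[0])]

    def insert_at(self, position, value):
        """Insert so that `value` ends up at the given 1-based position."""
        self._pairs = {(r + 1 if r >= position else r, v)
                       for r, v in self._pairs}
        self._pairs.add((position, value))
```

The rank never appears in the visible interface, so it carries no meaning a user has to decide about.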

>><snip>
>>
[quoted text clipped - 15 lines]
> only comes into play when you have to figure out how
> to *express* that decision.

We decided that, beside the values, the order
is relevant.
For now, it would be a good thing (TM) to
also decide/assume that *only* the order
and the values are relevant.

>>But later. One thing at a time. For now: lists!
>
[quoted text clipped - 13 lines]
> is ordered or not is part of the conceptual modelling; that's
> the part that's all art, no science.

Yep, hence the assumptions.

> Pizza Hut's pizza-ordering web app treats pizza toppings
> as unordered; I postulate a chain called "Bob's Totally
> Delicious Pizza with Totally Ordered Pizza Toppings"
> that keeps them in order.

:-)

>>>I think part of the reason people see lists everywhere is because
>>>they're used to supplying information in the form of a list,
[quoted text clipped - 17 lines]
> citizens. Thus, you should be able to say, I have a list
> of items which are each foreign keys to <whatever>.

Exactly.

>>I have to admit that I suppressed bringing it (partial order) in
>>earlier. I think complete sequence (non-partial order) is to
[quoted text clipped - 10 lines]
> Consider the string "abca". This is a list of characters,
> but it certainly isn't a totally ordered set over {abc}.

Yep, that is clear. I was also thinking about precedence
ordering (CPM/PERT type planning graphs) to *exclude*
them from the problem space (at least for now).

> I wish it was simpler, but it's not.

You wouldn't have volunteered if it were :-)
Anthony W. Youngman - 10 Jul 2004 01:02 GMT
>>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings'
>>  Why is that a list? When I go to pizza hut and ask for ham and
[quoted text clipped - 4 lines]
>matters. I assumed that this foundation of the
>pizza-model was common knowledge ;-)

Tomato always goes at the bottom, cheese on the top :-)

Cheers,
Wol

Marshall Spight - 10 Jul 2004 02:42 GMT
> >>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings'
> >>  Why is that a list? When I go to pizza hut and ask for ham and
[quoted text clipped - 6 lines]
>
> Tomato always goes at the bottom, cheese on the top :-)

Good point.

Is crust a topping?

Marshall
Gene Wirchenko - 10 Jul 2004 05:58 GMT
>>>>Say we have a pizza-attribute 'topping' of type 'list-of-toppings'
>>>  Why is that a list? When I go to pizza hut and ask for ham and
[quoted text clipped - 6 lines]
>
>Tomato always goes at the bottom, cheese on the top :-)

    What?  I always have tomato on top.  Maybe, you are confusing
tomato and tomato sauce?

    Just to be safe: I live in Kamloops, British Columbia, Canada.
Please do not open a pizzeria in Kamloops.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Dawn M. Wolthuis - 30 Jun 2004 01:07 GMT
> > >>Note that it's easy to go from a list to either of the other two. But in
> > >>order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 22 lines]
>
> It's UC2.

Yes, there is some such list algebra in the DataBASIC language associated
with PICK and this is how such an insert would be handled logically.

> > In the case of UC2 every insert causes updates to
> > all elements that should be after the new element.

Only if the number is stored rather than implied by the ordering of the data,
as in a linked list, for example.

> Let us also consider the two other common implementations
> of lists: linked lists and arrays.
[quoted text clipped - 9 lines]
> and *not by any logical component of the data* is just way
> too hard.

yes, exactly!

> "Lists are very common, and SQL doesn't handle them well at all.
> RM doesn't handle them well either."
[quoted text clipped - 37 lines]
> saw that in a table declaration, would I say "list" or would
> I say "bunch of integrity constraints."

very good point

> > The problem is we have to make a choice with consequences to get from
> > the list to a unordered collection carrying the same meaning, so we are
[quoted text clipped - 15 lines]
> give you even the most basic list manipulation or integrity
> checks.

RM, or at least SQL, focuses on a) scalar values and b) relations.  Adding
c) lists as a type with its own functions (operators) native to the database
and SQL language is a start.  Viewing relations and lists both as functions
allows us to think about what functions can be applied to which others, in
particular, which functions can be applied to functions where the, uh,
domain of one of the parameters is the natural numbers (to mix vocabulary).
These would be the ordered lists.
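
A small Python sketch of the "list as a function from the natural numbers" view. The helper names (is_list_function, to_list) and the sample data are assumptions made up for this illustration:

```python
toppings = ["tomato", "ham", "cheese"]

# View the list as a function from positions (natural numbers) to values.
as_function = dict(enumerate(toppings, start=1))


def is_list_function(mapping):
    """True if the domain is exactly 1..n, i.e. the mapping describes a list."""
    return set(mapping) == set(range(1, len(mapping) + 1))


def to_list(mapping):
    """Recover the list by applying the function to 1, 2, ..., n in turn."""
    return [mapping[i] for i in range(1, len(mapping) + 1)]


print(as_function)
print(is_list_function(as_function))
print(to_list(as_function))
```

A mapping whose domain has a gap, such as {1: 'a', 3: 'b'}, is still a function but not a list in this sense.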

> > There are even more worms in this can, I think, but
> > I would appreciate your comments on this one.
[quoted text clipped - 5 lines]
> > > (I can also believe that MV handles them quite well, although I
> > > have no direct experience to that end.)

sometimes yes, sometimes no.

> > > Does anyone know of a "list calculus" or "list algebra" with a
> > > formal definition? It is just too simple for anyone to have
[quoted text clipped - 11 lines]
> Frighteningly, if I can't find something like that, I may have
> to do it myself.

I look forward to it!  --dawn
Anthony W. Youngman - 10 Jul 2004 00:35 GMT
>> Let's put a new element 'N' into
>> the list, right after the 'B':
[quoted text clipped - 6 lines]
>
>It's UC2.

Why? (Okay, I do agree with you. But why?) I think it's because your
ordering column is actually "list position" which by definition must be
a sequential integer. But that column could equally be defined as
"relative position" in which case UC1 would be fine.
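
The difference between "list position" and "relative position" can be sketched in Python as below. The definitions of UC1 and UC2 are clipped in this archive, so the sketch does not try to reproduce them; it only contrasts the two kinds of ordering column. The function names and the use of Fraction for relative positions are assumptions for illustration.

```python
from fractions import Fraction


def insert_sequential(rows, after_value, new_value):
    """rows: (position, value) pairs with positions 1..n.
    Returns (new_rows, number_of_existing_rows_updated)."""
    pos = next(p for p, v in rows if v == after_value)
    new_rows, updated = [], 0
    for p, v in rows:
        if p > pos:
            new_rows.append((p + 1, v))   # every later row must be renumbered
            updated += 1
        else:
            new_rows.append((p, v))
    new_rows.append((pos + 1, new_value))
    return sorted(new_rows), updated


def insert_relative(rows, after_value, new_value):
    """rows: (position, value) pairs where only the relative order matters.
    The new row takes a position between its neighbours; nothing else moves."""
    ordered = sorted(rows)
    idx = next(i for i, (p, v) in enumerate(ordered) if v == after_value)
    lo = ordered[idx][0]
    hi = ordered[idx + 1][0] if idx + 1 < len(ordered) else lo + 2
    return sorted(ordered + [((lo + hi) / 2, new_value)]), 0


seq_rows = [(1, "A"), (2, "B"), (3, "C"), (4, "D")]
rel_rows = [(Fraction(n), v) for n, v in seq_rows]

seq_result, seq_updates = insert_sequential(seq_rows, "B", "N")
rel_result, rel_updates = insert_relative(rel_rows, "B", "N")

print([v for _, v in seq_result], seq_updates)
print([v for _, v in rel_result], rel_updates)
```

Both end up with the same visible order; the sequential version had to touch the rows after the insertion point, the relative version did not.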

>> In the case of UC2 every insert causes updates to
>> all elements that should be after the new element.
[quoted text clipped - 12 lines]
>and *not by any logical component of the data* is just way
>too hard.

Is it "way too hard" or is it simply because the underlying relational
model has no concept of order? You cannot manipulate something that has
no conceptual existence.

>"Lists are very common, and SQL doesn't handle them well at all.
>RM doesn't handle them well either."

:-)
>
>Let me expand on what I mean by "handle," and to do so,
>I'll use the "structure, integrity, manipulation" definition of
>a DBMS.

Of a DBMS, or the relational version of that definition?

Cheers,
Wol

Marshall Spight - 10 Jul 2004 02:41 GMT
> >> Let's put a new element 'N' into
> >> the list, right after the 'B':
[quoted text clipped - 10 lines]
> ordering column is actually "list position" which by definition must be
> a sequential integer.

Exactly.

> >> In the case of UC2 every insert causes updates to
> >> all elements that should be after the new element.
[quoted text clipped - 15 lines]
> Is it "way too hard" or is it simply because the underlying relational
> model has no concept of order?

It's too hard because SQL doesn't have primitives that make it easy.
If it did, it would be easy.

> >"Lists are very common, and SQL doesn't handle them well at all.
> >RM doesn't handle them well either."
>
> :-)

I figured we'd agree on that point.

> >Let me expand on what I mean by "handle," and to do so,
> >I'll use the "structure, integrity, manipulation" definition of
> >a DBMS.
>
> Of a DBMS, or the relational version of that definition?

It's a definition of DBMS; it's independent of whether the
DBMS is relational or not.

Marshall
Jan Hidders - 01 Jul 2004 01:59 GMT
> Does anyone know of a "list calculus" or "list algebra" with a
> formal definition? It is just too simple for anyone to have
> cared about?

Yes, it exists, and, no, it is not too simple for anyone to have cared
about. There are in fact lots of them. The most interesting ones are
based on the comprehension syntax.

  Comprehension syntax.
  Peter Buneman, Leonid Libkin, Dan Suciu, Val Tannen and Limsoon Wong.
  SIGMOD Record, 23 (1994), 87-96.

Related to those are the ones based on monads.

  Comprehending monads
  Philip Wadler. Mathematical Structures in Computer Science, Special
  issue of selected papers from 6th Conference on Lisp and Functional
  Programming, 2:461-493, 1992.

The core of XQuery is to some extent based on that. With Google and
citeseer you should be able to find on-line versions. There are many
many more papers on this, but these should set you in the right direction.
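
For a rough feel of what comprehension syntax looks like over lists, here is a Python sketch. It is not taken from the cited papers; Python's list comprehensions are simply one familiar descendant of the notation, and the names unit and bind are the conventional monad vocabulary applied to lists as an assumption of this example.

```python
orders = [("ham", 2), ("cheese", 1), ("olive", 3)]

# Comprehension: select and project over a list, preserving its order.
big = [name for name, qty in orders if qty >= 2]


def unit(x):
    """Wrap a single value as a one-element list."""
    return [x]


def bind(xs, f):
    """Apply f (which returns a list) to each element and concatenate."""
    return [y for x in xs for y in f(x)]


# The same query written with unit/bind instead of comprehension syntax.
same = bind(orders, lambda o: unit(o[0]) if o[1] >= 2 else [])

print(big)
print(same)
```

The point of the monad view is that the comprehension is just sugar for unit and bind, so the same notation can be reused for sets and bags by swapping those two operations.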

If you are into that sort of thing you might want to read about calculi
for even more general data structures such as pomsets (partially ordered
multisets, a generalization of sets, bags and lists):

 An Algebra for Pomsets
 Stéphane Grumbach and Tova Milo
 Proceedings of the 5th International Conference on Database Theory
 191-207, 1995

Happy reading,

-- Jan Hidders
Anthony W. Youngman - 10 Jul 2004 00:26 GMT
>> Note that it's easy to go from a list to either of the other two. But in
>> order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 6 lines]
>where [] denotes an ordered collection and {} denotes an unordered
>collection.

Well, you have just created a field called ORDER, and created the values
1, 2 and 3.

The point is that the data was NOT there initially - it was implicit. If
something is implicit, then it quite clearly does not have existence in
its own right.

>Going back to the list from the information-preserving set is not that hard.

Correct. But surely it's better to throw away the implicit ordering if
it's unnecessary at the point of use, than to suddenly discover that it
was necessary but that it's been thrown away by accident ... :-)

Cheers,
Wol

Marshall Spight - 10 Jul 2004 02:37 GMT
> >> Note that it's easy to go from a list to either of the other two. But in
> >> order to go back, the set or bag needs to contain extra data (ie the
[quoted text clipped - 8 lines]
>
> Well, you have just created a field called ORDER,

Yes.

> and created the values 1, 2 and 3.

No, they were there already.

> The point is that the data was NOT there initially - it was implicit.

It sounds like what you're saying is that information encoded as
order is "not there." This is incorrect; order encodes information.

> If something is implicit, then it quite clearly does not have existence
> in its own right.

Strongly disagree. If it didn't exist, then it couldn't carry any information.

> >Going back to the list from the information-preserving set is not that hard.
>
> Correct. But surely it's better to throw away the implicit ordering if
> it's unnecessary at the point of use, than to suddenly discover that it
> was necessary but that it's been thrown away by accident ... :-)

If you can't tell what information is necessary and what isn't, you're
not going to be able to manage that information anyway.

Also, there's no reason why the list [A, B, C] and the set
{ (1, A), (2, B), (3, C) } can't have the exact same in-memory
representation.
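
The round trip between the list and the information-preserving set can be sketched in Python as follows. The function names are made up for this example.

```python
def list_to_set(items):
    """[A, B, C] -> {(1, A), (2, B), (3, C)}: the position becomes explicit."""
    return {(i, x) for i, x in enumerate(items, start=1)}


def set_to_list(pairs):
    """Sort by the explicit position and drop it again."""
    return [x for _, x in sorted(pairs, key=lambda p: p[0])]


original = ["A", "B", "C"]
as_set = list_to_set(original)
round_trip = set_to_list(as_set)

# Dropping the position instead loses both order and duplicates.
lossy = set("abca")

print(as_set == {(1, "A"), (2, "B"), (3, "C")})
print(round_trip == original)
print(len(lossy))
```

Nothing is added or lost in either direction, which is the sense in which the positions 1, 2 and 3 were already encoded by the order.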

Marshall
mAsterdam - 10 Jun 2004 21:16 GMT
>>Roman numerals still exist. They work quite well in some contexts.
>>Besides, there is tradition.
>>Do you know what the QWERTY keyboard was designed for?

> I was told once it was to keep the mechanical hammers attached to the keys
> from hitting each other, so they needed to put keys you would likely hit one
> after the other so they were not close together.

IOW: It was designed to *slow* the typing *down*.
Alternative keyboards have been designed, built, and
tested to significantly ( > 2x) increase the speed of
typing. E.g. velotype (http://www.velotype.com/index.html)
is used in specialised applications, but it never
really caught on. People tend to like and cling on to
what is there and works.

>>>... an invoice is just one of these things, but
>>> the data from the invoice is also available
[quoted text clipped - 9 lines]
>>>you can "get to" from there (via
>>>declared links as one might have in a join statement).

Phew ;-) So a portal is similar to a view.
What is the difference?

[snip]

> I'm still not saying this both accurately and clearly.  I'll think about it
> some more.  There is no problem paying one line item from an invoice and I'm
[quoted text clipped - 4 lines]
> to come on multiple sheets of paper so you could retrieve the one piece of
> paper related to this line item and check it off that way?

No, I would prefer to have one listing - whether on paper or on my
handheld pen-computer. So - I need to be able to treat the whole invoice
as one thing, and I need to be able to treat the invoice as being
composed of items. So the model of the invoice should be designed
to cater for both needs. Though earlier posts suggested that the
chopping up is somehow a typical relational, bad 1NF thing to do,
I suspect that it is rather easy to do, both in MV and in an RDBMS.

Now, I also want one listing for the measurements of stock turnaround,
in order to aim for just-in-time logistics and optimally sized orders.
In an RDBMS I would create another view on the same schema. Would, in
MV, another portal be the way to approach this?
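
On the RDBMS side, the "another view on the same schema" idea might look like the sketch below, using Python's sqlite3 module with an in-memory database. The table, column and view names are hypothetical, and the turnover view is deliberately simplistic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoice (
    invoice_no INTEGER PRIMARY KEY,
    customer   TEXT NOT NULL
);
CREATE TABLE invoice_item (
    invoice_no INTEGER NOT NULL REFERENCES invoice(invoice_no),
    line_no    INTEGER NOT NULL,
    product    TEXT NOT NULL,
    qty        INTEGER NOT NULL,
    PRIMARY KEY (invoice_no, line_no)
);

-- View 1: the invoice as one listing, header plus items.
CREATE VIEW invoice_listing AS
    SELECT i.invoice_no, i.customer, it.line_no, it.product, it.qty
    FROM invoice i JOIN invoice_item it ON it.invoice_no = i.invoice_no;

-- View 2: the same base data, arranged for stock measurements.
CREATE VIEW product_turnover AS
    SELECT product, SUM(qty) AS total_qty
    FROM invoice_item GROUP BY product;

INSERT INTO invoice VALUES (1, 'Acme'), (2, 'Bolt');
INSERT INTO invoice_item VALUES
    (1, 1, 'widget', 5), (1, 2, 'gear', 2), (2, 1, 'widget', 3);
""")

listing = conn.execute(
    "SELECT line_no, product, qty FROM invoice_listing "
    "WHERE invoice_no = 1 ORDER BY line_no"
).fetchall()
turnover = conn.execute(
    "SELECT product, total_qty FROM product_turnover ORDER BY product"
).fetchall()

print(listing)
print(turnover)
```

Neither view is privileged: the invoice listing and the turnover listing are both derived from the same two base tables.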

>>The first department to get a database wins.
>>The rest has to jiggle their stuff into the imposed hierarchy.
[quoted text clipped - 5 lines]
> adding files, fields, functions, but it works just fine and again I'll have
> to think of how to make that perfectly clear.

Another poster already asked an ownership question, so I won't go into
that here. "see information that Dept #1 maintains." gives me an uneasy
feeling, though.

> As Wol has said, you can take any PICK database and view it as relational,
> but you can't go the other way around.  

And you will have seen that I asked him a question about that.

> If you could, then this discussion
> would be moot -- we could just toggle between different perspectives on the
[quoted text clipped - 12 lines]
> to put two classification codes on this entity, you do so.  Overly
> simplified example, but ...

In two ways. The straight-jacket feeling this gives comes from
oversimplification in the relational design.  :-)
But it is a realistic example, I think.
Eric Kaun - 10 Jun 2004 20:48 GMT
> > I am all for lowering this cost - decreasing the "impedance mismatch", so to
[quoted text clipped - 8 lines]
> buck by using relational theory, that would be a different story.  I'd
> strongly suggest we nudge relational databases toward pragmatism ;-)

Several things here:
1. I doubt "overwhelmingly good evidence" motivated people to pick up Pick
(or any other technology)
2. People tend to use that with which they're familiar
3. The market doesn't necessarily guarantee anything (and no, I'm not
anti-capitalism) about what it produces
4. The gadgeteering which makes software development fun also produces a
tendency to wallow in what you know, and in what seems "cool", orthogonal to
any actual value
5. Finally, the criteria businesses have for their solutions also factor in
learning ability, prevalence of those taught in technology X, etc.

> > Agreed - however, while my experience comes from a large company, it's work
> > done for a relatively small business unit. I was the only developer on
> > several of the projects, and my user base was fairly small. I was DBA,
[quoted text clipped - 6 lines]
> way it was sent?  OR was it the loosey-gooseyness of it where there are not
> as many texts with rules for "how to"?

Good questions.

1. The XML was mapped into Java objects (actually object graphs), which
turns out to be trivial.
2. However: that requires lots and lots of redundant and overlapping methods
to query that object graph (e.g. I want to find a certification with a
destination URL of "http:blah", so I write a method, then later I need to
find one by ID, etc. etc.). I can get around it (now) using
expression-parsing libraries (JXPath, Jelly, others). Still, those aren't
type-checked, which gives them some agility but gives me less comfort (I've
seen how easy it is for them to go wrong type-wise)
3. Doing "agile XML", at least here, resulted in multiple files with
overlapping attributes. True, some of that was just bad design, but from
what I've seen here, those apps using Oracle are more disciplined. Maybe
because they have to be? Are just encouraged to be? Not sure...
4. XML's type system is impoverished - some validation is easy, but no
constraints
5. We use libraries for XML, including Castor to generate Java objects, so
the how-to isn't lacking in this case.
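
Point 2 above, sketched in Python rather than Java for brevity. The Certification class, its fields and the find helper are hypothetical stand-ins, not the actual objects described in the post. One generic query replaces a family of find-by-X methods, at the cost that a misspelled attribute name is only caught at run time, which is the same trade-off mentioned for the expression-parsing libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certification:
    cert_id: int
    destination_url: str


def find(items, **criteria):
    """Return the items whose attributes equal every given criterion."""
    return [
        it for it in items
        if all(getattr(it, name) == value for name, value in criteria.items())
    ]


certs = [
    Certification(1, "http://example.invalid/a"),
    Certification(2, "http://example.invalid/b"),
]

by_id = find(certs, cert_id=2)
by_url = find(certs, destination_url="http://example.invalid/a")

# A wrong attribute name is not rejected up front; it fails only when run.
try:
    find(certs, destination="http://example.invalid/a")
    misspelling_caught = False
except AttributeError:
    misspelling_caught = True

print(by_id, by_url, misspelling_caught)
```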

> > I've never used Pick -
> > sounds like their environment gives them a lot of power, and while that's
> > nice, I'd still never think of thinking of an invoice as a single
> > proposition or "object". It's not.
>
> Perhaps you've never seen one?  ;-)

My wife handles the finances, since she's damn good at it. And all of the
invoices I "see" are so... flat. Two-dimensional. Surely ripe for
relational? :-)

> > It's a fairly complex series of them.
>
> That too, but through how many portals would you want to have to go to
> collect all such?  This has to do with how the "user" (application developer
> or dba, for example) should view the data.

Oh, I agree the user should have few portals. But application developers
want to see the messy back-room (to extend the department-store metaphor).
Or more accurately, developers are like the store managers who map many
different suppliers' products into departments, clusters, and shelves.

> > Just like an "order", an invoice is a fairly complex confluence of
> > phenomena, and not even a static one (modifications / confirmations to
[quoted text clipped - 11 lines]
> dependent on any other entities in the system.  What is that top level of
> nodes after ENTITY in a system, such as PEOPLE PLACES THINGS.

Ah, I see. Yes, I agree that those drive UIs, reports, etc. - at least for a
while. I focus on those technologies that will make that part easy, AND give
me some assurance in their consistency and that I can drive more complex
requirements easily. And those complex ones always arise quickly, I've
found... if I've oversimplified early (and I've done the entity/object style
of design before), I usually regret it. Sometimes that's warranted, if
time-to-market is the critical success factor.

> > And I disagree. An invoice is many somethings. If your questions deal only
> > with the set (e.g. presenting an invoice on a screen), then great - treat
[quoted text clipped - 11 lines]
> accessed.  Each portal can see everything you can "get to" from there (via
> declared links as one might have in a join statement).

I think we're on the same page - I just think (based on comparison with
other things) that relational makes the best logical support for a
multi-portal system. And if you think about it, those portals can be small
and nested... UIs inside other UIs, etc. - whatever the user needs to get
the job done. Since those portals start to look like mini-apps, that makes
their common logical foundation all the more important.

> > So it depends on your needs, but I'd far
> > rather place my bet on something that allows me to scale my queries and
[quoted text clipped - 6 lines]
> fields (derived data or data found elsewhere), the INVOICE vocabulary for
> everyone has what it needs to show an invoice.

And I think I'm seeing more and more value to a path-like / hierarchical
expression as a user tool. I see it as best layered atop relational, since I
anticipate more views (if my data is useful, and I'm trying to help the
business's departments interoperate) but I think we agree philosophically
with the notion of packaging for the user.

> > > It is an object in and of itself that
> > > needs no "chopping up", so to speak.
[quoted text clipped - 11 lines]
> it's base relation.  Remove that obstacle -- free yourself.  Yes, we still
> divide it all up, but into wholes, not pieces.

I agree, and didn't mean to give the impression that data should only be
accessed through base relations. Far from it. Relations are a necessary (to
me) but not sufficient condition for good application design.

> > Domains are intellectually tractable when
> > they're separated. Holism may be fine in medicine (???) where human
[quoted text clipped - 5 lines]
> quite far for that, even if you allow for both scalar values and compound
> ones (such as lists).

For users, yes, lists are useful (I'd argue that sets are more often, and
that relations are even better, but I'll lighten up on that). The other
linchpin of relational, of course, is types. I distrust technologies with
weak typing, but that's a different discussion; suffice it to say that
having a LINE_ITEMS attribute in a file would make me far less queasy if the
elements of that list were real objects, with real operations defined over
them.

> > > This is where simpler means don't destroy the properties of the invoice in
[quoted text clipped - 10 lines]
> No, the data needs to be available to other entities as well, as you pointed
> out.

Sure, I was being facetious - so there are 2 questions:
1. What is the nature of the "other entities" that will need to use the
data?
2. In what form does the data need to be to provide those entities with easy
access; and even to make those entities easy to develop?

I see those entities as applications (including GUIs and reports and batch
processes), and contend that relational is the best answer for #2. But
hierarchies are useful for #1. There is still an impedance mismatch, though
it is much more tractable at this level than with object-relational mappings.

> > "Making the data fit" is also nonsense; whatever physical and logical
> model
[quoted text clipped - 13 lines]
> "this" context.  Define it based on its use and if a new use comes up,
> redefine it if necessary, otherwise add qualifiers to it.

Hmmm. Okay, I'm all for agility where it makes sense - still, I think a
little extra work up front goes a long way. But if you've got your
DB-upgrade and redeployment processes automated, and unit tests and all,
this can work...

> > That can do what - model arbitrary data in its "natural form", whatever that
> > means? I agree. If you show that to me, I'll use it.
>
> as entities.  Still working on how to show it.

I'm getting the idea.

> > I hope so - that would be nice. I think XPath and XQuery, while convoluted,
[quoted text clipped - 5 lines]
> add anything to the information in your source, but when you go the other
> direction, you need to add data (such as ordering)?

Most of the XML I deal with requires no ordering, so that's a wash. I think
XML is a relatively poor notation for anything requiring explicit ordering,
but that's just my gut feel. Usually I find hints that the XML designer
really wanted relational; they've got IDs and IDREFs, and then in the code
they're manually coding searches through the hierarchy - which is where an
in-memory RDBMS would be nice. It's not so horrid now, but in this industry
(print industry), there's a standard called JDF that is currently
manifesting itself as a 1.36MB set of XML Schema specs. Needless to say,
there are LOTS of cross-links, and regardless of the storage technology,
relations would have helped break this down considerably... even with the
ordering requirements (which are there, but much less than the cross-linked
references to node IDs).
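
A sketch of the ID/IDREF pattern in Python, using the standard xml.etree.ElementTree module. The XML here is invented for the example and is not real JDF; the element and attribute names are assumptions. Instead of hand-coding a search through the hierarchy for every reference, the IDs are indexed once and each reference becomes a lookup, which is roughly what a join against a keyed relation would give.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<job>
  <resources>
    <paper id="r1" name="A4 gloss"/>
    <ink id="r2" name="cyan"/>
  </resources>
  <steps>
    <step name="print" uses="r1"/>
    <step name="print" uses="r2"/>
    <step name="trim" uses="r1"/>
  </steps>
</job>
""")

# Index every element that carries an id, wherever it sits in the tree.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}

# Resolve each step's reference with a lookup instead of a tree search.
usage = [
    (step.get("name"), by_id[step.get("uses")].get("name"))
    for step in doc.iter("step")
]

print(usage)
```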

- erk
Laconic2 - 11 Jun 2004 11:59 GMT
> 1. I doubt "overwhelmingly good evidence" motivated people to pick up Pick
> (or any other technology)

I'll bet it was the path of least resistance.  My guess, from Dawn's
description, is that it's real easy to learn, and it puts together, in one
package,  the tools needed to store and retrieve data,  and the tools needed
to capture and present data.  Add in some fairly trivial computing
capability, and you've got a pretty powerful system...  regardless of the
data model.

> 2. People tend to use that with which they're familiar

And they become familiar with that which they use.  It's feedback.

> 3. The market doesn't necessarily guarantee anything (and no, I'm not
> anti-capitalism) about what it produces

If nobody buys, then it doesn't sell.  No invoices, no cash.  That's more a
guarantee of apparent value than real value.  But measuring real value is
extraordinarily difficult.

> 4. The gadgeteering which makes software development fun also produces a
> tendency to wallow in what you know, and in what seems "cool", orthogonal to
> any actual value

To a kid  with a hammer, "normalization"  means "flattening".

> 5. Finally, the criteria businesses have for their solutions also factors in
> learning ability, prevalence of those taught in technology X, etc.

Good point.
Dawn M. Wolthuis - 13 Jun 2004 04:30 GMT
<snip>
> My wife handles the finances, since she's damn good at it. And all of the
> invoices I "see" are so... flat. Two-dimensional. Surely ripe for
> relational? :-)

doubt it (not doubting the wife's skills) -- I'm guessing at least some have
header info and multiple line items ... ?

> > > It's a fairly complex series of them.
> >
[quoted text clipped - 7 lines]
> Or more accurately, developers are like the store managers who map many
> different suppliers' products into departments, clusters, and shelves.

Sure, the developers need to know the information required to set up the
store into nice tidy departments.

> > > Just like an "order", an invoice is a fairly complex confluence of
> > > phenomena, and not even a static one (modifications / confirmations to
[quoted text clipped - 19 lines]
> of design before), I usually regret it. Sometimes that's warranted, if
> time-to-market is the critical success factor.

I very much agree, but seem to arrive at a different conclusion on how best
to set up for handling these sudden changes to requirements.

> > > And I disagree. An invoice is many somethings. If your questions deal only
[quoted text clipped - 21 lines]
> the job done. Since those portals start to look like mini-apps, that makes
> their common logical foundation all the more important.

More mind meld here -- similar thought processes drawing different
conclusions.

> > > So it depends on your needs, but I'd far
> > > rather place my bet on something that allows me to scale my queries and
[quoted text clipped - 15 lines]
> business's departments interoperate) but I think we agree philosophically
> with the notion of packaging for the user.

OK, now read what the purpose of the relational model is (somewhere towards
the front of Date's latest edition of the textbook).  If "the user" (whether
a s/w developer or an end-user) can work with data thinking entirely in this
walk-our-way-through-the-vocabulary fashion for queries of any sort, then
what, again, was the need for the relational model in this?  You are correct,
however, when you asked somewhere whether one can update through these
portals -- not really, but it works for managers & high level designers,
making anything more an implementation detail ;-)

> > > > It is an object in and of itself that
> > > > needs no "chopping up", so to speak.
[quoted text clipped - 39 lines]
> elements of that list were real objects, with real operations defined over
> them.

How and where what rules/constraints are applied to the data is one of those
topics where I'm not yet where I want to be in understanding various options
and how they influence agility/maintainability.  So, I can sympathize but I
can't get too upset about descriptions of the data that go further than the
constraints that are applied to it (that might not have made sense to anyone
but me, so ignore if it didn't).

> > > > This is where simpler means don't destroy the properties of the invoice
[quoted text clipped - 18 lines]
> 1. What is the nature of the "other entities" that will need to use the
> data?

We will know in time.

> 2. In what form does the data need to be to provide those entities with easy
> access; and even to make those entities easy to develop?

We will know in time.

But we definitely should think about what are the most likely changes on the
horizon and what our strategy would be for each of those.  I'm not
completely in the XP camp where we only think about the requirements
(stories) for this iteration of development and worry about tomorrow,
tomorrow.

> I see those entities as applications (including GUIs and reports and batch
> processes), and contend that relational is the best answer for #2. But
> hierarchies are useful for #1. The impedance mismatch, though much more
> tractable at this level than object-relational mappings.

I hope to eventually agree with you on the best approach to #2.  That is not
the same statement as saying that I hope to eventually agree with your
current opinion on the matter.

> > > "Making the data fit" is also nonsense; whatever physical and logical
> > model
[quoted text clipped - 21 lines]
> DB-upgrade and redeployment processes automated, and unit tests and all,
> this can work...

Yes, I agree that identifying such potential risks is a good idea, but not
in terms of the semantics of the data required for this round, but rather
the likelihood of various possible new requirements.
<snip>

Cheers!  --dawn
Eric Kaun - 15 Jun 2004 16:40 GMT
> > And I think I'm seeing more and more value to a path-like / hierarchical
> > expression as a user tool. I see it as best layered atop relational, since I
> > anticipate more views (if my data is useful, and I'm trying to help the
> > business's departments interoperate) but I think we agree philosophically
> > with the notion of packaging for the user.
>
> OK, now read what the purpose of the relational model is (somewhere towards
> the front of Date's latest edition of the textbook).

Don't have it here, but if it suggests making data easy for end-users,
that's a myth. Its creators said the same thing about COBOL. Yes there are
users who can understand it, but most users have other considerations and
want to focus on their tasks. There are always power-users who write reports
and queries; they can grok relational, in many cases, and can compose views.

> If "the user" (whether
> a s/w developer or an end-user) can work with data thinking entirely in this
> walk-our-way-through-the-vocabulary fashion for queries of any sort, then
> what, again was the need for the relational model in this?

In the parlance of Pick, to establish the various vocabularies for various
users. To act as its foundation. The minute I have more than one view of the
data, I need the model to be neutral (or else you get something like a
pipeline of XSLT transforms, which is declarative but easily made
intractable). In the case of Pick, though, there's still an implicitly
"right" view of the FILE - all of its attributes in gory detail, I assume.
That still assumes a hierarchy, meaning other views are relative to that.
Basing all the views on a predicate-based structure makes those easier;
granted that it gives none of them special status, but I view that as a good
thing.

> > Sure, I was being facetious - so there are 2 questions:
> > 1. What is the nature of the "other entities" that will need to use the
[quoted text clipped - 13 lines]
> (stories) for this iteration of development and worry about tomorrow,
> tomorrow.

I agree. I think agile arose because big up-front documents suck, and that's
still true. What's never been really pushed outside academia are real
models, checkable ones, and theorem-proving. Alloy is a nice tool in the
model-checking camp; and check out how it subsumes OO-like structures inside
relations! In any event, if our designs were concise and checkable, and even
used as the basis of code generation, we get correctness and agility in one.
But given languages like Java and platforms like the J2EE, while code
generation helps, agile is still important. I'm in the Martin Fowler camp,
not quite trusting the complete absence of up-front analysis. Like Reagan, I
think the agilists talk much tougher than they walk; I think you'd find far
more up-front analysis and design than they admit in rhetoric.

> > I see those entities as applications (including GUIs and reports and batch
> > processes), and contend that relational is the best answer for #2. But
[quoted text clipped - 4 lines]
> the same statement as saying that I hope to eventually agree with your
> current opinion on the matter.

Hey, that's fine... we're just discussing, not brainwashing...

- erk
Anthony W. Youngman - 07 Jun 2004 22:48 GMT
>> When it comes to modeling
>> information, I suspect there will always be a gap. Relational advocates
>> favor being able to derive truths from other truths, acknowledging of course
>> that the internal predicates must be defined relative to an external one,
>> and that that's a human effort which can always go awry.

Yep. I have no problem with that. It's just that the whole point of this
thread was me asking "what is that external one", and I still haven't
got an answer. What I *have* got, though, is loads of people having a go
at me for having the temerity to ask the question ...

>> You and Dawn, as
>> best I can understand, place more value on reproduction of the original
[quoted text clipped - 4 lines]
>usually has multiple items ordered.  It is an object in and of itself that
>needs no "chopping up", so to speak.

Yep. I think you're certainly speaking for me here.

>This is where simpler means don't destroy the properties of the invoice in
>order to make the data fit into an arbitrary data model with tautological
[quoted text clipped - 5 lines]
>at
>> (e.g. repetitive symbolic manipulation).

Except that relational theory DOES stretch humans in ways they're not
good at. Didn't Tony have a go at me for "can't you handle the
abstraction?". Why should I, when the MV model tells me I don't have to?

>I think you're right here.  I've been in business for many years.  I would like
>development to be easy for me.  We can watch the pendulum swinging towards
>making software development easier for those of us using the software.
>.NET, for better or worse, is attempting to make development easier (if it
>wasn't for the bizarre data typing and variable scoping it would be a lot
>easier).  Hopefully dbms theory will contribute to this too.

I want development to be easy, too :-)

Cheers,
Wol

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
The society which scorns excellence in plumbing because plumbing is a humble
activity, and tolerates shoddiness in philosophy because it is an exalted
activity, will have neither good plumbing nor good philosophy. Neither its
pipes nor its theories will hold water. John W Gardner

mAsterdam - 08 Jun 2004 00:49 GMT
> ... What I *have* got, though, is loads of people having a go
> at me for having the temerity to ask the question ...

You asked a good question, phrased in a relational-bashing way.
I rephrased it. You went on.

> I want development to be easy, too :-)

Moi aussi.
Tony - 02 Jun 2004 10:43 GMT
> And no, Tony, Einstein did NOT "build a better model" using the same
> algebra. What he DID do was realise that Newton's fundamental axioms
> were wrong.

Amounts to the same thing: Einstein developed equations that took into
account additional information not known to Newton.  These equations
gave correct results for more cases than Newton's.  This is much like
a database designer realising that an earlier designer's database had
some rules missing, and adding them.

> He redefined the metaphysical interface between reality and the model.

Baloney.  His model was different from Newton's.  It wasn't the same
model with a different "metaphysical interface".  What the hell is one
of those anyway?

> And the problem I have is that I cannot see any metaphysical interface
> between reality and relational theory. This is basically Dawn's point
> about "is relational theory even the right theory to use?".

You are looking for a nonsense.  No wonder you can't find it.
Eric Kaun - 02 Jun 2004 16:30 GMT
> And the problem I have is that I cannot see any metaphysical interface
> between reality and relational theory. This is basically Dawn's point
[quoted text clipped - 8 lines]
> improve it. BUT IN DOING SO, IT WILL BE TRANSFORMED BEYOND RECOGNITION
> :-)

So what improvements would you make? From what I've heard suggested
elsewhere, it's not a transformation beyond recognition.

> What we NEED is a "theory of business analysis" - a formal theory that
> tells analysts how to analyse the real world.

hahahahahahahaha

Oh... you're serious?

> And I'm pretty damn
> confident that you can NOT create a theory that will do a reversible
> mapping between the real world and relational data.

So what precisely is different about other theories of data that do allow a
reversible mapping? And are there properties other than reversibility that
are desirable in such a model?

> This theory will then be the equivalent of Kepler and Newton discovering
> ellipses and calculus, or of Einstein realising that mass and energy
> were interchangeable. Basically, pretty much ALL of relational theory's
> axioms are taken as given by the mathematicians, and no thought is given
> as to whether they actually match the real world.

Which axioms don't match? I wasn't really aware there were axioms per se.

> To give you a simple example, the business analyst analyses an invoice,
> and you design the database to store the data. Can you then ask the
> DATABASE to give you the invoice data back?

Sure.

> Certainly with current
> relational databases accessed with SQL, you're relying on either an
> application programmed OVER the database, or a view which gives you
> multiple copies of data of which the original only had one.

Huh?

> Yes I know people are likely to say that "SQL is not genuine
> relational", but you're still relying on a view - even a valid
> relational one - or an application.

So what do you want - the invoice paper? Maybe we should just rely on
scanners producing JPGs - non-lossy, of course.

> If we can't go - using formal theory - from the database back through
> the analysis to get back to the real world we started from, then we have
> no idea if our axioms are correct, and as Dawn says, we have no idea if
> relational theory is the correct theory to solve real world problems.

Most real-world problems are more than just round-trip regurgitation. Surely
any trivial serialization scheme fits that bill?

> And as I said before, if we have no idea if it's the correct theory, why
> are we using it?

So what do we have that's correct? You mean the round-trip is your litmus
test?

> Dawn was going on about faith. Do you have faith in
> business analysts to get the analysis correct, or would you rather have
> a formal, REVERSIBLE and PROVABLE (or testable, falsifiable, scientific,
> whatever term you want to use) logical theory to do it for you?

Sure. I also want to fly, eat infinite amounts of ice cream without gaining
weight, and drive at very fast speeds with no possibility of injury.

- erk
Dawn M. Wolthuis - 02 Jun 2004 18:12 GMT
> > And the problem I have is that I cannot see any metaphysical interface
> > between reality and relational theory. This is basically Dawn's point
[quoted text clipped - 22 lines]
> > confident that you can NOT create a theory that will do a reversible
> > mapping between the real world and relational data.

I agree and figure that we will have a useful theory of this sort when we
have the same on "how to parent a teenager".  However, in some ways, Wol is
making a similar argument to mountain man (even if they might not agree with
that) in identifying that even if relational theory were good to apply, it
is useful in a rather small portion of what we do in addressing the "data
processing" needs of a business.

> So what precisely is different about other theories of data that do allow a
> reversible mapping? And are there properties other than reversibility that
[quoted text clipped - 7 lines]
>
> Which axioms don't match? I wasn't really aware there were axioms per se.

There are at least the axioms of set theory, and then some things were tossed
into the mix without any proof from these axioms, such as restricting the
sets from which elements can come to sets of scalar values (which has been
changed now, but 1NF, however defined, would have to be considered an axiom
since it does not arise from any other mathematics).

> > To give you a simple example, the business analyst analyses an invoice,
> > and you design the database to store the data. Can you then ask the
[quoted text clipped - 8 lines]
>
> Huh?

I think it is worth noting that it is far more difficult to retrieve an
invoice the way it looked originally after chopping it up (that 1NF thing
again) and then using SQL to show the invoice again.  It is possible,
however, so perhaps Wol has looked at some more difficult specimens.
Loosely stated - SQL can only place on a single line entities that are
related to each other on that one line.  Stick with me here, I know I said
that poorly.

Example:

Qty  Item             Catalogs               Color  Price
1    Beautiful Skirt  Summer Collection      White  $120.00
                      2004 Wardrobe Catalog  Blue

Without arguing the semantics (and mapping of the data to reality) of this
particular example, if your invoice looked like this when selling a
beautiful skirt in white and blue that comes from two of your catalogs, it
is definitely HARDER than a non-1NF environment, though not impossible, to
get a SQL statement to show your invoice properly.
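
A minimal sketch of the skirt example, using Python's standard sqlite3 module. The table and column names (invoice_line, line_catalog, line_color) are invented for illustration and come from nobody's real schema. It shows two things: a plain join over the 1NF tables repeats the line once per catalog/color combination, and getting back to one line with two lists takes extra queries plus some assembly code.

```python
import sqlite3

# Hypothetical 1NF layout for the single invoice line in the example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice_line (line_id INTEGER PRIMARY KEY,
                               qty INTEGER, item TEXT, price TEXT);
    CREATE TABLE line_catalog (line_id INTEGER, catalog TEXT);
    CREATE TABLE line_color   (line_id INTEGER, color TEXT);
    INSERT INTO invoice_line VALUES (1, 1, 'Beautiful Skirt', '120.00');
    INSERT INTO line_catalog VALUES (1, 'Summer Collection');
    INSERT INTO line_catalog VALUES (1, '2004 Wardrobe Catalog');
    INSERT INTO line_color   VALUES (1, 'White');
    INSERT INTO line_color   VALUES (1, 'Blue');
""")

# A straightforward join repeats the line for every catalog/color pair.
flat = conn.execute("""
    SELECT l.qty, l.item, c.catalog, k.color, l.price
    FROM invoice_line l
    JOIN line_catalog c ON c.line_id = l.line_id
    JOIN line_color   k ON k.line_id = l.line_id
""").fetchall()


def invoice_line(conn, line_id):
    """Rebuild one invoice line with its two lists, one query per list."""
    qty, item, price = conn.execute(
        "SELECT qty, item, price FROM invoice_line WHERE line_id = ?",
        (line_id,),
    ).fetchone()
    catalogs = [row[0] for row in conn.execute(
        "SELECT catalog FROM line_catalog WHERE line_id = ? ORDER BY rowid",
        (line_id,))]
    colors = [row[0] for row in conn.execute(
        "SELECT color FROM line_color WHERE line_id = ? ORDER BY rowid",
        (line_id,))]
    return {"qty": qty, "item": item, "catalogs": catalogs,
            "colors": colors, "price": price}


nested = invoice_line(conn, 1)
```

Here `flat` holds four rows (two catalogs times two colors), while `nested` holds the line the way the paper invoice shows it. Whether that extra assembly counts as "retrieval" or "presentation" is exactly what the thread goes on to argue about.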

> > Yes I know people are likely to say that "SQL is not genuine
> > relational", but you're still relying on a view - even a valid
> > relational one - or an application.
>
> So what do you want - the invoice paper? Maybe we should just rely on
> scanners producing JPGs - non-lossy, of course.

No need -- including lists in your data (at least your virtual data!) gets
you far enough that you don't notice any more big disconnects.  SQL Server
permits lists in their UDFs, while Oracle (to my knowledge) does not allow
lists returned from their functions (stored procedures).

> > If we can't go - using formal theory - from the database back through
> > the analysis to get back to the real world we started from, then we have
[quoted text clipped - 17 lines]
> Sure. I also want to fly, eat infinite amounts of ice cream without gaining
> weight, and drive at very fast speeds with no possibility of injury.

As long as we are all aiming for the same things ...  smiles.  --dawn
mAsterdam - 02 Jun 2004 19:44 GMT
> I think it is worth noting that it is far more difficult to retrieve an
> invoice the way it looked originally after chopping it up

You chopped it up. Why?

While chopping it up, you got rid of the layout.
What you will retrieve is the data, not the layout.
Now if you also have some markup for the abstract invoice,
you can just fit the invoice-data you retrieved into the
invoice-markup.

I would think you would know all this, if
it was not so that over and over you blame

> (that 1NF thing again)

for these non-problems.

> and then using SQL to show the invoice again.

SQL reports are ugly - I would not want to
show one to a customer.
Use a tool that was designed to present data.

> It is possible,
> however, so perhaps Wol has looked at some more difficult specimens.
[quoted text clipped - 15 lines]
> is definitely HARDER than a non-1NF environment, though not impossible, to
> get a SQL statement to show your invoice properly.

Some products have presentation and query integrated.
Some of those use (generated, hidden) SQL for the
query part. Don't use just SQL and expect anything
that looks like a proper invoice.

It is like you expect to be able to prepare a meal by
just unpacking the ingredients - you are going to need
some kitchen tools.

>>Sure. I also want to fly, eat infinite amounts of ice cream without
>>gaining weight, and drive at very fast
>> speeds with no possibility of injury.
>
> As long as we are all aiming for the same things ...  smiles.  --dawn

Sorry if I did not address your problem, but please
try distinguishing the retrieval and presentation
part if you restate it because I did get it all wrong.

Bon appétit!
Dawn M. Wolthuis - 03 Jun 2004 01:56 GMT
> > I think it is worth noting that it is far more difficult to retrieve an
> > invoice the way it looked originally after chopping it up
>
> You chopped it up. Why?

You know the answer, so I'll move on ...

> While chopping it up, you got rid of the layout.

Not JUST the layout, but the ease in retrieving the data required for the
layout.  In 1NF'ing we can make a nightmare for the retrieval process.  And
ease of data retrieval seems to me to be one of the most important
requirements for any DBMS, right (she says, baiting him)?

> What you will retrieve is the data, not the layout.
> Now if you also have some markup for the abstract invoice,
[quoted text clipped - 13 lines]
> show one to a customer.
> Use a tool that was designed to present data.

And what will that tool use?  And so developers should not build reports
directly because they are so ugly?  Why not give them a better language?

> > It is possible,
> > however, so perhaps Wol has looked at some more difficult specimens.
[quoted text clipped - 3 lines]
> >
> > Example:

> > Qty  Item             Catalogs               Color  Price
> > 1    Beautiful Skirt  Summer Collection      White
[quoted text clipped - 11 lines]
> query part. Don't use just SQL and expect anything
> that looks like a proper invoice.

As a must-have requirement, I wouldn't want to invest in any DBMS that
didn't provide a query tool that could be used happily by developers.  Sure
we need security, reliability, and such, but then it is just plain
imperative that the data can be handily retrieved!!

> It is like you expect to be able to prepare a meal by
> just unpacking the ingredients - you are going to need
> some kitchen tools.

Best not to discuss preparing meals with me ;-)  I require tools such as
phone and car.

> >>Sure. I also want to fly, eat infinite amounts of ice cream without
> >>gaining weight, and drive at very fast
[quoted text clipped - 5 lines]
> try distinguishing the retrieval and presentation
> part if you restate it because I did get it all wrong.

Data retrieval -- not presentation, but retrieval -- is an extremely
important feature of a database and pretty close to the whole point of why
you would choose to model the data in one way or another -- so that data
retrieval is easy over time (the "over time" bringing up other failings of
SQL-DBMS's, but I'll skip that for now).

Cheers!  --dawn

> Bon appétit!
mAsterdam - 03 Jun 2004 10:02 GMT
>>>I think it is worth noting that it is far more difficult to retrieve an
>>>invoice the way it looked originally after chopping it up
>>You chopped it up. Why?
> You know the answer, so I'll move on ...

Do I? Now I have to guess.
You could store an image of the original
invoice if you need the original look.

My guess would be: You chopped it up because you
want to do something with the pieces other
than making invoices.
You can store the invoice image anyway.

>>While chopping it up, you got rid of the layout.
>
> Not JUST the layout, but the ease in retrieving
> the data required for the layout.
> In 1NF'ing we can make a nightmare for the retrieval process.

Your query will give you all data your
invoice needs for reconstruction, except the layout.
They'll be - agreed - in a clumsy fashion for presentation.
That is the loss of ease. (Or am I missing something?)

> And ease of data retrieval seems to me to be one of
> the most important requirements for any DBMS,
> right (she says, baiting him)?

/me nods

[snip]

>>Use a tool that was designed to present data.
>
> And what will that tool use?
> And so developers should not build reports
> directly because they are so ugly?
> Why not give them a better language?

Yes! A markup language - or a tool that generates the markup
and the query (for whatever querylanguage/database/normal form).
I haven't seen exploitation of databases without them.

[snip]
>>Some products have presentation and query integrated.
>>Some of those use (generated, hidden) SQL for the
[quoted text clipped - 7 lines]
> but then it is just plain imperative that
> the data can be handily retrieved!!

/me nods again.

>>It is like you expect to be able to prepare a meal by
>>just unpacking the ingredients - you are going to need
>>some kitchen tools.
>
> Best not to discuss preparing meals with me ;-)  
> I require tools such as phone and car.

When using the phone you still need a table, plates,
knives, glasses - or do you accept the layout as it comes :-)

[snip]

> Data retrieval -- not presentation, but retrieval -- is an extremely
> important feature of a database and pretty close to the whole point of why
> you would choose to model the data in one way or another -- so that data
> retrieval is easy over time (the "over time" bringing up other failings of
> SQL-DBMS's, but I'll skip that for now).

Ah! Here is the nugget. I knew there was more to it.
Thank you for restating.

Why is an MPEG not simply a table of pictures? Because we mostly
only need the complete movie. We do not need to share the
parts of the movie. The benefits of generality do not outweigh the cost.
Or even one picture, JPEG: If we *do* need to get into the picture
(I mean we need to retrieve parts of it - think automated fingerprint,
face or signature recognition) we need to model content of the picture
differently, and while a table of pixels may or may not be the basis of
that, it won't get us very far.

>>Bon appétit!
Bill H - 03 Jun 2004 23:03 GMT
> > I think it is worth noting that it is far more difficult to retrieve an
> > invoice the way it looked originally after chopping it up
[quoted text clipped - 6 lines]
> you can just fit the invoice-data you retrieved into the
> invoice-markup.

I find it interesting you should say this.  All RDBMS products I've seen
show data in columns and rows.  In fact, that is the language of RDBMS: rows
and columns.

It is not unusual, therefore, to define and describe data in a preferred
layout?

Bill
mAsterdam - 03 Jun 2004 23:54 GMT
>>>I think it is worth noting that it is far more difficult to retrieve an
>>>invoice the way it looked originally after chopping it up
[quoted text clipped - 13 lines]
> It is not unusual, therefore, to define and describe data in a preferred
> layout?

I don't know about the 'therefore', but in
my experience their preferred layout is something
which domain experts are most comfortable with.

The most important question here, though (the one
Dawn refused to answer) is why do you want to chop it up?
What exactly are you trying to achieve by doing so?
Dawn M. Wolthuis - 04 Jun 2004 00:06 GMT
> >>>I think it is worth noting that it is far more difficult to retrieve an
> >>>invoice the way it looked originally after chopping it up
[quoted text clipped - 21 lines]
> Dawn refused to answer) is why do you want to chop it up?
> What exactly are you trying to achieve by doing so?

Sorry, not  refusal, but even I get sick of my broken record on 1NF --
that's why things are chopped up unnecessarily, in order to put them into
1NF.  So, in the example I gave, there is no reason, in my opinion, not to
have a single line of the invoice be stored in a tuple, allowing the lists
to be elements of the tuple, just as the single-valued attributes are.

--dawn
mAsterdam - 04 Jun 2004 00:27 GMT
>>>>>I think it is worth noting that it is far more difficult to retrieve an
>>>>>invoice the way it looked originally after chopping it up
>>>>
>>>>You chopped it up. Why?
[chop]
> Sorry, not  refusal, but even I get sick of my broken record on 1NF --
> that's why things are chopped up unnecessarily, in order to put them into
> 1NF.  So, in the example I gave, there is no reason, in my opinion, not to
> have a single line of the invoice be stored in a tuple, allowing the lists
> to be elements of the tuple, just as the single-valued attributes are.

So you don't need to share the internal structure.
Don't do that, then.
Dawn M. Wolthuis - 04 Jun 2004 01:05 GMT
> >>>>>I think it is worth noting that it is far more difficult to retrieve an
> >>>>>invoice the way it looked originally after chopping it up
[quoted text clipped - 9 lines]
> So you don't need to share the internal structure.
> Don't do that, then.

My understanding of relational structure is that it is for the logical view
of the database, not the internal structure.  If we opt for something else
as the logical level, then we are not doing relational theory, we are doing
something else (such as Nelson-Pick [un]theory).  There are folks,
particularly those working with XML, who have worked on non-relational theories
of databases and I'm reading what I can of what Jan Hidders suggested
earlier.  But, again, if your data model (logical level) is not relational,
then what's the purpose of relational theory? --dawn
mAsterdam - 04 Jun 2004 16:12 GMT
>>>>>>>I think it is worth noting that it is far more difficult to retrieve an
>>>>>>>invoice the way it looked originally after chopping it up
>>>>>>
>>>>>>You chopped it up. Why?
[chop]
>>So you don't need to share the internal structure.
>>Don't do that, then.
[quoted text clipped - 7 lines]
> earlier.  But, again, if your data model (logical level) is not relational,
> then what's the purpose of relational theory? --dawn

Somehow I get the impression that you put all
blame for the chopping (and the need to re-assemble)
on relational theory, in particular on 1NF.
That is too much blame, I think.

Say we have a date. It has structure, no doubt.
Actually it has a different structure for different purposes.
It has a different structure in different countries. We may only
be interested in one or some parts/aspects of it: day of the week, century.
Now suppose we did not have a system defined type 'date'.
What to do? Are the problems really that different
in the context of relational theory?

Well, a little. COBOL has a powerful way of defining types, with easy
grouping, redefinitions and symbolic values (not without complications,
http://home.swbell.net/mck9/cobol/style/88.html):
the picture clause. Unfortunately it also defines
the storage structure - this makes it fragile.
Eric Kaun - 04 Jun 2004 15:11 GMT
> > >>>I think it is worth noting that it is far more difficult to retrieve an
> > >>>invoice the way it looked originally after chopping it up
[quoted text clipped - 28 lines]
> have a single line of the invoice be stored in a tuple, allowing the lists
> to be elements of the tuple, just as the single-valued attributes are.

What are your criteria for chopping into the following:
1. files
2. attributes
3. sub-attributes
4. sub-sub-attributes

?
Anthony W. Youngman - 07 Jun 2004 23:10 GMT
>> Sorry, not  refusal, but even I get sick of my broken record on 1NF --
>> that's why things are chopped up unnecessarily, in order to put them into
[quoted text clipped - 3 lines]
>
>What are your criteria for chopping into the following:

For me, it's simple.

>1. files

This represents a "physical" object. A house. A car. A company. A
building. An invoice.

>2. attributes

This describes the object. A house has an address. A car has a colour,
and an owner. A company may have several buildings (so here we have a
"foreign key"). A building has an address. An invoice may have several
addresses, and several lines.

>3. sub-attributes

A building has an address - which may have multiple lines (actually,
this is a bad example, but it's a common mistake). An invoice has
multiple lines, each of which contains several different types of data.

>4. sub-sub-attributes

Simply nest sub-attributes one level deeper. :-)

Basically, to describe it in relational terms, if you link table A to
table B, such that deleting a record in A causes a cascading delete of
one or more records in B, then I'd make each column of B a column of A,
and each row of B into a sub-attribute row of A.
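
The cascading-delete rule of thumb can be sketched in Python. This is an analogy with plain dicts and lists, not Pick syntax, and the helper name `fold_children` and the sample invoices are hypothetical. Child rows that would be cascade-deleted with their parent are folded into the parent record as a nested list.

```python
from collections import defaultdict


def fold_children(parents, children, fk, nested_name):
    """Nest each child row inside the parent it belongs to.

    parents:     list of dicts, each with an "id" key
    children:    list of dicts, each carrying the parent id under `fk`
    nested_name: name of the multivalued attribute created on the parent
    """
    owned = defaultdict(list)
    for child in children:
        # The foreign key becomes redundant once the row lives in its parent.
        row = {k: v for k, v in child.items() if k != fk}
        owned[child[fk]].append(row)
    return {p["id"]: {**p, nested_name: owned.get(p["id"], [])}
            for p in parents}


# Hypothetical "table A" (invoices) and "table B" (invoice lines).
invoices = [{"id": 17, "customer": "ACME"}, {"id": 18, "customer": "Bolt Ltd"}]
lines = [
    {"invoice_id": 17, "qty": 1, "item": "Beautiful Skirt"},
    {"invoice_id": 17, "qty": 2, "item": "Plain Scarf"},
    {"invoice_id": 18, "qty": 5, "item": "Plain Scarf"},
]

invoice_file = fold_children(invoices, lines, "invoice_id", "lines")

# Removing the invoice removes its lines with it; no separate cascade runs.
removed = invoice_file.pop(17)
```

The sketch only covers the strict parent-child case described in the post. Rows referenced from more than one parent would still need a foreign key rather than nesting.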

And then you use common sense to say "I use these fields all the time,
and these fields only rarely" so you split A into two physical FILEs,
and make all the columns of A-rarely into virtual columns of
A-all-the-time, and vice versa. So for retrievals the user notices
nothing (apart from the speed-up), although it does cost a bit extra
logic when updating.
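
A rough Python sketch of that split, assuming a made-up `SplitFile` class and invented column names (this is not how Pick implements virtual columns, only an illustration of the trade-off). Reads see one logical record; writes carry the extra routing logic.

```python
class SplitFile:
    """One logical FILE kept as two physical stores: hot and cold columns."""

    def __init__(self, hot_columns):
        self.hot_columns = set(hot_columns)
        self.hot = {}    # columns used all the time
        self.cold = {}   # columns used only rarely

    def write(self, key, record):
        # The extra update logic: every column is routed to its store.
        self.hot[key] = {c: v for c, v in record.items()
                         if c in self.hot_columns}
        self.cold[key] = {c: v for c, v in record.items()
                          if c not in self.hot_columns}

    def read(self, key, columns):
        # Only touch the cold store when a rarely used column is requested.
        row = dict(self.hot[key])
        if any(c not in self.hot_columns for c in columns):
            row.update(self.cold[key])
        return {c: row[c] for c in columns}


customers = SplitFile(hot_columns=["name", "phone"])
customers.write("C1", {"name": "ACME", "phone": "555-0100",
                       "notes": "prefers fax", "founded": 1987})

common = customers.read("C1", ["name", "phone"])   # hot store only
full = customers.read("C1", ["name", "notes"])     # reaches into cold store
```

The caller of `read` never needs to know which store a column lives in, which is the "user notices nothing" part; `write` is where the bookkeeping lands.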

Cheers,
Wol

Eric Kaun - 09 Jun 2004 20:41 GMT
> >1. files
>
> This represents a "physical" object. A house. A car. A company. A
> building. An invoice.

What about relationships between any of those things - e.g. cars and houses
owned by companies, and the invoices for their purchases? At what point does
something move from being a file to an attribute or vice versa?

> >2. attributes
>
> This describes the object. A house has an address. A car has a colour,
> and an owner. A company may have several buildings (so here we have a
> "foreign key"). A building has an address. An invoice may have several
> addresses, and several lines.

So if an invoice has a Parts attribute, it then needs additional attributes
corresponding to each "attribute" of its use of those parts? At what point
does the relationship between the two acquire enough "meaning" or enough
attributes of its own to warrant being in its own file? Imagine line items
on an invoice became very complex, with shipment information and payment
information... is there a point at which you say "Enough!" and stop adding
those things as sub-attributes to the Parts attribute of the Invoice?

> >3. sub-attributes
>
> A building has an address - which may have multiple lines (actually,
> this is a bad example, but it's a common mistake). An invoice has
> multiple lines, each of which contains several different types of data.

Yes, but how far do you go? Certainly at some point some of those attributes
refer to other things properly categorized as Files. Do you find yourself
yanking out attributes or sub-attributes, and moving the lot to Files? At
least to me, normalization offers a much clearer view on how to make those
data design decisions. Maybe I'm being alarmist, but when I've had to make
changes in a SQL data model it's been due to actual changes in the
requirements (external predicates), not just acquiring one attribute too
many. Granted that I don't know Pick, so haven't walked a mile in your
shoes... these criteria just seem very dicey.

> >4. sub-sub-attributes
>
> Simply nest sub-attributes one level deeper. :-)

Ah, hierarchical induction. I'll just have one File in my app. :-)

> Basically, to describe it in relational terms, if you link table A to
> table B, such that deleting a record in A causes a cascading delete of
> one or more records in B, then I'd make each column of B a column of A,
> and each row of B into a sub-attribute row of A.

You're describing parent-child relationships. Surely you run into
multi-parent and many-to-many scenarios?

> And then you use common sense to say "I use these fields all the time,
> and these fields only rarely" so you split A into two physical FILEs,
> and make all the columns of A-rarely into virtual columns of
> A-all-the-time, and vice versa. So for retrievals the user notices
> nothing (apart from the speed-up), although it does cost a bit extra
> logic when updating.

A logical cost is a big cost. Every updating app needs to know that, right?

So you're definitely describing some physical redesign which sits below the
logical view available to users. I think in relational terms, that's what
the DBMS vendors should offer, since they can more accurately (and easily)
split relations based on usage, and have that happen dynamically. But you do
seem to be describing a user- or application-level view of the data, which
is layered atop something that is, or leans toward, or could be "more
relational." At least that's the way it seems...

- erk
Anthony W. Youngman - 10 Jun 2004 01:14 GMT
>> >1. files
>>
[quoted text clipped - 4 lines]
>owned by companies, and the invoices for their purchases? At what point does
>something move from being a file to an attribute or vice versa?

Just because I own a car, doesn't make the car part of me ... think
language, and think nouns and adjectives (and gerunds).

In Britain, a car's registration plate is assigned on first sale, and
"deleted" when the car is crushed. Actually, that's not completely true,
but near enough.

So my car's registration plate is an attribute of me, and of my car. So
in the "car" FILE it would be the primary key, and in the "person" FILE
it would be a foreign key (to use relational terminology). You do not
put two different "nouns" in the same FILE - you use a foreign key.
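
As a small Python illustration of the "one noun per FILE" idea: the registration numbers, names and field layout below are all invented, and the dicts merely stand in for two FILEs. The registration is the key of the car FILE and appears in the person FILE only as a reference.

```python
# Hypothetical "car" FILE, keyed by registration plate.
car_file = {
    "AB51 CDE": {"make": "Mini", "colour": "red"},
    "XY04 ZZZ": {"make": "Land Rover", "colour": "green"},
}

# Hypothetical "person" FILE; "cars" is a multivalued foreign key.
person_file = {
    "P1": {"name": "Alice", "cars": ["AB51 CDE", "XY04 ZZZ"]},
    "P2": {"name": "Bob", "cars": []},
}


def cars_owned(person_id):
    """Follow each registration in the person record into the car FILE."""
    return [car_file[reg] for reg in person_file[person_id]["cars"]]


def owners_of(registration):
    """Reverse lookup: scan the person FILE for this registration."""
    return [p["name"] for p in person_file.values()
            if registration in p["cars"]]
```

The car's own attributes (make, colour) stay in the car FILE; the person record holds nothing about the car except its key.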

>> >2. attributes
>>
[quoted text clipped - 10 lines]
>information... is there a point at which you say "Enough!" and stop adding
>those things as sub-attributes to the Parts attribute of the Invoice?

In theory, no. In practice, you might choose to split the invoice data
across two FILES, where you've promoted sub-attributes of INVOICE to be
primary attributes of the secondary file.

>> >3. sub-attributes
>>
[quoted text clipped - 11 lines]
>many. Granted that I don't know Pick, so haven't walked a mile in your
>shoes... these criteria just seem very dicey.

Changes in Pick are almost invariably due to changes in requirements,
too :-)

>> >4. sub-sub-attributes
>>
>> Simply nest sub-attributes one level deeper. :-)
>
>Ah, hierarchical induction. I'll just have one File in my app. :-)

Nah! FILE = noun :-)

Now if your app consists solely of invoices, then your approach might
work :-)

>> Basically, to describe it in relational terms, if you link table A to
>> table B, such that deleting a record in A causes a cascading delete of
[quoted text clipped - 3 lines]
>You're describing parent-child relationships. Surely you run into
>multi-parent and many-to-many scenarios?

Think about what you've just said. I said "deleting a record in A
triggers a cascading delete into B (and C (and D (...)))". Do you want
to try that in relational? Deleting one record in relational will
cascade and delete your entire database ... ?

You completely missed the point here. Where and why would you use a
cascading delete? THINK! Be *practical*. What *works* in *reality*
(rather than theory, which can think up a thousand impossible scenarios
before breakfast (with apologies to "Alice in Wonderland")).

>> And then you use common sense to say "I use these fields all the time,
>> and these fields only rarely" so you split A into two physical FILEs,
[quoted text clipped - 4 lines]
>
>A logical cost is a big cost. Every updating app needs to know that, right?

Yep ... but relational theory, which imposes mandatory separation of the
logical from the physical, imposes that cost on EVERY app, not just
those that update the data.

Furthermore, by actively hindering the programmer from providing hints
to the database, relational forces the programmer to rely on the
database's artificial intelligence, which is quite likely to guess wrong
...

>So you're definitely describing some physical redesign which sits below the
>logical view available to users. I think in relational terms, that's what
[quoted text clipped - 3 lines]
>is layered atop something that is, or leans toward, or could be "more
>relational." At least that's the way it seems...

Yup. Let's assume that the Pick database has been designed properly, and
that within the FILEs the data has been normalised. I can now present my
apps with a *closed* relational view!

My Pick application has also FORCED, by DEFAULT, my database to store
related data close to itself (what relational calls clustering, I
believe). It's fairly easy to prove, statistically, that this will
optimise data retrieval from disk. Sod AI optimisation, Pick doesn't
have a choice and it works, which is why in any system lacking
sufficient ram a Pick app will kick the equivalent relational app's
butt!

Basically, by not hiding the physical implementation from the user, Pick
makes it easy to prove there just IS NO room for improvement. By hiding
the physical from the user, relational forces you to rely on the AI and
you have no way of knowing whether it is efficient or not.

Cheers,
Wol

Eric Kaun - 11 Jun 2004 22:54 GMT
> >> >1. files
> >>
[quoted text clipped - 7 lines]
> Just because I own a car, doesn't make the car part of me ... think
> language, and think nouns and adjectives (and gerunds).

What about sentences?

> >So if an invoice has a Parts attribute, it then needs additional attributes
> >corresponding to each "attribute" of its use of those parts? At what point
[quoted text clipped - 7 lines]
> across two FILES, where you've promoted sub-attributes of INVOICE to be
> primary attributes of the secondary file.

Why would you do the split, when the intent seems to be to keep things
whole? Is this purely for performance optimization? I have a hard time
keeping up with shifts between logical and physical, and the reasons for the
splits. I understand you CAN do these things, but why and when? What are
your heuristics?

> >> >4. sub-sub-attributes
> >>
[quoted text clipped - 3 lines]
> >
> Nah! FILE = noun :-)

Okay; I name my file "MyApplication." :-)

> You completely missed the point here. Where and why would you use a
> cascading delete? THINK! Be *practical*. What *works* in *reality*
> (rather than theory, which can think up a thousand impossible scenarios
> before breakfast (with apologies to "Alice in Wonderland")).

Seldom, due to business desires, but to answer the question you're getting
at: when there's a foreign-key dependency, and there's one relation that is
deemed "important" enough to trigger the cascade. There could be multiple,
though that's rare...
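
As a rough illustration of a cascade that stops at the declared dependency, here is a minimal sketch using Python's sqlite3 module. The table and column names (customer, invoice, invoice_line) are made up for the example and do not come from anyone's schema in this thread. Only the invoice-to-line foreign key is declared ON DELETE CASCADE, so deleting an invoice removes its lines and nothing else.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless asked.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)  -- no cascade here
);
CREATE TABLE invoice_line (
    invoice_id INTEGER NOT NULL REFERENCES invoice(id) ON DELETE CASCADE,
    line_no INTEGER NOT NULL,
    part TEXT,
    PRIMARY KEY (invoice_id, line_no)
);
""")

# Hypothetical sample data: one customer, two invoices, three lines.
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.executemany("INSERT INTO invoice VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?)",
    [(1, 1, "bolt"), (1, 2, "nut"), (2, 1, "washer")],
)

# Deleting invoice 1 cascades into its own lines only.
conn.execute("DELETE FROM invoice WHERE id = 1")

remaining_lines = conn.execute(
    "SELECT invoice_id, line_no FROM invoice_line"
).fetchall()
remaining_customers = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(remaining_lines, remaining_customers)
```

The cascade follows only the keys that were declared with it, which is the "one relation deemed important enough" case described above.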

> >> And then you use common sense to say "I use these fields all the time,
> >> and these fields only rarely" so you split A into two physical FILEs,
[quoted text clipped - 8 lines]
> logical from the physical, imposes that cost on EVERY app, not just
> those that update the data.

And it enables EVERY app with EVERY optimization. (not close, but you get my
drift)

You've just successfully argued against code sharing, by the way, since if
something is coded badly (either slowly, or laden with defects), then every
app has to suffer, so you're better off recoding it in each app, right?

> Furthermore, by actively hindering the programmer from providing hints
> to the database, relational forces the programmer to rely on the
> database's artificial intelligence, which is quite likely to guess wrong

And you've also just argued against compilers, since they're so likely to
guess wrong about the intention of your code, and therefore will produce
badly-optimized machine code.

> Yup. Let's assume that the Pick database has been designed properly, and
> that within the FILEs the data has been normalised. I can now present my
> apps with a *closed* relational view!

What do you mean by that?

> My Pick application has also FORCED, by DEFAULT, my database to store
> related data close to itself (what relational calls clustering, I
> believe). It's fairly easy to prove, statistically, that this will
> optimise data retrieval from disk.

For that one access path.

> Sod AI optimisation, Pick doesn't
> have a choice and it works, which is why in any system lacking
> sufficient ram a Pick app will kick the equivalent relational app's
> butt!

I can think of several faster alternatives. Using ROWID and stashing
hierarchies in Oracle tables would at least close some of the gap.
Performance isn't the only point, but oh well...

> Basically, by not hiding the physical implementation from the user, Pick
> makes it easy to prove there just IS NO room for improvement.

hahahahaha

Oh - you were serious. My bad.

> By hiding
> the physical from the user, relational forces you to rely on the AI and
> you have no way of knowing whether it is efficient or not.

AI?  Yes, I'd hate to rely on something like a "computer" or some other
fancy "automaton" that does "logic" or some such liberal nonsense... :-)

- erk
Anthony W. Youngman - 18 Jun 2004 19:05 GMT
>> In theory, no. In practice, you might choose to split the invoice data
>> across two FILES, where you've promoted sub-attributes of INVOICE to be
[quoted text clipped - 5 lines]
>splits. I understand you CAN do these things, but why and when? What are
>your heuristics?

The heuristics are probably when it gets too complicated for the brain
to comprehend easily.

Okay, I'm getting physical, and messy, but that's the real world. Where
do you draw the line between biology and organic chemistry? Between
organic and inorganic chemistry? Between chemistry and the physics of
atoms?

Okay, there is a pretty clear line between the physics of atoms and
atomic physics, but that's an anomaly!

I can understand you want things nice and clear cut, but the real world
isn't like that. I use relational theory to help me understand the data
down to one or two levels deeper than I need, then I draw the line at
whatever level seems appropriate.

Don't forget, I'm a chemist by training. If I'm doing bio-chemistry it's
incredibly useful to understand electron orbital theory but I can't WORK
at that level. It's just *too* abstract to be meaningful.

By abstracting data down to (and focussing on) the tuple, relational
theory has just gone into TOO MUCH detail and lost sight of (indeed, to
some extent DESTROYED) any view of the big picture...

What you want to do is present the user with a view of the data at their
level, and then analyse it deeper.

As a chemist, I think in molecules. As a businessman, I think people
tend to think in terms of customers, invoices, things like that. THAT is
the level at which the database should interface with users.

Relational interfaces at the chemical equivalent of atoms - with the
tuple. The poor programmer has to think UP to the "business object"
level, and then UP AGAIN to the reality equivalent.

With Pick, I can stand at the "business object" interface, and reach
DOWN into the data, and UP into reality. It's far easier to stand on the
interface reaching in both directions, than to be mired down in the
detail, struggling to get out.

I am sorry I can't give you a better answer than that. But the real
world is messy. Deal with it!

>> >> >4. sub-sub-attributes
>> >>
[quoted text clipped - 15 lines]
>deemed "important" enough to trigger the cascade. There could be multiple,
>though that's rare...

So I would seriously consider pulling all the tables into which
the cascade went into a single FILE. "Rules are for the guidance of wise
men, and obedience of fools" - I would use my intelligence as to whether
this made sense.

>> >> And then you use common sense to say "I use these fields all the time,
>> >> and these fields only rarely" so you split A into two physical FILEs,
[quoted text clipped - 12 lines]
>And it enables EVERY app with EVERY optimization. (not close, but you get my
>drift)

But if the small cost of the "bad" code (which by definition is rarely
used) makes a big difference to the cost of the "good" code, then I'm
laughing all the way to the bank. Who cares if I add an hour to a job
that runs once a month, if by doing so I can shave a second off a job
that 50 users use several hundred times a day?
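
The trade-off can be put into numbers. This is only a back-of-envelope sketch: "several hundred" is taken as 300 runs per user per day and a month as 20 working days, both of which are assumptions rather than figures from the post.

```python
# Figures from the post
users = 50
seconds_saved_per_run = 1
monthly_job_extra_seconds = 60 * 60   # the hour added to the monthly job

# Assumptions made for the sake of the arithmetic
runs_per_user_per_day = 300           # "several hundred"
working_days_per_month = 20

saved_per_month = (
    users * runs_per_user_per_day * seconds_saved_per_run * working_days_per_month
)
net_hours = (saved_per_month - monthly_job_extra_seconds) / 3600

print(f"saved per month: {saved_per_month} s")
print(f"net gain per month: {net_hours:.2f} h")
```

Under those assumptions the frequent job saves about 83 hours of user time a month against one hour lost on the monthly job.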

>You've just successfully argued against code sharing, by the way, since if
>something is coded badly (either slowly, or laden with defects), then every
[quoted text clipped - 13 lines]
>
>What do you mean by that?

I mean it's like a relational view, but if I've got a "one to many"
relationship, the "one" data only appears once, not replicated for every
instance of the "many".

>> My Pick application has also FORCED, by DEFAULT, my database to store
>> related data close to itself (what relational calls clustering, I
>> believe). It's fairly easy to prove, statistically, that this will
>> optimise data retrieval from disk.
>
>For that one access path.

Here you go again - crippling the race horse so we can have a "fair"
race against the crippled old nag ...

>> Sod AI optimisation, Pick doesn't
>> have a choice and it works, which is why in any system lacking
[quoted text clipped - 11 lines]
>
>Oh - you were serious. My bad.

Yes I was :-)

>> By hiding
>> the physical from the user, relational forces you to rely on the AI and
>> you have no way of knowing whether it is efficient or not.
>
>AI?  Yes, I'd hate to rely on something like a "computer" or some other
>fancy "automaton" that does "logic" or some such liberal nonsense... :-)

I don't. I rely on statistics to tell me the Pick model does a better
job than AI.

That crack about the race horse was deliberate. Relational seeks to make
all access paths equal. Fair enough. Rather like the UK educational
system that sees competition as "unfair" and wants all schoolkids to
leave Uni with a first class degree, not caring whether they are a dunce
or a genius (sadly, I'm serious about our education :-(

You said "for that one access path". But that access path IS THE MOST
COMMON PATH! So. I can prove that it's the most common path. I can prove
it's the most efficient path.

Can you prove, that by crippling the most common path, you can improve
the "worst path" cases enough to make it worth-while? Was it Knuth that
said "premature optimisation is the worst evil"? I couldn't give a damn
if the nag trails in last by a racecourse. I want the thoroughbred to
win.

Cheers,
Wol

Dawn M. Wolthuis - 18 Jun 2004 21:32 GMT
> >> In theory, no. In practice, you might choose to split the invoice data
> >> across two FILES, where you've promoted sub-attributes of INVOICE to be
[quoted text clipped - 8 lines]
> The heuristics are probably when it gets too complicated for the brain
> to comprehend easily.

I suspect there are some papers about this as it relates to XML documents
and Jan Hidders did point to some papers re normalization for XML at one
point IIRC.  The rule of thumb is to try to match up to the way people think
when designing the logical structure of the data in a "nested relational"
structure (I don't like that description of XML, but it gives some hint that
we are still working with relations)

> Okay, I'm getting physical, and messy, but that's the real world. Where
> do you draw the line between biology and organic chemistry? Between
[quoted text clipped - 6 lines]
> I can understand you want things nice and clear cut, but the real world
> isn't like that.

so, then, you are NOT a Calvinist? ;-)

>  I use relational theory to help me understand the data
> down to one or two levels deeper than I need, then I draw the line at
[quoted text clipped - 14 lines]
> tend to think in terms of customers, invoices, things like that. THAT is
> the level at which the database should interface with users.

The analogy to molecules is an excellent one.  Looking for, and defining,
the molecules among the data in our problem domain is really what we do with
our logical data models.

> Relational interfaces at the chemical equivalent of atoms - with the
> tuple. The poor programmer has to think UP to the "business object"
[quoted text clipped - 17 lines]
> >
> >Okay; I name my file "MyApplication." :-)

That's where experience and best practices come into play, where you would
play to the strengths of whatever tools you were using and would, likely,
not name your file that.

> >> You completely missed the point here. Where and why would you use a
> >> cascading delete? THINK! Be *practical*. What *works* in *reality*
[quoted text clipped - 10 lines]
> men, and obedience of fools" - I would use my intelligence as to whether
> this made sense.

And it is definitely the case that, with many of the non-DBMSs that are
really enhanced file systems (which is where PICK really falls), the
developer is given enough rope to hang themselves.

> >> >> And then you use common sense to say "I use these fields all the time,
> >> >> and these fields only rarely" so you split A into two physical FILEs,
[quoted text clipped - 18 lines]
> that runs once a month, if by doing so I can shave a second off a job
> that 50 users use several hundred times a day?

And you do have the option of having "services" that test business rules as
well as others that perform updates to ensure the same degree of
consistency and decoupling of app and database as an RDBMS.  These same
services can be used to determine the appropriate GUI components.  Excellent
software can be written, but the database does not require anything of the
developer -- there is considerable freedom (to shoot yourself in the foot)
and this is what also provides the ease of maintenance.

> >You've just successfully argued against code sharing, by the way, since if
> >something is coded badly (either slowly, or laden with defects), then every
> >app has to suffer, so you're better off recoding it in each app, right?

No -- then fix it.

> >> Furthermore, by actively hindering the programmer from providing hints
> >> to the database, relational forces the programmer to rely on the
[quoted text clipped - 13 lines]
> relationship, the "one" data only appears once, not replicated for every
> instance of the "many".

The replication in a SQL-DBMS is only upon viewing granular data.  I think I
know what you mean by this, but perhaps it makes more sense to state it
differently.  A FILE would include the "one" and the many manys.  We would
not have separate relations defined for each many, with multiple rows for
each of the "one" in order to link such relations back to the "one" (master)
relation.  One FILE of PEOPLE in a non-1NF structure (such as XML docs or
PICK) could easily turn into 20 relations in a SQL-DBMS.  The 1 million
records in that one PICK file could turn into those 1 million rows plus
multiple rows for each of these records in each of the other 19 relations
that were split out when putting into 1NF.  Since each of those 19 files
needs to have a candidate key, a lot of generated keys get built to
accomplish this 1NF. So, there is a lot of extra data stored in the
relational structure (in the form of lots and lots of keys).
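
A small sketch of that split, in Python. The PEOPLE records, the attribute names, and the helper to_1nf are all hypothetical, and the child rows here are keyed by (parent id, position) rather than by a generated key, which is just one possible choice. The point is only to count how many key values end up stored.

```python
# Hypothetical nested (non-1NF) records: one per person, multivalued attributes inline.
people = [
    {"id": 1, "name": "Ann", "phones": ["555-0100", "555-0101"],
     "emails": ["ann@example.com"]},
    {"id": 2, "name": "Bob", "phones": ["555-0200"], "emails": []},
]


def to_1nf(records, multivalued):
    """Split nested records into a master relation plus one relation per multivalued attribute."""
    master = [
        {k: v for k, v in r.items() if k not in multivalued} for r in records
    ]
    children = {
        attr: [
            (r["id"], pos, value)              # the parent key is repeated on every child row
            for r in records
            for pos, value in enumerate(r[attr], start=1)
        ]
        for attr in multivalued
    }
    return master, children


master, children = to_1nf(people, ["phones", "emails"])

nested_keys = len(people)                      # one key per nested record
flat_keys = len(master) + sum(len(rows) for rows in children.values())
print(nested_keys, flat_keys)
```

With two multivalued attributes the two nested keys become six stored key values. Twenty relations over a million records would scale the same way.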

I have the feeling I didn't actually CLARIFY your statement, Wol, sorry, but
the way it is stated, I would object when my relational hat is on.

> >> My Pick application has also FORCED, by DEFAULT, my database to store
> >> related data close to itself (what relational calls clustering, I
[quoted text clipped - 5 lines]
> Here you go again - crippling the race horse so we can have a "fair"
> race against the crippled old nag ...

Now, there's no reason to call me names ;-)

> >> Sod AI optimisation, Pick doesn't
> >> have a choice and it works, which is why in any system lacking
[quoted text clipped - 13 lines]
>
> Yes I was :-)

Yes, he was.  I don't know about the PROVE part, but I do know that there is
a HUGE difference between letting queries fly on the MV side of the house
compared to tuning SQL statements ad infinitum on the SQL side.  After
trying to migrate people from the old (PICK) to the newer (SQL), I have
become completely convinced that it is a significant step backwards.  If there
were something comparable to ODBC for non-SQL structures, there would be no
reason at all to consider SQL in those environments.

> >> By hiding
> >> the physical from the user, relational forces you to rely on the AI and
[quoted text clipped - 11 lines]
> leave Uni with a first class degree, not caring whether they are a dunce
> or a genius (sadly, I'm serious about our education :-(

A "no child left behind" jab on the state of US education would be in order, but
it is so upsetting that I can't muster one right now.

> You said "for that one access path". But that access path IS THE MOST
> COMMON PATH! So. I can prove that it's the most common path. I can prove
[quoted text clipped - 5 lines]
> if the nag trails in last by a racecourse. I want the thoroughbred to
> win.

Another good analogy.  By trying to treat all relations as if they are of
the same weight -- as if none is a more important entry-point into the data,
we are breaking down the molecules and focussing only on the atoms, without
giving any hints as to where the molecules are to be found.  Those "one"
relations in the one to many you mentioned above are often like named
molecules, but invisible to the database user -- a case of missing the
forest for the trees.  Views can be built to add these molecules back in,
but there is nothing in relational modeling that makes it clear, or even
suggests, how to handle this.

Cheers!  --dawn
Tony - 19 Jun 2004 12:51 GMT
> Was it Knuth that said "premature optimisation is the worst evil"?

I don't know, but YOU are the one who wants to optimise prematurely,
i.e. while designing the database.  The relationalists prefer to
optimise at the last possible moment, i.e. when we know what the query
is.
Laconic2 - 19 Jun 2004 14:06 GMT
> > Was it Knuth that said "premature optimisation is the worst evil"?
>
> I don't know, but YOU are the one who wants to optimise prematurely,
> i.e. while designing the database.  The relationalists prefer to
> optimise at the last possible moment, i.e. when we know what the query
> is.

Not only when we know the query, but also when we know, approximately, the
data volumes.

Given that access strategy costs are non linear,  this is an important input
to optimization.
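
To see why volume matters, here is a deliberately crude cost model in Python. Everything in it is an assumption made for illustration: 100 rows per page, an index with a fanout of 100, one page read per matching row on the index path, and the function names themselves. No real optimizer costs plans this simply.

```python
def scan_cost(rows, rows_per_page=100):
    """Page reads for a full scan: every page is read once."""
    return -(-rows // rows_per_page)  # ceiling division


def index_depth(rows, fanout=100):
    """Levels in a hypothetical index tree with the given fanout."""
    depth, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        depth += 1
    return depth


def index_cost(rows, matching_rows):
    """Page reads via the index: walk the tree, then one read per matching row."""
    return index_depth(rows) + matching_rows


def cheaper_plan(rows, matching_rows):
    return "index" if index_cost(rows, matching_rows) < scan_cost(rows) else "scan"


# The same query (fetch 50 matching rows) against two data volumes.
for rows in (1_000, 1_000_000):
    print(rows, scan_cost(rows), index_cost(rows, 50), cheaper_plan(rows, 50))
```

In this toy model the scan wins on the small table and the index wins on the large one, so the better plan for an identical query cannot be fixed before the volumes are known.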
Anthony W. Youngman - 20 Jun 2004 00:27 GMT
>> "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message
>news:<L4l1l0HM7y0AFwZd@thewolery.demon.co.uk>...
[quoted text clipped - 10 lines]
>Given that access strategy costs are non linear,  this is an important input
>to optimization.

Okay. Let's try to explain. It's statistics, so it's totally outside
relational theory (statistics is fuzzy logic, after all :-)

Let's assume we've got an invoice file, with a thousand invoices. Each
invoice has ten detail lines.

Your app gets one detail line from the table. The database grabs another
line as a pre-fetch ...

What's the chance that the next thing your app asks for is either
another line, or the invoice header? Quite high. If it's another line,
what's the chance that the one it asks for is the one the db has
prefetched. Easy - it's one in ten thousand. But if, by chance or
design, it happens to belong to the same invoice as the first line, the
chances have improved to one in ten or twenty - a massive improvement.

Look at your apps. How many data fetches are "random" (ie depend on
external input), and how many are of "related data", ie given that you
have one record already, you're retrieving data that shares a "foreign
key" relationship with the data you already have. I'd guess that,
typically, for every one of the first type you have a hundred, maybe
more, of the second.

So, by actively clustering related data together, you could massively
improve database performance. In other words, you'd be copying what Pick
achieves by default.

Let's phrase it in a different way. Those invoice detail lines can be
considered as a bag of lists. By default, relational will treat them as
a random set. It has a one-in-ten-thousand chance of guessing the next
one correctly, unless it has a clever optimisation engine or the DBA
gives it a hint.

Pick by default knows it's a bag of lists. If the app really does ask
for a random "next line", Pick's worst case is the same as relational.
But if, as is normally the case, the next line comes from the same
invoice, Pick's DEFAULT chance of getting it right is the same as the
chance of two consecutive lines coming from the same invoice - ie pretty
high.

So Pick's worst case is equal to relational's worst case - in the highly
UNlikely scenario of random data access. But in the normal case, of
accessing rows that are somehow linked, relational relies on AI or the
DBA "hinting" to the database. Pick just does it that way "by default".

And that's why ALL the anecdotes I've ever seen say that Pick outperforms
relational by a huge margin.
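
The clustering argument can be sketched with a toy page model in Python. The invoice and line counts are the ones used above. The page size of ten lines and the two layout functions are assumptions made for the example, and neither is a claim about how Pick or any particular RDBMS actually places rows on disk.

```python
INVOICES = 1000
LINES_PER_INVOICE = 10
LINES_PER_PAGE = 10   # assumption: a disk page holds ten detail lines


def clustered(invoice, line):
    """Storage position when each invoice's lines are kept together."""
    return invoice * LINES_PER_INVOICE + line


def scattered(invoice, line):
    """Storage position when lines are ordered by line number, then invoice."""
    return line * INVOICES + invoice


def pages_touched(position_of, invoice):
    """Distinct pages read to fetch every line of one invoice."""
    return len({
        position_of(invoice, line) // LINES_PER_PAGE
        for line in range(LINES_PER_INVOICE)
    })


clustered_reads = pages_touched(clustered, 42)
scattered_reads = pages_touched(scattered, 42)
print(clustered_reads, scattered_reads)
```

In this model fetching one invoice costs a single page read when its lines are stored together and ten when they are spread out. The numbers describe only this one access pattern.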

Eric Kaun - 19 Jul 2004 18:48 GMT
> I can understand you want things nice and clear cut, but the real world
> isn't like that.

I do understand that, but machines are remarkably literal; they sort of
need things spelled out to them, so at some point we need to decide
what's important and then how to represent it. Software is description.
[Michael Jackson's quote, not mine, though I agree with it.]

> I use relational theory to help me understand the data
> down to one or two levels deeper than I need, then I draw the line at
> whatever level seems appropriate.

This is interesting:
1. If you're just understanding the data, how do you know what you need,
much less what "one or two levels" beyond that is?
2. What's a level?
3. Appropriate with respect to... what? Some function of requirements, I
imagine, but it would be nice to formalize the rules-of-thumb.

> Don't forget, I'm a chemist by training. If I'm doing bio-chemistry it's
> incredibly useful to understand electron orbital theory but I can't WORK
> at that level. It's just *too* abstract to be meaningful.

Couldn't agree more.

> By abstracting data down to (and focussing on) the tuple, relational
> theory has just gone into TOO MUCH detail and lost sight of (indeed, to
> some extent DESTROYED) any view of the big picture...

The big picture as presented to users is a different thing, outside the
domain of relational. How relational losing sight of the big picture
destroys it is amusing, but

And the tuple isn't the focus - the relation is, including constraints
on relations. As logical predicates, they bear an uncanny resemblance to
many "classes" of rules and requirements.

Of course, you can look at just the "raw data" - typed attributes,
values in tuples. Or you can look at individual predicates. Or you can
look at database constraints (constraints over the set of relations).
Wow! That's three... Three... THREE levels of abstraction in one!

Is that the final word? No, of course not. Date draws a distinction
between physical (DBMS storage), logical (predicates), and conceptual
(mapping to the user). Views, screens, reports, etc. all help paint the
"big picture" to the user. But the big picture won't fully develop if
the components aren't there, or the logical underpinnings are suspect.

> What you want to do is present the user with a view of the data at their
> level, and then analyse it deeper.

So you're talking about a RAD / prototyping / extreme programming
approach to data design? This seems more like process than logical
definition of the data.

> As a chemist, I think in molecules. As a businessman, I think people
> tend to think in terms of customers, invoices, things like that. THAT is
> the level at which the database should interface with users.

Not a bad idea, though the users of a database tend to be developers.
Some users ("power users") can handle SQL and reporting and such, but
not that many. Still, given that relational's domain is "shared data
banks," a logical representation which supports multiple users and
multiple applications (and user views) is likely to be lower level - you
need to be concerned with those issues to make decisions that help the
big picture look real purty.

> Relational interfaces at the chemical equivalent of atoms - with the
> tuple.

There are no tuple-level operators in relational; although each tuple is
a fact, operations are defined over relations (predicates), not
individual tuples (which are not to be singled out).

> The poor programmer has to think UP to the "business object"
> level, and then UP AGAIN to the reality equivalent.

Again, sometimes the poor programmer is thinking the other way - not of
the order, but of which customers in certain states have placed more
than 3 orders which contain both condoms and ice cream cones (or some
such combination). From that standpoint, the "business object" is...
what? In a hierarchical data definition, the most common operations to
data-entry clerks are obvious; everything else becomes convoluted
procedural logic. (generalization noted)
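
For what that kind of question looks like against normalized tables, here is a sketch using Python's sqlite3 module. The schema, the sample customers, and the helper add_order are invented for the example, and 'widget' and 'gadget' stand in for whatever product combination is of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customer(id));
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(id), product TEXT);
""")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "Ann", "NY"), (2, "Bob", "NY"), (3, "Cy", "CA")],
)


def add_order(customer_id, products):
    """Insert one order and its lines (hypothetical helper for the sample data)."""
    cur = conn.execute("INSERT INTO orders (customer_id) VALUES (?)", (customer_id,))
    conn.executemany(
        "INSERT INTO order_line VALUES (?, ?)",
        [(cur.lastrowid, p) for p in products],
    )


for _ in range(4):
    add_order(1, ["widget", "gadget"])   # Ann: four orders with both products
for _ in range(3):
    add_order(2, ["widget", "gadget"])   # Bob: only three with both
add_order(2, ["widget"])
for _ in range(4):
    add_order(3, ["widget", "gadget"])   # Cy: enough orders, wrong state

QUERY = """
SELECT c.name
FROM customer AS c
JOIN orders AS o ON o.customer_id = c.id
WHERE c.state = ?
  AND EXISTS (SELECT 1 FROM order_line AS l
              WHERE l.order_id = o.id AND l.product = ?)
  AND EXISTS (SELECT 1 FROM order_line AS l
              WHERE l.order_id = o.id AND l.product = ?)
GROUP BY c.id, c.name
HAVING COUNT(*) > 3
ORDER BY c.name
"""
matches = [row[0] for row in conn.execute(QUERY, ("NY", "widget", "gadget"))]
print(matches)
```

The query starts from customers and reaches down through orders to lines without any of those tables being the designated entry point.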

> With Pick, I can stand at the "business object" interface, and reach
> DOWN into the data, and UP into reality. It's far easier to stand on the
> interface reaching in both directions, than to be mired down in the
> detail, struggling to get out.

Hey, the interface to the real world is messy - deal with it. That
interface is an ever-shifting beast in any "real business" I've ever
dealt with. With a shifting interface (one which changes radically as
you follow data from department to department), a lower-level (sic)
definition of data is a better support system than a "big picture."

By "standing on" the interface, you depend on its stability. I believe
it to be far, far less stable than the logical definition of the data,
which perhaps might be "small picture." But I'll deal with it there,
where I have some power.

> I am sorry I can't give you a better answer than that. But the real
> world is messy. Deal with it!

That's exactly what we're all trying to do; fuzzy definitions like
"messy", "real world", and "big picture" aren't going to give us much
purchase in the attempt.

>> Seldom, due to business desires, but to answer the question you're
>> getting
[quoted text clipped - 8 lines]
> men, and obedience of fools" - I would use my intelligence as to whether
> this made sense.

I agree - if it makes sense, it's not a bad idea. I have just found very
few cases where it has made sense, but not every database I've designed
has been fully normalized (and that was by design).

One case in point: an issue-reporting database that I did during a
meeting as a prototype which quickly went production. Before it did go
production, I normalized what had been denormalized structures; and was
glad I did, because subsequent report, query, data export, and even
screen view requirements would have been tricky without it. Not
impossible by any means, but far less intuitive.

>>> Yep ... but relational theory, which imposes mandatory separation of the
>>> logical from the physical, imposes that cost on EVERY app, not just
[quoted text clipped - 9 lines]
> that runs once a month, if by doing so I can shave a second off a job
> that 50 users use several hundred times a day?

True (with reservations based on the context of those processes) - but
I've been talking more about new requirements for reports, views, data
imports/exports, etc. A normalized relational structure supports new
requirements better; a denormalized one adds some initial overhead. As
far as performance, I have no doubt that Pick performs well, but haven't
been so constrained in terms of hardware and design that I would
denormalize to save... something.

>>> Yup. Let's assume that the Pick database has been designed properly, and
>>> that within the FILEs the data has been normalised. I can now present my
[quoted text clipped - 5 lines]
> relationship, the "one" data only appears once, not replicated for every
> instance of the "many".

Well, a hierarchical database is likely to deal natively with hierarchies
(which is what you're talking about). From a relational viewpoint, you
don't have "one thing" - you're talking about two predicates (the parent
and the child).

>>> My Pick application has also FORCED, by DEFAULT, my database to store
>>> related data close to itself (what relational calls clustering, I
[quoted text clipped - 4 lines]
>
> Here you go again -

Uh oh... flashbacks of the Jimmy Carter - Ronald Reagan debate... (in
which Reagan repeatedly used the phrase "there you go again" to avoid
actually having to rebut a point).

> crippling the race horse so we can have a "fair"
> race against the crippled old nag ...

A poor analogy, though I can't think of a better one. I'm talking about
being able to support new and changed requirements with a minimum of
change, as well as being able to firewall the integrity of my data from
programmer error (including my own!), and declaring the meaning of my
data (for humans as well as for enforcement).

For the record, yes, I'm sure Pick can load that record real darn fast.
But I have yet to see an application that, based on a primary key,
couldn't load the vast majority of its associated hierarchy in an
unnoticeable amount of time. It just isn't that hard. And in terms of
code burden, mapping tools make it a no-brainer.

So I'll form my own analogy: you're talking about having that racehorse
cross the finish line 1 second faster (when it already was beating the
other horses anyway), albeit dropping the jockey on his arse en route. :-)

> That crack about the race horse was deliberate. Relational seeks to make
> all access paths equal. Fair enough. Rather like the UK educational
[quoted text clipped - 5 lines]
> COMMON PATH! So. I can prove that it's the most common path. I can prove
> it's the most efficient path.

Sure - but taking alternative paths to that same endpoint isn't much
slower. Yeah they're slower... but so what? It doesn't always make a
difference, and in my experience, it's the reports and ad hoc queries and
dataloads and such that demand performance optimizations. I don't give a
damn whether loading my Order is done in one read in 0.09 seconds, or in
several reads in 1 second, since my UI is probably going to take its
sweet time painting anyway...

Obviously I'm not that naive, but I think your optimizations to the
most-common path, while certainly an improvement, may have a
less-than-noticeable impact on most users. But I could certainly be wrong.

> Can you prove, that by crippling the most common path, you can improve
> the "worst path" cases enough to make it worth-while?

> Was it Knuth that
> said "premature optimisation is the worst evil"?

"Premature optimization is the root of all evil", and while Knuth gets
the credit, he says Tony (C.A.R.) Hoare said it first.

> I couldn't give a damn
> if the nag trails in last by a racecourse. I want the thoroughbred to win.

Well, we differ. If the horses are functions / applications, I want as
many horses as possible to finish before the jockeys drop dead of old
age. :-)

Thanks, Anthony, for the lively exchange(s).

- erk
Marshall Spight - 19 Jul 2004 19:38 GMT
> Again, sometimes the poor programmer is thinking the other way - not of
> the order, but of which customers in certain states have placed more
> than 3 orders which contain both condoms and ice cream cones (or some
> such combination).

I don't know what you've got planned
for tonight, Homer, but count me out.

Marge
Eric Kaun - 25 Jul 2004 05:32 GMT
>>Again, sometimes the poor programmer is thinking the other way - not of
>>the order, but of which customers in certain states have placed more
[quoted text clipped - 5 lines]
>
> Marge

Heh.

Oh - and duckbilled platypi (sp?). Need those too.
Leandro Guimaraens Faria Corsetti Dutra - 04 Jun 2004 00:13 GMT
> that is the language of RDBMS: rows and columns.

    Rather tuples and attributes.

Signature

Leandro Guimarães Faria Corsetti Dutra           +55 (11) 5685 2219
Av Sgto Geraldo Santana, 1100 6/71        leandro@dutra.fastmail.fm
04.674-000  São Paulo, SP                                    BRASIL
http://br.geocities.com./lgcdutra/

Anthony W. Youngman - 04 Jun 2004 00:35 GMT
>> I think it is worth noting that it is far more difficult to retrieve an
>> invoice the way it looked originally after chopping it up
[quoted text clipped - 19 lines]
>show one to a customer.
>Use a tool that was designed to present data.

THAT WAS MY POINT!

The tool is external to the database ...

Thanks for proving it :-)

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

mAsterdam - 04 Jun 2004 01:18 GMT
>>> I think it is worth noting that it is far more difficult to retrieve an
>>> invoice the way it looked originally after chopping it up
[quoted text clipped - 24 lines]
>
> Thanks for proving it :-)

Thank you for your trust
but modesty dictates me to
say I did not prove anything.
I was just giving pragmatic
guidance based on opinion.
Dawn M. Wolthuis - 04 Jun 2004 01:50 GMT
> >>> I think it is worth noting that it is far more difficult to retrieve an
> >>> invoice the way it looked originally after chopping it up
[quoted text clipped - 30 lines]
> I was just giving pragmatic
> guidance based on opinion.

LOL  --dawn
Anthony W. Youngman - 07 Jun 2004 23:12 GMT
>>> SQL reports are ugly - I would not want to
>>> show one to a customer.
[quoted text clipped - 8 lines]
>I was just giving pragmatic
>guidance based on opinion.

But it doesn't stop your modest opinion being a perfect example of what
I was trying to say :-)

Cheers,
Wol

mAsterdam - 08 Jun 2004 00:52 GMT
>>>> SQL reports are ugly - I would not want to
>>>> show one to a customer.
[quoted text clipped - 12 lines]
> But it doesn't stop your modest opinion being a perfect example of what
> I was trying to say :-)

:-)
Eric Kaun - 02 Jun 2004 22:00 GMT
> > > And I'm pretty damn
> > > confident that you can NOT create a theory that will do a reversible
[quoted text clipped - 6 lines]
> is useful in a rather small portion of what we do in addressing the "data
> processing" needs of a business.

Possibly, though that depends on how you define data processing, of course.
Certainly software concepts like concurrency and distributed computing
aren't addressed by relational. When it comes to data, I would say that a
presentation language would be a nice relational add-on, as well as the
definition of a system catalog. The two of those would make a nice
combination.

In any event, Dataphor (and the reporting products mAsterdam refers to) infer a
great deal already, and present both UIs and other user-facing artifacts
derived from... relations! Why you'd structure your data based on its
eventual output is beyond me, since that output is neither unique nor
static.

> There are at least the axioms of set theory and then some things were tossed
> into the mix without any proof from these axioms, such as restricting the
> sets from which elements can come to sets of scalar values (which has been
> changed now, but 1NF, however defined, would have to be considered an axiom
> since it does not arise from any other mathematics)

1NF says only that the relational model doesn't treat types specially; it
defines the "domain" of the relational model as distinct from the
user-definable (therefore extensible) portions. Not sure why that's a
problem - in any event, lists can be seen as scalars, as long as you're not
requiring the base algebra to in some way acknowledge them. Not sure why
you'd want to, since you do have user-definable types and operations.

And as far as data types go, lists are sometimes nice but there are far
better ones for most purposes - e.g. relation-valued attributes. For
example, let's say that you chose to have a "LineItems" attribute. If you
had only lists, you'd have to have several attributes - one for part
numbers, one for quantities, etc. And you'd have to adopt the convention
that PartNumber[n] corresponds to Quantity[n], with no guarantees that the
lists couldn't end up with different sizes, if your code is bad. With a
relation LineItems = {Part#, Qty}, not only would you have a guarantee, you
could even query that attribute, rather than writing a stupid loop!
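
To make the contrast concrete, here is a minimal sketch in Python. The part numbers and helper names are made up for illustration, and a frozenset of tuples is only a rough stand-in for a true relation-valued attribute, not a claim about how any DBMS implements one.

```python
# Hypothetical invoice data; all names and values are illustrative only.

# Parallel lists: PartNumber[n] <-> Quantity[n] is only a convention.
part_numbers = ["P-100", "P-200", "P-300"]
quantities = [2, 5]  # one entry short, and nothing in the structure stops this


def lists_are_consistent(parts, qtys):
    """The check the application has to remember to run itself."""
    return len(parts) == len(qtys)


# Relation-like attribute: a set of (part, qty) tuples.
# Each tuple carries both values, so they cannot drift apart.
line_items = frozenset({("P-100", 2), ("P-200", 5), ("P-300", 1)})


def parts_with_qty_over(items, threshold):
    """A declarative 'query' over the attribute instead of an index loop."""
    return sorted(part for part, qty in items if qty > threshold)


print(lists_are_consistent(part_numbers, quantities))
print(parts_with_qty_over(line_items, 1))
```

The parallel lists can silently fall out of step, while the set of tuples has no way to represent a part without its quantity.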

So relations give you rich attributes in a far more consistent and powerful
way than lists, which are impoverished little suckers. I rarely use them in
Java - the other types are far more useful and powerful. They're the
type-generators of last resort.

But if you do want to see lists done right, check out any Lisp dialect.
Easier processing, nicer syntax, and better reuse than Java.

(and yes, I understand we're mixing code and data again...)

> I think it is worth noting that it is far more difficult to retrieve an
> invoice the way it looked originally after chopping it up (that 1NF thing
> again) and then using SQL to show the invoice again.

True, but it's far easier to derive other facts about that invoice, and its
relationship to the customer and other invoices and parts and shipments,
when it's not all one big blob. Again, if your main task is showing people
the invoice, why transform it at all? How do you know when you've gone too
far? In other words, what are your normalization rules? Is it just unhooking
2NF from 1NF?

> Without arguing the semantics (and mapping of the data to reality) of this
> particular example, if your invoice looked like this when selling a
> beautiful skirt in white and blue that comes from two of your catalogs, it
> is definitely HARDER than a non-1NF environment, though not impossible, to
> get a SQL statement to show your invoice properly.

That's true, primarily because SQL is a bad reporting tool (in addition to
being a bad relational derivative). There are better ones.

> > So what do you want - the invoice paper? Maybe we should just rely on
> > scanners producing JPGs - non-lossy, of course.
>
> No need -- including lists in your data (at least your virtual data!) gets
> you far enough that you don't notice any more big disconnects.

It gets you somewhere, but as I said above, relation-valued attributes get
you much further. Why not them?

> SQL Server
> permits lists in their UDFs, while Oracle (to my knowledge) does not allow
> lists returned from their functions (stored procedures)

Does SQL Server's SQL dialect address lists directly?

> > > Dawn was going on about faith. Do you have faith in
> > > business analysts to get the analysis correct, or would you rather have
[quoted text clipped - 6 lines]
>
> As long as we are all aiming for the same things ...  smiles.  --dawn

True - ease of development and assurance of data integrity are mine. Both
benefit the user...

- erk
Anthony W. Youngman - 04 Jun 2004 00:34 GMT
>> > Certainly with current
>> > relational databases accessed with SQL, you're relying on either an
[quoted text clipped - 10 lines]
>related to each other on that one line.  Stick with me here, I know I said
>that poorly.

What I'm trying to say is that if we use SQL, we get a corrupted version
of the data back (ie data that went in ONCE comes back MULTIPLY
DUPLICATED), *or* we use an application (such as CrystalReports) which
is not part of the database to retrieve the relevant bits from the
relevant table.

The database itself doesn't know which tables represent "the set of
invoices" and doesn't know how to retrieve a single instance of
"invoice" from it - it needs to be told by an external influence, namely
a query.
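
A minimal sketch of the duplication being described, using Python's sqlite3 module. The table and column names are assumptions for illustration, not taken from any system discussed in this thread.

```python
import sqlite3

# Hypothetical two-table invoice schema, held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE invoice_line (
        invoice_no INTEGER REFERENCES invoice(invoice_no),
        line_no INTEGER,
        part TEXT,
        qty INTEGER,
        PRIMARY KEY (invoice_no, line_no)
    );
    INSERT INTO invoice VALUES (1, 'Acme');
    INSERT INTO invoice_line VALUES (1, 1, 'skirt-white', 1);
    INSERT INTO invoice_line VALUES (1, 2, 'skirt-blue', 1);
""")

# A plain join returns one row per line item, so the header
# values (invoice_no, customer) come back once per line.
rows = conn.execute("""
    SELECT i.invoice_no, i.customer, l.line_no, l.part, l.qty
    FROM invoice AS i
    JOIN invoice_line AS l ON l.invoice_no = i.invoice_no
    ORDER BY l.line_no
""").fetchall()

# Code outside the database regroups the flat rows into
# "one header plus its lines" for presentation.
header = rows[0][:2]
lines = [row[2:] for row in rows]

for row in rows:
    print(row)
print(header, lines)
```

The customer name was stored once but appears on every joined row; folding it back into a single header is done by the code around the query, not by the query itself.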

Cheers,
Wol

Anthony W. Youngman - 04 Jun 2004 00:29 GMT
>> This theory will then be the equivalent of Kepler and Newton discovering
>> ellipses and calculus, or of Einstein realising that mass and energy
[quoted text clipped - 3 lines]
>
>Which axioms don't match? I wasn't really aware there were axioms per se.

BLOODY HELL ...

I don't mean to sound stunned, but this takes the biscuit ...

ALL mathematical theories are based on axioms.

Science is basically the search for experimental proof that the axioms
correctly describe the real world.

If you can't describe relational theory in terms of axioms and logical
deductions, then it isn't maths and can't be science!

An axiom is basically "any statement which the model ASSUMES to be
true". In relational theory, I would guess that at least one axiom could
be phrased as "data comes in tuples".

So, if you don't have experiments to show that real-world data ALSO
comes in tuples (or a close approximation thereof), then you can't
conclude that a relational database is a good place to store real-world
data. (Oh - and if you conclude that real-world data DOES come in
tuples, but in several different types of tuple, then your theory needs
to take that into account!)

Sorry for ignoring the rest of your post, but this is ABSOLUTELY
FUNDAMENTAL!!!

Cheers,
Wol

Dawn M. Wolthuis - 04 Jun 2004 01:16 GMT
> >> This theory will then be the equivalent of Kepler and Newton discovering
> >> ellipses and calculus, or of Einstein realising that mass and energy
[quoted text clipped - 15 lines]
> If you can't describe relational theory in terms of axioms and logical
> deductions, then it isn't maths and can't be science!

By George, you've got it, Wol!!!  Perfect!

Relational theory, once some choice axioms are added in (without being
stated as axioms and without being obvious that they ought to be axiomatic
when measured by any map to reality) does then proceed with mathematics, but
there is a lot of "tossing stuff in and out" going on because there is not
that match with reality at each point.

Cheers!  --dawn
Eric Kaun - 04 Jun 2004 15:51 GMT
> [SNIP]
> > If you can't describe relational theory in terms of axioms and logical
[quoted text clipped - 7 lines]
> there is a lot of "tossing stuff in and out" going on because there is not
> that match with reality at each point.

So what mathematical axioms do you know of that "map to reality"? I didn't
realize that was the fundamental aspect of an axiom's value. And if it is,
then again, what data axioms do you propose as a good start? They needn't be
formal, but have to have more meaning than "data comes in tuples".

- erk
Dawn M. Wolthuis - 04 Jun 2004 16:53 GMT
> > [SNIP]
> > > If you can't describe relational theory in terms of axioms and logical
[quoted text clipped - 10 lines]
>
> So what mathematical axioms do you know of that "map to reality"?

Those arithmetic ones have worked OK for me.

> I didn't
> realize that was the fundamental aspect of an axiom's value.

It is only of worth if you want to apply the mathematics to something, such
as databases.

> And if it is,
> then again, what data axioms do you propose as a good start? They needn't be
> formal, but have to have more meaning than "data comes in tuples".

I'm not in that spot yet and I do want them to be formal.  smiles.  --dawn
Eric Kaun - 04 Jun 2004 21:05 GMT
> > So what mathematical axioms do you know of that "map to reality"?
>
> Those arithmetic ones have worked OK for me.

I agree, they work well. But what "reality" do they map to? They're
synthetic, albeit extremely useful. How would you correlate them with
reality?

> > I didn't realize that was the fundamental aspect of an axiom's value.
> It is only of worth if you want to apply the mathematics to something, such
> as databases.

Right, but how exactly does one determine the applicability of mathematics
to, say, physics? In other words, what axioms does any branch of mathematics
have that correlate to something in the real world?

- erk
Dawn M. Wolthuis - 05 Jun 2004 06:53 GMT
> > > So what mathematical axioms do you know of that "map to reality"?
> >
[quoted text clipped - 3 lines]
> synthetic, albeit extremely useful. How would you correlate them with
> reality?

Without looking up the axioms themselves, I map the number 1 to a single
sheep and then with addition, I add in sheep. It's all about sheep.

> > > I didn't realize that was the fundamental aspect of an axiom's value.
> > It is only of worth if you want to apply the mathematics to something,
[quoted text clipped - 4 lines]
> to, say, physics? In other words, what axioms does any branch of mathematics
> have that correlate to something in the real world?

I think that is where Wol's line of discussion was.  As far as I'm
concerned, they correlate as metaphors when they are used and then they are
used for that which they work for.  So, the correlation is very
pragmatic.  There is no proof of such a correlation, but you can disprove an
exact correlation just as you can come up with a fault in a metaphor.

Somehow I don't think I'm tapping into your questions right 'cause I think
you and I agree on these points and are arguing anyway -- otherwise, without
asking a question for the answer, where do you think we have a disagreement
in this area?  --dawn
Eric Kaun - 07 Jun 2004 19:22 GMT
> Without looking up the axioms themselves, I map the number 1 to a single
> sheep and then with addition, I add in sheep. It's all about sheep.

I disagree - it's all about turtles, stacked up on an elephant. Or maybe
vice versa. But in either case, there are no sheep. Unless it's the ones the
turtles are dreaming about.

> > Right, but how exactly does one determine the applicability of mathematics
> > to, say, physics? In other words, what axioms does any branch of
[quoted text clipped - 11 lines]
> asking a question for the answer, where do you think we have a disagreement
> in this area?  --dawn

Uh... I object. By golly, I object. (flashback to an old Bloom County
cartoon with Opus in court, pounding on the desk with a gavel saying "By
golly I object" repeatedly...)

Anyway, you're right - I don't think we actually disagree with that. Systems
development is unnatural, and discards detail because the real world is
unautomatable. We can only model and simulate tiny segments of it, and my
assertion is that those models gain far more power from the nature of the
model than from correlation with the real world. That correlation is nice,
and certainly there must be a mapping... but that balance point is what
we're arguing about, not the points above.

Sorry...

- erk
Anthony W. Youngman - 07 Jun 2004 23:39 GMT
>> By George, you've got it, Wol!!!  Perfect!
>>
[quoted text clipped - 9 lines]
>then again, what data axioms do you propose as a good start? They needn't be
>formal, but have to have more meaning than "data comes in tuples".

e=mc^2 ?

Yep. I know it's bl**dy difficult. But if you're not prepared to attempt
it, then you're admitting your theory is irrelevant to the real world
(and cannot be used to solve real-world problems).

Let's take the evolution of that theory I keep on throwing out as an
example.

Copernicus : orbit == circle
Kepler : orbit == ellipse
Newton : F=ma; E=1/2mv^2 where m is constant
Einstein : e=mc^2

Each change may only subtly modify the previous axioms, but the result
is a theory/model that is a closer fit to reality.

Going back to relational theory. Does the THEORY distinguish between a
"join" and a "join with a cascading delete"? Or a "join" and a "join
with a foreign key that must exist (cannot be null)".

Because if relational theory cannot cope with that, then the Pick model
can. And surely, a relational table whose rows are meaningful in their
own right MUST be different from a table whose rows are meaningless
without another table to relate to?

Cheers,
Wol

Eric Kaun - 09 Jun 2004 20:58 GMT
> >So what mathematical axioms do you know of that "map to reality"? I didn't
> >realize that was the fundamental aspect of an axiom's value. And if it is,
[quoted text clipped - 6 lines]
> it, then you're admitting your theory is irrelevant to the real world
> (and cannot be used to solve real-world problems).

That's silly. The following have all been used to solve real-world problems:
- Pick
- SQL
- Relational (assuming Dataphor has at least one real-world solution in
place somewhere)
- Flat files
- XML

I don't know the "axioms" of the flat-file solution, and don't think that
Unix's "everything is a file" is really an axiom. What they're saying is
that we have a useful model that treats all data as files.

Besides, who is attempting it? What attempt has MV made? I don't understand
what you're looking for here - do you want science, math, or neither? In
whatever category you want, what does Pick/MV offer? You seem unwilling to
pick up your own gauntlet.

> Let's take the evolution of that theory I keep on throwing out as an
> example.
[quoted text clipped - 6 lines]
> Each change may only subtly modify the previous axioms, but the result
> is a theory/model that is a closer fit to reality.

I don't think the above are axioms in the mathematical sense, though I could
be wrong.

> Going back to relational theory. Does the THEORY distinguish between a
> "join" and a "join with a cascading delete"?

Cascading deletes are useful for implementations, not part of the theory - a
cascading delete is simply nice shorthand for an implicit multi-update (as
advocated by Date in recent writings), and roughly corresponds to the
usefulness of the "foreign key" concept in place of a longer-winded
constraint definition.

> Or a "join" and a "join with a foreign key that must exist (cannot be
> null)".

In relational, all foreign keys must exist, and no attribute value can be
null.

> Because if relational theory cannot cope with that, then the Pick model
> can.

"Cannot cope with that" implies that there is some objective reality that's
presenting X, and that a model that doesn't "cope with" X is a poor match to
reality. While I agree with the implication overall, the premise is false -
there's no objective reality "presenting" cascading deletes or nulls. Those
are both aspects of modeling data. There's no objective reality with which
those correspond. At best, you're pitting Data Model A against Data Model B,
and claiming B is lacking in attribute C, when C doesn't even enter into
Data Model A.

> And surely, a relational table whose rows are meaningful in their
> own right MUST be different from a table whose rows are meaningless
> without another table to relate to?

"Meaningful in their own right" is rhetorical - every relation has a meaning
(the external predicate). To turn the question on its ear, surely a Pick
file which requires applications to enforce the correspondence between
values in several distinct attributes MUST be different from a file whose
attributes refer to the IDs of other files?

- erk
Anthony W. Youngman - 10 Jun 2004 01:34 GMT
>> >So what mathematical axioms do you know of that "map to reality"? I
>> >didn't
[quoted text clipped - 26 lines]
>whatever category you want, what does Pick/MV offer? You seem unwilling to
>pick up your own gauntlet.

What I'm saying is that maths is great at building a model. But without
science you can't say that any model is useful. Without science, a model
is just an intellectual exercise of no value to the real world.

>> Let's take the evolution of that theory I keep on throwing out as an
>> example.
[quoted text clipped - 9 lines]
>I don't think the above are axioms in the mathematical sense, though I could
>be wrong.

Yes they are. Copernicus ASSUMED that the planets went in circles, and
then he used logic on top of that. Therefore, "orbit == circle" is an
axiom.

Kepler realised that "orbit == ellipse", and that explained why
Copernicus' logic was so screwy.

Newton ASSUMED that m and E could neither be created nor destroyed,
therefore they are axioms.

Einstein realised that m and E were interchangeable, and that explained
why Newton couldn't predict the orbit of Mercury.

Basically, any assumption that underlies mathematical logic is an axiom.
Copernican orbital theory is a mathematical model. Newtonian Mechanics
is a mathematical model. Therefore the assumptions that underlie them
must be axioms.

>> Going back to relational theory. Does the THEORY distinguish between a
>> "join" and a "join with a cascading delete"?
[quoted text clipped - 4 lines]
>usefulness of the "foreign key" concept in place of a longer-winded
>constraint definition.

In other words, the theory has no way of coping with what I call "the
adjectival clause" - a table whose contents are meaningless without the
existence of another table to point to. An invoice line item cannot
exist without an invoice for it to belong to!

Or, in other words again, relational theory is deficient because it has
no way of coping with real-world constructs that "obviously" exist.

>> Or a "join" and a "join with a foreign key that must exist (cannot be
>> null)".
[quoted text clipped - 13 lines]
>and claiming B is lacking in attribute C, when C doesn't even enter into
>Data Model A.

But there IS objective reality. A line-item on an invoice, for example.
The former has no existence outwith the latter.

>> And surely, a relational table whose rows are meaningful in their
>> own right MUST be different from a table whose rows are meaningless
[quoted text clipped - 5 lines]
>values in several distinct attributes MUST be different from a file whose
>attributes refer to the IDs of other files?

I don't get that. But I think you're making the logical blunder of
expecting your logic to PREscribe the world's behaviour, rather than
DEscribe it.

Please explain to me how, in the real world, an invoice line item can
have an existence in the absence of the invoice to which it belongs ...
because as I read you you are saying that relational theory says it can
...

Cheers,
Wol

Eric Kaun - 10 Jun 2004 21:15 GMT
> What I'm saying is that maths is great at building a model. But without
> science you can't say that any model is useful. Without science, a model
> is just an intellectual exercise of no value to the real world.

You've probably said it before, and it makes sense. But most sciences verify
their theorems using prediction and comparisons with real-world phenomena.
We don't have that luxury, since we can build working systems with various
data models and languages and frameworks, and any experiments designed to
measure overall productivity in system creation and maintenance have to take
colossal human factors into account... is there some way of testing the
models - or actually the meta-models?

> >> Let's take the evolution of that theory I keep on throwing out as an
> >> example.
[quoted text clipped - 27 lines]
> is a mathematical model. Therefore the assumptions that underlie them
> must be axioms.

OK, my mistake - I thought you meant the formulae themselves were axioms.
You're referring to underlying assumptions, which of course are.

> >Cascading deletes are useful for implementations, not part of the theory - a
> >cascading delete is simply nice shorthand for an implicit multi-update (as
[quoted text clipped - 6 lines]
> existence of another table to point to. An invoice line item cannot
> exist without an invoice for it to belong to!

Right, but cascading delete is different than that. Constraints encode what
you just said ("A cannot exist without B"); cascades are useful shorthands
for updates, designed to make sets of operations easier while obeying the
constraints.
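
A rough sketch of that distinction, using SQLite through Python's sqlite3 module as a convenient stand-in. SQL is not the relational model, and the schema and the `make_db` helper are hypothetical; the point is only that the constraint and the cascade are separate things.

```python
import sqlite3


def make_db(on_delete):
    """Build a hypothetical invoice schema with the given ON DELETE rule."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless asked.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
        CREATE TABLE invoice (invoice_no INTEGER PRIMARY KEY);
        CREATE TABLE invoice_line (
            invoice_no INTEGER NOT NULL
                REFERENCES invoice(invoice_no) ON DELETE {on_delete},
            line_no INTEGER NOT NULL,
            PRIMARY KEY (invoice_no, line_no)
        );
        INSERT INTO invoice VALUES (1);
        INSERT INTO invoice_line VALUES (1, 1), (1, 2);
    """)
    return conn


# Constraint only ("a line cannot exist without its invoice"):
# a delete that would orphan the lines is rejected outright.
strict = make_db("RESTRICT")
try:
    strict.execute("DELETE FROM invoice WHERE invoice_no = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Same constraint plus the cascade shorthand: the dependent
# deletes are issued implicitly so the constraint still holds.
cascade = make_db("CASCADE")
cascade.execute("DELETE FROM invoice WHERE invoice_no = 1")
remaining = cascade.execute("SELECT COUNT(*) FROM invoice_line").fetchone()[0]

print(rejected, remaining)
```

In both cases no orphaned line item can survive; the cascade only changes who has to write the extra delete.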

> >"Meaningful in their own right" is rhetorical - every relation has a meaning
> >(the external predicate). To turn the question on its ear, surely a Pick
[quoted text clipped - 5 lines]
> expecting your logic to PREscribe the world's behaviour, rather than
> DEscribe it.

That's not what I'm trying to do, but describing the world's behavior isn't
all software development is. My requirements have always included demands
for information which isn't directly manifest in the real world - in other
words, beyond just being able to reproduce an invoice (thus showing that my
database describes invoices), my data is also the basis for

> Please explain to me how, in the real world, an invoice line item can
> have an existence in the absence of the invoice to which it belongs ...
> because as I read you you are saying that relational theory says it can

I may have expressed myself badly. What relational theory says is that
statements about line items on an invoice state truths about values which
are unrelated to the invoice as a whole, though each line item of course
depends on the invoice (header). That statement, while "related to" the
invoice header (in that it can't exist without it), has logical meaning on
its own - I can formulate useful queries over line items which don't involve
the header.

If cascading delete is your guide, then would deleting a customer cause all
of that customer's invoices (and their line items) to be deleted as well?
Danger aside, does that imply that invoices are an attribute of customers? I
understand there's a difference between customer:invoice and invoice:line
item relationships, but I'm trying to boil it down to something more than
"they're part of the same thing". When it comes to general ledger entries,
invoices, payments, shipments, contacts, etc., the line between what is and
isn't part of a customer gets a little murkier. Or does it?

- Eric
Anthony W. Youngman - 14 Jun 2004 23:12 GMT
>> Basically, any assumption that underlies mathematical logic is an axiom.
>> Copernican orbital theory is a mathematical model. Newtonian Mechanics
[quoted text clipped - 3 lines]
>OK, my mistake - I thought you meant the formulae themselves were axioms.
>You're referring to underlying assumptions, which of course are.

Some of the formulae may have to be axioms too. If you need to assume,
then it's an axiom, if you can derive from your previous assumptions
then it's a theorem.

>> >Cascading deletes are useful for implementations, not part of the
>> >theory - a
[quoted text clipped - 13 lines]
>for updates, designed to make sets of operations easier while obeying the
>constraints.

But the need for a cascading delete is metadata - information that
should be *implicit* within the database. You're turning it into an
*external* constraint - putting it where it does NOT belong!

>> Please explain to me how, in the real world, an invoice line item can
>> have an existence in the absence of the invoice to which it belongs ...
[quoted text clipped - 7 lines]
>its own - I can formulate useful queries over line items which don't involve
>the header.

I think we're having a bit of fun here :-) You're saying you want to
extract data from certain "columns" without caring what the primary key
is. Fine - no problem there. Ignore the columns you're not interested
in.

I'm saying that deleting the primary key should delete all related rows
- even those in other tables! If your analyst forgot to specify a
cascading delete (and you say that they're external to the theory,
anyway), what you're saying is that the theory FAILS to enforce data
integrity in that you're using something external to theory to keep the
tables in sync.

Pick just stores it all together so that taking out the primary key
takes out everything else.

>If cascading delete is your guide, then would deleting a customer cause all
>of that customer's invoices (and their line items) to be deleted as well?
[quoted text clipped - 4 lines]
>invoices, payments, shipments, contacts, etc., the line between what is and
>isn't part of a customer gets a little murkier. Or does it?

Well, if you follow accounting rules, yes it should :-) Although I think
that really goes the other way - you can't create an invoice for a
non-existent customer :-)

I think I know what you mean though, when you say "gets a little
murkier". Except, in practice, it doesn't. "customer" is a noun - it
gets its own FILE. "invoice" likewise. "line item" - is it a noun or
adjectival clause? Pick Business Analysis would unhesitatingly place it
in the category of adjectival clause. But I know why you would want to
treat it as a noun.

Probably because it makes the General Ledger so much easier :-) you want
to analyse by line-item, and not by invoice. Actually, that's not
difficult at all - you just add ledger code as an attribute of invoice,
grouped as part of line-item :-) But yep. I can see why you wouldn't
think it as clean - I'm inclined to agree with you. If I was programming
this, I'd probably say that "line-item" in the general ledger wasn't the
same as "line-item" in the invoice and that would make my life nice and
simple :-) but it would have the relational people throwing their hands
up in horror. Or just make the entries in the GENERAL-LEDGER FILE a list
of foreign keys pointing at the line item in the invoice file - not hard
at all. Just a smidgeon more work for the database (but rather more
mental contortion for the programmer).

But I've been thinking about a few other things while this reply has
been sitting half-composed on my computer ... Relational Theory is all
about capturing *data*. BUT - a lot of information is *metadata* which
an RDBMS is incapable of storing as such. We were discussing ordering -
an RDBMS only captures this - as data - if the analyst thinks it
important. A Pick database captures it as a matter of course.

And constraints - I categorise them as "natural constraints" and
"business constraints". You can't have an invoice line item without an
invoice - that's a "natural constraint". But you *can* have an invoice
without a valid company. It might be an error, or it might be called a
receipt. But there's nothing to stop the accounts dept screwing up and
issuing an invoice to a non-existent company :-) That's what I call a
"business constraint". You seem to think that should be captured as
*data*. Pick captures it as *metadata*.

Now compare the amount of *metadata* available to Pick and/or
relational. It doesn't matter what your database is, the data in it is,
as far as the dbms is concerned, a meaningless "blob". To optimise
performance, storage, whatever, the only thing available of any use to
the dbms is *metadata*. Which Pick has in abundance.

That's why I describe Pick as a superset of relational - it can convert
metadata into data and present it to the app. It can also USE the
metadata to optimise itself. Relational can only store this sort of
information as *data*, and as such the information is not available to
the dbms for its internal use.

Cheers,
Wol

Tony - 15 Jun 2004 10:38 GMT
> Now compare the amount of *metadata* available to Pick and/or
> relational. It doesn't matter what your database is, the data in it is,
[quoted text clipped - 7 lines]
> information as *data*, and as such the information is not available to
> the dbms for its internal use.

False.  A relational database contains a lot of metadata:
primary/unique keys, foreign keys, other constraints.  All of these
are available to the RDBMS for optimisation purposes.  To take your
invoice example, the RDBMS "knows" that a given invoice has 14 invoice
lines just as surely as Pick does.
Laconic2 - 15 Jun 2004 13:39 GMT
> False.  A relational database contains a lot of metadata:
> primary/unique keys, foreign keys, other constraints.  All of these
> are available to the RDBMS for optimisation purposes.  To take your
> invoice example, the RDBMS "knows" that a given invoice has 14 invoice
> lines just as surely as Pick does.

Excellent, excellent point.  So many people fail to recognize this feature
of an RDBMS.

When I built a "data mart" in Oracle as a star schema,  I included all the
primary and foreign key constraints, even though it slowed down loading.
The advantage came when I went to copy the star into Cognos (Impromptu or
Power Play,  I forget).
Both Cognos and the Oracle optimizer recognized my star schema for what it
was,  and made appropriate use of that fact.
x - 15 Jun 2004 14:28 GMT
> > "Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message
> news:<qZztNxCLLizAFwnY@thewolery.demon.co.uk>...
[quoted text clipped - 7 lines]
> Excellent, excellent point.  So many people fail to recognize this feature
> of an RDBMS.

And all this "knowledge" about data won't spare you from writing code for
assembling several SQL statements.

What Anthony said is this:
1) we entered all data in an invoice *at once* in the database
2) we should be able to work on the *whole* as well as on a part of this
data by means of the DBMS
3) we should be able to ask for the *same* data we entered in the database
as a whole by means of the DBMS.
If we are not able to do this with a RDBMS, then something is missing.
He calls this metadata.

> When I built a "data mart" in Oracle as a star schema,  I included all the
> primary and foreign key constraints, even though it slowed down loading.
> The advantage came when I went to copy the star into Cognos (Impromptu or
> Power Play,  I forget)
> Both Cognos and the Oracle optimizer recognized my star schema for what it
> was,  and made appropriate use of that fact.

You were lucky.
Laconic2 - 15 Jun 2004 16:23 GMT
> > When I built a "data mart" in Oracle as a star schema,  I included all the
> > primary and foreign key constraints, even though it slowed down loading.
[quoted text clipped - 4 lines]
>
> You were lucky.

I don't think so.  The engineers who built the CBO for Oracle were real
smart.  And they had the example of the DEC Rdb optimizer to guide them.
And there was a note somewhere in the release notes saying they had
implemented a thing they called a "star join".  That was  enough for me.

And the Oracle DBA, who had plenty of experience with databases that ran
like molasses,  was amazed at the performance I got out of this beast.
Especially when she looked at my code,  and didn't find any "hints"  and
already knew that my tablespaces had nothing but default parameter settings.

It's amazing how many times you "get lucky" by just following simple, sound
design, and by keeping things "as simple as possible, but not simpler than
that."  In the few places where you end up with a performance problem, you
can typically tune locally, without ripples spreading all over the system.

The engineers who built the Cognos data extraction tool were real smart.
And if they knew MDDB down cold (which they must have),  then they almost
certainly knew how to recognize a star schema when they saw one.   The way
I knew that they knew was by looking at the SQL the Cognos tool used to
extract the data from my star schema.  Sure enough,  they "got it".

A lot of people in this business get a lot of bang for the buck by assuming
that "everybody but me is an idiot".  I've gotten a lot of bang for the buck
by assuming just the opposite:  "nobody in this business is an idiot.  But
everybody makes mistakes,  and some of them are idiotic."

(I occasionally call people "idiots".  But that's just venting.)
Dawn M. Wolthuis - 15 Jun 2004 18:21 GMT
> > > When I built a "data mart" in Oracle as a star schema,  I included all
> the
[quoted text clipped - 12 lines]
> And there was a note somewhere in the release notes saying they had
> implemented a thing they called a "star join".  That was  enough for me.

Is the star join a relational concept?  I heard someone suggest that
fact-dimension tables with star schema is bad design, but I forget the
rationale for that and they seem to be very effective.

> And the Oracle DBA, who had plenty of experience with databases that ran
> like molasses,  was amazed at the performance I got out of this beast.
[quoted text clipped - 16 lines]
> by assuming just the opposite:  "nobody in this business is an idiot.  But
> everybody makes mistakes,  and some of them are idiotic."

Good rule of thumb.

> (I occasionally call people "idiots".  But that's just venting.)

and that's fine behind closed doors.  I'm thankful that there is less of
that in public on this list than there was when I started (I'm not sure now
what to do with the balls I had to grow at that time, but pleased that I no
longer need them to chat here  ;-)

Cheers!  --dawn
Laconic2 - 15 Jun 2004 20:49 GMT
> Is the star join a relational concept?  I heard someone suggest that
> fact-dimension tables with star schema is bad design, but I forget the
> rationale for that and they seem to be very effective.

As near as I can make out,  a "star join" is yet another join algorithm,
that is added to the ones previously implemented.

Earlier join algorithms include the "loop join" and the "merge join".  I
could describe these in more detail, but you may already know them.  They
all achieve the same result:  a join.  They differ in performance,  and
different ones are better in different cases. A smart optimizer picks the
best algorithm given the available information.
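
For anyone who has not met them, the two algorithms can be sketched in a
few lines of Python. This is only an illustration: the function names, the
tuple layout and the sample rows are all made up here, and a real optimizer
works on pages and indexes rather than Python lists.

```python
def loop_join(left, right, key_left, key_right):
    """Nested-loop join: compare every left row with every right row."""
    return [l + r for l in left for r in right if l[key_left] == r[key_right]]


def merge_join(left, right, key_left, key_right):
    """Sort-merge join: sort both inputs on the key, then advance in step."""
    left = sorted(left, key=lambda row: row[key_left])
    right = sorted(right, key=lambda row: row[key_right])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key_left], right[j][key_right]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair the current left row with the whole run of equal right keys.
            j_end = j
            while j_end < len(right) and right[j_end][key_right] == lk:
                out.append(left[i] + right[j_end])
                j_end += 1
            i += 1  # j stays at the start of the run, in case of duplicate left keys
    return out


# Hypothetical sample data: (invoice_no, customer) and (invoice_no, item).
invoices = [(2, "Bloggs"), (1, "Acme")]
lines = [(1, "widget"), (2, "gadget"), (1, "sprocket")]

nested = loop_join(invoices, lines, 0, 0)
merged = merge_join(invoices, lines, 0, 0)
```

Both calls return the same three joined rows, only in a different order.
The loop join does work proportional to the product of the input sizes,
while the merge join pays for two sorts and then a single pass.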

A star schema is not a relational concept as such.  A star schema is a
projection of the multidimensional model onto databases like Oracle, DB2,
etc.  that I still refer to as "relational DBMSes",  except in this forum,
where I will be scolded by the keepers of the faith if I do.

In order to implement a successful star schema, you have to unlearn most of
what you learned in normalization catechism.
I would have said that would be fun for you,  except that you don't unlearn
1NF.

Is it bad design?  It depends.  For certain types of uses,  it is far more
useful than a fully normalized relational design.
Especially reporting, warehousing, and OLAP.   Like almost everything in
life,  sometimes it's a good idea, sometimes it's a bad idea.

But I wouldn't recommend that you run off and learn star schema immediately,
although it might be useful if you could incorporate that into some of the
SQL you teach in college.  What I would recommend, for what it's worth,  is
that you learn a little MDDB and OLAP, if you haven't already. Then,  I
think you would find it quite easy to back your way into star schema.

It's just  a blend of MDDB concepts with the relational and SQL concepts you
already know.
Dawn M. Wolthuis - 15 Jun 2004 21:09 GMT
> > Is the star join a relational concept?  I heard someone suggest that
> > fact-dimension tables with star schema is bad design, but I forget the
[quoted text clipped - 8 lines]
> different ones are better in different cases. A smart optimizer picks the
> best algorithm given the available information.

Yes, I do have plenty of star join experience (with SQL-DBMS's and with OLAP
"cubes" in various tools)

> A star schema is not a relational concept as such.  A star schema is a
> projection of the multidimensional model onto databases like Oracle, DB2,
> etc.  that I still refer to as "relational DBMSes",  except in this forum,
> where I will be scolded by the keepers of the faith if I do.

I've switched from RDBMS to SQL-DBMS for that reason. I think TRDBMS is the
same as RDBMS.

> In order to implement a successful star schema, you have to unlearn most of
> what you learned in normalization catechism.
> I would have said that would be fun for you,  except that you don't unlearn
> 1NF.

Yes, interesting, eh?  It makes you think that 1NF is decidedly a different
animal.  But since I've done stars(-ish) in Pick as well, I can say with
certainty that 1NF is not required.

> Is it bad design?  It depends.  For certain types of uses,  it is far more
> useful than a fully normalized relational design.
> Especially reporting, warehousing, and OLAP.   Like almost everything in
> life,  sometimes it's a good idea, sometimes it's a bad idea.

There are OLTP (online transaction processing) designs that can double
handily for OLAP (online analytical processing).  I'd tell you what they
are, but I'm trying to trim back my use of the P word.  There are good
reasons to rehost data into some other format, but if all of the data you
need are in a single OLTP system and you don't need a frozen point in time,
then it is such a shame that so many people feel a need to pull their data
out of their SQL-DBMS's and/or reshape it just so they can get information
back out (reporting), don't you think?

> But I wouldn't recommend that you run off and learn star schema immediately,
> although it might be useful if you could incorporate that into some of the
> SQL you teach in college.  What I would recommend, for what it's worth,  is
> that you learn a little MDDB and OLAP, if you haven't already. Then,  I
> think you would find it quite easy to back your way into star schema.

Sorry to mislead you, I'm well-versed in the ways of the stars -- more so
than I am with other relational joins in SQL-DBMS's.

> It's just  a blend of MDDB concepts with the relational and SQL concepts you
> already know.

Maybe the relational complaint is about implementing fact-dimension
strategies in MOLAP or other non-RDBMS products.  I thought I had heard
someone state that designing star schemas was both unnecessary and outside
of relational modeling.  I'll check Date's book later to see what he says.

cheers!  --dawn
Eric Kaun - 16 Jun 2004 20:16 GMT
> Is the star join a relational concept?  I heard someone suggest that
> fact-dimension tables with star schema is bad design, but I forget the
> rationale for that and they seem to be very effective.

Star schemas are created primarily for performance reasons, because SQL
DBMSs are so bad. They're typically denormalized extracts / transformations
of normalized schemas, and thus can be regarded as large views. I think
they're 2NF but not 3+NF. In any event, you wouldn't want to update one of
them, because not only are they denormalized enough that you'd need to
update N other rows, but expressing the constraints as triggers in a SQL
database, as a derivation of the real integrity rules in the source
normalized schema, would be ugly (to say the least). Thus the ETL
(extract-transform-load) as basically a big function over the original
database.
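
A rough sketch of the "big function over the original database" idea,
using Python's built-in sqlite3 module. All table and column names are
invented for the example, and the result is a single flattened reporting
table; a real star schema would normally keep region and date in separate
dimension tables keyed from the fact table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Normalized source tables (hypothetical).
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE invoice  (inv_no INTEGER PRIMARY KEY,
                           cust_id INTEGER REFERENCES customer, inv_date TEXT);
    CREATE TABLE line_item (inv_no INTEGER REFERENCES invoice, seq INTEGER,
                            amount REAL, PRIMARY KEY (inv_no, seq));

    INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Bloggs', 'South');
    INSERT INTO invoice VALUES (10, 1, '2004-06-01'), (11, 2, '2004-06-02');
    INSERT INTO line_item VALUES (10, 1, 5.0), (10, 2, 7.5), (11, 1, 20.0);

    -- The extract-transform-load step: one query over the source tables
    -- populates a denormalized table meant for reporting only.
    CREATE TABLE sales_fact AS
        SELECT c.region, i.inv_date, l.inv_no, l.seq, l.amount
        FROM line_item l
        JOIN invoice i  ON i.inv_no = l.inv_no
        JOIN customer c ON c.cust_id = i.cust_id;
""")

# Reports read the extract without repeating the joins.
totals = dict(con.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region"))
```

The region now repeats on every sales_fact row, which is why updates
belong in the source tables and the extract is simply rebuilt.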

> and that's fine behind closed doors.  I'm thankful that there is less of
> that in public on this list than there was when I started (I'm not sure now
> what to do with the balls I had to grow at that time, but pleased that I no
> longer need them to chat here  ;-)

I won't ask where they go when you're not using them, or whether you're
still able to regrow them at will like the gender-changing frogs whose DNA
provided the catalyst for the dino-crisis in Jurassic Park... oh wait, I
guess I just begged the question I was too demure to ask directly. :-\

I do wonder whatever happened to Bob... maybe he's reading, maybe not. I
can't say I miss the abuse, but do think he knew his stuff relationally.

- Eric
Dawn M. Wolthuis - 17 Jun 2004 00:43 GMT
> > Is the star join a relational concept?  I heard someone suggest that
> > fact-dimension tables with star schema is bad design, but I forget the
[quoted text clipped - 27 lines]
>
> - Eric

Yes, and I feel bad about him leaving.  I didn't let him bully me off the
list and I think that contributed to him either leaving or going into a
silent state.  I figured the list was big enough for both of us and I also
think he knew a ton about relational theory and would prefer to learn from
him than have him gone.  I don't miss being called names constantly, but I
hope he is doing well.  So, Bob B, we miss you (even if not your abuse).  If
you need me to leave before you return, let me know and I will bow out.

--dawn
Tony - 15 Jun 2004 21:48 GMT
> > When I built a "data mart" in Oracle as a star schema,  I included all the
> > primary and foreign key constraints, even though it slowed down loading.
[quoted text clipped - 4 lines]
>
> You were lucky.

Yes, and the harder he works on database design, the luckier he gets ;-)
Eric Kaun - 16 Jun 2004 19:49 GMT
> >OK, my mistake - I thought you meant the formulae themselves were axioms.
> >You're referring to underlying assumptions, which of course are.
>
> Some of the formulae may have to be axioms too. If you need to assume,
> then it's an axiom, if you can derive from your previous assumptions
> then it's a theorem.

Yes, agreed...

> >Right, but cascading delete is different than that. Constraints encode what
> >you just said ("A cannot exist without B"); cascades are useful shorthands
[quoted text clipped - 4 lines]
> should be *implicit* within the database. You're turning it into an
> *external* constraint - putting it where it does NOT belong!

I don't understand your distinction between "implicit" and "external" here.
External to what? In relational, both are part of the database, which
includes both relations (actually relvars, relation-typed variables which
are updated with new relation values) and constraints. In most businesses
there are rules which bind multiple relations; I'm sure you have something
similar in Pick, though it may be enforced by the application. Using
first-order logic over relvars, you can specify most of these (if not all).

Haven't you ever seen a Pick app where deleting from (or updating) a record
in FILE A requires a corresponding delete/update in FILE B, yet you don't
have the ability to encode that in the file or dictionary?

I think you're suggesting that the data structures themselves should encode
the constraints, which gets you into dangerous territory, leading to novel
data structures for each individual enterprise. That's fine as long as the
query and update operators stay consistent, but you'd quickly find
constraints undoing that, leaving you with custom persistence and no
standard at all. Remember that even foreign and primary key constraints are
just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a
constraint declaration.

> >I may have expressed myself badly. What relational theory says is that
> >statements about line items on an invoice state truths about values which
[quoted text clipped - 5 lines]
>
> I think we're having a bit of fun here :-)

Yes, apparently we both have a sick notion of fun. :-)

> You're saying you want to
> extract data from certain "columns" without caring what the primary key
> is. Fine - no problem there. Ignore the columns you're not interested
> in.

But then why include them at all? Certainly I can ignore attributes, for
example in updating attribute A I ignore attribute B. Consistently treating
a set of Pick attributes as a group (e.g. the line item attributes), while
they're part of INVOICE, seems logically wrong; those attributes are
different than, for example, the INVOICE_DATE.

> I'm saying that deleting the primary key should delete all related rows
> - even those in other tables! If your analyst forgot to specify a
> cascading delete (and you say that they're external to the theory,
> anyway), what you're saying is that the theory FAILS to enforce data
> integrity in that you're using something external to theory to keep the
> tables in sync.

Nothing external about it. In Pick the integrity of the files is enforced by
the application, yet you don't regard the app as external (at least I've
seen arguments to the contrary). Constraints are different from relations
because they make statements about those relations. Both are integral to
relational.

> Pick just stores it all together so that taking out the primary key
> takes out everything else.

A fine shorthand, and again the CASCADE DELETE (not always what you want, by
the way) is simple enough to do, and even to add in later (unlike in Pick,
where you have to make that decision up front).
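
As a minimal sketch of what such a declaration looks like, here it is in
SQLite driven from Python. SQLite is used only because it ships with
Python; the table and column names are invented, and other SQL products
differ in syntax and in how easily the constraint can be added later.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
con.executescript("""
    CREATE TABLE invoice (inv_no INTEGER PRIMARY KEY, inv_date TEXT);
    CREATE TABLE line_item (
        inv_no INTEGER NOT NULL REFERENCES invoice(inv_no) ON DELETE CASCADE,
        seq    INTEGER NOT NULL,
        descr  TEXT,
        PRIMARY KEY (inv_no, seq)
    );
    INSERT INTO invoice VALUES (10, '2004-06-01'), (11, '2004-06-02');
    INSERT INTO line_item VALUES (10, 1, 'widget'), (10, 2, 'gadget'),
                                 (11, 1, 'sprocket');
""")

# Deleting the parent row removes its line items as well.
con.execute("DELETE FROM invoice WHERE inv_no = 10")
remaining = con.execute("SELECT inv_no, seq FROM line_item").fetchall()

# The same foreign key also refuses a line item with no invoice behind it.
try:
    con.execute("INSERT INTO line_item VALUES (99, 1, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Without the ON DELETE CASCADE clause the foreign key would still be
enforced, but the DELETE would be refused while line items pointed at the
invoice. That is the "not always what you want" case.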

I object less to this than you'd expect; I can see some cases where this
buys you a short-term gain. I just see little long-term gain, and expect
long-term cost. I've been trying to think of past databases I've worked on,
and whether MVs would have bought me anything. Haven't found anything yet...
in the few cases where multi-descriptions or multi-coding would have helped,
I had cross-business unit and internationalization issues that would have
prevented leveraging them anyway. And I can remember a few cases where
properly treating a simple code as its own "noun", rather than an adjective,
saved me much work later.

Short version: I see adjectives "becoming" relations fairly frequently. I
see relations which remain "unused" as such infrequently. Of course, I'm
aware that our perceptions are less than objective, and that the lexicons in
our head guide our observations more than they should.

> I think I know what you mean though, when you say "gets a little
> murkier". Except, in practice, it doesn't. "customer" is a noun - it
> gets its own FILE. "invoice" likewise. "line item" - is it a noun or
> adjectival clause? Pick Business Analysis would unhesitatingly place it
> in the category of adjectival clause. But I know why you would want to
> treat it as a noun.

And I can see the desire to make it an adjective - believe me, I understand
the object-oriented view, the desire to treat the entire business notion as
a single object. But I've been bitten too much by doing so, and rarely by
"overnormalizing" - and I can usually see an impending need to "add more
intelligence" to that "attribute."

> Probably because it makes the General Ledger so much easier :-) you want
> to analyse by line-item, and not by invoice. Actually, that's not
[quoted text clipped - 8 lines]
> at all. Just a smidgeon more work for the database (but rather more
> mental contortion for the programmer).

And I certainly understand the development-time advantage in reports and GUI
screens that a list attribute gives you. I have no doubt that Pick leverages
the MV paradigm far, far more than either SQL or SQL libraries leverage
relational (or even SQL, for that matter). There are better environments and
libraries, but they're far from good.

If you read Michael Jackson (not the king of pop, not the beer expert, but
the English software engineer), he has an interesting approach to business
(domain) analysis - and it advocates predicates prior to (or instead of)
object analysis. It's interesting not just because it accords more with
relational (a side benefit), but because it treats "phenomena", modeled by
predicates and "owned" by different domains, as the basis for design.

> But I've been thinking about a few other things while this reply has
> been sitting half-composed on my computer ... Relational Theory is all
> about capturing *data*. BUT - a lot of information is *metadata* which
> an RDBMS is incapable of storing as such. We were discussing ordering -
> an RDBMS only captures this - as data - if the analyst thinks it
> important. A Pick database captures it as a matter of course.

True. An avenue for discussion might also be what other metadata is useful,
other than order. I think that more general question would cut more to the
heart of why different data models appeal in different ways.

> And constraints - I categorise them as "natural constraints" and
> "business constraints". You can't have an invoice line item without an
[quoted text clipped - 4 lines]
> "business constraint". You seem to think that should be captured as
> *data*. Pick captures it as *metadata*.

Given that relational advocates (at least in recent writings by Date) a
system catalog that is also composed of relations (and which effectively
represents a partial second-order relational algebra/calculus), I'd say that
relational definitely wants all data, even metadata, as relations and
constraints. I have no particular reason to think that that's not desirable,
but it also begs the question: what metadata is there, what's useful, and
how does the importance of a given "type" of metadata influence the utility
of a given data model?

There may be research on such... just haven't stumbled across it.

> Now compare the amount of *metadata* available to Pick and/or
> relational. It doesn't matter what your database is, the data in it is,
> as far as the dbms is concerned, a meaningless "blob". To optimise
> performance, storage, whatever, the only thing available of any use to
> the dbms is *metadata*. Which Pick has in abundance.

What, other than ordering?

> That's why I describe Pick as a superset of relational - it can convert
> metadata into data and present it to the app.

True enough about the ordering, but I'd argue that without any constraints
(ordering is implicit), Pick doesn't offer much else. I have to admit not
knowing enough about the dictionary, but that seems to be functional
transformation, not actual constraints on what's placed into a file... and
in particular no constraints that cross multiple files. I think relational
constraints can be much, much more descriptive (as well as being
proscriptive) - far better than SQL would let on.

> It can also USE the metadata to optimise itself.

How so? I thought the programmer had to make the optimization, by choosing
what data is retrieved at one time by virtue of being in the same file? I
may be missing something.

> Relational can only store this sort of
> information as *data*, and as such the information is not available to
> the dbms for its internal use.

Well, if the catalog is relational (as it should be), then I'd say this
isn't quite correct. One could even enforce database design / naming
standards using constraints over system catalog relations!
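
A small illustration of the catalog-as-data idea, again using SQLite from
Python with made-up table names and a made-up naming rule. SQLite cannot
declare a constraint over its own catalog, so the check below runs as an
ordinary query rather than as a true constraint; it only shows that the
metadata is reachable the same way as the data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE invoice (inv_no INTEGER PRIMARY KEY, inv_date TEXT);
    CREATE TABLE LineItem (inv_no INTEGER, seq INTEGER, descr TEXT);
""")

# The catalog is itself a table: one row per schema object.
tables = [name for (name,) in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# A hypothetical naming standard: table names must be lower case.
violations = [t for t in tables if t != t.lower()]

# Column metadata comes back as rows too: (cid, name, type, notnull, dflt, pk).
columns = {t: [row[1] for row in con.execute(f"PRAGMA table_info({t})")]
           for t in tables}
```

Here violations ends up holding the one table that breaks the rule, and
columns maps each table name to its list of attribute names.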

In any event, it would seem useful even in Pick if there were certain
"implicit" files that represented the files in the system - for example, a
file called FILE with one record per file, and perhaps an attribute called
ATTRIBUTES containing a list of attributes... anyway, you can probably see
the utility of that for app generation, enforcing standards, and even
implementing the Pick engine (and extensions/plugins). Date advocates that,
and I believe that Dataphor uses that heavily.

I do wish the Dataphor folks would chime in... it would be nice to hear
something from a real relational engine.

- erk
Dawn M. Wolthuis - 17 Jun 2004 00:34 GMT
> > >OK, my mistake - I thought you meant the formulae themselves were axioms.
> > >You're referring to underlying assumptions, which of course are.
[quoted text clipped - 36 lines]
> just that; SQL and even D give shortcuts, but it's just a 1-1 mapping to a
> constraint declaration.

I suspect that Wol was talking about one of the more common relationships
between relations -- that of parent and child.  A parent-child relationship
is designed and then specified and no additional constraints or logic of any
sort is required to ENSURE there are no children without a parent and if the
parent goes, the children are gone too. Of course this can be accomplished
handily in a SQL-based solution, but it isn't quite as intuitive.

> > >I may have expressed myself badly. What relational theory says is that
> > >statements about line items on an invoice state truths about values which
[quoted text clipped - 9 lines]
>
> Yes, apparently we both have a sick notion of fun. :-)

Count me in on the sick fun, but, never mind -- I deleted my first response,
which is just as well.

> > You're saying you want to
> > extract data from certain "columns" without caring what the primary key
[quoted text clipped - 6 lines]
> they're part of INVOICE, seems logically wrong; those attributes are
> different than, for example, the INVOICE_DATE.

I'm missing your point.  INVOICE_DATE is an attribute of an INVOICE and
INVOICE_LINE_ITEM is an attribute of an INVOICE, even if it has both
cardinality and degree greater than 1.

> > I'm saying that deleting the primary key should delete all related rows
> > - even those in other tables! If your analyst forgot to specify a
[quoted text clipped - 5 lines]
> Nothing external about it. In Pick the integrity of the files is enforced by
> the application,

In the case that Wol is talking about -- the parent-child relationship, it
is the database that enforces integrity, not the application.

> yet you don't regard the app as external (at least I've
> seen arguments to the contrary).

all a matter of definition

> Constraints are different from relations
> because they make statements about those relations. Both are integral to
> relational.

There are constraints that are part of the relation -- an attribute being
part of a relation is a constraint of sorts, for example.

> > Pick just stores it all together so that taking out the primary key
> > takes out everything else.
>
> A fine shorthand, and again the CASCADE DELETE (not always what you want, by
> the way) is simple enough to do, and even to add in later (unlike in Pick,
> where you have to make that decision up front).

Yes, but it does get missed often and application developers have to know
whether such logic is left to the app or is encoded in the database.

> I object less to this than you'd expect; I can see some cases where this
> buys you a short-term gain. I just see little long-term gain, and expect
> long-term cost.

It might not be this particular feature, but I suspect it is a part of what
makes for agile software development -- it is easy to make a mess of Pick
design over time, but there are an amazing number of twenty-year-old systems
out there (in need of database refactoring, no doubt).

> I've been trying to think of past databases I've worked on,
> and whether MVs would have bought me anything. Haven't found anything yet...
[quoted text clipped - 3 lines]
> properly treating a simple code as its own "noun", rather than an adjective,
> saved me much work later.

I think some code-offs in the future might be in order.

> Short version: I see adjectives "becoming" relations fairly frequently. I
> see relations which remain "unused" as such infrequently. Of course, I'm
> aware that our perceptions are less than objective, and that the lexicons in
> our head guide our observations more than they should.

Language does A LOT to guide our perceptions.  I was just in a meeting with
a project manager who is implementing a PICK application (although he
doesn't know that) when the last project he managed was SAP on Oracle.  He
said that comparatively this was a piece of cake except that it is so
different that he doesn't know if he is asking all of the right questions.
I wanted to tell him to ask the questions that he would have if he had never
been in an SAP or Oracle shop, but opted not to say that.  It seems to me
that relational thinking trains something out of us rather than training
something into us.  Just thinking out loud.

> > I think I know what you mean though, when you say "gets a little
> > murkier". Except, in practice, it doesn't. "customer" is a noun - it
[quoted text clipped - 8 lines]
> "overnormalizing" - and I can usually see an impending need to "add more
> intelligence" to that "attribute."

It is the combination of the initial structure plus the ability to make
changes over time that helps to handle these impending changes.  I agree
with your statement as it relates to 2nd & 3rd normal forms (functional
dependency issues).

> > Probably because it makes the General Ledger so much easier :-) you want
> > to analyse by line-item, and not by invoice. Actually, that's not
[quoted text clipped - 13 lines]
> the MV paradigm far, far more than either SQL or SQL libraries leverage
> relational (or even SQL, for that matter).

One person mentioned that Pick is archaic, old-fashioned, or whatever.  That
is true and you should not give it too much credit, especially on the GUI
side (given that didn't exist in the 70's and there has been little
enhancement to Pick in the past few decades -- some will disagree with me).
My interest in it for the future is as a better starting point for the
industry than the SQL-DBMS's are.  I can see that it has provided its users
with better agility than the SQL-DBMS and that it "thinks like people think"
about data (way too vague, I realize).

> There are better environments and
> libraries, but they're far from good.
[quoted text clipped - 31 lines]
> relational definitely wants all data, even metadata, as relations and
> constraints.

Yes, that is my understanding of the theory (not the practice)

> I have no particular reason to think that that's not desirable,
> but it also begs the question: what metadata is there, what's useful, and
[quoted text clipped - 10 lines]
>
> What, other than ordering?

The logical structures are built on lots of derived data (sort-of analogous
to stored procedures).

<snip> I'll stop there - I can't get through the entire thing -- you guys
can last a long time!
--dawn
Anthony W. Youngman - 19 Jun 2004 01:31 GMT
>> But the need for a cascading delete is metadata - information that
>> should be *implicit* within the database. You're turning it into an
[quoted text clipped - 7 lines]
>similar in Pick, though it may be enforced by the application. Using
>first-order logic over relvars, you can specify most of these (if not all).

See lower in my original post - an invoice line can't exist without an
invoice, whereas a car can exist without an owner ...

>Haven't you ever seen a Pick app where deleting from (or updating) a record
>in FILE A requires a corresponding delete/update in FILE B, yet you don't
>have the ability to encode that in the file or dictionary?

It probably happens, but MUCH less than relational. So much so, that the
need has never been worth doing anything about :-)

>I think you're suggesting that the data structures themselves should encode
>the constraints, which gets you into dangerous territory, leading to novel
[quoted text clipped - 15 lines]
>they're part of INVOICE, seems logically wrong; those attributes are
>different than, for example, the INVOICE_DATE.

Because statistics tell me that if I access a line item, then I am
highly likely to want to access the invoice date at the same time. The
cost of retrieving it unnecessarily turns out to be worth it in making
it available in case I want it.

But that's a physical thing that relational theory refuses to address...

>> I'm saying that deleting the primary key should delete all related rows
>> - even those in other tables! If your analyst forgot to specify a
[quoted text clipped - 8 lines]
>because they make statements about those relations. Both are integral to
>relational.

No they are NOT enforced by the application. They are enforced by the
DESIGN. An invoice line has as its primary key the invoice number (plus
an *implicit* sequence number). Delete that primary key and all
associated data disappears including all the invoice lines.
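
A minimal sketch of the two approaches being contrasted here, in Python. The nested dictionary stands in for a Pick-style record in which the invoice lines are stored inside the invoice, and the SQLite tables stand in for a relational design with a declared cascade. Every table, column and variable name is made up for illustration, and neither half claims to show how any real Pick or SQL product works internally.

```python
import sqlite3

# Pick-style sketch (hypothetical layout): the lines live inside the invoice
# record, so removing the invoice key removes the lines with it.
invoices = {
    "INV1": {"date": "2004-06-19", "lines": [("widget", 2), ("gadget", 1)]},
    "INV2": {"date": "2004-06-20", "lines": [("sprocket", 5)]},
}
del invoices["INV1"]
nested_lines_left = sum(len(rec["lines"]) for rec in invoices.values())

# Relational sketch: lines are a separate table, and the dependency is
# declared as a foreign key with ON DELETE CASCADE.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute(
    "CREATE TABLE invoice (invoice_no TEXT PRIMARY KEY, invoice_date TEXT)"
)
conn.execute(
    "CREATE TABLE invoice_line ("
    " invoice_no TEXT NOT NULL REFERENCES invoice(invoice_no) ON DELETE CASCADE,"
    " seq INTEGER NOT NULL,"
    " item TEXT, qty INTEGER,"
    " PRIMARY KEY (invoice_no, seq))"
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [("INV1", "2004-06-19"), ("INV2", "2004-06-20")],
)
conn.executemany(
    "INSERT INTO invoice_line VALUES (?, ?, ?, ?)",
    [("INV1", 1, "widget", 2), ("INV1", 2, "gadget", 1), ("INV2", 1, "sprocket", 5)],
)
conn.execute("DELETE FROM invoice WHERE invoice_no = 'INV1'")
relational_lines_left = conn.execute(
    "SELECT COUNT(*) FROM invoice_line"
).fetchone()[0]

# Both designs leave only the one line that belongs to INV2.
print(nested_lines_left, relational_lines_left)
```

In the nested version the dependency is a consequence of where the lines are stored; in the relational version it is a statement that has to be declared, and can be added or dropped later.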

>> Pick just stores it all together so that taking out the primary key
>> takes out everything else.
>
>A fine shorthand, and again the CASCADE DELETE (not always what you want, by
>the way) is simple enough to do, and even to add in later (unlike in Pick,
>where you have to make that decision up front).

Yup, you do "make that decision up front", but it's an obvious decision.
I know you can't design something to be fool-proof, but you really do
have to be an idiot to make a design blunder of this magnitude...

>I object less to this than you'd expect; I can see some cases where this
>buys you a short-term gain. I just see little long-term gain, and expect
[quoted text clipped - 10 lines]
>aware that our perceptions are less than objective, and that the lexicons in
>our head guide our observations more than they should.

And Pick treats foreign keys as "just another attribute", so I'm sorry
but I'd just dismiss your "adjectives become relations" with "so what!".
We don't see it as a problem.

>> But I've been thinking about a few other things while this reply has
>> been sitting half-composed on my computer ... Relational Theory is all
[quoted text clipped - 6 lines]
>other than order. I think that more general question would cut more to the
>heart of why different data models appeal in different ways.

Part of the problem is that relational theory explicitly ignores
implementation. The main reason for keeping metadata as metadata not
data, is that it assists greatly in optimisation - an implementation
issue.

Keeping metadata is worthless in the relational paradigm, which I
suspect is why so many people have trouble with me repeatedly talking
about statistics :-)

>> And constraints - I categorise them as "natural constraints" and
>> "business constraints". You can't have an invoice line item without an
[quoted text clipped - 23 lines]
>
>What, other than ordering?

Which data is *tightly* linked to other data, and which data is
*loosely* linked. Information which helps it guarantee that after one
disk access, the next few requests can be met from cache not disk ...

>> That's why I describe Pick as a superset of relational - it can convert
>> metadata into data and present it to the app.
[quoted text clipped - 6 lines]
>constraints can be much, much more descriptive (as well as being
>proscriptive) - far better than SQL would let on.

The dictionary doesn't declare constraints between FILEs, but seeing as
a FILE usually contains contents equivalent to several relational
tables, it has no need to ... that's why Pick doesn't have constraints
like relational does - even those of us who understand relational
constraints just can't see the point of implementing them in Pick :-)

>> It can also USE the metadata to optimise itself.
>
[quoted text clipped - 17 lines]
>implementing the Pick engine (and extensions/plugins). Date advocates that,
>and I believe that Dataphor uses that heavily.

Sounds interesting ... and from another post of mine, you'll see that if
I've understood you correctly, Pick does indeed have something like that
... or if it doesn't then it would be easily implemented if it made any
sense within the model.

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Alfredo Novoa - 10 Jun 2004 14:43 GMT
>> Copernicus : orbit == circle
>> Kepler : orbit == ellipse
[quoted text clipped - 6 lines]
>I don't think the above are axioms in the mathematical sense, though I could
>be wrong.

Of course they are not!

An axiom is a proposition regarded as self-evidently true without
proof.

http://mathworld.wolfram.com/Axiom.html

Regards
 Alfredo
Bill H - 10 Jun 2004 23:32 GMT
Alfredo:

[snipped]
>> Let's take the evolution of that theory I keep on throwing out as an
>> example.
[quoted text clipped - 9 lines]
>I don't think the above are axioms in the mathematical sense, though I could
>be wrong.

> An axiom is a proposition regarded as self-evidently true without
> proof.
>
> http://mathworld.wolfram.com/Axiom.html

I think this definition is too rigid.  Thinking of an axiom this rigidly
often produces a rigid, narrow analysis.  :-)

An axiom can easily be thought of as both a self-evident truth (so what's
self-evident?) or an assumption to use to base a further analysis.  Newton's
3 laws of motion are generally referred to as axioms that are used as
assumptions (or postulates) for further theoretical analysis.

Since databases are natural companions to multiple environments (business,
gov't, etc) we shouldn't be limiting our inquiry with such rigid definitions
of useful words.

Bill
Laconic2 - 11 Jun 2004 12:20 GMT
> An axiom can easily be thought of as both a self-evident truth (so what's
> self-evident?) or an assumption to use to base a further analysis.  Newton's
> 3 laws of motion are generally referred to as axioms that are used as
> assumptions (or postulates) for further theoretical analysis.

People in this forum tend to confuse "axiom" and "hypothesis".

But then, they also tend to confuse "math" and "science".
Alfredo Novoa - 14 Jun 2004 15:46 GMT
>> An axiom is a proposition regarded as self-evidently true without
>> proof.
>>
>> http://mathworld.wolfram.com/Axiom.html
>
>I think this definition is too rigid.

No, it is correct.

>An axiom can easily be thought of as both a self-evident truth (so what's
>self-evident?)

Absolutely trivial and self contained. You don't need to operate with
the statement to see that it is true.

For instance, here is the first of Euclid's postulates:

"A straight line segment can be drawn joining any two points."

This is contained in the line definition. Nothing new.

>or an assumption to use to base a further analysis.  Newton's
>3 laws of motion are generally referred to as axioms that are used as
>assumptions (or postulates) for further theoretical analysis.

It is a very bad use of the terms. Postulates are not assumptions,
postulates are axioms: truths.

Newton's 3 laws of motion are not evident, self-consistent nor true.

>Since databases are natural companions to multiple environments (business,
>gov't, etc) we shouldn't be limiting our inquiry with such rigid definitions
>of useful words.

Rigid and correct are different things.

Regards
 Alfredo
Todd B - 14 Jun 2004 22:05 GMT
> >> An axiom is a proposition regarded as self-evidently true without
> >> proof.
[quoted text clipped - 4 lines]
>
> No, it is correct.

Yep, it sure is a good statement of the concept.

> >An axiom can easily be thought of as both a self-evident truth (so what's
> >self-evident?)
[quoted text clipped - 7 lines]
>
> This is contained in the line definition. Nothing new.

Axioms are based within a system of thought.  For example, Euclid was
thinking about planar geometry.  Is it possible that if your straight
line bent by space could not connect two points in that space?  Ah,
then you might think, "Well then, it's not a straight line anymore."
But from whose perspective?  I'm thinking about Einsteinian physics,
or even touching on n-dimensional concepts.  Axioms just set down
rules (and the rules don't have to make 'sense' in the real world) for
a logical system.  They are not 'true' inherently to the real world.
They simply are a base for logical deduction.  Although looking back
at your post, we may be thinking the same thing.

> >or an assumption to use to base a further analysis.  Newton's
> >3 laws of motion are generally referred to as axioms that are used as
> >assumptions (or postulates) for further theoretical analysis.

Referring to this earlier post, I'd say: Newton's laws are not
postulates (axioms).  They are theorems in physics based upon his
original hypotheses.  These physical theorems, as far as I know, are
different than mathematical theorems, where the former are
elucidations about the physical world we perceive, the latter are
conclusions derived from the original axioms with certain rules
applied to those axioms.  Newton's laws, in other words, make bad
examples in this discussion about axioms.

> It is a very bad use of the terms. Postulates are not assumptions,
> postulates are axioms: truths.

Well said, but, truths in the real world, or within the system?
Because I can build any logical system with a set of axioms.  They
will always be true (if they don't contradict each other) because
that's where I started.  I made them true, like an act of God.  I
said, "This is how it is; where do we go from here."  IMO, I think
that is what the Wolfram definition is stating rather clearly.

> Newton's 3 laws of motion are not evident, self-consistent nor true.

Correct.

Todd
"Nothing is True"  -- not a Zen koan, but very paradoxically
self-referential
Alfredo Novoa - 15 Jun 2004 17:27 GMT
>Axioms are based within a system of thought.  For example, Euclid was
>thinking about planar geometry.  Is it possible that if your straight
>line bent by space could not connect two points in that space?  Ah,
>then you might think, "Well then, it's not a straight line anymore."
>But from whose perspective?

From the planar geometry perspective.

>  I'm thinking about Einsteinian physics,
>or even touching on n-dimensional concepts.

But this is not the case. Euclid's postulates are about planar geometry
and only about that.

>  Axioms just set down
>rules (and the rules don't have to make 'sense' in the real world) for
>a logical system.  They are not 'true' inherently to the real world.
>They simply are a base for logical deduction.  Although looking back
>at your post, we may be thinking the same thing.

Yes, I completely agree with you. Axioms are independent of the
physical world.

>Referring to this earlier post, I'd say: Newton's laws are not
>postulates (axioms).  They are theorems in physics based upon his
>original hypotheses.

And in observations of the physical world.

>> It is a very bad use of the terms. Postulates are not assumptions,
>> postulates are axioms: truths.
>
>Well said, but, truths in the real world, or within the system?

Within the system. We can not know if something is true in the
physical world.

>Because I can build any logical system with a set of axioms.  They
>will always be true (if they don't contradict each other) because
>that's where I started.

If they contradict then they are not axioms.

>  I made them true, like an act of God.

They were always true because you are saying the same in two ways.

When you say line you are saying the shortest join of two points.
Axioms are redundant.

>Todd
>"Nothing is True"  -- not a Zen koan, but very paradoxically
>self-referential

Like: all generalizations are bad :-)

or

There are two groups of people in the world; those who believe that
the world can be divided into two groups of people, and those who
don't. :-)

Regards
 Alfredo
Paul - 18 Jun 2004 22:42 GMT
>>>or an assumption to use to base a further analysis.  Newton's
>>>3 laws of motion are generally referred to as axioms that are used as
[quoted text clipped - 8 lines]
> applied to those axioms.  Newton's laws, in other words, make bad
> examples in this discussion about axioms.

I think we have to distinguish between Newton's laws as a practical way
to discuss reality, and Newton's theory as a mathematical model.

The mathematical model may be inspired by reality but it exists on its
own as well. In this sense the postulates are axioms.

Reality is the semantics, mathematical models are the syntax.
Mathematical models always need a human to map them to reality.

Also I think there is a difference between theorems and theories:
Theorems are purely mathematical; they can be proved. If they haven't
yet been proved they are just conjectures.
Theories are the maps between models and reality, they can only be
disproved (not proved).

Paul.
Alfredo Novoa - 20 Jun 2004 02:37 GMT
>I think we have to distinguish between Newton's laws as a practical way
>to discuss reality, and Newton's theory as a mathematical model.
>
>The mathematical model may be inspired by reality but it exists on its
>own as well. In this sense the postulates are axioms.

They are neither postulates nor axioms; they are assumptions.

Regards
 Alfredo
Anthony W. Youngman - 07 Jun 2004 23:29 GMT
>> >> This theory will then be the equivalent of Kepler and Newton discovering
[quoted text clipped - 25 lines]
>there is a lot of "tossing stuff in and out" going on because there is not
>that match with reality at each point.

Fine. This seems as good a place as any to say what I thought of after
that previous post.

This is for all those people who think "if I don't understand it, then
it must be wrong" (is Tony listening :-)

Now. It's not words of one syllable, I'm afraid, but I'm trying to
explain something very heavy as simply as I can.

Let's start by defining what the words mean.

A "theory", a "model" and an "axiom" are ALL things that have not been
proven correct. BUT - and here we hit our first point of confusion -
with the exception of a "mathematical theory", they are all things that
CANNOT be proven correct. Once proven, a mathematical theory becomes a
"theorem", but a mathematical axiom by definition cannot be proven true,
scientific theories and models can only be shown to be false, and a
mathematical model cannot be proven to be true because it relies on
axioms which cannot be proven true.

Okay. Now ALL models (scientific or mathematical) belong to the set "IF
{axioms} THEN {theorems}". Read C&D's twelve rules. Ask yourself which
rules are axioms, which rules are convenient constraints, and what else?
Basically, what fundamental mathematical category does each rule fall
into?

I think Codd (maybe Date) is even on record as saying that various rules
were "convenient constraints". In other words, they are axioms with as
much validity as Euclid's "parallel lines never meet" - they make the
maths easy with no real grounding in reality.

Once you've identified those axioms, ask yourself "what proof do we have
in favour of them?" and DON'T FORGET that you CANNOT use logic!
"if/then" is NOT TRANSITIVE! Just because the theorems are true doesn't
mean you can conclude the axioms are true - indeed - it's the exact
opposite - you can only prove the theorems are true BECAUSE you have
ASSUMED the axioms are true.

LOOK at the subject of this thread again. It is an AXIOM of relational
theory that data comes in tuples. Show me that that's true! And because
it's an axiom, mathematics itself tells you that logic CAN not give you
an answer!

Cheers,
Wol

Tony - 08 Jun 2004 10:51 GMT
> This is for all those people who think "if I don't understand it, then
> it must be wrong" (is Tony listening :-)

Do you mean me?  If so, I'm not sure what you are referring to.  Do
you mean that I think "If I (Tony) don't understand it, then it must
be wrong"?  If so, where did you get that idea?

> Now. It's not words of one syllable, I'm afraid, but I'm trying to
> explain something very heavy as simply as I can.

Thanks, Professor ;-)

> LOOK at the subject of this thread again. It is an AXIOM of relational
> theory that data comes in tuples. Show me that that's true! And because
> it's an axiom, mathematics itself tells you that logic CAN not give you
> an answer!

Where did you get that axiom from that "data comes in tuples"?  Codd's
rule #1 says that all data in the database is to be REPRESENTED in
only one way: as values in attributes of tuples.  It is a prescribed
RULE for building relational databases, it is not a claim that
anything in the real world "comes in tuples".  We have a similar rule
in English that all objects are represented by words made up from the
26 letters of the alphabet; it is not an "axiom" that says that
objects "come in" combinations of the letters A-Z.

Your problem is that you consistently confuse data and reality.

Of course, this all doesn't mean that tuples are the BEST way to
represent data, or even that ALL data can be represented by tuples.
But you could easily disprove a theorem that said that "all data can
be represented by tuples" by finding a counter-example.  Bet you can't
though!
Paul - 08 Jun 2004 12:44 GMT
> Where did you get that axiom from that "data comes in tuples"?  Codd's
> rule #1 says that all data in the database is to be REPRESENTED in
> only one way: as values in attributes of tuples.

Here's a thought: where do constraints fit in? They are kind of like
data, since they give you some information about the real-world system
you are modelling.

For a fixed snapshot of a database I guess they don't add anything
extra, since the tuples satisfy the constraints already. But if you
think about a database evolving over time, they do add information. For
example suppose you had a constraint "Age < 60" on some relation/column.
Then you could ask the question: "Can I add a person aged 65 to my
database?" Now in current DBMSs I think you'd do that by trying it and
seeing if you get an error. (or maybe by querying the system tables).

In databases we assume anything that isn't recorded as true is false (closed world
assumption). So maybe constraints give a stronger form of truth than
tuples in this sense: If I have no-one aged 65 in my tuples I could say:
"the real-world system I'm (partially) modelling may have people aged
65, but my database doesn't". But if I have a constraint "age < 60" it's
like I'm making a stronger claim: that not only does my database have
no-one over 60, but also the real-world situation I'm modelling has
no-one over 60.

Another question: do current systems use the constraints when optimising
queries? Would it be feasible for them to do so? For example suppose I
have a billion people in my table, with the constraint "Age < 60". If I
do "SELECT * FROM people WHERE age = 65" the optimizer could in theory
use the constraint to quickly return an answer.

You could also think of examples where an index wouldn't be feasible:
constraint: "name NOT LIKE '%x%'"
query: "SELECT * FROM people WHERE name LIKE '%axw%'"
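
As a rough illustration of the idea, here is a Python sketch in which a declared upper bound on a column lets a lookup return an empty result without touching the rows. The `UpperBound` class and the `select_age_equals` function are hypothetical names invented for this example; it says nothing about what any actual optimizer does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperBound:
    """Hypothetical stand-in for a declared constraint 'column < limit'."""
    limit: int

    def allows(self, value: int) -> bool:
        return value < self.limit


def select_age_equals(rows, constraint, wanted):
    """Return (matching rows, number of rows scanned)."""
    if not constraint.allows(wanted):
        # The constraint guarantees no stored row can match: skip the scan.
        return [], 0
    scanned = 0
    matches = []
    for row in rows:
        scanned += 1
        if row["age"] == wanted:
            matches.append(row)
    return matches, scanned


people = [{"name": "Ann", "age": 34}, {"name": "Bob", "age": 59}]
age_constraint = UpperBound(60)

print(select_age_equals(people, age_constraint, 65))  # answered from the constraint
print(select_age_equals(people, age_constraint, 59))  # needs a scan
```

The shortcut is only sound if the constraint is actually enforced on every insert and update; a constraint that is merely documented gives the optimizer nothing to rely on.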

Paul.
Laconic2 - 08 Jun 2004 13:08 GMT
> For a fixed snapshot of a database I guess they don't add anything
> extra, since the tuples satisfy the constraints already. But if you
[quoted text clipped - 3 lines]
> database?" Now in current DBMSs I think you'd do that by trying it and
> seeing if you get an error. (or maybe by querying the system tables).

Yes.

In particular, the optimizer can use information made available by the
constraints in order to generate
additional correct strategies, ones that could not be guaranteed to be
correct in the absence of such information.

In particular,  entity integrity and referential integrity constraints can
be used to "prove" that, in certain cases,
"SELECT ALL" and "SELECT DISTINCT" will yield identical results.  This can
result in generating a faster strategy.
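
A small sketch of the SELECT ALL / SELECT DISTINCT point, using Python's built-in `sqlite3` module. The `people` table and its columns are assumptions made for the example. It only demonstrates that the two results coincide when a key is part of the select list; whether a given DBMS exploits that fact is a separate question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Ann")]
)

# person_id is the key, so no two result rows can be equal.
with_key_all = conn.execute(
    "SELECT ALL person_id, name FROM people ORDER BY person_id"
).fetchall()
with_key_distinct = conn.execute(
    "SELECT DISTINCT person_id, name FROM people ORDER BY person_id"
).fetchall()

# name alone is not a key, so duplicates can appear and DISTINCT matters.
names_all = conn.execute("SELECT ALL name FROM people ORDER BY name").fetchall()
names_distinct = conn.execute(
    "SELECT DISTINCT name FROM people ORDER BY name"
).fetchall()

print(with_key_all == with_key_distinct)
print(names_all == names_distinct)
```

When the key is projected, the duplicate-elimination step does no work, so a planner that knows about the key could skip it.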

The information that a given snapshot happens to conform to a constraint
could be made available by examining the snapshot, rather than examining the
constraint,  but the cost of obtaining that knowledge would be prohibitive.

So a constraint that is known to be valid can be used to advantage, even in
the context of a snapshot.
Paul - 08 Jun 2004 13:38 GMT
> In particular,  entity integrity and referential integrity constraints can
> be used to "prove" that, in certain cases,
> "SELECT ALL" and "SELECT DISTINCT" will yield identical results.  This can
> result in generating a faster strategy.

OK, but uniqueness constraints and referential integrity constraints are
a very small subset of all possible constraints. They're quite simple
for a DBMS to understand and use. What about ones that are even a little
bit more complicated? I guess the constraints mentioned above don't
require knowledge of particular types or operators (other than
equality), but ones like "Age < 60" do.

In general a constraint could be any expression in first order logic. And
then to complicate matters further you've got non-relational operators
(like "<") added in.

> The information that a given snapshot happens to conform to a constraint
> could be made available by examining the snapshot, rather than examining the
> constraint,  but the cost of obtaining that knowledge would be prohibitive.

Here's a thought: consider a database with the constraints "Age < 65"
and "Age < 60". Should there be something to say this isn't normalised
in some sense? I know that normalization and eliminating redundancy are
different things but maybe there should be some kind of "constraint
normalization"?

Paul.
Laconic2 - 08 Jun 2004 16:08 GMT
> OK, but uniqueness constraints and referential integrity constraints are
> a very small subset of all possible constraints. They're quite simple
> for a DBMS to understand and use. What about ones that are even a little
> bit more complicated? I guess the constraints mentioned above don't
> require knowledge of particular types or operators (other than
> equality), but ones like "Age < 60" do.

I didn't mean to imply that all constraints were useful in the way I set
forth.  Just that some were.

The DBMS can make use of value limiting constraints to compress data better.
For instance,  if there is a column called

, ZIP_CODE  CHAR(10)

(the tenth character is for the hyphen),  and a value is to be stored that
is CHAR(15), but the last five characters are blanks,
a suitable DBMS could go ahead and store the value anyway,  knowing that it
can reconstruct the CHAR(15) value later, if necessary.

The same comment goes for a "field" defined as CHAR(10)  by the way.
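
A minimal sketch of that trailing-blank case in Python, assuming the only thing the engine needs to remember is the width the caller originally supplied. `COLUMN_WIDTH`, `store` and `reconstruct` are hypothetical names for illustration, not any product's API.

```python
COLUMN_WIDTH = 10  # declared width of the column, as in ZIP_CODE CHAR(10)


def store(value: str) -> str:
    """Keep only the declared width if the overflow is nothing but blanks."""
    overflow = value[COLUMN_WIDTH:]
    if overflow.strip(" "):
        raise ValueError("value does not fit the declared column width")
    return value[:COLUMN_WIDTH]


def reconstruct(stored: str, original_width: int) -> str:
    """Pad back out to the width the caller originally supplied."""
    return stored.ljust(original_width)


original = "12345-6789" + " " * 5  # a CHAR(15) value with five trailing blanks
stored = store(original)
print(len(stored), reconstruct(stored, 15) == original)
```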

> In general a constraint could be any expression in first order logic. And
> then to complicate matters further you've got non-relational operators
> (like "<") added in.

I don't understand.  What makes "<" a non  relational operator?  I had been
taught that
"x < y"  is a relation on x and y.  This was in math, not comp. sci.
Eric Kaun - 09 Jun 2004 16:43 GMT
> Here's a thought: consider a database with the constraints "Age < 65"
> and "Age < 60". Should there be something to say this isn't normalised
> in some sense? I know that normalization and eliminating redundancy are
> different things but maybe there should be some kind of "constraint
> normalization"?

Implication is one part of it; since [Age<60] implies [Age<65], the latter
is unnecessary. The relational model, by relying on such, makes more
optimizations possible than some of the ad hocisms of SQL and the like - and
they certainly get much more complex than this example...
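
For the simple case under discussion, the implication check can be sketched in a few lines of Python. This assumes every constraint has the form "Age < limit"; the function names are made up for the example, and general first-order constraints would need far more machinery than this.

```python
def implies(stronger_limit, weaker_limit):
    """True when 'x < stronger_limit' implies 'x < weaker_limit' for every x."""
    return stronger_limit <= weaker_limit


def drop_implied_upper_bounds(limits):
    """Given constraints of the form 'Age < limit', keep only the tightest.

    'Age < 60' implies 'Age < 65', so any limit above the minimum is redundant.
    """
    if not limits:
        return []
    return [min(limits)]


print(implies(60, 65), implies(65, 60))
print(drop_implied_upper_bounds([65, 60]))
```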

An interesting related point is the overlap between types and constraints.
In the above, isn't it really an example of an Age type, with values [0, 1,
..., 59]? (assuming integers here)

- erk
Eric Kaun - 09 Jun 2004 16:39 GMT
> Where did you get that axiom from that "data comes in tuples"?  Codd's
> rule #1 says that all data in the database is to be REPRESENTED in
[quoted text clipped - 4 lines]
> 26 letters of the alphabet; it is not an "axiom" that says that
> objects "come in" combinations of the letters A-Z.

Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across
in a new way... thanks.

Another, often-cited, is the difference between "flat" tables and relations,
and the way people assume relations are 2-dimensional. Consider the
following:
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

Wow! A flat cube! Nifty! I was just too lazy to type out a tesseract
(hypercube)...
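
The same listing can be generated rather than typed, which also takes care of the hypercube. A short Python sketch, assuming nothing beyond the standard library; the variable names are chosen for the example:

```python
from itertools import product

# Each tuple is one vertex of the unit cube. The relation has three attributes,
# so it describes three dimensions even though it prints as a flat list of rows.
cube = set(product((0, 1), repeat=3))

# One more attribute gives the vertices of the tesseract.
tesseract = set(product((0, 1), repeat=4))

for vertex in sorted(cube):
    print(*vertex)

print(len(cube), len(tesseract))
```

The number of attributes sets the dimensionality; the two-dimensional look of the printed table is only an artifact of how rows are written down.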

> Your problem is that you consistently confuse data and reality.
>
> Of course, this all doesn't mean that tuples are the BEST way to
> represent data, or even that ALL data can be represented by tuples.

Keep in mind that even if you say this, the orthogonal dimension is Type
(Domain), which introduces wrinkles of its own.

> But you could easily disprove a theorem that said that "all data can
> be represented by tuples" by finding a counter-example.  Bet you can't
> though!

Yes, good point - find us that black swan.

- erk
Anthony W. Youngman - 10 Jun 2004 01:44 GMT
>> Where did you get that axiom from that "data comes in tuples"?  Codd's
>> rule #1 says that all data in the database is to be REPRESENTED in
[quoted text clipped - 7 lines]
>Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across
>in a new way... thanks.

But in the reality we live in, all objects DO come in combinations of
A-Z. So it has to be a theorem or an axiom. And if it's a theorem, from
what axioms is it derived?

>> Your problem is that you consistently confuse data and reality.

I may well be confused. But that's because I'm actually trying to
understand the link between the two. After all, isn't that the subject
of this thread? And if there IS no link, what the hell's the point of
studying data, since it is no use to us here in reality, anyway :-)

>> Of course, this all doesn't mean that tuples are the BEST way to
>> represent data, or even that ALL data can be represented by tuples.
[quoted text clipped - 7 lines]
>
>Yes, good point - find us that black swan.

I'd suggest going to visit the Serpentine in Hyde Park :-) You'll find
plenty there.

Cheers,
Wol

Tony - 10 Jun 2004 10:25 GMT
> >> Where did you get that axiom from that "data comes in tuples"?  Codd's
> >> rule #1 says that all data in the database is to be REPRESENTED in
[quoted text clipped - 7 lines]
> >Ah, an excellent analogy. I'm sure it's flawed, but it gets the point across
> >in a new way... thanks.

(Yes, I fear it is flawed too!)

> But in the reality we live in, all objects DO come in combinations of
> A-Z. So it has to be a theorem or an axiom. And if it's a theorem, from
> what axioms is it derived?

IT ISN'T AN AXIOM OR A THEOREM!!!!!!!!!!!  That's my point!  I could
today invent a new way of representing objects, using only the 12
letters A to L, or using shapes and colours, whatever.  It would
surely work, but it would be neither an axiom ("it is self-evidently
true that all real-world objects come in combinations of the letters A
to L") nor a theorem.  It would just be a method for representing
real-world objects.
Dawn M. Wolthuis - 10 Jun 2004 01:17 GMT
> >> >> This theory will then be the equivalent of Kepler and Newton discovering
[quoted text clipped - 63 lines]
> opposite - you can only prove the theorems are true BECAUSE you have
> ASSUMED the axioms are true.

Minor point, but another way to say it is that theorems are true with respect
to the axioms.

> LOOK at the subject of this thread again. It is an AXIOM of relational
> theory that data comes in tuples. Show me that that's true! And because
> it's an axiom, mathematics itself tells you that logic CAN not give you
> an answer!

Excellent, excellent, point.  I would love to hear if there is any
disagreement on this point.  If not, then perhaps we can work this into the
glossary somehow related to "relational theory" or "axioms".
Cheers!  --dawn

> Cheers,
> Wol
mAsterdam - 10 Jun 2004 01:43 GMT
>>LOOK at the subject of this thread again. It is an AXIOM of relational
>>theory that data comes in tuples. Show me that that's true! And because
[quoted text clipped - 4 lines]
> disagreement on this point.  If not, then perhaps we can work this into the
> glossary somehow related to "relational theory" or "axioms".

The "information principle" would qualify as an axiom, I suspect - but
I am not well-versed in this math/logic area (I did read some -
just never discussed it) - so somebody else will have to make a clean,
copy & pastable piece of prose for inclusion.

It surely fits the 'lengthy misunderstandings' criterion :-)
Eric Kaun - 04 Jun 2004 15:46 GMT
> >Which axioms don't match? I wasn't really aware there were axioms per se.
>
[quoted text clipped - 13 lines]
> true". In relational theory, I would guess that at least one axiom could
> be phrased as "data comes in tuples".

That's hardly an axiom that I would recognize, since while "tuple" is
defined in terms of other more basic terms (axioms?), "data" is hardly
well-defined. And what does "comes in" mean?

I believe the axioms of set theory and predicate calculus apply (those in
set theory limited somewhat, to sets of tuples perhaps), but don't claim to
know formally what those are.

> So, if you don't have experiments to show that real-world data ALSO
> comes in tuples (or a close approximation thereof), then you can't
> conclude that a relational database is a good place to store real-world
> data.

Sure you can; evidence <> proof. The nice work logicians and mathematicians
have done with predicate calculus over the years, while perhaps not
corresponding to "the real world" (tm, MTV Networks), gives us nice
machinery with which to manipulate... well, data. What, precisely, would
allow you to conclude that a <datamodel> database is a "good place" to store
real-world data?

> Sorry for ignoring the rest of your post, but this is ABSOLUTELY
> FUNDAMENTAL!!!

Perhaps, but I still don't think "data comes in tuples" is anything like an
axiom. I could certainly be wrong.

- erk
Anthony W. Youngman - 07 Jun 2004 23:47 GMT
>> So, if you don't have experiments to show that real-world data ALSO
>> comes in tuples (or a close approximation thereof), then you can't
[quoted text clipped - 7 lines]
>allow you to conclude that a <datamodel> database is a "good place" to store
>real-world data?

Yup. Evidence does not equal proof. But that was not what I was getting
at. Note my careful use of the phrase "or a close approximation thereof"
:-)

If "real world" data is not a close approximation of "relational data",
then it is reasonable to conclude that a relational database is not a
good place to put it ... :-) And if the two are a close approximation,
then a relational database may not be the *best* place, but it has to be
a *good* place.

Don't forget - I'm a scientist :-) If the stats are 95% confident,
that's not "proof", but it's "good enough".

>> Sorry for ignoring the rest of your post, but this is ABSOLUTELY
>> FUNDAMENTAL!!!
>
>Perhaps, but I still don't think "data comes in tuples" is anything like an
>axiom. I could certainly be wrong.

Read C&D's first rule! "Data comes in rows" - which is as far as I can
make out, a synonym for "data comes in tuples". I'm sure a relational
guru will disagree, but I can't see the difference ...

Cheers,
Wol

Eric Kaun - 09 Jun 2004 16:45 GMT
> >Perhaps, but I still don't think "data comes in tuples" is anything like an
> >axiom. I could certainly be wrong.
>
> Read C&D's first rule! "Data comes in rows" - which is as far as I can
> make out, a synonym for "data comes in tuples". I'm sure a relational
> guru will disagree, but I can't see the difference ...

And as stated elsewhere, those aren't axioms anyway... he used the word
"representation", and the context fully suggests that he's not correlating
it with the real world.
Anthony W. Youngman - 10 Jun 2004 01:51 GMT
>> >Perhaps, but I still don't think "data comes in tuples" is anything like an
[quoted text clipped - 7 lines]
>"representation", and the context fully suggests that he's not correlating
>it with the real world.

I know. After writing that I thought rather more about what C&D's twelve
rules actually are. And that they don't seem to contain any axioms at
all.

Which leads to the conclusion that relational theory is axiom-free.
Which means that it cannot be a valid model. Which means that its
application to the real world has no basis in anything whatsoever.

Okay, I'm sure that the mathematicians who've built on it have fleshed
out the fundamentals somewhat, but it certainly means that if your sole
criterion for defining a "relational database" is that "it complies with
C&D's 12 rules", then such a database has no grounding in formal logic
whatsoever.

Cheers,
Wol

Eric Kaun - 10 Jun 2004 21:19 GMT
> >And as stated elsewhere, those aren't axioms anyway... he used the word
> >"representation", and the context fully suggests that he's not correlating
[quoted text clipped - 6 lines]
> Which leads to the conclusion that relational theory is axiom-free.
> Which means that it cannot be a valid model.

Some possibilities (I'm running too short on time to explore them):
- The axioms may be simply implicit in his rules
- What are MV's axioms? If it has them, then we could map at least some of
them to relational (since there are commonalities)

> Which means that its
> application to the real world has no basis in anything whatsoever.

Maybe, but again, what sort of data model would have axioms? I'm not sure
this is possible... and if it is, again, relational would have somewhat-similar
ones. Surely there's at least a partial homomorphism between data models?

> Okay, I'm sure that the mathematicians who've built on it have fleshed
> out the fundamentals somewhat, but it certainly means that if your sole
> criteria for defining a "relational database" is that "it complies with
> C&D's 12 rules", then such a database has no grounding in formal logic
> whatsoever.

Mathematics requires axioms - does logic? I thought it was purely symbolic
manipulation, which is defined for relational.

- erk
Bill H - 11 Jun 2004 00:09 GMT
...Crossposted from comp.databases.theory...

An interesting question.  Can someone proffer some suggestions?

>"Eric Kaun" <ekaun@yahoo.com> wrote in message news:An3yc.2600
>
[quoted text clipped - 14 lines]
>
> erk
Kevin Powick - 11 Jun 2004 03:23 GMT
> ...Crossposted from comp.databases.theory...
>
> An interesting question.  

And the answer will change my life.. how?

I want to move to theory...  Everything works there.

<flame shields up>
<set thread to ignore>

Kevin Powick

Laconic2 - 11 Jun 2004 12:15 GMT
> And the answer will change my life.. how?

"I come that you may know the truth.  And the truth will make you free."
Kevin Powick - 11 Jun 2004 15:48 GMT
> > And the answer will change my life.. how?
>
> "I come that you may know the truth.  And the truth will make you free."

Thanks Neo ;-)

Kevin Powick

Anthony W. Youngman - 14 Jun 2004 23:24 GMT
>> >And as stated elsewhere, those aren't axioms anyway... he used the word
>> >"representation", and the context fully suggests that he's not
[quoted text clipped - 12 lines]
>- What are MV's axioms? If it has them, then we could map at least some of
>them to relational (since there are commonalities)

If they're implicit, then they need to be made explicit (hence my
comment about mathematicians "fleshing out" the theory).

>> Which means that its
>> application to the real world has no basis in anything whatsoever.
>
>Maybe, but again, what sort of data model would have axioms? I'm not sure
>this is possible... and if it is, again, relational would have somewhat-similar
>ones. Surely there's at least a partial homomorphism between data models?

The generic always trumps the specific. I suspect Pick axioms are very
similar to relational. But just as C&D's first rule says that data comes
in 2-dimensional tables (or arrays), I've defined "Pick's first rule"
that says data comes in n-dimensional arrays. So relational is the
specific subset of Pick where n=2.  :-)

>> Okay, I'm sure that the mathematicians who've built on it have fleshed
>> out the fundamentals somewhat, but it certainly means that if your sole
[quoted text clipped - 4 lines]
>Mathematics requires axioms - does logic? I thought it was purely symbolic
>manipulation, which is defined for relational.

Logic is used to manipulate axioms to give theorems. The result is a
model.

So no, if you're being pedantic, maybe logic doesn't require axioms. But
in the same way as an axe doesn't *require* wood. Just as an axe with
nothing to chop is useless, so is logic without axioms to manipulate.

Cheers,
Wol

Dawn M. Wolthuis - 15 Jun 2004 02:29 GMT
<snip>
> >Mathematics requires axioms - does logic? I thought it was purely symbolic
> >manipulation, which is defined for relational.
[quoted text clipped - 5 lines]
> in the same way as an axe doesn't *require* wood. Just as an axe with
> nothing to chop is useless, so is logic without axioms to manipulate.

Logic as a branch of mathematics most definitely requires axioms.  --dawn
Eric Kaun - 16 Jun 2004 19:57 GMT
> Logic as a branch of mathematics most definitely requires axioms.  --dawn

I'll posit that it doesn't; I would regard mathematics as a derivation of
(wrong term, I know) logic. Logic is a system or metasystem of symbolic
manipulation, and can be applied to many different "maths".

As I think of it, I'd say math has axioms and logic doesn't, and derives its
power for precisely that reason; it can be applied to different sets of
axioms. At least that's how I've always thought of it... I could be dead
wrong, and would expect this to be contested.

- erk
Paul - 18 Jun 2004 22:55 GMT
>>Logic as a branch of mathematics most definitely requires axioms.  --dawn
>
[quoted text clipped - 6 lines]
> axioms. At least that's how I've always thought of it... I could be dead
> wrong, and would expect this to be contested.

This is from http://en.wikipedia.org/wiki/First-order_predicate_calculus
---
Like any logical theory, first-order calculus consists of
* a specification of how to construct syntactically correct statements
  (the well-formed formulas)
* a set of axioms, each axiom being a well-formed formula itself
* a set of inference rules which allow one to prove theorems from axioms
  or earlier proven theorems.

There are two types of axioms: the logical axioms which embody the
general truths about proper reasoning involving quantified statements,
and the axioms describing the subject matter at hand, for instance
axioms describing sets in set theory or axioms describing numbers in
arithmetic.
---
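
To make the three ingredients in that excerpt concrete, here is a minimal sketch in Python. It is a toy, not first-order logic: formulas are plain strings, the only inference rule is modus ponens, and the names derive, AXIOMS and IMPLICATIONS are made up for illustration. It also shows the "axe without wood" point raised earlier in the thread: with an empty axiom set the same inference rule derives nothing.

```python
# Hypothetical toy system: formulas are plain strings, and the only
# inference rule is modus ponens over (premise, conclusion) pairs.

def derive(axioms, implications):
    """Return every formula reachable from the axioms via modus ponens."""
    theorems = set(axioms)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Modus ponens: from `premise` and `premise -> conclusion`,
            # conclude `conclusion`.
            if premise in theorems and conclusion not in theorems:
                theorems.add(conclusion)
                changed = True
    return theorems


# Illustrative axioms and implications (not taken from any real theory).
AXIOMS = {"p"}
IMPLICATIONS = [("p", "q"), ("q", "r"), ("s", "t")]

with_axioms = derive(AXIOMS, IMPLICATIONS)
without_axioms = derive(set(), IMPLICATIONS)

print(sorted(with_axioms))
print(sorted(without_axioms))
```

Note that "t" is never derived, because its premise "s" is neither an axiom nor a theorem.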

Now ultimately set theory is used for the foundation of everything in
mathematics. So you use set theory to build your logical theory.

You might ask how do you specify your set theory without using logic,
otherwise you've got a chicken and egg situation. I'm not too sure what
the answer is there, I think there is some kind of hand-waving appeal to
"naive logic". Or maybe it is more rigorous, I don't know.

Very philosophically interesting these questions of foundation though.

Paul.
Alfredo Novoa - 30 May 2004 12:32 GMT
>> You just don't get it, do you Wol?  No matter how many times people
>> try to explain it to you it just doesn't sink in.  The relational
[quoted text clipped - 4 lines]
>and I just responded to Alfredo who said that data were facts and I thought
>for sure the idea was that these facts corresponded to reality.

Not necessarily. They can be false or about a fantasy world.

>If there is a tight mathematical definition of "data" within relational
>theory, then that's great, but it is not the commonly used definition, I
>suspect.

The common use of the term is sloppy like with many other terms.

> It is in the leap from doing relational theory to thinking that
>the application of such theory is the best approach to storing/retrieving
>propositions using computers by a business -- that is where there is a
>rather significant leap of faith.

You are wrong. It was mathematically proven that it is better than the
graph based approaches.

>That connection is NOT science

It is maths, but the word science has many meanings.

>, although
>we could conceivably set up some experiments to collect a bit more
>information about whether it is better than some other approach.

We don't need the experiments and it was proved in the 70's that The
Relational Model is better than the other approaches.

It was explained zillions of times in this group.

> I'm not
>opposed to faith

I am completely opposed to faith and other forms of irrationalism. The
Relational Model is maths not irrational faith.

Regards
 Alfredo
mAsterdam - 30 May 2004 16:06 GMT
[snip]
>>It is in the leap from doing relational theory to thinking that
>>the application of such theory is the best approach to storing/retrieving
[quoted text clipped - 3 lines]
> You are wrong. It was mathematically proven
> that it is better than the graph based approaches.

This is a very strange statement.
It gets stated over and over again,
not only in this newsgroup. Outside this
newsgroup I am supposed to take it for granted and
not take time to think about it.

But here I can ask the people in support of this statement:
 - Better at what?
 - What exactly was proven?
 - Could you please give a reference?

I happen to like the relational model for thinking about data
in a detailed fashion, checking and double checking
the database and the support it gives to the whole
of the system it is part of.

I happen to like graph based approaches
for the overall picture and to elicit design
ideas from non-IT professionals.

But that is both just preference and
personal experience, not proof.

[snip]
>>I'm not opposed to faith
>
> I am completely opposed to faith and other forms of irrationalism. The
> Relational Model is maths not irrational faith.

Rationalism is as irrational(/rational) as any other faith.
I see reason(ratio) as a tool, even more so than language
(some posts ago somebody claimed languages are tools).
Alfredo Novoa - 31 May 2004 00:06 GMT
>> You are wrong. It was mathematically proven
>> that it is better than the graph based approaches.
>
>This is a very strange statement.
>It gets stated over and over again,
>not only in this newsgroup.

This is very basic knowledge taught in every serious database
introductory course.

>But here I can ask the people in support of this statement:
>  - Better at what?

Simplicity

>  - What exactly was proven?

That the Relational Model is superior.

>  - Could you please give a reference?

Codd, E. F. and C. J. Date. "Interactive Support for Nonprogrammers:
The Relational and Network Approaches." IBM Research Report RJ1400
(June 6th, 1974). Republished in Randall J. Rustin (ed.), Proc. ACM
SIGMOD Workshop on Data Description, Access, and Control, Vol. II, Ann
Arbor, Michigan, May 1974. Also in C. J. Date, Relational Database:
Selected Writings. Reading, Mass.: Addison-Wesley, 1986.

http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml

http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml

>I happen to like graph based approaches
>for the overall picture and to elicit design
>ideas from non-IT professionals.

We are talking about very different things. I am not talking about
drawings, I am talking about the network and hierarchical approaches.

>> I am completely opposed to faith and other forms of irrationalism. The
>> Relational Model is maths not irrational faith.
>
>Rationalism is as irrational(/rational) as any other faith.

What nonsense!

Regards
 Alfredo
Bill H - 31 May 2004 10:46 GMT
Ahhh.  Descartes Corollary:  I think therefore I'm right.  :-)

Bill

"Alfredo Novoa" <alfredo@ncs.es> wrote in message

[snipped]

> >  - What exactly was proven?
>
> That the Relational Model is superior.
Alfredo Novoa - 31 May 2004 11:44 GMT
>Ahhh.  Descartes Corollary:  I think therefore I'm right.  :-)

This is the "logic" of the pickies and the like.

Regards
 Alfredo
Laconic2 - 31 May 2004 14:09 GMT
> Ahhh.  Descartes Corollary:  I think therefore I'm right.  :-)

It's all nonsense.  "better" and "superior"  are value judgements.  They are
not the subject matter of mathematics.

This is cant.
Alfredo Novoa - 31 May 2004 15:55 GMT
>It's all nonsense.  "better" and "superior"  are value judgements.

Wrong. Quality can be compared.

> They are
>not the subject matter of mathematics.

It is the subject matter of Computing Science.
mAsterdam - 31 May 2004 11:47 GMT
>>>... It was mathematically proven
>>>that it is better than the graph
[quoted text clipped - 4 lines]
> This is very basic knowledge taught in every
> serious database introductory course.

The statement is made in just about every database
course, without demonstrating it - that's exactly
what I think is strange about it. If it's proven
why not give or at least reference the proof?
The way it is put, it is propaganda, not basic
knowledge.

>> ... - Better at what?
> Simplicity

This reduces the statement to
"It was mathematically proven that it is simpler
than the graph based approaches." and leaves the
judgement to the reader/student. An improvement,
but it still leaves the questions unanswered:
simpler at what? etc.

>> - What exactly was proven?
> That the Relational Model is superior.
[quoted text clipped - 3 lines]
> Codd, E. F. and C. J. Date. "Interactive Support for Nonprogrammers:
> The Relational and Network Approaches." ... 1974 ...

Part of it was quoted at your second url, see below.
I could only find the abstract on-line.

abstract (from
http://portal.acm.org/citation.cfm?id=811529&dl=ACM&coll=portal):
<quote>

   The objectives and strategies of the relational and network approaches
   are compared. The status of support for non-programming users is
   examined. General purpose support for such users entails provision of an
   augmented relationally complete retrieval capability without branching,
   explicit iteration, or cursors. It is clear how this capability can be
   realized with the relational approach—whether with a formal or informal
   language interface. It is not at all clear how the network approach can
   reach this goal, so long as the principal schema includes owner-coupled
   sets “bearing information essentially”. A relational discipline is
   suggested as a way out for DBTG users.

</quote>

Apparently the information principle is discussed avant la lettre there.
For the people who do not know that cornerstone:

Chris Date in "EDGAR F. CODD 08/23/1923 – 04/18/2003 A TRIBUTE" at
http://www.dbdebunk.com/page/page/621965.htm :
<quote>

    The concept of essentiality, introduced by Ted in this debate, is
    a great aid to clear thinking in discussions regarding the nature
    of data and DBMSs.  In particular, The Information Principle (which
    I heard Ted refer to on occasion as the fundamental principle
    underlying the relational model) relies on it, albeit not very
    explicitly:

        The entire information content of a relational database
        is represented in one and only one way: namely, as
        attribute values within tuples within relations.

</quote>
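
A minimal sketch of what the Information Principle amounts to in practice, assuming made-up Employee and Department relations (none of these names come from Codd or Date). Each relation is a set of tuples of attribute values, and the link between an employee and a department is carried by a shared value, not by a pointer or an owner-coupled set. The join is written as a single set expression, which is the style of retrieval the abstract above contrasts with explicit iteration and cursors.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    emp_id: int
    name: str
    dept: str


class Department(NamedTuple):
    dept: str
    city: str


# Hypothetical sample data: each relation is a set of tuples.
EMPLOYEES = frozenset({
    Employee(1, "Ann", "R&D"),
    Employee(2, "Bob", "Sales"),
})
DEPARTMENTS = frozenset({
    Department("R&D", "Leeds"),
    Department("Sales", "York"),
})


def join_on_dept(employees, departments):
    """Join the two relations on equal `dept` values; the result is again a set of tuples."""
    return frozenset(
        (e.emp_id, e.name, e.dept, d.city)
        for e in employees
        for d in departments
        if e.dept == d.dept
    )


joined = join_on_dept(EMPLOYEES, DEPARTMENTS)
print(sorted(joined))
```

Python still iterates under the hood, of course; the point of the sketch is only that the caller states which tuples are wanted, not how to navigate to them.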

> http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml

While this does give some insights in the
history of the use of 'data model' and
related terms (for the people here who
showed interest in that topic), it doesn't
at all claim to mathematically prove anything.

> http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml

Here it gets very interesting. From the overview:
<quote>
    Of course, the battle between relations and networks is
    ancient history now. (The good guys won.) This fact notwithstanding,
    Codd's paper --  even though it was written over 25 years ago -- is
    still worth reading today as a beautiful example of clear thinking.
    Indeed, it's quite remarkable to see how, on a topic where muddled
    thinking was the norm at the time, Codd was able to do such a good
    job of cutting to the chase and focusing on the real underlying
    issues. Let me elaborate:

        * First of all, Codd realized that to compare the very concrete
          CODASYL specifications and the much more abstract relational
          model would be an apples-and-oranges comparison and would
          involve numerous distracting irrelevancies.

        * Hence, it would be necessary first to define an abstract
          "network model." The comparison could then be done on a
          level playing field, as it were, in a fair and sensible
          manner.

        * Codd therefore proceeded to define an abstraction of
          the CODASYL specifications that might reasonably be
          regarded as such a model. (And then, of course, he went
          on to compare that abstraction with the relational model.)

</quote>

Relevancy to the 'mathematical proof' statement under discussion:
a fair comparison (a precondition for the claimed mathematical proof)
would require specification on the same (or at least similar) levels of
abstraction.

I don't know if anybody after this has provided another
formalization of the network model, so AFAIK this comparison
stands.

But what exactly is compared? Relational model versus network model for
interactive support(1) for nonprogrammers. To dismiss all graph based
approaches for all purposes based on it is overstretching it, IMO,
jumping to conclusions.

>>I happen to like graph based approaches
>>for the overall picture and to elicit design
>>ideas from non-IT professionals.
>
> We are talking about very different things. I am not talking about
> drawings, I am talking about the network and hierarchical approaches.

Equating network approaches to graph based approaches, for all
purposes? The network approach is Codd's formalization of the CODASYL
specification for the purpose of interactive support(1) for
nonprogrammers, in the documents you referenced.
(Or should I say pointed me to :-)

To determine whether it is possible to generalise Codd's comparisons
to relational approach vs. graph based approach, some
more levelling is needed. Generalising the stated purpose
is not trivial, either.

>>>I am completely opposed to faith and other forms of irrationalism. The
>>>Relational Model is maths not irrational faith.
>>
>>Rationalism is as irrational(/rational) as any other faith.
>
> What nonsense!

Very faithful ;-)

I suspect we will not be able to agree on this one.

However, maybe we can try to agree on the
'mathematical proof' issue, by clearly
stating what exactly was proven.

Anyway, thank you for the nice read.

====
Footnote:
(1) : interactive in textmode is implicitly meant
- but that's another can of worms.
Dawn M. Wolthuis - 31 May 2004 15:02 GMT
> >>>... It was mathematically proven
> >>>that it is better than the graph
[quoted text clipped - 11 lines]
> The way it is put, it is propaganda, not basic
> knowledge.

Exactly.  I've read a lot of what people have suggested is a mathematical
proof that relational database theory is good for business.  While the
mathematical theory itself is fine, the application of it to databases can
have no mathematical proof of its usefulness (math does not prove its
usefulness!) and seems to also have no scientific proof of its usefulness
either.  There are exceptions to this, such as logically proving/showing
that if you handle functional dependencies one way or another, it affects
what changes need to be made when requirements change.  So, I use those
techniques.  There are tradeoffs.  You design one way with agility in mind
and mitigate the risks.
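
As a rough illustration of the functional-dependency point, here is a sketch in Python. The ORDERS data, the attribute names, and the satisfies_fd helper are all hypothetical. When a dependency such as customer -> customer_city is stored redundantly in every order row, a change of city has to touch every matching row, and a partial update leaves the data violating the dependency.

```python
def satisfies_fd(rows, determinant, dependent):
    """True if rows that agree on `determinant` always agree on `dependent`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        # Remember the first dependent value seen for each determinant value
        # and reject any later row that disagrees with it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data with customer_city repeated per order.
ORDERS = [
    {"order_id": 1, "customer": "acme", "customer_city": "Leeds"},
    {"order_id": 2, "customer": "acme", "customer_city": "Leeds"},
    {"order_id": 3, "customer": "zenith", "customer_city": "York"},
]

holds = satisfies_fd(ORDERS, ["customer"], ["customer_city"])

# Simulate a partial update: only one of acme's rows gets the new city.
partially_updated = [dict(row) for row in ORDERS]
partially_updated[0]["customer_city"] = "Hull"
broken = satisfies_fd(partially_updated, ["customer"], ["customer_city"])

print(holds, broken)
```

Whether to split customer_city out into its own relation is exactly the kind of tradeoff described above; the sketch only shows what the redundancy costs when requirements change.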

> >> ... - Better at what?
> > Simplicity
[quoted text clipped - 5 lines]
> but it still leaves the questions unanswered:
> simpler at what? etc.

There is surely some mathematics that is simpler when putting data into
what-once-was-the-def-of-1NF (no repeating groups).  But it is also simpler
for the logic in retrieving data to have no relation-valued-attributes and
yet they have now been tossed into the mix.  So, what's simpler?  The old
version of 1NF or the new version?  Is simpler always better?  Applying the
simplest mathematics to complex problems isn't our goal here.

> >> - What exactly was proven?
> > That the Relational Model is superior.
[quoted text clipped - 17 lines]
>     explicit iteration, or cursors. It is clear how this capability can be
>     realized with the relational approach -- whether with a formal or informal
>     language interface. It is not at all clear how the network approach can
>     reach this goal, so long as the principal schema includes owner-coupled
[quoted text clipped - 22 lines]
>
> </quote>

> > http://www.intelligententerprise.com/db_area/archives/1999/992206/online1.jhtml

> While this does give some insights in the
> history of the use of 'data model' and
> related terms (for the people here who
> showed interest in that topic), it doesn't
> at all claim to mathematically prove anything.

> > http://www.intelligententerprise.com/db_area/archives/1999/991105/online2.jhtml

> Here it gets very interesting. From the overview:
> <quote>
[quoted text clipped - 11 lines]
>            model would be an apples-and-oranges comparison and would
>            involve numerous distracting irrelevancies.

Let me guess -- so instead of taking the relational model to an
implementation and playing on the IDMS playing field (which would only
provide data on one instance of each), he brought CODASYL onto his ball
field and then beat it, right?  Sorry, I'm getting ahead of you, excited to
hear the story unfold.

>          * Hence, it would be necessary first to define an abstract
>            "network model." The comparison could then be done on a
>            level playing field, as it were, in a fair and sensible
>            manner.

laughing

>          * Codd therefore proceeded to define an abstraction of
>            the CODASYL specifications that might reasonably be
[quoted text clipped - 16 lines]
> approaches for all purposes based on it is overstretching it, IMO,
> jumping to conclusions.

Absolutely!

> >>I happen to like graph based approaches
> >>for the overall picture and to elicit design
> >>ideas from non-IT professionals.
> >
> > We are talking about very different things. I am not talking about
> > drawings, I am talking about the network and hierarchical approaches.

As I understand it, the purpose of the relational model is to have a way to
"view" the structure of the data.  It isn't intended to be the way that it
is implemented.  So, if users (e.g. me) want to view the data in a graph,
then that seems like a good model to use, right?

> Equating network approaches to graph based approaches, for all
> purposes? The network approach is Codd's formalization of the CODASYL
[quoted text clipped - 15 lines]
>
> Very faithful ;-)

more laughter from the heretic in this corner

> I suspect we will not be able to agree on this one.
>
> However, maybe we can try to agree on the
> 'mathematical proof' issue, by clearly
> stating what exactly was proven.

Yes, I think you started to get at it.  It sounds like it has been proven
that a mathematical relational model is simpler than a corresponding network
model so it would be good to get this nailed down in precise terms (and I
haven't read all suggested readings, but will look at them soon).  Although
I do believe this has been proven, I would still like a clear, crisp
theorem/proof of "the proof" for relational theory.

Has there been any proof, ever, of the use of the relational model providing
for a better realized solution for anything than any other model?   It is in
the application of the model that I think we lack evidence.

> Anyway, thank you for the nice read.

quite entertaining!  --dawn
Alfredo Novoa - 31 May 2004 17:14 GMT
>Exactly.  I've read a lot of what people have suggested is a mathematical
>proof that relational database theory is good for business.

It is good for data management.

>  While the
>mathematical theory itself is fine, the application of it to databases can
>have no mathematical proof of its usefulness (math does not prove its
>usefulness!)

Of course it can and it did. You can do the same as before with a
fraction of the instructions and optimization can be done by the
machine.

>Let me guess -- so instead of taking the relational model to an
>implementation and playing on the IDMS playing field (which would only
>provide data on one instance of each), he brought CODASYL onto his ball
>field and then beat it, right?

To the field of formalism.

You try to do the same but your field is irrationalism and rough
sophistry.

>Yes, I think you started to get at it.  It sounds like it has been proven
>that a mathematical relational model is simpler than a corresponding network
>model

It has been proved that it is simpler to manage data with the
relational approach.

You are always trying to confuse playing sloppily with words and
distorting things.

>Has there been any proof, ever, of the use of the relational model providing
>for a better realized solution for anything than any other model?

It is not a proof but there are plenty of systems rewritten using a
pseudorelational approach which saved a lot of code.

But it would be a waste of time to show them to you.

Regards
 Alfredo
Eric Kaun - 01 Jun 2004 15:12 GMT
> > >>>... It was mathematically proven
> > >>>that it is better than the graph
[quoted text clipped - 14 lines]
> Exactly.  I've read a lot of what people have suggested is a mathematical
> proof that relational database theory is good for business.

I don't think such a thing is even possible. However, it seems obvious to me
that in this industry we have the capacity to stay very close to theory,
given that computers are very unforgiving, and therefore our programs have
to achieve some degree of rigor with respect both to the language at hand
and to the business conditions we're trying to automate. So I certainly
think that theory is worth a good first look, given that it impacts us in a
much more direct way than many other disciplines.

Now look at the relational theory vs. some of the emerging semantics of
XPath and XQuery (from Philip Wadler, Don Chamberlin, and many others). It's
extraordinarily complex in comparison, and furthermore offers nothing like
normalization rules (at least that I've seen - Jan Hidders and others may
have seen something like that). Therefore good design criteria are less
formal. A simpler theory, with stronger criteria for good design, is (all
other things being equal, which they never are) to be preferred over
something with a more complex theory and weaker criteria for good design. At
least that's my theory. ;-)

> While the
> mathematical theory itself is fine, the application of it to databases can
> have no mathematical proof of its usefulness (math does not prove its
> usefulness!) and seems to also have no scientific proof of its usefulness
> either.

One could argue the same for XML, Pick, etc. (for which it would still be
useful to see a theory), and we're back to the "...in my experience... bang
for the buck..." argument. I'm not criticizing that argument, but "in my
experience" I have seen many more problems with Pick-like designs, problems
that have a direct impact on agility - on my ability to evolve a system
toward the expanding and changing needs of a business.

> There are exceptions to this, such as logically proving/showing
> that if you handle functional dependencies one way or another, it affects
> what changes need to be made when requirements change.  So, I use those
> techniques.  There are tradeoffs.  You design one way with agility in mind
> and mitigate the risks.

Agreed - in the final analysis it seems somewhat like a typing exercise to
me. Whereof one cannot speak, thereof one must be silent... if the business
doesn't know of any rules or structure surrounding Field X, but knows it's
always been captured, then perhaps something that's just an untyped list is
best. But you have to ask the question, and have it answered, even if the
answer is "I dunno."

> > >> ... - Better at what?
> > > Simplicity
[quoted text clipped - 10 lines]
> for the logic in retrieving data to have no relation-valued-attributes and
> yet they have now been tossed into the mix.  So, what's simpler?

Even those who discuss relation-valued attributes (RVAs) (Codd, Date,
Pascal, etc.) acknowledge that they're more complex, and argue against using
them. However, in some cases they're simply the best model. Take, for example, a
system catalog, family relationships, even prime factors (which introduces
to me the interesting notion of "relation-valued functions" especially
infinite ones). In all these cases, eliminating the RVA introduces keys with
no real meaning, and adds some complexity. RVAs introduce a different type
of complexity, perhaps - I'm not going to attempt to characterize the two
here.
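
For the prime-factors case, a minimal sketch in Python, assuming a made-up representation in which each tuple pairs a number with a relation-valued attribute (here a frozenset of its prime factors). The names prime_factors, WITH_RVA, ungroup and FLAT are illustrative only. The sketch does not try to show the meaningless-keys problem; it shows one visible difference between the two designs: a number with no prime factors survives in the RVA form with an empty set, but has no tuple at all once the RVA is ungrouped.

```python
def prime_factors(n):
    """Return the set of distinct prime factors of n by trial division."""
    factors = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return frozenset(factors)


# Relation with a relation-valued attribute: (n, set of prime factors).
WITH_RVA = frozenset((n, prime_factors(n)) for n in (1, 6, 7, 12))


def ungroup(relation):
    """Flatten the RVA into one (n, p) tuple per prime factor."""
    return frozenset((n, p) for n, factors in relation for p in factors)


FLAT = ungroup(WITH_RVA)

print(sorted(FLAT))
print(sorted(n for n, _ in WITH_RVA))
```

In the flat form, recording that 1 exists but has no prime factors would need either a second relation of numbers or some placeholder value, which is one face of the extra complexity being weighed here.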

> The old
> version of 1NF or the new version?  Is simpler always better?  Applying the
> simplest mathematics to complex problems isn't our goal here.

I disagree completely. If you can successfully apply simple math to complex
problems, you're way ahead of the game. Part of the problem these days is
jumping on one or more complex technologies because in some vague,
unanalyzed way they "seem like" the structure of the problem. Mastery of the
basic tools is a useful prerequisite to understanding when and how to apply
the complex ones.

> >          * First of all, Codd realized that to compare the very concrete
> >            CODASYL specifications and the much more abstract relational
[quoted text clipped - 6 lines]
> field and then beat it, right?  Sorry, I'm getting ahead of you, excited to
> hear the story unfold.

There isn't a home-court advantage here. The court had yet to be built.
Screaming "that's not fair" at an attempt to compare several models, without
implementations of both at hand, is disingenuous. The point of a model,
which we should all understand, is to characterize X before going to all the
work of implementing X (which requires many other concerns and
distractions). Even businesses understand that.

> >          * Hence, it would be necessary first to define an abstract
> >            "network model." The comparison could then be done on a
> >            level playing field, as it were, in a fair and sensible
> >            manner.
>
> laughing

In what way is the "playing field" unfair?

> > >>I happen to like graph based approaches
> > >>for the overall picture and to elicit design
> > >>ideas from non-IT professionals.

It's fine as a starting point, but the approach quickly breaks down as you
get into details. I have no proof, simply my experience doing JAD sessions
and user requirements...

> As I understand it, the purpose of the relational model is to have a way to
> "view" the structure of the data.

It's a predicate-based model of data - not sure what you mean by "view".
It's to define the structure of data, to retrieve relation values based on
other values, and to update relation variables with new values.

> It isn't intended to be the way that it
> is implemented.  So, if users (e.g. me) want to view the data in a graph,
> then that's seems like a good model to use, right?

No. There is a logical and practical difference between what we do and what
users see. The above is like saying that, since the users see a graphic,
Adobe Photoshop is the proper GUI modeling tool.

> Has there been any proof, ever, of the use of the relational model providing
> for a better realized solution for anything than any other model?   It is in
> the application of the model that I think we lack evidence.

That's a smooth bit of useless rhetoric - the use of the word "ever" in
there is especially galling. I'd like to see what "proof" there is of such a
thing for any technology, language, model, etc. You're not asking for
evidence about the application of the model - you're looking for something
like evidence on the statistically supported success of solutions using X
compared with solutions using Y, where both X and Y aren't implementations,
or even designs, but models on which those designs and implementations can
be based. It isn't there.

Let me ask you this: Has there been any proof, ever, of the use of
object-orientation providing for a better realized solution for anything
than any other model? Has there been any proof, ever, of the use of
three-valued logic providing for a better realized solution for anything
than any other model? etc. etc... I was going to babble on with more
examples but my fingers are tired.

No, there's no proof. Let's move on to a useful discussion...

- Eric
Alfredo Novoa - 31 May 2004 15:55 GMT
>> Simplicity
>
[quoted text clipped - 4 lines]
>but it still leaves the questions unanswered:
>simpler at what? etc.

In number of instructions. Simpler by orders of magnitude, and
amenable to many automatic optimizations.

The superiority is very striking and overwhelming. That's why teachers
don't spend a lot of time with this topic.

>While this does give some insights in the
>history of the use of 'data model' and
>related terms (for the people here who
>showed interest in that topic), it doesn't
>at all claim to mathematically prove anything.

Because you are quoting the wrong parts.

What about this?:

<quote>

                      CODASYL   relational
GO TO                      15            0
PERFORM UNTIL               1            0
currency indicators        10            0
IF                         12            0
FIND                        9            0
GET                         4            1
STORE / PUT                 2            1
MODIFY                      1            0
MOVE CURRENCY               4            0
other MOVEs                 9            1
SUPPRESS CURRENCY           4            0
total statements         > 60            3

The relative simplicity of the relational solution is very striking.
Note: In fact, the relational solution could have been reduced to just
a single statement, a PUT; the GET and MOVE aren't strictly necessary.
What's more (although Codd doesn't mention the fact), the CODASYL
"solution" -- which was taken from another source, by the way, not
created by Codd himself -- included at least two bugs!

</quote>

>But what exactly is compared? Relational model versus network model for
>interactive support(1) for nonprogrammers. To dissmiss all graph based
>approaches for all purposes based on it is overstretching it, IMO,
>jumping to conclusions.

There is only one graph based approach. The hierarchical approach is
only a specialization of the network approach.

>Equating network approaches to graph based approaches, for all
>purposes?

If you want to formalize the network and hierarchical approaches the
only way is graph theory.

> The network approach is Codd's formalization of the CODASYL
>specification for the purpose of interactive support(1) for
>nonprogrammers, in the documents you referenced.
>(Or should I say pointed me to :-)

Yes.

>To determine wether it possible generalise Codd's comparisons
>to relational approach vs. graph based approach, some
>more levelling is needed.

No, CODASYL has all the essential features of the network approach.

>I suspect we will not be able to agree on this one.

Surely. Rationalism is the opposite of faith. Faith means believing
without reason.

>However, maybe we can try to agree on the
>'mathematical proof' issue, by clearly
>stating what exactly was proven.

I recommend that you read Relational Database: Selected Writings.

Regards
 Alfredo
Anthony W. Youngman - 01 Jun 2004 23:43 GMT
>>>>... It was mathematically proven that it is better than the graph
>>>>based approaches.
[quoted text clipped - 19 lines]
>but it still leaves the questions unanswered:
>simpler at what? etc.

And Occam's Razor (the Einstein version iirc) says "make things as
simple as possible, BUT NO SIMPLER".

For example, Newtonian Mechanics is a damn sight simpler than General
Relativity. But precisely *because* it is simpler, it is also a hell of
a lot more dangerous, because it is *too* simple, and more prone to
screw-ups.

Simplicity - if carried too far - is lethal.

Cheers,
Wol

mAsterdam - 02 Jun 2004 00:05 GMT
mAsterdam writes
>> This reduces the statement to
>> "It was mathematically proven that it is simpler
[quoted text clipped - 5 lines]
> And Occam's Razor (the Einstein version iirc) says "make things as
> simple as possible, BUT NO SIMPLER".

The examples given in Alfredo's links did a good job of shaving
CODASYL's beard by providing the same and better
results (for the "no simpler" part) from a much
simpler construct. Did you read them?
Anthony W. Youngman - 04 Jun 2004 00:50 GMT
>mAsterdam writes
>>> This reduces the statement to
[quoted text clipped - 10 lines]
>results (for the "no simpler" part) from a much
>simpler construct. Did you read them?

Probably. And I probably didn't understand them.

All I'm trying to say is that simplicity as a goal in itself is a
delusion.

And just because relational may be simpler than codasyl doesn't mean
that it's a good thing. We have a real-world problem here ... look at
the following mapping ...

real world <=> business analysis <=> database

What matters is the complexity (or simplicity) of the WHOLE SYSTEM.
There's no point in simplifying the database, if the necessary increase
in complexity of the business analysis totally negates it. By focussing
on minimising the complexity of one part of the system, we make the
system as a whole more complex. That will explain why Dawn's experience
is that MV is more productive than relational - the simplicity of the
relational database over MV simply pushes all the complexity into the
business analysis side, turning that into a total nightmare.

Which is simpler - to model a single real world entity as a single
database "table" as MV does (we can model an invoice in a single FILE),
or as five or six relational tables? And don't forget - our FILE (should
be) normalised, so we can access it just as if it were five or six
relational tables ...

Yep. The database itself is more complex. But the business analysis is
MUCH simpler, such that the total system complexity is a lot less.

Cheers,
Wol

mAsterdam - 04 Jun 2004 02:20 GMT
> mAsterdam writes
>> mAsterdam writes
[quoted text clipped - 13 lines]
>
> Probably. And I probably didn't understand them.

You PROBABLY read them? Schroedinger's cat would
go from right to left. I'm positively sure about that,
I think. Maybe.

> All I'm trying to say is that simplicity as a goal in itself is a delusion.

As is clarity. As is having a goal for that matter.

> And just because relational may be simpler than codasyl doesn't mean
> that it's a good thing. We have a real-world problem here ... look at
> the following mapping ...
>
> real world <=> business analysis <=> database

<=> defined as 'having some mutual metaphorical resemblances to'?

> What matters is the complexity (or simplicity) of the WHOLE SYSTEM.
> There's no point in simplifying the database, if the necessary increase
> in complexity of the business analysis totally negates it.  

Very true. Often made mistaek.

> By focussing
> on minimising the complexity of one part of the system, we make the
> system as a whole more complex. That will explain why Dawn's experience
> is that MV is more productive than relational - the simplicity of the
> relational database over MV simply pushes all the complexity into the
> business analysis side, turning that into a total nightmare.

I'll state my intuition (not backed up by experience)
about not taking the time to analyse data:
postponing the basic issues will bring volatile
quick wins, pushing depth investment (cost) of
reflection and the real benefits of data assets
into the future. So, if and only if your survival
depends on quick wins, go for it.

> Which is simpler - to model a single real world entity as a single
> database "table" as MV does (we can model an invoice in a single FILE),
[quoted text clipped - 4 lines]
> Yep. The database itself is more complex. But the business analysis is
> MUCH simpler, such that the total system complexity is a lot less.

Did you *read* what I replied to your post about mapping concepts
from different contexts a while ago? Probably. Maybe. Later.
Anthony W. Youngman - 07 Jun 2004 23:50 GMT
>> By focussing  on minimising the complexity of one part of the system,
>>we make the  system as a whole more complex. That will explain why
[quoted text clipped - 10 lines]
>into the future. So, if and only if your survival
>depends on quick wins, go for it.

Except that Dawn's experience (and most MV consultants, too) is that the
cost of maintaining old MV databases is lower than that of maintaining
relational ...

They're cheaper to write, they're cheaper to maintain, and they take a
LOT longer to get decrepit ...

Cheers,
Wol

mAsterdam - 08 Jun 2004 01:05 GMT
> mAsterdam writes
>
[quoted text clipped - 19 lines]
> They're cheaper to write, they're cheaper to maintain, and they take a
> LOT longer to get decrepit ...

As I understood your writings, you claim to analyse
your data before taking a well-informed decision to
prefer an MV implementation over an RDBMS implementation.
How can my statement about quick wins trigger this response?
Anthony W. Youngman - 10 Jun 2004 02:01 GMT
>>> I'll state my intuition (not backed up by experience)
>>> about not taking the time to analyse data:
[quoted text clipped - 13 lines]
>prefer MV implementation above a RDBMS implementation.
>How can my statement about quick wins trigger this response?

Except you don't understand. I prefer to put normalised data into an MV
database, on the basis of past experience that it is ALWAYS easier to
understand the result.

Don't forget. If you've analysed your data properly, then the conversion
of data from MV-form to RDBMS-form is trivial and easily done "on the
fly" by any modern MV database.

So if I put my data into an MV database I can access it as if it were in
an RDBMS. However, the converse is not true.

AND it's a hell of a lot simpler to understand the "real world <=>
logical data" mapping in MV as opposed to relational - in MV it is
almost invariably one real world object "instance of class noun" maps
directly to one "RECORD in a FILE". In relational, typically one
"instance of class noun" will map to many rows spread across multiple
tables.

Experience says MV is simpler to understand. Maths says MV gives me the
best of both worlds.

Cheers,
Wol

mAsterdam - 10 Jun 2004 19:30 GMT
> So if I put my data into an MV database I can access it as if it were in
> an RDBMS. However, the converse is not true.

It would be very interesting to know - in some detail -
what kind of data gives difficulties in putting stuff
from an RDBMS into an MV database.
This may be somewhat awkward in this newsgroup, because some
will be just waiting to say: See? You *can* express proposition_set(x)
in a RDBMS, and you *can't* in MV, therefore MV is better.
More is not a priori better.

But I trust you can stand that reaction. Could you give some examples?
Bill H - 11 Jun 2004 00:40 GMT
Notes embedded.

> "mAsterdam" <mAsterdam@vrijdag.org> wrote...
>
[quoted text clipped - 12 lines]
>
> But I trust you can stand that reaction. Could you give some examples?

Here's an A/P invoice record.  The view is vertical, not horizontal.  The
numbers on the left indicate the field location# in the physical string on
disk.  The unnumbered value is the record key.

   340*VR3-2
001 480
002 13279
003 13210
004 74*578
005 LOAN PAYMENT, 3/2004
006 -297645
007 0
008 -297645
009 5170]3370]5170]3370
010 -101261]-196384]0]0
011 200404
012 200404
013
014
015 AUTO
016
017 74

The disk storage would look like:

^^340*VR3-2^480^13279^12114^74*578^LOAN PAYMENT, 3/2004^-297645^0^-297645^5170]3370]5170]3370^-101261]-196384]0]0^200404^200404^^^AUTO^^74

The contents of field#s 009 & 010 are the G/L acct#s assigned this invoice
and the "associated" G/L amounts allocated to each G/L acct#.  The last two
"values" of field# 10 are zero (0), but could just as easily be nothing so
the field could look like:

010 -101261]-196384]]
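
As a rough illustration of how fields 009 and 010 line up, here is a minimal Python sketch. It assumes `^` and `]` are the separators exactly as printed above (the real on-disk delimiter bytes aren't shown in this post), and the names `parse_record` and `gl_allocations` are made up for the example. It is not how any Pick system actually reads its files.

```python
# Separators as printed in the post; assumed stand-ins for the real delimiters.
FIELD_MARK = "^"
VALUE_MARK = "]"

def parse_record(raw):
    """Split a record body into fields, and each field into its values."""
    return [field.split(VALUE_MARK) for field in raw.split(FIELD_MARK)]

def gl_allocations(fields):
    """Pair G/L accounts (field 009) with amounts (field 010) by position."""
    accounts = fields[8]   # field numbers in the post are 1-based
    amounts = fields[9]
    # An empty amount is treated the same as an explicit zero.
    return [(acct, amt or "0") for acct, amt in zip(accounts, amounts)]

# Record body (everything after the key), using the "empty instead of 0" form.
body = ("480^13279^12114^74*578^LOAN PAYMENT, 3/2004^-297645^0^-297645^"
        "5170]3370]5170]3370^-101261]-196384]]^200404^200404^^^AUTO^^74")
fields = parse_record(body)
for acct, amt in gl_allocations(fields):
    print(acct, amt)
```

The pairing is purely positional: the n-th account goes with the n-th amount, which is why the trailing empty values still have to be kept.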

There are no data types in this record.  Field#s 002 & 003 are dates where
each date is the number of days past 31 Dec 1967 (kind of like unix's # of
seconds past midnight on 01 January 1970).  The values of field#s 006, 008,
and 010 are monetary values with the decimal stripped, so the value 25 would
indicate $.25 (or .25 of whatever denomination used).
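
The two conversions described above can be sketched in a few lines of Python. This assumes day 0 is 31 Dec 1967 and that money is stored in hundredths, as stated; the function names `pick_date` and `pick_money` are hypothetical and only for illustration.

```python
from datetime import date, timedelta
from decimal import Decimal

PICK_EPOCH = date(1967, 12, 31)  # day 0, per the description above

def pick_date(days):
    """Convert an internal day count into a calendar date."""
    return PICK_EPOCH + timedelta(days=int(days))

def pick_money(raw):
    """Convert a decimal-stripped amount into units; empty counts as zero."""
    return Decimal(raw or "0") / 100

print(pick_date("13279"))     # field 002 of the record above
print(pick_money("-297645"))  # field 006 of the record above
print(pick_money("25"))       # the $.25 example
```

Going the other way (loading) is the same arithmetic in reverse: subtract the epoch to get a day count, and multiply the amount by 100.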

To load a valid date into a Pick-like dbms (mvDbms) a conversion needs to be
done similar to what is done for Unix.  The same is true for money amounts.
However, extraction is very easy so that:

::LIST APOPEN '340*VR3-2' INVDATE DUEDATE ACCTS AMTS
Page   1     APOPEN

APOPEN.... INV-DATE DUE-DATE ACCT. ACCT/AMTS....

340*VR3-2  03-01-01 05-09-04 5170      1,012.61-
                            3370      1,963.84-
                            5170          0.00
                            3370          0.00

[405] 1 items listed out of 1 items.

or the relational way:

::LIST APOPEN '340*VR3-2' BY-EXP ACCTS INVDATE DUEDATE ACCTS AMTS
Page   1     APOPEN

APOPEN.... INV-DATE DUE-DATE ACCT. ACCT/AMTS....

340*VR3-2  03-01-01 05-09-04 3370      1,963.84-
340*VR3-2  03-01-01 05-09-04 3370          0.00
340*VR3-2  03-01-01 05-09-04 5170      1,012.61-
340*VR3-2  03-01-01 05-09-04 5170          0.00

[405] 4 items listed out of 1 items.
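
For anyone who wants to see the same flattening outside of Pick, here is a small Python sketch. It only mimics the shape of the second listing (one row per account value, sorted by account); it is not a model of the query language itself, and the dictionary layout and the `explode` name are assumptions made for the example.

```python
from decimal import Decimal

# One nested record, using the already-converted values from the listing.
record = {
    "key": "340*VR3-2",
    "inv_date": "03-01-01",
    "due_date": "05-09-04",
    "accts": ["5170", "3370", "5170", "3370"],
    "amts": [Decimal("-1012.61"), Decimal("-1963.84"),
             Decimal("0.00"), Decimal("0.00")],
}

def explode(rec):
    """Emit one flat row per (account, amount) pair, sorted by account."""
    rows = [
        (rec["key"], rec["inv_date"], rec["due_date"], acct, amt)
        for acct, amt in zip(rec["accts"], rec["amts"])
    ]
    # sorted() is stable, so rows with the same account keep their order.
    return sorted(rows, key=lambda row: row[3])

for row in explode(record):
    print(*row)
```

The single-valued fields (key and dates) get repeated on every row, which is exactly the difference between the two listings.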

(if this appears in a proportional font simply cut & paste to notepad)

I hope this answered your question.  :-)

Bill
mAsterdam - 11 Jun 2004 07:14 GMT
> Notes embedded.
>>"mAsterdam" <mAsterdam@vrijdag.org> wrote...
[quoted text clipped - 9 lines]
>>will be just waiting to say: See? You *can* express proposition_set(x)
>>in a RDBMS, and you *can't* in MV, therefore MV is better.

Heh. Strange typo - sorry.

>>More is not a priori better.
>>
[quoted text clipped - 28 lines]
> 3/2004^-297645^0^-297645^5170]3370]5170]3370^-101261]-196384]0]0^200404^2004
> 04^^^AUTO^^74

I can see the mapping between the 'storage' and the 'vertical view'
representation. The FILE definition (if this is the correct term) would
help to get the example clear, no?

> The contents of field#s 009 & 010 are the G/L acct#s assigned this invoice
> and the "associated" G/L amounts allocated to each G/L acct#.  The last two
[quoted text clipped - 6 lines]
> each date is the number of days past 31 Dec 1967 (kind of like unix's # of
> seconds past midnight on 01 January 1970.  

So, a different time-offset than unix.
Different internal date representation.
No big deal, IMO.

> The values of field#s 006, 008,
> and 010 are monetary values with the decimal stripped, so the value 25 would
[quoted text clipped - 29 lines]
>
> [405] 4 items listed out of 1 items.

Hm... no big deal either, or is it? Grouping / levelled duplicate
suppression and it's the same, no?

> (if this appears in a proportional font simply cut & paste to notepad)
>
> I hope this answered your question.  :-)

Certainly on the detail level. But what is the problem?
I don't see the RDBMS ==> MV problem - but I may
be overlooking something.
Anyway, thank you for your effort.
Anthony W. Youngman - 15 Jun 2004 00:29 GMT
>> So if I put my data into an MV database I can access it as if it were
>>in an RDBMS. However, the converse is not true.
[quoted text clipped - 8 lines]
>
>But I trust you can stand that reaction. Could you give some examples?

Actually, all you have to do to make RDBMS appear (superficially) to
look like MV is to declare the appropriate views. This does, however,
have the unfortunate side-effect of presenting your application with
apparently redundant data. The app is also unaware of "if I change this,
then that will change too" responses. Or "if I delete that, then the
other will go with it".

However, what you can not do with RDBMS is predict system response :-)
With MV, you can *prove* that it's damn near impossible to improve on
it...

You also have difficulty guessing which tables represent which
real-world object - while MV has no guarantees either, your chances of
being correct "by accident" are much, much higher. Does an address table
represent a company address, a billing address, a shipping address or
what? While a relational table may make it clear in the name, MV makes
it clear because it would be part of the company file, or the invoice
file, or whatever.

I gather it is possible to hide the underlying tables such that an app
can access them, but only through views that the dba wishes to permit.
With MV, that extra "clutter" isn't there...

Basically, MV is so much simpler :-) the database organisation maps
roughly one-to-one to real-world reality :-)

Actually, it's quite difficult to answer your question - we've taken so
much from relational :-) But we've taken mostly in "the theory of good
design", and not made any changes to the fundamental design of MV, just
in how we use it. I think the most important difference is that thing
about being able to predict real-world system response. Something
relational theory actively avoids... and as I comment elsewhere, the
fact that MV stores so much more information as metadata, not data, so
it's actually available to the dbms to help it optimise.

Cheers,
Wol

mAsterdam - 15 Jun 2004 02:20 GMT
> mAsterdam writes
>>
[quoted text clipped - 15 lines]
> have the unfortunate side-effect of presenting your application with
> apparently redundant data.

'Superficially' only? Please explain.
And 'unfortunate side-effect of presenting your application with
apparently redundant data': what's so unfortunate about having
redundancy in presenting stuff? This is getting to look more
and more like sales-crap. I don't want to offend you, but
please try to understand what I am asking instead of giving
a rebuttal to a non-existent attack.
Anthony W. Youngman - 19 Jun 2004 01:41 GMT
>> mAsterdam writes
>>>
[quoted text clipped - 23 lines]
>please try to understand what I am asking instead of giving
>a rebuttal to a non-existent attack.

Sorry. I don't want to offend you either. But MV will present you with a
normalised view of the data. If an RDBMS presents you with a view of the
same data and it contains a "many" join, then the app will get multiple
copies of certain bits of data. In relational, this could lead to an
attempt to update one copy without realising that there IS only one
copy, so all the others change as well. Yes that would be stupid
programming, but an MV app would know that there was only one copy.

That's why I said "superficial". The MV app will know more about the
data, because of the way the data is presented by the database.
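
To make the "multiple copies" point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (`invoice`, `gl_line`, `descr`) are invented for the example, and the data is borrowed from Bill's record earlier in the thread. It shows only what a one-to-many join hands back to the application, nothing about how any particular RDBMS or MV product behaves.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE invoice (id TEXT PRIMARY KEY, descr TEXT);
    CREATE TABLE gl_line (invoice_id TEXT, acct TEXT, amount INTEGER);
    INSERT INTO invoice VALUES ('340*VR3-2', 'LOAN PAYMENT, 3/2004');
    INSERT INTO gl_line VALUES ('340*VR3-2', '5170', -101261);
    INSERT INTO gl_line VALUES ('340*VR3-2', '3370', -196384);
""")

QUERY = """
    SELECT i.id, i.descr, g.acct, g.amount
    FROM invoice AS i JOIN gl_line AS g ON g.invoice_id = i.id
    ORDER BY g.acct
"""

# The join returns the description once per G/L line.
rows = con.execute(QUERY).fetchall()
for row in rows:
    print(row)

# There is only one stored description, so one update changes every row.
con.execute("UPDATE invoice SET descr = 'LOAN PAYMENT' WHERE id = '340*VR3-2'")
after = con.execute(QUERY).fetchall()
for row in after:
    print(row)
```

The application sees the description twice in the result set, but the update touches a single stored value; nothing in the flat rows themselves says so.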

Cheers,
Wol

Eric Kaun - 04 Jun 2004 15:58 GMT
> >mAsterdam writes
> >>> This reduces the statement to
[quoted text clipped - 15 lines]
> All I'm trying to say is that simplicity as a goal in itself is a
> delusion.

Sorry, but it's not. There are other goals, and sometimes the goals
short-circuit one another, but simplicity is always a worthy goal, in every
aspect of every design of... well, anything. It's a necessary but not
sufficient component of design quality.

> And just because relational may be simpler than codasyl doesn't mean
> that it's a good thing. We have a real-world problem here ... look at
[quoted text clipped - 3 lines]
>
> What matters is the complexity (or simplicity) of the WHOLE SYSTEM.

And the complexity of the individual components at least form an input to
the "complexity measure function" of the whole system.

> There's no point in simplifying the database, if the necessary increase
> in complexity of the business analysis totally negates it.

Agreed.

> By focussing on minimising the complexity of one part of the system, we
> make the system as a whole more complex.

Not true, at least not necessarily.

> That will explain why Dawn's experience
> is that MV is more productive than relational - the simplicity of the
> relational database over MV simply pushes all the complexity into the
> business analysis side, turning that into a total nightmare.

Oh please. You have to be kidding. Moving from a 3NF-but-not-1NF MV
structure to a fully 3NF structure produces, from bliss, a "total
nightmare"? That's delusion.

> Which is simpler - to model a single real world entity as a single
> database "table" as MV does (we can model an invoice in a single FILE),
> or as five or six relational tables?

Why do you have attributes? You're, like, totally dissecting the holistic
nature of the invoice, dude. C'mon, let's harmonize with the cosmic gestalt
and just have one big file with attributes INVOICE1, INVOICE2, INVOICE3,
etc...

> And don't forget - our FILE (should
> be) normalised, so we can access it just as if it were five or six
> relational tables ...
>
> Yep. The database itself is more complex. But the business analysis is
> MUCH simpler, such that the total system complexity is a lot less.

In what way does storing the invoice in a single file mean the business
analysis is simpler? You still had to identify attributes, and then of
course make sure the ordering of elements in the MV attributes is the same,
and possibly dissect values into sub-values...

- erk
Dawn M. Wolthuis - 04 Jun 2004 16:49 GMT
> > >mAsterdam writes
<snip>
>> Which is simpler - to model a single real world entity as a single
> > database "table" as MV does (we can model an invoice in a single FILE),
[quoted text clipped - 4 lines]
> and just have one big file with attributes INVOICE1, INVOICE2, INVOICE3,
> etc...

hey man, now you're talkin' but now we want to ask questions of the data, so
we need to tag some parts, without harming any animals, and there you have
it ;-)

> > And don't forget - our FILE (should
> > be) normalised, so we can access it just as if it were five or six
[quoted text clipped - 7 lines]
> course make sure the ordering of elements in the MV attributes is the same,
> and possibly dissect values into sub-values...

One way it makes it easier is that it takes us down to a smaller number of
portals, namespaces, vocabularies, means of making our way into the data.
If you don't think in terms of equal relations, but of some being important
entry points into the data, you simplify things greatly for the user.  I don't
know about thinking about ordering of the elements -- we don't give it a
second thought -- you start tossing those puppies in there and add to the
end or leave a few open spaces if you like it that way.  No one spends any
brain cells considering the ordering of attributes in PICK/MV.  --dawn
Eric Kaun - 04 Jun 2004 21:10 GMT
> > > >mAsterdam writes
> <snip>
[quoted text clipped - 11 lines]
> we need to tag some parts, without harming any animals, and there you have
> it ;-)

Heh... oh, so you don't give a darn about plants? Speciesist.

Asking questions is precisely the point of relational and, as has been pointed
out, relational is more egalitarian with respect to what you can ask, in
that it requires that only relations (and tuples, which are directly
implied) "tag" the data. Tagging, though, implies a high ratio of text to
markup - else you end up with the mess that is many XML docs, with a 5:1
ratio of tags to data. And XML tagging has a limited type system, and
nothing about constraints.

So when you say "tag parts", you've already decided on the format - an
implication that the data comes "naturally" in some form. In my experience,
that "natural" form is as useful as natural language is for programming,
even with the tags.

Maybe I'd be swayed by a more complete tagging system, but I think it's the
cart pulling the horse...

> > In what way does storing the invoice in a single file mean the business
> > analysis is simpler? You still had to identify attributes, and then of
[quoted text clipped - 6 lines]
> If you don't think in terms of equal relations, but of some being important
> entry points into the data, you simplify things greatly for the user.

Some users. What about the users with other entry points?

> I don't
> know about thinking about ordering of the elements -- we don't give it a
> second thought -- you start tossing those puppies in there and add to the
> end or leave a few open spaces if you like it that way.  No one spends any
> brain cells considering the ordering of attributes in PICK/MV.  --dawn

I don't mean the ordering of attributes - I mean the ordering of elements
within an attribute. For example, you have an invoice, which probably has a
LineItem attribute, and a Part# attribute, and a Quantity attribute. As
separate attributes, those aren't connected by anything but convention; and
it's those sort of assumptions that normalization is meant to codify.

- erk
Anthony W. Youngman - 08 Jun 2004 00:10 GMT
>> hey man, now you're talkin' but now we want to ask questions of the data,
>so
[quoted text clipped - 10 lines]
>ratio of tags to data. And XML tagging has a limited type system, and
>nothing about constraints.

Relational is more egalitarian about what you can ask (actually, I'm not
sure about that, but never mind ...)

But it's like being "politically correct". Sure, you may want to know
"how many blokes have a car the same colour as their wife's hair?".

BUT! Do you want to make "all questions equally easy to answer" or do
you want to make "common questions easier to answer than unusual ones".
If by levelling off the ease of asking questions, you simply make the
easier questions harder in order to level the playing field, you're
doing your users a very big disservice. Is that what you're trying to
achieve?

And with a little bit of thought, you can make nearly ANY question in
Pick easy to answer. More to the point, you can PROVE that the system
can answer the question easily. Given that relational goes to extreme
lengths to separate the logical from the physical, relational actually
prevents you from even trying to prove the question is easy, merely
saying "you have no choice but to trust the optimiser" :-(

Cheers,
Wol

Eric Kaun - 09 Jun 2004 20:27 GMT
> Relational is more egalitarian about what you can ask (actually, I'm not
> sure about that, but never mind ...)
[quoted text clipped - 8 lines]
> doing your users a very big disservice. Is that what you're trying to
> achieve?

Not at all. I want a firm logical foundation for any questions, and to have
that foundation serve as the basis for making questions easier to answer -
in other words, to define useful (though necessarily restricted) views in
terms of an egalitarian model. Otherwise my initial choice is simply too
risky, at least in complex domains.

> And with a little bit of thought, you can make nearly ANY question in
> Pick easy to answer. More to the point, you can PROVE that the system
> can answer the question easily.

Please elaborate on both of these. I have no idea what proving something
easy to answer entails...

> Given that relational goes to extreme
> lengths to separate the logical from the physical,

? Extreme lengths to separate? I'd say it simply tries to keep them
"naturally" separate, though of course I have no definition for "natural".
:-)

> relational actually
> prevents you from even trying to prove the question is easy, merely
> saying "you have no choice but to trust the optimiser" :-(

Huh? Prove the question is easy? What does that mean?

At any level above hardware, we have no choice - it depends on the
processor, and hard disk, and memory speed, and... so of course there's a
level of trust involved. Do you trust the Pick compiler / interpreter? I
certainly want to delegate the nasty business of optimization (which we've
demonstrated is useful in the considerably more-difficult area of compilers)
to a machine which can do the job better and faster than I.

- erk
Anthony W. Youngman - 10 Jun 2004 02:06 GMT
>> relational actually
>> prevents you from even trying to prove the question is easy, merely
[quoted text clipped - 8 lines]
>demonstrated is useful in the considerably more-difficult area of compilers)
>to a machine which can do the job better and faster than I.

Basically, if we assume (reasonable assumption) that everything else is
irrelevant when compared to disk access, I can prove that (almost) every
attempted disk access actually retrieves data that is relevant to the
question.

I can also show statistically that the chances of retrieving multiple
items of interest with a single access are also high.

Of course, that argument is less relevant now we have huge amounts of
ram...

Cheers,
Wol

Eric Kaun - 10 Jun 2004 21:23 GMT
> >> relational actually
> >> prevents you from even trying to prove the question is easy, merely
[quoted text clipped - 13 lines]
> attempted disk access actually retrieves data that is relevant to the
> question.

Oh... easy == fast.

Perhaps the above is true, but that requires your data be structured in the
same file, which is both boon and bane. In a TRDBMS (remember, this is
c.d.t), the DBMS would reorganize base relations' storage based on access
patterns, whereas in Pick you have to decide that in advance, and do a lot
of work later if it changes. Unless I'm misinterpreting... in any event,
access optimization and clustering based on common usage (which can change,
especially as reports and ad hoc queries enter the fray) should be dynamic,
and analyzed by a computer.

- erk
Anthony W. Youngman - 15 Jun 2004 01:02 GMT
>> Basically, if we assume (reasonable assumption) that everything else is
>> irrelevant when compared to disk access, I can prove that (almost) every
[quoted text clipped - 11 lines]
>especially as reports and ad hoc queries enter the fray) should be dynamic,
>and analyzed by a computer.

And dynamic re-organisation can be prohibitively expensive as it
reorganises the data to optimise your year-end reports, only for you to
have run your final report five minutes ago. This is the fallacy of
making all reports equally "easy" by imposing unnecessary overhead on
the common ones!

If you want to know one thing about an invoice, chances are you want to
know several. MV will (if properly designed) return EVERYTHING in a
single disk hit. If you then want to know about the company it was sent
to, a further SINGLE disk hit will return EVERYTHING you want there.

You're trying to optimise everything. If you access one bit of an
invoice, the chances of you accessing another bit of the same invoice
are HIGH. The chances of the computer guessing correctly whether you
want another invoice, or the company, or any other bit of information
unrelated to that invoice, are piss-poor. So why try?

MV optimises retrieval of information about any single real-world
object. It will step, with blinding speed, down a list of keys. It
doesn't even try to second-guess the user's next random data access -
what's the point? Was it Knuth said "premature optimisation is the root
of all evil"? The design of MV naturally clusters related data. And
ignores unrelated data. And it shows! As I've said, again and again, why
does experience say that MV beats relational for speed hands down every
time, especially for "large" databases?

Cheers,
Wol

Anthony W. Youngman - 08 Jun 2004 00:02 GMT
>> And don't forget - our FILE (should
>> be) normalised, so we can access it just as if it were five or six
[quoted text clipped - 7 lines]
>course make sure the ordering of elements in the MV attributes is the same,
>and possibly dissect values into sub-values...

When I analyse our MV system at work, I think in terms of physical
objects. I then decompose each physical object into normal form. I DON'T
NEED to think about other objects while I'm decomposing the one in front
of me.

We're in the process of porting to MS SQL-Server. The data diagram is
an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute
spaghetti of links EVERYWHERE! I don't have a clue which tables model
which physical object, the meaning of links isn't intuitive.

Trying to juggle hundreds of tables is far harder than keeping track of
several tens of physical objects to which I can relate, even if each of
those objects is then broken down logically into normal form.

The MV database layout imposes a grouping which helps me grasp the
system complexity. While I can easily view the MV structure as equal to
the relational structure, a true relational database does not give me
the MV structure which appears much less complicated by virtue of
appealing to the way I naturally view the world.

Take the invoice. From the MV point of view, I see it as a SINGLE
object. It is *TRIVIAL* for the database itself to decompose that and
present it, via ODBC, to a relational programmer who wouldn't even
realise that the MV back-end viewed it as a single object.

Cheers,
Wol

Eric Kaun - 09 Jun 2004 21:13 GMT
> When I analyse our MV system at work, I think in terms of physical
> objects. I then decompose each physical object into normal form. I DON'T
> NEED to think about other objects while I'm decomposing the one in front
> of me.

Physical objects - how quaint. :-)

In my experience, even entities like invoices and orders become unmanageable
as "physical objects" - specifically when someone placing an order needs to
know some fairly complex interrelationships between the parts in that order,
the warehouses and their parts, and parts previously ordered by the same
customer. And in a paint formula database, the "physical objects" are both
far from clear, and far different than even the users suppose they are.

> We're in the process of porting to MS SQL-Server. The data diagram is
> an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute
> spaghetti of links EVERYWHERE! I don't have a clue which tables model
> which physical object, the meaning of links isn't intuitive.

Several points:
1. Diagrams can be messy, and they can be nicely-organized
2. If you were to draw a diagram that captures the functions you present to
users in the dictionary, the files you use, and the files split for
efficiency, you might see something similarly ugly
3. The meaning of links should be fairly straightforward if you're thinking
in terms of predicates. Then again, if the database designer didn't think
that way, you could be in trouble. Surely you've seen bad Pick data models?

> Trying to juggle hundreds of tables is far harder than keeping track of
> several tens of physical objects to which I can relate, even if each of
> those objects is then broken down logically into normal form.

Agreed - good modeling is difficult, and it's certainly very useful to think
of things in groups. ER/win always did a good job for me - I could keep a
100-table logical data model segregated into domains [sic]. I never looked
at the entire thing all at once.

> The MV database layout imposes a grouping which helps me grasp the
> system complexity.

That's a very good thing. What's not a good thing is selecting a grouping
that makes sense to Function X to push its way into the definition of the
data, since Function Y might have a very different idea what groupings
should be imposed.

> While I can easily view the MV structure as equal to
> the relational structure, a true relational database does not give me
> the MV structure which appears much less complicated by virtue of
> appealing to the way I naturally view the world.

Again, if you only ever need to "view the objects" in one way, I envy you.
Experience hasn't been that kind to me.

> Take the invoice. From the MV point of view, I see it as a SINGLE
> object. It is *TRIVIAL* for the database itself to decompose that and
> present it, via ODBC, to a relational programmer who wouldn't even
> realise that the MV back-end viewed it as a single object.

I think you're backwards on this:

1. Seeing it as a single object implies some degree of encapsulation

2. Many queries need to cross encapsulation boundaries - hence the need for
the Pick/MV query mechanism to directly support list attributes and
sub-attributes and sub-sub-attributes... if it were really one object, you
could only "get at" those pieces via the Invoice's defined operations.
Encapsulation is a red herring (direct quote from Date) - every "persistence
mapping" tool violates it.

3. It's not at all trivial. Say your Invoice has 4 attributes: customer,
parts, ship dates, and payments. Part[N] must have a corresponding Date[N]
which is the date on which that part was received; Payments is completely
separate. How would the mapping "know" that that correspondence exists?
Certainly there is "meaning" there? The aggregation has traded one general
form of meaning for a very app-specific form; in examples other than the
somewhat-hierarchical Invoice/Order, the value of the trade diminishes even
further.

4. Given that you do various queries, and that the database answers lots of
questions, what value is it for the database to "know" it's a single
object? You don't see the "lowest level" of a data model being something
"egalitarian", which can support several views? It seems to me much more
powerful and general to supply predicates as the foundation, and layer a
"path expression" on top which can present any arbitrary view of the data.
Example: atop a relational model, I can present the Invoice, as well as a
Warehouse which "contains" the parts it shipped and which customers bought
them? Reporting and GUI generation, here I come...
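
A minimal sketch of points 3 and 4, using Python's built-in sqlite3 module. The table, column, and view names and the sample rows are made up for illustration; nothing here comes from a real schema in this thread. The Part[N]/Date[N] correspondence becomes an ordinary row, and the "Invoice" and "Warehouse" presentations are two views over the same base tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE invoice (inv_no TEXT PRIMARY KEY, customer TEXT);
-- One row per (invoice, part): the part and its received date sit together,
-- so the correspondence is stated by the schema rather than by list position.
CREATE TABLE invoice_part (
    inv_no    TEXT REFERENCES invoice(inv_no),
    part      TEXT,
    warehouse TEXT,
    received  TEXT,
    PRIMARY KEY (inv_no, part)
);
INSERT INTO invoice VALUES ('I1', 'Acme'), ('I2', 'Bolt Co');
INSERT INTO invoice_part VALUES
    ('I1', 'widget', 'W1', '2004-06-01'),
    ('I1', 'gadget', 'W2', '2004-06-03'),
    ('I2', 'widget', 'W1', '2004-06-02');
-- Two presentations of the same facts.
CREATE VIEW by_invoice AS
    SELECT i.inv_no, i.customer, p.part, p.received
    FROM invoice i JOIN invoice_part p ON p.inv_no = i.inv_no;
CREATE VIEW by_warehouse AS
    SELECT p.warehouse, p.part, i.customer
    FROM invoice_part p JOIN invoice i ON i.inv_no = p.inv_no;
""")

rows = con.execute(
    "SELECT part, customer FROM by_warehouse "
    "WHERE warehouse = 'W1' ORDER BY customer"
).fetchall()
print(rows)  # [('widget', 'Acme'), ('widget', 'Bolt Co')]
```

Neither view is privileged: both are derived from the same base tables, which is the sense of "egalitarian" above.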

- erk
Anthony W. Youngman - 10 Jun 2004 02:20 GMT
>> When I analyse our MV system at work, I think in terms of physical
>> objects. I then decompose each physical object into normal form. I DON'T
>> NEED to think about other objects while I'm decomposing the one in front
>> of me.
>
>Physical objects - how quaint. :-)

No comment :-)

>In my experience, even entities like invoices and orders become unmanageable
>as "physical objects" - specifically when someone placing an order needs to
>know some fairly complex interrelationships between the parts in that order,
>the warehouses and their parts, and parts previously ordered by the same
>customer. And in a paint formula database, the "physical objects" are both
>far from clear, and far different than even the users suppose they are.

Except that if you think of it as "noun or adjective", it does actually
become a lot clearer ... if the same adjective describes several nouns
then you need multiple instances of it :-)

>> We're in the process of porting to MS SQL-Server. The data diagram is
>> an ABSOLUTE NIGHTMARE! When I look at the table diagram it's an absolute
[quoted text clipped - 9 lines]
>in terms of predicates. Then again, if the database designer didn't think
>that way, you could be in trouble. Surely you've seen bad Pick data models?

Have I seen bad models? I work with them :-(

>> Trying to juggle hundreds of tables is far harder than keeping track of
>> several tens of physical objects to which I can relate, even if each of
[quoted text clipped - 4 lines]
>100-table logical data model segregated into domains [sic]. I never looked
>at the entire thing all at once.

Pick never expects me to :-)

>> The MV database layout imposes a grouping which helps me grasp the
>> system complexity.
[quoted text clipped - 3 lines]
>data, since Function Y might have a very different idea what groupings
>should be imposed.

Here again, the noun/adjective paradigm just seems to work a treat ...

>> While I can easily view the MV structure as equal to
>> the relational structure, a true relational database does not give me
[quoted text clipped - 28 lines]
>somewhat-hierarchical Invoice/Order, the value of the trade diminishes even
>further.

Very easily. Pick metadata actually provides a very simple mechanism for
saying which fields are linked, and which are not.

For example, some of our Pick FILES are broken up into three or four
tables for export to ODBC. It's not a problem at all.
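
A rough sketch of what such a split might look like, written in Python rather than anything Pick-specific. The `associations` mapping is a hypothetical stand-in for the dictionary metadata; real Pick dictionaries express associations differently, and the field names and sample values are invented:

```python
# Hypothetical metadata: which multivalued fields move together.
associations = {
    "invoice_part": ["parts", "ship_dates"],  # Part[N] goes with Date[N]
    "invoice_payment": ["payments"],          # independent of the parts
}

invoice = {
    "id": "INV1",
    "customer": "Acme",
    "parts": ["widget", "gadget"],
    "ship_dates": ["2004-06-01", "2004-06-03"],
    "payments": ["100.00"],
}

def to_tables(rec, groups):
    """Split one nested record into flat tables, one per association."""
    grouped = {f for fields in groups.values() for f in fields}
    # Single-valued fields stay in the header table.
    header = {k: v for k, v in rec.items() if k not in grouped}
    tables = {"invoice": [header]}
    for table, fields in groups.items():
        columns = [rec[f] for f in fields]
        # zip pairs up the Nth value of every field in the association.
        tables[table] = [(rec["id"],) + values for values in zip(*columns)]
    return tables

tables = to_tables(invoice, associations)
print(tables["invoice_part"])
# [('INV1', 'widget', '2004-06-01'), ('INV1', 'gadget', '2004-06-03')]
print(tables["invoice_payment"])
# [('INV1', '100.00')]
```

The split is mechanical only because the association is declared somewhere; without that declaration the positional pairing of parts and dates would be invisible to the export.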

>4. Given that you do various queries, and that the database answers lots of
>questions, what value is it for the database to "know" it's a single
[quoted text clipped - 5 lines]
>Warehouse which "contains" the parts it shipped and which customers bought
>them? Reporting and GUI generation, here I come...

Statistics says that if you want to know one thing about an object, then
the chances are high that you want to know several things about that
object.

Okay, if the query is "please list all customers who bought part X",
then Pick gains nothing over relational. But if the query is "please
list all parts invoiced on date Y" then Pick gains big time - merely by
asking "what invoices are dated Y" I get all the data I want for free as
a side effect. If there are ten invoices, I need ten data accesses to
get all the invoice details. Relational needs ten data accesses to get
the invoice numbers from the date, then loads more accesses to get the
actual items from the invoice numbers.
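
To put numbers on that argument, here is a toy cost model in Python. It is a sketch under loud assumptions: one "access" per record fetched, no cache, and the function names are invented. It is not how any real DBMS counts I/O, and the third function shows how much the answer depends on physical clustering:

```python
# Toy cost model: one "access" per record fetched, no caching.

def mv_accesses(invoices):
    # One nested record per invoice already holds its line items.
    return len(invoices)

def naive_relational_accesses(invoices):
    # One access per invoice header row, plus one per line-item row,
    # assuming item rows are scattered (no clustering at all).
    return len(invoices) + sum(len(items) for items in invoices.values())

def clustered_relational_accesses(invoices):
    # Assumption: item rows are clustered by invoice number, so one
    # extra access per invoice fetches all of its items.
    return 2 * len(invoices)

# Ten invoices dated Y, five line items each (made-up figures).
invoices = {n: ["part-%d" % i for i in range(5)] for n in range(10)}

print(mv_accesses(invoices))                    # 10
print(naive_relational_accesses(invoices))      # 60
print(clustered_relational_accesses(invoices))  # 20
```

Under these assumptions the gap between the first and second figures is the claim being made here; the third figure is the kind of physical design choice the other side of the thread says the DBMS should make.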

Relational needs an optimiser - Pick gets it for free ... and the stats
say that on average it pays off handsomely :-)

Cheers,
Wol

Tony - 10 Jun 2004 10:32 GMT
> Relational needs an optimiser - Pick gets it for free ... and the stats
> say that on average it pays off handsomely :-)

Pick doesn't have a "free" optimizer: it just doesn't have one.  So
either there is only one access path, and too bad if it isn't optimal,
or you the designer/programmer are playing the part of the optimizer
yourselves (and I don't suppose your time comes free).
Bill H - 10 Jun 2004 23:42 GMT
Tony:

> Pick doesn't have a "free" optimizer: it just doesn't have one.  So
> either there is only one access path, and too bad if it isn't optimal,
> or you the designer/programmer are playing the part of the optimizer
> yourselves (and I don't suppose your time comes free).

I would ask: is the developer more efficient optimizing application data or
is the DBA?  A Pick-like dbms asks the developer to optimize while RDBMS
products ask the DBA to optimize (or are self-optimizing based on queries).

Bill
Anthony W. Youngman - 15 Jun 2004 01:04 GMT
>> Relational needs an optimiser - Pick gets it for free ... and the stats
>> say that on average it pays off handsomely :-)
[quoted text clipped - 3 lines]
>or you the designer/programmer are playing the part of the optimizer
>yourselves (and I don't suppose your time comes free).

So our time isn't free ... but optimisation is inherent in the design.
We don't even think about it - it just happens ...

Cheers,
Wol

Dawn M. Wolthuis - 15 Jun 2004 02:25 GMT
> >> Relational needs an optimiser - Pick gets it for free ... and the stats
> >> say that on average it pays off handsomely :-)
[quoted text clipped - 6 lines]
> So our time isn't free ... but optimisation is inherent in the design.
> We don't even think about it - it just happens ...

Having spent some years trying to teach PICK folks to use SQL as a query
language (don't worry, I saw the light) I can definitely vouch for this as a
MAJOR difference between the MV query language and SQL.  Users who had been
accustomed to MV Query languages couldn't believe they had to work so hard
to get a good SQL query from the same data.  Of course, there are better
optimizers than what they had in the product at that time, but nothing holds
a candle to the query language they were used to.

I'm hopeful that someone will write an implementation of GIRLS (mv query)
that goes against XML data so others can benefit from the language.  It
needs some updating since it hasn't changed much in 40 years, but it sure
beats XQuery for ease of use by a hardly-trained human.

Cheers!  --dawn
mAsterdam - 15 Jun 2004 02:26 GMT
>>> Relational needs an optimiser - Pick gets it for free ... and the stats
>>> say that on average it pays off handsomely :-)
[quoted text clipped - 6 lines]
> So our time isn't free ... but optimisation is inherent in the design.
> We don't even think about it - it just happens ...

Magic! Yey!
Bill H - 05 Jun 2004 17:49 GMT
Let's look at

[snipped]

> And just because relational may be simpler than codasyl doesn't mean
> that it's a good thing. We have a real-world problem here ... look at
> the following mapping ...
>
> real world <=> business analysis <=> database

Here's a visual example:

:LIST APHIST '340*VR11-1' INVDATE DESC ACCTS AMTS

APHIST.... INV-DATE Description......................... Acct.. Acct/Amts....
340*VR11-1 11-01-00 LOAN PAYMENT, NOV                    5170         999.22-
                                                         3370       1,977.23-
                                                         5170           0.00
                                                         3370           0.00

This is a single A/P invoice with multiple G/L account#s defined (and
amounts).  This invoice will update four G/L accounts in the general ledger
and financial statements by the amount "associated" with each account#.
These are the properties (some anyway) of this invoice.  The invoice is
_not_ like:

:LIST APHIST '340*VR11-1' BY-EXP ACCTS INVDATE DESC ACCTS AMTS

APHIST.... INV-DATE Description................................ Acct... Acct/Amts....
340*VR11-1 11-01-00 LOAN PAYMENT, NOV               3370      1,977.23-
340*VR11-1 11-01-00 LOAN PAYMENT, NOV               3370          0.00
340*VR11-1 11-01-00 LOAN PAYMENT, NOV               5170        999.22-
340*VR11-1 11-01-00 LOAN PAYMENT, NOV               5170          0.00

To alter the invoice, when preparing it for data storage, is to alter its
fundamental structure.  Although, in many instances it can be put back
together again, sometimes it cannot.  In addition, there's a lot of work to
go through in order to decompose then recompose this invoice.  This
additional complexity, I think, is unnecessary and costly.
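
A small Python sketch of that decompose/recompose round trip, using the values from the listing above. The function names and the record layout are invented for illustration; this is not how Pick or any SQL product stores the invoice. It shows one case where the pieces do not go back together unless the original position of each value is carried along:

```python
record = {
    "id": "340*VR11-1",
    "inv_date": "11-01-00",
    "desc": "LOAN PAYMENT, NOV",
    "accts": ["5170", "3370", "5170", "3370"],
    "amts": ["999.22-", "1,977.23-", "0.00", "0.00"],
}

def explode(rec, keep_position=False):
    """One flat row per associated (account, amount) pair."""
    rows = []
    for pos, (acct, amt) in enumerate(zip(rec["accts"], rec["amts"])):
        row = (rec["id"], rec["inv_date"], rec["desc"], acct, amt)
        rows.append(row + (pos,) if keep_position else row)
    return rows

def regroup(rows, by_position=False):
    """Rebuild the nested record from its flat rows."""
    ordered = sorted(rows, key=lambda r: r[5]) if by_position else rows
    first = ordered[0]
    return {
        "id": first[0], "inv_date": first[1], "desc": first[2],
        "accts": [r[3] for r in ordered],
        "amts": [r[4] for r in ordered],
    }

# Sorting by account mimics the BY-EXP ACCTS listing above.
by_acct = sorted(explode(record), key=lambda r: r[3])
print(regroup(by_acct) == record)   # False: the original value order is gone

# Carrying an explicit position column makes the round trip lossless.
with_pos = sorted(explode(record, keep_position=True), key=lambda r: r[3])
print(regroup(with_pos, by_position=True) == record)   # True
```

So the order of the values is itself data here; a flat representation has to store it as an extra column or lose it.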

Then again, cost and complexity may not be an issue for some.

Bill
Alfredo Novoa - 02 Jun 2004 15:08 GMT
>And Occam's Razor (the Einstein version iirc) says "make things as
>simple as possible, BUT NO SIMPLER".

It is simpler to build information systems with the relational
approach than with any other approach.

Your quote is out of context because it is related to physical
theories and not about engineering approaches. You always make the
same mistakes.

Regards
 Alfredo
Dawn M. Wolthuis - 02 Jun 2004 16:01 GMT
> >And Occam's Razor (the Einstein version iirc) says "make things as
> >simple as possible, BUT NO SIMPLER".
>
> It is simpler to build information systems with the relational
> approach than with any other approach.

What are the other approaches you have tried?  --dawn
<snip>
Alfredo Novoa - 02 Jun 2004 17:27 GMT
>> It is simpler to build information systems with the relational
>> approach than with any other approach.
>
>What are the other approaches you have tried?  --dawn
><snip>

To process files in applications, business objects, xBase, ADO.NET,
SQL, etc.

Regards
 Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 13:26 GMT
>>Rationalism is as irrational(/rational) as any other faith.
>
> What a nonsense!

    No, it is not.  Rationalism, usually defined as the belief
that human reasoning is the sole or sufficient test of truth, is
circular reasoning.

    Now, this is Philosophy in general.  In Natural Philosophy,
also known as Science, reason is indeed our only resource.  It just
doesn't hold water when erected as the sole basis of Epistemology,
Metaphysics and Ethics.

Signature

Leandro Guimarães Faria Corsetti Dutra           +55 (11) 5685 2219
Av Sgto Geraldo Santana, 1100 6/71        leandro@dutra.fastmail.fm
04.674-000  São Paulo, SP                                    BRASIL
http://br.geocities.com./lgcdutra/

Alfredo Novoa - 31 May 2004 15:55 GMT
>>>Rationalism is as irrational(/rational) as any other faith.
>>
>> What a nonsense!
>
>    No, it is not.

It is incredible nonsense, some of the biggest nonsense written here.

>  Rationalism, usually defined as the belief
>that human reasoning is the sole or sufficient test of truth, is
>circular reasoning.

Wrong.

<quote>

Rationalism, also known as the rationalist movement, is a
philosophical doctrine that asserts that the truth should be
determined by reason and factual analysis, rather than faith, dogma or
religious teaching.

</quote>

http://en.wikipedia.org/wiki/Rationalist

It is only common sense, the least common of the senses.

Faith is a form of superstition.

<quote>

Superstition is a term used to refer to a set of behaviors that may be
faith based, or related to magical thinking, whereby the practitioner
believes that the future, or the outcome of certain events, can be
influenced by certain of his or her behaviors.

Critics argue that superstition is not based on reason, but instead
springs from religious feelings that are misdirected or unenlightened,
which leads in some cases to rigor in religious opinions or practice,
and in other cases to belief in extraordinary events or in charms,
omens, and prognostics. Many superstitions can be prompted by
misunderstandings of causality or statistics.

</quote>

http://en.wikipedia.org/wiki/Superstition

>    Now, this is Philosophy in general.  In Natural Philosophy,
>also known as Science, reason is indeed our only resource.

Of course not!

In science the only resource is observation. Reason is used to
understand the observations and to make predictions.

Regards
 Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 16:53 GMT
>>    Now, this is Philosophy in general.  In Natural Philosophy,
>>also known as Science, reason is indeed our only resource.
[quoted text clipped - 3 lines]
> In science the only resource is observation. Reason is used to
> understand the observations and to make predictions.

    Ergo, observations are only useful in the presence of reason.
This makes reason, as sustained by reasonable faith, our most
fundamental resource in Natural Philosophy, as we wished to
demonstrate.


Anthony W. Youngman - 01 Jun 2004 23:37 GMT
>[snip]
>>>It is in the leap from doing relational theory to thinking that
[quoted text clipped - 14 lines]
> - What exactly was proven?
> - Could you please give a reference?

And by its very definition, maths proves nothing about the real world.
Just because relational databases are perfect (as indeed they are) at
modelling "data as defined by the relational model", that says nothing
about whether the real world can be described by "data" that fits the
mathematical definition.

Cheers,
Wol

Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 13:17 GMT
> I am completely opposed to faith and other forms of irrationalism. The
> Relational Model is maths not irrational faith.

    This is OT, but there is such a thing as rational faith.  You
use it everyday to function in society.


Alfredo Novoa - 31 May 2004 15:55 GMT
>> I am completely opposed to faith and other forms of irrationalism. The
>> Relational Model is maths not irrational faith.
>
>    This is OT

I disagree.

>, but there is such a thing as rational faith.  You
>use it everyday to function in society.

No, faith is irrational by definition. If it is rational then it is
not faith. It is deduction based on incomplete information.

Regards
 Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 16:51 GMT
>>    This is OT
>
> I disagree.

    Too bad, as one of the luminaries of this newsgroup you've
just given permission for me to argue.

>> but there is such a thing as rational faith.  You
>>use it everyday to function in society.
>
> No, faith is irrational by definition. If it is rational then it is
> not faith. It is deduction based on incomplete information.

    You are mixing up the concepts of religion and faith.

    Religion is, quite by definition, based on faith.  But there
are quite a lot of other instances of faith besides religious ones,
even if you don't differentiate institutional from spiritual religion.

    The reason for this is the long association of faith and
religion, or more generally spirituality and authority, to the point
that even the dictionaries carry only the contaminated definition.

    Faith originally was not opposed to reason, but to vision.
One has faith not necessarily because one accepts another's authority,
but because one continues to believe something even without presently
seeing proof of it.

    For example, I have no real assurance the building I am
working in won't crumble while I write these lines.  But based on its
apparently solid form, the fact that buildings don't usually crumble
without first showing some signs as cracks and shifts, and that the
municipality has taken some reasonable steps to check the engineers'
work, I have a reasonable faith in being able to leave for home
unharmed from falling bricks.

    In another example, I have no way to reasonably prove the meal
my wife serves me isn't poisoned.  But I have faith in her, and good
indications in her past behaviour and my knowledge of her psyche, that
it isn't.

    To carry this into the Informatics realm, I can't really
remember all the time all the proofs of why we need RDBMSs.  But I've
seen the proofs, and based on my admittedly partial remembrance of them
I have faith this is the way to go.

    Without faith, even Science wouldn't be possible.  A scientist
has faith his reason is reasonably functioning, and so is his
colleagues'.  He can't keep all the proofs of this in his mind all the
time.

    The erection of Science as independent of the very basis of
human reasoning is sub-scientific, and ultimately dangerous to Science
itself.

    Well, I still maintain all this is OT.  But surely Philosophy
101 isn't wasted time.


Alfredo Novoa - 31 May 2004 19:48 GMT
>    Faith originally was not opposed to reason

Indeed, it was a "solution" used when reason had no answers. Faith
is related to magic and authority.

Fortunately, reason has a lot more answers than it had in
prehistory.

>One has faith not necessarily because one accepts another's authority,
>but because one continues to believe something even without presently
>seeing proof of it.

No, proofs are not the only reasons. You are confusing faith with
deduction based on incomplete information.

Like many other words, though, faith has many meanings, such as fidelity
(faithful implementation), and it is even used for reason-based
confidence. But I don't think we are talking about these loose uses of
the term.

The pickies use the term in order to equate The Relational Model and
the primitive approaches. Here is their fallacy:

Irrationalism is faith, rationalism is faith so irrationalism is as
good as rationalism therefore The Relational Model is not better than
any primitive ad hoc approach.

The flaws of the argument are rather obvious.

>    For example, I have no real assurance the building I am
>working in won't crumble while I write these lines.  But based on its
[quoted text clipped - 3 lines]
>work, I have a reasonable faith in being able to leave for home
>unharmed from falling bricks.

And it has nothing to do with the strict sense of faith. You have a
reasonable knowledge with a very high degree of certainty. You are
making good use of the available information and applying maths.

When you say that you have no real assurance about the building, you
are saying that you do not have faith in it. Faith is all or nothing.

Faith would be to have absolute certainty that the building
will collapse without any previous sign or logical reason, because it
was revealed.

>    In another example, I have no way to reasonably prove the meal
>my wife serves me isn't poisoned.  But I have faith in her

You have confidence in her among other things because you know that
she has no reasons to kill you and good reasons to not do it.

>, and good
>indications in her past behaviour and my knowledge of her psyche, that
>it isn't.

Correct. That's why you have reason and experience based knowledge and
not faith.

>    To carry this into the Informatics realm, I can't really
>remember all the time all the proofs of why we need RDBMSs.  But I've
>seen the proofs, and based on my admittedly partial remembrance of them
>I have faith this is the way to go.

You are using the term very loosely. IMO it is not a good use of the
term and you are playing the troll's game.

This is faith according to the strict meaning of the word:

<quote>
"We believe", says the Vatican Council (III, iii), "that revelation is
true, not indeed because the intrinsic truth of the mysteries is
clearly seen by the natural light of reason, but because of the
authority of God Who reveals them, for He can neither deceive nor be
deceived."
</quote>

http://en.wikipedia.org/wiki/Faith

So what the pickies want to say is that we don't promote The
Relational Model because of the measurable advantages it has, but because
of the authority of Codd (that rhymes with God) and his apostle Date Who
reveal them, for They can neither deceive nor be deceived. :-)

Regards
 Alfredo
Leandro Guimaraens Faria Corsetti Dutra - 31 May 2004 20:13 GMT
> You are using the term very loosely

    No I am not.  I am giving it its original meaning.

    You are dealing with current reinterpretations focusing on
spirituality and authority.


Dawn M. Wolthuis - 31 May 2004 20:35 GMT
> > Faith originally was not opposed to reason
>
[quoted text clipped - 22 lines]
> good as rationalism therefore The Relational Model is not better than
> any primitive ad hoc approach.

I was with you until you seem to have jumped on some irrationalism bandwagon
with that last paragraph.  Given that my heavy programming years were not
spent with PICK, I'm not a real pickie (and I also don't know if they call
themselves that -- I have used that term as short-hand, but I'm not sure who
else might).  But I'll classify myself a pickie, if I may, because I see
that the implementations based on the Nelson-PICK efforts (which appear to
be based on the Postley-Buettell work) provide very cost-effective solutions
for business, especially when compared to those claiming to be based on the
relational model.

I have worked with a number of different environments, and my opinion from
experience is that PICK provides a big bang for the buck and runs "lean and
mean" compared to other solutions.  I do not claim that I can prove it is
better in any way -- it seems better.  Intuition is not irrational -- it is
based on our brains working in such a way as to arrive at a hypothesis (not
proof!) that we might be incapable of defending through any logic, at least
at some point in time.

This is NOT irrational thinking.  Agreed?

--dawn

<snip>
Dawn M. Wolthuis - 01 Jun 2004 04:04 GMT
> >> I am completely opposed to faith and other forms of irrationalism. The
> >> Relational Model is maths not irrational faith.
[quoted text clipped - 8 lines]
> No, faith is irrational by definition. If it is rational then it is
> not faith. It is deduction based on incomplete information.

Faith does not base decisions of truth or falsity on rationality, but it
need not be irrational -- perhaps a-rational would be more accurate?  For
example if one were to believe that rationality were the only means to
determining truth, that might be wrong or it might be outside of rationality
for this believer to arrive at that conclusion, but I don't think it is
irrational to hold to such a belief.  Similarly for other religions.

--dawn
Paul - 02 Jun 2004 00:16 GMT
>> To go back to your favourite analogy (apologies everyone), it is
>> like saying that algebra was responsible for the shortcomings of
[quoted text clipped - 17 lines]
> except where "the proof is in the pudding" -- scientific observation,
> for example).

No, I think in this analogy Newton's model does correspond to a specific
database design. The possibility that the relational theory itself is
wrong corresponds to the possibility that algebra is wrong.

>> Einstein didn't invent a better algebra, he designed a better model
>> using the SAME algebra - like a later designer designing a better
[quoted text clipped - 3 lines]
> functional theory that is better than the relational theory before
> it.

I think that would correspond to Einstein inventing a better algebra for
his model. I believe tensor algebra was actually used for relativistic
mechanics, but it's not really an improvement to standard algebra, just
a different example of it.

I think you're mistaking the theory of (algebra, relational databases,
logic) itself with theories that can be developed in them, for example
(Newtonian mechanics, payroll database, relativistic mechanics). We've
got two levels of theories here.

Paul.
mountain man - 02 Jun 2004 02:12 GMT
> No, I think in this analogy Newton's model does correspond to a specific
> database design. The possibility that the relational theory itself is
> wrong corresponds to the possibility that algebra is wrong.

Or incomplete, as has been formally demonstrated
at least 30 years prior to the emergence of the RM.

Pete Brown
Falls Creek
Oz
mAsterdam - 02 Jun 2004 09:24 GMT
>>No, I think in this analogy Newton's model does correspond to a specific
>>database design. The possibility that the relational theory itself is
>>wrong corresponds to the possibility that algebra is wrong.
>
> Or incomplete, as has been formally demonstrated
> at least 30 years prior to the emergence of the RM.

What do you mean? Do you mean that incomplete is wrong?
Or that the Goedel-incompleteness somehow implies that
any specific database design/relational theory must be
incomplete/wrong? What are you saying?
mountain man - 02 Jun 2004 13:23 GMT
> >>No, I think in this analogy Newton's model does correspond to a specific
> >>database design. The possibility that the relational theory itself is
[quoted text clipped - 7 lines]
> any specific database design/relational theory must be
> incomplete/wrong? What are you saying?

http://www.mountainman.com.au/GIF/logic_space_1.jpg
In  reference to the above diagram:

Starting with a given set of axioms (yellow), one can derive
a specific set of formalised "provable truths" (green).  This
logic space of provable truth however is not all there is to
the notion of truth.

Godel showed that there exist "unprovable truths" in all
mathematical systems, which are valid and true, but which
are not capable of being referenced by the foundational
axioms.  More recently Chaitin showed that there exist
"random truths", which are valid and true, but which require
no reference to any axioms. (purple)

Here is a recent (2000) transcript of a talk given by Chaitin
on the relevant details of the history of these developments:
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html
entitled Historical Introduction --- A Century of Controversy
Over the Foundations of Mathematics

IMO it implies that the complete notion of whatever-it-is-that
-is-truth cannot be encapsulated in any traditional mathematical
language using the traditional axiomatic methodology everyone
has been spoon fed the last few hundred years, and that another
approach is required, in the long run.  This includes algebra.

How does this apply to relational database theory and the
Relational Model, and tables and row values?  There will
necessarily exist example truths such as those defined above
that exist independent of the relational model, and which are
not addressable by the model.

I believe that an example of this is:

The intelligence (ie: data) that is encoded in (application level)
SQL code captured in RDBMS stored procedures exists right
alongside the data, and the constraints, etc.  While the RM and
theory address the data and constraints, etc, the intelligence
(which is data) of the application level processes cannot be
formally addressed by it, even though it consists of valid SQL
statements expressing manipulations of perfectly valid data
objects known to the model and theory.

Pete Brown
Falls Creek
Oz
mAsterdam - 02 Jun 2004 14:50 GMT
>>>>No, I think in this analogy Newton's model does correspond to a specific
>>>>database design. The possibility that the relational theory itself is
>>>>wrong corresponds to the possibility that algebra is wrong.
>>>
>>>Or incomplete, as has been formally demonstrated
>>>at least 30 years prior to the emergence of the RM.

[snip]

> ...  There will
> necessarily exist example truths such as those defined above
> that exist independent of the relational model, and which are
> not addressable by the model.

Indeed. And you even go on looking for such truths. Chapeau.

> I believe that an example of this is:
>
[quoted text clipped - 6 lines]
> statements expressing manipulations of perfectly valid data
> objects known to the model and theory.

Some of it may be capturable in the model by redefining the
model - but this does not invalidate your statement.
Here is another example:
http://www.essentialstrategies.com/documents/brules.pdf
mountain man - 03 Jun 2004 04:20 GMT
> >>>>No, I think in this analogy Newton's model does correspond to a specific
> >>>>database design. The possibility that the relational theory itself is
[quoted text clipped - 27 lines]
> Here is another example:
> http://www.essentialstrategies.com/documents/brules.pdf

Looks like an interesting article.  Many thanks
for the reference.

Pete Brown
Falls Creek
Oz
Alfredo Novoa - 02 Jun 2004 15:08 GMT
>Godel showed that there exist "unprovable truths" in all
>mathematical systems, which are valid and true, but which
>are not capable of being referenced by the foundational
>axioms.  More recently Chaitin showed that there exist
>"random truths", which are valid and true, but which require
>no reference to any axioms. (purple)

This means that there are true statements you cannot derive from the
axioms.

>How does this apply to relational database theory and the
>Relational Model, and tables and row values?

It means that perhaps you can find unprovable relational theorems.
Nothing less and nothing more.

>I believe that an example of this is:
>
[quoted text clipped - 6 lines]
>statements expressing manipulations of perfectly valid data
>objects known to the model and theory.

No, you don't understand anything. It has nothing to do
with what Godel said.

BTW have you read The Third Manifesto?

It has many pages devoted to the integration of The Relational Model
with procedural programming (stored procedures). Just what you want to
address.

Regards
 Alfredo
Paul - 03 Jun 2004 00:21 GMT
> Godel showed that there exist "unprovable truths" in all
> mathematical systems, which are valid and true, but which
> are not capable of being referenced by the foundational
> axioms.

Not exactly, Godel's Incompleteness Theorem only applies to theories or
systems that are above a certain complexity. See here for example:
http://www.sm.luth.se/~torkel/eget/godel/complete.html

There are certainly complete theories, for example the theories of real
numbers, of complex numbers, and of Euclidean geometry. In these
theories there are no truths that cannot be proved within the system.

> IMO it implies that the complete notion of whatever-it-is-that
> -is-truth cannot be encapsulated in any traditional mathematical
> language using the traditional axiomatic methodology everyone
> has been spoon fed the last few hundred years, and that another
> approach is required, in the long run.  This includes algebra.

Are you sure? I can't find any definite links to a completeness result
for algebra, but it's quite a simple system compared to the one for the
whole of arithmetic, so I'd be surprised if it wasn't complete.

> How does this apply to relational database theory and the
> Relational Model, and tables and row values?  There will
> necessarily exist example truths such as those defined above
> that exist independent of the relational model, and which are
> not addressable by the model.

Check out this article on Codd's 1972 paper "Relational Completeness of
Data Base Sublanguages":
http://www.intelligententerprise.com/db_area/archives/1999/990501/online.jhtml

Unfortunately I can't find an online version of Codd's original paper,
but he appears to prove that relational algebra is complete. Whether
this is "completeness" used in exactly the same sense as Godel's
Incompleteness Theorem I'm not quite sure though.

> I believe that an example of this is:
>
[quoted text clipped - 6 lines]
> statements expressing manipulations of perfectly valid data
> objects known to the model and theory.

Do you have a simple concrete example of what you mean by this?
What kind of stored procedures are you thinking of?
Plain single SELECT statements?
Or a series of INSERTs, UPDATEs and DELETEs that do some business
process? In this latter case you've got procedural code and I think it
should be possible to replace it with declarative code. It's difficult
to talk about without an example though.
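
To make the procedural/declarative distinction concrete, here is a minimal
sketch in Python with an in-memory SQLite database. The account table, the
10000 threshold and the function names are all made up for illustration;
they are not taken from anyone's real system. The first function walks the
rows one at a time in application code, the second states the same rule as
a single set-based UPDATE.

```python
import sqlite3


def make_db():
    # Hypothetical schema, invented for this illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, tier TEXT)"
    )
    conn.executemany(
        "INSERT INTO account (id, balance, tier) VALUES (?, ?, ?)",
        [(1, 500, "basic"), (2, 15000, "basic"), (3, 20000, "basic")],
    )
    return conn


def promote_procedural(conn):
    # Row-at-a-time: fetch every row, test it in application code, write back.
    rows = conn.execute("SELECT id, balance FROM account").fetchall()
    for acc_id, balance in rows:
        if balance >= 10000:
            conn.execute("UPDATE account SET tier = 'gold' WHERE id = ?", (acc_id,))


def promote_declarative(conn):
    # One statement that states the rule and leaves the iteration to the DBMS.
    conn.execute("UPDATE account SET tier = 'gold' WHERE balance >= 10000")


def tiers(conn):
    return conn.execute("SELECT id, tier FROM account ORDER BY id").fetchall()


db_a, db_b = make_db(), make_db()
promote_procedural(db_a)
promote_declarative(db_b)
result_procedural = tiers(db_a)
result_declarative = tiers(db_b)
print(result_procedural == result_declarative)
```

Both versions leave the table in the same state here; whether every real
business process reduces this neatly is exactly what an example would have
to show.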

Paul.
mountain man - 03 Jun 2004 04:20 GMT
> > Godel showed that there exist "unprovable truths" in all
> > mathematical systems, which are valid and true, but which
[quoted text clipped - 8 lines]
> numbers, of complex numbers, and of Euclidean geometry. In these
> theories there are no truths that cannot be proved within the system.

I think you should double-check the above.  Godel's incompleteness
theorem was a statement in elementary number theory (arithmetic).
Here is a reference:
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html

> > IMO it implies that the complete notion of whatever-it-is-that
> > -is-truth cannot be encapsulated in any traditional mathematical
[quoted text clipped - 5 lines]
> for algebra, but it's quite a simple system compared to the one for the
> whole of arithmetic, so I'd be surprised if it wasn't complete.

See above.

> > How does this apply to relational database theory and the
> > Relational Model, and tables and row values?  There will
[quoted text clipped - 4 lines]
> Check out this article on Codd's 1972 paper "Relational Completeness of
> Data Base Sublanguages":

http://www.intelligententerprise.com/db_area/archives/1999/990501/online.jhtml

> Unfortunately I can't find an online version of Codd's original paper,
> but he appears to prove that relational algebra is complete. Whether
[quoted text clipped - 19 lines]
> should be possible to replace it with declarative code. It's difficult
> to talk about without an example though.

Well here is an example that goes to the extreme:
http://www.mountainman.com.au/software/southwind/

The entire (100% of) application software suite is in the
form of stored procedures.  No intelligence specific to the
organization is stored external to the RDBMS software
layer.

The Relational Model of the data needs expansion to
be able to address this configuration of organizational
intelligence.  It remains mute to this intelligence.

Pete Brown
Falls Creek
Oz
Paul - 03 Jun 2004 16:41 GMT
>> Not exactly, Godel's Incompleteness Theorem only applies to
>> theories or systems that are above a certain complexity. See here
[quoted text clipped - 9 lines]
> Here is a reference:
> http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html

I'm fairly sure I'm right. The axioms of number theory or Peano
arithmetic are more complicated than those for the theories of real
numbers or Euclidean geometry. I think the important ones are probably
the "inductive" ones (every number has a successor, and different
numbers have different successors). Although it seems intuitively that
the theory of real numbers must somehow be a superset of the Peano
arithmetic, that's not the case. In the theory of real numbers, you
don't have the concept of a successor function, they're all on a
continuum and (maybe paradoxically) this makes it simpler.

Here is something about Tarski's completeness proof for Euclidean
geometry: http://www.math.psu.edu/simpson/papers/philmath/node15.html

>> Do you have a simple concrete example of what you mean by this?
>> What kind of stored procedures are you thinking of? Plain single
[quoted text clipped - 6 lines]
> Well here is an example that goes to the extreme:
> http://www.mountainman.com.au/software/southwind/

I still don't quite see what the killer point is here. Is it that your
application just has a single form to display all data? And the form
knows what views to use by consulting a table? How is this different to
the "Switchboard Manager" functionality that comes with Access for example?
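
To pin down what I mean by a form that "knows what views to use by
consulting a table", here is a rough sketch in Python with SQLite. It is my
guess at the general idea only, not the southwind code; the menu table, the
view names and the show function are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    INSERT INTO customer VALUES (1, 'Acme', 'Sydney'), (2, 'Globex', 'Perth');
    CREATE VIEW v_customer_names AS SELECT id, name FROM customer;
    CREATE VIEW v_customer_cities AS SELECT name, city FROM customer;
    -- Hypothetical menu table: the single generic form reads it to learn
    -- which view belongs to which menu entry.
    CREATE TABLE menu (label TEXT PRIMARY KEY, view_name TEXT NOT NULL);
    INSERT INTO menu VALUES ('Names', 'v_customer_names'),
                            ('Cities', 'v_customer_cities');
    """
)


def show(conn, label):
    """Generic 'form': look the view up in the menu table, then return its data."""
    row = conn.execute(
        "SELECT view_name FROM menu WHERE label = ?", (label,)
    ).fetchone()
    if row is None:
        raise KeyError(label)
    # The identifier comes from the menu table, not from user input;
    # it is still quoted defensively.
    quoted = '"{}"'.format(row[0].replace('"', '""'))
    cur = conn.execute("SELECT * FROM " + quoted)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


columns, rows = show(conn, "Names")
print(columns, rows)
```

Adding a new screen then means adding a view and a row in menu, with no
change to the client code.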

> The entire (100% of) application software suite is in the form of
> stored procedures.  No intelligence specific to the organization is
> stored external to the RDBMS software layer.

Isn't this the same as what is done by any of the front-end GUIs you get
with most DBMSs? For example SQL Server's Enterprise Manager? It's
totally general, all the information it needs is in relations in the
database. In theory you could just give every database user a copy of
Enterprise Manager and it would suffice for any database, for any
purpose. It just wouldn't be very user-friendly.

I don't think that applications store any essential business knowledge,
they are just there to make things easier for the user. If you like they
are non-essential business knowledge. For example "Users of the
accounting form 53(a) don't need to see the 'foobar' column". I'm
assuming here that the foobar column isn't actually restricted for
security purposes; it's just not useful for some particular task.

Paul.
Torkel Franzen - 04 Jun 2004 09:38 GMT
> I think you should double-check the above.  Godel's incompleteness
> theorem was a statement in elementary number theory (arithmetic).

 Statements of arithmetic cannot be expressed in the language of the
theory of the real field. This is because although the natural numbers
are a subset of the real numbers, you cannot define "(the real number)
x is a natural number" using only +,*,0,1,= and quantification over
the real numbers.
Paul - 02 Jun 2004 09:37 GMT
>>No, I think in this analogy Newton's model does correspond to a specific
>>database design. The possibility that the relational theory itself is
>>wrong corresponds to the possibility that algebra is wrong.
>
> Or incomplete, as has been formally demonstrated
> at least 30 years prior to the emergence of the RM.

Do you mean that algebra as in:
http://en.wikipedia.org/wiki/Universal_algebra
is incomplete?

I've never come across this, what does it mean, do you have a link?

A confusion here is that our analogy is comparing things from different
levels. Algebra is usually thought of as a model, so it comes above
logic in "reality". In our analogy, we are comparing it to logic, which
is fine as it's only an analogy. But I don't think the analogy extends
to having a completeness theorem for algebra.

Paul.
Tony - 02 Jun 2004 10:23 GMT
> > No, I think in this analogy Newton's model does correspond to a specific
> > database design. The possibility that the relational theory itself is
> > wrong corresponds to the possibility that algebra is wrong.
>
> Or incomplete, as has been formally demonstrated
> at least 30 years prior to the emergence of the RM.

"Wrong" and "incomplete" are not synonyms.  If algebra is correct but
incomplete, then it is safe to use it right?  There may be a question
it can't answer, but there are no questions for which it can give the
wrong answer.
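
A toy illustration of "correct but incomplete", sketched in Python. The
facts, the rule and the function names are invented for the example, and
this is only an analogy, not a model of algebra: the system answers True
only for what it can derive, and answers None ("can't say") for everything
else, so it never gives a wrong answer.

```python
def derive(facts, rules):
    """Forward chaining: apply (premises -> conclusion) rules until nothing new appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def ask(question, facts, rules):
    # True if derivable; None means "cannot answer", which is not the same as False.
    return True if question in derive(facts, rules) else None


facts = {"socrates_is_a_man"}
rules = [(("socrates_is_a_man",), "socrates_is_mortal")]

answer_derivable = ask("socrates_is_mortal", facts, rules)
answer_unknown = ask("socrates_is_greek", facts, rules)
print(answer_derivable, answer_unknown)
```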
Anthony W. Youngman - 04 Jun 2004 00:56 GMT
>>> To go back to your favourite analogy (apologies everyone), it is
>>> like saying that algebra was responsible for the shortcomings of
[quoted text clipped - 18 lines]
>database design. The possibility that the relational theory itself is
>wrong corresponds to the possibility that algebra is wrong.

Exactly.

And Newton's algebra is NOT wrong. It's just that the axioms (on which
he based his algebra) don't match reality. And that cannot be proved
from WITHIN the algebra.

So you cannot prove that relational theory is right or wrong from WITHIN
the theory.

>>> Einstein didn't invent a better algebra, he designed a better model
>>> using the SAME algebra - like a later designer designing a better
[quoted text clipped - 12 lines]
>(Newtonian mechanics, payroll database, relativistic mechanics). We've
>got two levels of theories here.

And we also have experiments to show that the axioms do (or don't)
accurately describe reality. Einstein showed that Newton's axioms didn't
describe reality, and replaced them by new axioms that did a better job.
He didn't alter Newton's algebra at all - indeed, he used exactly the
same algebra ...

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Paul - 04 Jun 2004 18:27 GMT
>> No, I think in this analogy Newton's model does correspond to a
>> specific database design. The possibility that the relational
[quoted text clipped - 6 lines]
> which he based his algebra) don't match reality. And that cannot be
> proved from WITHIN the algebra.

We have to be careful here with ambiguities of language. When I said the
"possibility that algebra is wrong" I meant the theory of algebra
itself, not some model that could be set up using the language of algebra.

Newton's algebra was not wrong, but what we mean here is the model that
Newton made using algebra as the meta-language wasn't wrong. Newton
didn't invent *an* algebra, he invented a model using algebra.

It's very confusing when you've got algebra as a kind of meta-language
for Newton's theory. But then also you have logic as a kind of
meta-language for algebra itself. In common speech we use the word
"algebra" to mean both the theory of algebra itself, and pieces of code
written in algebra.

> So you cannot prove that relational theory is right or wrong from
> WITHIN the theory.

That depends what you mean by "right"(!). First-order logic is your
meta-language for talking about your database. So you can show that it
will be "complete" in the senses mentioned before, because you are kind
of jumping "outside" the theory. What you can't do is show that it's
the best methodology for managing data.

> And we also have experiments to show that the axioms do (or don't)
> accurately describe reality. Einstein showed that Newton's axioms
> didn't describe reality, and replaced them by new axioms that did a
> better job. He didn't alter Newton's algebra at all - indeed, he used
> exactly the same algebra ...

OK, agreed.

Hmm, this stuff has the tendency to seem crystal-clear when you're
writing it, but then turns to gobbledygook when you re-read it...
:)

Paul.
Anthony W. Youngman - 28 May 2004 19:12 GMT
>> > So if you use Newtonian Mechanics to prove where Mercury was 400 years
>> > ago, your proof is more accurate than Tycho Brahe's observations - which
[quoted text clipped - 38 lines]
>to do that.  So, the proof that various aspects of relational theory have
>been good for use with DBMS's is not within mathematics.

Thanks, Dawn. It was Laconic, I think, who gave that wonderful quote
about "the axioms are self-evident, the logic is flawless, therefore the
experiments must be wrong" :-)

That's why you and I tear our hair out when people say relational is
best because it's based on mathematics! They're the people who assume
the axioms are self evident ...

And I think that's a major failing in current RDBMS thinking - nobody is
questioning the axioms. Unfortunately, to me, they are self-evidently
wrong...

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Anthony W. Youngman - 28 May 2004 19:07 GMT
>> So by definition the theory is unscientific because you cannot show
>>that  the dbms proof is true (or false) in real life.
>
>Given that your axioms and your interpretation are correct, then I
>think you can show the DBMS proof is true in real life (for the reasons
>given above and in previous posts).

What do you mean by interpretation? Do you mean the philosophy of data
by which you convert your mathematical description to a real-world
description?

>I know that the language used by logicians can seem very impenetrable
>but I think it does actually make sense; it's not just a conspiracy of
>people talking gibberish and pretending to understand each other.

But logic is a branch of mathematics. As such, it has nothing to do with
philosophy and the matching up of theory with reality. This is a matter
of science and experiment, not logic.

>I don't know how much you've read about logic but it is very
>mathematical and well worth the steep learning curve. Wikipedia is a
>good place to start. Be warned though: logicians do have a tendency to
>go insane in later life; it is a serious brainfuck if you think about
>it too much!

I can imagine :-)

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Paul - 30 May 2004 18:35 GMT
>> Given that your axioms and your interpretation are correct, then I
>> think you can show the DBMS proof is true in real life (for the
[quoted text clipped - 3 lines]
> by which you convert your mathematical description to a real-world
> description?

Yes.

>> I know that the language used by logicians can seem very impenetrable
>> but I think it does actually make sense; it's not just a conspiracy of
[quoted text clipped - 3 lines]
> philosophy and the matching up of theory with reality. This is a matter
> of science and experiment, not logic.

I agree.

I don't think what I'm saying is that controversial really - it's really
just what you would intuitively expect, expressed more rigorously in
mathematical jargon. Tony explains it very well in another branch of
this thread.

I'm not Pick-bashing either, what I'm saying is separate to relational
theory. It's just that because the relational model is based so closely
on first-order predicate logic, it's easier to see the connections.

I think what might be confusing is just terminology: the words "model"
and "theory" can mean slightly different things in different contexts.
From one point of view something could be regarded as a model, from
another it could be regarded as a theory. Neither is wrong, it's just
the context.

Paul.
Todd B - 25 May 2004 01:59 GMT
> Todd:
>
> Does this pass the "reasonableness" test?  The thought that: ...there are
> questions that can't be answered so they're meaningless and, thus, ignored
> (so the system is still complete) doesn't say much for consistency (i.e.
> anything that shows inconsistency is ignored so we still have consistency).

Yes, in a formal system, everything that fails to show consistency
becomes invalid (and is ultimately ignored or denied due to the fact
that it lacks basic logic - like: A proves B, and B proves C, but C
contradicts A - that would be a very simple example of an informal
system).  At least that's the way I understand it.

> With postulates like these, I'm depressed about getting A's in college logic
> and statistics classes, as they were obviously worthless.  :-)
>
> Bill

I wouldn't suggest that logic is 'worthless', or comment that any
questions that cannot be answered are 'meaningless'; just that logic
is not as complete as most logicians or mathematicians would like it
to be.

Now Paul has a point about first order logic - and its completeness -
that I would like to look into.  I'm still betting that my
interpretation of Godel's theorem is correct in the sense that 'Any
formal system that is consistent is definitely _not_ complete'.

Is any formal system useful?  That's a whole different argument to me,
because any formal or informal system can be put to use.  So, in a
way, there is still hope for us logical (and illogical) people :)

I guess I'm ranting about this topic and I'm sure everyone in this
group is hoping I'll shut up.  So, before they tell me that, and
before I step out of line (too late), since I don't have the
impressive logic and math background that some of you have, I sum up
Godel's Incompleteness theorem like so: "Within a formal system, there
are things that are true within that system that you cannot
prove/derive within that system".  Whew.

Todd
mountain man - 25 May 2004 11:42 GMT
Modern scholars of logic should be aware that whereas
the ancient western philosophers and logicians were not
aware of the limitations of the system of logic, their ancient
eastern counterparts (cf: Indian logic) in fact were aware
of its limitations.

And secondly, the following article might be of interest
to those who have been thinking about the implications
of the work of Godel, Turing and Chaitin in logic:

The article is a transcript of an address given by Chaitin:
http://www.cs.auckland.ac.nz/CDMTCS/chaitin/cmu.html

Best wishes,

Pete Brown
Falls Creek
Oz

> Todd:
>
[quoted text clipped - 18 lines]
> > > they are meaningful in a "real-world" sense, because we are thinking in
> > > a larger system which includes second-order logic.
Anthony W. Youngman - 25 May 2004 18:04 GMT
>> I suppose at least we would know that in theory, every query that it is
>> possible to formulate in some given relational query language can be
[quoted text clipped - 7 lines]
>requirement.  Is it 'complete', though?  I don't think so, but please
>prove me wrong or point me to some articles that do.

But that definition of "complete" only works if we can prove,
scientifically, that the system accurately describes the reality.

If we cannot show that "the system" and "the reality" match up with each
other (which we can't if we don't have a philosophical definition of
"data" in "the reality") then it's impossible for "the system" to be
complete ...

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

mountain man - 22 May 2004 00:06 GMT
...[trim]...

> > As I have
> > outlined, I have constructed an arrangement whereby all of E3
[quoted text clipped - 6 lines]
> arbitrary languages, and executed anywhere. It seems you see their value in
> their genericity, rather than in where they happen to execute.

I see value in avoiding redundancies.  All application code that relates
to database I/O that is external to the RDBMS environment requires
redundancy of definition of the database schema.  This is so because
the entire system spans two software environments (E2 and E3).

Current technology looks at this as the status quo.  The world
is used to defining things in a union and conjunction of two
separate software systems.  However it is very inefficient.

When things can be defined using one software layer alone,
the redefinitions referred to above no longer exist.

...[trim]...

Pete Brown
Falls Creek
Oz
Eric Kaun - 24 May 2004 14:22 GMT
> I see value in avoiding redundancies.  All application code that relates
> to database I/O that is external to the RDBMS environment requires
[quoted text clipped - 7 lines]
> When things can be defined using one software layer alone,
> the redefinitions referred to above no longer exist.

I agree, though using some fairly simple techniques definitions can exist in
one place, and be propagated to others. If an "object" could be defined in
E2, for example, and automatically generate the appropriate changes on E3,
how would that fit?
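
A minimal sketch of the kind of propagation I have in mind, in Python with
SQLite standing in for E2. The employee table and the record_type helper are
hypothetical, made up for this post: the column list is written once, in the
database, and the application-side record type is generated from the catalog
instead of being typed in a second time.

```python
import sqlite3
from collections import namedtuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Ada', 5000)")


def record_type(conn, table):
    """Build an application-side record type from the table's own definition."""
    if not table.isidentifier():
        raise ValueError("unexpected table name: %r" % table)
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    cols = [row[1] for row in conn.execute("PRAGMA table_info(%s)" % table)]
    if not cols:
        raise ValueError("no such table: %r" % table)
    return namedtuple(table.capitalize(), cols)


Employee = record_type(conn, "employee")
employees = [Employee(*r) for r in conn.execute("SELECT * FROM employee")]

# A schema change is made once, in the database, and the type is regenerated.
conn.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
EmployeeV2 = record_type(conn, "employee")
print(Employee._fields, EmployeeV2._fields)
```

The definition still ends up existing in two environments at run time, but
only one of them is maintained by hand.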

I'm also curious why you suggest moving objects from E3 to E2 for reasons of
efficiency and duplicate elimination, but don't include E1 in the mix.
Certainly services in E1 are relevant to applications?

- Eric
mountain man - 25 May 2004 11:42 GMT
> > I see value in avoiding redundancies.  All application code that relates
> > to database I/O that is external to the RDBMS environment requires
[quoted text clipped - 12 lines]
> E2, for example, and automatically generate the appropriate changes on E3,
> how would that fit?

It would still imply replication of existent data, or if you prefer,
redundant I/O in that updates would need to be pushed up from
the database (E2) into E3.

However I reckon this would be a better "practice" to engineer
as its operation relies on maintenance of the database, and
avoids duplicate maintenance at the code level.  It is a step
closer to optimum running, for sure.

> I'm also curious why you suggest moving objects from E3 to E2 for reasons of
> efficiency and duplicate elimination, but don't include E1 in the mix.
> Certainly services in E1 are relevant to applications?

First steps first.   ;-)

Services in E1 are wheel-in wheel-out redundant services.
An organisation uses these much the same way as the next.
Their physical network and their user base may be different;
however, IMO, the elements of code resident at E2 and E3
are the uniquely specifying elements for that "organization's
intelligence".

The first step in moving objects from E3 to E2 will obviate
the (database application's) client environment of code.

This is the big mixmaster in complexity with the reality of
the management of RDBMS production sites, the most
expensive, the least responsive, the change-management
headache, the cause of the bulk of many problems.

Binding the results into E1 should fall out of the consolidation
effort (E3 to E2) and represents the logical next step in the
theory of database systems, imo.

Pete Brown
Falls Creek
Oz
Bill H - 15 May 2004 18:23 GMT
Wol:

"Anthony W. Youngman" <wol@thewolery.demon.co.uk> wrote in message

> Okay. So what is "data". Because if we can't anchor that in the real
> world, we have no way of knowing if, or how strongly, relational theory
> is relevant (and usable) in the real world.

Anything that can be reduced to an electrical impulse?    :-)

Bill
 

©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.