Modeling question...

Volker Hetzer - 08 Jul 2008 17:25 GMT
Hi!
Not sure if this is the right group but I've come across a problem I'm at a
loss to model properly. Here's the setup:
A model contains three entities ("Level"s) describing projects:
- Project Family records, each referencing several
- Project records (with different attributes), in turn referencing
- Sub projects, with different attributes again.
Those three Project levels are connected by straightforward 1-to-n relationships.
But the problem is that they all have a bunch of key/value pairs.
So, a project family can have a key/value-pair StartDate=20080615.
But each project and subproject can have a different StartDate.
On the other hand, sub projects, projects and family don't need to have
the same key/value pairs.

Now, the simple solution is to have three key_value tables and be done with
it. However what I'd very much prefer is just one key_value table with some
kind of "level" attribute, 0 being family, 1 being project and 2 being sub
project. The primary keys on the level entities are all numeric so
this would perhaps work but I have no idea how to get the foreign key
constraints done because the key_value table would have three parents.

How does one model relationships in which
- one child table has several parent tables
- each parent record can refer to several child records
- each child record belongs to exactly one parent record in exactly one of
  the several parent tables?
Is there a declarative way to enforce consistency?
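
One declarative option for the "exactly one parent" requirement is sometimes called an exclusive arc: the child table carries one nullable foreign key per parent table, plus a CHECK constraint saying that exactly one of them is set. The following is only a minimal sketch using Python's sqlite3 module. All table and column names (project_family, key_value, attr_name and so on) and the helper functions are assumptions made for illustration, not names from the post, and other DBMSs may need a differently worded CHECK.

```python
import sqlite3

# Hypothetical schema: one nullable foreign key per parent table.
EXCLUSIVE_ARC_SCHEMA = """
CREATE TABLE project_family (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (
    id        INTEGER PRIMARY KEY,
    family_id INTEGER NOT NULL REFERENCES project_family (id),
    name      TEXT NOT NULL
);
CREATE TABLE sub_project (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project (id),
    name       TEXT NOT NULL
);
CREATE TABLE key_value (
    id             INTEGER PRIMARY KEY,
    family_id      INTEGER REFERENCES project_family (id),
    project_id     INTEGER REFERENCES project (id),
    sub_project_id INTEGER REFERENCES sub_project (id),
    attr_name      TEXT NOT NULL,
    attr_value     TEXT NOT NULL,
    -- exactly one of the three parent columns must be non-NULL
    CHECK ((family_id IS NOT NULL) + (project_id IS NOT NULL)
           + (sub_project_id IS NOT NULL) = 1)
);
"""


def open_exclusive_arc_db():
    """Create an in-memory database with foreign key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(EXCLUSIVE_ARC_SCHEMA)
    return conn


def add_pair(conn, attr_name, attr_value,
             family_id=None, project_id=None, sub_project_id=None):
    """Insert one key/value pair; return False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO key_value (family_id, project_id, sub_project_id,"
                " attr_name, attr_value) VALUES (?, ?, ?, ?, ?)",
                (family_id, project_id, sub_project_id, attr_name, attr_value),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout a row naming two parents, no parent, or a non-existent parent is rejected by the database itself. The "level" attribute becomes implicit in which of the three columns is filled.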

Lots of Greetings!
Volker
Signature

For email replies, please substitute the obvious.

Bob Badour - 08 Jul 2008 19:07 GMT
> Hi!
> Not sure if this is the right group but I've come across a problem I'm
[quoted text clipped - 27 lines]
> Lots of Greetings!
> Volker

Ooooh! Reinventing EAV with levels...
Volker Hetzer - 08 Jul 2008 20:36 GMT
Bob Badour schrieb:

>> Hi!
>> Not sure if this is the right group but I've come across a problem I'm
[quoted text clipped - 32 lines]
>
> Ooooh! Reinventing EAV with levels...
Possibly. I had a look at
http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm and didn't find anything
exciting.
All my attributes (key value pairs) are (for the purpose of this discussion)
strings, so the Data tables hierarchy ends with EAV_Objects in the first image
of that link.
My problem is that I have three different "Objects_1" tables and
I'd like to avoid having to replicate the EAV_Objects-Table for each
"Objects_1"-Table.
OTOH, I could have the "level" entities all be children of an id table and
put the key value pairs into a child of that table. I need to try this out.
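
The "id table" idea might look roughly like the sketch below, again using Python's sqlite3 module. Every name here (node, key_value, add_entity, set_attr and the rest) is hypothetical. The composite foreign key on (id, level) is an extra assumption that goes beyond what is described above: it is one way to stop a single id from being claimed by two different level tables.

```python
import sqlite3

# Hypothetical names throughout; "node" plays the role of the id table.
SUPERTYPE_SCHEMA = """
CREATE TABLE node (
    id    INTEGER PRIMARY KEY,
    level INTEGER NOT NULL CHECK (level IN (0, 1, 2)),
    UNIQUE (id, level)
);
CREATE TABLE project_family (
    id    INTEGER PRIMARY KEY,
    level INTEGER NOT NULL DEFAULT 0 CHECK (level = 0),
    name  TEXT NOT NULL,
    FOREIGN KEY (id, level) REFERENCES node (id, level)
);
CREATE TABLE project (
    id        INTEGER PRIMARY KEY,
    level     INTEGER NOT NULL DEFAULT 1 CHECK (level = 1),
    family_id INTEGER NOT NULL REFERENCES project_family (id),
    name      TEXT NOT NULL,
    FOREIGN KEY (id, level) REFERENCES node (id, level)
);
CREATE TABLE sub_project (
    id         INTEGER PRIMARY KEY,
    level      INTEGER NOT NULL DEFAULT 2 CHECK (level = 2),
    project_id INTEGER NOT NULL REFERENCES project (id),
    name       TEXT NOT NULL,
    FOREIGN KEY (id, level) REFERENCES node (id, level)
);
CREATE TABLE key_value (
    node_id    INTEGER NOT NULL REFERENCES node (id),
    attr_name  TEXT NOT NULL,
    attr_value TEXT NOT NULL,
    PRIMARY KEY (node_id, attr_name)
);
"""

# table name -> (level number, name of the column pointing at the parent level)
LEVELS = {
    "project_family": (0, None),
    "project": (1, "family_id"),
    "sub_project": (2, "project_id"),
}


def open_supertype_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SUPERTYPE_SCHEMA)
    return conn


def add_entity(conn, table, name, parent_id=None):
    """Create the node row first, then the level-specific row with the same id."""
    level, parent_col = LEVELS[table]  # unknown table names raise KeyError
    with conn:
        node_id = conn.execute(
            "INSERT INTO node (level) VALUES (?)", (level,)
        ).lastrowid
        if parent_col is None:
            conn.execute(
                f"INSERT INTO {table} (id, name) VALUES (?, ?)", (node_id, name)
            )
        else:
            conn.execute(
                f"INSERT INTO {table} (id, name, {parent_col}) VALUES (?, ?, ?)",
                (node_id, name, parent_id),
            )
    return node_id


def set_attr(conn, node_id, attr_name, attr_value):
    """Attach one key/value pair to any entity, whatever its level."""
    with conn:
        conn.execute(
            "INSERT INTO key_value (node_id, attr_name, attr_value)"
            " VALUES (?, ?, ?)",
            (node_id, attr_name, attr_value),
        )
```

The key_value table then has a single parent, so an ordinary foreign key is enough, at the cost of one extra row in node per entity.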

Thanks for providing the pointer!
Volker

Bob Badour - 09 Jul 2008 13:14 GMT
> Bob Badour schrieb:
>
[quoted text clipped - 50 lines]
> Thanks for providing the pointer!
> Volker

Just to be clear, I was more than offering a pointer. I was also
ridiculing the idea of EAV.
Volker Hetzer - 25 Jul 2008 15:05 GMT
Bob Badour schrieb:
>>> Ooooh! Reinventing EAV with levels...
>>
[quoted text clipped - 17 lines]
> Just to be clear, I was more than offering a pointer. I was also
> ridiculing the idea of EAV.
I got that. :-)
But "we want to be able to create and delete attributes" is a customer
requirement. I think it's different from "I am too lazy to do a proper data
model". There are plenty of "normal" attributes left to model ERD like.

Lots of Greetings!
Volker

JOG - 25 Jul 2008 15:19 GMT
> Bob Badour schrieb:
>
[quoted text clipped - 29 lines]
> --
> For email replies, please substitute the obvious.

What's wrong with drop/add column?
Volker Hetzer - 25 Jul 2008 15:33 GMT
JOG schrieb:
>> Bob Badour schrieb:
>>
[quoted text clipped - 27 lines]
>
> What's wrong with drop/add column?
All the things that are wrong when an application requires DDL during normal
operation: no undo, no scalability, limits on the number of attributes, limits on
the structure of the attribute names, the same attributes in each
project/pcb/etc., and so on.
Sorry, but in my opinion DDL is for installation and maintenance. End users
shouldn't trigger DDL, either directly or indirectly.

Lots of Greetings!
Volker

JOG - 25 Jul 2008 16:45 GMT
> JOG schrieb:
>
[quoted text clipped - 34 lines]
> the structure of the attribute names, the same attributes in each
> project/pcb/etc. and so on.

If one is trying to change the attributes that entities
possess, then one is necessarily altering the propositions that can be
stated about them. This necessitates a change in relation predicates,
which means it is absolutely a DDL issue. To think otherwise seems to
somewhat miss the point of the relational model.

> Sorry, but in my opinion DDL is for installation and maintenance. End users
> shouldn't trigger DDL neither directly nor indirectly.

No need to be sorry. If you want to make the same EAV mistakes that
countless have before you then that's up to you. All best, J.

> Lots of Greetings!
> Volker
> --
> For email replies, please substitute the obvious.
Volker Hetzer - 11 Sep 2008 18:29 GMT
JOG schrieb:
>> JOG schrieb:
>>
[quoted text clipped - 36 lines]
> which means it is absolutely a DDL issue. To think otherwise seems to
> somewhat miss the point of the relational model.
Bit late but I was on holiday...
I am not trying to change the attributes that an entity possesses but
I am allowing each business object or user object or whatever you may
call it to contain an attribute collection. There are no changes in
relation predicates since any attribute name is just contents, like
its value.
I think we are talking at cross purposes.

I'm curious, what do you people do when a customer comes and says,
"I want to store <thing> and I want to add, change and remove key
value pairs and I want to name them freely.".
Are you telling them that a database can't do it? That whatever the
reason, it's stupid? That no more than 255-minus-housekeeping
attributes are allowed because, say, Oracle can't do more columns?
That no attributes can contain more than 22 or whatever characters?
What do you say when they ask some intern and he comes up
with a <thing_id>,<attribute_name>,<attribute_value> table attached
to the <thing> table by a one-to-many relation and tell you this
is what they want?

Lots of Greetings!
Volker

Bob Badour - 11 Sep 2008 21:23 GMT
> JOG schrieb:
>
[quoted text clipped - 61 lines]
> "I want to store <thing> and I want to add, change and remove key
> value pairs and I want to name them freely.".

In... um... 27 years of developing software, not once has a customer
ever come to me and said anything even remotely similar.
Alvin Ryder - 12 Sep 2008 03:11 GMT
> JOG schrieb:
>
[quoted text clipped - 66 lines]

Volker,

If you just want something like forgotten password question/answer
data:-
User_Name='JOE', Question = 'Fav color', Answer = 'pink',
OtherStuff='blah'...
User_Name='JIM', Question = 'Fav color', Answer = 'purple',
OtherStuff='blah'...
User_Name='BILL', Question = 'How much money do I have', Answer = 'way
too much'

then that's OK, the entire table is not structured around name/value
pairs. Users are allowed to have different question/answer pairs.

BUT if you want name/value pairs for everything then you're asking for
something very different: you're asking for trouble. It doesn't only
go against fundamental principles of the relational model. You might
say "so what, that's so airy-fairy", but many database vendors have gone
out of their way to implement according to that theory, so the
consequences will become very practical very quickly:
- You can forget about decent performance, for one.
- You can forget about maintainable queries as well.
- The application code will also suffer.
It's all a lot more work.

You should try writing some queries using proper relations and then
try the same with name/value pairs. Especially try joins involving
name/value pairs. Yuck. I don't know this guy, but at a glance those
queries are starting to look the part, though I've seen much worse:
http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html
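
To make the query comparison concrete, here is a small sketch in Python with sqlite3 that asks the same question ("which projects start on a given date and have a given owner?") against a conventional table and against a name/value table. The tables, columns, attribute names and sample rows are all invented for illustration and do not come from anyone's real schema.

```python
import sqlite3

# Conventional layout: one condition per column.
COLUMN_QUERY = """
SELECT id FROM project_cols
WHERE start_date = ? AND owner = ?
ORDER BY id
"""

# Name/value layout: one self-join of the pair table per attribute tested.
EAV_QUERY = """
SELECT s.project_id
FROM project_eav AS s
JOIN project_eav AS o ON o.project_id = s.project_id
WHERE s.attr_name = 'StartDate' AND s.attr_value = ?
  AND o.attr_name = 'Owner'     AND o.attr_value = ?
ORDER BY s.project_id
"""


def build_demo_db():
    """Load the same three made-up projects into both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE project_cols (
            id INTEGER PRIMARY KEY, start_date TEXT, owner TEXT
        );
        CREATE TABLE project_eav (
            project_id INTEGER NOT NULL,
            attr_name  TEXT NOT NULL,
            attr_value TEXT NOT NULL,
            PRIMARY KEY (project_id, attr_name)
        );
    """)
    rows = [(1, "20080615", "ann"), (2, "20080615", "bob"), (3, "20080701", "ann")]
    conn.executemany("INSERT INTO project_cols VALUES (?, ?, ?)", rows)
    for project_id, start_date, owner in rows:
        conn.executemany(
            "INSERT INTO project_eav VALUES (?, ?, ?)",
            [(project_id, "StartDate", start_date), (project_id, "Owner", owner)],
        )
    conn.commit()
    return conn


def find_with_columns(conn, start_date, owner):
    return [row[0] for row in conn.execute(COLUMN_QUERY, (start_date, owner))]


def find_with_eav(conn, start_date, owner):
    return [row[0] for row in conn.execute(EAV_QUERY, (start_date, owner))]
```

Both functions return the same ids, but every further attribute in the condition costs the name/value version another self-join, and all values are compared as strings.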

Now the multi-level part. Normally self-joins can be used to achieve a
tree-like structure, but you need to be careful because many databases
can choke if you don't implement them the way they like. Now you
wanted name/value pairs with levels - ouch!

You should seriously follow the advice already given to you by the
others: devise a proper relational model and avoid EAVs.
Volker Hetzer - 20 Oct 2008 18:05 GMT
Alvin Ryder schrieb:
> Volker,
>
[quoted text clipped - 9 lines]
> then that's OK, the entire table is not structured around name/value
> pairs. Users are allowed to have different question/answer pairs.
Ok, I see the misunderstanding here.
Yes, you are right, ours aren't attributes that lead to
foreign keys or even constraints, it really is arbitrary data that only
gets displayed. I like your example.
Here's ours:
Team one thinks it wants to remember some implementation detail of
the firewire port of a board. So they decide they want an attribute
FIREWIRE_TYPE and they want this to be a number. That does not
mean I (the database) am to do calculations with this number, just that
they want some simple type checking in the input mask. This also does not
imply that other pieces of data depend on that attribute being there
or having a certain value.
Team two doesn't do pc boards today but some industrial stuff so
they decide they want an attribute CANBUS_AVAILABLE for their project
and want this to be boolean.
What none of the guys want is to sit on a huge heap of predefined
attributes, consisting of each and every attribute ever entered
or required by them.
So, each board project starts out afresh, with attributes added as the
team members see fit. Each new project has some new attributes because
technology advances or people think that different bits are important.
(It's a bit less arbitrary because for each board family
the teams meet up and decide upon the attributes for that family.
This happens several times a year and of course, one attribute is always
overlooked in the first spec.)
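
The "simple type checking in the input mask" could be driven by a small set of attribute definitions agreed per board family. The following is only a sketch in plain Python. The type tags ("number", "boolean", "text"), the accepted boolean spellings and the function names are assumptions, not part of the application described here.

```python
def parse_bool(raw):
    """Accept a few common spellings of true/false (an arbitrary choice)."""
    text = raw.strip().lower()
    if text in ("true", "yes", "1"):
        return True
    if text in ("false", "no", "0"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


# Hypothetical type tags mapped to the function that parses the raw input.
PARSERS = {
    "number": float,
    "boolean": parse_bool,
    "text": str,
}


def check_value(definitions, attr_name, raw):
    """Parse raw input for one attribute.

    definitions maps attribute names to type tags, e.g. as agreed for one
    board family. Raises KeyError for an undefined attribute and ValueError
    for input that does not match the declared type.
    """
    if attr_name not in definitions:
        raise KeyError(attr_name)
    return PARSERS[definitions[attr_name]](raw)
```

For instance, with definitions {"FIREWIRE_TYPE": "number", "CANBUS_AVAILABLE": "boolean"}, the input "fast" for FIREWIRE_TYPE is rejected while the stored value itself can remain a plain string.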

OTOH: Of course, each project has a name, belongs to a family, has
assembly variants and stuff and all /these/ things are modeled properly
(that is, not as EAVs) because they form relations and are the skeleton
of the whole application.
The forgotten-password-type-attributes are just things dangling off
some of the real entities.

> BUT if you want name/value pairs for everything then you're asking for
> something very different, you're asking for trouble.
I fully agree. This is also the deal I've insisted on with the customers.
As soon as I have to do something with the data (apart from storing and
retrieving) it's an entirely different quality and needs time, causes
implementation costs, beta tests and all the things associated with an
application change.
They have assured me that this definitely will not happen, so I think it
will happen less frequently than once or twice per year, with decreasing
frequency as the application matures. That means we can incorporate this
into a normal software development cycle.

> You should try writing some queries using proper relations and then
> try the same with name/value pairs.
No, thanks. I know what you're driving at and I see the same problem.
If customers need to store a bunch of named sticky notes, all right,
but that means named sticky notes is all they get.

> Now the multi-level part. Normally self-joins can be used to achieve a
> tree like structure but you need to be careful because many databases
> can choke if you don't implement them the way they like. Now you
> wanted name/value pairs with levels - ouch!
No, the self joins work with "real" data, that is, properly modeled
stuff. :-)

> You should seriously follow the advice already given to you by the
> others, devise a proper relational model and avoid EAVs
Believe me, I do. It's just that sometimes it's kind of hard to explain
the problem to people from a very different background.
Thanks again for the lost password metaphor. It really hit the nail
on the head.

Lots of Greetings!
Volker

JOG - 21 Oct 2008 22:31 GMT
> JOG schrieb:
> [JOG hat gesnipped]
[quoted text clipped - 16 lines]
> value pairs and I want to name them freely.".
> Are you telling them that a database can't do it?

I'm telling them, and you, that the relational model can't do it
because it was designed to handle "formatted" propositions (sets of
data with a high level of common predication). It is important to
recognize that the EAV approach you are looking at just happens to use
the RM as its physical layer, and that's it. It does not use the RM as
a logical model, and you therefore lose all of its algebraic power.
(Sure, you keep the management system's transactional capabilities, but
that's nothing to do with the RM.)

In fact, having abandoned it, you might as well use XML, OO or RDF
databases and cut out the middle man. However, better imo to convince
the client that designing a robust a priori conceptual model is worth
doing, and that you can come and update it at appropriate intervals (I
say this because currently the RM is the most solid framework we
have).

I do have sympathy, because the issue of handling semistructured and
dynamic schema is simply an unsolved problem (as is how to handle
missing data). Proposed "solutions" are all woeful (in fact completely
retrograde, whisking us back to 1960s tech). As such, anything you
try and implement for your client will inevitably be an ad-hoc hack
/in some way or other/. We're still in the stone age of informatics, I'm
afraid. Regards, Jim.

> That whatever the
> reason, it's stupid? That no more than 255-minus-housekeeping
[quoted text clipped - 10 lines]
> --
> For email replies, please substitute the obvious.
paul c - 21 Oct 2008 23:06 GMT
>> JOG schrieb:
>> [JOG hat gesnipped]
[quoted text clipped - 40 lines]
> afraid. Regards, Jim.
> ...

I remember more than thirty years ago when companies sent some of their
mid-level non-DP types to report writer school, sponsored by various
outfits who're gone now or absorbed by bigger ones.  A few of those
people got good enough to submit their own queries in the batch mode of
the day and only resorted to the DP department (wasn't called IT then)
when it was over their head.  They were hip enough to know when they
were over their heads and didn't waste time like some of us (including
me).  Of course the majority were useless at this, as I think they were
in their official function (this being typical of most commercial
pursuits, if you ask me).

Makes me wonder why commercial enterprises that are laying off  IT
people don't try some minimal training for their middle muddler people
in relational algebra.  If they did, I guess they'd have to put in some
fairly strict rules for their dba's.

Is it naive to think that some end-users could add attributes (ignoring
the problem that this would make them non-end-users)?
Walter Mitty - 22 Oct 2008 15:22 GMT
> I'm telling them, and you, that the relational model can't do it
> because it was designed to handle "formatted" propositions (sets of
[quoted text clipped - 19 lines]
> in some way or other/. We're still in the stone age of informatics i'm
> afraid. Regards, Jim.

The above states the case better than I can.  I'm going to throw in my two
cents, in addition to agreeing with JOG's comment.

The big problem with using a semistructured approach to data, such as the
EAV, is setting user expectations.  If users were able to understand and
appreciate that there is not necessarily any way to integrate the data
between users after users have each used EAV to, in effect, design their own
idiosyncratic database,  that would be one thing.  But my experience is that
either users, or at least upper management,  always fall back on the notion
that databases are for sharing data, and therefore when they ask for outputs
that require integration,  the hard work has already been done when the
database was built.

In one sense, it's hard to argue with management.  Databases are for sharing
data.  That's what they were invented for, and that's what they are good at.
So the expectation lives on that getting an output from a database that
requires integrated massaging of the data is a simple request.  Just take
the required information,  map it to the way the database represents the
data,  crank up a report writer,  and presto!

The problem here is in the phrase "the way the database represents the
data."  With approaches like EAV,  there is no ONE way the database
represents data.  Each user's data is represented the way that seems good to
that user.  Getting the users and management to understand that fact and set
their expectations accordingly, is very very difficult.  The easiest way to
do this is to bypass a DBMS completely,  and just store the semistructured
and unintegrated data in a text file.  Then at least you don't get the
illusion that,  because we manage the database with a DBMS, we must
therefore have stored the data according to some coherent general plan.
Roy Hann - 22 Oct 2008 17:33 GMT
>> I'm telling them, and you, that the relational model can't do it
>> because it was designed to handle "formatted" propositions (sets of
[quoted text clipped - 49 lines]
> illusion that,  because we manage the database with a DBMS, we must
> therefore have stored the data according to some coherent general plan.

I don't want to disagree too violently with Walter's re-telling of JOG's
very sound position, but I get the sense that even Walter doesn't fully
appreciate the idiocy of EAV.

The point to get across to management is that they don't need EAV
because even an SQL DBMS already does everything EAV does, and more.  If
management want users to be able to dream up and implement their own
fact types, each user can just go ahead and create suitable tables in
the usual way.

Now, how do you get all the users to share an understanding of all these
tables so they can usefully collaborate  (share data)?  Good question.
The same way they imagine EAV would do it, I guess.  (Only it will be
easier to implement because all you need is dynamic SQL.)

And before I leave this alone, there is no such thing as
"semi-structured" data.  That term makes as much sense as
semi-understood knowledge.  The concept that people using that term
might be struggling to convey is "semi-shared business model", or to put
it another way, "(only) some of us know what (only) some of this
means".  My attitude to that is: fine, just don't expect me to know
what any of it means.

Signature

Roy

paul c - 22 Oct 2008 18:52 GMT
...
> And before I leave this alone, there is no such thing as
> "semi-structured" data.  That term makes as much sense as
[quoted text clipped - 3 lines]
> means".  My attitude to that is fine, just don't expect me to know
> what any of it means.

Heh, in other words, semi-understood data?

Ironic how  "not-invented-here" so often actually means "invented here".

(Letting "semi-understood data" proliferate might be chaotic.  Maybe in
such a regime, to echo Walter M, it would be prudent to ensure that it
be kept "semi-shared".  Eg., amongst the EAV protagonists and their
cronies.  It usually seems to me that when these EAV proposals come up,
the question is not that the organization needs new organization-wide
"entities", for want of a better word, but additional attributes for
existing relations.  So, I'd think it might be okay from an integrity
viewpoint to let them define their own tables which are partly based on
organization-wide tables.  At least everybody could stick with the usual
relational ops.  Not sure if I've ever seen this tried, though.)
paul c - 22 Oct 2008 18:58 GMT
> ...  Not sure if I've ever seen this tried, though.)

I meant wrt db's.  In other areas, such as Linux dev't, it seems that
various dynamic features can find their way into the kernel after some
gestation time.

(I think Roy H might have hit it on the head by emphasizing "semi-shared").
Walter Mitty - 24 Oct 2008 08:57 GMT
> ...
>> And before I leave this alone, there is no such thing as
[quoted text clipped - 19 lines]
> tables.  At least everybody could stick with the usual relational ops.
> Not sure if I've ever seen this tried, though.)

Perhaps it would be sufficient to categorize the outputs of the database as
"semi-correct".
Let the users think about that one for a while!
paul c - 24 Oct 2008 14:40 GMT
>> ...
>>> And before I leave this alone, there is no such thing as
[quoted text clipped - 23 lines]
> "semi-correct".
> Let the users think about that one for a while!

Maybe the key problem to be avoided is inadvertent semi-redundancy which
could pollute the db as a whole.  If there's a way to avoid that I'd
think there is no basic problem even though various pet niceties such as
performance targets might suffer.
paul c - 24 Oct 2008 14:50 GMT
...
> Perhaps it would be sufficient to categorize the outputs of the database as
> "semi-correct".
> Let the users think about that one for a while!

Maybe we should just cut to the chase and call it semi-data.
JOG - 23 Oct 2008 13:13 GMT
> >> I'm telling them, and you, that the relational model can't do it
> >> because it was designed to handle "formatted" propositions (sets of
[quoted text clipped - 67 lines]
> And before I leave this alone, there is no such thing as
> "semi-structured" data.

Despite a growing literature, current definitions of "semi-structure"
are woefully inadequate. The standard denotation is of data that "does
not fit into the relational model".

Yes, quite.

> That term makes as much sense as
> semi-understood knowledge.  The concept that people using that term
[quoted text clipped - 5 lines]
> --
> Roy
Roy Hann - 23 Oct 2008 14:01 GMT
> Despite a growing literature, current definitions of "semi-structure"
> are woefully inadequate.

A million people can (and evidently will) talk bollocks, but it's still
bollocks.

> The standard denotation is of data that "does
> not fit into the relational model".

That definition is entirely bogus.  The relational model just applies
set theory to first order predicate logic.  If you have "data" that
doesn't fit into both of these then you better start hiring mystics to
look after it for you.

But of course what someone who says that really means is, "data that we
can't be bothered to fit into the relational model because the
programming tools we use to write the applications are so crap there
is no point."

Signature

Roy

JOG - 23 Oct 2008 14:05 GMT
> > Despite a growing literature, current definitions of "semi-structure"
> > are woefully inadequate.
[quoted text clipped - 9 lines]
> doesn't fit into both of these then you better start hiring mystics to
> look after it for you.

Indeed. And yet hundreds of peer-reviewed papers have been published
on the topic. I find this incredibly depressing.

> But of course what someone who says that really means is, "data that we
> can't be bothered to fit into the relational model because the
[quoted text clipped - 3 lines]
> --
> Roy
paul c - 23 Oct 2008 14:25 GMT
>>> Despite a growing literature, current definitions of "semi-structure"
>>> are woefully inadequate.
[quoted text clipped - 15 lines]
>> programming tools we use to write the applications are so crap there
>> is no point."

The act of deciding/agreeing upon relations exposes enough structure for
the RM to be applied.  It's the sine qua non.  Dr. Strangelove might
have said after reading a paper that suggests it unnecessary "but that's
the whole point!".
Roy Hann - 23 Oct 2008 15:10 GMT
>> > Despite a growing literature, current definitions of "semi-structure"
>> > are woefully inadequate.
[quoted text clipped - 12 lines]
> Indeed. And yet hundreds of peer-reviewed papers have been published
> on the topic. I find this incredibly depressing.

Cheer up!  :-)  It's worse than you think: clinical outcomes research
(with which I was peripherally involved for the first 10 years of my
career) is *at least* as bad.

"On the word of no one."  (Paradoxically, those are good words to live
by. :-)

Signature

Roy

David BL - 23 Oct 2008 16:15 GMT
> > > Despite a growing literature, current definitions of "semi-structure"
> > > are woefully inadequate.
[quoted text clipped - 12 lines]
> Indeed. And yet hundreds of peer-reviewed papers have been published
> on the topic. I find this incredibly depressing.

Ok, I’ll bite…

No doubt any data can be made to “fit” into the relational model.
The more important question is whether it happens /naturally/.  The
relational model works really well when there is a UoD on which many
propositions can be made without needing to introduce lots of abstract
identifiers.  That’s very common, but it’s not always the case.  It
seems to me the question of whether the RM is generally appropriate
for heavily nested composite values is unresolved.   Much of the
world’s data is in this latter form.  Eg abstract syntax trees, rich
text documents, scene graphs.

If the relational model is universally applicable, why don’t
programmers enter their programs as relations?   Do you really think
it’s only because of the tools currently available?

What about automated proof systems?  Is the knowledge base and data
associated with an ongoing proof best represented using a set of
relations?  I find that quite unlikely. The RA seems to have more to
do with set based calculations on known sets of values, rather than
symbolic manipulation.   Symbolic manipulation involves a lot of
recursion and the RA on its own is too weak, which suggests it will
take a backseat role.  Eg to compute the most general unifier of two
given expressions involves recursion in the nested expressions.

I also find it rather telling that relational queries (ie RA
expressions) are not themselves represented using relations.  Surely
if that were useful, many cdt folks would jump at the opportunity to
further promote the use of relations.
paul c - 23 Oct 2008 17:09 GMT
...
> If the relational model is universally applicable, why don’t
> programmers enter their programs as relations?   ...

That is one of the most fundamental questions.  At the risk of sounding
like I'm sloughing it off, I'd say the answer has to do with tedium and
lies somewhere near the fact that we have shortcuts and shorthands to
give equivalent results and our human situation, limitations in
momentary perception such as our shallow mental stack.
toby - 23 Oct 2008 20:55 GMT
> ...
>
[quoted text clipped - 6 lines]
> give equivalent results and our human situation, limitations in
> momentary perception such as our shallow mental stack.

Many programs, or parts thereof, reduce to SQL expressions over data.
Programmers who don't understand RM well tend to under-use it for
computation. A declarative expression can often elide a lot of
imperative tedium.
paul c - 24 Oct 2008 01:44 GMT
...
> Many programs, or parts thereof, reduce to SQL expressions over data.
> Programmers who don't understand RM well tend to under-use it for
> computation. A declarative expression can often elide a lot of
> imperative tedium.

Not that I would know personally but from what I read, most programmers
(environment designers too) see SQL as a mere storage access method.

It seems the designers of the mainstream programming
interfaces/environments where so many developers these days spend the
most time, such as JavaScript, PHP, et al., have been just as unenlightened,
that is to say ignorant.  While not exactly mainstream, the RDF that JOG
decries might be the nadir on the scale.  I wonder what would result if
one of those designers were forbidden to implement arrays.

Nothing new I guess, it was years ago that I remember criticizing an
international multi-location transport system for its myriad (and
insufficient) message codes on the grounds that only two operators were
really needed, INSERT and DELETE.  It could have been seen as a
distributed system with disparate db's.  All each location really needed
to do was send portions of its own "redo" log (the portions being those
that applied to certain common tables) to the others and the latter
could decide what action, if any, was appropriate for their particular
db.  This would have had the bonus that much correction work would be
saved but the disadvantage of fewer jobs for the boys.  It was also
considered anathema to send a message that might be ignored.  Apparently
the Mott's Clamato Juice company tries to think big but I believe the
airline industry still thinks operators are people.  I know I'm drifting
but I can't help observing that the big industries that influence IT
trends are often highly regulated which situation may encourage
small-minded thinking as well as similar people.
David BL - 24 Oct 2008 03:29 GMT
> > ...
>
[quoted text clipped - 11 lines]
> computation. A declarative expression can often elide a lot of
> imperative tedium.

I agree with you but you seem to have missed the point.   For example,
can SQL expressions themselves be satisfactorily entered as relations?
David BL - 24 Oct 2008 03:17 GMT
> ...
>
[quoted text clipped - 6 lines]
> give equivalent results and our human situation, limitations in
> momentary perception such as our shallow mental stack.

I believe there is a simple answer:  The relational approach implies
the appearance of many abstract identifiers.

[Side note:  Marshall allows a relational approach to encompass
extensive use of RVAs - even to the point where at each level in the
hierarchy of a heavily nested composite value, a relation is only
being used to represent the children of a given node.  This avoids the
need to introduce lots of abstract identifiers, but I don't agree with
Marshall that such an approach should be called "relational".  Of
course if anyone really wants to call that relational then that's fine
by me - after all it's just a word.  My argument of course only
applies when heavily nested RVAs are not being used]

I think you may be discounting the importance of languages (or more
specifically grammars) or the concept of a well formed formula.  After
all the First Order Logic (FOL) is formalised with the concept of a
wff which is defined recursively.  It seems wrong to assume that data
management doesn't encompass recording wffs.  It seems wrong to assume
that wffs can be represented /naturally/ in the RM.  While it's true
that the RM/RA is closely associated with set theory and the FOL,
anyone using the full power of the FOL does a lot more than calculate
with known extensions of sets.
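
To illustrate the point about abstract identifiers, here is a minimal sketch in Python that flattens a nested formula into the rows of a single relation. The tuple encoding of the formula and the attribute names (node_id, parent_id, position, label) are assumptions chosen for illustration only. Note that node_id and parent_id mean nothing in the formula itself: they exist solely to tie the rows back together.

```python
from itertools import count


def flatten(expr):
    """Turn a nested expression into (node_id, parent_id, position, label) rows.

    An expression is either a plain string (a leaf) or a tuple whose first
    element is the operator/predicate label and whose remaining elements
    are sub-expressions.
    """
    rows = []
    ids = count(1)  # source of the invented surrogate identifiers

    def visit(node, parent_id, position):
        node_id = next(ids)
        if isinstance(node, tuple):
            label, *children = node
        else:
            label, children = node, []
        rows.append((node_id, parent_id, position, label))
        # Recurse in order so that "position" records argument order.
        for index, child in enumerate(children):
            visit(child, node_id, index)

    visit(expr, None, 0)
    return rows
```

Flattening ("and", ("p", "x"), ("q", "x", "y")), a made-up encoding of p(x) AND q(x, y), gives six rows, each carrying an invented node_id, and getting the formula back requires a recursive walk over parent_id.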
paul c - 24 Oct 2008 09:27 GMT
...
> I believe there is a simple answer:  The relational approach implies
> the appearance of many abstract identifiers.
[quoted text clipped - 16 lines]
> that wffs can be represented /naturally/ in the RM.  While it's true
> that the RM/RA is closely associated with set theory and the FOL,

I guess I was trying to answer the wrong question. I should have said that
I was thinking of relations in general; I didn't mean to suggest that
Codd's RM is anything more than a narrow application of relations aimed
at mechanical storage and manipulation of db's.
paul c - 23 Oct 2008 17:23 GMT
...
> I also find it rather telling that relational queries (ie RA
> expressions) are not themselves represented using relations.  

Just because some syntax doesn't look like a relation doesn't mean that
the result isn't defined by relations. Eg., the D&D relational operators
are all defined in terms of relations, then a bunch of shorthands are
given, so as to minimize tedium and clerical errors.  SQL doesn't do
this which may be why so many people who think it is obedient to the RM
have such weird ideas, such as thinking a relation can be updated.

Surely
> if that were useful, many cdt folks would jump at the opportunity to
> further promote the use of relations.
>  

A relation is a mathematical construction.  How well implementations
mimic it is very much in the mental "eye" of the beholder.  In a way,
implementation is a misleading word for what is really just a
mechanical aid for symbolic manipulation and storage of results.
David BL - 24 Oct 2008 04:10 GMT
> ...
>
[quoted text clipped - 17 lines]
> implementation is a misleading word for what is really just a
> mechanical aid for symbolic manipulation and storage of results.

Sorry I have no idea what you're getting at.

Let me be more specific:  when you see a wff in some formal language,
do you actually think of it as a relation?   How is that useful?  More
specifically, how is the RA useful?   Can you give me an example
together with the relation's degree and the names and types of its
attributes?
Walter Mitty - 24 Oct 2008 09:16 GMT
>Ok, I’ll bite…

>No doubt any data can be made to “fit” into the relational model.
>The more important question is whether it happens /naturally/.  The

I don't understand the word "naturally" in this context.  Isn't all modeling
artificial, rather than natural?
David BL - 24 Oct 2008 11:13 GMT
> >Ok, I’ll bite…
> >No doubt any data can be made to “fit” into the relational model.
> >The more important question is whether it happens /naturally/.  The
>
> I don't understand the word "naturally" in this context.  Isn't all modeling
> artificial, rather than natural?

I'm suggesting that in certain situations the RM is cumbersome, making
it inappropriate or inapplicable.   This is specifically with regard
to /recursive data types/.

For example, recursive data types are appropriate for representing
wffs in most formal languages.  They are also relevant in compound
documents.  Eg

struct Chapter
{
   String title;
   Vector<Paragraph> paragraphs;
   Vector<Chapter> subchapters;
};

There are two ways that the RM can be used to represent recursive data
types:

1.  Using recursive RVAs; or
2.  By introducing abstract identifiers for all the nodes, and
appropriate integrity constraints

I find the first quite reasonable, but I'm suspicious of actually
calling such an approach "relational".

In the second case lots of integrity constraints are needed because
the RM is too flexible! It needs to be heavily constrained to only
represent tree structures. The integrity constraints quickly get
horribly messy - particularly for a reasonably complex grammar, and I
believe it's possible to interpret it as a manually written
axiomatization of pointer semantics (the ability to "dereference" an
abstract identifier as though it points at one and only one child node
in the tree).  If you compare the RM to the grammar you will find the
former to be /much more complex/.

The following is the example I used when I talked about this 12 months
ago:

Using Prolog notation, consider the following relations which allow
for representing an expression such as (x+1)*3:

   var(N,S) :- node N is a variable named S
   number(N,I) :- node N is a number with value I
   add(N,N1,N2) :- node N is the addition of nodes N1,N2
   mult(N,N1,N2) :- node N is the product of nodes N1,N2

Define a view called nodes(N) which is a union of projections as
follows:

   nodes(N) :- var(N,_).
   nodes(N) :- number(N,_).
   nodes(N) :- add(N,_,_).
   nodes(N) :- mult(N,_,_).

The following are the integrity constraints (each query must be
empty):

   var(N,S1), var(N,S2), S1 <> S2?
   number(N,I1), number(N,I2), I1 <> I2?
   add(N,N1,_), add(N,N2,_), N1 <> N2?
   add(N,_,N1), add(N,_,N2), N1 <> N2?
   mult(N,N1,_), mult(N,N2,_), N1 <> N2?
   mult(N,_,N1), mult(N,_,N2), N1 <> N2?
   var(N,_),  number(N,_)?
   var(N,_),  add(N,_,_)?
   var(N,_),  mult(N,_,_)?
   number(N,_), add(N,_,_)?
   number(N,_), mult(N,_,_)?
   add(N,_,_), mult(N,_,_)?
   add(_,N,_), not nodes(N)?
   add(_,_,N), not nodes(N)?
   mult(_,N,_), not nodes(N)?
   mult(_,_,N), not nodes(N)?
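
For anyone who prefers to see the constraint list in executable form,
here is a minimal C++ sketch of the same checks. It is only an
illustration under stated assumptions: the names ExprDb,
satisfiesConstraints, exampleExpression and demo are made up for this
example, node identifiers are assumed to be plain ints, and each
relation is held in memory as a std::set of tuples.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical in-memory encoding: one std::set of tuples per relation.
struct ExprDb {
    std::set<std::pair<int, std::string>> var;   // var(N,S)
    std::set<std::pair<int, int>> number;        // number(N,I)
    std::set<std::tuple<int, int, int>> add;     // add(N,N1,N2)
    std::set<std::tuple<int, int, int>> mult;    // mult(N,N1,N2)
};

// True when every integrity query in the list above would come back empty.
bool satisfiesConstraints(const ExprDb& db) {
    // Count how many tuples, across all four relations, define each node.
    std::map<int, int> defs;
    for (const auto& t : db.var) ++defs[t.first];
    for (const auto& t : db.number) ++defs[t.first];
    for (const auto& t : db.add) ++defs[std::get<0>(t)];
    for (const auto& t : db.mult) ++defs[std::get<0>(t)];

    // The relations are sets, so two tuples for the same N must differ
    // somewhere. A count above 1 therefore means one of the first twelve
    // queries (uniqueness within a relation, disjointness between
    // relations) is non-empty.
    for (const auto& d : defs)
        if (d.second > 1) return false;

    // Last four queries: every operand must appear in nodes(N).
    auto operandsExist =
        [&defs](const std::set<std::tuple<int, int, int>>& rel) -> bool {
        for (const auto& t : rel)
            if (!defs.count(std::get<1>(t)) || !defs.count(std::get<2>(t)))
                return false;
        return true;
    };
    return operandsExist(db.add) && operandsExist(db.mult);
}

// The expression (x+1)*3, with arbitrarily chosen node identifiers 1..5.
ExprDb exampleExpression() {
    ExprDb db;
    db.var.emplace(1, "x");
    db.number.emplace(2, 1);
    db.add.emplace(3, 1, 2);
    db.number.emplace(4, 3);
    db.mult.emplace(5, 3, 4);
    return db;
}

void demo() {
    ExprDb db = exampleExpression();
    assert(satisfiesConstraints(db));
    db.add.emplace(3, 1, 4);  // a second, different definition of node 3
    assert(!satisfiesConstraints(db));
}
```

Counting definitions per node folds the twelve uniqueness and
disjointness queries into one pass, which is a design choice of this
sketch rather than something the relational formulation dictates.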
David BL - 25 Oct 2008 07:24 GMT
> The following is the example I used when I talked about this 12 months
> ago:
[quoted text clipped - 34 lines]
>     mult(_,N,_), not nodes(N)?
>     mult(_,_,N), not nodes(N)?

It occurred to me that there are more integrity constraints
required.

Consider that we define

 % parent(P,C) :- node P is a parent of node C
 parent(P,C) :- add(P,C,_).
 parent(P,C) :- add(P,_,C).
 parent(P,C) :- mult(P,C,_).
 parent(P,C) :- mult(P,_,C).

and

 % ancestor(N1,N2) :- node N1 is an ancestor of N2.
 ancestor(N1,N2) :- parent(N1,N2).
 ancestor(N1,N2) :- parent(N1,N), ancestor(N,N2).

The following goal must return failure to express the integrity
constraint that there are no cycles:

 ancestor(N1,N2),ancestor(N2,N1)?

In addition we would like to express the constraint that there is no
garbage (ie unreachable nodes) with respect to a defined set of root
nodes.

 % root(N) :- node N is the root of an expression.

 % reachable(N) :- node N is reachable from a root
 reachable(N) :- root(N).
 reachable(C) :- parent(P,C), reachable(P).

 % integrity constraint - must be empty
 nodes(N), not reachable(N)?

As you can see, it's rather low level.  I don't think it's surprising
that the RM is capable of that (since it's so flexible).
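
To make the two extra constraints concrete, here is a rough C++ sketch
that checks them over a parent relation held as a set of (P,C) pairs.
The names Edge, reachableFrom, hasCycle, hasGarbage and demo are
invented for this illustration, and node identifiers are assumed to be
ints; it is one possible way to evaluate the queries, not the only one.

```cpp
#include <cassert>
#include <limits>
#include <set>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // one tuple of parent(P,C)

// All nodes reachable from `start` by following parent -> child edges,
// including the start nodes themselves.
std::set<int> reachableFrom(const std::set<Edge>& parent,
                            const std::set<int>& start) {
    std::set<int> seen(start);
    std::vector<int> todo(start.begin(), start.end());
    while (!todo.empty()) {
        const int p = todo.back();
        todo.pop_back();
        // The set is ordered by (P,C), so the children of p are contiguous.
        auto it = parent.lower_bound(Edge(p, std::numeric_limits<int>::min()));
        for (; it != parent.end() && it->first == p; ++it)
            if (seen.insert(it->second).second) todo.push_back(it->second);
    }
    return seen;
}

// The cycle query is non-empty exactly when some node is its own
// ancestor, i.e. some edge (P,C) has P reachable from C.
bool hasCycle(const std::set<Edge>& parent) {
    for (const Edge& e : parent)
        if (reachableFrom(parent, {e.second}).count(e.first)) return true;
    return false;
}

// The no-garbage query: is there a node that no root leads to?
bool hasGarbage(const std::set<int>& nodes, const std::set<Edge>& parent,
                const std::set<int>& roots) {
    const std::set<int> reachable = reachableFrom(parent, roots);
    for (int n : nodes)
        if (!reachable.count(n)) return true;
    return false;
}

void demo() {
    // (x+1)*3 with node 5 = mult, 3 = add, 1 = x, 2 = 1, 4 = 3.
    std::set<int> nodes = {1, 2, 3, 4, 5};
    std::set<Edge> parent = {{3, 1}, {3, 2}, {5, 3}, {5, 4}};
    assert(!hasCycle(parent));
    assert(!hasGarbage(nodes, parent, {5}));

    nodes.insert(6);  // a node that no root leads to
    assert(hasGarbage(nodes, parent, {5}));

    parent.insert(Edge(1, 5));  // x now "contains" the whole expression
    assert(hasCycle(parent));
}
```

The two failure cases in demo are the relational analogues of a memory
leak (node 6) and a corrupted pointer graph (the edge from 1 to 5).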

A common theme on this ng is the idea of physical independence.  It
tends to be assumed that a program written in C and using pointers is
"low level" and "close to the physical hardware", whereas anything
using the RM is necessarily "high level" and "divorced from the
physical hardware".  This I agree is usually the case.

I think there is evidence here that an inappropriate use of the RM can
actually be low level in a similar way to how C is low level.  ie
algorithms can easily have bugs that resemble the creation of dangling
pointers or memory leaks!

Now consider again the following C++ struct

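 #include <string>
 #include <vector>

 using namespace std;

 // A chapter owns its paragraphs and, recursively, its subchapters.
 struct Chapter
 {
     string title;
     vector<string> paragraphs;
     vector<Chapter> subchapters;
 };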

This recursive type definition compiles without any problem and is
able to grow dynamically and enforces the integrity constraints.
Nevertheless there isn't a pointer in sight.  Behind the scenes there
is a physical implementation of the STL string and vector classes
using pointers, heap allocations and so on.
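
As a small illustration of working with such a type, the sketch below
repeats the struct (without the using-directive) and adds a recursive
traversal. The helper names countParagraphs and demo are hypothetical
and not part of the original example. Note that a std::vector of a
still-incomplete element type as a member is only formally guaranteed
from C++17 onwards, although mainstream compilers accepted it earlier.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Chapter {
    std::string title;
    std::vector<std::string> paragraphs;
    std::vector<Chapter> subchapters;
};

// Total number of paragraphs in a chapter, nested subchapters included.
std::size_t countParagraphs(const Chapter& c) {
    std::size_t n = c.paragraphs.size();
    for (const Chapter& sub : c.subchapters) n += countParagraphs(sub);
    return n;
}

void demo() {
    Chapter intro;
    intro.title = "Introduction";
    intro.paragraphs.push_back("First paragraph.");

    Chapter book;
    book.title = "Modeling";
    book.paragraphs.push_back("Opening.");
    book.paragraphs.push_back("Overview.");
    book.subchapters.push_back(intro);  // the tree grows dynamically

    Chapter copy = book;  // value semantics: the whole tree is copied
    copy.subchapters[0].paragraphs.push_back("Added only to the copy.");

    assert(countParagraphs(book) == 3);
    assert(countParagraphs(copy) == 4);
}
```

Because every Chapter is a plain value, a subchapter cannot be shared
between two parents or end up inside itself, so the tree shape needs no
separate check.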

The striking thing to me is that we are able to work at a higher level
than the RM in this particular case.
JOG - 27 Oct 2008 02:16 GMT
> > "David BL" <davi...@iinet.net.au> wrote in message
>
[quoted text clipped - 25 lines]
> There are two ways that the RM can be used to represent recursive data
> types:

I think this is the wrong way of looking at it. The OO (for want of a
better word) and RM approaches are two different ways of modelling
statements of fact from the world. And yet you seem to be stating the
problem as how to try and model a struct/object in RM? (That would be
like complaining that after you've put all your milk into a fridge,
you're having trouble pouring the fridge onto your cereal!)

> 1.  Using recursive RVAs; or
> 2.  By introducing abstract identifiers for all the nodes, and
> appropriate integrity constraints
>
> I find the first quite reasonable, but I'm suspicious of actually
> calling such an approach "relational".

What do you do in real life to identify a chapter? You refer to it by
name or number - and the same for subchapters too right? And if two
subchapters have the same local identifier (e.g. 'introduction') well
you use a composite identifier such as "the 'introduction' of the third
chapter". And if you can refer to a chapter when communicating with
someone else, then you have necessarily stated something about it in
the form of a proposition - and if it can be stated as a
proposition...it can be encoded as a tuple in RM.

And as far as constraints are concerned, what more do you need apart
from a subchapter can only have one containing chapter? I don't see
the issue with this example. Regards, J.
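
One possible reading of the composite-identifier idea, sketched in C++
purely for illustration: a chapter is keyed by the path a reader would
cite (numbers here, though names would work the same way). The names
Path, ChapterRel, containing, containersExist and demo are assumptions
of this sketch, not anything defined in the thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// A chapter is identified the way a reader would cite it: {3} is
// chapter 3, {3, 1} is the first subchapter of chapter 3, and so on.
using Path = std::vector<int>;

// chapter(Path, Title), keyed on Path.
using ChapterRel = std::map<Path, std::string>;

// The containing chapter falls out of the identifier: drop the last part.
Path containing(const Path& p) {
    assert(!p.empty());
    return Path(p.begin(), p.end() - 1);
}

// The one remaining constraint: a subchapter's containing chapter exists.
bool containersExist(const ChapterRel& rel) {
    for (const auto& row : rel) {
        const Path& p = row.first;
        if (p.size() > 1 && rel.count(containing(p)) == 0) return false;
    }
    return true;
}

void demo() {
    ChapterRel book;
    book[Path{1}] = "Basics";
    book[Path{3}] = "Advanced topics";
    book[Path{3, 1}] = "Introduction";
    assert(containersExist(book));

    book[Path{4, 2}] = "Orphan";  // there is no chapter 4
    assert(!containersExist(book));
}
```

With this key, "only one containing chapter" holds by construction,
since the parent is a function of the identifier.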

> In the second case lots of integrity constraints are needed because
> the RM is too flexible! It needs to be heavily constrained to only
[quoted text clipped - 44 lines]
>     mult(_,N,_), not nodes(N)?
>     mult(_,_,N), not nodes(N)?
David BL - 28 Oct 2008 06:49 GMT
> > > "David BL" <davi...@iinet.net.au> wrote in message
>
[quoted text clipped - 32 lines]
> like complaining that after you've put all your milk into a fridge,
> you're having trouble pouring the fridge onto your cereal!)

I'm making the following claims:

1.  There are applications that require the management of data in the
form of heavily nested values (ie of recursive value types).

2.  There doesn't exist a satisfactory decomposition of values of a
recursive value type into the RM other than by the use of recursive
RVAs.

This has nothing specifically to do with structs, objects and OO.  In
fact if anything OO languages tend to be rather poor at supporting
user defined value types.

Lisp and Prolog both provide excellent means to represent and process
recursive value types, and arguably much better than in C/C++.

I would suggest that the constraints imposed by RM/RA are best
understood by an experienced Prolog programmer.  One of those
constraints (assuming no recursive RVAs) is tantamount to outlawing
nested terms.

> > 1.  Using recursive RVAs; or
> > 2.  By introducing abstract identifiers for all the nodes, and
[quoted text clipped - 11 lines]
> the form of a proposition - and if it can be stated as a
> proposition...it can be encoded as a tuple in RM.

I agree that statements about things are well suited to the RM.

> And as far as constraints are concerned, what more do you need apart
> from a subchapter can only have one containing chapter? I don't see
> the issue with this example. Regards, J.

The example of the chapter was to show that recursive data types are
quite common (and not just limited to wffs in formal languages).  This
suggests there are many applications for which these questions are
relevant.   However the example is too simple to reveal the problems.

If you have a fairly complex grammar (or definition of a wff in some
formal language), such as defined by the OpenDocument specification,
would you be happy to represent those wffs in the RM?

The OpenDocument V1.1 spec is 738 pages.   It is basically a heavily
commented XML schema.  Although I find XML hideous, I can understand
how the entries represent the elements of a recursively defined wff in
some formal language.   I would cringe at the idea of trying to map it
all to the RM.  The integrity constraints would be horribly complex.
JOG - 29 Oct 2008 00:26 GMT
> [snip]
> > > struct Chapter
[quoted text clipped - 19 lines]
> 1.  There are applications that require the management of data in the
> form of heavily nested values (ie of recursive value types).

Hi David. This is still putting the cart before the horse as far as
I'm concerned. Once you say there are "types", and they are
"recursive", you have already created a model in your head. Trying to
then squash that model into the RM is bound to cause problems. A book
does not contain recursive types as such. Say that they do to someone in
the street and they'll think you're loopy-loo, right?

> 2.  There doesn't exist a satisfactory decomposition of values of a
> recursive value type into the RM other than by the use of recursive
> RVAs.

What was wrong with the method I suggested of just copying how you
describe things in real life? And then just representing that in
predicate logic? I don't need recursive types to model statements of
fact (although I would agree recursive queries/constraints are
extremely valuable).

> This has nothing specifically to do with structs, objects and OO.  In
> fact if anything OO languages tend to be rather poor at supporting
> user defined value types.
>
> Lisp and Prolog both provide excellent means to represent and process
> recursive value types, and arguably much better than in C/C++.

You'd like Haskell - have you tried it?

> I would suggest that the constraints imposed by RM/RA are best
> understood by an experienced Prolog programmer.  One of those
[quoted text clipped - 18 lines]
>
> I agree that statements about things are well suited to the RM.

Statements about things is what all databases are concerned with, not
just the RM. Anything else is out of its remit. Its function is to
model facts, not values such as equations. I certainly wouldn't use it
to model something like a car engine schematic either (but facts about
that engine... yes!).

> > And as far as constraints are concerned, what more do you need apart
> > from a subchapter can only have one containing chapter? I don't see
> > the issue with this example. Regards, J.
>
> The example of the chapter was to show that recursive data types are
> quite common (and not just limited to wffs in formal languages).  

Why would you want to record facts about wff's instead of, well, just
using them for things? I can't imagine the application at the moment
<scratching_head/>.

> This suggests there are many applications for which these questions are
> relevant.   However the example is too simple to reveal the problems.
>
> If you have a fairly complex grammar (or definition of a wff in some
> formal language), such as defined by the OpenDocument specification,
> would you be happy to represent those wffs in the RM?

Not a formula - it is not a datum (as meant in the term database),
just a value. However, a grammar such as a BNF, yeah I can. Very much
so in fact, because it consists of rules which are statements of fact.
I think we should give it a spin ;)

> The OpenDocument V1.1 spec is 738 pages.   It is basically a heavily
> commented XML schema.  Although I find XML hideous, I can understand
> how the entries represent the elements of a recursively defined wff in
> some formal language.   I would cringe at the idea of trying to map it
> all to the RM.  The integrity constraints would be horribly complex.
David BL - 29 Oct 2008 02:51 GMT
> > [snip]
> > > > struct Chapter
[quoted text clipped - 26 lines]
> does not contain recursive types as such. Say that they do to someone in
> the street and they'll think you're loopy-loo, right?

I understand what you're saying.  However I personally find recursive
data types very intuitive and useful so if I consider /myself/ as a
user of a DBMS, a restriction to the RM feels like one hand is tied
behind my back.

I'm interested in the storage and management of compound documents,
scene graphs and so on.  I cannot imagine doing without recursive data
types.

> > 2.  There doesn't exist a satisfactory decomposition of values of a
> > recursive value type into the RM other than by the use of recursive
[quoted text clipped - 5 lines]
> fact (although I would agree recursive queries/constraints are
> extremely valuable).

I agree you don't need recursive types to model statements of fact.
However I don't consider data to necessarily be regarded as a bunch of
propositions.   We have had this argument before!

Remember when I said that I regard a recorded poem as just a value,
not a proposition?   A CD containing a single text file that is a poem
could I guess be construed as a single proposition as a claim of its
own existence!  But I regard such a claim as metaphysical and
therefore meaningless.  Alternatively the proposition could represent
the real claim that that /particular/ CD records a poem.  It would
seem silly to try to make that proposition explicit by recording
additional encoded values on the CD.  For a start that would make it
more difficult to copy the data to another medium.  So if the
proposition is implicit - and you don't see it directly in the
recorded data then I suppose you could say that the "actual recorded
data" is an encoded string value whereas the "data" is a
proposition.   Do you think this distinction is useful?   I don't!

What's wrong with defining "data" to mean "encoded value(s)" rather
than "encoded fact(s)"?   Note that the former encompasses the latter
because a relation is a value.

> > This has nothing specifically to do with structs, objects and OO.  In
> > fact if anything OO languages tend to be rather poor at supporting
[quoted text clipped - 4 lines]
>
> You'd like Haskell - have you tried it?

Yes, but "tried it" is a fair description.

> > I would suggest that the constraints imposed by RM/RA are best
> > understood by an experienced Prolog programmer.  One of those
[quoted text clipped - 24 lines]
> to model something like a car engine schematic either (but facts about
> that engine... yes!).

I don't agree that the term "database" should imply it is only
concerned with storing facts.   A new term is required: "factbase"  :)

> > > And as far as constraints are concerned, what more do you need apart
> > > from a subchapter can only have one containing chapter? I don't see
[quoted text clipped - 6 lines]
> using them for things? I can't imagine the application at the moment
> <scratching_head/>.

I agree that you don't generally need to record facts about wffs.
Our point of disagreement seems to stem from the meaning and scope of
the word "data".

> > This suggests there are many applications for which these questions are
> > relevant.   However the example is too simple to reveal the problems.
[quoted text clipped - 7 lines]
> so in fact, because it consists of rules which are statements of fact.
> I think we should give it a spin ;)

Ok, I think we mostly agree with each other.
paul c - 24 Oct 2008 14:28 GMT
>> Ok, I’ll bite…
>
[quoted text clipped - 3 lines]
> I don't understand the word "naturally" in this context.  Isn't all modeling
> artificial, rather than natural?

I'm with you even though we think of the activities involved as being
natural to us.  The RM is an artifice, so are models in general.  So is
FOL (even with its trap lingo like "Exists").  I doubt if mathematics is
any more natural than a data model as it produces some conclusions that
nobody can actually visualize.  The consequences of relational closure
are one small example.  The reason I think this is important is that it
means there ought to be nothing to prevent us devising even more useful
artifices, even if most of us, including me, don't possess the insight
to do that.

Being part of nature, we are hardly in a position to duplicate it.  Our
only advantage is the artifice wherein we can drop the natural aspects
that are inconvenient or irrelevant, as we see it, to some purpose.
We've been practising this since the Stone Age.

It bugs me when people pretend that we have re-produced anything but our
own mental creations, I think that is the first step down the mystic
slope.  But reason and rationality too can get out of control, as modern
history shows.  Does that sound odd coming from an atheist?
David BL - 28 Oct 2008 02:49 GMT
> >> Ok, I’ll bite…
>
[quoted text clipped - 23 lines]
> slope.  But reason and rationality too can get out of control, as modern
> history shows.  Does that sound odd coming from an atheist?

Would you say Max Tegmark is on the mystic slope?

http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf
paul c - 28 Oct 2008 03:11 GMT
>>>> Ok, I’ll bite…
>>>> No doubt any data can be made to “fit” into the relational model.
[quoted text clipped - 24 lines]
>
> http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf

No idea at the moment, I'm still trying to figure out what that C++
structure has to do with declarative programming!
paul c - 30 Oct 2008 00:36 GMT
...
>> It bugs me when people pretend that we have re-produced anything but our
>> own mental creations, I think that is the first step down the mystic
[quoted text clipped - 4 lines]
>
> http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf

In a word, yes, but let me say that I don't mean yes in the same sense
that I've accused some posters here of being mystics.  Colloquial
English being what it is, what mystical means to some stranger or other
is up for grabs.  When I throw that word around, I'm talking about
people who are ignoring the purpose of common db apps and the nature and
capability of common processors and memories.  I've met many so-called
professional computer science experts who are almost totally unaware of
what the typical digital computer is good at.

Also, relative to my elementary mathematical understanding he is indeed
talking of mystical things, but at the same time I think I detect that
he is aiming at a rather grand systematic structure, a superstructure if
you like.  Whether it is implementable is very much something else.
Personally, I'm not too bothered about such musings because my interest
is more in finding happy coincidences between applications and what
machine instruction sets and electronic physics can imitate expediently,
without impairing some interpretation that is ready and useful for
humans.  For one example - although he didn't emphasize the details, I
feel certain that Codd saw how adjacency in machine memory could be
exploited as a way to manifest a mathematical relation with low
overhead.  I have watched while various programming paradigms such as OO
or column-based dbms's have discounted machine characteristics and while
I wish those efforts no ill will, I do think they have taken on problems
that current machines are not much good at.  (When Codd talked about
"representation" I have no doubt he was talking about both the mind's
eye and machine efficiency.)  I would hope when anybody, me or somebody
else, throws the word "mystical" around, their own limitations are taken
as givens, even if they don't acknowledge them up front in a casual
forum such as this.
Bob Badour - 24 Oct 2008 15:54 GMT
>>Ok, I’ll bite…
>
[quoted text clipped - 3 lines]
> I don't understand the word "naturally" in this context.  Isn't all modeling
> artificial, rather than natural?

Yes! Formal systems and symbolic manipulation are symbolic and abstract
not natural.
David BL - 25 Oct 2008 03:18 GMT
> >>Ok, I’ll bite…
>
[quoted text clipped - 6 lines]
> Yes! Formal systems and symbolic manipulation are symbolic and abstract
> not natural.

The natural numbers should be called the artificial numbers :)

The word natural seems to have many meanings.  There are 38 listed at
Dictionary.com.
JOG - 29 Oct 2008 01:13 GMT
> > > > Despite a growing literature, current definitions of "semi-structure"
> > > > are woefully inadequate.
[quoted text clipped - 16 lines]
>
> No doubt any data can be made to “fit” into the relational model.

Let me state first that I don't believe that the relational model is
universally applicable (I'm not sure where you think I have stated
that). However, all data can be stated in predicate logic, and all
statements of logic can be modelled in the RM. Hence, I consider it
absolutely untenable to argue that there is any data which cannot be
structured as a schema of relations. This is my objection to the
semistructure literature.

> The more important question is whether it happens /naturally/.
> The relational model works really well when there is a UoD on which many
> propositions can be made without needing to introduce lots of abstract
> identifiers.

The RM handles facts as naturally as stating them in predicate logic.
And why would one ever model things other than facts in predicate
logic?
I think there is confusion (in general, not simply here!) about what a
database is intended to model. It models data as it has been stated in
the real world, not the things which that data refers to.

> That’s very common, but it’s not always the case. It
> seems to me the question of whether the RM is generally appropriate
[quoted text clipped - 5 lines]
> programmers enter their programs as relations?   Do you really think
> it’s only because of the tools currently available?

Nope. I think data that requires a high number of predicates compared
to the number of statements it models can be cumbersome to manipulate
in the RM. Equally I think it misses a trick when it comes to facts
that might be represented using logical quantification. However, I do
believe this situation can be improved (specifically via greater
flexibility in defining predicates and integration of existential
quantifiers) and its general declarative principles will be
increasingly incorporated into programming languages.

Regards, Jim.

> What about automated proof systems?  Is the knowledge base and data
> associated with an ongoing proof best represented using a set of
[quoted text clipped - 9 lines]
> if that were useful, many cdt folks would jump at the opportunity to
> further promote the use of relations.
David BL - 29 Oct 2008 03:37 GMT
> > > > > Despite a growing literature, current definitions of "semi-structure"
> > > > > are woefully inadequate.
[quoted text clipped - 20 lines]
> universally applicable (I'm not sure where you think I have stated
> that).

When I said "universally applicable" I meant (only) with respect to
the recording of data, where data means "encoded values".

> However, all data can be stated in predicate logic, and all
> statements of logic can be modelled in the RM. Hence, I consider it
> absolutely untenable to argue that there is any data which cannot be
> structured as a schema of relations. This is my objection to the
> semistructure literature.

When you say "data" do you always mean "encoded facts"?

> > The more important question is whether it happens /naturally/.
> > The relational model works really well when there is a UoD on which many
[quoted text clipped - 4 lines]
> And why would one ever model things other than facts in predicate
> logic?

Exactly!

> I think there is confusion (in general, not simply here!) about what a
> database is intended to model.

Agreed.

> It models data as it has been stated in
> the real world, not the things which that data refers to.

Yes that fits with your assumption that data = encoded facts.

However it doesn't make sense when you say data = encoded values.
Encoded values just "are".  They don't necessarily refer to anything
in the real world.

> > That’s very common, but it’s not always the case. It
> > seems to me the question of whether the RM is generally appropriate
[quoted text clipped - 14 lines]
> quantifiers) and its general declarative principles will be
> increasingly incorporated into programming languages.
JOG - 29 Oct 2008 12:39 GMT
> > > > > > Despite a growing literature, current definitions of "semi-structure"
> > > > > > are woefully inadequate.
[quoted text clipped - 42 lines]
>
> Exactly!

Then may I suggest that your argument is not with the RM, but with the
use of predicate logic to model equations, engines, etc. And yet this
to me seems trivially true - if I was modelling a human in an art
class I'd use clay, not predicate logic.

Of course the resulting piece of clay would be an "encoded value" and
thus,  by your definition, data.
And then a bag of such pieces of clay.. it's a database!
And my mantelpiece at home, which I display them on....it's a data
warehouse!
And of course, when I spring-clean I am become a DBMS!

;) J.

> > I think there is confusion (in general, not simply here!) about what a
> > database is intended to model.
[quoted text clipped - 28 lines]
> > quantifiers) and its general declarative principles will be
> > increasingly incorporated into programming languages.
David BL - 30 Oct 2008 02:51 GMT
> > > The RM handles facts as naturally as stating them in predicate logic.
> > > And why would one ever model things other than facts in predicate
[quoted text clipped - 6 lines]
> to me seems trivially true - if I was modelling a human in an art
> class I'd use clay, not predicate logic.

I don't think it's quite so trivial.   For example, consider a
tri-surface as a value-type.  A simple type decomposition as a set of
triangles where each triangle is independently defined by 3 vertices
doesn't express the constraint that the triangles tend to meet each
other.   It seems appropriate to introduce abstract identifiers for
the vertices in order that they may be shared.   This is evidently a
relational solution.  However unlike typical uses of the RM there
doesn't appear to be some external UoD to which the tuples,
interpreted as propositions can be related.   Rather it seems that a
particular tri-surface /value/ has introduced a local and private
namespace in order to privately apply the RM.  Note as well that this
is not like an RVA (where we think of only a single relation as a
value) because a tri-surface value is associated with /two/ relations
- one for the vertices and another for the triangles.
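
A minimal C++ sketch of that two-relation encoding, offered only as an
illustration: the names Point, Triangle, TriSurface, isWellFormed,
trianglesUsing and demo are hypothetical, and the vertex identifiers
are ints that mean nothing outside the one TriSurface value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>

struct Point { double x, y, z; };
using Triangle = std::array<int, 3>;  // three vertex identifiers

struct TriSurface {
    std::map<int, Point> vertices;  // vertex(Id, X, Y, Z)
    std::set<Triangle> triangles;   // triangle(V1, V2, V3)
};

// Referential integrity inside the value: every corner names a vertex.
bool isWellFormed(const TriSurface& s) {
    for (const Triangle& t : s.triangles)
        for (int v : t)
            if (s.vertices.count(v) == 0) return false;
    return true;
}

// Number of triangles that have vertex v as a corner.
std::size_t trianglesUsing(const TriSurface& s, int v) {
    std::size_t n = 0;
    for (const Triangle& t : s.triangles)
        if (t[0] == v || t[1] == v || t[2] == v) ++n;
    return n;
}

void demo() {
    TriSurface s;
    s.vertices[1] = Point{0.0, 0.0, 0.0};
    s.vertices[2] = Point{1.0, 0.0, 0.0};
    s.vertices[3] = Point{0.0, 1.0, 0.0};
    s.vertices[4] = Point{1.0, 1.0, 0.0};
    s.triangles.insert(Triangle{{1, 2, 3}});
    s.triangles.insert(Triangle{{2, 4, 3}});
    assert(isWellFormed(s));
    assert(trianglesUsing(s, 2) == 2);

    s.vertices[2].z = 0.5;  // one update moves a corner of both triangles

    s.triangles.insert(Triangle{{1, 2, 9}});  // 9 is not a known vertex
    assert(!isWellFormed(s));
}
```

The single assignment to vertex 2 is the point of the shared
identifier: with three independent coordinates per triangle the same
edit would have to be repeated for every triangle that meets there.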

I have wondered whether abstract identifiers are needed precisely when
it is useful to express the concept of "common sub-expressions" within
nested value-types.  Note that scene graphs are typically thought of
as DAGs not trees for precisely this reason.

I think there is an interesting interplay between 1) degrees of
freedom (or entropy or storage space if you like) in the encoding of a
value, 2) abstract identifiers, 3) integrity constraints and 4) update
anomalies.   The existing normalisation theory in the literature seems
relevant but doesn't seem to me to account for recursive type
definitions and abstract identifiers.  Given this interplay it would
be useful to better understand why one encoding would be more
desirable than another.  In fact I wonder whether there are some
objective criteria. Evidently it is not to always avoid abstract
identifiers (as if they are implicitly evil).   I would guess that as
far as the complexity of the integrity constraints there is some sweet
spot in the use of abstract identifiers.

> Of course the resulting piece of clay would be an "encoded value" and
> thus,  by your definition, data.
> And then a bag of such pieces of clay.. its a database!
> And my mantelpiece at home, which I display them on....its a data
> warehouse!
> And of course, when I spring-clean I am become a DBMS!
JOG - 30 Oct 2008 11:41 GMT
> > > > The RM handles facts as naturally as stating them in predicate logic.
> > > > And why would one ever model things other than facts in predicate
[quoted text clipped - 16 lines]
> doesn't appear to be some external UoD to which the tuples,
> interpreted as propositions can be related.

I use Oracle Spatial to do exactly this sort of thing day in day out
in a geospatial domain, and no abstract identifiers are introduced. The
coordinates of any vertex are used. That is what identifies them -
that is what is used (note that these coordinates can happily be
relative). Constraints to maintain adjacency use the spatial operators
offered by SDO_RELATE. It is very good.

I karate chop your example to pieces! Haiii-ya.

> Rather it seems that a particular tri-surface /value/ has introduced a local and private
> namespace in order to privately apply the RM.  Note as well that this
[quoted text clipped - 13 lines]
> relevant but doesn't seem to me to account for recursive type
> definitions and abstract identifiers.

I am yet to be convinced of the need for abstract identifiers (or
invention of recursive types) from the examples offered so far.. the
wff is the most interesting, but I am currently questioning the sense
or utility of decomposing an equation in such a manner /at the logical
level/ (as opposed to the physical). Regards, J.

> Given this interplay it would
> be useful to better understand why one encoding would be more
[quoted text clipped - 10 lines]
> > warehouse!
> > And of course, when I spring-clean I am become a DBMS!
David BL - 30 Oct 2008 15:03 GMT
> > > > > The RM handles facts as naturally as stating them in predicate logic.
> > > > > And why would one ever model things other than facts in predicate
[quoted text clipped - 25 lines]
>
> I karate chop your example to pieces! Haiii-ya.

Please forgive my ignorance - I'm not familiar with Oracle Spatial.
Are you suggesting that for a tri-surface all that is needed is a
single relation for the triangles, and when for example you want to
change what is conceptually a shared vertex (and so which is
understood to impact multiple triangles), it is assumed that all
vertex values that appear in the relation with that same value (ie
coords) are indeed logically shared and therefore are all
automatically updated by the DBMS at the same time? If so it is not
clear to me how and when the DBMS knows that such an elaborate update
policy is required.  I presume it is inferred from the integrity
constraints.  Is that right?  Does the DBMS provide such a facility in
a generic way?
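
To make the question concrete, here is a minimal Python sketch, not a description of what Oracle Spatial actually does. It assumes a tri-surface held as a single relation of triangles in which a vertex is identified only by its coordinates. The names `triangles` and `move_vertex` are hypothetical. Moving a "shared" vertex then means rewriting every triangle that contains that coordinate value.

```python
# Hypothetical sketch: a tri-surface as one relation of triangles,
# where each vertex is identified only by its (x, y, z) coordinates.

def move_vertex(triangles, old, new):
    """Return a new relation with every occurrence of `old` replaced by `new`."""
    return {frozenset(new if v == old else v for v in tri) for tri in triangles}

a, b, c, d = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)
triangles = {frozenset({a, b, c}), frozenset({b, c, d})}

# b appears in both triangles; moving it by value touches both tuples.
b2 = (1.0, 0.0, 0.5)
moved = move_vertex(triangles, b, b2)
```

Note that this policy treats any two equal coordinate values as the same vertex, which is exactly the assumption being questioned.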

This reminds me of the idea that one can change the key of a tuple in
a relation and have the DBMS automatically update all foreign key
references across the entire database.
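
For comparison, SQL's cascading key update behaves this way. Below is a small sketch using Python's built-in `sqlite3` module. The `vertex` and `corner` tables are invented for illustration and are not taken from anyone's schema in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE vertex (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE corner ("
    " triangle INTEGER,"
    " vertex TEXT REFERENCES vertex(name) ON UPDATE CASCADE)"
)
conn.execute("INSERT INTO vertex VALUES ('v1')")
conn.executemany("INSERT INTO corner VALUES (?, 'v1')", [(1,), (2,)])

# Changing the key in the parent row propagates to every referencing row.
conn.execute("UPDATE vertex SET name = 'v9' WHERE name = 'v1'")
rows = conn.execute(
    "SELECT triangle, vertex FROM corner ORDER BY triangle"
).fetchall()
```

The cascade is declared per foreign key, so the DBMS knows where to propagate only because the references were stated explicitly.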

Anyway, I think there are data entry applications where the concept of
"shared values" needs to be under user control.   For example in the
data entry of a CAD drawing of a car the user may or may not want all
the wheels to share the same geometry.  The problem with simple copy
and paste (and no logical sharing) is that any future edits to the
wheel geometry need to be repeated on every copy.  The obvious
solution seems to be to reference a single shared geometry for a wheel
- hence the need for an abstract identifier.  Are you suggesting that
an alternative is to instead use an integrity constraint?   If so how
can you specify which geometries are logically tied and which are not
(ie even though they just happen to be equivalent in value at that
moment in time)? Doesn't that require abstract identifiers of some
sort anyway?  I can't imagine that values that happen to be the same
are always assumed to be shared, because then it would be impossible
for a user to copy and paste a value in order to create a copy that
will subsequently diverge.
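
A minimal Python sketch of the distinction, under the assumption that geometries sit in one relation keyed by an abstract identifier and wheels reference them by that identifier. All names and geometry values here (`geometry`, `wheel`, `copy_geometry`, `g1`, `g2`) are hypothetical.

```python
# Hypothetical sketch: geometries live in one relation keyed by an
# abstract identifier; each wheel references a geometry by identifier.
geometry = {"g1": "5-spoke, 16 inch"}
wheel = {"front_left": "g1", "front_right": "g1", "rear_left": "g1"}

def copy_geometry(geometry, src, new_id):
    """Copy-and-paste: a new identifier with the same value, free to diverge."""
    geometry[new_id] = geometry[src]
    return new_id

# The user decides that the spare shares nothing with the other wheels.
wheel["spare"] = copy_geometry(geometry, "g1", "g2")

# One edit to g1 reaches every wheel that references it, but not the copy.
geometry["g1"] = "6-spoke, 17 inch"
resolved = {w: geometry[g] for w, g in wheel.items()}
```

Before the edit, g1 and g2 hold equal values, so equality of value alone cannot tell the shared wheels from the copied one. Only the identifiers record the user's intent.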

> > Rather it seems that a particular tri-surface /value/ has introduced a local and private
> > namespace in order to privately apply the RM.  Note as well that this
[quoted text clipped - 19 lines]
> or utility of decomposing an equation in such a manner /at the logical
> level/ (as opposed to the physical).
JOG - 11 Nov 2008 16:58 GMT
> > > > > > The RM handles facts as naturally as stating them in predicate logic.
> > > > > > And why would one ever model things other than facts in predicate
[quoted text clipped - 27 lines]
>
> Please forgive my ignorance - I'm not familiar with Oracle Spatial.

First, apologies for the delay in response. I have been distracted by
TTM.

> Are you suggesting that for a tri-surface all that is needed is a
> single relation for the triangles,

No, I meant I work with the polygon (SDO_GEOMETRY) object types which
Oracle Spatial makes available. Tri-surfaces are a different kettle of
fish no doubt.

> and when for example you want to
> change what is conceptually a shared vertex (and so which is
[quoted text clipped - 6 lines]
> policy is required.  I presume it is inferred from the integrity
> constraints.  Is that right?

That seems a likely method for checking constraints between polygon
instances. However, I think it is symptomatic of a design flaw if one
is trying to model these tri-surface jib jobs (which, I assume, are
continuous 3D surfaces made up of triangles, as are used in graphic
models, reaching back to the old days of Elite?).

> Does the DBMS provide such a facility in
> a generic way?

No. One would add the check as a constraint. However, it would be very
inefficient I imagine - I could see objections to this that you might
respond with. With a typed approach, however, the situation is
inevitable, because a geometry instance encapsulates/hides away its
identifying qualities (being an object). This means they are not
exposed to the RA.

> This reminds me of the idea that one can change the key of a tuple in
> a relation and have the DBMS automatically update all foreign key
[quoted text clipped - 16 lines]
> for a user to copy and paste a value in order to create a copy that
> will subsequently diverge.

One of the other reasons that my reply took time was that I have
thought reasonably hard about the issues you have raised and come to
conclude that you are right. At least right in the sense that I now
concur that RM can't cope with the example without adding RVA's or
inventing artificial identifiers. And both approaches are hack jobs as
far as I'm concerned.

On analysis, I think that you have tangentially identified a serious
issue with the RM (and not a case for recursive types per se - this is
an attempt to solve the issue, rather than describing the cause, and
care should be taken not to conflate the two). I will post when I get
more time if you are interested in my thought process, but it has
clarified some nagging concerns I have had concerning the universal
application of 1NF.

Either way I have found the tri-surface an illuminating example.
Regards, Jim.

> > > Rather it seems that a particular tri-surface /value/ has introduced a local and private
> > > namespace in order to privately apply the RM.  Note as well that this
[quoted text clipped - 19 lines]
> > or utility of decomposing an equation in such a manner /at the logical
> > level/ (as opposed to the physical).
paul c - 12 Nov 2008 22:27 GMT
...
> One of the other reasons that my reply took time, was that I have
> thought reasonably hard about the issues you have raised and come to
[quoted text clipped - 3 lines]
> far as I'm concerned.
> ...

Aren't all identifiers artificial?  If so, where is the hack?
paul c - 12 Nov 2008 22:28 GMT
> Aren't all identifiers artificial?  ...

Ie., we make them up.
Roy Hann - 12 Nov 2008 23:08 GMT
>> Aren't all identifiers artificial?  ...
>
> Ie., we make them up.

You need to be more precise.  All identifiers are made up somewhere, but
not necessarily within the enterprise of interest.  "We" might not
need to make up any.  Or we might need to make up a few--ideally the
fewest sufficient.

Roy

paul c - 13 Nov 2008 00:39 GMT
>>> Aren't all identifiers artificial?  ...
>> Ie., we make them up.
[quoted text clipped - 3 lines]
> need to make up any.  Or we might need to make up a few--ideally the
> fewest sufficient.

Sure.  But as for "not necessarily within the enterprise of interest",
why does such a distinction matter?

(BTW, I wasn't implying that I agree with the lazy shallow people who
think every relation should have a generated key.  I piped up because
I'm always happy to try to keep threads about RVA's or recursion or
constraints alive.  The latter two areas seem rather under-explored to
me.)
David BL - 13 Nov 2008 02:49 GMT
> >>> Aren't all identifiers artificial?  ...
> >> Ie., we make them up.
[quoted text clipped - 6 lines]
> Sure.  But as for "not necessarily within the enterprise of interest",
> why does such a distinction matter?

ISTM it relates to whether we informally consider a UoD and various
external predicates to exist /a priori/ (ie independently of the DB) -
even though we regard the UoD and the external predicates as outside
our mathematical formalism.

If we need to name some things in order to state facts about them then
I don't think it's particularly useful to this theory group to try to
distinguish between "natural" and "artificial" names. I believe this
is even true if we happen to use the DB to help allocate names for
informal things that we actually consider to be in the UoD.

I think an /abstract identifier/ should be defined as an identifier
that can be regarded as a name of a variable (which holds an abstract
value) within some context within the DB (and not in the UoD).   When
I say context I mean that there is some defined scope (hopefully as
small as possible) in which the name is meaningful.   For example
within the context of representing a tri-surface (ie triangulated
irregular network) /value/ it may be useful to introduce a scope in
which abstract identifiers are names of vertex values.  Note that
whenever we have a binding from an identifier to a value within some
scope I would say by definition we have a /variable/.

A tuple that contains an attribute value that is an identifier outside
the UoD cannot represent a self-contained (ie independently
verifiable) fact on the UoD. This is why I think one should introduce
as few abstract identifiers as possible. The idea that a domain expert
can regard each tuple of a relation as a self contained fact is
extremely valuable.
paul c - 13 Nov 2008 03:23 GMT
...
> A tuple that contains an attribute value that is an identifier outside
> the UoD cannot represent a self-contained (ie independently
> verifiable) fact on the UoD. ...

I'd like to see an example, this sounds like some kind of rhetorical
imaginary paradox to me.

If I don't know the names of my great-great-great-great-great
grandfathers, I would assign numbers or some kind of code to them.  I
couldn't guarantee that there were exactly 64 of them but I could be
sure that I had at least one.  However many of them, I don't see how
those identifiers would fall outside of, say, a genealogy "UoD".  Or do
you mean something else?

(I'd still like to know what the "hack" is, too.)
David BL - 13 Nov 2008 08:19 GMT
> ...
>
[quoted text clipped - 11 lines]
> those identifiers would fall outside of, say, a genealogy "UoD".  Or do
> you mean something else?

I did mean something else, but I think it needs some rewording because
I wasn't very clear.   Let there be a relation

   father(X,Y) :- X is the father of Y.

Let there be a UoD in which bill,jane are identifiers for two
particular humans.  Then the tuple

   father(bill,jane)

is an independently verifiable statement of fact on the UoD.

Alternatively let there be relations

   age(X,N) :- X has age recorded by variable named N.
   value(N,V) :- Variable named N has value V

where the scope of the variables in value(N,V) is regarded as local to
the DB.   Then the following tuple

   age(bill,n)

is not independently verifiable by a domain expert (who will ask "what
is n?" - because n is an abstract identifier).
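
The two encodings can be sketched in Python with relations as sets of tuples. This is only an illustration: the value 42 and the function name `resolved_age` are invented here and appear nowhere in the relations above.

```python
# A fact stated directly on the UoD: the tuple can be checked on its own.
father = {("bill", "jane")}

# Facts routed through an abstract identifier that is local to the DB.
age = {("bill", "n")}
value = {("n", 42)}  # 42 is an invented figure, purely for illustration

def resolved_age(age, value):
    """Join age and value on the variable name and project the name away."""
    return {(x, v) for (x, n) in age for (m, v) in value if n == m}

result = resolved_age(age, value)
```

Only after the join does a tuple such as ("bill", 42) become something a domain expert could verify. The tuple ("bill", "n") on its own says nothing checkable about bill.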

With this last example I'm curious to know whether anyone would
disagree with the idea that it's sometimes possible to interpret an
identifier as a name for a variable.   I know when I've suggested the
idea that relations can represent variables and pointers to variables
I've met with plenty of opposition - the argument I presume being that
relation (values) only record tuple values, which in turn only record
attribute values - so it's argued there can't be variables or
pointers.

An interesting question:  is there a practical example of the use of
abstract identifiers (i.e. identifiers that fall outside the UoD)
where they never appear as a candidate key in some relation?   If so
that would seem to defeat my argument that abstract identifiers can
always be interpreted as names of variables defined in a scope within
the DB.
paul c - 13 Nov 2008 15:25 GMT
...
> I did mean something else, but I think it needs some rewording because
> I wasn't very clear.   Let there be a relation
[quoted text clipped - 36 lines]
> always be interpreted as names of variables defined in a scope within
> the DB.

This is probably over my head, but I'll chip in anyway.  By convention,
all tuples in a relation are true.  Pointers require resolution because
they imply alternatives.  In the RM (as we know it today), the only way
to express alternatives in an e