Database Forum / General DB Topics / DB Theory / November 2008
Modeling question...
Volker Hetzer - 08 Jul 2008 17:25 GMT
Hi!
Not sure if this is the right group, but I've come across a problem I'm at a loss to model properly. Here's the setup:
A model contains three entities ("Level"s) describing projects:
- Project Family records, each referencing several
- Project records (with different attributes), in turn referencing
- Sub projects, with different attributes again.
Those three Project levels are connected by straightforward 1-to-n relationships. But the problem is that they all have a bunch of key/value pairs. So, a project family can have a key/value pair StartDate=20080615, but each project and sub project can have a different StartDate. On the other hand, sub projects, projects and families don't need to have the same key/value pairs.
Now, the simple solution is to have three key_value tables and be done with it. However, what I'd very much prefer is just one key_value table with some kind of "level" attribute: 0 for family, 1 for project and 2 for sub project. The primary keys on the level entities are all numeric, so this might work, but I have no idea how to declare the foreign key constraints, because the key_value table would have three parents.
How does one model relationships in which
- one child table has several parent tables,
- each parent record can refer to several child records, and
- each child record belongs to exactly one parent record in exactly one of the several parent tables?
Is there a declarative way to enforce consistency?
Lots of Greetings! Volker
 Signature For email replies, please substitute the obvious.
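Volker's last question has at least a partial declarative answer, often called an exclusive arc: give the single key/value table one nullable foreign key per parent table plus a CHECK constraint that exactly one of them is set. What follows is a minimal, hypothetical sketch using Python's standard sqlite3 module; the table and column names are assumptions, and the CHECK relies on SQLite evaluating `IS NOT NULL` to 0 or 1, so other DBMSs would need CASE expressions instead.

```python
import sqlite3

# Hypothetical names; one nullable FK column per possible parent level.
SCHEMA = """
CREATE TABLE project_family (family_id INTEGER PRIMARY KEY);
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    family_id  INTEGER NOT NULL REFERENCES project_family (family_id)
);
CREATE TABLE sub_project (
    sub_project_id INTEGER PRIMARY KEY,
    project_id     INTEGER NOT NULL REFERENCES project (project_id)
);
CREATE TABLE key_value (
    family_id      INTEGER REFERENCES project_family (family_id),
    project_id     INTEGER REFERENCES project (project_id),
    sub_project_id INTEGER REFERENCES sub_project (sub_project_id),
    name           TEXT NOT NULL,
    value          TEXT NOT NULL,
    -- Exactly one parent column is set (SQLite-specific boolean arithmetic).
    CHECK ((family_id IS NOT NULL) + (project_id IS NOT NULL)
           + (sub_project_id IS NOT NULL) = 1),
    -- A key appears at most once per parent; NULLs are distinct in UNIQUE,
    -- so rows belonging to other levels are not compared.
    UNIQUE (family_id, name),
    UNIQUE (project_id, name),
    UNIQUE (sub_project_id, name)
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    return conn

def rejected(conn, sql, params=()):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False
```

The price of this design is one column per parent level: adding a fourth level means changing key_value itself.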
Bob Badour - 08 Jul 2008 19:07 GMT
Ooooh! Reinventing EAV with levels...
Volker Hetzer - 08 Jul 2008 20:36 GMT
Bob Badour wrote:
> Ooooh! Reinventing EAV with levels...

Possibly. I had a look at http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm and didn't find anything exciting. All my attributes (key/value pairs) are, for the purpose of this discussion, strings, so the Data tables hierarchy ends with EAV_Objects in the first image at that link. My problem is that I have three different "Objects_1" tables, and I'd like to avoid having to replicate the EAV_Objects table for each "Objects_1" table. OTOH, I could make the "level" entities all children of an id table and put the key/value pairs into a child of that table. I need to try this out.
Thanks for providing the pointer! Volker
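Volker's second idea, making every level entity a child of one id table and hanging the key/value pairs off that table, is the supertype/subtype pattern. A minimal sqlite3 sketch with hypothetical names: a node_level discriminator plus composite foreign keys on (node_id, node_level) keeps any id from belonging to more than one level, and key_value needs only one parent. It does not force every supertype row to have a matching subtype row; that direction would need a trigger or deferred constraint and is not shown.

```python
import sqlite3

# Hypothetical names. Every family, project and sub project is first
# registered in project_node; key_value then has that single parent.
SCHEMA = """
CREATE TABLE project_node (
    node_id    INTEGER PRIMARY KEY,
    node_level INTEGER NOT NULL CHECK (node_level IN (0, 1, 2)),  -- 0 family, 1 project, 2 sub project
    UNIQUE (node_id, node_level)      -- target of the composite FKs below
);
CREATE TABLE project_family (
    node_id    INTEGER PRIMARY KEY,
    node_level INTEGER NOT NULL DEFAULT 0 CHECK (node_level = 0),
    FOREIGN KEY (node_id, node_level) REFERENCES project_node (node_id, node_level)
);
CREATE TABLE project (
    node_id    INTEGER PRIMARY KEY,
    node_level INTEGER NOT NULL DEFAULT 1 CHECK (node_level = 1),
    family_id  INTEGER NOT NULL REFERENCES project_family (node_id),
    FOREIGN KEY (node_id, node_level) REFERENCES project_node (node_id, node_level)
);
CREATE TABLE sub_project (
    node_id    INTEGER PRIMARY KEY,
    node_level INTEGER NOT NULL DEFAULT 2 CHECK (node_level = 2),
    project_id INTEGER NOT NULL REFERENCES project (node_id),
    FOREIGN KEY (node_id, node_level) REFERENCES project_node (node_id, node_level)
);
-- One key/value table, one parent: the supertype.
CREATE TABLE key_value (
    node_id INTEGER NOT NULL REFERENCES project_node (node_id),
    name    TEXT NOT NULL,
    value   TEXT NOT NULL,
    PRIMARY KEY (node_id, name)
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    conn.executescript(SCHEMA)
    return conn
```

Because a subtype row can only reference the supertype row with its own fixed level, node 1 registered as a family cannot also appear in the project table.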
Bob Badour - 09 Jul 2008 13:14 GMT
> Thanks for providing the pointer!
> Volker

Just to be clear, I was doing more than offering a pointer. I was also ridiculing the idea of EAV.
Volker Hetzer - 25 Jul 2008 15:05 GMT
Bob Badour wrote:
> Just to be clear, I was more than offering a pointer. I was also
> ridiculing the idea of EAV.

I got that. :-) But "we want to be able to create and delete attributes" is a customer requirement. I think that's different from "I am too lazy to do a proper data model". There are still plenty of "normal" attributes left to model the usual ERD way.
Lots of Greetings! Volker
JOG - 25 Jul 2008 15:19 GMT
What's wrong with drop/add column?
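JOG's alternative treats a new attribute as a schema change. A small sqlite3 sketch of what that looks like, with hypothetical table and column names: ALTER TABLE ... ADD COLUMN widens the table's heading, and existing rows get NULL for the new column. PRAGMA table_info is SQLite's way of listing columns; other DBMSs expose the same information through their catalog views.

```python
import sqlite3

def heading(conn, table):
    """Column names of a table (table name assumed trusted, not user input)."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE board (board_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO board VALUES (1, 'Board A')")

# Adding an attribute is DDL: the heading gains a column, and the existing
# row holds NULL in it until someone supplies a value.
conn.execute("ALTER TABLE board ADD COLUMN firewire_type INTEGER")
conn.execute("UPDATE board SET firewire_type = 2 WHERE board_id = 1")
```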
Volker Hetzer - 25 Jul 2008 15:33 GMT
JOG wrote:
> What's wrong with drop/add column?

All the things that are wrong when an application requires DDL during normal operation: no undo, no scalability, limits on the number of attributes, limits on the structure of the attribute names, the same attributes in each project/pcb/etc., and so on. Sorry, but in my opinion DDL is for installation and maintenance. End users shouldn't trigger DDL, either directly or indirectly.
Lots of Greetings! Volker
JOG - 25 Jul 2008 16:45 GMT
> [...]
> the structure of the attribute names, the same attributes in each
> project/pcb/etc. and so on.

If one is trying to change the attributes that entities possess, then one is necessarily altering the propositions that can be stated about them. This necessitates a change in relation predicates, which means it is absolutely a DDL issue. To think otherwise seems to somewhat miss the point of the relational model.
> Sorry, but in my opinion DDL is for installation and maintenance. End users
> shouldn't trigger DDL neither directly nor indirectly.

No need to be sorry. If you want to make the same EAV mistakes that countless others have made before you, then that's up to you. All best, J.
Volker Hetzer - 11 Sep 2008 18:29 GMT
JOG wrote:
> [...]
> which means it is absolutely a DDL issue. To think otherwise seems to
> somewhat miss the point of the relational model.

Bit late, but I was on holiday... I am not trying to change the attributes that an entity possesses; I am allowing each business object, user object or whatever you want to call it to contain an attribute collection. There are no changes in relation predicates, since an attribute name is just content, like its value. I think we are talking at cross purposes.
I'm curious: what do you people do when a customer comes and says, "I want to store <thing>, I want to add, change and remove key/value pairs, and I want to name them freely."? Are you telling them that a database can't do it? That whatever the reason, it's stupid? That no more than 255-minus-housekeeping attributes are allowed because, say, Oracle can't do more columns? That no attribute can contain more than 22 or whatever characters? What do you say when they ask some intern, he comes up with a <thing_id>,<attribute_name>,<attribute_value> table attached to the <thing> table by a one-to-many relationship, and they tell you this is what they want?
Lots of Greetings! Volker
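For comparison, a minimal sketch of the intern's <thing_id>,<attribute_name>,<attribute_value> design from the question above, using sqlite3; the table and function names are hypothetical. Adding, changing and removing pairs are ordinary DML, which is exactly its appeal. Note what is lost: the DBMS now only knows that some named string has some string value, so attribute types and any rules connecting attributes are no longer declared anywhere.

```python
import sqlite3

SCHEMA = """
CREATE TABLE thing (thing_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per key/value pair, attached to thing one-to-many.
CREATE TABLE thing_attribute (
    thing_id        INTEGER NOT NULL REFERENCES thing (thing_id),
    attribute_name  TEXT NOT NULL,
    attribute_value TEXT,
    PRIMARY KEY (thing_id, attribute_name)   -- each name at most once per thing
);
"""

def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def set_attribute(conn, thing_id, name, value):
    # Adding and changing a pair are the same upsert; no DDL is involved.
    conn.execute("INSERT OR REPLACE INTO thing_attribute VALUES (?, ?, ?)",
                 (thing_id, name, value))

def remove_attribute(conn, thing_id, name):
    conn.execute("DELETE FROM thing_attribute"
                 " WHERE thing_id = ? AND attribute_name = ?", (thing_id, name))

def attributes(conn, thing_id):
    rows = conn.execute("SELECT attribute_name, attribute_value FROM thing_attribute"
                        " WHERE thing_id = ? ORDER BY attribute_name", (thing_id,))
    return dict(rows)
```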
Bob Badour - 11 Sep 2008 21:23 GMT
> [...]
> "I want to store <thing> and I want to add, change and remove key
> value pairs and I want to name them freely.".

In... um... 27 years of developing software, not once has a customer ever come to me and said anything even remotely similar.
Alvin Ryder - 12 Sep 2008 03:11 GMT
Volker,
If you just want something like forgotten-password question/answer data:
User_Name='JOE', Question = 'Fav color', Answer = 'pink', OtherStuff='blah'...
User_Name='JIM', Question = 'Fav color', Answer = 'purple', OtherStuff='blah'...
User_Name='BILL', Question = 'How much money do I have', Answer = 'way too much'
then that's OK: the table as a whole is not structured around name/value pairs. Users are allowed to have different question/answer pairs.
BUT if you want name/value pairs for everything, then you're asking for something very different: you're asking for trouble. It doesn't only go against fundamental principles of the relational model (you might say "so what, that's so airy-fairy"); many database vendors have gone out of their way to implement according to that theory, so the consequences will become very practical very quickly:
- You can forget about decent performance, for one.
- You can forget about maintainable queries as well.
- The application code will also suffer. It's all a lot more work.
You should try writing some queries using proper relations and then try the same with name/value pairs. Especially try joins involving name/value pairs. Yuck. I don't know this guy, but at a glance those queries are starting to look the part, and I've seen much worse (http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html).
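To make that exercise concrete, here is a hedged sqlite3 sketch (toy data, hypothetical table names, attribute names borrowed from Volker's later example) of the same question asked of an attribute-per-column table and of an EAV table. Both queries look for boards with FIREWIRE_TYPE 2 and CANBUS_AVAILABLE set; the EAV version needs one self-join per attribute in the condition and has to compare values as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE board (
    board_id INTEGER PRIMARY KEY, firewire_type INTEGER, canbus_available INTEGER);
CREATE TABLE board_eav (
    board_id INTEGER, name TEXT, value TEXT, PRIMARY KEY (board_id, name));
INSERT INTO board VALUES (1, 2, 1), (2, 2, 0), (3, 1, 1);
INSERT INTO board_eav VALUES
    (1, 'FIREWIRE_TYPE', '2'), (1, 'CANBUS_AVAILABLE', '1'),
    (2, 'FIREWIRE_TYPE', '2'), (2, 'CANBUS_AVAILABLE', '0'),
    (3, 'FIREWIRE_TYPE', '1'), (3, 'CANBUS_AVAILABLE', '1');
""")

# Attribute per column: the condition is a single WHERE clause.
RELATIONAL = """
SELECT board_id FROM board
WHERE firewire_type = 2 AND canbus_available = 1
"""

# EAV: every attribute in the condition costs another self-join,
# and the values are text, so they are compared as text.
EAV = """
SELECT f.board_id
FROM board_eav AS f
JOIN board_eav AS c ON c.board_id = f.board_id
WHERE f.name = 'FIREWIRE_TYPE' AND f.value = '2'
  AND c.name = 'CANBUS_AVAILABLE' AND c.value = '1'
"""
```

Adding a third attribute to the condition means one more AND clause in the first query but one more join in the second.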
Now the multi-level part. Normally self-joins can be used to achieve a tree-like structure, but you need to be careful, because many databases can choke if you don't implement them the way they like. Now you want name/value pairs with levels - ouch!
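For the tree part, one common approach is an adjacency list (each row points at its parent) queried with a recursive common table expression, which handles any depth, whereas a fixed chain of self-joins handles only a fixed depth. A minimal sqlite3 sketch with hypothetical names and toy data; WITH RECURSIVE is standard SQL and supported by SQLite, though some DBMSs of that era used vendor syntax such as Oracle's CONNECT BY instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE project_node (
    node_id   INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES project_node (node_id),  -- NULL for a family
    name      TEXT NOT NULL
);
INSERT INTO project_node VALUES
    (1, NULL, 'Family F'), (2, 1, 'Project P1'), (3, 1, 'Project P2'), (4, 2, 'Sub S1');
""")

# All descendants of one node, with their depth below it.
DESCENDANTS = """
WITH RECURSIVE tree(node_id, name, depth) AS (
    SELECT node_id, name, 0 FROM project_node WHERE node_id = ?
    UNION ALL
    SELECT n.node_id, n.name, t.depth + 1
    FROM project_node AS n JOIN tree AS t ON n.parent_id = t.node_id
)
SELECT node_id, name, depth FROM tree ORDER BY node_id
"""
```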
You should seriously follow the advice already given to you by the others: devise a proper relational model and avoid EAV.
Volker Hetzer - 20 Oct 2008 18:05 GMT
Alvin Ryder wrote:
> Volker,
> [...]
> then that's OK, the entire table is not structured around name/value
> pairs. Users are allowed to have different question/answer pairs.

OK, I see the misunderstanding here. Yes, you are right: ours aren't attributes that lead to foreign keys or even constraints; it really is arbitrary data that only gets displayed. I like your example. Here's ours:
Team one thinks it wants to record some implementation detail of the FireWire port of a board. So they decide they want an attribute FIREWIRE_TYPE and they want it to be a number. That does not mean I (the database) am to do calculations with this number, just that they want some simple type checking in the input mask. It also does not imply that other pieces of data depend on that attribute being there or having a certain value.
Team two doesn't do PC boards today but some industrial stuff, so they decide they want an attribute CANBUS_AVAILABLE for their project, and they want it to be boolean.
What none of the guys want is to sit on a huge heap of predefined attributes, consisting of each and every attribute ever entered or required by them. So each board project starts out afresh, with attributes added as the team members see fit. Each new project has some new attributes, because technology advances or people think that different bits are important. (It's a bit less arbitrary than that, because for each board family the teams meet up and decide on the attributes for that family. This happens several times a year, and of course one attribute is always overlooked in the first spec.)
OTOH: of course, each project has a name, belongs to a family, has assembly variants and so on, and all /these/ things are modeled properly (that is, not as EAV), because they form relations and are the skeleton of the whole application. The forgotten-password-type attributes are just things dangling off some of the real entities.
> BUT if you want name/value pairs for everything then you're asking for
> something very different, you're asking for trouble.

I fully agree. This is also the deal I've insisted on with the customers. As soon as I have to do something with the data (apart from storing and retrieving it), it's an entirely different matter: it needs time and causes implementation costs, beta tests and all the other things associated with an application change. They have assured me that this definitely will not happen, so I think it will happen less often than once or twice per year, with decreasing frequency as the application matures. That means we can incorporate it into a normal software development cycle.
> You should try writing some queries using proper relations and then
> try the same with name/value pairs.

No, thanks. I know what you're driving at and I see the same problem. If customers need to store a bunch of named sticky notes, all right, but that means named sticky notes is all they get.

> Now the multi-level part. Normally self-joins can be used to achieve a
> tree like structure but you need to be careful because many databases
> can choke if you don't implement them the way they like. Now you
> wanted name/value pairs with levels - ouch!

No, the self-joins work with "real" data, that is, properly modeled stuff. :-)

> You should serious follow the advice already given to you by the
> others, devise a proper relational model and avoid EAVs

Believe me, I do. It's just that sometimes it's kind of hard to explain the problem to people from a very different background. Thanks again for the lost-password metaphor. It really hit the nail on the head.
Lots of Greetings! Volker
JOG - 21 Oct 2008 22:31 GMT
> JOG wrote:
> [JOG snipped]
> [...]
> value pairs and I want to name them freely.".
> Are you telling them that a database can't do it?

I'm telling them, and you, that the relational model can't do it, because it was designed to handle "formatted" propositions (sets of data with a high level of common predication). It is important to recognize that the EAV approach you are looking at just happens to use the RM as its physical layer, and that's it. It does not use the RM as a logical model, and you therefore lose all of its algebraic power. (Sure, you keep the management system's transactional capabilities, but that has nothing to do with the RM.)
In fact, having abandoned it, you might as well use XML, OO or RDF databases and cut out the middle man. However, it is better, imo, to convince the client that designing a robust a priori conceptual model is worth doing, and that you can come back and update it at appropriate intervals (I say this because currently the RM is the most solid framework we have).
I do have sympathy, because handling semistructured data and dynamic schemas is simply an unsolved problem (as is handling missing data). The proposed "solutions" are all woeful (in fact completely retrograde, whisking us back to 1960s tech). As such, anything you try to implement for your client will inevitably be an ad-hoc hack /in some way or other/. We're still in the stone age of informatics, I'm afraid. Regards, Jim.
paul c - 21 Oct 2008 23:06 GMT
> ...

I remember, more than thirty years ago, when companies sent some of their mid-level non-DP types to report-writer school, sponsored by various outfits that are gone now or absorbed by bigger ones. A few of those people got good enough to submit their own queries in the batch mode of the day and only resorted to the DP department (it wasn't called IT then) when it was over their heads. They were hip enough to know when they were over their heads and didn't waste time like some of us (including me). Of course, the majority were useless at this, as I think they were in their official function (this being typical of most commercial pursuits, if you ask me).
Makes me wonder why commercial enterprises that are laying off IT people don't try some minimal training in relational algebra for their middle-muddler people. If they did, I guess they'd have to put some fairly strict rules in place for their DBAs.
Is it naive to think that some end-users could add attributes (ignoring the problem that this would make them non-end-users)?
Walter Mitty - 22 Oct 2008 15:22 GMT
> I'm telling them, and you, that the relational model can't do it
> because it was designed to handle "formatted" propositions (sets of
> [...]
> in some way or other/. We're still in the stone age of informatics i'm
> afraid. Regards, Jim.

The above states the case better than I can. I'm going to throw in my two cents, in addition to agreeing with JOG's comment.
The big problem with using a semistructured approach to data, such as EAV, is setting user expectations. If users were able to understand and appreciate that there is not necessarily any way to integrate the data between users after users have each used EAV to, in effect, design their own idiosyncratic database, that would be one thing. But my experience is that either users, or at least upper management, always fall back on the notion that databases are for sharing data, and therefore, when they ask for outputs that require integration, they assume the hard work was already done when the database was built.
In one sense, it's hard to argue with management. Databases are for sharing data. That's what they were invented for, and that's what they are good at. So the expectation lives on that getting an output from a database that requires integrated massaging of the data is a simple request. Just take the required information, map it to the way the database represents the data, crank up a report writer, and presto!
The problem here is in the phrase "the way the database represents the data." With approaches like EAV, there is no ONE way the database represents data. Each user's data is represented the way that seems good to that user. Getting the users and management to understand that fact and set their expectations accordingly is very, very difficult. The easiest way to do this is to bypass a DBMS completely, and just store the semistructured and unintegrated data in a text file. Then at least you don't get the illusion that, because we manage the database with a DBMS, we must therefore have stored the data according to some coherent general plan.
Roy Hann - 22 Oct 2008 17:33 GMT
> [...]
> illusion that, because we manage the database with a DBMS, we must
> therefore have stored the data according to some coherent general plan.

I don't want to disagree too violently with Walter's re-telling of JOG's very sound position, but I get the sense that even Walter doesn't fully appreciate the idiocy of EAV.
The point to get across to management is that they don't need EAV because even an SQL DBMS already does everything EAV does, and more. If management want users to be able to dream up and implement their own fact types, each user can just go ahead and create suitable tables in the usual way.
Now, how do you get all the users to share an understanding of all these tables so they can usefully collaborate (share data)? Good question. The same way they imagine EAV would do it, I guess. (Only it will be easier to implement because all you need is dynamic SQL.)
And before I leave this alone, there is no such thing as "semi-structured" data. That term makes as much sense as semi-understood knowledge. The concept that people using that term might be struggling to convey is "semi-shared business model", or to put it another way, "(only) some of us know what (only) some of this means". My attitude to that is: fine, just don't expect me to know what any of it means.
 Signature Roy
paul c - 22 Oct 2008 18:52 GMT
> And before I leave this alone, there is no such thing a
> "semi-structured" data. That term makes as much sense as
> [...]
> means". My attitude to that is fine, just don't expect me to know
> what any of it means.

Heh, in other words, semi-understood data?
Ironic how "not-invented-here" so often actually means "invented here".
(Letting "semi-understood data" proliferate might be chaotic. Maybe in such a regime, to echo Walter M, it would be prudent to ensure that it be kept "semi-shared", e.g., amongst the EAV protagonists and their cronies. It usually seems to me that when these EAV proposals come up, the issue is not that the organization needs new organization-wide "entities", for want of a better word, but additional attributes for existing relations. So I'd think it might be okay, from an integrity viewpoint, to let them define their own tables that are partly based on organization-wide tables. At least everybody could stick with the usual relational ops. Not sure if I've ever seen this tried, though.)
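The idea of user-defined tables partly based on organization-wide tables could look like the following hedged sqlite3 sketch: a team-owned extension table whose primary key is also a foreign key to the central table. All names are hypothetical. Each team gets its own predicate while everything stays queryable with ordinary joins; note that creating such a table is DDL, which is exactly the step Volker wanted to keep away from end users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Organization-wide table, designed and owned centrally.
CREATE TABLE board (board_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
INSERT INTO board VALUES (1, 'B1'), (2, 'B2');

-- Hypothetical team-owned table: extra attributes for existing boards,
-- keyed by, and constrained to, the organization-wide key.
CREATE TABLE team1_board (
    board_id      INTEGER PRIMARY KEY REFERENCES board (board_id),
    firewire_type INTEGER NOT NULL
);
INSERT INTO team1_board VALUES (1, 2);
""")

# The usual relational operators still work across both tables.
REPORT = """
SELECT b.name, t.firewire_type
FROM board AS b LEFT JOIN team1_board AS t ON t.board_id = b.board_id
ORDER BY b.board_id
"""
```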
paul c - 22 Oct 2008 18:58 GMT
> ... Not sure if I've ever seen this tried, though.)

I meant wrt DBs. In other areas, such as Linux development, it seems that various dynamic features can find their way into the kernel after some gestation time.
(I think Roy H might have hit it on the head by emphasizing "semi-shared").
Walter Mitty - 24 Oct 2008 08:57 GMT
> [...]
> tables. At least everybody could stick with the usual relational ops.
> Not sure if I've ever seen this tried, though.)

Perhaps it would be sufficient to categorize the outputs of the database as "semi-correct". Let the users think about that one for a while!
paul c - 24 Oct 2008 14:40 GMT
> [...]
> "semi-correct".
> Let the users think about that one for a while!

Maybe the key problem to be avoided is inadvertent semi-redundancy, which could pollute the DB as a whole. If there's a way to avoid that, I'd think there is no basic problem, even though various pet niceties such as performance targets might suffer.
paul c - 24 Oct 2008 14:50 GMT
> Perhaps it would be sufficient to categorize the outputs of the database as
> "semi-correct".
> Let the users think about that one for a while!

Maybe we should just cut to the chase and call it semi-data.
JOG - 23 Oct 2008 13:13 GMT
> And before I leave this alone, there is no such thing a
> "semi-structured" data.

Despite a growing literature, current definitions of "semi-structure" are woefully inadequate. The standard denotation is of data that "does not fit into the relational model".
Yes, quite.
Roy Hann - 23 Oct 2008 14:01 GMT
> Despite a growing literature, current definitions of "semi-structure"
> are woefully inadequate.

A million people can (and evidently will) talk bollocks, but it's still bollocks.
> The standard denotation is of data that "does
> not fit into the relational model".

That definition is entirely bogus. The relational model just applies set theory to first-order predicate logic. If you have "data" that doesn't fit into both of these, then you'd better start hiring mystics to look after it for you.
But of course what someone who says that really means is, "data that we can't be bothered to fit into the relational model because the programming tools we use to write the applications are so crap there is no point."
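The claim that the relational model applies set theory to first-order predicate logic can be illustrated with plain Python sets, under the usual reading that a relation is the set of tuples that make its predicate true. The predicates and data here are made up. Existential quantification corresponds to projection, conjunction over a shared variable to natural join, and AND NOT to difference.

```python
# Extension of the hypothetical predicate "employee E works in department D":
# exactly the (E, D) tuples for which the statement is true.
works_in = {("alice", "hardware"), ("bob", "hardware"), ("carol", "firmware")}
# Extension of "department D is located in city C".
located_in = {("hardware", "Munich"), ("firmware", "Hamburg"), ("sales", "Berlin")}

# "There exists an E such that E works in D"  ->  projection onto D.
staffed = {d for (_e, d) in works_in}

# "E works in D AND D is located in C"  ->  natural join on the shared D.
works_in_located = {(e, d, c) for (e, d) in works_in
                    for (d2, c) in located_in if d == d2}

# "There exists a D such that E works in D and D is in C"  ->  join, then project.
works_in_city = {(e, c) for (e, _d, c) in works_in_located}

# "D is located somewhere AND NOT (some E works in D)"  ->  difference.
unstaffed = {d for (d, _c) in located_in} - staffed
```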
JOG - 23 Oct 2008 14:05 GMT
> [...]
> doesn't fit into both of these then you better start hiring mystics to
> look after it for you.

Indeed. And yet hundreds of peer-reviewed papers have been published on the topic. I find this incredibly depressing.
paul c - 23 Oct 2008 14:25 GMT
>> But of course what someone who says that really means is, "data that we
>> can't be bothered to fit into the relational model because the
>> programming tools we use to write the applications are so crap there
>> is no point."

The act of deciding/agreeing upon relations exposes enough structure for the RM to be applied. It's the sine qua non. Dr. Strangelove, after reading a paper suggesting it is unnecessary, might have said "but that's the whole point!".
Roy Hann - 23 Oct 2008 15:10 GMT
> Indeed. And yet hundreds of peer-reviewed papers have been published
> on the topic. I find this incredibly depressing.

Cheer up! :-) It's worse than you think: clinical outcomes research (with which I was peripherally involved for the first 10 years of my career) is *at least* as bad.
"On the word of no one." (Paradoxically, those are good words to live by. :-)
David BL - 23 Oct 2008 16:15 GMT
> Indeed. And yet hundreds of peer-reviewed papers have been published
> on the topic. I find this incredibly depressing.

Ok, I’ll bite…
No doubt any data can be made to “fit” into the relational model. The more important question is whether it happens /naturally/. The relational model works really well when there is a UoD (universe of discourse) on which many propositions can be made without needing to introduce lots of abstract identifiers. That’s very common, but it’s not always the case. It seems to me the question of whether the RM is generally appropriate for heavily nested composite values is unresolved. Much of the world’s data is in this latter form, e.g., abstract syntax trees, rich text documents, scene graphs.
If the relational model is universally applicable, why don’t programmers enter their programs as relations? Do you really think it’s only because of the tools currently available?
What about automated proof systems? Are the knowledge base and the data associated with an ongoing proof best represented using a set of relations? I find that quite unlikely. The RA seems to have more to do with set-based calculations on known sets of values than with symbolic manipulation. Symbolic manipulation involves a lot of recursion, and the RA on its own is too weak for that, which suggests it will take a backseat role. For example, computing the most general unifier of two given expressions involves recursion into the nested expressions.
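The unification example can be made concrete. This is a textbook syntactic unification with occurs check (in the style of Robinson's algorithm), written in Python over a hypothetical term syntax: strings starting with "?" are variables, tuples are compound terms with the functor first, and other strings are constants. The recursion follows the nesting of the terms, which is the part the post argues relational algebra alone does not express comfortably.

```python
def is_var(t):
    # Hypothetical syntax: variables are strings starting with "?".
    return isinstance(t, str) and t.startswith("?")

def walk(t, subst):
    # Follow variable bindings until an unbound variable or a non-variable.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # Does variable v occur inside term t under subst? Recurses into nesting.
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, subst) for a in t[1:])

def unify(a, b, subst=None):
    """Return a most general unifier of a and b as a dict, or None."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):  # recurse into the nested arguments
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # functor, arity or constant clash

def resolve(t, subst):
    # Apply the substitution all the way down the term.
    t = walk(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, subst) for a in t[1:])
    return t
```

For example, unifying f(?X, g(?Y)) with f(g(?Z), ?X) binds ?X to g(?Z) and ?Y to ?Z, so both terms resolve to f(g(?Z), g(?Z)).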
I also find it rather telling that relational queries (ie RA expressions) are not themselves represented using relations. Surely if that were useful, many cdt folks would jump at the opportunity to further promote the use of relations.
paul c - 23 Oct 2008 17:09 GMT
> If the relational model is universally applicable, why don’t
> programmers enter their programs as relations? ...

That is one of the most fundamental questions. At the risk of sounding like I'm sloughing it off, I'd say the answer has to do with tedium, and lies somewhere near the fact that we have shortcuts and shorthands that give equivalent results, and in our human situation: limitations in momentary perception, such as our shallow mental stack.
toby - 23 Oct 2008 20:55 GMT
> [...]
> give equivalent results and our human situation, limitations in
> momentary perception such as our shallow mental stack.

Many programs, or parts thereof, reduce to SQL expressions over data. Programmers who don't understand RM well tend to under-use it for computation. A declarative expression can often elide a lot of imperative tedium.
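A small illustration of that point, with hypothetical toy data: counting sub projects per family, first as an imperative loop that builds its own lookup, then as a single SQL statement run through sqlite3. Both produce the same result; the declarative version states the join and the grouping and leaves the how to the DBMS.

```python
import sqlite3

projects = [(10, "F1"), (11, "F1"), (20, "F2")]                # (project_id, family)
sub_projects = [(100, 10), (101, 10), (102, 11), (200, 20)]    # (sub_id, project_id)

def count_imperative(projects, sub_projects):
    # Build the lookup by hand, then loop and accumulate.
    family_of = {pid: fam for pid, fam in projects}
    counts = {}
    for _sub_id, pid in sub_projects:
        fam = family_of[pid]
        counts[fam] = counts.get(fam, 0) + 1
    return sorted(counts.items())

def count_declarative(conn):
    # State the result; the DBMS chooses how to join and group.
    return conn.execute("""
        SELECT p.family, COUNT(*)
        FROM sub_project AS s JOIN project AS p ON p.project_id = s.project_id
        GROUP BY p.family ORDER BY p.family
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (project_id INTEGER PRIMARY KEY, family TEXT)")
conn.execute("CREATE TABLE sub_project (sub_id INTEGER PRIMARY KEY, project_id INTEGER)")
conn.executemany("INSERT INTO project VALUES (?, ?)", projects)
conn.executemany("INSERT INTO sub_project VALUES (?, ?)", sub_projects)
```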
paul c - 24 Oct 2008 01:44 GMT
> Many programs, or parts thereof, reduce to SQL expressions over data.
> Programmers who don't understand RM well tend to under-use it for
> computation. A declarative expression can often elide a lot of
> imperative tedium.

Not that I would know personally, but from what I read, most programmers (environment designers too) see SQL as a mere storage access method.
It seems the designers of the mainstream programming interfaces/environments where so many developers spend most of their time these days, such as JavaScript, PHP et al., have been just as unenlightened, that is to say ignorant. While not exactly mainstream, the RDF that JOG decries might be the nadir on that scale. I wonder what would result if one of those designers were forbidden to implement arrays.
Nothing new, I guess; it was years ago that I remember criticizing an international multi-location transport system for its myriad (and insufficient) message codes, on the grounds that only two operators were really needed: INSERT and DELETE. It could have been seen as a distributed system with disparate DBs. All each location really needed to do was send portions of its own "redo" log (the portions that applied to certain common tables) to the others, and the latter could decide what action, if any, was appropriate for their particular DB. This would have had the bonus of saving much correction work, but the disadvantage of fewer jobs for the boys. It was also considered anathema to send a message that might be ignored. Apparently the Mott's Clamato Juice company tries to think big, but I believe the airline industry still thinks operators are people. I know I'm drifting, but I can't help observing that the big industries that influence IT trends are often highly regulated, a situation that may encourage small-minded thinking as well as similar people.
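The two-operator idea can be sketched in a few lines of Python. Everything here is hypothetical (message format, table names, sample rows): each message is an INSERT or DELETE of one row of a shared table, and the receiving site applies only messages for tables it keeps, silently ignoring the rest. An update travels as a DELETE of the old row followed by an INSERT of the new one.

```python
def apply_messages(local_db, messages, shared_tables):
    """Apply another site's (operation, table, row) messages to local_db.

    local_db maps a table name to a set of row tuples. Messages for tables
    this site does not keep are ignored, which is a legitimate outcome.
    """
    for op, table, row in messages:
        if table not in shared_tables:
            continue
        rows = local_db.setdefault(table, set())
        if op == "INSERT":
            rows.add(row)
        elif op == "DELETE":
            rows.discard(row)
        else:
            raise ValueError(f"unknown operation {op!r}")
    return local_db
```

Only two message kinds exist, so there is no growing catalogue of message codes; each receiver decides relevance by table alone.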
David BL - 24 Oct 2008 03:29 GMT
> [...]
> computation. A declarative expression can often elide a lot of
> imperative tedium.

I agree with you, but you seem to have missed the point. For example, can SQL expressions themselves be satisfactorily entered as relations?
David BL - 24 Oct 2008 03:17 GMT
> [...]
> give equivalent results and our human situation, limitations in
> momentary perception such as our shallow mental stack.

I believe there is a simple answer: The relational approach implies the appearance of many abstract identifiers.
[Side note: Marshall allows a relational approach to encompass extensive use of RVAs - even to the point where at each level in the hierarchy of a heavily nested composite value, a relation is only being used to represent the children of a given node. This avoids the need to introduce lots of abstract identifiers, but I don't agree with Marshall that such an approach should be called "relational". Of course if anyone really wants to call that relational then that's fine by me - after all it's just a word. My argument of course only applies when heavily nested RVAs are not being used]
I think you may be discounting the importance of languages (or more specifically grammars) and of the concept of a well-formed formula (wff). After all, First Order Logic (FOL) is formalised with the concept of a wff, which is defined recursively. It seems wrong to assume that data management doesn't encompass recording wffs. It seems wrong to assume that wffs can be represented /naturally/ in the RM. While it's true that the RM/RA is closely associated with set theory and the FOL, anyone using the full power of the FOL does a lot more than calculate with known extensions of sets.
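To illustrate the claim about abstract identifiers, here is a hedged Python sketch that encodes a nested formula as two relations, node(id, label) and child(parent, position, child), and decodes it again. The encoding and names are assumptions, one of many possible. The three occurrences of x become three distinct nodes, distinguished only by invented ids that mean nothing in the universe of discourse.

```python
from itertools import count

def to_relations(term):
    """Encode a nested term as node(id, label) and child(parent, pos, child).

    Assumes compound terms are tuples (label, arg1, ...) with at least one
    argument; anything else is a leaf label.
    """
    ids = count(1)
    node, child = set(), set()

    def visit(t):
        nid = next(ids)  # abstract identifier: meaningful only inside the encoding
        if isinstance(t, tuple):
            node.add((nid, t[0]))
            for pos, sub in enumerate(t[1:]):
                child.add((nid, pos, visit(sub)))
        else:
            node.add((nid, t))
        return nid

    root = visit(term)
    return root, node, child

def from_relations(root, node, child):
    """Rebuild the nested term by following the ids from the root."""
    label = dict(node)
    kids = {}
    for parent, pos, c in child:
        kids.setdefault(parent, []).append((pos, c))

    def build(nid):
        if nid not in kids:
            return label[nid]
        return (label[nid],) + tuple(build(c) for _pos, c in sorted(kids[nid]))

    return build(root)

# forall x. P(x) -> Q(x), written as a nested value
wff = ("forall", "x", ("implies", ("P", "x"), ("Q", "x")))
```

Here node has degree 2 and child has degree 3; the nesting that the tuple shows directly has to be recovered by chasing ids.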
paul c - 24 Oct 2008 09:27 GMT
> I believe there is a simple answer: The relational approach implies
> the appearance of many abstract identifiers.
> [...]
> that wffs can be represented /naturally/ in the RM. While it's true
> that the RM/RA is closely associated with set theory and the FOL,

I guess I was trying to answer the wrong question. I should have said that I was thinking of relations in general; I didn't mean to suggest that Codd's RM is anything more than a narrow application of relations aimed at mechanical storage and manipulation of DBs.
paul c - 23 Oct 2008 17:23 GMT
> I also find it rather telling that relational queries (ie RA
> expressions) are not themselves represented using relations.

Just because some syntax doesn't look like a relation doesn't mean that the result isn't defined by relations. E.g., the D&D relational operators are all defined in terms of relations, and then a bunch of shorthands are given so as to minimize tedium and clerical errors. SQL doesn't do this, which may be why so many people who think it is obedient to the RM have such weird ideas, such as thinking a relation can be updated.
> Surely
> if that were useful, many cdt folks would jump at the opportunity to
> further promote the use of relations.

A relation is a mathematical construction. How well implementations mimic it is very much in the mental "eye" of the beholder; in a way, implementation is a misleading word for what is really just a mechanical aid for symbolic manipulation and storage of results.
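The remark that a relation cannot be updated can be illustrated in Python, roughly in the spirit of Date and Darwen's definitions: relation values are immutable frozensets, a relation variable holds one, and INSERT, DELETE and UPDATE are shorthands for assigning a new value to the variable. The class and method names are hypothetical, and this is a sketch of the idea rather than any product's semantics.

```python
class RelVar:
    """A relation variable: a name whose value is an immutable relation."""

    def __init__(self, value=()):
        self.value = frozenset(value)

    def assign(self, new_value):
        # The only primitive update: replace the whole value.
        self.value = frozenset(new_value)

    # Shorthands, defined purely in terms of assignment plus set operators.
    def insert(self, rows):
        self.assign(self.value | frozenset(rows))

    def delete(self, rows):
        self.assign(self.value - frozenset(rows))

    def update(self, predicate, change):
        # "UPDATE" = remove the matching tuples and add their changed versions.
        matching = {t for t in self.value if predicate(t)}
        self.assign((self.value - matching) | {change(t) for t in matching})
```

After an insert, a reference to the old value still sees the old relation: nothing was modified in place, the variable was simply given a new value.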
David BL - 24 Oct 2008 04:10 GMT > ... > [quoted text clipped - 17 lines] > implementation is a misleading word for what is really just a > mechanical aid for symbolic manipulation and storage of results. Sorry, I have no idea what you're getting at.
Let me be more specific: when you see a wff in some formal language, do you actually think of it as a relation? How is that useful? More specifically, how is the RA useful? Can you give me an example together with the relation's degree and the names and types of its attributes?
Walter Mitty - 24 Oct 2008 09:16 GMT >Ok, I’ll bite…
>No doubt any data can be made to “fit” into the relational model. >The more important question is whether it happens /naturally/. The I don't understand the word "naturally" in this context. Isn't all modeling artificial, rather than natural?
David BL - 24 Oct 2008 11:13 GMT > >Ok, I’ll bite… > >No doubt any data can be made to “fit” into the relational model. > >The more important question is whether it happens /naturally/. The > > I don't understand the word "naturally" in this context. Isn't all modeling > artificial, rather than natural? I'm suggesting that in certain situations the RM is cumbersome, making it inappropriate or inapplicable. This is specifically with regard to /recursive data types/.
For example, recursive data types are appropriate for representing wffs in most formal languages. They are also relevant to compound documents, e.g.:
struct Chapter
{
    String title;
    Vector<Paragraph> paragraphs;
    Vector<Chapter> subchapters;
};
There are two ways that the RM can be used to represent recursive data types:
1. Using recursive RVAs; or
2. By introducing abstract identifiers for all the nodes, and appropriate integrity constraints.
I find the first quite reasonable, but I'm suspicious of actually calling such an approach "relational".
In the second case lots of integrity constraints are needed because the RM is too flexible! It needs to be heavily constrained to only represent tree structures. The integrity constraints quickly get horribly messy - particularly for a reasonably complex grammar, and I believe it's possible to interpret it as a manually written axiomatization of pointer semantics (the ability to "dereference" an abstract identifier as though it points at one and only one child node in the tree). If you compare the RM to the grammar you will find the former to be /much more complex/.
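To make the second approach concrete for the Chapter example, here is a minimal C++ sketch. It assumes a single hypothetical relation ChapterRow(id, parent, position, title) in which id is an abstract identifier; none of these names come from the thread. Even for this simple tree, the relation has to be guarded by key, sibling-order, referential and acyclicity checks before it can only hold trees.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened form of Chapter: one tuple per chapter, identified
// by an abstract integer id. (Paragraphs would need a second relation.)
struct ChapterRow {
    int id;                     // abstract identifier
    std::optional<int> parent;  // empty for a top-level chapter
    int position;               // order among siblings
    std::string title;
};

// True only if the rows describe a forest of ordered trees.
bool isWellFormedTree(const std::vector<ChapterRow>& rows) {
    std::map<int, std::optional<int>> parentOf;
    std::set<std::pair<std::optional<int>, int>> siblingSlots;
    for (const auto& r : rows) {
        // Key constraint: each id appears at most once.
        if (!parentOf.emplace(r.id, r.parent).second) return false;
        // No two siblings may share a position.
        if (!siblingSlots.emplace(r.parent, r.position).second) return false;
    }
    // Referential constraint: every parent must itself be a chapter.
    for (const auto& entry : parentOf) {
        const std::optional<int>& parent = entry.second;
        if (parent && parentOf.count(*parent) == 0) return false;
    }
    // Acyclicity: walking up from any chapter must reach a top-level chapter.
    for (const auto& entry : parentOf) {
        std::set<int> seen{entry.first};
        std::optional<int> p = entry.second;
        while (p) {
            if (!seen.insert(*p).second) return false;  // revisited: cycle
            p = parentOf.at(*p);
        }
    }
    return true;
}
```

Each check in isWellFormedTree stands for a guarantee that the nested Chapter type gets by construction.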
The following is the example I used when I talked about this 12 months ago:
Using Prolog notation, consider the following relations which allow for representing an expression such as (x+1)*3:
% var(N,S)      :- node N is a variable named S
% number(N,I)   :- node N is a number with value I
% add(N,N1,N2)  :- node N is the addition of nodes N1,N2
% mult(N,N1,N2) :- node N is the product of nodes N1,N2
Define a view called nodes(N) which is a union of projections as follows:
nodes(N) :- var(N,_).
nodes(N) :- number(N,_).
nodes(N) :- add(N,_,_).
nodes(N) :- mult(N,_,_).
The following are the integrity constraints (each query must be empty):
% each node has at most one tuple per relation
var(N,S1), var(N,S2), S1 \= S2?
number(N,I1), number(N,I2), I1 \= I2?
add(N,N1,_), add(N,N2,_), N1 \= N2?
add(N,_,N1), add(N,_,N2), N1 \= N2?
mult(N,N1,_), mult(N,N2,_), N1 \= N2?
mult(N,_,N1), mult(N,_,N2), N1 \= N2?
% a node belongs to at most one relation
var(N,_), number(N,_)?
var(N,_), add(N,_,_)?
var(N,_), mult(N,_,_)?
number(N,_), add(N,_,_)?
number(N,_), mult(N,_,_)?
add(N,_,_), mult(N,_,_)?
% every child is a node
add(_,N,_), not nodes(N)?
add(_,_,N), not nodes(N)?
mult(_,N,_), not nodes(N)?
mult(_,_,N), not nodes(N)?
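As a rough cross-check, here is a C++ sketch of the same four relations and constraints. The names ExprDb and satisfiesConstraints are hypothetical. The sketch assumes that no vector holds duplicate tuples, as would be true of a relation. Under that assumption, "each node id is defined by at most one tuple in total" is equivalent to the functional-dependency and disjointness queries in the Prolog listing.

```cpp
#include <initializer_list>
#include <map>
#include <string>
#include <vector>

// One struct per relation schema; `n` is the node's abstract identifier.
struct Var { int n; std::string name; };  // var(N,S)
struct Num { int n; int value; };         // number(N,I)
struct Bin { int n, left, right; };       // add(N,N1,N2) and mult(N,N1,N2)

struct ExprDb {
    std::vector<Var> var;
    std::vector<Num> number;
    std::vector<Bin> add, mult;
};

// Assumes each vector holds no duplicate tuples, as a relation would.
bool satisfiesConstraints(const ExprDb& db) {
    // Count the defining tuples of each node across all four relations;
    // the keys of this map form the nodes(N) view.
    std::map<int, int> defs;
    for (const auto& t : db.var) ++defs[t.n];
    for (const auto& t : db.number) ++defs[t.n];
    for (const auto& t : db.add) ++defs[t.n];
    for (const auto& t : db.mult) ++defs[t.n];
    // A second defining tuple violates a functional dependency (same
    // relation) or a disjointness constraint (different relations).
    for (const auto& entry : defs)
        if (entry.second > 1) return false;
    // Every child of an add or mult node must be in nodes(N).
    for (const auto* rel : {&db.add, &db.mult})
        for (const auto& t : *rel)
            if (defs.count(t.left) == 0 || defs.count(t.right) == 0) return false;
    return true;
}
```

For (x+1)*3, the tuples could be var(1,x), number(2,1), add(3,1,2), number(4,3) and mult(5,3,4). The node numbers are arbitrary abstract identifiers.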
David BL - 25 Oct 2008 07:24 GMT > The following is the example I used when I talked about this 12 months > ago: [quoted text clipped - 34 lines] > mult(_,N,_), not nodes(N)? > mult(_,_,N), not nodes(N)? It occurred to me that there are more integrity constraints required.
Consider that we define
% parent(P,C) :- node P is a parent of node C
parent(P,C) :- add(P,C,_).
parent(P,C) :- add(P,_,C).
parent(P,C) :- mult(P,C,_).
parent(P,C) :- mult(P,_,C).
and
% ancestor(N1,N2) :- node N1 is an ancestor of N2.
ancestor(N1,N2) :- parent(N1,N2).
ancestor(N1,N2) :- parent(N1,N), ancestor(N,N2).
The following goal must return failure to express the integrity constraint that there are no cycles:
ancestor(N1,N2),ancestor(N2,N1)?
In addition we would like to express the constraint that there is no garbage (ie unreachable nodes) with respect to a defined set of root nodes.
% root(N) :- node N is the root of an expression.
% reachable(N) :- node N is reachable from a root
reachable(N) :- root(N).
reachable(C) :- parent(P,C), reachable(P).
% integrity constraint - must be empty
nodes(N), not reachable(N)?
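The two additional constraints can be checked the same way. This C++ sketch uses hypothetical names and assumes that the parent tuples have already been derived from add and mult. It evaluates ancestor and reachable by naive fixpoint iteration, which is one simple way to evaluate the recursive rules bottom-up. It then runs the cycle query and the garbage query.

```cpp
#include <set>
#include <utility>
#include <vector>

// parent(P,C) tuples, assumed already derived from the add and mult relations.
using ParentRel = std::vector<std::pair<int, int>>;

// ancestor(N1,N2): the transitive closure of parent, computed by applying
// the two rules repeatedly until nothing new is derived (a fixpoint).
std::set<std::pair<int, int>> ancestorClosure(const ParentRel& parent) {
    std::set<std::pair<int, int>> anc(parent.begin(), parent.end());
    bool changed = true;
    while (changed) {
        changed = false;
        std::vector<std::pair<int, int>> derived;
        for (const auto& [p, n] : parent)
            for (const auto& [n2, d] : anc)
                if (n == n2) derived.emplace_back(p, d);
        for (const auto& t : derived)
            if (anc.insert(t).second) changed = true;
    }
    return anc;
}

// The query ancestor(N1,N2), ancestor(N2,N1)? must be empty.
bool hasCycle(const ParentRel& parent) {
    const auto anc = ancestorClosure(parent);
    for (const auto& [a, d] : anc)
        if (anc.count({d, a}) > 0) return true;
    return false;
}

// Computes reachable(N) from the roots and returns the answers to
// nodes(N), not reachable(N)?, which must be empty.
std::set<int> garbage(const std::set<int>& nodes, const std::set<int>& roots,
                      const ParentRel& parent) {
    std::set<int> reachable = roots;
    bool changed = true;
    while (changed) {
        changed = false;
        for (const auto& [p, c] : parent)
            if (reachable.count(p) > 0 && reachable.insert(c).second) changed = true;
    }
    std::set<int> unreachable;
    for (int n : nodes)
        if (reachable.count(n) == 0) unreachable.insert(n);
    return unreachable;
}
```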
As you can see, it's rather low level. I don't think it's surprising that the RM is capable of that (since it's so flexible).
A common theme on this ng is the idea of physical independence. It tends to be assumed that a program written in C and using pointers is "low level" and "close to the physical hardware", whereas anything using the RM is necessarily "high level" and "divorced from the physical hardware". This I agree is usually the case.
I think there is evidence here that an inappropriate use of the RM can actually be low level in a similar way to how C is low level. ie algorithms can easily have bugs that resemble the creation of dangling pointers or memory leaks!
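Now consider again the following C++ struct:

#include <string>
#include <vector>

using namespace std;

struct Chapter
{
    string title;
    vector<string> paragraphs;
    vector<Chapter> subchapters;  // element type still incomplete here; allowed for vector since C++17
};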
This recursive type definition compiles without any problem and is able to grow dynamically and enforces the integrity constraints. Nevertheless there isn't a pointer in sight. Behind the scenes there is a physical implementation of the STL string and vector classes using pointers, heap allocations and so on.
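The following short usage sketch builds on the struct above. The helper countChapters is hypothetical. Copying a Chapter copies the whole tree, so the copy can be edited without affecting the original, and no pointer appears anywhere in the interface.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The recursive value type from the post. std::vector is guaranteed to
// accept an incomplete element type since C++17.
struct Chapter {
    std::string title;
    std::vector<std::string> paragraphs;
    std::vector<Chapter> subchapters;
};

// Counts this chapter and all of its nested subchapters.
std::size_t countChapters(const Chapter& c) {
    std::size_t n = 1;
    for (const auto& sub : c.subchapters) n += countChapters(sub);
    return n;
}
```

For example, a book holding an intro and a body with one nested chapter gives countChapters(book) == 4. Adding a subchapter to a copy of book leaves book itself unchanged.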
The striking thing to me is that we are able to work at a higher level than the RM in this particular case.
JOG - 27 Oct 2008 02:16 GMT > > "David BL" <davi...@iinet.net.au> wrote in message > [quoted text clipped - 25 lines] > There are two ways that the RM can be used to represent recursive data > types: I think this is the wrong way of looking at it. The OO (for want of a better word) and RM approaches are two different ways of modelling statements of fact from the world. And yet you seem to be stating the problem as how to try and model a struct/object in RM? (That would be like complaining that after you've put all your milk into a fridge, you're having trouble pouring the fridge onto your cereal!)
> 1. Using recursive RVAs; or > 2. By introducing abstract identifiers for all the nodes, and > appropriate integrity constraints > > I find the first quite reasonable, but I'm suspicious of actually > calling such an approach "relational". What do you do in real life to identify a chapter? You refer to it by name or number - and the same for subchapters too, right? And if two subchapters have the same local identifier (e.g. 'introduction'), well, you use a composite identifier such as "the 'introduction' of the third chapter". And if you can refer to a chapter when communicating with someone else, then you have necessarily stated something about it in the form of a proposition - and if it can be stated as a proposition... it can be encoded as a tuple in RM.
And as far as constraints are concerned, what more do you need apart from a subchapter can only have one containing chapter? I don't see the issue with this example. Regards, J.
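One way to read this suggestion is shown in the hypothetical C++ sketch below, which is an illustration rather than JOG's actual design. It stores the fact "the chapter at path P has title T" and uses the path of chapter numbers as the composite identifier. For example, {3, 1} is the first subchapter of chapter 3. Containment is then implied by the path prefix, and the only constraint checked is that every containing path exists. The names ChapterTitles, parentsExist and childByTitle are illustrative only.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical relation ChapterTitle(path, title): "the chapter at path has
// this title". The path is a composite identifier, e.g. {3, 1} is the first
// subchapter of chapter 3.
using Path = std::vector<int>;
using ChapterTitles = std::map<Path, std::string>;

// Containment constraint: a subchapter's containing chapter (its path minus
// the last component) must exist. Uniqueness of the container follows from
// the path itself.
bool parentsExist(const ChapterTitles& rel) {
    for (const auto& entry : rel) {
        const Path& path = entry.first;
        if (path.empty()) return false;  // not a valid identifier
        if (path.size() > 1 && rel.count(Path(path.begin(), path.end() - 1)) == 0)
            return false;
    }
    return true;
}

// Resolves a descriptive reference such as "the 'Introduction' of chapter 3".
std::optional<Path> childByTitle(const ChapterTitles& rel, const Path& parent,
                                 const std::string& title) {
    for (const auto& entry : rel) {
        const Path& p = entry.first;
        if (p.size() == parent.size() + 1 &&
            std::equal(parent.begin(), parent.end(), p.begin()) &&
            entry.second == title)
            return p;
    }
    return std::nullopt;
}
```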
> In the second case lots of integrity constraints are needed because > the RM is too flexible! It needs to be heavily constrained to only [quoted text clipped - 44 lines] > mult(_,N,_), not nodes(N)? > mult(_,_,N), not nodes(N)? David BL - 28 Oct 2008 06:49 GMT > > > "David BL" <davi...@iinet.net.au> wrote in message > [quoted text clipped - 32 lines] > like complaining that after you've put all your milk into a fridge, > you're having trouble pouring the fridge onto your cereal!) I'm making the following claims:
1. There are applications that require the management of data in the form of heavily nested values (ie of recursive value types).
2. There doesn't exist a satisfactory decomposition of values of a recursive value type into the RM other than by the use of recursive RVAs.
This has nothing specifically to do with structs, objects and OO. In fact if anything OO languages tend to be rather poor at supporting user defined value types.
Lisp and Prolog both provide excellent means to represent and process recursive value types, and arguably much better than in C/C++.
I would suggest that the constraints imposed by RM/RA are best understood by an experienced Prolog programmer. One of those constraints (assuming no recursive RVAs) is tantamount to outlawing nested terms.
> > 1. Using recursive RVAs; or > > 2. By introducing abstract identifiers for all the nodes, and [quoted text clipped - 11 lines] > the form of a proposition - and if it can be stated as a > proposition...it can be encoded as a tuple in RM. I agree that statements about things are well suited to the RM.
> And as far as constraints are concerned, what more do you need apart > from a subchapter can only have one containing chapter? I don't see > the issue with this example. Regards, J. The example of the chapter was to show that recursive data types are quite common (and not just limited to wffs in formal languages). This suggests there are many applications for which these questions are relevant. However the example is too simple to reveal the problems.
If you have a fairly complex grammar (or definition of a wff in some formal language), such as defined by the OpenDocument specification, would you be happy to represent those wffs in the RM?
The OpenDocument V1.1 spec is 738 pages. It is basically a heavily commented XML schema. Although I find XML hideous, I can understand how the entries represent the elements of a recursively defined wff in some formal language. I would cringe at the idea of trying to map it all to the RM. The integrity constraints would be horribly complex.
JOG - 29 Oct 2008 00:26 GMT > [snip] > > > struct Chapter [quoted text clipped - 19 lines] > 1. There are applications that require the management of data in the > form of heavily nested values (ie of recursive value types). Hi David. This is still putting the cart before the horse as far as I'm concerned. Once you say there are "types", and they are "recursive", you have already created a model in your head. Trying to then squash that model into the RM is bound to cause problems. A book does not contain recursive types. Say that it does to someone in the street and they'll think you're loopy-loo, right?
> 2. There doesn't exist a satisfactory decomposition of values of a > recursive value type into the RM other than by the use of recursive > RVAs. What was wrong with the method I suggested of just copying how you describe things in real life? And then just representing that in predicate logic? I don't need recursive types to model statements of fact (although I would agree recursive queries/constraints are extremely valuable).
> This has nothing specifically to do with structs, objects and OO. In > fact if anything OO languages tend to be rather poor at supporting > user defined value types. > > Lisp and Prolog both provide excellent means to represent and process > recursive value types, and arguably much better than in C/C++. You'd like Haskell - have you tried it?
> I would suggest that the constraints imposed by RM/RA are best > understood by an experienced Prolog programmer. One of those [quoted text clipped - 18 lines] > > I agree that statements about things are well suited to the RM. Statements about things are what all databases are concerned with, not just the RM. Anything else is out of its remit. Its function is to model facts, not values such as equations. I certainly wouldn't use it to model something like a car engine schematic either (but facts about that engine... yes!).
> > And as far as constraints are concerned, what more do you need apart > > from a subchapter can only have one containing chapter? I don't see > > the issue with this example. Regards, J. > > The example of the chapter was to show that recursive data types are > quite common (and not just limited to wffs in formal languages). Why would you want to record facts about wff's instead of, well, just using them for things? I can't imagine the application at the moment <scratching_head/>.
> This suggests there are many applications for which these questions are > relevant. However the example is too simple to reveal the problems. > > If you have a fairly complex grammar (or definition of a wff in some > formal language), such as defined by the OpenDocument specification, > would you be happy to represent those wffs in the RM? Not a formula - it is not a datum (as meant in the term database), just a value. However, a grammar such as a BNF, yeah I can. Very much so in fact, because it consists of rules which are statements of fact. I think we should give it a spin ;)
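To illustrate the claim that grammar rules are statements of fact, here is a hypothetical C++ sketch. It stores a tiny BNF grammar (expr ::= term '+' expr | term; term ::= 'x' | '1') as tuples of an assumed Production(lhs, alt, pos, symbol) relation. It then checks one integrity constraint: every nonterminal used on a right-hand side must have a definition. The relation layout and the convention that terminals are written in single quotes are assumptions made for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical relation Production(lhs, alt, pos, symbol): "the pos-th symbol
// of alternative alt of nonterminal lhs is symbol". Terminals are written in
// single quotes, e.g. "'+'"; anything else is a nonterminal.
struct Production {
    std::string lhs;
    int alt;
    int pos;
    std::string symbol;
};

bool isTerminal(const std::string& s) { return !s.empty() && s.front() == '\''; }

// Integrity constraint: every nonterminal used on a right-hand side must be
// defined by at least one production. Returns the violations.
std::set<std::string> undefinedNonterminals(const std::vector<Production>& g) {
    std::set<std::string> defined;
    for (const auto& p : g) defined.insert(p.lhs);
    std::set<std::string> missing;
    for (const auto& p : g)
        if (!isTerminal(p.symbol) && defined.count(p.symbol) == 0)
            missing.insert(p.symbol);
    return missing;
}
```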
> The OpenDocument V1.1 spec is 738 pages. It is basically a heavily > commented XML schema. Although I find XML hideous, I can understand > how the entries represent the elements of a recursively defined wff in > some formal language. I would cringe at the idea of trying to map it > all to the RM. The integrity constraints would be horribly complex. David BL - 29 Oct 2008 02:51 GMT > > [snip] > > > > struct Chapter [quoted text clipped - 26 lines] > does not contain recursive types. Say that it does to someone in > the street and they'll think you're loopy-loo, right? I understand what you're saying. However, I personally find recursive data types very intuitive and useful, so if I consider /myself/ as a user of a DBMS, a restriction to the RM feels like one hand is tied behind my back.
I'm interested in the storage and management of compound documents, scene graphs and so on. I cannot imagine doing without recursive data types.
> > 2. There doesn't exist a satisfactory decomposition of values of a > > recursive value type into the RM other than by the use of recursive [quoted text clipped - 5 lines] > fact (although I would agree recursive queries/constraints are > extremely valuable). I agree you don't need recursive types to model statements of fact. However I don't consider data to necessarily be regarded as a bunch of propositions. We have had this argument before!
Remember when I said that I regard a recorded poem as just a value, not a proposition? A CD containing a single text file that is a poem could, I guess, be construed as a single proposition as a claim of its own existence! But I regard such a claim as metaphysical and therefore meaningless. Alternatively the proposition could represent the real claim that that /particular/ CD records a poem. It would seem silly to try to make that proposition explicit by recording additional encoded values on the CD. For a start, that would make it more difficult to copy the data to another medium. So if the proposition is implicit - and you don't see it directly in the recorded data - then I suppose you could say that the "actual recorded data" is an encoded string value whereas the "data" is a proposition. Do you think this distinction is useful? I don't!
What's wrong with defining "data" to mean "encoded value(s)" rather than "encoded fact(s)"? Note that the former encompasses the latter because a relation is a value.
> > This has nothing specifically to do with structs, objects and OO. In > > fact if anything OO languages tend to be rather poor at supporting [quoted text clipped - 4 lines] > > You'd like Haskell - have you tried it? Yes, but "tried it" is a fair description.
> > I would suggest that the constraints imposed by RM/RA are best > > understood by an experienced Prolog programmer. One of those [quoted text clipped - 24 lines] > to model something like a car engine schematic either (but facts about > that engine... yes!). I don't agree that the term "database" should imply it is only concerned with storing facts. A new term is required: "factbase" :)
> > > And as far as constraints are concerned, what more do you need apart > > > from a subchapter can only have one containing chapter? I don't see [quoted text clipped - 6 lines] > using them for things? I can't imagine the application at the moment > <scratching_head/>. I agree that you don't generally need to record facts about wffs. Our point of disagreement seems to stem from the meaning and scope of the word "data".
> > This suggests there are many applications for which these questions are > > relevant. However the example is too simple to reveal the problems. [quoted text clipped - 7 lines] > so in fact, because it consists of rules which are statements of fact. > I think we should give it a spin ;) Ok, I think we mostly agree with each other.
paul c - 24 Oct 2008 14:28 GMT >> Ok, I’ll bite… > [quoted text clipped - 3 lines] > I don't understand the word "naturally" in this context. Isn't all modeling > artificial, rather than natural? I'm with you even though we think of the activities involved as being natural to us. The RM is an artifice, so are models in general. So is FOL (even with its trap lingo like "Exists"). I doubt if mathematics is any more natural than a data model as it produces some conclusions that nobody can actually visualize. The consequences of relational closure are one small example. The reason I think this is important is that it means there ought to be nothing to prevent us devising even more useful artifices, even if most of us, including me, don't possess the insight to do that.
Being part of nature, we are hardly in a position to duplicate it. Our only advantage is the artifice wherein we can drop the natural aspects that are inconvenient or irrelevant, as we see it, to some purpose. We've been practising this since the Stone Age.
It bugs me when people pretend that we have re-produced anything but our own mental creations, I think that is the first step down the mystic slope. But reason and rationality too can get out of control, as modern history shows. Does that sound odd coming from an atheist?
David BL - 28 Oct 2008 02:49 GMT > >> Ok, I’ll bite… > [quoted text clipped - 23 lines] > slope. But reason and rationality too can get out of control, as modern > history shows. Does that sound odd coming from an atheist? Would you say Max Tegmark is on the mystic slope?
http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf
paul c - 28 Oct 2008 03:11 GMT >>>> Ok, I’ll bite… >>>> No doubt any data can be made to “fit” into the relational model. [quoted text clipped - 24 lines] > > http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf No idea at the moment, I'm still trying to figure out what that C++ structure has to do with declarative programming!
paul c - 30 Oct 2008 00:36 GMT ...
>> It bugs me when people pretend that we have re-produced anything but our >> own mental creations, I think that is the first step down the mystic [quoted text clipped - 4 lines] > > http://arxiv.org/PS_cache/arxiv/pdf/0704/0704.0646v2.pdf In a word, yes, but let me say that I don't mean yes in the same sense in which I've accused some posters here of being mystics; colloquial English being what it is, what "mystical" means to some stranger or other is up for grabs. When I throw that word around, I'm talking about people who are ignoring the purpose of common db apps and the nature and capability of common processors and memories. I've met many so-called professional computer science experts who are almost totally unaware of what the typical digital computer is good at.
Also, relative to my elementary mathematical understanding, he is indeed talking of mystical things, but at the same time I think I detect that he is aiming at a rather grand systematic structure, a superstructure if you like. Whether it is implementable is very much something else. Personally, I'm not too bothered about such musings because my interest is more in finding happy coincidences between applications and what machine instruction sets and electronic physics can imitate expediently, without impairing some interpretation that is ready and useful for humans. For one example - although he didn't emphasize the details, I feel certain that Codd saw how adjacency in machine memory could be exploited as a way to manifest a mathematical relation with low overhead. I have watched while various programming paradigms such as OO or column-based dbms's have discounted machine characteristics, and while I wish those efforts no ill will, I do think they have taken on problems that current machines are not much good at. (When Codd talked about "representation" I have no doubt he was talking about both the mind's eye and machine efficiency.) I would hope when anybody, me or somebody else, throws the word "mystical" around, their own limitations are taken as givens, even if they don't acknowledge them up front in a casual forum such as this.
Bob Badour - 24 Oct 2008 15:54 GMT >>Ok, I’ll bite… > [quoted text clipped - 3 lines] > I don't understand the word "naturally" in this context. Isn't all modeling > artificial, rather than natural? Yes! Formal systems and symbolic manipulation are symbolic and abstract not natural.
David BL - 25 Oct 2008 03:18 GMT > >>Ok, I’ll bite… > [quoted text clipped - 6 lines] > Yes! Formal systems and symbolic manipulation are symbolic and abstract > not natural. The natural numbers should be called the artificial numbers :)
The word natural seems to have many meanings. There are 38 listed at Dictionary.com.
JOG - 29 Oct 2008 01:13 GMT > > > > Despite a growing literature, current definitions of "semi-structure" > > > > are woefully inadequate. [quoted text clipped - 16 lines] > > No doubt any data can be made to “fit” into the relational model. Let me state first that I don't believe that the relational model is universally applicable (I'm not sure where you think I have stated that). However, all data can be stated in predicate logic, and all statements of logic can be modelled in the RM. Hence, I consider it absolutely unarguable that there is no data which cannot be structured as a schema of relations. This is my objection to the semistructure literature.
> The more important question is whether it happens /naturally/. > The relational model works really well when there is a UoD on which many > propositions can be made without needing to introduce lots of abstract > identifiers. The RM handles facts as naturally as stating them in predicate logic. And why would one ever model things other than facts in predicate logic? I think there is confusion (in general, not simply here!) about what a database is intended to model. It models data as it has been stated in the real world, not the things which that data refers to.
> That’s very common, but it’s not always the case. It > seems to me the question of whether the RM is generally appropriate [quoted text clipped - 5 lines] > programmers enter their programs as relations? Do you really think > it’s only because of the tools currently available? Nope. I think data that requires a high number of predicates compared to the number of statements it models can be cumbersome to manipulate in the RM. Equally I think it misses a trick when it comes to facts that might be represented using logical quantification. However, I do believe this situation can be improved (specifically via greater flexibility in defining predicates and integration of existential quantifiers) and its general declarative principles will be increasingly incorporated into programming languages.
Regards, Jim.
> What about automated proof systems? Is the knowledge base and data > associated with an ongoing proof best represented using a set of [quoted text clipped - 9 lines] > if that were useful, many cdt folks would jump at the opportunity to > further promote the use of relations. David BL - 29 Oct 2008 03:37 GMT > > > > > Despite a growing literature, current definitions of "semi-structure" > > > > > are woefully inadequate. [quoted text clipped - 20 lines] > universally applicable (I'm not sure where you think I have stated > that). When I said "universally applicable" I meant (only) with respect to the recording of data, where data means "encoded values".
> However, all data can be stated in predicate logic, and all > statements of logic can be modelled in the RM. Hence, I consider it > absolutely unarguable that there is no data which cannot be > structured as a schema of relations. This is my objection to the > semistructure literature. When you say "data" do you always mean "encoded facts"?
> > The more important question is whether it happens /naturally/. > > The relational model works really well when there is a UoD on which many [quoted text clipped - 4 lines] > And why would one ever model things other than facts in predicate > logic? Exactly!
> I think there is confusion (in general, not simply here!) about what a > database is intended to model. Agreed.
> It models data as it has been stated in > the real world, not the things which that data refers to. Yes that fits with your assumption that data = encoded facts.
However it doesn't make sense when you say data = encoded values. Encoded values just "are". They don't necessarily refer to anything in the real world.
> > That’s very common, but it’s not always the case. It > > seems to me the question of whether the RM is generally appropriate [quoted text clipped - 14 lines] > quantifiers) and its general declarative principles will be > increasingly incorporated into programming languages. JOG - 29 Oct 2008 12:39 GMT > > > > > > Despite a growing literature, current definitions of "semi-structure" > > > > > > are woefully inadequate. [quoted text clipped - 42 lines] > > Exactly! Then may I suggest that your argument is not with the RM, but with the use of predicate logic to model equations, engines, etc. And yet this to me seems trivially true - if I was modelling a human in an art class I'd use clay, not predicate logic.
Of course the resulting piece of clay would be an "encoded value" and thus, by your definition, data. And then a bag of such pieces of clay... it's a database! And my mantelpiece at home, which I display them on... it's a data warehouse! And of course, when I spring-clean I am become a DBMS!
;) J.
> > I think there is confusion (in general, not simply here!) about what a > > database is intended to model. [quoted text clipped - 28 lines] > > quantifiers) and its general declarative principles will be > > increasingly incorporated into programming languages. David BL - 30 Oct 2008 02:51 GMT > > > The RM handles facts as naturally as stating them in predicate logic. > > > And why would one ever model things other than facts in predicate [quoted text clipped - 6 lines] > to me seems trivially true - if I was modelling a human in an art > class I'd use clay, not predicate logic. I don't think it's quite so trivial. For example, consider a tri-surface as a value type. A simple type decomposition as a set of triangles, where each triangle is independently defined by 3 vertices, doesn't express the constraint that the triangles tend to meet each other. It seems appropriate to introduce abstract identifiers for the vertices so that they may be shared. This is evidently a relational solution. However, unlike typical uses of the RM, there doesn't appear to be some external UoD to which the tuples, interpreted as propositions, can be related. Rather it seems that a particular tri-surface /value/ has introduced a local and private namespace in order to privately apply the RM. Note as well that this is not like an RVA (where we think of only a single relation as a value) because a tri-surface value is associated with /two/ relations - one for the vertices and another for the triangles.
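Here is a minimal C++ sketch of the two-relation encoding. It assumes that vertex identifiers are simply indices into a vertex array, and the type and function names are hypothetical. Triangles hold vertex identifiers rather than copies of coordinates, so moving one vertex moves it in every triangle that shares it. The cost is an explicit referential-integrity check.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical encoding of a tri-surface value as two relations:
// Vertex(id, x, y, z) and Triangle(v0, v1, v2), where a vertex id is its
// index in `vertices` (the "local and private" namespace).
struct Vec3 { double x, y, z; };

struct TriSurface {
    std::vector<Vec3> vertices;                         // Vertex relation
    std::vector<std::array<std::size_t, 3>> triangles;  // Triangle relation
};

// Referential integrity: every triangle corner names an existing vertex.
bool cornersValid(const TriSurface& s) {
    for (const auto& t : s.triangles)
        for (std::size_t v : t)
            if (v >= s.vertices.size()) return false;
    return true;
}

// Number of triangles that reference a given vertex.
std::size_t trianglesUsing(const TriSurface& s, std::size_t id) {
    std::size_t n = 0;
    for (const auto& t : s.triangles)
        for (std::size_t v : t)
            if (v == id) { ++n; break; }
    return n;
}

// Moving a shared vertex is a single update; every triangle that refers to
// it sees the new position.
void moveVertex(TriSurface& s, std::size_t id, Vec3 to) { s.vertices.at(id) = to; }
```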
I have wondered whether abstract identifiers are needed precisely when it is useful to express the concept of "common sub-expressions" within nested value-types. Note that scene graphs are typically thought of as DAGs not trees for precisely this reason.
I think there is an interesting interplay between 1) degrees of freedom (or entropy or storage space if you like) in the encoding of a value, 2) abstract identifiers, 3) integrity constraints and 4) update anomalies. The existing normalisation theory in the literature seems relevant but doesn't seem to me to account for recursive type definitions and abstract identifiers. Given this interplay, it would be useful to better understand why one encoding would be more desirable than another. In fact I wonder whether there are some objective criteria. Evidently the criterion is not to always avoid abstract identifiers (as if they are implicitly evil). I would guess that, as far as the complexity of the integrity constraints is concerned, there is some sweet spot in the use of abstract identifiers.
> Of course the resulting piece of clay would be an "encoded value" and > thus, by your definition, data. > And then a bag of such pieces of clay... it's a database! > And my mantelpiece at home, which I display them on... it's a data > warehouse! > And of course, when I spring-clean I am become a DBMS! JOG - 30 Oct 2008 11:41 GMT > > > > The RM handles facts as naturally as stating them in predicate logic. > > > > And why would one ever model things other than facts in predicate [quoted text clipped - 16 lines] > doesn't appear to be some external UoD to which the tuples, > interpreted as propositions can be related. I use Oracle Spatial to do exactly this sort of thing day in, day out in a geospatial domain, and no abstract identifiers are introduced. The coordinates of any vertex are used. That is what identifies them - that is what is used (note that these coordinates can happily be relative). Constraints to maintain adjacency use the spatial operators offered by SDO_RELATE. It is very good.
I karate chop your example to pieces! Haiii-ya.
> Rather it seems that a particular tri-surface /value/ has introduced a local and private > namespace in order to privately apply the RM. Note as well that this [quoted text clipped - 13 lines] > relevant but doesn't seem to me to account for recursive type > definitions and abstract identifiers. I am yet to be convinced of the need for abstract identifiers (or invention of recursive types) from the examples offered so far... the wff is the most interesting, but I am currently questioning the sense or utility of decomposing an equation in such a manner /at the logical level/ (as opposed to the physical). Regards, J.
> Given this interplay it would > be useful to better understand why one encoding would be more [quoted text clipped - 10 lines] > > warehouse! > > And of course, when I spring-clean I am become a DBMS! David BL - 30 Oct 2008 15:03 GMT > > > > > The RM handles facts as naturally as stating them in predicate logic. > > > > > And why would one ever model things other than facts in predicate [quoted text clipped - 25 lines] > > I karate chop your example to pieces! Haiii-ya. Please forgive my ignorance - I'm not familiar with Oracle Spatial. Are you suggesting that for a tri-surface all that is needed is a single relation for the triangles, and when for example you want to change what is conceptually a shared vertex (and so which is understood to impact multiple triangles), it is assumed that all vertex values that appear in the relation with that same value (ie coords) are indeed logically shared and therefore are all automatically updated by the DBMS at the same time? If so it is not clear to me how and when the DBMS knows that such an elaborate update policy is required. I presume it is inferred from the integrity constraints. Is that right? Does the DBMS provide such a facility in a generic way?
This reminds me of the idea that one can change the key of a tuple in a relation and have the DBMS automatically update all foreign key references across the entire database.
Anyway, I think there are data entry applications where the concept of "shared values" needs to be under user control. For example, in the data entry of a CAD drawing of a car, the user may or may not want all the wheels to share the same geometry. The problem with simple copy and paste (and no logical sharing) is that any future edits to the wheel geometry need to be repeated on every copy. The obvious solution seems to be to reference a single shared geometry for a wheel - hence the need for an abstract identifier. Are you suggesting that an alternative is to instead use an integrity constraint? If so, how can you specify which geometries are logically tied and which are not (ie even though they just happen to be equivalent in value at that moment in time)? Doesn't that require abstract identifiers of some sort anyway? I can't imagine that values that happen to be the same are always assumed to be shared, because then it would be impossible for a user to copy and paste a value in order to create a copy that will subsequently diverge.
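Here is a hypothetical C++ sketch of the shared-geometry scenario. It assumes that geometries live in their own relation and that parts refer to them by an abstract identifier, which is an index in this sketch. Editing the shared geometry affects all four wheels at once. An explicit copy gets a fresh identifier and can diverge from the original. Model, copyGeometry and geometryOf are illustrative names and do not come from any CAD system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical CAD model: geometries form their own relation, and each part
// refers to one by an abstract identifier (an index into `geometries`).
struct Geometry { double radius; };

struct Part {
    std::string name;
    std::size_t geometry;  // identifier of the (possibly shared) geometry
};

struct Model {
    std::vector<Geometry> geometries;
    std::vector<Part> parts;

    const Geometry& geometryOf(std::size_t part) const {
        return geometries.at(parts.at(part).geometry);
    }

    // Copy-and-paste semantics: duplicate the value under a fresh identifier
    // so that later edits to either copy do not affect the other.
    std::size_t copyGeometry(std::size_t id) {
        Geometry copy = geometries.at(id);
        geometries.push_back(copy);
        return geometries.size() - 1;
    }
};
```

The identifier records whether two equal geometries are logically tied. That relationship is not inferred from their current values, which is the distinction raised in the question.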
> > Rather it seems that a particular tri-surface /value/ has introduced a local and private > > namespace in order to privately apply the RM. Note as well that this [quoted text clipped - 19 lines] > or utility of decomposing an equation in such a manner /at the logical > level/ (as opposed to the physical). JOG - 11 Nov 2008 16:58 GMT > > > > > > The RM handles facts as naturally as stating them in predicate logic. > > > > > > And why would one ever model things other than facts in predicate [quoted text clipped - 27 lines] > > Please forgive my ignorance - I'm not familiar with Oracle Spatial. First, apologies for the delay in responding. I have been distracted by TTM.
> Are you suggesting that for a tri-surface all that is needed is a > single relation for the triangles, No, I meant I work with the polygon (SDO_GEOM) object types which Oracle Spatial makes available. Tri-surfaces are a different kettle of fish no doubt.
> and when for example you want to > change what is conceptually a shared vertex (and so which is [quoted text clipped - 6 lines] > policy is required. I presume it is inferred from the integrity > constraints. Is that right? That seems a likely method for checking constraints between polygon instances. However, I think it is symptomatic of a design flaw if one is trying to model these tri-surface jib jobs (which, I assume, are continuous 3D surfaces made up of triangles, as used in graphics models going back to the old days of Elite?).
> Does the DBMS provide such a facility in > a generic way? No. One would add the check as a constraint. However, it would be very inefficient, I imagine, and I can see the objections you might respond with. (With a typed approach, however, the situation is inevitable, because a geometry instance encapsulates/hides away its identifying qualities (being an object). This means they are not exposed to the RA.)
> This reminds me of the idea that one can change the key of a tuple in > a relation and have the DBMS automatically update all foreign key [quoted text clipped - 16 lines] > for a user to copy and paste a value in order to create a copy that > will subsequently diverge. One of the other reasons that my reply took time, was that I have thought reasonably hard about the issues you have raised and come to the conclusion that you are right. At least right in the sense that I now concur that the RM can't cope with the example without adding RVA's or inventing artificial identifiers. And both approaches are hack jobs as far as I'm concerned.
On analysis, I think that you have tangentially identified a serious issue with the RM (and not a case for recursive types per se - this is an attempt to solve the issue, rather than describing the cause, and care should be taken not to conflate the two). I will post when I get more time if you are interested in my thought process, but it has clarified some nagging concerns I have had concerning the universal application of 1NF.
Either way, I have found the tri-surface an illuminating example. Regards, Jim.
> > > Rather it seems that a particular tri-surface /value/ has introduced a local and private > > > namespace in order to privately apply the RM. Note as well that this [quoted text clipped - 19 lines] > > or utility of decomposing an equation in such a manner /at the logical > > level/ (as opposed to the physical). paul c - 12 Nov 2008 22:27 GMT ...
> One of the other reasons that my reply took time, was that I have > thought reasonably hard about the issues you have raised and come to [quoted text clipped - 3 lines] > far as I'm concerned. > ... Aren't all identifiers artificial? If so, where is the hack?
paul c - 12 Nov 2008 22:28 GMT > Aren't all identifiers artificial? ... Ie., we make them up.
Roy Hann - 12 Nov 2008 23:08 GMT >> Aren't all identifiers artificial? ... > > Ie., we make them up. You need to be more precise. All identifiers are made up somewhere, but not necessarily within the enterprise of interest. "We" might not need to make up any. Or we might need to make up a few--ideally the fewest sufficient.
 Signature Roy
paul c - 13 Nov 2008 00:39 GMT >>> Aren't all identifiers artificial? ... >> Ie., we make them up. [quoted text clipped - 3 lines] > need to make up any. Or we might need to make up a few--ideally the > fewest sufficient. Sure. But as for "not necessarily within the enterprise of interest", why does such a distinction matter?
(BTW, I wasn't implying that I agree with the lazy shallow people who think every relation should have a generated key. I piped up because I'm always happy to try to keep threads about RVA's or recursion or constraints alive. The latter two areas seem rather under-explored to me.)
David BL - 13 Nov 2008 02:49 GMT > >>> Aren't all identifiers artificial? ... > >> Ie., we make them up. [quoted text clipped - 6 lines] > Sure. But as for "not necessarily within the enterprise of interest", > why does such a distinction matter? ISTM it relates to whether we informally consider a UoD and various external predicates to exist /a priori/ (ie independently of the DB) - even though we regard the UoD and the external predicates as outside our mathematical formalism.
If we need to name some things in order to state facts about them then I don't think it's particularly useful to this theory group to try to distinguish between "natural" and "artificial" names. I believe this is even true if we happen to use the DB to help allocate names for informal things that we actually consider to be in the UoD.
I think an /abstract identifier/ should be defined as an identifier that can be regarded as a name of a variable (which holds an abstract value) within some context within the DB (and not in the UoD). When I say context I mean that there is some defined scope (hopefully as small as possible) in which the name is meaningful. For example within the context of representing a tri-surface (ie triangulated irregular network) /value/ it may be useful to introduce a scope in which abstract identifiers are names of vertex values. Note that whenever we have a binding from an identifier to a value within some scope I would say by definition we have a /variable/.
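To make the idea of a scoped namespace concrete, here is a Python sketch (hypothetical class and method names) of a tri-surface value whose vertex identifiers are private to the value. Resolving the local names yields a naming-independent value, so two surfaces that use different private names can still compare equal, while moving one named vertex moves it in every triangle that refers to it. For simplicity the sketch treats each triangle as an unordered set of points and ignores orientation.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Point = Tuple[float, float, float]


class TriSurface:
    """A tri-surface value whose vertex names are private to the value.

    Hypothetical sketch: the names act like local variables holding vertex
    coordinates; they mean nothing outside this particular value.
    """

    def __init__(self, vertices: Dict[str, Point],
                 triangles: Iterable[Tuple[str, str, str]]):
        self.vertices = dict(vertices)                  # local name -> point
        self.triangles = [tuple(t) for t in triangles]  # triples of local names

    def as_point_triangles(self) -> FrozenSet[FrozenSet[Point]]:
        """Resolve every local name, giving a naming-independent value."""
        return frozenset(
            frozenset(self.vertices[name] for name in tri)
            for tri in self.triangles
        )

    def __eq__(self, other: object) -> bool:
        # Equal if both denote the same triangles of points, whatever private
        # names were chosen for the vertices.
        if not isinstance(other, TriSurface):
            return NotImplemented
        return self.as_point_triangles() == other.as_point_triangles()

    def move_vertex(self, name: str, new_point: Point) -> "TriSurface":
        """Return a new value in which the shared vertex `name` has moved."""
        vertices = dict(self.vertices)
        vertices[name] = new_point
        return TriSurface(vertices, self.triangles)


# A unit square split into two triangles, described with different private names.
a = TriSurface({"v1": (0.0, 0.0, 0.0), "v2": (1.0, 0.0, 0.0),
                "v3": (0.0, 1.0, 0.0), "v4": (1.0, 1.0, 0.0)},
               [("v1", "v2", "v3"), ("v2", "v4", "v3")])
b = TriSurface({"p": (0.0, 0.0, 0.0), "q": (1.0, 0.0, 0.0),
                "r": (0.0, 1.0, 0.0), "s": (1.0, 1.0, 0.0)},
               [("p", "q", "r"), ("q", "s", "r")])
print(a == b)  # True: different local names, same value

# "v2" is shared by both triangles, so moving it changes both of them.
moved = a.move_vertex("v2", (1.0, 0.0, 1.0))
print(sum((1.0, 0.0, 1.0) in tri for tri in moved.as_point_triangles()))  # 2
```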
A tuple that contains an attribute value that is an identifier outside the UoD cannot represent a self-contained (ie independently verifiable) fact on the UoD. This is why I think one should introduce as few abstract identifiers as possible. The idea that a domain expert can regard each tuple of a relation as a self-contained fact is extremely valuable.
paul c - 13 Nov 2008 03:23 GMT ...
> A tuple that contains an attribute value than is an identifier outside > the UoD cannot represent a self-contained (ie independently > verifiable) fact on the UoD. ... I'd like to see an example, this sounds like some kind of rhetorical imaginary paradox to me.
If I don't know the names of my great-great-great-great-great grandfathers, I would assign numbers or some kind of code to them. I couldn't guarantee that there were exactly 64 of them but I could be sure that I had at least one. However many of them, I don't see how those identifiers would fall outside of, say, a genealogy "UoD". Or do you mean something else?
(I'd still like to know what the "hack" is, too.)
David BL - 13 Nov 2008 08:19 GMT > ... > [quoted text clipped - 11 lines] > those identifiers would fall outside of, say, a genealogy "UoD". Or do > you mean something else? I did mean something else, but I think it needs some rewording because I wasn't very clear. Let there be a relation
father(X,Y) :- X is the father of Y.
Let there be a UoD in which bill,jane are identifiers for two particular humans. Then the tuple
father(bill,jane)
is an independently verifiable statement of fact on the UoD.
Alternatively let there be relations
age(X,N) :- X has age recorded by variable named N.
value(N,V) :- Variable named N has value V.
where the scope of the variables in value(N,V) is regarded as local to the DB. Then the following tuple
age(bill,n)
is not independently verifiable by a domain expert (who will ask "what is n?" - because n is an abstract identifier).
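The contrast can be sketched in Python by treating each relation as a set of tuples. This is only an illustration under assumptions: the value 42, the later reassignment to 43 and the helper names are hypothetical. father(bill,jane) can be checked by membership alone, whereas age(bill,n) only yields a checkable fact after joining with value(N,V). The helper is_key_on_first also checks that N is a candidate key of value, which is what interpreting n as a variable name seems to require.

```python
# Relations as sets of tuples (hypothetical contents).
father = {("bill", "jane")}   # father(X, Y): X is the father of Y
age = {("bill", "n")}         # age(X, N): X's age is held by variable named N
value = {("n", 42)}           # value(N, V): variable named N has value V


def is_key_on_first(relation):
    """True if the first attribute is a candidate key (no repeated values)."""
    firsts = [t[0] for t in relation]
    return len(firsts) == len(set(firsts))


def resolve_age(person):
    """Join age with value on N to recover a self-contained fact."""
    return {(x, v) for (x, n) in age for (m, v) in value
            if x == person and n == m}


print(("bill", "jane") in father)   # True: checkable on its own
print(resolve_age("bill"))          # {('bill', 42)}
print(is_key_on_first(value))       # True

# "Assigning" to the variable n touches only value; age(bill, n) is unchanged.
value = (value - {("n", 42)}) | {("n", 43)}
print(resolve_age("bill"))          # {('bill', 43)}
```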
With this last example I'm curious to know whether anyone would disagree with the idea that it's sometimes possible to interpret an identifier as a name for a variable. I know when I've suggested the idea that relations can represent variables and pointers to variables I've met with plenty of opposition - the argument I presume being that relation (values) only record tuple values, which in turn only record attribute values - so it's argued there can't be variables or pointers.
An interesting question: is there a practical example of the use of abstract identifiers (i.e. identifiers that fall outside the UoD) where they never appear as a candidate key in some relation? If so that would seem to defeat my argument that abstract identifiers can always be interpreted as names of variables defined in a scope within the DB.
paul c - 13 Nov 2008 15:25 GMT ...
> I did mean something else, but I think it needs some rewording because > I wasn't very clear. Let there be a relation [quoted text clipped - 36 lines] > always be interpreted as names of variables defined in a scope within > the DB. This is probably over my head, but I'll chip in anyway. By convention, all tuples in a relation are true. Pointers require resolution because they imply alternatives. In the RM (as we know it today), the only way to express alternatives in an e