Relational and multivalue databases
Eric Kaun - 18 Feb 2004 01:55 GMT
This letter (monograph?) is in reply to Dawn Wolthuis, who received what she seems to have considered a short and unhelpful series of responses from Fabian Pascal between October 9 and November 12, 2002. This was also motivated by postings on the comp.databases.theory newsgroup, specifically the “foundations of relational theory?” thread which began on September 26, 2003 and accumulated over 500 responses before petering out on November 7, 2003. The length of my response to all this indicates the depth of my fascination with the entire discussion.
For those who would flame me for opening old wounds and awakening sleeping dogs, I apologize, but I believe this discussion still has merit. I can appreciate Fabian Pascal's apparent crankiness, given that he's been answering the same questions and debating the same issues for over 20 years now. I do see reason to resurrect this debate (pedagogy), but it needn't be the same people who do it generation after generation.
And kudos to Dawn for always being willing to ask questions, explain herself, and to remain cheerful in the face of insults.
* Dawn wrote: “...find that developers are so much more productive when working with a MultiValue database” and, on her MultiValue flashcard: “Typical relational databases cannot match the productivity of the MultiValue database...”
(I'm assuming by “database” you mean “DBMS,” and there are no RDBMSs yet, excepting the allegedly excellent Dataphor, which I've yet to use.)
It sounds as if the MV development environment (IDE, reporting tools, etc.) is to credit for productivity gains. I'm far more concerned with (provable) correctness than productivity (not that the two are unrelated, unless you're committed to a “code-and-fix” cycle), but have always found that a logically solid foundation (of which “conceptual integrity” is only a diluted description) will enhance productivity – in the short term as well as the long term, though I would also posit that there's an exponential benefit with the size and complexity of the system, lifetime of the product, and turnover within the team.
* Dawn wrote, on her example MultiValue flashcard: “They [relational databases] would also need features such as variable length data structures, untyped elements, user-defined vocabularies and custom functions specified as metadata.”
“Variable length data structures” are allowed by the relational model, which places no restrictions on types (aka domains). It's lousy “relational implementations” like SQL, and even lousy SQL implementations (pick any of them), that fail the relational model utterly by offering only a crippled type system, and until recently, either nonexistent or highly proprietary type definition facilities. The type of your attribute determines whether length is significant; for example, wouldn't you want a U.S. Social Security number to be restricted to a maximum of 9 digits (as well as a minimum of 9 digits)? Wouldn't you want a UPC code to be limited to 1-5-5-1 digits (if I remember correctly), and also to guarantee that the final check digit is a valid checksum?
Current SQL implementations have DATE, INTEGER, and others. “Data structure lengths” violate the relational model in at least two ways: not allowing users to define a type (including operations), and exposing physical implementation details (of built-in types).
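As a minimal sketch of what a user-defined type with its own rules might look like, assuming Python dataclasses stand in for a DBMS type definition facility (the class and function names here are hypothetical, and the UPC rule assumes the standard 12-digit UPC-A checksum):

```python
from dataclasses import dataclass


def _is_ascii_digits(text: str) -> bool:
    """True when text is non-empty and contains only the characters 0-9."""
    return text != "" and all(ch in "0123456789" for ch in text)


def upc_check_digit(first_eleven: str) -> int:
    """UPC-A check digit computed from the first 11 digits of a code."""
    digits = [int(ch) for ch in first_eleven]
    odd_sum = sum(digits[0::2])   # 1st, 3rd, 5th, ... digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, 6th, ... digits
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


@dataclass(frozen=True)
class SocialSecurityNumber:
    """Hypothetical type: exactly 9 digits, no more and no fewer."""
    value: str

    def __post_init__(self) -> None:
        if len(self.value) != 9 or not _is_ascii_digits(self.value):
            raise ValueError(f"not a 9-digit SSN: {self.value!r}")


@dataclass(frozen=True)
class UpcCode:
    """Hypothetical type: 12 digits whose final digit is a valid checksum."""
    value: str

    def __post_init__(self) -> None:
        if len(self.value) != 12 or not _is_ascii_digits(self.value):
            raise ValueError(f"not a 12-digit UPC: {self.value!r}")
        if int(self.value[-1]) != upc_check_digit(self.value[:-1]):
            raise ValueError(f"bad UPC check digit: {self.value!r}")
```

The length here is a property of the type's meaning, not of its storage: nothing in the sketch says how many bytes a value occupies, only which values are legal.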
* Dawn wrote: “Also, what is the theory that leads to strong typing and fixed lengths that are found in many relational databases?”
The databases you referenced are not relational. Fixed lengths I addressed. As to strong typing, you'll have to be more specific... but if you like, you're free (even in SQL!) to define every attribute as a general-use type (e.g. a BLOB / CLOB / string), and then manipulate it as you like in individual application programs. Every program will have to be aware of how to properly address characters or bytes within the general type (since if you've bothered to define a field it typically has some meaning), and the DBMS will be unable to restrict what's placed in that “data element.” This loses a great deal, since declaring types to an RDBMS allows the definition of your “type rules” in one place, and disallows operations that would corrupt your database (for example, placing “ABCDEFGHI” into the aforementioned Social Security Number attribute).
* Dawn wrote: “They [Don Nelson and Richard Pick] based the way the data was specified (which I'm terming the data model, but that might not be the right use of the term) on how it was to be queried.”
By “specifying data,” I'm assuming you're referring to a combination of type definition, relation definition (including normalization), and constraints.
The problem is that we're most of us not psychic, and any database I've ever developed (or inherited) has outgrown its originating application, which means that in designing a database, you have to look at the business domain and not just the queries being asked for right now, which won't necessarily predict future query needs. Unless, of course, your application is so awkward that the users have no desire to extend it, or the company's automation needs are so limited that they have no need to extend it.
Relational shines in its ability to model data of all sorts in a minimal and egalitarian fashion which is closely-aligned with logic, so that you don't repeat yourself (and have to write extra code to synchronize the redundant elements). This applies to every aspect of the relational model and its attendant (though orthogonal) techniques: normalization, type definition, constraints. All of these are designed to allow you to state a required predicate once and be done with it. Predicates underlie the relational model at every turn; it's the single driving concept, rooted in logic and directly applicable to all data. This is the concept that most people miss, as it's not stressed nearly enough (by enough people). It's a not-entirely-unexpected nicety that specification languages like Z and VDM use predicates as well; hence the practical value of relational in executing queries augments its ability to model high-level specification abstractions.
The root problem is that the “entity-relationship” view of data is horribly limited; relationships often require (or acquire) enough additional attributes and “business logic” to be reasonably viewed as entities in their own right, and at that point you'll find you've been treating a first-class citizen as a second-class one, and in addition marring your syntax with arbitrary (and unnecessary) path expressions. In “An Introduction to Database Systems (8th Edition),” C.J. Date gives a good example of this phenomenon: that of marriages. While you could regard a marriage as a simple “relationship” between two “entities,” queries like “How many marriages took place between 1972 and 1974 in Old St. Luke's Episcopal Church?” obviously “view” a marriage as an entity – and it has attributes such as date, time, venue, etc. Line items on an order, for example, could just be viewed as a relationship between an order and a stock item, right?
Don't think of entities and their relationships; those shift. Predicates are far more stable, and treating them as individual concepts will do much to enhance their use and reuse.
* Dawn wrote: “...multivalues crop up in the way people talk and think.”
Perhaps, but that seems a highly subjective statement predicated on the fact that you know (and enjoy) the concept to begin with. I could just as easily state that people talk and think in terms of predicates, and that despite being an object-oriented programmer, I find the use of predicates far more useful and “roll them up” into objects only as late as humanly possible. I've found that business people talk much more in terms of raw “rules” - which correlate much more directly to predicates than anything else. Using objects (as in OOA) can be useful to flush (flesh?) out additional concepts, but are not the lingua franca of business, despite what OO pundits will tell you.
Besides, people often talk and think in terms that are various combinations of profane, vague, contradictory, unrealistic, nonsensical, wishful, etc... that doesn't mean our logic machines have to work that way!
* Dawn wrote: “It [MultiValued platforms] is old, yet could be revived as it provides an amazingly productive environment, perhaps because it is so forgiving and because it resembles XML to quite an extent.”
I would take resemblance to XML to be a damning attribute until demonstrated otherwise... and as far as productivity, I'll assume you're right; that doesn't say much about the model itself, at least not to me. I personally want my DBMS to be as unforgiving as possible, as long as it's unforgiving of someone trying to violate its rules! The rules (relations, types, constraints) are how I state what the data “means.” That's how I protect my company's data from corruption (e.g. from having its representation or encoding mangled into a “value” that violates the data's meaning).
“Firewalling” is something every good programmer does, even within his or her own code. It helps prevent mistakes, it helps protect us from the logical (though sometimes undesirable and unpredictable) effects of combining multiple pieces of code, and best of all, it expresses our intentions. All software is description, and explicit is best.
* Dawn wrote: “It appears there are lower initial costs and lower ongoing costs for companies using the MV platform over a more standard RDBMS (Oracle, SQL Server, etc.)”
For initial costs, I can't say; in any event, they're dwarfed by ongoing costs for a system of any degree of importance (and barring license price gouging, which I've seen). Ongoing costs are tricky to measure; however, the cost of data inconsistencies (violation of rules) is likely to dwarf productivity losses. Regarding productivity, this is nearly impossible to meaningfully compare, but I've been able to develop in 1 day (actually during the meeting where we were discussing the requirements for it!) an application consisting of 20 tables, a dozen screens, and several reports, against an MS Access “DBMS” which was later ported (in very little time) to SQL Server. I don't think this is either impressive or unusual (it occupied me during a dull meeting), but the constraints were never a hindrance; in fact, a properly relational language would have allowed me to express many more “business rules” directly in the database, rather than in my code, thus saving me time and potential errors.
It really isn't very time-consuming to add SQL tables, given even crude database design tools.
* Dawn wrote: “Of what are we [those who would use XML] ignorant? Of some theory or practical advice?”
Possibly the theory, or possibly of the practical value of the theory, or possibly neither (I can't make a blanket statement of ignorance, but do believe that XML is a huge leap backward for this industry, and will hasten the decline of the reputation of programmers everywhere once light is shined on the naked Emperor).
Read chapters 3-10 in Date's “Intro to DB Systems (8th edition).” Then read additional chapters to see how the pure theory translates into direct benefits including such “pragmatic” topics as database distribution, concurrency and recovery, type systems (!), and even code generation (actually declarative programming, but code generation is one implementation that's more marketing-friendly).
* Dawn wrote: “It is easy to understand why it is so common when we see that it [a 1-many relationship] is also a generalization of the relationship between a relation and an element in the relation.”
First, see the dangers of dividing relations (predicates) into “entities” and “relationships.” And keep in mind the excellent analogy from Hugh Darwen: “Types are to relations as nouns are to sentences.” Types (domains / attributes, sort of) are the things we can speak about; relations are what we can say about them. Since you refer to language all too frequently in referencing MultiValue, the noun/sentence distinction should strike a chord.
* Dawn used an example of people owning cars and bikes, and how in MultiValue the cars and bikes could be multi-valued attributes of the person.
This might be OK as long as the attribute is one on which you're not doing a computation (in which case its cardinality can't be easily increased or decreased), and as long as you never need to track additional information about those attribute values. How would you store, for example, the model and year of the car or the color of the bike? Additional attributes – and if so, how do you correlate those with the original car and bike attributes? If they're separate attributes, then what does it mean if I have 3 bike “values” and 2 color “values”?
And the relational model “makes sense” of this sort of “relationship” perfectly well; in addition, it allows you to formulate many more constraints about the data, and these constraints are an important aspect of “business rules” that you would otherwise need to program (perhaps repeatedly).
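The correlation problem can be sketched in a few lines. This is only an illustration under assumed names and data (the record layout, `pair_bikes_with_colors`, and the `Bike` relation are all hypothetical, not taken from any MultiValue product):

```python
from typing import NamedTuple

# Multivalue-style record: two independent multivalued attributes.
# The names and values are made up for illustration.
person_record = {
    "name": "Pat",
    "bikes": ["road", "mountain", "touring"],
    "colors": ["red", "blue"],
}


def pair_bikes_with_colors(record: dict) -> list:
    """Pair bikes with colors by position; fail when the lists disagree."""
    bikes, colors = record["bikes"], record["colors"]
    if len(bikes) != len(colors):
        raise ValueError(
            f"{len(bikes)} bikes but {len(colors)} colors: "
            "which color belongs to which bike?"
        )
    return list(zip(bikes, colors))


class Bike(NamedTuple):
    """One tuple of a hypothetical relation: owner OWNER has bike BIKE of color COLOR."""
    owner: str
    bike: str
    color: str


# Relational style: each bike and its color are stated together, once.
bikes = {
    Bike("Pat", "road", "red"),
    Bike("Pat", "mountain", "blue"),
    Bike("Pat", "touring", "green"),
}


def colors_of(owner: str, relation: set) -> list:
    """Colors of all bikes owned by the given person, sorted for stable output."""
    return sorted(row.color for row in relation if row.owner == owner)
```

With the separate relation, a bike without a color simply cannot be stated, so the "3 bikes, 2 colors" question never arises.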
Hierarchies are much more uncommon, and much less useful, than most people seem to acknowledge. Even the cliché “org chart,” while hierarchical, fails to capture the matrix reporting which occurs in most organizations.
* Dawn wrote: “...the cascading deletes for referential integrity are a no-brainer...”
They are in relational too, and even in SQL.
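For the SQL case, here is a small sketch using Python's standard `sqlite3` module, with SQLite standing in for any SQL product. The `orders` and `line_items` tables are hypothetical, and note that SQLite only enforces foreign keys once the pragma is switched on:

```python
import sqlite3


def remaining_line_items_after_delete() -> int:
    """Delete an order and report how many of its line items survive."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
    conn.execute(
        """
        CREATE TABLE line_items (
            order_id INTEGER NOT NULL
                REFERENCES orders(order_id) ON DELETE CASCADE,
            item_no  INTEGER NOT NULL,
            PRIMARY KEY (order_id, item_no)
        )
        """
    )
    conn.execute("INSERT INTO orders VALUES (1)")
    conn.executemany("INSERT INTO line_items VALUES (1, ?)", [(1,), (2,)])
    # Deleting the parent row removes its dependent rows declaratively.
    conn.execute("DELETE FROM orders WHERE order_id = 1")
    remaining = conn.execute("SELECT COUNT(*) FROM line_items").fetchone()[0]
    conn.close()
    return remaining
```

The cascade is declared once, in the table definition, and no application code has to remember to clean up the line items.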
* Dawn wrote: “...switching cardinality of a data element in a maintenance phase (or post-design) of a project is very easy to do, breaking very little.”
Only if the data element isn't involved in actual logic (e.g. the text examples like car names I usually see in Pick examples). If the data element is numeric, does making it a list mean that reports now have to sum the elements in the list? Average them? If I'm only displaying them, why even a list? Why not a string?
And, for the record, the relational model supports lists (or relations!) as types as well. It just isn't typically a good idea, unless it's an attribute with which you do nothing but display on reports and screens.
* Dawn wrote: “The database tables could still be stored as in an RDBMS, but show themselves to the designers and developers in this more intuitive format.”
Intuition is a poor basis for logical and data decisions, as it's subjective and prone to change. In particular, a database consisting of many relations can be used to generate an uncountable (?) number of hierarchies, depending on which (and how many) joins are done, and in what order, and on what attributes. Which one you choose depends on your immediate needs, but see above regarding the durability of “query needs”; they're prone to change, so why bother with a hierarchy in the first place when it expresses only one thing you're going to need to do?
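To make the "many hierarchies from one set of relations" point concrete, here is a rough sketch, assuming a flat hypothetical `owns` relation of (person, vehicle) pairs and a helper `nest` invented for this illustration:

```python
from collections import defaultdict


def nest(rows, key_index):
    """Group flat tuples into a {key: [remaining attributes]} hierarchy."""
    tree = defaultdict(list)
    for row in rows:
        rest = row[:key_index] + row[key_index + 1:]
        tree[row[key_index]].append(rest)
    return dict(tree)


# One flat relation: person PERSON owns vehicle VEHICLE (made-up data).
owns = [("Ann", "sedan"), ("Ann", "tandem"), ("Bob", "tandem")]

# Two equally valid hierarchies derived from the same tuples.
by_person = nest(owns, 0)   # people on top, their vehicles nested below
by_vehicle = nest(owns, 1)  # vehicles on top, their owners nested below
```

Neither hierarchy is privileged; each is just one view, computed on demand from the same flat facts. Storing only one of them commits every later query to that shape.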
* Dawn wrote: “...a column constraint specifying the maximum length of an element value might be easy for the computer to do, but doesn't make all that much sense when it comes to all of the integrity constraints one might put on a field.”
Agreed. The “type constraint” you mention isn't one; it's because the non-relational SQL doesn't offer real type definition, and exposes unnecessary physical implementation details, that such things exist at all. That ain't relational. So how do MultiValue DBMSs enforce types; for example, that an SS# has to be 9 digits? Does the DBMS know or enforce this? Does it know or enforce any restriction at all on types? While unnecessary bounding is a problem, infinite bounding (allowing any values in any data element) is also a problem, as I'm sure you can see.
* Dawn wrote: “I suspect there are times when having a constraint coded into an RDBMS increases the maintenance work to an extent that is not cost-justifiable, particularly when the constraint is one that is prone to change.”
Constraints are or aren't; they're seldom prone to change. Changes like that usually indicate either an overly-specific constraint (which a good RDBMS would allow to be generalized) or shallow analysis. If relational systems (all 1 of them!) had a consistent catalog (which represents, in relational form, the structure of the database), then you could possibly add constraints to the catalog itself (since they're just relations too!), which would further enforce requirements. This is beyond me, but I theorize that it could both be done and be very expressive, not implicit in the code of many programs across the business.
* Dawn wrote: “Simplicity: All data are strings with additional type specifications for external purposes only. Not only simple to specify but to maintain!”
This places the burden of maintenance in every program that uses your file/table; every application has to know enough about that data element not to violate how you intend it to be used. Because those types are not in the database, you'll need a document of some sort to describe to each programmer how to use the data! Or, at least, you have to hope your data element names are suggestive enough to describe to people how to use it. You risk much with this sort of assumption; I don't think there's anything simple about it. You're simply delaying paying for the “simplicity,” but that loan accumulates interest.
Again, if the data is only displayed, this might be acceptable. But type systems can be very rich, and that richness is extremely useful, and practical. It lets you (again) say something of critical business value, once and only once.
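A sketch of "say it once, in the database," again using Python's standard `sqlite3` module as a stand-in. The `employee` table and the helper names are hypothetical, and a SQL `CHECK` constraint is only a rough approximation of a real type definition:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database that states the SSN rule exactly once."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE employee (
            emp_id INTEGER PRIMARY KEY,
            ssn    TEXT NOT NULL
                   CHECK (length(ssn) = 9 AND ssn NOT GLOB '*[^0-9]*')
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, emp_id: int, ssn: str) -> bool:
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_id, ssn))
        return True
    except sqlite3.IntegrityError:
        return False
```

Any program that connects to this database, however careless, gets the same answer for "ABCDEFGHI"; no side document describing the intended format is needed.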
* Dawn wrote: “No need to logically retrieve multiple tables when concerning yourself with a single proposition, simply because the proposition has conjunctive clauses.”
The hierarchies that MultiValue allows you to establish aren't single propositions, unless you restrict yourself to propositions about the “top-level entity” in your file. Attributes in a relation are conjunctive clauses (e.g. “the employee has id ID and has name NAME and was born on BIRTHDATE”). If you “nest” the employee's dependents, for example, you're making a number of additional assertions, and merely glossing over the fact that they're separate – furthermore, this approach becomes intractable if one of those dependents becomes an employee.
If you read about “antipatterns,” you might know one called Big Ball of Mud. The idea is that applications are best factored into multiple pieces, so no one piece becomes too complex. Propositions are similar – why lump them together unless you have to?
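The predicate reading can be sketched as follows, assuming hypothetical `Employee` and `Dependent` relations with made-up rows. Each tuple instantiates its relation's predicate as one true proposition:

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Predicate: the employee has id EMP_ID, has name NAME, was born on BIRTHDATE."""
    emp_id: int
    name: str
    birthdate: str


class Dependent(NamedTuple):
    """Predicate: employee EMP_ID claims DEPENDENT_NAME as a dependent."""
    emp_id: int
    dependent_name: str


def employee_proposition(row: Employee) -> str:
    """Fill the predicate's conjunctive clauses with one tuple's values."""
    return (
        f"The employee has id {row.emp_id} and has name {row.name} "
        f"and was born on {row.birthdate}."
    )


# Made-up data: Kim is Lee's dependent and is also an employee.
employees = {
    Employee(1, "Lee", "1960-01-15"),
    Employee(2, "Kim", "1985-06-30"),
}
dependents = {Dependent(1, "Kim")}
```

Because the two predicates are kept apart, Kim becoming an employee is just one more tuple in `employees`; nothing nested under Lee has to be restructured.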
* Dawn wrote: “...but the only meat I can find is related to practical issues regarding integrity constraints – not enough to write off the entire MultiValue platform nor XML.”
On the contrary: integrity constraints, and types, define not just the spine but the skeleton and muscle of your application. It's only flaccid relational-scented SQL implementations that have diluted this fact. I hope for great things (including application generation) from the relational implementation in Dataphor (and, I hope, soon many others). To repeat what Mr. Pascal said: “It [XML] was invented by people who know nothing about data management – text publishers.” Not a solid foundation for logical machines like programs.
* Dawn wrote: “...instead of a programmer coding some procedural code (!) specific to a certain circumstance, the logic is declared in the database, but designated as a local constraint where the next developer working with the database might not want/need to apply it?”
Can you give an example? I suspect your logic isn't sufficiently general, and that the Alphora folks could volunteer some specifics on their D4 language's capabilities in this arena.
- Eric
Dawn M. Wolthuis - 18 Feb 2004 17:28 GMT
Wow -- thanks, Eric! I appreciate your responses.
I'm sure you'll understand if it takes me a little time to address your points. Also, you are responding to a dialog from 2002 when I was starting to delve further into researching why the practical experience I had did not align with the database theory I had learned. I've learned a bit more in the past year (so I'm still capable of it ;-) and will accept several of your points without need to respond.
But, I have become more, rather than less, convinced that the relational model is not the most useful so I could still be classified as a SQL detractor (which many relational theorists are too) as well as a relational theory detractor (not that it isn't good as a theory, but that it doesn't yield productive development environments). So it is useful for me to try out my objections on interested parties who can correct my logic or agree that a particular topic is a matter of taste or the requirements being addressed.
Along with the fact that I'm probably more interested in theories that yield developer productivity than those which yield a more academic goal, the biggest areas of disagreement I have relate to
1. modeling propositions as relations rather than functions that represent graphs (mathematically)
2. referential integrity as well as type constraints and where these should be specified and enforced.
I'm sure there are other areas I can respond to as well. Thanks for your interest in my questions and your thoughtful responses. As you can see, my attempts to dialog with trees or dogs end up with me talkin' to myself and/or trying to determine whether I really am stupid or just ignorant ;-)
Cheers! --dawn
Eric Kaun - 18 Feb 2004 22:13 GMT
> Wow -- thanks, Eric! I appreciate your responses.
> [quoted text clipped - 4 lines]
> the past year (so I'm still capable of it ;-) and will accept several of
> your points without need to respond.
Not a problem, I know it's a huge posting.
> Along with the fact that I'm probably more interested in theories that yield
> developer productivity than those which yield a more academic goal,
I'll posit that at least in shops where I've worked, provable correctness is far more than an academic goal. In fact, the lack of such is the biggest productivity deterrent I see, hands down. Its handmaidens are closure and referential transparency, to which relational caters nicely.
> the biggest areas of disagreement I have relate to
> 1. modeling propositions as relations rather than functions that represent
> graphs (mathematically)
Please explain and/or give an example of such a function.
> 2. referential integrity as well as type constraints and where these should
> be specified and enforced.
This one I'll probably argue about more - suffice it to say that declaring them once and generating the necessary enforcement (e.g. on the client) will enable higher productivity and ensurable correctness. Where do you propose specifying and enforcing them?
- erk
Bob Badour - 18 Feb 2004 18:40 GMT
> This letter (monograph?) is in reply to Dawn Wolthuis, who received what
> she seems to have considered a short and unhelpful series of responses
[quoted text clipped - 11 lines]
> over 20 years now. I do see reason to resurrect this debate (pedagogy),
> but it needn't be the same people who do it generation after generation.
Good luck. I predict you will discover an inexhaustible supply of vociferous ignorami.
> And kudos to Dawn for always being willing to ask questions, explain
> herself, and to remain cheerful in the face of insults.
I predict you will quickly learn not to encourage the vociferous ignorami.
> * Dawn wrote: “...find that developers are so much more productive when
> working with a MultiValue database”
[quoted text clipped - 6 lines]
> It sounds as if the MV development environment (IDE, reporting tools,
> etc.) is to credit for productivity gains.
What gains? Her alleged productivity advantage is nothing but a myth. She is like a person who claims: "After making very careful measurements, we have determined that horses are more productive than cars. A man can ride a horse ten miles in far less time than he can push a car a similar distance."
> * Dawn wrote, on her example MultiValue flashcard: “They [relational
> databases] would also need features such as variable length data
[quoted text clipped - 3 lines]
> “Variable length data structures” are allowed by the relational model,
> which places no restrictions on types (aka domains).
As you notice, her allegation or assumption is false, which renders her entire point meaningless.
> * Dawn wrote: “Also, what is the theory that leads to strong typing and
> fixed lengths that are found in many relational databases?”
>
> The databases you referenced are not relational.
Again, it suffices to note that Dawn is ignorant and is burning a straw man.
> * Dawn wrote: “They [Don Nelson and Richard Pick] based the way the data
> was specified (which I'm terming the data model, but that might not be
[quoted text clipped - 3 lines]
> type definition, relation definition (including normalization), and
> constraints.
She refers to exposing every physical implementation detail to the most casual of users and to tying applications to specific physical artifacts. Only the profoundly ignorant can consider such a feature advantageous or productive compared to logical and physical independence.
> * Dawn wrote: “...multivalues crop up in the way people talk and think.”
>
> Perhaps,
I disagree. Sets crop up in the way people talk and think. Multivalues crop up in the way the cognitively damaged or mentally injured talk and think.
> * Dawn wrote: “It [MultiValued platforms] is old, yet could be revived
> as it provides an amazingly productive environment, perhaps because it
> is so forgiving and because it resembles XML to quite an extent.”
>
> I would take resemblance to XML to be a damning attribute until
> demonstrated otherwise...
It suffices to note Dawn's profound ignorance of the Great Debate happening nearly 30 years ago and that the debate proved pick sucks.
> * Dawn wrote: “It appears there are lower initial costs and lower
> ongoing costs for companies using the MV platform over a more standard
> RDBMS (Oracle, SQL Server, etc.)”
>
> For initial costs, I can't say
Again, it only appears that way to the intellectually crippled. Her quackery is no different from the homeopaths who think water remembers. I highly recommend _How We Know What Isn't So_ by Thomas Gilovich ISBN: 0029117062.
Otherwise, it suffices to note that Dawn is an ignorant quack.
> Dawn wrote: “Of what are we [those who would use XML] ignorant? Of some
> theory or practical advice?”
>
> Possibly the theory, or possibly of the practical value of the theory,
Every principle of sound data management and of sound application design.
> * Dawn wrote: “It is easy to understand why it is so common when we see
> that it [a 1-many relationship] is also a generalization of the
[quoted text clipped - 7 lines]
> all too frequently in referencing MultiValue, the noun/sentence
> distinction should strike a chord.
She is a vociferous ignoramus with an axe to grind. She is impervious to reason and logic.
> * Dawn used an example of people owning cars and bikes, and how in
> MultiValue the cars and bikes could be multi-valued attributes of the
> person.
>
> This might be OK
No, it's not. Search on "red blue car".
> * Dawn wrote: “...the cascading deletes for referential integrity are a
> no-brainer...”
>
> They are in relational too, and even in SQL.
Triggered operations of any kind, in fact.
> * Dawn wrote: “...switching cardinality of a data element in a
> maintenance phase (or post-design) of a project is very easy to do,
> breaking very little.”
>
> Only if the data element isn't involved in actual logic (e.g. the text
> examples like car names I usually see in Pick examples).
Search on "red blue car". Her assertion is fatuous and demonstrates her unwillingness to acknowledge the serious flaws in the model she espouses. Like every vociferous ignoramus, Dawn lives in a fantasy world of denial.
> * Dawn wrote: “The database tables could still be stored as in an RDBMS,
> but show themselves to the designers and developers in this more
> intuitive format.”
>
> Intuition is a poor basis for logical and data decisions
Any usability expert will tell you that intuition is complex and unpredictable. Any statement to the above effect requires careful empiricism for backing. In fact, this is true of any complex result of biological origin.
Frankly, Dawn is simply an ignorant making an absurd claim.
> * Dawn wrote: “...a column constraint specifying the maximum length of
> an element value might be easy for the computer to do, but doesn't make
[quoted text clipped - 3 lines]
> Agreed. The “type constraint” you mention isn't one; it's because the
> non-relational SQL doesn't offer real type definition
It suffices to note that Dawn is an ignorant tilting at the windmills of her imagination.
> * Dawn wrote: “Simplicity: All data are strings with additional type
> specifications for external purposes only. Not only simple to specify
> but to maintain!”
>
> This places the burden of maintenance in every program that uses your
> file/table;
It also exposes her deceit regarding productivity. By ignoring integrity and by discounting the cost of corruption, she pretends--in her own mind--that she can increase productivity as if the only measure that counts is the time until the first compilation that reports no errors instead of the time until the system runs correctly.
> * Dawn wrote: “No need to logically retrieve multiple tables when
> concerning yourself with a single proposition, simply because the
> proposition has conjunctive clauses.”
>
> The hierarchies that MultiValue allows you to establish aren't single
> propositions
It suffices to note that Dawn is an ignorant who fails to comprehend even what she, herself, writes.
> * Dawn wrote: “...but the only meat I can find is related to practical
> issues regarding integrity constraints – not enough to write off the
> entire MultiValue platform nor XML.”
>
> On the contrary: integrity constraints, and types, define not just the
> spine but the skeleton and muscle of your application.
Again, it suffices to note that Dawn is an ignorant making absurd, nonsensical statements.
> Dawn wrote: “...instead of a programmer coding some procedural code (!)
> specific to a certain circumstance, the logic is declared in the
> database, but designated as a local constraint where the next developer
> working with the database might not want/need to apply it?”
>
> Can you give an example?
If the data require the constraint, it doesn't matter whether the next application programmer who comes along finds the constraint convenient. The constraint keeps the fool from corrupting the data.
If requirements have changed, then it is much more productive to reflect that change in one central location, ie. the database, than in every application. If the change is such that it will require changing applications, it makes sense to centralise the error-detection logic to prevent costly mistakes in one application from damaging all the others.
Dawn is a chronic ignoramus whose nonsense does not warrant a reply.
Dawn M. Wolthuis - 18 Feb 2004 22:23 GMT
OK, I'll bite, but only for the purpose of entertaining myself and others who find this amusing.
<snip> <snip>
>Good luck. I predict you will discover an inexhaustible supply of vociferous ignorami.
I am surely ignorant, but not so ignorant that I believe I have nothing to learn from the perspectives of others and from asking questions when I'm perplexed, Bob. My ignorance doesn't compare to someone who thinks they have all of the answers, however.
> > And kudos to Dawn for always being willing to ask questions, explain
> > herself, and to remain cheerful in the face of insults.
>
> I predict you will quickly learn not to encourage the vociferous ignorami.
Employing sticks and stones rather than logical arguments is a long-standing technique in the bag of tricks used to dislodge women, among others, in business situations. It is a sub-category of the wider collection of intimidation techniques and is rarely as effective as some of the more subtle approaches. You should have learned long ago not to be a bully.
> > * Dawn wrote: "...find that developers are so much more productive when > > working with a MultiValue database" [quoted text clipped - 11 lines] > determined that horses are more productive than cars. A man can ride a horse > ten miles in far less time than he can push a car a similar distance." I will be the very first to state that I do not have enough emperical data to support this -- it is what I have found in my experience and when I compare with others, there are many such anecdotes. That is not conclusive. Given that most (all?) benchmarks for databases these days require that the database be SQL-based, it isn't even easy to get comparisons on what should be somewhat straight-forward to measure between the implementations of relational theory and that of PICK.
There are some facts that perhaps we could measure at some point related to the total number of software developers required to write and also to support systems with similar functionality, but whoever loses will make arguments about the differences in functionality, claiming that the additional resources required are related to an equivalent gain for the business. So, how would you propose testing out a hypothesis, such as mine, that non-1NF implementations that are not based on the relational model, such as PICK, provide a bigger bang for the buck for the entity paying the bills than do the current implementations that call themselves RDBMS's?.
> > * Dawn wrote, on her example MultiValue flashcard: "They [relational > > databases] would also need features such as variable length data [quoted text clipped - 6 lines] > As you notice, her allegation or assumption is false, which renders her > entire point meaningless. You can say that I'm wrong, but you have given no proof.
> > * Dawn wrote: "Also, what is the theory that leads to strong typing and > > fixed lengths that are found in many relational databases?" > > > > The databases you referenced are not relational. > > Again, it suffices to note that Dawn is ignorant and is burning a straw man. I've already agreed that I am not all-knowing, but I'm addressing both the theory and implementations of RDBMS -- to what straw man are you referring? Even if RDBMS's do not require fixed length fields, for example, the % of variable length fields is quite low, I suspect. I do not have hands-on experience with more than ten applications implemented in an RDBMS, however, so perhaps I'm wrong. If my assumptions are incorrect, I'm more than happy to be corrected.
> > * Dawn wrote: "They [Don Nelson and Richard Pick] based the way the data > > was specified (which I'm terming the data model, but that might not be [quoted text clipped - 8 lines] > Only the profoundly ignorant can consider such a feature advantageous or > productive compared to logical and physical independence. As I type this, the person on CNN just said "that's comically pompous". It rolled off his tongue so well, I'll use it here. Is it even mildly interesting that IBM is pushing their Informix users to DB2, but is retaining their U2 users in U2? Why? Dollars. So, I'm apparently not the only "profoundly ignorant" person out there.
> > * Dawn wrote: "...multivalues crop up in the way people talk and think." > > > > Perhaps, > > I disagree. Sets crop up in the way people talk and think. Multivalues crop > up in the way the cognitively damaged or mentally injured talk and think. Is it time for me to say "Shut up, Bob" yet?
> > * Dawn wrote: "It [MultiValued platforms] is old, yet could be revived > > as it provides an amazingly productive environment, perhaps because it [quoted text clipped - 5 lines] > It suffices to note Dawn's profound ignorance of the Great Debate happening > nearly 30 years ago and that the debate proved pick sucks. I have studied both the history of PICK and the history of SQL and RDBMS's to varying degrees. I'm aware of the Bachman/Codd debates as well as challenges to SQL by QBE, for example. I also know that Pick and Codd had no fondness for the thinking of the other. I am unaware of any proof that "pick sucks" however -- please enlighten me (and IBM, for that matter).
> > * Dawn wrote: "It appears there are lower initial costs and lower > > ongoing costs for companies using the MV platform over a more standard [quoted text clipped - 5 lines] > is no different from the homeopaths who think water remembers. I highly > recommend _How We Know What Isn't So_ by Thomas Gilovich ISBN: 0029117062. You have probably figured out that in spite of being appalled by your lack of basic manners in such a discourse, I am amused at being called "intellectually crippled" and the like. I'm thinking that if I'm now stupid (or perhaps always was) then maybe now I can be good looking (I sort of figured I had to choose one and I was told I was smart more often than beautiful, so ...).
> Otherwise, it suffices to note that Dawn is an ignorant quack. Ignorant, yes -- a "quack" -- nope, guess again.
<snip>
> She is a vociferous ignoramus with an axe to grind. She is impervious to > reason and logic. I have no axe to grind, nor financial investment to protect in this regard. I'm curious and rational and would like to get a better understanding related to relational and non-relational databases -- both theory and implementation.
I would like to hear one statement of reason or logic, along with the axioms from which you think it arises, that you believe I disagree with. I don't think I'm incapable of following reason. I'm teaching two weeks of Calculus in a few weeks to fill in for a paternity leave and haven't taught it in 20 years. However, I can still prove theorems related to continuity using either the analyst's tools of epsilon-delta proofs or the logician's non-standard analysis tools that include definitions of infinitesimals. I suspect that few illogical people could do this. I might have some brain blips in that peri-menopause state, but if you pass me some specific reason or logic to which you believe I am impervious, I will attempt to understand it.
> > * Dawn used an example of people owning cars and bikes, and how in > > MultiValue the cars and bikes could be multi-valued attributes of the [quoted text clipped - 3 lines] > > No, it's not. Search on "red blue car" Gotta admit I miss your point here, Bob.
It is typically considerably easier to query non-1NF structures than to use SQL on anything. Here's a common type of query:
LIST STUDENTS WITH EVERY MAJOR NOT EQUAL "MATH"
Think this easy query through in your typical RDBMS (SQL) implementation. This is not an isolated case.
And it isn't just single fields that can be nested, but fields can be grouped together and nested as a "nested function" or "nested relation" (if you prefer).
<snip>
> > Intuition is a poor basis for logical and data decisions But a good basis for a hypothesis
<snip>
> It also exposes her deceit regarding productivity. By ignoring integrity and > by discounting the cost of corruption, she pretends--in her own mind--that > she can increase productivity as if the only measure that counts is the time > until the first compilation that reports no errors instead of the time until > the system runs correctly. Finally, I almost stopped reading, but here you actually have some meat. I disagree with your statement and will address this when responding to Eric
<snip>
> Dawn is a chronic ignoramus whose nonsense does not warrant a reply. and yet you just can't help yourself, can you, Bob? and this time I gave you the satisfaction of a reply too, which might not have been the better part of wisdom, but I felt like it. So, have a good day and don't forget to smile a little.
--dawn
Eric Kaun - 19 Feb 2004 13:50 GMT > OK, I'll bite, but only for the purpose of entertaining myself and others > who find this amusing. *sigh* And I thought I was going to be able to resurrect the debate sans flaming. Oh well.
> It is typically considerably easier to query non-1NF structures than to use > SQL on anything. Here's a common type of query: [quoted text clipped - 3 lines] > Think this easy query through in your typical RDBMS (SQL) implementation. > This is not an isolated case. So how would this "look" in Pick? Are your assumptions that a student has multiple majors? Once I see what you're getting at, I (we?) can respond with relational counterparts. This doesn't look especially troubling, even for SQL. I'm assuming (until I hear otherwise) that you'd have a Student relation, a Major relation, and a StudentHasMajor relation. Depending on how you "knew" it was "Math", you could omit the Major relation from your query. Join Student to StudentHasMajor, limit based on major, and project over student ID (or whatever).
In Pick I'm guessing you'd say you have just 1 file, with a list of majors as an attribute. But consider the following: 1. To what would you attach, for example, requirements for a major? Surely a major is more than a piece of text? 2. To what would you attach, for example, the date that the student picked up (or completed!) a major? 3. Do I really need to loop through every student in the file to determine how many math majors there are?
And many such others.
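[Editorial sketch] The three-relation design Eric assumes (Student, Major, StudentHasMajor) can be written out with Python's built-in sqlite3 module. All column names, the `requirements` and `declared_on` attributes and the sample rows are assumptions made for illustration; the thread only names the relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Question 1: requirements attach to the major itself.
CREATE TABLE major (subject TEXT PRIMARY KEY, requirements TEXT NOT NULL);

-- Question 2: the date attaches to the student/major pairing.
CREATE TABLE student_has_major (
    student_id  INTEGER NOT NULL REFERENCES student(id),
    subject     TEXT    NOT NULL REFERENCES major(subject),
    declared_on TEXT    NOT NULL,
    PRIMARY KEY (student_id, subject)
);

INSERT INTO student VALUES (1, 'Joan'), (2, 'Raj');
INSERT INTO major VALUES ('MATH', '40 credits'), ('PHIL', '36 credits');
INSERT INTO student_has_major VALUES
    (1, 'MATH', '2002-09-01'),
    (1, 'PHIL', '2003-09-01'),
    (2, 'PHIL', '2003-01-15');
""")

# Question 3: counting math majors reads only the pairing relation.
math_majors = conn.execute(
    "SELECT COUNT(*) FROM student_has_major WHERE subject = 'MATH'"
).fetchone()[0]

joan_majors = conn.execute(
    "SELECT subject, declared_on FROM student_has_major"
    " WHERE student_id = 1 ORDER BY subject"
).fetchall()
```

Whether the count is answered by a scan or an index is a physical matter left to the DBMS; the query itself does not mention students at all.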
> And it isn't just single fields that can be nested, but fields can be > grouped together and nested as a "nested function" or "nested relation" (if > you prefer). Can you explain further? What does the grouping mean, and how does a function figure into this?
- Eric
Bob Badour - 19 Feb 2004 16:14 GMT > > OK, I'll bite, but only for the purpose of entertaining myself and others > > who find this amusing. [quoted text clipped - 12 lines] > > So how would this "look" in Pick? That was Pick. What Dawn omits is the same query might ask different questions depending on the physical file structure. See "red blue car"
In D:
STUDENTS WHERE ( MAJOR WHERE MAJOR = 'MATH' ) = TABLE_DUM
or:
STUDENTS WHERE NOT ( 'MATH' IN MAJOR )
In SQL:
SELECT * FROM STUDENTS WHERE 'MATH' != ALL ( SELECT SUBJECT FROM MAJOR WHERE STUDENTS.ID = MAJOR.STUDENT_ID )
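[Editorial sketch] For readers who want to run the SQL form: SQLite, to my knowledge, does not implement the ALL quantifier, so the sketch below rewrites `'MATH' != ALL (...)` as NOT EXISTS, which gives the same rows as long as SUBJECT is never NULL. The table layout follows the STUDENTS and MAJOR names in the query above; the `name` column and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE major (student_id INTEGER NOT NULL, subject TEXT NOT NULL);

INSERT INTO students VALUES (1, 'Joan'), (2, 'Raj'), (3, 'Lee');
-- Joan has two majors, Raj has one, Lee has none.
INSERT INTO major VALUES (1, 'MATH'), (1, 'PHIL'), (2, 'PHIL');
""")

# "Every major is not MATH" is the same as "no major is MATH".
no_math = conn.execute("""
    SELECT name FROM students
    WHERE NOT EXISTS (
        SELECT 1 FROM major
        WHERE major.student_id = students.id
          AND major.subject = 'MATH'
    )
    ORDER BY id
""").fetchall()
```

Note that Lee, who has no majors at all, qualifies: a universal condition over an empty set is true. Any comparison between query languages needs to say what each one does in that case.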
Dawn is a vociferous ignoramus. First, she ignores how easy the 'challenge' is in a truly relational language. Second, she demonstrates profound ignorance of the importance of precision and explicitness. Terse is not necessarily good. Especially if terse leads to a reduction in expressiveness as it does in Pick or if it obfuscates meaning as it does in Pick or if it increases the likelihood of difficult to discover errors as it does in Pick.
For your own benefit, I suggest you assume all Pick users--much like all crack users--have been cognitively damaged by their use. Thus far, I have seen no evidence of any Pick user who has survived unscathed.
> Are your assumptions that a student has > multiple majors? The WITH keyword anticipates either multiple values or multiple pointers to a file of majors.
> In Pick I'm guessing you'd say you have just 1 file, with a list of majors > as an attribute. Not necessarily. One might. One might have pointers to a file of majors with the join hard-coded in the dictionary for all users. Users have no ability to express joins unless hard-coded in the dictionary. Alternatively, one might have multiple mv attributes where values might or might not be associated by physical order.
Dawn M. Wolthuis - 19 Feb 2004 17:24 GMT <snip>
> > It is typically considerably easier to query non-1NF structures than to > use [quoted text clipped - 13 lines] > Join Student to StudentHasMajor, limit based on major, and project over > student ID (or whatever). First note that the corresponding SQL statement is not complicated for a SQL coder, but notice how English-like the PICK counterpart sounds. Compare:
LIST STUDENTS WITH EVERY MAJOR NOT EQUAL "MATH"
with the SQL corresponding statement that Bob wrote for this:
SELECT * FROM STUDENTS WHERE 'MATH' != ALL ( SELECT SUBJECT FROM MAJOR WHERE STUDENTS.ID = MAJOR.STUDENT_ID )
Think of a "file" as a function that maps an identifier to a set of attributes. For example, the STUDENTS function could be
STUDENT(identifier)={string-of-attribute-data-with-delimiters}
This string of data could be:
Joan<field-delimiter>
Doe<field-delimiter>
6165551234<value-delimiter>7615552222<field-delimiter>
MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter>
There are also <record-delimiter> <file-delimiter>
and one can define delimiters to any desired level, with typically at least this many included in the packaged functions for the database implementation
Then associate this with vocabulary functions such as
FirstName(STUDENTS, identifier) = string-field-in-location-1
SecondaryPhoneNumber(STUDENTS, identifier) = string-value-in-field-3-value-2
So it is the vocabulary functions that make the queries very easy for users. A vocabulary entry can contain many other types of functions, including those that reach to any other files within the system.
So, the details about a major, for example, would be in a subject file such as MAJORS, which would also be a function
MAJORS(identifier) = {string}
Then a vocabulary entry could be defined for students such as
MajorRequirement(STUDENTS, identifier) = MAJORS(StudentMajor, field n)
So, everything is defined in terms of functions including stored data and any other vocabulary (for stored or virtual fields)
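[Editorial sketch] The "file as a function from identifier to delimited string" description can be modelled in a few lines of Python. This is an illustration of the idea in the post, not Pick itself: the delimiter characters are assumptions (classic Pick systems are usually described as using CHAR(254), CHAR(253) and CHAR(252)), the record key "1001" is invented, and the helper names are hypothetical.

```python
# Assumed delimiter characters; any three distinct characters would work here.
FM, VM, SVM = chr(254), chr(253), chr(252)

# STUDENTS(identifier) = delimited string, as in the post.
STUDENTS = {
    "1001": FM.join([
        "Joan",
        "Doe",
        VM.join(["6165551234", "7615552222"]),
        VM.join([SVM.join(["MATH", "2002"]), SVM.join(["PHIL", "2003"])]),
    ]),
}


def extract(record, field, value=None, subvalue=None):
    """Return a field, value or sub-value; positions are 1-based."""
    part = record.split(FM)[field - 1]
    if value is not None:
        part = part.split(VM)[value - 1]
        if subvalue is not None:
            part = part.split(SVM)[subvalue - 1]
    return part


# "Vocabulary functions" are plain functions over (file, identifier).
def first_name(file, key):
    return extract(file[key], 1)


def secondary_phone_number(file, key):
    return extract(file[key], 3, 2)


def majors(file, key):
    return [v.split(SVM)[0] for v in extract(file[key], 4).split(VM)]


def every_major_not_equal(file, subject):
    """Rough analogue of: LIST STUDENTS WITH EVERY MAJOR NOT EQUAL "MATH"."""
    return [k for k in file if all(m != subject for m in majors(file, k))]
```

For example, `secondary_phone_number(STUDENTS, "1001")` picks value 2 of field 3. How a real Pick system treats an empty multivalued field under EVERY is not modelled here.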
> In Pick I'm guessing you'd say you have just 1 file, with a list of majors > as an attribute. But consider the following: [quoted text clipped - 14 lines] > Can you explain further? What does the grouping mean, and how does a > function figure into this? I think the above example shows this. Please let me know if I should clarify anything here. Thanks. --dawn
> - Eric
Mikito Harakiri - 19 Feb 2004 18:10 GMT > This string of data could be: > Joan<field-delimiter> > Doe<field-delimiter> > 6165551234<value-delimiter>7615552222<field-delimiter> > MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> How about:
<sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25
Am I expert Pick programmer already?
Dawn M. Wolthuis - 19 Feb 2004 18:38 GMT > > This string of data could be: > > Joan<field-delimiter> > > Doe<field-delimiter> > > 6165551234<value-delimiter>7615552222<field-delimiter> > > MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> > > How about: > <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > Am I expert Pick programmer already? You got it -- simple, right? It's just a function that represents a graph. We could call it a web! But on the outside to the user, it is a vocabulary with functions (also part of the vocabulary).
How do you decide whether info is a sub-table of an existing table or should go in its own table? If the information is functionally dependent. My phone numbers are information that one might think of as being in the relationship between me and the telecom industry. But my phone numbers have no meaning apart from me -- sure they are phone numbers that exist in the world, but the point of capturing the data is to capture information about a person. So, don't stick them in some other function (aka file) -- put them with the person, even if there are more than one of them.
Make sense? --dawn
Mikito Harakiri - 19 Feb 2004 20:37 GMT > > <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > [quoted text clipped - 3 lines] > We could call it a web! But on the outside to the user, it is a vocabulary > with functions (also part of the vocabulary). We must hire somebody in this group to do subtitles for humor impaired.
Bob Badour - 19 Feb 2004 20:46 GMT > > > <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > > [quoted text clipped - 9 lines] > >[It was a joke dumbass!] How can everything be a vocabulary? I thought everything was an object! Does this mean vocabulary is synonymous with object?
[Pick folks are as flaky and stupid as object folks.]
[Closed captioning for the mentally impaired brought to you by the letter D and the number 9.]
Mikito Harakiri - 20 Feb 2004 21:47 GMT > > > > <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > > > [quoted text clipped - 17 lines] > [Closed captioning for the mentally impaired brought to you by the letter D > and the number 9.] Let me express myself little bit more politely. Yes, 1NF could be considered not quite as sound theoretical basis. But, what is your idea? Files and delimiters? Any database researcher would stop reading your manuscript right there.
Dawn M. Wolthuis - 21 Feb 2004 00:17 GMT > <snip> > Let me express myself little bit more politely. Yes, 1NF could be considered > not quite as sound theoretical basis. But, what is your idea? Files and > delimiters? Any database researcher would stop reading your manuscript right > there. Yes, it is simply a key-value -- that is function(key) = value type of data storage. In fact, it makes more sense to call it a file structure than a database. When I first saw PICK (as a manager at a new place of employment) I remarked "that is NOT a database"! I don't care what we call it, but I wouldn't want to go back to the DBMS's that my teams and I had worked with in the past. It would simply not be stewardly (in terms of dollars and people time) to do so. But I'll keep working at determining whether this is a fluke or whether there really is a good logical reason for non-relational data storage to be advisable in more instances than, perhaps, a relational model is appropriate.
Thanks for engaging. I'm still a student and not trying to sound like I have all the answers -- just a bunch of questions after considerable research and experience. Cheers! --dawn
Mikito Harakiri - 21 Feb 2004 01:13 GMT > Yes, it is simply a key-value -- that is function(key) = value type of data > storage. In fact, it makes more sense to call it a file structure than a [quoted text clipped - 6 lines] > data storage to be advisable in more instances than, perhaps, a relational > model is appropriate. Dawn,
Relational is not about the cost. It is not about storage either. Some folks say it's about data management, but I would disagree. It's just a high level programming model. Unless, you demonstrate some innovative Pick methods in that area, you'll have hard time finding good listeners here on cdt.
Speaking about cheap solutions, there are open source databases...
Dawn M. Wolthuis - 21 Feb 2004 01:37 GMT > > Yes, it is simply a key-value -- that is function(key) = value type of > data [quoted text clipped - 19 lines] > > Speaking about cheap solutions, there are open source databases... Mikito -- I might not have been clear about the theory side on this. Here is my dilemma -- I have studied database theory and I have used many databases and data storage approaches. It seemed to me that there are many folks who have been taught or who believe that the relational data model actually leads to a better solution in data quality, database maintenance over time, etc. That is, it seemed from my studies that a company would be the best steward of their financial resources if they were to employ a relational database.
However, my experience tells me that the implementations of the relational model "seem to" (I admit I have no concrete proof of this) be more costly, without corresponding benefits, to the corporate owner.
If the relational model is not intended to yield a better solution, when taking into consideration all factors, than a non-relational model, then I could care less about it -- would you still care about it then?
So, let's look at the big picture of requirements for an application that includes data storage -- if we look at the overall cost of ownership (including data quality, ongoing support costs etc) of an RDBMS is it lower or higher than the implementations of other models such as PICK. My hypothesis is that it is more expensive (often considerably more) to employ an RDBMS. Is this irrelevant? I don't think so -- I think it tosses into question what the purpose of the theory is in the first place.
So, what is our goal in having a good theory of how to store and retrieve data? --dawn
Mikito Harakiri - 21 Feb 2004 02:07 GMT > Mikito -- I might not have been clear about the theory side on this. Here > is my dilemma -- I have studied database theory and I have used many [quoted text clipped - 4 lines] > the best steward of their financial resources if they were to employ a > relational database. I have been on nonrelational database implementation side as well. But nobody on this group could care less what my relational experience is, let alone nonrelational.
> However, my experience tells me that the implementations of the relational > model "seem to" (I admit I have no concrete proof of this) be more costly, > without corresponding benefits, to the corporate owner. Let managers worry about the cost, and, as technical people, let us be fascinated with technology.
> If the relational model is not intended to yield a better solution, when > taking into consideration all factors, than a non-relational model, then I [quoted text clipped - 10 lines] > So, what is our goal in having a good theory of how to store and retrieve > data? --dawn The purpose of the theory is leading industry to high-tech solutions, rather than surrendering to chaos of ad hoc approaches.
Dawn M. Wolthuis - 21 Feb 2004 02:39 GMT > > Mikito -- I might not have been clear about the theory side on this. Here > > is my dilemma -- I have studied database theory and I have used many [quoted text clipped - 17 lines] > Let managers worry about the cost, and, as technical people, let us be > fascinated with technology. I was once told that there are people who think of cars as fascinating machines and those who think of cars as (perhaps fascinating) machines that get us from one place to another. This person then said that we call the first group "men" and the second group "women". I certainly disagree with that as a blanket statement, but I suspect there could be empirical data to suggest something similar related to computers.
I have no fascination for technology outside of how it provides improvements for people or creation. I do have a love of mathematics for mathematics sake, however. And if that is what is going on with discussions of relational theory, then continue the game, by all means. There is no need to have implementations of the theory, however, unless it is useful. Surely implementations of relational databases are useful and have been accepted by IT professionals. However, some of the reasons they are employed have to do with the false notion that there is something about relational theory that is "more mathematical" or more orthodox or pure. Companies then employ these beasts thinking they are more cost-effective.
So, if we are tinkering with cars in our garage, count me out -- I just don't care. If we are telling people that cars are better when they use more gas, and if people then believe these claims and buy bigger and "better" cars, then I do care and I feel a need to speak up and say it ain't so.
> > If the relational model is not intended to yield a better solution, when > > taking into consideration all factors, than a non-relational model, then I [quoted text clipped - 15 lines] > The purpose of the theory is leading industry to high-tech solutions, rather > than surrendering to chaos of ad hoc approaches. The purpose of tinkering with our car is to have really cool tires, bumpers, etc? I sure hope not. I think Codd really thought he was coming up with a better theory that would translate into something good in database implementations. There are good things about the relational model and its implementations and there are failures as well. There is nothing anointing the relational database model from above as being the best approach to managing data in a software application.
So, testing out various database theories and finding the pros and cons of each as it relates to the actual USE of products that attempt to implement a theory seems like what one might want to do with a discussion related to database theory. A theory related to serial killers that doesn't actually help us find serial killers is just not interesting to me. Make sense? Thanks. --dawn
Mikito Harakiri - 21 Feb 2004 03:05 GMT > I was once told that there are people who think of cars as fascinating > machines and those who think of cars as (perhaps fascinating) machines that > get us from one place to another. This person then said that we call the > first group "men" and the second group "women". ;-)
> I certainly disagree with > that as a blanket statement, but I suspect there could be empirical data to > suggest something similar related to computers. You didn't have to spoil the effect of the previous paragraph with this sentence.
> I have no fascination for technology outside of how it provides improvements > for people or creation. I do have a love of mathematics for mathematics [quoted text clipped - 6 lines] > is "more mathematical" or more orthodox or pure. Companies then employ > these beasts thinking they are more cost-effective. No, people who tinker with things sometimes come up with very effective solutions. Hint: L. Torvalds. [Subtitles: I don't mean they should necessarily be role models, either]
> So, if we are tinkering with cars in our garage, count me out -- I just > don't care. If we are telling people that cars are better when they use > more gas, and if people then believe these claims and buy bigger and > "better" cars, then I do care and I feel a need to speak up and say it ain't > so. No, if people tell you they want to save the world, that most often is b**t.
> The purpose of tinkering with our car is to have really cool tires, bumpers, > etc? I sure hope not. I think Codd really thought he was coming up with a [quoted text clipped - 8 lines] > theory seems like what one might want to do with a discussion related to > database theory. No, you don't have to spend your lifetime at customer site in order to be useful. You can't possibly have all time in the universe to check out all the crank theories out there.
Dawn M. Wolthuis - 21 Feb 2004 03:18 GMT > > I was once told that there are people who think of cars as fascinating > > machines and those who think of cars as (perhaps fascinating) machines [quoted text clipped - 11 lines] > You didn't have to spoil the effect of the previous paragraph with this > sentence. Sorry to tease you with a little humor and then toss you into some pseudo-feminist garbage, Mikito -- my favorite table tennis technique is side to side. ;-) I won't comment on any of my other games. smiles. --dawn
Eric Kaun - 23 Feb 2004 12:29 GMT > However, my experience tells me that the implementations of the relational > model "seem to" (I admit I have no concrete proof of this) be more costly, > without corresponding benefits, to the corporate owner. Fair enough. I've had (and continue to have) the opposite experience, for example: - poorly-normalized databases causing data integrity problems and query nightmares - "normalizable" business logic stuffed into baroque procedural code - new business requirements forcing database reorganization because of poor decisions which were easy to see as such even at the time they were done
Relational, done right, would yield practical value. SQL, done as "relationally as possible", yields some practical value.
Obviously, then, our mileage varies...
> So, what is our goal in having a good theory of how to store and retrieve > data? --dawn Theory is fun for its own sake, but in the case of relational is also intended to help people at a practical level. Codd developed it in direct response to many, many problems with network and hierarchic databases.
- erk
Dawn M. Wolthuis - 19 Feb 2004 21:45 GMT > > > <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > > [quoted text clipped - 7 lines] > > We must hire somebody in this group to do subtitles for humor impaired. So you are unable to see the humor in my responses, Mikito? Rest assured that I can tell when I'm responding to tongue-in-cheek responses. smiles. --dawn
Eric Kaun - 19 Feb 2004 22:01 GMT > "Mikito Harakiri" <mikharakiri@iahu.com> wrote in message
> You got it -- simple, right? It's just a function that represents a graph. > We could call it a web! But on the outside to the user, it is a vocabulary > with functions (also part of the vocabulary). Isn't it a tree or hierarchy rather than a graph?
> How do you decide whether info is a sub-table of an existing table or should > go in its own table? If the information is functionally dependent. Normalization is based entirely on this same principle, so this should be an interesting point of comparison.
> My phone numbers are information that one might think of as being in the > relationship between me and the telecom industry. But my phone numbers have > no meaning apart from me -- sure they are phone numbers that exist in the > world, but the point of capturing the data is to capture information about a > person. The phone numbers could have a great deal of meaning, and nesting them limits you unnecessarily. I can think of much better examples from my experience, but this one is more obvious and accessible. If I wanted to keep, for example, a phone record for each employee (what outside numbers they called), I'd then have to move the data, right? Since they're "only" attributes, the moment I want to enrich them (i.e. I think of any other predicate that involves them, other than "Joe Blow has phone number 123-456-7890"), I need to change my model.
Functional dependency as you've illustrated it (not that espoused in relational theory / first-order predicate logic) is in the eye of the beholder, and in any system I've ever worked on, that changes. Different people see different parts of the DB, and different people consider different predicates ("entities", if you must) to be central. The local telecom guy might want an application to look at phone usage, and to enable multiple people to share a line - then the number itself (actually a connection, not necessarily a number) becomes "central."
In relational, your model needn't change. And you haven't spent much time setting it up right, in my opinion - we're talking about a few extra tables for a huge flexibility advantage.
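[Editorial sketch] The "few extra tables" argument can be made concrete with Python's built-in sqlite3 module. The table and column names, the shared line and the call record are all assumptions for illustration. The point shown is narrow: a later requirement (recording outgoing calls) is met by adding a relation, without altering the existing ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Initial design: phone numbers are facts in their own right,
# linked to people rather than nested inside them.
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE phone (number TEXT PRIMARY KEY);
CREATE TABLE person_phone (
    person_id INTEGER NOT NULL REFERENCES person(id),
    number    TEXT    NOT NULL REFERENCES phone(number),
    PRIMARY KEY (person_id, number)
);

INSERT INTO person VALUES (1, 'Joe Blow'), (2, 'Ann Lee');
INSERT INTO phone VALUES ('123-456-7890');
-- Two people share one line.
INSERT INTO person_phone VALUES (1, '123-456-7890'), (2, '123-456-7890');
""")

# Later requirement: a phone record. Existing tables are left untouched.
conn.executescript("""
CREATE TABLE outgoing_call (
    number    TEXT NOT NULL REFERENCES phone(number),
    dialed    TEXT NOT NULL,
    placed_at TEXT NOT NULL
);
INSERT INTO outgoing_call VALUES ('123-456-7890', '555-0100', '2004-02-19T10:00');
""")

# The telecom view: who shares this line, and how many calls did it place?
sharers = conn.execute(
    "SELECT person.name FROM person"
    " JOIN person_phone ON person.id = person_phone.person_id"
    " WHERE person_phone.number = '123-456-7890' ORDER BY person.name"
).fetchall()
calls = conn.execute(
    "SELECT COUNT(*) FROM outgoing_call WHERE number = '123-456-7890'"
).fetchone()[0]
```

Neither the person-centred view nor the line-centred view is privileged by the schema; each is just a different query over the same relations.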
> Make sense? --dawn Not yet...
- erk
Dawn M. Wolthuis - 19 Feb 2004 23:55 GMT > > "Mikito Harakiri" <mikharakiri@iahu.com> wrote in message > [quoted text clipped - 5 lines] > > Isn't it a tree or hierarchy rather than a graph? A tree is a special case of a graph. Yes, this can be modeled as a tree graph or it can be modeled as a graph that is not a tree, with or without cycles, once the links are made more explicit (and they are just functions and represent connections from one node to another). It is a di-graph (directed graph).
> > How do you decide whether info is a sub-table of an existing table or > should > > go in its own table? If the information is functionally dependent. > > Normalization is based entirely on this same principle, so this should be an > interesting point of comparison. Yes, indeed. In fact, data normalization without 1NF works from my perspective -- it is 1NF that is very flawed. Having the definition of each subsequent normal form include that the data must be in 1NF has a domino effect of corrupting the whole lot of these rules, however.
> > My phone numbers are information that one might think of as being in the > > relationship between me and the telecom industry. But my phone numbers [quoted text clipped - 12 lines] > predicate that involves them, other than "Joe Blow has phone number > 123-456-7890"), I need to change my model. Oddly enough, this is where the model excels -- first of all, in your example, outgoing calls (rather than "phone numbers") is quite a different matter and could very well warrant a separate table without changing anything about what the phone numbers (and e-mail addresses for that matter) are that refer to this person.
Additionally, it is changes to the data that are very easy in PICK. If you need to change the cardinality of any field (column-ish) in any file (table-ish), you just do it. Sometimes nothing at all need changing to correspond, but often an input screen needs a field attribute changed along with the field attribute in the file. Sure it would be better if these were in synch, and I'm sure some tools have that, but it is not standard in implementations. Anyway, if you have a report that asks for a person and their phone numbers when the phone number was single-valued (which it was in many systems in the 70's and 80's) and then you permit multiple values for the phone number, the report prints out the new phone numbers too, without any changes.
> Functional dependency as you've illustrated it (not that espoused in > relational theory / first-order predicate logic) is in the eye of the [quoted text clipped - 4 lines] > multiple people to share a line - then the number itself (actually a > connection, not necessarily a number) becomes "central." Exactly! This idea of democracy of data is silly - data has meaning. If it didn't, then there would be no attributes for entities -- everything would be a top level entity. If the phone company system is added to the database then a link between their phone numbers and the people who have them could be made without harming anything at all (add a link to the person web page to navigate you to the info for the telco about that phone number and a link set to the telco system that shows all people associated with a phone number if desired)
> In relational, your model needn't change. And you haven't spent much time > setting it up right, in my opinion - we're talking about a few extra tables [quoted text clipped - 5 lines] > > - erk Well, did that help or not? --dawn
Marshall Spight - 21 Feb 2004 16:40 GMT > Additionally, it is changes to the data that are very easy in PICK. If you > need to change the cardinality of any field (column-ish) in any file [quoted text clipped - 7 lines] > the phone number, the report prints out the new phone numbers too, without > any changes. It strikes me that this paragraph describes features of Pick's GUI builder integration. Now, GUI builder integration is a good thing, but I don't believe it has anything to do with the qualities of the data model.
I think we should take care to separate our discussions of data models and application integration issues.
Marshall
Dawn M. Wolthuis - 21 Feb 2004 16:53 GMT > > Additionally, it is changes to the data that are very easy in PICK. If you > > need to change the cardinality of any field (column-ish) in any file [quoted text clipped - 14 lines] > I think we should take care to separate our discussions of data models > and application integration issues. Nope -- nothing to do with any GUI, but it does have to do with a database retrieval language -- not SQL. To the extent that SQL discussions have something to do with relational database modeling, the query language that has gone by so many names that it is unknown by any (started out as GIRLS in the 60's and now includes JQL for jBASE, UniQuery for UniData, Retrieval for UniVerse, etc) is a way of talking about the data model underlying PICK. Make sense?
--dawn
Bob Badour - 21 Feb 2004 17:16 GMT > > Additionally, it is changes to the data that are very easy in PICK. If you > > need to change the cardinality of any field (column-ish) in any file [quoted text clipped - 14 lines] > I think we should take care to separate our discussions of data models > and application integration issues. The really astounding thing about Dawn's paragraph above is the "you just do it" part. As I explained at length and ad nauseam in the "red blue car" thread, when you just do it, you just change the meaning of existing queries.
Dawn M. Wolthuis - 21 Feb 2004 17:35 GMT > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > news:c13ieq$27a$1@news.netins.net... [quoted text clipped - 29 lines] > thread, when you just do it, you just change the meaning of existing > queries. Bob -- I'll admit I do not understand your red blue car issue. Could you spell it out for me? I would appreciate it. thanks. --dawn
Dave Rolsky - 22 Feb 2004 06:50 GMT > How do you decide whether info is a sub-table of an existing table or should > go in its own table? If the information is functionally dependent. My [quoted text clipped - 4 lines] > person. So, don't stick them in some other function (aka file) -- put them > with the person, even if there are more than one of them. So you've never encountered two people who share the same phone number?
-dave
/*======================= House Absolute Consulting www.houseabsolute.com =======================*/
Dawn M. Wolthuis - 22 Feb 2004 15:05 GMT > > How do you decide whether info is a sub-table of an existing table or should > > go in its own table? If the information is functionally dependent. My [quoted text clipped - 8 lines] > > -dave Sure, but what do you want to do, Dave -- get a random identifier for each phone number and then, since people have more than one, have a link table that links people with each of the keys to their phone numbers? I guess that might make sense to someone in the RDBMS world, but step back a minute and look at that -- the not-terribly-technical-term "silly" comes to my mind.
smiles. --dawn
Marshall Spight - 23 Feb 2004 00:44 GMT > Sure, but what do you want to do, Dave -- get a random identifier for each > phone number and then since people have more than one have a link tabke that > links people with each of the keys to their phone numbers? I guess that > might make sense to someone in the RDBMS world, but step back a minute and > look that -- the not-terribly-technical-term "silly" comes to my mind. That argument may carry some weight when the data type involved, a phone number, is approximately the same size as a foreign key. For example, one could imagine using the phone number itself as the key to the phone numbers table (it is unique, after all) and at that point, there's no value to the association table any more.
But that argument stops working as soon as the amount of information grows a bit. Consider even something as simple as addresses. They aren't all that much more complicated than a phone number: line1, line2, city, state, zip. Repeating that in the person record for each person at the house isn't efficient, and it makes correcting typographic errors more error-prone. (One can imagine each person in the house having the same address but with the street spelled differently.)
For my personal experience, the day someone explained association tables to me was the first day that I began to think that the database world might really have something interesting to say. (This was some time ago, but I still remember the exact moment of realization.) It is a significant achievement, and I know of no other system that has something that handles many:many relationships as well. Certainly no OO language, for all the emphasis on container classes, has ever handled them as well.
Marshall
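The association-table argument above can be sketched concretely. What follows is a minimal illustration using Python's built-in sqlite3 module, not anything posted in the thread; the table names, column names, and sample rows are all hypothetical. It stores one address once, links two people to it, and corrects a street typo in a single place.

```python
import sqlite3

# In-memory database; all names and rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE addresses (
    address_id INTEGER PRIMARY KEY,
    line1 TEXT NOT NULL, line2 TEXT, city TEXT, state TEXT, zip TEXT
);
-- Association table: one row per (person, address) pairing.
CREATE TABLE person_addresses (
    person_id  INTEGER NOT NULL REFERENCES people(person_id),
    address_id INTEGER NOT NULL REFERENCES addresses(address_id),
    PRIMARY KEY (person_id, address_id)
);
INSERT INTO people VALUES (1, 'Joan Doe'), (2, 'John Doe');
-- The street name is deliberately misspelled here.
INSERT INTO addresses VALUES (10, '12 Mian St', NULL, 'Springfield', 'MI', '49000');
INSERT INTO person_addresses VALUES (1, 10), (2, 10);
""")


def residents(address_id):
    """Names of everyone linked to the given address."""
    rows = conn.execute(
        "SELECT p.name FROM people p "
        "JOIN person_addresses pa ON pa.person_id = p.person_id "
        "WHERE pa.address_id = ? ORDER BY p.name",
        (address_id,),
    )
    return [name for (name,) in rows]


def street_of(person_id):
    """First address line of every address linked to the given person."""
    rows = conn.execute(
        "SELECT a.line1 FROM addresses a "
        "JOIN person_addresses pa ON pa.address_id = a.address_id "
        "WHERE pa.person_id = ? ORDER BY a.address_id",
        (person_id,),
    )
    return [line1 for (line1,) in rows]


# The typo is fixed once, in the single stored copy of the address.
conn.execute("UPDATE addresses SET line1 = '12 Main St' WHERE address_id = 10")
```

Because the address is stored once, both residents see the corrected spelling, and the same three tables answer the question in either direction (who lives here, where does this person live) without a second stored copy of the relationship.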
Bob Badour - 23 Feb 2004 02:41 GMT > > Sure, but what do you want to do, Dave -- get a random identifier for each > > phone number and then since people have more than one have a link tabke that [quoted text clipped - 4 lines] > That argument may carry some weight when the data type involved, a phone > number, is approximately the same size as a foreign key. As you point out, the suggestion to have a surrogate for a simple, familiar, stable candidate key is simply fatuous. But what can one expect from someone like Dawn?
Dawn M. Wolthuis - 23 Feb 2004 03:19 GMT > > Sure, but what do you want to do, Dave -- get a random identifier for each > > phone number and then since people have more than one have a link tabke that [quoted text clipped - 24 lines] > no OO language, for all the emphasis on container classes, has ever > handled them as well. Yes, Marshall, you are right that I answered the specific instance and not a generalization. When it comes to addresses, they are many-to-many with people, while 1-1 with places (if we can make the assumption that each place has one and only one address). I know the language I'm about to use doesn't play well with the RDBMS theorists, but I would think of my entities as people, places, and things. Addresses go with a "Places" entity, perhaps named "Addresses".
So, we have people and we have addresses with a M-M relationship. In a PICK model, the implementation would likely include a PEOPLE function and an ADDRESSES function (called "files"). The PEOPLE function would map to a list of ADDRESSES identifiers. Two files with a link between them.
Often the implementation would include return-links to improve performance if there would be queries not just about what addresses a person has, but also who lives at a particular address. Then extensions could be made to the vocabulary of each function so that queries like this would be the norm:
LIST PEOPLE FULL-ADDRESSES
LIST ADDRESSES FULL-NAMES
This is a different approach to creating a VIEW with COLUMNS from each TABLE against which the user would query. In the case of the RDBMS, we then come up with a new vocabulary for an entity, while often retaining the names of the attributes. PICK lets the user of the database think of each file/function as a portal into the database, increasing the language for that file/function but not altering the name of the file unless there is some purpose to doing so. Each "view" in PICK looks through the eyes of one of the implemented functions (files).
This approach is much more like the web, where there are documents with links to other documents, which might then have return links to all docs that point to them. Part of the charm is in the simplicity of the approach.
--dawn
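The two-files-with-return-links layout described above can be sketched as follows. This is only a Python stand-in for the idea, assuming plain dictionaries in place of PICK files; the identifiers, field names, and the `link` and `dangling_links` helpers are hypothetical and are not part of any PICK implementation. It shows that the forward list (person to addresses) and the return list (address to people) are two stored copies of one relationship, so something has to keep them in step.

```python
# Two "files", each a mapping from an identifier to a record.
# Multi-valued fields are modelled as Python lists (an assumption).
people = {
    "P1": {"name": "Joan Doe", "addresses": []},
    "P2": {"name": "John Doe", "addresses": []},
}
addresses = {
    "A1": {"line1": "12 Main St", "people": []},
    "A2": {"line1": "7 Elm Ave", "people": []},
}


def link(person_id, address_id):
    """Record the relationship in both directions at once."""
    if address_id not in people[person_id]["addresses"]:
        people[person_id]["addresses"].append(address_id)  # forward link
    if person_id not in addresses[address_id]["people"]:
        addresses[address_id]["people"].append(person_id)  # return link


def dangling_links():
    """Return (person_id, address_id) pairs stored in only one direction."""
    forward = {(p, a) for p, rec in people.items() for a in rec["addresses"]}
    backward = {(p, a) for a, rec in addresses.items() for p in rec["people"]}
    return forward ^ backward  # symmetric difference


link("P1", "A1")
link("P2", "A1")
```

As long as every update goes through `link`, `dangling_links()` stays empty. An update that appends to only one of the two lists leaves the files disagreeing, and nothing in this sketch would notice unless the check is run.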
Dave Rolsky - 23 Feb 2004 05:39 GMT > Yes, Marshall, you are right that I answered the specific instance and not a > generalization. When it comes to addresses, they are many-to-many with [quoted text clipped - 17 lines] > > LIST ADDRESSES FULL-NAMES So basically what you're saying is you'd write custom code (if I read the word "function" correctly here) to do what a relational database would let you do with its query language?
And you'd do that _every_ time you had to express an M-M relationship?
And if you _don't_ implement these performance improving return links, the database cannot optimize that query?
> This approach is much more like the web, where there are documents with > links to other documents, which might then have return links to all docs > that point to them. Part of the charm is in the simplicity of the approach. I don't see the charm in having to hand-code the same thing over and over again.
I'd rather declaratively say "this is my data, these are the relationships between them", and have the DBMS take care of the optimization.
Granted, today's SQL databases don't handle this perfectly, but they're not all that bad at it either.
-dave
/*======================= House Absolute Consulting www.houseabsolute.com =======================*/
Dawn M. Wolthuis - 23 Feb 2004 13:48 GMT > > Yes, Marshall, you are right that I answered the specific instance and not a > > generalization. When it comes to addresses, they are many-to-many with [quoted text clipped - 21 lines] > word "function correctly" here) to do what a relational database would you > let you do with it's query language? Sorry -- I should have defined my terms -- a mathematical function is a relation that has only one value for each element in the domain. I'm modeling the data in functions (which are then necessarily relations) and I use the word "functions" for two reasons: 1) because if I were to use the term "relations" then I would be corrected on the modeling into these relations because I do not use all of "relational database theory" and implementations of this "functional model" are not RDBMS's. 2) because I would rather work with functional databases than relational databases ;-)
> And you'd do that _every_ time you had to express an M-M relationship? > > And if you _don't_ implement these performance improving return links, the > database cannot optimize that query? Oddly enough, having a database that does query optimization was not something I ever heard of until RDBMS's came about -- OF COURSE databases optimize the processing of the query -- that's one of their jobs, right? However, no matter how optimized queries are, stored data is faster to query than derived data. Since the vocabulary for an entity is the combination of words for stored data and words for derived data, having a vocabulary element for the list of people who have this address, for example, is going to yield query results faster if the data is located there (in that same stored record) than if the data needs to be found and then derived from stored data in many other records.
> > This approach is much more like the web, where there are documents with > > links to other documents, which might then have return links to all docs > > that point to them. Part of the charm is in the simplicity of the approach. > > I don't see the charm in having to hand-code the same thing over and over > again. definitely agree -- that is not the case.
> I'd rather declaratively say "this is my data, these are the relationships > between them", and have the DBMS take care of the optimization. > Granted, today's SQL databases don't handle this perfectly, but they're > not all that bad at it either. Each has its pros and cons, just as other models do. When I read books that talk about databases, they seem to indicate that relational databases are what's good and non-relational have somehow been proven to be the bad, old stuff. With OO, XML, and OLAP discussions added to some text books, this is starting to move a bit -- I'm just trying to help it move faster. Relational databases are neither mathematically nor empirically proven to be superior to anything, as best I can tell. They are simply one approach and not, in my opinion, the best that we, as an industry, can offer. smiles. --dawn
Marshall Spight - 25 Feb 2004 03:53 GMT > So, we have people and we have addresses with a M-M relationship. In a PICK > model, the implementation would likely include a PEOPLE function and an > ADDRESSES function (called "files"). If I understand correctly, your use of the word "function" here means a mapping from one set to another. In the PEOPLE case, it would be from something analogous to a primary key, to all the other attributes of a person. Yes?
Doesn't this mean that a Pick, uh, "file" is the same thing as an SQL relation with the restriction of only having one unique attribute? Again, I'm not sure I understand, but it sounds like another difference is that (no distinction is made | it's easy to change) in deciding if an attribute is single valued or set-valued. Does that mean that every attribute has zero-or-more values?
> The PEOPLE function would map to a > list of ADDRESSES identifiers. Among other things, one assumes?
> Two files with a link between them. > [quoted text clipped - 6 lines] > > LIST ADDRESSES FULL-NAMES Okay, but this introduces all kinds of opportunities for inconsistent data. Is there anything keeping the two sets of data in sync? Is it automatic or manual?
> This is a different approach to creating a VIEW with COLUMNS from each TABLE > against which the user would query. In the case of the RDBMS, we then come [quoted text clipped - 8 lines] > links to other documents, which might then have return links to all docs > that point to them. Part of the charm is in the simplicity of the approach. I note that the web is a disaster from a data management point of view.
Marshall
Bob Badour - 19 Feb 2004 19:40 GMT > > This string of data could be: > > Joan<field-delimiter> > > Doe<field-delimiter> > > 6165551234<value-delimiter>7615552222<field-delimiter> MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> > > How about: <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > Am I expert Pick programmer already? I don't know. Is your nose bleeding yet?
Dawn M. Wolthuis - 19 Feb 2004 20:15 GMT > > > This string of data could be: > > > Joan<field-delimiter> > > > Doe<field-delimiter> > > > 6165551234<value-delimiter>7615552222<field-delimiter> MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> > > > > How about: <sub-value-delimiter>2002<sub-sub-value-delimiter>Jan<sub-sub-sub-value-delimiter>25 > > > > Am I expert Pick programmer already? > > I don't know. Is your nose bleeding yet? Maybe if there were no relational-zealot-bullies we wouldn't have to live with bloody noses.
Seriously, Bob -- do you know what you are talking about? Have you worked with any implementations of the Nelson-Pick model? If so, which one(s)? Have you studied the model? Your responses to Pick all seem so emotionally charged and not based on logic -- what is the basis for your claims?
I've worked with hierarchical, network, relational dbms's as well as various file systems along with PICK. There are pros and cons to each of the environments I've worked with, but the PICK advantage is the "big bang for the buck" advantage. The core of the implementations that are out there today are quite dated in this distributed computing world (so I wouldn't call it state of the art), but the data model is definitely making a comeback, and for good reason. --dawn
Eric Kaun - 23 Feb 2004 12:39 GMT > [...] > Think of a "file" as a function that maps an identifier to a set of [quoted text clipped - 6 lines] > Doe<field-delimiter> > 6165551234<value-delimiter>7615552222<field-delimiter> MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> > [quoted text clipped - 25 lines] > So, everything is defined in terms of functions including stored data and > any other vocabulary (for stored or virtual fields) But the functions are only naming things, right? They don't really establish any typing, nor are they computational functions (or can they be?). It's just to establish that field 3 is phone number? In any event, should you choose to, relational can do exactly the same thing - establish a PHONELIST type, and write SECONDARY_PHONE as a function over it. Or even a LIST type, and SECOND_ELEMENT on it, if you care to.
- erk
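The delimited record quoted above, and the idea of a SECONDARY_PHONE function over a list of phone numbers, can be sketched in a few lines of Python. This is an illustration only: the thread names the delimiters symbolically, so the concrete characters used here (character codes 254, 253, and 252, the marks conventionally used by Pick-family systems) are an assumption, and `parse` and `secondary_phone` are hypothetical helper names.

```python
# Assumed delimiter characters; the thread only gives symbolic names.
FIELD_DELIM = chr(254)      # <field-delimiter>
VALUE_DELIM = chr(253)      # <value-delimiter>
SUBVALUE_DELIM = chr(252)   # <sub-value-delimiter>

# The example record from the thread, rebuilt field by field
# (without a trailing field delimiter after the last field).
record = FIELD_DELIM.join([
    "Joan",
    "Doe",
    VALUE_DELIM.join(["6165551234", "7615552222"]),
    VALUE_DELIM.join([
        SUBVALUE_DELIM.join(["MATH", "2002"]),
        SUBVALUE_DELIM.join(["PHIL", "2003"]),
    ]),
])


def parse(raw):
    """Split a record into fields, each a list of values,
    each value a list of sub-values."""
    return [
        [value.split(SUBVALUE_DELIM) for value in field.split(VALUE_DELIM)]
        for field in raw.split(FIELD_DELIM)
    ]


def secondary_phone(raw):
    """Second value of the third field, or None if there is only one."""
    phones = parse(raw)[2]
    return phones[1][0] if len(phones) > 1 else None
```

Here `parse(record)[3]` yields the two course entries as `[["MATH", "2002"], ["PHIL", "2003"]]`, and `secondary_phone(record)` returns the second number in the phone field. Note that the meaning "field 3 is the phone list" lives entirely in the helper, not in the stored string, which is the point the typing question above is probing.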
Dawn M. Wolthuis - 23 Feb 2004 13:54 GMT > > [...] > > Think of a "file" as a function that maps an identifier to a set of [quoted text clipped - 6 lines] > > Doe<field-delimiter> > > 6165551234<value-delimiter>7615552222<field-delimiter> MATH<sub-value-delimiter>2002<value-delimiter>PHIL<sub-value-delimiter>2003<field-delimiter> > > [quoted text clipped - 34 lines] > any typing, nor are they computational functions (or can they be?). It's > just to establish that field 3 is phone number? These are actually able to define any derived data from any files across the system. So, they do linking, computation, etc (by use of subroutines employing procedural code -- feel free to groan, but ...).
> In any event, should you > choose to, relational can do exactly the same thing - establish a PHONELIST > type, and write SECONDARY_PHONE as a function over it. Or even a LIST type, > and SECOND_ELEMENT on it, if you care to. I know that PICK can emulate a relational database (at least to the extent that it can be SQL-compliant) and I know that relational db's can employ user-defined functions or stored procedures, or whatever, to add to the language. However, in implementations like Oracle, a stored procedure still doesn't return a list of values to my knowledge (SQL Server does have that capability). I recall a few years ago someone saying that PICK can pretend to be relational, but relational cannot pretend to be PICK. That might not be the case now with some of the additional types added to some rdbms's for non-simple values. Cheers! --dawn
Paul - 18 Feb 2004 22:59 GMT bbadour@golden.net says...
> > The databases you referenced are not relational.
> Again, it suffices to note that Dawn is ignorant and is burning a straw man. Just a quibble, but doesn't one demolish a straw man?
Paul...
 Signature plinehan y_a_h_o_o and d_o_t com C++ Builder 5 SP1, Interbase 6.0.1.6 IBX 5.04 W2K Pro Please do not top-post.
"XML avoids the fundamental question of what we should do, by focusing entirely on how we should do it."
quote from http://www.metatorial.com
Christopher Browne - 19 Feb 2004 03:29 GMT > bbadour@golden.net says... > [quoted text clipped - 3 lines] > > Just a quibble, but doesn't one demolish a straw man? He's presumably mixing metaphors intentionally, and actually I'd quite enjoy demolishing a straw man by setting fire to it; they certainly are vulnerable to fire :-).
 Signature (format nil "~S@~S" "cbbrowne" "cbbrowne.com") http://cbbrowne.com/info/languages.html I hate wet paper bags.
rkc - 19 Feb 2004 14:35 GMT > bbadour@golden.net says... > [quoted text clipped - 3 lines] > > Just a quibble, but doesn't one demolish a straw man? "They tore my legs off and they threw them over there." "Then they took my chest out and they threw it over there."
The Scarecrow - The Wizard of Oz
Marshall Spight - 21 Feb 2004 16:42 GMT > "They tore my legs off and they threw them over there." > "Then they took my chest out and they threw it over there." "That's you, all over."
Marshall