Database Forum / General DB Topics / DB Theory / July 2005
Base Normal Form
|
|
Thread rating:  |
David Cressey - 04 Jul 2005 02:48 GMT I'd like to suggest a new Normal Form definition, one that I'm calling Base Normal Form, for lack of a better term.
The purpose is NOT to advance theory any further. It's to make it easier to teach introductory database design.
Here it is:
A table is in base normal form if and only if it has at least one candidate key.
Note that the definition is for tables and not relations. All relations are in Base Normal Form, so the definition is superfluous with regard to relations.
I haven't defined candidate key, but I would want the definition to be compatible with the definition of candidate key as used in BCNF.
I'm speculating that this definition, together with BCNF, would be sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF, and leaving normal forms beyond BCNF for more advanced treatement.
Jan Hidders - 04 Jul 2005 20:41 GMT David Cressey wrote:
> I'd like to suggest a new Normal Form definition, one that I'm calling Base > Normal Form, for lack of a better term. In some texts I've seen it called the UNF (unnormalized normal form) or the NFNF (non first normal form). To make things confusing, it is also equivalent with Chris Date's definition of 1NF.
> I haven't defined candidate key, but I would want the definition to be > compatible with the definition of candidate key as used in BCNF. That's do-able. If you define a table as a list of tuples that all have the same attributes, then a superkey is a set of attributes K such that there cannot be two different positions in the table such that the tuples at these positions agree on the values for K. The notion of candidate key is then based on the notion of superkey in the usual way.
Note that this definition is sloppy in the usual way where we ignore the fact that a candidate key is not so much a property of a single relation but rather of the set of all relations that are valid for a certain relvar.
> I'm speculating that this definition, together with BCNF, would be > sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF, > and leaving normal forms beyond BCNF for more advanced treatement. I wouldn't skip 3NF. The difference between BCNF and 3NF is an important one and the decision to go beyond 3NF should be an informed one.
-- Jan Hidders
David Cressey - 06 Jul 2005 02:30 GMT > David Cressey wrote: > > I'd like to suggest a new Normal Form definition, one that I'm calling Base [quoted text clipped - 3 lines] > the NFNF (non first normal form). To make things confusing, it is also > equivalent with Chris Date's definition of 1NF. Thanks, Jan.
I guess I'll stick with "Base Normal Form" for now. Are UNF and NFNF defined on tables or on relations?
> > I haven't defined candidate key, but I would want the definition to be > > compatible with the definition of candidate key as used in BCNF. [quoted text clipped - 4 lines] > tuples at these positions agree on the values for K. The notion of > candidate key is then based on the notion of superkey in the usual way. What's the difference between calling a table a list of tuples, and calling a table an array of tuples?
> Note that this definition is sloppy in the usual way where we ignore the > fact that a candidate key is not so much a property of a single relation > but rather of the set of all relations that are valid for a certain relvar. I'm hazy on the distinction between a relation and a relvar.
> > I'm speculating that this definition, together with BCNF, would be > > sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF, > > and leaving normal forms beyond BCNF for more advanced treatement. > > I wouldn't skip 3NF. The difference between BCNF and 3NF is an important > one and the decision to go beyond 3NF should be an informed one. Can you say some more about this? What are the consequences of taking a step between 3NF and BCNF?
> -- Jan Hidders Jan Hidders - 06 Jul 2005 08:12 GMT David Cressey wrote:
>> David Cressey wrote: >> [quoted text clipped - 10 lines] > I guess I'll stick with "Base Normal Form" for now. Are UNF and NFNF > defined on tables or on relations? Hmm, good point, on relations. So they are actually not the same as your concept.
>>> I haven't defined candidate key, but I would want the definition >>> to be compatible with the definition of candidate key as used in [quoted text clipped - 9 lines] > What's the difference between calling a table a list of tuples, and > calling a table an array of tuples? Nothing, really. Although for me an 'array' has some associations with a particular kind of implementation.
>> Note that this definition is sloppy in the usual way where we >> ignore the fact that a candidate key is not so much a property of a >> single relation but rather of the set of all relations that are >> valid for a certain relvar. > > I'm hazy on the distinction between a relation and a relvar. A relation is a particular set of tuples, say, the contents of the Employees table at this moment. The relvar would in that case be more abstract notion of the Employees table. People will often use the term relation for both things.
>> I wouldn't skip 3NF. The difference between BCNF and 3NF is an >> important one and the decision to go beyond 3NF should be an >> informed one. > > Can you say some more about this? What are the consequences of > taking a step between 3NF and BCNF? You might no longer be dependency preserving. That means that some of the constraints can no longer be represented as key constraints and foreign key constraints.
-- Jan Hidders
Jon Heggland - 06 Jul 2005 09:47 GMT > What's the difference between calling a table a list of tuples, and calling > a table an array of tuples? In some formal contexts, I think a list is either empty, or consisting of an element and a tail, which is also a list. In contrast, an array is a sequence of elements that can be referred to by ordinal position. In this case, there is no significant difference, I think.
> I'm hazy on the distinction between a relation and a relvar. "Relvar" is short for "relation variable". A relvar can hold a relation value, just like an integer variable can hold an integer. "Relvar" corresponds to TABLE in SQL; a relation value is the value (contents) of a table at some time. Updating a table is really assigning a new relation value to the relvar (though in many cases, the new value closely resembles the old).
Most literature use "relation" for both "relation variable" and "relation value", and depend on context to make the meaning clear. Alternatively, they dismiss the notion that there is a difference.
 Signature Jon
dawn - 07 Jul 2005 03:24 GMT > I'd like to suggest a new Normal Form definition, one that I'm calling Base > Normal Form, for lack of a better term. [quoted text clipped - 6 lines] > A table is in base normal form if and only if it has at least one candidate > key. I think that what you are describing is a mathematical function (sometimes referred to as a mapping), perhaps? (I believe the def of a function and mathematical def of relation are in the cdt glossary.) Using these terms can be helpful as there are very clear, generally agreed upon, definitions for sets, relations, functions in mathematics from which to draw. It is necessary to make a distinction, then, between a mathematical relation and the (myriad of) defs the database community has for "relation" but since the latter arose from the former originally, this is not hard for a student to comprehend.
Using the term "function" is also helpful when looking at computers in general and discussing programming in terms of input-processing-output (sure, I've heard of objects, but I'll keep on track and avoid straying into that subject right now). Input-processing-output --> keyValue-lookupFunction-relationValue
> Note that the definition is for tables I'm not sure what your def is for this -- rows and columns where a cell may include any defined type?
> and not relations. All relations are > in Base Normal Form, so the definition is superfluous with regard to > relations. If you go back to the mathematical definition of a relation, rather than one of the variations in the database field, a relation is not necessarily a function, but a function is a relation (by definition). Once you require a relation to have a candidate key, you do have a function -- a good starting point prior to identifying functional dependencies and discussing normal forms, I would think.
I, too, don't know if this advances any theory as I'm concerned about practice (and related instruction). I have found using these mathematical terms to be helpful due to their precision and simplicity.
<snip>
Cheers! --dawn
Jan Hidders - 07 Jul 2005 19:09 GMT >>I'd like to suggest a new Normal Form definition, one that I'm calling Base >>Normal Form, for lack of a better term. [quoted text clipped - 9 lines] > I think that what you are describing is a mathematical function > (sometimes referred to as a mapping), perhaps? I think that would be very confusing. First, his table actually has ordering, functions don't. Second, functions are binary relations, and even though you could interpret every n-ary relation as a binary relation, that is different from actually being the same. Finally, if there are multiple candidate keys then it actually represents several functions. It's really not a good idea to confuse the concept of relation and function.
-- Jan Hidders
dawn - 07 Jul 2005 21:30 GMT > >>I'd like to suggest a new Normal Form definition, one that I'm calling Base > >>Normal Form, for lack of a better term. [quoted text clipped - 12 lines] > I think that would be very confusing. First, his table actually has > ordering, functions don't. The elements of mathematical relations are also ordered tuples, correct? One problem might be that the def of relation and, subsequently, function is often written in terms of a binary relation, but that is because an n-ary relation can be seen as binary with domain and range where each is n dimensional.
> Second, functions are binary relations, if I had read one more sentence fragment, I would have seen that is what you decided. Functions need not be binary any more than relations need be -- you can view them as binary if you choose, as there is no harm in doing so.
> and > even though you could interpret every n-ary relation as a binary > relation, exactly -- there is no more difficultly in considering relations as n-ary than functions. Functions are just a special case of relations, right? They are relations with a mapping defined from a "domain" to a "range" -- from input to output. Very clean.
> that is different from actually being the same. Correct, but however you want to work with relations - either binary or n-ary, you can do the same with functions.
> Finally, if > there are multiple candidate keys then it actually represents several > functions. Yes, but that is not a new concept. We can lookup a person by SSN or some other ID -- two functions. Very intuitive.
> It's really not a good idea to confuse the concept of > relation and function. I definitely agree! A function is a specific type of relation where a relation might not be a function at all. A function is a relation that has a mapping from, for example, a candidate key value to a relation tuple (I wrote "relationValue" in my posting, which I noticed as I sent it when I intended relationElementValue or tuple, by the way).
It is important not to confuse the two as the one is a subset of the other. Similarly, we do not want to confuse a set and a relation. That is why they each have distinct definitions. Where is the problem? Thanks --dawn
> -- Jan Hidders Jan Hidders - 07 Jul 2005 22:14 GMT >>It's really not a good idea to confuse the concept of >>relation and function. > > I definitely agree! And yet that seems to be the consequence of what you are proposing. I could be wrong but I interpreted your suggestion as that every relation with a candidate key is a function. Since every relation has at least one candidate key that means that every relation is a function according to you. And yet, at the same time, you say:
> A function is a specific type of relation where a > relation might not be a function at all. So you seem to be contradicting yourself.
-- Jan Hidders
dawn - 08 Jul 2005 04:35 GMT > >>It's really not a good idea to confuse the concept of > >>relation and function. [quoted text clipped - 4 lines] > could be wrong but I interpreted your suggestion as that every relation > with a candidate key is a function. Yes, that is what I am saying, but I'll try to say it better.
Every database relation can be modeled as a mathematical relation. Every database relation that has a candidate key can be modeled as a mathematical function as well. David asked about tables, which imply a column ordering and (perhaps) are more lax about typing. A table can also be modeled with a mathematical relation. Unlike a database relation, a mathematical relation does have (column) ordering in (row) tuples. A table with a candidate key can be modeled by a function as well. A table in "Base Normal Form" is one that can be modeled by a function.
> Since every relation has at least > one candidate key I don't have a handy list of all definitions of "relation" handy, but I thought that some such definitions did not require candidate keys, although I suspect most modern ones do. If every definition of a database relation requires it to have at least one candidate key, then all database relations, but not all tables (which do not have such a requirement), and not all mathematical relations can be modeled with mathematical functions.
> that means that every relation is a function according > to you. And yet, at the same time, you say: Every database relation, if defined to require a candidate key, can then be modeled as a function. Yes.
> > A function is a specific type of relation where a > > relation might not be a function at all. Every mathematical function is necessarily a mathematical relation.
> So you seem to be contradicting yourself. Did that clarify? Does it make sense? Do you accept it?
In spite of the fact there are different definitions for "database relation" and "mathematical relation" I still think it helps to teach the basics of data modeling by starting with language, deriving/creating appropriate predicates and corresponding tables, then modeling these as functions. Functions seem easier for people to grasp than relations, so I start with sets & functions. I suspect that is what David is getting at when talking about "tables in base normal form". It's just a function, brother. ;-)
It isn't hard to comprehend the differences between a mathematical relation and a relation as defined by the relational database community, although I'll admit it can be confusing. If I explain data modeling to somone not predisposed to talk about "relations" then I can just talk about sets and functions and they get it. A function takes in a URL and gives you a web page. It takes in a state abbreviation and gives you a state name. It takes in an SSN and gives you a name. It can be thought of as a set/object as well as a function/actor.
Then we can add in operators (which are, of course, functions) and types (sets) and move on to logic (two-valued, please!).
[Then temporal matters and metadata discussions and so on. After that we can discuss where the real complexity comes in -- with data stored in a human language, determined, entered, searched for, and interpreted by humans.]
Cheers! --dawn
> -- Jan Hidders mAsterdam - 08 Jul 2005 09:45 GMT Nice to hear from you :-)
>>>>It's really not a good idea to confuse the concept of >>>>relation and function. [quoted text clipped - 10 lines] > Every database relation that has a candidate key can be modeled as a > mathematical function as well. There is a function for every candidate key. That is the somewhat confusing difference here, no?
[snip]
> Every database relation, if defined to require a candidate key, can > then be modeled as a function. Yes. So "a function" here should be /functions/.
[snip]
> It isn't hard to comprehend the differences between a mathematical > relation and a relation as defined by the relational database [quoted text clipped - 12 lines] > in a human language, determined, entered, searched for, and interpreted > by humans.] dawn - 08 Jul 2005 17:18 GMT > Nice to hear from you :-) > [quoted text clipped - 15 lines] > There is a function for every candidate key. That is the somewhat > confusing difference here, no? Not to me. It is easier for me to teach and to view multiple sets of candidate keys as multiple functions as well. When you teach someone how to determine a candidate key, you typically teach it by describing a function anyway. You can use language such as primary and alternate functions if you like.
> [snip] > > > Every database relation, if defined to require a candidate key, can > > then be modeled as a function. Yes. > > So "a function" here should be /functions/. It can be modeled as many relations as well since a database relation has unordered columns and a mathematical relation has them ordered. If you model with a function (of your choice) you need not additionally talk about having a candidate key -- your mathematical model incorporates this database concept.
Sure you can model with a mathematical relation and then talk about some other concept of a candidate key if you prefer, but I prefer to have a nice clean mathematical model that is so easy to understand. I suspect that most people who have not done any data modeling before will have an easier time with the concept of a mathematical function than the concept of a mathematical relations. Talking about input-function-output is also very intuitive. All digital data are stored in functions if they are to be retrieved again.
Additionally, the concept of a foreign key pops right out as a function mapping an attribute of one function to a "key" (by whatever name you like) of another.
All models are flawed, of course, but some are useful (George Box?) and I have found that modeling data as functions (relations with candidate keys) as well as modeling processes/methods/functions as functions aligns very well with how people think.
I don't ever model data in the abstract, but only in the context of one or more applications. Rather than a data vs process mindset, I see these as two sides of the same coin and showing how cleanly data is modeled with functions help with that wholistic approach.
I doubt I said that very well and anyone reading "wholistic approach" might jump in with discussions of decoupling (and I have opinions about that too but I'm headed to the big city for the weekend, so too-da-loo for now.) Cheers! --dawn
> [snip] > [quoted text clipped - 14 lines] > > in a human language, determined, entered, searched for, and interpreted > > by humans.] mAsterdam - 09 Jul 2005 03:11 GMT >>>>>>It's really not a good idea to confuse the concept of >>>>>>relation and function. [quoted text clipped - 19 lines] > a function anyway. You can use language such as primary and alternate > functions if you like. Hm... I was with you until the last sentence. I never liked the "primary key" thing. It forces an arbitrary decision at a time when you really don't know enough yet.
>>[snip] >>>Every database relation, if defined to require a candidate key, can [quoted text clipped - 4 lines] > It can be modeled as many relations as well since a database relation > has unordered columns and a mathematical relation has them ordered. Ah the "naming" trick. It takes care of associating values to the right part of the header - and it invites language.
This is the schism point of database and set theory.
> If you model with a function (of your choice) > you need not additionally [quoted text clipped - 7 lines] > will have an easier time with the concept of a mathematical function > than the concept of a mathematical relations. In my experience: yes.
> Talking about > input-function-output is also very intuitive. All digital data are > stored in functions if they are to be retrieved again. Here we differ. The notion of storing something in a function seems alien. A function to get something (return it) - ok. A function to set something - state, side effects come to mind. A function to store something in. Huh?
> Additionally, the concept of a foreign key pops right out as a function > mapping an attribute of one function to a "key" (by whatever name you [quoted text clipped - 9 lines] > these as two sides of the same coin and showing how cleanly data is > modeled with functions help with that wholistic approach. Same here. (Holistic this side of the pond)
[snip]
>>>[Then temporal matters and metadata discussions and so on. After that >>>we can discuss where the real complexity comes in -- with data stored >>>in a human language, determined, entered, searched for, and interpreted >>>by humans.] Marshall Spight - 09 Jul 2005 19:36 GMT > > It can be modeled as many relations as well since a database relation > > has unordered columns and a mathematical relation has them ordered. [quoted text clipped - 3 lines] > > This is the schism point of database and set theory. I don't see that it's much of a schism. One needs to be able to uniquely identify the attributes of the relation, in either mathematics or data management. One does so via identifying marks. These marks can be strings or they can be integer indexes. No big whoop. Notationally, some advantages and some disadvantages to either approach.
To be clear: this is a notational issue merely.
Marshall
mAsterdam - 10 Jul 2005 15:18 GMT Marshall Spight wrote:
>>>It can be modeled as many relations as well since a database relation >>>has unordered columns and a mathematical relation has them ordered. [quoted text clipped - 8 lines] > mathematics or data management. One does so via identifying marks. > These marks can be strings or they can be integer indexes. To the DBMS internally the names are just strings. People tend to associate meaning with the names. These raise expectations about the content of tables. Not meeting those triggers changes of the design of the database.
> No big whoop. Notationally, some advantages and some disadvantages > to either approach. > > To be clear: this is a notational issue merely. Indeed. Notational differences can get quite big though:
roman vs arab numerals phonographic vs ideographic scripts
dawn - 12 Jul 2005 10:56 GMT <snip>
> >>There is a function for every candidate key. That is the somewhat > >>confusing difference here, no? [quoted text clipped - 8 lines] > the "primary key" thing. It forces an arbitrary decision at a time when > you really don't know enough yet. Most modeling in the early stages results in the modeler identifying a primary key, whether or not that is an implementation concept in the target database. Web pages are all stored with a primary key (the URL) and all those tabbed paper folders that are still used at (most? all?) doctor's offices and elsewhere have a primary key. It might even be more awkward for people to consider all CKs equal than to think of one as the primary "lookup" key. And, yes, I'm "mixing" process and data again ;-)
> >>[snip] > >>>Every database relation, if defined to require a candidate key, can [quoted text clipped - 9 lines] > > This is the schism point of database and set theory. It is easy enough to model a database relation with a set theory relation, so this is not a problem, but when db relational theorists get all preachy on this unordered point, it amuses me.
Using language for headers is fine and dandy, but if you have a model that permits synonyms, first giving each attribute a unique ordinal and then redefining the ordinals can be useful at times. Dutch and English numbers are the same, for example. The similarity of one relation to another is evident when naming the attributes with numbers. Only naming them with numbers would be poor form, however, and in most SQL-DBMS tools, it is a bad idea to create your table with attribute names that are numbers.
But the fact that there is application software in production today that includes attributes that can optionally be identified in the software with an ordering is not a bad thing. Having a logical order to attributes (such as the order in which the attributes are listed in a CREATE TABLE statement) must have some consequences of which I am not aware given all of the blanket statements about how awful it is.
Perhaps the issue is in implementation? What happens when we rename the attribute "3" to "2"? That is likely where the concerns come in. Is that just like renaming the attribute from LAST_NAME to lastName or is the implementation model tied to the ordinals in the logical model? I work with an old model that does tie the logical model to the implementation model, which many of us have been taught is a bad thing. But I can see no problems at all stemming from this. This ordinal then becomes the only fixed "name" for an attribute and can have many other logical names, but you can't just change the ordering without changing the relation. And, surprisingly enough, the sky isn't falling ;-)
So, remind me, what precisely is the problem with ordering the attributes in relations? It results in ... because ... ???
> > If you model with a function (of your choice) > > you need not additionally [quoted text clipped - 16 lines] > Here we differ. The notion of storing something in a > function seems alien. Storing? Who said anything about storage? [as my colleage, mAsterdam, might tell you ... laughing]
We model data throughout our software applications -- in the UI, in the db schema, in rules, with all messages. I don't care about storage, but I care about the API to whatever services I need to use and the results they product. What functions do I need to call, passing in what objects and getting what results and data in return? I don't need to think in terms of creating a table, but in terms of using a service that makes the system remember information. What input do I need to send to what function to get the desired results?
In the case of a lookup function or a remember-this function, it makes sense to pass in a key that is matched with a value. Store this under this handle so I can get it back again.
> A function to get something (return it) - ok. > A function to set something - state, side effects come to mind. > A function to store something in. Huh? Storing & Retrieval oare sets and gets. The addition of the word "in" in that last one that isn't relevant. I don't model what the data are stored "in".
> > Additionally, the concept of a foreign key pops right out as a function > > mapping an attribute of one function to a "key" (by whatever name you [quoted text clipped - 11 lines] > > Same here. (Holistic this side of the pond) Terrific! I'm forever reading statements about how we must be sure to model the data separate from how they will be used.
--dawn
Gene Wirchenko - 12 Jul 2005 17:17 GMT [snip]
>It is easy enough to model a database relation with a set theory >relation, so this is not a problem, but when db relational theorists >get all preachy on this unordered point, it amuses me. Be careful about that amusement. Sometimes, it is because the other people know more about the situation than you do. Surely, you have heard the saying "Ignorance is bliss."?
[snip]
>So, remind me, what precisely is the problem with ordering the >attributes in relations? It results in ... because ... ??? If you say that the ordering is significant, then (a,b,c) and (a,c,b) are different. If you go purely by the name, they are the same. (I have omitted the types for simplicity. Assume that they are the same.)
This does not look like much, but consider a join. Now, you have to define which order the columns are in the join. Make sure you do it right, or you could end up with A join B and B join A resulting in different orderings.
[snip]
This is rather basic stuff, Dawn. You claim to have studied the material. Is your axe not ground enough yet?
Sincerely,
Gene Wirchenko
dawn - 12 Jul 2005 21:41 GMT > [snip] > [quoted text clipped - 5 lines] > other people know more about the situation than you do. Surely, you > have heard the saying "Ignorance is bliss."? I'll grant that my amusement could be due to ignorance. Will you grant that it might not be? I would suggest my amusement has more to do with an interest in watching how language is treated. When a relational theorist insists that columns be unordered, it is due to using a different definition of "relation" than what a mathematician uses, right? Do you think I am ignorant of the change in the definition of "relation" between mathematics and relational database theorists, ignorant of the reason that the change was made, ignorant of just how important it is to avoid logical ordering of logical columns, or what? I have learned a lot from this list and don't plan to stop learning, so please set me straight on this one, Gene.
> [snip] > [quoted text clipped - 13 lines] > it right, or you could end up with A join B and B join A resulting in > different orderings. I don't say that ordering is significant AT ALL, but I do say that using the term "relation" and then insisting that tuples be unordered is redefining the term, thereby muddying terminology and notation unnecessarily, in my opinion.
> [snip] > > This is rather basic stuff, Dawn. You claim to have studied the > material. Is your axe not ground enough yet? You aren't reading me quite right yet, brother. I have no axe to grind; I have problems to solve. One of them is how to build bigger-bang-for-the-buck software. I have several interests in this regard. One area is the way that data are modeled from user interface to persistence interface and everything else (e.g. validation, constraints, business rules).
There are reasons for decisions that have been made in defining relational theory and reasons why so many people use SQL as the api to persisted data. But are there times when using the same model of data throughout an application might be a good idea, rather than using a different API and different models for the UI and the database? If so, what are our options for how to model these data and what are the pros and cons of each? These questions and more...
Sifting out the best practices (such as discussions of "functional dependencies") from the religion in relational theory (such as the most common defs of 1NF taught to our college students) is one approach I'm taking. Every time there is a term with multiple definitions, particularly if the defs are mutually exclusive, I might initially be frustrated, but eventually I settle on "amused," perhaps as a coping strategy.
But I will admit that I have more questions than answers. smiles. --dawn
> Sincerely, > > Gene Wirchenko Gene Wirchenko - 13 Jul 2005 00:45 GMT [snip]
>> If you say that the ordering is significant, then >> (a,b,c) [quoted text clipped - 11 lines] >I don't say that ordering is significant AT ALL, but I do say that >using the term "relation" and then insisting that tuples be unordered ^^^^^^^^^^^^^^^^^^^ No, not insisting that tuples be unordered, but that their order is irrelevant to the RM and that you can not count on any specific order.
>is redefining the term, thereby muddying terminology and notation >unnecessarily, in my opinion. Different contexts, different definitions.
I think there may a question of which tuples are being referred to.
1) There are the tuples that define the relation. These are unordered for the reason I give above.
2) The tuples of data are also unordered. If you request data, you will not get it in any particular order unless you specify the order. (If it does not matter what order the data comes to you, then why have an order, even an implicit one? If it does matter, then specify the order you want.)
I have no problem with either lack of ordering.
[snip]
>taking. Every time there is a term with multiple definitions, >particularly if the defs are mutually exclusive, I might initially be >frustrated, but eventually I settle on "amused," perhaps as a coping >strategy. I get amused from time to time, too, but I watch it closely. Too much amusement may mean that I am missing the point of something.
>But I will admit that I have more questions than answers. The knowledge I prize the most is the knowledge of what I know and what I do not know.
Sincerely,
Gene Wirchenko
dawn - 14 Jul 2005 01:10 GMT <snip>
> >I don't say that ordering is significant AT ALL, but I do say that > >using the term "relation" and then insisting that tuples be unordered > ^^^^^^^^^^^^^^^^^^^ > No, not insisting that tuples be unordered, but that their order > is irrelevant to the RM and that you can not count on any specific > order. Completely agree. Tuples are ordered (by def typically, although one could define the term otherwise) and their order is irrelevant to the RM.
> >is redefining the term, thereby muddying terminology and notation > >unnecessarily, in my opinion. [quoted text clipped - 3 lines] > I think there may a question of which tuples are being referred > to. any of 'em, but I was referring to "column ordering" and not row ordering. The RM operators need not act on the column ordering info, but it might be useful to define and order for printing out a data catalog or whatever.
[snip]
> The knowledge I prize the most is the knowledge of what I know > and what I do not know. and someone once said that the most dangerous knowledge is that of which we are certain, but wrong. Cheers! --dawn
> Sincerely, > > Gene Wirchenko Eric Junkermann - 12 Jul 2005 21:02 GMT [snip]
>> Same here. (Holistic this side of the pond) > >Terrific! I'm forever reading statements about how we must be sure to >model the data separate from how they will be used. > >--dawn You are all programmers at heart, you are tricked by your need to process data into believing that it is yours. What you are writing today may be the only application now, but there _will_ be another, and another... If you don't consider the data in its own right, you are guaranteeing high cost (or even impossibility) for all future applications that will want to use this data.
To put it another way, data is the corporate asset, corporate survival means being able to use assets in new ways when required.
How much expensive ETL would we now need if data design had always been given a high priority?
 Signature Eric Junkermann
dawn - 12 Jul 2005 22:01 GMT > [snip] > >> Same here. (Holistic this side of the pond) [quoted text clipped - 5 lines] > > You are all programmers at heart, I'll accept that.
> you are tricked by your need to > process data into believing that it is yours. but not that. What I will accept is that changes will be required and I need to mitigate those changes I think are most likely, using a risk assessment strategy. That includes changes that affect all aspects of this application including the UI, the business rules, etc. and changes that include other "services" needing to work with the same data or share any other aspects of an application. If one application needs to "remember" (in human terms) anything and another application arises that needs access to that same "memory" (again, human terms), whether what is being remembered is called "code" or "data" or WHATEVER, then our design needs to accomodate this sharing of "memory".
> What you are writing today > may be the only application now, but there _will_ be another, and you can bet that I know that -- this is a huge reason I'm doing the poking around. I don't buy everything the eXtreme A/D folks claim in designing and implementing only for what is in front of me and not looking past my nose, but we just might be going too far when we try to think of data as if it needs no context.
> and > another... If you don't consider the data in its own right, you are > guaranteeing high cost (or even impossibility) for all future > applications that will want to use this data. Nope, I don't buy that. If we don't have flexible software development languages, models, tools, etc, then we are guaranteeing high costs. There are tradeoffs to constraining the length of a field, for example, and allowing any length. The latter is more flexible, which might yield better quality of data over time as requirements change if the user doesn't feel forced to truncate information to fit their existing software. The former forces more conformity, which might provide better quality of data by a tighter definition and enforcement of todays constraints, with the (often false) assumption that if requirements change then our implementation will change concurrently. There isn't one right way, but some choices are better than others in a given situation and some are commonly better choices than others.
> To put it another way, data is the corporate asset, agreed
> corporate survival > means being able to use assets in new ways when required. yes, indeed, but be sure you are looking at the whole picture and not just focusing on one at the risk to others.
> How much expensive ETL would we now need if data design had always been > given a high priority? Do you really think that most ETL is done because data are NOT modeled relationally? I suspect (but do not know) that more is done because they are (and, of course, also because of a need to fix data at a point in time in systems that do not have a time dimension built in). It might even be the case that those shops that do the least ETL are those who are not using SQL-DBMS's. I have some anecdotal reasons to believe this might be the case. Cheers! --dawn
Eric Junkermann - 13 Jul 2005 21:59 GMT >> [snip] >> >> Same here. (Holistic this side of the pond) [quoted text clipped - 12 lines] > >but not that. What I will accept is that changes will be required [snip]
>our design needs to accomodate this sharing of "memory". and if the shared "memory" is the wrong "shape", so that our path from "given" to "required" for the new application is too complicated, we will pay for it somehow. What I am saying is that if you use data in a certain way you will tend to bundle it in that way, even if you are trying to be flexible, so the best bundle is an open bundle, which will have different ways in.
[snip]
>There are tradeoffs to constraining the length of a field, for example, >and allowing any length. but this is a physical issue, an implementation issue, not a design issue, the logical data design does not care.
[snip]
>Do you really think that most ETL is done because data are NOT modeled >relationally? Did I mention relational?
ETL happens because data is the wrong "shape", the type of system it is stored in is not the main problem. Given similar needs, the ones that do the least ETL are those with the most flexible data design.
Maybe I shouldn't have dragged ETL in, but my main point remains that data independence is a good thing - which is where Codd came in, whether you like his answer or not.
Regards,
Eric
 Signature Eric Junkermann
dawn - 14 Jul 2005 18:17 GMT > >> [snip] > >> >> Same here. (Holistic this side of the pond) [quoted text clipped - 18 lines] > "given" to "required" for the new application is too complicated, we > will pay for it somehow. I'm with you there.
> What I am saying is that if you use data in a > certain way you will tend to bundle it in that way, even if you are > trying to be flexible, Yes, I completely agree.
> so the best bundle is an open bundle, which will > have different ways in. That is a bit vague. It is best to weigh the options and risks for various approaches.
> [snip] > >There are tradeoffs to constraining the length of a field, for example, > >and allowing any length. > > but this is a physical issue, I don't think of the conceptual data model as having such constraints in it, but I do think of such constraints being in the logical model if such a constraint is required.
> an implementation issue, not a design > issue, the logical data design does not care. Hmm. If our current business logic is that our product codes are a maximum of eight characters, then I would think that would be a constraint in the logical design, no?
> [snip] > > >Do you really think that most ETL is done because data are NOT modeled > >relationally? > > Did I mention relational? I guess not, but it is only relational theorists that I have heard suggest that data be considered outside of any context (I might not be saying that right, so perhaps you can restate your original point in your own words?)
> ETL happens because data is the wrong "shape", the type of system it is > stored in is not the main problem. I agree with you on the "shape" issue. The systems I have seen that do not end up forcing customers into data marts to the hilt include more operations than just relational ones & permit more complex data types than simply relations, thereby providing what I would see as more "shapes" for the data. That was my point.
> Given similar needs, the ones that > do the least ETL are those with the most flexible data design. Yes, and those I have seen get some of that that flexibility from not being tied to only the relational model.
> Maybe I shouldn't have dragged ETL in, but my main point remains that > data independence is a good thing - There simply is no such thing as useful words (data & metadata) without context.
> which is where Codd came in, whether > you like his answer or not. I could be wrong, but I don't think that was the point of his efforts. Codd modeled sets of propositions as relations and worked with set operators and such, thereby introducing some much needed formalism. Don't get me wrong, that was definitely a good thing.
My point is that if you have a business domain that you are modeling, you need to model the data from front to back in this system in line with the known requirements. You should identify risks by understanding what you are modeling.
So, for example, if you model a toenail as if it were a property of a person, it is a good idea to be aware whether that toenail can still be a toenail without a person anywhere around. It is the risk assessment that should determine whether or not it is wise to give individual toe numbers their own identity and prepare a relationship between people and toenails rather than making toenail an attribute of person. There is obviously a cost in doing so and all involved parties might consider this toenail situation and deem it to be so unlikely that the owner of this application asset would ever care about the toenails after they have been clipped or torn off a person that there is no reason to model them as anything other than a property of the person.
Does that explain my position in a way you agree with or do you still think you can do some "data in a vacuum" (my words, not yours, so feel free to correct my understanding of your position) type of modeling?
cheers! --dawn
> Regards, > > Eric Jan Hidders - 08 Jul 2005 23:42 GMT >>>>It's really not a good idea to confuse the concept of >>>>relation and function. [quoted text clipped - 6 lines] > > Yes, that is what I am saying, but I'll try to say it better. Ok. That might well solve it because basically my point was that your terminology was too sloppy for my taste.
> Every database relation can be modeled as a mathematical relation. Agreed. From a mathematical point of view it doesn't make much sense to distinguish the two.
> Every database relation that has a candidate key can be modeled as a > mathematical function as well. Sure. But let me emphasize once more that *every* database relation has at least one candidate key. This is not up for discussion, it's a simple mathematical fact. I suspect that the case that you are overlooking is the one where that candidate key happens to be the set of all columns, in which case the function you would associate with it is probably the function that maps each tuple to itself.
By the way, which function do you chose? If I have R(A,B,C,D) with candidate key AB then do we take function AB->ABCD or AB->CD? You seem to prefer the first. Do you have a specific reason for that?
>>Since every relation has at least >>one candidate key > > I don't have a handy list of all definitions of "relation" handy, but I > thought that some such definitions did not require candidate keys, > although I suspect most modern ones do. It doesn't have to be required, it usually follows from the fact that the relation is a set.
> Did that clarify? Does it make sense? Do you accept it? I have no problem with statements like "I can model every relation with a candidate key CK as a function that maps a value for the CK to a tuple of that relation". But for me that is very different from saying that the two concepts are the same. Such a formulation also does more justice to the fact that there might in fact be several candidate keys which all define different functions.
> In spite of the fact there are different definitions for "database > relation" and "mathematical relation" I still think it helps to teach > the basics of data modeling by starting with language, > deriving/creating appropriate predicates and corresponding tables, then > modeling these as functions. Functions seem easier for people to grasp > than relations, so I start with sets & functions. I've heard similar arguments before from people who think that when you explain normalization you should first explain things under the assumption that there is only one candidate key. That one-key assumption also seems to be hovering in the background of your terminology. In my experience there is always a backlash once you get to the 3NF and BCNF because there you have to deal with more than one candidate key. That means that the student then has to make two big conceptual jumps at once: understanding >1 cand. keys, and BCNF.
Note by the way that the link between functional dependencies and interpreting relations as functions is very close. If the relation R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I wouldn't say it represents a function, I would say it represents three functions.
-- Jan Hidders
Gene Wirchenko - 11 Jul 2005 17:55 GMT [snip]
>Note by the way that the link between functional dependencies and >interpreting relations as functions is very close. If the relation >R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I >wouldn't say it represents a function, I would say it represents three >functions. I struggle to follow these arguments. In the above, I think that there is an error or something questionable. If AB->C then ABanything->C (as in ABD->C), but what is the point?
Sincerely,
Gene Wirchenko
Jan Hidders - 12 Jul 2005 22:28 GMT > [snip] > [quoted text clipped - 7 lines] > there is an error or something questionable. If AB->C then > ABanything->C (as in ABD->C), Yes, well observed, the FD ABC->C is redundant.
> but what is the point? A relation can sometimes represent several functions, not just one.
-- Jan Hidders
Jan Hidders - 12 Jul 2005 23:44 GMT >> [snip] >> [quoted text clipped - 9 lines] > > Yes, well observed, the FD ABC->C is redundant. !@#$ What I wanted to say was: the FD ABD->C is redundant.
-- JH
dawn - 12 Jul 2005 12:09 GMT > >>>>It's really not a good idea to confuse the concept of > >>>>relation and function. [quoted text clipped - 9 lines] > Ok. That might well solve it because basically my point was that your > terminology was too sloppy for my taste. I'll accept that. When applying a mathematical model within a field that has taken mathematical terms and given them new definitions is problematic. A candidate key of a relation could be modeled as the domain of a function -- gotta love language, eh?
> > Every database relation can be modeled as a mathematical relation. > > Agreed. From a mathematical point of view it doesn't make much sense to > distinguish the two. I suspect that what you mean here is that when you are doing any mathematics, such as set operations, with relations, you are using the mathematical definition of the relation. Could we possibly rephrase your statement as "when working with the mathematical model of the relation, we work only with the mathematical definition of a relation"? We can ditch the database relation definition (we need not care about meaningful names for attributes and ensuring no ordering of such, for example) when doing set operations. So, when do we need to use the database definitions for relation? Could we use terms like "table", "folder" or even "file" (gasp) and stop using muddying things by using definitions of "relation" that both add to and subtract from the mathematical relation definition?
> > Every database relation that has a candidate key can be modeled as a > > mathematical function as well. > > Sure. But let me emphasize once more that *every* database relation has > at least one candidate key. This is not up for discussion, it's a simple > mathematical fact. I'll buy that it is a fact by definition. That is yet another way that the term "relation" has been butchered. If every database relation has at least one candidate key, then if we choose to model that relation as a mathematical relation, then our mathematical relation has candidate keys.
> I suspect that the case that you are overlooking is > the one where that candidate key happens to be the set of all columns, > in which case the function you would associate with it is probably the > function that maps each tuple to itself. I wasn't overlooking that case (since I'm working from an existing model) but there are two different functions that are used with it as in these examples:
Person("12345") = ("John", "Smith", "M") and Person("12345") = ("12345", "John", "Smith", "M")
So in the case of a candidate key consisting of all attributes, both functions
f(CK) = CK g(CK) = null set
are both useful. Sloppy notation (where is the tuple or set notation?) is on purpose
> By the way, which function do you chose? If I have R(A,B,C,D) with > candidate key AB then do we take function AB->ABCD or AB->CD? You seem > to prefer the first. Do you have a specific reason for that? I guess I just answered that. Both are useful. I prefer AB->ABCD because the entire tuple, which could have candidate keys other than AB, is together, thereby getting the benefit of the function model and the relation.
> >>Since every relation has at least > >>one candidate key [quoted text clipped - 5 lines] > It doesn't have to be required, it usually follows from the fact that > the relation is a set. You lost me here. How do you define a candidate key in mathematics so that every set must have one?
> > Did that clarify? Does it make sense? Do you accept it? > > I have no problem with statements like "I can model every relation with > a candidate key CK as a function that maps a value for the CK to a tuple > of that relation". And that is what I'm saying with the additional statement that I find it useful to do so.
> But for me that is very different from saying that > the two concepts are the same. I'm not sure if/where I might have said that. My intent was to answer the question of what else people might call this idea of "base normal form" when teaching these concepts once you have a table and at least one candidate key. I use the term "function" and show how you have a function mapping key values to tuple values. This makes the concepts very simple, perhaps too simple for some.
> Such a formulation also does more justice > to the fact that there might in fact be several candidate keys which all > define different functions. Absolutely. It makes for a good way to introduce multiple candidate keys into the discussion -- multiple mappings/functions.
> > In spite of the fact there are different definitions for "database > > relation" and "mathematical relation" I still think it helps to teach [quoted text clipped - 6 lines] > explain normalization you should first explain things under the > assumption that there is only one candidate key. I don't think such an assumption is important, but I can see starting with a single key, one composed of a single attribute, to introduce the topic of candidate keys, then lead up to more than one. Functions are an easy way to illustrate that, going from one simple key to multiple composite candidate keys.
> That one-key assumption > also seems to be hovering in the background of your terminology. Yes, in that each function only models one of the candidate keys. But, a candidate key is then incorporated into that function and corresponding notation. You can see that you have a key when you are modeling your db relation as a function. The relation can be modeled as multiple functions, but some are more useful than others in any given situation.
> In my > experience there is always a backlash once you get to the 3NF and BCNF > because there you have to deal with more than one candidate key. That > means that the student then has to make two big conceptual jumps at > once: understanding >1 cand. keys, and BCNF. The non-SQL-RDBMS implementation that I work with most often does fix a single candidate key as the "primary key" but I think of this approach as being implementation-independent, so I'll reflect on that again.
> Note by the way that the link between functional dependencies and > interpreting relations as functions is very close. If the relation > R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I > wouldn't say it represents a function, I would say it represents three > functions. YES, YES, YES -- SO WOULD I!!! It's all about functions and it's all about data (so can we drag that "relations" word to the trash when introducing these concepts and make it simple?)
Cheers! --dawn
> -- Jan Hidders Marshall Spight - 12 Jul 2005 16:19 GMT > [a lot of stuff] Dawn,
Uh, wow.
I don't *want* to ditch the word relation; I don't have another word that comes close. "Function" doesn't cut it. I also don't see any particular conflict or even a difference between mathematics and computer science; I consider CS to be a branch of math, with some particular areas of emphasis, such as how much work calculating a particular function is.
And what's wrong with "a subset of the product of sets?" That's not all that complicated. It's no more complicated than "a mapping from one set to another."
> > It doesn't have to be required, it usually follows from the fact that > > the relation is a set. > > You lost me here. How do you define a candidate key in mathematics so > that every set must have one? A "set" is a collection of *unique* members. Because of that word "unique" there has to be a candidate key.
Marshall
dawn - 12 Jul 2005 21:06 GMT > > [a lot of stuff] > > Dawn, > > Uh, wow. Glad you enjoyed? :-)
> I don't *want* to ditch the word relation; I don't have another > word that comes close. "Function" doesn't cut it. Because ... ? Granted, it is but one way to model data. I don't insist on a single way of viewing the subject. However, I can discuss the topic of data modeling quite thouroughly without ever using the term "relation" but not without using the term "function".
> I also > don't see any particular conflict or even a difference between > mathematics and computer science; I consider CS to be a branch > of math, with some particular areas of emphasis, such as > how much work calculating a particular function is. So one might hope that CS would not take a solid, generally agreed upon, mathematical term and redefine it. But that does seem to be what has happened, unfortunately.
> And what's wrong with "a subset of the product of sets?" That's > not all that complicated. I like it. I haven't seen a product of sets that has unordered "columns" however.
> It's no more complicated than "a mapping > from one set to another." Functions are a subset of relations that is a little easier to teach, however. In kindergarten the students match an item on the left hand side of the page to exactly one on the right before any exercises that permit the more generalized form of a relation, allowing an item on the left to have a line to more than one item. So, I would argue that there is some slight advantage to introducing the concept of a function and then generalizing instead of starting with relations and then narrowing it down.
> > > It doesn't have to be required, it usually follows from the fact that > > > the relation is a set. [quoted text clipped - 4 lines] > A "set" is a collection of *unique* members. Because of that word > "unique" there has to be a candidate key. OK, I'm with you and willing to hit my forehead and say "duh". Saying that because something is a set it must be a function is not accurate, but, yes, there is a function that can be associated with any set. If no subset of the element/tuple variable is a candidate key, then the entire element would be. Sorry I missed the point. --dawn
Marshall Spight - 13 Jul 2005 04:29 GMT > > > [a lot of stuff] > > [quoted text clipped - 3 lines] > > Glad you enjoyed? :-) let's say I was engrossed.
> > I don't *want* to ditch the word relation; I don't have another > > word that comes close. "Function" doesn't cut it. > > Because ... ? For one thing, in my field, the term 'function' has a strong connotation of being intentional, and relation has a strong connotation of being extentional. enough so that if I mean the other kind of each I explicitly specify.
> > I also > > don't see any particular conflict or even a difference between [quoted text clipped - 5 lines] > upon, mathematical term and redefine it. But that does seem to be what > has happened, unfortunately. It doesn't look much that way to me. And if it had, what would be the problem? words change meaning as they move from one field to another all the time.
> > And what's wrong with "a subset of the product of sets?" That's > > not all that complicated. > > I like it. I haven't seen a product of sets that has unordered > "columns" however. the product operation on integers and on relations is commutative, so I don't see any basis for describing the operands of either as ordered.
> > It's no more complicated than "a mapping > > from one set to another." [quoted text clipped - 7 lines] > and then generalizing instead of starting with relations and then > narrowing it down. I respect that you have pedagogical issues to deal with but I don't see that they are relevant here.
Marshall
Drago Ganic - 17 Jul 2005 21:25 GMT Dawn, functions and relations have in CS and mathematics two very important but different meanings: function ==> computation, action/transformation relation ==> storage, state
The most fundamental objects in mathematics are functions and sets (=> relations).
Functions cannot be UPDATEed / DELETEed or MODIFYed ... those are done on data which are stored somewhere. Functions just SELECTs data i.e. gives/calculates value(s) FROM somewhere.
Where is the confusion? This newsgroup is about storage, hence sets. So sets (incl. relations) are a good staring point for teaching databases (also in kindergarten :-)
Greeting from Croatia, Drago Ganic
>> > [a lot of stuff] >> [quoted text clipped - 54 lines] > no subset of the element/tuple variable is a candidate key, then the > entire element would be. Sorry I missed the point. --dawn dawn - 18 Jul 2005 05:49 GMT > Dawn, > functions and relations have in CS and mathematics two very important but > different meanings: > function ==> computation, action/transformation > relation ==> storage, state I am quite sure that any mathematical definition of a function will ensure that the function is a relation. Please look up the definitions of these mathematical terms and let me know if that is not what you find.
> The most fundamental objects in mathematics are functions and sets (=> > relations). I just deleted my full response to that and I'll only respond to the last part of your statement. A relation is a set (and a set does not imply a relation, if that is what your notation was to indicate). A function is a relation. Therefore a function is a set.
A function is a set that can be represented by showing that for each element in the domain of the function (you could think of the domain of a function to be the set of all values from which a key -- CK -- might come) you can identify exactly one element in the range of the function (a tuple, for example). We can say that a function PERSON maps the identifier "12345" to the tuple ("12345", "Rena", "Bakker"), for example, where that tuple is an element of the relation that is more specifically a function, and more generally a set.
> Functions cannot be UPDATEed / DELETEed or MODIFYed ... those are done on > data which are stored somewhere. Of course they can. You are relying too much on how you perceive a function and not on what it is defined to be.
> Functions just SELECTs data i.e. > gives/calculates value(s) FROM somewhere. Are you saying that for any value from the set of possible inputs to the function, there is a single output? Now look at the set of tuples with all input and corresponding output values. That would be a relation, right?
> Where is the confusion? This newsgroup is about storage, there are more than a few folks here who would differ with you on this, but I'll go along with it -- storage by whatever name
> hence sets. Storage, hence sets? You can't store a bag/multiset?
> So sets > (incl. relations) are a good staring point for teaching databases (also in > kindergarten :-) I'm good with sets. Sets & the more specific flavor of sets that are functions are actually both good for kindergarten. I've even seen relations that are not sets there. When a student graduates from high school, they typically have heard definitions of two of those three terms -- the two I like to start with.
> Greeting from Croatia, and back to you from the middle of corn fields in Iowa USA. Cheers! --dawn
> Drago Ganic > [quoted text clipped - 56 lines] > > no subset of the element/tuple variable is a candidate key, then the > > entire element would be. Sorry I missed the point. --dawn Jan Hidders - 12 Jul 2005 23:17 GMT >>>>>>It's really not a good idea to confuse the concept of >>>>>>relation and function. [quoted text clipped - 13 lines] > that has taken mathematical terms and given them new definitions is > problematic. Oh, come on, Dawn. Really! It is quite clear that there is a close relationship between a relation as defined in the relational model and the usual mathematical notion of relation. Just take a look in the "Foundations of databases" book by Abiteboul, Hull and Vianu. (That's a graduate text I usually recommend to PhD students.) They happily switch between the two definitions and use whichever one is most practical for the thing they want to present in the full knowledge they could use either of them.
> A candidate key of a relation could be modeled as the > domain of a function -- gotta love language, eh? That's still way too sloppy and I think you can do better than that.
>>>Every database relation that has a candidate key can be modeled as a >>>mathematical function as well. [quoted text clipped - 8 lines] > a mathematical relation, then our mathematical relation has candidate > keys. Mathematical relations also have always at least one candidate key, but here you represent them as a set of numbers instead of a set of column names. For every mathematical relation it is true that there is a set of positions such that no two distince tupels in the relation have the same values in these positions.
>>>Did that clarify? Does it make sense? Do you accept it? >> [quoted text clipped - 4 lines] > And that is what I'm saying with the additional statement that I find > it useful to do so. I also have no problem with the additional statement. :-)
>>But for me that is very different from saying that >>the two concepts are the same. [quoted text clipped - 5 lines] > function mapping key values to tuple values. This makes the concepts > very simple, perhaps too simple for some. How about "a relation is a set, so every tuple in it occurs at most once". Is that too difficult for you? :-)
>>Note by the way that the link between functional dependencies and >>interpreting relations as functions is very close. If the relation [quoted text clipped - 5 lines] > about data (so can we drag that "relations" word to the trash when > introducing these concepts and make it simple?) I'm sorry? Explaining the relational model gets simpeler if you don't use or define the word "relation"?
-- Jan Hidders
dawn - 13 Jul 2005 02:56 GMT <snip>
> > I'll accept that. When applying a mathematical model within a field > > that has taken mathematical terms and given them new definitions is [quoted text clipped - 8 lines] > the thing they want to present in the full knowledge they could use > either of them. Maybe the problem here is that I was accepting Date's def from his latest edition of an intro text. It really seems to be quite far removed from a clean, clear def of "relation" and I was guessing that there are many undergrads learning that or similar definitions.
> > A candidate key of a relation could be modeled as the > > domain of a function -- gotta love language, eh? > > That's still way too sloppy and I think you can do better than that. That was purposely sloppy -- many words with multiple meanings in multiple contexts.
> >>I have no problem with statements like "I can model every relation with > >>a candidate key CK as a function that maps a value for the CK to a tuple [quoted text clipped - 4 lines] > > I also have no problem with the additional statement. :-) Yes, I gathered that. I suspect I'm working in a much simpler context than you are, however. For example, I might be explaining to a web page developer who has never had a database behind their pages before, how to model data. I would not bring in the term "relation" very quickly, where I would use the term "function" right away. Look how many people, even software development professionals, explain relations by saying something about how there are multiple tables and they are related to each other. The term "relation" really isn't all that helpful when getting practitioners up and going, in my opinion.
> How about "a relation is a set, so every tuple in it occurs at most > once". Is that too difficult for you? :-) You know me, I'm pretty dense ;-)
> I'm sorry? Explaining the relational model gets simpeler if you don't > use or define the word "relation"? Remove that second question mark and I do believe we are getting somewhere. smiles. --dawn
Jan Hidders - 13 Jul 2005 22:24 GMT > <snip> > [quoted text clipped - 15 lines] > removed from a clean, clear def of "relation" and I was guessing that > there are many undergrads learning that or similar definitions. They probably do, yes, although AFAIK the sales of his intro books are dropping. I think I already discussed in the past that his definitions can be somewhat simplified, so I won't comment on that here.
>>>A candidate key of a relation could be modeled as the >>>domain of a function -- gotta love language, eh? [quoted text clipped - 3 lines] > That was purposely sloppy -- many words with multiple meanings in > multiple contexts. Ah, my apologies, I think I misunderstood what you wanted to say with that sentence.
> [...] I suspect I'm working in a much simpler context > than you are, however. For example, I might be explaining to a web > page developer who has never had a database behind their pages before, > how to model data. I would not bring in the term "relation" very > quickly, where I would use the term "function" right away. Hmm. I think I understand where you are coming from, but this has probably more to do with you and the way you think about data modelling than with some inherent difficulty in understanding of the relational model. I have taught at university level and sub-university level, also to business students and chemistry students and I would *never* explain things in terms of functions. The first time the term would show up would probably at the point where I would discuss normalization.
One of the dangers with viewing relations as functions is that it also tends to promote a view of the database where its only purpose is to retrieve data given a certain key. That much too narrow view is exactly the reason why so many web-applications have incredible crappy database-programming behind it, where they forget you can also have other indexes and use the table "backwards" or where joins are not done in SQL but in PHP. If I would teach web-developers I would see it as one of my biggest tasks to prevent such misuse of the DBMS.
So I'm afraid I'm still very very wary of your terminology.
-- Jan Hidders
dawn - 14 Jul 2005 05:20 GMT > > <snip> > >>>A candidate key of a relation could be modeled as the [quoted text clipped - 16 lines] > Hmm. I think I understand where you are coming from, but this has > probably more to do with you dang
> and the way you think about data modelling like a programmer, right?
> than with some inherent difficulty in understanding of the relational > model. While the average comp sci major can figure out relations from the way it is typically explained, I don't like the way we have carved out the dbms from the rest of the application, modeling stored, persisted, remembered data; validating such data; naming such data; etc decidedly differently than other data that will hang around for less time. The current language does nothing to promote a holistic approach to software development unless you use mountain man's approach of hauling everything into the dbms. I think that is the next best thing to hauling it all out of "there" (the typical sql-dbms). (The last statement was for your amusement only, mostly, sortof).
> I have taught at university level and sub-university level, also > to business students and chemistry students and I would *never* explain > things in terms of functions. The first time the term would show up > would probably at the point where I would discuss normalization. I'm sure. So we disagree. You are much more knowledgeable about the theory than I and almost all of my concerns are those of an "old" (seasoned?) practitioner. I've watched the software development industry devolve in some areas where I'd like to see us take another look at the partitioning of the discipline and our software products. When I led back to back projects with a team doing work in a "pre-relational" database and then in a SQL-DBMS, a light went on, even if the words, theory, and experiments to back it up aren't yet there -- a "Blink" moment, if you have read that book.
Trimming down the terms so we can see where we have like items and where we don't by looking at the basics -- input to functions, processing by functions, output from functions -- is one small step in trying to show that databases need not seem altogether different than the rest of the work in developing software. It's all about data and it's all about functions. Input-processing-output. End of mystery.
> One of the dangers with viewing relations as functions is that it also > tends to promote a view of the database where its only purpose is to > retrieve data given a certain key. Ok, so you are, indeed, tapping into my brain (pretty scary). One of the dangers of declaring "stored" database functions as "relations" is that they seem so distant, as if we cannot access them. Set operators are fine and dandy, but people understand single transactions handily. They pay at the grocery store, get money from the bank, and type their name into ba-zillion web pages that require it. Start with individual "records" (as in "the doctors office keeps a paper copy of my ct scan record") and show that you have an API for it, then go to sets. How do we collect data? One record at a time. (I seem to be in a mood as if trying to entice Fabian to quote me, eh?)
> That much too narrow view it is equally as narrow as describing data only in terms of sets and set operators. I would move to aggregate (by some defs) data soon enough and introduce the functions that can be applied to these "stored" functions.
> is exactly > the reason why so many web-applications have incredible crappy > database-programming behind it, NOW we are addressing a similar issue, but with a somewhat different diagnosis. A lot of the LAMP, uh, crap, that is going into production is written by people who have learned what a relation is and how to put it in 3NF, without a clue how to develop quality, maintainable, flexible, scalable, etc, software. Programming languages are taught in one undergrad course and data modeling in another, with different words, concepts, pictures, tools, naming standards, ...
> where they forget you can also have > other indexes and use the table "backwards" or where joins are not done > in SQL but in PHP. MySQL is setting us back more than a little, methinks. But, yes, all other functions need to be brought into the discussion, but I like starting with put and get -- lookUp.
I'm starting to dabble with AJAX (like thousands of others, but I am the first girl on my block to give it a spin) and I can cope with XHTML, CSS, and even JavaScript and the dog-ugly way you have to instantiate an http request object, but every time I think I'll try to figure out more than the basics of PHP, I gag. Now put that together with a striped down SQL-RDBMS and a person armed with the definition of a relation and rules for normalization and I am just really starting to sound like an old person, sheesh!
> If I would teach web-developers I would see it as one > of my biggest tasks to prevent such misuse of the DBMS. I'm thinking I'll prevent misuse by preventing use. From the file system and indexed sequential files to the DBMS and back to the file system with a semantic web or .xml files. Waddaya think? (Don't answer.)
> So I'm afraid I'm still very very wary of your terminology. Golly gee, I haven't a clue why ;-) But you gotta admit that we have to mix something up here as software development in general and data modeling in particular are becoming a DIY (do it yourself) hobby with a backlash against more traditional approaches. Since the SQL-RDBMS is in need of a significant face lift anyway, I'm thinking there will be a breakthrough there with a not-only-SQL/relational tool going neck and neck with MySQL, reaping the beneifts of XQuery for gets and puts, for example. I'd like to write a database app with no SQL in it (of course I can do that with 1970's tools today, but I'd like to do it with new tools).
> -- Jan Hidders Apologies for being too long and wordy, but I'm turning in rather than proofreading and trimming back. Hopefully it was entertaining enough and not too stupid. Cheers! --dawn
Marshall Spight - 14 Jul 2005 20:01 GMT > While the average comp sci major can figure out relations from the way > it is typically explained, I don't like the way we have carved out the [quoted text clipped - 6 lines] > hauling it all out of "there" (the typical sql-dbms). (The last > statement was for your amusement only, mostly, sortof). I agree that this separation is a real problem for software development. I believe that it is a historical accident, largely driven by the fact that dbms's have been completely dominated by commercial interests, while programming languages have had a much freer development path. The problem is, indeed, that we segment these concerns (application code and data management) but I don't believe there is actually any contest here. The goal isn't for one side to "win." Rather, each side has some really important lessons they have learned. The dbms tradition is quite impoverished when it comes to writing application code, and the application code side is really quite limited when it comes to what they can do with data management.
I believe the solution lies in a new, higher-order view that takes into account the lessons of both sides. That means we want the best features of each. We need declarative integrity constraints; we need the relational algebra, AND we need clever type systems, polymorphism, modularity, etc. We need to have these tools all available "right here" so they can be used together seamlessly. The database should not feel "far away."
(As an aside, I don't think that dynamically typed languages have a lot to offer for the enterprise. They're great for prototyping and for quite-small teams, but they don't scale up to large development processes. Just my opinion.)
> Ok, so you are, indeed, tapping into my brain (pretty scary). One of > the dangers of declaring "stored" database functions as "relations" is [quoted text clipped - 6 lines] > we collect data? One record at a time. (I seem to be in a mood as if > trying to entice Fabian to quote me, eh?) I think this approach makes sense if we're trying to come up with a view of programming and data management that we want to be able to teach in high school, but I don't see why this approach is the one we should take for the professional software engineer.
And in fact, I assert that this kind of record-at-a-time approach is a disaster for scalability. It works fine for hundres of records, and maybe for thousands, but for tens of thousands, or hundreds of millions, it's a disaster. The computing landscape is distributed now, for good or ill, and we have to build systems that don't send a network packet for every record, or every keypress, or whatever.
Marshall
David Cressey - 08 Jul 2005 03:44 GMT > I think that would be very confusing. First, his table actually has > ordering, functions don't. It's not clear to me that a table has ordering, if I understand you correctly.
The columns of a table can be referred to by name. While the names can be sorted alphabetically, it's not clear that columns have an inherent order.
The rows of a table can be assigned identifying numbers, if desired. Those numbers have a natural order, but it's not clear that the rows themselves do.
That's distinct from the contents of the rows, which can be ordered if the domains of the columns are ordered.
And, of course, at some layer of representation, it all has order.
dawn - 08 Jul 2005 17:24 GMT > > I think that would be very confusing. First, his table actually has > > ordering, functions don't. > > It's not clear to me that a table has ordering, if I understand you > correctly. I don't know if "table" has a commonly agreed upon def (and I don't know if it is in the cdt glossary), but I suspect that Jan is right that column ordering would be typical of the common use, if not the definition, of the term. If you use that word without a definition, then there is likely an assumption in the mind of the listener that they can see the table as rows and columns and that if someone else saw this same table, they would see it with the columns in the same order.
Do you have a definition you work with when talking about tables? I tend to refer to them as spreadsheets since the audience can relate to that easily. While spreadsheet columns can be reordered, at any point in time they have an ordering.
> The columns of a table can be referred to by name. While the names can be > sorted alphabetically, it's not clear that columns have an inherent order. What is the distinction you make between a table and a relation?
> The rows of a table can be assigned identifying numbers, if desired. Those > numbers have a natural order, but it's not clear that the rows themselves [quoted text clipped - 4 lines] > > And, of course, at some layer of representation, it all has order. Since we are talking about the definition of a term, I suppose you could define it however you like, but perhaps we could find one or more defs and put them in the glossary since mAsterdam is back.
Cheers! --dawn
mAsterdam - 09 Jul 2005 03:42 GMT >>>I think that would be very confusing. First, his table actually has >>>ordering, functions don't. [quoted text clipped - 4 lines] > I don't know if "table" has a commonly agreed upon def (and I don't > know if it is in the cdt glossary), Nope. Except for Jans excellent contribution on NULL and a rephrase of data and meaning in wikipedia nothing much has changed since 0.0.4. If there is more I'll take the time to make a 0.0.5 - or jump to 0.1.0. There was a sub-thread with nice cdt glossary stuff about possreps and unions (by VC and Jon), but nothing copy & pastable enough for the glossary yet :-(
> but I suspect that Jan is right > that column ordering would be typical of the common use, if not the [quoted text clipped - 25 lines] > could define it however you like, but perhaps we could find on |
|