Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / July 2005

Tip: Looking for answers? Try searching our database.

Base Normal Form

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
David  Cressey - 04 Jul 2005 02:48 GMT
I'd like to suggest a new Normal Form definition, one that I'm calling Base
Normal Form, for lack of a better term.

The purpose is NOT to advance theory any further.  It's to make it easier to
teach introductory database design.

Here it is:

A table is in base normal form if and only if it has at least one candidate
key.

Note that the definition is for tables and not relations.  All relations are
in Base Normal Form, so the definition is superfluous with regard to
relations.

I haven't defined candidate key, but I would want the definition to be
compatible with the definition of candidate key as used in BCNF.

I'm speculating that this definition,  together with BCNF,  would be
sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF,
and leaving normal forms beyond BCNF for more advanced treatement.
Jan Hidders - 04 Jul 2005 20:41 GMT
David Cressey wrote:
> I'd like to suggest a new Normal Form definition, one that I'm calling Base
> Normal Form, for lack of a better term.

In some texts I've seen it called the UNF (unnormalized normal form) or
the NFNF (non first normal form). To make things confusing, it is also
equivalent with Chris Date's definition of 1NF.

> I haven't defined candidate key, but I would want the definition to be
> compatible with the definition of candidate key as used in BCNF.

That's do-able. If you define a table as a list of tuples that all have
the same attributes, then a superkey is a set of attributes K such that
there cannot be two different positions in the table such that the
tuples at these positions agree on the values for K. The notion of
candidate key is then based on the notion of superkey in the usual way.

Note that this definition is sloppy in the usual way where we ignore the
fact that a candidate key is not so much a property of a single relation
but rather of the set of all relations that are valid for a certain relvar.

> I'm speculating that this definition,  together with BCNF,  would be
> sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF,
> and leaving normal forms beyond BCNF for more advanced treatement.

I wouldn't skip 3NF. The difference between BCNF and 3NF is an important
one and the decision to go beyond 3NF should be an informed one.

-- Jan Hidders
David  Cressey - 06 Jul 2005 02:30 GMT
> David Cressey wrote:
> > I'd like to suggest a new Normal Form definition, one that I'm calling Base
[quoted text clipped - 3 lines]
> the NFNF (non first normal form). To make things confusing, it is also
> equivalent with Chris Date's definition of 1NF.

Thanks, Jan.

I guess I'll stick with "Base Normal Form" for now.  Are UNF and NFNF
defined on tables or on relations?

> > I haven't defined candidate key, but I would want the definition to be
> > compatible with the definition of candidate key as used in BCNF.
[quoted text clipped - 4 lines]
> tuples at these positions agree on the values for K. The notion of
> candidate key is then based on the notion of superkey in the usual way.

What's the difference between calling a table a list of tuples, and calling
a table an array of tuples?

> Note that this definition is sloppy in the usual way where we ignore the
> fact that a candidate key is not so much a property of a single relation
> but rather of the set of all relations that are valid for a certain relvar.

I'm hazy on the distinction between a relation and a relvar.

> > I'm speculating that this definition,  together with BCNF,  would be
> > sufficient to cover introductory DB design, skipping over 1NF, 2NF, and 3NF,
> > and leaving normal forms beyond BCNF for more advanced treatement.
>
> I wouldn't skip 3NF. The difference between BCNF and 3NF is an important
> one and the decision to go beyond 3NF should be an informed one.

Can you say some more about this?  What are the consequences of taking a
step between 3NF and BCNF?

> -- Jan Hidders
Jan Hidders - 06 Jul 2005 08:12 GMT
David Cressey wrote:

>> David Cressey wrote:
>>
[quoted text clipped - 10 lines]
> I guess I'll stick with "Base Normal Form" for now.  Are UNF and NFNF
> defined on tables or on relations?

Hmm, good point, on relations. So they are actually not the same as your
concept.

>>> I haven't defined candidate key, but I would want the definition
>>> to be compatible with the definition of candidate key as used in
[quoted text clipped - 9 lines]
> What's the difference between calling a table a list of tuples, and
> calling a table an array of tuples?

Nothing, really. Although for me an 'array' has some associations with a
particular kind of implementation.

>> Note that this definition is sloppy in the usual way where we
>> ignore the fact that a candidate key is not so much a property of a
>> single relation but rather of the set of all relations that are
>> valid for a certain relvar.
>
> I'm hazy on the distinction between a relation and a relvar.

A relation is a particular set of tuples, say, the contents of the
Employees table at this moment. The relvar would in that case be more
abstract notion of the Employees table. People will often use the term
relation for both things.

>> I wouldn't skip 3NF. The difference between BCNF and 3NF is an
>> important one and the decision to go beyond 3NF should be an
>> informed one.
>
> Can you say some more about this?  What are the consequences of
> taking a step between 3NF and BCNF?

You might no longer be dependency preserving. That means that some of
the constraints can no longer be represented as key constraints and
foreign key constraints.

-- Jan Hidders
Jon Heggland - 06 Jul 2005 09:47 GMT
> What's the difference between calling a table a list of tuples, and calling
> a table an array of tuples?

In some formal contexts, I think a list is either empty, or consisting
of an element and a tail, which is also a list. In contrast, an array is
a sequence of elements that can be referred to by ordinal position. In
this case, there is no significant difference, I think.

> I'm hazy on the distinction between a relation and a relvar.

"Relvar" is short for "relation variable". A relvar can hold a relation
value, just like an integer variable can hold an integer. "Relvar"
corresponds to TABLE in SQL; a relation value is the value (contents) of
a table at some time. Updating a table is really assigning a new
relation value to the relvar (though in many cases, the new value
closely resembles the old).

Most literature use "relation" for both "relation variable" and
"relation value", and depend on context to make the meaning clear.
Alternatively, they dismiss the notion that there is a difference.
Signature

Jon

dawn - 07 Jul 2005 03:24 GMT
> I'd like to suggest a new Normal Form definition, one that I'm calling Base
> Normal Form, for lack of a better term.
[quoted text clipped - 6 lines]
> A table is in base normal form if and only if it has at least one candidate
> key.

I think that what you are describing is a mathematical function
(sometimes referred to as a mapping), perhaps?  (I believe the def of a
function and mathematical def of relation are in the cdt glossary.)
Using these terms can be helpful as there are very clear, generally
agreed upon, definitions for sets, relations, functions in mathematics
from which to draw.  It is necessary to make a distinction, then,
between a mathematical relation and the (myriad of) defs the database
community has for "relation" but since the latter arose from the former
originally, this is not hard for a student to comprehend.

Using the term "function" is also helpful when looking at computers in
general and discussing programming in terms of input-processing-output
(sure, I've heard of objects, but I'll keep on track and avoid straying
into that subject right now).  Input-processing-output -->
keyValue-lookupFunction-relationValue

> Note that the definition is for tables

I'm not sure what your def is for this -- rows and columns where a cell
may include any defined type?

> and not relations.  All relations are
> in Base Normal Form, so the definition is superfluous with regard to
> relations.

If you go back to the mathematical definition of a relation, rather
than one of the variations in the database field, a relation is not
necessarily a function, but a function is a relation (by definition).
Once you require a relation to have a candidate key, you do have a
function -- a good starting point prior to identifying functional
dependencies and discussing normal forms, I would think.

I, too, don't know if this advances any theory as I'm concerned about
practice (and related instruction).  I have found using these
mathematical terms to be helpful due to their precision and simplicity.

<snip>

Cheers!  --dawn
Jan Hidders - 07 Jul 2005 19:09 GMT
>>I'd like to suggest a new Normal Form definition, one that I'm calling Base
>>Normal Form, for lack of a better term.
[quoted text clipped - 9 lines]
> I think that what you are describing is a mathematical function
> (sometimes referred to as a mapping), perhaps?

I think that would be very confusing. First, his table actually has
ordering, functions don't. Second, functions are binary relations, and
even though you could interpret every n-ary relation as a binary
relation, that is different from actually being the same. Finally, if
there are multiple candidate keys then it actually represents several
functions. It's really not a good idea to confuse the concept of
relation and function.

-- Jan Hidders
dawn - 07 Jul 2005 21:30 GMT
> >>I'd like to suggest a new Normal Form definition, one that I'm calling Base
> >>Normal Form, for lack of a better term.
[quoted text clipped - 12 lines]
> I think that would be very confusing. First, his table actually has
> ordering, functions don't.

The elements of mathematical relations are also ordered tuples,
correct?  One problem might be that the def of relation and,
subsequently, function is often written in terms of a binary relation,
but that is because an n-ary relation can be seen as binary with domain
and range where each is n dimensional.

> Second, functions are binary relations,

if I had read one more sentence fragment, I would have seen that is
what you decided.  Functions need not be binary any more than relations
need be -- you can view them as binary if you choose, as there is no
harm in doing so.

> and
> even though you could interpret every n-ary relation as a binary
> relation,

exactly -- there is no more difficultly in considering relations as
n-ary than functions.  Functions are just a special case of relations,
right?  They are relations with a mapping defined from a "domain" to a
"range" -- from input to output.  Very clean.

> that is different from actually being the same.

Correct, but however you want to work with relations - either binary or
n-ary, you can do the same with functions.

> Finally, if
> there are multiple candidate keys then it actually represents several
> functions.

Yes, but that is not a new concept.  We can lookup a person by SSN or
some other ID -- two functions.  Very intuitive.

> It's really not a good idea to confuse the concept of
> relation and function.

I definitely agree!  A function is a specific type of relation where a
relation might not be a function at all.  A function is a relation that
has a mapping from, for example, a candidate key value to a relation
tuple (I wrote "relationValue" in my posting, which I noticed as I sent
it when I intended relationElementValue or tuple, by the way).

It is important not to confuse the two as the one is a subset of the
other.  Similarly, we do not want to confuse a set and a relation.
That is why they each have distinct definitions.  Where is the problem?
Thanks --dawn

> -- Jan Hidders
Jan Hidders - 07 Jul 2005 22:14 GMT
>>It's really not a good idea to confuse the concept of
>>relation and function.
>
> I definitely agree!

And yet that seems to be the consequence of what you are proposing. I
could be wrong but I interpreted your suggestion as that every relation
with a candidate key is a function. Since every relation has at least
one candidate key that means that every relation is a function according
to you. And yet, at the same time, you say:

> A function is a specific type of relation where a
> relation might not be a function at all.

So you seem to be contradicting yourself.

-- Jan Hidders
dawn - 08 Jul 2005 04:35 GMT
> >>It's really not a good idea to confuse the concept of
> >>relation and function.
[quoted text clipped - 4 lines]
> could be wrong but I interpreted your suggestion as that every relation
> with a candidate key is a function.

Yes, that is what I am saying, but I'll try to say it better.

Every database relation can be modeled as a mathematical relation.
Every database relation that has a candidate key can be modeled as a
mathematical function as well. David asked about tables, which imply a
column ordering and (perhaps) are more lax about typing.  A table can
also be modeled with a mathematical relation.  Unlike a database
relation, a mathematical relation does have (column) ordering in (row)
tuples.  A table with a candidate key can be modeled by a function as
well.  A table in "Base Normal Form" is one that can be modeled by a
function.

> Since every relation has at least
> one candidate key

I don't have a handy list of all definitions of "relation" handy, but I
thought that some such definitions did not require candidate keys,
although I suspect most modern ones do. If every definition of a
database relation requires it to have at least one candidate key, then
all database relations, but not all tables (which do not have such a
requirement), and not all mathematical relations can be modeled with
mathematical functions.

> that means that every relation is a function according
> to you. And yet, at the same time, you say:

Every database relation, if defined to require a candidate key, can
then be modeled as a function.  Yes.

> > A function is a specific type of relation where a
> > relation might not be a function at all.

Every mathematical function is necessarily a mathematical relation.

> So you seem to be contradicting yourself.

Did that clarify?  Does it make sense?  Do you accept it?

In spite of the fact there are different definitions for "database
relation" and "mathematical relation" I still think it helps to teach
the basics of data modeling by starting with language,
deriving/creating appropriate predicates and corresponding tables, then
modeling these as functions.  Functions seem easier for people to grasp
than relations, so I start with sets & functions.  I suspect that is
what David is getting at when talking about "tables in base normal
form".  It's just a function, brother.  ;-)

It isn't hard to comprehend the differences between a mathematical
relation and a relation as defined by the relational database
community, although I'll admit it can be confusing.  If I explain data
modeling to somone not predisposed to talk about "relations" then I can
just talk about sets and functions and they get it.  A function takes
in a URL and gives you a web page.  It takes in a state abbreviation
and gives you a state name.  It takes in an SSN and gives you a name.
It can be thought of as a set/object as well as a function/actor.

Then we can add in operators (which are, of course, functions) and
types (sets) and move on to logic (two-valued, please!).

[Then temporal matters and metadata discussions and so on.  After that
we can discuss where the real complexity comes in -- with data stored
in a human language, determined, entered, searched for, and interpreted
by humans.]

Cheers!  --dawn

> -- Jan Hidders
mAsterdam - 08 Jul 2005 09:45 GMT
Nice to hear from you :-)

>>>>It's really not a good idea to confuse the concept of
>>>>relation and function.
[quoted text clipped - 10 lines]
> Every database relation that has a candidate key can be modeled as a
> mathematical function as well.

There is a function for every candidate key. That is the somewhat
confusing difference here, no?

[snip]

> Every database relation, if defined to require a candidate key, can
> then be modeled as a function.  Yes.

So "a function" here should be /functions/.

 [snip]

> It isn't hard to comprehend the differences between a mathematical
> relation and a relation as defined by the relational database
[quoted text clipped - 12 lines]
> in a human language, determined, entered, searched for, and interpreted
> by humans.]
dawn - 08 Jul 2005 17:18 GMT
> Nice to hear from you :-)
>
[quoted text clipped - 15 lines]
> There is a function for every candidate key. That is the somewhat
> confusing difference here, no?

Not to me.  It is easier for me to teach and to view multiple sets of
candidate keys as multiple functions as well.  When you teach someone
how to determine a candidate key, you typically teach it by describing
a function anyway.  You can use language such as primary and alternate
functions if you like.

> [snip]
>
> > Every database relation, if defined to require a candidate key, can
> > then be modeled as a function.  Yes.
>
> So "a function" here should be /functions/.

It can be modeled as many relations as well since a database relation
has unordered columns and a mathematical relation has them ordered.  If
you model with a function (of your choice) you need not additionally
talk about having a candidate key -- your mathematical model
incorporates this database concept.

Sure you can model with a mathematical relation and then talk about
some other concept of a candidate key if you prefer, but I prefer to
have a nice clean mathematical model that is so easy to understand.  I
suspect that most people who have not done any data modeling before
will have an easier time with the concept of a mathematical function
than the concept of a mathematical relations.  Talking about
input-function-output is also very intuitive.  All digital data are
stored in functions if they are to be retrieved again.

Additionally, the concept of a foreign key pops right out as a function
mapping an attribute of one function to a "key" (by whatever name you
like) of another.

All models are flawed, of course, but some are useful (George Box?) and
I have found that modeling data as functions (relations with candidate
keys) as well as modeling processes/methods/functions as functions
aligns very well with how people think.

I don't ever model data in the abstract, but only in the context of one
or more applications.  Rather than a data vs process mindset, I see
these as two sides of the same coin and showing how cleanly data is
modeled with functions help with that wholistic approach.

I doubt I said that very well and anyone reading "wholistic approach"
might jump in with discussions of decoupling (and I have opinions about
that too but I'm headed to the big city for the weekend, so too-da-loo
for now.)  Cheers!
--dawn

>   [snip]
>
[quoted text clipped - 14 lines]
> > in a human language, determined, entered, searched for, and interpreted
> > by humans.]
mAsterdam - 09 Jul 2005 03:11 GMT
>>>>>>It's really not a good idea to confuse the concept of
>>>>>>relation and function.
[quoted text clipped - 19 lines]
> a function anyway.  You can use language such as primary and alternate
> functions if you like.

Hm... I was with you until the last sentence. I never liked
the "primary key" thing. It forces an arbitrary decision at a time when
you really don't know enough yet.

>>[snip]
>>>Every database relation, if defined to require a candidate key, can
[quoted text clipped - 4 lines]
> It can be modeled as many relations as well since a database relation
> has unordered columns and a mathematical relation has them ordered.

Ah the "naming" trick. It takes care of associating values
to the right part of the header - and it invites language.

This is the schism point of database and set theory.

> If you model with a function (of your choice)
> you need not additionally
[quoted text clipped - 7 lines]
> will have an easier time with the concept of a mathematical function
> than the concept of a mathematical relations.

In my experience: yes.

> Talking about
> input-function-output is also very intuitive.  All digital data are
> stored in functions if they are to be retrieved again.

Here we differ. The notion of storing something in a
function seems alien. A function to get something (return it) - ok.
A function to set something - state, side effects come to mind.
A function to store something in. Huh?

> Additionally, the concept of a foreign key pops right out as a function
> mapping an attribute of one function to a "key" (by whatever name you
[quoted text clipped - 9 lines]
> these as two sides of the same coin and showing how cleanly data is
> modeled with functions help with that wholistic approach.

Same here. (Holistic this side of the pond)

[snip]

>>>[Then temporal matters and metadata discussions and so on.  After that
>>>we can discuss where the real complexity comes in -- with data stored
>>>in a human language, determined, entered, searched for, and interpreted
>>>by humans.]
Marshall  Spight - 09 Jul 2005 19:36 GMT
> > It can be modeled as many relations as well since a database relation
> > has unordered columns and a mathematical relation has them ordered.
[quoted text clipped - 3 lines]
>
> This is the schism point of database and set theory.

I don't see that it's much of a schism. One needs to be able
to uniquely identify the attributes of the relation, in either
mathematics or data management. One does so via identifying marks.
These marks can be strings or they can be integer indexes.
No big whoop. Notationally, some advantages and some disadvantages
to either approach.

To be clear: this is a notational issue merely.

Marshall
mAsterdam - 10 Jul 2005 15:18 GMT
Marshall Spight wrote:

>>>It can be modeled as many relations as well since a database relation
>>>has unordered columns and a mathematical relation has them ordered.
[quoted text clipped - 8 lines]
> mathematics or data management. One does so via identifying marks.
> These marks can be strings or they can be integer indexes.

To the DBMS internally the names are just strings.
People tend to associate meaning with the names.
These raise expectations about the content of tables.
Not meeting those triggers changes of the design of the database.

> No big whoop. Notationally, some advantages and some disadvantages
> to either approach.
>
> To be clear: this is a notational issue merely.

Indeed. Notational differences can get quite big though:

roman vs arab numerals
phonographic vs ideographic scripts
dawn - 12 Jul 2005 10:56 GMT
<snip>
> >>There is a function for every candidate key. That is the somewhat
> >>confusing difference here, no?
[quoted text clipped - 8 lines]
> the "primary key" thing. It forces an arbitrary decision at a time when
> you really don't know enough yet.

Most modeling in the early stages results in the modeler identifying a
primary key, whether or not that is an implementation concept in the
target database.  Web pages are all stored with a primary key (the URL)
and all those tabbed paper folders that are still used at (most? all?)
doctor's offices and elsewhere have a primary key.  It might even be
more awkward for people to consider all CKs equal than to think of one
as the primary "lookup" key.  And, yes, I'm "mixing" process and data
again ;-)

> >>[snip]
> >>>Every database relation, if defined to require a candidate key, can
[quoted text clipped - 9 lines]
>
> This is the schism point of database and set theory.

It is easy enough to model a database relation with a set theory
relation, so this is not a problem, but when db relational theorists
get all preachy on this unordered point, it amuses me.

Using language for headers is fine and dandy, but if you have a model
that permits synonyms, first giving each attribute a unique ordinal and
then redefining the ordinals can be useful at times.  Dutch and English
numbers are the same, for example.  The similarity of one relation to
another is evident when naming the attributes with numbers.  Only
naming them with numbers would be poor form, however, and in most
SQL-DBMS tools, it is a bad idea to create your table with attribute
names that are numbers.

But the fact that there is application software in production today
that includes attributes that can optionally be identified in the
software with an ordering is not a bad thing.  Having a logical order
to attributes (such as the order in which the attributes are listed in
a CREATE TABLE statement) must have  some consequences of which I am
not aware given all of the blanket statements about how awful it is.

Perhaps the issue is in implementation?  What happens when we rename
the attribute "3" to "2"?  That is likely where the concerns come in.
Is that just like renaming the attribute from LAST_NAME to lastName or
is the implementation model tied to the ordinals in the logical model?
I work with an old model that does tie the logical model to the
implementation model, which many of us have been taught is a bad thing.
But I can see no problems at all stemming from this.  This ordinal
then becomes the only fixed "name" for an attribute and can have many
other logical names, but you can't just change the ordering without
changing the relation.  And, surprisingly enough, the sky isn't falling
;-)

So, remind me, what precisely is the problem with ordering the
attributes in relations?  It results in ... because ...  ???

> > If you model with a function (of your choice)
> > you need not additionally
[quoted text clipped - 16 lines]
> Here we differ. The notion of storing something in a
> function seems alien.

Storing?  Who said anything about storage?  [as my colleage, mAsterdam,
might tell you ... laughing]

We model data throughout our software applications -- in the UI, in the
db schema, in rules, with all messages.  I don't care about storage,
but I care about the API to whatever services I need to use and the
results they product.  What functions do I need to call, passing in
what objects and getting what results and data in return?  I don't need
to think in terms of creating a table, but in terms of using a service
that makes the system remember information.  What input do I need to
send to what function to get the desired results?

In the case of a lookup function or a remember-this function, it makes
sense to pass in a key that is matched with a value.  Store this under
this handle so I can get it back again.

> A function to get something (return it) - ok.
> A function to set something - state, side effects come to mind.
> A function to store something in. Huh?

Storing & Retrieval oare sets and gets.  The addition of the word "in"
in that last one that isn't relevant.  I don't model what the data are
stored "in".

> > Additionally, the concept of a foreign key pops right out as a function
> > mapping an attribute of one function to a "key" (by whatever name you
[quoted text clipped - 11 lines]
>
> Same here. (Holistic this side of the pond)

Terrific!  I'm forever reading statements about how we must be sure to
model the data separate from how they will be used.

--dawn
Gene Wirchenko - 12 Jul 2005 17:17 GMT
[snip]

>It is easy enough to model a database relation with a set theory
>relation, so this is not a problem, but when db relational theorists
>get all preachy on this unordered point, it amuses me.

    Be careful about that amusement.  Sometimes, it is because the
other people know more about the situation than you do.  Surely, you
have heard the saying "Ignorance is bliss."?

[snip]

>So, remind me, what precisely is the problem with ordering the
>attributes in relations?  It results in ... because ...  ???

    If you say that the ordering is significant, then
         (a,b,c)
and
         (a,c,b)
are different.  If you go purely by the name, they are the same.  (I
have omitted the types for simplicity.  Assume that they are the
same.)

    This does not look like much, but consider a join.  Now, you have
to define which order the columns are in the join.  Make sure you do
it right, or you could end up with A join B and B join A resulting in
different orderings.

[snip]

    This is rather basic stuff, Dawn.  You claim to have studied the
material.  Is your axe not ground enough yet?

Sincerely,

Gene Wirchenko
dawn - 12 Jul 2005 21:41 GMT
> [snip]
>
[quoted text clipped - 5 lines]
> other people know more about the situation than you do.  Surely, you
> have heard the saying "Ignorance is bliss."?

I'll grant that my amusement could be due to ignorance.  Will you grant
that it might not be?  I would suggest my amusement has more to do with
an interest in watching how language is treated.  When a relational
theorist insists that columns be unordered, it is due to using a
different definition of "relation" than what a mathematician uses,
right?  Do you think I am ignorant of the change in the definition of
"relation" between mathematics and relational database theorists,
ignorant of the reason that the change was made, ignorant of just how
important it is to avoid logical ordering of logical columns, or what?
I have learned a lot from this list and don't plan to stop learning, so
please set me straight on this one, Gene.

> [snip]
>
[quoted text clipped - 13 lines]
> it right, or you could end up with A join B and B join A resulting in
> different orderings.

I don't say that ordering is significant AT ALL, but I do say that
using the term "relation" and then insisting that tuples be unordered
is redefining the term, thereby muddying terminology and notation
unnecessarily, in my opinion.

> [snip]
>
>      This is rather basic stuff, Dawn.  You claim to have studied the
> material.  Is your axe not ground enough yet?

You aren't reading me quite right yet, brother.  I have no axe to
grind; I have problems to solve.  One of them is how to build
bigger-bang-for-the-buck software.  I have several interests in this
regard.  One area is the way that data are modeled from user interface
to persistence interface and everything else (e.g. validation,
constraints, business rules).

There are reasons for decisions that have been made in defining
relational theory and reasons why so many people use SQL as the api to
persisted data. But are there times when using the same model of data
throughout an application might be a good idea, rather than using a
different API and different models for the UI and the database?  If so,
what are our options for how to model these data and what are the pros
and cons of each?  These questions and more...

Sifting out the best practices (such as discussions of "functional
dependencies") from the religion in relational theory (such as the most
common defs of 1NF taught to our college students) is one approach I'm
taking.  Every time there is a term with multiple definitions,
particularly if the defs are mutually exclusive, I might initially be
frustrated, but eventually I settle on "amused," perhaps as a coping
strategy.

But I will admit that I have more questions than answers.
smiles.  --dawn

> Sincerely,
>
> Gene Wirchenko
Gene Wirchenko - 13 Jul 2005 00:45 GMT
[snip]

>>      If you say that the ordering is significant, then
>>           (a,b,c)
[quoted text clipped - 11 lines]
>I don't say that ordering is significant AT ALL, but I do say that
>using the term "relation" and then insisting that tuples be unordered
                                                  ^^^^^^^^^^^^^^^^^^^
    No, not insisting that tuples be unordered, but that their order
is irrelevant to the RM and that you can not count on any specific
order.

>is redefining the term, thereby muddying terminology and notation
>unnecessarily, in my opinion.

    Different contexts, different definitions.

    I think there may a question of which tuples are being referred
to.

 1) There are the tuples that define the relation.  These are
unordered for the reason I give above.

 2) The tuples of data are also unordered.  If you request data, you
will not get it in any particular order unless you specify the order.
(If it does not matter what order the data comes to you, then why have
an order, even an implicit one?  If it does matter, then specify the
order you want.)

    I have no problem with either lack of ordering.

[snip]

>taking.  Every time there is a term with multiple definitions,
>particularly if the defs are mutually exclusive, I might initially be
>frustrated, but eventually I settle on "amused," perhaps as a coping
>strategy.

    I get amused from time to time, too, but I watch it closely.  Too
much amusement may mean that I am missing the point of something.

>But I will admit that I have more questions than answers.

    The knowledge I prize the most is the knowledge of what I know
and what I do not know.

Sincerely,

Gene Wirchenko
dawn - 14 Jul 2005 01:10 GMT
<snip>
> >I don't say that ordering is significant AT ALL, but I do say that
> >using the term "relation" and then insisting that tuples be unordered
>                                                    ^^^^^^^^^^^^^^^^^^^
>      No, not insisting that tuples be unordered, but that their order
> is irrelevant to the RM and that you can not count on any specific
> order.

Completely agree.  Tuples are ordered (by def typically, although one
could define the term otherwise) and their order is irrelevant to the
RM.

> >is redefining the term, thereby muddying terminology and notation
> >unnecessarily, in my opinion.
[quoted text clipped - 3 lines]
>      I think there may a question of which tuples are being referred
> to.

any of 'em, but I was referring to "column ordering" and not row
ordering.  The RM operators need not act on the column ordering info,
but it might be useful to define and order for printing out a data
catalog or whatever.

[snip]

>      The knowledge I prize the most is the knowledge of what I know
> and what I do not know.

and someone once said that the most dangerous knowledge is that of
which we are certain, but wrong.  Cheers!  --dawn

> Sincerely,
>
> Gene Wirchenko
Eric Junkermann - 12 Jul 2005 21:02 GMT
[snip]
>> Same here. (Holistic this side of the pond)
>
>Terrific!  I'm forever reading statements about how we must be sure to
>model the data separate from how they will be used.
>
>--dawn

You are all programmers at heart, you are tricked by your need to
process data into believing that it is yours. What you are writing today
may be the only application now, but there _will_ be another, and
another... If you don't consider the data in its own right, you are
guaranteeing high cost (or even impossibility) for all future
applications that will want to use this data.

To put it another way, data is the corporate asset, corporate survival
means being able to use assets in new ways when required.

How much expensive ETL would we now need if data design had always been
given a high priority?
Signature

Eric Junkermann

dawn - 12 Jul 2005 22:01 GMT
> [snip]
>  >> Same here. (Holistic this side of the pond)
[quoted text clipped - 5 lines]
>
> You are all programmers at heart,

I'll accept that.

> you are tricked by your need to
> process data into believing that it is yours.

but not that.  What I will accept is that changes will be required and
I need to mitigate those changes I think are most likely, using a risk
assessment strategy.  That includes changes that affect all aspects of
this application including the UI, the business rules, etc. and changes
that include other "services" needing to work with the same data or
share any other aspects of an application.  If one application needs to
"remember" (in human terms) anything and another application arises
that needs access to that same "memory" (again, human terms), whether
what is being remembered is called "code" or "data" or WHATEVER, then
our design needs to accomodate this sharing of "memory".

> What you are writing today
> may be the only application now, but there _will_ be another,

and you can bet that I know that -- this is a huge reason I'm doing the
poking around.  I don't buy everything the eXtreme A/D folks claim in
designing and implementing only for what is in front of me and not
looking past my nose, but we just might be going too far when we try to
think of data as if it needs no context.

> and
> another... If you don't consider the data in its own right, you are
> guaranteeing high cost (or even impossibility) for all future
> applications that will want to use this data.

Nope, I don't buy that.  If we don't have flexible software development
languages, models, tools, etc, then we are guaranteeing high costs.
There are tradeoffs to constraining the length of a field, for example,
and allowing any length.  The latter is more flexible, which might
yield better quality of data over time as requirements change if the
user doesn't feel forced to truncate information to fit their existing
software.  The former forces more conformity, which might provide
better quality of data by a tighter definition and enforcement of
todays constraints, with the (often false) assumption that if
requirements change then our implementation will change concurrently.
There isn't one right way, but some choices are better than others in a
given situation and some are commonly better choices than others.

> To put it another way, data is the corporate asset,

agreed

> corporate survival
> means being able to use assets in new ways when required.

yes, indeed, but be sure you are looking at the whole picture and not
just focusing on one at the risk to others.

> How much expensive ETL would we now need if data design had always been
> given a high priority?

Do you really think that most ETL is done because data are NOT modeled
relationally?  I suspect (but do not know) that more is done because
they are (and, of course, also because of a need to fix data at a point
in time in systems that do not have a time dimension built in).  It
might even be the case that those shops that do the least ETL are those
who are not using SQL-DBMS's. I have some anecdotal reasons to believe
this might be the case.  
Cheers! --dawn
Eric Junkermann - 13 Jul 2005 21:59 GMT
>> [snip]
>>  >> Same here. (Holistic this side of the pond)
[quoted text clipped - 12 lines]
>
>but not that.  What I will accept is that changes will be required
[snip]
>our design needs to accomodate this sharing of "memory".

and if the shared "memory" is the wrong "shape", so that our path from
"given" to "required" for the new application is too complicated, we
will pay for it somehow.  What I am saying is that if you use data in a
certain way you will tend to bundle it in that way, even if you are
trying to be flexible, so the best bundle is an open bundle, which will
have different ways in.

[snip]
>There are tradeoffs to constraining the length of a field, for example,
>and allowing any length.

but this is a physical issue, an implementation issue, not a design
issue, the logical data design does not care.

[snip]

>Do you really think that most ETL is done because data are NOT modeled
>relationally?

Did I mention relational?

ETL happens because data is the wrong "shape", the type of system it is
stored in is not the main problem.  Given similar needs, the ones that
do the least ETL are those with the most flexible data design.

Maybe I shouldn't have dragged ETL in, but my main point remains that
data independence is a good thing - which is where Codd came in, whether
you like his answer or not.

Regards,

Eric
Signature

Eric Junkermann

dawn - 14 Jul 2005 18:17 GMT
> >> [snip]
> >>  >> Same here. (Holistic this side of the pond)
[quoted text clipped - 18 lines]
> "given" to "required" for the new application is too complicated, we
> will pay for it somehow.

I'm with you there.

> What I am saying is that if you use data in a
> certain way you will tend to bundle it in that way, even if you are
> trying to be flexible,

Yes, I completely agree.

> so the best bundle is an open bundle, which will
> have different ways in.

That is a bit vague.  It is best to weigh the options and risks for
various approaches.

> [snip]
> >There are tradeoffs to constraining the length of a field, for example,
> >and allowing any length.
>
> but this is a physical issue,

I don't think of the conceptual data model as having such constraints
in it, but I do think of such constraints being in the logical model if
such a constraint is required.

> an implementation issue, not a design
> issue, the logical data design does not care.

Hmm. If our current business logic is that our product codes are a
maximum of eight characters, then I would think that would be a
constraint in the logical design, no?

> [snip]
>
> >Do you really think that most ETL is done because data are NOT modeled
> >relationally?
>
> Did I mention relational?

I guess not, but it is only relational theorists that I have heard
suggest that data be considered outside of any context (I might not be
saying that right, so perhaps you can restate your original point in
your own words?)

> ETL happens because data is the wrong "shape", the type of system it is
> stored in is not the main problem.

I agree with you on the "shape" issue.  The systems I have seen that do
not end up forcing customers into data marts to the hilt include more
operations than just relational ones & permit more complex data types
than simply relations, thereby providing what I would see as more
"shapes" for the data.  That was my point.

> Given similar needs, the ones that
> do the least ETL are those with the most flexible data design.

Yes, and those I have seen get some of that that flexibility from not
being tied to only the relational model.

> Maybe I shouldn't have dragged ETL in, but my main point remains that
> data independence is a good thing -

There simply is no such thing as useful words (data & metadata) without
context.

> which is where Codd came in, whether
> you like his answer or not.

I could be wrong, but I don't think that was the point of his efforts.
Codd modeled sets of propositions as relations and worked with set
operators and such, thereby introducing some much needed formalism.
Don't get me wrong, that was definitely a good thing.

My point is that if you have a business domain that you are modeling,
you need to model the data from front to back in this system in line
with the known requirements.  You should identify risks by
understanding what you are modeling.

So, for example, if you model a toenail as if it were a property of a
person, it is a good idea to be aware whether that toenail can still be
a toenail without a person anywhere around.  It is the risk assessment
that should determine whether or not it is wise to give individual toe
numbers their own identity and prepare a relationship between people
and toenails rather than making toenail an attribute of person.  There
is obviously a cost in doing so and all involved parties might consider
this toenail situation and deem it to be so unlikely that the owner of
this application asset would ever care about the toenails after they
have been clipped or torn off a person that there is no reason to model
them as anything other than a property of the person.

Does that explain my position in a way you agree with or do you still
think you can do some "data in a vacuum" (my words, not yours, so feel
free to correct my understanding of your position) type of modeling?

cheers!  --dawn

> Regards,
>
> Eric
Jan Hidders - 08 Jul 2005 23:42 GMT
>>>>It's really not a good idea to confuse the concept of
>>>>relation and function.
[quoted text clipped - 6 lines]
>
> Yes, that is what I am saying, but I'll try to say it better.

Ok. That might well solve it because basically my point was that your
terminology was too sloppy for my taste.

> Every database relation can be modeled as a mathematical relation.

Agreed. From a mathematical point of view it doesn't make much sense to
distinguish the two.

> Every database relation that has a candidate key can be modeled as a
> mathematical function as well.

Sure. But let me emphasize once more that *every* database relation has
at least one candidate key. This is not up for discussion, it's a simple
mathematical fact. I suspect that the case that you are overlooking is
the one where that candidate key happens to be the set of all columns,
in which case the function you would associate with it is probably the
function that maps each tuple to itself.

By the way, which function do you chose? If I have R(A,B,C,D) with
candidate key AB then do we take function AB->ABCD or AB->CD? You seem
to prefer the first. Do you have a specific reason for that?

>>Since every relation has at least
>>one candidate key
>
> I don't have a handy list of all definitions of "relation" handy, but I
> thought that some such definitions did not require candidate keys,
> although I suspect most modern ones do.

It doesn't have to be required, it usually follows from the fact that
the relation is a set.

> Did that clarify?  Does it make sense?  Do you accept it?

I have no problem with statements like "I can model every relation with
a candidate key CK as a function that maps a value for the CK to a tuple
of that relation". But for me that is very different from saying that
the two concepts are the same. Such a formulation also does more justice
to the fact that there might in fact be several candidate keys which all
define different functions.

> In spite of the fact there are different definitions for "database
> relation" and "mathematical relation" I still think it helps to teach
> the basics of data modeling by starting with language,
> deriving/creating appropriate predicates and corresponding tables, then
> modeling these as functions.  Functions seem easier for people to grasp
> than relations, so I start with sets & functions.

I've heard similar arguments before from people who think that when you
explain normalization you should first explain things under the
assumption that there is only one candidate key. That one-key assumption
also seems to be hovering in the background of your terminology. In my
experience there is always a backlash once you get to the 3NF and BCNF
because there you have to deal with more than one candidate key. That
means that the student then has to make two big conceptual jumps at
once: understanding >1 cand. keys, and BCNF.

Note by the way that the link between functional dependencies and
interpreting relations as functions is very close. If the relation
R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I
wouldn't say it represents a function, I would say it represents three
functions.

-- Jan Hidders
Gene Wirchenko - 11 Jul 2005 17:55 GMT
[snip]

>Note by the way that the link between functional dependencies and
>interpreting relations as functions is very close. If the relation
>R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I
>wouldn't say it represents a function, I would say it represents three
>functions.

    I struggle to follow these arguments.  In the above, I think that
there is an error or something questionable.  If AB->C then
ABanything->C (as in ABD->C), but what is the point?

Sincerely,

Gene Wirchenko
Jan Hidders - 12 Jul 2005 22:28 GMT
> [snip]
>
[quoted text clipped - 7 lines]
> there is an error or something questionable.  If AB->C then
> ABanything->C (as in ABD->C),

Yes, well observed, the FD ABC->C is redundant.

> but what is the point?

A relation can sometimes represent several functions, not just one.

-- Jan Hidders
Jan Hidders - 12 Jul 2005 23:44 GMT
>> [snip]
>>
[quoted text clipped - 9 lines]
>
> Yes, well observed, the FD ABC->C is redundant.

!@#$ What I wanted to say was: the FD ABD->C is redundant.

-- JH
dawn - 12 Jul 2005 12:09 GMT
> >>>>It's really not a good idea to confuse the concept of
> >>>>relation and function.
[quoted text clipped - 9 lines]
> Ok. That might well solve it because basically my point was that your
> terminology was too sloppy for my taste.

I'll accept that.  When applying a mathematical model within a field
that has taken mathematical terms and given them new definitions is
problematic.  A candidate key of a relation could be modeled as the
domain of a function -- gotta love language, eh?

> > Every database relation can be modeled as a mathematical relation.
>
> Agreed. From a mathematical point of view it doesn't make much sense to
> distinguish the two.

I suspect that what you mean here is that when you are doing any
mathematics, such as set operations, with relations, you are using the
mathematical definition of the relation.  Could we possibly rephrase
your statement as "when working with the mathematical model of the
relation, we work only with the mathematical definition of a relation"?
We can ditch the database relation definition (we need not care about
meaningful names for attributes and ensuring no ordering of such, for
example) when doing set operations. So, when do we need to use the
database definitions for relation?  Could we use terms like "table",
"folder" or even "file" (gasp) and stop using muddying things by using
definitions of "relation" that both add to and subtract from the
mathematical relation definition?

> > Every database relation that has a candidate key can be modeled as a
> > mathematical function as well.
>
> Sure. But let me emphasize once more that *every* database relation has
> at least one candidate key. This is not up for discussion, it's a simple
> mathematical fact.

I'll buy that it is a fact by definition.  That is yet another way that
the term "relation" has been butchered.  If every database relation has
at least one candidate key, then if we choose to model that relation as
a mathematical relation, then our mathematical relation has candidate
keys.

> I suspect that the case that you are overlooking is
> the one where that candidate key happens to be the set of all columns,
> in which case the function you would associate with it is probably the
> function that maps each tuple to itself.

I wasn't overlooking that case (since I'm working from an existing
model) but there are two different functions that are used with it as
in these examples:

Person("12345") = ("John", "Smith", "M")
and
Person("12345") = ("12345", "John", "Smith", "M")

So in the case of a candidate key consisting of all attributes, both
functions

f(CK) = CK
g(CK) = null set

are both useful.  Sloppy notation (where is the tuple or set notation?)
is on purpose

> By the way, which function do you chose? If I have R(A,B,C,D) with
> candidate key AB then do we take function AB->ABCD or AB->CD? You seem
> to prefer the first. Do you have a specific reason for that?

I guess I just answered that.  Both are useful.  I prefer AB->ABCD
because the entire tuple, which could have candidate keys other than
AB, is together, thereby getting the benefit of the function model and
the relation.

> >>Since every relation has at least
> >>one candidate key
[quoted text clipped - 5 lines]
> It doesn't have to be required, it usually follows from the fact that
> the relation is a set.

You lost me here.  How do you define a candidate key in mathematics so
that every set must have one?

> > Did that clarify?  Does it make sense?  Do you accept it?
>
> I have no problem with statements like "I can model every relation with
> a candidate key CK as a function that maps a value for the CK to a tuple
> of that relation".

And that is what I'm saying with the additional statement that I find
it useful to do so.

> But for me that is very different from saying that
> the two concepts are the same.

I'm not sure if/where I might have said that.  My intent was to answer
the question of what else people might call this idea of "base normal
form" when teaching these concepts once you have a table and at least
one candidate key.  I use the term "function" and show how you have a
function mapping key values to tuple values.  This makes the concepts
very simple, perhaps too simple for some.

> Such a formulation also does more justice
> to the fact that there might in fact be several candidate keys which all
> define different functions.

Absolutely. It makes for a good way to introduce multiple candidate
keys into the discussion -- multiple mappings/functions.

> > In spite of the fact there are different definitions for "database
> > relation" and "mathematical relation" I still think it helps to teach
[quoted text clipped - 6 lines]
> explain normalization you should first explain things under the
> assumption that there is only one candidate key.

I don't think such an assumption is important, but I can see starting
with a single key, one composed of a single attribute, to introduce the
topic of candidate keys, then lead up to more than one.  Functions are
an easy way to illustrate that, going from one simple key to multiple
composite candidate keys.

> That one-key assumption
> also seems to be hovering in the background of your terminology.

Yes, in that each function only models one of the candidate keys.  But,
a candidate key is then incorporated into that function and
corresponding notation.  You can see that you have a key when you are
modeling your db relation as a function.  The relation can be modeled
as multiple functions, but some are more useful than others in any
given situation.

> In my
> experience there is always a backlash once you get to the 3NF and BCNF
> because there you have to deal with more than one candidate key. That
> means that the student then has to make two big conceptual jumps at
> once: understanding >1 cand. keys, and BCNF.

The non-SQL-RDBMS implementation that I work with most often does fix a
single candidate key as the "primary key" but I think of this approach
as being implementation-independent, so I'll reflect on that again.

> Note by the way that the link between functional dependencies and
> interpreting relations as functions is very close. If the relation
> R(A,B,C,D,E) has the dependencies AB->C and BD->E and ABD->C the I
> wouldn't say it represents a function, I would say it represents three
> functions.

YES, YES, YES -- SO WOULD I!!!  It's all about functions and it's all
about data (so can we drag that "relations" word to the trash when
introducing these concepts and make it simple?)

Cheers!  --dawn

> -- Jan Hidders
Marshall  Spight - 12 Jul 2005 16:19 GMT
> [a lot of stuff]

Dawn,

Uh, wow.

I don't *want* to ditch the word relation; I don't have another
word that comes close. "Function" doesn't cut it. I also
don't see any particular conflict or even a difference between
mathematics and computer science; I consider CS to be a branch
of math, with some particular areas of emphasis, such as
how much work calculating a particular function is.

And what's wrong with "a subset of the product of sets?" That's
not all that complicated. It's no more complicated than "a mapping
from one set to another."

> > It doesn't have to be required, it usually follows from the fact that
> > the relation is a set.
>
> You lost me here.  How do you define a candidate key in mathematics so
> that every set must have one?

A "set" is a collection of *unique* members. Because of that word
"unique" there has to be a candidate key.

Marshall
dawn - 12 Jul 2005 21:06 GMT
> > [a lot of stuff]
>
> Dawn,
>
> Uh, wow.

Glad you enjoyed? :-)

> I don't *want* to ditch the word relation; I don't have another
> word that comes close. "Function" doesn't cut it.

Because ... ?  Granted, it is but one way to model data. I don't insist
on a single way of viewing the subject.  However, I can discuss the
topic of data modeling quite thouroughly without ever using the term
"relation" but not without using the term "function".

> I also
> don't see any particular conflict or even a difference between
> mathematics and computer science; I consider CS to be a branch
> of math, with some particular areas of emphasis, such as
> how much work calculating a particular function is.

So one might hope that CS would not take a solid, generally agreed
upon, mathematical term and redefine it.  But that does seem to be what
has happened, unfortunately.

> And what's wrong with "a subset of the product of sets?" That's
> not all that complicated.

I like it.  I haven't seen a product of sets that has unordered
"columns" however.

> It's no more complicated than "a mapping
> from one set to another."

Functions are a subset of relations that is a little easier to teach,
however.  In kindergarten the students match an item on the left hand
side of the page to exactly one on the right before any exercises that
permit the more generalized form of a relation, allowing an item on the
left to have a line to more than one item.  So, I would argue that
there is some slight advantage to introducing the concept of a function
and then generalizing instead of starting with relations and then
narrowing it down.

> > > It doesn't have to be required, it usually follows from the fact that
> > > the relation is a set.
[quoted text clipped - 4 lines]
> A "set" is a collection of *unique* members. Because of that word
> "unique" there has to be a candidate key.

OK, I'm with you and willing to hit my forehead and say "duh".  Saying
that because something is a set it must be a function is not accurate,
but, yes, there is a function that can be associated with any set.  If
no subset of the element/tuple variable is a candidate key, then the
entire element would be.  Sorry I missed the point.  --dawn
Marshall  Spight - 13 Jul 2005 04:29 GMT
> > > [a lot of stuff]
> >
[quoted text clipped - 3 lines]
>
> Glad you enjoyed? :-)

let's say I was engrossed.

> > I don't *want* to ditch the word relation; I don't have another
> > word that comes close. "Function" doesn't cut it.
>
> Because ... ?

For one thing, in my field, the term 'function' has a strong
connotation of being intentional, and relation has a strong
connotation of being extentional. enough so that if I mean
the other kind of each I explicitly specify.

> > I also
> > don't see any particular conflict or even a difference between
[quoted text clipped - 5 lines]
> upon, mathematical term and redefine it.  But that does seem to be what
> has happened, unfortunately.

It doesn't look much that way to me. And if it had, what would be
the problem? words change meaning as they move from one field to
another all the time.

> > And what's wrong with "a subset of the product of sets?" That's
> > not all that complicated.
>
> I like it.  I haven't seen a product of sets that has unordered
> "columns" however.

the product operation on integers and on relations is commutative,
so I don't see any basis for describing the operands of either
as ordered.

> > It's no more complicated than "a mapping
> > from one set to another."
[quoted text clipped - 7 lines]
> and then generalizing instead of starting with relations and then
> narrowing it down.

I respect that you have pedagogical issues to deal with but I don't
see that they are relevant here.

Marshall
Drago Ganic - 17 Jul 2005 21:25 GMT
Dawn,
functions and relations have in CS and mathematics two very important but
different meanings:
 function ==> computation, action/transformation
 relation ==> storage, state

The most fundamental objects in mathematics are functions and sets (=>
relations).

Functions cannot be UPDATEed / DELETEed  or MODIFYed ... those are done on
data which are stored somewhere. Functions just SELECTs data i.e.
gives/calculates value(s) FROM somewhere.

Where is the confusion? This newsgroup is about storage, hence sets. So sets
(incl. relations) are a good staring point for teaching databases (also in
kindergarten :-)

Greeting from Croatia,
Drago Ganic

>> > [a lot of stuff]
>>
[quoted text clipped - 54 lines]
> no subset of the element/tuple variable is a candidate key, then the
> entire element would be.  Sorry I missed the point.  --dawn
dawn - 18 Jul 2005 05:49 GMT
> Dawn,
> functions and relations have in CS and mathematics two very important but
> different meanings:
>   function ==> computation, action/transformation
>   relation ==> storage, state

I am quite sure that any mathematical definition of a function will
ensure that  the function is a relation.  Please look up the
definitions of these mathematical terms and let me know if that is not
what you find.

> The most fundamental objects in mathematics are functions and sets (=>
> relations).

I just deleted my full response to that and I'll only respond to the
last part of your statement.  A relation is a set (and a set does not
imply a relation, if that is what your notation was to indicate).  A
function is a relation.  Therefore a function is a set.

A function is a set that can be represented by showing that for each
element in the domain of the function (you could think of the domain of
a function to be the set of all values from which a key -- CK -- might
come) you can identify exactly one element in the range of the function
(a tuple, for example).  We can say that a function PERSON maps the
identifier "12345" to the tuple ("12345", "Rena", "Bakker"), for
example, where that tuple is an element of the relation that is more
specifically a function, and more generally a set.

> Functions cannot be UPDATEed / DELETEed  or MODIFYed ... those are done on
> data which are stored somewhere.

Of course they can.  You are relying too much on how you perceive a
function and not on what it is defined to be.

> Functions just SELECTs data i.e.
> gives/calculates value(s) FROM somewhere.

Are you saying that for any value from the set of possible inputs to
the function, there is a single output?  Now look at the set of tuples
with all input and corresponding output values.  That would be a
relation, right?

> Where is the confusion? This newsgroup is about storage,

there are more than a few folks here who would differ with you on this,
but I'll go along with it -- storage by whatever name

> hence sets.

Storage, hence sets?  You can't store a bag/multiset?

> So sets
> (incl. relations) are a good staring point for teaching databases (also in
> kindergarten :-)

I'm good with sets.  Sets & the more specific flavor of sets that are
functions are actually both good for kindergarten.  I've even seen
relations that are not sets there.  When a student graduates from high
school, they typically have heard definitions of two of those three
terms -- the two I like to start with.

> Greeting from Croatia,

and back to you from the middle of corn fields in Iowa USA.  Cheers!
--dawn

> Drago Ganic
>
[quoted text clipped - 56 lines]
> > no subset of the element/tuple variable is a candidate key, then the
> > entire element would be.  Sorry I missed the point.  --dawn
Jan Hidders - 12 Jul 2005 23:17 GMT
>>>>>>It's really not a good idea to confuse the concept of
>>>>>>relation and function.
[quoted text clipped - 13 lines]
> that has taken mathematical terms and given them new definitions is
> problematic.

Oh, come on, Dawn. Really! It is quite clear that there is a close
relationship between a relation as defined in the relational model and
the usual mathematical notion of relation. Just take a look in the
"Foundations of databases" book by Abiteboul, Hull and Vianu. (That's a
graduate text I usually recommend to PhD students.) They happily switch
between the two definitions and use whichever one is most practical for
the thing they want to present in the full knowledge they could use
either of them.

> A candidate key of a relation could be modeled as the
> domain of a function -- gotta love language, eh?

That's still way too sloppy and I think you can do better than that.

>>>Every database relation that has a candidate key can be modeled as a
>>>mathematical function as well.
[quoted text clipped - 8 lines]
> a mathematical relation, then our mathematical relation has candidate
> keys.

Mathematical relations also have always at least one candidate key, but
here you represent them as a set of numbers instead of a set of column
names. For every mathematical relation it is true that there is a set of
 positions such that no two distince tupels in the relation have the
same values in these positions.

>>>Did that clarify?  Does it make sense?  Do you accept it?
>>
[quoted text clipped - 4 lines]
> And that is what I'm saying with the additional statement that I find
> it useful to do so.

I also have no problem with the additional statement. :-)

>>But for me that is very different from saying that
>>the two concepts are the same.
[quoted text clipped - 5 lines]
> function mapping key values to tuple values.  This makes the concepts
> very simple, perhaps too simple for some.

How about "a relation is a set, so every tuple in it occurs at most
once". Is that too difficult for you? :-)

>>Note by the way that the link between functional dependencies and
>>interpreting relations as functions is very close. If the relation
[quoted text clipped - 5 lines]
> about data (so can we drag that "relations" word to the trash when
> introducing these concepts and make it simple?)

I'm sorry? Explaining the relational model gets simpeler if you don't
use or define the word "relation"?

-- Jan Hidders
dawn - 13 Jul 2005 02:56 GMT
<snip>

> > I'll accept that.  When applying a mathematical model within a field
> > that has taken mathematical terms and given them new definitions is
[quoted text clipped - 8 lines]
> the thing they want to present in the full knowledge they could use
> either of them.

Maybe the problem here is that I was accepting Date's def from his
latest edition of an intro text.  It really seems to be quite far
removed from a clean, clear def of "relation" and I was guessing that
there are many undergrads learning that or similar definitions.

> > A candidate key of a relation could be modeled as the
> > domain of a function -- gotta love language, eh?
>
> That's still way too sloppy and I think you can do better than that.

That was purposely sloppy -- many words with multiple meanings in
multiple contexts.

> >>I have no problem with statements like "I can model every relation with
> >>a candidate key CK as a function that maps a value for the CK to a tuple
[quoted text clipped - 4 lines]
>
> I also have no problem with the additional statement. :-)

Yes, I gathered that.  I suspect I'm working in a much simpler context
than you are, however.  For example, I might be explaining to a web
page developer who has never had a database behind their pages before,
how to model data. I would not bring in the term "relation" very
quickly, where I would use the term "function" right away.  Look how
many people, even software development professionals, explain relations
by saying something about how there are multiple tables and they are
related to each other.  The term "relation" really isn't all that
helpful when getting practitioners up and going, in my opinion.

> How about "a relation is a set, so every tuple in it occurs at most
> once". Is that too difficult for you? :-)

You know me, I'm pretty dense ;-)

> I'm sorry? Explaining the relational model gets simpeler if you don't
> use or define the word "relation"?

Remove that second question mark and I do believe we are getting
somewhere.
smiles.  --dawn
Jan Hidders - 13 Jul 2005 22:24 GMT
> <snip>
>
[quoted text clipped - 15 lines]
> removed from a clean, clear def of "relation" and I was guessing that
> there are many undergrads learning that or similar definitions.

They probably do, yes, although AFAIK the sales of his intro books are
dropping. I think I already discussed  in the past that his definitions
can be somewhat simplified, so I won't comment on that here.

>>>A candidate key of a relation could be modeled as the
>>>domain of a function -- gotta love language, eh?
[quoted text clipped - 3 lines]
> That was purposely sloppy -- many words with multiple meanings in
> multiple contexts.

Ah, my apologies, I think I misunderstood what you wanted to say with
that sentence.

> [...]  I suspect I'm working in a much simpler context
> than you are, however.  For example, I might be explaining to a web
> page developer who has never had a database behind their pages before,
> how to model data. I would not bring in the term "relation" very
> quickly, where I would use the term "function" right away.

Hmm. I think I understand where you are coming from, but this has
probably more to do with you and the way you think about data modelling
than with some inherent difficulty in understanding of the relational
model. I have taught at university level and sub-university level, also
to business students and chemistry students and I would *never* explain
things in terms of functions. The first time the term would show up
would probably at the point where I would discuss normalization.

One of the dangers with viewing relations as functions is that it also
tends to promote a view of the database where its only purpose is to
retrieve data given a certain key. That much too narrow view is exactly
the reason why so many web-applications have incredible crappy
database-programming behind it, where they forget you can also have
other indexes and use the table "backwards" or where joins are not done
in SQL but in PHP. If I would teach web-developers I would see it as one
of my biggest tasks to prevent such misuse of the DBMS.

So I'm afraid I'm still very very wary of your terminology.

-- Jan Hidders
dawn - 14 Jul 2005 05:20 GMT
> > <snip>
> >>>A candidate key of a relation could be modeled as the
[quoted text clipped - 16 lines]
> Hmm. I think I understand where you are coming from, but this has
> probably more to do with you

dang

> and the way you think about data modelling

like a programmer, right?

> than with some inherent difficulty in understanding of the relational
> model.

While the average comp sci major can figure out relations from the way
it is typically explained, I don't like the way we have carved out the
dbms from the rest of the application, modeling stored, persisted,
remembered data; validating such data; naming such data; etc decidedly
differently than other data that will hang around for less time.  The
current language does nothing to promote a holistic approach to
software development unless you use mountain man's approach of hauling
everything into the dbms. I think that is the next best thing to
hauling it all out of "there" (the typical sql-dbms).  (The last
statement was for your amusement only, mostly, sortof).

> I have taught at university level and sub-university level, also
> to business students and chemistry students and I would *never* explain
> things in terms of functions. The first time the term would show up
> would probably at the point where I would discuss normalization.

I'm sure.  So we disagree.  You are much more knowledgeable about the
theory than I and almost all of my concerns are those of an "old"
(seasoned?) practitioner.  I've watched the software development
industry devolve in some areas where I'd like to see us take another
look at the partitioning of the discipline and our software products.
When I led back to back projects with a team doing work in a
"pre-relational" database and then in a SQL-DBMS, a light went on, even
if the words, theory, and experiments to back it up aren't yet there --
a "Blink" moment, if you have read that book.

Trimming down the terms so we can see where we have like items and
where we don't by looking at the basics -- input to functions,
processing by functions, output from functions -- is one small step in
trying to show that databases need not seem altogether different than
the rest of the work in developing software.  It's all about data and
it's all about functions.  Input-processing-output.  End of mystery.

> One of the dangers with viewing relations as functions is that it also
> tends to promote a view of the database where its only purpose is to
> retrieve data given a certain key.

Ok, so you are, indeed, tapping into my brain (pretty scary).  One of
the dangers of declaring "stored" database functions as "relations" is
that they seem so distant, as if we cannot access them.  Set operators
are fine and dandy, but people understand single transactions handily.
They pay at the grocery store, get money from the bank, and type their
name into ba-zillion web pages that require it.  Start with individual
"records" (as in "the doctors office keeps a paper copy of my ct scan
record") and show that you have an API for it, then go to sets.  How do
we collect data?  One record at a time.  (I seem to be in a mood as if
trying to entice Fabian to quote me, eh?)

> That much too narrow view

it is equally as narrow as describing data only in terms of sets and
set operators. I would move to aggregate (by some defs) data soon
enough and introduce the functions that can be applied to these
"stored" functions.

> is exactly
> the reason why so many web-applications have incredible crappy
> database-programming behind it,

NOW we are addressing a similar issue, but with a somewhat different
diagnosis.   A lot of the LAMP, uh, crap, that is going into production
is written by people who have learned what a relation is and how to put
it in 3NF, without a clue how to develop quality, maintainable,
flexible, scalable, etc, software.  Programming languages are taught in
one undergrad course and data modeling in another, with different
words, concepts, pictures, tools, naming standards, ...

> where they forget you can also have
> other indexes and use the table "backwards" or where joins are not done
> in SQL but in PHP.

MySQL is setting us back more than a little, methinks.  But, yes, all
other functions need to be brought into the discussion, but I like
starting with put and get -- lookUp.

I'm starting to dabble with AJAX (like thousands of others, but I am
the first girl on my block to give it a spin) and I can cope with
XHTML, CSS, and even JavaScript and the dog-ugly way you have to
instantiate an http request object, but every time I think I'll try to
figure out more than the basics of PHP, I gag.  Now put that together
with a striped down SQL-RDBMS and a person armed with the definition of
a relation and rules for normalization and I am just really starting to
sound like an old person, sheesh!

> If I would teach web-developers I would see it as one
> of my biggest tasks to prevent such misuse of the DBMS.

I'm thinking I'll prevent misuse by preventing use.  From the file
system and indexed sequential files to the DBMS and back to the file
system with a semantic web or .xml files.  Waddaya think?  (Don't
answer.)

> So I'm afraid I'm still very very wary of your terminology.

Golly gee, I haven't a clue why ;-)  But you gotta admit that we have
to mix something up here as software development in general and data
modeling in particular are becoming a DIY (do it yourself) hobby with a
backlash against more traditional approaches.  Since the SQL-RDBMS is
in need of a significant face lift anyway, I'm thinking there will be a
breakthrough there with a not-only-SQL/relational tool going neck and
neck with MySQL, reaping the beneifts of XQuery for gets and puts, for
example.  I'd like to write a database app with no SQL in it (of course
I can do that with 1970's tools today, but I'd like to do it with new
tools).

> -- Jan Hidders

Apologies for being too long and wordy, but I'm turning in rather than
proofreading and trimming back.  Hopefully it was entertaining enough
and not too stupid.  Cheers! --dawn
Marshall  Spight - 14 Jul 2005 20:01 GMT
> While the average comp sci major can figure out relations from the way
> it is typically explained, I don't like the way we have carved out the
[quoted text clipped - 6 lines]
> hauling it all out of "there" (the typical sql-dbms).  (The last
> statement was for your amusement only, mostly, sortof).

I agree that this separation is a real problem for software
development.
I believe that it is a historical accident, largely driven by the
fact that dbms's have been completely dominated by commercial
interests,
while programming languages have had a much freer development path.
The problem is, indeed, that we segment these concerns (application
code and data management) but I don't believe there is actually
any contest here. The goal isn't for one side to "win." Rather, each
side has some really important lessons they have learned. The dbms
tradition is quite impoverished when it comes to writing application
code, and the application code side is really quite limited when
it comes to what they can do with data management.

I believe the solution lies in a new, higher-order view that takes
into account the lessons of both sides. That means we want the best
features of each. We need declarative integrity constraints; we
need the relational algebra, AND we need clever type systems,
polymorphism, modularity, etc. We need to have these tools all
available "right here" so they can be used together seamlessly.
The database should not feel "far away."

(As an aside, I don't think that dynamically typed languages have
a lot to offer for the enterprise. They're great for prototyping
and for quite-small teams, but they don't scale up to large
development processes. Just my opinion.)

> Ok, so you are, indeed, tapping into my brain (pretty scary).  One of
> the dangers of declaring "stored" database functions as "relations" is
[quoted text clipped - 6 lines]
> we collect data?  One record at a time.  (I seem to be in a mood as if
> trying to entice Fabian to quote me, eh?)

I think this approach makes sense if we're trying to come up with
a view of programming and data management that we want to be able to
teach in high school, but I don't see why this approach is the one
we should take for the professional software engineer.

And in fact, I assert that this kind of record-at-a-time approach
is a disaster for scalability. It works fine for hundres of records,
and maybe for thousands, but for tens of thousands, or hundreds
of millions, it's a disaster. The computing landscape is distributed
now, for good or ill, and we have to build systems that don't send
a network packet for every record, or every keypress, or whatever.

Marshall
David  Cressey - 08 Jul 2005 03:44 GMT
> I think that would be very confusing. First, his table actually has
> ordering, functions don't.

It's not clear to me that a table has ordering, if I understand you
correctly.

The columns of a table can be referred to by name.  While the names can be
sorted alphabetically, it's not clear that columns have an inherent order.

The rows of a table can be assigned identifying numbers, if desired.  Those
numbers have a natural order, but it's not clear that the rows themselves
do.

That's distinct from the contents of the rows,  which can be ordered if the
domains of the columns are ordered.

And, of course, at some layer of representation,  it all has order.
dawn - 08 Jul 2005 17:24 GMT
> > I think that would be very confusing. First, his table actually has
> > ordering, functions don't.
>
> It's not clear to me that a table has ordering, if I understand you
> correctly.

I don't know if "table" has a commonly agreed upon def (and I don't
know if it is in the cdt glossary), but I suspect that Jan is right
that column ordering would be typical of the common use, if not the
definition, of the term.  If you use that word without a definition,
then there is likely an assumption in the mind of the listener that
they can see the table as rows and columns and that if someone else saw
this same table, they would see it with the columns in the same order.

Do you have a definition you work with when talking about tables?  I
tend to refer to them as spreadsheets since the audience can relate to
that easily.  While spreadsheet columns can be reordered, at any point
in time they have an ordering.

> The columns of a table can be referred to by name.  While the names can be
> sorted alphabetically, it's not clear that columns have an inherent order.

What is the distinction you make between a table and a relation?

> The rows of a table can be assigned identifying numbers, if desired.  Those
> numbers have a natural order, but it's not clear that the rows themselves
[quoted text clipped - 4 lines]
>
> And, of course, at some layer of representation,  it all has order.

Since we are talking about the definition of a term, I suppose you
could define it however you like, but perhaps we could find one or more
defs and put them in  the glossary since mAsterdam is back.

Cheers!  --dawn
mAsterdam - 09 Jul 2005 03:42 GMT
>>>I think that would be very confusing. First, his table actually has
>>>ordering, functions don't.
[quoted text clipped - 4 lines]
> I don't know if "table" has a commonly agreed upon def (and I don't
> know if it is in the cdt glossary),

Nope. Except for Jans excellent contribution on NULL and a rephrase of
data and meaning in wikipedia nothing much has changed since 0.0.4.
If there is more I'll take the time to make a 0.0.5 - or jump to 0.1.0.
There was a sub-thread with nice cdt glossary stuff about possreps and
unions (by VC and Jon), but nothing copy & pastable enough for the
glossary yet :-(

> but I suspect that Jan is right
> that column ordering would be typical of the common use, if not the
[quoted text clipped - 25 lines]
> could define it however you like, but perhaps we could find on