Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / April 2004

Tip: Looking for answers? Try searching our database.

Pizza Example

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Dawn M. Wolthuis - 03 Apr 2004 22:41 GMT
In trying to get a better handle on relational database design compared to
XML or PICK database design, I'm writing up the example below and got stuck
on just how a relational theorist would likely design this.  Below is the
design for a non-1NF implementation.  I'll use my terminology for this so
that it is not confused with relational terms.

Function/File:  ORDER_ITEMS
Purpose: This file maps the OrderID | ItemNumber to details of the ordered
item

Fields and sample data:
Key:
  OrderID.................72819
  ItemNumber...........1
OrderedItem..............Pizza
Size...........................16"
Crust.........................Chicago
Sauce........................Tomato
Cheese......................Mozzarella
.................................Feta
.................................Parmesan
Topping....................Pepperoni
................................Black Olives

Requirements for an application using this data include displaying line
items for the information in a particular format. I realize this is not
relevant for relational data design, but gets to the heart of what I'm
trying to understand with the relational model.

Order    Item   OrderItem Size    Crust      Sauce     Cheese
Topping

72819   1        Pizza        16"     Chicago   Tomato   Mozzarella
Pepperoni

Feta            Black Olives

Parmesan

This is, not coincidentally, how a non-1NF language, such as the PICK query
language (which goes by many names) might show the data with a statement
such as:

LIST ORDER_ITEMS OrderID ItemNumber OrderedItem Size Crust Sauce Cheese
Topping

(LIST is like SQL SELECT but with the function/file typed first, then the
fields)

In addition to the ORDER_ITEMS file, I would also have a file that includes
valid entries, along with descriptions with a file such as:

Function/File: VALIDATIONS

Purpose: This file maps a validation list name to the set of valid entries
and their descriptions

Fields & sample data:

Key:

   ValidationList.............Cheeses  (with other validation lists in this
same file for validating Sauces, Crusts, Toppings, and OrderedItems, having
those values as the key for the relevant validation list)

ValidEntries (designed with the below fields associated with each other):

  Abbreviation................Mozzarella

  .................................Feta

  .................................Parmesan

  Description..................Mozzarella Cheese

  .................................Feta Cheese

  .................................Parmesan Cheese

Then to show the descriptions instead of the abbreviation, the LIST
statement would include a virtual field CheeseDesc (for example) instead of
the stored data from the field named "Cheese".  In other words, the
"vocabulary" for the OrderItems function would be expanded to incorporate
these descriptions as virtual fields.

How would you model this data for an RDBMS and what would an analogous SQL
statement look like, both for just the stored order data and if there is a
join to get the descriptions for cheeses, for example?  I realize there
would be many more tables than there are files but it is not clear to me
just how this would typically be designed and queried.

Thanks in advance for your help.  --dawn
Dawn M. Wolthuis - 04 Apr 2004 00:17 GMT
I'll try to see if I can get the essense of the display of data in the pizza
example to be clearer by removing the first few columns so it doesn't wrap.
I hope this displays better.  Thanks.  --dawn

Size....Crust.......Sauce.....Cheese.........Topping
16"     Chicago   Tomato   Mozzarella   Pepperoni
                                        Feta            Black Olives
                                        Parmesan
rkc - 04 Apr 2004 04:41 GMT
> I'll try to see if I can get the essense of the display of data in the pizza
> example to be clearer by removing the first few columns so it doesn't wrap.
[quoted text clipped - 4 lines]
>                                          Feta            Black Olives
>                                          Parmesan

Start with your desired 'display' and apply the rules of normalization.
You seem to bright to not understand that, so you must be trying to
make some other point.
Dawn M. Wolthuis - 04 Apr 2004 05:18 GMT
> > I'll try to see if I can get the essense of the display of data in the
> pizza
[quoted text clipped - 10 lines]
> You seem to bright to not understand that, so you must be trying to
> make some other point.

Ah, so not all readers of the list have Bob B's opinion, eh?  Actually, it
is neither a) that I don't know how to perform a strict normalization
process on the data as I have written it nor b) that I'm trying to make
another point here.

It is that I suspect that someone who thinks in terms of normalized data
would not think about the problem domain in the same way.  For example,
perhaps someone would decide that the number of combinations was a
relatively small finite number at this point in time and for the foreseeable
future and might make a table of all possible combinations of pizzas with a
generated candidate key.  Then that would be placed as a foreign key in the
ORDER_ITEM table.  I'm sure there are other possibilities too.  I'm guessing
that it would be the rare RDBMS DBA that would actually implement a strictly
normalized version of the data described in my example, but I could be wrong
about that.

So, you are right, I'm not simply asking for this data to be normalized, but
for it to be understood so that a relational database implementation way of
thinking is presented.  I was going to simply use this as an example of the
difference between how an XML/PICK model of the data would look compared to
a relational model and when I normalized the data, I looked at it and
thought that it was unlikely that anyone would actually implement it that
way.  So, how would you do it?

Smiles!  --dawn
Laconic2 - 04 Apr 2004 10:36 GMT
> a relational model and when I normalized the data, I looked at it and
> thought that it was unlikely that anyone would actually implement it that
> way.  So, how would you do it?

It depends.
Tony - 04 Apr 2004 12:26 GMT
> > > I'll try to see if I can get the essense of the display of data in the
>  pizza
[quoted text clipped - 26 lines]
> normalized version of the data described in my example, but I could be wrong
> about that.

I would hope that you are wrong about that.  I hope that most RDBMS
DBAs would know that a normalized database is the right approach.

> So, you are right, I'm not simply asking for this data to be normalized, but
> for it to be understood so that a relational database implementation way of
[quoted text clipped - 3 lines]
> thought that it was unlikely that anyone would actually implement it that
> way.  So, how would you do it?

This inevitably leads to further questions.  Apparently "pizza" is not
the only orderable item, but all orderable items are similar to pizza
in that they have size, crust, toppings etc?  Assuming that to be true
(as it seems to be assumed by your definition):

create table item_types( item_type_name varchar2(30) primary key );
create table cheeses( cheese_name varchar2(30) primary key );
create table toppings( topping_name varchar2(30) primary key );
create table sauses( sauce_name varchar2(30) primary key );
create table crusts( crust_name varchar2(30) primary key );
create table sizes( size_inches integer primary key );
create table order( order_no integer primary key );
create table order_item
( order_no integer, item_no integer
, size_inches references sizes
, crust_name references crusts
, sauce_name references sauces
, primary key (order_no, item_no)
);
create table order_item_cheeses
( order_no integer, item_no integer, cheese_name references cheeses
, foreign key (order_no, item_no) references order_items
, primary key (order_no, item_no, cheese_name)
);
create table order_item_toppings
( order_no integer, item_no integer, topping_name references toppings
, foreign key (order_no, item_no) references order_items
, primary key (order_no, item_no, topping_name)
);

This is a very quick & dirty design, so please don't quibble about
column names or whether 30 chars is appropriate for a cheese name!
Dawn M. Wolthuis - 04 Apr 2004 15:06 GMT
<snip>
> This inevitably leads to further questions.  Apparently "pizza" is not
> the only orderable item, but all orderable items are similar to pizza
> in that they have size, crust, toppings etc?  Assuming that to be true
> (as it seems to be assumed by your definition):

Bingo -- that is one of the issues.  No, the "Garlic Bread" or other items
that could be ordered might have a subset (including null set) of these
items, but these includes all attributes that we are putting on the sum
total of all items.  So you can think of the attributes listed as the set of
all possible attributes for an item.  I chose to at least have one item that
does use all of them, but a typical PICK implementation (from what I have
seen) includes attributes for sub-types.

> create table item_types( item_type_name varchar2(30) primary key );
> create table cheeses( cheese_name varchar2(30) primary key );
[quoted text clipped - 23 lines]
> This is a very quick & dirty design, so please don't quibble about
> column names or whether 30 chars is appropriate for a cheese name!

That's helpful, in particular because it verifies my claim that those
working with an RDBMS do typically put size constraints on data even when
the problem domain gives no indication of such constraints being required.
I'm guessing you opted to size these attributes because you size everything
because that's the way RDBMS's like it (for performance?)

So, now, if you don't make the assumption that every item that someone could
buy would have each attribute (such as Sauce), then how would you do
t?  --dawn
Laconic2 - 04 Apr 2004 15:22 GMT
Now we're getting somewhere.  I have no idea what a theorist would do, but a
perspicacious analyst would hesitate to generalize data requirements from a
single example.

If another of the orderable items were Neapolitan Ice Cream, rather than
Garlic Bread,  then the list of toppings might be quite diffferent.  And so
on.
Dawn M. Wolthuis - 04 Apr 2004 15:48 GMT
> Now we're getting somewhere.  I have no idea what a theorist would do, but a
> perspicacious analyst would hesitate to generalize data requirements from a
> single example.

Of course, of course, but using a Pizza example is a way to provide a
problem domain of which most of us are somewhat comfortable.  A complete set
of requirements would be inappropriate for this forum unless they can be
indicated with a statement like "provide a data model for ordered items in
an order entry system for a small pizza shop".

> If another of the orderable items were Neapolitan Ice Cream, rather than
> Garlic Bread,  then the list of toppings might be quite diffferent.  And so
> on.

And if you concept of a pizza place includes selling Neapolitan Ice Cream
with toppings, then I'm sure I can learn something from the way you would
model that too.  So, I can accept some human interpretation of the situation
in order to keep the requirements statement short.

cheers!  --dawn
Laconic2 - 04 Apr 2004 16:51 GMT
> Of course, of course, but using a Pizza example is a way to provide a
> problem domain of which most of us are somewhat comfortable.

And that's precisely the problem.  A problem domain where most of us feel
comfortable is a problem domain where we each come to the table with a large
number of assumptions.  When those assumptions coincide,  we move forward
quickly.  When they don't, we have the illusion of a common data model
rather than the reality of one.

Give me a problem domain that I don't know anything about, and a subject
matter expert that is willing to work with me on a data model that is
subject matter relevant,  and I'll come up with something.  Give me a
subject matter that I think I know, but really do not,  and you'll see me
make a fool out of myself.

I know how to order pizza from a pizza place, but I never worked at one.  Am
I a subject matter expert?

> And if you concept of a pizza place includes selling Neapolitan Ice Cream
> with toppings, then I'm sure I can learn something from the way you would
> model that too.  So, I can accept some human interpretation of the situation
> in order to keep the requirements statement short.

They aren't just short,  they are laconic.  As I said before, it depends.
If the mission of the database users is order processing,  normalization is
important, at least when it comes to data that is written during order
placement and order fulfillment.   (It would probably be ok to update data
that is less than fully normalized when a new kind of cheese comes along.)
If the mission is tactical marketing,  then a multidimensional model might
be more useful.

And even the above statements depend on how the design is going to be used.
If its a construction model that is going to be used to create a schema of
tables, columns and constraints in a fairly mechanical fashion,  I might
design it one way.  If it's intent is to be an abstract model,  from which a
schema can be derived,  then I might model it differently.
Dawn M. Wolthuis - 04 Apr 2004 19:11 GMT
> > Of course, of course, but using a Pizza example is a way to provide a
> > problem domain of which most of us are somewhat comfortable.
[quoted text clipped - 13 lines]
> I know how to order pizza from a pizza place, but I never worked at one.  Am
> I a subject matter expert?

I once heard that there is a philosophy difference among news writers about
whether the person writing up the story should have some previous knowledge
of the subject matter or be ignorant of it.

When it comes to database design, I'd have to disagree with you and suggest
that a business analyst who specializes in a particular vertical application
will, in general, do a better job with the data design than someone who
doesn't know it.  When I wrote programs for public television in the early
80's, I stopped turning the channel when they were begging for money --
getting a feel for their promotional practices helped me design the software
(including the data structures) better, ask better questions, etc.

So, I'll support my choice of a pizza application, figuring that many of us
can make many valid assumptions -- at least enough to provide a response to
my question that is helpful (at least to me) in understanding how relational
modelers think about the data.  Of course, I will grant that if you were to
do an implementation of this next week you might want to have a few
interviews with the users & owners of the application to be sure you hit the
nail on the head.  For the purposes of my question, feel free to pretend you
are the user and make the type of assumptions you would with the limited
knowledge you have of a pizza ordering system.
Cheers!  --dawn

>  >
> > And if you concept of a pizza place includes selling Neapolitan Ice Cream
[quoted text clipped - 16 lines]
> design it one way.  If it's intent is to be an abstract model,  from which a
> schema can be derived,  then I might model it differently.
Tony - 04 Apr 2004 20:56 GMT
> > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
>  news:<c4o2at$frb$1@news.netins.net>...
[quoted text clipped - 46 lines]
> I'm guessing you opted to size these attributes because you size everything
> because that's the way RDBMS's like it (for performance?)

SQL DBMSs, not RDBMSs as such.  But I didn't have to do that.   I
could use e.g. Oracle's CLOB datatype, which allows up to 4GB of data.
Or I could have said varchar2(4000).  RDBMS designers tend to like to
impose some kind of discipline on the data being stored, it isn't
really a performance issue - at least not from the DBMS's point of
view.  But if you allow 4000 character pizza names in the database,
then you must allow for that in your user interface, which means
declaring 4000-character variables - something of a memory drain,
perhaps?  Frankly it would just seem rather stupid to allow unlimited
length for every data item.

> So, now, if you don't make the assumption that every item that someone could
> buy would have each attribute (such as Sauce), then how would you do
> t?  --dawn

In a SQL database you could just allow NULLs in those items.  In a
true RDBMS that doesn't allow nulls you could either create a table
for each subtype of order_items or you could move the attributes like
crust_name into separate tables like:

create table order_item_crust
( order_no integer, item_no integer, crust_name references crusts
, foreign key (order_no, item_no) references pizza_order_items
, primary key (order_no, item_no)
);

(Note that this table only allows a maximum of one crust per pizza).
Dawn M. Wolthuis - 04 Apr 2004 21:28 GMT
> > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
> >  news:<c4o2at$frb$1@news.netins.net>...
> > > > >
> >  <snip>

> SQL DBMSs, not RDBMSs as such.  But I didn't have to do that.   I
> could use e.g. Oracle's CLOB datatype, which allows up to 4GB of data.
[quoted text clipped - 6 lines]
> perhaps?  Frankly it would just seem rather stupid to allow unlimited
> length for every data item.

I see no logic in this.  The database need not constrain the size of data
that is stored and the UI can either constrain it or not, as the
designer/user see fit.  Constraining the database to only store up to a
particular length means that the database needs to be changed if a new valid
value for an attribute arises that is longer than what was anticipated.  The
UI can permit scrolling to show only a limited number of characters, but
scroll to show all.  A report writer can wrap long values to a certain width
for practicality.  Unless there is a REAL business reason for limiting the
number of characters for a value (or your DBMS product requires it or gives
you better features if you do), don't do it.  From what I have seen of RDBMS
tools, those who use them are inclined to put an attribute of "color" in the
database (for example) with a max number of characters for the value.  All
we need, then, is a new color to come available and you can bet that the
name will be abbreviated now to keep it within that length.  How silly IMO.

> > So, now, if you don't make the assumption that every item that someone could
> > buy would have each attribute (such as Sauce), then how would you do
[quoted text clipped - 12 lines]
>
> (Note that this table only allows a maximum of one crust per pizza).

Yes, this does look like an RDBMS approach to the problem.  I don't think it
is a "natural" way to view the problem, but it follows a set of rules for
what that is worth.
smiles.  --dawn
Lemming - 04 Apr 2004 22:16 GMT
>> "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
>news:<c4p4q7$on1$1@news.netins.net>...
[quoted text clipped - 28 lines]
>we need, then, is a new color to come available and you can bet that the
>name will be abbreviated now to keep it within that length.  How silly IMO.

Doesn't "ALTER TABLE ... " make this problem go away?

>> > So, now, if you don't make the assumption that every item that someone
>could
[quoted text clipped - 18 lines]
>what that is worth.
>smiles.  --dawn

Maybe I've missed the point, but it seems to me the above solution is
backwards.  The order_item shouldn't reference crusts at all ... you
need an intermediate table to resolve the possible many-to-many
relationship of order_item to crust.  Informally:

Table: crust (crust_id,
            data ...)
Table: order_item
        (order_id,item_id,
            data ...)
Table: item_crust
        (crust_id references crust (crust_id),
        (order_id, item_id) references order_item(order_id, item_id),
            data ...)

Hope that makes sense, it's late ...

Lemming
Signature

Curiosity *may* have killed Schrodinger's cat.

Tony - 05 Apr 2004 10:17 GMT
> >> "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
>  news:<c4p4q7$on1$1@news.netins.net>...
[quoted text clipped - 31 lines]
>
> Doesn't "ALTER TABLE ... " make this problem go away?

Yes, but programmers like Dawn seem to think DBMSs are fiendishly
difficult to use, so would avoid requiring ALTER TABLE statements like
the plague ;-)

> >> > So, now, if you don't make the assumption that every item that someone
>  could
[quoted text clipped - 33 lines]
>         (order_id, item_id) references order_item(order_id, item_id),
>             data ...)

Yes, that is exactly what I meant.  My "order_item_crust" table is the
intermediate table between my order_items and crusts tables.
Paul - 04 Apr 2004 23:22 GMT
> I see no logic in this.  The database need not constrain the size of data
> that is stored and the UI can either constrain it or not, as the
[quoted text clipped - 10 lines]
> we need, then, is a new color to come available and you can bet that the
> name will be abbreviated now to keep it within that length.  How silly IMO.

The logic is that is helps to enforce data integrity.

As a real-world example, I sometimes have to import data from a flat
file into a database. Sometimes the people who create the flat file mess
things up and give the columns in the wrong order or something.

If this happens when I have maximum sizes defined for the column widths,
it usually will be the case that something will overflow and give an
error. This is what I *want* to happen in this situation. If I just had
everything at varchar(8000) it would all import with no errors except my
database would end up hopelessly corrupted.

The downside is, as you say, that you have to be careful not to make the
minimum column widths too small. But with a judicious choice of sizes
this problem should be much less than the data integrity benefits.

Paul.
Tony - 05 Apr 2004 10:35 GMT
> > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
>  news:<c4p4q7$on1$1@news.netins.net>...
[quoted text clipped - 16 lines]
>
> I see no logic in this.

Why doesn't that surprise me? ;-)

> The database need not constrain the size of data
> that is stored and the UI can either constrain it or not, as the
> designer/user see fit.

No it NEED not, as I said above.  But it makes a lot of sense to do so
- see Paul's post for integrity reasons.  Another reason would be to
impose some predictability on data items.  Obviously, if you have an
attribute like "customer_opinion_on_pizza" then you would allow for
plenty of text - though even then you may prefer to limit to 2000
chars (e.g.) rather than let some verbose customer fill 4GB of waffle
about it.  But if EVERY column allowed "unlimited" length, how would
you design your screens and reports?  Scrollable text fields for
everything presumably.

> Constraining the database to only store up to a
> particular length means that the database needs to be changed if a new valid
> value for an attribute arises that is longer than what was anticipated.

How terrifying!  More likely of course, the user will come up with a
suitably abbreviated name.  We are all familiar with such abbreviated
names from till receipts, credit card statements etc.

In any case, let's be clear: this argument is a side-line that has
NOTHING to do with choosing an RDBMS over Pick or whatever.  It is a
design issue; the fact that relational database designers are more
likely to choose to constrain data lengths than you are is perhaps
just a symptom of our more disciplined approach to database design.

> > > So, now, if you don't make the assumption that every item that someone
>  could
[quoted text clipped - 17 lines]
> is a "natural" way to view the problem, but it follows a set of rules for
> what that is worth.

What it is worth is a LOT, since computers work better with data
structured according to rules rather than "unstructured" or "natural"
data.  Who cares if it seems "natural" to you?  If you are a database
designer, you better learn to see beyond the "natural" if you are
going to be useful, and if you are an end-user accessing the database
via an application you can be protected from needing to understand the
relational way of doing it.
Dawn M. Wolthuis - 05 Apr 2004 17:19 GMT
> > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
> >  news:<c4p4q7$on1$1@news.netins.net>...
[quoted text clipped - 17 lines]
>
> Why doesn't that surprise me? ;-)

Hey Tony, you could cut me some slack -- I'm at least asking question in an
honest effort to learn.  Some of the relational theorists on this list are
open to learning other approaches, while many seem to be happy to live with
their relational tunnel vision.

I understand now from Paul's response that the reason folks like to restrict
the size of fields is not for performance reasons, but because they want the
database, not just software applications, to kick out data that is clearly
wrong.  I suspect that in many cases, by the time data reaches the DBMS a
software developer has already found a reason to apply constraint logic
(coded outside of the database) to the transactions because they needed to
for the GUI or to provide an immediate response to a web services
transaction, but this locks down the database further so if a programmer
doesn't catch something, the database will.

It does seem to me that it might make more sense, then, for these lengths to
give warnings in many cases, rather than be fixed constraints if they are
checking for the likelihood of good data, but I don't know if there is a
clean way to do that.

Otherwise, allowing the database to store every data element as variable
length unless there is a clear business reason not to, but ensuring that all
points of entry of data -- GUI or web service, for example -- apply
constraint logic with warnings in cases where unlikely data is entered makes
more sense to me.

> > The database need not constrain the size of data
> > that is stored and the UI can either constrain it or not, as the
[quoted text clipped - 17 lines]
> suitably abbreviated name.  We are all familiar with such abbreviated
> names from till receipts, credit card statements etc.

Yes, exactly the state of data I have seen.  Instead of the easy, but not
completely without risk, task of altering a table, the user simply figures
out how to work around the constraints.  I think that is a bit of a shame,
but I guess I can see where the cost of both end-users and software
developers figuring out how to live with the initial constraints for the
life of a software application rather than "fixing" them could be considered
part of the cost of the software.  Hopefully neither the user or the
developer will go to extremes in this regard and will request the required
changes when needed (but we have all seen cases where they have, right?)

> In any case, let's be clear: this argument is a side-line that has
> NOTHING to do with choosing an RDBMS over Pick or whatever.  It is a
[quoted text clipped - 31 lines]
> via an application you can be protected from needing to understand the
> relational way of doing it.

But if there is a structure and set of rules that align better with the
natural structure of language, then we could possibly have a cost savings
over the life of software (and that is what the anecdotal evidence suggests
is happening, but I have nothing to prove it).  I am NOT at all opposed to
structure.  I am alsol pro-constraints and would like to see the constraints
that are important for data integrity be applied at the proper points in an
application -- that might mean both in a GUI and when the data are stored.

What I have seen with RDBMS's is that the size constraint is placed on
everything while more important constraints related to the allowable values
are not as often applied, especially when it is a "yellow flag" type of data
value for an attribute rather than obviously not permitted.  I'll put this
in a separate thread at some point (but have several other questions prior
to that one).

Thanks.  --dawn
Laconic2 - 05 Apr 2004 18:50 GMT
Cutting slack is a two way street.  You could cut other people some slack,
and recognize that anyone who expresses a relational model in the form of
SQL CREATE commands is probably going to include some features of the
physical data model as well as features of the logical model.

Choosing an upper limit of 30 was not inherent in the logical data model.

It might have been (assumption) inherent in the domain definition of the
data requirements.  A limit on the number of characters IS, after all,  a
domain constraint.  If its conceptual it is discovered, not designed,  and
is in the conceptual data model as well as the logical data model.

Or it might have been invented at the time the logical model was converted
to a physical model,  for reasons like storage capacity or throughput
(again, an assumption).

Also,  what's the big deal?  If thirty turns out to be too low,  then just
ALTER the column  to make it forty.  A good RDBMS will do that for you,  and
pad the existing data with blanks.  Of course that could wreak havoc with
data independence, when users of the data try to stuff a 40 character
data value into a 30 character working storage variable.  But you can't have
everything, can you?
Anthony W. Youngman - 05 Apr 2004 23:05 GMT
>Or it might have been invented at the time the logical model was converted
>to a physical model,  for reasons like storage capacity or throughput
[quoted text clipped - 6 lines]
>data value into a 30 character working storage variable.  But you can't have
>everything, can you?

Or, at the design stage, you simply accept the fact that length is
variable, and expect your database to be clever enough to cope :-)

Constrain the display width, but don't apply artificial constraints to
data.

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Laconic2 - 06 Apr 2004 15:02 GMT
> Or, at the design stage, you simply accept the fact that length is
> variable, and expect your database to be clever enough to cope :-)

Very good point.  In fact a good DBMS will apply compression to save space,
with no information loss.  And, in general,  the time saved by saving space
will be more than the time expended in compression/decompression.  And, when
odd cases occur, the DBA can tell the DBMS what to do.

However, there are some situations where variable length data is
unacceptable design,  and not for reasons that can be hidden inside the
DBMS.  Other writers have given several examples in this thread.
T - 16 Apr 2004 16:26 GMT
Talk about a discussion in semantics.  The limit on the *description* of the
constituent ingredients is one that is inherent to the problem domain.  One
can make a safe assumption that no ingredient's description is larger than n
number of characters at the time of design.  In a good design with any
decent RDMS, that can easily be changed at any time to accommodate expanded
text.

But let's be clear, the general user (i.e., not the manager or supervisor)
would *never* be allowed to enter an ingredient. Rather, they would only
ever be allowed to *choose* ingredients.  Specifically, they would only ever
be allowed to associate an ingredient choice with an order item.  Only a
manager would ever enter the description of an ingredient and one can safely
assume that there are limits on that name.  If the manager wishes to be more
verbose (say store the chemical makeup of the ingredient.)  Then I would
provide a second field for that verbose description and that field would
provide unlimited space.  The purpose of main "name" field (if we call it
that) is so that an unsophisticated user can quickly identify the ingredient
they wish to assign. Providing for and entry of extended text would serve to
confuse the user.

Thus, the discussion about "artificial limits" on the user's data is
nonsensical.  Rather the design approaches provided in this thread are ones
that account for areas of lengthy text as well as areas requiring simple,
short text which by definition means there is a limit on its length.

Thomas

> Cutting slack is a two way street.  You could cut other people some slack,
> and recognize that anyone who expresses a relational model in the form of
[quoted text clipped - 18 lines]
> data value into a 30 character working storage variable.  But you can't have
> everything, can you?
Eric Kaun - 05 Apr 2004 20:44 GMT
> > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message
> news:<c4pr5i$n3p$1@news.netins.net>...
[quoted text clipped - 4 lines]
> Hey Tony, you could cut me some slack -- I'm at least asking question in an
> honest effort to learn.

I agree completely - such sniping is mean-spirited, discourages honest
discussion, and is usually more amusing to the sniper than to onlookers.

> Some of the relational theorists on this list are
> open to learning other approaches, while many seem to be happy to live with
> their relational tunnel vision.

Well, tunnel vision is one explanation, but I've made an effort to read the
XML specs, including XQuery, before pronouncing it damned beyond redemption.
Rejection doesn't necessarily imply ignorance and tunnel vision.

> It does seem to me that it might make more sense, then, for these lengths to
> give warnings in many cases, rather than be fixed constraints if they are
[quoted text clipped - 6 lines]
> constraint logic with warnings in cases where unlikely data is entered makes
> more sense to me.

In some cases, such as DB conversions and loads from flat files, the DB is
adequate, but granted that UIs should have up-front validation rules
generate from the DB constraints. I think it's still extremely useful to
have one authority, and not just a document, but something enforced and
specific.

> > > Yes, this does look like an RDBMS approach to the problem.  I don't
> think it
[quoted text clipped - 14 lines]
> over the life of software (and that is what the anecdotal evidence suggests
> is happening, but I have nothing to prove it).

I'm not asking for proof, since such is very difficult and expensive to
acquire, but my experience suggests the opposite. The more "natural"
approach of XML leads to chaos - data out of sync with no constraints to
tell you, difficult debugging before you realize the Doc1 node 3 levels deep
is out of sync with the top-level node of Doc2 in a different place, etc. I
see absolutely no reason to use the natural structure of language for
anything other than communicating with business people. Logic and
mathematics are the basis of most works of engineering, and yet most
laypeople don't understand them. I would prefer that people building medical
instruments and weapon systems NOT use language that I'm capable of
understanding in their designs - that would make me immediately suspicious
that they're making it simpler than is feasible for good design.

> I am NOT at all opposed to
> structure.  I am also pro-constraints and would like to see the
constraints
> that are important for data integrity be applied at the proper points in an
> application -- that might mean both in a GUI and when the data are stored.

Agreed.

> What I have seen with RDBMS's

There aren't any yet other than Dataphor.

> is that the size constraint is placed on
> everything while more important constraints related to the allowable values
> are not as often applied, especially when it is a "yellow flag" type of data
> value for an attribute rather than obviously not permitted.

Good type/domain support in the DBMS would handle this. You're right, it's a
big gap and a big problem.

- Eric
Anthony W. Youngman - 05 Apr 2004 23:14 GMT
>I'm not asking for proof, since such is very difficult and expensive to
>acquire, but my experience suggests the opposite. The more "natural"
[quoted text clipped - 8 lines]
>understanding in their designs - that would make me immediately suspicious
>that they're making it simpler than is feasible for good design.

Somebody gave me a wonderful quote recently. "Logic and mathematics give
you a consistent model. Academicians have an unfortunate tendency to
confuse consistency with truth."

As for you being worried if you could understand engineers or doctors -
I'm sorry, but I don't see why I should assume that other people are
better than me. If I can't understand (at least superficially) what
they're doing, then I conclude they are either crap at explaining
themselves or, worse, they don't understand themselves. And I'm sorry if
I'm cynical, but I've had enough experience of various professions (and
from my own research) to know that people are very good seeing what they
want, and not seeing what they don't want. I don't trust "experts". Far
too many of them wear blinkers :-(

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Dawn M. Wolthuis - 05 Apr 2004 23:54 GMT
> >I'm not asking for proof, since such is very difficult and expensive to
> >acquire, but my experience suggests the opposite. The more "natural"
[quoted text clipped - 12 lines]
> you a consistent model. Academicians have an unfortunate tendency to
> confuse consistency with truth."

Yes, so true!  A mathematical model is a MODEL -- a metaphor.  As someone
once said, "all models are flawed, but some are useful".  Each data model I
have seen has some aspects which make it useful and some which are not as
helpful.  Looking at the benefits and deficits of various models makes
sense, but if we are to anoint one as THE model (which our profession has
somewhat done with the relational model), we should be very careful we don't
go and apply the theory of relativity to very small objects (an analogy) --
we should use it where it is useful.  From what I can see, the relational
model is not close enough to a generalized TOE (theory of everything) for
data that we should stop reviewing other options.

I was in the middle of another post that relates to language when I saw your
quote here and it fits nicely with that so I could just jump into that topic
here, but instead I'll just agree with the comment and note that it seems
likely that we need not just mathematics and logic, but also language in our
discussions of data.   Cheers!
--dawn

> As for you being worried if you could understand engineers or doctors -
> I'm sorry, but I don't see why I should assume that other people are
[quoted text clipped - 8 lines]
> Cheers,
> Wol
Laconic2 - 06 Apr 2004 15:27 GMT
> I was in the middle of another post that relates to language when I saw your
> quote here and it fits nicely with that so I could just jump into that topic
> here, but instead I'll just agree with the comment and note that it seems
> likely that we need not just mathematics and logic, but also language in our
> discussions of data.   Cheers!

Absolutely true!

Cybernetics is ultimately about communication and control in the entire
system.
If there are humans in the system, language is an essential component of
communication.
Eric Kaun - 06 Apr 2004 16:21 GMT
> Looking at the benefits and deficits of various models makes
> sense, but if we are to anoint one as THE model (which our profession has
[quoted text clipped - 3 lines]
> model is not close enough to a generalized TOE (theory of everything) for
> data that we should stop reviewing other options.

Couldn't agree more - we should be. The value of O-O is still far from
concluded. What's a little different about relational is that it's derived
very directly from logic and set theory, and also is the implicit base in
most other theories of data - you've said as much yourself regarding
multivalues. However, the "extensions" can corrupt the base, which I believe
is that case with most other theories, and furthermore they often add
nothing.

> I was in the middle of another post that relates to language when I saw your
> quote here and it fits nicely with that so I could just jump into that topic
> here, but instead I'll just agree with the comment and note that it seems
> likely that we need not just mathematics and logic, but also language in our
> discussions of data.   Cheers!

True, but I've not seen anything formal, which a computer system needs to
be. Chomsky, Wittgenstein and countless others have theories of language;
have they been applied to data? So far, all I've heard is "we should
consider language", but that's not much good unless we get more concrete. As
I said, we can't ever stop thinking and evaluating... but we need something
fairly concrete.

- erk
Laconic2 - 06 Apr 2004 17:43 GMT
I'm going to bring up Terry Winograd, again.

I don't know what Terry Winograd has written in the last thirty years,  but
he certainly gave language, and more specifically semantics, a formal
treatment.

At one point, he called a word a "reference to a process which, executed in
context, will figure out the meaning".  That's a very loose paraphrase on my
part, and I apologize if I got it wrong.  Human memory, you know.

That reminds me of the integration of the relational model and the OO model
by storing the "shadow of an object"  (the projection, really)  as a data
item in a database, and then using it as a "view".

But maybe I'm reading too much into either or both of the above.
Anthony W. Youngman - 07 Apr 2004 00:33 GMT
>> Looking at the benefits and deficits of various models makes
>> sense, but if we are to anoint one as THE model (which our profession has
[quoted text clipped - 13 lines]
>is that case with most other theories, and furthermore they often add
>nothing.

The problem I see with relational, is that it does not cope with -
indeed, denies the existence of - emergent complexity. For example,
there is no way for the DBMS to know (or record the fact) whether a
table represents a "noun" or a "repeating adjective". Seeing as nouns
and adjectives are fundamentally different in language, I would have
thought the inability of a DBMS to distinguish between the data
equivalents would be a serious handicap...

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Laconic2 - 07 Apr 2004 03:55 GMT
I question whether nouns and adjectives are fundamentally different in
language.  I also question whether all natural languages have nouns and
adjectives as components.
Eric Kaun - 07 Apr 2004 12:27 GMT
> The problem I see with relational, is that it does not cope with -
> indeed, denies the existence of - emergent complexity.

I'll address the specific example below, but what does this mean in general?

> For example,
> there is no way for the DBMS to know (or record the fact) whether a
> table represents a "noun" or a "repeating adjective".

What difference does it make?

> Seeing as nouns
> and adjectives are fundamentally different in language, I would have
> thought the inability of a DBMS to distinguish between the data
> equivalents would be a serious handicap...

The DBMS also can't distinguish past tense from present perfect tense. So
what? What does the "data equivalent" mean?

Your argument seems to be that because the relational data model doesn't
mirror language in its fundamental structures, that it's "fundamentally
unable" to distinguish between several different things that need to be
distinguished. Assuming that one could model the vagaries of human language
(which might be doubtful), I don't see why relational would do a worse job
than other data models. How does MV do so?
Tony - 07 Apr 2004 17:45 GMT
> > The problem I see with relational, is that it does not cope with -
> > indeed, denies the existence of - emergent complexity.
>
> I'll address the specific example below, but what does this mean in general?

I have learnt from past experience that Wol likes to throw in
irrelevant but important-sounding scientific concepts from time to
time.  Heisenberg's Uncertainty Priciple comes up quite often, along
with Godel's Incompleteness Theorem and of course Relativity v.
Newton!  The relational model is deemed wanting because it fails to
take these into account, though it is far from clear how MV or
anything else does other, nor why they should.

It's best to try to ignore these digressions, but I usually fail to do
so...
Dawn M. Wolthuis - 07 Apr 2004 18:21 GMT
> > > The problem I see with relational, is that it does not cope with -
> > > indeed, denies the existence of - emergent complexity.
[quoted text clipped - 11 lines]
> It's best to try to ignore these digressions, but I usually fail to do
> so...

I tend to agree, although I have learned that Wol has a point even if
obscured by his seeming lack of interest in or respect for mathematical
models.  I agree with him wholeheartedly that a model is a model -- just
that, no more.  Pointing out both flaws and usefulness in particular models
is helpful.  Thinking all mathematical models are hogwash is not, however.
I don't need to emperically determine that 1000 + 1000 = 2000 because I have
a model, a mathematical theory, that starts with some axioms and builds on
that and that convinces me, without any emperical data, that if I have 1000
objects and add in another 1000 objects, then I will have 2000.

So in spite of the shared appreciation that I have for the underdocumented
PICK "model" (loose use of the term in this case) along with Wol, he and I
have had this mathematics vs science discussion before in other forums.  If
we have a mathematical model that is useful AND we have some emperical data
to back up the usefulness of the model, then that would be great.  I think
"language" belongs in there somewhere too, but not sure exactly where.

I don't know of any data approaches that have both a solid mathematical
model (and no, it will not be both complete and consistent) and emperical
data that gives evidence of the usefulness of the model.

But if anyone else does, I'm interested.  --dawn
Eric Kaun - 07 Apr 2004 20:53 GMT
> > "Eric Kaun" <ekaun@yahoo.com> wrote in message
> news:<hARcc.51781$ca7.11809@newssvr16.news.prodigy.com>...
[quoted text clipped - 37 lines]
>
> But if anyone else does, I'm interested.  --dawn

I have no doubt that Wol knows his stuff when it comes to physics, far
better than I do. I do have severe doubts about the applicability of
reasoning about physical models to computing, though we're getting into
philosophy here.

I'm certainly willing to entertain language notions - I just haven't heard
anything concrete enough to serve as the basis for a data model. Loose
correspondence to English is, in my opinion, not a good metric.

- erk
Jan Hidders - 07 Apr 2004 21:43 GMT
> I'm certainly willing to entertain language notions - I just haven't heard
> anything concrete enough to serve as the basis for a data model. Loose
> correspondence to English is, in my opinion, not a good metric.

Did you ever look at the philosophy behind ORM (Object-Role Modelling)?
Or the philosophy behind predicate logic, for that matter?

-- Jan Hidders
Eric Kaun - 07 Apr 2004 22:12 GMT
> > I'm certainly willing to entertain language notions - I just haven't heard
> > anything concrete enough to serve as the basis for a data model. Loose
> > correspondence to English is, in my opinion, not a good metric.
>
> Did you ever look at the philosophy behind ORM (Object-Role Modelling)?
> Or the philosophy behind predicate logic, for that matter?

No, can't say I have in any specific way. Pointers/links?
Jan Hidders - 07 Apr 2004 22:59 GMT
>>>I'm certainly willing to entertain language notions - I just haven't
>
[quoted text clipped - 7 lines]
>  
> No, can't say I have in any specific way. Pointers/links?

For ORM see http://www.orm.net  and as an introcution perhaps
http://www.orm.net/pdf/ORMwhitePaper.pdf

For predicate logic you can always read Frege's Begriffsschrift. :-)
Sorry, only kidding. Can't think of a good reference right now.

The point is that all these *are* in fact based upon a correspondance to
language. In some sense that was what the great discovery by Aristotle
was: the fact that you can sometimes reason on the basis of only the
*form* of statements. Hence Formal logic.

-- Jan Hidders
Eric Kaun - 08 Apr 2004 14:58 GMT
> >>>I'm certainly willing to entertain language notions - I just haven't
> >
[quoted text clipped - 18 lines]
> was: the fact that you can sometimes reason on the basis of only the
> *form* of statements. Hence Formal logic.

Certainly - since computers can't understand meaning in the way that we can
(which we ourselves don't understand), it follows that form is about all
there is, and thus the mechanizability of symbolic logic. I'm not
downplaying the role of language, but languages that computers understand
are far different than those we understand. So if we're going to use human
language as a basis for computing, we have to specify which aspects of it,
or subset of it, and furthermore impose rules that might not make much
difference to our comprehension.
Jan Hidders - 08 Apr 2004 17:38 GMT
>>The point is that all these *are* in fact based upon a correspondance to
>>language. In some sense that was what the great discovery by Aristotle
[quoted text clipped - 9 lines]
> or subset of it, and furthermore impose rules that might not make much
> difference to our comprehension.

I couldn't agree more. I guess the point that I was trying to make is
that this has already been done for a while and there is lot's of
literature on it. The outcome is more or less what we know as the
differenct brances of symbolic logic, the very thing the relational
model is based upon.

-- Jan Hidders
Dawn M. Wolthuis - 08 Apr 2004 17:45 GMT
> > >>>I'm certainly willing to entertain language notions - I just haven't
> > >
[quoted text clipped - 27 lines]
> or subset of it, and furthermore impose rules that might not make much
> difference to our comprehension.

Obvously, the computer itself need not understand the language in order for
us to understand language that is output from the computer.  So, if we don't
split up English sentences (for example) unnecessarily when we feed them to
the computer as "data," then we can retrieve them in a form closer to the
original.

So, if "the Pizza has Mozzarella and Parmesan cheese" is stored as:

Pizza  Mozzarella
         Parmesan

in a single, uh, record, then if the ordering is unwittingly useful, we
didn't harm that ordering.

--dawn
Eric Kaun - 08 Apr 2004 20:37 GMT
> > > >>>I'm certainly willing to entertain language notions - I just haven't
> > > >
[quoted text clipped - 44 lines]
> in a single, uh, record, then if the ordering is unwittingly useful, we
> didn't harm that ordering.

Of course. This assumes that your main reason for storing the data is to
display it again, in its original form, for humans. If that's all you're
doing, then many different simple systems will suffice - a Word document, a
spreadsheet, etc. But if you're trying to reason about the data, then you
need to structure it in a way amenable to automated deduction. You are also
perfectly free to keep a copy of the original, which would then be dependent
upon the key, the whole key, and nothing but the key per normalization
rules.
Dawn M. Wolthuis - 08 Apr 2004 21:24 GMT
> > > > >>>I'm certainly willing to entertain language notions - I just
> haven't
[quoted text clipped - 61 lines]
> upon the key, the whole key, and nothing but the key per normalization
> rules.

I want my cake and eat it too!  The PICK structure does what I have
described and is "amenable to automated deduction" and it seems to me that
there is some value in that, but I'm still poking and prodding to clarify
what that might be.  The original Pizza example and this example about
retaining order (related to another thread) both show up some advantages of
the PICK or XML approach vs. the relational approach.  I realize they fall
far short of proving anything, however.  Cheers!  --dawn
Eric Kaun - 12 Apr 2004 16:40 GMT
> > Of course. This assumes that your main reason for storing the data is to
> > display it again, in its original form, for humans. If that's all you're
[quoted text clipped - 12 lines]
> there is some value in that, but I'm still poking and prodding to clarify
> what that might be.

Of course it's amenable, just much less so. It's its lack of symmetry and
consistency that poses a problem. By nesting values inside values (and then
a further layer inside that, I believe), you complicate the algebras,
closure, and optimizations. Relational is much simpler, hence its power.

On a related note, things like objects add a great deal of complexity in the
pursuit of "intuitive" modeling techniques; specifically hierarchies,
graphs, and a persistent (no pun intended) confusion between variables and
values. That additional complexity always comes at a cost, and it's
unfortunate that people are so uncomfortable with symbolic logic and related
disciplines that they bite into much more complexity than they would
otherwise have to, simply because the "operational" (procedural) approach
appears intuitive.

- erk
Dawn M. Wolthuis - 12 Apr 2004 17:35 GMT
> > > Of course. This assumes that your main reason for storing the data is to
> > > display it again, in its original form, for humans. If that's all you're
[quoted text clipped - 19 lines]
> a further layer inside that, I believe), you complicate the algebras,
> closure, and optimizations. Relational is much simpler, hence its power.

Possibly conceptually simpler as a mathematical model, but in real life it
does not seem to be simpler at all.  The end-user sees it as symmetric once
the vocabulary is in place for that to be the case.

LIST STUDENTS COURSES WITH COURSE_NAME LIKE "A..."
might list the students and all of their courses with a name starting with A

LIST COURSES STUDENTS WITH STUDENT_NAME LIKE "A..."
might list the courses and all of their students with a name starting with A

It is not symmetric when the vocabularly has not been put in place for it to
be so, for example, if the student name (which would obviously not be the
stored foreign key for the student and is not physically stored in the
COURSES file) is not placed in the vocabularly of the COURSES
file/graph/tree/function.

> On a related note, things like objects add a great deal of complexity in the
> pursuit of "intuitive" modeling techniques; specifically hierarchies,
[quoted text clipped - 4 lines]
> otherwise have to, simply because the "operational" (procedural) approach
> appears intuitive.

It doesn't just "appear" intuitive -- it typically IS more
ntuitive.  --dawn

P.S.  I've been mulling over Date's write-up on the issue of whether a class
can "be" a relation.  The argument against it seems to come down to ... "If
you were to create a class that defines a relation type, then you would no
longer be consistently following the relational model".  I suspect I'll have
some questions or opinions on that matter soon.
Jan Hidders - 13 Apr 2004 21:38 GMT
> [...] By nesting values inside values (and then
> a further layer inside that, I believe), you complicate the algebras,
> closure, and optimizations. Relational is much simpler, hence its power.

That's funny. I somehow remember Date and Darwen proposing a relational
model where you could have tuples containing sets that contained tuples
.. et cetera. The horror! The heresy!! In fact, the whole 1NF thing
seemed to have gone up in smoke, somehow. It still gives me the creeps
when I think about it, every now and then ... Surely, that must have
been a bad dream, right?

-- Jan Hidders
Eric Kaun - 13 Apr 2004 22:31 GMT
> > [...] By nesting values inside values (and then
> > a further layer inside that, I believe), you complicate the algebras,
> > closure, and optimizations. Relational is much simpler, hence its power.
>
> That's funny.

I'm not laughing. :-)

> I somehow remember Date and Darwen proposing a relational
> model where you could have tuples containing sets that contained tuples
> .. et cetera. The horror! The heresy!! In fact, the whole 1NF thing
> seemed to have gone up in smoke, somehow. It still gives me the creeps
> when I think about it, every now and then ... Surely, that must have
> been a bad dream, right?

OK, I expressed myself badly. Logically, the relational user sees values in
relations. The types of those values can be anything, including lists and
such, but there shouldn't be operators in the data model to manipulate
those - rather, those are user-defined operators for the specified types.
Pick-like MV exposes the nesting as part of its logical model (at least I
think, since it's not formally defined).

Is that better? Date's paper "What First Normal Form Really Means" talks
about this in great detail...

- erk
Jan Hidders - 13 Apr 2004 23:34 GMT
> OK, I expressed myself badly. Logically, the relational user sees values in
> relations. The types of those values can be anything, including lists and
> such, but there shouldn't be operators in the data model to manipulate
> those - rather, those are user-defined operators for the specified types.

So, one would expect that the NEST and UNNEST operators of the nested
relational algebra would not be allowed, wouldn't one? I know I would.
What else could "logically expose" mean for a relation-valued column? Or
the nested relational calculus, which doesn't have such operators, but
still allows you to operate on (and combine) the nested sets, would that
be Ok?

> Is that better? Date's paper "What First Normal Form Really Means" talks
> about this in great detail...

Ah, well, let me say here and now that I'm not a big fan of Chris Date,
to put it mildly, and the arrogance of dbdebunk makes me physically
sick. I know the paper you refer to, and I find it rather unconvincing.

-- Jan Hidders
Dawn M. Wolthuis - 14 Apr 2004 01:33 GMT
> > OK, I expressed myself badly. Logically, the relational user sees values in
> > relations. The types of those values can be anything, including lists and
[quoted text clipped - 14 lines]
> to put it mildly, and the arrogance of dbdebunk makes me physically
> sick. I know the paper you refer to, and I find it rather unconvincing.

Ditto on all three points in that last paragraph with the caveat that I am
impressed with Date (so a fan to that extent) even though I disagree with
him on quite a bit. I know that it is either far too easy or far too
difficult to disagree with me when I lack precision -- Date lays it out
there so that in some ways it is much easier to find points of disagreement
with him than with others and I appreciate that.  --dawn
Eric Kaun - 14 Apr 2004 01:49 GMT
> > OK, I expressed myself badly. Logically, the relational user sees values in
> > relations. The types of those values can be anything, including lists and
[quoted text clipped - 3 lines]
> So, one would expect that the NEST and UNNEST operators of the nested
> relational algebra would not be allowed, wouldn't one?

Do you mean GROUP? As far as I know, those are merely shorthand, not
something new. And there's certainly no

> I know I would.
> What else could "logically expose" mean for a relation-valued column?

Operators on an attribute/column of type T are exposed, same as operators on
any other type are exposed for use on values of that type. But one could
certainly argue that the "nesting" in values/subvalues in Pick are simply
exposed operators of a 2/3 level type. Hmmm. Mayhap I've argued myself into
a corner. Or maybe it's just late.

> Or the nested relational calculus, which doesn't have such operators, but
> still allows you to operate on (and combine) the nested sets, would that
> be Ok?

Again, unless I'm thinking of the wrong operators, they're mere shorthand
for combinations of other relational operators. Much like intervals, I
think.

> > Is that better? Date's paper "What First Normal Form Really Means" talks
> > about this in great detail...
>
> Ah, well, let me say here and now that I'm not a big fan of Chris Date,
> to put it mildly, and the arrogance of dbdebunk makes me physically
> sick.

I can certainly see that, and I don't claim to be an expert - from what
you've written, I'm fairly sure you're much more knowledgable than I on
relational matters. But I find their site interesting, and useful as a
bulwark against the wave of "novel" new data management techniques. While
I'm not familiar with the "deep" research, I don't see much understanding of
relational in common practice, and think it's certainly better than the ad
hoc approaches being advocated.

> I know the paper you refer to, and I find it rather unconvincing.

I actually thought that was the point - to show the relativism inherent in
normalization, including 1NF. I found that paper to be non-absolutist.

> -- Jan Hidders
Jan Hidders - 14 Apr 2004 22:32 GMT
>>So, one would expect that the NEST and UNNEST operators of the nested
>>relational algebra would not be allowed, wouldn't one?
>
> Do you mean GROUP? As far as I know, those are merely shorthand, not
> something new.

?? Are you saying that the GROUP / UNGROUP operators, as Date calls
them, can be expressed with the operators of the flat relational algebra?

>>I know I would.
>>What else could "logically expose" mean for a relation-valued column?
[quoted text clipped - 4 lines]
> exposed operators of a 2/3 level type. Hmmm. Mayhap I've argued myself into
> a corner. Or maybe it's just late.

Maybe. Maybe it's just Date. :-)

>>Ah, well, let me say here and now that I'm not a big fan of Chris Date,
>>to put it mildly, and the arrogance of dbdebunk makes me physically
[quoted text clipped - 7 lines]
> relational in common practice, and think it's certainly better than the ad
> hoc approaches being advocated.

I agree with all of the above, although I would add that the "is
certainly better than" should be qualified. An RDBMS is not *always*
under all circumstances certainly better. As much as I dislike dbdebunk
and Date's tendency to speak to us ex-cathedra of all things relational,
I consider myself very much in the "relational camp" and believe it has
the best (scientific and non-scientific) arguments of them all. That's
exactly why we don't need all this religious zealotry with the
apparently necessary condescending attitude and oversimplifications.

-- Jan Hidders
Dan - 15 Apr 2004 04:14 GMT
> >>So, one would expect that the NEST and UNNEST operators of the nested
> >>relational algebra would not be allowed, wouldn't one?
[quoted text clipped - 4 lines]
> ?? Are you saying that the GROUP / UNGROUP operators, as Date calls
> them, can be expressed with the operators of the flat relational algebra?

I believe these types of operators wouldn't conform to the rules of
constructing well-formed formulas as well, which of course impacts
FOL-based queries.  In predicate calculus, the construction of a wff
using a form G(P(x),...) is not allowed.  I haven't had a chance to
think much in terms of what the implications would be exactly however.
Perhaps you know.

- Dan

> >>I know I would.
> >>What else could "logically expose" mean for a relation-valued column?
[quoted text clipped - 29 lines]
>
> -- Jan Hidders
Jan Hidders - 15 Apr 2004 20:30 GMT
>>>>So, one would expect that the NEST and UNNEST operators of the nested
>>>>relational algebra would not be allowed, wouldn't one?
[quoted text clipped - 11 lines]
> think much in terms of what the implications would be exactly however.
>  Perhaps you know.

You go from first-order logic to higher-order logic, but in a very
restricted way because all your sets are finite and typed (and therefore
also finite in depth). Queries are still computable but can now be
exponential, and deciding satisfiability was already gone in the flat
relational model anyway. Note that you can always simulate this in the
flat model by allowing abstract identifiers that denote the sets and a
binary table that encodes the element-of relationship, plus a constraint
that says that two set-identifiers are the same if they have exactly the
same elements associated with them.

-- Jan Hidders
Eric Kaun - 15 Apr 2004 16:59 GMT
> >>So, one would expect that the NEST and UNNEST operators of the nested
> >>relational algebra would not be allowed, wouldn't one?
[quoted text clipped - 4 lines]
> ?? Are you saying that the GROUP / UNGROUP operators, as Date calls
> them, can be expressed with the operators of the flat relational algebra?

Ah, probably not... I once again forgot to bring the book in. I was probably
thinking of his temporal/6NF stuff. On second thought, this doesn't seem
likely, but I'll re-read his introduction of GROUP.

> >>I know I would.
> >>What else could "logically expose" mean for a relation-valued column?
[quoted text clipped - 6 lines]
>
> Maybe. Maybe it's just Date. :-)

heh

> >>Ah, well, let me say here and now that I'm not a big fan of Chris Date,
> >>to put it mildly, and the arrogance of dbdebunk makes me physically
[quoted text clipped - 16 lines]
> exactly why we don't need all this religious zealotry with the
> apparently necessary condescending attitude and oversimplifications.

A good point indeed.

- Eric
Anthony W. Youngman - 15 Apr 2004 22:11 GMT
>> I want my cake and eat it too!  The PICK structure does what I have
>> described and is "amenable to automated deduction" and it seems to me that
[quoted text clipped - 5 lines]
>a further layer inside that, I believe), you complicate the algebras,
>closure, and optimizations. Relational is much simpler, hence its power.

Define "simple".

So if I run a query, that gives me a view of a complex object, why does
it give me an indeterminate number of copies of a piece of data that is
stored just once?

Or, I want to find all the attributes of a single, real-world object.
With relational, that's a complex query. With Pick, it's "here's the
(singular) primary key, get the data".

As I said, define "simple" :-)

As for saying "it complicates the algebra...", WHY? It makes it easier
to do stupid things, but it doesn't affect the algebra and, indeed,
despite making it easier to be stupid, in practice it seems to make it
less common.

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Mikito Harakiri - 16 Apr 2004 01:58 GMT
> So if I run a query, that gives me a view of a complex object, why does
> it give me an indeterminate number of copies of a piece of data that is
> stored just once?

Consider query "return representation of 5 in unary base system". It should
return

1
1
1
1
1

Question, where "1" is stored? Do you understand now the difference between
logical model and naive physical representation in Pick?
Eric Kaun - 19 Apr 2004 14:42 GMT
> >> I want my cake and eat it too!  The PICK structure does what I have
> >> described and is "amenable to automated deduction" and it seems to me that
[quoted text clipped - 7 lines]
>
> Define "simple".

Good question; in this case I meant simple as advocated by Occam's Razor.
Fewer concepts, and more consistent notation.

> So if I run a query, that gives me a view of a complex object, why does
> it give me an indeterminate number of copies of a piece of data that is
> stored just once?

I don't know what you mean here.

> Or, I want to find all the attributes of a single, real-world object.
> With relational, that's a complex query.

Define "complex." It usually depends on the "object" - oh, can you define
that as well?

> With Pick, it's "here's the
> (singular) primary key, get the data".

That's certainly possible in SQL too, just as it's possible to be normalized
in Pick. However, your statement is just hand-washing, since "the data" is
what the entire disagreement is about. Is it a set of attribute values? Is
it an object graph? Is it both, meaning we've decided in advance the view of
"the object" everyone must have because it's "real world" (can you define
that?).

> As I said, define "simple" :-)
>
> As for saying "it complicates the algebra...", WHY? It makes it easier
> to do stupid things, but it doesn't affect the algebra and, indeed,
> despite making it easier to be stupid, in practice it seems to make it
> less common.

You have separate notation for values, sub-values, sub-sub-values, ...
sub^N-values, whatever N is (once I heard 3, once I heard 6).

- erk
Anthony W. Youngman - 19 Apr 2004 21:33 GMT
>> >> I want my cake and eat it too!  The PICK structure does what I have
>> >> described and is "amenable to automated deduction" and it seems to me
[quoted text clipped - 13 lines]
>Good question; in this case I meant simple as advocated by Occam's Razor.
>Fewer concepts, and more consistent notation.

Occam's razor - as phrased by ?Einstein - "Things should be as simple as
possible - but no simpler".

To me, relational has simplified too far - "The car is blue and green",
"John is Mike's dad". If I model both these situations in relational, I
need a relation to link the colours to the car. I need a relation to
link John and Mike. In the first situation I'm linking an entity with
its attributes. In the second I'm linking two entities. Relational makes
no distinction between the two types of link ...

>> So if I run a query, that gives me a view of a complex object, why does
>> it give me an indeterminate number of copies of a piece of data that is
>> stored just once?
>
>I don't know what you mean here.

I think I'm confusing relational and SQL here ...

>> Or, I want to find all the attributes of a single, real-world object.
>> With relational, that's a complex query.
>
>Define "complex." It usually depends on the "object" - oh, can you define
>that as well?

Well, I'd define a real world object as something described by a noun -
an invoice, a car, a person ...

Certainly with an invoice, or a person, the relational query could be
quite complex with joins across quite a lot of tables.

>> With Pick, it's "here's the
>> (singular) primary key, get the data".
[quoted text clipped - 5 lines]
>"the object" everyone must have because it's "real world" (can you define
>that?).

Typically, it's a set of attribute values. And yes, I see what you mean
- given the primary key, a relational view will return (after a lot of
work if it's scattered across many tables) all the attributes associated
with an entity. With Pick, it is a single "row" in a single "table".

>> As I said, define "simple" :-)
>>
[quoted text clipped - 5 lines]
>You have separate notation for values, sub-values, sub-sub-values, ...
>sub^N-values, whatever N is (once I heard 3, once I heard 6).

Well, yes, the fact that in relational N is only ever 2 makes life
simple. But that's a "cosmological constant" which upsets purist
physicists. If I can solve a problem for N where N is any number, it's a
far better solution than if it only works for "N=2" :-) (and actually,
both the values you've quoted for N I've met in practice :-)

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Eric Kaun - 20 Apr 2004 14:45 GMT
> >Good question; in this case I meant simple as advocated by Occam's Razor.
> >Fewer concepts, and more consistent notation.
>
> Occam's razor - as phrased by ?Einstein - "Things should be as simple as
> possible - but no simpler".

Agreed.

> To me, relational has simplified too far - "The car is blue and green",
> "John is Mike's dad". If I model both these situations in relational, I
> need a relation to link the colours to the car. I need a relation to
> link John and Mike. In the first situation I'm linking an entity with
> its attributes. In the second I'm linking two entities. Relational makes
> no distinction between the two types of link ...

Does it have to? Remember, we're modeling data, not "the real world" (unless
you want to build a small model of a car and paint it blue and green, which
is up to you). It's not a model in the same sense as you discuss elsewhere
with physical models.

Unless, of course, you claim to have seen DATA in the "real world" with your
own eyes. I don't mean Brent Spiner in Star Trek: The Next Generation.

And in any event, the case you cite is very simple (colors, which are in
most models scalars, make the argument for 1NF less persuasive than in other
cases). Other cases like children, line items, and any business data I can
think of are another matter. The line between entity and relationship, while
there, is fuzzy. Basing your statements of fact on normalized predicates is
much less ambiguous, and this clarity of design guidance is another benefit
of relational. There will always be an appeal to the business domain you're
modeling, but given that, normalization rules are much better guides than
anything else I've seen.

> >Define "complex." It usually depends on the "object" - oh, can you define
> >that as well?
>
> Well, I'd define a real world object as something described by a noun -
> an invoice, a car, a person ...

Attributes are nouns.

I'm not just being argumentative. I think it's a credit to relational that
it (tries to) base itself on predicates, as opposed to entities,
relationships, and attributes, is stronger for many reasons covered in books
already mentioned...

> > "the data" is
> >what the entire disagreement is about. Is it a set of attribute values? Is
[quoted text clipped - 6 lines]
> work if it's scattered across many tables) all the attributes associated
> with an entity. With Pick, it is a single "row" in a single "table".

And the decision of what "it" you're talking about is likely to be premature
and limited in usefulness. The up-front simplicity of modeling based on
entities is undercut by its application focus.

> Well, yes, the fact that in relational N is only ever 2 makes life
> simple.

HA! So you admit it? :-)

> But that's a "cosmological constant" which upsets purist
> physicists. If I can solve a problem for N where N is any number, it's a
> far better solution than if it only works for "N=2" :-) (and actually,
> both the values you've quoted for N I've met in practice :-)

Uh... I suspect there's another model / meta-model confusion here, but the
margin of this message is too narrow to contain the marvelous proof I've
concocted.

- erk
Dawn M. Wolthuis - 20 Apr 2004 18:10 GMT
<snip>

> Uh... I suspect there's another model / meta-model confusion here, but the
> margin of this message is too narrow to contain the marvelous proof I've
> concocted.

So my instincts were right -- you were Fermat in a previous life,
h?  --dawn
Eric Kaun - 20 Apr 2004 20:55 GMT
> <snip>
>
[quoted text clipped - 4 lines]
> So my instincts were right -- you were Fermat in a previous life,
> h?  --dawn

And now I just ferment.
Anthony W. Youngman - 22 Apr 2004 00:14 GMT
>> >Good question; in this case I meant simple as advocated by Occam's Razor.
>> >Fewer concepts, and more consistent notation.
[quoted text clipped - 15 lines]
>is up to you). It's not a model in the same sense as you discuss elsewhere
>with physical models.

But the data is representing the real world -if it doesn't, what's the
point of it?

>Unless, of course, you claim to have seen DATA in the "real world" with your
>own eyes. I don't mean Brent Spiner in Star Trek: The Next Generation.
[quoted text clipped - 16 lines]
>
>Attributes are nouns.

I thought they were adjectives. For example, "the car is blue" - "blue"
is an adjective.

>I'm not just being argumentative. I think it's a credit to relational that
>it (tries to) base itself on predicates, as opposed to entities,
[quoted text clipped - 17 lines]
>and limited in usefulness. The up-front simplicity of modeling based on
>entities is undercut by its application focus.

In practice, it actually seems to work extremely well...

>> Well, yes, the fact that in relational N is only ever 2 makes life
>> simple.
[quoted text clipped - 11 lines]
>
>- erk

Cheers,
Wol
Signature

Anthony W. Youngman - wol at thewolery dot demon dot co dot uk
HEX wondered how much he should tell the Wizards. He felt it would not be a
good idea to burden them with too much input. Hex always thought of his reports
as Lies-to-People.
The Science of Discworld : (c) Terry Pratchett 1999

Eric Kaun - 23 Apr 2004 00:02 GMT
> >> >Good question; in this case I meant simple as advocated by Occam's Razor.
> >> >Fewer concepts, and more consistent notation.
[quoted text clipped - 18 lines]
> But the data is representing the real world -if it doesn't, what's the
> point of it?

Yes, the data represents things about the "real world" (