Database Forum / General DB Topics / DB Theory / April 2004
Pizza Example
|
|
Thread rating:  |
Dawn M. Wolthuis - 03 Apr 2004 22:41 GMT In trying to get a better handle on relational database design compared to XML or PICK database design, I'm writing up the example below and got stuck on just how a relational theorist would likely design this. Below is the design for a non-1NF implementation. I'll use my terminology for this so that it is not confused with relational terms.
Function/File: ORDER_ITEMS Purpose: This file maps the OrderID | ItemNumber to details of the ordered item
Fields and sample data: Key: OrderID.................72819 ItemNumber...........1 OrderedItem..............Pizza Size...........................16" Crust.........................Chicago Sauce........................Tomato Cheese......................Mozzarella .................................Feta .................................Parmesan Topping....................Pepperoni ................................Black Olives
Requirements for an application using this data include displaying line items for the information in a particular format. I realize this is not relevant for relational data design, but gets to the heart of what I'm trying to understand with the relational model.
Order Item OrderItem Size Crust Sauce Cheese Topping
72819 1 Pizza 16" Chicago Tomato Mozzarella Pepperoni
Feta Black Olives
Parmesan
This is, not coincidentally, how a non-1NF language, such as the PICK query language (which goes by many names) might show the data with a statement such as:
LIST ORDER_ITEMS OrderID ItemNumber OrderedItem Size Crust Sauce Cheese Topping
(LIST is like SQL SELECT but with the function/file typed first, then the fields)
In addition to the ORDER_ITEMS file, I would also have a file that includes valid entries, along with descriptions with a file such as:
Function/File: VALIDATIONS
Purpose: This file maps a validation list name to the set of valid entries and their descriptions
Fields & sample data:
Key:
ValidationList.............Cheeses (with other validation lists in this same file for validating Sauces, Crusts, Toppings, and OrderedItems, having those values as the key for the relevant validation list)
ValidEntries (designed with the below fields associated with each other):
Abbreviation................Mozzarella
.................................Feta
.................................Parmesan
Description..................Mozzarella Cheese
.................................Feta Cheese
.................................Parmesan Cheese
Then to show the descriptions instead of the abbreviation, the LIST statement would include a virtual field CheeseDesc (for example) instead of the stored data from the field named "Cheese". In other words, the "vocabulary" for the OrderItems function would be expanded to incorporate these descriptions as virtual fields.
How would you model this data for an RDBMS and what would an analogous SQL statement look like, both for just the stored order data and if there is a join to get the descriptions for cheeses, for example? I realize there would be many more tables than there are files but it is not clear to me just how this would typically be designed and queried.
Thanks in advance for your help. --dawn
Dawn M. Wolthuis - 04 Apr 2004 00:17 GMT I'll try to see if I can get the essense of the display of data in the pizza example to be clearer by removing the first few columns so it doesn't wrap. I hope this displays better. Thanks. --dawn
Size....Crust.......Sauce.....Cheese.........Topping 16" Chicago Tomato Mozzarella Pepperoni Feta Black Olives Parmesan
rkc - 04 Apr 2004 04:41 GMT > I'll try to see if I can get the essense of the display of data in the pizza > example to be clearer by removing the first few columns so it doesn't wrap. [quoted text clipped - 4 lines] > Feta Black Olives > Parmesan Start with your desired 'display' and apply the rules of normalization. You seem to bright to not understand that, so you must be trying to make some other point.
Dawn M. Wolthuis - 04 Apr 2004 05:18 GMT > > I'll try to see if I can get the essense of the display of data in the > pizza [quoted text clipped - 10 lines] > You seem to bright to not understand that, so you must be trying to > make some other point. Ah, so not all readers of the list have Bob B's opinion, eh? Actually, it is neither a) that I don't know how to perform a strict normalization process on the data as I have written it nor b) that I'm trying to make another point here.
It is that I suspect that someone who thinks in terms of normalized data would not think about the problem domain in the same way. For example, perhaps someone would decide that the number of combinations was a relatively small finite number at this point in time and for the foreseeable future and might make a table of all possible combinations of pizzas with a generated candidate key. Then that would be placed as a foreign key in the ORDER_ITEM table. I'm sure there are other possibilities too. I'm guessing that it would be the rare RDBMS DBA that would actually implement a strictly normalized version of the data described in my example, but I could be wrong about that.
So, you are right, I'm not simply asking for this data to be normalized, but for it to be understood so that a relational database implementation way of thinking is presented. I was going to simply use this as an example of the difference between how an XML/PICK model of the data would look compared to a relational model and when I normalized the data, I looked at it and thought that it was unlikely that anyone would actually implement it that way. So, how would you do it?
Smiles! --dawn
Laconic2 - 04 Apr 2004 10:36 GMT > a relational model and when I normalized the data, I looked at it and > thought that it was unlikely that anyone would actually implement it that > way. So, how would you do it? It depends.
Tony - 04 Apr 2004 12:26 GMT > > > I'll try to see if I can get the essense of the display of data in the > pizza [quoted text clipped - 26 lines] > normalized version of the data described in my example, but I could be wrong > about that. I would hope that you are wrong about that. I hope that most RDBMS DBAs would know that a normalized database is the right approach.
> So, you are right, I'm not simply asking for this data to be normalized, but > for it to be understood so that a relational database implementation way of [quoted text clipped - 3 lines] > thought that it was unlikely that anyone would actually implement it that > way. So, how would you do it? This inevitably leads to further questions. Apparently "pizza" is not the only orderable item, but all orderable items are similar to pizza in that they have size, crust, toppings etc? Assuming that to be true (as it seems to be assumed by your definition):
create table item_types( item_type_name varchar2(30) primary key ); create table cheeses( cheese_name varchar2(30) primary key ); create table toppings( topping_name varchar2(30) primary key ); create table sauses( sauce_name varchar2(30) primary key ); create table crusts( crust_name varchar2(30) primary key ); create table sizes( size_inches integer primary key ); create table order( order_no integer primary key ); create table order_item ( order_no integer, item_no integer , size_inches references sizes , crust_name references crusts , sauce_name references sauces , primary key (order_no, item_no) ); create table order_item_cheeses ( order_no integer, item_no integer, cheese_name references cheeses , foreign key (order_no, item_no) references order_items , primary key (order_no, item_no, cheese_name) ); create table order_item_toppings ( order_no integer, item_no integer, topping_name references toppings , foreign key (order_no, item_no) references order_items , primary key (order_no, item_no, topping_name) );
This is a very quick & dirty design, so please don't quibble about column names or whether 30 chars is appropriate for a cheese name!
Dawn M. Wolthuis - 04 Apr 2004 15:06 GMT <snip>
> This inevitably leads to further questions. Apparently "pizza" is not > the only orderable item, but all orderable items are similar to pizza > in that they have size, crust, toppings etc? Assuming that to be true > (as it seems to be assumed by your definition): Bingo -- that is one of the issues. No, the "Garlic Bread" or other items that could be ordered might have a subset (including null set) of these items, but these includes all attributes that we are putting on the sum total of all items. So you can think of the attributes listed as the set of all possible attributes for an item. I chose to at least have one item that does use all of them, but a typical PICK implementation (from what I have seen) includes attributes for sub-types.
> create table item_types( item_type_name varchar2(30) primary key ); > create table cheeses( cheese_name varchar2(30) primary key ); [quoted text clipped - 23 lines] > This is a very quick & dirty design, so please don't quibble about > column names or whether 30 chars is appropriate for a cheese name! That's helpful, in particular because it verifies my claim that those working with an RDBMS do typically put size constraints on data even when the problem domain gives no indication of such constraints being required. I'm guessing you opted to size these attributes because you size everything because that's the way RDBMS's like it (for performance?)
So, now, if you don't make the assumption that every item that someone could buy would have each attribute (such as Sauce), then how would you do t? --dawn
Laconic2 - 04 Apr 2004 15:22 GMT Now we're getting somewhere. I have no idea what a theorist would do, but a perspicacious analyst would hesitate to generalize data requirements from a single example.
If another of the orderable items were Neapolitan Ice Cream, rather than Garlic Bread, then the list of toppings might be quite diffferent. And so on.
Dawn M. Wolthuis - 04 Apr 2004 15:48 GMT > Now we're getting somewhere. I have no idea what a theorist would do, but a > perspicacious analyst would hesitate to generalize data requirements from a > single example. Of course, of course, but using a Pizza example is a way to provide a problem domain of which most of us are somewhat comfortable. A complete set of requirements would be inappropriate for this forum unless they can be indicated with a statement like "provide a data model for ordered items in an order entry system for a small pizza shop".
> If another of the orderable items were Neapolitan Ice Cream, rather than > Garlic Bread, then the list of toppings might be quite diffferent. And so > on. And if you concept of a pizza place includes selling Neapolitan Ice Cream with toppings, then I'm sure I can learn something from the way you would model that too. So, I can accept some human interpretation of the situation in order to keep the requirements statement short.
cheers! --dawn
Laconic2 - 04 Apr 2004 16:51 GMT > Of course, of course, but using a Pizza example is a way to provide a > problem domain of which most of us are somewhat comfortable. And that's precisely the problem. A problem domain where most of us feel comfortable is a problem domain where we each come to the table with a large number of assumptions. When those assumptions coincide, we move forward quickly. When they don't, we have the illusion of a common data model rather than the reality of one.
Give me a problem domain that I don't know anything about, and a subject matter expert that is willing to work with me on a data model that is subject matter relevant, and I'll come up with something. Give me a subject matter that I think I know, but really do not, and you'll see me make a fool out of myself.
I know how to order pizza from a pizza place, but I never worked at one. Am I a subject matter expert?
> And if you concept of a pizza place includes selling Neapolitan Ice Cream > with toppings, then I'm sure I can learn something from the way you would > model that too. So, I can accept some human interpretation of the situation > in order to keep the requirements statement short. They aren't just short, they are laconic. As I said before, it depends. If the mission of the database users is order processing, normalization is important, at least when it comes to data that is written during order placement and order fulfillment. (It would probably be ok to update data that is less than fully normalized when a new kind of cheese comes along.) If the mission is tactical marketing, then a multidimensional model might be more useful.
And even the above statements depend on how the design is going to be used. If its a construction model that is going to be used to create a schema of tables, columns and constraints in a fairly mechanical fashion, I might design it one way. If it's intent is to be an abstract model, from which a schema can be derived, then I might model it differently.
Dawn M. Wolthuis - 04 Apr 2004 19:11 GMT > > Of course, of course, but using a Pizza example is a way to provide a > > problem domain of which most of us are somewhat comfortable. [quoted text clipped - 13 lines] > I know how to order pizza from a pizza place, but I never worked at one. Am > I a subject matter expert? I once heard that there is a philosophy difference among news writers about whether the person writing up the story should have some previous knowledge of the subject matter or be ignorant of it.
When it comes to database design, I'd have to disagree with you and suggest that a business analyst who specializes in a particular vertical application will, in general, do a better job with the data design than someone who doesn't know it. When I wrote programs for public television in the early 80's, I stopped turning the channel when they were begging for money -- getting a feel for their promotional practices helped me design the software (including the data structures) better, ask better questions, etc.
So, I'll support my choice of a pizza application, figuring that many of us can make many valid assumptions -- at least enough to provide a response to my question that is helpful (at least to me) in understanding how relational modelers think about the data. Of course, I will grant that if you were to do an implementation of this next week you might want to have a few interviews with the users & owners of the application to be sure you hit the nail on the head. For the purposes of my question, feel free to pretend you are the user and make the type of assumptions you would with the limited knowledge you have of a pizza ordering system. Cheers! --dawn
> > > > And if you concept of a pizza place includes selling Neapolitan Ice Cream [quoted text clipped - 16 lines] > design it one way. If it's intent is to be an abstract model, from which a > schema can be derived, then I might model it differently. Tony - 04 Apr 2004 20:56 GMT > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > news:<c4o2at$frb$1@news.netins.net>... [quoted text clipped - 46 lines] > I'm guessing you opted to size these attributes because you size everything > because that's the way RDBMS's like it (for performance?) SQL DBMSs, not RDBMSs as such. But I didn't have to do that. I could use e.g. Oracle's CLOB datatype, which allows up to 4GB of data. Or I could have said varchar2(4000). RDBMS designers tend to like to impose some kind of discipline on the data being stored, it isn't really a performance issue - at least not from the DBMS's point of view. But if you allow 4000 character pizza names in the database, then you must allow for that in your user interface, which means declaring 4000-character variables - something of a memory drain, perhaps? Frankly it would just seem rather stupid to allow unlimited length for every data item.
> So, now, if you don't make the assumption that every item that someone could > buy would have each attribute (such as Sauce), then how would you do > t? --dawn In a SQL database you could just allow NULLs in those items. In a true RDBMS that doesn't allow nulls you could either create a table for each subtype of order_items or you could move the attributes like crust_name into separate tables like:
create table order_item_crust ( order_no integer, item_no integer, crust_name references crusts , foreign key (order_no, item_no) references pizza_order_items , primary key (order_no, item_no) );
(Note that this table only allows a maximum of one crust per pizza).
Dawn M. Wolthuis - 04 Apr 2004 21:28 GMT > > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > > news:<c4o2at$frb$1@news.netins.net>... > > > > > > > <snip>
> SQL DBMSs, not RDBMSs as such. But I didn't have to do that. I > could use e.g. Oracle's CLOB datatype, which allows up to 4GB of data. [quoted text clipped - 6 lines] > perhaps? Frankly it would just seem rather stupid to allow unlimited > length for every data item. I see no logic in this. The database need not constrain the size of data that is stored and the UI can either constrain it or not, as the designer/user see fit. Constraining the database to only store up to a particular length means that the database needs to be changed if a new valid value for an attribute arises that is longer than what was anticipated. The UI can permit scrolling to show only a limited number of characters, but scroll to show all. A report writer can wrap long values to a certain width for practicality. Unless there is a REAL business reason for limiting the number of characters for a value (or your DBMS product requires it or gives you better features if you do), don't do it. From what I have seen of RDBMS tools, those who use them are inclined to put an attribute of "color" in the database (for example) with a max number of characters for the value. All we need, then, is a new color to come available and you can bet that the name will be abbreviated now to keep it within that length. How silly IMO.
> > So, now, if you don't make the assumption that every item that someone could > > buy would have each attribute (such as Sauce), then how would you do [quoted text clipped - 12 lines] > > (Note that this table only allows a maximum of one crust per pizza). Yes, this does look like an RDBMS approach to the problem. I don't think it is a "natural" way to view the problem, but it follows a set of rules for what that is worth. smiles. --dawn
Lemming - 04 Apr 2004 22:16 GMT >> "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message >news:<c4p4q7$on1$1@news.netins.net>... [quoted text clipped - 28 lines] >we need, then, is a new color to come available and you can bet that the >name will be abbreviated now to keep it within that length. How silly IMO. Doesn't "ALTER TABLE ... " make this problem go away?
>> > So, now, if you don't make the assumption that every item that someone >could [quoted text clipped - 18 lines] >what that is worth. >smiles. --dawn Maybe I've missed the point, but it seems to me the above solution is backwards. The order_item shouldn't reference crusts at all ... you need an intermediate table to resolve the possible many-to-many relationship of order_item to crust. Informally:
Table: crust (crust_id, data ...) Table: order_item (order_id,item_id, data ...) Table: item_crust (crust_id references crust (crust_id), (order_id, item_id) references order_item(order_id, item_id), data ...)
Hope that makes sense, it's late ...
Lemming
 Signature Curiosity *may* have killed Schrodinger's cat.
Tony - 05 Apr 2004 10:17 GMT > >> "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > news:<c4p4q7$on1$1@news.netins.net>... [quoted text clipped - 31 lines] > > Doesn't "ALTER TABLE ... " make this problem go away? Yes, but programmers like Dawn seem to think DBMSs are fiendishly difficult to use, so would avoid requiring ALTER TABLE statements like the plague ;-)
> >> > So, now, if you don't make the assumption that every item that someone > could [quoted text clipped - 33 lines] > (order_id, item_id) references order_item(order_id, item_id), > data ...) Yes, that is exactly what I meant. My "order_item_crust" table is the intermediate table between my order_items and crusts tables.
Paul - 04 Apr 2004 23:22 GMT > I see no logic in this. The database need not constrain the size of data > that is stored and the UI can either constrain it or not, as the [quoted text clipped - 10 lines] > we need, then, is a new color to come available and you can bet that the > name will be abbreviated now to keep it within that length. How silly IMO. The logic is that is helps to enforce data integrity.
As a real-world example, I sometimes have to import data from a flat file into a database. Sometimes the people who create the flat file mess things up and give the columns in the wrong order or something.
If this happens when I have maximum sizes defined for the column widths, it usually will be the case that something will overflow and give an error. This is what I *want* to happen in this situation. If I just had everything at varchar(8000) it would all import with no errors except my database would end up hopelessly corrupted.
The downside is, as you say, that you have to be careful not to make the minimum column widths too small. But with a judicious choice of sizes this problem should be much less than the data integrity benefits.
Paul.
Tony - 05 Apr 2004 10:35 GMT > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > news:<c4p4q7$on1$1@news.netins.net>... [quoted text clipped - 16 lines] > > I see no logic in this. Why doesn't that surprise me? ;-)
> The database need not constrain the size of data > that is stored and the UI can either constrain it or not, as the > designer/user see fit. No it NEED not, as I said above. But it makes a lot of sense to do so - see Paul's post for integrity reasons. Another reason would be to impose some predictability on data items. Obviously, if you have an attribute like "customer_opinion_on_pizza" then you would allow for plenty of text - though even then you may prefer to limit to 2000 chars (e.g.) rather than let some verbose customer fill 4GB of waffle about it. But if EVERY column allowed "unlimited" length, how would you design your screens and reports? Scrollable text fields for everything presumably.
> Constraining the database to only store up to a > particular length means that the database needs to be changed if a new valid > value for an attribute arises that is longer than what was anticipated. How terrifying! More likely of course, the user will come up with a suitably abbreviated name. We are all familiar with such abbreviated names from till receipts, credit card statements etc.
In any case, let's be clear: this argument is a side-line that has NOTHING to do with choosing an RDBMS over Pick or whatever. It is a design issue; the fact that relational database designers are more likely to choose to constrain data lengths than you are is perhaps just a symptom of our more disciplined approach to database design.
> > > So, now, if you don't make the assumption that every item that someone > could [quoted text clipped - 17 lines] > is a "natural" way to view the problem, but it follows a set of rules for > what that is worth. What it is worth is a LOT, since computers work better with data structured according to rules rather than "unstructured" or "natural" data. Who cares if it seems "natural" to you? If you are a database designer, you better learn to see beyond the "natural" if you are going to be useful, and if you are an end-user accessing the database via an application you can be protected from needing to understand the relational way of doing it.
Dawn M. Wolthuis - 05 Apr 2004 17:19 GMT > > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > > news:<c4p4q7$on1$1@news.netins.net>... [quoted text clipped - 17 lines] > > Why doesn't that surprise me? ;-) Hey Tony, you could cut me some slack -- I'm at least asking question in an honest effort to learn. Some of the relational theorists on this list are open to learning other approaches, while many seem to be happy to live with their relational tunnel vision.
I understand now from Paul's response that the reason folks like to restrict the size of fields is not for performance reasons, but because they want the database, not just software applications, to kick out data that is clearly wrong. I suspect that in many cases, by the time data reaches the DBMS a software developer has already found a reason to apply constraint logic (coded outside of the database) to the transactions because they needed to for the GUI or to provide an immediate response to a web services transaction, but this locks down the database further so if a programmer doesn't catch something, the database will.
It does seem to me that it might make more sense, then, for these lengths to give warnings in many cases, rather than be fixed constraints if they are checking for the likelihood of good data, but I don't know if there is a clean way to do that.
Otherwise, allowing the database to store every data element as variable length unless there is a clear business reason not to, but ensuring that all points of entry of data -- GUI or web service, for example -- apply constraint logic with warnings in cases where unlikely data is entered makes more sense to me.
> > The database need not constrain the size of data > > that is stored and the UI can either constrain it or not, as the [quoted text clipped - 17 lines] > suitably abbreviated name. We are all familiar with such abbreviated > names from till receipts, credit card statements etc. Yes, exactly the state of data I have seen. Instead of the easy, but not completely without risk, task of altering a table, the user simply figures out how to work around the constraints. I think that is a bit of a shame, but I guess I can see where the cost of both end-users and software developers figuring out how to live with the initial constraints for the life of a software application rather than "fixing" them could be considered part of the cost of the software. Hopefully neither the user or the developer will go to extremes in this regard and will request the required changes when needed (but we have all seen cases where they have, right?)
> In any case, let's be clear: this argument is a side-line that has > NOTHING to do with choosing an RDBMS over Pick or whatever. It is a [quoted text clipped - 31 lines] > via an application you can be protected from needing to understand the > relational way of doing it. But if there is a structure and set of rules that align better with the natural structure of language, then we could possibly have a cost savings over the life of software (and that is what the anecdotal evidence suggests is happening, but I have nothing to prove it). I am NOT at all opposed to structure. I am alsol pro-constraints and would like to see the constraints that are important for data integrity be applied at the proper points in an application -- that might mean both in a GUI and when the data are stored.
What I have seen with RDBMS's is that the size constraint is placed on everything while more important constraints related to the allowable values are not as often applied, especially when it is a "yellow flag" type of data value for an attribute rather than obviously not permitted. I'll put this in a separate thread at some point (but have several other questions prior to that one).
Thanks. --dawn
Laconic2 - 05 Apr 2004 18:50 GMT Cutting slack is a two way street. You could cut other people some slack, and recognize that anyone who expresses a relational model in the form of SQL CREATE commands is probably going to include some features of the physical data model as well as features of the logical model.
Choosing an upper limit of 30 was not inherent in the logical data model.
It might have been (assumption) inherent in the domain definition of the data requirements. A limit on the number of characters IS, after all, a domain constraint. If its conceptual it is discovered, not designed, and is in the conceptual data model as well as the logical data model.
Or it might have been invented at the time the logical model was converted to a physical model, for reasons like storage capacity or throughput (again, an assumption).
Also, what's the big deal? If thirty turns out to be too low, then just ALTER the column to make it forty. A good RDBMS will do that for you, and pad the existing data with blanks. Of course that could wreak havoc with data independence, when users of the data try to stuff a 40 character data value into a 30 character working storage variable. But you can't have everything, can you?
Anthony W. Youngman - 05 Apr 2004 23:05 GMT >Or it might have been invented at the time the logical model was converted >to a physical model, for reasons like storage capacity or throughput [quoted text clipped - 6 lines] >data value into a 30 character working storage variable. But you can't have >everything, can you? Or, at the design stage, you simply accept the fact that length is variable, and expect your database to be clever enough to cope :-)
Constrain the display width, but don't apply artificial constraints to data.
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Laconic2 - 06 Apr 2004 15:02 GMT > Or, at the design stage, you simply accept the fact that length is > variable, and expect your database to be clever enough to cope :-) Very good point. In fact a good DBMS will apply compression to save space, with no information loss. And, in general, the time saved by saving space will be more than the time expended in compression/decompression. And, when odd cases occur, the DBA can tell the DBMS what to do.
However, there are some situations where variable length data is unacceptable design, and not for reasons that can be hidden inside the DBMS. Other writers have given several examples in this thread.
T - 16 Apr 2004 16:26 GMT Talk about a discussion in semantics. The limit on the *description* of the constituent ingredients is one that is inherent to the problem domain. One can make a safe assumption that no ingredient's description is larger than n number of characters at the time of design. In a good design with any decent RDMS, that can easily be changed at any time to accommodate expanded text.
But let's be clear, the general user (i.e., not the manager or supervisor) would *never* be allowed to enter an ingredient. Rather, they would only ever be allowed to *choose* ingredients. Specifically, they would only ever be allowed to associate an ingredient choice with an order item. Only a manager would ever enter the description of an ingredient and one can safely assume that there are limits on that name. If the manager wishes to be more verbose (say store the chemical makeup of the ingredient.) Then I would provide a second field for that verbose description and that field would provide unlimited space. The purpose of main "name" field (if we call it that) is so that an unsophisticated user can quickly identify the ingredient they wish to assign. Providing for and entry of extended text would serve to confuse the user.
Thus, the discussion about "artificial limits" on the user's data is nonsensical. Rather the design approaches provided in this thread are ones that account for areas of lengthy text as well as areas requiring simple, short text which by definition means there is a limit on its length.
Thomas
> Cutting slack is a two way street. You could cut other people some slack, > and recognize that anyone who expresses a relational model in the form of [quoted text clipped - 18 lines] > data value into a 30 character working storage variable. But you can't have > everything, can you? Eric Kaun - 05 Apr 2004 20:44 GMT > > "Dawn M. Wolthuis" <dwolt@tincat-group.com> wrote in message > news:<c4pr5i$n3p$1@news.netins.net>... [quoted text clipped - 4 lines] > Hey Tony, you could cut me some slack -- I'm at least asking question in an > honest effort to learn. I agree completely - such sniping is mean-spirited, discourages honest discussion, and is usually more amusing to the sniper than to onlookers.
> Some of the relational theorists on this list are > open to learning other approaches, while many seem to be happy to live with > their relational tunnel vision. Well, tunnel vision is one explanation, but I've made an effort to read the XML specs, including XQuery, before pronouncing it damned beyond redemption. Rejection doesn't necessarily imply ignorance and tunnel vision.
> It does seem to me that it might make more sense, then, for these lengths to > give warnings in many cases, rather than be fixed constraints if they are [quoted text clipped - 6 lines] > constraint logic with warnings in cases where unlikely data is entered makes > more sense to me. In some cases, such as DB conversions and loads from flat files, the DB is adequate, but granted that UIs should have up-front validation rules generate from the DB constraints. I think it's still extremely useful to have one authority, and not just a document, but something enforced and specific.
> > > Yes, this does look like an RDBMS approach to the problem. I don't > think it [quoted text clipped - 14 lines] > over the life of software (and that is what the anecdotal evidence suggests > is happening, but I have nothing to prove it). I'm not asking for proof, since such is very difficult and expensive to acquire, but my experience suggests the opposite. The more "natural" approach of XML leads to chaos - data out of sync with no constraints to tell you, difficult debugging before you realize the Doc1 node 3 levels deep is out of sync with the top-level node of Doc2 in a different place, etc. I see absolutely no reason to use the natural structure of language for anything other than communicating with business people. Logic and mathematics are the basis of most works of engineering, and yet most laypeople don't understand them. I would prefer that people building medical instruments and weapon systems NOT use language that I'm capable of understanding in their designs - that would make me immediately suspicious that they're making it simpler than is feasible for good design.
> I am NOT at all opposed to > structure. I am also pro-constraints and would like to see the constraints
> that are important for data integrity be applied at the proper points in an > application -- that might mean both in a GUI and when the data are stored. Agreed.
> What I have seen with RDBMS's There aren't any yet other than Dataphor.
> is that the size constraint is placed on > everything while more important constraints related to the allowable values > are not as often applied, especially when it is a "yellow flag" type of data > value for an attribute rather than obviously not permitted. Good type/domain support in the DBMS would handle this. You're right, it's a big gap and a big problem.
- Eric
Anthony W. Youngman - 05 Apr 2004 23:14 GMT >I'm not asking for proof, since such is very difficult and expensive to >acquire, but my experience suggests the opposite. The more "natural" [quoted text clipped - 8 lines] >understanding in their designs - that would make me immediately suspicious >that they're making it simpler than is feasible for good design. Somebody gave me a wonderful quote recently. "Logic and mathematics give you a consistent model. Academicians have an unfortunate tendency to confuse consistency with truth."
As for you being worried if you could understand engineers or doctors - I'm sorry, but I don't see why I should assume that other people are better than me. If I can't understand (at least superficially) what they're doing, then I conclude they are either crap at explaining themselves or, worse, they don't understand themselves. And I'm sorry if I'm cynical, but I've had enough experience of various professions (and from my own research) to know that people are very good seeing what they want, and not seeing what they don't want. I don't trust "experts". Far too many of them wear blinkers :-(
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Dawn M. Wolthuis - 05 Apr 2004 23:54 GMT > >I'm not asking for proof, since such is very difficult and expensive to > >acquire, but my experience suggests the opposite. The more "natural" [quoted text clipped - 12 lines] > you a consistent model. Academicians have an unfortunate tendency to > confuse consistency with truth." Yes, so true! A mathematical model is a MODEL -- a metaphor. As someone once said, "all models are flawed, but some are useful". Each data model I have seen has some aspects which make it useful and some which are not as helpful. Looking at the benefits and deficits of various models makes sense, but if we are to anoint one as THE model (which our profession has somewhat done with the relational model), we should be very careful we don't go and apply the theory of relativity to very small objects (an analogy) -- we should use it where it is useful. From what I can see, the relational model is not close enough to a generalized TOE (theory of everything) for data that we should stop reviewing other options.
I was in the middle of another post that relates to language when I saw your quote here and it fits nicely with that so I could just jump into that topic here, but instead I'll just agree with the comment and note that it seems likely that we need not just mathematics and logic, but also language in our discussions of data. Cheers! --dawn
> As for you being worried if you could understand engineers or doctors - > I'm sorry, but I don't see why I should assume that other people are [quoted text clipped - 8 lines] > Cheers, > Wol Laconic2 - 06 Apr 2004 15:27 GMT > I was in the middle of another post that relates to language when I saw your > quote here and it fits nicely with that so I could just jump into that topic > here, but instead I'll just agree with the comment and note that it seems > likely that we need not just mathematics and logic, but also language in our > discussions of data. Cheers! Absolutely true!
Cybernetics is ultimately about communication and control in the entire system. If there are humans in the system, language is an essential component of communication.
Eric Kaun - 06 Apr 2004 16:21 GMT > Looking at the benefits and deficits of various models makes > sense, but if we are to anoint one as THE model (which our profession has [quoted text clipped - 3 lines] > model is not close enough to a generalized TOE (theory of everything) for > data that we should stop reviewing other options. Couldn't agree more - we should be. The value of O-O is still far from concluded. What's a little different about relational is that it's derived very directly from logic and set theory, and also is the implicit base in most other theories of data - you've said as much yourself regarding multivalues. However, the "extensions" can corrupt the base, which I believe is that case with most other theories, and furthermore they often add nothing.
> I was in the middle of another post that relates to language when I saw your > quote here and it fits nicely with that so I could just jump into that topic > here, but instead I'll just agree with the comment and note that it seems > likely that we need not just mathematics and logic, but also language in our > discussions of data. Cheers! True, but I've not seen anything formal, which a computer system needs to be. Chomsky, Wittgenstein and countless others have theories of language; have they been applied to data? So far, all I've heard is "we should consider language", but that's not much good unless we get more concrete. As I said, we can't ever stop thinking and evaluating... but we need something fairly concrete.
- erk
Laconic2 - 06 Apr 2004 17:43 GMT I'm going to bring up Terry Winograd, again.
I don't know what Terry Winograd has written in the last thirty years, but he certainly gave language, and more specifically semantics, a formal treatment.
At one point, he called a word a "reference to a process which, executed in context, will figure out the meaning". That's a very loose paraphrase on my part, and I apologize if I got it wrong. Human memory, you know.
That reminds me of the integration of the relational model and the OO model by storing the "shadow of an object" (the projection, really) as a data item in a database, and then using it as a "view".
But maybe I'm reading too much into either or both of the above.
Anthony W. Youngman - 07 Apr 2004 00:33 GMT >> Looking at the benefits and deficits of various models makes >> sense, but if we are to anoint one as THE model (which our profession has [quoted text clipped - 13 lines] >is that case with most other theories, and furthermore they often add >nothing. The problem I see with relational, is that it does not cope with - indeed, denies the existence of - emergent complexity. For example, there is no way for the DBMS to know (or record the fact) whether a table represents a "noun" or a "repeating adjective". Seeing as nouns and adjectives are fundamentally different in language, I would have thought the inability of a DBMS to distinguish between the data equivalents would be a serious handicap...
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Laconic2 - 07 Apr 2004 03:55 GMT I question whether nouns and adjectives are fundamentally different in language. I also question whether all natural languages have nouns and adjectives as components.
Eric Kaun - 07 Apr 2004 12:27 GMT > The problem I see with relational, is that it does not cope with - > indeed, denies the existence of - emergent complexity. I'll address the specific example below, but what does this mean in general?
> For example, > there is no way for the DBMS to know (or record the fact) whether a > table represents a "noun" or a "repeating adjective". What difference does it make?
> Seeing as nouns > and adjectives are fundamentally different in language, I would have > thought the inability of a DBMS to distinguish between the data > equivalents would be a serious handicap... The DBMS also can't distinguish past tense from present perfect tense. So what? What does the "data equivalent" mean?
Your argument seems to be that because the relational data model doesn't mirror language in its fundamental structures, that it's "fundamentally unable" to distinguish between several different things that need to be distinguished. Assuming that one could model the vagaries of human language (which might be doubtful), I don't see why relational would do a worse job than other data models. How does MV do so?
Tony - 07 Apr 2004 17:45 GMT > > The problem I see with relational, is that it does not cope with - > > indeed, denies the existence of - emergent complexity. > > I'll address the specific example below, but what does this mean in general? I have learnt from past experience that Wol likes to throw in irrelevant but important-sounding scientific concepts from time to time. Heisenberg's Uncertainty Priciple comes up quite often, along with Godel's Incompleteness Theorem and of course Relativity v. Newton! The relational model is deemed wanting because it fails to take these into account, though it is far from clear how MV or anything else does other, nor why they should.
It's best to try to ignore these digressions, but I usually fail to do so...
Dawn M. Wolthuis - 07 Apr 2004 18:21 GMT > > > The problem I see with relational, is that it does not cope with - > > > indeed, denies the existence of - emergent complexity. [quoted text clipped - 11 lines] > It's best to try to ignore these digressions, but I usually fail to do > so... I tend to agree, although I have learned that Wol has a point even if obscured by his seeming lack of interest in or respect for mathematical models. I agree with him wholeheartedly that a model is a model -- just that, no more. Pointing out both flaws and usefulness in particular models is helpful. Thinking all mathematical models are hogwash is not, however. I don't need to emperically determine that 1000 + 1000 = 2000 because I have a model, a mathematical theory, that starts with some axioms and builds on that and that convinces me, without any emperical data, that if I have 1000 objects and add in another 1000 objects, then I will have 2000.
So in spite of the shared appreciation that I have for the underdocumented PICK "model" (loose use of the term in this case) along with Wol, he and I have had this mathematics vs science discussion before in other forums. If we have a mathematical model that is useful AND we have some emperical data to back up the usefulness of the model, then that would be great. I think "language" belongs in there somewhere too, but not sure exactly where.
I don't know of any data approaches that have both a solid mathematical model (and no, it will not be both complete and consistent) and emperical data that gives evidence of the usefulness of the model.
But if anyone else does, I'm interested. --dawn
Eric Kaun - 07 Apr 2004 20:53 GMT > > "Eric Kaun" <ekaun@yahoo.com> wrote in message > news:<hARcc.51781$ca7.11809@newssvr16.news.prodigy.com>... [quoted text clipped - 37 lines] > > But if anyone else does, I'm interested. --dawn I have no doubt that Wol knows his stuff when it comes to physics, far better than I do. I do have severe doubts about the applicability of reasoning about physical models to computing, though we're getting into philosophy here.
I'm certainly willing to entertain language notions - I just haven't heard anything concrete enough to serve as the basis for a data model. Loose correspondence to English is, in my opinion, not a good metric.
- erk
Jan Hidders - 07 Apr 2004 21:43 GMT > I'm certainly willing to entertain language notions - I just haven't heard > anything concrete enough to serve as the basis for a data model. Loose > correspondence to English is, in my opinion, not a good metric. Did you ever look at the philosophy behind ORM (Object-Role Modelling)? Or the philosophy behind predicate logic, for that matter?
-- Jan Hidders
Eric Kaun - 07 Apr 2004 22:12 GMT > > I'm certainly willing to entertain language notions - I just haven't heard > > anything concrete enough to serve as the basis for a data model. Loose > > correspondence to English is, in my opinion, not a good metric. > > Did you ever look at the philosophy behind ORM (Object-Role Modelling)? > Or the philosophy behind predicate logic, for that matter? No, can't say I have in any specific way. Pointers/links?
Jan Hidders - 07 Apr 2004 22:59 GMT >>>I'm certainly willing to entertain language notions - I just haven't > [quoted text clipped - 7 lines] > > No, can't say I have in any specific way. Pointers/links? For ORM see http://www.orm.net and as an introcution perhaps http://www.orm.net/pdf/ORMwhitePaper.pdf
For predicate logic you can always read Frege's Begriffsschrift. :-) Sorry, only kidding. Can't think of a good reference right now.
The point is that all these *are* in fact based upon a correspondance to language. In some sense that was what the great discovery by Aristotle was: the fact that you can sometimes reason on the basis of only the *form* of statements. Hence Formal logic.
-- Jan Hidders
Eric Kaun - 08 Apr 2004 14:58 GMT > >>>I'm certainly willing to entertain language notions - I just haven't > > [quoted text clipped - 18 lines] > was: the fact that you can sometimes reason on the basis of only the > *form* of statements. Hence Formal logic. Certainly - since computers can't understand meaning in the way that we can (which we ourselves don't understand), it follows that form is about all there is, and thus the mechanizability of symbolic logic. I'm not downplaying the role of language, but languages that computers understand are far different than those we understand. So if we're going to use human language as a basis for computing, we have to specify which aspects of it, or subset of it, and furthermore impose rules that might not make much difference to our comprehension.
Jan Hidders - 08 Apr 2004 17:38 GMT >>The point is that all these *are* in fact based upon a correspondance to >>language. In some sense that was what the great discovery by Aristotle [quoted text clipped - 9 lines] > or subset of it, and furthermore impose rules that might not make much > difference to our comprehension. I couldn't agree more. I guess the point that I was trying to make is that this has already been done for a while and there is lot's of literature on it. The outcome is more or less what we know as the differenct brances of symbolic logic, the very thing the relational model is based upon.
-- Jan Hidders
Dawn M. Wolthuis - 08 Apr 2004 17:45 GMT > > >>>I'm certainly willing to entertain language notions - I just haven't > > > [quoted text clipped - 27 lines] > or subset of it, and furthermore impose rules that might not make much > difference to our comprehension. Obvously, the computer itself need not understand the language in order for us to understand language that is output from the computer. So, if we don't split up English sentences (for example) unnecessarily when we feed them to the computer as "data," then we can retrieve them in a form closer to the original.
So, if "the Pizza has Mozzarella and Parmesan cheese" is stored as:
Pizza Mozzarella Parmesan
in a single, uh, record, then if the ordering is unwittingly useful, we didn't harm that ordering.
--dawn
Eric Kaun - 08 Apr 2004 20:37 GMT > > > >>>I'm certainly willing to entertain language notions - I just haven't > > > > [quoted text clipped - 44 lines] > in a single, uh, record, then if the ordering is unwittingly useful, we > didn't harm that ordering. Of course. This assumes that your main reason for storing the data is to display it again, in its original form, for humans. If that's all you're doing, then many different simple systems will suffice - a Word document, a spreadsheet, etc. But if you're trying to reason about the data, then you need to structure it in a way amenable to automated deduction. You are also perfectly free to keep a copy of the original, which would then be dependent upon the key, the whole key, and nothing but the key per normalization rules.
Dawn M. Wolthuis - 08 Apr 2004 21:24 GMT > > > > >>>I'm certainly willing to entertain language notions - I just > haven't [quoted text clipped - 61 lines] > upon the key, the whole key, and nothing but the key per normalization > rules. I want my cake and eat it too! The PICK structure does what I have described and is "amenable to automated deduction" and it seems to me that there is some value in that, but I'm still poking and prodding to clarify what that might be. The original Pizza example and this example about retaining order (related to another thread) both show up some advantages of the PICK or XML approach vs. the relational approach. I realize they fall far short of proving anything, however. Cheers! --dawn
Eric Kaun - 12 Apr 2004 16:40 GMT > > Of course. This assumes that your main reason for storing the data is to > > display it again, in its original form, for humans. If that's all you're [quoted text clipped - 12 lines] > there is some value in that, but I'm still poking and prodding to clarify > what that might be. Of course it's amenable, just much less so. It's its lack of symmetry and consistency that poses a problem. By nesting values inside values (and then a further layer inside that, I believe), you complicate the algebras, closure, and optimizations. Relational is much simpler, hence its power.
On a related note, things like objects add a great deal of complexity in the pursuit of "intuitive" modeling techniques; specifically hierarchies, graphs, and a persistent (no pun intended) confusion between variables and values. That additional complexity always comes at a cost, and it's unfortunate that people are so uncomfortable with symbolic logic and related disciplines that they bite into much more complexity than they would otherwise have to, simply because the "operational" (procedural) approach appears intuitive.
- erk
Dawn M. Wolthuis - 12 Apr 2004 17:35 GMT > > > Of course. This assumes that your main reason for storing the data is to > > > display it again, in its original form, for humans. If that's all you're [quoted text clipped - 19 lines] > a further layer inside that, I believe), you complicate the algebras, > closure, and optimizations. Relational is much simpler, hence its power. Possibly conceptually simpler as a mathematical model, but in real life it does not seem to be simpler at all. The end-user sees it as symmetric once the vocabulary is in place for that to be the case.
LIST STUDENTS COURSES WITH COURSE_NAME LIKE "A..." might list the students and all of their courses with a name starting with A
LIST COURSES STUDENTS WITH STUDENT_NAME LIKE "A..." might list the courses and all of their students with a name starting with A
It is not symmetric when the vocabularly has not been put in place for it to be so, for example, if the student name (which would obviously not be the stored foreign key for the student and is not physically stored in the COURSES file) is not placed in the vocabularly of the COURSES file/graph/tree/function.
> On a related note, things like objects add a great deal of complexity in the > pursuit of "intuitive" modeling techniques; specifically hierarchies, [quoted text clipped - 4 lines] > otherwise have to, simply because the "operational" (procedural) approach > appears intuitive. It doesn't just "appear" intuitive -- it typically IS more ntuitive. --dawn
P.S. I've been mulling over Date's write-up on the issue of whether a class can "be" a relation. The argument against it seems to come down to ... "If you were to create a class that defines a relation type, then you would no longer be consistently following the relational model". I suspect I'll have some questions or opinions on that matter soon.
Jan Hidders - 13 Apr 2004 21:38 GMT > [...] By nesting values inside values (and then > a further layer inside that, I believe), you complicate the algebras, > closure, and optimizations. Relational is much simpler, hence its power. That's funny. I somehow remember Date and Darwen proposing a relational model where you could have tuples containing sets that contained tuples .. et cetera. The horror! The heresy!! In fact, the whole 1NF thing seemed to have gone up in smoke, somehow. It still gives me the creeps when I think about it, every now and then ... Surely, that must have been a bad dream, right?
-- Jan Hidders
Eric Kaun - 13 Apr 2004 22:31 GMT > > [...] By nesting values inside values (and then > > a further layer inside that, I believe), you complicate the algebras, > > closure, and optimizations. Relational is much simpler, hence its power. > > That's funny. I'm not laughing. :-)
> I somehow remember Date and Darwen proposing a relational > model where you could have tuples containing sets that contained tuples > .. et cetera. The horror! The heresy!! In fact, the whole 1NF thing > seemed to have gone up in smoke, somehow. It still gives me the creeps > when I think about it, every now and then ... Surely, that must have > been a bad dream, right? OK, I expressed myself badly. Logically, the relational user sees values in relations. The types of those values can be anything, including lists and such, but there shouldn't be operators in the data model to manipulate those - rather, those are user-defined operators for the specified types. Pick-like MV exposes the nesting as part of its logical model (at least I think, since it's not formally defined).
Is that better? Date's paper "What First Normal Form Really Means" talks about this in great detail...
- erk
Jan Hidders - 13 Apr 2004 23:34 GMT > OK, I expressed myself badly. Logically, the relational user sees values in > relations. The types of those values can be anything, including lists and > such, but there shouldn't be operators in the data model to manipulate > those - rather, those are user-defined operators for the specified types. So, one would expect that the NEST and UNNEST operators of the nested relational algebra would not be allowed, wouldn't one? I know I would. What else could "logically expose" mean for a relation-valued column? Or the nested relational calculus, which doesn't have such operators, but still allows you to operate on (and combine) the nested sets, would that be Ok?
> Is that better? Date's paper "What First Normal Form Really Means" talks > about this in great detail... Ah, well, let me say here and now that I'm not a big fan of Chris Date, to put it mildly, and the arrogance of dbdebunk makes me physically sick. I know the paper you refer to, and I find it rather unconvincing.
-- Jan Hidders
Dawn M. Wolthuis - 14 Apr 2004 01:33 GMT > > OK, I expressed myself badly. Logically, the relational user sees values in > > relations. The types of those values can be anything, including lists and [quoted text clipped - 14 lines] > to put it mildly, and the arrogance of dbdebunk makes me physically > sick. I know the paper you refer to, and I find it rather unconvincing. Ditto on all three points in that last paragraph with the caveat that I am impressed with Date (so a fan to that extent) even though I disagree with him on quite a bit. I know that it is either far too easy or far too difficult to disagree with me when I lack precision -- Date lays it out there so that in some ways it is much easier to find points of disagreement with him than with others and I appreciate that. --dawn
Eric Kaun - 14 Apr 2004 01:49 GMT > > OK, I expressed myself badly. Logically, the relational user sees values in > > relations. The types of those values can be anything, including lists and [quoted text clipped - 3 lines] > So, one would expect that the NEST and UNNEST operators of the nested > relational algebra would not be allowed, wouldn't one? Do you mean GROUP? As far as I know, those are merely shorthand, not something new. And there's certainly no
> I know I would. > What else could "logically expose" mean for a relation-valued column? Operators on an attribute/column of type T are exposed, same as operators on any other type are exposed for use on values of that type. But one could certainly argue that the "nesting" in values/subvalues in Pick are simply exposed operators of a 2/3 level type. Hmmm. Mayhap I've argued myself into a corner. Or maybe it's just late.
> Or the nested relational calculus, which doesn't have such operators, but > still allows you to operate on (and combine) the nested sets, would that > be Ok? Again, unless I'm thinking of the wrong operators, they're mere shorthand for combinations of other relational operators. Much like intervals, I think.
> > Is that better? Date's paper "What First Normal Form Really Means" talks > > about this in great detail... > > Ah, well, let me say here and now that I'm not a big fan of Chris Date, > to put it mildly, and the arrogance of dbdebunk makes me physically > sick. I can certainly see that, and I don't claim to be an expert - from what you've written, I'm fairly sure you're much more knowledgable than I on relational matters. But I find their site interesting, and useful as a bulwark against the wave of "novel" new data management techniques. While I'm not familiar with the "deep" research, I don't see much understanding of relational in common practice, and think it's certainly better than the ad hoc approaches being advocated.
> I know the paper you refer to, and I find it rather unconvincing. I actually thought that was the point - to show the relativism inherent in normalization, including 1NF. I found that paper to be non-absolutist.
> -- Jan Hidders Jan Hidders - 14 Apr 2004 22:32 GMT >>So, one would expect that the NEST and UNNEST operators of the nested >>relational algebra would not be allowed, wouldn't one? > > Do you mean GROUP? As far as I know, those are merely shorthand, not > something new. ?? Are you saying that the GROUP / UNGROUP operators, as Date calls them, can be expressed with the operators of the flat relational algebra?
>>I know I would. >>What else could "logically expose" mean for a relation-valued column? [quoted text clipped - 4 lines] > exposed operators of a 2/3 level type. Hmmm. Mayhap I've argued myself into > a corner. Or maybe it's just late. Maybe. Maybe it's just Date. :-)
>>Ah, well, let me say here and now that I'm not a big fan of Chris Date, >>to put it mildly, and the arrogance of dbdebunk makes me physically [quoted text clipped - 7 lines] > relational in common practice, and think it's certainly better than the ad > hoc approaches being advocated. I agree with all of the above, although I would add that the "is certainly better than" should be qualified. An RDBMS is not *always* under all circumstances certainly better. As much as I dislike dbdebunk and Date's tendency to speak to us ex-cathedra of all things relational, I consider myself very much in the "relational camp" and believe it has the best (scientific and non-scientific) arguments of them all. That's exactly why we don't need all this religious zealotry with the apparently necessary condescending attitude and oversimplifications.
-- Jan Hidders
Dan - 15 Apr 2004 04:14 GMT > >>So, one would expect that the NEST and UNNEST operators of the nested > >>relational algebra would not be allowed, wouldn't one? [quoted text clipped - 4 lines] > ?? Are you saying that the GROUP / UNGROUP operators, as Date calls > them, can be expressed with the operators of the flat relational algebra? I believe these types of operators wouldn't conform to the rules of constructing well-formed formulas as well, which of course impacts FOL-based queries. In predicate calculus, the construction of a wff using a form G(P(x),...) is not allowed. I haven't had a chance to think much in terms of what the implications would be exactly however. Perhaps you know.
- Dan
> >>I know I would. > >>What else could "logically expose" mean for a relation-valued column? [quoted text clipped - 29 lines] > > -- Jan Hidders Jan Hidders - 15 Apr 2004 20:30 GMT >>>>So, one would expect that the NEST and UNNEST operators of the nested >>>>relational algebra would not be allowed, wouldn't one? [quoted text clipped - 11 lines] > think much in terms of what the implications would be exactly however. > Perhaps you know. You go from first-order logic to higher-order logic, but in a very restricted way because all your sets are finite and typed (and therefore also finite in depth). Queries are still computable but can now be exponential, and deciding satisfiability was already gone in the flat relational model anyway. Note that you can always simulate this in the flat model by allowing abstract identifiers that denote the sets and a binary table that encodes the element-of relationship, plus a constraint that says that two set-identifiers are the same if they have exactly the same elements associated with them.
-- Jan Hidders
Eric Kaun - 15 Apr 2004 16:59 GMT > >>So, one would expect that the NEST and UNNEST operators of the nested > >>relational algebra would not be allowed, wouldn't one? [quoted text clipped - 4 lines] > ?? Are you saying that the GROUP / UNGROUP operators, as Date calls > them, can be expressed with the operators of the flat relational algebra? Ah, probably not... I once again forgot to bring the book in. I was probably thinking of his temporal/6NF stuff. On second thought, this doesn't seem likely, but I'll re-read his introduction of GROUP.
> >>I know I would. > >>What else could "logically expose" mean for a relation-valued column? [quoted text clipped - 6 lines] > > Maybe. Maybe it's just Date. :-) heh
> >>Ah, well, let me say here and now that I'm not a big fan of Chris Date, > >>to put it mildly, and the arrogance of dbdebunk makes me physically [quoted text clipped - 16 lines] > exactly why we don't need all this religious zealotry with the > apparently necessary condescending attitude and oversimplifications. A good point indeed.
- Eric
Anthony W. Youngman - 15 Apr 2004 22:11 GMT >> I want my cake and eat it too! The PICK structure does what I have >> described and is "amenable to automated deduction" and it seems to me that [quoted text clipped - 5 lines] >a further layer inside that, I believe), you complicate the algebras, >closure, and optimizations. Relational is much simpler, hence its power. Define "simple".
So if I run a query, that gives me a view of a complex object, why does it give me an indeterminate number of copies of a piece of data that is stored just once?
Or, I want to find all the attributes of a single, real-world object. With relational, that's a complex query. With Pick, it's "here's the (singular) primary key, get the data".
As I said, define "simple" :-)
As for saying "it complicates the algebra...", WHY? It makes it easier to do stupid things, but it doesn't affect the algebra and, indeed, despite making it easier to be stupid, in practice it seems to make it less common.
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Mikito Harakiri - 16 Apr 2004 01:58 GMT > So if I run a query, that gives me a view of a complex object, why does > it give me an indeterminate number of copies of a piece of data that is > stored just once? Consider query "return representation of 5 in unary base system". It should return
1 1 1 1 1
Question, where "1" is stored? Do you understand now the difference between logical model and naive physical representation in Pick?
Eric Kaun - 19 Apr 2004 14:42 GMT > >> I want my cake and eat it too! The PICK structure does what I have > >> described and is "amenable to automated deduction" and it seems to me that [quoted text clipped - 7 lines] > > Define "simple". Good question; in this case I meant simple as advocated by Occam's Razor. Fewer concepts, and more consistent notation.
> So if I run a query, that gives me a view of a complex object, why does > it give me an indeterminate number of copies of a piece of data that is > stored just once? I don't know what you mean here.
> Or, I want to find all the attributes of a single, real-world object. > With relational, that's a complex query. Define "complex." It usually depends on the "object" - oh, can you define that as well?
> With Pick, it's "here's the > (singular) primary key, get the data". That's certainly possible in SQL too, just as it's possible to be normalized in Pick. However, your statement is just hand-washing, since "the data" is what the entire disagreement is about. Is it a set of attribute values? Is it an object graph? Is it both, meaning we've decided in advance the view of "the object" everyone must have because it's "real world" (can you define that?).
> As I said, define "simple" :-) > > As for saying "it complicates the algebra...", WHY? It makes it easier > to do stupid things, but it doesn't affect the algebra and, indeed, > despite making it easier to be stupid, in practice it seems to make it > less common. You have separate notation for values, sub-values, sub-sub-values, ... sub^N-values, whatever N is (once I heard 3, once I heard 6).
- erk
Anthony W. Youngman - 19 Apr 2004 21:33 GMT >> >> I want my cake and eat it too! The PICK structure does what I have >> >> described and is "amenable to automated deduction" and it seems to me [quoted text clipped - 13 lines] >Good question; in this case I meant simple as advocated by Occam's Razor. >Fewer concepts, and more consistent notation. Occam's razor - as phrased by ?Einstein - "Things should be as simple as possible - but no simpler".
To me, relational has simplified too far - "The car is blue and green", "John is Mike's dad". If I model both these situations in relational, I need a relation to link the colours to the car. I need a relation to link John and Mike. In the first situation I'm linking an entity with its attributes. In the second I'm linking two entities. Relational makes no distinction between the two types of link ...
>> So if I run a query, that gives me a view of a complex object, why does >> it give me an indeterminate number of copies of a piece of data that is >> stored just once? > >I don't know what you mean here. I think I'm confusing relational and SQL here ...
>> Or, I want to find all the attributes of a single, real-world object. >> With relational, that's a complex query. > >Define "complex." It usually depends on the "object" - oh, can you define >that as well? Well, I'd define a real world object as something described by a noun - an invoice, a car, a person ...
Certainly with an invoice, or a person, the relational query could be quite complex with joins across quite a lot of tables.
>> With Pick, it's "here's the >> (singular) primary key, get the data". [quoted text clipped - 5 lines] >"the object" everyone must have because it's "real world" (can you define >that?). Typically, it's a set of attribute values. And yes, I see what you mean - given the primary key, a relational view will return (after a lot of work if it's scattered across many tables) all the attributes associated with an entity. With Pick, it is a single "row" in a single "table".
>> As I said, define "simple" :-) >> [quoted text clipped - 5 lines] >You have separate notation for values, sub-values, sub-sub-values, ... >sub^N-values, whatever N is (once I heard 3, once I heard 6). Well, yes, the fact that in relational N is only ever 2 makes life simple. But that's a "cosmological constant" which upsets purist physicists. If I can solve a problem for N where N is any number, it's a far better solution than if it only works for "N=2" :-) (and actually, both the values you've quoted for N I've met in practice :-)
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Eric Kaun - 20 Apr 2004 14:45 GMT > >Good question; in this case I meant simple as advocated by Occam's Razor. > >Fewer concepts, and more consistent notation. > > Occam's razor - as phrased by ?Einstein - "Things should be as simple as > possible - but no simpler". Agreed.
> To me, relational has simplified too far - "The car is blue and green", > "John is Mike's dad". If I model both these situations in relational, I > need a relation to link the colours to the car. I need a relation to > link John and Mike. In the first situation I'm linking an entity with > its attributes. In the second I'm linking two entities. Relational makes > no distinction between the two types of link ... Does it have to? Remember, we're modeling data, not "the real world" (unless you want to build a small model of a car and paint it blue and green, which is up to you). It's not a model in the same sense as you discuss elsewhere with physical models.
Unless, of course, you claim to have seen DATA in the "real world" with your own eyes. I don't mean Brent Spiner in Star Trek: The Next Generation.
And in any event, the case you cite is very simple (colors, which are in most models scalars, make the argument for 1NF less persuasive than in other cases). Other cases like children, line items, and any business data I can think of are another matter. The line between entity and relationship, while there, is fuzzy. Basing your statements of fact on normalized predicates is much less ambiguous, and this clarity of design guidance is another benefit of relational. There will always be an appeal to the business domain you're modeling, but given that, normalization rules are much better guides than anything else I've seen.
> >Define "complex." It usually depends on the "object" - oh, can you define > >that as well? > > Well, I'd define a real world object as something described by a noun - > an invoice, a car, a person ... Attributes are nouns.
I'm not just being argumentative. I think it's a credit to relational that it (tries to) base itself on predicates, as opposed to entities, relationships, and attributes, is stronger for many reasons covered in books already mentioned...
> > "the data" is > >what the entire disagreement is about. Is it a set of attribute values? Is [quoted text clipped - 6 lines] > work if it's scattered across many tables) all the attributes associated > with an entity. With Pick, it is a single "row" in a single "table". And the decision of what "it" you're talking about is likely to be premature and limited in usefulness. The up-front simplicity of modeling based on entities is undercut by its application focus.
> Well, yes, the fact that in relational N is only ever 2 makes life > simple. HA! So you admit it? :-)
> But that's a "cosmological constant" which upsets purist > physicists. If I can solve a problem for N where N is any number, it's a > far better solution than if it only works for "N=2" :-) (and actually, > both the values you've quoted for N I've met in practice :-) Uh... I suspect there's another model / meta-model confusion here, but the margin of this message is too narrow to contain the marvelous proof I've concocted.
- erk
Dawn M. Wolthuis - 20 Apr 2004 18:10 GMT <snip>
> Uh... I suspect there's another model / meta-model confusion here, but the > margin of this message is too narrow to contain the marvelous proof I've > concocted. So my instincts were right -- you were Fermat in a previous life, h? --dawn
Eric Kaun - 20 Apr 2004 20:55 GMT > <snip> > [quoted text clipped - 4 lines] > So my instincts were right -- you were Fermat in a previous life, > h? --dawn And now I just ferment.
Anthony W. Youngman - 22 Apr 2004 00:14 GMT >> >Good question; in this case I meant simple as advocated by Occam's Razor. >> >Fewer concepts, and more consistent notation. [quoted text clipped - 15 lines] >is up to you). It's not a model in the same sense as you discuss elsewhere >with physical models. But the data is representing the real world -if it doesn't, what's the point of it?
>Unless, of course, you claim to have seen DATA in the "real world" with your >own eyes. I don't mean Brent Spiner in Star Trek: The Next Generation. [quoted text clipped - 16 lines] > >Attributes are nouns. I thought they were adjectives. For example, "the car is blue" - "blue" is an adjective.
>I'm not just being argumentative. I think it's a credit to relational that >it (tries to) base itself on predicates, as opposed to entities, [quoted text clipped - 17 lines] >and limited in usefulness. The up-front simplicity of modeling based on >entities is undercut by its application focus. In practice, it actually seems to work extremely well...
>> Well, yes, the fact that in relational N is only ever 2 makes life >> simple. [quoted text clipped - 11 lines] > >- erk Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Eric Kaun - 23 Apr 2004 00:02 GMT > >> >Good question; in this case I meant simple as advocated by Occam's Razor. > >> >Fewer concepts, and more consistent notation. [quoted text clipped - 18 lines] > But the data is representing the real world -if it doesn't, what's the > point of it? Yes, the data represents things about the "real world" (
|
|