Three Kinds of Logical Trees
Marshall Spight - 15 Jul 2005 09:17 GMT I've been thinking about trees in the abstract lately, and trying to classify them. I am not talking about trees as a physical data structure, such as BTrees or Red-Black, but rather trees as logical data structures. In other words, the *interface* to a tree.
I've identified three distinct kinds.
"homogeneous tree" All nodes are the same type, tree has varying structure.
"dynamic heterogeneous" Nodes are varying types, tree has varying structure.
"static heterogeneous" Nodes are varying types, tree has fixed structure.
Examples
homogeneous: org chart. Every node is a person record, but the structure of the organization may be of whatever form.
dynamic heterogeneous: parse tree. There are specific kinds of nodes, but the structure of the tree is relatively unconstrained.
static heterogeneous: customer/account/invoice/line item. Each level in the tree is fixed, with fixed relationships.
Here "dynamic" and "static" refer to the structure of the tree.
SQL handles the third kind of tree extremely well. The first two kinds, not so much. In particular, note that the transitive closure problem applies to the first two kinds of trees, but not to the third.
The functional programming people have lots of examples in their books of dynamic heterogeneous trees, and the mixture of union types and pattern matching as language features seems to handle these structures quite well. You could probably use the same techniques to handle homogeneous trees equally well.
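As a rough illustration of the union-type approach, here is a minimal Python sketch of a tiny parse tree. The node types `Num`, `Neg` and `Add` and the function `evaluate` are hypothetical names invented for this example; Python has no native sum types, so `isinstance` dispatch stands in for the pattern matching a functional language would provide.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int          # leaf node: no children


@dataclass
class Neg:
    operand: "Expr"     # unary node: exactly one child


@dataclass
class Add:
    left: "Expr"        # binary node: exactly two children
    right: "Expr"


# The "union type": an expression is any one of the node kinds above.
Expr = Union[Num, Neg, Add]


def evaluate(node: Expr) -> int:
    """Dispatch on the node kind; each kind has a fixed local structure."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Neg):
        return -evaluate(node.operand)
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# The overall shape is unconstrained: nodes nest to any depth.
tree = Add(Num(2), Neg(Add(Num(3), Num(4))))
result = evaluate(tree)
```

Each node kind fixes its own number of children, while the tree as a whole can take any shape, which is the combination the "dynamic heterogeneous" label is meant to capture.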
Interestingly, I note that static tree nodes have a reference (fk) to their parent, while the homogeneous and dynamic hetero types are done with a node having references to its children.
Comments? Has anyone else written about this sort of thing?
Marshall
lennart@kommunicera.umea.se - 16 Jul 2005 09:24 GMT [...]
> Examples > > homogeneous: org chart. Every node is a person record, but the > structure of the organization may be of whatever form. A bit off topic, but IMO an organisation consists of different types of nodes, say for simplicity Company, Department and Employee. Boss is a role in the organisation, and a person who has this role is a leaf in the tree, just like every other person. Given your classification I would say the org chart belongs to "dynamic heterogeneous" instead of "homogeneous".
[...]
/Lennart
dawn - 16 Jul 2005 16:22 GMT > I've been thinking about trees in the abstract lately, and > trying to classify them. I am not talking about trees as [quoted text clipped - 16 lines] > homogeneous: org chart. Every node is a person record, but the > structure of the organization may be of whatever form. Or a simpler example of a tree with an integer on each node.
> dynamic heterogeneous: parse tree. There are specific kinds of > nodes, but the structure of the tree is relatively unconstrained. There are people here much more knowledgeable about trees than I (but that has not stopped me before, eh?) but even with the loose definition, I wouldn't say "relatively unconstrained" but perhaps "either unconstrained or constrained by a grammar." The grammar constraint might not be fully encoded into the tree, due to the complexity of the grammar.
> static heterogeneous: customer/account/invoice/line item. Each > level in the tree is fixed, with fixed relationships. > > Here "dynamic" and "static" refer to the structure of the tree. > > SQL handles the third kind of tree extremely well. I'm not so sure. If your tree extends from metadata down to data values, you might have to add in that less-than-satisfying term "scalar value" for your leaf nodes. Otherwise, if you have such a structure and you have a value that is either multipart or multivalued within this static structure, you still don't have SQL-92 support and most SQL-DBMS's don't make it easy to have such structures, even if they are supported in some way.
Interesting topic, but I'm not quite sure about this partitioning as yet. Are you trying to carve out how the relational model fits within a tree model by giving a term to such trees? If so and if I understand your terms, then there are at least NF2 (non-first normal form) models that would fit your "static" heterogeneous term. Cheers! --dawn
Marshall Spight - 21 Jul 2005 16:01 GMT > > I've been thinking about trees in the abstract lately, and > > trying to classify them. I am not talking about trees as [quoted text clipped - 18 lines] > > Or a simpler example of a tree with an integer on each node. Yes. But I'm not sure how realistic an example that is. Note that I'm trying to classify *logical* trees, so the structure of the tree needs to have some meaning apart from just being a data structure. Usually when we talk about a tree of ints, we've got some data structure going because we want log n lookup. That I would call a set, even if the underlying physical structure is a tree.
> > dynamic heterogeneous: parse tree. There are specific kinds of > > nodes, but the structure of the tree is relatively unconstrained. [quoted text clipped - 5 lines] > might not be fully encoded into the tree, due to the complexity of the > grammar. Yes, exactly. In other words, one has, say, 20 different kinds of nodes; each node type may have a fixed or variable number of children; each specific child may be constrained to be of a particular type: a parse tree. I'm looking for a term that captures the fact that a node has a specific structure. Each node has fixed *local* structure, but the tree's *overall* structure is not fixed. (Hence "dynamic heterogeneous.")
> > static heterogeneous: customer/account/invoice/line item. Each > > level in the tree is fixed, with fixed relationships. [quoted text clipped - 6 lines] > values, you might have to add in that less-than-satisfying term "scalar > value" for your leaf nodes. (I am unclear why you use the term "metadata" here. To me that means the system catalog; one doesn't usually join from the catalog to user-defined tables, yes?)
I agree "scalar" is an unsatisfying term. I have in mind a better one, but it requires a type system that's a lot different from SQL's. (Which I assert is an "opportunity." :-)
> Otherwise, if you have such a structure > and you have a value that is either multipart or multivalued within > this static structure, you still don't have SQL-92 support and most > SQL-DBMS's don't make it easy to have such structures, even if they are > supported in some way. I'm not saying that this is necessarily the best way to go, but you can certainly handle this case in a 1NF way.
I think this kind of structure is what one often ends up with in SQL. And I do think it works really well. Consider the customer/account/ invoice/invoice line item sort of case. The structure is rigid, but any of the queries you want to ask against this sort of structure are quite easy.
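To make the fixed-depth case concrete, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names and sample rows are all made up for illustration; the point is only that each level of the hierarchy gets its own table with a foreign key to its parent, so a question spanning the whole hierarchy is a fixed chain of joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE account  (account_id INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id));
CREATE TABLE invoice  (invoice_id INTEGER PRIMARY KEY,
                       account_id INTEGER REFERENCES account(account_id));
CREATE TABLE line_item(line_id INTEGER PRIMARY KEY,
                       invoice_id INTEGER REFERENCES invoice(invoice_id),
                       amount REAL);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO account  VALUES (10, 1), (20, 2);
INSERT INTO invoice  VALUES (100, 10), (101, 10), (200, 20);
INSERT INTO line_item VALUES (1, 100, 5.0), (2, 100, 7.5),
                             (3, 101, 2.5), (4, 200, 40.0);
""")

# The depth of the hierarchy is known in advance, so the number of joins
# is fixed when the query is written; no recursion is needed.
totals = conn.execute("""
    SELECT c.name, SUM(li.amount)
    FROM customer c
    JOIN account a    ON a.customer_id = c.customer_id
    JOIN invoice i    ON i.account_id = a.account_id
    JOIN line_item li ON li.invoice_id = i.invoice_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
```

Here `totals` holds one row per customer with the sum of all line-item amounts beneath it.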
> Interesting topic, but I'm not quite sure about this partitioning as yet. > Are you trying to carve out how the relational model fits within a > tree model by giving a term to such trees? Yes!
SQL handles the static heterogeneous case with ease, but really chokes on the other two cases. The big difference seems to be the fixed depth vs. variable depth issue. It's impossible to handle variable depth without something more powerful than the basic RM; you need something at least as powerful as transitive closure; recursive queries or (ugh) iteration will also work.
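The variable-depth problem can be sketched with an org chart stored as a single self-referencing table. The example below uses Python's `sqlite3` module and SQLite's `WITH RECURSIVE` as one instance of the "recursive queries" option; the `person` table, its sample rows and the helper `reports_under` are assumptions made for illustration, not something taken from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT,
                     boss_id INTEGER REFERENCES person(person_id));
INSERT INTO person VALUES
    (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1), (4, 'Dee', 2), (5, 'Eve', 4);
""")


def reports_under(conn, boss_id):
    """Return (name, depth) for everyone below boss_id, at any depth."""
    return conn.execute("""
        WITH RECURSIVE reports(person_id, name, depth) AS (
            -- base case: direct reports
            SELECT person_id, name, 1 FROM person WHERE boss_id = ?
            UNION ALL
            -- recursive step: reports of people already found
            SELECT p.person_id, p.name, r.depth + 1
            FROM person p JOIN reports r ON p.boss_id = r.person_id
        )
        SELECT name, depth FROM reports ORDER BY depth, name
    """, (boss_id,)).fetchall()


under_bob = reports_under(conn, 2)
under_ann = reports_under(conn, 1)
```

A plain chain of self-joins would need one join per level, and the number of levels is not known when the query is written; the recursive step is what removes that limit.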
So the area I'm exploring is, what kinds of operations do we want to do on the dynamic tree types, and what smallest bit of power do we have to add to the RM to handle that well?
> If so and if I understand > your terms, then there are at least NF2 (non-first normal form) models > that would fit your "static" heterogeneous term. The issue for me is not the model so much, as it is what operators do we need to work on them. Modeling structure is relatively easy; the hard part is querying, updating, and constraining.
Marshall
dawn - 22 Jul 2005 15:41 GMT > > > I've been thinking about trees in the abstract lately, and > > > trying to classify them. I am not talking about trees as [quoted text clipped - 26 lines] > That I would call a set, even if the underlying physical structure > is a tree. hmmm. I'll process, but I figured that putting a person record on each node when trying to classify logical trees was muddying the example. I don't think I'm fully tapped into what you are thinking yet, but this gives me more clues.
> > > dynamic heterogeneous: parse tree. There are specific kinds of > > > nodes, but the structure of the tree is relatively unconstrained. [quoted text clipped - 13 lines] > but the tree's *overall* structure is not fixed. (Hence "dynamic > heterogeneous.") OK, so the tree structure is not the only variable here -- you are also looking at the structure within a node. Got it.
> > > static heterogeneous: customer/account/invoice/line item. Each > > > level in the tree is fixed, with fixed relationships. [quoted text clipped - 8 lines] > > (I am unclear why you use the term "metadata" here. If you look at a DOM tree for an XML document (or just zero in on an XHTML document if easier), you see that you move from tag to tag in a path until you get to a value that has no children. So, you step down the tree from <html> to <body> to <div> to <p> to a value that is the text in the paragraph. The leaf nodes have values and others have metadata.
> To me that means > the system catalog; one doesn't usually join from the catalog to > user-defined tables, yes?) With the topic of trees, I flipped out of "relational". So, I would say that it is the case when I work with a tree or graph data structure (which for me are typically old data structures rather than new) I might step from metadata, such as the name space (by whatever name -- this might be a data source or schema), down the tree to a file (for example), to a record based on a key value, to a field, to a data value. If that data value is the value of a key in another file, then I might step to a record in another file, then to a field, and then finally to the target data value. Everything above the leaf value is really just data about the data I'm after -- it is metadata, although only the names might be considered such -- the name of the name space, of the two files and of the two fields.
But the point is that in my mind the data trees do have metadata and paths through the tree lead to data values.
> I agree "scalar" is an unsatisfying term. I have in mind a better one, > but it requires a type system that's a lot different from SQL's. > (Which I assert is an "opportunity." :-) I'm planning to start, uh, blogging, this Fall and I decided to test out various tools this past spring, so I have a first entry (which likely looks like I abandoned the cause) at my not-exactly-perfect web site (they say that women apologize more than men) about the types I have at the top of my types tree (under the "type" type) in my ideal LOGICAL system.
http://tincatgroup.com/mewsings
I'm guessing yours are different?
> > Otherwise, if you have such a structure > > and you have a value that is either multipart or multivalued within [quoted text clipped - 4 lines] > I'm not saying that this is necessarily the best way to go, but you > can certainly handle this case in a 1NF way. Sure, but that doesn't mean that if you have a tree that matches your definition then it IS in a SQL-DBMS structure. If so, then, yes, obviously SQL handles that type of tree well, but if you permit other instances of such a type then, no, SQL doesn't handle all trees of this type well, so this partitioning of types of trees did not isolate information about SQL and trees. Did that make sense?
> I think this kind of structure is what one often ends up with in SQL. > And I do think it works really well. Consider the customer/account/ > invoice/invoice line item sort of case. The structure is rigid, but > any of the queries you want to ask against this sort of structure > are quite easy. If you are saying that you can put data structures that could be implemented in a SQL-DBMS into such a structure, OK, but if you are saying that SQL handles such tree structures well, then NO -- it only handles the SQL-like flavors well.
> > Interesting topic, but I'm not quite sure about this partitioning as yet. > > Are you trying to carve out how the relational model fits within a > > tree model by giving a term to such trees? > > Yes! You have possibly made it to a category that is a superset. It might be necessary to use such a structure (I'm not quite fully tapped into it, so I'll hedge on that), but it is not sufficient unless you add a translator to take a tree with multipart values or multivalue and explode it (that really is formally the term we use in my neck of the woods) into a tree that SQL does like.
> SQL handles the static heterogeneous case with ease, am I correct that this is only tree for a subset of this type of tree by your definitions?
> but really > chokes on the other two cases. The big difference seems to be > the fixed depth vs. variable depth issue. Yes, that is an issue, but only one.
> It's impossible to > handle variable depth without something more powerful than the > basic RM; you need something at least as powerful as transitive > closure; recursive queries or (ugh) iteration will also work. yes & yes and iteration is precisely how I drive from my house to my parents' house, among other things. How do you think relational operators are implemented? ;-)
> So the area I'm exploring is, what kinds of operations do we want > to do on the dynamic tree types, and what smallest bit of power > do we have to add to the RM to handle that well? That is precisely the wrong question ;-) if you are talking about the logical data model IMO. If a human is interacting with the data model, then that person can apply either the metaphor of a graph (tree or otherwise) or of sets and benefit from the use of both metaphors, rather than being restricted by one. If we can model the idea of "travel" by showing a picture of a person on a bicycle or by showing an airplane, then how can we extend the picture of the bicycle minimally so that it gets at the idea of being able to go over water? That question is as relevant in my mind as the one you asked (OK, that is almost the case -- I used a metaphor, so it is necessarily flawed and also limited).
> > If so and if I understand > > your terms, then there are at least NF2 (non-first normal form) models > > that would fit your "static" heterogeneous term. > > The issue for me is not the model so much, as it is what operators > do we need to work on them. Give freedom to the functions -- they can be relational operators or functions to move down a path or whatever is useful.
> Modeling structure is relatively easy; > the hard part is querying, updating, and constraining. Yup. I check in on XQuery on occasion just for the entertainment. smiles. --dawn
> Marshall
Marshall Spight - 24 Jul 2005 22:06 GMT > hmmm. I'll process, but I figured that putting a person record on each > node when trying to classify logical trees was muddying the example. I > don't think I'm fully tapped into what you are thinking yet, but this > gives me more clues. I just picked org chart because it shows an example of a complex structure in which the structure of each node is the same.
> > > > dynamic heterogeneous: parse tree. There are specific kinds of > > > > nodes, but the structure of the tree is relatively unconstrained. [quoted text clipped - 16 lines] > OK, so the tree structure is not the only variable here -- you are also > looking at the structure within a node. Got it. Well, yes, but only insofar as the node structure influences the overall structure. In other words, a node for + would have two children, a node for ! would have only one.
> > (I am unclear why you use the term "metadata" here. > [quoted text clipped - 4 lines] > text in the paragraph. The leaf nodes have values and others have > metadata. That strikes me as a nonstandard definition of the use of metadata, but no matter.
> > I agree "scalar" is an unsatisfying term. I have in mind a better one, > > but it requires a type system that's a lot different from SQL's. [quoted text clipped - 8 lines] > > http://tincatgroup.com/mewsings The pun in the URL is awful. I like it!
> I'm guessing yours are different? Well, that column seemed to be mostly about document management and not data management. I'm really starting to see them as two entirely different disciplines, and I believe that most of what's going on in XML is about document management and not data management.
I admit that text documents are an important data type, but it's *just one* datatype among millions. Limiting ourselves to looking at only a single datatype doesn't put us in a good position to think about datatypes in general.
> > > Otherwise, if you have such a structure > > > and you have a value that is either multipart or multivalued within [quoted text clipped - 11 lines] > type well, so this partitioning of types of trees did not isolate > information about SQL and trees. Did that make sense? Sorry, but it didn't. At least not enough to tell you whether I agree or not.
I'll try rephrasing: If you have data that conceptually matches what I'm calling "static heterogeneous", (specifically, the data hierarchy is fixed, as in Customer/Account/Invoice/InvoiceLineItem) then you will be able to model, query, update, and constrain this data very well in SQL. In contrast, if you have an org chart, in which the structure is not fixed, then you will have a harder time, particularly with querying and constraining, and maybe also updating.
> If you are saying that you can put data structures that could be > implemented in a SQL-DBMS into such a structure, OK, but if you are > saying that SQL handles such tree structures well, then NO -- it only > handles the SQL-like flavors well. I don't really get it. *Any* data structure can be implemented in SQL. I'm trying to get at the question of what specific ones are a good match for SQL, and I'm saying static heterogeneous is, and the others aren't.
> > > Interesting topic, but I'm not quite sure about this partitioning as yet. > > > Are you trying to carve out how the relational model fits within a [quoted text clipped - 13 lines] > am I correct that this is only tree for a subset of this type of tree > by your definitions? I'm going to assume you mean "true" when you first said "tree", because otherwise I can't make the sentence parse. If so, then *no*, I'm saying it's true for all tree structures that match my definition of static heterogeneous. (And, explicitly, *not* true for the other two kinds. They're hard to work with in SQL.)
> > but really > > chokes on the other two cases. The big difference seems to be > > the fixed depth vs. variable depth issue. > > Yes, that is an issue, but only one. Okay. What are the others, as you see it?
> > It's impossible to > > handle variable depth without something more powerful than the [quoted text clipped - 3 lines] > yes & yes and iteration is precisely how I drive from my house to my > parents' house, among other things. I do not agree. It is just as accurate to say that in going from point A to B to C to D, you invoke a pure function to transform your position A to B, which then invokes a function to transform your position from B to C, which then invokes a function to transform your position from C to D. If you want to prove to me that the universe is inherently iterative, you're going to have to point out to me a loop counter somewhere in the real world. Under a rock, say.
The fact that so many programmers favor iteration over recursion is something I consider odd, given that iteration is strictly less powerful than recursion. It is possible to transform any iterative algorithm into a recursive one; the reverse is not true.
> How do you think relational operators are implemented? ;-) Alas, this is irrelevant. We build software in layers, and each layer is implemented in whatever paradigm it chooses, but this does not constrain the choice of paradigm of higher layers.
> > So the area I'm exploring is, what kinds of operations do we want > > to do on the dynamic tree types, and what smallest bit of power [quoted text clipped - 5 lines] > otherwise) or of sets and benefit from the use of both metaphors, > rather than being restricted by one. I would be interested to hear what you consider to be the right question.
You regularly mention graphs. Do you have some demonstration of why you consider them a good choice? Maybe a description of some minimal set of operators and what you can do with them? I am quite pleased with the fact that the RM has its minimal set of operations and its correspondence with first order logic; can you describe anything comparable?
> If we can model the idea of > "travel" by showing a picture of a person on a bicycle or by showing an [quoted text clipped - 3 lines] > almost the case -- I used a metaphor, so it is necessarily flawed and > also limited). Alas, I did not follow this metaphor at all. :-( Could you try to rephrase your point, perhaps without using a metaphor?
> > > If so and if I understand > > > your terms, then there are at least NF2 (non-first normal form) models [quoted text clipped - 11 lines] > Yup. I check in on XQuery on occasion just for the entertainment. > smiles. --dawn I hear XMLSchema is a laugh riot.
Marshall
dawn - 25 Jul 2005 00:01 GMT > > > (I am unclear why you use the term "metadata" here. > > [quoted text clipped - 7 lines] > That strikes me as a nonstandard definition of the use of metadata, > but no matter. I used html tags, but these could have been metadata from any structure, so a path could be <Person> then <lastName> then a value. I haven't done a lot of work with XML compared to others, but unless I am misunderstanding something, this would be a common way of shaping metadata & data from an RDBMS into an xml dom tree.
> > > I agree "scalar" is an unsatisfying term. I have in mind a better one, > > > but it requires a type system that's a lot different from SQL's. [quoted text clipped - 10 lines] > > The pun in the URL is awful. I like it! Yes it is and thanks.
> > I'm guessing yours are different? > > Well, that column seemed to be mostly about document management > and not data management. but if you knew me, you would know that I don't talk about document management, I talk about data management. But at the logical level, we only care about a value such as 4 if we have some context, semantics. We model propositions and retrieve the same. I used to say that I just need "String" and "Binary" as my data types, with other types inheriting from those. When I type in a 13 as a number it is a string of two integers, the same as if I type it in as a string. The computer can do what it wants to store it in a smaller number of 1's & 0's, I just want to enter strings and get them back. I used the term "Document" because I am not interfacing with the devices via voice, so it seems like I want to enter data from a document and get documents back. I would be OK with the term "Word" or "Sentence Fragment" or even "String" or even "Page" as this type, but I thought Document fit well with how a person thinks. Am I working with a computer related to music files, video files, picture files, document files?
> I'm really starting to see them as two > entirely different disciplines, just when they are coming so much closer together. Maybe it is easier for you to see a comment attribute as a component in a document than another tag and related value, but at least you can think of data processed reports as documents, right? Can you then make the leap to see the input forms as documents too? And what is the interface -- the input to and the output from, right?
> and I believe that most of what's > going on in XML is about document management and not data > management. I tell people who either a) fear xml or b) worship xml that it is simply a minor advancement over comma-quote format. If you buy that, then you should be able to see that xml is all about data. So I suspect your concerns are with the "management" word. I will grant that managing data using xml and related documents such as dtd or xsd is not the same as managing data with an RDBMS. However, from a model perspective, managing data as trees (and using xlink as other graph structures) is an alternative to managing data as relations. Then some of the same data management features as a typical DBMS can be added to this model.
> I admit that text documents are an important data type, but > it's *just one* datatype among millions. It is the type of data that I am entering when I stream in numbers from a temperature device or type in my name and address again. If you want to think of it as String instead of Document, feel free. The average person might not feel as in touch with Strings as with Documents, however.
> Limiting ourselves > to looking at only a single datatype doesn't put us in a good > position to think about datatypes in general. It is not a limit, but the next level down from "Type" or "Object" as the type from which all types flow. So, Document is-a Type and Mime is-a Type. Date is a Document, HTML is-a Document, Integer is-a Document (replace Document with String and it will sound better to you). It is only when you look at the interface between the software and the machine that you would want to think of a number as not being a type of string. If you look at the input to and the output from the database, you will see it is all strings of data. Mime types are also strings, but they differ in that they do not build up to documents, but to songs or videos or object code or ...
So, all non-binary data that I use when working with a computer are in terms of strings and I could call all such forms for getting these strings in and out of the computer "documents".
> > > > Otherwise, if you have such a structure > > > > and you have a value that is either multipart or multivalued within [quoted text clipped - 14 lines] > Sorry, but it didn't. At least not enough to tell you whether I agree > or not. I disagree with your statement that SQL handles a particular type of tree well because there are trees that match your def of that type that SQL does not handle well. They could be reshaped into trees that SQL does handle, but I did not think that was your point.
> I'll try rephrasing: If you have data that conceptually matches > what I'm calling "static heterogeneous", (specifically, the data > hierarchy is fixed, as in Customer/Account/Invoice/InvoiceLineItem) > then you will be able to model, query, update, and constrain this > data very well in SQL. I suspect that I disagree, but I might not be understanding what your tree looks like.
For example, if InvoiceLineItem has an attribute "discount" which has a value that is a list, such as
10.0 20.0
would your tree look the same as if this attribute were constrained to be a single value? If the two trees would look the same, then SQL can handle one and not the other (without first changing it to a different tree). So, I think you have to get into restricting data types before you can say that SQL will handle a tree in this shape.
I will grant that I might not be seeing the same tree that you are seeing. Mine would have a path like this:
namespace:datasource/schema identifier(s) such as a URI or host:port:datasourcename
  Customer customerID=12345
    Account accountNbr=12345-01
      Invoice invoiceNbr=3828193
        InvoiceLineItem lineNbr=3
          InvoiceLineItem discount=10.0
          and a second child node of the InvoiceLineItem lineNbr=3 would be discount=20.0
a) Does this meet your criteria for what type of tree it is? b) Does your tree look similar? c) Do you agree that SQL doesn't query this tree very well as it gets hung up on having two values for the discount attribute of the InvoiceLineItem at lineNbr=3?
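One way to picture the multivalued case and the "explode" step mentioned earlier is the Python sketch below. The dictionary layout and the helper `explode` are hypothetical, chosen only to mirror the lineNbr=3 example; it shows the reshaping into one flat row per value, without settling whether that counts as SQL handling the original tree.

```python
# Hypothetical nested record: one line item whose discount is a list.
line_item = {"lineNbr": 3, "discount": [10.0, 20.0]}


def explode(record, multivalued_field):
    """Return one flat row per value of the multivalued field."""
    # Copy every single-valued field unchanged...
    rest = {k: v for k, v in record.items() if k != multivalued_field}
    # ...and pair that copy with each value from the list in turn.
    return [dict(rest, **{multivalued_field: value})
            for value in record[multivalued_field]]


rows = explode(line_item, "discount")
```

After the call, `rows` contains two single-valued records for line 3, which is the shape a table with one discount per row would hold.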
> In contrast, if you have an org chart, > in which the structure is not fixed, then you will have a harder [quoted text clipped - 10 lines] > match for SQL, and I'm saying static heterogeneous is, and the others > aren't. It might simply be that I'm not catching on to your terms here and if so, I apologize. If I am understanding, then I can agree that you can say that the trees that SQL handles well are static heterogeneous, but you cannot say that SQL handles all such trees well.
> > > > Interesting topic, but I'm not quite sure about this partitioning as yet. > > > > Are you trying to carve out how the relational model fits within a [quoted text clipped - 16 lines] > I'm going to assume you mean "true" when you first said "tree", because > otherwise I can't make the sentence parse. tree (I'm using it again as a synonym for "true" -- do you have a problem with that? I mean, oops, sorry).
> If so, then *no*, I'm > saying it's true for all tree structures that match my definition > of static heterogeneous. (And, explicitly, *not* true for the other > two kinds. They're hard to work with in SQL.) Do you understand the tree that I am handing you that I think is a counter-example? Is it one?
> > > but really > > > chokes on the other two cases. The big difference seems to be [quoted text clipped - 3 lines] > > Okay. What are the others, as you see it? My counter-example of a complex data type (a list in this case), would be one.
> > > It's impossible to > > > handle variable depth without something more powerful than the [quoted text clipped - 12 lines] > out to me a loop counter somewhere in the real world. Under a rock, > say. I have no reason to push that one further. I'd prefer a function that beams me up to one that requires me to get 1 tank of gas and then another (2) tank of gas and so on until I reach the destination.
> The fact that so many programmers favor iteration over recursion > is something I consider odd, given that iteration is strictly > less powerful than recursion. Yes, but if you can iterate instead of using recursion, then you don't set yourself up for a stack overflow, for example. With recursion, you push the stack with each iteration and don't free up that memory until you are done with the entire process. If an iterative algorithm is possible, then I would only use recursion if that algorithm gives improved performance or increased maintainability (at least that is all I can think of now).
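The stack concern can be sketched in Python, assuming a tree built from nested dictionaries (a made-up representation, as are the helper names). The recursive version uses one call-stack frame per level, while the loop keeps its pending nodes in an ordinary list on the heap.

```python
import sys


def make_chain(n):
    """Build a degenerate tree: each node has exactly one child, n levels deep."""
    node = None
    for _ in range(n):
        node = {"children": [] if node is None else [node]}
    return node


def depth_recursive(node):
    # One nested call per level; very deep trees exhaust the call stack.
    return 1 + max((depth_recursive(c) for c in node["children"]), default=0)


def depth_iterative(node):
    # Same traversal with an explicit stack of (node, depth) pairs.
    deepest, stack = 0, [(node, 1)]
    while stack:
        current, depth = stack.pop()
        deepest = max(deepest, depth)
        stack.extend((c, depth + 1) for c in current["children"])
    return deepest


small = make_chain(50)
# Deeper than the interpreter's recursion limit, whatever it is set to.
big = make_chain(sys.getrecursionlimit() * 2)
```

On `small` both functions agree. On `big`, `depth_iterative` still returns the depth, whereas `depth_recursive` raises `RecursionError` in CPython because the nesting exceeds the interpreter's limit.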
> It is possible to transform any > iterative algorithm into a recursive one; the reverse is not true. Yes indeed.
> > How do you think relational operators are implemented? ;-) > > Alas, this is irrelevant. We build software in layers, and each > layer is implemented in whatever paradigm it chooses, but this > does not constrain the choice of paradigm of higher layers. OK. So, some functions can be defined using iteration, but you want to be done with the iteration in the underlying layers and not in any software you or I would write. As long as it remains an option, should I choose to use it instead of playing any games to get around using it, then I'm OK with that approach.
> > > So the area I'm exploring is, what kinds of operations do we want > > > to do on the dynamic tree types, and what smallest bit of power [quoted text clipped - 8 lines] > I would be interested to hear what you consider to be the right > question. How about this one -- How can we have an API to data that permits us to view it in the variety of ways that would be helpful? The answer might include operators for joining relations as well as for navigating from one row of data in one relation to another relation, using a foreign key.
> You regularly mention graphs. Do you have some demonstration of > why you consider them a good choice? I would love to have some empirical data, but I don't. I do have anecdotal data and first-hand experience of software development teams being much more productive and/or requiring many fewer developers when using a DBMS that employs a tree/graph model along with a set approach, instead of a SQL-DBMS. I have no data that proves that this is the case in general, nor that shows that the data model is key to the productivity gains. I have asked before in this forum what kind of an experiment could be set up that would demonstrate that. I cannot think of anything I could set up as an experiment for a small cost where the outcome could sway anyone to anything related to the choice of a data model.
I have only my intuition for that, which runs at about .72 (and no, I have no intuition for whether to play red or black in Vegas, so don't think this implies that I'm right 72% of the time, just 72% of the time that I am running with an intuition).
> Maybe a description of some > minimal set of operators and what you can do with them? I am quite > pleased with the fact that the RM has its minimal set of operations > and its correspondence with first order logic; can you describe > anything comparable? There are some papers along this line that Jan has pointed me to before (so they are on my stolen computer). Perhaps googling for functional-data-model or xml-data-model (I know, I know) would yield results. I'll check at some point, but let me know if you find something.
> > If we can model the idea of > > "travel" by showing a picture of a person on a bicycle or by showing an [quoted text clipped - 6 lines] > Alas, I did not follow this metaphor at all. :-( Could you try to > rephrase your point, perhaps without using a metaphor? My point is that each metaphor is some partial version of the whole. I just listened to a CD of a theologian talking about how Jesus used parables and he said something like "a parable is a metaphor and a metaphor is a lie". If you know the poem about the blind men and the elephant, that makes a similar point to mine. We gain something by working with propositions viewed as relations AND as "webs" (do you prefer that to "graphs"?) rather than restricting ourselves to one of these.
> > > > If so and if I understand > > > > your terms, then there are at least NF2 (non-first normal form) models [quoted text clipped - 13 lines] > > I hear XMLSchema is a laugh riot. I'll check it out. smiles. --dawn
Marshall Spight - 25 Jul 2005 04:06 GMT > > That strikes me as a nonstandard definition of the use of metadata, > > but no matter. [quoted text clipped - 4 lines] > misunderstanding something, this would be a common way of shaping > metadata & data from an RDBMS into an xml dom tree. Again, this does not appear to be standard terminology to me. As I am used to understanding the term, "metadata" means higher-order data, not simply higher level data. It would not properly be used to describe data in an enclosing scope. In an xhtml example, for some text inside <b> tags, metadata would not be anything about the enclosing <p>, but rather the metadata would be the xhtml dtd.
Metadata is schema information, or type information.
> > > I'm guessing yours are different? > > [quoted text clipped - 3 lines] > but if you knew me, you would know that I don't talk about document > management, I talk about data management. I'm not sure I agree.
> But at the logical level, we > only care about a value such as 4 if we have some context, semantics. > We model propositions and retrieve the same. I used to say that I just > need "String" and "Binary" as my data types, with other types > inheriting from those. When I type in a 13 as a number it is a string > of two integers, the same as if I type it in as a string. Okay, you're headed down a dead end road here. Watch out!
When you type, you are typing characters. There is no way to type in 13 as a number, not even with ^M. You can only type characters. When you type '1' and then '3', you have typed two characters. You don't have a number yet.
> The computer > can do what it wants to store it in a smaller number of 1's & 0's, I > just want to enter strings and get them back. I would be surprised if all you really care about is strings. If all you have is strings, you can't add, for example. The fact that some programming languages will implicitly convert a string to a number in some contexts and add the resulting numbers is simply a distraction; it does not mean that strings and numbers are the same thing.
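To make the distinction concrete, here is a minimal Python sketch (Python is my choice for illustration, not anything used in the thread; the variable names are made up). Typing '1' and then '3' gives a two-character string, and "adding" two such strings concatenates them. Arithmetic only becomes available after an explicit parse step.

```python
# Illustrative sketch only: names and values are invented for this example.
typed = "1" + "3"            # two keystrokes yield a two-character string
assert typed == "13"

# "+" on strings is concatenation, not arithmetic.
as_strings = typed + typed   # the string "1313"

# An explicit parse turns the characters into a number.
number = int(typed)
as_numbers = number + number  # the integer 26

print(as_strings, as_numbers)
```

The parse step is where the lexical form ends and the number begins; languages that convert implicitly hide that step but do not remove it.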
> I used the term > "Document" because I am not interfacing with the devices via voice, so [quoted text clipped - 8 lines] > > just when they are coming so much closer together. They aren't, though. It's just that some people with a document management background are claiming their tools do everything the data management tools do. But their claims are false.
> Maybe it is easier > for you to see a comment attribute as a component in a document than > another tag and related value, but at least you can think of data > processed reports as documents, right? Not so much. Mostly I think of reports as result sets. You might format them as an html table, pretty them up, and embed it in a page, though. But that's mere presentation.
> Can you then make the leap to > see the input forms as documents too? And what is the interface -- the > input to and the output from, right? The machine interface is, we type characters on the keyboard, and we see pixels on the screen. (We also specify x,y coordinates and have a few more buttons, non-character this time, on the mouse.)
Honestly, I consider the html way of talking about the world, with input forms, hypertext, etc. as a decidedly inferior model of human-computer interaction than the one available 20 years ago. I am fond of saying that Tim Berners-Lee got one important thing right and every other important thing wrong, and in so doing set software back 20 years. The two bloodiest casualties have been UI and data management.
> > and I believe that most of what's > > going on in XML is about document management and not data [quoted text clipped - 3 lines] > simply a minor advancement over comma-quote format. If you buy that, > then you should be able to see that xml is all about data. XML is all about strings.
> So I > suspect your concerns are with the "management" word. I will grant [quoted text clipped - 4 lines] > of the same data management features as a typical DBMS can be added to > this model. Having the tree as the only possible structure is worse than having the relation as the only possible structure, and I agree with you that the latter is too limiting. I would also propose that having the object as the only possible structure is a lose. Further, I think the random mishmash of stuff that appears in most programming languages is also not the way to go.
I would propose that the solution would support at least relations, lists, tuples, and enum types. In fact, that might be a complete set.
Also: having strings as the only datatype is inferior to having a variety of different types. The perl/tcl/html approach, where everything is a string and you have a bajillion kinds of implicit conversions might be fine if your goal is fast-and-loose, but if you want any kind of discipline, which you certainly do if you want to manage data, then it's out the door.
> > I admit that text documents are an important data type, but > > it's *just one* datatype among millions. [quoted text clipped - 4 lines] > person might not feel as in touch with Strings as with Documents, > however. By limiting yourself to how Joe Sixpack thinks about these things today, you may gain some usability benefit in applications aimed at the lowest common denominator, but you're not going to discover anything profound that way. And, If Everyone Did It, (as Mom would say) the forward progress of mankind would halt.
So I'm not much interested in how the average person might think about their data.
> > Limiting ourselves > > to looking at only a single datatype doesn't put us in a good [quoted text clipped - 5 lines] > Document (replace Document with String and it will sound better to > you). Integer is most certainly not a document.
Question: if you add two documents together, what is the result? Is addition of documents commutative?
> It is only when you look at the interface between the software > and the machine that you would want to think of a number as not being a > type of string. Ask a mathematician who has never touched a computer (well, they used to exist) if a number is a kind of English text.
> If you look at the input to and the output from the > database, you will see it is all strings of data. You are confusing things again. I could just as well say it's all pixels, because we use graphic displays these days. Would you agree that the job of a dbms is to manage pixels? When you click the left mouse button, which character is that?
> Mime types are also > strings, but they differ in that they do not build up to documents, but > to songs or videos or object code or ... I don't see how you can consider a video and a text document to be the same thing.
> I disagree with your statement that SQL handles a particular type of > tree well because there are trees that match your def of that type that [quoted text clipped - 12 lines] > For example, if InvoiceLineItem has an attribute "discount" which has a > value that is a list, such as Ah, now I think I see what you're getting at.
First of all, I completely agree that SQL handles lists poorly. Ordered data, when the order is not derived from some ordering function on the data but is implicit, is a weak point for SQL.
However, I was attempting to isolate the question of tree structure; list data, whether embedded in a tree of whatever kind or just on its own, is a real issue, but one that I see as being orthogonal to the tree-structure taxonomy I'm working on.
And I'm sure we can both agree that SQL doesn't do nested relations or nested anything, really. I am fully signed up to the value of nested structure, and to the value of lists. So you don't have to try to convince me on either point. :-)
> a) Does this meet your criteria for what type of tree it is? > b) Does your tree look similar? > c) Do you agree that SQL doesn't query this tree very well as it gets > hung up on having two values for the discount attribute of the > InvoiceLineItem at lineNbr=3? Yes, yes, and yes.
The part that SQL handles well, though, is queries across the node types (or levels), which is the part that is important (at least to me) in classifying these kinds of trees. SQL has a hard time with list data or multivalue data, whether it's in a tree or not, so again I consider it an orthogonal issue.
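As a rough illustration of a query across the levels of a static heterogeneous tree, here is a sketch using Python's built-in sqlite3 module. The table names, column names, and rows are all hypothetical, chosen to mirror the customer/invoice/line item example; each child row carries a foreign key to its parent.

```python
import sqlite3

# Hypothetical three-level schema: customer -> invoice -> line_item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE invoice   (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customer(id));
CREATE TABLE line_item (invoice_id INTEGER REFERENCES invoice(id),
                        line_nbr INTEGER, amount INTEGER);
INSERT INTO customer  VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO invoice   VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO line_item VALUES (10, 1, 100), (10, 2, 50), (11, 1, 25), (20, 1, 70);
""")

# One query spans all three levels: total billed per customer.
rows = conn.execute("""
SELECT c.name, SUM(li.amount)
FROM customer c
JOIN invoice i    ON i.customer_id = c.id
JOIN line_item li ON li.invoice_id = i.id
GROUP BY c.name
ORDER BY c.name
""").fetchall()

print(rows)
```

Because the depth of the tree is fixed by the schema, the number of joins is known when the query is written, and no transitive closure is needed.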
> It might simply be that I'm not catching on to your terms here and if > so, I apologize. If I am understanding, then I can agree that you can > say that the trees that SQL handles well are static heterogeneous, but > you cannot say that SQL handles all such trees well. I think we're in agreement at this point.
> > The fact that so many programmers favor iteration over recursion > > is something I consider odd, given that iteration is strictly > > less powerful than recursion. > > Yes, but if you can iterate instead of using recursion, then you don't > set yourself up for a stack overflow, for example. Tail call optimization can make most or all of this issue go away. (This raises a question for me, which is, is it the case that TCO can convert any iterative construct into a recursive one that uses only constant stack space? I think the answer might be yes, because I think I can see how to write a tail-recursive 'while', and I think all iterative constructs are just syntax on while.)
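A tail-recursive 'while' might look something like the sketch below. This is only an illustration of the shape: the function name while_loop and the state-passing style are my own assumptions, and CPython does not perform tail call optimization, so here the stack still grows with each step. In a language that guarantees proper tail calls (Scheme, for example) the same structure runs in constant stack space.

```python
def while_loop(cond, body, state):
    """Tail-recursive 'while': loop while cond(state) holds, threading state.

    The recursive call is the last thing the function does, so an
    implementation with tail call optimization could reuse the frame.
    CPython does not, so this is a structural sketch only.
    """
    if not cond(state):
        return state
    return while_loop(cond, body, body(state))


# Equivalent of: total, i = 0, 1; while i <= 10: total += i; i += 1
total, i = while_loop(
    lambda s: s[1] <= 10,               # loop condition on (total, i)
    lambda s: (s[0] + s[1], s[1] + 1),  # loop body returns the next state
    (0, 1),
)

print(total, i)
```

The loop variables become the explicit state argument, which is the usual trick for expressing other iterative constructs in this form.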
> With recursion, you > push the stack with each iteration and don't free up that memory until > you are done with the entire process. If an iterative algorithm is > possible, then I would only use recursion if that algorithm gives > improved performance or increased maintainability (at least that is all > I can think of now). If programmers put as much effort into recursion as they put into iteration, I assert it would provide increased maintainability.
> > It is possible to transform any > > iterative algorithm into a recursive one; the reverse is not true. [quoted text clipped - 12 lines] > I choose to use it instead of playing any games to get around using it, > then I'm OK with that approach. Ha ha! I think I agree.
> > > > So the area I'm exploring is, what kinds of operations do we want > > > > to do on the dynamic tree types, and what smallest bit of power [quoted text clipped - 10 lines] > operators for joining relations as well as for navigating from one row > of data in one relation to another relation, using a foreign key. Okay, sure. But trying to ask this question for all possible data structures is too big for me to go after all at once. So right now I'm focusing just on trees: trying to classify them, and trying to figure out what kinds of operations I might want to do with them.
The question in the large is: what are all the different logical data structures, how might we query them, and how might we update them and constrain them? I think mankind will be working on this for some time.
> > You regularly mention graphs. Do you have some demonstration of > > why you consider them a good choice? > > I would love to have some empirical data, but I don't. [...] Whoops! You answered a question that wasn't quite what I meant to ask. What I meant to ask was, can you show me a simple graph (or graphs) and some simple operations on them? Do you have a framework for saying that some particular set of operations on these graphs is complete, or even just some framework for saying what the possible graph operators are?
I care about empirical data, too, but when I'm *here* I'm thinking about theory.
> > Maybe a description of some > > minimal set of operators and what you can do with them? I am quite [quoted text clipped - 7 lines] > results. I'll check at some point, but let me know if you find > something. I would suggest that knowing what the graph operators are is something you ought to have a good grip on if you're going to be advocating for graphs.
> My point is that each metaphor is some partial version of the whole. I > just listened to a CD of a theologian talking about how Jesus used [quoted text clipped - 3 lines] > working with propositions viewed as relations AND as "webs" > (do you prefer that to "graphs"?) I prefer "graphs."
> rather than restricting ourselves to one of these. But one can also *gain* from restrictions, as well. This point is often missed. It's why constraints are valuable, and it's why a minimal formalism is valuable.
Marshall
dawn - 25 Jul 2005 06:49 GMT > > > That strikes me as a nonstandard definition of the use of metadata, > > > but no matter. [quoted text clipped - 8 lines] > As I am used to understanding the term, "metadata" means higher-order > data, not simply higher level data. If lastName is an attribute (in RM terminology) of Person, then the names "lastName" and "Person" are both metadata for the Person relation, right?
> It would not properly be used > to describe data in an enclosing scope. I think this is just a representation issue. Since you are talking about representing data in a tree model, I referred to an xml representation of the metadata and data rather than a table representation.
> In an xhtml example, for some > text inside <b> tags, metadata would not be anything about the > enclosing <p>, but rather the metadata would be the xhtml dtd. > > Metadata is schema information, or type information. Yes, indeed. Surely it includes the names of relations and header information such as column names, right?
> > > > I'm guessing yours are different? > > > [quoted text clipped - 5 lines] > > I'm not sure I agree. I guess I can only state it as my intent. I don't mind chatting about .mp3 data or .rm or .mov or .class data, but I'm particularly interested in data that comes from language modeled for software development purposes. This would be Words, or Sentence Fragments, or Strings, etc, and "Documents" might be too large a concept, but it at least gives the hint that there would be operators that can parse individual data values of this type and not only set operators.
> > But at the logical level, we > > only care about a value such as 4 if we have some context, semantics. [quoted text clipped - 4 lines] > > Okay, you're headed down a dead end road here. Watch out! I don't think so, but I'll bring my mace just in case.
> When you type, you are typing characters. Precisely. And without loss of generality we can consider my "Document" type to be that which can be typed in on a keyboard and read out loud in a language. So, under Type in my type hierarchy, I have Strings/Documents and Binary/MIME types. All other types are in the type tree below these.
> There is no way to type > in 13 as a number, not even with ^M. You can only type characters. Yes, yes -- you can only type characters/strings/words/sentences/language = Documents.
> When you type '1' and then '3', you have typed two characters. > You don't have a number yet. I do have a number, but I am only treating it as a string (supertype). A number is a string with some additional functions. So, once I realize this string is a number, I can apply numeric functions. But for any data of type String or Document, I can treat it as a String.
> > The computer > > can do what it wants to store it in a smaller number of 1's & 0's, I > > just want to enter strings and get them back. > > I would be surprised if all you really care about is strings. > If all you have is strings, you can't add, for example. 13 is in the Integer subclass of String/Document, in my type hierarchy.
> The > fact that some programming languages will implicitly convert > a string to a number in some contexts and add the resulting > numbers is simply a distraction; it does not mean that strings > and numbers are the same thing. The representation of a number, like the representation of a word, are Strings and that is what we are working with in our software applications. An Integer is-a String.
> > I used the term > > "Document" because I am not interfacing with the devices via voice, so [quoted text clipped - 12 lines] > management background are claiming their tools do everything > the data management tools do. But their claims are false. I come from the data side of the house, but respect the fact that if you start marking up propositions in a consistent way, you can have a representation of structured data that moves the document in the direction of a database.
> > Maybe it is easier > > for you to see a comment attribute as a component in a document than > > another tag and related value, but at least you can think of data > > processed reports as documents, right? > > Not so much. Mostly I think of reports as result sets. If I ask you to e-mail me one of those result sets, nevermind, I know you can see it as a document if you want to.
> You might > format them as an html table, pretty them up, and embed it in > a page, though. But that's mere presentation. Exactly. It is not the data, but it is how we can perceive the data -- one possible representation and the one that we use as humans both for entering data directly and for getting it back from the computer.
> > Can you then make the leap to > > see the input forms as documents too? And what is the interface -- the [quoted text clipped - 8 lines] > of human-computer interaction than the one available 20 years > ago. Along with way-cool, hip languages and tools, I also work with database products that date back 40 years (I have a paper from 1965 I want to scan in). Every piece of data is a string, but you can also define the string as a number and apply numeric functions if you want to. I mention xml instead only because more people know that lingo. The model is very similar between the 40-year-old system (definitely about data) as it evolved and the document model from the xml folks. It is as if flared hip-hugger jeans were back in style. And if the document folks want to take the credit for this new hip idea, it's ok by me.
> I am fond of saying that Tim Berners-Lee got one important > thing right and every other important thing wrong, and in so doing > set software back 20 years. The two bloodiest casualties have > been UI and data management. Codd got some things right, Berners-Lee got some right, and I'm with those who are suggesting we put the chocolate and the peanut butter together.
> > > and I believe that most of what's > > > going on in XML is about document management and not data [quoted text clipped - 5 lines] > > XML is all about strings. now we are getting somewhere. Some of those strings are numbers, some are dates, right?
<snip>
> I would propose that the solution would support at least relations, > lists, tuples, and enum types. In fact, that might be a complete set. > > Also: having strings as the only not only!! It is the ANCESTOR of all non-binary, language data.
> datatype is inferior to having > a variety of different types. The perl/tcl/html approach, where > everything is a string and you have a bajillion kinds of implicit > conversions might be fine if your goal is fast-and-loose, I've never been called that! But maybe we can get bigger bang for the buck s/w development with what you termed fast-and-loose.
> but > if you want any kind of discipline, and I definitely do
> which you certainly do if > you want to manage data, then it's out the door. My type hierarchy has booleans, ints, etc, but these have an ancestor of String.
> > > I admit that text documents are an important data type, but > > > it's *just one* datatype among millions. [quoted text clipped - 8 lines] > today, you may gain some usability benefit in applications aimed > at the lowest common denominator, I'm aiming for a data model that works for humans and related software that does likewise.
> but you're not going to discover > anything profound that way. Profound is not my goal - I'm looking for ease and low cost in software development and maintenance.
> And, If Everyone Did It, (as Mom would > say) the forward progress of mankind would halt. A logical data model is about the interface between humans and the machine, even if those humans are s/w developers. It would be progress to have the computer meet the human closer to where the human lives.
> So I'm not much interested in how the average person might think > about their data. I'm not (currently) aiming for a non-professional to prepare a data model or write software here. But the average programmer is an average person. I want to let the computer do the work of shaping data the way a computer needs to see it and make the API between person and representation of a data model more like the human intuitively perceives it. This will (does, from my experience) improve the ability of the s/w developer to write and maintain big-bang-for-the-buck software.
> > > Limiting ourselves > > > to looking at only a single datatype doesn't put us in a good [quoted text clipped - 7 lines] > > Integer is most certainly not a document. in the interface between me and the computer, I only pass integers as strings, in document formats, typically with some metadata about the integer visible somewhere. But if you prefer, Integer is-a String.
> Question: if you add two documents together, what is the result? > Is addition of documents commutative? Replace "document" with "string" and "add" with "concatenate". Addition is not a function of a String. It shows up down the type hierarchy for types that are numeric.
> > It is only when you look at the interface between the software > > and the machine that you would want to think of a number as not being a > > type of string. > > Ask a mathematician who has never touched a computer (well, they used > to exist) if a number is a kind of English text. For that mathematician, we could say that the number, such as 1, is something they can hear as a word.
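For what it's worth, the commutativity question has a short answer once "add" is read as "concatenate". A small Python sketch with made-up values:

```python
# Illustrative values only.
a, b = "tree", "graph"
concat_commutes = (a + b) == (b + a)   # "treegraph" vs "graphtree"

x, y = 13, 4
add_commutes = (x + y) == (y + x)      # 17 vs 17

print(concat_commutes, add_commutes)
```

Concatenation is order-sensitive while integer addition is not, so the two operations obey different laws even when both are spelled "+".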
> > If you look at the input to and the output from the > > database, you will see it is all strings of data. > > You are confusing things again. I could just as well say it's > all pixels, because we use graphic displays these days. Would > you agree that the job of a dbms is to manage pixels? Nope. That is the representation of my document, words, sentences, strings or language in the computer. I am not using the term document as a single representation.
> When you > click the left mouse button, which character is that? We definitely aren't hitting each other right on this one.
> > Mime types are also > > strings, but they differ in that they do not build up to documents, but > > to songs or videos or object code or ... > > I don't see how you can consider a video and a text document to be the > same thing. I don't, I don't!! That is why the type hierarchy has two top-level types: Binary types, which I'd like to all be MIME types (otherwise I could just call them all Binary), and String language types, which I'm calling Documents.
> > I disagree with your statement that SQL handles a particular type of > > tree well because there are trees that match your def of that type that [quoted text clipped - 40 lines] > types (or levels), which is the part that is important (at least to > me) in classifying these kinds of trees. Unless I am misunderstanding, a counter-example to that statement would be a tree where instead of a list type, we have a name for a two-attribute value in our tree. So, in our relation, we have attribute A and attribute B and in our tree, we have names for A, B and also the name C for A and B together (a COBOL FD for a VSAM file just popped into my brain as clear as day, yikes). This tree with C having children of A and B doesn't look like a SQL-happy tree.
> SQL has a hard time with > list data or multivalue data, whether it's in a tree or not, so [quoted text clipped - 20 lines] > I think I can see how to write a tail-recursive 'while', and I think > all iterative constructs are just syntax on while.) Over my head on that one -- TCO to me is only total-cost-of-ownership. Are you saying that if I write a recursive method in Java then the compiler has or might have a feature that mitigates this or that the run-time environment would have this feature? Have I unnecessarily dragged along a concern that I could have dropped long ago? Did I miss a memo that everyone else got that said not to worry about recursion eating memory?
> > With recursion, you > > push the stack with each iteration and don't free up that memory until [quoted text clipped - 5 lines] > If programmers put as much effort into recursion as they put into > iteration, I assert it would provide increased maintainability. I'll keep that in mind. If you have any "instead of doing this common iteration, try it as this recursion" examples, pass them along, even if OT.
> > > It is possible to transform any > > > iterative algorithm into a recursive one; the reverse is not true. [quoted text clipped - 39 lines] > them and constrain them? I think mankind will be working on this > for some time. I think it is done. They named it XQuery.
But seriously, while it isn't perfect by any stretch (I believe I called it "dog ugly" in this forum), XQuery (with the update capabilities too) does give me some hope. Unlike SQL, I think I could build the api I would want to have on top of it.
<snip>
> > > Maybe a description of some > > > minimal set of operators and what you can do with them? I am quite [quoted text clipped - 11 lines] > you ought to have a good grip on if you're going to be advocating for > graphs. I'm just a practitioner dabbling in theory (and, worse yet, maybe just a s/w dev manager dabbling in practice) but if I were told I had to enumerate the graph functions I use, I would look up the xpath functions on w3.org and start with those.
> > My point is that each metaphor is some partial version of the whole. I > > just listened to a CD of a theologian talking about how Jesus used [quoted text clipped - 5 lines] > > I prefer "graphs." of course I knew that ;-)
> > rather than restricting ourselves to one of these. > > But one can also *gain* from restrictions, as well. This point > is often missed. It's why constraints are valuable, and it's > why a minimal formalism is valuable. I agree that such are valuable. I do have a big problem with the way we handle constraints, however, as I've mentioned in the past. The minimal formalism is good for theory and for a maintainable implementation under the covers, but not necessarily the best api.
cheers! --dawn
Marshall Spight - 25 Jul 2005 16:01 GMT > > It would not properly be used > > to describe data in an enclosing scope. [quoted text clipped - 3 lines] > representation of the metadata and data rather than a table > representation. It seems to me that what's happening here is that because xml spews attribute names all through its file format, you're considering this as normal. But that pretty much only happens with xml.
> > Metadata is schema information, or type information. > > Yes, indeed. Surely it includes the names of relations and header > information such as column names, right? Yes.
> > > But at the logical level, we > > > only care about a value such as 4 if we have some context, semantics. [quoted text clipped - 6 lines] > > I don't think so, but I'll bring my mace just in case. What, are you hawkgirl now? :-)
> > When you type '1' and then '3', you have typed two characters. > > You don't have a number yet. > > I do have a number, but I am only treating it as a string (supertype). You're conflating lexical and semantic issues. This is that dead end I warned you about, and I fear you've driven into it at 60 mph.
> A number is a string with some additional functions. So, once I realize > this string is a number, I can apply numeric functions. But for any > data of type String or Document, I can treat it as a String. It's certainly possible to build a system like this, but I wouldn't want to use it. For one thing, it throws static typing out the window.
> > The > > fact that some programming languages will implicitly convert [quoted text clipped - 5 lines] > Strings and that is what we are working with in our software > applications. An Integer is-a String. Again the pushing together of lexical and semantic issues. The software that I write, in statically typed languages, does not consider integer to be a subtype of string. Your source code isn't your program, any more than the word "water" is wet.
> I come from the data side of the house, but respect the fact that if > you start marking up propositions in a consistent way, you can have a > representation of structured data that moves the document in the > direction of a database. These all-string representations are intended for reading, not for processing. Human reading is an important application of code and data, but it's certainly not the only one.
> > I am fond of saying that Tim Berners-Lee got one important > > thing right and every other important thing wrong, and in so doing [quoted text clipped - 4 lines] > those who are suggesting we put the chocolate and the peanut butter > together. Except Berners-Lee got so many things wrong, and incompatibly wrong with so many right things. In essence, he did one tiny valuable thing, which is put a GUI on a slightly updated FTP.
> > XML is all about strings. > > now we are getting somewhere. Some of those strings are numbers, some > are dates, right? No. Lexically, all my source code is a string; semantically, it is many different types. A Java source file is one big string, but there are int and classes and so forth in there that, when you execute the program, are not strings at all. You could make a system where they were strings at runtime as well, but such a system would lose many valuable properties that Java has, such as the ability to do static analysis, both by the human and by the computer, for both correctness and performance reasons. This price is too high for me; static analysis is one of the most powerful tools available.
> > datatype is inferior to having > > a variety of different types. The perl/tcl/html approach, where [quoted text clipped - 3 lines] > I've never been called that! But maybe we can get bigger bang for the > buck s/w development with what you termed fast-and-loose. Sure; for prototyping and other small-scale applications. But not for data management applications in which the cost of corruption is high.
> > but if you want any kind of discipline, > [quoted text clipped - 5 lines] > My type hierarchy has booleans, ints, etc, but these have an ancestor > of String. So, how do two strings sort: lexicographically or numerically? I guess it depends on whether they are also numbers, right? When you sort a list of strings, some of which are numbers, which compare function do you use? Or does it vary depending on which strings you're looking at?
Since int <: string, (<: means "is a subtype of") then I presume that int has all the string methods available? So I can have a variable of type int, with an int value in it (which is also a string) and invoke a method to prepend a ~ character to the string, right? Now my variable declared to be int does not contain an int value anymore.
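A rough Python sketch of the tilde problem follows. `IntString` is a hypothetical class invented here to stand in for "int is a subtype of string"; it is not anyone's actual proposal, and Python is dynamically typed, so this only shows what happens to the value at runtime.

```python
class IntString(str):
    """Hypothetical type: a string that is only valid if it spells an integer."""

    def __new__(cls, text):
        int(text)  # raises ValueError if text does not spell an integer
        return super().__new__(cls, text)


n = IntString("13")

# Because IntString is a subtype of str, every str operation is available...
prefixed = "~" + n

# ...but the result is a plain str that no longer spells an integer.
print(type(prefixed).__name__)  # str
print(prefixed)                 # ~13
```

Here the result simply falls back to the supertype. A statically typed variable declared as the int subtype could not hold `prefixed` without the declaration becoming untrue.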
> > By limiting yourself to how Joe Sixpack thinks about these things > > today, you may gain some usability benefit in applications aimed [quoted text clipped - 8 lines] > Profound is not my goal - I'm looking for ease and low cost in software > development and maintenance. Yes, but you're trying to do it by dumbing things down. I don't think that's going to work. I think what's needed is to smarten things up. Not complicate them, mind you; make them smart and simple.
> > And, If Everyone Did It, (as Mom would > > say) the forward progress of mankind would halt. [quoted text clipped - 6 lines] > representation of a data model more like the human intuitively > perceives it. I'm imagining you and some mathematician about a millennium ago, sitting in an ivory tower. The guy comes to you and says, "I'm thinking about this idea for a new number, which I call 'zee-row.' It represents the absence of a number. You could use it as the result of some operations that are currently considered illegal, like subtracting X from X." And you'd say, "But that's not how the average person intuitively perceives subtraction. Let's not pursue that approach; let's do something more user-friendly."
> > Integer is most certainly not a document. > > in the interface between me and the computer, I only pass integers as > strings, in document formats, typically with some metadata about the > integer visible somewhere. But if you prefer, Integer is-a String. And once you type the integer in, you can just forget about it? No; the human is also in charge of the integer as it moves around the computer, across function calls, across the network, into the database, etc. And to manage this process effectively, he needs a strong suite of tools, chief among them a type system, static analysis, and a way to structure and constrain data.
It's not the case that the only thing that matters to the human is the point of interface between the computer and the human.
> > > It is only when you look at the interface between the software > > > and the machine that you would want to think of a number as not being a > > > type of string. Looking at this paragraph again, this is exactly what I disagree with.
> > > If you look at the input to and the output from the > > > database, you will see it is all strings of data. [quoted text clipped - 6 lines] > strings or language in the computer. I am not using the term document > as a single representation. Correct. Pixels are a mere representation of strings. And in *exactly the same way* strings are a mere representation of your integers.
> > > For example, if InvoiceLineItem has an attribute "discount" which has a > > > value that is a list, such as [quoted text clipped - 34 lines] > popped into my brain as clear as day, yikes). This tree with C having > children of A and B doesn't look like a SQL-happy tree. Okay, I thought I said it pretty well the first time, but I'll try again: in thinking about trees, I'm trying *just* to think about trees. Anything that's also a problem when you have exactly one level will of course also be a problem when you have a multi-level tree. Solve the one-level case and you've solved the multi-level case, assuming you handle multi-level data. So I don't consider SQL's null problems to be a tree issue, even though those problems *also* show up when you're thinking about trees.
I also said:
> > SQL has a hard time with > > list data or multivalue data, whether it's in a tree or not, so > > again I consider it an orthogonal issue. which puts it fairly well, I think.
> > > > The fact that so many progammers favor iteration over recursion > > > > is something I consider odd, given that iteration is strictly [quoted text clipped - 11 lines] > > Over my head on that one -- TCO to me is only total-cost-of-ownership. My fault; I went from the term ("tail call optimization") to the abbreviation ("TCO") too abruptly.
> Are you saying that if I write a recursive method in Java then the > compiler has or might have a feature that mitigates this or that the > run-time environment would have this feature? The Java compiler probably won't, but it might. The Scheme compiler is required to. Since Java is chock-full of iterative constructs, it's not much of an issue; no one uses recursion much.
> Have I unnecessarily > dragged along a concern that I could have dropped long ago? Uh, yes.
> Did I miss > a memo that everyone else got that said not to worry about recursion > eating memory? Well, you still have to worry if your language's implementation doesn't have TCO. But it's not a *fundamental* problem. You also have to worry if your recursive method isn't tail-recursive, but I'm proposing that a recursive translation of an iterative algorithm can always be made tail recursive. I'll still have to check on that.
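As an illustration of that kind of translation, here is a small Python sketch with made-up function names. The loop variables of the iterative version become parameters of the recursive one, and the recursive call is the last thing the function does, which is what makes it a tail call.

```python
def total_iterative(numbers):
    """Sum a list with an ordinary loop; 'acc' and 'i' are the loop state."""
    acc = 0
    i = 0
    while i < len(numbers):
        acc += numbers[i]
        i += 1
    return acc


def total_recursive(numbers, i=0, acc=0):
    """Same algorithm: the loop state becomes parameters, and the
    recursive call is the final action of the function (a tail call)."""
    if i >= len(numbers):
        return acc
    return total_recursive(numbers, i + 1, acc + numbers[i])


print(total_iterative([1, 2, 3, 4]))  # 10
print(total_recursive([1, 2, 3, 4]))  # 10
```

Note that CPython does not perform tail call optimization, so the recursive version still uses one stack frame per element and will hit the recursion limit on long lists. In an implementation with TCO the two forms would use the same amount of stack.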
> > If programmers put as much effort into recursion as they put into > > iteration, I assert it would provide increased maintainability. > > I'll keep that in mind. If you have any "instead of doing this common > iteration, try it as this recursion" examples, pass them along, even if > OT. Of course, my background is about 98% iteration. I work for a living which means I've had to use C++ or Java for most of the time. (Before that it was C and Fortran. :-)
As for cool examples, check out quicksort in Haskell: http://www.haskell.org/aboutHaskell.html
Blew my mind the first time I saw it.
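For readers who don't want to follow the link, here is a Python rendering written in the same recursive style. It is only a sketch of the idea, not the Haskell code from that page, and it trades efficiency for brevity by building new lists at every step.

```python
def quicksort(items):
    """Sort a list by partitioning around the first element (the pivot)."""
    if not items:
        return []
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```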
> > The question in the large is: what are all the different logical > > data structures, how might we query them, and how might we update > > them and constrain them? I think mankind will be working on this > > for some time. > > I think it is done. They named it XQuery. Uh, does it have natural join?
> I'm just a practitioner dabbling in theory (and, worse yet, maybe just > a s/w dev manager dabbling in practice) but if I were told I had to > enumerate the graph functions I use, I would look up the xpath > functions on w3.org and start with those. I have no reason to do this. I need some tiny glimmer of a reason to suspect there's something interesting there before I look. Nothing so far.
> > But one can also *gain* from restrictions, as well. This point > > is often missed. It's why constraints are valuable, and it's [quoted text clipped - 4 lines] > minimal formalism is good for theory and for a maintainable > implementation under the covers, but not necessarily the best api. Sure sure; we've had that conversation to death. I believe we entirely agree on the analysis of the problem (for once :-) and I think we mostly agree on the characteristics of the solution.
Marshall
PS. Good grief we are both long-winded, eh?
dawn - 25 Jul 2005 21:22 GMT <snip>
> > > > But at the logical level, we > > > > only care about a value such as 4 if we have some context, semantics. [quoted text clipped - 8 lines] > > What, are you hawkgirl now? :-) We look alike and both carry mace, but otherwise no.
> > > When you type '1' and then '3', you have typed two characters. > > > You don't have a number yet. [quoted text clipped - 3 lines] > You're conflating lexical and semantic issues. This is that dead > end I warned you about, and I fear you've driven into it at 60 mph. I'll step on the gas then.
Let's take petQty=1 and hairColor=brown The "1" is no more a number than "brown" is a color. They are morphemes. They are character/string/keyboard representations related to oneness and brownness.
So, my Type hierarchy is different from others in that I recognize up top that what I'm working with are representations -- that's the type of stuff I've got for the computer to work with.
Next level, adding in semantics for more precision I can further refine the types of 1 and brown by designating the 1 as a string that represents an Integer and the brown as a string that represents a Color type, both of which can be descendants of the String/Document/Words/Sentences/Language type. They are not strings that represent songs or videos in the computer, after all; they are the content of documents. Adding in the semantics does not take the string "brown" and turn it into brown, it simply recognizes that beyond the fact that I have a string, it is a string that represents a Color. So, not only can I extract a character from it, I can also find shoes to match. Then if I further identify that this is not just any color, it is a HairColor, I can apply more functions, such as determining what would need to be added to the hair color to turn it strawberry blond.
I'm guessing you think I'm switching levels here between the character string "1" and what it represents, but I am always talking about the character string and functions on that string. I am using semantics for the design and interpretation of the software but the computer never has to comprehend the meaning.
> > A number is a string with some additional functions. So, once I realize > > this string is a number, I can apply numeric functions. But for any > > data of type String or Document, I can treat it as a String. > > It's certainly possible to build a system like this, but I wouldn't > want to use it. For one thing, it throws static typing out the window. It changes it, but definitely does not toss it out the window. It becomes more flexible.
> > > The > > > fact that some programming languages will implicitly convert [quoted text clipped - 11 lines] > source code isn't your program, any more than the word "water" > is wet. I think I am more consistent in not pretending that the word "brown" really is a Color nor that "1" really is a number.
> > I come from the data side of the house, but respect the fact that if > > you start marking up propositions in a consistent way, you can have a [quoted text clipped - 4 lines] > processing. Human reading is an important application of code and > data, but it's certainly not the only one. I'm thinking of the API between human and computer related to the data model to be the software API for the data as well. Software works with data models all the time and I want it to be easier and more consistent, independent of whether data are to be stored on disk or not. When it comes to computations and processing, the functions for the more specific types can be applied as appropriate. If my "1" is-a Integer, I can add 2 to it to get another representation -- "3"
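A loose Python sketch of what "Integer is-a String" could look like follows. The class name `IntegerString` and its `plus` method are invented for illustration only; they are not part of any existing tool, and the design choices are assumptions.

```python
class IntegerString(str):
    """Hypothetical subtype of str whose values spell integers."""

    def plus(self, other):
        # Compute on the numeric meaning, then return a new representation.
        return IntegerString(str(int(self) + int(other)))


one = IntegerString("1")
three = one.plus(2)

print(three)          # 3
print(len(three))     # 1, since ordinary string functions still apply
```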
<snip>
> Except Berners-Lee got so many things wrong, and incompatibly wrong > with so many right things. In essence, he did one tiny valuable > thing, which is put a GUI on a slightly updated FTP. and it spread like wild fire.
> > > XML is all about strings. > > [quoted text clipped - 11 lines] > This price is too high for me; static analysis is one of the > most powerful tools available. You don't lose that as completely as you are suggesting.
> > > datatype is inferior to having > > > a variety of different types. The perl/tcl/html approach, where [quoted text clipped - 7 lines] > for data management applications in which the cost of corruption > is high. I'm definitely aiming for highly scalable apps and quality data. Make it painful (even if not technically hard) to alter a data name or type when requirements change and you will get bad data and work-arounds. I'll cut the rest of this PA, 'cause this thread is getting long.
> So, how do two strings sort: lexicographically or numerically? if you are treating values as strings, then lex... and if they are both of the subtype number, then the sorting function there overrides the string sort.
> I guess > it depends on whether they are also numbers, right? When you sort a > list > of strings, some of which are numbers, which compare function do you > use? Or does it vary depending on which strings you're looking at? You cannot sort a set of Colors and Integers together unless you bump up in the type hierarchy until you are seeing them both as Strings or something with the same ordering.
> Since int <: string, (<: means "is a subtype of") then I presume that > int has all the string methods available? yes
> So I can have a variable > of type int, with an int value in it (which is also a string) and > invoke a method to prepend a ~ character to the string, right? Yes and that would not violate that this was a string. I do realize this introduces back in some of the problems that the DBMS was built to eliminate. Different tools are then needed to do something similar to what the dbms does now. The biggest problems I see with this are in cases where there is a DBMS that is maintained directly through the dbms's api with applications from different top level owners where there is no ability to have tools that inspect source code. Since I want that source code all persisted with any databases it updates, each database would have all the data and functions it needs to address inconsistencies.
This is not unlike what mountain man was interested in doing, but he was taking everything out of other s/w apps and putting it in the dbms as code, while I'm taking it out of the dbms as code and giving it back to the dbms as data (some of which is code). And, granted, I have a concept but the devil is in the details. Until perfection is reached, I would have a different set of risks and flexibility than with a current sql-dbms.
<snip>
> but you're trying to do it by dumbing things down. I don't think > that's going to work. I think what's needed is to smarten things up. I smartened them up. The software now knows that I really can put a tilde in front of a string even though it used to be a number and that I just stopped it from being viewed as a number. It was smart enough to accommodate this change to data values without me having to do anything other than code the application differently and address any warnings my tools give me.
> Not complicate them, mind you; make them smart and simple. precisely.
<snip>
> I'm imagining you and some mathematician about a millenium ago, sitting > in an ivory tower. The guy comes to you and says, "I'm thinking about [quoted text clipped - 5 lines] > perceives subtraction. Let's not pursue that approach; let's do > something more user-friendly." humorous, but not accurate nor to the point IMO. I wrote a paragraph on the flaws in this analogy, but it was boring even if true, so I'll spare you.
> > > Integer is most certainly not a document. > > [quoted text clipped - 7 lines] > database, etc. And to manage this process effectively, he needs > a strong suite of tools, Yes, she does.
> chief among them a type system, static > analysis, and a way to structure and constrain data. I don't disagree, just have a more flexible way of doing that IMO.
<snip>
> > > > It is only when you look at the interface between the software > > > > and the machine that you would want to think of a number as not being a > > > > type of string. > > Looking at this paragraph again, this is exactly what I disagree with. In order for me to agree with it, I have to add to the start "Within the software, ...". The computer (behind the scenes software) might want to persist integers differently in memory or on disk than if they were strings of numeric characters. It can do that behind the scenes. Otherwise it simply needs to know what functions to apply and how to apply them for all subtypes of strings.
Then outside of the computer, the s/w developer needs to know semantics in order to develop the software properly, defining and applying functions appropriate to the types of variables, for example.
Basically, I'm taking the schema and constraints out of the dbms tool and putting it with all of the rest of the code, so it is handled just like other data models used in the code, such as the UI data model. The software applications should be able to execute a function on a model of some data that gets the output to a browser and another that gets the output to a database for storage on disk. It should be able to execute a function on a model of data that pulls in values from a browser or from a web service, xml document, or database.
<snip>
> > Unless I am misunderstanding, a counter-example to that statement would > > be a tree where instead of a list type, we have a name for a [quoted text clipped - 7 lines] > again: in thinking about trees, I'm trying *just* to think about > trees. Trees with nodes that could have random values, completely independent of anything else? Then how do you get SQL into this picture. You are right, I'm confused.
> Anything that's also a problem when you have exactly one > level will of course also be a problem when you have a multi-level > tree. Solve the one-level case and you've solved the multi-level > case, assuming you handle multi-level data. So I don't consider > SQL's null problems to be a tree issue, even though those problems > *also* show up when you're thinking about trees. You don't have to try to get me to understand, but I really am confused about the trees you are looking at and if you give my brain (I swear it used to be a whole lot better) another chance, I will try again. You are saying that all trees that have a certain form are easy for SQL. But without something on those nodes, and only a general shape for the tree, I'm just not getting it.
> I also said: > [quoted text clipped - 3 lines] > > which puts in fairly well, I think. Then what does SQL have to do with your tree? What does your tree look like and how can SQL work with it?
> > > > > The fact that so many progammers favor iteration over recursion > > > > > is something I consider odd, given that iteration is strictly [quoted text clipped - 14 lines] > My fault; I went from the term ("tail call optimization") to the > abbreviation ("TCO") too abruptly. No, I caught that you were using TCO for tail call optimization -- I just hadn't heard it before.
> > Are you saying that if I write a recursive method in Java then the > > compiler has or might have a feature that mitigates this or that the [quoted text clipped - 8 lines] > > Uh, yes. OK, I can still be taught new tricks.
> > Did I miss > > a memo that everyone else got that said not to worry about recursion > > eating memory? > > Well, you still have to worry if your language's implementation doesn't > have TCO. OK, so I won't do a major shift right now then.
> But it's not a *fundamental* problem. You also have to worry > if your recursive method isn't tail-recursive, but I'm proposing that > a recursive translation of an iterative algorithm can be necessarily > tail recursive. this is outside of anything I know about
> I'll still have to check on that. While optimizations are taking place, perhaps it could rewrite the code to show recursion instead of iteration so I don't have to change? :-)
> > > If programmers put as much effort into recursion as they put into > > > iteration, I assert it would provide increased maintainability. [quoted text clipped - 11 lines] > > Blew my mind the first time I saw it. Will do.
> > > The question in the large is: what are all the different logical > > > data structures, how might we query them, and how might we update [quoted text clipped - 4 lines] > > Uh, does it have natural join? I'm guessing you know the answer, eh?
> > I'm just a practitioner dabbling in theory (and, worse yet, maybe just > > a s/w dev manager dabbling in practice) but if I were told I had to [quoted text clipped - 4 lines] > to suspect there's something interesting there before I look. Nothing > so far. fair enough.
> > > But one can also *gain* from restrictions, as well. This point > > > is often missed. It's why constraints are valuable, and it's [quoted text clipped - 13 lines] > > PS. Good grief we are both long-winded, eh? Yup, let's just hope no one else is attempting to follow this one. smiles --dawn
Marshall Spight - 26 Jul 2005 06:06 GMT > > > > Okay, you're headed down a dead end road here. Watch out! > > > [quoted text clipped - 3 lines] > > We look alike and both carry mace, but otherwise no. Ha ha! Hawkgirl is my second favorite member of the JLA.
> > You're conflating lexical and semantic issues. This is that dead > > end I warned you about, and I fear you've driven into it at 60 mph. > > I'll step on the gas then. I admire your spirit!
> > > A number is a string with some additional functions. So, once I realize > > > this string is a number, I can apply numeric functions. But for any [quoted text clipped - 5 lines] > It changes it, but definitely does not toss it out the window. It > becomes more flexible. I don't believe it's possible to have a static type system where you can update variables in such a way that they become of a more general type. Once you do that, the type of the variable has to change at runtime, and if that happens, you do not have a *static* type system by definition.
I can see how to make your idea work with a dynamically typed language, but not with a statically typed one.
> The representation of a number, like the representation of a word, are > Strings and that is what we are working with in our software > applications. An Integer is-a String. I do not agree that what our software works on is simply the string representation of our data. (It is in TCL, and some other systems, but not generally.)
> If my "1" is-a > Integer, I can add 2 to it to get another representation -- "3" You sure can, but only in a dynamically typed language with implicit coercions. These are certainly workable, but I don't consider them a good choice for data management.
> > No. Lexically, all my source code is a string; semantically, it > > is many different types. A Java source file is one big string, [quoted text clipped - 8 lines] > > You don't lose that as completely as you are suggesting. So you say, but you don't explain how to get around the problem with my below "prepend the tilde" example.
> > So, how do two strings sort: lexicographically or numerically? > > if you are treating values as strings, then lex... and if they are both > of the subtype number, then the sorting function there overrides the > string sort. And does this determination happen at runtime or at compile time? If the answer is "at runtime" then you've precluded static typing. Which is not necessarily disastrous, but it's not a choice I'd make.
> You cannot sort a set of Colors and Integers together unless you bump > up in the type hierarchy until you are seeing them both as Strings or > something with the same ordering. But this should be automatic, right? I mean, that's the definition of subtyping: being able to use a more specific type in place of a more general supertype. In this case, since strings can be sorted, and you've stated that everything is a subtype of string, then everything can be sorted. So you necessarily can sort colors and ints together. The question is, what sort function is used?
Or are you doing away with substitution?
> > So I can have a variable > > of type int, with an int value in it (which is also a string) and > > invoke a method to prepend a ~ character to the string, right? > > Yes and that would not violate that this was a string. But it *would* violate that this was an int, and so the *variable* would either have to change its type, or not have one in the first place (dynamic typing) or else your system will allow variables to contain values of a different type than the variable is declared as. Or you just don't allow update operators.
Or you just don't have variables. But then it's hard to manage updatable data.
On a related note, I really don't think that if you sat down and, without thinking about representation, wrote down all the operators that apply to string and all the operators that apply to int, you'd see much overlap. You certainly don't in most popular languages. Now, it's certainly possible to treat strings as if they were integers via a partial mapping. It's also possible to have a (partial or total) mapping from integers into strings. This doesn't mean they are the same thing, though, any more than a mapping from the even numbers to the odd numbers means that even and odd numbers are the same. You can *represent* every even number with an odd number, you know.
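The mappings can be sketched in a few lines of Python. The function names are made up for this example; the only point is that a mapping between two sets does not make them the same type.

```python
from typing import Optional


def int_to_string(n: int) -> str:
    """Total mapping: every int has a decimal string representation."""
    return str(n)


def string_to_int(s: str) -> Optional[int]:
    """Partial mapping: only some strings denote an int; others give None."""
    try:
        return int(s)
    except ValueError:
        return None


def even_to_odd(n: int) -> int:
    """One-to-one mapping from even numbers to odd numbers."""
    return n + 1


print(string_to_int(int_to_string(42)))  # 42
print(string_to_int("brown"))            # None
print(even_to_odd(4))                    # 5
```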
> Since I want > that source code all persisted with any databases it updates, each > database would have all the data and functions it needs to address > inconsistencies. This seems like it would really increase the coupling, which I don't think you'd want to do. Wouldn't it be better to have the system such that the applications were *less* coupled to the dbms rather than *more* coupled?
> > I'm imagining you and some mathematician about a millenium ago, sitting > > in an ivory tower. The guy comes to you and says, "I'm thinking about [quoted text clipped - 9 lines] > on the flaws in this analogy, but it was boring even if true, so I'll > spare you. Well, I didn't particularly intend it to be humorous, although perhaps I am hilarious just out of habit. Ha ha, I'm funny!
My point was that I don't think it's a good idea to use "how people think about things today" as a hard design constraint, because it precludes any possibility of coming up with *a better way to think about things*, which is where the *real* progress is made.
> > And once you type the integer in, you can just forget about it? > > No; the human is also in charge of the integer as it moves around [quoted text clipped - 3 lines] > > Yes, she does. [Let me just state for the record that my singular indefinite "he" is not gender specific. Rather it is a consequence of the lack of a gender inspecific pronoun in the English language, coupled with a wish to avoid the difficulties of speaking of indefinite people in the plural. Void where prohibited. Driver carries no change.]
> > chief among them a type system, static > > analysis, and a way to structure and constrain data. > > I don't disagree, just have a more flexible way of doing that IMO. I don't think you can do any static analysis with your approach. You can still do structure and constraints, though.
> In order for me to agree with it, I have to add to the start "Within > the software, ...". The computer (behind the scenes software) might > want to persist integers differently in memory or on disk than if they > were strings of numeric characters. It can do that behind the scenes. It cannot do these things without a static type system...
> Otherwise it simply needs to know what functions to apply and how to > apply them for all subtypes of strings. ... But this part does not require static typing.
> Then outside of the computer, the s/w developer needs to know semantics > in order to develop the software properly, defining and applying > functions appropriate to the types of variables, for example. This also does not require static typing. (Although I and others claim this is easier for the developer to do when static typing is available-- but that claim is merely anecdotal.)
> Basically, I'm taking the schema and constraints out of the dbms tool > and putting it with all of the rest of the code, so it is handled just [quoted text clipped - 4 lines] > execute a function on a model of data that pulls in values from a > browser or from a web service, xml document, or database. Have you worked much with multi-application databases? Because this seems hard to do in that situation. The UI for a particular program is not shared among different applications; the schema and constraints are.
> > Okay, I thought I said it pretty well the first time, but I'll try > > again: in thinking about trees, I'm trying *just* to think about [quoted text clipped - 3 lines] > of anything else? Then how do you get SQL into this picture. You are > right, I'm confused. It's not that they are random, nor that they are independent of anything else. It's simply that I'm trying to *isolate* those properties specific to trees.
Java does not have a way to declare that a reference type must not be null. (Other languages, including Nice, and SQL, do.) Let's say I was looking at tree handling in Java. I can handle dynamic trees nicely in Java. But HA! you point out: Java doesn't have a way to specify that a reference type within that tree must be non-null. So Java doesn't handle trees that well. I say, yes, that *is* a flaw with Java, but it's not a flaw with how Java handles *trees*, it's a flaw with how Java handles reference types. It's not a tree issue at all; it's a reference type issue. If you fix this issue, as has been done in Nice, it doesn't mean you can handle trees any better; it means you can handle reference types better, whether they occur in trees, or in lists, or in objects, or by themselves.
Likewise, SQL's limitations regarding multivalue attributes, or ordered data, are real, but you run into them all over the place; they don't have anything to do with trees *per se.* If you added a generic list type to SQL, it wouldn't make it any better or any worse at handling static heterogeneous trees.
> You don't have to try to get me to understand, but I really am confused > about the trees you are looking at and if you give my brain (I swear it > used to be a whole lot better) another chance, I will try again. You > are saying that all trees that have a certain form are easy for SQL. > But without something on those nodes, and only a general shape for the > tree, I'm just not getting it. I hope the above helps. Again, it's not that the nodes don't have anything in them, it's just that, if I'm thinking specifically about trees, I don't want to consider *at the same time* issues that SQL has whether my data is tree-structured or not.
> > > Did I miss > > > a memo that everyone else got that said not to worry about recursion [quoted text clipped - 4 lines] > > OK, so I won't do a major shift right now then. If you're thinking about *language design*, you should probably incorporate this information right away. I believe that it is a better design choice for a language to include recursion along with a "marketing technique" for conveniently expressing iterative algorithms that is implemented in terms of recursion, for reeling in the C++/Java people. I base this on the fact that recursion is strictly more powerful than iteration. Any time I see two language features, and I have to have both of them, and one is a superset of the other, I figure it makes sense to put the larger one in, and define the smaller one in terms of the larger. Roughly speaking.
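As a rough sketch of defining the smaller feature in terms of the larger one, here is a "while"-style construct written in Python using nothing but recursion. The helper name `loop` and its parameters are hypothetical, chosen only for this example.

```python
def loop(state, keep_going, step):
    """Hypothetical 'while' construct implemented with recursion:
    repeat step(state) for as long as keep_going(state) is true."""
    if not keep_going(state):
        return state
    return loop(step(state), keep_going, step)


# Usage: compute 5! by iterating over a (counter, product) pair.
counter, product = loop(
    (1, 1),
    lambda s: s[0] <= 5,
    lambda s: (s[0] + 1, s[1] * s[0]),
)
print(product)  # 120
```

In a language that guarantees tail call optimization, a construct like this costs no more stack than a built-in loop. CPython makes no such guarantee, so there it is for illustration only.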
> > As for cool examples, check out quicksort in Haskell: > > http://www.haskell.org/aboutHaskell.html > > > > Blew my mind the first time I saw it. > Will do. I can't recommend looking at lots of different languages enough. If all you've ever encountered is the Algol-family, like I had when I started this whole thing, you've encountered only a very narrow slice of what's possible.
> > > > The question in the large is: what are all the different logical > > > > data structures, how might we query them, and how might we update [quoted text clipped - 6 lines] > > I'm guessing you know the answer, eh? I'm guessing it's "no." Having used join a lot and been wildly impressed by its expressive power, I now consider it a must-have language feature.
I have to say, I don't think the process is done yet.
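For anyone who hasn't run into natural join, here is a minimal Python sketch over lists of dicts. The function and the sample data are made up for illustration, and a real DBMS would not use nested loops like this; the point is just that rows are matched on whatever attribute names the two relations share.

```python
def natural_join(left, right):
    """Join two relations (lists of dicts) on all attribute names they share."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[k] == r_row[k] for k in shared):
                result.append({**l_row, **r_row})
    return result


# Hypothetical data in the customer/invoice style discussed earlier.
customers = [
    {"customer_id": 1, "name": "Ann"},
    {"customer_id": 2, "name": "Bob"},
]
invoices = [
    {"invoice_id": 10, "customer_id": 1},
    {"invoice_id": 11, "customer_id": 1},
    {"invoice_id": 12, "customer_id": 2},
]

joined = natural_join(customers, invoices)
for row in joined:
    print(row)
```

No join condition is written out: the shared attribute `customer_id` is found from the data itself.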
> > PS. Good grief we are both long-winded, eh? > > Yup, let's just hope no one else is attempting to follow this one. On the one hand, one imagines that there are a thousand lurkers for every poster. On the other hand, this thread feels like just you + me + crickets. I think I hear them chirping now.
Marshall
dawn - 26 Jul 2005 15:18 GMT I'll try to snip mercilessly and attempt short responses
> I don't believe it's possible to have a static type system where > you can update variables in such a way that they become of a more > general type. ...
> Once you do that, the type of the variable has to > change at runtime, A value can be seen as being of a more general type with a different variable. I might not have my terms correct, but I think of it as static typing if each variable itself doesn't change type. There is no reason to require exactly one name/type per data attribute value. That is what I propose, recognizing there are associated risks. A compiler can only find inconsistencies within a compiled unit, but a database that includes all code that uses the dbms api would have all such units available. ...
> > > So I can have a variable > > > of type int, with an int value in it (which is also a string) and [quoted text clipped - 4 lines] > But it *would* violate that this was an int, and so the *variable* > would either have to change its type, The key to this is that you also have another variable referencing the very same value, but of a different type. You asked if you could have a variable of type int, with an int value and invoke a method ... and you can, but you would not stick the tilde into that value using your int variable. If there is a chance for your int variable to do something with such a value after you do that, that would give a compiler error. ...
> On a related note, I really don't think that if you sat down and, > without thinking about representation, wrote down all the operators > that apply to string and all the operators that apply to int, I don't > think you'd see much overlap. No, but look at your data and see how often a value intended to be an int needs to be treated as a string (e.g. UI I-O).
> > Since I want > > that source code all persisted with any databases it updates, each > > database would have all the data and functions it needs to address > > inconsistencies. > > This seems like it would really increase the coupling, Where are schema stored today? Do you have this same concern with mountain man's approach of putting everything in the dbms?
> which I don't > think you'd want to do. Wouldn't it be better to have the system > such that the applications were *less* coupled to the dbms rather > than *more* coupled? Software apps could be seen as metadata, including validations, constraints, etc. Lots more could be said on this one. ...
> > Basically, I'm taking the schema and constraints out of the dbms tool > > and putting it with all of the rest of the code, so it is handled just [quoted text clipped - 6 lines] > > Have you worked much with multi-application databases? Yes, but not with databases where more than one company could use the database api directly. I'll grant that in this case (do you know any such cases, I'm sure there are some?), my thinking is flawed (and it is likely flawed elsewhere too).
> Because this > seems hard to do in that situation. The UI for a particular > program is not shared among different applications; the schema > and constraints are. Don't forget that I'm requiring all code that uses the dbms api to be available to the dbms. ...
> I hope the above helps. Again, it's not that the nodes don't have > anything in them, it's just that, if I'm thinking specifically about > trees, I don't want to consider *at the same time* issues that SQL > has whether my data is tree-structured or not. That didn't clarify it for me. I think at this point I would need your definition of this type of tree restated, with an example of such a tree and your claim about how SQL has any relationship to this type of tree. However, if you are sure that what you are thinking is accurate, you don't need to try to bring me along for the ride. Sometimes my brain just doesn't work. ...
> If you're thinking about *language design*, you should probably > incorporate this information right away. I have no plans to design a language, but I'm happy to learn anyway ...
> Any time I see two language > features, and I have to have both of them, and one is a superset > of the other, I figure it makes sense to put the larger one in, > and define the smaller one in terms of the larger. Roughly speaking. Funny, but that is precisely why I advocate for a graph model over a strictly relational model. There is no practical problem of which I am aware in including set operations along with a graph model. ...
> I can't recommend looking at lots of different languages enough. > If all you've ever encountered is the Algol-family, like I had > when I started this whole thing, you've encountered only a very > narrow slice of what's possible. I'm at least passable in the following general purpose languages: Java, COBOL/CICS, Fortran & BASIC (all of which I have taught to college students at one time or another) and dabbled with several others (Pascal, C, C++, RPG(!), ...). I did read a bit about Haskell and other functional languages, but haven't played with them. Of course, this is in addition to several non-general purpose languages such as SQL, HTML (is it a language or a parm file?), maybe even JavaScript.
> > > > ... XQuery. > > > [quoted text clipped - 6 lines] > by its expressive power, I now consider it a must-have language > feature. I don't actually know the answer, I just figured you did, given the question. Perhaps an investigation is in order?
cheers! --dawn ==========OT below =============================
> > > And once you type the integer in, you can just forget about it? > > > No; the human is also in charge of the integer as it moves around [quoted text clipped - 9 lines] > a wish to avoid the difficulties of speaking of indefinite people > in the plural. Void where prohibited. Driver carries no change.] I was hoping you would jump on that one. I just had a break-through in this ongoing discussion with my husband. I proofread something he wrote (that might sound ridiculous for anyone reading me here as I don't proofread these and sometimes ramble on and on like this). I told him that I really disliked the alternating he/she pronouns and much preferred the plural pronoun to match a singular noun.
I used this specific document to show him that when the author is intending to make the reader relate to a scenario but then tosses in a pronoun that is not a match for the reader, the reader adapts by thinking of some other person of the matching gender. They can still understand the sentence and might not mind it, but they do not relate to the statement in the same way. (See how easy it was to read "they" for the reader -- it will get even easier to roll with it over time). The scenario is then about someone else and not about the reader.
Given that we already use "you" for both singular and plural, what is the harm in adapting our language to use "they" and "their" for both as well? Let's just do it and be done with it. He agreed with the argument this time (perhaps I've just worn him down) but changed the wording to avoid the problem.
Marshall Spight - 26 Jul 2005 15:49 GMT > > > > So I can have a variable > > > > of type int, with an int value in it (which is also a string) and [quoted text clipped - 12 lines] > something with such a value after you do that, that would give a > compiler error. Okay, but that means you can't in general substitute a subtype for a supertype. That means your language doesn't have subtyping, only inheritance. Of the two, subtyping is the more valuable, so I don't think this is a good design choice.
> ... > > On a related note, I really don't think that if you sat down and, [quoted text clipped - 4 lines] > No, but look at your data and see how often a value intended to be an > int needs to be treated as a string (e.g. UI I-O) Yes, that's common. But sharing common functionality isn't how we normally think of subtyping. Substitutability is.
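The inheritance-versus-subtyping point can be sketched as follows, assuming Python for illustration. `Text`, `IntText`, and `add_tilde` are hypothetical names; the runtime error stands in for the compiler error dawn describes, and the example only shows why an int that inherits string behaviour is not substitutable wherever a string is expected.

```python
class Text:
    """Supertype: a string value that accepts any characters."""

    def __init__(self, value: str) -> None:
        self.value = value

    def append(self, suffix: str) -> None:
        self.value += suffix


class IntText(Text):
    """Inherits from Text but only tolerates digit characters."""

    def append(self, suffix: str) -> None:
        if not suffix.isdigit():
            raise ValueError(f"{suffix!r} would make the value a non-int")
        super().append(suffix)


def add_tilde(t: Text) -> str:
    """Code written against Text; it should work for every true subtype."""
    t.append("~")
    return t.value
```

Calling `add_tilde(Text("42"))` succeeds, while `add_tilde(IntText("42"))` raises, so `IntText` reuses the implementation of `Text` without being usable everywhere a `Text` is.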
> > > Since I want > > > that source code all persisted with any databases it updates, each [quoted text clipped - 5 lines] > Where are schema stored today? Do you have this same concern with > mountain man's approach of putting everything in the dbms? I'm not sure I understand mountain man's ideas. But I certainly think hard-coupling applications to schema is going to reduce maintainability. Look at how bad this issue is today. (Hey, how do you do ad hoc queries if the database has to know everything ahead of time?) I think the better approach is to figure out how to have applications that are able to adjust to schema dynamically.
> > which I don't > > think you'd want to do. Wouldn't it be better to have the system [quoted text clipped - 3 lines] > Software apps could be seen as metadata, including validations, > constraints, etc. Lots more could be said on this one. I don't see it. Type information, schema, and data values need integrity management. What do apps need managed?
> > If you're thinking about *language design*, you should probably > > incorporate this information right away. > > I have no plans to design a language, but I'm happy to learn anyway I think a lot of the issues you are talking about above are language design issues.
> ... > > Any time I see two language [quoted text clipped - 5 lines] > strictly relational model. There is no practical problem of which I am > aware in including set operations along with a graph model. But since the reverse is also true, and the relational model is simpler, that tends in favor of the relational model as the design choice.
> I'm at least passable in the following general purpose languages: Java, > COBOL/CICS, Fortran & BASIC (all of which I have taught to college > students at one time or another) and dabbled with several others > (Pascal, C, C++, RPG(!), ...) [...] Of course, > this is in addition to several non-general purpose languages such as > SQL, HTML (is it a language or a parm file?), maybe even JavaScript. Mostly, these languages are all of the same family. SQL and Javascript are the exceptions. We all know about SQL, ha ha. With Javascript you might not have run into the differences because Javascript often is used in a very dynamically-typed-Java style.
> ==========OT below ============================= > > > > And once you type the integer in, you can just forget about it? [quoted text clipped - 13 lines] > argument this time (perhaps I've just worn him down) but changed the > wording to avoid the problem. Yeah, but then everything else has to become plural, and that's often awkward.
"And once they type the integer in, can they just forget about it? No; the humans are also in charge of the integer as it moves around the computer, across function calls, across the network, into the database, etc. And to manage this process effectively, they need a strong suite of tools,"
Marshall
dawn - 26 Jul 2005 16:47 GMT only time for one of the topics
> > ... > > > Any time I see two language [quoted text clipped - 7 lines] > > But since the reverse is also true, but didn't you start out trying to figure out how to minimally extend the relational model to handle graphs or something like that?
> and the relational model is simpler, Perhaps it is theoretically simpler, which translates to simplicity for the lower level software, but I have never seen it be simpler in practice for building applications. "Simpler" for whom?
> that tends in favor of the relational model as the > design choice. My experience related to my and my teams' productivity with (flawed) implementations of each, tells me otherwise (by orders of magnitude in dollars). Admittedly, there are trade-offs and there were more differences than relational vs graph models, such as 3VL vs 2VL, strong vs. weak typing, etc.
--dawn
Gene Wirchenko - 26 Jul 2005 18:01 GMT [snip]
>On a related note, I really don't think that if you sat down and, >without thinking about representation, wrote down all the operators ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I think that this is the important part.
>that apply to string and all the operators that apply to int, I don't >think you'd see much overlap. You certainly don't in most popular >languages. Now, it's certainly possible to treat strings as if [snip]
>> > I'm imagining you and some mathematician about a millennium ago, sitting >> > in an ivory tower. The guy comes to you and says, "I'm thinking about [quoted text clipped - 18 lines] >way to think about things*, which is where the *real* progress >is made. And even so, it can take a long time to get it into use.
[snip]
>[Let me just state for the record that my singular indefinite "he" >is not gender specific. Rather it is a consequence of the lack of >a gender inspecific pronoun in the English language, coupled with >a wish to avoid the difficulties of speaking of indefinite people >in the plural. Void where prohibited. Driver carries no change.] There are two. One is "he" which has a gender-neutral meaning when gender is unknown. The other is "it".
[snip]
>> > As for cool examples, check out quicksort in Haskell: >> > http://www.haskell.org/aboutHaskell.html [quoted text clipped - 6 lines] >when I started this whole thing, you've encountered only a very >narrow slice of what's possible. That was interesting code. I am interested in how a language works in general, since it is not enough just to code the part that language is good for.
[snip]
>> Yup, let's just hope no one else is attempting to follow this one. Bzzzt!
>On the one hand, one imagines that there are a thousand lurkers >for every poster. On the other hand, this thread feels like just >you + me + crickets. I think I hear them chirping now. I do *not* chirp while reading.
Sincerely,
Gene Wirchenko
dawn - 26 Jul 2005 20:56 GMT > [snip] > > >On a related note, I really don't think that if you sat down and, > >without thinking about representation, wrote down all the operators > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > I think that this is the important part. definitely important for the conceptual and logical data models
> >[Let me just state for the record that my singular indefinite "he" > >is not gender specific. Rather it is a consequence of the lack of [quoted text clipped - 4 lines] > There are two. One is "he" which has a gender-neutral meaning > when gender is unknown. but does not (typically) render a gender-neutral picture in people's minds (at least not in my mind). Men might view themselves as a possible "he" while women will tend to picture someone apart from themselves filling that role.
I have no idea to what extent such sloppiness in the language from the start has led to incorrect perceptions, but I'm quite sure the set thereof is not null.
> The other is "it". I'd prefer to be called an "it" than a "he". In the 80's most of my IT mailings were addressed to Donald Wolthuis. So, some people even ensure that the proper nouns sound male. Enough already!
> [snip] > [quoted text clipped - 18 lines] > > Bzzzt! Rats -- you found us. And, as with any of my postings, if I wrote anything that wasn't exactly brilliant, please disregard and don't count it against me. Are you left with a null? smiles. --dawn
Gene Wirchenko - 26 Jul 2005 23:21 GMT [snip]
>> >[Let me just state for the record that my singular indefinite "he" >> >is not gender specific. Rather it is a consequence of the lack of [quoted text clipped - 13 lines] >start has led to incorrect perceptions, but I'm quite sure the set >thereof is not null. Nor I.
>> The other is "it". > >I'd prefer to be called an "it" than a "he". In the 80's most of my IT >mailings were addressed to Donald Wolthuis. So, some people even >ensure that the proper nouns sound male. Enough already! I, OTOH, have had my name corrected to "Jen" in one mailing where it seems it was supposed that only women would be interested. I am interested in how women there are in business.
[snip]
>> >> Yup, let's just hope no one else is attempting to follow this one. >> [quoted text clipped - 3 lines] >anything that wasn't exactly brilliant, please disregard and don't >count it against me. Are you left with a null? smiles. --dawn Zero, ah, zee-row is not the same as null.
Sincerely,
Gene Wirchenko
dawn - 27 Jul 2005 05:22 GMT >> Are you left with a null? > > Zero, ah, zee-row is not the same as null. I missed the word "set" before the ? I use a 2VL where a null is like a null set. cheers --dawn
Marshall Spight - 27 Jul 2005 16:44 GMT > I missed the word "set" before the ? I use a 2VL where a null is like > a null set. What is a 'null set'? Is that like the empty set?
Marshall
dawn - 27 Jul 2005 17:34 GMT > > I missed the word "set" before the ? I use a 2VL where a null is like > > a null set. > > What is a 'null set'? Is that like the empty set? yes
http://www.swif.uniba.it/lei/foldop/foldoc.cgi?null+set http://en.wikipedia.org/wiki/Null_set
--dawn
Marshall Spight - 27 Jul 2005 19:47 GMT > > > I missed the word "set" before the ? I use a 2VL where a null is like > > > a null set. [quoted text clipped - 5 lines] > http://www.swif.uniba.it/lei/foldop/foldoc.cgi?null+set > http://en.wikipedia.org/wiki/Null_set Wow. I hadn't heard that term before. Given how much confusion there is around the semantics of null in SQL, I think I'm going to steer clear of it, especially in this field. I can easily see it causing confusion where the more popular "empty set" term wouldn't.
Marshall
dawn - 27 Jul 2005 20:10 GMT > > > > I missed the word "set" before the ? I use a 2VL where a null is like > > > > a null set. [quoted text clipped - 7 lines] > > Wow. I hadn't heard that term before. I'm really surprised by that. I wonder if/when that term lost favor. I'll make a mental note.
> Given how much confusion > there is around the semantics of null in SQL, I think I'm going > to steer clear of it, especially in this field. Hmmm. It seems an interesting sport to derive a term from one discipline, alter the meaning, and then stop using it in the discipline it was taken from so as not to confuse things. Language is so fluid.
> I can easily > see it causing confusion where the more popular I didn't realize it was. I'm sure I've spoken the words "null set" more than "empty set" in my life. We both learned somethin' there. Thanks. --dawn
> "empty set" term > wouldn't.
Marshall Spight - 28 Jul 2005 03:50 GMT > > > > What is a 'null set'? Is that like the empty set? > > > > [quoted text clipped - 5 lines] > [...] We both learned somethin' there. > Thanks. --dawn If my ignorance can help just one person, it will have been worth it.
Marshall
Gene Wirchenko - 27 Jul 2005 17:45 GMT >>> Are you left with a null? >> >> Zero, ah, zee-row is not the same as null. > >I missed the word "set" before the ? I use a 2VL where a null is like >a null set. cheers --dawn What are the two values of your logic? I use true and false myself.
Sincerely,
Gene Wirchenko
dawn - 27 Jul 2005 19:12 GMT > >>> Are you left with a null? > >> [quoted text clipped - 4 lines] > > What are the two values of your logic? I use true for the physical switch being set to the up position, right?
> and false and down position. Yes, Gene, those are the two I choose as well, ignoring the shades of gray. --dawn
Jonathan Leffler - 28 Jul 2005 06:47 GMT >> What are the two values of your logic? I use true > [quoted text clipped - 4 lines] > and down position. Yes, Gene, those are the two I choose as well, > ignoring the shades of gray. --dawn These comments suggest an American (USA) perspective on the polarity of switches. In the UK, a (light) switch that is up is normally off and one that's down is normally on - at least, for most switches. It still catches me out.
(I hope I have the attributions right - apologies if not.)
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/
dawn - 28 Jul 2005 16:00 GMT > >> What are the two values of your logic? I use true > > [quoted text clipped - 9 lines] > one that's down is normally on - at least, for most switches. It still > catches me out. I have never thought about whether hardware switches had this same pattern. I wasn't referring to just any switches, but those found on the front panel of a computer way back when. I hit the tail-end of the hardware switch era, but I do recall setting physical switches in the late 70's. This was then imitated with parameter settings. Well into the 80's I recall variables named with -sw or -switch if they held values of 1 or 0.
So, I wonder if computers in the UK had hardware switches reversed from US hardware. Were there computers where up was 0 and down was 1? --dawn
> (I hope I have the attributions right - apologies if not.) > > -- > Jonathan Leffler #include <disclaimer.h> > Email: jleffler@earthlink.net, jleffler@us.ibm.com > Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/
paul c - 26 Jul 2005 19:11 GMT Marshall Spight wrote:
> ... >>>database, etc. And to manage this process effectively, he needs [quoted text clipped - 6 lines] > a gender inspecific pronoun in the English language, > ... notwithstanding its idiosyncratic grammar, part of the economy of written English is due to the fact that 'he' was regarded for centuries as standing for anybody, ie. it was understood, at least in print, to be gender non-specific. literal revisionists have created just as many problems based on imaginary slurs that don't really need to be solved in the social sciences as they have in the computer sciences. i'm surprised that they haven't tried to replace the royal 'we' that so many writers still use.
p
 Signature Apologies for my broken keyboard. I'm using the keyboard combination 'kw' to substitute for the broken key that stands for the letter that falls between 'p' and 'r' in this alphabet.
-CELKO- - 17 Jul 2005 11:23 GMT In trees and hierarchies broke them into fast/slow changing nodes and fast/slow changing structures. An org chart has slow structural changes, and higher node (personnel) changes. A message board has fast structural changes (postings), and very slow node (message) changes. Etc.
Marshall Spight - 17 Jul 2005 15:40 GMT > In trees and hierarchies broke them into fast/slow changing nodes and > fast/slow changing structures. An org chart has slow structural > changes, and higher node (personnel) changes. A message board has fast > structural changes (postings), and very slow node (message) changes. > Etc. Interesting. This kind of analysis would be for performance concerns, yes? And in words two through four of your post, you're referring to ISBN: B0002Z31P4, right? I've gotta read that.
Marshall
-CELKO- - 18 Jul 2005 15:05 GMT >> in words two through four of your post.. << Yes, it is the book. Sorry, my typing is bad right now. I separated two fighting dogs and my fingers are in band-aids.
dawn - 18 Jul 2005 13:57 GMT > I've been thinking about trees in the abstract lately, and > trying to classify them. I am not talking about trees as [quoted text clipped - 19 lines] > dynamic heterogeneous: parse tree. There are specific kinds of > nodes, but the structure of the tree is relatively unconstrained. Or "often constrained by the grammar of a language"?
> static heterogeneous: customer/account/invoice/line item. Each > level in the tree is fixed, with fixed relationships. > > Here "dynamic" and "static" refer to the structure of the tree. > > SQL handles the third kind of tree extremely well. If you are looking at these trees as having metadata as nodes down to values as leaf nodes, then I would say that SQL handles some subset of such trees. If a name on a node refers to a multipart value, or a value that is a tuple of dimension > 1, so that it has two or more non-leaf nodes as children, then SQL would require that name to refer to a relation. Also, if a name on a node refers to multiple values, that name must refer to a relation (not a list, for example).
It might be the case, however, that trees of this type can be converted into a tree structure that SQL can handle where only metadata is lost in the conversion. For example, if "Organization" has a child node of "address" which has child nodes that include "city" and "postcode" then SQL isn't going to do well with references made to "address". Or am I misunderstanding?
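The conversion described here, where only metadata is lost, might look like the following sketch, assuming Python for illustration. The function `flatten`, the underscore naming scheme, and the sample values are all hypothetical; the point is only that the flat row keeps every leaf value while the name "address" no longer refers to anything on its own.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one flat row with underscore-joined names."""
    row: dict[str, Any] = {}
    for name, value in record.items():
        column = f"{prefix}{name}"
        if isinstance(value, dict):
            # A multipart value: recurse and push its name into the prefix.
            row.update(flatten(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


organization = {
    "name": "Acme",
    "address": {"city": "Springfield", "postcode": "12345"},
}
row = flatten(organization)
```

After flattening, `row` has the columns `name`, `address_city`, and `address_postcode`, but no `address` column, which is the loss of a referable intermediate node that the paragraph above is concerned with.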
> The first > two kinds, not so much. In particular, note that the transitive [quoted text clipped - 10 lines] > (fk) to their parent, while the homogeneous and dynamic hetero > types are done with a node having references to its children. I work with a model that I think fits into your static homogeneous category, but where an fk can go either direction (using multivalues).
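The two reference directions mentioned above can be sketched side by side, assuming Python for illustration. The sample rows and the names `children_index` and `descendants` are hypothetical: rows holding a reference to their parent are inverted into a parent-to-children index, and the transitive-closure question (everyone below a node) is then answered by recursion over child references.

```python
from collections import defaultdict

# Hypothetical org-chart rows: (node, parent); the root has parent None.
rows = [("ann", None), ("bob", "ann"), ("cy", "ann"), ("dee", "bob")]


def children_index(rows):
    """Invert child -> parent references into parent -> [children]."""
    index = defaultdict(list)
    for node, parent in rows:
        if parent is not None:
            index[parent].append(node)
    return dict(index)


def descendants(index, node):
    """Everything below `node`, found by walking child references depth-first."""
    found = []
    for child in index.get(node, []):
        found.append(child)
        found.extend(descendants(index, child))
    return found
```

The same data supports both directions; which one is stored and which one is derived is a representation choice, not a property of the tree itself.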
> Comments? Has anyone else written about this sort of thing? I find it somewhat interesting and worth pursuing, but this partitioning doesn't resonate with me. cheers --dawn
> Marshall