Database Forum / General DB Topics / DB Theory / January 2009
native xml processing vs what Postgres and Oracle offer
salmobytes - 10 Nov 2008 14:17 GMT I'm thinking about starting a hobby project. I wrote a file-based bulletin board years ago. I'd like to convert it to a more database-like system, so password-identified users could edit old posts.
Forums are inherently hierarchical and hierarchies are tough with relations. I have modeled relational hierarchies in the past, with what Joe Celko calls "path enumeration." That works, but it's a bit ugly and making indexes on loooong path-strings that all start off the same way is messy.
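A minimal sketch of the "path enumeration" (materialized path) approach mentioned above, using Python's built-in sqlite3 as a stand-in for a real server. The table and column names (`post`, `path`) and the zero-padded, slash-separated path format are assumptions for illustration. Each post stores the ids of all its ancestors plus its own, so a subtree is a prefix match on `path`; those long shared prefixes are exactly what the poster found awkward to index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE, title TEXT)")

def add_post(post_id, parent_path, title):
    """Insert a post; its path is the parent's path plus its own zero-padded id."""
    path = (parent_path or "") + f"{post_id:06d}/"
    conn.execute("INSERT INTO post VALUES (?, ?, ?)", (post_id, path, title))
    return path

# Hypothetical sample thread.
root = add_post(1, None, "Topic heading")
r1 = add_post(2, root, "Reply to topic")
add_post(3, r1, "Reply to reply")
add_post(4, root, "Another reply to topic")
add_post(5, None, "Second topic")

def subtree(path):
    """All posts at or below `path`: every descendant's path starts with it."""
    return [row[0] for row in conn.execute(
        "SELECT id FROM post WHERE path LIKE ? || '%' ORDER BY path", (path,))]

print(subtree(r1))    # [2, 3]
print(subtree(root))  # [1, 2, 3, 4]
```

Zero-padding makes lexicographic order on `path` coincide with depth-first thread order, which is why `ORDER BY path` lists post 3 before post 4.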
I'd rather work directly with hierarchies: with XML. For native XML processing you can work with SleepyCat, Exist or even Xindice (or expensive proprietary products like Mark Logic or Ipedo).
But Oracle and Postgres now have a way to stuff XML blobs into a column, and also a way to query those XML blobs with XPath.
The native XML databases (SleepyCat etc.) all require running a Tomcat server, while Postgres is a bit easier to set up and install.
PUNCHLINE QUESTION, sort of. I've worked with SleepyCat, Exist and Tomcat. It's pretty powerful stuff and it has a lot to offer. Mapping between GUI and data is oh so easy with hierarchies, compared to relations. But what about performance? I've never worked with Oracle/Postgres XML XPath querying. And I've heard rumors it's dog slow above a certain size/traffic threshold. Any comments? Anybody done much with Postgres/XML? Have any comparisons to SleepyCat?
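PostgreSQL (8.3 and later) provides an `xml` column type and an `xpath()` function, and Oracle has `XMLType`; neither is reproduced here. The sketch below only imitates the "XML blob in a column, XPath over the blob" idea with the Python standard library: a SQLite TEXT column holds one document per thread, and ElementTree's limited XPath subset does the querying in the client. The `thread` table, the `<post>` markup, the author names and `xpath_over_column` are all hypothetical. Because this client-side version re-parses every stored document on every call, it illustrates the query style only, not the server-side performance being asked about.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread (id INTEGER PRIMARY KEY, doc TEXT NOT NULL)")

# One XML document per thread; nesting of <post> elements encodes "is a reply to".
DOC = """
<thread>
  <post id="1" author="alice"><title>Topic heading</title>
    <post id="2" author="bob"><title>Reply to topic</title>
      <post id="3" author="carol"><title>Reply to reply</title></post>
    </post>
    <post id="4" author="bob"><title>Another reply to topic</title></post>
  </post>
</thread>
"""
conn.execute("INSERT INTO thread VALUES (?, ?)", (1, DOC.strip()))

def xpath_over_column(path):
    """Evaluate an ElementTree path expression against every stored document.

    Each blob is fetched and parsed in the client on every call.
    """
    ids = []
    for (text,) in conn.execute("SELECT doc FROM thread ORDER BY id"):
        root = ET.fromstring(text)
        ids.extend(p.get("id") for p in root.findall(path))
    return ids

print(xpath_over_column(".//post[@author='bob']"))  # ['2', '4']
print(xpath_over_column(".//post[@id='2']/post"))   # ['3']
```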
paul c - 10 Nov 2008 18:40 GMT ...
> Forums are inherently hierarchical and hierarchies are > tough with relations. I have modeled relational hierarchies in > the past, with what Joe Celko calls "path enumeration." > That works, but it's a bit ugly and making indexes on loooong > path-strings that all start off the same way is messy. > ... Suggest that it is more accurate to say that forums are typically implemented with hierarchical techniques. It would be even more accurate to say that forums are inherently ordered, eg., by date & time within topic. I don't know what a relational hierarchy could be, but for sure the RM is not concerned with ordering. I suspect you are not talking about real hierarchies but rather about graphical presentation techniques, about which the RM has nothing to say.
The forums I see are concerned with messages but not message content, whereas it is the readers who are concerned with message content, eg., it is the audience who operate on the content, not the system. I think you are talking about using the RM for physical storage, implying that you don't need its inference ability and in the same breath saying that it is ponderous to use the RM to implement ordering. Well, if that's what you mean, yes it is ponderous to use the RM to implement ordering because the RM doesn't support ordering in the first place!
whileone - 10 Nov 2008 19:07 GMT > Suggest that it is more accurate to say that forums are typically > implemented with hierarchical techniques. It would be even more > accurate to say that forums are inherently ordered, eg., by date & time > within topic. I'm not sure what your point was.
Yes, forum "topic headings" are ordered by date and time. But each topic also has 0 or more child responses, and child responses might be responses to responses, rather than responses to topic headings. That's a tree (a root node with nested children). And a tree is a hierarchy. You do need all those parent/child relationships.
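For comparison with the XML route, the usual relational encoding of these parent/child relationships is an adjacency list: each post optionally names the post it answers. The sketch below (Python's sqlite3; the `post` table and its columns are assumptions) fetches a whole thread with a recursive common table expression. Such recursive queries were not available everywhere when this thread was written; PostgreSQL, for one, gained `WITH RECURSIVE` in 8.4, while Oracle offered its own `CONNECT BY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES post(id),  -- NULL for a topic heading
    title     TEXT NOT NULL
);
INSERT INTO post VALUES
    (1, NULL, 'Topic heading'),
    (2, 1,    'Reply to topic'),
    (3, 2,    'Reply to reply'),
    (4, 1,    'Another reply to topic'),
    (5, NULL, 'Second topic');
""")

# Start at the chosen post, then repeatedly add posts whose parent is already in the result.
THREAD_SQL = """
WITH RECURSIVE thread(id, depth) AS (
    SELECT id, 0 FROM post WHERE id = :root
    UNION ALL
    SELECT p.id, t.depth + 1
    FROM post AS p JOIN thread AS t ON p.parent_id = t.id
)
SELECT id, depth FROM thread ORDER BY id
"""

def thread(root_id):
    """Return (id, depth) for the post root_id and every reply below it."""
    return conn.execute(THREAD_SQL, {"root": root_id}).fetchall()

print(thread(1))  # [(1, 0), (2, 1), (3, 2), (4, 1)]
```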
Using relations to model hierarchies is possible but tricky (Joe Celko has a good book). With XML and XPath it is a snap. "Relational databases" like Oracle and Postgres, and now MySQL 5.1 (it turns out), also support XPath queries over XML: hybrid systems where the XML is stored as a big text blob, while new system functions know how to forget all about SQL and do XPath over those blobs. The how-it's-done details are a little more straightforward with 'native XML' databases. But conceptually (from a client developer's point of view) it's all much the same. Hierarchical XML is better at hierarchies than relations.
paul c - 10 Nov 2008 20:20 GMT >> Suggest that it is more accurate to say that forums are typically >> implemented with hierarchical techniques. It would be even more >> accurate to say that forums are inherently ordered, eg., by date & time >> within topic. > > I'm not sure what you point was. Forums are *not* inherently hierarchical. You can choose to present them that way, but it's not necessary and possibly misleading.
> Yes, forum "topic headings" are ordered by date and > time. But each topic also has 0 or more child responses, and child [quoted text clipped - 4 lines] > do need all those > parent/child relationships. Believe that if you want but there is no guarantee in any forum I've ever seen that response n, quoting response n-1, has any relationship to say, response n-2, or vice-versa. It might be seen as some kind of graph but not necessarily a tree.
(Usenet RFC 1036 dictates a generated value called MESSAGE-ID. I gather that many HTML-based forums can't guarantee such an attribute; maybe that complicates them. The RM doesn't say anything about generated attribute values, except indirectly in the Information Principle.)
> Using relations to model hierarchies is possible but tricky (Joe Celko > has a good book). [quoted text clipped - 10 lines] > much the same. Hierarchical XML is better at hierarchies than > relations. Just an aside, I suspect that these relatively low-level programming gizmos result in implementations that are just as complicated as the rarely implemented TCLOSE relational operator. One reason I think TCLOSE is fundamental is that while a relation such as {Part#, Sub-Part#} is capable of the same information content as a tree, I believe that without something akin to TCLOSE, it is impossible to express certain constraints, such as preventing cycles.
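A naive sketch of what a TCLOSE operator computes over a {Part#, Sub-Part#} relation, and of the no-cycles constraint that, per the post above, needs it. The relation is modelled as a Python set of (part, sub_part) tuples, and the sample bill of materials is made up. The quadratic fixed-point loop is chosen for clarity, not speed.

```python
def tclose(pairs):
    """Transitive closure of a binary relation given as a set of (a, b) tuples.

    Naive fixed-point iteration: compose the relation with itself until
    no new pairs appear.
    """
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def acyclic(pairs):
    """No-cycles constraint: no part is a direct or indirect sub-part of itself."""
    return all(a != b for (a, b) in tclose(pairs))

bom = {("bike", "wheel"), ("wheel", "spoke"), ("wheel", "hub")}
print(sorted(tclose(bom)))
# [('bike', 'hub'), ('bike', 'spoke'), ('bike', 'wheel'), ('wheel', 'hub'), ('wheel', 'spoke')]
print(acyclic(bom))                        # True
print(acyclic(bom | {("spoke", "bike")}))  # False
```

The constraint is stated purely in terms of the closure: a relation is cycle-free exactly when its closure contains no pair (x, x).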
whileone - 10 Nov 2008 20:33 GMT > Forums are *not* inherently hierarchical. You can choose to present > them that way, but it's not necessary and possibly misleading. Oy yoy yoy. This thread started off by me (with a different reader, different login name) saying "I have a files-based forum that I WROTE that I want to convert to a more database like system."
Well, MY FORUM is hierarchical. I want to preserve that. In my forum every post is either a topic heading, a response directly to a topic heading, or a response to a response.
The parent/child relationships are important. I was not at any point referring to usenet. The subject was: forums that happen to be hierarchical, but you had to read too carefully to gather that, perhaps.
whileone - 10 Nov 2008 20:41 GMT http://dirtyharrysplace.com/wp-content/uploads/2008/05/fam_pepto_bismol_liquid_8_oz-5715.jpg
paul c - 10 Nov 2008 20:49 GMT >> Forums are *not* inherently hierarchical. You can choose to present >> them that way, but it's not necessary and possibly misleading. [quoted text clipped - 8 lines] > In my forum every post is either a topic heading, a response > directly to a topic heading, or a response to a response. Have it your way, but that sounds like three distinct relations to me. (That is, if the RM is to be used. Just out of curiosity, what is wrong with two relations, posts that aren't responded to and posts that are responses?)
> The parent/child relationships are important. > I was not at any point referring to usenet. > The subject was: forums that happen to be hierarchical, > but you had to read too carefully to gather that, perhaps. As we know, it is sadly true that the technocrats will usually tend towards force-fitting an application to their pet programming protocol. Usually, when Joe C's name is mentioned in a theory group, one may safely stop reading. I'll admit there are exceptions, but he can obfuscate with the best of them.
Bob Badour - 10 Nov 2008 22:06 GMT >>Forums are *not* inherently hierarchical. You can choose to present >>them that way, but it's not necessary and possibly misleading. [quoted text clipped - 6 lines] > > Well, MY FORUM is hierarchical. I want to preserve that. Ordinarily, when one re-writes something from scratch, one tries to correct the fundamental design flaws from the first version. If you want to go to extra effort to perpetuate your design flaws, nobody will stop you.
What I don't understand is why you would want to publicise the fact or talk about it in public.
Brian Selzer - 30 Nov 2008 21:56 GMT >>>Forums are *not* inherently hierarchical. You can choose to present >>>them that way, but it's not necessary and possibly misleading. [quoted text clipped - 14 lines] > What I don't understand is why you would want to publicise the fact or > talk about it in public. You're a moron. Structure that is inherent to that which is being modeled can be reflected in a design but cannot be a design flaw because it isn't introduced or imposed by a designer.
paul c - 01 Dec 2008 14:43 GMT ...
> You're a moron. ... No he isn't (unless all morons are so incisive). His acuity is to be envied by most of us. If I had your initials I'd be more circumspect with my slurs and not so willful in my technical positions.
You could have at least said "pardon me" first.
(Not to prolong the irrelevance forever, but luckily my own initials and the mood of present times in the western world insulate me for now but that could change if I live long enough, so fair's fair.)
Brian Selzer - 01 Dec 2008 22:31 GMT > ... >> You're a moron. ... > > No he isn't (unless all morons are so incisive). His acuity is to be > envied by most of us. If I had your initials I'd be more circumspect with > my slurs and not so willful in my technical positions. Acuity??? ROFLOL!!!
> You could have at least said "pardon me" first. I reserve that for those deserving of common courtesy. As far as I'm concerned, Badour isn't.
> (Not to prolong the irrelevance forever, but luckily my own initials and > the mood of present times in the western world insulate me for now but > that could change if I live long enough, so fair's fair.)
patrick61z@yahoo.com - 19 Nov 2008 15:41 GMT > On Nov 10, 1:20 pm, paul c <toledobythe...@oohay.ac> wrote: > Forums are *not* inherently hierarchical. You can choose to present > > them that way, but it's not necessary and possibly misleading. [quoted text clipped - 13 lines] > The subject was: forums that happen to be hierarchical, > but you had to read too carefully to gather that, perhaps. Actually, Usenet is often displayed as being hierarchical, for instance with so-called "threaded" newsreaders, because within a list of discussions, replies to replies are often more comprehensible when you can follow the subthreads.
This is not just your preference; this is a very typical way of viewing discussions. If you get any flak for that, it's not because of the nature of your design; it's the fault of the relational advocates who get blinded to these sorts of problems, for which the relational data model is not the optimal solution. As you pointed out, even the major vendors are supporting different solutions. Any flak you get is simply from those dogmatic enough to pound every fixture with a hammer, even if that particular fixture might be better fastened with a screwdriver.
Not everyone with a relational bent is like that, but when posting here you sort of have to put up with that particular breed of poster. 2 cents.
paul c - 20 Nov 2008 02:36 GMT >> On Nov 10, 1:20 pm, paul c <toledobythe...@oohay.ac> wrote:> Forums are *not* inherently hierarchical. You can choose to present >>> them that way, but it's not necessary and possibly misleading. [quoted text clipped - 18 lines] > you can follow the subthreads. > ... Nobody said there's anything wrong with hierarchical displays (or hierarchical physical storage for that matter).
> This is not just your preference, this is a very typical way of > viewing discussions, if you get any flack from that, its not because [quoted text clipped - 9 lines] > here you sort of have to put up with that particular breed of poster. > 2 cents. As the general level of literacy continues to decline, more and more of those who fail to recognize the possibility of a logical model will have to put up with that dwindling breed.
rpost - 26 Nov 2008 08:38 GMT [...]
>> Actually, usenet is often displayed as being hierarchical, for >> instance with so called "threaded" newsreaders, because within a list [quoted text clipped - 4 lines] >Nobody said there's anything wrong with hierarchical displays (or >hierarchical physical storage for that matter). [...]
>As the general level of literacy continues to decline more and more of >those who fail to recognize the possibility of a logical model will have >to put up with that dwindling breed. You're evading the question.
Does your logical model for USENET include message ids?
Reinier
paul c - 26 Nov 2008 16:34 GMT > [...] > [quoted text clipped - 14 lines] > You're evading the question. > ... What question would that be? (The original question was to do with the best product to use to display hierarchical data. The OP planned to invent his own forum, presumably not Usenet-based. I pointed out that he was wrong to assume a forum is hierarchical.)
> Does your logical model for USENET include message ids? It's not always clear in this group what people mean by logical, whether they mean a formal logic or merely something that "makes sense" to them.
So I'm not sure what "message id" means here, whether you mean it in a general sense or to stand for what some rfc's call "message-id" or "msg id" (which seem to have a tarnished history). However, if I were to make a model for Usenet, it would try to be logical, ie., it would follow logical principles, like relational algebra does, where, courtesy of closure, one can always choose to validate results by comparing extensions as opposed to re-evaluating a program that is full of navigation verbs.
If you are talking about keys, I would note the similarity with the RM and this from rfc 3977: "Each article MUST have a unique message-id; two articles offered by an NNTP server MUST NOT have the same message-id."
I don't want to disparage the original rfc authors as many of the rfc's date from the days when ignorance of data models was more innocent than today's willful arrogance but it looks to me as if some of them might have been confused about such a basic principle as keys, eg., in rfc 2822: "Though optional, every message SHOULD have a "Message-ID:" field." (Rfc 977 is pretty vague.) Without some key, one never knows precisely what fact one is stating. I suppose all the optional "fields" in the various rfc's might admit a kind of irregular notion of key where some articles or messages might use different sets of fields as keys. For sure, the rfc's I've seen are pre-occupied with syntax and other physical matters and the usual cursor operators of hierarchical stores, "Next" and so forth. Looks like they anticipated, in their own idiosyncratic way, the wild goose chase of the XML fans to find the mythical semantic web, ie., invent some pet syntax and blindly assume it can be a basis for fundamental transformations, ie., new information from a starting point that is as vacuous as the Emperor's New Clothes.
Walter Mitty - 27 Nov 2008 15:26 GMT > I don't want to disparage the original rfc authors as many of the rfc's > date from the days when ignorance of data models was more innocent than > today's willful arrogance A great deal of today's ignorance is innocent.
All babies are born completely ignorant. (I'm discounting genetic "memory" and learning in the womb.) Many babies grow up, graduate from college or graduate school with a CS or SE major, and emerge from that field of study with no exposure to data modeling or database design. In most cases, they are completely innocent of that omission. If there is any willful arrogance here, it's on the part of the curriculum designers in the faculty.
Many former babies spend from 5 to 15 years in the programming profession before they are given their first assignment where data modeling or database design skills are critical to project success. By that time, they have acquired a kind of arrogance that's based on the following: "Well, I've been a competent professional for about 10 years now, and I've gotten along just fine without data modeling or database design. I'll just pick it up on the fly. How hard can it be?"
This is arrogance, but it's not clear how willful it is.
Add to this the fact that many of these heroes have mastered object modeling, and are likely to view data modeling as a trivial subset of object modeling, and you have all the ingredients for "willful arrogance". Forgive them, for they know not what they do.
Cimode - 28 Nov 2008 09:30 GMT [Snipped]
> Forgive them, for they know not what they do. This is not a church, and self-promotion is a fact of life for the ignorant who sell bullshit... Being in denial about the bad intent behind ignorance in the database industry is at best pointless, at worst totally off-topic...
Anyway, thank you for making me laugh...
What is going to be next? A book called *Psychology of the database ignorant, or how ignorance should determine relational algebra*...
rpost - 30 Nov 2008 15:53 GMT >> [...] >> [quoted text clipped - 19 lines] >invent his own forum, presumably not Usenet-based. I pointed out that >he was wrong to assume a forum is hierarchical.) To which he replied: but a forum message is often a reply, and in that case, a reply to a specific other message; this is not a presentation feature but a basic structural property of his forum (and of USENET as well); not just of the implementation but at the functional requirement level. You seemed to be flat-out denying this, which raised the question: how would *you* model USENET or his forum?
>> Does your logical model for USENET include message ids? > >It's not always clear in this group what people mean by logical, whether >they mean a formal logic or merely something that "makes sense" to them. I suspect you're being evasive again. By 'your logical model' I meant an implementation independent relational model in terms of the technique you prefer to use yourself.
>So I'm not sure what "message id" means here, whether you mean it in a >general sense or to stand for what some rfc's call "message-id" or "msg >id" (which seem to have a tarnished history). The latter, or I wouldn't have referred to the RFCs. You're evading a straight answer once again.
>However if I were to make >a model for Usenet, it would try to be logical, ie., it would follow >logical principals, like relational algebra does, where, courtesy of >closure, one can always choose to validate results by comparing >extensions as opposed to re-evaluating a program that is full of >navigation verbs. Exactly, and my question is: how would *you* deal, when describing USENET in this way, with the *functional* requirement that a message is usually a reply to a specific other message?
>If you are talking about keys, I would note the similarity with the RM >and this from rfc 3977: "Each article MUST have a unique message-id; two >articles offered by an NNTP server MUST NOT have the same message-id." Exactly. I went into this in another reply in this thread, where I agree with you that this is probably not a good idea, but also give an argument in favor. My question to you is: what specific alternative would you prefer? You reject it because it smells bad, without providing an alternative, let alone proving its superiority. That isn't good enough.
>I don't want to disparage the original rfc authors as many of the rfc's >date from the days when ignorance of data models was more innocent than [quoted text clipped - 5 lines] >in the various rfc's might admit a kind of irregular notion of key where >some articles or messages might use different sets of fields as keys. You can read it that way, but I think it's more reasonable to see it as an example of "be liberal in what you accept": *correct* implementations guarantee the presence of a unique Message-ID on each posting and their use in the References: header to establish threads, but the NNTP protocol authors are in no position to guarantee correctness of all newsreaders used on USENET and therefore provide some workarounds to help cope with them. In a logical model I'd omit the workarounds and just say that USENET messages have Message-ID as their primary key.
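Taking this logical reading literally (Message-ID as the key, References used to establish threads), a minimal sketch might derive the is-reply-to relation from article headers as below. It relies on the RFC 5322 convention that a reply's References field is roughly its parent's References followed by the parent's Message-ID, so the last entry names the direct parent. The sample articles, the `example.invalid` addresses and the `load` function are hypothetical, and none of the workarounds for broken newsreaders are modelled: an article without a unique Message-ID is simply rejected.

```python
from email import message_from_string

# Hypothetical articles, headers only plus a dummy body.
RAW = [
    "Message-ID: <a1@example.invalid>\nSubject: XML vs RM\n\nbody\n",
    "Message-ID: <a2@example.invalid>\nReferences: <a1@example.invalid>\n"
    "Subject: Re: XML vs RM\n\nbody\n",
    "Message-ID: <a3@example.invalid>\nReferences: <a1@example.invalid> <a2@example.invalid>\n"
    "Subject: Re: XML vs RM\n\nbody\n",
]

def load(raw_articles):
    """Return (message, reply): Message-ID -> Subject, and Message-ID -> parent Message-ID."""
    message = {}
    reply = {}
    for raw in raw_articles:
        msg = message_from_string(raw)
        mid = msg["Message-ID"]
        # Message-ID plays the role of the key: it must be present and unique.
        if mid is None or mid in message:
            raise ValueError(f"missing or duplicate Message-ID: {mid!r}")
        message[mid] = msg["Subject"]
        refs = (msg["References"] or "").split()
        if refs:
            # Last entry of References is taken as the direct parent.
            reply[mid] = refs[-1]
    return message, reply

message, reply = load(RAW)
print(reply)
```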
>For sure, the rfc's I've seen are pre-occupied with syntax and other >physical matters and the usual cursor operators of hierarchical stores, [quoted text clipped - 3 lines] >can be a basis for fundamental transformations, ie., new information >from a starting point that is as vacuous as the Emperor's New Clothes. I share your sentiment but I don't see the connection. NNTP is an implementation-neutral network protocol, it *must* be all about detail.
Reinier
paul c - 01 Dec 2008 14:19 GMT ...
> To which he replied: but a forum message is often a reply, and in that case, > a reply to a specific other message; this is not a presentation feature > but a basic structural property of his forum (and of USENET as well); For all we know, the OP's forum could be some idiosyncratic mutant, eg., one-user-at-a-time and synchronous. I'd say it would be more useful to consider USENET. Regarding whatever a "basic structural property" is, to be more accurate, the basic structure of USENET is a message. As far as USENET is concerned, a message isn't complete when a user submits it, it is complete when some server has massaged the user message and introduced various "headers" to it. Those headers are the relevant "basic structural property" (attributes, to use Codd's lingo).
> not just of the implementation but at the functional requirement level. > You seemed to be flat-out denying this, which raised the question: > how would *you* model USENET or his forum? > ... I'm not denying any such thing. As far as how I would model USENET goes, first, I'd list my "functional requirements" and single out the ones that are expressible in terms of formal constraints, as well as apply the Information Principle and identify the attributes (properties if you like) that I wanted to record for each message, then form predicates and constraints that would admit whatever display presentation or presentations I desired.
I doubt if after that, there would remain any pertinent reason, ie., need, for the word "forum" other than as a label for the resulting application. Probably the constraints would have to counter various loopholes in the RFC's as well.
Walter Mitty - 01 Dec 2008 15:30 GMT > ... >> [quoted text clipped - 6 lines] > one-user-at-a-time and synchronous. I'd say it would be more useful to > consider USENET. Why wouldn't it be more useful to respond to the OP? Are you trying to answer a question raised by the OP, in terms that make sense to the OP? Or are you trying to generalize the OP's question into one that is relevant across a larger universe of messages?
paul c - 01 Dec 2008 16:05 GMT >> ... >>> To which he replied: but a forum message is often a reply, and in that [quoted text clipped - 9 lines] > Or are you trying to generalize the OP's question into one that is relevant > across a largwer universe of messages? I did reply to him in the first place. Suggested he was barking up the wrong tree by fastening on xml, what he called path enumeration and some difficulty he imagined to do with "enumerating long strings". I wish somebody had told me the same thirty years ago. Maybe he will see the forest, maybe he will not.
Looks like I was wrong about one thing, though. As far as the "larger universe" of this group is concerned, it seems the confusion of presentation with a model's underlying representation is more wide-spread than I thought.
Walter Mitty - 02 Dec 2008 16:28 GMT >>> ... >>>> To which he replied: but a forum message is often a reply, and in that [quoted text clipped - 15 lines] > somebody had told me the same thirty years ago. Maybe he will see the > forest, maybe he will not. AFAICT, you didn't mention XML in your first response to the OP.
paul c - 03 Dec 2008 01:11 GMT ...
> AFAICT, you didn't mention XML in your first reponse to the OP. He did, hierarchies too. I try not to, at least in a db theory group, so there!
rpost - 09 Dec 2008 12:45 GMT >... >> [quoted text clipped - 10 lines] >introduced various "headers" to it. Those headers are the relevant >"basic structural property" (attributes, to use Codd's lingo). Yes.
>> not just of the implementation but at the functional requirement level. >> You seemed to be flat-out denying this, which raised the question: [quoted text clipped - 8 lines] >predicates and constraints that would admit whatever display >presentation or presentations I desired. Yes, and my specific question was: how would you deal with the requirement that messages can be replies to specific other messages, or do you deny that requirement?
>I doubt if after that, there would remain any pertinent reason, ie., >need, for the word "forum" other than as a label for the resulting >application. Probably the constraints would have to counter various >loopholes in the RFC's as well. Perhaps, but I'm not interested in covering loopholes in RFCs, but in the basic question that the OP implied in my eyes, namely, how to model and work with the is-reply-to relationship between messages in a relational framework.
Reinier
paul c - 09 Dec 2008 17:01 GMT ...
> Yes, and my specific question was: how would you deal with the requirement > that messages can be replies to specific other messages, or do you deny > that requirement? > ... When you talk of "replies to replies", as you have done in this thread, it makes me think that nothing I can say will cause you to think more precisely, which is what is needed. Clearly there are messages that arise that aren't replies. Clearly there are replies to such messages. If you want replies to replies, then you are actually talking about replies to replies to such messages. Obviously there are at least three predicates here. If you are "in the trees", I invite you to try to put all three in one hierarchy, but I don't think I want you to show me the result.
rpost - 12 Dec 2008 11:25 GMT >... >> Yes, and my specific question was: how would you deal with the requirement [quoted text clipped - 5 lines] >it makes me think that nothing I can say will cause you to think more >precisely, which is what is needed. It would help focus discussions if you could bring yourself to omit stuff like this. It tempts me to comment on its accuracy, which is not what this newsgroup is for.
> Clearly there are messages that >arise that aren't replies. Clearly there are replies to such messages. >If you want replies to replies, then you are actually talking about >replies to replies to such messages. Yes, and replies to replies to replies to replies to ... to replies to such messages. That's how discussion fora, including USENET, work.
> Obviously there are least three >predicates here. No, two will do: message and reply, where reply is-a message (i.e. same primary key with a dependency key(reply) \subseteq key(message)).
An auxiliary predicate may be used to keep reply's transitive closure.
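The two-predicate design above can be sketched as two base tables plus a derived view for the auxiliary closure predicate. This uses Python's sqlite3; the names `message`, `reply`, `parent_id`, `reply_closure` and `ancestors` are assumptions, and the view is a recursive query standing in for TCLOSE rather than a stored predicate kept up to date by the DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE message (
    id   TEXT PRIMARY KEY,
    body TEXT NOT NULL
);
-- reply is-a message: every reply key is also a message key,
-- and each reply names the message it answers.
CREATE TABLE reply (
    id        TEXT PRIMARY KEY REFERENCES message(id),
    parent_id TEXT NOT NULL REFERENCES message(id),
    CHECK (id <> parent_id)
);
-- Auxiliary, derived predicate: the transitive closure of reply.
-- UNION (not UNION ALL) keeps the recursion finite even if a cycle slips in.
CREATE VIEW reply_closure AS
WITH RECURSIVE c(id, ancestor_id) AS (
    SELECT id, parent_id FROM reply
    UNION
    SELECT r.id, c.ancestor_id FROM reply AS r JOIN c ON r.parent_id = c.id
)
SELECT id, ancestor_id FROM c;
""")

conn.executemany("INSERT INTO message VALUES (?, ?)",
                 [("m1", "topic"), ("m2", "reply"), ("m3", "reply to reply"), ("m4", "reply")])
conn.executemany("INSERT INTO reply VALUES (?, ?)",
                 [("m2", "m1"), ("m3", "m2"), ("m4", "m1")])

def ancestors(message_id):
    """Every message that message_id replies to, directly or indirectly."""
    return [row[0] for row in conn.execute(
        "SELECT ancestor_id FROM reply_closure WHERE id = ? ORDER BY ancestor_id",
        (message_id,))]

print(ancestors("m3"))  # ['m1', 'm2']

# The foreign keys reject a reply that is not itself a message.
# They do NOT rule out a longer cycle (e.g. a reply row ('m1', 'm3'));
# stating that constraint needs the closure itself.
try:
    conn.execute("INSERT INTO reply VALUES ('m9', 'm1')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```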
>If you are "in the trees", I invite you to try to put >all three in one hierarchy, but I don't think I want you to show me the >result3. I don't understand this remark.
Reinier
paul c - 12 Dec 2008 20:15 GMT ...
>> When you talk of "replies to replies", as you have done in this thread, >> it makes me think that nothing I can say will cause you to think more [quoted text clipped - 4 lines] > which is not what this newsgroup is for. > ... You could do well to be so tempted. The opposite is dooming oneself to endless dead-ends (apologies to Yogi Berra). Bring it on! ...
>> Obviously there are least three >> predicates here. [quoted text clipped - 4 lines] > An auxiliary predicate may be used to keep reply's transitive closure. > ... In other words, 2 + 1 = 3 predicates.
>> If you are "in the trees", I invite you to try to put >> all three in one hierarchy, but I don't think I want you to show me the >> result3. > > I don't understand this remark. I know you don't.
JOG - 04 Jan 2009 03:27 GMT > Yes, and replies to replies to replies to replies to ... to replies > to such messages. That's how discussion fora, including USENET, work. "Fora"? Bah humbug to faux latin pluralization of good, honest, hard workin' english words....
That is unless, of course, you use the dative form when you post "to" a USENET foro. Then hats off ;)
Bob Badour - 04 Jan 2009 05:25 GMT >>Yes, and replies to replies to replies to replies to ... to replies >>to such messages. That's how discussion fora, including USENET, work. > > "Fora"? Bah humbug to faux latin pluralization of good, honest, hard > workin' english words.... Speaking of good, honest, hard workin' english words, what's wrong with "fake" or "false" ?
Gene Wirchenko - 04 Jan 2009 06:49 GMT >>>Yes, and replies to replies to replies to replies to ... to replies >>>to such messages. That's how discussion fora, including USENET, work. [quoted text clipped - 4 lines] >Speaking of good, honest, hard workin' english words, what's wrong with >"fake" or "false" ? "The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don't just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary." -- James D. Nicoll
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
Brian Selzer - 30 Nov 2008 21:45 GMT >> [...] >> [quoted text clipped - 19 lines] > invent his own forum, presumably not Usenet-based. I pointed out that he > was wrong to assume a forum is hierarchical.) Pardon me for sticking my nose in, Paul, but you are ignoring facts as plain as day: The content of a forum is a directed graph without any circuits--that is, a collection of trees--each message being a node and each response being a directed edge. How can you possibly argue that it is not hierarchical?
paul c - 01 Dec 2008 14:22 GMT >>> [...] >>> [quoted text clipped - 22 lines] > response being a directed edge. How can you possibly argue that it is not > heirarchical? I'm amazed that in this day and age there can be any dispute about something so simple. As I said before, one may choose to display messages in a hierarchical way, but that is not at all the same thing as basing a server or reader on a hierarchical model.
The essence of a hierarchy is position and record order. Position ignores the Information Principle and the order is logically extraneous.
Hierarchical implementations depend on pointers or adjacency or both. Without those, some, maybe most, hierarchies are limited to a single presentation. A relational implementation doesn't have that problem.
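To make the point about presentations concrete: the sketch below keeps one set of (id, parent, posted_at) tuples and derives two different displays from it, a flat chronological list and an indented threaded view, without storing either ordering. The data and function names are hypothetical; only the Python standard library is used.

```python
from datetime import datetime

# Hypothetical reply relation: (id, parent_id or None, posted_at).
POSTS = [
    ("m1", None, datetime(2008, 1, 1, 9, 0)),
    ("m2", "m1", datetime(2008, 1, 1, 9, 30)),
    ("m3", "m2", datetime(2008, 1, 1, 10, 15)),
    ("m4", "m1", datetime(2008, 1, 1, 10, 0)),
]

def chronological(posts):
    """Flat presentation: order by posting time, ignoring reply structure."""
    return [pid for pid, _, _ in sorted(posts, key=lambda p: p[2])]

def threaded(posts):
    """Threaded presentation: depth-first by reply, siblings by posting time."""
    children = {}
    for pid, parent, _ in sorted(posts, key=lambda p: p[2]):
        children.setdefault(parent, []).append(pid)
    out = []

    def walk(parent, depth):
        for pid in children.get(parent, []):
            out.append("  " * depth + pid)
            walk(pid, depth + 1)

    walk(None, 0)
    return out

print(chronological(POSTS))  # ['m1', 'm2', 'm4', 'm3']
print(threaded(POSTS))       # ['m1', '  m2', '    m3', '  m4']
```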
If you believe that the horizon is flat and airplane window glass makes it appear as a curve then it might be accurate to conclude that the earth is flat, but it wouldn't be pertinent. Same goes for starting with the belief that hierarchy is inherent in the typical forum's actual messages. If that were so, it would be just as reasonable to say that every Accounts Receivable data model must be a hierarchy.
Brian Selzer - 01 Dec 2008 22:09 GMT >>>> [...] >>>> [quoted text clipped - 28 lines] > in a hierarchical way, but that is not at all the same thing as basing a > server or reader on a hierarchical model. I didn't say anything about a heirarchical model. I'm arguing that the content of a forum--messages and responses to messages--is in essence heirarchical. This is not about how messages are displayed, it's about what they are: each message either starts a topic or is a reply to another message.
> The essence of a hierarchy is position and record order. Position ignores > the Information Principle and the order is logically extraneous. The essence of a heirarchy is precedence. A heirarchy is a collection of individuals (objects) connected in such a way that each individual has at most one direct predecessor and that no individual can be a direct or indirect predecessor of itself. Records and the position of records are at best orthogonal.
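Brian's definition can be checked mechanically. The sketch below is only an illustration and assumes a hypothetical `parent` dictionary that records one parent per message (None for a topic starter). That shape already guarantees "at most one direct predecessor", so the function checks the remaining condition: no message is its own direct or indirect predecessor (it also rejects predecessors that are not in the collection).

```python
from typing import Dict, Hashable, Optional


def is_hierarchy(parent: Dict[Hashable, Optional[Hashable]]) -> bool:
    """Return True if the parent map describes a forest (a set of trees).

    Storing one parent per message already enforces "at most one direct
    predecessor". This checks the rest: every predecessor is a known
    message, and no message is its own direct or indirect predecessor.
    """
    for start in parent:
        seen = set()
        node: Optional[Hashable] = start
        while node is not None:
            if node in seen:
                return False  # we walked around a cycle
            if node not in parent:
                return False  # predecessor that is not in the collection
            seen.add(node)
            node = parent[node]
    return True


print(is_hierarchy({"a": None, "b": "a", "c": "a", "d": "b"}))  # True
print(is_hierarchy({"a": "b", "b": "a"}))  # False
```

The walk is quadratic in the worst case, which is fine for a sketch; a real check would remember nodes already proven acyclic.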
> Hierarchical implementations depend on pointers or adjacency or both. > Without those, some, maybe most, hierarchies are limited to a single [quoted text clipped - 6 lines] > If that were so, it would be just as reasonable to say that every Accounts > Receivable data model must be a hierarchy. paul c - 01 Dec 2008 22:50 GMT >>>>> [...] >>>>> [quoted text clipped - 35 lines] > message. > ... The OP clearly had XML in mind. While I don't think XML is any kind of coherent data model (e.g., its structure involves only syntax, and its operators are catch-as-catch-can, lacking the foundation of boolean ops and predicate logic), and while it's not obvious that the OP claimed any such thing, I feel quite justified in jumping on anybody who brings XML up as some kind of data design solution in a theory group.
On top of that, some people who know quite a lot of detail about the RM then piped up (including you, for shame) to persist in encouraging the Humpty-Dumpty kindergarten school of design that imagines appearance is everything. To quote the OP: "mapping between GUI and data is oh so easy with hierarchies, compared to relations". That makes it clear to me that he is confusing the logical model with presentation/display.
>> The essence of a hierarchy is position and record order. Position ignores >> the Information Principle and the order is logically extraneous. [quoted text clipped - 4 lines] > indirect predecessor of itself. Records and the position of records are at > best orthogonal. That's fair enough. I should have said the essence of the hierarchical programming schemes (not models, which XML isn't - I think Codd was being kind to even suggest that it is reasonable to compare his model to a hierarchy, those being apples and oranges) is position and order.
(What the programming hackers seem to perpetually miss, is not so much the question of whether relational ideas are superior, but the idea of a formal logic to guide and gauge data designs. This is reminiscent of political arguments I read in the daily general press. The world would be no worse off if they would try to apply the same discipline to their creations. I for one don't think it inconceivable that something even better might result. But I refuse to compare apples to oranges outside of the kitchen.)
Brian Selzer - 03 Dec 2008 12:58 GMT >>>>>> [...] >>>>>> [quoted text clipped - 52 lines] > with hierarchies, compared to relations". That makes it clear to me that > he is confusing the logical model with presentation/display. It wasn't my intent to encourage or discourage any particular school of design.
>>> The essence of a hierarchy is position and record order. Position >>> ignores the Information Principle and the order is logically extraneous. [quoted text clipped - 9 lines] > kind to even suggest that it is reasonable to compare his model to a > hierarchy, those being apples and oranges) is position and order. Does that mean that you now acknowledge the fact that a forum is in essence hierarchical? If so, the discussion can shift gears to focus on the best way to implement hierarchies in sets of relations, or on how XML falls short as a means for storing and manipulating data if not as a means of transmission, or both.
> (What the programming hackers seem to perpetually miss, is not so much the > question of whether relational ideas are superior, but the idea of a [quoted text clipped - 4 lines] > better might result. But I refuse to compare apples to oranges outside of > the kitchen.) paul c - 03 Dec 2008 14:19 GMT ...
> Does that mean that you now acknowledge the fact that a forum is in essence > hierarchical? ... Nope. You can display messages that way, store them that way if you insist (I wouldn't), but if your programming operators don't use a relational structure or a similarly powerful abstraction/indirection (assuming that somewhere, somebody has devised such a thing), you are giving up what Codd called symmetric exploitation.
Looks to me that anybody who uses XML or its ilk to manipulate data gives up that ability from the get-go. I can sympathize with people who are more or less forced by common platforms to display things by using that ponderous and closed-door syntax, but syntax has nothing to do with data design. Using a hierarchical data interface, which is actually what the OP was assuming, is just asking for endless headaches from my point of view, although I will admit that technocrats see it differently, as "jobs for the boys". Of course, when a requirement you didn't first imagine comes up, you could invent some attributes to make your hierarchy look like relations, but that seems like a lot of wasted work to me. It would be easier to start with relations.
This thread reminds me of long-ago meetings that truly never ended because there was always somebody who didn't get the basic point and would drag them on forever. I knew a former jet pilot who went into data modelling to avoid a lifetime hitch under a dictator. When he tried to sell a simpler but more versatile programming model, one transport industry customer just couldn't make the switch. He compared them to hot air balloon users who didn't think winged vehicles would work because the typical plane doesn't have a gondola.
The other day, I saw a Dijkstra note that mentions this inability at: http://www.cs.utexas.edu/users/EWD/ewd10xx/EWD1036.PDF
He suggested a general ignorance and fear of what he called "radical novelty". I don't know if he ever met Codd and if he did whether their conversation was on-topic or more mundane (as it was when Groucho met Eliot and Ford met Edison, for the first time)!
paul c - 03 Dec 2008 14:48 GMT > .... >> Does that mean that you now acknowledge the fact that a forum is in [quoted text clipped - 6 lines] > giving up what Codd called symmetric exploitation. > ... E.g., yesterday, a friend who's used his exotic 220 volt welder at many industrial sites was pulling his hair out trying to wire it up in his house. Various electricians have shown him over the years how to connect the different wire colours in other settings. His basic problem is that nobody taught him an abstract model for basic wiring circuits. Once he gets it working, assuming his house is unburnt, the connections will have no intrinsic understanding of the theory used to make them. But the circuit model is the mental view he must have in order to see that he must connect a white wire to a "red" or "black" prong instead of to a "white" prong. Without that, he might as well switch to XML programming.
Brian Selzer - 04 Dec 2008 03:02 GMT > ... >> Does that mean that you now acknowledge the fact that a forum is in [quoted text clipped - 5 lines] > (assuming that somewhere, somebody has devised such a thing), you are > giving up what Codd called symmetric exploitation. So you're saying that what looks like a rose, smells like a rose--has thorns like a rose--isn't a rose?
> Looks to me that anybody who uses xml or its ilk to manipulate data gives > up that ability from the get-go. I can sympathize with people who are [quoted text clipped - 24 lines] > conversation was on-topic or more mundane (as it was when Groucho met > Eliot and Ford met Edison, for the first time)! paul c - 05 Dec 2008 15:37 GMT ...
> So you're saying that what looks like a rose, smells like a rose--has thorns > like a rose--isn't a rose? > ... I'm saying it isn't always a rose (as a number of marriages I've seen prove, not to mention artificial roses). I find it suspicious that various newsreaders resort to graphical devices such as indentations, dotted lines and chevrons to depict a perceived hierarchy. I don't see why authors of those programs shouldn't use relations to depict the concepts they are enamoured with, such as "threads", message order, position and so forth. If the RFCs used relational examples instead of dialogues using very context-sensitive ops such as "next", the specs might be tighter and maybe there would be fewer discrepancies in the implementations.
Aside: Some of the web-based interfaces look to me as if their authors are stuck in a hierarchical rut. Google groups is one of the easier ones to use, seems to have a neat and polished display, but I don't use it because it doesn't seem to have the ability to simply show me the latest messages, regardless of thread. (I hope somebody will point out if I'm wrong about that.)
Bob Badour - 05 Dec 2008 15:53 GMT > Aside: Some of the web-based interfaces look to me as if their authors > are stuck in a hierarchical rut. Google groups is one of the easier ones > to use, seems to have a neat and polished display, but I don't use it > because it doesn't seem to have the ability to simply show me the latest > messages, regardless of thread. (I hope somebody will point out if I'm > wrong about that.) A much bigger problem with google groups is the inability to filter the self-aggrandizing ignorants.
paul c - 05 Dec 2008 16:53 GMT ...
> A much bigger problem with google groups is the inability to filter the > self-aggrandizing ignorants. Filtering looks like a functional requirement to me; I guess it would amount to restriction in relational terms. With the hierarchy, if person B answers (person) A and C filters B, then it gets murky if person D responds to B. Maybe there are now two disconnected trees, maybe there is one, who knows?
rpost - 09 Dec 2008 12:58 GMT >... >> A much bigger problem with google groups is the inability to filter the [quoted text clipped - 5 lines] >responds to B. Maybe there are now two disconnected trees, maybe there >is one, who knows? Filtering commonly happens on threads, i.e. the transitive closure of the is-reply-to relationship between messages. E.g. in my newsreader I can just enter
/Bob Badour/f:,
to omit all messages from Bob Badour with all direct or indirect replies. So the modelling language and/or query language must be capable of expressing this; e.g. one solution would be to explicitly add the is-eventually-reply-to relation with constraints to express its being the closure of is-reply-to.
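The closure-based filter can be sketched with the reply relation stored as plain (reply, original) pairs. The function and variable names below are hypothetical, and this is one way to compute the result, not how any particular newsreader implements its kill-file flag. It iterates to a fixpoint, adding replies to anything already hidden, which also settles paul c's "murky" case of D answering a filtered B.

```python
from typing import Dict, Set, Tuple


def hidden_messages(author: Dict[str, str],
                    is_reply_to: Set[Tuple[str, str]],
                    filtered_author: str) -> Set[str]:
    """Messages by `filtered_author` plus all their direct or indirect replies.

    `author` maps message id -> author; `is_reply_to` holds
    (reply, original) pairs. The loop runs until nothing changes, which
    gives the same result as restricting on the transitive closure.
    """
    hidden = {m for m, a in author.items() if a == filtered_author}
    changed = True
    while changed:
        changed = False
        for reply, original in is_reply_to:
            if original in hidden and reply not in hidden:
                hidden.add(reply)
                changed = True
    return hidden


# B answers A, D answers B, C answers A; filtering B also drops D's reply.
author = {"m1": "A", "m2": "B", "m3": "D", "m4": "C"}
replies = {("m2", "m1"), ("m3", "m2"), ("m4", "m1")}
print(sorted(hidden_messages(author, replies, "B")))  # ['m2', 'm3']
```

Storing the is-eventually-reply-to relation explicitly, as suggested above, would trade this repeated scan for extra storage plus a constraint keeping it equal to the closure.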
Reinier
Brian Selzer - 06 Dec 2008 04:49 GMT >> Aside: Some of the web-based interfaces look to me as if their authors >> are stuck in a hierachical rut. Google groups is one of the easier ones [quoted text clipped - 5 lines] > A much bigger problem with google groups is the inability to filter the > self-aggrandizing ignorants. Strange...why is the self-aggrandizing ignorant lamenting the fact that others can't filter him?
Brian Selzer - 06 Dec 2008 04:41 GMT > ... >> So you're saying that what looks like a rose, smells like a rose--has [quoted text clipped - 11 lines] > might be tighter and maybe there would be fewer discrepancies in the > implementations. So, the hierarchies that I contend constitute the content of a forum are nothing more than "an undigested bit of beef, a blot of mustard, a crumb of cheese, a fragment of an underdone potato," right?
> Aside: Some of the web-based interfaces look to me as if their authors are > stuck in a hierarchical rut. Google groups is one of the easier ones to > use, seems to have a neat and polished display, but I don't use it because > it doesn't seem to have the ability to simply show me the latest messages, > regardless of thread. (I hope somebody will point out if I'm wrong about > that.) paul c - 06 Dec 2008 21:38 GMT ...
> So, the hierarchies that I contend constitute the content of a forum are > nothing more than "an undigested bit of beef, a blot of mustard, a crumb of > cheese, a fragment of an underdone potato," right? > ... Thanks for that; now at least I needn't worry that I'm responsible for the silliest post in some thread or other!
David BL - 20 Nov 2008 03:55 GMT > > Yes, forum "topic headings" are ordered by date and > > time. But each topic also has 0 or more child responses, and child [quoted text clipped - 9 lines] > say, response n-2, or vice-versa. It might be seen as some kind of > graph but not necessarily a tree. If every post apart from the first post for a topic is made in response to a previously existing post then inevitably it is possible to define a tree structure.
Are you suggesting: 1) that isn't actually the case; 2) a post shouldn't actually be regarded as a response to some previous post; or 3) the tree structure can be defined but isn't necessarily pertinent?
paul c - 20 Nov 2008 13:31 GMT >>> Yes, forum "topic headings" are ordered by date and >>> time. But each topic also has 0 or more child responses, and child [quoted text clipped - 19 lines] > 3) the tree structure can be defined but isn't necessarily > pertinent? 3). Traditionally, most email and news programs have rolled their own file structure with a point of view that sprang from a pet programming language and whatever OS file support was available, ignoring the possibility of Codd's approach, which emphasized the structure of data as the central focus, especially his information principle and the relation as the basic programming interface and, dare I say, his rather universal operators. The result is that most mail servers and readers have extremely arcane programming operators and probably don't offer the ability to manipulate more than a few of the many tags that have been defined in the RFCs. 1) and 2) are simply possible aspects for end users.
paul c - 20 Nov 2008 13:41 GMT ...
To try to say it a different way, if one is determined to organize data in a hierarchy, there isn't much point in asking how data theory applies. (My theory as to why this happens so often has to do with the long-time but, I think, discredited, programming saw of starting with the desired output and working backwards. That is more popular than ever, from what I can see. The naive continue to seek technocratic blessing but are usually satisfied by the assurances of witch doctors.)
paul c - 20 Nov 2008 14:28 GMT ...
> 3) the tree structure can be defined but isn't necessarily > pertinent Regarding "pertinent" some people may get a few laughs from the Scientific American article, now about ten years old, at:
http://www.sciam.com/article.cfm?id=xml-and-the-second-genera
The first paragraph comes right out and advocates the Humpty-Dumpty school of design:
"Give people a few hints, and they can figure out the rest. They can look at this page, see some large type followed by blocks of small type and know that they are looking at the start of a magazine article. They can look at a list of groceries and see shopping instructions. They can look at some rows of numbers and understand the state of their bank account."
I'm not laughing at the authors, they sound like cheerful, earnest idiots who get to their "point" without much ado, eg., before describing their enhancement to HTML, they acknowledge that HTML is "superficial", and they aren't going to let that slow them down! But the Sci Am editors who let this bumpf ride could have been accomplices to the tailor in the "Emperor's New Clothes". I would have thought a serious proposal would discuss http, not html.
paul c - 20 Nov 2008 15:04 GMT > .... >> 3) the tree structure can be defined but isn't necessarily >> pertinent ...
> The first paragraph comes right out and advocates the Humpty-Dumpty > school of design: > ... Part of how it was explained to me many years ago: "start at the beginning, continue until you come to the end, and then stop".
In the early 1980s it became fashionable for consultants to add the word "normalized" to their CVs. Being as ignorant as anybody else in those days, I felt compelled to ask one of them what he was doing and why it was taking so long. All he could explain was that it was hard to get the tables right.
Later, some of the normalization theory became more widely known and the consulting industry which had previously touted a hodge-podge of methodologies jumped on the novel idea of trying to apply formal techniques to design.
But formal techniques are pointless without a formal starting point. It took me some years to realize how much wheel spinning went on because people didn't bother to figure out the predicates they wanted in the first place. Those being crucial to decide on, because the human meaning is effectively dropped in the machine manifestation, abstract names being substituted in a regular form that is amenable to mechanical repetition.
The supposed application of XML today reminds me of all this - very precise syntax sitting on an amorphous hodge-podge of undefined informal concepts. Whatever meaning those concepts have is murky and jumbled, very little separation of concepts, more concerned with presentation than with any formal data organization.
rboudjakdji@gmail.com - 23 Nov 2008 10:17 GMT [Snipped]
> The supposed application of XML today reminds me of all this - very > precise syntax sitting on an amorphous hodge-podge of undefined informal > concepts. Whatever meaning those concepts have is murky and jumbled, > very little separation of concepts, more concerned with presentation > than with any formal data organization. Hi paul...
He he he...Would that not also describe HTML? I somehow believe that HTML is an even more perfect representation of chaos than XML. I even call it the perfect class in an OO perspective: a perfect stack of various concepts designed for presentation...Still, I respect HTML because it at least has no pretension other than presenting...
Walter Mitty - 21 Nov 2008 19:56 GMT >> > Yes, forum "topic headings" are ordered by date and >> > time. But each topic also has 0 or more child responses, and child [quoted text clipped - 20 lines] > 3) the tree structure can be defined but isn't necessarily > pertinent? I agree with you, David. The fact that a message is either a topic starter or a response to some specific prior message is inherent in the way forums work. It isn't just a matter of whether the analyst chooses to see it that way. I haven't read the specs on usenet messages, and I don't know whether messages are identified by ID or by title. But either way, the distinction is clear to the user who clicks on either "Reply to Group" or "Write new message".
patrick61z@yahoo.com - 24 Nov 2008 18:37 GMT > >> > Yes, forum "topic headings" are ordered by date and > >> > time. But each topic also has 0 or more child responses, and child [quoted text clipped - 28 lines] > is clear to the user who clicks on either "Reply to Group" or "Write new > message". I remember an interesting read a while ago by a threaded newsreader author, and the bottom line is that the author first worked with the "in reference to" part of the headers (References:) and then the subject line and posting time, just due to the fact that there were so many newsreaders that just one method wasn't going to cut it. While in theory you could rebuild the tree using the References header (as the references would accumulate from the replied-to article's list of references), in practice usenet is subjected to any number of news clients, some being better than others.
rpost - 26 Nov 2008 09:30 GMT >I remember an interesting read a while ago by a threaded newsreader >author and the bottom line is the author first worked with the "in [quoted text clipped - 5 lines] >references), in practice usenet is subjected to any number of news >clients some being better than others. Good point; however, there is a specification of the protocol (NNTP in RFC 977, with the message format in RFC 1036), which implicitly contains a 'physical design' of the data structures used, in the form of requirements on message headers. The misbehaving newsreaders are *broken*.
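The References-first step described above can be sketched as follows. This is an illustration under assumptions: each article's References header is taken to be already parsed into a list of Message-IDs, oldest ancestor first, with the last entry being the article replied to (the usual convention). When that article is missing (expired, never received, or mangled by a broken client), the sketch falls back to the nearest earlier ancestor it does have. The subject-line and posting-time heuristics the newsreader author also used are not shown, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def thread_parents(references: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Map each Message-ID to its parent using only References headers.

    Articles whose references point at nothing we hold become roots.
    No attempt is made to detect cycles caused by malformed headers.
    """
    parents: Dict[str, Optional[str]] = {}
    for msg_id, refs in references.items():
        parent: Optional[str] = None
        # Walk from the most recent reference back towards the thread root.
        for ref in reversed(refs):
            if ref in references and ref != msg_id:
                parent = ref
                break
        parents[msg_id] = parent
    return parents


refs = {
    "<a@host>": [],
    "<b@host>": ["<a@host>"],
    "<c@host>": ["<a@host>", "<b@host>"],
    "<d@host>": ["<a@host>", "<gone@host>"],  # immediate parent has expired
}
print(thread_parents(refs))
```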
From a relational perspective, the protocol spec should indeed have been preceded by a separate logical design. In the NNTP design, the decision was made to postulate the unique identifiability of messages regardless of their contents or other attributes; the alternative (which most here generally advocate) is to identify entities based on their attributes.
I never analysed large sets of USENET messages with this in mind, but it seems pretty clear to me that this alternative would indeed have been superior. E.g. assuming we can only post one message to an NNTP server at a time (as RFC 977 assumes), a message can be identified by a server identification (e.g. hostname) plus a timestamp. Requiring the presence and correctness of these two attributes on each message would have been a better decision, as far as I can see now, than requiring the presence and uniqueness of a message ID.
It would have created the problems of having to specify the permissible format and exact meaning of these attributes. E.g. may the server use its own local clock and its own date/time format? If it may, may it also reset its clock at any point in time? I suppose IDs are so popular because they allow this kind of detail to be avoided.
Reinier
JOG - 26 Nov 2008 14:42 GMT > patrick...@yahoo.com wrote: > >I remember an interesting read a while ago by a threaded newsreader [quoted text clipped - 18 lines] > (which most here generally advocate) is to identify entities based on > their attributes. While I am one of those advocates, it would be silly to ignore the efficiency of using surrogate IDs in practical situations (and with the tools we currently have).
However, to the OP: in terms of /the theory/ (this is a theory newsgroup after all) a good design takes a "message entity" and asks what it is exactly that defines its identity (what is the ID a surrogate for?). Note that there is no single "true" answer to this - that's important - because what a "message" actually is can be a whole host of things:
1) {author, timestamp}: a message as a submission from an author at a certain time. If the content is edited later on it is still viewed as the same message as the original.
2) {author, timestamp, content}: a message as a piece of text, submitted by someone at a certain time. If the content is edited it is then viewed as a different message to the original.
3) {author, timestamp, parent}: a message is a position in a thread tree. If it is moved it is viewed as a different message. If this is not desired, you can still have a separate positioning table of course. Position just becomes a normal attribute however, and not part of the message's identity.
4) {author, timestamp, content, parent}: a message is a piece of content at some position in a thread.
While we may call all of the entities that these identities produce "messages", they are subtly different things (perhaps 1 might be specialized as a "post", while 3 is a "response", etc.) It is vital to pick the one at design time that suits the task in hand. If you pick the wrong one it will bite you on the a.s later on.
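To see how much the choice matters, the sketch below treats each of the four candidate identities as a projection: two stored versions count as the same message exactly when their projections are equal. The `Version` type and the sample history are made up for illustration; the history is one post that is first edited and then moved elsewhere in the thread.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Version(NamedTuple):
    """One stored state of a post, as the forum saw it at some point."""
    author: str
    timestamp: str
    content: str
    parent: str  # "" for a topic starter


# Each candidate identity is the set of attributes that defines
# "the same message"; options are numbered as in the list above.
IDENTITY: Dict[int, Callable[[Version], Tuple[str, ...]]] = {
    1: lambda v: (v.author, v.timestamp),
    2: lambda v: (v.author, v.timestamp, v.content),
    3: lambda v: (v.author, v.timestamp, v.parent),
    4: lambda v: (v.author, v.timestamp, v.content, v.parent),
}


def count_messages(history: List[Version], option: int) -> int:
    """How many distinct messages the history contains under one identity."""
    key = IDENTITY[option]
    return len({key(v) for v in history})


history = [
    Version("jim", "2008-11-26T14:42", "first draft", "p1"),
    Version("jim", "2008-11-26T14:42", "edited text", "p1"),  # content edited
    Version("jim", "2008-11-26T14:42", "edited text", "p7"),  # moved in thread
]
for option in IDENTITY:
    print(option, count_messages(history, option))
```

The same three rows are one message under option 1, two under options 2 and 3, and three under option 4, which is exactly the "subtly different things" point.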
Regards, Jim.
> I never analysed large sets of USENET messages with this in mind, but > it seems pretty clear to me that this alternative would indeed have [quoted text clipped - 13 lines] > -- > Reinier rpost - 09 Dec 2008 13:53 GMT [...]
>> From a relational perspective, the protocol spec should indeed have >> been preceded by a separate logical design. In the NNTP design, the [quoted text clipped - 17 lines] >certain time. If the content is edited later on it is still viewed as >the same message as the original. For USENET this would suffice: it does not allow messages to be edited; it allows them to be superseded, but that doesn't work well in practice. For a web-based forum it would also work. The issue for USENET is to what extent the <author,timestamp> that will actually be used can be guaranteed to be accurate, or at least unique.
>2) {author, timestamp, content}: a message as a piece of text, >submitted by someone at a certain time. If the content is edited it is >then viewed as a different message to the original. This is only necessary if the same author can post multiple messages at the same time, which as far as I can see NNTP doesn't allow, and neither does web forum software. So even if messages can be edited, 1 is a better idea.
>3) {author, timestamp, parent}: a message is a position in a thread >tree. If it is moved it is viewed as a different message. If this is >not desired, you can still have a separate positioning table of >course. Position just becomes a normal attribute however, and not part >of the message's identity. The same remark applies: even if messages can be moved (which some web forum software supports), author and timestamp suffice for identification, if they are reliable in the first place. If they aren't, then adding the parent won't fix it unless the posting protocol has some really unusual properties.
There is also a fundamental objection: a parent is itself a message.
>4) {author, timestamp, content, parent}: a message is a piece of >content at some position in a thread. The same objections apply.
In short, I think the main issue in picking attributes and keys here is not in determining how the data is to be used, but in determining realistic commitments from the supporting software on the accuracy of the attribute values supplied. For a web forum, <author, timestamp> seems a good choice of key, even if they aren't always accurate, as long as the server never accepts multiple messages with the same author and timestamp.
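The <author, timestamp> key can be sketched with Python's built-in sqlite3 module; the table, column and function names are hypothetical. The primary key is what makes the server "never accept multiple messages with the same author and timestamp", and an edit updates the body in place, so the message keeps its identity (JOG's option 1). The sketch assumes the server, not the client, supplies `posted_at`.

```python
import sqlite3


def make_forum_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE message (
               author    TEXT NOT NULL,
               posted_at TEXT NOT NULL,  -- assigned by the server
               body      TEXT NOT NULL,
               PRIMARY KEY (author, posted_at)
           )"""
    )
    return conn


def post(conn: sqlite3.Connection, author: str, posted_at: str, body: str) -> bool:
    """Store a message; return False if that (author, posted_at) is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO message VALUES (?, ?, ?)",
                         (author, posted_at, body))
        return True
    except sqlite3.IntegrityError:
        return False


def edit(conn: sqlite3.Connection, author: str, posted_at: str, body: str) -> None:
    """Change the body but not the key, so the message keeps its identity."""
    with conn:
        conn.execute(
            "UPDATE message SET body = ? WHERE author = ? AND posted_at = ?",
            (body, author, posted_at))


conn = make_forum_db()
print(post(conn, "reinier", "2008-12-09T13:53:00Z", "first"))  # True
print(post(conn, "reinier", "2008-12-09T13:53:00Z", "again"))  # False
```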
Reinier
salmobytes - 01 Jan 2009 00:47 GMT > In short, I think the main issue in picking attributes and keys here is > not in determining how the data is to be used, but in determining [quoted text clipped - 3 lines] > as the server never accepts multiple messages with the same author and > timestamp. This is why web-based forum coders (working with a relational database at the back end) don't like "threading."
...it's hard to do and you pay a huge performance penalty to make it happen (with relational modeling anyway). So relational programmers fool themselves into thinking hierarchies are bad design. Hierarchies are not bad design, they are the weak underbelly of the relational model. Hierarchies are part of the real world. They just don't fit well into the relational scheme of things.
With XML querying hierarchies is a snap. So if you have a hierarchical problem, XML is a better technology.
Someone referred to XML as messy technology that couldn't be optimised. But SleepyCat and XPath are faster than any relational system running any one of the ugly, complex and slow-as-molasses "relational solutions" to the hierarchical problem.
For some problems you don't need a database at all: grep or perhaps Lucene or Hyper Estraier are all that's needed.
For some problems XML is the best choice, particularly if the data is naturally hierarchical.
For other problems--particularly for *large* data problems--relational systems are the best choice....but almost never when hierarchies are involved.
Bob Badour - 01 Jan 2009 01:39 GMT >> In short, I think the main issue in picking attributes and keys here is >> not in determining how the data is to be used, but in determining [quoted text clipped - 10 lines] > happen (with relational modeling anyway). So relational programmers > fool themselves into thinking hierarchies are bad design. You are an idiot. Have a Happy New Year!
rpost - 07 Jan 2009 18:40 GMT [...]
>Hierarchies are part of the real world. They just don't fit well into >the relational scheme of things. This is a broad statement. What is needed as far as I can see is efficient traversal of relations (i.e. arbitrarily wide, very selective joins). This can be supported, even if many existing RDBMSes don't.
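As a minimal sketch of such a traversal, here is an adjacency-list table (each message stores its parent) queried with a recursive common table expression, run in SQLite through Python's sqlite3 module. The schema and sample rows are hypothetical, and this illustrates that the whole thread comes back in one declarative query; it says nothing about how fast that is at forum scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE message (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES message(id),  -- NULL for a topic starter
        author    TEXT NOT NULL
    );
    CREATE INDEX message_by_parent ON message(parent_id);
    INSERT INTO message VALUES
        (1, NULL, 'A'), (2, 1, 'B'), (3, 2, 'C'),
        (4, 3, 'B'), (5, 1, 'D'), (6, NULL, 'E');
    """
)


def thread(conn: sqlite3.Connection, root_id: int):
    """(id, author, depth) for a message and all its direct or indirect replies."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, author, depth) AS (
            SELECT id, author, 0 FROM message WHERE id = ?
            UNION ALL
            SELECT m.id, m.author, s.depth + 1
            FROM message AS m JOIN subtree AS s ON m.parent_id = s.id
        )
        SELECT id, author, depth FROM subtree ORDER BY id
        """,
        (root_id,),
    ).fetchall()


for row in thread(conn, 1):
    print(row)
```

The index on `parent_id` is what keeps each recursive step a selective lookup rather than a scan.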
>With XML querying hierarchies is a snap. >So if you have a hierarchical problem, XML is a better technology. Not so fast.
XML itself is just a standard for serializing labelled trees. In my experience, most of my "trees" are really arbitrary graphs (relations), and while XML supports crosslinks as well, XML definition and manipulation languages tend not to support their traversal well. For a forum this need not be an issue.
But XML also assumes that all data is stored as documents and processed by operating on documents one at a time. USENET does the same thing, but it really isn't very practical.
Some issues from a database perspective: What if we want to query or manipulate across the whole collection? Why do we always have to parse documents whenever we want to use the data they contain? Why, when writing queries or transformations on my data (e.g. with XPath, XQuery or XSLT) or schema definitions (XML Schema - please), do I always have to concern myself with stupid serialization and document management issues such as the consistent use of file names and URLs, file system limitations, character encodings, etcetera?
Not to mention that XPath, XSLT and XQuery are still pretty hideous languages, both syntactically and semantically, although they have much improved. Try representing an arbitrary relation (graph) in XML, then writing, say, an XPath expression to compute its connected components.
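For contrast with the XPath challenge, here is the same computation over a graph held as a plain edge relation, in Python. Union-find is the technique chosen for this sketch, not something proposed in the thread, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(nodes: Iterable[Hashable],
                         edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Group the nodes of an undirected graph into connected components."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x: Hashable) -> Hashable:
        # Follow parent links to the component's representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        parent.setdefault(a, a)  # tolerate nodes that appear only in edges
        parent.setdefault(b, b)
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


comps = connected_components(["a", "b", "c", "d", "e", "f"],
                             [("a", "b"), ("b", "c"), ("d", "e")])
print(sorted(sorted(c) for c in comps))  # [['a', 'b', 'c'], ['d', 'e'], ['f']]
```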
Not such a good match for a discussion forum, if you ask me. XPath queries may be expressive enough, but what about speed? Do you want to represent the whole forum contents as a single XML document that is updated whenever some posting or edit is performed? Or are you thinking of some solution that keeps the whole thing in memory in parsed form? How to make it scale?
>Someone referred to XML as messy technology that couldn't be optimised. >But SleepyCat and XPath are faster than any relational system running any >one of the ugly, complex and slow-as-molasses "relational solutions" to >the hierarchical problem. I'm not familiar with this, but does it work well for a big discussion forum? How sophisticated is the querying you allow?
>For some problems you don't need a database at all: grep or perhaps >Lucene or Hyper Estraier are all that's needed. > >For some problems XML is the best choice, particularly if the data >is naturally hierarchical. ... and consists of small enough bits (documents) that don't need to be queried or manipulated collectively.
>For other problems--particularly for *large* data problems--relational >systems are the best choice....but almost never when hierarchies are >involved. I think this is far too strong a statement.
Reinier
Tegiri Nenashi - 10 Nov 2008 20:32 GMT > Hierarchical XML is better at hierarchies than relations. You would have a hard time convincing anybody here that XML is particularly good at *anything*. Now, if you want to talk about representing hierarchies in relational databases, there are a lot of sources on the web, of as good if not better quality than any alternative solution that has the XML buzzword attached to it. And, yes, messy technology can't offer any optimization insight, so be prepared for all kinds of performance issues.
JOG - 21 Nov 2008 14:07 GMT > I'm thinking about starting a hobby project. > I wrote a files-based Bulletin Board years ago. [quoted text clipped - 29 lines] > Any comments? Anybody done much with Postgres/XML? > Have any comparisons to SleepyCat? Haven't we finally done with XML? I thought it was JSON a-go-go these days?
Keith H Duggar - 03 Jan 2009 23:02 GMT > I'm thinking about starting a hobby project. > I wrote a files-based Bulletin Board years ago. > I'd like to convert it to a more database-like system, so > password-identified users could edit old posts. > > Forums are inherently hierarchical Discussions that evolve in forums are in fact not hierarchical. Claims that they are arise, I believe, chiefly from a lack of imagination and brainwashing by current interfaces.
For example, one often finds the need to respond, with one post, to many prior posts across multiple levels in a typical hierarchical view such as the "tree" view Google groups creates. That is what I am doing right now. This paragraph responds to several posts at different levels in the Google tree that all claim forums are hierarchies. However, since Google provides the capability to "reply" to but a single message, I had to choose one, thus perpetuating this false structuring.
What's more, a forum post may respond to content from other forum topics, other forums or even entirely different sources such as articles, emails, books, television, etc.
Even more amusing is that posts can actually preemptively respond to posts from the future! This most often happens when ignorant or lazy or time-constrained or just plain stupid participants blurt out their two cents without having comprehended or read or cared (respectively) about said prior post that already addressed their belched vociferous reply.
Furthermore, different parts of a single post may reply to different subsets of prior posts, topics, forums, external, or future sources. Likewise, those parts may respond only to parts of said sources.
Thus, often in a general and very useful sense a post does not have a "parent" post in the narrow sense of a hierarchical tree as some have claimed here.
To improve on the design flaws of your (and most or all other) forums, I would humbly (because I am certainly not expert enough to claim this is a very "good" set of requirements) suggest that you aim to achieve at least the following:
Phase 1: Basic
  For every post the ability to:
  1) refer to multiple posts (including THIS post and posts in other threads and forums)
  2) refer to external sources
  3) denote that a referent REPLIES to a referent
Phase 2: Content Parts
  For arbitrary parts of posts the ability to:
  4) refer to multiple arbitrary parts of multiple posts
Phase 3: Temporal Correction
  For arbitrary content parts the ability to:
  5) edit the content part to add or remove referents
Phase 4: Semantic Enrichment
  6) In addition to the basic REPLIES, the ability to denote that a referent SUPPORTS, DISPUTES, REBUTS, AGREES, CLARIFIES, CALLS-UTTER-BULLSHIT, etc. a referent (possibly including THIS).
I think you would find that the above far more advanced forum fits nicely into a relational model and would support more efficient and productive discussion. For example, imagine how much easier it would be to refute a vociferous ignoramus when they continue to repeat the same bullshit. You can simply edit one of your prior responses adding a CALLS-UTTER-BULLSHIT reference to their latest post and immediately it could appear in various forum views.
KHD
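A minimal sketch of how Phase 1 and Phase 4 might map onto relations, assuming SQLite through Python's standard sqlite3 module. The table and column names (post, reference_kind, post_ref) are hypothetical, not something proposed in the thread. Each reference is one row carrying a label, so a post that replies to two earlier posts and disputes one of them is just three rows, with no "parent" column anywhere. Phases 2 and 3 (content parts, editing referents) are not modelled here.

```python
import sqlite3

# Hypothetical schema: one row per post, one row per labelled reference.
SCHEMA = """
CREATE TABLE post (
    post_id INTEGER PRIMARY KEY,
    author  TEXT NOT NULL,
    body    TEXT NOT NULL
);
CREATE TABLE reference_kind (
    kind TEXT PRIMARY KEY          -- REPLIES, SUPPORTS, DISPUTES, ...
);
CREATE TABLE post_ref (
    from_post INTEGER NOT NULL REFERENCES post(post_id),
    to_post   INTEGER NOT NULL REFERENCES post(post_id),  -- may equal from_post (THIS)
    kind      TEXT    NOT NULL REFERENCES reference_kind(kind),
    PRIMARY KEY (from_post, to_post, kind)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory forum in which one post refers to several others."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO reference_kind VALUES (?)",
        [("REPLIES",), ("SUPPORTS",), ("DISPUTES",), ("CLARIFIES",)],
    )
    conn.executemany(
        "INSERT INTO post VALUES (?, ?, ?)",
        [
            (1, "alice", "Forums are inherently hierarchical."),
            (2, "bob", "Hierarchies are tough with relations."),
            (3, "carol", "One reply to both of you."),
        ],
    )
    # Post 3 replies to posts 1 and 2 and also disputes post 1.
    conn.executemany(
        "INSERT INTO post_ref VALUES (?, ?, ?)",
        [(3, 1, "REPLIES"), (3, 2, "REPLIES"), (3, 1, "DISPUTES")],
    )
    return conn


def referents(conn: sqlite3.Connection, post_id: int) -> list[tuple[int, str]]:
    """Return (to_post, kind) for everything post_id refers to."""
    return conn.execute(
        "SELECT to_post, kind FROM post_ref WHERE from_post = ? ORDER BY to_post, kind",
        (post_id,),
    ).fetchall()


print(referents(build_demo(), 3))  # [(1, 'DISPUTES'), (1, 'REPLIES'), (2, 'REPLIES')]
```

Keeping the labels in their own table means adding a new kind of reference (say CALLS-INTO-QUESTION) is an INSERT rather than a schema change.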
Keith H Duggar - 06 Jan 2009 03:54 GMT
> > I'm thinking about starting a hobby project.
> > I wrote a files-based Bulletin Board years ago.
[...]
> > KHD
Since none of salmobytes, whileone, BS, etc. have responded, I can only surmise that they now realize a forum discussion is in fact not a hierarchy. Glad I could help.
KHD
rpost - 07 Jan 2009 17:03 GMT
>> I'm thinking about starting a hobby project.
>> I wrote a files-based Bulletin Board years ago.
[...]
>Claims that they are arise, I believe, chiefly from a lack of
>imagination and brainwashing by current interfaces.
I strongly doubt it.
>For example, one often finds the need to respond, with one
>post, to many prior posts across multiple levels in a typical
>hierarchical view such as the "tree" view Google Groups creates.
Indeed, sometimes I do; but not often. Is this due to an arbitrary restriction in the interfaces, or is it due to a more fundamental restriction in how discussions proceed? I think the latter. Replies to multiple postings would be more complex in character: e.g. quoted material would now have to be marked with its originating posting in some way, and it's not clear whether such replies would be sufficiently understandable to those who arrive at them having read just one or only a few of the postings they respond to. Will readers be prepared to back up all the time into threads they haven't read in order to make sense of the exchange? Won't the result produce the 'lost in hyperspace' problem that has caused pretty much every hypertext and website to structure its material into a hierarchy full of crosslinks, even when there is little or no technological support for doing so? I think it will.
But you have a good point: has it even been tried?
>That is what I am doing right now. This paragraph responds to
>several posts at different levels in the Google tree that all
>claim forums are hierarchies. However, since Google provides
>the capability to "reply" to but a single message, I had to
>choose one, thus perpetuating this false structuring.
In my posting software I can arbitrarily edit the References: header, but you're right: all the viewers I know only present threads as trees, never as arbitrary directed acyclic graphs.
>What's more, a forum post may respond to content from
>other forum topics, other forums or even entirely different
>sources such as articles, emails, books, television, etc.
Cross-linking in discussion happens a lot in web-based writing, of course: e.g. blogs responding to each other, with talkback/pings to create the forward links. This approaches what you have in mind, I think. Yet, while blogs are full of hyperlinks, their internal organization is nearly always linear or hierarchical. This is not because of necessary technological limitations, but because of limitations in their users: if it were otherwise, postings would be much harder to find, to read and to write. E.g. I find editing and organizing Wikis pretty difficult.
>Even more amusing is that posts can actually preemptively
>respond to posts from the future! This most often happens
>when ignorant or lazy or time-constrained or just plain
>stupid participants blurt out their two cents without having
>comprehended or read or cared (respectively) about said prior
>post that already addressed their belched vociferous reply.
Yes, but we can't preemptively guess NNTP Message-IDs. This is of course an implementation restriction, not a fundamental one.
>Furthermore, different parts of a single post may reply to
>different subsets of prior posts, topics, forums, external,
>or future sources. Likewise, those parts may respond only
>to parts of said sources.
Yes, this happens all the time, and in USENET well-established conventions exist for keeping this manageable (which I'm using here). A strong point is that they are really simple and expressed in plain text. Can something equally simple suffice for a discussion environment in which multi-replying is the norm?
>Thus, often in a general and very useful sense a post does
>not have a "parent" post in the narrow sense of a hierarchical
>tree as some have claimed here.
No, but the question is how useful it would be for the discussion environment to allow postings with *multiple* parents (meaning, I suppose, that we can navigate the postings as a DAG rather than just a tree).
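To make the "multiple parents" question concrete, here is a small sketch in plain Python over a hypothetical thread (the post names and the PARENTS mapping are illustrative assumptions) in which post d replies to both b and c. It computes the context a reader would have to back up into before reading a post, ordered so that every post comes after the posts it replies to. In a tree that context is a single path; with multiple parents it becomes the union of several paths, which is one way to see the 'lost in hyperspace' concern raised above. graphlib requires Python 3.9 or later.

```python
from collections import deque
from graphlib import TopologicalSorter

# Hypothetical thread: each post maps to the posts it replies to.
# Post "d" replies to both "b" and "c", so this is a DAG, not a tree.
PARENTS: dict[str, list[str]] = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
    "e": ["d"],
}


def context(parents: dict[str, list[str]], post: str) -> set[str]:
    """Every post reachable by following reply links backwards from `post`."""
    seen: set[str] = set()
    queue = deque(parents[post])
    while queue:
        p = queue.popleft()
        if p not in seen:  # a post reached by two paths is visited once
            seen.add(p)
            queue.extend(parents[p])
    return seen


def reading_order(parents: dict[str, list[str]], post: str) -> list[str]:
    """The context of `post`, ordered so each post follows the posts it replies to."""
    needed = context(parents, post)
    # Every parent of a needed post is itself needed, so this sub-graph is closed.
    sub = {p: parents[p] for p in needed}
    return list(TopologicalSorter(sub).static_order())


print(sorted(context(PARENTS, "e")))  # ['a', 'b', 'c', 'd']
print(reading_order(PARENTS, "e"))    # 'a' first and 'd' last; 'b'/'c' in either order
```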
>To improve on the design flaws of your (and most or all other)
>forums, I would humbly (because I am certainly not expert
[...]
>    posts in other threads and forums)
>  2) refer to external sources
What does this mean, exactly? That we can follow the reference? Just hyperlink to it, quote it or attach a copy. That we have multiple documents open while browsing? In my web browser I have this all the time. That we can quickly determine a specific set of documents that become parents when initiating a reply? This is harder. Hypertext systems of the past supported stuff like this, but I don't know how user-friendly it was.
>  3) denote that a referent REPLIES to a referent
What does this mean? That when at the referenced source we can follow the reference backwards to arrive at the reply? This is also hard, because the software controlling the creation of the reply doesn't usually control how the referred sources are presented (usually to others, and written by others). But e.g. trackbacks/pings address it.
Anything more?
>Phase 2: Content Parts
>  For arbitrary parts of posts the ability to:
>  4) refer to multiple arbitrary parts of multiple posts
How to do this in a sufficiently usable way?
>Phase 3: Temporal Correction
>  For arbitrary content parts the ability to:
>  5) edit the content part to add or remove referents
Some forum software allows this. Replies may become invalid. What you end up with is not a discussion forum, but a Wiki: writing for Wikis is very different.
>Phase 4: Semantic Enrichment
>  6) In addition to the basic REPLIES, the ability to
>     denote that a referent SUPPORTS, DISPUTES, REBUTS,
>     AGREES, CLARIFIES, CALLS-UTTER-BULLSHIT, etc. a
>     referent (possibly including THIS).
The problem with this idea, as with any semantic enrichment, is that the labels, even when users can be trained to apply them, will rarely be accurate, unambiguous or complete. E.g. I may agree with your premise, but disagree that it supports your conclusion. Do I get to modify your SUPPORTS to CALLS-INTO-QUESTION?
>I think you would find that the above far more advanced forum
>fits nicely into a relational model
It doesn't make any difference.
The basic issue is the need to traverse along the discussion threads, which relational systems aren't usually optimized for, if they can express it at all. Whether the relation forms trees or arbitrary DAGs doesn't make any difference.
The resolution, I think, is to optimize this type of use, either within the query engine or in some other way.
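One standard-SQL way to express such a traversal is a recursive common table expression (WITH RECURSIVE, introduced in SQL:1999). The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical reply(child, parent) table. It only illustrates that the same query walks a tree or a DAG unchanged; it says nothing about whether a particular engine optimizes the query well, which is the point raised above. Using UNION rather than UNION ALL discards posts already reached by another path, so a post with two parents appears once, and a malformed cycle cannot make the recursion loop forever.

```python
import sqlite3

# Hypothetical table reply(child, parent): `child` replies to `parent`.
# A post with several parents simply has several rows.
TRAVERSE = """
WITH RECURSIVE thread(post) AS (
    SELECT ?
    UNION  -- not UNION ALL: a post reached by a second path is not added again
    SELECT r.child
    FROM reply AS r
    JOIN thread AS t ON r.parent = t.post
)
SELECT post FROM thread ORDER BY post
"""


def descendants(edges: list[tuple[int, int]], root: int) -> list[int]:
    """All posts in the discussion below `root`, including `root` itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reply (child INTEGER, parent INTEGER, PRIMARY KEY (child, parent))"
    )
    conn.executemany("INSERT INTO reply VALUES (?, ?)", edges)
    rows = conn.execute(TRAVERSE, (root,)).fetchall()
    conn.close()
    return [post for (post,) in rows]


TREE = [(2, 1), (3, 1), (4, 2), (5, 3)]
DAG = TREE + [(5, 2)]  # post 5 now replies to both 2 and 3

print(descendants(TREE, 2))  # [2, 4]
print(descendants(DAG, 2))   # [2, 4, 5]
print(descendants(DAG, 1))   # [1, 2, 3, 4, 5]
```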
>and would support more
>efficient and productive discussion. For example, imagine how
[...]
>reference to their latest post and immediately it could appear
>in various forum views.
You can; but will you? And where do you stop? E.g. why not label with specific logical fallacies (STRAW-MAN, AD-NAUSEAM, BEGS-QUESTION)? I'll tell you: the labelers won't agree on when to use which labels.
>KHD
Reinier