newby (very) question on XML DB theory
ccc31807 - 16 Mar 2004 14:27 GMT
I have an assignment to survey XML database technologies in about three weeks. I've looked closely at eXcite, reviewed XQuery and XPath (and others) at w3.org, and essentially have tried to do all the due diligence I can.
I *know* that I will get a question like this, "Where can I go to read up on the theory of XML databases?" If I cite to internet articles, I will be told, "If it's not in print, it's worthless."
Question: does anyone suggest a source of XML DB theory, preferably in journals of a professional and academic nature? IEEE and ACM are not a problem.
Besides, I need to do a fast read on this, so I'm looking for some substantial stuff that's not overly dense or highly theoretical.
Please reply to this message rather than privately. And accept my gratitude in advance for all those who reply.
Thanks, Charles Carter.
Eric Kaun - 16 Mar 2004 17:35 GMT
> I have an assignment to survey XML database technologies in about
> three weeks. I've looked closely at eXcite, reviewed XQuery and XPath
[quoted text clipped - 8 lines]
> journals of a professional and academic nature? IEEE and ACM are not a
> problem.
The most lucid explanation of XML "data theory" is an article called The Essence of XML, by Philip Wadler, formerly of functional/monad "fame".
In addition, look for W3C articles on the XML Infoset. Finally, look for articles by Don Chamberlin about XQuery. I think XQuery is a mess, but those explain it fairly well.
> Besides, I need to do a fast read on this, so I'm looking for some
> substantial stuff that's not overly dense or highly theoretical.
So you're looking for theory that's not highly theoretical? Hmmm...
> Please reply to this message rather than privately. And accept my
> gratitude in advance for all those who reply.
- Eric
Mikito Harakiri - 16 Mar 2004 18:38 GMT
> I have an assignment to survey XML database technologies in about
> three weeks. I've looked closely at eXcite, reviewed XQuery and XPath
[quoted text clipped - 14 lines]
> Please reply to this message rather than privately. And accept my
> gratitude in advance for all those who reply.
Check out V. Vianu's review in SIGMOD Record:
http://www.acm.org/sigmod/record/issues/0306/D2-DBP.VictorVianu.pdf
ccc31807 - 30 Mar 2004 19:20 GMT
Thanks much for these responses. The nature of the project has changed, but my assignment hasn't, and I've found several articles in SIGMOD that were very helpful.
Some very smart people have told me that there can be no such thing as a native XML database, but when pressed, they acknowledge that XML is a superior way to mark up data. Because of the time factor, searches will be done using program modules, but I'm still to go ahead with the presentation.
CC
Eric Kaun - 30 Mar 2004 21:26 GMT
> Thanks much for these responses. The nature of the project has
> changed, but my assignment hasn't, and I've found several articles in
[quoted text clipped - 5 lines]
> will be done using program modules, but I'm still to go ahead with the
> presentation.
What is the meaning of "mark up", and why would one want to do it? What is the difference between "marking up" and actually defining your data in some rigorous way? I assume you're talking about marking up TEXT, rather than data.
ccc31807 - 08 Apr 2004 19:20 GMT
> What is the meaning of "mark up", and why would one want to do it? What is
> the difference between "marking up" and actually defining your data in some
> rigorous way? I assume you're talking about marking up TEXT, rather than
> data.
In this case, the data is all text (ASCII or ISO-8859-1). The data will mostly be searched. That is, it will be updated very seldom. The results of searches need to be displayed via HTTP. In this case, it just makes sense to create a little application that can search the documents using XPath, and use CSS to display the resulting data.
We can assume that the datums will look something like this:
<course no="CPSC101" credit="3" hours="3" lab="4">
  <title>Introduction to Programming Logic</title>
  <prerequisite>none</prerequisite>
  <description>Whatever the description is ... </description>
</course>
CC
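The "little application that can search the documents using XPath" described in the post above might look roughly like the following. This is only a sketch under stated assumptions: the <catalog> wrapper, the second course, and the find_titles helper are hypothetical and are not from the thread, and Python's standard xml.etree.ElementTree supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: only the first <course> is taken from the post;
# the <catalog> wrapper and the second course are invented for illustration.
CATALOG = """
<catalog>
  <course no="CPSC101" credit="3" hours="3" lab="4">
    <title>Introduction to Programming Logic</title>
    <prerequisite>none</prerequisite>
    <description>Whatever the description is ...</description>
  </course>
  <course no="CPSC201" credit="3" hours="3" lab="0">
    <title>Data Structures</title>
    <prerequisite>CPSC101</prerequisite>
    <description>Lists, trees, and hashing.</description>
  </course>
</catalog>
"""


def find_titles(xml_text, path):
    """Return the <title> text of every element matched by the given path.

    `path` must stay within the XPath subset that ElementTree understands.
    """
    root = ET.fromstring(xml_text.strip())
    return [course.findtext("title") for course in root.findall(path)]


# Search by attribute value.
by_number = find_titles(CATALOG, ".//course[@no='CPSC101']")
# Search by the text of a child element.
by_prereq = find_titles(CATALOG, ".//course[prerequisite='CPSC101']")

print(by_number)  # ['Introduction to Programming Logic']
print(by_prereq)  # ['Data Structures']
```

Turning the matched elements into HTML and styling them with CSS would be a separate step that this sketch leaves out.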
Eric Kaun - 08 Apr 2004 20:31 GMT
> > What is the meaning of "mark up", and why would one want to do it? What is
> > the difference between "marking up" and actually defining your data in some
[quoted text clipped - 14 lines]
> <description>Whatever the description is ... </description>
> </course>
Of course you can do it that way, although frequently-updated data aren't the only data which benefit from a relational database. If you already have it in that format, won't need to share the data with other programs, don't need much efficiency, won't be doing ad hoc queries, etc. etc., then using XPath on text in that format is the quickest way to achieve HTML output.
ccc31807 - 09 Apr 2004 14:26 GMT
Basically, we have a lot of data in documents with a highly irregular structure that we need to make available over networks. Historically, this evolved from printed text documents, to a spreadsheet, to a flat file database, to a real database. The question is: can we do this without the overhead of the RDBMS? My job was not to answer yes or no, but to point the decision makers in the direction of the information.
We're fairly well versed in DB stuff but totally ignorant as to XML database possibilities.
CC
> Of course you can do it that way, although frequently-updated data aren't
> the only data which benefit from a relational database. If you already have
> it in that format, won't need to share the data with other programs, don't
> need much efficiency, won't be doing ad hoc queries, etc. etc., then using
> XPath on text in that format is the quickest way to achieve HTML output.
Akmal B. Chaudhri - 09 Apr 2004 16:06 GMT
> Basically, we have a lot of data in documents with a highly irregular
> structure that we need to make available over networks. Historically,
[quoted text clipped - 5 lines]
> We're fairly well versed in DB stuff but totally ignorant as to XML
> database possibilities.
O.K. Two very good sites with resources on XML and Databases and XML Databases, etc. [1], [2]. Several books also available. Try a search at Amazon on XML Databases, for example, and you should get a couple of hits, e.g. [3], [4], [5].
HTH
akmal
[1] http://www.rpbourret.com/xml/
[2] http://xml.coverpages.org/xmlAndDatabases.html
[3] Professional XML Databases (if you plan to use an RDBMS)
[4] XML Data Management (this is one I helped edit)
[5] Designing XML Databases (if you plan to use an RDBMS)
Eric Kaun - 12 Apr 2004 17:06 GMT
> Basically, we have a lot of data in documents with a highly irregular
> structure that we need to make available over networks. Historically,
[quoted text clipped - 5 lines]
> We're fairly well versed in DB stuff but totally ignorant as to XML
> database possibilities.
I'd recommend looking at the basics: www.dbdebunk.com will give you a very critical view of XML, and will point you to the writings of Chris Date and Fabian Pascal. Using an XML database assumes you'll only ever want to spit out XML in the same way you stored it; you sacrifice the meaning of the data to the god of presentation. Stick with a relational DBMS, and try some different ones. Is raw speed your main requirement? Do clients always want the data only in one format?
I'm sure others can point you to XML database information, but rest assured it's got a very feeble foundation.
- Eric
Dawn M. Wolthuis - 12 Apr 2004 17:47 GMT
> > Basically, we have a lot of data in documents with a highly irregular
> > structure that we need to make available over networks. Historically,
[quoted text clipped - 11 lines]
> out XML in the same way you stored it; you sacrifice the meaning of the data
> to the god of presentation.
That is true of some databases that persist the documents as documents without the ability to query the values stored within. But you can have both -- easy presentation of data the way that it will most likely benefit the user and the ability to query the data stored in those documents "simply" by storing the data in nested (graph/tree) structures. A query tool that has been available under many different names for almost (but not quite) 40 years (!!!) is the query language associated with PICK (such as UniQuery, English, Retrieve, Access, AQL, jQL, and many more).
> Stick with a relational DBMS, and try some
> different ones. Is raw speed your main requirement? Do clients always want
> the data only in one format?
I'm still thinking that is not your best bang for the buck. I'm hoping the XML databases get to the point where their query language is as easy as PICK and have hope for them yet (in spite of those tags they drag along with every value, ugh!)
> I'm sure others can point you to XML database information, but rest assured
> it's got a very feeble foundation.
Xindice (I think that is apache.org), Berkeley DB XML at sleepycat.com, and if you check the w3c.org site it will point you to others, I'm pretty sure. Good luck! --dawn
Jan Hidders - 13 Apr 2004 21:18 GMT
> I'm still thinking that is not your best bang for the buck. I'm hoping the
> XML databases get to the point where their query language is as easy as PICK
> and have hope for them yet (in spite of those tags they drag along with
> every value, ugh!)
You think XQuery is too difficult? By the way, what makes you think that in XQuery / XML databases you have to drag tags along with values?
-- Jan Hidders
Dawn M. Wolthuis - 14 Apr 2004 06:26 GMT
> > I'm still thinking that is not your best bang for the buck. I'm hoping the
> > XML databases get to the point where their query language is as easy as PICK
> > and have hope for them yet (in spite of those tags they drag along with
> > every value, ugh!)
>
> You think XQuery is too difficult?
I think it is a language for IT professionals and would like to see a standard "end-user query language." I'll admit that I haven't done enough work with XQuery to see just how simple one could make queries against a database by defining functions, virtual data, and "views" of the data so that the user need not think in terms of navigating. I realize that GUI's can be put "on top" of it as it is, but such a GUI would presumably be proprietary rather than a standard end-user tool for database access. Users who can ask their database now with a 40-year-old language by typing LIST COURSES WITH INSTRUCTOR_LAST_NAME LIKE "VAN..." (even though the instructor last name is not stored in the courses "file") might not find the comparable XQuery statement an advance in query languages. I'm optimistic that after I dig into it further it will become clearer to me how XQuery will really BE an advance. I DEFINITELY like the fact that it reads non-1NF data.
> By the way, what makes you think that
> in XQuery / XML databases you have to drag tags along with values?
I don't think you need to do so for XQuery, but I did think that was the case with XML databases, or at least that your input to and output from the databases using "read" and "writes" would retrieve XML documents. I haven't coded anything to use an XML database, so I could be completely wrong on that -- feel free to enlighten me. Thanks. --dawn
Laconic2 - 14 Apr 2004 13:15 GMT
Why does it have to be a language?
Why can't it just be some kind of point-and-click drag-and-drop pick from the menu graphics oriented package like powerplay? Once you have the data loaded, slicing and dicing it any way you want, and packaging and presenting it is child's play.
Isn't that the whole idea? To turn data query into an arcade game?
Dawn M. Wolthuis - 14 Apr 2004 15:55 GMT
> Why does it have to be a language?
>
[quoted text clipped - 4 lines]
>
> Isn't that the whole idea? To turn data query into an arcade game?
Yes, but have you seen any industry standards for query GUI's? --dawn
Laconic2 - 14 Apr 2004 16:27 GMT
> Yes, but have you seen any industry standards for query GUI's? --dawn
No, and that's the whole point. If you share persistent data, and you don't have standards, you get an unrecognizable mess.
If you impose rigid standards on data as presented to the user, you prevent the user from "doing whatever the user wants".
I wish there could be a user interface that would make the data "look like" PICK data to you, while the real underlying data is actually in 1NF, and maybe more. Or how about a PICK with an SQL interface?
Dawn M. Wolthuis - 14 Apr 2004 16:35 GMT
> > Yes, but have you seen any industry standards for query GUI's? --dawn
>
[quoted text clipped - 7 lines]
> PICK data to you, while the real underlying data is actually in 1NF, and
> maybe more. Or how about a PICK with an SQL interface?
I've "done" both and I'm thankful that with "XML data model" languages (no need to correct me on that) on the horizon, there will be no need to take the non-1NF data and translate it to 1NF for any purposes. I have a few issues with various web services standards (so don't get me started) but I sure appreciate that we don't have to 1NF the data anywhere in the process. --dawn
Eric Kaun - 14 Apr 2004 21:17 GMT
> I've "done" both and I'm thankful that with "XML data model" languages (no
> need to correct me on that) on the horizon, there will be no need to take
> the non-1NF data and translate it to 1NF for any purposes. I have a few
> issues with various web services standards (so don't get me started) but I
> sure appreciate that we don't have to 1NF the data anywhere in the
> process. --dawn
It seems a huge price to pay for avoiding 1NF... don't throw the baby out with the bathwater.
Anthony W. Youngman - 28 Apr 2004 18:06 GMT
>> I've "done" both and I'm thankful that with "XML data model" languages (no
>> need to correct me on that) on the horizon, there will be no need to take
[quoted text clipped - 5 lines]
>It seems a huge price to pay for avoiding 1NF... don't throw the baby out
>with the bathwater.
1NF is a huge price to pay for having normalised data ... if you've got any sense you'll throw the cuckoo out the nest :-)
Cheers, Wol
 Signature Anthony W. Youngman - wol at thewolery dot demon dot co dot uk HEX wondered how much he should tell the Wizards. He felt it would not be a good idea to burden them with too much input. Hex always thought of his reports as Lies-to-People. The Science of Discworld : (c) Terry Pratchett 1999
Anthony W. Youngman - 28 Apr 2004 18:04 GMT
>> Yes, but have you seen any industry standards for query GUI's? --dawn
>
[quoted text clipped - 7 lines]
>PICK data to you, while the real underlying data is actually in 1NF, and
>maybe more. Or how about a PICK with an SQL interface?
In other words, any modern Pick?
AFAIK they pretty much all work fine with SQL - it's just that their native query language is easier to use (if you're querying, not updating).
Cheers, Wol
Jan Hidders - 14 Apr 2004 21:55 GMT
>>You think XQuery is too difficult?
>
[quoted text clipped - 7 lines]
> who can ask their database now with a 40-year-old language by typing
> LIST COURSES WITH INSTRUCTOR_LAST_NAME LIKE "VAN..."
//Courses[starts-with(.//@instructor_last_name, "VAN")]
I've taught this stuff to CS students and non-CS students. I have no idea why you think this would be too difficult for the latter. I do, however, have an idea why a non-declarative query language that requires programming if queries get a little bit more difficult would be problematic for them.
Not that I would claim that XQuery is as simple as can be. Far from that.
>>By the way, what makes you think that
>>in XQuery / XML databases you have to drag tags along with values?
>
> I don't think you need to do so for XQuery, but I did think that was the
> case with XML databases, or at least that your input to and output from the
> databases using "read" and "writes" would retrieve XML documents.
?? Obviously you have to indicate which documents you are querying. What does that have to do with dragging tags along?
-- Jan Hidders
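To make the semantics of the expression in the post above concrete, here is a rough Python equivalent. It is a sketch under stated assumptions: the sample document, the no attribute, and the function name are hypothetical, and the filter is written by hand because the XPath subset in Python's standard xml.etree.ElementTree does not provide starts-with(). It assumes XPath 1.0 behaviour, where starts-with() applied to a node-set uses the string value of the first node in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; the thread never shows the actual document.
DOC = """
<catalog>
  <Courses no="CPSC101"><instructor instructor_last_name="VANDERBILT"/></Courses>
  <Courses no="CPSC201"><instructor instructor_last_name="SMITH"/></Courses>
  <Courses no="CPSC301"><instructor instructor_last_name="VAN DYKE"/></Courses>
</catalog>
"""


def courses_with_instructor_prefix(xml_text, prefix):
    """Hand-written stand-in for
    //Courses[starts-with(.//@instructor_last_name, prefix)]
    returning the (hypothetical) `no` attribute of each matching element.
    """
    root = ET.fromstring(xml_text.strip())
    matches = []
    for course in root.iter("Courses"):
        # iter() visits the element itself, then its descendants,
        # in document order.
        for node in course.iter():
            name = node.get("instructor_last_name")
            if name is not None:
                if name.startswith(prefix):
                    matches.append(course.get("no"))
                break  # only the first attribute found is tested
    return matches


print(courses_with_instructor_prefix(DOC, "VAN"))  # ['CPSC101', 'CPSC301']
```

The one-line path expression and the dozen lines of navigation code select the same elements, which is roughly the declarative versus procedural contrast being argued in this part of the thread.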
Dawn M. Wolthuis - 14 Apr 2004 22:42 GMT
<snip>
From:
>> LIST COURSES WITH INSTRUCTOR_LAST_NAME LIKE "VAN..."
to:
> //Courses[starts-with(.//@instructor_last_name, "VAN")]
That's how far we have moved in 40 years? It doesn't look like it is even in a forward direction, does it? Read each out loud. Are you pleased with that? Look at where hardware has come during those same 40 years! Cheers! --dawn
<snip>
Jan Hidders - 14 Apr 2004 23:24 GMT
> <snip>
>
[quoted text clipped - 7 lines]
>
> That's how far we have moved in 40 years?
No, it's just a single XQuery expression that shows that a certain query is not as complicated in XQuery as you implied it was. Why you or anyone would think that one such example describes completely and in all aspects "how far we have moved" is beyond me and strikes me as a bit simplistic.
-- Jan Hidders
Dawn M. Wolthuis - 15 Apr 2004 00:21 GMT
> > <snip>
> >
[quoted text clipped - 13 lines]
> aspects "how far we have moved" is beyond me and strikes me as a bit
> simplistic.
We are not yet on the same wavelength on this one, Jan. Please understand that I'm excited about the possibilities with XQuery more than your average SQL-kinda-guy. I have not used XQuery yet, but I have read much of "XQuery from the Experts" and have verified some of the aspects I care about, such as use of a 2-valued logic. I'm not concerned about the complexity of the language "for me" but am lamenting the loss that I can see exemplified (not proven) in what you see above. This is the change from the language written as GIRLS in the mid-60's to XQuery 40 years later, both working against very similar data models. XQuery has many features that GIRLS (the PICK query language) lacks and will be, on the whole, a move forward.
But, I don't think this is just one of those "I like my tools because I know them" issues -- I think that everyone here in Iowa could see what I mean by reading those statements -- can you see it and understand at least a little bit my lament on this? Couldn't we retain the human-language-like nature of both PICK and to a lesser extent, SQL, and still accomplish the goals of XQuery? Must it look and read like the language of a computer, even though we know it is? --dawn
Jan Hidders - 15 Apr 2004 20:08 GMT
> But, I don't think this is just one of those "I like my tools because I know
> them" issues -- I think that everyone here in Iowa could see what I mean by
> reading those statements -- can you see it and understand at least a little
> bit my lament on this?
Absolutely.
> Couldn't we retain the human-language-like nature of
> both PICK and to a lesser extent, SQL, and still accomplish the goals of
> XQuery?
No. But I'm quite sure that for a very limited subset one could come up with an elegant and more natural language interface.
-- Jan Hidders
Mikito Harakiri - 14 Apr 2004 22:59 GMT
> //Courses[starts-with(.//@instructor_last_name, "VAN")]
>
[quoted text clipped - 3 lines]
> programming if queries get a little bit more difficult would be
> problematic for them.
May I suggest that XQuery is more complex than SQL? Because it's less pure:-?
Take outer join, for example. As purity is broken (by allowing nulls into the result set), selection and outer join operations don't commute anymore. It took me some time (with usenet help) to realize that
select * from t1 left join t2 on t1.id=t2.id and t2.id=2
is different from
select * from t1 left join t2 on t1.id=t2.id where t2.id=2
for example. This is never an issue with ordinary joins and selections. Simplicity of the underlying algebra is the key for the query language success. (Hmm, what about SQL?)
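The difference between the two queries in the post above can be checked with a few lines of Python and SQLite. This is a sketch under stated assumptions: the contents of t1 and t2 are invented, since the post gives only the queries, and explicit columns are selected instead of * to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample rows; the post shows only the two queries.
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (2), (3);
""")

# Filter inside ON: every t1 row survives; t2 columns are NULL unless
# the t2 row both matches on id and has id = 2.
on_filter = conn.execute(
    "SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 "
    "ON t1.id = t2.id AND t2.id = 2 ORDER BY t1.id"
).fetchall()

# Filter in WHERE: applied after the outer join, so the NULL-padded rows
# (and the id = 3 match) are discarded.
where_filter = conn.execute(
    "SELECT t1.id, t2.id FROM t1 LEFT JOIN t2 "
    "ON t1.id = t2.id WHERE t2.id = 2 ORDER BY t1.id"
).fetchall()

print(on_filter)     # [(1, None), (2, 2), (3, None)]
print(where_filter)  # [(2, 2)]
conn.close()
```

With an inner join in place of LEFT JOIN, both forms return only the row (2, 2), which is the sense in which ordinary joins and selections commute.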
Jan Hidders - 15 Apr 2004 00:08 GMT
>>//Courses[starts-with(.//@instructor_last_name, "VAN")]
>>
[quoted text clipped - 5 lines]
>
> May I suggest that XQuery is more complex than SQL?
You may. :-) Seriously though, who has claimed otherwise?
> Take outer join, for example. As purity is broken (by allowing nulls into
> result set) selection and outer join operation don't commute anymore. It
[quoted text clipped - 7 lines]
>
> for example. This is never an issue with ordinary joins and selections.
I'm not sure that is a good example because null values (for XML: undefined attributes or missing elements) are actually dealt with in XQuery in a very clear and consistent manner. But if your point is that XML is a much more complicated data model and therefore its query language is in general a much more complicated query language, then, yes, I would certainly agree.
> Simplicity of the underlying algebra is the key for the query language
> success. (Hmm, what about SQL?)
Actually the original definitions of semistructured data models were really quite simple and elegant, and the corresponding languages based on solid theory. But then XML came along and the whole thing got messy. Sounds familiar, no?
-- Jan Hidders
Timothy J. Bruce - 15 Apr 2004 19:47 GMT
Mr. Carter:
You have been sent on a fool's errand. There is no such animal as an XMLDBMS. While it is true that anything which stores data could be called a database, and thus you may indeed have an XML database by virtue of having any XML file, data existing on its own in a vacuum is useless w/o a _formal_ theory and definition (hint: it consists of one sentence and three bullets). While there are RDBMS and SQLDBMS systems I have yet to see an XMLDBMS.
Maybe you should make one,
Timothy J. Bruce
uniblab@hotmail.com
</RANT>