Database Forum / General DB Topics / DB Theory / November 2007
XML storing and management
karsoods53 - 26 Sep 2007 13:16 GMT I get XML feeds as input and have to store this data on our server. I have worked with databases but new to XML. Can someone tell me how I can store and manage this data?
JOG - 26 Sep 2007 14:53 GMT > I get XML feeds as input and have to store this data on our server. I > have worked with databases but new to XML. Can someone tell me how I > can store and manage this data? This isn't really the right place for this sort of question - XML is somewhat of an anathema to a database theory group.
Assuming that you have no choice however, when I have been forced to deal with XML storage directly I have found Oracle tools to be adequate..
Jan Hidders - 26 Sep 2007 16:04 GMT > > I get XML feeds as input and have to store this data on our server. I > > have worked with databases but new to XML. Can someone tell me how I > > can store and manage this data? > > This isn't really the right place for this sort of question - XML is > somewhat of an anathema to a database theory group. You might want to inform the good people at database theory conferences such as PODS and ICDT about this, because they seem to differ somewhat. I have no doubt they will accept your authority in these matters immediately. :-)
-- Jan Hidders
Bob Badour - 26 Sep 2007 16:11 GMT >>>I get XML feeds as input and have to store this data on our server. I >>>have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 9 lines] > > -- Jan Hidders Sigh. Such a waste of talent and resources.
paul c - 28 Sep 2007 03:51 GMT >>>> I get XML feeds as input and have to store this data on our server. I >>>> have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 11 lines] > > Sigh. Such a waste of talent and resources. resources, maybe, assuming "talent" excludes inspiration!
thesaboteurs@gmail.com - 26 Sep 2007 19:29 GMT > > > I get XML feeds as input and have to store this data on our server. I > > > have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 9 lines] > > -- Jan Hidders Heh, point taken. If only they would....
JOG - 27 Sep 2007 01:19 GMT On Sep 26, 7:29 pm, thesabote...@gmail.com wrote:
> > > > I get XML feeds as input and have to store this data on our server. I > > > > have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 11 lines] > > Heh, point taken. If only they would.... Must start trying to remember i'm not the only person using google groups in my house ;) Jim.
Ok, so why is it exactly cdt, despite the inherent flaws of a hierarchical model such as XML, it has seen such widespread uptake?
Bob Badour - 27 Sep 2007 02:01 GMT > On Sep 26, 7:29 pm, thesabote...@gmail.com wrote: > [quoted text clipped - 19 lines] > Ok, so why is it exactly cdt, despite the inherent flaws of a > hierarchical model such as XML, it has seen such widespread uptake? Hans Christian Andersen did an excellent piece on that very topic a while back.
Jan Hidders - 27 Sep 2007 14:23 GMT > Ok, so why is it exactly cdt, despite the inherent flaws of a > hierarchical model such as XML, it has seen such widespread uptake? It's all hype, of course.
Btw., what fundamental flaws?
-- Jan Hidders
Bob Badour - 27 Sep 2007 15:27 GMT >>Ok, so why is it exactly cdt, despite the inherent flaws of a >>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 4 lines] > > -- Jan Hidders Well, let's see... How about we start with: "The inability to re-order the data without changing meaning and without destroying information." ?
Jan Hidders - 27 Sep 2007 17:44 GMT > >>Ok, so why is it exactly cdt, despite the inherent flaws of a > >>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 5 lines] > Well, let's see... How about we start with: "The inability to re-order > the data without changing meaning and without destroying information." ? "Hierarchical models such as XML" are not necessarily ordered-only data models. In fact most proposals for semistructured data models before XML weren't.
But even in XML this is not a big problem. Whether reordering destroys information or not depends on your interpretation of the data. If you send me an XML document and in addition tell me that certain parts represent sets then I can reorder them without destroying any information. The fact that XML is an ordered data model only implies that it *might* destroy information, not that it *must*. There are ways to declare in languages such as XQuery that certain parts, or the whole tree, or certain final or intermediate results should be interpreted as unordered, so that too is not really a big or fundamental problem.
Of course, I'm not claiming that it could not have been better. I'd personally prefer it if there had been a standardized way of indicating unorderedness and that this would have been taken into account in query languages such as XQuery. But the fact is that up to now neither the people that are using XQuery in practice nor the people that are implementing XQuery have been listing this as their prime concern. And you can theorize as much as you like, but it is practice that decides whether something is really a fundamental or inherent problem or not.
-- Jan Hidders
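The point about interpretation in the post above can be illustrated with a small sketch. It assumes a hypothetical document in which the sender has said that the children of `<items>` represent a set; the element names and the helper functions are invented for illustration and come from nobody in this thread. Read as a sequence, the reordered document differs from the original; read as a set, nothing is lost.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the sender has told us <items> represents a set.
doc = "<order><items><item>pear</item><item>apple</item></items></order>"

def items_as_list(xml_text):
    """Return the item texts in document order."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.find("items")]

def reorder_items(xml_text):
    """Return the same document with the children of <items> sorted by text."""
    root = ET.fromstring(xml_text)
    items = root.find("items")
    items[:] = sorted(items, key=lambda e: e.text)
    return ET.tostring(root, encoding="unicode")

reordered = reorder_items(doc)

# Ordered interpretation: the two documents differ.
same_as_sequence = items_as_list(doc) == items_as_list(reordered)
# Set interpretation: the two documents carry the same information.
same_as_set = set(items_as_list(doc)) == set(items_as_list(reordered))
```

Nothing in the XML itself records which interpretation applies, which is the gap both sides of the argument are pointing at.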
Bob Badour - 27 Sep 2007 18:07 GMT >>>>Ok, so why is it exactly cdt, despite the inherent flaws of a >>>>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 16 lines] > information. The fact that XML is an ordered data model only implies > that it *might* destroy information, not that it *must*. That's a nit. If one cannot always safely reorder, then one cannot safely reorder.
[snip]
Jan Hidders - 27 Sep 2007 21:40 GMT > >>>>Ok, so why is it exactly cdt, despite the inherent flaws of a > >>>>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 19 lines] > That's a nit. If one cannot always safely reorder, then one cannot > safely reorder. I thought we were having a serious discussion, not playing trivial word games. My mistake.
-- Jan Hidders
Bob Badour - 27 Sep 2007 21:52 GMT >>>>>>Ok, so why is it exactly cdt, despite the inherent flaws of a >>>>>>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 22 lines] > I thought we were having a serious discussion, not playing trivial > word games. My mistake. I am having a serious discussion. I am not the one picking at nits.
What your position boils down to is: XML is needlessly complex. As a result of the needless complexity, one cannot re-order the data without changing meaning and without destroying information. BUT if we add even more complexity, we can sometimes re-order data. Sometimes.
Pretending the fundamental flaw went away doesn't make it go away.
Jan Hidders - 27 Sep 2007 22:29 GMT > >>>>>>Ok, so why is it exactly cdt, despite the inherent flaws of a > >>>>>>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 24 lines] > > I am having a serious discussion. I am not the one picking at nits. I disagree. I think you are.
> What your position boils down to is: XML is needlessly complex. Not really. What I said is that concerning the aspect we were discussing it is actually missing a construct. So my position is more accurately described as that it is "too simple", not "too complex".
> As a > result of the needless complexity, one cannot re-order the data without > changing meaning and without destroying information. That is too imprecise to be correct. You can in some sense always reorder if you want to. What I said is that whether this loses information or not is a matter of interpretation. Note by the way that this is also true for the Relational Model: you cannot always arbitrarily permute the atomic values in a relation without risking changing its meaning. Also there it is a matter of interpretation whether this is actually a problem or not.
> BUT if we add even > more complexity, we can sometimes re-order data. Sometimes. Yes, when it is appropriate, which is not always.
-- Jan Hidders
mAsterdam - 27 Sep 2007 23:36 GMT Jan Hidders schreef:
> ... You can in some sense always > reorder if you want to. What I said is that whether this loses > information or not is a matter of interpretation. Note by the way that > this is also true for the Relational Model: you cannot always > arbitrarily permute the atomic values in a relation without risking > changing its meaning. Not sure I read this right. Could you give an example / please elaborate?
> Also there it is a matter of interpretation > whether this is actually a problem or not.
Bob Badour - 11 Nov 2007 16:44 GMT >>>>>>>>Ok, so why is it exactly cdt, despite the inherent flaws of a >>>>>>>>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 26 lines] > > I disagree. I think you are. Suppose I said my pickup truck is no good for digging trenches. Your position amounts to saying: "If you hooked up a hydraulic system and welded a back-hoe on the back, it would dig trenches just fine."
One simply cannot re-order an arbitrary XML document without destroying information. One can re-order any relation without ever destroying information.
>>What your position boils down to is: XML is needlessly complex. > > Not really. What I said is that concerning the aspect we were > discussing it is actually missing a construct. So my position is more > accurately described as that it is "too simple", not "too complex". Actually, it is both, but I will accept the above correction.
>>As a >>result of the needless complexity, one cannot re-order the data without [quoted text clipped - 3 lines] > reorder if you want to. What I said is that whether this loses > information or not is a matter of interpretation. So, if I reorder all of the children of several nodes immediately after just one of those nodes, it's only a matter of interpretation whether that changed the meaning?!? You are joking, right?
Note by the way that
> this is also true for the Relational Model: you cannot always > arbitrarily permute the atomic values in a relation without risking > changing its meaning. Also there it is a matter of interpretation > whether this is actually a problem or not. Could you provide an example?
>>BUT if we add even >>more complexity, we can sometimes re-order data. Sometimes. > > Yes, when it is appropriate, which is not always. So, it's too complex but adding complexity will sometimes but not always correct the problem. Sounds wonderful.
V.J. Kumar - 27 Sep 2007 18:33 GMT Jan Hidders <hidders@gmail.com> wrote in news:1190911498.923039.151330@ 19g2000hsx.googlegroups.com:
....
> -- Jan Hidders I find XML versus the relational model ball-crushingly boring. The only interesting and puzzling thing about such discussions is that seemingly smart people try to find some merit in the XML monstrosity, perhaps the greatest calamity that has ever befallen data management technology !.
There is an essay written called "A Relational Model of Data for Large Shared Data Banks". "Sapienti sat !" as they say in Sanskrit.
Roy Hann - 27 Sep 2007 19:13 GMT > Jan Hidders <hidders@gmail.com> wrote in news:1190911498.923039.151330@ > 19g2000hsx.googlegroups.com: [quoted text clipped - 6 lines] > smart people try to find some merit in the XML monstrosity, perhaps the > greatest calamity that has ever befallen data management technology !. Tut tut. There isn't much wrong with XML. It is an OK way to label tokens so that they can be addressed and relabelled in wonderful ways. The problem is people who confuse typesetting with database management. Anyone who is capable of that is capable of anything.
Roy
Cimode - 27 Sep 2007 20:00 GMT [Snipped] <<The problem is people who confuse typesetting with database management. Anyone who is capable of that is capable of anything.>> I would add that anybody who places the relational model in the same sentence as XML (in a clueless comparison of the type A vs. B) simply has no clue what he is talking about....
Jan Hidders - 27 Sep 2007 21:51 GMT > Jan Hidders <hidd...@gmail.com> wrote in news:1190911498.923039.151330@ > 19g2000hsx.googlegroups.com: [quoted text clipped - 3 lines] > > I find XML versus the relational model ball-crushingly boring. Who is doing a "versus"?
> The only > interesting and puzzling thing about such discussions is that seemingly > smart people try to find some merit in the XML monstrosity, perhaps the > greatest calamity that has ever befallen data management technology !. Yes, yes, I know, I know. The horror, the horror. But maybe with your background you will understand that XML is actually just another manifestation of the relational model. After all, in essence XPath is just first order logic over ordered node labeled trees.
> There is an essay written called "A Relational Model of Data for Large > Shared Data Banks". "Sapienti sat !" as they say in Sanskrit. They do? That's funny. We say that in Latin. :-) But I'll keep that in mind next time I treat that paper again with my students.
-- Jan Hidders
V.J. Kumar - 28 Sep 2007 01:31 GMT ...
>> There is an essay written [by Codd] called "A Relational Model of >> Data for Large Shared Data Banks". "Sapienti sat !" as they say in >> Sanskrit. > > They do? That's funny. We say that in Latin. :-) Yeah, Latin is just a Hindi-European dialect, Sanskrit simplified for Europeans' use, as Sir William Jones discovered to his utter amazement more than 200 years ago ;)
JOG - 28 Sep 2007 01:39 GMT > ... > [quoted text clipped - 7 lines] > Europeans' use as Sir William Jones discovered to his utter amazement > more than 200 hundred years ago ;) According to wikipedia, Sir William Jones believed that Sanskrit and Latin had a common ancestor, not that one was derived from the other.
I guess it depends on how valid you deem the source. But then I also like to believe wikipedia was written by one jolly person, without much of a life, but an unending supply of donuts. It's like a safety blanket for me.
Bob Badour - 28 Sep 2007 01:58 GMT >>... >> [quoted text clipped - 15 lines] > much of a life, but an unending supply of donuts. It's like a safety > blanket for me. Trisyllabic laxing. It's a feature of languages from that same root language. Latin, Greek, Sanskrit, English, German, French etc.
JOG - 28 Sep 2007 00:11 GMT > > >>Ok, so why is it exactly cdt, despite the inherent flaws of a > > >>hierarchical model such as XML, it has seen such widespread uptake? [quoted text clipped - 9 lines] > data models. In fact most proposals for semistructured data models > before XML weren't. You make a good point because from what I can tell from the literature, XML has pretty much hijacked all previous work on "semistructured" data (Lore, etc).
But never mind that - what, pray tell, exactly is this "semi-structure" anyhow? I ask this question mischievously of course, having done a lot of research on pinning down a definition and discovered that there is none, or at best it is recursive. Some definitions in peer-reviewed ACM papers are downright dreadful - "data that doesn't fit in a relational database"!?
At the moment "semi-structure" as a term is academic flim-flam. Nonetheless I believe this can be remedied, and have been doing some work on it recently.
> But even in XML this is not a big problem. Whether reordering destroys > information or not depends on your interpretation of the data. If you [quoted text clipped - 18 lines] > > -- Jan Hidders
JOG - 28 Sep 2007 00:02 GMT > > Ok, so why is it exactly cdt, despite the inherent flaws of a > > hierarchical model such as XML, it has seen such widespread uptake? > > It's all hype, of course. Well, my sarcasm alarm is buzzing, so if it's not all hype, what is it? I was wondering more - what in the psyche of business managers has encouraged them to use XML? Its deceptive simplicity, I imagine - the fact that anyone can knock up a bit of markup?
> Btw., what fundamental flaws? If data is forced into a tree structure where no /single/ hierarchy for it naturally exists, query bias is generated. This encompasses just about every possible situation where data is shared, in my experience. And query bias leads to databases that do 1 single task very well, but the other deluge of n-1 tasks awfully. Never mind the fact that the XML model is navigational anyhow, so queries are already going to be very long compared to a declarative approach. I mean, XPath has died on its arse, right?
As a wise little man once told me, XML leads to fear, fear leads to hate, and hate leads to the dark side. Unless I need something off the shelf and non-time critical (extremely non-time critical) to pass about data that is only ever going to be accessed in one single way.
> -- Jan Hidders
Marshall - 26 Sep 2007 16:17 GMT > > I get XML feeds as input and have to store this data on our server. I > > have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 6 lines] > deal with XML storage directly I have found Oracle tools to be > adequate.. Ooooh! An actual thread! It's been so quiet around here.
Hey, we haven't had an actual XML bashing thread in a while; can we do that now? I'll start. Instead of trotting out my old favorites, I have a new one I heard the other day.
"The programmer has a problem. His software needs a configuration file. So he uses XML. Now the programmer has two problems."
Marshall
Bob Badour - 26 Sep 2007 18:21 GMT >>>I get XML feeds as input and have to store this data on our server. I >>>have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 15 lines] > "The programmer has a problem. His software needs a configuration > file. So he uses XML. Now the programmer has two problems." Only two?
Jan Hidders - 26 Sep 2007 15:53 GMT > I get XML feeds as input and have to store this data on our server. I > have worked with databases but new to XML. Can someone tell me how I > can store and manage this data? Depends on what you want to do with it, what kind of data it is, what type of servers you have available or have to use. In general, if the RDBMS you use already has XML capabilities, that is probably not a bad choice.
-- Jan Hidders
Marshall - 26 Sep 2007 16:20 GMT > I get XML feeds as input and have to store this data on our server. I > have worked with databases but new to XML. Can someone tell me how I > can store and manage this data? A lot depends on what you want to do with this data. At the low end, maybe it's enough to just record the strings. At the other extreme, perhaps you need a complete schema for the data coming in, and want to transform the data from its XML hierarchy into a relational form.
Marshall
Cimode - 27 Sep 2007 12:04 GMT > > I get XML feeds as input and have to store this data on our server. I > > have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 7 lines] > > Marshall Come on, Marshall...Do not encourage the questioner into using an old technology such as XML...In fact, the questioner should be using MOML...It is a powerful language with very specific use that I designed...It is so powerful that it can take up to ten times the space that is consumed by XML (yes, ten times)...The SQLML committee has already encouraged its use... It is of course sold under license because it has MOB capabilities (you have not misheard, yes: MOB capabilities!!)
BTW MOML stands for My Own Markup Language...and MOB for My Own Bug....
Sorry about the fiction above (someone said lately that humor is sometimes the only sane response to absurdity)..I must confess I have epidermic responses to anything that is related to XML....
David Cressey - 27 Sep 2007 13:54 GMT > > > I get XML feeds as input and have to store this data on our server. I > > > have worked with databases but new to XML. Can someone tell me how I [quoted text clipped - 10 lines] > Come on, Marshall...Do not encourage the questioner into using an old > technology such as XML... The OP doesn't explicitly want to use XML as a technology, nor does Marshall's response encourage him in that direction. The OP explicitly wants to use DATA that has been supplied to him in XML format. The question "how can I store and manage this data" is exactly the right question. Marshall's two suggestions establish a spectrum of possible rational responses.
Cimode, I appreciate your sense of humor. The OP has asked a reasonable question, and I'd like to see him get a reasonable response.
For the OP: Some DBMS products have a loader designed to transform incoming XML into the tables that have been created for this purpose. The question on how those tables should be designed is a fundamental database design question, and the answer, as Marshall has said, depends on what you are going to do with the data.
XML, in and of itself, doesn't answer a lot of the questions about the internal structuring of the data. Is the data stored in multiple instances of the same fixed form records? In other words, is the incoming data stored in flat files dressed up to look like XML documents?
Or is it stored in some deeply nested scheme that would be useful in the context of a hierarchical DBMS, but is nearly meaningless in the context of an SQL DBMS?
In other words, what is the visible structure of the XML data? The answer to this question could guide your answer. Also, what resources are you prepared to expend in order to store and manage this data "correctly"? This goes full circle back to "what do you intend to do with the data"?
You may end up designing some tables whose sole function is to receive the data in its incoming structure (minus the XML), as loaded very easily by a fast and powerful loader. You may then design a schema of tables that are a logical view of the data that is ... er, well .... "logical". Finally, you would have to write some SQL code that would transform incoming data from the incoming tables to the good logical tables. This may seem like the long way around, but it might be simpler than transforming the data in one fell swoop.
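The two-stage approach described above (land the data in tables shaped like the feed, then transform it with SQL into properly designed tables) might look roughly like the sketch below. It uses Python's built-in `sqlite3` purely for illustration; the table names, column names, and sample rows are all hypothetical, and a real DBMS loader would replace the hand-written inserts into the staging table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stage 1: a staging table that mirrors the flat shape of the incoming feed.
# Stage 2 targets: logical tables with one row per author and per article.
conn.executescript("""
CREATE TABLE incoming_item (title TEXT, author_name TEXT, author_email TEXT);
CREATE TABLE author (
    author_id INTEGER PRIMARY KEY,
    name      TEXT,
    email     TEXT UNIQUE
);
CREATE TABLE article (
    article_id INTEGER PRIMARY KEY,
    title      TEXT,
    author_id  INTEGER REFERENCES author(author_id)
);
""")

# Stand-in for the loader: rows as they might arrive, author repeated per item.
conn.executemany(
    "INSERT INTO incoming_item VALUES (?, ?, ?)",
    [
        ("Storing XML", "Ann", "ann@example.com"),
        ("Querying XML", "Ann", "ann@example.com"),
        ("Loaders", "Bo", "bo@example.com"),
    ],
)

# Stage 2: plain SQL moves the data from the staging shape to the logical one.
conn.executescript("""
INSERT INTO author (name, email)
    SELECT DISTINCT author_name, author_email FROM incoming_item;
INSERT INTO article (title, author_id)
    SELECT i.title, a.author_id
    FROM incoming_item i JOIN author a ON a.email = i.author_email;
""")

author_count = conn.execute("SELECT COUNT(*) FROM author").fetchone()[0]
article_count = conn.execute("SELECT COUNT(*) FROM article").fetchone()[0]
```

The repeated author details collapse into one row per author, while every incoming item still yields one article row.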
Cimode - 27 Sep 2007 15:12 GMT [Snipped]
> > Come on, Marshall...Do not encourage the questioner into using an old > > technology such as XML... [Snipped]
> Cimode, I appreciate your sense of humor. The OP has asked a reasonable > question, and I'd like to see him get a reasonable response. When it comes to database theory, there is no such thing as a *reasonable* response in anything related to XML. Commenting further on the foolishness of XML is at best proposing hacks, at worst misleading beginners with utter ignorance... (Glad you appreciate my humor though ;)) [Snipped]
djmcmahon@gmail.com - 04 Oct 2007 22:58 GMT > I get XML feeds as input and have to store this data on our server. I > have worked with databases but new to XML. Can someone tell me how I > can store and manage this data? Depends on what you need to do with it. If all you want to do is store messages from the feed, you could just shove the XML into a LOB column along with other columns for the date/time received, etc. No different than if you were receiving, say, jpeg images.
A step up from that would be to extract relevant single-valued scalars as part of intake processing in your host programming language (e.g. Java) and stick them in queryable, indexable columns. For example if every XML message had exactly one <title> element in it, you could extract that as you were receiving a message from the feeds, then put it into a TITLE column as well as putting the message into a LOB column. Again no different than, say, extracting the horizontal and vertical resolution from a jpeg image and storing them in columns along with the jpeg itself.
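A minimal sketch of the first two options, using Python and SQLite instead of the Java and LOB column mentioned above, and assuming each message has a single `<title>` directly under its root element. The table, column, and function names are hypothetical; a TEXT column stands in for the LOB.

```python
import sqlite3
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE feed_message (
           msg_id      INTEGER PRIMARY KEY,
           received_at TEXT NOT NULL,   -- when the message came in
           title       TEXT,            -- scalar extracted at intake
           msg_xml     TEXT NOT NULL    -- the whole message, kept opaque
       )"""
)
conn.execute("CREATE INDEX feed_message_title ON feed_message (title)")

def store_message(xml_text):
    """Store the raw XML plus its <title>, extracted once at intake."""
    root = ET.fromstring(xml_text)
    title = root.findtext("title")  # None if the element is missing
    cur = conn.execute(
        "INSERT INTO feed_message (received_at, title, msg_xml) VALUES (?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), title, xml_text),
    )
    return cur.lastrowid

msg_id = store_message(
    "<message><title>Rates update</title><body>...</body></message>"
)

# Ordinary SQL finds the message; the XML is only parsed again if needed.
row = conn.execute(
    "SELECT title, msg_xml FROM feed_message WHERE title = ?", ("Rates update",)
).fetchone()
```

Searches on the title then use an ordinary index, and the database never has to understand the XML.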
If you need to be able to run queries over the contents of the XML, and especially if you want the query result sets to yield fragments of the XML messages instead of whole ones, you have a lot more work to do.
Databases such as Oracle's allow you to create XML typed columns which still store the XML in a LOB but let you dig out the content in queries, index it, etc. If you have XML schemas for your feeds, you could even go so far as to specify decompositions of the XML into relational tables. These databases also let you query by XML expressions, extract subsets of the XML, etc.
(Trouble arises when you need to do this with multi-valued elements, or elements embedded within the hierarchy where multi-valued ancestors might exist.)
FWIW I prefer to do as little as possible with the XML while it's in the database, it's easier to think of it as an opaque object similar to e.g. a jpeg image. (Even a jpeg file has an internal structure, containing information that might be useful in queries, e.g. the resolution, bit depth, whatever.) It's usually not too hard, especially with XML, to extract whatever you think you'll need to support searches and such, and put that into real columns as part of the intake processing in a host language. You can then do any XML/ XQuery style filtering of the internal content of the XML objects as part of host-language post-processing, after using SQL in the usual way to find objects of possible interest. This might or might not fit your application.
If you absolutely must be able to drive queries from internal multi- valued elements, path expressions, etc., then you'll need help from the database in the form of XML extensions such as Oracle's, or if none are available, you'll have to do the relevant decomposition of the XML yourself in the host language and then populate your own "index tables". For example, if your input XML contains <address> elements, and possibly more than one such per message, and you think it's important to index the <zipcode> element underneath <address> to support searches for messages based on zip code, you'll need to have one table with {MSG_ID, MSG_XML_CONTENT} and another table with {ADDRESS_ZIP_CODE, MSG_ID} that you will have to populate as part of intake processing.
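The do-it-yourself "index table" for the zip code case could be sketched as follows, again with Python and SQLite standing in for whatever host language and database are in use. The two tables follow the {MSG_ID, MSG_XML_CONTENT} and {ADDRESS_ZIP_CODE, MSG_ID} layout described above; the table names, function names, and sample messages are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE message (
    msg_id          INTEGER PRIMARY KEY,
    msg_xml_content TEXT NOT NULL
);
CREATE TABLE message_zip (
    address_zip_code TEXT NOT NULL,
    msg_id           INTEGER NOT NULL REFERENCES message(msg_id),
    PRIMARY KEY (address_zip_code, msg_id)
);
""")

def intake(xml_text):
    """Store a message and one index row per distinct zip code inside it."""
    root = ET.fromstring(xml_text)
    msg_id = conn.execute(
        "INSERT INTO message (msg_xml_content) VALUES (?)", (xml_text,)
    ).lastrowid
    # <zipcode> directly under any <address>, at any depth in the message.
    zips = {z.text.strip() for z in root.findall(".//address/zipcode") if z.text}
    conn.executemany(
        "INSERT INTO message_zip (address_zip_code, msg_id) VALUES (?, ?)",
        [(z, msg_id) for z in sorted(zips)],
    )
    return msg_id

def messages_for_zip(zip_code):
    """Return the ids of all messages that mention the given zip code."""
    rows = conn.execute(
        "SELECT msg_id FROM message_zip WHERE address_zip_code = ? ORDER BY msg_id",
        (zip_code,),
    )
    return [r[0] for r in rows]

first = intake(
    "<msg><address><zipcode>94105</zipcode></address>"
    "<address><zipcode>10001</zipcode></address></msg>"
)
second = intake("<msg><address><zipcode>10001</zipcode></address></msg>")
```

Because a message can contain several addresses, the zip codes live in their own table keyed by message id, so a search by zip code is a plain indexed lookup followed by a fetch of the stored XML.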