Database Forum / General DB Topics / DB Theory / October 2005
Flat Query
|
|
Thread rating:  |
Jenski - 12 Oct 2005 12:47 GMT I've heard people use the term "Flat Query" what is it? Is it a query that is performed on one table?
Mikito Harakiri - 12 Oct 2005 16:26 GMT > I've heard people use the term "Flat Query" what is it? > Is it a query that is performed on one table? Some people are lazy to update their database driver. Hence the term. If you use the latest 3D driver your queries flower in wonderful shapes.
Marshall Spight - 13 Oct 2005 03:28 GMT > I've heard people use the term "Flat Query" what is it? > Is it a query that is performed on one table? I have generally found that use of the word "flat" is a good sign that I don't have to listen to what the person says. The word doesn't mean anything, but a lot of people insist that it does, right up until you ask them what.
You also hear the term "flat file", which means, I believe, "file."
Marshall
dawn - 13 Oct 2005 18:45 GMT > > I've heard people use the term "Flat Query" what is it? > > Is it a query that is performed on one table? [quoted text clipped - 3 lines] > says. The word doesn't mean anything, but a lot of people > insist that it does, right up until you ask them what. When I have used the term, I did mean something ;-) I'll buy that there is nothing consistent in the use of the term, and it is very informal.
> You also hear the term "flat file", which means, I believe, > "file." I have used this terminology for putting data in 1NF, for example. You take a file that includes nested lists and turn it into "flat files" or relational tables. A flat file might be the opposite of one definition of a multidimensional file. --dawn
> Marshall Mark D Powell - 14 Oct 2005 16:55 GMT The term 'flat file' has been in use for more than 20 years. It was generally applied to files accessed in a sequential manner as opposed to being accessed via direct access or ISAM, Indexed Sequential Acess Method, files. Over time, especially, in the non-mainframe world the term was usually used a synonym for a text file.
In the database world the term usually refers to OS files that are used to feed data to a load process or hold the results of an extract process. In other words, ETL, operations.
HTH -- Mark D Powell --
Anne & Lynn Wheeler - 14 Oct 2005 17:27 GMT > The term 'flat file' has been in use for more than 20 years. It was > generally applied to files accessed in a sequential manner as opposed > to being accessed via direct access or ISAM, Indexed Sequential Acess > Method, files. Over time, especially, in the non-mainframe world the > term was usually used a synonym for a text file. or no-index and/or (physical) no-structure ... other then sequential set of bits (modulo sequential record boundaries).
however, there were some work on things like flat files having an implied structure ... like sorted records ... and entries were found by doing binary searches (taking advantage of the implied sorted record structure and file system that structure that would support reading random records from file ... in much the same way that disks allow reading random records ... as opposed to tape which has been strictly sequential).
there were some resources side-tracked from the system/r activity (original relational/sql implementation) http://www.garlic.com/~lynn/subtopic.html#systemr
for an internal, online phone book. this was done as a large (sorted record) flat file (over 25 years ago). there was work done comparing binary search to radix search i.e. rather than treating records as consisting of totally random bits for a binary search ... pro-rate the search probes based on letter sequence of the search argument. initially assuming an uniform letter frequency distribution. this was further refined by supplying the phone book search program with the actual first letter frequency distribution of names in the phone book.
binary search assumes that avg. search probes is the binary root of the size of the file ... i.e. 64k records requires 16 probes. letter frequency radix search reduced that to closer to five probes.
translation to unix filesystem was done assuming avg. record size. mainframe filesystems supported the concept of records ... and API semantics that allowed reading random records (both fixed-length records ... which as simpler case as well as variable-length records ... which is a little more difficult). unix filesystem API basically allow reading random bytes from a file. record structures are a characteristic of implicit in-band data in the file (i.e. null terminated) as opposed to explicit out-of-band information directly supported by the filesystem. As a result, the letter frequency radix search had to perform a little magic simulating random record reads on top of an (unix) API that provided only simple random character reads.
one might also consider this helped contribute to lots of databases being implemented on unix platforms with raw disk ... instead of thru the filesystem .... since raw disk at least provided the record oriented api semantics (which got quite obfuscated in a unix filesystem environment).
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Anne & Lynn Wheeler - 14 Oct 2005 17:44 GMT oh yes, part of the transition to explicit dbms index structures ... basically was when updates & rebuild of a complete sorted flat file became more expensive than the disk space overhead (and associated processing) for explicit indexes that allowed doing incremental updates on the fly (w/o having to rebuild the complete sorted file after each batch of updates).
... oh, and the phone book letter frequency radix search was actually better than the 16-to-5 probe comparison. the phone book was organized using physical 4k records with an avg. of approx. 50 phone records per physical record. the letter frequency radix search could frequently pick the related physical record on the first probe.
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
David Cressey - 14 Oct 2005 19:36 GMT > The term 'flat file' has been in use for more than 20 years. It was > generally applied to files accessed in a sequential manner as opposed [quoted text clipped - 7 lines] > > HTH -- Mark D Powell -- Actually if you go back 25 years or more, the term referred to files that not only were processed sequentially, but also that did not contain records within records, or records grouped into record groups.
Anne & Lynn Wheeler - 14 Oct 2005 20:33 GMT > Actually if you go back 25 years or more, the term referred to files > that not only were processed sequentially, but also that did not > contain records within records, or records grouped into record > groups. definitely true of the strong tape heritage enforcing sequential access ... however as files from tape started showing up on disks (or dasd ... direct access storage device) ... which could be randomly accessed .... you did start to see sorted files that were being queried using techniques like binary search.
one of the things transition/migration to more strucutre was that update & rebuild of complete file didn't scale. past a certain point the cost of a complete sort & file rebuild was more than the overhead of infrastructure (indexes and other processing) that allowed for incremental updates w/o having to rebuild the complete file every time.
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
David Cressey - 14 Oct 2005 20:45 GMT > > Actually if you go back 25 years or more, the term referred to files > > that not only were processed sequentially, but also that did not [quoted text clipped - 6 lines] > accessed .... you did start to see sorted files that were being > queried using techniques like binary search. Only if the record's addresses could be computed (or pointed to). In general, the only kinds of unindexed files whose record address was computable were fixed length records. And in general, fixed lentgth records corresponded to flat files.
Sure you can come up with exceptions. But i'm describing the general scenario in which that language gained usage.
Anne & Lynn Wheeler - 14 Oct 2005 20:57 GMT > Only if the record's addresses could be computed (or pointed to). > In general, the only kinds of unindexed files whose record address [quoted text clipped - 3 lines] > Sure you can come up with exceptions. But i'm describing the > general scenario in which that language gained usage. lots of files were structured as fixed length records specifically for that reason ... however there were also tricks with variable blocked file type ... where there was some filesystem out-of-band infrastructure support that minimized having to perform sequential reads to do random query against a variable length record file.
-- Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Anne & Lynn Wheeler - 14 Oct 2005 21:12 GMT > Only if the record's addresses could be computed (or pointed to). > In general, the only kinds of unindexed files whose record address [quoted text clipped - 3 lines] > Sure you can come up with exceptions. But i'm describing the > general scenario in which that language gained usage. it was also somewhat the battle in the 70s that went on between the stl 60s physical database people ... and the sjr system/r people ... original relational/sql http://www.garlic.com/~lynn/subtopic.html#systemr
i.e. the phsycial database ... had records linked to other records ... where the linking was done by physical record pointers that were fields that were part of the record data ... these weren't traditional flat file ... but record location semantics was exposed.
one of the points of system/r effort was to abstract away the reoord pointers ... by using indexing. the stl people claimed that system/r doubled the physical disk space (for the indexes) along with significant increase in processing overhead ... associated with all the index processing gorp. the sjr people claimed that system/r eliminated a lot of human effort that went into administrative and rebuilding efforts associated with the embedded record pointers. I did some of the system/r code ... but i also did some of the code for various other projects ... so I wasn't particularly on one side or anther.
the 80s saw 1) big demand increase for online information ... putting pressure on scarce database people resources, 2) significant increase in physical disk space and decrease in price/bit, and 3) large increase in processor memory that could be used for caching indexes.
The disk technology change drastically reduced the perceived cost of the extra index structures ... and the significant processor memory increses allowed significant caching ... which in turn reduced the perceived overhead of index processing.
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
David Cressey - 15 Oct 2005 14:01 GMT > > Only if the record's addresses could be computed (or pointed to). > > In general, the only kinds of unindexed files whose record address [quoted text clipped - 34 lines] > increses allowed significant caching ... which in turn reduced the > perceived overhead of index processing. Good Summary.
About the only place I still see the argument between exposed pointers and indexes is... right here in the comp.databases.theory newsgroup, where our resident gadfly is still trying to persuade us to go back to pointers, and start over!
Branching off on a tangent... By collecting all the pointers in indexes, and putting them under control of a subsystem of the DBMS, it becomes possible to move a table (perhaps to another disk), and update all the pointers in the indexes that need it.
By contrast, in the World wide web, there is, in general, no way of knowing how many hyperlinks will be broken if an object is moved from one URL to another, or how to fix them. People seem to be willing to live with this decifiency, but I suspect that their patience will eventually run out.
Anne & Lynn Wheeler - 15 Oct 2005 17:37 GMT > Good Summary. > [quoted text clipped - 13 lines] > willing to live with this decifiency, but I suspect that their > patience will eventually run out. i wonder ... do i have on archived post on this topic from last decade http://www.garlic.com/~lynn/94.html#26 Misc. more on bidirectional links
except this involved network database where all the links/pointers were implemented as indexes (abstracted pointers into indexes somewhat analogous to what was done by system/r ... the original implementation was going on concurrently with the system/r implementation on the same system platform) ... and enforced bidirectional "connections", getting referrential integrity ... and also addressed the www unidirectional issue.
some minor historical regression ...
the html stuff traces back to waterloo's script implementation ... aka cern was a vm/cms shop ... and waterloo's script is clone of the cms script document formating command done at the cambridge science center http://www.garlic.com/~lynn/subtopic.html#545tech
in fact, cms used to stand for cambridge monitor system ... before it was renamed conversational monitor system. gml was invented at the science center in 69 and support added to script command (aka gml is from "G", "M", and "L", the three inventors ... then had to come up with the markup language part): http://www.garlic.com/~lynn/subtopic.html#sgml
and system/r http://www.garlic.com/~lynn/subtopic.html#systemr
was also a vm/cms implementation.
and of course, hyperlink stuff traces back to Nelson's xanadu http://www.xanadu.net/
how about: WWW, what went wrong http://xanadu.com.au/xanadu/6w-paper.html
and Engelbart's nls/augment http://sloan.stanford.edu/MouseSite/dce-bio.htm
from above ...
In 1977 Tymshare bought the commercial rights to NLS, renamed it AUGMENT, and set it up as a principal line of business in a newly formed Office Automation Division. There the focus switched from R&D to commercialization, and in spite of Engelbart's efforts, the human/organizational work was cut off, including his carefully cultivated user group. In 1984 Tymshare was acquired by McDonnell Douglas Corporation, where Engelbart began working closely with the aerospace components on issues of integrated information system architectures and associated evolutionary strategies (a welcome extension of his work at SRI).
... snip ..
... tymshare was a couple miles up the road from sjr. for the M/D purchase ... i was brought in to do due dilligence on gnosis which had been developed by tymshare and was being spun off as keykos http://www.agorics.com/Library/keykosindex.html http://www.cis.upenn.edu/~KeyKOS/
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
JOG - 15 Oct 2005 20:24 GMT > > Good Summary. > > > > About the only place I still see the argument between exposed > > pointers and indexes is... right here in the comp.databases.theory > > newsgroup, where our resident gadfly is still trying to persuade us > > to go back to pointers, and start over! and most of the arguments i have read seem to degenerate because of mixed definitions of what a pointer _is_. I'd be very interested if someone has a precise definition of the term as used in the database field (including the distinction of a logical pointer).
> i wonder ... do i have on archived post on this topic from last > decade http://www.garlic.com/~lynn/94.html#26 Misc. more on bidirectional links [quoted text clipped - 9 lines] > and Engelbart's nls/augment > http://sloan.stanford.edu/MouseSite/dce-bio.htm I have worked extensively with Nelson (and Engelbart) and part of the misconceptions about "pointers" and the web, lie in the bastardization of Nelson's term "hyperlink".
A bi-directional link is of course nonsensensical. A link is, by definition, always bidirectional (as the links of a chain are). To Vannevar Bush the link was no more than a physical mechanism to create an informational trail (this is Nelson's main influence) which could be followed in either direction. Berners-Lee extracted from Xanadu the link concept, but dropped the bi-directional requirement, creating weblinks, a form of "half link" - but no more than a pointer. Berners-Lee (perhaps correctly) reconciles this in terms of the "worse is better" argument, and that the approach more appropriately suits the logistics of the web and its ad-hoc chaotic nature. Unfortunately the word's meaning seems to have devolved because of this.
However, the web's pointers are still a concession. In terms of functionality there is no instance where using a pointer (half-link) instead of a link offers any logical advantage, and rather consistently deleterious, generating administrative turmoil (404's or endless reorganisational housework).
Now in the RM, links exist (in the original sense of the word) and are assumed by much of relatioanl calculus. They are of course implict, encoded into equality comparisons (dynamic links), as opposed to the explicit static links of traditional hypertext - but the concept applies nonetheless. So in terms of a logical informational model, I can see no point whatsoever of removing this bidirectional functionality that took decades to obtain, and that the web still lacks.
David Cressey - 15 Oct 2005 21:51 GMT > and most of the arguments i have read seem to degenerate because of > mixed definitions of what a pointer _is_. I'd be very interested if > someone has a precise definition of the term as used in the database > field (including the distinction of a logical pointer).
> I have worked extensively with Nelson (and Engelbart) and part of the > misconceptions about "pointers" and the web, lie in the bastardization > of Nelson's term "hyperlink". > > A bi-directional link is of course nonsensensical. A link is, by > definition, always bidirectional (as the links of a chain are). In the development of MDL at project MAC we briefly toyed with the idea of requiring (bi-directional) links in place of pointers, but discarded the idea. Of course, our ambitions were quite different.
Jenski - 14 Oct 2005 17:06 GMT Flat File refers to a single relation database....
I'm starting to think that a "Flat Query" was made up by this person...
Roy Hann - 16 Oct 2005 12:51 GMT > Flat File refers to a single relation database.... If that is so then it is puzzling. Why would anyone think of describing something N-dimensional as being "flat"?
Roy
Anne & Lynn Wheeler - 16 Oct 2005 17:12 GMT > If that is so then it is puzzling. Why would anyone think of describing > something N-dimensional as being "flat"? it would make some sense if it was used to refer to single table that only had a primary index. ... it that case, primary index look up would be very similar operation to a query that did a binary search on a sorted (flat) file (and non-indexed searches of a single table would also be similar to operations performed on records of a flat file).
 Signature Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
Tony D - 14 Oct 2005 16:54 GMT I haven't heard the term "flat query", but "query flattening" (at least in Ingres) refers to rewriting a query which contains subselects to replace those subselects with logically equivalent joins & restrictions. In Ingres, the query optimiser will do this for you automatically.
- Tony
|
|
|