CODASYL-like databases
Troels Arvin - 01 Apr 2008 07:42 GMT Hello,
RDBMSes are sometimes described as a reaction against network-/ hierarchical databases. I believe that I've read that CODASYL-like DBMSes resulted in databases which were very hard to maintain.
Does someone know of more concrete descriptions of what problems the old CODASYL-like databases resulted in?
-CELKO- - 01 Apr 2008 14:20 GMT The last standard for the network model was NDL; I still have a copy of the document. The real problems were (1) They could not be optimized since they had fixed access paths to the data (2) They were procedural and not declarative, so the programmer did all the work. (3) Everyone had a different language instead of a Standard like SQL to use (NDL came too late).
Most of the business data in the world is still in IMS today.
DBMS_Plumber - 01 Apr 2008 19:47 GMT > Most of the business data in the world is still in IMS today. While this passes as received wisdom, I can't find any actual evidence to support it. At least, nothing since about 1992.
When I talk to folk who make and sell disk systems and what-not, they tell me that for every 1G of disk they're selling under IMS/network systems, they're selling 100G for "relational" systems (largely warehouses), and 1000G for file-systems (emails and attachments).
When I talk to folk who try to sell services to IT departments they tell me that the overwhelming preponderance of "business data" today is in Excel Spreadsheets (quotes, models and so forth) and the majority of "business transaction data" is being managed by RDBMSs.
Very, very few of the web based applications developed since 1995 have used IMS.
It's absolutely true that there are still some very large IMS systems out there but their number, and the amount of data they manage, doesn't seem to be growing. From what I can tell much of the data they hold is short-codes-and-numbers stuff, while modern businesses are increasingly dependent on other kinds of data: transcriptions of support calls, customer feedback text, and so forth.
If you define "business data" extremely narrowly - essentially to just buy/sell/ship/receive stuff - then it's remotely possible that IMS still dominates (though I doubt it). But if you include all the personnel records management, sales management, and what-not, no way.
Tegiri Nenashi - 02 Apr 2008 15:57 GMT On Apr 1, 10:47 am, DBMS_Plumber <paul_geoffrey_br...@yahoo.com> wrote:
> > Most of the business data in the world is still in IMS today. > If you define "business data" extremely narrowly - essentially to just > buy/sell/ship/receive stuff - then it's remotely possible that IMS > still dominates (though I doubt it). But if you include all the > personnel records management, sales management, and what-not, no way. The other way to estimate the significance (or rather perceived significance) of technology is to calculate the number of book titles, perhaps weighted by their popularity. For that matter I can't remember seeing any IMS book in the local Barnes & Noble. Granted there is a boring factor (for example, there aren't that many SAP or PeopleSoft titles either), still the adjective "dominates" can hardly apply to IMS judged by any criteria.
Ken North - 03 Apr 2008 21:41 GMT Book titles carried by a Barnes & Noble are not an indication of what mature technology is in use. They are a better indication of what technology has captured readers' interest - often because it's new and book buyers are learning. COBOL and IMS have been around for four decades so it would be hard to find a publisher who wants to publish a title about either. The market is too small.
But COBOL and IMS are far from dead. It was only about 5 years ago that the number of Java programmers exceeded the number of COBOL programmers. In 2005, IBM reported IMS was handling over 15 million gigabytes of data and 50 billion transactions per day:
ftp://ftp.software.ibm.com/software/data/ims/shelf/presentations/IMS_Today.pdf
======== Ken North =========== www.KNComputing.com
www.WebServicesSummit.com www.SQLSummit.com www.GridSummit.com
paul c - 03 Apr 2008 14:58 GMT > The last standard for the network model was NDL; I still have a copy > of the document. The real problems were (1) They could not be [quoted text clipped - 3 lines] > to use (NDL came too late). > ... They have an unnecessarily complicated data interface, are basically impossible for an end-user to handle without a lot of expert help and programmers new to an app have a steep learning curve. Everybody has to know what kinds of 'records' have the data they want and then decide which of many operators and options are needed for access, then organize some other code and place the operators very carefully in that, then likely have to cope with numerous arcane environmental exceptions/return codes and then construct the kind of 'record' they wanted in the first place. The same data in two different databases might be kept in different record types and need completely different operators.
How could a 'standard' possibly improve that situation?
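To make the contrast between navigational and declarative access concrete, here is a minimal sketch in Python. It is not CODASYL DML or NDL: the `Record` class, the department/employee data and both lookup functions are hypothetical, and the comments naming FIND-style steps are only a loose analogy. The first function walks owner-to-member pointers by hand; the second states the result wanted and leaves the access path to the DBMS (SQLite here, purely as a stand-in).

```python
import sqlite3

class Record:
    """Hypothetical stand-in for a stored record with fixed pointer chains."""
    def __init__(self, **fields):
        self.fields = fields
        self.first_member = None  # head of the chain of records this one owns
        self.next_member = None   # next record in the same owner's chain

def navigational_lookup(departments, dept_name):
    """The program itself walks the access path, one record at a time."""
    names = []
    for dept in departments:                   # locate the owner record
        if dept.fields["name"] == dept_name:
            member = dept.first_member         # roughly: find first member in set
            while member is not None:
                names.append(member.fields["name"])
                member = member.next_member    # roughly: find next member in set
    return names

def declarative_lookup(conn, dept_name):
    """The program states the result; the DBMS chooses the access path."""
    rows = conn.execute(
        "SELECT e.name FROM employee e "
        "JOIN department d ON e.dept_id = d.id "
        "WHERE d.name = ? ORDER BY e.name",
        (dept_name,),
    )
    return [row[0] for row in rows]

# Navigational data: pointers wired up by hand.
sales = Record(name="Sales")
ada = Record(name="Ada")
grace = Record(name="Grace")
sales.first_member = ada
ada.next_member = grace

# The same facts as tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
INSERT INTO department VALUES (1, 'Sales');
INSERT INTO employee VALUES (1, 'Ada', 1), (2, 'Grace', 1);
""")

nav_result = navigational_lookup([sales], "Sales")
sql_result = declarative_lookup(conn, "Sales")
```

If the pointer chains were reorganised, `navigational_lookup` would have to be rewritten, whereas `declarative_lookup` would not care how the rows are stored.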
-CELKO- - 03 Apr 2008 16:14 GMT >> How could a 'standard' possibly improve that situation? << The only use NDL ever saw was to give standardized terminology for the network products -- child, parent, link, etc. first appeared in that document. What I find weird is that newbie SQL programmers use those terms but with the word "table" attached to them (i.e. "parent table" instead of "referenced table", "child table" instead of "referencing table", "linking table" instead of "relationship table", etc.)
I am pretty sure that they never saw NDL or any other network product; they must have inherited the old terms from much older people with whom they worked.
Nobody cried when NDL died on its 5 year expiration date.
David Cressey - 01 Apr 2008 14:34 GMT > Hello, > [quoted text clipped - 4 lines] > Does someone know of more concrete descriptions of what problems the old > CODASYL-like databases resulted in? I would describe the introduction of the relational model of data for DBMS work as forward progress rather than as a reaction against something. The early RDBMS products were, to some extent, an attempt to obtain the power and simplicity of the relational model without discarding a code base that had been based on hierarchical or network models of databases. This complicates the history.
To get back to your question, there are really two sets of problems with the prerelational DBMS products: problems inherent in the graph theory of data and problems in the implementation of DBMS products up to that point in time. The theoretical problems are more easily discussed, and I expect that other contributors will answer that part of the question thoroughly.
I'm going to talk about the practical problems inherent in VAX DBMS when compared to VAX Rdb in about 1985. (Note: VAX Rdb did not become an SQL database until 1986). I'm choosing those two products because they shared so much code base that the comparison might actually be meaningful. VAX DBMS was a fairly good CODASYL database, while VAX Rdb gave Oracle quite a run for the money, until the Oracle Corp. bought out Rdb in 1994.
First, there's difficulty in learning. In order to build a VAX DBMS database, you practically had to become a database expert, even if the database you were building was a very simple, tinker toy database. You could build a starter VAX Rdb database with a lot less learning. This gave the initial advantage to "relational", although there's a danger here. Many people who learned just enough to build a tinker toy database stopped learning at that point, and built major databases that suffered from lack of design knowledge.
Second, there's difficulty in modifying the schema. There were many schema changes in VAX DBMS that necessitated unloading most or all of the data, and reloading the data into a new empty database. Not only that, many of these changes required extensive maintenance work on the application programs. By comparison, many changes (such as adding a new table, a new index, or a new column to an existing table) could be made to an Rdb database without even taking the database off the air, and with little or no maintenance on most application programs.
Some of this is due to the implementations of VAX DBMS and VAX Rdb, but a large part of it is due to the difference between the graph model of data and the relational model of data. The more you learn about "data independence", the more apparent this will be to you.
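As a small illustration of that kind of data independence, here is a sketch using Python's built-in `sqlite3` module as a stand-in; it is not VAX Rdb, and the `customer` and `invoice` tables and the `customer_names` function are made up for the example. The "application" query names only the columns it needs, so adding a column and a table does not disturb it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (name) VALUES ('Acme')")

def customer_names(conn):
    # The "application": it names columns and says nothing about storage layout.
    return [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY name")]

before = customer_names(conn)

# Schema changes made while the data stays in place.
conn.execute("ALTER TABLE customer ADD COLUMN region TEXT")
conn.execute(
    "CREATE TABLE invoice (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

# The unchanged application code still runs and returns the same answer.
after = customer_names(conn)
```

A program that instead depended on record layout or on a fixed owner/member chain would typically need maintenance after the equivalent change.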
Third, there's difficulty in implementing unanticipated queries. In VAX DBMS, queries that were not contemplated at the time the database was built were often either impossible to program or resulted in such disastrous performance that a database redesign was in order. By comparison, unanticipated queries on VAX Rdb databases frequently fell into the class of problems that are easily solved, and resulted in acceptable performance, sometimes requiring a new index, but otherwise not requiring any database work at all.
This is mostly due to the relational model, and not to the two implementations.
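The "new index, no other database work" case can be sketched as follows, again with SQLite standing in for a relational product of the period. The `orders` table, the index name and the `plan` helper are assumptions for illustration. The query text is identical before and after; only the access path chosen by the DBMS changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 10.0), ("globex", 25.0), ("acme", 40.0)],
)

# A query nobody planned for when the table was designed.
QUERY = "SELECT id FROM orders WHERE customer = ?"

def plan(conn):
    # Column 3 of EXPLAIN QUERY PLAN output is SQLite's textual step description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("acme",))
    return " ".join(row[3] for row in rows)

plan_before = plan(conn)                      # a full scan of the table
rows_before = conn.execute(QUERY, ("acme",)).fetchall()

# The only "database work": add an index. The query text does not change.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

plan_after = plan(conn)                       # now mentions idx_orders_customer
rows_after = conn.execute(QUERY, ("acme",)).fetchall()
```

The exact wording of the plan text varies between SQLite versions, so the sketch only relies on whether the index name appears in it.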
Having said this, the speed advantage in 1985 was still on the side of VAX DBMS, if the entire application was well understood, and an optimal design was made. It wasn't until much later that Rdb began to win the speed matches, and even this was due to enhancements made to Rdb, while VAX DBMS had been relegated to "mature" status.
There's more to this story, but this response is already too long.
Rob - 01 Apr 2008 18:35 GMT I'm going to disagree with several points you make in your response. cdt has become so adversarial lately, I feel compelled to state that I can disagree with your pov w.o. being disagreeable.
> > Hello, > [quoted text clipped - 7 lines] > I would describe the introduction of the relational model of data for DBMS > work as forward progress rather than as a reaction against something. Disagree. Codd was a doer as well as a thinker. In his 1970 paper, he perceived that data dependencies in present systems (ordering dependence, indexing dependence and access path dependence) implied that "changes to the characteristics of data representation logically impaired some application programs". (I am paraphrasing here. He goes on to give examples of all three dependencies.) The economics of data processing in those days (now I think referred to as Information Technology) made these "impairments" expensive -- changes in the data representation required rewriting applications. So that from a business perspective, an investment was lost. To me, this sounds like a rather practical economic argument in favor of developing data independent representations so that building applications upon them would preserve the applications development investment.
> The > early RDBMS products were, to some extent, an attempt to obtain the power > and simplicity of the relational model without discarding a code base that > had been based on hierarchical or network models of databases. Maybe. To me, your answer seems anachronistic. In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points out (under HISTORY OF DATA MODEL DEVELOPMENT) that "[h]ierarchical and network systems were developed prior to 1970, but it was not until 1973 that data models for these systems were defined." And "[t]hus, hierarchic and network systems preceded the hierarchic and network models, whereas relational systems came after the relational model and used the model as a foundation." This is consistent with my experience. The earliest experimental systems were System R at IBM and Ingres at UCBerkeley. I know Ingres had no existing dbms codebase. I suspect System R used classic IBM access methods (ISAM, etc.), but I doubt that the translation of SEQUEL (predecessor to SQL) employed any "native" IMS code. ORACLE brought the first commercial RDBMS to market. There was no prior product, so I'm guessing no existing codebase either.
Your response seems to be *very* specific to DEC's RDBMS offerings a decade later. In that context, you are probably correct.
In response to the OP (Troels Arvin <tro...@arvin.dk>): There really weren't many CODASYL-based DBMSs around before 1970, so the landscape was basically 1.) DBMS vendors were providing hierarchical dbms products, and 2) CODASYL and the Relational Model were slugging it out to determine which would provide a theoretical basis for the next, "data independent" generation of products. Personally, I believe that the Relational Model prevailed because SQL was more natural (read: English-like) and non-procedural whereas the CODASYL sublanguage(s) were programmatic and navigational. Although higher-level languages could of course be built upon CODASYL sublanguages, SQL already seemed usable by non-programmers. (Consistent w. Cressey's other observations, SQL had functionality for data definition as well as data manipulation, so that potential users didn't have to master multiple products for defining and using their databases.) In effect, SQL won the hearts and minds of product developers because it addressed a vastly larger TAM (Total Available Market) than the navigational/programmatic CODASYL.
Again, my opinion.
Rob
David Cressey - 01 Apr 2008 20:31 GMT >I'm going to disagree with several points you make in your response. >cdt has become so adversarial lately, I feel compelled to state that I >can disagree with your pov w.o. being disagreeable. I appreciate the effort to disagree without being disagreeable. I think you succeeded. I'm going to try to maintain the same tone.
(My newsreader is not cooperating in marking what I'm replying to with the ">" mark. My apologies if I make some errors here.)
>> > Hello, >> [quoted text clipped - 21 lines] >independent representations so that building applications upon them >would preserve the applications development investment. Your points are well taken. I fail to understand how they disagree with what I said.
>> The >> early RDBMS products were, to some extent, an attempt to obtain the power [quoted text clipped - 15 lines] >code. ORACLE brought the first commercial RDBMS to market. There was >no prior product, so I'm guessing no existing codebase either. By "early RDBMS products" I meant the large number of products that came out during the "relational derby" of the early 1980s. Perhaps my choice of words was unfortunate. I did not specifically mean Oracle and Ingres. There were several products that came out as a "now we're relational" rework of existing products. Some of these were simply the same old DBMS as before, with a thin veneer of relational interface to them. Most of what I know of these other products comes from reading commentary in c.d.t. Hopefully, other contributors will discuss those products in greater detail than I can.
>Your response seems to be *very* specific to DEC's RDBMS offerings a > decade later. In that context, you are probably correct. By no stretch of the imagination did I think of my response as a definitive response to the question raised by the OP. I focussed on those two products for several reasons.
First, because I know them a little better than other products.
Second, because they illustrate the difference between a relational DBMS and a CODASYL DBMS, in terms of customer acceptance and vendor support, better than some other pairs of products might. They were built and supported by the same engineering team, sold by the same vendor, ran on the same platform, and integrated with the same programming languages. This cuts down on the extent to which extraneous factors influenced the difference between the trajectory of the two products. They provide a fairly clean contrast between the strengths and weaknesses of Relational and CODASYL, in the practical world. That's one of the things I thought the OP was trying to elicit.
As to the ten year delay, the early 1980s is the time frame during which the question of which DBMS would be the flagship DBMS was being decided, in the part of the world that I was working in. The collision between CODASYL and Relational may have happened a decade earlier in other venues, or a decade later in yet others.
>In response to the OP (Troels Arvin <tro...@arvin.dk>): >There really weren't many CODASYL-based DBMSs around before 1970, so [quoted text clipped - 13 lines] >developers because it addressed a vastly larger TAM (Total Available >Market) than the navigational/programmatic CODASYL. If I understand the term CODASYL correctly, there weren't any CODASYL-based DBMS products before the DBTG task force defined CODASYL. That was, IIRC, about 1970.
Neither your comments nor mine address the difference between "relational" and SQL. That difference has been much commented on, in times gone by, in c.d.t. I expect others to comment on this difference. It may or may not be important to the discussion.
>Again, my opinion. My opinion, too.
Rob - 02 Apr 2008 16:10 GMT Sorry. After rereading your post and my response, I realize my effort to be concise resulted in confusion. Here are 3 statements:
1. > > RDBMSes are sometimes described as a reaction against network-/ hierarchical databases. ("Troels Arvin" <tro...@arvin.dk>)
I agree with this wrt hierarchical dbms's, not network. In 1970, hierarchical dbms products were everywhere, network dbms products hardly existed. Codd's relational model and the CODASYL/DBTG network proposals were both reactions to maintenance/management problems with hierarchical products. The relational model prevailed.
2. > I would describe the introduction of the relational model of data for DBMS work as forward progress rather than as a reaction against something. ("David Cressey" <cresse...@verizon.net>)
Here I do disagree. The relational model was revolutionary thinking. Before 1970, database management wasn't even a computer science concept. Until at least 1974, SIGMOD was SIGFIDET -- Special Interest Group on File Description & Translation. Perhaps my "disagreement" is based on my personal experience of that time: IMS and all its lookalikes were uninteresting data processing products; the relational model was a giant leap forward that was worthy of interest to a computer scientist. You imply "evolutionary" -- I need "revolutionary".
3. > The early RDBMS products were, to some extent, an attempt to obtain the power and simplicity of the relational model without discarding a code base that had been based on hierarchical or network models of databases. ("David Cressey" <cresse...@verizon.net>)
Wrt codebases, the only contribution from hierarchical systems was the Access Methods (ISAM, VSAM, ...). Wrt network systems, the only reused code I know of (from your postings) was in the DEC products. In the grand scheme of things, these hardly appeared on the radar screen. (That is not meant to disparage your knowledge or experience: I was the system architect on an RDBMS in 1980-84 and designed the optimizing compiler for another RDBMS in 1985. I guarantee you nobody remembers either!)
Not really that much disagreement here, just differences in perspective.
Cheers, Rob
Cimode - 02 Apr 2008 17:34 GMT > Sorry. After rereading your post and my response, I realize my effort > to be concise resulted in confusion. Here are 3 statements: [quoted text clipped - 41 lines] > > Cheers, Rob I concur. The innovative aspect was that before RM, there was no direct relationship between mathematical concepts and database management.
Ken North - 05 Apr 2008 01:32 GMT > The innovative aspect was that before RM, there was no > direct relationship between mathematical concepts and database > management. Not exactly.
Assuming RM is a reference to Codd's seminal work ("A Relational Model of Data for Large Data Banks", 1970), you need only look at the papers he cited to find an earlier paper that showed the relationship between mathematics and database management.
David L. Childs taught mathematics at the University of Michigan and Codd cited his work. Here's the abstract for a March 1968 paper by Childs:
DESCRIPTION OF A SET-THEORETIC DATA STRUCTURE A set-theoretic data structure (STDS) is virtually a 'floating' or pointer-free structure allowing quicker access, less storage, and greater flexibility than fixed or rigid structures that rely heavily on internal pointers or hash-coding, such as 'associative or relational structures,' 'list structures,' 'ring structures,' etc. An STDS relies on set-theoretic operations to do the work usually allocated to internal pointers. A question in an STDS will be a set-theoretic expression. Each set in an STDS is completely independent of every other set, allowing modification of any set without perturbation of the rest of the structure; while fixed structures resist creation, destruction, or changes in data. An STDS is essentially a meta-structure, allowing a question to 'dictate' the structure or data-flow. A question establishes which sets are to be accessed and which operations are to be performed within and between these sets. In an STDS there are as many 'structures' as there are combinations of set-theoretic operations; and the addition, deletion, or change of data has no effect on set-theoretic operations, hence no effect on the 'dictated structures.' Thus in a floating structure like an STDS the question directs the structure, instead of being subservient to it.
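The idea in that abstract, that a question is a set-theoretic expression over independent sets, can be illustrated with Python's built-in `set` type. This is only an analogy and makes no claim about how STDS was implemented; the enrollment data and variable names are invented.

```python
# Each named set stands alone; there are no pointers from one set into another.
enrolled_db = {"ann", "bob", "cho"}
enrolled_os = {"bob", "cho", "dee"}
graduating = {"cho", "dee"}

# The question "who takes both courses but is not graduating?" is itself
# a set-theoretic expression; it dictates which sets are touched and how.
both_not_graduating = (enrolled_db & enrolled_os) - graduating

# Changing one set does not perturb the others, and the same kind of
# expression keeps working on the updated data.
enrolled_os.add("eve")
either_course = enrolled_db | enrolled_os
```

Nothing in the data had to anticipate either question, which is the sense in which the question directs the structure.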
And there's also ARPA-funded research described in this August 1968 research paper presented to the IFIP Congress 1968. Note the INTRODUCTION below.
"Childs, D. L., Feasibility of a Set-Theoretic Data Structure: A General Structure Based on a Reconstituted Definition of Relation"
ABSTRACT This paper is motivated by an assumption that many problems dealing with arbitrarily related data can be expedited on a digital computer by a storage structure which allows rapid execution of operations within and between sets of datum names. In order for such a structure to be feasible, two problems must be considered: 1. the structure should be general enough that the sets involved may be unrestricted, thus allowing sets of sets of sets...; sets of ordered pairs, ordered triples...; sets of variable length n-tuples, n-tuples of arbitrary sets; etc.; 2. the set-operations should be general in nature, allowing any of the usual set theory operations between sets as described above, with the assurance that these operations will be executed rapidly. A sufficient condition for the latter is the existence of a well-ordering relation on the union of the participating sets. These problems are resolved in this paper with the introduction of the concept of a 'complex,' which has an additional feature of allowing a natural extension of properties of binary relations to properties of general relations.
TABLE OF CONTENTS
ABSTRACT
I. INTRODUCTION
II. COMPLEXES
III. EXTENSION OF SET OPERATIONS TO COMPLEXES
IV. ORDERED PAIR DEFINED BY A COMPLEX
V. FUNCTIONS
VI. DEVELOPMENT OF AN N-TUPLE
VII. RECONSTITUTED DEFINITION OF RELATION
VIII. TAU-ORDERING
IX. EXTENDED OPERATIONS FOR RELATIONS
X. EXAMPLES
XI. CONCLUSION
DEFINED SYMBOLS
REFERENCES
I. INTRODUCTION The overall goal, of which this paper is a part, is the development of a machine-independent data structure allowing rapid processing of data related by arbitrary assignment such as: the contents of a telephone book, library files, census reports, family lineage, networks, etc. Data which are non-intrinsically related have to be expressed (stored) in such a way as to define the way in which they are related before any data structure is applicable. Since any relation can be expressed in set theory as a set of ordered pairs and since set theory provides a wealth of operations for dealing with relations, a set-theoretic data structure appears worth investigation.
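The remark that any relation can be expressed as a set of ordered pairs is easy to show in a few lines of Python. The phone-book data and the helper names `restrict`, `image` and `converse` are chosen for this sketch and are not taken from the paper.

```python
# A telephone book as a binary relation: a set of (name, number) ordered pairs.
phone_book = {
    ("ann", "555-0101"),
    ("bob", "555-0102"),
    ("ann", "555-0199"),
}

def restrict(relation, predicate):
    """Keep only the pairs that satisfy the predicate."""
    return {pair for pair in relation if predicate(pair)}

def image(relation, x):
    """Everything related to x: the set of second components paired with x."""
    return {b for (a, b) in relation if a == x}

def converse(relation):
    """Swap each pair, giving the number-to-name relation."""
    return {(b, a) for (a, b) in relation}

ann_numbers = image(phone_book, "ann")
ann_rows = restrict(phone_book, lambda pair: pair[0] == "ann")
reverse_book = converse(phone_book)
```

Because the relation is just a set, the ordinary set operations (union, intersection, difference) apply to it unchanged.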
Ken North - 03 Apr 2008 21:41 GMT > I agree with this wrt hierarchical dbms's, not network. In 1970, > hierarchical dbms products were everywhere, network dbms products > hardly existed. Bachman's GE IDS was available in the late '60s and it heavily influenced the network model (CODASYL standard). Charles Bachman was a key player in the task group that released the first database standard in 1971. IBM was originally involved in the DBTG but it put its focus on IMS instead of a CODASYL DBMS. Other computer companies eventually jumped in with CODASYL-compliant products.
> Before 1970, database management wasn't even a computer science > concept. The notion of using a data base, instead of ad hoc data stores, dates back long before 1970. The survey of the CODASYL data base task group in 1968 included dozens of existing products - reproduced here: http://www.sqlsummit.com/PDF/DatabaseSurvey_CODASYL_1968.pdf
Systems such as IBM GIS (1966), GIM (1965) and IDS (1965) were quite sophisticated. For example, GIM implemented demand-paged memory managed by the DBMS executive (because virtual memory wasn't yet an IBM OS feature). It dynamically loaded and cached code pages and data pages during an era when CPU memory was expensive.
This page highlights early contributions to database technology: http://ourworld.compuserve.com/homepages/ken_north/db_hall.htm
> Until at least 1974 ... In October, 1974 my lab cooperated with UCLA in running the "Comparative Data Base Management Systems Seminar". In two days, we covered the leading 12 DBMS products (with assistance from the vendors). None of those products were based on Codd's relational model and IBM presented IMS.
The ACM Computing Surveys of March 1976 (Vol. 8, Number 1) focused on "Data-Base Management Systems". Guest editor E.H. Sibley's introduction was titled "The Development of Data-Base Technology". Sibley wrote:
"Data-base technology is one of the most rapidly growing areas of computer and information science. In less than twenty years, with the greatest part of the development in the past eight years, data-base systems have come from nothing to be a topic of major interest."
So Sibley refers to a period from 1956 on, with the major development on data-base systems occurring from 1968 to 1976.
> 3. > The early RDBMS products were, to some extent, an attempt to > obtain the power and simplicity of the relational model without > discarding a code base that had been based on hierarchical or network > models of databases. I'm not sure I understand what code base was not discarded.
The code to operate on hierarchical and network databases was primarily in programs written in FORTRAN, COBOL and PL/I. And some DBMS products of that era had only an integrated (interpreted) query and data manipulation language, without an interface for higher-order (compiled) languages.
Products such as Ingres, Oracle and Informix offered embedded SQL and pre-compilers for languages such as COBOL and C. However, the logic of CODASYL database applications is quite different from SQL applications. It's not the path of least resistance to try to create SQL applications by starting with the code of a CODASYL or hierarchical database application.
The SQL DBMS products may have used some common OS code, such as buffer managers and interrupt service routines, but preserving the investment in older database software wasn't generally a goal for database companies pushing the relational model.
Ken North - 02 Apr 2008 09:14 GMT > I would describe the introduction of the relational model of data for DBMS > work as forward progress rather than as a reaction against something. Rob wrote: > Disagree. Codd was a doer as well as a thinker. In his 1970 paper, he > perceived that data dependencies in present systems (ordering > dependence, indexing dependence and access path dependence) implied > that "changes to the characteristics of data representation logically > impaired some application programs".
However Codd did not introduce the notion of avoiding those physical dependencies. In his first paper on the relational model, Codd cited the work of David L. Childs on Set Theoretic Data Structures.
Childs' 1968 papers and Codd's 1970 paper discussed structure (independent sets, no fixed structure, access by name instead of by pointers) and operations (union, restriction, etc.). Childs' papers included benchmark times for doing set operations on an IBM 7090. Codd's 1970 paper introduced normal forms, and his subsequent papers introduced the integrity rules. What's interesting is the University of Michigan connection. Codd, Bing Yao, and Michael Stonebraker were graduates. Some of the work done at University of Michigan during that time (Childs' STDS, Ash and Sibley's TRAMP relational memory) was for the CONCOMP project. It was funded by the US government and the research was available only to "qualified requesters".
Ken North - 03 Apr 2008 20:23 GMT > In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points > out (under HISTORY OF DATA > MODEL DEVELOPMENT) that "[h]ierarchical and network systems were > developed prior to 1970, but it was not until 1973 that data models > for these systems were defined." Interesting comment that's open to interpretation based on how you define a data model.
The network model that emerged from the CODASYL Data Base Task Group was derived from GE IDS, much as today's XQuery was derived from Quilt. Charles Bachman's IDS was released by GE in 1965. Bachman and Codd were both recipients of the ACM Turing Award and they faced off in a famous 1970s debate about navigational vs. relational data access.
The CODASYL network model is much like a persistent representation of a doubly-linked list. That model for traversing lists was well-known before the CODASYL spec of 1971. Linked lists date back to the 1950s. They were supported by LISP in the 1960s, and were described in Knuth's writings.
> Personally, I believe that the Relational Model prevailed because SQL > was more natural (read: English-like) and non-procedural whereas the > CODASYL sublanguage(s) were programmatic and navigational. Before Boyce and Chamberlin developed SQL, there were earlier attempts to develop English-like query languages - including a query language that came from IBM in the 1960s.
In 1968 the head of our systems center was part of the CODASYL committee, so we looked closely at the "data base" (database) systems that existed at the time. He'd been a consultant involved in field testing IBM's Generalized Information System (GIS) at Royal Dutch Shell in Venezuela in 1966. GIS used very simple English-like commands designed for non-programming users.
David Cressey - 03 Apr 2008 20:32 GMT > The CODASYL network model is much like a persistent representation of a > doubly-linked list. That model for traversing lists was well-known > before the CODASYL spec of 1971. Linked lists date back to the 1950s. > They were supported by LISP in the 1960s, and were described in Knuth's > writings. Thanks for an informative and well written comment.
One minor correction. LISP dates back to the 1950s. AFAIK linked lists were supported from the very first implementation of LISP. Doubly linked lists were well understood, as a programming pattern, from 1960. The difference between singly linked and doubly linked lists is a bit more subtle than might appear on the surface to some observers.
Rob - 03 Apr 2008 21:56 GMT > > In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points > > out (under HISTORY OF DATA [quoted text clipped - 16 lines] > They were supported by LISP in the 1960s, and were described in Knuth's > writings. I think you are mixing the notion of models with implementations.
If "The CODASYL network model is much like a persistent representation of a doubly-linked list", then what does the SET OWNER have to do with it? What I mean is that it is the relationship between one set owner and zero or more set members that forms the basis of the model, not its implementation as a doubly-linked list with a logical link from each set member to the set owner.
The CODASYL data model is much closer to the notion of recursive lists: A CODASYL set is an ordered list with each list element (set owner) having a sublist (the set members). Composition of these 2-level lists yields 3-level lists and so on.
I haven't looked at the CODASYL stuff in many years, but I've always assumed that the data model was a blueprint for an implementation engine upon which applications and higher-level interfaces would be built. I didn't really think it was intended as an enduser interface. (paul c's earlier comment resonates with this: "They have an unnecessarily complicated data interface, are basically impossible for an end-user to handle without a lot of expert help ...")
By that same reasoning, an SQL engine could support higher-level, end-user interfaces, but that never really has happened. As a consequence, nonprogrammers and programmers have had to struggle with SQL to interact with relational databases, or rely on SQL experts to build custom interfaces, textual/graphical or APIs.
Here are two (I think) interesting theoretical questions on network vs. relational:
1. Suppose you had a relational DBMS. Could you implement a CODASYL DBMS upon it?
2. Suppose you had a CODASYL DBMS. Could you implement a relational DBMS upon it?
This is an equivalence test. If the answer to both were "yes", it would mean the data models are equivalent.
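As a rough illustration of the first question only, here is a minimal sketch of one CODASYL-style set (one owner, ordered members) stored in relational tables, with "find owner" and "find next within set" emulated by queries. The table names, the `seq` ordering column, and the helper functions are all hypothetical, and this says nothing about full equivalence of the two models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (                -- owner record type
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE emp (                 -- member record type
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER NOT NULL REFERENCES dept(dept_id),  -- set membership
    seq     INTEGER NOT NULL,      -- position within the set (assumed column)
    UNIQUE (dept_id, seq)
);
""")
conn.execute("INSERT INTO dept VALUES (1, 'Sales')")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(10, "Ann", 1, 1), (11, "Bob", 1, 2), (12, "Cy", 1, 3)],
)


def find_owner(conn, emp_id):
    """Emulate navigating from a member to its owner with a join."""
    row = conn.execute(
        "SELECT d.name FROM dept d JOIN emp e ON e.dept_id = d.dept_id "
        "WHERE e.emp_id = ?",
        (emp_id,),
    ).fetchone()
    return row[0] if row else None


def find_next(conn, emp_id):
    """Emulate 'next within set': the member with the next higher seq."""
    row = conn.execute(
        "SELECT n.name FROM emp c "
        "JOIN emp n ON n.dept_id = c.dept_id AND n.seq > c.seq "
        "WHERE c.emp_id = ? ORDER BY n.seq LIMIT 1",
        (emp_id,),
    ).fetchone()
    return row[0] if row else None
```

Here the owner-member relationship is carried by a value (`dept_id`) rather than by a stored pointer, so the "navigation" is something the query engine works out at run time.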
David Cressey - 03 Apr 2008 23:16 GMT > If "The CODASYL network model is much like a persistent representation > of a doubly-linked list", then what does the SET OWNER have to do with > it? What I mean is that it is the relationship between one set owner > and zero or more set members that forms the basis of the model, not > its implementation as a doubly-linked list with a logical link from > each set member to the set owner. The doubly linked list forms a ring. In that ring are zero, one, or more than one "member records", and exactly one "owner record". Figuring out which record is the owner is mainly a matter of record types, but some implementations make it easier and quicker than that.
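The ring described above can be sketched in memory as follows. This is only an illustration under assumptions: the `Record` class, its field names, and the helper functions are hypothetical and are not taken from any CODASYL product, which would keep such links on disk. It shows both ways of reaching the owner: walking the ring until a record of the owner type turns up, and following a direct owner link.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # identity comparison; records are nodes, not values
class Record:
    record_type: str
    data: str
    next: Optional["Record"] = field(default=None, repr=False)
    prior: Optional["Record"] = field(default=None, repr=False)
    owner: Optional["Record"] = field(default=None, repr=False)


def new_set(owner_type: str, data: str) -> Record:
    """Create an owner record whose ring contains only itself (no members)."""
    owner = Record(owner_type, data)
    owner.next = owner.prior = owner
    return owner


def connect(owner: Record, member: Record) -> None:
    """Link a member into the ring just before the owner (end of the set)."""
    last = owner.prior
    last.next = member
    member.prior = last
    member.next = owner
    owner.prior = member
    member.owner = owner  # the optional direct link back to the owner


def members(owner: Record) -> Iterator[Record]:
    """Walk the ring from the owner until we arrive back at the owner."""
    rec = owner.next
    while rec is not owner:
        yield rec
        rec = rec.next


def find_owner_by_walking(rec: Record, owner_type: str) -> Record:
    """Find the owner by record type alone, following next links."""
    while rec.record_type != owner_type:
        rec = rec.next
    return rec
```

With a direct owner link, finding the owner is one step; without it, the cost grows with the number of members left in the ring.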
Bob Badour - 03 Apr 2008 21:57 GMT >>In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points >>out (under HISTORY OF DATA [quoted text clipped - 10 lines] > both recipients of the ACM Turing Award and they faced off in a famous > 1970s debate about navigational vs. relational data access. Indeed, and it was in that debate that Codd, himself, defined the hierarchic and network data models to enable sensible comparisons with the relational data model.
[snip]
Ken North - 02 Apr 2008 09:14 GMT The CODASYL standard defined a network-model database. Conceptually it was similar to an on-disk representation of a doubly-linked list, with an additional pointer from a set member to its owner. So you could navigate through sets by following a next, previous or owner pointer.
CODASYL databases put a premium on database design because there was nothing comparable to ALTER TABLE. Best practice called for diligence in determining what the contents of sets were and the data (type, precision). The reason was that making changes was arduous.
For example, if you had a database with name and address information, such as subscribers and customers, and you decided to add a second telephone number:
1. You recompiled your schema and sub-schemas (embedded in application source code).
2. Then you had to reload the entire database (often not the best way to spend a weekend).
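To see why a one-field change could force a full reload, here is a minimal sketch that assumes records are stored back to back in a fixed-width layout. The layouts, field widths, and function names are hypothetical and are not meant to describe how any particular CODASYL system stored its records.

```python
import struct

# Hypothetical fixed-width layouts: name, phone (old) and name, phone, phone2 (new).
OLD_LAYOUT = struct.Struct("<20s12s")
NEW_LAYOUT = struct.Struct("<20s12s12s")


def pack_old(name: str, phone: str) -> bytes:
    """Store one record in the old layout (short fields are null-padded)."""
    return OLD_LAYOUT.pack(name.encode(), phone.encode())


def reload(old_file: bytes) -> bytes:
    """Rewrite every record in the new layout with an empty second phone."""
    out = bytearray()
    for offset in range(0, len(old_file), OLD_LAYOUT.size):
        name, phone = OLD_LAYOUT.unpack_from(old_file, offset)
        out += NEW_LAYOUT.pack(name, phone, b"")
    return bytes(out)
```

Every record grows, so every record after the first moves. If links between records are kept as physical positions (an assumption in this sketch), they all have to be rebuilt as well, which is why the whole database gets unloaded and reloaded rather than patched in place.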
CODASYL-type databases were fast but inflexible. And broken pointers could cause anomalies on updates and insertions.
The group that defined the CODASYL standard did a survey of existing database systems in the early phase of its work. Here's what they reported in that 1968 survey:
http://www.sqlsummit.com/PDF/DatabaseSurvey_CODASYL_1968.pdf
> Hello, > [quoted text clipped - 6 lines] > old > CODASYL-like databases resulted in?