CODASYL-like databases

Troels Arvin - 01 Apr 2008 07:42 GMT
Hello,

RDBMSes are sometimes described as a reaction against network-/
hierarchical databases. I believe that I've read that CODASYL-like DBMSes
resulted in databases which were very hard to maintain.

Does someone know of more concrete descriptions of what problems the old
CODASYL-like databases resulted in?


-CELKO- - 01 Apr 2008 14:20 GMT
The last standard for the network model was NDL; I still have a copy
of the document.  The real problems were (1) They could not be
optimized since they had fixed access paths to the data (2) They were
procedural and not declarative, so the programmer did all the work.
(3) Everyone had a different language instead of a Standard like SQL
to use (NDL came too late).

Most of the business data in the world is still in IMS today.
DBMS_Plumber - 01 Apr 2008 19:47 GMT
> Most of the business data in the world is still in IMS today.

While this passes as received wisdom, I can't find any actual evidence
to support it. At least, nothing since about 1992.

When I talk to folk who make and sell disk systems and what-not, they
tell me that for every 1G of disk they're selling under IMS/
network systems, they're selling 100G for "relational" systems
(largely warehouses), and 1000G for file-systems (emails, and
attachments).

When I talk to folk who try to sell services to IT departments they
tell me that the overwhelming preponderance of "business data" today
is in Excel Spreadsheets (quotes, models and so forth) and the
majority of "business transaction data" is being managed by RDBMSs.

Very, very few of the web based applications developed since 1995 have
used IMS.

It's absolutely true that there are still some very large IMS systems
out there but their number, and the amount of data they manage,
doesn't seem to be growing. From what I can tell much of the data they
hold is short-codes-and-numbers stuff, while modern businesses are
increasingly dependent on other kinds of data; transcriptions of
support calls, customer feedback text, and so forth.

If you define "business data" extremely narrowly - essentially to just
buy/sell/ship/receive stuff - then it's remotely possible that IMS
still dominates (though I doubt it). But if you include all the
personnel records management, sales management, and what-not, no way.
Tegiri Nenashi - 02 Apr 2008 15:57 GMT
On Apr 1, 10:47 am, DBMS_Plumber <paul_geoffrey_br...@yahoo.com>
wrote:
> > Most of the business data in the world is still in IMS today.
> If you define "business data" extremely narrowly - essentially to just
> buy/sell/ship/receive stuff - then it's remotely possible that IMS
> still dominates (though I doubt it). But if you include all the
> personnel records management, sales management, and what-not, no way.

The other way to estimate the significance (or rather perceived
significance) of technology is to calculate the number of book titles,
perhaps weighted by their popularity. For that matter I can't remember
seeing any IMS book in the local barnes-n-noble. Granted there is a
boring factor (for example, there aren't that many SAP or peoplesoft
titles either), still the adjective "dominates" can hardly apply to
IMS judged by any criteria.
Ken North - 03 Apr 2008 21:41 GMT
Book titles carried by a Barnes & Noble are not an indication of what
mature technology is in use. They are a better indication of what
technology has captured readers' interest - often because it's new and
book buyers are learning. COBOL and IMS have been around for four
decades so it would be hard to find a publisher who wants to publish a
title about either. The market is too small.

But COBOL and IMS are far from dead. It was only about 5 years ago that
the number of Java programmers exceeded the number of COBOL programmers.
In 2005, IBM reported IMS was handling over 15 million gigabytes of data
and 50 billion transactions per day:

ftp://ftp.software.ibm.com/software/data/ims/shelf/presentations/IMS_Today.pdf

======== Ken North ===========
www.KNComputing.com

www.WebServicesSummit.com
www.SQLSummit.com
www.GridSummit.com

paul c - 03 Apr 2008 14:58 GMT
> The last standard for the network model was NDL; I still have a copy
> of the document.  The real problems were (1) They could not be
[quoted text clipped - 3 lines]
> to use (NDL came too late).
> ...

They have an unnecessarily complicated data interface, are basically
impossible for an end-user to handle without a lot of expert help and
programmers new to an app have a steep learning curve.  Everybody has to
know what kinds of 'records' have the data they want and then decide
which of many operators and options are needed for access, then organize
some other code and place the operators very carefully in that, then
likely have to cope with numerous arcane environmental exceptions/return
codes and then construct the kind of 'record' they wanted in the first
place.  The same data in two different databases might be kept in
different record types and need completely different operators.

How could a 'standard' possibly improve that situation?
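
To make the contrast in this sub-thread concrete, here is a minimal Python sketch. It is not CODASYL DML or NDL syntax; the record layout, the `next_in_set` field and both function names are hypothetical, chosen only to show a program that walks a fixed access path next to one that states the result it wants and leaves the access path to the engine.

```python
import sqlite3

# Hypothetical toy data: each customer owns a chain of order records.
customers = {
    1: {"name": "Acme", "first_order": 10},
    2: {"name": "Bolt", "first_order": None},
}
orders = {
    10: {"amount": 250, "next_in_set": 11},
    11: {"amount": 75, "next_in_set": None},
}

def navigational_total(customer_id):
    """Walk the fixed owner-to-member chain, record by record."""
    total = 0
    cursor = customers[customer_id]["first_order"]  # position on first member
    while cursor is not None:                       # end-of-set check
        record = orders[cursor]
        total += record["amount"]
        cursor = record["next_in_set"]              # move to next member
    return total

def declarative_total(customer_id):
    """State what is wanted; the engine chooses how to fetch it."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount INTEGER)"
    )
    db.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 250), (11, 1, 75)]
    )
    row = db.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    db.close()
    return row[0]
```

Both functions return the same totals. The difference is that the first one breaks if the chain is reorganised, while the second only depends on the table and column names.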
-CELKO- - 03 Apr 2008 16:14 GMT
>> How could a 'standard' possibly improve that situation? <<

The only use NDL ever saw was to give standardized terminology for the
network products -- child, parent, link, etc. first appeared in that
document.  What I find weird is that newbie SQL
programmers use those terms but with the word "table" attached to them
(i.e. "parent table" instead of "referenced table", "child table"
instead of "referencing table", "linking table" instead of
"relationship table", etc.)

I am pretty sure that they never saw NDL or any other network product;
they must have inherited the old terms from much older people with
whom they worked.

Nobody cried when NDL died on its 5 year expiration date.
David Cressey - 01 Apr 2008 14:34 GMT
> Hello,
>
[quoted text clipped - 4 lines]
> Does someone know of more concrete descriptions of what problems the old
> CODASYL-like databases resulted in?

I would describe the introduction of the relational model of data for DBMS
work as forward progress rather than as a reaction against something.  The
early RDBMS products were, to some extent, an attempt to obtain the power
and simplicity of the relational model without discarding a code base that
had been based on hierarchical or network models of databases. This
complicates the history.

To get back to your question,  there are really two sets of problems with
the prerelational DBMS products:  problems inherent in the graph theory of
data and problems in the implementation of DBMS products up to that point
in time. The theoretical problems are more easily discussed,  and I expect
that other contributors will answer that part of the question thoroughly.

I'm going to talk about the practical problems inherent in VAX DBMS when
compared to VAX Rdb in about 1985.  (Note:  VAX Rdb did not become an SQL
database until 1986).  I'm choosing those two products because they shared
so much code base that the comparison might actually be meaningful.  VAX
DBMS was a fairly good CODASYL database,  while VAX Rdb gave Oracle quite a
run for the money,  until the Oracle Corp. bought out Rdb in 1994.

First,  there's difficulty in learning.  In order to build a  VAX DBMS
database,  you practically had to become a database expert,  even if the
database you were building was a very simple,  tinker toy database.   You
could build a starter VAX Rdb database with a lot less learning.  This gave
the initial advantage to "relational",  although there's a danger here.
Many people who learned just enough to build a tinker toy database stopped
learning at that point,  and built major databases that suffered from lack
of design knowledge.

Second,  there's difficulty in modifying the schema.  There are many schema
changes in VAX DBMS that necessitated unloading most or all of the data,
and reloading the data into a new empty database.  Not only that,  many of
these changes required extensive maintenance work on the application
programs.  By comparison,  many changes (such as adding a new table,  a new
index,  or a new column to an existing table)  could be made to an Rdb
database without even taking the database off the air, and with little or no
maintenance on most application programs.

Some of this is due to the implementations of VAX DBMS and VAX Rdb,  but a
large part of it is due to the difference between the graph model of data
and the relational model of data.  The more you learn about "data
independence",  the more apparent this will be to you.
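
As a rough illustration of the kind of schema change described above, the following Python sketch uses the standard `sqlite3` module as a stand-in. It says nothing about how VAX Rdb or VAX DBMS worked internally; the table, column and function names are hypothetical. The point is only that a query written against the old schema keeps running after a column, an index and a table are added in place.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO employee (id, name) VALUES (1, 'Ada')")

def employee_names(conn):
    """An 'application program' written before the schema change."""
    return [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY id")]

before = employee_names(db)

# Schema changes made in place, with no unload and reload of the data.
db.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
db.execute("CREATE INDEX employee_name_idx ON employee (name)")
db.execute("CREATE TABLE department (code TEXT PRIMARY KEY, title TEXT)")

after = employee_names(db)  # the old program runs unchanged
```

The old program names only the columns it needs, so the new column and index are invisible to it. That is a small instance of the data independence mentioned above.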

Third,  there's difficulty in implementing unanticipated queries.  In VAX
DBMS,  queries that were not contemplated at the time the database was built
were often either impossible to program or resulted in such disastrous
performance that a database redesign was in order.  By comparison,
unanticipated queries on VAX Rdb databases frequently fell into the class of
problems that are easily solved,  and resulted in acceptable performance,
sometimes requiring a new index,  but otherwise not requiring any database
work at all.

This is mostly due to the relational model,  and not to the two
implementations.
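
A minimal sketch of the "unanticipated query" case, again using Python's `sqlite3` as a stand-in rather than Rdb. The `shipment` table and its data are invented for illustration. The query is one the table design never anticipated; adding an index afterwards changes the access path but neither the query text nor its result.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE shipment "
    "(id INTEGER PRIMARY KEY, customer TEXT, city TEXT, weight INTEGER)"
)
db.executemany(
    "INSERT INTO shipment (customer, city, weight) VALUES (?, ?, ?)",
    [("Acme", "Oslo", 10), ("Bolt", "Oslo", 30), ("Acme", "Rome", 5)],
)

# A question nobody planned for when the table was designed.
AD_HOC = "SELECT city, SUM(weight) FROM shipment GROUP BY city ORDER BY city"

result_without_index = db.execute(AD_HOC).fetchall()

# If the query turns out to be slow, add an index; the query text is untouched.
db.execute("CREATE INDEX shipment_city_idx ON shipment (city)")
result_with_index = db.execute(AD_HOC).fetchall()
```

In a navigational design the equivalent change would typically mean adding a new set or access path and then rewriting the programs that should use it.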

Having said this,  the speed advantage in 1985 was still on the side of VAX
DBMS,  if the entire application was well understood,  and an optimal design
was made.  It wasn't until much later that Rdb began to win the speed
matches,  and even this was due to enhancements made to Rdb,  while VAX DBMS
had been relegated to "mature" status.

There's more to this story,  but this response is already too long.
Rob - 01 Apr 2008 18:35 GMT
I'm going to disagree with several points you make in your response.
cdt has become so adversarial lately, I feel compelled to state that I
can disagree with your pov w.o. being disagreeable.

> > Hello,
>
[quoted text clipped - 7 lines]
> I would describe the introduction of the relational model of data for DBMS
> work as forward progress rather than as a reaction against something.

Disagree. Codd was a doer as well as a thinker. In his 1970 paper, he
perceived that data dependencies in present systems (ordering
dependence, indexing dependence and access path dependence) implied
that "changes to the characteristics of data representation logically
impaired some application programs". (I am paraphrasing here. He goes
on to give examples of all three dependencies.) The economics of data
processing in those days (now I think referred to as Information
Technology) made these "impairments" expensive -- changes in the data
representation required rewriting applications. So that from a
business perspective, an investment was lost. To me, this sounds like
a rather practical economic argument in favor of developing data
independent representations so that building applications upon them
would preserve the applications development investment.

> The
> early RDBMS products were, to some extent, an attempt to obtain the power
> and simplicity of the relational model without discarding a code base that
> had been based on hierarchical or network models of databases.

Maybe. To me, your answer seems anachronistic. In his 1980 paper "DATA
MODELS IN DATABASE MANAGEMENT", Codd points out (under HISTORY OF DATA
MODEL DEVELOPMENT) that "[h]ierarchical and network systems were
developed prior to 1970, but it was not until 1973 that data models
for these systems were defined." And "[t]hus, hierarchic and network
systems preceded the hierarchic and network models, whereas relational
systems came after the relational model and used the model as a
foundation." This is consistent with my experience. The earliest
experimental systems were System R at IBM and Ingres at UC Berkeley. I
know Ingres had no existing dbms codebase. I suspect System R used
classic IBM access methods (ISAM, etc.), but I doubt that the
translation of SEQUEL (predecessor to SQL) employed any "native" IMS
code. ORACLE brought the first commercial RDBMS to market. There was
no prior product, so I'm guessing no existing codebase either.

Your response seems to be *very* specific to DEC's RDBMS offerings a
decade later. In that context, you are probably correct.

In response to the OP (Troels Arvin <tro...@arvin.dk>):
There really weren't many CODASYL-based DBMSs around before 1970, so
the landscape was basically 1) DBMS vendors were providing
hierarchical dbms products, and 2) CODASYL and the Relational Model
were slugging it out to determine which would provide a theoretical
basis for the next, "data independent" generation of products.
Personally, I believe that the Relational Model prevailed because SQL
was more natural (read: English-like) and non-procedural whereas the
CODASYL sublanguage(s) were programmatic and navigational. Although
higher-level languages could of course be built upon CODASYL
sublanguages, SQL already seemed usable by non-programmers.
(Consistent w. Cressey's other observations, SQL had functionality for
data definition as well as data manipulation, so that potential users
didn't have to master multiple products for defining and using their
databases.) In effect, SQL won the hearts and minds of product
developers because it addressed a vastly larger TAM (Total Available
Market) than the navigational/programmatic CODASYL.

Again, my opinion.

Rob
David Cressey - 01 Apr 2008 20:31 GMT
>I'm going to disagree with several points you make in your response.
>cdt has become so adversarial lately, I feel compelled to state that I
>can disagree with your pov w.o. being disagreeable.

I appreciate the effort to disagree without being disagreeable.  I think you
succeeded.  I'm going to try to maintain the same tone.

(My newsreader is not cooperating in marking what I'm replying to with the
">" mark.  My apologies if I make some errors here.)

>> > Hello,
>>
[quoted text clipped - 21 lines]
>independent representations so that building applications upon them
>would preserve the applications development investment.

Your points are well taken.  I fail to understand how they disagree with
what I said.

>> The
>> early RDBMS products were, to some extent, an attempt to obtain the power
[quoted text clipped - 15 lines]
>code. ORACLE brought the first commercial RDBMS to market. There was
>no prior product, so I'm guessing no existing codebase either.

By "early RDBMS products"  I meant the large number of products that came out
during the "relational derby" of the early 1980s.  Perhaps my choice of
words was unfortunate.  I did not specifically mean Oracle and Ingres.
There were several products that came out as a "now we're relational"
rework of existing products.  Some of these were simply the same old DBMS as
before,  with a thin veneer of relational interface to them.  Most of what I
know of these other products comes from reading commentary in c.d.t.
Hopefully,  other contributors will discuss those products in greater detail
than I can.

>Your response seems to be *very* specific to DEC's RDBMS offerings a
> decade later. In that context, you are probably correct.

By no stretch of the imagination did I think of my response as a definitive
response to the question raised by the OP.  I focussed on those two products
for several reasons.

First, because I know them a little better than other products.

Second,  because they illustrate the difference between a relational DBMS
and a CODASYL DBMS,  in terms of customer acceptance and vendor support,
better than some other pairs of products might.  They were built and
supported by the same engineering team,  sold by the same vendor,  ran on
the same platform,  and integrated with the same programming languages.
This cuts down on the extent to which extraneous factors influenced the
difference between the trajectory of the two products.  They provide a
fairly clean contrast between the strengths and weaknesses  of Relational
and CODASYL,  in the practical world.  That's one of the things I thought
the OP was trying to elicit.

As to the ten year delay,  the early 1980s is the time frame during which
the question of which DBMS would be the flagship DBMS was being settled,
in the part of the world that I was working in.  The collision between
CODASYL and Relational
may have happened a decade earlier in other venues,  or a decade later in
yet others.

>In response to the OP (Troels Arvin <tro...@arvin.dk>):
>There really weren't many CODASYL-based DBMSs around before 1970, so
[quoted text clipped - 13 lines]
>developers because it addressed a vastly larger TAM (Total Available
>Market) than the navigational/programmatic CODASYL.

If I understand the term CODASYL correctly,  there weren't any CODASYL-based
DBMS products before the DBTG task force defined CODASYL.  That was, IIRC,
about 1970.

Neither your comments nor mine address the difference between "relational"
and SQL.  That difference has been much commented on, in times gone by, in
c.d.t.  I expect others to comment on this difference.  It may or may not be
important to the discussion.

>Again, my opinion.

My opinion, too.
Rob - 02 Apr 2008 16:10 GMT
Sorry. After rereading your post and my response, I realize my effort
to be concise resulted in confusion. Here are 3 statements:

1. > > RDBMSes are sometimes described as a reaction against network-/
hierarchical databases. ("Troels Arvin" <tro...@arvin.dk>)

I agree with this wrt hierarchical dbms's, not network. In 1970,
hierarchical dbms products were everywhere, network dbms products
hardly existed. Codd's relational model and the CODASYL/DBTG network
proposals were both reactions to maintenance/management problems with
hierarchical products. The relational model prevailed.

2. > I would describe the introduction of the relational model of data
for DBMS
work as forward progress rather than as a reaction against something.
("David Cressey" <cresse...@verizon.net>)

Here I do disagree. The relational model was revolutionary thinking.
Before 1970, database management wasn't even a computer science
concept. Until at least 1974, SIGMOD was SIGFIDET-- Special Interest
Group on File Description & Translation. Perhaps my "disagreement" is
based on my personal experience of that time: IMS and all its
lookalikes were uninteresting data processing products; the relational
model was a giant leap forward that was worthy of interest to a
computer scientist. You imply "evolutionary" -- I need
"revolutionary".

3. > The early RDBMS products were, to some extent, an attempt to
obtain the power and simplicity of the relational model without
discarding a code base that had been based on hierarchical or network
models of databases. ("David Cressey" <cresse...@verizon.net>)

Wrt codebases, the only contribution from hierarchical systems were
the Access Methods (ISAM, VSAM, ...). Wrt network systems, the only
reused code I know of (from your postings) were the DEC products. In
the grand scheme of things, these hardly appeared on the radar screen.
(That is not meant to disparage your knowledge or experience: I was
the system architect on a RDBMS in 1980-84 and designed the optimizing
compiler for another RDBMS in 1985. I guarantee you nobody remembers
either!)

Not really that much disagreement here, just differences in
perspective.

Cheers, Rob
Cimode - 02 Apr 2008 17:34 GMT
> Sorry. After rereading your post and my response, I realize my effort
> to be concise resulted in confusion. Here are 3 statements:
[quoted text clipped - 41 lines]
>
> Cheers, Rob

I concur.  The innovative aspect was that before RM, there was no
direct relationship between mathematical concepts and database
management.
Ken North - 05 Apr 2008 01:32 GMT
> The innovative aspect was that before RM, there was no
> direct relationship between mathematical concepts and database
> management.

Not exactly.

Assuming RM is a reference to Codd's seminal work ("A Relational Model
of Data for Large Data Banks", 1970), you need only look at the papers
he cited to find an earlier paper that showed the relationship between
mathematics and database management.

David L. Childs taught mathematics at the University of Michigan and
Codd cited his work. Here's the abstract for a March 1968 paper by
Childs:

DESCRIPTION OF A SET-THEORETIC DATA STRUCTURE
A set-theoretic data structure (STDS) is virtually a 'floating' or
pointer-free structure allowing quicker access, less storage, and
greater flexibility than fixed or rigid structures that rely heavily on
internal pointers or hash-coding, such as 'associative or relational
structures,' 'list structures,' 'ring structures,' etc. An STDS relies
on set-theoretic operations to do the work usually allocated to internal
pointers. A question in an STDS will be a set-theoretic expression. Each
set in an STDS is completely independent of every other set, allowing
modification of any set without perturbation of the rest of the
structure; while fixed structures resist creation, destruction, or
changes in data. An STDS is essentially a meta-structure, allowing a
question to 'dictate' the structure or data-flow. A question establishes
which sets are to be accessed and which operations are to be performed
within and between these sets. In an STDS there are as many 'structures'
as there are combinations of set-theoretic operations; and the addition,
deletion, or change of data has no effect on set-theoretic operations,
hence no effect on the 'dictated structures.' Thus in a floating
structure like an STDS the question directs the structure, instead of
being subservient to it.

And there's also ARPA-funded research described in this August 1968
research paper presented to the IFIP Congress 1968. Note the
INTRODUCTION below.

"Childs, D. L., Feasibility of a Set-Theoretic Data Structure: A General
Structure Based on a Reconstituted Definition of Relation"

ABSTRACT This paper is motivated by an assumption that many problems
dealing with arbitrarily related data can be expedited on a digital
computer by a storage structure which allows rapid execution of
operations within and between sets of datum names. In order for such a
structure to be feasible, two problems must be considered: 1. the
structure should be general enough that the sets involved may be
unrestricted, thus allowing sets of sets of sets...; sets of ordered
pairs, ordered triples...; sets of variable length n-tuples, n-tuples of
arbitrary sets; etc.; 2. the set-operations should be general in nature,
allowing any of the usual set theory operations between sets as
described above, with the assurance that these operations will be
executed rapidly. A sufficient condition for the latter is the existence
of a well-ordering relation on the union of the participating sets.
These problems are resolved in this paper with the introduction of the
concept of a 'complex,' which has an additional feature of allowing a
natural extension of properties of binary relations to properties of
general relations.

TABLE OF CONTENTS
ABSTRACT
I. INTRODUCTION
II. COMPLEXES
III. EXTENSION OF SET OPERATIONS TO COMPLEXES
IV. ORDERED PAIR DEFINED BY A COMPLEX
V. FUNCTIONS
VI. DEVELOPMENT OF AN N-TUPLE
VII. RECONSTITUTED DEFINITION OF RELATION
VIII. TAU-ORDERING
IX. EXTENDED OPERATIONS FOR RELATIONS
X. EXAMPLES
XI. CONCLUSION
DEFINED SYMBOLS
REFERENCES

I. INTRODUCTION The overall goal, of which this paper is a part, is the
development of a machine-independent data structure allowing rapid
processing of data related by arbitrary assignment such as: the contents
of a telephone book, library files, census reports, family lineage,
networks, etc. Data which are non-intrinsically related have to be
expressed (stored) in such a way as to define the way in which they are
related before any data structure is applicable. Since any relation can
be expressed in set theory as a set of ordered pairs and since set
theory provides a wealth of operations for dealing with relations, a
set-theoretic data structure appears worth investigation.
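
The idea in that introduction, a relation expressed as a set of ordered pairs and queried with ordinary set operations, can be sketched in a few lines of Python. This uses Python's built-in `set` type and is not Childs' STDS implementation; the data and the helper names `domain`, `restrict` and `image` are hypothetical.

```python
# Hypothetical data: two relations, each a plain set of ordered pairs.
phone_book = {("Ann", "555-0101"), ("Bob", "555-0102"), ("Cy", "555-0103")}
library_members = {("Ann", "card-7"), ("Cy", "card-9")}

def domain(relation):
    """All first components of the pairs in the relation."""
    return {first for first, _ in relation}

def restrict(relation, names):
    """The pairs whose first component is in the given set."""
    return {pair for pair in relation if pair[0] in names}

def image(relation, name):
    """All second components paired with the given first component."""
    return {second for first, second in relation if first == name}

# The question is a set expression; no pointers link the two relations.
people_in_both = domain(phone_book) & domain(library_members)
phones_of_members = restrict(phone_book, domain(library_members))
```

Neither relation refers to the other, so either can be changed without touching the other. The "structure" exists only in the expression that combines them.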
Ken North - 03 Apr 2008 21:41 GMT
> I agree with this wrt hierarchical dbms's, not network. In 1970,
> hierarchical dbms products were everywhere, network dbms products
> hardly existed.

Bachman's GE IDS was available in the late '60s and it heavily
influenced the network model (CODASYL standard). Charles Bachman was a
key player in the task group that released the first database standard
in 1971. IBM was originally involved in the DBTG but it put its focus on
IMS instead of a CODASYL DBMS. Other computer companies eventually
jumped in with CODASYL-compliant products.

> Before 1970, database management wasn't even a computer science
> concept.

The notion of using a data base, instead of ad hoc data stores, dates
back long before 1970. The survey of the CODASYL data base task group in
1968 included dozens of existing products - reproduced here:
http://www.sqlsummit.com/PDF/DatabaseSurvey_CODASYL_1968.pdf

Systems such as IBM GIS (1966), GIM (1965) and IDS (1965) were quite
sophisticated. For example, GIM implemented demand-paged memory managed
by the DBMS executive (because virtual memory wasn't yet an IBM OS
feature). It dynamically loaded and cached code pages and data pages
during an era when CPU memory was expensive.

This page highlights early contributions to database technology:
http://ourworld.compuserve.com/homepages/ken_north/db_hall.htm

> Until at least 1974 ...

In October, 1974 my lab cooperated with UCLA in running the "Comparative
Data Base Management Systems Seminar". In two days, we covered the
leading 12 DBMS products (with assistance from the vendors). None of
those products were based on Codd's relational model and IBM presented
IMS.

The ACM Computing Surveys of March 1976 (Vol. 8, Number 1) focused on
"Data-Base Management Systems". Guest editor E.H. Sibley's introduction
was titled "The Development of Data-Base Technology". Sibley wrote:

"Data-base technology is one of the most rapidly growing areas of
computer and information science. In less than twenty years, with the
greatest part of the development in the past eight years, data-base
systems have come from nothing to be a topic of major interest."

So Sibley refers to a period from 1956 on, with the major development on
data-base systems occurring from 1968 to 1976.

> 3. > The early RDBMS products were, to some extent, an attempt to
> obtain the power and simplicity of the relational model without
> discarding a code base that had been based on hierarchical or network
> models of databases.

I'm not sure I understand what code base was not discarded.

The code to operate on hierarchical and network databases was primarily
in programs written in FORTRAN, COBOL and PL/I. And some DBMS products
of that era had only an integrated (interpreted) query and data
manipulation language, without an interface for higher-order (compiled)
languages.

Products such as Ingres, Oracle and Informix offered embedded SQL and
pre-compilers for languages such as COBOL and C. However, the logic of
CODASYL database applications is quite different from SQL applications.
It's not the path of least resistance to try to create SQL applications
by starting with the code of a CODASYL or hierarchical database
application.

The SQL DBMS products may have used some common OS code, such as buffer
managers and interrupt service routines, but preserving the investment
in older database software wasn't generally a goal for database
companies pushing the relational model.
Ken North - 02 Apr 2008 09:14 GMT
> I would describe the introduction of the relational model of data for
> DBMS
> work as forward progress rather than as a reaction against something.

Rob wrote
< Disagree. Codd was a doer as well as a thinker. In his 1970 paper, he
< perceived that data dependencies in present systems (ordering
< dependence, indexing dependence and access path dependence) implied
< that "changes to the characteristics of data representation logically
< impaired some application programs".

However Codd did not introduce the notion of avoiding those physical
dependencies. In his first paper on the relational model, Codd cited the
work of David L. Childs on Set Theoretic Data Structures.

Childs' 1968 papers and Codd's 1970 paper discussed structure
(independent sets, no fixed structure, access by name instead of by
pointers) and operations (union, restriction, etc.). Childs' papers
included benchmark times for doing set operations on an IBM 7090. Codd's
1970 paper introduced normal forms, and his subsequent papers introduced
the integrity rules.

What's interesting is the University of Michigan connection. Codd, Bing
Yao, and Michael Stonebraker were graduates. Some of the work done at
University of Michigan during that time (Childs' STDS, Ash and Sibley's
TRAMP relational memory) was for the CONCOMP project. It was funded by
the US government and the research was available only to "qualified
requesters".

Ken North - 03 Apr 2008 20:23 GMT
> In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points
> out (under HISTORY OF DATA
> MODEL DEVELOPMENT) that "[h]ierarchical and network systems were
> developed prior to 1970, but it was not until 1973 that data models
> for these systems were defined."

Interesting comment that's open to interpretation based on how you
define a data model.

The network model that emerged from the CODASYL Data Base Task Group was
derived from GE IDS, much as today's XQuery was derived from Quilt.
Charles Bachman's IDS was released by GE in 1965. Bachman and Codd were
both recipients of the ACM Turing Award and they faced off in a famous
1970s debate about navigational vs. relational data access.

The CODASYL network model is much like a persistent representation of a
doubly-linked list. That model for traversing lists was well-known
before the CODASYL spec of 1971. Linked lists date back to the 1950s.
They were supported by LISP in the 1960s, and were described in Knuth's
writings.
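
One way to picture the linked-list comparison is the Python sketch below. It is only an in-memory illustration under assumptions of my own: the `Record` class, its `next`/`prior`/`owner` links and the function names are hypothetical, and real CODASYL products stored such links persistently and differed in detail. An owner record and its members form a ring that can be walked in either direction.

```python
class Record:
    """A record with next/prior/owner links for a single set type."""

    def __init__(self, data):
        self.data = data
        self.next = self    # ring links; a lone record points at itself
        self.prior = self
        self.owner = self

def connect(owner, member):
    """Insert member at the end of the owner's ring (just before the owner)."""
    last = owner.prior
    last.next = member
    member.prior = last
    member.next = owner
    owner.prior = member
    member.owner = owner

def members(owner):
    """Navigate forward from the owner until the ring returns to it."""
    found = []
    current = owner.next
    while current is not owner:
        found.append(current.data)
        current = current.next
    return found

def members_reversed(owner):
    """Navigate backward using the prior links."""
    found = []
    current = owner.prior
    while current is not owner:
        found.append(current.data)
        current = current.prior
    return found

dept = Record("Sales")
for name in ("Ann", "Bob", "Cy"):
    connect(dept, Record(name))
```

Note that the sketch already needs an owner link in addition to next and prior, so it is a list anchored at one distinguished record rather than a bare doubly-linked list.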

> Personally, I believe that the Relational Model prevailed because SQL
> was more natural (read: English-like) and non-procedural whereas the
> CODASYL sublanguage(s) were programmatic and navigational.

Before Boyce and Chamberlin developed SQL, there were earlier attempts
to develop English-like query languages - including a query language
that came from IBM in the 1960s.

In 1968 the head of our systems center was part of the CODASYL
committee, so we looked closely at the "data base" (database) systems
that existed at the time. He'd been a consultant involved in field
testing IBM's Generalized Information System (GIS) at Royal Dutch Shell
in Venezuela in 1966. GIS used very simple English-like commands
designed for non-programming users.
David Cressey - 03 Apr 2008 20:32 GMT
> The CODASYL network model is much like a persistent representation of a
> doubly-linked list. That model for traversing lists was well-known
> before the CODASYL spec of 1971. Linked lists date back to the 1950s.
> They were supported by LISP in the 1960s, and were described in Knuth's
> writings.

Thanks for an informative and well written comment.

One minor correction.  LISP dates back to the 1950s.  AFAIK linked lists
were supported from the very first implementation of LISP.  Doubly linked
lists were well understood, as a programming pattern,  from 1960.  The
difference between singly linked and doubly linked lists is a bit more
subtle than might appear on the surface to some observers.
Rob - 03 Apr 2008 21:56 GMT
> > In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points
> > out (under HISTORY OF DATA
[quoted text clipped - 16 lines]
> They were supported by LISP in the 1960s, and were described in Knuth's
> writings.

I think you are mixing the notion of models with implementations.

If "The CODASYL network model is much like a persistent representation
of a doubly-linked list", then what does the SET OWNER have to do with
it? What I mean is that it is the relationship between one set owner
and zero or more set members that forms the basis of the model, not
its implementation as a doubly-linked list with a logical link from
each set member to the set owner.

The CODASYL data model is much closer to the notion of recursive
lists: A CODASYL set is an ordered list with each list element (set
owner) having a sublist (the set members). Composition of these 2-
level lists yields 3-level lists and so on.
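To make the recursive-list reading concrete, here is a minimal Python sketch. The `Node` class, the `depth` and `walk` helpers, and the company/department/employee data are all hypothetical names invented for illustration; nothing here comes from the CODASYL spec. It also only covers the tree-shaped case, whereas a real network schema lets one record be a member of several sets.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A record that may own an ordered sublist of member records."""
    name: str
    members: list = field(default_factory=list)

def depth(node):
    """Number of levels in the nested list rooted at `node`."""
    return 1 + max((depth(m) for m in node.members), default=0)

def walk(node, level=0):
    """Yield (level, name) pairs, each owner before its members."""
    yield level, node.name
    for m in node.members:
        yield from walk(m, level + 1)

# Two 2-level lists (department -> employees) composed under a third owner
# give a 3-level list (company -> departments -> employees).
sales = Node("Sales", [Node("Ann"), Node("Bob")])
rnd = Node("R&D", [Node("Cy")])
company = Node("Acme", [sales, rnd])
```

Calling `depth(sales)` gives 2 and `depth(company)` gives 3, which is the composition step described above.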

I haven't looked at the CODASYL stuff in many years, but I've always
assumed that the data model was a blueprint for an implementation
engine upon which applications and higher-level interfaces would be
built. I didn't really think it was intended as an end-user interface.
(paul c's earlier comment resonates with this: "They have an
unnecessarily complicated data interface, are basically impossible for
an end-user to handle without a lot of expert help ...")

By that same reasoning, an SQL engine could support higher-level
end-user interfaces, but that has never really happened. As a
consequence, nonprogrammers and programmers have had to struggle with
SQL to interact with relational databases, or rely on SQL experts to
build custom interfaces (textual, graphical, or APIs).

Here are two (I think) interesting theoretical questions about network
vs. relational:
1. Suppose you had a relational DBMS. Could you implement a CODASYL
DBMS upon it?
2. Suppose you had a CODASYL DBMS. Could you implement a relational
DBMS upon it?

This is an equivalence test. If the answer to both were "yes", it
would mean the data models are equivalent.
David Cressey - 03 Apr 2008 23:16 GMT
> If "The CODASYL network model is much like a persistent representation
> of a doubly-linked list", then what does the SET OWNER have to do with
> it? What I mean is that it is the relationship between one set owner
> and zero or more set members that forms the basis of the model, not
> its implementation as a doubly-linked list with a logical link from
> each set member to the set owner.

The doubly linked list forms a ring.  In that ring are zero, one, or more
than one "member records",  and exactly one "owner record".  Figuring out
which record is the owner is mainly a matter of record types, but some
implementations make it easier and quicker than that.
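A rough Python sketch of that ring, for readers who have not seen one. The `Record` class, the helper names, and the DEPT/EMP record types are hypothetical and chosen only for illustration. In this version the owner is found purely by record type, by walking the ring; an implementation that also stores a direct owner pointer in each member would skip the walk.

```python
class Record:
    """One record in a set occurrence; prev/next close the ring."""
    def __init__(self, rec_type, data):
        self.rec_type = rec_type
        self.data = data
        self.next = self   # a lone record points at itself
        self.prev = self

def insert_after(anchor, new):
    """Link `new` into the ring immediately after `anchor`."""
    new.prev = anchor
    new.next = anchor.next
    anchor.next.prev = new
    anchor.next = new

def find_owner(start, owner_type):
    """Walk the ring from any record until the owner's record type appears."""
    rec = start
    while rec.rec_type != owner_type:
        rec = rec.next
        if rec is start:
            raise ValueError("ring has no owner record")
    return rec

def members(owner):
    """Yield member records in ring order, stopping back at the owner."""
    rec = owner.next
    while rec is not owner:
        yield rec
        rec = rec.next

# One owner record (DEPT) and two member records (EMP) in a single ring.
dept = Record("DEPT", "Sales")
tail = dept
for name in ("Ann", "Bob"):
    emp = Record("EMP", name)
    insert_after(tail, emp)
    tail = emp
```

Here `find_owner(tail, "DEPT")` walks from the last employee back around to the department record, and an owner with no members is simply a ring of one.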
Bob Badour - 03 Apr 2008 21:57 GMT
>>In his 1980 paper "DATA MODELS IN DATABASE MANAGEMENT", Codd points
>>out (under HISTORY OF DATA
[quoted text clipped - 10 lines]
> both recipients of the ACM Turing Award and they faced off in a famous
> 1970s debate about navigational vs. relational data access.

Indeed, and it was in that debate that Codd, himself, defined the
hierarchic and network data models to enable sensible comparisons with
the relational data model.

[snip]
Ken North - 02 Apr 2008 09:14 GMT
The CODASYL standard defined a network-model database. Conceptually it was
similar to an on-disk representation of a doubly-linked list, with an
additional pointer from a set member to its owner. So you could navigate
through sets by following a next, previous, or owner pointer.
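A small Python sketch of the three pointers, with heavy assumptions: the `store` dict stands in for the disk, integer record ids stand in for on-disk addresses, and the function names and customer/order data are made up for illustration. It also assumes the chain closes back on the owner, which is one common layout rather than something the standard is being quoted on here.

```python
# Hypothetical stand-in for the disk: record id -> record fields.
store = {}

def add_owner(rid, data):
    """Create an owner whose chain is empty (it points at itself)."""
    store[rid] = {"data": data, "next": rid, "prev": rid, "owner": rid}

def add_member(rid, data, owner_rid):
    """Append a member at the end of the owner's chain."""
    owner = store[owner_rid]
    last_rid = owner["prev"]
    store[rid] = {"data": data, "next": owner_rid,
                  "prev": last_rid, "owner": owner_rid}
    store[last_rid]["next"] = rid
    owner["prev"] = rid

def members_of(owner_rid):
    """Follow next pointers from the owner until the chain returns to it."""
    out = []
    rid = store[owner_rid]["next"]
    while rid != owner_rid:
        out.append(store[rid]["data"])
        rid = store[rid]["next"]
    return out

add_owner(1, "Customer: Acme")
add_member(2, "Order 1001", owner_rid=1)
add_member(3, "Order 1002", owner_rid=1)
```

From record 3, `store[3]["owner"]` jumps straight to the customer and `store[3]["prev"]` steps back to the previous order. Because every hop is a stored id, a single bad id in `next` or `prev` breaks navigation for the whole set.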

CODASYL databases put a premium on database design because there was
nothing comparable to ALTER TABLE. Best practice called for diligence in
determining what the contents of sets were and the data (type,
precision). The reason was that making changes was arduous.

For example, if you had a database with name and address information, such
as subscribers and customers, and you decided to add a second telephone
number:

1. You recompiled your schema and sub-schemas (embedded in application
source code)
2. Then you had to reload the entire database (often not the best way to
spend a weekend).

CODASYL-type databases were fast but inflexible, and broken pointers
could cause anomalies on updates and insertions.

The group that defined the CODASYL standard did a survey of existing
database systems in the early phase of its work. Here's what they
reported in that 1968 survey:

http://www.sqlsummit.com/PDF/DatabaseSurvey_CODASYL_1968.pdf

> Hello,
>
[quoted text clipped - 6 lines]
> old
> CODASYL-like databases resulted in?
 