Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / February 2008

Tip: Looking for answers? Try searching our database.

A research effort on a computing model...

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Cimode - 07 Feb 2008 23:01 GMT
On the last years, I have been putting some modest but consistent
effort to build a computing model whose principles, I embodied under
the form of a db core that allows relation definition and manipulation
as well as basic relational operations.

As a part of a genuine research effort, I hope this computing model
and subsequent db core
to be more faithful to core relational principles than what we have
get accustomed to with
direct image systems.  Because I am getting closer to draw out a first
stable version of
it (probably end of 2008) that I plan on making open source (a mix of
assembly and C), I
would like to share some of the aspects of this model to have some
feedback and eventually
correct some mistakes I may have mislooked or underestimated.

I always felt frustrated that we are using systems that are not
delivering the tenth of what RM promesses so here are some
achievements/aspects (I let you decide) of how this db core may get us
one step closer to what could be a TRDBMS. A a note, I have made which
may seem *unconventional* but nevertheless have serious fundamental
justifications I would be happy to
discuss or amend if I am wrong.

The computing model has the following features and characteristics:

About the computing model and core fundamental concepts...

> The computing model and subsequent db core exclusively respects binary logic.
> The model and implemented system has 3 layers of abstraction: the media layer(that represents the methodology and protocols used to presents information to the user), the
logical and the physical layer.
> The subsequent core uses new algorhytms that allow, based on a specific logical representation, to bind relational concepts to rules of probability to set theory to represents sets in a way I have never observed elsewhere.  The immediate consequence of such relationship is to allow the minimization the number of logical reads and writes at run time but not only. The systems also allows to exploit in an interesting way properties such as symmetry or commutativity.
> Relation definition is done exclusively through domain definition and constraints addition/update on attributes.
> Basic inter relation operations are supported.
> The notion of relation is equated with the notion of type.  A new relation *de facto* constitutes a new type that may be reused in other relation definitions.
> The system supports subtyping and makes more practical relation decomposition.  It makes
use of the only useful concept in OO: inheritance

About a possible language and opportunities of presented by the
sublanguage

> The SQL SELECT verb is not used.  Only WITH is emphasized to express conditionality
carrying over an atribute.
> The SQL INSERT/UPDATE/DELETE verbs are not used to the profit of more generic MAKE
associated with WHEN (in case of UPDATE) to modify the body of a
relation.  The purpose of
the WHEN verb is to reflect the dynamic nature of a specific relation.
> The concept of table in a SQL sense totally disappears. A relation has a runtime logical
representation that must explicitely be embodied as what is required
in the media layer.

For instance, I use the PRESENT2D verb and associate it to the
relation definition to represent data as a table.  To put it short,
the compiler I am currently working on allows to write

PRESENT2D -->states that the expected presentation layer is a
bidimensional table
[<RELATION NAME> WITH ATTRIBUTE1 = <ATTRIBUTE 1 DOMAIN VALUE>]

Example using the PART relation with PART: {number, name}

PRESENT2D [PART WITH NAME = 'screw']

By supposing, I would need to send by FTP the output of the PART
relation then I
would simply write:

SENDFTP('192.168.10.99:2050') [PART WITH NAME = 'screw']

But I can also write...

[MAKE PART = 'other' WHEN NAME NOT 'screw'] in case of UPDATE of PARTS
not being 'screw'
and...
[MAKE PART = 1, 'other'] in case of an INSERT of an additional row
and...
[MAKE PART = PART MINUS [PART WITH NAME = 'screw']] in case of trying
of a DELETE of all PARTS with name 'screw'

These are the basics but the grammatical and semantic separation
allows interesting
properties when expressing more complex operations.

On the logical side of things...
> Logical order is never a prerequisite in logical layer for operating relations.
> The concept of key does not exists as well as the concept of primary key in the traditional SQL sense.  The unique identifier is *de facto* a total set of attributes constituing the relation.
> The concept of foreign key does not exist as well for the same above reason.
> Based on the above principles, dupplicates as a result of an insert are impossible.  The
system automatically discards any dupplicate that may potentially be
the output of an
insert operation.
> Based on the above principles, dupplicates as a result of update are impossible.  The
system automatically discards any dupplicate that may potentially be
the output of an
update.

On the physical side of things...

> Physical order is irrelevant because of the nature of physical data organization.
> No indexing scheme of any kind is used.
> As a consequence of non direct image representation, the size of physical files may actually decrease as the number of tuples increases.
> Operations are exclusively done on disk.  No RAM caching of any sort is performed to operate relations.
> As far as response time is concerned, I have no PRESENT2D statement running above 5ms (7200 rpm), no matter how big the table (up to the limit of 64 bits memory registers).  Response time increase as the number of rows increases is quasi neglectable.

There are many many other things I would like to add but I will try
instead to put it as part of a future documentation.  Any comments
would be appreciated.  Just keep in mind that this is a single man's
research effort far from being perfect, I am aware of it.  So please
don't be too harsh.
Brian Selzer - 08 Feb 2008 01:00 GMT
> On the last years, I have been putting some modest but consistent
> effort to build a computing model whose principles, I embodied under
[quoted text clipped - 104 lines]
>> The concept of foreign key does not exist as well for the same above
>> reason.

Trivializing functional dependencies?  Eliminating inclusion dependencies?
Sounds to me like you'll have a big job selling that.

>> Based on the above principles, dupplicates as a result of an insert are
>> impossible.  The
[quoted text clipped - 26 lines]
> research effort far from being perfect, I am aware of it.  So please
> don't be too harsh.
Cimode - 08 Feb 2008 09:05 GMT
> "Cimode" <cim...@hotmail.com> wrote in message

> Trivializing functional dependencies?  Eliminating inclusion dependencies?
> Sounds to me like you'll have a big job selling that.
Thanks for the comments.
Quite frankly, I do not recall mentionning anything about dependencies
which are logical concepts that can be supported otherwise than by
using keys.  Instead of keys, a concept prior to RM, I prefered the
concept of unique identifier, type and un-ary type.  Just keep in mind
that this is not a SQL system.  Besides, I really am not trying to
sell anything (the db core is designed to be open source) but rather
as a way to explore new pathes and open up the ground for new ideas
and perspectives to emerge .

To give you an idea:

if you consider...

PART: NUMBER(integer), NAME(string) and CAR: MODEL(string),
PART(part) , PART in CAR has type PART , therefore *foreign key* is
implicit though use of user defined type.  I mentionned that a
relation *de facto* constitutes a type with limited number of values
that some other relation can draw from.  I somehow believe that the
concept was to use by Codd as a marketing effort to convince his IBM
audience

Hope this clarifies...
Brian Selzer - 08 Feb 2008 10:23 GMT
>> "Cimode" <cim...@hotmail.com> wrote in message
>
[quoted text clipped - 24 lines]
>
> Hope this clarifies...

It does not.

Keys are also a logical concept.  Keys imply join dependencies (including
functional dependencies), and due to the Principle of Full Normalization,
join dependencies can imply keys.

Also, you're assumption that foreign keys should be implicit is misguided.
Consider:

MachinesEmployeesAreTrainedOn {Employee, Machine}
   KEY{Employee, Machine};

MachinesEmployeesAreRunning {Employee, Machine}
   KEY{Employee, Machine}.

Without any foreign key, the two are unrelated: an employee can be running a
machine that he /is not/ trained on, an employee can be running a machine
that he /is/ trained on, or an employee can be trained on a machine that he
is not running.  (Or neither, assuming, of course, that there is an
Employees relation.)

With a foreign key, on the other hand, from MachinesEmployeesAreRunning to
MachinesEmployeesAreTrainedOn, an employee can only run a machine that
they're trained on.

Without the foreign key, each tuple in

MachinesEmployeesAreTrainedOn JOIN MachinesEmployeesAreRunning

represents an instance where an employee is running a machine that he is
trained on.

With the foreign key, on the other hand, each tuple in
MachinesEmployeesAreRunning represents an instance where an employee is
running a machine that he is trained on.
Cimode - 08 Feb 2008 11:54 GMT
[Snipped]

> Keys are also a logical concept.  Keys imply join dependencies (including
> functional dependencies), and due to the Principle of Full Normalization,
> join dependencies can imply keys.

Actually Keys are initially a physical concept in IBM mainframes that
Codd reused to clarify the relational model.  I do not believe it to
be such a determining factor.  On the other hand, domains are a much
more powerful concept to describe functionnal dependencies.

> Also, you're assumption that foreign keys should be implicit is misguided.
> Consider:
[quoted text clipped - 4 lines]
> MachinesEmployeesAreRunning {Employee, Machine}
>     KEY{Employee, Machine}.

I do not see any real contradiction, except on the imperative use of
KEY terminology to clarify JOIN dependency, using your example...

Assuming a M:N cardinality between Machines and Employee, associative
relations such as MachinesEmployeesAreTrainedOn and
MachinesEmployeesAreRunning are simply sub types of Machine_Employee
domain defined according to relation/type Employee and Machine.  In
other words, MachinesEmployeesAreTrainedOn and
MachinesEmployeesAreRunning relations are subsets of Machine_Employee
cartesian product.

> Without any foreign key, the two are unrelated: an employee can be running a
> machine that he /is not/ trained on, an employee can be running a machine
> that he /is/ trained on, or an employee can be trained on a machine that he
> is not running.  (Or neither, assuming, of course, that there is an
> Employees relation.)
I do see your point. Fair enough.  Suppose the following relations

EMPLOYEE
MACHINES
EMPLOYEE_MACHINE (representing the associative relation between
EMPLOYEE and MACHINES)

Nothing prevents from declaring MachinesEmployeesAreTrainedOn and
MachinesEmployeesAreRunning  as independet subsets of EMPLOYEE_MACHINE
or as subset of one another.  In the case they would not be
independent functional dependency can simply be expressed by adding
additional constraints on both MachinesEmployeesAreTrainedOn and
MachinesEmployeesAreRunning relation definition.  At definition time
we can simply write:

EMPLOYEE: fname(string), lname(string)
MACHINES: model(string), number(string)
EMPLOYEE_MACHINE: employee(employee), machine(machine)
MachinesEmployeesAreTrainedOn: employee_machine(employee_machine WITH
first set of constraints) and assuming Employee can run only machines
they have been trained on, simply define
MachinesEmployeesAreRunning:employee_machine(MachinesEmployeesAreTrainedOn)
and in this case MachinesEmployeesAreRunning is necessarily a subset
of MachinesEmployeesAreTrainedOn.  Functional dependencies is
preserved without having to use the KEY concept.

[Snipped]
> With the foreign key, on the other hand, each tuple in
> MachinesEmployeesAreRunning represents an instance where an employee is
> running a machine that he is trained on.
See above.

I believe mentionned that the unique identifier (equivalent of primary
key) necessarily constitutes a superkey and that I express the foreign
key concept implicitely through subtyping.
Brian Selzer - 08 Feb 2008 15:56 GMT
> [Snipped]
>>
[quoted text clipped - 6 lines]
> be such a determining factor.  On the other hand, domains are a much
> more powerful concept to describe functionnal dependencies.

What does it matter that the physical concept in IBM mainframes uses the
same term?  I don't think Codd's use was merely for clarification.  Keys and
foreign keys are fundamental.  I suggest you read Ronald Fagin's paper on
DKNF.

>> Also, you're assumption that foreign keys should be implicit is
>> misguided.
[quoted text clipped - 38 lines]
> MachinesEmployeesAreRunning relation definition.  At definition time
> we can simply write:

How far do you want to go?  Any juxtaposition of two attribute values could
refer to an individual in the Universe of Discourse.  Consider the case of a
relation that is in 3NF but not in BCNF due to overlapping keys.  If A, B,
and C are prime attributes in a relation,
   {A, B, C, D} with keys {A, B}, {B, C} and {A, C},
would you have a separate relation for A*B, a separate relation for B*C, and
a separate relation for A*C so that the result would be {AB, BC, AC, D}?
What would be the point?  Why not just use IDENTITY fields.

> EMPLOYEE: fname(string), lname(string)
> MACHINES: model(string), number(string)
[quoted text clipped - 16 lines]
> key) necessarily constitutes a superkey and that I express the foreign
> key concept implicitely through subtyping.
Cimode - 08 Feb 2008 16:14 GMT
> > [Snipped]
>
[quoted text clipped - 11 lines]
> foreign keys are fundamental.  I suggest you read Ronald Fagin's paper on
> DKNF.
Done thanks.

I have not stated that keys were useless but simply that the concepts
they underline can be expressed differently.  Their importance in RM
seems overrated.

> >> Also, you're assumption that foreign keys should be implicit is
> >> misguided.
[quoted text clipped - 40 lines]
>
> How far do you want to go?
Quite frankly: as far as possible from current implementations.  I
have nevertheless answered your question based on an example.  As I
have mentionned in the header, some choices I made are not
conventional among which the fact that I do not place so much emphasis
on the concept of key.

> Any juxtaposition of two attribute values could
> refer to an individual in the Universe of Discourse.  Consider the case of a
[quoted text clipped - 3 lines]
> would you have a separate relation for A*B, a separate relation for B*C, and
> a separate relation for A*C so that the result would be {AB, BC, AC, D}?
The above reason and questions only applies when keys are central to
the problem.  When this is not the case, BCNF somehow becomes moot
concept.  The overlapping example is a specific problem I have not yet
adressed but it is in my top list for next year (as well as temporal
data).

> What would be the point?  Why not just use IDENTITY fields.
Why use them if we can simply use custom made typing and superkeys.

> > EMPLOYEE: fname(string), lname(string)
> > MACHINES: model(string), number(string)
[quoted text clipped - 16 lines]
> > key) necessarily constitutes a superkey and that I express the foreign
> > key concept implicitely through subtyping.
Brian Selzer - 08 Feb 2008 19:39 GMT
>> > [Snipped]
>>
[quoted text clipped - 90 lines]
>> What would be the point?  Why not just use IDENTITY fields.
> Why use them if we can simply use custom made typing and superkeys.

I brought that up because it appears that you're going to require that every
key be unary.  Whether you call it a unique identifier or whatever, there
are problems associated with forcing all keys to be unary, and I think that
before there can be general acceptance of your ideas, you will have to
address those issues thoroughly.  On the one hand, unique identifiers bear a
striking resemblance to object identifiers, and on the other hand, there are
issues, particulary due to redundancy, with nested relations.  How to you
propose to get around those?

>> > EMPLOYEE: fname(string), lname(string)
>> > MACHINES: model(string), number(string)
[quoted text clipped - 17 lines]
>> > key) necessarily constitutes a superkey and that I express the foreign
>> > key concept implicitely through subtyping.
Cimode - 08 Feb 2008 20:19 GMT
> >> "Cimode" <cim...@hotmail.com> wrote in message
>
[quoted text clipped - 100 lines]
> before there can be general acceptance of your ideas, you will have to
> address those issues thoroughly.
Noted.  I just have not mentionned that I was forcing use of un-ary
keys...The example provided of associative entity certainly is not an
un-ary key.

> On the one hand, unique identifiers bear a
> striking resemblance to object identifiers, and on the other hand, there are
> issues, particulary due to redundancy, with nested relations.  How to you
> propose to get around those?
What are these issues.  Providing an example I can attempt bringing an
anwser to may help.  As I mentionned, dupplicates are systematically
rejected.  They are inherently impossible. as a part of the logical
model for the system.
Brian Selzer - 09 Feb 2008 00:00 GMT
> "Cimode" <cim...@hotmail.com> wrote in message
>
[quoted text clipped - 113 lines]
> before there can be general acceptance of your ideas, you will have to
> address those issues thoroughly.
Noted.  I just have not mentionned that I was forcing use of un-ary
keys...The example provided of associative entity certainly is not an
un-ary key.

> On the one hand, unique identifiers bear a
> striking resemblance to object identifiers, and on the other hand, there
> are
> issues, particulary due to redundancy, with nested relations.  How to you
> propose to get around those?

(My newsreader is not quoting correctly again.  Sorry about this.)
<<<<<<<<<<<<<<<<<<<<<<<
What are these issues.  Providing an example I can attempt bringing an
anwser to may help.  As I mentionned, dupplicates are systematically
rejected.  They are inherently impossible. as a part of the logical
model for the system.

The problems with object identifiers have been discussed here time and time
again.  I don't think a recap would serve any purpose.  The problems with
nested relations are due to having the same information recorded in the
database several times, possibly in several relations.  This does not
necessarily mean that there are duplicates, as there can be in a SQL table.
A relation that is in 1NF but not in 2NF exhibits redundancy due to the fact
that there is unrelated information represented in a relation.  A relation
that is in 2NF but not in 3NF exhibits redundancy because there is
information that depends on a set of attributes that is not a key, and
therefore there is redundancy whenever more than one instance of that set of
attributes appears.  Similarly, when relations are nested, and a value that
appears both several tiers down one path of the hierarchy and a few tiers
down another, and that value is composed of several other values (subtrees,
if you will), the fact that that value appears several places can cause
problems, the most obvious being the need to update the value in several
places whenever the individual it represents changes or whenever any of the
components of that individual lower in the heirarchy changes.
Cimode - 09 Feb 2008 12:31 GMT
> > "Cimode" <cim...@hotmail.com> wrote in message
>
[quoted text clipped - 148 lines]
> places whenever the individual it represents changes or whenever any of the
> components of that individual lower in the heirarchy changes.

It seems we have a communication problem.  I do honnestly do not see
how you can come up to te conclusion that I am using object
identifiers as unique identifier based on the posted examples.  Let me
describe the process by which the system allows t identify tuples.

Once a specific candidate key is identified by designer as a valid
identifier, the system *internally* decomposes the relation into 2
relations as a single relation on which the candidate key  is stored
and a 2nd relation that implements a 1:0 cardinality to the first
relation.  I have never mentionned that the first relation to have an
un-ary unique identifier.  Here is an example:

suppose we have a PERSON relation : fname, lname, dob...suppose that
lname, dob constitues the unique identifer for PERSON...The system
internally (invisible) and logically decomposes PERSON as PERSON_PRIME
with lname,dob and implements unicity on lname and dob only then it
declare PERSON_SECOND with a reference to PERSON_PRIME and consider
lname, dob as a type of its own.  ON a designer perspective, the
system still allows to manipulate PERSON  as a relation only that it
internally considers it as a view.  In such example, redudancy is
impossible and missing data is dealt way internally by the system.

I do not see then how that could lead anyone to the conclusion that
there may be some kind of un-ary relation prerequisite of any kind.
Subtyping is a process by which one can easily implement a reference
without having to go through the hassle of per-attribute analysis.
The methods decomposition and subtyping have been clarified over and
over by Hugh Darwen in thirdmaifesto.com even though he did it for
the purpose of clarifying correct ways of hadling missing information.

Hope this clarifies...
Cimode - 08 Feb 2008 13:36 GMT
> On the last years, I have been putting some modest but consistent
> effort to build a computing model whose principles, I embodied under
[quoted text clipped - 108 lines]
> research effort far from being perfect, I am aware of it.  So please
> don't be too harsh.

Sorry typo...In case of an update

[MAKE PART = 'other' WHEN NAME NOT 'screw'] in case of UPDATE of
PARTS
should in fact read
[MAKE PART (NAME) = 'other' WHEN NAME NOT 'screw'] in case of UPDATE
of PARTS
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.