Database Forum / General DB Topics / DB Theory / February 2008
A research effort on a computing model...
|
|
Thread rating:  |
Cimode - 07 Feb 2008 23:01 GMT On the last years, I have been putting some modest but consistent effort to build a computing model whose principles, I embodied under the form of a db core that allows relation definition and manipulation as well as basic relational operations.
As a part of a genuine research effort, I hope this computing model and subsequent db core to be more faithful to core relational principles than what we have get accustomed to with direct image systems. Because I am getting closer to draw out a first stable version of it (probably end of 2008) that I plan on making open source (a mix of assembly and C), I would like to share some of the aspects of this model to have some feedback and eventually correct some mistakes I may have mislooked or underestimated.
I always felt frustrated that we are using systems that are not delivering the tenth of what RM promesses so here are some achievements/aspects (I let you decide) of how this db core may get us one step closer to what could be a TRDBMS. A a note, I have made which may seem *unconventional* but nevertheless have serious fundamental justifications I would be happy to discuss or amend if I am wrong.
The computing model has the following features and characteristics:
About the computing model and core fundamental concepts...
> The computing model and subsequent db core exclusively respects binary logic. > The model and implemented system has 3 layers of abstraction: the media layer(that represents the methodology and protocols used to presents information to the user), the logical and the physical layer.
> The subsequent core uses new algorhytms that allow, based on a specific logical representation, to bind relational concepts to rules of probability to set theory to represents sets in a way I have never observed elsewhere. The immediate consequence of such relationship is to allow the minimization the number of logical reads and writes at run time but not only. The systems also allows to exploit in an interesting way properties such as symmetry or commutativity. > Relation definition is done exclusively through domain definition and constraints addition/update on attributes. > Basic inter relation operations are supported. > The notion of relation is equated with the notion of type. A new relation *de facto* constitutes a new type that may be reused in other relation definitions. > The system supports subtyping and makes more practical relation decomposition. It makes use of the only useful concept in OO: inheritance
About a possible language and opportunities of presented by the sublanguage
> The SQL SELECT verb is not used. Only WITH is emphasized to express conditionality carrying over an atribute.
> The SQL INSERT/UPDATE/DELETE verbs are not used to the profit of more generic MAKE associated with WHEN (in case of UPDATE) to modify the body of a relation. The purpose of the WHEN verb is to reflect the dynamic nature of a specific relation.
> The concept of table in a SQL sense totally disappears. A relation has a runtime logical representation that must explicitely be embodied as what is required in the media layer.
For instance, I use the PRESENT2D verb and associate it to the relation definition to represent data as a table. To put it short, the compiler I am currently working on allows to write
PRESENT2D -->states that the expected presentation layer is a bidimensional table [<RELATION NAME> WITH ATTRIBUTE1 = <ATTRIBUTE 1 DOMAIN VALUE>]
Example using the PART relation with PART: {number, name}
PRESENT2D [PART WITH NAME = 'screw']
By supposing, I would need to send by FTP the output of the PART relation then I would simply write:
SENDFTP('192.168.10.99:2050') [PART WITH NAME = 'screw']
But I can also write...
[MAKE PART = 'other' WHEN NAME NOT 'screw'] in case of UPDATE of PARTS not being 'screw' and... [MAKE PART = 1, 'other'] in case of an INSERT of an additional row and... [MAKE PART = PART MINUS [PART WITH NAME = 'screw']] in case of trying of a DELETE of all PARTS with name 'screw'
These are the basics but the grammatical and semantic separation allows interesting properties when expressing more complex operations.
On the logical side of things...
> Logical order is never a prerequisite in logical layer for operating relations. > The concept of key does not exists as well as the concept of primary key in the traditional SQL sense. The unique identifier is *de facto* a total set of attributes constituing the relation. > The concept of foreign key does not exist as well for the same above reason. > Based on the above principles, dupplicates as a result of an insert are impossible. The system automatically discards any dupplicate that may potentially be the output of an insert operation.
> Based on the above principles, dupplicates as a result of update are impossible. The system automatically discards any dupplicate that may potentially be the output of an update.
On the physical side of things...
> Physical order is irrelevant because of the nature of physical data organization. > No indexing scheme of any kind is used. > As a consequence of non direct image representation, the size of physical files may actually decrease as the number of tuples increases. > Operations are exclusively done on disk. No RAM caching of any sort is performed to operate relations. > As far as response time is concerned, I have no PRESENT2D statement running above 5ms (7200 rpm), no matter how big the table (up to the limit of 64 bits memory registers). Response time increase as the number of rows increases is quasi neglectable. There are many many other things I would like to add but I will try instead to put it as part of a future documentation. Any comments would be appreciated. Just keep in mind that this is a single man's research effort far from being perfect, I am aware of it. So please don't be too harsh.
Brian Selzer - 08 Feb 2008 01:00 GMT > On the last years, I have been putting some modest but consistent > effort to build a computing model whose principles, I embodied under [quoted text clipped - 104 lines] >> The concept of foreign key does not exist as well for the same above >> reason. Trivializing functional dependencies? Eliminating inclusion dependencies? Sounds to me like you'll have a big job selling that.
>> Based on the above principles, dupplicates as a result of an insert are >> impossible. The [quoted text clipped - 26 lines] > research effort far from being perfect, I am aware of it. So please > don't be too harsh. Cimode - 08 Feb 2008 09:05 GMT > "Cimode" <cim...@hotmail.com> wrote in message
> Trivializing functional dependencies? Eliminating inclusion dependencies? > Sounds to me like you'll have a big job selling that. Thanks for the comments. Quite frankly, I do not recall mentionning anything about dependencies which are logical concepts that can be supported otherwise than by using keys. Instead of keys, a concept prior to RM, I prefered the concept of unique identifier, type and un-ary type. Just keep in mind that this is not a SQL system. Besides, I really am not trying to sell anything (the db core is designed to be open source) but rather as a way to explore new pathes and open up the ground for new ideas and perspectives to emerge .
To give you an idea:
if you consider...
PART: NUMBER(integer), NAME(string) and CAR: MODEL(string), PART(part) , PART in CAR has type PART , therefore *foreign key* is implicit though use of user defined type. I mentionned that a relation *de facto* constitutes a type with limited number of values that some other relation can draw from. I somehow believe that the concept was to use by Codd as a marketing effort to convince his IBM audience
Hope this clarifies...
Brian Selzer - 08 Feb 2008 10:23 GMT >> "Cimode" <cim...@hotmail.com> wrote in message > [quoted text clipped - 24 lines] > > Hope this clarifies... It does not.
Keys are also a logical concept. Keys imply join dependencies (including functional dependencies), and due to the Principle of Full Normalization, join dependencies can imply keys.
Also, you're assumption that foreign keys should be implicit is misguided. Consider:
MachinesEmployeesAreTrainedOn {Employee, Machine} KEY{Employee, Machine};
MachinesEmployeesAreRunning {Employee, Machine} KEY{Employee, Machine}.
Without any foreign key, the two are unrelated: an employee can be running a machine that he /is not/ trained on, an employee can be running a machine that he /is/ trained on, or an employee can be trained on a machine that he is not running. (Or neither, assuming, of course, that there is an Employees relation.)
With a foreign key, on the other hand, from MachinesEmployeesAreRunning to MachinesEmployeesAreTrainedOn, an employee can only run a machine that they're trained on.
Without the foreign key, each tuple in
MachinesEmployeesAreTrainedOn JOIN MachinesEmployeesAreRunning
represents an instance where an employee is running a machine that he is trained on.
With the foreign key, on the other hand, each tuple in MachinesEmployeesAreRunning represents an instance where an employee is running a machine that he is trained on.
Cimode - 08 Feb 2008 11:54 GMT [Snipped]
> Keys are also a logical concept. Keys imply join dependencies (including > functional dependencies), and due to the Principle of Full Normalization, > join dependencies can imply keys. Actually Keys are initially a physical concept in IBM mainframes that Codd reused to clarify the relational model. I do not believe it to be such a determining factor. On the other hand, domains are a much more powerful concept to describe functionnal dependencies.
> Also, you're assumption that foreign keys should be implicit is misguided. > Consider: [quoted text clipped - 4 lines] > MachinesEmployeesAreRunning {Employee, Machine} > KEY{Employee, Machine}. I do not see any real contradiction, except on the imperative use of KEY terminology to clarify JOIN dependency, using your example...
Assuming a M:N cardinality between Machines and Employee, associative relations such as MachinesEmployeesAreTrainedOn and MachinesEmployeesAreRunning are simply sub types of Machine_Employee domain defined according to relation/type Employee and Machine. In other words, MachinesEmployeesAreTrainedOn and MachinesEmployeesAreRunning relations are subsets of Machine_Employee cartesian product.
> Without any foreign key, the two are unrelated: an employee can be running a > machine that he /is not/ trained on, an employee can be running a machine > that he /is/ trained on, or an employee can be trained on a machine that he > is not running. (Or neither, assuming, of course, that there is an > Employees relation.) I do see your point. Fair enough. Suppose the following relations
EMPLOYEE MACHINES EMPLOYEE_MACHINE (representing the associative relation between EMPLOYEE and MACHINES)
Nothing prevents from declaring MachinesEmployeesAreTrainedOn and MachinesEmployeesAreRunning as independet subsets of EMPLOYEE_MACHINE or as subset of one another. In the case they would not be independent functional dependency can simply be expressed by adding additional constraints on both MachinesEmployeesAreTrainedOn and MachinesEmployeesAreRunning relation definition. At definition time we can simply write:
EMPLOYEE: fname(string), lname(string) MACHINES: model(string), number(string) EMPLOYEE_MACHINE: employee(employee), machine(machine) MachinesEmployeesAreTrainedOn: employee_machine(employee_machine WITH first set of constraints) and assuming Employee can run only machines they have been trained on, simply define MachinesEmployeesAreRunning:employee_machine(MachinesEmployeesAreTrainedOn) and in this case MachinesEmployeesAreRunning is necessarily a subset of MachinesEmployeesAreTrainedOn. Functional dependencies is preserved without having to use the KEY concept.
[Snipped]
> With the foreign key, on the other hand, each tuple in > MachinesEmployeesAreRunning represents an instance where an employee is > running a machine that he is trained on. See above.
I believe mentionned that the unique identifier (equivalent of primary key) necessarily constitutes a superkey and that I express the foreign key concept implicitely through subtyping.
Brian Selzer - 08 Feb 2008 15:56 GMT > [Snipped] >> [quoted text clipped - 6 lines] > be such a determining factor. On the other hand, domains are a much > more powerful concept to describe functionnal dependencies. What does it matter that the physical concept in IBM mainframes uses the same term? I don't think Codd's use was merely for clarification. Keys and foreign keys are fundamental. I suggest you read Ronald Fagin's paper on DKNF.
>> Also, you're assumption that foreign keys should be implicit is >> misguided. [quoted text clipped - 38 lines] > MachinesEmployeesAreRunning relation definition. At definition time > we can simply write: How far do you want to go? Any juxtaposition of two attribute values could refer to an individual in the Universe of Discourse. Consider the case of a relation that is in 3NF but not in BCNF due to overlapping keys. If A, B, and C are prime attributes in a relation, {A, B, C, D} with keys {A, B}, {B, C} and {A, C}, would you have a separate relation for A*B, a separate relation for B*C, and a separate relation for A*C so that the result would be {AB, BC, AC, D}? What would be the point? Why not just use IDENTITY fields.
> EMPLOYEE: fname(string), lname(string) > MACHINES: model(string), number(string) [quoted text clipped - 16 lines] > key) necessarily constitutes a superkey and that I express the foreign > key concept implicitely through subtyping. Cimode - 08 Feb 2008 16:14 GMT > > [Snipped] > [quoted text clipped - 11 lines] > foreign keys are fundamental. I suggest you read Ronald Fagin's paper on > DKNF. Done thanks.
I have not stated that keys were useless but simply that the concepts they underline can be expressed differently. Their importance in RM seems overrated.
> >> Also, you're assumption that foreign keys should be implicit is > >> misguided. [quoted text clipped - 40 lines] > > How far do you want to go? Quite frankly: as far as possible from current implementations. I have nevertheless answered your question based on an example. As I have mentionned in the header, some choices I made are not conventional among which the fact that I do not place so much emphasis on the concept of key.
> Any juxtaposition of two attribute values could > refer to an individual in the Universe of Discourse. Consider the case of a [quoted text clipped - 3 lines] > would you have a separate relation for A*B, a separate relation for B*C, and > a separate relation for A*C so that the result would be {AB, BC, AC, D}? The above reason and questions only applies when keys are central to the problem. When this is not the case, BCNF somehow becomes moot concept. The overlapping example is a specific problem I have not yet adressed but it is in my top list for next year (as well as temporal data).
> What would be the point? Why not just use IDENTITY fields. Why use them if we can simply use custom made typing and superkeys.
> > EMPLOYEE: fname(string), lname(string) > > MACHINES: model(string), number(string) [quoted text clipped - 16 lines] > > key) necessarily constitutes a superkey and that I express the foreign > > key concept implicitely through subtyping. Brian Selzer - 08 Feb 2008 19:39 GMT >> > [Snipped] >> [quoted text clipped - 90 lines] >> What would be the point? Why not just use IDENTITY fields. > Why use them if we can simply use custom made typing and superkeys. I brought that up because it appears that you're going to require that every key be unary. Whether you call it a unique identifier or whatever, there are problems associated with forcing all keys to be unary, and I think that before there can be general acceptance of your ideas, you will have to address those issues thoroughly. On the one hand, unique identifiers bear a striking resemblance to object identifiers, and on the other hand, there are issues, particulary due to redundancy, with nested relations. How to you propose to get around those?
>> > EMPLOYEE: fname(string), lname(string) >> > MACHINES: model(string), number(string) [quoted text clipped - 17 lines] >> > key) necessarily constitutes a superkey and that I express the foreign >> > key concept implicitely through subtyping. Cimode - 08 Feb 2008 20:19 GMT > >> "Cimode" <cim...@hotmail.com> wrote in message > [quoted text clipped - 100 lines] > before there can be general acceptance of your ideas, you will have to > address those issues thoroughly. Noted. I just have not mentionned that I was forcing use of un-ary keys...The example provided of associative entity certainly is not an un-ary key.
> On the one hand, unique identifiers bear a > striking resemblance to object identifiers, and on the other hand, there are > issues, particulary due to redundancy, with nested relations. How to you > propose to get around those? What are these issues. Providing an example I can attempt bringing an anwser to may help. As I mentionned, dupplicates are systematically rejected. They are inherently impossible. as a part of the logical model for the system.
Brian Selzer - 09 Feb 2008 00:00 GMT > "Cimode" <cim...@hotmail.com> wrote in message > [quoted text clipped - 113 lines] > before there can be general acceptance of your ideas, you will have to > address those issues thoroughly. Noted. I just have not mentionned that I was forcing use of un-ary keys...The example provided of associative entity certainly is not an un-ary key.
> On the one hand, unique identifiers bear a > striking resemblance to object identifiers, and on the other hand, there > are > issues, particulary due to redundancy, with nested relations. How to you > propose to get around those? (My newsreader is not quoting correctly again. Sorry about this.) <<<<<<<<<<<<<<<<<<<<<<< What are these issues. Providing an example I can attempt bringing an anwser to may help. As I mentionned, dupplicates are systematically rejected. They are inherently impossible. as a part of the logical model for the system.
The problems with object identifiers have been discussed here time and time again. I don't think a recap would serve any purpose. The problems with nested relations are due to having the same information recorded in the database several times, possibly in several relations. This does not necessarily mean that there are duplicates, as there can be in a SQL table. A relation that is in 1NF but not in 2NF exhibits redundancy due to the fact that there is unrelated information represented in a relation. A relation that is in 2NF but not in 3NF exhibits redundancy because there is information that depends on a set of attributes that is not a key, and therefore there is redundancy whenever more than one instance of that set of attributes appears. Similarly, when relations are nested, and a value that appears both several tiers down one path of the hierarchy and a few tiers down another, and that value is composed of several other values (subtrees, if you will), the fact that that value appears several places can cause problems, the most obvious being the need to update the value in several places whenever the individual it represents changes or whenever any of the components of that individual lower in the heirarchy changes.
Cimode - 09 Feb 2008 12:31 GMT > > "Cimode" <cim...@hotmail.com> wrote in message > [quoted text clipped - 148 lines] > places whenever the individual it represents changes or whenever any of the > components of that individual lower in the heirarchy changes. It seems we have a communication problem. I do honnestly do not see how you can come up to te conclusion that I am using object identifiers as unique identifier based on the posted examples. Let me describe the process by which the system allows t identify tuples.
Once a specific candidate key is identified by designer as a valid identifier, the system *internally* decomposes the relation into 2 relations as a single relation on which the candidate key is stored and a 2nd relation that implements a 1:0 cardinality to the first relation. I have never mentionned that the first relation to have an un-ary unique identifier. Here is an example:
suppose we have a PERSON relation : fname, lname, dob...suppose that lname, dob constitues the unique identifer for PERSON...The system internally (invisible) and logically decomposes PERSON as PERSON_PRIME with lname,dob and implements unicity on lname and dob only then it declare PERSON_SECOND with a reference to PERSON_PRIME and consider lname, dob as a type of its own. ON a designer perspective, the system still allows to manipulate PERSON as a relation only that it internally considers it as a view. In such example, redudancy is impossible and missing data is dealt way internally by the system.
I do not see then how that could lead anyone to the conclusion that there may be some kind of un-ary relation prerequisite of any kind. Subtyping is a process by which one can easily implement a reference without having to go through the hassle of per-attribute analysis. The methods decomposition and subtyping have been clarified over and over by Hugh Darwen in thirdmaifesto.com even though he did it for the purpose of clarifying correct ways of hadling missing information.
Hope this clarifies...
Cimode - 08 Feb 2008 13:36 GMT > On the last years, I have been putting some modest but consistent > effort to build a computing model whose principles, I embodied under [quoted text clipped - 108 lines] > research effort far from being perfect, I am aware of it. So please > don't be too harsh. Sorry typo...In case of an update
[MAKE PART = 'other' WHEN NAME NOT 'screw'] in case of UPDATE of PARTS should in fact read [MAKE PART (NAME) = 'other' WHEN NAME NOT 'screw'] in case of UPDATE of PARTS
|
|
|