Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / January 2007

Tip: Looking for answers? Try searching our database.

RA with MV attributes

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
David - 16 Jan 2007 08:40 GMT
Here is a partial formalization of a relational algebra based on MV
attributes.  The approach appears simple and intuitive.  In particular
the join of two relations is rather elegant.

Definition:  An attribute a consists of an attribute-name N(a) and a
domain-name D(a).

Definition:  A relation-type is a set of attributes.   The attribute
names must all be unique.

Definition:  A relation r consists of a relation-type A(r) together
with a set of tuples T(r), where each tuple is a map from each
attribute a inA(r) to a subset of D(a).

Definition:  Let r be a relation and B be a subset of A(r).  Then
proj(r,B) is a relation r’ where A(r’) = B,  and for each t’ in
T(r’),   t’ is the restriction of t on B.

Definition:  Let r1,r2 be relations that are join compatible (ie where
any attributes of the same name also have the same domain).  We define
the join r = r1 |X| r2 as follows.

   A(r) = (A(r1) union A(r2))

   T(r) =
   { t |
        there exists t1 in T(r1) &
        there exists t2 in T(r2) &
        for each a in A(r1) \ A(r2),  t(a) = t1(a)  &
        for each a in A(r2) \ A(r1),  t(a) = t2(a)  &
        for each a in (A(r1) intersect A(r2)), t(a) = (t1(a) intersect
t2(a))
   }

Example

r1(Names,Cars)
   bill,              car1,car2,car4
   john,fred        car3

r2(Cars,Colours)
  car1,car3,car4    red
  car2              green

r1 |X| r2 (Names,Cars,Colours)
  bill              car1,car4     red
  bill              car2          green
  john,fred         car3          red
  john,fred                       green

This is actually an outer join.  An inner join requires removal of
tuples with empty intersections on the common attributes.

Some advantages of MV attributes

1.  Avoids redundancy.  This could be significant – say when there
are a large number of people that co-own a large number of cars.

2.  It deals rather elegantly with missing information.

and some significant disadvantages

1.  Each tuple only represents a *lower bound* on the known
relationships.   Therefore it is important to avoid making assumptions
about what is not true when looking at a given tuple.   For example the
last tuple of r1 |X| r2 above doesn’t imply that John and Fred
don’t own any cars.

2.  The simple relationship back to propositions and FOPL is lost.
Perhaps the meaning of a relation could be understood in terms of
underlying 6NF predicates.

3.  There is a non-uniqueness of representation because tuples can
group values in different ways yet achieve the same set-theoretic
result.

In the end I’m not sure whether MV attributes are a good idea or not.
Marshall - 16 Jan 2007 11:10 GMT
Interesting post. Some comments inline.

> Definition:  A relation r consists of a relation-type A(r) together
> with a set of tuples T(r), where each tuple is a map from each
> attribute a inA(r) to a subset of D(a).

I realize it's becoming fashionable, but I still dislike the idea
that a relation "consists of" its type and its value, both.
It's type is its type and its value is its value; it doesn't
make sense to me otherwise. If we consider the relation
as being a combination of two things, we ought to be
able to consider those two things separately. In which
case we can ask the question, what is the type of a
relation's value? Presumably that is also its type, and
so we have infinite regress.

Or consider how it sounds for a value of a non-parameterized
type: "An integer consists of a numeric-type int together with
an int." ???

Yet another reason to dislike it is that it combines in a single
construct both a static construct and a dynamic one. It is
entirely reasonable to consider a system where types exist
at compile time and do not exist at runtime, and values exist
at runtime and do not exist at compile time. So how could we
combine them?

Anyway, this is a diversion from your main point.

>     T(r) =
>     { t |
[quoted text clipped - 5 lines]
> t2(a))
>     }

Wow. Crazy.

> This is actually an outer join.

It's actually a sort of full outer join, yes?

I'm not sure why you'd want to *replace* regular natural join with
this operator, assuming we consider it useful. It seems more like
an add-on.

 An inner join requires removal of
> tuples with empty intersections on the common attributes.
>
> Some advantages of MV attributes
>
> 1.  Avoids redundancy.  This could be significant – say when there
> are a large number of people that co-own a large number of cars.

I don't see how this avoids redundancy.

> 2.  It deals rather elegantly with missing information.

I don't get this point either.

> and some significant disadvantages
>
[quoted text clipped - 11 lines]
> group values in different ways yet achieve the same set-theoretic
> result.

2 seems like a deal-breaker, doesn't it? Unless we can use something
other than FOPL to assign meaning to terms.

> In the end I’m not sure whether MV attributes are a good idea or not.

Well, I'm not entirely sure whether your analysis was of MV or of this
operator you came up with. I still see modest benefit to nested
relations,
for example. And they seem to work fine without any additional
operators
besides the usual ones, although I haven't analyzed this all that much.

Marshall
David - 16 Jan 2007 15:36 GMT
> Interesting post. Some comments inline.
>
[quoted text clipped - 24 lines]
>
> Anyway, this is a diversion from your main point.

I agree with your points.   Note that we can't define a join on two
relation values without access to their type information.   A weakly
typed language that allows for joins on two relations would be forced
to explicitly store the type together with the relation-value as part
of the run-time state.   As you suggest above, a strongly typed
language allows values to be stored without explicit run time type
information.

I regard the combination of type and value into a single construct as a
convenience for the exposition.

> >     T(r) =
> >     { t |
[quoted text clipped - 11 lines]
>
> It's actually a sort of full outer join, yes?

Yes.

> I'm not sure why you'd want to *replace* regular natural join with
> this operator, assuming we consider it useful. It seems more like
> an add-on.

It is interesting that a full outer join has a simpler definition than
a natural join.

>   An inner join requires removal of
> > tuples with empty intersections on the common attributes.
[quoted text clipped - 5 lines]
>
> I don't see how this avoids redundancy.

MV attributes allow for storing the same relation with less stored
values.  I interpret that as meaning it can avoid redundancy.

Consider a single tuple with Ni values for the ith attribute.   The
number of values stored by the tuple is sum(Ni).   Without MV
attributes the number of values stored is numAttribs x product(Ni)
which could be very large in pathological cases.

> > 2.  It deals rather elegantly with missing information.
>
> I don't get this point either.

Subject to integrity constraints, a tuple can map an attribute to an
empty set.

> > and some significant disadvantages
> >
[quoted text clipped - 14 lines]
> 2 seems like a deal-breaker, doesn't it? Unless we can use something
> other than FOPL to assign meaning to terms.

I wonder whether a complete and elegant RA is enough to make MV
attributes respectable.  Furthermore I have no trouble seeing how it
relates back to 6NF predicates.   I expect such a mapping could be
formalised.

> > In the end I’m not sure whether MV attributes are a good idea or not.
>
[quoted text clipped - 4 lines]
> operators
> besides the usual ones, although I haven't analyzed this all that much.

Another interesting point of comparison relates to updates.  I can see
pros and cons of MV attributes.

I imagine a system using MV attributes should support cardinality
integrity constraints on each attribute.   For example,

   person(Person, Parents)     where Person: 1..1  and  Parents: 0..2

I expect referential integrity constraints will be easy to formalise.
David - 17 Jan 2007 00:48 GMT
> > Interesting post. Some comments inline.
> >
[quoted text clipped - 35 lines]
> I regard the combination of type and value into a single construct as a
> convenience for the exposition.

After thinking about it some more I've changed my mind.  The join
operation needs to *calculate* the "type" of the result.  Therefore
it seems better to regard the "type information" as part of the
relation value.

To avoid the infinite regress you should say
1.   A relation value r comprises both A(r) and T(r).
2.   The *type* of all relation values is the same:  ie "relation"
3.   T(r) on its own is not a relation value
4.   The type of T(r) is "set of tuples", not "relation"

BTW saying "r is a relation value" is naff.  Really should say "r
is a relation".

I presume in a proper general purpose system strong typing of relvars
won't work.  Having to specify the type of relvars would be painful.

It would also disallow conditional projection, such as

  relation r = test ? r1 :  proj(r1,B)

...not that I have an example of where that would be useful.
Marshall - 17 Jan 2007 01:28 GMT
> > > Interesting post. Some comments inline.
>
[quoted text clipped - 40 lines]
> it seems better to regard the "type information" as part of the
> relation value.

The calculation of the result type can be performed knowing only
the operand *types* and not the operand values. Yes, this could
be deferred to runtime, but it doesn't have to be, and if it is, the
same calculation will be done, with the same result, at every run,
whereas the calculation could be performed just once, statically.

Furthermore you will find that this property holds true of many
functions over generic types. Consider a simple generic list
type supporting a "get first element" operation. The type of
the generic list is parameterized by the element type T.
The type of getFirstElement() is int -> T. Hence the getFirstElement
operation could also be said to "calculate the type" but if you
code this up in, say, C++, the type calculating occurs entirely
at compile time.

> To avoid the infinite regress you should say
> 1.   A relation value r comprises both A(r) and T(r).
[quoted text clipped - 4 lines]
> BTW saying "r is a relation value" is naff.  Really should say "r
> is a relation".

That doesn't work for me. In 2, your proposal only considers
abstract, parameterized type "relation" and doesn't consider
concrete relation types. In 4, you draw a distinction between
a set of tuples and a relation, but those are equivalent as far
as I know. Can you explain the distinction?

> I presume in a proper general purpose system strong typing of relvars
> won't work.  Having to specify the type of relvars would be painful.

Well, the term "strong typing" is fraught with peril, but assuming
you mean "static typing" then I don't see why you think this
would be an issue. Consider that a CREATE TABLE statement
specifies the type of relvars.

> It would also disallow conditional projection, such as
>
>    relation r = test ? r1 :  proj(r1,B)
>
> ...not that I have an example of where that would be useful.

I think you will find that *every* language with a ? : operator
does not allow the two possible results to have incompatible types.

Marshall
Bob Badour - 17 Jan 2007 02:23 GMT
>>>>Interesting post. Some comments inline.
>>
[quoted text clipped - 87 lines]
> I think you will find that *every* language with a ? : operator
> does not allow the two possible results to have incompatible types.

How does javascript evaluate the following?

o.left = oX + ( ! fNS4 ? "px" : 0 )

I have seen that a lot recently. The interesting thing--if I read it
correctly--is the return type of the conditional fundamentally alters
the preceding operation. ie. string concatenation vs. numeric addition

(Not that javascript is anywhere near as interesting as your discussion
about static typing, though--sorry to interject.)
Marshall - 17 Jan 2007 08:17 GMT
> > I think you will find that *every* language with a ? : operator
> > does not allow the two possible results to have incompatible types.
[quoted text clipped - 5 lines]
> correctly--is the return type of the conditional fundamentally alters
> the preceding operation. ie. string concatenation vs. numeric addition

Yeah; I probably should have said "every *statically typed* language".
I don't really do dynamically typed languages; they are too wiggly for
my taste. But they are certainly interesting to compare with static
typing, especially since the former are inherently more able to express
computations that are safe but can't be proven to be safe, and the
latter are inherently more able to express constraints.

A related issue I find quite compelling is whether a single language
could be designed that would incorporate the best of both, leaving
the decision of which one to use in the hands of the programmer,
to be made on a case-by-case basis, rather than in the hands of
the language designer, who necessarily decides these things for
all programmers using his language.

Marshall
Gene Wirchenko - 17 Jan 2007 17:32 GMT
[snip]

>A related issue I find quite compelling is whether a single language
>could be designed that would incorporate the best of both, leaving
>the decision of which one to use in the hands of the programmer,
>to be made on a case-by-case basis, rather than in the hands of
>the language designer, who necessarily decides these things for
>all programmers using his language.

    Like VB 6?  You can declare variables as being of one type (say,
Integer) or any (Variant).

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Marshall - 18 Jan 2007 05:28 GMT
> "Marshall" <marshall.spi...@gmail.com> wrote:[snip]
>
[quoted text clipped - 7 lines]
> Like VB 6?  You can declare variables as being of one type (say,
> Integer) or any (Variant).

Sure, that's a start. However what I'm starting to imagine is something
that AFAIK is not in any existent language. The idea is that a given
module could be capable of very strict static checking, such as proving
the absence of infinite loops, faults such as divide by zero, and
freedom
from stack overflows through a proof of maximum resource usage.
Another module could be completely free from such requirements,
fully dynamic, having all checking deferred until runtime. All in the
same language; the choice of where in the continuum you are made
by the programmer based on the needs of the current application.

Marshall
Gene Wirchenko - 18 Jan 2007 06:32 GMT
>> "Marshall" <marshall.spi...@gmail.com> wrote:[snip]
>>
[quoted text clipped - 18 lines]
>same language; the choice of where in the continuum you are made
>by the programmer based on the needs of the current application.

    I think that a general solution would require a solution to the
halting problem.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Marshall - 18 Jan 2007 07:51 GMT
> >> Like VB 6?  You can declare variables as being of one type (say,
> >> Integer) or any (Variant).
[quoted text clipped - 12 lines]
> I think that a general solution would require a solution to the
> halting problem.

No, no, I'm not proposing writing a program that tells whether another
program halts or not; that is of course impossible. However it
is certainly possible to write code that can correctly recognize some
programs as halting, some programs as diverging, and for other
programs it cannot tell. There are certain classes of applications
for which a proof of termination is highly desirable, and some
of the people building these systems already use termination
proofs.

If one were writing such an application, which has such very
strict requirements, one might well want a language that
will issue a compile error unless it has a termination proof of
the program, whether it is human supplied or automatically
generated, or a mix.

Marshall
Gene Wirchenko - 18 Jan 2007 18:13 GMT
>> >> Like VB 6?  You can declare variables as being of one type (say,
>> >> Integer) or any (Variant).
[quoted text clipped - 16 lines]
>program halts or not; that is of course impossible. However it
>is certainly possible to write code that can correctly recognize some
                                                                 ^^^^
    Magic Weasel Word!

>programs as halting, some programs as diverging, and for other
>programs it cannot tell. There are certain classes of applications
>for which a proof of termination is highly desirable, and some
>of the people building these systems already use termination
>proofs.

    The magic weasel word is "some".  You can also write some code
that will correctly recognise whether some pieces of code will ever,
say, generate an integer overflow.

    If you can not tell if said program will ever terminate, it will
not be possible, in general, to tell if it will ever generate an
integer overflow.

>If one were writing such an application, which has such very
>strict requirements, one might well want a language that
>will issue a compile error unless it has a termination proof of
>the program, whether it is human supplied or automatically
>generated, or a mix.

    That will come at the cost of possibly throwing out a program
that works, but that the proof system can not prove as working.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Marshall - 18 Jan 2007 20:54 GMT
> >If one were writing such an application, which has such very
> >strict requirements, one might well want a language that
> >will issue a compile error unless it has a termination proof of
> >the program, whether it is human supplied or automatically
> >generated, or a mix.

> That will come at the cost of possibly throwing out a program
> that works, but that the proof system can not prove as working.

Yes, absolutely. Which for some applications (not many) is
not merely worth it but mandatory. There is software that
lives depend on.

On the other hand, there are software systems where they
don't even want to be bothered with a compile step. "The
compiler just gets in my way" they say. These two represent
the endpoints in a continuum.

Right now we have disparate systems that satisfy both
groups. But those systems don't look anything like each
other, and their position on the correctness-proof
continuum is fixed. The question is, can we accomodate
both types of users in a single system? I think it's
possible, or at least approachable. Or at the very
very least, an interesting question to think about.

Marshall
Gene Wirchenko - 18 Jan 2007 22:34 GMT
>> >If one were writing such an application, which has such very
>> >strict requirements, one might well want a language that
[quoted text clipped - 8 lines]
>not merely worth it but mandatory. There is software that
>lives depend on.

    Thank you.  I was concerned that you were not seeing this, and it
is important (at least to me).

>On the other hand, there are software systems where they
>don't even want to be bothered with a compile step. "The
>compiler just gets in my way" they say. These two represent
>the endpoints in a continuum.

    Truly.

>Right now we have disparate systems that satisfy both
>groups. But those systems don't look anything like each
[quoted text clipped - 3 lines]
>possible, or at least approachable. Or at the very
>very least, an interesting question to think about.

    I think it is possible.

    If I want something quick and dirty, I need not do anything to my
code.  I compile with "accept everything".

    If I want severe checking, then I set flags appropriately and
possibly have to add to my code to indicate special treatment.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Bob Badour - 20 Jan 2007 10:29 GMT
> Right now we have disparate systems that satisfy both
> groups. But those systems don't look anything like each
[quoted text clipped - 3 lines]
> possible, or at least approachable. Or at the very
> very least, an interesting question to think about.

Be careful what you wish for or you will end up with ADA.
Marshall - 20 Jan 2007 14:41 GMT
> > Right now we have disparate systems that satisfy both
> > groups. But those systems don't look anything like each
[quoted text clipped - 5 lines]
>
> Be careful what you wish for or you will end up with ADA.

Ha! It is a fair point.

Marshall
Marshall - 18 Jan 2007 22:19 GMT
> If you can not tell if said program will ever terminate, it will
> not be possible, in general, to tell if it will ever generate an
> integer overflow.

I don't think this is correct. At least, I am pretty sure that
it is possible to generate a proof of overflow-freedom
without a proof of termination. However you said
"in general" and, as you pointed out earlier in so
many words , it is not possible to decide any
of these qualities *in general.* In fact, if I understand
Rice's theorem, *all* nontrivial properties of code
are undecidable. This is far from meaning that
termination analysis is valueless, however; on
the contrary.

Note that there are many useful systems in which
it is not possible to express algorithms that do not
halt. These systems are of course not Turing complete,
but that doesn't mean they are not useful. If I am not
mistaken propositional calculus is decidable. Note also
that join, union, project, etc. all have the property of
always terminating when applied to finite relations.

Marshall
Chris Smith - 21 Jan 2007 05:26 GMT
> In fact, if I understand
> Rice's theorem, *all* nontrivial properties of code
> are undecidable.

I'll be the picky person who points out that Rice's Theorem says all
non-trivial properties of the (languages/problems/functions) that are
(accepted/solved/computed) by code are undecidable.  Clearly, the
question "does this program use 'foo' as an identifier?" is both non-
trivial and decidable.

But yes, I agree with what you are ultimately saying.  I'm being picky
for the following reason.  I would say that the distinction of
properties of code versus its language/problem/function is relevant
precisely because it explains that some kinds of analysis are possible
in general even for a Turing-complete language, but they will just never
correspond exactly to a property of the language.  What can happen is
that, for example, the failure of analysis to prove correctness doesn't
NECESSARILY mean that a program is incorrect, but is a pretty damn good
indicator in real life such that it would be a remarkable coincidence if
the program actually were still correct.

Signature

Chris Smith

Bob Badour - 20 Jan 2007 10:30 GMT
>>"Marshall" <marshall.spi...@gmail.com> wrote:[snip]
>>
[quoted text clipped - 18 lines]
> same language; the choice of where in the continuum you are made
> by the programmer based on the needs of the current application.

What is the point of establishing an upper bound on resource usage for a
module if one cannot guarantee that the resource will be available?
David - 17 Jan 2007 02:51 GMT
> > > > Interesting post. Some comments inline.
> >
[quoted text clipped - 68 lines]
> abstract, parameterized type "relation" and doesn't consider
> concrete relation types.

1 & 2 are saying *by definition* there is a type called "relation".
It is neither abstract nor parameterised.

Why can't we say that *by definition* the set of attributes of a
relation is part of its state not its type?

Here's an analogy.  A C++ library provides a matrix class that is not
parameterised on the number of rows and columns.   You can write this

   void foo(int n,int m)
   {
       Matrix A(n,3), B(3,m);
       Matrix C = A * B;       // Size = n x m
   }

The non-parameterised type also allows for mutative operations that add
or remove rows and columns.

> In 4, you draw a distinction between
> a set of tuples and a relation, but those are equivalent as far
> as I know. Can you explain the distinction?

4 follows from 1 which is a definition of the type "relation".   My
point is that it is self consistent.

Consider this:  In mathematics a function is defined on a particular
domain and codomain.   Restrict the domain and you have a *different*
function.  Otherwise we can't make statements about whether a given
function is 1:1 or onto.   This suggests that domain and codomain
should be regarded as *properties* of a given function.   A similar
thing seems to be true for a relation - it is more than its set of
tuples.   Note in any case that I defined a tuple to be a mapping on
A(r), so therefore you can't really disconnect the tuple from the set
of attributes anyway.

> > I presume in a proper general purpose system strong typing of relvars
> > won't work.  Having to specify the type of relvars would be painful.
[quoted text clipped - 12 lines]
> I think you will find that *every* language with a ? : operator
> does not allow the two possible results to have incompatible types.

That's my point.   That's why it seems better to have a
non-parameterized type called "relation".
paul c - 17 Jan 2007 03:34 GMT
> ...
> Why can't we say that *by definition* the set of attributes of a
> relation is part of its state not its type?
> ...

We can say anything we want "by definition", the question is where does
saying it get us?  What conclusions does it lead to and so forth.  I
like your suggestions because they seem to have something behind them,
but don't ask me what, exactly.  I wish you would distinguish whether
you are talking about physical implementations or user concepts.

p
David - 17 Jan 2007 04:02 GMT
> > ...
> > Why can't we say that *by definition* the set of attributes of a
[quoted text clipped - 3 lines]
> We can say anything we want "by definition", the question is where does
> saying it get us?  What conclusions does it lead to and so forth.

Marshall doesn't seem prepared to accept this definition on it face,
because he has assumed that the set of attributes of a relation is
surely its type.

> I
> like your suggestions because they seem to have something behind them,
> but don't ask me what, exactly.  I wish you would distinguish whether
> you are talking about physical implementations or user concepts.

I guess this relates to the following discussion

http://www.ai.mit.edu/docs/articles/good-news/subsection3.2.1.html

I tend to think that both implementation and interface should be
reasonably simple, and it's bad to strongly favor one at the expense of
the other.   I often look at what the implementation is telling me in
order to design a reasonable interface.   There is nothing better than
getting your cake and eating it too.
paul c - 17 Jan 2007 14:58 GMT
>>>...
>>>Why can't we say that *by definition* the set of attributes of a
[quoted text clipped - 22 lines]
> order to design a reasonable interface.   There is nothing better than
> getting your cake and eating it too.

That's a bit too blurry for me.  What is simple as compared to what is
too simple depends very much on the objective, as I think somebody said
"as simple as possible, but no simpler!".

I do think that once one has a definition it is important to proceed to
positioning it in the scheme of things, for example is this a physical
approach for storing two relations in one or is it an attempt to promote
three-valued logic or is it something else?  I find this thread
interesting because I think rva's have not been explored very thoroughly
(perhaps they should be discounted eventually, but I don't think anybody
has yet made a good case, or at least one I could understand, for this,
yet).

The "advantage" of avoiding redundancy in general is also a blurred one
for me. If the domain we're interested in is what cars certain people
own, I don't find it all redundant that two people may happen to own the
same kind of car.  Whereas avoiding physical redundancy as far as
computer machinery is concerned is a very worthy goal.

p
David - 18 Jan 2007 05:28 GMT
> >>>...
> >>>Why can't we say that *by definition* the set of attributes of a
[quoted text clipped - 26 lines]
> too simple depends very much on the objective, as I think somebody said
> "as simple as possible, but no simpler!".

IIRC that was Einstein

> I do think that once one has a definition it is important to proceed to
> positioning it in the scheme of things, for example is this a physical
> approach for storing two relations in one or is it an attempt to promote
> three-valued logic or is it something else?

I'm interested in both its logical and physical aspects.  It seems
best initially to focus mainly on the logical.

I wouldn't call it a 3vl.

> I find this thread
> interesting because I think rva's have not been explored very thoroughly
> (perhaps they should be discounted eventually, but I don't think anybody
> has yet made a good case, or at least one I could understand, for this,
> yet).

What is rva?  Did you mean mva?

> The "advantage" of avoiding redundancy in general is also a blurred one
> for me. If the domain we're interested in is what cars certain people
> own, I don't find it all redundant that two people may happen to own the
> same kind of car.  Whereas avoiding physical redundancy as far as
> computer machinery is concerned is a very worthy goal.

It seems worth thinking about redundancy from the logical point of view
- in terms of what it means for updates and schema changes.   There
are pros and cons of MV attributes that require analysis.

For example, adding an MV attribute to an existing relation is trivial
- if every existing tuple maps the new attribute to the empty set
then the addition of the attribute hasn't changed the information in
the DB and it won't affect any existing queries.
paul c - 19 Jan 2007 11:01 GMT
> ...
>>I do think that once one has a definition it is important to proceed to
[quoted text clipped - 5 lines]
> best initially to focus mainly on the logical.
> ...

I could entertain this if the motive were to formalize a physical
storage scheme.  But I don't see anything logical about presenting to a
user the inference that fred and bill own an empty set of cars that are
green, especially if the user could then project away the Cars attribute
and conclude that they do own a green car or cars.  (That seems like
using the empty set to achieve Codd's connection trap.)

p
David - 20 Jan 2007 00:12 GMT
> > ...
> >>I do think that once one has a definition it is important to proceed to
[quoted text clipped - 10 lines]
> user the inference that fred and bill own an empty set of cars that are
> green

That's an incorrect inference.  That's not what a tuple means.  If you
have the tuple

   Names = {Fred, Bill}, Cars = {},  CarColour = {green}

Then you are meant to read out propositions by taking the cross product
of the sets.   The cross product is of course empty.

> especially if the user could then project away the Cars attribute
> and conclude that they do own a green car or cars.  (That seems like
> using the empty set to achieve Codd's connection trap.)

You must perform selection (to ensure there really is a car) before the
projection.  This is logical and AFAIK everything is self-consistent.
There are some "rules to the game".  Don't say it's illogical because
you're not playing the game.

If you say you don't find it particularly intuitive or natural I
somewhat agree with you.
Marshall - 20 Jan 2007 00:46 GMT
> That's an incorrect inference.  That's not what a tuple means.  If you
> have the tuple
[quoted text clipped - 3 lines]
> Then you are meant to read out propositions by taking the cross product
> of the sets.   The cross product is of course empty.

"meant to read out propositions by taking the cross product?" Says who?

Anyway, that doesn't make a lot of sense. It introduces a new
discipline, tuple normalization, because the tuple

 Names = {Fred, Bill}, Cars = {}

has the same information content as

 Names = {}, Cars = {}

Why does having RVAs mean I have to introduce this new
semantics for tuples? What was wrong with the old one?

Marshall
David - 20 Jan 2007 14:25 GMT
> > That's an incorrect inference.  That's not what a tuple means.  If you
> > have the tuple
[quoted text clipped - 5 lines]
>
> "meant to read out propositions by taking the cross product?" Says who?

Please regard it as a definition.  In my original post I briefly
mentioned the idea that a set of tuples with MV attributes could be
mapped to a set of 6NF propositions (with single valued attributes).
I had this cross product idea in mind to formalise that.

Consider a relation on the attributes {Name,Car,CarColour}.  Towards
formalising the meaning of the relation it is assumed there are a
number of associated predicates.  In this case there are 5

   person(Name)
   car(Car)
   colour(Colour)
   owns(Name,Car)
   hascolour(Car,Colour)

Each predicate is associated with some subset of the attributes of the
relation.  For each tuple we generate a set of propositions by taking
cross products.   The "meaning" of a relation is defined to be the
union of all the propositions over all tuples in the relation.

For example, the following tuple

   Names = {fred, bill}, Cars = {car1, car2},  CarColour = {red}

would give the following propositions

   person(fred).
   person(bill).
   car(car1).
   car(car2).
   colour(red).
   owns(fred,car1).
   owns(fred,car2).
   owns(bill,car1).
   owns(bill,car2).
   hascolour(car1,red).
   hascolour(car2,red).

However, it is also interesting to consider a derived predicate

 ownscarwithcolour(Name,Car,CarColour)

Note firstly that this can be written as a conjunction.  In prolog we
would write

ownscarwithcolour(Name,Car,CarColour) :-
   owns(Name,Car), hascolour(Car,CarColour).

If we take cross products as I've described we will implicitly filter
out those tuples containing empty sets (which a number of people have
regarded with extreme suspicion in this thread).

What is interesting is that after joining r1(Names,Cars) with r2(Cars,
CarColours)  to yield r3(Names,Cars,CarColours),  and then using the
cross products idea to extract the propositions for ownscarwithcolour,
you will find that it has done precisely a *natural* join on the
underlying predicates of the conjunction.   This is despite the fact
that the join operation I defined in the original post looks more like
a full outer join when you look at the generated tuples!

> Anyway, that doesn't make a lot of sense. It introduces a new
> discipline, tuple normalization, because the tuple
[quoted text clipped - 4 lines]
>
>   Names = {}, Cars = {}

No.  The first tuple states that Fred and Bill exist.

Nevertheless I agree that MV attributes lead to non-uniqueness of
representation because you can always group in different ways.  I would
hope that the DB programmer generally deals with relations in a
set-theoretic way and therefore doesn't care about the non-uniqueness
in the tuples.

Note that relation union and difference allow for asserting new
relationships and retracting existing relationships.   This shows that
updates can be performed despite the non-uniqueness of tuple
representation.

> Why does having RVAs mean I have to introduce this new
> semantics for tuples? What was wrong with the old one?

Does it possess nice properties?
paul c - 20 Jan 2007 01:28 GMT
>>>...
>>>
[quoted text clipped - 20 lines]
> of the sets.   The cross product is of course empty.
> ...

Try telling that to a user who can see with his own eyes that the system
is telling him that Fred and Bill stand in some relation to green cars
and then explain to him that there is no relation in this case, because
he is looking at something called a "join" that some other user created!
 (Then he will ask you what does join mean and the boss in frustration
will renew your contract for another year.)

p
David - 20 Jan 2007 14:39 GMT
> >>>...
> >>>
[quoted text clipped - 27 lines]
>   (Then he will ask you what does join mean and the boss in frustration
> will renew your contract for another year.)

Obviously (end) users shouldn't look directly at a full outer join
result.

Please read my recent post to Marshall.   I describe an interesting
relationship between outer and inner joins.   Keep in mind the idea
that tuples containing empty sets disappear in the mapping between
tuples and propositions.
Marshall - 17 Jan 2007 09:07 GMT
> > Furthermore you will find that this property holds true of many
> > functions over generic types. Consider a simple generic list
[quoted text clipped - 19 lines]
>
> 1 & 2 are saying *by definition* there is a type called "relation".

Right.

>  It is neither abstract nor parameterised.

Uh, no. It absolutely is abstract, because there is no value whose
most specific type (or "concrete type" if you prefer) is simply
"relation."
Relation types are parameterized by a set of attributes.

By definition.

> Why can't we say that *by definition* the set of attributes of a
> relation is part of its state not its type?

We certainly *can*; the question is whether we *should.* And I'm
prepared to argue that we shouldn't, because we lose useful
properties and useful distinctions.

> Here's an analogy.  A C++ library provides a matrix class that is not
> parameterised on the number of rows and columns.   You can write this
[quoted text clipped - 4 lines]
>         Matrix C = A * B;       // Size = n x m
>     }

Well, what is it a matrix *of*? Something goes in the cells, and that
something has a type. If your matrix class only supports a single
fixed cell type, then it is not a parameterized type (and it is not a
good analog for relations, unless we mean relations of only a
single fixed type.) If it supports arbitrary cell types, then it is
parameterized by the cell type.

You know:

 new Matrix<int>(3, 3);

Note the syntax: the angle brackets are parameter passing at compile
time; the parentheses are parameter passing at runtime.

> The non-parameterised type also allows for mutative operations that add
> or remove rows and columns.

Sure; some matrix/array/whatever types allow this, *if and only if* the
width
and height of the matrix is not part of the type. In C++, for example,
it
is possible to go either way; you can have a matrix type that takes a
cell
type, a width, and a height at compile time, and complains if you
attempt
a missized matrix multiplication.

 new Matrix<int, 3, 3>();

Such a type can support a higher degree of correctness checking at
compile time, but cannot support modification of the type parameters
(such as width and height) at runtime. It is a tradeoff.

> > In 4, you draw a distinction between
> > a set of tuples and a relation, but those are equivalent as far
> > as I know. Can you explain the distinction?
> 4 follows from 1 which is a definition of the type "relation".   My
> point is that it is self consistent.

I agree it is consistent.

> Consider this:  In mathematics a function is defined on a particular
> domain and codomain.   Restrict the domain and you have a *different*
> function.  Otherwise we can't make statements about whether a given
> function is 1:1 or onto.   This suggests that domain and codomain
> should be regarded as *properties* of a given function.

The choice of functions for you example is a poor one, because
functions
are considered immutable constant values in virtually every programming
language. And immutable constant values do not have any distinction
in value between compile time and runtime. Any operation you care to
do on them can be done in either phase. The distinction I'm proposing
is most relevant when considering the difference between the constant
*type* of an expression and its varying-from-run-to-run value.

> A similar
> thing seems to be true for a relation - it is more than its set of
> tuples. Note in any case that I defined a tuple to be a mapping on
> A(r), so therefore you can't really disconnect the tuple from the set
> of attributes anyway.

I'm not proposing "disconnecting" them per se; I'm trying to emphasize
that type and value may exist at different phases in computation. They
remain intimately connected the same way that "3" and "int" are
connected.
But I would not say that "3" is part of the type nor that "int" is part
of
the value.

Types may disappear (or not) at runtime; values do not exist at
compile time. If we remove this distinction, we remove valuable
capabilities from our language.

A simple example:

Consider the idea of join compatibility. This is violated if we attempt
to join two relations each having an attribute of a given name but a
different type in each relation. Now, whatever your feelings
about how this should be handled, warning, error, whatever, we have
to consider the question of when and whether this situation can be
detected.

Under a system in which we consider the attributes as part of the
static type, and not part of the value, the presence or absence of
join compatibility in a given query can be established by looking
*only at the query* and not at the value of any of the operands.
In other words, if the query is included in some source code, we
can analyze the source code *without ever running it* and determine
exactly the status of join compatibility. (Let me know if this needs
further explanation.)

Under a system in which we consider the attributes as part of the
value, and under which we have a single universal type "relation";
we *cannot* determine this ahead of time. Join compatibility checking
*has to* be deferred until runtime, and it has to be checked
every time a join occurs.

> > > I presume in a proper general purpose system strong typing of relvars
> > > won't work.  Having to specify the type of relvars would be painful.
[quoted text clipped - 15 lines]
> That's my point.   That's why it seems better to have a
> non-parameterized type called "relation".

It absolutely is not "better." It is different. You gain some things
and you lose some things. We could only consider it better if
we a priori exclude some considerations and focus only on a
subset.

It is certainly entirely possible to have a dynamic relational
language.
It would be able to express more algorithms: those that are safe
but not provably safe. It would not be well-suited to static analysis;
we could not prove the absence of join compatibility for example.

Different. A tradeoff.

Marshall

PS. As this thread develops, it begins to appear that the question of
whether the header is part of the values is almost the same question
as whether we have a static or a dynamic type system.
David - 17 Jan 2007 10:45 GMT
> > > Furthermore you will find that this property holds true of many
> > > functions over generic types. Consider a simple generic list
[quoted text clipped - 27 lines]
> most specific type (or "concrete type" if you prefer) is simply
> "relation."

No that's not what I mean.   Consider a C++ program that defines a
concrete class called Relation that stores the set of attributes and
the tuples.  This is a concrete non-parameterized class.

We could write

void foobar()
{
   Relation r;    // An empty relation with no attributes or tuples
   r.AddAttribute("x", "double");
}

> Relation types are parameterized by a set of attributes.
>
[quoted text clipped - 48 lines]
> compile time, but cannot support modification of the type parameters
> (such as width and height) at runtime. It is a tradeoff.

Agreed

> > > In 4, you draw a distinction between
> > > a set of tuples and a relation, but those are equivalent as far
[quoted text clipped - 18 lines]
> is most relevant when considering the difference between the constant
> *type* of an expression and its varying-from-run-to-run value.

I'm not sure...

Out of interest, do you think it is reasonable to write domain(f) for
the domain of function f,  or would you say that is evil somehow?

> > A similar
> > thing seems to be true for a relation - it is more than its set of
[quoted text clipped - 13 lines]
> compile time. If we remove this distinction, we remove valuable
> capabilities from our language.

I'm certainly not arguing against parameterised types in general,
just as I'm sure you are not advocating that types always be fully
parameterised.

I also think we both have a pretty good understanding of the tradeoffs.

> A simple example:
>
[quoted text clipped - 19 lines]
> *has to* be deferred until runtime, and it has to be checked
> every time a join occurs.

Agreed

> > > > I presume in a proper general purpose system strong typing of relvars
> > > > won't work.  Having to specify the type of relvars would be painful.
[quoted text clipped - 20 lines]
> we a priori exclude some considerations and focus only on a
> subset.

I agree it won't be better in any absolute sense, because it depends
on the situation.

However, I imagine in practice I would prefer the non-parameterised
type for relations for most applications.  One reason stems from the
fact that RA expressions influence the type so having to specify the
result type would be too painful.

> It is certainly entirely possible to have a dynamic relational
> language.
[quoted text clipped - 9 lines]
> whether the header is part of the values is almost the same question
> as whether we have a static or a dynamic type system

In a sense.  However note that in C++ (despite its static type system)
I would normally avoid parameterisation of relation types because every
distinct relation type would have to be compiled.   Even worse a join
operation would be parameterised on the type of both relations it joins
on.  Of course you can avoid the binary bloat by factoring out a weakly
typed implementation from the templates.   Another consideration is
whether the proliferation of instantiations leads to better
performance.  However this would only be practical in a system that has
completely fixed schema, known at compile time.  It doesn't appear
satisfactory for a general purpose RDBMS.
Bob Badour - 17 Jan 2007 13:43 GMT
>>>Furthermore you will find that this property holds true of many
>>>functions over generic types. Consider a simple generic list
[quoted text clipped - 117 lines]
> of
> the value.

A quibble: If one accepts that a type is a set of values and the
operations defined on those values, then "3" is "part of the type" as it
is an element of the value set.

> Types may disappear (or not) at runtime; values do not exist at
> compile time. If we remove this distinction, we remove valuable
> capabilities from our language.

I don't follow that at all. Types and values just are.

> A simple example:
>
[quoted text clipped - 4 lines]
> to consider the question of when and whether this situation can be
> detected.

I assume by "different type" you mean neither type shares a subtype
other than the universal subtype.

> Under a system in which we consider the attributes as part of the
> static type, and not part of the value, the presence or absence of
[quoted text clipped - 49 lines]
> whether the header is part of the values is almost the same question
> as whether we have a static or a dynamic type system.

Uh, yup.
David - 18 Jan 2007 01:32 GMT
[snip]

> >>A similar
> >>thing seems to be true for a relation - it is more than its set of
[quoted text clipped - 13 lines]
> operations defined on those values, then "3" is "part of the type" as it
> is an element of the value set.

Note that thinking of a type this way reminds us that it's
nonsensical to say that the set of attributes of a relation (directly)
represents its type.   The type "relation" has more to do with the
set of all possible sets of tuples.

Marshall is correct in saying that the type of a relation can (if we
desire) be parameterized on the attributes, and that this may indeed by
useful sometimes.   However, I don't think it's so useful in the
mathematical definition of a relation.  For example a join would no
longer be regarded as a binary operation.

>From Wikipedia : "... a binary operation on a set S is a binary
function from S and S to S"

[snip]
Marshall - 18 Jan 2007 05:47 GMT
> Note that thinking of a type this way reminds us that it's
> nonsensical to say that the set of attributes of a relation (directly)
> represents its type.   The type "relation" has more to do with the
> set of all possible sets of tuples.

If we know the set of attributes and their types, do we not
also know the set of all possible sets of tuples? What's
the difference?

> Marshall is correct in saying that the type of a relation can (if we
> desire) be parameterized on the attributes, and that this may indeed by
[quoted text clipped - 4 lines]
> From Wikipedia : "... a binary operation on a set S is a binary
> function from S and S to S"

Natural join is a binary operation closed over the set of all
relations.

Marshall
David - 18 Jan 2007 06:27 GMT
> > Note that thinking of a type this way reminds us that it's
> > nonsensical to say that the set of attributes of a relation (directly)
[quoted text clipped - 4 lines]
> also know the set of all possible sets of tuples? What's
> the difference?

They are different sets.

> > Marshall is correct in saying that the type of a relation can (if we
> > desire) be parameterized on the attributes, and that this may indeed by
[quoted text clipped - 7 lines]
> Natural join is a binary operation closed over the set of all
> relations.

The very idea to put all relations of all "types" into a single set
suggests that the "type" has now become a property of the relation
value.

Let's ignore sub-typing because it's not relevant.   When we write
set<T> we assume all elements have the *same* type T.  If we want to
support inhomogeneous collections then we need to introduce tagged
unions (say) and then the element "type" indicator is really just a
property of the elements.
Marshall - 18 Jan 2007 08:10 GMT
> > > From Wikipedia : "... a binary operation on a set S is a binary
> > > function from S and S to S"
[quoted text clipped - 4 lines]
> suggests that the "type" has now become a property of the relation
> value.

Meh. This is the point in the conversation at which Han Solo
pulls out his blaster and shoots the communication console.
Without a formal definition of what it means for something
be a property of something else, the argument is almost
entirely a matter of word choice.

Marshall
David - 18 Jan 2007 11:28 GMT
> > > > From Wikipedia : "... a binary operation on a set S is a binary
> > > > function from S and S to S"
[quoted text clipped - 10 lines]
> be a property of something else, the argument is almost
> entirely a matter of word choice.

Ok, I could have said that better.

By definition, the *state* of a value equates to all the information
you may gather about the value assuming you know its type.  If you
don't know its type then all bets are off.

If you put a value of type T1 into a set S of type set<T2> where T1 is
a proper subset of type T2, you may have a problem because of loss of
type information when looking at the elements of S because you only
know that they are of type T2.

Given that we want to talk about operators (like natural join) on the
set of all relations, it is necessary for the attribute information to
be part of the relation value's state.
Marshall - 18 Jan 2007 05:18 GMT
> > Types may disappear (or not) at runtime; values do not exist at
> > compile time. If we remove this distinction, we remove valuable
> > capabilities from our language.
>
> I don't follow that at all. Types and values just are.

Certainly, in the platonic sense. It is also useful to consider
when types and values are reified, and when they cease to
be so. In some systems, types may be reified at compile
time but not at runtime at all; this is how C works. Values
of course are not reified until runtime.

> > Consider the idea of join compatibility. This is violated if we attempt
> > to join two relations each having an attribute of a given name but a
[quoted text clipped - 5 lines]
> I assume by "different type" you mean neither type shares a subtype
> other than the universal subtype.

Sure. Or perhaps we may be speaking of system that does not have
subtyping. Out of curiosity, what does Alphora do? I vaguely recall
reading that they abandoned subtyping at some point.

Marshall
Bob Badour - 20 Jan 2007 14:53 GMT
>>>Types may disappear (or not) at runtime; values do not exist at
>>>compile time. If we remove this distinction, we remove valuable
[quoted text clipped - 21 lines]
> subtyping. Out of curiosity, what does Alphora do? I vaguely recall
> reading that they abandoned subtyping at some point.

Indeed. I wrote Bryn Rhodes at Alphora just recently to ask about this
very issue. Apparently, they replaced subtyping with a system of
implicit conversions, and they hope to reintroduce subtyping later
(probably via inheritance by the sound of it.)

I can see how one might use such a mechanism as a limited replacement to
subtyping. If one does not understand subtyping, though, I wonder how
one is to recognize when it is appropriate (and more importantly
inappropriate) to have an implicit conversion.

It seems to me the only safe implicit conversions are conversions from
subtypes to supertypes.
JOG - 18 Jan 2007 01:42 GMT
> [snip]
> > > 2.  It deals rather elegantly with missing information.
[quoted text clipped - 3 lines]
> Subject to integrity constraints, a tuple can map an attribute to an
> empty set.

The more I think about this the less it makes sense to me. Which
representation is the most elegant?

1) Bill does not have a car.
2) Bill has an {} car.
3) Bill has a NULL car.

I'll take number one.
David - 18 Jan 2007 02:24 GMT
> > [snip]
> > > > 2.  It deals rather elegantly with missing information.
[quoted text clipped - 12 lines]
>
> I'll take number one.

I'll take number 4 - saying nothing at all

IMO a relational model should never try to state that a relation
(definitely) doesn't exist, nor that a relation may or may not exist.
Rather it merely states all the relations that definitely do exist,
and by the closed world assumption everything else is assumed to be
false.  Query results must be understood in this fashion.  ie failure
to find a result in the DB may indicate lack of information.   This
reminds me of Prolog's "negation as failure" approach.

MV attributes appear to offer a credible solution to representing lots
of 6NF relations without all that repetition of the primary key.   Of
course you must be prepared to read the tuple in the "right way".
When I see the tuple

   Names       Cars
   ----------------
   bill        {}

I only "read out" the proposition

  Person(bill).

So in a sense, it is like number 4 above.
Neo - 18 Jan 2007 02:52 GMT
> This reminds me of Prolog's "negation as failure" approach.

Is it similar to dbd's as shown below?

A person named john.
It is unknown if he has or hasn't a car, bicycle, heart ...
(new 'john 'person)

A person named mary. It is known she has a furrarri.
(new 'furrarri 'car)
(new 'mary 'person)
(set mary has furrarri)

A person named bob. It is known he does not have a bug.
(new 'bug 'car)
(new 'bob 'person)
(set bob hasn't bug)
David - 18 Jan 2007 02:59 GMT
> > This reminds me of Prolog's "negation as failure" approach.
>
[quoted text clipped - 13 lines]
> (new 'bob 'person)
> (set bob hasn't bug)

Does the query engine know that "hasn't" means "not has"?
Neo - 18 Jan 2007 04:33 GMT
> > (new 'hasn't 'verb)         Note: verb has is already in db
> > (new 'john 'person)
> > (new 'furrarri 'car) (new 'mary 'person) (set mary has furrarri)
> > (new 'bug 'car)    (new 'bob 'person)    (set bob hasn't bug)
>
> Does the query engine know that "hasn't" means "not has"?

Not unless user tells it something like:
(new (set 'not 'has) 'verb)
(new 'means 'verb)
(set hasn't means (. 'not 'has))

But I don't think that is exactly what you meant. You may have to
specify most precisely. Based on above, it answers the following
queries:

(; Does john have a car?)
(; Gets nothing)
(get john has (get car instance *))

(; Does mary have any car?)
(; Gets mary has furrarri)
(get mary has (get car instance *))

(; Does bob have any car?)
(; Gets nothing)
(get bob has (get car instance *))

(; Get bob hasn't any car?)
(; Gets bob hasn't bug)
(get bob hasn't (get car instance *))
or
(get bob (get * means (. 'not 'has)) (get car instance *))

It is dumber than a door nail, but still defyies logic :)
JOG - 18 Jan 2007 01:58 GMT
> Interesting post. Some comments inline.
>
[quoted text clipped - 11 lines]
> relation's value? Presumably that is also its type, and
> so we have infinite regress.

I also find it an extremely uncomfortable definition. A mathematical
relation has no "relation type", so I prefer Pascal's standpoint where
a relation is a set of functions mapping attributes to values. It seems
to me to appeal far more to conventional set theory and directly
illustrates why db-attributes are necessarily unordered, again not a
feature of mathematical relations. In addition Set-builder notation
could be used to define a relation's schema and constraints to
collectivize propositions of the correct format in neat mathematical
fashion.

> Or consider how it sounds for a value of a non-parameterized
> type: "An integer consists of a numeric-type int together with
[quoted text clipped - 72 lines]
>
> Marshall
David - 18 Jan 2007 02:35 GMT
> > Interesting post. Some comments inline.
> >
[quoted text clipped - 15 lines]
> relation has no "relation type", so I prefer Pascal's standpoint where
> a relation is a set of functions mapping attributes to values.

Huh?  That’s what I did (except of course that I mapped attributes to
sets of values to investigate the MV approach)

> It seems
> to me to appeal far more to conventional set theory and directly
> illustrates why db-attributes are necessarily unordered, again not a
> feature of mathematical relations.

Sure.  Note that the join operator I defined is commutative.

[snip]
Neo - 16 Jan 2007 16:49 GMT
> r1(Names,Cars)
>     bill,               car1,car2,car4
[quoted text clipped - 9 lines]
>    john,fred      car3           red
>    john,fred                       green

Below is a related (but not same) example using dbd:

(new 'red 'color)
(new 'green 'color)

(new 'car1 'car)
(set car1 color red)

(new 'car2 'car)
(set car2 color green)

(new 'car3 'car)
(set car3 color red)

(new 'car4 'car)
(set car4 color red)

(new 'bill 'person)
(set bill has car1)
(set bill has car2)
(set bill has car4)

(new 'john 'person)
(set john has car3)

(new 'fred 'person)
(set fred has car3)

(; Get persons who have car3)
(; Gets john, fred)
(get * has car3)

(; Get persons who have red cars)
(; Gets bill, john, fred, bill again)
(get * has (get * color red))

(; Get persons who have cars1 and 2)
(; Gets bill)
(& (get * has car1)
   (get * has car2))

(; Get persons who have cars3 or 4)
(; Gets john, fred, bill)
(| (get * has car3)
  (get * has car4))
Jon Heggland - 17 Jan 2007 07:45 GMT
> Example
>
[quoted text clipped - 16 lines]
> last tuple of r1 |X| r2 above doesn’t imply that John and Fred
> don’t own any cars.

So what exactly does that last tuple mean?
Signature

Jon

David - 17 Jan 2007 09:23 GMT
> > Example
> >
[quoted text clipped - 18 lines]
>
> So what exactly does that last tuple mean?

It would appear that this tuple only states that John and Fred exist
and the colour Green exists.  This information is redundant with
respect to the other tuples.

I don't even think the tuple should be regarded as implying John and
Fred don't have any green cars.   I would avoid using individual
tuples to infer that a relationship doesn't exist in other tuples.
I would say tuples can only add relationships, never remove them.

A challenge for the MV approach is to support updates effectively.  In
the general case it is far from clear how to properly assert new
relationships or retract existing relationships.   If we can define r1
union r2 effectively (and remove redundancy) then we have a basis of
adding new relationships to an existing relation.  Similarly defining
r1 \ r2 properly provides a basis for retracting relationships.

Eg, consider that we have the tuple

r1(Names,Cars)
    bill,mary,fred             car1,car2,car3,car4

and we want to retract the fact that mary owns car3.  Therefore define

r2(Names,Cars)
    mary                     car3

We want to calculate

r1\r2(Names,Cars)
    bill,fred                         car1,car2,car3,car4
    mary                             car1,car2,car4

Evidently after confirming that r1,r2 are union compatible we
conceptually iterate over the tuples in r1 and find the tuples from r2
that intersect it.  This may cause a tuple from r1 to be split into a
number of pieces.  A piece may split multiple times because of multiple
intersecting tuples from r2.   It would be interesting to turn this
idea into a practical algorithm.

A more practical approach may be to require that every relation have a
primary key and only support non-primary key MV attributes.   I presume
this would simplify the implementation of a relational algebra somewhat.
Jon Heggland - 17 Jan 2007 10:04 GMT
>>> r1 |X| r2 (Names,Cars,Colours)
>>>    bill              car1,car4     red
[quoted text clipped - 11 lines]
> and the colour Green exists.  This information is redundant with
> respect to the other tuples.

"It would appear"? You are guessing?

Anyway, I assume the other tuples /do/ state more, i.e. "bill car2
green" does not merely state that Bill exists, Car2 exists and Green
exists? (I'll refrain from asking what exactly the meaning of this
"exists" business is.) So the tuples have different kinds of meaning?
What determines this meaning? The presence or absence of empty sets?
What is simple, intuitive and elegant about this?

> I don't even think the tuple should be regarded as implying John and
> Fred don't have any green cars.  I would avoid using individual
> tuples to infer that a relationship doesn't exist in other tuples.
> I would say tuples can only add relationships, never remove them.

And often, tuples don't even add relationships, just confusion and
redundancy, it seems ...
Signature

Jon

David - 17 Jan 2007 13:39 GMT
> >>> r1 |X| r2 (Names,Cars,Colours)
> >>>    bill              car1,car4     red
[quoted text clipped - 13 lines]
>
> "It would appear"? You are guessing?

No, but I'm a little cautious!

> Anyway, I assume the other tuples /do/ state more, i.e. "bill car2
> green" does not merely state that Bill exists, Car2 exists and Green
> exists? (I'll refrain from asking what exactly the meaning of this
> "exists" business is.) So the tuples have different kinds of meaning?

No, there is an underlying consistency.  The role of the Colour
attribute is understood to be relevant to the cars in the tuple.  This
role is implicit in the r2 relation and remains the case after the
join.  The attribute could have instead been named "CarColour".   There
is an implicit universal quantification here.  ie

   for all cars in tuple,  car is green

Since there are no cars, the tuple is vacuous.

This reminds me of true statements like

    for all elephants on the Moon,  elephant wears pink panties

> What determines this meaning? The presence or absence of empty sets?
> What is simple, intuitive and elegant about this?
[quoted text clipped - 6 lines]
> And often, tuples don't even add relationships, just confusion and
> redundancy, it seems ...
Bob Badour - 17 Jan 2007 13:51 GMT
>>Example
>>
[quoted text clipped - 18 lines]
>
> So what exactly does that last tuple mean?

And what would happen if we replaced (car1,car3,car4)<->(red) with
(car1,car3,car4)<->(red,blue) in r2 ?

Suppose as well that r2(Cars,Colours) has the following tuple:
                 yellow

Would r1 |X| r2 (Names, Cars, Colours) have these tuples?

bill                               yellow
john, fred                         yellow

Or would it have this tuples?

bill,john,fred                     yellow

What meaning would we ascribe to restricting r2 to the colour yellow and
projecting on colour? Similarly, what meaning would we ascribe to
restricting the join to the colour yellow and then projecting on Names?
paul c - 17 Jan 2007 14:27 GMT
>>> Example
>>>
[quoted text clipped - 22 lines]
> (car1,car3,car4)<->(red,blue) in r2 ?
> ...

Just curious, what does the symbol <-> stand for here?

p
Bob Badour - 17 Jan 2007 15:22 GMT
>>>> Example
>>>>
[quoted text clipped - 24 lines]
>
> Just curious, what does the symbol <-> stand for here?

Maps or relates.
David - 17 Jan 2007 23:42 GMT
> >>Example
> >>
[quoted text clipped - 21 lines]
> And what would happen if we replaced (car1,car3,car4)<->(red) with
> (car1,car3,car4)<->(red,blue) in r2 ?

Not that an integrity constraint should prevent that, unless we want to
model multi-coloured cars.

> Suppose as well that r2(Cars,Colours) has the following tuple:
>                   yellow
[quoted text clipped - 7 lines]
>
> bill,john,fred                     yellow

I agree these tuples are more that a little suspicious - perhaps enough
to reject the MV approach.   Nevertheless it is mathematically
consistent to simply regard these tuples as vacuous because there are
no cars in the tuple to which yellow applies.   In that sense I imagine
an implementation of the RA would be free to remove redundant
information as required.

The non-uniqeness of representation is troublesome but not necessarily
fatal.  I have this idea of a mapping from the set of tuples to an
underlying set of respectable 6NF propositions.

> What meaning would we ascribe to restricting r2 to the colour yellow and
> projecting on colour? Similarly, what meaning would we ascribe to
> restricting the join to the colour yellow and then projecting on Names?

Very perceptive questions.  The second case is most interesting

Start with

   r3 = r1 |X| r2 (Names,Cars,Colours)
       bill              car1,car4     red
       bill              car2          green
       bill                            yellow
       john,fred         car3          red
       john,fred                       green
       john, fred                      yellow

Now select yellow

   r3'(Names,Cars,Colours)
       bill                            yellow
       john, fred                      yellow

Then project on Names

   r3''(Names)
       bill
       john, fred

This is clearly not useful. The problem lies with the selection step.
If our intention is to find people that really own yellow cars then we
must remove the tuples with no cars.

The MV approach is on shaky ground!
dawn - 18 Jan 2007 00:44 GMT
> > >>Example
> > >>
[quoted text clipped - 46 lines]
> The non-uniqeness of representation is troublesome but not necessarily
> fatal.

Exactly!  Yes, there are tradeoffs.

> I have this idea of a mapping from the set of tuples to an
> underlying set of respectable 6NF propositions.

Most MV providers map to 1NF for SQL purposes.  The issues are a)
NULL-handling and interpretation and b) getting the ordering attributes
into the mix for multi-values where the ordering is significant.   But
if the data were not in BCNF or 3NF minus 1NF, for example, as defined
to MV then they do not magically get into 3NF in the mapping.  I'm not
recalling what 5NF and 6NF are right now, but if your MV data were in
6NF minus 1NF to start, then it should be easy to map to 6NF.

> > What meaning would we ascribe to restricting r2 to the colour yellow and
> > projecting on colour? Similarly, what meaning would we ascribe to
> > restricting the join to the colour yellow and then projecting on Names?
>
> Very perceptive questions.  The second case is most interesting

r2 would be considered poor modeling (we really should name some of
these "normal forms" for MV).  It would be OK for a virtual relation,
but not for a base relation.  The higher level (relation) should be the
strong entity, while the property for that entity would be modeled as
an attribute. So Car(carId, carName, colors) would be acceptable, but
r2 as it looks above would not be.

> Start with
>
[quoted text clipped - 5 lines]
>         john,fred                       green
>         john, fred                      yellow

This is unlike any relation I have seen in any MV system because, again
, it stems from bad data modeling (although I have surely seen bad data
model, just not this bad).  Think of it this way - if someone were to
hand you a form and it would make sense to you to fill it out with the
information on a single record above, then such a record might make
sense.  The above r3 makes no sense to me.

> Now select yellow
>
[quoted text clipped - 9 lines]
>
> This is clearly not useful. The problem lies with the selection step.

I think it would be found in the data model itself.  Model "strong
entities" as relations, with properties (such as color) in property
lists.

> If our intention is to find people that really own yellow cars then we
> must remove the tuples with no cars.
>
> The MV approach is on shaky ground!

There is a different, but unfortunately unwritten as best I can tell,
set of "rules for modeling data" in MV compared to 1NF.  I have
collected information related to this, but the summary would be to
determine whether something is a "thing" within the context of your
system, or is merely a piece of demographic data or a property of some
other thing.  The Things in your system have one row/record per, while
the properties or entities that can be perceived as properties within
the context of the system, are attributes.  I don't know if that
helped, but just in case it does...  --dawn
David - 18 Jan 2007 01:49 GMT
> > > >>Example
> > > >>
[quoted text clipped - 122 lines]
> the context of the system, are attributes.  I don't know if that
> helped, but just in case it does...  -dawn

I purposely wanted to investigate the set theoretic nature of MV
attributes without getting bogged down in rules for modeling data (or
even the presence of primary keys).
Aloha Kakuikanu - 18 Jan 2007 20:57 GMT
> Here is a partial formalization of a relational algebra based on MV
> attributes.  The approach appears simple and intuitive.  In particular
> the join of two relations is rather elegant.
> ...

IMO, RA with MV attributes is quite easy to formalize. I suggest a
nested relation as a formal definition for MV attribute. A critical
step is noticing that there is a (partial) order "<" among all the
relations. Formally:

Q < R    iff    Q /\ R = R

where "/\" is a symbol for relational join. (I don't quite like the
"&&" symbol that Marshall uses:-)

Next,

Q = R    iff    Q < R   and   R < Q

Now that we can compare relational valued attributes, we can define all
the RA operations. Interestingly, set joins (and relational division)
are easily expressed in this framework. For example, given

A = { <x=1, y={<t=a>,<t=b>}> , <x=2, y={<t=b>,<t=c>} > }

B = { <y={<t=a>} }

Then, inequality join

A  /\_a.y<b.y  B

is the same as relational division between "flattened" A and B
relations.
Aloha Kakuikanu - 18 Jan 2007 21:02 GMT
Never mind. Partial order is not enough. It has to be total order.

> > Here is a partial formalization of a relational algebra based on MV
> > attributes.  The approach appears simple and intuitive.  In particular
[quoted text clipped - 29 lines]
> is the same as relational division between "flattened" A and B
> relations.
David - 19 Jan 2007 00:41 GMT
> Never mind. Partial order is not enough. It has to be total order.
>
[quoted text clipped - 31 lines]
> > is the same as relational division between "flattened" A and B
> > relations.

Playing around with the definition of equality can't work because you
can't get away from the need to *intersect* the MV attributes (rather
than merely test them for equality, as done in the conventional RA) in
order to perform a join operation.
Aloha Kakuikanu - 19 Jan 2007 01:33 GMT
> > Never mind. Partial order is not enough. It has to be total order.
> >
[quoted text clipped - 36 lines]
> than merely test them for equality, as done in the conventional RA) in
> order to perform a join operation.