Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / July 2005

Tip: Looking for answers? Try searching our database.

Normalisation

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Patsybulo - 29 Jun 2005 19:11 GMT
How are issues of Null values handled when normalising a database?
Jan Hidders - 29 Jun 2005 20:51 GMT
> How are issues of Null values handled when normalising a database?

Strictly speaking normalization theory doesn't handle relations with
null values. But you can apply the theory anyway if you think of null
values as special values that are distinct from all other values.

-- Jan Hidders
Patsybulo - 01 Jul 2005 16:03 GMT
First all, I'll like to say thanks for your contribution. I'm not too
comfortable with your explanations: I am not looking at null values as
specail values. They can be any form data within that domain but at the
time of normalisation, how would you treat zero entries. Now, Commenting
on your first point, would this mean that the theory of insertion can't be
applied on relations?
Jan Hidders - 01 Jul 2005 20:02 GMT
> First all, I'll like to say thanks for your contribution. I'm not too
> comfortable with your explanations: I am not looking at null values as
> specail values.

That is irrelevant. The point is that you *can* pretend they are special
values and the apply normalization theory.

> They can be any form data within that domain but at the
> time of normalisation, how would you treat zero entries.

Like I said, as a special value. Another option is to treat the field as
a set-valued field (since it apparently can contain 0 or 1 values) which
means that you are not in 1NF and should first normalize such that you are.

> Now, Commenting
> on your first point, would this mean that the theory of insertion can't be
> applied on relations?

No.

-- Jan Hidders
Jon Heggland - 02 Jul 2005 10:34 GMT
> Like I said, as a special value. Another option is to treat the field as
> a set-valued field (since it apparently can contain 0 or 1 values) which
> means that you are not in 1NF and should first normalize such that you are.

In theory, set-valued fields definitely do not violate 1NF. They may
still be a bad design choice, though, and anyway I guess the point in
this thread is to give practical advice for implementation in SQL
databases without proper domain support. So you should normalise. :)
Signature

Jon

Jan Hidders - 02 Jul 2005 14:02 GMT
>>Like I said, as a special value. Another option is to treat the field as
>>a set-valued field (since it apparently can contain 0 or 1 values) which
>>means that you are not in 1NF and should first normalize such that you are.
>
> In theory, set-valued fields definitely do not violate 1NF.

That's only true if you accept Chris Date's redefinition of the "first
normal form", and that is by no means widely accepted as *the*
definition in academia and industry, and I don't see that changing
anytime in the near future.

-- Jan Hidders
Jon Heggland - 02 Jul 2005 19:04 GMT
> > In theory, set-valued fields definitely do not violate 1NF.
>
> That's only true if you accept Chris Date's redefinition of the "first
> normal form", and that is by no means widely accepted as *the*
> definition in academia and industry, and I don't see that changing
> anytime in the near future.

Well, the "atomicity" definition doesn't hold up under scrutiny, and the
"you shouldn't have more than one column about the same thing"
definition is too informal. Unfortunately, it's not all that relevant in
practise, since domain support is so bad, and I don't see *that*
changing anytime in the near future.
Signature

Jon

Jan Hidders - 02 Jul 2005 21:16 GMT
> Well, the "atomicity" definition doesn't hold up under scrutiny, and the
> "you shouldn't have more than one column about the same thing"
> definition is too informal.

Yeah, that's what Date claims. Many disagree, including me.

-- Jan Hidders
Paul - 02 Jul 2005 23:18 GMT
>> Well, the "atomicity" definition doesn't hold up under scrutiny, and
>> the "you shouldn't have more than one column about the same thing"
>> definition is too informal.
>
> Yeah, that's what Date claims. Many disagree, including me.

You think that atomicity is a well defined concept?

Are values in domains not atomic by definition, irrespective of any
internal structure they may have? (Structure invisible to the relational
operators, that is).

Paul.
Jan Hidders - 03 Jul 2005 14:37 GMT
>>>Well, the "atomicity" definition doesn't hold up under scrutiny, and
>>>the "you shouldn't have more than one column about the same thing"
[quoted text clipped - 3 lines]
>  
> You think that atomicity is a well defined concept?

No. But I think it is understood well-enough to be a useful concept.

> Are values in domains not atomic by definition, irrespective of any
> internal structure they may have? (Structure invisible to the relational
> operators, that is).

Yes, but that depends a bit on what you take as "the set of relational
operators". If that includes nesting and unnesting then you make the
structure of nested relations visible, and you should therefore consider
them non-atomic.

-- Jan Hidders
Paul - 04 Jul 2005 12:56 GMT
>> Are values in domains not atomic by definition, irrespective of any
>> internal structure they may have? (Structure invisible to the relational
[quoted text clipped - 4 lines]
> structure of nested relations visible, and you should therefore consider
> them non-atomic.

OK I see. I've never understood the appeal of "nesting" or "unnesting"
relations. It doesn't seem to add anything to the relational model and
only serves to complicate things. Does anyone have a concrete example of
the usefulness of this? Any I've seen in the past seem to be trival to
implement in a standard relational model.

I agree that there is nothing to stop someone having a "relation" domain
but I think this should all be encapsulated in the domain and not
pollute the relational model. Also, I question the practical sense in
doing this as well. It seems a bit like having a database-valued domain
and "simplifying" your database to be a single value in a one-rowed,
database-valued table.

Paul.
Jan Hidders - 04 Jul 2005 19:55 GMT
>>>Are values in domains not atomic by definition, irrespective of any
>>>internal structure they may have? (Structure invisible to the relational
[quoted text clipped - 17 lines]
> and "simplifying" your database to be a single value in a one-rowed,
> database-valued table.

Precisely. I couldn't have said it better myself.

-- Jan Hidders
Jonathan Leffler - 05 Jul 2005 06:31 GMT
>>>> Are values in domains not atomic by definition, irrespective of any
>>>> internal structure they may have? (Structure invisible to the
[quoted text clipped - 12 lines]
>> the usefulness of this? Any I've seen in the past seem to be trival to
>> implement in a standard relational model.

Sorry about one extra level of quote...

The only place where I've seen any plausible use for nested relations
was in C J Date's work where he discussed using a relation-valued
attribute (RVA) as the primary key (PK) of a relation describing
functional dependencies (FD) in a relation schema.  The PK is the set of
attributes in the determinant (left-hand side, LHS) of the FD.  The RHS
would also be a relation-valued attribute.  (If the RHS is a simple
attribute (containing the names of attributes in the described relation
schema), then the RVA is not the PK - the combination of the RVA plus
the singleton attribute is the PK.)

You can model this without any RVA by assigning a name to the FD, then
describing the determinant set in one relation and the determined set in
another.  In SQL (simplified):

CREATE TABLE FD(NAME CHAR(18) NOT NULL PRIMARY KEY);

CREATE TABLE LHS
(
    FD_NAME CHAR(18) NOT NULL REFERENCES FD,
    ATT_NAME CHAR(!8) NOT NULL REFERENCES <column-names>,
    PRIMARY KEY(FD_NAME, ATT_NAME)
);

CREATE TABLE RHS
(
    FD_NAME CHAR(18) NOT NULL REFERENCES FD,
    ATT_NAME CHAR(!8) NOT NULL REFERENCES <column-names>,
    PRIMARY KEY(FD_NAME, ATT_NAME)
);

Signature

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2005.01 -- http://dbi.perl.org/

Jon Heggland - 05 Jul 2005 08:25 GMT
> OK I see. I've never understood the appeal of "nesting" or "unnesting"
> relations. It doesn't seem to add anything to the relational model and
> only serves to complicate things. Does anyone have a concrete example of
> the usefulness of this? Any I've seen in the past seem to be trival to
> implement in a standard relational model.

You can do outer joins without nulls. Also, nested relations are used
heavily in the definitions of the operators for handling interval types
in temporal relational DBMSs (Date, Darwen and Lorentzos, "Temporal data
and the relational model").

> I agree that there is nothing to stop someone having a "relation" domain
> but I think this should all be encapsulated in the domain and not
> pollute the relational model.

I must admit I am unsure of the significance of encapsulation here.
Would anyone care to explain?

Also, I'm not sure what the "pollution" consists of. Is it the
introduction of GROUP/UNGROUP (nest/unnest)operators? GROUP can be
defined as a shorthand for a certain kind of EXTEND, iirc---I'm not sure
about UNGROUP.

> Also, I question the practical sense in
> doing this as well. It seems a bit like having a database-valued domain
> and "simplifying" your database to be a single value in a one-rowed,
> database-valued table.

Yes, you shouldn't use relation-valued attributes just because you can.
In most cases, they are a bad design, which Date is quick to point out.
Signature

Jon

Paul - 05 Jul 2005 13:43 GMT
>>OK I see. I've never understood the appeal of "nesting" or "unnesting"
>>relations. It doesn't seem to add anything to the relational model and
[quoted text clipped - 3 lines]
>
> You can do outer joins without nulls.

An outer join is just syntactic shorthand for an inner join followed by
a union (of the relevant missing rows). It could easily be extended to
remove the NULLs problem by forcing the user to specify default values
instead.

I don't quite understand how nested relations gets round the problem,
though. Because a nested empty relation takes the place of a NULL?

> Also, nested relations are used
> heavily in the definitions of the operators for handling interval types
> in temporal relational DBMSs (Date, Darwen and Lorentzos, "Temporal data
> and the relational model").

I confess I've not read this and can't find a link to a summary of why
this requires nested relations. It seems to my uninformed mind that you
can talk about time in first order logic, so what is the problem in
dealing with temporal data using the standard relational model? I can
understand how specialised domains like intervals might be useful, or
even some syntactic shorthand for complicated constructs like
overlapping periods etc., but not the need for serious amendments to the
relational model like nested relations.

>>I agree that there is nothing to stop someone having a "relation" domain
>>but I think this should all be encapsulated in the domain and not
>>pollute the relational model.
>
> I must admit I am unsure of the significance of encapsulation here.
> Would anyone care to explain?

Just that all the domain stuff is hidden away from view. So the domain
values are just like "black boxes" to the relational part. The domain
part knows that the internal structure of the values are relations, but
the relational part doesn't. I'm not sure if "encapsulation" is the
official terminology, but I'm just making up the jargon here.

> Also, I'm not sure what the "pollution" consists of. Is it the
> introduction of GROUP/UNGROUP (nest/unnest)operators? GROUP can be
> defined as a shorthand for a certain kind of EXTEND, iirc---I'm not sure
> about UNGROUP.

The pollution is the breaking of the idea that the values in the columns
are handled by the domain part of the DBMS, and are atomic from the
point of view of the relational part. (I've used the terms "relational
engine" and "domain/type engine" in the past to describe this.)

Maybe this is all just a matter of perception though, and it's not
really important to make this separation?

Looking at the Third Manifesto book, it looks like their GROUP/UNGROUP
operators are shorthand for something involving EXTEND, but also
"RELATION", which is a type generator. What I'm not getting is why the
relational model needs a relation type generator - shouldn't that be
part of the domain engine and kept separate from the relational engine?

>>Also, I question the practical sense in
>>doing this as well. It seems a bit like having a database-valued domain
[quoted text clipped - 3 lines]
> Yes, you shouldn't use relation-valued attributes just because you can.
> In most cases, they are a bad design, which Date is quick to point out.

OK, I'm going one step further here and claiming they are *always* a bad
design. :) I'm offering no real proof of this (and I don't think one is
really possible) other than a vague appeal to Occam's Razor.

Paul.
Jon Heggland - 05 Jul 2005 17:09 GMT
> > You can do outer joins without nulls.
>
> An outer join is just syntactic shorthand for an inner join followed by
> a union (of the relevant missing rows). It could easily be extended to
> remove the NULLs problem by forcing the user to specify default values
> instead.

Yes, but that is still not quite satisfying. Consider an outer join of
persons and dogs based on dog ownership. A person can own multiple dogs,
or none, while a dog has but a single owner. An outer join is supposed
to list all persons, along with the dogs they own. With default values,
this would result in a relation specifying that a lot of persons own the
same default dog.

> I don't quite understand how nested relations gets round the problem,
> though. Because a nested empty relation takes the place of a NULL?

Yes. A corresponding outer join with nested relations would have *one*
tuple for each person, with the set of dogs they own as a (single)
attribute value. For person who don't own dogs, the nested relation
would be empty; for persons with four dogs, it would contain four dog
tuples. I think it is a cleaner solution.

> > Also, nested relations are used
> > heavily in the definitions of the operators for handling interval types
[quoted text clipped - 9 lines]
> overlapping periods etc., but not the need for serious amendments to the
> relational model like nested relations.

I'm not sure if they are required (they probably aren't), but they are
convenient. The authors are very insistent that they indeed do not amend
the RM, they just introduce syntactic shorthands. In any case, grouped
(or nested) relations are used only as intermediate results to
construct/explain the operators/shorthands they define.

It would be too much work for me to try to summarise their work here,
but Date's "An Introduction to Database Systems" 8th ed. has a chapter
about it.

> >>I agree that there is nothing to stop someone having a "relation" domain
> >>but I think this should all be encapsulated in the domain and not
[quoted text clipped - 8 lines]
> the relational part doesn't. I'm not sure if "encapsulation" is the
> official terminology, but I'm just making up the jargon here.

I think the terminology is just fine, I'm just wondering what the
implications of a relation not being encapsulated is. We don't do
anything with it except use operators on it, do we? What would be the
difference between an encapsulated and a non-encapsulated relation?

> > Also, I'm not sure what the "pollution" consists of. Is it the
> > introduction of GROUP/UNGROUP (nest/unnest)operators? GROUP can be
[quoted text clipped - 5 lines]
> point of view of the relational part. (I've used the terms "relational
> engine" and "domain/type engine" in the past to describe this.)

Well, yeah, but relation types are domains too.

> Maybe this is all just a matter of perception though, and it's not
> really important to make this separation?

That is what I am wondering. Date & co. also make that separation
(though they prefer the terms "scalar" and "non-scalar") but I don't
have deep understanding of the reason or the significance.

> Looking at the Third Manifesto book, it looks like their GROUP/UNGROUP
> operators are shorthand for something involving EXTEND, but also
> "RELATION", which is a type generator. What I'm not getting is why the
> relational model needs a relation type generator - shouldn't that be
> part of the domain engine and kept separate from the relational engine?

The relational engine needs to know how to perform relational operators.
Relational operators are (generic) operators associated with the
(generic) domain of relations. Thus, the relation engine *must* know
about this domain. Indeed, you could view a relational database as
nothing more than the domain of relations, and (persistent) variables of
that type. (And facilities for creating other domains.)

You use a relation type generator each time you create a table, to use
SQL terminology. CREATE TABLE T ( A INT ) is the declaration of a
variable named T, with the type RELATION { A INT } (disregarding
irrelevant details about SQL).

> > Yes, you shouldn't use relation-valued attributes just because you can.
> > In most cases, they are a bad design, which Date is quick to point out.
>
> OK, I'm going one step further here and claiming they are *always* a bad
> design. :) I'm offering no real proof of this (and I don't think one is
> really possible) other than a vague appeal to Occam's Razor.

Fair enough. :) The way I see it, they should be allowed just because
disallowing them would be a special case, with weak justification.
Signature

Jon

Alfredo Novoa - 05 Jul 2005 13:59 GMT
>>>>Well, the "atomicity" definition doesn't hold up under scrutiny, and
>>>>the "you shouldn't have more than one column about the same thing"
[quoted text clipped - 5 lines]
>
>No. But I think it is understood well-enough to be a useful concept.

Most people think that atomic means indivisible in that context, and
it does not make any sense.

Regards
Jon Heggland - 03 Jul 2005 13:30 GMT
> > Well, the "atomicity" definition doesn't hold up under scrutiny, and the
> > "you shouldn't have more than one column about the same thing"
> > definition is too informal.
>
> Yeah, that's what Date claims. Many disagree, including me.

How do you define atomicity?
Signature

Jon

Jan Hidders - 03 Jul 2005 14:59 GMT
>>>Well, the "atomicity" definition doesn't hold up under scrutiny, and the
>>>"you shouldn't have more than one column about the same thing"
[quoted text clipped - 3 lines]
>  
> How do you define atomicity?

Roughly, I would say that it means that all your data manipulation can
be expressed as operations over a signature that consists only of (1)
domains described by an abstract data type and (2) relations that
contain in their fields only values from these domains.

An unnest operation, for example, cannot be described in that way.

-- Jan Hidders
Jon Heggland - 04 Jul 2005 06:35 GMT
> > How do you define atomicity?
>
> Roughly, I would say that it means that all your data manipulation can
> be expressed as operations over a signature that consists only of (1)
> domains described by an abstract data type and (2) relations that
> contain in their fields only values from these domains.

I can't make heads or tails of this. Operations over a signature?
Domains described by an abstract data type? Is a domain not the same as
a data type? Can a data type describe several domains? And again with
the "abstract"---does your atomicity require that all data types be
abstract? What, then, is a concrete (or non-abstract) data type, and why
is it not allowed? Are relation types not (abstract) data types? How
about set types?

> An unnest operation, for example, cannot be described in that way.

What if I define an unnest operator that "unnests" strings in a
corresponding manner?

Anyway, let me rephrase my initial question.
What is an atomic datatype/domain?
What is a non-atomic datatype/domain?
What bad happens if you allow a non-atomic datatype in a relation?
Signature

Jon

paul c - 04 Jul 2005 19:39 GMT
>>>How do you define atomicity?
>>
[quoted text clipped - 18 lines]
> Anyway, let me rephrase my initial question.
> What is an atomic datatype/domain?

i may look foolish for jumping into this thread so late, but here's what
i think anyway.  it just seems to me that the RM itself defines very
little which leaves many things it doesn't define.  so my knee-jerk
reaction is to say that what's not in the RM shouldn't be implemented in
the RM part of a RDBMS.  i'd say it must be understood that when people
talk of atomic things, they mean atomic as far as the RM is concerned,
not necessarily as far as some other definition or concept is concerned,
such as a postal address or infinity.  so there has to be some
'conceptual separation' that is preserved once we try to 'get physical'
(i'm tempted to underline the word 'separation', but i don't know how!)

based on this, i'd say that an 'atomic datatype/domain' is one whose
values the RM is able to relate (ie. perform whatever operations are
defined to be relational) to values of the domain or other domains
without knowledge of the structure and operations of the domains
involved.  in this sense, the domain is a kind of 'black box' to the RM.

in practice, there must be some agreed protocol between the RM and the
party that does have such knowledge.  that party could be machine code
or human interaction, although i think most people would expect it to be
code.

the protocol would be needed most obviously so that the RM would know in
what form, for example what bit configuration, to store values or
present them to the party that does understand those values in a way
that doesn't undercut consistent behaviour in the future.  i'd say this
means that the RM cannot itself even decide equality.  it should
theoretically always have to inquire of some outside party whether two
values are 'equal'.  (if there is any respected definition of the RM
that does claim to define equality, i'd like to see it.)

from an implementation point-of-view, this could mean that domain
support might make use of the RM without the original RDBMS even
'knowing' about it.

without being deeply familiar with any so-called RDBMS's, i'll stick my
neck out and state that most implementations have probably made the
mistake of not separating the relational operators from those that are
peculiar to a domain.  i'd say part of the reason has been fear of poor
performance.

the term 'scalar' wrt domains has shown up in recent years.  i gather
that that term is intended to remove certain kinds of context from
domain values, such as relations such as coordinate systems they might
have with each other.  if the term makes sense, then to me it suggests
that a RDBMS might not even store a particular value, rather some
surrogate.  perhaps 'place-holder' is a better term.  the RM part of a
RDBMS wouldn't be any the wiser if that were done (assuming there is
some protocol for presentation).

if this makes any sense, then i'd say that when we think about
implementations, the purely relational support ought to be the smallest
part of a system and would not need much 'maintenance' apart from
optimizations that occur to people after they have implemented the RM.
the domain support work might be endless, just like newsgroup posts
about nulls.

> What is a non-atomic datatype/domain?

wouldn't it be one that the RM part of the RDBMS is able to encode or
decipher or somehow apply without help?

> What bad happens if you allow a non-atomic datatype in a relation?

i think this question really means what bad thing happens if a relation
has a datatype that the RM doesn't have some protocol for presenting to
another party.  the risk is that the RDBMS will then try to do things
itself with the domain.  if that happens, the advantage of the theory
may be lost (the advantage being the same as any theory - being able to
predict behaviour without duplicating it, witness FD's and tables with
empty columns, among many others).  the domain support should rely on
its own theory whatever that theory is - for example, number theory,
regular patterns etc.

having said that, when it comes to implementation, RVA's do seem to cry
out for recursion, because the RM already has the operations to enable them.

just my current opinion, not trying to ram it down anybody's throat.

p
Paul - 04 Jul 2005 23:19 GMT
> the protocol would be needed most obviously so that the RM would know in
> what form, for example what bit configuration, to store values or
[quoted text clipped - 4 lines]
> values are 'equal'.  (if there is any respected definition of the RM
> that does claim to define equality, i'd like to see it.)

I guess the problem is that a single value in a domain may have multiple
representations. For example in a domain of rational numbers:
1/2 = 2/4 = 3/6 = 4/8 = ...

And the relational model couldn't possibly know these are equal, because
arithmetic is outside its area of knowledge.

Unless domains had to store equivalent values in some canonical form I
suppose? So there was always a 1-1 mapping between values and
representations.

Paul.
paul c - 05 Jul 2005 01:17 GMT
>>the protocol would be needed most obviously so that the RM would know in
>>what form, for example what bit configuration, to store values or
[quoted text clipped - 17 lines]
>
> Paul.

yes, it was a so-called 'canonical' form that i had in mind when i
mentioned representation.  although two different people might disagree
on just which form is canonical.  i suppose one could just as well call
it the 'internal' form.  i would think that the canonical or internal
form would be specified by whatever mechanism was used for domain
support.  when i used the word 'representation' i was really thinking of
the user or presentation or external side of things.  on that side, lots
of representations would make sense, such as 'localized' ones.  all the
domain support has to do is to make sure, for example, that what it
calls equal is compatible with the psychological view that the users take.

personally, i suppose i'm lucky that English is my mother tongue because
that language is one of the more dominant ones, maybe the most dominant
one (i know little of Chinese, say) for people who talk about computers.
 still, i imagine that the RM part of a RDMBS might not have to make
any concessions toward localization (i have always thought that
localization should have been called globalization.  then it would be
more obvious how much plumbing can be involved!)

p
Jon Heggland - 05 Jul 2005 11:40 GMT
> i may look foolish for jumping into this thread so late, but here's what
> i think anyway.  it just seems to me that the RM itself defines very
[quoted text clipped - 6 lines]
> 'conceptual separation' that is preserved once we try to 'get physical'
> (i'm tempted to underline the word 'separation', but i don't know how!)

I think this is about the same sentiment as saying that domains are
orthogonal to the RM.

> based on this, i'd say that an 'atomic datatype/domain' is one whose
> values the RM is able to relate (ie. perform whatever operations are
> defined to be relational) to values of the domain or other domains
> without knowledge of the structure and operations of the domains
> involved.  in this sense, the domain is a kind of 'black box' to the RM.

Yes. The only thing a RDBMS requires of a domain, is that it must be
possible to ascertain whether two values are equal / the same. It
doesn't need to know exactly how this is done.

> in practice, there must be some agreed protocol between the RM and the
> party that does have such knowledge.  that party could be machine code
> or human interaction, although i think most people would expect it to be
> code.

I'd say that "knowledge" is encoded in the definition of a domain and
the operators that operate on values from that domain. These can be
user-defined or system-defined, and the RM as such does not care. Even
if a particular RDBMS out of the box knows how to compare two integers,
that knowledge isn't part of the RM as such.

> i'd say this
> means that the RM cannot itself even decide equality.  it should
> theoretically always have to inquire of some outside party whether two
> values are 'equal'.

Yes. That outside party is the definition of the equality operator for
the domain in question.

> from an implementation point-of-view, this could mean that domain
> support might make use of the RM without the original RDBMS even
> 'knowing' about it.

I'm not quite sure how to parse this, but I choose to interpret it as
"RDBMSs should be able to support arbitrary user-defined domains---and
it shouldn't be hard to do either". :)

> without being deeply familiar with any so-called RDBMS's, i'll stick my
> neck out and state that most implementations have probably made the
> mistake of not separating the relational operators from those that are
> peculiar to a domain.

Well, I'd say the mistake (or one of them) is poor domain support. I'd
also say that relational operators *are* peculiar to a domain---the
domain of relations. (Or to be more precise, "relation" is a type
generator, and the relational operators are generic operators.)

> > What is a non-atomic datatype/domain?
>
> wouldn't it be one that the RM part of the RDBMS is able to encode or
> decipher or somehow apply without help?

Wouldn't that be the the same as a system-defined domain? "Integer"
would fit that definition in most (all?) implementations, and most
people who use the word "atomic" certainly consider integers atomic.

> > What bad happens if you allow a non-atomic datatype in a relation?
>
> i think this question really means what bad thing happens if a relation
> has a datatype that the RM doesn't have some protocol for presenting to
> another party.

I'd say that just means that all domains must be well-defined. In the
case of relation-valued attributes, the RM/RDBMS certainly has a
protocol for handling relations.
Signature

Jon

Jan Hidders - 04 Jul 2005 20:11 GMT
>>>How do you define atomicity?
>>
[quoted text clipped - 4 lines]
>  
> I can't make heads or tails of this. Operations over a signature?

Yes, that means that they can be described in a programming language in
which your access to the database is restriced to the functions in the
signature. These operations would be the (1) the operations that are
associated with the domains and (2) operations that retrieve the current
instance of a relvar.

>>An unnest operation, for example, cannot be described in that way.
>
> What if I define an unnest operator that "unnests" strings in a
> corresponding manner?

It does not satisfy the definition.

> Anyway, let me rephrase my initial question.
> What is an atomic datatype/domain?

The problem with that question is that it makes an incorrect assumption.
Atomicity is not a property of a domain per se, but rather a property of
how it is treated by certain operations. The more correct question is:
"When does the DBMS treat a certain domain as atomic?". You can derive
the answer to that question from what I said above.

-- Jan Hidders
Jon Heggland - 05 Jul 2005 11:40 GMT
> >>Roughly, I would say that it means that all your data manipulation can
> >>be expressed as operations over a signature that consists only of (1)
[quoted text clipped - 8 lines]
> associated with the domains and (2) operations that retrieve the current
> instance of a relvar.

How does a set domain (e.g. the domain of sets of integers) violate
this? For definiteness, let's associate the normal set operations with
it---union, intersection, subset, cardinality and so on.

(By the way, is you phrase "domains described by an abstract data type"
significant? If so, can you explain what it means? Are you talking about
type generators like set, array, list and so on?)

> >>An unnest operation, for example, cannot be described in that way.
> >
> > What if I define an unnest operator that "unnests" strings in a
> > corresponding manner?
>
> It does not satisfy the definition.

Does that mean that I should not be allowed to define such an operator?

Or that strings are not atomic?

Or that string are atomic and should not be normalised unless and until
I define an unnest_string operator, in which case all my relvars with
strings suddenly are no longer in 1NF?

Does it matter whether my unnest_string is system defined (by the DBMS)
or user-defined?

> > Anyway, let me rephrase my initial question.
> > What is an atomic datatype/domain?
>
> The problem with that question is that it makes an incorrect assumption.
> Atomicity is not a property of a domain per se,

That is news to me. All database textbooks I have used use the phrase
"atomic value" to define 1NF. Which, while obviously different from
"atomic domain", can hardly be interpreted very differently. I also note
that most of them concur with Date that 1NF is implied by the definition
of a relation.

Where do you get your 1NF definition from?

> but rather a property of
> how it is treated by certain operations. The more correct question is:
> "When does the DBMS treat a certain domain as atomic?". You can derive
> the answer to that question from what I said above.

How does that help with normalisation? And how can you say a priori that
a relvar with a set-valued attribute is not in 1NF, if that depends on
the operators of a particular DBMS?
Signature

Jon

Jan Hidders - 05 Jul 2005 20:51 GMT
>>>>Roughly, I would say that it means that all your data manipulation can
>>>>be expressed as operations over a signature that consists only of (1)
[quoted text clipped - 12 lines]
> this? For definiteness, let's associate the normal set operations with
> it---union, intersection, subset, cardinality and so on.

By itself it doesn't. Until you start introducing operations that treat
it as non-atomic. Then it does.

> (By the way, is you phrase "domains described by an abstract data type"
> significant? If so, can you explain what it means? Are you talking about
> type generators like set, array, list and so on?)

It's only there to emphasize that the only allowed access is through the
functions defined for it and that you cannot access the internal
implementation / representation. I believe that in programming language
circles this is not always assumed.

>>>>An unnest operation, for example, cannot be described in that way.
>>>
[quoted text clipped - 4 lines]
>
> Does that mean that I should not be allowed to define such an operator?

No, just that the fields that can be unnested with that operator cannot
be considered atomic anymore.

> Or that string are atomic and should not be normalised unless and until
> I define an unnest_string operator, in which case all my relvars with
> strings suddenly are no longer in 1NF?

Yes. Although the "should not be normalized" is a bit odd since the
whole point is that the defintion says that in that case they already
are normalized.

> Does it matter whether my unnest_string is system defined (by the DBMS)
> or user-defined?

Yes. Note btw. that if user-defined functions are restricted to the
domains (their input and output types are only domains) then you cannot
define such an operation as a user-defined function.

>>>Anyway, let me rephrase my initial question.
>>>What is an atomic datatype/domain?
[quoted text clipped - 5 lines]
> "atomic value" to define 1NF. Which, while obviously different from
> "atomic domain", can hardly be interpreted very differently.

Absolutely, the point of the definition is to make more precise what it
means that the values in a domain are treated as atomic values.

> How does that help with normalisation?

Since it defines atomicity it tells you when you are in 1NF or not. Just
to be clear on this, I regard this discussion separate from the question
whether you actually *should* be in 1NF or not. All I'm arguing here is
that it is not a meaningless concept and can be defined reasonably well.

> And how can you say a priori that
> a relvar with a set-valued attribute is not in 1NF, if that depends on
> the operators of a particular DBMS?

You cannot. Isn't that a big problem? I don't think so. To understand
why it is important to get a good feeling of which operations break the
atomicity of certain domains. Also consider the following. Suppose a new
user-defined domain is added. If there is no operation defined on it
that exposes its internals by mapping it to a relation, then even an
unnest operation cannot break its atomicity.

-- Jan Hidders
Jon Heggland - 05 Jul 2005 21:58 GMT
> > How does a set domain (e.g. the domain of sets of integers) violate
> > this? For definiteness, let's associate the normal set operations with
> > it---union, intersection, subset, cardinality and so on.
>
> By itself it doesn't. Until you start introducing operations that treat
> it as non-atomic. Then it does.

Does the common substring operation treat a string as non-atomic?

> > How does that help with normalisation?
>
> Since it defines atomicity it tells you when you are in 1NF or not. Just
> to be clear on this, I regard this discussion separate from the question
> whether you actually *should* be in 1NF or not.

That is nice to know. I assumed (as is generally done, I think) that
relations in 1NF are in some sense better than relations not in 1NF (if
such a thing is possible). If that is not the case, I find this
discussion rather pointless.

> > And how can you say a priori that
> > a relvar with a set-valued attribute is not in 1NF, if that depends on
> > the operators of a particular DBMS?
>
> You cannot.

Yet you did say "Another option is to treat the field as a set-valued
field (since it apparently can contain 0 or 1 values) which
means that you are not in 1NF and should first normalize such that you
are."
Signature

Jon

Jan Hidders - 05 Jul 2005 22:31 GMT
>>>How does a set domain (e.g. the domain of sets of integers) violate
>>>this? For definiteness, let's associate the normal set operations with
[quoted text clipped - 4 lines]
>  
> Does the common substring operation treat a string as non-atomic?

No.

>>>How does that help with normalisation?
>>
[quoted text clipped - 6 lines]
> such a thing is possible). If that is not the case, I find this
> discussion rather pointless.

Don't worry. I do believe that relations should preferrably be in this
1NF, but I think we should not do too many things at the same time and
first make sure we agree on what it exactly says.

>>>And how can you say a priori that
>>>a relvar with a set-valued attribute is not in 1NF, if that depends on
[quoted text clipped - 6 lines]
> means that you are not in 1NF and should first normalize such that you
> are."

Good point. I'm making the silent assumption here that if you allow
nested relations, then you probably also have the nest/unnest relations
as are usually found in the nested relational algebra. If you don't have
those then you are arguably treating it more as an atomic value than a
set value.

-- Jan Hidders
Jon Heggland - 06 Jul 2005 09:28 GMT
> > Yet you did say "Another option is to treat the field as a set-valued
> > field (since it apparently can contain 0 or 1 values) which
[quoted text clipped - 4 lines]
> nested relations, then you probably also have the nest/unnest relations
> as are usually found in the nested relational algebra.

The quote didn't mention nested relations, it is about sets. Do you make
that assumption about sets as well? Why? Why not about strings?

> If you don't have
> those then you are arguably treating it more as an atomic value than a
> set value.

So the presence of nest/unnest is the problem? For the sake of the
argument, let us say that a relation with relation-valued attributes is
in 1NF iff there are no nest/unnest operators.

Given the above definition, do you think that a relation with relation-
valued attributes in the presence of nest/unnest should be "normalised"
to reach 1NF? Why?

(Btw, I'm not sure what you mean by "nested relational algebra"---Date's
GROUP and UNGROUP don't affect the other operators in any way.)
Signature

Jon

Jan Hidders - 06 Jul 2005 19:52 GMT
>>>Yet you did say "Another option is to treat the field as a set-valued
>>>field (since it apparently can contain 0 or 1 values) which
[quoted text clipped - 7 lines]
> The quote didn't mention nested relations, it is about sets. Do you make
> that assumption about sets as well? Why?

These sets are very similar to unary relations. Treating them
differently would make not much sense because there are simple
operations that transform one into the other.

> Why not about strings?

They are not very similar to relations. :-) Besides, most nested
relational algebras I know are not equipped with an operation for
unnesting strings.

>>If you don't have
>>those then you are arguably treating it more as an atomic value than a
[quoted text clipped - 3 lines]
> argument, let us say that a relation with relation-valued attributes is
> in 1NF iff there are no nest/unnest operators.

It's close, but not exactly what the definition says. But nevermind, I
think it's close enough so let's assume it for the sake of the argument.

> Given the above definition, do you think that a relation with relation-
> valued attributes in the presence of nest/unnest should be "normalised"
> to reach 1NF? Why?

It ensures that the operations at relation level are essentially those
of the flat relational algebra and that they all work, in some sense, at
the same level. The theory of query optimization for these operation is
reasonably well understood, which is far less the case for operations
that mix the levels of computation such as the nest and unnest do. In
some sense we are forcing here the user to keep things simple so that
the job of the query optimizer becomes easier.

> (Btw, I'm not sure what you mean by "nested relational algebra"---Date's
> GROUP and UNGROUP don't affect the other operators in any way.)

I use the term as it is commonly used by database researchers and that
is the one you find if you google for "nested relational algebra". I'm
not sure what you mean by "affect the other operators" here.

-- Jan Hidders
Jon Heggland - 07 Jul 2005 11:04 GMT
> > The quote didn't mention nested relations, it is about sets. Do you make
> > that assumption about sets as well? Why?
[quoted text clipped - 6 lines]
>
> They are not very similar to relations. :-)

A set can be transformed into a unary relation with a simple operation.
A string can be transformed into a binary relation (integer and
character) with a simple operation.

> Besides, most nested
> relational algebras I know are not equipped with an operation for
> unnesting strings.

That's just because it's a pretty useless thing to do. :) My point it
that the difference between sets and strings in this context is pragma,
not logic.

> > Given the above definition, do you think that a relation with relation-
> > valued attributes in the presence of nest/unnest should be "normalised"
[quoted text clipped - 3 lines]
> of the flat relational algebra and that they all work, in some sense, at
> the same level.

That does not help the least. Normalisation is for base relvars. Even if
you normalise them all, derived relvars and query expressions might/will
still be unnormalised if you allow nest/unnest.

> The theory of query optimization for these operation is
> reasonably well understood, which is far less the case for operations
> that mix the levels of computation such as the nest and unnest do. In
> some sense we are forcing here the user to keep things simple so that
> the job of the query optimizer becomes easier.

I think your objection here is to the existence and complexity of nested
version of the standard relational algebra operators. Date does, too---
that's why he discards them. He writes:

"Now, this idea has been extensively researched [...] under such names
as 'nested relations' or 'NFNF relations' (NFNF stands for 'non first
normal form' [...]). However, I do not take the approach of references
[Jaeschke; Roth, Korth and Silberschatz], which add many new operators
to the relational algebra and calculus to handle 'nested relations', and
thus definitely do breach first normal form (as the term 'non first
normal form' suggests). Rather, I propose a simple generalisation of the
relational algebra extend operator to permit (among other things) the
use of relation-valued expressions to define attribute values. In this
way, I stay within the spirit of the classical relational model, while
still obtaining some of the benefits, such as they may be, of 'nested
relations.'"

Now, I think we surely agree that complicating the relational algebra by
adding 'nested' versions of the classical operators is a bad idea.

I also think we can agree that set-valued or relation-valued attributes
are perfectly okay if you exclude nest/unnest (and the 'nested'
classical operators, of course).

Thus, I ask you if we can agree that generalising extend to support
nest/unnest functionality in the way Date suggests is also okay, since
it does not imply adding nested versions of the classical operators; and
that it does not affect normalisation.

> > (Btw, I'm not sure what you mean by "nested relational algebra"---Date's
> > GROUP and UNGROUP don't affect the other operators in any way.)
>
> I use the term as it is commonly used by database researchers and that
> is the one you find if you google for "nested relational algebra".
> I'm not sure what you mean by "affect the other operators" here.

I mean that "the operations at relation level are essentially those of
the flat relational algebra and that they all work, in some sense, at
the same level." Join, project, select/restrict, union, difference and
so on remain exactly the same. You have to be able to compare two
relation values for equality, but that is trivial.
Signature

Jon

Jan Hidders - 07 Jul 2005 20:06 GMT
>>>The quote didn't mention nested relations, it is about sets. Do you make
>>>that assumption about sets as well? Why?
[quoted text clipped - 10 lines]
> A string can be transformed into a binary relation (integer and
> character) with a simple operation.

That requires logarithmic space, and not constant space as my
transformation. So it is arguably more complex.

>>Besides, most nested
>>relational algebras I know are not equipped with an operation for
[quoted text clipped - 3 lines]
> that the difference between sets and strings in this context is pragma,
> not logic.

My definition of 1NF doesn't make that distinction.

>>>Given the above definition, do you think that a relation with relation-
>>>valued attributes in the presence of nest/unnest should be "normalised"
[quoted text clipped - 7 lines]
> you normalise them all, derived relvars and query expressions might/will
> still be unnormalised if you allow nest/unnest.

Even then it still helps. You can always split the query in two parts:
(1) the computation of the flattened final result and (2) nesting it to
obtain the actual result. Those separate phases are easier to optimise
than when they are mixed.

>>The theory of query optimization for these operation is
>>reasonably well understood, which is far less the case for operations
[quoted text clipped - 4 lines]
> I think your objection here is to the existence and complexity of nested
> version of the standard relational algebra operators.

Not really. The operators are actually very simple (only nest and unnest
in the usual algebra) but these simple operations can still lead to very
complex optimisation problems.

> Date does, too---
> that's why he discards them. He writes:
[quoted text clipped - 11 lines]
> still obtaining some of the benefits, such as they may be, of 'nested
> relations.'"

*sigh* Let's just say I disagree with Date here on several points.

> Now, I think we surely agree that complicating the relational algebra by
> adding 'nested' versions of the classical operators is a bad idea.

Ok.

> I also think we can agree that set-valued or relation-valued attributes
> are perfectly okay if you exclude nest/unnest (and the 'nested'
> classical operators, of course).

Yes.

> Thus, I ask you if we can agree that generalising extend to support
> nest/unnest functionality in the way Date suggests is also okay, since
> it does not imply adding nested versions of the classical operators; and
> that it does not affect normalisation.

I'm sorry but no. It would still complicate optimization and in fact
would it make even more difficult because it adds a calculus-aspect to
the algebra that makes it harder to recognize optimizable operations.

-- Jan Hidders
Jon Heggland - 08 Jul 2005 11:06 GMT
> >>These sets are very similar to unary relations. Treating them
> >>differently would make not much sense because there are simple
[quoted text clipped - 10 lines]
> That requires logarithmic space, and not constant space as my
> transformation. So it is arguably more complex.

Please elaborate. Assuming for the sake of the argument that you are
right, so what?

> >>Besides, most nested
> >>relational algebras I know are not equipped with an operation for
[quoted text clipped - 5 lines]
>
> My definition of 1NF doesn't make that distinction.

Your definition of 1NF seems singularly useless if you cannot use it to
determine the quality of a relvar in any way---unless you introduce a
lot of unstated and pragmatic assumptions. It is also rather
complicated, imo, since you have to refer to operations over signatures
and proper classes as opposed to sets/domains.

Let me define Jon's Normal Form (JNF): A relation is in JNF if it has an
even number of columns. If a particular DBMS cannot handle relations
with odd numbers of columns (or is inefficient at it), you should
normalise. :)

> Even then it still helps. You can always split the query in two parts:
> (1) the computation of the flattened final result and (2) nesting it to
[quoted text clipped - 13 lines]
> in the usual algebra) but these simple operations can still lead to very
> complex optimisation problems.

More complex than (say) summarize / group by? Isn't a nested relation
pretty equivalent to the "grouped table" that is produced in SQL?

I see your point, but for me it smacks of the kind of reasoning that
leads you to "denormalise" relations because join is too complex and
slow. It may be the right choice at the implementation level, but we
should separate that from the logical level.
Signature

Jon

Jan Hidders - 08 Jul 2005 22:28 GMT
>>>>These sets are very similar to unary relations. Treating them
>>>>differently would make not much sense because there are simple
[quoted text clipped - 13 lines]
> Please elaborate. Assuming for the sake of the argument that you are
> right, so what?

It indicates that in one case there is a larger similarity than in the
other because you meed more work to do the transformation. You're not
asking me to explain the stated complexity classes of the operations,
are you?

>>>>Besides, most nested
>>>>relational algebras I know are not equipped with an operation for
[quoted text clipped - 9 lines]
> determine the quality of a relvar in any way---unless you introduce a
> lot of unstated and pragmatic assumptions.

Usually it is relatively well-known which operations are possible in a
DBMS and which aren't. That makes it in practice actually a quite stable
notion even though it is a relative one.

> It is also rather
> complicated, imo, since you have to refer to operations over signatures
> and proper classes as opposed to sets/domains.

The definition does not refer to proper classes, and it is always a bit
dangerous to call something complicated just because you had trouble
understanding it. :-) As any good database researcher you probably know
and understand the notion of "genericity". Just as a test to see if you
really understood it, can you tell me the relationship between this
notion and the notion of 1NF I defined?

> I see your point, but for me it smacks of the kind of reasoning that
> leads you to "denormalise" relations because join is too complex and
> slow. It may be the right choice at the implementation level, but we
> should separate that from the logical level.

I think the situation is a bit more complex than that. For me there are
actually two logical levels: one at the conceptual level and one at the
external level (as defined in the ANSI/SPARC model). For the external
level I would agree that the form of the data model should be dictated
by purely logical arguments. It should simply properly model how the
users see their data. No more, no less.

However, at the conceptual level the task of the model becomes more
complex. Its job is to unify all the different models of the different
user groups, but in a relatively implementation independent way. That
means, for example, that if two groups want to nest the relations
differently, then they probably should  be modeled at this level by flat
relations. It is for this level that I think 1NF is still useful.

-- Jan Hidders
Jon Heggland - 09 Jul 2005 13:04 GMT
> >>>A set can be transformed into a unary relation with a simple operation.
> >>>A string can be transformed into a binary relation (integer and
[quoted text clipped - 10 lines]
> asking me to explain the stated complexity classes of the operations,
> are you?

Well, yes, I actually am. Sorry if it is trivial, but I don't see the
difference. Or the logarithm.

Anyway, that is an implementation matter. The transformation at the
logical level is trivial.

> Usually it is relatively well-known which operations are possible in a
> DBMS and which aren't. That makes it in practice actually a quite stable
> notion even though it is a relative one.

Fair enough. It is the odd one out of the normal forms, though, since
the others aren't relative in that way. And it has the unfortunate
consequence that higher normal forms don't imply 1NF, if I understand
you correctly.

> > It is also rather
> > complicated, imo, since you have to refer to operations over signatures
[quoted text clipped - 3 lines]
> dangerous to call something complicated just because you had trouble
> understanding it. :-)

Complexity can be measured pretty objectively, no? :)

> As any good database researcher

How do you know I'm any good? :)

> you probably know
> and understand the notion of "genericity". Just as a test to see if you
> really understood it, can you tell me the relationship between this
> notion and the notion of 1NF I defined?

It would be easier if you would care to restate your 1NF definition,
including (if necessary) the definition of atomicity (and other non-
trivial terms), but I'll have a go. :)

"Relation" is a generic (is "generic type" bad form?), and allowing
operators that manipulate generics breaks atomicity and leads to
paradoxes and optimisation problems. Except the classic relational
operator; those are okay.

> I think the situation is a bit more complex than that. For me there are
> actually two logical levels: one at the conceptual level and one at the
[quoted text clipped - 9 lines]
> differently, then they probably should  be modeled at this level by flat
> relations. It is for this level that I think 1NF is still useful.

Fair enough, I guess.
Signature

Jon

Jan Hidders - 10 Jul 2005 10:16 GMT
>>>>>A set can be transformed into a unary relation with a simple operation.
>>>>>A string can be transformed into a binary relation (integer and
[quoted text clipped - 13 lines]
> Well, yes, I actually am. Sorry if it is trivial, but I don't see the
> difference. Or the logarithm.

Come on, Jon. How much tape does the Turing machine need that computes
this transformation if n is the size of the input?

> Anyway, that is an implementation matter. The transformation at the
> logical level is trivial.

That it requires logarithmic space is implementation independant and an
objective measure of its complexity.

>>Usually it is relatively well-known which operations are possible in a
>>DBMS and which aren't. That makes it in practice actually a quite stable
[quoted text clipped - 4 lines]
> consequence that higher normal forms don't imply 1NF, if I understand
> you correctly.

That is correct.

>>As any good database researcher  
>
[quoted text clipped - 13 lines]
> paradoxes and optimisation problems. Except the classic relational
> operator; those are okay.

Hm, that was actually not the notion of genericity I meant. What I meant
is the notion that is associated with query languages and mentioned for
example in page 3 paragraph 3 of
http://citeseer.ist.psu.edu/vandenbussche01applications.html

-- Jan Hidders
Jon Heggland - 10 Jul 2005 18:21 GMT
> Come on, Jon. How much tape does the Turing machine need that computes
> this transformation if n is the size of the input?

Apparently, there are holes in my education---I'm more an engineer than
a scientist. The complexity I'm used to computing on algorithms deals
with time or number of operations; both the set and string algorithms
seem linear to me. But if you won't bother to explain, no big deal. I
still say both operations are trivial to perform. And why draw the line
at that particular place?
Signature

Jon

Jan Hidders - 12 Jul 2005 22:11 GMT
>>Come on, Jon. How much tape does the Turing machine need that computes
>>this transformation if n is the size of the input?
[quoted text clipped - 3 lines]
> with time or number of operations; both the set and string algorithms
> seem linear to me.

And engineers don't have to know about space complexity of algorithms?
:-) But time complexity is also different. The unary relation can be
transformed in linear time, but the string transformation requires
O(n.log(n)) steps.

> But if you won't bother to explain, no big deal.

I'll give you hint. For unnesting the string you need a counter. How
much space will that counter take on the Turing tape. How much work, on
average, will it take to increment that counter?

> I
> still say both operations are trivial to perform. And why draw the line
> at that particular place?

The linear-time constant-space transformation are traditionally seen as
the class of computations that just do some reformatting but don't
really compute anything new. Of course, this is to some extent a matter
of taste and may under certain circumstances not be an appropriate
definition, but if you look at what computations are in that class then
it usually makes sense.

-- Jan Hidders
Jon Heggland - 13 Jul 2005 07:54 GMT
> > Apparently, there are holes in my education---I'm more an engineer than
> > a scientist. The complexity I'm used to computing on algorithms deals
[quoted text clipped - 3 lines]
> And engineers don't have to know about space complexity of algorithms?
> :-)

I guess not. :) I can estimate the memory use of my algorithms, but I
haven't felt the need for turing tape before now.

> But time complexity is also different. The unary relation can be
> transformed in linear time, but the string transformation requires
> O(n.log(n)) steps.

Seriously?

> > But if you won't bother to explain, no big deal.
>
> I'll give you hint. For unnesting the string you need a counter. How
> much space will that counter take on the Turing tape.

I'm not versed in storing counters on turing tapes, but in my favourite
programming languages, that counter takes constant space. Unless the
string is *really* long (like over four billion characters)---is that
what you are talking about? I'll admit there is a logarithm there, but
do you really consider that significant? Is that an issue for turing
tape, anyway?

> How much work, on average, will it take to increment that counter?

You'll have to do it as many times as the length of the string. You'll
have to hint more if I am to find the hidden logarithm here. :)
Signature

Jon

Jan Hidders - 13 Jul 2005 22:52 GMT
>>>Apparently, there are holes in my education---I'm more an engineer than
>>>a scientist. The complexity I'm used to computing on algorithms deals
[quoted text clipped - 6 lines]
> I guess not. :) I can estimate the memory use of my algorithms, but I
> haven't felt the need for turing tape before now.

Yeah, that's right, you're an engineer, you don't need to know theory. :-(

>>But time complexity is also different. The unary relation can be
>>transformed in linear time, but the string transformation requires
>>O(n.log(n)) steps.
>
> Seriously?

Good Lord. So let me get this straight. You are close to getting your
PhD in computer science but you are not capable of giving correctly the
time and space complexity in big-Oh notation of even a very simple
algorithm? I hope for your sake and that of your university that you are
trolling here. Seriously.

>>>But if you won't bother to explain, no big deal.
>>
[quoted text clipped - 5 lines]
> string is *really* long (like over four billion characters)---is that
> what you are talking about?

Of course. We are talking about the worst-case theoretical complexity
here. Note btw. that when we start talking about DNA data and satellite
information such sizes are suddenly not so theoretical anymore.

> Is that an issue for turing tape, anyway?

Yes, even more so. Btw. it's named after Alan Turing so you should write
it with a capital.

-- Jan Hidders
Jon Heggland - 14 Jul 2005 14:19 GMT
> > I guess not. :) I can estimate the memory use of my algorithms, but I
> > haven't felt the need for turing tape before now.
>
> Yeah, that's right, you're an engineer, you don't need to know theory. :-(

Just like scientists don't need to know practise, I guess. What is the
complexity of counting sort? Have you heard about the unit cost
assumption?

Estimating complexity on Turing machines is one thing; doing it for
usable programming languages / systems is slightly different. I know
theory, but not *all* theory. Sorry, but we can't all be omniscient.

> >>But time complexity is also different. The unary relation can be
> >>transformed in linear time, but the string transformation requires
[quoted text clipped - 6 lines]
> time and space complexity in big-Oh notation of even a very simple
> algorithm?

Well, I have the (admittedly small) comfort that apparently, my
professor in algorithm theory is as stupid or ignorant as I am. :)

> I hope for your sake and that of your university that you are
> trolling here. Seriously.

I wouldn't need to if you would state your views clearly instead of
condescendingly dispensing you wisdom piecemeal and in hints and
riddles. It is not a good strategy if you want to clarify your arguments
rather than obscure them behind smokescreens, as I am pretty sure you
know.

But thank you for your kind concern. I presume it extends to the authors
of textbooks in algorithm theory as well.

> >>I'll give you hint. For unnesting the string you need a counter. How
> >>much space will that counter take on the Turing tape.
[quoted text clipped - 6 lines]
> Of course. We are talking about the worst-case theoretical complexity
> here.

Sophistry. This is the kind of thing that gives theory a bad name.

Anyway, your notions of 1NF and normalisation is on the one hand based
on the actual operators implemented by a particular DBMS (or pragmatical
assumptions thereof) and vague optimisation issues, and on the other
hand ultra-theoretical nitpicking such as this? And your 1NF definition
requires deep understanding of Russel's and Cantor's paradoxes,
relational optimisation, domain/set/class theory and Turing machines?
YMMV, but I prefer Date's definition.

> Note btw. that when we start talking about DNA data and satellite
> information such sizes are suddenly not so theoretical anymore.

Thank you very much. I never would have though of that---ensuring that
my counters are big enough!
Signature

Jon

Misha Dorman - 19 Jul 2005 00:13 GMT
>>>I guess not. :) I can estimate the memory use of my algorithms, but I
>>>haven't felt the need for turing tape before now.
>>
> Estimating complexity on Turing machines is one thing; doing it for
> usable programming languages / systems is slightly different. I know
> theory, but not *all* theory. Sorry, but we can't all be omniscient.

The programming language (and the constant space/time assumptions for
"int") are an abstraction -- and (like all abstractions) are leaky[1].

Will you need an 8-bit, 16-bit, 32-bit or 64-bit counter (and pointers,
etc)? They take up different "constant" amounts of space, you know (and
don't even start on the speed impact of the increased cache required
when you deal with lots of them).

>>>>But time complexity is also different. The unary relation can be
>>>>transformed in linear time, but the string transformation requires
>>>>O(n.log(n)) steps.

On my old 6502 (8-bit registers), 16-bit multiply was a macro/subroutine
and was significantly slower than 8-bit. Now you are probably using a
32-bit CPU, but if you ever go above ~4G then you will notice that
64-bit ops are _still_ slower than 32-bit ones. Our appetite for data
has increased too, of course -- a single DVD is ~10^10 bytes long and
HDTV probably adds another power of ten.

<oldfartmode>I dunno, kids these days with their fancy dual-core Athlons
-- whatever happened to coding assembler to fit in a 256 byte page? Back
then we knew the cost of a integer multiply mumble mumble <oldfartmode/>[2]

>>>I'm not versed in storing counters on turing tapes, but in my favourite
>>>programming languages, that counter takes constant space. Unless the
[quoted text clipped - 5 lines]
>
> Sophistry. This is the kind of thing that gives theory a bad name.

No. Reality.

The Turing model isn't a perfect abstraction of the performance
characteristics either, but it has some nice properties:

* not subject to vagaries of CPU/language design fashion
* doesn't "miss" anything important (i.e. it is worst case)

Engineering is the process of balancing the constraints of physical
reality with cost and requirements. To do that you must _know_ the
constraints.

Theory is practical, remember :-)

Misha

[1] Jon better not apply for a job at Fog Creek, me thinks
(http://www.joelonsoftware.com/articles/fog0000000319.html).

[2] Oh look, I've simultaneously answered the "what is XML good for"
thread too :-)
Marshall  Spight - 19 Jul 2005 07:16 GMT
> >>>I guess not. :) I can estimate the memory use of my algorithms, but I
> >>>haven't felt the need for turing tape before now.
[quoted text clipped - 8 lines]
> Will you need an 8-bit, 16-bit, 32-bit or 64-bit counter (and pointers,
> etc)? They take up different "constant" amounts of space, you know

Well oh my GOD you could have knocked me over with a feather when
I read this. And here all along I have been asuming that 8, 16, 32
and 64 bit counters all took the SAME amount of space. I am going
to have to go back over a BUNCH of my code and make sure I don't
have some hidden bugs as a result of learning this.

> > Sophistry. This is the kind of thing that gives theory a bad name.
>
> No. Reality.

I'm going to go with sophistry. Everyone raise your hand
who's seen a single string value longer than 2^32. Can we have
the house lights up for a moment? [scans audience] Didn't think
so.

The *reality* of the situation is that 32 bits is good enough
for about 99% of everything, and 64 bits is good enough for
the rest.

Let's say you have something going on that you want to count.
Let's say it happens quite often: one million times a second.
There's not much of anything that happens that fast that you
can actually work with on today's hardware, but let's just
say. Let's say you use a 64 bit counter to count these events.
Now your boss wants to know the space requirements for this
counter. Do you tell him:
1) O(1), or
2) O(n), because the counter is going to overflow in HALF A MILLION
YEARS.

(Brought to you by today's clever use of Google:)
http://www.google.com/search?q=2%5E64+microseconds+in+years

Anyone here who can write a significant application that will run
continuously for half a million years, I will give a cookie.
Pretty sure *I* can't, despite having spent the last 25 years
doing little else except write applications, and being considered
fairly good at it.

Marshall
Jon Heggland - 19 Jul 2005 09:40 GMT
> > Estimating complexity on Turing machines is one thing; doing it for
> > usable programming languages / systems is slightly different. I know
> > theory, but not *all* theory. Sorry, but we can't all be omniscient.
>
> The programming language (and the constant space/time assumptions for
> "int") are an abstraction -- and (like all abstractions) are leaky[1].

Hm? The Turing machine is even more of an abstraction. Neither technique
is ideal. It depends on the circumstances which one is best to use, and
the circumstances I am used to strongly favour one over the other.

> <oldfartmode>I dunno, kids these days with their fancy dual-core Athlons
> -- whatever happened to coding assembler to fit in a 256 byte page?

Or using microcode to construct instruction sets? That was good fun,
actually.

> > Sophistry. This is the kind of thing that gives theory a bad name.
>
> No. Reality.

Everything is reality at some level. But sorry; I meant the debating
technique as much as the idea of counters taking log n space.

> Engineering is the process of balancing the constraints of physical
> reality with cost and requirements. To do that you must _know_ the
> constraints.

Agreed. But it is no better to be unaware of "practical" complexity
assessment than to quietly ignore the complexity associated with strings
of unbounded lengths (unless they are specifically called for), I think.

> [1] Jon better not apply for a job at Fog Creek, me thinks
> (http://www.joelonsoftware.com/articles/fog0000000319.html).

I don't know where you got the impression that I am unaware of (? or
opposed to?) that kind of considerations.
Signature

Jon

Paul - 06 Jul 2005 20:08 GMT
> The quote didn't mention nested relations, it is about sets. Do you make
> that assumption about sets as well? Why? Why not about strings?

From my view: if you have a domain of relations, they exist in a
separate universe to the main relations in the DBMS.

From your view: relations in a domain of relations exist in the same
universe as the main relations of the DBMS.

The inner structure of sets or strings is hidden from the relational
operators, so why make an exception for a relation-valued domain?

If you think about the mapping from predicate logic to the relational
model, a "nested" relation in the way you propose corresponds to
second-order logic.

First order logic basically means that the arguments to your
propositions can't themselves be propositions.

But this is the crucial bit I think: they can be values (strings) that
can be interpreted as propositions by some other structure.

For example consider the statement:

"Fred has employee_id 123 and the following set of
statements are true: {'Fred has phone number 2345',
'Fred has phone number 3456' }"

from the nested viewpoint, the relational part of the DBMS natively
understands the second-order reference to the propositions referred to
inside the outer proposition.

from the atomic viewpoint, the information about the phone numbers is
just a meaningless domain value to the relational part of the DBMS - it
needs the type engine to make sense of it.

Now it is possible I guess that the nested relational model adds that
little bit of second order logic in such a way that things are still
consistent and you also get increased expressive power, but I've not
really seen any example of this. It would entail somehow joining an
"nested" relation with a standard relation I think.

Paul.
Jon Heggland - 07 Jul 2005 11:04 GMT
> The inner structure of sets or strings is hidden from the relational
> operators, so why make an exception for a relation-valued domain?

I don't, really. I should have been clearer about this: I don't support
nested relational algebra, or "mixing of levels". The inner structure of
a relational-valued attribute should be no more and no less hidden from
relational operators than that of a set-valued or string-valued
attribute. But if we allow sets, we should allow relations.

I think Date's GROUP (nest) operator is very hard to argue against---it
cannot be said to violate encapsulation in any way. It is just the
observation that with the operator

EXTEND relation ADD expression AS attrname

, "expression" can be an expression of any type. The UNGROUP (unnest) is
not quite as simple, so I am more skeptical to it, but disallowing it
does not impact the discussion at hand.

> If you think about the mapping from predicate logic to the relational
> model, a "nested" relation in the way you propose corresponds to
[quoted text clipped - 5 lines]
> But this is the crucial bit I think: they can be values (strings) that
> can be interpreted as propositions by some other structure.

Then the crucial bit is not whether relation-valued attributes can
occur, but whether they are interpreted as (sets of) propositions by the
"classic" relational operators. And this is in fact the difference
between Date's approach and the NFNF approach, I think.

> For example consider the statement:
>
[quoted text clipped - 5 lines]
> understands the second-order reference to the propositions referred to
> inside the outer proposition.

This is the NFNF approach, I presume.

> from the atomic viewpoint, the information about the phone numbers is
> just a meaningless domain value to the relational part of the DBMS - it
> needs the type engine to make sense of it.

This is Date's approach.

I think we agree; the only point of disagreement would be whether the
UNGROUP operator "understands" the propositions it is ungrouping.

I'd say it doesn't (it doesn't *have to*, at least). You could define an
unnesting operator that works on sets or strings---which the "relational
part of the DBMS" does *not* understand---that would be isomorph to
UNGROUP. Therefore, "understanding" is not required for UNGROUP, and the
relational part of the DBMS can just ignore it.
Signature

Jon

Jon Heggland - 06 Jul 2005 09:28 GMT
> > Does it matter whether my unnest_string is system defined (by the DBMS)
> > or user-defined?
>
> Yes.

Are you serious? If unnest_string is system defined, relations with
string attributes are in 1NF, but if the exact same operator is user
defined, they are not? Why on earth would you want to make such a
distinction?

> Note btw. that if user-defined functions are restricted to the
> domains (their input and output types are only domains) then you cannot
> define such an operation as a user-defined function.

Let me try to define such an operator.

unnest_string takes a relation IN and an attribute name A as arguments,
returns a relation OUT. The type of attribute A in IN is character
string.

First, I derive an array of tuples from IN, in order to iterate over it.
Alternatively, I use a cursor; the point is to access tuples one at a
time.

For each tuple T, I look at the string value S of attribute A.

For each character C in S (I use length and substring operators here), I
insert a tuple into OUT (if it is not already there), consisting of the
tuple T without A, but with a new attribute B, the value of which is C
for the new tuple in question.

Is there anything wrong with this?
Signature

Jon

Jan Hidders - 06 Jul 2005 19:38 GMT
>>>Does it matter whether my unnest_string is system defined (by the DBMS)
>>>or user-defined?
[quoted text clipped - 5 lines]
> defined, they are not? Why on earth would you want to make such a
> distinction?

Hmm. You're right. I have to apologize here because I don't know why I
said that. The definition I gave doesn't even mention that distinction.

>>Note btw. that if user-defined functions are restricted to the
>>domains (their input and output types are only domains) then you cannot
[quoted text clipped - 5 lines]
> returns a relation OUT. The type of attribute A in IN is character
> string.

Ah, but now you are using the domain or relations, right? There is a
problem with that domain. It doesn't exist. The collection of all
relations is a proper class, and not a set, but domains have to be sets.

-- Jan Hidders
Jon Heggland - 07 Jul 2005 09:25 GMT
> >>Note btw. that if user-defined functions are restricted to the
> >>domains (their input and output types are only domains) then you cannot
[quoted text clipped - 9 lines]
> problem with that domain. It doesn't exist. The collection of all
> relations is a proper class, and not a set, but domains have to be sets.

You'll have to educate me on the difference between "proper class" and
"domain", I'm afraid. The term "class" is used for so many slightly
different things.

Be that as it may, what is the problem?

Should I be forbidden from treating "relation" as a (generic) domain
when defining this operator? Why?

Would you allow this operator if it were system-defined?
Signature

Jon

Jan Hidders - 07 Jul 2005 19:33 GMT
>>>>Note btw. that if user-defined functions are restricted to the
>>>>domains (their input and output types are only domains) then you cannot
[quoted text clipped - 13 lines]
> "domain", I'm afraid. The term "class" is used for so many slightly
> different things.

http://en.wikipedia.org/wiki/Class_(set_theory)

> Should I be forbidden from treating "relation" as a (generic) domain
> when defining this operator? Why?

Because by definition it isn't, and redefining the notion of domain such
that it is, is not that easy without either running into paradoxes or
getting a notion which it is almost impossible to reason about.

> Would you allow this operator if it were system-defined?

I am not disallowing anything.

-- Jan Hidders
Jon Heggland - 08 Jul 2005 10:15 GMT
> >>Ah, but now you are using the domain or relations, right? There is a
> >>problem with that domain. It doesn't exist. The collection of all
[quoted text clipped - 5 lines]
>
> http://en.wikipedia.org/wiki/Class_(set_theory)

Call me stupid, but you still have to explain to me why the
"collection" of relations is not a set. I can't figure out from
Wikipedia what it is that disqualifies something from being a set.

Are you saying that the "domain of sets" is not a domain either?

And that relation-valued and set-valued attributes are contradictions in
terms, since relation and set are not domains?

> > Should I be forbidden from treating "relation" as a (generic) domain
> > when defining this operator? Why?
>
> Because by definition it isn't, and redefining the notion of domain such
> that it is, is not that easy without either running into paradoxes or
> getting a notion which it is almost impossible to reason about.

Can you give me any examples of trouble arising from this? And an
explanation why the relational operators do not run into paradoxes? Or
do they?

> > Would you allow this operator if it were system-defined?
>
> I am not disallowing anything.

Please don't be coy or cryptic. Do you mean that *you* do not disallow
anything, but mathematical theory does? Or that being forbidden and
being disallowed is different?
Signature

Jon

Jan Hidders - 08 Jul 2005 22:53 GMT
>>>>Ah, but now you are using the domain or relations, right? There is a
>>>>problem with that domain. It doesn't exist. The collection of all
[quoted text clipped - 9 lines]
> "collection" of relations is not a set. I can't figure out from
> Wikipedia what it is that disqualifies something from being a set.

That's not so easy to explain, but I'll give a hint. Do you know about
Russel's paradox? The contradiction that appears when you assume the
existense of the set of all sets? A solution to that paradox is to
accept that it is actually not a set, at least not in the sense that all
the usual axioms of set theory apply to it.  In some sense you might say
that is is "too large" to be a set. The collection of all relations has
the same problem. For more explanation you can probably ask any
mathematician at your institute.

> Are you saying that the "domain of sets" is not a domain either?

Indeed.

> And that relation-valued and set-valued attributes are contradictions in
> terms, since relation and set are not domains?

No. For example the set of all sets of integers is a set and perfectly
valid as a domain. The same for the set of relations that all belong to
a certain relation type. Usually if you apply a strict typing regime
there is no problem.

>>>Should I be forbidden from treating "relation" as a (generic) domain
>>>when defining this operator? Why?
[quoted text clipped - 6 lines]
> explanation why the relational operators do not run into paradoxes? Or
> do they?

They are not functions defined over domains. Note that my remark that
started this was that the nest operation cannot be defined as a function
*over* *domains*. If you drop the latter restriction you can define them
without a problem.

-- Jan Hidders
Jon Heggland - 09 Jul 2005 13:04 GMT
> In some sense you might say
> that is is "too large" to be a set. The collection of all relations has
> the same problem.

I'm skeptical to this, but if it is too difficult to explain (or to give
an example of a problem), I'll let it be for the moment.

> > And that relation-valued and set-valued attributes are contradictions in
> > terms, since relation and set are not domains?
[quoted text clipped - 3 lines]
> a certain relation type. Usually if you apply a strict typing regime
> there is no problem.

In that case, I redefine my unnest_string so that IN is a RELATION { K :
INTEGER, A : STRING }, OUT is a RELATION { K : INTEGER, B : CHARACTER },
and the attribute name argument is not needed.

Now, my operator is perfectly fine, isn't it? But has STRING now become
non-atomic?

> >>>Should I be forbidden from treating "relation" as a (generic) domain
> >>>when defining this operator? Why?
[quoted text clipped - 8 lines]
>
> They are not functions defined over domains.

That much is obvious, given your definitions above. But wasn't your
point that this leads to problems and paradoxes? Is there a difference
between my previous unnest_string and the relational operators? Don't
the relational operators treat "relation" like a domain?

> Note that my remark that
> started this was that the nest operation cannot be defined as a function
> *over* *domains*. If you drop the latter restriction you can define them
> without a problem.

Without a problem? Except the problems of making string non-atomic and
leading to Russelian paradoxes, or without *any* problem? I must admit I
have lost track of your argument.
Signature

Jon

Jan Hidders - 10 Jul 2005 09:43 GMT
>>In some sense you might say
>>that is is "too large" to be a set. The collection of all relations has
>>the same problem.
>
> I'm skeptical to this, but if it is too difficult to explain (or to give
> an example of a problem), I'll let it be for the moment.

I'll give another hint. Since unary relations are similar to sets you
can get Cantor's paradox.

> In that case, I redefine my unnest_string so that IN is a RELATION { K :
> INTEGER, A : STRING }, OUT is a RELATION { K : INTEGER, B : CHARACTER },
> and the attribute name argument is not needed.
>
> Now, my operator is perfectly fine, isn't it? But has STRING now become
> non-atomic?

I think the definition is quite clear on that.

>>>>>Should I be forbidden from treating "relation" as a (generic) domain
>>>>>when defining this operator? Why?
[quoted text clipped - 11 lines]
> That much is obvious, given your definitions above. But wasn't your
> point that this leads to problems and paradoxes?

No, redefining the notion of domain would.

> Is there a difference
> between my previous unnest_string and the relational operators? Don't
> the relational operators treat "relation" like a domain?

That depends on how you define them. If you view the selection with a
certain predicate as a single operation then the answer is 'no'. If you
define or each relation type a different select then the answer is 'yes'.

>>Note that my remark that
>>started this was that the nest operation cannot be defined as a function
[quoted text clipped - 3 lines]
> Without a problem? Except the problems of making string non-atomic and
> leading to Russelian paradoxes, or without *any* problem?

I wouldn't call non-atomicity a problem in the sense that paradoxes are
a problem. You may or you may not want to be in 1NF, but a formal system
with contradictions is useless.

-- Jan Hidders
mAsterdam - 10 Jul 2005 10:02 GMT
> ... a formal system with contradictions is useless.

It would be nice if legislators were aware of that :-)
Paul - 10 Jul 2005 12:07 GMT
>>> In some sense you might say that is is "too large" to be a set. The
>>> collection of all relations has the same problem.
[quoted text clipped - 4 lines]
> I'll give another hint. Since unary relations are similar to sets you
> can get Cantor's paradox.

Doesn't this only apply if you are considering the set of all relations
over all domains? What if you restrict yourself to a finite set of
domains? I can't see how Cantor's Paradox would apply in this case.

So rather than having a domain of "the set of all relations", which
can't exist, you could have a domain of "the set of all relations over a
specified finite set of domains". Or even an infinite set of domains, I
suppose, providing it's still a well-defined set.

So the "size explosion" problem here is with the domains rather than the
relations?

Paul.
Jan Hidders - 10 Jul 2005 12:19 GMT
>>>>In some sense you might say that is is "too large" to be a set. The
>>>>collection of all relations has the same problem.
[quoted text clipped - 8 lines]
> over all domains? What if you restrict yourself to a finite set of
> domains? I can't see how Cantor's Paradox would apply in this case.

It wouldn't. But the question at hand was whether the collection of all
relations can be a domain. If you are going to postualte the set of
domains a priori, then the set of relations over those domains will of
course not be one of those postulated domains.

-- Jan Hidders
Paul - 11 Jul 2005 13:03 GMT
>> Doesn't this only apply if you are considering the set of all relations
>> over all domains? What if you restrict yourself to a finite set of
[quoted text clipped - 4 lines]
> domains a priori, then the set of relations over those domains will of
> course not be one of those postulated domains.

OK. but domains can be thought of as simply sets. So then the
"collection of all domains" is like the "set of all sets" which, by
Cantor's Paradox, isn't actually a well-defined set.

So we can't even get to the stage of considering the set of all
relations over all domains, because "all domains" is meaningless in set
theory! No wonder Cantor went insane... :)

Paul.
vc - 11 Jul 2005 17:01 GMT
> >> Doesn't this only apply if you are considering the set of all relations
> >> over all domains? What if you restrict yourself to a finite set of
[quoted text clipped - 8 lines]
> "collection of all domains" is like the "set of all sets" which, by
> Cantor's Paradox, isn't actually a well-defined set.

In "naive" set theory, yes,  in ZF, no.  There is no problem with
defining a "set of all domains" provided that the set satisfies ZF
axioms.

> So we can't even get to the stage of considering the set of all
> relations over all domains, because "all domains" is meaningless in set
> theory! No wonder Cantor went insane... :)
>
> Paul.
Paul - 11 Jul 2005 17:21 GMT
>>OK. but domains can be thought of as simply sets. So then the
>>"collection of all domains" is like the "set of all sets" which, by
[quoted text clipped - 3 lines]
> defining a "set of all domains" provided that the set satisfies ZF
> axioms.

So couldn't our "set of all domains" be a valid domain itself? And thus
a member of itself?

Paul.
vc - 11 Jul 2005 18:40 GMT
> >>OK. but domains can be thought of as simply sets. So then the
> >>"collection of all domains" is like the "set of all sets" which, by
[quoted text clipped - 8 lines]
>
> Paul.

Not in ZF where, given D as a set of all domains formed at an earlier
stage (see the "iterative/cumulative conception of set" I mentioned
earlier in the thread),  you can talk about D as being a member of some
set formed at a later stage: {D,..} or {{D}, {D...}, ...}  and so
forth.

vc
Jon Heggland - 10 Jul 2005 18:39 GMT
> I'll give another hint. Since unary relations are similar to sets you
> can get Cantor's paradox.

Gee, thanks.

> > In that case, I redefine my unnest_string so that IN is a RELATION { K :
> > INTEGER, A : STRING }, OUT is a RELATION { K : INTEGER, B : CHARACTER },
[quoted text clipped - 4 lines]
>
> I think the definition is quite clear on that.

This mode of argument is rather tiresome. I do not find you clear at
all. Any reason why you can't say "yes" or "no"?

Would anybody else care to jump in and try to summarise dr. Hidder's
arguments?

> >>>Can you give me any examples of trouble arising from this? And an
> >>>explanation why the relational operators do not run into paradoxes? Or
[quoted text clipped - 6 lines]
>
> No, redefining the notion of domain would.

So me constructing new relational operators is just fine, just as long
as I don't use the words "function" and "domain"? You've lost me several
times over.
Signature

Jon

VC - 11 Jul 2005 01:36 GMT
Hi,

Now that we are done with conceptual objects and stuff,  let's take a look
at sets ;)

>>>>>Ah, but now you are using the domain or relations, right? There is a
>>>>>problem with that domain. It doesn't exist. The collection of all
[quoted text clipped - 14 lines]
> Russel's paradox? The contradiction that appears when you assume the
> existense of the set of all sets?

So far so good.

> A solution to that paradox is to accept that it is actually not a set, at
> least not in the sense that all the usual axioms of set theory apply to
> it.

That would not be what's considered the "standard" solution (ZF set theory).
The notion of "proper set" simply does not exist in ZF.  Everything is a set
and its elements.

> In some sense you might say that is is "too large" to be a set.

That's a possible approach but not necessary with the relational model where
ZF is quite sufficient,  thank you very much.

>> Are you saying that the "domain of sets" is not a domain either?
>
> Indeed.

Set of *all* sets does not exist in ZF,  and if you define *your* set of
sets so that it would satisfy the ZF axioms,  you are OK (e.g. do not
include in your set of sets the set of sets itself or any other set where
your set of sets would be a member).   The Z (Zermelo)  axioms can be
derived from the "iterative" conception of sets which can be briefly
described so:  at stage zero let's form all possible collections of
elements, if no elements available, form an ampty set; at stage one let's
form all possible collections using individuals *and* collections fom stage
zero;  ... ; at stage three let's form all possible collections using stuff
from stage zero, one and two;  and so on.  It's easy to see that with the
iterative conception,  no set belongs to itself,  and therefore, there is no
set of all sets. So,  if you take care to form your set of sets so that it
would contain only sets from the previous stages (easy to do),  you should
be cool.

>>>>Should I be forbidden from treating "relation" as a (generic) domain
>>>>when defining this operator? Why?
>>>
>>>Because by definition it isn't, and redefining the notion of domain such
>>>that it is, is not that easy without either running into paradoxes or
>>>getting a notion which it is almost impossible to reason about.

Well,  a set of relations would most certainly be a set if you  take
necessary precautions (see the iterative conception).  I cannot imagine how
one can describe a set of relations (in the RM) so that one would run into
the Russell paradox.  Can you ? ;)

>> Can you give me any examples of trouble arising from this? And an
>> explanation why the relational operators do not run into paradoxes? Or do
>> they?
>
> They are not functions defined over domains

The relational operators are of course ordinary functions, nothing fancy
about them,   of the kind F: R->R  where R is a *set* of relations.

> Note that my remark that started this was that the nest operation cannot
> be defined as a function *over* *domains*.

But it can,  why not ?

vc
VC - 11 Jul 2005 00:46 GMT
[...]
> Ah, but now you are using the domain or relations, right? There is a
> problem with that domain. It doesn't exist. The collection of all
> relations is a proper class, and not a set, but domains have to be sets.

The collection of all relations is most certainly a set,  and therefore, a
domain,  domain being a synonym of set.  The term "proper class" implies
that you talk in terms of set theory other than ZF ( Zermelo - Fraenkel ) ).
There is no need to do so for the reltional model unless you can show there
is ;)

> -- Jan Hidders
Jan Hidders - 12 Jul 2005 22:22 GMT
> [...]
>
[quoted text clipped - 7 lines]
> There is no need to do so for the reltional model unless you can show there
> is ;)

There is indeed no such need, unless of course you want to define the
domain of relations, which you cannot do in ZF.

-- Jan Hidders
VC - 13 Jul 2005 00:44 GMT
>> [...]
>>
[quoted text clipped - 9 lines]
> There is indeed no such need, unless of course you want to define the
> domain of relations, which you cannot do in ZF.

The onus of proof of such impossibility is squarely on your shoulders.
Please oblige (define a collection/domain of relations, within ZF, which
ain't a set).

Thanks.

> -- Jan Hidders
Jan Hidders - 13 Jul 2005 21:03 GMT
>>>[...]
>>>
[quoted text clipped - 14 lines]
> Please oblige (define a collection/domain of relations, within ZF, which
> ain't a set).

Defining a collection of relations within ZF that is not a set, is
neither here nor there.

-- Jan Hidders
VC - 13 Jul 2005 22:28 GMT
>>>>[...]
>>>>
[quoted text clipped - 17 lines]
> Defining a collection of relations within ZF that is not a set, is neither
> here nor there.

I am sorry but your response does not make any sense whatsoever.  What  is
"neither here nor there" supposed to mean ?

Please define a collection of relations obeying ZF axioms and show it's not
a set.

> -- Jan Hidders
Jan Hidders - 13 Jul 2005 23:00 GMT
>>>>>[...]
>>>>>
[quoted text clipped - 21 lines]
> I am sorry but your response does not make any sense whatsoever.  What  is
> "neither here nor there" supposed to mean ?

It means that it is irrelevant for the question whether you can define
the domain of relations in ZF. Note that is says "the domain" and not "a
domain".

-- Jan Hidders
VC - 14 Jul 2005 02:28 GMT
>>>>>>[...]
>>>>>>
[quoted text clipped - 25 lines]
> domain of relations in ZF. Note that is says "the domain" and not "a
> domain".

I am sorry but you are still not making sense.  What sacred meaning does the
definite article impart to a reader ?  What is "the domain of relations" ?
Do you mean the domain of all the relations there are ? Or some other "the
domain" ?

Let's assume "the domain of relations" stands for a collection of all the
relations there are.
Now,  we all know, don't we, that a relation in the relational model is just
a set (forgetting about the "header" which is not important).  So,  given
that relations, or  all the relations there are,  are just sets,  why the
domain, or the collection of all the relations  which are guaranteed by
definion to be sets formed at an earlier [than our domain] stage in the
cumulative hierarchy,  is not a set ?

> -- Jan Hidders
Jon Heggland - 05 Jul 2005 11:43 GMT
> > What is an atomic datatype/domain?
>
> The problem with that question is that it makes an incorrect assumption.
> Atomicity is not a property of a domain per se, but rather a property of
> how it is treated by certain operations.

By the way, what operators in particular are you referring to here?
Signature

Jon

 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2010 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.