Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / July 2008

Tip: Looking for answers? Try searching our database.

Examples of SQL anomalies?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
Artemus Ward - 26 Jun 2008 09:55 GMT
Before I waste a big chunk of next week, does anyone know of any docs on the
web that attempt to gather together examples of anomolous behaviour in
standard SQL?  I am thinking particularly of things like inconsistent
treatment of nulls.  I've got a couple of (to me) very obvious examples but
there must be many more.

Failing that, can anyone suggest good Google search terms that might return
a manageable number of hits?  Nothing I've tried has been very useful.

Arty
David Cressey - 26 Jun 2008 13:06 GMT
> Before I waste a big chunk of next week, does anyone know of any docs on the
> web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 6 lines]
>
> Arty

what have you tried?
Brian Selzer - 26 Jun 2008 13:06 GMT
> Before I waste a big chunk of next week, does anyone know of any docs on
> the web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 7 lines]
>
> Arty

Well, the first thing that comes to mind is that SQL tables are not sets of
rows, but bags of rows.
Carl Federl - 26 Jun 2008 17:09 GMT
> > Failing that, can anyone suggest good Google search terms that might  return a manageable number of hits?"

Search for these 3 keywords:  sql inconsistency "c. j. date"
and also  sql critique "c. j. date"
This critique is still accurate
http://www.cs.duke.edu/~junyang/courses/cps216-2003-spring/papers/date-1983.pdf
-CELKO- - 27 Jun 2008 16:20 GMT
>> This critique is still accurate <<

No, some things have changed with the addition of table and row
constructors using (VALUES (), (), ..) AS <table name>, row
comparisons, the UNION, INTERSECT and EXCEPT, etc.

What is driving me nuts is allowing an alias in UPDATE and MERGE
clauses.  The model for an alias has been that a working table is
effectively created with the alias (correlation) name and it
disappears at the end of the statement.

The fact that SUM({}) IS NULL (undefined) and SUM(a) + SUM(b) <> SUM(a
+b) is not a problem, if you have enough math to understand that
summation (big sigma) is not addition (+) when you deal with sets.
paul c - 26 Jun 2008 17:38 GMT
> Before I waste a big chunk of next week, does anyone know of any docs on the
> web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 6 lines]
>
> Arty

What I've read of CJ Date's suggests that he thinks nulls are not
'manageable' in the first place, so if one takes him to be an authority
it shouldn't surprise that a web search will only expose more mayhem.

I took many of Date's criticisms of SQL as meaning that once  one goes
beyond the simplest of queries many inconsistent and/or wrong results
are possible, not to mention a severe lack of closure, blah, blah.
Personally, I'm content to accept that and not make a career out of
studying all the  nuances that make confusion inevitable.  I heard Codd
himself disparage SQL on the grounds that it doesn't enforce 'primary'
keys.  He went on at length about he considered them essential.

It also seems to me that a whole industry has been built on quicksand
because of people's attachment to traditional procedural languages.  A
bit ironic since when Codd still worked for IBM and much of the
relational theory was being developed, he and others would have been
quite aware of IBM's own somewhat declarative language, RPG!  (at least
more declarative than the Cobol/PLI of those days and perhaps more
declarative than the latest various versions of SQL - in RPG essentially
all a problem did was replace its inputs with outputs, I thought of it
as 'Replacement Program Generator'!)

The apparently wide-spread acceptance of SQL seems rather anti-human to
me, I can only attribute it to the modern fascination with technocratic
complexity.  My simplest argument against it would be that the SQL
standard is something like 1,000 pages of haphazard jargon (apparently
it never mentions 'relations'!).  I doubt if any one mortal, including
Date, can assimilate it all.  When he (and McGoveran and Pascal and
Darwen) suggest that the original SQL developers either mis-read or
mis-understood what Codd originally wrote.  In later writings, even Codd
himself added to the confusion, perhaps this was because he wasn't very
interested with integration of languages with the model.

Even Date's response to SQL (as well as to the 'object' movement), ie.,
the TTM book and related writings, while written much more precisely
than the SQL standard seems to me to be in danger of not being
comprehensible by a single mortal.
Nilone - 26 Jun 2008 22:38 GMT
> Before I waste a big chunk of next week, does anyone know of any docs on the
> web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 6 lines]
>
> Arty

Take a look at dbdebunk.com in general, and
http://www.dbdebunk.com/page/page/2928212.htm specifically.
Gene Wirchenko - 26 Jun 2008 22:58 GMT
>Before I waste a big chunk of next week, does anyone know of any docs on the
>web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 4 lines]
>Failing that, can anyone suggest good Google search terms that might return
>a manageable number of hits?  Nothing I've tried has been very useful.

    Others have mentioned nulls.  One of my pet peeves in this area
is that in SQL, if I sum zero numbers, the result is null.
Mathematically, the result is 0.  I have an app where I have had make
adjustments where it is possible that zero numbers are summed.

    Try this one about duplicates:
         http://www.dbdebunk.com/page/page/627052.htm

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Bob Badour - 26 Jun 2008 23:43 GMT
>>Before I waste a big chunk of next week, does anyone know of any docs on the
>>web that attempt to gather together examples of anomolous behaviour in
[quoted text clipped - 12 lines]
>      Try this one about duplicates:
>           http://www.dbdebunk.com/page/page/627052.htm

My personal favorite:

SUM(A) + SUM(B) != SUM(A+B)
Philipp Post - 27 Jun 2008 16:39 GMT
> Others have mentioned nulls.  One of my pet peeves in this area
is that in SQL, if I sum zero numbers, the result is null.
Mathematically, the result is 0.  I have an app where I have had make
adjustments where it is possible that zero numbers are summed. <

SQL Server at least stripps off NULLs in the SUM function and returns
a number, but not in a simple addition - then it returns NULL if at
least one entry is NULL.

> SUM(A) + SUM(B) != SUM(A+B) <

The question is, if these issues are due to the SQL specification or
simply due to a problem in a specific SQL product. Or could it be,
that the definition is not precise enough in some points, so that
database vendors implemented it differently? A quick look at the SQL
92 spec did not sched much light on this for me regarding the source
of the problem.

Thanks,

Philipp Post
Gene Wirchenko - 27 Jun 2008 18:00 GMT
>> Others have mentioned nulls.  One of my pet peeves in this area
>is that in SQL, if I sum zero numbers, the result is null.
[quoted text clipped - 9 lines]
>The question is, if these issues are due to the SQL specification or
>simply due to a problem in a specific SQL product. Or could it be,

    I am told that it is the spec.

>that the definition is not precise enough in some points, so that
>database vendors implemented it differently? A quick look at the SQL
>92 spec did not sched much light on this for me regarding the source
>of the problem.

    I do not use nulls in my tables.  I still get null when I sum
what turns out to be zero rows.  If I want the total amount charged on
invoices for a client in a certain date range, and there are no such
invoices, the total is zero.  SQL returns null.  Every place where
that possibility exists, I have to patch the cursor.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
-CELKO- - 27 Jun 2008 19:29 GMT
>> I do not use nulls in my tables.  I still get null when I sum what turns out to be zero rows.  If I want the total amount charged on invoices for a client in a certain date range, and there are no such invoices, the total is zero. <<

No, that is flat wrong.  There is no such invoice, so the question of
its total makes no sense.  It is a totally different kind of thing
from an invoice that actually totaled to zero.  Zero, empty set, empty
strings, NULLs, etc. are all different concepts around the idea of
missing data of some kind.

Also, why are you using cursors?  If you want to give this misleading
data, then use a COALESCE(invoice_tot, 0.00) in your code.
Gene Wirchenko - 27 Jun 2008 22:27 GMT
>>> I do not use nulls in my tables.  I still get null when I sum what turns out to be zero rows.  If I want the total amount charged on invoices for a client in a certain date range, and there are no such invoices, the total is zero. <<
>
[quoted text clipped - 3 lines]
>strings, NULLs, etc. are all different concepts around the idea of
>missing data of some kind.

    If I want the total amount invoiced in a time period and there
are no invoices, the total amount invoiced was zero.

>Also, why are you using cursors?  If you want to give this misleading
>data, then use a COALESCE(invoice_tot, 0.00) in your code.

    Null is the misleading datum.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
-CELKO- - 30 Jun 2008 16:52 GMT
>> If I want the total amount invoiced in a time period and there are no invoices, the total amount invoiced was zero. <<

You have to coalesce the NULL to a zero or drop it as not in the set.
Remember this one from freshman logic when the book got to the Greek
paradoxes?

1) A cat has one more tail than no cat.
2) No cat has 12 tails.
Ergo; a cat has 13 tails.

This is based on confusing the **quantity** zero with the absence of
an entity to have a quantity.  Get a copy of "Zero: The Biography of a
Dangerous Idea" by Charles Seife or "The Nothing that Is: A Natural
History of Zero" by Robert Kaplan and Ellen Kaplan.  There is also a
kid's book, A Place for Zero: A Math Adventure" by Angeline Sparagna
Lopresti and Phyllis Hornung, but it does not get into set theory.
David Cressey - 02 Jul 2008 13:30 GMT
> >>> I do not use nulls in my tables.  I still get null when I sum what turns out to be zero rows.  If I want the total amount charged on invoices
for a client in a certain date range, and there are no such invoices, the
total is zero. <<

> >No, that is flat wrong.  There is no such invoice, so the question of
> >its total makes no sense.  It is a totally different kind of thing
[quoted text clipped - 9 lines]
>
>      Null is the misleading datum.

Except that null isn't a datum.  If you consider it a datum, that may be why
you are misled.
Bob Badour - 02 Jul 2008 14:05 GMT
>>>>>I do not use nulls in my tables.  I still get null when I sum what
>
[quoted text clipped - 20 lines]
> Except that null isn't a datum.  If you consider it a datum, that may be why
> you are misled.

Except that the total invoiced is a datum, and he requested that datum,
which has no reason or excuse for NULL.
Bob Badour - 02 Jul 2008 14:12 GMT
>>>>>> I do not use nulls in my tables.  I still get null when I sum what
>>
[quoted text clipped - 25 lines]
> Except that the total invoiced is a datum, and he requested that datum,
> which has no reason or excuse for NULL.

Correction: no legitimate excuse.
Gene Wirchenko - 02 Jul 2008 19:40 GMT
>"Gene Wirchenko" <genew@ocis.net> wrote in message

[snip]

>>      Null is the misleading datum.
>
>Except that null isn't a datum.  If you consider it a datum, that may be why
>you are misled.

    Null is a datum.  It is not a value.

    No, I am misled by SQL when it does not handle addition properly.
Well, I was, but now that I distrust it more, things are working
better.  Sort of.  Kludge, kludge.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
David Cressey - 03 Jul 2008 14:30 GMT
> >"Gene Wirchenko" <genew@ocis.net> wrote in message
>
[quoted text clipped - 6 lines]
>
>      Null is a datum.  It is not a value.

No, a marker indicating that a datum is missing is not a datum.  At best,
it's a metadatum.
Bob Badour - 03 Jul 2008 15:21 GMT
>>>"Gene Wirchenko" <genew@ocis.net> wrote in message
>>
[quoted text clipped - 12 lines]
> No, a marker indicating that a datum is missing is not a datum.  At best,
> it's a metadatum.

It's not a value. It is, however, a datum. It is information encoded
suitably for machine processing even if the machine processing is
inconsistent and poor machine processing.
Marshall - 03 Jul 2008 16:11 GMT
> It's not a value. It is, however, a datum.

Do you have a definition for "value"? I only ask because
you're the sort of person who might. I regularly recall your
(ISO?) definition for data; I have found that very useful.

My personal definition for "value" is something like a
member of a set, specifically a member of a domain.
I'm less happy with that as time goes by. There's an
interesting mathematical definition for "value" that gave
me a bit of a surprise when I first saw it:

"The quantity which a function f takes upon
application to a given quantity."

http://mathworld.wolfram.com/Value.html

In other words, value is something that is just a
member of the range of a function. Still seems
odd.

Do you have a definition for "information"?

Here's a good definition of "information":

"a formal criminal charge made without a grand jury
indictment by a prosecutor in a document."

OK maybe that's not so good for our purposes.

Marshall
Bob Badour - 03 Jul 2008 16:31 GMT
>>It's not a value. It is, however, a datum.
>
[quoted text clipped - 4 lines]
> My personal definition for "value" is something like a
> member of a set, specifically a member of a domain.

Exactly. A data type comprises two sets: a set of values and a set of
operations defined using those values. A value then is simply a member
of a set which has operations defined over the set.

> I'm less happy with that as time goes by. There's an
> interesting mathematical definition for "value" that gave
[quoted text clipped - 8 lines]
> member of the range of a function. Still seems
> odd.

That definition for 'value of a function' is more equivalent to 'return
value' in a computational context. The RM doesn't make as much
distinction among functions and other relations; they are all relations.
Data type support, which is both necessary for the RM and orthogonal to
the RM, makes more distinction among functions and other relations.

Please note, though, that the wolfram definition is consistent with the
element of a set definition above, and the function f is merely an
operation defined using the set of values in its range (possibly among
other sets.)

> Do you have a definition for "information"?

What's wrong with the ISO 2382 definition? It's the first definition in
the standard.
Marshall - 03 Jul 2008 16:50 GMT
> Exactly. A data type comprises two sets: a set of values and a set of
> operations defined using those values. A value then is simply a member
> of a set which has operations defined over the set.

It suddenly occurs to me that what you describe is pretty much
exactly an "algebraic structure" in abstract algebra.

http://en.wikipedia.org/wiki/Algebraic_structure

> > Do you have a definition for "information"?
>
> What's wrong with the ISO 2382 definition? It's the first definition in
> the standard.

What's wrong with it is that I can't remember it.

Marshall
Bob Badour - 03 Jul 2008 17:03 GMT
>>Exactly. A data type comprises two sets: a set of values and a set of
>>operations defined using those values. A value then is simply a member
[quoted text clipped - 11 lines]
>
> What's wrong with it is that I can't remember it.

You are the one who uses google groups. I avoid it like the plague.

Search for ISO/IEC 2382-1 01 badour

and shorten things until you find it.

Oh, all right:

From the ISO/IEC Standard vocabularies 2382-1

01.01.01 01.01.01
information (in information processing) information (en traitement de I'
information)
Knowledge concerning objects, such as facts, events, things,processes,
or ideas, including concepts, that within a certain context has a
particular meaning.

01.01.02 01.01.02
data
A reinterpretable representation of information in a formalized manner
suitable for communication, interpretation, or processing.
JOG - 03 Jul 2008 17:35 GMT
I normally accord to the definitions:

information = datum + meaning
datum = value + description

These are useful, succinct and accord nicely with [ISO 2382-1:1984]:
“Data: A representation of facts, concepts, or instructions in a
normalized manner suitable for communication, interpretation, or
processing by humans or by automatic means”, while giving something a
bit more formalized in terms of data theory.

In terms of defining value both the output of a function or the
element of a set seem fine, but then defining a value as being some
amount or quantity also seems fine.
Bob Badour - 03 Jul 2008 20:00 GMT
> I normally accord to the definitions:
>
[quoted text clipped - 10 lines]
> element of a set seem fine, but then defining a value as being some
> amount or quantity also seems fine.

Not all values are quantitative.
JOG - 04 Jul 2008 03:39 GMT
> > I normally accord to the definitions:
>
[quoted text clipped - 12 lines]
>
> Not all values are quantitative.

really? Hmmm. I suppose so. A value as a symbol used to refer to some
aspect of the real world might be better. It's certainly a tough one.
Bob Badour - 04 Jul 2008 03:56 GMT
>>>I normally accord to the definitions:
>>
[quoted text clipped - 15 lines]
> really? Hmmm. I suppose so. A value as a symbol used to refer to some
> aspect of the real world might be better. It's certainly a tough one.

I disagree. A symbol is a representation of a value; it is not the value
itself.

Data types do not have to be orders--not even partial orders. They can
be orders, and they can be quantitative, but they need not be.

A video data type is not quantitative; although, one would expect a
video data type to have lots of operations with quantitative results as
well as operations with non-quantitative results.
Brian Selzer - 03 Jul 2008 21:48 GMT
>I normally accord to the definitions:
>
[quoted text clipped - 10 lines]
>element of a set seem fine, but then defining a value as being some
>amount or quantity also seems fine.

I have a different take:

A value has no value except that which is assigned to it under an
interpretation: a value is not the output of just /any/ function, but rather
/the/ function that maps language terms to what those terms mean.

Information is just one or more values.

A datum is simply a formatted value that cannot be broken down into a
collection of component values that means the same thing.

Data is just formatted information.
JOG - 04 Jul 2008 03:42 GMT
> >I normally accord to the definitions:
>
[quoted text clipped - 16 lines]
> interpretation: a value is not the output of just /any/ function, but rather
> /the/ function that maps language terms to what those terms mean.

that's similar to what I said no? value + description?

> Information is just one or more values.

so you don't incorporate a "meaning" into information? I don't think I
could agree with you there.

> A datum is simply a formatted value that cannot be broken down into a
> collection of component values that means the same thing.
>
> Data is just formatted information.
Bob Badour - 04 Jul 2008 03:57 GMT
>>>I normally accord to the definitions:
>>
[quoted text clipped - 28 lines]
>>
>>Data is just formatted information.

JOG, in case you hadn't noticed, Selzer is just making sh.t up as he
goes along.
Brian Selzer - 04 Jul 2008 13:14 GMT
>>>>I normally accord to the definitions:
>>>
[quoted text clipped - 32 lines]
>JOG, in case you hadn't noticed, Selzer is just making sh.t up as he goes
>along.

JOG, in case you hadn't noticed, Badour, being incapable of independent
thought, is just envious of those of us who aren't.  The function that maps
terms of a language to objects in the Universe of Discourse has been
referenced and described by many.  While I haven't before seen "value"
defined as just the output of that particular function, it all just falls
into place for me, since every value carries with it some sense or meaning
and that particular function is integral to the process of attributing
meaning to language elements or combinations of language elements.
Brian Selzer - 04 Jul 2008 04:44 GMT
> "JOG" <jog@cs.nott.ac.uk> wrote in message news:5b32834f-b082-42ba-a1f5-
> 7a94b44532ae@a1g2000hsb.googlegroups.com...
[quoted text clipped - 24 lines]
>
> that's similar to what I said no? value + description?

I don't think so.  A value is still a value even without being described or
formatted.  It is what a language term such as an individual name means.

> > Information is just one or more values.
>
> so you don't incorporate a "meaning" into information? I don't think I
> could agree with you there.

A value is what a term means--whether that term be as simple as an
individual name or as complex as a definite description.  So information
being just--or if you will, incorporating--one or more values thus
incorporates meaning.

> > A datum is simply a formatted value that cannot be broken down into a
> > collection of component values that means the same thing.
> >
> > Data is just formatted information.
David BL - 04 Jul 2008 03:04 GMT
> I normally accord to the definitions:
>
[quoted text clipped - 10 lines]
> element of a set seem fine, but then defining a value as being some
> amount or quantity also seems fine.

I agree with

   information = datum + meaning

I think the distinction between value and data/information has to do
with the distinction between value and variable.  A value doesn't have
a context in time/space and therefore cannot in itself be regarded as
data or information.  As Bob says, “a value just is".

A variable is a holder for an encoded value.  I would suggest

   datum  = encoded value

An encoded value means an “appearance of a value” using C.Date
terminology and this occurs in time and space and therefore we can say
it has a context.
JOG - 04 Jul 2008 03:35 GMT
> > I normally accord to the definitions:
>
[quoted text clipped - 27 lines]
> terminology and this occurs in time and space and therefore we can say
> it has a context.

Well, I wonder if this is the same thing, just said in different vocab
- the appearance you talk about is just the description I refer to.
David BL - 04 Jul 2008 07:21 GMT
> > > I normally accord to the definitions:
>
[quoted text clipped - 30 lines]
> Well, I wonder if this is the same thing, just said in different vocab
> - the appearance you talk about is just the description I refer to

Ok, I wasn’t sure what "description" meant.
Marshall - 02 Jul 2008 17:22 GMT
> Zero, empty set, empty strings, NULLs, etc. are all different
> concepts around the idea of missing data of some kind.

I just wanted to highlight this sentence as a remarkable example
of cluelessness.

Marshall
-CELKO- - 27 Jun 2008 19:23 GMT
>> The question is, if these issues are due to the SQL specification or simply due to a problem in a specific SQL product. Or could it be, that the definition is not precise enough in some points, so that database vendors implemented it differently? <<

Nope, it is the specs.  All aggregate (set) functions begin by
removing the NULLs from their parameter set, then if there is a
DISTINCT option on the parameter, they remove redundant duplicates and
finally do the operation (MIN, MAX, AVG, SUM, COUNT on what is left.
Since an empty set has no elements upon which to apply an operation,
SQL returns a NULL (okay, it should be an "undefined" if we were
mathematically correct).

In SQL as in Set Theory, equality (=) and grouping are not the same;
the SUM() and the + are  not the same.  They are for different levels
of abstraction.  It makes senses after your first course with
transfinite numbers -- the cardinality of Aleph Null is not the same
as counting all the integers one by one, etc.
Gene Wirchenko - 27 Jun 2008 22:29 GMT
>>> The question is, if these issues are due to the SQL specification or simply due to a problem in a specific SQL product. Or could it be, that the definition is not precise enough in some points, so that database vendors implemented it differently? <<
>
[quoted text clipped - 5 lines]
>SQL returns a NULL (okay, it should be an "undefined" if we were
>mathematically correct).

    No, it should be zero per mathematicians I have checked with.

>In SQL as in Set Theory, equality (=) and grouping are not the same;
>the SUM() and the + are  not the same.  They are for different levels
>of abstraction.  It makes senses after your first course with
>transfinite numbers -- the cardinality of Aleph Null is not the same
>as counting all the integers one by one, etc.

    Bafflegab.  I have studied transfinite numbers; the area is
irrelevant.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Philipp Post - 29 Jun 2008 04:13 GMT
> In SQL as in Set Theory, equality (=) and grouping are not the same; the SUM() and the + are  not the same.  They are for different levels of abstraction. <

Thanks for the hint, but the behaviour of the set functions indeed
seems to be a bit strange (might also be I still miss something)

I will assume that the column can take no negative values to keep the
example easier:

a) 1 + NULL + 1 = NULL, because one element is unknown, I cannot know
the result. That makes sense.

b) SUM{1, NULL, 1} = 2, however I could just say that the result is AT
LEAST 2 but the function cannot state it that precisely and as such it
would make more sense if it would then throw an error like "You cannot
apply this function on data containing NULL" what would force the
developer to SELECT SUM(SomeAmount) .... WHERE NOT SomeAmount IS NULL.
If this would make SQL programming more fun is a different question...

brgds

Philipp Post
David Cressey - 02 Jul 2008 19:39 GMT
>b) SUM{1, NULL, 1} = 2, however I could just say that the result is AT
>LEAST 2 but the function cannot state it that precisely and as such it
>would make more sense if it would then throw an error like "You cannot
>apply this function on data containing NULL" what would force the
>developer to SELECT SUM(SomeAmount) .... WHERE NOT SomeAmount IS NULL.
>If this would make SQL programming more fun is a different question...

I disagree.  If NULL meant "there is a value for this entry, but we don't
know what it is"  then your interpretation might be correct.  However, if
NULL means "there is no value for this entry"  then excluding it from
consideration is the only sensible approach.

Consider the missing row.  The missing row is a row that should have been
inserted but, due to some screwup in the application or the people supplying
the data,  it was never inserted.  I'm sure you'll agree that,  in this
case,  the only sane thing to do with SUM is to leave the missing row out of
the SUM.  And surely the only sane thing to do with AVG is to leave the
missing row out of both the numerator and the denominator of the fraction
that results in AVG.

Well a NULL,  properly construed, is a row that is partly not there.

I realize there are a lot of experts who disagree with my opinion here,  but
I'll stand by it.
Philipp Post - 03 Jul 2008 11:04 GMT
> >b) SUM{1, NULL, 1} = 2, however I could just say that the result is AT
> >LEAST 2 but the function cannot state it that precisely and as such it
[quoted text clipped - 7 lines]
> NULL means "there is no value for this entry"  then excluding it from
> consideration is the only sensible approach.

You are right. I get more and more the impression, that all the
confusion starts at the definition what NULL actually means. It could
be

- not known =  there should be a value lateron
- not applicable = I can omit it
- confused with an amount of zero (0) = a mistake, if I mean zero (0)
I should use it.

Brian also rightfully said that it similarly.

brgds

Philipp Post
David Cressey - 03 Jul 2008 14:32 GMT
> >b) SUM{1, NULL, 1} = 2, however I could just say that the result is AT
> >LEAST 2 but the function cannot state it that precisely and as such it
[quoted text clipped - 7 lines]
> NULL means "there is no value for this entry" then excluding it from
> consideration is the only sensible approach.

"You are right. I get more and more the impression, that all the
confusion starts at the definition what NULL actually means. It could
be

- not known =  there should be a value lateron
- not applicable = I can omit it
- confused with an amount of zero (0) = a mistake, if I mean zero (0)
I should use it.
"

Unfortunately the SQL standard propagates this confusion instead of
resolving it.
Brian Selzer - 29 Jun 2008 04:51 GMT
>>> The question is, if these issues are due to the SQL specification or
>>> simply due to a problem in a specific SQL product. Or could it be, that
[quoted text clipped - 8 lines]
> SQL returns a NULL (okay, it should be an "undefined" if we were
> mathematically correct).

MIN, MAX and AVG are meaningless when applied to an empty bag, but it seems
to me that COUNT should always return 0 when the bag is empty, and
similarly, SUM should return 0.  SUM should only return NULL if one of the
values to be summed is NULL.

> In SQL as in Set Theory, equality (=) and grouping are not the same;
> the SUM() and the + are  not the same.  They are for different levels
> of abstraction.  It makes senses after your first course with
> transfinite numbers -- the cardinality of Aleph Null is not the same
> as counting all the integers one by one, etc.
-CELKO- - 30 Jun 2008 17:08 GMT
>> MIN, MAX and AVG are meaningless when applied to an empty bag, but it seems to me that COUNT should always return 0 when the bag is empty, <<

Which COUNT?; we have two different functions in SQL that look too
much alike:

1) COUNT([DISTINCT] <exp>) = inside the set, throws away NULLs,
optionally converts a bag to a set.  If he has nothing to count, he is
not defined and we use a NULL.  See another posting to Gene about "no
cats" versus "zero cats" and the conceptual problems that mankind had
before Cantor.

2) COUNT(*) = works at the set level.  This is not counting; it is
cardinality.  A better notation from a pure mathematical viewpoint
would have been "CARD (<table expression>)" in the same style as
"[NOT] EXISTS (<table expression>)".  If the set is empty, then the
cardinality is zero by definition.

>> .. and similarly, SUM should return 0.  SUM should only return NULL if one of the values to be summed is NULL. <<

You can fake it with a little code:

CASE WHEN COUNT(*) = COUNT(<exp>) --no nulls in set
    THEN SUM(<exp>) -- usual behavior
    ELSE NULL END -- weird behavior

SQL requires a warning be issued when a NULL is dropped from an
aggregate result.   You can test for it in your code.  The bad news is
that when you use a cursor, when that warning comes is implementation
defined.  It can be done at DECLARE CURSOR, OPEN, the first FETCH or
the FETCH that gets the grouping with the NULL dropped from it.
Brian Selzer - 30 Jun 2008 18:31 GMT
>>> MIN, MAX and AVG are meaningless when applied to an empty bag, but it
>>> seems to me that COUNT should always return 0 when the bag is empty, <<
>
> Which COUNT?; we have two different functions in SQL that look too
> much alike:

Well, the OP wanted examples of SQL anomalies, and you've just confirmed a
big one.

If you have a bag that can contain peaches, but doesn't, then the answer to
the question "How many peaches are in the bag?" is clearly zero.  If you are
asked by the accountant, "How much were we billed by AT&T this month?" but
AT&T didn't send a bill, then the answer is clearly zero.  That SQL's COUNT
and SUM are something other than these common sense usages exemplifies their
anomalous nature.

> 1) COUNT([DISTINCT] <exp>) = inside the set, throws away NULLs,
> optionally converts a bag to a set.  If he has nothing to count, he is
[quoted text clipped - 22 lines]
> defined.  It can be done at DECLARE CURSOR, OPEN, the first FETCH or
> the FETCH that gets the grouping with the NULL dropped from it.
-CELKO- - 30 Jun 2008 22:23 GMT
>> If you have a bag that can contain peaches, but doesn't, then the answer to the question "How many peaches are in the bag?" is clearly zero.  If you are asked by the accountant, "How much were we billed by AT&T this month?" but AT&T didn't send a bill, then the answer is clearly zero.  <<

No, the accountant is going to ask "When the !$^@# hell did we get an
account with AT&T?!"  That particular example is near and dear to my
heart; AT&T created a commercial account for me which they first
charged $0.00 at set up.  When I did not object -- I had just started
a home computer line with them and  --  they took it out ~$750 from m
checking account without my permission.

When the gas gauge of my car says zero, that is an actual measurement
using a ratio scale with a natural origin point.  When I talk about
putting gas in my Lamborghini, it makes just as much sense to talk
about putting -10 liters into the gas tank.  When something does not
exist, then you can say anything.

Basic logic:
1) A false premise can imply a true or a false conclusion.
2) A true premise can only imply a true conclusion.
Marshall - 30 Jun 2008 23:51 GMT
> When the gas gauge of my car says zero, that is an actual measurement
> using a ratio scale with a natural origin point.  When I talk about
[quoted text clipped - 5 lines]
> 1) A false premise can imply a true or a false conclusion.
> 2) A true premise can only imply a true conclusion.

You're mixing up nonexistence with emptiness. The
empty set exists. We can count how many members
it has. That count is zero. We can add up all the
numbers in it and get a total; that total is zero. We
can also multiply all the numbers in it together and
get a product; that product is 1. We can AND together
all the booleans in it and get a boolean value; that
value is TRUE. Having things work this way is simpler
than if we don't.

However we cannot count the members of the set
of all sets that don't contain themselves as members,
because there is no such set.

Marshall
-CELKO- - 01 Jul 2008 17:20 GMT
>> You're mixing up nonexistence with emptiness. The empty set exists. We can count how many members it has. That count [cardinality] is zero. <<

Not quite the same thing ..

>> We can add up all the numbers in it and get a total; that total is zero. <<

But there are no members to add! You created zero from nothing as a
convention to get rid of the term in the sequence.  In the older
summation notation, there also is a convention that when the initial
value of the index is greater than the final value, the summation is
zero.  Pardon my pseudo-code, but how do you defend the traditional
approach which says:

 SUM (i) FOR i FROM 7 TO 9 = 24
 SUM (i) FOR i FROM 9 TO 7 = 0

A set-oriented version might be:

SUM (i) FOR i IN {7,8,9}  = 24
SUM (i) FOR i IN {9,7,8}  = 24
SUM (i) FOR i IN {8,9,7}  = 24
SUM (i) FOR i IN {9,8,7}  = 24
  etc.

I can support this convention with associativity, commutativity and
induction.
Marshall - 01 Jul 2008 19:19 GMT
> >> You're mixing up nonexistence with emptiness. The empty set exists. We can count how many members it has. That count [cardinality] is zero. <<
>
> Not quite the same thing ..

What does that mean?

> >> We can add up all the numbers in it and get a total; that total is zero. <<
>
> But there are no members to add!

So what?

That this is completely a non-problem is most evident with
count. Start with a bag containing three bananas. Remove
three bananas. How many bananas remain? How is that
the least bit hard?

> You created zero from nothing as a
> convention to get rid of the term in the sequence.

False! It's not from nothing, and it's not simply a convention.
It's the identity of the operator being aggregated.

> In the older
> summation notation, there also is a convention that when the initial
[quoted text clipped - 4 lines]
>   SUM (i) FOR i FROM 7 TO 9 = 24
>   SUM (i) FOR i FROM 9 TO 7 = 0

Again, this is not a convention. This form specifies a sequential
loop, with a starting number and an ending number. It's inherently
sequential. But since we're aggregating a binary function that
is both commutative and associative, and since the sequence
has no duplicates, the list-theoretic and set-theoretic answers
will be identical.

The first one says to

 add together all the naturals greater than or equal to 7 and
 less than or equal to 9.

The second one says to

 add together all the naturals greater than or equal to 9 and
 less than or equal to 7.

This is pretty obviously 24 and 0, respectively, which is completely
consistent with what I've asserted.

> A set-oriented version might be:
>
[quoted text clipped - 6 lines]
> I can support this convention with associativity, commutativity and
> induction.

You have misapprehended the semantics of the construct.

Marshall
-CELKO- - 01 Jul 2008 21:29 GMT
>> What does that mean? <<

The Greeks had a paradox:
1) A cat has one more tail than no cat.
2) No cat has 12 tails.
3) Therefore a cat has 13 tails.

The word "no" is used two different ways.  In the (1) "no" is a zero
and in (2) it is non-existence.

>> [But there are no members to add!] So what?  <<

ab nilo, ex nilo  -- from nothing comes nothing.

>> That this is completely a non-problem is most evident with count. Start with a bag containing three bananas. Remove three bananas. How many bananas remain? How is that the least bit hard? <<

But I have to have a bag first and it has to make sense to put bananas
in that bag.

>> False! It's not from nothing, and it's not simply a convention. It's the identity of the operator being aggregated. <<

Yes, zero is the additive identity.  But this is a convention used to
get rid of the empty set problem and preserve easy computations.

>> Again, this [ordered index sets] is not a convention. This form specifies a sequential loop, with a starting number and an ending number. It's inherently sequential. But since we're aggregating a binary function that is both commutative and associative, and since the sequence has no duplicates, the list-theoretic and set-theoretic answers will be identical. <<

I agree that this is pure procedural programming in mathematical
disguise; I want a set-oriented solution.  This depends on the index
set being finite; commutative and associative are a bonus that don't
work so well for countably infinite series.  You can easily find a set
in which you associate the elements in different ways and get
different results.

(1 + -1 +1 + -1 +1 ..) = ((1-1) + (1-1) + ..) = 0
(1 + -1 +1 + -1 +1 ..) = (1 + (-1 +1) + (-1 +1) + ..) = 1

The convention is to say it is undefined or that it does not converge.
I am a little soft on saying the answer is the set {0, 1}, and
defining other such results as the set of naturals or whatever.  I
have no idea what the rules would be like.

>>  You have misapprehended the semantics of the construct.  <<

No, I am saying I want to move from "list-theoretic" and "set-
theoretic" summations.
JOG - 01 Jul 2008 22:37 GMT
> >> What does that mean? <<
>
> The Greeks had a paradox:
> 1) A cat has one more tail than no cat.

Asking how many tails a "no cat" has is like asking how many tails the
colour blue has. The answer is not zero, the answer is "category
error" - the question makes no sense. As a consequence there is no
paradox here, because the first line is in error.

Anyhow, there is no such thing as a non-existent cat - a thing has to
exist (whether abstractly or physically) for it to be a "thing" in the
first place, by definition. Regards, J.

> 2) No cat has 12 tails.
> 3) Therefore a cat has 13 tails.
[quoted text clipped - 37 lines]
> No, I am saying I want to move from "list-theoretic" and "set-
> theoretic" summations.
-CELKO- - 01 Jul 2008 23:06 GMT
exactly!
JOG - 02 Jul 2008 00:29 GMT
> exactly!

Not really. I was just pointing out that the "greek paradox" mentioned
contained a category error. Asking the cardinality of a set does not
hold any such problems.

(n.b that as someone pointed out previously asking the cardinality of
the set of all sets would represent a category error, because such a
set is dos not exist, and hence has no properties).
Marshall - 02 Jul 2008 00:13 GMT
> >> What does that mean? <<
>
[quoted text clipped - 5 lines]
> The word "no" is used two different ways.  In the (1) "no" is a zero
> and in (2) it is non-existence.

This is totally irrelevant to the question at hand. You are
raising a red herring.

> >> [But there are no members to add!] So what?  <<
>
> ab nilo, ex nilo  -- from nothing comes nothing.

You can quote irrelevant latin phrases at me all day long.
"Semper fidelis tyrannosausus," as #21 yelled. This won't
change the fact that the empty set has zero members.

Or see this:

http://en.wikipedia.org/wiki/Empty_sum

or this:

http://en.wikipedia.org/wiki/Empty_product

> >> That this is completely a non-problem is most evident with count. Start with a bag containing three bananas. Remove three bananas. How many bananas remain? How is that the least bit hard? <<
>
> But I have to have a bag first and it has to make sense to put bananas
> in that bag.

And you have to have a good breakfast before you start your day.
How many bananas in an empty bag? Ask any four year old, and
they'll tell you; as I said it's not a hard question.

I will anticipate your response and point out that bananas get
overripe rapidly when left on the shelf for a few days.

> >> False! It's not from nothing, and it's not simply a convention. It's the identity of the operator being aggregated. <<
>
> Yes, zero is the additive identity.  But this is a convention used to
> get rid of the empty set problem and preserve easy computations.

If you think that zero is the additive identity by convention, then
you clearly do not understand what a convention is.
There is no problem to get rid of.

After that, you said a bunch of irrelevant stuff which I'm just
going to ignore.

Marshall
Bob Badour - 02 Jul 2008 00:42 GMT
<snip>

> After that, you said a bunch of irrelevant stuff which I'm just
> going to ignore.

What was so relevant about the stuff that came before?

I watch you and JOG and Gene reply to his nonsense and I wonder: Why?
Oh, why?!? Why do you elevate his nonsense with the blessing of a reply?
Marshall - 02 Jul 2008 00:50 GMT
> <snip>
>
[quoted text clipped - 5 lines]
> I watch you and JOG and Gene reply to his nonsense and I wonder: Why?
> Oh, why?!? Why do you elevate his nonsense with the blessing of a reply?

Well I'd certainly hate to just let it stand there unopposed!

Marshall
Bob Badour - 02 Jul 2008 01:06 GMT
>><snip>
>>
[quoted text clipped - 7 lines]
>
> Well I'd certainly hate to just let it stand there unopposed!

I used to think that way. Now I think: If someone has to scrape it off
their shoe, so be it. Most people will step around it in any case.

At some point, I realised: If I just filter the twit, I can save myself
huge amounts of time. If some neophyte replies, one can always
straighten him or her out.

Joe has built a career out of publishing outrageous nonsense to provoke
his betters into replying. Given that he composes his nonsense mostly
with block copy and paste, it takes him no effort. But because what he
writes is incoherent, it takes a great deal of effort for anyone to
reply coherently.

Most lurkers won't bother to read all of his crap because it is too long
and tedious. When lurkers see people reply to his crap as if it were
worthy of a reply, they get an incorrect impression. You are just
putting in huge amounts of effort to promote this a.shole. Why? What's
in it for you?
paul c - 02 Jul 2008 04:34 GMT
>> <snip>
>>
[quoted text clipped - 8 lines]
>
> Marshall

History shows that as the messages become more precise and and the
examples more concrete, there is always a point at which joe c will need
to depart the exchange before anybody else.  Of course, depending on the
topic, this can happen to any of us, but in his case it takes quite a
while because the starting point is usually gibberish.  This is obvious
even to somebody like me who is only interested in isolated aspects of
db theory.  He seems to be a modern version of those ancient so-called
poets who played semantic games but who were embraced by the main
population.  He should stay away from theory, after all it can't help
his book sales, as that audience could care less about theory, and from
what he posts here, it certainly isn't his subject.
Gene Wirchenko - 02 Jul 2008 01:33 GMT
[snip]

>> Yes, zero is the additive identity.  But this is a convention used to
>> get rid of the empty set problem and preserve easy computations.

>If you think that zero is the additive identity by convention, then
>you clearly do not understand what a convention is.
>There is no problem to get rid of.

    Hey, he threw in transfinite numbers over this!  We can hand him
the ordered field axioms.

>After that, you said a bunch of irrelevant stuff which I'm just
>going to ignore.

    It does make a good shovel though.  He has dug himself in deep.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Keith H Duggar - 08 Jul 2008 03:42 GMT
CELKO wrote :
> Marshall wrote :
> > False! It's not from nothing, and it's not simply a convention.
> > It's the identity of the operator being aggregated. <<
>
> Yes, zero is the additive identity.  But this is a convention used to
> get rid of the empty set problem and preserve easy computations.

Yes, sum[{}] = 1 eliminates problems and makes computation easier.
It's a shame that SQL offers neither of those benefits.

KHD
Marshall - 30 Jun 2008 23:54 GMT
> Well, the OP wanted examples of SQL anomalies, and you've just confirmed a
> big one.
[quoted text clipped - 5 lines]
> and SUM are something other than these common sense usages exemplifies their
> anomalous nature.

[I meant to say this in my other post, but]

Brian gets it exactly right here.

Marshall
Bob Badour - 01 Jul 2008 00:07 GMT
>>Well, the OP wanted examples of SQL anomalies, and you've just confirmed a
>>big one.
[quoted text clipped - 11 lines]
>
> Marshall

Well, except that SQL adds the additional anomaly that
count(*) != sum(1)

ie. SQL can count zero rows with count but not with sum.
paul c - 01 Jul 2008 02:03 GMT
>>> Well, the OP wanted examples of SQL anomalies, and you've just
>>> confirmed a
[quoted text clipped - 22 lines]
>
> ie. SQL can count zero rows with count but not with sum.

I was hoping you two would pipe up about this because as usual, it is
completely unclear whether CELKO is trying to answer the OP's question
or whether he is trying to justify SQL's myriad behaviours.  (Not
claiming this is deliberate, though I do have my suspicions!)

Not that I know much about SQL, having given up on it many years ago.

On the matter of 'common sense', I've learned not to trust it, at least
as a first principle, as it often turns out to be a cover phrase meaning
'that's what I thought in the first place' and thus varies dramatically
depending on the person, the moment and the lingo in play.  For sure,
bringing up intricate arbitrary logics like Johnston's is phony in this
case.  Elementary logic such as described, say, by Bertrand Russell (who
had far fewer axes to grind than Celko and Johnston), in which the empty
set has zero members, happens to agree with Brian S's common sense, so
everybody except of course for the technocratic apologists and
opportunists ought to be content that SQL once again paints itself into
a corner.

OTOH, it may be that elementary logic is obscure to many  and will
always be so.  I remember a first-year logic course where fully three
quarters of 300 exam candidates walked out of the hall in the first five
minutes.  I know they weren't finished.  (Not sure if it had anything to
do with the professor who was often hospitalized or on crutches because
of his habit of not looking when he crossed the road.)  However Celko is
devious by suggesting that clear explanation lies in the further
complications he touts.  He is actually promoting unnecessary and faux
elitism.  Also, if it is indeed the case that the average programmer who
has grown up at the knees of mostly physical concepts is incapable of
understanding Russell, that can only be an argument for avoiding the
arbitrariness of SQL.

There are a few tricks in FOL that can trip anybody up, but far fewer
than the rather amateur legalese the SQL standard contains.
Gene Wirchenko - 01 Jul 2008 02:27 GMT
>>>> Well, the OP wanted examples of SQL anomalies, and you've just
>>>> confirmed a
[quoted text clipped - 15 lines]
>>>
>>> Brian gets it exactly right here.

>> Well, except that SQL adds the additional anomaly that
>> count(*) != sum(1)
>>
>> ie. SQL can count zero rows with count but not with sum.

>I was hoping you two would pipe up about this because as usual, it is
>completely unclear whether CELKO is trying to answer the OP's question
>or whether he is trying to justify SQL's myriad behaviours.  (Not
>claiming this is deliberate, though I do have my suspicions!)

    Fabian Pascal would say that he is doing a Celko.  Now, I have a
sample of my own.  Pulling in transfinite numbers to justify SQL's
sum() nonsense is nonsense itself.

>Not that I know much about SQL, having given up on it many years ago.

>On the matter of 'common sense', I've learned not to trust it, at least
>as a first principle, as it often turns out to be a cover phrase meaning
[quoted text clipped - 7 lines]
>opportunists ought to be content that SQL once again paints itself into
>a corner.

     It is pretty simple, but gets sold as being difficult.

>OTOH, it may be that elementary logic is obscure to many  and will
>always be so.  I remember a first-year logic course where fully three
>quarters of 300 exam candidates walked out of the hall in the first five
>minutes.  I know they weren't finished.  (Not sure if it had anything to
>do with the professor who was often hospitalized or on crutches because
>of his habit of not looking when he crossed the road.)  However Celko is

    It may have a few surprises, but it is not really that bad.

>devious by suggesting that clear explanation lies in the further
>complications he touts.  He is actually promoting unnecessary and faux
>elitism.  Also, if it is indeed the case that the average programmer who

    Quite.

>has grown up at the knees of mostly physical concepts is incapable of
>understanding Russell, that can only be an argument for avoiding the
>arbitrariness of SQL.

    Logic may have a few tricks and traps to it, but it sure beats
the alternatives.

>There are a few tricks in FOL that can trip anybody up, but far fewer
>than the rather amateur legalese the SQL standard contains.

     Quite.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
paul c - 01 Jul 2008 02:49 GMT
...
>      Fabian Pascal would say that he is doing a Celko.  ...

That's might be more polite than calling somebody a professional obfuscator.
Bob Badour - 01 Jul 2008 03:48 GMT
>>>> Well, the OP wanted examples of SQL anomalies, and you've just
>>>> confirmed a
[quoted text clipped - 57 lines]
> There are a few tricks in FOL that can trip anybody up, but far fewer
> than the rather amateur legalese the SQL standard contains.

The rules for aggregation don't even have all that much to do with FOL.

I remember the time Joe tried to argue that an aggregate over zero
elements is undefined and cited Graham, Knuth et al. In fact, he cited
the page before the one that stated point blank that an aggregate over
zero elements is defined as the identity element for the base operation.

He's an idiot.
Marshall - 01 Jul 2008 07:36 GMT
> The rules for aggregation don't even have all that much to do with FOL.

Ah, I see Bob has preceded me with this point.

> I remember the time Joe tried to argue that an aggregate over zero
> elements is undefined and cited Graham, Knuth et al. In fact, he cited
> the page before the one that stated point blank that an aggregate over
> zero elements is defined as the identity element for the base operation.

D'oh! :-)

Marshall
Marshall - 01 Jul 2008 07:34 GMT
> >> [I meant to say this in my other post, but]
>
[quoted text clipped - 11 lines]
> or whether he is trying to justify SQL's myriad behaviours.  (Not
> claiming this is deliberate, though I do have my suspicions!)

'fraid I can't help with that! :-)

> On the matter of 'common sense', I've learned not to trust it, at least
> as a first principle, as it often turns out to be a cover phrase meaning
[quoted text clipped - 7 lines]
> opportunists ought to be content that SQL once again paints itself into
> a corner.

Yeah, I considered quibbling with the phrase "common sense" but
decided against. Brian didn't say that common sense was what
*caused* that to be the answer; he just described that answer
as being (consistent with) common sense. Which I'd call exactly
right.

(I figure I give Brian enough grief when he says stuff I disagree
with, I ought to be able to give credit when it's due.)

> [...]
> There are a few tricks in FOL that can trip anybody up, but far fewer
> than the rather amateur legalese the SQL standard contains.

Agreed.

Although to take it one further, one doesn't even need to appeal
to FOL to justify; one can appeal to basic arithmetic.

Or one can look to the functional programming world, and
see what they do with folds over empty lists.

Or one can even appeal to an imperative programmer's inherently
procedural bias, and ask him how he'd most simply program it.
He'd say:

count = 0;
for each item
 count = count + 1
end for

And what does that evaluate to if there are no items?

Marshall
-CELKO- - 01 Jul 2008 17:26 GMT
>> Or one can even appeal to an imperative programmer's inherently procedural bias, and ask him how he'd most simply program it.
He'd say:

count = 0;
for each item
  count = count + 1
end for

And what does that evaluate to if there are no items? <<

Notice how you created zero from nothing?

Things can get really bad.  Your choices for defining a summation
indexed over an empty are to define it as zero to get rid of it or to
define it as undefined to get rid of it.  The important thing is to
get rid of it.

Another convention in traditional Sigma notation is that when the
initial value of an index is greater than the final value, the
summation is zero.  That is not a set-oriented approach to
summations.  Again, the idea that ordering changes the results is
strictly a convention.

What do you want to do about 0^0?   How would you define 0/0 without a
convention?  Instead of a FOR EACH loop, we might define summation
this way in pseudo-SQL.

CREATE FUNCTION Sum(my_set)
RETURNS <numeric data type>
AS
BEGIN
DECLARE total <numeric data type>;
SET total = MIN(my_set); -- exists in the data!
REPEAT
SET my_set = my_set – {MIN(my_set)}; -- remove element from countable
set
SET total =  total + MIN (my_set); -- add element that exists in data
to total
UNTIL my_set = {}; -- use all the elements
RETURN total;
END;

This handles a set with one element and we don't have to create a zero
out of thin air.  The loop removes the minimal element (could be max,
could be random) from the set and adds it to the running total until
the set is empty.  If the set starts empty, we get a NULL.

Now try these statements:

CREATE TABLE Empty – no rows possible
(i INTEGER NOT NULL
 CHECK (1=0));

SELECT SUM(i) FROM Empty;  -- returns  null
SELECT SUM(i) FROM Empty GROUP BY i; -- returns an empty table
SELECT SUM(i), COUNT(i), COUNT(*) FROM Empty GROUP BY i; --empty
table, 3 columns

So why is GROUP BY not returning a NULL or a zero?  That will give you
a headache!  A table always has at least one grouping, so there is
always at least one row.  Think of {{}}, a set whose element is the
empty set.  Let's keep going with another table that only has a NULL:

CREATE TABLE JustNulls
(i INTEGER CHECK (i IS NULL));

INSERT INTO JustNull VALUES (NULL);

SELECT SUM(i) FROM JustNull;  -- returns null
SELECT SUM(i) FROM JustNull GROUP BY i; -- returns null
SELECT SUM(i), COUNT(i), COUNT(*) FROM JustNull GROUP BY i; -- null,
0, 1
Gene Wirchenko - 01 Jul 2008 18:09 GMT
>>> Or one can even appeal to an imperative programmer's inherently procedural bias, and ask him how he'd most simply program it.

[snipped multiple screens of lines]

    Wow!  Mr. Celko is really on a roll of blather.

Sincerely,

Gene Wirchenko

Computerese Irregular Verb Conjugation:
    I have preferences.
    You have biases.
    He/She has prejudices.
Marshall - 01 Jul 2008 19:38 GMT
> >> Or one can even appeal to an imperative programmer's inherently procedural bias, and ask him how he'd most simply program it.
>
[quoted text clipped - 13 lines]
> define it as undefined to get rid of it.  The important thing is to
> get rid of it.

Not at all. I'm not "getting rid" of anything. The goal is not
to brush things under the rug; the goal is to preserve
the algebraic identities that are so useful in so many ways.
To do that, it is necessary to support aggregating binary
operations over collections that have zero or one members,
even though the binary operation requires *two* arguments.

And it's not like this is in any way *hard* or anything. I mean,
how old was I when I learned what an identity element was?
I'm thinking it was junior high.

> Another convention in traditional Sigma notation is that when the
> initial value of an index is greater than the final value, the
> summation is zero.  That is not a set-oriented approach to
> summations.  Again, the idea that ordering changes the results is
> strictly a convention.

It's not a convention, and it's not order-dependent. Indexes
are all greater than or equal to the initial value, and less than
or equal to the final value. If the final value is smaller than
the initial value, then there are no qualifying indexes.

> What do you want to do about 0^0?

0^0 = 1.

http://en.wikipedia.org/wiki/0%5E0#Zero_to_the_zero_power

> How would you define 0/0 without a convention?

One *doesn't* define 0/0. Not every function is total.

> Instead of a FOR EACH loop, we might define summation
> this way in pseudo-SQL.
[quoted text clipped - 18 lines]
> could be random) from the set and adds it to the running total until
> the set is empty.  If the set starts empty, we get a NULL.

That sucks. That is *terrible.* It goes to extra trouble to define
a semantics that's obviously worse than the simpler thing.

> Now try these statements:
>
[quoted text clipped - 21 lines]
> SELECT SUM(i), COUNT(i), COUNT(*) FROM JustNull GROUP BY i; -- null,
> 0, 1

All of these are simpler without NULLs.

Hey, NULLs were an interesting experiment. I don't think, back
in the day, we could reasonably expect people to know what
a bad idea it was. And so it's in the standard, and backward
compatibility is important, so it stays in the standard. But
don't think that means they were a good idea to begin with.
They weren't. Sorry.

Marshall
-CELKO- - 01 Jul 2008 21:33 GMT
>> What do you want to do about 0^0?
0^0 = 1.

http://en.wikipedia.org/wiki/0%5E0#Zero_to_the_zero_power <<

Let me cut & paste what is at that link:

The evaluation of 0^0 presents a problem, because different
mathematical reasoning leads to different results. The best choice for
its value — and indeed, whether or not to consider 0^0 indeterminate
(i.e., undefined) — depends on the context. According to Benson
(1999), "The choice whether to define 0^0 is based on convenience, not
on correctness."[2]

That is not very supportive of your position.
Marshall - 01 Jul 2008 23:52 GMT
> >> What do you want to do about 0^0?
>
[quoted text clipped - 12 lines]
>
> That is not very supportive of your position.

It also lists nine bullet points for why it should be treated as 1,
and
two bullet points for why it should be treated as undefined. And
it explicitly says:

"When 0^0 is regarded as an empty product of zeros, its value is 1."

which is the most directly relevant case when discussing aggregation.

Or you could read the section on Exponentiation in Abstraction Algebra

http://en.wikipedia.org/wiki/0%5E0#Exponentiation_in_abstract_algebra

and note that it also is consistent with 0^0 = 1.

Or you could just realize that exponentiation is not used as
an aggregate operator, and the whole issue is just a red herring.

Marshall
Keith H Duggar - 08 Jul 2008 03:50 GMT
CELKO wrote :
> Marshall wrote :
> > False! It's not from nothing, and it's not simply a convention.
> > It's the identity of the operator being aggregated. <<
>
> Yes, zero is the additive identity.  But this is a convention used to
> get rid of the empty set problem and preserve easy computations.

Yes, sum[{}] = 1 eliminates problems and makes computation easier.
It's a shame that SQL offers neither of those benefits.

KHD
goanna - 08 Jul 2008 12:41 GMT
> CELKO wrote :
> > Marshall wrote :
[quoted text clipped - 3 lines]
> > Yes, zero is the additive identity.  But this is a convention used to
> > get rid of the empty set problem and preserve easy computations.

> Yes, sum[{}] = 1 eliminates problems and makes computation easier.
> It's a shame that SQL offers neither of those benefits.

I guess you mean sum[{}] = 0
Keith H Duggar - 10 Jul 2008 03:21 GMT
> > CELKO wrote :
> > > Marshall wrote :
[quoted text clipped - 7 lines]
>
> I guess you mean sum[{}] = 0

Yes. Thank you.
Ed Prochak - 02 Jul 2008 14:15 GMT
> > Well, the OP wanted examples of SQL anomalies, and you've just confirmed a
> > big one.
[quoted text clipped - 11 lines]
>
> Marshall

I'm not so sure about the AT&T bill. Consider the question may be
badly phrased:
"How much were we billed by AT&T this month?"
Is the amount being sought the amount on the bill received this month?
or the amount for this month's services?
If the question is asked July 1 and we haven't got the bill in the
mail yet, the answer must be "I don't know."
And who is the "we"? Are there multiple entities involved? (multiple
businesses owned by one parent company with separate phone accounts/
bills? or maybe a home business and personal home phone account both
shown on one bill?)

So even in real life NULL (aka "I don't know") happens to be the
correct answer to some questions. It is not as clean cut as you and
Brian would like.
 Ed
Bob Badour - 02 Jul 2008 14:26 GMT
>>>Well, the OP wanted examples of SQL anomalies, and you've just confirmed a
>>>big one.
[quoted text clipped - 28 lines]
> Brian would like.
>   Ed

You are suggesting the computer should have some way to evaluate
external predicates. The computer cannot. Your argument is a red herring.

If one expects a bill and the total is 0, that's just as useful to alert
one that the bill has not yet arrived as a NULL would be.
-CELKO- - 02 Jul 2008 16:00 GMT
>> If one expects a bill and the total is $0.00, that's just as useful to alert one that the bill has not yet arrived as a NULL would be. <<

No it isn't.  I pay my bills ahead as much as possible -- an old habit
from years of consulting work and irregular income. A zero bill simply
means that my advance payments have been used up and my bill is paid
this month.  A NULL means the bill has not arrived yet, so I had
better be ready to take action.  Big differences in how I account for
my expenses.
paul c - 02 Jul 2008 16:25 GMT
>>> If one expects a bill and the total is $0.00, that's just as useful to alert one that the bill has not yet arrived as a NULL would be. <<
>
[quoted text clipped - 4 lines]
> better be ready to take action.  Big differences in how I account for
> my expenses.

Once again you are introducing confusion, in this case muddying the
waters with unstated design questions, confusing a system that is
designed to anticipate suppliers and bills with one that isn't.   The
fact that such misconnections is tolerated by so many is just a
reflection of the rampant illiteracy in the western world today.  Those
people who can't position an English sentence relative to the task at
hand are no different from the janitor I knew who wanted the computer to
pick tommorow's track winners for him, they are basically incompetent
for the task.
-CELKO- - 02 Jul 2008 16:44 GMT
>> Once again you are introducing confusion, in this case muddying the waters with unstated design questions, confusing a system that is designed to anticipate suppliers and bills with one that isn't. <<

Do you really find that a credit balance is not a common business
practice? What I am pointing out is that a zero is a value on a scale
and a NULL or other miss