Database Forum / DB2 Topics / July 2006
Variables in SP do not compare as equal when both are NULL
|
|
Thread rating:  |
Mark A - 10 Jul 2006 06:33 GMT DB2 8.2 for Linux, FP 10 (also performs the same on DB2 8.2 for Windoes, FP 11).
Using the SAMPLE database, tables EMP and EMLOYEE.
In the followng stored procedure, 2 NULL columns (COMM) are selected into 2 different SP variables and compared for equal. They are both NULL, but do not compare as equal. When the Not NULL columns (SALARY) are compared, they do compare as equal.
Is there a reason for this?
Statement terminator is @.
-- SET THE COMMISION TO NULL ON TWO SAMPLE TABLES FOR EMPNO '000010'
UPDATE EMP SET COMM = NULL WHERE EMPNO = '000010'@ UPDATE EMPLOYEE SET COMM = NULL WHERE EMPNO = '000010'@
-------------------------------------------------- DROP PROCEDURE TEST_SP@
CREATE PROCEDURE TEST_SP ( OUT EMP_COMM DECIMAL(9,2), OUT EMP_SALARY DECIMAL(9,2), OUT EMPLOYEE_COMM DECIMAL(9,2), OUT EMPLOYEE_SALARY DECIMAL(9,2), OUT v_ERRMSG_1 VARCHAR(500), OUT v_ERRMSG_2 VARCHAR(500) ) SPECIFIC TEST_SP RESULT SETS 0 MODIFIES SQL DATA LANGUAGE SQL BEGIN
--------------------------------------------- -- Declare variables ---------------------------------------------
DECLARE not_found CONDITION FOR SQLSTATE '02000'; DECLARE at_end SMALLINT DEFAULT 0;
DECLARE A_EMPNO CHAR(6); DECLARE A_EMP_COMM DECIMAL(9,2); DECLARE A_EMP_SALARY DECIMAL(9,2);
DECLARE B_EMPNO CHAR(6); DECLARE B_EMPLOYEE_COMM DECIMAL(9,2); DECLARE B_EMPLOYEE_SALARY DECIMAL(9,2);
--------------------------------------------- -- Declare exit handlers ---------------------------------------------
DECLARE EXIT HANDLER FOR SQLEXCEPTION GET DIAGNOSTICS EXCEPTION 1 v_ERRMSG_1 = MESSAGE_TEXT;
DECLARE CONTINUE HANDLER FOR not_found SET at_end = 1;
DECLARE EXIT HANDLER FOR SQLWARNING GET DIAGNOSTICS EXCEPTION 1 v_ERRMSG_1 = MESSAGE_TEXT;
--------------------------------------------------------------- -- Begin processing logic ---------------------------------------------------------------
SELECT EMPNO, COMM, SALARY INTO A_EMPNO, A_EMP_COMM, A_EMP_SALARY FROM DB2INST1.EMP WHERE EMPNO = '000010';
SELECT EMPNO, COMM, SALARY INTO B_EMPNO, B_EMPLOYEE_COMM, B_EMPLOYEE_SALARY FROM DB2INST1.EMPLOYEE WHERE EMPNO = A_EMPNO;
IF A_EMP_COMM = B_EMPLOYEE_COMM THEN SET V_ERRMSG_1 = 'NULLS COMPARE AS EQUAL'; ELSE SET V_ERRMSG_1 = 'NULLS DO NOT COMPARE AS EQUAL'; END IF;
IF A_EMP_SALARY = B_EMPLOYEE_SALARY THEN SET V_ERRMSG_2 = 'NON-NULLS COMPARE AS EQUAL'; ELSE SET V_ERRMSG_2 = 'NON-NULLS DO NOT COMPARE AS EQUAL'; END IF;
SET EMP_COMM = A_EMP_COMM; SET EMP_SALARY = A_EMP_SALARY;
SET EMPLOYEE_COMM = B_EMPLOYEE_COMM; SET EMPLOYEE_SALARY = B_EMPLOYEE_SALARY;
END@
CALL TEST_SP (?,?,?,?,?,?)@
Value of output parameters -------------------------- Parameter Name : EMP_COMM Parameter Value : -
Parameter Name : EMP_SALARY Parameter Value : 52750.00
Parameter Name : EMPLOYEE_COMM Parameter Value : -
Parameter Name : EMPLOYEE_SALARY Parameter Value : 52750.00
Parameter Name : V_ERRMSG_1 Parameter Value : NULLS DO NOT COMPARE AS EQUAL
Parameter Name : V_ERRMSG_2 Parameter Value : NON-NULLS COMPARE AS EQUAL
Return Status = 0
mike - 10 Jul 2006 06:42 GMT > In the followng stored procedure, 2 NULL columns (COMM) are selected into 2 > different SP variables and compared for equal. They are both NULL, but do > not compare as equal. When the Not NULL columns (SALARY) are compared, they > do compare as equal. > > Is there a reason for this? Yep, that's the way it's supposed to be.
db2 "values (case when (nullif(0,0) = nullif(0,0)) then 1 else 0 end)"
1 ----------- 0
1 record(s) selected.
db2 "values (case when (nullif(0,0) is null AND nullif(0,0) IS NULL) then>
1 ----------- 1
1 record(s) selected.
Mark A - 10 Jul 2006 06:48 GMT > Yep, that's the way it's supposed to be. > [quoted text clipped - 14 lines] > > 1 record(s) selected. Yes, I figured out that is how it works. Is there is reason for this?
Jeroen van den Broek - 10 Jul 2006 07:47 GMT >> Yep, that's the way it's supposed to be. >> [quoted text clipped - 16 lines] > > Yes, I figured out that is how it works. Is there is reason for this? Yes, there is. It is in the definition of NULL's. A NULL means the specific value is unknown, NULL in itself isn't a value that can be matched against any other value.
 Signature Jeroen
mike - 10 Jul 2006 07:48 GMT > Yes, I figured out that is how it works. Is there is reason for this? It is because in SQL, NULL is not a value, but the absence of one. The equality predicate compares two values, not absence of them, and if either of the operands to the equality predicate are NULL then the result is unknown.
Mark A - 10 Jul 2006 08:09 GMT > It is because in SQL, NULL is not a value, but the absence of one. > The equality predicate compares two values, not absence of them, > and if either of the operands to the equality predicate are NULL > then the result is unknown. That sounds like a circular argument to me (as does the other one posted by Jeroene).
You claim that null is not a value, but the phrase "null value" turns up frequently, including in DB2 documentation. And there are exceptions, such as when using the DISTINCT predicate, where "null values are considered equal. So (COL1 is NOT DISTINCT from COL2) will be true if both columns contain an equal non-null value and also when both columns are the null value." [quote from iSeries DB2 Infocenter].
Obviously, someone thought that relational theory dictated that two nulls are not equal, but I don't quite understand under what conditions that it would be useful. It just seems we have to do a lot of extra coding to get around this "rule".
mike - 10 Jul 2006 08:23 GMT > ... It just seems we have to do a lot of extra coding to get > around this "rule". Consider using COALESCE with predicates that involve nullable columns to avoid explicitly coding the extra null tests.
Brian Tkatch - 10 Jul 2006 15:22 GMT > > It is because in SQL, NULL is not a value, but the absence of one. > > The equality predicate compares two values, not absence of them, [quoted text clipped - 15 lines] > would be useful. It just seems we have to do a lot of extra coding to get > around this "rule". It is a misunderstanding of what NULL is. NULL is on a different "level" than other values, as it talks about the value. All values are either known or unknown. That is, if it is known, the actual value can be returned, if it is unknown, only a big question-mark can be returned.
The second thing is, that a WHERE clause tests for a boolean value, and so, a truth-table can be used to discern what it means. For example, the AND truth-table:
(T)RUE/(F)ALSE
T+T=T T+F=F F+T=F F+F=F
If NULL is added to any equation, we cannot answer any questions, because since we don't know the value, its truth cannot be discerned. Being neither TRUE nor FALSE, we must say we don't know, which is database terminology is called NULL. Comparing NULL to NULL is comparing to completely unknown values. There is no way anyone could figure out how to compare them, so the result must be NULL as well.
Ultimately, to deal with NULLs, the special keyword IS is used.
As for why the documentation refers to it as a NULL value, it is because it is differentiation between types of value, not of the values themselves.
As for the INDEX, INDEXes are not part of the data, as much as they are part of making the database work. So, implementation of NULL values on an INDEX is up to the developers, and database implementations are not consistent.
If you find NULL being more trouble than it is worth, perhaps NULL is not appropriate for your scenario. Because if you use =, you are saying you know what the value is. If there is no value, it should be a zero-length string ('') not a NULL-value.
B.
Mark A - 10 Jul 2006 19:14 GMT "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message at conditions that it
> If NULL is added to any equation, we cannot answer any questions, > because since we don't know the value, its truth cannot be discerned. > Being neither TRUE nor FALSE, we must say we don't know, which is > database terminology is called NULL. Comparing NULL to NULL is > comparing to completely unknown values. There is no way anyone could > figure out how to compare them, so the result must be NULL as well. If COMM on table EMP is null, and COMM on table EMPLOYEE is null (both for the same EMPNO= '000010', which is the PK), if I say:
IF EMP.COMM = EMPLOYEE.COMM
then I "CAN" compare them and IMO, they should be equal even if both values are null. I understand that you disagree, but that is your opinion.
> Ultimately, to deal with NULLs, the special keyword IS is used. Not exactly. What I can do is:
IF COALESCE(EMP.COMM,0) = COALESCE(EMPLOYEE.COMM,0)
or
IF ((EMP.COMM = EMPLOYEE.COMM) OR (EMP.COMM IS NULL AND EMPLOYEE.COMM IS NULL))
It seems to me that those who actually have a need for IF EMP.COMM = EMPLOYEE.COMM to test false (assuming that both columns are null) are in the small minority (it would be hard for me to imagine what that would be used for) and that any special syntax for it test true (as Serge mentioned may be coming) should have been reserved for the rare exceptions and not the majority case where most people would expect it to test true.
> As for why the documentation refers to it as a NULL value, it is > because it is differentiation between types of value, not of the values > themselves. This sounds like gobblygook to me. If the column is null, then it contains null values. The value is null.
Brian Tkatch - 10 Jul 2006 20:07 GMT > "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message at > conditions that it [quoted text clipped - 13 lines] > then I "CAN" compare them and IMO, they should be equal even if both values > are null. I understand that you disagree, but that is your opinion. Imagine you are an investigator, and you want to find out if the perpetrator of crime A is also the perpetrator of Crime B. So, you look at the files, and find that the perpetrator in both cases in unknown. According to your definition of NULL (AKA unknown) you conclude that both perpetrators are indeed the same person.
You seem to be making the mistake of thinking that NULL is a value. NULL is not a value. It is a state of unknown.
> > Ultimately, to deal with NULLs, the special keyword IS is used. > > Not exactly. What I can do is: > > IF COALESCE(EMP.COMM,0) = COALESCE(EMPLOYEE.COMM,0) And what do you think COALESCE() does anyway?
It tests for IS NULL!
> or > > IF ((EMP.COMM = EMPLOYEE.COMM) OR (EMP.COMM IS NULL AND EMPLOYEE.COMM IS > NULL)) Correct. Each case of IS NULL returns either TRUE or FALSE.
> It seems to me that those who actually have a need for IF EMP.COMM = > EMPLOYEE.COMM to test false (assuming that both columns are null) are in the > small minority Noone wants it to test FALSE. They want it to test NULL.
> (it would be hard for me to imagine what that would be used for) Guess Willy Wanka would be dissapoiinted with you, eh? :)
Imagine i have a database that tracks all my lottery ticket purchases and winning numbers i can run reports of my accuracy. At the beginning of the week i purchase a ticket, and put in my chosen numbers, leaving the field for the winning numbers NULL, because they are unknown. After the winning number is chosen i UPDATE the field with the winning numbers to the now known entry.
When FTPing a file, many times the file size is unknown until after the stream is done. If a database tracked downloads, the size field would be NULL at first.
When creating a new part at a factory, before a number has been assigned to the part, the name would be NULL until the paperwork goes through.
And so on. The cases abound, and happen every day.
>and that any special syntax for it test true (as Serge mentioned may be > coming) should have been reserved for the rare exceptions and not the > majority case where most people would expect it to test true. I see your assertion that most people want it to test true, and i match it with the assertion that most people want it to test NULL. And, i raise the pot with the assertion that if someone wanted it to be true they should actually be using a zero or a zero-length string.
> > As for why the documentation refers to it as a NULL value, it is > > because it is differentiation between types of value, not of the values > > themselves. > > This sounds like gobblygook to me. If the column is null, then it contains > null values. The value is null. I'm sorry it sounds like that to you as i believe it to make a lot of sense. However, you are certainly wlecome to your own opinion, and can submit it to the ANSI comittee on SQL to force it on others. Either way, you may use a database anyway you please, as it is your application. Note though, that DB2 is a database implementation that adheres to SQL standards (as set by ANSI), and NULL is very much handled correctly.
B.
Mark A - 11 Jul 2006 00:35 GMT "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message >
> Imagine you are an investigator, and you want to find out if the > perpetrator of crime A is also the perpetrator of Crime B. So, you look [quoted text clipped - 4 lines] > You seem to be making the mistake of thinking that NULL is a value. > NULL is not a value. It is a state of unknown. So, if I run the following query to find out how many crimes each perpetrator has committed, this is what I get (for simplicity, assume there is only one perpetrator per criminal case):
SELECT PERPETRATOR, COUNT(*) AS CRIMES FROM CRIMINAL_CASE GROUP BY PERPETRATOR
PERPETRATOR CRIMES ----------------------- -----------
ANDERSON, JOHN 1 ANSI BOARD 348 JONES, BOB 3 SMITH, SAM 2 - 20
Now from the above report, it shows that for 20 crimes, no perpetrator has been identified. It does not mean that the same person committed all 20 unsolved crimes (only an imbecile would conclude that).
So the ANSI implementation of COL1 = COL2 (if both all NULL) is inconsistent with the ANSI specification of a GROUP BY query. The GROUP BY considers them to be the same, but the "equal" test considers them to be different.
> No one wants it to test FALSE. They want it to test NULL. Not so. If I have an "IF" statement in a procedure comparing the two variables, then I want to test for True/False, not for NULL.
> I'm sorry it sounds like that to you as i believe it to make a lot of > sense. However, you are certainly wlecome to your own opinion, and can [quoted text clipped - 5 lines] > > B. Yes, it is clear that DB2 follows the ANSI standard, but it is not clear that the ANSI standard is consistent (as noted above) or is the preferred interpretation, and there are several databases that apparently use different interpretations (unless you set them to be ANSI compliant). As noted by Serge there are valid arguments on both sides, and DB2 is looking at an alternative solution via SQL syntax.
wombat53 - 11 Jul 2006 03:31 GMT A NULL value is not equal to anything, period (including another NULL value), in the the three valued logic of IBM's implementation of SQL. Hence the predicate syntax "IS/IS NOT NULL", as opposed to " = NULL". Since its value is not known, one cannot say that it is equal (or not equal) to something else. It is unknown. It may be known at a later time, or it may be forever unknowable (i.e. inapplicable; which Dr. Codd later differentiated as two different types of NULL values....) .
There are some anomalies associated with IBM's implementation of NULL values over the years, and I can't get into them here. One example, hoever, is that in a unique index, if you have two NULL values, they are duplicates!!! So, in that case they sort the same (i.e. equal). However I think IBM implemented a workaround to this, which I cannot quote off the top of my head. I am sure you can find it in the doco. wombat53
> "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message > > > Imagine you are an investigator, and you want to find out if the [quoted text clipped - 52 lines] > noted by Serge there are valid arguments on both sides, and DB2 is looking > at an alternative solution via SQL syntax. Brian Tkatch - 11 Jul 2006 15:27 GMT > "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message > > > Imagine you are an investigator, and you want to find out if the [quoted text clipped - 30 lines] > with the ANSI specification of a GROUP BY query. The GROUP BY considers them > to be the same, but the "equal" test considers them to be different. No, GROUP BY does not consider them to be equal. GROUP BY splits a TABLE into separate groups for use with aggregate FUNCTIONs. This includes unknown values all put in one group.
Aggregate FUNCTIONs, however, only deal with known values, and as such ignore NULL. COUNT(*), however, deals with records, not fields, and as such even if an individual field IS NULL, the record itself is still counted.
> > No one wants it to test FALSE. They want it to test NULL. > > Not so. If I have an "IF" statement in a procedure comparing the two > variables, then I want to test for True/False, not for NULL. I want to ask you a question, because it is my contention that you haven't the slightest idea when to use NULL, and when to use a zero or a zero-length string instead.
Why are there NULL fields in you database? That is, why are some fields NULL, instead of using a zero-length string for character data or zero for numeric data?
> > I'm sorry it sounds like that to you as i believe it to make a lot of > > sense. However, you are certainly wlecome to your own opinion, and can [quoted text clipped - 8 lines] > Yes, it is clear that DB2 follows the ANSI standard, but it is not clear > that the ANSI standard is consistent (as noted above) As above is not a challenge to its consistency. Please provide a real problem with it before challenging its consistency.
>or is the preferred interpretation It *is* the preferred interpretation. That is why basically every database implementation out there uses that with NULL. Only database-esque programs that use SQL for it's concise syntax, but not for the defintion of data would compare NULLs otherwise. MySQL being a good example of that genre.
>, and there are several databases that apparently use > different interpretations (unless you set them to be ANSI compliant). I'm calling your bluff. I want the names of databases that do that. Perhaps they are out there, but i don't know of them (yet).
> As noted by Serge there are valid arguments on both sides, and DB2 is looking > at an alternative solution via SQL syntax. Basically, that would be adding another opeator that handles NULLs otherwise. You could add a FUNCTION yourself that does that.
B.
Mark A - 12 Jul 2006 02:47 GMT "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message
> No, GROUP BY does not consider them to be equal. GROUP BY splits a > TABLE into separate groups for use with aggregate FUNCTIONs. This [quoted text clipped - 4 lines] > such even if an individual field IS NULL, the record itself is still > counted. That is a gigantic cop-out.
The GROUP BY puts them together because it assumes them to be the same for the purposes of the query. That obviously does not mean that the perpetrator for all the cases with no perpetrator were committed by the same perpetrator.
> I want to ask you a question, because it is my contention that you > haven't the slightest idea when to use NULL, and when to use a zero or [quoted text clipped - 3 lines] > NULL, instead of using a zero-length string for character data or zero > for numeric data? You are way off-base buddy. I know when and when not to use NULL. The original question referred to variables in DB2 SQL/PL and when one is determining if two variables with the exact same "values" in the specified are equal. So the question does not directly relate to database design, it relates to programming variables. It just so happens that in SQL/PL, variables are SQL data types, but this is not necessarily true in other languages.
> As above is not a challenge to its consistency. Please provide a real > problem with it before challenging its consistency. [quoted text clipped - 12 lines] > I'm calling your bluff. I want the names of databases that do that. > Perhaps they are out there, but i don't know of them (yet). I don't claim to be an expert in SQL Server (I perssume that he means MS SQL Server, but nor sure), or Sybase, but my statement was based on the following quote. I don't know if it still applies to the most current releases of these products:
"...in Oracle, Null is neither equal to nor not equal to Null. SQL Server, by default, does not do it that way: in SQL Server and Sybase, Null is equal to Null. Neither Oracle's, Sybase nor SQL Server's SQL processing is wrong - they are just different. Both databases are in fact ANSI compliant databases but they still work differently. There are ambiguities, backward compatibility issues, and so on, to be overcome. For example, SQL Server supports the ANSI method of Null comparison, just not by default (it would break thousands of existing legacy applications built on that database)."
http://www.apress.com/betabooks/bb_content/26/kyte_ch01_BETA.pdf
Brian Tkatch - 12 Jul 2006 15:10 GMT > "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message > > No, GROUP BY does not consider them to be equal. GROUP BY splits a [quoted text clipped - 12 lines] > for all the cases with no perpetrator were committed by the same > perpetrator. And let's say it didn't consider them the same. Should it then return one record for each NULL value? If NULL is unknown, it can neither say that they are the same, or that they are different. So, it just doesn't do anything.
By default, the entire TABLE is one group. GROUP BY separates based on values. In a sense, it leave NULLs alone, but because all values get their own group, the NULLs are left alone, and together.
> > I want to ask you a question, because it is my contention that you > > haven't the slightest idea when to use NULL, and when to use a zero or [quoted text clipped - 5 lines] > > You are way off-base buddy. I know when and when not to use NULL. I challenge that assertion. Because your questions on using them for equality show a basic ignorance of its nature.
> The original question referred to variables in DB2 SQL/PL and when one is > determining if two variables with the exact same "values" in the specified > are equal. So the question does not directly relate to database design, it > relates to programming variables. It just so happens that in SQL/PL, > variables are SQL data types, but this is not necessarily true in other > languages. SQL/PL processes data. Data is defined by the database. Therefore, SQL/PL must itself be designed specifically to database requirements.
Database requirements have a specific nature of NULL. SQL/PL must follow through and support that definition of NULL.
So, if SQL/PL was to change the way of handling NULLs, it would first need to be changed in the database.
That is why i interpret your question on handling NULLs in SQL/PL as a question on the database itself.
> > As above is not a challenge to its consistency. Please provide a real > > problem with it before challenging its consistency. [quoted text clipped - 28 lines] > > http://www.apress.com/betabooks/bb_content/26/kyte_ch01_BETA.pdf OK, i just checked a Sybase implementation (which mostly SQL Server acts the same way, being the same base) and indeed it supports the equality operator with NULL.
So, that is one (two) major implementations that do it.
(I would just like to add, that i despise SQL Server (and Sybase). They break nearly every rule. (And when i am asked for help, i usually try things i know are not valid SQL to see if they work, because there's no telling what it'll actually do). If only because they allow SELECT statements without a FROM clause, it would be enough to make anyone question it calling itself a database. Unfortunately, most people do consider it a database, so i must concede the point.)
B.
--CELKO-- - 13 Jul 2006 14:08 GMT >> Yes, it is clear that DB2 follows the ANSI standard, but it is not clear that the ANSI standard is consistent (as noted above) or is the preferred interpretation, and there are several databases that apparently use different interpretations (unless you set them to be ANSI compliant).<< GROUPING is not the same as testing for equality. Grouping is done with groups and equality is for scalars. We debated this in the old ANSI X3H2 committee decades ago when it was still an issue.
If you want to see if some columns are all NULL, use "COALESCE (c1, c2, ,,cn) IS NULL" on your scalars.
Jeroen van den Broek - 10 Jul 2006 22:54 GMT > "Brian Tkatch" <Maxwell_Smart@ThePentagon.com> wrote in message at > conditions that it [quoted text clipped - 38 lines] > This sounds like gobblygook to me. If the column is null, then it contains > null values. The value is null. Mark, please read this blog (the entry for April 5th 2006). It is not specific for DB2 - in fact it uses SQL Server syntax -, but it pretty much explains the pitfalls when using NULL's (watch the examples for AVG and SUM, for instance). When using NULL's, you must use 3-valued logic, not the ordinary TRUE/FALSE binary logic. Also have a look here: http://www.reviewnet.net/newsletter/articles/a3_feb02.htm which handles the same 'problem' area, from an Oracle perspective.
 Signature Jeroen
Serge Rielau - 10 Jul 2006 16:41 GMT >> It is because in SQL, NULL is not a value, but the absence of one. >> The equality predicate compares two values, not absence of them, [quoted text clipped - 3 lines] > That sounds like a circular argument to me (as does the other one posted by > Jeroene). To make a long story short: ANSI SQL There are ton's of papers on how NULL should or shouldn't behave.
> You claim that null is not a value, but the phrase "null value" turns up > frequently, including in DB2 documentation. And there are exceptions, such > as when using the DISTINCT predicate, where "null values are considered > equal. So (COL1 is NOT DISTINCT from COL2) will be true if both columns > contain an equal non-null value and also when both columns are the null > value." [quote from iSeries DB2 Infocenter]. Correct. It's so special they had to talk about it. :-) There is an alternative predicate in the works for the SQL Standard which will treat NULL as equal because there are valid arguments for both sides.
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
IOD Conference http://www.ibm.com/software/data/ondemandbusiness/conf2006/
|
|
|