Database Forum / DB2 Topics / April 2006
SUM(double_value) is not deterministic?
|
|
Thread rating:  |
Artur - 25 Apr 2006 22:26 GMT Hi,
I'm using predefined SUM(() function on DOUBLE column. Surpisingly the sum can give different results depending on the order of values in my table.
Is that correct?
I can explain it, as adding two double values can entail rouding the result, so the order of adding influence the final result.
Actually I cannot change my select statement to cast double column to decimal, or to change the type in the database. I'm looking for a new SUM(double) aggregate function, which internaly will be adding values using decimal, f.e. decimal(31,15) and which finally return casted double value.
Here is a code to replicate the issue: --------------------------------------------------------------- create table table1 (value double) @ create procedure load() begin declare c int default 0; while (c < 100000) do
set c = c + 1; insert into table1 values ( cast(rand() as double) );
end while; end @ call load() @ create table table2 like table1 @ insert into table2 select * from table1 order by value @ select sum(value) from table1 @ select sum(value) from table2 @ select double(sum(cast(value as decimal(31,15)))) from table1 @ results: +4,99359445173494E+004 +4,99359445173499E+004
The correct one in my case is: +4,99359445173498E+004
Regards, Artur Wronski
Serge Rielau - 26 Apr 2006 01:20 GMT > Hi, > [quoted text clipped - 47 lines] > The correct one in my case is: > +4,99359445173498E+004 I'm no expert in the matter, but I can't say that I'm terribly surprised.
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Tonkuma - 26 Apr 2006 02:28 GMT First, It is well known that the result of sum of floating point numbers would be different by sequence of summation of numbers, in mathematical area. To get more accurate result, usually summed from small numbers to large numbers.
That might be the reason of the different results of SUM on table1 and table2.
But, I don't know this means SUM is not deterministic (It might be depending on DB2 specification(definition of deterministic)).
Second, You tried select double(sum(cast(value as decimal(31,15)))) from table1 I thought that an error(drop some lowest digits) may be introduced at the time of converting from float to decimal. So, this way may not usefull to get correct result.
Last, my question is how do you get correct one(+4,99359445173498E+004)?
Artur - 26 Apr 2006 08:28 GMT The problem is that having two different sessions, one from CLP, one from CLI application I get different results of the sum running the same query on the same static table.
Let's say that I have only 3 values, and in my case let's assume the precision is only 3 digits: 1.14 2.23 7.99
I can do the sum in two ways, that give different final result:
1.14 + 2.23 -> 3.37 + 7.99 -> 11.4 2.23 + 7.99 -> 10.2 + 1.14 ->11.3
And in my case the correct sum is: 1.14 + 2.23 -> 3.37 + 7.99 -> 11.36 -> 11.4 2.23 + 7.99 -> 10.22 + 1.14 -> 11.36 -> 11.4
Correct in my definition means not order depended. I know that in general it can be difficult to achieve, but the idea is to add values internally using different type (which can hold the intermediate result without truncation), or even with predefined, each time the same order (fe. not to start adding the highest with lowest value).
I wish to have a way to writh my own sum(double) aggregate function to replace the standard one (my customer is having badly coded report which stores data in double field, and have no access to the source code).
-- Artur Wronski
Tonkuma - 27 Apr 2006 02:38 GMT > I can do the sum in two ways, that give different final result: > > 1.14 + 2.23 -> 3.37 + 7.99 -> 11.4 > 2.23 + 7.99 -> 10.2 + 1.14 ->11.3 How did you calculate these expressions? Especially my question is that why interim result(10.2) in second expression has only one decimal digit .
Here is an example of calculation of second expression by SQL. I got correct interim result and final sum. SELECT interim , interim + 1.14 fimnal FROM (VALUES 2.23 + 7.99 ) Q (interim); --------------------------------------------------------------------
INTERIM FIMNAL ------- ------- 10.22 11.36
1 record(s) selected.
Tonkuma - 27 Apr 2006 03:32 GMT I forgot you wrote "precision is only 3 digits". But, how do you know scale of interim result without knowing data values boforehand, if you want to make scale of interim result also 3? If calculated by SQL and you didn't specify precision and scale of interim result, precisions and scales will be followings. 1.14 + 2.23 -> 3.37 + 7.99 [ -> 11.36] -> 11.3 DEC(3,2) + DEC(3,2) -> DEC(4,2) + DEC(3,2) [ -> DEC(6,2) ] -> DEC(3,1) 2.23 + 7.99 -> 10.22 + 1.14 [ -> 11.36] ->11.3 DEC(3,2) + DEC(3,2) -> DEC(4,2) + DEC(3,2) [ -> DEC(6,2) ] -> DEC(3,1) To get correct answer with 3 digits, it would be better to use ROUND for final result.
SELECT interim , DEC(ROUND( (interim + DEC(7.99, 3 , 2)), 1), 3, 1) fimnal FROM (VALUES DEC(1.14, 3,2) + DEC(2.23, 3 , 2) ) Q (interim); --------------------------------------------------------------------
INTERIM FIMNAL ------- ------ 3.37 11.4
1 record(s) selected.
------------------------- Commands Entered ------------------------- SELECT interim , DEC(ROUND( (interim + DEC(1.14, 3 , 2)), 1), 3, 1) fimnal FROM (VALUES DEC(2.23, 3,2) + DEC(7.99, 3 , 2) ) Q (interim); --------------------------------------------------------------------
INTERIM FIMNAL ------- ------ 10.22 11.4
1 record(s) selected.
Mehmet Baserdem - 27 Apr 2006 04:17 GMT Artur,
When you add the smallest double 0.000000.......000001 to all your double numbers and then sum up and round, it is giving the right total.
I tried this
select round(sum(myfield+0.0000000000000000000001), 2) from mytable
You can replace 2 with any number you want.
regards,
Mehmet
Knut Stolze - 28 Apr 2006 10:46 GMT > Hi, > [quoted text clipped - 3 lines] > > Is that correct? There is a very nicely written summary on floating point numbers here:
http://docs.sun.com/source/806-3568/ncg_goldberg.html
 Signature Knut Stolze DB2 Information Integration Development IBM Germany
Artur - 28 Apr 2006 23:30 GMT Thank you guys for you prompt response.
I'm convinced to analyze value ranges in the database and suggest the database owner to change double to some more precise type.
-- Artur Wronski
|
|
|