Database Forum / DB2 Topics / January 2008
IBM To Acquire Solid Tech: Row Value Constructors
|
|
Thread rating:  |
Bernard Dhooghe - 29 Dec 2007 17:34 GMT So IBM will acquire Solid Tech.
Extract for the SQL Guide from SolidTech: (http://www.solidtech.com/developers/CarrierGrade/manuals/sqlguide/ html/SQLGuide.html#row.value.constructors):
"This section explains one of the less familiar types of expressions, the Row Value Constructor (RVC), and how it is used with relational operators, such as greater than, less than, etc.
A row value constructor is an ordered sequence of values delimited by parentheses, for example:
(1, 4, 9) ('Smith', 'Lisa') You can think of this as constructing a row based on a series of elements/values, just like a row of a table is composed of a series of fields. ... "
Never say never?
Bernard Dhooghe
Serge Rielau - 29 Dec 2007 22:25 GMT > Never say never? Bernard, I'm afraid if DB2 ever implemented row-comparisons there may be no more reason for you to post anymore ;-)
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Bernard Dhooghe - 30 Dec 2007 15:17 GMT > > Never say never? > [quoted text clipped - 7 lines] > DB2 Solutions Development > IBM Toronto Lab So this would be really nice both for the newsgroup and the DB2 UDB users!
If row value constructor is a (relational) answer (as in SQL/92 and Solid), maybe it means there is indeed a question...
Bernard Dhooghe
--CELKO-- - 31 Dec 2007 15:12 GMT Let me do a clipping from SQL FOR SMARTIES:
09.02. Row Comparisons in SQL
Standard SQL generalized the theta operators so they would work on row expressions and not just on scalars. This is not a popular feature yet, but it is very handy for situations where a key is made from more than one column, and so forth. This makes SQL more orthogonal and it has an intuitive feel to it. Take three row constants:
A = (10, 20, 30, 40);
B = (10, NULL, 30, 40);
C = (10, NULL, 30, 100);
It seems reasonable to define a row comparison as valid only when the data types of each corresponding column in the rows are union- compatible. If not, the operation is an error and should report a warning. It also seems reasonable to define the results of the comparison to the AND-ed results of each corresponding column using the same operator. That is, (A = B) becomes:
((10, 20, 30, 40) = (10, NULL, 30, 40));
becomes: ((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 40))
becomes: (TRUE AND UNKNOWN AND TRUE AND TRUE);
becomes: (UNKNOWN);
This seems to be reasonable and conforms to the idea that a NULL is a missing value that we expect to resolve at a future date, so we cannot draw a conclusion about this comparison just yet. Now consider the comparison (A = C), which becomes:
((10, 20, 30, 40) = (10, NULL, 30, 100));
becomes: ((10 = 10) AND (20 = NULL) AND (30 = 30) AND (40 = 100));
becomes: (TRUE AND UNKNOWN AND TRUE AND FALSE);
becomes: (FALSE);
There is no way to pick a value for column 2 of row C such that the UNKNOWN result will change to TRUE because the fourth column is always FALSE. This leaves you with a situation that is not very intuitive. The first case can resolve to TRUE or FALSE, but the second case can only go to FALSE.
Standard SQL decided that the theta operators would work as shown in the table below. The expression RX <comp op> RY is shorthand for a row RX compared to a row RY; likewise, RXi means the i-th column in the row RX. The results are still TRUE, FALSE, or UNKNOWN, if there is no error in type matching. The rules favor solid tests for TRUE or FALSE, using UNKNOWN as a last resort.
The idea of these rules is that as you read the rows from left to right, the values in one row are always greater than or less than) those in the other row after some column. This is how it would work if you were alphabetizing words.
The rules are
1. RX = RY is TRUE if and only if RXi = RYi for all i.
2. RX <> RY is TRUE if and only if RXi <> RYi for some i.
3. RX < RY is TRUE if and only if RXi = RYi for all i < n and RXn < RYn for some n.
4. RX > RY is TRUE if and only if RXi = RYi for all i < n and RXn > RYn for some n.
5. RX <= RY is TRUE if and only if Rx = Ry or Rx < Ry.
6. RX >= RY is TRUE if and only if Rx = Ry or Rx > Ry.
7. RX = RY is FALSE if and only if RX <> RY is TRUE.
8. RX <> RY is FALSE if and only if RX = RY is TRUE.
9. RX < RY is FALSE if and only if RX >= RY is TRUE.
10. RX > RY is FALSE if and only if RX <= RY is TRUE.
11. RX <= RY is FALSE if and only if RX > RY is TRUE.
12. RX >= RY is FALSE if and only if RX < RY is TRUE.
13. RX <comp op> RY is UNKNOWN if and only if RX <comp op> RY is neither TRUE nor FALSE.
The negations are defined so that the NOT operator will still have its usual properties. Notice that a NULL in a row will give an UNKNOWN result in a comparison. Consider this expression:
(a, b, c) < (x, y, z)
which becomes
((a < x) OR ((a = x) AND (b < y)) OR ((a = x) AND (b = y) AND (c < z)))
The standard allows a single-row expression of any sort, including a single-row subquery, on either side of a comparison. Likewise, the BETWEEN predicate can use row expressions in any position in Standard SQL.
Serge Rielau - 31 Dec 2007 19:51 GMT > The standard allows a single-row expression of any sort, including a > single-row subquery, on either side of a comparison. Likewise, the > BETWEEN predicate can use row expressions in any position in Standard > SQL. Yes, we both know that. I don't think the semantics are disputed at all. Bernard and I merely have our traditional argument about priorities.
Supporting the semantics is the easy part, btw.. With the syntax come certain expectations in performance though (make the optimizer aware, exploit indices, ...)
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Mark A - 31 Dec 2007 20:18 GMT Based on what little I read, Solid Tech is in-memory database, and they claim a forty time improvement in performance over DB2.
Can someone explain how they get that much improvement if the entire DB2 database where able to fit into memory and the bufferpool hit ratio is 100%?
Serge Rielau - 01 Jan 2008 14:26 GMT > Based on what little I read, Solid Tech is in-memory database, and they > claim a forty time improvement in performance over DB2. > > Can someone explain how they get that much improvement if the entire DB2 > database where able to fit into memory and the bufferpool hit ratio is 100%? Disclaimer: I know NOTHING about SOLID. The goal of a full grown DBMS is not to provide absolute performance. The goal is to provide scalability. So one reason why "big" DBMS are slower than "small" DBMS in simple(!) cases is length of codepath in runtime and the optimizer (e.g. only NL-JOIN support, only Unicode support, only ...). Secondly it is likely that an in memory database has no bufferpool at all (since it's in memory). So all the algorithms around buffer pool maintenance, page faults etc are missing. Less layers in the onion.
Deployments like at Euronext place an in memory database in front to speed up online processing and smooth out the peaks. ObjectGrid is a common choice here as far as IBM goes.
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Mark A - 01 Jan 2008 19:20 GMT > Disclaimer: I know NOTHING about SOLID. > The goal of a full grown DBMS is not to provide absolute performance. [quoted text clipped - 12 lines] > Cheers > Serge I understand that the code path is shorter, but does that really account for 40 times improvement?
At my previous job we did Loadrunner testing of DB2 versus TimeTen on a DB2 database that was small enough to fit into memory (bufferpools). There was virtually no difference in performance.
I understand that putting the database on the client tier would help, but I am talking about a database that sits on its own database tier.
Serge Rielau - 01 Jan 2008 21:07 GMT > I understand that the code path is shorter, but does that really account for > 40 times improvement? Numbers... Maybe it's 40x in some cases, maybe it's assuming logging and I/O. There are cases where DB2 9.5 is 40x faster than DB2 9. I'm sure I can construct them :-)
> At my previous job we did Loadrunner testing of DB2 versus TimeTen on a DB2 > database that was small enough to fit into memory (bufferpools). There was > virtually no difference in performance. Given that TimesTen is now Oracle I'm glad to hear that ;-)
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
--CELKO-- - 01 Jan 2008 01:21 GMT I just posted that clipping for lurkers who might want to know.
But I had never thought about optimizations. The row constructor was just a shorthand in my mindset. Now I can imagine that if I have a key like (longitude, latitude) that is almost always used as a row constructor to get a location, then I might want a hash instead of a tree index on the pair. I need to play with this idea a bit ...
Thanks!
Serge Rielau - 01 Jan 2008 14:28 GMT > I just posted that clipping for lurkers who might want to know. > [quoted text clipped - 3 lines] > constructor to get a location, then I might want a hash instead of a > tree index on the pair. I need to play with this idea a bit ... The most popular optimization requirement is that a BETWEEN, <, > should exploit start-stop keys of a matching index..
Cheers Serge
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Bernard Dhooghe - 02 Jan 2008 10:36 GMT When cursors are involved in a query, I believe a better split should exist between the set composition, the ordering and the cursor positioning, a superset of the SQL/92capability (meaning that the superset would also cover the SQL/92capability, as posted some time ago):
In the first SQL standard cursors where only fetch forward cursors.
Then in SQL-92 scrollable cursors where introduced but why should the cursor be at the beginning of the set when scrollable cursors are available? This is logic from non-scrollable cursors time, if scroll backwards is possible, it must bee possible to do it (without return- code 100) directly after open cursor.
So I see the general solution as:
declare cursor cu1 for select ... from ... [where] ... [order by]... (the where can contain a row-value clause, where and order by are optional as always, multiple tables can be involved)
open cursur cu1 [ position cursor where <row-value constructor clause> ] (where clause optional, an order by must be in the select if used and must match the order of the select)
fetch cu1 (fetch next but also previous even directly after open, if cursor positioning is used).
Example:
declare cursor cu for select * from custorder where (custno,dateorder) >= (100,'20070101') .... order by custno,daterorder open cursor cu [position cursor where (custno,dateorder) >= (100,'20070810)] fetch [next|previous] for cu1
A subset is the normal cursor behavior (SQL/92, at beginning of set):
declare cursor cu for select * from custorder where (custno,dateorder) >= (100,'20070101') order by custno,daterorder open cursor cu fetch [next|previous] from cu1
I do understand it is not easy to implement fully row-value constructor's (there is no substitute for orthogonality).
What I regret is that as row-value constructors entered SQL/92 the concept being there to answer real-world problems, it is still no part of DB2 UDB. And speaking with programmers, I do thing the need is still there. 2008 - 92 = 16 years already, so even after 16 years, it seems it still not a priority for DB2 UDB, see also: http://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=299231.
The other thing is that IBM is putting a lot of money in acquisitions, who pays this? Only the shareholder? Is think it is also funded by the customers who pay maintenance on DB2 UDB, so there must be return for them also.
Bernard Dhooghe
> > The standard allows a single-row expression of any sort, including a > > single-row subquery, on either side of a comparison. Likewise, the [quoted text clipped - 14 lines] > DB2 Solutions Development > IBM Toronto Lab --CELKO-- - 02 Jan 2008 15:56 GMT The history of cursors in Standard SQL is interesting. The first forward-only version was based on a punch card reader! It was the most universal file access method in the X3J programming languages, so it would be easier to embed SQL into host languages.
A bit later, we realized that we could base them on a sequential tape file model, since that had become the most universal file access method in the X3J programming languages. One decision was that the imaginary read-write head is before the first row (record) at the start and goes past the last row (record) for the EOF flag. Not all file systems follow those conventions -- some start with the first record in a cache and some set the EOF flag when they read the last record.
The other decision was when to send certain warnings. For example, if I do a GROUP BY and one of my groupings drops a NULL from an aggregate, there is a warning. Do I give that warning for the DECLARE, ALLOCATE, or OPEN on that CURSOR, or do it when I FETCH that row? The answer was that it is implementation defined.
MeBuggyYouJane - 03 Jan 2008 01:54 GMT > So IBM will acquire Solid Tech. > [quoted text clipped - 20 lines] > > Bernard Dhooghe There was a thread on c.d.t a month or so ago about the future of RDBMS as implemented by multi[core|processor] machines with solid state discs. I quoted Linus Torvalds on the implications for file systems, extrapolating to what this might mean to RDBMS implementations. Mr. Celko allowed as his next book would treat the subject.
May haps IBM has felt the breeze? And may haps rendering its XML/IMS fascination pointless? Not that I'm bitter. :)
--CELKO-- - 03 Jan 2008 02:54 GMT >> I quoted Linus Torvalds on the implications for file systems, extrapolating to what this might mean to RDBMS implementations. Mr. Celko allowed as his next book would treat the subject. << Shameless plug: THINKING IN SETS will be out 2008 January 18.
I did not see the Torvalds quote; can you give me a source? I would have loved to use it in the book.
A major problem is the lack of programming languages at any level for parallel processing. The only one that I can remember was OCCAM for imbedded processors, and some proposed ALGOL extensions (paraegin- paraend by Wirth (?) and cobegin-coend by Dijkstra).
MeBuggyYouJane - 04 Jan 2008 01:12 GMT >>> I quoted Linus Torvalds on the implications for file systems, extrapolating to what this might mean to RDBMS implementations. Mr. Celko allowed as his next book would treat the subject. << > > Shameless plug: THINKING IN SETS will be out 2008 January 18. > > I did not see the Torvalds quote; can you give me a source? I would > have loved to use it in the book. I've no idea if this the original, but it comes up on search: http://www.efytimes.com/efytimes/fullnewsp.asp?edid=21160
> A major problem is the lack of programming languages at any level for > parallel processing. The only one that I can remember was OCCAM for > imbedded processors, and some proposed ALGOL extensions (paraegin- > paraend by Wirth (?) and cobegin-coend by Dijkstra). Erlang, my dear man.
--CELKO-- - 04 Jan 2008 17:40 GMT >> Erlang, my dear man. << Larry O'Brien did a column about Erlang as "the language the cool kids are using" recently. Maybe I had better do some research. Thanks!
Chris Browne - 04 Jan 2008 17:52 GMT >>> Erlang, my dear man. << > > Larry O'Brien did a column about Erlang as "the language the cool kids > are using" recently. Maybe I had better do some research. Thanks! It starts with a more-deterministic variation on Prolog (e.g. - forbidding backtracking - you can only reference "bound" values, as opposed to allowing the system to backtrack to search for values to bind).
That "safe base" allows you to define functions that are certain not to have side-effects, as they operate purely on either: a) Externally bound data inputs, or b) Values bound within the function (which inherently can't conflict with anything else).
This then gives you the ability to create concurrent threads that you can be certain can operate together safely, because the functional code *can't* interfere with what's goin on in the other threads.
They then add in lots of library code for dealing with concurrent programming, helper functions to let you do what inter-thread communications and synchronization you DO want to do.
The code reads like Prolog, so that may be a bit of a change ;-).
 Signature "cbbrowne","@","cbbrowne.com" http://cbbrowne.com/info/x.html The English exam was a piece of cake---which was a bit of a surprise, actually, because I was expecting some questions on a sheet of paper.
--CELKO-- - 06 Jan 2008 01:47 GMT >> It starts with a more-deterministic variation on Prolog (e.g. - forbidding backtracking - you can only reference "bound" values, as opposed to allowing the system to backtrack to search for values to bind). <<
I vaguely remember Borland's Turbo PROLOG when I was teaching mumble- mumble years ago ..
>> That "safe base" allows you .. This then gives you the ability to create concurrent threads that you can be certain can operate together safely, because the functional code *can't* interfere with what's going on in the other threads.<< DAMN! I am over 60 years of age and I have to shift a paradigm again??!!
>> The code reads like Prolog, so that may be a bit of a change << More like a flashback :) My first Masters was in Math and liked logic and discrete stuff (I got over 30 credit hours in Stats, however) then looked at PROLOG just before I became "The SQL Guy" in the trade.
MeBuggyYouJane - 07 Jan 2008 02:39 GMT >>> It starts with a more-deterministic variation on Prolog (e.g. - forbidding backtracking - you can only reference "bound" values, as opposed to allowing the system to backtrack to search for values to > bind). << [quoted text clipped - 13 lines] > however) then looked at PROLOG just before I became "The SQL Guy" in > the trade. Joe Armstrong's book from Prag Programmers is the Hot One. I've read through it a couple of times, but not much more. It is said the first Erlang ran on a Prolog compiler. The part of Prolog/Erlang syntax that I just can't abide is the single assignment, and the verbosity it causes. Much like GWBasic. And it's not optional. OTOH, Amzi! has been doing a Prolog/RDBMS server for about a decade, may be longer.
Structurally, Erlang/RDBMS looks (from a 'logical' point of view, no pun intended), especially if combined with the multi solid state disk machines under discussion, to be the tsunami that washes away everything else. My reasoning: a 5NF database either in-memory or on SS disk is so much smaller in data footprint; ACID-ly bulletproof; and can do all the heavy lifting of a (business) application, requiring only some screen painting (generated code from the catalog, of course) for the user. Andromeda does the last part now on conventional machines, and that's a small operation.
We'll see. The next 2 or 3 years could well be the Revenge of Dr. Codd, slap Mr. Chamberlin upside the head (for those unfamiliar, he has gone apostate with XML).
Chris Browne - 07 Jan 2008 16:28 GMT > Andromeda does the last part now on conventional machines, and > that's a small operation. Would you be referring to the "Andromeda" declarative system? http://www.andromeda-project.org/
 Signature (format nil "~S@~S" "cbbrowne" "linuxdatabases.info") http://cbbrowne.com/info/unix.html "The only thing better than TV with the sound off is Radio with the sound off." -- Dave Moon
MeBuggyYouJane - 08 Jan 2008 01:51 GMT >> Andromeda does the last part now on conventional machines, and >> that's a small operation. > > Would you be referring to the "Andromeda" declarative system? > http://www.andromeda-project.org/ yes, yes I would. There have been others, mostly OS. Middlegen, JAG, and Firestorm.
MDA may make a comeback, too.
We live in interesting times.
MeBuggyYouJane - 08 Jan 2008 02:38 GMT > DAMN! I am over 60 years of age and I have to shift a paradigm > again??!! Ah, a kid. In the 80's I took seminars with W. Edwards Deming. He was then in his 80s. From what I recall, he 'worked' until he died in 1993 at 93. You've got a ways to go.
Mark A - 03 Jan 2008 02:59 GMT > There was a thread on c.d.t a month or so ago about the future of RDBMS as > implemented by multi[core|processor] machines with solid state discs. I [quoted text clipped - 4 lines] > May haps IBM has felt the breeze? And may haps rendering its XML/IMS > fascination pointless? Not that I'm bitter. :) IBM feels a lot of breezes these days.
Oracle buys Hyperion (Essbase) and IBM buys Cognos.
Oracle buys TimesTen and IBM buys Solid Tech.
I am sure there are others. And I am sure that IBM thought seriously about buying SAP after Oracle scooped up several ERP vendors.
--CELKO-- - 03 Jan 2008 05:00 GMT >> I am sure there are others. And I am sure that IBM thought seriously about buying SAP after Oracle scooped up several ERP vendors. << Remember when Computer Associates (now CA) used to buy everyone?
Jan M. Nelken - 03 Jan 2008 05:59 GMT >>> I am sure there are others. And I am sure that IBM thought seriously about buying SAP after Oracle scooped up several ERP vendors. << > > Remember when Computer Associates (now CA) used to buy everyone? Isn't it true that CA bought themselves and are called CA square?
Jan M. Nelken
--CELKO-- - 03 Jan 2008 18:38 GMT >> Isn't it true that CA bought themselves and are called CA square? << LOL! Then they will recursively fire half the staff :)
|
|
|