Database Forum / DB2 Topics / July 2006
How to group date periods together ?
|
|
Thread rating:  |
James Conrad StJohn Foreman - 29 Jun 2006 16:03 GMT [DB2 UDB LUW 8.2.4] I've got bookings per day in a table:
bkgdate bookings company 1/1/06 50 A 2/1/06 45 A 3/1/06 55 A 4/1/06 40 A 5/1/06 30 A 6/1/06 45 A 7/1/06 35 A ...
I want to segment this into blocks of days where there are at least 100 bookings in each block.
So the solution I want is along the lines of
date_segment startdate enddate (bookings) 1 1/1/06 3/1/06 150 2 4/1/06 6/1/06 115 ...
So far I've managed to do
SELECT bkgdate, company, sum(bookings) over (partition by company order by bkgdate) as cume_bookings FROM table1
which gives me
bkgdate cume_bookings 1/1/06 50 2/1/06 95 3/1/06 150 4/1/06 190 ...
I was thinking about joining this result set to itself, but that doesn't quite give me the results I want, and the SQL is getting more and more convoluted. Is there a more obvious way to do this that I'm missing?
Thanks
JCSJF
Tonkuma - 30 Jun 2006 09:13 GMT I got the result this way. But, I feel it too complex. Althogh, I tried to simlify or modify some part of this code. I failed to simplify any more. Are there anyone who could show more simple or easily understandable way?
WITH Cum_bookings AS ( SELECT bkgdate , bookings , SUM(bookings) OVER(ORDER BY bkgdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS bookings_cum , ROWNUMBER() OVER(ORDER BY bkgdate) AS rn FROM Bookings ) ,Ranges AS ( SELECT MIN(startdate) AS startdate , MIN(enddate) AS enddate , MIN(bookings_cum) AS bookings_cum , MIN(Lrn) AS Lrn , Hrn FROM (SELECT MIN(L.bkgdate) AS startdate , MIN(H.bkgdate) AS enddate , MIN( H.bookings_cum - L.bookings_cum + L.bookings ) AS bookings_cum , L.rn AS Lrn , MIN(H.rn) AS Hrn FROM Cum_bookings L , Cum_bookings H WHERE L.rn < H.rn AND ( H.bookings_cum - L.bookings_cum + L.bookings ) >= 100 GROUP BY L.rn ) L GROUP BY Hrn ) SELECT INT(ROWNUMBER() OVER(ORDER BY startdate)) AS date_segment , startdate , enddate , bookings_cum AS "(bookings)" FROM Ranges R WHERE EXISTS (SELECT * FROM Ranges E WHERE E.Lrn = R.Hrn + 1) OR EXISTS (SELECT * FROM Ranges E WHERE E.Hrn = R.Lrn - 1) ; --------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings) ------------ ---------- ---------- ----------- 1 2006-01-01 2006-03-01 150 2 2006-04-01 2006-06-01 115
2 record(s) selected.
--CELKO-- - 30 Jun 2006 18:14 GMT >> I got the result this way. But, I feel it too complex. << I thought it was quite clever myself!
I was playing with the idea of setting up all possible partitions in a table, then seeing which one (if any) met the summation criteria. Something like this:
bkgdate part_grp part_nbr ======================= 2006-01-01 1 1 2006-02-01 1 1 2006-03-01 1 1 2006-04-01 1 2 2006-05-01 1 2 2006-06-01 1 2 2006-07-01 1 2
2006-01-01 2 1 2006-02-01 2 1 2006-03-01 2 1 2006-04-01 2 1 2006-05-01 2 2 2006-06-01 2 2 2006-07-01 2 3
Wow! That would get big really fast!
Tonkuma - 01 Jul 2006 13:19 GMT Previous my Query was wrong. Here is right(I hope) Query using Recirsive query.
WITH Nbr_bookings AS ( SELECT bkgdate , bookings , INT(ROWNUMBER() OVER(ORDER BY bkgdate)) AS rn FROM Bookings ) ,Recurse (rn, date_segment, startdate, enddate, bookings_sum, next_segment_flag) AS ( SELECT rn , 1 , bkgdate , bkgdate , bookings , 0 FROM Nbr_bookings WHERE rn = 1 UNION ALL SELECT new_rn , CASE WHEN next_segment_flag = 1 THEN date_segment + 1 ELSE date_segment END , CASE WHEN next_segment_flag = 1 THEN bkgdate ELSE startdate END , bkgdate , CASE WHEN next_segment_flag = 1 THEN bookings ELSE bookings_sum + bookings END , new_segment_flag FROM (SELECT pre.rn + 1 AS new_rn , pre.date_segment , pre.startdate , pre.enddate , pre.bookings_sum , pre.next_segment_flag , new.* , CASE WHEN pre.next_segment_flag = 0 AND ( pre.bookings_sum + new.bookings ) >= 100 OR new.bookings >= 100 THEN 1 ELSE 0 END AS new_segment_flag FROM Recurse pre , Nbr_bookings new WHERE pre.rn < 10000 AND pre.rn + 1 = new.rn ) Q ) SELECT date_segment , startdate , enddate , bookings_sum AS "(bookings)" FROM Recurse WHERE next_segment_flag = 1 ; ------------------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings) ------------ ---------- ---------- ----------- 1 2006-01-01 2006-03-01 150 2 2006-04-01 2006-06-01 115
2 record(s) selected.
Tonkuma - 03 Jul 2006 02:09 GMT This also seems work well.
WITH Cum_bookings AS ( SELECT bkgdate , bookings , SUM(bookings) OVER(ORDER BY bkgdate ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS bookings_cum , ROWNUMBER() OVER(ORDER BY bkgdate) AS rn FROM Bookings ) ,Ranges AS ( SELECT L.rn AS Lrn , MIN(H.rn) AS Hrn FROM Cum_bookings L , Cum_bookings H WHERE L.rn <= H.rn AND ( H.bookings_cum - L.bookings_cum + L.bookings ) >= 100 GROUP BY L.rn ) SELECT INT(ROWNUMBER() OVER(ORDER BY Lrn)) AS date_segment , L.bkgdate AS startdate , H.bkgdate AS enddate , ( H.bookings_cum - L.bookings_cum + L.bookings ) AS "(bookings)" FROM Cum_bookings L , Cum_bookings H , (SELECT MAX(Lrn) AS Lrn , Hrn FROM Ranges R WHERE (Lrn = 1 OR EXISTS (SELECT * FROM Ranges E WHERE E.Hrn = R.Lrn - 1) ) AND (Hrn = (SELECT MAX(Hrn) FROM Ranges) OR EXISTS (SELECT * FROM Ranges E WHERE E.Lrn = R.Hrn + 1) ) GROUP BY Hrn ) S WHERE L.rn = S.Lrn AND H.rn = S.Hrn ; --------------------------------------------------------------------
DATE_SEGMENT STARTDATE ENDDATE (bookings) ------------ ---------- ---------- ----------- 1 2006-01-01 2006-03-01 150 2 2006-04-01 2006-06-01 115
2 record(s) selected.
--CELKO-- - 30 Jun 2006 17:35 GMT Can we start by putting the date in the proper format?
>> bkgdate bookings company 2006-01-01 50 A 2006-02-01 45 A 2006-03-01 55 A 2006-04-01 40 A 2006-05-01 30 A 2006-06-01 45 A 2006-07-01 35 A ... I want to segment this into blocks of days where there are at least 100 bookings in each block. <<
What do you do about overlaps and subsets?
2006-01-01 2006-03-01 150 2006-02-01 2006-03-01 100 <== subset of first row 2006-03-01 2006-05-01 125 2006-04-01 2006-06-01 115 <== overlaps following row 2006-05-01 2006-07-01 110 etc.
Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01') which meet the criteria, but are not minimal.
This seems like a good candidate for a SUM(bookings) OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>> FOLLOWING)
but I am having trouble with my magic :) Can we get a better spec?
James Conrad StJohn Foreman - 06 Jul 2006 15:27 GMT > What do you do about overlaps and subsets? > [quoted text clipped - 13 lines] > > but I am having trouble with my magic :) Can we get a better spec? Sorry. As in all things, seemed a bit simpler than it was. I want disjoint spanning groups, with each group being minimal.
Tonkuma - 09 Jul 2006 09:01 GMT > As in all things, seemed a bit simpler than it was. I agree. At least, my previous posts used a little complex algorithm or extra code.
> I want disjoint spanning groups, with each group being minimal. But, this problem is not so simple. Because it is neccesary to remove overlaps, sub-sets as CELKO wrote. And all data(rows) need to be included in a group except last some rows if sum of bookings in the group is less than 100.
Here is another example: WITH Sum_bookings AS ( SELECT L.bkgdate AS startdate , H.bkgdate AS enddate , SUM(S.bookings) AS bookings_sum FROM Bookings L , Bookings H , Bookings S WHERE L.bkgdate <= H.bkgdate AND S.bkgdate BETWEEN L.bkgdate AND H.bkgdate GROUP BY L.bkgdate, H.bkgdate HAVING SUM(S.bookings) >= 100 ) ,Recurse (date_segment, startdate, enddate, bookings_sum) AS ( SELECT 1 , startdate , enddate , bookings_sum FROM Sum_bookings WHERE startdate = (SELECT MIN(bkgdate) FROM bookings) AND enddate = (SELECT MIN(enddate) FROM Sum_bookings) UNION ALL SELECT pre.date_segment + 1 , new.startdate , new.enddate , new.bookings_sum FROM Recurse pre , Sum_bookings new WHERE pre.date_segment < 100000 AND new.startdate = (SELECT MIN(startdate) FROM Sum_bookings M WHERE M.startdate > pre.enddate ) AND new.enddate = (SELECT MIN(enddate) FROM Sum_bookings M WHERE M.startdate > pre.enddate ) ) SELECT * FROM Recurse;
|
|
|