Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / DB2 Topics / July 2006

Tip: Looking for answers? Try searching our database.

How to group date periods together ?

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
James Conrad StJohn Foreman - 29 Jun 2006 16:03 GMT
[DB2 UDB LUW 8.2.4] I've got bookings per day in a table:

bkgdate   bookings   company
1/1/06      50            A
2/1/06      45            A
3/1/06      55            A
4/1/06      40            A
5/1/06      30            A
6/1/06      45            A
7/1/06      35            A
...

I want to segment this into blocks of days where there are at least 100
bookings in each block.

So the solution I want is along the lines of

date_segment   startdate   enddate  (bookings)
1                     1/1/06       3/1/06      150
2                     4/1/06       6/1/06      115
...

So far I've managed to do

SELECT bkgdate, company,
sum(bookings) over (partition by company order by bkgdate) as
cume_bookings
FROM table1

which gives me

bkgdate     cume_bookings
1/1/06        50
2/1/06        95
3/1/06        150
4/1/06        190
...

I was thinking about joining this result set to itself, but that
doesn't quite give me the results I want, and the SQL is getting more
and more convoluted.  Is there a more obvious way to do this that I'm
missing?

Thanks

JCSJF
Tonkuma - 30 Jun 2006 09:13 GMT
I got the result this way. But, I feel it too complex. Althogh, I tried
to simlify or modify some part of this code. I failed to simplify any
more. Are there anyone who could show more simple or easily
understandable way?

WITH Cum_bookings AS (
SELECT bkgdate
    , bookings
    , SUM(bookings) OVER(ORDER BY bkgdate
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND CURRENT ROW
                        ) AS bookings_cum
    , ROWNUMBER()   OVER(ORDER BY bkgdate) AS rn
 FROM Bookings
)
,Ranges AS (
SELECT
      MIN(startdate)    AS startdate
    , MIN(enddate)      AS enddate
    , MIN(bookings_cum) AS bookings_cum
    , MIN(Lrn)          AS Lrn
    , Hrn
 FROM (SELECT MIN(L.bkgdate)        AS startdate
            , MIN(H.bkgdate)        AS enddate
            , MIN(  H.bookings_cum
                  - L.bookings_cum
                  + L.bookings    ) AS bookings_cum
            , L.rn                  AS Lrn
            , MIN(H.rn)             AS Hrn
         FROM Cum_bookings L
            , Cum_bookings H
        WHERE L.rn < H.rn
          AND (  H.bookings_cum
               - L.bookings_cum
               + L.bookings    ) >= 100
        GROUP BY
              L.rn
      ) L
GROUP BY Hrn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY startdate)) AS date_segment
    , startdate
    , enddate
    , bookings_cum AS "(bookings)"
 FROM Ranges R
WHERE EXISTS
      (SELECT *
         FROM Ranges E
        WHERE E.Lrn = R.Hrn + 1)
   OR EXISTS
      (SELECT *
         FROM Ranges E
        WHERE E.Hrn = R.Lrn - 1)
;
--------------------------------------------------------------------

DATE_SEGMENT STARTDATE  ENDDATE    (bookings)
------------ ---------- ---------- -----------
          1 2006-01-01 2006-03-01         150
          2 2006-04-01 2006-06-01         115

 2 record(s) selected.
--CELKO-- - 30 Jun 2006 18:14 GMT
>> I got the result this way. But, I feel it too complex. <<

I thought it was quite clever myself!

I was playing with the idea of setting up all possible partitions in a
table, then seeing which one (if any) met the summation criteria.
Something like this:

bkgdate       part_grp part_nbr
=======================
2006-01-01      1                1
2006-02-01      1                1
2006-03-01      1                1
2006-04-01      1                2
2006-05-01      1                2
2006-06-01      1                2
2006-07-01      1                2

2006-01-01      2                1
2006-02-01      2                1
2006-03-01      2                1
2006-04-01      2                1
2006-05-01      2                2
2006-06-01      2                2
2006-07-01      2                3

Wow! That would get big really fast!
Tonkuma - 01 Jul 2006 13:19 GMT
Previous my Query was wrong.
Here is right(I hope) Query using Recirsive query.

WITH Nbr_bookings AS (
SELECT bkgdate
    , bookings
    , INT(ROWNUMBER()   OVER(ORDER BY bkgdate)) AS rn
 FROM Bookings
)
,Recurse
(rn, date_segment, startdate, enddate, bookings_sum,
next_segment_flag) AS (
SELECT rn
    , 1
    , bkgdate
    , bkgdate
    , bookings
    , 0
 FROM Nbr_bookings
WHERE rn = 1
UNION ALL
SELECT new_rn
    , CASE
      WHEN next_segment_flag = 1 THEN
           date_segment + 1
      ELSE date_segment
      END
    , CASE
      WHEN next_segment_flag = 1 THEN
           bkgdate
      ELSE startdate
      END
    , bkgdate
    , CASE
      WHEN next_segment_flag = 1 THEN
           bookings
      ELSE bookings_sum + bookings
      END
    , new_segment_flag
 FROM (SELECT pre.rn + 1 AS new_rn
            , pre.date_segment
            , pre.startdate
            , pre.enddate
            , pre.bookings_sum
            , pre.next_segment_flag
            , new.*
            , CASE
              WHEN pre.next_segment_flag = 0
               AND (  pre.bookings_sum
                    + new.bookings    ) >= 100
               OR  new.bookings >= 100         THEN
                   1
              ELSE 0
              END  AS new_segment_flag
         FROM Recurse      pre
            , Nbr_bookings new
       WHERE pre.rn < 10000
         AND pre.rn + 1 = new.rn
      ) Q
)
SELECT date_segment
    , startdate
    , enddate
    , bookings_sum  AS "(bookings)"
 FROM Recurse
WHERE next_segment_flag = 1
;
------------------------------------------------------------------------------

DATE_SEGMENT STARTDATE  ENDDATE    (bookings)
------------ ---------- ---------- -----------
          1 2006-01-01 2006-03-01         150
          2 2006-04-01 2006-06-01         115

 2 record(s) selected.
Tonkuma - 03 Jul 2006 02:09 GMT
This also seems work well.

WITH Cum_bookings AS (
SELECT bkgdate
    , bookings
    , SUM(bookings) OVER(ORDER BY bkgdate
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND CURRENT ROW
                        ) AS bookings_cum
    , ROWNUMBER()   OVER(ORDER BY bkgdate) AS rn
 FROM Bookings
)
,Ranges AS (
SELECT
      L.rn      AS Lrn
    , MIN(H.rn) AS Hrn
 FROM Cum_bookings L
    , Cum_bookings H
WHERE L.rn <= H.rn
  AND (  H.bookings_cum
       - L.bookings_cum
       + L.bookings    ) >= 100
GROUP BY
      L.rn
)
SELECT INT(ROWNUMBER() OVER(ORDER BY Lrn)) AS date_segment
    , L.bkgdate          AS startdate
    , H.bkgdate          AS enddate
    , (  H.bookings_cum
       - L.bookings_cum
       + L.bookings    ) AS "(bookings)"
 FROM Cum_bookings L
    , Cum_bookings H
    , (SELECT
              MAX(Lrn) AS Lrn
            , Hrn
         FROM Ranges R
        WHERE
              (Lrn = 1
               OR
               EXISTS
               (SELECT *
                  FROM Ranges E
                 WHERE E.Hrn = R.Lrn - 1)
              )
          AND
              (Hrn = (SELECT MAX(Hrn) FROM Ranges)
               OR
               EXISTS
               (SELECT *
                  FROM Ranges E
                 WHERE E.Lrn = R.Hrn + 1)
              )
        GROUP BY
              Hrn
      ) S
WHERE L.rn = S.Lrn
  AND H.rn = S.Hrn
;
--------------------------------------------------------------------

DATE_SEGMENT STARTDATE  ENDDATE    (bookings)
------------ ---------- ---------- -----------
          1 2006-01-01 2006-03-01         150
          2 2006-04-01 2006-06-01         115

 2 record(s) selected.
--CELKO-- - 30 Jun 2006 17:35 GMT
Can we start by putting the date in the proper format?

>> bkgdate   bookings   company
2006-01-01      50            A
2006-02-01      45            A
2006-03-01      55            A
2006-04-01      40            A
2006-05-01      30            A
2006-06-01      45            A
2006-07-01      35            A
...
I want to segment this into blocks of days where there are at least 100
bookings in each block. <<

What do you do about overlaps and subsets?

2006-01-01      2006-03-01      150
2006-02-01      2006-03-01      100  <== subset of first row
2006-03-01      2006-05-01      125
2006-04-01      2006-06-01      115  <== overlaps following row
2006-05-01      2006-07-01      110
 etc.

Likewise, we could get "super blocks" like ('2006-01-01', '2006-04-01')
which meet the criteria, but are not minimal.

This seems like a good candidate for  a SUM(bookings)
OVER (PARTITION BY company ORDER BY bkgdate ROWS <<magic range clause>>
FOLLOWING)

but I am having trouble with my magic :)  Can we get a better spec?
James Conrad StJohn Foreman - 06 Jul 2006 15:27 GMT
> What do you do about overlaps and subsets?
>
[quoted text clipped - 13 lines]
>
> but I am having trouble with my magic :)  Can we get a better spec?

Sorry.  As in all things, seemed a bit simpler than it was.  I want
disjoint spanning groups, with each group being minimal.
Tonkuma - 09 Jul 2006 09:01 GMT
> As in all things, seemed a bit simpler than it was.
I agree. At least, my previous posts used a little complex algorithm or
extra code.

> I want disjoint spanning groups, with each group being minimal.
But, this problem is not so simple. Because it is neccesary to remove
overlaps, sub-sets as CELKO wrote. And all data(rows) need to be
included in a group except last some rows if sum of bookings in the
group is less than 100.

Here is another example:
WITH Sum_bookings AS (
SELECT L.bkgdate       AS startdate
    , H.bkgdate       AS enddate
    , SUM(S.bookings) AS bookings_sum
 FROM Bookings L
    , Bookings H
    , Bookings S
WHERE L.bkgdate <= H.bkgdate
  AND S.bkgdate BETWEEN L.bkgdate AND H.bkgdate
GROUP BY
      L.bkgdate, H.bkgdate
HAVING SUM(S.bookings) >= 100
)
,Recurse (date_segment, startdate, enddate, bookings_sum) AS (
SELECT 1
    , startdate
    , enddate
    , bookings_sum
 FROM Sum_bookings
WHERE startdate
      = (SELECT MIN(bkgdate) FROM bookings)
  AND enddate
      = (SELECT MIN(enddate) FROM Sum_bookings)
UNION ALL
SELECT pre.date_segment + 1
    , new.startdate
    , new.enddate
    , new.bookings_sum
 FROM Recurse      pre
    , Sum_bookings new
WHERE pre.date_segment < 100000
  AND new.startdate
      = (SELECT MIN(startdate)
           FROM Sum_bookings M
          WHERE M.startdate > pre.enddate
        )
  AND new.enddate
      = (SELECT MIN(enddate)
           FROM Sum_bookings M
          WHERE M.startdate > pre.enddate
        )
)
SELECT * FROM Recurse;
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.