Database Forum / DB2 Topics / March 2007
consolidate time periods
|
|
Thread rating:  |
--CELKO-- - 24 Mar 2007 18:50 GMT Can I get a little help? Given a very simple table of events, with their durations, I want to consolidate time periods for an event that (1) touch each other or (2) overlap each other.
CREATE TABLE Events (event_id VARCHAR(25) NOT NULL, start_date DATE NOT NULL, end_date DATE NOT NULL, CHECK (start_date <= end_date), PRIMARY KEY (event_id, start_date, end_date));
INSERT INTO Events VALUES ('Pie Eating', '2009-01-01', '2009-01-02'), ('Pie Eating', '2009-01-03', '2009-01-05'), ('Pie Eating', '2009-01-04', '2009-01-07'), ('Pie Eating', '2009-02-01', '2009-02-07');
The goal is to reduce these three rows into fewer rows that show how many days we were eating pies.
('Pie Eating', '2009-01-01', '2009-01-07')
The first and second rows in the INSERT INTO statement touch each other and can be replaced with:
('Pie Eating', '2009-01-01', '2009-01-05')
The third row will overlap with this new row and can be consolidated with it, as shown before. However, the fourth row has data that occurs a month after the first three and it is disjoint from all the other rows.
What I want is:
1) A portable, simple cursor solution 2) A Recursive CTE solution
I am going to use this in a book, so you will get credit.
Serge Rielau - 25 Mar 2007 04:07 GMT > Can I get a little help? Given a very simple table of events, with > their durations, I want to consolidate time periods for an event that [quoted text clipped - 34 lines] > > I am going to use this in a book, so you will get credit. 1) Too much work and with SQL Server supporting recursive CTE I claim portability ;-) 2) WITH rec(event_id, start_date, end_date) AS (SELECT event_id, start_date, end_date FROM events UNION ALL SELECT rec.event_id, rec.start_date, events.end_date FROM events, rec WHERE events.start_date BETWEEN rec.start_date AND rec.end_date + 1 day AND events.end_date > rec.end_date) SELECT event_id, MIN(start_date) as start_date, end_date FROM (SELECT event_id, start_date, MAX(end_date) AS end_date FROM rec GROUP BY event_id, start_date) AS rtrunc GROUP BY event_id, end_date;
EVENT_ID START_DATE END_DATE ------------------------- ---------- ---------- SQL0347W The recursive common table expression "SRIELAU.REC" may contain an infinite loop. SQLSTATE=01605
Pie Eating 01/01/2009 01/07/2009 Pie Eating 02/01/2009 02/07/2009
2 record(s) selected with 1 warning messages printed.
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Serge Rielau - 25 Mar 2007 04:08 GMT PS: I suspect this can be written with OLAP expressions as well using windowing
 Signature Serge Rielau DB2 Solutions Development IBM Toronto Lab
Lennart - 25 Mar 2007 09:58 GMT > PS: I suspect this can be written with OLAP expressions as well using > windowing I guess something like:
SELECT min_start_date, MAX(end_date) FROM ( SELECT start_date, end_date, MAX(CASE WHEN start_date <= max_end_date + 1 day THEN NULL ELSE start_date END) OVER (ORDER BY start_date, end_date ROWS UNBOUNDED PRECEDING) FROM ( SELECT start_date, end_date, MAX(end_date) OVER ( ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) FROM events ) T1 (start_date, end_date, max_end_date) ) T2 (start_date, end_date, min_start_date) GROUP BY min_start_date ORDER BY min_start_date;
should work
/Lennart
Lennart - 25 Mar 2007 10:22 GMT Sorry, should take tare of different event_id's as well:
SELECT event_id, min_start_date, MAX(end_date) FROM ( SELECT event_id, start_date, end_date, MAX(CASE WHEN start_date <= max_end_date + 1 day THEN NULL ELSE start_date END) OVER (PARTITION BY event_id ORDER BY event_id, start_date, end_date ROWS UNBOUNDED PRECEDING) FROM ( SELECT event_id, start_date, end_date, MAX(end_date) OVER ( PARTITION BY event_id ORDER BY event_id, start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) FROM events ) T1 (event_id, start_date, end_date, max_end_date) ) T2 (event_id, start_date, end_date, min_start_date) GROUP BY event_id, min_start_date ORDER BY event_id, min_start_date;
/Lennart
Tonkuma - 26 Mar 2007 17:30 GMT > PS: I suspect this can be written with OLAP expressions as well using > windowing [quoted text clipped - 3 lines] > DB2 Solutions Development > IBM Toronto Lab This is my trial. I added data which are included in another row's range.
INSERT INTO Events VALUES ('Apple Eating', '2009-01-01', '2009-01-02') , ('Apple Eating', '2009-01-03', '2009-01-05') , ('Apple Eating', '2009-01-04', '2009-01-10') , ('Apple Eating', '2009-01-05', '2009-01-06') , ('Apple Eating', '2009-01-08', '2009-01-09') , ('Apple Eating', '2009-01-11', '2009-01-12') , ('Apple Eating', '2009-02-01', '2009-02-07') , ('Apple Eating', '2009-03-01', '2009-03-07') , ('Apple Eating', '2009-03-03', '2009-03-05') , ('Apple Eating', '2009-04-01', '2009-04-03') , ('Apple Eating', '2009-04-04', '2009-04-05') , ('Apple Eating', '2009-04-04', '2009-04-06') ;
------------------------- Commands Entered ------------------------- WITH Flagged_Events AS ( SELECT event_id, start_date, end_date , CASE WHEN (pre_end_date < start_date - 1 DAYS OR pre_end_date IS NULL) AND (fol_start_date > end_date + 1 DAYS OR fol_start_date IS NULL OR fol_end_date < end_date) THEN 'Only' WHEN (pre_end_date < start_date - 1 DAYS OR pre_end_date IS NULL) AND fol_start_date <= end_date + 1 DAYS THEN 'Begin' WHEN pre_end_date >= start_date - 1 DAYS AND pre_end_date < end_date AND (fol_start_date > end_date + 1 DAYS OR fol_start_date IS NULL OR fol_end_date < end_date) THEN 'End' ELSE 'Middle' END AS Flag FROM (SELECT event_id, start_date, end_date , MAX(end_date) OVER(PARTITION BY event_id ORDER BY start_date, end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ) pre_end_date , MIN(end_date) OVER(PARTITION BY event_id ORDER BY start_date, end_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) fol_end_date , MIN(start_date) OVER(PARTITION BY event_id ORDER BY start_date, end_date ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING ) fol_start_date FROM Events ) R ) SELECT event_id, start_date , CASE Flag WHEN 'Only' THEN end_date ELSE new_end_date END AS end_date FROM (SELECT event_id, start_date, end_date, Flag , MAX(end_date) OVER(PARTITION BY event_id ORDER BY start_date, end_date ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) new_end_date FROM (SELECT event_id, start_date, end_date, Flag , MAX(Flag) OVER(PARTITION BY event_id ORDER BY start_date, end_date ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING ) fol_Flag FROM Flagged_Events WHERE Flag <> 'Middle' ) Q WHERE Flag <> 'End' OR Flag = 'End' AND (fol_Flag <> 'End' OR fol_Flag IS NULL) ) R WHERE Flag <> 'End' ORDER BY event_id, start_date; --------------------------------------------------------------------
EVENT_ID START_DATE END_DATE ------------------------- ---------- ---------- Apple Eating 2009-01-01 2009-01-12 Apple Eating 2009-02-01 2009-02-07 Apple Eating 2009-03-01 2009-03-07 Apple Eating 2009-04-01 2009-04-06 Pie Eating 2009-01-01 2009-01-07 Pie Eating 2009-02-01 2009-02-07
6 record(s) selected.
--CELKO-- - 25 Mar 2007 19:02 GMT >> 1) Too much work and with SQL Server supporting recursive CTE I claim portability ;-) << Actually, it is pretty easy:
DECLARE EventList CURSOR FOR SELECT DISTINCT E.event_id, C.cal_date FROM Events AS E, Calendar AS C WHERE C.cal_date BETWEEN E.start_date AND E.end_date ORDER BY E.event_id, C.cal_date ASC FOR READ ONLY;
We now have a list of each day in an event.
1. Fetch the first row and put it into local storage, making the end_date = start_date 2. Fetch the next row as teh current row 3. If this current row is in the same event and INTERVAL '1' DAY after the local end_date, update the end_date with it. 4. If the current row is not in the same event or > INTERVAL '1' DAY after the local end_date, then 4.1. Insert the local storage into a working table as a row 4.2. Overwrite the local storage with the current row, making the end_date = start_date 5. Loop until the end of the cursor. 6. The working table is the desired answer.
I can keep the code fairly portble and short.
Art S. Kagel - 26 Mar 2007 16:20 GMT >>>1) Too much work and with SQL Server supporting recursive CTE I claim portability ;-) << > [quoted text clipped - 6 lines] > ORDER BY E.event_id, C.cal_date ASC > FOR READ ONLY; Joe,
This only works if some date record in the cal_date table exists within the date range of each and every row for an event. Otherwise you will not see the entire range of the event, only the range spanned by cal_date records. If that's what you want, it's fine. However, if, instead, the request is to fetch the entire range of dates for an event that will include a given single date (or small date range). That would require one to push the date matching into a sub-query IB.
Art S. Kagel
> We now have a list of each day in an event. > [quoted text clipped - 12 lines] > > I can keep the code fairly portble and short. Knut Stolze - 25 Mar 2007 20:29 GMT > Can I get a little help? Given a very simple table of events, with > their durations, I want to consolidate time periods for an event that [quoted text clipped - 34 lines] > > I am going to use this in a book, so you will get credit. Completely different approach: http://tinyurl.com/2plby8
 Signature Knut Stolze DB2 z/OS Utilities Development IBM Germany
Art S. Kagel - 26 Mar 2007 16:25 GMT > Can I get a little help? Given a very simple table of events, with > their durations, I want to consolidate time periods for an event that [quoted text clipped - 15 lines] > The goal is to reduce these three rows into fewer rows that show how > many days we were eating pies. An aside: I prefer to encode these kinds of structures with a start date and a duration as:
CREATE TABLE Events (event_id VARCHAR(25) NOT NULL, start_date DATE NOT NULL, addl_days INTEGER, -- Where available I would use INTERVAL (DAYS) instead CHECK (addl_days >= 0), PRIMARY KEY (event_id, start_date));
I find that it makes finding events that occur on a particular date easier to code, and allows for both compressed and single day records to coexist and return correct results.
So, your data might look like:
INSERT INTO Events VALUES ('Pie Eating', '2009-01-01', 1 ), ('Pie Eating', '2009-01-03', 2 ), ('Pie Eating', '2009-01-04', 3 ), -- I might prevent the overlap with -- a trigger, but... ('Pie Eating', '2009-02-01', 6 );
So, a cursor solution to your problem:
DECLARE EventList CURSOR FOR SELECT DISTINCT E.event_id, E.start_date, E.addl_days FROM Events as E, Calendar as C WHERE C.cal_date between E.start_date AND (E.start_date + E.addl_days) FOR READ ONLY;
And the pseudo-code is similar to yours with the exception that you are only incrementing the addl_days.
Obviously my previous comment about the dates in Calendar affecting the results also apply to this schema.
FWIW.
Art S. Kagel
Tonkuma - 26 Mar 2007 20:28 GMT > I am going to use this in a book, so you will get credit. I want ask the people who reply to CELKO's question. He is going to use your reply to use in his book without explicit your permission. Or, post in this forum means implicitly give permission to use any way your writings for all peoples? In another word, did you agreed to give up your copywrite?
Art S. Kagel - 26 Mar 2007 23:24 GMT >>I am going to use this in a book, so you will get credit. > [quoted text clipped - 4 lines] > your writings for all peoples? In another word, did you agreed to give > up your copywrite? Yes, you implicitely agree to let him use your reply.
Art S. Kagel
jefftyzzer - 26 Mar 2007 23:28 GMT > > I am going to use this in a book, so you will get credit. > [quoted text clipped - 4 lines] > your writings for all peoples? In another word, did you agreed to give > up your copywrite? Joe:
If you've not already, it may be of value to you to see chapter 6 (especially section 6.5.2) of Snodgrass's _Developing Time-Oriented Database Applications in SQL_, which is now in the public domain: http://www.cs.arizona.edu/~rts/publications.html.
I was working on a "find contigous slots of time for a given meeting of x minute duration" query that went against a table of timeslots of various lengths--some of which may or may not have been available--and the section of Snodgrass's book I mention was very helpful to me. You may also find Zimanyi's "Temporal Aggregates and Temporal Universal..." paper in v35:1 (6/06) of the SIGMOD Record useful.
Regards,
--Jeff
--CELKO-- - 27 Mar 2007 02:11 GMT >> the section of Snodgrass's book I mention was very helpful to me. You may also find Zimanyi's "Temporal Aggregates and Temporal Universal..." paper in v35:1 (6/06) of the SIGMOD Record useful. << I know Rick from our time on ANSI X3H2 together and I used some of his material in one of my other books. I am trying to do an updated version of it, with some of the new SQL-99 and SQL-2003 stuff. That is why I am posting in newgroups that attract smart programmers :)
I did not know about Zimanyi's stuff; thanks!
--CELKO-- - 27 Mar 2007 02:07 GMT >> I want ask the people who reply to CELKO's question. He is going to use your reply to use in his book without explicit your permission. << I do not have to ask when you post in a public newsgroup without a copyright on the material. I do because it is polite and it usually gets me more help. I also have my publisher send out review copies to contributors so they can highlight their name adn enjoy a $30-50 book.
Right now, there is a Blog called "Joe Celko, the SQL apprrentice" that is nothing but a collection of **my** newsgroup postings (I wish I knew who is doing it). He has made several hundred dollars in advertising from it. The most that Google will do is post a disclaimer that I have nothing to do with the site.
I also wish he would fix my spelling, but that would lead to problems with improper use of public material. Welcome to the weird world of IP and copyrights!
|
|
|