Database Forum / DB2 Topics / December 2005
SQL Problem with date ranges
|
|
Thread rating:  |
alexanderpope11@hotmail.com - 30 Nov 2005 22:39 GMT Hello,
how do I write this SQL: I would like to check the Main table for invalid rows. An invalid row is: any row where the Start_date to stop_date range overlaps an invalid date in the Code table. For example, Row#2 is invalid because the Start_date-Stop_Date range overlaps 2 days in the code table where the code AA was not valid (12/30/2000 - 12/31/2000)
Main Table Row# emp_id code start_date stop_date 1 98233 AA 05/05/2000 12/28/2000 2 98235 AA 05/05/2000 04/21/2001
Code Table Code Description eff_Date end_Date AA Hospitals 01/01/2000 12/29/2000 AA Hospitals, Schools 01/01/2001 12/31/9999 XX N/A 01/01/2009 12/31/9999 Thanks Alex
Knut Stolze - 01 Dec 2005 10:44 GMT > Hello, > [quoted text clipped - 15 lines] > AA Hospitals, Schools 01/01/2001 12/31/9999 > XX N/A 01/01/2009 12/31/9999 This looks suspiciously like an exercise at school to me. So I don't want to give you the answer (because it is very simple) but rather like to know what you have already tried? I.e. what's your SQL statement so far? Based on that, we can surely help you to figure out misconceptions if there are any.
 Signature Knut Stolze DB2 Information Integration Development IBM Germany
alexanderpope11@hotmail.com - 01 Dec 2005 17:26 GMT Knut, I have a solution. If there is a more efficient method let me know:
--1) create tables and insert data
drop table main; create table main (row integer, emp_id integer, code char(2), start_date date, stop_date date );
insert into main (row, emp_id, code,start_date, stop_date) values (1,98233,'AA','05/05/2000','12/28/2000'), (2,98235,'AA','05/05/2000','04/21/2001');
select * from main;
drop table code; drop table code_ct; create table code_ct (code char(2), desc varchar(40), eff_date date, end_date date);
insert into code_ct values ('AA', 'Hospitals', '01/01/2000', '12/29/2000'), ('AA', 'Hospitals, Schools', '01/01/2001', '12/31/9999'), ('XX', 'N/A', '01/01/2009', '12/31/9999');
select * from code_ct;
--2a) find the rows in the code table with a gap between end_date and start_date. it doesnt matter how large the gap is --2b) find any gap date --2c) finally, return any rows where the gap date falls -- between the main.start_date and main.stop_date
select a.row, a.emp_id, a.code,a.start_Date, GapRows.GapDate, a.stop_date from main a, ( select code,EFF_DATE,END_DATE,next_eff_Date, NEXT_EFF_DATE - 1 day as GapDate from table (select CT.code, CT.EFF_DATE, CT.END_DATE, max(eff_date) over (partition by code order by code, eff_date rows between 1 following and 1 following) as next_eff_Date from code_ct CT ) as tmp (code,EFF_DATE,END_DATE,next_eff_Date) where (next_eff_Date - END_DATE) > 1 ) as GapRows (code, EFF_DATE, END_DATE, NEXT_EFF_DATE, GapDate ) where a.code = GapRows.code and GapDate between a.start_date and a.stop_date;
> > Hello, > > [quoted text clipped - 26 lines] > DB2 Information Integration Development > IBM Germany Knut Stolze - 02 Dec 2005 12:43 GMT >> > how do I write this SQL: >> > I would like to check the Main table for invalid rows. An invalid row >> > is: any row where the Start_date to stop_date range overlaps an invalid >> > date in the Code table. For example, Row#2 is invalid because the >> > Start_date-Stop_Date range overlaps 2 days in the code table where the >> > code AA was not valid (12/30/2000 - 12/31/2000) If I got this right, then an "invalid date range" is a range that is not covered by the dates in the code table, correct?
>> > Main Table >> > Row# emp_id code start_date stop_date [quoted text clipped - 13 lines] >> Based on that, we can surely help you to figure out misconceptions if >> there are any.
> --2a) find the rows in the code table with a gap between end_date and > start_date. it doesnt matter how large the gap is [quoted text clipped - 17 lines] > where a.code = GapRows.code AND > GapDate between a.start_date and a.stop_date; That does look quite fine to me. I'd probably do it that way: (1) check for each interval/range in the main table if it overlaps with a range in the code table, and (2) if it overlaps, then check that the date right before and after the overlaps are covered by another range from the code table
SELECT ... FROM main AS m JOIN code_table AS ct ON m.code = ct.code WHERE m.start_date < ct.end_date AND m.end_date > ct.eff_date AND -- check if another range in CT lies before the current one -- (only if the range in main exceeds the range in CT) CASE WHEN m.start_date < ct.eff_date THEN ( SELECT MAX(1) FROM code_table AS b WHERE b.code = m.code AND b.end_date >= m.start_date AND b.eff_date <= m.start_date ) ELSE 0 END = 1 AND -- check if another range in CT lies after the current one -- (only if the range in main exceeds the range in CT) CASE WHEN m.end_date > ct.end_date THEN ( SELECT MAX(1) FROM code_table AS a WHERE b.code = m.code AND b.eff_date <= m.end_date AND b.end_date >= m.end_date ) ELSE 0 END = 1
The subselects in the case expressions could also be rewritten, I'd say, Maybe that would make things more clearer.
 Signature Knut Stolze DB2 Information Integration Development IBM Germany
Brian Tkatch - 02 Dec 2005 16:34 GMT You also need to check if anyone is before the earliest date.
B.
Brian Tkatch - 02 Dec 2005 16:45 GMT This was a good exercise. It took me a while to come up with a solution, and that's how i learn things. Here's my solution, which is just what i came up with, not that it is better than anybody else's.
The CASE expressions are because the DATE given is the final valid DATE in db2, and adding a DAY results in SQL0183N. I add a DAY (or subtract) because the end date of one range can be one day before the begin date of the next range, and there will be no missed days. Since your example provide the final valid DATE, i assumed there could be the earliest one too.
The GROUP BY at the end is simply so only one row shows up per row in the Main TABLE, otherwise more than one may appear. In this example, two rows would appear instead.
I like my solution because it is straightforward. The WITH provides the bad days being check for, and then a simple BETWEEN is used.
DECLARE GLOBAL TEMPORARY TABLE Main ( Row INTEGER, Emp_Id INTEGER, Code CHAR(0002), Start_Date DATE, Stop_Date DATE ) WITH REPLACE
DECLARE GLOBAL TEMPORARY TABLE Code_CT ( Code CHAR(2), Desc VARCHAR(0040), Eff_Date DATE, End_Date DATE ) WITH REPLACE
INSERT INTO SESSION.Main VALUES (1, 98233, 'AA', '05/05/2000', '12/28/2000'), (2, 98235, 'AA', '05/05/2000', '04/21/2001')
INSERT INTO SESSION.Code_CT VALUES ('AA', 'Hospitals', '01/01/2000', '12/29/2000'), ('AA', 'Hospitals, Schools', '01/01/2001', '12/31/9999'), ('XX', 'N/A', '01/01/2009', '12/31/9999')
WITH Bad_Day (Code, The_Day) AS ( SELECT Code, End_Date + 1 DAY FROM SESSION.Code_CT Outer WHERE End_Date < DATE('12/31/9999') AND NOT EXISTS ( SELECT * FROM SESSION.Code_CT Inner WHERE Inner.Code = Outer.Code AND Outer.End_Date BETWEEN CASE Inner.Eff_Date WHEN DATE('01/01/0001') THEN Inner.Eff_Date ELSE Inner.Eff_Date - 1 DAY END AND CASE Inner.End_Date WHEN DATE('01/01/0001') THEN Inner.End_Date ELSE Inner.End_Date - 1 DAY END ) UNION ALL SELECT Code, Eff_Date - 1 DAY FROM SESSION.Code_CT Outer WHERE Eff_Date > DATE('01/01/0001') AND NOT EXISTS ( SELECT * FROM SESSION.Code_CT Inner WHERE Inner.Code = Outer.Code AND Outer.Eff_Date BETWEEN CASE Inner.Eff_Date WHEN DATE('12/31/9999') THEN Inner.Eff_Date ELSE Inner.Eff_Date + 1 DAY END AND CASE Inner.End_Date WHEN DATE('12/31/9999') THEN Inner.End_Date ELSE Inner.End_Date + 1 DAY END ) ) SELECT Main.Row, Main.Emp_Id, Main.Code, Main.Start_Date, Main.Stop_Date, MIN(Bad_Day.The_Day) FROM SESSION.Main Main, Bad_Day WHERE Main.Code = Bad_Day.Code AND Bad_Day.The_Day BETWEEN Main.Start_Date AND Main.Stop_Date GROUP BY Main.Row, Main.Emp_Id, Main.Code, Main.Start_Date, Main.Stop_Date
DROP TABLE SESSION.Main DROP TABLE SESSION.Code_CT
B.
alexanderpope11@hotmail.com - 01 Dec 2005 17:29 GMT Knut, I found a solution. if there is a more efficient method let me know.
The key is to find a gap date between dates in the code table, for a specific code; and figure out if that gap date falls between the start_date and stop_date in the main table.
drop table main; create table main (row integer, emp_id integer, code char(2), start_date date, stop_date date );
insert into main (row, emp_id, code,start_date, stop_date) values (1,98233,'AA','05/05/2000','12/28/2000'), (2,98235,'AA','05/05/2000','04/21/2001');
select * from main;
drop table code; drop table code_ct; create table code_ct (code char(2), desc varchar(40), eff_date date, end_date date);
insert into code_ct values ('AA', 'Hospitals', '01/01/2000', '12/29/2000'), ('AA', 'Hospitals, Schools', '01/01/2001', '12/31/9999'), ('XX', 'N/A', '01/01/2009', '12/31/9999');
select * from code_ct;
select a.row, a.emp_id, a.code,a.start_Date, GapRows.GapDate, a.stop_date from main a, ( select code,EFF_DATE,END_DATE,next_eff_Date, NEXT_EFF_DATE - 1 day as GapDate from table (select CT.code, CT.EFF_DATE, CT.END_DATE, max(eff_date) over (partition by code order by code, eff_date rows between 1 following and 1 following) as next_eff_Date from code_ct CT ) as tmp (code,EFF_DATE,END_DATE,next_eff_Date) where (next_eff_Date - END_DATE) > 1 ) as GapRows (code, EFF_DATE, END_DATE, NEXT_EFF_DATE, GapDate ) where a.code = GapRows.code and GapDate between a.start_date and a.stop_date;
|
|
|