Database Forum / General DB Topics / DB Theory / February 2004
BCNF and relational algebra
|
|
Thread rating:  |
Perception - 22 Feb 2004 20:00 GMT Hi all,
I have the following relations (expressed as functional dependencies) which I consider to be in BCNF (please correct me if I'm wrong):
Patient Identifier -> Patient Name, Ward Date Doctor -> Clinic
Now the problem here is that I am unable to deduce how to make any real query on these set of data because there are no common attributes between the relations. Is it even possible for relations under BCNF to have no common attributes? Note that the original relation was identical expect it had the additional functional dependency: Patient Identifier, Date, Doctor -> Patient Name, Ward, Clinic (i.e. the attributes on the LHS were the primary key). I decomposed the relation as I have shown above given that I knew each Patient Identifier was unique, a patient would never change wards but could attend several clinics, on each occasion a patient would see one doctor, doctor names were unique and that doctors were only associated with a single clinic. Now I believe that I have made a reasonable interpretation of this information both in my choice of primary key in the original relation (i.e. Patient Identifier, Date and Doctor) and in my decompositions into BCNF (every determinant is a candidate key).
Thus, if I wanted to list, say, all patient identifiers and names for all patients attending a certain clinic and having not been seen by a certain doctor how on earth would this be possible in relational algebra given that I cannot join over any common attribute?? So either I have made a mistake in my decomposition or there are some techniques that I am not aware of (that can link the FDs in some way).
In either case I would appreciate any help or clarification!
Thanks in anticipation.
Bob Badour - 22 Feb 2004 20:15 GMT > Hi all, > [quoted text clipped - 30 lines] > > Thanks in anticipation. What do you suppose the unary relation of dates means?
Perception - 22 Feb 2004 23:10 GMT > What do you suppose the unary relation of dates means? Thanks for the reply.
To answer your question: the unary relation means that every date is associated with more than one value for every other attribute (that is, Date only determines itself).
Could you perhaps have hinted that I should do a cartesian/cross product of Date with my other two normalised relations and then join them? More specifically, if I were to call my relations A, B and C where
A:Patient Identifier -> Patient Name, Ward B:Date C:Doctor -> Clinic
And if I wanted to achieve the example query I mentioned before, namely list all patient identifiers and names for all patients attending a certain clinic and having not been seen by a certain doctor, would the following be legal:
First, AxB and BxC (where x means 'cross'). Then join the new relations over 'Date'. Then select over the condition Doctor/='certain doctor' and Clinic='certain clinic'. And finally project over patient identifier and patient name.
Would that give me the correct answer? The reason I ask is because first I wasn't sure if my relations were in BCNF because there were no common attributes among them (although every determinant was a candidate key) and second whether I can cross two relations with no common attributes (although I think I can). If the answer to those two questions is yes then will I have computed what I was looking for? If I have gone on a completely wrong tangent please let me know, I'm pretty new to this database stuff!
Thanks again.
Bob Badour - 23 Feb 2004 00:24 GMT > > What do you suppose the unary relation of dates means? > [quoted text clipped - 3 lines] > associated with more than one value for every other attribute (that is, Date > only determines itself). That tells me what Date determines, but it does not tell me what it means. Is every date for all time in the unary relation? What does it mean for a specific date if it is in the relation? What does it mean for a specific date if it is not in the relation?
Perception - 23 Feb 2004 01:56 GMT > > > What do you suppose the unary relation of dates means? > > [quoted text clipped - 9 lines] > specific date if it is in the relation? What does it mean for a specific > date if it is not in the relation? Sorry for the misunderstanding. Specific dates correspond to when a patient visited a doctor. Thus if a date is in the relation, it means a patient had seen a doctor/attended a clinic on that date.
The original un-normalised datebase had tuples of the form:
Patient Id, Patient Name, Ward, Date, Clinic, Doctor
with data of the form, for instance
3842, Smith, 2, 6.7.03, 4, Anderson 9.9.03, 3, Haswell
corresponding to the above attributes for one particular patient.
Bob Badour - 23 Feb 2004 02:43 GMT > > > > What do you suppose the unary relation of dates means? > > > [quoted text clipped - 13 lines] > visited a doctor. Thus if a date is in the relation, it means a patient had > seen a doctor/attended a clinic on that date. How does a unary relation with only a date attribute say anything about a patient or a doctor or a clinic?
Jonathan Leffler - 23 Feb 2004 04:36 GMT > I have the following relations (expressed as functional dependencies) which > I consider to be in BCNF (please correct me if I'm wrong): > > Patient Identifier -> Patient Name, Ward > Date > Doctor -> Clinic
> Sorry for the misunderstanding. Specific dates correspond to when a patient > visited a doctor. Thus if a date is in the relation, it means a patient had [quoted text clipped - 10 lines] > > corresponding to the above attributes for one particular patient. Your BCNF breakdown bears little or no resemblance to the original data - you have lost lots of information, but the decomposition process is supposed to be lossless.
What are the unspoken rules about the data? Can a single patient see more than one doctor on a given date? Can they see the same doctor several times at the same clinic, in the same ward on the same date? Given the example data, it seems we can infer that both clinics 3 and 4 have a ward 2. We can clearly create a 1NF relation by replicating the patient ID, patient name, and ward values for the second row. We are then missing all the information that would help us deduce how this data can be normalized further.
The chances are good that Patient ID --> Patient Name.
Every other FD in your initial solution is disputable. A given doctor may work at several clinics. There is unlikely to be a rule that once a patient has been admitted to ward 2 at some clinic, thereafter, he must always be admitted to ward 2 at every clinic. As already discussed in other postings, the unary relation date is not significant.
You still have not given us all the information we need to do your homework for you, I'm afraid.
Note that relational databases work very largely by comparing data in relations (tables) on the basis of comparable attributes (columns). Depending on the ideological purity of your RTD (relational theory of data), the column names and types of the attributes (columns) should be the same in the two relations being compared. In your breakdown into 3 relations, there are no common attributes between any pair of the relations. As Codd delighted in saying, a database is partitionable into subsets c and C if the relations in c use no common domains (types) with those in C. Your initial solution is fully partionable into three relations. As you said in your original post, it is not possible to frame any queries between such disjoint tables.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Perception - 23 Feb 2004 18:43 GMT > What are the unspoken rules about the data? Can a single patient see > more than one doctor on a given date? Can they see the same doctor > several times at the same clinic, in the same ward on the same date? All yes.
> Given the example data, it seems we can infer that both clinics 3 and > 4 have a ward 2. We can clearly create a 1NF relation by replicating [quoted text clipped - 9 lines] > must always be admitted to ward 2 at every clinic. As already > discussed in other postings, the unary relation date is not significant. I actually mentioned in my first post that a doctor is only associated with one clinic, and that patients will not change wards during their stay (and that doctor names are unique). These were not an assumption on my part, but facts that came with the data. Sorry if I wasn't clearer.
> Note that relational databases work very largely by comparing data in > relations (tables) on the basis of comparable attributes (columns). [quoted text clipped - 7 lines] > partionable into three relations. As you said in your original post, > it is not possible to frame any queries between such disjoint tables. Yes exactly, which is why this is all so confusing. If I left the data in 1NF then the FDs would be:
Patient Id, Date, Doctor -> Patient Name Patient Id, Date, Doctor -> Ward Patient Id, Date, Doctor -> Clinic
since (Patient Id, Date, Doctor) must be the primary key. But also I can deduce that Patient Id -> Patient Name, and Patient Id -> Ward and finally Doctor -> Clinic from the info I mentioned a couple of lines above. Also Date on its own doesn't determine anything. And basically that was how I decomposed my data into BCNF, by having three relations where Patient Id -> Patient Name, Ward and Doctor -> Clinic and Date which followed from the above observations. Clearly this is BCNF because the determinants are candidate keys (and obviously 3NF since full functional dependence and no transitivity). The puzzling thing is that I can't make any query on this (as mentioned), so either I've missed a functional dependence somewhere or there is an alternate way to express the data in BCNF (but I can't see how at all). Or perhaps I can make a query if I do a cross product of date with the other two relations and then join the results over date--is this possible/make sense (since the cross product operaton does not require common attributes)?
Thank you again.
Jonathan Leffler - 24 Feb 2004 06:14 GMT >>What are the unspoken rules about the data? Can a single patient see >>more than one doctor on a given date? Can they see the same doctor >>several times at the same clinic, in the same ward on the same date? > > All yes. In that case, you don't have a primary key for the schema you outline (unless your interpretation of 'date' includes a sufficiently precise time component). If the 'date' only identifies the day, then the only way to record the fact that the same patient sees the same doctor several times at the same clinic in the same ward on the same date is by using some sort of duplicate record - a possibility which the RM (relational model) denies.
>>Given the example data, it seems we can infer that both clinics 3 and >>4 have a ward 2. We can clearly create a 1NF relation by replicating [quoted text clipped - 14 lines] > that doctor names are unique). These were not an assumption on my part, but > facts that came with the data. Sorry if I wasn't clearer. OK - I didn't read that bit carefully enough; sorry.
So, we have an FD that Doctor --> Clinic. We also appear to have an FD that { Patient ID, Date } --> { Clinic, Ward }, though I'm not quite sure about the Clinic/Ward interrelationships.
>>Note that relational databases work very largely by comparing data in >>relations (tables) on the basis of comparable attributes (columns). [quoted text clipped - 12 lines] > > Patient Id, Date, Doctor -> Patient Name That is, I suppose, an FD, but it is not a minimal FD. The minimal FD is Patient ID --> Patient Name. The Date and Doctor have no effect on things.
> Patient Id, Date, Doctor -> Ward You have not said that a given doctor only works on one ward at the given clinic.
> Patient Id, Date, Doctor -> Clinic Again, you have a non-minimal FD; Doctor --> Clinic you said, so the augmented determinant { Patient ID, Date, Doctor } --> Clinic is also an FD, but it is not minimal.
> since (Patient Id, Date, Doctor) must be the primary key. Not according to what you said at the start of this - a single patient can be visited several times on a given date by a given doctor.
However, I'm willing to think that you misunderstood what I was asking and that this system only records that a given patient saw the given doctor on the given date, but not how many times the doctor actually visited the patient (or, equivalently, that the doctor only sees the patient once on any given date). Under that modified defnition, { Patient ID, Date, Doctor } is the primary key.
> But also I can > deduce that Patient Id -> Patient Name, and Patient Id -> Ward No; { Patient ID, Date } --> Ward.
> and finally > Doctor -> Clinic from the info I mentioned a couple of lines above. Also > Date on its own doesn't determine anything. Agreed.
> And basically that was how I > decomposed my data into BCNF, by having three relations where Patient Id -> [quoted text clipped - 8 lines] > possible/make sense (since the cross product operaton does not require > common attributes)? In another posting, you said:
> The original un-normalised datebase had tuples of the form: > [quoted text clipped - 6 lines] > > corresponding to the above attributes for one particular patient. So, we have:
{ Patient ID, Date, Doctor } --> { Patient Name, Ward, Clinic }
But this is nowhere near BCNF because there are all sorts of other FDs too. Notably: PATIENT: { Patient ID } --> { Patient Name } DOCTOR: { Doctor } --> { Clinic } PATIENT_VISIT: { Patient ID, Date } --> { Ward }
There is still an FD associating Patient, Date and Doctor: DOCTOR_VISIT: { Patient ID, Date, Doctor } --> { }
I'm not clear about how clinics and wards are related, if at all.
Given the four relations implied by the FDs named above (primary key on the LHS of the FD), it is possible to reconstruct your original data. There might also be some extra unidentified constraints related to clinics and wards - there's also a constraint that the Patient ID, Date combination in DOCTOR_VISIT must appear in PATIENT_VISIT; this is a foreign key constraint.
PATIENT: 3842 --> Smith DOCTOR: Haswell --> 3 Anderson --> 4 PATIENT_VISIT: 3842, 6.7.03 --> 2 3842, 9.9.03 --> 3 DOCTOR_VISIT: 3842, 6.7.03, Anderson --> { } 3842, 9.9.03, Haswell --> { }
These four relations can be joined on their various sets of common columns to recreate the original data. Oh well, I guess that sorts out your homework for you...
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/
Perception - 25 Feb 2004 22:28 GMT <SNIP>
Much appreciated. Thank you for your help.
rkc - 23 Feb 2004 14:14 GMT "Perception" <perceptionist2000@yahoo.com> wrote in message > The original un-normalised datebase had tuples of the form:
> Patient Id, Patient Name, Ward, Date, Clinic, Doctor > > with data of the form, for instance > > 3842, Smith, 2, 6.7.03, 4, Anderson > 9.9.03, 3, Haswell In a nutshell, you skipped 2nf and 3nf to get to what you somehow determined to be bcnf. How'd ya do that?
mAsterdam - 23 Feb 2004 01:00 GMT Hi, I am completely new to this group, but I'll give it a try.
>>What do you suppose the unary relation of dates means? >> > ... > A:Patient Identifier -> Patient Name, Ward > B:Date > C:Doctor -> Clinic Start with providing the predicates which would let the population data convey true propositions.
Example: A: A:Patient Identifier -> Patient Name, Ward ------------------ ------------ ---- 1 Johnson IC 2 Perception RE
What do these rows *mean* ? That should give a clue to the predicate for A.
Perception - 23 Feb 2004 18:50 GMT > Hi, > I am completely new to this group, but I'll give it a try. [quoted text clipped - 8 lines] > Start with providing the predicates which > would let the population data convey true propositions. Could you please elaborate on what you mean by true propositions?
> Example: > A: [quoted text clipped - 5 lines] > What do these rows *mean* ? > That should give a clue to the predicate for A. mAsterdam - 23 Feb 2004 23:12 GMT > "mAsterdam" <mAsterdam@vrijdag.org> wrote in message >>>... [quoted text clipped - 6 lines] > > Could you please elaborate on what you mean by true propositions? Well, you are supposed to be the domain expert.
Domain in both the broad sense of 'universe of discourse', what are we talking about, what kind of facts should the content of these rows reflect, and in a more detailed sense: what is the valid set of values for each attribute/column/field/item/property/whathaveyou (a.k.a. it's type).
>>Example: >>A: [quoted text clipped - 5 lines] >>What do these rows *mean* ? >>That should give a clue to the predicate for A. I can only take a wild guess from this. You should be able to make a more meaningful set, but you could take this to start with. A first try could be:
[Attributes]
Patient Identifer: type=integer. It comes into existence when <some meaningful event in the universe of discourse> (up to you)
Patient Name: type=text. Given name, manually copied from his id-card when hen the patient is admitted to a ward (... just fantasizing to give you an idea, ehrrrm elaborate)
Ward: type=two characters The code at the entrance door of the ward.
[Predicate]
A possible predicate could be:
A Patient, identified by "Patient Identifier", with the given name "Patient Name", has been released from our Ward "Ward" on last tuesday.
(still improvising, that much should be clear)
[Propositions]
The propositions would now become:
Patient 1, Johnson was released from the IC Ward on tuesday. Patient 2, Perception was released from the RE Ward on tuesday.
Now you may not like the 'on tuesday' part (or other parts). I added it to illustrate that *if* this would be the predicate, this would be the set of propositons conveyed by the population. It might be a nice list for the billing department (they would need mor, of course).
In short: what does it *mean*.
HTH
|
|
|