Database Forum / General DB Topics / DB Theory / November 2008
Few confusing things about first normal form
|
|
Thread rating:  |
Srubys@gmail.com - 22 Oct 2008 19:35 GMT greetings
1) For DB to be in 1NF there must be no multi-valued attributes, and no repeating groups. When so, data is said to be atomic. One site claims that even first name and last name must be featured in separate columns, but some other sites did put both first name and last name into a single column, and yet still claimed the table was in 1NF? So which is true?
2)
a) What else do we mean by repeating groups? Just duplicative columns and multi-valued attributes, or …?
b) What qualifies as a duplicative column? I imagine there are cases where it is not so obvious whether we are dealing with duplicative columns or not?
thank you
David Portas - 22 Oct 2008 21:23 GMT Formally speaking any relation is *always* in 1NF by definition. A table (for instance in SQL) is said to be in 1NF if it accurately represents some relation, ie. it has: no duplicate rows; unique column names; no nulls; at least one key. SQL doesn't permit multiple values in a column or repeating goups of columns so those problems do not arise.
Informally, other notions are sometimes claimed to be associated with 1NF. It is sometimes said that a column value "must" be "atomic" or that a collection of attributes representing the same or similar things is a violation of 1NF. The problem is that these are highly subjective ideas which don't make much sense as hard and fast rules. It's best to consider these as design guidelines and nothing to do with a formal definition of 1NF proper, which is simply the definition of a relation.
If you expect to learn these concepts from miscellaneous web sites then you will come to grief. Get a decent text by a respected author such as: http://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844
 Signature David Portas
Srubys@gmail.com - 23 Oct 2008 01:28 GMT greetings
I realize that the arguments you gave here basically also answered my first question in another thread. But with regards to my second question in that other thread, your argument here is also that SQL doesn’t allow multi valued attributes. But if we limit our discussion just to the theory, then multi valued attributes can exist. Thus table ( where ITEM column holds multiple values ) in my second question
ORDER ( ORDER_ID, ITEM )
is not normalized and as such the question is still valid?
> If you expect to learn these concepts from miscellaneous web sites then you > will come to grief. Get a decent text by a respected author such as: > http://www.amazon.com/Introduction-Database-Systems-8th/dp/0321197844 I only visit those sites if I don’t completely understand what my book is trying to convey ( which admittedly happens alot ).
I'm also unfortunatelly stuck with the book ( long story ) I have and need to understand everything in it
thank you kindly
David Portas - 23 Oct 2008 07:01 GMT On 23 Oct, 01:28, Sru...@gmail.com wrote:
> greetings > [quoted text clipped - 8 lines] > > is not normalized and as such the question is still valid? If ITEM truly is multi-valued then ORDER is not a relation. All attributes are equally important. The fact that it has a regular scalar attribute as a key is irrelevant because if ITEM isn't one value then the operators like equality, assignment and projection can't apply in their usual sense.
I don't mean to exclude the possibility of relation-valued attributes. A relation is a value so RVA's are perfectly OK in principle.
-- David Portas
Srubys@gmail.com - 23 Oct 2008 19:20 GMT greetings
> If ITEM truly is multi-valued then ORDER is not a relation. All > attributes are equally important. The fact that it has a regular [quoted text clipped - 3 lines] > I don't mean to exclude the possibility of relation-valued attributes. > A relation is a value so RVA's are perfectly OK in principle. What do you mean by relation-valued attribute? Regular attribute in relational table?
> I don't mean to exclude the possibility of relation-valued attributes. > A relation is a value so RVA's are perfectly OK in principle. I don’t quite understand what your arguments here
> Formally speaking any relation is *always* in 1NF by definition. A table > (for instance in SQL) is said to be in 1NF if it accurately represents some > relation, ie. it has: no duplicate rows; unique column names; no nulls; at > least one key. But table can also be declared without the key? Wouldn’t that mean that table isn’t in 1NF?
Anyways, I realize SQL doesn’t allow multi-valued attributes, but I figured at least in theory they would be allowed, since I can’t bring myself to believe that the author of my book was so off. Uh
thank you all for helping me
Gene Wirchenko - 23 Oct 2008 19:43 GMT [snip]
>Anyways, I realize SQL doesnt allow multi-valued attributes, but I >figured at least in theory they would be allowed, since I cant bring >myself to believe that the author of my book was so off. Uh No, they are not allowed under the RM, and there are good reasons for this. (You can break up multi-valued attributes into another relation.)
>thank you all for helping me Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
patrick61z@yahoo.com - 01 Nov 2008 13:03 GMT > Sru...@gmail.com wrote: > [quoted text clipped - 7 lines] > for this. (You can break up multi-valued attributes into another > relation.) The solution that makes rdbms purists happy is to declare the entire repeating group be instead a string or blob and let the application deal with the repeating-groupedness of the actual data.
> >thank you all for helping me Your welcome!
paul c - 22 Oct 2008 21:59 GMT > greetings > > 1) For DB to be in 1NF there must be no multi-valued attributes, and > no repeating groups. When so, data is said to be atomic. ... When Codd first used the word "atomic", he may have intended it very casually, as some of his intended audience were decision-makers, but (just as they often are today) many of those were non-technical people. It seems he was very much aware that no matter how good his idea was, it still needed to be sold.
CJ Date and like-minded people have discarded the term "atomic", saying it is at least imprecise and at worst meaningless as far as a Codd-ian relation is concerned.
They now like to use a more mathematical term - scalar. In their approach, in a the kind of relation Codd had in mind, the values of attributes are inherently scalar values. But many authors still natter on about atomic attributes. Maybe they are the kin of the 1960's secondary school teachers who were told to teach math from a set-theory perspective, never having been taught the latter themselves.
Anyway, Date and company's attitude might make one wonder if 2NF should be called 1NF, et cetera.
They discount "repeating groups" as well. As far as I can tell for the same reason. When Codd wrote his first papers, several hierarchical dbms's of the day as well as lower-level access methods had built-in support for such constructs. In fact I seem to remember that support had even found its way into Ansi Cobol. In his 1970 paper, Codd went out of his way to show that such a construct was redundant and therefore unnecessary.
Many in this camp also acknowledge the logical possibility of RVA's, relation-valued-attributes, at the same time as saying that the practical need for RVA's is rare.
Over the last hundred years or so, even the notion of atomic as being indivisible that physicists used has wandered, not just because it was discovered that atoms could be divided but also as more fundamental particles were discovered. Even before then, some had surmised that electrons might be jumping from atom to atom. Somewhere Date makes the point that data whose organization is expected to endure should be stored using durable principles. If the same goes for the lingo used, a mathematical noun/adjective seems better suited than a word like atomic that has so many nuances in common language.
David BL - 23 Oct 2008 03:39 GMT > Sru...@gmail.com wrote: > > greetings [quoted text clipped - 43 lines] > mathematical noun/adjective seems better suited than a word like atomic > that has so many nuances in common language. Is the value of an attribute that is an RVA a scalar?
paul c - 23 Oct 2008 14:11 GMT ...
> Is the value of an attribute that is an RVA a scalar? I believe it is ie., in a "containing" tuple it's a (single) relation value and (in the D&D approach) there is no way to operate on its individual tuples. Their algebra only operates on the value of the relation.
(They have an operator called UNGROUP that forms a different relation that has no RVA's. Definition in thethirdmanifesto.com, see appendix A. They don't the equality operator to treat the second relation as equal to the first even though some of the possible values for the containing /GROUPed relation might have exactly the same information as the UNGROUPed one. However, I don't believe they forbid an operator apart from equality that decides some kind of equivalence.)
Hope somebody will correct me if I've mangled what D&D say.
David BL - 24 Oct 2008 04:27 GMT > ... > [quoted text clipped - 14 lines] > > Hope somebody will correct me if I've mangled what D&D say. Isn't every type a scalar type? Why not drop the word "scalar" as meaningless? In fact it would seem a good idea to avoid the confusion with the rather specific meaning of "scalar" used in linear algebra (where it is associated with the field over which matrices and vectors are defined).
JOG - 27 Oct 2008 00:47 GMT > > ... > [quoted text clipped - 20 lines] > algebra (where it is associated with the field over which matrices and > vectors are defined). There is clash between the compsci use of the term, and the mathematical use. AFAIK Codd meant the former, a scalar meaning a variable that can only hold one value meeting the variable's type. One logical address - one item. This contrasts with an array, for example, which can contain more than one value, but does not require any form of decomposition to iterate through those values (i.e. it is not an object). One logical address - several items (lined up contiguosly)
While this once made sense in the low-level programming world, the uptake of OO (and specifically templates and containers) has severely confused these distinctions - an "Array" in pure OO languages /does/ require decomposition (through accessor methods), it is just that this is often hidden via overloading of an indexing syntax to mimic non- scalars (i.e. square brackets). A <vector> for example is one item, at one logical address, that just happens to contain lots of others.
Hence, to a modern high level programmer, who probably wouldn't have any notion of contiguous memory addressing, all variables are scalars and for it to be otherwise would seem bizarre.
patrick61z@yahoo.com - 01 Nov 2008 13:28 GMT > > > ... > [quoted text clipped - 40 lines] > any notion of contiguous memory addressing, all variables are scalars > and for it to be otherwise would seem bizarre. this is a big "depends". the need for contiguous memory addressing varies from not needed (a business tracking customers is easily relational), to pretty much mandatory (signal processing/ multimedia, do you really need to put each video frame into an oracle table record so that the customer can view your movies over odbc?). It is probably more of a definition of what you want your computer to be doing as a programmer.
Roy Hann - 23 Oct 2008 10:56 GMT >> greetings >> [quoted text clipped - 4 lines] > casually, as some of his intended audience were decision-makers, but > (just as they often are today) many of those were non-technical people. I think we can be certain he did intend it casually because he never went on to make any argument based on what it (might) mean. So in effect his failure to define it is no more significant than Euclid failing to define or comment on the concept of colour.
Furthermore we know that relational algebra doesn't provide tools to discern the internal structure of any value. If we were going to explain that to someone we'd probably also say values are atomic. It's an excellent word for the idea.
The usual misunderstanding seems to be to think that RT and SQL tell us we have to do something to our designs to *make* our values atomic. But that is backwards. We can use any kind of value we want, including the classic list of pizza toppings, but RT (if not SQL) will only ever be able to treat it as an atom. (Having said that, I am still not at all happy with RVAs! :-)
 Signature Roy
JOG - 23 Oct 2008 13:04 GMT On Oct 22, 7:35 pm, Sru...@gmail.com wrote:
> greetings > [quoted text clipped - 4 lines] > into a single column, and yet still claimed the table was in 1NF? So > which is true? The site you refer to, if you have understood it correctly, is mistaken. Both of the following are in 1NF:
R1 <name, age> R2 <firstname, lastname, age>
It is a design decision which is appropriate for your task. However, in almost all cases the second will be by far the better design .
> 2) > > a) What else do we mean by repeating groups? Just duplicative columns > and multi-valued attributes, or …? A repeating group occurs when there are attributes in a proposition which share the same role name. For example:
"person:Frank favourite pizza has topping:Peperoni, topping:Ham and topping:Pineapple on it".
In this instance the "topping" role represents a repeating group.
> b) What qualifies as a duplicative column? I imagine there are cases > where it is not so obvious whether we are dealing with duplicative > columns or not? In my experience it is always extremely obvious.
> thank you I would also echo Roy's sentiments concerning what is meant by atomicity. I understand Codd's use of the word atomic as meaning that a value cannot be decomposed by the relational algebra, unless it is a relation. That is not to say, of course, a particular DBMS will bolt on its own methods of decomposing data items - e.g. string or data functions - but these are (correctly) not part of the RA proper.
Walter Mitty - 23 Oct 2008 14:49 GMT greetings
>1) For DB to be in 1NF there must be no multi-valued attributes, and no repeating groups. When so, data is said to be atomic. One site claims that even first name and last name must be featured in separate columns, but some other sites did put both first name and last name into a single column, and yet still claimed the table was in 1NF? So which is true?
First name and last name do not have the same semantics. A person whose name is "SAMUEL CHRISTOPHER" does not have the same name as a person whose name is "CHRISTOPHER SAMUEL". Since the attributes are different attributes, putting them in spearate columns does not cause an implicit repeating group.
If you go across cultures, the semantics can become obscured. For example, the family name of "MAO ZEDONG" is "MAO". Many Europeans do not realize this.
paul c - 23 Oct 2008 17:37 GMT ...
> If you go across cultures, the semantics can become obscured. For example, > the family name of "MAO ZEDONG" is "MAO". Many Europeans do not realize > this. Instead of semantics why not just call it what it is: cultural misreading?
Gene Wirchenko - 23 Oct 2008 17:59 GMT >... >> >> If you go across cultures, the semantics can become obscured. For example, >> the family name of "MAO ZEDONG" is "MAO". Many Europeans do not realize >> this.
>Instead of semantics why not just call it what it is: cultural misreading? "semantics" has a known meaning. "cultural misreading" is weasel-wording.
Sincerely,
Gene Wirchenko
Computerese Irregular Verb Conjugation: I have preferences. You have biases. He/She has prejudices.
|
|
|