The naive test for equality
David Cressey - 31 Jul 2005 19:16 GMT There's been discussion about whether the relational engine does or does not understand equality. I'd like to suggest that the relational engine does understand the "what" of equality testing, but is sometimes naive about the "how". Before I go any further with this, let me first describe a naive test for equality.
Here's how it works:
Every value has a type, and is represented by a bit string.
If values A and B are to be tested for equality, the first step is to see if the type of A and the type of B are the same. If not, then an error flag is raised, and the test returns FALSE.
If A and B are of the same type, then the lengths of the two bit strings are compared. If the lengths are different, the test returns FALSE.
If the types are the same, and the lengths are the same, the bit strings are compared, one bit at a time. If any corresponding bit is different in A and B, the test returns FALSE.
Otherwise, the test returns TRUE.
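The steps above can be sketched in Python. This is only an illustration of the naive test as described in the post: the `Value` class, the `type_errors` list standing in for the "error flag", and the use of a string of '0'/'1' characters as the bit string are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    type_name: str  # hypothetical type tag
    bits: str       # the representation, written as a string of '0' and '1'


# Stand-in for the "error flag" mentioned in the post (an assumption).
type_errors = []


def naive_equal(a: Value, b: Value) -> bool:
    # Step 1: the types must match, otherwise flag an error and return FALSE.
    if a.type_name != b.type_name:
        type_errors.append((a.type_name, b.type_name))
        return False
    # Step 2: the bit strings must have the same length.
    if len(a.bits) != len(b.bits):
        return False
    # Step 3: compare the bit strings one bit at a time.
    for bit_a, bit_b in zip(a.bits, b.bits):
        if bit_a != bit_b:
            return False
    return True
```

Note that nothing in this function ever consults the meaning of the type: it compares representations only.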
Well, what makes me call this test naive? It's naive because it has tested the two representations for equality, and not the two values. The naive test will always work correctly, provided there are no synonyms and no homonyms. There are a lot of datatypes that have these two properties, and for those datatypes, the naive test works just fine.
I'm going to sidestep the issue of homonyms.
What if we had a type engine with the following interesting feature: if we ask it to test "humid" and "moist" for equality, it returns TRUE. I'm going to sidestep the question of whether such a type engine is mathematically valid, and the question of how it is implemented.
The point is that the naive test fails to deliver the same answer as the type engine does, because it doesn't recognize "humid" and "moist" as synonyms.
Here's where the relational engine may be able to use the "what" of the equality test, but can't grasp the "how".
There's more, but I'll leave it here for now.
DBMS_Plumber - 31 Jul 2005 22:58 GMT You define equality either a) by declaring an equal() relation and populating it with equality mappings (no one does this), or else b) by incorporating into the engine a module of non-declarative code which returns 'true' or 'false'.
For consistency, you might try adding a module 'compare()', instead of 'equal()', because that will get you the set of comparison operators to boot. Most Object-Relational DBMS products adopt this practice.
Numerous other practical problems arise:
a) You need to incorporate the comparison in the engine's infrastructure: sorts, indices, grouping, etc.
b) For some sort algorithms (radix) and other physical operations (hash joins, bloom filters, mapping tuples to partitioning schemes) the bit-wise representation of the value needs to be consistent with the logical operations (or you need to write code to convert your representation to something that's bit-wise consistent)
c) Nothing on the whole green earth can help you if you decide to change the definition of these functions in an operational system, or try to do a database restore with different functions. You have to unload the DBMS, and re-load it again, or else engage in huge amounts of conversion jiggery pokery (this factor heavily influences the design of type extensions)
d) Support for distributed computing needs care, because of byte-orderings, etc.
All of these trade-offs and questions were thoroughly explored in the systems research literature a decade ago. Most SQL DBMS products provide the functionality to some degree.
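As a rough illustration of the compare() idea and of point b), here is a Python sketch. The case-insensitive string type is a made-up example, not something from the post, and `sort_key` is a hypothetical helper that converts a value into a canonical form whose plain comparison and hash agree with compare().

```python
from functools import cmp_to_key


def compare(a: str, b: str) -> int:
    """Hypothetical type-supplied comparison: case-insensitive ordering.

    Returns a negative number, zero, or a positive number.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


# The whole family of comparison operators falls out of compare().
def equal(a: str, b: str) -> bool:
    return compare(a, b) == 0


def less_than(a: str, b: str) -> bool:
    return compare(a, b) < 0


def sort_key(a: str) -> str:
    """Canonical form: values that compare equal get identical keys,
    so hashing or byte-wise comparison of the key stays consistent
    with the logical comparison."""
    return a.casefold()


ordered = sorted(["b", "A", "C"], key=cmp_to_key(compare))
```

A hash join or partitioning scheme in this sketch would hash `sort_key(value)` rather than the raw value, which is the "convert your representation" option mentioned in b).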
paul c - 01 Aug 2005 01:22 GMT > You define equality either a) by declaring an equal() relation and > populating it with equality mappings (no one does this), or else b) by [quoted text clipped - 29 lines] > systems research literature a decade ago. Most SQL DBMS products > provide the functionality to some degree. heh (that's a 'heh' on the 2nd glass of plonk). i'd say that everything you say is true within the scope of implementation which is fair enough, given your moniker, although i'd also say that how "You define equality" if it were turned into a question is kind of impossible outside of politically correct circles where people try to do all kinds of impossible things. my thrust was admittedly philosophical (which may explain my choice of moniker), or what i remember Hugh Darwen calling 'mystical'. i was trying in my clumsy way to say that equality is an elusive quality and 'sameness' is easier for me to deal with. admittedly, a philosopher could ask, "if they are the same thing, how could we compare them in the first place?". maybe that's why people make up axioms.
cheers, p
Marshall Spight - 02 Aug 2005 06:21 GMT I have to say I just don't understand why equality produces so much ink. It's not that complicated a concept: two values are equal if they are the same value.
I guess what makes it so complicated is that we're used to having to work with objects and memory locations and aliasing and so forth. These are all *implementation* complexities, though; the interface remains blindingly simple.
And as far as implementation goes, it seems to me that a type system constructed a certain way would always be able to write the equals function. It would need to understand the difference between ordered data and unordered data, so that it would be able to know the difference between { 1, 2 } and [ 1, 2 ]. It would then be able to know that { 1, 2 } and { 2, 1 } were the same, and that [ 1, 2 ] and [ 2, 1 ] were different. It would also need not to support stateful encapsulation, but I consider that an advantage. Stateful encapsulation is quite popular in OO languages, but it's the enemy of data management. You can't manage what you can't observe.
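Python's built-in collections happen to behave the way this paragraph describes, so they make a convenient illustration (the choice of Python here is the editor's, not the poster's):

```python
# Unordered data: element order is not part of the value.
sets_equal = {1, 2} == {2, 1}

# Ordered data: element order is part of the value.
lists_equal = [1, 2] == [2, 1]

# Equality composes structurally: a list of sets compares
# position by position, and each position compares as a set.
nested_equal = [frozenset({1, 2}), frozenset({3})] == [frozenset({2, 1}), frozenset({3})]
```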
I don't really buy in to the idea that there's a difference between the RM engine's type system and the domain type system. (If only because if I have to implement a language, the type system's going to be the hardest part, and I only want to have to do that once!)
Marshall
David Cressey - 02 Aug 2005 13:37 GMT > I have to say I just don't understand why equality produces > so much ink. It's not that complicated a concept: two > values are equal if they are the same value. The reason it produces so much ink is that we never actually compare two objects directly. What we compare are two representations of objects to see if they represent the same object.
And the issues of homonyms and synonyms in representation schemes are worth quite a bit of ink.
I was thankful when DBMS_Plumber informed us that these issues have been fully treated in the literature. It would have been very disconcerting to find out the opposite. And I would certainly urge anyone who proposes to build a DBMS to bone up on the literature first. The time invested will yield a good return.
Further, the plumber is very right to state that, once a bitwise representation scheme has been proven satisfactory with regard to identity, the next issue is order. But that's not where I'm going with this. I proposed the "naive test" as a straw man, and I hope no one thinks I was setting some kind of trap.
I'm still not done with identity. In particular, I'm not done with synonyms.
The synonyms "humid" and "moist" get into semantic issues that are way beyond where I want to go right now.
Instead, I'm going to suggest anagrams: a valid word with the same letters, but possibly permuted.
Thus if I ask for anagrams of: "post", I get "stop", "pots", etc. If I extend the definition of anagrams just slightly, and define that every word is an anagram of itself, now the relationship is reflexive. It's clearly symmetric and transitive, so it's a flavor of "equality".
But it's a flavor of equality where the combinatorics get quickly out of hand.
And that's what interests me in this discussion.
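One way to picture the anagram relation in code is the sketch below. The tiny word list is invented for the example, and the sorted-letters key is one possible technique, not something proposed in the post. Comparing keys sidesteps enumerating permutations, although deciding which permutations are valid words still needs a dictionary.

```python
from collections import defaultdict

# Hypothetical dictionary of valid words (an assumption for the example).
WORDS = ["post", "stop", "pots", "spot", "tops", "opts", "moist", "humid"]


def anagram_key(word: str) -> str:
    """Canonical form of a word: its letters in sorted order."""
    return "".join(sorted(word))


def anagram_equal(a: str, b: str) -> bool:
    """Reflexive, symmetric and transitive, because it is plain
    equality on the canonical keys."""
    return anagram_key(a) == anagram_key(b)


def anagram_classes(words):
    """Group the dictionary into equivalence classes of anagrams."""
    classes = defaultdict(set)
    for word in words:
        classes[anagram_key(word)].add(word)
    return dict(classes)


classes = anagram_classes(WORDS)
```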
Marshall Spight - 02 Aug 2005 15:07 GMT > > I have to say I just don't understand why equality produces > > so much ink. [quoted text clipped - 6 lines] > word is an anagram of itself, now the relationship is reflexive. It's > clearly symmetric and transitive, so it's a flavor of "equality". Sure. Specifically, it's an equivalence relation. Let's distinguish between the equality relation specifically and equivalence relations in general. Equality is a much simpler thing.
> But it's a flavor of equality where the combinatorics get quickly out of > hand. > > And that's what interests me in this discussion. I see. Well, maybe I don't actually. But I'm following you so far.
Marshall
Paul - 02 Aug 2005 17:03 GMT Marshall Spight wrote:
> Sure. Specifically, it's an equivalence relation. Let's distinguish > between the equality relation specifically and equivalence relations > in general. Equality is a much simpler thing. Is it, though? We think about 1/2 = 2/4 fine even though they have different representations. Maybe you mean "identity", often shown using a variant of the equals sign with three lines instead of two?
It's rare that we use the "equals" symbol to compare two things with identical representation: 1=1, we'd use the "identity" symbol instead.
It's all about levels of abstraction: equality at the physical layer (representation) may differ from equality at the logical layer (value).
So for the underlying relational engine to compare two values for equality, it can't in general stay in the physical layer; it has to jump up into the logical layer, do the comparison, and then jump back down to the physical layer again.
Paul.
Marshall Spight - 02 Aug 2005 19:49 GMT > Marshall Spight wrote: > > Sure. Specifically, it's an equivalence relation. Let's distinguish > > between the equality relation specifically and equivalence relations > > in general. Equality is a much simpler thing. > > Is it, though? I wasn't clear whether you were questioning my "it's an equivalence relation" or my "equality is [simple]."
> We think about 1/2 = 2/4 fine even though they have > different representations. Sure, because they are the same value. Thining too much about representation can only confuse you. :-)
> Maybe you mean "identity", often shown using > a variant of the equals sign with three lines instead of two? No, I certainly do not mean identity. In fact, I specifically reject the concept of identity in the field of data management. It's useful in OOP to distinguish between equality and identity, but introducing identity in data management is a disaster.
> It's rare that we use the "equals" symbol to compare two things with > identical representation: 1=1, we'd use the "identity" symbol instead. I'm not sure I agree that it is rare. It is certainly correct.
> It's all about levels of abstraction: equality at the physical layer > (representation) may differ from equality at the logical layer (value). Sure. Note I am not discussing the physical; I'm discussing the logical and conceptual.
I said: "I guess what makes it so complicated is that we're used to having to work with objects and memory locations and aliasing and so forth. These are all *implementation* complexities, though; the interface remains blindingly simple."
> So for the underlying relational engine to compare two values for > equality, it can't in general stay in the physical layer; it has just > jump up into the logical layer, do the comparison, and then jump back > down to the physical layer again. I am unclear as to what you are saying here. The implementation only operates at the implementation level. The implementation implements the logical level, or "interface." I don't know what you mean by "jump up."
Sure, the implementation of a function to test for equality might be more than just a binary region compare, but that doesn't mean that it's not still the implementation.
Marshall
David Cressey - 03 Aug 2005 15:59 GMT > Sure, because they are the same value. Thining too much about > representation can only confuse you. :-) I think the above could be applied to any topic in computer science. We are always manipulating representations, aren't we?
<silly> Baba Louie: I theen' we better get outta here, Quistro.
QuickDraw McGraw: I'll do the thinnin' around here Baba Louie, and don' you fergit it! </silly>
Marshall Spight - 03 Aug 2005 19:37 GMT > > Sure, because they are the same value. Thining too much about > > representation can only confuse you. :-) [quoted text clipped - 8 lines] > you fergit it! > </silly> Homer Simpson: D'oh!
Left out the k.
Marshall
Gene Wirchenko - 03 Aug 2005 18:13 GMT [snip]
>Sure, because they are the same value. Thining too much about >representation can only confuse you. :-) Not the you-thou bit again.
[snip]
Sincerely,
Gene Wirchenko
Paul - 03 Aug 2005 21:46 GMT Marshall Spight wrote:
>>>Sure. Specifically, it's an equivalence relation. Let's distinguish >>>between the equality relation specifically and equivalence relations [quoted text clipped - 4 lines] > I wasn't clear whether you were questioning my "it's an equivalence > relation" or my "equality is [simple]." I guess what I'm saying is that equality isn't really simpler than equivalence relations - they're kind of the same thing really.
>>So for the underlying relational engine to compare two values for >>equality, it can't in general stay in the physical layer; it has just [quoted text clipped - 5 lines] > the logical level, or "interface." I don't know what you mean > by "jump up." what I mean is that you have the relational part and the domain part with their separate physical implementations - but the only way they can talk to each other to establish equality is via their "logical" interfaces - so going up an abstraction level. As opposed to the "naive" way where the relational part can establish equality on its own (using bit representation) without needing the domain part at all. I'm not really saying anything new here, just rehashing existing posts but what the hell, it might be useful for someone to see something from several angles!
Paul.
VC - 02 Aug 2005 23:22 GMT Hi,
> Marshall Spight wrote: >> Sure. Specifically, it's an equivalence relation. Let's distinguish [quoted text clipped - 4 lines] > different representations. Maybe you mean "identity", often shown using > a variant of the equals sign with three lines instead of two? The standard construction of rationals, as introduced in high school, is:
Let ZxZ' be the set of all ordered pairs of integers (x,y) where y is not zero (Z' = Z minus {0}). Let's define an equivalence relation E as
(x1,y1) E (x2, y2) iff x1*y2 = y1*x2
Then rationals are a set Q of equivalence classes defined by the above relation. Technically, one has to *prove* that E is indeed an equivalence relation and that operations like addition and multiplication are well defined and obey the usual laws, etc.
There is no need to talk about some vague representations and such, one can simply speak in clear terms of integers and equivalence classes instead.
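The construction can be spot-checked in Python. This sketch reads the pair (x, y) as x/y, so it assumes the second component is the nonzero one. The brute-force checks over a small sample are only a sanity check, not the proof the post asks for, and the `add` function is the usual fraction addition, included to illustrate what "well defined" means.

```python
from itertools import product


def equiv(p, q):
    """(x1, y1) E (x2, y2) iff x1*y2 == y1*x2."""
    (x1, y1), (x2, y2) = p, q
    return x1 * y2 == y1 * x2


def add(p, q):
    """Addition on representatives: x1/y1 + x2/y2."""
    (x1, y1), (x2, y2) = p, q
    return (x1 * y2 + x2 * y1, y1 * y2)


# A small sample of Z x Z' (second component nonzero).
pairs = [(x, y) for x in range(-3, 4) for y in range(-3, 4) if y != 0]

reflexive = all(equiv(p, p) for p in pairs)
symmetric = all(equiv(q, p) for p, q in product(pairs, repeat=2) if equiv(p, q))
transitive = all(
    equiv(p, r)
    for p, q, r in product(pairs, repeat=3)
    if equiv(p, q) and equiv(q, r)
)

# "Well defined": equivalent inputs give equivalent sums.
well_defined_example = equiv(add((1, 2), (1, 3)), add((2, 4), (3, 9)))
```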
Paul - 03 Aug 2005 21:25 GMT > Then rationals are a set Q of equivalence classes defined by the above > relation. Technically, one has to *prove* that E is indeed an equivalence [quoted text clipped - 3 lines] > There is no neeed to talk about some vague representations and such, one > can simply speak in clear terms of integers and equivalence classes instead. well, the equivalence class can be thought of as a set of possible representations for the "value" that "is" the equivalence class (feeling like Clinton here explaining what I mean by "is" :))
By "representation" I mean the actual symbols used to convey the idea of a "value", and there may be several of these representations for one value.
Paul.
VC - 04 Aug 2005 03:48 GMT >> Then rationals are a set Q of equivalence classes defined by the above >> relation. Technically, one has to *prove* that E is indeed an [quoted text clipped - 9 lines] > representations for the "value" that "is" the equivalence class (feeling > like Clinton here explaining what I mean by "is" :)) The usual definition of an equivalence class is:
Let E be an equivalence relation on the set S. Then, for a given element e in S, its equivalence class is a set of all elements in S that are equivalent to e:
[e] = {x in S| x E e}.
I have no idea what a 'possible representation' might be.
> By "representation" I mean the actual symbols used to convey the idea of > a "value", and they may be several of these representations for one value. I do not understand this.
> Paul.
David Cressey - 04 Aug 2005 14:41 GMT > > By "representation" I mean the actual symbols used to convey the idea of > > a "value", and they may be several of these representations for one value. > > I do not understand this. I think the term "literal value" from classical programming language documents might be relevant here.
The literal value conveys from the writer to the reader a specific value from one of the types. Thus
12345 is a literal value
123.45 is a literal value
'123.45' is a literal value (of a string).
vc - 04 Aug 2005 15:57 GMT > > > By "representation" I mean the actual symbols used to convey the idea of > > > a "value", and they may be several of these representations for one [quoted text clipped - 11 lines] > 123.45 is a literal value > '123.45' is a literal value (of a string). <Paul> wrote:
"well, the equivalence class can be thought of as a set of possible representations for the "value" that "is" the equivalence class "
I do not see how 'possible representations' (whatever they are), or 'literals', are relevant to the simple notion of equivalence class.
Thanks.
Paul - 06 Aug 2005 15:54 GMT >> <Paul> wrote: >> "well, the equivalence class can be thought of as a set of possible >> representations for the "value" that "is" the equivalence class " > > I do not see how 'possible representations' (whatever they are), or > 'literals', are relevant to the simple notion of equivalence class. maybe you're reading more into it than I mean.
Probably a concrete example might best explain what I'm trying to say.
Consider simple fractions. You have several ways of writing the number 0.5, for example 1/2, 2/4, 3/6, etc. (infinitely many in fact). I'm just saying that all these are possible ways of representing the same number or "value".
You gave the details of how the rationals are constructed mathematically using equivalence relations. In practice, you aren't going to write the rational number 0.5 as the set {1/2, 2/4, 3/6, ...}, you will pick one example and use that. I think the standard notation used is square brackets e.g. [1/2] to denote the equivalence class to which 1/2 belongs. Or you could just as well use [2/4].
I've kind of lost track of what started this thread in the first place now! I think it was just to say I didn't think there was any real difference between equality and equivalence relations. Each one defines the other.
When we write 1/2 or 2/4 it is just shorthand for "[1/2]" or "the equivalence class containing 1/2" so 1/2 and 2/4 are actually identical at some level. But clearly at the level of marks on paper or bytes on a computer they are different. And these two levels correspond to the physical and logical levels of the relational model. So something can be equal at the logical level but different at the physical level.
Am I just stating the obvious in a very roundabout way? The original post gave an example of strings with a definition of equality that made anagrams equal to each other. The claim was made that that wasn't a proper example because it was an equivalence relation rather than a "plain" equality and I'm just rebutting that claim. I think that was the whole point of this somewhat rambling post.
Slightly confusing the issue is the fact that we are using the word "relation" in a mathematical rather than database sense here.
Paul.
Marshall Spight - 06 Aug 2005 18:02 GMT > I've kind of lost track of what started this thread in the first place > now! I think it was just to say I didn't think there was any real > difference between equality and equivalence relations. Each one defines > the other. Equality is a particular type of equivalence relation. It is the kind where every value is its own equivalence class. Put another way, in equality, the equivalence classes all have cardinality 1.
(This is why I call it "simpler," but it's not a big deal.)
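The "cardinality 1" remark can be illustrated with a small Python sketch. The helper `equivalence_classes` and the congruence-mod-3 relation are the editor's illustrative choices, not part of the post; the helper assumes the relation passed in really is an equivalence relation.

```python
def equivalence_classes(elements, related):
    """Partition elements into classes under the equivalence relation
    `related`. Each element is compared against one representative
    (the first member) of every class found so far."""
    classes = []
    for element in elements:
        for cls in classes:
            if related(element, cls[0]):
                cls.append(element)
                break
        else:
            classes.append([element])
    return classes


values = list(range(6))

# Plain equality: every value is alone in its class.
by_equality = equivalence_classes(values, lambda a, b: a == b)

# A coarser equivalence relation: congruence modulo 3.
by_mod_3 = equivalence_classes(values, lambda a, b: a % 3 == b % 3)
```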
Marshall
Paul - 06 Aug 2005 20:30 GMT Marshall Spight wrote:
>>I've kind of lost track of what started this thread in the first place >>now! I think it was just to say I didn't think there was any real [quoted text clipped - 4 lines] > where every value is its own equivalence class. Put another way, > in equality, the equivalence classes all have cardinality 1. That's not how I interpret it. The way I see it, an equivalence relation *defines* what we mean by equality with respect to a given structure.
So for example you start with expressions of the form "x/y", with x and y integers (y!=0)
Now to begin with, "1/2" != "2/4"
But you create an equivalence relation as VC described, which is basically grouping certain integer pairs together to create a different structure. And you use this equivalence relation to *define* what you mean by "equality" on your new structure. So [1/2] = [2/4]. But conventionally you drop the square brackets indicating the equivalence class and write 1/2 = 2/4, which maybe confuses things though.
So for the rational numbers, you have equality but the corresponding equivalence classes on ZxZ *don't* have cardinality 1
Paul.
Marshall Spight - 06 Aug 2005 22:23 GMT > Marshall Spight wrote: > >>I've kind of lost track of what started this thread in the first place [quoted text clipped - 8 lines] > That's not how I interpret it. The way I see it, an equivalence relation > *defines* what we mean by equality with respect to a given structure. I suppose. We're not quite talking about the same thing, though. I'm talking about classes of values, and you're talking about classes of lexical representations of values. Or maybe you're talking about expressions; I'm not entirely certain.
(Does the string-of-symbols "one half" also belong in the equivalence class with "1/2" and "2/4"?)
I think it is more useful to think about 1/2 and 2/4 being the same value because of the semantics of division. Once you move into the world of values and out of the world of representations, things get a lot simpler.
Marshall
David Cressey - 06 Aug 2005 22:30 GMT > Marshall Spight wrote: > >>I've kind of lost track of what started this thread in the first place > >>now! I think it was just to say I didn't think there was any real > >>difference between equality and equivalence relations. Each one defines > >>the other. OK, let me jump in with what I think I was about with this thread in the first place.
A lot of other discussion hinges on the interaction between the relational engine and the type engine.
I think there are multiple layers of representation/interpretation in any system of representing meaning using symbols.
For certain equalities of the underlying things ("values" for some people), it's the type engine that knows when two tokens are representations for the same underlying thing. Thus, if we want to know whether 1/2 is really equal to 2/4 or not, we consult the appropriate type engine, in this case the rational type engine. If we want to know whether 123.45E1 is really equal to 12.345E2 we consult the floating point number type engine.
As far as I'm concerned "consulting the type engine" is another way of saying what VC said when he said we must put the items in context before we can test them for equality.
With regard to whether A is equal to B or not, we need to consult three engines:
First the variable typing engine to see if A and B are or are not of the same type. If we have static typing of variables, we can do this test at compile time.
Next, the variable state remembering engine to retrieve the current (in context) value of A and B.
Next the type engine determined by the common type of A and B to find out whether the values are really the same or not.
If we omit the last step, we will end up doing a naive test for equality. If there are any synonyms that the type engine knows about and we don't know about, then our test for equality will be naive.
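The three consultations can be sketched as follows. Everything here is a toy model made up for illustration: the dictionaries standing in for the "variable typing engine" and the "variable state remembering engine", the string representations, and the use of Python's `fractions.Fraction` as the rational type engine.

```python
from fractions import Fraction

# Engine 1 (assumed shape): which type each variable has.
VARIABLE_TYPES = {"A": "rational", "B": "rational", "C": "string"}

# Engine 2 (assumed shape): the current stored representation of each variable.
VARIABLE_STATE = {"A": "1/2", "B": "2/4", "C": "1/2"}

# Engine 3: one equality function per type.
TYPE_ENGINES = {
    "rational": lambda a, b: Fraction(a) == Fraction(b),
    "string": lambda a, b: a == b,
}


def naive_test(x: str, y: str) -> bool:
    """Skips the last step: compares the stored representations directly."""
    if VARIABLE_TYPES[x] != VARIABLE_TYPES[y]:
        return False
    return VARIABLE_STATE[x] == VARIABLE_STATE[y]


def engine_test(x: str, y: str) -> bool:
    """Consults all three engines in the order given in the post."""
    type_x, type_y = VARIABLE_TYPES[x], VARIABLE_TYPES[y]
    if type_x != type_y:
        raise TypeError(f"cannot compare {type_x} with {type_y}")
    a, b = VARIABLE_STATE[x], VARIABLE_STATE[y]
    return TYPE_ENGINES[type_x](a, b)
```

With these definitions the naive test misses the synonym ("1/2" versus "2/4") while the version that consults the type engine recognizes it.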
Here's where I'm going with this: in an SQL DBMS, where is the type engine for the type "SQL Table"? Or isn't there one?
dawn - 07 Aug 2005 04:54 GMT > > Marshall Spight wrote: > > >>I've kind of lost track of what started this thread in the first place [quoted text clipped - 41 lines] > Here's where I'm going with this: in an SQL DBMS, where is the type engine > for the type "SQL Table". Or isn't there one? I don't know the answer to this question directly, but I've been thinking about what I understood to be your question and have a couple of comments that might or might not be relevant.
If you are asking about equality between relations (not just header), you are including equality related to words and their referents, not only mathematical expressions. [I'm using the term "referents" here as used in semiotics rather than grammar or programming languages.] We have the concept of equality defined in mathematics, but we do not have the same in a language like English. Even with the concept of synonyms, we are not talking "equals".
If we were to stretch the meaning so that if two word values are close enough in meaning we call them equal, we would end up with some of the same issues that arise when attempting to model the entire language with mathematics. Too much interpretation, context, pragmatics (if I'm using that term correctly, again from semiotics) lies outside of what we capture or even can capture in the metadata. We would get into the "Time flies like an arrow" problems. Is that equal to "Time flies enjoy an arrow"?
We would also need to be able to determine if 'Jo Doe' was the same person as 'Jo Doe Jr' but the data entry person in the one case did not enter Jr as a suffix. Or that Pat DeJong in one table is the same person as Pat DeJong in another. They could even both have the same unique id (candidate key) value, doled out by two different systems for two different people, but the representation (string value) for each is the same. How would you have enough information to be certain these were the same people? You would need to have a unique identifier not just for a table, but for the entire human race (under some conditions you could require the exact same ssn, for example) and the system would have to know that. Think of the various algorithms that attempt to match two names as being the same in order to help de-dup data. They only provide assistance, nothing close to exact.
And then there is the fact that you and I both speak English, but have missed with each other more than once. Two people often cannot agree on what words are equal.
In case I'm not understanding your question (am I in the ballpark?), this response might be completely irrelevant in which case, never mind.
cheers! --dawn
VC - 08 Aug 2005 04:10 GMT >... The way I see it, an equivalence relation > *defines* what we mean by equality with respect to a given structure. [quoted text clipped - 10 lines] > conventionally you drop the square brackets indicating the equivalence > class and write 1/2 = 2/4, which maybe confuses things though. Right...
> So for the rational numbers, you have equality but the corresponding > equivalence classes on ZxZ *don't* have cardinality 1 Not quite right. In the case of rationals equality, you treat the equivalence class, as a whole, as a single element. E.g, for integers you'd say 2=2; for rationals you'd say [5/10] = [1/2], no difference really since both [5/10] and [1/2] are the *same* element. In other words, your *equality* relation pair would be, say, for integers (1,1) and for rationals (E_half, E_half), where E_half = {1/2, 2/4, 5/10, ...} etc.
> Paul.
David Cressey - 08 Aug 2005 11:37 GMT > Not quite right. In the case of rationals equality, you treat the > equivalence class, as a whole, as a single element. E.g, for integers > you'd say 2=2; for rationals you'd say [5/10] = [1/2], no difference > really since both [5/10] and [1/2] is the *same* element. In other words, > your *equality* relation pair would be, say, for integers (1,1) and for > rationals (E_half, E_half), where E_half = {1/2, 2/4,, 5/10, ..} etc. Right. The entire equivalence class is a single element as viewed by the rationals engine. In order to manipulate this "single element" as data, we need a symbol for it, to represent it.
So we choose one of the elements of the original set to stand as a representative of the entire set that is going to be seen as an element. In this case we might choose the rational with the lowest denominator, namely 1/2.
Now, whenever we are given an unnormalized rational, such as 5/10, we ask the rationals engine to normalize it for us. The rationals engine knows the rule for normalizing, namely remove common factors in the numerator and denominator. So it returns 1/2, the normalized equivalent of 5/10.
If we ask the rationals engine to normalize 1/2, it will give us back 1/2.
So the process of normalizing is choosing one, out of an equivalence class, according to some criterion, and using the symbol that represents the chosen element to act as the normalized form for the entire class.
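A minimal sketch of such a normalizing step, assuming rationals are stored as (numerator, denominator) pairs of integers. The sign convention (keep the denominator positive) is an added assumption, since the post only mentions removing common factors.

```python
from math import gcd


def normalize(numerator: int, denominator: int) -> tuple:
    """Pick the canonical representative of the equivalence class:
    lowest terms, with a positive denominator."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    common = gcd(numerator, denominator)
    numerator //= common
    denominator //= common
    if denominator < 0:
        numerator, denominator = -numerator, -denominator
    return (numerator, denominator)


def rational_equal(p: tuple, q: tuple) -> bool:
    """Once both sides are normalized, a plain comparison of the
    representations gives the right answer."""
    return normalize(*p) == normalize(*q)
```

Normalizing on the way in is one way to make a representation-level comparison safe: after it, each value has exactly one stored form.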
Marshall Spight - 08 Aug 2005 17:01 GMT > Right. The entire equivalence class is a single element as viewed by the > rationals engine. [quoted text clipped - 17 lines] > according to some criterion, and using the symbol that represents the > chosen element to act as the normalized form for the entire class. I don't see how this is a particularly useful way to look at the issue. It doesn't separate the idea of the lexical symbols we use to display and enter values, and the abstract values themselves.
The way I look at it is, when the compiler sees "5/10" it converts it into a value. The value it converts it into is the same value as when it sees "1/2" because the two are the same value.
I also don't see the benefit of talking about separate engines.
Marshall
David Cressey - 09 Aug 2005 06:52 GMT > > So the process of normalizing is choosing one, out of an equivalence class, > > according to some criterion, and using the symbol that represents the > > chosen element to act as the normalized form for the entire class.
> I don't see how this is a particularly useful way to look at > the issue. It doesn't separate the idea of the lexical symbols > we use to display and enter values, and the abstract values > themselves. The word "symbols" refers not only to the symbols used to exchange data between people and computers, but also to each of the data items inside the computer. In other words, what the computer stores is all symbolic, right down to the most atomic symbols, zero and one. Symbols can be made up of other symbols, strung together. Thus the symbol made up of 11000000 (starting from least significant bit) is a string of symbols that can represent the number three.
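The bit-string example can be checked with a couple of lines of Python. The helper names are made up for the illustration; the only thing taken from the post is the convention that the string is written starting from the least significant bit.

```python
def bits_lsb_first_to_int(bits: str) -> int:
    """Interpret a string of '0'/'1' whose first character is the
    least significant bit."""
    return sum(1 << position for position, bit in enumerate(bits) if bit == "1")


def int_to_bits_lsb_first(value: int, width: int) -> str:
    """Inverse of the above for non-negative values that fit in `width` bits."""
    return format(value, f"0{width}b")[::-1]


three = bits_lsb_first_to_int("11000000")
```

Read with the opposite convention (most significant bit first), the same eight symbols would stand for a different number, which is the homonym problem in miniature.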
When various "engines" (or "objects" if you prefer) inside a large system exchange data with each other (or "messages" if you prefer), they use symbols to communicate with each other.
> The way I look at it is, when the compiler sees "5/10" it converts > it into a value. The value it converts it into is the same value > as when it sees "1/2" because the two are the same value. That's when the compiler sees it. But the number "5/10" could be generated at run time as well. If such an expression is evaluated at run time, it will evaluate to "1/2" (or some bit pattern that is used to represent that number).
> I also don't see the benefit of talking about separate engines. Over the past six weeks, there has been much discussion about what the "relational engine knows" (or "knows about") and what the "type engine knows" (or "knows about"). The prevailing wisdom has been that the type engine understands equality (within its type), but the relational engine does not. The discussion in terms of separate engines proceeds from here.
I started this discussion, about the naive test for equality, based on the supposition that if you just compared two representations of a value, you could tell whether they are the same or not. The synonym and homonym problems are two classic problems that arise whenever you do that. They were referred to as "synonym problem" and "homonym problem", in the literature of about 30 years ago. I'm sorry, but my memory fails me when it comes to citing sources for this.
The two words, "synonym" and "homonym" are borrowed from the argot of natural linguistics, but the two problems arise whenever data is represented.
> Marshall vc - 09 Aug 2005 21:30 GMT > > > So the process of normalizing is choosing one, out of an equivalence > class, [quoted text clipped - 9 lines] > between people and computers, but also to each of the data items inside the > computer. So now we have, in addition to 'representation', a new word 'symbol'. What is even worse, in your vocabulary, it means two different things. Nice..
>In other words, what the computer stores is all symbolic, right > down to the most atomic symbols, zero and one. This is not true. What the computer uses to store numbers (and characters) is called bits, not symbols. Besides, the way the computer implements numbers and characters is entirely irrelevant at the logical level.
> When various "engines" (or "objects" if you prefer) inside a large system > exchange data with each other (or "messages" if you prefer), they use > symbols to communicate with each other. This phrase is so ambiguous as to be almost devoid of meaning. What are "engines" and how do they "exchange data" ? What precisely do you mean ? Hardware components ? Abstract structures communicating using some protocol ? Or something else ?
>... > But the number "5/10" could be generated > at run time as well. If such an expression is evaluated at run time, it > will evaluate to "1/2" (or some bit pattern that is used to represent that > number). Whatever bit pattern is used to implement a number is irrelevant at the logical model level.
> I started this discussion, about the naive test for equality, based on the > supposition that if you just compared two representations of a value, you [quoted text clipped - 7 lines] > natural linguistics, but the two problems arise whenever data is > represented. In modelling, "synonym/homonym problems" are problems only when they are self-induced.
> > Marshall mAsterdam - 09 Aug 2005 22:13 GMT [snip]
>>The two words, "synonym" and "homonym" are borrowed from the argot of >>natural linguistics, but the two problems arise whenever data is >>represented. > > In modelling, "synonym/homonym problems" are problems only when they > are self-induced. What do you mean by that? I've done quite some practical modelling with teams. I never experienced the problem not coming up. Synonyms and homonyms had to be dealt with (and we didn't always do so successfully). I'd appreciate any hints to recognize them as early as possible.
VC - 09 Aug 2005 23:03 GMT > [snip] >>>The two words, "synonym" and "homonym" are borrowed from the argot of [quoted text clipped - 6 lines] > What do you mean by that? I've done quite some practical modelling > with teams. I never experienced the problem not coming up. For example ?
>Synonyms > and homonyms had to be dealt with (and we didn't always do so successfully). > I'd appreciate any hints to recognize them as early as possible. mAsterdam - 10 Aug 2005 18:46 GMT >>[snip] >>>>The two words, "synonym" and "homonym" are borrowed from the argot of [quoted text clipped - 8 lines] > > For example ? What do you mean by "self-induced"?
VC - 10 Aug 2005 22:09 GMT >>>[snip] >>>>>The two words, "synonym" and "homonym" are borrowed from the argot of [quoted text clipped - 10 lines] > > What do you mean by "self-induced"? Self-inflicted (synonym)
mAsterdam - 10 Aug 2005 22:45 GMT >>>>>>...The two words, "synonym" and "homonym" are borrowed from >>>>>>the argot of natural linguistics, but the two problems arise [quoted text clipped - 11 lines] > > Self-inflicted (synonym) The "Self" being the modeller, right? When modelling is done by teams there are more selves. Any two people even when working together closely for years have different associations and connotations with some words some time.
Another, less cryptic example:
Say a team tries to meet the requirement that it should be possible to find out where a piece of information came from.
One thinks 'origin', another one thinks 'source'. (1)
Let's say they talk about it and decide on 'source'.
One thinks 'the source code of a program' because yesterday he spent some time finding a source-file, another one thinks 'the external agent providing the piece of information' because he just finished a business process analysis session. (2)
Both the synonym-problem (1) and the homonym-problem (2) may very well be recognized and resolved, of course. Or not. Or too late.
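One way to surface both problems early is to record who means what by each term and check the list mechanically. The sketch below is hypothetical: the `usages` list, its wording, and the helper names `find_synonyms` and `find_homonyms` are assumptions made for illustration, not a method anyone in the thread proposed.

```python
from collections import defaultdict

# Hypothetical team glossary: (term as used, meaning the speaker intends).
usages = [
    ("origin", "external agent providing the information"),
    ("source", "external agent providing the information"),
    ("source", "source code of a program"),
]

def find_synonyms(entries):
    """Meanings that are referred to by more than one term (problem 1)."""
    terms_by_meaning = defaultdict(set)
    for term, meaning in entries:
        terms_by_meaning[meaning].add(term)
    return {m: t for m, t in terms_by_meaning.items() if len(t) > 1}

def find_homonyms(entries):
    """Terms that are used with more than one meaning (problem 2)."""
    meanings_by_term = defaultdict(set)
    for term, meaning in entries:
        meanings_by_term[term].add(meaning)
    return {t: m for t, m in meanings_by_term.items() if len(m) > 1}

print(find_synonyms(usages))  # one meaning, two terms: 'origin' and 'source'
print(find_homonyms(usages))  # one term, 'source', with two meanings
```

The check only works if the intended meanings are written down in the first place, which is where the real effort goes.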
VC - 11 Aug 2005 01:55 GMT >>>>>>>...The two words, "synonym" and "homonym" are borrowed from >>>>>>>the argot of natural linguistics, but the two problems arise [quoted text clipped - 17 lines] > years have different associations and connotations > with some words some time. Presumably the team has meetings at which they discuss the stuff they are interested in and come to some agreement as to what terminology they want to use and what the terms are supposed to mean. It's, like, introduction to modelling 101. Besides, you describe a hypothetical terminology selection/definition process yourself, so it's not clear what the problem might be unless the "team" neglects to identify, say, data objects and relationships [self-inflicts potential pain by not doing the required work].
> Another, less cryptic example: > [quoted text clipped - 10 lines] > piece of information' because he just finished > a business process analysis session. (2) You are kidding, right ? If the modellers chose the name/label "source" and did not define what entity the name refers to, then the name is just meaningless, like say "fshsalkfd". Apparently, your hypothetical modellers are not modellers but some kind of impostors.
> Both the synonym-problem (1) and the homonym-problem (2) may > very well be recognized and resolved, of course. > Or not. Or too late. As I wrote before, data modelling is not a work of [literary] fiction where one needs to bother with stuff like synonyms, homonyms, metaphors, metonymy and what not. Just identify the entities, invent (or use commonly accepted) names for them and you'll be a happy camper without any need to hide behind high-faluting nonsense like "synonym problem", "conceptual object type" or some such.
Cheers.
dawn - 11 Aug 2005 04:48 GMT > >>>>>>>...The two words, "synonym" and "homonym" are borrowed from > >>>>>>>the argot of natural linguistics, but the two problems arise [quoted text clipped - 17 lines] > > years have different associations and connotations > > with some words some time. Definitely.
> Presumably the team has meetings at which they discuss the stuff they > interested in and come to some agreement as to what terminology they want [quoted text clipped - 24 lines] > meaningless, like say "fshsalkfd". Apparently, your hypothetical modellers > are not modellers but some kind of impostors. It is usually much more subtle than that. Everyone agrees that we need to know whether or not someone is a fullTimeStudent. Ignore the fact that this would likely be a derived attribute -- it illustrates the problem. After some sessions with folks from many departments, the analyst works to get more precision and sits down with someone who knows all of the tuition rules, along with another person ('cause the analyst is no rookie) and they nail down this attribute with the precision of a surgeon.
The system goes live and the financial aid people are irate! Federal aid has just been removed from students because they were no longer flagged as being a fullTimeStudent when by the standards for this financial aid, they clearly ARE a fullTimeStudent.
Then you find out that these two departments use the very same term and might even both have external reasons to use the very same term, and they use it with just slightly different meanings.
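The situation can be sketched as two predicates that share a name in conversation but not a definition. Everything specific here is assumed for illustration: the credit thresholds, the `Enrollment` type and the function names are hypothetical, and the real rules would come from each department.

```python
from dataclasses import dataclass

# Assumed thresholds, for illustration only.
TUITION_FULL_TIME_CREDITS = 12
FINANCIAL_AID_FULL_TIME_CREDITS = 9

@dataclass(frozen=True)
class Enrollment:
    student_id: str
    credits: int

def is_full_time_for_tuition(e: Enrollment) -> bool:
    """'fullTimeStudent' as the tuition rules would define it (assumed rule)."""
    return e.credits >= TUITION_FULL_TIME_CREDITS

def is_full_time_for_financial_aid(e: Enrollment) -> bool:
    """'fullTimeStudent' as the financial aid rules would define it (assumed rule)."""
    return e.credits >= FINANCIAL_AID_FULL_TIME_CREDITS

e = Enrollment("s1", 9)
print(is_full_time_for_tuition(e))        # False under the assumed tuition rule
print(is_full_time_for_financial_aid(e))  # True under the assumed aid rule
```

Storing a single fullTimeStudent flag forces one of the two departments to live with the other's definition; naming each predicate after its purpose keeps the homonym visible.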
It does help if there is a well-maintained and easily used catalog / dictionary / metadata repository. But words are just that. --dawn
VC - 11 Aug 2005 12:02 GMT ....
>> You are kidding, right ? If the modellers chose the name/label "source" >> and [quoted text clipped - 20 lines] > might even both have external reasons to use the very same term, and > they use it with just slightly different meanings. Apparently, the analysts made a mistake in assuming that the set of fullTimeStudents is equal to the set of studentsEligibleForFinancialAid. I did not claim that one can correctly analyze a complex system at one go; it's an iterative process of trial and error. Besides, your example is *not* about naming issues (as you understand yourself) -- presumably there was no ambiguity about the "student" entity.
> It does help if there is a well-maintained and easily used catalog / > dictionary / metadata repository. But words are just that. --dawn dawn - 11 Aug 2005 15:51 GMT > .... > > [quoted text clipped - 25 lines] > Apparently, the analysts made a mistake in assuming that the set of > fullTimeStudents is equal to the set of studentsEligibleForFinancialAid. In this case, yes, but it also happens frequently where such a term is used the same when the analysis is done, but something changes (government regulation or something more subtle) that changes the meaning slightly for one group and not another, so that these differences creep in.
> I > did not claim that one can correctly analyze a complex system at one go, > it's an iterative process of trial and error. and needs to be attended to for the life of the attribute name.
> Besides, your example is > *not* about naming issues (as you understand yourself) I thought it was about the name and def of an attribute.
> -- presumably there > was no ambiguity about the "student" entity . There are always differences of opinion about what constitutes a student on a campus. Finance people often use the term as if the student were the same as a corporate customer. Student = Customer. If someone has received some approval to audit a course for zero dollars, the instructor might consider them a student. That is just an example, but the point is that entity names are also just words and are interpreted by humans, each of whom brings a different context to the meaning of the word.
I've been reading and writing too fast lately and might have missed the point, so I'll re-read the thread before posting again. cheers! --dawn
Gene Wirchenko - 11 Aug 2005 17:58 GMT [snip]
>> > It is usually much more subtle than that. Everyone agrees that we need >> > to know whether or not someone is a fullTimeStudent. Ignore the fact [quoted text clipped - 13 lines] >> > might even both have external reasons to use the very same term, and >> > they use it with just slightly different meanings.
>> Apparently, the analysts made a mistake in assuming that the set of >> fullTimeStudents is equal to the set of studentsEligibleForFinancialAid. [quoted text clipped - 4 lines] >meaning slightly for one group and not another, so that these >differences creep in. In British Columbia (and presumably Canada since I have seen federal use of this meaning), you can be a full-time student by taking three three-credit courses in a semester. The usual full course load is five. This is not the commonsense definition, but it is the definition used.
[snip]
>There are always differences of opinion about what constitutes a >student on a campus. Finance people often use the term as if the [quoted text clipped - 4 lines] >interpreted by humans, each of whom brings a different context to the >meaning of the word. Such a student is a student by the normal use of the term. I think this factor is what causes a lot of the trouble.
At my alma mater, there are three major classifications: student, faculty, and staff. They are not mutually exclusive. I have known of faculty who were students and staff who were faculty. There is nothing stopping a staff member from taking a course (making him also a student) or for someone to be in all three categories at the same time.
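Classifications that are not mutually exclusive are easy to get wrong if they are modelled as a single-valued attribute. A minimal sketch, assuming Python's standard `enum.Flag` and a made-up `Role` type, of one way to let a person hold several statuses at once:

```python
from enum import Flag, auto

class Role(Flag):
    """Campus classifications that may be combined (hypothetical type)."""
    STUDENT = auto()
    FACULTY = auto()
    STAFF = auto()

# A staff member who is also taking a course.
person = Role.STAFF | Role.STUDENT

print(Role.STUDENT in person)  # True
print(Role.FACULTY in person)  # False

# Nothing prevents someone from being in all three categories at once.
everything = Role.STUDENT | Role.FACULTY | Role.STAFF
print(Role.FACULTY in everything)  # True
```

In a relational design the same idea would more likely be a separate relation of (person, role) pairs rather than a bit field; the flag type is used here only because it is compact.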
I am a resident of the U.S.A. I am not a U.S. citizen. If someone conflates the two, there could be a problem.
>I've been reading and writing too fast lately and might have missed the >point, so I'll re-read the thread before posting again. >cheers! --dawn I think you are doing fine.
Sincerely,
Gene Wirchenko
mAsterdam - 11 Aug 2005 19:14 GMT > [snip] >>>>...two departments use the very [quoted text clipped - 12 lines] > is five. This is not the commonsense definition, but it is the > definition used. Used by all? Or only by non-commonsensical people? I'm overstating here surely, but I want to point out that this definition is for a purpose. People/business/departments who support this purpose will tend to use it - and check some register or student card to verify whether somebody who claims to be a student actually is. Others will simply ask: are you a student? (e.g. for downloading some software) and accept the answer as truth.
>>There are always differences of opinion about what constitutes a >>student on a campus. Finance people often use the term as if the [quoted text clipped - 7 lines] > Such a student is a student by the normal use of the term. > I think this factor is what causes a lot of the trouble. Could you elaborate some on this factor?
> At my alma mater, there are three major classifications: student, > faculty, and staff. They are not mutually exclusive. I have known of > faculty who were students and staff who were faculty. There is > nothing stopping a staff member from taking a course (making him also > a student) or for someone to be in all three categories at the same > time. Let's not draw subtyping into it at this point. (Other thread welcome :-)
Gene Wirchenko - 11 Aug 2005 19:28 GMT >> [snip] >>>>>...two departments use the very [quoted text clipped - 14 lines] > >Used by all? Or only by non-commonsensical people? By the common definition used at such institutions. This is probably informed by the fact that these definitions are used by the provincial and federal governments for student loans and on tax returns.
Others are free to use a more literal meaning.
>I'm overstating here surely, but I want to point out >that this definition is for a purpose. [quoted text clipped - 4 lines] >(e.g. for downloading some software) and accept the >answer as truth. There is no argument from me on that.
>>>There are always differences of opinion about what constitutes a >>>student on a campus. Finance people often use the term as if the [quoted text clipped - 9 lines] > >Could you elaborate some on this factor? One who studies. If I study medieval history, I am a student. I might not be enrolled anywhere. I could even be a leading authority in the field.
>> At my alma mater, there are three major classifications: student, >> faculty, and staff. They are not mutually exclusive. I have known of [quoted text clipped - 5 lines] >Let's not draw subtyping into it at this point. >(Other thread welcome :-) I am not subtyping, just saying that the statuses are not mutually exclusive.
Sincerely,
Gene Wirchenko
mAsterdam - 11 Aug 2005 19:38 GMT [snip agreement]
>>>>There are always differences of opinion about what constitutes a >>>>student on a campus. Finance people often use the term as if the [quoted text clipped - 13 lines] > might not be enrolled anywhere. I could even be a leading authority > in the field. I see what you mean, but I am not sure you got my question right. I meant: what is this factor which is causing a lot of trouble? In more modern words: what is the anatomy of this anti-pattern? We might learn to more easily recognize it.
>>> At my alma mater, there are three major classifications: student, >>>faculty, and staff. They are not mutually exclusive. I have known of [quoted text clipped - 8 lines] > I am not subtyping, just saying that the statuses are not > mutually exclusive. Ok.
Gene Wirchenko - 11 Aug 2005 20:09 GMT >[snip agreement] > [quoted text clipped - 20 lines] >In more modern words: what is the anatomy of this anti-pattern? >We might learn to more easily recognize it. I think that the trouble comes from overloading terms. "student" already has a meaning. What distinguishes the special meaning from the more literal meaning? If I do not know that a special meaning is in use in a specific context, I can make a lot of mistakes.
I coin terms for our in-house client billing system. Two examples are "Work Function Code" and "Work Classification Code". These terms have precise meanings. It is possible for someone to misinterpret these, but I think that they are sufficiently unusual usage that most would ask what they mean instead of assuming as with "student".
One area of confusion we have is because of overloading. We use "client" to mean someone who buys from us (mainly services, for example order fulfillment) and "customer" to refer to someone who buys from one of our clients. Some of our employees do not make the proper (for us) distinction.
[snip]
Sincerely,
Gene Wirchenko
mAsterdam - 12 Aug 2005 00:50 GMT >>[snip agreement] >>>>>>There are always differences of opinion about what constitutes a [quoted text clipped - 24 lines] > the more literal meaning? If I do not know that a special meaning is > in use in a specific context, I can make a lot of mistakes. Yep. One trick is not (just) to ask whether someone is a student or not, but to ask for details about the student's registration (whether they are really checked is an issue, depending on other, maybe later requirements - first make them checkable).
> I coin terms for our in-house client billing system. Two > examples are "Work Function Code" and "Work Classification Code". > These terms have precise meanings. It is possible for someone to > misinterpret these, but I think that they are sufficiently unusual > usage that most would ask what they mean instead of assuming as with > "student". Arrrgh - feast of recognition. Not. :-| I'm used to similar systems. Some departments take these distinctions very seriously, others only pay lip service. Once in a while new managers want to know what's going on and suddenly all kinds of conclusions hit the surface, drawn from the highly polluted data.
> One area of confusion we have is because of overloading. We use > "client" to mean someone who buys from us (mainly services, for > example order fulfillment) and "customer" to refer to someone who buys > from one of our clients. Some of our employees do not make the proper > (for us) distinction. You know what CICS stands for?
paul c - 12 Aug 2005 03:28 GMT >>> [snip agreement] >>> [quoted text clipped - 36 lines] > requirements - first make them checkable). > ... i may be stepping on nuances that i haven't noticed in this thread, but i think the above is getting close to the truth, at least the truth these days. so far, databases ARE naive and so are their "tests". for example, if a user thinks someone is a student and can "fill in" the values that the db predicates want for a student, then the someone is a student as far as the db is concerned, no matter what anyone else thinks.
pc
Gene Wirchenko - 12 Aug 2005 18:52 GMT [snip]
>> One area of confusion we have is because of overloading. We use >> "client" to mean someone who buys from us (mainly services, for [quoted text clipped - 3 lines] > >You know what CICS stands for? I do not presume to know what an acronym stands for when expressed out of context. http://www.acronyms.ch/ says "Customer Information Control System", but this is not a definition, but merely an expansion, and I do not know if it is the one that you are thinking of.
"IDE" can mean "Integrated Development Environment" or "Integrated Device Electronics", and formerly meant "Integrated Drive Electronics". As a software sort who know some hardware, I can easily use either.
Sincerely,
Gene Wirchenko
mAsterdam - 13 Aug 2005 01:28 GMT > [snip] >>> One area of confusion we have is because of overloading. We use [quoted text clipped - 10 lines] > an expansion, and I do not know if it is the one that you are thinking > of. Oops. I was told it originally meant "Customer Information and Client Support". The only piece ever really built was the TP-monitor - but the acronym stuck as the name. I liked the story. I never checked it though, and now I can't find any source for it. Sorry.
> "IDE" can mean "Integrated Development Environment" or > "Integrated Device Electronics", and formerly meant "Integrated Drive > Electronics". As a software sort who know some hardware, I can easily > use either. David Cressey - 13 Aug 2005 16:30 GMT "Gene Wirchenko"
Then there's PCMCIA, which expands to "People Can't Memorize Computer Industry Acronyms"
Marshall Spight - 13 Aug 2005 18:43 GMT > Then there's PCMCIA, which expands to "People Can't Memorize Computer > Industry Acronyms" My favorite is still TWAIN: "Technology Without An Important Name."
Marshall
Gene Wirchenko - 15 Aug 2005 23:00 GMT >> Then there's PCMCIA, which expands to "People Can't Memorize Computer >> Industry Acronyms" Actually, "Personal Computer Memory Card International Association", but the smart expansion is insidious as it is much more mnemonic.
>My favorite is still TWAIN: "Technology Without An Important Name." ^^^^^^^^^ "Interesting"?
I like "SCSI" for the pronunciation.
Sincerely,
Gene Wirchenko
vc - 11 Aug 2005 18:38 GMT > > .... > > > [quoted text clipped - 31 lines] > meaning slightly for one group and not another, so that these > differences creep in. That's life. If the analyst was unable to anticipate some changes, then they can be introduced later. It's called schema evolution.
> > I > > did not claim that one can correctly analyze a complex system at one go, [quoted text clipped - 6 lines] > > I thought it was about the name and def of an attribute. It most certainly was not. It was about an incorrectly specified predicate defining the set of students eligible for financial aid. What attribute (and its name) did you have in mind ?
> > -- presumably there > > was no ambiguity about the "student" entity . [quoted text clipped - 4 lines] > someone has received some approval to audit a course for zero dollars, > the instructor might consider them a student. In this case, during the analysis stage, one could have identified a more generic entity, say, Person with entity subtypes of Student and Customer.
> That is just an example, > but the point is that entity names are also just words and are > interpreted by humans, each of whom brings a different context to the > meaning of the word. You are quite right that names are just words without any specific meaning [in the modelling context]. That's why it's necessary to identify the actual entities (attributes, relations, etc) first and then give them names. That's what, among other things, modelling is about, no ?
> I've been reading and writing too fast lately and might have missed the > point, so I'll re-read the thread before posting again. > cheers! --dawn dawn - 15 Aug 2005 01:12 GMT > > > "dawn" <dawnwolthuis@gmail.com> wrote in message <snip>
> You are quite right that names are just words without any specific > meaning [in the modelling context]. That's why it's necessary to > identify the actual entities (attributes, relations, etc) first and > then give them names. That's what, among other things, modelling is > about, no ? Yup, I'm with you -- I thought that we were discussing the other direction: interpretation. The database has values and metadata, perhaps for years, and then there is disagreement on the interpretation. That is exceedingly common. I think your point was that if there is such misinterpretation during the analysis/design phases, then the work isn't yet done and I agree. Cheers! --dawn
mAsterdam - 11 Aug 2005 18:58 GMT >>>>...If the modellers chose the >>>>name/label "source" and did not define what [quoted text clipped - 23 lines] >>Apparently, the analysts made a mistake in assuming that the set of >>fullTimeStudents is equal to the set of studentsEligibleForFinancialAid. This assumes perfect and lasting information at modelling time.
> In this case, yes, but it also happens frequently where such a term is > used the same when the analysis is done, but something changes [quoted text clipped - 28 lines] > point, so I'll re-read the thread before posting again. > cheers! --dawn This is a sub-thread about synonym/homonym problems, but this group does not tend to change the subject line appropriately (I tried a few times, but it did not really work). In the sub-thread your contribution is right on the mark, IMO.
mAsterdam - 11 Aug 2005 18:13 GMT >>>>>>>>...The two words, "synonym" and "homonym" are borrowed from >>>>>>>>the argot of natural linguistics, but the two problems arise [quoted text clipped - 22 lines] > terminology they want to use and what the terms are > supposed to mean. Let me get this straight. In your methodology the terminology is ready (agreed upon) before the modelling starts?
> It's, like, introduction to modelling 101. What is 101?
> Besides, you describe a hypothetical terminology > selection/definition process yourself, so it's not clear what the problem > might be unless the "team" neglects to identify, say, data objects and > relationships [self-inflicts potential pain because of not doing required > work]. How do you propose to identify data objects and relationships from a requirement "It should be possible to find out where a piece of information came from."?
>>Another, less cryptic example: >> [quoted text clipped - 12 lines] > > You are kidding, right ? No.
> If the modellers chose the name/label "source" and > did not define what entity the name refers to, > then the name is just meaningless, like say "fshsalkfd". > Apparently, your hypothetical modellers > are not modellers but some kind of impostors. Please take the drivers' seat. Show us the real thing. Pretend you are modelling the data to meet the requirement. Feel free to ask relevant questions/check assumptions about it.
>>Both the synonym-problem (1) and the homonym-problem (2) may >>very well be recognized and resolved, of course. [quoted text clipped - 3 lines] > [literary] fiction where one needs to bother with stuff like > synonyms, homonyms, metaphors, metonymy and what not. If you don't bother with that "stuff" your work will be exactly that: a work of fiction.
> Just identify the entities, invent (or use commonly > accepted ) names for them and you'll be a happy > camper without any need to hide behind high-faluting > nonsense like "synonym problem", "conceptual > object type" or some such. - I am not hiding at all.
- Please refrain from attributing terms to me I did not use.
vc - 11 Aug 2005 20:01 GMT > > Presumably the team has meetings at which they discuss the > > stuff they are interested in and come to some agreement as to what [quoted text clipped - 3 lines] > Let me get this straight. In your methodology the terminology > is ready (agreed upon) before the modelling starts? That's not what I said. I wrote 'they discuss the stuff they are interested in and come to an agreement as to what terminology they want to use'. It can be rephrased as:
Step one: discover the items/entities of interest; Step two: give the entities names (or use the existing names if it makes sense)
> > It's, like, introduction to modelling 101. > > What is 101? '101' is a synonym of 'introductory'.
> > Besides, you describe a hypothetical terminology > > selection/definition process yourself, so it's not clear what the problem [quoted text clipped - 6 lines] > possible to find out where a piece > of information came from."? Presumably 'a piece of information' can come from a newspaper article, gossip, bank statement, or any other entity capable of producing such a piece of information. What's the problem with identifying the information source ? If the requirement is literally as general as you wrote, then it needs to be made more specific in order to be realistically implemented.
> >>Another, less cryptic example: > >> [quoted text clipped - 24 lines] > Pretend you are modelling the data to meet the requirement. > Feel free to ask relevant questions/check assumptions about it. I do not want to 'take the driver's seat' whatever it means. *You* claimed that there is a 'homonym/synonym problem' with respect to naming entities and attributes; therefore, the burden of proof showing that such a problem does in fact exist is squarely on *your* shoulders. So far, you failed to provide the proof: your example either can be handled by conventional entity-relationship methods, or is under-specified to the extent of making almost no sense.
> >>Both the synonym-problem (1) and the homonym-problem (2) may > >>very well be recognized and resolved, of course. [quoted text clipped - 6 lines] > If you don't bother with that "stuff" your work will be > exactly that: a work of fiction. If you say so.
> > Just identify the entities, invent (or use commonly > > accepted ) names for them and you'll be a happy [quoted text clipped - 5 lines] > > - Please refrain from attributing terms to me I did not use. "You" can be used, in English, as an impersonal pronoun referring to anyone out there engaged in data modelling. It was used in this sense - nothing personal.
Cheers.
mAsterdam - 11 Aug 2005 20:52 GMT >>>Presumably the team has meetings at which they discuss the >>>stuff they are interested in and come to some agreement as to what [quoted text clipped - 11 lines] > Step two: give the entities names (or use the existing names if it > makes sense) And they wouldn't make sense if they were syn/homonyms?
>>>It's, like, introduction to modelling 101. >> >>What is 101? > > '101' is a synonym of 'introductory'. Thx.
>>>Besides, you describe a hypothetical terminology >>>selection/definition process yourself, so it's not clear what the problem [quoted text clipped - 13 lines] > wrote, then it needs to be made more specific in order to be > realistically implemented. Of course - so what are the questions to ask to get the specifics of the requirement? Extrapolating from your assumption (newspaper article, etc) you'd ask about /which/ pieces of information you want to know where it came from. Another one would be: what is it you want to know about where it came from? Relevant, no doubt - but not addressing syn/homonym problem. But that makes sense, since you also stated that it shouldn't occur (I take it that 'self-induced' implies 'mistake' here - please correct me if I misinterpreted).
>>>>Another, less cryptic example: >>>> [quoted text clipped - 32 lines] > handled by conventional entity-relationship methods, or is > under-specified to the extent of making almost no sense. "proof" (whatever that word means in this context) is too much to ask.
In modelling sessions I recognize that people spend a lot of time to get the meaning of terms right for use in their model (IMO this is time well spent, especially for large systems). Some of the effort revolves around homonyms and synonyms. From what I read you explain this effort as (just) correcting mistakes. I think there is more about this that can be dealt with in a systemic way - but only (this is just a hunch) if we appreciate that this is inherent to team modelling.
>>>>Both the synonym-problem (1) and the homonym-problem (2) may >>>>very well be recognized and resolved, of course. [quoted text clipped - 22 lines] > anyone out there engaged in data modelling. It was used in this sense > - nothing personal. Not only in English. I don't like to be put on one floor with high-faluting nonsense.
VC - 12 Aug 2005 03:19 GMT >>>>Presumably the team has meetings at which they discuss the >>>>stuff they are interested in and come to some agreement as to what [quoted text clipped - 13 lines] > > And they wouldn't make sense if they were syn/homonyms? Of course not. Why would one want to use the same name for two different entities [self-inflicted pain] ? If imagination is lacking, and one prefers to call an entity a Thing, one can use at least Thing1, Thing2, ... ad infinitum, if needed, in order to avoid the homonym problem. Synonyms are even easier: just use one, not two or more, names for the same entity and you should be all set.
>>>How do you propose to identify data objects and >>>relationships from a requirement "It should be [quoted text clipped - 15 lines] > Another one would be: what is it you want to know about > where it came from? Well, the attributes one wants to model surely depend on what the customer wants to do with them, sort of obvious, no ? Why not ask the customer directly about that ? E.g. with respect to a newspaper source something like the publisher, circulation, font, whatever, it really depends on what the customer wants to know.
> Relevant, no doubt - but not addressing syn/homonym problem. > But that makes sense, since you also stated that it > shouldn't occur (I take it that 'self-induced' > implies 'mistake' here - please correct me if I > misinterpreted). Right. I consider mistakes in trying to identify entities and their set of attributes (as well as relationships between entities), a more serious problem than largely overblown issues with synonyms and homonyms which are easy to avoid.
> In modelling sessions I recognize that > people spend a lot of time to get the meaning of terms > right for use in their model (IMO this is time well spent, > especially for large systems). The term meaning is the entity/attribute it names. When divorced from the object it names, the term is just a meaningless string of characters. No doubt, naming conventions are very important to make the understanding of a data model easier for humans (by analogy, association, etc), however, terminology is secondary in importance to entity/attribute/relationship discovery process.
> Some of the effort revolves > around homonyms and synonyms. From what I read you explain > this effort as (just) correcting mistakes. I think there > is more about this that can be delt with in a systemic way - but > only (this is just a hunch) if we appreciate that this is > inherent to team modelling. See above.
Gene Wirchenko - 12 Aug 2005 18:52 GMT [snip]
>Of course not. Why would one want to use the same name for two different >entities [self-inflicted pain]? If imagination is lacking, and one prefers It can happen when two different points-of-view intersect. Realising that two apparently different entities are actually the same can be tricky, especially when they appear at first to be different.
On the other side, realising that you are dealing not with identities but distinct things -- particularly when the entities are similar -- can also be tricky.
>to call an entity a Thing, one can at least use Thing1, Thing2, ... ad >infinitum, if needed, in order to avoid the homonym problem. Synonyms >are even easier: just use one, not two or more, names for the same entity >and you should be all set. If you realise that you have such a situation.
[snip]
Sincerely,
Gene Wirchenko
VC - 12 Aug 2005 21:18 GMT > [snip] > [quoted text clipped - 5 lines] > Realising that two apparently different entities are actually the same > can be tricky, especially when they appear at first to be different. Right. Presumably the real world object would be modelled by two entities with different sets of attributes (otherwise it would be easy to recognize that two, or more, entities represent the same object). Therefore, from the point of view of the model itself, there are two different entities with two different names, thus the dreaded "synonym problem" simply cannot occur.
> On the other side, realising that you are dealing not with > identities but distinct things -- particularly when the entities are > similar -- can also be tricky. Right. That's why modelling is a trial-and-error process.
>>to call an entity a Thing, one can at least use Thing1, Thing2, ... ad >>infinitum, if needed, in order to avoid the homonym problem. Synonyms [quoted text clipped - 3 lines] > > If you realise that you have such a situation. Even if you don't, the error is not a naming problem (synonym/homonym and such) but rather a mistake in correctly identifying the real world objects and their attributes of interest.
> [snip] > > Sincerely, > > Gene Wirchenko
Jonathan Leffler - 13 Aug 2005 08:14 GMT >> It's, like, introduction to modelling 101. > What is 101? In the USA, the first course in a given subject seems to be 'Subject 101'; subsequent courses in the same subject get larger numbers (102, 201, dunno what the sequence normally is, and it likely varies between institutions anyway). I'm not clear whether this applies in regular schools (K-12 - meaning kindergarten to grade 12, or ages 5-18) or whether it really only applies to university courses. (And, just to add to the confusion, when they ask you where you went to school, Americans most often mean where did you go to university. Isn't it fun sharing a common language!)
So 'Modelling 101' is a basic course in 'Modelling'.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/
Marshall Spight - 13 Aug 2005 16:11 GMT > > > It's, like, introduction to modelling 101. > > [quoted text clipped - 6 lines] > schools (K-12 - meaning kindergarten to grade 12, or ages 5-18) or > whether it really only applies to university courses. It only applies to college/university.
As an aside, also in the USA, we call it "college" even if it's a university. We only preserve the "college/university" distinction in the names of the institutions. You could say "he went to college at Harvard" just as well as you could say "he went to college at Snakewater Community College." I understand the distinction is important in other dialects of English, but it's not made in the USA. Anti-classism, maybe? (Just a wild speculation; no flames please.) I've never heard anyone say "where did you go to university?" who was a native of the USA.
> (And, just to add > to the confusion, when they ask you where you went to school, Americans > most often mean where did you go to university. Isn't it fun sharing a > common language!) Just so!
I also note that 101 is an important freeway in California; it runs from San Francisco to San Jose (and on to less interesting places like Los Angeles, ha ha) which means it's the primary artery for Silicon Valley. 101 is a section of my commute, and has been for most of my adult life.
Marshall
mAsterdam - 16 Aug 2005 00:17 GMT >>> It's, like, introduction to modelling 101. >> [quoted text clipped - 11 lines] > > So 'Modelling 101' is a basic course in 'Modelling'. Thank you :-)
The modellers I was talking about were real people, well educated (most of them qualified to teach way beyond modelling 101) and well behaved.
Frank_Hamersley - 16 Aug 2005 02:47 GMT > Jonathan Leffler wrote: > >>> It's, like, introduction to modelling 101. [quoted text clipped - 11 lines] > > > > So 'Modelling 101' is a basic course in 'Modelling'. <OT continued>
In my neck of the woods the first digit prescribed the undergraduate year number and the remaining digits were used to identify sub-courses. We rarely used '01' - but any zeros usually correlated with more stature/difficulty. For instance Chem 100 was followed by Chem 200 and finally Chem 300 if pursuing a major in Chem for a B.Sc. Chem 210 (Organic) and 220 (Inorganic) were implied by enrolling for Chem 200. Often course numbers like Biometrics 221 and 222 were single semester/term subjects on a narrow topic.
So Modelling 101 (IMO) barely qualifies you to do anything - in fact it prolly increases project risk significantly if a so-called practitioner gets into the workforce on the strength of it! ;-) Cheers Frank.
paul c - 10 Aug 2005 01:16 GMT > [snip] > [quoted text clipped - 8 lines] > with teams. I never experienced the problem not coming up. > ... maybe you were just lucky. just kidding!
cheers, pc