Database Forum / General DB Topics / DB Theory / August 2005

The naive test for equality

David  Cressey - 31 Jul 2005 19:16 GMT
There's been discussion about whether the relational engine does or does not
understand equality.  I'd like to suggest that the relational engine does
understand the "what" of equality testing,  but is sometimes naive about the
"how".  Before I go any further with this,  let me first describe a naive
test for equality.

Here's how it works:

Every value has a type, and is represented by a bit string.

If values A and B are to be tested for equality,  the first step is to see
if the type of A and the type of B are the same.  If not,  then an error
flag is raised, and the test returns FALSE.

If A and B are of the same type,  then the lengths of the two bit strings are
compared.  If the lengths are different,  the test returns FALSE.

If the types are the same, and the lengths are the same,  the bit strings
are compared, one bit at a time.  If any corresponding bit is different in A
and B,  the test returns FALSE.

Otherwise, the test returns TRUE.
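
The steps above can be sketched in Python. This is only an illustration: the `Value` class, its `type_name` and `bits` fields, and the `naive_equal` function are hypothetical names, not part of any real engine. The description says a type mismatch raises an error flag and returns FALSE; the sketch models the flag as a second return value.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Value:
    """A typed value as the naive test sees it: a type tag plus a bit string."""
    type_name: str
    bits: str  # one character per bit, e.g. "0110"


def naive_equal(a: Value, b: Value) -> Tuple[bool, bool]:
    """Return (result, error_flag) following the steps of the naive test."""
    # Step 1: different types raise the error flag and yield FALSE.
    if a.type_name != b.type_name:
        return False, True
    # Step 2: different lengths yield FALSE.
    if len(a.bits) != len(b.bits):
        return False, False
    # Step 3: compare corresponding bits one at a time.
    for bit_a, bit_b in zip(a.bits, b.bits):
        if bit_a != bit_b:
            return False, False
    # Otherwise the two representations are identical.
    return True, False
```

Nothing in the function looks at what the bits mean, which is exactly the property the rest of the post questions.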

Well,  what makes me call this test naive?  It's naive because it has tested
the two representations for equality,  and not the two values.  The naive
test will always work correctly,  provided there are no synonyms and no
homonyms.  There are a lot of datatypes that have neither synonyms nor
homonyms,  and for those datatypes, the naive test works just fine.

I'm going to sidestep the issue of homonyms.

What if we had a type engine with the following interesting feature:  if we
ask it to test "humid"  and "moist" for equality,  it returns TRUE.  I'm
going to sidestep the question of whether such a type engine is
mathematically valid,  and the question of how it is implemented.

The point is that the naive test fails to deliver the same answer as the
type engine does,  because it doesn't recognize "humid" and "moist"  as
synonyms.
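
To make the gap concrete, here is a hedged sketch. The `SYNONYM_GROUPS` table and both function names are invented for illustration; the post deliberately leaves open how a real type engine would decide that two words are synonyms.

```python
# Hypothetical synonym table: each word maps to a label for its synonym group.
SYNONYM_GROUPS = {"humid": "damp", "moist": "damp", "dry": "dry"}


def naive_word_equal(a: str, b: str) -> bool:
    """Compare the encoded representations only."""
    return a.encode("utf-8") == b.encode("utf-8")


def type_engine_equal(a: str, b: str) -> bool:
    """Compare synonym groups, falling back to the word itself if unlisted."""
    return SYNONYM_GROUPS.get(a, a) == SYNONYM_GROUPS.get(b, b)
```

With these definitions, `naive_word_equal("humid", "moist")` is false while `type_engine_equal("humid", "moist")` is true, which is the disagreement described above.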

Here's where the relational engine may be able to use the "what" of the
equality test,  but can't grasp the "how".

There's more,  but I'll leave here for now.
DBMS_Plumber - 31 Jul 2005 22:58 GMT
You define equality either a) by declaring an equal() relation and
populating it with equality mappings (no one does this), or else b) by
incorporating into the engine a module of non-declarative code which
returns 'true' or 'false'.

For consistency, you might try adding a module 'compare()', instead of
'equal()', because that will get you the set of comparison operators to
boot. Most Object-Relational DBMS products adopt this practice.
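
As a rough sketch of why a single 'compare()' is enough, the Python below derives all six comparison operators from one three-way function. The case-insensitive string rule and every function name here are assumptions chosen for the example, not the interface of any particular DBMS.

```python
import functools


def compare(a: str, b: str) -> int:
    """Three-way comparison: negative, zero, or positive.

    The case-insensitive rule is only an example of a type-specific ordering.
    """
    ka, kb = a.casefold(), b.casefold()
    return (ka > kb) - (ka < kb)


# Every comparison operator is derived from the one compare() module.
def eq(a, b): return compare(a, b) == 0
def ne(a, b): return compare(a, b) != 0
def lt(a, b): return compare(a, b) < 0
def le(a, b): return compare(a, b) <= 0
def gt(a, b): return compare(a, b) > 0
def ge(a, b): return compare(a, b) >= 0


# The same function can drive sorting, so order and equality cannot disagree.
sort_key = functools.cmp_to_key(compare)
```

For example, `sorted(["b", "A", "c"], key=sort_key)` orders the words by the same rule that `eq` and `lt` use.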

Numerous other practical problems arise:

a) You need to incorporate the comparison in the engine's
infrastructure: sorts, indices, grouping, etc.

b) For some sort algorithms (radix) and other physical operations (hash
joins, bloom filters, mapping tuples to partitioning schemes) the
bit-wise representation of the value needs to be consistent with the
logical operations (or you need to write code to convert your
representation to something that's bit-wise consistent)

c) Nothing on the whole green earth can help you if you decide to
change the definition of these functions in an operational system, or
try to do a database restore with different functions. You have to
unload the DBMS, and re-load it again, or else engage in huge amounts
of conversion jiggery pokery (this factor heavily influences the design
of type extensions)

d) Support for distributed computing needs care, because of
byte-orderings, etc.
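
Points b) and d) can be illustrated with one common technique: converting a value into a key whose byte-wise order matches its logical order. The sketch below does this for signed 64-bit integers only; the function names are hypothetical and the bias-plus-big-endian layout is one possible choice, not a claim about any product.

```python
import struct

_BIAS = 1 << 63  # shifts the signed 64-bit range onto 0 .. 2**64 - 1


def to_sort_key(n: int) -> bytes:
    """Encode a signed 64-bit integer so byte-wise order equals numeric order.

    Big-endian puts the most significant byte first, and the bias makes
    negative numbers sort before non-negative ones.
    """
    return struct.pack(">Q", n + _BIAS)


def from_sort_key(key: bytes) -> int:
    """Invert to_sort_key."""
    return struct.unpack(">Q", key)[0] - _BIAS
```

Because the byte order is fixed explicitly rather than left to the host, the same key bytes compare the same way on any machine, which is the concern in point d).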

All of these trade-offs and questions were thoroughly explored in the
systems research literature a decade ago. Most SQL DBMS products
provide the functionality to some degree.
paul c - 01 Aug 2005 01:22 GMT
> You define equality either a) by declaring an equal() relation and
> populating it with equality mappings (no one does this), or else b) by
[quoted text clipped - 29 lines]
> systems research literature a decade ago. Most SQL DBMS products
> provide the functionality to some degree.

heh (that's a 'heh' on the 2nd glass of plonk).  i'd say that everything
you say is true within the scope of implementation which is fair enough,
given your moniker, although i'd also say that how "You define equality"
if it were turned into a question is kind of impossible outside of
politically correct circles where people try to do all kinds of
impossible things.  my thrust was admittedly philosophical (which may
explain my choice of moniker), or what i remember Hugh Darwen calling
'mystical'.  i was trying in my clumsy way to say that equality is an
elusive quality and 'sameness' is easier for me to deal with.
admittedly, a philosopher could ask, "if they are the same thing, how
could we compare them in the first place?".  maybe that's why people
make up axioms.

cheers,
p
Marshall  Spight - 02 Aug 2005 06:21 GMT
I have to say I just don't understand why equality produces
so much ink. It's not that complicated a concept: two
values are equal if they are the same value.

I guess what makes it so complicated is that we're used to
having to work with objects and memory locations and aliasing
and so forth. These are all *implementation* complexities,
though; the interface remains blindingly simple.

And as far as implementation goes, it seems to me that a type
system constructed a certain way would always be able to write
the equals function. It would need to understand the difference
between ordered data and unordered data, so that it would be
able to know the difference between { 1, 2 } and [ 1, 2 ].
It would then be able to know that { 1, 2 } and { 2, 1 }
were the same, and that [ 1, 2 ] and [ 2, 1 ] were different.
It would also need not to support stateful encapsulation, but
I consider that an advantage. Stateful encapsulation is quite
popular in OO languages, but it's the enemy of data management.
You can't manage what you can't observe.
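
Python's built-in immutable collections happen to draw the same line between ordered and unordered data, so they can serve as a small illustration. This is not the type system described above, only an existing one that behaves similarly on this point.

```python
# Unordered collection: equality ignores the order in which elements are written.
unordered_same = frozenset([1, 2]) == frozenset([2, 1])

# Ordered collection: equality is checked position by position.
ordered_same = (1, 2) == (2, 1)

# The rule composes: a tuple of sets compares each set without regard to order.
nested_same = (frozenset([1, 2]), frozenset([3])) == (frozenset([2, 1]), frozenset([3]))
```

Here `unordered_same` and `nested_same` are true and `ordered_same` is false, and no user-written equals function was needed.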

I don't really buy in to the idea that there's a difference
between the RM engine's type system and the domain type system.
(If only because if I have to implement a language, the type
system's going to be the hardest part, and I only want to have
to do that once!)

Marshall
David  Cressey - 02 Aug 2005 13:37 GMT
> I have to say I just don't understand why equality produces
> so much ink. It's not that complicated a concept: two
> values are equal if they are the same value.

The reason it produces so much ink is that we never actually compare two
objects directly.  What we compare are two representations of objects to see
if they represent the same object.

And the issues of homonyms and synonyms in representation schemes are worth
quite a bit of ink.

I was thankful when DBMS_plumber informed us that these issues have been
fully treated in the literature.  It would have been very disconcerting to
find out the opposite.  And I certainly urge anyone who proposes to build a
DBMS to bone up on the literature first.  The time invested will yield a
good return.

Further, the plumber is very right to state that, once a bitwise
representation scheme has been proven satisfactory with regard to identity,
the next issue is order.  But that's not where I'm going with this.  I
proposed the "naive test" as a straw man,  and I hope no one thinks I was
setting some kind of trap.

I'm still not done with identity.  In particular,  I'm not done with
synonyms.

The synonyms "humid" and "moist" get into semantic issues that are way
beyond where I want to go right now.

Instead,  I'm going to suggest anagrams:  a valid word with the same
letters,  but possibly permuted.

Thus if I ask for anagrams of:  "post",  I get "stop", "pots",  etc.
If I extend the definition of anagrams just slightly, and define that every
word is an anagram of itself,  now the relationship is reflexive.  It's
clearly symmetric and transitive,  so it's a flavor of "equality".

But it's a flavor of equality where the combinatorics get quickly out of
hand.

And that's what interests me in this discussion.
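
A minimal sketch of the anagram relation, assuming the inputs are already known to be valid words (the dictionary lookup is left out). The function names are made up for the example. Comparing a canonical form, here the sorted letters, is one way to test the relation without enumerating permutations; whether that addresses the combinatorics the post has in mind is not stated.

```python
from collections import defaultdict


def anagram_key(word: str) -> str:
    """Canonical form: the word's letters in sorted order."""
    return "".join(sorted(word))


def anagram_equal(a: str, b: str) -> bool:
    """Two words are equivalent when their canonical forms match."""
    return anagram_key(a) == anagram_key(b)


def anagram_classes(words):
    """Group a word list into equivalence classes keyed by canonical form."""
    classes = defaultdict(set)
    for word in words:
        classes[anagram_key(word)].add(word)
    return dict(classes)
```

Every word matches itself under `anagram_equal`, so the relation is reflexive as the extended definition requires.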
Marshall  Spight - 02 Aug 2005 15:07 GMT
> > I have to say I just don't understand why equality produces
> > so much ink.
[quoted text clipped - 6 lines]
> word is an anagram of itself,  now the relationship is reflexive.  It's
> clearly symmetric and transitive,  so it's a flavor of "equality".

Sure. Specifically, it's an equivalence relation. Let's distinguish
between the equality relation specifically and equivalence relations
in general. Equality is a much simpler thing.

> But it's a flavor of equality where the combinatorics get quickly out of
> hand.
>
> And that's what interests me in this discussion.

I see. Well, maybe I don't actually. But I'm following you so far.

Marshall
Paul - 02 Aug 2005 17:03 GMT
Marshall Spight wrote:
> Sure. Specifically, it's an equivalence relation. Let's distinguish
> between the equality relation specifically and equivalence relations
> in general. Equality is a much simpler thing.

Is it, though? We think about 1/2 = 2/4 fine even though they have
different representations. Maybe you mean "identity", often shown using
a variant of the equals sign with three lines instead of two?

It's rare that we use the "equals" symbol to compare two things with
identical representation: 1=1, we'd use the "identity" symbol instead.

It's all about levels of abstraction: equality at the physical layer
(representation) may differ from equality at the logical layer (value).

So for the underlying relational engine to compare two values for
equality, it can't in general stay in the physical layer; it has to
jump up into the logical layer, do the comparison, and then jump back
down to the physical layer again.

Paul.
Marshall  Spight - 02 Aug 2005 19:49 GMT
> Marshall Spight wrote:
> > Sure. Specifically, it's an equivalence relation. Let's distinguish
> > between the equality relation specifically and equivalence relations
> > in general. Equality is a much simpler thing.
>
> Is it, though?

I wasn't clear whether you were questioning my "it's an equivalence
relation" or my "equality is [simple]."

> We think about 1/2 = 2/4 fine even though they have
> different representations.

Sure, because they are the same value. Thining too much about
representation can only confuse you. :-)

> Maybe you mean "identity", often shown using
> a variant of the equals sign with three lines instead of two?

No, I certainly do not mean identity. In fact, I specifically
reject the concept of identity in the field of data management.
It's useful in OOP to distinguish between equality and identity,
but introducing identity in data management is a disaster.

> It's rare that we use the "equals" symbol to compare two things with
> identical representation: 1=1, we'd use the "identity" symbol instead.

I'm not sure I agree that it is rare. It is certainly correct.

> It's all about levels of abstraction: equality at the physical layer
> (representation) may differ from equality at the logical layer (value).

Sure. Note I am not discussing the physical; I'm discussing the logical
and conceptual.

I said:
"I guess what makes it so complicated is that we're used to
having to work with objects and memory locations and aliasing
and so forth. These are all *implementation* complexities,
though; the interface remains blindingly simple."

> So for the underlying relational engine to compare two values for
> equality, it can't in general stay in the physical layer; it has to
> jump up into the logical layer, do the comparison, and then jump back
> down to the physical layer again.

I am unclear as to what you are saying here. The implementation only
operates at the implementation level. The implementation implements
the logical level, or "interface." I don't know what you mean
by "jump up."

Sure, the implementation of a function to test for equality
might be more than just a binary region compare, but that doesn't
mean that it's not still the implementation.

Marshall
David  Cressey - 03 Aug 2005 15:59 GMT
> Sure, because they are the same value. Thining too much about
> representation can only confuse you. :-)

I think the above could be applied to any topic in computer science.  We are
always manipulating representations,  aren't we?

<silly>
Baba Louie:  I theen' we better get outta here, Quistro.

QuickDraw McGraw:  I'll do the thinnin' around here Baba Louie,  and don'
you fergit it!
</silly>
Marshall  Spight - 03 Aug 2005 19:37 GMT
> > Sure, because they are the same value. Thining too much about
> > representation can only confuse you. :-)
[quoted text clipped - 8 lines]
> you fergit it!
> </silly>

Homer Simpson: D'oh!

Left out the k.

Marshall
Gene Wirchenko - 03 Aug 2005 18:13 GMT
[snip]

>Sure, because they are the same value. Thining too much about
>representation can only confuse you. :-)

    Not the you-thou bit again.

[snip]

Sincerely,

Gene Wirchenko
Paul - 03 Aug 2005 21:46 GMT
Marshall Spight wrote:
>>>Sure. Specifically, it's an equivalence relation. Let's distinguish
>>>between the equality relation specifically and equivalence relations
[quoted text clipped - 4 lines]
> I wasn't clear whether you were questioning my "it's an equivalence
> relation" or my "equality is [simple]."

I guess what I'm saying is that equality isn't really simpler than
equivalence relations - they're kind of the same thing really.

>>So for the underlying relational engine to compare two values for
>>equality, it can't in general stay in the physical layer; it has to
[quoted text clipped - 5 lines]
> the logical level, or "interface." I don't know what you mean
> by "jump up."

what I mean is that you have the relational part and the domain part
with their separate physical implementations - but the only way they can
talk to each other to establish equality is via their "logical"
interfaces - so going up an abstraction level. As opposed to the "naive"
way where the relational part can establish equality on its own (using
bit representation) without needing the domain part at all. I'm not
really saying anything new here, just rehashing existing posts but what
the hell, it might be useful for someone to see something from several
angles!

Paul.
VC - 02 Aug 2005 23:22 GMT
Hi,

> Marshall Spight wrote:
>> Sure. Specifically, it's an equivalence relation. Let's distinguish
[quoted text clipped - 4 lines]
> different representations. Maybe you mean "identity", often shown using
> a variant of the equals sign with three lines instead of two?

The standard construction of rationals,  as introduced in high school, is:

Let ZxZ' be the set of all ordered pairs of integers (x,y) where y is not
zero (Z' = Z minus {0}). Let's define an equivalence relation E  as

(x1,y1) E  (x2, y2) iff x1*y2 = y1*x2

Then rationals are a set Q of equivalence classes defined by the above
relation.  Technically,  one has to *prove* that E is indeed an equivalence
relation and that operations like addition and multiplication are well
defined and obey the usual laws,  etc.
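
The relation E can be written down directly. The sketch below takes pairs (x, y) with the second component drawn from Z', and spot-checks the three equivalence-relation laws on a small finite sample. That is evidence only, not the proof the construction requires; the variable names are chosen for the example.

```python
from itertools import product


def E(p, q) -> bool:
    """(x1, y1) E (x2, y2) iff x1*y2 == y1*x2."""
    (x1, y1), (x2, y2) = p, q
    return x1 * y2 == y1 * x2


# All pairs with components in -3..3 and a nonzero second component.
sample = [(x, y) for x, y in product(range(-3, 4), repeat=2) if y != 0]

reflexive = all(E(p, p) for p in sample)
symmetric = all(E(q, p) for p in sample for q in sample if E(p, q))
transitive = all(
    E(p, r)
    for p in sample for q in sample for r in sample
    if E(p, q) and E(q, r)
)
```

On this sample all three checks hold, and `E((1, 2), (2, 4))` is true while `E((1, 2), (1, 3))` is false.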

There is no need to talk about some vague representations and such,  one
can simply speak in clear terms of integers and equivalence classes instead.
Paul - 03 Aug 2005 21:25 GMT
> Then rationals are a set Q of equivalence classes defined by the above
> relation.  Technically,  one has to *prove* that E is indeed an equivalence
[quoted text clipped - 3 lines]
> There is no need to talk about some vague representations and such,  one
> can simply speak in clear terms of integers and equivalence classes instead.

well, the equivalence class can be thought of as a set of possible
representations for the "value" that "is" the equivalence class (feeling
like Clinton here explaining what I mean by "is" :))

By "representation" I mean the actual symbols used to convey the idea of
a "value", and there may be several of these representations for one value.

Paul.
VC - 04 Aug 2005 03:48 GMT
>> Then rationals are a set Q of equivalence classes defined by the above
>> relation.  Technically,  one has to *prove* that E is indeed an
[quoted text clipped - 9 lines]
> representations for the "value" that "is" the equivalence class (feeling
> like Clinton here explaining what I mean by "is" :))

The usual definition of the equivalence class is:

Let E be an equivalence relation on the set S.  Then,  for a given element e
in S,  its equivalence class is a set of all elements in S  that are
equivalent to e:

[e] = {x in S| x E e}.
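
For a finite S the definition translates almost word for word into a set comprehension. In this sketch the set S of integer pairs and the helper `same_fraction` are assumptions picked to match the rationals example from earlier in the thread.

```python
def equivalence_class(e, S, E):
    """[e] = {x in S | x E e} for a finite set S and relation E."""
    return {x for x in S if E(x, e)}


def same_fraction(p, q):
    """The relation from the rationals construction: x1*y2 == y1*x2."""
    return p[0] * q[1] == p[1] * q[0]


# A small finite stand-in for the infinite set of integer pairs.
S = {(x, y) for x in range(1, 7) for y in range(1, 7)}
half = equivalence_class((1, 2), S, same_fraction)
```

Within this S, `half` contains (1, 2), (2, 4) and (3, 6), and starting from (2, 4) instead of (1, 2) yields the same set.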

I have no idea what a 'possible representation'  might be.

> By "representation" I mean the actual symbols used to convey the idea of
> a "value", and there may be several of these representations for one value.

I do not understand this.

> Paul.
David  Cressey - 04 Aug 2005 14:41 GMT
> > By "representation" I mean the actual symbols used to convey the idea of
> > a "value", and there may be several of these representations for one value.
>
> I do not understand this.

I think the term "literal value"  from classical programming language
documents might be relevant here.

The literal value conveys from the writer to the reader a specific value
from one of the types.  Thus

12345 is a literal value
123.45  is a literal value
'123.45' is a literal value (of a string).
vc - 04 Aug 2005 15:57 GMT
> > > By "representation" I mean the actual symbols used to convey the idea of
> > > a "value", and there may be several of these representations for one
[quoted text clipped - 11 lines]
> 123.45  is a literal value
> '123.45' is a literal value (of a string).

<Paul> wrote:

"well, the equivalence class can be thought of as a set of possible
representations for the "value" that "is" the equivalence class "

I do not see how 'possible representations' (whatever they are), or
'literals', are relevant to the simple notion of equivalence class.

Thanks.
Paul - 06 Aug 2005 15:54 GMT
>> <Paul> wrote:
>> "well, the equivalence class can be thought of as a set of possible
>> representations for the "value" that "is" the equivalence class "
>
> I do not see how 'possible representations' (whatever they are), or
> 'literals', are relevant to the simple notion of equivalence class.

maybe you're reading more into it than I mean.

Probably a concrete example might best explain what I'm trying to say.

Consider simple fractions. You have several ways of writing the number
0.5, for example 1/2, 2/4, 3/6, etc. (infinitely many in fact). I'm just
saying that all these are possible ways of representing the same number
or "value".

You gave the details of how the rationals are constructed mathematically
using equivalence relations. In practice, you aren't going to write the
rational number 0.5 as the set (1/2, 2/4, 3/6, ...), you will pick one
example and use that. I think the standard notation used is square
brackets e.g. [1/2] to denote the equivalence class to which 1/2
belongs. Or you could just as well use [2/4].

I've kind of lost track of what started this thread in the first place
now! I think it was just to say I didn't think there was any real
difference between equality and equivalence relations. Each one defines
the other.

When we write 1/2 or 2/4 it is just shorthand for "[1/2]" or "the
equivalence class containing 1/2" so 1/2 and 2/4 are actually identical
at some level. But clearly at the level of marks on paper or bytes on a
computer they are different. And these two levels correspond to the
physical and logical levels of the relational model. So something can be
equal at the logical level but different at the physical level.

Am I just stating the obvious in a very roundabout way? The original post
gave an example of strings with a definition of equality that made
anagrams equal to each other. The claim was made that that wasn't a
proper example because it was an equivalence relation rather than a
"plain" equality and I'm just rebutting that claim. I think that was the
whole point of this somewhat rambling post.

Slightly confusing the issue is the fact that we are using the word
"relation" in a mathematical rather than database sense here.

Paul.
Marshall  Spight - 06 Aug 2005 18:02 GMT
> I've kind of lost track of what started this thread in the first place
> now! I think it was just to say I didn't think there was any real
> difference between equality and equivalence relations. Each one defines
> the other.

Equality is a particular type of equivalence relation. It is the kind
where every value is its own equivalence class. Put another way,
in equality, the equivalence classes all have cardinality 1.
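
The cardinality claim can be checked on a small example. The sketch below partitions a finite set under two relations, plain equality and the anagram relation from earlier in the thread; the `partition` helper and the word list are assumptions for illustration.

```python
def partition(S, E):
    """Split a finite set S into equivalence classes under relation E."""
    classes = []
    for x in S:
        for cls in classes:
            # Comparing with one member is enough because E is transitive.
            if E(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes


words = {"post", "stop", "pots", "tea", "eat"}

by_equality = partition(words, lambda a, b: a == b)
by_anagram = partition(words, lambda a, b: sorted(a) == sorted(b))
```

Under equality every class is a singleton, so there are five classes; under the anagram relation the same five words fall into two classes of sizes three and two.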

(This is why I call it "simpler," but it's not a big deal.)

Marshall
Paul - 06 Aug 2005 20:30 GMT
Marshall Spight wrote:
>>I've kind of lost track of what started this thread in the first place
>>now! I think it was just to say I didn't think there was any real
[quoted text clipped - 4 lines]
> where every value is its own equivalence class. Put another way,
> in equality, the equivalence classes all have cardinality 1.

That's not how I interpret it. The way I see it, an equivalence relation
*defines* what we mean by equality with respect to a given structure.

So for example you start with expressions of the form "x/y", with x and
y integers (y!=0)

Now to begin with, "1/2" != "2/4"

But you create an equivalence relation as VC described, which is
basically grouping certain integer pairs together to create a different
structure. And you use this equivalence relation to *define* what you
mean by "equality" on your new structure. So [1/2] = [2/4]. But
conventionally you drop the square brackets indicating the equivalence
class and write 1/2 = 2/4, which maybe confuses things though.

So for the rational numbers, you have equality but the corresponding
equivalence classes on ZxZ *don't* have cardinality 1

Paul.
Marshall  Spight - 06 Aug 2005 22:23 GMT
> Marshall Spight wrote:
> >>I've kind of lost track of what started this thread in the first place
[quoted text clipped - 8 lines]
> That's not how I interpret it. The way I see it, an equivalence relation
> *defines* what we mean by equality with respect to a given structure.

I suppose. We're not quite talking about the same thing, though.
I'm talking about classes of values, and you're talking about
classes of lexical representations of values. Or maybe you're
talking about expressions; I'm not entirely certain.

(Does the string-of-symbols "one half" also belong in the equivalence
class with "1/2" and "2/4"?)

I think it is more useful to think about 1/2 and 2/4 being the
same value because of the semantics of division. Once you move
into the world of values and out of the world of representations,
things get a lot simpler.

Marshall
David  Cressey - 06 Aug 2005 22:30 GMT
> Marshall Spight wrote:
> >>I've kind of lost track of what started this thread in the first place
> >>now! I think it was just to say I didn't think there was any real
> >>difference between equality and equivalence relations. Each one defines
> >>the other.

OK, let me jump in with what I think I was about with this thread in the
first place.

A lot of other discussion hinges on the interaction between the
relational engine and the type engine.

I think there are multiple layers of representation/interpretation in any
system of representing meaning using symbols.

For certain equalities of the underlying things  ("values" for some people),
it's the type engine that knows when two tokens are representations for the
same underlying thing.  Thus,  if we want to know whether 1/2 is really equal
to 2/4 or not,  we consult the appropriate type engine,  in this case the
rational type engine.  If we want to know whether 123.45E1 is really equal
to 12.345E2 we consult the floating point number type engine.

As far as I'm concerned "consulting the type engine" is another way of
saying what VC said when he said we must put the items in context before we
can test them for equality.

With regard to whether A is equal to B or not,  we need to consult three
engines:

First the variable typing engine to see if A and B are or are not of the
same type.  If we have static typing of variables,  we can do this test at
compile time.

Next,  the variable state remembering engine to retrieve the current (in
context) value of A and B.

Next the type engine determined by the common type of A and B to find out
whether the values are really the same or not.

If we omit the last step,  we will end up doing a naive test for equality.
If there are any synonyms that the type engine knows about and we don't know
about, then our test for equality will be naive.
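
The three consultations can be sketched as follows. The three dictionaries are hypothetical stand-ins for the three engines, and the rational type engine is borrowed from Python's standard `fractions` module purely for convenience; the `naive` switch shows what happens when the last step is omitted.

```python
from fractions import Fraction

# Hypothetical stand-ins for the three "engines".
VARIABLE_TYPES = {"A": "rational", "B": "rational"}   # variable typing engine
VARIABLE_STATE = {"A": (1, 2), "B": (2, 4)}           # variable state engine
TYPE_ENGINES = {                                      # one type engine per type
    "rational": lambda p, q: Fraction(*p) == Fraction(*q),
}


def variables_equal(a: str, b: str, naive: bool = False) -> bool:
    # Step 1: the two variables must have the same declared type.
    if VARIABLE_TYPES[a] != VARIABLE_TYPES[b]:
        raise TypeError("cannot compare values of different types")
    # Step 2: fetch the current values.
    value_a, value_b = VARIABLE_STATE[a], VARIABLE_STATE[b]
    # Step 3: ask the type engine; skipping it gives the naive test.
    if naive:
        return value_a == value_b
    return TYPE_ENGINES[VARIABLE_TYPES[a]](value_a, value_b)
```

With A holding 1/2 and B holding 2/4, the full test reports equality and the naive variant does not.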

Here's where I'm going with this:  in an SQL DBMS,  where is the type engine
for the type "SQL Table"?  Or isn't there one?
dawn - 07 Aug 2005 04:54 GMT
> > Marshall Spight wrote:
> > >>I've kind of lost track of what started this thread in the first place
[quoted text clipped - 41 lines]
> Here's where I'm going with this:  in an SQL DBMS,  where is the type engine
> for the type "SQL Table".  Or isn't there one?

I don't know the answer to this question directly, but I've been
thinking about what I understood to be your question and have a couple
of comments that might or might not be relevant.

If you are talking about equality between relations (not just the header), you are
including equality related to words and their referents, not only
mathematical expressions.  [I'm using the term "referents" here as used
in semiotics rather than grammar or programming languages.]  We have
the concept of equality defined in mathematics, but we do not have the
same in a language like English.  Even with the concept of synonyms, we
are not talking "equals".

If we were to stretch the meaning so that if two word values are close
enough in meaning we call them equal, we would end up with some of the
same issues that arise when attempting to model the entire language
with mathematics.  Too much interpretation, context, pragmatics (if I'm
using that term correctly, again from semiotics) lies outside of what
we capture or even can capture in the metadata.  We would get into the
"Time flies like an arrow" problems.  Is that equal to "Time flies
enjoy an arrow"?

We would also need to be able to determine if 'Jo Doe' was the same
person as 'Jo Doe Jr' but the data entry person in the one case did not
enter Jr as a suffix.  Or that Pat DeJong in one table is the same
person as Pat DeJong in another.  They could even both have the same
unique id (candidate key) value, doled out by two different systems for
two different people, but the representation (string value) for each is
the same.  How would you have enough information to be certain these
were the same people?  You would need to have a unique identifier not
just for a table, but for the entire human race (under some conditions
you could require the exact same ssn, for example) and the system would
have to know that.  Think of the various algorithms that attempt to
match two names as being the same in order to help de-dup data.  They
only provide assistance, nothing close to exact.

And then there is the fact that you and I both speak English, but have
missed with each other more than once.  Two people often cannot agree
on what words are equal.

In case I'm not understanding your question (am I in the ballpark?),
this response might be completely irrelevant, in which case, never mind.

cheers!  --dawn
VC - 08 Aug 2005 04:10 GMT
>... The way I see it, an equivalence relation
> *defines* what we mean by equality with respect to a given structure.
[quoted text clipped - 10 lines]
> conventionally you drop the square brackets indicating the equivalence
> class and write 1/2 = 2/4, which maybe confuses things though.

Right...

> So for the rational numbers, you have equality but the corresponding
> equivalence classes on ZxZ *don't* have cardinality 1

Not quite right.  In the case of rationals equality,  you treat the
equivalence class,  as a whole, as a single element. E.g,  for integers
you'd say 2=2;  for rationals you'd say [5/10] = [1/2],  no difference
really since both [5/10] and [1/2] are the *same* element.  In other words,
your *equality* relation pair would be, say,  for integers (1,1) and for
rationals (E_half, E_half), where E_half = {1/2, 2/4, 5/10, ...} etc.

> Paul.
David  Cressey - 08 Aug 2005 11:37 GMT
> Not quite right.  In the case of rationals equality,  you treat the
> equivalence class,  as a whole, as a single element. E.g,  for integers
> you'd say 2=2;  for rationals you'd say [5/10] = [1/2],  no difference
> really since both [5/10] and [1/2] are the *same* element.  In other words,
> your *equality* relation pair would be, say,  for integers (1,1) and for
> rationals (E_half, E_half), where E_half = {1/2, 2/4, 5/10, ...} etc.

Right.  The entire equivalence class is a single element as viewed by the
rationals engine.
In order to manipulate this "single element" as data,  we need a symbol for
it, to represent it.

So we choose one of the elements of the original set to stand as a
representative of the entire set that is going to be seen as an element.  In
this case we might choose the rational with the lowest denominator,  namely
1/2.

Now,  whenever we are given an unnormalized rational,  such as 5/10, we ask
the rationals engine to normalize it for us.
The rationals engine knows the rule for normalizing,  namely remove common
factors in the numerator and denominator.  So it returns 1/2,  the
normalized equivalent of 5/10.

If we ask the rationals engine to normalize 1/2,  it will give us back 1/2.

So the process of normalizing is choosing one, out of an equivalence class,
according to some criterion,  and using the symbol that represents the
chosen element to act as the normalized form for the entire class.
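
A minimal sketch of the normalizing rule, removing common factors with the greatest common divisor. The function name is hypothetical, and keeping the sign in the numerator is an extra convention assumed here so that each class has exactly one representative; the post itself only discusses positive examples.

```python
from math import gcd


def normalize(numerator: int, denominator: int) -> tuple:
    """Return the representative with the smallest positive denominator."""
    if denominator == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    divisor = gcd(numerator, denominator)
    if denominator < 0:
        divisor = -divisor  # keep the sign in the numerator
    return numerator // divisor, denominator // divisor
```

Normalizing 5/10 gives 1/2, and normalizing 1/2 gives 1/2 back, so two rationals are equal exactly when their normalized forms are identical.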
Marshall  Spight - 08 Aug 2005 17:01 GMT
> Right.  The entire equivalence class is a single element as viewed by the
> rationals engine.
[quoted text clipped - 17 lines]
> according to some criterion,  and using the symbol that represents the
> chosen element to act as the normalized form for the entire class.

I don't see how this is a particularly useful way to look at
the issue. It doesn't separate the idea of the lexical symbols
we use to display and enter values, and the abstract values
themselves.

The way I look at it is, when the compiler sees "5/10" it converts
it into a value. The value it converts it into is the same value
as when it sees "1/2" because the two are the same value.

I also don't see the benefit of talking about separate engines.

Marshall
David  Cressey - 09 Aug 2005 06:52 GMT
> > So the process of normalizing is choosing one, out of an equivalence class,
> > according to some criterion,  and using the symbol that represents the
> > chosen element to act as the normalized form for the entire class.

> I don't see how this is a particularly useful way to look at
> the issue. It doesn't separate the idea of the lexical symbols
> we use to display and enter values, and the abstract values
> themselves.

The word "symbols" refers not only to the symbols used to exchange data
between people and computers, but also to each of the data items inside the
computer.  In other words,  what the computer stores is all symbolic,  right
down to the most atomic symbols,  zero and one.  Symbols can be made up of
other symbols, strung together. Thus the symbol made up of 11000000
(starting from least significant bit)  is a string of symbols that can
represent the number three.
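
The bit-string example can be checked with a short sketch; the function name is invented for illustration. It also shows that the same string of symbols stands for a different number under a different reading convention, which is arguably an instance of the homonym problem raised in this thread.

```python
def decode_lsb_first(bits: str) -> int:
    """Interpret a bit string whose first character is the least significant bit."""
    return sum(int(bit) << position for position, bit in enumerate(bits))


lsb_reading = decode_lsb_first("11000000")   # first bit is worth 1, second 2
msb_reading = int("11000000", 2)             # conventional most-significant-first
```

Read least significant bit first, the string denotes three; read the conventional way, the same eight symbols denote 192.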

When various "engines" (or "objects" if you prefer) inside a large system
exchange data with each other (or "messages" if you prefer),  they use
symbols to communicate with each other.

> The way I look at it is, when the compiler sees "5/10" it converts
> it into a value. The value it converts it into is the same value
> as when it sees "1/2" because the two are the same value.

That's when the compiler sees it.  But the number "5/10" could be generated
at run time as well. If such an expression is evaluated at run time,  it
will evaluate to "1/2"  (or some bit pattern that is used to represent that
number).

> I also don't see the benefit of talking about separate engines.

Over the past six weeks,  there has been much discussion about what the
"relational engine knows"  (or "knows about")  and what the "type engine
knows"  (or "knows about").  The prevailing wisdom has been that the type
engine understands equality (within its type),  but the relational engine
does not.  The discussion in terms of separate engines proceeds from here.

I started this discussion,  about the naive test for equality,  based on the
supposition that if you just compared two representations of a value,  you
could tell whether they are the same or not.  The synonym and homonym
problems are two classic problems that arise whenever you do that.  They
were referred to as "synonym problem" and "homonym problem", in the
literature of about 30 years ago.  I'm sorry, but my memory fails me when it
comes to citing sources for this.

The two words,  "synonym" and "homonym"  are borrowed from the argot of
natural linguistics,  but the two problems arise whenever data is
represented.

> Marshall
vc - 09 Aug 2005 21:30 GMT
> > > So the process of normalizing is choosing one, out of an equivalence
> class,
[quoted text clipped - 9 lines]
> between people and computers, but also to each of the data items inside the
> computer.

So now we have,  in addition to 'representation',  a new word 'symbol'.
What is even worse, in your vocabulary, it means two different things.
Nice..

>In other words,  what the computer stores is all symbolic,  right
> down to the most atomic symbols,  zero and one.

This is not true.  What the computer uses to store numbers (and
characters) is called bits,  not symbols.  Besides,  the way the
computer implements numbers and characters is entirely irrelevant at
the logical level.

> When various "engines" (or "objects" if you prefer) inside a large system
> exchange data with each other (or "messages" if you prefer),  they use
> symbols to communicate with each other.

This phrase is so ambiguous as to be almost devoid of meaning. What are
"engines" and how do they "exchange data" ?  What precisely do you mean ?
Hardware components ?  Abstract structures communicating using some
protocol ? Or something else ?

>...
>  But the number "5/10" could be generated
> at run time as well. If such an expression is evaluated at run time,  it
> will evaluate to "1/2"  (or some bit pattern that is used to represent that
> number).

Whatever bit pattern is used to implement a number is irrelevant at the
logical model level.

> I started this discussion,  about the naive test for equality,  based on the
> supposition that if you just compared two representations of a value,  you
[quoted text clipped - 7 lines]
> natural linguistics,  but the two problems arise whenever data is
> represented.

In modelling, "synonym/homonym problems" are problems only when they
are self-induced.

> > Marshall
mAsterdam - 09 Aug 2005 22:13 GMT
[snip]
>>The two words,  "synonym" and "homonym"  are borrowed from the argot of
>>natural linguistics,  but the two problems arise whenever data is
>>represented.
>
> In modelling, "synonym/homonym problems" are problems only when they
> are self-induced.

What do you mean by that? I've done quite some practical modelling
with teams. I never experienced the problem not coming up. Synonyms
and homonyms had to be dealt with (and we didn't always succeed).
I'd appreciate any hints for recognizing them as early as possible.
VC - 09 Aug 2005 23:03 GMT
> [snip]
>>>The two words,  "synonym" and "homonym"  are borrowed from the argot of
[quoted text clipped - 6 lines]
> What do you mean by that? I've done quite some practical modelling
> with teams. I never experienced the problem not coming up.

For example ?

>Synonyms
> and homonyms had to be dealt with (and we didn't always succeed).
> I'd appreciate any hints for recognizing them as early as possible.
mAsterdam - 10 Aug 2005 18:46 GMT
>>[snip]
>>>>The two words,  "synonym" and "homonym"  are borrowed from the argot of
[quoted text clipped - 8 lines]
>
> For example ?

What do you mean by "self-induced"?
VC - 10 Aug 2005 22:09 GMT
>>>[snip]
>>>>>The two words,  "synonym" and "homonym"  are borrowed from the argot of
[quoted text clipped - 10 lines]
>
> What do you mean by "self-induced"?

Self-inflicted (synonym)
mAsterdam - 10 Aug 2005 22:45 GMT
>>>>>>...The two words,  "synonym" and "homonym"  are borrowed from
>>>>>>the argot of natural linguistics,  but the two problems arise
[quoted text clipped - 11 lines]
>
> Self-inflicted (synonym)

The "Self" being the modeller, right?
When modelling is done by teams there are more selves.
Any two people even when working together closely for
years have  different associations and connotations
with some words some time.

Another, less cryptic example:

Say a team tries to meet the requirement that it should
be possible to find out where a piece of information came from.

One thinks 'origin', another one thinks 'source'. (1)

Let's say they talk about it and decide on 'source'.

One thinks 'the source code of a program' because
yesterday he spent some time finding a source-file,
another one thinks 'the external agent providing the
piece of information' because he just finished
a business process analysis session. (2)

Both the synonym-problem (1) and the homonym-problem (2) may
very well be recognized and resolved, of course.
Or not. Or too late.
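
A rough sketch of how a team glossary could flag both situations mechanically. It assumes each proposed name has already been paired with a written-down meaning, which is of course the hard part; the `proposals` list and the `find_conflicts` helper are hypothetical and only mirror the 'origin'/'source' example above:

```python
from collections import defaultdict

# Hypothetical (name, meaning) pairs collected from team members.
proposals = [
    ("origin", "external agent providing the information"),
    ("source", "external agent providing the information"),
    ("source", "source code of a program"),
]

def find_conflicts(pairs):
    """Return (synonyms, homonyms) found in a list of (name, meaning) pairs."""
    meanings_by_name = defaultdict(set)
    names_by_meaning = defaultdict(set)
    for name, meaning in pairs:
        meanings_by_name[name].add(meaning)
        names_by_meaning[meaning].add(name)
    # Synonyms: one meaning carried by several names.
    synonyms = {m: n for m, n in names_by_meaning.items() if len(n) > 1}
    # Homonyms: one name carrying several meanings.
    homonyms = {n: m for n, m in meanings_by_name.items() if len(m) > 1}
    return synonyms, homonyms

synonyms, homonyms = find_conflicts(proposals)
```

With these inputs, 'origin' and 'source' show up as synonyms for the external agent, and 'source' shows up as a homonym with two meanings.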
VC - 11 Aug 2005 01:55 GMT
>>>>>>>...The two words,  "synonym" and "homonym"  are borrowed from
>>>>>>>the argot of natural linguistics,  but the two problems arise
[quoted text clipped - 17 lines]
> years have  different associations and connotations
> with some words some time.

Presumably the team has meetings at which they discuss the stuff they are
interested in and come to some agreement as to what terminology they  want
to use and what the terms are supposed to mean.  It's, like, introduction to
modelling 101.  Besides,  you describe a hypothetical terminology
selection/definition process yourself, so it's not clear what the problem
might be unless the "team" neglects to identify, say, data objects and
relationships [self-inflicts potential pain because of not doing required
work].

> Another, less cryptic example:
>
[quoted text clipped - 10 lines]
> piece of information' because he just finished
> a business process analysis session. (2)

You are kidding, right ?  If the modellers chose the name/label "source" and
did not define what  entity the name refers to,  then the name is just
meaningless, like say "fshsalkfd". Apparently,  your hypothetical modellers
are not modellers but some kind of impostors.

> Both the synonym-problem (1) and the homonym-problem (2) may
> very well be recognized and resolved, of course.
> Or not. Or too late.

As I wrote before,  data modelling is not a work of [literary] fiction where
one needs to bother with stuff like synonyms,  homonyms, metaphors, metonymy
and what not.  Just identify the entities,  invent (or use commonly
accepted ) names for them and you'll be a happy camper without any need to
hide behind high-faluting nonsense like "synonym problem", "conceptual
object type" or some such.

Cheers.
dawn - 11 Aug 2005 04:48 GMT
> >>>>>>>...The two words,  "synonym" and "homonym"  are borrowed from
> >>>>>>>the argot of natural linguistics,  but the two problems arise
[quoted text clipped - 17 lines]
> > years have  different associations and connotations
> > with some words some time.

Definitely.

> Presumably the team has meetings at which they discuss the stuff they
> interested in and come to some agreement as to what terminology they  want
[quoted text clipped - 24 lines]
> meaningless, like say "fshsalkfd". Apparently,  your hypothetical modellers
> are not modellers but some kind of impostors.

It is usually much more subtle than that.  Everyone agrees that we need
to know whether or not someone is a fullTimeStudent.  Ignore the fact
that this would likely be a derived attribute -- it illustrates the
problem.  After some sessions with folks from many departments, the
analyst works to get more precision and sits down with someone who
knows all of the tuition rules, along with another person ('cause the
analyst is no rookie) and they nail down this attribute with the
precision of a surgeon.

The system goes live and the financial aid people are irate! Federal
aid has just been removed from students because they were no longer
flagged as being a fullTimeStudent when by the standards for this
financial aid, they clearly ARE a fullTimeStudent.

Then you find out that these two departments use the very same term and
might even both have external reasons to use the very same term, and
they use it with just slightly different meanings.
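
A minimal sketch of the situation, assuming the two departments' rules differ only in a credit threshold. The threshold values and function names are hypothetical and chosen purely for illustration; real tuition and financial aid rules are more involved:

```python
# Hypothetical thresholds, for illustration only.
TUITION_MIN_CREDITS = 12
FINANCIAL_AID_MIN_CREDITS = 9

def full_time_for_tuition(credits: int) -> bool:
    """fullTimeStudent as the tuition office might define it."""
    return credits >= TUITION_MIN_CREDITS

def full_time_for_financial_aid(credits: int) -> bool:
    """fullTimeStudent as the financial aid office might define it."""
    return credits >= FINANCIAL_AID_MIN_CREDITS

# One student, one attribute name, two answers.
credits = 9
tuition_flag = full_time_for_tuition(credits)
aid_flag = full_time_for_financial_aid(credits)
disagreement = tuition_flag != aid_flag
```

A single stored fullTimeStudent column can hold only one of these answers, so whichever department did not define it sees wrong data for every student who falls between the two thresholds.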

It does help if there is a well-maintained and easily used catalog /
dictionary / metadata repository.  But words are just that.  --dawn
VC - 11 Aug 2005 12:02 GMT
....

>> You are kidding, right ?  If the modellers chose the name/label "source"
>> and
[quoted text clipped - 20 lines]
> might even both have external reasons to use the very same term, and
> they use it with just slightly different meanings.

Apparently, the analysts made a mistake in assuming that the set of
fullTimeStudents is equal to the set of studentsEligibleForFinancialAid.  I
did not claim that one can correctly analyze a complex system at one go,
it's an iterative process of trial and error.  Besides,  your example is
*not* about naming issues (as you understand yourself) -- presumably there
was no ambiguity about the "student" entity .

> It does help if there is a well-maintained and easily used catalog /
> dictionary / metadata repository.  But words are just that.  --dawn
dawn - 11 Aug 2005 15:51 GMT
> ....
> >
[quoted text clipped - 25 lines]
> Apparently, the analysts made a mistake in assuming that the set of
> fullTimeStudents is equal to the set of studentsEligibleForFinancialAid.

In this case, yes, but it also happens frequently where such a term is
used the same when the analysis is done, but something changes
(government regulation or something more subtle) that changes the
meaning slightly for one group and not another, so that these
differences creep in.

> I
> did not claim that one can correctly analyze a complex system at one go,
> it's an iterative process of trial and error.

and needs to be attended to for the life of the attribute name

> Besides,  your example is
> *not* about naming issues (as you understand yourself)

I thought it was about the name and def of an attribute.

> -- presumably there
> was no ambiguity about the "student" entity .

There are always differences of opinion about what constitutes a
student on a campus.  Finance people often use the term as if the
student were the same as a corporate customer.  Student = Customer.  If
someone has received some approval to audit a course for zero dollars,
the instructor might consider them a student.  That is just an example,
but the point is that entity names are also just words and are
interpreted by humans, each of whom brings a different context to the
meaning of the word.

I've been reading and writing too fast lately and might have missed the
point, so I'll re-read the thread before posting again.
cheers!  --dawn
Gene Wirchenko - 11 Aug 2005 17:58 GMT
[snip]

>> > It is usually much more subtle than that.  Everyone agrees that we need
>> > to know whether or not someone is a fullTimeStudent.  Ignore the fact
[quoted text clipped - 13 lines]
>> > might even both have external reasons to use the very same term, and
>> > they use it with just slightly different meanings.

>> Apparently, the analysts made a mistake in assuming that the set of
>> fullTimeStudents is equal to the set of studentsEligibleForFinancialAid.
[quoted text clipped - 4 lines]
>meaning slightly for one group and not another, so that these
>differences creep in.

    In British Columbia (and presumably Canada since I have seen
federal use of this meaning), you can be a full-time student by taking
three three-credit courses in a semester.  The usual full course load
is five.  This is not the commonsense definition, but it is the
definition used.

[snip]

>There are always differences of opinion about what constitutes a
>student on a campus.  Finance people often use the term as if the
[quoted text clipped - 4 lines]
>interpreted by humans, each of whom brings a different context to the
>meaning of the word.

    Such a student is a student by the normal use of the term.  I
think this factor is what causes a lot of the trouble.

    At my alma mater, there are three major classifications: student,
faculty, and staff.  They are not mutually exclusive.  I have known of
faculty who were students and staff who were faculty.  There is
nothing stopping a staff member from taking a course (making him also
a student) or for someone to be in all three categories at the same
time.
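
One way to sketch classifications that are not mutually exclusive is a set of independent flags rather than a single-valued status. The `Role` type below is a hypothetical illustration using Python's standard enum module, not a description of any real campus system:

```python
from enum import Flag, auto

class Role(Flag):
    """Campus classifications that may be combined freely."""
    STUDENT = auto()
    FACULTY = auto()
    STAFF = auto()

# A staff member who is also taking a course.
person = Role.STAFF | Role.STUDENT
is_student = Role.STUDENT in person
is_faculty = Role.FACULTY in person

# Someone in all three categories at the same time.
everything = Role.STUDENT | Role.FACULTY | Role.STAFF
in_all_three = all(role in everything for role in Role)
```

A single status column with exactly one of the three values could not record either of these people correctly.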

    I am a resident of the U.S.A.  I am not a U.S. citizen.  If
someone conflates the two, there could be a problem.

>I've been reading and writing too fast lately and might have missed the
>point, so I'll re-read the thread before posting again.
>cheers!  --dawn

    I think you are doing fine.

Sincerely,

Gene Wirchenko
mAsterdam - 11 Aug 2005 19:14 GMT
> [snip]
>>>>...two departments use the very
[quoted text clipped - 12 lines]
> is five.  This is not the commonsense definition, but it is the
> definition used.

Used by all? Or only by non-commonsensical people?
I'm overstating here surely, but I want to point out
that this definition is for a purpose.
People/businesses/departments who support this purpose
will tend to use it - and check some register or student card
to verify whether somebody who claims to be a student actually is.
Others will simply ask: are you a student?
(e.g. for downloading some software) and accept the
answer as truth.

>>There are always differences of opinion about what constitutes a
>>student on a campus.  Finance people often use the term as if the
[quoted text clipped - 7 lines]
>      Such a student is a student by the normal use of the term.
> I think this factor is what causes a lot of the trouble.

Could you elaborate some on this factor?

>      At my alma mater, there are three major classifications: student,
> faculty, and staff.  They are not mutually exclusive.  I have known of
> faculty who were students and staff who were faculty.  There is
> nothing stopping a staff member from taking a course (making him also
> a student) or for someone to be in all three categories at the same
> time.

Let's not draw subtyping into it at this point.
(Other thread welcome :-)
Gene Wirchenko - 11 Aug 2005 19:28 GMT
>> [snip]
>>>>>...two departments use the very
[quoted text clipped - 14 lines]
>
>Used by all? Or only by non-commonsensical people?

    By the common definition used at such institutions.  This is
probably informed by the fact that these definitions are used by the provincial
and federal governments for student loans and on tax returns.

    Others are free to use a more literal meaning.

>I'm overstating here surely, but I want to point out
>that is definition is for a purpose.
[quoted text clipped - 4 lines]
>(e.g. for downloading some software) and accept the
>answer as truth.

    There is no argument from me on that.

>>>There are always differences of opinion about what constitutes a
>>>student on a campus.  Finance people often use the term as if the
[quoted text clipped - 9 lines]
>
>Could you elaborate some on this factor?

    One who studies.  If I study medieval history, I am a student.  I
might not be enrolled anywhere.  I could even be a leading authority
in the field.

>>      At my alma mater, there are three major classifications: student,
>> faculty, and staff.  They are not mutually exclusive.  I have known of
[quoted text clipped - 5 lines]
>Let's not draw subtyping into it at this point.
>(Other thread welcome :-)

    I am not subtyping, just saying that the statuses are not
mutually exclusive.

Sincerely,

Gene Wirchenko
mAsterdam - 11 Aug 2005 19:38 GMT
[snip agreement]

>>>>There are always differences of opinion about what constitutes a
>>>>student on a campus.  Finance people often use the term as if the
[quoted text clipped - 13 lines]
> might not be enrolled anywhere.  I could even be a leading authority
> in the field.

I see what you mean, but I am not sure you got my question right.
I meant: what is this factor which is causing a lot of trouble?
In more modern words: what is the anatomy of this anti-pattern?
We might learn to more easily recognize it.

>>>     At my alma mater, there are three major classifications: student,
>>>faculty, and staff.  They are not mutually exclusive.  I have known of
[quoted text clipped - 8 lines]
>      I am not subtyping, just saying that the statuses are not
> mutually exclusive.

Ok.
Gene Wirchenko - 11 Aug 2005 20:09 GMT
>[snip agreement]
>
[quoted text clipped - 20 lines]
>In more modern words: what is the anatomy of this anti-pattern?
>We might learn to more easily recognize it.

    I think that the trouble comes from overloading terms.  "student"
already has a meaning.  What distinguishes the special meaning from
the more literal meaning?  If I do not know that a special meaning is
in use in a specific context, I can make a lot of mistakes.

    I coin terms for our in-house client billing system.  Two
examples are "Work Function Code" and "Work Classification Code".
These terms have precise meanings.  It is possible for someone to
misinterpret these, but I think that they are sufficiently unusual
usage that most would ask what they mean instead of assuming as with
"student".

    One area of confusion we have is because of overloading.  We use
"client" to mean someone who buys from us (mainly services, for
example order fulfillment) and "customer" to refer to someone who buys
from one of our clients.  Some of our employees do not make the proper
(for us) distinction.

[snip]

Sincerely,

Gene Wirchenko
mAsterdam - 12 Aug 2005 00:50 GMT
>>[snip agreement]
>>>>>>There are always differences of opinion about what constitutes a
[quoted text clipped - 24 lines]
> the more literal meaning?  If I do not know that a special meaning is
> in use in a specific context, I can make a lot of mistakes.

Yep. One trick is not (just) to ask whether
someone is a student or not, but to ask for details about
the student's registration (whether they are really
checked is an issue, depending on other, maybe later
requirements - first make them checkable).

>      I coin terms for our in-house client billing system.  Two
> examples are "Work Function Code" and "Work Classification Code".
> These terms have precise meanings.  It is possible for someone to
> misinterpret these, but I think that they are sufficiently unusual
> usage that most would ask what they mean instead of assuming as with
> "student".

Arrrgh - feast of recognition. Not. :-|
I'm used to similar systems. Some departments take these
distinctions very seriously, others only pay lip service.
Once in a while new managers want to know what's going
on and suddenly all kinds of conclusions hit the surface,
drawn from the highly polluted data.

>      One area of confusion we have is because of overloading.  We use
> "client" to mean someone who buys from us (mainly services, for
> example order fulfillment) and "customer" to refer to someone who buys
> from one of our clients.  Some of our employees do not make the proper
> (for us) distinction.

You know what CICS stands for?
paul c - 12 Aug 2005 03:28 GMT
>>> [snip agreement]
>>>
[quoted text clipped - 36 lines]
> requirements - first make them checkable).
> ...

i may be stepping on nuances that i haven't noticed in this thread, but
i think the above is getting close to the truth, at least the truth
these days.  so far, databases ARE naive and so are their "tests".  for
example, if a user thinks someone is a student and can "fill in" the
values that the db predicates want for a student, then the someone is a
student as far as the db is concerned, no matter what anyone else thinks.

pc
Gene Wirchenko - 12 Aug 2005 18:52 GMT
[snip]

>>      One area of confusion we have is because of overloading.  We use
>> "client" to mean someone who buys from us (mainly services, for
[quoted text clipped - 3 lines]
>
>You know what CICS stands for?

    I do not presume to know what an acronym stands for when
expressed out of context.  http://www.acronyms.ch/ says "Customer
Information Control System", but this is not a definition, but merely
an expansion, and I do not know if it is the one that you are thinking
of.

    "IDE" can mean "Integrated Development Environment" or
"Integrated Device Electronics", and formerly meant "Integrated Drive
Electronics".  As a software sort who knows some hardware, I can easily
use either.

Sincerely,

Gene Wirchenko
mAsterdam - 13 Aug 2005 01:28 GMT
> [snip]
>>>     One area of confusion we have is because of overloading.  We use
[quoted text clipped - 10 lines]
> an expansion, and I do not know if it is the one that you are thinking
> of.

Oops. I was told it originally meant
"Customer Information and Client Support".
The only piece ever really built was the TP-monitor
- but the acronym stuck as the name.
I liked the story. I never checked it though, and now
I can't find any source for it. Sorry.

>      "IDE" can mean "Integrated Development Environment" or
> "Integrated Device Electronics", and formerly meant "Integrated Drive
> Electronics".  As a software sort who know some hardware, I can easily
> use either.
David  Cressey - 13 Aug 2005 16:30 GMT
"Gene Wirchenko"

Then there's PCMCIA, which expands to "People Can't Memorize Computer
Industry Acronyms"
Marshall  Spight - 13 Aug 2005 18:43 GMT
> Then there's PCMCIA, which expands to "People Can't Memorize Computer
> Industry Acronyms"

My favorite is still TWAIN: "Technology Without An Important Name."

Marshall
Gene Wirchenko - 15 Aug 2005 23:00 GMT
>> Then there's PCMCIA, which expands to "People Can't Memorize Computer
>> Industry Acronyms"

    Actually, "Personal Computer Memory Card International
Association", but the smart expansion is insidious as it is much more
mnemonic.

>My favorite is still TWAIN: "Technology Without An Important Name."
                                                   ^^^^^^^^^
    "Interesting"?

    I like "SCSI" for the pronunciation.

Sincerely,

Gene Wirchenko
vc - 11 Aug 2005 18:38 GMT
> > ....
> > >
[quoted text clipped - 31 lines]
> meaning slightly for one group and not another, so that these
> differences creep in.

That's life.  If the analyst was unable to anticipate some changes,
then they can be introduced later.  It's called schema evolution.

> > I
> > did not claim that one can correctly analyze a complex system at one go,
[quoted text clipped - 6 lines]
>
> I thought it was about the name and def of an attribute.

It most certainly was not.  It was about an incorrectly specified
predicate defining the set of students eligible for financial aid.  What
attribute (and its name) did you have in mind ?

> > -- presumably there
> > was no ambiguity about the "student" entity .
[quoted text clipped - 4 lines]
> someone has received some approval to audit a course for zero dollars,
> the instructor might consider them a student.

In this case, during the analysis stage, one could have identified a
more generic entity, say, Person with entity subtypes of Student and
Customer.

> That is just an example,
> but the point is that entity names are also just words and are
> interpreted by humans, each of whom brings a different context to the
> meaning of the word.

You are quite right that names are just words without any specific
meaning [in the modelling context].  That's why it's necessary to
identify the actual entities (attributes, relations,  etc)  first and
then give them names.  That's what, among other things,  modelling is
about, no ?

> I've been reading and writing too fast lately and might have missed the
> point, so I'll re-read the thread before posting again.
> cheers!  --dawn
dawn - 15 Aug 2005 01:12 GMT
> > > "dawn" <dawnwolthuis@gmail.com> wrote in message
<snip>

> You are quite right that names are just words without any specific
> meaning [in the modelling context].  That's why it's necessary to
> identify the actual entities (attributes, relations,  etc)  first and
> then give them names.  That's what, among other things,  modelling is
> about, no ?

Yup, I'm with you -- I thought that we were discussing the other
direction: interpretation.  The database has values and metadata,
perhaps for years, and then there is disagreement on the
interpretation.  That is exceedingly common.  I think your point was
that if there is such misinterpretation during the analysis/design
phases, then the work isn't yet done and I agree.  
Cheers!  --dawn
mAsterdam - 11 Aug 2005 18:58 GMT
>>>>...If the modellers chose the
>>>>name/label "source" and did not define what
[quoted text clipped - 23 lines]
>>Apparently, the analysts made a mistake in assuming that the set of
>>fullTimeStudents is equal to the set of studentsEligibleForFinancialAid.

This assumes perfect and lasting information at modelling time.

> In this case, yes, but it also happens frequently where such a term is
> used the same when the analysis is done, but something changes
[quoted text clipped - 28 lines]
> point, so I'll re-read the thread before posting again.
> cheers!  --dawn

This is a sub-thread about synonym/homonym
problems, but this group does not tend to change
the subject line appropriately (I tried a few times,
but it did not really work). In the sub-thread your
contribution is right on the mark, IMO.
mAsterdam - 11 Aug 2005 18:13 GMT
>>>>>>>>...The two words,  "synonym" and "homonym"  are borrowed from
>>>>>>>>the argot of natural linguistics,  but the two problems arise
[quoted text clipped - 22 lines]
> terminology they  want to use and what the terms are
> supposed to mean.  

Let me get this straight. In your methodology the terminology
is ready (agreed upon) before the modelling starts?

> It's, like, introduction to modelling 101.  

What is 101?

> Besides,  you describe a hypothetical terminology
> selection/definition process yourself, so it's not clear what the problem
> might be unless the "team" neglects to identify, say, data objects and
> relationships [self-infilcts potential pain because of not doing required
> work].

How do you propose to identify data objects and
relationships from a requirement "It should be
possible to find out where a piece
of information came from."?

>>Another, less cryptic example:
>>
[quoted text clipped - 12 lines]
>
> You are kidding, right ?

No.

> If the modellers chose the name/label "source" and
> did not define what  entity the name refers to,
> then the name is just meaningless, like say "fshsalkfd".
> Apparently,  your hypothetical modellers
> are not modellers but some kind of impostors.

Please take the drivers' seat. Show us the real thing.
Pretend you are modelling the data to meet the requirement.
Feel free to ask relevant questions/check assumptions about it.

>>Both the synonym-problem (1) and the homonym-problem (2) may
>>very well be recognized and resolved, of course.
[quoted text clipped - 3 lines]
> [literary] fiction where one needs to bother with stuff like
> synonyms,  homonyms, metaphors, metonymy and what not.

If you don't bother with that "stuff" your work will be
exactly that: a work of fiction.

> Just identify the entities,  invent (or use commonly
> accepted ) names for them and you'll be a happy
> camper without any need to hide behind high-faluting
> nonsense like "synonym problem", "conceptual
> object type" or some such.

- I am not hiding at all.

- Please refrain from attributing terms to me I did not use.
vc - 11 Aug 2005 20:01 GMT
> > Presumably the team has meetings at which they discuss the
> > stuff they interested in and come to some agreement as to what
[quoted text clipped - 3 lines]
> Let me get this straight. In your methodology the terminology
> is ready (agreed upon) before the modelling starts?

That's not what I said.  I wrote 'they discuss the stuff they are
interested in and come to an agreement as to what terminology they want
to use'.  It can be rephrased as:

Step one: discover the items/entities of interest;
Step two: give the entities names (or use the existing names if it
makes sense)

> > It's, like, introduction to modelling 101.
>
> What is 101?

'101' is a synonym of 'introductory'.

> > Besides,  you describe a hypothetical terminology
> > selection/definition process yourself, so it's not clear what the problem
[quoted text clipped - 6 lines]
> possible to find out where a piece
> of information came from."?

Presumably 'a piece of information' can come from a newspaper article,
gossip, bank statement, or any other entity capable of producing such
piece of information.  What's the problem with identifying the
information source ?  If the requirement is literally as general as you
wrote,  then it needs to be made more specific in order to be
realistically implemented.

> >>Another, less cryptic example:
> >>
[quoted text clipped - 24 lines]
> Pretend you are modelling the data to meet the requirement.
> Feel free to ask relevant questions/check assumptions about it.

I do not want to 'take the driver's seat' whatever it means.  *You*
claimed that there is a 'homonym/synonym problem' with respect to
naming entities and attributes,  therefore,  the burden of proof showing
that such problem does in fact exist is squarely on *your* shoulders.
So far,  you failed to provide the proof:  your example either can be
handled by conventional entity-relationship methods, or is
under-specified to the extent of making almost no sense.

> >>Both the synonym-problem (1) and the homonym-problem (2) may
> >>very well be recognized and resolved, of course.
[quoted text clipped - 6 lines]
> If you don't bother with that "stuff" your work will be
> exactly that: a work of fiction.

If you say so.

> > Just identify the entities,  invent (or use commonly
> > accepted ) names for them and you'll be a happy
[quoted text clipped - 5 lines]
>
> - Please refrain from attributing terms to me I did not use.

"You" can be used,  in English,  as an impersonal pronoun referring to
anyone out there engaged in data modelling.  It was used in this sense
- nothing personal.

Cheers.
mAsterdam - 11 Aug 2005 20:52 GMT
>>>Presumably the team has meetings at which they discuss the
>>>stuff they interested in and come to some agreement as to what
[quoted text clipped - 11 lines]
> Step two: give the entities names (or use the existing names if it
> makes sense)

And they wouldn't make sense if they were syn/homonyms?

>>>It's, like, introduction to modelling 101.
>>
>>What is 101?
>
> '101' is a synonym of 'introductory'.
Thx.

>>>Besides,  you describe a hypothetical terminology
>>>selection/definition process yourself, so it's not clear what the problem
[quoted text clipped - 13 lines]
> wrote,  then it needs to be made more specific in order to be
> realistically implemented.

Of course - so what are the questions to ask to
get the specifics of the requirement?
Extrapolating from your assumption (newspaper article, etc)
you'd ask about /which/ pieces of information you want
to know where it came from.
Another one would be: what is it you want to know about
where it came from?
Relevant, no doubt - but not addressing the syn/homonym problem.
But that makes sense, since you also stated that it
shouldn't occur (I take it that 'self-induced'
implies 'mistake' here - please correct me if I
misinterpreted).

>>>>Another, less cryptic example:
>>>>
[quoted text clipped - 32 lines]
> handled by conventional entity-relationship methods, or is
> under-specified to the extent of making almost no sense.

"proof" (whatever that word means in this context)
is too much to ask.

In modelling sessions I recognize that
people spend a lot of time to get the meaning of terms
right for use in their model (IMO this is time well spent,
especially for large systems). Some of the effort revolves
around homonyms and synonyms. From what I read you explain
this effort as (just) correcting mistakes. I think there
is more about this that can be dealt with in a systemic way - but
only (this is just a hunch) if we appreciate that this is
inherent to team modelling.

>>>>Both the synonym-problem (1) and the homonym-problem (2) may
>>>>very well be recognized and resolved, of course.
[quoted text clipped - 22 lines]
> anyone out there engaged in data modelling.  It was used in this sense
> - nothing personal.

Not only in English. I don't like to be put on one floor with
high-faluting nonsense.
VC - 12 Aug 2005 03:19 GMT
>>>>Presumably the team has meetings at which they discuss the
>>>>stuff they are interested in and come to some agreement as to what
[quoted text clipped - 13 lines]
>
> And they wouldn't make sense if they were syn/homonyms?

Of course not.  Why would one want to use the same name for two different
entities [self-inflicted pain] ?  If imagination is lacking, and one prefers
to call an entity a Thing, one can use at least Thing1,  Thing2, ...  ad
infinitum,  if needed,  in order to avoid the homonym problem.  Synonyms
are even easier:  just use one name,  not two or more,  for the same entity
and you should be all set.
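
The rule above (one name per entity, one entity per name) can be checked
mechanically once the entities have been identified.  A minimal sketch in
Python, assuming a glossary of (name, entity) pairs; the function name
naming_conflicts, the glossary format and the sample entries are all
hypothetical and exist only for illustration:

```python
from collections import defaultdict


def naming_conflicts(glossary):
    """Find homonyms and synonyms in an iterable of (name, entity) pairs.

    Assumption: each entity is identified by a stable key that the
    modellers have already agreed on.
    """
    entities_by_name = defaultdict(set)
    names_by_entity = defaultdict(set)
    for name, entity in glossary:
        entities_by_name[name].add(entity)
        names_by_entity[entity].add(name)

    # Homonym: one name used for two or more different entities.
    homonyms = {name: sorted(entities)
                for name, entities in entities_by_name.items()
                if len(entities) > 1}
    # Synonym: one entity known under two or more different names.
    synonyms = {entity: sorted(names)
                for entity, names in names_by_entity.items()
                if len(names) > 1}
    return homonyms, synonyms


# Hypothetical glossary collected from two sub-teams.
glossary = [
    ("Customer", "party_who_buys"),
    ("Client", "party_who_buys"),
    ("Account", "ledger_account"),
    ("Account", "login_account"),
]
homonyms, synonyms = naming_conflicts(glossary)
print(homonyms)
print(synonyms)
```

The check only works if the entity keys are already right;  it says nothing
about whether two keys really denote the same real-world object.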

>>>How do you propose to identify data objects and
>>>relationships from a requirement "It should be
[quoted text clipped - 15 lines]
> Another one would be: what is it you want to know about
> where it came from?

Well,  the attributes one wants to model surely depend on what the customer
wants to do with them,  sort of obvious, no ?  Why not ask the customer
directly about that ?  E.g. with respect to a newspaper source,  something like
the publisher,  circulation,  font, whatever;  it really depends on what the
customer wants to know.

> Relevant, no doubt - but not addressing the syn/homonym problem.
> But that makes sense, since you also stated that it
> shouldn't occur (I take it that 'self-induced'
> implies 'mistake' here - please correct me if I
> misinterpreted).

Right.  I consider mistakes in trying to identify entities and their set of
attributes (as well as relationships between entities),  a more serious
problem than largely overblown issues with synonyms and homonyms which are
easy to avoid.

> In modelling sessions I recognize that
> people spend a lot of time to get the meaning of terms
> right for use in their model (IMO this is time well spent,
> especially for large systems).

The term's meaning is the entity/attribute it names.  When divorced from the
object it names,  the term is just a meaningless string of characters.
No doubt,  naming conventions are very important to make the understanding
of a data model easier for humans (by analogy,  association, etc),  however,
terminology is secondary in importance to the entity/attribute/relationship
discovery process.

> Some of the effort revolves
> around homonyms and synonyms. From what I read you explain
> this effort as (just) correcting mistakes. I think there
> is more about this that can be dealt with in a systemic way - but
> only (this is just a hunch) if we appreciate that this is
> inherent to team modelling.

See above.
Gene Wirchenko - 12 Aug 2005 18:52 GMT
[snip]

>Of course not.  Why would one want to use the same name for two different
>entities [self-inflicted pain] ?  If imagination is lacking, and one prefers

    It can happen when two different points-of-view intersect.
Realising that two apparently different entities are actually the same
can be tricky, especially when they appear at first to be different.

    On the other side, realising that you are dealing not with
identities but distinct things -- particularly when the entities are
similar -- can also be tricky.

>to call an entity a Thing, one can use at least Thing1,  Thing2, ...  ad
>infinitum,  if needed,  in order to avoid the homonym problem.  Synonyms
>are even easier:  just use one name,  not two or more,  for the same entity
>and you should be all set.

    If you realise that you have such a situation.

[snip]

Sincerely,

Gene Wirchenko
VC - 12 Aug 2005 21:18 GMT
> [snip]
>
[quoted text clipped - 5 lines]
> Realising that two apparently different entities are actually the same
> can be tricky, especially when they appear at first to be different.

Right.  Presumably the real world object would be modelled by two entities
with different sets of attributes (otherwise it would be easy to recognize
that two, or more,  entities represent the same object).  Therefore, from
the point of view of the model itself,  there are two different entities with
two different names,  thus the dreaded "synonym problem" simply cannot occur.

>     On the other side, realising that you are dealing not with
> identities but distinct things -- particularly when the entities are
> similar -- can also be tricky.

Right.  That's why modelling is a trial-and-error process.

>>to call an entity a Thing, one can use at least Thing1,  Thing2, ...  ad
>>infinitum,  if needed,  in order to avoid the homonym problem.  Synonyms
[quoted text clipped - 3 lines]
>
>     If you realise that you have such a situation.

Even if you don't,  the error is not a naming problem (synonym/homonym and
such) but rather a mistake in correctly identifying the real world objects
and their attributes of interest.

> [snip]
>
> Sincerely,
>
> Gene Wirchenko
Jonathan Leffler - 13 Aug 2005 08:14 GMT
>> It's, like, introduction to modelling 101.  
> What is 101?

In the USA, the first course in a given subject seems to be 'Subject
101'; subsequent courses in the same subject get larger numbers (102,
201, dunno what the sequence normally is, and it likely varies between
institutions anyway).  I'm not clear whether this applies in regular
schools (K-12 - meaning kindergarten to grade 12, or ages 5-18) or
whether it really only applies to university courses.  (And, just to add
to the confusion, when they ask you where you went to school, Americans
most often mean where did you go to university.  Isn't it fun sharing a
common language!)

So 'Modelling 101' is a basic course in 'Modelling'.

Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/

Marshall  Spight - 13 Aug 2005 16:11 GMT
> > > It's, like, introduction to modelling 101.
> >
[quoted text clipped - 6 lines]
> schools (K-12 - meaning kindergarten to grade 12, or ages 5-18) or
> whether it really only applies to university courses.

It only applies to college/university.

As an aside, also in the USA, we call it "college" even if it's a
university. We only preserve the "college/university" distinction
in the names of the institutions. You could say "he went to college
at Harvard" just as well as you could say "he went to college at
Snakewater Community College." I understand the distinction is
important in other dialects of English, but it's not made in the
USA. Anti-classism, maybe? (Just a wild speculation; no flames please.)
I've never heard anyone say "where did you go to university?" who
was a native of the USA.

> (And, just to add
> to the confusion, when they ask you where you went to school, Americans
> most often mean where did you go to university.  Isn't it fun sharing a
> common language!)

Just so!

I also note that 101 is an important freeway in California; it runs
from San Francisco to San Jose (and on to less interesting places like
Los Angeles, ha ha) which means it's the primary artery for Silicon
Valley. 101 is a section of my commute, and has been for most of my
adult life.

Marshall
mAsterdam - 16 Aug 2005 00:17 GMT
>>> It's, like, introduction to modelling 101.  
>>
[quoted text clipped - 11 lines]
>
> So 'Modelling 101' is a basic course in 'Modelling'.

Thank you :-)

The modellers I was talking about were real people,
well educated (most of them qualified to teach way
beyond modelling 101) and well behaved.
Frank_Hamersley - 16 Aug 2005 02:47 GMT
> Jonathan Leffler wrote:
> >>> It's, like, introduction to modelling 101.
[quoted text clipped - 11 lines]
> >
> > So 'Modelling 101' is a basic course in 'Modelling'.

<OT continued>

In my neck of the woods the first digit prescribed the undergraduate year
number and the remaining digits were used to identify sub-courses.  We
rarely used '01' - but any zeros usually correlated with more
stature/difficulty.  For instance Chem 100 was followed by Chem 200 and
finally Chem 300 if pursuing a major in Chem for a B.Sc.  Chem 210 (Organic)
and 220 (Inorganic) were implied by enrolling for Chem 200.  Often course
numbers like Biometrics 221 and 222 were single semester/term subjects on a
narrow topic.

So Modelling 101 (IMO) barely qualifies you to do anything - in fact it
prolly increases project risk significantly if a so-called practitioner gets
into the workforce on the strength of it! ;-)  Cheers Frank.
paul c - 10 Aug 2005 01:16 GMT
> [snip]
>
[quoted text clipped - 8 lines]
> with teams. I never experienced the problem not coming up.
> ...

maybe you were just lucky.  just kidding!

cheers,
pc