Database Forum / General DB Topics / DB Theory / December 2007
Character string relation and functional dependencies
|
|
Thread rating:  |
Tegiri Nenashi - 03 Dec 2007 19:16 GMT The connection between objects and databases seems to be a recurring theme here on c.d.t, so I'd like to contribute one observation. Consider a Substring relation:
Substring ======= data | fragment ------------------- abab | ab abc | ab abc | bc ....
This is pretty unintersting relation until we add more attributes: the "from" and "to" indicating the position of the fragment inside the data, and "#" -- the occurence number. (All numbers are 0-based, intervals boudaries are including-excluding). The example becomes:
Substring ======= data | fragment | from | to | # ------------------------------------ abab | ab | 0 | 2 | 0 abab | ab | 2 | 4 | 1 abc | ab | 0 | 2 | 0 abc | bc | 1 | 3 | 0 ....
There are at least 3 functional dependencies: fragment, from -> to data, fragment, # -> from data, from, to -> fragment
Informally these FDs correspond to the length(), instr(), and substr() functions. So instead of talking about the class String with the length(), instr(), and substr() member functions, we can focus on a relation and functional dependencies....
rpost - 06 Dec 2007 17:40 GMT [...]
>Informally these FDs correspond to the length(), instr(), and substr() >functions. So instead of talking about the class String with the >length(), instr(), and substr() member functions, we can focus on a >relation and functional dependencies.... Certainly. But we can't describe the full semantics of strings in that way. How do you represent concatenation?
Another difference is that database tables are finite and variable, while the set of strings is infinite and fixed. But your idea is used in theoretical papers, where concrete domains are sometimes introduced in terms of fixed infinite tables.
 Signature Reinier Post TU Eindhoven
Tegiri Nenashi - 06 Dec 2007 18:15 GMT > [...] > [quoted text clipped - 5 lines] > Certainly. But we can't describe the full semantics of strings > in that way. How do you represent concatenation? That was not the right representation, consider the relation:
str | prefix | suffix | pos ----------------------------- abcd | ab | cd | 2 abac | ab | ac | 2 ....
Functional dependencies: prefix & suffix -> str (corresponds to concatenation) prefix -> pos (length) str, pos -> prefix (prefix substring operation) str, pos -> suffix (suffix substring operation)
The combination of the latter two (first extract suffix, and then extract prefix) is the usual substring operation: substring(str, pos, length).
Question: how do we represent the "instr" operator?
> Another difference is that database tables are finite and variable, Oh, relations in database world are certainly not restricted by finite cardinality.
Jonathan Leffler - 06 Dec 2007 22:38 GMT >> Another difference is that database tables are finite and variable, > > Oh, relations in database world are certainly not restricted by finite > cardinality. I thought that computers are finite, so the relations containable in them are too - even if damn large. There's a big difference between very large and infinite.
One ultimate limitation is the uniqueness requirement. Suppose you have a table with two integer columns. Since the range of the integer types are finite (even if your DBMS handles multi-precision integers), then the maximum number of distinct rows in the relation is also finite.
 Signature Jonathan Leffler #include <disclaimer.h> Email: jleffler@earthlink.net, jleffler@us.ibm.com Guardian of DBD::Informix v2007.0914 -- http://dbi.perl.org/
publictimestamp.org/ptb/PTB-1963 whirlpool 2007-12-06 21:00:03 7275FCF2EDF20725F56D081C5528272FCD717970FCEE5D43EDE454007D35AD3E246B5C FA6134D8AD5E2A60C6F43508ADFAE3E632D92807ED38395BFD14DA3EA
Tegiri Nenashi - 07 Dec 2007 00:24 GMT > >> Another difference is that database tables are finite and variable, > [quoted text clipped - 4 lines] > them are too - even if damn large. There's a big difference between > very large and infinite. This doesn't really matter. You can still reason about infinite relations with finite resources available on you computer platform.
> One ultimate limitation is the uniqueness requirement. Suppose you have > a table with two integer columns. Since the range of the integer types > are finite (even if your DBMS handles multi-precision integers), then > the maximum number of distinct rows in the relation is also finite. All computer algebra systems work with numbers which are not restricted by a whim of hardware architects. 16/32/64 bit integer numbers (let alone floats)? give me a break!
Marshall - 07 Dec 2007 02:09 GMT > > >> Another difference is that database tables are finite and variable, > [quoted text clipped - 16 lines] > restricted by a whim of hardware architects. 16/32/64 bit integer > numbers (let alone floats)? give me a break! In fact, it is quite possible to treat n-bit "integers" in an algebraically pleasant way: as the ring of integers mod 2^n.
Marshall
V.J. Kumar - 09 Dec 2007 04:05 GMT >> >> Another difference is that database tables are finite and >> >> variable, [quoted text clipped - 8 lines] > This doesn't really matter. You can still reason about infinite > relations You can do that with your brain...
with finite resources available on you computer platform.
but not with that. The computer is an intrinsically finite gadget. Therefore, you'd better use the finite model apparatus to reason about things like the impossibility of expessing transitive closure in the relational algebra. A lot of stuff like the compactness theorem does not work with finite models which makes infinite model proofs inapplicable in the finite case.
>> One ultimate limitation is the uniqueness requirement. Suppose you >> have a table with two integer columns. Since the range of the [quoted text clipped - 5 lines] > restricted by a whim of hardware architects. 16/32/64 bit integer > numbers (let alone floats)? give me a break! Jonathan is of course right, the set of 'floats' is clearly finite that somewhat clumsily approximates real numbers !
Marshall - 09 Dec 2007 04:38 GMT > >> >> Another difference is that database tables are finite and > >> >> variable, [quoted text clipped - 32 lines] > Jonathan is of course right, the set of 'floats' is clearly finite that > somewhat clumsily approximates real numbers ! You fail.
Marshall
Tegiri Nenashi - 10 Dec 2007 18:59 GMT > >> >> Another difference is that database tables are finite and > >> >> variable, [quoted text clipped - 19 lines] > not work with finite models which makes infinite model proofs > inapplicable in the finite case. Let's make a discussion al little bit more specific. Consider language recognition problem: given a word (that is a string of symbols) and a grammar, check if the word generated by the grammar. A naive evaluation would simply start by generating all the words of a (potentially infinite) langauge generated by the grammar, and stops as soon the given word is derived.
Likewise, given a query which involves infinite relations we can have multiple execution strategies, some of them are not terminating. For example, a join of the finite relation
x=1
with infinite relation
x=y
has two alternative executions. We can start enumerating all the tuples in the infinite relation x=y (assuming a countable domain)....
... and will never finish.
However, if we start with the finite relation x=1, the we can fetch all the matching tuples from x=y by "the index lookup".
V.J. Kumar - 11 Dec 2007 14:08 GMT >> Tegiri Nenashi <TegiriNena...@gmail.com> wrote >> innews:cd2ad8bd-78ca-433d-bc0f-3e7ef0c0fe2a@e6g2000prf.googlegroups.co [quoted text clipped - 48 lines] > However, if we start with the finite relation x=1, the we can fetch > all the matching tuples from x=y by "the index lookup". The formulas look very cute, no doubt about that, but immediate questions are:
how do you propose to index infinite relations ? what do you do when you join two infinite relations and the result is also an infinite relation: x < 5 join x > 1 where x ranges over reals ?
Marshall - 11 Dec 2007 14:57 GMT > >> Tegiri Nenashi <TegiriNena...@gmail.com> wrote > >> innews:cd2ad8bd-78ca-433d-bc0f-3e7ef0c0fe2a@e6g2000prf.googlegroups.co [quoted text clipped - 53 lines] > > how do you propose to index infinite relations ? By formula. For the infinite relation x=y, given x, the formula for y is straightforward. :-)
> what do you do when you join two infinite relations and the result is also > an infinite relation: x < 5 join x > 1 where x ranges over reals ? You leave it unevaluated until something extensional comes along.
Marshall
David Cressey - 11 Dec 2007 16:50 GMT > >> Tegiri Nenashi <TegiriNena...@gmail.com> wrote > >> innews:cd2ad8bd-78ca-433d-bc0f-3e7ef0c0fe2a@e6g2000prf.googlegroups.co [quoted text clipped - 53 lines] > > how do you propose to index infinite relations ? With an infinite index, of course.
Tegiri Nenashi - 11 Dec 2007 17:20 GMT > The formulas look very cute, no doubt about that, but immediate questions > are: > > how do you propose to index infinite relations ? First, let's establish the idea that an index for a finite relation R(x,y) is a function. Given x=1 an index function returns all the "matching" tuples, or a single tuple if there is a functional dependency x->y. For an equality relation x=y this function is identity: given x, return x.
> what do you do when you join two infinite relations and the result is also > an infinite relation: x < 5 join x > 1 where x ranges over reals ? Well, let's not get into incountable domains, because there appeared no way to apply the ideas of pipelined execution flow there. Also, as Marshal noted, it is perfectly allright for an intermediatory query evaluation result to be infinite; it is the final result where a user is expected to see the finite set.
V.J. Kumar - 11 Dec 2007 19:29 GMT >> The formulas look very cute, no doubt about that, but immediate >> questions are: [quoted text clipped - 3 lines] > First, let's establish the idea that an index for a finite relation > R(x,y) is a function. First of all, if you introduce the arbitrary function as an additional primitive, deus ex machina as it were, then in many cases you do not need any of the finitely definable infinite relation apparatus. SQL already has built-in or user-defined functions that make possible evaluating queries over a finite tuple domain with embedded infinitary structures. You need to try and express query evaluation in your model with relations only -- that's the whole point !
>Given x=1 an index function returns all the > "matching" tuples, or a single tuple if there is a functional [quoted text clipped - 7 lines] > Well, let's not get into incountable domains, because there appeared > no way to apply the ideas of pipelined execution flow there. I do not know what "pipelined execution flow" is, but the uncountable case is what might actually motivate your research, as in the spacial- temporal database area. The countable case by itself is not very interesting especially if the user "is expected to see the finite set".
If you admit first-order formulas ("extended tuples") as a legitimate query evaluation result, then your ideas coincide with Kanellakis's ideas wrt constraint databases, a tree that has yet to bear practically usable fruits !
> Also, as > Marshal noted, it is perfectly allright for an intermediatory query > evaluation result to be infinite; it is the final result where a user > is expected to see the finite set. Tegiri Nenashi - 11 Dec 2007 20:37 GMT > >> The formulas look very cute, no doubt about that, but immediate > >> questions are: [quoted text clipped - 11 lines] > structures. You need to try and express query evaluation in your model > with relations only -- that's the whole point ! Well, in traditional databases index structures are auxiliary. Likewise, I would like to keep functions hidden. After all there is one relation
x + y = z
but there are three functions that can index it.
Anyway, this whole sub thread is a digression off the initial thought about connection between objects and relations on one hand, and object methods and functional dependencies on the other. I'm surprised nobody objected the idea (pun intended) that object corresponds to the relation, and not relation attribute.
In a word, are computers finite machines? Yes. Does it necessarily restrict relations (and other structures) to be be finite ones? Not necessarily. I suggest to close this rather unsophisticated discussion.
Tegiri Nenashi - 11 Dec 2007 20:47 GMT > Well, in traditional databases index structures are auxiliary. > Likewise, I would like to keep functions hidden. After all there is [quoted text clipped - 3 lines] > > but there are three functions that can index it. Let me elaborate a little more. Suppose we are asked to evaluate the query
x + y = z /\ x=1 /\ z=4
there is no functions there. As usual optimizer engine starts enumerating and costing every join permutation. It would find out that the execution below has a finite cost:
1. Evaluate x=1 2. Evaluate z=4 3. Build a Cartesian product result 4. Join with the relation x + y = z using the index (x,z)->z-x
V.J. Kumar - 11 Dec 2007 21:27 GMT Tegiri Nenashi <TegiriNenashi@gmail.com> wrote in news:38860a21-e69a- 46a4-a798-cc16dde1c4a7@e10g2000prf.googlegroups.com:
>> Well, in traditional databases index structures are auxiliary. >> Likewise, I would like to keep functions hidden. After all there is [quoted text clipped - 17 lines] > 3. Build a Cartesian product result > 4. Join with the relation x + y = z using the index (x,z)->z-x What would your optimizer do in the following case:
x^2 + y^2 + z^2 =u^2 /\ u=25
?
Tegiri Nenashi - 11 Dec 2007 21:47 GMT > Tegiri Nenashi <TegiriNena...@gmail.com> wrote in news:38860a21-e69a- > 46a4-a798-cc16dde1c...@e10g2000prf.googlegroups.com: [quoted text clipped - 24 lines] > > x^2 + y^2 + z^2 =u^2 /\ u=25 Well how indexes are created in the off-the-shelf databases? DB administrator creates them, even though the task is rather trivial. (Remember, that long time ago DBMS were conceived to do it automatically:-). The case of infinite relations is much more challenging, therefore I expect a database programmer (or should I better call him "relational programmer") to code the index function *manually* and register each index with the corresponding relation.
Tegiri Nenashi - 12 Dec 2007 17:37 GMT > > Tegiri Nenashi <TegiriNena...@gmail.com> wrote in news:38860a21-e69a- > > 46a4-a798-cc16dde1c...@e10g2000prf.googlegroups.com: [quoted text clipped - 32 lines] > better call him "relational programmer") to code the index function > *manually* and register each index with the corresponding relation.- Hide quoted text - The problem is, of course, identifiying the indexes. And it may be problematic whether more general problems fit into the system R optimization method. For example given a system of equations:
x + y = 2 x - y = 0
the solution is
(x+y=z) /\ (z=2) /\ (x=y)
Assuming that we have 3 indexes on the first relation: x,y -> z x,z -> y y,z -> x and two indexes on the last one x -> y y -> x we still can't evaluate this query. BTW, how Kanellakis solves it?
V.J. Kumar - 13 Dec 2007 01:17 GMT >> > Tegiri Nenashi <TegiriNena...@gmail.com> wrote in >> > news:38860a21-e69a- [quoted text clipped - 56 lines] > y -> x > we still can't evaluate this query. BTW, how Kanellakis solves it? He does not ! All you do is just substitute relation definitions for the query placeholder variables. You always work with formulas representing possibly infinite relations, you do not 'materialize' down to the actual tuples; in finite cases such materialization clearly happens automatically since any finite relation is representable with the equality predicate and constants.
V.J. Kumar - 11 Dec 2007 21:06 GMT
> Anyway, this whole sub thread is a digression off the initial thought > about connection between objects and relations on one hand, and object > methods and functional dependencies on the other. I'm surprised nobody > objected the idea (pun intended) that object corresponds to the > relation, and not relation attribute. That's probably because it is hard to object to or agree with an idea so vaguely formulated. Yes, objects in some sense could be and have been treated as relations, and vice versa, so what ? Some commercial SQL databases even have both implementations, 'objects as relations' and 'objects as attributes'. Until and unless you define some sort of object algebra and compare its features with the relational algebra or some other algebra of relations, there is nothing to argue about !
> In a word, are computers finite machines? Yes. Does it necessarily > restrict relations (and other structures) to be be finite ones? Not > necessarily. I suggest to close this rather unsophisticated > discussion. Jan Hidders - 09 Dec 2007 12:47 GMT > > >> Another difference is that database tables are finite and variable, > [quoted text clipped - 7 lines] > This doesn't really matter. You can still reason about infinite > relations with finite resources available on you computer platform. Indeed you can. In case anyone is interested in a concrete example:
http://cse.unl.edu/~revesz/cdb/index.htm
-- Jan Hidders
Marshall - 09 Dec 2007 16:15 GMT > > > >> Another difference is that database tables are finite and variable, > [quoted text clipped - 11 lines] > > http://cse.unl.edu/~revesz/cdb/index.htm Looks very interesting. I bought the intro book.
Marshall
Jan Hidders - 10 Dec 2007 16:19 GMT > > > > >> Another difference is that database tables are finite and variable, > [quoted text clipped - 13 lines] > > Looks very interesting. I bought the intro book. Nice! I only read the other book myself, but I've met Peter and he writes very nice papers and gives good talks, so I suspect he writes well.
-- Jan Hidders
V.J. Kumar - 10 Dec 2007 01:09 GMT >> > >> Another difference is that database tables are finite and >> > >> variable, [quoted text clipped - 10 lines] > > Indeed you can. In case anyone is interested in a concrete example: This kind of "reasoning" is an illusion. The constraint database the link refers to works with a finite set of first-order formulas ("extended tuples") that permit quantifier elimination rather than with a finite set of actual tuples as the relational model would. It is a simple observation that a first-order formula under certain conditions can "represent" a possibly infinite set. However, formula-set interpretation happens in the user brain only, not in the database itself as would be the case with the traditional relational model which probably explains along with other problems why constraint databases do not enjoy much success.
> http://cse.unl.edu/~revesz/cdb/index.htm > > -- Jan Hidders Jan Hidders - 10 Dec 2007 17:49 GMT > >> > >> Another difference is that database tables are finite and > >> > >> variable, [quoted text clipped - 19 lines] > interpretation happens in the user brain only, not in the database > itself as would be the case with the traditional relational model [...] Come on, V.J., I'm quite sure that you can make much more interesting contributions to this newsgroup than this kind of quibbling. Please don't let your ego cloud the brightness of your mind.
-- Jan Hidders
paul c - 11 Dec 2007 05:18 GMT >>>>> Another difference is that database tables are finite and variable, >>>> Oh, relations in database world are certainly not restricted by finite [quoted text clipped - 10 lines] > > -- Jan Hidders Because I don't have access to any paid-for resources, I wasn't able to dig deeper than the page indicated, but when I looked at it, it was immediately reminiscent to me of type theory (as opposed to relational theory). Nothing wrong with that of course. However, I think it's important to ask ourselves what it is it that makes sense to implement.
The sense of equality for me has very much to do with one's situation, ie., the application. I can see why some apps might see "overlapping" rectangles, ones that share common vertices if I may use that term, as equal. Also how one might consider different photographs of multiple persons as being the same so far as they might portray the same individual, among others.
When it comes to a system, I've always had the reaction that a facility to define domains/types/objects is quite separate from the Rm, even though any RM implementation must depend on such a facility, whether it is a mental one or a programmed one. Usually, from what I've seen, it is not programmed.
That's about as mystical as I'm prepared to go, outside of a usenet genealogy group.
Marshall - 09 Dec 2007 08:38 GMT > There's a big difference between > very large and infinite. This gets said a lot, and it's true, but it's not very important.
In practical terms, there is nearly no difference between a few hundred bits of precision and an infinite number of bits of precision. Twenty three digits of pi is sufficient to calculate the circumference of the Milky Way from the diameter to within a centimeter.
Marshall
Marshall - 07 Dec 2007 02:06 GMT > > Another difference is that database tables are finite and variable, > > Oh, relations in database world are certainly not restricted by finite > cardinality. Neither is it restricted to variables--not only in papers and in theory but in practical terms.
Marshall
paul c - 07 Dec 2007 03:54 GMT >>> Another difference is that database tables are finite and variable, >> Oh, relations in database world are certainly not restricted by finite [quoted text clipped - 3 lines] > theory > but in practical terms. Marshall, why do I get the feeling you are drifting the theme? (which coming from you would be absolutely okay by me, but this seems a bit cryptic to me and I wonder if you care to expand?)
Marshall - 07 Dec 2007 06:34 GMT > >>> Another difference is that database tables are finite and variable, > >> Oh, relations in database world are certainly not restricted by finite [quoted text clipped - 7 lines] > coming from you would be absolutely okay by me, but this seems a bit > cryptic to me and I wonder if you care to expand?) Well, if nothing else, we can certainly consider relation constants. DUM and DEE are certainly useful, and of course those are constants, not variables.
I also don't see anything wrong with considering, say, addition on arbitrary sized integers as an infinite relation, and reasoning about it accordingly. Yes, there are machine resource limits, but that doesn't have to limit our reasoning.
Also, I am working right now on the idea of a relational logic, so I spend a certain amount of time thinking about equations with table variables that I don't know much about. Sometimes not even the predicate.
I guess drifting the theme is something I do a lot. :-)
Marshall
|
|
|