Database Forum / General DB Topics / DB Theory / March 2008
Functions in the relational context
|
|
Thread rating:  |
Tegiri Nenashi - 06 Mar 2008 20:14 GMT Many people from both side of the fence suggested that functional programming is a promising idea that can embrace both paradigms. Here is why I don't think so. Consider the earlier cute exercise:
On Mar 6, 10:55 am, Tegiri Nenashi <TegiriNena...@gmail.com> wrote:
> What function the following query > > SELECT x FROM sine WHEN y=0.5 > > represent? We have a relation Sine(x,y), which in SQL looks like this
table Sine ( x number; y number; )
One can go ahead and start inserting the values into it, until he realizes that this relation is infinite, so it is better be something else than a table. To simplify the discussion, however, I'll continue to pretend that it is a table (that is relation with finite number of tuples).
Now, consider the query
SELECT y FROM sine WHEN x=pi/4
Without any index on the Sine table, the execution has to scan all the values and filter only those that satisfy the predicate. We'd better have some index to make a quick execution:
create unique index sin on Sine(x);
With this index in place the execution is just a unique index scan, then the retrieval of the matching tuple from the table. Likewise, if we have
SELECT x FROM sine WHEN y=1/2
then we'd better have an index
create index arcsin on Sine(y);
Unlike the previous case it is no longer a unique.
Now does it coincidental, that I named the indexes "sin" and "arcsin", implying that there is perhaps some similarity to the functions sin(), and arcsin()? (The latter being multivalued function, of course). Not at all, the only difference is in implementation:
* The index "sin" calculates the value sin(pi/4) by performing a lookup in some structure.
* The function "sin" calculates the value sin(pi/4) by performing arithmetic operations.
This idea IMO is a vehicle to extending relations to the infinite case. We can't scan infinite relations in principle, but can join them by the means of the indexes -- that is the functions.
Yagotta B. Kidding - 06 Mar 2008 20:45 GMT Tegiri Nenashi <TegiriNenashi@gmail.com> wrote in news:ffb9d7ab-99d0- 44b3-91df-be581420f6c6@s12g2000prg.googlegroups.com:
> Many people from both side of the fence suggested that functional > programming is a promising idea that can embrace both paradigms. Here [quoted text clipped - 56 lines] > case. We can't scan infinite relations in principle, but can join them > by the means of the indexes -- that is the functions. I do not see how your exercise, whilst no doubt cute, discredits the idea that FP can be fruitfully applied to the data management field. Care to provide the missing link/s in your logical chain, like, 'in the RA we can do such and so, but in FP we cannot, therefore, FP is really no good'. Besides, on its face, the exercise is not really relevant to the RA data management. Are you talking about something like constraint databases where you operate with formulas rather than data ?
Tegiri Nenashi - 06 Mar 2008 21:45 GMT > I do not see how your exercise, whilst no doubt cute, discredits the > idea that FP can be fruitfully applied to the data management field. To explore this in any depth, let's clarify what FP is. The function is considered as a data? I fail to see why this idea is significant. Care to support it with an example?
> Care to provide the missing link/s in your logical chain, like, 'in the > RA we can do such and so, but in FP we cannot, therefore, FP is really > no good'. Well, we are not talking completenes, are we? Certainly, in RA we can't express certain things that even primitive procedural language can do, yet we agree that algebraic approach is a good idea.
> Besides, on its face, the exercise is not really relevant to > the RA data management. Are you talking about something like constraint > databases where you operate with formulas rather than data ? I'm not certain where this observation leads to. Combined with an idea that function composition is nothing more than a relational join, the intuition might be that an extension of the relational model don't necessarily require functional programming ideas.
Marshall - 06 Mar 2008 22:22 GMT > > I do not see how your exercise, whilst no doubt cute, discredits the > > idea that FP can be fruitfully applied to the data management field. > > To explore this in any depth, let's clarify what FP is. The function > is considered as a data? Whoops, no, that's not it.
"Functional programming" describes a family of languages with many attributes in common but no hard a fast definition. The name is evocative of the primary position that functions are given. Treating functions as data per se is not a statement I would consider particularly descriptive. Rather I would say that in FP, functions are considered *first class* entities, in this sense of the term:
http://en.wikipedia.org/wiki/First-class_object
FP has a number of other connotations. Functions in FP typically (or in the case of pure FP *necessarily*) are free of side-effects. FPLs also usually make heavy use of algebraic data types
http://en.wikipedia.org/wiki/Algebraic_data_types
and use structural recursion to operate on them.
> I fail to see why this idea is significant. > Care to support it with an example? [quoted text clipped - 6 lines] > can't express certain things that even primitive procedural language > can do, yet we agree that algebraic approach is a good idea. Indeed the algebraic approach is a good idea. And anyone who appreciates this approach, and who has any background in abstract algebra for example, will find (I would expect) that function programming has a very familiar, very axiomatic feel to it.
> I'm not certain where this observation leads to. Combined with an idea > that function composition is nothing more than a relational join, the > intuition might be that an extension of the relational model don't > necessarily require functional programming ideas. Rather my take is, once we observe that function composition is nothing more than a relational join, we might then want to find a model that has the most mature, most axiomatic treatment of computable functions, so that whatever system we are designing, its treatment of functions will be as advanced as possible. And we find that model in the functional programming world.
Marshall
Tegiri Nenashi - 07 Mar 2008 18:41 GMT > "Functional programming" describes a family of languages with > many attributes in common but no hard a fast definition. The [quoted text clipped - 5 lines] > > http://en.wikipedia.org/wiki/First-class_object <quote> Depending on the language, this can imply: </quote>
Let's go through this list one by one.
* being expressible as an anonymous literal value
Unnamed functions? Any example where you actually benefit from this? I understand that we have unnamed relations, so certainly we can have unnamed functions, but is it really big deal?
* being storable in variables
OK, we have relation variables, why not functional variables too?
* being storable in data structures
?
* having an intrinsic identity (independent of any given name)
Do relations have identity?
* being comparable for equality with other entities
Any practical example where you can leverage this?
* being passable as a parameter to a procedure/function
OK, function callbacks. In my last 2 years of Java programming I used abstract class with an abstract method once. Contrast this to bread- and-butter loops, conditionals and assignment.
So this is rather rare feature. Exotic features are not critical for language success. Transitive closure, relational division are cute operations, but it is join and projection that are at the core of the RM success.
* being returnable as the result of a procedure/function
Ditto
* being constructable at runtime
Wow, I never use generated code. I even go that far that dumped off- the shelf parser generators to the point of designing a homegrown parser (on non-joke grammar)
* being printable * being readable
Excuse me, is
int increment( int a ) { rerurn a+1; }
non printable?
* being transmissible among distributed processes
?
* being storable outside running processes
What this is about: Stored procedures, or verson control system? in what respect these two properties are fundamental?
Also is recursion hidden somwhere on this list?
> FP has a number of other connotations. Functions in FP > typically (or in the case of pure FP *necessarily*) are free > of side-effects. FPLs also usually make heavy use of > algebraic data types > > http://en.wikipedia.org/wiki/Algebraic_data_types This concept totally escapes me. What algebraic is about these datatypes? For something being algebraic the underlying operations have to at least remotely resemble the familiar ones plus and multiply, otherwise one quickly finds himself on unfamiliar slippery ground.
> and use structural recursion to operate on them. Why structural recursion is important for practical programmer? A typical programmer's day (OK, that was actually my last week) is split into the following parts: * designing an algorithm * coding * debugging it * optimizing performance I didn't feel that I need to prove any assertions by induction -- intution during the coding phase and bug fix during debugging usually take care of correctness.
This Fortran programming style looks regressive, but may I ask again what are the benefits of functional programming. Is it really higher level abstraction?
rpost - 08 Mar 2008 13:53 GMT > * being passable as a parameter to a procedure/function > [quoted text clipped - 3 lines] > >So this is rather rare feature. For you, perhaps. Many others would consider callbacks to be among the defining characteristics of OO programming. I think it's safe to say that OO programming was primarily motivated by the idea of using callbacks to define an application's response to user interaction. (What Microsoft called "visual programming". It is also called "event-based programming" sometimes.)
 Signature Reinier
Yagotta B. Kidding - 07 Mar 2008 03:14 GMT Tegiri Nenashi <TegiriNenashi@gmail.com> wrote in news:807ffc86-9840- 4a66-8e66-5c1409ad7f1f@s37g2000prg.googlegroups.com:
>> I do not see how your exercise, whilst no doubt cute, discredits the > [quoted text clipped - 3 lines] > is considered as a data? I fail to see why this idea is significant. > Care to support it with an example? FP is a specific programming toolset usually described as having features like higher-order functions (functions accepting other functions as arguments), no side effects/ref transparency, advanced type system capable of automatic type inference, etc. Haskell or SML minus SML's imperative features are typical FP languages. See http://en.wikipedia.org/wiki/Haskell_(programming_language), for example. For ideas on how FP can be applied to data management, see http://citeseer.ist.psu.edu/474871.html. Libkin's work on bag comprehension is in a similar vein.
>> Care to provide the missing link/s in your logical chain, like, 'in the > [quoted text clipped - 5 lines] > can't express certain things that even primitive procedural language > can do, yet we agree that algebraic approach is a good idea. No, not at all. You stated that you do not think FP is a good idea as applied to data management but did not say why exactly it is not a good idea. Then, later you've apparently admitted that you do not have a clear idea of what FP is. So, I am confused as to what your point is vis-à-vis FP vs the rest ! But, I'm repeating myself and we can just let it go, therefore, unless you have a specific example to make your point clearer.
>> Besides, on its face, the exercise is not really relevant to >> the RA data management. Are you talking about something like constraint [quoted text clipped - 5 lines] > intuition might be that an extension of the relational model don't > necessarily require functional programming ideas. I am not sure what the 'f.c. as r.j.' viewpoint buys you, practically speaking, and why the viewpoint is more intuitive or useful or superior in any other imaginable way than manipulating predicates that represent possibly infinite sets (constraint databases) !
Tegiri Nenashi - 07 Mar 2008 19:18 GMT > FP is a specific programming toolset usually described as having features > like higher-order functions (functions accepting other functions as > arguments), Again, I'm very conservative, so let's go through your post line-by- line.
Callbacks? Again this is a rare feature, and rare features don't define language success.
> no side effects/ref transparency, No side effects it is a protection/defence feature. I suggest that it doesn't elevate the programming level.
ref transparency: to what extent java references are transparent or not?
> For ideas on how FP can be applied to data management, seehttp://citeseer.ist.psu.edu/474871.html. This (the referenced PhD thesis, to be more precise) was quite an overwhelming read. However, it seems to emphasise category theory methods, while lambda calculus is barely mentioned. The query optimization part is quite ambitious, not only he covers grouping, aggregation, query unnesting, but he even includes complex datatypes and OQL! Knowing how challenging laying down theoretical framework for optimisation of even simple select-project-join makes me highly suspicious. Query transformations are based on heuristic rules like push-select-through-project. Does he indicate that he get's these axioms for free by just leveraging category theory framework? (This is a rethoric question: no matter how abstract your framework is, some low level work streamlining the underlying algebra and identifying its axioms has to be done. )
Yagotta B. Kidding - 07 Mar 2008 20:18 GMT >> FP is a specific programming toolset usually described as having >> features like higher-order functions (functions accepting other >> functions as arguments), > > Again, I'm very conservative, so let's go through your post line-by- > line. Let's not. I cannot decide for you and no one can explain in the contentious thread whether or not the FP toolset makes sense in your application domain, moreover, I have doubts w.r.t to its applicability in many cases myself. However, despite FP's real or perceived drawbacks, it has as solid theoretical foundation as the RA, something OOP lacks. Therefore, FP appears to be, arguably and at least theoretically, the only solid alternative/complement/extra tool to the RA in the data management area as of today. So, you may want to visit the Haskell page I gave earlier and you apparently ignored, look at the examples and decide if the FP way is suitable or not for whatever you are doing ;)
> Callbacks? Again this is a rare feature, and rare features don't > define language success. I am probably wasting my time, but here are some examples of high-order functions:
1. 'map': map f1 list1 -- applying an arbitrary function to an arbitrary list (types have to match naturally)
2. generic left 'fold' used to define other functions: sum = foldl(+) 0 product = foldl (*) 1
3. anonymous function: map (2*) [1,2,3]
>> no side effects/ref transparency, > [quoted text clipped - 21 lines] > low level work streamlining the underlying algebra and identifying its > axioms has to be done. ) I am not entirely happy with some issues in the referenced work myself although at much lower level, such as set/bag representation. I have the same doubts w.r.t. Libkin/Wong bag languages. I am pretty sure that the set as a data type primitive in any FP languages is pretty much an illusion.
DBMS_Plumber - 07 Mar 2008 20:33 GMT Predictions:
1. Someone is about to re-invent functional data languages (Datalog).
2. The re-invention of Datalog will lead to a re-discovery of why functional languages (while useful in their own context) aren't really useful for scalable data management (optimization needs an algebra).
3. Everyone will agree that SQL sucks but nothing can be done about it.
V.J. Kumar - 07 Mar 2008 23:49 GMT DBMS_Plumber <paul_geoffrey_brown@yahoo.com> wrote in news:34275391-9ca2- 4305-b205-6fe063df8eb3@s12g2000prg.googlegroups.com:
> Predictions: > > 1. Someone is about to re-invent functional data languages > (Datalog). Datalog is Prolog's younger brother, and Prolog is *not* a functional language. Despite some similarities, logical programming is substantially different from FP.
> 2. The re-invention of Datalog will lead to a re-discovery of why > functional languages (while useful in their own context) aren't really > useful for scalable data management (optimization needs an algebra). See above, and try to learn the difference between LP and FP. Who knows, you may even succeed ...
> 3. Everyone will agree that SQL sucks but nothing can be done about > it. Tegiri Nenashi - 08 Mar 2008 01:00 GMT > here are some examples of high-order > functions: > > 1. 'map': map f1 list1 -- applying an arbitrary function to an arbitrary > list (types have to match naturally) map Char.toUpper "Hello World"?
big deal:
project_position&output (String(position, character) /\ toUpper(character, output) )
Admittedly, lists are not first class citizens in RM, which makes the query look awkward.
> 3. anonymous function: > map (2*) [1,2,3] ditto
> 2. generic left 'fold' used to define other functions: > sum = foldl(+) 0 > product = foldl (*) 1 Is there foundation for this fold thingy? The way that you have to specify some "initial value" doesn't look very mathematical...
Enough toy examples. How more interesting programs do look like? I googled matrix multiplication: http://www.cs.auckland.ac.nz/references/haskell/haskell-intro-html/arrays.html this looks even more awful than procedural! It is certainly no match to SQL:
select m1.index1,m2.index2, sum(m1.value*m2.value) from m1,m2 where m1.index2=m2.index1 group by m1.index1,m2.index2
Perhaps you can suggest some other example where nested loops are substituted by elegant combination of fold/map/whatever?
Yagotta B. Kidding - 08 Mar 2008 04:37 GMT >> here are some examples of high-order >> functions: [quoted text clipped - 5 lines] > > big deal: Cool !. You do not need to burden your brain with what's not a big deal !
> Enough toy examples. How more interesting programs do look like? I > googled matrix multiplication: [quoted text clipped - 5 lines] > where m1.index2=m2.index1 > group by m1.index1,m2.index2 You appafently forgot what matrix multiplication is ! Your SQL is not that ...
> Perhaps you can suggest some other example where nested loops are > substituted by elegant combination of fold/map/whatever? Google is your friend.
Dmitry A. Kazakov - 08 Mar 2008 11:10 GMT > Perhaps you can suggest some other example where nested loops are > substituted by elegant combination of fold/map/whatever? I wouldn't even ask you to express this sort of products in general terms independent on concrete objects (m1, m2), their types (matrices), the algebraic operators used for folding (+, *). Because SQL simply lacks any means for such abstractions, which any decent FP or OO language has. So let's pretend we are in early 70s:
Do matrix inversion. (I'd suggest an iterative method. Ah, the accuracy of the result shall be bounded and the bounds known, to the program, I mean.)
In order to continue fun, make a table of all square matrices and then do inverse as you "did" with sine...
 Signature Regards, Dmitry A. Kazakov http://www.dmitry-kazakov.de
-CELKO- - 08 Mar 2008 18:54 GMT >> 1. 'map': map f1 list1 -- applying an arbitrary function to an arbitrary list (types have to match naturally) map Char.toUpper "Hello World"?
big deal: <<
I think you missed the point of the map operator. Instead of UPPER(), use string concatenation. For math, use addition to get summation, division to get continued fractions, or exponentiation to get Knuth's hyper exponential operator. APL had something like this, but I cannot remember the symbol.
-CELKO- - 08 Mar 2008 19:08 GMT This is funny to me. I just wrote a book entitled "Joe Celko's Thinking in Sets: Auxiliary, Temporal, and Virtual Tables in SQL" which is about using tables in place of computational code for the working programmer.
My premise is that parallelism and fast, cheap storage (sold state devices replacing moving disk) make this approach faster and more portable than re-computing values over and over.
In real applications, the number of function values needed is (whatever "small" is this week). My favorite personal example is written up in "Data Mining on a Budget" (http://www.tdan.com/view- perspectives/5343). I like it because 1) Computational approaches were too complex to work 2) I got paid in BBQ
When the auxiliary table does not have the value we want, then we can: 1) Throw an exception and let the application handle it. 2) Take time to compute it and insert the answer back into the table. 3) Interpolate the answer, knowing that we are trading accuracy for speed. This is often fine in statistical application.
I think it is a valid approach to keep SQL programming as declarative as possible.
|
|
|