Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
Home
Discussion Groups
Database Servers
DB2InformixIngresMS SQLOraclePervasive.SQLPostgreSQLProgressSybase
Desktop Databases
FileMakerFoxProMS AccessParadox
General
General DB TopicsDatabase Theory
Related Topics
Java Development.NET DevelopmentVB DevelopmentMore Topics ...

Database Forum / General DB Topics / DB Theory / January 2008

Tip: Looking for answers? Try searching our database.

how to suppress carefully a recursive tree

Thread view: 
Enable EMail Alerts  Start New Thread
Thread rating: 
fj - 22 Jan 2008 11:04 GMT
I know how to suppress a normal tree but I meet the following kind of
situation :

r1
   -> b1
       -> b2 -> ...
       -> b3 -> ...
   -> b2
       -> b1 -> ...
       -> b1 -> ...
   -> b3
       -> b4

r2
   -> b3 -> ...

A same node can be referenced at several places. Each node is
associated to a storage count :

r1(0)  b1(3)  b2(2)  b3(3)  b4(1)  r2(0)

In a normal tree without recursion (in the example above, recursion
occurs because b1 contains b2 and vice versa), a node is destroyed
when its count storage is equal to zero else its count storage is
simply decremented.

What algorithm should be applied ? I want for instance to cleanup r1
but, of course, r2 must remain valid (=> b3 and b4 are not destroyed
during the process and their storage count must be b3(1) b4(1)).

Notice that the deletion of of a tree must be possible even if the
count storage of the root is not equal to zero :
  r1 -> b1 -> b2 -> r1 -> ...
Jan Hidders - 22 Jan 2008 14:52 GMT
> I know how to suppress a normal tree but I meet the following kind of
> situation :

I'm guessing that when you say "suppress" you mean "represent in a
database". Correct?

> r1
>     -> b1
[quoted text clipped - 8 lines]
> r2
>     -> b3 -> ...

So, a directed graph with multiple roots. Correct? So basically you
want to store arbitrary directed graphs.

> A same node can be referenced at several places. Each node is
> associated to a storage count :
>
> r1(0)  b1(3)  b2(2)  b3(3)  b4(1)  r2(0)

Which would correspond to the number of incoming edges, yes?

> In a normal tree without recursion (in the example above, recursion
> occurs because b1 contains b2 and vice versa), a node is destroyed
[quoted text clipped - 8 lines]
> count storage of the root is not equal to zero :
>    r1 -> b1 -> b2 -> r1 -> ...

It all depends a bit on how large your typical graphs are, how long on
average the simple paths, what type of operations and queries you want
to do on it and how often.  My first guess for the representation
would a simple straightforward adjacency list representation, (a
binary relation that contains all the edges) and if it's not too big
and your paths are often long it might be interesting to maintain an
extra table with the transitive closure of the graph.

If you go for the adjacency list approach, make sure that you do as
much as possible in one SQL statement when you start following the
edges. So look up all nodes that are reachable in one step in one
statement, update those, and store the ones that have to be deleted in
a temporary table. Then again with one statement look up those that
can be reached from in one step from the nodes in the temporary table.
Et cetera.

Does that help?

PS. Expect a shameless but informative plug by Joe Celko for one of
his books. :-)

-- Jan Hidders
fj - 22 Jan 2008 16:10 GMT
> > I know how to suppress a normal tree but I meet the following kind of
> > situation :
>
> I'm guessing that when you say "suppress" you mean "represent in a
> database". Correct?

No : I want to destroy, remove, kill ... a part of the data (a
complete tree or just a branch), but without destroying data shared by
other trees or branches.

> > r1
> >     -> b1
[quoted text clipped - 11 lines]
> So, a directed graph with multiple roots. Correct? So basically you
> want to store arbitrary directed graphs.

Yes and no. The two "roots" r1 r2  could perhaps belong to a bigger
tree.

> > A same node can be referenced at several places. Each node is
> > associated to a storage count :
>
> > r1(0)  b1(3)  b2(2)  b3(3)  b4(1)  r2(0)
>
> Which would correspond to the number of incoming edges, yes?

Yes

> > In a normal tree without recursion (in the example above, recursion
> > occurs because b1 contains b2 and vice versa), a node is destroyed
[quoted text clipped - 16 lines]
> and your paths are often long it might be interesting to maintain an
> extra table with the transitive closure of the graph.

The graph may be quite large. Number of vertices (nodes) : usually
100000, sometimes much more (a very big computation may lead to about
100 millions). This corresponds to a 3D meshing, each mesh (a
particular node) containing information about chemical composition (a
sub-node), temperature, fluid characteristics (another sub-node) ...

Let us precise that simple paths are always short when one excludes
recursive points (a maximum of 10 nodes).

> If you go for the adjacency list approach, make sure that you do as
> much as possible in one SQL statement when you start following the
[quoted text clipped - 3 lines]
> can be reached from in one step from the nodes in the temporary table.
> Et cetera.

I don't use SQL but it does not matter : I can build up easily the
list of nodes related to a particular starting point (the node "env"
in my example). I am also able to compute, for each node, its
"external count" (0 for all the nodes except b3 which has the value
1).

After that I can start to destroy really : I only go down the nodes
which have an external count equal to zero (this will protect b4 in
the example). The problem is that the algorithm is not very efficient
when the list of nodes is high, simply because I need, for each node
having a storage count greater than 0, to look for that node in the
list of all the nodes and to check the external count (CPU cost
proportional to O(n2) if n is the number of concerned nodes).

A short cut would be to store the external counts in the nodes
themselves but, unfortunately, this is (more or less) forbidden : the
data base needs to work in a // environment (openMP) and two
simultaneous calls to the deletion routine with two trees sharing data
could lead to crashing the application. Anyway, this solution will be
adopted if I don't find a better algorithm (the deletion routine will
be protected by a semaphore blocking other threads wanting to destroy
something).

I just wanted to know if a better algorithm was available. The problem
is more or less connected to garbage collector techniques. Python
language should have the same trouble when trying to destroy a
dictionary (variables of a dictionary may belong to another
dictionary).

> Does that help?
>
> PS. Expect a shameless but informative plug by Joe Celko for one of
> his books. :-)
>
> -- Jan Hidders
David BL - 23 Jan 2008 00:12 GMT
> > > I know how to suppress a normal tree but I meet the following kind of
> > > situation :
[quoted text clipped - 5 lines]
> complete tree or just a branch), but without destroying data shared by
> other trees or branches.

Would the mark and sweep algorithm suit you purposes?
fj - 23 Jan 2008 07:10 GMT
> > > > I know how to suppress a normal tree but I meet the following kind of
> > > > situation :
[quoted text clipped - 7 lines]
>
> Would the mark and sweep algorithm suit you purposes?

No

the mark and sweep algorithm needs to know all the tree roots in order
to mark all the used objects.

In my example I indicates that the node b3 belongs to the root r2
just for information to explain the storage count. But the deletion
routine I want to write just receives "r1" as argument, nothing else.
It does not know that r2 exists too ... It is just able to detect the
existence of other objects via the storage count of b3 which is a
little bit too high for being just referenced by r1 !
David Cressey - 23 Jan 2008 11:02 GMT
> > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > situation :
[quoted text clipped - 19 lines]
> existence of other objects via the storage count of b3 which is a
> little bit too high for being just referenced by r1 !

How is the tree represented?  If it's in the usual child-parent
relationship, then you will have to traverse the sub tree in order to locate
all its nodes.

If it's in the nested set representation,  then there is a simple query to
return every node in the subtree.
fj - 23 Jan 2008 14:28 GMT
> > > > > > I know how to suppress a normal tree but I meet the following kind
> of
[quoted text clipped - 24 lines]
> relationship, then you will have to traverse the sub tree in order to locate
> all its nodes.

Each parent knows its children.
It could be possible to add extra information in children so they will
know theirs parents too.

Yes, at the moment I have to traverse the sub tree in order to locate
all its nodes
David BL - 23 Jan 2008 13:32 GMT
> > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > situation :
[quoted text clipped - 19 lines]
> existence of other objects via the storage count of b3 which is a
> little bit too high for being just referenced by r1 !

For a directed edge from node n1 towards node n2, let n2 be referred
to as the "destination" node.

  n1 ---->---- n2
 source      destination

Maybe the following approach would work:

It is assumed that node r1 is the root of the "tree" to be deleted.

Step 1:  Perform a trace (in the manner of a mark phase) treating r1
as the one and only root and decrementing the storage counter on the
destination node for each edge that is traversed.  ie find the set X
of nodes that are reachable from r1.  Nodes that have already been
visited are flagged to ensure each reachable node is visited exactly
once.  For each node during the recurse, retrieve the set of outgoing
edges yielding a set of destination nodes.  Decrement each destination
node's storage counter (this must be done for all the destination
nodes - ie including the nodes that have already been visited).  Then
recurse into the destination nodes that haven't been visited before.

Step 2:  Find the subset R of X corresponding to nodes whose storage
counter hasn't fallen to zero.

Step 3:  Run a mark and sweep using roots R and sweeping over X.  For
each edge traversed in the mark phase, increment the storage counter
on the destination node.

In your example I think step 1 will find X = {r1,b1,b2,b3,b4} and only
b3 will end up with a nonzero storage counter, so step 2 yields R =
{b3}.  Then in step 3, a trace from roots {b3} will bump the storage
counter of b4 back up to 1 and only {r1,b1,b2} will be deleted.

Am I making sense?
fj - 23 Jan 2008 15:35 GMT
> > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > situation :
[quoted text clipped - 54 lines]
>
> Am I making sense?

Yes.

I was arrived to a very similar solution as I explained in my second
message. Nevertheless, several steps are expensive, especially if I
suppose forbidden to add new pieces of information in the nodes
themselves : only the creation a new local tables of linked lists is
authorized. For instance :

- in the step 1, flagging a visited node to avoid recursion might be
expensive. At the moment, I create a linked list of nodes in
traversing the sub-tree. So when I visit a node then I start by
looking for it in the linked list. This simple research is expensive
(cost in O(n2) if n is the number of nodes in the subtree). If the
node is not found, then I add a new term in my linked list which
contains the node index and its storage count minus 1 (at the end,
this count will be the external count). If I find it then I just
decrement the external count in the found term.

- the step 2 is OK (cost in O(n))

- the step 3 may be expensive too (all depends on the number of nodes
with an external count greater that 0).

In my solution, the step 3 was replaced by traversing again the tree
from r1 and going down a node only if its external count is equal to
zero. During this step, the true storage count of nodes is decremented
and nodes with a count equal to zero are destroyed. But this solution
has again a drawback : looking for the current visited node in the
linked list in order to get its external count. A possible improvement
was to create a new field in each node used to store the external
count there rather than in a parallel linked list.

Another solution might be to create an array which size is the total
number of nodes. This array will replace the linked list. The
advantage is clear : looking for the external count in this array is
immediate, the node index corresponding to the array index. The
drawback is the size of the array, the total number of nodes being
potentially huge ( > 1 million) whereas the number of nodes in a
deleted branch is generally several orders of magnitude lower (but
often greater than 1000).

An intermediate solution consists in using an extensible hash
table ...

In any case, I would like a solution in O(n) else such algorithm is
not tractable.
fj - 23 Jan 2008 16:24 GMT
> > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > situation :
[quoted text clipped - 95 lines]
> deleted branch is generally several orders of magnitude lower (but
> often greater than 1000).

Oups ! I think that the main drawback is the fact that this solution
does not work correctly !
So the step 3 of David is much better !

After looking for GC on internet, I found the Python GC uses a
technique similar to the David's solution

> An intermediate solution consists in using an extensible hash
> table ...
>
> In any case, I would like a solution in O(n) else such algorithm is
> not tractable.
David BL - 24 Jan 2008 00:43 GMT
> > > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > > situation :
[quoted text clipped - 102 lines]
> After looking for GC on internet, I found the Python GC uses a
> technique similar to the David's solution.

Interesting.  I had thought Python only used basic reference counting.

> > An intermediate solution consists in using an extensible hash
> > table ...
>
> > In any case, I would like a solution in O(n) else such algorithm is
> > not tractable.

The solution I suggested is O(n) where n = |X| if all of the following
are O(1)
   *    Retrieving the set of outgoing edges for a given node
   *    Retrieving the destination node for a given edge
   *    Reading/writing the visited flag for a given node
   *    Reading/writing the storage count for a given node

I would avoid linked lists.  Assuming hash lookup is O(1), I think my
suggested solution can be O(n) despite constraints on what information
may be stored in a Node.
fj - 24 Jan 2008 11:49 GMT
> > > > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > > > situation :
[quoted text clipped - 121 lines]
> suggested solution can be O(n) despite constraints on what information
> may be stored in a Node.

How could you warrant that the step 3 is O[n] ? If the number of roots
is high, this is not obvious because each mark and sweep algorithm is
potentialy O(n).

Or except perhaps if one avoids to visit other roots when dealing with
a particular root  ... Yes this could be O(n) finally.
David BL - 24 Jan 2008 15:38 GMT
> > > > > > > > > > I know how to suppress a normal tree but I meet the following kind of
> > > > > > > > > > situation :
[quoted text clipped - 128 lines]
> Or except perhaps if one avoids to visit other roots when dealing with
> a particular root  ... Yes this could be O(n) finally.

Step 3 only involves a *single* mark and sweep.  It begins by flagging
all the roots as visited and pushing them on a stack.   The mark phase
involves popping a node from the stack and pushing adjacent nodes that
haven't yet been visited, until the stack is empty.   Note that it is
best to mark nodes as visited as they are pushed onto the stack
(rather than when they're popped).  So at all times the stack only
contains nodes that have been marked as visited.   This makes it clear
that the algorithm will never process a node more than once.

Step 3 will typically be faster than step 1 even though it may start
with multiple roots.  The time taken by a trace has more to do with
the total number of visited nodes than the initial number of roots,
and step 3 always visits a subset of the nodes that were visited in
step 1.
fj - 25 Jan 2008 03:37 GMT
> Step 3 only involves a *single* mark and sweep.  It begins by flagging
> all the roots as visited and pushing them on a stack.   The mark phase
[quoted text clipped - 10 lines]
> and step 3 always visits a subset of the nodes that were visited in
> step 1.

OK you are right. I did not realize that this algorithm was not fully
recursive because one visit only non marked objects.

I have programmed it and it seems to work correctly. But I think that
I will adopt the Python strategy :
- for deleting a tree or a branch, I just decrement the reference
count of its root first and I delete it really (and sub-objects
recursively) if the count is equal to zero. This can be done
everywhere in my application but lost objects (those with recursion)
may remain undeleted.
- from time to time, in a single thread part of the application, I can
decide to call a garbage collector which examines all the container
objects. This is possible because I always keep information (pointer)
about all the created objects. The step 1 is even simplified because I
don't need to go down the containers recursively : I know all the
containers and  I just have to examine container sub-objects at the
first level. At the end of this step I know all the root containers
(those which have been protected somewhere in increasing their
reference count without storing them in a container). Then I start
steps 2 and 3.

I have programmed this strategy and it seem to work ... on two simple
examples with few nodes, from 6 (the example proposed) to 15000 (a
short test case). I have now to test it in real conditions (in general
about 1 million of containers for 5 to 10 millions of objects for a
big test case) and measure performances.

Thank you very much for your help !

F. Jacq
Jan Hidders - 23 Jan 2008 12:40 GMT
> > > I know how to suppress a normal tree but I meet the following kind of
> > > situation :
[quoted text clipped - 3 lines]
>
> No : I want to destroy, remove, kill ...

Try "delete", that's a better translation of "supprimer". To suppress
is more like "reprimer", "bannir" or "inhiber".

> a part of the data (a
> complete tree or just a branch), but without destroying data shared by
> other trees or branches.

Ok, understood.

> > > r1
> > >     -> b1
[quoted text clipped - 63 lines]
>
> I don't use SQL but it does not matter :

It might. Outside the context of an SQL database my advice might
actually be counterproductive.

> I can build up easily the
> list of nodes related to a particular starting point (the node "env"
[quoted text clipped - 9 lines]
> list of all the nodes and to check the external count (CPU cost
> proportional to O(n2) if n is the number of concerned nodes).

You might consider defining an index over the set of edges that allows
a quicker look-up on the basis of the begin node. Of course there is a
price to pay for that because updates get more expensive.

-- Jan Hidders
fj - 23 Jan 2008 14:34 GMT
> > > > I know how to suppress a normal tree but I meet the following kind of
> > > > situation :
[quoted text clipped - 6 lines]
> Try "delete", that's a better translation of "supprimer". To suppress
> is more like "reprimer", "bannir" or "inhiber".

Thank you for the English lesson :-)

> > a part of the data (a
> > complete tree or just a branch), but without destroying data shared by
[quoted text clipped - 90 lines]
> a quicker look-up on the basis of the begin node. Of course there is a
> price to pay for that because updates get more expensive.

I don't see very well the goal of that index.  Could you give me
additional information please ?

> -- Jan Hidders
-CELKO- - 26 Jan 2008 08:11 GMT
Get a copy of TREES & HIERARCHIES IN SQL and look up the Nested Sets
model.   You can Google it for the basic idea.

You would put the nodes into one table and the structure into a second
table with (lft, rgt) as its key, then reference the node for that
position.

Jan, sorry to be late, but I was out of town this week :)
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2009 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.