Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] stack overflow
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Yang Shouxun <yangsx@f...>
Subject: Re: [Caml-list] stack overflow
On Wednesday 09 April 2003 16:14, Markus Mottl wrote:
> On Wed, 09 Apr 2003, Yang Shouxun wrote:
> > Yes, the decision tree building function is not tail recursive. I heared
> > people saying C4.5 (in C) also has stack overflow problem when the
> > training dataset becomes very large.
>
> I can't imagine that this is the problem: either the data is
> well-distributed, in which case the stack size will grow roughly
> logarithmically with the size of the data due to partitioning. And if not,
> the maximum depth of the tree is limited by the number of available input
> variables anyway. You'd need many, many thousands of those before this
> becomes a problem, which even large, industrial datasets that I know do
> not exceed.

My training data contain statistical values for word combinations (or 
collocations) extracted from a corpus. The number is indeed very large.

> > I don't know how to write a tail recursive version to build trees.
> > If there are not that many continuous attributes and the dataset is
> > not so large, the tree stops growing before stack overflow.
>
> The trick is to use continuation passing style (CPS): you pass a function
> closure (continuation) containing everything that's needed in subsequent
> computations.  Instead of returning a result, the sub-function calls the
> continuation with the result, which makes the functions tail-recursive.

I've learned this style in Scheme. Yet I feel paralyzed when trying to write 
in it to build trees. The type declaration may make my point clearer.
--8<--
type  dtree = Dnode of dnode | Dtree of (dnode * int * dtree list)
--8<--
The problems are that unless the next call returns, the tree is not complete 
yet and it may have several calls on itself.

> But anyway, I think there must be some fishy operation going on. Why not
> use the debugger to find out? Or even better: send a link to the code :-)

I suppose the program is not buggy so far as it works as expected.  It's buggy 
in the design itself: it must recurse (so far as I can implement) and it 
cannot afford recurse too deeply. Sorry, I don't have a homepage.

> > Can one know the maximal number of calls before it overflow the stack?
>
> It depends: byte-code uses its own stack, which you can query using the
> Gc-module. Otherwise, for native-code, call the ulimit-program (Unix),
> which displays resource limits including stack usage or interface to
> the system call "getrlimit".

I'm running Debian unstale. I checked just now on my laptop and "ulimit -s" 
reurned "unlimited". I suppose the desktop that actually ran the program was 
similarly configured.

> In any case, it would be interesting to see your code. Are you going to
> release it under some free license?

Yes, I'm going to release it under GPL. As you can see, I basically use free 
software and am willing to pay it back. I intend to register a project for it 
on savannah soon. Be warned that my code may look rather ugly.

I also downloaded your AIFAD and had a cursive look at it. I found it does not 
handle continuous attributes yet and your design goal is quite different from 
mine. So I wrote mine from scratch and called it DTLR (Decision Tree Learner 
for Retrieval).

If you are interested, I can send a copy to you tomorrow. It does not 
implement all the features I planned, without documentation except some 
comments, but it is enough for my own needs right now.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners