Browse thread
[Caml-list] stack overflow
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Yang Shouxun <yangsx@f...> |
| Subject: | Re: [Caml-list] stack overflow |
On Wednesday 09 April 2003 16:14, Markus Mottl wrote: > On Wed, 09 Apr 2003, Yang Shouxun wrote: > > Yes, the decision tree building function is not tail recursive. I heared > > people saying C4.5 (in C) also has stack overflow problem when the > > training dataset becomes very large. > > I can't imagine that this is the problem: either the data is > well-distributed, in which case the stack size will grow roughly > logarithmically with the size of the data due to partitioning. And if not, > the maximum depth of the tree is limited by the number of available input > variables anyway. You'd need many, many thousands of those before this > becomes a problem, which even large, industrial datasets that I know do > not exceed. My training data contain statistical values for word combinations (or collocations) extracted from a corpus. The number is indeed very large. > > I don't know how to write a tail recursive version to build trees. > > If there are not that many continuous attributes and the dataset is > > not so large, the tree stops growing before stack overflow. > > The trick is to use continuation passing style (CPS): you pass a function > closure (continuation) containing everything that's needed in subsequent > computations. Instead of returning a result, the sub-function calls the > continuation with the result, which makes the functions tail-recursive. I've learned this style in Scheme. Yet I feel paralyzed when trying to write in it to build trees. The type declaration may make my point clearer. --8<-- type dtree = Dnode of dnode | Dtree of (dnode * int * dtree list) --8<-- The problems are that unless the next call returns, the tree is not complete yet and it may have several calls on itself. > But anyway, I think there must be some fishy operation going on. Why not > use the debugger to find out? Or even better: send a link to the code :-) I suppose the program is not buggy so far as it works as expected. It's buggy in the design itself: it must recurse (so far as I can implement) and it cannot afford recurse too deeply. Sorry, I don't have a homepage. > > Can one know the maximal number of calls before it overflow the stack? > > It depends: byte-code uses its own stack, which you can query using the > Gc-module. Otherwise, for native-code, call the ulimit-program (Unix), > which displays resource limits including stack usage or interface to > the system call "getrlimit". I'm running Debian unstale. I checked just now on my laptop and "ulimit -s" reurned "unlimited". I suppose the desktop that actually ran the program was similarly configured. > In any case, it would be interesting to see your code. Are you going to > release it under some free license? Yes, I'm going to release it under GPL. As you can see, I basically use free software and am willing to pay it back. I intend to register a project for it on savannah soon. Be warned that my code may look rather ugly. I also downloaded your AIFAD and had a cursive look at it. I found it does not handle continuous attributes yet and your design goal is quite different from mine. So I wrote mine from scratch and called it DTLR (Decision Tree Learner for Retrieval). If you are interested, I can send a copy to you tomorrow. It does not implement all the features I planned, without documentation except some comments, but it is enough for my own needs right now. ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners