Browse thread
[Caml-list] Reading a large text file
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Eric Stokes <eric.stokes@c...> |
| Subject: | Re: [Caml-list] Reading a large text file |
If you don't care about line breaks you might try Buffer.add_channel. This should perform close to O(1) for the append operations, and it will bypass the call to input_line, which I have found to be an order of magnitude slower than the read system call. You can always break the lines up later. On May 1, 2004, at 7:03 AM, Brian Hurt wrote: > > I'm still digging through a serious backlog of email, so your question > may > have already been answered. If so, ignore this. > > But what I'm guessing is happening is that you are appending (adding to > the end of) your list, and that this is what is killing you. To add an > element to the *end* of a list, Ocaml has to completely reallocate the > entire list- turning what you might think is an O(1) operation into an > O(n) operation. > > The solution to this is to build the list backwards- instead of adding > things to the end of the list, add them to the begining. Then, when > you're done, just reverse the list using List.rev. > > Lets look at the example of just reading the lines from a file and > making > a list of them. This code is bad: > > let readfile chan = > let rec loop lst = > try > loop (lst @ [ (input_line chan) ]) > with > | End_of_file -> lst > in > loop [] > ;; > > It works, but to read n lines requires O(n^2) work, because the @ has > to > reallocate the entire list every iteration. Worse yet it isn't tail > recursive (a recursive call inside a try/catch isn't a tail call). > > A better implementation of this would be: > let readfile chan = > let rec loop rlst = > let line, eof = > try > (input_line chan), false > with > | End_of_file -> "", true > in > if not eof then > loop (line :: rlst) > else > List.rev rlst > in > loop [] > ;; > > Now, the first thing to notice is that we're using the :: operator > (which > is O(1)), not the @ operator (which is O(n))- we're prepending things > onto > the list, not appending them. We're building up the list backwards, > and > then, when we hit the end of the file, reversing the entire list. > This is > a standard pattern in Ocaml. > > The second thing to notice is that the recursive call has been hoisted > up > out of the try/catch block. We've introduced a new boolean flag to > note > when we've hit the end of the file- catching the exception simply sets > the > flag to true. So now our function is tail recursive, and operates in > constant stack space. > > Brian > > On Wed, 28 Apr 2004, André Luiz Moura wrote: > >> Hi List, >> >> I wrote an OCaml function to read a text file with a million of lines >> with five separate columns for spaces. I based myself on previous >> messages of Caml-list, but however the work is taking time frightful >> (many minutes). >> This function written in C, for example, does not take more than 4 >> seconds. >> >> I am executing the program in a PC Pentium III of 128 MB of RAM under >> Windows platform. How would be implemented the function to decide >> this problem? >> >> File format: >> <vertex #> <x> <y> [attributes] [boundary marker] >> >> Thanks, >> >> André >> >> >> >> >> >> >> --------------------------------- >> Yahoo! Messenger - Fale com seus amigos online. Instale agora! > > -- > "Usenet is like a herd of performing elephants with diarrhea -- > massive, > difficult to redirect, awe-inspiring, entertaining, and a source of > mind-boggling amounts of excrement when you least expect it." > - Gene Spafford > Brian > > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: > http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: > http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners