Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Reading a large text file
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Brian Hurt <bhurt@s...>
Subject: Re: [Caml-list] Reading a large text file

I'm still digging through a serious backlog of email, so your question may 
have already been answered.  If so, ignore this.

But what I'm guessing is happening is that you are appending (adding to 
the end of) your list, and that this is what is killing you.  To add an 
element to the *end* of a list, Ocaml has to completely reallocate the 
entire list- turning what you might think is an O(1) operation into an 
O(n) operation.

The solution to this is to build the list backwards- instead of adding 
things to the end of the list, add them to the begining.  Then, when 
you're done, just reverse the list using List.rev.

Lets look at the example of just reading the lines from a file and making 
a list of them.  This code is bad:

let readfile chan =
    let rec loop lst =
        try
            loop (lst @ [ (input_line chan) ])
        with
            | End_of_file -> lst
    in
    loop []
;;

It works, but to read n lines requires O(n^2) work, because the @ has to
reallocate the entire list every iteration.  Worse yet it isn't tail
recursive (a recursive call inside a try/catch isn't a tail call).

A better implementation of this would be:
let readfile chan =
    let rec loop rlst =
       let line, eof = 
           try
               (input_line chan), false
           with
               | End_of_file -> "", true
       in
       if not eof then
           loop (line :: rlst)
       else
           List.rev rlst
   in
   loop []
;;

Now, the first thing to notice is that we're using the :: operator (which
is O(1)), not the @ operator (which is O(n))- we're prepending things onto
the list, not appending them.  We're building up the list backwards, and
then, when we hit the end of the file, reversing the entire list.  This is
a standard pattern in Ocaml.

The second thing to notice is that the recursive call has been hoisted up 
out of the try/catch block.  We've introduced a new boolean flag to note 
when we've hit the end of the file- catching the exception simply sets the 
flag to true.  So now our function is tail recursive, and operates in 
constant stack space.

Brian

On Wed, 28 Apr 2004, André Luiz Moura wrote:

> Hi List, 
>  
> I wrote an OCaml function to read a text file with a million of lines with  five separate columns for spaces. I based myself on previous messages of Caml-list, but however the work is taking time frightful (many minutes). 
> This function written in C, for example, does not take more than 4 seconds. 
>  
> I am executing the program in a PC Pentium III of 128 MB of RAM under Windows platform. How would be implemented the function to decide this problem?
> 
> File format:
> <vertex #> <x> <y> [attributes] [boundary marker] 
> 
> Thanks, 
> 
> André
> 
> 
> 
> 
> 
> 
> ---------------------------------
> Yahoo! Messenger - Fale com seus amigos online. Instale agora!

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners