Version française
Home     About     Download     Resources     Contact us    
Browse thread
[Caml-list] Reading a large text file
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: Eric Stokes <eric.stokes@c...>
Subject: Re: [Caml-list] Reading a large text file
If you don't care about line breaks you might try Buffer.add_channel. 
This should perform
close to O(1) for the append operations, and it will bypass the call to 
input_line, which I have
found to be an order of magnitude slower than the read system call. You 
can always break
the lines up later.

On May 1, 2004, at 7:03 AM, Brian Hurt wrote:

>
> I'm still digging through a serious backlog of email, so your question 
> may
> have already been answered.  If so, ignore this.
>
> But what I'm guessing is happening is that you are appending (adding to
> the end of) your list, and that this is what is killing you.  To add an
> element to the *end* of a list, Ocaml has to completely reallocate the
> entire list- turning what you might think is an O(1) operation into an
> O(n) operation.
>
> The solution to this is to build the list backwards- instead of adding
> things to the end of the list, add them to the begining.  Then, when
> you're done, just reverse the list using List.rev.
>
> Lets look at the example of just reading the lines from a file and 
> making
> a list of them.  This code is bad:
>
> let readfile chan =
>     let rec loop lst =
>         try
>             loop (lst @ [ (input_line chan) ])
>         with
>             | End_of_file -> lst
>     in
>     loop []
> ;;
>
> It works, but to read n lines requires O(n^2) work, because the @ has 
> to
> reallocate the entire list every iteration.  Worse yet it isn't tail
> recursive (a recursive call inside a try/catch isn't a tail call).
>
> A better implementation of this would be:
> let readfile chan =
>     let rec loop rlst =
>        let line, eof =
>            try
>                (input_line chan), false
>            with
>                | End_of_file -> "", true
>        in
>        if not eof then
>            loop (line :: rlst)
>        else
>            List.rev rlst
>    in
>    loop []
> ;;
>
> Now, the first thing to notice is that we're using the :: operator 
> (which
> is O(1)), not the @ operator (which is O(n))- we're prepending things 
> onto
> the list, not appending them.  We're building up the list backwards, 
> and
> then, when we hit the end of the file, reversing the entire list.  
> This is
> a standard pattern in Ocaml.
>
> The second thing to notice is that the recursive call has been hoisted 
> up
> out of the try/catch block.  We've introduced a new boolean flag to 
> note
> when we've hit the end of the file- catching the exception simply sets 
> the
> flag to true.  So now our function is tail recursive, and operates in
> constant stack space.
>
> Brian
>
> On Wed, 28 Apr 2004, André Luiz Moura wrote:
>
>> Hi List,
>>
>> I wrote an OCaml function to read a text file with a million of lines 
>> with  five separate columns for spaces. I based myself on previous 
>> messages of Caml-list, but however the work is taking time frightful 
>> (many minutes).
>> This function written in C, for example, does not take more than 4 
>> seconds.
>>
>> I am executing the program in a PC Pentium III of 128 MB of RAM under 
>> Windows platform. How would be implemented the function to decide 
>> this problem?
>>
>> File format:
>> <vertex #> <x> <y> [attributes] [boundary marker]
>>
>> Thanks,
>>
>> André
>>
>>
>>
>>
>>
>>
>> ---------------------------------
>> Yahoo! Messenger - Fale com seus amigos online. Instale agora!
>
> -- 
> "Usenet is like a herd of performing elephants with diarrhea -- 
> massive,
> difficult to redirect, awe-inspiring, entertaining, and a source of
> mind-boggling amounts of excrement when you least expect it."
>                                 - Gene Spafford
> Brian
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: 
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners