This site is updated infrequently. For up-to-date information, please visit the new OCaml website at ocaml.org.

[Caml-list] Reading a large text file
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
 Date: 2004-05-01 (13:58) From: Brian Hurt Subject: Re: [Caml-list] Reading a large text file
```
I'm still digging through a serious backlog of email, so your question may

But what I'm guessing is happening is that you are appending (adding to
the end of) your list, and that this is what is killing you.  To add an
element to the *end* of a list, Ocaml has to completely reallocate the
entire list- turning what you might think is an O(1) operation into an
O(n) operation.

The solution to this is to build the list backwards- instead of adding
things to the end of the list, add them to the begining.  Then, when
you're done, just reverse the list using List.rev.

Lets look at the example of just reading the lines from a file and making
a list of them.  This code is bad:

let rec loop lst =
try
loop (lst @ [ (input_line chan) ])
with
| End_of_file -> lst
in
loop []
;;

It works, but to read n lines requires O(n^2) work, because the @ has to
reallocate the entire list every iteration.  Worse yet it isn't tail
recursive (a recursive call inside a try/catch isn't a tail call).

A better implementation of this would be:
let rec loop rlst =
let line, eof =
try
(input_line chan), false
with
| End_of_file -> "", true
in
if not eof then
loop (line :: rlst)
else
List.rev rlst
in
loop []
;;

Now, the first thing to notice is that we're using the :: operator (which
is O(1)), not the @ operator (which is O(n))- we're prepending things onto
the list, not appending them.  We're building up the list backwards, and
then, when we hit the end of the file, reversing the entire list.  This is
a standard pattern in Ocaml.

The second thing to notice is that the recursive call has been hoisted up
out of the try/catch block.  We've introduced a new boolean flag to note
when we've hit the end of the file- catching the exception simply sets the
flag to true.  So now our function is tail recursive, and operates in
constant stack space.

Brian

On Wed, 28 Apr 2004, André Luiz Moura wrote:

> Hi List,
>
> I wrote an OCaml function to read a text file with a million of lines with  five separate columns for spaces. I based myself on previous messages of Caml-list, but however the work is taking time frightful (many minutes).
> This function written in C, for example, does not take more than 4 seconds.
>
> I am executing the program in a PC Pentium III of 128 MB of RAM under Windows platform. How would be implemented the function to decide this problem?
>
> File format:
> <vertex #> <x> <y> [attributes] [boundary marker]
>
> Thanks,
>
> André
>
>
>
>
>
>
> ---------------------------------
> Yahoo! Messenger - Fale com seus amigos online. Instale agora!

--
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
- Gene Spafford
Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

```