Version française
Home     About     Download     Resources     Contact us    
Browse thread
best and fastest way to read lines from a file?
[ Home ] [ Index: by date | by threads ]
[ Search: ]

[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
Date: -- (:)
From: YC <yinso.chen@g...>
Subject: best and fastest way to read lines from a file?
Hi all -

Newbie question: I'm wondering what's the most efficient way to read in a
file line by line?  I wrote a routine in both python and ocaml to read in a
file with 345K lines to do line count and was surprised that python's code
run roughly 3x faster.

I thought the speed should be equivalent and/or somewhat in ocaml favor,
given this is an IO-bound comparison, but perhaps Python's simplistic for
loop have a read-ahead buffer built-in, and perhaps ocaml's input channel is
unbuffered, but I'm not sure how to write a buffered code that can do a line
by line read-in.

Any insight is appreciated, thanks ;)

yc

Python code:
# test.py
#!/usr/bin/python

file = <345k-line.txt>
count = 0
for line in open (file, "r"):
    count = count + 1
print "Done: ", count

OCaml code:
(* test.ml *)
let rec line_count filename =
  let f = open_in filename in
  let rec loop file count =
    try
      ignore (input_line file);
      loop file (count + 1)
    with
      End_of_file -> count
  in
    loop f 0;;

let count = line_count <345k-line.txt> in
    Printf.printf "Done: %d" count;;

Test
$ time ./test.py
Done: 345001

real    0m0.416s
user   0m0.101s
sys    0m0.247s

$ ocamlopt -o test test.ml
$ time ./test
Done: 345001
real    0m1.483s
user   0m0.631s
sys    0m0.685s