[Caml-list] ocaml and large development projects
Date: 2003-05-21 (07:23)
From: Siegfried Gonzi <siegfried.gonzi@s...>
Subject: Re: [Caml-list] Reading a file
Michal Moskal wrote:

>If you expand each line of megabyte file to list of characters -- it
>cannot be fast.

Enclosed is the OCaml version in question:

'split' was pinched from comp.lang.functional. A year ago I had a
conversation there, and someone posted this split function tailored to
my request: split "nil,2.23,3.34,nil" (-1.0) = [-1.0,2.23,3.34,-1.0]

'extractFloats' reads an opened file, applies split to every line, and
stores the results in a list:

let split s c =
  let rec loop start acc =
    try
      (* next delimiter at or after start; raises Not_found if none *)
      let next = String.index_from s start c in
      let substring = String.sub s start (next - start) in
      loop (next + 1) (substring :: acc)
    with Not_found ->
      (* no delimiter left: the rest of the string is the last field *)
      let len = String.length s in
      let substring = String.sub s start (len - start) in
      List.rev (substring :: acc)
  in
  loop 0 []

let frob userval s =
    match s with
    | "n/a" -> userval
    | "nil" -> userval
    | _ -> float_of_string s

let extractFloats file del nanProxy =
  let rec readLoop i acc =
    try
      let line = input_line file in
      let floatL = List.map (frob nanProxy) (split line del) in
      readLoop (i + 1) (floatL :: acc)
    with End_of_file ->
      (* input_line raises End_of_file once the file is exhausted *)
      List.rev acc
  in
  readLoop 0 []

let f = open_in "/home/gonzi/test.txt";;
let erg2 = extractFloats f ',' (-1.0);;
let rows = List.length erg2;;
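
As a quick check of the two helpers, here is a minimal sketch that runs
them on the sample string from the comp.lang.functional request. 'split'
and 'frob' are repeated so the snippet stands alone; the names 'fields'
and 'floats' are made up for illustration. 'fields' should come out as
["nil"; "2.23"; "3.34"; "nil"] and 'floats' as the list from the request.

```ocaml
(* split: cut s at every occurrence of the delimiter c *)
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      loop (next + 1) (String.sub s start (next - start) :: acc)
    with Not_found ->
      List.rev (String.sub s start (String.length s - start) :: acc)
  in
  loop 0 []

(* frob: map the "missing value" markers to userval, parse the rest *)
let frob userval s =
  match s with
  | "n/a" | "nil" -> userval
  | _ -> float_of_string s

(* Illustrative names, not part of the original program *)
let fields = split "nil,2.23,3.34,nil" ','
let floats = List.map (frob (-1.0)) fields

let () =
  List.iter (fun x -> Printf.printf "%g " x) floats;
  print_newline ()
```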

Also enclosed is the Clean function. This version is far more
readable than the OCaml version, but I do not know how to translate it
to OCaml. My Clean function reads the file line by line and passes each
line on to RealsFromString. The latter function converts the line to a
char list, [x\\x <-: string-line], and uses takeWhile, toString,
dropWhile and toReal to get the double numbers. As I said, the function
is incredibly fast and takes about 15 seconds for a 50MB file.

OCaml takes 8 minutes. If I only read the file line by line
(without the conversion to double numbers), OCaml takes
about 1 minute. Where is the bottleneck here? List.map or what?
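
One thing that can be ruled in or out by measurement: if the recursive
call in readLoop sits inside the exception handler, as the End_of_file
branch suggests, it is not a tail call, so every line read leaves a
handler and a stack frame behind. Whether that explains the gap between
1 and 8 minutes is not established here. The sketch below only shows a
variant whose loop is a real tail call, plus a timing wrapper to compare
the two. The names 'read_line_opt', 'extract_floats_tail' and 'time' are
hypothetical and not from the original program.

```ocaml
(* split and frob as in the post, repeated so the sketch stands alone *)
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      loop (next + 1) (String.sub s start (next - start) :: acc)
    with Not_found ->
      List.rev (String.sub s start (String.length s - start) :: acc)
  in
  loop 0 []

let frob userval s =
  match s with
  | "n/a" | "nil" -> userval
  | _ -> float_of_string s

(* Hypothetical helper: one line, or None at end of file.
   The handler is confined to this function. *)
let read_line_opt ic =
  try Some (input_line ic) with End_of_file -> None

(* Variant of extractFloats: the recursive call is outside any
   try ... with, so it is a tail call and the stack does not grow
   with the number of lines. *)
let extract_floats_tail ic del nan_proxy =
  let rec loop acc =
    match read_line_opt ic with
    | None -> List.rev acc
    | Some line ->
        let row = List.map (frob nan_proxy) (split line del) in
        loop (row :: acc)
  in
  loop []

(* Hypothetical timing wrapper: prints the CPU seconds spent in f () *)
let time label f =
  let t0 = Sys.time () in
  let result = f () in
  Printf.printf "%s: %.2f s\n" label (Sys.time () -. t0);
  result
```

To compare, wrap each reader the same way, for example
time "tail" (fun () -> extract_floats_tail f ',' (-1.0)) on a freshly
opened channel f, and do the same with a loop that only calls input_line
to separate reading time from conversion time.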

I think everybody has one specific task which he tries to implement in
every programming language he encounters. My specific task is this
floating-point extraction from string-files.

I didn't play around with different OCaml solutions, because I
had to play a bit with OCaml's Psilab implementation (if you need
something like Python+Numeric+Dislin you could give Psilab a try).

If you need the whole Clean program, drop me a note. By the way: my
Scheme version is clumsy and is more or less similar to the OCaml
version. I wrote this verbose Scheme (Bigloo) version a year ago when I
was a Scheme beginner. The Scheme (Bigloo) version takes about
30 seconds for this 50MB file and is therefore similar to the
C++ template version, which also takes about 30 seconds.

Oh yes: when I write "clumsy", do not jump to the conclusion that I
mean OCaml is clumsy too. I strongly believe that OCaml's exception
handling mechanism is more or less better than Clean's, because Clean
does not possess such a thing as exception handling, so to speak.
S. Gonzi

// The dead-as-Latin functional language
// with the most readable syntax out there
// and one of the /fastest functional languages/:
// Clean (in the meantime open source(?)
// for Linux/Unix). But as life goes:
// nobody jumps onto the Clean bandwagon. Is this
// a pity or a blessing? Why doesn't the "most"
// readable syntax play a role in real life?
// Do not get me wrong, but why does the
// "punctuation syntax" always win in real life?
FExtractReals:: HeaderKeys File -> [[Real]]
FExtractReals h file
		  | sfend file = []
		  # (line,nextline) = sfreadline file
		  = [(RealsFromString line h.del h.nan h.nanProxy) :
		     (FExtractReals h nextline)]

RealsFromString:: String Char String Real -> [Real]
RealsFromString line del nan nanProxy = searchDel [x\\x<-:line]
where
	searchDel:: [Char] -> [Real]
	searchDel [] = []
	searchDel linerest
		  # val = toString( takeWhile notDelNl linerest )
		  # rest = dropWhile ((<>)del) linerest
		  = [toRealNaN val nan : searchDel (drop 1 rest)]
	notDelNl::Char -> Bool
	notDelNl x
	       | x==del = False
	       | x==' ' = False
	       | x=='\t' = False
	       | x=='\n' = False
	       = True
	toRealNaN:: String String -> Real
	toRealNaN s nan
		  | s==nan = nanProxy
		  = toReal(s)
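
For readers who want the same shape in OCaml, here is a rough sketch of
how RealsFromString might be transliterated. It is a reading aid, not a
tuned solution: it still expands every line into a list of characters.
All names ('take_while', 'drop_while', 'chars_of_string',
'string_of_chars', 'reals_from_string') are made up for this sketch, and
the list helpers are defined locally so it does not depend on a
particular standard-library version. One difference is assumed away: an
empty field makes float_of_string raise Failure here.

```ocaml
(* Longest prefix of the list whose elements satisfy p *)
let rec take_while p = function
  | x :: rest when p x -> x :: take_while p rest
  | _ -> []

(* The list without its longest prefix satisfying p *)
let rec drop_while p = function
  | x :: rest when p x -> drop_while p rest
  | l -> l

(* Counterpart of Clean's [x \\ x <-: line] *)
let chars_of_string s = List.init (String.length s) (String.get s)

(* Counterpart of Clean's toString on a char list *)
let string_of_chars cs =
  let b = Buffer.create 16 in
  List.iter (Buffer.add_char b) cs;
  Buffer.contents b

let reals_from_string line del nan nan_proxy =
  (* a field ends at the delimiter or at whitespace *)
  let not_del_nl x = not (x = del || x = ' ' || x = '\t' || x = '\n') in
  let to_real_nan s = if s = nan then nan_proxy else float_of_string s in
  let rec search_del = function
    | [] -> []
    | linerest ->
        let v = string_of_chars (take_while not_del_nl linerest) in
        let rest = drop_while (fun x -> x <> del) linerest in
        (* drop 1 rest: skip the delimiter itself, if there is one *)
        let tail = match rest with [] -> [] | _ :: t -> t in
        to_real_nan v :: search_del tail
  in
  search_del (chars_of_string line)
```

For example, reals_from_string "nil,2.23,3.34,nil" ',' "nil" (-1.0)
gives the four floats from the request at the top of this message.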
