[Caml-list] ocaml and large development projects
| Date: | -- (:) |
| From: | Siegfried Gonzi <siegfried.gonzi@s...> |
| Subject: | Re: [Caml-list] Reading a file |
Michal Moskal wrote:
>
>If you expand each line of megabyte file to list of characters -- it
>cannot be fast.
>
Enclosed is the OCaml version in question.
'split' was pinched from comp.lang.functional. A year ago I had
a conversation there and someone posted this split function tailored to
my request: split "nil,2.23,3.34,nil" (-1.0) = [-1.0,2.23,3.34,-1.0]
'extractFloats' reads from an already opened file, applies split to
every line, and stores the results in a list:
==
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      let substring = String.sub s start (next-start) in
      loop (next+1) (substring :: acc)
    with
      Not_found ->
        let len = String.length s in
        let substring = String.sub s start (len-start) in
        List.rev (substring :: acc)
  in loop 0 []
;;

let frob userval s =
  match s with
  | "n/a" -> userval
  | "nil" -> userval
  | _ -> float_of_string s
;;

let extractFloats file del nanProxy =
  let rec readLoop i acc =
    try
      let line = input_line file in
      let floatL = List.map (frob nanProxy) (split line del) in
      readLoop (i+1) (floatL :: acc)
    with
      End_of_file ->
        List.rev acc
  in
  readLoop 0 []
;;

let f = open_in "/home/gonzi/test.txt";;
let erg2 = extractFloats f ',' (-1.0);;
let rows = List.length erg2;;
rows;;
====
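For reference, here is how the one-line request quoted earlier maps onto the two functions as posted: 'split' itself takes the delimiter character and returns strings, and 'frob' does the conversion to floats afterwards. The block repeats both definitions unchanged so that it runs on its own; only the name `example` is introduced here for illustration.

```ocaml
(* split and frob exactly as in the message, repeated so the example is
   self-contained. *)
let split s c =
  let rec loop start acc =
    try
      let next = String.index_from s start c in
      let substring = String.sub s start (next - start) in
      loop (next + 1) (substring :: acc)
    with Not_found ->
      let len = String.length s in
      let substring = String.sub s start (len - start) in
      List.rev (substring :: acc)
  in
  loop 0 []

let frob userval s =
  match s with
  | "n/a" -> userval
  | "nil" -> userval
  | _ -> float_of_string s

(* "example" is a hypothetical name: split on ',' and then map "nil"
   and "n/a" to the proxy value -1.0. *)
let example = List.map (frob (-1.0)) (split "nil,2.23,3.34,nil" ',')
(* example = [-1.0; 2.23; 3.34; -1.0] *)
```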
Also enclosed is the Clean function. I find this version far more
readable than the OCaml version, but I do not know how to translate it
to OCaml. My Clean function reads line after line and passes each
line string on to RealsFromString. The latter function converts the
line string to a char list, [x\\x <-: string-line], and uses takeWhile,
toString, dropWhile and toReal to get the double numbers. As I
said, the function is incredibly fast and takes about 15 seconds for a
50MB file.
OCaml takes 8 minutes. If I only read the file line by line
(without the conversion to double numbers), OCaml takes
about 1 minute. Where is the bottleneck here? List.map or what?
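One thing that could be measured, offered as a sketch and not as a diagnosis of the timings above: in 'extractFloats' the recursive call to readLoop sits inside the try ... with block, so it is not a tail call and the stack grows with the number of lines read. The variant below wraps only input_line in the handler, using the `match ... with exception` form (which assumes OCaml 4.02 or later). The names `parse_line` and `extract_floats_tail` are hypothetical, and `String.split_on_char` from the standard library (OCaml 4.04 or later) stands in for the hand-written 'split'.

```ocaml
(* Hypothetical helper, not from the original message: split one line on
   del and map "nil" / "n/a" to a user-supplied proxy value. *)
let parse_line del nan_proxy line =
  String.split_on_char del line
  |> List.map (function
       | "nil" | "n/a" -> nan_proxy
       | s -> float_of_string s)

(* Hypothetical variant of extractFloats. The exception case belongs to
   the match on input_line only, so the call to read_loop is in tail
   position and the stack stays flat however many lines the file has. *)
let extract_floats_tail ic del nan_proxy =
  let rec read_loop acc =
    match input_line ic with
    | line -> read_loop (parse_line del nan_proxy line :: acc)
    | exception End_of_file -> List.rev acc
  in
  read_loop []
```

It would be called like the original, for example `extract_floats_tail (open_in "/home/gonzi/test.txt") ',' (-1.0)`. Whether this changes the running time on the 50MB file would have to be timed.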
I think everybody has one specific task which he tries to implement in
every programming language he encounters. My specific task is this
floating-point extraction from string-files.
I didn't play around with different OCaml solutions, because I
had to play a bit with OCaml's Psilab implementation (if you need
something like Python+Numeric+Dislin you could give Psilab a try).
If you need the whole Clean program drop me a note. By the way: my
Scheme version is clumsy and is more or less similar to the OCaml
version. I wrote this verbose Scheme (Bigloo) version a year ago when I
was a Scheme beginner. The Scheme (Bigloo) version takes about
30 seconds for this 50MB file and is therefore similar to the
C++-template version, which also takes about 30 seconds.
Oh yes: do not conclude from my writing "clumsy" that OCaml
is clumsy too; I strongly believe that OCaml's
exception handling mechanism is more or less better than Clean's,
because Clean does not possess such a thing as exception handling, so to
speak.
S. Gonzi
====
////////////////////////////////////////////////
// The dead-as-Latin functional language
// with the most readable syntax out there
// and one of the /fastest functional languages/:
// Clean (in the meantime open (source?)
// for Linux/Unix). But as life goes:
// nobody jumps onto the Clean bandwagon. Is this
// a pity or a blessing? Why doesn't the "most"
// readable syntax play a role in real life?
// Do not get me wrong, but why does the
// "punctuation syntax" always win in real life?
////////////////////////////////////////////////
FExtractReals:: HeaderKeys File -> [[Real]]
FExtractReals h file
    | sfend file = []
    # (line,nextline) = sfreadline file
    = [(RealsFromString line h.del h.nan h.nanProxy) :
       (FExtractReals h nextline)]

RealsFromString:: String Char String Real -> [Real]
RealsFromString line del nan nanProxy = searchDel [x\\x<-:line]
where
    searchDel:: [Char] -> [Real]
    searchDel [] = []
    searchDel linerest
        # val = toString( takeWhile notDelNl linerest )
        # rest = dropWhile ((<>)del) linerest
        = [toRealNaN val nan : searchDel (drop 1 rest)]

    notDelNl::Char -> Bool
    notDelNl x
        | x==del = False
        | x==' ' = False
        | x=='\t' = False
        | x=='\n' = False
        = True

    toRealNaN:: String String -> Real
    toRealNaN s nan
        | s==nan = nanProxy
        = toReal(s)
====
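A possible line-for-line transliteration of RealsFromString into OCaml, as a rough sketch only. It keeps the char-list approach for comparison of readability, not for speed. The names `take_while`, `drop_while` and `reals_from_string` are hypothetical, and the block assumes OCaml 4.07 or later for `List.init`, `List.to_seq` and `String.of_seq`. Note that `float_of_string` raises `Failure` on an empty or malformed field, which may differ from what Clean's toReal does.

```ocaml
(* Hypothetical list helpers mirroring Clean's takeWhile and dropWhile. *)
let rec take_while p = function
  | x :: xs when p x -> x :: take_while p xs
  | _ -> []

let rec drop_while p = function
  | x :: xs when p x -> drop_while p xs
  | rest -> rest

(* Sketch of RealsFromString: line is the input string, del the
   delimiter, nan the placeholder text and nan_proxy its replacement. *)
let reals_from_string line del nan nan_proxy =
  (* notDelNl: true for anything but the delimiter and whitespace *)
  let not_del_nl c = not (c = del || c = ' ' || c = '\t' || c = '\n') in
  (* toRealNaN: replace the placeholder, otherwise convert *)
  let to_real_nan s = if s = nan then nan_proxy else float_of_string s in
  let rec search_del = function
    | [] -> []
    | rest ->
      let value = String.of_seq (List.to_seq (take_while not_del_nl rest)) in
      let after = drop_while (fun c -> c <> del) rest in
      (* "drop 1 rest" in the Clean version: skip the delimiter itself *)
      let tail = match after with [] -> [] | _ :: t -> t in
      to_real_nan value :: search_del tail
  in
  (* [x\\x<-:line] in Clean: the string as a list of characters *)
  search_del (List.init (String.length line) (String.get line))
```

For example, `reals_from_string "nil,2.23,3.34,nil" ',' "nil" (-1.0)` evaluates to `[-1.0; 2.23; 3.34; -1.0]`.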
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners