From: Pierre Weis <pierre.weis@i...>
Subject: Re: [Caml-list] Bug somewhere...
> Alessandro Baretta wrote:
> > It's either in my brain or in the Scanf module, the former possibility 
> > being definitely the more likely.
> > 
> > I have written a very simple program to compute md5 checksums of codes 
> > taken from a text file. Here it is:
> > 
> > let scan_line () = Scanf.scanf "%[^\n\r]\n" (fun a -> a)
> > let digest s = String.uppercase
> >   (Digest.to_hex(Digest.string s))
> > let digest_line s = print_endline (s ^ "#" ^ (digest s))
> > let _ = try while true do digest_line (scan_line ()) done
> >   with End_of_file -> ()
> I have rewritten my program in ocamllex. This one works. 
> Here it is.
> {
> }
> rule scanline = parse
> | [^'\n''\r']*  {Lexing.lexeme lexbuf}
> | ['\n''\r']*   {scanline lexbuf    }
> | eof           {raise End_of_file}
> {
> let lexbuf = Lexing.from_channel stdin in
> let digest s = String.uppercase
>    (Digest.to_hex (Digest.string s)) in
> let digest_line s = print_endline (s ^ "#" ^ (digest s)) in
>    try while true do digest_line (scanline lexbuf) done
>    with End_of_file -> ()
> }
> > Seems very reasonable...
> What's wrong with the Scanf version?
> Alex

There are a lot of problems in here: some are due to the semantics of
the Scanf module, some are due to its implementation, and some go even
deeper than those two!

Indeed, the two programs are not equivalent (and their behaviours do
differ!).

The first reason is that, unlike your lexer, Scanf cannot match end of
file. This could be considered a missing feature, and we may add a
convention to match end of file (for instance ``@.'', ``@$'', or
``$''?).
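Current releases of the Scanf module do document such a convention: the %! conversion matches the end of input and fails otherwise. The following is a minimal sketch assuming a current OCaml standard library; at_eof and read_lines are hypothetical names, and the input comes from a string rather than stdin so that the sketch runs as is.

```ocaml
(* Hypothetical helper: true iff the scanning buffer has no input left.
   "%!" matches end of input and raises Scan_failure otherwise; a failed
   check only peeks, so no character is consumed. *)
let at_eof ib =
  try Scanf.bscanf ib "%!" true with Scanf.Scan_failure _ -> false

(* Read newline-terminated lines, testing for end of input explicitly
   instead of waiting for End_of_file to escape from a conversion. *)
let read_lines ib =
  let rec loop acc =
    if at_eof ib then List.rev acc
    else loop (Scanf.bscanf ib "%[^\n]\n" (fun s -> s) :: acc)
  in
  loop []

let () =
  read_lines (Scanf.Scanning.from_string "first\nsecond\n")
  |> List.iter print_endline
```

Note that a last line lacking its terminating newline still makes the \n of the format raise End_of_file, so this sketch only covers properly terminated input.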

Second, your lexer uses an explicitly allocated buffer, lexbuf, whereas
each invocation of scanf allocates a new input buffer. Now, the
semantics of Scanf imposes a lookahead of 1 character to check that no
other \n follows the \n that ends your pattern (the semantics of \n
being to match 0 or more \n, space, tab, or return characters). So, for
each line, Scanf reads one extra character past the end of line and
stores it in its input buffer (in your input, this character happens to
be a '('). Note, however, that this character has already been read
from the in_channel: the next scanf invocation allocates a new input
buffer that reads from stdin starting after the last character read by
the preceding invocation, i.e. after the '(' lookahead character. This
is why a '(' is missing at the beginning of every line after the first
one!
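The character loss described above can be reproduced with a small sketch. One caveat: in current OCaml, a \n in a Scanf format matches a single end of line and does not look any further, so the sketch uses a trailing space instead, which still skips any amount of whitespace and therefore has to peek at the first character of the next line. The names make_source, read_n, fresh_buffer_lines and shared_buffer_lines are hypothetical; Scanf.Scanning.from_function stands in for stdin so that several buffers can share one character source.

```ocaml
(* Hypothetical helper: a character source shared by successive readers,
   standing in for the underlying in_channel. It raises End_of_file when
   exhausted, as Scanf.Scanning.from_function requires. *)
let make_source s =
  let pos = ref 0 in
  fun () ->
    if !pos >= String.length s then raise End_of_file
    else begin
      let c = s.[!pos] in
      incr pos;
      c
    end

let input = "(a\n(b\n(c\n"

(* Call [read] [n] times, in order, collecting the results. *)
let read_n n read =
  let rec loop i acc =
    if i = 0 then List.rev acc else loop (i - 1) (read () :: acc)
  in
  loop n []

(* A fresh buffer per call: the character peeked by the trailing space
   stays in the discarded buffer and is never seen again. *)
let fresh_buffer_lines () =
  let src = make_source input in
  read_n 3 (fun () ->
      Scanf.bscanf (Scanf.Scanning.from_function src) "%[^\n] " (fun s -> s))

(* One buffer shared by all calls: the peeked character survives. *)
let shared_buffer_lines () =
  let ib = Scanf.Scanning.from_function (make_source input) in
  read_n 3 (fun () -> Scanf.bscanf ib "%[^\n] " (fun s -> s))
```

Tracing the calls, fresh_buffer_lines should return ["(a"; "b"; "c"], losing the leading '(' of every line after the first, while shared_buffer_lines should return ["(a"; "(b"; "(c"].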

To solve this problem, you should use bscanf with an explicitly
allocated input buffer that survives from one call to the next. Since
this phenomenon affects every use of scanf on stdin, I rewrote the
scanf code so that it allocates its input buffer once and for all.
Hence this problem is solved in the working version.

In the meantime, explicitly allocating an input buffer solves the
problem for you:

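(* One input buffer, allocated once and shared by all calls,
   so the lookahead character is never lost between lines. *)
let ib = Scanf.Scanning.from_channel stdin
let scan_line () = Scanf.bscanf ib "%[^\n\r]\n" (fun a -> a)
(* String.uppercase_ascii replaces String.uppercase, which was
   deprecated and later removed from OCaml. *)
let digest s = String.uppercase_ascii
  (Digest.to_hex (Digest.string s))
let digest_line s = print_endline (s ^ "#" ^ digest s)
let () = try while true do digest_line (scan_line ()) done
   with End_of_file -> ()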
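The same bscanf loop can be packaged as a function over any Scanf.Scanning.in_channel, which makes it easy to try on a string instead of stdin. This is a sketch assuming a current OCaml standard library (hence String.uppercase_ascii); digest_lines is a hypothetical name. In current releases, Scanf.scanf itself reads from the single shared buffer Scanf.Scanning.stdin, which matches the once-and-for-all allocation described above.

```ocaml
(* Digest every newline-terminated line of [ib], returning the
   "line#DIGEST" strings instead of printing them. *)
let digest_lines ib =
  let rec loop acc =
    match Scanf.bscanf ib "%[^\n\r]\n" (fun s -> s) with
    | line ->
        let hex = String.uppercase_ascii (Digest.to_hex (Digest.string line)) in
        loop ((line ^ "#" ^ hex) :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

let () =
  Scanf.Scanning.from_string "abc\n(def\n"
  |> digest_lines
  |> List.iter print_endline
```

Because the buffer is shared across iterations, the second line keeps its leading '('.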

Another semantic question: should the call

sscanf "" "%[^\n\r]\n" (fun x -> x)

succeed or not? If it does, what happens to your problem?
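For comparison, here is a sketch of how a current OCaml standard library appears to answer this question, based on its documented matching rules rather than on anything in this thread: the %[...] conversion succeeds on empty input with an empty string, but the literal newline that follows needs one more character and raises End_of_file. That exception is precisely what terminates the while loops in the programs above. The names empty_token and with_newline are hypothetical.

```ocaml
(* On empty input, the char-set conversion alone succeeds with "". *)
let empty_token = Scanf.sscanf "" "%[^\n\r]" (fun x -> x)

(* Adding the newline to match makes the same call fail at end of
   input; End_of_file is turned into None here. *)
let with_newline =
  match Scanf.sscanf "" "%[^\n\r]\n" (fun x -> Some x) with
  | r -> r
  | exception End_of_file -> None

let () =
  Printf.printf "%S %b\n" empty_token (with_newline = None)
```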

This is indeed an interesting example that helps make the semantics of
Scanf patterns and functions more precise; thank you very much!

Pierre Weis

INRIA, Projet Cristal
