[Caml-list] Bug somewhere...
Date: -- (:)
From: Alessandro Baretta <alex@b...>
Subject: Re: [Caml-list] Bug somewhere...

Pierre Weis wrote:
> A lot of problems in here: some are due to the semantics of the Scanf
> module, some are due to the implementation, some are even deeper than
> those two!
> Indeed the two programs are not equivalent (and their behaviours are
> indeed different!).

They are meant to be equivalent under the following 
assumption: the input file is divided into lines, each 
terminated by either '\n' or '\r'. The difference is mostly 
due to the fact that Scanf 3.06 reads one character more 
than the format string specifies. Any other 
differences are attributable to faulty connections in my brain.

> The first reason is that you cannot match eof (as you did with your
> lexer) using Scanf. This could be considered as a missing feature and
> we may add a convention to match end of file (either ``@.'', ``@$'',
> or ``$'' ?).

I can live with this. What Scanf *really lacks* is a 
C-equivalent support for partial matches. If a C-format 
matches only partially, only the conversions specified in 
the matched prefix are performed. In O'Caml, Scanf throws an 
exception. A better solution would be for Scanf.scanf to 
have type :
('a, Scanning.scanbuf, 'b) format -> 'a option -> 'b
If a conversion is performed, the callback function f is 
passed Some(<result>) for it; on a partial match, f receives 
None for each conversion that was not performed.
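
For illustration, something close to this can be approximated with the existing exception-based Scanf by scanning one conversion at a time on a shared buffer. This is only a minimal sketch: scan_opt and scan_two_ints are hypothetical helper names, not part of the Scanf module, and the code assumes a current OCaml standard library (Scanf.bscanf, Scanf.Scanning.from_string).

```ocaml
(* Hypothetical helper, not part of Scanf: run a single conversion and
   turn any scanning failure into None instead of an exception. *)
let scan_opt ib fmt =
  try Some (Scanf.bscanf ib fmt (fun x -> x))
  with Scanf.Scan_failure _ | Failure _ | End_of_file -> None

(* Scan up to two integers, C-style: conversions in the matched prefix
   are kept, the ones that were not performed come back as None. *)
let scan_two_ints s =
  let ib = Scanf.Scanning.from_string s in
  match scan_opt ib " %d" with
  | None -> (None, None)
  | Some a -> (Some a, scan_opt ib " %d")
```

With this sketch, scan_two_ints "12 abc" keeps the first conversion and reports the second one as None, much as C's scanf would report one successful conversion.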

This approach would make Scanf much more useful. We would be 
able to code simple parsers explicitly in OCaml logic and 
Scanf formats, whereas at present we are forced to go 
with Ocamllex/yacc. Take my case, for example.

> Second, your lexer uses an explicitly allocated buffer lexbuf, while
> the scanf corresponding call allocates a new input buffer for each
> invocation; but the semantics of Scanf imposes a look ahead of 1
> character to check that no other \n follows the \n that ends your
> pattern (the semantics of \n being to match 0 or more \n, space, tab,
> or return). For each line Scanf reads an extra character after the end
> of line; it stores this character (which is a '(' by the way) in the
> input buffer; but note that the character has been read from the
> in_channel; now the next scanf invocation will allocate a new input
> buffer that reads from stdin starting after the last character read by
> the preceding invocation (the '(' lookahead character). Hence you
> see that a '(' is missing at the beginning of each line after the
> first one!

This behaviour is counterintuitive, and should be considered 

> To solve this problem, you should use bscanf and an explicitly
> allocated input buffer that would survive from one call to scanf to
> the next one. Considering that this phenomenon is general concerning
> stdin and scanf, I rewrote the scanf code such that it allocates a
> buffer once and for all. Hence this problem is solved in the working
> sources.

Very good. Thank you very much.
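
A minimal sketch of the suggested bscanf approach, assuming a current OCaml standard library: read_lines is a hypothetical helper, and the format "%[^\n\r]%[\n\r]" is my own choice for consuming a line and then its terminator(s), not the format from the original program. The point is only that a single scanning buffer is created once and passed to every call.

```ocaml
(* Read every line from one scanning buffer. Because [ib] is shared by
   all bscanf calls, a lookahead character is never lost between calls. *)
let read_lines ib =
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else
      (* First conversion: the line itself. Second conversion: the run of
         '\n' / '\r' terminators, which may be empty on the last line. *)
      let line = Scanf.bscanf ib "%[^\n\r]%[\n\r]" (fun l _ -> l) in
      loop (line :: acc)
  in
  loop []

(* A string source keeps the sketch self-contained; each line keeps its
   leading '('. *)
let lines = read_lines (Scanf.Scanning.from_string "(a b)\n(c d)\r(e f)")
```

For standard input, the same function could be applied to Scanf.Scanning.stdin instead of a buffer built from a string. Note that this format collapses consecutive terminators, so blank lines in the middle of the input are skipped.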

> ...
> Another semantic question is: should the call
> sscanf "" "%[^\n\r]\n" (fun x -> x)
> be successful or not ? If yes, what happens to your problem ?

With the present semantics, it should raise an exception. 
With the semantics of partial matches it should succeed.
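
Since the answer depends on the Scanf semantics of the version at hand, a small probe like the following can report what a given installation actually does. It is a sketch that assumes nothing about the outcome; probe is a hypothetical name, and the result type (Ok / Error) requires a reasonably recent OCaml.

```ocaml
(* Run the call from the question and report what this OCaml version
   does, without assuming either outcome. *)
let probe () =
  try Ok (Scanf.sscanf "" "%[^\n\r]\n" (fun x -> x))
  with e -> Error (Printexc.to_string e)

let () =
  match probe () with
  | Ok s -> Printf.printf "matched %S\n" s
  | Error e -> Printf.printf "raised %s\n" e
```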

> An interesting example indeed that helps clarify the semantics of
> Scanf patterns and functions, thank you very much!
> Pierre Weis

I humbly bow to your kindness. Thank you very much for 
sharing your work with all of us.

