Mantis Bug Tracker

ID: 0004562
Project: OCaml
Category: ~DO NOT USE (was: OCaml general)
View Status: public
Date Submitted: 2008-06-05 12:16
Last Update: 2009-04-29 20:44
Assigned To: weis
Platform / OS / OS Version: (not given)
Product Version: 3.10.2
Target Version: (not given)
Fixed in Version: 3.11.0+beta
Summary: 0004562: scanf produces wrong %n output value after integer conversion
Description: Comments added for clarity.

$ ocaml
        Objective Caml version 3.10.2

# let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n));;
val g : string -> int * int = <fun>
# g "99";;
- : int * int = (99, 2) (* Correct. *)
# g "99 syntaxes all in a row";;
- : int * int = (99, 3) (* Wrong. *)
# g "-20 degrees Celsius";;
- : int * int = (-20, 4) (* Also wrong. *)
# for i = 32 to 126 do
    if ((i < 48) || (i >= 58)) && (i != 95) then
      let (_, n) = g ("42" ^ (String.make 1 (char_of_int i))) in
      if n <> 3 then Printf.printf "Hmm: %d\n%!" n
  done;;
- : unit = ()  (* No output: n is 3 for every printable
                  ASCII char in [^0-9_]. *)

Additional Information: This is on Debian unstable AMD64, version 3.10.2-3 of the "ocaml" package.
A cursory glance at stdlib/ makes me think that not only is it peeking a char and then erroneously counting it as part of the character count, but the Scanning module has no way to do otherwise, so fixing this probably requires a larger change than I would otherwise have expected. :-(
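A scanner can physically read a lookahead character and still leave it out of the logical count. The sketch below is a hypothetical toy, not the stdlib Scanning module; `scanner`, `peek`, `junk`, `consumed` and `read_int` are illustrative names. It counts characters physically pulled from the input and separately remembers whether the last one was only peeked at, so `physical` reproduces the 3.10.2 numbers above while `consumed` gives the counts the report expects.

```ocaml
(* Hypothetical toy scanner with one character of lookahead.
   [physical] counts characters actually pulled from [src];
   [pending] holds a character that was peeked at but not consumed. *)
type scanner = {
  src : string;
  mutable physical : int;
  mutable pending : char option;
}

let make src = { src; physical = 0; pending = None }

(* Look at the next character without consuming it.
   Returns None at end of input. *)
let peek sc =
  match sc.pending with
  | Some _ as c -> c
  | None ->
    if sc.physical >= String.length sc.src then None
    else begin
      let c = sc.src.[sc.physical] in
      sc.physical <- sc.physical + 1;
      sc.pending <- Some c;
      Some c
    end

(* Consume the character returned by the last [peek]. *)
let junk sc = sc.pending <- None

(* Characters logically consumed: a peeked-but-unconsumed
   lookahead character is not counted. *)
let consumed sc =
  match sc.pending with
  | Some _ -> sc.physical - 1
  | None -> sc.physical

(* Read an optionally signed decimal integer, stopping at the first
   character that cannot belong to it (left pending as lookahead).
   No error handling: input with no digits yields 0. *)
let read_int sc =
  let neg = (match peek sc with Some '-' -> junk sc; true | _ -> false) in
  let rec digits acc =
    match peek sc with
    | Some ('0' .. '9' as c) ->
      junk sc;
      digits (acc * 10 + Char.code c - Char.code '0')
    | _ -> acc
  in
  let n = digits 0 in
  if neg then -n else n
```

On "99 syntaxes all in a row", `read_int` returns 99 with `physical = 3` and `consumed = 2`; on "-20 degrees Celsius" it returns -20 with `physical = 4` and `consumed = 3`.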
Tags: No tags attached.

Notes
weis (developer)
2008-06-06 10:12

This is clearly a semantic issue, not a bug.
weis (developer)
2008-06-06 11:41

I think you overlooked the definition of the %n conversion; the documentation for Scanf states:

- [n]: returns the number of characters read so far.

If we accept this definition, %n is not supposed to give the number of characters in the tokens, or even to be related to the length of the tokens: it just returns the number of characters that have been "read so far" to return those tokens. Hence, there are no errors in the examples you gave: the number of characters read so far to return the tokens you asked for is precisely what the call to scanf reports.

This behaviour is also briefly explained in a note in the documentation:

    Note: a scan may often require to examine one character in advance;
    when this ``lookahead'' character does not belong to the token read,
    it is stored back in the scanning buffer and becomes the next
    character read.

A seminal example of a scan that requires a lookahead character is the very useful %0c conversion, which means "test the current input character without reading it". To let you examine the NEXT character to be read, the %0c conversion must read this character and store it back so that it becomes the next character read.
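A minimal sketch of %0c in use, assuming a current OCaml (4.x or later) Scanf: peek at the next character of a `Scanf.Scanning.in_channel`, then pick the conversion to apply. The function name `read_item` and the `` `Int ``/`` `Word `` variants are illustrative only.

```ocaml
(* Peek at the next character with %0c (it is not consumed),
   then choose a conversion based on what it is. *)
let read_item ib =
  let c = Scanf.bscanf ib "%0c" (fun c -> c) in
  if c >= '0' && c <= '9'
  then `Int (Scanf.bscanf ib "%d" (fun i -> i))
  else `Word (Scanf.bscanf ib "%s" (fun w -> w))
```

On a channel built with `Scanf.Scanning.from_string "42abc"`, a first call returns `` `Int 42 `` and a second returns `` `Word "abc" ``. Likewise, `Scanf.sscanf "abc" "%0c%c" (fun p c -> (p, c))` yields `('a', 'a')`, since the peeked character is still the next one read.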

This behaviour is not at all uncommon: in fact, almost all conversions (%s, %d, %f, and so on) need such a lookahead. This is clear when reading an integer from the string "0123abc": scanf must read the character 'a' before concluding that the number ends at the character '3' of the input. Hence, after reading 123, the %n conversion returns the exact count of characters read so far, which is 5.

# Scanf.sscanf "0123abc" "%i%n" (fun n count_for_n -> n, count_for_n);;
- : int * int = (123, 5)

Note also that reading a single character after the integer does not change the "number of characters read so far", since no further character needs to be read to get 'a':
# Scanf.sscanf "0123abc" "%i%n%c%n"
   (fun n count_for_n c count_for_c -> n, count_for_n, c, count_for_c);;
- : int * int * char * int = (123, 5, 'a', 5)
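The counts above reflect 3.10.2. Under the semantics adopted for 3.11 (recorded in the resolution note further down), the lookahead 'a' is no longer counted, so the same two scans should report 4 after the integer and 5 after the %c. A sketch wrapping them as functions; `int_then_count` and `int_char_counts` are hypothetical names, and the expected values assume OCaml 3.11 or later.

```ocaml
(* The two scans from the note above, as reusable functions.
   With 3.11+ semantics, %n excludes a pending lookahead character. *)
let int_then_count s =
  Scanf.sscanf s "%i%n" (fun n count_for_n -> (n, count_for_n))

let int_char_counts s =
  Scanf.sscanf s "%i%n%c%n"
    (fun n count_for_n c count_for_c -> (n, count_for_n, c, count_for_c))
```

Here `int_then_count "0123abc"` gives `(123, 4)` and `int_char_counts "0123abc"` gives `(123, 4, 'a', 5)`: only the %c actually consumes the 'a', so only the second %n includes it.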
premchai21 (reporter)
2008-06-06 12:04

I do not see that note paragraph anywhere in [^] or in my local copy of the documentation. Where is that note located?

Every C scanf implementation that I have seen defines %n to mean "number of characters read so far" with the semantics of "number of characters consumed that were used to match tokens or other parts of the format string, not including any lookahead characters read from the input stream". In the absence of a formal definition, the Caml documentation can reasonably be interpreted this way as well. It is also a much more common and useful case to require the number of characters matched without including any lookahead characters. Making the interpretation of lookahead (which is more an internal detail of the Scanf module) a necessary part of constructing the conversion strings and functions feels rather unclean.

Even the note paragraph that you quote doesn't seem to contradict that idea; it states that when a lookahead character is stored back into the scanning buffer, it becomes the next character read. This to me implies that it has been _unread_ and is therefore no longer considered read as regards the _logical_ state of the scanner, even if one more character had to be physically read from the input in order to produce this state.
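The "more common and useful case" described above is using %n as an offset into the input. A minimal sketch, assuming the 3.11+ semantics (`split_leading_int` is a hypothetical helper): parse a leading integer and return it together with the unparsed rest of the string. With the 3.10.2 count, the remainder would silently lose its first character, giving "syntaxes all in a row" instead of " syntaxes all in a row".

```ocaml
(* Parse a leading integer and return it with the remainder of [s],
   using the %n count as an offset into the original string. *)
let split_leading_int s =
  Scanf.sscanf s "%d%n" (fun i n ->
    (i, String.sub s n (String.length s - n)))
```

For example, `split_leading_int "-20 degrees Celsius"` returns `(-20, " degrees Celsius")`, and `split_leading_int "99"` returns `(99, "")`.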
weis (developer)
2008-09-08 14:49

This is fixed in the current development version:

        Objective Caml version 3.11+dev15

# let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n));;
val g : string -> int * int = <fun>
# g "99";;
- : int * int = (99, 2)
# g "99 syntaxes all in a row";;
- : int * int = (99, 2)
# g "-20 degrees Celsius";;
- : int * int = (-20, 3)

So now the lookahead character is no longer counted as read, even though it really has been. I agree with you that this semantics is clearer and more sound.

Issue History
Date Modified Username Field Change
2008-06-05 12:16 premchai21 New Issue
2008-06-06 10:12 weis Note Added: 0004514
2008-06-06 10:12 weis Assigned To => weis
2008-06-06 10:12 weis Status new => assigned
2008-06-06 11:41 weis Note Added: 0004515
2008-06-06 12:04 premchai21 Note Added: 0004516
2008-09-08 14:49 weis Note Added: 0004602
2008-09-08 14:50 weis Status assigned => resolved
2008-09-08 14:50 weis Resolution open => fixed
2009-04-29 20:44 weis Status resolved => closed
2009-04-29 20:44 weis Fixed in Version => 3.11.0+beta
2017-02-23 16:36 doligez Category OCaml general => -OCaml general
2017-03-03 17:55 doligez Category -OCaml general => -(deprecated) general
2017-03-03 18:01 doligez Category -(deprecated) general => ~deprecated (was: OCaml general)
2017-03-06 17:04 doligez Category ~deprecated (was: OCaml general) => ~DO NOT USE (was: OCaml general)
