scanf produces wrong %n output value after integer conversion #4562
Comments
Comment author: @pierreweis This is clearly a semantic issue, not a bug.
Comment author: @pierreweis I think you overlooked the definition of the %n conversion; the documentation for Scanf states it as follows:
If we accept this definition, %n is not supposed to give the number of characters in tokens, or even to be related to the length of tokens: it just returns the number of characters that have been read so far. This behaviour is also briefly explained in a note in the documentation:
A seminal example of this kind of scan that requires a lookahead character is the very useful %0c conversion, which means "test the current input character without reading it". To let you examine the NEXT character to be read, the %0c conversion must read this character and store it so that it becomes the next character to be read. This behaviour is not at all uncommon: in fact, almost all the conversions need such a lookahead (%s, %d, %f, and so on). This is clear when asking for an integer from the string "0123abc": scanf must read the character 'a' before it can state that the number indeed ends at character '3' of the input. Hence after reading 123, the %n conversion returns the exact count of characters read so far, which is 5:
Scanf.sscanf "0123abc" "%i%n" (fun n count_for_n -> n, count_for_n);;
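The lookahead mechanism described above can be modelled with a toy scanning buffer. This is a minimal sketch under stated assumptions, not the real Scanf internals: the type `buf` and the functions `peek`, `junk`, `physical_count`, `logical_count` and `scan_digits` are hypothetical names. `physical_count` counts every character physically read, including a pending lookahead (the counting described in this comment); `logical_count` treats a pending lookahead character as unread.

```ocaml
(* Toy model of a scanning buffer with one character of lookahead.
   Hypothetical sketch: not the real Scanf internals. *)
type buf = {
  s : string;            (* the whole input *)
  mutable pos : int;     (* number of characters physically read from s *)
  mutable peeked : bool; (* true if s.[pos - 1] is a pending lookahead *)
}

let make s = { s; pos = 0; peeked = false }

(* Return the next character without consuming it, physically reading it
   from the input only if no lookahead is pending. *)
let peek b =
  if b.peeked then Some b.s.[b.pos - 1]
  else if b.pos >= String.length b.s then None
  else begin
    b.pos <- b.pos + 1;
    b.peeked <- true;
    Some b.s.[b.pos - 1]
  end

(* Consume the pending lookahead character. *)
let junk b = b.peeked <- false

(* Counting as described above: every character physically read. *)
let physical_count b = b.pos

(* Counting that treats a pending lookahead character as unread. *)
let logical_count b = if b.peeked then b.pos - 1 else b.pos

(* Read a run of decimal digits as an integer conversion would: the first
   non-digit must be read to find the end, and stays pending as lookahead. *)
let scan_digits b =
  let acc = Buffer.create 8 in
  let rec loop () =
    match peek b with
    | Some ('0' .. '9' as c) -> Buffer.add_char acc c; junk b; loop ()
    | Some _ | None -> ()
  in
  loop ();
  Buffer.contents acc

let () =
  let b = make "0123abc" in
  let digits = scan_digits b in
  (* prints: 0123 physical=5 logical=4 *)
  Printf.printf "%s physical=%d logical=%d\n"
    digits (physical_count b) (logical_count b)
```

The two counters differ only while a lookahead character is pending; at end of input (e.g. "99") nothing extra is read and both report 2.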
Note also that reading a single character after the integer does not change the "number of characters read so far", since no further character needs to be read to find 'a':
Scanf.sscanf "0123abc" "%i%n%c%n" (fun n count_for_n c count_for_c -> n, count_for_n, c, count_for_c);;
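For comparison, here is a sketch of the same call run on an OCaml release that includes the fix reported later in this thread (3.11.0+beta and later, assuming the semantics have not changed since). The first %n no longer counts the pending 'a'; the second one does, because %c has consumed it.

```ocaml
(* With the fixed semantics, the lookahead 'a' is not counted by the first
   %n; once %c consumes it, the second %n includes it. *)
let n, count_for_n, c, count_for_c =
  Scanf.sscanf "0123abc" "%i%n%c%n"
    (fun n count_for_n c count_for_c -> (n, count_for_n, c, count_for_c))

let () =
  (* prints: n=123 count_for_n=4 c=a count_for_c=5 *)
  Printf.printf "n=%d count_for_n=%d c=%c count_for_c=%d\n"
    n count_for_n c count_for_c
```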
Comment author: premchai21 I do not see that note paragraph anywhere in http://caml.inria.fr/pub/docs/manual-ocaml/libref/Scanf.html or in my local copy of the documentation. Where is that note located?
Every C scanf implementation that I have seen defines %n to mean "number of characters read so far" with the semantics of "number of characters consumed that were used to match tokens or other parts of the format string, not including any lookahead characters read from the input stream". In the absence of a formal definition, the Caml documentation can reasonably be interpreted this way as well. It is also a much more common and useful case to require the number of characters matched without including any lookahead characters. Making the interpretation of lookahead (which is more an internal detail of the Scanf module) a necessary part of constructing the conversion strings and functions feels rather unclean.
Even the note paragraph that you quote doesn't seem to contradict that idea: it states that when a lookahead character is stored back into the scanning buffer, it becomes the next character read. To me this implies that it has been unread and is therefore no longer considered read as far as the logical state of the scanner is concerned, even if one more character had to be physically read from the input in order to produce this state.
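One common use of the "characters matched" interpretation is slicing off the unparsed remainder of a string. The helper below, `split_leading_int`, is hypothetical (it is not part of Scanf) and assumes the lookahead-excluding %n semantics that OCaml adopted after this report; if %n also counted the lookahead character, the slice would start one character too late.

```ocaml
(* Split s into a leading decimal integer and the unparsed remainder.
   Hypothetical helper; relies on %n not counting the lookahead character.
   Raises Scanf.Scan_failure if s does not start with an integer. *)
let split_leading_int s =
  Scanf.sscanf s "%d%n"
    (fun i n -> (i, String.sub s n (String.length s - n)))

let () =
  let i, rest = split_leading_int "250ms timeout" in
  (* prints: 250|ms timeout *)
  Printf.printf "%d|%s\n" i rest
```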
Comment author: @pierreweis This is fixed in the current development version:
let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n));;
val g : string -> int * int = <fun>
g "99";;
g "99 syntaxes all in a row";;
g "-20 degrees Celsius";;
So now the lookahead character is no longer counted as read, even though it has physically been read. I agree with you that this semantics is clearer and more sound.
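The `g` function from the comment above can be checked as a standalone program. This sketch assumes an OCaml release that includes the fix; with the lookahead no longer counted, %n reports only the characters of the integer itself.

```ocaml
(* Scan a leading integer and report how many characters it consumed. *)
let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n))

let () =
  (* prints:
     "99" -> (99, 2)
     "99 syntaxes all in a row" -> (99, 2)
     "-20 degrees Celsius" -> (-20, 3) *)
  List.iter
    (fun s -> let i, n = g s in Printf.printf "%S -> (%d, %d)\n" s i n)
    [ "99"; "99 syntaxes all in a row"; "-20 degrees Celsius" ]
```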
Original bug ID: 4562
Reporter: premchai21
Assigned to: @pierreweis
Status: closed (set by @pierreweis on 2009-04-29T18:44:06Z)
Resolution: fixed
Priority: normal
Severity: minor
Version: 3.10.2
Fixed in version: 3.11.0+beta
Category: ~DO NOT USE (was: OCaml general)
Bug description
Comments added for clarity.
$ ocaml
Objective Caml version 3.10.2
let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n));;
val g : string -> int * int = <fun>
g "99";;
g "99 syntaxes all in a row";;
g "-20 degrees Celsius";;
for i = 32 to 126 do
done;;
ASCII chars in [^0-9_]. *)
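The body of the loop above is missing, so the following is only a hypothetical reconstruction in its spirit, not the reporter's code: for every printable ASCII character c outside [0-9_] (digits and underscores would extend the integer token), it scans "1" followed by c with "%d%n" and collects the characters for which the result is not (1, 1). The helper name `mismatches` is an assumption. On an OCaml release that includes the fix, the list is expected to be empty.

```ocaml
let g s = Scanf.sscanf s "%d%n" (fun i n -> (i, n))

(* Printable ASCII characters c outside [0-9_] for which scanning "1" ^ c
   does not yield (1, 1), i.e. where %n counted more than the digit. *)
let mismatches () =
  let bad = ref [] in
  for i = 32 to 126 do
    let c = Char.chr i in
    begin match c with
    | '0' .. '9' | '_' -> ()   (* these would extend the integer token *)
    | _ -> if g ("1" ^ String.make 1 c) <> (1, 1) then bad := c :: !bad
    end
  done;
  List.rev !bad

let () =
  match mismatches () with
  | [] -> print_endline "every non-digit terminator gives (1, 1)"
  | cs -> List.iter (fun c -> Printf.printf "mismatch after %C\n" c) cs
```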
Additional information
This is on Debian unstable AMD64, version 3.10.2-3 of the "ocaml" package.
A cursory glance at stdlib/scanf.ml makes me think that not only is it peeking a char and then erroneously counting that as part of the character count, but the Scanning stuff doesn't have a way to do otherwise, making this probably require a larger change to fix than I would have otherwise expected. :-(