Mantis Bug Tracker

ID: 0005380
Project: OCaml
Category: OCaml general
View Status: public
Date Submitted: 2011-10-22 19:49
Last Update: 2013-08-02 22:18
Reporter: bluestorm
Assigned To: weis
Priority: normal
Severity: crash
Reproducibility: always
Status: closed
Resolution: fixed
Platform:
OS:
OS Version:
Product Version: 3.12.0
Target Version:
Fixed in Version:
Summary: 0005380: Strange sscanf input segfault
Description: Following a question from Jianzhou Zhao on the beginners list (using Scanf to parse %-separated strings), I tried the following code, which typechecks and segfaults on my machine:

  Scanf.sscanf "string1%string2" "%s@%%s" (fun s1 s2 -> s1, s2) ();;

I have reproduced the bug on 3.12.0, 3.10.2 and 3.11.2.
Tags: No tags attached.
Attached Files

- Relationships
has duplicate 0005605 (closed, protz): Scanf.scanf works differently around "@" char in 4.01.0+dev2
related to 0005973 (resolved, weis): Format module incorrectly parses format string allowing to escape arguments
related to 0006115 (resolved, gasche): Unsound type checking of format strings for the Format module

-  Notes
(0006172)
bluestorm (reporter)
2011-10-22 19:50

Note: also reproduced in SVN trunk.
(0006177)
weis (developer)
2011-10-24 11:21

Thank you for reporting this strange and puzzling bug.

I will correct it as soon as possible; for the time being, you may use the ``%,'' conversion separator to clearly delineate your two conversions:

        Objective Caml version 3.12.2+dev1 (2011-08-03)

# Scanf.sscanf "string1%string2" "%s@%%,%s" (fun s1 s2 -> s1, s2);;
- : string * string = ("string1", "string2")

As an additional benefit, you will also get:

# Scanf.sscanf "string1%string2" "%s@%%,%s" (fun s1 s2 -> s1, s2) ();;
Error: This expression has type string * string
       but an expression was expected of type 'a -> 'b

which is indeed the expected behaviour of the type checker.

Stay tuned for a complete correction in the working sources.
(0006179)
xclerc (developer)
2011-10-24 14:22

Tentative fix by revision 11233 in branch "version/3.12".
(0006181)
xclerc (developer)
2011-10-25 07:34
edited on: 2011-10-25 07:34

Reverted the commit, the tentative fix being awfully wrong.

(0006183)
weis (developer)
2011-10-26 01:57

I fixed the bug. It was indeed not trivial to correct and was hiding there for years!

In short, previous versions of the compiler accepted incorrect format strings that should not be typable (although those format strings were conformant to the documentation). Hence the seg fault you observed :(

To correct this nasty situation, I was obliged to slightly modify the conventions for the @ character in format strings, as follows.

As you may know, %% is equivalent to a plain % character and @@ to a plain @ character. I had to add the extra convention that @% is equivalent to a plain % character. As a consequence, some code that was (wrongly) accepted before may now fail to compile with a typing error. Consider for instance the "@%s" format string; before the correction of the bug it was made of one plain @ character followed by a string conversion; now, the new convention turns @% to a plain % character: the format string is thus equivalent to 2 plain characters, and the string conversion has vanished. To correct this format, simply double the @ to recover the plain @ character, writing "@@%s".

The good news is that the corrected format string is already valid and equivalent to the wrong one in all the distributed versions of the compiler (including 3.10, 3.11, 3.12, and the SVN trunk).

So you can safely correct your code in advance and avoid any bad surprises with forthcoming versions of OCaml!
(0006198)
weis (developer)
2011-11-08 10:06

This is indeed a tough issue!

My correction was still buggy. I am reopening the bug report to find a better, correct way to get rid of the bug.
(0006302)
weis (developer)
2011-12-15 09:45

The solution was simply to enforce the current treatment of '%' in format strings:

Every occurrence of '%' in a format string is considered as introducing a conversion, unless escaped as "%%" to stand for a plain '%' character. This rule now stands within character ranges and format string indications.

So, to read a string until a plain character '%' you must write the format string "%s@%%".
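As an illustration of the rule above, here is a minimal sketch that uses the "%s@%%" indication to split %-separated input. It assumes a compiler release that includes this fix. The helper names split_pair and split_percent are hypothetical and not part of the Scanf module.

```ocaml
(* Hypothetical helper: read two strings separated by a single '%'.
   "%s@%%" reads a string up to a plain '%' and skips that '%';
   the trailing "%s" then reads the remaining string. *)
let split_pair (input : string) : string * string =
  Scanf.sscanf input "%s@%%%s" (fun s1 s2 -> (s1, s2))

(* Hypothetical helper: read any number of '%'-separated fields by
   applying "%s@%%" repeatedly to the same scanning buffer. *)
let split_percent (input : string) : string list =
  let ib = Scanf.Scanning.from_string input in
  let rec loop acc =
    if Scanf.Scanning.end_of_input ib then List.rev acc
    else loop (Scanf.bscanf ib "%s@%%" (fun s -> s) :: acc)
  in
  loop []
```

With the format written this way, the continuation passed to sscanf receives exactly two string arguments, so applying the result to an extra () as in the original report should be rejected by the type checker instead of crashing at run time.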

Mutatis mutandis, the same rule applies to '@' characters in format strings:

Every occurrence of '@' in a format string is considered as introducing a format string indication, unless escaped as "%@" to stand for a plain '@' character. This rule now stands within character ranges and format string indications.

So, to read a string until a plain character '@' you must write the format string "%s@%@".

For the sake of backward compatibility, occurrences of '@' that do not start a valid format string indication are still accepted.

This unified treatment of '%' and '@' in format strings corrects the bug and preserves existing programs.
(0006303)
weis (developer)
2011-12-15 09:50

Corrected in version 3.12.2.

- Issue History
Date Modified Username Field Change
2011-10-22 19:49 bluestorm New Issue
2011-10-22 19:50 bluestorm Note Added: 0006172
2011-10-24 00:14 weis Assigned To => weis
2011-10-24 00:14 weis Status new => confirmed
2011-10-24 11:21 weis Note Added: 0006177
2011-10-24 14:22 xclerc Note Added: 0006179
2011-10-25 07:34 xclerc Note Added: 0006181
2011-10-25 07:34 xclerc Note Edited: 0006181
2011-10-26 01:57 weis Note Added: 0006183
2011-10-26 01:58 weis Status confirmed => closed
2011-11-08 10:06 weis Note Added: 0006198
2011-11-08 10:06 weis Status closed => assigned
2011-12-15 09:45 weis Note Added: 0006302
2011-12-15 09:50 weis Note Added: 0006303
2011-12-15 09:50 weis Status assigned => closed
2011-12-15 09:50 weis Resolution open => fixed
2012-05-03 09:23 protz Relationship added has duplicate 0005605
2013-04-03 10:52 gasche Relationship added related to 0005973
2013-08-02 22:18 gasche Relationship added related to 0006115

