Mantis Bug Tracker

ID: 0006695
Project: OCaml
Category: ~DO NOT USE (was: OCaml general)
View Status: public
Date Submitted: 2014-12-07 16:48
Last Update: 2016-12-07 11:37
Assigned To: whitequark
Platform:
OS:
OS Version:
Product Version:
Target Version:
Fixed in Version: 4.03.0+dev / +beta1
Summary: 0006695: Do not treat paths as encoded in ISO-8859-1
Description: Currently, ocamlc uses `String.capitalize` and `String.uncapitalize` extensively when deriving filenames from module names and vice versa. These functions treat the strings as ISO-8859-1, and attempt to case-fold letters such as \248 (ø).

Today, no operating system that OCaml supports consistently encodes paths as ISO-8859-1. Rather, UTF-8 is used on sane platforms, and a locale-specific encoding on Windows. Thus, this case-folding is practically always broken, and the derived name will contain garbage if the first letter is not included in US-ASCII.
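To make the failure concrete, here is a minimal sketch of what happens when the Latin-1 rule is applied to a UTF-8 encoded name. The functions `latin1_lowercase` and `latin1_uncapitalize` are hypothetical stand-ins written for this illustration; they are assumed to mirror the historical ISO-8859-1 behavior rather than being copied from the compiler.

```ocaml
(* Hypothetical re-implementation of the Latin-1 lowercasing rule:
   A-Z plus the ISO-8859-1 uppercase ranges 192-214 and 216-222
   are shifted by 32. *)
let latin1_lowercase c =
  if (c >= 'A' && c <= 'Z')
  || (c >= '\192' && c <= '\214')
  || (c >= '\216' && c <= '\222')
  then Char.chr (Char.code c + 32)
  else c

(* Lowercase only the first byte, as an uncapitalize function would. *)
let latin1_uncapitalize s =
  String.mapi (fun i c -> if i = 0 then latin1_lowercase c else c) s

let () =
  (* "Øre" encoded as UTF-8: Ø is the two bytes 0xC3 0x98. *)
  let module_name = "\xc3\x98re" in
  let derived = latin1_uncapitalize module_name in
  (* The lead byte 0xC3 is taken for Latin-1 'Ã' and becomes 0xE3. *)
  assert (derived = "\xe3\x98re");
  (* The result is not the UTF-8 encoding of "øre". *)
  assert (derived <> "\xc3\xb8re");
  Printf.printf "%S -> %S\n" module_name derived
```

In this sketch the derived name starts with 0xE3, which UTF-8 reserves for the start of a three-byte sequence, so the name no longer decodes as the intended text.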

This is a separate issue from [^]. Not only is the impact in this case very clear and the scope limited to the compiler, but the current behavior is also more clearly broken.
Tags: No tags attached.
Attached Files

- Relationships
related to 0006694 (closed, gasche): Do not implicitly use ISO-8859-1 in Char.uppercase/lowercase and derived functions
related to 0003771 (resolved, dra): Reading Unicode filenames fails on Windows
related to 0006692 (closed): Identifiers in Unicode
related to 0006697 (acknowledged): Turn warning on using non-US-ASCII ISO-8859-1 characters in identifiers into error

- Notes
frisch (developer)
2014-12-08 10:45

This is related to 0003771 as well, which would have the effect of representing filenames under Windows as UTF-8 strings.
frisch (developer)
2014-12-08 10:51

Note that currently, source code is interpreted as a Latin1 stream, and Latin1 letters are allowed in module identifiers (although this is deprecated and raises Warning 3). It's probably a good time to turn this into a proper error; otherwise we need to specify how these names are mapped to filenames, and I don't think we want to go into that.
whitequark (developer)
2014-12-08 10:52

Agreed. I will open another issue to track that.
whitequark (developer)
2014-12-12 17:18 [^]
gasche (administrator)
2014-12-21 12:56

The patch of whitequark that uses the *_ascii functions everywhere inside the compiler has been merged.
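
For illustration, a minimal sketch of how the `*_ascii` functions behave when mapping between module names and filenames. The helpers `source_basename_of_module` and `module_of_basename` are hypothetical names chosen for this example and are not taken from the merged patch; only `String.capitalize_ascii` and `String.uncapitalize_ascii` are standard library functions (available from OCaml 4.03).

```ocaml
(* Hypothetical helpers, not the compiler's actual code. *)
let source_basename_of_module name =
  String.uncapitalize_ascii name ^ ".ml"

let module_of_basename base =
  String.capitalize_ascii base

let () =
  (* ASCII names are case-folded as before. *)
  assert (source_basename_of_module "Foo" = "foo.ml");
  assert (module_of_basename "foo" = "Foo");
  (* Bytes outside US-ASCII pass through untouched, so a UTF-8
     encoded first letter ("Ø" = 0xC3 0x98) is no longer corrupted. *)
  assert (source_basename_of_module "\xc3\x98re" = "\xc3\x98re.ml")
```

The trade-off in this approach is that non-ASCII letters are never case-folded at all, which sidesteps the question of which encoding the path is in.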

- Issue History
Date Modified Username Field Change
2014-12-07 16:48 whitequark New Issue
2014-12-07 16:48 whitequark Summary Do not treat paths as encoded in latin-1 => Do not treat paths as encoded in ISO-8859-1
2014-12-07 16:49 whitequark Description Updated
2014-12-07 16:49 whitequark Description Updated
2014-12-07 16:50 whitequark Description Updated
2014-12-07 16:59 whitequark Description Updated
2014-12-07 17:40 gasche Relationship added related to 0006694
2014-12-08 10:45 frisch Relationship added related to 0003771
2014-12-08 10:45 frisch Note Added: 0012714
2014-12-08 10:51 frisch Note Added: 0012715
2014-12-08 10:52 whitequark Note Added: 0012716
2014-12-08 10:54 whitequark Relationship added related to 0006697
2014-12-12 17:18 whitequark Note Added: 0012773
2014-12-13 22:01 gasche Relationship added related to 0006692
2014-12-19 18:22 gasche Status new => acknowledged
2014-12-21 12:56 gasche Note Added: 0012906
2014-12-21 12:56 gasche Status acknowledged => resolved
2014-12-21 12:56 gasche Resolution open => fixed
2014-12-21 12:56 gasche Assigned To => whitequark
2014-12-21 12:57 gasche Fixed in Version => 4.03.0+dev / +beta1
2016-12-07 11:37 xleroy Status resolved => closed
2017-02-23 16:36 doligez Category OCaml general => -OCaml general
2017-03-03 17:55 doligez Category -OCaml general => -(deprecated) general
2017-03-03 18:01 doligez Category -(deprecated) general => ~deprecated (was: OCaml general)
2017-03-06 17:04 doligez Category ~deprecated (was: OCaml general) => ~DO NOT USE (was: OCaml general)
