Mantis Bug Tracker

View Issue Details
ID: 0006525
Project: OCaml
Category: standard library
View Status: public
Date Submitted: 2014-08-28 23:07
Last Update: 2016-01-27 09:22
Target Version: 4.03.0+dev / +beta1
Summary: 0006525: add a UChar module to the standard library
Description: GitHub pull request by Daniel Bünzli:

> As I already made clear in previous discussions on the `caml-list`,
> I find that OCaml's current support for Unicode is *outstanding*
> (literally and figuratively).
> I don't think introducing a Unicode string data structure and
> a corresponding syntax for literals would be a good thing to
> do, since doing that in a *correct* and *useful* way would
> entail importing a good deal of the Unicode processing machinery
> (e.g. normalization) into the compiler, and I really think it's better to
> leave that outside the compiler. Unicode processing can perfectly well be
> left to a set of modularized, external libraries. I also think it's
> actually a good idea to proceed that way, as libraries are in a better
> position to evolve with the standard (e.g. characters newly encoded in
> Unicode standard updates may change normalization results, which
> would entail updates to the compiler).
> There is however one thing that I really find missing to get *utterly
> excellent* Unicode support in OCaml: an abstract datatype, *in the
> standard library*, to represent a Unicode scalar value (by abusing
> terminology: a Unicode character). A Unicode scalar value is
> simply an integer in the ranges `0x0000…0xD7FF` or `0xE000…0x10FFFF`.
> Such a data type would allow independent libraries dealing with
> Unicode characters (e.g. `ulex`, `camomile`, `uutf`, `uunf`, `uucp`,
> `uucd`) to interchange data
> without relying on `int`s, and as such strengthen the abstractions and
> guarantees a bit: no more documentation warnings stating that the given
> `int`s must be in the above range, no needless (re)checks when
> data flows among modules; well, you get the idea, the basic advantages
> of data abstraction.
> This proposal simply adds such a minimal data type along with a few
> functions that by themselves don't do much beyond integrating with
> the standard library; doing real Unicode processing is left to
> external libraries, as it should be.
> One question is whether a `Pervasives.uchar` type equal to `Uchar.t`
> should be introduced (not part of this proposal). I don't think it's
> essential, it could be a nice touch though.
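The description argues for an abstract type whose values are guaranteed to lie in the Unicode scalar value ranges, so that consumers never need to re-check them. The following is a minimal sketch of that idea, not the code from the attached patch: the module name `Scalar`, its signature, and the helper `is_bmp` are hypothetical, chosen only for illustration.

```ocaml
(* Hypothetical module [Scalar]: an illustration of the abstraction the
   proposal asks for. It is NOT the standard library's [Uchar] module. *)
module Scalar : sig
  type t
  (** A Unicode scalar value: an integer in [0x0000..0xD7FF] or
      [0xE000..0x10FFFF]. The representation is hidden. *)

  val is_valid : int -> bool
  (** [is_valid n] is [true] iff [n] is a Unicode scalar value. *)

  val of_int : int -> t
  (** [of_int n] is [n] as a scalar value.
      Raises [Invalid_argument] if [is_valid n] is [false]. *)

  val to_int : t -> int
  val equal : t -> t -> bool
  val compare : t -> t -> int
end = struct
  type t = int

  (* Excludes negative ints, the surrogate range 0xD800..0xDFFF,
     and anything above 0x10FFFF. *)
  let is_valid n =
    (0x0000 <= n && n <= 0xD7FF) || (0xE000 <= n && n <= 0x10FFFF)

  (* The only way to build a [t]: the range check happens exactly once. *)
  let of_int n =
    if is_valid n then n
    else invalid_arg (string_of_int n ^ " is not a Unicode scalar value")

  let to_int u = u
  let equal (a : t) (b : t) = a = b

  (* Non-recursive [let]: the [compare] on the right-hand side is the
     polymorphic comparison from the standard library. *)
  let compare (a : t) (b : t) = compare a b
end

(* Hypothetical consumer: it can rely on the range invariant without
   re-validating its argument. *)
let is_bmp (u : Scalar.t) = Scalar.to_int u <= 0xFFFF
```

Because `Scalar.t` is abstract, values can only be obtained through `of_int`, which performs the range check once; functions such as `is_bmp` receiving a `Scalar.t` from another library can assume the invariant holds. The `Uchar` module that eventually shipped with OCaml 4.03 follows the same idea; see the standard library documentation for its actual interface.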
Tags: github, patch
Attached Files: 80.patch (7,125 bytes) 2014-08-28 23:07

Notes
doligez (administrator)
2014-11-24 18:09

A question for Daniel: would you mind having to spell your name in pure ASCII? As part of the (slow) transition away from Latin-1, I'm trying to get all the source code of the system in pure ASCII, even in the comments.
dbuenzli (reporter)
2014-12-06 14:44

Sorry, I forgot to monitor the issue and missed your request. Done in the PR.
frisch (developer)
2016-01-27 09:22

The Github PR has been merged.

Issue History
Date Modified | Username | Field | Change
2014-08-28 23:07 | gasche | New Issue |
2014-08-28 23:07 | gasche | File Added: 80.patch |
2014-08-28 23:07 | gasche | Tag Attached: github |
2014-08-28 23:07 | gasche | Tag Attached: patch |
2014-09-04 00:25 | doligez | Target Version | 4.02.1+dev => undecided
2014-11-24 18:09 | doligez | Note Added: 0012572 |
2014-11-24 18:10 | doligez | Status | new => acknowledged
2014-12-06 14:44 | dbuenzli | Note Added: 0012682 |
2015-12-03 15:35 | frisch | Target Version | undecided => 4.03.0+dev / +beta1
2016-01-27 09:22 | frisch | Note Added: 0015274 |
2016-01-27 09:22 | frisch | Status | acknowledged => closed
2016-01-27 09:22 | frisch | Resolution | open => fixed
2017-02-23 16:43 | doligez | Category | OCaml standard library => standard library
