Mantis Bug Tracker

ID: 0006525
Project: OCaml
Category: OCaml standard library
View Status: public
Date Submitted: 2014-08-28 23:07
Last Update: 2014-11-24 18:10
Reporter: gasche
Assigned To:
Priority: normal
Severity: feature
Reproducibility: N/A
Status: acknowledged
Resolution: open
Platform:
OS:
OS Version:
Product Version:
Target Version: undecided
Fixed in Version:
Summary: 0006525: add a UChar module to the standard library
Description: GitHub pull request by Daniel Bünzli:
  https://github.com/ocaml/ocaml/pull/80

> As I already made clear in previous discussions on the `caml-list`,
> I find that OCaml's current support for Unicode is *outstanding*
> (both literally and figuratively).
>
> I don't think introducing a Unicode string data structure and
> a corresponding syntax for literals would be a good thing to
> do. If one wanted to do that in a *correct* and *useful* way, it
> would entail importing a good deal of the Unicode processing machinery
> (e.g. normalization) in the compiler and I really think it's better to
> leave that outside the compiler. Unicode processing can perfectly be
> left to a set of modularized, external libraries. I also think it's
> actually a good idea to proceed that way as libraries are in a better
> position to evolve with the standard (e.g. newly encoded characters on
> Unicode standard updates may imply changes to normalization results
> and would entail updates to the compiler).
>
> There is however one thing that I really find missing to get *utterly
> excellent* Unicode support in OCaml: an abstract datatype, *in the
> standard library*, to represent a Unicode scalar value (by abusing
> terminology: a Unicode character). A [Unicode scalar
> value](http://www.unicode.org/glossary/#unicode_scalar_value) is
> simply an integer in the ranges `0x0000…0xD7FF` or `0xE000…0x10FFFF`.
>
> Such a data type would allow independent libraries dealing with
> Unicode characters (e.g. [`ulex`](http://www.cduce.org/ulex/),
> [`camomile`](http://camomile.sourceforge.net/),
> [`uutf`](http://erratique.ch/software/uutf),
> [`uunf`](http://erratique.ch/software/uunf),
> [`uucp`](http://erratique.ch/software/uucp),
> [`uucd`](http://erratique.ch/software/uucd)) to interchange data
> without relying on `int`s, and as such strengthen the abstractions and
> guarantees a bit: no more documentation warnings that the given
> `int`s need to be in the above range, no needless (re)checks when
> data flows among modules. Well, you get the idea: the basic advantages
> of data abstraction...
>
> This proposal simply adds such a minimal data type along with a few
> functions which by themselves don't do much except integrating with
> the standard library; doing real Unicode processing is left to
> external libraries, as it should be.
>
> One question is whether a `Pervasives.uchar` type equal to `Uchar.t`
> should be introduced (not part of this proposal). I don't think it's
> essential, though it could be a nice touch.
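
For illustration only, here is a minimal sketch of the kind of abstract type the description argues for. It is not the contents of 80.patch: the module name `Uscalar` and every function name below are assumptions chosen for this example. The only thing taken from the description is the validity range (`0x0000…0xD7FF` or `0xE000…0x10FFFF`).

```ocaml
(* Hypothetical sketch, not the API proposed in the pull request.
   The signature keeps [t] abstract, so a value of type [Uscalar.t]
   can only be obtained through the range check in [of_int]. *)
module Uscalar : sig
  type t
  (** A Unicode scalar value: 0x0000..0xD7FF or 0xE000..0x10FFFF. *)

  val is_valid : int -> bool
  (** [is_valid i] is [true] iff [i] lies in one of the two ranges. *)

  val of_int : int -> t option
  (** [of_int i] is [Some u] if [i] is a scalar value, [None] otherwise. *)

  val to_int : t -> int
  val equal : t -> t -> bool
  val compare : t -> t -> int
end = struct
  type t = int

  (* The gap 0xD800..0xDFFF (surrogates) is excluded. *)
  let is_valid i =
    (0x0000 <= i && i <= 0xD7FF) || (0xE000 <= i && i <= 0x10FFFF)

  let of_int i = if is_valid i then Some i else None
  let to_int u = u
  let equal (u : int) (v : int) = u = v
  let compare (u : int) (v : int) = compare u v
end

(* Usage: the check happens once, at the boundary. *)
let () =
  match Uscalar.of_int 0x00E9 with
  | Some u -> Printf.printf "U+%04X\n" (Uscalar.to_int u)
  | None -> print_endline "not a Unicode scalar value"
```

Under this sketch, a library function that accepts a `Uscalar.t` instead of an `int` would not need to document or re-check the range, which is the data-abstraction benefit the description refers to.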
Tags: github, patch
Attached Files: 80.patch (7,125 bytes) 2014-08-28 23:07

Notes
(0012572)
doligez (administrator)
2014-11-24 18:09

A question for Daniel: would you mind having to spell your name in pure ASCII? As part of the (slow) transition away from Latin-1, I'm trying to get all the source code of the system in pure ASCII, even in the comments.

Issue History
Date Modified     Username  Field                 Change
2014-08-28 23:07  gasche    New Issue
2014-08-28 23:07  gasche    File Added: 80.patch
2014-08-28 23:07  gasche    Tag Attached: github
2014-08-28 23:07  gasche    Tag Attached: patch
2014-09-04 00:25  doligez   Target Version        4.02.1+dev => undecided
2014-11-24 18:09  doligez   Note Added: 0012572
2014-11-24 18:10  doligez   Status                new => acknowledged

