Windows filenames and Unicode
Date: 2010-09-29 (07:57)
From: Michael Ekstrand <michael@e...>
Subject: Re: [Caml-list] Windows filenames and Unicode
On Wed, 2010-09-29 at 17:26 +1000, Paul Steckler wrote:
> Hmmm, I shouldn't have to do this.  Are there plans afoot to modernize
> OCaml's string-handling?

The Batteries project aims to provide more modernized string handling,
and we already go a long way with the UTF8 module (from Camomile) and
ropes.  That does not, however, affect the file opening routines, as the
current Batteries design requires you to open files using platform
strings for their names.

It may be interesting to look at allowing files to be opened using
Unicode names.  However, this is fraught with difficulties, particularly
in cross-platform situations.  Handling filenames in Unicode is incorrect
on Unix and Linux systems where the locale encoding does not convert
losslessly to Unicode and back: a name that does not survive the round
trip can no longer be used to open the file.  Therefore, the correct way
to handle filenames in a cross-platform fashion is to always store them
in the system filename encoding (any Unicode encoding on Windows when
the wide functions are supported, the current locale encoding on Unix)
and convert them to Unicode only for display.  IMO any enhanced string
design covering filenames must encourage this.  Fortunately, OCaml's
type system makes such an API fairly natural; it just needs to be
designed and implemented.
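
To make the idea concrete, here is a rough sketch of what such an API
could look like.  It is only an illustration: the module Fname and all
of its functions are hypothetical names, not part of Batteries or the
standard library, and the display conversion is a deliberately crude
ASCII-only placeholder where a real design would decode from the locale
encoding.  The point is that the abstract type keeps the system bytes
intact and makes the lossy conversion a separate, one-way operation.

```ocaml
(* Hypothetical sketch: Fname is not an existing Batteries or stdlib module. *)
module Fname : sig
  type t
  (** A filename kept in the system filename encoding. The type is
      abstract, so it cannot be mixed up with display text. *)

  val of_system : string -> t
  (** Wrap a name obtained from the OS (Sys.argv, Sys.readdir, ...). *)

  val to_system : t -> string
  (** The exact original bytes, for open_in, Sys.file_exists, ... *)

  val concat : t -> t -> t
  (** Join a directory and an entry without re-encoding either part. *)

  val to_display : t -> string
  (** Lossy, one-way conversion for showing the name to the user. *)
end = struct
  type t = string

  let of_system s = s
  let to_system s = s
  let concat dir name = Filename.concat dir name

  (* Placeholder only: non-ASCII bytes become '?'. A real implementation
     would decode from the locale encoding and substitute a replacement
     character for bytes that cannot be decoded. *)
  let to_display s =
    String.map (fun c -> if Char.code c < 0x80 then c else '?') s
end

let () =
  (* A Latin-1 encoded name; the byte 0xE9 is not valid UTF-8 on its own. *)
  let f = Fname.of_system "caf\xe9.txt" in
  print_endline (Fname.to_display f)   (* prints caf?.txt *)
```

Because there is no function going from display text back to Fname.t,
code that tries to open a file from a displayed name simply does not
type-check, which is the kind of encouragement I mean.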

- Michael

Jabber/Google Talk: this e-mail address
mouse, n: a device for pointing at the xterm in which you want to type