[Caml-list] Ocaml interface to ctype.h functions
From: Xavier Leroy <Xavier.Leroy@i...>
Subject: Re: [Caml-list] Ocaml interface to ctype.h functions
> I've been working on some projects recently where it would be nice to have
> access to the ctype.h character classification functions (isalpha(),
> isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> through the standard library. It's easy to whip up a library for this, but
> before doing so, I thought I'd ask if there are any plans to put them in the
> Character module or some other place it makes sense to have them.

It would make sense to have classification functions in the Char
module.  The main issue is: what is a letter?, or: how to deal with
character sets.

If only one, fixed character set is supported (e.g. US-ASCII or
Latin-1), it's truly easy, but will not satisfy everyone.  OCaml has
already been criticized for supporting ISO Latin-1 accented letters in
identifiers!  (Look at the caml-list archives if you don't believe me.)

Building on the C functions isalpha(), etc, is a bit of a cop-out,
because then we're dependent on what these functions actually do on a
variety of Unix, Windows and Macintosh systems.  In particular, we
become dependent on the ISO C internationalization framework ("locales"),
which I think is a mess because it relies too much on a global state
(the current locale).

To give an example of the kind of problems I fear, just doing
setlocale(LC_ALL, "fr_FR") in an OCaml program causes
float_of_string "3.14" to return 0.0.  Guess why?  float_of_string
relies on the C function atof(), which is internationalized, and
doesn't recognize "." as a decimal point -- French uses a "," instead...

Finally, there's the Unicode approach.  Letters, etc, are well defined
without reference to a "locale" or whatever piece of state.  But then
we've just shifted the problem to a more general one: retrofitting
Unicode into OCaml, which again has been the subject of lively
discussions on this mailing list :-)

> If it's
> just a matter of waiting for someone to do it, I'm willing to volunteer, as
> I'd probably be doing it anyways on my own.

It's mostly a matter of knowing what we want these classification
functions to do.  Meanwhile, it might be easier to define your own
isalpha, etc, predicates; at least you get to choose the encoding!
Besides, it's really easy using pattern-matching, e.g. for ASCII:

let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false
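
The same pattern-matching approach extends to the rest of the ctype.h family. What follows is only a sketch: the is_* names are illustrative and not part of the standard Char module, the ASCII predicates assume the "C" locale definitions, and the Latin-1 variant assumes that "letter" means the accented-letter ranges of ISO 8859-1 (excluding the multiplication and division signs at codes 215 and 247).

```ocaml
(* ASCII-only predicates; they never consult the C locale.
   All names here are hypothetical, chosen to mirror ctype.h. *)
let is_upper = function 'A'..'Z' -> true | _ -> false
let is_lower = function 'a'..'z' -> true | _ -> false
let is_alpha c = is_upper c || is_lower c
let is_digit = function '0'..'9' -> true | _ -> false
let is_alnum c = is_alpha c || is_digit c

(* Same set as isspace() in the "C" locale: space, \t, \n, VT, FF, \r.
   OCaml's '\ddd' escapes are decimal, so '\011' is VT and '\012' is FF. *)
let is_space = function
  | ' ' | '\t' | '\n' | '\011' | '\012' | '\r' -> true
  | _ -> false

(* A Latin-1 variant of is_alpha: ASCII letters plus accented letters,
   assuming the input bytes are ISO 8859-1 encoded. *)
let is_alpha_latin1 = function
  | 'A'..'Z' | 'a'..'z'
  | '\192'..'\214' | '\216'..'\246' | '\248'..'\255' -> true
  | _ -> false
```

Keeping one predicate per encoding makes the choice of character set explicit at each call site, instead of hiding it in global state.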

- Xavier Leroy