Browse thread
[Caml-list] Ocaml interface to ctype.h functions
[
Home
]
[ Index:
by date
|
by threads
]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
[ Message by date: previous | next ] [ Message in thread: previous | next ] [ Thread: previous | next ]
| Date: | -- (:) |
| From: | Xavier Leroy <Xavier.Leroy@i...> |
| Subject: | Re: [Caml-list] Ocaml interface to ctype.h functions |
> I've been working on some projects recently where it would be nice to have
> access to the ctype.h character classification functions (isalpha(),
> isspace(), etc.) in Ocaml, and couldn't find anything like them in a search
> through the standard library. It's easy to whip up a library for this, but
> before doing so, I thought I'd ask if there's any plans to put them in the
> Character module or some other place it makes sense to have them.
It would make sense to have classification functions in the Char
module. The main issue is: what is a letter?, or: how to deal with
character sets.
If only one, fixed character set is supported (e.g. US-ASCII or
Latin-1), it's truly easy, but will not satisfy everyone. OCaml has
already been criticized for supporting ISO Latin-1 accented letters in
identifiers! (Look at the caml-list archives if you don't believe me.)
Building on the C functions isalpha(), etc, is a bit of a cop-out,
because then we're dependent on what these functions actually do on a
variety of Unix, Windows and Macintosh systems. In particular, we
become dependent on the ISO C internationalization framework ("locales"),
which I think is a mess because it relies too much on a global state
(the current locale).
To give an example of the kind of problems I fear, just doing
setlocale(LC_ALL, "fr_FR") in an OCaml program causes
float_of_string "3.14" to return 0.0. Guess why? float_of_string
relies on the C function atof(), which is internationalized, and
doesn't recognize "." as a decimal point -- French uses a "," instead...
Finally, there's the Unicode approach. Letters, etc, are well defined
without reference to a "locale" or whatever piece of state. But then
we've just shifted the problem to a more general one: retrofitting
Unicode into OCaml, which again has been the subject of lively
discussions on this mailing list :-)
> If it's
> just a matter of waiting for someone to do it, I'm willing to volunteer, as
> I'd probably be doing it anyways on my own.
It's mostly a matter of knowing what we want these classification
functions to do. Meanwhile, it might be easier to define your own
isalpha, etc, predicates; at least you get to choose the encoding!
Besides, it's really easy using pattern-matching, e.g. for ASCII:
let isalpha = function 'A'..'Z'|'a'..'z' -> true | _ -> false
- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr. Archives: http://caml.inria.fr