Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extensions to stdlib #2394

Closed
vicuna opened this issue Mar 15, 2000 · 2 comments
Closed

extensions to stdlib #2394

vicuna opened this issue Mar 15, 2000 · 2 comments

Comments

@vicuna
Copy link

vicuna commented Mar 15, 2000

Original bug ID: 57
Reporter: administrator
Status: closed
Resolution: fixed
Priority: normal
Severity: feature
Category: ~DO NOT USE (was: OCaml general)

Bug description

Hello,

as I was told on the mailing list, one should send patches for missing
features in the stdlib to the bugtracking list. At the end of the file you
find the patch containing additions to the following modules/files:

char.ml
char.mli
set.ml
set.mli
stack.ml
stack.mli
string.ml
string.mli

The patch can be applied from within the "stdlib" directory by simply
calling "patch < {some_path}/stdlib.patch"

In short, these additions contain the following extra functions (no changes
to implementations of already existing functions) plus their stdlib-style
comments:

module char:

The basic character classification functions as found in the C, but
with more readable syntax as normally used in OCaml (e.g. "is_print"
instead of "isprint"):

  val is_alpha : char -> bool
  val is_upper : char -> bool
  val is_lower : char -> bool
  val is_digit : char -> bool
  val is_xdigit : char -> bool
  val is_alnum : char -> bool
  val is_space : char -> bool
  val is_punct : char -> bool
  val is_print : char -> bool
  val is_graph : char -> bool
  val is_cntrl : char -> bool
  val is_ascii : char -> bool

module set:

Some functions also available in the "list"-module, but which make
perfect sense for sets:

  val for_all : pred:(elt -> bool) -> t -> bool
  val exists : pred:(elt -> bool) -> t -> bool
  val filter : pred:(elt -> bool) -> t -> t
  val find_all : pred:(elt -> bool) -> t -> t
  val partition : pred:(elt -> bool) -> t -> t * t

module stack:

The otherwise awkward to emulate function "top" (currently emulated by
"pop" + "push"):

  val top : 'a t -> 'a

module string:

The functions "explode" and "implode" to convert between strings and
lists. Since string handling is a very common problem, there should be
an easy way to do this, last but not least for teaching the functional
approach to string handling algorithms.

  val explode : string -> char list
  val implode : char list -> string

I have also changed the formatting of interfaces slightly to a style, which
seems to become predominant in most other modules in the stdlib, obviously
due to the merger with OLabl:

before:

val foo: 'a -> 'b

after:

val foo : 'a -> 'b

This allows editors (e.g. VIM, but possibly others, too) to make a
difference between labels and function declarations for e.g. highlighting.
This is the reason why the patch is a bit long.

I have tried hard to make all changes "INRIA-style" and would be surprised
if it doesn't fit into the library. Of course, all changes have been
thoroughly checked for bugs and the implementation should be fairly
optimal. I have also compiled the OCaml-compiler with the changes, which
seems to work fine, except for some compiler crashes, which are very surely
not due to these changes but rather in the code generator (at least for
Alphas) - see my later bug report.

If you object to all or parts of these changes, I won't feel offended, of
course. After all, it is an important goal to keep the standard library
"tidy"...

Best regards,
Markus Mottl

--
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl

Contents of file "stdlib.patch" below the following bar:

Index: char.ml

RCS file: /caml/ocaml/stdlib/char.ml,v
retrieving revision 1.9
diff -u -r1.9 char.ml
--- char.ml 1999/11/17 18:58:23 1.9
+++ char.ml 2000/03/15 19:25:02
@@ -14,15 +14,15 @@

(* Character operations *)

-external code: char -> int = "%identity"
-external unsafe_chr: int -> char = "%identity"
+external code : char -> int = "%identity"
+external unsafe_chr : int -> char = "%identity"

let chr n =
if n < 0 or n > 255 then invalid_arg "Char.chr" else unsafe_chr n

-external is_printable: char -> bool = "is_printable"
+external is_printable : char -> bool = "is_printable"

-external string_create: int -> string = "create_string"
+external string_create : int -> string = "create_string"
external string_unsafe_get : string -> int -> char = "%string_unsafe_get"
external string_unsafe_set : string -> int -> char -> unit
= "%string_unsafe_set"
@@ -59,3 +59,17 @@
|| (c >= '\248' && c <= '\254')
then unsafe_chr(code c - 32)
else c
+
+let is_upper c = c <= 'Z' && c >= 'A'
+let is_lower c = c <= 'z' && c >= 'a'
+let is_alpha c = is_lower c || is_upper c
+let is_digit c = c <= '9' && c >= '0'
+let is_xdigit c = is_digit c || c <= 'f' && c >= 'a' || c <= 'F' && c >= 'A'
+let is_alnum c = is_alpha c || is_digit c
+let is_space c = c = ' ' || c <= '\r' && c >= '\t'
+let is_punct c = c <= '~' && c >= '{' || c <= '`' && c >= '[' ||

  •              c >= '!' && c <= '/' || c <= '@' && c >= ':'
    

+let is_print c = c <= '' && c >= ' '
+let is_graph c = c <= '
' && c >= '!'
+let is_cntrl c = c <= '\031' || c = '\127'
+let is_ascii c = c < '\128'
Index: char.mli

RCS file: /caml/ocaml/stdlib/char.mli,v
retrieving revision 1.11
diff -u -r1.11 char.mli
--- char.mli 1999/11/17 18:58:23 1.11
+++ char.mli 2000/03/15 19:25:02
@@ -16,7 +16,7 @@

external code : char -> int = "%identity"
(* Return the ASCII code of the argument. )
-val chr: int -> char
+val chr : int -> char
(
Return the character with the given ASCII code.
Raise [Invalid_argument "Char.chr"] if the argument is
outside the range 0--255. )
@@ -24,10 +24,48 @@
(
Return a string representing the given character,
with special characters escaped following the lexical conventions
of Objective Caml. )
-val lowercase: char -> char
-val uppercase: char -> char
+val lowercase : char -> char
+val uppercase : char -> char
(
Convert the given character to its equivalent lowercase or
uppercase character, respectively. )
+
+(
* Predicates for classification of 7-bit ASCII characters **)
+
+val is_alpha : char -> bool

  •    (* [is_alpha c] tests whether [is_upper] or [is_lower] is true
    
  •       for [c]. *)
    

+val is_upper : char -> bool

  •    (* [is_upper c] tests whether [c] is an upper-case character. *)
    

+val is_lower : char -> bool

  •    (* [is_lower c] tests whether [c] is a lower-case character. *)
    

+val is_digit : char -> bool

  •    (* [is_digit c] tests whether [c] is a decimal-digit character. *)
    

+val is_xdigit : char -> bool

  •    (* [is_xdigit c] tests whether [c] is a hexadecimal digit
    
  •       character ([0-9], [A-F], or [a-f]). *)
    

+val is_alnum : char -> bool

  •    (* [is_alnum c] tests whether [is_alpha] or [is_digit] is true
    
  •       for [c]. *)
    

+val is_space : char -> bool

  •    (* [is_space c] tests whether [c] is a space, tab,
    
  •       carriage-return, newline, vertical-tab or form-feed (standard
    
  •       white-space character). *)
    

+val is_punct : char -> bool

  •    (* [is_punct c] tests whether [c] is neither a space  ("")
    
  •       nor a character for which [is_alnum] or [is_cntrl] is true. *)
    

+val is_print : char -> bool

  •    (* [is_print c] tests whether [is_punct], [is_upper], [is_lower],
    
  •       or [is_digit] is true for [c] or if it is the space character. *)
    

+val is_graph : char -> bool

  •    (* [is_graph c] tests whether [is_punct], [is_upper], [is_lower],
    
  •       or [is_digit] is true for [c]. *)
    

+val is_cntrl : char -> bool

  •    (* [is_cntrl c] tests whether [c] is a control character as
    
  •       defined by the character set. *)
    

+val is_ascii : char -> bool

  •    (* [is_ascii c] tests whether [c] is an ASCII character with
    
  •       code between (decimal) 0 and 127 inclusive. *)
    

(--)

-external unsafe_chr: int -> char = "%identity"
+external unsafe_chr : int -> char = "%identity"
Index: set.ml

RCS file: /caml/ocaml/stdlib/set.ml,v
retrieving revision 1.12
diff -u -r1.12 set.ml
--- set.ml 1999/11/17 18:58:29 1.12
+++ set.ml 2000/03/15 19:25:02
@@ -17,32 +17,37 @@
module type OrderedType =
sig
type t

  • val compare: t -> t -> int
  • val compare : t -> t -> int
    end

module type S =
sig
type elt
type t

  • val empty: t
  • val is_empty: t -> bool
  • val mem: elt -> t -> bool
  • val add: elt -> t -> t
  • val singleton: elt -> t
  • val remove: elt -> t -> t
  • val union: t -> t -> t
  • val inter: t -> t -> t
  • val diff: t -> t -> t
  • val compare: t -> t -> int
  • val equal: t -> t -> bool
  • val subset: t -> t -> bool
  • val iter: (elt -> unit) -> t -> unit
  • val fold: (elt -> 'a -> 'a) -> t -> 'a -> 'a
  • val cardinal: t -> int
  • val elements: t -> elt list
  • val min_elt: t -> elt
  • val max_elt: t -> elt
  • val choose: t -> elt
  • val empty : t
  • val is_empty : t -> bool
  • val mem : elt -> t -> bool
  • val add : elt -> t -> t
  • val singleton : elt -> t
  • val remove : elt -> t -> t
  • val union : t -> t -> t
  • val inter : t -> t -> t
  • val diff : t -> t -> t
  • val compare : t -> t -> int
  • val equal : t -> t -> bool
  • val subset : t -> t -> bool
  • val iter : (elt -> unit) -> t -> unit
  • val fold : (elt -> 'a -> 'a) -> t -> 'a -> 'a
  • val for_all : (elt -> bool) -> t -> bool
  • val exists : (elt -> bool) -> t -> bool
  • val filter : (elt -> bool) -> t -> t
  • val find_all : (elt -> bool) -> t -> t
  • val partition : (elt -> bool) -> t -> t * t
  • val cardinal : t -> int
  • val elements : t -> elt list
  • val min_elt : t -> elt
  • val max_elt : t -> elt
  • val choose : t -> elt
    end

module Make(Ord: OrderedType) =
@@ -254,6 +259,30 @@
match s with
Empty -> accu
| Node(l, v, r, _) -> fold f l (f v (fold f r accu))
+

  • let rec for_all p = function

  •    Empty -> true
    
  •  | Node(l, v, r, _) -> p v && for_all p l && for_all p r
    
  • let rec exists p = function

  •    Empty -> false
    
  •  | Node(l, v, r, _) -> p v || exists p l || exists p r
    
  • let find_all p =

  •  let rec find accu = function
    
  •    | Empty -> accu
    
  •    | Node(l, v, r, _) ->
    
  •        find (find (if p v then add v accu else accu) l) r in
    
  •  find Empty
    
  • let filter = find_all

  • let partition p =

  •  let rec part (t, f as accu) = function
    
  •    | Empty -> accu
    
  •    | Node(l, v, r, _) ->
    
  •        part (part (if p v then add v t, f else t, add v f) l) r in
    
  •  part (Empty, Empty)
    

    let rec cardinal = function
    Empty -> 0
    Index: set.mli
    ===================================================================
    RCS file: /caml/ocaml/stdlib/set.mli,v
    retrieving revision 1.17
    diff -u -r1.17 set.mli
    --- set.mli 2000/02/01 06:52:39 1.17
    +++ set.mli 2000/03/15 19:25:02
    @@ -24,7 +24,7 @@
    module type OrderedType =
    sig
    type t

  • val compare: t -> t -> int
  • val compare : t -> t -> int
    end
    (* The input signature of the functor [Set.Make].
    [t] is the type of the set elements.
    @@ -42,57 +42,73 @@
    (* The type of the set elements. )
    type t
    (
    The type of sets. *)
  • val empty: t
  • val empty : t
    (* The empty set. *)
  • val is_empty: t -> bool
  • val is_empty : t -> bool
    (* Test whether a set is empty or not. *)
  • val mem: item:elt -> t -> bool
  • val mem : item:elt -> t -> bool
    (* [mem x s] tests whether [x] belongs to the set [s]. *)
  • val add: item:elt -> t -> t
  • val add : item:elt -> t -> t
    (* [add x s] returns a set containing all elements of [s],
    plus [x]. If [x] was already in [s], [s] is returned unchanged. *)
  • val singleton: elt -> t
  • val singleton : elt -> t
    (* [singleton x] returns the one-element set containing only [x]. *)
  • val remove: item:elt -> t -> t
  • val remove : item:elt -> t -> t
    (* [remove x s] returns a set containing all elements of [s],
    except [x]. If [x] was not in [s], [s] is returned unchanged. *)
  • val union: t -> t -> t
  • val inter: t -> t -> t
  • val diff: t -> t -> t
  • val union : t -> t -> t
  • val inter : t -> t -> t
  • val diff : t -> t -> t
    (* Union, intersection and set difference. *)
  • val compare: t -> t -> int
  • val compare : t -> t -> int
    (* Total ordering between sets. Can be used as the ordering function
    for doing sets of sets. *)
  • val equal: t -> t -> bool
  • val equal : t -> t -> bool
    (* [equal s1 s2] tests whether the sets [s1] and [s2] are
    equal, that is, contain equal elements. *)
  • val subset: t -> t -> bool
  • val subset : t -> t -> bool
    (* [subset s1 s2] tests whether the set [s1] is a subset of
    the set [s2]. *)
  • val iter: fun:(elt -> unit) -> t -> unit
  • val iter : fun:(elt -> unit) -> t -> unit
    (* [iter f s] applies [f] in turn to all elements of [s].
    The order in which the elements of [s] are presented to [f]
    is unspecified. *)
  • val fold: fun:(elt -> acc:'a -> 'a) -> t -> acc:'a -> 'a
  • val fold : fun:(elt -> acc:'a -> 'a) -> t -> acc:'a -> 'a
    (* [fold f s a] computes [(f xN ... (f x2 (f x1 a))...)],
    where [x1 ... xN] are the elements of [s].
    The order in which elements of [s] are presented to [f] is
    unspecified. *)
  • val cardinal: t -> int
  • val for_all : pred:(elt -> bool) -> t -> bool
  •    (* [for_all p s] checks if all elements of the set
    
  •       satisfy the predicate [p]. *)
    
  • val exists : pred:(elt -> bool) -> t -> bool
  •    (* [exists p s] checks if at least one element of
    
  •       the set satisfies the predicate [p]. *)
    
  • val filter : pred:(elt -> bool) -> t -> t
  • val find_all : pred:(elt -> bool) -> t -> t
  •    (* [filter p s] returns the set of all elements in [s]
    
  •       that satisfy predicate [p]. [find_all] is another name
    
  •       for [filter]. *)
    
  • val partition : pred:(elt -> bool) -> t -> t * t
  •    (* [partition p s] returns a pair of sets [(s1, s2)], where
    
  •       [s1] is the set of all the elements of [s] that satisfy the
    
  •       predicate [p], and [s2] is the set of all the elements of
    
  •       [s] that do not satisfy [p]. *)
    
  • val cardinal : t -> int
    (* Return the number of elements of a set. *)
  • val elements: t -> elt list
  • val elements : t -> elt list
    (* Return the list of all elements of the given set.
    The returned list is sorted in increasing order with respect
    to the ordering [Ord.compare], where [Ord] is the argument
    given to [Set.Make]. *)
  • val min_elt: t -> elt
  • val min_elt : t -> elt
    (* Return the smallest element of the given set
    (with respect to the [Ord.compare] ordering), or raise
    [Not_found] if the set is empty. *)
  • val max_elt: t -> elt
  • val max_elt : t -> elt
    (* Same as [min_elt], but returns the largest element of the
    given set. *)
  • val choose: t -> elt
  • val choose : t -> elt
    (* Return one element of the given set, or raise [Not_found] if
    the set is empty. Which element is chosen is unspecified,
    but equal elements will be chosen for equal sets. *)
    Index: stack.ml
    ===================================================================
    RCS file: /caml/ocaml/stdlib/stack.ml,v
    retrieving revision 1.5
    diff -u -r1.5 stack.ml
    --- stack.ml 1999/11/17 18:58:30 1.5
    +++ stack.ml 2000/03/15 19:25:02
    @@ -27,6 +27,11 @@
    hd::tl -> s.c <- tl; hd
    | [] -> raise Empty

+let top s =

  • match s.c with
  • hd::_ -> hd
  • | [] -> raise Empty

let length s = List.length s.c

let iter f s = List.iter f s.c
Index: stack.mli

RCS file: /caml/ocaml/stdlib/stack.mli,v
retrieving revision 1.10
diff -u -r1.10 stack.mli
--- stack.mli 1999/11/30 16:06:59 1.10
+++ stack.mli 2000/03/15 19:25:02
@@ -22,18 +22,21 @@
exception Empty
(* Raised when [pop] is applied to an empty stack. *)

-val create: unit -> 'a t
+val create : unit -> 'a t
(* Return a new stack, initially empty. )
-val push: 'a -> 'a t -> unit
+val push : 'a -> 'a t -> unit
(
[push x s] adds the element [x] at the top of stack [s]. )
-val pop: 'a t -> 'a
+val pop : 'a t -> 'a
(
[pop s] removes and returns the topmost element in stack [s],
or raises [Empty] if the stack is empty. *)
+val top : 'a t -> 'a

  •    (* [top s] returns the topmost element in stack [s],
    
  •       or raises [Empty] if the stack is empty. *)
    

val clear : 'a t -> unit
(* Discard all elements from a stack. )
-val length: 'a t -> int
+val length : 'a t -> int
(
Return the number of elements in a stack. )
-val iter: fun:('a -> unit) -> 'a t -> unit
+val iter : fun:('a -> unit) -> 'a t -> unit
(
[iter f s] applies [f] in turn to all elements of [s],
from the element at the top of the stack to the element at the
bottom of the stack. The stack itself is unchanged. *)
Index: string.ml

RCS file: /caml/ocaml/stdlib/string.ml,v
retrieving revision 1.19
diff -u -r1.19 string.ml
--- string.ml 2000/01/12 15:53:18 1.19
+++ string.ml 2000/03/15 19:25:03
@@ -17,7 +17,7 @@
external length : string -> int = "%string_length"
external get : string -> int -> char = "%string_safe_get"
external set : string -> int -> char -> unit = "%string_safe_set"
-external create: int -> string = "create_string"
+external create : int -> string = "create_string"
external unsafe_get : string -> int -> char = "%string_unsafe_get"
external unsafe_set : string -> int -> char -> unit = "%string_unsafe_set"
external unsafe_blit : string -> int -> string -> int -> int -> unit
@@ -74,9 +74,9 @@
tl;
r

-external is_printable: char -> bool = "is_printable"
-external char_code: char -> int = "%identity"
-external char_chr: int -> char = "%identity"
+external is_printable : char -> bool = "is_printable"
+external char_code : char -> int = "%identity"
+external char_chr : int -> char = "%identity"

let escaped s =
let n = ref 0 in
@@ -167,3 +167,15 @@
try ignore(rindex_rec s i c); true with Not_found -> false;;

let contains s c = contains_from s 0 c;;
+
+let rec explode_aux s i accu =

  • if i < 0 then accu
  • else explode_aux s (i-1) (unsafe_get s i :: accu)

+let explode s = explode_aux s (length s - 1) []
+
+let rec implode_aux s i = function

  • | [] -> s
  • | h::t -> unsafe_set s i h; implode_aux s (i+1) t

+let implode l = implode_aux (create (List.length l)) 0 l
Index: string.mli

RCS file: /caml/ocaml/stdlib/string.mli,v
retrieving revision 1.23
diff -u -r1.23 string.mli
--- string.mli 2000/01/07 16:43:10 1.23
+++ string.mli 2000/03/15 19:25:03
@@ -71,21 +71,21 @@
(* [String.concat sep sl] catenates the list of strings [sl],
inserting the separator string [sep] between each. *)

-val escaped: string -> string
+val escaped : string -> string
(* Return a copy of the argument, with special characters represented
by escape sequences, following the lexical conventions of
Objective Caml. *)

-val index: string -> char:char -> int
+val index : string -> char:char -> int
(* [String.index s c] returns the position of the leftmost
occurrence of character [c] in string [s].
Raise [Not_found] if [c] does not occur in [s]. )
-val rindex: string -> char:char -> int
+val rindex : string -> char:char -> int
(
[String.rindex s c] returns the position of the rightmost
occurrence of character [c] in string [s].
Raise [Not_found] if [c] does not occur in [s]. )
-val index_from: string -> pos:int -> char:char -> int
-val rindex_from: string -> pos:int -> char:char -> int
+val index_from : string -> pos:int -> char:char -> int
+val rindex_from : string -> pos:int -> char:char -> int
(
Same as [String.index] and [String.rindex], but start
searching at the character position given as second argument.
[String.index s c] is equivalent to [String.index_from s 0 c],
@@ -106,18 +106,24 @@
of [s] to index [stop].
Raise [Invalid_argument] if [stop] is not a valid index of [s]. *)

-val uppercase: string -> string
+val explode : string -> char list

  •    (* [String.explode s] return a list of all chars in [s] in order
    
  •       of appearance. *)
    

+val implode : char list -> string

  •    (* [String.implode l] makes a string out of the list of chars [l]. *)
    

+val uppercase : string -> string
(* Return a copy of the argument, with all lowercase letters
translated to uppercase, including accented letters of the ISO
Latin-1 (8859-1) character set. )
-val lowercase: string -> string
+val lowercase : string -> string
(
Return a copy of the argument, with all uppercase letters
translated to lowercase, including accented letters of the ISO
Latin-1 (8859-1) character set. )
-val capitalize: string -> string
+val capitalize : string -> string
(
Return a copy of the argument, with the first letter
set to uppercase. )
-val uncapitalize: string -> string
+val uncapitalize : string -> string
(
Return a copy of the argument, with the first letter
set to lowercase. *)

@vicuna
Copy link
Author

vicuna commented Apr 13, 2000

Comment author: administrator

Hello,

Sorry for not having responded earlier, I got side-tracked by other stuff...

as I was told on the mailing list, one should send patches for missing
features in the stdlib to the bugtracking list. At the end of the file you
find the patch containing additions to the following modules/files:

char.ml
char.mli
set.ml
set.mli
stack.ml
stack.mli
string.ml
string.mli

Your additions to Stack and Set are well taken, and I've merged them
in the source tree.

I have reservations about String.explode and String.implode: similar
functions were in the old Caml V3.1 and even earlier in classic Lisps,
but I've always seen people abuse them for writing horribly
inefficient functions over strings, then comply that strings were
inefficient in Caml :-) So, I don't want to encourage anyone to
process strings using lists of characters, because there is almost
always a better way.

As for the classification functions in Char, I agree they can be
useful, however your implementation works only for ASCII. I believe
support for extended 8-bit character sets is necessary. However, I
don't know whether we should hard-wire the use of ISO-Latin 1
(like the Char.uppercase and Char.lowercase currently do), or build on
C's setlocale() mechanism to let the user specify a character set. I
need to think more about this issue.

Thanks for your input.

All the best,

  • Xavier Leroy

@vicuna
Copy link
Author

vicuna commented Nov 6, 2002

Comment author: administrator

Merged Set and Stack extensions on 2000-04-13. -Xavier

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant