Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007445OCamlstandard librarypublic2016-12-31 19:482017-02-20 11:54
Reportermarkghayden 
Assigned To 
PrioritynormalSeverityfeatureReproducibilityalways
StatusacknowledgedResolutionopen 
PlatformOSOS Version
Product Version4.04.0 
Target VersionFixed in Version 
Summary0007445: requested Stdlib functions ({Array,String,Byte}.{is_empty, sub_equal, bounds, sub_bounds}).
DescriptionPlease consider adding the following functions.

{Array,String,Byte}.is_empty: Test for empty objects (0 length). This primitive could be compiled to more efficient assembly than calculating length and testing for 0.

val is_empty : 'a array -> bool
val is_empty : t -> bool

{Array.String,Byte}.sub_equal: Comparing sub-ranges. The compiler would have the option of optimizing these, for instance with use of unsafe primitives or calling out to C primitives to use memcmp().

val sub_equal : 'a array -> int -> 'a array -> int -> int -> bool
val sub_equal : t-> int -> t -> int -> int -> bool

[Array,String,Byte}.bounds: Test for bounds. This would be a no-op when compiled with -unsafe. This primitive can be optimized by the compiler through use of unsigned integer operations not available to ML code. Existing sodlib functions could be modified to use this. Another option may be to have the function return bool value so that user can determine action to take; with -unsafe, the function always returns [true] and so action becomes dead code.

val bounds : 'a array -> int -> unit
val bounds : t -> int -> unit

[Array.String,Byte}.sub_bounds: Test for bounds for a range, raising invalid_arg if not a valid sub-range. The test is similar to that used by blit (ofs & len). A no-op when compiled with -unsafe. Similarly to above, this can be optimized by the compiler through unsigned integer operations not available to ML. Existing stdlib functions, such as blit, could be modified to use this, allowing them to benefit from more efficient implementation.

val sub_bounds : t -> int -> int -> unit
TagsNo tags attached.
Attached Files

- Relationships

-  Notes
(0017056)
dra (developer)
2017-01-01 15:14

How much are you hoping to squeeze out of the constant-time operations {Array,String,Bytes}.length?

I'm not at all convinced by the names proposed for the other two functions(out of curiosity, have you borrowed them from another language, or just proposed them?). Perhaps is_valid_index and is_valid_range? I imagine your proposed versions returning unit are expected to raise an exception if the index/range is not valid? I'm not clear of the motivation for these two functions: why would one be trying so hard to optimise checks which are removed in unsafe mode, especially when these two checks relate to correctness properties, anyway?
(0017057)
markghayden (reporter)
2017-01-01 16:59

My understanding is that Array.length needs to be concerned about the possibility of the array being a double arrray or not. In these cases, the is_empty could eliminate that check.

For {Bytes,String}.length the length operation must get the word-length and then read the last byte of the object and subtract that from the words-size * length. An is_empty would get the word length, exit with false if not 1, and if 1 then get the next word and check the last byte. There are two opportunities for optimization: (1) often eliminate the second memory operation (and effect on the cache) and (2) strength reduction because second read is not dependent on the result of word-size.

Most of what optimizers do is squeeze performance of constant time operations. In this case, the change only requires allowing the developer to use x.is_empty rather than x.length = 0. No complex type analysis is required.

Regarding naming, the names are only my suggestions. is_valid_index and is_valid_range seem like good options for functions that return bool. Regarding the versions returning 'unit', the functions would raise an exception if a index/range is invalid.

For us, we never build/ship our software with -unsafe, since we like to benefit from safety of most of the run-time checks. For our part, we have many modules built on Array, Bytes, and String and the availability of built-in bounds checks would allow us to use unsafe_{get, set}, but control in our software whether to include bounds check on a more granular basis (by using 'if is_debug' where debug is a constant bool so the conditional is eliminated at compile time). When we do include bounds checks, we would like to know they are optimized similar to the bounds check in underlying Array,etc module.
(0017061)
dra (developer)
2017-01-01 23:28
edited on: 2017-01-01 23:29

OK, to me this is definitely two separate changes. I should have said before that I think is_empty is a good addition even without performance considerations, and I agree that they would best be added as primitives straight away (rather than simple OCaml stubs later converted to primitives). Would you be willing to propose a patch for this part?

The second part I'm still very uneasy about. Adding new functions which raise exceptions rather goes against current aims, but I'm also struggling to stomach the idea of a function whose return is short-circuited by -unsafe (i.e. the function is ignored and [true] assumed). Perhaps therefore the names should include assert (e.g. assert_valid_index and assert_valid_range) to make it clear that they can be removed and the name makes the use of Assert_failure as the exception clear.

However, I'm still unconvinced by the need for these functions (and still less of any mention of their being optimised) - perhaps someone else may like to chime in?

(0017062)
dra (developer)
2017-01-01 23:30

Oops - included in the "first part" is also the sub_equal proposed functions!
(0017063)
markghayden (reporter)
2017-01-02 00:34

Thanks for the suggestions. We'll look at proposing patches.

We have many cases in our code where we make [assert(ofs >= 0 && ofs < Array.length arr)] (and similar for string and bytes). It would be good to have these checks optimized when enabled. This needs to be a primitive because (I believe) at least the optimized index check cannot currently be coded in ML (due to lack of unsigned integers). It seems making this type of test a part of the respective modules makes sense since any use of Bytes/String/Array needs to be implemented with the bounds in mind and it is often helpful to make the test separate from the actual Bytes/String/Array API call's.

I guess the connection with '-unsafe' is not required, at least in our case. I had suggested this as a way to stay consistent with other bounds checks and how they are enabled/disabled. Then again, at least several bounds checks such as for sub/blit appear to always remain even when '-unsafe' flag is used.

I'm not it is clear what "bounds check optimizations I'm referring to. The bounds checks generated by the compiler for a safe array access ([a.(i)] involve a single unsigned integer comparison/branch. This uses the fact that negative signed integers will be large unsigned integers. The equivalent test in ML [assert(ofs >= 0 && ofs < Array.length arr)] is compiled as 2 comparison/branch. An optimized primitive saves 1 of the compare/branch. See code below & comments.

A similar optimization reduces [ofs >= 0 && len >= 0 && ofs + len < Array.length arr] to a single unsigned addition and comparison/branch. This is a reduction from 3 comparison/branch to 1. Note that {Array,Bytes,String}.{blit,sub} could benefit from this.


type t = int array ;;

let int_get0 a i =
  (* No bounds check.
   *)
  Array.unsafe_get (a:t) i
;;

let int_get1 a i =
  (* Without '-unsafe' the bounds check uses 1 unsigned integer comparison.
   *)
  (a:t).(i)
;;

let [@inline] bounds_check a i =
  (* This compiles to 2 comparisons/branches.
   *)
  assert(i >= 0 && i < Array.length (a:t))
;;

let int_get2 a i =
  bounds_check a i ;
  Array.unsafe_get (a:t) i
;;
(0017379)
frisch (developer)
2017-02-20 11:54

sub_equal: why not, but it should take an equality predicate. We could also provide optimized monomorphic versions for e.g. int or float (although for float, I guess we cannot use memcmp and get the natural comparison semantics for IEEE floats).

- Issue History
Date Modified Username Field Change
2016-12-31 19:48 markghayden New Issue
2017-01-01 15:14 dra Note Added: 0017056
2017-01-01 16:59 markghayden Note Added: 0017057
2017-01-01 23:28 dra Note Added: 0017061
2017-01-01 23:29 dra Note Edited: 0017061 View Revisions
2017-01-01 23:30 dra Note Added: 0017062
2017-01-01 23:31 dra Status new => acknowledged
2017-01-02 00:34 markghayden Note Added: 0017063
2017-02-20 11:47 frisch Summary requested Stdlib functions => requested Stdlib functions ({Array,String,Byte}.{is_empty, sub_equal, bounds, sub_bounds}).
2017-02-20 11:54 frisch Note Added: 0017379
2017-02-23 16:43 doligez Category OCaml standard library => standard library


Copyright © 2000 - 2011 MantisBT Group
Powered by Mantis Bugtracker