RE: Syntax for label, NEW PROPOSAL

From: Don Syme (dsyme@microsoft.com)
Date: Wed Mar 15 2000 - 21:40:22 MET


    I agree very much with Pierre's comments. Labels _do_ get in the way, and
    probably shouldn't be used in the standard library in 3.00.

    It seems pretty clear to me that a typical programmer only keeps an active
    vocabulary of roughly 100-200 identifiers. (I mean identifiers from
    libraries they are using but have not written themselves - I think you could
    do experiments to confirm the figures, but whatever the range we all know
    from experience that it is limited). If you accept that, then think about
    the impact that labels have on this budget. A quick grep reveals around 40
    different labels. It's clearly _much_ better to spend the budget on top
    level identifiers rather than labels, because otherwise programmers, even
    good ones, just become less effective. The very presence of labels in the
    standard library means that new users will look at them, learn them, try to
    use them, and the size of their working set of library functions will
    decrease as a result.

    For example, using the "blit" function from the new version of the array
    library module requires knowing not _one_ identifier but _six_. An existing
    user could still use the unlabelled version, but most new users will use
    labels, thinking that this is leading to better code and better programming
    (incorrectly, I believe, or at best a marginal improvement). It's like C++
    operator overloading - it looks oh so appealing, and a lot of fun for the
    language designer to put in, but it nearly always ends up a waste of time to
    use.
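
    To make the count concrete, here is a small sketch of the two call
    styles. It uses the present-day Array and ArrayLabels modules and the
    ~label:value syntax; both are assumptions relative to the 2.99 library
    discussed here, whose label names and label syntax may differ.

```ocaml
(* Sketch only: uses today's Array/ArrayLabels, not the 2.99 library. *)
let src = [| 1; 2; 3; 4; 5 |]

(* Unlabelled: one identifier (blit) plus the source-before-destination order. *)
let dst_plain = Array.make 5 0
let () = Array.blit src 1 dst_plain 0 3

(* Labelled: blit plus five label names (src, src_pos, dst, dst_pos, len). *)
let dst_labelled = Array.make 5 0
let () = ArrayLabels.blit ~src ~src_pos:1 ~dst:dst_labelled ~dst_pos:0 ~len:3
```

    Both calls copy the same three elements; the difference is only in how
    many names the caller has to recall.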

    That's not to say labels are inappropriate everywhere, but it does explain
    why they shouldn't be used on functions which users already use accurately,
    or where errors in use are already caught by the type checker. This is the
    fundamental test: do labels help programmers use functions with fewer errors
    at _runtime_, and is the advantage sufficient to make up for the extra
    "weight" of even having the labels around at all? Remember Caml won out as
    the best ML implementation by being Caml-Light, not
    Caml-you-must-be-aware-of-the-intricacies-of-my-features-to-learn-to-use-me
    ;-)

    Thus, I'd argue for even stronger rules than Pierre:
      - Labels are only appropriate where a significant number of users
        routinely make mistakes when using a function, and it is
        clear that adding labels would solve the problem.
      - This means no labels on functions with 1 or 2 arguments.
      - This means no labels on functions with 3 arguments unless the
        types are directly ambiguous (and probably not even then).
      - This means no labels on functions where a natural order exists for the
        arguments.
      - This also means no labels on polymorphic functions such as Hash.add (I
        think it would be very rare that the typechecker wouldn't spot a misuse
        of that function)
      - No labels inside the arguments of higher order functions. This
        will really confuse new users who try not to use labels!
        e.g. no "acc" in the first argument of
             val fold_right: fun:('b -> acc:'a -> 'a) -> 'b array -> acc:'a -> 'a
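
    The last rule can be illustrated with a small sketch in present-day
    OCaml syntax. The function fold_right_labelled is hypothetical, written
    only to mirror the shape of the signature quoted above; it is not a
    library function.

```ocaml
(* Hypothetical function mirroring the quoted signature; not a library API. *)
let fold_right_labelled ~(f : 'b -> acc:'a -> 'a) (a : 'b array) ~(acc : 'a) : 'a =
  Array.fold_right (fun x acc -> f x ~acc) a acc

(* The callback must now spell out the label itself. *)
let total = fold_right_labelled ~f:(fun x ~acc -> x + acc) [| 1; 2; 3 |] ~acc:0

(* An ordinary function such as ( + ) no longer fits:
     fold_right_labelled ~f:( + ) [| 1; 2; 3 |] ~acc:0
   is rejected by the type checker, because int -> int -> int does not
   match int -> acc:int -> int. *)
```

    A user who prefers not to use labels is forced to write one anyway,
    inside the function they pass in.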

    And it's not always clear that labels are such a great help - even in the
    case of Array.blit, users may not use the labelled function much more
    accurately, given the time it takes to look up the label names, correct
    the errors in misspelling the labels, and given that there is a natural
    default rule in functional programming that a source operand comes before
    a destination. Even worse, because the programmer has to remember the
    damn label names, there may be another 3 or 4 library functions that
    they've never learnt to use at all.

    Here's a story: In 1990, a new version of the HOL theorem prover (hol90) was
    released. The re-implementation was quite good, but the implementer made a
    major mistake - he used labelled versions (actually SML records) of many,
    many functions where nothing was gained by doing so. This was a complete
    waste of time, and was a major factor that led to the splitting of the HOL
    effort between "hol-light" and "HOL98", a split that took years to correct.
    As Pierre describes, the object system was carefully designed not to put
    people off, and if the standard libraries had been objectified then most
    existing users would not have moved to OCaml.

    Again, that's not to say I don't like labels - they are clearly useful when
    functions take many arguments that have no natural order, and will be a
    godsend for some APIs. However, using them prolifically in the standard
    library in this version is simply a bad idea. Remember, you can always add
    them to the standard library later, but you can't take them away!

    Cheers,
    Don

    -----Original Message-----
    From: Pierre Weis [mailto:Pierre.Weis@inria.fr]
    Sent: 15 March 2000 14:10
    To: caml-redistribution@pauillac.inria.fr
    Cc: caml-list@inria.fr
    Subject: Re: Syntax for label, NEW PROPOSAL

    [Sorry, no french version for this long message]

    Abstract:

    A long answer to Jacques's proposal. I do not discuss syntax but
    semantic issues of the label extension. My conclusion is that we should be
    very careful in adding labels to the standard libraries, and that keeping
    the usage of higher order functions as simple as possible is an extremely
    desirable design guideline.

    > *** Proposal
    >
    > Objective Caml 3.00 is not yet released, and I believe we can still
    > have modifications on this point.

    Yes, you're perfectly right, we can still modify several points.
    However, I think there are many other points that are more important
    than the choice of ``%'' instead of ``:'', which is only cosmetic
    after all.

    Thus, I would prefer to discuss deeper and more semantic problems:

    -- Problem1: labels can be reserved keywords. This is questionable
    and it has been strongly criticised by some Caml users, especially when
    reading in the code the awful sequence fun:begin fun ...

    -- Problem2: labels that spread all over the standard libraries, even
    when they add nothing of value. I would cite:

       * the labels completely redundant with the types
         (E.g. char:char in the type of String.contains or String.index)

       * undesired labels: in many cases I don't want to have labels just
         because I don't want to remember their names. (E.g., I very often
         misspell the label acc since I've always used accu to name an
         accumulator; furthermore, even when I do not misspell this label, I
         find acc:accu extremely verbose). Labels are also verbose at
         application.

       * labels that prevent you from comfortably using your traditional
         functions. This is particularly evident for the List.map or
         List.fold_right higher-order functionals.

    This last point is a real problem. Compare the usual way of using
    functionals to define the sum of the elements of a list:

    $ ocaml
            Objective Caml version 2.99+10

    # let sum l = List.fold_right ( + ) l 0;;
    val sum : int list -> int = <fun>

    Clearly application is denoted in ML with only one character: a space.

    Now, consider using the so-called ``Modern'' versions of these
    functionals, obtained with the -modern option of the compiler:

    $ ocamlpedantic
            Objective Caml version 2.99+10

    # let sum l = List.fold_right ( + ) l 0;;
                                  ^^^^^
    This expression has type int -> int -> int but is here used with type 'a list

    Clearly, there is something wrong now! We may remark that the error
    message is not that clear, but this is a minor point, since error
    messages are never clear enough anyway!

    The real problem is that fixing the code does no good at all for its
    readability (at least that's what I would say):

    # let sum l = List.fold_right fun:begin fun x acc:y -> x + y end acc:0;;
    val sum : 'a -> int list -> int = <fun>

    It seems that, in the ``modern'' mode, application of higher order
    functions is now denoted by a new kind of parens, opening with
    ``fun:begin fun'' and closing with ``end''. This is extremely explicit
    but also a bit heavy (to my mind).
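
    For comparison, a sketch of the same function using the present-day
    List and ListLabels modules. This is an assumption relative to the
    2.99+10 session above: the label names there (fun:, acc:) and the label
    syntax differ from the ~f: and ~init: used here.

```ocaml
(* Sketch with today's List/ListLabels; the 2.99 labels (fun:, acc:) differ. *)

(* Classic style: application is just juxtaposition. *)
let sum l = List.fold_right ( + ) l 0

(* Labelled style: the arguments are named, but the callback itself carries
   no labels, so an ordinary function such as ( + ) can still be passed. *)
let sum_labelled l = ListLabels.fold_right ~f:( + ) l ~init:0
```

    In this form the labelled call stays close to the classic one, because
    no label appears inside the type of the function argument.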

    For all these reasons, I would suggest using labels carefully in
    the standard libraries:

    -- remove labels from higher-order functionals
    -- remove redundant labels: when no ambiguity can occur you need not
       add a label.
    -- use labels when typechecking ambiguity is evident (for instance
       when there are two or more parameters with the same type).
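
    A small sketch of the last case, where two parameters share a type and
    the typechecker cannot tell them apart. It assumes the present-day
    String and StringLabels modules and the ~label:value syntax, neither of
    which is the 2.99 library discussed here.

```ocaml
(* Sketch only: String.sub takes a position and a length, both of type int. *)
let s = "hello world"

let ok = String.sub s 6 5        (* position 6, length 5 *)
let swapped = String.sub s 5 6   (* arguments swapped: still type-checks *)

(* With labels the two ints are named, so their order no longer matters. *)
let labelled = StringLabels.sub s ~len:5 ~pos:6
```

    The swapped call compiles and silently returns a different substring,
    which is the kind of mistake a label can prevent.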

    Labels must improve the readability of code or help document the
    libraries; they should not be an extra burden to the programmer or a
    way of obfuscating code.

    Evidently, like any other extension, labels must not obfuscate the
    overall picture, that is they must not clobber the semantics, nor add
    extra exceptional cases to the few general rules we have for the
    syntax and semantics of Caml.

    In this respect, optional labelled arguments might also be discussed,
    particularly for the following facts:

    -- syntactically identical patterns and expressions now may have
    incompatible types:
       # let f ?style:x _ = x;;
       val f : ?style:'a -> 'b -> 'a option = <fun>

       As a pattern on the left-hand side x has type 'a, while as an
       expression on the right hand side it has type 'a option
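
       The same definition can be exercised as follows. This is a sketch
       in present-day OCaml, where the call-site syntax (~style:1) is an
       assumption relative to the 2.99 syntax used in this message.

```ocaml
(* Same definition as above: x is bound from an optional argument. *)
let f ?style:x _ = x

(* When the argument is supplied, the body sees it wrapped in Some. *)
let given = f ~style:1 ()

(* When it is omitted, the body sees None. *)
let omitted : int option = f ()
```

       The caller passes a plain value, yet the body receives an option,
       which is why the same name has two different types.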

    -- some expressions can be only written as arguments in an application
       context:
       # let f ?style:x g = ?style:x;;
                            ^
       Syntax error
       # let f ?style:x g = g ?style:x;;
       val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>

    -- the simple addition of a default value to an optional argument may
       trigger a typechecking error:

       # let f ?(style:x) g = g ?style:x;;
       val f : ?style:'a -> (?style:'a -> 'b) -> 'b = <fun>

       # let f ?(style:x = 1) g = g ?style:x;;
       This expression has type int but is here used with type 'a option
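
       A sketch of how the two cases look in present-day OCaml, where the
       default is written ?(style = 1) rather than with the 2.99 syntax
       above. The names show, forward and forward_default are hypothetical
       and exist only for this illustration.

```ocaml
(* Hypothetical function with an optional argument, used as the callee. *)
let show ?(style = 0) () = style

(* Without a default, style is an option here and is forwarded as it is. *)
let forward ?style g = g ?style ()

(* With a default, style is a plain int here, so forwarding it as an
   optional argument requires wrapping it in Some again. *)
let forward_default ?(style = 1) g = g ?style:(Some style) ()
```

       Adding the default changes the type of style inside the body from
       an option to an int, so the forwarding expression has to change too.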

    Do not forget the design decision that has always been used before in
    the development of Caml: interesting but not universal extensions to
    the language must carefully be kept orthogonal to the core language
    and its libraries. This has been successfully achieved for the
    important addition of modules (that do not prevent the users from
    using the old interface-implementation view of modules) as well as for
    the objects system addition that has been also maintained orthogonal
    to the rest of the language (in particular the standard library has
    never been ``objectified''). I don't know of any reason why labels
    cannot follow the same safe guidelines.

    > Here is an alternative proposal, to use `%' in place of `:'. Labels
    > are kept as a lexical entity. This still breaks some programs, since
    > `%' was registered as infix, but this is not so bad.

    > Con:
    > * I still think that `:' looks better, particularly inside types.
    > * On my keyboard I can type in `:' without pressing shift :-)
    > * We will need some tool to convert existing code.

    I think that % should be the infix integer modulo symbol.

    > Do you think it would be better?

    No.

    > Are there people around who would rather keep `:' ?

    Yes. However this is syntax and we have to consider semantics in the
    first place.

    There are also people around who would like to keep Caml a true
    functional language, where usage of higher order functions is easy and
    natural. We have to be careful not to lose what is the actual
    strength of the language.

    -- 
    Pierre Weis
    

    INRIA, Projet Cristal, http://pauillac.inria.fr/~weis


