Re: convincing management to switch to Ocaml
 Pierre Weis
From:  Pierre Weis <Pierre.Weis@i...> 
Subject:  Re: convincing management to switch to Ocaml 
> Sure, I will try. First note that my background is
> in procedural languages. So I am used to calling functions like:
>
>     f(x,y)
>
> In ocaml, if I used curried functions, then
>
>     f x+1 y
>
> is an error: I have to put brackets around the individual arguments
> instead of the argument list. Secondly, I have a math background,

So, we can consistently consider the mathematical notations as natural,
and use them in Caml :)

The problem in this example has nothing to do with currying, since you
would have the same problem writing

    f x+1

instead of

    f (x+1)

Then, compare this expression with the mathematical meaning of

    sin x+1

versus

    sin (x+1)

As in Caml, sin x+1 means (sin x) + 1.

> where forward polish is right associative:
>
>     f g x = f (g (x))
>
> but with curried functions,
>
>     f g x = (f g) x
>
> and the _other_ style I've used in math is reverse polish, which
> is similar to the usual OO notation:
>
>     x g f = x.g().f()
>
> and now ocaml is yet another convention.

Yes, and once more the Caml convention is the mathematical one. If you
read a succession of functions separated by spaces in a mathematical
book, you will certainly interpret it as an error, instead of trying to
interpret it as some strange stack notation! For instance

    sin cos tan 0

will be seen as a meaningless sequence of symbols, unless you're
studying the mathematical theory of functions (namely the
lambda-calculus), where f g h means (f g) h.

Then, since Caml is a functional language, it uses the notation for
function application borrowed from the mathematical theory of
functions.
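The precedence rule discussed above can be checked directly in OCaml. The following is only a minimal sketch: the functions double and add are hypothetical helpers introduced for illustration, not anything from the original exchange.

```ocaml
(* Hypothetical helpers, used only to illustrate the parsing rules. *)
let double x = 2 * x
let add x y = x + y

(* Function application binds tighter than infix operators,
   so [double 3 + 1] is read as [(double 3) + 1]. *)
let a = double 3 + 1      (* (2 * 3) + 1 = 7 *)
let b = double (3 + 1)    (* 2 * (3 + 1) = 8 *)

(* Application associates to the left: [add 1 2] is [(add 1) 2]. *)
let c = add 1 2           (* 3 *)
let d = (add 1) 2         (* 3 *)

let () = Printf.printf "%d %d %d %d\n" a b c d
```

Writing double double 5 would be rejected by the type checker, since it is read as (double double) 5; composing the two calls needs the explicit parentheses of double (double 5), just as with sin (cos y).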

> As a usage example,
> I often have to write:
>
>     print_endline (x ^ (string_of_int (f y)))
>
> whereas in a procedural language it would be
>
>     print_endline(x ^ string_of_int(f(y)))

As a mathematician, and knowing the natural mathematical conventions of
precedence between function applications and operations, you write

    sin x + cos x

and not

    (sin (x)) + (cos (x))

(even if the second expression, overparenthesized, is mathematically
accepted).

Hence, as you simply write

    x + sin (cos y)

instead of

    x + (sin (cos (y)))

you write in Caml

    x ^ string_of_int (f y)

instead of

    x ^ (string_of_int (f (y)))

(even if the second expression, overparenthesized, is camlly accepted).

> In other words, I'm used to function calls being syntactically
> atomic: the arguments are bracketed and can be 'ignored' visually,
> 'as if they were subscripts'

As in mathematics, you can use this technique, and add parens to any
function call if you want, as in

    sin (x)

instead of

    sin x

>
>     print_endline (...)
>     x ^ string_of_int (...)
>     f (..)
>     x
>
> In ocaml, the function being called is _inside_ the brackets,
> first on the left, instead of labelling the 'node'. So it
> has the same visual status as its arguments.

Normal for a functional programming language where functions can be
arguments of other functions, at any nesting level. Completely normal
when functions can be stored into data structures as any other values.
In fact, we do want functions to be regular objects of the language,
not just ``things'' with special status that can only be applied to
something. So, you're perfectly right: ``the function has the same''
blabla ``as its arguments'' is just what we want in Caml! (including
when ``blabla'' is visual status).

> When I write ocaml, I often do the following to get brackets right:
> I put the brackets in first:
>
>     print_endline ()
>     print_endline (x ^ ())
>     print_endline (x ^ (string_of_int x))
>
> because I cannot count :). This is a pain, because I have to
> keep backspacing.
> Even worse is: I have a string,
> and want to put an integer inside it:
>
>     "xxxx"
>     "xx""xx"
>     "xx"^^"xx"
>     "xx"^()^"xx"
>     "xx"^(string_of_int x)^"xx"
>
> which is four editing steps, requiring 6 punctuators,
> and three backspaces. If the
> string was the argument to a print_endline function, I _also_
> have to put brackets around the whole string expression
> where none were previously needed:
>
>     ("xx"^(string_of_int x)^"xx")
>
> This is particularly bad, since an insertion requires
> non-local editing.

Once more this is a problem you already had in mathematics when you
change

    sin 0.5

into

    sin (x + 0.5)

And, as in maths, where you write

    x + sin x + x

instead of

    x + (sin x) + x

you can write

    "xx" ^ string_of_int x ^ "xx"

instead of

    "xx"^(string_of_int x)^"xx"

And, as in maths, you must add parens around the whole expression to
pass it as an argument to a function. You write

    atan (x + sin x + x)

and not

    atan x + sin x + x

[...]

> This problem is exacerbated by the fact that a lot of
> ocaml is an expression, so you can do a lot of nesting.

As in maths?

> The second problem I have is with the constructions that
> are like 'prefix/infix' operators, for example
>
>     if .. then .. else ...
>     try .. with ...
>     match .. with | .. -> .. | .. -> ..
>
> Studies were done in the 70's showing this is not as readable as
> constructions bounded lexically at both ends.
>
> Both Pascal and C have this fault, but only in a few constructions.
> These faults were corrected in Modula: IF .. THEN .. ELSE .. ENDIF
> It is a welcome relief to see while .. do .. done in ocaml:
> it is not necessary to remember precedence rules that span
> many lines of code.

You're right. You just have to learn the rules for these constructs
(more exactly for parts of them):

1) Any sequence in an arm of a conditional MUST be bracketed.
2) Any match or try inside a pattern-matching clause MUST be bracketed.

> Similarly, using separators like pascal's ";" is not as good
> as terminators like C's ";".
> Clearly, these things are much harder
> to edit (i.e, cut and paste doesn't work with separators, especially
> ones that are visually terminators like ";" - you can write
>
>     a
>     ;
>     b
>
> but it looks horrid compared to:
>
>     a;
>     b
>
> and then
>
>     a;
>     b;
> is more 'symmetrical' visually.

You're right, that's why YOU CAN use the ; as a terminator in Caml.

> For example, an expression like
>
>     if x then y else z
>
> cannot be augmented to
>
>     if x then y; y' else z
>
> instead, non-local editing is required:
>
>     if x then begin y; y' end else z

Same problem as for function arguments: if you want to be able to add
stuff to a function argument, you just have to systematically add
parens around every function argument. If you want to do the same for
conditionals, add preventive begin and end keywords to every arm of
every conditional.

> Similarly, nested matches/try expressions always require begin/end:
> unless the nesting is right at the end.
>
> A general rule for a good syntax would be that augmentation
> of some term of a construction only required local editing.
> This means:
>
>     terminators, not separators
>     mandatory begin/end keywords

This cannot be done elegantly with function application, except if we
use a Lisp-like syntax.

> Of course, that would lead to a grossly OVER verbose syntax,
> so it is certainly necessary to have _some_ infix operators
> with precedences: a reasonable set is:
>
>     unary + -
>     binary * /
>     binary + -
>     binary < > <= >= <> !=
>     unary not
>     binary and
>     binary or
>
> plus at most two or three more. (the listed set is
> _already_ 7 levels of precedence!)
>
> Anything more than that will conflict with more or less universal
> conventions. For example, Python and C and Ocaml all have bitshift
> operations, with _different_ precedences which lead to a fault in
> a routine I had doing UTF-8 conversions, it took an hour
> to figure out that I needed more brackets.

Bitwise operations in Caml are not operators with special precedences,
which I consider a problem, and that is why you had a problem, since
the relative priorities of land, lor, lsl and asr are all the same,
which is confusing!

> A good technique is probably to have a lot of non-associating
> operators, so that there is never any 'mental' ambiguity.
> For example, the power operator should be unassociative:
>
>     a ** b ** c
>
> should be an error.

Difficult not to generalize to other operators ...

> I think there is a point here that, because I know a lot
> of programming languages, I get confused with the various
> precedence rules: I JUST DON'T WANT TO KNOW. So I program
> defensively: I put the brackets in to be sure of the semantics.
> This is useful for readers who are also not experts in the
> language as well.

It is legal to add parens almost everywhere in Caml. If you want to
program with a lot of parens, you can, and that's a safe style,
although I would not recommend it. As in maths, I recommend knowing
what you're writing when you are writing it.

As a mathematician, you are accustomed to more baroque and complex
conventions than those of Caml: prefix operators (primitive and
integral of functions, and usual function application), postfix
operators (factorial, derivation), superscript (exponentiation),
prefix-postfix operator (2 numbers between big parens to express
binomial coefficients), division (one argument over, one argument
under a small horizontal line), infix operators (+, -, ...). The most
interesting are f', which is completely ambiguous, or the invisible
operator * in xy ...

When designing Caml, we tried not to be that ambiguous, and to use the
conventions you have already learned in mathematics; hence the use of
parens in expressions, the relative precedences of + and *, etc...

[...]

> and the same applies in ocaml. Indeed, [] is used in ocaml
> for lists as well:
>
>     b [ a.[1] ]
>
> and here only the .
> tells that .[] is an infix/suffix operator,
> whereas b [] is a function with a list argument, so the
> [ in the argument is a prefix/suffix operator. :)

In the same vein, would you complain about r.lab compared to r.(lab) or
r(lab) or even r lab? Or also module.f compared to Module.f?

More generally, do you mean that if we change a single character in a
program the meaning should not change? Or that no notation should be a
prefix or a suffix of another one?

There are many constructs to express, so we need many notations. We try
to reuse traditional notations if available, but from time to time we
have to invent a new one. In this case the users need to learn it if
they want to feel at ease when using the language!

> Another visual problem I have is that I think symbols
> (i.e. punctuation marks) should bind more strongly
> than words: it looks ugly to put brackets around words:
>
>     (fun x -> print_endline ..)
>
> is ugly compared to:
>
>     begin fun x -> print_endline .. end

You can use this form if you want it.

> but the ideal would be
>
>     fun x is print_endline .. efun

This is not light enough for a language that uses so many anonymous
functions. Once more, generalizing it to efor, ewhile, elet, etry,
ematch would lead to a painful language.

[...]

> Of course, there are worse languages: the lambda calculus
> being a case that is only marginally better than C declarations
> in the competition for all time most unreadable languages :)
> [The prize winner, however, is certainly category theory :]

Wao! Too bad for us to have chosen the function application convention
from the lambda-calculus!

[Moreover, don't forget that the Caml effort was started after studies
in category theory, as witnessed by the name of the language, which is
an acronym for Categorical Abstract Machine Language!]

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://cristal.inria.fr/~weis/